Everything you need to know about XML
XML (Extensible Markup Language, file extension `.xml`) is a general-purpose markup language that the W3C standardized in 1998. Once the dominant data interchange format on the web, XML has been displaced by JSON for most web APIs but remains entrenched in enterprise systems, document formats, and protocols where its strict schemas, namespaces, and extensive tooling matter.
How it works under the hood
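- Tag-based markup. Every element is `<tag attribute='value'>content</tag>` (or `<tag/>` when empty). Tags are case-sensitive and must be properly nested and closed, attribute values must be quoted, and a document has exactly one root element - XML is stricter than HTML, and a parser rejects malformed input instead of guessing.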
- Schemas. XSD (XML Schema Definition) and DTD (Document Type Definition) define which elements and attributes are valid and how they may nest. JSON's closest counterpart, JSON Schema, came much later.
- Namespaces. `xmlns` attributes bind element names to a URI, which prevents name collisions when combining XML from multiple sources - critical for SOAP, RDF, Atom, and RSS extensions.
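- XPath and XSLT. XPath selects nodes in an XML document with path expressions, loosely the way SQL queries a database. XSLT transforms one XML document into another (or into HTML or text). JSON has tools such as JSONPath and jq, but no comparably integrated, long-established stack.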
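
The points above can be sketched with Python's standard-library `xml.etree.ElementTree`. This is a minimal illustration, not a reference implementation: the sample document, the `urn:example:*` namespace URIs, and the helper `is_well_formed` are invented for the example. ElementTree supports only a subset of XPath, and the standard library has no XSD validation or XSLT; third-party libraries such as lxml cover those.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: two vocabularies both define <title>,
# kept apart by namespaces (the urn:example:* URIs are made up).
DOC = """\
<catalog xmlns="urn:example:books" xmlns:p="urn:example:people">
  <book id="b1">
    <title>XML Basics</title>
    <p:author><p:title>Dr</p:title><p:name>Ada</p:name></p:author>
  </book>
  <book id="b2">
    <title>Schemas in Practice</title>
    <p:author><p:title>Prof</p:title><p:name>Lin</p:name></p:author>
  </book>
</catalog>
"""

# Prefixes used in queries are local to this code; only the URIs must match
# the ones declared in the document.
NS = {"b": "urn:example:books", "p": "urn:example:people"}

root = ET.fromstring(DOC)

# ElementTree stores namespaced names as {uri}local.
print(root.tag)  # {urn:example:books}catalog

# XPath-style queries (ElementTree implements a limited subset of XPath).
book_titles = [t.text for t in root.findall("b:book/b:title", NS)]
honorifics = [t.text for t in root.findall(".//p:author/p:title", NS)]
b2 = root.find("b:book[@id='b2']", NS)


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(book_titles)                         # the two book titles
print(honorifics)                          # the two people-namespace titles
print(b2.find("b:title", NS).text)         # title of the book with id b2
print(is_well_formed("<a><b></a></b>"))    # False: tags are not properly nested
print(is_well_formed("<a><b></b></a>"))    # True
```

Both vocabularies use a `title` element, but the queries never confuse them because each name is qualified by its namespace URI.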
Where you'll actually use it
- SOAP web services (still huge in banking, insurance, healthcare)
- RSS/Atom feeds
- Office Open XML (DOCX, XLSX), OpenDocument (ODT, ODS) - these files are ZIP archives whose parts are XML documents
- Configuration files for Java enterprise (Spring, Maven, Hibernate)
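
As one example from the list above, reading an RSS feed is mostly a matter of walking a small, fixed element tree. The sketch below uses Python's standard library; the feed content, the URLs, and the helper name `read_items` are placeholders invented for illustration, and a real feed reader would also need to handle Atom, namespaced extensions, and malformed input.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed; titles and URLs are placeholders.
FEED = """\
<rss version="2.0">
  <channel>
    <title>Example Feed</title>
    <link>https://example.com/</link>
    <item>
      <title>First post</title>
      <link>https://example.com/1</link>
    </item>
    <item>
      <title>Second post</title>
      <link>https://example.com/2</link>
    </item>
  </channel>
</rss>
"""


def read_items(feed_xml: str) -> list[tuple[str, str]]:
    """Return (title, link) pairs for each <item> in an RSS 2.0 feed."""
    channel = ET.fromstring(feed_xml).find("channel")
    if channel is None:
        return []
    return [
        (item.findtext("title", default=""), item.findtext("link", default=""))
        for item in channel.findall("item")
    ]


items = read_items(FEED)
for title, link in items:
    print(f"{title}: {link}")
```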
How it compares to alternatives
XML vs JSON: JSON is more compact and usually simpler and faster to parse; XML offers stronger schema validation, namespaces, and better tooling for complex, document-like content. XML vs YAML: YAML is easier to read and write by hand; XML has stronger validation. Modern web APIs mostly use JSON; legacy enterprise systems still lean on XML.
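
To make the size trade-off concrete, the following sketch encodes one small record as compact JSON, as element-style XML, and as attribute-style XML, then prints the byte counts. The record and its field names are invented for illustration, and the result for a single tiny record says little about real payloads, so treat it as a way to measure your own data and not as a benchmark.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record, used only to compare the encodings.
record = {"id": "42", "name": "Ada Lovelace", "email": "ada@example.com"}

# JSON with compact separators (no extra whitespace).
json_bytes = json.dumps(record, separators=(",", ":")).encode("utf-8")

# XML, element style: each field becomes a child element.
user = ET.Element("user")
for key, value in record.items():
    ET.SubElement(user, key).text = value
xml_elements = ET.tostring(user, encoding="unicode").encode("utf-8")

# XML, attribute style: each field becomes an attribute of <user>.
xml_attributes = ET.tostring(
    ET.Element("user", record), encoding="unicode"
).encode("utf-8")

for label, data in [
    ("json", json_bytes),
    ("xml elements", xml_elements),
    ("xml attributes", xml_attributes),
]:
    print(f"{label:15} {len(data):4d} bytes")
```

Element-style XML repeats every field name in a closing tag, which is where most of the extra bytes come from; attribute-style XML avoids that repetition but cannot represent nested or repeated values.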
Things that will trip you up
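- XXE (XML External Entity) attacks are real: a document can declare an external entity that makes the parser read local files or fetch URLs. When parsing untrusted input, disable external entity resolution (and DTD processing, if you can) unless you need it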
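- XML is more verbose than JSON - on the order of 30-50% larger for equivalent data, though the exact overhead depends on the shape of the data and on whether values are stored as elements or attributes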
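- Whitespace handling between elements varies by parser and API: some keep indentation as text nodes, others drop it. An explicit `xml:space='preserve'` signals that whitespace is significant, but it is a hint that applications may or may not honor, so test with your actual toolchain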
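
Two of the pitfalls above can be sketched in Python. The guard below is a deliberately blunt, standard-library-only approach, assuming you never need DTDs from untrusted sources: it rejects any document that contains a DOCTYPE declaration, which is the only place entities can be declared. The names `parse_untrusted` and `DoctypeForbidden` are hypothetical, and the Python documentation points to the third-party `defusedxml` package for a more complete solution. The last lines show how ElementTree keeps indentation as text.

```python
import xml.etree.ElementTree as ET
import xml.parsers.expat


class DoctypeForbidden(ValueError):
    """Raised when untrusted XML contains a DOCTYPE declaration."""


def parse_untrusted(data: bytes) -> ET.Element:
    """Parse XML, rejecting any document that declares a DOCTYPE.

    Entities (internal or external) are declared inside a DOCTYPE, so
    refusing it outright is a conservative guard. Legitimate documents
    that rely on a DTD are rejected as well.
    """
    def forbid_doctype(name, system_id, public_id, has_internal_subset):
        raise DoctypeForbidden(f"DOCTYPE {name!r} is not allowed")

    checker = xml.parsers.expat.ParserCreate()
    checker.StartDoctypeDeclHandler = forbid_doctype
    checker.Parse(data, True)   # first pass: well-formedness and DOCTYPE check
    return ET.fromstring(data)  # second pass: build the element tree


# Classic XXE payload shape; it is rejected before any entity is resolved.
EVIL = b'<!DOCTYPE r [<!ENTITY xxe SYSTEM "file:///etc/passwd">]><r>&xxe;</r>'
try:
    parse_untrusted(EVIL)
except DoctypeForbidden as exc:
    print("rejected:", exc)

# Whitespace: ElementTree keeps indentation as .text and .tail strings.
root = parse_untrusted(b"<a>\n  <b> x </b>\n</a>")
print(repr(root.text))     # '\n  '
print(repr(root[0].text))  # ' x '
print(repr(root[0].tail))  # '\n'
```

Parsing twice costs some time; the trade-off here is simplicity, since the check stays independent of whichever tree-building API runs afterwards.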