Everything you need to know about PDF
PDF (Portable Document Format, .pdf) is the widely used fixed-layout document format created by Adobe in 1993 and standardized as ISO 32000-1 in 2008. It is built to preserve visual fidelity across platforms. A PDF with its fonts embedded looks the same on a phone, on a printer, and on a desktop running a different operating system, as long as the viewer supports the PDF features the file uses. This determinism is its superpower.
How it works under the hood
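- Page-as-program. Each page's appearance is described by a content stream. This is a sequence of drawing operators derived from PostScript's imaging model, but without PostScript's programming features (no loops, conditionals, or procedures). Text, vector paths, and images are painted by operators that read and modify a graphics state. The `q` and `Q` operators push that state onto a stack and pop it off again. Form-field appearances are themselves small content streams.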
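- Cross-reference table. Near the end of the file, an `xref` table (or, since PDF 1.5, a compressed cross-reference stream) maps every object number to its byte offset. The `startxref` line just before `%%EOF` records where that table starts. A viewer can therefore jump straight to any object without parsing the whole file, which matters for large documents and fast page display.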
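- Embedded fonts. PDFs can embed fonts, usually as subsets containing only the glyphs actually used, so rendering does not depend on the fonts installed on the viewing machine. This is why a PDF made on macOS with embedded fonts looks identical on Windows or Android. Fonts that are only referenced by name get substituted by the viewer.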
- Encryption and signatures. PDF supports AES-256 encryption, digital signatures (PKCS#7/CMS), redaction, and permission flags that viewers are expected to honor as lightweight DRM. PDF/A is an ISO subset designed for long-term archival: no JavaScript, no encryption, no external dependencies, and all fonts must be embedded.
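The sketch below makes the content-stream and cross-reference ideas concrete. It uses Python with only the standard library. It writes a one-page PDF by hand and then follows `startxref` to find each object's byte offset. `build_minimal_pdf` and `object_offsets` are hypothetical helpers for illustration, not a library API. The reader assumes a single classic `xref` table, so it does not handle cross-reference streams or incremental updates. The page uses Helvetica, a standard font referenced by name rather than embedded, so this file does not get the embedded-font guarantee described above.

```python
import re


def build_minimal_pdf(text: str) -> bytes:
    """Hand-assemble a one-page PDF: five objects, one content stream, a classic xref table."""
    # Escape the characters that are special inside a PDF literal string.
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    # Page content: q/Q push and pop the graphics state, so the blue fill colour
    # set for the rectangle does not leak into the text drawn afterwards.
    content = (
        "q 0 0 1 rg 72 600 200 50 re f Q\n"
        f"BT /F1 24 Tf 72 720 Td ({escaped}) Tj ET"
    ).encode("latin-1")  # sketch assumption: Latin-1 text only

    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Resources << /Font << /F1 4 0 R >> >> /Contents 5 0 R >>",
        # Helvetica is a standard font: referenced by name here, not embedded.
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset where "num 0 obj" starts
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"

    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # object 0: head of the free list
    for off in offsets:
        out += b"%010d 00000 n \n" % off  # every entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)


def object_offsets(pdf: bytes) -> dict[int, int]:
    """Follow startxref to a classic xref table; return {object number: byte offset}.

    Sketch limits: one uncompressed xref section only. Cross-reference streams
    (PDF 1.5+) and /Prev chains from incremental updates are not handled.
    """
    m = re.search(rb"startxref\s+(\d+)\s+%%EOF\s*$", pdf[-1024:])
    if m is None:
        raise ValueError("startxref not found near end of file")
    pos = int(m.group(1))
    if not pdf.startswith(b"xref", pos):
        raise ValueError("no classic xref table at startxref (xref stream?)")

    offsets = {}
    lines = iter(pdf[pos + 4:].splitlines())
    for line in lines:
        parts = line.split()
        if not parts:
            continue
        if parts[0] == b"trailer":
            break
        first, count = int(parts[0]), int(parts[1])  # subsection header
        for i in range(count):
            off, _gen, kind = next(lines).split()
            if kind == b"n":  # 'n' = in use, 'f' = free
                offsets[first + i] = int(off)
    return offsets


if __name__ == "__main__":
    pdf = build_minimal_pdf("Hello, PDF")
    for num, off in sorted(object_offsets(pdf).items()):
        # Random access: slice straight to the object instead of scanning the file.
        print(num, off, pdf[off:off + 16])
```

Writing the returned bytes to a `.pdf` file should give a page that common viewers open. A real writer would normally use a library, compress its streams, and embed its fonts.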
Where you'll actually use it
- Final-form documents: contracts, invoices, certificates, forms
- Print-ready files (PDF/X variant for prepress workflows)
- Long-term archival (PDF/A, ISO 19005)
- Digital books and academic papers (embedded fonts ensure correct math symbols and diacritics)
How it compares to alternatives
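- PDF vs DOCX: PDF is a fixed, final-form layout that is awkward to edit. DOCX is built for editing and reflows depending on the application, installed fonts, and page settings.
- PDF vs EPUB: PDF preserves layout exactly. EPUB reflows to fit the screen, which suits e-readers better.
- PDF vs HTML: HTML reflows and carries semantic structure that assistive technology can use. PDF is fixed and must be deliberately tagged (the PDF/UA standard, ISO 14289) to be comparably accessible.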
Things that will trip you up
- Scanned PDFs often contain only page images, with no text layer, so text extraction returns nothing useful until you run OCR (Tesseract, Adobe Acrobat)
- Forms come in two incompatible flavors: AcroForms (the original PDF form model, widely supported) and XFA (XML-based, deprecated in PDF 2.0 and poorly supported outside Adobe's own software)
- JavaScript inside PDFs is a known security risk - keep your reader updated and disable JS for untrusted files
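Before deeper processing, you can run a quick triage pass for these pitfalls. The sketch below uses Python with only the standard library. It scans the raw bytes, plus any stream that inflates as zlib/Flate data, for four things: `/JavaScript` or `/JS`, `/AcroForm`, `/XFA`, and the text-showing operators `Tj`/`TJ`. `triage_pdf` and its result keys are hypothetical names for illustration. This is a heuristic, not a parser. Encrypted files, streams using other filters, and hex-escaped names such as `/J#61vaScript` will slip past it.

```python
import re
import zlib


def triage_pdf(pdf: bytes) -> dict[str, bool]:
    """Byte-level heuristics for common PDF pitfalls; not a real parser."""
    # Search the raw file plus every stream that inflates as zlib/Flate data,
    # since content streams and object streams are usually compressed.
    chunks = [pdf]
    for m in re.finditer(rb"(?<!end)stream\r?\n(.*?)endstream", pdf, re.S):
        try:
            chunks.append(zlib.decompressobj().decompress(m.group(1)))
        except zlib.error:
            pass  # not Flate data (raw, another filter, or encrypted): skip it
    blob = b"\n".join(chunks)

    return {
        # Plain-name match only; hex-escaped names like /J#61vaScript are missed.
        "has_javascript": re.search(rb"/(?:JavaScript|JS)\b", blob) is not None,
        "has_acroform": b"/AcroForm" in blob,
        "has_xfa": b"/XFA" in blob,
        # Tj/TJ show text; no match suggests an image-only (scanned) file.
        "has_text_operators": re.search(rb"[)>]\s*Tj|\]\s*TJ", blob) is not None,
    }
```

A `False` for `has_text_operators` hints that the file needs OCR before extraction. A `True` for `has_javascript` on an untrusted file is a reason to open it with scripting disabled.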