What is XMP metadata in PDF?
XMP (Extensible Metadata Platform) is an XML packet embedded inside the PDF file itself, describing the document: title, author, dates, rights, and arbitrary custom schemas. It is standardised and machine-readable, so any document management system, validator, or archiving tool can read it without parsing the page content. This guide explains what XMP is, why it matters, and how it underpins standards like PDF/A, ZUGFeRD, and PDF/UA.
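To make the structure concrete, the sketch below builds a minimal XMP packet with Python's standard library. The title, author and date are placeholder values, and packets written by real PDF tools usually carry more namespaces. Treat it as an illustration of the shape (an x:xmpmeta root, an rdf:RDF body, properties inside rdf:Description, all inside an xpacket wrapper), not as the output of any particular library.

```python
import xml.etree.ElementTree as ET

# Namespace URIs from the XMP specification and Dublin Core.
NS = {
    "x": "adobe:ns:meta/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "xmp": "http://ns.adobe.com/xap/1.0/",
}
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)


def q(prefix: str, local: str) -> str:
    """Return an ElementTree qualified name such as '{uri}title'."""
    return f"{{{NS[prefix]}}}{local}"


def build_xmp(title: str, author: str, created: str) -> str:
    """Build a minimal XMP packet holding a title, an author and a creation date."""
    meta = ET.Element(q("x", "xmpmeta"))
    rdf = ET.SubElement(meta, q("rdf", "RDF"))
    desc = ET.SubElement(rdf, q("rdf", "Description"), {q("rdf", "about"): ""})

    # dc:title is a language alternative (rdf:Alt); x-default is the fallback entry.
    title_alt = ET.SubElement(ET.SubElement(desc, q("dc", "title")), q("rdf", "Alt"))
    ET.SubElement(title_alt, q("rdf", "li"), {XML_LANG: "x-default"}).text = title

    # dc:creator is an ordered list of authors (rdf:Seq).
    creators = ET.SubElement(ET.SubElement(desc, q("dc", "creator")), q("rdf", "Seq"))
    ET.SubElement(creators, q("rdf", "li")).text = author

    # xmp:CreateDate is a simple ISO 8601 date-time string.
    ET.SubElement(desc, q("xmp", "CreateDate")).text = created

    body = ET.tostring(meta, encoding="unicode")
    # The xpacket wrapper marks where the packet starts and ends inside the file.
    return (
        '<?xpacket begin="\ufeff" id="W5M0MpCehiHzreSzNTczkc9d"?>\n'
        + body
        + '\n<?xpacket end="w"?>'
    )


print(build_xmp("Annual Report 2026", "Finance Team", "2026-01-15T09:30:00+01:00"))
```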
The simplest analogy: a catalog card inside the book
Think of a library catalog card: it records the title, author, subject, call number, and publication date so the librarian can find and classify the book without opening every page. In a traditional library the card lives in a separate drawer and can get lost or fall out of date.
XMP metadata is that catalog card, but bound inside the book's own cover so it cannot be misplaced. When you send a PDF, the metadata travels with it. Any tool that receives the file, whether a DMS, a validator, an archiving system, or a search engine crawler, can read the title, the author, the creation date, and the conformance identifiers without rendering a single page.
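Because the packet is wrapped in <?xpacket ...?> markers, a tool can often find it with a plain byte scan, without parsing the PDF's object structure. The function below is a minimal sketch of that idea, assuming the metadata stream is stored uncompressed; find_xmp_packets is an illustrative name, not part of any library.

```python
import re

# An XMP packet sits between <?xpacket begin=...?> and <?xpacket end=...?>.
# The wrapper exists so that tools can locate packets by scanning raw bytes.
PACKET_RE = re.compile(rb"<\?xpacket begin=.*?<\?xpacket end=.*?\?>", re.DOTALL)


def find_xmp_packets(pdf_bytes: bytes) -> list:
    """Return every uncompressed XMP packet in the raw file bytes, in file order."""
    return PACKET_RE.findall(pdf_bytes)
```

Call it on the raw file bytes, for example Path("document.pdf").read_bytes(). A compressed metadata stream will not be found this way, and a file with incremental updates or page-level metadata can contain several packets; the authoritative one is the stream referenced by the catalog's /Metadata entry, which only a real PDF parser can resolve.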
The /Info dictionary and XMP: old and new
PDFs have always had a place for metadata. The older mechanism is the /Info dictionary; the modern, extensible mechanism is XMP. Both can be present in the same file. For archival PDFs the XMP packet is mandatory, and any /Info entries that are also present must match it.
The /Info dictionary
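A flat key-value dictionary, referenced from the /Info entry of the PDF trailer. Defined in PDF 1.0, still supported everywhere.
A fixed set of standard keys and no namespaces; extra custom keys are allowed, but only as untyped strings with no schema. The PDF/A, ZUGFeRD, and PDF/UA conformance identifiers are defined only in XMP, so /Info cannot carry them.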
XMP metadata
A full XML packet stored as a metadata stream and referenced from the /Metadata entry of the document catalog. Standardised by ISO 16684, extensible by design.
Open, extensible namespaces (Dublin Core, PDF/A, ZUGFeRD, PDF/UA, and your own). Machine-readable by any XML tooling.
Modern PDFs use XMP. PDF/A requires the XMP packet and requires it to be consistent with /Info: if a title or author appears in both places, the values must match, and neither may contradict the other.
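PDF/A pairs each standard /Info key with a specific XMP property, and that pairing can be written down as a lookup table. The sketch below encodes it and splits an /Info dictionary into properties that have a standard XMP home and keys that do not. plan_xmp and the Department key are hypothetical; date values would also need converting from PDF date syntax to ISO 8601, which this sketch does not do.

```python
# Standard /Info keys and the XMP properties PDF/A pairs them with.
INFO_TO_XMP = {
    "Title": "dc:title",            # language alternative, x-default entry
    "Author": "dc:creator",         # ordered list (rdf:Seq)
    "Subject": "dc:description",    # language alternative, x-default entry
    "Keywords": "pdf:Keywords",
    "Creator": "xmp:CreatorTool",   # the authoring application
    "Producer": "pdf:Producer",     # the software that wrote the PDF
    "CreationDate": "xmp:CreateDate",
    "ModDate": "xmp:ModifyDate",
}


def plan_xmp(info: dict[str, str]) -> tuple[dict[str, str], list[str]]:
    """Split /Info entries into XMP assignments and keys with no standard mapping.

    Values are copied verbatim; dates would still need format conversion.
    """
    xmp: dict[str, str] = {}
    unmapped: list[str] = []
    for key, value in info.items():
        prop = INFO_TO_XMP.get(key)
        if prop is None:
            unmapped.append(key)
        else:
            xmp[prop] = value
    return xmp, unmapped


xmp, unmapped = plan_xmp(
    {"Title": "Annual Report 2026", "Author": "Finance Team", "Department": "Finance"}
)
print(xmp)       # {'dc:title': 'Annual Report 2026', 'dc:creator': 'Finance Team'}
print(unmapped)  # ['Department']
```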
Where XMP carries conformance identifiers
XMP is how a PDF announces which standard it follows. Validators check the XMP schemas first. If the identifier is missing or wrong, the file fails conformance regardless of how its content is structured.
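As an illustration of such a declaration, the sketch below emits the rdf:Description that carries pdfaid:part and pdfaid:conformance, plus an optional pdfuaid:part. It uses the published namespace URIs for those two identification schemas; the function name conformance_description is hypothetical, and in a real file this element sits inside the rdf:RDF block of the full XMP packet.

```python
import xml.etree.ElementTree as ET
from typing import Optional

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PDFAID = "http://www.aiim.org/pdfa/ns/id/"    # PDF/A identification schema
PDFUAID = "http://www.aiim.org/pdfua/ns/id/"  # PDF/UA identification schema
for prefix, uri in (("rdf", RDF), ("pdfaid", PDFAID), ("pdfuaid", PDFUAID)):
    ET.register_namespace(prefix, uri)


def conformance_description(
    pdfa_part: int, pdfa_level: str, pdfua_part: Optional[int] = None
) -> str:
    """Return an rdf:Description declaring PDF/A (and optionally PDF/UA) conformance."""
    desc = ET.Element(f"{{{RDF}}}Description", {f"{{{RDF}}}about": ""})
    # pdfaid:part is the part number; pdfaid:conformance is the level letter.
    ET.SubElement(desc, f"{{{PDFAID}}}part").text = str(pdfa_part)
    ET.SubElement(desc, f"{{{PDFAID}}}conformance").text = pdfa_level.upper()
    if pdfua_part is not None:
        ET.SubElement(desc, f"{{{PDFUAID}}}part").text = str(pdfua_part)
    return ET.tostring(desc, encoding="unicode")


# PDF/A-2b together with PDF/UA-1
print(conformance_description(2, "b", pdfua_part=1))
```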
PDF/A
pdfaid:part / pdfaid:conformance
The pdfaid namespace tells validators the part number and the conformance level letter, which together give names such as PDF/A-1b, PDF/A-2u, or PDF/A-3a. Without it, a PDF that follows every other rule still fails PDF/A. See What is PDF/A?
ZUGFeRD / Factur-X
fx:DocumentType / fx:ConformanceLevel
The Factur-X fx schema records the document type, version, and profile (MINIMUM through EXTENDED). Accounting systems read this to know how to parse the embedded XML. See What is ZUGFeRD / Factur-X?
PDF/UA
pdfuaid:part
The pdfuaid identifier declares accessibility conformance (PDF/UA-1 is written as pdfuaid:part 1). Validators use it to know which structure-tree and tagging requirements to check, and assistive technologies can read it as a claim that the file is fully tagged. See What is PDF/UA?
Document management
dc: / xmp: namespaces
DMS platforms index XMP fields (title, author, subject, keywords, dates) to power search, filtering, and classification without parsing page content.
Digital preservation
xmpMM: / xmpRights:
Archiving systems record provenance, rights, and modification history in XMP so a document's origin and lineage remain readable for decades, independent of the application that created it.
Custom schemas
any namespace
XMP is open by design. Any organisation can define a namespace for its own metadata, from workflow status to approval signatures, and embed it in a standard-compliant packet.
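Reading the identifiers back is the mirror image of writing them. The sketch below walks every rdf:Description in a packet and collects the pdfaid, pdfuaid, and fx values it recognises. read_identifiers is a hypothetical helper, and the fx namespace URI assumed here is the Factur-X 1.0 one (ZUGFeRD 1.0 files use a different namespace). It checks both the attribute form and the child-element form because RDF allows either for simple properties.

```python
import xml.etree.ElementTree as ET

RDF_DESC = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}Description"

# Display name -> (namespace URI, local name) for the identifiers discussed above.
IDENTIFIERS = {
    "pdfaid:part": ("http://www.aiim.org/pdfa/ns/id/", "part"),
    "pdfaid:conformance": ("http://www.aiim.org/pdfa/ns/id/", "conformance"),
    "pdfuaid:part": ("http://www.aiim.org/pdfua/ns/id/", "part"),
    "fx:ConformanceLevel": (
        "urn:factur-x:pdfa:CrossIndustryDocument:invoice:1p0#",
        "ConformanceLevel",
    ),
}


def read_identifiers(xmp: str) -> dict:
    """Collect known conformance identifiers from every rdf:Description."""
    found = {}
    for desc in ET.fromstring(xmp).iter(RDF_DESC):
        for name, (uri, local) in IDENTIFIERS.items():
            qname = f"{{{uri}}}{local}"
            if qname in desc.attrib:          # attribute form
                found[name] = desc.attrib[qname]
            else:                             # child-element form
                child = desc.find(qname)
                if child is not None and child.text:
                    found[name] = child.text.strip()
    return found


sample = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="" xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/"
      pdfaid:part="3" pdfaid:conformance="B"/>
  <rdf:Description rdf:about=""
      xmlns:fx="urn:factur-x:pdfa:CrossIndustryDocument:invoice:1p0#">
   <fx:ConformanceLevel>EN 16931</fx:ConformanceLevel>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>"""
print(read_identifiers(sample))
# {'pdfaid:part': '3', 'pdfaid:conformance': 'B', 'fx:ConformanceLevel': 'EN 16931'}
```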
Keeping /Info and XMP in sync
PDF/A section 6.7.3 is strict: every field that appears in both the /Info dictionary and the XMP must carry the same value. If the title in /Info says "Draft" and the XMP says "Final", the file fails conformance. This is a common authoring mistake when metadata is set in two separate places by two separate code paths.
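Dates are where /Info and XMP drift most easily, because the two use different syntaxes: /Info stores D:20260115093000+01'00' while XMP stores 2026-01-15T09:30:00+01:00. The sketch below parses both and compares them as points in time, which is one reasonable reading of "the same value"; the function names are hypothetical, only full XMP timestamps are handled, and a validator may apply a stricter rule.

```python
import re
from datetime import datetime, timedelta, timezone

# /Info date syntax: D:YYYYMMDDHHmmSSOHH'mm' where everything after the year is optional.
PDF_DATE = re.compile(
    r"D:(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?"
    r"(?:([+\-Z])(?:(\d{2})'?(?:(\d{2})'?)?)?)?$"
)


def parse_pdf_date(value: str) -> datetime:
    """Parse an /Info date such as D:20260115093000+01'00'."""
    m = PDF_DATE.match(value)
    if not m:
        raise ValueError(f"not a PDF date: {value!r}")
    year, month, day, hour, minute, second, sign, tz_h, tz_m = m.groups()
    tz = None  # no offset given: the result is a naive datetime
    if sign == "Z":
        tz = timezone.utc
    elif sign:
        offset = timedelta(hours=int(tz_h or 0), minutes=int(tz_m or 0))
        tz = timezone(offset if sign == "+" else -offset)
    return datetime(int(year), int(month or 1), int(day or 1),
                    int(hour or 0), int(minute or 0), int(second or 0), tzinfo=tz)


def parse_xmp_date(value: str) -> datetime:
    """Parse a full XMP (ISO 8601) date-time; 'Z' is normalised for Python < 3.11."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def dates_agree(info_value: str, xmp_value: str) -> bool:
    """True if both denote the same instant (a naive vs aware pair never matches)."""
    return parse_pdf_date(info_value) == parse_xmp_date(xmp_value)
```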
In rust-pdf, every call to set_info updates a single shared entry list. When to_bytes runs, rust-pdf generates both the /Info dictionary and the XMP packet from that one list, so they are guaranteed to agree. There is no separate "XMP step" that can drift from the /Info entries in the PDF trailer.
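The single-source idea fits in a few lines. The class below is a hypothetical sketch, not rust-pdf's actual code: it keeps one dictionary of entries and renders both the /Info dictionary and the XMP property values from it, so updating a field once changes both outputs. It only escapes backslashes and parentheses, so it assumes ASCII values.

```python
from dataclasses import dataclass, field

# Hypothetical mapping used by this sketch (a subset of the PDF/A pairing).
XMP_NAME = {
    "Title": "dc:title",
    "Author": "dc:creator",
    "Subject": "dc:description",
    "Keywords": "pdf:Keywords",
}


@dataclass
class MetadataEntries:
    """One store of entries; both metadata outputs are derived from it."""
    entries: dict[str, str] = field(default_factory=dict)

    def set_info(self, key: str, value: str) -> None:
        self.entries[key] = value  # the only place a value is stored

    def info_dictionary(self) -> str:
        """Render the /Info dictionary in PDF syntax (ASCII values only)."""
        parts = []
        for key, value in self.entries.items():
            escaped = value.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
            parts.append(f"/{key} ({escaped})")
        return "<< " + " ".join(parts) + " >>"

    def xmp_properties(self) -> dict[str, str]:
        """Derive the XMP property values from the same entries."""
        return {XMP_NAME[k]: v for k, v in self.entries.items() if k in XMP_NAME}


meta = MetadataEntries()
meta.set_info("Title", "Annual Report 2026")
meta.set_info("Author", "Finance Team")
print(meta.info_dictionary())  # << /Title (Annual Report 2026) /Author (Finance Team) >>
print(meta.xmp_properties())   # {'dc:title': 'Annual Report 2026', 'dc:creator': 'Finance Team'}
```

Because neither output keeps its own copy of a value, there is no second code path that could write a different title.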
If you supply a fully custom XMP packet via set_xmp, you are responsible for keeping its namespaces and values consistent with any /Info fields you also set. The veraPDF validator reports any mismatch.
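Before handing a custom packet to set_xmp, a quick pre-flight check can catch the most common drift. The sketch below compares the x-default dc:title and the dc:creator entry in the packet against the Title and Author you plan to pass to set_info. mismatches and dc_value are hypothetical helpers, and falling back to the first rdf:li when no x-default entry exists is a simplifying assumption.

```python
import xml.etree.ElementTree as ET
from typing import Optional

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
DC = "{http://purl.org/dc/elements/1.1/}"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def dc_value(root: ET.Element, name: str) -> Optional[str]:
    """Return the x-default (or else first) rdf:li text of a dc: property."""
    for prop in root.iter(DC + name):
        items = [li for li in prop.iter(RDF + "li") if li.text]
        for li in items:
            if li.get(XML_LANG) == "x-default":
                return li.text.strip()
        if items:
            return items[0].text.strip()
    return None


def mismatches(xmp_bytes: bytes, info: dict) -> list:
    """List /Info values that disagree with dc:title / dc:creator in the packet."""
    root = ET.fromstring(xmp_bytes)
    problems = []
    for info_key, dc_name in (("Title", "title"), ("Author", "creator")):
        if info_key not in info:
            continue  # field not set in /Info, nothing to compare
        xmp_value = dc_value(root, dc_name)
        if xmp_value != info[info_key]:
            problems.append(f"{info_key}: /Info={info[info_key]!r}, XMP={xmp_value!r}")
    return problems
```

Pass it the bytes of metadata.xmp together with the same values you give set_info; an empty list means the title and author agree. veraPDF remains the authoritative check, since it also compares dates and the other mapped keys.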
How to set XMP metadata with rust-pdf
Set standard fields, or embed a full custom XMP packet.
# pip install rustpdf
from pathlib import Path
import rustpdf
ed = rustpdf.EditableDoc.load(Path("document.pdf").read_bytes())  # read_bytes closes the file
ed.set_info("Title", "Annual Report 2026")
ed.set_info("Author", "Finance Team")
ed.set_xmp(Path("metadata.xmp").read_bytes())  # full custom XMP packet
ed.save("document_meta.pdf")
// dotnet add package RustPdf
using System.IO;
using RustPdf;
using var ed = EditableDoc.Load(File.ReadAllBytes("document.pdf"));
ed.SetInfo("Title", "Annual Report 2026");
ed.SetInfo("Author", "Finance Team");
ed.SetXmp(File.ReadAllBytes("metadata.xmp"));
ed.Save("document_meta.pdf");
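// go get github.com/rustpdf/rustpdf-go@latest
// mustRead is assumed to be a small helper that returns a file's bytes (e.g. via os.ReadFile)
ed, err := rustpdf.Load(mustRead("document.pdf"))
if err != nil {
	log.Fatal(err)
}
defer ed.Close()
ed.SetInfo("Title", "Annual Report 2026")
ed.SetInfo("Author", "Finance Team")
ed.SetXMP(mustRead("metadata.xmp"))
ed.Save("document_meta.pdf")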
// npm install rustpdf
const { EditableDoc } = require("rustpdf");
const fs = require("fs");
const ed = EditableDoc.load(fs.readFileSync("document.pdf"));
ed.setInfo("Title", "Annual Report 2026");
ed.setInfo("Author", "Finance Team");
ed.setXmp(fs.readFileSync("metadata.xmp"));
ed.save("document_meta.pdf");
When you also enable PDF/A via pdfa(), rust-pdf automatically generates both the /Info dictionary and the XMP conformance block from the same entry list, so they are always in sync. Full details are in the documentation.
XMP metadata FAQ
What is XMP metadata?
XMP (Extensible Metadata Platform) is an XML metadata packet embedded directly inside a PDF file. It describes the document in a machine-readable way: title, author, creation and modification dates, subject, keywords, and arbitrary custom schemas. Because it is embedded in the file itself, the metadata travels with the document and can be read by any conforming PDF reader or document management system without opening the page content.
How is XMP different from the Info dictionary?
The /Info dictionary is an older PDF structure that stores basic fields (Title, Author, CreationDate, and so on) as key-value pairs in a dictionary referenced from the PDF trailer. XMP is a modern XML-based alternative that is extensible, richer, and standardised. Modern and archival PDFs use XMP. PDF/A requires XMP, and any field that also appears in /Info must agree with it. rust-pdf builds both from a single source, so they are always in sync.
Why does PDF/A require XMP?
PDF/A is the archival PDF standard and it requires XMP so that metadata is stored in a standardised, open, machine-readable format that remains accessible in the future. The pdfaid conformance identifier in the XMP tells any reader or validator which part and conformance level the file adheres to. PDF/A section 6.7.3 also requires the XMP and /Info dictionary to be kept in sync so there is no ambiguity between the two metadata sources.
How do XMP and standards like ZUGFeRD / PDF/UA relate?
XMP is the declaration layer that the PDF standards covered here use to announce conformance. PDF/A writes the pdfaid identifier, ZUGFeRD and Factur-X add the fx schema with document type, version, and profile, and PDF/UA adds the pdfuaid identifier. Validators such as veraPDF or Acrobat's preflight, and ingesting systems such as a DMS, read the XMP first to know which rules to apply. Without the correct XMP identifiers, a file fails conformance even if all the structural rules are met.
How do I set PDF metadata in code?
With rust-pdf, call set_info on an EditableDoc to set individual fields like Title or Author, or call set_xmp with a full custom XMP packet in bytes to embed your own XML. Both methods are available across all nine supported languages: Python, C#/.NET, Go, Node.js, Java, PHP, Ruby, Delphi, and Swift.
Set and embed XMP metadata in your language
One Rust core, nine language bindings, and a single source of truth for your metadata. Prototype for free, license the corporate features when you ship.