Search Text in a PDF and Get Its Coordinates
Last updated: 2026-06-29
Extraction tells you what a PDF says. Positional search tells you where it says it. Give rust-pdf a query and it returns one bounding box per match (page, x, y, width and height in PDF points), so you can drop a signature next to an anchor word, highlight a clause, or place a redaction exactly over a phrase. One call, no OCR.
The missing link between extraction and stamping
Plenty of tools can pull the text out of a PDF, and plenty can draw new content onto a page. The gap is connecting the two: you read a contract, you know it contains the word “Signature”, but to put a signature box there you need its coordinates, and a flat string has thrown those away.
Positional search keeps them. As it walks the page it tracks the current text matrix, the graphics matrix (CTM) and each glyph's advance width, so every character it decodes carries a position in PDF user space. When your query matches a run of characters, it unions their boxes into a single rectangle and hands it back. Those are the same coordinates the drawing operators use, so the rectangle is immediately usable for an overlay, a visible signature, a highlight or a redaction.
How positional search works
Three steps, all from the bytes already in the file: no font files, no rendering, no OCR.
Track the matrices
The content stream is replayed operator by operator. Operators such as cm, Tm, Td and T* update the running graphics and text matrices, and text-showing operators such as TJ advance the text position, so the origin of the next glyph is always known.
Advance per glyph
Each glyph's width comes from the font's /Widths array (simple fonts) or its /W and /DW entries (Type0/CID fonts); no font program is parsed. Width plus matrix gives every glyph an axis-aligned box, and each character code is mapped to Unicode via the font's ToUnicode CMap.
Match and union
Your query is matched against the page's glyph stream (case-insensitive by default). For each occurrence, the boxes of the covered glyphs are unioned into one rectangle, returned as { page, text, x, y, width, height } in points.
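The three steps above can be sketched in plain Python. This is a simplified illustration, not rust-pdf's implementation: it assumes horizontal text, a text matrix that only translates (no rotation or scaling), one character per glyph, and glyph widths in 1/1000 text-space units. The names Glyph, layout_run and find_boxes are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    char: str
    x: float       # left edge, in points
    y: float       # baseline, in points
    width: float   # advance, in points
    height: float  # box height, in points (here simply the font size)


def layout_run(text, origin_x, origin_y, widths, font_size):
    """Steps 1 and 2: walk a run, advancing the pen by each glyph's width.

    `widths` maps a character to its advance in 1/1000 text-space units,
    the way a font's /Widths array would. Assumption: no rotation or scaling.
    """
    glyphs = []
    pen_x = origin_x
    for ch in text:
        advance = widths[ch] / 1000.0 * font_size
        glyphs.append(Glyph(ch, pen_x, origin_y, advance, font_size))
        pen_x += advance
    return glyphs


def find_boxes(glyphs, query, case_sensitive=False):
    """Step 3: match the query and union the boxes of the covered glyphs."""
    if not query:
        return []
    text = "".join(g.char for g in glyphs)
    if not case_sensitive:
        text, query = text.lower(), query.lower()
    boxes = []
    start = text.find(query)
    while start != -1:
        run = glyphs[start:start + len(query)]
        x0 = min(g.x for g in run)
        y0 = min(g.y for g in run)
        x1 = max(g.x + g.width for g in run)
        y1 = max(g.y + g.height for g in run)
        boxes.append((x0, y0, x1 - x0, y1 - y0))  # (x, y, width, height)
        start = text.find(query, start + 1)
    return boxes


# Example: a 12 pt run starting at (72, 700) where every glyph is 500 units wide.
widths = {c: 500 for c in "Sign here"}
glyphs = layout_run("Sign here", 72, 700, widths, 12)
print(find_boxes(glyphs, "here"))
```

A real content stream also involves the CTM, character and word spacing, and kerning adjustments inside TJ arrays; the sketch leaves those out to keep the union step visible.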
What positional search unlocks
Once you have a box, the page becomes addressable.
Anchor a signature
Search for “Signature:” or a party name, then place a visible signature widget or appearance exactly beside it, even when the layout shifts between documents.
Highlight & annotate
Turn a search term into a highlight rectangle or a link annotation on every page it appears, the way a viewer's find-and-highlight works, but headless and in bulk.
Locate table cells
Find a column header or a label and use its coordinates to slice the region beneath it, a lightweight way to pull values out of fixed-layout reports and forms.
Targeted redaction
Search for sensitive phrases such as account numbers, names or regulated terms, and feed each box into a redaction so the covered content is removed, not just hidden behind a rectangle.
Verify placement
In a generation pipeline, assert that a heading or a total landed where it should by searching for it and checking the returned coordinates in a test.
Stamp watermarks by anchor
Place a “DRAFT” or approval stamp relative to a found word rather than the page centre, so it tracks the content instead of the geometry.
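Most of these uses come down to deriving a second rectangle from a hit. The sketch below shows two such derivations in plain Python: a box placed to the right of an anchor (for a signature or stamp) and a strip beneath a header (for a table column). Box, beside and below are hypothetical helpers, not part of rust-pdf; they only assume a hit exposes x, y, width and height in points with y pointing up.

```python
from dataclasses import dataclass


@dataclass
class Box:
    x: float
    y: float       # lower edge; PDF user space has y pointing up
    width: float
    height: float


def beside(hit, gap=6.0, width=180.0, height=40.0):
    """Rectangle to the right of a hit, vertically centred on it."""
    centre_y = hit.y + hit.height / 2
    return Box(hit.x + hit.width + gap, centre_y - height / 2, width, height)


def below(hit, depth, page_bottom=0.0):
    """Strip as wide as the hit, directly beneath it, clipped at the page bottom."""
    top = hit.y
    bottom = max(page_bottom, top - depth)
    return Box(hit.x, bottom, hit.width, top - bottom)


# Example: an anchor word found at (100, 500), 60 pt wide and 12 pt tall.
anchor = Box(100, 500, 60, 12)
print(beside(anchor))       # candidate signature rectangle
print(below(anchor, 200))   # region under a column header
```

Because the helpers work from the returned box rather than fixed page positions, the derived rectangle moves with the anchor when the layout shifts between documents.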
Search a PDF in one call
Returns a list of hits, each with a page index and a bounding box in points.
# pip install rustpdf
import rustpdf
with open("contract.pdf", "rb") as f:
    pdf = f.read()
for hit in rustpdf.find_text(pdf, "Signature"):
    print(hit.page, hit.x, hit.y, hit.width, hit.height)
// dotnet add package RustPdf
using RustPdf;
byte[] pdf = File.ReadAllBytes("contract.pdf");
foreach (var hit in Pdf.FindText(pdf, "Signature"))
    Console.WriteLine($"{hit.Page} {hit.X} {hit.Y} {hit.Width} {hit.Height}");
// go get github.com/rustpdf/rustpdf-go@latest
data, _ := os.ReadFile("contract.pdf")
hits, _ := rustpdf.FindText(data, "Signature", false)
for _, h := range hits {
	fmt.Println(h.Page, h.X, h.Y, h.Width, h.Height)
}
// npm install rustpdf
const { findText } = require("rustpdf");
const fs = require("fs");
for (const h of findText(fs.readFileSync("contract.pdf"), "Signature")) {
  console.log(h.page, h.x, h.y, h.width, h.height);
}
The same function is available in every language binding. See the documentation for the full reference.
PDF text search FAQ
How do I find the position of text in a PDF?
Pass the PDF bytes and a query to rust-pdf's find_text. It returns one hit per occurrence, each with the page index and a bounding box (x, y, width, height) in PDF user-space points, origin at the page's lower-left. Those are the coordinates content is drawn in, so a box can be fed straight into an overlay or a visible signature.
What coordinate system are the boxes in?
PDF user space: points (1/72 inch), origin at the lower-left, y pointing up, the same space the drawing operators use. Page /Rotate is not applied; the box is where the text lives in the page's own coordinate system, which is what you want when drawing back onto it.
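If a box needs to be drawn on a rendered image rather than back onto the PDF, the y axis has to be flipped, because raster coordinates usually start at the top-left. A minimal sketch, assuming an unrotated page whose lower-left corner sits at (0, 0); to_top_left_pixels is a hypothetical helper, and the page height in points must come from the caller:

```python
def to_top_left_pixels(x, y, width, height, page_height, dpi=72.0):
    """Map a PDF user-space box to (left, top, width, height) in pixels.

    Assumptions: the page is unrotated and its lower-left corner is (0, 0);
    `page_height` is given in points.
    """
    scale = dpi / 72.0                             # 72 points per inch
    left = x * scale
    top = (page_height - (y + height)) * scale     # flip: measure from the top edge
    return (left, top, width * scale, height * scale)


# Example: a box on a US Letter page (792 pt tall), rendered at 144 dpi.
print(to_top_left_pixels(72, 700, 100, 12, page_height=792, dpi=144))
```

Pages with a non-zero /Rotate or a MediaBox that does not start at the origin need an extra transform before this step.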
Is search the same as extraction?
They share machinery but answer different questions. Extraction flattens a page to a string. Positional search additionally tracks the text and graphics matrices so every glyph carries a position, then unions the boxes of the glyphs each match covers. Use extraction to read or index; use search to know where.
Is matching case-sensitive?
Matching is case-insensitive by default; an option switches it to exact matching. Each occurrence is returned as a separate hit, so a word that appears five times on a page yields five boxes.
Does it need OCR?
No. It reads the real encoded text and computes positions from the font metrics and text matrices in the file. A scanned PDF with no text layer has nothing to search, so that would need OCR, which rust-pdf does not provide.
Search PDFs in your language
One Rust core, ten language bindings, one function call. Prototype for free, license the corporate features when you ship.