

Search Text in a PDF and Get Its Coordinates


Last updated: 2026-06-29

Extraction tells you what a PDF says. Positional search tells you where it says it. Give rust-pdf a query and it returns one bounding box per match (page, x, y, width and height in PDF points), so you can drop a signature next to an anchor word, highlight a clause, or place a redaction exactly over a phrase. One call, no OCR.


The missing link between extraction and stamping

Plenty of tools can pull the text out of a PDF, and plenty can draw new content onto a page. The gap is connecting the two: you read a contract, you know it contains the word “Signature”, but to put a signature box there you need its coordinates, and a flat string has thrown those away.

Positional search keeps them. As it walks the page it tracks the current text matrix, the graphics matrix (CTM) and each glyph's advance width, so every character it decodes carries a position in PDF user space. When your query matches a run of characters, it unions their boxes into a single rectangle and hands it back. Those are the same coordinates the drawing operators use, so the rectangle is immediately usable for an overlay, a visible signature, a highlight or a redaction.

How positional search works

Three steps, all from the bytes already in the file: no font files, no rendering, no OCR.

Track the matrices

The content stream is replayed operator by operator. Each cm, Tm, Td, T* and TJ operator updates the running graphics or text matrix, so the origin of the next glyph is always known.
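The bookkeeping in this step can be sketched in a few lines of Python. This is a hypothetical illustration of the PDF matrix rules, not rust-pdf's internals: `TextState` and its method names are invented for the example, and it ignores q/Q save and restore, Tf, TD, text rise and glyph advances. A PDF matrix `[a b c d e f]` acts on row vectors, so `cm` pre-multiplies the CTM and `Td` pre-multiplies the text line matrix.

```python
# Hypothetical sketch of text/graphics matrix tracking (not rust-pdf's code).
# A PDF matrix [a b c d e f] stands for [[a, b, 0], [c, d, 0], [e, f, 1]],
# applied to row vectors [x y 1].

IDENTITY = (1, 0, 0, 1, 0, 0)


def mul(m1, m2):
    """Return m1 x m2 in PDF's row-vector convention (m1 takes effect first)."""
    a1, b1, c1, d1, e1, f1 = m1
    a2, b2, c2, d2, e2, f2 = m2
    return (
        a1 * a2 + b1 * c2,
        a1 * b2 + b1 * d2,
        c1 * a2 + d1 * c2,
        c1 * b2 + d1 * d2,
        e1 * a2 + f1 * c2 + e2,
        e1 * b2 + f1 * d2 + f2,
    )


class TextState:
    """Minimal running state; ignores q/Q, Tf, text rise and glyph advances."""

    def __init__(self):
        self.ctm = IDENTITY   # current transformation matrix (graphics state)
        self.tm = IDENTITY    # text matrix
        self.tlm = IDENTITY   # text line matrix
        self.leading = 0      # TL, used by T*

    def begin_text(self):            # BT: reset both text matrices
        self.tm = self.tlm = IDENTITY

    def concat(self, m):             # cm: CTM' = m x CTM
        self.ctm = mul(m, self.ctm)

    def set_text_matrix(self, m):    # Tm: Tm = Tlm = m
        self.tm = self.tlm = m

    def move_text(self, tx, ty):     # Td: Tm = Tlm = [1 0 0 1 tx ty] x Tlm
        self.tm = self.tlm = mul((1, 0, 0, 1, tx, ty), self.tlm)

    def next_line(self):             # T*: same as "0 -TL Td"
        self.move_text(0, -self.leading)

    def origin(self):
        """User-space point where the next glyph starts (text-space (0, 0))."""
        m = mul(self.tm, self.ctm)
        return m[4], m[5]


s = TextState()
s.concat((2, 0, 0, 2, 0, 0))               # page content scaled by 2
s.begin_text()
s.set_text_matrix((1, 0, 0, 1, 72, 700))
print(s.origin())                           # (144, 1400)
s.leading = 14
s.next_line()
print(s.origin())                           # (144, 1372)
```

The order of multiplication is the point of the exercise: swapping the operands of `mul` in `concat` or `move_text` gives wrong positions as soon as a page is scaled or rotated.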

Advance per glyph

Each glyph's width comes from the font's /Widths array (simple fonts) or from the /W and /DW entries (Type0/CID fonts); no font program is parsed. Width plus matrix gives every glyph an axis-aligned box, and each character code is mapped to Unicode through the font's ToUnicode CMap.
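A sketch of the width lookup and the advance, following the PDF rules for /FirstChar and /Widths, the CIDFont /W array (both its `c [w1 w2 ...]` and `c_first c_last w` forms) and /DW (default 1000). The function names are hypothetical and say nothing about how rust-pdf structures this internally. Widths are in glyph units of 1/1000 text-space unit, and the advance uses the specification's formula tx = ((w0 / 1000) × Tfs + Tc + Tw) × Th.

```python
# Hypothetical helpers for glyph widths and advances (not rust-pdf's API).
# Widths are in glyph-space units: 1/1000 of a text-space unit.


def simple_width(code, first_char, widths, missing_width=0):
    """Width of a simple-font code from /FirstChar and /Widths.

    Codes outside the array fall back to the font descriptor's /MissingWidth
    (0 if absent).
    """
    i = code - first_char
    return widths[i] if 0 <= i < len(widths) else missing_width


def cid_width(cid, w_array, dw=1000):
    """Width of a CID from a CIDFont's /W array, else /DW (default 1000).

    /W mixes two forms: `c [w1 w2 ...]` (consecutive CIDs starting at c) and
    `c_first c_last w` (one width for a whole range).
    """
    i = 0
    while i < len(w_array):
        first, nxt = w_array[i], w_array[i + 1]
        if isinstance(nxt, list):            # c [w1 w2 ...]
            if first <= cid < first + len(nxt):
                return nxt[cid - first]
            i += 2
        else:                                # c_first c_last w
            if first <= cid <= nxt:
                return w_array[i + 2]
            i += 3
    return dw


def advance(w0, font_size, char_spacing=0.0, word_spacing=0.0, h_scale=1.0):
    """Horizontal text-space advance: tx = ((w0/1000) * Tfs + Tc + Tw) * Th.

    Pass word_spacing only for the single-byte code 32; h_scale is Tz / 100.
    """
    return ((w0 / 1000) * font_size + char_spacing + word_spacing) * h_scale


w = [1, [500, 600, 700], 10, 20, 250]
print(cid_width(2, w), cid_width(15, w), cid_width(5, w))  # 600 250 1000
print(advance(500, 12))                                   # 6.0
```

The advance is in text space; it moves the text matrix, and the matrices from the previous step turn it into user-space coordinates.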

Match and union

Your query is matched against the page's glyph stream (case-insensitive by default). For each occurrence, the boxes of the covered glyphs are unioned into one rectangle, returned as { page, text, x, y, width, height } in points.
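A minimal sketch of this step, assuming each glyph already carries its Unicode text and an axis-aligned box from the previous step. `Glyph`, `Hit` and `find_hits` are hypothetical names, and the sketch reports non-overlapping occurrences; whether rust-pdf folds case or treats overlapping matches the same way is not stated on this page. Case is folded per glyph, so every folded character (including expansions such as "ß" to "ss") still points back to the glyph it came from.

```python
# Hypothetical sketch of "match and union" (not rust-pdf's implementation).
from dataclasses import dataclass


@dataclass
class Glyph:
    char: str      # Unicode text for this glyph, via ToUnicode
    x0: float      # axis-aligned box in user space (points)
    y0: float
    x1: float
    y1: float


@dataclass
class Hit:
    page: int
    text: str
    x: float
    y: float
    width: float
    height: float


def find_hits(page, glyphs, query, case_sensitive=False):
    """Return one Hit per non-overlapping occurrence of `query` in `glyphs`."""
    fold = (lambda s: s) if case_sensitive else str.casefold
    folded, owner = [], []   # folded characters and the glyph index each came from
    for i, g in enumerate(glyphs):
        for ch in fold(g.char):
            folded.append(ch)
            owner.append(i)
    haystack, needle = "".join(folded), fold(query)
    hits = []
    if not needle:
        return hits
    start = haystack.find(needle)
    while start != -1:
        # Glyphs covered by this occurrence, first through last matched character.
        covered = glyphs[owner[start]:owner[start + len(needle) - 1] + 1]
        x0 = min(g.x0 for g in covered)
        y0 = min(g.y0 for g in covered)
        x1 = max(g.x1 for g in covered)
        y1 = max(g.y1 for g in covered)
        text = "".join(g.char for g in covered)
        hits.append(Hit(page, text, x0, y0, x1 - x0, y1 - y0))
        start = haystack.find(needle, start + len(needle))
    return hits


# Fifteen 5 x 10 pt glyphs in a row spelling "Sign here, sign".
glyphs = [Glyph(c, 10 + 5 * i, 100, 15 + 5 * i, 110)
          for i, c in enumerate("Sign here, sign")]
for h in find_hits(0, glyphs, "SIGN"):
    print(h.text, h.x, h.y, h.width, h.height)  # Sign 10 100 20 10, then sign 65 100 20 10
```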

What positional search unlocks

Once you have a box, the page becomes addressable.

Anchor a signature

Search for “Signature:” or a party name, then place a visible signature widget or appearance exactly beside it, even when the layout shifts between documents.

Highlight & annotate

Turn a search term into a highlight rectangle or a link annotation on every page it appears, the way a viewer's find-and-highlight works, but headless and in bulk.

Locate table cells

Find a column header or a label and use its coordinates to slice the region beneath it, a lightweight way to pull values out of fixed-layout reports and forms.

Targeted redaction

Search for sensitive phrases such as account numbers, names or regulated terms, and feed each box into a redaction so the covered content is removed, not just hidden behind a rectangle.

Verify placement

In a generation pipeline, assert that a heading or a total landed where it should by searching for it and checking the returned coordinates in a test.

Stamp watermarks by anchor

Place a “DRAFT” or approval stamp relative to a found word rather than the page centre, so it tracks the content instead of the geometry.
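Most of these uses come down to a little rectangle arithmetic on a returned hit. The helpers below are a hypothetical sketch, not part of rust-pdf: `Box` simply mirrors the x, y, width and height fields a hit carries, and the gap and sizes are arbitrary. Because y grows upward in PDF user space, the region "below" an anchor has smaller y values.

```python
# Hypothetical placement helpers for boxes in PDF user space (points, y up).
from dataclasses import dataclass


@dataclass
class Box:
    x: float       # lower-left corner
    y: float
    width: float
    height: float


def beside(anchor, width, height, gap=6.0):
    """A width x height box to the right of `anchor`, sharing its bottom edge."""
    return Box(anchor.x + anchor.width + gap, anchor.y, width, height)


def below(anchor, depth, floor=0.0):
    """The column under `anchor` (e.g. a table header), `depth` points tall,
    clipped so it never drops under `floor`."""
    bottom = max(anchor.y - depth, floor)
    return Box(anchor.x, bottom, anchor.width, anchor.y - bottom)


def inside(inner, outer):
    """True if `inner` lies entirely within `outer`, e.g. to check placement in a test."""
    return (outer.x <= inner.x and outer.y <= inner.y
            and inner.x + inner.width <= outer.x + outer.width
            and inner.y + inner.height <= outer.y + outer.height)


label = Box(72, 120, 60, 12)            # e.g. the hit for "Signature:"
print(beside(label, 150, 40))           # Box(x=138.0, y=120, width=150, height=40)
```

`beside` covers signature anchoring and stamps, `below` covers slicing a column under a header, and `inside` is the kind of assertion a placement test would make.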

Search a PDF in one call

Returns a list of hits, each with a page index and a bounding box in points.

# pip install rustpdf
import rustpdf

with open("contract.pdf", "rb") as f:
    pdf = f.read()
for hit in rustpdf.find_text(pdf, "Signature"):
    print(hit.page, hit.x, hit.y, hit.width, hit.height)

// dotnet add package RustPdf
using RustPdf;

byte[] pdf = File.ReadAllBytes("contract.pdf");
foreach (var hit in Pdf.FindText(pdf, "Signature"))
    Console.WriteLine($"{hit.Page} {hit.X} {hit.Y} {hit.Width} {hit.Height}");

// go get github.com/rustpdf/rustpdf-go@latest
data, err := os.ReadFile("contract.pdf")
if err != nil {
    log.Fatal(err)
}
hits, err := rustpdf.FindText(data, "Signature", false)
if err != nil {
    log.Fatal(err)
}
for _, h := range hits {
    fmt.Println(h.Page, h.X, h.Y, h.Width, h.Height)
}

// npm install rustpdf
const { findText } = require("rustpdf");
const fs = require("fs");

for (const h of findText(fs.readFileSync("contract.pdf"), "Signature")) {
  console.log(h.page, h.x, h.y, h.width, h.height);
}

The same function is available in every language binding. Full reference in the documentation.

PDF text search FAQ

How do I find the position of text in a PDF?

Pass the PDF bytes and a query to rust-pdf's find_text. It returns one hit per occurrence, each with the page index and a bounding box (x, y, width, height) in PDF user-space points, origin at the page's lower-left. Those are the coordinates content is drawn in, so a box can be fed straight into an overlay or a visible signature.

What coordinate system are the boxes in?

PDF user space: points (1/72 inch), y pointing up, the same space the drawing operators use. The origin is user-space (0, 0), which is the page's lower-left corner whenever the MediaBox starts at [0 0], as it usually does. Page /Rotate is not applied; the box is where the text lives in the page's own coordinate system, which is what you want when drawing back onto it.
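Tools that work in image coordinates (a rendered PNG, a browser canvas) put the origin at the top-left with y pointing down, so a box has to be flipped and scaled before use there. A minimal sketch for an unrotated page, assuming you know the page box the renderer used (typically the CropBox, which defaults to the MediaBox) as (llx, lly, urx, ury) and the rendering resolution; `to_image_pixels` is a hypothetical helper, not a rust-pdf function.

```python
def to_image_pixels(x, y, width, height, page_box, dpi=72):
    """Map a user-space box (points, y up) to (left, top, width, height) in pixels.

    Assumes an unrotated page rendered so that `page_box` = (llx, lly, urx, ury)
    fills the image; one point is 1/72 inch, so the scale is dpi / 72.
    """
    llx, _, _, ury = page_box
    scale = dpi / 72
    left = (x - llx) * scale
    top = (ury - (y + height)) * scale     # distance from the page's top edge
    return left, top, width * scale, height * scale


# A 100 x 12 pt hit on a US Letter page (612 x 792 pt) rendered at 144 dpi.
print(to_image_pixels(72, 700, 100, 12, (0, 0, 612, 792), dpi=144))
# (144.0, 160.0, 200.0, 24.0)
```

For drawing back onto the PDF itself no conversion is needed; this only matters when the box leaves PDF space.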

Is search the same as extraction?

They share machinery but answer different questions. Extraction flattens a page to a string. Positional search additionally tracks the text and graphics matrices so every glyph carries a position, then unions the boxes of the glyphs each match covers. Use extraction to read or index; use search to know where.

Is matching case-sensitive?

Matching is case-insensitive by default; an option switches it to case-sensitive matching. Each occurrence produces a separate hit, so a word that appears five times on a page yields five boxes.

Does it need OCR?

No. It reads the real encoded text and computes positions from the font metrics and text matrices in the file. A scanned PDF with no text layer has nothing to search, so that would need OCR, which rust-pdf does not provide.

Search PDFs in your language

One Rust core, ten language bindings, one function call. Prototype for free, license the corporate features when you ship.


