PDF Highlight Extractor — Export Highlights to Text

Pull every highlighted passage out of a PDF and export the results as plain text or Markdown. Highlights are grouped by page and labelled with the page number for easy reference.

Supports PDF, up to 100MB

Runs entirely in your browser

About PDF Highlight Extractor

If you ever read research papers, study textbooks, or work through reports as PDFs, you have probably built up a backlog of files with dozens of highlighted passages — and no good way to pull those highlights out into a notes app without typing them by hand. The PDF Highlight Extractor solves exactly that. Drop a PDF in and the tool walks every page's annotation list, finds every highlight, looks up the underlying text, and exports a clean transcript grouped by page. Output is plain .txt or Markdown.

Detection works on the standard PDF highlight annotation type (subtype = "Highlight"), which is what commercial PDF editors, Apple Preview, standard PDF readers, reference managers, Calibre, GoodNotes and almost every other reading app produce. Optional toggles also include Underline, Squiggly and Strikeout annotations if you use them as additional colour codes in your reading workflow. The text-resolution step uses Mozilla's PDF.js renderer to map highlight rectangles to the actual glyph runs underneath, the same approach used by reference managers. Highlights captured in one app can therefore be exported with another, and the result is the literal text the highlighter covered, not an OCR re-read.
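The subtype filter described above can be sketched in a few lines of TypeScript. This is an illustration only, not the tool's actual source: the `AnnotationLike` shape, the option names and the `selectMarkupAnnotations` function are hypothetical. The one detail taken from the PDF specification is that the strikeout subtype is spelled "StrikeOut".

```typescript
// Minimal shape of an annotation record; only `subtype` matters for filtering.
interface AnnotationLike {
  subtype: string;
}

// Hypothetical names for the optional toggles.
interface MarkupOptions {
  includeUnderline?: boolean;
  includeSquiggly?: boolean;
  includeStrikeout?: boolean;
}

// Keep "Highlight" annotations always, and the other text-markup subtypes
// only when the matching toggle is switched on.
function selectMarkupAnnotations<T extends AnnotationLike>(
  annotations: readonly T[],
  options: MarkupOptions = {},
): T[] {
  const wanted: string[] = ["Highlight"];
  if (options.includeUnderline) wanted.push("Underline");
  if (options.includeSquiggly) wanted.push("Squiggly");
  // The PDF specification spells this subtype with a capital O.
  if (options.includeStrikeout) wanted.push("StrikeOut");
  return annotations.filter((a) => wanted.indexOf(a.subtype) !== -1);
}
```

Freeform drawings and other annotation types simply fall through the filter, which is why they never reach the text-resolution step.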

The output is shaped for downstream notes systems. Plain .txt format groups highlights under "--- Page N ---" headers, one highlight per line, with optional reading-app comments inline. Markdown format produces "## Page N" headings followed by blockquoted highlights, suitable for direct paste into Notion, Obsidian, Bear, Logseq, or any other knowledge-base tool that speaks Markdown. Both formats preserve the original page numbers so you can always trace any highlight back to its source. Annotations that have a comment attached (the little popup note a commercial PDF editor lets you type) carry that comment through to the export.
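As a rough sketch of the two layouts described above, the TypeScript below groups highlights by page and renders them with "--- Page N ---" headers or "## Page N" headings and blockquotes. The `Highlight` type and the function names are assumptions made for the example, the exact blank-line spacing is a guess, and comments are left out for brevity.

```typescript
interface Highlight {
  page: number; // 1-based page number
  text: string;
}

interface PageGroup {
  page: number;
  texts: string[];
}

// Collapse line breaks inside a highlight so each one stays on a single line.
function oneLine(text: string): string {
  return text.replace(/\s+/g, " ").trim();
}

// Group highlights by page, keeping their original order within each page.
function groupByPage(highlights: readonly Highlight[]): PageGroup[] {
  const groups: PageGroup[] = [];
  for (const h of highlights) {
    let group = groups.filter((g) => g.page === h.page)[0];
    if (!group) {
      group = { page: h.page, texts: [] };
      groups.push(group);
    }
    group.texts.push(oneLine(h.text));
  }
  return groups.sort((a, b) => a.page - b.page);
}

// Plain text: a page header followed by one highlight per line.
function toPlainText(highlights: readonly Highlight[]): string {
  return groupByPage(highlights)
    .map((g) => [`--- Page ${g.page} ---`].concat(g.texts).join("\n"))
    .join("\n\n");
}

// Markdown: a level-2 heading per page, each highlight as its own blockquote.
function toMarkdown(highlights: readonly Highlight[]): string {
  return groupByPage(highlights)
    .map((g) => [`## Page ${g.page}`].concat(g.texts.map((t) => `> ${t}`)).join("\n\n"))
    .join("\n\n");
}
```

Separating Markdown blockquotes with blank lines keeps each highlight as a distinct quote when pasted into a notes app, rather than merging them into one block.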

What this tool cannot do is also worth knowing. It needs a real text layer in the PDF — a scanned book without OCR has no text to map highlights onto. Run OCR PDF first to add a text layer, then re-export. It also cannot recover highlights that were "drawn" with a freeform shape annotation rather than a real highlight (some apps do this on iPad to support pencil-style highlighting); those show up as page-level shape annotations, not as highlight annotations with QuadPoints. The tool reports a clear error if it finds zero highlights so you do not silently end up with an empty file.
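The failure cases above could be told apart with a small pre-flight check along these lines. It is a sketch under stated assumptions: the `PageSummary` shape and all function names are hypothetical, the error messages are invented for the example, and it assumes that pencil-style "drawn" highlights are stored as Ink annotations.

```typescript
// Hypothetical per-page summary collected while scanning the document.
interface PageSummary {
  page: number;
  textItemCount: number; // how many text-layer items the page has
  annotationSubtypes: string[]; // subtype of every annotation on the page
}

type Diagnosis = "ok" | "no-highlights" | "freeform-only" | "no-text-layer";

function countSubtype(pages: readonly PageSummary[], subtype: string): number {
  let total = 0;
  for (const p of pages) {
    total += p.annotationSubtypes.filter((s) => s === subtype).length;
  }
  return total;
}

function diagnose(pages: readonly PageSummary[]): Diagnosis {
  if (countSubtype(pages, "Highlight") === 0) {
    // Assumption: freeform pencil highlights appear as "Ink" annotations.
    return countSubtype(pages, "Ink") > 0 ? "freeform-only" : "no-highlights";
  }
  const highlighted = pages.filter((p) => p.annotationSubtypes.indexOf("Highlight") !== -1);
  // If no highlighted page has any text items, there is nothing to map onto.
  return highlighted.some((p) => p.textItemCount > 0) ? "ok" : "no-text-layer";
}

// Throw a readable error instead of silently producing an empty file.
function assertExportable(pages: readonly PageSummary[]): void {
  const result = diagnose(pages);
  if (result === "ok") return;
  const messages: Record<Exclude<Diagnosis, "ok">, string> = {
    "no-highlights": "No highlight annotations were found in this PDF.",
    "freeform-only": "Only freeform drawings were found; use a standard highlight tool.",
    "no-text-layer": "Highlights were found, but the pages have no text layer. Run OCR first.",
  };
  throw new Error(messages[result]);
}
```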

How it works

  1. Drop a PDF onto the upload area. Files up to 100 MB are accepted.
  2. Pick the output format: plain .txt for raw lines, or Markdown for a Notion/Obsidian-friendly structure with page headings and blockquotes.
  3. Optional: include Underline / Squiggly / Strikeout annotations if you use them as additional colour-codes in your reading workflow.
  4. Mozilla’s PDF.js parses every page’s annotation list and the underlying text layer.
  5. For each highlight, the tool maps the highlight rectangle onto the glyph runs underneath and reconstructs the highlighted passage.
  6. Download the .txt or .md file. Highlights are grouped by page so you always know where each one came from.
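The rectangle-to-text mapping in step 5 can be illustrated with a simplified geometric sketch. It assumes QuadPoints arrive as a flat list of eight numbers per quad and that each text run has already been reduced to a lower-left origin plus width and height in the same coordinate space. The `TextRun` and `Box` types and both functions are hypothetical, and the real tool may well work at a finer granularity than whole runs.

```typescript
// Simplified text run: (x, y) is the lower-left corner in PDF user space.
interface TextRun {
  str: string;
  x: number;
  y: number;
  width: number;
  height: number;
}

interface Box {
  minX: number;
  minY: number;
  maxX: number;
  maxY: number;
}

// Turn each group of 8 numbers (4 corner points) into an axis-aligned box.
function quadPointsToBoxes(quadPoints: readonly number[]): Box[] {
  const boxes: Box[] = [];
  for (let i = 0; i + 8 <= quadPoints.length; i += 8) {
    const quad = quadPoints.slice(i, i + 8);
    const xs = quad.filter((_, k) => k % 2 === 0);
    const ys = quad.filter((_, k) => k % 2 === 1);
    boxes.push({
      minX: Math.min(...xs),
      minY: Math.min(...ys),
      maxX: Math.max(...xs),
      maxY: Math.max(...ys),
    });
  }
  return boxes;
}

// Collect the runs whose centre point lies inside any highlight box.
function textUnderHighlight(quadPoints: readonly number[], runs: readonly TextRun[]): string {
  const boxes = quadPointsToBoxes(quadPoints);
  const covered = runs.filter((r) => {
    const cx = r.x + r.width / 2;
    const cy = r.y + r.height / 2;
    return boxes.some((b) => cx >= b.minX && cx <= b.maxX && cy >= b.minY && cy <= b.maxY);
  });
  // PDF user space starts at the bottom-left, so a larger y is higher on the page.
  // Assumes runs on the same line share the same y value.
  covered.sort((a, b) => b.y - a.y || a.x - b.x);
  return covered.map((r) => r.str).join(" ").replace(/\s+/g, " ").trim();
}
```

Testing the centre point rather than any overlap is a deliberate simplification: it avoids picking up runs from the line above or below when a highlight box is slightly taller than the text it covers.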

Common use cases

  • Export every highlight from a stack of research papers into a single Notion page for a literature review
  • Pull reading-app highlights out of a downloaded textbook PDF for revision notes before an exam
  • Generate a structured Markdown summary of a 200-page market research report for a slide deck
  • Collect every highlighted clause from a long legal contract into one document for an annotation pass with a colleague
  • Extract reading notes from a non-fiction book for a blog post or essay
  • Build a personal knowledge base by pulling highlights from every paper you have ever annotated and dropping them into Obsidian

FAQ

Which highlight types are recognised?

The standard PDF highlight annotation (subtype "Highlight") that every common reading app produces. Squiggly, underline and strikeout annotations are also captured if you toggle the option.

Will it work on scanned PDFs?

Only if the scanned document was OCR-processed and has a real text layer. The extractor uses Mozilla’s PDF.js renderer to map highlight rectangles to the underlying glyph runs; without a text layer there is nothing to map to. Run OCR PDF first if needed.

How is the output organised?

Highlights are grouped by page in the order they appear, prefixed with their page number. Pick the .txt format for raw lines or Markdown for a structured outline ready to paste into Notion / Obsidian / Bear.

Does it include the surrounding context?

Not by default. Only the highlighted text itself is exported. To add surrounding context, enable the post-processing toggle, which extends each extract to the nearest sentence boundary on either side.
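One simple way to extend an extract to sentence boundaries is to scan outward from the highlight until a sentence terminator is reached. The sketch below assumes the page text is available as a single string and that the highlight's start and end offsets within it are known. The `expandToSentence` function is hypothetical, and a naive terminator check like this one will split on abbreviations such as "e.g.".

```typescript
// Expand the range [start, end) of pageText outward to the nearest sentence
// boundaries, treating ".", "!" and "?" as sentence terminators.
function expandToSentence(pageText: string, start: number, end: number): string {
  const isTerminator = (ch: string): boolean => ch === "." || ch === "!" || ch === "?";

  // Walk left until the previous character ends the preceding sentence.
  let left = start;
  while (left > 0 && !isTerminator(pageText.charAt(left - 1))) {
    left--;
  }

  // Walk right until the range itself ends with a terminator.
  let right = end;
  while (right < pageText.length && !isTerminator(pageText.charAt(right - 1))) {
    right++;
  }

  return pageText.slice(left, right).trim();
}
```

For example, with the page text "First sentence. The quick brown fox jumps. Last one." and a highlight covering "quick brown", the function returns "The quick brown fox jumps.".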

Is anything uploaded?

No. The PDF is processed by PDF.js running inside your browser tab. The text extracts are built in memory and offered as a download, so nothing leaves your device.

Which highlight tools is this compatible with?

Anything that produces a standard PDF highlight annotation, which is essentially every reading app — every major desktop and mobile PDF reader, and the highlight tools built into iOS Books and Android’s default PDF viewer. The output is identical regardless of which app produced the highlight, because the on-disk format is standardised.

Why do some highlights come back with garbled or missing text?

There are three common causes. First, the PDF was scanned without OCR, so there is no text under the highlight to map to. Second, the text layer is unusually fragmented and Mozilla’s PDF.js could not reassemble the glyph order. Third, the highlight in the source app was drawn with a freeform shape rather than as a real highlight annotation. Run OCR PDF on the file if you suspect the first cause; for the third, switch to the standard highlight tool in your reading app.

Can it export comments and notes attached to highlights?

Yes. If you typed a popup note on a highlight in a standards-compliant PDF reader, the comment is exported alongside the highlight text, prefixed with "Note:" in the .txt format and rendered as italic text in the .md format.
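The comment handling described above might look roughly like this in TypeScript. The `NotedHighlight` type and both function names are assumptions, and the exact layout (a same-line "Note:" suffix in .txt, an italic line inside the blockquote in .md) is one plausible reading of the description rather than the tool's confirmed output.

```typescript
interface NotedHighlight {
  text: string;
  comment?: string; // popup note typed in the reading app, if any
}

// Plain text: the comment follows the highlight on the same line.
function renderTxtLine(h: NotedHighlight): string {
  return h.comment ? `${h.text} Note: ${h.comment}` : h.text;
}

// Markdown: the highlight is a blockquote, the comment an italic line inside it.
function renderMarkdownBlock(h: NotedHighlight): string {
  if (!h.comment) return `> ${h.text}`;
  // Escape emphasis characters so the comment cannot break the italic span.
  const escaped = h.comment.replace(/([*_])/g, "\\$1");
  return `> ${h.text}\n>\n> *${escaped}*`;
}
```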

Does it work on iPad highlights done with Apple Pencil?

It works on highlights produced with the Highlighter tool in apps like GoodNotes and PDF Expert, because those produce real highlight annotations. It does not work on freeform pen markings drawn directly with the Pencil — those are shape annotations, not text highlights.

Will it find highlights in a password-protected PDF?

Not directly. PDF encryption covers the strings and streams stored in the file, which includes the page text and the annotation contents, so the extractor cannot read them. Use Remove PDF Password first to unlock the file with the correct password, then run the unlocked output through this tool.

How does this compare to most reference-manager exports?

The underlying mechanism is the same: PDF.js maps highlight rectangles to text. The difference is that this tool runs entirely client-side without an account, accepts any PDF (not just papers stored in a single reference manager), and lets you switch between .txt and Markdown output formats at export time.
