Shell API

pypdfium2 can also be used from the command line.

Version

$ pypdfium2 --version
pypdfium2 4.30.0+12.g6736a5d
pdfium 130.0.6721.0 at /opt/hostedtoolcache/Python/3.12.6/x64/lib/python3.12/site-packages/pypdfium2_raw/libpdfium.so
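If the version information is needed in a script, the two lines above can be split into name, version and (where present) library path. The following Python sketch does this for the sample output shown here. The helper name `parse_version_output` is hypothetical, and the line layout is assumed from this one sample only, so other releases may print something different.

```python
import re

# Sample copied from the output above; the layout is an assumption based on it.
SAMPLE = """\
pypdfium2 4.30.0+12.g6736a5d
pdfium 130.0.6721.0 at /opt/hostedtoolcache/Python/3.12.6/x64/lib/python3.12/site-packages/pypdfium2_raw/libpdfium.so
"""

# "<name> <version>" optionally followed by " at <path>"
LINE_RE = re.compile(r"^(?P<name>\S+)\s+(?P<version>\S+)(?:\s+at\s+(?P<path>.+))?$")


def parse_version_output(text):
    """Map each component name to its version string and optional path."""
    info = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            info[match["name"]] = {
                "version": match["version"],
                "path": match["path"],  # None when no " at <path>" part is present
            }
    return info


parsed = parse_version_output(SAMPLE)
print(parsed["pypdfium2"]["version"])
print(parsed["pdfium"]["version"])
```

In practice the text would come from running `pypdfium2 --version` and capturing its standard output instead of the hard-coded sample.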

Main Help

$ pypdfium2 --help
usage: pypdfium2 [-h] [--version]
                 {arrange,attachments,extract-images,extract-text,imgtopdf,pageobjects,pdfinfo,render,tile,toc}
                 ...

Command line interface to the pypdfium2 library (Python binding to PDFium)

positional arguments:
  {arrange,attachments,extract-images,extract-text,imgtopdf,pageobjects,pdfinfo,render,tile,toc}
    arrange             rearrange/merge documents
    attachments         list/extract/edit embedded files
    extract-images      extract images
    extract-text        extract text
    imgtopdf            convert images to PDF
    pageobjects         print info on page objects
    pdfinfo             print info on document and pages
    render              rasterize pages
    tile                tile pages (N-up)
    toc                 print table of contents

options:
  -h, --help            show this help message and exit
  --version, -v         show program's version number and exit

Arranger

$ pypdfium2 arrange --help
usage: pypdfium2 arrange [-h] [--pages PAGES [PAGES ...]]
                         [--passwords PASSWORDS [PASSWORDS ...]] --output
                         OUTPUT
                         inputs [inputs ...]

rearrange/merge documents

positional arguments:
  inputs                Sequence of PDF files.

options:
  -h, --help            show this help message and exit
  --pages PAGES [PAGES ...]
                        Sequence of page texts, defining the pages to include
                        from each PDF. Use '_' as placeholder for all pages.
  --passwords PASSWORDS [PASSWORDS ...]
                        Passwords to unlock encrypted PDFs. Any placeholder
                        may be used for non-encrypted documents.
  --output OUTPUT, -o OUTPUT
                        Target path for the output document
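To illustrate how the per-input options line up, the Python sketch below assembles an `arrange` command line in which `--pages` and `--passwords` each carry one entry per input file, with `_` as the placeholder. The helper `build_arrange_argv` and the file names are hypothetical, and the one-entry-per-input pairing is an assumption read from the help text above, not verified against the implementation.

```python
from typing import Optional, Sequence

PLACEHOLDER = "_"  # per the help text, '_' stands for "all pages"


def build_arrange_argv(
    inputs: Sequence[str],
    output: str,
    pages: Optional[Sequence[Optional[str]]] = None,
    passwords: Optional[Sequence[Optional[str]]] = None,
) -> list:
    """Assemble an argv list for `pypdfium2 arrange` (hypothetical helper)."""
    argv = ["pypdfium2", "arrange", *inputs]
    for flag, values in (("--pages", pages), ("--passwords", passwords)):
        if values is None:
            continue
        if len(values) != len(inputs):
            raise ValueError(f"{flag} needs one entry per input file")
        argv.append(flag)
        # None means "nothing specific for this input": emit the placeholder
        argv.extend(PLACEHOLDER if v is None else v for v in values)
    argv += ["--output", output]
    return argv


argv = build_arrange_argv(
    ["a.pdf", "b.pdf"],
    "merged.pdf",
    pages=[None, None],          # all pages from both inputs
    passwords=[None, "secret"],  # only b.pdf is assumed to be encrypted
)
print(" ".join(argv))
```

The resulting list can be passed to `subprocess.run` as is, which avoids shell quoting issues with file names that contain spaces.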

Attachments

$ pypdfium2 attachments --help
usage: pypdfium2 attachments [-h] [--password PASSWORD]
                             input {list,extract,edit} ...

list/extract/edit embedded files

positional arguments:
  input                Input PDF document
  {list,extract,edit}

options:
  -h, --help           show this help message and exit
  --password PASSWORD  A password to unlock the PDF, if encrypted

$ pypdfium2 attachments file.pdf list --help
usage: pypdfium2 attachments input list [-h]

options:
  -h, --help  show this help message and exit

$ pypdfium2 attachments file.pdf extract --help
usage: pypdfium2 attachments input extract [-h] [--numbers NUMBERS]
                                           --output-dir OUTPUT_DIR

options:
  -h, --help            show this help message and exit
  --numbers NUMBERS
  --output-dir OUTPUT_DIR, -o OUTPUT_DIR

$ pypdfium2 attachments file.pdf edit --help
usage: pypdfium2 attachments input edit [-h] [--del-numbers DEL_NUMBERS]
                                        [--add-files F [F ...]] --output
                                        OUTPUT

options:
  -h, --help            show this help message and exit
  --del-numbers DEL_NUMBERS, -d DEL_NUMBERS
  --add-files F [F ...], -a F [F ...]
  --output OUTPUT, -o OUTPUT
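The usage lines above attach `--password` to the `attachments` parser and the remaining options to the `list`/`extract`/`edit` sub-parsers. With argparse-style sub-commands this generally means the password has to appear before the sub-command keyword. The Python sketch below assembles command lines in that order; `build_attachments_argv` and the file names are hypothetical.

```python
from typing import Optional, Sequence

ACTIONS = ("list", "extract", "edit")  # sub-commands shown in the usage line


def build_attachments_argv(
    pdf: str,
    action: str,
    action_args: Sequence[str] = (),
    password: Optional[str] = None,
) -> list:
    """Assemble an argv list for `pypdfium2 attachments` (hypothetical helper)."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    argv = ["pypdfium2", "attachments"]
    if password is not None:
        # Parent-level option: placed before the input and the sub-command
        argv += ["--password", password]
    # Sub-command options follow the sub-command keyword
    argv += [pdf, action, *action_args]
    return argv


extract_argv = build_attachments_argv(
    "file.pdf", "extract", ["--output-dir", "attachments_out"], password="secret"
)
edit_argv = build_attachments_argv(
    "file.pdf", "edit", ["--add-files", "notes.txt", "--output", "edited.pdf"]
)
print(" ".join(extract_argv))
print(" ".join(edit_argv))
```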

Image Extractor

$ pypdfium2 extract-images --help
usage: pypdfium2 extract-images [-h] [--password PASSWORD] [--pages PAGES]
                                --output-dir OUTPUT_DIR
                                [--max-depth MAX_DEPTH] [--use-bitmap]
                                [--format FORMAT] [--render]
                                input

extract images

positional arguments:
  input                 Input PDF document

options:
  -h, --help            show this help message and exit
  --password PASSWORD   A password to unlock the PDF, if encrypted
  --pages PAGES         Page numbers and ranges to include
  --output-dir OUTPUT_DIR, -o OUTPUT_DIR
                        Output directory to take the extracted images
  --max-depth MAX_DEPTH
                        Maximum recursion depth to consider when looking for
                        page objects.
  --use-bitmap          Enforce the use of bitmaps rather than attempting a
                        smart extraction of the image.
  --format FORMAT       Image format to use when saving bitmaps. (Fallback if
                        doing smart extraction.)
  --render              Whether to get rendered bitmaps, taking masks and
                        transform matrices into account. (Fallback if doing
                        smart extraction.)

Text Extractor

$ pypdfium2 extract-text --help
usage: pypdfium2 extract-text [-h] [--password PASSWORD] [--pages PAGES]
                              [--strategy {range,bounded}]
                              input

extract text

positional arguments:
  input                 Input PDF document

options:
  -h, --help            show this help message and exit
  --password PASSWORD   A password to unlock the PDF, if encrypted
  --pages PAGES         Page numbers and ranges to include
  --strategy {range,bounded}
                        PDFium text extraction strategy (range, bounded).

Image Converter

$ pypdfium2 imgtopdf --help
usage: pypdfium2 imgtopdf [-h] --output OUTPUT [--inline] images [images ...]

convert images to PDF

positional arguments:
  images                Input images

options:
  -h, --help            show this help message and exit
  --output OUTPUT, -o OUTPUT
                        Target path for the new PDF
  --inline              If JPEG, whether to use PDFium's inline loading
                        function.

Page Objects Info

$ pypdfium2 pageobjects --help
usage: pypdfium2 pageobjects [-h] [--password PASSWORD] [--pages PAGES]
                             [--n-digits N_DIGITS] [--filter T [T ...]]
                             [--max-depth MAX_DEPTH] [--info [INFO ...]]
                             input

print info on page objects

positional arguments:
  input                 Input PDF document

options:
  -h, --help            show this help message and exit
  --password PASSWORD   A password to unlock the PDF, if encrypted
  --pages PAGES         Page numbers and ranges to include
  --n-digits N_DIGITS   Number of digits to which coordinates/sizes shall be
                        rounded
  --filter T [T ...]    Object types to include. Choices: ['?', 'text',
                        'path', 'image', 'shading', 'form']
  --max-depth MAX_DEPTH
                        Maximum recursion depth to consider when descending
                        into Form XObjects.
  --info [INFO ...]     Object details to show (pos, imageinfo).

Document Info

$ pypdfium2 pdfinfo --help
usage: pypdfium2 pdfinfo [-h] [--password PASSWORD] [--pages PAGES]
                         [--n-digits N_DIGITS]
                         input

print info on document and pages

positional arguments:
  input                Input PDF document

options:
  -h, --help           show this help message and exit
  --password PASSWORD  A password to unlock the PDF, if encrypted
  --pages PAGES        Page numbers and ranges to include
  --n-digits N_DIGITS  Number of digits to which coordinates/sizes shall be
                       rounded

Renderer

$ pypdfium2 render --help
usage: pypdfium2 render [-h] [--password PASSWORD] [--pages PAGES] --output
                        OUTPUT [--prefix PREFIX] [--format FORMAT]
                        [--engine ENGINE_CLS] [--scale SCALE]
                        [--rotation {0,90,180,270}] [--fill-color C C C C]
                        [--optimize-mode {lcd,print}]
                        [--crop CROP CROP CROP CROP]
                        [--draw-annots | --no-draw-annots]
                        [--draw-forms | --no-draw-forms]
                        [--no-antialias {text,image,path} [{text,image,path} ...]]
                        [--force-halftone]
                        [--bitmap-maker {native,foreign,foreign_packed,foreign_simple}]
                        [--grayscale] [--byteorder REV_BYTEORDER]
                        [--x-channel | --no-x-channel] [--linear [LINEAR]]
                        [--processes PROCESSES]
                        [--parallel-strategy {spawn,forkserver,fork}]
                        [--parallel-lib {mp,ft}] [--parallel-map PARALLEL_MAP]
                        [--sample-theme] [--path-fill C C C C]
                        [--path-stroke C C C C] [--text-fill C C C C]
                        [--text-stroke C C C C] [--fill-to-stroke]
                        input

rasterize pages

positional arguments:
  input                 Input PDF document

options:
  -h, --help            show this help message and exit
  --password PASSWORD   A password to unlock the PDF, if encrypted
  --pages PAGES         Page numbers and ranges to include
  --output OUTPUT, -o OUTPUT
                        Output directory where the serially numbered images
                        shall be placed.
  --prefix PREFIX       Custom prefix for the images. Defaults to the input
                        filename's stem.
  --format FORMAT, -f FORMAT
                        The image format to use.
  --engine ENGINE_CLS   The saver engine to use (pil, numpy+cv2)
  --scale SCALE         Define the resolution of the output images. By
                        default, one PDF point (1/72in) is rendered to 1x1
                        pixel. This factor scales the number of pixels that
                        represent one point.
  --rotation {0,90,180,270}
                        Rotate pages by 90, 180 or 270 degrees.
  --fill-color C C C C  Color the bitmap will be filled with before rendering.
                        It shall be given in RGBA format as a sequence of
                        integers ranging from 0 to 255. Defaults to white.
  --optimize-mode {lcd,print}
                        The rendering optimisation mode. None if not given.
  --crop CROP CROP CROP CROP
                        Amount to crop from (left, bottom, right, top).
  --draw-annots, --no-draw-annots
                        Whether annotations may be shown (default: true).
  --draw-forms, --no-draw-forms
                        Whether forms may be shown (default: true).
  --no-antialias {text,image,path} [{text,image,path} ...]
                        Item types that shall not be smoothed.
  --force-halftone      Always use halftone for image stretching.

Bitmap options:
  Bitmap config, including pixel format.

  --bitmap-maker {native,foreign,foreign_packed,foreign_simple}
                        The bitmap maker to use.
  --grayscale           Whether to render in grayscale mode (no colors).
  --byteorder REV_BYTEORDER
                        Whether to use BGR or RGB byteorder (default:
                        conditional).
  --x-channel, --no-x-channel
                        Whether to prefer BGRx/RGBx over BGR/RGB (default:
                        conditional).

Parallelization:
  Options for rendering with multiple processes.

  --linear [LINEAR]     Render non-parallel if page count is less or equal to
                        the specified value (default is conditional). If this
                        flag is given without a value, then render linear
                        regardless of document length.
  --processes PROCESSES
                        The maximum number of parallel rendering processes.
                        Defaults to the number of CPU cores.
  --parallel-strategy {spawn,forkserver,fork}
                        The process start method to use. ('fork' is
                        discouraged due to stability issues.)
  --parallel-lib {mp,ft}
                        The parallelization module to use (mp =
                        multiprocessing, ft = concurrent.futures).
  --parallel-map PARALLEL_MAP
                        The map function to use (backend specific, the default
                        is an iterative map).

Forced color scheme:
  Options for using pdfium's forced color scheme renderer. Deprecated,
  considered not useful.

  --sample-theme        Use a dark background sample theme as base. Explicit
                        color params override selectively.
  --path-fill C C C C
  --path-stroke C C C C
  --text-fill C C C C
  --text-stroke C C C C
  --fill-to-stroke      Only draw borders around fill areas using the
                        `path_stroke` color, instead of filling with the
                        `path_fill` color.
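Since `--scale` is described as the number of pixels that represent one PDF point (1/72 in), a target resolution in DPI can be turned into a scale factor by dividing by 72. The Python sketch below does that arithmetic and estimates the resulting image size. The helper names and file names are hypothetical, and rounding up to whole pixels is an assumption, not documented behaviour.

```python
import math

POINTS_PER_INCH = 72  # one PDF point is 1/72 inch


def scale_for_dpi(dpi: float) -> float:
    """Scale factor that yields roughly `dpi` pixels per inch."""
    return dpi / POINTS_PER_INCH


def estimated_pixel_size(width_pt: float, height_pt: float, scale: float):
    """Estimated bitmap size; rounding up is an assumption of this sketch."""
    return math.ceil(width_pt * scale), math.ceil(height_pt * scale)


scale = scale_for_dpi(144)  # 144 DPI corresponds to a scale of 2.0
# US Letter is 8.5in x 11in, i.e. 612 x 792 points
width_px, height_px = estimated_pixel_size(612, 792, scale)

render_argv = [
    "pypdfium2", "render", "doc.pdf",
    "--output", "out_dir",
    "--scale", str(scale),
]
print(scale, width_px, height_px)
print(" ".join(render_argv))
```

For print-quality output at 300 DPI the same formula gives a scale of about 4.17.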

Page Tiler

$ pypdfium2 tile --help
usage: pypdfium2 tile [-h] [--password PASSWORD] --output OUTPUT --rows ROWS
                      --cols COLS --width WIDTH --height HEIGHT [--unit UNIT]
                      input

tile pages (N-up)

positional arguments:
  input                 Input PDF document

options:
  -h, --help            show this help message and exit
  --password PASSWORD   A password to unlock the PDF, if encrypted
  --output OUTPUT, -o OUTPUT
                        Target path for the new document
  --rows ROWS, -r ROWS  Number of rows (horizontal tiles)
  --cols COLS, -c COLS  Number of columns (vertical tiles)
  --width WIDTH         Target width
  --height HEIGHT       Target height
  --unit UNIT, -u UNIT  Unit for target width and height (pt, mm, cm, in)
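`--width` and `--height` are interpreted in the unit given by `--unit`. The conversion factors between the listed units are standard (1 in = 72 pt = 2.54 cm = 25.4 mm), and the Python sketch below uses them to express a target size in points. The helper `to_points` is hypothetical and only illustrates the arithmetic; it is not claimed to mirror how pypdfium2 converts internally.

```python
# Points per unit, derived from 1 in = 72 pt = 2.54 cm = 25.4 mm
POINTS_PER_UNIT = {
    "pt": 1.0,
    "in": 72.0,
    "cm": 72.0 / 2.54,
    "mm": 72.0 / 25.4,
}


def to_points(value: float, unit: str) -> float:
    """Convert a length in pt, mm, cm or in to PDF points."""
    if unit not in POINTS_PER_UNIT:
        raise ValueError(f"unsupported unit: {unit!r}")
    return value * POINTS_PER_UNIT[unit]


# A4 is 210 mm x 297 mm
a4_width_pt = to_points(210, "mm")   # about 595.28
a4_height_pt = to_points(297, "mm")  # about 841.89

# 2x2 tiling onto an A4-sized target, using only options from the help above
tile_argv = [
    "pypdfium2", "tile", "doc.pdf",
    "--output", "tiled.pdf",
    "--rows", "2", "--cols", "2",
    "--width", "210", "--height", "297",
    "--unit", "mm",
]
print(round(a4_width_pt, 2), round(a4_height_pt, 2))
print(" ".join(tile_argv))
```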

TOC Reader

$ pypdfium2 toc --help
usage: pypdfium2 toc [-h] [--password PASSWORD] [--n-digits N_DIGITS]
                     [--max-depth MAX_DEPTH]
                     input

print table of contents

positional arguments:
  input                 Input PDF document

options:
  -h, --help            show this help message and exit
  --password PASSWORD   A password to unlock the PDF, if encrypted
  --n-digits N_DIGITS   Number of digits to which coordinates/sizes shall be
                        rounded
  --max-depth MAX_DEPTH
                        Maximum recursion depth to consider when parsing the
                        table of contents