Shell API

pypdfium2 can also be used from the command line.

Version

$ pypdfium2 --version
pypdfium2 5.0.0b1+65.g3792af7
pdfium 134.0.6996.0 at /opt/hostedtoolcache/Python/3.12.10/x64/lib/python3.12/site-packages/pypdfium2_raw/libpdfium.so
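
When the version has to be checked in a script, the two lines above can be split into fields. The following is a minimal sketch, not part of pypdfium2: the helper name `parse_version_output` is hypothetical, and it assumes the output keeps the `name version [at path]` layout shown in this sample.

```python
def parse_version_output(text):
    """Split each line into a component name, its version and an optional path."""
    info = {}
    for line in text.strip().splitlines():
        name, _, rest = line.partition(" ")
        # Only the pdfium line in the sample carries an " at <path>" suffix.
        version, _, path = rest.partition(" at ")
        info[name] = {"version": version, "path": path or None}
    return info


# Sample copied from the output shown above.
sample = (
    "pypdfium2 5.0.0b1+65.g3792af7\n"
    "pdfium 134.0.6996.0 at /opt/hostedtoolcache/Python/3.12.10/x64/lib/"
    "python3.12/site-packages/pypdfium2_raw/libpdfium.so\n"
)
versions = parse_version_output(sample)
print(versions["pypdfium2"]["version"])
print(versions["pdfium"]["version"])
```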

Main Help

$ pypdfium2 --help
usage: pypdfium2 [-h] [--version]
                 {arrange,attachments,extract-images,extract-text,imgtopdf,pageobjects,pdfinfo,render,tile,toc}
                 ...

Command line interface to the pypdfium2 library (Python binding to PDFium)

positional arguments:
  {arrange,attachments,extract-images,extract-text,imgtopdf,pageobjects,pdfinfo,render,tile,toc}
    arrange             rearrange/merge documents
    attachments         list/extract/edit embedded files
    extract-images      extract images
    extract-text        extract text
    imgtopdf            convert images to PDF
    pageobjects         print info on pageobjects
    pdfinfo             print info on document and pages
    render              rasterize pages
    tile                tile pages (N-up)
    toc                 print table of contents

options:
  -h, --help            show this help message and exit
  --version, -v         show program's version number and exit

Arranger

$ pypdfium2 arrange --help
usage: pypdfium2 arrange [-h] [--pages PAGES [PAGES ...]]
                         [--passwords PASSWORDS [PASSWORDS ...]] --output
                         OUTPUT
                         inputs [inputs ...]

rearrange/merge documents

positional arguments:
  inputs                Sequence of PDF files.

options:
  -h, --help            show this help message and exit
  --pages PAGES [PAGES ...]
                        Sequence of page texts, defining the pages to include
                        from each PDF. Use '_' as placeholder for all pages.
  --passwords PASSWORDS [PASSWORDS ...]
                        Passwords to unlock encrypted PDFs. Any placeholder
                        may be used for non-encrypted documents.
  --output OUTPUT, -o OUTPUT
                        Target path for the output document
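
As an illustration of how the options above fit together, the sketch below assembles an `arrange` call from Python and only runs it if the CLI and the input files are present. The file names are hypothetical, the helper `build_arrange_cmd` is not part of pypdfium2, and the range syntax `1-3` is an assumption (the help text only documents `_` as the placeholder for all pages).

```python
import os
import shlex
import shutil
import subprocess


def build_arrange_cmd(inputs, output, pages=None):
    """Build the argument list for `pypdfium2 arrange`."""
    # Inputs come first so the multi-value --pages option cannot swallow them.
    cmd = ["pypdfium2", "arrange", *inputs]
    if pages is not None:
        cmd += ["--pages", *pages]
    cmd += ["--output", output]
    return cmd


# Hypothetical inputs: all pages of a.pdf, then (assumed syntax) pages 1-3 of b.pdf.
inputs = ["a.pdf", "b.pdf"]
cmd = build_arrange_cmd(inputs, "merged.pdf", pages=["_", "1-3"])
print(shlex.join(cmd))

# Run only when the CLI is installed and the sample files actually exist.
if shutil.which("pypdfium2") and all(os.path.exists(p) for p in inputs):
    subprocess.run(cmd, check=True)
```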

Attachments

$ pypdfium2 attachments --help
usage: pypdfium2 attachments [-h] [--password PASSWORD]
                             input {list,extract,edit} ...

list/extract/edit embedded files

positional arguments:
  input                Input PDF document
  {list,extract,edit}

options:
  -h, --help           show this help message and exit
  --password PASSWORD  A password to unlock the PDF, if encrypted

$ pypdfium2 attachments file.pdf list --help
usage: pypdfium2 attachments input list [-h]

options:
  -h, --help  show this help message and exit

$ pypdfium2 attachments file.pdf extract --help
usage: pypdfium2 attachments input extract [-h] [--numbers NUMBERS]
                                           --output-dir OUTPUT_DIR

options:
  -h, --help            show this help message and exit
  --numbers NUMBERS
  --output-dir OUTPUT_DIR, -o OUTPUT_DIR

$ pypdfium2 attachments file.pdf edit --help
usage: pypdfium2 attachments input edit [-h] [--del-numbers DEL_NUMBERS]
                                        [--add-files F [F ...]] --output
                                        OUTPUT

options:
  -h, --help            show this help message and exit
  --del-numbers DEL_NUMBERS, -d DEL_NUMBERS
  --add-files F [F ...], -a F [F ...]
  --output OUTPUT, -o OUTPUT
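
The usage lines above place `--password` before the input and the subcommand options after the subcommand name. A small sketch of that ordering follows; `attachments_cmd` and all file names are hypothetical, and the undocumented format of `--numbers` / `--del-numbers` is deliberately left out.

```python
import shlex


def attachments_cmd(input_pdf, action, *action_args, password=None):
    """Build the argument list for `pypdfium2 attachments`."""
    if action not in ("list", "extract", "edit"):
        raise ValueError(f"unknown action: {action}")
    cmd = ["pypdfium2", "attachments"]
    # --password belongs to the parent command, so it precedes the input.
    if password is not None:
        cmd += ["--password", password]
    # Options of the subcommand follow the subcommand name.
    cmd += [input_pdf, action, *action_args]
    return cmd


list_cmd = attachments_cmd("file.pdf", "list")
extract_cmd = attachments_cmd("file.pdf", "extract", "--output-dir", "embedded")
edit_cmd = attachments_cmd(
    "file.pdf", "edit", "--add-files", "notes.txt", "--output", "with_notes.pdf"
)
for c in (list_cmd, extract_cmd, edit_cmd):
    print(shlex.join(c))
```

Each resulting list can be passed to `subprocess.run` as is.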

Image Extractor

$ pypdfium2 extract-images --help
usage: pypdfium2 extract-images [-h] [--password PASSWORD] [--pages PAGES]
                                --output-dir OUTPUT_DIR
                                [--max-depth MAX_DEPTH] [--use-bitmap]
                                [--format FORMAT] [--render]
                                [--scale-to-original | --no-scale-to-original]
                                input

extract images

positional arguments:
  input                 Input PDF document

options:
  -h, --help            show this help message and exit
  --password PASSWORD   A password to unlock the PDF, if encrypted
  --pages PAGES         Page numbers and ranges to include
  --output-dir OUTPUT_DIR, -o OUTPUT_DIR
                        Output directory to take the extracted images
  --max-depth MAX_DEPTH
                        Maximum recursion depth to consider when looking for
                        pageobjects.
  --use-bitmap          Enforce the use of bitmaps rather than attempting a
                        smart extraction of the image.
  --format FORMAT       Image format to use when saving bitmaps. (Fallback if
                        doing smart extraction.)
  --render              When --use-bitmap is given, whether to get rendered
                        bitmaps, taking masks and transform matrices into
                        account.
  --scale-to-original, --no-scale-to-original
                        When --use-bitmap --render is given, whether to scale
                        the image so it is rendered at its native resolution,
                        or close to that. This should improve output quality.
                        The default is True, but you may opt out.
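
The help text ties `--render` to `--use-bitmap`, and `--scale-to-original` to both. The sketch below encodes that dependency when building the command. The helper `build_extract_images_cmd` and the file names are hypothetical; rejecting `--render` without `--use-bitmap` is a choice of this sketch, not necessarily how the CLI itself reacts, and `png` as a `--format` value is an assumption.

```python
import shlex


def build_extract_images_cmd(input_pdf, out_dir, use_bitmap=False, render=False,
                             fmt=None, scale_to_original=True):
    """Build the argument list for `pypdfium2 extract-images`."""
    if render and not use_bitmap:
        # Per the help text, --render is only meaningful with --use-bitmap.
        raise ValueError("--render requires --use-bitmap")
    cmd = ["pypdfium2", "extract-images", input_pdf, "--output-dir", out_dir]
    if use_bitmap:
        cmd.append("--use-bitmap")
    if render:
        cmd.append("--render")
    if fmt is not None:
        cmd += ["--format", fmt]
    # Scaling defaults to on, so only the opt-out needs to be spelled out.
    if use_bitmap and render and not scale_to_original:
        cmd.append("--no-scale-to-original")
    return cmd


smart = build_extract_images_cmd("scan.pdf", "images")
rendered = build_extract_images_cmd(
    "scan.pdf", "images", use_bitmap=True, render=True, fmt="png",
    scale_to_original=False,
)
print(shlex.join(smart))
print(shlex.join(rendered))
```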

Text Extractor

$ pypdfium2 extract-text --help
usage: pypdfium2 extract-text [-h] [--password PASSWORD] [--pages PAGES]
                              [--strategy {range,bounded}]
                              input

extract text

positional arguments:
  input                 Input PDF document

options:
  -h, --help            show this help message and exit
  --password PASSWORD   A password to unlock the PDF, if encrypted
  --pages PAGES         Page numbers and ranges to include
  --strategy {range,bounded}
                        PDFium text extraction strategy (range, bounded).
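
A possible way to call the text extractor from a script is sketched below. It assumes the extracted text is written to standard output, which the help text does not state; the helper `build_extract_text_cmd`, the file name `report.pdf` and the page range `1-3` are likewise assumptions.

```python
import os
import shlex
import shutil
import subprocess


def build_extract_text_cmd(input_pdf, pages=None, strategy=None):
    """Build the argument list for `pypdfium2 extract-text`."""
    if strategy not in (None, "range", "bounded"):
        raise ValueError("strategy must be 'range' or 'bounded'")
    cmd = ["pypdfium2", "extract-text", input_pdf]
    if pages is not None:
        cmd += ["--pages", pages]
    if strategy is not None:
        cmd += ["--strategy", strategy]
    return cmd


cmd = build_extract_text_cmd("report.pdf", pages="1-3", strategy="bounded")
print(shlex.join(cmd))

# Run only when the CLI is installed and the hypothetical input exists.
if shutil.which("pypdfium2") and os.path.exists("report.pdf"):
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    print(result.stdout[:200])  # preview, assuming the text goes to stdout
```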

Image Converter

$ pypdfium2 imgtopdf --help
usage: pypdfium2 imgtopdf [-h] --output OUTPUT [--inline] images [images ...]

convert images to PDF

positional arguments:
  images                Input images

options:
  -h, --help            show this help message and exit
  --output OUTPUT, -o OUTPUT
                        Target path for the new PDF
  --inline              If JPEG, whether to use PDFium's inline loading
                        function.

Pageobjects Info

$ pypdfium2 pageobjects --help
usage: pypdfium2 pageobjects [-h] [--password PASSWORD] [--pages PAGES]
                             [--n-digits N_DIGITS] [--filter T [T ...]]
                             [--max-depth MAX_DEPTH]
                             [--info {pos,imginfo} [{pos,imginfo} ...]]
                             input

print info on pageobjects

positional arguments:
  input                 Input PDF document

options:
  -h, --help            show this help message and exit
  --password PASSWORD   A password to unlock the PDF, if encrypted
  --pages PAGES         Page numbers and ranges to include
  --n-digits N_DIGITS   Number of digits to which coordinates/sizes shall be
                        rounded
  --filter T [T ...]    Object types to include. Choices: ['?', 'text',
                        'path', 'image', 'shading', 'form']
  --max-depth MAX_DEPTH
                        Maximum recursion depth to consider when descending
                        into Form XObjects.
  --info {pos,imginfo} [{pos,imginfo} ...]
                        Object details to show.

Document Info

$ pypdfium2 pdfinfo --help
usage: pypdfium2 pdfinfo [-h] [--password PASSWORD] [--pages PAGES]
                         [--n-digits N_DIGITS]
                         input

print info on document and pages

positional arguments:
  input                Input PDF document

options:
  -h, --help           show this help message and exit
  --password PASSWORD  A password to unlock the PDF, if encrypted
  --pages PAGES        Page numbers and ranges to include
  --n-digits N_DIGITS  Number of digits to which coordinates/sizes shall be
                       rounded

Renderer

$ pypdfium2 render --help
usage: pypdfium2 render [-h] [--password PASSWORD] [--pages PAGES] --output
                        OUTPUT [--prefix PREFIX] [--format FORMAT]
                        [--engine ENGINE_CLS] [--scale SCALE]
                        [--rotation {0,90,180,270}] [--fill-color C C C C]
                        [--optimize-mode {lcd,print}] [--crop C C C C]
                        [--draw-annots | --no-draw-annots]
                        [--draw-forms | --no-draw-forms]
                        [--no-antialias {text,image,path} [{text,image,path} ...]]
                        [--force-halftone]
                        [--bitmap-maker {native,foreign,foreign_packed,foreign_simple}]
                        [--grayscale] [--byteorder REV_BYTEORDER]
                        [--x-channel | --no-x-channel]
                        [--bgra-on-transparency | --no-bgra-on-transparency]
                        [--linear [LINEAR]] [--processes PROCESSES]
                        [--parallel-strategy {spawn,forkserver,fork}]
                        [--parallel-lib {mp,ft}] [--parallel-map PARALLEL_MAP]
                        [--sample-theme] [--path-fill C C C C]
                        [--path-stroke C C C C] [--text-fill C C C C]
                        [--text-stroke C C C C] [--fill-to-stroke]
                        [--invert-lightness] [--exclude-images]
                        input

rasterize pages

positional arguments:
  input                 Input PDF document

options:
  -h, --help            show this help message and exit
  --password PASSWORD   A password to unlock the PDF, if encrypted
  --pages PAGES         Page numbers and ranges to include
  --output OUTPUT, -o OUTPUT
                        Output directory where the serially numbered images
                        shall be placed.
  --prefix PREFIX       Custom prefix for the images. Defaults to the input
                        filename's stem.
  --format FORMAT, -f FORMAT
                        The image format to use (default: conditional).
  --engine ENGINE_CLS   The saver engine to use ('pil', 'numpy+pil',
                        'numpy+cv2')
  --scale SCALE         Define the resolution of the output images. By
                        default, one PDF point (1/72in) is rendered to 1x1
                        pixel. This factor scales the number of pixels that
                        represent one point.
  --rotation {0,90,180,270}
                        Rotate pages by 90, 180 or 270 degrees.
  --fill-color C C C C  Color the bitmap will be filled with before rendering.
                        Shall be given in RGBA format as a sequence of
                        integers ranging from 0 to 255. Defaults to white.
  --optimize-mode {lcd,print}
                        The rendering optimisation mode. None if not given.
  --crop C C C C        Amount to crop from (left, bottom, right, top).
  --draw-annots, --no-draw-annots
                        Whether annotations may be shown (default: true).
  --draw-forms, --no-draw-forms
                        Whether forms may be shown (default: true).
  --no-antialias {text,image,path} [{text,image,path} ...]
                        Item types that shall not be smoothed.
  --force-halftone      Always use halftone for image stretching.

Bitmap options:
  Bitmap config, including pixel format.

  --bitmap-maker {native,foreign,foreign_packed,foreign_simple}
                        The bitmap maker to use.
  --grayscale           Whether to render in grayscale mode (no colors).
  --byteorder REV_BYTEORDER
                        Whether to use BGR or RGB byteorder (default:
                        conditional).
  --x-channel, --no-x-channel
                        Whether to prefer BGRx/RGBx over BGR/RGB (default:
                        conditional).
  --bgra-on-transparency, --no-bgra-on-transparency
                        Whether to use BGRA if there is page content that has
                        transparency. Note, this makes format selection page-
                        dependent. As this behavior can be confusing, it is
                        not currently the default, but recommended for
                        performance in these cases.

Parallelization:
  Options for rendering with multiple processes.

  --linear [LINEAR]     Render non-parallel if page count is less or equal to
                        the specified value (default: 4). If this flag is
                        given without a value, then render linear regardless
                        of document length.
  --processes PROCESSES
                        The maximum number of parallel rendering processes.
                        Defaults to the number of CPU cores.
  --parallel-strategy {spawn,forkserver,fork}
                        The process start method to use. ('fork' is
                        discouraged due to stability issues.)
  --parallel-lib {mp,ft}
                        The parallelization module to use (mp =
                        multiprocessing, ft = concurrent.futures).
  --parallel-map PARALLEL_MAP
                        The map function to use (backend specific, the default
                        is an iterative map).

Flat color scheme:
  Options for using pdfium's color scheme renderer. Note that this may
  flatten different colors into one, so the usability of this is limited.
  Alternatively, consider post-processing with lightness inversion (see
  below).

  --sample-theme        Use a dark background sample theme as base. Explicit
                        color params override selectively.
  --path-fill C C C C
  --path-stroke C C C C
  --text-fill C C C C
  --text-stroke C C C C
  --fill-to-stroke      When rendering with custom color scheme, only draw
                        borders around fill areas using the `path_stroke`
                        color, instead of filling with the `path_fill` color.
                        This is actually recommended, since with a single fill
                        color for paths the boundaries of adjacent fill paths
                        are less visible.

Post processing:
  Options to post-process rendered images. Note, this may have a strongly
  negative impact on performance.

  --invert-lightness    Invert lightness using the HLS color space (e.g.
                        white<->black, dark_blue<->light_blue). The intent is
                        to achieve a dark theme for documents with light
                        background, while providing better visual results than
                        classical color inversion or a flat pdfium color
                        scheme. However, note that --optimize-mode lcd is not
                        recommendable when inverting lightness.
  --exclude-images      Whether to exclude PDF images from lightness
                        inversion.
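
Since `--scale` is defined relative to one pixel per PDF point (1/72 in), a target resolution in DPI corresponds to a scale of `dpi / 72`. The sketch below does that conversion and builds a matching `render` call. The helpers `scale_for_dpi` and `build_render_cmd` and the file names are hypothetical; only flags listed in the help text above are used.

```python
import shlex


def scale_for_dpi(dpi):
    """Convert a target DPI to a --scale factor (scale 1 = 72 pixels per inch)."""
    return dpi / 72


def build_render_cmd(input_pdf, out_dir, dpi=None, rotation=0, grayscale=False):
    """Build the argument list for `pypdfium2 render`."""
    if rotation not in (0, 90, 180, 270):
        raise ValueError("rotation must be 0, 90, 180 or 270")
    # The input goes first so multi-value options cannot consume it.
    cmd = ["pypdfium2", "render", input_pdf, "--output", out_dir]
    if dpi is not None:
        cmd += ["--scale", str(round(scale_for_dpi(dpi), 4))]
    if rotation:
        cmd += ["--rotation", str(rotation)]
    if grayscale:
        cmd.append("--grayscale")
    return cmd


cmd = build_render_cmd("report.pdf", "out", dpi=300, rotation=90, grayscale=True)
print(shlex.join(cmd))

# Worked example: a 595 x 842 pt page (roughly A4) at scale 2 is 1190 x 1684 pixels.
print(595 * 2, 842 * 2)
```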

Page Tiler

$ pypdfium2 tile --help
usage: pypdfium2 tile [-h] [--password PASSWORD] --output OUTPUT --rows ROWS
                      --cols COLS --width WIDTH --height HEIGHT [--unit UNIT]
                      input

tile pages (N-up)

positional arguments:
  input                 Input PDF document

options:
  -h, --help            show this help message and exit
  --password PASSWORD   A password to unlock the PDF, if encrypted
  --output OUTPUT, -o OUTPUT
                        Target path for the new document
  --rows ROWS, -r ROWS  Number of rows (horizontal tiles)
  --cols COLS, -c COLS  Number of columns (vertical tiles)
  --width WIDTH         Target width
  --height HEIGHT       Target height
  --unit UNIT, -u UNIT  Unit for target width and height (pt, mm, cm, in)
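
All of `--output`, `--rows`, `--cols`, `--width` and `--height` are required here. The sketch below builds a 2x2 call (four source pages per sheet) on the assumption that width and height describe the output sheet, here 210 x 297 mm. The helper `build_tile_cmd` and the file names are hypothetical.

```python
import shlex

UNITS = ("pt", "mm", "cm", "in")  # units listed in the help text


def build_tile_cmd(input_pdf, output, rows, cols, width, height, unit=None):
    """Build the argument list for `pypdfium2 tile`."""
    if rows < 1 or cols < 1:
        raise ValueError("rows and cols must be at least 1")
    cmd = [
        "pypdfium2", "tile", input_pdf,
        "--output", output,
        "--rows", str(rows),
        "--cols", str(cols),
        "--width", str(width),
        "--height", str(height),
    ]
    if unit is not None:
        if unit not in UNITS:
            raise ValueError(f"unit must be one of {UNITS}")
        cmd += ["--unit", unit]
    return cmd


cmd = build_tile_cmd("slides.pdf", "handout.pdf", rows=2, cols=2,
                     width=210, height=297, unit="mm")
print(shlex.join(cmd))
```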

TOC Reader

$ pypdfium2 toc --help
usage: pypdfium2 toc [-h] [--password PASSWORD] [--n-digits N_DIGITS]
                     [--max-depth MAX_DEPTH]
                     input

print table of contents

positional arguments:
  input                 Input PDF document

options:
  -h, --help            show this help message and exit
  --password PASSWORD   A password to unlock the PDF, if encrypted
  --n-digits N_DIGITS   Number of digits to which coordinates/sizes shall be
                        rounded
  --max-depth MAX_DEPTH
                        Maximum recursion depth to consider when parsing the
                        table of contents