Shell API
pypdfium2 can also be used from the command line.
Version
$ pypdfium2 --version
pypdfium2 4.30.0+12.g6736a5d
pdfium 130.0.6721.0 at /opt/hostedtoolcache/Python/3.12.6/x64/lib/python3.12/site-packages/pypdfium2_raw/libpdfium.so
Main Help
$ pypdfium2 --help
usage: pypdfium2 [-h] [--version]
{arrange,attachments,extract-images,extract-text,imgtopdf,pageobjects,pdfinfo,render,tile,toc}
...
Command line interface to the pypdfium2 library (Python binding to PDFium)
positional arguments:
{arrange,attachments,extract-images,extract-text,imgtopdf,pageobjects,pdfinfo,render,tile,toc}
arrange rearrange/merge documents
attachments list/extract/edit embedded files
extract-images extract images
extract-text extract text
imgtopdf convert images to PDF
pageobjects print info on page objects
pdfinfo print info on document and pages
render rasterize pages
tile tile pages (N-up)
toc print table of contents
options:
-h, --help show this help message and exit
--version, -v show program's version number and exit
Arranger
$ pypdfium2 arrange --help
usage: pypdfium2 arrange [-h] [--pages PAGES [PAGES ...]]
[--passwords PASSWORDS [PASSWORDS ...]] --output
OUTPUT
inputs [inputs ...]
rearrange/merge documents
positional arguments:
inputs Sequence of PDF files.
options:
-h, --help show this help message and exit
--pages PAGES [PAGES ...]
Sequence of page texts, defining the pages to include
from each PDF. Use '_' as placeholder for all pages.
--passwords PASSWORDS [PASSWORDS ...]
Passwords to unlock encrypted PDFs. Any placeholder
may be used for non-encrypted documents.
--output OUTPUT, -o OUTPUT
Target path for the output document
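As a rough illustration of how the options above combine, the following Python sketch assembles an argument list for `pypdfium2 arrange`. The helper name `build_arrange_cmd`, the file names, and the page text `1-3` are assumptions made for illustration. Only the flags themselves are taken from the help output above, and the sketch presumes one page text per input PDF.

```python
import shlex


def build_arrange_cmd(inputs, output, pages=None, passwords=None):
    """Build an argv list for `pypdfium2 arrange` (hypothetical helper)."""
    if not inputs:
        raise ValueError("arrange needs at least one input PDF")
    cmd = ["pypdfium2", "arrange", *inputs]
    if pages:
        # Presumably one page text per input; '_' stands for all pages.
        cmd += ["--pages", *pages]
    if passwords:
        # Any placeholder may be used for non-encrypted documents.
        cmd += ["--passwords", *passwords]
    cmd += ["--output", output]
    return cmd


cmd = build_arrange_cmd(
    ["a.pdf", "b.pdf"],      # hypothetical input files
    "merged.pdf",            # hypothetical output path
    pages=["1-3", "_"],      # "1-3" is an assumed page-text syntax
)
print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` on a system where pypdfium2 is installed. Building a list rather than a shell string avoids quoting problems with file names that contain spaces.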
Attachments
$ pypdfium2 attachments --help
usage: pypdfium2 attachments [-h] [--password PASSWORD]
input {list,extract,edit} ...
list/extract/edit embedded files
positional arguments:
input Input PDF document
{list,extract,edit}
options:
-h, --help show this help message and exit
--password PASSWORD A password to unlock the PDF, if encrypted
$ pypdfium2 attachments file.pdf list --help
usage: pypdfium2 attachments input list [-h]
options:
-h, --help show this help message and exit
$ pypdfium2 attachments file.pdf extract --help
usage: pypdfium2 attachments input extract [-h] [--numbers NUMBERS]
--output-dir OUTPUT_DIR
options:
-h, --help show this help message and exit
--numbers NUMBERS
--output-dir OUTPUT_DIR, -o OUTPUT_DIR
$ pypdfium2 attachments file.pdf edit --help
usage: pypdfium2 attachments input edit [-h] [--del-numbers DEL_NUMBERS]
[--add-files F [F ...]] --output
OUTPUT
options:
-h, --help show this help message and exit
--del-numbers DEL_NUMBERS, -d DEL_NUMBERS
--add-files F [F ...], -a F [F ...]
--output OUTPUT, -o OUTPUT
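The attachments command is layered: the shared `--password` option and the input PDF come first, followed by one of the actions `list`, `extract` or `edit` with its own options. A minimal Python sketch of that ordering follows. The helper `build_attachments_cmd` and all file names are hypothetical. The values accepted by `--numbers` and `--del-numbers` are not documented above, so they are left out.

```python
ACTIONS = ("list", "extract", "edit")


def build_attachments_cmd(pdf, action, *action_args, password=None):
    """Build an argv list for `pypdfium2 attachments` (hypothetical helper)."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action {action!r}, expected one of {ACTIONS}")
    cmd = ["pypdfium2", "attachments"]
    if password is not None:
        # Placed before the input so it is parsed by the parent command.
        cmd += ["--password", password]
    # Action-specific options follow the action name.
    cmd += [pdf, action, *action_args]
    return cmd


list_cmd = build_attachments_cmd("file.pdf", "list")
extract_cmd = build_attachments_cmd(
    "file.pdf", "extract", "--output-dir", "attachments/"
)
edit_cmd = build_attachments_cmd(
    "file.pdf", "edit", "--add-files", "notes.txt", "--output", "edited.pdf"
)
for cmd in (list_cmd, extract_cmd, edit_cmd):
    print(" ".join(cmd))
```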
Image Extractor
$ pypdfium2 extract-images --help
usage: pypdfium2 extract-images [-h] [--password PASSWORD] [--pages PAGES]
--output-dir OUTPUT_DIR
[--max-depth MAX_DEPTH] [--use-bitmap]
[--format FORMAT] [--render]
input
extract images
positional arguments:
input Input PDF document
options:
-h, --help show this help message and exit
--password PASSWORD A password to unlock the PDF, if encrypted
--pages PAGES Page numbers and ranges to include
--output-dir OUTPUT_DIR, -o OUTPUT_DIR
Output directory to take the extracted images
--max-depth MAX_DEPTH
Maximum recursion depth to consider when looking for
page objects.
--use-bitmap Enforce the use of bitmaps rather than attempting a
smart extraction of the image.
--format FORMAT Image format to use when saving bitmaps. (Fallback if
doing smart extraction.)
--render Whether to get rendered bitmaps, taking masks and
transform matrices into account. (Fallback if doing
smart extraction.)
Text Extractor
$ pypdfium2 extract-text --help
usage: pypdfium2 extract-text [-h] [--password PASSWORD] [--pages PAGES]
[--strategy {range,bounded}]
input
extract text
positional arguments:
input Input PDF document
options:
-h, --help show this help message and exit
--password PASSWORD A password to unlock the PDF, if encrypted
--pages PAGES Page numbers and ranges to include
--strategy {range,bounded}
PDFium text extraction strategy (range, bounded).
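The sketch below shows one way to assemble an `extract-text` invocation from Python while rejecting strategies other than the two listed above. The helper `build_extract_text_cmd`, the file name, and the page specification `1,3-5` are assumptions. The help text only says that `--pages` takes page numbers and ranges, without spelling out the syntax.

```python
STRATEGIES = ("range", "bounded")


def build_extract_text_cmd(pdf, pages=None, strategy=None, password=None):
    """Build an argv list for `pypdfium2 extract-text` (hypothetical helper)."""
    if strategy is not None and strategy not in STRATEGIES:
        raise ValueError(f"strategy must be one of {STRATEGIES}, got {strategy!r}")
    cmd = ["pypdfium2", "extract-text", pdf]
    if password is not None:
        cmd += ["--password", password]
    if pages is not None:
        cmd += ["--pages", pages]
    if strategy is not None:
        cmd += ["--strategy", strategy]
    return cmd


# "1,3-5" is an assumed page specification, not confirmed by the help text.
cmd = build_extract_text_cmd("doc.pdf", pages="1,3-5", strategy="bounded")
print(cmd)
```

On a machine with pypdfium2 installed, the list could be passed to `subprocess.run(cmd, capture_output=True, text=True)` to collect whatever the command writes to standard output.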
Image Converter
$ pypdfium2 imgtopdf --help
usage: pypdfium2 imgtopdf [-h] --output OUTPUT [--inline] images [images ...]
convert images to PDF
positional arguments:
images Input images
options:
-h, --help show this help message and exit
--output OUTPUT, -o OUTPUT
Target path for the new PDF
--inline If JPEG, whether to use PDFium's inline loading
function.
Page Objects Info
$ pypdfium2 pageobjects --help
usage: pypdfium2 pageobjects [-h] [--password PASSWORD] [--pages PAGES]
[--n-digits N_DIGITS] [--filter T [T ...]]
[--max-depth MAX_DEPTH] [--info [INFO ...]]
input
print info on page objects
positional arguments:
input Input PDF document
options:
-h, --help show this help message and exit
--password PASSWORD A password to unlock the PDF, if encrypted
--pages PAGES Page numbers and ranges to include
--n-digits N_DIGITS Number of digits to which coordinates/sizes shall be
rounded
--filter T [T ...] Object types to include. Choices: ['?', 'text',
'path', 'image', 'shading', 'form']
--max-depth MAX_DEPTH
Maximum recursion depth to consider when descending
into Form XObjects.
--info [INFO ...] Object details to show (pos, imageinfo).
Document Info
$ pypdfium2 pdfinfo --help
usage: pypdfium2 pdfinfo [-h] [--password PASSWORD] [--pages PAGES]
[--n-digits N_DIGITS]
input
print info on document and pages
positional arguments:
input Input PDF document
options:
-h, --help show this help message and exit
--password PASSWORD A password to unlock the PDF, if encrypted
--pages PAGES Page numbers and ranges to include
--n-digits N_DIGITS Number of digits to which coordinates/sizes shall be
rounded
Renderer
$ pypdfium2 render --help
usage: pypdfium2 render [-h] [--password PASSWORD] [--pages PAGES] --output
OUTPUT [--prefix PREFIX] [--format FORMAT]
[--engine ENGINE_CLS] [--scale SCALE]
[--rotation {0,90,180,270}] [--fill-color C C C C]
[--optimize-mode {lcd,print}]
[--crop CROP CROP CROP CROP]
[--draw-annots | --no-draw-annots]
[--draw-forms | --no-draw-forms]
[--no-antialias {text,image,path} [{text,image,path} ...]]
[--force-halftone]
[--bitmap-maker {native,foreign,foreign_packed,foreign_simple}]
[--grayscale] [--byteorder REV_BYTEORDER]
[--x-channel | --no-x-channel] [--linear [LINEAR]]
[--processes PROCESSES]
[--parallel-strategy {spawn,forkserver,fork}]
[--parallel-lib {mp,ft}] [--parallel-map PARALLEL_MAP]
[--sample-theme] [--path-fill C C C C]
[--path-stroke C C C C] [--text-fill C C C C]
[--text-stroke C C C C] [--fill-to-stroke]
input
rasterize pages
positional arguments:
input Input PDF document
options:
-h, --help show this help message and exit
--password PASSWORD A password to unlock the PDF, if encrypted
--pages PAGES Page numbers and ranges to include
--output OUTPUT, -o OUTPUT
Output directory where the serially numbered images
shall be placed.
--prefix PREFIX Custom prefix for the images. Defaults to the input
filename's stem.
--format FORMAT, -f FORMAT
The image format to use.
--engine ENGINE_CLS The saver engine to use (pil, numpy+cv2)
--scale SCALE Define the resolution of the output images. By
default, one PDF point (1/72in) is rendered to 1x1
pixel. This factor scales the number of pixels that
represent one point.
--rotation {0,90,180,270}
Rotate pages by 90, 180 or 270 degrees.
--fill-color C C C C Color the bitmap will be filled with before rendering.
It shall be given in RGBA format as a sequence of
integers ranging from 0 to 255. Defaults to white.
--optimize-mode {lcd,print}
The rendering optimisation mode. None if not given.
--crop CROP CROP CROP CROP
Amount to crop from (left, bottom, right, top).
--draw-annots, --no-draw-annots
Whether annotations may be shown (default: true).
--draw-forms, --no-draw-forms
Whether forms may be shown (default: true).
--no-antialias {text,image,path} [{text,image,path} ...]
Item types that shall not be smoothed.
--force-halftone Always use halftone for image stretching.
Bitmap options:
Bitmap config, including pixel format.
--bitmap-maker {native,foreign,foreign_packed,foreign_simple}
The bitmap maker to use.
--grayscale Whether to render in grayscale mode (no colors).
--byteorder REV_BYTEORDER
Whether to use BGR or RGB byteorder (default:
conditional).
--x-channel, --no-x-channel
Whether to prefer BGRx/RGBx over BGR/RGB (default:
conditional).
Parallelization:
Options for rendering with multiple processes.
--linear [LINEAR] Render non-parallel if page count is less or equal to
the specified value (default is conditional). If this
flag is given without a value, then render linear
regardless of document length.
--processes PROCESSES
The maximum number of parallel rendering processes.
Defaults to the number of CPU cores.
--parallel-strategy {spawn,forkserver,fork}
The process start method to use. ('fork' is
discouraged due to stability issues.)
--parallel-lib {mp,ft}
The parallelization module to use (mp =
multiprocessing, ft = concurrent.futures).
--parallel-map PARALLEL_MAP
The map function to use (backend specific, the default
is an iterative map).
Forced color scheme:
Options for using pdfium's forced color scheme renderer. Deprecated,
considered not useful.
--sample-theme Use a dark background sample theme as base. Explicit
color params override selectively.
--path-fill C C C C
--path-stroke C C C C
--text-fill C C C C
--text-stroke C C C C
--fill-to-stroke Only draw borders around fill areas using the
`path_stroke` color, instead of filling with the
`path_fill` color.
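Since `--scale` is defined relative to PDF points (1/72 in), a target resolution in DPI corresponds to a scale of `dpi / 72`. The following Python sketch applies that conversion and validates `--rotation` and `--fill-color` against the constraints stated above. The helper names `dpi_to_scale` and `build_render_cmd`, the file names, and the example values are assumptions for illustration.

```python
POINTS_PER_INCH = 72  # one PDF point is 1/72 in


def dpi_to_scale(dpi):
    """Convert a target resolution in DPI to a --scale factor."""
    if dpi <= 0:
        raise ValueError("dpi must be positive")
    return dpi / POINTS_PER_INCH


def build_render_cmd(pdf, output_dir, dpi=72, rotation=0, fill_color=None):
    """Build an argv list for `pypdfium2 render` (hypothetical helper)."""
    if rotation not in (0, 90, 180, 270):
        raise ValueError("rotation must be 0, 90, 180 or 270")
    cmd = ["pypdfium2", "render", pdf, "--output", output_dir,
           "--scale", str(dpi_to_scale(dpi))]
    if rotation:
        cmd += ["--rotation", str(rotation)]
    if fill_color is not None:
        # RGBA, four integers in the range 0..255.
        if len(fill_color) != 4 or any(not 0 <= c <= 255 for c in fill_color):
            raise ValueError("fill_color must be four integers in 0..255")
        cmd += ["--fill-color", *(str(c) for c in fill_color)]
    return cmd


# 144 DPI is twice the default of 72, so the scale factor is 2.0.
cmd = build_render_cmd("doc.pdf", "out/", dpi=144, rotation=90,
                       fill_color=(255, 255, 255, 255))
print(cmd)
```

With these inputs, a page rendered at scale 2.0 has twice as many pixels along each axis as at the default scale.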
Page Tiler
$ pypdfium2 tile --help
usage: pypdfium2 tile [-h] [--password PASSWORD] --output OUTPUT --rows ROWS
--cols COLS --width WIDTH --height HEIGHT [--unit UNIT]
input
tile pages (N-up)
positional arguments:
input Input PDF document
options:
-h, --help show this help message and exit
--password PASSWORD A password to unlock the PDF, if encrypted
--output OUTPUT, -o OUTPUT
Target path for the new document
--rows ROWS, -r ROWS Number of rows (horizontal tiles)
--cols COLS, -c COLS Number of columns (vertical tiles)
--width WIDTH Target width
--height HEIGHT Target height
--unit UNIT, -u UNIT Unit for target width and height (pt, mm, cm, in)
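The tiler requires `--rows`, `--cols`, `--width` and `--height` together, with `--unit` selecting how the target size is interpreted. As a minimal sketch, the Python helper below builds a 2x2 layout on a 210 x 297 mm target. The function name `build_tile_cmd`, the file names, and the chosen dimensions are assumptions for illustration.

```python
UNITS = ("pt", "mm", "cm", "in")


def build_tile_cmd(pdf, output, rows, cols, width, height, unit=None,
                   password=None):
    """Build an argv list for `pypdfium2 tile` (hypothetical helper)."""
    if rows < 1 or cols < 1:
        raise ValueError("rows and cols must be at least 1")
    if unit is not None and unit not in UNITS:
        raise ValueError(f"unit must be one of {UNITS}, got {unit!r}")
    cmd = ["pypdfium2", "tile", pdf]
    if password is not None:
        cmd += ["--password", password]
    cmd += ["--output", output,
            "--rows", str(rows), "--cols", str(cols),
            "--width", str(width), "--height", str(height)]
    if unit is not None:
        cmd += ["--unit", unit]
    return cmd


# Four source pages per output page, target size given in millimetres.
cmd = build_tile_cmd("slides.pdf", "handout.pdf", rows=2, cols=2,
                     width=210, height=297, unit="mm")
print(cmd)
```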
TOC Reader
$ pypdfium2 toc --help
usage: pypdfium2 toc [-h] [--password PASSWORD] [--n-digits N_DIGITS]
[--max-depth MAX_DEPTH]
input
print table of contents
positional arguments:
input Input PDF document
options:
-h, --help show this help message and exit
--password PASSWORD A password to unlock the PDF, if encrypted
--n-digits N_DIGITS Number of digits to which coordinates/sizes shall be
rounded
--max-depth MAX_DEPTH
Maximum recursion depth to consider when parsing the
table of contents