Shell API
pypdfium2 can also be used from the command line.
Version
$ pypdfium2 --version
pypdfium2 5.0.0b1
pdfium 134.0.6996.0 at /opt/hostedtoolcache/Python/3.12.8/x64/lib/python3.12/site-packages/pypdfium2_raw/libpdfium.so
Main Help
$ pypdfium2 --help
usage: pypdfium2 [-h] [--version]
{arrange,attachments,extract-images,extract-text,imgtopdf,pageobjects,pdfinfo,render,tile,toc}
...
Command line interface to the pypdfium2 library (Python binding to PDFium)
positional arguments:
{arrange,attachments,extract-images,extract-text,imgtopdf,pageobjects,pdfinfo,render,tile,toc}
arrange rearrange/merge documents
attachments list/extract/edit embedded files
extract-images extract images
extract-text extract text
imgtopdf convert images to PDF
pageobjects print info on pageobjects
pdfinfo print info on document and pages
render rasterize pages
tile tile pages (N-up)
toc print table of contents
options:
-h, --help show this help message and exit
--version, -v show program's version number and exit
Arranger
$ pypdfium2 arrange --help
usage: pypdfium2 arrange [-h] [--pages PAGES [PAGES ...]]
[--passwords PASSWORDS [PASSWORDS ...]] --output
OUTPUT
inputs [inputs ...]
rearrange/merge documents
positional arguments:
inputs Sequence of PDF files.
options:
-h, --help show this help message and exit
--pages PAGES [PAGES ...]
Sequence of page texts, defining the pages to include
from each PDF. Use '_' as placeholder for all pages.
--passwords PASSWORDS [PASSWORDS ...]
Passwords to unlock encrypted PDFs. Any placeholder
may be used for non-encrypted documents.
--output OUTPUT, -o OUTPUT
Target path for the output document
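The help text above implies that each input file is matched by one page text, with '_' standing in for "all pages". As a rough illustration, the following sketch assembles such a command line in Python. The helper name build_arrange_command, the file names, and the page text "1-3" are assumptions made for this example; the help text does not spell out the page text syntax.

```python
def build_arrange_command(inputs, pages, output):
    """Return an argv list for `pypdfium2 arrange` (hypothetical helper).

    `pages` holds one page text per input file. None means "all pages"
    and is emitted as the '_' placeholder mentioned in the help text.
    """
    if len(pages) != len(inputs):
        raise ValueError("expected one page text (or None) per input file")
    page_texts = ["_" if text is None else text for text in pages]
    return ["pypdfium2", "arrange", *inputs, "--pages", *page_texts, "--output", output]


# Hypothetical file names; "1-3" is an assumed example of a page range.
cmd = build_arrange_command(["a.pdf", "b.pdf"], ["1-3", None], "merged.pdf")
print(" ".join(cmd))
```

Once the input files exist, the resulting list could be handed to subprocess.run(cmd, check=True).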
Attachments
$ pypdfium2 attachments --help
usage: pypdfium2 attachments [-h] [--password PASSWORD]
input {list,extract,edit} ...
list/extract/edit embedded files
positional arguments:
input Input PDF document
{list,extract,edit}
options:
-h, --help show this help message and exit
--password PASSWORD A password to unlock the PDF, if encrypted
$ pypdfium2 attachments file.pdf list --help
usage: pypdfium2 attachments input list [-h]
options:
-h, --help show this help message and exit
$ pypdfium2 attachments file.pdf extract --help
usage: pypdfium2 attachments input extract [-h] [--numbers NUMBERS]
--output-dir OUTPUT_DIR
options:
-h, --help show this help message and exit
--numbers NUMBERS
--output-dir OUTPUT_DIR, -o OUTPUT_DIR
$ pypdfium2 attachments file.pdf edit --help
usage: pypdfium2 attachments input edit [-h] [--del-numbers DEL_NUMBERS]
[--add-files F [F ...]] --output
OUTPUT
options:
-h, --help show this help message and exit
--del-numbers DEL_NUMBERS, -d DEL_NUMBERS
--add-files F [F ...], -a F [F ...]
--output OUTPUT, -o OUTPUT
Image Extractor
$ pypdfium2 extract-images --help
usage: pypdfium2 extract-images [-h] [--password PASSWORD] [--pages PAGES]
--output-dir OUTPUT_DIR
[--max-depth MAX_DEPTH] [--use-bitmap]
[--format FORMAT] [--render]
[--scale-to-original | --no-scale-to-original]
input
extract images
positional arguments:
input Input PDF document
options:
-h, --help show this help message and exit
--password PASSWORD A password to unlock the PDF, if encrypted
--pages PAGES Page numbers and ranges to include
--output-dir OUTPUT_DIR, -o OUTPUT_DIR
Output directory to take the extracted images
--max-depth MAX_DEPTH
Maximum recursion depth to consider when looking for
pageobjects.
--use-bitmap Enforce the use of bitmaps rather than attempting a
smart extraction of the image.
--format FORMAT Image format to use when saving bitmaps. (Fallback if
doing smart extraction.)
--render When --use-bitmap is given, whether to get rendered
bitmaps, taking masks and transform matrices into
account.
--scale-to-original, --no-scale-to-original
When --use-bitmap --render is given, whether to scale
the image so it is rendered at its native resolution,
or close to that. This should improve output quality.
The default is True, but you may opt out.
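Paired switches such as --scale-to-original / --no-scale-to-original behave like a boolean with a default. The sketch below shows one common way to declare such a pair with argparse (Python 3.9 or later). It is a stand-alone illustration, not pypdfium2's actual parser; the parser name is made up, and only the three flags involved are reproduced.

```python
import argparse

# Stand-alone sketch; this is not pypdfium2's own parser definition.
parser = argparse.ArgumentParser(prog="extract-images-sketch")
parser.add_argument("--use-bitmap", action="store_true")
parser.add_argument("--render", action="store_true")
parser.add_argument(
    "--scale-to-original",
    action=argparse.BooleanOptionalAction,  # also registers --no-scale-to-original
    default=True,
)

default_args = parser.parse_args(["--use-bitmap", "--render"])
opted_out = parser.parse_args(["--use-bitmap", "--render", "--no-scale-to-original"])
print(default_args.scale_to_original, opted_out.scale_to_original)
```

With this declaration, omitting the flag leaves the value at True, and only the explicit --no-scale-to-original form turns it off.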
Text Extractor
$ pypdfium2 extract-text --help
usage: pypdfium2 extract-text [-h] [--password PASSWORD] [--pages PAGES]
[--strategy {range,bounded}]
input
extract text
positional arguments:
input Input PDF document
options:
-h, --help show this help message and exit
--password PASSWORD A password to unlock the PDF, if encrypted
--pages PAGES Page numbers and ranges to include
--strategy {range,bounded}
PDFium text extraction strategy (range, bounded).
Image Converter
$ pypdfium2 imgtopdf --help
usage: pypdfium2 imgtopdf [-h] --output OUTPUT [--inline] images [images ...]
convert images to PDF
positional arguments:
images Input images
options:
-h, --help show this help message and exit
--output OUTPUT, -o OUTPUT
Target path for the new PDF
--inline If JPEG, whether to use PDFium's inline loading
function.
Pageobjects Info
$ pypdfium2 pageobjects --help
usage: pypdfium2 pageobjects [-h] [--password PASSWORD] [--pages PAGES]
[--n-digits N_DIGITS] [--filter T [T ...]]
[--max-depth MAX_DEPTH]
[--info {pos,imginfo} [{pos,imginfo} ...]]
input
print info on pageobjects
positional arguments:
input Input PDF document
options:
-h, --help show this help message and exit
--password PASSWORD A password to unlock the PDF, if encrypted
--pages PAGES Page numbers and ranges to include
--n-digits N_DIGITS Number of digits to which coordinates/sizes shall be
rounded
--filter T [T ...] Object types to include. Choices: ['?', 'text',
'path', 'image', 'shading', 'form']
--max-depth MAX_DEPTH
Maximum recursion depth to consider when descending
into Form XObjects.
--info {pos,imginfo} [{pos,imginfo} ...]
Object details to show.
Document Info
$ pypdfium2 pdfinfo --help
usage: pypdfium2 pdfinfo [-h] [--password PASSWORD] [--pages PAGES]
[--n-digits N_DIGITS]
input
print info on document and pages
positional arguments:
input Input PDF document
options:
-h, --help show this help message and exit
--password PASSWORD A password to unlock the PDF, if encrypted
--pages PAGES Page numbers and ranges to include
--n-digits N_DIGITS Number of digits to which coordinates/sizes shall be
rounded
Renderer
$ pypdfium2 render --help
usage: pypdfium2 render [-h] [--password PASSWORD] [--pages PAGES] --output
OUTPUT [--prefix PREFIX] [--format FORMAT]
[--engine ENGINE_CLS] [--scale SCALE]
[--rotation {0,90,180,270}] [--fill-color C C C C]
[--optimize-mode {lcd,print}] [--crop C C C C]
[--draw-annots | --no-draw-annots]
[--draw-forms | --no-draw-forms]
[--no-antialias {text,image,path} [{text,image,path} ...]]
[--force-halftone]
[--bitmap-maker {native,foreign,foreign_packed,foreign_simple}]
[--grayscale] [--byteorder REV_BYTEORDER]
[--x-channel | --no-x-channel]
[--bgra-on-transparency | --no-bgra-on-transparency]
[--linear [LINEAR]] [--processes PROCESSES]
[--parallel-strategy {spawn,forkserver,fork}]
[--parallel-lib {mp,ft}] [--parallel-map PARALLEL_MAP]
[--sample-theme] [--path-fill C C C C]
[--path-stroke C C C C] [--text-fill C C C C]
[--text-stroke C C C C] [--fill-to-stroke]
[--invert-lightness] [--exclude-images]
input
rasterize pages
positional arguments:
input Input PDF document
options:
-h, --help show this help message and exit
--password PASSWORD A password to unlock the PDF, if encrypted
--pages PAGES Page numbers and ranges to include
--output OUTPUT, -o OUTPUT
Output directory where the serially numbered images
shall be placed.
--prefix PREFIX Custom prefix for the images. Defaults to the input
filename's stem.
--format FORMAT, -f FORMAT
The image format to use (default: conditional).
--engine ENGINE_CLS The saver engine to use ('pil', 'numpy+pil',
'numpy+cv2')
--scale SCALE Define the resolution of the output images. By
default, one PDF point (1/72in) is rendered to 1x1
pixel. This factor scales the number of pixels that
represent one point.
--rotation {0,90,180,270}
Rotate pages by 90, 180 or 270 degrees.
--fill-color C C C C Color the bitmap will be filled with before rendering.
Shall be given in RGBA format as a sequence of
integers ranging from 0 to 255. Defaults to white.
--optimize-mode {lcd,print}
The rendering optimisation mode. None if not given.
--crop C C C C Amount to crop from (left, bottom, right, top).
--draw-annots, --no-draw-annots
Whether annotations may be shown (default: true).
--draw-forms, --no-draw-forms
Whether forms may be shown (default: true).
--no-antialias {text,image,path} [{text,image,path} ...]
Item types that shall not be smoothed.
--force-halftone Always use halftone for image stretching.
Bitmap options:
Bitmap config, including pixel format.
--bitmap-maker {native,foreign,foreign_packed,foreign_simple}
The bitmap maker to use.
--grayscale Whether to render in grayscale mode (no colors).
--byteorder REV_BYTEORDER
Whether to use BGR or RGB byteorder (default:
conditional).
--x-channel, --no-x-channel
Whether to prefer BGRx/RGBx over BGR/RGB (default:
conditional).
--bgra-on-transparency, --no-bgra-on-transparency
Whether to use BGRA if there is page content that has
transparency. Note, this makes format selection page-
dependent. As this behavior can be confusing, it is
not currently the default, but recommended for
performance in these cases.
Parallelization:
Options for rendering with multiple processes.
--linear [LINEAR] Render non-parallel if page count is less than or equal to
the specified value (default: 4). If this flag is
given without a value, then render linear regardless
of document length.
--processes PROCESSES
The maximum number of parallel rendering processes.
Defaults to the number of CPU cores.
--parallel-strategy {spawn,forkserver,fork}
The process start method to use. ('fork' is
discouraged due to stability issues.)
--parallel-lib {mp,ft}
The parallelization module to use (mp =
multiprocessing, ft = concurrent.futures).
--parallel-map PARALLEL_MAP
The map function to use (backend specific, the default
is an iterative map).
Flat color scheme:
Options for using pdfium's color scheme renderer. Note that this may
flatten different colors into one, so the usability of this is limited.
Alternatively, consider post-processing with lightness inversion (see
below).
--sample-theme Use a dark background sample theme as base. Explicit
color params override selectively.
--path-fill C C C C
--path-stroke C C C C
--text-fill C C C C
--text-stroke C C C C
--fill-to-stroke When rendering with custom color scheme, only draw
borders around fill areas using the `path_stroke`
color, instead of filling with the `path_fill` color.
This is actually recommended, since with a single fill
color for paths the boundaries of adjacent fill paths
are less visible.
Post processing:
Options to post-process rendered images. Note, this may have a strongly
negative impact on performance.
--invert-lightness Invert lightness using the HLS color space (e.g.
white<->black, dark_blue<->light_blue). The intent is
to achieve a dark theme for documents with light
background, while providing better visual results than
classical color inversion or a flat pdfium color
scheme. However, note that --optimize-mode lcd is not
recommended when inverting lightness.
--exclude-images Whether to exclude PDF images from lightness
inversion.
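Two of the options above are easier to follow with numbers. --scale is described as the number of pixels per PDF point (1/72 in), and --invert-lightness as an inversion in the HLS color space. The following sketch works through both using only the standard library. The function names are hypothetical, rounding the pixel size up is an assumption of this sketch, and the renderer's real post-processing may differ in detail.

```python
import colorsys
import math

POINTS_PER_INCH = 72


def scale_to_dpi(scale):
    """One point is 1/72 in, so `scale` pixels per point equals scale * 72 DPI."""
    return scale * POINTS_PER_INCH


def rendered_size(width_pt, height_pt, scale):
    """Pixel size of a page given in points (rounding up is an assumption)."""
    return math.ceil(width_pt * scale), math.ceil(height_pt * scale)


def invert_lightness(rgb):
    """Invert the lightness of an 8-bit RGB color in HLS space.

    Hue and saturation are kept, so white and black swap places while
    a dark blue turns into a light blue.
    """
    r, g, b = (channel / 255 for channel in rgb)
    h, l, s = colorsys.rgb_to_hls(r, g, b)
    r, g, b = colorsys.hls_to_rgb(h, 1 - l, s)
    return tuple(round(channel * 255) for channel in (r, g, b))


print(scale_to_dpi(1))                  # default scale
print(rendered_size(595, 842, 2))       # roughly an A4 page at scale 2
print(invert_lightness((255, 255, 255)))
```

For a target resolution, the relation can be read the other way round: a scale of dpi / 72 gives the desired DPI, for example 300 / 72 for 300 DPI.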
Page Tiler
$ pypdfium2 tile --help
usage: pypdfium2 tile [-h] [--password PASSWORD] --output OUTPUT --rows ROWS
--cols COLS --width WIDTH --height HEIGHT [--unit UNIT]
input
tile pages (N-up)
positional arguments:
input Input PDF document
options:
-h, --help show this help message and exit
--password PASSWORD A password to unlock the PDF, if encrypted
--output OUTPUT, -o OUTPUT
Target path for the new document
--rows ROWS, -r ROWS Number of rows (horizontal tiles)
--cols COLS, -c COLS Number of columns (vertical tiles)
--width WIDTH Target width
--height HEIGHT Target height
--unit UNIT, -u UNIT Unit for target width and height (pt, mm, cm, in)
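The --unit option accepts pt, mm, cm and in, whereas PDF coordinates are measured in points (1/72 in). As a small worked example, the sketch below converts target sizes in each of these units to points. The helper to_points and the table POINTS_PER_UNIT are hypothetical names for this illustration, not part of pypdfium2.

```python
# Conversion factors to PDF points (1 pt = 1/72 in, 1 in = 2.54 cm).
POINTS_PER_UNIT = {
    "pt": 1.0,
    "in": 72.0,
    "cm": 72.0 / 2.54,
    "mm": 72.0 / 25.4,
}


def to_points(value, unit):
    """Convert a length in pt, mm, cm or in to PDF points (hypothetical helper)."""
    try:
        return value * POINTS_PER_UNIT[unit]
    except KeyError:
        raise ValueError(f"unsupported unit: {unit!r}") from None


# An A4 target (210 mm x 297 mm) expressed in points.
a4_width = to_points(210, "mm")
a4_height = to_points(297, "mm")
print(round(a4_width, 2), round(a4_height, 2))
```

On the command line the same target would be given directly as --width 210 --height 297 --unit mm; the conversion is only shown to make the units comparable.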
TOC Reader
$ pypdfium2 toc --help
usage: pypdfium2 toc [-h] [--password PASSWORD] [--n-digits N_DIGITS]
[--max-depth MAX_DEPTH]
input
print table of contents
positional arguments:
input Input PDF document
options:
-h, --help show this help message and exit
--password PASSWORD A password to unlock the PDF, if encrypted
--n-digits N_DIGITS Number of digits to which coordinates/sizes shall be
rounded
--max-depth MAX_DEPTH
Maximum recursion depth to consider when parsing the
table of contents
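To illustrate what a maximum recursion depth means for a nested table of contents, here is a minimal sketch that walks a hypothetical bookmark tree and stops descending at a given depth. The dictionary layout, the function walk_toc, and the convention that levels start at 0 are all assumptions made for this example; they do not describe pypdfium2's internal data structures.

```python
def walk_toc(items, max_depth, level=0):
    """Yield (level, title) pairs, skipping anything nested at `max_depth` or deeper."""
    if level >= max_depth:
        return
    for item in items:
        yield level, item["title"]
        yield from walk_toc(item.get("children", []), max_depth, level + 1)


# Hypothetical bookmark tree with three nesting levels.
toc = [
    {"title": "Intro", "children": [
        {"title": "Scope", "children": [{"title": "Details"}]},
    ]},
    {"title": "Usage"},
]

for level, title in walk_toc(toc, max_depth=2):
    print("    " * level + title)
```

With max_depth=2 the third-level entry "Details" is not visited, which is the kind of cut-off the option describes.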