Changelog
Warning
This is a documentation build for an unreleased version of pypdfium2, so it is possible that new changes are not logged yet.
Next
API changes
Rendering / Bitmap
Removed PdfDocument.render() (see deprecation rationale in v4.25 changelog). Instead, use PdfPage.render() with a loop or process pool.
Removed PdfBitmap.get_info() and PdfBitmapInfo, which existed mainly on behalf of data transfer with PdfDocument.render(). Instead, take the info from the PdfBitmap object directly. (If using an adapter that copies, you may want to store the relevant info in variables to avoid holding a reference to the original buffer.)
PdfBitmap.fill_rect(): Changed argument order. The color parameter now goes first.
PdfBitmap.to_numpy(): If the bitmap is single-channel (grayscale), use a 2D shape to avoid needlessly wrapping each pixel value in a list.
PdfBitmap.from_pil(): Removed the recopy parameter.
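As a rough illustration of the PdfBitmap.to_numpy() shape change, here is a pure-Python sketch that reads a padded pixel buffer row by row. The helper rows_from_buffer is hypothetical and not part of pypdfium2. In NumPy terms, a single-channel bitmap now maps to a (height, width) array instead of (height, width, 1). The stride handling assumes that rows may carry padding bytes beyond width * n_channels.

```python
def rows_from_buffer(buf, width, height, stride, n_channels):
    """Hypothetical pure-Python stand-in for a bitmap-to-array conversion.

    Single-channel buffers give a 2D result (rows of plain ints); multi-channel
    buffers give a 3D result (rows of per-pixel tuples). `stride` is the number
    of bytes per row, which may exceed width * n_channels because of padding.
    """
    rows = []
    for y in range(height):
        start = y * stride
        line = buf[start:start + width * n_channels]  # skip trailing row padding
        if n_channels == 1:
            rows.append(list(line))  # grayscale: one int per pixel, no extra nesting
        else:
            rows.append([
                tuple(line[x * n_channels:(x + 1) * n_channels])
                for x in range(width)
            ])
    return rows
```

For a 2x2 grayscale buffer with a stride of 4, this yields [[10, 20], [30, 40]] rather than [[[10], [20]], [[30], [40]]].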
Pageobjects
Renamed PdfObject.get_pos() to .get_bounds().
Renamed PdfImage.get_size() to .get_px_size().
PdfImage.extract(): Removed the fb_render option because it does not fit in this API. If the image’s rendered bitmap is desired, use .get_bitmap(render=True) in the first place.
PdfDocument.get_toc(): Replaced the PdfOutlineItem namedtuple with method-oriented wrapper classes PdfBookmark and PdfDest, so callers may retrieve only the properties they actually need. This is closer to pdfium’s original API and exposes the underlying raw objects. Provides the signed count as-is rather than splitting it into n_kids and is_closed. Also distinguishes between dest is None and a dest with unknown mode.
Renamed misleading PdfMatrix.mirror() parameters v, h to invert_x, invert_y, as the terms horizontal/vertical flip commonly refer to the transformation applied, not the axis around which the content is flipped (i.e. the previous v meant flipping around the Y axis, which is vertical, but the resulting transform inverts the X coordinates and is thus actually horizontal). No behavior change if you did not use keyword arguments.
PdfTextPage.get_text_range(): Removed implicit translation of default calls to .get_text_bounded(), as pdfium reverted FPDFText_GetText() to UCS-2, which resolves the allocation concern. However, callers are encouraged to explicitly use .get_text_bounded() for full Unicode support.
In pageobjects.py, renamed an effectively-internal namedtuple and exception and made them private.
Removed legacy version flags.
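To make the PdfMatrix.mirror() renaming concrete, the following sketch builds a mirroring transform in the six-value PDF matrix layout (a, b, c, d, e, f) and applies it to a point. The helpers mirror_matrix and apply_matrix are hypothetical and only mimic the semantics described above. With invert_x, X coordinates are negated, which is a flip around the vertical Y axis.

```python
def mirror_matrix(invert_x, invert_y):
    """Hypothetical helper: a PDF-style matrix (a, b, c, d, e, f) that mirrors points.

    invert_x=True negates X coordinates (a flip around the vertical Y axis);
    invert_y=True negates Y coordinates (a flip around the horizontal X axis).
    """
    a = -1 if invert_x else 1
    d = -1 if invert_y else 1
    return (a, 0, 0, d, 0, 0)


def apply_matrix(matrix, x, y):
    """Transform a point using the PDF convention x' = a*x + c*y + e, y' = b*x + d*y + f."""
    a, b, c, d, e, f = matrix
    return (a * x + c * y + e, b * x + d * y + f)
```

Under the old keyword names, mirror_matrix(True, False) corresponds to v=True. It maps (3, 4) to (-3, 4), which is a horizontal flip of the content.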
Improvements and new features
Added PdfPosConv and PdfBitmap.get_posconv(page) helper for bidirectional translation between page and bitmap coordinates.
Added PdfObject.get_quad_points() to get the corner points of an image or text object.
Exposed PdfPage.flatten() (previously semi-private _flatten()), after having found out how to correctly use it. Added check and updated docs accordingly.
With PdfImage.get_bitmap(render=True), added scale_to_original option (defaults to True) to temporarily scale the image to its native pixel size. This should improve output quality and make the API substantially more useful. Thanks to Lei Zhang for the suggestion.
Added context manager support to PdfDocument, so it can be used in a with-statement, because opening from a file path binds a file descriptor (usually on the C side), which should be released explicitly, given OS limits.
If document loading failed, err_code is now assigned to the PdfiumError instance so callers may programmatically handle the error subtype.
In PdfPage.render(), added a new option use_bgra_on_transparency. If there is page content with transparency, using BGR(x) may slow down PDFium. Therefore, it is recommended to set this option to True if dynamic (page-dependent) pixel format selection is acceptable. Alternatively, you might want to use only BGRA via force_bitmap_format=pypdfium2.raw.FPDFBitmap_BGRA (at the cost of occupying more memory compared to BGR).
In PdfBitmap.new_*() methods, avoid use of .from_raw(), and instead call the constructor directly, as most parameters are already known on the caller side when creating a bitmap.
In the rendering CLI, added --invert-lightness --exclude-images post-processing options to render with selective lightness inversion. This may be useful to achieve a “dark theme” for light PDFs while preserving different colors, but comes at the cost of performance. (PDFium also provides a color scheme option, but this only allows you to set colors for certain object types, which are then forced on all instances of the type in question. This may flatten different colors into one, leading to a loss of visual information.)
Corrected some null pointer checks: we have to use bool(ptr) rather than ptr is None.
Improved startup performance by deferring imports of optional dependencies to the point where they are actually needed, to avoid overhead if you do not use them.
Simplified version classes (no API change expected).
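As a rough picture of what PdfPosConv is for, the sketch below converts between page space and bitmap space. Page space has its origin at the bottom left, with y pointing up, in PDF units. Bitmap space has its origin at the top left, with y pointing down, in pixels. PosConvSketch is a hypothetical class and not pypdfium2's implementation. It assumes an unrotated, uncropped render at a uniform scale. Rotation and cropping are deliberately left out, so treat this as an illustration of the idea only.

```python
class PosConvSketch:
    """Hypothetical page <-> bitmap coordinate translator.

    Assumes an unrotated, uncropped render at a uniform `scale` (pixels per
    PDF unit) of a page that is `page_height` PDF units tall.
    """

    def __init__(self, page_height, scale):
        self.page_height = page_height
        self.scale = scale

    def to_bitmap(self, x, y):
        # Scale to pixels and flip the y axis (page origin is bottom-left)
        return x * self.scale, (self.page_height - y) * self.scale

    def to_page(self, x, y):
        # Exact inverse of to_bitmap()
        return x / self.scale, self.page_height - y / self.scale
```

For a US Letter page (792 units tall) rendered at scale 2, the page's top-left corner (0, 792) maps to bitmap pixel (0, 0). Converting a point there and back returns the original page coordinates.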
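The null pointer correction above comes down to standard ctypes behavior, shown here using only the standard library. A NULL instance of a typed pointer class is a ctypes object rather than None, yet it is falsy. The is_null helper is hypothetical.

```python
import ctypes

IntPtr = ctypes.POINTER(ctypes.c_int)


def is_null(ptr):
    """Return True for a NULL ctypes pointer or for None.

    Foreign functions whose restype is a POINTER(...) type return a pointer
    object even for NULL, so `ptr is None` is False there; only the truth
    value reveals NULL. (With a c_void_p restype, ctypes returns None instead,
    which bool() also treats as false.)
    """
    return not bool(ptr)


null = IntPtr()  # a NULL pointer of a typed pointer class
valid = ctypes.pointer(ctypes.c_int(7))
```

Here null is not None, but is_null(null) is True. Checking the truth value covers both the pointer-object case and the None case.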
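In the spirit of the --invert-lightness option, one way to invert lightness while preserving hue is to round-trip through HLS and flip only the L component. This is a sketch using the standard colorsys module and is not the CLI's actual code. The helper name invert_lightness is hypothetical.

```python
import colorsys


def invert_lightness(rgb):
    """Invert the HLS lightness of an 8-bit RGB triple, keeping hue and saturation."""
    r, g, b = (c / 255 for c in rgb)
    h, l, s = colorsys.rgb_to_hls(r, g, b)
    r, g, b = colorsys.hls_to_rgb(h, 1.0 - l, s)  # flip lightness only
    return tuple(round(c * 255) for c in (r, g, b))
```

With this approach, black and white swap and light grays become dark grays. A pure hue such as (255, 0, 0) has a lightness of 0.5, so it is left unchanged, which is how distinct colors survive the inversion.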
Platforms
Experimental Android support added (cf. PEP 738). arm64_v8a, armeabi_v7a, x86_64, x86 are now handled in setup and should implicitly download the right binaries. We do not publish any Android wheels at this time. However, we might want to provide arm64_v8a (and maybe armeabi_v7a) wheels in the future. Note, Android support is provided on a best-effort basis and is largely untested (only arm64 Termux prior to PEP 738 has been tested on the author’s phone). Please report success or failure.
Experimental iOS support added as well (cf. PEP 730). arm64 device and simulator, and x86_64 simulator are now handled and should implicitly download the right binaries. However, this is untested and may not be enough to get all the way through. In particular, the PEP hints that the binary needs to be moved to a Frameworks location, in which case you’d also need to change the library search path. No iOS wheels will be provided at this time. However, if there are testers and an actual demand, iOS arm64 wheels may be enabled in the future.
Note, we have no intent to provide wheels for the simulators (android x86_64/x86, ios arm64_simu/x86_64), as they are only relevant to developers, and installing from source with automatic binary deployment should be roughly equivalent.
Setup
Avoid needlessly calling _get_libc_ver(). Instead, call it only on Linux. A negative side effect of calling this unconditionally is that, on non-Linux platforms, an empty string may be returned, in which case the musllinux handler would be reached, which uses non-public API and isn’t meant to be called on other platforms (though it seems to have passed).
If packaging with PDFIUM_PLATFORM=sourcebuild, forward the platform tag determined by bdist_wheel’s wrapper, rather than using the underlying sysconfig.get_platform() directly. This may provide more accurate results, e.g. on macOS.
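_get_libc_ver() is internal to the setup code, so the following sketch uses the standard library's platform.libc_ver() to illustrate the guard. The libc is only inspected on Linux, so other platforms never reach musl handling. detect_libc is a hypothetical helper. Treating an empty result on Linux as musl is an assumption of this sketch.

```python
import platform
import sys


def detect_libc(sys_platform=None):
    """Hypothetical helper: classify the C library, but only on Linux.

    On other platforms the libc question is skipped entirely, so an empty
    result can never be mistaken for musl.
    """
    if sys_platform is None:
        sys_platform = sys.platform
    if not sys_platform.startswith("linux"):
        return None
    name, _version = platform.libc_ver()
    if name == "glibc":
        return "glibc"
    # Assumption: on Linux, an empty result is treated as a non-glibc libc such as musl
    return name or "musl"
```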
Project
Replaced the bash ./run file with a justfile. Note that the runfile previously did not fail fast and propagate errors, which is potentially dangerous for a release workflow. This had been fixed in the runfile in v5.0.0b1 before introducing the justfile.
CI: Added Linux aarch64 (GH now provides free runners) and Python 3.13 to the test matrix.
Merged tests_old/ back into tests/.
Migrated from deprecated .reuse/dep5 / .reuse/dep5-wheel to the more visible REUSE.toml / REUSE-wheel.toml.
Docs: Improved logic for when to include the unreleased version warning and upcoming changelog.
Bumped minimum pdfium requirement in conda recipe to >6635 (effectively >=6638), due to new errchecks that are not version-guarded.
Cleanly split out conda packaging into its own file, and confined it to the conda/ directory, to avoid polluting the main setup code.
5.0.0b1 (2025-02-03)
Updated PDFium from 6899 to 6996.
See the beta release notes on GitHub.
4.30.1 (2024-12-19)
Updated PDFium from 6462 to 6899.
PdfPage.get_objects(): Don’t register pageobjects as children, because they don’t need to be closed by the caller when part of a page. This avoids excessive caching of weakrefs that are not cleaned up with the object they refer to.
Fixed another dotted filepath blunder in the extract-images CLI. (The PdfImage.extract() API is not affected this time.)
Adapted setup code to bdist_wheel relocation (moved from wheel to setuptools).
Fixed installation with reference bindings (PDFIUM_BINDINGS=reference) by actually including them in the sdist and adding a missing mkdir call. (In older versions, this can be worked around by cloning the repository and creating the missing directory manually before installation.)
Fixed sourcebuild on Windows by syncing patches with pdfium-binaries.
Updated test expectations: due to changes in pdfium, some numbers are now slightly different.
Fixed conda packaging: It is now required to explicitly specify -c defaults with --override-channels, presumably due to an upstream change.
Autorelease: Swapped default condition for minor/patch update, as pypdfium2 changes are likely more API-significant than pdfium updates. Added ability for manual override.
Bumped workflows to Python 3.12.
Updated docs on licensing.
This is expected to be the last release of the v4 series.
4.30.0 (2024-05-09)
Backported bug fixes / corrections from current development branch.
Updated PDFium from 6406 to 6462.
Fixed blunder in PdfImage.extract() producing an incorrect output path for prefixes containing a dot. In the extract-images CLI, this caused all output images of a type to be written to the same path for a document containing a non-extension dot in the filename.
XFA / rendering CLI: Fixed incorrect recognition of document length. pdf.init_forms() must be called before len(pdf).
Made get_text_range() allocation adapt to pdfium version, as FPDFText_GetText() has been reverted to UCS-2. (See v4.28 changelog for background.)
Updated workflows to include both macos-13 and macos-14 in test matrices because v13 is Intel and v14 ARM64 on GH actions. Removed Python 3.7 testing because it is not supported anymore on macos-14 runners.
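One plausible cause of such collisions is pathlib's with_suffix(), which treats everything after the last dot of a file name as an extension and replaces it. The sketch below contrasts that behavior with appending the extension instead. The helper names are hypothetical, and this is not necessarily the exact code that was fixed.

```python
from pathlib import Path


def naive_output_path(prefix, ext):
    # Buggy: "report.v2_0" has the "suffix" ".v2_0", which gets replaced,
    # so every numbered prefix collapses to the same "report.png"
    return Path(prefix).with_suffix(ext)


def safe_output_path(prefix, ext):
    # Append the extension to the full file name instead of replacing a suffix
    prefix = Path(prefix)
    return prefix.with_name(prefix.name + ext)
```

With the prefixes out/report.v2_0 and out/report.v2_1, the naive variant returns out/report.png for both. The safe variant keeps them apart as out/report.v2_0.png and out/report.v2_1.png.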
4.29.0 (2024-04-10)
Updated PDFium from 6337 to 6406.
4.28.0 (2024-03-10)
Updated PDFium from 6281 to 6337.
get_text_range(): Fixed a buffer size regression introduced in v4.26.0, caused by an unexpected behavior change in pdfium (thanks @elonzh for the bug report, #298). Since that change, it is not possible anymore to tell the exact amount of memory needed, so we have to allocate for the worst case. Therefore, while this problem persists, it is recommended to instead use get_text_bounded() where possible.
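Suppose pdfium's output may contain UTF-16 surrogate pairs. A character outside the Basic Multilingual Plane then takes two code units, so a character count alone underestimates the buffer size, and the worst case has to be assumed. The helpers below are hypothetical and only illustrate the arithmetic. They are not pypdfium2's allocation code.

```python
def utf16_units(text):
    """Number of UTF-16 code units (2 bytes each) needed to encode `text`."""
    return len(text.encode("utf-16-le")) // 2


def worst_case_utf16_bytes(n_chars):
    """Buffer size in bytes if each of n_chars characters might need a surrogate pair.

    Assumes 2 code units per character plus 1 code unit for the NUL terminator.
    """
    return (2 * n_chars + 1) * 2
```

"abc" needs 3 code units. "a\U0001F600" is only 2 characters long but also needs 3 code units, because the emoji is encoded as a surrogate pair.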
4.27.0 (2024-02-10)
Updated PDFium from 6233 to 6281.
Added ability to define $CTYPESGEN_PIN when building sdist via ./run craft pypi --sdist, which allows reproducing our sdists when set to the head commit hash of pypdfium2-team/ctypesgen at the time of the build to reproduce. Alternatively, you may patch the relevant pyproject.toml entry yourself and use PDFIUM_PLATFORM=sdist python -m build --sdist as usual.
Set up Dependabot for GH Actions. Updated dependencies accordingly.
4.26.0 (2024-01-10)
Updated PDFium from 6164 to 6233.
Pin ctypesgen in sdist to prevent re-occurrence of #264 / #286. As a drawback, the pin is never committed, so the sdist is not simply reproducible at this time due to dependence on the latest commit hash of the ctypesgen fork at build time.
Wheel tags: Added back manylinux2014 in addition to manylinux_{glibc_ver} to be on the safe side. Suspected relation to the above issues.
4.25.0 (2023-12-10)
Updated PDFium from 6110 to 6164.
Removed multiprocessing from deprecated PdfDocument.render() API and replaced it with linear rendering. See below for more info.
setup: Fixed blunder in headers cache logic that would cause existing headers to be always reused regardless of version. Note, this did not affect release workflows, only local source re-installs.
Show path of linked binary in pypdfium2 -v.
conda: Improved installation docs and channel config.
conda/workflows: Added ability to (re-)build pypdfium2_raw bindings with any given version of pdfium. Fixes #279.
Made reference bindings more universal by including V8, XFA and Skia symbols. This is possible due to the dynamic symbol guards.
Instruct ctypesgen to exclude some unused alias symbols pulled in from struct tags.
Improved issue templates, added pull request template.
Improved ctypesgen (pypdfium2-team fork).
Rationale for PdfDocument.render() deprecation
The parallel rendering API unfortunately was an inherent design mistake: Multiprocessing is not meant to transfer large amounts of pixel data from workers to the main process.
Bitmap transfer is so expensive that it essentially outweighed parallelization, so there was no real performance advantage, only higher memory load.
As a related problem, the worker pool produces bitmaps at an independent speed, regardless of where the receiving iteration might be, so bitmaps could queue up in memory, possibly causing an enormous rise in memory consumption over time. This effect was pronounced e.g. with PNG saving via PIL, as seen in Facebook’s nougat project.
Instead, each bitmap should be processed (e.g. saved) in the job which created it. Only a minimal, final result should be sent back to the main process (e.g. a file path).
This means we cannot reasonably provide a generic parallel renderer; instead it needs to be implemented by callers.
Historically, note that there had been even more faults in the implementation:
Prior to 4.22.0, the pool was always initialized with os.cpu_count() processes by default, even when rendering fewer pages.
Prior to 4.20.0, a full-scale input transfer was conducted on each job (rendering it unusable with bytes input). However, this can and should be done only once on process creation.
pypdfium2’s rendering CLI cleanly re-implements parallel rendering to files. We may want to turn this into an API in the future.
Due to the potential for serious issues as outlined above, we strongly recommend that end users update and dependants bump their minimum requirement to this version. Callers should move away from PdfDocument.render() and use PdfPage.render() instead.
4.24.0 (2023-11-10)
Updated PDFium from 6097 to 6110.
Added GitHub issue templates.
4.23.1 (2023-10-31)
No PDFium update.
Fixed (Test)PyPI upload.
4.23.0 (2023-10-31)
Note: (Test)PyPI upload failed for this release due to an oversight.
Updated PDFium from 6070 to 6097.
Fixed faulty version repr (avoid trailing + if desc is empty).
Merged conda packaging code, including CI and Readme integration.
Updated setup code, mainly to support conda.
Independent bindings cache. Download headers from pdfium. Extract archive members explicitly.
Cleaned up version integration of sourcebuild.
Changed system platform to generate files according to given version, instead of expecting given files.
Added prepared! prefix to platform spec, allowing to install with given files.
Added PDFIUM_BINDINGS=reference to use pre-built bindings when installing from source.
Updated Readme.
4.22.0 (2023-10-19)
Updated PDFium from 6056 to 6070.
Changed PDFIUM_PLATFORM=none to strictly exclude all data files. Added new target system consuming bindings and version files supplied by the caller.
Enhanced integration of separate modules. This blazes the trail for conda packaging. We had to move metadata back to setup.cfg since we need a dynamic project name, which pyproject.toml does not support.
Major improvements to version integration.
Ship version info as JSON files, separately for each submodule. Expose as immutable classes. Legacy members have been retained for backwards compatibility.
Autorelease uses dedicated JSON files for state tracking and control.
Read version info from git describe, providing definite identification.
If a local git repo is not available or git describe failed (e.g. sdist or shallow checkout), fall back to a supplied version file or the autorelease record. However, you are strongly encouraged to provide a setup that works with git describe where possible.
Added musllinux aarch64 wheel. Thanks to @jerbob92.
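The version lookup order described above tries git describe first and then falls back to a supplied version file. It could look roughly like the following. This is a hypothetical sketch: the function name, the JSON layout and the exact git describe flags are assumptions, not pypdfium2's setup code.

```python
import json
import subprocess
from pathlib import Path


def read_version(repo_dir, fallback_file):
    """Hypothetical sketch: prefer `git describe`, else a supplied JSON version file."""
    try:
        result = subprocess.run(
            ["git", "describe", "--tags", "--long", "--dirty"],
            cwd=repo_dir,
            capture_output=True,
            text=True,
            check=True,
        )
        return {"source": "git", "describe": result.stdout.strip()}
    except (OSError, subprocess.CalledProcessError):
        # No git executable, not a repository, or no reachable tag
        # (e.g. an sdist or a shallow checkout): use the supplied file instead
        data = json.loads(Path(fallback_file).read_text())
        data["source"] = "file"
        return data
```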
4.21.0 (2023-10-11)
Updated PDFium from 6002 to 6056.
PdfTextPage.get_text_range(): Correct the allocation in case of excluded/inserted chars, modify scope to prevent pdfium from reading beyond range(index, index+count) (which otherwise it does with leading excluded chars). Update docs to note the two different representations. Thanks to Nikita Rybak for the discovery (#261).
Setup changes (partly ported from the devel branch):
ctypesgen fork: replaced the old, bloated library loader with a new, lean version.
Merged $PDFIUM_VERSION and $PDFIUM_USE_V8 into the existing $PDFIUM_PLATFORM specifier (see Readme for updated description).
Removed the build package from the pyproject build-system requires, where it was unnecessary. Thanks to Anaconda Team.
Split in two separate modules: pypdfium2 for helpers (pure-python), pypdfium2_raw for the core bindings (data files).
Switched PyPI upload to “trusted publishing” (OIDC), which is considered safer. Further, the core maintainers have set up 2FA as requested by PyPI.
Note: Earlier releases may fail to install from source due to API-breaking changes to our ctypesgen fork (see #264). Where possible, avoid source installs and use the wheels instead (the default behavior). If you actually have to do this, consider --no-build-isolation and pre-installed dependencies, including ctypesgen prior to commit 61c638b.
Warning: musllinux wheels prior to pdfium-binaries 6043 might be invalid.
4.21.0b1 (2023-09-14)
Updated PDFium from 5989 to 6002.
4.20.0 (2023-09-10)
This release backports some key fixes/improvements from the development branch.
Updated PDFium from 5975 to 5989.
[V8/XFA] Fixed XFA init. This issue was caused by a typo in a struct field. Thanks to Benoît Blanchon.
[ctypesgen fork] Prevent setting nonexistent struct fields.
[V8/XFA] Expose V8/XFA exclusive members in the bindings file by passing ctypesgen the pre-processor defines in question.
Fixed some major non-API implementation issues with multipage rendering:
Avoid full state data transfer and object re-initialization for each job. Instead, use a pool initializer and exploit global variables. This also makes bytes input tolerable for parallel rendering.
In the CLI, use a custom converter to save directly in workers instead of serializing bitmaps to the main process.
Set pdfium version fields to unknown for PDFIUM_PLATFORM=none (sdist). This prevents encoding a potentially incorrect version. Also improve CLI version print.
Fixed sourcebuild with system libraries.
Fixed RTD build (system_packages option removal).
Attempt to fix automatic GH pages rebuild on release.
4.19.0 (2023-08-28)
Updated PDFium from 5868 to 5975.
Reset main branch to stable and shifted v5 development to a branch, so that pdfium updates (and possibly bug fixes) can still be handled. v5 development is delayed and unexpectedly tough, so this seemed necessary.
The automated schedule has been slowed down from weekly to monthly for the time being. Further manual releases may be triggered as necessary.
4.18.0 (2023-07-04)
Updated PDFium from 5854 to 5868.
4.17.0 (2023-06-27)
Updated PDFium from 5841 to 5854.
4.16.0 (2023-06-20)
Updated PDFium from 5827 to 5841.
4.15.0 (2023-06-13)
Updated PDFium from 5813 to 5827.
In helpers, closing a parent object now automatically closes the children to ensure correct order. This notably enhances safety of closing and absorbs the common mistake of closing a parent but missing child close calls. See commit eb07605 for more info.
In init_forms(), attempt to call FPDF_LoadXFA() and warn on failure, though as of this writing it always fails.
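The following sketch shows the idea behind closing children together with their parent. Each object keeps weak references to its children and closes any that are still alive before closing itself. Closable is a hypothetical class and not pypdfium2's finalizer machinery. References to children that have already been collected stay in the list until the parent closes. This is similar in spirit to the weakref buildup that the v4.30.1 entry mentions for page objects.

```python
import weakref


class Closable:
    """Hypothetical object that closes its still-alive children before itself."""

    def __init__(self, name, log, parent=None):
        self.name = name
        self.log = log
        self.closed = False
        self._kids = []  # weak references, so children can still be garbage collected
        if parent is not None:
            parent._kids.append(weakref.ref(self))

    def close(self):
        if self.closed:
            return
        for ref in self._kids:
            kid = ref()
            if kid is not None:
                kid.close()  # children first, so they never outlive their parent
        self._kids.clear()
        self.log.append(self.name)
        self.closed = True
```

Closing only the top-level document in a document → page → textpage chain closes all three in the order textpage, page, doc.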
4.14.0 (2023-06-06)
Updated PDFium from 5799 to 5813.
4.13.0 (2023-05-30)
Updated PDFium from 5786 to 5799.
4.12.0 (2023-05-23)
Updated PDFium from 5772 to 5786.
4.11.0 (2023-05-16)
Updated PDFium from 5758 to 5772.
In PdfDocument.render(), fixed a bad bitmap.close() call that would lead to a downstream use after free when using the combination of foreign bitmap and no-copy conversion. Using foreign bitmaps was not the default and expressly not recommended.
4.10.0 (2023-05-09)
Updated PDFium from 5744 to 5758.
4.9.0 (2023-05-02)
Updated PDFium from 5731 to 5744.
4.8.0 (2023-04-25)
Updated PDFium from 5715 to 5731.
PdfTextPage.get_rect(): Added missing return code check and updated docs regarding dependence on count_rects(). Fixed related test code that was broken but disabled by accident (missing asserts). Thanks to Guy Rosin for reporting #207.
Added PdfImage.get_size() wrapping the new pdfium function FPDFImageObj_GetImagePixelSize(), which is faster than getting image size through the metadata.
build_pdfium.py --use-syslibs: Changed sysroot="/" (invalid) to use_sysroot=false (valid). This allows us to remove a botched patch.
4.7.0 (2023-04-18)
Updated PDFium from 5705 to 5715.
Fixed PdfPage.remove_obj() wrongly retaining the page as parent in the finalizer hierarchy.
4.6.0 (2023-04-11)
Updated PDFium from 5692 to 5705.
4.5.0 (2023-04-04)
Updated PDFium from 5677 to 5692.
In pdfium-binaries, forms init for V8/XFA enabled builds was fixed by correctly setting up XFA on library init (see pdfium-binaries#105). Updated pypdfium2’s support model accordingly.
4.4.0 (2023-03-28)
Updated PDFium from 5664 to 5677.
4.3.0 (2023-03-21)
Updated PDFium from 5648 to 5664.
Fixed forms rendering in the multi-page renderer by initializing a formenv in worker jobs if the triggering document has one.
4.2.0 (2023-03-14)
Updated PDFium from 5633 to 5648.
API-breaking changes around forms code, necessary to fix conceptual issues. Closes #182.
may_init_forms parameter replaced with init_forms(), so that a custom form config can be provided.
formtype attribute replaced with get_formtype(). Previously, formtype would only be set correctly on formenv init, which caused confusion for documents that have forms but no formenv was initialized.
PdfPage.get_*box() functions now provide an option to disable fallbacks. Closes #187.
Some formerly hidden utilities are now exposed in the new namespace pypdfium2.internal.
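For context on the get_*box() fallbacks, the PDF specification defines a default for each secondary page box. CropBox defaults to MediaBox, while BleedBox, TrimBox and ArtBox default to CropBox. A hypothetical resolver over explicitly set boxes, with fallbacks that can be switched off, might look like the sketch below. It is not pypdfium2's implementation, and the fallback_ok parameter name here is illustrative only.

```python
# Page-box inheritance per the PDF specification: CropBox defaults to
# MediaBox; BleedBox, TrimBox and ArtBox default to CropBox.
_PARENT = {"crop": "media", "bleed": "crop", "trim": "crop", "art": "crop"}


def resolve_box(boxes, name, fallback_ok=True):
    """Hypothetical resolver; `boxes` maps box names to explicitly set rects."""
    box = boxes.get(name)
    if box is not None or not fallback_ok:
        return box  # with fallbacks disabled, a missing box stays None
    parent = _PARENT.get(name)
    return None if parent is None else resolve_box(boxes, parent)
```

If a page only sets a MediaBox, resolving "trim" yields the MediaBox. With fallbacks disabled, the same lookup returns None, telling the caller that no TrimBox was set explicitly.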
4.1.0 (2023-03-07)
Updated PDFium from 5619 to 5633.
The PdfDocument parameter may_init_forms is now False by default.
4.0.0 (2023-02-28)
Updated PDFium from 5579 to 5619.
Full support model rewrite. Many existing features changed and new helpers added. Numerous bugs fixed on the way. Read the updated documentation to migrate your code.
The raw API is now isolated in a separate namespace (pypdfium2.raw). Moreover, the raw API bindings do not implicitly encode strings anymore (pypdfium2 is now built with a patched version of ctypesgen by default).
Helper objects now automatically resolve to the underlying raw object if used as ctypes function parameter.
Overhauled the code base to use pathlib and f-strings.
Updated wheel tags.
Improved command-line interface, setup code, and documentation.
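Automatic resolution of helper objects in ctypes calls can be built on ctypes' _as_parameter_ hook. ctypes consults this hook when an argument is not directly convertible to the declared type. The sketch below assumes that mechanism. Handle is a hypothetical class, and the foreign function is a ctypes callback wrapping a Python lambda, so no external library is needed.

```python
import ctypes


class Handle:
    """Hypothetical helper wrapping a raw value, as a high-level object might."""

    def __init__(self, raw):
        self.raw = raw

    @property
    def _as_parameter_(self):
        # ctypes uses this attribute when the object is passed to a foreign function
        return self.raw


# A C-callable function pointer built from a Python callable, so the example
# needs no external shared library
DOUBLE = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_int)
double = DOUBLE(lambda x: x * 2)
```

Calling double(Handle(21)) passes the wrapped raw value 21 to the function, just as double(21) would.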
4.0.0b2 (2023-02-23)
First successful beta release for v4.
4.0.0b1 (2023-02-22)
Attempted beta release for v4. PyPI upload failed due to #177.
History
pypdfium2 is on PyPI since Dec 3, 2021. New versions have been released on a regular basis ever since.
There have been the following version ranges: 0.1 - 0.15, 1.0 - 1.11, 2.0 - 2.11, 3.0 - 3.21.1.
Entries for releases below version 4 have been removed from the changelog because they were too inconsistent.