Constants and Enumerations

Constants and enumerations of MuPDF as implemented by PyMuPDF. Each of the following variables is accessible as fitz.variable.

Constants

Base14_Fonts

Predefined Python list of valid PDF Base 14 Fonts.

Return type

list

csRGB

Predefined RGB colorspace fitz.Colorspace(fitz.CS_RGB).

Return type

Colorspace

csGRAY

Predefined GRAY colorspace fitz.Colorspace(fitz.CS_GRAY).

Return type

Colorspace

csCMYK

Predefined CMYK colorspace fitz.Colorspace(fitz.CS_CMYK).

Return type

Colorspace

CS_RGB

1 – Type of Colorspace is RGB

Return type

int

CS_GRAY

2 – Type of Colorspace is GRAY

Return type

int

CS_CMYK

3 – Type of Colorspace is CMYK

Return type

int

VersionBind

‘x.xx.x’ – version of PyMuPDF (these bindings)

Return type

string

VersionFitz

‘x.xxx’ – version of MuPDF

Return type

string

VersionDate

ISO timestamp YYYY-MM-DD HH:MM:SS when these bindings were built.

Return type

string

Note

The docstring of fitz contains the information above. Retrieve it with print(fitz.__doc__); the output should look like: PyMuPDF 1.10.0: Python bindings for the MuPDF 1.10 library, built on 2016-11-30 13:09:13.

version

(VersionBind, VersionFitz, timestamp) – combined version information where timestamp is the generation point in time formatted as “YYYYMMDDhhmmss”.

Return type

tuple
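As a small illustration of the combined tuple, the sketch below unpacks a version-style tuple and parses its timestamp with the standard library. The sample values are assumptions copied from the docstring example in the note above, not read from a live installation; with PyMuPDF installed you would presumably start from fitz.version instead.

```python
from datetime import datetime

# Sample tuple shaped like fitz.version. The values are illustrative and
# copied from the docstring example, not read from an installed PyMuPDF.
sample_version = ("1.10.0", "1.10", "20161130130913")

version_bind, version_fitz, timestamp = sample_version

# The third item uses the compact "YYYYMMDDhhmmss" layout.
built_at = datetime.strptime(timestamp, "%Y%m%d%H%M%S")

# Render it in the "YYYY-MM-DD HH:MM:SS" layout used by VersionDate.
version_date = built_at.strftime("%Y-%m-%d %H:%M:%S")

print(f"PyMuPDF {version_bind}, MuPDF {version_fitz}, built on {version_date}")
```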

Document Permissions

Code                    Permitted Action
PDF_PERM_PRINT          Print the document
PDF_PERM_MODIFY         Modify the document’s contents
PDF_PERM_COPY           Copy or otherwise extract text and graphics
PDF_PERM_ANNOTATE       Add or modify text annotations and interactive form fields
PDF_PERM_FORM           Fill in forms and sign the document
PDF_PERM_ACCESSIBILITY  Obsolete, always permitted
PDF_PERM_ASSEMBLE       Insert, rotate, or delete pages, bookmarks, thumbnail images
PDF_PERM_PRINT_HQ       High quality printing
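The permission codes are bit flags, so a permission set can be built by OR-ing them together and tested with &. The sketch below mirrors the codes locally so that it runs without PyMuPDF. The numeric values are an assumption taken from the permission bit positions in the PDF specification and are not stated in this section; in real code, prefer the fitz.PDF_PERM_* names. The helper is_permitted is hypothetical and not part of PyMuPDF.

```python
# Local mirrors of the fitz.PDF_PERM_* codes.
# ASSUMPTION: the numeric values follow the permission bit positions of the
# PDF specification; this section itself does not list them.
PDF_PERM_PRINT = 1 << 2
PDF_PERM_MODIFY = 1 << 3
PDF_PERM_COPY = 1 << 4
PDF_PERM_ANNOTATE = 1 << 5
PDF_PERM_FORM = 1 << 8
PDF_PERM_ACCESSIBILITY = 1 << 9
PDF_PERM_ASSEMBLE = 1 << 10
PDF_PERM_PRINT_HQ = 1 << 11


def is_permitted(permissions: int, code: int) -> bool:
    """Return True if every bit of `code` is set in `permissions`."""
    return (permissions & code) == code


# Allow printing and copying, nothing else.
print_and_copy = PDF_PERM_PRINT | PDF_PERM_COPY

print(is_permitted(print_and_copy, PDF_PERM_PRINT))
print(is_permitted(print_and_copy, PDF_PERM_MODIFY))
```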

PDF encryption method codes

Code                 Meaning
PDF_ENCRYPT_KEEP     do not change
PDF_ENCRYPT_NONE     remove any encryption
PDF_ENCRYPT_RC4_40   RC4 40 bit
PDF_ENCRYPT_RC4_128  RC4 128 bit
PDF_ENCRYPT_AES_128  Advanced Encryption Standard 128 bit
PDF_ENCRYPT_AES_256  Advanced Encryption Standard 256 bit
PDF_ENCRYPT_UNKNOWN  unknown

Font File Extensions

The table shows the file extensions you should use when extracting fonts from a PDF file.

Ext  Description
ttf  TrueType font
pfa  Postscript for ASCII font (various subtypes)
cff  Type1C font (compressed font equivalent to Type1)
cid  character identifier font (postscript format)
otf  OpenType font
n/a  built-in font (PDF Base 14 Fonts or CJK: cannot be extracted)
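A minimal sketch of how the extension might be used when writing an extracted font to disk. font_filename is a hypothetical helper, not part of PyMuPDF, and its (basename, ext) inputs are assumed to come from whatever font extraction call you use.

```python
from typing import Optional

# Extensions from the table above; "n/a" marks fonts that cannot be extracted.
FONT_EXTENSIONS = {
    "ttf": "TrueType font",
    "pfa": "Postscript for ASCII font",
    "cff": "Type1C font",
    "cid": "character identifier font",
    "otf": "OpenType font",
}


def font_filename(basename: str, ext: str) -> Optional[str]:
    """Build an output file name, or return None for non-extractable fonts."""
    if ext == "n/a":
        return None  # built-in font: nothing to write
    if ext not in FONT_EXTENSIONS:
        raise ValueError(f"unexpected font extension: {ext!r}")
    return f"{basename}.{ext}"


print(font_filename("Arial", "ttf"))
print(font_filename("Helvetica", "n/a"))
```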

Text Alignment

TEXT_ALIGN_LEFT

0 – align left.

TEXT_ALIGN_CENTER

1 – align center.

TEXT_ALIGN_RIGHT

2 – align right.

TEXT_ALIGN_JUSTIFY

3 – align justify.

Text Extraction Flags

Option bits controlling the amount of data that is parsed into a TextPage. This class is mainly used internally in PyMuPDF.

For the PyMuPDF programmer, a combination of these values (built with Python’s | operator, or simply with +) is passed in the flags integer, a parameter of all text search and text extraction methods. Different methods use different default combinations. Choose a value that fits your situation. In particular, switch off image extraction unless you really need the images: the impact on performance and memory is significant!

TEXT_PRESERVE_LIGATURES

1 – If set, ligatures are passed through to the application in their original form. Otherwise ligatures are expanded into their constituent parts, e.g. the ligature “ffi” is expanded into three separate characters f, f and i. Default is “on” in PyMuPDF.

TEXT_PRESERVE_WHITESPACE

2 – If set, whitespace is passed through to the application in its original form. Otherwise any type of horizontal whitespace (including horizontal tabs) will be replaced with space characters of variable width. Default is “on” in PyMuPDF.

TEXT_PRESERVE_IMAGES

4 – If set, then images will be stored in the structured text structure. This causes the presence of (usually large!) binary image contents in the output of text extractions of types “dict”, “json”, “rawdict”, “rawjson”, “html”, and “xhtml” and is the default here. If used with “blocks” however (default “off”), only image metadata will be returned, not the image itself.

TEXT_INHIBIT_SPACES

8 – If set, we will not try to add missing space characters where there are large gaps between characters. In PDF, the creator usually does not insert (multiple) spaces to point to the next character’s position, but will provide a direct location address for the character. The default in PyMuPDF is “off”.

TEXT_DEHYPHENATE

16 – Ignore hyphens at line ends and join with next line. Used internally with the text search functions.

TEXT_PRESERVE_SPANS

32 – Generate a new line for every span. Not used (“off”) in PyMuPDF, but available for your use. Every line in “dict”, “json”, “rawdict”, “rawjson” will contain exactly one span.
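The flag arithmetic described in this section can be sketched as follows. The constants are local mirrors of the values listed above so that the block runs without PyMuPDF; with the package installed you would use the fitz.TEXT_* names instead.

```python
# Local mirrors of the fitz.TEXT_* values listed above.
TEXT_PRESERVE_LIGATURES = 1
TEXT_PRESERVE_WHITESPACE = 2
TEXT_PRESERVE_IMAGES = 4
TEXT_INHIBIT_SPACES = 8
TEXT_DEHYPHENATE = 16
TEXT_PRESERVE_SPANS = 32

# Combine options with |. Using + gives the same result only as long as
# each flag is added at most once.
flags = TEXT_PRESERVE_LIGATURES | TEXT_PRESERVE_WHITESPACE | TEXT_PRESERVE_IMAGES

# Switch image extraction off again by clearing its bit.
flags_no_images = flags & ~TEXT_PRESERVE_IMAGES

# Test whether a particular option is set.
images_on = bool(flags_no_images & TEXT_PRESERVE_IMAGES)

print(flags, flags_no_images, images_on)
```

The resulting integer is what would be handed to the flags parameter of a text extraction or search method.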

Widget Constants

Widget Types (field_type)

PDF_WIDGET_TYPE_UNKNOWN 0
PDF_WIDGET_TYPE_BUTTON 1
PDF_WIDGET_TYPE_CHECKBOX 2
PDF_WIDGET_TYPE_COMBOBOX 3
PDF_WIDGET_TYPE_LISTBOX 4
PDF_WIDGET_TYPE_RADIOBUTTON 5
PDF_WIDGET_TYPE_SIGNATURE 6
PDF_WIDGET_TYPE_TEXT 7

Text Widget Subtypes (text_format)

PDF_WIDGET_TX_FORMAT_NONE 0
PDF_WIDGET_TX_FORMAT_NUMBER 1
PDF_WIDGET_TX_FORMAT_SPECIAL 2
PDF_WIDGET_TX_FORMAT_DATE 3
PDF_WIDGET_TX_FORMAT_TIME 4

Widget flags (field_flags)

Common to all field types:

PDF_FIELD_IS_READ_ONLY 1
PDF_FIELD_IS_REQUIRED 1 << 1
PDF_FIELD_IS_NO_EXPORT 1 << 2

Text widgets:

PDF_TX_FIELD_IS_MULTILINE  1 << 12
PDF_TX_FIELD_IS_PASSWORD  1 << 13
PDF_TX_FIELD_IS_FILE_SELECT  1 << 20
PDF_TX_FIELD_IS_DO_NOT_SPELL_CHECK  1 << 22
PDF_TX_FIELD_IS_DO_NOT_SCROLL  1 << 23
PDF_TX_FIELD_IS_COMB  1 << 24
PDF_TX_FIELD_IS_RICH_TEXT  1 << 25

Button widgets:

PDF_BTN_FIELD_IS_NO_TOGGLE_TO_OFF  1 << 14
PDF_BTN_FIELD_IS_RADIO  1 << 15
PDF_BTN_FIELD_IS_PUSHBUTTON  1 << 16
PDF_BTN_FIELD_IS_RADIOS_IN_UNISON  1 << 25

Choice widgets:

PDF_CH_FIELD_IS_COMBO  1 << 17
PDF_CH_FIELD_IS_EDIT  1 << 18
PDF_CH_FIELD_IS_SORT  1 << 19
PDF_CH_FIELD_IS_MULTI_SELECT  1 << 21
PDF_CH_FIELD_IS_DO_NOT_SPELL_CHECK  1 << 22
PDF_CH_FIELD_IS_COMMIT_ON_SEL_CHANGE  1 << 26
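Since field_flags is a bit field, a value can be decoded by testing each bit in turn. The sketch below does this for a handful of text widget flags, using local mirrors of the values listed above. describe_text_field_flags and the short flag names are hypothetical, not part of PyMuPDF.

```python
# Local mirrors of some fitz.PDF_*_FIELD_* values listed above.
PDF_FIELD_IS_READ_ONLY = 1
PDF_FIELD_IS_REQUIRED = 1 << 1
PDF_FIELD_IS_NO_EXPORT = 1 << 2
PDF_TX_FIELD_IS_MULTILINE = 1 << 12
PDF_TX_FIELD_IS_PASSWORD = 1 << 13

# Hypothetical short names for the bits that apply to text widgets.
TEXT_FIELD_FLAGS = {
    "read_only": PDF_FIELD_IS_READ_ONLY,
    "required": PDF_FIELD_IS_REQUIRED,
    "no_export": PDF_FIELD_IS_NO_EXPORT,
    "multiline": PDF_TX_FIELD_IS_MULTILINE,
    "password": PDF_TX_FIELD_IS_PASSWORD,
}


def describe_text_field_flags(field_flags: int) -> list:
    """Return the names of all known bits that are set in `field_flags`."""
    return [name for name, bit in TEXT_FIELD_FLAGS.items() if field_flags & bit]


# A required, multi-line text field.
example = PDF_FIELD_IS_REQUIRED | PDF_TX_FIELD_IS_MULTILINE
print(describe_text_field_flags(example))
```

Some bit positions are reused across field types (1 << 25 appears under both text and button widgets), so a decoder like this should be chosen according to the widget's field_type.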

PDF Standard Blend Modes

For an explanation see Adobe PDF References, page 520:

PDF_BM_Color "Color"
PDF_BM_ColorBurn "ColorBurn"
PDF_BM_ColorDodge "ColorDodge"
PDF_BM_Darken "Darken"
PDF_BM_Difference "Difference"
PDF_BM_Exclusion "Exclusion"
PDF_BM_HardLight "HardLight"
PDF_BM_Hue "Hue"
PDF_BM_Lighten "Lighten"
PDF_BM_Luminosity "Luminosity"
PDF_BM_Multiply "Multiply"
PDF_BM_Normal "Normal"
PDF_BM_Overlay "Overlay"
PDF_BM_Saturation "Saturation"
PDF_BM_Screen "Screen"
PDF_BM_SoftLight "Softlight"

Stamp Annotation Icons

MuPDF has defined the following icons for rubber stamp annotations:

STAMP_Approved 0
STAMP_AsIs 1
STAMP_Confidential 2
STAMP_Departmental 3
STAMP_Experimental 4
STAMP_Expired 5
STAMP_Final 6
STAMP_ForComment 7
STAMP_ForPublicRelease 8
STAMP_NotApproved 9
STAMP_NotForPublicRelease 10
STAMP_Sold 11
STAMP_TopSecret 12
STAMP_Draft 13