Converts PDF documents to the PDF/A format for long-term archiving, producing documents that comply with the PDF/A standard.
| Level | Intent | Supported Compliances |
| --- | --- | --- |
| B: Basic Conformance | Requires only what is needed for reliable reproduction of a PDF’s visual appearance. | PDFA1B, PDFA2B, PDFA3B |
| A: Accessible Conformance | Basic Conformance plus features that improve the PDF’s accessibility, such as a hierarchical document structure. | PDFA1A, PDFA2A, PDFA3A |
| U: Unicode Mapping | All text must have a Unicode mapping. Guidelines have been replaced with more detailed technical specifications. | PDFA2U, PDFA3U |
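The compliance identifiers in the table above combine a PDF/A part (1, 2 or 3) with a conformance level (A, B or U). The following is a minimal sketch of how a caller might validate such an identifier before sending a request. The helper `parse_compliance` and the `SUPPORTED` set are hypothetical and not part of the Pdf4me client.

```python
import re

# Compliance identifiers listed in the table above.
SUPPORTED = {
    "PDFA1B", "PDFA2B", "PDFA3B",
    "PDFA1A", "PDFA2A", "PDFA3A",
    "PDFA2U", "PDFA3U",
}


def parse_compliance(name: str) -> tuple[int, str]:
    """Split e.g. 'pdfA2b' into (2, 'B'); raise ValueError if unsupported.

    Hypothetical helper for illustration, not a Pdf4me API.
    """
    key = name.upper()
    if key not in SUPPORTED:
        raise ValueError(f"unsupported compliance: {name!r}")
    match = re.fullmatch(r"PDFA([123])([ABU])", key)
    return int(match.group(1)), match.group(2)


if __name__ == "__main__":
    print(parse_compliance("pdfA2b"))  # (2, 'B')
```

Note that level U is not listed for PDF/A-1 in the table, so `parse_compliance("PDFA1U")` raises a `ValueError` in this sketch.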
Warning: Not every PDF can be converted to the PDF/A format. Conversion to compliance level A is the most critical case because it requires tagging. Tags provide a logical structure which, among other things, allows the contents of an image to be described to the visually impaired. Adding tagging information without prior knowledge of the input file’s structure and content is impossible. If the conversion fails, a Pdf4meBackendException is thrown.
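Because a conversion to level A can fail, a caller may want to retry with a less demanding level. The sketch below shows one possible client-side fallback, as an alternative to the server-side allowDowngrade property. It assumes a `convert(data, compliance)` callable wrapping whichever client call is used. `ConversionError` is a local stand-in for Pdf4meBackendException, and `fake_convert` is a hypothetical converter used only to make the example runnable.

```python
from typing import Callable, Sequence


class ConversionError(Exception):
    """Local stand-in for the client's Pdf4meBackendException."""


def convert_with_fallback(
    convert: Callable[[bytes, str], bytes],
    data: bytes,
    compliances: Sequence[str] = ("pdfA2a", "pdfA2u", "pdfA2b"),
) -> tuple[str, bytes]:
    """Try each compliance in order and return the first that succeeds."""
    last_error = None
    for compliance in compliances:
        try:
            return compliance, convert(data, compliance)
        except ConversionError as exc:
            last_error = exc  # remember the failure and try the next level
    raise ConversionError(
        f"no compliance in {list(compliances)} succeeded"
    ) from last_error


def fake_convert(data: bytes, compliance: str) -> bytes:
    """Hypothetical converter: pretends the input has no tagging."""
    if compliance.endswith("a"):
        raise ConversionError("missing tagging information")
    return data


if __name__ == "__main__":
    used, _ = convert_with_fallback(fake_convert, b"%PDF-1.7")
    print(used)  # pdfA2u
```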
// create createPdfA object
var createPdfA = new CreatePdfA()
{
    // document
    Document = new Document()
    {
        DocData = File.ReadAllBytes("myPdf.pdf"),
        Name = "myPdf.pdf",
    },
    // action
    PdfAAction = new PdfAAction()
    {
        Compliance = PdfAActionCompliance.PdfA2b,
    }
};
// create PDF/A
var res = await Pdf4meClient.Pdf4me.Instance.PdfAClient.PdfAAsync(createPdfA);
// extract the PDF/A and writing it to disk
byte[] pdfA = res.Document.DocData;
File.WriteAllBytes("pdfA.pdf", pdfA);
// setup the pdfAClient
PdfAClient pdfAClient = new PdfAClient(pdf4meClient);
// create createPdfA object
CreatePdfA createPdfA = new CreatePdfA();
// document
Document document = new Document();
document.setDocData(Files.readAllBytes(Paths.get("myPdf.pdf")));
createPdfA.setDocument(document);
// action
PdfAAction pdfAAction = new PdfAAction();
pdfAAction.setCompliance(ComplianceEnum.PDFA2B);
createPdfA.setPdfAAction(pdfAAction);
// create PDF/A
CreatePdfARes res = pdfAClient.pdfA(createPdfA);
// extracting the generated PDF and writing it to disk
byte[] pdfA = res.getDocument().getDocData();
FileUtils.writeByteArrayToFile(new File("pdfA.pdf"), pdfA);
// creating PDF/A - only providing the file and writing it to disk
byte[] pdfA = await Pdf4meClient.Pdf4me.Instance.PdfAClient.CreatePdfAAsync(File.ReadAllBytes("myPdf.pdf"));
File.WriteAllBytes("pdfA.pdf", pdfA);
// creating PDF/A - providing the file and the desired pdfCompliance and writing it to disk
byte[] pdfA = await Pdf4meClient.Pdf4me.Instance.PdfAClient.CreatePdfAAsync(File.ReadAllBytes("myPdf.pdf"), PdfAActionCompliance.PdfA2b);
File.WriteAllBytes("pdfA.pdf", pdfA);
// setup the pdfAClient
PdfAClient pdfAClient = new PdfAClient(pdf4meClient);
// creating PDF/A - only providing the file and writing it to disk
byte[] pdfA = pdfAClient.createPdfA(new File("myPdf.pdf"));
FileUtils.writeByteArrayToFile(new File("pdfA.pdf"), pdfA);
// creating PDF/A - providing the file and the desired pdfCompliance and writing it to disk
byte[] pdfA = pdfAClient.createPdfA(ComplianceEnum.PDFA2B, new File("myPdf.pdf"));
FileUtils.writeByteArrayToFile(new File("pdfA.pdf"), pdfA);
// create pdf4meClient
const pdf4meClient = pdf4me.createClient('YOUR API KEY')
// conversion to PDF/A with pdf_compliance specification
pdf4meClient.createPdfA('pdfA2b', fs.createReadStream(path.join(__dirname, 'myPdf.pdf')))
  .then(pdf => {
    // writing the PDF/A to disk
    fs.writeFileSync(path.join(__dirname, 'pdfA.pdf'), pdf)
  })
  .catch(error => {
    console.error(error)
  })
// conversion to PDF/A with pdf_compliance specification
$createPdfAFile = $client->pdf4me()->createPdfA(
    [
        "pdfCompliance" => "pdfA2b",
        "file" => __DIR__.'/myPdf.pdf'
    ]
);
// writing the PDF/A to disk
file_put_contents('pdfA.pdf', $createPdfAFile);
# setup the pdfA_client
pdfA_client = PdfAClient(pdf4me_client)
# conversion to PDF/A with pdf_compliance specification
pdfA = pdfA_client.create_pdfA(
    pdf_compliance='pdfA2b',
    file=FileReader().get_file_handler(path='myPdf.pdf')
)
# or conversion to PDF/A without pdf_compliance specification
pdfA = pdfA_client.create_pdfA(
    file=FileReader().get_file_handler(path='myPdf.pdf')
)
# writing the PDF/A to disk
with open('pdfA.pdf', 'wb') as f:
    f.write(pdfA)
By default, embedded fonts are automatically subset to minimize the file size. Whether fonts are subset or not is irrelevant for PDF/A compliance; the only requirement is that all used glyphs are contained in the font program. Additional fonts to subset can be given in the FontsToSubset list.
compliance
Properties like compliance
customProperties
String
The requested PDF/A compliance. Some files cannot be converted to the requested compliance. This is detected and, if possible, the compliance is automatically upgraded (allowUpgrade) or downgraded (allowDowngrade).
allowDowngrade
boolean
Automatic downgrade of the PDF/A conformance level.
- true: the level is downgraded under the following conditions:
  A) Downgrade to level B: the file contains text that is not extractable (i.e. ToUnicode information is missing). Example: PDF/A-2u is downgraded to PDF/A-2b.
  B) Downgrade to level U (PDF/A-2 and PDF/A-3) or level B (PDF/A-1): the file contains no logical structure information and “tagging” information, both of which level A requires.
  Logical structure information in a PDF defines the structure of content, such as titles, paragraphs, figures, reading order, tables or articles. Logical structure elements can be “tagged” with descriptions or alternative text. “Tagging” allows the contents of an image to be described to the visually impaired. The PDF/A converter cannot add meaningful tagging information: adding it without prior knowledge of the input file’s structure and content is neither possible nor allowed by the PDF/A standard. For that reason, the conformance level is automatically downgraded to level B or U.
- false: if an input file cannot be converted to the requested standard, e.g. because of missing “tagging” information, the conversion is aborted and the ErrorCode is set to PDF_E_DOWNGRADE.
allowUpgrade
boolean
Automatic upgrade of the PDF/A version.
- true: automatic upgrade of the PDF/A version is allowed. If the target standard is PDF/A-1 and a file contains elements that cannot be converted to PDF/A-1, the target standard is upgraded to PDF/A-2. This avoids significant visual differences in the output file. For example, the following elements may lead to an automatic upgrade:
  A) Transparency
  B) Optional content groups (OCG, layers)
  C) Real values that exceed the implementation limit of PDF/A-1
  D) Embedded OpenType font files
  E) Predefined CMap encodings in Type0 fonts
- false: the compliance is not upgraded. If any of the following occurs, the conversion fails with the conversion error PDF_E_CONVERSION:
  A) Occurrence of visual differences in the output file
  B) Removal of optional content groups (layers) (PDF/A-1 only)
  C) Removal of transparency (PDF/A-1 only)
  D) Removal of embedded files
  E) Removal of non-convertible XMP metadata
  F) The input document is corrupt and should be repaired
outputIntentProfile
String
Output Intent Profile. The given profile is embedded only if the input file does not contain a PDF/A output intent already.
linearize
boolean
Linearization of the PDF output file, i.e. optimization of the file for fast web access. A linearized document has a slightly larger file size than a non-linearized file and provides the following main features:
- When a document is opened in a PDF viewer of a web browser, the first page can be viewed without downloading the entire PDF file. In contrast, a non-linearized PDF file must be downloaded completely before the first page can be displayed.
- When another page is requested by the user, that page is displayed as quickly as possible and incrementally as data arrives, without downloading the entire PDF file.
Signed files cannot be linearized, so this property must be set to false if a digital signature is applied.
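The allowUpgrade and allowDowngrade rules described above can be modelled as a small decision function. This is only a sketch of the documented behavior, not the converter's implementation. The function name, the feature names, the `PdfAConversionError` class, the default values and the order in which the two rules are applied are all assumptions. It also treats every listed element as forcing an upgrade, whereas the text only says such elements may lead to one. The combination of level A, a tagged file and non-extractable text is not described in the text and is left unchanged here.

```python
class PdfAConversionError(Exception):
    """Stand-in for a failed conversion; carries the error code from the text."""

    def __init__(self, code: str, detail: str = ""):
        super().__init__(f"{code}: {detail}" if detail else code)
        self.code = code


# Elements the text lists as possible reasons to upgrade from PDF/A-1.
# The string names are hypothetical labels chosen for this sketch.
UPGRADE_TRIGGERS = frozenset({
    "transparency",
    "optional_content_groups",
    "real_value_over_limit",
    "embedded_opentype_font",
    "predefined_cmap_in_type0_font",
})


def resolve_compliance(
    part: int,
    level: str,
    *,
    tagged: bool,
    text_extractable: bool,
    features=frozenset(),
    allow_downgrade: bool = False,
    allow_upgrade: bool = False,
) -> tuple[int, str]:
    """Return the (part, level) the output would get under the stated rules."""
    level = level.upper()

    # allowUpgrade: PDF/A-1 cannot represent some elements, so move to PDF/A-2.
    blocking = UPGRADE_TRIGGERS & set(features)
    if part == 1 and blocking:
        if not allow_upgrade:
            raise PdfAConversionError("PDF_E_CONVERSION", ", ".join(sorted(blocking)))
        part = 2

    # allowDowngrade, case B: level A needs logical structure and tagging.
    target = level
    if target == "A" and not tagged:
        target = "U" if part in (2, 3) else "B"
    # allowDowngrade, case A: level U needs extractable text (ToUnicode).
    if target == "U" and not text_extractable:
        target = "B"

    if target != level and not allow_downgrade:
        raise PdfAConversionError("PDF_E_DOWNGRADE", f"{level} -> {target}")
    return part, target


if __name__ == "__main__":
    print(resolve_compliance(2, "a", tagged=False, text_extractable=True,
                             allow_downgrade=True))  # (2, 'U')
```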
Logging information about the request, e.g. timestamp.
Page
| Name | Type | Description | Notes |
| --- | --- | --- | --- |
| documentId | String | Globally unique Id. | |
| pageId | String | Globally unique Id. | |
| pageNumber | Integer | PageNumber, starting with 1. | |
| rotate | double | By how much the page was rotated from its original orientation. | |
| thumbnail | byte | Thumbnail representing this particular page. | |
| sourceDocumentId | String | Id of the document it was created from, e.g. in case of an extraction, the result's sourceDocumentId is the Id of the PDF the pages have been extracted from. | |
| sourcePageNumber | Integer | Page number of the original page in the original document, e.g. let's assume document B consists of page number 4 of document A (extraction). Thus, document B's only page's sourcePageNumber is number 4. | |
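The relationship between pageNumber, sourceDocumentId and sourcePageNumber can be illustrated with a small model of the extraction example above. The `Page` dataclass mirrors the field names in the table; the Python types and defaults are assumptions, and `extract_pages` is a hypothetical helper rather than a Pdf4me call.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class Page:
    # Field names follow the table above; types and defaults are assumed.
    documentId: str
    pageId: str
    pageNumber: int
    rotate: float = 0.0
    thumbnail: Optional[bytes] = None
    sourceDocumentId: Optional[str] = None
    sourcePageNumber: Optional[int] = None


def extract_pages(source: list[Page], page_numbers: list[int],
                  new_document_id: str) -> list[Page]:
    """Model an extraction: renumber pages from 1 and record their origin."""
    by_number = {page.pageNumber: page for page in source}
    result = []
    for new_number, old_number in enumerate(page_numbers, start=1):
        original = by_number[old_number]
        result.append(Page(
            documentId=new_document_id,
            pageId=str(uuid.uuid4()),
            pageNumber=new_number,
            rotate=original.rotate,
            thumbnail=original.thumbnail,
            sourceDocumentId=original.documentId,
            sourcePageNumber=old_number,
        ))
    return result


# Document A has five pages; document B consists of page 4 of document A.
doc_a = [Page(documentId="A", pageId=f"A-{n}", pageNumber=n) for n in range(1, 6)]
doc_b = extract_pages(doc_a, [4], new_document_id="B")
```

Here document B's only page has pageNumber 1, sourceDocumentId "A" and sourcePageNumber 4, matching the description in the table.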