PDF/A

Converts PDF documents to the PDF/A format for long-term archiving. The resulting documents comply with the PDF/A standard.

Level | Intent | Supported Compliances
B: Basic Conformance | Demands only the standards needed for reliable reproduction of a PDF’s visual representation. | PDFA1B, PDFA2B, PDFA3B
A: Accessible Conformance | Basic Conformance plus other features that improve the PDF’s accessibility, such as a hierarchical document structure. | PDFA1A, PDFA2A, PDFA3A
U: Unicode Mapping | All text must have a Unicode mapping. Guidelines have been replaced with more detailed technical specifications. | PDFA2U, PDFA3U
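The level table can be captured as a small lookup that rejects combinations the table does not list (there is no PDFA1U, for example). This is only an illustrative sketch: `SUPPORTED` and `parse_compliance` are hypothetical helper names, not part of the Pdf4me client.

```python
import re

# Supported compliances per conformance level, copied from the table above.
SUPPORTED = {
    "B": {"PDFA1B", "PDFA2B", "PDFA3B"},
    "A": {"PDFA1A", "PDFA2A", "PDFA3A"},
    "U": {"PDFA2U", "PDFA3U"},
}


def parse_compliance(value: str) -> tuple[int, str]:
    """Split an identifier such as 'pdfA2b' into (part, level)."""
    normalized = value.upper()
    match = re.fullmatch(r"PDFA([1-3])([ABU])", normalized)
    if match is None or normalized not in SUPPORTED[match.group(2)]:
        raise ValueError(f"unsupported compliance: {value!r}")
    return int(match.group(1)), match.group(2)


print(parse_compliance("pdfA2b"))  # (2, 'B')
```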

Warning: Not every PDF can be converted to the PDF/A format. Conversion to Compliance Level A is the most likely to fail, because it requires tagging. Tags provide a logical structure which, among other things, allows the contents of an image to be described to the visually impaired. Adding tagging information without prior knowledge about the input file’s structure and content is impossible. If a conversion is not possible, a Pdf4meBackendException is thrown.
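One way to cope with such failures on the client side is to retry with progressively weaker conformance levels. The sketch below is a hedged illustration only: `Pdf4meBackendException` is redefined locally as a stand-in so the example runs without the client library, and `convert_with_fallback`, `convert`, and `fake_convert` are hypothetical names.

```python
class Pdf4meBackendException(Exception):
    """Local stand-in so the sketch runs without the client library."""


def convert_with_fallback(convert, pdf_bytes, levels=("pdfA2a", "pdfA2u", "pdfA2b")):
    """Try each compliance in order; return (compliance, result) for the first success."""
    if not levels:
        raise ValueError("at least one compliance level is required")
    last_error = None
    for compliance in levels:
        try:
            return compliance, convert(pdf_bytes, compliance)
        except Pdf4meBackendException as error:
            last_error = error  # remember the failure and try the next level
    raise last_error


# Fake converter for demonstration: only level B succeeds for this "untagged" input.
def fake_convert(pdf_bytes, compliance):
    if not compliance.endswith("b"):
        raise Pdf4meBackendException(f"cannot convert to {compliance}")
    return b"%PDF-converted"


print(convert_with_fallback(fake_convert, b"%PDF-input")[0])  # pdfA2b
```

The allowDowngrade option described under Models achieves a similar effect on the server side within a single request.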

Feature | Parameter | Response | Action | Description | Links
pdfA | pdfA | pdfARes | pdfAAction | Convert PDF documents to PDF/A | Swagger, Sample
createPdfA | pdfCompliance, file | file stream | - | Convert PDF documents to PDF/A | Swagger, Sample

Samples

pdfA

  • curl
  • C#
  • Java
  • JavaScript
  • PHP
  • Python
  • Ruby

curl
No sample available.

C#
// create createPdfA object
var createPdfA = new CreatePdfA()
{
    // document
    Document = new Document()
    {
        DocData = File.ReadAllBytes("myPdf.pdf"),
        Name = "myPdf.pdf",
    },
    // action
    PdfAAction = new PdfAAction()
    {
        Compliance = PdfAActionCompliance.PdfA2b,
    }
};

// create PDF/A
var res = await Pdf4meClient.Pdf4me.Instance.PdfAClient.PdfAAsync(createPdfA);

// extract the PDF/A and write it to disk
byte[] pdfA = res.Document.DocData;
File.WriteAllBytes("pdfA.pdf", pdfA);

Java
// setup the pdfAClient
PdfAClient pdfAClient = new PdfAClient(pdf4meClient);

// create createPdfA object
CreatePdfA createPdfA = new CreatePdfA();
// document
Document document = new Document();
document.setDocData(Files.readAllBytes(Paths.get("myPdf.pdf")));
createPdfA.setDocument(document);
// action
PdfAAction pdfAAction = new PdfAAction();
pdfAAction.setCompliance(ComplianceEnum.PDFA2B);
createPdfA.setPdfAAction(pdfAAction);

// create PDF/A
CreatePdfARes res = pdfAClient.pdfA(createPdfA);

// extracting the generated PDF and writing it to disk
byte[] pdfA = res.getDocument().getDocData();
FileUtils.writeByteArrayToFile(new File("pdfA.pdf"), pdfA);
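
JavaScript
const fs = require('fs')
const path = require('path')

// create pdf4meClient (`pdf4me` is assumed to be the already-imported Pdf4me client module)
const pdf4meClient = pdf4me.createClient('YOUR API KEY')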

// create createPdfA object
const createPdfAReq = {
  // document
  document: {
    docData: fs.readFileSync(path.join(__dirname, 'myPdf.pdf')).toString('base64'),
  },
  // action
  pdfAAction: {
    compliance: 'pdfA2b',
  },
}

// create PDF/A
pdf4meClient.pdfA(createPdfAReq)
  .then(function(pdfARes) {
    // extract the PDF/A and write it to disk
    const pdfDocument = Buffer.from(pdfARes.document.docData, 'base64')
    fs.writeFileSync(path.join(__dirname, 'pdfA_result.pdf'), pdfDocument)
  })
  .catch(error => {
    console.log(error)
    process.exit(1)
  })

PHP
// create createPdfA object
$create_pdfa = [
    // document
    "document" => [
        'name' => 'test.pdf',
        'docData' => $client->getFileData('myPdf.pdf')
    ],
    // action
    "pdfAAction" => [
        "compliance" => "pdfa2b"
    ]
];

// conversion to PDF/A with pdf_compliance specification
$res = $client->pdf4me()->pdfA($create_pdfa);

// extracting the generated PDF from the response
$pdfA = base64_decode($res->document->docData);
// and writing it to file
file_put_contents('pdfA.pdf', $pdfA);

Python
import base64

# setup the pdfA_client
pdfA_client = PdfAClient(pdf4me_client)

# create the create_pdfA object
create_pdfA = CreatePdfA(
    # document
    document=Document(
        doc_data=FileReader().get_file_data('myPdf.pdf')
    ),
    # action
    pdf_a_action=PdfAAction(
        compliance='pdfA2b'
    )
)

# conversion to PDF/A
res = pdfA_client.pdfA(create_pdfA=create_pdfA)

# extracting the generated PDF/A
pdfA = base64.b64decode(res['document']['doc_data'])
# writing it to disk
with open('pdfA.pdf', 'wb') as f:
    f.write(pdfA)

Ruby
require 'base64'

file_path = './myPdf.pdf'

# create the pdfA object
action = Pdf4me::PdfA.new(
  # document
  document: Pdf4me::Document.new(
    doc_data: Base64.encode64(File.open(file_path, 'rb', &:read))
  ),
  # action
  pdf_a_action: Pdf4me::PdfAAction.new(
    compliance: 'pdfA2b',
    allowDowngrade: true,
    allowUpgrade: true,
    outputIntentProfile: 'sRGBColorSpace',
    linearize: true
  )
)
response = action.run

# saving the PDF/A document
File.open('./pdfA.pdf', 'wb') do |f|
  f.write(Base64.decode64(response.document.doc_data))
end
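Across the samples above, a pdfA request is always a document plus a pdfAAction, and the JavaScript and PHP samples treat docData as base64 text. The following sketch builds such a payload as a plain dictionary without the client library. It assumes the camelCase field names from the Models section and base64 transport of docData; `build_pdfa_request` and `extract_pdf` are hypothetical helpers.

```python
import base64
import json


def build_pdfa_request(pdf_bytes: bytes, name: str, compliance: str = "pdfA2b", **action_options) -> dict:
    """Assemble a pdfA request body: a document plus a pdfAAction."""
    return {
        "document": {
            "name": name,
            "docData": base64.b64encode(pdf_bytes).decode("ascii"),
        },
        "pdfAAction": {"compliance": compliance, **action_options},
    }


def extract_pdf(response: dict) -> bytes:
    """Decode the base64 docData of a pdfARes-shaped response."""
    return base64.b64decode(response["document"]["docData"])


request = build_pdfa_request(b"%PDF-1.7 ...", "myPdf.pdf", allowDowngrade=True)
print(json.dumps(request, indent=2))
```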

createPdfA

  • curl
  • C#
  • Java
  • JavaScript
  • PHP
  • Python
  • Ruby

curl
curl https://api.pdf4me.com/PdfA/CreatePdfA ^
    -H "Authorization: Basic DEV-KEY" ^
    -F pdfCompliance=pdfA2b ^
    -F "file=@./myPdf.pdf" ^
    -o ./pdfA.pdf

C#
// creating PDF/A - only providing the file and writing it to disk
byte[] pdfA = await Pdf4meClient.Pdf4me.Instance.PdfAClient.CreatePdfAAsync(File.ReadAllBytes("myPdf.pdf"));
File.WriteAllBytes("pdfA.pdf", pdfA);

// alternatively: creating PDF/A - providing the file and the desired pdfCompliance and writing it to disk
byte[] pdfA2b = await Pdf4meClient.Pdf4me.Instance.PdfAClient.CreatePdfAAsync(File.ReadAllBytes("myPdf.pdf"), PdfAActionCompliance.PdfA2b);
File.WriteAllBytes("pdfA.pdf", pdfA2b);

Java
// setup the pdfAClient
PdfAClient pdfAClient = new PdfAClient(pdf4meClient);

// creating PDF/A - only providing the file and writing it to disk
byte[] pdfA = pdfAClient.createPdfA(new File("myPdf.pdf"));
FileUtils.writeByteArrayToFile(new File("pdfA.pdf"), pdfA);

// alternatively: creating PDF/A - providing the file and the desired pdfCompliance and writing it to disk
byte[] pdfA2b = pdfAClient.createPdfA(ComplianceEnum.PDFA2B, new File("myPdf.pdf"));
FileUtils.writeByteArrayToFile(new File("pdfA.pdf"), pdfA2b);
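
JavaScript
const fs = require('fs')
const path = require('path')

// create pdf4meClient (`pdf4me` is assumed to be the already-imported Pdf4me client module)
const pdf4meClient = pdf4me.createClient('YOUR API KEY')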

// conversion to PDF/A with pdf_compliance specification
pdf4meClient.createPdfA('pdfA2b', fs.createReadStream(path.join(__dirname, 'myPdf.pdf')))
  .then(pdf => {
    // writing the PDF/A to disk
    fs.writeFileSync(path.join(__dirname, 'pdfA.pdf'), pdf)
  })
  .catch(error => {
    console.error(error)
  })

PHP
// conversion to PDF/A with pdf_compliance specification
$res = $client->pdf4me()->createPdfA(
    [
        "pdfCompliance" => "pdfA2b",
        "file" => __DIR__.'/myPdf.pdf'
    ]
);

// writing the returned PDF/A to file
file_put_contents('pdfA.pdf', $res);

Python
# setup the pdfA_client
pdfA_client = PdfAClient(pdf4me_client)

# conversion to PDF/A with pdf_compliance specification
pdfA = pdfA_client.create_pdfA(
    pdf_compliance='pdfA2b',
    file=FileReader().get_file_handler(path='myPdf.pdf')
)
# or conversion to PDF/A without pdf_compliance specification
pdfA = pdfA_client.create_pdfA(
    file=FileReader().get_file_handler(path='myPdf.pdf')
)

# writing the PDF/A to disk
with open('pdfA.pdf', 'wb') as f:
    f.write(pdfA)

Ruby
a = Pdf4me::CreatePdfA.new(
  file: '/myPdf.pdf',
  compliance: 'pdfA1b',
  save_path: 'pdfA.pdf'
)
a.run
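The curl sample for createPdfA sends a multipart/form-data POST with a pdfCompliance field and a file field. As a rough sketch of the same request using only the Python standard library, the function below builds the request without sending it. The URL, field names, and Authorization header are taken from the curl sample; `build_create_pdfa_request` and its `filename` parameter are hypothetical.

```python
import urllib.request
import uuid

API_URL = "https://api.pdf4me.com/PdfA/CreatePdfA"  # endpoint from the curl sample


def build_create_pdfa_request(pdf_bytes: bytes, compliance: str, api_key: str,
                              filename: str = "myPdf.pdf") -> urllib.request.Request:
    """Build (but do not send) the multipart/form-data POST issued by the curl sample."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="pdfCompliance"\r\n\r\n'
        f"{compliance}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=head + pdf_bytes + tail,
        method="POST",
        headers={
            "Authorization": f"Basic {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


request = build_create_pdfa_request(b"%PDF-1.7 ...", "pdfA2b", "DEV-KEY")
print(request.get_method(), request.full_url)
# Sending is left to the caller, e.g. urllib.request.urlopen(request).read()
```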

Models

pdfA

Name Type Description Notes
document Document
pdfAAction PdfAAction
jobId String [optional]
jobIdExtern String [optional]
integrations [String] [optional]

pdfAAction

Name Type Description Notes
fontsToSubset [PdfFont] By default, embedded fonts are automatically subset to minimize the file size. Whether fonts are subset or not is irrelevant for compliance with PDF/A; the only requirement is that all used glyphs are contained in the font program. Additional fonts to subset can be given in this FontsToSubset list.
compliance PDF compliance. Some files cannot be converted to the compliance requested. This will be detected and, if possible, an up- (allowUpgrade) or downgrade (allowDowngrade) of the compliance will be applied automatically.
customProperties String Properties like compliance
allowDowngrade boolean Automatic downgrade of the PDF/A conformance level.
  • true: the level is downgraded under the following conditions:
      A) Downgrade to level B: if a file contains text that is not extractable (i.e. ToUnicode information is missing). Example: PDF/A-2u is downgraded to PDF/A-2b.
      B) Downgrade to level U (PDF/A-2 and PDF/A-3) or B (PDF/A-1): level A requires logical structure information and “tagging” information, so a file that contains no such information is downgraded. Logical structure information in a PDF defines the structure of content, such as titles, paragraphs, figures, reading order, tables or articles. Logical structure elements can be “tagged” with descriptions or alternative text. “Tagging” allows the contents of an image to be described to the visually impaired. The PDF/A converter cannot add meaningful tagging information: adding it without prior knowledge about the input file’s structure and content is neither possible nor allowed by the PDF/A standard. For that reason, the conformance level is automatically downgraded to level B or U.
  • false: if an input file cannot be converted to the requested standard, e.g. because of missing “tagging” information, the conversion is aborted and the ErrorCode is set to PDF_E_DOWNGRADE.
allowUpgrade boolean Automatic upgrade of the PDF/A version.
  • true: automatic upgrade of the PDF/A version is allowed. If the target standard is PDF/A-1 and a file contains elements that cannot be converted to PDF/A-1, the target standard is upgraded to PDF/A-2. This avoids significant visual differences in the output file. For example, the following elements may lead to an automatic upgrade:
      A) Transparency.
      B) Optional content groups (OCG, layers).
      C) Real values that exceed the implementation limit of PDF/A-1.
      D) Embedded OpenType font files.
      E) Predefined CMap encodings in Type0 fonts.
  • false: the compliance is not upgraded. The conversion fails with the conversion error PDF_E_CONVERSION in any of the following cases:
      A) occurrence of visual differences in the output file
      B) removal of optional content groups (layers) (PDF/A-1 only)
      C) removal of transparency (PDF/A-1 only)
      D) removal of embedded files
      E) removal of non-convertible XMP metadata
      F) the input document is corrupt and should be repaired
outputIntentProfile String Output Intent Profile. The given profile is embedded only if the input file does not contain a PDF/A output intent already.
linearize boolean Linearization of the PDF output file, i.e. optimization of the file for fast web access. A linearized document has a slightly larger file size than a non-linearized file and provides the following main features:
  • When a document is opened in a PDF viewer of a web browser, the first page can be viewed without downloading the entire PDF file. In contrast, a non-linearized PDF file must be downloaded completely before the first page can be displayed.
  • When another page is requested by the user, that page is displayed as quickly as possible and incrementally as data arrives, without downloading the entire PDF file.
Signed files cannot be linearized, so this property must be set to false if a digital signature is applied.
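The interplay of allowDowngrade and allowUpgrade can be modeled as a small decision function. This is a simplified sketch of the rules as described in the table, not the converter's actual implementation: `InputFacts`, `ConversionError`, and `resolve_compliance` are hypothetical names, and checking the upgrade before the downgrade is an assumption.

```python
from dataclasses import dataclass


class ConversionError(Exception):
    """Carries one of the error codes named in the parameter descriptions."""

    def __init__(self, code: str):
        super().__init__(code)
        self.code = code


@dataclass
class InputFacts:
    """What a converter might learn by inspecting the input (hypothetical model)."""
    has_tagging: bool           # logical structure information is present
    text_extractable: bool      # ToUnicode information is present
    needs_pdfa2_features: bool  # e.g. transparency or layers


def resolve_compliance(part, level, facts, allow_downgrade=False, allow_upgrade=False):
    """Return the (part, level) that the described rules would produce."""
    # Upgrade: PDF/A-1 cannot hold certain elements, so move to PDF/A-2 if allowed.
    if part == 1 and facts.needs_pdfa2_features:
        if not allow_upgrade:
            raise ConversionError("PDF_E_CONVERSION")
        part = 2
    # Downgrade from A: without tagging, fall to U (PDF/A-2, PDF/A-3) or B (PDF/A-1).
    if level == "A" and not facts.has_tagging:
        if not allow_downgrade:
            raise ConversionError("PDF_E_DOWNGRADE")
        level = "U" if part >= 2 else "B"
    # Downgrade from U: text without Unicode mapping only satisfies level B.
    if level == "U" and not facts.text_extractable:
        if not allow_downgrade:
            raise ConversionError("PDF_E_DOWNGRADE")
        level = "B"
    return part, level


untagged = InputFacts(has_tagging=False, text_extractable=True, needs_pdfa2_features=False)
print(resolve_compliance(2, "A", untagged, allow_downgrade=True))  # (2, 'U')
```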

pdfARes

Name Type Description Notes
document Document PDF/A-compliant file.

Document

Name Type Description Notes
jobId String JobId of the document's WorkingSet.
documentId String Document Id
name String Filename including filetype.
docStatus String Status of the Document, e.g. Stamped.
pages Page Description of pages.
docData [byte] Document bytes.
docMetadata DocMetadata Document metadata such as title and pageCount.
docLogs DocLog Logging information about the request, e.g. timestamp.

Page

Name Type Description Notes
documentId String Globally unique Id.
pageId String Globally unique Id.
pageNumber Integer PageNumber, starting with 1.
rotate double By how much the page was rotated from its original orientation.
thumbnail byte Thumbnail representing this particular page.
sourceDocumentId String Id of the document it was created from, e.g. in case of an extraction, the result's sourceDocumentId is the Id of the PDF the pages have been extracted from.
sourcePageNumber Integer Page number of the original page in the original document. For example, if document B consists of page number 4 of document A (extraction), the sourcePageNumber of document B's only page is 4.
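The relationship between pageNumber and sourcePageNumber can be illustrated with a toy model of an extraction. The sketch only mirrors the field semantics described in the Page table; the `Page` dataclass and `extract_pages` are hypothetical and do not call the API.

```python
from dataclasses import dataclass


@dataclass
class Page:
    document_id: str
    page_number: int         # starts at 1 within the new document
    source_document_id: str  # document the page was extracted from
    source_page_number: int  # page number in that source document


def extract_pages(source_document_id: str, page_numbers: list[int], new_document_id: str) -> list[Page]:
    """Model an extraction: pages are renumbered from 1 but remember their origin."""
    return [
        Page(new_document_id, new_number, source_document_id, source_number)
        for new_number, source_number in enumerate(page_numbers, start=1)
    ]


# Document B consists of page 4 of document A.
pages_b = extract_pages("A", [4], "B")
print(pages_b[0].page_number, pages_b[0].source_page_number)  # 1 4
```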

DocMetadata

Name Type Description Notes
title String Title of document.
subject String Subject of document.
pageCount long Number of pages.
size long Size of the document in bytes.
isEncrypted boolean Whether the document is encrypted.
pdfCompliance String Pdf Compliance, e.g. PDF/A.
isSigned boolean Whether the document is signed.
uploadedMimeType String Uploaded MimeType, e.g. application/bson.
uploadedFileSize long Uploaded file size.

DocLog

Name Type Description Notes
messageType String MessageType, e.g. PdfALog.
message String Message itself, e.g. a warning.
timestamp dateTime Timestamp.
docLogLevel String Type of message. Supported values: "verbose", "info", "warning", "error", "timing".
durationMilliseconds long Timing for requested log information [ms].
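A caller might scan the returned log entries for warnings and errors and add up the reported timings. The sketch below assumes that the logs arrive as a list of dictionaries with the camelCase field names from the DocLog table; `summarize_logs` is a hypothetical helper and the sample entries are invented purely for illustration.

```python
def summarize_logs(doc_logs: list[dict]) -> tuple[list[str], int]:
    """Return (messages of warning/error entries, total milliseconds of timing entries)."""
    problems = [
        log["message"]
        for log in doc_logs
        if log.get("docLogLevel") in ("warning", "error")
    ]
    timing_ms = sum(
        log.get("durationMilliseconds", 0)
        for log in doc_logs
        if log.get("docLogLevel") == "timing"
    )
    return problems, timing_ms


# Invented sample entries, shaped like the DocLog model.
sample_logs = [
    {"messageType": "PdfALog", "message": "Conversion started.", "docLogLevel": "info"},
    {"messageType": "PdfALog", "message": "Compliance was downgraded.", "docLogLevel": "warning"},
    {"messageType": "PdfALog", "message": "Conversion", "docLogLevel": "timing", "durationMilliseconds": 120},
]
print(summarize_logs(sample_logs))  # (['Compliance was downgraded.'], 120)
```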
