Extract

Pdf4me Extract lets you extract pages from a PDF document. The result is a new PDF consisting of the pages extracted from the existing document. You can extract single pages or a range of pages.

Feature	Parameter	Response	Action	Description	Links
extract	Extract	ExtractRes	ExtractAction	Generates a new PDF consisting of the pages extracted from a given PDF.	swagger sample
extractPages	pageNrs, file	file stream		Extracts the pages listed in pageNrs and returns them as a new PDF. Page number 1 corresponds to the first page.	swagger sample
extractResources	ExtractResources	ExtractResourcesRes	ExtractResourcesAction	Extracts resources, such as metadata, from a PDF document.	swagger sample
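
The curl sample for extractPages passes `pageNrs` as a comma-separated string such as `1,4`, with page number 1 being the first page. The following is a minimal sketch of a helper that builds such a string from a list of integers and rejects values below 1. The function name `format_page_nrs` is hypothetical and is not part of any Pdf4me SDK.

```python
from typing import Iterable


def format_page_nrs(pages: Iterable[int]) -> str:
    """Build a comma-separated, 1-based page list such as "1,4".

    Hypothetical helper for illustration; not a Pdf4me SDK function.
    """
    result = []
    for page in pages:
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(page, bool) or not isinstance(page, int):
            raise TypeError(f"page numbers must be integers, got {page!r}")
        # page number 1 corresponds to the first page
        if page < 1:
            raise ValueError(f"page numbers start at 1, got {page}")
        result.append(str(page))
    if not result:
        raise ValueError("at least one page number is required")
    return ",".join(result)


if __name__ == "__main__":
    print(format_page_nrs([1, 4]))
```

Validating on the client side avoids sending a request that can only fail, for example one that asks for page 0.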

Samples

Extract

curl
C#
Java
JavaScript
PHP
Python
Ruby

curl No Sample

// create extract object
Extract extract = new Extract()
{
    // document
    Document = new Document()
    {
        DocData = File.ReadAllBytes("myPdf.pdf"),
        Name = "myPdf.pdf",
    },
    // action
    ExtractAction = new ExtractAction()
    {
        // list of pages to be extracted
        ExtractPages = new System.Collections.Generic.HashSet<int>() { 1, 4 },
    }
};

// extraction
ExtractRes res = await Pdf4meClient.Pdf4me.Instance.ExtractClient.ExtractAsync(extract);

// extracting the generated PDF and writing it to disk
byte[] extractedPdf = res.Document.DocData;
File.WriteAllBytes("extractedPdf.pdf", extractedPdf);

// setup the extractClient
ExtractClient extractClient = new ExtractClient(pdf4meClient);

// create extract object
Extract extract = new Extract();
// document
Document document = new Document();
document.setDocData(Files.readAllBytes(Paths.get("myPdf.pdf")));
extract.setDocument(document);
// action
ExtractAction extractAction = new ExtractAction();
extractAction.setExtractPages(Arrays.asList(1, 4));
extract.setExtractAction(extractAction);

// extraction
ExtractRes res = extractClient.extract(extract);

// extracting the generated PDF and writing it to disk
byte[] extractedPdf = res.getDocument().getDocData();
FileUtils.writeByteArrayToFile(new File("extractedPdf.pdf"), extractedPdf);

// setup the pdf4meClient
const pdf4meClient = pdf4me.createClient('YOUR API KEY')

// create extract object
const extractReq = {
  // document
  document: {
    docData: fs.readFileSync(path.join(__dirname, 'myPdf.pdf')).toString('base64'),
  },
  // action
  extractAction: {
    extractPages: [1, 4],
  },
}

// extraction
pdf4meClient.extract(extractReq)
  .then(function(extractRes) {
    // extracting the generated PDF and writing it to disk
    const pdfDocument = Buffer.from(extractRes.document.docData, 'base64')
    fs.writeFileSync(path.join(__dirname, 'extractedPdf.pdf'), pdfDocument)
  })
  .catch(error => {
    console.log(error)
  })

// create extract object
$create_extract = [
    //document
    "document" => [
        "docData" => $client->getFileData('myPdf.pdf')
    ],
    //action
    "extractAction" => [
        "extractPages" => [
            1,
            4
        ]
    ]
];

// extraction
$extractRes = $client->pdf4me()->extract($create_extract);

// extracting the generated PDF and writing it to disk
$extractedPdf = base64_decode($extractRes->document->docData);
file_put_contents('extractedPdf.pdf', $extractedPdf);

# setup the extract_client
extract_client = ExtractClient(pdf4me_client)

# create the extract object
extract = Extract(
    # document
    document=Document(
        doc_data=FileReader().get_file_data('myPdf.pdf')
    ),
    # action
    extract_action=ExtractAction(
        extract_pages=[1,4]
    )
)

# extraction
res = extract_client.extract(extract=extract)

# extracting the generated PDF and writing it to disk 
extracted_pdf = base64.b64decode(res['document']['doc_data'])
with open('extractedPdf.pdf', 'wb') as f:
    f.write(extracted_pdf)

file_path = './myPdf.pdf'

action = Pdf4me::Extract.new(
  # document
  document: Pdf4me::Document.new(
    doc_data: Base64.encode64(File.open(file_path, 'rb', &:read))
  ),
  # action
  extract_action: Pdf4me::ExtractAction.new(
    extract_pages: [1, 4]
  )
)
response = action.run

# saving extracted pages
File.open('./extractedPdf.pdf', 'wb') do |f|
  f.write(Base64.decode64(response.document.doc_data))
end

ExtractPages

curl
C#
Java
JavaScript
PHP
Python
Ruby

curl https://api.pdf4me.com/Extract/ExtractPages ^
    -H "Authorization: Basic DEV-KEY" ^
    -F pageNrs=1,4 ^
    -F "file=@./myPdf.pdf" ^
    -o ./extractedPdf.pdf

// extraction 
byte[] extractedPdf = await Pdf4meClient.Pdf4me.Instance.ExtractClient.ExtractPagesAsync(File.ReadAllBytes("myPdf.pdf"),"1,4");
// and writing the generated PDF to disk
File.WriteAllBytes("extractedPdf.pdf", extractedPdf);

// setup the extractClient
ExtractClient extractClient = new ExtractClient(pdf4meClient);

// extraction and writing the generated PDF to disk
byte[] extractedPdf = extractClient.extractPages("1,4", new File("myPdf.pdf"));
FileUtils.writeByteArrayToFile(new File("extractedPdf.pdf"), extractedPdf);

// setup the extractClient
const extractClient = new pdf4me.ExtractClient(pdf4meClient);

// extraction
extractClient.extractPages('1,4', fs.createReadStream('./myPdf.pdf'))
    .then(pdf => {
        fs.writeFileSync('./extractedPdf.pdf', pdf);
    })
    .catch(err => {
        console.log(err);
    });

Models

Extract

Name	Type	Notes
`document`	`Document`
`extractAction`	`ExtractAction`
`jobId`	`String`	[optional]
`jobIdExtern`	`String`	[optional]
`integrations`	`[String]`	[optional]
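
To show how these fields fit together, here is a minimal sketch that assembles a JSON body with the field names from the tables on this page. It assumes that `docData` travels as a base64 string, as in the JavaScript sample, and that optional fields can simply be left out. The function name `build_extract_request` is hypothetical, and the exact wire format expected by the service is an assumption.

```python
import base64
import json
from typing import Iterable, Optional


def build_extract_request(pdf_bytes: bytes, name: str, pages: Iterable[int],
                          job_id: Optional[str] = None) -> str:
    """Return a JSON string shaped like the Extract model.

    Hypothetical helper; field names are taken from the model tables,
    the base64 encoding of docData mirrors the JavaScript sample.
    """
    payload = {
        "document": {
            "name": name,
            # raw PDF bytes encoded as base64 text (assumption)
            "docData": base64.b64encode(pdf_bytes).decode("ascii"),
        },
        "extractAction": {
            # 1-based page numbers to extract
            "extractPages": list(pages),
        },
    }
    # jobId is marked [optional], so it is only added when provided
    if job_id is not None:
        payload["jobId"] = job_id
    return json.dumps(payload)


if __name__ == "__main__":
    print(build_extract_request(b"%PDF-1.4", "myPdf.pdf", [1, 4]))
```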

Document

Name	Type	Description
`jobId`	`String`	JobId of Documents WorkingSet.
`documentId`	`String`	Document Id
`name`	`String`	Filename including filetype.
`docStatus`	`String`	Status of the Document, e.g. Stamped.
`pages`	`Page`	Description of pages.
`docData`	`[byte]`	Document bytes.
`docMetadata`	`DocMetadata`	Document metadata such as title and pageCount.
`docLogs`	`DocLog`	Logging information about the request, e.g. timestamp.

Page

Name	Type	Description
`documentId`	`String`	Globally unique Id.
`pageId`	`String`	Globally unique Id.
`pageNumber`	`Integer`	PageNumber, starting with 1.
`rotate`	`double`	By how much the page was rotated from its original orientation.
`thumbnail`	`byte`	Thumbnail representing this particular page.
`sourceDocumentId`	`String`	Id of the document it was created from, e.g. in case of an extraction, the result's sourceDocumentId is the Id of the PDF the pages have been extracted from.
`sourcePageNumber`	`Integer`	Page number of the original page in the original document, e.g. let's assume document B consists of page number 4 of document A (extraction). Thus, document B's only page's sourcePageNumber is number 4.
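
The relationship between `pageNumber` and `sourcePageNumber` can be sketched with a small model. The sketch assumes that the pages of the new document are numbered consecutively from 1 in the order they were requested; the source only states the single-page case. The names `Page` and `pages_after_extraction` are illustrative and do not come from a Pdf4me SDK.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Page:
    page_number: int         # position in the new document, starting with 1
    source_page_number: int  # position in the document it was extracted from


def pages_after_extraction(extract_pages: List[int]) -> List[Page]:
    """Model the page list of a document created by an extraction.

    Assumption: result pages are renumbered 1, 2, ... in request order.
    """
    return [
        Page(page_number=index, source_page_number=source)
        for index, source in enumerate(extract_pages, start=1)
    ]


if __name__ == "__main__":
    # document B consists only of page 4 of document A
    for page in pages_after_extraction([4]):
        print(page.page_number, page.source_page_number)
```

For the example in the table, extracting only page 4 of document A gives a document B whose single page has `pageNumber` 1 and `sourcePageNumber` 4.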

DocMetadata

Name	Type	Description
`title`	`String`	Title of document.
`subject`	`String`	Subject of document.
`pageCount`	`long`	Number of pages.
`size`	`long`	Size of the document in bytes.
`isEncrypted`	`boolean`	Whether the document is encrypted.
`pdfCompliance`	`String`	PDF compliance, e.g. PDF/A.
`isSigned`	`boolean`	Whether the document is signed.
`uploadedMimeType`	`String`	Uploaded MimeType, e.g. application/bson.
`uploadedFileSize`	`long`	Uploaded file size.

DocLog

Name	Type	Description	Notes
`messageType`	`String`	MessageType, e.g. PdfALog.
`message`	`String`	Message itself, e.g. a warning.
`timestamp`	`dateTime`	Timestamp.
`docLogLevel`	`String`	Type of message.	Supported Values : "verbose", "info", "warning", "error", "timing"
`durationMilliseconds`	`long`	Timing for requested log information [ms].
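
As a usage sketch for these fields, the snippet below picks the messages of all log entries whose `docLogLevel` is `warning` or `error`. It assumes the entries are available as plain dictionaries keyed by the field names in the table, which may differ from how a given client library exposes them. The function name `problem_messages` is hypothetical.

```python
from typing import Dict, List

# levels taken from the supported values listed for docLogLevel
PROBLEM_LEVELS = {"warning", "error"}


def problem_messages(doc_logs: List[Dict[str, object]]) -> List[str]:
    """Return the messages of all warning and error entries, in order."""
    return [
        str(entry["message"])
        for entry in doc_logs
        if entry.get("docLogLevel") in PROBLEM_LEVELS
    ]


if __name__ == "__main__":
    sample_logs = [
        {"docLogLevel": "info", "message": "started"},
        {"docLogLevel": "warning", "message": "font not embedded"},
        {"docLogLevel": "timing", "message": "done", "durationMilliseconds": 12},
    ]
    print(problem_messages(sample_logs))
```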
