Extract
Pdf4me Extract lets you extract pages from a PDF document. The result is a new PDF consisting of the pages extracted from the existing document; these can be single pages or a range of pages.
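Whether you want single pages or a range, the request ends up as an explicit list of page numbers (extractPages) or as a comma-separated string (pageNrs, as in the curl sample `pageNrs=1,4`). The sketch below expands a spec such as "1,4,6-8" into that list. It is a hypothetical client-side helper, not part of any Pdf4me SDK, and it assumes range notation is resolved before the request is sent, since the samples only show explicit page lists.

```python
def expand_page_spec(spec: str) -> list[int]:
    """Expand a spec such as "1,4,6-8" into sorted, unique 1-based page numbers."""
    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas such as "1,,4"
        if "-" in part:
            # a range "start-end", both ends inclusive
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start < 1 or end < start:
                raise ValueError(f"invalid range: {part!r}")
            pages.update(range(start, end + 1))
        else:
            page = int(part)
            if page < 1:
                raise ValueError(f"page numbers start at 1, got {page}")
            pages.add(page)
    return sorted(pages)


def to_page_nrs(pages: list[int]) -> str:
    """Join page numbers into the comma-separated form used by pageNrs."""
    return ",".join(str(p) for p in pages)


if __name__ == "__main__":
    pages = expand_page_spec("1,4,6-8")
    print(pages)               # [1, 4, 6, 7, 8]
    print(to_page_nrs(pages))  # 1,4,6,7,8
```

The list form matches extractPages in the Extract samples; the string form matches pageNrs in the ExtractPages samples.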
Feature | Parameter | Response | Action | Description | Links |
---|---|---|---|---|---|
extract | Extract | ExtractRes | ExtractAction | Generates a new PDF consisting of the pages extracted from a given PDF. | swagger sample |
extractPages | pageNrs, file | file stream | | Generates a new PDF from the pages listed in pageNrs, a comma-separated list of page numbers (page number 1 corresponds to the first page). | swagger sample |
extractResources | ExtractResources | ExtractResourcesRes | ExtractResourcesAction | Extracts resources such as fonts, images, outlines, and XMP metadata from a PDF document. | swagger sample |
Samples
Extract
- curl
- C#
- Java
- JavaScript
- PHP
- Python
- Ruby
curl: No sample available.
C#:
// create extract object
Extract extract = new Extract()
{
    // document
    Document = new Document()
    {
        DocData = File.ReadAllBytes("myPdf.pdf"),
        Name = "myPdf.pdf",
    },
    // action
    ExtractAction = new ExtractAction()
    {
        // set of pages to be extracted (page 1 is the first page)
        ExtractPages = new System.Collections.Generic.HashSet<int>() { 1, 4 },
    }
};
// extraction
ExtractRes res = await Pdf4meClient.Pdf4me.Instance.ExtractClient.ExtractAsync(extract);
// extracting the generated PDF and writing it to disk
byte[] extractedPdf = res.Document.DocData;
File.WriteAllBytes("extractedPdf.pdf", extractedPdf);
Java:
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import org.apache.commons.io.FileUtils;

// setup the extractClient (pdf4meClient is assumed to be configured already)
ExtractClient extractClient = new ExtractClient(pdf4meClient);
// create extract object
Extract extract = new Extract();
// document
Document document = new Document();
document.setDocData(Files.readAllBytes(Paths.get("myPdf.pdf")));
extract.setDocument(document);
// action
ExtractAction extractAction = new ExtractAction();
extractAction.setExtractPages(Arrays.asList(1, 4));
extract.setExtractAction(extractAction);
// extraction
ExtractRes res = extractClient.extract(extract);
// extracting the generated PDF and writing it to disk
byte[] extractedPdf = res.getDocument().getDocData();
FileUtils.writeByteArrayToFile(new File("extractedPdf.pdf"), extractedPdf);
JavaScript:
const fs = require('fs')
const path = require('path')

// setup the pdf4meClient
const pdf4meClient = pdf4me.createClient('YOUR API KEY')
// create extract object
const extractReq = {
  // document
  document: {
    docData: fs.readFileSync(path.join(__dirname, 'myPdf.pdf')).toString('base64'),
  },
  // action
  extractAction: {
    extractPages: [1, 4],
  },
}
// extraction
pdf4meClient
  .extract(extractReq)
  .then(function(extractRes) {
    // extracting the generated PDF and writing it to disk
    const pdfDocument = Buffer.from(extractRes.document.docData, 'base64')
    fs.writeFileSync(path.join(__dirname, 'extractedPdf.pdf'), pdfDocument)
  })
  .catch(error => {
    console.log(error)
  })
PHP:
// create extract object
$create_extract = [
    // document
    "document" => [
        "docData" => $client->getFileData('myPdf.pdf')
    ],
    // action
    "extractAction" => [
        "extractPages" => [
            1,
            4
        ]
    ]
];
// extraction
$res = $client->pdf4me()->extract($create_extract);
// extracting the generated PDF and writing it to disk
$extractedPdf = base64_decode($res->document->docData);
file_put_contents('extractedPdf.pdf', $extractedPdf);
Python:
import base64

# setup the extract_client
extract_client = ExtractClient(pdf4me_client)
# create the extract object
extract = Extract(
    # document
    document=Document(
        doc_data=FileReader().get_file_data('myPdf.pdf')
    ),
    # action
    extract_action=ExtractAction(
        extract_pages=[1, 4]
    )
)
# extraction
res = extract_client.extract(extract=extract)
# extracting the generated PDF and writing it to disk
extracted_pdf = base64.b64decode(res['document']['doc_data'])
with open('extractedPdf.pdf', 'wb') as f:
    f.write(extracted_pdf)
Ruby:
require 'base64'

file_path = './myPdf.pdf'
action = Pdf4me::Extract.new(
  # document
  document: Pdf4me::Document.new(
    doc_data: Base64.encode64(File.open(file_path, 'rb', &:read))
  ),
  # action
  extract_action: Pdf4me::ExtractAction.new(
    extract_pages: [1, 4]
  )
)
response = action.run
# saving extracted pages
File.open('./extractedPdf.pdf', 'wb') do |f|
  f.write(Base64.decode64(response.document.doc_data))
end
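All of the Extract samples send the same request shape: a document whose docData is the Base64-encoded PDF, plus an extractAction listing the pages. The following Python sketch builds that payload as a plain dictionary and decodes the returned document, using the camelCase field names from the JavaScript sample and the Document model. The helper names are hypothetical, and the HTTP call itself is left to whichever SDK you use.

```python
import base64
import json


def build_extract_request(pdf_bytes: bytes, pages: list[int], name: str = "myPdf.pdf") -> dict:
    """Build an Extract payload shaped like the JavaScript sample (camelCase field names)."""
    return {
        "document": {
            "name": name,
            # the service receives the PDF as Base64 text, not raw bytes
            "docData": base64.b64encode(pdf_bytes).decode("ascii"),
        },
        "extractAction": {"extractPages": list(pages)},
    }


def decode_extract_response(extract_res: dict) -> bytes:
    """Return the PDF bytes from an ExtractRes-shaped dictionary."""
    return base64.b64decode(extract_res["document"]["docData"])


if __name__ == "__main__":
    req = build_extract_request(b"%PDF-1.7 demo", [1, 4])
    print(json.dumps(req["extractAction"]))  # {"extractPages": [1, 4]}
    # Round trip: a response carrying the same docData decodes to the original bytes.
    print(decode_extract_response({"document": req["document"]}))
```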
ExtractPages
- curl
- C#
- Java
- JavaScript
- PHP
- Python
- Ruby
curl https://api.pdf4me.com/Extract/ExtractPages ^
-H "Authorization: Basic DEV-KEY" ^
-F pageNrs=1,4 ^
-F "file=@./myPdf.pdf" ^
-o ./extractedPdf.pdf
C#:
// extraction
byte[] extractedPdf = await Pdf4meClient.Pdf4me.Instance.ExtractClient.ExtractPagesAsync(File.ReadAllBytes("myPdf.pdf"), "1,4");
// writing the generated PDF to disk
File.WriteAllBytes("extractedPdf.pdf", extractedPdf);
Java:
import java.io.File;
import org.apache.commons.io.FileUtils;

// setup the extractClient (pdf4meClient is assumed to be configured already)
ExtractClient extractClient = new ExtractClient(pdf4meClient);
// extraction and writing the generated PDF to disk
byte[] extractedPdf = extractClient.extractPages("1,4", new File("myPdf.pdf"));
FileUtils.writeByteArrayToFile(new File("extractedPdf.pdf"), extractedPdf);
JavaScript:
const fs = require('fs');

// setup the extractClient
const extractClient = new pdf4me.ExtractClient(pdf4meClient);
// extraction
extractClient
  .extractPages('1,4', fs.createReadStream('./myPdf.pdf'))
  .then(pdf => {
    // writing the generated PDF to disk
    fs.writeFileSync('./extractedPdf.pdf', pdf);
  })
  .catch(err => {
    console.log(err);
  });
PHP:
// extraction
$extractPages = $client->pdf4me()->extractPages(
    [
        "pageNrs" => "1,4",
        "file" => __DIR__.'/myPdf.pdf'
    ]
);
// writing it to file
file_put_contents('extractedPdf.pdf', $extractPages);
Python:
# setup the extract_client
extract_client = ExtractClient(pdf4me_client)
# extraction
extracted_pdf = extract_client.extract_pages(
    page_nrs='1,4',
    file=FileReader().get_file_handler(path="myPdf.pdf")
)
# writing the generated PDF to disk
with open('extractedPdf.pdf', 'wb') as f:
    f.write(extracted_pdf)
Ruby:
action = Pdf4me::ExtractPages.new(
  file: './myPdf.pdf',
  pages: [1, 4],
  save_path: 'extractedPdf.pdf'
)
action.run
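The curl call can also be reproduced with only the Python standard library. This is a sketch that assumes the endpoint, the Basic authorization header, and the two form fields (pageNrs and file) behave exactly as in the curl sample; DEV-KEY stands for your own key, and build_extract_pages_request and extract_pages are hypothetical helper names, not SDK functions.

```python
import urllib.request
import uuid

EXTRACT_PAGES_URL = "https://api.pdf4me.com/Extract/ExtractPages"  # taken from the curl sample


def build_extract_pages_request(api_key: str, page_nrs: str, pdf_bytes: bytes,
                                filename: str = "myPdf.pdf") -> urllib.request.Request:
    """Build a multipart/form-data POST equivalent to curl -F pageNrs=... -F file=@..."""
    boundary = uuid.uuid4().hex
    delimiter = b"--" + boundary.encode()
    parts = [
        # plain form field, like -F pageNrs=1,4
        delimiter,
        b'Content-Disposition: form-data; name="pageNrs"',
        b"",
        page_nrs.encode(),
        # file field, like -F "file=@./myPdf.pdf"
        delimiter,
        f'Content-Disposition: form-data; name="file"; filename="{filename}"'.encode(),
        b"Content-Type: application/pdf",
        b"",
        pdf_bytes,
        # closing delimiter
        delimiter + b"--",
        b"",
    ]
    body = b"\r\n".join(parts)
    return urllib.request.Request(
        EXTRACT_PAGES_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def extract_pages(api_key: str, page_nrs: str, pdf_path: str, out_path: str) -> None:
    """Send the request and write the returned PDF stream to disk (performs network I/O)."""
    with open(pdf_path, "rb") as f:
        request = build_extract_pages_request(api_key, page_nrs, f.read())
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as out:
        out.write(response.read())
```

Separating request construction from sending keeps the multipart encoding inspectable without a network connection; extract_pages("DEV-KEY", "1,4", "myPdf.pdf", "extractedPdf.pdf") would mirror the curl command.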
ExtractResources
- curl
- C#
- Java
- JavaScript
- PHP
- Python
- Ruby
curl: No sample available.
C#:
// create extract resource object
var req = new ExtractResources()
{
    // document
    Document = new Document()
    {
        DocData = File.ReadAllBytes("myPdf.pdf"),
        Name = "myPdf.pdf",
    },
    // action
    ExtractResourcesAction = new ExtractResourcesAction()
    {
        ExtractFonts = true,
        ExtractImages = true,
        Outlines = true,
        XmpMetadata = true,
        ListFonts = true,
        ListImages = true
    }
};
// extracting resources
var res = Pdf4me.Instance.ExtractClient.ExtractResourcesAsync(req).GetAwaiter().GetResult();
// saving extracted resource info to a JSON file (JsonConvert is from Newtonsoft.Json)
File.WriteAllText("extractResources_result.json", JsonConvert.SerializeObject(res));
JavaScript:
const fs = require('fs')
const path = require('path')

// setup the pdf4meClient
const pdf4meClient = pdf4me.createClient('YOUR API KEY')
// create extract resource object
const extractResourcesReq = {
  // document
  document: {
    docData: fs.readFileSync(path.join(__dirname, 'myPdf.pdf')).toString('base64'),
  },
  // action
  extractResourcesAction: {
    extractFonts: true,
    extractImages: true,
    listFonts: true,
    listImages: true,
    outlines: true,
    xmpMetadata: true,
  },
}
// extract resources
pdf4meClient
  .extractResources(extractResourcesReq)
  .then(function(extractResourcesRes) {
    // writing the result to disk as JSON
    fs.writeFileSync(path.join(__dirname, 'extractResources_result.json'), JSON.stringify(extractResourcesRes, null, 2))
  })
  .catch(error => {
    console.log(error)
    process.exit(1)
  })
PHP:
// create extract resource object
$create_extract_resource = [
    // document
    'document' => [
        'name' => 'PDF_10pages.pdf',
        'docData' => $pdf4meclient->getFileData('PDF_10pages.pdf')
    ],
    // action
    'ExtractResourcesAction' => [
        'outlines' => 0,
        'xmpMetadata' => 1,
        'listFonts' => 1,
        'extractFonts' => 1,
        'extractImages' => 1,
        'listImages' => 1
    ]
];
// extract resources
$res = $pdf4meclient->pdf4me()->extractResources($create_extract_resource);
echo $res["pdfResources"];
Python:
import json

# setup the extract_client
extract_client = ExtractClient(pdf4me_client)
# create the extract resources object
extract_resources = ExtractResources(
    # document
    document=Document(
        doc_data=FileReader().get_file_data('PDF_10pages.pdf')
    ),
    # action
    extract_resources_action=ExtractResourcesAction(
        extract_fonts=1,
        extract_images=1,
        list_fonts=1,
        list_images=0,
        outlines=1,
        xmp_metadata=1
    )
)
# extraction
res = extract_client.extract_resources(extract_resources=extract_resources)
# writing the result to disk as JSON
with open('extractResources_result.json', 'w') as f:
    json.dump(res, f)
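The ExtractResources samples toggle the same six flags in every language. As a sketch, the payload can be built from the camelCase names used in the JavaScript sample; build_extract_resources_request and its strict check for unknown flag names are hypothetical. The helper sends every flag explicitly, so the request does not depend on server-side defaults, which are not documented here.

```python
import base64

# Flag names as they appear in the JavaScript ExtractResources sample.
RESOURCE_FLAGS = ("extractFonts", "extractImages", "listFonts", "listImages", "outlines", "xmpMetadata")


def build_extract_resources_request(pdf_bytes: bytes, **flags: bool) -> dict:
    """Build an ExtractResources payload; any flag not passed is sent as False."""
    unknown = set(flags) - set(RESOURCE_FLAGS)
    if unknown:
        # catch typos such as fonts=True instead of extractFonts=True
        raise ValueError(f"unknown flags: {sorted(unknown)}")
    return {
        "document": {"docData": base64.b64encode(pdf_bytes).decode("ascii")},
        "extractResourcesAction": {name: bool(flags.get(name, False)) for name in RESOURCE_FLAGS},
    }


if __name__ == "__main__":
    req = build_extract_resources_request(b"%PDF-1.7 demo", xmpMetadata=True, listFonts=True)
    print(req["extractResourcesAction"])
```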
Models
Extract
Name | Type | Description | Notes |
---|---|---|---|
document | Document | The PDF to extract pages from. | |
extractAction | ExtractAction | Extraction settings, i.e. the pages to extract. | |
jobId | String | | [optional] |
jobIdExtern | String | | [optional] |
integrations | [String] | | [optional] |
ExtractAction
Name | Type | Description | Notes |
---|---|---|---|
extractPages | [Integer] | Numbers of the pages to extract from the document; page number 1 is the first page. | [optional] |
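Because page numbers are 1-based, a client can reject out-of-range values before sending a request, for instance by comparing them with pageCount from DocMetadata. This is a minimal sketch with a hypothetical function name; whether the service itself rejects or silently ignores such pages is not stated here. It also removes duplicates, which mirrors the set used in the C# sample.

```python
def validate_extract_pages(extract_pages: list[int], page_count: int) -> list[int]:
    """Check 1-based page numbers against a document's pageCount; return them sorted and unique."""
    if not extract_pages:
        raise ValueError("extractPages is empty")
    # valid pages run from 1 (the first page) to page_count (the last page)
    invalid = [p for p in extract_pages if not 1 <= p <= page_count]
    if invalid:
        raise ValueError(f"pages out of range 1..{page_count}: {invalid}")
    return sorted(set(extract_pages))


if __name__ == "__main__":
    print(validate_extract_pages([4, 1, 4], page_count=10))  # [1, 4]
```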
ExtractRes
Name | Type | Description | Notes |
---|---|---|---|
document | Document | PDF consisting of the extracted pages. | |
Document
Name | Type | Description | Notes |
---|---|---|---|
jobId | String | JobId of the document's working set. | |
documentId | String | Document Id. | |
name | String | Filename including file type. | |
docStatus | String | Status of the document, e.g. Stamped. | |
pages | Page | Description of pages. | |
docData | [byte] | Document bytes. | |
docMetadata | DocMetadata | Document metadata such as title, pageCount, etc. | |
docLogs | DocLog | Logging information about the request, e.g. timestamp. | |
Page
Name | Type | Description | Notes |
---|---|---|---|
documentId | String | Globally unique Id. | |
pageId | String | Globally unique Id. | |
pageNumber | Integer | Page number, starting with 1. | |
rotate | double | By how much the page was rotated from its original orientation. | |
thumbnail | byte | Thumbnail representing this page. | |
sourceDocumentId | String | Id of the document this page was created from; e.g. in an extraction, the result's sourceDocumentId is the Id of the PDF the pages were extracted from. | |
sourcePageNumber | Integer | Page number of this page in the original document; e.g. if document B consists of page 4 of document A (extraction), the sourcePageNumber of B's only page is 4. | |
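The sourcePageNumber example can be made concrete by mapping each page of an extracted document back to its origin. This sketch assumes the pages arrive as a list of dictionaries keyed by the field names in the table above (the table types pages as Page, so the list shape is an assumption); source_page_map is a hypothetical helper.

```python
from typing import Optional


def source_page_map(pages: list[dict], source_document_id: Optional[str] = None) -> dict[int, int]:
    """Map each pageNumber of an extracted document to its sourcePageNumber.

    If source_document_id is given, every page must report it as sourceDocumentId.
    """
    mapping: dict[int, int] = {}
    for page in pages:
        if source_document_id is not None and page.get("sourceDocumentId") != source_document_id:
            raise ValueError(f"page {page.get('pageNumber')} comes from another document")
        mapping[page["pageNumber"]] = page["sourcePageNumber"]
    return mapping


if __name__ == "__main__":
    # The example from the table: document B holds only page 4 of document A.
    pages_of_b = [{"pageNumber": 1, "sourceDocumentId": "A", "sourcePageNumber": 4}]
    print(source_page_map(pages_of_b, source_document_id="A"))  # {1: 4}
```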
DocMetadata
Name | Type | Description | Notes |
---|---|---|---|
title | String | Title of the document. | |
subject | String | Subject of the document. | |
pageCount | long | Number of pages. | |
size | long | Size of the document in bytes. | |
isEncrypted | boolean | Whether the document is encrypted. | |
pdfCompliance | String | PDF compliance level, e.g. PDF/A. | |
isSigned | boolean | Whether the document is signed. | |
uploadedMimeType | String | Uploaded MIME type, e.g. application/bson. | |
uploadedFileSize | long | Uploaded file size. | |
DocLog
Name | Type | Description | Notes |
---|---|---|---|
messageType | String | Message type, e.g. PdfALog. | |
message | String | The message itself, e.g. a warning. | |
timestamp | dateTime | Timestamp. | |
docLogLevel | String | Level of the message. | Supported values: "verbose", "info", "warning", "error", "timing" |
durationMilliseconds | long | Timing for the requested log information, in milliseconds. | |
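With the supported docLogLevel values known, a client can pull the warnings and errors out of docLogs and total the timing entries. This is a minimal sketch assuming docLogs arrives as a list of dictionaries keyed by the field names in the table (the table types docLogs as DocLog, so the list shape is an assumption); both function names are hypothetical.

```python
# docLogLevel values listed as supported in the DocLog table.
SUPPORTED_LEVELS = {"verbose", "info", "warning", "error", "timing"}


def warnings_and_errors(doc_logs: list[dict]) -> list[str]:
    """Return "messageType: message" for every entry logged at level warning or error."""
    lines = []
    for entry in doc_logs:
        level = entry.get("docLogLevel")
        if level not in SUPPORTED_LEVELS:
            raise ValueError(f"unsupported docLogLevel: {level!r}")
        if level in ("warning", "error"):
            lines.append(f"{entry.get('messageType')}: {entry.get('message')}")
    return lines


def total_timing_ms(doc_logs: list[dict]) -> int:
    """Sum durationMilliseconds over the entries logged at level timing."""
    return sum(entry.get("durationMilliseconds", 0)
               for entry in doc_logs if entry.get("docLogLevel") == "timing")
```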