Convert from PDF
Convert your PDF document to Word, Excel, PowerPoint, and eBook formats. The API also offers an OCR option that produces readable documents from scanned ones.
The following output options are available:
- MS Word
- Excel
- PowerPoint
- PDF with OCR
- Text with OCR
- ePub
- Mobi
Depending on your needs, the quality type can be set to Draft or High.
To check the quality of the result, copy the curl sample and run it from your command line.
Feature | Parameter | Response | Action | Description | Links |
---|---|---|---|---|---|
convertFromPdf | ConvertFromPdf | ConvertFromPdfRes | ConvertFromPdfAction | Convert PDF to .docx, .xlsx, .pptx, .epub and .mobi formats | Swagger Sample |
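When a converted file is written to disk, its extension should match the chosen output format. The sketch below maps the outputFormat values listed under Models onto file extensions. The extensions for Word, Excel, PowerPoint, ePub and Mobi come from the table above; `.pdf` and `.txt` for the two OCR outputs are assumptions, and `output_filename` is a hypothetical helper, not part of the Pdf4me client.

```python
from pathlib import PurePath

# outputFormat value -> file extension.
# ".pdf" and ".txt" are assumptions for the two OCR outputs.
OUTPUT_EXTENSIONS = {
    "DocX": ".docx",
    "Excel": ".xlsx",
    "Pptx": ".pptx",
    "PdfOcr": ".pdf",
    "TextOcr": ".txt",
    "Epub": ".epub",
    "Mobi": ".mobi",
}


def output_filename(input_name: str, output_format: str) -> str:
    """Return input_name with its extension swapped to fit output_format."""
    try:
        extension = OUTPUT_EXTENSIONS[output_format]
    except KeyError:
        raise ValueError(f"unsupported output format: {output_format!r}") from None
    return str(PurePath(input_name).with_suffix(extension))


print(output_filename("myPdf.pdf", "DocX"))  # prints myPdf.docx
```

Keeping the mapping in one dictionary means an unknown format fails early with a clear error instead of producing a file with a misleading extension.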
Samples
ConvertFromPdf
C#
// setup convertFromPdf object
var convertFromPdf = new ConvertFromPdf()
{
    // document
    Document = new Document()
    {
        DocData = File.ReadAllBytes("myPdf.pdf"),
        Name = "myPdf.pdf",
    },
    // action
    ConvertFromPdfAction = new ConvertFromPdfAction()
    {
        OutputFormat = ConvertFromPdfActionOutputFormat.PdfOcr,
        QualityType = ConvertFromPdfActionQualityType.High
    },
};
// conversion
var res = await Pdf4meClient.Pdf4me.Instance.ConvertFromPdfClient.ConvertFromPdfAsync(convertFromPdf);
// extract the generated PDF and write it to disk
byte[] generatedPdf = res.Document.DocData;
File.WriteAllBytes("generatedPdf.pdf", generatedPdf);
Python
import base64

from pdf4me.client.pdf4me_client import Pdf4meClient
from pdf4me.client.convert_client import ConvertClient
from pdf4me.model import ConvertFromPdf, Document, ConvertFromPdfAction
from pdf4me.helper.file_reader import FileReader

# pass your token as the argument
pdf4me_client = Pdf4meClient(token='')
# set up the convert_client
convert_client = ConvertClient(pdf4me_client)
# create the convert_from_pdf object
convert_from_pdf = ConvertFromPdf(
    document=Document(
        doc_data=FileReader().get_file_data('myPdf.pdf'),
        name='myPdf.pdf'
    ),
    convert_from_pdf_action=ConvertFromPdfAction(
        output_format='pdfOcr',
        quality_type='high'
    )
)
# conversion
res = convert_client.convert_from_pdf(convert_from_pdf=convert_from_pdf)
# extract the generated PDF (the document data is returned base64-encoded)
generated_pdf = base64.b64decode(res['document']['doc_data'])
# write it to disk
with open('pdfOcr.pdf', 'wb') as f:
    f.write(generated_pdf)
Models
ConvertFromPdf
Name | Type | Description | Notes |
---|---|---|---|
document | Document | | |
convertFromPdfAction | ConvertFromPdfAction | | |
jobId | String | [optional] | |
jobIdExtern | String | [optional] | |
integrations | [String] | [optional] | |
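To make the shape of the ConvertFromPdf model concrete, here is a minimal sketch that assembles a request body as a plain dictionary. It assumes that the JSON field names mirror the model names in the table above and that the document bytes are sent as base64 text (the Python sample decodes the response with base64, but the request encoding is not stated here). `build_convert_from_pdf_payload` is a hypothetical helper, not part of the Pdf4me client, and the optional fields are left out.

```python
import base64
import json


def build_convert_from_pdf_payload(pdf_bytes, name,
                                   output_format="PdfOcr",
                                   quality_type="High"):
    """Build a dictionary shaped like the ConvertFromPdf model (assumed JSON layout)."""
    return {
        "document": {
            # Assumption: binary content travels as base64 text.
            "docData": base64.b64encode(pdf_bytes).decode("ascii"),
            "name": name,
        },
        "convertFromPdfAction": {
            "outputFormat": output_format,
            "qualityType": quality_type,
        },
    }


# Placeholder bytes stand in for a real PDF file.
payload = build_convert_from_pdf_payload(b"%PDF-1.4 placeholder", "myPdf.pdf")
print(json.dumps(payload, indent=2))
```

In practice the client libraries shown under Samples build this structure for you; the dictionary form is mainly useful for inspecting or logging what is being sent.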
ConvertFromPdfAction
Name | Type | Description | Notes |
---|---|---|---|
outputFormat | enum | Supported Values: "None", "DocX", "Excel", "Pptx", "PdfOcr", "TextOcr", "Epub", "Mobi" | |
qualityType | enum | Supported Values: "Draft", "High" | |
singlePage | Bool | [optional] | |
docAuthor | String | [optional] | |
docTitle | String | [optional] | |
firstPageCover | String | [optional] | |
coverThumbnail | byte[] | [optional] | |
thumbnailExt | String | [optional] | |
device | String | [optional] | |
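Because outputFormat and qualityType only accept the values listed above, it can help to check them before sending a request. The following is a minimal sketch; `check_action` is a hypothetical helper, and the comparison is case-sensitive by assumption. The Python sample passes 'pdfOcr' and 'high', so a given client may expect a different casing than the table shows.

```python
# Supported values, copied from the ConvertFromPdfAction table.
OUTPUT_FORMATS = frozenset(
    {"None", "DocX", "Excel", "Pptx", "PdfOcr", "TextOcr", "Epub", "Mobi"}
)
QUALITY_TYPES = frozenset({"Draft", "High"})


def check_action(output_format: str, quality_type: str) -> None:
    """Raise ValueError if either value is not in the supported sets."""
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(
            f"outputFormat {output_format!r} is not one of {sorted(OUTPUT_FORMATS)}"
        )
    if quality_type not in QUALITY_TYPES:
        raise ValueError(
            f"qualityType {quality_type!r} is not one of {sorted(QUALITY_TYPES)}"
        )


check_action("PdfOcr", "High")  # valid combination, returns None
```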
ConvertFromPdfRes
Name | Type | Description | Notes |
---|---|---|---|
document | Document | Generated document in the requested output format. | |
Document
Name | Type | Description | Notes |
---|---|---|---|
jobId | String | JobId of the document's WorkingSet. | |
documentId | String | Document Id. | |
name | String | Filename including filetype. | |
docStatus | String | Status of the document, e.g. Stamped. | |
pages | Page | Description of pages. | |
docData | [byte] | Document bytes. | |
docMetadata | DocMetadata | Document metadata such as title and pageCount. | |
docLogs | DocLog | Logging information about the request, e.g. timestamp. | |
Page
Name | Type | Description | Notes |
---|---|---|---|
documentId | String | Globally unique Id. | |
pageId | String | Globally unique Id. | |
pageNumber | Integer | PageNumber, starting with 1. | |
rotate | double | By how much the page was rotated from its original orientation. | |
thumbnail | byte | Thumbnail representing this particular page. | |
sourceDocumentId | String | Id of the document it was created from, e.g. in case of an extraction, the result's sourceDocumentId is the Id of the PDF the pages have been extracted from. | |
sourcePageNumber | Integer | Page number of the original page in the original document, e.g. if document B consists of page 4 of document A (extraction), the sourcePageNumber of document B's only page is 4. | |
DocMetadata
Name | Type | Description | Notes |
---|---|---|---|
title | String | Title of document. | |
subject | String | Subject of document. | |
pageCount | long | Number of pages. | |
size | long | Size of the document in bytes. | |
isEncrypted | boolean | Whether the document is encrypted. | |
pdfCompliance | String | PDF compliance, e.g. PDF/A. | |
isSigned | boolean | Whether the document is signed. | |
uploadedMimeType | String | Uploaded MimeType, e.g. application/bson. | |
uploadedFileSize | long | Uploaded file size. | |
DocLog
Name | Type | Description | Notes |
---|---|---|---|
messageType | String | MessageType, e.g. PdfALog. | |
message | String | Message itself, e.g. a warning. | |
timestamp | dateTime | Timestamp. | |
docLogLevel | String | Type of message. | Supported Values: "verbose", "info", "warning", "error", "timing" |
durationMilliseconds | long | Timing for requested log information [ms]. | |
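The docLogLevel values make it possible to pick out only the log entries that need attention. The sketch below filters a list of DocLog entries, represented as plain dictionaries keyed by the field names in the table above. The severity ordering verbose < info < warning < error is an assumption, "timing" entries are excluded because they carry durations rather than a severity, and the sample entries are made up for illustration. The Python client appears to use snake_case keys (for example `doc_data`), so the key names may need adjusting.

```python
# Assumed severity ordering; "timing" is deliberately left out.
LEVEL_ORDER = {"verbose": 0, "info": 1, "warning": 2, "error": 3}


def logs_at_or_above(doc_logs, min_level="warning"):
    """Return the DocLog entries whose docLogLevel is at least min_level."""
    threshold = LEVEL_ORDER[min_level]
    return [
        entry
        for entry in doc_logs
        # Unknown or missing levels (including "timing") rank below everything.
        if LEVEL_ORDER.get(entry.get("docLogLevel"), -1) >= threshold
    ]


# Made-up entries for illustration only.
sample_logs = [
    {"docLogLevel": "info", "message": "conversion started"},
    {"docLogLevel": "warning", "message": "font substituted"},
    {"docLogLevel": "timing", "message": "ocr", "durationMilliseconds": 1200},
    {"docLogLevel": "error", "message": "page could not be read"},
]

for entry in logs_at_or_above(sample_logs):
    print(entry["docLogLevel"], entry["message"])
```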