Convert from PDF

Convert your PDF document to Word, Excel, PowerPoint, and eBook formats. The API also provides an OCR option that produces a readable document from scanned documents.

The following output options are available:

  • MS Word
  • Excel
  • PowerPoint
  • PDF with OCR
  • Text with OCR
  • ePub
  • Mobi

Depending on your needs, the quality type can be set to Draft or High.

To see how good the result is, just copy and paste the curl sample and run it from your command line.

Feature Parameter Response Action Description Links
convertFromPdf ConvertFromPdf ConvertFromPdfRes ConvertFromPdfAction Convert PDF to .docx, .xlsx, .pptx, .epub and .mobi formats. Swagger, Sample

Samples

ConvertFromPdf

C#

// setup convertFromPdf object
var convertFromPdf = new ConvertFromPdf()
{
    // document
    Document = new Document()
    {
        DocData = File.ReadAllBytes("myPdf.pdf"),
        Name = "myPdf.pdf",
    },
    // action
    ConvertFromPdfAction = new ConvertFromPdfAction()
    {
        OutputFormat = ConvertFromPdfActionOutputFormat.PdfOcr,
        QualityType = ConvertFromPdfActionQualityType.High
    },
};

// conversion
var res = await Pdf4meClient.Pdf4me.Instance.ConvertFromPdfClient.ConvertFromPdfAsync(convertFromPdf);

// extract the generated PDF and write it to disk
byte[] generatedPdf = res.Document.DocData;
File.WriteAllBytes("generatedPdf.pdf", generatedPdf);

Python

import base64

from pdf4me.client.pdf4me_client import Pdf4meClient
from pdf4me.client.convert_client import ConvertClient
from pdf4me.model import ConvertFromPdf, Document, ConvertFromPdfAction
from pdf4me.helper.file_reader import FileReader

# pass your API token as argument
pdf4me_client = Pdf4meClient(token='')

# set up the convert_client
convert_client = ConvertClient(pdf4me_client)

# create the convert_from_pdf object
convert_from_pdf = ConvertFromPdf(
    document=Document(
        doc_data=FileReader().get_file_data('myPdf.pdf'),
        name='myPdf.pdf'
    ),
    convert_from_pdf_action=ConvertFromPdfAction(
        output_format='pdfOcr',
        quality_type='high'
    )
)

# conversion
res = convert_client.convert_from_pdf(convert_from_pdf=convert_from_pdf)

# extract the generated PDF (returned as base64 text)
generated_pdf = base64.b64decode(res['document']['doc_data'])
# write it to disk
with open('pdfOcr.pdf', 'wb') as f:
    f.write(generated_pdf)
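
The decode-and-write step at the end of the Python sample can be factored into a small helper. This is a minimal sketch, assuming the response is a dict shaped like the one in the sample (`res['document']['doc_data']` holding base64 text). `save_document` is a hypothetical name and not part of the pdf4me client; the demo at the bottom uses a stand-in response so that it runs without an API token.

```python
import base64
import os
import tempfile
from pathlib import Path


def save_document(res: dict, path: str) -> int:
    """Decode res['document']['doc_data'] (base64 text) and write it to path.

    Hypothetical helper, not part of the pdf4me client. It assumes the
    response shape used in the Python sample. Returns the bytes written.
    """
    encoded = res['document']['doc_data']
    data = base64.b64decode(encoded)
    Path(path).write_bytes(data)
    return len(data)


if __name__ == '__main__':
    # Stand-in for a real conversion response (assumption for the demo only).
    fake_res = {
        'document': {
            'doc_data': base64.b64encode(b'%PDF-1.4 demo').decode('ascii'),
        }
    }
    target = os.path.join(tempfile.mkdtemp(), 'pdfOcr.pdf')
    print(save_document(fake_res, target), 'bytes written to', target)
```

With a real response, the call would be `save_document(res, 'pdfOcr.pdf')` after the conversion step.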

Models

ConvertFromPdf

Name Type Description Notes
document Document
convertFromPdfAction ConvertFromPdfAction
jobId String [optional]
jobIdExtern String [optional]
integrations [String] [optional]
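
To make the shape of the ConvertFromPdf model concrete, the sketch below builds a request body as JSON from the field names in the tables on this page. It is an illustration under stated assumptions: that the binary docData is base64-encoded for JSON transport, and that the optional fields can be omitted. `build_convert_from_pdf_body` is a hypothetical helper, not a pdf4me function.

```python
import base64
import json


def build_convert_from_pdf_body(pdf_bytes: bytes, name: str,
                                output_format: str = 'DocX',
                                quality_type: str = 'Draft') -> str:
    """Return a JSON string that mirrors the ConvertFromPdf model fields.

    Assumption: docData travels as base64 text. The optional fields
    (jobId, jobIdExtern, integrations) are left out.
    """
    body = {
        'document': {
            'docData': base64.b64encode(pdf_bytes).decode('ascii'),
            'name': name,
        },
        'convertFromPdfAction': {
            'outputFormat': output_format,
            'qualityType': quality_type,
        },
    }
    return json.dumps(body)


if __name__ == '__main__':
    # Placeholder bytes instead of a real PDF file.
    print(build_convert_from_pdf_body(b'%PDF-1.4 demo', 'myPdf.pdf', 'PdfOcr', 'High'))
```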

ConvertFromPdfAction

Name Type Description Notes
outputFormat enum Supported values: "None", "DocX", "Excel", "Pptx", "PdfOcr", "TextOcr", "Epub", "Mobi"
qualityType enum Supported values: "Draft", "High"
singlePage Bool [optional]
docAuthor String [optional]
docTitle String [optional]
firstPageCover String [optional]
coverThumbnail byte[] [optional]
thumbnailExt String [optional]
device String [optional]
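
Since the output format decides what kind of file comes back, it can help to validate the two enum values and derive an output file name before sending a request. The sketch below is one way to do that. The extensions for DocX, Excel, Pptx, Epub and Mobi follow the formats named on this page; `.pdf` for PdfOcr and `.txt` for TextOcr are assumptions, and "None" is treated as invalid here because it names no target format. Matching is case-insensitive because the Python sample passes 'pdfOcr' and 'high' while the table lists "PdfOcr" and "High". `output_filename` is a hypothetical helper.

```python
import os

# Keys are lower-cased enum values from the ConvertFromPdfAction table.
OUTPUT_EXTENSIONS = {
    'docx': '.docx',
    'excel': '.xlsx',
    'pptx': '.pptx',
    'pdfocr': '.pdf',    # assumption: OCR result is a PDF file
    'textocr': '.txt',   # assumption: OCR text is plain text
    'epub': '.epub',
    'mobi': '.mobi',
}
QUALITY_TYPES = {'draft', 'high'}


def output_filename(source_name: str, output_format: str,
                    quality_type: str = 'Draft') -> str:
    """Validate the enum values and return a file name for the result."""
    fmt = output_format.lower()
    if fmt not in OUTPUT_EXTENSIONS:
        raise ValueError(f'unsupported output format: {output_format!r}')
    if quality_type.lower() not in QUALITY_TYPES:
        raise ValueError(f'unsupported quality type: {quality_type!r}')
    stem, _ = os.path.splitext(source_name)
    return stem + OUTPUT_EXTENSIONS[fmt]


if __name__ == '__main__':
    print(output_filename('myPdf.pdf', 'DocX'))
    print(output_filename('myPdf.pdf', 'pdfOcr', 'high'))
```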

ConvertFromPdfRes

Name Type Description Notes
document Document Generated document in the requested output format.

Document

Name Type Description Notes
jobId String JobId of Documents WorkingSet.
documentId String Document Id
name String Filename including filetype.
docStatus String Status of the Document, e.g. Stamped.
pages Page Description of pages.
docData [byte] Document bytes.
docMetadata DocMetadata Document metadata such as title, pageCount, etc.
docLogs DocLog Logging information about the request, e.g. timestamp.

Page

Name Type Description Notes
documentId String Globally unique Id.
pageId String Globally unique Id.
pageNumber Integer PageNumber, starting with 1.
rotate double By how much the page was rotated from its original orientation.
thumbnail byte Thumbnail representing this particular page.
sourceDocumentId String Id of the document it was created from, e.g. in case of an extraction, the result's sourceDocumentId is the Id of the PDF the pages have been extracted from.
sourcePageNumber Integer Page number of the original page in the original document. E.g. if document B consists of page number 4 of document A (extraction), the sourcePageNumber of document B's only page is 4.
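
The relationship between pageNumber and sourcePageNumber can be illustrated with a small worked example. The sketch below only models the bookkeeping described in the Page table; it does not call the API, and `extracted_pages` is a hypothetical helper with made-up document ids.

```python
def extracted_pages(source_document_id: str, new_document_id: str,
                    source_page_numbers):
    """Build Page-like records for a document made of extracted pages.

    pageNumber restarts at 1 in the new document, while sourcePageNumber
    keeps the page's position in the document it was extracted from.
    """
    pages = []
    for new_number, source_number in enumerate(source_page_numbers, start=1):
        pages.append({
            'documentId': new_document_id,
            'pageNumber': new_number,
            'sourceDocumentId': source_document_id,
            'sourcePageNumber': source_number,
        })
    return pages


if __name__ == '__main__':
    # Document B consists of page 4 of document A.
    for page in extracted_pages('A', 'B', [4]):
        print(page)
```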

DocMetadata

Name Type Description Notes
title String Title of document.
subject String Subject of document.
pageCount long Number of pages.
size long Size of the document in bytes.
isEncrypted boolean Whether the document is encrypted.
pdfCompliance String PDF compliance, e.g. PDF/A.
isSigned boolean Whether the document is signed.
uploadedMimeType String Uploaded MimeType, e.g. application/bson.
uploadedFileSize long Uploaded file size.

DocLog

Name Type Description Notes
messageType String MessageType, e.g. PdfALog.
message String Message itself, e.g. a warning.
timestamp dateTime Timestamp.
docLogLevel String Type of message. Supported values: "verbose", "info", "warning", "error", "timing"
durationMilliseconds long Timing for requested log information [ms].
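
After a conversion, the log entries can be scanned for warnings and errors. The following is a rough sketch, assuming the logs arrive as a list of dicts with snake_case keys (`doc_log_level`, `message`), by analogy with `doc_data` in the Python sample. Neither the list shape nor the key names are confirmed on this page, and `collect_problems` is a hypothetical helper.

```python
def collect_problems(doc_logs, levels=('warning', 'error')):
    """Return 'level: message' strings for entries at the given levels.

    Assumption: each entry is a dict with 'doc_log_level' and 'message'.
    """
    wanted = {level.lower() for level in levels}
    problems = []
    for entry in doc_logs:
        level = str(entry.get('doc_log_level', '')).lower()
        if level in wanted:
            problems.append(f"{level}: {entry.get('message', '')}")
    return problems


if __name__ == '__main__':
    # Made-up log entries for illustration only.
    sample_logs = [
        {'doc_log_level': 'info', 'message': 'conversion started'},
        {'doc_log_level': 'warning', 'message': 'font substituted'},
        {'doc_log_level': 'timing', 'message': 'ocr', 'duration_milliseconds': 1200},
    ]
    for line in collect_problems(sample_logs):
        print(line)
```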
