OCR

Optical Character Recognition (OCR) is commonly used to recognize text in scanned documents. Pdf4me OCR can recognize text in scanned documents, images, and similar sources.

Feature            Parameter  Response  Action     Description                                       Links
recognizeDocument  file                 OcrAction  Recognizes text in scanned documents and images.  Swagger, Sample
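The recognizeDocument feature takes a file plus an OcrAction. The sketch below shows one way such a request body could be assembled as JSON. It is a hypothetical illustration, not the documented Pdf4me schema: the keys `document`, `docData`, `name`, `ocrAction` and `outputType` are assumed camelCase forms of the properties used in the C# sample in the Samples section, and the helper name `build_recognize_document_request` is made up. Check the Swagger definition for the real request shape.

```python
import base64
import json


def build_recognize_document_request(file_bytes: bytes, file_name: str,
                                     output_type: str = "pdfSearchable") -> str:
    """Hypothetical helper: serialize a recognizeDocument request to JSON.

    ASSUMPTION: key names mirror the C# sample's properties in camelCase;
    they are not confirmed by the API specification.
    """
    return json.dumps({
        "document": {
            # Binary file content is Base64-encoded so it can be embedded in JSON.
            "docData": base64.b64encode(file_bytes).decode("ascii"),
            "name": file_name,
        },
        "ocrAction": {"outputType": output_type},
    })


if __name__ == "__main__":
    fake_pdf = b"%PDF-1.4"  # stand-in bytes; a real call would read the scanned file
    print(build_recognize_document_request(fake_pdf, "myPdf.pdf"))
```

Encoding the file as Base64 keeps the whole request a single JSON document; a multipart upload would be the usual alternative if the endpoint expects one.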

Samples

RecognizeDocument

C#
using System.IO;

// set up the RecognizeDocument request object
var recognizeDocument = new RecognizeDocument()
{
    // document
    Document = new Document()
    {
        DocData = File.ReadAllBytes("myPdf.pdf"),
        Name = "myPdf.pdf",
    },
    // action
    OcrAction = new OcrAction()
    {
        OutputType = OcrActionOutputType.PdfSearchable
    },
};

// run OCR; the call is asynchronous, so await the result
var res = await Pdf4me.Instance.OcrClient.RecognizeDocumentAsync(recognizeDocument);

// write the structured JSON returned by the OCR call to disk
File.WriteAllText("structuredData.json", res.StructuredDataJson);

Models

OcrAction

Name               Type    Description                                  Notes
stapel             String                                               [Optional]
businesssCardReco  String                                               [Optional]
fullTextSearch     String                                               [Optional]
outputType         enum    Supported values: "undef", "txt", "docx",    [Optional]
                           "xlsx", "pptx", "pdfSearchable", "xml",
                           "rtf", "rtt", "vcf", "json"
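Because every OcrAction field is optional and outputType only accepts the values listed above, a client may want to validate the output type and send only the fields it actually sets. The sketch below does that. It is a hypothetical helper (`build_ocr_action` and `SUPPORTED_OUTPUT_TYPES` are invented names); the field names come from the table, while omitting unset fields and the case-sensitive comparison are assumptions about how the API treats optional values.

```python
from typing import Optional

# Supported outputType values, copied from the OcrAction model table.
SUPPORTED_OUTPUT_TYPES = frozenset({
    "undef", "txt", "docx", "xlsx", "pptx", "pdfSearchable",
    "xml", "rtf", "rtt", "vcf", "json",
})


def build_ocr_action(output_type: Optional[str] = None,
                     stapel: Optional[str] = None,
                     business_card_reco: Optional[str] = None,
                     full_text_search: Optional[str] = None) -> dict:
    """Hypothetical helper: return an OcrAction dict with only the fields that were set.

    ASSUMPTION: unset optional fields are simply omitted from the request,
    and outputType values are matched case-sensitively.
    """
    if output_type is not None and output_type not in SUPPORTED_OUTPUT_TYPES:
        raise ValueError(f"unsupported outputType: {output_type!r}")
    # Keys use the field names exactly as they appear in the model table.
    fields = {
        "stapel": stapel,
        "businesssCardReco": business_card_reco,
        "fullTextSearch": full_text_search,
        "outputType": output_type,
    }
    return {key: value for key, value in fields.items() if value is not None}
```

For example, `build_ocr_action(output_type="pdfSearchable")` returns `{"outputType": "pdfSearchable"}`, while `build_ocr_action(output_type="pdf")` raises `ValueError` because "pdf" is not in the supported list.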
