OCR
Optical Character Recognition (OCR) is commonly used to recognize text in scanned documents. Pdf4me OCR can recognize text in scanned documents, images, etc.
Feature | Parameter | Response | Action | Description | Links |
recognizeDocument | file | | OcrAction | Recognizes text in scanned documents and images, for example to produce a searchable PDF. | Swagger Sample |
Samples
RecognizeDocument
C#
// run inside an async method; requires: using System.IO;
// setup recognizeDocument object
var recognizeDocument = new RecognizeDocument()
{
    // document
    Document = new Document()
    {
        DocData = File.ReadAllBytes("myPdf.pdf"),
        Name = "myPdf.pdf",
    },
    // action
    OcrAction = new OcrAction()
    {
        OutputType = OcrActionOutputType.PdfSearchable
    },
};
// recognition (await the call so that res is the response, not the pending Task)
var res = await Pdf4me.Instance.OcrClient.RecognizeDocumentAsync(recognizeDocument);
// extract the json and write it to disk (JSON text, so a .json file rather than a .pdf)
File.WriteAllText("recognizedData.json", res.StructuredDataJson);