I have had success with the BSD-licensed Linux port of the Cuneiform OCR system. No binary packages seem to be available, so you need to build it from source. Be sure to have the ImageMagick C++ libraries installed so that it supports essentially any input image format (otherwise it will only accept BMP). While it appears to be essentially undocumented apart from a brief README file, I've found the OCR results quite good. The nice thing about Cuneiform is that it can output position information for the OCR text in hOCR format, which makes it possible to put the text back in the correct position in a hidden layer of a PDF file.

I have used hocr2pdf to recreate PDFs from the original image-only PDFs and the OCR results. This way you can create "searchable" PDFs from which you can copy text.
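As a rough sketch of the two-step workflow described here, the Python snippet below first runs Cuneiform to produce hOCR and then feeds that hOCR to hocr2pdf together with the page image. The function names, file names and the `eng` language default are my own placeholders, and the command-line flags are written from memory of each tool's usage text, so please check them against `cuneiform --help` and `hocr2pdf --help` on your own build before relying on this.

```python
import subprocess
from pathlib import Path


def build_commands(image, pdf, language="eng"):
    """Return the cuneiform and hocr2pdf argument lists plus the hOCR path.

    The flags are assumptions based on the tools' usage text:
    cuneiform -l LANG -f hocr -o OUT IMAGE, and hocr2pdf -i IMAGE -o PDF
    with the hOCR markup supplied on standard input.
    """
    image = Path(image)
    hocr = image.with_suffix(".hocr")  # hypothetical name for the intermediate file
    ocr_cmd = ["cuneiform", "-l", language, "-f", "hocr", "-o", str(hocr), str(image)]
    pdf_cmd = ["hocr2pdf", "-i", str(image), "-o", str(pdf)]
    return ocr_cmd, pdf_cmd, hocr


def make_searchable_pdf(image, pdf, language="eng"):
    """Run OCR on one page image and write a PDF with a hidden text layer."""
    ocr_cmd, pdf_cmd, hocr = build_commands(image, pdf, language)
    # Step 1: OCR the image and write the recognised text with positions as hOCR.
    subprocess.run(ocr_cmd, check=True)
    # Step 2: combine the image and the hOCR (read from stdin) into a PDF.
    with open(hocr, "rb") as hocr_file:
        subprocess.run(pdf_cmd, stdin=hocr_file, check=True)
```

Calling `make_searchable_pdf("page.bmp", "page.pdf")` would process a single page. For a multi-page image-only PDF you would first need to extract each page as an image, which this sketch does not cover.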