Fill Broken and Damaged Text - OpenCV - C++14

I am trying to extract the text from an image using OpenCV and Tesseract.
Tesseract needs a clean image to recognize the text accurately, but I am having trouble preprocessing the image with OpenCV.
How can I join the broken strokes so that the text is intact before I pass it to Tesseract?
image1(ORIGINAL IMAGE)
image2(PROCESSED IMAGE).
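
One common way to reconnect broken strokes is a morphological closing, that is, a dilation followed by an erosion. In OpenCV this is available as cv::morphologyEx with cv::MORPH_CLOSE and a kernel from cv::getStructuringElement. The images are not visible here, so whether closing suits this particular damage is an assumption. The sketch below does not use OpenCV; it is a dependency-free illustration of the operation on a small binary grid, assuming text pixels are 1, background is 0, and a 3x3 square kernel. The names Grid, morph3x3 and close3x3 are made up for the example.

```cpp
#include <cassert>
#include <vector>

using Grid = std::vector<std::vector<int>>;  // 1 = text pixel, 0 = background

// Apply a 3x3 square structuring element to a binary grid.
// dilate == true : a pixel becomes 1 if ANY in-bounds neighbour is 1.
// dilate == false: (erosion) a pixel is 1 only if ALL in-bounds neighbours are 1.
// Neighbours outside the image are ignored in both cases.
Grid morph3x3(const Grid& src, bool dilate) {
    const int rows = static_cast<int>(src.size());
    const int cols = rows > 0 ? static_cast<int>(src[0].size()) : 0;
    for (const auto& row : src) {
        assert(static_cast<int>(row.size()) == cols);  // grid must be rectangular
    }
    Grid dst(rows, std::vector<int>(cols, 0));
    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < cols; ++c) {
            bool any = false;
            bool all = true;
            for (int dr = -1; dr <= 1; ++dr) {
                for (int dc = -1; dc <= 1; ++dc) {
                    const int rr = r + dr;
                    const int cc = c + dc;
                    if (rr < 0 || rr >= rows || cc < 0 || cc >= cols) continue;
                    if (src[rr][cc] != 0) any = true; else all = false;
                }
            }
            dst[r][c] = (dilate ? any : all) ? 1 : 0;
        }
    }
    return dst;
}

// Closing = dilation followed by erosion; it fills gaps narrower than the kernel.
Grid close3x3(const Grid& src) {
    return morph3x3(morph3x3(src, true), false);
}
```

A stroke with a one-pixel gap comes out of close3x3 as a continuous stroke of the same thickness. Wider gaps need a larger kernel, and a kernel that is too large will also merge neighbouring characters, so the kernel size has to be tuned on the actual image.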

Related

Check if image contains text

I want to detect and extract text in a natural image, if it contains any, the way Google Vision does. I found a library on GitHub that detects text regions in an image and then runs OCR on them. I want this to be faster: before text detection and extraction, I want to check whether the image contains text at all.
I know I can run OCR on it, but I want something faster than that. If the image contains text, it should be OCRed; if not, the image should be discarded. Any ideas?

Add Background image to 3D Plotly - Python

I have been trying to add an image to my 3D surface plots in Plotly. The image is stored in a local file and I pass the file path as "source" in go.Layout, but the image does not show up and no error is reported.
The approach from "Plotly background images" doesn't work for me either; can someone explain it in more detail?
The final plot could look like this: https://plot.ly/~empet/14397/plotly-plot-of-a-map-from-data-available/#/ where, instead of a JSON file, I just want to add an image.

Dicom image not converting with dcmtk

The image http://www.barre.nom.fr/medical/samples/files/MR-MONO2-16-head.gz on http://www.barre.nom.fr/medical/samples/ does not convert to other image formats. I tried the following commands (after extracting the DICOM file):
dcm2pnm --write-png MR-MONO2-16-head out.png
dcm2pnm +obr MR-MONO2-16-head out.bmp
dcm2pnm MR-MONO2-16-head out.pnm
It also did not work with dcmj2pnm and dcml2pnm. All of them just produce a gray box. The image is otherwise fine and is read correctly by proper DICOM viewer software. Where is the problem and how can it be solved?
The problem is that no windowing settings are present in the header. You need to instruct dcm2pnm to calculate a window from the pixel data (+Wm uses the minimum and maximum pixel values) or specify the windowing values to apply.
dcm2pnm +Wm +obr MR-MONO2-16-head MR-MONO2-16-head.bmp
yields a bitmap image that looks fine to me.
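
To illustrate what a min-max window does, here is a minimal sketch that maps the smallest stored pixel value to 0 and the largest to 255, with a linear ramp in between. It is a simplified illustration of the idea only, not DCMTK's implementation; the function name minMaxWindow is made up, and the sketch assumes unsigned 16-bit samples and ignores rescale slope and intercept.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Linearly map [min, max] of the 16-bit input onto the 8-bit range [0, 255].
std::vector<std::uint8_t> minMaxWindow(const std::vector<std::uint16_t>& pixels) {
    assert(!pixels.empty());
    const auto mm = std::minmax_element(pixels.begin(), pixels.end());
    const double lo = static_cast<double>(*mm.first);
    const double hi = static_cast<double>(*mm.second);

    std::vector<std::uint8_t> out(pixels.size(), 0);
    if (hi == lo) {
        return out;  // flat image: no contrast to stretch, return all zeros
    }
    for (std::size_t i = 0; i < pixels.size(); ++i) {
        const double scaled = (pixels[i] - lo) / (hi - lo) * 255.0;
        out[i] = static_cast<std::uint8_t>(scaled + 0.5);  // round to nearest
    }
    return out;
}
```

With an explicit window the idea is the same, except that the lower and upper bounds come from the chosen center and width instead of from the data.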

Scanned Image/PDF to Searchable Image/PDF

Can anyone suggest how to convert a scanned image into a searchable image, or a scanned PDF into a searchable PDF?
I have been stuck on this for quite a while now.
I have tried the pdfocr application on Ubuntu, but with no success.
Tesseract version 3.03 supports creating a searchable PDF from an image. For a PDF, you can use Ghostscript to convert the pages to images before sending them to Tesseract.
https://github.com/tesseract-ocr/tesseract
Currently, there is no right way of doing this on Ubuntu. All OCR engines output plain text and there is no way to add that text as a hidden layer on PDF over the image text.
Option 1: Use gscan2pdf, which will make you a searchable PDF, but the OCRed text is placed in the top-left corner of the page, is invisible, and is much too small.
Option 2: Use PDF-XChange Viewer, which has an OCR option and works correctly by adding a text layer over the scanned image that lines up with it. You'll have to run it in Wine, because it is a Windows application.

How to subset raster image using gdal?

I have read the pixel values of a raster image using the GDAL libraries in Visual Studio 2010 (VC++).
Next, I have to crop (subset) the image according to the grid given in a shapefile.
Forget about the grid for now.
I just want to clip a square or rectangular area and save it to a new file.
I have read some documents that suggest using gdal_translate and gdalwarp, but as far as I can tell these can only be run from Python, whereas I want to use C++.
Please help me as soon as possible.
I have solved the problem of cropping the image using VC++ with the GDAL libraries. I created a VRTDataset with the size of the area to be cropped and then saved it using CreateCopy().
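
The VRTDataset code itself is not shown above, so the following is only a sketch of the arithmetic behind a rectangular crop, with no GDAL calls. It copies a pixel window out of a row-major band buffer and shifts the origin of a six-element geotransform (in GDAL's coefficient order) so the cropped raster stays georeferenced. The names Window, cropBand and shiftGeoTransform are made up for the example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Pixel window to cut out: offset of the top-left corner plus size.
struct Window {
    int xOff;
    int yOff;
    int xSize;
    int ySize;
};

// Copy the window out of a row-major band of width x height samples.
std::vector<float> cropBand(const std::vector<float>& band, int width, int height,
                            const Window& w) {
    assert(width > 0 && height > 0);
    assert(band.size() == static_cast<std::size_t>(width) * static_cast<std::size_t>(height));
    assert(w.xOff >= 0 && w.yOff >= 0 && w.xSize > 0 && w.ySize > 0);
    assert(w.xOff + w.xSize <= width && w.yOff + w.ySize <= height);

    std::vector<float> out;
    out.reserve(static_cast<std::size_t>(w.xSize) * static_cast<std::size_t>(w.ySize));
    for (int row = 0; row < w.ySize; ++row) {
        const std::ptrdiff_t start =
            static_cast<std::ptrdiff_t>(w.yOff + row) * width + w.xOff;
        out.insert(out.end(), band.begin() + start, band.begin() + start + w.xSize);
    }
    return out;
}

// Geotransform convention: Xgeo = gt[0] + col*gt[1] + row*gt[2],
//                          Ygeo = gt[3] + col*gt[4] + row*gt[5].
// Only the origin (gt[0], gt[3]) moves when cropping; pixel size and rotation stay.
std::array<double, 6> shiftGeoTransform(const std::array<double, 6>& gt, const Window& w) {
    std::array<double, 6> out = gt;
    out[0] = gt[0] + w.xOff * gt[1] + w.yOff * gt[2];
    out[3] = gt[3] + w.xOff * gt[4] + w.yOff * gt[5];
    return out;
}
```

In a real program the band buffer would come from reading the raster and the result would be written to a new dataset together with the shifted geotransform; those steps depend on the GDAL version and driver and are left out here.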
