Creating new eng.tessdata file for custom font in Tesseract giving error - windows

I converted the PDF file to .tiff, which was straightforward:
convert -depth 4 -density 300 -background white +matte eng.arial.pdf eng.arial.tiff
Then I generated a box file from the .tiff:
tesseract eng.arial.tiff eng.arial batch.nochop makebox
Then I fed the .tiff file into Tesseract in training mode:
tesseract eng.arial.tiff eng.arial.box nobatch box.train .stderr
Then I tried to extract the character set used:
unicharset_extractor *.box
But I am getting this error:
unicharset_extractor:./.libs/lt-unicharset_extractor.c:233: FATAL: couldn't find unicharset_extractor.
The same thing happens with mftraining and combine_tessdata.
UPDATE
I ran unicharset_extractor on a single box file and it still doesn't work.
It is not only this command: mftraining, cntraining and combine_tessdata fail the same way.
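The `lt-` prefix in the error suggests that what is being run is a libtool wrapper that cannot locate the real binary, which would explain why every training tool fails in the same way. As a first check, and only as a sketch, the snippet below reports which of the tools named in the question are missing from PATH. `missing_tools` is a hypothetical helper name, not part of Tesseract.

```shell
# Print the name of every given command that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Tool names taken from the question; no output means all four were found.
missing_tools unicharset_extractor mftraining cntraining combine_tessdata
```

If a tool is listed as missing, the training binaries were probably never built or installed, and running the wrapper scripts from the source tree would fail as described.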

Related

Basic ImageMagick disconnect: converting plain ASCII.txt file into image file ASCII.jpg - ending up with errors

I have been trying to convert a plain text file (containing lines of ASCII text) into a JPEG image. I don't care about formats or styles; this is just a first try and I want quick results.
After half a day of trial and error in all variations, all failing with the error below (or similar ones), and nearly going mad, I have worked around the following error message:
...
convert-im6.q16: not authorized `@a.txt' @ error/property.c/InterpretImageProperties/3516.
I got the desired result when I deactivated the policy file by renaming
/etc/ImageMagick-6/policy.xml
to
/etc/ImageMagick-6/_policy.xml
Can somebody please explain what this (now deactivated) file does?
And, since this file should surely be re-activated as it is by default, what do I need to change in the file instead?
Thanks in advance, best regards.
If you are not on a shared server, you can edit your ImageMagick policy.xml file to give read|write permissions for the use of "@".
I suspect the defaults are still "none" for the use of "@" even if you disable the policy.xml file.
Then you can read a text file (in this case ipsum_lorem.txt):
convert -size 1000x -font arial -pointsize 28 caption:"@ipsum_lorem.txt" x.jpg
or
cat ipsum_lorem.txt | convert -size 1000x -font arial -pointsize 28 caption:"@-" x.jpg
Both commands should render the text into x.jpg.
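To see which rule is blocking the read, it can help to look at the policy lines that cover the "@" (read-from-file) prefix before editing anything. The sketch below is an assumption-laden example: `at_policy_lines` is a hypothetical helper name, the default path is the one from the question, and the sample line in the comment is what a stock Debian/Ubuntu policy file typically contains, which may differ on other systems.

```shell
# Print the policy lines (with line numbers) that govern the "@" prefix.
# Pass a different policy file as $1 if yours lives elsewhere.
at_policy_lines() {
    grep -nF 'pattern="@*"' "${1:-/etc/ImageMagick-6/policy.xml}"
}

# Typically this shows a line such as (assumption, not verified for your system):
#   <policy domain="path" rights="none" pattern="@*"/>
# Changing rights="none" to rights="read|write" on that line is the edit
# the answer describes.
if [ -r /etc/ImageMagick-6/policy.xml ]; then
    at_policy_lines || echo 'no "@*" rule found'
fi
```

Editing only that one rule keeps the rest of the policy file active, which is less drastic than renaming the whole file.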

GhostScript generating damaged tiff file

I am trying to generate a TIFF file from a PDF using Ghostscript.
I am using this command:
gs -dSAFER -dBATCH -dNOPAUSE -dFirstPage=2 -dLastPage=2 -r450x635 -sDEVICE=tiffgray -sCompression=lzw -sOutputFile=test2_local.tif foil.pdf
The TIFF file is generated, but it will not open and I get a message that the file is damaged. If I reduce the resolution it works fine, and it also works if I remove the compression.
But I cannot remove the compression, because the generated file is then 174 MB. I am using GS 9.22 on macOS.

cygwin file not found on network windows drive, possible issue with spaces

To keep my issue simple: I'm using Cygwin and have two Windows computers on a network sharing data. I'm trying to add a logo to some photos and am having a tough time getting Cygwin to find the logo file in this code:
for f in *.jpg; do
convert "$f" \
-gravity southeast \
-draw 'image over 0,0 0,0 "//deepfrogphotopc/L Xotio Passport/temp/logo2.png"' \
-write //deepfrogphotopc/L Xotio Passport/temp/og-rotate-logo/"$f" \
-resize 1200x1200 \
//deepfrogphotopc/L Xotio Passport/temp/1200px/"$f"
done
I have no problem when using cd to get to this location with Cygwin:
//deepfrogphotopc/L Xotio Passport/temp/
But when I use the loop above to watermark the files, Cygwin can't find the logo file, and I assume it won't write to the correct directories afterwards either, because of the same problem. I actually got it working using
../logo2.png
but I really want to use the full address as I'm trying to. Any help with this is very much appreciated!
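In the loop from the question, the `-write` target and the final output path contain unquoted spaces, so the shell splits each of them into several arguments. A minimal sketch of a fully quoted version follows. It assumes the share is reachable at the path given in the question, and it swaps `-draw 'image over ...'` for `-composite` so that the logo path needs no nested quoting, which should give a similar bottom-right placement. The variable names `base` and `logo` are illustrative.

```shell
# Assumed layout, taken from the paths in the question.
base='//deepfrogphotopc/L Xotio Passport/temp'
logo="$base/logo2.png"

for f in *.jpg; do
    # Skip the literal "*.jpg" that the glob leaves behind when nothing matches.
    [ -e "$f" ] || continue
    # Overlay the logo bottom-right, save the full-size copy, then a resized copy.
    convert "$f" "$logo" \
        -gravity southeast -composite \
        -write "$base/og-rotate-logo/$f" \
        -resize 1200x1200 \
        "$base/1200px/$f"
done
```

Keeping the share path in one quoted variable means the space in "L Xotio Passport" only has to be handled in a single place.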

How to convert jp2000 to geotiff using GDAL?

I'm trying to convert Sentinel-2 imagery from JPEG 2000 (.jp2) format to GeoTIFF using gdal_translate. However, the .jp2 format does not appear to be recognized. What method should I use to convert JPEG 2000 to GeoTIFF?
$ gdal_translate B02.jp2 B02.tif
ERROR 4: `B02.jp2' not recognised as a supported file format.
GDALOpen failed - 4
`B02.jp2' not recognised as a supported file format.
If you are on macOS and want JPEG 2000 support in GDAL, one option is to install it with Homebrew like this:
brew install gdal --with-complete
Then the JPEG 2000 driver shows up in the list of formats:
gdalinfo --formats | grep -i jp
JPEG (rwv): JPEG JFIF
JPEG2000 (rwv): JPEG-2000 part 1 (ISO/IEC 15444-1)
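When `gdal_translate` reports that a .jp2 file is not recognised, it may help to confirm which JPEG 2000 drivers the installed GDAL build actually has. The sketch below filters the `gdalinfo --formats` listing for driver names that start with `JP2` (some builds ship drivers such as `JP2OpenJPEG` or `JP2KAK`) or that equal `JPEG2000`. `jp2_drivers` is a hypothetical helper name, not a GDAL command.

```shell
# Reduce `gdalinfo --formats` output (read from stdin) to JPEG 2000 driver names.
jp2_drivers() {
    grep -iE '^ *(JP2[A-Za-z]*|JPEG2000) ' | awk '{print $1}'
}

# Only query GDAL if it is installed; prints one driver name per line.
if command -v gdalinfo >/dev/null 2>&1; then
    gdalinfo --formats | jp2_drivers
fi
```

An empty result would suggest that the build has no JPEG 2000 driver at all, which matches the "not recognised as a supported file format" error in the question.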
The error occurs because the JasPer-based JP2 driver in GDAL cannot handle large .jp2 files, and there is no easy way to switch GDAL to a different JP2 driver.
One workaround is to install Kakadu from:
http://kakadusoftware.com/downloads/
Then convert your large JPEG 2000 file to GeoTIFF using Kakadu:
kdu_expand -i input.JP2 -o output.tif -num_threads 4
You can then use your GDAL functions on the converted GeoTIFF.

jpg won't optimize (jpegtran, jpegoptim)

I have an image and it's a jpg.
I tried running through jpegtran with the following command:
$ jpegtran -copy none -optimize image.jpg > out.jpg
The file is written, but the image seems unmodified (no change in size).
I tried jpegoptim:
$ jpegoptim image.jpg
image.jpg 4475x2984 24bit P JFIF [OK] 1679488 --> 1679488 bytes (0.00%), skipped.
I get the same result when I use --force with jpegoptim, except that it reports the file as optimized even though the file size does not change.
Here is the image in question: http://i.imgur.com/NAuigj0.jpg
I can't seem to get it to work with any other JPEGs I have either (though I have only tried a couple).
Am I doing something wrong?
I downloaded your image from imgur, but its size is 189,056 bytes. Is it possible that imgur did something to your image?
Anyway, I managed to optimize it losslessly to 165,920 bytes using Leanify (I'm the author).
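To put the numbers above in the same terms as the percentage jpegoptim prints, the reduction can be computed from the two byte counts. This is only a small sketch; `pct_saved` is a hypothetical helper name.

```shell
# Print the size reduction from $1 bytes to $2 bytes as a percentage.
pct_saved() {
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.1f\n", (before - after) * 100 / before }'
}

# Sizes quoted in the answer: 189,056 bytes before, 165,920 bytes after.
pct_saved 189056 165920
# For local files, the byte counts can come from wc, for example:
#   pct_saved "$(wc -c < image.jpg)" "$(wc -c < out.jpg)"
```

For the sizes in the answer this works out to roughly a 12% reduction, compared with the 0.00% that jpegoptim reported in the question.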
