mftraining gives Warning: no protos/configs for F in CreateIntTemplates() - windows

EDIT: mftraining gives the warning in the title for all the characters in the unicharset (so not just F, but also a, b, c, d, etc.). How do I create these protos/configs?
I'm following this tutorial
Previous question that is now solved:-
Error:Assert failed:in file ....\classify\trainingsampleset.cpp, line 622 / Segmentation Fault
This is the entire command + output:-
C:\training>mftraining -F font_properties -U unicharset -O eng.unicharset eng.impact.box.tr
Warning: No shape table file present: shapetable
Reading eng.impact.box.tr ...
Font id = -1/0, class id = 1/103 on sample 0
font_id >= 0 && font_id < font_id_map_.SparseSize():Error:Assert failed:in file....\classify\trainingsampleset.cpp, line 622
I've looked through everything I could find on this (which wasn't much as it is) and can't figure out what the problem is or what would make it work.
I also tried the shapeclustering command, but that gives me the same error.
Also, when I run these on cygwin, it displays Segmentation Fault instead of the assertion error.

I was having the same problem, and it was indeed a problem with font_properties. However, in my case, it was solved by making sure that the font in font_properties matched exactly the font name in the .tr file. In my case, that was [fontname].exp0.

I had the same problem as you.
It's because the font_properties file is not formatted correctly.
Each line of the font_properties file is formatted as follows:
fontname italic bold fixed serif fraktur
Here the first field should be just the font name.
When I changed the line from lang.fontname.exp0 0 0 0 0 0 to fontname 0 0 0 0 0, my problem was fixed.

I have found two possible causes of this problem.
Possible cause 1: incorrect font_properties
The font_properties file should contain the content described at:
https://github.com/tesseract-ocr/tesseract/wiki/Training-Tesseract-3.00%E2%80%933.02#font_properties-new-in-301
and the file encoding should meet the requirements of:
https://github.com/tesseract-ocr/tesseract/wiki/Training-Tesseract-3.00%E2%80%933.02#requirements-for-text-input-files
This is the most common answer on the Internet.
(Also make sure you specify the font in font_properties and not the language.)
Possible cause 2: wrong training file name
However I found that trying to fix font_properties didn't work for me, and discovered another cause that gave the same error in my case.
The .tr file names must follow this format:
<language>.<fontname>.exp<num>.tr
and not:
<language>.<fontname>.exp<num>.box.tr
(as is seen in some tutorials)
So in my case, this will NOT work:
tesseract eng.unknown.exp1.png eng.unknown.exp1.box nobatch box.train
unicharset_extractor eng.unknown.exp1.box
mftraining -F font_properties -U unicharset -O eng.unicharset eng.unknown.exp1.box.tr
whereas this small change does work:
tesseract eng.unknown.exp1.png eng.unknown.exp1 nobatch box.train
unicharset_extractor eng.unknown.exp1.box
mftraining -F font_properties -U unicharset -O eng.unicharset eng.unknown.exp1.tr
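The naming rule above can be checked before running mftraining. This is only a minimal Python sketch, not part of Tesseract: the function name parse_tr_name is hypothetical, and it assumes neither the language nor the font name contains a dot.

```python
import re

# <language>.<fontname>.exp<num>.tr
# Assumption: no dots inside the language code or the font name.
TR_NAME = re.compile(r"(?P<lang>[^.]+)\.(?P<font>[^.]+)\.exp(?P<num>\d+)\.tr")


def parse_tr_name(filename):
    """Return (lang, font, num) if the name follows the pattern, else None."""
    match = TR_NAME.fullmatch(filename)
    if match is None:
        return None
    return match.group("lang"), match.group("font"), int(match.group("num"))
```

With this check, eng.unknown.exp1.tr is accepted, while eng.unknown.exp1.box.tr and the eng.impact.box.tr from the question are rejected.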

You are missing the shapeclustering step, which is new in Tesseract 3.02 training.

I had the same issue and changing
fontname 0 0 0 0 0
to
fontname.exp0 0 0 0 0 0
to match the font name in the .tr file fixed it.

I had the same issue, and changing font_properties as follows fixed it:
from -
batangche 1 0 0 0 0
to -
batangche.exp0 1 0 0 0 0

In my case the font name in the font_properties file was uppercase, whereas the font name in the .tr file was lowercase. Changing them to the same case solved the problem.
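The answers in this thread all come down to the name in font_properties not being identical to the font name used in the .tr file (extra or missing .exp0, a language prefix, or different case). A rough Python sketch for spotting these near misses follows. The function names are hypothetical, and you have to supply the font name as it appears in your .tr file yourself.

```python
def parse_font_properties(text):
    """Map each font name to its five flags: italic bold fixed serif fraktur."""
    fonts = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 6:  # name plus five 0/1 flags
            fonts[fields[0]] = [int(flag) for flag in fields[1:]]
    return fonts


def diagnose(tr_font_name, fonts):
    """Describe how tr_font_name relates to the entries in font_properties."""
    if tr_font_name in fonts:
        return "exact match"
    for name in fonts:
        if name.lower() == tr_font_name.lower():
            return f"case mismatch: {name}"
    for name in fonts:
        # e.g. "fontname" vs "fontname.exp0" or "lang.fontname.exp0"
        if tr_font_name in name.split(".") or name in tr_font_name.split("."):
            return f"prefix/suffix mismatch: {name}"
    return "no entry"
```

For example, with a font_properties line of batangche 1 0 0 0 0 and a .tr font name of batangche.exp0, diagnose reports a prefix/suffix mismatch, which is the case fixed in one of the answers above.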


Firefox does not render .svg properly

I manage a blog where I use .svg files as illustrations, you can see a live example here: https://salarship.com/article/dress-fast-food-job-interview/
The problem is that the .svg files do not render properly on Firefox. Here is how the image looks on Firefox and how it looks on other browsers. Here is the raw file of the image: https://salarship.com/wp-content/uploads/2022/01/wear-job-interview-fast-food.svg
Is it a problem with this particular file? How can I fix it? I have hundreds of articles with this problem; is there a way to fix each image relatively quickly?
There is a path in there with the following command sequence (shortened for clarity):
... v 0 a 0.25,0.25 0 0 0 0,0.07 0.19,0.19 0 0 1 0,0.07 8.510071e11,8.510071e11 0 0 1 0,0.18 ...
The arc command with radius 8.510071e11 seems to throw Firefox off. That is a bug.
Since an arc with such a large radius is effectively straight anyway, the sequence can be changed so that a v (vertical line) command is used instead:
... v 0 a 0.25,0.25 0 0 0 0,0.07 0.19,0.19 0 0 1 0,0.07 v 0.18 ...
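Since there are hundreds of files, the same replacement could be scripted. Below is a rough Python sketch that works on the d attribute of a path; it is not a full SVG path parser. The function name and the max_radius threshold are assumptions, and it assumes the arc arguments are separated by spaces or commas (compact flag notation such as "011" is left untouched). It emits a general line command (l dx,dy) rather than v, which is equivalent here because dx is 0.

```python
import re

NUMBER = r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?"
# An arc command letter followed by its run of numeric arguments.
ARC_RUN = re.compile(r"([aA])((?:[\s,]*" + NUMBER + r")+)")


def flatten_huge_arcs(d, max_radius=1e6):
    """Replace arc segments whose radius exceeds max_radius with line segments."""

    def fix_run(match):
        cmd = match.group(1)
        nums = re.findall(NUMBER, match.group(2))
        if len(nums) % 7 != 0:
            return match.group(0)  # unexpected layout: leave as is
        line_cmd = "l" if cmd == "a" else "L"
        out = []
        for i in range(0, len(nums), 7):
            rx, ry, rot, large, sweep, x, y = nums[i:i + 7]
            if max(abs(float(rx)), abs(float(ry))) > max_radius:
                out.append(f"{line_cmd} {x},{y}")
            else:
                out.append(f"{cmd} {rx},{ry} {rot} {large} {sweep} {x},{y}")
        return " ".join(out)

    return ARC_RUN.sub(fix_run, d)
```

Applied to the sequence above, the first two arcs are kept and the third becomes l 0,0.18. Check a few converted images by eye before running it over everything.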

Faster way of Appending/combining thousands (42000) of netCDF files in NCO

I seem to be having trouble properly combining thousands of netCDF files (42000+, 3 GB in total for this particular folder/variable). The main variable that I want to combine has a structure of (6, 127, 118), i.e. (time, lat, lon).
I'm appending the files one by one since the list of files is too long.
I have tried:
for i in input_source/**/**/*.nc; do ncrcat -A -h append_output.nc $i append_output.nc ; done
but this method seems to be really slow (order of kb/s and seems to be getting slower as more files are appended) and is also giving a warning:
ncrcat: WARNING Intra-file non-monotonicity. Record coordinate "forecast_period" does not monotonically increase between (input file file1.nc record indices: 17, 18) (output file file1.nc record indices 17, 18) record coordinate values 6.000000, 1.000000
The result is that the variable "forecast_period" simply repeats 1-6 n times, where n = 42000 files, i.e. [1,2,3,4,5,6,1,2,3,4,5,6......n].
Despite this warning I can still open the file, and ncrcat does what it's supposed to. It is just slow, at least with this particular method.
I have also tried adding in the option:
--no_tmp_fl
but this gives an error:
ERROR: nco__open() unable to open file "append_output.nc"
full error attached below
If it helps, I'm using WSL and Ubuntu on Windows 10.
I'm new to bash and any comments would be much appreciated.
Either of these commands should work:
ncrcat --no_tmp_fl -h *.nc append_output.nc
or
ls input_source/**/**/*.nc | ncrcat --no_tmp_fl -h append_output.nc
Your original command is slow because it opens and closes the output file N times. These commands open it once, fill it up, then close it.
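If you would rather drive this from a script, the second command can be sketched in Python: collect the input names once, then pass them to a single ncrcat process on stdin. The function names are hypothetical, and only the flags already shown above are used. Sorting by name puts the files in time order only if your file names encode the time.

```python
import subprocess


def build_ncrcat_job(input_paths, output):
    """Return (argv, stdin_text) for one ncrcat run over all inputs."""
    # Leave the output file out of the input list in case the glob picked it up.
    inputs = sorted(str(p) for p in input_paths if str(p) != str(output))
    argv = ["ncrcat", "--no_tmp_fl", "-h", str(output)]
    return argv, "\n".join(inputs) + "\n"


def run_job(argv, stdin_text):
    """Run ncrcat once, feeding it the list of input files on stdin."""
    return subprocess.run(argv, input=stdin_text, text=True, check=True)
```

The input list could come from something like pathlib.Path("input_source").rglob("*.nc").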
I would use CDO for this task. Given the huge number of files it is recommended to first sort them on time (assuming you want to merge them along the time axis). After that, you can use
cdo cat *.nc outfile

Ghostscript 'offending input'

When searching for an occurrence of text in a PostScript file, I receive the following error:
gsapi_run_string_continue returns -21
The API documentation specifies that return codes < 0 are "Error" but doesn't describe it any more specifically. Full error console output below - error occurs twice identically, only one occurrence displayed here.
GPL Ghostscript 9.15 (2014-09-22)
Copyright (C) 2014 Artifex Software, Inc. All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
Displaying DSC file C:/Users/c-toothm/Desktop/PRDFlow12_30_2014_050307/1230ouptut.ps
Displaying page 1
%%[ ProductName: GPL Ghostscript ]%%
%%[ LastPage ]%%
Extracting text using pstotext...
Ghostscript returns error code -21
--- Begin offending input ---
evice /pop , d
initmatrix [1 0 0 1 0 0] concat colspSet`
0.00 43.32 +
0.94 0.95 +S
(XSFT2200041.img) run
EPSFILE2200041 restore
;
0 0 0 sco 5 Lw N 4950 4742 M 4800 4742 I K
0 0 0 sco 5 Lw N 4950 4752 M 4800 4752 I K
0 0 0 sco 5 Lw N 4950 4762 M 4800 476
--- End offending input ---
gsapi_run_string_continue returns -21
[duplicate error redacted]
Our production output creates a giant .ps file every day and this error occurs in many, but not all, .ps files when searching for text. Randomly selected .ps files from the web do not throw the error, so this GS build seems OK - definitely a problem with my file.
What "offending input" is being referred to here and what can I do to address it?
I'd need to see the PostScript file to tell you exactly what is wrong, but 'evice' is not a PostScript operator and so that is likely the problem. Also, from ghostpdl/gs/psi/ierrors.h error code -21 is e_undefined which means the interpreter has encountered an undefined token, which is some confirmation that this is the problem.
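For reference, the negative codes map onto the standard PostScript error names. Here is a small Python lookup covering a few of them, with values as I recall them from ierrors.h; treat the table as an assumption and check it against the header of your Ghostscript version. The function name is hypothetical.

```python
# A few Ghostscript interpreter error codes (subset, from memory of ierrors.h).
GS_ERRORS = {
    -15: "rangecheck",
    -18: "syntaxerror",
    -20: "typecheck",
    -21: "undefined",
    -22: "undefinedfilename",
    -25: "VMerror",
}


def describe_gs_error(code):
    """Return the PostScript error name for a code, if it is in the table."""
    return GS_ERRORS.get(code, "not in this table")
```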
This could be because the file contains a 'typo' like that (perhaps it should be setpagedevice or something), or it could be because a filter is improperly terminated, or has insufficient data, and consumes extra bytes from the input stream, chewing up your program.
You should start by using the Ghostscript executable and reproducing the error with that (you might also try the display device, to see whether the problem is related to pstotext). That will give you a command line which other people can then duplicate. With that and a copy of the offending file I can tell you exactly what's wrong; without it, there's not much hope.
Bear in mind that PostScript is an interpreted programming language, so it's pretty much impossible to tell you what's wrong with your program without seeing the code.
FWIW you might like to try the Ghostscript txtwrite device instead of pstotext; the device doesn't rely on tinkering with the language like pstotext does. pstotext is also really old (the last release is coming up on its 11th birthday) and unsupported.
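Trying txtwrite from a script could look roughly like this in Python. The function name and the file names are placeholders, and the executable name is an assumption: it is usually gs on Linux and gswin32c or gswin64c on Windows.

```python
import subprocess


def txtwrite_args(ps_path, txt_path, gs_executable="gs"):
    """Build a Ghostscript command line that extracts text with txtwrite."""
    # -o sets the output file and also implies batch, no-pause operation.
    return [gs_executable, "-sDEVICE=txtwrite", "-o", txt_path, ps_path]


def extract_text(ps_path, txt_path, gs_executable="gs"):
    """Run Ghostscript; raises CalledProcessError if it exits with an error."""
    return subprocess.run(txtwrite_args(ps_path, txt_path, gs_executable), check=True)
```

Running the same command line by hand is also a convenient way to reproduce the error outside of the API.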

MS-DOS debug -l 0 not working

I want to write a bin file to a flash drive. I'm supposed to run:
n helloworld.bin
l 0
w 0 0 0 1
But when I run l 0 I get a File not found error. What am I doing wrong?
Two issues:
MS-DOS filenames should have a maximum of 8 letters before the dot and a maximum of 3 letters after the dot.
For this use of the l command in debug, provide no parameters. The file will always be loaded to CS:0100.
(I somehow find it worrying that my brain saved this useless information for all those years...)
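The 8.3 length rule is easy to check. A tiny Python sketch (hypothetical function name; it only checks the lengths, not which characters DOS allows):

```python
def is_8_3(filename):
    """True if the name has 1-8 characters before the dot and at most 3 after."""
    base, _, ext = filename.partition(".")
    return 1 <= len(base) <= 8 and len(ext) <= 3 and "." not in ext
```

helloworld.bin fails this check because "helloworld" is 10 characters long; something like hello.bin passes.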

wkhtmltopdf with full page background

I am using wkhtmltopdf to generate a PDF file that is going to a printer and have some troubles with making the content fill up an entire page in the resulting PDF.
In the CSS I've set the width and height to 2480 X 3508 pixels (a4 300 dpi) and when creating the PDF I use 0 for margins but still end up with a small white border to the right and bottom. Also tried to use mm and percentage but with the same result.
I'd need someone to please provide an example on how to style the HTML and what options to use at command line so that the resulting PDF pages fill out the entire background. One way might be to include bleeding (this might be necessary anyway) but any tips are welcome. At the moment I am creating one big HTML page (without CSS page breaks - might help?) but if needed it would be fine to generate each page separately and then feed them all to wkhtmltopdf.
wkhtmltopdf v 0.11.0 rc2
What ended up working:
wkhtmltopdf --margin-top 0 --margin-bottom 0 --margin-left 0 --margin-right 0 <url> <output>
shortens to
wkhtmltopdf -T 0 -B 0 -L 0 -R 0 <url> <output>
Using HTML from stdin (note the dash):
echo "<h1>Testing Some Html</h1>" | wkhtmltopdf -T 0 -B 0 -L 0 -R 0 - <output>
Using HTML from stdin, writing to a file or to stdout:
echo "Testing Some Html" | wkhtmltopdf -T 0 -B 0 -L 0 -R 0 - test.pdf
echo "Testing Some Html" | wkhtmltopdf -T 0 -B 0 -L 0 -R 0 - - > test.pdf
What did not work:
Using --dpi
Using --page-width and --page-height
Using --zoom
We just solved the same problem by using the --disable-smart-shrinking option.
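The zero-margin invocation, with --disable-smart-shrinking as an optional extra, can be wrapped in a few lines of Python if you generate the PDFs from a script. This is only a sketch with hypothetical function names; it uses the stdin-to-stdout form ("- -") shown above.

```python
import subprocess


def zero_margin_args(source="-", target="-", disable_smart_shrinking=False):
    """Build a wkhtmltopdf command line with all four margins set to 0."""
    args = ["wkhtmltopdf", "-T", "0", "-B", "0", "-L", "0", "-R", "0"]
    if disable_smart_shrinking:
        args.append("--disable-smart-shrinking")
    return args + [source, target]


def html_to_pdf_bytes(html, disable_smart_shrinking=False):
    """Pipe HTML to wkhtmltopdf on stdin and return the PDF from stdout."""
    result = subprocess.run(
        zero_margin_args(disable_smart_shrinking=disable_smart_shrinking),
        input=html.encode("utf-8"),
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout
```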
I realize this is old and cold, but just in case someone finds this and has the same/similar problem, here's a workaround that worked for me after some trial&error.
I created a simple filler.html as:
<!DOCTYPE html>
<html>
<head>
</head>
<body style="margin: 0; padding: 0;">
<div style="height: 30mm; background-color: #F7EBD4;">
</div>
</body>
</html>
Use valid HTML (!DOCTYPE is important) and only inline styles. Match the background color to that of the main document and use height equal or bigger than your margins.
I run version 0.12.0 with the following arguments:
wkhtmltopdf --print-media-type --orientation portrait --page-size A4 \
--encoding UTF-8 -T 10mm -B 10mm -L 0mm -R 0mm \
--header-html filler.html --footer-html filler.html - - <file.html >file.pdf
Hoping this helps someone...
I'm using version 0.12.2.1 and setting:
body { padding: 0; margin: 0; }
div.page-layout { height: 295.5mm; width: 209mm;}
worked for me.
Of course, you also need to set the margins to 0:
wkhtmltopdf -T 0 -B 0 -L 0 -R 0
At http://code.google.com/p/wkhtmltopdf/issues/detail?id=359 I found out more people 'suffer' from this bug. The --dpi 300 workaround did not work for me, I had to set --zoom 1.045 to zoom in a bit which made the extra right and bottom border disappear...
Works fine for me with -B 0 -L 0 -R 0 -T 0 options and using your trick of setting up an A4 sized div.
Did you remember to use body {margin:0; padding:0;} in the top of your CSS?
I cannot help you with CSS page breaks as I have not trial-and-errored those yet. However, you can run scripts on the page to do clever things. Here is a jQuery example of how to split content into page-size chunks based on the length of the content. If you can get that adapted to work with wkhtmltopdf then please post here!
http://www.script-tutorials.com/demos/79/index.html
What you are experiencing is a bug.
You'll need to set the --dpi option when converting the file. In your case you will probably want --dpi 300, but that can be set lower.
Solved it by increasing the DPI
I'm working with an A4 size in portrait mode. Had white space to the right.
I noticed that as the DPI is increased, the white space gets thinner.
At 300 DPI the white space is not visible in the Chrome PDF viewer at its maximum zoom of 500%.
In Adobe Reader it's still visible. It got better at 600 DPI, and at 1200 DPI it became invisible even at 6500% zoom.
There's no disadvantage to this as far as I observed: all DPI settings generate the same file size and run at the same speed (tested on 1 page).
Effectively, my settings are as follows:
echo "<html style='padding:0;margin:0'><body style='background-color:black;padding:0;margin:0'></body></html>" | wkhtmltopdf -T 0 -B 0 -L 0 -R 0 --disable-smart-shrinking --orientation portrait --page-size A4 --dpi 1200 - happy.pdf
If you use an unscaled PNG image (so that it is pixel perfect), the default ratio for an A4 page needs to be 120 ppi, so 210 mm = 993 pixels wide x 1404 pixels high. Whether the source is tagged as 72 or 300 dpi makes no difference for a default placement; it is the 993 pixels that are counted as 210 mm.
No heights, no widths, no stretching or shrinking: just place the image as the background at its default, unscaled size.
wkhtmltopdf --enable-local-file-access -T "0mm" -L "0mm" -R "0mm" -B "0mm" test.html test.pdf
Here is such an image placed on an A4 PDF page: two different densities, same number of pixels.
If you use scaling you can use different density values, but this is all that is needed by default, since PDF works on overall pixel values, not DPI as such. Note that the PNG is actually smaller once inserted in a PDF than the source JPG, which was over 372 KB.
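The pixel sizes quoted in this thread follow from the A4 dimensions (210 mm x 297 mm) and 25.4 mm per inch. A short Python sketch of the arithmetic (the function name is hypothetical):

```python
import math

MM_PER_INCH = 25.4


def mm_to_px(mm, ppi):
    """Convert a length in millimetres to (fractional) pixels at a given ppi."""
    return mm / MM_PER_INCH * ppi


# A4 at 120 ppi, rounded up: the 993 x 1404 figures above.
a4_120 = (math.ceil(mm_to_px(210, 120)), math.ceil(mm_to_px(297, 120)))

# A4 at 300 ppi, rounded to nearest: the 2480 x 3508 figures from the question.
a4_300 = (round(mm_to_px(210, 300)), round(mm_to_px(297, 300)))
```

Neither size is a whole number of pixels (about 992.1 x 1403.1 and 2480.3 x 3507.9), so some rounding is always involved.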
