Is there a way I can use an asterisk (*) in Tesseract? - cmd

I am trying to use all my box files to extract characters, and when I try this line
unicharset_extractor *.box
it gives me an error that it cannot find *.box, instead of loading all the box files.

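That specific program does not support such syntax. The Windows command prompt does not expand wildcards itself (Unix shells do), so unicharset_extractor receives the literal text *.box and looks for a file with that name. You have to list the names of all the box files and pass them to it, such as: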
unicharset_extractor lang.fontname.exp0.box lang.fontname.exp1.box ...
You can write a script (e.g., train.ps1) to automate the process.
http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3
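As an alternative to a PowerShell script, here is a minimal Python sketch that does the wildcard expansion cmd skips. It assumes unicharset_extractor is on your PATH. The helper name unicharset_command is hypothetical. The sketch only builds the argument list, and you run it yourself.

```python
import glob


def unicharset_command(pattern="*.box", exe="unicharset_extractor"):
    """Expand the wildcard ourselves, because cmd.exe passes *.box through literally."""
    box_files = sorted(glob.glob(pattern))
    if not box_files:
        # fail early with a clear message instead of letting the tool complain
        raise FileNotFoundError(f"no files match {pattern!r}")
    # one argument per box file, exactly as if they had been typed by hand
    return [exe, *box_files]
```

Usage, run from the directory that holds the box files: `subprocess.run(unicharset_command(), check=True)` (after `import subprocess`). Passing a list to subprocess avoids any further shell interpretation of the file names.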

I finally made my own tool for that.
http://code.google.com/p/serak-tesseract-trainer/

Related

Biopython: SeqIO.parse() FileNotFoundError

I'm new to bioinformatics and Biopython, so I'm having some difficulties.
I was reading the Biopython (SeqIO) documentation, but when I try to execute some SeqIO.parse() commands I get a FileNotFoundError.
For example, I want to get the "example.fasta" file (which I don't have on my PC). I try to do it with this command:
from Bio import SeqIO
for record in SeqIO.parse("example.fasta", "fasta"):
    print(record.id)
But all I get is FileNotFoundError: [Errno 2] No such file or directory
Can someone help me with this?
My understanding is that FileNotFoundError occurs when the code tries to open a file on your computer and does not find it.
This can happen because you simply do not have the file, because the file name has a typo, or because the path to the file is not correct. This last point is important: the path should be either absolute or relative to the current working directory (usually the directory from which you ran the Python script).
As suggested in the comments to your question, you seem to be expecting SeqIO.parse to get the file for you. This is not the case. The first argument you give to this function (in the example "example.fasta") is the path to an existing file that you want to "parse", that is, interpret its information content and make this content available to the rest of your program in a convenient form.
So to get this example working, you first need a FASTA file. If you do not already have one, you can download some manually from GenBank, or find one in the Biopython installation (if you installed it from source and know where the source code is located), for instance in Tests/Quality/example.fasta.
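To see where Python is actually looking, you can check the path before calling SeqIO.parse. Below is a minimal standard-library sketch. It resolves a relative name against the current working directory, as open() does, and fails with a message that shows the full path it tried. The helper name locate_input is hypothetical and is not part of Biopython.

```python
from pathlib import Path


def locate_input(name):
    """Resolve name the way open() would, and explain where it looked if missing."""
    path = Path(name)
    # relative paths are interpreted against the current working directory
    resolved = path if path.is_absolute() else Path.cwd() / path
    if not resolved.is_file():
        raise FileNotFoundError(
            f"{name!r} not found; looked for {resolved} (cwd is {Path.cwd()})"
        )
    return resolved
```

Usage: `for record in SeqIO.parse(str(locate_input("example.fasta")), "fasta"): print(record.id)`. If the file is missing, the error message tells you whether to move the file or fix the path.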

sql loader without .dat extension

Oracle's sqlldr defaults to a .dat extension for the data file, and I want to override that. I don't want to rename the file. Searching online, I found a few answers suggesting a trailing period, like data='fileName.', but that is not working. Please share your ideas.
The error message says fileName.dat is not found.
SQL*Loader has a default extension for each of its files (data, log, control, ...):
data = .dat
log = .log
control = .ctl
bad = .bad
PARFILE = .par
But you have to pass the file name without quotes and without the dot:
sqlldr user/pass@db control=control data=data
SQL*Loader will add the extensions and read control.ctl and data.dat.
Nevertheless, I do not understand why you do not want to specify the extension.
You can't, at least in Unix/Linux environments. In Windows you can use the trailing period trick, specifying either INFILE 'filename.' in the control file or DATA=filename. on the command line. Windows file name handling allows that; you can for instance do DIR filename. at a command prompt and it will list the file with no extension (as will DIR filename). But you can't do that with *nix, from a shell prompt or anywhere else.
You said you don't want to copy or rename the file. Temporarily renaming it might be the simplest solution, but as you may have a reason not to do that even briefly, you could instead create a hard or soft link to the file whose name does have an extension, and use that link as the target instead. You could wrap that in a shell script that takes the file name as an argument:
#!/bin/sh
# set variable from the correct positional parameter; if you pass in the control
# file name or other options, this might not be $1, so adjust as needed
# if the temporary file won't be in the same directory, this needs to be the full path
filename=$1
# optionally check the file exists, is readable, etc., but that's overkill for a demo
# can also check the temporary file does not already exist - stop or remove it
# create the soft link somewhere it won't impact any other processes
ln -s "${filename}" "/tmp/${filename##*/}.dat"
# run SQL*Loader with the soft link as the target
sqlldr user/password@db control=file.ctl data="/tmp/${filename##*/}.dat"
# clean up
rm -f "/tmp/${filename##*/}.dat"
You can then call that as:
./scriptfile.sh /path/to/filename
If you can create the link in the same directory, then you only need to pass the file name. If the link goes somewhere else (which may be necessary depending on why renaming isn't an option, and is desirable either way), you need to pass the full path of the data file so the link works. (If the temporary file will be on the same filesystem you could use a hard link, and then you wouldn't have to pass the full path either, but it's still cleaner to do so.)
As you haven't shown your current command line options you may have to adjust that to take into account anything else you currently specify there rather than in the control file, particularly which positional argument is actually the data file path.
I have the same issue. I get a monthly download of reference data used in a medical application, and the 485 downloaded files don't have file extensions (#2gb). Unless I can load files without extensions, I have to copy them to names ending in .dat and load from there.
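For many extensionless files, the link approach from the answer above can be applied in a batch instead of copying. Below is a minimal Python sketch, assuming a Unix-like system where symlinks are allowed. The directory names, control file name, and the user/password@db connect string are placeholders. The helpers link_with_dat_extension and sqlldr_commands are hypothetical.

```python
import os
from pathlib import Path


def link_with_dat_extension(data_dir, link_dir):
    """Create NAME.dat symlinks in link_dir for each extensionless file in data_dir."""
    links = []
    for src in sorted(Path(data_dir).iterdir()):
        if not src.is_file() or src.suffix:
            continue  # skip subdirectories and files that already have an extension
        link = Path(link_dir) / f"{src.name}.dat"
        if link.is_symlink():
            link.unlink()  # replace a leftover link from an earlier run
        elif link.exists():
            # never delete a real file that happens to have the target name
            raise FileExistsError(f"{link} exists and is not a symlink")
        os.symlink(src.resolve(), link)  # absolute target, so the link works from anywhere
        links.append(link)
    return links


def sqlldr_commands(links, control_file, userid="user/password@db"):
    """Build one sqlldr argument list per linked data file (userid is a placeholder)."""
    return [
        ["sqlldr", f"userid={userid}", f"control={control_file}", f"data={link}"]
        for link in links
    ]
```

Usage: run each list with `subprocess.run(cmd, check=True)`, then delete the links. No data is copied, so the disk cost is a few bytes per link rather than a second copy of the download.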

How to use im2rec in MXnet to create my own dataset

On Windows 10, I followed the step-by-step MXNet tutorial for using im2rec.py to create a dataset. I created an image list file like this:
integer_image_index \t label_index \t path_to_image
Next, I changed its extension from .txt to .lst.
Finally, I executed the command:
python im2rec.py --exts '.jpg' --train-ratio 0.41 --test-ratio 0.49 --recursive=True --pack-label=True D:\CUB_200_2011\data\image_label.lst D:\CUB_200_2011\CUB_200_2011\image
It shows "read no error", but the files created by the command (.lst and .rec) are 0 KB, that is, empty. I don't know why.
Please tell me what mistakes I made.
im2rec.py will print
read none error:(filename)
for any file that it can't load for whatever reason. Maybe some of the files you list aren't there or are empty? Or maybe the base path you've specified is wrong -- I notice you have the folder name CUB_200_2011 twice.
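To find which entries im2rec cannot read, you can check the list before packing. Below is a minimal Python sketch, assuming the tab-separated layout from the question (index, label(s), then the image path relative to the root folder as the last column). It also flags lines whose separators are not real tab characters, for example a literal \t typed as text. The helper name check_lst is hypothetical.

```python
from pathlib import Path


def check_lst(lst_file, image_root):
    """Return a description of every .lst entry whose image is missing or empty."""
    problems = []
    root = Path(image_root)
    with open(lst_file, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            line = line.rstrip("\n")
            if not line.strip():
                continue  # ignore blank lines
            fields = line.split("\t")
            if len(fields) < 3:
                problems.append(f"line {lineno}: expected index, label(s) and path separated by tabs")
                continue
            image = root / fields[-1]  # the path is the last column
            if not image.is_file():
                problems.append(f"line {lineno}: missing {image}")
            elif image.stat().st_size == 0:
                problems.append(f"line {lineno}: empty {image}")
    return problems
```

Usage: `for p in check_lst(r"D:\CUB_200_2011\data\image_label.lst", r"D:\CUB_200_2011\CUB_200_2011\image"): print(p)`. If every line reports "missing", the root folder argument is the likely culprit.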

download list of images from urls

I need to find (preferably) or build an app to download a lot of images.
Each image has a distinct URL. There are many thousands, so doing it manually is a huge effort.
The list is currently in a CSV file. (It is essentially a list of products, each with identifying info (name, brand, barcode, etc.) and a link to a product image.)
I'd like to loop through the list, and download each image file. Ideally I'd like to rename each one - something like barcode.jpg.
I've looked at a number of image scrapers, but haven't found one that works quite this way.
Very appreciative of any leads to the right tool, or ideas...
Are you on Windows or Mac/Linux? On Windows you can use a PowerShell script for this; on Mac/Linux, a shell script of about 1-5 lines of code.
Here's one way to do this:
# show what's inside the file
cat urlsofproducts.csv
http://bit.ly/noexist/obj101.jpg, screwdriver, blackndecker
http://bit.ly/noexist/obj102.jpg, screwdriver, acme
# this one-liner will GENERATE one download-command per item, but will not execute them
perl -MFile::Basename -F", " -anlE "say qq(wget -q \$F[0] -O '\$F[1]--\$F[2]--). basename(\$F[0]) .q(')" urlsofproducts.csv
# Output :
wget -q http://bit.ly/noexist/obj101.jpg -O 'screwdriver--blackndecker--obj101.jpg'
wget -q http://bit.ly/noexist/obj102.jpg -O 'screwdriver--acme--obj102.jpg'
Now run the generated wget commands in the shell, for example by piping the output of the one-liner to sh.
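Below is a cross-platform alternative to the perl/wget approach: a minimal Python sketch that names each file after its barcode, as the question asks. It assumes the CSV has a header row with columns called image_url and barcode. Those column names, the images output folder, and the helper names plan_downloads and download_all are hypothetical, so adjust them to your file.

```python
import csv
import os
import urllib.request
from pathlib import PurePosixPath
from urllib.parse import urlparse


def plan_downloads(csv_path, url_column="image_url", name_column="barcode"):
    """Yield (url, filename) pairs, naming each file after its barcode."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            url = row[url_column].strip()
            if not url:
                continue  # product without an image link
            # keep the extension from the URL path (ignoring ?query); default to .jpg
            ext = PurePosixPath(urlparse(url).path).suffix or ".jpg"
            yield url, f"{row[name_column].strip()}{ext}"


def download_all(csv_path, out_dir="images"):
    os.makedirs(out_dir, exist_ok=True)
    for url, filename in plan_downloads(csv_path):
        target = os.path.join(out_dir, filename)
        if os.path.exists(target):
            continue  # already fetched on a previous run, so reruns resume
        try:
            urllib.request.urlretrieve(url, target)
        except OSError as exc:  # URLError and HTTPError are OSError subclasses
            print(f"failed: {url}: {exc}")
```

Usage: `download_all("products.csv")`. Skipping files that already exist means an interrupted run over thousands of URLs can simply be restarted.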
If possible, use Google Sheets to run a function for this kind of work. I was also puzzled by this one and have now found a way by which the images are not only downloaded but also renamed in real time.
Kindly reply if you want the code.

VIM+Ctags doesn't work in WinXP

Okay guys, you're my only help :)
I have GVim v. 7.3, Exuberant CTags 5.8, omnicppcomplete (0.41) - all latest, to be exact.
I'm trying to generate tags to use in Vim, but it seems to totally ignore the data in the tags file.
I've used ctags to generate a tags file for the bada framework. The file seems to be okay: class definitions are present, etc. I also tried to apply the same command to the STL from Visual Studio.
ctags -R --c++-kinds=+p --fields=+iaS --extra=+q --language-force=C++ "c:\bada\1.0.0\Include\"
Also, I've mapped tag generation to a hotkey.
map <C-F12> :!ctags -R --c++-kinds=+p --fields=+iaS --extra=+q .<CR>
Trying to use any of the files generated by these commands did not succeed.
The command :tags shows empty tag list, but doesn't give any error, and I have no clue how to fix this.
Yes, it seems that Vim handles spaces in paths in a weird way (Windows only?). However, there are workarounds: either use DOS 8.3 short names or use a wildcard (?) in place of each space, like
set tags=c:\program?files?(x86)\vim\tags
PS: you can check which tag files were loaded successfully with the
:echo tagfiles()
command.
The problem was the path to the tags file: c:\Program Files\Vim\bada. Vim wouldn't parse a path containing spaces, no matter which slashes or backslashes I used.
Reinstalling VIM to c:\VIM solved the problem.
