How to use im2rec in MXnet to create my own dataset - windows

In windows 10, I followed the step-by-step MXnet tutorial to use im2rec.py to create a dataset. I created a image list file like this:
integer_image_index \t label_index \t path_to_image
Next, I modified .txt to .lst.
Finally, I executed the command:
python im2rec.py --exts '.jpg' --train-ratio 0.41 --test-ratio 0.49 --recursive=True --pack-label=True D:\CUB_200_2011\data\image_label.lst D:\CUB_200_2011\CUB_200_2011\image
It is shown that "read no error", but the files created by the command like .lst and .rec are 0K, there is empty. I don't know why.
Please tell me what mistakes I made.

im2rec.py will print
read none error:(filename)
for any file that it can't load for whatever reason. Maybe some of the files you list aren't there or are empty? Or maybe the base path you've specified is wrong -- I notice you have the folder name CUB_200_2011 twice.

Related

Biopython: SeqIO.parse() FileNotFoundError

I'm new in Bioinformatics and Biopython, so I have some difficulties with it.
I was reading the Biopython (SeqIO) documentation, but when I try to execute some SeqIO.parse() commands I get FileNotFoundError.
For example, I want to get "example.fasta" file (which I don't have it on my PC). I try to do it with this command:
for record in SeqIO.parse("example.fasta", "fasta"):
print(record.id)
But, all I get is FileNotFoundError: [Errno 2] No such file or directory
Can someone help me with this?
My understanding is that FileNotFoundError occurs when the code tries to open a file on your computer and does not find it.
This can happen either because you simply do not have this file, or you gave the name with a typo, or the path to the file is not correct (This is an important notion: the path to the file should be absolute, or relative to the current working directory (usually the one from which you executed the python script)).
As suggested in the comments to your question, you seem to be expecting SeqIO.parse to get the file for you. This is not the case. The first argument you give to this function (in the example "example.fasta") is the path to an existing file that you want to "parse", that is, interpret its information content and make this content available to the rest of your program in a convenient form.
So in order to get this example working, you first need to get a fasta file. If you do not already have one, you can download some manually from genbank, or find one in the biopython installation (if you installed it from source and know where the source code is located), for instance in Tests/Quality/example.fasta.

sql loader without .dat extension

Oracle's sqlldr defaults to a .dat extension. That I want to override. I don't like to rename the file. When googled get to know few answers to use . like data='fileName.' which is not working. Share your ideas, please.
Error message is fileName.dat is not found.
Sqlloder has default extension for all input files data,log,control...
data= .dat
log= .log
control = .ctl
bad =.bad
PARFILE = .par
But you have to pass filename without apostrophe and dot
sqlloder pass/user#db control=control data=data
sqloader will add extension. control.ctl data.dat
Nevertheless i do not understand why you do not want to specify extension?
You can't, at least in Unix/Linux environments. In Windows you can use the trailing period trick, specifying either INFILE 'filename.' in the control file or DATA=filename. on the command line. WIndows file name handling allows that; you can for instance do DIR filename. at a command prompt and it will list the file with no extension (as will DIR filename). But you can't do that with *nix, from a shell prompt or anywhere else.
You said you don't want to copy or rename the file. Temporarily renaming it might be the simplest solution, but as you may have a reason not to do that even briefly you could instead create a hard or soft link to the file which does have an extension, and use that link as the target instead. You could wrap that in a shell script that takes the file name argument:
# set variable from correct positional parameter; if you pass in the control
# file name or other options, this might not be $1 so adjust as needed
# if the tmeproary file won't be int he same directory, need to be full path
filename=$1
# optionally check file exists, is readable, etc. but overkill for demo
# can also check temporary file does not already exist - stop or remove
# create soft link somewhere it won't impact any other processes
ln -s ${filename} /tmp/${filename##*/}.dat
# run SQL*Loader with soft link as target
sqlldr user/password#db control=file.ctl data=/tmp/${filename##*/}.dat
# clean up
rm -f /tmp/${filename##*/}.dat
You can then call that as:
./scriptfile.sh /path/to/filename
If you can create the link in the same directory then you only need to pass the file, but if it's somewhere else - which may be necessary depending on why renaming isn't an option, and desirable either way - then you need to pass the full path of the data file so the link works. (If the temporary file will be int he same filesystem you could use a hard link, and you wouldn't have to pass the full path then either, but it's still cleaner to do so).
As you haven't shown your current command line options you may have to adjust that to take into account anything else you currently specify there rather than in the control file, particularly which positional argument is actually the data file path.
I have the same issue. I get a monthly download of reference data used in medical application and the 485 downloaded files don't have file extensions (#2gb). Unless I can load without file extensions I have to copy the files with .dat and load from there.

Reinstalling packages from a list generated by command: ado dir

I am recovering Stata following a Windows upgrade. I have a list of my packages generated from ado dir in the following format:
[1] package mdesc from http://fmwww.bc.edu/RePEc/bocode/m
'MDESC': module to tabulate prevalence of missing values
[2] package univar from http://fmwww.bc.edu/RePEc/bocode/u
'UNIVAR': module to generate univariate summary with box-and-whiskers plot
[3] package tabmiss from http://www.ats.ucla.edu/stat/stata/ado/analysis
tabmiss. Shows tabulation of number of missing and non-missing values
I have many packages and would like to reinstall them without having to designate each directory/url via net cd. While using net cd along with net install or ssc install along with package names in a loop is trivial (as below), it would seem that an automated method for this task might be available.
net cd http://www.ats.ucla.edu/stat/stata/ado/analysis
local ucla tabmiss csgof powerlog ldfbeta
foreach x of local ucla {
net install `x'
}
To my knowledge, there is no built-in or automated method of tracking and managing your installed packages outside of what is available through ado or net.
I would also tend to agree with #Nick Cox that this task seems strange and I can't imagine how a new Stata install or reinstall could know what was installed previously, but I find the question interesting for other reasons.
The main reason being for users who have Stata installed on multiple machines who need the same packages on both machines. I faced a similar issue when I purchased a new computer and installed Stata but wanted all of the packages I use to be available as well. Outside of moving the ado directory or selected contents I'm not aware of any quick solution.
Here it would be possible to use the output of ado dir on one machine to determine what you need to install on a second machine with a new Stata install.
The method you propose using a foreach loop could save you time from having to type in or copy/paste a lot of packages and URLs. At the same time however, this is only beneficial if you have many packages from only a few repositories because you will need to net cd to the URL each time as you show in your example.
An alternative solution is the programmatic solution. As you know, ado dir will list each installed package, the URL and a short description of the package. Using this, a log file, and the built in I/O functionality, a short program could be written to automate the process and dynamically build a do file that contains the commands to install the already installed packages.
The code below generates a do file containing commands (in this case, net describe package, from(url)) for each package I have installed on my computer.
clear *
tempfile log1
log using "`log1'", text name(mylog)
ado dir
log close mylog
tempname logfile
file open `logfile' using "`log1'", read
file read `logfile' line
file open dfh using "path/to/your/dofile.do", write replace
local pckage "package"
while r(eof) == 0 {
if `: list pckage in line' {
local packageName : word 3 of `line'
local dirName : word 5 of `line'
di "`packageName' `dirName'"
file write dfh "net describe `packageName', from(`dirName')"
file write dfh _newline
}
file read `logfile' line
}
file close `logfile'
file close dfh
In the above code, I create a temp file to write a .txt log file to and store the contents of ado dir in that file.
Then, I open the log file using file open and read it line by line in the while loop.
Above the loop, I'm creating a do file at /path/to/your/dofile.do to hold the output of the loop - the dynamically created commands relating to the installed packages on my machine.
The loop will iterate so long as r(eof) = 0, where r(eof) is an end of file marker. I use an if statement to sort out lines of the log file which contain the word package, as I'm only interested in those lines with the package name and URL in them.
Inside of the if block, I parse the local macro line to pull the package name and the URL/directory name.
this is important: this section of code assumes that the 3rd and 5th words in the macro will always be the package name and URL respectively - Confirm this from the output of ado dir before executing.
You will also need to change the command that is being written to the file handle dfh inside of the loop to what you want (net install, etc) when you are ready to execute.
For more help on using file, locals, and tempfiles execute any of the following in Stata:
help file
help extended_fcn
help macrolists
There may be nicer ways to parse the contents of ado dir but this has worked for me. And of course I'd always advise that you take the time to understand what the code is doing so that you can make any necessary tweaks to fit your particular situation.

is there a way i can use Asterix(*) in tesseract?

am trying to extract to use all my box file to extract characters and when i try this line
unicharset_extractor *.box
it gives me an error that it cannot find *.box instead of loading all box files.
That specific program does not support such syntax. You have to chain the names of all the box files and feed to it, such as:
unicharset_extractor lang.fontname.exp0.box lang.fontname.exp1.box ...
You can write a script (e.g., train.ps1) to automate the process.
http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3
I finally made my own tool for that .
[link]http://code.google.com/p/serak-tesseract-trainer/

Programming a Filter/Backend to 'Print to PDF' with CUPS from any Mac OS X application

Okay so here is what I want to do. I want to add a print option that prints whatever the user's document is to a PDF and adds some headers before sending it off to a device.
I guess my questions are: how do I add a virtual "printer" driver for the user that will launch the application I've been developing that will make the PDF (or make the PDF and launch my application with references to the newly generated PDF)? How do I interface with CUPS to generate the PDF? I'm not sure I'm being clear, so let me know if more information would be helpful.
I've worked through this printing with CUPS tutorial and seem to get everything set up okay, but the file never seems to appear in the appropriate temporary location. And if anyone is looking for a user-end PDF-printer, this cups-pdf-for-mac-os-x is one that works through the installer, however I have the same issue of no file appearing in the indicated directory when I download the source and follow the instructions in the readme. If anyone can get either of these to work on a mac through the terminal, please let me know step-by-step how you did it.
The way to go is this:
Set up a print queue with any driver you like. But I recommend to use a PostScript driver/PPD. (A PostScript PPD is one which does not contain any *cupsFilter: ... line.):
Initially, use the (educational) CUPS backend named 2dir. That one can be copied from this website: KDE Printing Developer Tools Wiki. Make sure when copying that you get the line endings right (Unix-like).
Commandline to set up the initial queue:
lpadmin \
-p pdfqueue \
-v 2dir:/tmp/pdfqueue \
-E \
-P /path/to/postscript-printer.ppd
The 2dir backend now will write all output to directory /tmp/pdfqueue/ and it will use a uniq name for each job. Each result should for now be a PostScript file. (with none of the modifications you want yet).
Locate the PPD used by this queue in /etc/cups/ppd/ (its name should be pdfqueue.ppd).
Add the following line (best, near the top of the PPD):
*cupsFilter: "application/pdf 0 -" (Make sure the *cupsFilter starts at the very beginning of the line.) This line tells cupsd to auto-setup a filtering chain that produces PDF and then call the last filter named '-' before it sends the file via a backend to a printer. That '-' filter is a special one: it does nothing, it is a passthrough filter.
Re-start the CUPS scheduler:sudo launchctl unload /System/Library/LaunchDaemons/org.cups.cupsd.plist
sudo launchctl load /System/Library/LaunchDaemons/org.cups.cupsd.plist
From now on your pdfqueue will cause each job printed to it to end up as PDF in /tmp/pdfqueue/*.pdf.
Study the 2dir backend script. It's simple Bash, and reasonably well commented.
Modify the 2dir in a way that adds your desired modifications to your PDF before saving on the result in /tmp/pdfqueue/*.pdf...
Update: Looks like I forgot 2 quotes in my originally prescribed *cupsFilter: ... line above. Sorry!
I really wish I could accept two answers because I don't think I could have done this without all of #Kurt Pfeifle 's help for Mac specifics and just understanding printer drivers and locations of files. But here's what I did:
Download the source code from codepoet cups-pdf-for-mac-os-x. (For non-macs, you can look at http://www.cups-pdf.de/) The readme is greatly detailed and if you read all of the instructions carefully, it will work, however I had a little trouble getting all the pieces, so I will outline exactly what I did in the hopes of saving someone else some trouble. For this, the directory with the source code is called "cups-pdfdownloaddir".
Compile cups-pdf.c contained in the src folder as the readme specifies:
gcc -09 -s -lcups -o cups-pdf cups-pdf.c
There may be a warning: ld: warning: option -s is obsolete and being ignored, but this posed no issue for me. Copy the binary into /usr/libexec/cups/backend. You will likely have to the sudo command, which will prompt you for your password. For example:
sudo cp /cups-pdfdownloaddir/src/cups-pdf /usr/libexec/cups/backend
Also, don't forget to change the permissions on this file--it needs root permissions (700) which can be changed with the following after moving cupd-pdf into the backend directory:
sudo chmod 700 /usr/libexec/cups/backend/cups-pdf
Edit the file contained in /cups-pdfdownloaddir/extra/cups-pdf.conf. Under the "PDF Conversion Settings" header, find a line under the GhostScript that reads #GhostScript /usr/bin/gs. I did not uncomment it in case I needed it, but simply added beneath it the line Ghostscript /usr/bin/pstopdf. (There should be no pre-cursor # for any of these modifications)
Find the line under GSCall that reads #GSCall %s -q -dCompatibilityLevel=%s -dNOPAUSE -dBATCH -dSAFER -sDEVICE=pdfwrite -sOutputFile="%s" -dAutoRotatePage\
s=/PageByPage -dAutoFilterColorImages=false -dColorImageFilter=/FlateEncode -dPDFSETTINGS=/prepress -c .setpdfwrite \
-f %s Again without uncommenting this, under this I added the line GSCall %s %s -o %s %s
Find the line under PDFVer that reads #PDFVer 1.4 and change it to PDFVer, no spaces or following characters.
Now save and exit editing before copying this file to /etc/cups with the following command
sudo cp cups-pdfdownloaddir/extra/cups-pdf.conf /etc/cups
Be careful of editing in a text editor because newlines in UNIX and Mac environments are different and can potentially ruin scripts. You can always use a perl command to remove them, but I'm paranoid and prefer not to deal with it in the first place.
You should now be able to open a program (e.g. Word, Excel, ...) and select File >> Print and find an available printer called CUPS-PDF. Print to this printer, and you should find your pdfs in /var/spool/cups-pdf/yourusername/ by default.
*Also, I figured this might be helpful because it helped me: if something gets screwed up in following these directions and you need to start over/get rid of it, in order to remove the driver you need to (1) remove the cups-pdf backend from /usr/libexec/cups/backend (2) remove the cups-pdf.conf from /etc/cups/ (3) Go into System Preferences >> Print & Fax and delete the CUPS-PDF printer.
This is how I successfully set up a pdf backend/filter for myself, however there are more details, and other information on customization contained in the readme file. Hope this helps someone else!

Resources