Sound file to text file speech recognition for Ubuntu, specifically pocketsphinx usage - pocketsphinx

As made clear here: https://unix.stackexchange.com/questions/256138/is-there-any-decent-speech-recognition-software-for-linux
Finding speech recognition software that turns a sound file into text is difficult to do on Linux.
I am trying to use the pocketsphinx_continuous command. PocketSphinx is already installed.
I have downloaded several dictionary files, language model files and acoustic model folders, and I tried running pocketsphinx_continuous.
The command I use is: sudo pocketsphinx_continuous -dict /home/barnabas/Desktop/dict/cmudict.dict -hmm /home/barnabas/Desktop/wsj_all_sc.cd_semi_5000/ -lm /home/barnabas/Desktop/en-70k-0.1.lm -infile untitled2.wav 2> pocketsphinx.log > myspeech.txt
Without fail, every line of output is just a zero-padded index on the left with no transcribed text next to it:
000000000:
Could I please have a short list of one dictionary file, one language model file and one acoustic model that are compatible with each other? Thank you.

Could I please have a short list of one dictionary file, one language model file and one acoustic model that are compatible with each other?
Install the pocketsphinx-en-us package from the universe/sound section.
(It's available in Ubuntu 18.04 Bionic Beaver and later.
Prior to that, I believe it was called pocketsphinx-hmm-en-hub4wsj.)
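On current Ubuntu releases that is just an apt install away, e.g.:
sudo apt install pocketsphinx-en-us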
This will put the model in /usr/share/pocketsphinx/model/en-us/.
After that, you can run commands like this (there's no need to use sudo):
pocketsphinx_continuous -infile myfile.wav 2>&1 > myspeech.txt | tee out.log | less
Or if you want to specify the folders manually:
pocketsphinx_continuous \
-hmm /usr/share/pocketsphinx/model/en-us/en-us \
-dict /usr/share/pocketsphinx/model/en-us/cmudict-en-us.dict \
-lm /usr/share/pocketsphinx/model/en-us/en-us.lm.bin \
-infile myfile.wav > myspeech.txt
Make sure you have a 16-bit, 16 kHz mono wav file, or convert if necessary:
ffmpeg -i myfile.mp3 -ar 16000 -ac 1 -sample_fmt s16 myfile.wav
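If you have SoX instead of ffmpeg, something along these lines should do the same conversion (reading mp3 assumes the libsox-fmt-mp3 package is installed):
sox myfile.mp3 -b 16 myfile.wav rate 16k channels 1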
You might not have the best accuracy from the generic model.
Here's set #1 of the Harvard Sentences:
One: The birch canoe slid on the smooth planks.
Two: Glue the sheet to the dark blue background.
Three: It's easy to tell the depth of a well.
Four: These days a chicken leg is a rare dish.
Five: Rice is often served in round bowls.
Six: The juice of lemons makes fine punch.
Seven: The box was thrown beside the parked truck.
Eight: The hogs were fed chopped corn and garbage.
Nine: Four hours of steady work faced us.
Ten: Large size in stockings is hard to sell.
and here's the output I got from my recording:
if one half the brcko nude lid on this good length
to conclude ishii to the dark blue background
three it's easy to tell the devil wow
for these days eat chicken leg is a rare dish
five race is often served in round polls
six the juice of the lemons makes flying conch
seven the box was thrown beside the parked truck
eight the hogs griffin chopped coroner and garbage
not in four hours of steady work the stocks
ten large son is in stockings his heart is the good
Related:
How to give an input wav file to pocket sphinx
Error when running pocketsphinx_continuous: Acoustic model definition is not specified
How can we convert .wav file to text by using pocketsphinx?
https://askubuntu.com/questions/837408/convert-speech-mp3-audio-files-to-text
https://askubuntu.com/questions/161515/speech-recognition-app-to-convert-mp3-to-text
https://old.reddit.com/r/linuxquestions/comments/bj54y0/speech_to_text_program/em6b7j9/
https://unicom.crosenthal.com/blog/entry/686

Related

Batch Convert Sony raw ".ARW" image files to .jpg with raw image settings on the command line

I am looking to convert 15 million 12.8 MB Sony .ARW files to .jpg.
I have figured out how to do it using sips on the command line, but what I need is to make adjustments to the raw image settings: Contrast, Highlights, Blacks, Saturation, Vibrance, and most importantly Dehaze. I would be applying the same settings to every single photo.
It seems like ImageMagick should work if I can figure out how to incorporate Dehaze, but I can't seem to get ImageMagick to work.
I have done benchmark testing comparing Lightroom Classic / Photoshop / Bridge / RAW Power / and a few other programs. RAW Power is fastest by far (on an M1 Mac mini with 16 GB RAM) but RAW Power doesn't allow me to process multiple folders at once.
I do a lot of scripting / actions with Photoshop, but in this case Photoshop is by far the slowest option. I believe this is because it opens each photo.
That's 200TB of input images, without even allowing any storage space for output images. It's also 173 solid days of 24 hr/day processing, assuming you can do 1 image per second - which I doubt.
You may want to speak to Fred Weinhaus #fmw42 about his Retinex script (search for "hazy" on that page), which does a rather wonderful job of haze removal. Your project sounds distinctly commercial.
[example image © Fred Weinhaus - Fred's ImageMagick scripts]
If/when you get a script that does what you want, I would suggest using GNU Parallel to get decent performance. I would also think you may want to consider porting, or having ported, Fred's algorithm to C++ or Python to run with OpenCV rather than ImageMagick.
So, say you have a 24-core MacPro, and a bash script called ProcessOne that takes the name of a Sony ARW image as parameter, you could run:
find . -iname \*.arw -print0 | parallel --progress -0 ProcessOne {}
and that will recurse in the current directory finding all Sony ARW files and passing them into GNU Parallel, which will then keep all 24-cores busy until the whole lot are done. You can specify fewer, or more jobs in parallel with, say, parallel -j 8 ...
Note 1: You could also list the names of additional servers in your network and it will spread the load across them too. GNU Parallel is capable of transferring the images to remote servers along with the jobs, but I'd have to question whether it makes sense to do that for this task - you'd probably want to put a subset of the images on each server with its own local disk I/O and run the servers independently yourself rather than distributing from a single point globally.
Note 2: You will want your disks well configured to handle multiple, parallel I/O streams.
Note 3: If you do write a script to process an image, write it so that it accepts multiple filenames as parameters, then you can run parallel -X and it will pass as many filenames as your sysctl parameter kern.argmax allows. That way you won't need a whole bash or OpenCV C/C++ process per image.
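To illustrate Note 3, here is a minimal sketch of such a multi-filename script (call it ProcessMany); the ImageMagick options are only placeholders for whatever contrast/saturation/dehaze treatment you settle on, and reading ARW directly assumes your ImageMagick build has a working raw delegate (dcraw/libraw/ufraw):
#!/bin/bash
# ProcessMany - hypothetical worker that accepts many ARW filenames at once,
# so "parallel -X" can pack as many files per invocation as kern.argmax allows
for f in "$@"; do
  # placeholder adjustments - substitute your real settings (or a call to
  # Fred's retinex script) for the contrast/highlights/dehaze treatment
  convert "$f" -brightness-contrast 0x10 -modulate 100,110,100 "${f%.*}.jpg"
done
You would then drive it with something like:
find . -iname \*.arw -print0 | parallel --progress -0 -X ./ProcessMany {}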

Separating icons from pictures in a bunch of random PNGs

A long time ago I screwed with my HDD and had to recover all my data, but I couldn't recover the files' names.
I used a tool to sort all these files by extension, and another to sort the JPGs by date, since the date when a JPG was created is stored in the file itself. I can't do this with PNGs, though, unfortunately.
So I have a lot of PNGs, but most of them are just icons or assets used formerly as data by the software I used at that time. But I know there are other, "real" pictures, that are valuable to me, and I would really love to get them back.
I'm looking for any tool, or any way, just anything you can think of, that would help me separate the trash from the good in this bunch of pictures; it would really be amazing of you.
Just so you know, I'm speaking of 230 thousand files, for ~2GB of data.
As an example, this is what I call trash: [two sample icon images omitted], and all those kinds of images.
I'd like these to be separated from pictures of landscapes / people / screenshots, the kind of pictures you could have in your phone's gallery...
Thanks for reading, I hope you'll be able to help!
This simple ImageMagick command will tell you the:
height
width
number of colours
name
of every PNG in the current directory, separated by colons for easy parsing:
convert *.png -format "%h:%w:%k:%f\n" info:
Sample Output
600:450:5435:face.png
600:450:17067:face_sobel_magnitude.png
2074:856:2:lottery.png
450:450:1016:mask.png
450:450:7216:result.png
600:450:5435:scratches.png
800:550:471:spectrum.png
752:714:20851:z.png
If you are on macOS or Linux, you can easily run it under GNU Parallel to get 16 done at a time and you can parse the results easily with awk, but you may be on Windows.
You may want to change the \n at the end for \r\n under Windows if you are planning to parse the output.
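For example, a first rough pass might keep only files that look photo-sized and reasonably colourful (the thresholds below are arbitrary guesses - tune them against your own collection):
convert *.png -format "%h:%w:%k:%f\n" info: \
  | awk -F: '$1 > 500 && $2 > 500 && $3 > 10000 {print $4}' > probably-photos.txt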

Faster way to export each page of a PDF into two images

The job is quite simple: I got a few hundred PDF documents and I need to export each page of them into 2 images: one big, one small.
After a couple hours of research and optimizations I came up with a neat Bash script to do it:
#!/bin/bash
FILE="$1"
SLUG=$(md5 -q "$FILE")
mkdir -p "$SLUG"
gs -sDEVICE=jpeg -r216 -g1920x2278 -q -o "$SLUG/%d.jpg" "$FILE"
for IMAGE in "$SLUG"/*.jpg; do
  convert "$IMAGE" -resize 171x219 "${IMAGE/jpg/png}"
done
As you can see, I...
Create a directory named with the MD5 of the file
Use GhostScript to extract each page of the PDF into a big JPEG
Use ImageMagick to create a smaller version of the JPG into a PNG
It works. But I'm afraid it's not fast enough.
I'm getting an average of 0.6 s per page (roughly 1 minute for an 80 page PDF) on my MacBook. But that script is going to run on a server, a much lower-end one - probably a micro EC2 instance with Ubuntu on Amazon.
Anyone got any tips, tricks or a lead to help me optimize this script? Should I use another tool? Are there better suited libraries for this kind of work?
Unfortunately I don't write C or C++, but if you guys point me to some good libraries and tutorials I'll gladly learn it.
Thanks.
Update.
I just tested it on a t1.micro instance on AWS. It took 10 minutes to process the same PDF with 80 pages. Also I noticed that convert was the slowest guy taking almost 5 minutes to resize the images.
Update 2.
I tested it now on a c1.medium instance. It's ~7x the price of a t1.micro, but it came very close to the performance of my MacBook: ~3.5 minutes for a document of 244 pages.
I'm gonna try mudraw and other combinations now.
You could just run GS twice, once for the big images and again for the smaller ones. Of course the output probably won't be as nice as what convert would make, but at that size I'm guessing it won't be terribly obvious.
I've no idea how you would do it in a Bash script, but you could run 2 instances of Ghostscript (one for each size), which might be faster if the server is up to it.
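A minimal sketch of that two-instance idea in Bash, reusing the $FILE/$SLUG variables from the script above (the ~19 dpi figure for the small size is only a rough guess derived from 216 dpi x 171/1920, so adjust it to your pages):
# render both sizes at once, one Ghostscript instance per size
gs -sDEVICE=jpeg   -r216 -g1920x2278 -q -o "$SLUG/%d.jpg" "$FILE" &
gs -sDEVICE=png16m -r19  -g171x219   -q -o "$SLUG/%d.png" "$FILE" &
wait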

Slow ghostscript conversion and slow printing with large ps files

I have a service which produces PDF files. I have PS-compatible printers. For printing the PDF files, I use Ghostscript to convert them to PS and copy them to a shared (Windows) print queue. Most of the PDF files contain just a few pages (<10) and don't cause any trouble.
From time to time I have to print large files (100+, 500+, 5000+) pages and there I observe the following:
converting to ps is fast for the first couple of pages, then slows down. The further the progress, the longer the time for a single page.
after conversion, copying to the print queue works without problems
when copying is finished and it comes to sending the document to the printer, I observe more or less the same phenomenon: the further the progress, the slower the transfer.
Here is how I convert pdf to ps:
"C:\Program Files\gs\gs9.07\bin\gswin64c.exe" \
-dSAFER -dNOPAUSE -DBATCH \
-sOutputFile=D:\temp\testGS\test.ps \
-sDEVICE=ps2write \
D:\temp\testGS\test.pdf
After this conversion I simply copy it to the print queue
copy /B test.ps \\printserever\myPSQueue
What possibilities do I have to print large files this way?
My first idea was to do the following:
"C:\Program Files\gs\gs9.07\bin\gswin64c.exe" \
-dSAFER -dNOPAUSE -DBATCH \
-sOutputFile=D:\temp\testGS\test%05d.ps \
-sDEVICE=ps2write \
D:\temp\testGS\test.pdf
Working with single pages speeds up the conversion (it doesn't slow down after every single page), and printing is also fast when I copy every single page as its own PS file to the printer. But there is one problem I will encounter sooner or later: when I copy the single PS files, they will be separate print jobs. Even when they are sorted in the correct order, if someone else starts a print job on the same printer in between, the printouts will all get mixed up.
The other idea was using gsprint, which works considerably faster, but with gsprint I need the printer to be installed locally, which is not manageable in my environment with 300+ printers at different locations.
Can anyone tell me exactly what happens? Is this a bad way to print? Does anyone have a suggestion for how to solve the task of printing such documents in such an environment?
Without seeing an example PDF file it's difficult to say much about why it should print slowly. However, the most likely explanation is that the PDF is being rendered to an image, probably because it contains transparency.
This will result in a large image, created at the default resolution of the device (720 dpi), which is almost certainly higher than required for your printer(s). This means that a large amount of time is spent transmitting extra data to the printer, which the PostScript interpreter in the printer then has to discard.
Using gsprint renders the file to the resolution of the device; assuming this is less than 720 dpi, the resulting PostScript will be smaller, therefore requiring less time to transmit, less time to decompress on the printer and less time spent throwing away extra data.
One reason the speed decreases is the way ps2write works: it maintains much of the final content in temporary files, and stitches the main file back together from those files. It also maintains a cross-reference table which grows as the number of objects in the file does. Unless you need the files to be continuous, you could create a number of print files by using the -dFirstPage and -dLastPage options so that only a subset of the final printout is created; this might improve the performance.
Note that ps2write does not render the incoming file to an image, while gsprint definitely does; the PostScript emerging from gsprint will simply define a big bitmap. This doesn't maintain colours (everything goes to RGB) and doesn't maintain vector objects as vectors, so it doesn't scale well. However... if you want to use gsprint to print to a remote printer, you can set up a 'virtual printer' using RedMon. You can have RedMon send the output from a port to a totally different printer, even a remote one. So you use gsprint to print to (e.g.) 'local instance of MyPrinter' on RedMon1: and have the RedMon port set up to capture the print stream to disk and then send the PostScript file to 'MyPrinter on another PC'. Though I'd guess that's probably not going to be any faster.
My suggestion would be to set the resolution of ps2write lower; -r300 should be enough for any printer, and lower may be possible. The resolution will only affect rendered output, everything else remains as vectors and so scales nicely. Rendered images will print perfectly well at half the resolution of the printer, in general.
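Putting the -dFirstPage/-dLastPage and -r300 suggestions together, each chunk of the conversion might look something like this (^ is the cmd.exe line-continuation character; the page range is just an example):
"C:\Program Files\gs\gs9.07\bin\gswin64c.exe" ^
 -dSAFER -dNOPAUSE -dBATCH -r300 ^
 -dFirstPage=1 -dLastPage=100 ^
 -sDEVICE=ps2write ^
 -sOutputFile=D:\temp\testGS\test_001-100.ps ^
 D:\temp\testGS\test.pdf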
I can't say why the printer becomes so slow with the Ghostscript generated PostScript, but you might want to give other converters a try, like pdftops from the Poppler utils (I found a Windows download here as you seem to be using Windows).
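For reference, the basic pdftops call is simply pdftops input.pdf output.ps, so for the file above it would be something like:
pdftops D:\temp\testGS\test.pdf D:\temp\testGS\test_poppler.ps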

PostScript to PDF conversion/slow print issue [GhostScript]

I have several large PDF reports (>500 pages) with grid lines and background shading overlay that I converted from postscript using GhostScript's ps2pdf in a batch process. The PDFs that get created look perfect in the Adobe Reader.
However, when I go to print the PDF from Adobe Reader I get about 4-5 ppm from our Dell laser printer with long, 10+ second pauses between each page. The same report PDF generated from another proprietary process (not GhostScript) yields a fast 25+ ppm on the same printer.
The PDF file sizes on both are nearly the same at around 1.5 MB each, but when I print both versions of the PDF to file (i.e. postscript), the GhostScript generated PDF postscript output is about 5 times larger than that of the other (2.7 mil lines vs 675K) or 48 MB vs 9 MB. Looking at the GhostScript output, I see that the background pattern for the grid lines/shading (referenced by "/PatternType1" tag) is defined many thousands of times throughout the file, where it is only defined once in the other PDF output. I believe this constant re-defining of the background pattern is what is bogging down the printer.
Is there a switch/setting to force GhostScript to only define a pattern/image only once? I've tried using the -r and -dPdfsettings=/print switches with no relief.
Patterns (and indeed images) and many other constructs should only be emitted once; you don't need to do anything to have this happen.
Forms, however, do not get reused, and it's possible that this is the source of your actual problem. As Kurt Pfeifle says above, it's not possible to tell without seeing a file which causes the problem.
You could raise a bug report at http://bugs.ghostscript.com which will give you the opportunity to attach a file. If you do this, please do NOT attach a > 500 page file; it would be appreciated if you would try to find the time to create a smaller file which shows the same kind of size inflation.
Without seeing the PostScript file I can't make any suggestions at all.
I've looked at the source PostScript now, and as suspected the problem is indeed the use of a form. This is a comparatively unusual area of PostScript, and it's even more unusual to see it actually being used properly.
Because of its rare usage, we haven't had any impetus to implement the feature of preserving forms in the output PDF, and this is what results in the large PDF. The way the pattern is defined inside the form doesn't help either. You could try defining the pattern separately; at least that way pdfwrite might be able to detect the multiple pattern usage and only emit it once (the pattern contains an imagemask, so this may be worthwhile).
This construction:
GS C20 setpattern 384 151 32 1024 RF GR
GS C20 setpattern 384 1175 32 1024 RF GR
is inefficient: you keep re-instantiating the pattern, which is expensive. This:
GS C20 setpattern
384 151 32 1024 RF
384 1175 32 1024 RF
GR
is more efficient.
In any event, there's nothing you can do with pdfwrite to really reduce this problem.
'[...] when I print both versions of the PDF to file (i.e. postscript), the GhostScript generated PDF postscript output is about 5 times larger than that of the other (2.7 mil lines vs 675K) or 48 MB vs 9 MB.'
Which version of Ghostscript do you use? (Try gs -v or gswin32c.exe -v or gswin64c.exe -v to find out.)
How exactly do you 'print to file' the PDFs? (Which OS platform, which application, which kind of settings?)
Also, ps2pdf may not be your best option for the batch process. It's a small shell/batch script anyway, which internally calls a Ghostscript command.
Using Ghostscript directly will give you much more control over the result (though its commandline 'usability' is rather inconvenient and awkward -- that's why tools like ps2pdf are so popular...).
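As a sketch of what calling Ghostscript directly can look like (on Windows, with ^ as the line continuation; the filenames are placeholders and the settings shown are only common examples, not a recommendation for these particular files):
gswin64c.exe -dBATCH -dNOPAUSE -sDEVICE=pdfwrite ^
 -dPDFSETTINGS=/printer -dCompatibilityLevel=1.4 ^
 -sOutputFile=report.pdf report.ps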
Lastly, without direct access to one of your PS input samples for testing (as well as the PDF generated by the proprietary converter) it will not be easy to come up with good suggestions.
