Detect phones with "allphone" with goforward raw file - pocketsphinx

I've been experimenting with trying to get phonemes to be properly
detected. I've been doing this with several of my own audio files and
had poor results. Then I tried with the provided goforward.raw file
and it shows similar problematic results.
My install seems good, and it's working well for sentences:
% pocketsphinx_continuous -infile goforward.raw
go forward ten meters
But the -allphone option does not do what I expected.
% pocketsphinx_continuous -infile goforward.raw -allphone yes
SIL D SIL G OW F AO R W ER D JH T T EH N N M IY IH ZH ER Z S V SIL
It's not terrible, but there are some repeats and odd additions. Are
there workarounds for this? Is this a common result? Do I need to
tweak some options or the raw file?
I ultimately only wish to process a single word input, so any tips for
accomplishing this are much appreciated.
System is Arch Linux with pocketsphinx 5prealpha. I've tried this
with the source install and also the AUR package.

Use the command provided in the documentation, which passes a phonetic language model to -allphone instead of yes:
pocketsphinx_continuous -infile test/data/goforward.raw \
-allphone model/en-us/en-us-phone.lm.bin \
-beam 1e-20 -pbeam 1e-20 -lw 2.0
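If some repeats remain after decoding, the hypothesis string can be tidied up afterwards. The following is only a minimal sketch, not part of pocketsphinx: the helper name clean_phones is hypothetical, and it assumes that consecutive identical phones and SIL markers are unwanted, which will not hold for every use case.

```python
from itertools import groupby


def clean_phones(hypothesis, drop=("SIL",)):
    """Collapse consecutive duplicate phones, then drop filler symbols.

    Hypothetical post-processing helper; it only edits the text output.
    """
    phones = hypothesis.split()
    # groupby yields one key per run of identical neighbours
    collapsed = [phone for phone, _ in groupby(phones)]
    return [phone for phone in collapsed if phone not in drop]


raw = "SIL D SIL G OW F AO R W ER D JH T T EH N N M IY IH ZH ER Z S V SIL"
print(" ".join(clean_phones(raw)))
```

This only hides duplicated symbols in the text; it does not improve the recognition itself, so the decoder options above remain the real fix.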


Multiple plots from a single text file (gnuplot)

Currently, I have a text file and I'm interested in plotting two different curves from it: both use column 1 for the x axis, and columns 3 and 4 supply the y values. The plot should go to STDOUT since I'm working over ssh. The file that I am working with looks like this (filename: tmp):
%Iter duration train_objective valid_objective difference
0 6.0 0.0195735 0.0610958 0.0415223
1 5.0 0.180216 0.191344 0.011128
2 5.0 0.223318 0.241081 0.017763
3 6.0 0.245895 0.262197 0.016302
4 6.0 0.25796 0.28056 0.0226
5 6.0 0.269223 0.291769 0.022546
6 5.0 0.281187 0.298474 0.017287
7 5.0 0.283891 0.305579 0.021688
8 5.0 0.296456 0.307381 0.010925
9 5.0 0.296856 0.315487 0.018631
10 5.0 0.295805 0.321391 0.025586
Total training time is 0:06:27
So far, I can only plot the values corresponding to the 3rd column using the following line:
cat tmp | gnuplot -e "set terminal dumb size 120, 30; set autoscale; plot '-' u 1:3 with lines notitle"
Could someone tell me how I could include the 4th column in the same plot? Is that possible?
Thanks!
There is nothing in your description that rules out the trivial answer:
gnuplot -e "plot 'tmp' u 1:3 with lines, '' u 1:4 with lines"
The terminal choice is not relevant: you used 'set term dumb', but it could just as easily be any other output terminal, and a connection via ssh does not prevent that. If you have additional constraints that require a more complicated solution, please add them to the question.
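In the plot command above, the empty string '' tells gnuplot to reuse the previous file name, so each extra column only needs one more clause. As a rough sketch, the same command can be generated for any number of y columns; the helper name build_plot_command is hypothetical and the code only builds the string, it does not run gnuplot.

```python
def build_plot_command(filename, x_col, y_cols):
    """Build a gnuplot 'plot' command with one curve per y column.

    Hypothetical helper: the first clause names the file, later clauses
    use '' so that gnuplot reuses the same file.
    """
    clauses = []
    for index, y_col in enumerate(y_cols):
        source = f"'{filename}'" if index == 0 else "''"
        clauses.append(f"{source} u {x_col}:{y_col} with lines")
    return "plot " + ", ".join(clauses)


print(build_plot_command("tmp", 1, [3, 4]))
```

The resulting string can be passed to gnuplot -e together with the set terminal dumb settings from the question.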

How to pretty print a matrix in Octave?

I want to create a pretty printed table from a matrix (or column vector).
For Matlab there are several available functions that can do this (such as printmat, array2table, and table), but for Octave I cannot find any.
So instead of:
>> a = rand(3,2)*10;
>> round(a)
ans =
2 10
1 3
2 1
I would like to see:
>> a = rand(3,2)*10;
>> pretty_print(round(a))
THIS THAT
R1 2 10
R2 1 3
R3 2 1
How can I produce a pretty printed table from a matrix?
(Any available package to do so?)
UPDATE
After trying to follow the extremely obtuse package installation instructions from the Octave Wiki, I kept getting the error pkg: failed to read package 'econometrics-1.1.1.tar.gz': Couldn't resolve host name. Apparently the Windows version isn't able to use the direct installation command (as given on the Wiki). The only way I managed to install it was by first downloading the package manually into Octave's current working directory (see the pwd output). Only then did the install command work.
pkg install econometrics-1.1.1.tar.gz
pkg load econometrics
Yes, there is a prettyprint function in the econometrics package. Once the package is installed and loaded, you can use it like this:
>> a = rand(3,2)*10;
>> prettyprint(round(a),['R1';'R2';'R3'],['THIS';'THAT'])
THIS THAT
R1 2.000 3.000
R2 3.000 4.000
R3 10.000 3.000
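For readers who want to see what such a function does internally, here is a minimal sketch of the same idea in Python. It is not the econometrics package's implementation; the function name pretty_print and its precision parameter are assumptions made for illustration.

```python
def pretty_print(matrix, row_labels, col_labels, precision=3):
    """Return a text table with right-aligned numeric columns.

    Illustrative sketch only: each column is as wide as its widest
    entry (header included), rows are prefixed with their label.
    """
    cells = [[f"{value:.{precision}f}" for value in row] for row in matrix]
    label_width = max(len(label) for label in row_labels)
    col_widths = [
        max([len(col_labels[j])] + [len(row[j]) for row in cells])
        for j in range(len(col_labels))
    ]
    header = " ".join(h.rjust(w) for h, w in zip(col_labels, col_widths))
    lines = [" " * label_width + " " + header]
    for label, row in zip(row_labels, cells):
        body = " ".join(c.rjust(w) for c, w in zip(row, col_widths))
        lines.append(label.ljust(label_width) + " " + body)
    return "\n".join(lines)


table = pretty_print([[2, 10], [1, 3], [2, 1]],
                     ["R1", "R2", "R3"], ["THIS", "THAT"], precision=0)
print(table)
```

Passing precision=0 gives integer-looking cells as in the question; the default of 3 mimics the three decimals shown in the prettyprint output above.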

Print a postscript document with CUPS and a thermal printer

I installed an Epson TM-T20 on Ubuntu 12.04, using the official driver. This is a thermal printer, and I'm using 80mm paper.
My problem: when I print an image (using a PostScript document), it wastes a lot of paper. The image takes up around 5cm, but the printer feeds out 25cm of blank paper before printing it.
I use the following command to send the document to the printer:
lpr -P tm-t20 document.ps
The printer prints the image (a 200x200 image), but first feeds out a lot of blank paper.
The printer wasn't recognized by CUPS (using the web interface at localhost:631). Then I installed it using the following procedure:
sudo lpadmin -p tm-t20 -E -v serial:/dev/ttyUSB0 -P /usr/share/ppd/epson-tm-t20-rastertotmt.ppd
Then the printer appeared in the CUPS web interface and I configured it (baud rate, bit parity, etc).
The printer works ok when I send some text.
Here is part of the printer ppd:
*DefaultPageRegion:RP80x297
*PageRegion RP80x297/Roll Paper 80 x 297 mm: "<</PageSize[204 841.8]/ ImagingBBox null>>setpagedevice"
*PageRegion RP58x297/Roll Paper 58 x 297 mm: "<</PageSize[141.7 841.8]/ ImagingBBox null>>setpagedevice"
*CloseUI: *PageRegion
*DefaultImageableArea: RP80x297
*ImageableArea RP80x297/Roll Paper 80 x 297 mm: "0 0 204 841.8"
*ImageableArea RP58x297/Roll Paper 58 x 297 mm: "0 0 141.7 841.8"
*DefaultPaperDimension: RP80x297
*PaperDimension RP80x297/Roll Paper 80 x 297 mm: "204 841.8"
*PaperDimension RP58x297/Roll Paper 58 x 297 mm: "141.7 841.8"
I suppose this waste of paper is caused by the 297mm length that appears in the ppd file. I tried adding another configuration with 100mm instead of 297mm, but the problem persists.
I also tried adding the %%DocumentMedia tag to the ps file, with the same result:
%!PS-Adobe-3.0
%%Creator: GIMP PostScript file plugin V 1.17 by Peter Kirchgessner
%%Title: yay.ps
%%CreationDate: Thu Sep 13 13:44:26 2012
%%DocumentData: Clean7Bit
%%LanguageLevel: 2
%%Pages: 1
%%BoundingBox: 14 14 215 215
%%
%%EndComments
%%DocumentMedia: Plain 72 72 0 white Plain
%%BeginProlog
% Use own dictionary to avoid conflicts
10 dict begin
%%EndProlog
%%Page: 1 1
% Translate for offset
14.173228346456694 14.173228346456694 translate
% Translate to begin of first scanline
0 199.99999999999997 translate
199.99999999999997 -199.99999999999997 scale
% Image geometry
200 200 8
% Transformation matrix
[ 200 0 0 200 0 0 ]
% Strings to hold RGB-samples per scanline
/rstr 200 string def
/gstr 200 string def
/bstr 200 string def
{currentfile /ASCII85Decode filter /RunLengthDecode filter rstr readstring pop}
{currentfile /ASCII85Decode filter /RunLengthDecode filter gstr readstring pop}
{currentfile /ASCII85Decode filter /RunLengthDecode filter bstr readstring pop}
true 3
%%BeginData: 14759 ASCII Bytes
Any idea?
Finally, after a lot of pain, I discovered that the problem was the serial-to-USB cable (used to connect the serial printer to a USB port). I tried two different serial-to-USB cables, but the problem persisted, and I concluded that the printer works erratically if it is not connected to a "real" serial port. I tested the printer under identical conditions on a PC with a serial port and it works perfectly, just by installing the driver provided by Epson and running chmod 777 on /dev/ttyS0. In the job list I sometimes see the error "/usr/lib/cups/filter/pstopdf failed", but the printer prints fine, as if no error had occurred.
I have to chmod 777 /dev/ttyUSB0 to get the printer working (even if I run the commands with sudo).
I'm getting acceptable results (the text is not centered) with the option media=B8:
lp -d tm-t20 -o media=B8 document.ps
I also tried with
lp -d tm-t20 -o media=Custom.80x90mm document.ps
But the printer doesn't print, and the job appears as completed in the CUPS web interface.
If I try with
lp -d tm-t20 -o media=Custom.200x190 document.ps
The printer prints, though not correctly centered; I guess I need to try different values until I get the desired result. The paper dimensions in points are listed on this site: http://paulbourke.net/dataformats/postscript/
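To avoid guessing values, the sizes can be converted between millimetres and PostScript points (1 point = 1/72 inch, 1 inch = 25.4 mm). This is only a sketch: it assumes that unitless Custom.WIDTHxHEIGHT values are interpreted as points, as the sizes in the PPD excerpt are, and the helper names are hypothetical.

```python
MM_PER_INCH = 25.4
POINTS_PER_INCH = 72.0


def mm_to_points(mm):
    """Convert millimetres to PostScript points."""
    return mm / MM_PER_INCH * POINTS_PER_INCH


def points_to_mm(points):
    """Convert PostScript points to millimetres."""
    return points / POINTS_PER_INCH * MM_PER_INCH


def custom_media(width_mm, height_mm):
    """Format a unitless Custom.WIDTHxHEIGHT value (assumed to be points)."""
    return f"Custom.{round(mm_to_points(width_mm))}x{round(mm_to_points(height_mm))}"


print(round(mm_to_points(297), 2))   # length of the RP80x297 roll size
print(round(points_to_mm(204), 2))   # width given in the PPD for RP80x297
print(custom_media(80, 90))
```

Under that assumption, the PPD width of 204 points corresponds to roughly 72mm rather than the full 80mm of the paper, which may be relevant when centering the output.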
The printer isn't cutting the paper, and I don't know how to set that option (print and cut the paper).
The options accepted by the printer are:
lpoptions -p tm-t20 -l
PageSize/Media Size: *RP80x297 RP58x297 Custom.WIDTHxHEIGHT
Resolution/Resolution: *203x203dpi
TmtSpeed/Printing Speed: *Auto 1 2 3 4
TmtPaperReduction/Paper Reduction: Off Top *Bottom Both
TmtPaperSource/Paper Source: *DocFeedCut DocFeedNoCut DocNoFeedCut DocNoFeedNoCut PageFeedCut PageFeedNoCut PageNoFeedCut
TmtBuzzerControl/Buzzer: *Off Before After
TmtSoundPattern/Sound Pattern: *A B C D E
TmtBuzzerRepeat/Buzzer Repeat: *1 2 3 5
TmtDrawer1/Cash Drawer #1: *Off Before After
How can I make the printer print and cut the paper? I need to do it from the console so I can use it from a custom C++ program. If you have any other experience with this kind of printer under Linux, please give me some advice. My goal is to use the printer from a C++ program. I didn't find a quick way to do that (sending ESC/POS commands directly to the printer; there is no official documentation for doing so under Linux), so I'm working with CUPS from the console.
Paper CUT SOLVED:
lp -d tm-t20 -o media=Custom.200x258 -o source=DocFeedCut document.ps
I don't know why it works, because, as shown in the options, DocFeedCut is the default.
Now I will just try to center the text correctly.

MS-DOS debug -l 0 not working

I want to write a bin file to a flash drive. I'm supposed to run:
n helloworld.bin
l 0
w 0 0 0 1
But when I run l 0 I get a File not found error. What am I doing wrong?
Two issues:
MS-DOS filenames may have at most 8 characters before the dot and at most 3 characters after it. helloworld is 10 characters long, so the file needs a shorter name.
For this use of the l command in debug, provide no parameters. The file will always be loaded to CS:0100.
(I somehow find it worrying that my brain saved this useless information for all those years...)
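The length rule from the first point can be checked with a few lines of code. This is a simplified sketch with a hypothetical helper name; it only tests the 8 and 3 character limits and ignores the rules about which characters are allowed.

```python
def is_8_3_name(filename):
    """Check only the length limits of an MS-DOS 8.3 filename.

    Simplified sketch: at most 8 characters before the dot, at most 3
    after it, and no more than one dot. Character rules are not checked.
    """
    base, _, extension = filename.partition(".")
    if "." in extension:
        return False
    return 1 <= len(base) <= 8 and len(extension) <= 3


for name in ("helloworld.bin", "hello.bin"):
    print(name, is_8_3_name(name))
```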

strange characters: interaction of R and Windows locale?

WinXP-x32, R-2.13.0
Dear list,
I have a problem that (I think) relates to the interaction between Windows and R.
I am trying to scrape a table with data on the Hawai'ian Islands. This is my R code:
library(XML)
u <- "http://en.wikipedia.org/wiki/Hawaii"
tables <- readHTMLTable(u)
Islands <- tables[[5]]
The output is (first set of columns):
> Islands
  Island Nickname Location
1 Hawaiʻi[7] The Big Island 19°34′N 155°30′W / 19.567°N 155.5°W / 19.567; -155.5
2 Maui[8] The Valley Isle 20°48′N 156°20′W / 20.8°N 156.333°W / 20.8; -156.333
3 Kahoʻolawe[9] The Target Isle 20°33′N 156°36′W / 20.55°N 156.6°W / 20.55; -156.6
4 LÄnaÊ»i[10] The Pineapple Isle 20°50′N 156°56′W / 20.833°N 156.933°W / 20.833; -156.933
5 Molokaʻi[11] The Friendly Isle 21°08′N 157°02′W / 21.133°N 157.033°W / 21.133; -157.033
6 Oʻahu[12] The Gathering Place 21°28′N 157°59′W / 21.467°N 157.983°W / 21.467; -157.983
7 Kauaʻi[13] The Garden Isle 22°05′N 159°30′W / 22.083°N 159.5°W / 22.083; -159.5
8 Niʻihau[14] The Forbidden Isle 21°54′N 160°10′W / 21.9°N 160.167°W / 21.9; -160.167
As you can see, there are "weird" characters in there. I have also tried readHTMLTable(u, encoding = "UTF-16") and readHTMLTable(u, encoding = "UTF-8"), but that didn't help.
It seems to me that there may be an issue with the interaction of the Windows settings of the character set and R.
sessionInfo() gives
> sessionInfo()
R version 2.13.0 (2011-04-13)
Platform: i386-pc-mingw32/i386 (32-bit)
locale:
[1] LC_COLLATE=Dutch_Netherlands.1252 LC_CTYPE=Dutch_Netherlands.1252 LC_MONETARY=Dutch_Netherlands.1252
[4] LC_NUMERIC=C LC_TIME=Dutch_Netherlands.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] XML_3.2-0.2
I have also attempted to let R use another setting by entering: Sys.setlocale("LC_ALL", "en_US.UTF-8"), but this yields the response:
> Sys.setlocale("LC_ALL", "en_US.UTF-8")
[1] ""
Warning message:
In Sys.setlocale("LC_ALL", "en_US.UTF-8") :
OS reports request to set locale to "en_US.UTF-8" cannot be honored
In addition, I have attempted to make the change directly from the Windows command prompt, using chcp 65001 and variations of that, but that didn't change anything.
I noticed from searching the web that others have the issue as well, but I have not been able to find a solution. It looks like this is an issue of how Windows and R interact. Unfortunately, all three computers at my disposal have this problem. It occurs both under WinXP-x32 and under Win7-x86.
Is there a way to make R override the Windows settings, or can the issue be solved otherwise?
I have also tried other websites, and the issue occurs every time when there is an é, ü, ä, î, et cetera in the text-to-be-scraped.
Thank you,
Roger
Not quite an answer:
If you look at the Wikipedia page and change the encoding in your browser (in IE, View -> Encoding; in Firefox, View -> Character Encoding) to Western (ISO-8859-1) or Western (Windows-1252), then you see the silly characters. That ought to mean that you can use iconv to change the encoding and fix your problems.
#Convert factors to character
Islands <- as.data.frame(lapply(Islands, as.character), stringsAsFactors = FALSE)
iconv(Islands$Island, "windows-1252", "UTF-8")
Unfortunately, it doesn't work. It may be possible to get the correct text by using a different conversion (iconvlist() shows all the possibilities).
It is possible to simply strip out the offending characters, though this isn't ideal.
iconv(Islands$Island, "windows-1252", "ASCII", "")
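The effect described above, UTF-8 bytes being displayed through a single-byte Western encoding, can be reproduced outside R. The sketch below uses Python and is illustrative only. It uses latin-1 rather than Windows-1252 as the single-byte encoding, because Python's cp1252 codec has no mapping for byte 0x81, which occurs in the UTF-8 encoding of ā. The function names are hypothetical.

```python
def garble(text):
    """Simulate the problem: UTF-8 bytes read as a single-byte encoding."""
    return text.encode("utf-8").decode("latin-1")


def repair(garbled_text):
    """Undo it: recover the original bytes, then decode them as UTF-8."""
    return garbled_text.encode("latin-1").decode("utf-8")


original = "Lānaʻi"
garbled = garble(original)
repaired = repair(garbled)

print(repr(garbled))          # repr also shows the invisible control character
print(repaired == original)
```

The garbled form contains the same Ä and Ê» sequences as row 4 of the table in the question, which suggests the data itself is UTF-8 and only its interpretation is wrong.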
I was unable to replicate the error, but the help files are useful:
Sys.setlocale("LC_TIME", "de") # Solaris: details are OS-dependent
Sys.setlocale("LC_TIME", "de_DE.utf8") # Modern Linux etc.
Sys.setlocale("LC_TIME", "de_DE.UTF-8") # ditto
Sys.setlocale("LC_TIME", "de_DE") # OS X, in UTF-8
Sys.setlocale("LC_TIME", "German") # Windows
On Windows you should use names like "English" or "Dutch_Netherlands.1252" to change these settings.
I tried to replicate your state:
> Sys.setlocale("LC_ALL","Dutch_Netherlands.1252")
[1] "LC_COLLATE=Dutch_Netherlands.1252;LC_CTYPE=Dutch_Netherlands.1252;LC_MONETARY=Dutch_Netherlands.1252;LC_NUMERIC=C;LC_TIME=Dutch_Netherlands.1252"
> Sys.getlocale()
[1] "LC_COLLATE=Dutch_Netherlands.1252;LC_CTYPE=Dutch_Netherlands.1252;LC_MONETARY=Dutch_Netherlands.1252;LC_NUMERIC=C;LC_TIME=Dutch_Netherlands.1252"
library(XML)
u <- "http://en.wikipedia.org/wiki/Hawaii"
tables <- readHTMLTable(u)
Islands <- tables[[5]]
However, I do not get the funny characters in the console. In my own locale the ʻ was marked as , but all functionality remained.
> Islands[1,1]
[1] Hawaiʻi[27]
8 Levels: Hawaiʻi[27] Kahoʻolawe[34] Kauaʻi[30] Lānaʻi[32] Maui[28] ... Oʻahu[29]
And these funny characters can be read easily, and found from the table.
> Encoding(as.character("Hawaiʻi"))
[1] "UTF-8"
> Encoding(as.character(Islands[1,1]))
[1] "UTF-8"
> grep("Hawaiʻi", as.character(Islands[1,1]))
[1] 1
If you still have problems, the cause would lie elsewhere. Note, however, that to change the locale under Windows you have to use different names than on Linux or OS X (see your own locale info for an example). On Windows, "Dutch" is probably enough.
