How should I set "bw" options for acoustic model adaptation? - pocketsphinx

The tutorial gives this command:
~~~
./bw \
-hmmdir en-us \
-moddeffn en-us/mdef.txt \
-ts2cbfn .ptm. \
-feat 1s_c_d_dd \
-svspec 0-12/13-25/26-38 \
-cmn current \
-agc none \
-dictfn cmudict-en-us.dict \
-ctlfn arctic20.fileids \
-lsnfn arctic20.transcription \
-accumdir .
~~~
But I checked my feat.params and it has this content:
~~~
-lowerf 130
-upperf 6800
-nfilt 25
-transform dct
-lifter 22
-feat 1s_c_d_dd
-svspec 0-12/13-25/26-38
-agc none
-cmn current
-varnorm no
-model ptm
-cmninit 40,3,-1
~~~
I don't know how I should configure these options. I am trying to adapt the acoustic model for continuous speech.
I got my model from here:
https://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/US%20English/cmusphinx-en-us-ptm-5.2.tar.gz/download
And I tried to adjust the command above like this:
~~~
./bw -hmmdir en-us -moddeffn en-us/mdef.txt -ts2cbfn .cont. -feat 1s_c_d_dd -cmn current -agc none -dictfn cmudict-en-us.dict -ctlfn robot_train.fileids -lsnfn robot_train.transcription -accumdir -lda feature_transform .
~~~
But I get these error messages:
~~~
INFO: main.c(229): Compiled on Mar 22 2018 at 12:54:02
ERROR: "cmd_ln.c", line 607: Unknown argument name 'feature_transform'
ERROR: "cmd_ln.c", line 704: Failed to parse arguments list
ERROR: "cmd_ln.c", line 753: Failed to parse arguments list, forced exit
~~~

I changed my model to this file (switching away from the PTM model the tutorial linked to): https://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/US%20English/cmusphinx-en-us-5.2.tar.gz/download
Then I removed the -lda feature_transform from my command, and it worked!
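For reference, a sketch of the command that ended up working for me with the continuous model (the dictionary, fileids, and transcription file names are from my own setup, so adjust them to yours):

~~~shell
./bw \
 -hmmdir en-us \
 -moddeffn en-us/mdef.txt \
 -ts2cbfn .cont. \
 -feat 1s_c_d_dd \
 -cmn current \
 -agc none \
 -dictfn cmudict-en-us.dict \
 -ctlfn robot_train.fileids \
 -lsnfn robot_train.transcription \
 -accumdir .
~~~

Compared with the tutorial's PTM command, there is no -svspec option and no -lda transform, and -ts2cbfn is .cont. instead of .ptm., matching the continuous model's feat.params.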

Related

Output training losses over iterations/epochs to file from trainer.py in HuggingFace Transformers

In HuggingFace's Transformers framework, only the evaluation-step metrics are written to a file, named eval_results_{dataset}.txt in the "output_dir", when running run_glue.py. The eval_results file contains the metrics for the dataset, e.g. accuracy for MNLI, along with the evaluation loss.
Can a parameter be passed to run_glue.py to generate a training_results_{dataset}.txt file that tracks the training loss, or would I have to build the functionality myself?
My file named run_python_script_glue.bash:
~~~
GLUE_DIR=../../huggingface/GLUE_SMALL/
TASK_NAME=MNLI
ID=OT

python3 run_glue.py \
  --local_rank -1 \
  --seed 42 \
  --model_type albert \
  --model_name_or_path albert-base-v2 \
  --task_name $TASK_NAME \
  --do_train \
  --do_eval \
  --data_dir $GLUE_DIR/$TASK_NAME \
  --max_seq_length 128 \
  --per_gpu_train_batch_size 8 \
  --per_gpu_eval_batch_size 8 \
  --gradient_accumulation_steps 2 \
  --learning_rate 3e-5 \
  --max_steps -1 \
  --warmup_steps 1000 \
  --doc_stride 128 \
  --num_train_epochs 3.0 \
  --save_steps 9999 \
  --output_dir ./results/GLUE_SMALL/$TASK_NAME/ALBERT/$ID/ \
  --do_lower_case \
  --overwrite_output_dir \
  --label_noise 0.2 \
  --att_kl 0.01 \
  --att_se_hid_size 16 \
  --att_se_nonlinear relu \
  --att_type soft_attention \
  --adver_type ot \
  --rho 0.5 \
  --model_type whai \
  --prior_gamma 2.70 \
  --three_initial 0.0
~~~
In the trainer.py file in the Transformers library, the training-loss variable during the training step is called tr_loss:
~~~
tr_loss = self._training_step(model, inputs, optimizer, global_step)
loss_scalar = (tr_loss - logging_loss) / self.args.logging_steps
logs["loss"] = loss_scalar
logging_loss = tr_loss
~~~
In this code, the accumulated training loss is first scaled by the number of logging steps and then stored in a logs dictionary. logs['loss'] is later printed to the terminal but not to a file. Is there a way to extend this to also update a txt file?
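One lightweight option (a sketch of my own, not an existing Trainer parameter) is to append each logs dict to a text file at exactly the point where logs["loss"] is set. The helper and the target file name training_results_mnli.txt below are hypothetical:

~~~python
import json

def append_logs_to_file(logs, path):
    """Append one training-log dict as a JSON line to a text file."""
    with open(path, "a") as f:
        f.write(json.dumps(logs) + "\n")

# Hypothetical placement next to the existing lines in trainer.py:
#   logs["loss"] = loss_scalar
#   append_logs_to_file(logs, "training_results_mnli.txt")
~~~

Opening the file in append mode each call keeps the change minimal and crash-safe; each logging step adds one parseable line.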

Pandoc fails to render markdown with image: "! Missing endcsname inserted."

I am having trouble rendering a .md file with pandoc since I inserted an image. Before that, everything worked quite well.
My configuration:
- MacBook Air / macOS 12.5
- multiple markdown files, one per chapter: 1_einleitung.md / 2_theorie.md / ...
- a short YAML block that references the bibliography file within the same folder: bib.bib
- a Lua filter (pangb4e) for numbered examples and interlinear glossing
- a metadata.yml ➞ see the YAML code below
- a Makefile ➞ see the Makefile code below
- the image is stored in images/danes1.png (I even tried to use the entire path instead of the relative one)
2_theorie.md:
~~~
---
bibliography: [bib.bib]
---
[...]
![Die einfache lineare Progression](images/danes1.png){width=300px #fig:danes1}
~~~
metadata.yml:
~~~
---
author: ...
affiliation: ...
title: ...
date: \today
# number-sections: true
# abstract: This is the abstract.

# Formatting
bibliography: [bib.bib]
cls: linguistics-and-education.csl
lang: de-DE
link-citations: true
linkReferences: true
nameInLink: true
fontsize: 12pt
papersize: a4
indent: true
fontfamily: sourcesanspro
fontfamilyoptions: default
geometry: margin=2.5cm
linestretch: 1.5
header-includes:
  - \usepackage{gb4e}
  - \usepackage[nottoc]{tocbibind}
figureTitle: "Abbildung"
tableTitle: "Tabelle"
figPrefix:
  - "Fig."
  - "Figs."
tblPrefix:
  - "Tab."
secPrefix:
  - Kapitel
loftitle: "# Abbildungsverzeichnis"
lottitle: "# Tabellenverzeichnis"
...
~~~
Makefile:
~~~
1_einleitung:
	pandoc 1_einleitung.md -o 1_einleitung.pdf \
	--metadata-file=metadata.yml \
	--number-sections \
	--strip-comments \
	--filter pandoc-crossref \
	--citeproc \
	--lua-filter addons/pangb4e.lua

2_theorie:
	pandoc 2_theorie.md -o 2_theorie.pdf \
	--metadata-file=metadata.yml \
	--number-sections \
	--strip-comments \
	--filter pandoc-crossref \
	--citeproc \
	--lua-filter addons/pangb4e.lua
~~~
The error output

Currently, I do my writing in 2_theorie.md and run make 2_theorie to produce a PDF. I just inserted an image in 2_theorie.md and I get the following error:
~~~
san#MacBook-Air Doktorarbeit % make 2_theorie
pandoc 2_theorie.md -o 2_theorie.pdf \
	--metadata-file=metadata.yml \
	--number-sections \
	--strip-comments \
	--filter pandoc-crossref \
	--citeproc \
	--lua-filter addons/pangb4e.lua \
Error producing PDF.
! Missing \endcsname inserted.
<to be read again>
                   \let
l.591 }
~~~
When I delete the image, the document compiles as usual, but I need to be able to use images in my work.
If you need more information, please let me know!
Thank you!
I found a solution to my problem myself, here: https://tex.stackexchange.com/questions/448314/not-able-to-use-images-if-i-call-gb4e-sty
The problem was the package gb4e. I disabled a feature of the package right after loading it, and pandoc worked just fine.
Edits in the metadata.yml:
~~~
[...]
header-includes:
  - \usepackage{gb4e}
  - \noautomath # <-- I added this line to the metadata.yml
[...]
~~~
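For context: gb4e's "automath" feature makes ^ and _ active characters outside math mode (for glossing shortcuts), which can clash with other packages and with pandoc's generated figure code; \noautomath turns that feature off. A minimal standalone LaTeX sketch of the fix (the image path is the one from my question):

~~~latex
\documentclass{article}
\usepackage{graphicx}
\usepackage{gb4e}
\noautomath % disable gb4e's active ^ and _ outside math mode

\begin{document}
\begin{exe}
  \ex Ein Beispielsatz. % gb4e example environments still work
\end{exe}
\includegraphics[width=5cm]{images/danes1.png}
\end{document}
~~~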

Learning rate not set in run_mlm.py?

I want to run (or resume) the run_mlm.py script with a specific learning rate, but setting it in the script arguments doesn't seem to do anything.
~~~
os.system(
    f"python {script} \
    --model_type {model} \
    --config_name './models/{model}/config.json' \
    --train_file './content/{data}/train.txt' \
    --validation_file './content/{data}/test.txt' \
    --learning_rate 6e-4 \
    --weight_decay 0.01 \
    --warmup_steps 6000 \
    --adam_beta1 0.9 \
    --adam_beta2 0.98 \
    --adam_epsilon 1e-6 \
    --tokenizer_name './tokenizer/{model}' \
    --output_dir './{out_dir}' \
    --do_train \
    --do_eval \
    --num_train_epochs 40 \
    --overwrite_output_dir {overwrite} \
    --ignore_data_skip"
)
~~~
After warm-up, the log indicates that the learning rate tops out at 1e-05, which is presumably a default from somewhere, though I'm not sure where, and certainly not 6e-4:
~~~
{'loss': 3.9821, 'learning_rate': 1e-05, 'epoch': 0.09}
~~~
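For what it's worth, the Trainer's default schedule is a linear warm-up followed by a linear decay, so the logged learning rate should peak at exactly the configured --learning_rate once warm-up ends. A plain-Python sketch of that schedule shape (my own reimplementation, not the library's code):

~~~python
def linear_warmup_lr(step, base_lr, warmup_steps, total_steps):
    """Linear warm-up to base_lr, then linear decay to zero:
    the shape of transformers' default LR schedule."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / (total_steps - warmup_steps)

# With --learning_rate 6e-4 and --warmup_steps 6000, the value at
# step 6000 should be exactly the base LR, 6e-4:
print(linear_warmup_lr(6000, 6e-4, warmup_steps=6000, total_steps=100_000))
~~~

So a peak of 1e-05 means the base learning rate actually in effect is 1e-05. One possible cause when resuming: the Trainer reloads the saved optimizer and scheduler state from the checkpoint, which can override the command-line value.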

Transformers fine tune model warning

~~~
python run_clm.py \
    --model_name_or_path ctrl \
    --train_file df_finetune_train.csv \
    --validation_file df_finetune_test.csv \
    --do_train \
    --do_eval \
    --preprocessing_num_workers 72 \
    --block_size 256 \
    --output_dir ./finetuned
~~~
I am trying to fine-tune the ctrl model on my own dataset, where each row represents a sample.
However, I get the warning below:
~~~
[WARNING|tokenization_utils_base.py:3213] 2021-03-25 01:32:22,323 >>
Token indices sequence length is longer than the specified maximum
sequence length for this model (934 > 256). Running this sequence
through the model will result in indexing errors
~~~
What is the cause of this, and are there any solutions?
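As far as I understand, this warning is emitted at tokenization time, when the tokenizer sees raw samples longer than the model's maximum length. run_clm.py later concatenates all tokens and splits them into --block_size chunks, so over-long sequences are never fed to the model whole. A rough sketch of that grouping step (my own per-sample simplification; the script actually concatenates across samples first):

~~~python
def chunk_tokens(token_ids, block_size):
    """Split a long token sequence into fixed-size blocks,
    dropping the tail remainder, as run_clm's grouping step does."""
    usable = (len(token_ids) // block_size) * block_size
    return [token_ids[i:i + block_size] for i in range(0, usable, block_size)]

# A 934-token sample with block_size 256 yields 3 full blocks:
blocks = chunk_tokens(list(range(934)), 256)
print(len(blocks), [len(b) for b in blocks])  # 3 [256, 256, 256]
~~~

If that chunking is indeed applied, the warning is usually harmless for causal-LM training; it would only matter if the untruncated sequence reached the model directly.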

How to fix 'Too few arguments to function Http\Adapter\Guzzle6\Client::buildClient()' when using GrahamCampbell/Laravel-GitHub

I am building a web application for managing an educational establishment with Laravel, and I need to set up a collaborative workspace.
The idea I came up with is to work with a GitHub repository; after searching the web, I found GrahamCampbell/Laravel-GitHub.
I did the installation as in the documentation, but when I test it I get the following error:
~~~
Too few arguments to function Http\Adapter\Guzzle6\Client::buildClient(),
0 passed in C:\Users\Fehmi\Dropbox\GRASP\vendor\php-http\guzzle6-adapter\src\Client.php on line 31 and exactly 1 expected
~~~
~~~
use GrahamCampbell\GitHub\Facades\GitHub;

class GitController extends Controller
{
    public function FuncName()
    {
        dd(GitHub::me()->organizations());
    }
}
~~~
The result that I get is:
~~~
Symfony\Component\Debug\Exception\FatalThrowableError (E_RECOVERABLE_ERROR)
Too few arguments to function Http\Adapter\Guzzle6\Client::buildClient(), 0 passed in C:\Users\Fehmi\Dropbox\GRASP\vendor\php-http\guzzle6-adapter\src\Client.php on line 31 and exactly 1 expected
~~~
Make sure to use the latest php-http/guzzle6-adapter version.
Only the one from May 2016 has a line 31 with $client = static::buildClient();, and it had an issue, fixed in PR 32, to allow calling buildClient() with no parameters.
GrahamCampbell/Laravel-GitHub only constrains guzzle6-adapter to a version range between 1.0 (included) and 2.0.
Using ^2.0, or at least ^1.1, might help.
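A sketch of the corresponding Composer commands (the version constraints follow the suggestion above; check your project's other dependencies before changing them):

~~~shell
composer show php-http/guzzle6-adapter             # check the installed version
composer require "php-http/guzzle6-adapter:^1.1"   # or ^2.0 if compatible
~~~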