I'm using Kubuntu to run SoX. I have the following code to get info from sound files:
for file in *.mp3; do echo -e '\n--------------------\n'$file'\n'; sox $file -n stats; done > stats.txt 2>&1 | tail -1
It produces output that looks like this:
--------------------
soundfile_name.mp3
DC offset -0.000287
Min level -0.585483
Max level 0.572299
Pk lev dB -4.65
RMS lev dB -19.55
RMS Pk dB -12.98
RMS Tr dB -78.44
Crest factor 5.56
Flat factor 0.00
Pk count 2
Bit-depth 29/29
Num samples 628k
Length s 14.237
Scale max 1.000000
Window s 0.050
Could someone amend the command to limit the output so that it looks like this?
--------------------
soundfile_name.mp3
Pk lev dB -4.65
RMS lev dB -19.55
RMS Pk dB -12.98
RMS Tr dB -78.44
Thanks.
Given that the lines of interest have the word "dB" in common, you could filter the SoX output with grep -w dB. Note that sox prints the stats on stderr (which is why your original command needed 2>&1), so the stream is merged before the pipe; the file name is also quoted in case it contains spaces:
for file in *.mp3; do echo -e '\n--------------------\n'"$file"'\n'; sox "$file" -n stats 2>&1 | grep -w dB; done > stats.txt
Resulting content of stats.txt:
--------------------
soundfile_name.mp3
Pk lev dB -4.65
RMS lev dB -19.55
RMS Pk dB -12.98
RMS Tr dB -78.44
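If you ever need a different subset, you can match on the stat labels instead of the word dB. Here is a variant of the same loop that keeps exactly the four lines above; the regular expression is just an illustration built from those labels:
for file in *.mp3; do echo -e '\n--------------------\n'"$file"'\n'; sox "$file" -n stats 2>&1 | grep -E '^(Pk lev|RMS (lev|Pk|Tr)) dB'; done > stats.txt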
I would like to add up the du output for all subfolders that share the same final subfolder name.
I have tried (for example):
du -s /aa/bb/cc/*/ | sort -k2.11,2.14
which gave me the output sorted like this:
2000 /aa/bb/cc/1234/
1000 /aa/bb/dd/1234/
2000 /aa/bb/ff/1234/
2000 /aa/bb/cc/5678/
2000 /aa/bb/dd/5678/
3000 /aa/bb/ee/5678/
1000 /aa/bb/gg/5678/
Now I would like to add up all the ones with 1234 and all the ones with 5678.
Expected result:
5000 -- 1234
8000 -- 5678
You can use awk to sum the first field into an array a, keyed on the second-to-last field (the 1234/5678 component):
du -s /aa/bb/cc/*/ | sort -k2.11,2.14 |awk -F'/' '{a[$(NF-1)]+=$1}END{for(i in a) print a[i],i}'
8000 5678
5000 1234
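The sort in that pipeline is not actually needed for the summing (awk accumulates regardless of input order), and for (i in a) prints in unspecified order. Here is a sketch that covers all the sibling directories shown in your output and matches your expected -- format; the /aa/bb/*/*/ glob is an assumption based on that output:
du -s /aa/bb/*/*/ | awk -F'/' '{a[$(NF-1)]+=$1} END{for (i in a) print a[i]" -- "i}' | sort -k3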
I have a large email list, and I want to know the top 100 domains in this list. Example list:
cristiano.ofidiani@libero.it
cristianocurzi70@libero.it
cristianogiustetto@libero.it
cristianopaolieri@fercart.com
cristianoristori@tiscali.it
cristianorollo@tiscali.it
cristianoscavi@alice.it
cristianotradigo@virgilio.it
cristianpassarelli@virgilio.it
cristianprisco@libero.it
cristianriparip@riparifranco.it
cristiansrl.pec@legalmail.it
cristina.arese@vestisolidale.it
cristina.armillotta@coldiretti.it
cristina.bazzi@bazzicroup.it
cristina.bedocchi@tin.it
cristina.benassi@terminalrubiero.com
I need to know the top domains in this list, for example:
libero.it 100
tiscali.it 77
legalmail.it 44
How can I do this in Linux bash?
cut -d@ -f2 | sort | uniq -c | sort -nr | head -n 100 should do the trick. cut extracts the domain, using @ to separate the fields; uniq -c requires sorted input and prefixes each line with a count; sort -nr sorts them in decreasing order; and head gives the top one hundred.
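A minimal end-to-end sketch, assuming the addresses live in a file named list.txt (the name is just for illustration) and that you want the domain printed before the count, as in your example; the final awk swaps the two columns that uniq -c produces:
cut -d@ -f2 list.txt | sort | uniq -c | sort -nr | head -n 100 | awk '{print $2, $1}'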
I have a very simple dataset, see below (let's call it a.vw):
-1 |a 1 |b c57
1 |a 2 |b c3
There are 2 namespaces (a and b), and after reading the wiki, I know that vw will automatically build the full feature names, like a^1 or b^c57.
However, before I knew that, I had actually made a vw file like this (call it b.vw):
-1 |a a_1 |b b_c57
1 |a a_2 |b b_c3
As you can see, I just added a prefix to each feature manually.
Now I train models on both files with the same configuration, like this:
cat a.vw | vw --loss_function logistic --passes 1 --hash all -f a.model --invert_hash a.readable --random_seed 1
cat b.vw | vw --loss_function logistic --passes 1 --hash all -f b.model --invert_hash b.readable --random_seed 1
Then I checked the readable model files; they have exactly the same weights for each feature, see below:
$ cat a.readable
Version 8.2.1
Id
Min label:-50
Max label:50
bits:18
lda:0
0 ngram:
0 skip:
options:
Checksum: 295637807
:0
Constant:116060:-0.0539969
a^1:112195:-0.235305
a^2:1080:0.243315
b^c3:46188:0.243315
b^c57:166454:-0.235305
$ cat b.readable
Version 8.2.1
Id
Min label:-50
Max label:50
bits:18
lda:0
0 ngram:
0 skip:
options:
Checksum: 295637807
:0
Constant:116060:-0.0539969
a^a_1:252326:-0.235305
a^a_2:85600:0.243315
b^b_c3:166594:0.243315
b^b_c57:227001:-0.235305
Finally, I made predictions with each model on its own dataset, like this:
$ cat a.vw | vw -t -i a.model -p a.pred --link logistic --quiet
$ cat b.vw | vw -t -i b.model -p b.pred --link logistic --quiet
Now here comes the problem: a.pred holds very different results from b.pred, see below:
$ cat a.pred
0.428175
0.547189
$ cat b.pred
0.371776
0.606502
WHY? Does it mean we have to add prefixes to features manually?
If you try cat a.vw | vw -t -i a.model -p a.pred --link logistic --quiet --hash all you'll get:
$ cat a.pred
0.371776
0.606502
It seems the --hash argument's value isn't stored in the model file, so you need to specify it at the test step too. It doesn't matter for b.vw, as it has no purely numeric features, but it comes into play with a.vw. I'm not sure if it's a bug, but you may report it.
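For completeness, the flag can be passed to both test invocations; per the reasoning above it should be a no-op for b.model:
$ cat a.vw | vw -t -i a.model -p a.pred --link logistic --quiet --hash all
$ cat b.vw | vw -t -i b.model -p b.pred --link logistic --quiet --hash all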
Suppose we have one file, abc.csv.dat:
100000114,AU79 Attract Mens Deo 150 Ml Can,100000113,AU79 Attract Mens Deo 150 Ml Can,18,_,18,Deo
100000115,AU79 Sauve Mens Deo 150 Ml Can,100000112,AU79 Sauve Mens Deo 150 Ml Can,18,_,18,Deo
100000117,AU79 Altitude Mens Deo 150 Ml Can,100000116,AU79 Altitude Mens Deo 150 Ml Can,18,_,18,Deo
100000119,DU AU79 Bandit Mens Deo 150 Ml Can,100000118,DU AU79 Bandit Mens Deo 150 Ml Can,18,_,18,Deo
The second file is xyz.csv.dat:
100000114,AU79 Attract Mens Deo 250 Ml Can,100000113,AU79 Attract Mens Deo 250 Ml Can,18,_,18,Deo
100000115,AU79 Sauve Mens Deo 150 Ml Can,100000112,AU79 Sauve Mens Deo 150 Ml Can,18,_,18,Deo
100000119,DU AU79 Bandit Mens Deo 150 Ml Can,100000118,DU AU79 Bandit Mens Deo 150 Ml Can,18,_,18,Deo
100000120,AU79 Altitude Mens Deo 350 Ml Can,100000116,AU79 Altitude Mens Deo 350 Ml Can,18,_,18,Deo
I want to compare these two files using Unix commands in a shell script that counts the new rows, updated rows, and deleted rows.
My sample files are small, but the actual files contain 20,000+ records.
Thanks for your attention.
You can use comm to get most of what you want. It treats "update" as "delete and insert".
insertions: comm -13 abc.csv.dat xyz.csv.dat
deletions: comm -23 abc.csv.dat xyz.csv.dat
unchanged: comm -12 abc.csv.dat xyz.csv.dat
comm requires the input files to be sorted.
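If the .dat files aren't already sorted, bash process substitution (also used in the longer example below) handles that without temporary files, e.g. for insertions:
comm -13 <(sort abc.csv.dat) <(sort xyz.csv.dat)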
Here is a more in-depth example using comm. The first command keeps only the lines that differ between the two files (comm -3), strips comm's leading tab with sed, and has awk print a grep anchor pattern ^key, for every first-column key it sees twice, i.e. for every row that was updated rather than purely inserted or deleted:
$ comm -3 <(sort abc.csv.dat) <(sort xyz.csv.dat) | sed -e 's/^[ \t]*//' | awk -F , '{if (a[$1]) {print "^"$1","} {a[$1] = $0}}' > data2.txt
--count updated rows:
$ grep -E -f data2.txt xyz.csv.dat | wc -l
1
--count deleted rows:
$ comm -2 -3 <(sort abc.csv.dat) <(sort xyz.csv.dat) | grep -v -E -f data2.txt | wc -l
1
--count new rows:
$ comm -1 -3 <(sort abc.csv.dat) <(sort xyz.csv.dat) | grep -v -E -f data2.txt | wc -l
1
--list the update rows:
$ grep -E -f data2.txt xyz.csv.dat
100000114,AU79 Attract Mens Deo 250 Ml Can,100000113,AU79 Attract Mens Deo 250 Ml Can,18,_,18,Deo
--list the delete rows:
$ comm -2 -3 <(sort abc.csv.dat) <(sort xyz.csv.dat) | grep -v -E -f data2.txt
100000117,AU79 Altitude Mens Deo 150 Ml Can,100000116,AU79 Altitude Mens Deo 150 Ml Can,18,_,18,Deo
--list the new rows:
$ comm -1 -3 <(sort abc.csv.dat) <(sort xyz.csv.dat) | grep -v -E -f data2.txt
100000120,AU79 Altitude Mens Deo 350 Ml Can,100000116,AU79 Altitude Mens Deo 350 Ml Can,18,_,18,Deo
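And if you want all three counts from a single shell script, here is a minimal sketch tying the steps above together. It assumes the two .dat file names used throughout and leaves data2.txt plus two sorted temporaries in the current directory; note that if there are zero updated rows, data2.txt is empty and GNU grep -v then passes every line through, so guard that case in real use:
#!/bin/sh
sort abc.csv.dat > old.sorted
sort xyz.csv.dat > new.sorted
# emit a ^key, pattern for every key that appears on both sides of the diff, i.e. every updated row
comm -3 old.sorted new.sorted | sed -e 's/^[ \t]*//' | awk -F , '{if (a[$1]) {print "^"$1","} {a[$1] = $0}}' > data2.txt
echo "updated: $(grep -c -E -f data2.txt new.sorted)"
echo "deleted: $(comm -23 old.sorted new.sorted | grep -c -v -E -f data2.txt)"
echo "new:     $(comm -13 old.sorted new.sorted | grep -c -v -E -f data2.txt)"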