Remove everything before a string with bash? - bash

I'm doing this with ffmpeg :
ffmpeg -i /Users/petaire/GDrive/Taff/ASI/Bash/testFolder/SilenceAndBlack.mp4 -af silencedetect=d=2 -f null - 2>&1 | grep silence_duration
And my output is :
[silencedetect # 0x7f9e6940eba0] silence_end: 25.92 | silence_duration: 25.936
But I only want to keep the duration number, so I'm trying to remove everything before the last number.
I've never understood anything about sed/awk & co, so I dont know what is the best way to do that. I thought grep would be powerful enough, but it doesn't seems so.
Any idea?

Using awk to print the last field:
$ awk '{print $NF}'
Test it:
$ echo "[silencedetect # 0x7f9e6940eba0] silence_end: 25.92 | silence_duration: 25.936"| awk '{print $NF}'
25.936
or use sed to replace everything up to last space with nothing:
$ ... | sed 's/.* //'

you can change your grep command to
grep -oP '(?<=silence_duration: )\S+'
which will print the next field to the searched one.

to remove everything before the last number
you can use
grep -o "[^ ]*$"

Another option, grep -o with cut:
$ echo '[silencedetect # 0x7f9e6940eba0] silence_end: 25.92 | silence_duration: 25.936' \
| grep -o 'silence_duration: [0-9]*\.[0-9]*' | cut -d ' ' -f 2
25.936

Related

How do I remove the header in the df command?

I'm trying to write a bash command that will sort all volumes by the amount of data they have used and tried using
df | awk '{print $1 | "sort -r -k3 -n"}'
Output:
map
devfs
Filesystem
/dev/disk1s5
/dev/disk1s2
/dev/disk1s1
But this also shows the header called Filesystem.
How do I remove that?
For your specific case, i.e. using awk, #codeforester answer (using awk NR (Number of Records) variable) is the best.
In a more general case, in order to remove the first line of any output, you can use the tail -n +N option in order to output starting with line N:
df | tail -n +2 | other_command
This will remove the first line in df output.
Skip the first line, like this:
df | awk 'NR>1 {print $1 | "sort -r -k3 -n"}'
I normally use one of these options, if I have no reason to use awk:
df | sed 1d
The 1d option to sed says delete the first line, then print everything else.
df | tail -n+2
the -n+2 option to tail say start looking at line 2 and print everything until End-of-Input.
I suspect sed is faster than awk or tail, but I can't prove it.
EDIT
If you want to use awk, this will print every line except the first:
df | awk '{if (FNR>1) print}'
FNR is the File Record Number. It is the line number of the input. If it is greater than 1, print the input line.
Count the lines from the output of df with wc and then substract one line to output a headerless df with tail ...
LINES=$(df|wc -l)
LINES=$((${LINES}-1))
df | tail -n ${LINES}
OK - I see oneliner - Here is mine ...
DF_HEADERLESS=$(LINES=$(df|wc -l); LINES=$((${LINES}-1));df | tail -n ${LINES})
And for formated output lets printf loop over it...
printf "%s\t%s\t%s\t%s\t%s\t%s\n" ${DF_HEADERLESS} | awk '{print $1 | "sort -r -k3 -n"}'
This might help with GNU df and GNU sort:
df -P | awk 'NR>1{$1=$1; print}' | sort -r -k3 -n | awk '{print $1}'
With GNU df and GNU awk:
df -P | awk 'NR>1{array[$3]=$1} END{PROCINFO["sorted_in"]="#ind_num_desc"; for(i in array){print array[i]}}'
Documentation: 8.1.6 Using Predefined Array Scanning Orders with gawk
Removing something from a command output can be done very simply, using grep -v, so in your case:
df | grep -v "Filesystem" | ...
(You can do your awk at the ...)
When you're not sure about caps, small caps, you might add -i:
df | grep -i -v "FiLeSyStEm" | ...
(The switching caps/small caps are meant as a clarification joke :-) )

Oneline file-monitoring

I have a logfile continously filling with stuff.
I wish to monitor this file, grep for a specific line and then extract and use parts of that line in a curl command.
I had a look at How to grep and execute a command (for every match)
This would work in a script but I wonder if it is possible to achieve this with the oneliner below using xargs or something else?
Example:
Tue May 01|23:59:11.012|I|22|Event to process : [imsi=242010800195809, eventId = 242010800195809112112, msisdn=4798818181, inbound=false, homeMCC=242, homeMNC=01, visitedMCC=238, visitedMNC=01, timestamp=Tue May 12 11:21:12 CEST 2015,hlr=null,vlr=4540150021, msc=4540150021 eventtype=S, currentMCC=null, currentMNC=null teleSvcInfo=null camelPhases=null serviceKey=null gprsenabled= false APNlist: null SGSN: null]|com.uws.wsms2.EventProcessor|processEvent|139
Extract the fields I want and semi-colon separate them:
tail -f file.log | grep "Event to process" | awk -F'=' '{print $2";"$4";"$12}' | tr -cd '[[:digit:].\n.;]'
Curl command, e.g. something like:
http://user:pass#www.some-url.com/services/myservice?msisdn=...&imsi=...&vlr=...
Thanks!
Try this:
tail -f file.log | grep "Event to process" | awk -F'=' '{print $2" "$4" "$12; }' | tr -cd '[[:digit:].\n. ]' |while read msisdn imsi vlr ; do curl "http://user:pass#www.some-url.com/services/myservice?msisdn=$msisdn&imsi=$imsi&vlr=$vlr" ; done

Create name/value pairs based on file output

I'd like to format the output of cat myFile.txt in the form of:
app1=19
app2=7
app3=20
app4=19
Using some combination of piping output through various commands.
What would be easiest way to achieve this?
I've tried using cut -f2 but this does not change the output, which is odd.
Here is the basic command/file output:
[user#hostname ~]$ cat myFile.txt
1402483560882 app1 19
1402483560882 app2 7
1402483560882 app3 20
1402483560882 app4 19
Basing from your input:
awk '{ print $2 "=" $3 }' myFile
Output
app1=19
app2=7
app3=20
app4=19
Another solution, using sed and cut:
cat myFile.txt | sed 's/ \+/=/gp' | cut -f 3- -d '='
Or using tr and cut:
cat myFile.txt | tr -s ' ' '=' | cut -f 3- -d '='
You could try this sed oneliner also,
$ sed 's/^\s*[^ ]*\s\([^ ]*\)\s*\(.*\)$/\1=\2/g' file
app1=19
app2=7
app3=20
app4=19

bash (grep|awk|sed) - Extract domains from a file

I need to extract domains from a file.
domains.txt:
eofjoejfej fjpejfe http://ejej.dm1.com dêkkde
ojdoed www.dm2.fr doejd eojd oedj eojdeo
http://dm3.org ieodhjied oejd oejdeo jd
ozjpdj eojdoê jdeojde jdejkd http://dm4.nu/
io d oed 234585 http://jehrhr.dm5.net/hjrehr
[2014-05-31 04:05] eohjpeo jdpiehd pe dpeoe www.dm6.uk/jehr
I need to get:
dm1.com
dm2.fr
dm3.org
dm4.nu
dm5.net
dm6.co.uk
Try this sed command,
$ sed -r 's/.*(dm[^\.]*\.[^/ ]*).*/\1/g' file
dm1.com
dm2.fr
dm3.org
dm4.nu
dm5.net
dm6.uk
This is a bit long, but should work:
grep -oE "http[^ ]*|www[^ ]*" file | sed -e 's|http://||g' -e 's/^www\.//g' -e 's|/.*$||g' -re 's/^.*\.([^\.]+\.[^\.]+$)/\1/g'
Output:
dm1.com
dm2.fr
dm3.org
dm4.nu
dm5.net
dm6.uk
Unrefined method using grep and sed:
grep -oE '[[:alnum:]]+[.][[:alnum:]_.-]+' file | sed 's/www.//'
Outputs:
ejej.dm1.com
dm2.fr
dm3.org
dm4.nu
jehrhr.dm5.net
dm6.uk
An answer with gawk:
LC_ALL=C gawk -d -v RS="[[:space:]]+" -v FS="." '
{
# Remove the http prefix if it exists
sub( /http:[/][/]/, "" )
# Remove the path
sub( /[/].*$/, "" )
# Does it look like a domain?
if ( /^([[:alnum:]]+[.])+[[:alnum:]]+$/ ) {
# Print the last 2 components of the domain name
print $(NF-1) "." $NF
}
}' file
Some notes:
Using RS="[[:space:]]" allow us to process each group of letter independently.
LC_ALL=C forces [[:alnum:]] to be ASCII-only (this is not necessary any more with gawk 4+).
To be able to remove subdomains you have to validate them first, because if you cut the columns it would affect the TLDs. Then you have to do 3 steps.
Step 1: clean domains.txt
grep -oiE '([a-zA-Z0-9][a-zA-Z0-9-]{1,61}\.){1,}(\.?[a-zA-Z]{2,}){1,}' domains.txt | sed -r 's:(^\.*?(www|ftp|ftps|ftpes|sftp|pop|pop3|smtp|imap|http|https)[^.]*?\.|^\.\.?)::gi' | sort -u > capture
Content capture
ejej.dm1.com
dm2.fr
dm3.org
dm4.nu
jehrhr.dm5.net
dm6.uk
Step 2: download and filter TLD list:
wget https://raw.githubusercontent.com/publicsuffix/list/master/public_suffix_list.dat
grep -v "//" public_suffix_list.dat | sed '/^$/d; /#/d' | grep -v -P "[^a-z0-9_.-]" | sed 's/^\.//' | awk '{print "." $1}' | sort -u > tlds.txt
So far you have two lists (capture and tlds.txt)
Step 3: Download and run this python script:
wget https://raw.githubusercontent.com/maravento/blackweb/master/bwupdate/tools/parse_domain_tld.py && chmod +x parse_domain_tld.py && python parse_domain_tld.py | sort -u
out:
dm1.com
dm2.fr
dm3.org
dm4.nu
dm5.net
dm6.uk
Source: blackweb
This can be useful:
grep -Pho "(?<=http://)[^(\"|'|[:space:])]*" file.txt | sed 's/www.//g' | grep -Eo '[[:alnum:]]{1,}\.[[:alnum:]]{1,}[.]{0,1}[[:alnum:]]{0,}' | sort | uniq
First grep get 'http://www.example.com' enclosed in single or double quotes, but extract only domain. Second, using 'sed' I remove 'www.', third one extract domain names separated by '.' and in block of two or three alfnumeric characters. At the end, output is ordered to display only single instances of each domain

bash more effective use of sed / awk

I am looking for a more efficient to use the following string to get the desired result as a one liner
date -d #1381219358 | sed 's/\ \ /\ /g' | sed 's/[:\ ]/-/g' | sed 's/2013/13/' | awk -F '-' '{print $4"-"$5"-"$6"-"$2"-"$3"-"$8}'
The desired result output is as follows:
04-02-38-Oct-8-13
Any help would be appreciated
You can format the output directly, like this:
date -d #1381219358 +"%H-%M-%S-%b-%d-%y"
10-02-38-Oct-08-13
I'm not sure if you need sed or awk for this. You can format the output using date.
Try saying:
date -d #1381219358 +%H-%M-%S-%b-%d-%y

Resources