Script to save bash output to YAML file

I can output the relevant details of my printer using the command line:
sed -En 's/[ \t]*b?(id[vp][^ \t]*|endpoint)(address)?[ \t]+([^ \t]*).* (out|in)?.*/\l\1\4 (\3)/Ip' <(lsusb | awk '$0 ~ /STMicroelectronics printer-80/{print $6}' | xargs -I % sh -c "lsusb -vvv -d %")
outputs:
idVendor (0x0483)
idProduct (0x5743)
endpointOUT (0x01)
endpointIN (0x81)
Now, I need to save that data to a YAML file in the following format:
printer:
  type: Usb
  idVendor: 0x0483
  idProduct: 0x5743
  in_ep: 0x81
  out_ep: 0x01
I just don’t know how to achieve this formatting and save it to the file.
I’ve tried formatting the output, but couldn’t get further than this snippet.

This reformats each line:
sed -E 's/(.*) \((.*)\)/  \1: \2/g' <<< "idVendor (0x0483)"
result:
  idVendor: 0x0483
Combine this with your previous command and prepend the printer: line to get the desired output:
echo "printer:" > file.yaml
sed -En 's/[ \t]*b?(id[vp][^ \t]*|endpoint)(address)?[ \t]+([^ \t]*).* (out|in)?.*/\l\1\4 (\3)/Ip' \
  <(lsusb | awk '$0 ~ /STMicroelectronics printer-80/{print $6}' | \
    xargs -I % sh -c "lsusb -vvv -d %" \
  ) | sed -E 's/(.*) \((.*)\)/  \1: \2/g' >> file.yaml
It is unclear how you want to derive the line type: Usb; you can of course just write it into the file the same way as printer:. As for the different key names, simply append sed -e 's/endpointOUT/out_ep/' -e 's/endpointIN/in_ep/'.
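Putting it all together, a minimal sketch (the file name is illustrative, and the two fixed header lines are written verbatim since they cannot be derived from lsusb):
#!/bin/bash
# Write the fixed header lines first.
printf 'printer:\n  type: Usb\n' > file.yaml
# Extract the IDs and endpoints, indent them as YAML, and rename the endpoint keys.
sed -En 's/[ \t]*b?(id[vp][^ \t]*|endpoint)(address)?[ \t]+([^ \t]*).* (out|in)?.*/\l\1\4 (\3)/Ip' \
  <(lsusb | awk '$0 ~ /STMicroelectronics printer-80/{print $6}' | \
    xargs -I % sh -c "lsusb -vvv -d %") \
  | sed -E 's/(.*) \((.*)\)/  \1: \2/g' \
  | sed -e 's/endpointOUT/out_ep/' -e 's/endpointIN/in_ep/' >> file.yaml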

Remove everything before a string with bash?

I'm doing this with ffmpeg:
ffmpeg -i /Users/petaire/GDrive/Taff/ASI/Bash/testFolder/SilenceAndBlack.mp4 -af silencedetect=d=2 -f null - 2>&1 | grep silence_duration
And my output is:
[silencedetect @ 0x7f9e6940eba0] silence_end: 25.92 | silence_duration: 25.936
But I only want to keep the duration number, so I'm trying to remove everything before the last number.
I've never understood anything about sed/awk & co, so I don't know the best way to do that. I thought grep would be powerful enough, but it doesn't seem so.
Any idea?
Using awk to print the last field:
$ awk '{print $NF}'
Test it:
$ echo "[silencedetect @ 0x7f9e6940eba0] silence_end: 25.92 | silence_duration: 25.936" | awk '{print $NF}'
25.936
or use sed to replace everything up to the last space with nothing:
$ ... | sed 's/.* //'
You can change your grep command to
grep -oP '(?<=silence_duration: )\S+'
which will print the field that follows the matched string.
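Test it against the same sample line:
$ echo "[silencedetect @ 0x7f9e6940eba0] silence_end: 25.92 | silence_duration: 25.936" | grep -oP '(?<=silence_duration: )\S+'
25.936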
To remove everything before the last number, you can use
grep -o "[^ ]*$"
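For example:
$ echo "[silencedetect @ 0x7f9e6940eba0] silence_end: 25.92 | silence_duration: 25.936" | grep -o "[^ ]*$"
25.936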
Another option, grep -o with cut:
$ echo '[silencedetect @ 0x7f9e6940eba0] silence_end: 25.92 | silence_duration: 25.936' \
| grep -o 'silence_duration: [0-9]*\.[0-9]*' | cut -d ' ' -f 2
25.936

Replace each } with a }\n in a huge (12GB) file which consists of 1 line?

I have a log file (from a customer). 18 Gigs. All contents of the file are in 1 line.
I want to read the file in logstash, but I run into memory problems: the file is read line by line, and unfortunately it is all on one line.
I tried to split the file into lines so that logstash can process it (the file has a simple JSON format, no nested objects). I wanted to have each JSON object on one line, splitting at } by replacing it with }\n:
sed -i 's/}/}\n/g' NonPROD.log.backup
But sed is killed, I assume also because of memory. How can I resolve this? Can I let sed manipulate the file using chunks other than lines? I know that by default sed reads line by line.
The following uses only functionality built into the shell:
#!/bin/bash
# as long as there exists another } in the file, read up to it...
while IFS= read -r -d '}' piece; do
  # ...and print that content followed by '}' and a newline.
  printf '%s}\n' "$piece"
done
# print any trailing content after the last }
[[ $piece ]] && printf '%s\n' "$piece"
If you have logstash configured to read from a TCP port (using 14321 as an arbitrary example below), you can run the script (saved, say, as thescript) with thescript <NonPROD.log.backup >"/dev/tcp/127.0.0.1/14321" or similar, and there you are, without needing to have double your original input file's space available on disk, as the other answers given thus far require.
With GNU awk for RT:
$ printf 'abc}def}ghi\n' | awk -v RS='}' '{ORS=(RT?"}\n":"")}1'
abc}
def}
ghi
with other awks:
$ printf 'abc}def}ghi\n' | awk -v RS='}' -v ORS='}\n' 'NR>1{print p} {p=$0} END{printf "%s",p}'
abc}
def}
ghi
I decided to test all of the currently posted solutions for functionality and execution time using an input file generated by this command:
awk 'BEGIN{for(i=1;i<=1000000;i++)printf "foo}"; print "foo"}' > file1m
and here's what I got:
1) awk (both awk scripts above had similar results):
time awk -v RS='}' '{ORS=(RT?"}\n":"")}1' file1m
Got expected output, timing =
real 0m0.608s
user 0m0.561s
sys 0m0.045s
2) shell loop:
$ cat tst.sh
#!/bin/bash
# as long as there exists another } in the file, read up to it...
while IFS= read -r -d '}' piece; do
  # ...and print that content followed by '}' and a newline.
  printf '%s}\n' "$piece"
done
# print any trailing content after the last }
[[ $piece ]] && printf '%s\n' "$piece"
$ time ./tst.sh < file1m
Got expected output, timing =
real 1m52.152s
user 1m18.233s
sys 0m32.604s
3) tr+sed:
$ time tr '}' '\n' < file1m | sed 's/$/}/'
Did not produce the expected output (Added an undesirable } at the end of the file), timing =
real 0m0.577s
user 0m0.468s
sys 0m0.078s
With a tweak to remove that final undesirable }:
$ time tr '}' '\n' < file1m | sed 's/$/}/; $s/}//'
real 0m0.718s
user 0m0.670s
sys 0m0.108s
4) fold+sed+tr:
$ time fold -w 1000 file1m | sed 's/}/}\n\n/g' | tr -s '\n'
Got expected output, timing =
real 0m0.811s
user 0m1.137s
sys 0m0.076s
5) split+sed+cat:
$ cat tst2.sh
mkdir tmp$$
pwd="$(pwd)"
cd "tmp$$"
split -b 1m "${pwd}/${1}"
sed -i 's/}/}\n/g' x*
cat x*
rm -f x*
cd "$pwd"
rmdir tmp$$
$ time ./tst2.sh file1m
Got expected output, timing =
real 0m0.983s
user 0m0.685s
sys 0m0.167s
You can run it through tr, then put the end bracket back on at the end of each line:
$ cat NonPROD.log.backup | tr '}' '\n' | sed 's/$/}/' > tmp$$
$ wc -l NonPROD.log.backup tmp$$
0 NonPROD.log.backup
43 tmp10528
43 total
(My test file only had 43 brackets.)
You could:
- Split the file into, say, 1 MB chunks using split -b 1m file.log
- Process all the chunk files with sed 's/}/}\n/g' x*
- ...and redirect the output of sed to combine them back into a single piece
The drawback of this is the doubled storage space.
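A minimal sketch of those steps combined (file names are illustrative; split writes chunks named xaa, xab, ... by default):
mkdir tmp$$ && cd tmp$$
split -b 1m ../file.log                   # 1 MB chunks named xaa, xab, ...
sed 's/}/}\n/g' x* > ../file_split.log    # process every chunk and recombine
cd .. && rm -r tmp$$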
Another alternative with fold:
$ fold -w 1000 long_line_file | sed 's/}/}\n\n/g' | tr -s '\n'
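Using the same test input as the awk examples above:
$ printf 'abc}def}ghi\n' | fold -w 1000 | sed 's/}/}\n\n/g' | tr -s '\n'
abc}
def}
ghi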

Create name/value pairs based on file output

I'd like to format the output of cat myFile.txt in the form of:
app1=19
app2=7
app3=20
app4=19
Using some combination of piping output through various commands.
What would be the easiest way to achieve this?
I've tried using cut -f2 but this does not change the output, which is odd.
Here is the basic command/file output:
[user@hostname ~]$ cat myFile.txt
1402483560882 app1 19
1402483560882 app2 7
1402483560882 app3 20
1402483560882 app4 19
Based on your input:
awk '{ print $2 "=" $3 }' myFile
Output
app1=19
app2=7
app3=20
app4=19
Another solution, using sed and cut:
cat myFile.txt | sed 's/ \+/=/g' | cut -f 2- -d '='
Or using tr and cut:
cat myFile.txt | tr -s ' ' '=' | cut -f 2- -d '='
You could try this sed oneliner also,
$ sed 's/^\s*[^ ]*\s\([^ ]*\)\s*\(.*\)$/\1=\2/g' file
app1=19
app2=7
app3=20
app4=19
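As a side note, the reason cut -f2 alone did not change the output is that cut splits on tabs by default, and this file is space-delimited, so each line is one single field. Giving cut the right delimiter works directly (GNU cut; --output-delimiter sets the separator used in the result):
$ cut -d ' ' -f 2,3 --output-delimiter='=' myFile.txt
app1=19
app2=7
app3=20
app4=19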

bash (grep|awk|sed) - Extract domains from a file

I need to extract domains from a file.
domains.txt:
eofjoejfej fjpejfe http://ejej.dm1.com dêkkde
ojdoed www.dm2.fr doejd eojd oedj eojdeo
http://dm3.org ieodhjied oejd oejdeo jd
ozjpdj eojdoê jdeojde jdejkd http://dm4.nu/
io d oed 234585 http://jehrhr.dm5.net/hjrehr
[2014-05-31 04:05] eohjpeo jdpiehd pe dpeoe www.dm6.uk/jehr
I need to get:
dm1.com
dm2.fr
dm3.org
dm4.nu
dm5.net
dm6.uk
Try this sed command,
$ sed -r 's/.*(dm[^\.]*\.[^/ ]*).*/\1/g' file
dm1.com
dm2.fr
dm3.org
dm4.nu
dm5.net
dm6.uk
This is a bit long, but should work:
grep -oE "http[^ ]*|www[^ ]*" file | sed -e 's|http://||g' -e 's/^www\.//g' -e 's|/.*$||g' -re 's/^.*\.([^\.]+\.[^\.]+$)/\1/g'
Output:
dm1.com
dm2.fr
dm3.org
dm4.nu
dm5.net
dm6.uk
Unrefined method using grep and sed:
grep -oE '[[:alnum:]]+[.][[:alnum:]_.-]+' file | sed 's/www.//'
Outputs:
ejej.dm1.com
dm2.fr
dm3.org
dm4.nu
jehrhr.dm5.net
dm6.uk
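If you also want to strip the remaining subdomains, a rough sketch is to keep only the last two dot-separated components with awk (note this would mangle multi-part suffixes such as .co.uk):
grep -oE '[[:alnum:]]+[.][[:alnum:]_.-]+' file | sed 's/www.//' | awk -F. '{print $(NF-1)"."$NF}'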
An answer with gawk:
LC_ALL=C gawk -v RS="[[:space:]]+" -v FS="." '
{
  # Remove the http prefix if it exists
  sub( /http:[/][/]/, "" )
  # Remove the path
  sub( /[/].*$/, "" )
  # Does it look like a domain?
  if ( /^([[:alnum:]]+[.])+[[:alnum:]]+$/ ) {
    # Print the last 2 components of the domain name
    print $(NF-1) "." $NF
  }
}' file
Some notes:
Using RS="[[:space:]]+" allows us to process each whitespace-separated token independently.
LC_ALL=C forces [[:alnum:]] to be ASCII-only (this is no longer necessary with gawk 4+).
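Run against domains.txt above, it should print:
dm1.com
dm2.fr
dm3.org
dm4.nu
dm5.net
dm6.uk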
To be able to remove subdomains, you have to validate them first, because cutting columns blindly would mangle multi-part TLDs. It takes three steps.
Step 1: clean domains.txt
grep -oiE '([a-zA-Z0-9][a-zA-Z0-9-]{1,61}\.){1,}(\.?[a-zA-Z]{2,}){1,}' domains.txt | sed -r 's:(^\.*?(www|ftp|ftps|ftpes|sftp|pop|pop3|smtp|imap|http|https)[^.]*?\.|^\.\.?)::gi' | sort -u > capture
Content capture
ejej.dm1.com
dm2.fr
dm3.org
dm4.nu
jehrhr.dm5.net
dm6.uk
Step 2: Download and filter the TLD list:
wget https://raw.githubusercontent.com/publicsuffix/list/master/public_suffix_list.dat
grep -v "//" public_suffix_list.dat | sed '/^$/d; /#/d' | grep -v -P "[^a-z0-9_.-]" | sed 's/^\.//' | awk '{print "." $1}' | sort -u > tlds.txt
So far you have two lists (capture and tlds.txt)
Step 3: Download and run this python script:
wget https://raw.githubusercontent.com/maravento/blackweb/master/bwupdate/tools/parse_domain_tld.py && chmod +x parse_domain_tld.py && python parse_domain_tld.py | sort -u
out:
dm1.com
dm2.fr
dm3.org
dm4.nu
dm5.net
dm6.uk
Source: blackweb
This can be useful:
grep -Pho "(?<=http://)[^(\"|'|[:space:])]*" file.txt | sed 's/www.//g' | grep -Eo '[[:alnum:]]{1,}\.[[:alnum:]]{1,}[.]{0,1}[[:alnum:]]{0,}' | sort | uniq
The first grep matches URLs like 'http://www.example.com' (whether or not they are enclosed in single or double quotes) and extracts only the domain part. Second, sed removes 'www.'. The third command extracts domain names consisting of two or three dot-separated blocks of alphanumeric characters. Finally, the output is sorted so that only a single instance of each domain is displayed.

sed move text in .txt to next line

I am trying to parse out a text file that looks like the following:
EMPIRE,STATE,BLDG,CO,494202320000008,336,5,AVE,ENT,NEW,YORK,NY,10003,N,3/1/2012,TensionCode,VariableICAP,PFJICAP,Residential,%LBMPZone,L,9,146.0,,,10715.0956,,,--,,0,,,J,TripNumber,ServiceClass,PreviousAccountNumber,MinMonthlyDemand,TODCode,Profile,Tax,Muni,41,39,00000000000000,9952,54,Y,Non-Taxable,--,FromDate,ToDate,Use,Demand,BillAmt,12/29/2011,1/31/2012,4122520,6,936.00,$293,237.54
What I would like to see is the data stacked:
- EMPIRE STATE BLDG CO
- 494202320000008
- 336 5 AVE ENT
- NEW YORK NY
and so on. If anything, after each comma I would want the following text to go to a new line. Ultimately, regarding the end of the line from FromDate onward, I would like to have it in a txt file like
- From Date ToDate use Demand BillAmt
- 12/29/2011 1/31/2012 4122520 6,936.00 $293,237.54.
I am using cygwin on a windows XP machine. Thank you in advance for any assistance.
For getting the last line into a separate file:
echo -e "From Date\tToDate\tuse\tDemand\tBillAmt" > lastlinefile.txt
cat originalfile.txt | sed 's/,FromDate/~Fromdate/' | awk -v FS="~" '{print $2}' | sed 's/Fromdate,ToDate,Use,Demand,BillAmt,//' | sed 's/,/\t/' >> lastlinefile.txt
For the rest:
cat originalfile.txt | sed -r 's/,FromDate[^\n]+//' | sed 's/,/\n/g' | sed -r 's/$/\n\n/' > nocommas.txt
Your mileage may vary as far as the first '\n' is concerned in the second command. If it doesn't work properly, replace it with a space (assuming your data doesn't have spaces).
Or, if you like, a shell script to operate on a file and split it:
#!/bin/bash
if [ -z "$1" ]
then echo "Usage: $0 filename.txt"; exit; fi
echo -e "From Date\tToDate\tuse\tDemand\tBillAmt" > "$1_lastline.txt"
cat "$1" | sed 's/,FromDate/~Fromdate/' | awk -v FS="~" '{print $2}' | sed 's/FromDate,ToDate,use,Demand,BillAmt,//' | sed 's/,/\t/' >> "$1_lastline.txt"
cat "$1" | sed -r 's/,Fromdate[^\n]+//' | sed 's/,/\n/' | sed -r 's/$/\n\n' > "$1_fixed.txt"
Just paste it into a file and run it. It's been years since I used Cygwin... you may have to chmod +x the file first.
I'm providing you two answers depending on how you wanted the file. The previous answer split it into two files, this one keeps it all in one file in the format:
EMPIRE
STATE
BLDG
CO
494202320000008
336
5
AVE
ENT
NEW
YORK
NY
From Date ToDate use Demand BillAmt
12/29/2011 1/31/2012 4122520 6,936.00 $293,237.54.
That's the best I can do with the delimiters you have in place. If you'd left it as something like "EMPIRE STATE BUILDING CO,494202320000008,336 5 AVE ENT,NEW YORK,NY" it'd be a lot easier.
#!/bin/bash
if [ -z "$1" ]
then echo "Usage: $0 filename.txt"; exit; fi
cat "$1" | sed 's/,FromDate/~Fromdate/' | awk -v FS="~" '{gsub(",","\n",$1);print $1;print "FromDate\tToDate\tuse\tDemand\tBillAmt";gsub("FromDate,ToDate,use,Demand,BillAmt","",$2);gsub(",","\t",$2);print $2}' >> "$1_fixed.txt"
Again, just paste it into a file and run it from Cygwin: ./filename.sh
