Running commands on several files - bash

My script (bash shell) extracts some numbers from the file *026.s01:
input=${input_dir}/*026.s01
seq=` cat $input |awk 'NR>3{print substr($2,5,7)}' | sort -n | head -1`
first_sht=` cat $input | awk 'NR>3{print int($3)}' | head -1`
last_sht=` cat $input | awk 'NR>3{print int($3)}' | tail -1`
echo " $seq $first_sht $last_sht" | awk '{printf("%6s%10s%9s\n",$1,$2,$3)}' >> dir_file_SEQ-$seq.txt
How can I do this on multiple files ${input_dir}/*.s01?
I tried to use:
for file in ${input_dir}/*.s01
do
done
echo " $seq $sail_line $src_dir" | awk '{printf("%6s%10s%9s\n",$1,$2,$3)}' >> dir_file_SEQ-$seq.txt
But instead of getting several dir_file_SEQ-???.txt files, I get only one file, called dir_file_SEQ-.txt, with this content:
Date 229
Date 409
Date 589
Date 769
Date 949
Date 1129
I assume "Date" comes from an error, since it is nothing I asked for; the second column has one of the values I wanted, but the others are still missing.

A bash for loop should do it for you in no time:
SRC_DIR=/root
for FILE in ${SRC_DIR}/*.mp4
do
YOUR COMMANDS
done

It works in this way:
for file in /dir./*.s01
do
    input=$file
    # ... rest of the commands from the original script ...
done
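For completeness, a minimal sketch of the whole loop, assuming the extraction commands from the question (the input_dir value is a placeholder):

#!/bin/bash
input_dir=/path/to/data    # placeholder: set to the real input directory
for input in ${input_dir}/*.s01
do
    seq=$(awk 'NR>3{print substr($2,5,7)}' "$input" | sort -n | head -1)
    first_sht=$(awk 'NR>3{print int($3)}' "$input" | head -1)
    last_sht=$(awk 'NR>3{print int($3)}' "$input" | tail -1)
    # one output file per sequence number, as in the original script
    echo " $seq $first_sht $last_sht" | awk '{printf("%6s%10s%9s\n",$1,$2,$3)}' >> dir_file_SEQ-$seq.txt
done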

Related

Unique ending lines for csv with bash

I would like to add a unique link to each line in a csv file, in the following form:
data1,name1,date1
data2,name2,date2
and afterward, it should look like
data1,name1,date1,somedomain.com/test-ZmQwZTdiNzIyZGExYTc1Njg1YjJjMWE2
data2,name2,date2,somedomain.com/test-ZTdmYjY4N2M5MjM0NzcxYjJjNGE0N2I5
I was thinking of generating the unique strings with:
date +%s | sha256sum | base64 | head -c 32 ; echo
I found approaches for parts of it, but I am not sure how to put it all together.
You can use awk with the built-in getline command to call an external command and append the result to the end of each line.
Assuming your date is in the last field, $NF:
awk -F "," '{
    cmd = "date -d "$NF" +%s | sha256sum | base64 | head -c 32"
    cmd | getline hash
    print $0 FS hash
    close(cmd)
}' file.csv
Input
data1,name1,2017-11-01
data2,name2,2017-11-02
Output
data1,name1,2017-11-01,YTRiYWNmYmExMmM0NjJhYjAzNzU4ZGIx
data2,name2,2017-11-02,MTBjYjNlZTc5ZmNlMTU2NWFiY2Q2NmJk
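If the somedomain.com/test- prefix from the question is also wanted, it can simply be concatenated in front of the hash when printing; a minimal variation of the same approach:

awk -F "," '{
    cmd = "date -d "$NF" +%s | sha256sum | base64 | head -c 32"
    cmd | getline hash
    close(cmd)
    # prepend the URL prefix from the question to each generated hash
    print $0 FS "somedomain.com/test-" hash
}' file.csv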

Oneline file-monitoring

I have a logfile continuously filling with stuff.
I wish to monitor this file, grep for a specific line and then extract and use parts of that line in a curl command.
I had a look at How to grep and execute a command (for every match)
This would work in a script, but I wonder if it is possible to achieve this with the one-liner below using xargs or something else?
Example:
Tue May 01|23:59:11.012|I|22|Event to process : [imsi=242010800195809, eventId = 242010800195809112112, msisdn=4798818181, inbound=false, homeMCC=242, homeMNC=01, visitedMCC=238, visitedMNC=01, timestamp=Tue May 12 11:21:12 CEST 2015,hlr=null,vlr=4540150021, msc=4540150021 eventtype=S, currentMCC=null, currentMNC=null teleSvcInfo=null camelPhases=null serviceKey=null gprsenabled= false APNlist: null SGSN: null]|com.uws.wsms2.EventProcessor|processEvent|139
Extract the fields I want and semi-colon separate them:
tail -f file.log | grep "Event to process" | awk -F'=' '{print $2";"$4";"$12}' | tr -cd '[[:digit:].\n.;]'
Curl command, e.g. something like:
http://user:pass@www.some-url.com/services/myservice?msisdn=...&imsi=...&vlr=...
Thanks!
Try this:
tail -f file.log | grep "Event to process" \
    | awk -F'=' '{print $2" "$4" "$12}' \
    | tr -cd '[[:digit:].\n. ]' \
    | while read msisdn imsi vlr
      do
          curl "http://user:pass@www.some-url.com/services/myservice?msisdn=$msisdn&imsi=$imsi&vlr=$vlr"
      done
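One caveat, as an aside: when grep writes to a pipe rather than a terminal it may block-buffer its output, so matches from tail -f can show up late. Assuming GNU grep, gawk, and coreutils are available, --line-buffered, awk's fflush(), and stdbuf are the usual remedies; a sketch of the same pipeline under those assumptions:

tail -f file.log | grep --line-buffered "Event to process" \
    | awk -F'=' '{print $2" "$4" "$12; fflush()}' \
    | stdbuf -o0 tr -cd '[[:digit:].\n. ]' \
    | while read msisdn imsi vlr
      do
          curl "http://user:pass@www.some-url.com/services/myservice?msisdn=$msisdn&imsi=$imsi&vlr=$vlr"
      done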

Error calling system() within awk

I'm trying to execute a system command to find out how many unique references a csv file has in its first seven characters as part of a larger awk script that processes the same csv file. There are duplicate entries and I don't want awk to parse the whole file twice so I'm avoiding NR. The gist of this part of the script is:
#!/bin/bash
awk '
{
#do some stuff, then when finished, count the number of unique references
productFile="BusinessObjects.csv";
systemCall = sprintf( "cat %s | cut -c 1-7 | sort | uniq | wc -l", $productFile );
productCount=`system( systemCall )`-1; #subtract 1 to remove column label row
}' < BusinessObjects.csv
And the interpreter doesn't like it:
awk: cmd. line:19: ^ syntax error
./awkscript.sh: line 38: syntax error near unexpected token '('
./awkscript.sh: line 38: systemCall = sprintf( "cat %s | cut -c 1-7 | sort | uniq | wc -l", $productFile );
If I hard-code the system command
productCount=`system( "cat BusinessObjects.csv | cut -c 1-7 | sort | uniq | wc -l" )`-1;
I get:
./awkscript.sh: command substitution: line 39: syntax error near unexpected token '"cat BusinessObjects.csv | cut -c 1-7 | sort | uniq | wc -l"'
./awkscript.sh: command substitution: line 39: 'system( "cat BusinessObjects.csv | cut -c 1-7 | sort | uniq | wc -l" )'
Technically, I could do this outside of awk at the start of the shell script, store the result in a system variable, and then pass it to awk using -v, but it's not great for the readability of the awk script (it's a few hundred lines long). Do I have a space or quotes in the wrong place? I've tried fiddling, but I can't seem to present the call to system() in a way that the interpreter will accept. Finally, is there a more sensible way to do this?
Edit: the csv file is indeed semicolon-delimited, so it's best to cut using the delimiter rather than the number of chars (thanks!).
ProductRef;Data1;Data2;etc
1234567;etc;etc;etc
Edit 2:
I'm trying to parse a csv file whose first column is full of N unique product references, and create a series of associated HTML pages that include a "Page n of N" information field. It's (painfully obviously) the first time I've used awk, but it seemed like an appropriate tool for parsing csv files. Hence I'm trying to count and return the number of unique references. At the shell,
cut -d\; -f1 BusinessObjects.csv | sort | uniq | wc -l
works fine, but I can't get it working inside awk by doing
#!/bin/bash
if [ -n "$1" ]
then
productFile=$1
else
echo "Missing product file argument."
exit
fi
awk -v productFile=$productFile '
BEGIN {
FS=";";
productCount = 0;
("cut -d\"\;\" -f1 " productFile " | sort | uniq | wc -l") | getline productCount;
productCount -=1; #remove the column label row
}
{
print productCount;
}'
I get a syntax error on the cut code if I don't wrap the semicolon in \"\;\" and the script just hangs without printing anything when I do.
I don't think you can use backticks in awk.
productCount=`system( systemCall )`-1; #subtract 1 to remove column label row
You can read the output by running your command directly and using getline instead of system:
systemCall | getline productCount
productCount -= 1
Or more completely
productFile = "BusinessObjects.csv"
systemCall = "cut -c 1-7 " productFile " | sort | uniq | wc -l"
systemCall | getline productCount
productCount -= 1
There is no need to use sprintf or to include cat.
Assigning the command string to a variable is also optional; you can just use "xyz" | getline ....
sort | uniq can be shortened to sort -u if supported.
Quoting may be necessary if the filename has spaces or characters that could confuse the command.
getline may alter global variables differently from expected. See https://www.gnu.org/software/gawk/manual/html_node/Getline.html.
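Putting the pieces together, a minimal sketch of the counting step, assuming the semicolon-delimited file from the edit. Note that the semicolon only needs the escaped double quotes (not a backslash), and that close() releases the pipe:

awk -v productFile="BusinessObjects.csv" '
BEGIN {
    FS = ";"
    # run the pipeline once and read its single line of output
    cmd = "cut -d\";\" -f1 " productFile " | sort -u | wc -l"
    cmd | getline productCount
    close(cmd)
    productCount -= 1    # subtract 1 for the column label row
    print productCount
}'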
Could something like this be an option?
$ cat productCount.sh
#!/bin/bash
if [ -n "$1" ]
then
productCount=`cat $1 | cut -c 1-7 | sort | uniq | wc -l`
echo $productCount
else
echo "please supply a filename as parameter"
fi
$ ./productCount.sh BusinessObjects.csv
9

awk and md5: replace a column

Starting from Awk replace a column with its hash value, I tried to hash (md5) a list of numbers:
$ cat -n file
1 40755462755
2 40751685373
3 40730094339
4 40722740446
5 40722740446
6 40743802204
7 40730094339
8 40745188886
9 40740593352
10 40745561530
If I run:
cat file | awk '{cmd="echo -n " $1 " | md5sum|cut -d\" \" -f1"; cmd|getline md5; $1=md5;print;}' | cat -n
1 29ece26ce4633b6e9480255db194cc40
2 120148eca0891d0fc645413d0f26b66b
3 cafc48d392a004f75b669f9d1d7bf894
4 7b4367e8f58835c0827dd6a2f61b7258
5 7b4367e8f58835c0827dd6a2f61b7258
6 49b12d1f3305ab93b33b330e8b1d3165
7 49b12d1f3305ab93b33b330e8b1d3165
8 bee44c89ac9d4e8e4e1f1c5c63088c71
9 f07262ac8f53755232c5abbf062364d0
10 2ac7c22170c00a3527eb99a2bfde2c2c
I don't know why line 7 gets the same md5 as line 6, because if I run them separately they are different:
$ echo -n 40743802204 | md5sum|cut -d" " -f1
49b12d1f3305ab93b33b330e8b1d3165
$ echo -n 40730094339 | md5sum|cut -d" " -f1
cafc48d392a004f75b669f9d1d7bf894
I tried some prints:
cat file| awk '{print $0,NF,NR;cmd="echo -n " $1 " | md5sum|cut -d\" \" -f1"; cmd|getline md5; $1=md5"---"cmd"---"$1;print;}' | cat -n
but with no success in finding what's going wrong.
EDIT: As the title says, I am trying to replace a column in a file (a file with hundreds of fields). So $1 would really be $24, and NF would be 120 for one file and 233 for another.
I wouldn't use getline in awk like that. You can do:
while read -r num; do
echo -n $num | md5sum | cut -d ' ' -f1;
done < file
29ece26ce4633b6e9480255db194cc40
120148eca0891d0fc645413d0f26b66b
cafc48d392a004f75b669f9d1d7bf894
7b4367e8f58835c0827dd6a2f61b7258
7b4367e8f58835c0827dd6a2f61b7258
49b12d1f3305ab93b33b330e8b1d3165
cafc48d392a004f75b669f9d1d7bf894
bee44c89ac9d4e8e4e1f1c5c63088c71
f07262ac8f53755232c5abbf062364d0
2ac7c22170c00a3527eb99a2bfde2c2c
Ok, I found the issue: the pipes in awk should be closed.
So I needed a close(cmd);
I found the solution here.
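For reference, this is the one-liner from the question as a sketch, unchanged except for the close(cmd) call:

cat file | awk '{
    cmd = "echo -n " $1 " | md5sum | cut -d\" \" -f1"
    cmd | getline md5
    close(cmd)    # without this, a repeated value reuses an exhausted pipe and getline leaves md5 unchanged
    $1 = md5
    print
}' | cat -n

This also explains the symptom: the value on line 7 already appeared on line 3, so its pipe had been read to EOF; getline returned 0 and md5 kept the value from line 6.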
I would GUESS, but can't tell since you aren't testing its return code, that it's because your getline is failing at line 7, so md5 has the same value it did for the previous line. Use of getline is fraught with caveats and not for use by beginners; see http://awk.info/?tip/getline.
What value are you getting out of using awk for this anyway as opposed to just staying in shell?
It's a bit awkward with all the quoting; I'm not sure why it would fail, to be honest. But here's something that uses less awk and works just fine:
while read num; do echo -n $num | md5sum | cut -f1 -d' '; done < tmp | cat -n

sed move text in .txt to next line

I am trying to parse out a text file that looks like the following:
EMPIRE,STATE,BLDG,CO,494202320000008,336,5,AVE,ENT,NEW,YORK,NY,10003,N,3/1/2012,TensionCode,VariableICAP,PFJICAP,Residential,%LBMPZone,L,9,146.0,,,10715.0956,,,--,,0,,,J,TripNumber,ServiceClass,PreviousAccountNumber,MinMonthlyDemand,TODCode,Profile,Tax,Muni,41,39,00000000000000,9952,54,Y,Non-Taxable,--,FromDate,ToDate,Use,Demand,BillAmt,12/29/2011,1/31/2012,4122520,6,936.00,$293,237.54
what I would like to see is the data stacked
- EMPIRE STATE BLDG CO
- 494202320000008
- 336 5 AVE ENT
- NEW YORK NY
and so on. If anything, after each comma I would want the following text to go to a new line. Ultimately, for the last part of the line, from FromDate onward, I would like to have it in a txt file like
- From Date ToDate use Demand BillAmt
- 12/29/2011 1/31/2012 4122520 6,936.00 $293,237.54.
I am using Cygwin on a Windows XP machine. Thank you in advance for any assistance.
For getting the last line into a separate file:
echo -e "From Date\tToDate\tuse\tDemand\tBillAmt" > lastlinefile.txt
cat originalfile.txt | sed 's/,FromDate/~FromDate/' | awk -v FS="~" '{print $2}' | sed 's/FromDate,ToDate,Use,Demand,BillAmt,//' | sed 's/,/\t/g' >> lastlinefile.txt
For the rest:
cat originalfile.txt | sed -r 's/,FromDate[^\n]+//' | sed 's/,/\n/g' | sed -r 's/$/\n\n/' > nocommas.txt
Your mileage may vary as far as the first '\n' is concerned in the second command. If it doesn't work properly, replace it with a space (assuming your data doesn't have spaces).
Or, if you like, a shell script to operate on a file and split it:
#!/bin/bash
if [ -z "$1" ]
then echo "Usage: $0 filename.txt"; exit; fi
echo -e "From Date\tToDate\tuse\tDemand\tBillAmt" > "$1_lastline.txt"
cat "$1" | sed 's/,FromDate/~FromDate/' | awk -v FS="~" '{print $2}' | sed 's/FromDate,ToDate,Use,Demand,BillAmt,//' | sed 's/,/\t/g' >> "$1_lastline.txt"
cat "$1" | sed -r 's/,FromDate[^\n]+//' | sed 's/,/\n/g' | sed -r 's/$/\n\n/' > "$1_fixed.txt"
Just paste it into a file and run it. It's been years since I used Cygwin... you may have to chmod +x it first.
I'm providing two answers depending on how you want the file. The previous answer split it into two files; this one keeps it all in one file, in the format:
EMPIRE
STATE
BLDG
CO
494202320000008
336
5
AVE
ENT
NEW
YORK
NY
From Date ToDate use Demand BillAmt
12/29/2011 1/31/2012 4122520 6,936.00 $293,237.54.
That's the best I can do with the delimiters you have set in place. If you'd left it as something like "EMPIRE STATE BUILDING CO,494202320000008,336 5 AVE ENT,NEW YORK,NY" it'd be a lot easier.
#!/bin/bash
if [ -z "$1" ]
then echo "Usage: $0 filename.txt"; exit; fi
cat "$1" | sed 's/,FromDate/~FromDate/' | awk -v FS="~" '{gsub(",","\n",$1);print $1;print "FromDate\tToDate\tUse\tDemand\tBillAmt";gsub("FromDate,ToDate,Use,Demand,BillAmt,","",$2);gsub(",","\t",$2);print $2}' >> "$1_fixed.txt"
Again, just paste it into a file and run it from Cygwin: ./filename.sh
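As an aside, if the goal is only to put each comma-separated field on its own line, tr can do that step by itself; a minimal sketch using the input filename from above (stacked.txt is just an example output name):

tr ',' '\n' < originalfile.txt > stacked.txt    # replace every comma with a newline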
