I'm new to bash shell and I have to do a script with a csv file.
The file is a list of the participants, countries, sports and medals achieved.
when executing the script, I should give as parameters the nationality (column 3) and the sport (column 8). The script should return the amount of participants of that country for that sport, and the amount of medals achieved.
The amount of medals achieved is the sum of the columns "gold" "silver" "bronze" of each row which are columns 9,10 and 11.
I cannot use grep, awk, sed or csvkit.
So far, I have this code but I'm stuck with the medal counting part.
nacionality=$1
sport=$2
columns= cut -d, -f 3,8 athletes.csv
echo columns | tr -cd $nacionality,$sport | wc -c
Could anyone help me?
The file is: https://github.com/flother/rio2016/blob/master/athletes.csv
The name of the file is script2_4.sh
An example of the output is:
./script2_4.sh POL rowing
Participants, Medals
26, 6
A sample of the file:
id,name,nationality,sex,date_of_birth,height,weight,sport,gold,silver,bronze,info
736041664,A Jesus Garcia,ESP,male,1969-10-17,1.72,64,athletics,0,0,0,
532037425,A Lam Shin,KOR,female,1986-09-23,1.68,56,fencing,0,0,0,
435962603,Aaron Brown,CAN,male,1992-05-27,1.98,79,athletics,0,0,1,
521041435,Aaron Cook,MDA,male,1991-01-02,1.83,80,taekwondo,0,0,0,
33922579,Aaron Gate,NZL,male,1990-11-26,1.81,71,cycling,0,0,0,
173071782,Aaron Royle,AUS,male,1990-01-26,1.80,67,triathlon,0,0,0,
266237702,Aaron Russell,USA,male,1993-06-04,2.05,98,volleyball,0,0,1,
382571888,Aaron Younger,AUS,male,1991-09-25,1.93,100,aquatics,0,0,0,
87689776,Aauri Lorena Bokesa,ESP,female,1988-12-14,1.80,62,athletics,0,0,0,
997877719,Ababel Yeshaneh,ETH,female,1991-07-22,1.65,54,athletics,0,0,0,
343694681,Abadi Hadis,ETH,male,1997-11-06,1.70,63,athletics,0,0,0,
591319906,Abbas Abubakar Abbas,BRN,male,1996-05-17,1.75,66,athletics,0,0,0,
258556239,Abbas Qali,IOA,male,1992-10-11,,,aquatics,0,0,0,
376068084,Abbey D'Agostino,USA,female,1992-05-25,1.61,49,athletics,0,0,0,
162792594,Abbey Weitzeil,USA,female,1996-12-03,1.78,68,aquatics,1,1,0,
521036704,Abbie Brown,GBR,female,1996-04-10,1.76,71,rugby sevens,0,0,0,
149397772,Abbos Rakhmonov,UZB,male,1998-07-07,1.61,57,wrestling,0,0,0,
256673338,Abbubaker Mobara,RSA,male,1994-02-18,1.75,64,football,0,0,0,
337369662,Abby Erceg,NZL,female,1989-11-20,1.75,68,football,0,0,0,
334169879,Abd Elhalim Mohamed Abou,EGY,male,1989-06-03,2.10,88,volleyball,0,0,0,
215053268,Abdalaati Iguider,MAR,male,1987-03-25,1.73,57,athletics,0,0,0,
763711985,Abdalelah Haroun,QAT,male,1997-01-01,1.85,80,athletics,0,0,0,
Here is a pure bash implementation. Build a hash from field name to position ($h):
#!/bin/bash
file=athletes.csv
nationality=$1
sport=$2
IFS=, read -a l < "$file"
declare -A h
for pos in "${!l[@]}"
do
h["${l[$pos]}"]=$pos
done
declare -i participants=0
declare -i medals=0
while IFS=, read -a l
do
if [ "${l[${h["nationality"]}]}" = "$nationality" ] &&
[ "${l[${h["sport"]}]}" = "$sport" ]
then
((participants++))
medals=$((
$medals +
"${l[${h["gold"]}]}" +
"${l[${h["silver"]}]}" +
"${l[${h["bronze"]}]}"
))
fi
done < "$file"
echo "Participants, Medals"
echo "$participants, $medals"
and example output with the first 4 lines of input:
$ ./script2_4.sh CAN athletics
Participants, Medals
1, 1
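As a minimal sketch of the header lookup used above, the following builds the same kind of name-to-position map from the header line quoted in the question and prints a few entries. The variable names (header, fields, pos) are illustrative and not part of the original script, and associative arrays assume bash 4 or later.

```shell
#!/bin/bash
# Header line copied from the sample in the question.
header='id,name,nationality,sex,date_of_birth,height,weight,sport,gold,silver,bronze,info'

# Split the header on commas into an indexed array.
IFS=, read -r -a fields <<< "$header"

# Map each column name to its zero-based position.
declare -A pos
for i in "${!fields[@]}"; do
    pos["${fields[i]}"]=$i
done

echo "nationality=${pos[nationality]} sport=${pos[sport]} gold=${pos[gold]}"
```

The positions are zero-based array indexes, so they are one less than the column numbers that cut -f would use.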
I have a file with more than 10K lines of record.
Within each line, there are two date+time info. Below is an example:
"aaa bbb ccc 170915 200801 12;ddd e f; g; hh; 171020 122030 10; ii jj kk;"
I want to filter out the lines where the number of days between these two dates is less than 30.
Below is my source code:
#!/bin/bash
filename="$1"
echo $filename
touch filterfile
totalline=`wc -l $filename | awk '{print $1}'`
i=0
j=0
echo $totalline lines
while read -r line
do
i=$[i+1]
if [ $i -gt $[j+9] ]; then
j=$i
echo $i
fi
shortline=`echo $line | sed 's/.*\([0-9]\{6\}\)[ ][0-9]\{6\}.*\([0-9]\{6\}\)[ ][0-9]\{6\}.*/\1 \2/'`
date1=`echo $shortline | awk '{print $1}'`
date2=`echo $shortline | awk '{print $2}'`
if [ $date1 -gt 700000 ]
then
continue
fi
d1=`date -d $date1 +%s`
d2=`date -d $date2 +%s`
diffday=$[(d2-d1)/(24*3600)]
#diffdays=`date -d $date2 +%s` - `date -d $date1 +%s`)/(24*3600)
if [ $diffday -lt 30 ]
then
echo $line >> filterfile
fi
done < "$filename"
I am running it in Cygwin. It took about 10 seconds to handle 10 lines. I use echo $i to show the progress.
Is it because I am doing something wrong in my script?
This answer does not answer your question but gives an alternative method to your shell script. The answer to your question is given by Sundeep's comment :
Why is using a shell loop to process text considered bad practice?
Furthermore, you should be aware that every time you call sed, awk, date, ... you are asking the system to execute a binary, which has to be loaded into memory first, and so on. echo is a bash builtin, but every pipeline and command substitution around it still forks a new process. Doing this inside a loop is very inefficient.
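As a rough illustration of avoiding those external calls, here is a sketch that pulls the two yymmdd values out of a line using only bash's own regex matching. The function name extract_dates is hypothetical, and the pattern assumes each date is followed by a space and a six-digit time, as in the example line from the question.

```shell
#!/bin/bash
# Print the two yymmdd dates found in a line, using only bash builtins.
extract_dates() {
    local line=$1
    # date, space, time ... date, space, time
    local re='([0-9]{6}) [0-9]{6}.*([0-9]{6}) [0-9]{6}'
    if [[ $line =~ $re ]]; then
        printf '%s %s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
    else
        return 1   # no pair of timestamps found
    fi
}

extract_dates "aaa bbb ccc 170915 200801 12;ddd e f; g; hh; 171020 122030 10; ii jj kk;"
```

Inside a read loop, the two captures can be taken straight from BASH_REMATCH, without a command substitution, so that no process is started per line.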
alternative solution
awk programs are commonly used to process log files containing timestamp information, indicating when a particular log record was written. gawk extended the awk standard with time-handling functions. The one you are interested in is :
mktime(datespec [, utc-flag ]) Turn datespec into a timestamp in the
same form as is returned by systime(). It is similar to the function
of the same name in ISO C. The argument, datespec, is a string of the
form "YYYY MM DD HH MM SS [DST]". The string consists of six or seven
numbers representing, respectively, the full year including century,
the month from 1 to 12, the day of the month from 1 to 31, the hour of
the day from 0 to 23, the minute from 0 to 59, the second from 0 to
60, and an optional daylight-savings flag.
The values of these numbers need not be within the ranges specified;
for example, an hour of -1 means 1 hour before midnight. The
origin-zero Gregorian calendar is assumed, with year 0 preceding year
1 and year -1 preceding year 0. If utc-flag is present and is either
nonzero or non-null, the time is assumed to be in the UTC time zone;
otherwise, the time is assumed to be in the local time zone. If the
DST daylight-savings flag is positive, the time is assumed to be
daylight savings time; if zero, the time is assumed to be standard
time; and if negative (the default), mktime() attempts to determine
whether daylight savings time is in effect for the specified time.
If datespec does not contain enough elements or if the resulting time
is out of range, mktime() returns -1.
As your date format is of the form yymmdd HHMMSS, we need to write a parser function convertTime for it. Be aware that we pass times of the form yymmddHHMMSS to this function. Furthermore, using space-delimited fields, your times are located in fields $4$5 and $11$12. As mktime converts the time to seconds since 1970-01-01, all we need to do is check whether the time difference is smaller than 30*24*3600 seconds.
awk 'function convertTime(t) {
s="20"substr(t,1,2)" "substr(t,3,2)" "substr(t,5,2)" "
s= s substr(t,7,2)" "substr(t,9,2)" "substr(t,11,2)
return mktime(s)
}
{ t1=convertTime($4$5); t2=convertTime($11$12)}
(t2-t1 < 30*3600*24) { print }' <file>
If you are not interested in the real time difference (your sed line removes the actual time of day), then you can adapt it to:
awk 'function convertTime(t) {
s="20"substr(t,1,2)" "substr(t,3,2)" "substr(t,5,2)" "
s= s "00 00 00"
return mktime(s)
}
{ t1=convertTime($4); t2=convertTime($11)}
(t2-t1 < 30*3600*24) { print }' <file>
If the dates are not in the fields, you can use match to find them :
awk 'function convertTime(t) {
s="20"substr(t,1,2)" "substr(t,3,2)" "substr(t,5,2)" "
s= s substr(t,7,2)" "substr(t,9,2)" "substr(t,11,2)
return mktime(s)
}
{ match($0,/[0-9]{6} [0-9]{6}/);
t1=convertTime(substr($0,RSTART,RLENGTH));
a=substr($0,RSTART+RLENGTH)
match(a,/[0-9]{6} [0-9]{6}/)
t2=convertTime(substr(a,RSTART,RLENGTH))}
(t2-t1 < 30*3600*24) { print }' <file>
With some modifications, often without speed in mind, I can reduce the processing time by 50% - which is a lot:
#!/bin/bash
filename="$1"
echo "$filename"
# touch filterfile
totalline=$(wc -l < "$filename")
i=0
j=0
echo "$totalline" lines
while read -r line
do
i=$((i+1))
if (( i > ((j+9)) )); then
j=$i
echo $i
fi
shortline=($(echo "$line" | sed 's/.*\([0-9]\{6\}\)[ ][0-9]\{6\}.*\([0-9]\{6\}\)[ ][0-9]\{6\}.*/\1 \2/'))
date1=${shortline[0]}
date2=${shortline[1]}
if (( date1 > 700000 ))
then
continue
fi
d1=$(date -d "$date1" +%s)
d2=$(date -d "$date2" +%s)
diffday=$(((d2-d1)/(24*3600)))
# diffdays=$(date -d $date2 +%s) - $(date -d $date1 +%s))/(24*3600)
if (( diffday < 30 ))
then
echo "$line" >> filterfile
fi
done < "$filename"
Some remarks:
# touch filterfile
Well - the later CMD >> filterfile appends to this file and creates it if it doesn't exist, so the touch is not needed.
totalline=$(wc -l < "$filename")
You don't need awk here. wc omits the filename from its output when it reads from standard input instead of being given a filename.
Capturing the output in an array:
shortline=($(echo "$line" | sed 's/.*\([0-9]\{6\}\)[ ][0-9]\{6\}.*\([0-9]\{6\}\)[ ][0-9]\{6\}.*/\1 \2/'))
date1=${shortline[0]}
date2=${shortline[1]}
allows us array access and saves another call to awk.
On my machine, your code took about 42s for 2880 lines (on your machine 2880 s?) and about 19s for the same file with my code.
So I suspect, if you aren't running it on an i486-machine, that cygwin might be a slowdown. It's a linux environment for windows, isn't it? Well, I'm on a core Linux system. Maybe you try the gnu-utils for Windows - the last time I looked for them, they were advertised as gnu-utils x32 or something, maybe there is an a64-version available by now.
And the next thing I would have a look at, is the date calculation - that might be a slowdown too.
2880 lines isn't that much, so I don't suspect that my SSD plays a huge role in the game.
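If the date calculation does turn out to be the slow part, one option is to do it with shell arithmetic instead of calling date twice per line. The sketch below assumes every yymmdd value is a valid date in the years 2000 to 2099 and counts whole calendar days, ignoring time zones. The names days_from_yymmdd and days_result are hypothetical; the arithmetic is the common days-from-civil calculation for the Gregorian calendar.

```shell
#!/bin/bash
# Set days_result to the number of days between 1970-01-01 and the given
# yymmdd date. Assumption: the century is always 20xx.
days_from_yymmdd() {
    local d=$1 y m day era yoe mp doy doe
    y=$(( 2000 + 10#${d:0:2} ))   # 10# avoids octal parsing of "08", "09"
    m=$(( 10#${d:2:2} ))
    day=$(( 10#${d:4:2} ))
    if (( m <= 2 )); then y=$(( y - 1 )); fi   # treat Jan/Feb as end of previous year
    era=$(( y / 400 ))
    yoe=$(( y - era * 400 ))
    mp=$(( (m + 9) % 12 ))                     # March = 0
    doy=$(( (153 * mp + 2) / 5 + day - 1 ))
    doe=$(( yoe * 365 + yoe / 4 - yoe / 100 + doy ))
    days_result=$(( era * 146097 + doe - 719468 ))
}

days_from_yymmdd 170915; d1=$days_result
days_from_yymmdd 171020; d2=$days_result
echo "$(( d2 - d1 ))"
```

The result is returned in a variable rather than printed, so the caller does not need a command substitution, which would fork a subshell for every line.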
I have a shell script that writes (echoes) the output of an array to a file. The file is in the following format:
The tansaction detials for today are 35
Please check the 5 biggest transactions below
-----------------------------------------------------------------------------------
Client Name,Account Number,Amount,Tran Time
Michael Press,20484,602117,11.41.02
Adam West,164121,50152,11.41.06
John Smith,15113,411700,11.41.07
Leo Anderson,2115116,350056,11.41.07
Wayne Clark,451987,296503,11.41.08
And I have multiple such lines.
How do I tabulate the names after ---?
I tried using spaces while echoing the array elements. Also tried tabs. I tried using column -t -s options. But the text above the --- is interfering with the desired output.
The desired output is
The tansaction detials for today are 35
Please check the 5 biggest transactions below
-----------------------------------------------------------------------------------
Client Name    Account Number  Amount  Tran Time
Michael Press  20484           602117  11.41.02
Adam West      164121          50152   11.41.06
John Smith     15113           411700  11.41.07
Leo Anderson   2115116         350056  11.41.07
Wayne Clark    451987          296503  11.41.08
The printing to a file is a part of a bigger script. So, i am looking for a simple solution to plug into this script.
Here's the snippet from that script where i am echoing to the file.
echo "The tansaction detials for today are 35 " >> log.txt
echo "" >> log.txt
echo " Please check the 5 biggest transactios below " >> log.txt
echo "" >> log.txt
echo "-----------------------------------------------------------------------------------" >> log.txt
echo "" >> log.txt
echo "" >> log.txt
echo "Client Name,Account Number,Amount,Tran Time" >> log.txt
array=( `output from a different script` )
x=1
for i in ${array[@]}
do
#echo "Array $x - $i"
Clientname=$(echo $i | cut -f1 -d',')
accountno=$(echo $i | cut -f2 -d',')
amount=$(echo $i | cut -f3 -d',')
trantime=$(echo $i | cut -f4 -d',')
echo "$Clientname,$accountno,$amount,$trantime" >> log.txt
(( x=$x+1 ))
done
I'm not sure I understand everything =P
but to answer this question :
How do i tabulate the names after ---?
echo -e "Example1\tExample2"
-e means : enable interpretation of backslash escapes
So for your output, I suggest :
echo -e "$Clientname\t$accountno\t$amount\t$trantime" >> log.txt
Edit : If you need more space, you can double,triple,... it
echo -e "Example1\t\tExample2"
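A similar sketch using printf rather than echo -e, since printf interprets \t in its format string without needing an option. The sample values come from the question; the helper name print_row is hypothetical.

```shell
#!/bin/bash
# Print four fields separated by single tab characters.
print_row() {
    printf '%s\t%s\t%s\t%s\n' "$1" "$2" "$3" "$4"
}

print_row "Client Name" "Account Number" "Amount" "Tran Time"
print_row "Michael Press" "20484" "602117" "11.41.02"
```

Tabs alone only line up when the fields in a column differ in length by less than one tab stop, so longer names may still push a row out of alignment.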
If I understand your question, in order to produce the output format of:
Client Name    Account Number  Amount  Tran Time
Michael Press  20484           602117  11.41.02
Adam West      164121          50152   11.41.06
John Smith     15113           411700  11.41.07
Leo Anderson   2115116         350056  11.41.07
Wayne Clark    451987          296503  11.41.08
You should use the output formatting provided by printf instead of echo. For example, for the headings, you can use:
printf "Client Name Account Number Amount Tran Time\n" >> log.txt
instead of:
echo "Client Name,Account Number,Amount,Tran Time" >> log.txt
For writing the five largest amounts and details, you could use:
printf "%-14s%-17s%8s%s\n" "$Clientname" "$accountno" "$amount" "$trantime" >> log.txt
instead of:
echo "$Clientname,$accountno,$amount,$trantime" >> log.txt
If that isn't what you are needing, just drop a comment and let me know and I'm happy to help further.
(you may have to tweak the field widths a bit, I just did a rough count)
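To make the width specifiers concrete, here is a small sketch of how printf pads a value: %-8s left-justifies in 8 columns and %8s right-justifies in 8 columns. The brackets are only there to make the padding visible, and the function name pad_demo is hypothetical.

```shell
#!/bin/bash
# Show left-justified and right-justified padding to a width of 8.
pad_demo() {
    printf '[%-8s][%8s]\n' "$1" "$2"
}

pad_demo "left" "right"
```

This prints [left    ][   right], which is why a left-justified width works for names and a right-justified one for amounts.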
True Tabular Output Requires Measuring Each Field
If you want to ensure that your data is always in tabular form, you need to measure the width of each field (including its heading) and use the larger of the widest data value and the heading as the field width for your output. Below is an example of how that can be done (using your simulated other program input):
#!/bin/bash
ofn="log.txt" # set output filename
# declare variables as array and integer types
declare -a line_arr hdg name acct amt trn tmp
declare -i nmx=0 acmx=0 ammx=0 tmx=0
# set heading array (so you can measure lengths)
hdg=( "Client Name"
"Account Number"
"Ammount"
"Tran Time" )
## set the initial max based on headings
nmx="${#hdg[0]}" # max name width
acmx="${#hdg[1]}" # max account width
ammx="${#hdg[2]}" # max ammount width
tmx="${#hdg[3]}" # max tran width
{ IFS=$'\n' # your array=( `output from a different script` )
line_arr=($(
cat << EOF
Michael Press,20484,602117,11.41.02
Adam West,164121,50152,11.41.06
John Smith,15113,411700,11.41.07
Leo Anderson,2115116,350056,11.41.07
Wayne Clark,451987,296503,11.41.08
EOF
)
)
}
# write heading to file
cat << EOF > "$ofn"
The tansaction detials for today are 35
Please check the 5 biggest transactions below
-----------------------------------------------------------------------------------
EOF
# read line array into tmp, compare to max field widths
{ IFS=$','
for i in "${line_arr[@]}"; do
tmp=( $(printf "%s" "$i") )
((${#tmp[0]} > nmx )) && nmx=${#tmp[0]}
((${#tmp[1]} > acmx )) && acmx=${#tmp[1]}
((${#tmp[2]} > ammx )) && ammx=${#tmp[2]}
((${#tmp[3]} > tmx )) && tmx=${#tmp[3]}
name+=( "${tmp[0]}" ) # fill name array
acct+=( "${tmp[1]}" ) # fill account num array
amt+=( "${tmp[2]}" ) # fill amount array
trn+=( "${tmp[3]}" ) # fill tran array
done
}
printf "%-*s  %-*s  %-*s  %s\n" "$nmx" "${hdg[0]}" "$acmx" "${hdg[1]}" \
"$ammx" "${hdg[2]}" "${hdg[3]}" >> "$ofn"
for ((i = 0; i < ${#name[@]}; i++)); do
printf "%-*s  %-*s  %-*s  %s\n" "$nmx" "${name[i]}" "$acmx" "${acct[i]}" \
"$ammx" "${amt[i]}" "${trn[i]}" >> "$ofn"
done
(you can remove the extra space between each field in the final two printf statements if you only want a single space between them -- looked better with 2 to me)
Output to log.txt
$ cat log.txt
The tansaction detials for today are 35
Please check the 5 biggest transactions below
-----------------------------------------------------------------------------------
Client Name    Account Number  Ammount  Tran Time
Michael Press  20484           602117   11.41.02
Adam West      164121          50152    11.41.06
John Smith     15113           411700   11.41.07
Leo Anderson   2115116         350056   11.41.07
Wayne Clark    451987          296503   11.41.08
Look things over and let me know if you have any questions.
We have been asked to parse a CSV file and perform some operations based on its data.
I am trying to find the maximum of the sum of two numbers that I get from each line of the CSV file,
namely the last and second-to-last numbers, which are decimals.
Following is my code
#!/bin/bash
#this file was created on 09/03/2014
#Author = Shashank Pangam
OLDIFS=$IFS
IFS=","
maxTransport=0
while read year month hydro geo solar wind fuel1 biomassL biomassC totalRenew fuel2 biodieselT biomassT
do
while [ $year -eq 2012 ]
do
currentTransport=$(echo "$biodieselT+$biomassT" | bc)
echo $currentTransport
if (( $(echo "$currentTransport > $maxTransport" | bc -l)));
then
$maxTransport = $currentTransport
echo $maxTransport
fi
done
echo -e "Maximum amount of energy consumed by the Transportation sector for year 2012 : $maxTransport"
done < $1
and the following is my csv file
2012,January,2.614,0.356,0.006,0.021,114.362,14.128,1.308,66.74,196.539,199.536,81.791,
2012,February,2.286,0.333,0.007,0.017,107.388,13.952,1.304,61.277,183.921,186.564,81.545,
2012,March,0.356,0.009,0.02,108.268,15.588,1.404,63.444,188.705,191.318,87.827,11.187,
2012,April,,0.344,0.012,0.019,103.627,14.229,1.381,60.683,179.919,181.993,86.339,11.518,
2012,May,,0.356,0.012,0.01,109.644,13.789,1.473,63.611,188.517,190.913,92.087,12.09,
2012,June,,0.344,0.013,0.013,108.116,13.012,1.434,61.056,183.618,185.65,89.673,12.461,
2012,July,,0.356,0.017,0.008,112.426,14.035,1.403,58.057,185.921,187.61,87.707,10.464,
2012,August,0.356,0.016,0.008,113.64,14.01,1.513,60.011,189.174,190.999,94.592,11.14,
2012,September,1.513,0.344,0.015,0.01,110.84,13.435,1.324,56.047,181.647,183.528,82.814,
2012,October,1.83,0.356,0.012,0.02,111.544,15.597,1.462,57.365,185.969,188.186,91.42,
2012,November,2.022,0.344,0.01,0.014,111.808,15.594,1.326,56.793,185.521,187.911,82.919,
2012,December,1.77,0.356,0.007,0.022,116.416,15.873,1.368,58.741,192.398,194.552,85.526,
2013,January,3.021,0.357,0.007,0.018,114.601,15.309,1.334,57.31,188.553,191.956,83.415,
2013,February,3.285,0.322,0.012,0.023,102.499,13.658,1.246,52.05,169.452,173.094,77.914,
2013,March,0.357,0.016,0.025,111.594,14.538,1.419,59.096,186.646,189.884,88.713,11.938,
2013,April,,0.345,0.018,0.03,103.602,14.446,1.437,59.057,178.542,181.342,89.867,12.184,
2013,May,,0.357,0.02,0.032,108.113,14.452,1.497,62.606,186.668,190.117,93.634,13.166,
2013,June,,0.345,0.021,0.028,109.162,14.597,1.47,61.563,186.792,189.994,91.894,14.501,
2013,July,,0.357,0.018,0.024,119.154,15.018,1.45,62.037,197.659,201.027,90.689,14.523,
2013,August,0.357,0.022,0.02,113.177,15.014,1.44,60.682,190.313,192.949,90.065,13.28,
2013,September,2.185,0.345,0.021,0.026,106.912,14.367,1.411,58.901,181.591,184.168,88.254,
2013,October,2.171,0.357,0.02,0.029,109.123,15.158,1.483,64.509,190.273,192.849,92.748
The following is the error i get
./calculator.sh: line 16: 0: command not found
0
268.109
I don't understand why echo $currentTransport returns 0 while in the comparison it works and assigns value to maxTransport but throws the error for the same.
Thanks in advance.
Instead of this:
$maxTransport = $currentTransport
Try this:
maxTransport=$currentTransport
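The $ in front of a variable name expands to its contents. With maxTransport still 0, the original line expands to a command named 0 followed by two arguments, which is why bash reports "0: command not found". An assignment uses the bare variable name on the left, with no spaces around the =, so the contents of currentTransport are stored in maxTransport.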
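A minimal sketch of the difference, using the value from the error output above as sample data:

```shell
#!/bin/bash
maxTransport=0
currentTransport=268.109

# Wrong: "$maxTransport = $currentTransport" would expand to "0 = 268.109",
# and bash would try to run a command called 0.

# Right: bare name on the left, no spaces around =.
maxTransport=$currentTransport
echo "$maxTransport"
```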
I'm curious of how I might go about creating a more detailed output within the same scripting file. I would like to modify the script so that after it finds the matching line for the currency code, it outputs the following:
Currency code: {currency code value}
Currency name: {currency name value}
Units per USD: {units value}
USD per unit : {USD value}
What I have so far:
#!/bin/bash
# Set local variable $cc to first command line argument, if present
cc=$1
# execute the do loop while $curr is blank
while [ -z "$curr" ]
do
# test if $cc is blank. If so, prompt for a value
if [ -z "$cc" ]
then
echo -n "Enter a currency code: "
read cc
fi
# Search for $cc as a code in the curr.tab file
curr=`grep "^$cc" curr.tab`
# clear cc in case grep found no match and the loop should be repeated
cc=
done
# echo the result
echo $curr
The curr.tab I am using would look something like this:
USD US Dollar 1.0000000000 1.0000000000
EUR Euro 0.7255238463 1.3783144484
GBP British Pound 0.6182980743 1.6173428992
INR Indian Rupee 61.5600229886 0.0162443084
AUD Australian Dollar 1.0381120551 0.9632871472
CAD Canadian Dollar 1.0378792155 0.9635032527
AED Emirati Dirham 3.6730001428 0.2722570000
MYR Malaysian Ringgit 3.1596464286 0.3164911083
At the end, instead of echo $curr do:
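# give each number its own group; a repeated group such as (...){2} keeps only its last match
re='^(...)[[:space:]]+(.*[^[:space:]])[[:space:]]+([[:digit:].]+)[[:space:]]+([[:digit:].]+)[[:space:]]*$'
[[ $curr =~ $re ]]
printf "Currency code: %s\nCurrency name: %s\nUnits per USD: %s\nUSD per unit : %s\n" "${BASH_REMATCH[@]:1}"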
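An alternative sketch that avoids the regex: split the line into words and take the code from the front and the two numbers from the back, so that whatever is left in the middle is the currency name. The function name print_currency is hypothetical, and this assumes the fields in curr.tab are separated by spaces or tabs and that the last two fields are always the two rates.

```shell
#!/bin/bash
# Print the four labelled lines for one curr.tab record.
print_currency() {
    local -a f
    read -r -a f <<< "$1"          # split on whitespace
    local n=${#f[@]}
    (( n >= 4 )) || return 1       # need code, name, and two numbers
    local code=${f[0]}
    local units=${f[n-2]}          # second-to-last field
    local usd=${f[n-1]}            # last field
    local name="${f[*]:1:n-3}"     # everything in between, joined by spaces
    printf 'Currency code: %s\nCurrency name: %s\nUnits per USD: %s\nUSD per unit : %s\n' \
        "$code" "$name" "$units" "$usd"
}

print_currency "GBP British Pound 0.6182980743 1.6173428992"
```

In the script from the question this would be called as print_currency "$curr" in place of echo $curr.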