I'm new to the bash shell and I have to write a script that works with a CSV file.
The file is a list of participants with their countries, sports and medals achieved.
When executing the script, I should pass the nationality (column 3) and the sport (column 8) as parameters. The script should return the number of participants of that country for that sport, and the number of medals achieved.
The number of medals achieved is the sum of the "gold", "silver" and "bronze" columns of each row, which are columns 9, 10 and 11.
I cannot use grep, awk, sed or csvkit.
So far I have this code, but I'm stuck on the medal-counting part.
nacionality=$1
sport=$2
columns= cut -d, -f 3,8 athletes.csv
echo columns | tr -cd $nacionality,$sport | wc -c
Could anyone help me?
The file is: https://github.com/flother/rio2016/blob/master/athletes.csv
The name of the file is script2_4.sh
An example of the output is:
./script2_4.sh POL rowing
Participants, Medals
26, 6
A sample of the file:
id,name,nationality,sex,date_of_birth,height,weight,sport,gold,silver,bronze,info
736041664,A Jesus Garcia,ESP,male,1969-10-17,1.72,64,athletics,0,0,0,
532037425,A Lam Shin,KOR,female,1986-09-23,1.68,56,fencing,0,0,0,
435962603,Aaron Brown,CAN,male,1992-05-27,1.98,79,athletics,0,0,1,
521041435,Aaron Cook,MDA,male,1991-01-02,1.83,80,taekwondo,0,0,0,
33922579,Aaron Gate,NZL,male,1990-11-26,1.81,71,cycling,0,0,0,
173071782,Aaron Royle,AUS,male,1990-01-26,1.80,67,triathlon,0,0,0,
266237702,Aaron Russell,USA,male,1993-06-04,2.05,98,volleyball,0,0,1,
382571888,Aaron Younger,AUS,male,1991-09-25,1.93,100,aquatics,0,0,0,
87689776,Aauri Lorena Bokesa,ESP,female,1988-12-14,1.80,62,athletics,0,0,0,
997877719,Ababel Yeshaneh,ETH,female,1991-07-22,1.65,54,athletics,0,0,0,
343694681,Abadi Hadis,ETH,male,1997-11-06,1.70,63,athletics,0,0,0,
591319906,Abbas Abubakar Abbas,BRN,male,1996-05-17,1.75,66,athletics,0,0,0,
258556239,Abbas Qali,IOA,male,1992-10-11,,,aquatics,0,0,0,
376068084,Abbey D'Agostino,USA,female,1992-05-25,1.61,49,athletics,0,0,0,
162792594,Abbey Weitzeil,USA,female,1996-12-03,1.78,68,aquatics,1,1,0,
521036704,Abbie Brown,GBR,female,1996-04-10,1.76,71,rugby sevens,0,0,0,
149397772,Abbos Rakhmonov,UZB,male,1998-07-07,1.61,57,wrestling,0,0,0,
256673338,Abbubaker Mobara,RSA,male,1994-02-18,1.75,64,football,0,0,0,
337369662,Abby Erceg,NZL,female,1989-11-20,1.75,68,football,0,0,0,
334169879,Abd Elhalim Mohamed Abou,EGY,male,1989-06-03,2.10,88,volleyball,0,0,0,
215053268,Abdalaati Iguider,MAR,male,1987-03-25,1.73,57,athletics,0,0,0,
763711985,Abdalelah Haroun,QAT,male,1997-01-01,1.85,80,athletics,0,0,0,
Here is a pure bash implementation. Build a hash from field name to position ($h):
#!/bin/bash
file=athletes.csv
nationality=$1
sport=$2
IFS=, read -r -a l < "$file"
declare -A h
for pos in "${!l[@]}"
do
h["${l[$pos]}"]=$pos
done
declare -i participants=0
declare -i medals=0
while IFS=, read -r -a l
do
if [ "${l[${h["nationality"]}]}" = "$nationality" ] &&
[ "${l[${h["sport"]}]}" = "$sport" ]
then
((participants++))
medals=$((
$medals +
"${l[${h["gold"]}]}" +
"${l[${h["silver"]}]}" +
"${l[${h["bronze"]}]}"
))
fi
done < "$file"
echo "Participants, Medals"
echo "$participants, $medals"
and example output with the first 4 lines of input:
$ ./script2_4.sh CAN athletics
Participants, Medals
1, 1
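To check that the header hash was built correctly, you can query it right after the first read loop (a debug sketch; indices are zero-based, so gold, column 9, is index 8):

echo "${h[nationality]} ${h[sport]} ${h[gold]}"    # prints: 2 7 8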
I'm writing a bash script which takes a number, and also a comma-separated sequence of values and strings, e.g.: 3,15,4-7,19-20. I want to check whether the number is contained in the set corresponding to the sequence. For simplicity, assume no comma-separated elements intersect, and that the elements are sorted in ascending order.
Is there a simple way to do this in bash other than the brute-force naive way? Some shell utility which does something like that for me, maybe something related to lpr which already knows how to process page range sequences etc.
Is awk cheating?:
$ echo -n 3,15,4-7,19-20 |
awk -v val=6 -v RS=, -F- '(NF==1&&$1==val) || (NF==2&&$1<=val&&$2>=val)' -
Output:
4-7
Another version:
$ echo 19 |
awk -v ranges=3,15,4-7,19-20 '
BEGIN {
split(ranges,a,/,/)
}
{
for(i in a) {
n=split(a[i],b,/-/)
if((n==1 && $1==a[i]) || (n==2 && $1>=b[1] && $1<=b[2]))
print a[i]
}
}' -
Outputs:
19-20
The latter is better as you can feed it more values from a file etc. Then again the former is shorter. :D
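To illustrate feeding it several values at once, here is a sketch based on the second version (the print is tweaked to show which value matched which element):

$ printf '%s\n' 6 15 42 |
  awk -v ranges=3,15,4-7,19-20 '
  BEGIN { split(ranges,a,/,/) }
  {
    for(i in a) {
      n=split(a[i],b,/-/)
      if((n==1 && $1==a[i]) || (n==2 && $1>=b[1] && $1<=b[2]))
        print $1, "->", a[i]
    }
  }' -
6 -> 4-7
15 -> 15

The value 42 matches nothing, so it produces no output.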
Pure bash:
check() {
  IFS=, a=($2)                    # split the sequence on commas
  for b in "${a[@]}"; do
    IFS=- c=($b); c+=(${c[0]})    # split on "-"; re-append c[0] so single values get a c[1]
    (( $1 >= c[0] && $1 <= c[1] )) && break
  done
}
$ check 6 '3,15,4-7,19-20' && echo "yes" || echo "no"
yes
$ check 42 '3,15,4-7,19-20' && echo "yes" || echo "no"
no
As bash is tagged, why not just
inrange() { for r in ${2//,/ }; do ((${r%-*}<=$1 && $1<=${r#*-})) && break; done; }
Then test it as usual:
$ inrange 6 3,15,4-7,19-20 && echo yes || echo no
yes
$ inrange 42 3,15,4-7,19-20 && echo yes || echo no
no
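Why single values work here too: when r contains no dash, neither expansion finds anything to strip, so both sides of the comparison fall back to the whole string (a quick demo):

$ r=15;  echo "${r%-*} ${r#*-}"
15 15
$ r=4-7; echo "${r%-*} ${r#*-}"
4 7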
A function based on @JamesBrown's method:
function match_in_range_seq {
(( $# == 2 )) && [[ -n "$(echo -n "$2" | awk -v val="$1" -v RS=, -F- '(NF==1&&$1==val) || (NF==2&&$1<=val&&$2>=val)' - )" ]]
}
Will return 0 (in $?) if the second argument (the range sequence) contains the first argument, 1 otherwise.
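A quick test of the wrapper:

$ match_in_range_seq 6 '3,15,4-7,19-20' && echo yes || echo no
yes
$ match_in_range_seq 18 '3,15,4-7,19-20' && echo yes || echo no
no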
Another awk idea using two input (-v) variables:
# use of function wrapper is optional but cleaner for the follow-on test run
in_range() {
awk -v value="$1" -v r="$2" '
BEGIN { n=split(r,ranges,",")
for (i=1;i<=n;i++) {
low=high=ranges[i]
if (ranges[i] ~ "-") {
split(ranges[i],x,"-")
low=x[1]
high=x[2]
}
if (value >= low && value <= high) {
print value,"found in the range:",ranges[i]
exit
}
}
}'
}
NOTE: the exit assumes no overlapping ranges, i.e., the value will not be found in more than one range
Take for a test spin:
ranges='3,15,4-7,19-20'
for value in 1 6 15 32
do
echo "########### value = ${value}"
in_range "${value}" "${ranges}"
done
This generates:
########### value = 1
########### value = 6
6 found in the range: 4-7
########### value = 15
15 found in the range: 15
########### value = 32
NOTES:
OP did not mention what to generate as output if no range match is found; the code could be modified to output a 'not found' message as needed (see the sketch after these notes)
in a comment OP mentioned possibly running the search for a number of values; the code could be modified to support such a requirement but would need more details (e.g., the format of the list of values, the desired output, and how it is to be used/captured by the calling process)
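One possible 'not found' modification keeps a found flag; everything stays inside the BEGIN block, so awk never waits for stdin (a sketch):

in_range() {
  awk -v value="$1" -v r="$2" '
  BEGIN { n=split(r,ranges,",")
          for (i=1;i<=n;i++) {
              low=high=ranges[i]
              if (ranges[i] ~ "-") { split(ranges[i],x,"-"); low=x[1]; high=x[2] }
              if (value >= low && value <= high) {
                  print value,"found in the range:",ranges[i]
                  found=1
                  break
              }
          }
          if (!found) print value,"not found in any range"
  }'
}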
Sorry if I don't write well; it's my first post.
I have a list in one file with the name, ID, marks, etc. of students (see below).
I want to calculate each student's average mark and write it to another file, but I don't know how to extract only the marks and write the averages to that other file.
Thanks!
#name surname student_index_number course_group_id lecturer_id list_of_marks
athos musketeer 1 1 1 3,4,5,3.5
porthos musketeer 2 1 1 2,5,3.5
aramis musketeer 3 2 2 2,1,4,5
while read line; do
echo "$line" | cut -f 6 -d ' '
done<main_list
awk 'NR>1{n=split($NF,a,",");for(i=1;i<=n;i++){s+=a[i]} ;print $1,s/n;s=0}' input
athos 3.875
porthos 3.5
aramis 3
For all lines except the header (NR>1 filters the header out), take the last column and split it on commas into the individual marks. Sum the marks with a for loop, then divide by the number of marks.
Something like this (untested; NR>1 skips the header line):
awk 'NR>1{ n = split($6, a, ","); total=0; for (v in a) total += a[v]; print total / n }' main_list
A bash solution (with bc doing the floating-point arithmetic); please try the following:
while read -r first second third fourth fifth sixth
do
  if [[ "$first" =~ (^#) ]]    # skip the header line
  then
    continue
  fi
  count="${sixth//[^,]}"                       # keep only the commas of the marks list
  val=$(echo "(${#count}+1)" | bc)             # number of marks = commas + 1
  echo "scale=2; (${sixth//,/+})/$val" | bc    # sum the marks and divide by the count
done < "Input_file"
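Since the goal is to write the averages into another file, any of these can simply be redirected; for example, with the first awk one-liner (averages.txt is just an assumed output name):

$ awk 'NR>1{n=split($NF,a,",");for(i=1;i<=n;i++){s+=a[i]} ;print $1,s/n;s=0}' main_list > averages.txt
$ cat averages.txt
athos 3.875
porthos 3.5
aramis 3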
Consider a plain text file containing the page-breaking ASCII control character "Form Feed" ($'\f'):
alpha\n
beta\n
gamma\n\f
one\n
two\n
three\n
four\n
five\n\f
earth\n
wind\n
fire\n
water\n\f
Note that each page has a random number of lines.
I need a bash routine that returns the page number of a given line number in a text file containing such page-breaking characters.
After a long time researching, I finally came up with this piece of code:
function get_page_from_line
{
local nline="$1"
local input_file="$2"
local npag=0
local ln=0
local total=0
while IFS= read -d $'\f' -r page; do
npag=$(( ++npag ))
ln=$(echo -n "$page" | wc -l)
total=$(( total + ln ))
if [ $total -ge $nline ]; then
echo "${npag}"
return
fi
done < "$input_file"
echo "0"
return
}
But, unfortunately, this solution proved to be very slow in some cases.
Is there a better solution?
Thanks!
The idea to use read -d $'\f' and then to count the lines is good.
This version might appear inelegant: whenever the requested line exists in the file, the file is read twice (once by wc and once by head).
Give it a try, because it is super fast:
function get_page_from_line ()
{
local nline="${1}"
local input_file="${2}"
if [[ $(wc -l "${input_file}" | awk '{print $1}') -lt nline ]] ; then
printf "0\n"
else
printf "%d\n" $(( $(head -n ${nline} "${input_file}" | grep -c "^"$'\f') + 1 ))
fi
}
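A quick sanity check against the sample from the question (assuming it is saved as file; line 7 is "four", which sits on page 2, and line 99 is past the end of the file):

$ get_page_from_line 7 file
2
$ get_page_from_line 99 file
0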
The performance of awk is better than that of the bash version above; awk was made for this kind of text processing.
Give this tested version a try:
function get_page_from_line ()
{
awk -v nline="${1}" '
BEGIN {
npag=1;
}
{
if (index($0,"\f")>0) {
npag++;
}
if (NR==nline) {
print npag;
linefound=1;
exit;
}
}
END {
if (!linefound) {
print 0;
}
}' "${2}"
}
When \f is encountered, the page number is increased.
NR is the current line number.
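Usage is the same as for the bash version (again assuming the sample is saved as file):

$ get_page_from_line 7 file
2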
----
For the record, here is another bash version.
This version uses only built-in commands to count the lines in the current page.
The speedtest.sh you provided in the comments showed it slightly ahead (by roughly 20 seconds), which makes it about equivalent to your version:
function get_page_from_line ()
{
local nline="$1"
local input_file="$2"
local npag=0
local total=0
while IFS= read -d $'\f' -r page; do
npag=$(( npag + 1 ))
IFS=$'\n'
for line in ${page}
do
total=$(( total + 1 ))
if [[ total -eq nline ]] ; then
printf "%d\n" ${npag}
unset IFS
return
fi
done
unset IFS
done < "$input_file"
printf "0\n"
return
}
awk to the rescue!
awk -v RS='\f' -v n=09 '$0~"^"n"." || $0~"\n"n"." {print NR}' file
3
updated anchoring as commented below.
$ for i in $(seq -w 12); do awk -v RS='\f' -v n="$i" \
    '$0~"^"n"." || $0~"\n"n"." {print n,"->",NR}' file; done
01 -> 1
02 -> 1
03 -> 1
04 -> 2
05 -> 2
06 -> 2
07 -> 2
08 -> 2
09 -> 3
10 -> 3
11 -> 3
12 -> 3
A script of similar length can be written in bash itself to locate and respond to the embedded <form-feed>s contained in a file. (It will work in POSIX shell as well, with substitutes for the string indexing and expr for the math.) For example,
#!/bin/bash
declare -i ln=1 ## line count
declare -i pg=1 ## page count
fname="${1:-/dev/stdin}" ## read from file or stdin
printf "\nln:pg text\n" ## print header
while read -r l; do ## read each line
if [ "${l:0:1}" = $'\f' ]; then ## if form-feed found
((pg++))
printf "<ff>\n%2s:%2s '%s'\n" "$ln" "$pg" "${l:1}"
else
printf "%2s:%2s '%s'\n" "$ln" "$pg" "$l"
fi
((ln++))
done < "$fname"
Example Input File
The simple input file with embedded <form-feed>s was created with:
$ echo -e "a\nb\nc\n\fd\ne\nf\ng\nh\n\fi\nj\nk\nl" > dat/affex.txt
Which when output gives:
$ cat dat/affex.txt
a
b
c
d
e
f
g
h
i
j
k
l
Example Use/Output
$ bash affex.sh <dat/affex.txt
ln:pg text
1: 1 'a'
2: 1 'b'
3: 1 'c'
<ff>
4: 2 'd'
5: 2 'e'
6: 2 'f'
7: 2 'g'
8: 2 'h'
<ff>
9: 3 'i'
10: 3 'j'
11: 3 'k'
12: 3 'l'
With Awk, you can set RS (the record separator, default newline) to the form feed (\f) and FS (the field separator, default any run of horizontal whitespace) to newline (\n), and obtain the number of lines as the number of "fields" in a "record", which is a "page".
The placement of the form feeds in your data produces an empty trailing field in each page's record, so the counts are off by one where that happens.
awk -F '\n' -v RS='\f' '{ print NF }' file
You could reduce the number by one if $NF == "", and perhaps pass in the number of the desired page as a variable:
awk -F '\n' -v RS='\f' -v p="2" 'NR==p { print NF - ($NF == "") }' file
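With the sample file from the question (saved as file), page 2 holds the five lines one through five, so this prints:

$ awk -F '\n' -v RS='\f' -v p="2" 'NR==p { print NF - ($NF == "") }' file
5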
To obtain the page number for a particular line, just feed head -n number to the script (a sketch follows the loop below), or loop over the numbers until you have accrued the sum of lines.
line=1
page=1
for count in $(awk -F '\n' -v RS='\f' '{ print NF - ($NF == "") }' file); do
old=$line
((line += count))
echo "Lines $old through line are on page $page"
((page++)
done
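The head variant mentioned above can be sketched as follows: head's output spans exactly as many form-feed separated records as the page number of the requested line, and NR in the END block reports that count:

$ head -n 7 file | awk -v RS='\f' 'END { print NR }'
2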
This GNU awk script prints the "page" for the line number given as a command line argument:
BEGIN { ffcount=1;
search = ARGV[2]
delete ARGV[2]
if (!search ) {
print "Please provide linenumber as argument"
exit(1);
}
}
$1 ~ search { printf( "line %s is on page %d\n", search, ffcount) }
/[\f]/ { ffcount++ }
Use it like awk -f formfeeds.awk formfeeds.txt 05, where formfeeds.awk is the script, formfeeds.txt is the file, and 05 is a line number.
The BEGIN rule deals mostly with the command line argument. The other rules are simple:
$1 ~ search applies when the first field matches the command line argument stored in search
/[\f]/ applies when there is a form feed