UNIX average of specific employee as per designation - bash

This is an example of a text file to be given as input
Name,Designation,Salary
Hari,Engineer,35000
Suresh,Consultant,80000
Umesh,Engineer,45500
Maya,Analyst,50000
Guru,Consultant,100000
Sushma,Engineer,30000
Mohan,Engineer,30000
My code should be able to run find the average salary of particular employee's designation. For example,
bash script.sh employees.txt Analyst
Then my output should be
50000
My current code to find just the average of all employees doesn't work. I am new to shell. This is my current code
count="$(tail -n 1 salary.txt | grep -o '^[^\s]\+')"
echo "$count"
salary="$(grep -o '[^ ]\+$' salary.txt | paste -sd+)"
echo "$salary"
echo "($salary)/$count" | bc
I get empty values as results.

This is better done in awk:
awk -F, -v dgn='Engineer' '$2 == dgn{s += $3; ++c} END{printf "%.2f\n", s/c}' file.csv
35125.00

Could you please try following(since OP requested for script way, so adding it in a script way where passing 1st argument as Input_file name and 2nd argument as string whose avg is needed).
cat script.ksh
file="$1"
name="$2"
awk -F, -v field="$name" '{a[$2]+=$3;b[$2]++} END{for(i in a){if(i == field){print a[i]/b[i]}}}' "$file"
Now run the script as follwos.
./script.ksh Input_file Analyst
50000

GNU datamash is a useful tool for calculating this kind of thing:
$ datamash -sHt, groupby 2 mean 3 < employees.txt
Combine with grep to limit it to just the title you're interested in.

If you want to do this in the shell:
#!/bin/bash
file=$1
designation=$2
# code to validate user input here ...
sum=0
count=0
while IFS=, read -r n d s; do
if [[ ${designation,,} == "${d,,}" ]]; then
(( sum += s ))
(( count++ ))
fi
done < "$file"
if (( count == 0 )); then
echo "No $designation found in $file"
else
echo $((sum / count))
fi

Using Perl
perl -F, -lane ' if(/Engineer/) { $dsg+=$F[2];$c++ } END { print $dsg/$c } ' file
with your given inputs
$ cat john.txt
Name,Designation,Salary
Hari,Engineer,35000
Suresh,Consultant,80000
Umesh,Engineer,45500
Maya,Analyst,50000
Guru,Consultant,100000
Sushma,Engineer,30000
Mohan,Engineer,30000
$ perl -F, -lane ' if(/Engineer/) { $dsg+=$F[2];$c++ } END { print $dsg/$c } ' john.txt
35125
$

Related

Shell Script for combining 3 files

I have 3 files with below data
$cat File1.txt
Apple,May
Orange,June
Mango,July
$cat File2.txt
Apple,Jan
Grapes,June
$cat File3.txt
Apple,March
Mango,Feb
Banana,Dec
I require the below output file.
$Output_file.txt
Apple,May|Jan|March
Orange,June
Mango,July|Feb
Grapes,June
Banana,Dec
Requirement here is the take out the first column and then common data in column 1 in each file need to be searched and second column needs to be "|" separated. If there is no common column, then same needs to be printed in the output file.
I have tried putting this in a while loop, but it takes time as the file size increase. Wanted a simple solution using shell script.
This should work :
#!/bin/bash
for FRUIT in $( cat "$#" | cut -d "," -f 1 | sort | uniq )
do
echo -ne "${FRUIT},"
awk -F "," "\$1 == \"$FRUIT\" {printf(\"%s|\",\$2)}" "$#" | sed 's/.$/\'$'\n/'
done
Run it as :
$ ./script.sh File1.txt File2.txt File3.txt
A purely native-bash solution (calling no external tools, and thus limited only by the performance constraints of bash itself) might look like:
#!/usr/bin/env bash
case $BASH_VERSION in ''|[123].*) echo "ERROR: Bash 4 or newer required" >&2; exit 1;; esac
declare -A items=( )
for file in "$#"; do
while IFS=, read -r key value; do
items[$key]+="|$value"
done <"$file"
done
for key in "${!items[#]}"; do
value=${items[$key]}
printf '%s,%s\n' "$key" "${value#'|'}"
done
...called as ./yourscript File1.txt File2.txt File3.txt
This is fairly easy done with a single awk command:
awk 'BEGIN{FS=OFS=","} {a[$1] = a[$1] (a[$1] == "" ? "" : "|") $2}
END {for (i in a) print i, a[i]}' File{1,2,3}.txt
Orange,June
Banana,Dec
Apple,May|Jan|March
Grapes,June
Mango,July|Feb
If you want output in the same order as strings appear in original files then use this awk:
awk 'BEGIN{FS=OFS=","} !($1 in a) {b[++n] = $1}
{a[$1] = a[$1] (a[$1] == "" ? "" : "|") $2}
END {for (i=1; i<=n; i++) print b[i], a[b[i]]}' File{1,2,3}.txt
Apple,May|Jan|March
Orange,June
Mango,July|Feb
Grapes,June
Banana,Dec

Simple bash script to split csv file by week number

I'm trying to separate a large pipe-delimited file based on a week number field. The file contains data for a full year thus having 53 weeks. I am hoping to create a loop that does the following:
1) check if week number is less than 10 - if it is paste a '0' in front
2) use grep to send the rows to a file (ie `grep '|01|' bigFile.txt > smallFile.txt` )
3) gzip the smaller file (ie `gzip smallFile.txt`)
4) repeat
Is there a resource that would show how to do this?
EDIT :
Data looks like this:
1|#gmail|1|0|0|0|1|01|com
1|#yahoo|0|1|0|0|0|27|com
The column I care about is the 2nd from the right.
EDIT 2:
Here's the script I'm using but it's not functioning:
for (( i = 1; i <= 12; i++ )); do
#statements
echo 'i :'$i
q=$i
# echo $q
# $q==10
if [[ q -lt 10 ]]; then
#statements
k='0'$q
echo $k
grep '|$k|' 20150226_train.txt > 'weeks_files/week'$k
gzip weeks_files/week $k
fi
if [[ q -gt 9 ]]; then
#statements
echo $q
grep \'|$q|\' 20150226_train.txt > 'weeks_files/week'$q
gzip 'weeks_files/week'$q
fi
done
Very simple in awk ...
awk -F'|' '{ print > ("smallfile-" $(NF-1) ".txt";) }' bigfile.txt
Edit: brackets added for "original-awk".
You're almost there.
#!/bin/bash
for (( i = 1; i <= 12; i++ )); do
#statements
echo 'i :'$i
q=$i
# echo $q
# $q==10
#OLD if [[ q -lt 10 ]]; then
if [[ $q -lt 10 ]]; then
#statements
k='0'$q
echo $k
#OLD grep '|$k|' 20150226_train.txt > 'weeks_files/week'$k
grep "|$k|" 20150226_train.txt > 'weeks_files/week'$k
#OLD gzip weeks_files/week $k
gzip weeks_files/week$k
#OLD fi
#OLD if [[ q -gt 9 ]]; then
elif [[ $q -gt 9 ]] ; then
#statements
echo $q
#OLD grep \'|$q|\' 20150226_train.txt > 'weeks_files/week'$q
grep "|$q|" 20150226_train.txt > 'weeks_files/week'$q
gzip 'weeks_files/week'$q
fi
done
You didn't alway use $ in front of your variable values. You can only get away with using k or q without a $ inside the shell arthimetic substitution feature, ie z=$(( x+k)) or just to operate on a variable like (( k++ )). There are others.
You need to learn the difference between single quoting and dbl-quoting. You need to use dbl-quoting when you want a value substituted for a variable, as in your lines
grep "|$q|" 20150226_train.txt > 'weeks_files/week'$q
and others.
I'm guessing that your use of grep \'|$q|\' 20150226_train.txt was an attempt to get the value of $q.
The way to get comfortable with debugging this sort of situation is to set the shell debugging option with set -x (turn it off with set +x). You'll see each line that is executed with the values substituted for the variables. Advanced debugging requires echo "varof Interset now = $var" (print statements). Also, you can use set -vx (and set +vx) to see each line or block of code before it is executed, and then the -x output will show which lines where acctually executed. For your script, you'd see the whole if ... elfi ...fi block printed, and then just the lines of -x output with values for variables. It can be confusing, even after years of looking at it. ;-)
So you can go thru and remove all lines with the prefix #OLD, and I'm hoping your code will work for you.
IHTH
mkdir -p weeks_files &&
awk -F'|' '
{ file=sprintf("weeks_files/week%2d",$(NF-1)); print > file }
!seen[file]++ { print file }
' 20150226_train.txt |
xargs gzip
If your data is ordered so that all of the rows for a given week number are contiguous you can make it simpler and more efficient:
mkdir -p weeks_files &&
awk -F'|' '
$(NF-1) != prev { file=sprintf("weeks_files/week%2d",$(NF-1)); print file }
{ print > file; prev=$(NF-1) }
' 20150226_train.txt |
xargs gzip
There are certainly a number of approaches - the 'awk' line below will reformat your data. If you take a sequential approach, then:
1) awk to reformat
awk -F '|' '{printf "%s|%s|%s|%s|%s|%s|%s|%02d|%s\n", $1, $2, $3, $4, $5, $6, $7, $8, $9}' SOURCE_FILE > bigFile.txt
2) loop through the weeks, create small file an zip it
for N in {01..53}
do
grep "|${N}|" bigFile.txt > smallFile.${N}.txt
gzip smallFile.${N}.txt
done
3) test script showing reformat step
#!/bin/bash
function show_data {
# Data set w/9 'fields'
# 1| 2 |3|4|5|6|7| 8|9
cat << EOM
1|#gmail|1|0|0|0|1|01|com
1|#gmail|1|0|0|0|1|2|com
1|#gmail|1|0|0|0|1|5|com
1|#yahoo|0|1|0|0|0|27|com
EOM
}
###
function stars {
echo "## $# ##"
}
###
stars "Raw data"
show_data
stars "Modified data"
# 1| 2| 3| 4| 5| 6| 7| 8|9 ##
show_data | awk -F '|' '{printf "%s|%s|%s|%s|%s|%s|%s|%02d|%s\n", $1, $2, $3, $4, $5, $6, $7, $8, $9}'
Sample run:
$ bash test.sh
## Raw data ##
1|#gmail|1|0|0|0|1|01|com
1|#gmail|1|0|0|0|1|2|com
1|#gmail|1|0|0|0|1|5|com
1|#yahoo|0|1|0|0|0|27|com
## Modified data ##
1|#gmail|1|0|0|0|1|01|com
1|#gmail|1|0|0|0|1|02|com
1|#gmail|1|0|0|0|1|05|com
1|#yahoo|0|1|0|0|0|27|com

How to pass filename through variable to be read it by awk

Good day,
I was wondering how to pass the filename to awk as variable, in order to awk read it.
So far I have done:
echo file1 > Aenumerar
echo file2 >> Aenumerar
echo file3 >> Aenumerar
AE=`grep -c '' Aenumerar`
r=1
while [ $r -le $AE ]; do
lista=`awk "NR==$r {print $0}" Aenumerar`
AEList=`grep -c '' $lista`
s=1
while [ $s -le $AEList ]; do
word=`awk -v var=$s 'NR==var {print $1}' $lista`
echo $word
let "s = s + 1"
done
let "r = r + 1"
done
Thanks so much in advance for any clue or other simple way to do it with bash command line
Instead of:
awk "NR==$r {print $0}" Aenumerar
You need to use:
awk -v r="$r" 'NR==r' Aenumerar
Judging by what you've posted, you don't actually need all the NR stuff; you can replace your whole script with this:
while IFS= read -r lista ; do
awk '{print $1}' "$lista"
done < Aenumerar
(This will print the first field of each line in each of file1, file2, file3. I think that's what you're trying to do?)

Need help in shell script

I am new into shell scripting and learning it for past 2 month. I need your help in tuning or providing any other solution either in sed or AWK for the below question.
"write a script to input the filename and display the content of file in such a manner that each line has only 10 characters.If line in a file exceeds 10 characters then display the rest of the line in next line."
I have written the below script and worked fine. But it took 2 hours for me to write it..(certainly not acceptable. Problem is i know the shell commands very well but still have not mastered the skills to put them into shell scripts :-( . Thanks.
#!/bin/bash
if [ $# -ne 1 ]; then
echo "USAGE: $0 $1"
exit 99;
fi
VAR1=$(echo "$1" | wc -c)
cat "$1" | while read line
do
[ $VAR1 -gt 10 ] && echo "$line" || echo "$line"|tr " " "\n"
done
Using sed
sed 's/........../&\n/g' file.txt
Using grep
grep -oE '.{1,10}' file.txt
Using dd
cat file.txt | dd cbs=10 conv=unblock 2>/dev/null
Using awk?
awk 'BEGIN {FS=""} {for (i=1; i<=NF; i++) if (i % 10 == 0) printf "%s\n", $i ; else if (i == NF) print "\n" ; else printf "%s", $i} ' inputs.txt
This works, but I have a feeling that this is not the most optimal way of using awk :-P

How can I specify a row in awk in for loop?

I'm using the following awk command:
my_command | awk -F "[[:space:]]{2,}+" 'NR>1 {print $2}' | egrep "^[[:alnum:]]"
which successfully returns my data like this:
fileName1
file Name 1
file Nameone
f i l e Name 1
So as you can see some file names have spaces. This is fine as I'm just trying to echo the file name (nothing special). The problem is calling that specific row within a loop. I'm trying to do it this way:
i=1
for num in $rows
do
fileName=$(my_command | awk -F "[[:space:]]{2,}+" 'NR==$i {print $2}' | egrep "^[[:alnum:]])"
echo "$num $fileName"
$((i++))
done
But my output is always null
I've also tried using awk -v record=$i and then printing $record but I get the below results.
f i l e Name 1
EDIT
Sorry for the confusion: rows is a variable that list ids like this 11 12 13
and each one of those ids ties to a file name. My command without doing any parsing looks like this:
id File Info OS
11 File Name1 OS1
12 Fi leNa me2 OS2
13 FileName 3 OS3
I can only use the id field to run a the command that I need, but I want to use the File Info field to notify the user of the actual File that the command is being executed against.
I think your $i does not expand as expected. You should quote your arguments this way:
fileName=$(my_command | awk -F "[[:space:]]{2,}+" "NR==$i {print \$2}" | egrep "^[[:alnum:]]")
And you forgot the other ).
EDIT
As an update to your requirement you could just pass the rows to a single awk command instead of a repeatitive one inside a loop:
#!/bin/bash
ROWS=(11 12)
function my_command {
# This function just emulates my_command and should be removed later.
echo " id File Info OS
11 File Name1 OS1
12 Fi leNa me2 OS2
13 FileName 3 OS3"
}
awk -- '
BEGIN {
input = ARGV[1]
while (getline line < input) {
sub(/^ +/, "", line)
split(line, a, / +/)
for (i = 2; i < ARGC; ++i) {
if (a[1] == ARGV[i]) {
printf "%s %s\n", a[1], a[2]
break
}
}
}
exit
}
' <(my_command) "${ROWS[#]}"
That awk command could be condensed to one line as:
awk -- 'BEGIN { input = ARGV[1]; while (getline line < input) { sub(/^ +/, "", line); split(line, a, / +/); for (i = 2; i < ARGC; ++i) { if (a[1] == ARGV[i]) {; printf "%s %s\n", a[1], a[2]; break; }; }; }; exit; }' <(my_command) "${ROWS[#]}"
Or better yet just use Bash instead as a whole:
#!/bin/bash
ROWS=(11 12)
while IFS=$' ' read -r LINE; do
IFS='|' read -ra FIELDS <<< "${LINE// +( )/|}"
for R in "${ROWS[#]}"; do
if [[ ${FIELDS[0]} == "$R" ]]; then
echo "${R} ${FIELDS[1]}"
break
fi
done
done < <(my_command)
It should give an output like:
11 File Name1
12 Fi leNa me2
Shell variables aren't expanded inside single-quoted strings. Use the -v option to set an awk variable to the shell variable:
fileName=$(my_command | awk -v i=$i -F "[[:space:]]{2,}+" 'NR==i {print $2}' | egrep "^[[:alnum:]])"
This method avoids having to escape all the $ characters in the awk script, as required in konsolebox's answer.
As you already heard, you need to populate an awk variable from your shell variable to be able to use the desired value within the awk script so thi:
awk -F "[[:space:]]{2,}+" 'NR==$i {print $2}' | egrep "^[[:alnum:]]"
should be this:
awk -v i="$i" -F "[[:space:]]{2,}+" 'NR==i {print $2}' | egrep "^[[:alnum:]]"
Also, though, you don't need awk AND grep since awk can do anything grep van do so you can change this part of your script:
awk -v i="$i" -F "[[:space:]]{2,}+" 'NR==i {print $2}' | egrep "^[[:alnum:]]"
to this:
awk -v i="$i" -F "[[:space:]]{2,}+" '(NR==i) && ($2~/^[[:alnum:]]/){print $2}'
and you don't need a + after a numeric range so you can change {2,}+ to just {2,}:
awk -v i="$i" -F "[[:space:]]{2,}" '(NR==i) && ($2~/^[[:alnum:]]/){print $2}'
Most importantly, though, instead of invoking awk once for every invocation of my_command, you can just invoke it once for all of them, i.e. instead of this (assuming this does what you want):
i=1
for num in rows
do
fileName=$(my_command | awk -v i="$i" -F "[[:space:]]{2,}" '(NR==i) && ($2~/^[[:alnum:]]/){print $2}')
echo "$num $fileName"
$((i++))
done
you can do something more like this:
for num in rows
do
my_command
done |
awk -F '[[:space:]]{2,}' '$2~/^[[:alnum:]]/{print NR, $2}'
I say "something like" because you don't tell us what "my_command", "rows" or "num" are so I can't be precise but hopefully you see the pattern. If you give us more info we can provide a better answer.
It's pretty inefficient to rerun my_command (and awk) every time through the loop just to extract one line from its output. Especially when all you're doing is printing out part of each line in order. (I'm assuming that my_command really is exactly the same command and produces the same output every time through your loop.)
If that's the case, this one-liner should do the trick:
paste -d' ' <(printf '%s\n' $rows) <(my_command |
awk -F '[[:space:]]{2,}+' '($2 ~ /^[::alnum::]/) {print $2}')

Resources