Write a shell script to calculate the number of employees - bash

I need to find the count of employees whose salary is less than the average salary of all employees.
The file with the employee details will be given as a command-line argument when the script is run.
Example:
Input: File:
empid;empname;salary
100;A;30000
102;B;45000
103;C;15000
104;D;40000
Output:
2
My solution:
f=`awk -v s=0 'BEGIN{FS=":"}{if(NR>1){s+=$3;row++}}END{print s/row}' $file`;
awk -v a="$f" 'BEGIN{FS=":"}{if(NR!=1 && $3<a)c++}END{print c}' $file;
This is what I have tried so far,
but the output comes out to be
0

This one-liner should solve the problem:
awk -F';' 'NR>1{e[$1]=$3;s+=$3}
END{avg=s/(NR-1);for(x in e)if(e[x]<avg)c++;print c}' file
If you run it with your example file, it will print:
2
Explanation:
NR>1 : skip the header
e[$1]=$3;s+=$3 : build a hashtable and sum the salaries
END{avg=s/(NR-1); : calculate the average
for(x in e)if(e[x]<avg)c++;print c : go through the hashtable, count the elements whose value is < avg, and print the count.

Could you please try the following.
awk '
BEGIN{
  FS=";"
}
FNR==NR{                ##First pass: total up the salary column and count rows.
  if(FNR>1)
  {
    total+=$NF
    count++
  }
  next
}
FNR==1{                 ##Second pass begins: compute the average once.
  avg=total/count
}
avg>$NF                 ##Print rows whose salary is below the average.
' Input_file Input_file

Your script is fine except it's setting FS=":"; it should be setting FS=";" since that is what is separating your fields in the input.
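For reference, here is the asker's two-pass approach with only that change applied, wrapped so it is self-contained (a sketch: the sample file is recreated under /tmp first, and the path is made up for the demo):

```shell
# Recreate the sample input from the question
cat > /tmp/employees.txt <<'EOF'
empid;empname;salary
100;A;30000
102;B;45000
103;C;15000
104;D;40000
EOF

file=/tmp/employees.txt
# First pass: average salary, skipping the header line
avg=$(awk 'BEGIN{FS=";"} NR>1{s+=$3; n++} END{print s/n}' "$file")
# Second pass: count salaries strictly below that average
awk -v a="$avg" 'BEGIN{FS=";"} NR>1 && $3<a {c++} END{print c+0}' "$file"
```

The average here is 32500, so the second awk prints 2 (employees A and C).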

avg=$(awk -F";" 'NR>1 { s+=$3;i++} END { print s/i }' f)
awk -v avg=$avg -F";" 'NR>1 && $3<avg' f
1) Ignore the header and compute the average, avg
2) Ignore the header and print each line whose salary is less than avg (pipe to wc -l if you only want the count)

file=$1
salary=`sed "s/;/ /" $file | sed "s/;/ /" | awk '{print $3}' | tail -n+2`
sum=0
n=0
for line in $salary
do
((sum+=line))
((n++))
done
avg=$((sum / n))   # note: integer division truncates the average
count=0
for line in $salary
do
if [ $line -lt $avg ]
then
((count++))
fi
done
echo "No. of Emp : $count"

Related

awk to calculate average of field in multiple text files and merge into one

I am trying to calculate the average of $2 in multiple text files in a directory and merge the output into one tab-delimited output file. The output file has two fields, in which $1 is the file name prefix extracted into pref, and $2 is the calculated average with one decimal. There is also a header in the output: Sample in $1 and Percent in $2. The below seems close, but I am missing a few things (adding the header to the output, merging into one tab-delimited file, and the rounding) that I do not know how to do yet, and I am not getting the desired output. Thank you :).
123_base.txt
AASS 99.81
ABAT 100.00
ABCA10 0.0
456_base.txt
ABL2 97.81
ABO 100.00
ACACA 99.82
desired output (tab-delimeted)
Sample Percent
123 66.6
456 99.2
Bash
for f in /home/cmccabe/Desktop/20x/percent/*.txt ; do
bname=$(basename $f)
pref=${bname%%_base_*.txt}
awk -v OFS='\t' '{ sum += $2 } END { if (NR > 0) print sum / NR }' $f /home/cmccabe/Desktop/NGS/bed/bedtools/IDP_total_target_length_by_panel/IDP_unix_trim_total_target_length.bed > /home/cmccabe/Desktop/20x/coverage/${pref}_average.txt
done
This one uses GNU awk, which provides handy BEGINFILE and ENDFILE events:
gawk '
BEGIN {print "Sample\tPercent"}
BEGINFILE {sample = FILENAME; sub(/_.*/,"",sample); sum = n = 0}
{sum += $2; n++}
ENDFILE {printf "%s\t%.1f\n", sample, sum/n}
' 123_base.txt 456_base.txt
If you're giving a pattern with the directory attached, I'd get the sample name like this:
match(FILENAME, /^.*\/([^_]+)/, m); sample = m[1]
and then, yes this is OK: gawk '...' /path/to/*_base.txt
And to guard against division by zero, inspired by James Brown's answer:
ENDFILE {printf "%s\t%.1f\n", sample, n==0 ? 0 : sum/n}
with perl
$ perl -ane '
BEGIN{ print "Sample\tPercent\n" }
$c++; $sum += $F[1];
if(eof)
{
($pref) = $ARGV=~/(.*)_base/;
printf "%s\t%.1f\n", $pref, $sum/$c;
$c = 0; $sum = 0;
}' 123_base.txt 456_base.txt
Sample Percent
123 66.6
456 99.2
print header using BEGIN block
-a option would split input line on spaces and save to #F array
For each line, increment counter and add to sum variable
If end of file eof is detected, print in required format
$ARGV contains current filename being read
If full path of filename is passed but only filename should be used to get pref, then use this line instead
($pref) = $ARGV=~/.*\/\K(.*)_base/;
In awk. Notice printf "%3.3s", which truncates the filename after the 3rd char:
$ cat ave.awk
BEGIN {print "Sample", "Percent"} # header
BEGINFILE {s=c=0} # at the start of every file reset
{s+=$2; c++} # sum and count hits
ENDFILE{if(c>0) printf "%3.3s%s%.1f\n", FILENAME, OFS, s/c}
# above output if more than 0 lines
Run it:
$ touch empty_base.txt # test for division by zero
$ awk -f ave.awk 123_base.txt 456_base.txt empty_base.txt
Sample Percent
123 66.6
456 99.2
another awk
$ awk -v OFS='\t' '{f=FILENAME;sub(/_.*/,"",f);
a[f]+=$2; c[f]++}
END{print "Sample","Percent";
for(k in a) print k, sprintf("%.1f",a[k]/c[k])}' {123,456}_base.txt
Sample Percent
456 99.2
123 66.6

awk custom printf command generation

TL;DR - I have a variable ($temp) which looks like a format specifier, and I need to use it with awk printf.
So by doing this:
awk '-v foo="$temp" {....{printf foo} else {print $_}}' tempfile1.txt > tmp && mv tmp tempfile1.txt
Bash should see this:
awk '{.....{printf "%-5s %-6s %...\n", $1, $2, $3....} else {print $_}}' tempfile1.txt > tmp && mv tmp tempfile1.txt
Sample Input:
col1 col2 col3
aourwyo5637[dfs] 67tyd 8746.0000
tomsd - 4
o938743[34* 0 834.92
.
.
.
Expected Output:
col1 col2 col3
aourwyo5637[dfs] 67tyd 8746.0000
tomsd - 4
o938743[34* 0646sggg 834.92
.
.
.
Long Version
I am new to scripting and after over 5 hours of scouring the internet and doing what I believe is a patchwork of information, I have hit a brick wall.
Scenario:
So I have multiple random tables I need to open in a directory. I do not know anything about a given table, except that I need to format all data on line 4 and on all lines after line 14 of the file.
I need to make a custom printf command in awk on the fly so the padding for each column is equal to a value (say 5 SPACES) so the table looks pretty once I open it up.
This is what I have come up with so far:
awk '{
for (i=1;i<=NF;i++)
{
max_length=length($i);
if ( max_length > linesize[i] )
{
linesize[i]=max_length+5;
}
}}
END{
for (i = 1; i<=length(linesize); i++)
{
print linesize[i] >> "tempfile1.txt"
}
}' file1.txt
# remove all blank lines in tempfile1.txt
awk 'NF' tempfile1.txt > tmp && mv tmp tempfile1.txt
# Get number of entries in tempfile1.txt
n=`awk 'END {print NR}' tempfile1.txt`
# This for loop generates the pattern I need for the printf command
declare -a a
for((i=0;i<$n;i++))
do
a[i]=`awk -v i=$((i+1)) 'FNR == i {print}' tempfile1.txt`
temp+=%-${a[i]}s' '
temp2+='$'$((i+1))', '
#echo "${a[$i]}";
#echo "$sub"
done
temp='"'${temp::-2}'\n", '
# echo "$temp"
temp=$temp${temp2::-2}''
# echo "$temp"
awk <something here>
# Tried the one below and it gave an error
awk -v tem="$temp" '{printf {tem}}
So ideally what I would like is the awk command is to look like this by simply putting the bash variable temp in the awk command.
So by doing this:
awk '-v foo="$temp" {if(FNR >=14 || FNR == 4) {printf foo} else {print $_}}' tempfile1.txt > tmp && mv tmp tempfile1.txt
Bash should see this:
awk '{if(FNR >=14 || FNR == 4) {printf "%-5s %-6s %...\n", $1, $2, $3....} else {print $_}}' tempfile1.txt > tmp && mv tmp tempfile1.txt
It sounds like this MIGHT be what you want but it's still hard to tell from your question:
$ cat tst.awk
BEGIN { OFS=" " }
NR==FNR {
for (i=1;i<=NF;i++) {
width[i] = (width[i] > length($i) ? width[i] : length($i))
}
next
}
{
for (i=1;i<=NF;i++) {
printf "%-*s%s", width[i], $i, (i<NF?OFS:ORS)
}
}
$ awk -f tst.awk file file
col1 col2 col3
aourwyo5637[dfs] 67tyd 8746.0000
tomsd - 4
o938743[34* 0 834.92
I ran it against the sample input from your question after removing all the spurious .s.
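The %-*s conversion used in tst.awk takes its field width from the preceding argument rather than from the format string itself (supported by gawk and most other modern awks); a minimal demo:

```shell
# Left-justify "abc" in a field whose width (10) is passed as an argument
awk 'BEGIN { w = 10; printf "[%-*s]\n", w, "abc" }'
# prints [abc       ]
```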
# Tried the one below and it gave an error
awk -v tem="$temp" '{printf {tem}}
' at end of line is missing
{tem} is wrong; just write tem
printf's , expr-list is missing
\n is missing
Corrected:
awk -v tem="$temp" "{printf tem\"\n\", $temp2 0}"
or
awk "{printf \"$temp\n\", $temp2 0}"
(simpler).
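If splicing shell variables into the awk source feels fragile, an alternative sketch is to pass the computed widths in as a single -v string and loop over the fields. The widths below are made-up values standing in for the ones computed in the first pass:

```shell
widths="18 7 10"   # hypothetical column widths from the first pass
printf 'aourwyo5637[dfs] 67tyd 8746.0000\n' |
awk -v w="$widths" '
BEGIN { split(w, width, " ") }       # width[1], width[2], ...
{
  for (i = 1; i <= NF; i++)
    printf "%-*s", width[i], $i      # pad each field to its width
  print ""
}'
```

This avoids building awk source from data, since w is only ever treated as a string.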

awk script: find and print max value and file name containing max value

I am trying to find the file within a dir that contains the largest number (at row 3, column 3). I want both the max value and the file name containing it printed. This is what I have right now:
find ./ -name sample-file\* -exec sed '3!d' {} \; | awk '{print $3}' | awk 'n<$1{n=$1}END{print n}'
This gets me the max value, but I also want the file name containing the max value printed along with it.
Current output:
When run for dir1:
487987
When run for dir2:
9879889
I want the output to be like this
when run for dir1:
file16 487987
when run for dir2:
file23 9879889
I'd appreciate some input on this. Thanks.
awk script:
BEGIN {
n = 0
fn = ""
}
(FNR == 3) && ($3 > n) {
n = $3
fn = FILENAME
}
END {
printf "%s: %s\n", fn, n
}
use as awk -f <file.awk> sample-file*.
Could probably be more efficient with nextfile after the fn assignment in the FNR block or similar mechanisms to short-circuit the rest of the other lines in each input file.
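That nextfile variant could look like this (a sketch; nextfile is supported by gawk and most current awks, and the sample files here are fabricated under /tmp to match the question's numbers):

```shell
# Fabricated sample files: the 3rd-row/3rd-column max lives in the second one
printf '1 1 1\n2 2 2\n3 3 487987\n'  > /tmp/sample-file16
printf '1 1 1\n2 2 2\n3 3 9879889\n' > /tmp/sample-file23

awk '
FNR == 3 && $3 > n { n = $3; fn = FILENAME }  # new running max
FNR >= 3           { nextfile }               # nothing else to read here
END { printf "%s %s\n", fn, n }
' /tmp/sample-file16 /tmp/sample-file23
# prints: /tmp/sample-file23 9879889
```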
zcat and shell
declare n=0 fn=
for i in sample-file*; do
t=$(zcat "$i" | awk 'NR == 3 {print $3; exit}')
if [ $t -gt $n ]; then
n=$t
fn=$i
fi
done
echo "$fn: $n"

using awk to list out report according to Type

I am trying to use awk to do a payroll report but I am not very sure how to go about it. I have tried the following but it doesn't seem to be working properly. I am stuck because the code I have written manages to pick out "Salaried" but still lists the other data instead of only the name and pay.
EDIT: I've tried out the calculation part, but I do not know how it works either.
I need the result to show as:
1) sorted by type: "Salaried", "Hourly", and "Commissioned"
eg:
Salaried:
Frank $2333
Mary $1111
Total salary: $3444
----------------------
Grand Total: $3444
code:
echo "***** payroll report ****"
awk -F',' '{print $2}' | grep "Salaried" $PAYROLL
totalcost=0
salariedcost=0
for i in `grep $j $PAYROLL | cut -d "," -f6`
do
let "salariedcost = salariedcost + $i"
done
echo "Salaried Cost: \$${salariedcost}"
let "totalcost = totalcost + salariedcost"
echo "Total Cost: \$$totalcost"
echo -en "Hit [Enter] to return to main menu..."
read
.txt file :
Fields in sequence: [id],[name],[title],[phone],[type],[pay]
3,Frank,CFO,91111453,Salaried,2333
1,Mary,CEO,93424222,Salaried,1111
5,John,Sales user,9321312,Commission,9999
7,Chris,Admin,98888753,Hourly[122]
Try using awk
awk -F, 'BEGIN {print "Salaried:"} $5=="Salaried"{total+=$6; printf "%s\t$%s\n", $2, $6} END {printf "Total salary: $%s", total}' $PAYROLL
Output:
Salaried:
Frank $2333
Mary $1111
Total salary: $3444
awk -F',' '{print $2}' | grep "Salaried" $PAYROLL
This tells grep to open the file named in $PAYROLL, search for the string Salaried, and print the full lines when it finds it. grep then exits, and awk is killed by a SIGPIPE. What you were probably going for:
awk -F, '{print $2}' "$PAYROLL" | grep Salaried
Note slight changes to quoting.
But awk does pattern matching just like grep:
awk -F, '/Salaried/{print $2}' "$PAYROLL"
For the whole program you'll want something like this:
awk -F, '
# Before processing the first line, print out the header
BEGIN {
print "Salaried:"
}
# Lines matching Salaried
/Salaried/ {
# Print name <tab> salary
print $2 "\t$" $6
# Add their salary to our salary total
salaries += $6
}
# Every line, add cost to total
{
total += $6
}
# After processing all lines
END {
# Print the salary total, separator, and grand total.
print "Total Salary: $" salaries
print "--------------------"
print "Grand total: $" total
}' file.txt

Filter a file using shell script tools

I have a file which contents are
E006:Jane:HR:9800:Asst
E005:Bob:HR:5600:Exe
E002:Barney:Purc:2300:PSE
E009:Miffy:Purc:3600:Mngr
E001:Franny:Accts:7670:Mngr
E003:Ostwald:Mrktg:4800:Trainee
E004:Pearl:Accts:1800:SSE
E009:Lala:Mrktg:6566:SE
E018:Popoye:Sales:6400:QAE
E007:Olan:Sales:5800:Asst
I want to list all employees whose emp codes are between E001 and E018, using a command including pipes. Is it possible to do this?
Use sed:
sed -n -e '/^E001:/,/^E018:/p' data.txt
That is, print the lines that are literally between those lines that start with E001 and E018.
If you want to get the employees that are numerically between those, one way to do that would be to do comparisons inline using something like awk (as suggested by hochl). Or, you could take this approach preceded by a sort (if the lines are not already sorted).
sort data.txt | sed -n -e '/^E001:/,/^E018:/p'
You can use awk for such cases:
$ gawk 'BEGIN { FS=":" } /^E([0-9]+)/ { n=substr($1, 2)+0; if (n >= 6 && n <= 18) { print } }' < data.txt
E006:Jane:HR:9800:Asst
E009:Miffy:Purc:3600:Mngr
E009:Lala:Mrktg:6566:SE
E018:Popoye:Sales:6400:QAE
E007:Olan:Sales:5800:Asst
Is that the result you want? This example intentionally only prints employees between 6 and 18 to show that it filters out records. You may print some fields only using $1 or $2 as in print $1 " " $2.
You can try something like this: cut -b2- | awk '{ if ($1 < 18) print "E" $0 }'
Just do string comparison. Since all your sample data matches, I changed the boundaries for illustration:
awk -F: '"E004" <= $1 && $1 <= "E009" {print}'
output
E006:Jane:HR:9800:Asst
E005:Bob:HR:5600:Exe
E009:Miffy:Purc:3600:Mngr
E004:Pearl:Accts:1800:SSE
E009:Lala:Mrktg:6566:SE
E007:Olan:Sales:5800:Asst
You can pass the strings as variables if you don't want to hard-code them in the awk script
awk -F: -v start=E004 -v stop=E009 'start <= $1 && $1 <= stop {print}'
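A self-contained run of that variable-driven version against a few of the sample rows (a sketch; the /tmp data file path is made up for the demo):

```shell
cat > /tmp/emps.txt <<'EOF'
E006:Jane:HR:9800:Asst
E005:Bob:HR:5600:Exe
E002:Barney:Purc:2300:PSE
E009:Miffy:Purc:3600:Mngr
EOF

# String comparison on the first field keeps E004..E009 inclusive;
# E002 falls outside the range and is filtered out
awk -F: -v start=E004 -v stop=E009 'start <= $1 && $1 <= stop' /tmp/emps.txt
```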
