Can somebody experienced have a look at my bash script and advise how to make it simpler? - bash

My task is to create a script that displays the frequency of characters in a given file. The output should show the frequency of each letter from a to z (case-insensitive) as a percentage.
I created the script below and I just wonder whether there is a way to make it simpler.
#!/bin/bash
echo Hello, please tell me in which file shall I count the letters:
read file
TOTAL=$( grep -o '[[:alpha:]]' "$file" | wc -l )
A=$( grep -io a "$file" | wc -l )
B=$( grep -io b "$file" | wc -l )
C=$( grep -io c "$file" | wc -l )
D=$( grep -io d "$file" | wc -l )
E=$( grep -io e "$file" | wc -l )
F=$( grep -io f "$file" | wc -l )
G=$( grep -io g "$file" | wc -l )
H=$( grep -io h "$file" | wc -l )
I=$( grep -io i "$file" | wc -l )
J=$( grep -io j "$file" | wc -l )
K=$( grep -io k "$file" | wc -l )
L=$( grep -io l "$file" | wc -l )
M=$( grep -io m "$file" | wc -l )
N=$( grep -io n "$file" | wc -l )
O=$( grep -io o "$file" | wc -l )
P=$( grep -io p "$file" | wc -l )
Q=$( grep -io q "$file" | wc -l )
R=$( grep -io r "$file" | wc -l )
S=$( grep -io s "$file" | wc -l )
T=$( grep -io t "$file" | wc -l )
U=$( grep -io u "$file" | wc -l )
V=$( grep -io v "$file" | wc -l )
W=$( grep -io w "$file" | wc -l )
X=$( grep -io x "$file" | wc -l )
Y=$( grep -io y "$file" | wc -l )
Z=$( grep -io z "$file" | wc -l )
echo Frequency of 'a': $(($A*100/$TOTAL))%
echo Frequency of 'b': $(($B*100/$TOTAL))%
echo Frequency of 'c': $(($C*100/$TOTAL))%
echo Frequency of 'd': $(($D*100/$TOTAL))%
echo Frequency of 'e': $(($E*100/$TOTAL))%
echo Frequency of 'f': $(($F*100/$TOTAL))%
echo Frequency of 'g': $(($G*100/$TOTAL))%
echo Frequency of 'h': $(($H*100/$TOTAL))%
echo Frequency of 'i': $(($I*100/$TOTAL))%
echo Frequency of 'j': $(($J*100/$TOTAL))%
echo Frequency of 'k': $(($K*100/$TOTAL))%
echo Frequency of 'l': $(($L*100/$TOTAL))%
echo Frequency of 'm': $(($M*100/$TOTAL))%
echo Frequency of 'n': $(($N*100/$TOTAL))%
echo Frequency of 'o': $(($O*100/$TOTAL))%
echo Frequency of 'p': $(($P*100/$TOTAL))%
echo Frequency of 'q': $(($Q*100/$TOTAL))%
echo Frequency of 'r': $(($R*100/$TOTAL))%
echo Frequency of 's': $(($S*100/$TOTAL))%
echo Frequency of 't': $(($T*100/$TOTAL))%
echo Frequency of 'u': $(($U*100/$TOTAL))%
echo Frequency of 'v': $(($V*100/$TOTAL))%
echo Frequency of 'w': $(($W*100/$TOTAL))%
echo Frequency of 'x': $(($X*100/$TOTAL))%
echo Frequency of 'y': $(($Y*100/$TOTAL))%
echo Frequency of 'z': $(($Z*100/$TOTAL))%
I considered using a for loop, as in the script below, to replace the first part of the script above... but then I got stuck, because I do not know whether there is any way to work with those outputs further.
#!/bin/bash
echo File:
read file
TOTAL=$( grep -o '[[:alpha:]]' "$file" | wc -l )
for letter in {a..z}
do grep -io "$letter" "$file" | wc -l
done
I would also like to know whether there is a way to make my script output the percentages with two decimal places.
This is my first script, so please be merciful :) I will be grateful for any feedback or advice on how to get better.

You were almost there! Here's a solution with 2 variants, depending on the output you want and whether you want to use bc.
#!/bin/bash
echo File:
read file
TOTAL=$( grep -o "[[:alpha:]]" "$file" | wc -l )
for letter in {a..z}
do
count=$(grep -io $letter "$file" | wc -l)
echo "Frequency of $letter : $(bc <<< "scale=2; $count*100/$TOTAL")%" # Variant with floats, requires bc
echo "Frequency of $letter : $(($count*100/$TOTAL))%" # Variant with integers
done
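If bc is not available, a pure-bash sketch can still get two decimal places from integer arithmetic by scaling the percentage by 100 before dividing. The helper name percent2 is hypothetical; like bc with scale=2 it truncates rather than rounds, and it assumes TOTAL is greater than zero.

```bash
#!/bin/bash
# Hypothetical helper: print count/total as a percentage with two decimals,
# using only bash integer arithmetic. Truncates, like bc with scale=2.
# Assumes both arguments are non-negative integers and total > 0.
percent2() {
    local count=$1 total=$2
    local scaled=$(( count * 10000 / total ))   # percentage times 100
    printf '%d.%02d\n' $(( scaled / 100 )) $(( scaled % 100 ))
}
```

Inside the loop it would replace the bc call: echo "Frequency of $letter : $(percent2 "$count" "$TOTAL")%".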

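You can also use awk inside your bash script. Setting FS to an empty string splits each line into single characters; GNU awk and mawk support this, although POSIX leaves it unspecified. Only letters are added to sum, so the percentages are relative to the number of letters, as in the question:
awk -v FS="" '{ for (i = 1; i <= NF; i++) if ($i ~ /[a-zA-Z]/) { w[tolower($i)]++; sum++ } } END { for (i in w) printf "%s %.2f%%\n", i, 100 * w[i] / sum }' "$file" | sort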
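The grep loop in the first answer still reads the file 26 times, once per letter. As an alternative sketch (not taken from either answer), the file can be read once: grep -o prints one letter per line, tr folds the case, and sort | uniq -c counts each letter, with a bash 4 associative array holding the results. The function name letter_freq is hypothetical, and it only counts the ASCII letters A-Z/a-z, so its total can differ from the question's [[:alpha:]] total in locales where accented letters count as alphabetic.

```bash
#!/bin/bash
# Hypothetical single-pass letter counter (needs bash 4+ for local -A).
# Counts only ASCII letters A-Z/a-z, case-insensitively, and prints each
# letter's share of that total with two decimals (truncated).
letter_freq() {
    local file=$1 total=0 count letter scaled
    local -A freq=()
    # grep -o: one letter per line; tr: fold to lowercase; uniq -c: count each
    while read -r count letter; do
        freq[$letter]=$count
        total=$(( total + count ))
    done < <(LC_ALL=C grep -o '[A-Za-z]' "$file" | tr 'A-Z' 'a-z' | sort | uniq -c)

    if (( total == 0 )); then
        echo "No letters found in $file" >&2
        return 1
    fi
    for letter in {a..z}; do
        # Letters that never occur have no entry, so default to 0
        scaled=$(( ${freq[$letter]:-0} * 10000 / total ))   # percentage times 100
        printf 'Frequency of %s: %d.%02d%%\n' "$letter" $(( scaled / 100 )) $(( scaled % 100 ))
    done
}
```

Usage: letter_freq "$file".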

Related

Counting the number of lines of many files (only .h, .c and .py files) in a directory using bash

I'm asked to write a bash script that counts the number of lines in the C files (.h and .c) and Python files (.py) grouped in a single directory. I've already tried the code below, but my calculation is always wrong:
let "sum = 0"
let "sum = sum + $(wc -l $1/*.c | tail --lines=1 | tr -dc '0-9')"
let "sum = sum + $(wc -l $1/*.h | tail --lines=1 | tr -dc '0-9')"
let "sum = sum + $(wc -l $1/*.py | tail --lines=1 | tr -dc '0-9')"
echo $sum >> manifest.txt
I must write the total to the "manifest.txt" file, and the argument of my script is the path to the directory that contains the files.
If someone has another technique to compute this, I'd be very grateful.
Thank you!
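One plausible reason the total comes out wrong (an observation about the code above, not something stated in the question): tr -dc '0-9' keeps every digit on the line it receives. With two or more matching files, the last line of wc -l is "COUNT total", which is harmless, but when exactly one file matches there is no total line, so any digits in the path get glued onto the count. When no file matches, wc prints only an error, the substitution is empty and let fails. The sample wc line below is made up for illustration.

```bash
#!/bin/bash
# Made-up example of what "wc -l dir/*.c | tail -n 1" can print when only
# one file matches: there is no "total" line, just "COUNT PATH".
last_line='12 project2/main3.c'

# tr -dc '0-9' deletes everything except digits, path digits included
echo "$last_line" | tr -dc '0-9'; echo      # prints 1223
# awk keeps only the first whitespace-separated field, i.e. the count
echo "$last_line" | awk '{ print $1 }'      # prints 12
```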
You could also use a loop to aggregate the counts:
extensions=("*.h" "*.c" "*.py")
sum=0
for ext in "${extensions[@]}" ; do
count=$(wc -l "${1}"/${ext} | awk 'END { print $1 }')
sum=$((sum+count))
done
echo "${sum}"
Version 1: step by step
#!/bin/bash
echo "Counting the total number of lines for all .c .h .py files in $1"
sum=0
num_py=$(wc -l "$1"/*.py | tail -n 1 | awk '{ print $1 }')
num_c=$(wc -l "$1"/*.c | tail -n 1 | awk '{ print $1 }')
num_h=$(wc -l "$1"/*.h | tail -n 1 | awk '{ print $1 }')
sum=$(($num_py + $num_c + $num_h))
echo $sum >> manifest.txt
Version 2: concise
#!/bin/bash
echo "Counting the total number of lines for all .c .h .py files in $1"
echo "$(( $(wc -l "$1"/*.py | tail -n 1 | awk '{ print $1 }') + $(wc -l "$1"/*.c | tail -n 1 | awk '{ print $1 }') + $(wc -l "$1"/*.h | tail -n 1 | awk '{ print $1 }') ))" >> manifest.txt
Version 3: loop over the desired files
#!/bin/bash
echo "Counting the total number of lines for all .c .h .py files in $1"
sum=0
for sfile in "$1"/*.{c,h,py}; do
[ -e "$sfile" ] || continue   # skip patterns that matched no file
sum=$((sum + $(wc -l < "$sfile")))
done
echo $sum >> manifest.txt
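This is how arithmetic assignment works: var=$((EXPR)), with no spaces around = and no $ in front of the variable being assigned.
For example: sum=$(($sum + $result))
Inside EXPR the $ before a plain variable name is optional, so sum=$((sum + result)) works too. It is very common to put a $ or spaces on the left-hand side by mistake! Try to avoid that :)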
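A quick sketch of the assignment forms that do work (the values are arbitrary):

```bash
#!/bin/bash
# Arithmetic assignment in bash: no spaces around =, no $ on the left.
sum=0
result=5
sum=$((sum + result))       # plain names are expanded inside $(( ))
sum=$(($sum + $result))     # the $ form gives the same result
((sum += result))           # arithmetic command: updates sum in place
echo "$sum"                 # prints 15
```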
This is the script that I use (with minor modifications):
files=( $(find . -mindepth 1 -maxdepth 1 -type f \( -iname "*.h" -o -iname "*.c" -o -iname "*.py" \)) )
declare -i total=0
for file in "${files[@]}"; do
lines="$(wc -l < "$file")"
echo -e "${lines}\t${file}"
total+="$lines"
done
echo -e "\n$total\ttotal"
Here is my version.
#!/usr/bin/env bash
shopt -s extglob nullglob
files=( "$1"/*.@(c|h|py) )
shopt -u extglob nullglob
sum=0
while IFS= read -rd '' file_name; do
count=$(wc -l < "$file_name")
((sum+=count))
done< <(printf '%s\0' "${files[@]}")
echo "$sum" > manifest.txt
It still needs some error checking, for example whether the argument exists at all and whether it is a directory.
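A sketch of that error checking, assuming only files directly inside the directory should be counted. The function name count_source_lines is hypothetical; nullglob makes a pattern with no matches expand to nothing, so a directory without, say, any .py file does not cause an error.

```bash
#!/bin/bash
# Hypothetical helper: print the total line count of the .c, .h and .py
# files directly inside directory $1, after validating the argument.
count_source_lines() {
    local dir=$1 total=0 f
    local -a files
    if [[ -z $dir ]]; then
        echo "usage: count_source_lines DIRECTORY" >&2
        return 2
    fi
    if [[ ! -d $dir ]]; then
        echo "error: '$dir' does not exist or is not a directory" >&2
        return 1
    fi
    shopt -s nullglob                 # patterns with no match expand to nothing
    files=( "$dir"/*.c "$dir"/*.h "$dir"/*.py )
    shopt -u nullglob                 # switch it off again afterwards
    for f in "${files[@]}"; do
        [[ -f $f ]] || continue       # skip e.g. a directory named foo.c
        total=$(( total + $(wc -l < "$f") ))
    done
    echo "$total"
}
```

Usage in the script: count_source_lines "$1" > manifest.txt || exit 1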

Expression using grep is giving all zeros

I have an expression that I use to extract some lines from a text file and count them. I can grep them as follows:
$ cat medsCounts_totals.csv | grep -E 'NumMeds": 0' | wc -l
That works fine. Now I want to loop over the values and build the search string inside the loop:
$ for i in {0..10}; do expr="NumMeds\": $i"; echo $expr; done
However, when I try to use $expr in the grep:
for i in {0..10}; do expr="NumMeds:\" $i"; cat medsCounts_totals.csv | grep -E "$expr" | wc -l ; done
I get only zeros. How do I solve this problem in an elegant manner?
There is a typo in
for i in {0..10}; do expr="NumMeds:\" $i"; cat medsCounts_totals.csv | grep -E "$expr" | wc -l ; done
The escaped quote has to come before the colon, as in your working grep, so the string should be
"NumMeds\": $i"
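The corrected loop could look like the sketch below. Two changes go beyond the typo fix and are additions, not part of the answer above: grep -c replaces cat | grep | wc -l, and -w (whole-word match) keeps "NumMeds": 1 from also counting lines that contain "NumMeds": 10, which the original pattern would do. The function name count_nummeds is hypothetical.

```bash
#!/bin/bash
# Hypothetical sketch of the corrected loop.
# -F: fixed string, -w: whole-word match, -c: count matching lines.
count_nummeds() {
    local file=$1 i
    for i in {0..10}; do
        printf '%s\t%s\n' "$i" "$(grep -cwF "NumMeds\": $i" "$file")"
    done
}
```

Usage: count_nummeds medsCounts_totals.csv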

BASH: Remove newline for multiple commands

I need some help. I want the result to be
UP:N%:N%
but the current result is
UP:N%
:N%
This is the code:
#!/bin/bash
UP=$(pgrep mysql | wc -l);
if [ "$UP" -ne 1 ];
then
echo -n "DOWN"
else
echo -n "UP:"
fi
df -hl | grep 'sda1' | awk ' {percent+=$5;} END{print percent"%"}'| column -t && echo -n ":"
top -bn2 | grep "Cpu(s)" | \sed "s/.*, *\([0-9.]*\)%* id.*/\1/" | \awk 'END{print 100 - $1"%"}'
You can use command substitution for the df pipeline, because command substitution removes the trailing newline (note that this runs the pipeline in a subshell):
echo -n $(df -hl | grep 'sda1' | awk ' {percent+=$5;} END{print percent"%"}'| column -t ):
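A sketch of the whole script built on the same idea: each pipeline is captured with command substitution, which strips its trailing newline, and a single printf joins the pieces. The function name status_line is hypothetical; it keeps the question's pipelines, drops column -t (there is nothing to align), and prints DOWN: instead of DOWN so both cases share the same layout.

```bash
#!/bin/bash
# Hypothetical sketch: build "STATE:DISK%:CPU%" on a single line.
status_line() {
    local state disk cpu
    if [ "$(pgrep mysql | wc -l)" -eq 1 ]; then
        state=UP
    else
        state=DOWN
    fi
    # Each $(...) drops the trailing newline of its pipeline
    disk=$(df -hl | grep 'sda1' | awk '{ percent += $5 } END { print percent "%" }')
    cpu=$(top -bn2 | grep "Cpu(s)" | sed "s/.*, *\([0-9.]*\)%* id.*/\1/" | awk 'END { print 100 - $1 "%" }')
    printf '%s:%s:%s\n' "$state" "$disk" "$cpu"
}
```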

How to print the number of occurrences of a word in a file in Unix

This is my shell script.
Given a directory and a word, search the directory and print the absolute path of the file that has the most occurrences of the word, along with the number of occurrences.
I have written the following script
#!/bin/bash
if [[ -n $(find / -type d -name $1 2> /dev/null) ]]
then
echo "Directory exists"
x=` echo " $(find / -type d -name $1 2> /dev/null)"`
echo "$x"
cd $x
y=$(find . -type f | xargs grep -c $2 | grep -v ":0"| grep -o '[^/]*$' | sort -t: -k2,1 -n -r )
echo "$y"
else
echo "Directory does not exist"
fi
Usage: scriptname directoryname word
output: /somedirectory/vtb/wordsearch : 4
/foo/bar: 3
Is there an option to replace xargs grep -c $2? grep -c prints the number of lines that contain the word, but I need the exact number of occurrences of the word in each file in the given directory.
Using grep's -c count feature:
grep -c "SEARCH" /path/to/files* | sort -rn -t : -k 2 | head -n 1
The grep command outputs each file in /path/name:count format, and sort sorts numerically (-n) by the 2nd field (-k 2), delimited by a colon (-t :), in reverse order (-r). We then use head to keep the first result (-n 1). Note that -c counts matching lines, so a line containing the word twice is counted once.
Try this:
grep -o -w 'foo' bar.txt | wc -w
OR, for all files under a directory:
grep -o -w -r 'word' /path/to/dir/ | wc -l
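To see the difference the question asks about, the hypothetical helper below prints both numbers for one file: grep -c counts matching lines, whereas grep -o prints every match on its own line, so counting those lines gives the number of occurrences. The tr -d ' ' only strips the padding some wc implementations add.

```bash
#!/bin/bash
# Hypothetical helper: compare line count and occurrence count of a word.
line_vs_occurrence() {
    local word=$1 file=$2
    local lines occurrences
    lines=$(grep -cw -- "$word" "$file")                            # matching lines
    occurrences=$(grep -ow -- "$word" "$file" | wc -l | tr -d ' ')  # one line per match
    printf 'lines=%s occurrences=%s\n' "$lines" "$occurrences"
}
```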
grep -Fwor "$word" "$dir" | sed "s/:${word}\$//" | sort | uniq -c | sort -n | tail -1
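The one-liner above, spelled out as a function with one comment per stage. The name max_word_file is hypothetical, and -- is added so that a word starting with - is not read as an option. It prints the highest count followed by the file name.

```bash
#!/bin/bash
# Hypothetical wrapper around the one-liner above.
max_word_file() {
    local word=$1 dir=$2
    # -r recurse, -o one line per match, -w whole words, -F fixed string;
    # each output line looks like "path/to/file:word"
    grep -Fwor -- "$word" "$dir" |
        sed "s/:${word}\$//" |   # strip ":word", leaving only the file name
        sort | uniq -c |         # count matches per file
        sort -n | tail -n 1      # the file with the most matches sorts last
}
```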

$(shell …) command in Makefile is not executed correctly, but works in bash/sh

I want to count the number of nodes in a graphviz file in a Makefile to use it to start a process for each node.
When I run
grep -- -\> graph.gv | while read line; do for w in $line; do echo $w; done; done | grep [Aa-Zz] | sort | uniq | wc -l
in the shell, it prints the number of nodes as expected.
However, when I use it in my Makefile
NODES := $(shell grep -- -\> graph.gv | while read line; do for w in $line; do echo $w; done; done | grep [Aa-Zz] | sort | uniq | wc -l)
${NODES} is always 0.
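You'll need to escape each $ sign as $$. Otherwise make expands $line and $w itself before the shell sees them: $line becomes the empty make variable $l followed by "ine", and $w becomes empty, so every echo prints an empty line and wc -l reports 0. Say: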
NODES := $(shell grep -- -\> graph.gv | while read line; do for w in $$line; do echo $$w; done; done | grep [Aa-Zz] | sort | uniq | wc -l)
