Bash script that analyzes report files - bash

I have the following bash script which I will use to analyze all report files in the current directory:
#!/bin/bash
# methods
analyzeStructuralErrors()
{
# do something with $1
}
# main
reportFiles=`find $PWD -name "*_report*.txt"`;
for f in $reportFiles
do
echo "Processing $f"
analyzeStructuralErrors $f
done
My report files are formatted as such:
Error Code for Issue X - Description Text - Number of errors.
col1_name,col2_name,col3_name,col4_name,col5_name,col6_name
1143-1-1411-247-1-72953-1
1143-2-1411-247-436-72953-1
2211-1-1888-204-442-22222-1
Error Code for Issue Y - Description Text - Number of errors.
col1_name,col2_name,col3_name,col4_name,col5_name,col6_name
Other data
.
.
.
I'm looking for a way to go through each file and aggregate the report data. In the above example, we have two unique issues of type X, which I would like to handle in analyzeStructural. Other types of issues can be ignored in this routine. Can anyone offer advice on how to do this? I want to read each line until I hit the next error basically, and put that data into some kind of data structure.

Below is a working awk implementation that uses its pseudo multidimensional arrays. I've included sample output to show you how it looks. I took the liberty of adding a 'Count' column to show how many times a certain "Issue" was hit for a given Error Code.
#!/bin/bash
awk '
/Error Code for Issue/ {
errCode[currCode=$5]=$5
}
/^ +[0-9-]+$/ {
split($0, tmpArr, "-")
error[errCode[currCode],tmpArr[1]]++
}
END {
for (code in errCode) {
printf("Error Code: %s\n", code)
for (item in error) {
split(item, subscr, SUBSEP)
if (subscr[1] == code) {
printf("\tIssue: %s\tCount: %s\n", subscr[2], error[item])
}
}
}
}
' *_report*.txt
Output
$ ./report.awk
Error Code: B
Issue: 1212 Count: 3
Error Code: X
Issue: 2211 Count: 1
Issue: 1143 Count: 2
Error Code: Y
Issue: 2961 Count: 1
Issue: 6666 Count: 1
Issue: 5555 Count: 2
Issue: 5911 Count: 1
Issue: 4949 Count: 1
Error Code: Z
Issue: 2222 Count: 1
Issue: 1111 Count: 1
Issue: 2323 Count: 2
Issue: 3333 Count: 1
Issue: 1212 Count: 1

As suggested by Dave Jarvis, awk:
handles this better than bash
is fairly easy to learn
is likely available wherever bash is available
I've never had to look farther than The AWK Manual.
It would make things easier if you used a consistent field separator for both the list of column names and the data. Perhaps you could do some pre-processing in a bash script using sed before feeding to awk. Anyway, take a look at multi-dimensional arrays and reading multiple lines in the manual.

Bash has one-dimensional arrays that are indexed by integers. Bash 4 adds associative arrays. That's it for data structures. AWK has one dimensional associative arrays and fakes its way through two dimensional arrays. If you need some kind of data structure more advanced than that, you'll need to use Python, for example, or some other language.
That said, here's a rough outline of how you might parse the data you've shown.
#!/bin/bash
# methods
analyzeStructuralErrors()
{
    local f=$1
    local Xpat="Error Code for Issue X"
    local notXpat="Error Code for Issue [^X]"
    local flag=false line issue field
    local -a issues=() columns array
    while read -r line
    do
        if [[ $line =~ $Xpat ]]
        then
            flag=true
        elif [[ $line =~ $notXpat ]]
        then
            flag=false
        elif $flag && [[ $line =~ , ]]
        then
            # columns could be overwritten if there is more than one X section
            IFS=, read -ra columns <<< "$line"
        elif $flag && [[ $line =~ - ]]
        then
            issues+=("$line")
        else
            echo "unrecognized data line"
            echo "$line"
        fi
    done < "$f"
    for issue in "${issues[@]}"
    do
        IFS=- read -ra array <<< "$issue"
        # do something with ${array[0]}, ${array[1]}, etc.
        # or iterate
        for field in "${array[@]}"
        do
            : # do something with $field
        done
    done
}
# main
find . -name "*_report*.txt" | while read -r f
do
    echo "Processing $f"
    analyzeStructuralErrors "$f"
done
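If all you need is a count per first field inside the X sections, a Bash 4 associative array is enough. The following is only a minimal sketch: count_issue_x is a made-up name, and it assumes the data rows start with digits followed by a dash, as in the sample report.

```bash
#!/bin/bash
# Sketch: count rows per first field inside "Issue X" sections (needs bash 4+).
# count_issue_x is a hypothetical name; it takes one report file as $1.
count_issue_x() {
    local file=$1 line key in_x=false
    local header_re='Error Code for Issue ([^ ]+)'
    local row_re='^[[:space:]]*([0-9]+)-'
    local -A counts=()
    while IFS= read -r line; do
        if [[ $line =~ $header_re ]]; then
            # Section header: remember whether this is the X section.
            if [[ ${BASH_REMATCH[1]} == X ]]; then in_x=true; else in_x=false; fi
        elif $in_x && [[ $line =~ $row_re ]]; then
            key=${BASH_REMATCH[1]}
            counts[$key]=$(( ${counts[$key]:-0} + 1 ))
        fi
    done < "$file"
    # Print "issue count" pairs, sorted for stable output.
    for key in "${!counts[@]}"; do
        printf '%s %s\n' "$key" "${counts[$key]}"
    done | sort
}
```

Called as count_issue_x some_report.txt, it prints one "issue count" pair per line. Rows in other sections are skipped silently.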

Related

How to exclude lines above a target

I have read multiple posts about how to exclude lines around a grep match, but none addresses it with finality, most find other ways to sort data, and that does not solve similar issues with different data.
I have a file with repetitive output: a command repeated over and over. I want to trim out the 0 results, because that is the only constant value; the number of result hits is unknown.
The only unique string I can search by needs to have the 4 lines above it excluded, no matter what those lines contain, and I have not found any post with info generic enough to fit.
This is a conceptual question and there has to be a simple solution, but if an example is needed:
Path/Path/Path> search
[results]
[results]
2 entries found
Path/Path/Path> search
[result]
1 entry found
Path/Path/Path> search
0 entry found
Try this:
# Assumption: the data is in logfile.txt
i=5 # start above 4 so that lines are only skipped after a match
tac logfile.txt |\
while read -r line; do
if [[ "${line:0:7}" == "0 entry" ]]; then
i=0
continue
else
((i++))
[[ $i -le 4 ]] && continue
fi
echo "$line"
done | tac
output:
Path/Path/Path> search
[results]
[results]
2 entries found
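A variant without tac, in case it is not available: read the whole input into an array, mark each matching line together with the lines above it, then print whatever is left. This is only a sketch. drop_with_context is a made-up helper name, it needs bash 4 for mapfile, and it holds the whole file in memory.

```bash
#!/bin/bash
# Sketch: drop every line matching a glob pattern plus the N lines above it.
# drop_with_context is a hypothetical helper; it reads stdin and writes stdout.
drop_with_context() {
    local pattern=$1 above=${2:-4}
    local -a lines=() drop=()
    local i j
    mapfile -t lines                      # read all of stdin into an array
    for i in "${!lines[@]}"; do
        if [[ ${lines[i]} == $pattern ]]; then
            # Mark the match itself and up to $above preceding lines.
            for (( j = i; j >= 0 && j >= i - above; j-- )); do
                drop[j]=1
            done
        fi
    done
    for i in "${!lines[@]}"; do
        [[ -n ${drop[i]:-} ]] || printf '%s\n' "${lines[i]}"
    done
}
```

For the example above it would be called as drop_with_context '0 entry*' 4 < logfile.txt.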

Having difficulty defining conditions to call certain functions and error messages

I'm writing a piece of code which will use data from a file that I've already made in order to work out the average value of the file, the minimum value, maximum value and then finally displaying all values at once.
I'm very new to Unix, so I'm trying to learn it, but I just can't seem to work out where I need to go with my code to make it functional.
I've got the basics of the code but I need to find a way to call the functions using the year, which is stored in a directory corresponding to that year which is making me think I'm going to have problems calling from the file as I'm using a sed function to only take line 4 of that file rather than the year.
I also need to figure out how to set error messages and an exit status for the script if the user has not given (Year) (one of the 4 commands), if the year doesn't correspond to one available in the tree, or if the keyword is invalid.
Any help or even pointers towards good material to learn these things would be great.
Here is my current code:
#!/bin/bash
#getalldata() {
#find . -name "ff_*" -exec sed -n '4p' {} \;
#}
#Defining where the population configuration file is which contains all the data
popconfile.txt=$HOME/testarea
#Function to find the average population of all of the files
averagePopulation()
{
total=0
list=$(cat popconfile.txt)
for var in "${list[@]}"
do
total=$((total + var))
done
average=$((total/$(wc -l popconfile.txt)))
echo "$average"
}
#Function to find the maximum population from all the files
maximumPopulation()
{
max=1
for in `cat popconfile.txt`
do
if [[ $1 > "$max" ]]; then
max=$1
echo "$max"
fi
done
}
#Function to find the minimum population from all the files
minimumPopulation()
{
min=1000000
for in `cat popconfile.txt`
do
if [[ $1 < "$min" ]]; then
max=$1
echo "$min"
fi
done
}
#Function to show all of the results in one place
function showAll()
{
echo "$min"
echo "$max"
echo "$average"
}
Thanks!
Assuming your popconfile.txt format is
cat popconfile.txt
150
10
45
1000
34
87
You might be able to simplify your code with :
for i in $(cat popconfile.txt);do
temp[$i]=$i
done
pop=(${temp[*]})
min=${pop[0]}
max=${pop[$((${#pop[*]}-1))]}
for ((j=0;j<${#pop[*]};j++));do
sum=$(($sum+${pop[$j]}))
done
average=$(($sum/${#pop[*]}))
echo "maximum="$max
echo "minimum="$min
echo "average="$average
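Be aware though that the average here or in your code is calculated with integer mathematics, so you're losing all decimals. Note also that temp[$i]=$i sorts the values by using them as array indexes, so duplicate values collapse into a single entry and are counted only once.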
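If duplicates must count towards the average, a plain running total works too. This is a rough sketch, assuming one non-negative integer per line on standard input; population_stats is a made-up name, and the average is still integer division.

```bash
#!/bin/bash
# Sketch: min, max and integer average of one non-negative integer per line.
# population_stats is a hypothetical name; it reads from stdin.
population_stats() {
    local value min="" max="" sum=0 count=0
    while read -r value; do
        [[ $value =~ ^[0-9]+$ ]] || continue    # skip blank or non-numeric lines
        value=$((10#$value))                    # force base 10 (e.g. "010")
        count=$((count + 1))
        sum=$((sum + value))
        if [[ -z $min ]] || (( value < min )); then min=$value; fi
        if [[ -z $max ]] || (( value > max )); then max=$value; fi
    done
    if (( count == 0 )); then
        echo "no data" >&2
        return 1
    fi
    echo "minimum=$min"
    echo "maximum=$max"
    echo "average=$((sum / count))"
}
```

It would be called as population_stats < popconfile.txt. Returning a non-zero status on empty input is one way to give the script an error status.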

BASH - Parse strings with special characters

Goal: I'm attempting to create an interactive version of docker ps. Basically, have each line be a "menu" such that a user can: start, stop, ssh, etc.
Example:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
1. bf4a9c7de6bf app_1 "docker-php-entryp..." 7 days ago Up About an hour 443/tcp, 0.0.0.0:80->80/tcp, 9000/tcp app_1
2. 26195f0764ce app_2 "sh /var/www/html/..." 10 days ago Up About an hour 443/tcp, 127.0.0.1:8000->80/tcp app_2
Upon choosing (1/2, etc) there will be an options menu to perform various actions on the selected container.
Problem: I can't seem to figure out how to parse out each line of the docker ps command such that i'll have the Container ID and other values as array elements.
The code so far:
list=`docker ps`
IFS=$'\n' array=($list)
for index in ${!array[@]}
do
declare -a 'a=('"${array[index]}"')'
printf "%s\n" "${a[@]}"
done
The result:
CONTAINER
ID
IMAGE
COMMAND
CREATED
STATUS
PORTS
NAMES
/usr/bin/dockersh: array assign: line 9: syntax error near unexpected token `>'
/usr/bin/dockersh: array assign: line 9: `bf4a9c7de6bf app_1 "docker-php-entryp..." 7 days ago Up About an hour 443/tcp, 0.0.0.0:80->80/tcp, 9000/tcp app_1'
It looks like you've got a few issues with the quoting, maybe try:
list=$(docker ps)
IFS=$'\n' array=($list)
for index in "${!array[@]}"
do
declare -a a=("${array[index]}")
printf "%s\n" "${a[@]}"
done
Without proper quoting your string will likely be re-split; consider checking your shell scripts at shellcheck.net, as it will usually give you some good hints about bad syntax.
If you want an associative array that acts as a matrix, with every docker ps field accessible by row and column, you can use awk to insert a | separator between fields. Then load the result into a single associative array and build the matrix according to the number of columns you expect (e.g. 7):
#!/bin/bash
IFS=$'|'
data=$(docker ps -a | awk '
function rtrim(s) { sub(/[ \t\r\n]+$/, "", s); return s }
{
if (NR == 1) {
head[1] = index($0,"CONTAINER ID")
head[2] = image=index($0,"IMAGE")
head[3] = command=index($0,"COMMAND")
head[4] = created=index($0,"CREATED")
head[5] = status=index($0,"STATUS")
head[6] = ports=index($0,"PORTS")
head[7] = names=index($0,"NAMES")
}
else{
for (i = 1;i < 8;i++) {
if (i!=7){
printf "%s",rtrim(substr($0, head[i], head[i+1] - 1 - head[i])) "|"
}
else{
printf "%s",rtrim(substr($0, head[i], 100)) "|"
}
}
print ""
}
}')
arr=($data)
max_column=7
row=0
column=0
declare -A matrix
for index in "${!arr[@]}"
do
matrix[$row,$column]=$(echo "${arr[index]}" | tr -d '\n')
column=$((column+1))
if [ $((column%max_column)) == 0 ]; then
row=$((row+1))
column=0
fi
done
echo "first container ID is : ${matrix[0,0]}"
echo "second container ID is : ${matrix[1,0]}"
echo "third container NAME is : ${matrix[2,6]}"
In the awk part, the aim is to insert a | character between fields so that the data can be loaded into an associative array using | as the delimiter.
Because each field's content is aligned with its column title, we store the starting position of each title in the head array and extract each field by trimming up to the next title's position.
The matrix is then built according to the maximum column count (7), and each cell can be accessed with ${matrix[row,column]}.
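If only a few fields are needed, docker ps can also emit them already delimited through its --format option, which avoids the column-position arithmetic. The following is a sketch under that assumption: parse_containers and the four array names are made up for illustration, and the | delimiter assumes no field contains that character.

```bash
#!/bin/bash
# Sketch: parse "ID|IMAGE|STATUS|NAMES" lines from stdin into parallel arrays.
# parse_containers and the array names are hypothetical.
parse_containers() {
    ids=() images=() statuses=() names=()
    local id image status name
    while IFS='|' read -r id image status name; do
        [[ -n $id ]] || continue          # skip empty lines
        ids+=("$id")
        images+=("$image")
        statuses+=("$status")
        names+=("$name")
    done
}

# Intended use (assumes docker is installed). Process substitution keeps the
# loop in the current shell, so the arrays are still set afterwards:
#   parse_containers < <(docker ps --format '{{.ID}}|{{.Image}}|{{.Status}}|{{.Names}}')
#   echo "first container: ${ids[0]} (${names[0]})"
```

Each menu entry can then be built from the same index in all four arrays.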
Usual story ... don't read data with a for loop unless you know exactly the format and how to control it:
while IFS= read -r line
do
array+=("$line")
done< <(docker ps)
Personally I would try and remove the numbers from the start of the lines (1., 2., etc) because then you can throw it into a select and it will give you numbers which can then be used to reference the relevant items.

Storing multiple columns of data from a file in a variable

I'm trying to read from a file the data that it contains and get 2 important pieces of data from the file and use it in a bash script. A string and then a number for example:
Box 12
Toy 85
Dog 13
Bottle 22
I was thinking I could write a while loop to loop through the file and store the data into a variable. However I need two different variables, one for the number and one for the word. How do I get them separated into two variables?
Example code:
#!/bin/bash
declare -a textarr numarr
while read -r text num;do
textarr+=("$text")
numarr+=("$num")
done <file
echo ${textarr[1]} ${numarr[1]} #will print Toy 85
The data is stored in two array variables: textarr and numarr.
You can access a single element by index with ${textarr[$index]}, or all of them at once with ${textarr[@]}.
To read all the data into a single associative array (in bash 4.0 or newer):
#!/bin/bash
declare -A data=( )
while read -r key value; do
data[$key]=$value
done <file
With that done, you can retrieve a value by key efficiently:
echo "${data[Box]}"
...or iterate over all keys:
for key in "${!data[@]}"; do
value=${data[$key]}
echo "Key $key has value $value"
done
You'll note that read takes multiple names on its argument list. When given more than one argument, it splits fields by IFS, putting columns into their respective variables (with the entire rest of the line going into the last variable named, if more columns exist than variables are named).
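A small sketch of that splitting behaviour. split_demo is a made-up name, and the input line is invented for illustration:

```bash
#!/bin/bash
# Sketch: how read distributes whitespace-separated fields across names.
# split_demo is a hypothetical helper that takes one line of text as $1.
split_demo() {
    local first second rest
    read -r first second rest <<< "$1"
    printf 'first=%s second=%s rest=%s\n' "$first" "$second" "$rest"
}

split_demo "Box 12 blue large"
# prints: first=Box second=12 rest=blue large
```

With fewer fields than names, the leftover variables are simply set to empty strings.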
Here is my own solution, offered for discussion; I am not sure whether it is a good one. A while read loop that sits at the receiving end of a pipe runs in a subshell, so it cannot update variables outside the loop (a loop fed by redirection, as in done <file, does not have this limitation). Here is some example code which you can modify to suit your own needs. If you have more columns of data to use, a slight adjustment is needed.
#!/bin/sh
res=$(awk 'BEGIN{OFS=" "}{print $2, $3 }' mytabularfile.tab)
n=0
for x in $res; do
row=$(expr $n / 2)
col=$(expr $n % 2)
#echo "row: $row column: $col value: $x"
if [ $col -eq 0 ]; then
if [ $n -gt 0 ]; then
echo "row: $row "
echo col1=$col1 col2=$col2
fi
col1=$x
else
col2=$x
fi
n=$(expr $n + 1)
done
row=$(expr $row + 1)
echo "last row: $row col1=$col1 col2=$col2"

Adding two decimal variables and assigning values in bash

We have been asked to parse a CSV file and perform some operations based on the data in it.
I am trying to find the maximum of the sum of two numbers which I get from the CSV file,
that is, the last and second-to-last numbers, which are decimals.
Following is my code
#!/bin/bash
#this file was created on 09/03/2014
#Author = Shashank Pangam
OLDIFS=$IFS
IFS=","
maxTransport=0
while read year month hydro geo solar wind fuel1 biomassL biomassC totalRenew fuel2 biodieselT biomassT
do
while [ $year -eq 2012 ]
do
currentTransport=$(echo "$biodieselT+$biomassT" | bc)
echo $currentTransport
if (( $(echo "$currentTransport > $maxTransport" | bc -l)));
then
$maxTransport = $currentTransport
echo $maxTransport
fi
done
echo -e "Maximum amount of energy consumed by the Transportation sector for year 2012 : $maxTransport"
done < $1
and the following is my csv file
2012,January,2.614,0.356,0.006,0.021,114.362,14.128,1.308,66.74,196.539,199.536,81.791,
2012,February,2.286,0.333,0.007,0.017,107.388,13.952,1.304,61.277,183.921,186.564,81.545,
2012,March,0.356,0.009,0.02,108.268,15.588,1.404,63.444,188.705,191.318,87.827,11.187,
2012,April,,0.344,0.012,0.019,103.627,14.229,1.381,60.683,179.919,181.993,86.339,11.518,
2012,May,,0.356,0.012,0.01,109.644,13.789,1.473,63.611,188.517,190.913,92.087,12.09,
2012,June,,0.344,0.013,0.013,108.116,13.012,1.434,61.056,183.618,185.65,89.673,12.461,
2012,July,,0.356,0.017,0.008,112.426,14.035,1.403,58.057,185.921,187.61,87.707,10.464,
2012,August,0.356,0.016,0.008,113.64,14.01,1.513,60.011,189.174,190.999,94.592,11.14,
2012,September,1.513,0.344,0.015,0.01,110.84,13.435,1.324,56.047,181.647,183.528,82.814,
2012,October,1.83,0.356,0.012,0.02,111.544,15.597,1.462,57.365,185.969,188.186,91.42,
2012,November,2.022,0.344,0.01,0.014,111.808,15.594,1.326,56.793,185.521,187.911,82.919,
2012,December,1.77,0.356,0.007,0.022,116.416,15.873,1.368,58.741,192.398,194.552,85.526,
2013,January,3.021,0.357,0.007,0.018,114.601,15.309,1.334,57.31,188.553,191.956,83.415,
2013,February,3.285,0.322,0.012,0.023,102.499,13.658,1.246,52.05,169.452,173.094,77.914,
2013,March,0.357,0.016,0.025,111.594,14.538,1.419,59.096,186.646,189.884,88.713,11.938,
2013,April,,0.345,0.018,0.03,103.602,14.446,1.437,59.057,178.542,181.342,89.867,12.184,
2013,May,,0.357,0.02,0.032,108.113,14.452,1.497,62.606,186.668,190.117,93.634,13.166,
2013,June,,0.345,0.021,0.028,109.162,14.597,1.47,61.563,186.792,189.994,91.894,14.501,
2013,July,,0.357,0.018,0.024,119.154,15.018,1.45,62.037,197.659,201.027,90.689,14.523,
2013,August,0.357,0.022,0.02,113.177,15.014,1.44,60.682,190.313,192.949,90.065,13.28,
2013,September,2.185,0.345,0.021,0.026,106.912,14.367,1.411,58.901,181.591,184.168,88.254,
2013,October,2.171,0.357,0.02,0.029,109.123,15.158,1.483,64.509,190.273,192.849,92.748
The following is the error i get
./calculator.sh: line 16: 0: command not found
0
268.109
I don't understand why echo $currentTransport returns 0 while in the comparison it works and assigns value to maxTransport but throws the error for the same.
Thanks in advance.
Instead of this:
$maxTransport = $currentTransport
Try this:
maxTransport=$currentTransport
The $ in front of a variable name expands to its contents, so $maxTransport = $currentTransport becomes something like 0 = 268.109, and bash tries to run 0 as a command (hence "0: command not found"). An assignment needs the bare variable name on the left and no spaces around the =.
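Two other things in the script are worth a look: while [ $year -eq 2012 ] never changes $year inside its body, so it should be an if, and the final echo belongs after the read loop. Here is a rough sketch of the loop with those changes. max_transport is a made-up name, awk stands in for bc for the decimal arithmetic (bc would work just as well), and it assumes every row has all thirteen columns.

```bash
#!/bin/bash
# Sketch: largest biodieselT + biomassT for one year, keeping the loop shape.
# max_transport is a hypothetical name; usage: max_transport FILE YEAR
max_transport() {
    local file=$1 wanted=$2 max=0 current
    local year month hydro geo solar wind fuel1 biomassL biomassC
    local totalRenew fuel2 biodieselT biomassT
    while IFS=, read -r year month hydro geo solar wind fuel1 biomassL \
            biomassC totalRenew fuel2 biodieselT biomassT; do
        # "if", not "while": $year never changes inside the body.
        if [[ $year == "$wanted" ]]; then
            # awk does the decimal arithmetic here instead of bc.
            current=$(awk -v a="$biodieselT" -v b="$biomassT" 'BEGIN { print a + b }')
            if awk -v c="$current" -v m="$max" 'BEGIN { exit !(c > m) }'; then
                max=$current    # no "$" on the left, no spaces around "="
            fi
        fi
    done < "$file"
    echo "$max"
}
```

It would be called as max_transport "$1" 2012, with the result printed once after the loop has finished.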
