Bash script that analyzes report files
I have the following bash script which I will use to analyze all report files in the current directory:
#!/bin/bash
# methods
analyzeStructuralErrors()
{
    # do something with "$1"
    :   # placeholder: bash does not allow an empty function body
}
# main
reportFiles=$(find "$PWD" -name "*_report*.txt")
for f in $reportFiles
do
    echo "Processing $f"
    analyzeStructuralErrors "$f"
done
My report files are formatted like this:
Error Code for Issue X - Description Text - Number of errors.
col1_name,col2_name,col3_name,col4_name,col5_name,col6_name
1143-1-1411-247-1-72953-1
1143-2-1411-247-436-72953-1
2211-1-1888-204-442-22222-1
Error Code for Issue Y - Description Text - Number of errors.
col1_name,col2_name,col3_name,col4_name,col5_name,col6_name
Other data
.
.
.
I'm looking for a way to go through each file and aggregate the report data. In the above example, we have two unique issues of type X, which I would like to handle in analyzeStructuralErrors. Other types of issues can be ignored in this routine. Can anyone offer advice on how to do this? Basically, I want to read each line until I hit the next error header and put that data into some kind of data structure.
Below is a working awk implementation that uses its pseudo-multidimensional arrays. I've included sample output to show you how it looks. I took the liberty of adding a 'Count' column to denote how many times a certain "Issue" was hit for a given Error Code.
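#!/bin/bash
awk '
# Header line, e.g. "Error Code for Issue X - ...": field 5 is the error code
/Error Code for Issue/ {
    errCode[currCode=$5]=$5
}
# Data line made only of digits and hyphens (optionally indented)
/^ *[0-9-]+$/ {
    line = $0
    sub(/^ +/, "", line)          # drop any indentation before splitting
    split(line, tmpArr, "-")
    # Count occurrences of each (error code, issue) pair
    error[errCode[currCode],tmpArr[1]]++
}
END {
    for (code in errCode) {
        printf("Error Code: %s\n", code)
        for (item in error) {
            split(item, subscr, SUBSEP)
            if (subscr[1] == code) {
                printf("\tIssue: %s\tCount: %s\n", subscr[2], error[item])
            }
        }
    }
}
' *_report*.txt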
Output
$ ./report.awk
Error Code: B
    Issue: 1212    Count: 3
Error Code: X
    Issue: 2211    Count: 1
    Issue: 1143    Count: 2
Error Code: Y
    Issue: 2961    Count: 1
    Issue: 6666    Count: 1
    Issue: 5555    Count: 2
    Issue: 5911    Count: 1
    Issue: 4949    Count: 1
Error Code: Z
    Issue: 2222    Count: 1
    Issue: 1111    Count: 1
    Issue: 2323    Count: 2
    Issue: 3333    Count: 1
    Issue: 1212    Count: 1
As suggested by Dave Jarvis, awk:
- handles this better than bash
- is fairly easy to learn
- is likely available wherever bash is available
I've never had to look farther than The AWK Manual.
It would make things easier if you used a consistent field separator for both the list of column names and the data. Perhaps you could do some preprocessing with sed in a bash script before feeding the data to awk. Anyway, take a look at the sections on multidimensional arrays and on reading multiple lines in the manual.
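A minimal sketch of that preprocessing idea, assuming (as in the question's sample) that data rows contain only digits and hyphens, possibly indented. The helper names normalize_report and first_fields are hypothetical: sed rewrites hyphens to commas on data rows only, so header and description lines keep their hyphens, and awk can then split every row with -F','.

```shell
#!/usr/bin/env bash
# Sketch: give data rows the same separator (comma) as the column-name rows.
# Assumption: data rows consist only of digits and hyphens, possibly indented.
normalize_report() {
  # Only on lines made up of digits and hyphens, turn every hyphen into a comma.
  sed -E '/^ *[0-9-]+$/ s/-/,/g' "$@"
}

# Hypothetical consumer: print the first field (the issue) of each data row.
first_fields() {
  normalize_report "$@" | awk -F',' '/^ *[0-9,]+$/ { print $1 }'
}
```

Because the description line ("Error Code for Issue X - Description Text - ...") contains letters, the sed address never matches it, so its " - " separators survive.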
Bash has one-dimensional arrays that are indexed by integers. Bash 4 adds associative arrays. That's it for data structures. AWK has one-dimensional associative arrays and fakes its way through two-dimensional arrays. If you need a data structure more advanced than that, you'll need to use another language, such as Python.
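To make the "fakes its way" point concrete, here is a sketch that stays in bash 4+ and emulates a two-dimensional table with a single associative array keyed by "code,issue", much like the awk answer above does with SUBSEP. The function names and the header/row patterns are assumptions based on the sample report in the question.

```shell
#!/usr/bin/env bash
# Sketch (bash 4+): one associative array with "errorCode,issue" keys
# stands in for a 2-D table.
declare -A count=()

# count_issues FILE: tally data rows per (error code, first field).
# Assumes a line starting "Error Code for Issue <code>" followed by
# hyphen-separated numeric rows, as in the question.
count_issues() {
  local line code="" issue
  local header='^Error Code for Issue ([^ ]+)'
  local datarow='^ *[0-9-]+$'
  while IFS= read -r line; do
    if [[ $line =~ $header ]]; then
      code=${BASH_REMATCH[1]}
    elif [[ -n $code && $line =~ $datarow ]]; then
      issue=${line%%-*}          # first hyphen-separated field
      issue=${issue// /}         # drop any indentation
      count[$code,$issue]=$(( ${count[$code,$issue]:-0} + 1 ))
    fi
  done < "$1"
}

# print_counts: one line per (code, issue) pair, sorted for stable output.
print_counts() {
  local key
  for key in "${!count[@]}"; do
    printf 'Error Code: %s\tIssue: %s\tCount: %s\n' \
      "${key%%,*}" "${key#*,}" "${count[$key]}"
  done | sort
}
```

Calling count_issues on several report files accumulates into the same array, and print_counts then lists every pair. Splitting the key back apart with ${key%%,*} and ${key#*,} works only because neither part contains a comma.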
That said, here's a rough outline of how you might parse the data you've shown.
#!/bin/bash
# methods
analyzeStructuralErrors()
{
    local f=$1
    local Xpat="Error Code for Issue X"
    local notXpat="Error Code for Issue [^X]"
    local flag=false          # true while inside an "Issue X" section
    local -a columns=() issues=() array=()
    local line issue field
    while IFS= read -r line
    do
        if [[ $line =~ $Xpat ]]
        then
            flag=true
        elif [[ $line =~ $notXpat ]]
        then
            flag=false
        elif $flag && [[ $line =~ , ]]
        then
            # columns could be overwritten if there is more than one X section
            IFS=, read -ra columns <<< "$line"
        elif $flag && [[ $line =~ - ]]
        then
            issues+=("$line")
        else
            echo "unrecognized data line"
            echo "$line"
        fi
    done < "$f"
    for issue in "${issues[@]}"
    do
        IFS=- read -ra array <<< "$issue"
        # do something with ${array[0]}, ${array[1]}, etc.
        # or iterate
        for field in "${array[@]}"
        do
            : # do something with $field
        done
    done
}
# main
find . -name "*_report*.txt" | while IFS= read -r f
do
    echo "Processing $f"
    analyzeStructuralErrors "$f"
done
Related
How to exclude lines above a target
I have read multiple posts about how to exclude lines around a grep match, but none addresses it conclusively; most find other ways to sort the data, and that does not solve similar issues with different data. I have a file with repetitive output: a command repeated over and over. I want to trim out the 0 results because that is the only constant value; the number of result hits is unknown. The only unique string I can search by needs to have the 4 lines above it excluded, no matter what the content of those lines is, and I have not found any post with info generic enough to fit. This is a conceptual question and there has to be a simple solution, but if an example is needed:
Path/Path/Path> search
[results]
[results]
2 entries found
Path/Path/Path> search
[result]
1 entry found
Path/Path/Path> search
0 entry found
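Try this:
# Assumption: the data is in logfile.txt
# Read the file bottom-up, so the 4 lines above a match come right after it.
i=5    # start above 4 so lines are printed until the first match is seen
tac logfile.txt |
while read -r line; do
    if [[ "${line:0:7}" == "0 entry" ]]; then
        i=0                          # drop the match itself and reset the counter
        continue
    else
        ((i++))
        [[ $i -le 4 ]] && continue   # drop the 4 lines above the match
    fi
    echo "$line"
done | tac
Output:
Path/Path/Path> search
[results]
[results]
2 entries found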
Having difficulty defining conditions to call certain functions and error messages
I'm writing a piece of code that will use data from a file I've already made in order to work out the average value of the file, the minimum value and the maximum value, and then finally display all values at once. I'm very new to Unix, so I'm trying to learn it, but I just can't seem to crack where I need to go with my code for it to gain functionality. I've got the basics of the code, but I need to find a way to call the functions using the year, which is stored in a directory corresponding to that year. That makes me think I'm going to have problems calling from the file, as I'm using a sed command to take only line 4 of that file rather than the year. I also need to figure out how to set error messages and exit statuses in the script for when the user has not given (Year) (one of the 4 commands), the year doesn't correspond to one available in the tree, or the keyword is invalid. Any help, or even pointers towards good material to learn these things, would be great. Here is my current code:
#!/bin/bash
#getalldata() {
#find . -name "ff_*" -exec sed -n '4p' {} \;
#}
#Defining where the population configuration file is which contains all the data
popconfile.txt=$HOME/testarea
#Function to find the average population of all of the files
averagePopulation()
{
    total=0
    list=$(cat popconfile.txt)
    for var in "${list[@]}"
    do
        total=$((total + var))
    done
    average=$((total/$(wc -l popconfile.txt)))
    echo "$average"
}
#Function to find the maximum population from all the files
maximumPopulation()
{
    max=1
    for in `cat popconfile.txt`
    do
        if [[ $1 > "$max" ]]; then
            max=$1
            echo "$max"
        fi
    done
}
#Function to find the minimum population from all the files
minimumPopulation()
{
    min=1000000
    for in `cat popconfile.txt`
    do
        if [[ $1 < "$min" ]]; then
            max=$1
            echo "$min"
        fi
    done
}
#Function to show all of the results in one place
function showAll()
{
    echo "$min"
    echo "$max"
    echo "$average"
}
Thanks!
Assuming your popconfile.txt format is:
$ cat popconfile.txt
150
10
45
1000
34
87
you might be able to simplify your code with:
for i in $(cat popconfile.txt); do
    temp[$i]=$i      # sparse array indexed by value, so it expands in ascending order
    all+=("$i")      # keep every value (duplicates included) for the sum
done
pop=(${temp[*]})
min=${pop[0]}
max=${pop[$((${#pop[*]}-1))]}
sum=0
for ((j=0; j<${#all[*]}; j++)); do
    sum=$((sum + ${all[$j]}))
done
average=$((sum / ${#all[*]}))
echo "maximum=$max"
echo "minimum=$min"
echo "average=$average"
Be aware, though, that the average here or in your code is calculated with integer arithmetic, so you're losing all decimals. The sorting trick also only works for non-negative integers, since the values are used as array indices.
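If the decimals matter, one option (a sketch, not the only way) is to let awk, which does floating-point arithmetic, compute all three values in a single pass. pop_stats is a hypothetical helper name, and it assumes one number per line, as in the popconfile.txt example.

```shell
#!/usr/bin/env bash
# Sketch: minimum, maximum and average in one awk pass; awk uses floating
# point, so the average keeps its decimals (bash $(( )) would truncate).
# Assumption: one number per line in the input file.
pop_stats() {
  awk '
    NF == 0 { next }                  # skip blank lines
    {
      v = $1 + 0                      # force numeric comparison
      if (n == 0 || v < min) min = v
      if (n == 0 || v > max) max = v
      sum += v
      n++
    }
    END {
      if (n == 0) exit 1              # no data: fail instead of dividing by zero
      printf "maximum=%s\nminimum=%s\naverage=%.2f\n", max, min, sum / n
    }
  ' "$1"
}
```

Calling pop_stats popconfile.txt prints the three lines in the same format as the loop above. The %.2f precision is an arbitrary choice.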
BASH - Parse strings with special characters
Goal: I'm attempting to create an interactive version of docker ps. Basically, I want each line to be a "menu" entry so that a user can start, stop, ssh, etc.
Example:
    CONTAINER ID   IMAGE   COMMAND                  CREATED       STATUS             PORTS                                   NAMES
1.  bf4a9c7de6bf   app_1   "docker-php-entryp..."   7 days ago    Up About an hour   443/tcp, 0.0.0.0:80->80/tcp, 9000/tcp   app_1
2.  26195f0764ce   app_2   "sh /var/www/html/..."   10 days ago   Up About an hour   443/tcp, 127.0.0.1:8000->80/tcp         app_2
Upon choosing (1, 2, etc.) there will be an options menu to perform various actions on the selected container.
Problem: I can't figure out how to parse each line of the docker ps output so that I'll have the Container ID and other values as array elements.
The code so far:
list=`docker ps`
IFS=$'\n' array=($list)
for index in ${!array[@]}
do
    declare -a 'a=('"${array[index]}"')'
    printf "%s\n" "${a[@]}"
done
The result:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
/usr/bin/dockersh: array assign: line 9: syntax error near unexpected token `>'
/usr/bin/dockersh: array assign: line 9: `bf4a9c7de6bf app_1 "docker-php-entryp..." 7 days ago Up About an hour 443/tcp, 0.0.0.0:80->80/tcp, 9000/tcp app_1'
It looks like you've got a few issues with the quoting; maybe try:
list=$(docker ps)
IFS=$'\n' array=($list)
for index in "${!array[@]}"
do
    declare -a a=("${array[index]}")
    printf "%s\n" "${a[@]}"
done
Without proper quoting, your string will likely be re-split; consider checking your shell scripts at shellcheck.net, as it usually gives good hints about bad syntax.
If you want an associative array that works as a matrix, with all your docker ps fields accessible by row/column, you can use awk to insert a | separator between fields. Then load the result into a single array and build the matrix according to the number of columns you expect (e.g. 7):
#!/bin/bash
IFS=$'|'
data=$(docker ps -a | awk '
function rtrim(s) { sub(/[ \t\r\n]+$/, "", s); return s }
{
    if (NR == 1) {
        head[1] = index($0,"CONTAINER ID")
        head[2] = image=index($0,"IMAGE")
        head[3] = command=index($0,"COMMAND")
        head[4] = created=index($0,"CREATED")
        head[5] = status=index($0,"STATUS")
        head[6] = ports=index($0,"PORTS")
        head[7] = names=index($0,"NAMES")
    }
    else {
        for (i = 1; i < 8; i++) {
            if (i != 7) {
                printf "%s",rtrim(substr($0, head[i], head[i+1] - 1 - head[i])) "|"
            }
            else {
                printf "%s",rtrim(substr($0, head[i], 100)) "|"
            }
        }
        print ""
    }
}')
arr=($data)
max_column=7
row=0
column=0
declare -A matrix
for index in "${!arr[@]}"
do
    matrix[$row,$column]=$(echo "${arr[index]}" | tr -d '\n')
    column=$((column+1))
    if [ $((column%max_column)) == 0 ]; then
        row=$((row+1))
        column=0
    fi
done
echo "first container ID is : ${matrix[0,0]}"
echo "second container ID is : ${matrix[1,0]}"
echo "third container NAME is : ${matrix[2,6]}"
In the awk part, the aim is to insert a | character between fields so that the data can be split into an array using | as the delimiter. As the field content is aligned with the field titles, we store the position of each field name in the head array and extract each field, trimmed, up to the next field's position. Then the matrix is built according to the max column count (7), and each row/column can be accessed easily with ${matrix[row,column]}.
Usual story: don't read data with a for loop unless you know exactly the format and how to control it. Use a while read loop instead:
while IFS= read -r line     # empty IFS keeps leading/trailing whitespace intact
do
    array+=("$line")
done < <(docker ps)
Personally, I would try to remove the numbers from the start of the lines (1., 2., etc.), because then you can throw the lines into a select, which will give you numbers that can then be used to reference the relevant items.
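A sketch of the select idea: bash's built-in select prints a numbered menu (on stderr) and sets the loop variable to the chosen item. pick_container is a hypothetical helper that echoes the CONTAINER ID (first field) of the chosen row; filling the rows from docker ps with tail -n +2 to drop the header is an assumption about how you would wire it up.

```shell
#!/usr/bin/env bash
# Sketch: let bash's built-in `select` number the rows for you.
# pick_container is a hypothetical helper; pass it the rows without any
# "1.", "2." prefixes, e.g. rows collected with:
#   mapfile -t rows < <(docker ps | tail -n +2)
#   pick_container "${rows[@]}"
pick_container() {
  local PS3="Select a container: " row id
  select row in "$@"; do
    if [[ -n $row ]]; then
      read -r id _ <<< "$row"   # first whitespace-separated field = CONTAINER ID
      echo "$id"
      return 0
    fi
    echo "Invalid choice, try again." >&2
  done
  return 1                      # input ended without a valid choice
}
```

For example, id=$(pick_container "${rows[@]}") leaves only the ID on stdout, ready to pass to docker start or docker stop, while the menu itself still appears on the terminal.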
Storing multiple columns of data from a file in a variable
I'm trying to read a file and get two important pieces of data from each line to use in a bash script: a string and then a number, for example:
Box 12
Toy 85
Dog 13
Bottle 22
I was thinking I could write a while loop to go through the file and store the data in a variable. However, I need two different variables, one for the number and one for the word. How do I get them separated into two variables?
Example code:
#!/bin/bash
declare -a textarr numarr
while read -r text num; do
    textarr+=("$text")
    numarr+=("$num")
done < file
echo "${textarr[1]}" "${numarr[1]}"   # will print: Toy 85
The data are stored in two array variables, textarr and numarr. You can access each element by index with ${textarr[$index]}, or all of them at once with "${textarr[@]}".
To read all the data into a single associative array (in bash 4.0 or newer):
#!/bin/bash
declare -A data=( )
while read -r key value; do
    data[$key]=$value
done < file
With that done, you can retrieve a value by key efficiently:
echo "${data[Box]}"
...or iterate over all keys:
for key in "${!data[@]}"; do
    value=${data[$key]}
    echo "Key $key has value $value"
done
You'll note that read takes multiple names on its argument list. When given more than one name, it splits fields by IFS, putting columns into their respective variables (with the entire rest of the line going into the last variable named, if there are more columns than variables).
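A small sketch of that splitting behaviour; split_demo is just an illustrative name. With two names, read puts the first word in key and everything left on the line, inner spaces included, in value. With only one word on the line, value ends up empty.

```shell
#!/usr/bin/env bash
# Sketch: how `read` distributes words when given two names.
# The first word goes to `key`; the rest of the line goes to `value`.
split_demo() {
  local key value
  read -r key value <<< "$1"
  printf 'key=[%s] value=[%s]\n' "$key" "$value"
}
```

For instance, split_demo 'Bottle 22 extra words' prints key=[Bottle] value=[22 extra words].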
Here I provide my own solution, which should be discussed; I am not sure whether it is a good solution. A while read loop has the drawback that, when it is fed by a pipe, it runs in a subshell and cannot update variables outside the loop. Here is some example code that you can modify to suit your own needs. If you have more columns of data to use, a slight adjustment is needed.
#!/bin/sh
# Assumes the two values of interest are in columns 2 and 3 of mytabularfile.tab
res=$(awk 'BEGIN{OFS=" "}{print $2, $3 }' mytabularfile.tab)
n=0
for x in $res; do
    row=$(expr $n / 2)
    col=$(expr $n % 2)
    #echo "row: $row column: $col value: $x"
    if [ $col -eq 0 ]; then
        if [ $n -gt 0 ]; then
            echo "row: $row "
            echo col1=$col1 col2=$col2
        fi
        col1=$x
    else
        col2=$x
    fi
    n=$(expr $n + 1)
done
row=$(expr $row + 1)
echo "last row: $row col1=$col1 col2=$col2"
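The subshell caveat only applies when the loop is on the right-hand side of a pipe, because bash runs pipeline members in subshells by default (unless the lastpipe option is enabled). Redirecting input into the loop keeps it in the current shell, as this sketch with two hypothetical functions shows.

```shell
#!/usr/bin/env bash
# Sketch: the same counting loop, fed two different ways.
count_piped() {
  local n=0 line
  printf '%s\n' a b c | while IFS= read -r line; do n=$((n + 1)); done
  echo "$n"   # still 0: the loop incremented a subshell's copy of n
}

count_redirected() {
  local n=0 line
  while IFS= read -r line; do n=$((n + 1)); done < <(printf '%s\n' a b c)
  echo "$n"   # 3: the loop ran in the current shell
}
```

The same applies to a plain file: while read ...; done < mytabularfile.tab can update outer variables directly.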
Adding two decimal variables and assigning values in bash
We have been asked to parse a CSV file and perform some operations based on the data in it. I am trying to find the maximum sum of two numbers from the CSV file, namely the last and second-to-last numbers, which are decimals. Following is my code:
#!/bin/bash
#this file was created on 09/03/2014
#Author = Shashank Pangam
OLDIFS=$IFS
IFS=","
maxTransport=0
while read year month hydro geo solar wind fuel1 biomassL biomassC totalRenew fuel2 biodieselT biomassT
do
    while [ $year -eq 2012 ]
    do
        currentTransport=$(echo "$biodieselT+$biomassT" | bc)
        echo $currentTransport
        if (( $(echo "$currentTransport > $maxTransport" | bc -l))); then
            $maxTransport = $currentTransport
            echo $maxTransport
        fi
    done
    echo -e "Maximum amount of energy consumed by the Transportation sector for year 2012 : $maxTransport"
done < $1
and the following is my CSV file:
2012,January,2.614,0.356,0.006,0.021,114.362,14.128,1.308,66.74,196.539,199.536,81.791,
2012,February,2.286,0.333,0.007,0.017,107.388,13.952,1.304,61.277,183.921,186.564,81.545,
2012,March,0.356,0.009,0.02,108.268,15.588,1.404,63.444,188.705,191.318,87.827,11.187,
2012,April,,0.344,0.012,0.019,103.627,14.229,1.381,60.683,179.919,181.993,86.339,11.518,
2012,May,,0.356,0.012,0.01,109.644,13.789,1.473,63.611,188.517,190.913,92.087,12.09,
2012,June,,0.344,0.013,0.013,108.116,13.012,1.434,61.056,183.618,185.65,89.673,12.461,
2012,July,,0.356,0.017,0.008,112.426,14.035,1.403,58.057,185.921,187.61,87.707,10.464,
2012,August,0.356,0.016,0.008,113.64,14.01,1.513,60.011,189.174,190.999,94.592,11.14,
2012,September,1.513,0.344,0.015,0.01,110.84,13.435,1.324,56.047,181.647,183.528,82.814,
2012,October,1.83,0.356,0.012,0.02,111.544,15.597,1.462,57.365,185.969,188.186,91.42,
2012,November,2.022,0.344,0.01,0.014,111.808,15.594,1.326,56.793,185.521,187.911,82.919,
2012,December,1.77,0.356,0.007,0.022,116.416,15.873,1.368,58.741,192.398,194.552,85.526,
2013,January,3.021,0.357,0.007,0.018,114.601,15.309,1.334,57.31,188.553,191.956,83.415,
2013,February,3.285,0.322,0.012,0.023,102.499,13.658,1.246,52.05,169.452,173.094,77.914,
2013,March,0.357,0.016,0.025,111.594,14.538,1.419,59.096,186.646,189.884,88.713,11.938,
2013,April,,0.345,0.018,0.03,103.602,14.446,1.437,59.057,178.542,181.342,89.867,12.184,
2013,May,,0.357,0.02,0.032,108.113,14.452,1.497,62.606,186.668,190.117,93.634,13.166,
2013,June,,0.345,0.021,0.028,109.162,14.597,1.47,61.563,186.792,189.994,91.894,14.501,
2013,July,,0.357,0.018,0.024,119.154,15.018,1.45,62.037,197.659,201.027,90.689,14.523,
2013,August,0.357,0.022,0.02,113.177,15.014,1.44,60.682,190.313,192.949,90.065,13.28,
2013,September,2.185,0.345,0.021,0.026,106.912,14.367,1.411,58.901,181.591,184.168,88.254,
2013,October,2.171,0.357,0.02,0.029,109.123,15.158,1.483,64.509,190.273,192.849,92.748
The following is the error I get:
./calculator.sh: line 16: 0: command not found
0
268.109
I don't understand why echo $currentTransport prints 0, while the comparison works and assigns a value to maxTransport, yet the same line throws an error. Thanks in advance.
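Instead of this:
$maxTransport = $currentTransport
try this:
maxTransport=$currentTransport
The $ in front of a variable name expands to its contents, so the original line runs a command named after the current value of maxTransport (0 at first), which is why you get "0: command not found". Without the $, the name maxTransport itself is the destination for the contents of currentTransport. Note also that bash assignments must not have spaces around the =.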
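Separately from the assignment fix, the inner while [ $year -eq 2012 ] never changes $year, so once it is true it keeps looping on the same row; an if test is what the loop seems to need. Below is a minimal sketch that does the whole calculation in awk instead of bc (a swapped-in technique, not the original approach), assuming biodieselT and biomassT are fields 12 and 13 as in the question's read list. max_transport is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: largest biodieselT + biomassT for one year, with awk doing the
# floating-point math instead of bc. Field numbers follow the question's
# `read` list (biodieselT = field 12, biomassT = field 13); adjust them if
# the column layout differs.
max_transport() {
  local year=$1 file=$2
  awk -F, -v year="$year" '
    $1 == year {
      sum = $12 + $13
      if (!seen || sum > max) { max = sum; seen = 1 }
    }
    END { if (seen) print max }
  ' "$file"
}
```

Usage would be max_transport 2012 data.csv. Note that several rows in the sample CSV have a different number of leading fields (for example March and April), so fields 12 and 13 do not hold the same quantity on every row; that is a data problem the script cannot fix on its own.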