BASH - Parse strings with special characters - bash

Goal: I'm attempting to create an interactive version of docker ps. Basically, have each line be a "menu" such that a user can: start, stop, ssh, etc.
Example:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
1. bf4a9c7de6bf app_1 "docker-php-entryp..." 7 days ago Up About an hour 443/tcp, 0.0.0.0:80->80/tcp, 9000/tcp app_1
2. 26195f0764ce app_2 "sh /var/www/html/..." 10 days ago Up About an hour 443/tcp, 127.0.0.1:8000->80/tcp app_2
Upon choosing (1/2, etc) there will be an options menu to perform various actions on the selected container.
Problem: I can't seem to figure out how to parse out each line of the docker ps command such that i'll have the Container ID and other values as array elements.
The code so far:
list=`docker ps`
IFS=$'\n' array=($list)
for index in ${!array[@]}
do
declare -a 'a=('"${array[index]}"')'
printf "%s\n" "${a[@]}"
done
The result:
CONTAINER
ID
IMAGE
COMMAND
CREATED
STATUS
PORTS
NAMES
/usr/bin/dockersh: array assign: line 9: syntax error near unexpected token `>'
/usr/bin/dockersh: array assign: line 9: `bf4a9c7de6bf app_1 "docker-php-entryp..." 7 days ago Up About an hour 443/tcp, 0.0.0.0:80->80/tcp, 9000/tcp app_1'

It looks like you've got a few issues with the quoting, maybe try:
list=$(docker ps)
IFS=$'\n' array=($list)
for index in "${!array[@]}"
do
declare -a a=("${array[index]}")
printf "%s\n" "${a[@]}"
done
Without proper quoting, your string will likely be re-split. Consider checking your shell scripts at shellcheck.net, which usually gives good hints about bad syntax.

If you want an associative array that acts as a matrix, with every docker ps field accessible by row and column, you can use awk to insert a | separator between fields. Then load the result into a single associative array and build the matrix according to the number of columns you expect (e.g. 7):
#!/bin/bash
IFS=$'|'
data=$(docker ps -a | awk '
function rtrim(s) { sub(/[ \t\r\n]+$/, "", s); return s }
{
if (NR == 1) {
head[1] = index($0,"CONTAINER ID")
head[2] = image=index($0,"IMAGE")
head[3] = command=index($0,"COMMAND")
head[4] = created=index($0,"CREATED")
head[5] = status=index($0,"STATUS")
head[6] = ports=index($0,"PORTS")
head[7] = names=index($0,"NAMES")
}
else{
for (i = 1;i < 8;i++) {
if (i!=7){
printf "%s",rtrim(substr($0, head[i], head[i+1] - 1 - head[i])) "|"
}
else{
printf "%s",rtrim(substr($0, head[i], 100)) "|"
}
}
print ""
}
}')
arr=($data)
max_column=7
row=0
column=0
declare -A matrix
for index in "${!arr[@]}"
do
matrix[$row,$column]=$(echo "${arr[index]}" | tr -d '\n')
column=$((column+1))
if [ $((column%max_column)) == 0 ]; then
row=$((row+1))
column=0
fi
done
echo "first container ID is : ${matrix[0,0]}"
echo "second container ID is : ${matrix[1,0]}"
echo "third container NAME is : ${matrix[2,6]}"
In the awk part, the aim is to insert a | character between fields so that the data can be loaded into an associative array using | as the delimiter.
Because each field's content is aligned with its column title, we store the starting position of each title in the head array and extract each field by slicing up to the next title's position, trimming trailing whitespace.
The matrix is then built according to the maximum column count (7), and each cell can be accessed with ${matrix[row,column]}.
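The same header-offset idea can be sketched in pure bash, without awk. This is only a minimal sketch, assuming the header line contains the seven default docker ps column titles; the function name split_fixed_width is hypothetical and not part of any tool.

```shell
#!/usr/bin/env bash
# split_fixed_width HEADER ROW
# Prints the fields of ROW one per line, sliced at the offsets where the
# column titles start in HEADER. (Hypothetical helper for illustration.)
split_fixed_width() {
    local header=$1 row=$2
    local titles=("CONTAINER ID" "IMAGE" "COMMAND" "CREATED" "STATUS" "PORTS" "NAMES")
    local -a starts=()
    local title prefix i start field

    for title in "${titles[@]}"; do
        prefix=${header%%"$title"*}   # everything before the title
        starts+=("${#prefix}")        # 0-based offset of the title
    done

    for i in "${!starts[@]}"; do
        start=${starts[i]}
        if (( i + 1 < ${#starts[@]} )); then
            field=${row:start:starts[i+1]-start}
        else
            field=${row:start}        # last column runs to end of line
        fi
        field=${field%"${field##*[![:space:]]}"}   # trim trailing whitespace
        printf '%s\n' "$field"
    done
}
```

It could be called as split_fixed_width "${array[0]}" "${array[1]}" once the docker ps output has been read into an array of lines, header first.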

Usual story ... don't read data with a for loop unless you know exactly the format and how to control it:
while IFS= read -r line # IFS="\n" would set IFS to a backslash and the letter n
do
array+=("$line")
done< <(docker ps)
Personally I would try and remove the numbers from the start of the lines (1., 2., etc) because then you can throw it into a select and it will give you numbers which can then be used to reference the relevant items.
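A rough sketch of the select idea follows. It assumes the rows come from docker ps --format with a | between fields, which sidesteps the fixed-width parsing; the function names parse_ps_line and choose_container are made up for illustration, and the choice of | as delimiter assumes none of the selected fields contains that character.

```shell
#!/usr/bin/env bash
# parse_ps_line LINE
# Splits "ID|NAME|STATUS|PORTS" into the variables id, name, status, ports.
# A non-whitespace IFS keeps empty fields (e.g. a container with no ports).
parse_ps_line() {
    IFS='|' read -r id name status ports <<< "$1"
}

# choose_container ROW...
# Shows a numbered menu of the ROWs (select writes it to stderr) and prints
# the ID of the chosen row on stdout.
choose_container() {
    local row id name status ports
    select row in "$@"; do
        if [[ -n $row ]]; then        # empty when the reply is not a valid number
            parse_ps_line "$row"
            printf '%s\n' "$id"
            break
        fi
    done
}

# Possible usage (needs the Docker CLI, so it is left as a comment here):
#   mapfile -t rows < <(docker ps --format '{{.ID}}|{{.Names}}|{{.Status}}|{{.Ports}}')
#   id=$(choose_container "${rows[@]}")
```

Because select does the numbering itself, the container ID comes back directly and can be passed to whatever action menu follows.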

Related

shell script subtract fields from pairs of lines

Suppose I have the following file:
stub-foo-start: 10
stub-foo-stop: 15
stub-bar-start: 3
stub-bar-stop: 7
stub-car-start: 21
stub-car-stop: 51
# ...
# EOF at the end
with the goal of writing a script which would append to it like so:
stub-foo-start: 10
stub-foo-stop: 15
stub-bar-start: 3
stub-bar-stop: 7
stub-car-start: 21
stub-car-stop: 51
# ...
# appended:
stub-foo: 5 # 5 = stop(15) - start(10)
stub-bar: 4 # and so on...
stub-car: 30
# ...
# new EOF
The format is exactly this sequential pairing of start and stop tags (stop being the closing one) and no nesting in between.
What is the recommended approach to writing such a script using awk and/or sed? Mostly, what I've tried is grepping lines and storing them in a variable, but that seemed to overcomplicate things and trail off.
Any advice or helpful links welcome. (Most tutorials I found on shell scripting were illustrative at best)
A naive implementation in plain bash
#!/bin/bash
while read -r start && read -r stop; do
printf '%s: %d\n' "${start%-*}" $(( ${stop##*:} - ${start##*:} ))
done < file
This assumes pairs are contiguous and there are no interlaced or nested pairs.
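Since the goal was to append the results to the same file, here is a small sketch built on the loop above. It collects the summary lines first and appends them only after reading has finished, so the loop never sees its own output. The function name append_durations is hypothetical, and the sketch assumes the file holds nothing but start/stop pairs with plain decimal values (a leading zero would be read as octal by bash arithmetic).

```shell
#!/usr/bin/env bash
# append_durations FILE
# Appends one "<tag>: <stop - start>" line per start/stop pair to FILE.
append_durations() {
    local file=$1 start stop summary=""
    while read -r start && read -r stop; do
        # "stub-foo-start: 10" -> tag "stub-foo", value " 10"
        summary+="${start%-*}: $(( ${stop##*:} - ${start##*:} ))"$'\n'
    done < "$file"
    printf '%s' "$summary" >> "$file"   # append after the read loop is done
}
```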
Using GNU awk:
awk -F '[ -]' '{ map[$2][$3]=$4;print } END { for (i in map) { print i": "(map[i]["stop:"]-map[i]["start:"])" // ("map[i]["stop:"]"-"-map[i]["start:"]")" } }' file
Explanation:
awk -F '[ -]' '{ # Set the field delimiter to space or "-"
map[$2][$3]=$4; # Create a two dimensional array with the second and third field as indexes and the fourth field as the value
print # Print the line
}
END { for (i in map) {
print i": "(map[i]["stop:"]-map[i]["start:"])" // ("map[i]["stop:"]"-"-map[i]["start:"]")" # Loop through the array and print the data in the required format
}
}' file

UNIX: cut inside if

I have a simple search script, where based on user's options it will search in certain column of a file.
The file looks similar to passwd
openvpn:x:990:986:OpenVPN:/etc/openvpn:/sbin/nologin
chrony:x:989:984::/var/lib/chrony:/sbin/nologin
rpcuser:x:29:29:RPC Service User:/var/lib/nfs:/sbin/nologin
nfsnobody:x:65534:65534:Anonymous NFS User:/var/lib/nfs:/sbin/nologin
radvd:x:75:75:radvd user:/:/sbin/nologin
now the function based on user's option will search in different columns of the file. For example
-1 "searchedword" -2 "secondword"
will search in the first column for "searchedword" and in the second column for "secondword"
The function looks like this:
while [ $# -gt 0 ]; do
case "$1" in
-1|--one)
c=1
;;
-2|--two)
c=2
;;
-3|--three)
c=3
;;
...
esac
In the c variable is the number of the column where I want to search.
cat data | if [ "$( cut -f $c -d ':' )" == "$2" ]; then cut -d: -f 1-7 >> result; fi
Now I have something like this, where I try to select the right column and compare it to the second option, which is in this case "searchedword" and then copy the whole column into the result file. But it doesn't work. It doesn't copy anything into the result file.
Does anyone know where is the problem?
Thanks for answers
(At the end of the script I use:
shift
shift
to get the next two options)
I suggest using awk for this task, as awk is a better tool for processing delimited columns and rows.
Consider this awk command, where we pass the search column numbers and their corresponding search values as two colon-separated strings, cols and vals:
awk -v cols='1:3' -v vals='rpcuser:29' 'BEGIN {
FS=OFS=":" # set input/output field separator as :
nc = split(cols, c, /:/) # split column # by :
split(vals, v, /:/) # split values by :
}
{
p=1 # initialize p as 1
for(i=1; i<=nc; i++) # iterate the search cols/vals and set p=0
if ($c[i] !~ v[i]) { # if any match fails
p=0
break
} # finally value of p decides if a row is printing or not
} p' file
Output:
rpcuser:x:29:29:RPC Service User:/var/lib/nfs:/sbin/nologin
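To connect this with the -1 "word" -2 "word" options from the question, a wrapper along these lines might work. It is a sketch under assumptions: the function name search_columns is invented, only numeric short options such as -1 or -3 are handled (not --one), values must not contain a colon or backslash, and it uses exact string comparison rather than the regex match above.

```shell
#!/usr/bin/env bash
# search_columns FILE -N VALUE [-N VALUE ...]
# Prints the lines of FILE whose column N equals VALUE for every pair given.
search_columns() {
    local file=$1 cols="" vals=""
    shift
    while [ $# -ge 2 ]; do
        cols+="${cols:+:}${1#-}"    # "-3" -> "3", joined with ":"
        vals+="${vals:+:}$2"
        shift 2
    done
    awk -v cols="$cols" -v vals="$vals" '
        BEGIN { FS = ":"; nc = split(cols, c, ":"); split(vals, v, ":") }
        {
            for (i = 1; i <= nc; i++)
                if (($(c[i]) "") != (v[i] "")) next   # compare as strings
            print
        }' "$file"
}
```

For the sample data, search_columns data -1 rpcuser -3 29 would print only the rpcuser line.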

Merging rows in .csv in order

After analysis of brain scans I ended up with around 1000 .csv files, one for each scan. I've merged them into one in order (by subject ID and date). My problem is, that some subjects had two or more consecutive scans and some had only one. Database now looks like that:
ID, CC_area, CC_perimeter, CC_circularity
024_S_0985, 407.00, 192.15, 0.138530 //first scan of A
024_S_0985, 437.50, 204.80, 0.131074 //second scan of A
024_S_0985, 400.75, 198.80, 0.127420 //third scan of A
024_S_1063, 544.50, 214.34, 0.148939 //first and only scan of B
024_S_1171, 654.75, 240.33, 0.142453 //first scan of C
024_S_1171, 659.50, 242.21, 0.141269 //second scan of C
...
But I want it to look like that:
ID, CC_area, CC_perimeter, CC_circularity, CC_area2, CC_perimeter2, CC_circularity2, CC_area3, CC_perimeter3, CC_circularity3, ..., CC_circularity6
024_S_0985, 407.00, 192.15, 0.138530, 437.50, 204.80, 0.131074, 400.75, 198.80, 0.127420, ... ,
024_S_1063, 544.50, 214.34, 0.148939,,,,,, ...,
024_S_1171, 654.75, 240.33, 0.142453, 659.50, 242.21, 0.141269,,, ... ,
...
What is important, that order of data must not be changed and number of rows for one ID is not known (it varies from 1 to 6). (So first columns of scan 1, then scan 2 etc.). Could you help me, or provide, with solution for that using bash? I am not experienced in programming and I have lost hope, that I could do it myself.
You can combine the line with the same filename (or initial index) using a normal while read loop and then acting on 3 conditions. (1) whether it is the first line following the header; (2) where the current index is equal to the last; and (3) where the current index differs from the last. There are a number of ways to approach this, but a short bash script could look like the following:
#!/bin/bash
fn="${1:-/dev/stdin}" ## accept filename or stdin
[ -r "$fn" ] || { ## validate file is readable
printf "error: file not found: '%s'\n" "$fn"
exit 1
}
declare -i cnt=0 ## flag for 1st iteration
while read -r line; do ## for each line in file
## read header, print & continue
[ "${line//,*/}" = ID ] && printf "%s\n" "$line" && continue
line="${line%% //*}" ## strip trailing " //first scan of A" comment
idx=${line//,*/} ## parse file index from line
line="${line#*, }" ## strip index
if [ $cnt -eq 0 ]; then ## if first line - print
printf "%s, %s" "$idx" "$line"
((cnt++))
elif [ "$idx" = "$lidx" ]; then ## if indexes equal, append
printf ", %s" "$line"
else ## else, newline & print
printf "\n%s, %s" "$idx" "$line"
fi
last="$line" ## save last line
lidx=$idx ## save last index
done <"$fn"
printf "\n"
Input
$ cat dat/cmbcsv.dat
ID, CC_area, CC_perimeter, CC_circularity
024_S_0985, 407.00, 192.15, 0.138530 //first scan of A
024_S_0985, 437.50, 204.80, 0.131074 //second scan of A
024_S_0985, 400.75, 198.80, 0.127420 //third scan of A
024_S_1063, 544.50, 214.34, 0.148939 //first and only scan of B
024_S_1171, 654.75, 240.33, 0.142453 //first scan of C
024_S_1171, 659.50, 242.21, 0.141269 //second scan of C
Output
$ bash cmbcsv.sh dat/cmbcsv.dat
ID, CC_area, CC_perimeter, CC_circularity
024_S_0985, 407.00, 192.15, 0.138530, 437.50, 204.80, 0.131074, 400.75, 198.80, 0.127420
024_S_1063, 544.50, 214.34, 0.148939
024_S_1171, 654.75, 240.33, 0.142453, 659.50, 242.21, 0.141269
Note: I didn't know whether you needed all the additional commas or ellipses or if they were just there to show there could be more of the same index (e.g. ,,...,). You can easily add them if need be.
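If the trailing empty columns are wanted, one way to add them is to pad each merged row with commas up to a fixed field count. The helper below is a hypothetical sketch; the default of 19 fields assumes one ID column plus 3 values for each of at most 6 scans, as described in the question.

```shell
#!/usr/bin/env bash
# pad_row ROW [FIELDS]
# Appends commas to ROW until it has FIELDS comma-separated fields
# (default 19 = 1 ID + 6 scans x 3 values, an assumption from the question).
pad_row() {
    local row=$1 want=${2:-19}
    local commas=${row//[!,]/}          # keep only the commas
    local have=$(( ${#commas} + 1 ))    # fields = commas + 1
    while (( have < want )); do
        row+=","
        have=$(( have + 1 ))
    done
    printf '%s\n' "$row"
}
```

Each merged output line could be passed through pad_row before it is printed.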
Well, if you know which scan belongs to which person, you can add an extra column such as patient name or ID, but I guess that only works if you still have the original information about how many scans there are per person.

Adding two decimal variables and assigning values in bash

we have been asked to parse a csv file and perform some operations based upon the data in the csv
I am trying to find the maximum of addition of two numbers which i get from the csv file
that is the last and second last numbers, which are decimals
Following is my code
#!/bin/bash
#this file was created on 09/03/2014
#Author = Shashank Pangam
OLDIFS=$IFS
IFS=","
maxTransport=0
while read year month hydro geo solar wind fuel1 biomassL biomassC totalRenew fuel2 biodieselT biomassT
do
while [ $year -eq 2012 ]
do
currentTransport=$(echo "$biodieselT+$biomassT" | bc)
echo $currentTransport
if (( $(echo "$currentTransport > $maxTransport" | bc -l)));
then
$maxTransport = $currentTransport
echo $maxTransport
fi
done
echo -e "Maximum amount of energy consumed by the Transportation sector for year 2012 : $maxTransport"
done < $1
and the following is my csv file
2012,January,2.614,0.356,0.006,0.021,114.362,14.128,1.308,66.74,196.539,199.536,81.791,
2012,February,2.286,0.333,0.007,0.017,107.388,13.952,1.304,61.277,183.921,186.564,81.545,
2012,March,0.356,0.009,0.02,108.268,15.588,1.404,63.444,188.705,191.318,87.827,11.187,
2012,April,,0.344,0.012,0.019,103.627,14.229,1.381,60.683,179.919,181.993,86.339,11.518,
2012,May,,0.356,0.012,0.01,109.644,13.789,1.473,63.611,188.517,190.913,92.087,12.09,
2012,June,,0.344,0.013,0.013,108.116,13.012,1.434,61.056,183.618,185.65,89.673,12.461,
2012,July,,0.356,0.017,0.008,112.426,14.035,1.403,58.057,185.921,187.61,87.707,10.464,
2012,August,0.356,0.016,0.008,113.64,14.01,1.513,60.011,189.174,190.999,94.592,11.14,
2012,September,1.513,0.344,0.015,0.01,110.84,13.435,1.324,56.047,181.647,183.528,82.814,
2012,October,1.83,0.356,0.012,0.02,111.544,15.597,1.462,57.365,185.969,188.186,91.42,
2012,November,2.022,0.344,0.01,0.014,111.808,15.594,1.326,56.793,185.521,187.911,82.919,
2012,December,1.77,0.356,0.007,0.022,116.416,15.873,1.368,58.741,192.398,194.552,85.526,
2013,January,3.021,0.357,0.007,0.018,114.601,15.309,1.334,57.31,188.553,191.956,83.415,
2013,February,3.285,0.322,0.012,0.023,102.499,13.658,1.246,52.05,169.452,173.094,77.914,
2013,March,0.357,0.016,0.025,111.594,14.538,1.419,59.096,186.646,189.884,88.713,11.938,
2013,April,,0.345,0.018,0.03,103.602,14.446,1.437,59.057,178.542,181.342,89.867,12.184,
2013,May,,0.357,0.02,0.032,108.113,14.452,1.497,62.606,186.668,190.117,93.634,13.166,
2013,June,,0.345,0.021,0.028,109.162,14.597,1.47,61.563,186.792,189.994,91.894,14.501,
2013,July,,0.357,0.018,0.024,119.154,15.018,1.45,62.037,197.659,201.027,90.689,14.523,
2013,August,0.357,0.022,0.02,113.177,15.014,1.44,60.682,190.313,192.949,90.065,13.28,
2013,September,2.185,0.345,0.021,0.026,106.912,14.367,1.411,58.901,181.591,184.168,88.254,
2013,October,2.171,0.357,0.02,0.029,109.123,15.158,1.483,64.509,190.273,192.849,92.748
The following is the error i get
./calculator.sh: line 16: 0: command not found
0
268.109
I don't understand why echo $currentTransport returns 0 while in the comparison it works and assigns value to maxTransport but throws the error for the same.
Thanks in advance.
Instead of this:
$maxTransport = $currentTransport
Try this:
maxTransport=$currentTransport
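The $ in front of a variable name expands to its contents, so $maxTransport = $currentTransport becomes 0 = 268.109 and bash tries to run a command named 0, hence "0: command not found". An assignment uses the bare variable name with no spaces around the =, which stores the contents of currentTransport in maxTransport.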
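Putting the fix in context, the whole loop might be sketched as follows. Note the assumptions: the helper names fadd, fgt and max_transport are invented for this example, awk is used for the floating-point arithmetic instead of bc, the inner while is replaced by a plain test because $year never changes inside it, and the two values are simply taken as the last two fields of each line.

```shell
#!/usr/bin/env bash
# Floating-point helpers (awk is swapped in for bc here).
fadd() { awk -v a="$1" -v b="$2" 'BEGIN { printf "%.3f\n", a + b }'; }
fgt()  { awk -v a="$1" -v b="$2" 'BEGIN { exit !(a > b) }'; }   # true if a > b

# max_transport FILE YEAR
# Prints the largest sum of the last two fields over all lines of YEAR.
max_transport() {
    local file=$1 year=$2 max=0 line current n
    local -a f
    while IFS= read -r line; do
        line=${line%,}                     # drop a trailing comma, if any
        IFS=, read -ra f <<< "$line"
        [ "${f[0]}" = "$year" ] || continue
        n=${#f[@]}
        (( n >= 2 )) || continue
        current=$(fadd "${f[n-2]}" "${f[n-1]}")
        if fgt "$current" "$max"; then
            max=$current                   # assignment: no $, no spaces
        fi
    done < "$file"
    printf '%s\n' "$max"
}
```

It would be called as max_transport "$1" 2012 from the script.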

Bash script that analyzes report files

I have the following bash script which I will use to analyze all report files in the current directory:
#!/bin/bash
# methods
analyzeStructuralErrors()
{
# do something with $1
}
# main
reportFiles=`find $PWD -name "*_report*.txt"`;
for f in $reportFiles
do
echo "Processing $f"
analyzeStructuralErrors $f
done
My report files are formatted as such:
Error Code for Issue X - Description Text - Number of errors.
col1_name,col2_name,col3_name,col4_name,col5_name,col6_name
1143-1-1411-247-1-72953-1
1143-2-1411-247-436-72953-1
2211-1-1888-204-442-22222-1
Error Code for Issue Y - Description Text - Number of errors.
col1_name,col2_name,col3_name,col4_name,col5_name,col6_name
Other data
.
.
.
I'm looking for a way to go through each file and aggregate the report data. In the above example, we have two unique issues of type X, which I would like to handle in analyzeStructural. Other types of issues can be ignored in this routine. Can anyone offer advice on how to do this? I want to read each line until I hit the next error basically, and put that data into some kind of data structure.
Below is a working awk implementation that uses its pseudo-multidimensional arrays. I've included sample output to show how it looks. I took the liberty of adding a 'Count' column to denote how many times a certain "Issue" was hit for a given Error Code.
#!/bin/bash
awk '
/Error Code for Issue/ {
errCode[currCode=$5]=$5
}
/^ +[0-9-]+$/ {
split($0, tmpArr, "-")
error[errCode[currCode],tmpArr[1]]++
}
END {
for (code in errCode) {
printf("Error Code: %s\n", code)
for (item in error) {
split(item, subscr, SUBSEP)
if (subscr[1] == code) {
printf("\tIssue: %s\tCount: %s\n", subscr[2], error[item])
}
}
}
}
' *_report*.txt
Output
$ ./report.awk
Error Code: B
Issue: 1212 Count: 3
Error Code: X
Issue: 2211 Count: 1
Issue: 1143 Count: 2
Error Code: Y
Issue: 2961 Count: 1
Issue: 6666 Count: 1
Issue: 5555 Count: 2
Issue: 5911 Count: 1
Issue: 4949 Count: 1
Error Code: Z
Issue: 2222 Count: 1
Issue: 1111 Count: 1
Issue: 2323 Count: 2
Issue: 3333 Count: 1
Issue: 1212 Count: 1
As suggested by Dave Jarvis, awk:
handles this better than bash
is fairly easy to learn
is likely available wherever bash is available
I've never had to look farther than The AWK Manual.
It would make things easier if you used a consistent field separator for both the list of column names and the data. Perhaps you could do some pre-processing in a bash script using sed before feeding to awk. Anyway, take a look at multi-dimensional arrays and reading multiple lines in the manual.
Bash has one-dimensional arrays that are indexed by integers. Bash 4 adds associative arrays. That's it for data structures. AWK has one dimensional associative arrays and fakes its way through two dimensional arrays. If you need some kind of data structure more advanced than that, you'll need to use Python, for example, or some other language.
That said, here's a rough outline of how you might parse the data you've shown.
#!/bin/bash
# methods
analyzeStructuralErrors()
{
local f=$1
local Xpat="Error Code for Issue X"
local notXpat="Error Code for Issue [^X]"
local flag=false line issue field
local -a issues=() columns=() array=()
while IFS= read -r line
do
if [[ $line =~ $Xpat ]]
then
flag=true
elif [[ $line =~ $notXpat ]]
then
flag=false
elif $flag && [[ $line =~ , ]]
then
# columns could be overwritten if there are more than one X section
IFS=, read -ra columns <<< "$line"
elif $flag && [[ $line =~ - ]]
then
issues+=("$line") # store the line itself, not the literal word "line"
else
echo "unrecognized data line"
echo "$line"
fi
done < "$f" # read from the report file rather than stdin
for issue in "${issues[@]}"
do
IFS=- read -ra array <<< "$issue"
# do something with ${array[0]}, ${array[1]}, etc.
# or iterate
for field in "${array[@]}"
do
: # do something with $field
done
done
}
# main
find . -name "*_report*.txt" | while read -r f
do
echo "Processing $f"
analyzeStructuralErrors "$f"
done
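As an illustration of the Bash 4 associative arrays mentioned above, the following sketch counts how often each leading number occurs in the "Issue X" sections of one report file. It is an assumption-laden example rather than a full solution: the function name count_issue_x is invented, it requires bash 4 or later, and it only looks at the first dash-separated field of each data row.

```shell
#!/usr/bin/env bash
# count_issue_x FILE
# Prints "<first field> <count>" for data rows found in "Issue X" sections.
count_issue_x() {
    local file=$1 line key in_x=false
    local re='^[[:space:]]*([0-9]+)-'     # leading number before the first dash
    local -A counts=()
    while IFS= read -r line; do
        if [[ $line == *"Error Code for Issue X"* ]]; then
            in_x=true
        elif [[ $line == *"Error Code for Issue "* ]]; then
            in_x=false                    # some other issue type starts
        elif $in_x && [[ $line =~ $re ]]; then
            key=${BASH_REMATCH[1]}
            counts[$key]=$(( ${counts[$key]:-0} + 1 ))
        fi
    done < "$file"
    for key in "${!counts[@]}"; do
        printf '%s %s\n' "$key" "${counts[$key]}"
    done | sort                           # array order is otherwise unspecified
}
```

For the sample report in the question this would report 1143 twice and 2211 once.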
