Storing multiple columns of data from a file in a variable - bash

I'm trying to read a file and pull two important pieces of data out of each line for use in a bash script: a string and then a number, for example:
Box 12
Toy 85
Dog 13
Bottle 22
I was thinking I could write a while loop to loop through the file and store the data in a variable. However, I need two different variables, one for the number and one for the word. How do I get them separated into two variables?

Example code:
#!/bin/bash
declare -a textarr numarr
while read -r text num; do
    textarr+=("$text")
    numarr+=("$num")
done <file
echo "${textarr[1]} ${numarr[1]}"    # will print: Toy 85
The data are stored in two array variables: textarr and numarr.
You can access a single element by index with ${textarr[$index]}, or all of them at once with ${textarr[@]}.
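For instance, to walk the two arrays in lockstep (a minimal sketch built on the arrays above; ${!textarr[@]} expands to the list of indices):
for i in "${!textarr[@]}"; do
    printf '%s -> %s\n' "${textarr[i]}" "${numarr[i]}"
done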

To read all the data into a single associative array (in bash 4.0 or newer):
#!/bin/bash
declare -A data=( )
while read -r key value; do
    data[$key]=$value
done <file
With that done, you can retrieve a value by key efficiently:
echo "${data[Box]}"
...or iterate over all keys:
for key in "${!data[@]}"; do
    value=${data[$key]}
    echo "Key $key has value $value"
done
You'll note that read takes multiple names on its argument list. When given more than one argument, it splits fields by IFS, putting columns into their respective variables (with the entire rest of the line going into the last variable named, if more columns exist than variables are named).
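A quick standalone illustration of that splitting behavior (the sample words are made up):
read -r first second rest <<< "alpha beta gamma delta"
echo "$first"     # alpha
echo "$second"    # beta
echo "$rest"      # gamma delta -- the whole remainder lands in the last name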

Here I provide my own solution, which should be discussed; I am not sure whether it is a good one. Note that piping into a while read loop runs the loop in a subshell, so such a loop cannot update a variable defined outside it (redirecting input with done <file, as above, avoids that). Here is example code which you can modify to suit your own needs. If you have more columns of data to use, a slight adjustment is needed.
#!/bin/sh
res=$(awk 'BEGIN{OFS=" "}{print $2, $3}' mytabularfile.tab)
n=0
for x in $res; do
    row=$(expr $n / 2)
    col=$(expr $n % 2)
    #echo "row: $row column: $col value: $x"
    if [ $col -eq 0 ]; then
        if [ $n -gt 0 ]; then
            echo "row: $row "
            echo col1=$col1 col2=$col2
        fi
        col1=$x
    else
        col2=$x
    fi
    n=$(expr $n + 1)
done
row=$(expr $row + 1)
echo "last row: $row col1=$col1 col2=$col2"

Related

Bash: checking substring increments with modular arithmetic

I have a list of files whose names contain a 6-digit substring representing HHMMSS (HH: 2-digit hour, MM: 2-digit minutes, SS: 2-digit seconds).
If the list of files is ordered, the increments should come in steps of 30 minutes: the first substring should be 000000, followed by 003000, 010000, 013000, ..., 233000.
I want to check that no file is missing by iterating over the list of files and verifying that none of these substrings is absent. My approach:
string_check=000000
for file in "${file_list[@]}"; do
    if [[ ${file:22:6} == $string_check ]]; then
        echo "Ok"
    else
        echo "Problem: an hour (file) is missing"
        exit 99
    fi
    string_check=$((string_check+3000)) # this is the key line
done
The line before last is the key. The result should be formatted to 6 digits (I know how to do that), but I want to add time like a clock, or, more specifically, do modular arithmetic modulo 60. How can that be done?
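For the record, the literal clock arithmetic can be done by splitting HHMMSS, carrying minutes into hours, and reformatting. A sketch of that approach (the 10# prefix keeps leading zeros from being read as octal):
string_check=000000
for ((i = 0; i < 48; i++)); do
    echo "$string_check"
    hh=$((10#${string_check:0:2}))
    mm=$((10#${string_check:2:2} + 30))    # add 30 minutes
    hh=$(( (hh + mm / 60) % 24 ))          # carry into hours, wrap at midnight
    mm=$(( mm % 60 ))                      # minutes modulo 60
    printf -v string_check '%02d%02d00' "$hh" "$mm"
done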
Assumptions:
all 6-digit strings are of the format xx[03]000 (ie, minutes must be an even 00 or 30, and seconds must be 00)
strings like xx1529 will be ignored (see the 2nd half of the answer, the use of comm, which addresses OP's comment about such strings being an error)
Instead of trying to do a bunch of mod 60 math for the MM (minutes) portion of the string, we can use a sequence generator to generate all the desired strings:
$ for string_check in {00..23}{00,30}00; do echo $string_check; done
000000
003000
010000
013000
... snip ...
230000
233000
While OP should be able to add this to the current code, I'm thinking we might go one step further and look at pre-parsing all of the filenames, pulling the 6-digit strings into an associative array (ie, the 6-digit strings act as the indexes), eg:
unset myarray
declare -A myarray
for file in "${file_list[@]}"
do
    myarray[${file:22:6}]+=" ${file}"    # in case multiple files have the same 6-digit string
done
Using the sequence generator as the driver of our logic, we can pull this together like so:
for string_check in {00..23}{00,30}00
do
    [[ -z "${myarray[${string_check}]}" ]] &&
        echo "Problem: (file) '${string_check}' is missing"
done
NOTE: OP can decide if the process should finish checking all strings or if it should exit on the first missing string (per OP's current code).
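If exiting on the first missing string is preferred, a minimal variant of the same loop:
for string_check in {00..23}{00,30}00; do
    if [[ -z "${myarray[$string_check]}" ]]; then
        echo "Problem: an hour (file) is missing: $string_check"
        exit 99
    fi
done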
One idea for using comm to compare the 2 lists of strings:
# display sequence-generated strings that do not exist in the array:
comm -23 <(printf "%s\n" {00..23}{00,30}00) <(printf "%s\n" "${!myarray[@]}" | sort)
# OP has commented that strings not of the form xx[03]000 should generate an error;
# display strings (extracted from file names) that do not exist in the sequence:
comm -13 <(printf "%s\n" {00..23}{00,30}00) <(printf "%s\n" "${!myarray[@]}" | sort)
Where:
comm -23 - display only the lines from the first 'file' that do not exist in the second 'file' (ie, missing sequences of the format xx[03]000)
comm -13 - display only the lines from the second 'file' that do not exist in the first 'file' (ie, filenames with strings not of the format xx[03]000)
These lists could then be used as input to a loop, or passed to xargs, for additional processing as needed; keeping in mind the comm -13 output will display the indices of the array, while the associated contents of the array will contain the name of the original file(s) from which the 6-digit string was derived.
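For example, a sketch of that follow-on processing, feeding the first comm listing into a loop:
while read -r missing; do
    echo "no file found for time ${missing}"
done < <(comm -23 <(printf '%s\n' {00..23}{00,30}00) <(printf '%s\n' "${!myarray[@]}" | sort))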
This is easy to do with POSIX shell, using only built-ins:
#!/usr/bin/env sh
# Print an x for each glob-matched file, and store the result in string_check
string_check=$(printf '%.0sx' ./*[0-2][0-9][03]000*)
# Now the length of string_check reflects the number of matches
if [ ${#string_check} -eq 48 ]; then
    echo "Ok"
else
    echo "Problem: an hour (file) is missing"
    exit 99
fi
Alternatively:
#!/usr/bin/env sh
if [ "$(printf '%.0sx' ./*[0-2][0-9][03]000*)" \
     = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' ]; then
    echo "Ok"
else
    echo "Problem: an hour (file) is missing"
    exit 99
fi
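One caveat worth noting: POSIX sh passes an unmatched glob through literally, so with zero matching files the pattern itself becomes the single argument and still yields one x. A small guard (a sketch resting on that behavior) can detect the empty case up front:
set -- ./*[0-2][0-9][03]000*
if [ "$#" -eq 1 ] && [ ! -e "$1" ]; then
    echo "Problem: no matching files at all"
    exit 99
fi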

How to iterate over two strings simultaneously ksh

I'm using data returned by another person's ksh93 script, printed to standard output. Depending on the flag I give it, their script gives me the information I need for my code. The output comes out as a space-separated list, such that a run of the program has the format:
"1 3 4 7 8"
"First Third Fourth Seventh Eighth"
For what I'm working on, I need to be able to match the entries of each output, so that I could make the information print in the following format:
1:First
3:Third
4:Fourth
7:Seventh
8:Eighth
I need to do more with the data than just print it; I need to be able to access the pairs of information across the two strings. Although the actual contents of the strings can be any number of values, the two strings I get from running the other script will always be the same length.
I'm wondering if there exists a way to iterate over both at the same time, something along the lines of:
str_1=$(other_script -f)
str_2=$(other_script -i)
for a,b in ${str_1},${str_2} ; do
    print "${a}:${b}"
done
This obviously isn't the right syntax, but I have been unable to find a way to make it work. Is there a way to iterate over both at the same time?
I know I could convert them to arrays first then iterate by numerical element, but I would like to save the time of converting them if there's a way to iterate over both simultaneously.
Why do you think it is not quick to convert the strings to arrays?
For example:
#!/bin/ksh93
set -u
set -A line1
string1="1 3 4 7 8"
line1+=( ${string1} )
set -A line2
string2="First Third Fourth Seventh Eighth"
line2+=( ${string2} )
typeset -i num_elem_line1=${#line1[@]}
typeset -i num_elem_line2=${#line2[@]}
typeset -i loop_counter=0
if (( num_elem_line1 == num_elem_line2 ))
then
    while (( loop_counter < num_elem_line1 ))
    do
        print "${line1[${loop_counter}]}:${line2[${loop_counter}]}"
        (( loop_counter += 1 ))
    done
fi
As with the other comments, not sure why an array would be out of the question, especially if you plan on referencing the individual elements more than once later in your code.
A sample script that assumes you want to maintain your str_1/str_2 variables as strings; we'll load them into arrays for referencing individual elements:
$ cat testme
#!/bin/ksh
str_1="1 3 4 7 8"
str_2="First Third Fourth Seventh Eighth"
str1=( ${str_1} )
str2=( ${str_2} )
# at this point matching array elements have the same index (0..4) ...
echo "++++++++++ str1[index]=element"
for i in "${!str1[@]}"
do
    echo "str1[${i}]=${str1[${i}]}"
done
echo "++++++++++ str2[index]=element"
for i in "${!str2[@]}"
do
    echo "str2[${i}]=${str2[${i}]}"
done
# since matching array elements have the same index, we just need
# to loop through one set of indexes to allow us to access matching
# array elements at the same time ...
echo "++++++++++ str1:str2"
for i in "${!str1[@]}"
do
    echo "${str1[${i}]}:${str2[${i}]}"
done
echo "++++++++++"
And a run of the script:
$ testme
++++++++++ str1[index]=element
str1[0]=1
str1[1]=3
str1[2]=4
str1[3]=7
str1[4]=8
++++++++++ str2[index]=element
str2[0]=First
str2[1]=Third
str2[2]=Fourth
str2[3]=Seventh
str2[4]=Eighth
++++++++++ str1:str2
1:First
3:Third
4:Fourth
7:Seventh
8:Eighth
++++++++++
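That said, if loading into arrays really must be avoided, the positional parameters can stand in for one side of the pairing. A minimal ksh93 sketch (str_1 and str_2 as in the question):
set -- ${str_1}           # load the first list into $1, $2, ...
for b in ${str_2}; do     # walk the second list
    print "${1}:${b}"
    shift                 # advance the first list in step
done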

bash script to compare number inside the 2 files

I want to compare 2 numbers from two different files using a Bash script. The files are tmp$i and tmp$(($i-1)). The script I have tried, below, is not working:
#!/bin/bash
for i in `seq 1 5`
do
    if [ $tmp$i -lt $tmp$(($i-1)) ]; then
        cat tmp$i >> inf
    else
        cat tmp$i >> sup
    fi
done
Sample data
Tmp1:
0.8856143905954186 0.8186070632371812 0.7624440603372680 0.7153352945456424 0.6762383806114797 0.6405457936981878
Tmp2:
0.5809579333203458 0.5567050091247218 0.5329405222386163 0.5115305043007474 0.4963898045543342 0.4846139486344327
You are not setting $tmp, so you end up simply comparing whether i is smaller than i-1, which of course it isn't.
Removing the dollar sign nominally fixes that, but then you just compare the literal strings tmp2 and tmp1 (for which a numeric comparison isn't well-defined, so in practice the test is always false), without accessing the contents of the files named by those strings. tmp2 is neither numerically larger nor smaller than tmp1. (Bash can perform lexical comparison, but test ... -lt isn't the tool to do that.)
Try this instead:
if [ $(cat "tmp$i") -lt $(cat "tmp$((i - 1))") ]; then
In response to the observation that you want to do this on decimal numbers, you need a different tool, because Bash only supports integer arithmetic. My approach would be to write a simple Awk script which performs the comparison.
In order to be able to use it as a conditional, it should exit(0) if the condition is true, exit(1) otherwise.
In order to keep the main script readable, I would encapsulate it in a function, like this:
smaller_first_line () {
    # succeeds (exit 0) when the first number in file $1 is smaller than that in $2
    awk 'NR==1 && FNR==1 { i=$1; next } FNR==1 { exit($1 < i) }' "$1" "$2"
}

if smaller_first_line "tmp$i" "tmp$((i - 1))"; then
    cat "tmp$i" >> inf
else
    cat "tmp$i" >> sup
fi
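A related idiom, shown as a sketch for comparison: pull the first number out of each file, then let a tiny awk program do the floating-point test (exit !(a < b) makes awk's exit status follow the shell's success/failure convention):
a=$(awk 'NR==1 { print $1; exit }' "tmp$i")
b=$(awk 'NR==1 { print $1; exit }' "tmp$((i - 1))")
if awk -v a="$a" -v b="$b" 'BEGIN { exit !(a < b) }'; then
    cat "tmp$i" >> inf
else
    cat "tmp$i" >> sup
fi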

Iterate over lists embedded as values in key/value pairs in bash

I'm trying to get a (key,multiple-value) structure (some sort of hashmap) in bash, like this :
[
[ "abc" : 1, 2, 3, 4 ],
[ "def" : "w", 33, 2 ]
]
I'd like to iterate through each key (some kind of for key in ...) and get each value with something like map["def",2] or map[$key,2].
I've seen a couple of threads talking about single-value hashmap, but nothing about this issue.
I could go with N arrays, N being the number of keys in my map, each filled with every field in a row, but I'd rather avoid duplicating code as much as possible.
Thanks in advance!
Edit:
I'd like to go through the structure with something like this:
for key in "${map[@]}"; do
    echo $key # "abc" then "def"
    for value in ${map[$key,@]}; do
        ...
    done
done
Using modern bash features with the multiple-array case:
Assignment (manual):
map_abc=( 1 2 3 4 )
map_def=( w 33 2 )
Assignment (programmatic):
append() {
    local array_name="${1}_$2"; shift; shift
    declare -g -a "$array_name"
    declare -n array="$array_name"    # BASH 4.3 FEATURE
    array+=( "$@" )
}
append map abc 1 2 3 4
append map def w 33 2
Iteration (done inside a function to contain the namevar's scope):
iter() {
    for array in ${!map_@}; do
        echo "Iterating over array ${array#map_}"
        declare -n cur_array="$array"    # BASH 4.3 FEATURE
        for key in "${!cur_array[@]}"; do
            echo "$key: ${cur_array[$key]}"
        done
    done
}
iter
This can also be done without namevars, but in an uglier and more error-prone fashion. (To be clear, I believe the code given here uses eval safely, but it's easy to get wrong -- if trying to build your own implementation on this template, please be very cautious).
# Compatible with older bash (should work back through 3.x).
append() {
    local array_name="${1}_$2"; shift; shift
    declare -g -a "$array_name"
    local args_str cmd_str
    printf -v args_str '%q ' "$@"
    printf -v cmd_str "%q+=( %s )" "$array_name" "$args_str"
    eval "$cmd_str"
}
...and, to iterate in a way compatible with bash back through 3.x:
for array in ${!map_@}; do
    echo "Iterating over array ${array#map_}"
    printf -v cur_array_cmd 'cur_array=( "${%q[@]}" )' "$array"
    eval "$cur_array_cmd"
    for key in "${!cur_array[@]}"; do
        echo "$key: ${cur_array[$key]}"
    done
done
This is more computationally efficient than filtering through a single large array (the other answer given) -- and, when namevars are available, arguably results in cleaner code as well.
Do-able. The declaration is somewhat ugly:
declare -A map=(
    [abc,0]=1
    [abc,1]=2
    [abc,2]=3
    [abc,3]=4
    [def,0]=w
    [def,1]=33
    [def,2]=2
)
key="def"
i=1
echo "${map[$key,$i]}" # => 33
Iterating: helpful to keep a separate array of "keys":
keys=(abc def)
Then
for key in "${keys[@]}"; do
    echo "$key"
    for idx in "${!map[@]}"; do
        if [[ $idx == $key,* ]]; then
            n=${idx##*,}
            printf "\t%s\t%s\n" "$n" "${map["$idx"]}"
        fi
    done
done
abc
0 1
1 2
2 3
3 4
def
1 33
0 w
2 2
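If maintaining the separate keys array by hand is a chore, it can be derived from the map's own indices. A short sketch (a second associative array de-duplicates the key prefixes):
declare -A seen=()
for idx in "${!map[@]}"; do
    seen[${idx%%,*}]=1    # keep only the part before the first comma
done
keys=( "${!seen[@]}" )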

Bash script that analyzes report files

I have the following bash script which I will use to analyze all report files in the current directory:
#!/bin/bash
# methods
analyzeStructuralErrors()
{
    : # do something with $1
}
# main
reportFiles=`find $PWD -name "*_report*.txt"`;
for f in $reportFiles
do
    echo "Processing $f"
    analyzeStructuralErrors $f
done
My report files are formatted as such:
Error Code for Issue X - Description Text - Number of errors.
col1_name,col2_name,col3_name,col4_name,col5_name,col6_name
1143-1-1411-247-1-72953-1
1143-2-1411-247-436-72953-1
2211-1-1888-204-442-22222-1
Error Code for Issue Y - Description Text - Number of errors.
col1_name,col2_name,col3_name,col4_name,col5_name,col6_name
Other data
.
.
.
I'm looking for a way to go through each file and aggregate the report data. In the above example, we have two unique issues of type X, which I would like to handle in analyzeStructural. Other types of issues can be ignored in this routine. Can anyone offer advice on how to do this? I want to read each line until I hit the next error basically, and put that data into some kind of data structure.
Below is a working awk implementation that uses its pseudo-multidimensional arrays. I've included sample output to show you how it looks. I took the liberty of adding a 'Count' column to denote how many times a certain "Issue" was hit for a given Error Code.
#!/bin/bash
awk '
    /Error Code for Issue/ {
        errCode[currCode=$5]=$5
    }
    /^ +[0-9-]+$/ {
        split($0, tmpArr, "-")
        error[errCode[currCode],tmpArr[1]]++
    }
    END {
        for (code in errCode) {
            printf("Error Code: %s\n", code)
            for (item in error) {
                split(item, subscr, SUBSEP)
                if (subscr[1] == code) {
                    printf("\tIssue: %s\tCount: %s\n", subscr[2], error[item])
                }
            }
        }
    }
' *_report*.txt
Output
$ ./report.awk
Error Code: B
Issue: 1212 Count: 3
Error Code: X
Issue: 2211 Count: 1
Issue: 1143 Count: 2
Error Code: Y
Issue: 2961 Count: 1
Issue: 6666 Count: 1
Issue: 5555 Count: 2
Issue: 5911 Count: 1
Issue: 4949 Count: 1
Error Code: Z
Issue: 2222 Count: 1
Issue: 1111 Count: 1
Issue: 2323 Count: 2
Issue: 3333 Count: 1
Issue: 1212 Count: 1
As suggested by Dave Jarvis, awk:
handles this better than bash
is fairly easy to learn
is likely available wherever bash is available
I've never had to look farther than The AWK Manual.
It would make things easier if you used a consistent field separator for both the list of column names and the data. Perhaps you could do some pre-processing in a bash script using sed before feeding to awk. Anyway, take a look at multi-dimensional arrays and reading multiple lines in the manual.
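For instance, one way to do that pre-processing (a sketch that assumes, as in the sample, that data rows consist only of digits and dashes):
# convert dashes to commas, but only on all-numeric data rows,
# leaving the header and description lines alone
sed '/^ *[0-9][0-9-]*$/ s/-/,/g' "$f"
After that, the column-name line and the data rows split on the same comma separator, e.g. with awk -F,.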
Bash has one-dimensional arrays that are indexed by integers. Bash 4 adds associative arrays. That's it for data structures. AWK has one-dimensional associative arrays and fakes its way through two-dimensional arrays. If you need some kind of data structure more advanced than that, you'll need to use Python, for example, or some other language.
That said, here's a rough outline of how you might parse the data you've shown.
#!/bin/bash
# methods
analyzeStructuralErrors()
{
    local f=$1
    local Xpat="Error Code for Issue X"
    local notXpat="Error Code for Issue [^X]"
    local flag=false
    local -a issues=()
    while read -r line
    do
        if [[ $line =~ $Xpat ]]
        then
            flag=true
        elif [[ $line =~ $notXpat ]]
        then
            flag=false
        elif $flag && [[ $line =~ , ]]
        then
            # columns could be overwritten if there is more than one X section
            IFS=, read -ra columns <<< "$line"
        elif $flag && [[ $line =~ - ]]
        then
            issues+=("$line")
        else
            echo "unrecognized data line"
            echo "$line"
        fi
    done < "$f"
    for issue in "${issues[@]}"
    do
        IFS=- read -ra array <<< "$issue"
        # do something with ${array[0]}, ${array[1]}, etc.
        # or iterate
        for field in "${array[@]}"
        do
            : # do something with $field
        done
    done
}
# main
find . -name "*_report*.txt" | while read -r f
do
    echo "Processing $f"
    analyzeStructuralErrors "$f"
done
