Bash script processing too slow - bash

I have the following script where I'm parsing 2 csv files to find a MATCH the files have 10000 lines each one. But the processing is taking a long time!!! Is this normal?
My script:
#!/bin/bash
IFS=$'\n'
CSV_FILE1=$1;
CSV_FILE2=$2;
sort -t';' $CSV_FILE1 >> Sorted_CSV1
sort -t';' $CSV_FILE2 >> Sorted_CSV2
echo "PATH1 ; NAME1 ; SIZE1 ; CKSUM1 ; PATH2 ; NAME2 ; SIZE2 ; CKSUM2" >> 'mapping.csv'
while read lineCSV1 #Parse 1st CSV file
do
PATH1=`echo $lineCSV1 | awk '{print $1}'`
NAME1=`echo $lineCSV1 | awk '{print $3}'`
SIZE1=`echo $lineCSV1 | awk '{print $7}'`
CKSUM1=`echo $lineCSV1 | awk '{print $9}'`
while read lineCSV2 #Parse 2nd CSV file
do
PATH2=`echo $lineCSV2 | awk '{print $1}'`
NAME2=`echo $lineCSV2 | awk '{print $3}'`
SIZE2=`echo $lineCSV2 | awk '{print $7}'`
CKSUM2=`echo $lineCSV2 | awk '{print $9}'`
# Test if NAM1 MATCHS NAME2
if [[ $NAME1 == $NAME2 ]]; then
#Test checksum OF THE MATCHING NAME
if [[ $CKSUM1 != $CKSUM2 ]]; then
#MAPPING OF THE MATCHING LINES
echo $PATH1 ';' $NAME1 ';' $SIZE1 ';' $CKSUM1 ';' $PATH2 ';' $NAME2 ';' $SIZE2 ';' $CKSUM2 >> 'mapping.csv'
fi
break #When its a match break the while loop and go the the next Row of the 1st CSV File
fi
done < Sorted_CSV2 #Done CSV2
done < Sorted_CSV1 #Done CSV1

This is a quadratic order. Also, see Tom Fenech comment: You are calling awk several times inside a loop inside another loop. Instead of using awk for the fields in every line try setting the IFS shell variable to ";" and read the fields directly in read commands:
IFS=";"
while read FIELD11 FIELD12 FIELD13; do
while read FIELD21 FIELD22 FIELD23; do
...
done <Sorted_CSV2
done <Sorted_CSV1
Though, this would be still O(N^2) and very inefficient. It seems you are matching 2 fields by a coincident field. This task is easier and faster to accomplish by using join command line utility, and would reduce order from O(N^2) to O(N).

Whenever you say "Does this file/data list/table have something that matches this file/data list/table?", you should think of associative arrays (sometimes called hashes).
An associative array is keyed by a particular value and each key is associated with a value. The nice thing is that finding a key is extremely fast.
In your loop of a loop, you have 10,000 lines in each file. You're outer loop executed 10,000 times. Your inner loop may execute 10,000 times for each and every line in your first file. That's 10,000 x 10,000 times you go through that inner loop. That's potentially looping 100 million times through that inner loop. Think you can see why your program might be a little slow?
In this day and age, having a 10,000 member associative array isn't that bad. (Imagine doing this back in 1980 on a MS-DOS system with 256K. It just wouldn't work). So, let's go through the first file, create a 10,000 member associative array, and then go through the second file looking for matching lines.
Bash 4.x has associative arrays, but I only have Bash 3.2 on my system, so I can't really give you an answer in Bash.
Besides, sometimes Bash isn't the answer to a particular issue. Bash can be a bit slow and the syntax can be error prone. Awk might be faster, but many versions don't have associative arrays. This is really a job for a higher level scripting language like Python or Perl.
Since I can't do a Bash answer, here's a Perl answer. Maybe this will help. Or, maybe this will inspire someone who has Bash 4.x can give an answer in Bash.
I Basically open the first file and create an associative array keyed by the checksum. If this is a sha1 checksum, it should be unique for all files (unless they're an exact match). If you don't have a sha1 checksum, you'll need to massage the structure a wee bit, but it's pretty much the same idea.
Once I have the associative array figured out, I then open file #2 and simply see if the checksum already exists in the file. If it does, I know I have a matching line, and print out the two matches.
I have to loop 10,000 times in the first file, and 10,000 times in the second. That's only 20,000 loops instead of 10 million that's 20,000 times less looping which means the program will run 20,000 times faster. So, if it takes 2 full days for your program to run with a double loop, an associative array solution will work in less than one second.
#! /usr/bin/env perl
#
use strict;
use warnings;
use autodie;
use feature qw(say);
use constant {
FILE1 => "file1.txt",
FILE2 => "file2.txt",
MATCHING => "csv_matches.txt",
};
#
# Open the first file and create the associative array
#
my %file_data;
open my $fh1, "<", FILE1;
while ( my $line = <$fh1> ) {
chomp $line;
my ( $path, $blah, $name, $bather, $yadda, $tl_dr, $size, $etc, $check_sum ) = split /\s+/, $line;
#
# The main key is "check_sum" which **should** be unique, especially if it's a sha1
#
$file_data{$check_sum}->{PATH} = $path;
$file_data{$check_sum}->{NAME} = $name;
$file_data{$check_sum}->{SIZE} = $size;
}
close $fh1;
#
# Now, we have the associative array keyed by the data we want to match, read file 2
#
open my $fh2, "<", FILE2;
open my $csv_fh, ">", MATCHING;
while ( my $line = <$fh2> ) {
chomp $line;
my ( $path, $blah, $name, $bather, $yadda, $tl_dr, $size, $etc, $check_sum ) = split /\s+/, $line;
#
# If there is a matching checksum in file1, we know we have a matching entry
#
if ( exists $file_data{$check_sum} ) {
printf {$csv_fh} "%s;%s:%s:%s:%s:%s\n",
$file_data{$check_sum}->{PATH}, $file_data{$check_sum}->{NAME}, $file_data{$check_sum}->{SIZE},
$path, $name, $size;
}
}
close $fh2;
close $csv_fh;
BUGS
(A good manpage always list issues!)
This assumes one match per file. If you have multiple duplicates in file1 or file2, you will only pick up the last one.
This assumes a sha256 or equivalent checksum. In such a checksum, it is extremely unlikely that two files will have the same checksum unless they match. A 16bit checksum from the historic sum command may have collisions.

Although a proper database engine would make a much better tool for this, it is still very well possible to do it with awk.
The trick is to sort your data, so that records with the same name are grouped together. Then a single pass from top to bottom is enough to find the matches. This can be done in linear time.
In detail:
Insert two columns in both CSV files
Make sure every line starts with the name. Also add a number (either 1 or 2) which denotes from which file the line originates. We will need this when we merge the two files together.
awk -F';' '{ print $2 ";1;" $0 }' csvfile1 > tmpfile1
awk -F';' '{ print $2 ";2;" $0 }' csvfile2 > tmpfile2
Concatenate the files, then sort the lines
sort tmpfile1 tmpfile2 > tmpfile3
Scan the result, report the mismatches
awk -F';' -f scan.awk tmpfile3
Where scan.awk contains:
BEGIN {
origin = 3;
}
$1 == name && $2 > origin && $6 != checksum {
print record;
}
{
name = $1;
origin = $2;
checksum = $6;
sub(/^[^;]*;.;/, "");
record = $0;
}
Putting it all together
Crammed together into a Bash oneliner, without explicit temporary files:
(awk -F';' '{print $2";1;"$0}' csvfile1 ; awk -F';' '{print $2";2;"$0}' csvfile2) | sort | awk -F';' 'BEGIN{origin=3}$1==name&&$2>origin&&$6!=checksum{print record}{name=$1;origin=$2;checksum=$6;sub(/^[^;]*;.;/,"");record=$0;}'
Notes:
If the same name appears more than once in csvfile1, then all but the last one are ignored.
If the same name appears more than once in csvfile2, then all but the first one are ignored.

Related

Grep list (file) from another file

Im new to bash and trying to extract a list of patterns from file:
File1.txt
ABC
BDF
GHJ
base.csv (tried comma separated and tab delimited)
line 1,,,,"hfhf,ferf,ju,ABC"
line 2 ,,,,,"ewy,trggt,gtg,ABC,RFR"
line 3 .."himk,n,hn.ujj., BDF"
etc
Suggested output is smth like
ABC
line 1..
line 2..(whole lines)
BDF
line 3..
and so on for each pattern from file 1
the code i tried was:
#!/bin/bash
for i in *.txt -# cycle through all files containing pattern lists
do
for q in "$i"; # # cycle through list
do
echo $q >>output.${i};
grep -f "${q}" base.csv >>output.${i};
echo "\n";
done
done
But output is only filename and then some list of strings without pattern names, e.g.
File1.txt
line 1...
line 2...
line 3..
so i don`t know to what pattern belongs each string and have to check and assign manually. Can you please point out my errors? Thanks!
grep can process multiple files in one go, and then has the attractive added bonus of indicating which file it found a match in.
grep -f File1.txt base.csv >output.txt
It's not clear what you hope for the inner loop to do; it will just loop over a single token at a time, so it's not really a loop at all.
If you want the output to be grouped per pattern, here's a for loop which looks for one pattern at a time:
while read -r pat; do
echo "$pat"
grep "$pat" *.txt
done <File1.txt >output.txt
But the most efficient way to tackle this is to write a simple Awk script which processes all the input files at once, and groups the matches before printing them.
An additional concern is anchoring. grep "ABC" will find a match in 123DEABCXYZ; is this something you want to avoid? You can improve the regex, or, again, turn to Awk which gives you more control over where exactly to look for a match in a structured line.
awk '# Read patterns into memory
NR==FNR { a[++i] = $1; next }
# Loop across patterns
{ for(j=1; j<=i; ++j)
if($0 ~ a[j]) {
print FILENAME ":" FNR ":" $0 >>output.a[j]
next }
}' File1.txt base.csv
You're not actually reading the files, you're just handling the filenames. Try this:
#!/bin/bash
for i in *.txt # cycle through all files containing pattern lists
do
while read -r q # read file line by line
do
echo "$q" >>"output.${i}"
grep -f "${q}" base.csv >>"output.${i}"
echo "\n"
done < "${i}"
done
Here is one that separates (with split, comma-separatd with quotes and spaces stripped off) words from file2 to an array (word[]) and stores the record names (line 1 etc.) to it comma-separated:
awk '
NR==FNR {
n=split($0,tmp,/[" ]*(,|$)[" ]*/) # split words
for(i=2;i<=n;i++) # after first
if(tmp[i]!="") # non-empties
word[tmp[i]]=word[tmp[i]] (word[tmp[i]]==""?"":",") tmp[1] # hash rownames
record[tmp[1]]=$0 # store records
next
}
($1 in word) { # word found
n=split(word[$1],tmp,",") # get record names
print $1 ":" # output word
for(i=1;i<=n;i++) # and records
print record[tmp[i]]
}' file2 file1
Output:
ABC:
line 1,,,,"hfhf,ferf,ju,ABC"
line 2 ,,,,,"ewy,trggt,gtg,ABC,RFR"
BDF:
line 3 .."himk,n,hn.ujj., BDF"
Thank you for your kind help, my friends.
Tried both variants above but kept getting various errors ( "do" expected) or misbehavior ( gets names of pattern blocks, eg ABC, BDF, but no lines.
Gave up for a while and then eventually tried another way
While base goal were to cycle through pattern list files, search for patterns in huge file and write out specific columns from lines found - i simply wrote
for *i in *txt # cycle throughfiles w/ patterns
do
grep -F -f "$i" bigfile.csv >> ${i}.out1 #greps all patterns from current file
cut -f 2,3,4,7 ${i}.out1>> ${i}.out2 # cuts columns of interest and writes them out to another file
done
I'm aware that this code should be improved using some fancy pipeline features, but it works perfectly as is, hope it`ll help somebody in similar situation. You can easily add some echoes to write out pattern list names as i initially requested

awk store a pattern result to a shell array variable

I am trying to store the result of a pattern matched by awk to a shell array variable. Here's a simplified example of the same:
#!/bin/bash
declare -a array1=()
declare -a array2=()
READ_FILE="directory1/read_file.csv"
WRITE_FILE="directory2/results.csv"
#variable for counting array index
count1=0
count2=0
#
#
# need help with line below
# $2 below is the second set of characters which is a floating point number
awk -F 'string1_to_search' '{$array1[count1++] = $2}' $READ_FILE
awk -F 'string2_to_search' '{$array2[count2++] = $2}' $READ_FILE
#count++ indicates post increment of count variable
#do something with the array
.
.
#end
any suggestions would be helpful.
Something roughly like this, then?
awk '/string1_to_search/ {
count["id1"]++; sum["id1"] += $2 }
/string2_too/ {
count["id2"]++; sum["id2"] += $2 }
# ...
END { for (k in count) printf("%s: sum %f/count %i = avg %f\n", k, sum[k], count[k], sum[k]/count[k]) }' inputfile
I seem to recall there was a clever way to calculate a rolling variance without keeping the entire input set in memory; or else just collect the values space-separated value["id"] = value["id"] " " $2 and split into a list and loop over it near the end. Alternatively, simplify this to only examine one search string at a time and run it multiple times (let's hope then the input isn't very big). Or switch to Perl, which will easily let you collect lists of lists and other nested structures.
Obviously break out common functionality into separate functions so you don't have repeated code ... I suppose it's actually clearer like this, but if you find bugs, or need other changes, you only want to have to change one place in the code.
another method to do it is making awk print the number which can be passed to an array variable in bash like this :
mapfile -t array1 < <( awk -F 'string1_to_search' '{print $2}' "$READ_FILE" )
Later for taking out mean, variance and SD we can use bc tool from within the bash

Want to sort a file based on another file in unix shell

I have 2 files refer.txt and parse.txt
refer.txt contains the following
julie,remo,rob,whitney,james
parse.txt contains
remo/hello/1.0,remo/hello2/2.0,remo/hello3/3.0,whitney/hello/1.0,julie/hello/2.0,julie/hello/3.0,rob/hello/4.0,james/hello/6.0
Now my output.txt should list the files in parse.txt based on the order specified in refer.txt
ex of output.txt should be:
julie/hello/2.0,julie/hello/3.0,remo/hello/1.0,remo/hello2/2.0,remo/hello3/3.0,rob/hello/4.0,whitney/hello/1.0,james/hello/6.0
i have tried the following code:
sort -nru refer.txt parse.txt
but no luck.
please assist me.TIA
You can do that using gnu-awk:
awk -F/ -v RS=',|\n' 'FNR==NR{a[$1] = (a[$1])? a[$1] "," $0 : $0 ; next}
{s = (s)? s "," a[$1] : a[$1]} END{print s}' parse.txt refer.txt
Output:
julie/hello/2.0,julie/hello/3.0,remo/hello/1.0,remo/hello2/2.0,remo/hello3/3.0,rob/hello/4.0,whitney/hello/1.0,james/hello/6.0
Explanation:
-F/ # Use field separator as /
-v RS=',|\n' # Use record separator as comma or newline
NR == FNR { # While processing parse.txt
a[$1]=(a[$1])?a[$1] ","$0:$0 # create an array with 1st field as key and value as all the
# records with keys julie, remo, rob etc.
}
{ # while processing the second file refer.txt
s = (s)?s "," a[$1]:a[$1] # aggregate all values by reading key from 2nd file
}
END {print s } # print all the values
In pure native bash (4.x):
# read each file into an array
IFS=, read -r -a values <parse.txt
IFS=, read -r -a ordering <refer.txt
# create a map from content before "/" to comma-separated full values in preserved order
declare -A kv=( )
for value in "${values[#]}"; do
key=${value%%/*}
if [[ ${kv[$key]} ]]; then
kv[$key]+=",$value" # already exists, comma-separate
else
kv[$key]="$value"
fi
done
# go through refer list, putting full value into "out" array for each entry
out=( )
for value in "${ordering[#]}"; do
out+=( "${kv[$value]}" )
done
# print "out" array in comma-separated form
IFS=,
printf '%s\n' "${out[*]}" >output.txt
If you're getting more output fields than you have input fields, you're probably trying to run this with bash 3.x. Since associative array support is mandatory for correct operation, this won't work.
tr , "\n" refer.txt | cat -n >person_id.txt # 'cut -n' not posix, use sed and paste
cat person_id.txt | while read person_id person_key
do
print "$person_id" > $person_key
done
tr , "\n" parse.txt | sed 's/(^[^\/]*)(\/.*)$/\1 \1\2/' >person_data.txt
cat person_data.txt | while read foreign_key person_data
do
person_id="$(<$foreign_key)"
print "$person_id" " " "$person_data" >>merge.txt
done
sort merge.txt >output.txt
A text book data processing approach, a person id table, a person data table, merged on a common key field, which is the first name of the person:
[person_key] [person_id]
- person id table, a unique sortable 'id' for each person (line number in this instance, since that is the desired sort order), and key for each person (their first name)
[person_key] [person_data]
- person data table, the data for each person indexed by 'person_key'
[person_id] [person_data]
- a merge of the 'person_id' table and 'person_data' table on 'person_key', which can then be sorted on person_id, giving the output as requested
The trick is to implement an associative array using files, the file name being the key (in this instance 'person_key'), the content being the value. [Essentially a random access file implemented using the filesystem.]
This actually adds a step to the otherwise simple but not very efficient task of grepping parse.txt with each value in refer.txt - which is more efficient I'm not sure.
NB: The above code is very unlikely to work out of the box.
NBB: On reflection, probably a better way of doing this would be to use the file system to create a random access file of parse.txt (essentially an index), and to then consider refer.txt as a batch file, submitting it as a job as such, printing out from the parse.txt random access file the data for each of the names read in from refer.txt in turn:
# 1) index data file on required field
cat person_data.txt | while read data
do
key="$(print "$data" | sed 's/(^[^\/]*)/\1/')" # alt. `cut -d'/' -f1` ??
print "$data" >>./person_data/"$key"
done
# 2) run batch job
cat refer_data.txt | while read key
do
print ./person_data/"$key"
done
However having said that, using egrep is probably just as rigorous a solution or at least for small datasets, I would most certainly use this approach given the specific question posed. (Or maybe not! The above could well prove faster as well as being more robust.)
Command
while read line; do
grep -w "^$line" <(tr , "\n" < parse.txt)
done < <(tr , "\n" < refer.txt) | paste -s -d , -
Key points
For both files, newlines are translated to commas using the tr command (without actually changing the files themselves). This is useful because while read and grep work under the assumption that your records are separated by newlines instead of commas.
while read will read in every name from refer.txt, (i.e julie, remo, etc.) and then use grep to retrieve lines from parse.txt containing that name.
The ^ in the regex ensures matching is only performed from the start of the string and not in the middle (thanks to #CharlesDuffy's comment below), and the -w option for grep allows whole-word matching only. For example, this ensures that "rob" only matches "rob/..." and not "robby/..." or "throb/...".
The paste command at the end will comma-separate the results. Removing this command will print each result on its own line.

combining multiple grep searches and making my script more efficient

I have a file called Type1.txt, that looks like this:
$ cat Type1.txt
ID.580.G3C0
TTTTTTTTTTT
ID.580.G3C8
ATTATATC-AAA
ID.580.GXC16
ATTATTTC-ACG-TTTTTCCTA
ID.694.G9C3
ATTATATC-ACG-AAATCCTA
ID.694.G9C3
etc...
I want to write a bash script to count the instances of each ID and export it into another file that provides a summary, something like this:
ID.580 = 3
ID.694 = 1
etc...
So far the script is messy and unusable.
For the above I have the following:
#!/bin/bash
for Count in `grep -c "ID.580" Type1.txt; do
echo $Count=ID.580
done > Result.txt #Allows to count only for that single ID.
I have over a thousand ID.XXX, making this code unusable since it's not plausible to add individual ID.XXX for each search. Thank you for the help!
Shell
The code below uses the standard UNIX utilities, and does not assume that the second part of the ID is exactly 3 characters, but will find ID.1.123123123 and ID.1234.123123 and properly only take the first dot-delimited part. As it is
grep '^ID\.[0-9]' Type1.txt | cut -d . -f 1-2 | sort \
| uniq -c | awk '{ print $2" = "$1 }'
grep filters only lines beginning with ID. followed by 1 digit (at least)
cut uses . as the field delimiter, and only outputting fields 1 and 2, thus removing
everything after and including the second . on the line.
sort sorts the lines for uniq to work
uniq prints each line from its input prefixed with a count
awk part reverses these fields and prints them separated with =.
If the first part of the ID can contain letters too, change the end of regular expression to [0-9] to [0-9A-Z]. for example
The pipeline outputs
ID.580 = 3
ID.694 = 2
Python
As the Python is popular among biologists, you might want to hone your python skills instead:
from collections import Counter
counter = Counter()
with open('Type1.txt') as f:
for line in f:
if line.startswith('ID.'):
top_id = '.'.join(line.split('.', 2)[:2])
counter[top_id] += 1
for top_id, count in sorted(counter.items()):
print("%s = %d" % (top_id, count))
The results are exactly identical.
grep '^ID.[0-9][0-9][0-9]' input_file | cut -c1-6 | sort | uniq -c
works?
TL;DR
Given your particular corpus and grouping strategy, there's more than one way to get the results you need. Here are two alternative solutions, one in awk, and one in Ruby.
GNU awk
One way is to use GNU awk to perform the following steps:
match just the ID lines
split matching input lines into fields
select and print the fields you need
sort the lines in the filtered result
count the adjacent duplicates
perform any specialized formatting on the result
For example:
$ awk '/^ID/ {split($0, a, "."); print a[1] "." a[2]}' /tmp/foo |
sort | uniq --count | awk '{print $2 " = " $1}'
ID.580 = 3
ID.694 = 2
With the corpus you provided in your question, this takes an average of 8 ms on my system. A larger corpus will take longer, of course, but unless you have a really huge data set this should be fast enough for most purposes.
Ruby
Ruby offers what I consider a more elegant solution, but is in fact slower. The idea here is to store the relevant portion of your IDs as hash keys, and increment a counter each time you encounter a given ID. For example, consider this Ruby one-liner:
$ ruby -ne 'BEGIN { id = Hash.new(0) }
id[$&] += 1 if /\AID\.\d+/
END { id.each_pair do |k,v| puts "#{k} = #{v}" end }' /tmp/foo
ID.580 = 3
ID.694 = 2
This solution takes around 45 ms to process the same corpus, so I wouldn't recommend it over the awk pipeline just for transforming output. The main advantage to doing it this way is that you have an actual data structure (e.g. a Hash object) that you could manipulate in a more full-featured program.
Here is awk one liner:
$ awk -F. '$1=="ID"{a[$2,$3]++}END{for (i in a) {split(i,ind,SUBSEP); r[ind[1]]++}for (i in r) print "ID."i" = "r[i]}' file
ID.694 = 1
ID.580 = 3
And here is a pure bash solution:
#!/bin/bash
while IFS=. read -r pre id code rest
do
[[ $pre == ID ]] || continue
[[ ${a[$id]} =~ \."$code"\. ]] || {
a[$id]="${a[$id]}.$code."
((count[$id]++));
}
done < file
for i in "${!count[#]}"
do
echo "ID.$i = ${count[$i]}"
done
$ ./script.sh
ID.580 = 3
ID.694 = 1
awk might work too...
awk '/ID.580/{x++}END{print x}' test.txt
You can put this in a for loop
for i in ID.580 ID.694
do
awk '/'$i'/{x++}END{print x}' test.txt
done

How to interate based on words in text? (Shell Scripting)

I have a file currently in the form
location1 attr attr ... attr
location2 attr attr ... attr
...
locationn attr atrr ... attr
What I want to do is go through each line, grab the location (first field) then iterate through the attributes. So far I know how to grab the first field, but not iterate through the attributes. There are also a different number of attributes for each line.
TEMP_LIST=$DIR/temp.list
while read LINE
do
x=`echo $LINE | awk '{print $1}'`
echo $x
done<$TEMP_LIST
Can someone tell me how to iterate through the attributes?
I want to get the effect like
while read LINE
do
location=`echo $LINES |awk '{print $1}'`
for attribute in attributes
do something involving the $location for the line and each individual $attribute
done<$TEMP_LIST
I am currently working in ksh shell, but any other unix shell is fine, I will find out how to translate. I am really grateful if someone could help as it would save me alot of time.
Thank you.
Similar to DreadPirateShawn's solution, but a bit simpler:
while read -r location all_attrs; do
read -ra attrs <<< "$all_attrs"
for attr in "${attrs[#]}"; do
: # do something with $location and $attr
done
done < inputfile
The second read line makes use of bash's herestring feature.
This might work in other shells too, but here's an approach that works in Bash:
#!/bin/bash
TEMP_LIST=temp.list
while read LINE
do
# Split line into array using space as delimiter.
IFS=' ' read -a array <<< $LINE
# Use first element of array as location.
location=${array[0]}
echo "First param: $location"
# Remove first element from array.
unset array[0]
# Loop through remaining array elements.
for i in "${array[#]}"
do
echo " Value: $i"
done
done < $TEMP_LIST
As you're already using awk in your posted code, why not learn how to use awk, as it is designed for this sort of problem.
while read LINE
do
location=`echo $LINES |awk '{print $1}'`
for attribute in attributes
do something involving the $location for the line and each individual $attribute
done<$TEMP_LIST
is written in awk as
#!/bin/bash
tempList="MyTempList.txt"
awk '{ # implied while loop for input records by default
location=$1
print "location=" location # location as a "header"
for (i=2;i<NF;i++) {
printf("attr%d=%s\t", i, $i) # print each attr with its number
}
printf("\n") # add new-line char to end of each line of attributes
}' ${tempList}
If you want to save your output, use awk '{.....}' ${tempList}> ${tempList}.new
Awk has numerous vars that it sets as it reads your files. NF mean NumberOfFields for the current line. So the for loop, starts at field 2, and prints all remaining fields on that line in the format provided (change to suit your needs). The i<=NF drives the ability to print all elems on a line.
Sometimes you'll want the 3rd to last elem on line, so you can perform math on the value stored in NF, like thirdFromLast=$(NF-3). For all variables that are numbers, you can "dereference" it as a value, and ask awk to print the value stored of the $N(th) field. i.e. try
print "thirdFromLast="(NF-3)
print "thirdFromLast="$(NF-3)
... to see the difference that the $ makes on a variable that holds a number.
(For large amounts of data, 1 awk process will be considerably more efficient that using subprocesses to gather parts of files.)
Also work your way through this tutorial grymoire's awk tutorial
IHTH

Resources