How can I compare floats in bash - bash

Working on a script and I am currently stuck. (Still pretty new at this)
First off I have my data file, the file I am searching inside.
First field is name, second is money payed, and third is money owed.
customerData.txt
name1,500.00,1000
name2,2000,100
name3,100,100.00
Here is my bash file. Basically if the owe amount is greater than the paid amount, then print the name. Works fine for anything thats not a float. I also understand that bash doesn't handle floats and the only way to handle them is with the bc utility, but I have had no luck.
#!/bin/bash
while IFS="," read name paid owe; do
#due=$(echo "$owe - $paid" |bc -1)
#echo $due
if [ $owe -gt $paid ]; then
echo $name
fi
done < customerData.txt

To print all lines for which the third column is larger than the second:
$ awk -F, '$3>$2' customerData.txt
name1,500.00,1000
How it works
-F, tells awk that the columns are comma-separated.
$3>$2 tells awk to print any line for which the third column is larger than the second.
In more detail, $3>$2 is a condition: it evaluates to true or false. If it evaluates to true, then the action is performed. Since we didn't specify any action, awk performs the default action which is to print the line.

Related

Update csv data in bash

I am trying to create a bash script that Looks through the csv file and depending on some input that would be the first value in a row of the file, it would increase the value of the second value in the row by 10.
eg. I have the file:
Jim,0,
Henry,12,
Louise,6
And with the input "Henry", I want to change the 12 to a 24 such that the file saves the value.
I have tried using sed with this
while read type time amount
do
i=$(($i+1))
if [ $i -eq 2 ];
then
x=$time
fi
done < $INPUT
sed -i '/^Henry,/s/[^,]*/$(($x+10))/1' test.csv
so that it searches through the csv until it gets to the second row (array indexing seems to start from 1 in my tests) then it saves the time (the second value in the row) to a variable then I use sid as I found here, but it doesn't seem to work, whether it is not saving or not editing I don't know.
I tried to do it with awk, but I don't understand it enough, so if any of you would be kind enough to enlighten me I would be very grateful.
Edit: Fixed bad formatting
Thanks in advance!
awk doesn't support a "in place" replacement. You should use gawk instead. For example:
gawk -F ',' -i inplace '{ $2 = ($1 == "Henry") ? $2+10 : $2 } 1' file.csv
-F defines delimeter
{ $2 = ($1 == "Henry") ? $2+10 : $2 }
ternary operator
condition expression ? statement1 : statement2
https://www.tutorialspoint.com/awk/awk_ternary_operators.htm
if condition ? then first : else second
so, if our condition is met, we are adding 10, if not, we just rewrite $2nd field as it was before
and here is a meaning of a 1 at the end:
https://unix.stackexchange.com/questions/63891/what-is-the-meaning-of-1-at-the-end-of-an-awk-script

Convert multi-line csv to single line using Linux tools

I have a .csv file that contains double quoted multi-line fields. I need to convert the multi-line cell to a single line. It doesn't show in the sample data but I do not know which fields might be multi-line so any solution will need to check every field. I do know how many columns I'll have. The first line will also need to be skipped. I don't how much data so performance isn't a consideration.
I need something that I can run from a bash script on Linux. Preferably using tools such as awk or sed and not actual programming languages.
The data will be processed further with Logstash but it doesn't handle double quoted multi-line fields hence the need to do some pre-processing.
I tried something like this and it kind of works on one row but fails on multiple rows.
sed -e :0 -e '/,.*,.*,.*,.*,/b' -e N -e '1n;N;N;N;s/\n/ /g' -e b0 file.csv
CSV example
First name,Last name,Address,ZIP
John,Doe,"Country
City
Street",12345
The output I want is
First name,Last name,Address,ZIP
John,Doe,Country City Street,12345
Jane,Doe,Country City Street,67890
etc.
etc.
First my apologies for getting here 7 months late...
I came across a problem similar to yours today, with multiple fields with multi-line types. I was glad to find your question but at least for my case I have the complexity that, as more than one field is conflicting, quotes might open, close and open again on the same line... anyway, reading a lot and combining answers from different posts I came up with something like this:
First I count the quotes in a line, to do that, I take out everything but quotes and then use wc:
quotes=`echo $line | tr -cd '"' | wc -c` # Counts the quotes
If you think of a single multi-line field, knowing if the quotes are 1 or 2 is enough. In a more generic scenario like mine I have to know if the number of quotes is odd or even to know if the line completes the record or expects more information.
To check for even or odd you can use the mod operand (%), in general:
even % 2 = 0
odd % 2 = 1
For the first line:
Odd means that the line expects more information on the next line.
Even means the line is complete.
For the subsequent lines, I have to know the status of the previous one. for instance in your sample text:
First name,Last name,Address,ZIP
John,Doe,"Country
City
Street",12345
You can say line 1 (John,Doe,"Country) has 1 quote (odd) what means the status of the record is incomplete or open.
When you go to line 2, there is no quote (even). Nevertheless this does not mean the record is complete, you have to consider the previous status... so for the lines following the first one it will be:
Odd means that record status toggles (incomplete to complete).
Even means that record status remains as the previous line.
What I did was looping line by line while carrying the status of the last line to the next one:
incomplete=0
cat file.csv | while read line; do
quotes=`echo $line | tr -cd '"' | wc -c` # Counts the quotes
incomplete=$((($quotes+$incomplete)%2)) # Check if Odd or Even to decide status
if [ $incomplete -eq 1 ]; then
echo -n "$line " >> new.csv # If line is incomplete join with next
else
echo "$line" >> new.csv # If line completes the record finish
fi
done
Once this was executed, a file in your format generates a new.csv like this:
First name,Last name,Address,ZIP
John,Doe,"Country City Street",12345
I like one-liners as much as everyone, I wrote that script just for the sake of clarity, you can - arguably - write it in one line like:
i=0;cat file.csv|while read l;do i=$((($(echo $l|tr -cd '"'|wc -c)+$i)%2));[[ $i = 1 ]] && echo -n "$l " || echo "$l";done >new.csv
I would appreciate it if you could go back to your example and see if this works for your case (which you most likely already solved). Hopefully this can still help someone else down the road...
Recovering the multi-line fields
Every need is different, in my case I wanted the records in one line to further process the csv to add some bash-extracted data, but I would like to keep the csv as it was. To accomplish that, instead of joining the lines with a space I used a code - likely unique - that I could then search and replace:
i=0;cat file.csv|while read l;do i=$((($(echo $l|tr -cd '"'|wc -c)+$i)%2));[[ $i = 1 ]] && echo -n "$l ~newline~ " || echo "$l";done >new.csv
the code is ~newline~, this is totally arbitrary of course.
Then, after doing my processing, I took the csv text file and replaced the coded newlines with real newlines:
sed -i 's/ ~newline~ /\n/g' new.csv
References:
Ternary operator: https://stackoverflow.com/a/3953666/6316852
Count char occurrences: https://stackoverflow.com/a/41119233/6316852
Other peculiar cases: https://www.linuxquestions.org/questions/programming-9/complex-bash-string-substitution-of-csv-file-with-multiline-data-937179/
TL;DR
Run this:
i=0;cat file.csv|while read l;do i=$((($(echo $l|tr -cd '"'|wc -c)+$i)%2));[[ $i = 1 ]] && echo -n "$l " || echo "$l";done >new.csv
... and collect results in new.csv
I hope it helps!
If Perl is your option, please try the following:
perl -e '
while (<>) {
$str .= $_;
}
while ($str =~ /("(("")|[^"])*")|((^|(?<=,))[^,]*((?=,)|$))/g) {
if (($el = $&) =~ /^".*"$/s) {
$el =~ s/^"//s; $el =~ s/"$//s;
$el =~ s/""/"/g;
$el =~ s/\s+(?!$)/ /g;
}
push(#ary, $el);
}
foreach (#ary) {
print /\n$/ ? "$_" : "$_,";
}' sample.csv
sample.csv:
First name,Last name,Address,ZIP
John,Doe,"Country
City
Street",12345
John,Doe,"Country
City
Street",67890
Result:
First name,Last name,Address,ZIP
John,Doe,Country City Street,12345
John,Doe,Country City Street,67890
This might work for you (GNU sed):
sed ':a;s/[^,]\+/&/4;tb;N;ba;:b;s/\n\+/ /g;s/"//g' file
Test each line to see that it contains the correct number of fields (in the example that was 4). If there are not enough fields, append the next line and repeat the test. Otherwise, replace the newline(s) by spaces and finally remove the "'s.
N.B. This may be fraught with problems such as ,'s between "'s and quoted "'s.
Try cat -v file.csv. When the file was made with Excel, you might have some luck: When the newlines in a field are a simple \n and the newline at the end is a \r\n (which will look like ^M), parsing is simple.
# delete all newlines and replace the ^M with a new newline.
tr -d "\n" < file.csv| tr "\r" "\n"
# Above two steps with one command
tr "\n\r" " \n" < file.csv
When you want a space between the joined line, you need an additional step.
tr "\n\r" " \n" < file.csv | sed '2,$ s/^ //'
EDIT: #sjaak commented this didn't work is his case.
When your broken lines also have ^M you still can be a lucky (wo-)man.
When your broken field is always the first field in double quotes and you have GNU sed 4.2.2, you can join 2 lines when the first line has exactly one double quote.
sed -rz ':a;s/(\n|^)([^"]*)"([^"]*)\n/\1\2"\3 /;ta' file.csv
Explanation:
-z don't use \n as line endings
:a label for repeating the step after successful replacement
(\n|^) Search after a newline or the very first line
([^"]*) Substring without a "
ta Go back to label a and repeat
awk pattern matching is working.
answer in one line :
awk '/,"/{ORS=" "};/",/{ORS="\n"}{print $0}' YourFile
if you'd like to drop quotes, you could use:
awk '/,"/{ORS=" "};/",/{ORS="\n"}{print $0}' YourFile | sed 's/"//gw NewFile'
but I prefer to keep it.
to explain the code:
/Pattern/ : find pattern in current line.
ORS : indicates the output line record.
$0 : indicates the whole of the current line.
's/OldPattern/NewPattern/': substitude first OldPattern with NewPattern
/g : does the previous action for all OldPattern
/w : write the result to Newfile

Efficient substring parsing of large fixed length text file in Bash

I have a large text file (millions of records) of fixed length data and need to extract unique substrings and create a number of arrays with those values. I have a working version, however I'm wondering if performance can be improved since I need to run the script iteratively.
$_file5 looks like:
138000010065011417865201710152017102122
138000010067710416865201710152017102133
138000010131490417865201710152017102124
138000010142349413865201710152017102154
138400010142356417865201710152017102165
130000101694334417865201710152017102176
Here is what I have so far:
while IFS='' read -r line || [[ -n "$line" ]]; do
_in=0
_set=${line:15:6}
_startDate=${line:21:8}
_id="$_account-$_set-$_startDate"
for element in "${_subsets[#]}"; do
if [[ $element == "$_set" ]]; then
_in=1
break
fi
done
# If we find a new one and it's not 504721
if [ $_in -eq 0 ] && [ $_set != "504721" ] ; then
_subsets=("${_subsets[#]}" "$_set")
_ids=("${_ids[#]}" "$_id")
fi
done < $_file5
And this yields:
_subsets=("417865","416865","413865")
_ids=("9899-417865-20171015", "9899-416865-20171015", "9899-413865-20171015")
I'm not sure if sed or awk would be better here and can't find a way to implement either. Thanks.
EDIT: Benchmark Tests
So I benchmarked my original solution against the two provided. Ran this over 10 times and all results where similar to below.
# Bash read
real 0m8.423s
user 0m8.115s
sys 0m0.307s
# Using sort -u (#randomir)
real 0m0.719s
user 0m0.693s
sys 0m0.041s
# Using awk (#shellter)
real 0m0.159s
user 0m0.152s
sys 0m0.007s
Looks like awk wins this one. Regardless, the performance improvement from my original code is substantial. Thank you both for your contributions.
I don't think you can beat the performance of sort -u with bash loops (except in corner cases, as this one turned out to be, see footnote✻).
To reduce the list of strings you have in file to a list of unique strings (set), based on a substring:
sort -k1.16,1.21 -u file >set
Then, to filter-out the unwanted id, 504721, starting at position 16, you can use grep -v:
grep -vE '.{15}504721' set
Finally, reformat the remaining lines and store them in arrays with cut/sed/awk/bash.
So, to populate the _subsets array, for example:
$ _subsets=($(sort -k1.16,1.21 -u file | grep -vE '.{15}504721' | cut -c16-21))
$ printf "%s\n" "${_subsets[#]}"
413865
416865
417865
or, to populate the _ids array:
$ _ids=($(sort -k1.16,1.21 -u file | grep -vE '.{15}504721' | sed -E 's/^.{15}(.{6})(.{8}).*/9899-\1-\2/'))
$ printf "%s\n" "${_ids[#]}"
9899-413865-20171015
9899-416865-20171015
9899-417865-20171015
✻ If the input file is huge, but it contains only a small number (~40) of unique elements (for the relevant field), then it makes perfect sense for the awk solution to be faster. sort needs to sort a huge file (O(N*logN)), then filter the dupes (O(N)), all for a large N. On the other hand, awk needs to pass through the large input only once, checking for dupes along the way via set membership testing. Since the set of uniques is small, membership testing takes only O(1) (on average, but for such a small set, practically constant even in worst case), making the overall time O(N).
In case there were less dupes, awk would have O(N*log(N)) amortized, and O(N2) worst case. Not to mention the higher constant per-instruction overhead.
In short: you have to know how your data looks like before choosing the right tool for the job.
Here's an awk solution embedded in a bash script:
#!/bin/bash
fn_parser() {
awk '
BEGIN{ _account="9899" }
{ _set=substr($0,16,6)
_startDate=substr($0,22,8)
#dbg print "#dbg:_set=" _set "\t_startDate=" _startDate
if (_set != "504721") {
_id= _account "-" _set"-" _startDate
ids[_id] = _id
sets[_set]=_set
}
}
END {
printf "_subsets=("
for (s in sets) { printf("%s\"%s\"" , (commaCtr++ ? "," : ""), sets[s]) }
print ");"
printf "_ids=("
for (i in ids) { printf("%s\"%s\"" , (commaCtr2++ ? "," : ""), ids[i]) }
print ")"
}
' "${#}"
}
#dbg set -vx
eval $( echo $(fn_parser *.txt) )
echo "_subsets="$_subsets
echo "_ids="$_ids
output
_subsets=413865,417865,416865
_ids=9899-416865-20171015,9899-413865-20171015,9899-417865-20171015
Which I believe would be the same output your script would get if you did an echo on your variable names.
I didn't see that _account was being extracted from your file, and assume it is passed in from a previous step in your batch. But until I know if that is a critical piece, I'll have to come back to figuring out how to pass in var to a function that calls awk.
People won't like using eval, but hopefully no one will embed /bin/rm -rf / into your data set ;-)
I use the eval so that the data extracted is available via the shell variables. You can uncomment the #dbg before the eval line to see how the code is executing in the "layers" of function, eval, var=value assignments.
Hopefully, you see how the awk script is a transcription of your code into awk.
It does rely on the fact that arrays can contain only 1 copy of a key/value pair.
I'd really appreciate if you post timings for all solutions submitted. (You could reduce the file size by 1/2 and still have a good test). Be sure to run each version several times, and discard the first run.
IHTH

Bash script processing too slow

I have the following script where I'm parsing 2 csv files to find a MATCH the files have 10000 lines each one. But the processing is taking a long time!!! Is this normal?
My script:
#!/bin/bash
IFS=$'\n'
CSV_FILE1=$1;
CSV_FILE2=$2;
sort -t';' $CSV_FILE1 >> Sorted_CSV1
sort -t';' $CSV_FILE2 >> Sorted_CSV2
echo "PATH1 ; NAME1 ; SIZE1 ; CKSUM1 ; PATH2 ; NAME2 ; SIZE2 ; CKSUM2" >> 'mapping.csv'
while read lineCSV1 #Parse 1st CSV file
do
PATH1=`echo $lineCSV1 | awk '{print $1}'`
NAME1=`echo $lineCSV1 | awk '{print $3}'`
SIZE1=`echo $lineCSV1 | awk '{print $7}'`
CKSUM1=`echo $lineCSV1 | awk '{print $9}'`
while read lineCSV2 #Parse 2nd CSV file
do
PATH2=`echo $lineCSV2 | awk '{print $1}'`
NAME2=`echo $lineCSV2 | awk '{print $3}'`
SIZE2=`echo $lineCSV2 | awk '{print $7}'`
CKSUM2=`echo $lineCSV2 | awk '{print $9}'`
# Test if NAM1 MATCHS NAME2
if [[ $NAME1 == $NAME2 ]]; then
#Test checksum OF THE MATCHING NAME
if [[ $CKSUM1 != $CKSUM2 ]]; then
#MAPPING OF THE MATCHING LINES
echo $PATH1 ';' $NAME1 ';' $SIZE1 ';' $CKSUM1 ';' $PATH2 ';' $NAME2 ';' $SIZE2 ';' $CKSUM2 >> 'mapping.csv'
fi
break #When its a match break the while loop and go the the next Row of the 1st CSV File
fi
done < Sorted_CSV2 #Done CSV2
done < Sorted_CSV1 #Done CSV1
This is a quadratic order. Also, see Tom Fenech comment: You are calling awk several times inside a loop inside another loop. Instead of using awk for the fields in every line try setting the IFS shell variable to ";" and read the fields directly in read commands:
IFS=";"
while read FIELD11 FIELD12 FIELD13; do
while read FIELD21 FIELD22 FIELD23; do
...
done <Sorted_CSV2
done <Sorted_CSV1
Though, this would be still O(N^2) and very inefficient. It seems you are matching 2 fields by a coincident field. This task is easier and faster to accomplish by using join command line utility, and would reduce order from O(N^2) to O(N).
Whenever you say "Does this file/data list/table have something that matches this file/data list/table?", you should think of associative arrays (sometimes called hashes).
An associative array is keyed by a particular value and each key is associated with a value. The nice thing is that finding a key is extremely fast.
In your loop of a loop, you have 10,000 lines in each file. You're outer loop executed 10,000 times. Your inner loop may execute 10,000 times for each and every line in your first file. That's 10,000 x 10,000 times you go through that inner loop. That's potentially looping 100 million times through that inner loop. Think you can see why your program might be a little slow?
In this day and age, having a 10,000 member associative array isn't that bad. (Imagine doing this back in 1980 on a MS-DOS system with 256K. It just wouldn't work). So, let's go through the first file, create a 10,000 member associative array, and then go through the second file looking for matching lines.
Bash 4.x has associative arrays, but I only have Bash 3.2 on my system, so I can't really give you an answer in Bash.
Besides, sometimes Bash isn't the answer to a particular issue. Bash can be a bit slow and the syntax can be error prone. Awk might be faster, but many versions don't have associative arrays. This is really a job for a higher level scripting language like Python or Perl.
Since I can't do a Bash answer, here's a Perl answer. Maybe this will help. Or, maybe this will inspire someone who has Bash 4.x can give an answer in Bash.
I Basically open the first file and create an associative array keyed by the checksum. If this is a sha1 checksum, it should be unique for all files (unless they're an exact match). If you don't have a sha1 checksum, you'll need to massage the structure a wee bit, but it's pretty much the same idea.
Once I have the associative array figured out, I then open file #2 and simply see if the checksum already exists in the file. If it does, I know I have a matching line, and print out the two matches.
I have to loop 10,000 times in the first file, and 10,000 times in the second. That's only 20,000 loops instead of 10 million that's 20,000 times less looping which means the program will run 20,000 times faster. So, if it takes 2 full days for your program to run with a double loop, an associative array solution will work in less than one second.
#! /usr/bin/env perl
#
use strict;
use warnings;
use autodie;
use feature qw(say);
use constant {
FILE1 => "file1.txt",
FILE2 => "file2.txt",
MATCHING => "csv_matches.txt",
};
#
# Open the first file and create the associative array
#
my %file_data;
open my $fh1, "<", FILE1;
while ( my $line = <$fh1> ) {
chomp $line;
my ( $path, $blah, $name, $bather, $yadda, $tl_dr, $size, $etc, $check_sum ) = split /\s+/, $line;
#
# The main key is "check_sum" which **should** be unique, especially if it's a sha1
#
$file_data{$check_sum}->{PATH} = $path;
$file_data{$check_sum}->{NAME} = $name;
$file_data{$check_sum}->{SIZE} = $size;
}
close $fh1;
#
# Now, we have the associative array keyed by the data we want to match, read file 2
#
open my $fh2, "<", FILE2;
open my $csv_fh, ">", MATCHING;
while ( my $line = <$fh2> ) {
chomp $line;
my ( $path, $blah, $name, $bather, $yadda, $tl_dr, $size, $etc, $check_sum ) = split /\s+/, $line;
#
# If there is a matching checksum in file1, we know we have a matching entry
#
if ( exists $file_data{$check_sum} ) {
printf {$csv_fh} "%s;%s:%s:%s:%s:%s\n",
$file_data{$check_sum}->{PATH}, $file_data{$check_sum}->{NAME}, $file_data{$check_sum}->{SIZE},
$path, $name, $size;
}
}
close $fh2;
close $csv_fh;
BUGS
(A good manpage always list issues!)
This assumes one match per file. If you have multiple duplicates in file1 or file2, you will only pick up the last one.
This assumes a sha256 or equivalent checksum. In such a checksum, it is extremely unlikely that two files will have the same checksum unless they match. A 16bit checksum from the historic sum command may have collisions.
Although a proper database engine would make a much better tool for this, it is still very well possible to do it with awk.
The trick is to sort your data, so that records with the same name are grouped together. Then a single pass from top to bottom is enough to find the matches. This can be done in linear time.
In detail:
Insert two columns in both CSV files
Make sure every line starts with the name. Also add a number (either 1 or 2) which denotes from which file the line originates. We will need this when we merge the two files together.
awk -F';' '{ print $2 ";1;" $0 }' csvfile1 > tmpfile1
awk -F';' '{ print $2 ";2;" $0 }' csvfile2 > tmpfile2
Concatenate the files, then sort the lines
sort tmpfile1 tmpfile2 > tmpfile3
Scan the result, report the mismatches
awk -F';' -f scan.awk tmpfile3
Where scan.awk contains:
BEGIN {
origin = 3;
}
$1 == name && $2 > origin && $6 != checksum {
print record;
}
{
name = $1;
origin = $2;
checksum = $6;
sub(/^[^;]*;.;/, "");
record = $0;
}
Putting it all together
Crammed together into a Bash oneliner, without explicit temporary files:
(awk -F';' '{print $2";1;"$0}' csvfile1 ; awk -F';' '{print $2";2;"$0}' csvfile2) | sort | awk -F';' 'BEGIN{origin=3}$1==name&&$2>origin&&$6!=checksum{print record}{name=$1;origin=$2;checksum=$6;sub(/^[^;]*;.;/,"");record=$0;}'
Notes:
If the same name appears more than once in csvfile1, then all but the last one are ignored.
If the same name appears more than once in csvfile2, then all but the first one are ignored.

How to interate based on words in text? (Shell Scripting)

I have a file currently in the form
location1 attr attr ... attr
location2 attr attr ... attr
...
locationn attr atrr ... attr
What I want to do is go through each line, grab the location (first field) then iterate through the attributes. So far I know how to grab the first field, but not iterate through the attributes. There are also a different number of attributes for each line.
TEMP_LIST=$DIR/temp.list
while read LINE
do
x=`echo $LINE | awk '{print $1}'`
echo $x
done<$TEMP_LIST
Can someone tell me how to iterate through the attributes?
I want to get the effect like
while read LINE
do
location=`echo $LINES |awk '{print $1}'`
for attribute in attributes
do something involving the $location for the line and each individual $attribute
done<$TEMP_LIST
I am currently working in ksh shell, but any other unix shell is fine, I will find out how to translate. I am really grateful if someone could help as it would save me alot of time.
Thank you.
Similar to DreadPirateShawn's solution, but a bit simpler:
while read -r location all_attrs; do
read -ra attrs <<< "$all_attrs"
for attr in "${attrs[#]}"; do
: # do something with $location and $attr
done
done < inputfile
The second read line makes use of bash's herestring feature.
This might work in other shells too, but here's an approach that works in Bash:
#!/bin/bash
TEMP_LIST=temp.list
while read LINE
do
# Split line into array using space as delimiter.
IFS=' ' read -a array <<< $LINE
# Use first element of array as location.
location=${array[0]}
echo "First param: $location"
# Remove first element from array.
unset array[0]
# Loop through remaining array elements.
for i in "${array[#]}"
do
echo " Value: $i"
done
done < $TEMP_LIST
As you're already using awk in your posted code, why not learn how to use awk, as it is designed for this sort of problem.
while read LINE
do
location=`echo $LINES |awk '{print $1}'`
for attribute in attributes
do something involving the $location for the line and each individual $attribute
done<$TEMP_LIST
is written in awk as
#!/bin/bash
tempList="MyTempList.txt"
awk '{ # implied while loop for input records by default
location=$1
print "location=" location # location as a "header"
for (i=2;i<NF;i++) {
printf("attr%d=%s\t", i, $i) # print each attr with its number
}
printf("\n") # add new-line char to end of each line of attributes
}' ${tempList}
If you want to save your output, use awk '{.....}' ${tempList}> ${tempList}.new
Awk has numerous vars that it sets as it reads your files. NF mean NumberOfFields for the current line. So the for loop, starts at field 2, and prints all remaining fields on that line in the format provided (change to suit your needs). The i<=NF drives the ability to print all elems on a line.
Sometimes you'll want the 3rd to last elem on line, so you can perform math on the value stored in NF, like thirdFromLast=$(NF-3). For all variables that are numbers, you can "dereference" it as a value, and ask awk to print the value stored of the $N(th) field. i.e. try
print "thirdFromLast="(NF-3)
print "thirdFromLast="$(NF-3)
... to see the difference that the $ makes on a variable that holds a number.
(For large amounts of data, 1 awk process will be considerably more efficient that using subprocesses to gather parts of files.)
Also work your way through this tutorial grymoire's awk tutorial
IHTH

Resources