AWK array parsing issue - shell

My two input files are pipe separated.
File 1 :
a|b|c|d|1|44
File 2 :
44|ab|cd|1
I want to store all the values from the first file in an array.
awk -F\| 'FNR==NR {a[$6]=$0;next}'
If I store them this way, is it possible to interrogate the array later; say I want to know $3 of File 1. How can I get that from a[]?
Also, will I be able to access the array values after I come out of that awk?
Thanks

I'll answer the question as it is stated, but I have to wonder whether it is complete. You state that you have a second input file, but it doesn't play a role in your actual question.
1) It would probably be most sensible to store the fields individually, as in
awk -F \| '{ for(i = 1; i < NF; ++i) a[$NF,i] = $i } END { print a[44,3] }' filename
See the awk documentation for details on multidimensional arrays in awk. You could also use the split function:
awk -F \| '{ a[$NF] = $0 } END { split(a[44], fields); print fields[3] }'
but I don't see the sense in it here.
2) No. At most you can print the data in a way that the surrounding shell understands and use command substitution to build a shell array from it, but POSIX shell doesn't know arrays at all, and bash only knows one-dimensional arrays. If you require that sort of functionality, you should probably use a more powerful scripting language such as Perl or Python.
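For illustration, here is a minimal sketch of that command-substitution route (the file name file1 is just a placeholder): print the matching record from awk, then split it into a bash array in the calling shell.
# sketch only: grab the record whose 6th field is 44 and split it on "|" in bash
IFS='|' read -r -a row <<< "$(awk -F'|' '$6 == 44' file1)"
echo "${row[2]}"   # third field of that record ("c" for the sample line a|b|c|d|1|44)
This recovers one record at a time, which is part of why staying inside awk is usually the better option.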
If, and I'm wildly guessing here, you want to use the array built from the first file while processing the second, you don't have to quit awk for this. A common pattern is
awk -F \| 'FNR == NR { for(i = 1; i < NF; ++i) { a[$NF,i] = $i }; next } { code for the second file here }' file1 file2
Here FNR == NR is a condition that is only true when the first file is processed (the number of the record in the current file is the same as the number of the record overall; this is only true in the first file).
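As a concrete sketch for the sample data (file1 and file2 are placeholder names), looking up field 3 of the stored file1 record while reading file2 might look like this:
awk -F'|' 'FNR == NR { for (i = 1; i < NF; ++i) a[$NF,i] = $i; next }
           { print $0, "->", a[$1,3] }' file1 file2
# for the sample input this prints: 44|ab|cd|1 -> c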

To keep it simple, you can reach your goal of storing (and accessing) values in an array without using awk:
arr=($(tr "|" " " < yourFilename)) # store in array named arr
# accessing individual elements
echo ${arr[0]}
echo ${arr[4]}
# ...or accessing all elements
for n in "${arr[@]}"
do
echo "$n"
done
...even though I wonder whether that's what you are looking for; the initial question is not really clear.

Related

Turning multi-line string into single comma-separated list in Bash

I have this format:
host1,app1
host1,app2
host1,app3
host2,app4
host2,app5
host2,app6
host3,app1
host4... and so on.
I need it like this format:
host1;app1,app2,app3
host2;app4,app5,app6
I have tried this: awk -vORS=, '{ print $2 }' data | sed 's/,$/\n/'
and it gives me this:
app1,app2,app3 without the host in front.
I do not want to show duplicates.
I do not want this:
host1;app1,app1,app1,app1...
host2;app1,app1,app1,app1...
I want this format:
host1;app1,app2,app3
host2;app2,app3,app4
host3;app2,app3
With input sorted on the first column (as in your example; otherwise just pipe it to sort), you can use the following awk command:
awk -F, 'NR == 1 { currentHost=$1; currentApps=$2 }
NR > 1 && currentHost == $1 { currentApps=currentApps "," $2 }
NR > 1 && currentHost != $1 { print currentHost ";" currentApps; currentHost=$1; currentApps=$2 }
END { print currentHost ";" currentApps }'
It has the advantage over the other solutions posted as of this edit of not holding the whole data set in memory. This comes at the cost of needing the input to be sorted (and sorting is what would load lots of data into memory if the input weren't sorted already).
Explanation:
the first line initializes the currentHost and currentApps variables to the values of the first line of the input
the second line handles a line with the same host as the previous one: the app mentioned in the line is appended to the currentApps variable
the third line handles a line with a different host than the previous one: the information for the previous host is printed, then the variables are reinitialized to the values of the current line of input
the last line prints the information for the current host once we have reached the end of the input
It probably can be refined (so much redundancy!), but I'll leave that to someone more experienced with awk.
See it in action!
$ awk '
BEGIN { FS=","; ORS="" }
$1!=prev { print ors $1; prev=$1; ors=RS; OFS=";" }
{ print OFS $2; OFS=FS }
END { print ors }
' file
host1;app1,app2,app3
host2;app4,app5,app6
host3;app1
Maybe something like this:
#!/bin/bash
declare -A hosts
while IFS=, read host app
do
[ -z "${hosts["$host"]}" ] && hosts["$host"]="$host;"
hosts["$host"]+=$app,
done < testfile
printf "%s\n" "${hosts[#]%,}" | sort
The script reads the sample data from testfile and outputs to stdout.
You could try this awk script:
awk -F, '{a[$1]=($1 in a?a[$1]",":"")$2}END{for(i in a) printf "%s;%s\n",i,a[i]}' file
The script creates an entry in the array a for each unique element in the first column. It appends all elements from the second column to that array entry.
When the file is parsed, the content of the array is printed.
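One caveat worth adding: the order of for (i in a) is unspecified in awk, so the hosts may come out in any order. If that matters, pipe the output through sort, for example:
awk -F, '{a[$1]=($1 in a?a[$1]",":"")$2}END{for(i in a) printf "%s;%s\n",i,a[i]}' file | sort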

Align around a given character in bash

Is there an easy way to align multiple rows of text about a single character, similar to this question, but in bash?
Also open to zsh solutions.
What I have:
aaa:aaaaaaaa
bb:bbb
cccccccccccc:cc
d:d
What I want:
         aaa:aaaaaaaa
          bb:bbb
cccccccccccc:cc
           d:d
Preferably the output can be piped out and retain its layout too.
You can try with column and GNU sed:
column -t -s':' infile | sed -E 's/(\S+)(\s{0,})( )(.*)/\2\1:\4/'
The shell itself does not seem like a particularly suitable tool for this task. Using an external tool makes for a solution which is portable between shells. Here is a simple Awk solution.
awk -F ':' '{ a[++n] = $1; b[n] = $2; if (length($1) > max) max = length($1) }
END { for (i=1; i<=n; ++i) printf "%" max "s:%s\n", a[i], b[i] }'
Demo: https://ideone.com/Eaebhh
This stores the input file in memory; if you need to process large amounts of text, it would probably be better to split this into a two-pass script (first pass, just read all the lines to get max; then change the END block to actually print output from the second pass), which then requires the input to be seekable.
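A rough sketch of that two-pass variant, assuming the (seekable) input file is simply named twice on the command line:
awk -F ':' 'NR == FNR { if (length($1) > max) max = length($1); next }
            { printf "%" max "s:%s\n", $1, $2 }' infile infile
The first pass only tracks the longest left-hand field; the second pass does the printing, so only max is kept in memory.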

Using "awk" to find a string between two other specific strings:

I have a large output of text that'll include several lines like this:
sending:WHATIWANT:output
How would I use awk to make it so that this output would ONLY include WHATIWANT on each line?
edit: there is a changing amount of text before and after WHATIWANT so something like awk -F: '{print $2}' would not always work
From what you mention in the comments, this should do it:
perl -n -e'/sending:([^:]+):output/ && print $1' input_file
This runs a simple regex match line-by-line, capturing the interesting part and then printing it. It assumes that WHATIWANT does not contain the character :
If for some reason you absolutely must use awk(1), then I think you don't have much choice but to do this:
awk -F: '{ for (i = 2; i < NF; i++) if ($(i-1) == "sending" && $(i+1) == "output") print $i }' input_file
It basically splits each line by : and iterates through every field, comparing the left and right fields until it finds one that is between sending and output. Again, it assumes that WHATIWANT does not have a :
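As a side note (not part of the original answer), GNU awk's match() accepts a third array argument for capture groups, which gives a shorter alternative when gawk is available:
gawk 'match($0, /sending:([^:]+):output/, m) { print m[1] }' input_file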
Can't you just use sed?
echo "asfasfdsf__sending:WHATIWANT:output__asdfadas" | sed -n 's/.*sending\:\([a-zA-Z0-9]*\)\:output.*/\1/p'
Gives you "WHATIWANT"

Error in code ... need correction

I am extracting the values in fourth column of a file and trying to add them.
#!/bin/bash
cat tag_FLI1 | awk '{print $4}'>tags
$t=0
for i in `cat tags`
do
$t=$t+$i (this is the position of trouble)
done
echo $t
error on line 6.
Thank you in advance for your time.
In case of using only awk for the task:
If fields are separated with blanks:
awk '{ sum += $4 } END { print sum }' tag_FLI1
Otherwise, use FS variable, like:
awk 'BEGIN { FS = "|" } { sum += $4 } END { print sum }' tag_FLI1
That's not how you do arithmetic in bash. To add the values from two variables x and y and store the result in a third variable z, it should look like this:
z=$((x + y))
However, you could more simply just do everything in awk, replacing your awk '{print $4}' with:
awk '{ sum += $4 } END { print sum }'
The awk approach will also correctly handle floating point numbers, which the bash approach will not.
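For example, with hypothetical rows whose fourth column holds 1.5 and 2.25:
printf 'x x x 1.5\nx x x 2.25\n' | awk '{ sum += $4 } END { print sum }'   # prints 3.75
# bash's $(( ... )) is integer-only; t=$((1.5 + 2.25)) is a syntax error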
You need to use a numeric context for adding the numbers. Also, cat is not needed here, as awk can read from a file. Unless you use "tags" in another script, you don't need to create the file. Also, if you are using bash and not perl or php, there shouldn't be a "$" on the left side of a variable assignment.
t=0
while read -r i
do
t=$((t + i))
done < <(awk '{print $4}' tag_FLI1)
echo "$t"
That can be done in just one line:
awk '{sum += $4} END {print sum}' tag_FLI1
However, if this is a learning exercise for bash, have a look at this example:
#!/bin/bash
sum=0
while read line; do
(( sum += $line ))
done < <(awk '{print $4}' tag_FLI1)
echo $sum
There were essentially 3 issues with your code:
Variables are assigned using VAR=..., not $VAR=.... See http://tldp.org/LDP/abs/html/varassignment.html
The way you sum the numbers is incorrect. See arithmetic expansion for examples of how to do it.
It is not necessary to use an intermediate file just to iterate through the output of a command. Use a while loop as shown above, but beware of this caveat.
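The caveat being referred to is, roughly, that piping into a while loop runs the loop body in a subshell, so variables set there are lost afterwards; the process-substitution form used above avoids that. A small illustration:
# broken: the pipe puts the while loop in a subshell, so t is empty afterwards
awk '{print $4}' tag_FLI1 | while read -r i; do t=$((t + i)); done
echo "$t"
# works: process substitution keeps the loop in the current shell
t=0
while read -r i; do t=$((t + i)); done < <(awk '{print $4}' tag_FLI1)
echo "$t"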

Get next field/column with awk

I have a dataset of the following structure:
1234 4334 8677 3753 3453 4554
4564 4834 3244 3656 2644 0474
...
I would like to:
1) search for a specific value, eg 4834
2) return the following field (3244)
I'm quite new to awk, but realize it is a simple operation. I have created a bash-script that asks the user for the input, and attempts to return the following field.
But I can't seem to get around scoping in AWK. How do I pass the input value to awk?
#!/bin/bash
read input
cat data.txt | awk '{
for (i=1;i<=NF;i++) {
if ($i==input) {
print $(i+1)
}
}
}'
Cheers and thanks in advance!
UPDATE Sept. 8th 2011
Thanks for all the replies.
1) It will never happen that the last number of a row is picked - still I appreciate you pointing this out.
2) I have a more general problem with awk. Often I want to "do something" with the result found. In this case I would like to output it to xclip - an application which read from standard input and copies it to the clipboard. Eg:
$ echo Hi | xclip
Unfortunately, echo doesn't exist for awk, so I need to return the value and echo it. How would you go about this?
#!/bin/bash
read input
cat data.txt | awk '{
for (i=1;i<=NF;i++) {
if ($i=='$input') {
print $(i+1)
}
}
}'
Don't overthink it!
You can create an array in awk with the split command:
split($0, ary)
This will split the line $0 into an array called ary. Now, you can use array syntax to find the particular fields:
awk '{
size = split($0, ary)
for (i=1; i <= size; i++) {
print ary[i]
}
print "---"
}' data.txt
Now, when you find ary[x] as the field, you can print out ary[x+1].
In your example:
awk -v input=$input '{
size = split($0, ary)
for (i=1; i<= size ;i++) {
if (ary[i] == input) {
print ary[i+1]
}
}
}' data.txt
There is a way of doing this without creating an array, but it's simply much easier to work with arrays in situations like this.
By the way, you can eliminate the cat command by putting the file name after the awk statement and save creating an extraneous process. Everyone knows creating an extraneous process kills a kitten. Please don't kill a kitten.
You pass a shell variable to awk using the -v option. It's cleaner/nicer than having to splice it in with quotes.
awk -v input="$input" '{
    for (i = 1; i <= NF; i++) {
        if ($i == input) {
            print "Next value: " $(i+1)
        }
    }
}' data.txt
And lose the useless cat.
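For the xclip part of the updated question: since awk writes to standard output, its result can simply be piped on, for example (a sketch reusing the same data.txt):
awk -v input="$input" '{
    for (i = 1; i <= NF; i++)
        if ($i == input)
            print $(i+1)
}' data.txt | xclip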
Here is my solution: delete everything up to (and including) the search field, then the field you want to print out is field #1 ($1):
awk '/4834/ {sub(/^.* * 4834 /, ""); print $1}' data.txt
