awk field count arithmetic - bash

I am trying to do a simple column addition of column $i and column $((i+33)), but I am not sure whether the syntax is correct.
Two files are first pasted together, and then a column addition across the two files is performed.
Thank you!
paste DOS.tmp DOS.tmp2 | awk '{ printf "%12.8f",$1 OFS; for(i=2; i<33; i++) printf "%12.8f",$i+$((i+33)) OFS; if(33) printf "%12.8f",$33+$66; printf ORS}' >| DOS.tmp3

In awk, unlike in bash, variable expansion does not require a dollar sign ($) in front of the variable name. Variables are defined like a = 2 and used like print a.
Dollar sign ($) is used to refer to (input) fields. So, print $1 will print the first field, and print $a will print the field referenced by variable a, in our case the second field. Similarly, print $a, $(a+3) will print the second and fifth field (separated by the OFS).
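For instance, with a set to 2, the difference between the variable and the field it points to is easy to see:
$ echo 'one two three four five' | awk '{ a = 2; print $a, $(a+3) }'
two five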
Taken together, this makes your program look like:
awk '{ out = sprintf("%12.8f", $1)
       for (i=2; i<=33; i++) out = out sprintf("%s%12.8f", OFS, $i+$(i+33))
       print out }' numbers
Notice that we use sprintf to build the whole output line in the variable out first, concatenating with out = out val, and then print the complete output record with print.
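A minimal, self-contained illustration of that build-then-print idiom (the values here are made up for the demo):
$ awk 'BEGIN { out = sprintf("%12.8f", 1); for (i = 2; i <= 3; i++) out = out sprintf("%s%12.8f", OFS, i * 1.5); print out }'
  1.00000000   3.00000000   4.50000000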

Are you trying to add column i of file_1 to column i of file_2? If so, here is an example:
paste <(seq -s' ' 33) <(seq -s' ' 33) | awk '{ for(i=1; i<=33; i++) { printf "%f",$i+$((i+33)) ; if(i!=33) printf OFS;} printf ORS}'
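As a quick sanity check of the same structure with 5 columns instead of 33 (both sides contain 1..5, so each sum is 2i):
$ paste <(seq -s' ' 5) <(seq -s' ' 5) | awk '{ for(i=1; i<=5; i++) { printf "%f",$i+$((i+5)) ; if(i!=5) printf OFS;} printf ORS}'
2.000000 4.000000 6.000000 8.000000 10.000000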

Related

Processing text with multiple delims in awk

I have text which looks like this:
Application.||dates:[2022-11-12]|models:[MODEL1]|count:1|ids:2320
Application.||dates:[2022-11-12]|models:[MODEL1]|count:5|ids:2320
I want the number from the count: field (so 1 and 5 here), and I wish to store these numbers in an array.
nums=($(echo -n "$grepResult" | awk -F ':' '{ print $4 }' | awk -F '|' '{ print $1 }'))
This seems very repetitive and not very efficient. Any ideas how to simplify it?
You can use awk once: set the field separator to |, then loop over all the fields and split each on :.
If the field starts with count, print the second part of the split value.
This way the count: part can occur anywhere in the string, and it may be printed multiple times.
nums=($(echo -n "$grepResult" | awk -F'|' '
{
    for (i=1; i<=NF; i++) {
        split($i, a, ":")
        if (a[1] == "count") {
            print a[2]
        }
    }
}
'))
for i in "${nums[@]}"
do
    echo "$i"
done
Output
1
5
If you want to combine both separators, you can use [|:] as a character class and print field number 8 for this precise format, as mentioned in the comments.
Note that this does not check whether the field starts with count:
nums=($(echo -n "$grepResult" | awk -F '[|:]' '{print $8}'))
With GNU awk you can use a capture group for a more precise match, where what is on the left and right can be either the start/end of the string or a pipe character. The second capture group matches one or more digits:
nums=($(echo -n "$grepResult" | awk 'match($0, /(^|\|)count:([0-9]+)(\||$)/, a) {print a[2]}' ))
Try sed
nums=($(sed 's/.*count://;s/|.*//' <<< "$grepResult"))
Explanation:
There are two sed commands, separated by the ; symbol.
The first command, 's/.*count://', removes all characters up to and including 'count:'.
The second command, 's/|.*//', removes all characters from the first '|' onward.
The command order is important here.
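To see why the order matters, swap the two commands: the |-stripping substitution then runs first and removes the count: part before the other command can use it:
$ sed 's/|.*//;s/.*count://' <<< 'Application.||dates:[2022-11-12]|models:[MODEL1]|count:1|ids:2320'
Application.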

Even after setting FS to comma, awk is considering space as FS as well

I have set FS for awk as FS = ",", but it is still using space as a delimiter. After using the below code it has created 3 array elements, one for the comma and another for the space. Please suggest how I can stop awk from using space as FS.
Code :
arr_values=(`awk 'BEGIN{FS = ","}
{for (i=0; i<=NF; i++)
print $i
};
END{}' File`)
for ((i=0; i<${#arr_values[@]}; i++))
do
    echo ${arr_values[$i]}
done
content of File :
1, abc def
Output :
abc
def
1
abc
def
First, quotes should surround the command substitution to prevent unwanted word splitting. When command output expands into an array unquoted, each whitespace-delimited field becomes its own element.
Also, as jas mentioned, the Awk loop should initialize i to 1, not 0. The $0 record in Awk is the entire line, which is why the array contains duplicates.
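A quick check on the sample File makes both points visible; starting the loop at 0 prints the whole record before the individual fields:
$ awk -F', ' '{for (i=0; i<=NF; i++) print i": "$i}' File
0: 1, abc def
1: 1
2: abc def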
However, quoting variables won't do what you want:
$ arr_values=("$(awk -F', ' '{for (i=1; i<=NF; i++) print $i}' File)")
$ echo "${arr_values[0]}"
1
abc def
The newline is preserved here, but it is all contained in a single element of the array because the quotes enclose the entire command substitution output.
To accomplish what you want, use the Bash builtin readarray. It will read each line from standard input into a separate array element:
$ readarray -t arr_values < <(awk -F', ' '{for (i=1; i<=NF; i++) print $i}' File)
$ for ((i=0; i<${#arr_values[@]}; i++)); do echo "${arr_values[$i]}"; done
1
abc def

output the rows with one non-empty column in csv using bash

Given a csv file, I want to output only the rows with exactly one non-empty column.
input file
"a","b","c"
"d","",""
output:
"d","",""
Can this be done in bash?
A simpler awk solution can be
$ awk '/^("",)*"."(,"")*$/' inputFile
"d","",""
What it does
/^("",)*"."(,"")*$/ patterns matches as
("",) number of empty columns
"." followed by ONE non empty column
(,"") further followed by number of empty columns
no action specified, hence takes the default action to print the entire record
EDIT
If a column can contain more than one character:
$ awk '/^("",)*"[^"]+"(,"")*$/' input
"d","",""
Thanks to Jotne
You can use sed for this:
sed -n '/^[",]*[^",]*[",]*$/p' file
To make sure it does not match blank lines, we can require a quoted, non-empty block by adding the quotes and \+:
sed -n '/^[",]*"[^",]\+"[",]*$/p' file
It returns:
"d","",""
It is a matter of checking that there is one, and just one, block of characters different from " or , between those characters. -n inhibits printing, while p prints the lines that meet the condition.
You could use gsub() to count the number of times an empty field is found, then subtract from NF and test equal to one. Here's one way using GNU AWK and the FPAT variable:
awk 'BEGIN { FPAT = "([^,]+)|(\"[^\"]+\")" } NF - gsub(/""/, "&") == 1' file
If you don't have embedded commas, you could simply write:
awk -F, 'NF - gsub(/""/, "&") == 1' file
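Either way, run against the question's input, only the row with exactly one non-empty column should come out:
$ awk -F, 'NF - gsub(/""/, "&") == 1' file
"d","",""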
A simplistic approach which assumes that no fields in the CSV file contain commas:
awk -F '[",]+' '{n=0;for(i=2;i<NF;++i)$i~/^$/||++n}n==1' file.txt
Set the input field separator to one or more double quotes and commas. Loop through all of the fields, incrementing n for every non-empty field. If the total number is exactly 1, print the line.
The reason the loop goes from field 2 to NF-1 is that the leading and trailing quotes produce empty first and last fields, which lie before and after the parts you are interested in.
Very similar but ever-so-slightly shorter:
awk -F ',' '{n=0;for(i=1;i<=NF;++i)$i~/""/||++n}n==1' file.txt
Use the comma as the field separator and increment n for any field that does not contain "" (that is, for every non-empty field). In this case, the loop goes through every field.
Through sed.
$ sed -rn '/^(".[^"]*"(,"")*|""(,"")*,".[^"]*"(,"")*)$/p' file
"d","",""
First part ".[^"]*"(,"")* matches these type of string "A","","" where the second part ""(,"")*,".[^"]*"(,"")* would match these type of string formats "","","A"
Example:
$ cat file
"a","b","c"
"d","",""
"","","A"
"A","","A"
"","A",""
"","A","A"
"A","A",""
$ sed -rn '/^(".[^"]*"(,"")*|""(,"")*,".[^"]*"(,"")*)$/p' file
"d","",""
"","","A"
"","A",""
This grep should be able to handle this:
grep -E '^("",)*"[^"]+"(,"")*$' file
"d","",""
Just split the line into fields and count how many are non-empty:
$ awk -F'^"|","|"$' '{c=0; for (i=2; i<NF; i++) if ($i != "") ++c} c==1' file
"d","",""
The loop starts at 2 and ends at NF-1 because there's no point checking the empty fields that will always exist before the first and after the last "real" fields (i.e. before the ^" and after the "$) when the line is split using an FS that includes the start-of-string (^) and end-of-string ($) RE metacharacters.
If you ever wanted to check different counts of non-empty fields, just change the number you compare c to:
$ cat file
"a","b","c"
"d","",""
"e","","f"
"","",""
$ awk -F'^"|","|"$' '{c=0; for (i=2; i<NF; i++) if ($i != "") ++c} c==0' file
"","",""
$ awk -F'^"|","|"$' '{c=0; for (i=2; i<NF; i++) if ($i != "") ++c} c==1' file
"d","",""
$ awk -F'^"|","|"$' '{c=0; for (i=2; i<NF; i++) if ($i != "") ++c} c==2' file
"e","","f"
$ awk -F'^"|","|"$' '{c=0; for (i=2; i<NF; i++) if ($i != "") ++c} c==3' file
"a","b","c"
$ awk -F'^"|","|"$' '{c=0; for (i=2; i<NF; i++) if ($i != "") ++c} c==4' file
$

Error in code ... need correction

I am extracting the values in the fourth column of a file and trying to add them.
#!/bin/bash
cat tag_FLI1 | awk '{print $4}'>tags
$t=0
for i in `cat tags`
do
$t=$t+$i (this is the position of trouble)
done
echo $t
error on line 6.
Thank you in advance for your time.
In case of using only awk for the task:
If fields are separated with blanks:
awk '{ sum += $4 } END { print sum }' tag_FLI1
Otherwise, set the FS variable accordingly, like:
awk 'BEGIN { FS = "|" } { sum += $4 } END { print sum }' tag_FLI1
That's not how you do arithmetic in bash. To add the values from two variables x and y and store the result in a third variable z, it should look like this:
z=$((x + y))
However, you could more simply just do everything in awk, replacing your awk '{print $4}' with:
awk '{ sum += $4 } END { print sum }'
The awk approach will also correctly handle floating point numbers, which the bash approach will not.
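A quick comparison illustrating that difference (the exact error wording varies by bash version):
$ awk 'BEGIN { print 1.5 + 2.25 }'
3.75
$ echo $((1.5 + 2.25))
bash: 1.5 + 2.25: syntax error: invalid arithmetic operator (error token is ".5 + 2.25")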
You need to use a numeric context to add the numbers. Also, cat is not needed here, as awk can read from a file directly. Unless you use "tags" in another script, you don't need to create that file at all. Finally, in bash (unlike Perl or PHP) there should be no "$" on the left-hand side of a variable assignment.
t=0
while read -r i
do
    t=$((t + i))
done < <(awk '{print $4}' tag_FLI1)
echo "$t"
That can be done in just one line:
awk '{sum += $4} END {print sum}' tag_FLI1
However, if this is a learning exercise for bash, have a look at this example:
#!/bin/bash
sum=0
while read line; do
    (( sum += $line ))
done < <(awk '{print $4}' tag_FLI1)
echo $sum
There were essentially 3 issues with your code:
Variables are assigned using VAR=..., not $VAR=.... See http://tldp.org/LDP/abs/html/varassignment.html
The way you sum the numbers is incorrect. See arithmetic expansion for examples of how to do it.
It is not necessary to use an intermediate file just to iterate through the output of a command. Use a while loop as shown above, but beware of one caveat: piping a command into a while loop runs the loop in a subshell, so variable assignments made inside it are lost. The process substitution form < <(...) used above avoids this.
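A minimal sketch of that caveat, assuming tag_FLI1 exists as in the question; the pipeline version always echoes 0, because t is only modified inside the subshell:
$ t=0; awk '{print $4}' tag_FLI1 | while read -r i; do t=$((t + i)); done; echo "$t"
0
$ t=0; while read -r i; do t=$((t + i)); done < <(awk '{print $4}' tag_FLI1); echo "$t"   # prints the actual sum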

Make awk consider a double-quoted string as one token and ignore the spaces in between

Data file - data.txt:
ABC "I am ABC" 35 DESC
DEF "I am not ABC" 42 DESC
cat data.txt | awk '{print $2}'
will print "I" instead of the quoted string.
How can I make awk ignore the spaces within the quotes and treat the quoted string as one single token?
Another alternative would be to use the FPAT variable (a GNU Awk extension), which defines a regular expression describing the contents of each field.
Save this AWK script as parse.awk:
#!/bin/awk -f
BEGIN {
    FPAT = "([^ ]+)|(\"[^\"]+\")"
}
{
    print $2
}
Make it executable with chmod +x ./parse.awk and parse your data file as ./parse.awk data.txt:
"I am ABC"
"I am not ABC"
Yes, this can be done nicely in awk. It's easy to get all the fields without any serious hacks.
(This example works in both The One True Awk and in gawk.)
{
    # split the record on the double quotes; a[2] is the quoted string
    split($0, a, "\"")
    $2 = a[2]
    # NF is unchanged by the assignment, so the last two original
    # fields still hold the number and the trailing word
    $3 = $(NF - 1)
    $4 = $NF
    print "and the fields are ", $1, "+", $2, "+", $3, "+", $4
}
Try this:
$ cat data.txt | awk -F\" '{print $2}'
I am ABC
I am not ABC
The top answer for this question only works for lines with a single quoted field. When I found this question I needed something that could work for an arbitrary number of quoted fields.
Eventually I came upon an answer by Wintermute in another thread, and he provided a good generalized solution to this problem. I've just modified it to remove the quotes. Note that you need to invoke awk with -F\" when running the below program.
BEGIN { OFS = "" }
{
    for (i = 1; i <= NF; i += 2) {
        gsub(/[ \t]+/, ",", $i)
    }
    print
}
This works by observing that when you separate on the " character, every other element of the record will be the contents of a quoted string, so it replaces the whitespace dividing the unquoted elements with a comma.
You can then easily chain another instance of awk to do whatever processing you need (just use the field separator switch again, -F,).
Note that this might break if the first field is quoted - I haven't tested it. If it does, though, it should be easy to fix by adding an if statement to start at 2 rather than 1 if the first character of the line is a ".
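For example, saving the program above as requote.awk (the file name is chosen here just for illustration) and running it against the question's data.txt should give:
$ awk -F\" -f requote.awk data.txt
ABC,I am ABC,35,DESC
DEF,I am not ABC,42,DESC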
I've scrunched together a function that re-splits $0 into an array called B. Spaces between double quotes do not act as field separators. It works with any number of fields, with a mix of quoted and unquoted ones. Here goes:
#!/usr/bin/gawk -f
# Resplit $0 into array B. Spaces between double quotes are not separators.
# Single quotes not handled. No escaping of double quotes.
function resplit( a, l, i, j, b, k, BNF)   # all are local variables
{
    l = split($0, a, "\"")
    BNF = 0
    delete B
    for (i=1; i<=l; ++i)
    {
        if (i % 2)
        {
            k = split(a[i], b)
            for (j=1; j<=k; ++j)
                B[++BNF] = b[j]
        }
        else
        {
            B[++BNF] = "\"" a[i] "\""
        }
    }
}
{
    resplit()
    for (i=1; i<=length(B); ++i)
        print i ": " B[i]
}
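Assuming the script is saved as resplit.awk (name chosen for illustration) and made executable, running it on the question's data.txt should produce:
$ ./resplit.awk data.txt
1: ABC
2: "I am ABC"
3: 35
4: DESC
1: DEF
2: "I am not ABC"
3: 42
4: DESC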
Hope it helps.
Okay, if you really want all three fields, you can get them, but it takes a lot of piping:
$ cat data.txt | awk -F\" '{print $1 "," $2 "," $3}' | awk -F' ,' '{print $1 "," $2}' | awk -F', ' '{print $1 "," $2}' | awk -F, '{sub(/ .*/, "", $3); print $1 "," $2 "," $3}'
ABC,I am ABC,35
DEF,I am not ABC,42
By the last pipe you've got all three fields to do whatever you'd like with.
Here is something like what I finally got working that is more generic for my project.
Note it doesn't use awk.
someText="ABC \"I am ABC\" 35 DESC '1 23' testing 456"
putItemsInLines() {
    local items=""
    local firstItem="true"
    while test $# -gt 0; do
        if [ "$firstItem" == "true" ]; then
            items="$1"
            firstItem="false"
        else
            items="$items
$1"
        fi
        shift
    done
    echo "$items"
}
count=0
while read -r valueLine; do
    echo "$count: $valueLine"
    count=$(( $count + 1 ))
done <<< "$(eval putItemsInLines $someText)"
Which outputs:
0: ABC
1: I am ABC
2: 35
3: DESC
4: 1 23
5: testing
6: 456
