Appending result of function on another field into csv using shell script, awk - shell

I have a csv file stored as a temporary variable in a shell script (*.sh).
Let's say the data looks like this:
Account,Symbol,Price
100,AAPL US,200
102,SPY US,500
I want to add a fourth column, "Type", which is the result of a shell function "foobar". Run from the command line or a shell script itself:
$ foobar "AAPL US"
"Stock"
$ foobar "SPY US"
"ETF"
How do I add this column to my csv, and populate it with calls to foobar which take the second column as an argument? To clarify, this is my ideal result post-script:
Account,Symbol,Price,Type
100,AAPL US,200,Common Stock
102,SPY US,500,ETF
I see many examples online involving such a column addition using awk, and populating the new column with fixed values, conditional values, mathematical derivations from other columns, etc. - but nothing that calls a function on another field and stores its output.

You may use this awk:
export -f foobar
awk 'BEGIN{FS=OFS=","} NR==1{print $0, "Type"; next} {
cmd = "foobar \"" $2 "\""; cmd | getline line; close(cmd);
print $0, line
}' file.csv
Account,Symbol,Price,Type
100,AAPL US,200,Common Stock
102,SPY US,500,ETF

#anubhavas answer is a good approach so please don't change the accepted answer as I'm only posting this as an answer as it's too big and in need of formatting to fit in a comment.
FWIW I'd write his awk script as:
awk '
BEGIN { FS=OFS="," }
NR==1 { type = "Type" }
NR > 1 {
cmd = "foobar \047" $2 "\047"
type = ((cmd | getline line) > 0 ? line : "ERROR")
close(cmd)
}
{ print $0, type }
' file.csv
to:
better protect $2 from shell expansion, and
protect from silently printing the previous value if/when cmd | getline fails, and
consolidate the print statements to 1 line so it's easy to change for all output lines if/when necessary

awk to the rescue!
$ echo "Account,Symbol,Price
100,AAPL US,200
102,SPY US,500" |
awk -F, 'NR>1{cmd="foobar "$2; cmd | getline type} {print $0 FS (NR==1?"Type":type)}'
Not sure you need to quote the input to foobar

Another way not using awk:
paste -d, input.csv <({ read; printf "Type\n"; while IFS=, read -r _ s _; do foobar "$s"; done; } < input.csv)

Related

change numerical value in file to characters via awk

I'm looking to replace the numerical values in a file with a new value provided by me. Can be present in any part of the text, in some cases, it comes across as the third position but is not always necessarily the case. Also to try and save a new version of the file.
original format
A:fdg:user#server:r
A:g:1234:xtcy
A:d:1111:xtcy
modified format
A:fdg:user#server:rxtTncC
A:g:replaced_value:xtcy
A:d:replaced_value:xtcy
bash line command with awk:
awk -v newValue="newVALUE" 'BEGIN{FS=OFS=":"} /:.:.*:/ && ~/^[0-9]+$/{~=newValue} 1' original_file.txt > replaced_file.txt
You can simply use sed instead of awk:
sed -E 's/\b[0-9]+\b/replaced_value/g' /path/to/infile > /path/to/outfile
Here is an awk that asks you for replacement values for each numerical value it meets:
$ awk '
BEGIN {
FS=OFS=":" # delimiters
}
{
for(i=1;i<=NF;i++) # loop all fields
if($i~/^[0-9]+$/) { # if numerical value found
printf "Provide replacement value for %d: ",$i > "/dev/stderr"
getline $i < "/dev/stdin" # ask for a replacement
}
}1' file_in > file_out # write output to a new file
I would use GNU AWK for this task following way, let file.txt content be
A:fdg:user#server:rxtTncC
A:g:1234:xtcy
A:d:1111:xtcy
then
awk 'BEGIN{newvalue="replacement"}{gsub(/[[:digit:]]+/,newvalue);print}' file.txt
output
A:fdg:user#server:rxtTncC
A:g:replacement:xtcy
A:d:replacement:xtcy
Explanation: replace one or more digits using newvalue. Disclaimer: I assumed numeric is something consisting solely from digits.
(tested in gawk 4.2.1)
How about
awk -F : '$3 ~ /^[0-9]+$/ { $3 = "new value"} {print}' original_file >replaced_file
?

how to select the last line of the shell output

Hi I have a shell command like this.
s3=$(awk 'BEGIN{ print "S3 bucket path" }
/Executing command\(queryId/{ sub(/.*queryId=[^[:space:]]+: /,""); q=$0 }
/s3:\/\//{ print "," $10 }' OFS=',' hive-server2.log)
The output of the above command like this.
echo $s3
2018-02-21T17:58:22,
2018-02-21T17:58:26,
2018-02-21T18:05:33,
2018-02-21T18:05:34
I want to select the last line only. I need the last output like this.
2018-02-21T18:05:34
I tried like this.
awk -v $s3 '{print $(NF)}'
Not working.Any help will be appreciated.
In general, command | tail -n 1 prints the last line of the output from command. However, where command is of the form awk '... { ... print something }' you can refactor to awk '... { ... result = something } END { print result }' to avoid spawning a separate process just to discard the other output. (Conversely, you can replace awk '/condition/ { print something }' | head -n 1 with awk '/condition/ { print something; exit }'.)
If you already have the result in a shell variable s3 and want to print just the last line, a parameter expansion echo "${s3##*$'\n'}" does that. The C-style string $'\n' to represent a newline is a Bash extension, and the parameter expansion operator ## to remove the longest matching prefix isn't entirely portable either, so you should make sure the shebang line says #!/bin/bash, not #!/bin/sh
Notice also that $s3 without quotes is an error unless you specifically require the shell to perform whitespace tokenization and wildcard expansion on the value. You should basically always use double quotes around variables except in a couple of very specific scenarios.
Your Awk command would not work for two reasons; firstly, as explained in the previous paragraph, you are setting s3 to the first token of the variable, and the second is your Awk script (probably a syntax error). In more detail, you are basically running
awk -v s3=firstvalue secondvalue thirdvalue '{ print $(NF) }'
^ value ^ script to run ^ names of files ...
where you probably wanted to say
awk -v s3=$'firstvalue\nsecondvalue\nthirdvalue' '{ print $(NF) }'
But even with quoting, your script would set v to something but then tell Awk to (ignore the variable and) process standard input, which on the command line leaves it reading from your terminal. A fixed script might look like
awk 'END { print }' <<<"$s3"
which passes the variable as standard input to Awk, which prints the last line. The <<<value "here string" syntax is also a Bash extension, and not portable to POSIX sh.
much simple way is
command | grep "your filter" | tail -n 1
or directly
command | tail -n 1
You could try this:
echo -e "This is the first line \nThis is the second line" | awk 'END{print}'
another approach can be, processing the file from the end and exiting after first match.
tac file | awk '/match/{print; exit}'
Hi you can do it just by adding echo $s3 | sed '$!d'
s3=$(awk 'BEGIN{ print "S3 bucket path" }/Executing command\(queryId/{ sub(/.*queryId=[^[:space:]]+: /,""); q=$0 } /s3:\/\//{ print "," $10 }' OFS=',' hive-server2.log)
echo $s3 | sed '$!d'
It will simply print:-
2018-02-21T18:05:34
Hope this will help you.

Replace special characters in variable in awk shell command

I am currently executing the following command:
awk 'BEGIN { FS="," ; getline ; H=$0 } N != $3 { N=$3 ; print H > "/Directory/FILE_"$3"_DOWNLOAD.csv" } { print > "/Directory/FILE_"$3"_DOWNLOAD.csv" }' /Directory/FILE_ALL_DOWNLOAD.csv
This takes the value from the third position in the CSV file and creates a CSV for each distinct $3 value. Works as desired.
The input file looks as follows:
Name, Amount, ID
"ABC", "100.00", "0000001"
"DEF", "50.00", "0000001"
"GHI", "25.00", "0000002"
Unfortunately I have no control over the value in the source (CSV) sheet, the $3 value, but I would like to eliminate special (non-alphanumeric) characters from it. I tried the following to accomplish this but failed...
awk 'BEGIN { FS="," ; getline ; H=$0 } N != $3 { N=$3 ; name=${$3//[^a-zA-Z_0-9]/}; print H > "/Directory/FILE_"$name"_DOWNLOAD.csv" } { print > "/Directory/FILE_"$name"_DOWNLOAD.csv" }' /Directory/FILE_ALL_DOWNLOAD.csv
Suggestions? I'm hoping to do this in a single command but if anyone has a bash script answer that would work.
This is definitely not a job you should be using getline for, see http://awk.info/?tip/getline
It looks like you just want to reproduce the first line of your input file in every $3-named file. That'd be:
awk -F, '
NR==1 { hdr=$0; next }
$3 != prev { prev=name=$3; gsub(/[^[:alnum:]_]/,"",name); $0 = hdr "\n" $0 }
{ print > ("/Directory/FILE_" name "_DOWNLOAD.csv") }
' /Directory/FILE_ALL_DOWNLOAD.csv
Note that you must always parenthesize expressions on the right side of output redirection (>) as it's ambiguous otherwise and different awks will behave differently if you don't.
Feel free to put it all back onto one line if you prefer.
If you always expect the number to be in the last field of your CSV and you know that each field is wrapped in quotes, you could use this awk to extract the value 456 from the input you have provided in the comment:
echo " 123.", "Company Name" " 456." | awk -F'[^a-zA-Z0-9]+' 'NF { print $(NF-1) }'
This defines the field separator as any number of non-alphanumeric characters and retrieves the second-last field.
If this is sufficient to reliably retrieve the value, you could construct your filename like this:
file = "/Directory/FILE_" $(NF-1) "_DOWNLOAD.csv"
and output to it as you're already doing.
bash variable expansions do not occur in single quotes.
They also cannot be performed on awk variables.
That being said you don't need that to work.
awk has string manipulation functions that can perform the same tasks. In this instance you likely want the gsub function.
Would this not work for what you asked ?
awk -F, 'a=NR==1{x=$0;next}
!a{gsub(/[^[:alnum:]]/,"",$3);print x"\n"$0 >> "/Directory/FILE_"$3"_DOWNLOAD.csv"}' file

How to replace full column with the last value?

I'm trying to take last value in third column of a CSV file and replace then the whole third column with this value.
I've been trying this:
var=$(tail -n 1 math_ready.csv | awk -F"," '{print $3}'); awk -F, '{$3="$var";}1' OFS=, math_ready.csv > math1.csv
But it's not working and I don't understand why...
Please help!
awk '
BEGIN { ARGV[2]=ARGV[1]; ARGC++; FS=OFS="," }
NR==FNR { last = $3; next }
{ $3 = last; print }
' math_ready.csv > math1.csv
The main problem with your script was trying to access a shell variable ($var) inside your awk script. Awk is not shell, it is a completely separate language/tool with it's own namespace and variables. You cannot directly access a shell variable in awk, just like you couldn't access it in C. To access the VALUE of a shell variable you'd do:
shellvar=27
awk -v awkvar="$shellvar" 'BEGIN{ print awkvar }'`
Some additional cleanup:
When FS and OFS have the same value, don't assign them each to that value separately, use BEGIN{ FS=OFS="," } instead for clarity and maintainability.
Do not iniatailize variables AFTER the script that uses those variables unless you have a very specifc reason to do so. Use awk -F... -v OFS=... 'script' to init those variables to separate values, not awk -F... 'script' OFS=... as it's very unnatural to init variables in the code segment AFTER you've used them and variables inited in the args list at the end are not initialized when the BEGIN section is executed which can cause bugs.
A shell variable is not expandable internally in awk. You can do this instead:
awk -F, -v var="$var" '{ $3 = var } 1' OFS=, math_ready.csv > math1.cs
And you probably can simplify your code with this:
awk -F, 'NR == FNR { r = $3; next } { $3 = r } 1' OFS=, math_ready.csv math_ready.csv > math1.csv
Example input:
1,2,1
1,2,2
1,2,3
1,2,4
1,2,5
Output:
1,2,5
1,2,5
1,2,5
1,2,5
1,2,5
Try this one liner. It doesn't depend on the column count
var=`tail -1 sample.csv | perl -ne 'm/([^,]+)$/; print "$1";'`; cat sample.csv | while read line; do echo $line | perl -ne "s/[^,]*$/$var\n/; print $_;"; done
cat sample.csv
24,1,2,30,12
33,4,5,61,3333
66,7,8,91111,1
76,10,11,32,678
Out:
24,1,2,30,678
33,4,5,61,678
66,7,8,91111,678
76,10,11,32,678

Get next field/column width awk

I have a dataset of the following structure:
1234 4334 8677 3753 3453 4554
4564 4834 3244 3656 2644 0474
...
I would like to:
1) search for a specific value, eg 4834
2) return the following field (3244)
I'm quite new to awk, but realize it is a simple operation. I have created a bash-script that asks the user for the input, and attempts to return the following field.
But I can't seem to get around scoping in AWK. How do I parse the input value to awk?
#!/bin/bash
read input
cat data.txt | awk '
for (i=1;i<=NF;i++) {
if ($i==input) {
print $(i+1)
}
}
}'
Cheers and thanks in advance!
UPDATE Sept. 8th 2011
Thanks for all the replies.
1) It will never happen that the last number of a row is picked - still I appreciate you pointing this out.
2) I have a more general problem with awk. Often I want to "do something" with the result found. In this case I would like to output it to xclip - an application which read from standard input and copies it to the clipboard. Eg:
$ echo Hi | xclip
Unfortunately, echo doesn't exist for awk, so I need to return the value and echo it. How would you go about this?
#!/bin/bash
read input
cat data.txt | awk '{
for (i=1;i<=NF;i++) {
if ($i=='$input') {
print $(i+1)
}
}
}'
Don't over think it!
You can create an array in awk with the split command:
split($0, ary)
This will split the line $0 into an array called ary. Now, you can use array syntax to find the particular fields:
awk '{
size = split($0, ary)
for (i=1; i < size ;i++) {
print ary[i]
}
print "---"
}' data.txt
Now, when you find ary[x] as the field, you can print out ary[x+1].
In your example:
awk -v input=$input '{
size = split($0, ary)
for (i=1; i<= size ;i++) {
if ($i == ary[i]) {
print ary[i+1]
}
}
}' data.txt
There is a way of doing this without creating an array, but it's simply much easier to work with arrays in situations like this.
By the way, you can eliminate the cat command by putting the file name after the awk statement and save creating an extraneous process. Everyone knows creating an extraneous process kills a kitten. Please don't kill a kitten.
You pass shell variable to awk using -v option. Its cleaner/nicer than having to put quotes.
awk -v input="$input" '
for(i=1;i<=NF;i++){
if ($i == input ){
print "Next value: " $(i+1)
}
}
' data.txt
And lose the useless cat.
Here is my solution: delete everything up to (and including) the search field, then the field you want to print out is field #1 ($1):
awk '/4834/ {sub(/^.* * 4834 /, ""); print $1}' data.txt

Resources