awk print $0 with newline separated column values - bash

Input:
"prefix_foo,prefix_bar"
Expected Output:
foo
bar
This is what I have so far:
$ echo "PREFIX_foo,PREFIX_bar" | awk '/PREFIX_/{x=gsub("PREFIX_", ""); print $0 }'
foo,bar
I'm unable to figure out how to print foo and bar separated by a newline. Thanks in advance!
EDIT:
Length of input is unknown so there can be more than 2 words separated by comma.
This question is more about learning the awk language, not alternative GNU utils.

You may not need awk for this. Here is a pure-bash solution:
s="prefix_foo,prefix_bar"
s="${s//prefix_/}"
s="${s//,/$'\n'}"
echo "$s"
foo
bar
Here is a GNU sed one-liner for the same:
sed 's/prefix_//g; s/,/\n/g' <<< "$s"
foo
bar
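Since the question is specifically about learning awk, the same two substitutions can also be done in awk alone (a minimal sketch; gsub on $0 handles both replacements):

```shell
echo "prefix_foo,prefix_bar" |
  awk '{ gsub(/prefix_/, ""); gsub(/,/, "\n"); print }'
# prints:
# foo
# bar
```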

EDIT (2nd solution): Adding a more generic solution here as per the OP's comments. This looks at every field, checks whether it has the prefix, and if so prints that field's second part (the part after the _).
echo "prefix_foo,etc,bla,prefix_bar" |
awk '
BEGIN{
FS=OFS=","
}
{
for(i=1;i<=NF;i++){
if($i~/prefix/){
split($i,array,"_")
val=(val?val OFS:"")array[2]
}
}
if(val){
print val
}
val=""
}'
To print each output field value on a new line, try:
echo "prefix_foo,etc,bla,prefix_bar" |
awk '
BEGIN{
FS=OFS=","
}
{
for(i=1;i<=NF;i++){
if($i~/prefix/){
split($i,array,"_")
print array[2]
}
}
}
'
1st solution: For the simple case (specific to the shown samples), you could try the following.
awk -F'[_,]' -v OFS='\n' '/prefix_/{print $2,$4}' Input_file
OR
echo "prefix_foo,prefix_bar" | awk -F'[_,]' -v OFS='\n' '/prefix_/{print $2,$4}'

Just trying out awk:
echo "PREFIX_foo,PREFIX_bar" | awk -F, -v OFS="\n" '{gsub(/PREFIX_/,""); $1=$1}1'
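The `$1=$1` there is what makes it work: assigning to a field forces awk to rebuild $0 using the new OFS. A quick side-by-side (same command with and without the rebuild):

```shell
# Without $1=$1 the record is never rebuilt, so the comma survives:
echo "PREFIX_foo,PREFIX_bar" | awk -F, -v OFS="\n" '{gsub(/PREFIX_/,"")}1'
# prints: foo,bar

# With $1=$1 the record is rebuilt using OFS="\n":
echo "PREFIX_foo,PREFIX_bar" | awk -F, -v OFS="\n" '{gsub(/PREFIX_/,""); $1=$1}1'
# prints foo and bar on separate lines
```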

awk output to file based on filter

I have a big CSV file that I need to cut into different pieces based on the value in one of the columns. My input file dataset.csv is something like this:
NOTE: edited to clarify that data is ,data, no spaces.
action,action_type, Result
up,1,stringA
down,1,strinB
left,2,stringC
So, to split by action_type I simply do (I need the whole matching line in the resulting file):
awk -F, '$2 ~ /^1$/ {print}' dataset.csv >> 1_dataset.csv
awk -F, '$2 ~ /^2$/ {print}' dataset.csv >> 2_dataset.csv
This works as expected, but I am basically traversing my original dataset twice. My original dataset is about 5 GB and I have 30 action_type categories. I need to do this every day, so I need the script to run efficiently on its own.
I tried the following but it does not work:
# This is a file called myFilter.awk
{
action_type=$2;
if (action_type=="1") print $0 >> 1_dataset.csv;
else if (action_type=="2") print $0 >> 2_dataset.csv;
}
Then I run it as:
awk -f myFilter.awk dataset.csv
But I get nothing. Literally nothing, not even errors, which sort of tells me that my code is simply not matching anything or my print/pipe statement is wrong.
You may try this awk to do this in a single command:
awk -F, 'NR > 1{fn = $2 "_dataset.csv"; print >> fn; close(fn)}' file
With GNU awk to handle many concurrently open files and without replicating the header line in each output file:
awk -F',' '{print > ($2 "_dataset.csv")}' dataset.csv
or if you also want the header line to show up in each output file then with GNU awk:
awk -F',' '
NR==1 { hdr = $0; next }
!seen[$2]++ { print hdr > ($2 "_dataset.csv") }
{ print > ($2 "_dataset.csv") }
' dataset.csv
or the same with any awk:
awk -F',' '
NR==1 { hdr = $0; next }
{ out = $2 "_dataset.csv" }
!seen[$2]++ { print hdr > out }
{ print >> out; close(out) }
' dataset.csv
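As a quick end-to-end check of that portable version (a sketch recreating the sample rows from the question):

```shell
# Recreate the sample input and split it in one pass, header included.
printf 'action,action_type,Result\nup,1,stringA\ndown,1,strinB\nleft,2,stringC\n' > dataset.csv
awk -F',' '
NR==1 { hdr = $0; next }
{ out = $2 "_dataset.csv" }
!seen[$2]++ { print hdr > out }
{ print >> out; close(out) }
' dataset.csv
cat 1_dataset.csv   # header plus the two action_type=1 rows
cat 2_dataset.csv   # header plus the action_type=2 row
```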
As currently coded, the input field separator has not been defined (and note that the output filenames must be quoted strings; unquoted, awk tries to parse them as expressions).
Current:
$ cat myfilter.awk
{
action_type=$2;
if (action_type=="1") print $0 >> 1_dataset.csv;
else if (action_type=="2") print $0 >> 2_dataset.csv;
}
Invocation:
$ awk -f myfilter.awk dataset.csv
There are a couple ways to address this:
$ awk -v FS="," -f myfilter.awk dataset.csv
or
$ cat myfilter.awk
BEGIN {FS=","}
{
action_type=$2
if (action_type=="1") print $0 >> "1_dataset.csv";
else if (action_type=="2") print $0 >> "2_dataset.csv";
}
$ awk -f myfilter.awk dataset.csv

Copy numbers at the beginning of each line to the end of line

I have a file that produces lines like these. I want to edit these lines and put them in passageiros.txt:
a82411:x:1015:1006:Adriana Morais,,,:/home/a82411:/bin/bash
a60395:x:1016:1006:Afonso Pichel,,,:/home/a60395:/bin/bash
a82420:x:1017:1006:Afonso Alves,,,:/home/a82420:/bin/bash
a69225:x:1018:1006:Afonso Alves,,,:/home/a69225:/bin/bash
a82824:x:1019:1006:Afonso Carreira,,,:/home/a82824:/bin/bash
a83112:x:1020:1006:Aladje Sanha,,,:/home/a83112:/bin/bash
a82652:x:1022:1006:Alexandre Ferreira,,,:/home/a82652:/bin/bash
a83063:x:1023:1006:Alexandre Feijo,,,:/home/a83063:/bin/bash
a82540:x:1024:1006:Ana Santana,,,:/home/a82540:/bin/bash
With the following code I'm able to get something like this:
cat /etc/passwd |grep "^a[0-9]" | cut -d ":" -f1,5 | sed "s/a//" | sed "s/,//g" > passageiros.txt
sed -e "s/$/:::a/" -i passageiros.txt
82411:Adriana Morais:::a
60395:Afonso Pichel:::a
82420:Afonso Alves:::a
69225:Afonso Alves:::a
82824:Afonso Carreira:::a
83112:Aladje Sanha:::a
82652:Alexandre Ferreira:::a
83063:Alexandre Feijo:::a
82540:Ana Santana:::a
So my goal is to create something like this:
82411:Adriana Morais:::a82411#
60395:Afonso Pichel:::a60395#
82420:Afonso Alves:::a82420#
69225:Afonso Alves:::a69225#
82824:Afonso Carreira:::a82824#
83112:Aladje Sanha:::a83112#
82652:Alexandre Ferreira:::a82652#
83063:Alexandre Feijo:::a83063#
82540:Ana Santana:::a82540#
How can I do this?
You could try the following:
awk -F'[:,]' '{val=$1;sub(/[a-z]+/,"",$1);print $1,$5,_,_,val"#"}' OFS=":" Input_file
Explanation: adding an explanation for the above code too.
awk -F'[:,]' ' ##Starting awk script here and making the field separators colon and comma here.
{ ##Starting main block here for awk.
val=$1 ##Creating a variable val whose value is the first field.
sub(/[a-z]+/,"",$1) ##Using sub to substitute any lowercase alphabets a to z in the first field with NULL here.
print $1,$5,_,_,val"#" ##Printing 1st and 5th fields, printing 2 NULL variables, and printing variable val with #.
} ##Closing block for awk here.
' OFS=":" Input_file ##Mentioning OFS value as colon here and mentioning Input_file name here.
EDIT: Adding @Aserre's solution too here.
awk -F'[:,]' '{print substr($1, 2),$5,_,_,$1"#"}' OFS=":" Input_file
You may use the following awk:
awk 'BEGIN {FS=OFS=":"} {sub(/^a/, "", $1); gsub(/,/, "", $5); print $1, $5, _, _, "a" $1 "#"}' file > passageiros.txt
Details
BEGIN {FS=OFS=":"} sets the input and output field separator to :
sub(/^a/, "", $1) removes the first a from Field 1
gsub(/,/, "", $5) removes all , from Field 5
print $1, $5, _, _, "a" $1 "#" prints only the necessary fields to the output.
You can use just one sed:
grep '^a' file | cut -d: -f1,5 | sed 's/a\([^:]*\)\(.*\)/\1\2:::a\1#/;s/,,,//'
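A quick check of the BEGIN {FS=OFS=":"} answer above against the first sample line from the question:

```shell
printf 'a82411:x:1015:1006:Adriana Morais,,,:/home/a82411:/bin/bash\n' |
  awk 'BEGIN {FS=OFS=":"} {sub(/^a/, "", $1); gsub(/,/, "", $5); print $1, $5, _, _, "a" $1 "#"}'
# prints: 82411:Adriana Morais:::a82411#
```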

Chop a row into multiple rows using awk

I am trying to chop a line into multiple lines using awk, after every two words.
Input:
hey there this is a test
Output:
hey there
this is
a test
I am able to achieve it using xargs, as follows:
echo hey there this is a test |xargs -n2
hey there
this is
a test
However, I am curious to know how to achieve this using awk. Here are the commands I am using, which of course didn't give the expected result.
echo hey there this is a test | awk '{ for(i=1;i<=NF;i++) if(i%2=="0") ORS="\n" ;else ORS=" "}1'
hey there this is a test
And
echo hey there this is a test | awk '{$1=$1; for(i=1;i<=NF;i++) if(i%2==0) ORS="\n" ;else ORS=" "}{ print $0}'
hey there this is a test
I need to know what is conceptually wrong in the above awk commands and how they can be modified to give the correct output. Assume the input is a single line.
Thanks and Regards.
Using awk you can do:
s='hey there this is a test'
awk '{for (i=1; i<=NF; i++) printf "%s%s", $i, (i%2 ? OFS : ORS)}' <<< "$s"
hey there
this is
a test
First, you want OFS (the output field separator), not ORS (the output record separator).
Also, your for loop ends up setting just a single final ORS value: it iterates over all the fields, toggling ORS between " " and "\n", and only the last assignment matters, applied once when the record is printed.
So what you really want is to operate on records (normally those are lines) instead of fields (normally separated by spaces).
Here's a version that uses records:
echo hey there this is a test | awk 'BEGIN {RS=" "} {if ((NR-1)%2 == 0) { ORS=" "} else {ORS="\n"}}1'
Result:
hey there
this is
a test
Another flavour of @krzyk's version:
$ awk 'BEGIN {RS=" "} {ORS="\n"} NR%2 {ORS=" "} 1' test.in
hey there
this is
a test
$
Maybe even:
awk 'BEGIN {RS=" "} {ORS=(ORS==RS?"\n":RS)} 1' test.in
Both do leave a trailing newline at the end, though.
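One way to avoid that trailing separator is to stay field-based and emit the separator before each field instead of after it (a sketch in the spirit of the printf answer above):

```shell
echo hey there this is a test |
  awk '{ for (i=1; i<=NF; i++) printf "%s%s", (i==1 ? "" : (i%2 ? "\n" : " ")), $i; print "" }'
# hey there
# this is
# a test
```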

awk OFS not producing expected value

I have a file
[root#nmk~]# cat file
abc>
sssd>
were>
I run both these variations of the awk commands
[root#nmk~]# cat file | awk -F\> ' { print $1}' OFS=','
abc
sssd
were
[root#nmk~]# cat file | awk -F\> ' BEGIN { OFS=","} { print $1}'
abc
sssd
were
[root#nmk~]#
But my expected output is
abc,sssd,were
What's missing in my commands ?
You're just a bit confused about the meaning/use of FS, OFS, RS and ORS. Take another look at the man page. I think this is what you were trying to do:
$ awk -F'>' -v ORS=',' '{print $1}' file
abc,sssd,were,$
but this is probably closer to the output you really want:
$ awk -F'>' '{rec = rec (NR>1?",":"") $1} END{print rec}' file
abc,sssd,were
or if you don't want to buffer the whole output as a string:
$ awk -F'>' '{printf "%s%s", (NR>1?",":""), $1} END{print ""}' file
abc,sssd,were
awk -F\> -v ORS="" 'NR>1{print ","$1;next}{print $1}' file
to print newline at the end:
awk -F\> -v ORS="" 'NR>1{print ","$1;next}{print $1} END{print "\n"}' file
output:
abc,sssd,were
Each line of input in awk is a record, so what you want to set is the Output Record Separator, ORS. The OFS variable holds the Output Field Separator, which is used to separate different parts of each line.
Since you are setting the input field separator, FS, to >, and OFS to ,, an easy way to see how these work is to add something on each line of your file after the >:
awk 'BEGIN { FS=">"; OFS=","} {$1=$1} 1' <<<$'abc>def\nsssd>dsss\nwere>wolf'
abc,def
sssd,dsss
were,wolf
So you want to set the ORS. The default record separator is a newline, so whatever you set ORS to effectively replaces the newlines in the input. But that means that if the last line of input ends with a newline - which is usually the case - that last line will also get a copy of your new ORS:
awk 'BEGIN { FS=">"; ORS=","} 1' <<<$'abc>def\nsssd>dsss\nwere>wolf'
abc>def,sssd>dsss,were>wolf,
The output also won't end with a newline at all, because that final newline was interpreted as an input record separator and turned into the output record separator - it became the final comma.
So you have to be a little more explicit about what you're trying to do:
awk 'BEGIN { FS=">" } # split input on >
(NR>1) { printf "," } # if not the first line, print a ,
{ printf "%s", $1 } # print the first field (everything up to the first >)
END { printf "\n" } # add a newline at the end
' <<<$'abc>\nsssd>\nwere>'
Which outputs this:
abc,sssd,were
Through sed,
$ sed ':a;N;$!ba;s/>\n/,/g;s/>$//' file
abc,sssd,were
Through Perl,
$ perl -00pe 's/>\n(?=.)/,/g;s/>$//' file
abc,sssd,were

How to replace full column with the last value?

I'm trying to take the last value in the third column of a CSV file and then replace the whole third column with this value.
I've been trying this:
var=$(tail -n 1 math_ready.csv | awk -F"," '{print $3}'); awk -F, '{$3="$var";}1' OFS=, math_ready.csv > math1.csv
But it's not working and I don't understand why...
Please help!
awk '
BEGIN { ARGV[2]=ARGV[1]; ARGC++; FS=OFS="," }
NR==FNR { last = $3; next }
{ $3 = last; print }
' math_ready.csv > math1.csv
The main problem with your script was trying to access a shell variable ($var) inside your awk script. Awk is not shell; it is a completely separate language/tool with its own namespace and variables. You cannot directly access a shell variable in awk, just like you couldn't access it in C. To access the VALUE of a shell variable you'd do:
shellvar=27
awk -v awkvar="$shellvar" 'BEGIN{ print awkvar }'
Some additional cleanup:
When FS and OFS have the same value, don't assign them each to that value separately; use BEGIN{ FS=OFS="," } instead for clarity and maintainability.
Do not initialize variables AFTER the script that uses them unless you have a very specific reason to do so. Use awk -F... -v OFS=... 'script' to initialize those variables to separate values, not awk -F... 'script' OFS=..., as it's very unnatural to initialize variables in the argument list after you've used them, and variables initialized there are not yet set when the BEGIN section is executed, which can cause bugs.
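Putting those points together, the original two-command attempt could be written as (a sketch, assuming the same filenames as in the question):

```shell
# Grab the last value of column 3, then rewrite column 3 on every line with it.
var=$(tail -n 1 math_ready.csv | awk -F, '{print $3}')
awk -F, -v OFS=, -v var="$var" '{ $3 = var } 1' math_ready.csv > math1.csv
```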
A shell variable does not expand inside single-quoted awk code. You can do this instead:
awk -F, -v var="$var" '{ $3 = var } 1' OFS=, math_ready.csv > math1.csv
And you probably can simplify your code with this:
awk -F, 'NR == FNR { r = $3; next } { $3 = r } 1' OFS=, math_ready.csv math_ready.csv > math1.csv
Example input:
1,2,1
1,2,2
1,2,3
1,2,4
1,2,5
Output:
1,2,5
1,2,5
1,2,5
1,2,5
1,2,5
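The same sample input can be used to verify the two-pass variant end to end:

```shell
# First pass records the last $3; second pass rewrites every $3 with it.
printf '1,2,1\n1,2,2\n1,2,3\n1,2,4\n1,2,5\n' > math_ready.csv
awk -F, 'NR == FNR { r = $3; next } { $3 = r } 1' OFS=, math_ready.csv math_ready.csv
# every line prints as 1,2,5
```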
Try this one-liner; it doesn't depend on the column count:
var=`tail -1 sample.csv | perl -ne 'm/([^,]+)$/; print "$1";'`; cat sample.csv | while read line; do echo $line | perl -ne "s/[^,]*$/$var\n/; print $_;"; done
cat sample.csv
24,1,2,30,12
33,4,5,61,3333
66,7,8,91111,1
76,10,11,32,678
Out:
24,1,2,30,678
33,4,5,61,678
66,7,8,91111,678
76,10,11,32,678
