awk - split only by first occurrence - bash

I have a line like:
one:two:three:four:five:six seven:eight
and I want to use awk to get $1 to be one and $2 to be two:three:four:five:six seven:eight
I know I can get it by doing sed before. That is to change the first occurrence of : with sed then awk it using the new delimiter.
However replacing the delimiter with a new one would not help me since I can not guarantee that the new delimiter will not already be somewhere in the text.
I want to know if there is an option to get awk to behave this way
So something like:
awk -F: '{print $1,$2}'
will print:
one two:three:four:five:six seven:eight
I will also want to do some manipulations on $1 and $2 so I don't want just to substitute the first occurrence of :.

Without any substitutions
echo "one:two:three:four:five" | awk -F: '{ st = index($0,":");print $1 " " substr($0,st+1)}'
The index command finds the first occurance of the ":" in the whole string, so in this case the variable st would be set to 4. I then use substr function to grab all the rest of the string from starting from position st+1, if no end number supplied it'll go to the end of the string. The output being
one two:three:four:five
If you want to do further processing you could always set the string to a variable for further processing.
rem = substr($0,st+1)
Note this was tested on Solaris AWK but I can't see any reason why this shouldn't work on other flavours.

Some like this?
echo "one:two:three:four:five:six" | awk '{sub(/:/," ")}1'
one two:three:four:five:six
This replaces the first : to space.
You can then later get it into $1, $2
echo "one:two:three:four:five:six" | awk '{sub(/:/," ")}1' | awk '{print $1,$2}'
one two:three:four:five:six
Or in same awk, so even with substitution, you get $1 and $2 the way you like
echo "one:two:three:four:five:six" | awk '{sub(/:/," ");$1=$1;print $1,$2}'
one two:three:four:five:six
EDIT:
Using a different separator you can get first one as filed $1 and rest in $2 like this:
echo "one:two:three:four:five:six seven:eight" | awk -F\| '{sub(/:/,"|");$1=$1;print "$1="$1 "\n$2="$2}'
$1=one
$2=two:three:four:five:six seven:eight
Unique separator
echo "one:two:three:four:five:six seven:eight" | awk -F"#;#." '{sub(/:/,"#;#.");$1=$1;print "$1="$1 "\n$2="$2}'
$1=one
$2=two:three:four:five:six seven:eight

The closest you can get with is with GNU awk's FPAT:
$ awk '{print $1}' FPAT='(^[^:]+)|(:.*)' file
one
$ awk '{print $2}' FPAT='(^[^:]+)|(:.*)' file
:two:three:four:five:six seven:eight
But $2 will include the leading delimiter but you could use substr to fix that:
$ awk '{print substr($2,2)}' FPAT='(^[^:]+)|(:.*)' file
two:three:four:five:six seven:eight
So putting it all together:
$ awk '{print $1, substr($2,2)}' FPAT='(^[^:]+)|(:.*)' file
one two:three:four:five:six seven:eight
Storing the results of the substr back in $2 will allow further processing on $2 without the leading delimiter:
$ awk '{$2=substr($2,2); print $1,$2}' FPAT='(^[^:]+)|(:.*)' file
one two:three:four:five:six seven:eight
A solution that should work with mawk 1.3.3:
awk '{n=index($0,":");s=$0;$1=substr(s,1,n-1);$2=substr(s,n+1);print $1}' FS='\0'
one
awk '{n=index($0,":");s=$0;$1=substr(s,1,n-1);$2=substr(s,n+1);print $2}' FS='\0'
two:three:four five:six:seven
awk '{n=index($0,":");s=$0;$1=substr(s,1,n-1);$2=substr(s,n+1);print $1,$2}' FS='\0'
one two:three:four five:six:seven

Just throwing this on here as a solution I came up with where I wanted to split the first two columns on : but keep the rest of the line intact.
Comments inline.
echo "a:b:c:d::e" | \
awk '{
split($0,f,":"); # split $0 into array of fields `f`
sub(/^([^:]+:){2}/,"",$0); # remove first two "fields" from `$0`
print f[1],f[2],$0 # print first two elements of `f` and edited `$0`
}'
Returns:
a b c:d::e
In my input I didn't have to worry about the first two fields containing escaped :, if that was a requirement, this solution wouldn't work as expected.
Amended to match the original requirements:
echo "a:b:c:d::e" | \
awk '{
split($0,f,":");
sub(/^([^:]+:)/,"",$0);
print f[1],$0
}'
Returns:
a b:c:d::e

Related

Shell script to add values to a specific column

I have semicolon-separated columns, and I would like to add some characters to a specific column.
aaa;111;bbb
ccc;222;ddd
eee;333;fff
to the second column I want to add '#', so the output should be;
aaa;#111;bbb
ccc;#222;ddd
eee;#333;fff
I tried
awk -F';' -OFS=';' '{ $2 = "#" $2}1' file
It adds the character but removes all semicolons with space.
You could use sed to do your job:
# replaces just the first occurrence of ';', note the absence of `g` that
# would have made it a global replacement
sed 's/;/;#/' file > file.out
or, to do it in place:
sed -i 's/;/;#/' file
Or, use awk:
awk -F';' '{$2 = "#"$2}1' OFS=';' file
All the above commands result in the same output for your example file:
aaa;#111;bbb
ccc;#222;ddd
eee;#333;fff
#atb: Try:
1st:
awk -F";" '{print $1 FS "#" $2 FS $3}' Input_file
Above will work only when your Input_file has 3 fields only.
2nd:
awk -F";" -vfield=2 '{$field="#"$field} 1' OFS=";" Input_file
Above code you could put any field number and could make it as per your request.
Here I am making field separator as ";" and then taking a variable named field which will have the field number in it and then that concatenating "#" in it's value and 1 is for making condition TRUE and not making and action so by default print action will happen of current line.
You just misunderstood how to set variables. Change -OFS to -v OFS:
awk -F';' -v OFS=';' '{ $2 = "#" $2 }1' file
but in reality you should set them both to the same value at one time:
awk 'BEGIN{FS=OFS=";"} { $2 = "#" $2 }1' file

Can I have multiple awk actions without inserting newlines?

I'm a newbie with very small and specific needs. I'm using awk to parse something and I need to generate uninterrupted lines of text assembled from several pieces in the original text. But awk inserts a newline in the output whenever I use a semicolon.
Simplest example of what I mean:
Original text:
1 2
awk command:
{ print $1; print $2 }
The output will be:
1
2
The thing is that I need the output to be a single line, and I also need to use the semicolons, because I have to do multiple actions on the original text, not all of them print.
Also, using ORS=" " causes a whole lot of different problems, so it's not an option.
Is there any other way that I can have multiple actions in the same line without newline insertion?
Thanks!
The newlines in the output are nothing to do with you using semicolons to separate statements in your script, they are because print outputs the arguments you give it followed by the contents of ORS and the default value of ORS is newline.
You may want some version of either of these:
$ echo '1 2' | awk '{printf "%s ", $1; printf "%s ", $2; print ""}'
1 2
$
$ echo '1 2' | awk -v ORS=' ' '{print $1; print $2; print "\n"}'
1 2
$
$ echo '1 2' | awk -v ORS= '{print $1; print " "; print $2; print "\n"}'
1 2
$
but it's hard to say without knowing more about what you're trying to do.
At least scan through the book Effective Awk Programming, 4th Edition, by Arnold Robbins to get some understanding of the basics before trying to program in awk or you're going to waste a lot of your time and learn a lot of bad habits first.
You have better control of the output if you use printf, e.g.
awk '{ printf "%s %s\n",$1,$2 }'
awk '{print $1 $2}'
Is the solution in this case
TL;DR
You're getting newlines because print sends OFS to standard output after each print statement. You can format the output in a variety of other ways, but the key is generally to invoke only a single print or printf statement regardless of how many fields or values you want to print.
Use Commas
One way to do this is to use a single call to print using commas to separate arguments. This will insert OFS between the printed arguments. For example:
$ echo '1 2' | awk '{print $1, $2}'
1 2
Don't Separate Arguments
If you don't want any separation in your output, just pass all the arguments to a single print statement. For example:
$ echo '1 2' | awk '{print $1 $2}'
12
Formatted Strings
If you want more control than that, use formatted strings using printf. For example:
$ echo '1 2' | awk '{printf "%s...%s\n", $1, $2}'
1...2
$ echo "1 2" | awk '{print $1 " " $2}'
1 2

enclose a string where missing double quotes

I have an input file like below. The issue is that the file is pipe delimited and enclosed by double quotes, optionally. It is missed in the third field at the end of the string and I could see that it happens whenever the length exceeds say 2.
"SER1828"|"ZXC"|"A1"|10002
"SER1878"|"IOP"|"B1"|98989
"SER1930"|"QWE"|"A2"|10301
"SER1930"|"QWE"|"Asdf2|10301 # 3rd field -> closing " missed out
The output should look like
"SER1828"|"ZXC"|"A1"|10002
"SER1878"|"IOP"|"B1"|98989
"SER1930"|"QWE"|"A2"|10301
"SER1930"|"QWE"|"Asdf2"|10301
I was trying with some awk commands but could not achieve it.
awk -F'|' -v q=\" '{$3=$3 q;}1' OFS=| temp
awk -F'|' -v q=\" '{if (length($3) > 2) ($3=$3;}1)}' OFS='|' temp
Using awk you can write,
awk -F'"?\\|' -vOFS='"|' '{print $1, $2, $3, $4}'
Example
awk -F'"?\\|' -vOFS='"|' '{print $1, $2, $3, $4}' input
"SER1828"|"ZXC"|"A1"|10002
"SER1878"|"IOP"|"B1"|98989
"SER1930"|"QWE"|"A2"|10301
"SER1930"|"QWE"|"Asdf2"|10301
What it does?
-F'"?\\|' Sets the input field separator to either "| or |
-vOFS='"|' Sets the output filed separator to "|. This is set always, that is even if the input field separator is | or "|
Or you can also write
awk -F'"?\|' -vOFS='"|' '1' input
Here 1 is always evaluated to true, in which case it will print the entire line.
awk -F'"?\\|' -vOFS='"|' '1' input
or
awk -F'"?\\|' -vOFS='"|' '{$1=$1}1' input
See #Kent's comment.
EDIT
If you want to add the quoting only for the third filed based on the length, you can write something like
awk -F'|' -vOFS='|' '{print $1, $2, $3(length($3)>4 ? "\"" : ""), $4}'
this sed one-liner works for given example:
sed 's/\([^"]\)|"/\1"|"/' file # this only works for the original example
This works for the original and current example:
sed 's/\([^"]\)|/\1"|/' file
awk '{sub(/Asdf2/,"Asdf2\"")}1' file
"SER1828"|"ZXC"|"A1"|10002
"SER1878"|"IOP"|"B1"|98989
"SER1930"|"QWE"|"A2"|10301
"SER1930"|"QWE"|"Asdf2"|10301

issue with OFS in awk

I have a string containing this (field separator is the percentage sign), stored in a variable called data
201%jkhjfhn%kfhngjm%mkdfhgjdfg%mkdfhgjdfhg%mkdhfgjdhfg%kdfhgjgh%kdfgjhgfh%mkfgnhmkgfnh%k,gnhjkgfn%jkdfhngjdfng
I'm trying to print out that string replacing the percentage sign with a pipe but it seems harden than i thought:
echo ${data} | awk -F"%" 'BEGIN {OFS="|"} {print $0}'
I know I'm very close to it just not close enough.
I see that code as:
1 echo the variable value into a awk session
2 set field separator as "%"
3 set as output field separator "|"
4 print the line
Try this :
echo "$data" | awk -F"%" 'BEGIN {OFS="|"} {$1=$1; print $0}'
From awk manual
Finally, there are times when it is convenient to force awk to rebuild the entire
record, using the current value of the fields and OFS. To do this, use the seemingly
innocuous assignment:
$1 = $1 # force record to be reconstituted
print $0 # or whatever else with $0
Another lightweight way using only tr if you search an alternative for awk :
tr '%' '|' <<< "$data"
Sputnick gave you the awk solution, but you don't actually need awk at all, just use your shell:
echo ${data//%/|}

Explode to Array

I put together this shell script to do two things:
Change the delimiters in a data file ('::' to ',' in this case)
Select the columns and I want and append them to a new file
It works but I want a better way to do this. I specifically want to find an alternative method for exploding each line into an array. Using command line arguments doesn't seem like the way to go. ANY COMMENTS ARE WELCOME.
# Takes :: separated file as 1st parameters
SOURCE=$1
# create csv target file
TARGET=${SOURCE/dat/csv}
touch $TARGET
echo #userId,itemId > $TARGET
IFS=","
while read LINE
do
# Replaces all matches of :: with a ,
CSV_LINE=${LINE//::/,}
set -- $CSV_LINE
echo "$1,$2" >> $TARGET
done < $SOURCE
Instead of set, you can use an array:
arr=($CSV_LINE)
echo "${arr[0]},${arr[1]}"
The following would print columns 1 and 2 from infile.dat. Replace with
a comma-separated list of the numbered columns you do want.
awk 'BEGIN { IFS='::'; OFS=","; } { print $1, $2 }' infile.dat > infile.csv
Perl probably has a 1 liner to do it.
Awk can probably do it easily too.
My first reaction is a combination of awk and sed:
Sed to convert the delimiters
Awk to process specific columns
cat inputfile | sed -e 's/::/,/g' | awk -F, '{print $1, $2}'
# Or to avoid a UUOC award (and prolong the life of your keyboard by 3 characters
sed -e 's/::/,/g' inputfile | awk -F, '{print $1, $2}'
awk is indeed the right tool for the job here, it's a simple one-liner.
$ cat test.in
a::b::c
d::e::f
g::h::i
$ awk -F:: -v OFS=, '{$1=$1;print;print $2,$3 >> "altfile"}' test.in
a,b,c
d,e,f
g,h,i
$ cat altfile
b,c
e,f
h,i
$

Resources