Parsing flat files using String delimiter "|^|" in Unix - bash

I tried to parse a text file having delimiter as "|^|" using awk command. But awk command is not working as expected.
Below is the example:
Command-1:
echo "28851|^|178838|^||^|" | awk -F '|^|' '{ print $1}'
Output:
28851|^|178838|^||^|
Expected output:
28851.
Command-2:
echo "28851|^|178838|^||^|" | awk -F '|^|' '{ print $2}'
Output:
BLANK or NULL
Expected output:
178838
Please provide some inputs on how to parse the text file in Unix.

Awk treats |^| as a complex regex pattern. But since | and ^ are regex metacharacters - they should be escaped with \ (\\|\\^\\|) or put into a character classes [|][^][|].
echo "28851|^|178838|^||^|" | awk -F'\\|\\^\\|' '{ print $2 }'
178838

echo '28851|^|178838|^||^|' | awk -F'\\|\\^\\|' '{print $2}'
You must escape special characters.

Related

How to replace all occurrence of a symbol with awk

From the cmd (awk 'some expression') I got a result in the format
Key:(white_space)Value
Key:(white_space)Value
...
How to manipulate the result to be in the format:
Key=Value
I need this because I want to put the information into .properties file format which is key=value
In other words I need to replace : with = and remove the whitespace.
Is there a command in awk that can achieve this ?
You ask for awk, while sed provides just as easy a solution. However, awk makes it trivial with sub as well:
awk '{ sub(/:[ \t]*/,"=") }1'
Example
$ echo "Key: Value" | awk '{ sub(/:[ \t]*/,"=") }1'
Key=Value
Another awk approach.
awk -F'[: ]' '{print $1 "=" $NF}' file.txt

Shell script to match a string and print the next string on aix machine

I have a following line as input.
Parsing events:hostname='tom';Ipaddress='10.10.10.1';situation_name='sgd_abc_app_a';type='General';
Like this there are many fields in a line separated by a delimiter as semi-colon. (But starting with Parsing Events:)
I want to extract onlysgd_abc_app_a when it matches situation_name.
Thanks
Kulli
Try
sed -n 's/^.*situation_name=//p' input_file| awk -F "'" '{print $2}'
For your request, it would work no matter the position of situation_name
$ awk '/situation_name/{match($0,/situation_name=[^;]+/); print substr($0,RSTART+16,RLENGTH-17)}' file
sgd_abc_app_a
awk solution:
s="Parsing events: hostname='tom';Ipaddress='10.10.10.1';situation_name='sgd_abc_app_a';type='General';"
awk -F'[=;]' '{ gsub("\047","",$6); print $6 }' <<< $s
Or with sed:
sed -n "s/^Parsing events:.*situation_name='\([^']*\).*/\1/p" <<< $s
The output:
sgd_abc_app_a

awk print something if column is empty

I am trying out one script in which a file [ file.txt ] has so many columns like
abc|pqr|lmn|123
pqr|xzy|321|azy
lee|cha| |325
xyz| |abc|123
I would like to get the column list in bash script using awk command if column is empty it should print blank else print the column value
I have tried the below possibilities but it is not working
cat file.txt | awk -F "|" {'print $2'} | sed -e 's/^$/blank/' // Using awk and sed
cat file.txt | awk -F "|" '!$2 {print "blank"} '
cat file.txt | awk -F "|" '{if ($2 =="" ) print "blank" } '
please let me know how can we do that using awk or any other bash tools.
Thanks
I think what you're looking for is
awk -F '|' '{print match($2, /[^ ]/) ? $2 : "blank"}' file.txt
match(str, regex) returns the position in str of the first match of regex, or 0 if there is no match. So in this case, it will return a non-zero value if there is some non-blank character in field 2. Note that in awk, the index of the first character in a string is 1, not 0.
Here, I'm assuming that you're interested only in a single column.
If you wanted to be able to specify the replacement string from a bash variable, the best solution would be to pass the bash variable into the awk program using the -v switch:
awk -F '|' -v blank="$replacement" \
'{print match($2, /[^ ]/) ? $2 : blank}' file.txt
This mechanism avoids problems with escaping metacharacters.
You can do it using this sed script:
sed -r 's/\| +\|/\|blank\|/g' File
abc|pqr|lmn|123
pqr|xzy|321|azy
lee|cha|blank|325
xyz|blank|abc|123
If you don't want the |:
sed -r 's/\| +\|/\|blank\|/g; s/\|/ /g' File
abc pqr lmn 123
pqr xzy 321 azy
lee cha blank 325
xyz blank abc 123
Else with awk:
awk '{gsub(/\| +\|/,"|blank|")}1' File
abc|pqr|lmn|123
pqr|xzy|321|azy
lee|cha|blank|325
xyz|blank|abc|123
You can use awk like this:
awk 'BEGIN{FS=OFS="|"} {for (i=1; i<=NF; i++) if ($i ~ /^ *$/) $i="blank"} 1' file
abc|pqr|lmn|123
pqr|xzy|321|azy
lee|cha|blank|325
xyz|blank|abc|123

awk - split only by first occurrence

I have a line like:
one:two:three:four:five:six seven:eight
and I want to use awk to get $1 to be one and $2 to be two:three:four:five:six seven:eight
I know I can get it by doing sed before. That is to change the first occurrence of : with sed then awk it using the new delimiter.
However replacing the delimiter with a new one would not help me since I can not guarantee that the new delimiter will not already be somewhere in the text.
I want to know if there is an option to get awk to behave this way
So something like:
awk -F: '{print $1,$2}'
will print:
one two:three:four:five:six seven:eight
I will also want to do some manipulations on $1 and $2 so I don't want just to substitute the first occurrence of :.
Without any substitutions
echo "one:two:three:four:five" | awk -F: '{ st = index($0,":");print $1 " " substr($0,st+1)}'
The index command finds the first occurance of the ":" in the whole string, so in this case the variable st would be set to 4. I then use substr function to grab all the rest of the string from starting from position st+1, if no end number supplied it'll go to the end of the string. The output being
one two:three:four:five
If you want to do further processing you could always set the string to a variable for further processing.
rem = substr($0,st+1)
Note this was tested on Solaris AWK but I can't see any reason why this shouldn't work on other flavours.
Some like this?
echo "one:two:three:four:five:six" | awk '{sub(/:/," ")}1'
one two:three:four:five:six
This replaces the first : to space.
You can then later get it into $1, $2
echo "one:two:three:four:five:six" | awk '{sub(/:/," ")}1' | awk '{print $1,$2}'
one two:three:four:five:six
Or in same awk, so even with substitution, you get $1 and $2 the way you like
echo "one:two:three:four:five:six" | awk '{sub(/:/," ");$1=$1;print $1,$2}'
one two:three:four:five:six
EDIT:
Using a different separator you can get first one as filed $1 and rest in $2 like this:
echo "one:two:three:four:five:six seven:eight" | awk -F\| '{sub(/:/,"|");$1=$1;print "$1="$1 "\n$2="$2}'
$1=one
$2=two:three:four:five:six seven:eight
Unique separator
echo "one:two:three:four:five:six seven:eight" | awk -F"#;#." '{sub(/:/,"#;#.");$1=$1;print "$1="$1 "\n$2="$2}'
$1=one
$2=two:three:four:five:six seven:eight
The closest you can get with is with GNU awk's FPAT:
$ awk '{print $1}' FPAT='(^[^:]+)|(:.*)' file
one
$ awk '{print $2}' FPAT='(^[^:]+)|(:.*)' file
:two:three:four:five:six seven:eight
But $2 will include the leading delimiter but you could use substr to fix that:
$ awk '{print substr($2,2)}' FPAT='(^[^:]+)|(:.*)' file
two:three:four:five:six seven:eight
So putting it all together:
$ awk '{print $1, substr($2,2)}' FPAT='(^[^:]+)|(:.*)' file
one two:three:four:five:six seven:eight
Storing the results of the substr back in $2 will allow further processing on $2 without the leading delimiter:
$ awk '{$2=substr($2,2); print $1,$2}' FPAT='(^[^:]+)|(:.*)' file
one two:three:four:five:six seven:eight
A solution that should work with mawk 1.3.3:
awk '{n=index($0,":");s=$0;$1=substr(s,1,n-1);$2=substr(s,n+1);print $1}' FS='\0'
one
awk '{n=index($0,":");s=$0;$1=substr(s,1,n-1);$2=substr(s,n+1);print $2}' FS='\0'
two:three:four five:six:seven
awk '{n=index($0,":");s=$0;$1=substr(s,1,n-1);$2=substr(s,n+1);print $1,$2}' FS='\0'
one two:three:four five:six:seven
Just throwing this on here as a solution I came up with where I wanted to split the first two columns on : but keep the rest of the line intact.
Comments inline.
echo "a:b:c:d::e" | \
awk '{
split($0,f,":"); # split $0 into array of fields `f`
sub(/^([^:]+:){2}/,"",$0); # remove first two "fields" from `$0`
print f[1],f[2],$0 # print first two elements of `f` and edited `$0`
}'
Returns:
a b c:d::e
In my input I didn't have to worry about the first two fields containing escaped :, if that was a requirement, this solution wouldn't work as expected.
Amended to match the original requirements:
echo "a:b:c:d::e" | \
awk '{
split($0,f,":");
sub(/^([^:]+:)/,"",$0);
print f[1],$0
}'
Returns:
a b:c:d::e

SSH call inside ruby, using %x

I am trying to make a single line ssh call from a ruby script. My script takes a hostname, and then sets out to return the hostname's machine info.
return_value = %x{ ssh #{hostname} "#{number_of_users}; #{number_of_processes};
#{number_of_processes_running}; #{number_of_processes_sleeping}; "}
Where the variables are formatted like this.
number_of_users = %Q(users | wc -w | cat | awk '{print "Number of Users: "\$1}')
number_of_processes = %Q(ps -el | awk '{print $2}' | wc -l | awk '{print "Number of Processes: "$1}')
I have tried both %q, %Q, and just plain "" and I cannot get the awk to print anything before the output. I either get this error (if I include the colon)
awk: line 1: syntax error at or near :
or if I don't include the slash in front of $1 I just get empty output for that line. Is there any solution for this? I thought it might be because I was using %q, but it even happens with just double quotes.
Use backticks to capture the output of the command and return the output as a string:
number_of_users = `users | wc -w | cat | awk '{print "Number of Users:", $1}'`
puts number_of_users
Results on my system:
48
But you can improve your pipeline:
users | awk '{ print "Number of Users:", NF }'
ps -e | awk 'END { print "Number of Processes:", NR }'
So the solution to this problem is:
%q(users | wc -w | awk '{print \"Number of Users: \"\$1}')
Where you have to use %q, not %, not %Q, and not ""
You must backslash double quotes and the dollar sign in front of any awk variables
If somebody could improve upon this answer by explaining why, that would be most appreciated
Though as Steve pointed out I could have improved my code using users | awk '{ print \"Number of Users:\", NF }'
In which case there is no need to backslash the NF.

Resources