How to do text processing using awk to cut last field in a line? - bash

I have the following scenario and would like to know whether I can improve my awk command.
cat example.txt
"id": "/subscriptions/fbfa3437-c63c-4ed7-b9d3-fe595221950d/resourceGroups/rg-ooty/providers/Microsoft.Compute/virtualMachines/fb11b768-4d9f-4e83-b7dc-ee677f496fc9",
"id": "/subscriptions/fbfa3437-c63c-4ed7-b9d3-fe595221950d/resourceGroups/rg-ooty/providers/Microsoft.Compute/virtualMachines/fbee83e8-a84a-4b22-8197-fc9cc924801f",
"id": "/subscriptions/fbfa3437-c63c-4ed7-b9d3-fe595221950d/resourceGroups/rg-ooty/providers/Microsoft.Compute/virtualMachines/fc224f83-57f4-41eb-aee3-78f18d055704",
I am looking to cut the pattern after /virtualMachines/
Hence, used the below awk command to get the output.
cat example.txt | awk '{print $2}' | awk -F"/" '{print $(NF)}' | awk -F'",' '{print $1}'
fb11b768-4d9f-4e83-b7dc-ee677f496fc9
fbee83e8-a84a-4b22-8197-fc9cc924801f
fc224f83-57f4-41eb-aee3-78f18d055704
Is there a way to do this in a single awk invocation (for example with 'getline' or multiple awk options), or a better way to get the same output?
Please suggest.

Use " and / as field separators and print the second-to-last field:
awk -F '["/]' '{print $(NF-1)}' file
Output:
fb11b768-4d9f-4e83-b7dc-ee677f496fc9
fbee83e8-a84a-4b22-8197-fc9cc924801f
fc224f83-57f4-41eb-aee3-78f18d055704

If the column layout of example.txt is as consistent as it seems, it is simpler to use cut with the -c (character positions) option:
cut -c 127-162 example.txt
Output:
fb11b768-4d9f-4e83-b7dc-ee677f496fc9
fbee83e8-a84a-4b22-8197-fc9cc924801f
fc224f83-57f4-41eb-aee3-78f18d055704
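Rather than counting columns by hand, the start column can be computed. This is a minimal sketch, assuming every line contains the literal /virtualMachines/ and that the id is always 36 characters long, as in the sample; marker, start and end are illustrative names.

```shell
# One sample line copied from example.txt in the question.
line='"id": "/subscriptions/fbfa3437-c63c-4ed7-b9d3-fe595221950d/resourceGroups/rg-ooty/providers/Microsoft.Compute/virtualMachines/fb11b768-4d9f-4e83-b7dc-ee677f496fc9",'

# index() gives the 1-based position of the marker; adding its length
# gives the column where the id starts.
start=$(printf '%s\n' "$line" |
    awk -v marker='/virtualMachines/' '{ print index($0, marker) + length(marker) }')
end=$((start + 35))   # assumption: the id is 36 characters long

vm_id=$(printf '%s\n' "$line" | cut -c "${start}-${end}")
printf '%s-%s: %s\n' "$start" "$end" "$vm_id"
```

For this line the computed range agrees with the 127-162 used above. Any leading whitespace in the real file would shift both numbers.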

You could also use sed for this:
sed 's#.*/\([^/]*\)",#\1#' example.txt
This matches anything (.*) up to the last forward slash /, then captures, between \( and \), any number of non-slash characters [^/]*, followed by the closing quote and comma ",. The whole match is replaced with the captured group \1, i.e. everything between the last forward slash and the ", at the end.

Related

Linux get data from each line of file

I have a file with many (~2k) lines similar to:
117 VALID|AUTHEN tcp:10.92.163.5:64127 uniqueID=nwCelerra
....
991 VALID|AUTHEN tcp:10.19.16.21:58332 uniqueID=smUNIX
I want only the IP address (10.19.16.21 shown above) and the value of the uniqueID (smUNIX shown above)
I am able to get close with:
cat t.txt|cut -f2- -d':'
10.22.36.69:46474 uniqueID=smwUNIX
...
I am on Linux using bash.
Using awk:
awk '{split($3,a,":"); split($4,b,"="); print a[2] " " b[2]}'
By default awk splits on whitespace; with split() you can break a field into subfields.
Update:
It is even easier to override the default delimiter:
awk -F '[:=]' '{print $2 " "$4}'
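A quick way to check the field numbering is to feed the two sample lines from the question straight into the command. This is only a sketch: it assumes no other : or = appears in a line, so an IPv6 address or a uniqueID value containing = would shift the fields.

```shell
# The two sample lines from the question.
sample='117 VALID|AUTHEN tcp:10.92.163.5:64127 uniqueID=nwCelerra
991 VALID|AUTHEN tcp:10.19.16.21:58332 uniqueID=smUNIX'

# Splitting on : and = gives:
#   $1 = "117 VALID|AUTHEN tcp", $2 = IP, $3 = "64127 uniqueID", $4 = id value
result=$(printf '%s\n' "$sample" | awk -F '[:=]' '{ print $2, $4 }')
printf '%s\n' "$result"
```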
Using grep and sed:
grep -oP "^\d+ [A-Z]+\|[A-Z]+ \w+:\K(.*)" | sed "s/ uniqueID=/ /g"
Output (note that the port number is kept):
10.92.163.5:64127 nwCelerra
10.19.16.21:58332 smUNIX

Count number of Special Character in Unix Shell

I have a delimited file that is separated by octal \036 or Hexadecimal value 1e.
I need to count the number of delimiters on each line using a bash shell script.
I was trying to use awk, not sure if this is the best way.
Sample Input (| is a representation of \036)
Example|Running|123|
Expected output:
3
awk -F'|' '{print NF-1}' file
Change | to whatever separator you like. If your file can have empty lines then you need to tweak it to:
awk -F'|' '{print (NF ? NF-1 : 0)}' file
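The difference between the two commands only shows up on empty lines, where NF is 0. The sketch below runs both on a made-up three-line input (a delimited line, an empty line, and a line without delimiters) so the results can be compared.

```shell
# Made-up input: line 1 has three delimiters, line 2 is empty,
# line 3 has no delimiter at all.
input=$(printf 'Example|Running|123|\n\nno delimiter here')

# Without the guard an empty line yields NF-1 = -1.
naive=$(printf '%s\n' "$input" | awk -F'|' '{ print NF-1 }')

# With the guard an empty line is reported as 0.
guarded=$(printf '%s\n' "$input" | awk -F'|' '{ print (NF ? NF-1 : 0) }')

printf 'naive:\n%s\nguarded:\n%s\n' "$naive" "$guarded"
```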
You can try
awk '{print gsub(/\|/,"")}'
Simply try
awk -F"|" '{print substr($3,length($3))}' OFS="|" Input_file
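Explanation: -F"|" sets the field separator to |, and substr($3,length($3)) prints the last character of the third field, which is 3 for the sample line. Note that this does not actually count the delimiters, so it gives the expected output only for this particular input. The OFS="|" assignment has no effect here because only a single value is printed.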
This will work as far as I know
echo "Example|Running|123|" | tr -cd '|' | wc -c
Output
3
This should work for you:
awk -F '\036' '{print NF-1}' file
3
-F '\036' sets input field delimiter as octal value 036
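To try this without a real data file, printf can generate the \036 byte directly. This is a small sketch using the sample line from the question with the real separator in place of |; the variable name count is illustrative.

```shell
# printf turns the octal escape \036 into the actual separator byte.
count=$(printf 'Example\036Running\036123\036\n' | awk -F '\036' '{ print NF-1 }')
printf '%s\n' "$count"
```

As with the | version, an empty input line would produce -1 unless the NF check shown earlier is added.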
Awk may not be the best tool for this. GNU grep has a cool -o option that prints each match on a separate line. You can then count how many matching lines are generated for each input line, and that's the count of your delimiters. E.g. (where ^^ in the file is actually hex 1e)
$ cat -v i
a^^b^^c
d^^e^^f^^g
$ grep -n -o $'\x1e' i | uniq -c
2 1:
3 2:
If you remove the uniq -c you can see how it works: you'll get "1" printed twice because there are two matches on the first line. Or try it with some regular ASCII characters and it becomes clearer what the -o and -n options are doing.
If you want to print the line number followed by the field count for that line, I'd do something like:
$ grep -n -o $'\x1e' i | tr -d ':' | uniq -c | awk '{print $2 " " $1}'
1 2
2 3
This assumes that every line in the file contains at least one delimiter. If that's not the case, here's another approach that's probably faster too:
$ tr -d -c $'\x1e\n' < i | awk '{print length}'
2
3
0
0
0
This uses tr to delete (-d) all characters that are not (-c) 1e or \n. It then pipes that stream of data to awk which just counts how many characters are left on each line. If you want the line number, add " | cat -n" to the end.
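The tr approach can be reproduced on generated input. The sketch below uses a made-up three-line input in which the second line contains no delimiter, and uses awk's NR instead of cat -n for the line numbers; the variable names are illustrative.

```shell
# Made-up input: 2 delimiters, none, then 3.
make_input() {
    printf 'a\036b\036c\nplain line\nd\036e\036f\036g\n'
}

# Keep only the delimiter byte and newlines, then measure each line.
counts=$(make_input | tr -d -c '\036\n' | awk '{ print length($0) }')

# Same idea, with the line number in front of each count.
numbered=$(make_input | tr -d -c '\036\n' | awk '{ print NR, length($0) }')

printf '%s\n---\n%s\n' "$counts" "$numbered"
```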

Grep only 2 portions in a line

I have the following line. I can grep one part but struggling with also grepping the second portion.
Line:
html:<TR><TD>PICK_1</TD><TD>36.0000</TD><TD>1000000</TD><TD>26965</TD><TD>100000000</TD><TD>97074000</TD><TD>2926000</TD><TD>2.926%</TD><TD>97.074%</TD></TR>
I want to have the following results after grepping this line.
PICK_1 97.074%
Currently just grepping first portion via following command.
grep -Po "<TR><TD>[A-Z0-9_]+" test.txt
Appreciate any help on how I can go about doing this. Thanks.
Use awk with a custom field separator:
awk -F'[<>TDR/]+' '{ print $2, $(NF-1) }' file
This splits the line on things that look like one or more opening or closing <TD> or <TR> tags, and prints the second and second-last field.
Warning: this will break on almost every input except the one that you've shown, since awk, grep and friends are designed for processing text, not HTML.
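The warning is easy to demonstrate. In the sketch below the first row is the line from the question; the second row is a made-up variant whose name contains the letter T, which the character class [<>TDR/] also treats as a separator.

```shell
# Row from the question.
row='html:<TR><TD>PICK_1</TD><TD>36.0000</TD><TD>1000000</TD><TD>26965</TD><TD>100000000</TD><TD>97074000</TD><TD>2926000</TD><TD>2.926%</TD><TD>97.074%</TD></TR>'
picked=$(printf '%s\n' "$row" | awk -F'[<>TDR/]+' '{ print $2, $(NF-1) }')

# Hypothetical row: the T characters in TOTAL_1 are eaten as separators.
bad_row='html:<TR><TD>TOTAL_1</TD><TD>97.074%</TD></TR>'
mangled=$(printf '%s\n' "$bad_row" | awk -F'[<>TDR/]+' '{ print $2, $(NF-1) }')

printf '%s\n%s\n' "$picked" "$mangled"
```

The second result loses most of the name, which is why a real HTML parser is the safer choice for anything beyond this one line.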
If you always have the same number of fields delimited by "TD" tags, you can try with this (dirty) awk:
awk -F'[<TD>|</TD>]' '{print $8 " " $80}'
Or this combination of column and awk:
column -t -s "</TD>" | awk -F' ' '{print $3 " " $11}'
Or with sed instead of column:
sed -e 's/<TD>/ /g' | awk -F' ' '{print $3 " " $11}'
Try providing each pattern after its own "-e" option:
grep -e PICK_1 -e "<TR><TD>[A-Z0-9_]+" test.txt
awk -F'[<>]' '{print $5,$(NF-4)}' file
PICK_1 97.074%

How to truncate trailing space in xargs

I would like to use xargs to list the contents of some files based on the output of command A. The xargs replace-str seems to add a space to the end, causing the command to fail. Any suggestions? I know this can be worked around with a for loop, but I am curious how to do this using xargs.
lsscsi |awk -F\/ '/ATA/ {print $NF}' | xargs -L 1 -I % cat /sys/block/%/queue/scheduler
cat: /sys/block/sda /queue/scheduler: No such file or directory
The problem is not with xargs -I: it does not append a space to each argument, as you can verify as follows:
$ echo 'sda' | xargs -I % echo '[%]'
[sda]
Incidentally, specifying -L 1 in addition to -I is pointless: -I implies line-by-line processing.
Therefore, it must be the output from the command that provides input to xargs that contains the trailing space.
You can adapt your awk command to fix that:
lsscsi |
awk -F/ '/ATA/ {sub(/ $/,"", $NF); print $NF}' |
xargs -I % cat '/sys/block/%/queue/scheduler'
sub(/ $/,"", $NF) replaces a trailing space in field $NF with the empty string, thereby effectively removing it.
Note how I've (single-)quoted cat's argument so as to make it work even with filenames with spaces.
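The effect of the sub() call can be checked without lsscsi or /sys. The sketch below uses a made-up line shaped like lsscsi output (an assumption, not real output) with a trailing space after the device name, and prints the last field in brackets before and after trimming. It uses [ \t]+ rather than a single space so that several trailing blanks are removed as well.

```shell
# Hypothetical lsscsi-style line; note the space after /dev/sda.
fake_lsscsi='[0:0:0:0]    disk    ATA      EXAMPLE DISK     0001  /dev/sda '

# Untrimmed: the trailing space is part of the last /-separated field.
before=$(printf '%s\n' "$fake_lsscsi" | awk -F/ '/ATA/ { print "[" $NF "]" }')

# Trimmed: sub() strips trailing blanks from the last field only.
after=$(printf '%s\n' "$fake_lsscsi" |
    awk -F/ '/ATA/ { sub(/[ \t]+$/, "", $NF); print "[" $NF "]" }')

printf '%s\n%s\n' "$before" "$after"
```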
lsscsi |awk -F\/ '/ATA/ {print $NF}'| awk '{print $NF}' | xargs -L 1 -I % cat /sys/block/%/queue/scheduler
The first awk statement splits on "/", so everything after the last "/" is a single field; in this case "sda " is the whole field, including the trailing space. With the default field separator, however, awk splits on whitespace and ignores leading and trailing blanks. So after the pipe, the second awk prints $NF (the last word of the line) without the trailing space. awk '{print $1}' would do the same here, because the line has only one word, "sda", which is both first and last.

awk - split only by first occurrence

I have a line like:
one:two:three:four:five:six seven:eight
and I want to use awk to get $1 to be one and $2 to be two:three:four:five:six seven:eight
I know I can get it by doing sed before. That is to change the first occurrence of : with sed then awk it using the new delimiter.
However replacing the delimiter with a new one would not help me since I can not guarantee that the new delimiter will not already be somewhere in the text.
I want to know if there is an option to get awk to behave this way
So something like:
awk -F: '{print $1,$2}'
will print:
one two:three:four:five:six seven:eight
I will also want to do some manipulations on $1 and $2 so I don't want just to substitute the first occurrence of :.
Without any substitutions
echo "one:two:three:four:five" | awk -F: '{ st = index($0,":");print $1 " " substr($0,st+1)}'
The index function finds the first occurrence of ":" in the whole string, so in this case the variable st is set to 4. I then use the substr function to grab the rest of the string starting from position st+1; if no length is supplied, it goes to the end of the string. The output is:
one two:three:four:five
If you want to do further processing, you can store the remainder in a variable:
rem = substr($0,st+1)
Note this was tested on Solaris AWK but I can't see any reason why this shouldn't work on other flavours.
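Putting the index/substr idea together with the requirement to manipulate both parts, a sketch could look like the following. The variable names head and rest, the toupper() call and the " | " separator are only illustrations of "further processing"; the st == 0 branch is an added assumption about how a line without any ":" should be handled.

```shell
line='one:two:three:four:five:six seven:eight'

result=$(printf '%s\n' "$line" | awk '{
    st = index($0, ":")                 # position of the first ":" (0 if none)
    if (st == 0) {                      # no delimiter: whole line is the head
        head = $0; rest = ""
    } else {
        head = substr($0, 1, st - 1)    # text before the first ":"
        rest = substr($0, st + 1)       # everything after it, untouched
    }
    print toupper(head) " | " rest      # example manipulation of both parts
}')

printf '%s\n' "$result"
```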
Something like this?
echo "one:two:three:four:five:six" | awk '{sub(/:/," ")}1'
one two:three:four:five:six
This replaces the first : with a space.
You can then get the two parts into $1 and $2:
echo "one:two:three:four:five:six" | awk '{sub(/:/," ")}1' | awk '{print $1,$2}'
one two:three:four:five:six
Or in same awk, so even with substitution, you get $1 and $2 the way you like
echo "one:two:three:four:five:six" | awk '{sub(/:/," ");$1=$1;print $1,$2}'
one two:three:four:five:six
EDIT:
Using a different separator you can get the first part as field $1 and the rest in $2 like this:
echo "one:two:three:four:five:six seven:eight" | awk -F\| '{sub(/:/,"|");$1=$1;print "$1="$1 "\n$2="$2}'
$1=one
$2=two:three:four:five:six seven:eight
Unique separator
echo "one:two:three:four:five:six seven:eight" | awk -F"#;#." '{sub(/:/,"#;#.");$1=$1;print "$1="$1 "\n$2="$2}'
$1=one
$2=two:three:four:five:six seven:eight
The closest you can get is with GNU awk's FPAT:
$ awk '{print $1}' FPAT='(^[^:]+)|(:.*)' file
one
$ awk '{print $2}' FPAT='(^[^:]+)|(:.*)' file
:two:three:four:five:six seven:eight
Here $2 includes the leading delimiter, but you can use substr to fix that:
$ awk '{print substr($2,2)}' FPAT='(^[^:]+)|(:.*)' file
two:three:four:five:six seven:eight
So putting it all together:
$ awk '{print $1, substr($2,2)}' FPAT='(^[^:]+)|(:.*)' file
one two:three:four:five:six seven:eight
Storing the results of the substr back in $2 will allow further processing on $2 without the leading delimiter:
$ awk '{$2=substr($2,2); print $1,$2}' FPAT='(^[^:]+)|(:.*)' file
one two:three:four:five:six seven:eight
A solution that should work with mawk 1.3.3:
awk '{n=index($0,":");s=$0;$1=substr(s,1,n-1);$2=substr(s,n+1);print $1}' FS='\0'
one
awk '{n=index($0,":");s=$0;$1=substr(s,1,n-1);$2=substr(s,n+1);print $2}' FS='\0'
two:three:four five:six:seven
awk '{n=index($0,":");s=$0;$1=substr(s,1,n-1);$2=substr(s,n+1);print $1,$2}' FS='\0'
one two:three:four five:six:seven
Just throwing this on here as a solution I came up with where I wanted to split the first two columns on : but keep the rest of the line intact.
Comments inline.
echo "a:b:c:d::e" | \
awk '{
split($0,f,":"); # split $0 into array of fields `f`
sub(/^([^:]+:){2}/,"",$0); # remove first two "fields" from `$0`
print f[1],f[2],$0 # print first two elements of `f` and edited `$0`
}'
Returns:
a b c:d::e
In my input I didn't have to worry about the first two fields containing an escaped ":". If that were a requirement, this solution wouldn't work as expected.
Amended to match the original requirements:
echo "a:b:c:d::e" | \
awk '{
split($0,f,":");
sub(/^([^:]+:)/,"",$0);
print f[1],$0
}'
Returns:
a b:c:d::e
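The same idea can be generalised to any number of leading fields by peeling them off with index and substr in a loop, which avoids the {2} interval expression that some older awk versions do not support. This is a sketch only: split_head is a hypothetical helper name, and unlike the [^:]+ pattern above it also accepts empty leading fields.

```shell
# Hypothetical helper: split off the first N ":"-separated fields
# and leave the rest of the line intact.
split_head() {
    awk -v n="$1" '{
        rest = $0
        out = ""
        for (i = 1; i <= n; i++) {
            p = index(rest, ":")        # next delimiter in what is left
            if (p == 0) break           # fewer than n delimiters: stop early
            out = out substr(rest, 1, p - 1) " "
            rest = substr(rest, p + 1)
        }
        print out rest
    }'
}

two=$(echo "a:b:c:d::e" | split_head 2)
one=$(echo "a:b:c:d::e" | split_head 1)
printf '%s\n%s\n' "$two" "$one"
```

With 2 and 1 as arguments this reproduces the two outputs shown above.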
