Linux get data from each line of file - bash

I have a file with many (~2k) lines similar to:
117 VALID|AUTHEN tcp:10.92.163.5:64127 uniqueID=nwCelerra
....
991 VALID|AUTHEN tcp:10.19.16.21:58332 uniqueID=smUNIX
I want only the IP address (e.g. 10.19.16.21 above) and the value of uniqueID (e.g. smUNIX above).
I am able to get close with:
cat t.txt|cut -f2- -d':'
10.22.36.69:46474 uniqueID=smwUNIX
...
I am on Linux using bash.

Using awk:
awk '{split($3,a,":"); split($4,b,"="); print a[2] " " b[2]}'
By default awk splits on whitespace; with some extra code you can split the subfields.
Update:
It's even easier if you override the default delimiter:
awk -F '[:=]' '{print $2 " "$4}'
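To try both awk variants without a file, here is a small self-contained sketch that feeds them the two sample lines from the question; split_subfields and split_on_delims are hypothetical function names used only for illustration.

```shell
#!/usr/bin/env bash
# Sketch: the two awk variants above, wrapped in hypothetical functions
# that read the log lines on stdin.

# Variant 1: default whitespace fields, then split $3 on ":" and $4 on "=".
split_subfields() {
    awk '{ split($3, a, ":"); split($4, b, "="); print a[2] " " b[2] }'
}

# Variant 2: use ":" and "=" themselves as the field separators.
split_on_delims() {
    awk -F '[:=]' '{ print $2 " " $4 }'
}

sample='117 VALID|AUTHEN tcp:10.92.163.5:64127 uniqueID=nwCelerra
991 VALID|AUTHEN tcp:10.19.16.21:58332 uniqueID=smUNIX'

printf '%s\n' "$sample" | split_subfields
printf '%s\n' "$sample" | split_on_delims
```

Both calls should print the IP and the uniqueID value for each line. The -F '[:=]' version assumes the uniqueID value itself contains no : or =.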

Using grep and sed (note that this keeps the port number after the IP):
grep -oP "^\d+ [A-Z]+\|[A-Z]+ \w+:\K(.*)" | sed "s/ uniqueID=/ /g"
outputs:
10.92.163.5:64127 nwCelerra
10.19.16.21:58332 smUNIX
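If you want only the IP without the port, as the question asks, one option is a single sed -E substitution. This is a sketch that assumes every line has the tcp:<ip>:<port> uniqueID=<value> shape shown in the question; ip_and_uid is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: keep only the IP (port dropped) and the uniqueID value.
# Assumes lines shaped like "<n> <flags> tcp:<ip>:<port> uniqueID=<value>";
# with -n and the p flag, lines that do not match are not printed.
ip_and_uid() {
    sed -E -n 's/^.* tcp:([0-9.]+):[0-9]+ uniqueID=([^ ]+).*$/\1 \2/p'
}

printf '%s\n' \
    '117 VALID|AUTHEN tcp:10.92.163.5:64127 uniqueID=nwCelerra' \
    '991 VALID|AUTHEN tcp:10.19.16.21:58332 uniqueID=smUNIX' | ip_and_uid
```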

Related

Getting last X fields from a specific line in a CSV file using bash

I'm trying to get the list of users in my CSV file into a bash variable. The problem is that the number of users varies and can be anywhere from 1 to 5.
Example CSV file:
"record1_data1","record1_data2","record1_data3","user1","user2"
"record2_data1","record2_data2","record2_data3","user1","user2","user3","user4"
"record3_data1","record3_data2","record3_data3","user1"
I would like to get something like
list_of_users=$(grep "record2_data2" file.csv | <something>)
echo "$list_of_users"
user1,user2,user3,user4
I'm trying this:
cat file.csv | grep "record2_data2" | awk -F, -v OFS=',' '{print $4,$5,$6,$7,$8 }' | sed 's/"//g'
My result is:
user2,user3,user4,,
Question:
How do I remove all the commas from the end of my result? Sometimes there is just one, but sometimes it can be user1,,,,
Is there a better way to do it? The users always start after the 3rd column in my file.
This will do what your code seems to be trying to do (print the users for a given string record2_data2 which only exists in the 2nd field):
$ awk -F',' '{gsub(/"/,"")} $2=="record2_data2"{sub(/([^,]*,){3}/,""); print}' file.csv
user1,user2,user3,user4
but I don't see how that's related to your question's subject, "Getting last X records from CSV file using bash", so I don't know whether it's what you really want.
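To get the result into a bash variable as the question wants, the same awk idea can take the key as a parameter. This is a sketch: users_for is a hypothetical helper, the sample data is written to a temporary file, and the ([^,]*,){3} interval is spelled out as three [^,]*, groups because some older awk implementations do not support interval expressions.

```shell
#!/usr/bin/env bash
# Sketch: the awk answer above with the key passed via -v and the result
# captured in a bash variable. users_for is a hypothetical helper name.

csv=$(mktemp)                      # temporary copy of the question's sample data
trap 'rm -f "$csv"' EXIT
cat > "$csv" <<'EOF'
"record1_data1","record1_data2","record1_data3","user1","user2"
"record2_data1","record2_data2","record2_data3","user1","user2","user3","user4"
"record3_data1","record3_data2","record3_data3","user1"
EOF

# users_for KEY FILE: print the 4th and later fields of every record whose
# 2nd field equals KEY, with the double quotes removed.
users_for() {
    awk -F',' -v key="$1" '
        { gsub(/"/, "") }                  # strip quotes; $0 is re-split into fields
        $2 == key {
            sub(/^[^,]*,[^,]*,[^,]*,/, "") # drop the first three fields
            print
        }
    ' "$2"
}

list_of_users=$(users_for record2_data2 "$csv")
echo "$list_of_users"
```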
Better to use a bash array, and join it into a CSV string when needed:
#!/usr/bin/env bash
readarray -t listofusers < <(cut -d, -f4- file.csv | tr -d '"' | tr ',' $'\n' | sort -u)
IFS=,
printf "%s\n" "${listofusers[*]}"
cut -d, -f4- file.csv | tr -d '"' | tr ',' $'\n' | sort -u is the important bit: it prints only the fourth and following fields of each line of the CSV file, removes the quotes, turns commas into newlines, and then sorts the resulting usernames, removing duplicates. That output is read into an array with the readarray builtin, and you can manipulate it and the individual elements however you need. Finally, IFS=, makes "${listofusers[*]}" expand to the elements joined by commas. Note that this collects the users from every record in the file, not from one particular record.
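If you only want the users of one record, a variant of the readarray approach can filter first and join inside a function with a local IFS, so the IFS change does not leak into the rest of the script. A sketch, assuming bash 4+ and that the key appears as a quoted field only in the wanted line; join_by_comma is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch (needs bash 4+ for readarray): read the users of ONE record into an
# array, then join them with commas without changing IFS globally.
# join_by_comma is a hypothetical helper name.

csv=$(mktemp)
trap 'rm -f "$csv"' EXIT
cat > "$csv" <<'EOF'
"record1_data1","record1_data2","record1_data3","user1","user2"
"record2_data1","record2_data2","record2_data3","user1","user2","user3","user4"
"record3_data1","record3_data2","record3_data3","user1"
EOF

# Select the record by its quoted 2nd-field value, keep fields 4 onward,
# drop the quotes, and put one user per line for readarray.
readarray -t users < <(
    grep -F '"record2_data2"' "$csv" | cut -d, -f4- | tr -d '"' | tr ',' '\n'
)

# IFS is local to the function, so "$*" joins with "," only in here.
join_by_comma() {
    local IFS=,
    printf '%s\n' "$*"
}

echo "${#users[@]} users"
join_by_comma "${users[@]}"
```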
A GNU sed solution. Let file.csv contain:
"record1_data1","record1_data2","record1_data3","user1","user2"
"record2_data1","record2_data2","record2_data3","user1","user2","user3","user4"
"record3_data1","record3_data2","record3_data3","user1"
then
sed -n -e 's/"//g' -e '/record2_data/ s/[^,]*,[^,]*,[^,]*,// p' file.csv
gives output
user1,user2,user3,user4
Explanation: -n turns off automatic printing. The first expression substitutes " with the empty string globally, i.e. deletes all double quotes; the second, for lines containing record2_data, substitutes (s) everything up to and including the 3rd , with the empty string, i.e. deletes it, and prints (p) the changed line.
(tested in GNU sed 4.2.2)
awk -F',' '
/record2_data2/{
for(i=4;i<=NF;i++) o=sprintf("%s%s,",o,$i);
gsub(/"|,$/,"",o);
print o
}' file.csv
user1,user2,user3,user4
This might work for you (GNU sed):
sed -E '/record2_data/!d;s/"([^"]*)"(,)?/\1\2/4g;s///g' file
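Delete every record except the one containing record2_data.
Remove the double quotes from the fourth field onward (with the 4g flag, GNU sed replaces the 4th and all later matches).
Remove the remaining double-quoted fields, i.e. the first three (the empty regex in s///g reuses the previous one).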
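For the literal question (removing however many commas end up at the end), a minimal sketch is to add one sed expression to the asker's own pipeline. users_from_line is a hypothetical wrapper; like the original it only prints fields 4 to 8, so it still assumes at most 5 users per record.

```shell
#!/usr/bin/env bash
# Sketch: the question's own awk|sed pipeline plus one extra sed expression.
# 's/,*$//' deletes any run of commas (zero or more) at the end of the line.
# users_from_line is a hypothetical name; it only prints fields 4..8.
users_from_line() {
    awk -F, -v OFS=',' '{ print $4, $5, $6, $7, $8 }' |
        sed -e 's/"//g' -e 's/,*$//'
}

printf '%s\n' '"record2_data1","record2_data2","record2_data3","user1","user2","user3","user4"' | users_from_line
printf '%s\n' '"record3_data1","record3_data2","record3_data3","user1"' | users_from_line
```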

How to do text processing using awk to cut last field in a line?

I have the following scenario and would like to know whether I can improve the awk command.
cat example.txt
"id": "/subscriptions/fbfa3437-c63c-4ed7-b9d3-fe595221950d/resourceGroups/rg-ooty/providers/Microsoft.Compute/virtualMachines/fb11b768-4d9f-4e83-b7dc-ee677f496fc9",
"id": "/subscriptions/fbfa3437-c63c-4ed7-b9d3-fe595221950d/resourceGroups/rg-ooty/providers/Microsoft.Compute/virtualMachines/fbee83e8-a84a-4b22-8197-fc9cc924801f",
"id": "/subscriptions/fbfa3437-c63c-4ed7-b9d3-fe595221950d/resourceGroups/rg-ooty/providers/Microsoft.Compute/virtualMachines/fc224f83-57f4-41eb-aee3-78f18d055704",
I am looking to cut the pattern after /virtualMachines/
Hence, used the below awk command to get the output.
cat example.txt | awk '{print $2}' | awk -F"/" '{print $(NF)}' | awk -F'",' '{print $1}'
fb11b768-4d9f-4e83-b7dc-ee677f496fc9
fbee83e8-a84a-4b22-8197-fc9cc924801f
fc224f83-57f4-41eb-aee3-78f18d055704
Is there a way to use options like getline, or to do it all in a single awk invocation, or some other better way to get this output?
Please suggest.
Use " and / as field separators and print the second-to-last field (the last field is the trailing comma):
awk -F '["/]' '{print $(NF-1)}' file
Output:
fb11b768-4d9f-4e83-b7dc-ee677f496fc9
fbee83e8-a84a-4b22-8197-fc9cc924801f
fc224f83-57f4-41eb-aee3-78f18d055704
If the spacing of example.txt is as consistent as it seems, it's simpler to use cut with the -c (character positions) option:
cut -c 127-162 example.txt
Output:
fb11b768-4d9f-4e83-b7dc-ee677f496fc9
fbee83e8-a84a-4b22-8197-fc9cc924801f
fc224f83-57f4-41eb-aee3-78f18d055704
You could also use sed for this:
sed 's#.*/\([^/]*\)",#\1#' example.txt
This matches anything (.*) followed by a forward slash (/), then captures (\( ... \)) any number of non-slash characters ([^/]*), followed by the closing quote and comma (",). The whole match is replaced with the captured group, i.e. whatever lies between the last forward slash and the trailing ",.
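Without awk or sed at all, bash parameter expansion can do the same trimming. A sketch, assuming every line ends with the VM id followed by ", as in example.txt; vm_ids is a hypothetical function name. A read loop like this is slower than a single awk on large files.

```shell
#!/usr/bin/env bash
# Sketch: pure-bash extraction with parameter expansion (no awk/sed/cut).
# Assumes every line ends with the VM id followed by '",' as in example.txt.
# vm_ids is a hypothetical helper name.
vm_ids() {
    local line id
    while IFS= read -r line; do
        id=${line##*/}     # remove the longest prefix ending in "/"
        id=${id%\",}       # remove the trailing '",'
        printf '%s\n' "$id"
    done
}

sample_line='"id": "/subscriptions/fbfa3437-c63c-4ed7-b9d3-fe595221950d/resourceGroups/rg-ooty/providers/Microsoft.Compute/virtualMachines/fb11b768-4d9f-4e83-b7dc-ee677f496fc9",'
printf '%s\n' "$sample_line" | vm_ids
```

With the real file you would run vm_ids < example.txt instead.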

Count number of Special Character in Unix Shell

I have a delimited file whose separator is octal \036 (hexadecimal 0x1e).
I need to count the number of delimiters on each line using a bash shell script.
I was trying to use awk, not sure if this is the best way.
Sample Input (| is a representation of \036)
Example|Running|123|
Expected output:
3
awk -F'|' '{print NF-1}' file
Change | to whatever separator you like. If your file can have empty lines then you need to tweak it to:
awk -F'|' '{print (NF ? NF-1 : 0)}' file
You can try
awk '{print gsub(/\|/,"")}'
Simply try
awk -F"|" '{print substr($3,length($3))}' OFS="|" Input_file
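Explanation: this sets the field separator (-F) to | and prints the last character of the 3rd field, substr($3,length($3)); setting OFS has no effect here because no record is rebuilt. For the sample line that character happens to be 3, but the command does not count delimiters, so for real data use one of the NF-1 approaches.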
This works as far as I know; note that it counts the delimiters in the whole input, not per line:
echo "Example|Running|123|" | tr -cd '|' | wc -c
Output
3
This should work for you:
awk -F '\036' '{print NF-1}' file
3
-F '\036' sets input field delimiter as octal value 036
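To check these against a real \036 separator (which is awkward to type), a small test file can be generated with printf. A sketch; count_seps is a hypothetical name, and it combines the NF-1 count with the empty-line guard shown earlier and prefixes each result with the line number.

```shell
#!/usr/bin/env bash
# Sketch: build a small file with real 0x1e (octal 036) separators and count
# them per line. count_seps is a hypothetical helper name.

f=$(mktemp)
trap 'rm -f "$f"' EXIT
# line 1: the question's sample, line 2: empty, line 3: one separator
printf 'Example\036Running\036123\036\n\na\036b\n' > "$f"

count_seps() {
    # NF-1 separators per line; the NF test prints 0 (not -1) for empty lines
    awk -F '\036' '{ print NR ": " (NF ? NF-1 : 0) }' "$@"
}

count_seps "$f"
```

This should print 1: 3, 2: 0 and 3: 1.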
Awk may not be the best tool for this. GNU grep has a handy -o option that prints each match on a separate line. You can then count how many lines are produced for each input line, and that's the count of your delimiters. E.g. (where ^^ in the file is actually hex 1e):
$ cat -v i
a^^b^^c
d^^e^^f^^g
$ grep -n -o $'\x1e' i | uniq -c
2 1:
3 2:
If you remove the uniq -c you can see how it works: you'll get "1" printed twice because there are two matches on the first line. Or try it with some regular ASCII characters, and it becomes clearer what the -o and -n options are doing.
If you want to print the line number followed by the delimiter count for that line, I'd do something like:
$ grep -n -o $'\x1e' i | tr -d ':' | uniq -c | awk '{print $2 " " $1}'
1 2
2 3
This assumes that every line in the file contains at least one delimiter. If that's not the case, here's another approach that's probably faster too:
$ tr -d -c $'\x1e\n' < i | awk '{print length}'
2
3
0
0
0
This uses tr to delete (-d) all characters that are not (-c) 1e or \n. It then pipes that stream of data to awk which just counts how many characters are left on each line. If you want the line number, add " | cat -n" to the end.
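The same per-line count can also be done in pure bash, which may be handy for a few lines but is slower than tr or awk on big files. A sketch: count_sep_per_line is a hypothetical name; ${line//[!$sep]/} deletes every character that is not the separator, and ${#only} is the length of what remains.

```shell
#!/usr/bin/env bash
# Sketch: per-line separator count in pure bash (no tr/awk).
# count_sep_per_line is a hypothetical helper name.
sep=$'\036'

count_sep_per_line() {
    local line only
    # "|| [ -n "$line" ]" also handles a last line without a trailing newline
    while IFS= read -r line || [ -n "$line" ]; do
        only=${line//[!$sep]/}   # keep only the separator characters
        printf '%s\n' "${#only}" # their number is the count for this line
    done
}

printf 'a\036b\036c\nd\036e\036f\036g\n\n' | count_sep_per_line
```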

Grep only 2 portions in a line

I have the following line. I can grep one part but struggling with also grepping the second portion.
Line:
html:<TR><TD>PICK_1</TD><TD>36.0000</TD><TD>1000000</TD><TD>26965</TD><TD>100000000</TD><TD>97074000</TD><TD>2926000</TD><TD>2.926%</TD><TD>97.074%</TD></TR>
I want to have the following results after grepping this line.
PICK_1 97.074%
Currently just grepping first portion via following command.
grep -Po "<TR><TD>[A-Z0-9_]+" test.txt
Appreciate any help on how I can go about doing this. Thanks.
Use awk with a custom field separator:
awk -F'[<>TDR/]+' '{ print $2, $(NF-1) }' file
This splits the line on runs of the characters <, >, T, D, R and /, i.e. on anything that looks like one or more opening or closing <TD> or <TR> tags, and prints the second and the second-to-last field.
Warning: this will break on almost every input except the one that you've shown, since awk, grep and friends are designed for processing text, not HTML.
If you always have the same number of fields delimited by "TD" tags, you can try with this (dirty) awk:
awk -F'[<TD>|</TD>]' '{print $8 " " $80}'
Or this combination of column and awk:
column -t -s "</TD>" | awk -F' ' '{print $3 " " $11}'
Or with sed instead of column (replacing each of the characters <, /, T, D and > with a space, as column -s does):
sed -e 's#[</TD>]# #g' | awk -F' ' '{print $3 " " $11}'
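Try providing each pattern after its own -e option (note that without -o this prints the whole matching line rather than only the matched portions):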
grep -e PICK_1 -e "<TR><TD>[A-Z0-9_]+" test.txt
awk -F'[<>]' '{print $5,$(NF-4)}' file
PICK_1 97.074%
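A single sed -E substitution can also pull out both portions at once. This is a sketch that assumes the row looks exactly like the sample (the first cell right after <TR><TD>, the wanted value in the last cell before </TD></TR>), so the same warning about not parsing HTML with text tools applies; first_and_last_cell is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: capture the first cell and the last cell of the row in one sed -E.
# "#" is used as the s delimiter so the "/" in </TD> needs no escaping; the
# greedy ".*" before the last <TD> makes the 2nd capture the final cell.
first_and_last_cell() {
    sed -E -n 's#.*<TR><TD>([^<]*)</TD>.*<TD>([^<]*)</TD></TR>.*#\1 \2#p'
}

row='html:<TR><TD>PICK_1</TD><TD>36.0000</TD><TD>1000000</TD><TD>26965</TD><TD>100000000</TD><TD>97074000</TD><TD>2926000</TD><TD>2.926%</TD><TD>97.074%</TD></TR>'
printf '%s\n' "$row" | first_and_last_cell
```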

how to extract string appears after one particular string in Shell

I am working on a script where I grep lines that contain -abc_1.
I need to extract the string that appears right after it, as follows:
option : -abc_1 <some_path>
I have used following code :
grep "abc_1" | awk -F " " '{print $4}'
This fails if more spaces are used between the strings, e.g.:
option : -abc_1 <some_path>
It would be helpful if I could extract the path regardless of the spacing.
Thanks.
This should do:
echo 'option : -abc_1 <some_path>' | awk '/abc_1/ {print $4}'
<some_path>
If you do not specify a field separator, awk uses runs of one or more blanks as the separator.
PS: you do not need both grep and awk.
With sed you can do the search and the filter in one step:
sed -n 's/^.*-abc_1 *\([^ ]*\).*$/\1/p'
The -n option suppresses printing, but the p command at the end still prints if a successful substitution was made.
perl -lne 'print $1 if /-abc_1\s+(.*)/' your_file
Or if you want to use awk:
awk '{for(i=1;i<=NF;i++)if($i=="-abc_1")print $(i+1)}' your_file
Try this grep-only way:
grep -Po '^option\s*:\s*-abc_1\s*\K.*' file
or, if the whitespace is fixed:
grep -Po '^option : -abc_1 \K.*' file
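For a bash-only version, [[ ... =~ ... ]] with [[:space:]]+ tolerates any number of blanks or tabs after -abc_1. A sketch: path_after_abc1 is a hypothetical name, and it stops at the first whitespace after the path, so it assumes the path contains no spaces.

```shell
#!/usr/bin/env bash
# Sketch: bash-only extraction with =~ and BASH_REMATCH.
# Keeping the regex in a variable avoids quoting problems with =~.
# path_after_abc1 is a hypothetical helper name.
re='-abc_1[[:space:]]+([^[:space:]]+)'

path_after_abc1() {
    local line
    while IFS= read -r line; do
        if [[ $line =~ $re ]]; then
            printf '%s\n' "${BASH_REMATCH[1]}"   # the first capture group
        fi
    done
}

printf '%s\n' 'option :    -abc_1     <some_path>' 'unrelated line' | path_after_abc1
```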
