cut out fields that match a regex from a delimited string - bash

Example file:
35=A|11=ABC|55=AAA|20=DEF
35=B|66=ABC|755=AAA|800=DEF|11=ZZ|55=YYY
35=C|66=ABC|11=CC|755=AAA|800=DEF|55=UUU
35=C|66=ABC|11=XX|755=AAA|800=DEF
I want the output to print like the following, with only the 11= and 55= fields printed (they are not at fixed positions):
11=ABC|55=AAA
11=ZZ|55=YYY
11=CC|55=UUU
Thanks.

sed might be easier here:
sed -nr '/(^|\|)11=[^|]*.*\|55=/s~^.*(11=[^|]*).*(\|55=[^|]*).*$~\1\2~p' file
11=ABC|55=AAA
11=ZZ|55=YYY
11=CC|55=UUU

Try this:
$ awk -F'|' '{f=0;for (i=1;i<=NF;i++)if ($i~/^(11|55)=/){printf "%s",(f?"|":"")$i;f=1};print""}' file
11=ABC|55=AAA
11=ZZ|55=YYY
11=CC|55=UUU
11=XX
To only show lines that have both an 11= field and a 55= field:
$ awk -F'|' '/(^|\|)11=/ && /\|55=/{f=0;for (i=1;i<=NF;i++)if ($i~/^(11|55)=/){printf "%s",(f?"|":"")$i;f=1};print""}' file
11=ABC|55=AAA
11=ZZ|55=YYY
11=CC|55=UUU

Related

how to discard the last field of the content of a file using the awk command

The list.txt file contains data like below:
Ram/45/simple
Gin/Run/657/No/Sand
Ram/Hol/Sin
Tan/Tin/Bun
but I require output like below:
Ram/45
Gin/Run/657/No
Ram/Hol
Tan/Tin
I tried the following command, but it prints only the last field:
cat list.txt |awk -F '/' '{print $(NF)}'
simple
Sand
Sin
Bun
With GNU awk, you could try the following (assigning to NF rebuilds the record with OFS, so NF-- drops the last field):
awk 'BEGIN{FS=OFS="/"} NF--' Input_file
Or with any awk, try the following:
awk 'BEGIN{FS=OFS="/"} match($0,/.*\//){print substr($0,RSTART,RLENGTH-1)}' Input_file
This simple awk should work:
awk '{sub(/\/[^/]*$/, "")} 1' file
Ram/45
Gin/Run/657/No
Ram/Hol
Tan/Tin
Or this even simpler sed should work:
sed 's~/[^/]*$~~' file
Ram/45
Gin/Run/657/No
Ram/Hol
Tan/Tin

Bash: replace multiple columns in a CSV

I have the following CSV format:
data_disk01,"/opt=920MB;4512;4917;0;4855","/=4244MB;5723;6041;0;6359","/tmp=408MB;998;1053;0;1109","/var=789MB;1673;1766;0;1859","/boot=53MB;656;692;0;729"
From each column except the first, I would like to keep only the last value of the semicolon-separated list, like this:
data_disk01,"/opt=4855","/=6359","/tmp=1109","/var=1859","/boot=729"
I have tried something like:
awk 'BEGIN {FS=OFS=","} {if(NF==!1);gsub(/\=.*/,",")} 1'
For a single string, I managed to do it with:
string="/opt=920MB;4512;4917;0;4855"
echo $string | awk '{split($0,a,";"); print a[1],a[5]}' | sed 's#=.* #=#'
/opt=4855
But I could not make it work for the whole CSV.
Any hints are appreciated.
If your input never contains commas in the quoted fields, this simple sed script should work:
sed 's/=[^"]*;/=/g' file.csv
Could you please try the following awk and let me know if it helps:
awk '{gsub(/=[^"]*;/,"=")} 1' Input_file
If you want to save the output back into Input_file, append > temp_file && mv temp_file Input_file to the above command.
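Spelled out, the in-place rewrite is:
awk '{gsub(/=[^"]*;/,"=")} 1' Input_file > temp_file && mv temp_file Input_file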

awk: Preserve multiple field separators

I'm using awk to swap fields in a filename using two different field separators.
I want to know if it's possible to preserve both separators, '/' and '_', in the correct positions in the output.
Example:
I want to change this:
/path/to/example_file_123.txt
into this:
/path/to/file_example_123.txt
I've tried:
awk -F "[/_]" '{ t=$3; $3=$4; $4=t;print}' file.txt
but the field separators are missing from the output:
path to file example 123.txt
I've tried preserving the field separators:
awk -F "[/_]" '{t=$3; $3=$4; $4=t; OFS=FS; print}' file.txt
but I get this:
[/_]path[/_]to[/_]file[/_]example[/_]123.txt
Is there a way of preserving the correct original field separators in awk when you're dealing with multiple separators?
Here is one solution:
awk -F/ '{n=split($NF,a,"_");b=a[1];a[1]=a[2];a[2]=b;$NF=a[1];for (i=2;i<=n;i++) $NF=$NF"_"a[i]}1' OFS=/ file
/path/to/file_example_123.txt
You can always use Perl.
Given:
$ echo $e
/path/to/example_file_123.txt
Then:
$ echo $e | perl -ple 's/([^_\/]+)_([^_\/]+)/\2_\1/'
/path/to/file_example_123.txt
Another awk variation:
$ cat /tmp/1
/path/to/example_file_123.txt
/path/to/example_file_345.txt
$ awk -F'_' '{split($1,a,".*/"); gsub(a[2],"",$1);print $1$2"_"a[2]"_"$3}' /tmp/1
/path/to/file_example_123.txt
/path/to/file_example_345.txt

awk: csv split works, but ignores the last field in the row

I have a sample file that looks like:
Sample.csv
Data_1,0,289,292,293,300,306
Data_2,0,294,3,306
Data_3,0,294,305,306
Data_4,0,294,305,306
And I'm running this awk on it:
scr.sh:
awk -F ',' -v tId="$1" '{for(i=3; i<NF; i++){if($i==tId) print}}' $2
By calling
./scr.sh 300 Sample.csv
That works fine and returns exactly the one row that matches:
Data_1,0,289,292,293,300,306
Original problem statement: from the 3rd column onwards, if any column matches the given number, the line should be printed.
But if I call:
./scr.sh 306 Sample.csv
That returns me NOTHING!
I've double checked the lines in Sample.csv and confirmed that there are NO trailing spaces on any of the lines.
Any clues? Thanks.
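The culprit is the loop condition: i<NF stops one field short of the end, and on every sample line 306 appears only in the last field, so it is never compared. Changing the bound to i<=NF fixes your original script:
awk -F ',' -v tId="$1" '{for(i=3; i<=NF; i++){if($i==tId) print}}' $2
(As written this prints a line once per match; add a next after the print if a value can occur more than once on the same line.)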
This awk will do what you're looking for:
awk -F ',' -v tId="$1" '$0 ~ "(^|,)" tId "(,|$)"' file
Alternatively this egrep will also do the job:
egrep '(^|,)306(,|$)' file
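Against the sample file, every line ends in 306, so all four lines print:
$ egrep '(^|,)306(,|$)' Sample.csv
Data_1,0,289,292,293,300,306
Data_2,0,294,3,306
Data_3,0,294,305,306
Data_4,0,294,305,306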
UPDATE: To match only from the 3rd column onwards, you can use:
awk -v tId="$1" 'BEGIN{FS=OFS=","} {p=$0; $1=$2=""} $0 ~ "(^|,)" tId "(,|$)"{print p}' file
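The difference matters for a value such as 0, which appears only in the 2nd column of the sample: the plain regex match above would hit every line via the 2nd field, whereas this version blanks the first two fields before matching and correctly prints nothing:
$ awk -v tId="0" 'BEGIN{FS=OFS=","} {p=$0; $1=$2=""} $0 ~ "(^|,)" tId "(,|$)"{print p}' Sample.csv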
Here is a simple solution to your problem. Let's say your argument is stored in a variable named var, i.e. var=$1. Then run the following command to find the occurrences in your file:
grep -E "^${var},|,${var},|,${var}$" yourfilename

How can I remove duplicate lines in a whole file, omitting the first n characters in each line?

I have the following data format:
123456786|data1
123456787|data2
123456788|data3
The first column is the main_id. I need to remove all duplicated lines from the txt file, ignoring the main_id number when comparing. How can I do that?
Normally I use this AWK script, but it compares whole lines, without omitting the main_id:
awk '!x[$0]++' $2 > "$filename"_no_doublets.txt #remove doublets
Thanks for any help.
If you have more columns, this line should do:
awk '{a=$0;sub(/[^|]*\|/,"",a)}!x[a]++' file
example:
123456786|data1
12345676|data1
123456787|data2|foo
203948787|data2|foo
123456788|data3
kent$ awk '{a=$0;sub(/[^|]*\|/,"",a)}!x[a]++' f
123456786|data1
123456787|data2|foo
123456788|data3
You can use:
awk -F'|' '!x[$2]++'
This will find duplicates based only on field 2 (delimited by |).
UPDATE: to key on everything after the first | instead of just field 2:
awk '{line=$0; sub(/^[^|]+\|/, "", line)} !found[line]++'
awk '{key=$0; sub(/[^|]+/,"",key)} !seen[key]++' file
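Against the example input above, both variants give the same result as the first answer:
$ awk '{line=$0; sub(/^[^|]+\|/, "", line)} !found[line]++' f
123456786|data1
123456787|data2|foo
123456788|data3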
