how to discard the last field of the content of a file using the awk command - shell

How do I discard the last field using awk?
The list.txt file contains data like below:
Ram/45/simple
Gin/Run/657/No/Sand
Ram/Hol/Sin
Tan/Tin/Bun
but I require output like below:
Ram/45
Gin/Run/657/No
Ram/Hol
Tan/Tin
I tried the following command, but it prints only the last field:
cat list.txt |awk -F '/' '{print $(NF)}'
45
No
Hol
Tin

With GNU awk, you could try the following:
awk 'BEGIN{FS=OFS="/"} NF--' Input_file
Or, with any awk, try the following:
awk 'BEGIN{FS=OFS="/"} match($0,/.*\//){print substr($0,RSTART,RLENGTH-1)}' Input_file
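For reference, NF-- lowers the field count by one, which makes GNU awk rebuild $0 with OFS (here /), and the expression itself evaluates to the old, non-zero NF, so the default print fires; a quick check on one sample line (a sketch, assuming GNU awk):
echo 'Ram/45/simple' | awk 'BEGIN{FS=OFS="/"} NF--'
Ram/45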

This simple awk should work:
awk '{sub(/\/[^/]*$/, "")} 1' file
Ram/45
Gin/Run/657/No
Ram/Hol
Tan/Tin
Or this even simpler sed should also work:
sed 's~/[^/]*$~~' file
Ram/45
Gin/Run/657/No
Ram/Hol
Tan/Tin
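For reference, the pattern /\/[^/]*$/ matches the last slash plus everything after it, so sub() deletes exactly that trailing piece; a quick check on one sample line (a sketch):
echo 'Gin/Run/657/No/Sand' | awk '{sub(/\/[^/]*$/, "")} 1'
Gin/Run/657/No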

Related

Extract a property value from a text file

I have a log file which contains lines like the following one:
Internal (reserved=1728469KB, committed=1728469KB)
I need to extract the value contained in "committed", i.e. 1728469.
I'm trying to use awk for that:
cat file.txt | awk '{print $4}'
However that produces:
committed=1728469KB)
This is still incomplete and would still need some work. Is there a simpler way to do that instead?
Thanks
Could you please try the following, using awk's match function:
awk 'match($0,/committed=[0-9]+/){print substr($0,RSTART+10,RLENGTH-10)}' Input_file
With GNU grep, using its \K option:
grep -oP '.*committed=\K[0-9]*' Input_file
The output will be 1728469 in both of the above solutions.
1st solution explanation:
awk '                                     ##Start the awk program here.
match($0,/committed=[0-9]+/){             ##Use the match function to match committed= followed by digits in the current line.
  print substr($0,RSTART+10,RLENGTH-10)   ##Print the substring that starts at RSTART+10 and has length RLENGTH-10, i.e. only the digits after committed=.
}
' Input_file                              ##Mention the Input_file name here.
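For reference, "committed=" is 10 characters long, which is why the substring starts at RSTART+10 and its length is RLENGTH-10; a quick check on the sample line (a sketch):
echo 'Internal (reserved=1728469KB, committed=1728469KB)' | awk 'match($0,/committed=[0-9]+/){print substr($0,RSTART+10,RLENGTH-10)}'
1728469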
Sed is better at simple matching tasks:
sed -n 's/.*committed=\([0-9]*\).*/\1/p' input_file
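Here \([0-9]*\) captures just the digits after committed= and \1 keeps only that capture; a quick check on the sample line (a sketch):
echo 'Internal (reserved=1728469KB, committed=1728469KB)' | sed -n 's/.*committed=\([0-9]*\).*/\1/p'
1728469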
$ awk -F'[=)]' '{print $3}' file
1728469KB
You can try this:
str="Internal (reserved=1728469KB, committed=1728469KB)"
echo $str | awk '{print $3}' | cut -d "=" -f2 | rev | cut -c4- | rev
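For reference, this is what each stage of that pipeline produces on the sample string (a sketch; rev reverses the line so that cut -c4- can strip the trailing KB) from the end):
echo $str | awk '{print $3}'                                          # committed=1728469KB)
echo $str | awk '{print $3}' | cut -d "=" -f2                         # 1728469KB)
echo $str | awk '{print $3}' | cut -d "=" -f2 | rev                   # )BK9647821
echo $str | awk '{print $3}' | cut -d "=" -f2 | rev | cut -c4-        # 9647821
echo $str | awk '{print $3}' | cut -d "=" -f2 | rev | cut -c4- | rev  # 1728469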

Need to prepend a string to a column and to add another column with it

I have a file with 2 lines
123|456|789
abc|123|891
I need an output like below. Basically, I want to add the string "xyz" to col 1 and to add "xyz" as a new col 2:
xyz-123|xyz|456|789
xyz-abc|xyz|123|891
This is what I used
awk 'BEGIN{FS=OFS="fs-"$1}{print value OFS $0}' /tmp/b.log
I get
xyz-123|456|789
xyz-abc|123|891
I tried
awk 'BEGIN{FS=OFS="fs-"$1}{print value OFS $0}' /tmp/b.log|awk -F" " '{$2="fs" $0;}1' OFS=" "
In addition to the awk field-updating ($1, $2, ...) approach, we can also use substitution to do the job:
sed 's/^[^|]*/xyz-&|xyz/' file
If awk is a must:
awk '1+sub(/^[^|]*/, "xyz-&|xyz")' file
Both one-liners give expected output.
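For illustration, & in the replacement stands for whatever the pattern matched (the original first column here), so either one-liner turns the sample lines into the expected output (a sketch):
printf '123|456|789\nabc|123|891\n' | sed 's/^[^|]*/xyz-&|xyz/'
xyz-123|xyz|456|789
xyz-abc|xyz|123|891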
Could you please try the following:
awk 'BEGIN{FS=OFS="|"} {$1="xyz-"$1;$2="xyz" OFS $2} 1' Input_file
Or, as per Corentin Limier's comment, try:
awk 'BEGIN{FS=OFS="|"} {$1="xyz-" $1 OFS "xyz"} 1' Input_file
Output will be as follows.
xyz-123|xyz|456|789
xyz-abc|xyz|123|891
I would use sed instead of awk as follows:
sed -e 's/^/xyz-/' -e 's/|/|xyz|/' Input_file
This prepends xyz- at the beginning of each line and changes the first | into |xyz|.
Another slight variation of sed:
sed 's/^/xyz-/;s/|/&xyz&/' file

Bash: replace multiple columns in a CSV

I have the following CSV format:
data_disk01,"/opt=920MB;4512;4917;0;4855","/=4244MB;5723;6041;0;6359","/tmp=408MB;998;1053;0;1109","/var=789MB;1673;1766;0;1859","/boot=53MB;656;692;0;729"
For each column except the first one, I would like to keep only the last value from the list, like this:
data_disk01,"/opt=4855","/=6359","/tmp=1109","/var=1859","/boot=729"
I have tried something like:
awk 'BEGIN {FS=OFS=","} {if(NF==!1);gsub(/\=.*/,",")} 1'
For just a single string, I managed to do it with:
string="/opt=920MB;4512;4917;0;4855"
echo $string | awk '{split($0,a,";"); print a[1],a[5]}' | sed 's#=.* #=#'
/opt=4855
But could not make it work for the whole CSV.
Any hints are appreciated.
If your input never contains commas in the quoted fields, a simple sed script should work:
sed 's/=[^"]*;/=/g' file.csv
Could you please try the following awk and let me know if it helps:
awk '{gsub(/=[^"]*;/,"=")} 1' Input_file
In case you want to save the output back into Input_file, append > temp_file && mv temp_file Input_file to the above command.
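For reference, the pattern =[^"]*; eats everything from the = up to the last ; inside a quoted field, and [^"] keeps the match from running past the field's closing quote; a quick check on one of the quoted values (a sketch):
echo '"/opt=920MB;4512;4917;0;4855"' | sed 's/=[^"]*;/=/g'
"/opt=4855"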

awk load one file into array, test against another file

I have two files:
seqs.fa:
>seq000007;size=72768;
ACTGTGAG
>seq000010;size=53132;
GTAAGATC
GAATTCTT
>seq00045;size=40321;
ACCCATTT
...
numbers.txt
72768
53132
My desired output would be the lines from the first file that match a number from the second file:
>seq000007;size=72768;
>seq000010;size=53132;
I attempted to use awk, but it only returns lines matching the first number:
awk -F"\n" -v RS=">" 'NR==FNR{for(i=1;i<=NF;i++) A[$i]; next} END {for (header in A) {if ( match(header,$1) ) {print header}}}' seqs.fa numbers.txt
seq000007;size=72768;
seq072768;size=1;
Why is awk only looping through the "header" array for the first line in numbers.txt? And, if this is an XY problem, is there a better way to accomplish this goal?
After fixing the typo in your numbers file:
$ awk -F'=|;' 'NR==FNR{a[$1]; next}; $3 in a' numbers.txt seqs.fa
>seq000007;size=72768;
>seq000010;size=53132;
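For reference, the same one-liner spelled out with comments (a sketch; NR==FNR is only true while the first file is being read):
awk -F'=|;' '
NR==FNR { a[$1]; next }   # numbers.txt: store each number as an array key
$3 in a                   # seqs.fa: print lines whose 3rd =/;-separated field is a stored number
' numbers.txt seqs.fa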
In this special case you can use GNU grep like this:
grep -F -f numbers.txt seqs.fa
The option -f filename uses all the patterns found in filename for the search. The option -F tells grep that the patterns are simple fixed strings.

cut out fields that matched a regex from a delimited string

Example file:
35=A|11=ABC|55=AAA|20=DEF
35=B|66=ABC|755=AAA|800=DEF|11=ZZ|55=YYY
35=C|66=ABC|11=CC|755=AAA|800=DEF|55=UUU
35=C|66=ABC|11=XX|755=AAA|800=DEF
I want the output to print like the following, with only columns 11= and 55= printed. (They are not at fixed locations.)
11=ABC|55=AAA
11=ZZ|55=YYY
11=CC|55=UUU
Thanks.
sed might be easier here:
sed -nr '/(^|\|)11=[^|]*.*\|55=/s~^.*(11=[^|]*).*(\|55=[^|]*).*$~\1\2~p' file
11=ABC|55=AAA
11=ZZ|55=YYY
11=CC|55=UUU
Try this:
$ awk -F'|' '{f=0;for (i=1;i<=NF;i++)if ($i~/^(11|55)=/){printf "%s",(f?"|":"")$i;f=1};print""}' file
11=ABC|55=AAA
11=ZZ|55=YYY
11=CC|55=UUU
11=XX
To only show lines that have both an 11 field and a 55 field:
$ awk -F'|' '/(^|\|)11=/ && /\|55=/{f=0;for (i=1;i<=NF;i++)if ($i~/^(11|55)=/){printf "%s",(f?"|":"")$i;f=1};print""}' file
11=ABC|55=AAA
11=ZZ|55=YYY
11=CC|55=UUU
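For readability, the same field loop written out with comments (a sketch equivalent to the one-liner above):
awk -F'|' '
/(^|\|)11=/ && /\|55=/ {            # keep only lines that contain both an 11= field and a 55= field
    f = 0                           # f: has a field been printed on this line yet?
    for (i = 1; i <= NF; i++)
        if ($i ~ /^(11|55)=/) {     # field starts with 11= or 55=
            printf "%s", (f ? "|" : "") $i
            f = 1
        }
    print ""                        # terminate the output line
}' file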
