Remove blank line from awk output - bash

I am trying to remove a blank line from awk output. When I use this command, a leading blank line is displayed.
diff test1.txt test.txt | awk '{print $2}'
output:
asdfasdf.txt
test.txt
weqtwqe.txt
How can I remove this blank line using awk?
Thanks in advance

If you want to print only the lines where $2 exists, you can make the print conditional on the number of fields:
awk 'NF>1{print $2}'
will do.
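A quick way to see why the blank line appears and how NF>1 suppresses it (the diff-style lines here are made up for illustration):

```shell
# Mimic diff output: "<"/">" lines carry a filename in field 2, while a
# separator line such as "---" has a single field and an empty $2.
printf '%s\n' '< asdfasdf.txt' '---' '> test.txt' |
awk 'NF>1{print $2}'
```

Without the NF>1 guard, the --- line would contribute an empty $2 and hence a blank line in the output.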

Related

Replacing new line with comma separator

I have a text file with records in the following format. Please note that there are no empty lines within the NAME, ID and RANK sections.
"NAME","STUDENT1"
"ID","123"
"RANK","10"
"NAME","STUDENT2"
"ID","124"
"RANK","11"
I have to convert the above file to the below format
"STUDENT1","123","10"
"STUDENT2","124","11"
I understand that this can be achieved with a shell script by reading the records and writing them to another output file. But can this be done using awk or sed?
$ awk -F, '{ORS=(NR%3?FS:RS); print $2}' file
"STUDENT1","123","10"
"STUDENT2","124","11"
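A sketch of how the ORS trick works on the sample records: on every third record (NR%3 == 0) the output record separator becomes a newline (RS), otherwise it is a comma (FS), so three values are joined per output line:

```shell
# ORS is chosen per record: a comma while inside a group of three,
# a newline once the group is complete.
printf '%s\n' '"NAME","STUDENT1"' '"ID","123"' '"RANK","10"' |
awk -F, '{ORS=(NR%3?FS:RS); print $2}'
```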
With awk:
awk -F, '$1=="\"RANK\""{print $2;next}{printf "%s,",$2}' file
With awk, printing newline each 3 lines:
awk -F, '{printf "%s",$2;if (NR%3){printf ","}else{print""};}'
The following awk may also help here.
awk -F, '{ORS=$0~/^"RANK/?"\n":FS;print $NF}' Input_file
With sed
sed -E 'N;N;y/\n/ /;s/([^,]*)(,[^ ]*)/\2/g;s/,//' infile

Get only part of file using sed or awk

I have a file which contains text as follows:
Directory /home/user/ "test_user"
bunch of code
another bunch of code
How can I get from this file only the /home/user/ part?
I've managed to use awk -F '"' 'NR==1{print $1}' file.txt to get rid of the rest of the file, and I'm getting output like this:
Directory /home/user/
How can I change this command to get only /home/user/ part? I'd like to make it as simple as possible. Unfortunately, I can't modify this file to add/change the content.
This should be the fastest, which is noticeable if your file is large:
awk '{print $2; exit}' file
It prints the second field of the first line and stops processing the rest of the file.
With awk it should be:
awk 'NR==1{print $2}' file.txt
Setting the field delimiter to " was wrong, since it splits the line into these fields:
$1 = 'Directory /home/user/'
$2 = 'test_user'
$3 = '' (empty)
The default field separator, which is [[:space:]]+, splits like this:
$1 = 'Directory'
$2 = '/home/user/'
$3 = '"test_user"'
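The splitting behaviour above can be checked by printing every field of the sample line (the path comes from the question, not from a real directory):

```shell
# Print each field produced by awk's default whitespace splitting.
echo 'Directory /home/user/ "test_user"' |
awk '{for (i = 1; i <= NF; i++) printf "$%d = %s\n", i, $i}'
```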
As an alternative, you can use head and cut:
$ head -n 1 file | cut -d' ' -f2
Not sure why you are using the -F" as that changes the delimiter. If you remove that, then $2 will get you what you want.
awk 'NR==1{print $2}' file.txt
You can also use awk to execute the print when the line contains /home/user instead of counting records:
awk '/\/home\/user\//{print $2}' file.txt
In this case, if the line were buried in the file, or if you had multiple instances, you would get the name for every occurrence wherever it was.
Adding some grep
grep Directory file.txt|awk '{print $2}'

awk command to select exact word in any field

I have input file as
ab,1,3,qqq,bbc
b,445,jj,abc
abcqwe,234,23,123
abc,12,bb,88
uirabc,33,99,66
I have to select the rows which contain 'abc' as an exact field. Note that the string abc can appear in any of the columns. Please help me achieve this using awk.
Output:
b,445,jj,abc
abc,12,bb,88
You could also use plain grep:
grep -E "(^|,)abc(,|$)" file
Or if you have to use awk
awk '/(^|,)abc(,|$)/' file
Using awk
awk 'gsub(/(^|,)abc(,|$)/,"&")' file
b,445,jj,abc
abc,12,bb,88
Based on Beny23's regex: it looks for abc starting either at the beginning of the line (^) or after a comma, and ending with a comma or at the end of the line ($). Since gsub returns the number of substitutions made, the line is printed only when there is at least one match.
Another one using Beny23's regex:
awk 'NF>1' FS="(^|,)abc(,|$)" infile
Not asked, but if you feel the need to filter just the lines with one occurrence:
$ cat infile
ab,1,3,qqq,bbc
b,445,jj,abc
abcqwe,234,23,123
abc,12,bb,88
abc,12,bb,abc
uirabc,33,99,66
This will be handy:
$ awk 'NF==2' FS="(^|,)abc(,|$)" infile
b,445,jj,abc
abc,12,bb,88
Also possible using Jotne's solution:
$ awk 'gsub(/(^|,)abc(,|$)/,"&")==1' infile
Through awk,
$ awk -F, '{for(i=1;i<=NF;i++){if($i=="abc") print $0;}}' file | uniq
b,445,jj,abc
abc,12,bb,88
OR
$ awk -F, '{for(i=1;i<=NF;i++){if($i=="abc") {print; next}}}' file
b,445,jj,abc
abc,12,bb,88
In the above awk command the field separator is set to , . awk parses the input file line by line, and the for loop traverses all the fields in a line. If the value of a particular field is abc, it prints the whole line.
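The difference between exact-field matching and substring matching shows up on the question's sample data: abcqwe and uirabc contain abc as a substring, but none of their fields equals abc, so only two lines survive:

```shell
# The exact comparison $i == "abc" only accepts whole fields, so lines
# like "abcqwe,..." are skipped even though they contain the substring.
printf '%s\n' 'ab,1,3,qqq,bbc' 'b,445,jj,abc' 'abcqwe,234,23,123' \
  'abc,12,bb,88' 'uirabc,33,99,66' |
awk -F, '{for (i = 1; i <= NF; i++) if ($i == "abc") {print; next}}'
```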

Add blank column using awk or sed

I have a file with the following structure (comma delimited)
116,1,89458180,17,FFFF,0403254F98
I want to add a blank column on the 4th field such that it becomes
116,1,89458180,,17,FFFF,0403254F98
Any inputs as to how to do this using awk or sed if possible ?
thank you
Assuming that none of the fields contain embedded commas, you can restate the task as replacing the third comma with two commas. This is just:
sed 's/,/,,/3'
With the example line from the file:
$ echo "116,1,89458180,17,FFFF,0403254F98" | sed 's/,/,,/3'
116,1,89458180,,17,FFFF,0403254F98
You can use this awk,
awk -F, '$4="," $4' OFS=, yourfile
(OR)
awk -F, '$4=FS$4' OFS=, yourfile
If you want blank columns in front of several of the original fields (here before the original 1st, 4th and 6th fields):
awk -F, '{$4=FS$4; $1=FS$1; $6=FS$6}1' OFS=, yourfile
Through awk
$ echo '116,1,89458180,17,FFFF,0403254F98' | awk -F, -v OFS="," '{print $1,$2,$3,","$4,$5,$6}'
116,1,89458180,,17,FFFF,0403254F98
It prints an extra , after the third ,-delimited field.
Through GNU sed
$ echo 116,1,89458180,17,FFFF,0403254F98| sed -r 's/^([^,]*,[^,]*,[^,]*)(.*)$/\1,\2/'
116,1,89458180,,17,FFFF,0403254F98
It captures all the characters up to (but not including) the third comma into one group; the characters from the third comma to the end of the line are stored in another group. In the replacement part, we just add a , between these two captured groups.
Through Basic sed
$ echo 116,1,89458180,17,FFFF,0403254F98| sed 's/^\([^,]*,[^,]*,[^,]*\)\(.*\)$/\1,\2/'
116,1,89458180,,17,FFFF,0403254F98
echo 116,1,89458180,17,FFFF,0403254F98|awk -F',' '{print $1","$2","$3",,"$4","$5","$6}'
Non-awk
t="116,1,89458180,17,FFFF,0403254F98"
echo $(echo $t|cut -d, -f1-3),,$(echo $t|cut -d, -f4-)
You can use the awk command below to achieve that. Replace $3 with whichever field you want the blank column inserted in front of.
awk -F, '{$3="" FS $3;}1' OFS=, filename
sed -e 's/\([^,]*,\)\{3\}/&,/' YourFile
This replaces the sequence of 3 [content (non-comma) then comma] by itself followed by an extra comma.
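Note that for this input the repetition count has to be 3, not 4, so that the extra comma lands in front of the original 4th field; a runnable check on the sample line:

```shell
# Three groups of "non-commas then a comma" are matched from the start
# of the line; "&," writes them back with one extra comma appended.
echo '116,1,89458180,17,FFFF,0403254F98' |
sed 's/\([^,]*,\)\{3\}/&,/'
```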

unix bash - extract a line from file

I need your help! I tried looking for this but to no avail.
How can I achieve the following using bash?
I've a flat file called "cube.mdl" that contains:
[...]
bla bla bla bla lots of lines above
Cube 8007841 "BILA_" MdcFile "BILA_CO_PM_MKT_BR_CUBE.mdc"
bla bla bla more lines below
[...]
I need to open that file, look for the word "MdcFile" and get the string that follows between quotes, which would be BILA_CO_PM_MKT_BR_CUBE.mdc
I know AWK or grep are powerful enough to do this in one line, but I couldn't find an example that could help me do it on my own.
Thanks in advance!
JMA
You can use:
grep -o -P "MdcFile.*" cube.mdl | awk -F\" '{ print $2 }'
This will use grep's regex to only return MdcFile and everything after it in the current line. Then, awk will use the " as a delimiter and print only the second word - which would be your "in-quotes" word(s), returned without the quotes of course.
The option -o, --only-matching specifies to return only the text that matches, and the -P, --perl-regexp specifies that the pattern is a Perl regex. It appears that some versions of grep do not contain these options. The OP's version does not include them, but the following appears to work for him instead:
grep "MdcFile.*" cube.mdl | awk -F\" '{ print $2 }'
grep MdcFile cube.mdl | awk '{print $5}'
would do it, assuming there are no spaces in any of those bits to throw off the position count.
This might do it.
sed -n '/MdcFile/ s/.*MdcFile "\([^"]*\)".*/\1/p' INPUTFILE
Use awk for the whole thing with " as record separator:
awk -v RS='"' '/MdcFile/ { getline; print }' cube.mdl
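A sketch of the record-splitting idea on a made-up fragment of cube.mdl: with RS set to a double quote, every quoted string becomes its own record, and the record immediately after one containing MdcFile is exactly the file name:

```shell
# Split records on double quotes; when a record contains MdcFile,
# getline fetches the next record, i.e. the quoted file name.
printf '%s\n' 'bla bla lots of lines above' \
  'Cube 8007841 "BILA_" MdcFile "BILA_CO_PM_MKT_BR_CUBE.mdc"' \
  'bla bla more lines below' |
awk -v RS='"' '/MdcFile/ { getline; print }'
```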
