awk: print first line of file before reading lines - bash

How would I go about printing the first line of given input before I start stepping through each of the lines with awk?
Say I wanted to run the command ps aux and return the column headings and a particular pattern I'm searching for. In the past I've done this:
ps aux | ggrep -Pi 'CPU|foo'
Where CPU is a value I know will be in the first line of input as it's one of the column headings and foo is the particular pattern I'm actually searching for.
I found an awk pattern that will pull the first line:
awk 'NR > 1 { exit }; 1'
Which makes sense, but I can't seem to figure out how to fire this before I do my pattern matching on the rest of the input. I thought I could put it in the BEGIN section of the awk command but that doesn't seem to work.
Any suggestions?

Use the following awk script:
ps aux | awk 'NR == 1 || /PATTERN/'
it prints the current line either if it is the first line in output or if it contains the pattern.
Btw, the same result could be achieved using sed:
ps aux | sed -n '1p;/PATTERN/p'

If you want to read in the first line in the BEGIN action, you can read it in with getline, process it, and discard that line before moving on to the rest of your awk command. This is "stepping in", but may be helpful if you're parsing a header or something first.
#input.txt
Name City
Megan Detroit
Jackson Phoenix
Pablo Charlotte
awk 'BEGIN { getline; col1=$1; col2=$2; } { print col1, $1; print col2, $2 }' input.txt
# output
Name Megan
City Detroit
Name Jackson
City Phoenix
Name Pablo
City Charlotte

Explaining awk BEGIN
I thought I could put it in the BEGIN section ...
In awk, you can have more than one BEGIN clause. These are executed in order before awk starts to read from stdin.

Related

Prepend text to specific line numbers with variables

I have spent hours trying to solve this. There are a bunch of answers as to how to prepend to all lines or specific lines but not with a variable text and a variable number.
while [ $FirstVariable -lt $NextVariable ]; do
#sed -i "$FirstVariables/.*/$FirstVariableText/" "$PWD/Inprocess/$InprocessFile"
cat "$PWD/Inprocess/$InprocessFile" | awk 'NR==${FirstVariable}{print "$FirstVariableText"}1' > "$PWD/Inprocess/Temp$InprocessFile"
FirstVariable=$[$FirstVariable+1]
done
Essentially I am looking for a particular string delimiter and then figuring out where the next one is and appending the first result back into the following lines... Note that I already figured out the logic I am just having issues prepending the line with the variables.
Example:
This >
Line1:
1
2
3
Line2:
1
2
3
Would turn into >
Line1:
Line1:1
Line1:2
Line1:3
Line2:
Line2:1
Line2:2
Line2:3
You can do all that using below awk one liner.
Assuming your pattern starts with Line, then the below script can be used.
> awk '{if ($1 ~ /Line/ ){var=$1;print $0;}else{ if ($1 !="")print var $1}}' $PWD/Inprocess/$InprocessFile
Line1:
Line1:1
Line1:2
Line1:3
Line2:
Line2:1
Line2:2
Line2:3
Here is how the above script works:
If the first record contains word Line then it is copied into an awk variable var. From next word onwards, if the record is not empty, the newly created var is appended to that record and prints it producing the desired result.
If you need to pass the variables dynamically from shell to awk you can use -v option. Like below:
awk -v var1=$FirstVariable -v var2=$FirstVariableText 'NR==var{print var2}1' > "$PWD/Inprocess/Temp$InprocessFile"
The way you addressed the problem is by parsing everything both with bash and awk to process the file. You make use of bash to extract a line, and then use awk to manipulate this one line. The whole thing can actually be done with a single awk script:
awk '/^Line/{str=$1; print; next}{print (NF ? str $0 : "")}' inputfile > outputfile
or
awk 'BEGIN{RS="";ORS="\n\n";FS=OFS="\n"}{gsub(FS,OFS $1)}1' inputfile > outputfile

Extract the last three columns from a text file with awk

I have a .txt file like this:
ENST00000000442 64073050 64074640 64073208 64074651 ESRRA
ENST00000000233 127228399 127228552 ARF5
ENST00000003100 91763679 91763844 CYP51A1
I want to get only the last 3 columns of each line.
as you see some times there are some empty lines between 2 lines which must be ignored. here is the output that I want to make:
64073208 64074651 ESRRA
127228399 127228552 ARF5
91763679 91763844 CYP51A1
awk  '/a/ {print $1- "\t" $-2 "\t" $-3}'  file.txt.
it does not return what I want. do you know how to correct the command?
Following awk may help you in same.
awk 'NF{print $(NF-2),$(NF-1),$NF}' OFS="\t" Input_file
Output will be as follows.
64073208 64074651 ESRRA
127228399 127228552 ARF5
91763679 91763844 CYP51A1
EDIT: Adding explanation of command too now.(NOTE this following command is for only explanation purposes one should run above command only to get the results)
awk 'NF ###Checking here condition NF(where NF is a out of the box variable for awk which tells number of fields in a line of a Input_file which is being read).
###So checking here if a line is NOT NULL or having number of fields value, if yes then do following.
{
print $(NF-2),$(NF-1),$NF###Printing values of $(NF-2) which means 3rd last field from current line then $(NF-1) 2nd last field from line and $NF means last field of current line.
}
' OFS="\t" Input_file ###Setting OFS(output field separator) as TAB here and mentioning the Input_file here.
You can use sed too
sed -E '/^$/d;s/.*\t(([^\t]*[\t|$]){2})/\1/' infile
With some piping:
$ cat file | tr -s '\n' | rev | cut -f 1-3 | rev
64073208 64074651 ESRRA
127228399 127228552 ARF5
91763679 91763844 CYP51A1
First, cat the file to tr to squeeze out repeted \ns to get rid of empty lines. Then reverse the lines, cut the first three fields and reverse again. You could replace the useless cat with the first rev.

passing a parameter in awk command won't work

I run the script bellow with ./command script.sh 11, the first line of code bellow stores the output (321) successfully in parameter x (checked with echo on line 2). On line 3 I try to use parameter x to retrieve the last two columns on all lines where the value in the first column is equal to x (in doc2.csv). This won't work but when I replace z=$x by z=321it works fine. Why won't this code work when passing the parameter?
#!/bin/bash
x="$(awk -v y=$1 -F\; '$1 == y' ~/Documents/doc1.csv | cut -d ';' -f2)"
echo $x
awk -v z=$x -F, '$1 == z' ~/Documents/doc2.csv | cut -d ',' -f2,3
doc1.csv (all columns have unique values)
33;987
22;654
11;321
...
doc2.csv
321,156843,ABCD
321,637253,HYEB
123,256843,BHJN
412,486522,HDBC
412,257843,BHJN
862,256843,BHLN
...
Like others have mentioned there is probably some extra characters coming along for the ride in field 2 of your cut command.
If you just use awk to print the column you want instead of the entire line and cutting that you shouldn't have any problems. If you still do then you will need to look into dos2unix.
n=33;
x=$(awk -v y=$n -F\; '$1 == y {print $2}' d1);
echo ${x};
awk -v z=$x -F, '$1 == z' d2
d1 and d2 contain doc1 and doc2 contents as you outlined.
As you can see all I did was stop using cut on the output of awk and just told awk to print the second field if the first field is equal to the input variable.
By the way awk is pretty powerful if you weren't aware... You can do this entire program within awk.
n=11; awk -v x=$n -F\; 'NR==FNR{ if($1==x){ y[$2]; } next} $1 in y{print $2, $3}' d1 <( sed 's/,/;/g' d2)
NR==FNR Is a trick that effectively says "If we are still in the first file, do this"... the key is not forgetting to use next to skip the rest of the awk command. Once we get to the second file FNR flips back to 1 but NR keeps incrementing up so they'll never be equal again.
So for the first file we just load up the second column values into an array where the first column matches our passed variable. You could optimize this since you said d1 was always unique lines.
So once we get into the next file the logic skips everything and runs $1 in y. This just checks if the first column is in the array we have created. If it is awk prints column 2 and 3.
<( sed 's/,/;/g' d2) just means we want to treat the output of the sed command as a file. The sed command is just converting the commas in d2 to semicolons so that it matches the FS that awk expects.
Hopefully you've learned a bit about awk, read more here http://www.catonmat.net/blog/ten-awk-tips-tricks-and-pitfalls/ and a great redirection cheat sheet is available here http://www.catonmat.net/download/bash-redirections-cheat-sheet.pdf .

Extracting field from last row of given table using sed

I would like to write a bash script to extract a field in the last row of a table. I will illustrate by example. I have a text file containing tables with space delimited fields like ...
Table 1 (foobar)
num flag name comments
1 ON Frank this guy is frank
2 OFF Sarah she is tall
3 ON Ahmed who knows him
Table 2 (foobar)
num flag name comments
1 ON Mike he is short
2 OFF Ahmed his name is listed twice
I want to extract the first field in the last row of Table1, which is 3. Ideally I would like to be able to use any given table's title to do this. There are guaranteed carriage returns between each table. What would be the best way to accomplish this, preferably using sed and grep?
Awk is perfect for this, print the first field in the last row for each record:
$ awk '!$1{print a}{a=$1}END{print a}' file
3
2
Just from the first record:
$ awk '!$1{print a;exit}{a=$1}' file
3
Edit:
For a given table title:
$ awk -v t="Table 1" '$0~t{f=1}!$1&&f{print a;f=0}{a=$1}END{if (f) print a}' file
3
$ awk -v t="Table 2" '$0~t{f=1}!$1&&f{print a;f=0}{a=$1}END{if (f) print a}' file
2
This sed line seems to work for your sample.
table='Table 2'
sed -n "/$table"'/{n;n;:next;h;n;/^$/b last;$b last;b next;:last;g;s/^\s*\(\S*\).*/\1/p;}' file
Explanation: When we find a line matching the table name in $table, we skip that line, and the next (the field labels). Starting at :next we push the current line into the hold space, get the next line and see if it is blank or the end of the file, if not we go back to :next, push the current line into hold and get another. If it is blank or EOF, we skip to :last, pull the hold space (the last line of the table) into pattern space, chop out all but the first field and print it.
Just read each block as a record with each line as a field and then print the first sub-field of the last field of whichever record you care about:
$ awk -v RS= -F'\n' '/^Table 1/{split($NF,a," "); print a[1]}' file
3
$ awk -v RS= -F'\n' '/^Table 2/{split($NF,a," "); print a[1]}' file
2
Better tool to that is awk!
Here is a kind legible code:
awk '{
if(NR==1) {
row=$0;
next;
}
if($0=="") {
$0=row;
print $1;
} else {
row=$0;
}
} END {
if(row!="") {
$0=row;
print $1;
}
}' input.txt

Join lines based on pattern

I have the following file:
test
1
My
2
Hi
3
i need a way to use cat ,grep or awk to give the following output:
test1
My2
Hi3
How can i achieve this in a single command? something like
cat file.txt | grep ... | awk ...
Note that its always a string followed by a number in the original text file.
sed 'N;s/\n//' file.txt
This should give the desired output when the content is in file.txt
paste -d "" - - < filename
This takes consecutive lines and pastes them together delimited by the empty string.
awk '{printf("%s", $0);} !(NR%2){printf("\n");}' file.txt
EDIT: I just noticed that your question requires the use of cat and grep. Both of those programs are unnecessary to achieve your stated aims. If you have some reason for including them that you haven't mentioned, try this (uselessly inefficient) version of the line I wrote immediately above:
cat file.txt | grep '^' | awk '{printf("%s", $0);} !(NR%2){printf("\n");}'
It is possible that this command uses features not present in the original awk program. You may need to invoke the new awk program, nawk instead.
If your input file is always 1 number then 1 string, and you only want the strings, all you have to do is take every other line.
If you only want the odd lines, you can do awk 'NR % 2' file.txt
If you want the evens, this becomes awk 'NR % 2==0' data
Here is the answer:
cat file.txt | awk 'BEGIN { lno = 0 } { val=$0; if (lno % 2 == 1) {printf "%s\n", $0} else {printf "%s", $0}; ++lno}'

Resources