Unix awk command to return all matching lines - shell

I have a file which looks like the one below:
A
B
C
D
E
-----
A
B
C
D
C
---
X
Y
A
B
XEC
---
When the fifth row of a block contains E, I want the previous 4 lines to be returned. I wrote the command below, but it is buggy:
awk '{a[NR]=$0} $0~s {f=NR} END {print a[f-4]; print a[f-6]; print a[f-8];}' s="E" file.txt
It returns only the last match, because f is overwritten on every match and the END block runs only once. I want all the matched lines to be returned.
For the above entries, the output needs to be
A
B
C
D
---
X
Y
A
B
Is there any other way to achieve this?

Using gawk (a multi-character RS is only supported in GNU awk):
awk -v RS='\n-+\n' -v FS="\n" '$5 ~ /E/{printf "%s\n%s\n%s\n%s\n---\n",$1,$2,$3,$4}' inputfile
A
B
C
D
---
X
Y
A
B
---

I'm not sure exactly what format you want; do you really need --- and then a newline char?
Using tac and awk you can try the one below.
Print the N records after some regexp:
awk -v n=4 'c&&c--;/regexp/{c=n}' <input_file>
Print the N records before some regexp:
tac <input_file> | awk -v n=4 'c&&c--;/regexp/{c=n}' | tac
Here the first tac reverses the file, n is the number of lines to print when the regexp is found, and the final tac reverses the output back into the original order.
Input
$ cat infile
A
B
C
D
E
-----
A
B
C
D
C
---
X
Y
A
B
XEC
---
When n=4
$ tac infile | awk -v n=4 'c&&c--;/E/{c=n}' | tac
A
B
C
D
X
Y
A
B
When n=2
$ tac infile | awk -v n=2 'c&&c--;/E/{c=n}' | tac
C
D
A
B
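If you'd rather avoid tac, a rolling buffer in plain awk can print the n lines preceding each match in a single pass. A minimal sketch, assuming the same sample input and pattern:
$ awk -v n=4 '
  /E/ { for (i = NR-n; i < NR; i++) if (i in buf) print buf[i] }  # print the n buffered lines before the match
  { buf[NR] = $0; delete buf[NR-n] }                              # keep a rolling window of the last n lines
' infile
A
B
C
D
X
Y
A
B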

Related

Combining 2 lines together but "interlaced"

I have 2 lines from an output as follows:
a b c
x y z
I would like to pipe both lines from the last command into a script that would combine them "interlaced", like this:
a x b y c z
The solution should work for an arbitrary number of columns from the output, such as:
a b c d e
x y z x y
Should result in:
a x b y c z d x e y
So far, I have tried awk, perl, sed, etc., but without success. All I can do is put the output onto one line, but it won't be "interlaced":
$ echo -e 'a b c\nx y z' | tr '\n' ' ' | sed 's/$/\n/'
a b c x y z
Keep the fields of odd-numbered records in an array, and update the fields of even-numbered records using it. This will interlace each pair of successive lines in the input.
prog | awk 'NR%2{split($0,a);next} {for(i in a)$i=(a[i] OFS $i)} 1'
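With the sample input, that one-liner produces:
$ printf 'a b c\nx y z\n' | awk 'NR%2{split($0,a);next} {for(i in a)$i=(a[i] OFS $i)} 1'
a x b y c z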
Here's a 3 step solution:
$ # get one argument per line
$ printf 'a b c\nx y z' | xargs -n1
a
b
c
x
y
z
$ # split numbers of lines by 2 and combine them side by side
$ printf 'a b c\nx y z' | xargs -n1 | pr -2ts' '
a x
b y
c z
$ # combine all input lines into single line
$ printf 'a b c\nx y z' | xargs -n1 | pr -2ts' ' | paste -sd' '
a x b y c z
$ printf 'a b c d e\nx y z 1 2' | xargs -n1 | pr -2ts' ' | paste -sd' '
a x b y c z d 1 e 2
Could you please try the following; it joins every 2 lines in "interlaced" fashion:
awk '
FNR%2!=0 && FNR>1{
for(j=1;j<=NF;j++){
printf("%s%s",a[j],j==NF?ORS:OFS)
}
delete a
}
{
for(i=1;i<=NF;i++){
a[i]=(a[i]?a[i] OFS:"")$i
}
}
END{
for(j=1;j<=NF;j++){
printf("%s%s",a[j],j==NF?ORS:OFS)
}
}' Input_file
Here is a simple awk script
script.awk
NR == 1 {split($0,inArr1)} # read fields from 1st line into inArr1
NR == 2 {split($0,inArr2); # read fields from 2nd line into inArr2
for (i = 1; i <= NF; i++) printf("%s%s%s%s", inArr1[i], OFS, inArr2[i], OFS); # output interlaced fields from inArr1 and inArr2
print ""; # terminate the output line.
}
input.txt
a b c d e
x y z x y
running:
awk -f script.awk input.txt
output:
a x b y c z d x e y
Multiline awk solution:
interlaced.awk
{
a[NR] = $0
}
END {
split(a[1], b)
split(a[2], c)
for (i = 1; i in b; i++) {
printf "%s%s %s", i==1?"":OFS, b[i], c[i]
}
print ""
}
Run it like this:
foo_program | awk -f interlaced.awk
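For example, with the sample two-line input:
$ printf 'a b c\nx y z\n' | awk -f interlaced.awk
a x b y c z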
Perl will do the job. It was invented for this type of task.
echo -e 'a b c\nx y z' | \
perl -MList::MoreUtils=mesh -e \
'@f=mesh @{[split " ", <>]}, @{[split " ", <>]}; print "@f"'
a x b y c z
You can of course print out the meshed output any way you want.
Check out http://metacpan.org/pod/List::MoreUtils#mesh
You could even make it into a shell function for easy use:
function meshy {
perl -MList::MoreUtils=mesh -e \
'@f=mesh @{[split " ", <>]}, @{[split " ", <>]}; print "@f"'
}
$ echo -e 'X Y Z W\nx y z w' |meshy
X x Y y Z z W w
$
Ain't Perl grand?
This might work for you (GNU sed):
sed -E 'N;H;x;:a;s/\n(\S+\s+)(.*\n)(\S+\s+)/\1\3\n\2/;ta;s/\n//;s// /;h;z;x' file
Process two lines at a time. Append the two lines in the pattern space to the hold space, which introduces a newline at the front of the two lines. Using pattern matching and back references, nibble away at the front of each of the two lines and place the pairs at the front. Eventually the pattern matching fails; then remove the first newline and replace the second with a space. Copy the amended line to the hold space, clean up the pattern space ready for the next couple of lines (if any), and print.
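For example, with the two sample lines (GNU sed):
$ printf 'a b c\nx y z\n' | sed -E 'N;H;x;:a;s/\n(\S+\s+)(.*\n)(\S+\s+)/\1\3\n\2/;ta;s/\n//;s// /;h;z;x'
a x b y c z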

How to print lines with the specified word in the path?

Let's say I have a file abc.txt which contains the following lines:
a b c /some/path/123/path/120
a c b /some/path/312/path/098
a p t /some/path/123/path/321
a b c /some/path/098/path/123
and numbers.txt:
123
321
123
098
I want to print the whole line only when "123" is in the third place of the path, as in "/some/path/123/path".
I don't want to print the line "a c b /some/path/312/path" or
"a b c /some/path/098/path/123/". I want to save all lines with "123" in the third place to a new file.
I tried several methods and the best way seems to be to use awk. Here is my example code, which is not working correctly:
for i in `cat numbers.txt | xargs`
do
cat abc.txt | awk -v i=$i '$4 ~ /i/ {print $0}' > ${i}_number.txt;
done
because it also catches, for example, "a b c /some/path/098/path/123/".
Example:
For number "123" I want to save only one line from abc.txt in 123_number.txt:
a b c /some/path/123/path/120
For number "312" I want to save only one line from abc.txt in 312_number.txt:
a c b /some/path/312/path/098
this can be accomplished in a single awk call:
$ awk -F'/' 'NR==FNR{a[$0];next} ($4 in a){f=$4"_number.txt";print >>f;close(f)}' numbers.txt abc.txt
$ cat 098_number.txt
a b c /some/path/098/path/123
$ cat 123_number.txt
a b c /some/path/123/path/120
a p t /some/path/123/path/321
Keep the numbers in an array and use it to match lines, appending matching lines to the corresponding files.
If your files are huge, you may speed up the process using sort, so that each output file is opened and closed only once:
sort -t'/' -k4 abc.txt | awk -F'/' 'NR==FNR{a[$0];next} ($4 in a){if($4!=p){close(f);f=(p=$4)"_number.txt"};print >>f}' numbers.txt -

Concatenation of two columns from the same file

From a text file
file
a d
b e
c f
how are the tab-delimited columns concatenated into one column?
a
b
c
d
e
f
Now I use awk to output the columns to two files that I then concatenate using cat. But there must be a better one-line command?
For a generalized approach:
$ f() { awk '{print $'$1'}' file; }; f 1; f 2
a
b
c
d
e
f
If the file is tab-delimited, perhaps simply use cut (the inverse operation of paste):
$ cut -f1 file.t; cut -f2 file.t
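For example, writing the sample file with real tabs first:
$ printf 'a\td\nb\te\nc\tf\n' > file.t
$ cut -f1 file.t; cut -f2 file.t
a
b
c
d
e
f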
This simple awk command should do the job:
awk '{print $1; s=s $2 ORS} END{printf "%s", s}' file
a
b
c
d
e
f
You can use process substitution; that would eliminate the need to create a file for each column.
$ cat file
a d
b e
c f
$ cat <(awk '{print $1}' file) <(awk '{print $2}' file)
a
b
c
d
e
f
$
OR
As per the comment, you can just combine multiple commands and redirect their output to a different file like this:
$ cat file
a d
b e
c f
$ (awk '{print $1}' file; awk '{print $2}' file) > output
$ cat output
a
b
c
d
e
f
$
Try the following: without reading the file twice or making external calls to any other commands, a single awk comes to the rescue. This assumes your Input_file is like the sample shown.
awk '{VAL1=VAL1?VAL1 ORS $1:$1;VAL2=VAL2?VAL2 ORS $2:$2} END{print VAL1 ORS VAL2}' Input_file
Explanation: It creates a variable named VAL1 which accumulates each $1 value into itself, while VAL2 accumulates each $2 value in the same way. The END section of awk prints the values of VAL1 and VAL2.
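With the sample file it prints:
$ awk '{VAL1=VAL1?VAL1 ORS $1:$1;VAL2=VAL2?VAL2 ORS $2:$2} END{print VAL1 ORS VAL2}' file
a
b
c
d
e
f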
You can combine bash commands with ; to get a single stream:
$ awk '{print $1}' file; awk '{print $2}' file
a
b
c
d
e
f
Use process substitution if you want that to be as if it were a single file:
$ txt=$(awk '{print $1}' file; awk '{print $2}' file)
$ echo "$txt"
a
b
c
d
e
f
Or for a Bash while loop:
$ while read -r line; do echo "line: $line"; done < <(awk '{print $1}' file; awk '{print $2}' file)
line: a
line: b
line: c
line: d
line: e
line: f
If you're using Notepad++, you could replace all tab characters with the newline sequence "\r\n" (note that this interleaves the two columns rather than stacking them).
another approach:
for i in $(seq 1 2); do
awk '{print $'$i'}' file
done
output:
a
b
c
d
e
f

How to multiply AWK output

I have a file data.csv with multiple lines that reads:
A
B
C
and I want the output of the code to be multiplied n times:
A
B
C
A
B
C
Here is an example of a line I've been trying and what it returns:
awk '{for (i=0; i<3 ;i++){ print $1}}' input.csv
A
A
A
B
B
B
C
C
C
I get the same result with cat and other tools: the loop runs once per input line, so each line is repeated in place instead of the whole file being repeated.
$ awk -v n=3 'BEGIN{ for (i=1;i<n;i++) {ARGV[ARGC]=ARGV[1]; ARGC++} } 1' file
A
B
C
A
B
C
A
B
C
Note that the above only stores the name of the file n times, not the contents of the file and so it'd work for any file of any size as it uses negligible memory.
This would do:
for i in {1..3}; do cat data.csv; done
It won't work with pipes, though.
Thanks for the comments
You can use cat and printf
cat $(printf "%0.sfile " {1..3})
Here is a single efficient 1-liner: yes data | head -3 | xargs cat
$ cat data
A
B
C
$ yes data | head -3 | xargs cat
A
B
C
A
B
C
A
B
C
$
head -3 => here 3 is n, the number of times to repeat.
Or using an awk solution:
$ cat data
A
B
C
$ awk -v n=3 '{a[NR]=$0} END {for(i=1;i<=n;i++) for(j=1;j<=NR;j++) print a[j]}' data
A
B
C
A
B
C
A
B
C
$
Try this:
seq 2 | xargs -Inone cat input.csv
Probably the shortest:
cat input.csv{,,}
Supposing you're writing a shell-script, why use awk?
for i in `seq 3`; do
cat data.csv
done
If you want to do this using pipes, e.g. with awk, you'll need to store the file data in memory or save it temporarily to disk. For example:
cat data.csv | \
awk '{a = a $0 "\n"} END { for (i=0; i<3 ;i++){ printf "%s",a; }}'
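For example, with the sample data.csv:
$ cat data.csv | awk '{a = a $0 "\n"} END { for (i=0; i<3 ;i++){ printf "%s",a; }}'
A
B
C
A
B
C
A
B
C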
for (( c=1; c<=3; c++ ))
do
cat Input_file.csv
done
With sed and the hold/pattern space:
In this given situation, with only the single letters A, B and C:
If you want to print once:
cat temp | sed 'N;N;h'
output:
[anthony@aguevara ~]$ cat temp | sed 'N;N;h'
A
B
C
[anthony@aguevara ~]$
Twice (just append a semicolon and a capital G to the end):
cat temp | sed 'N;N;h;G'
output:
[anthony@aguevara ~]$ cat temp | sed 'N;N;h;G'
A
B
C
A
B
C
[anthony@aguevara ~]$
Three times (another G):
cat temp | sed 'N;N;h;G;G'
output:
[anthony@aguevara ~]$ cat temp | sed 'N;N;h;G;G'
A
B
C
A
B
C
A
B
C
[anthony@aguevara ~]$
and so on.
The input file, for reference:
[anthony@aguevara ~]$ cat temp
A
B
C
[anthony@aguevara ~]$

tail file till last duplicate line in a file using bash

Hey everyone! How can I find, in a simple way, the line number of the last duplicate in a file?
I need to take the tail of the file after the last duplicate. Example:
hhhh
str1
str2
hhhh
str1
hhh
**str1
str2
str3**
I need only the bold part after hhh (str1, str2, str3). Thanks in advance!
Give this a try; it resets the accumulator whenever the current line has been seen before, so at the end only the lines after the last duplicate remain:
awk '{if (a[$0]) accum = nl = ""; else {a[$0]=1;accum = accum nl $0; nl = "\n"}} END { print accum}' inputfile
Given this input:
aaa
b
c
aaa
d
e
f
aaa
b
aaa
g
h
i
This is the output:
g
h
i
Taking the sample from Dennis:
$ gawk -vRS="aaa" 'END{print}' file
g
h
i
Here's another way if you don't know the duplicated line beforehand, although it's not as elegant as a single awk script.
var=$(sort file| uniq -c|sort -n | tail -1| awk '{print $2}')
gawk -vRS="$var" 'END{print}' file
Still, this will only get the duplicate that occurs most frequently; it does not get the "last duplicate", whatever that means.
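If "last duplicate" means the last line that repeats an earlier one, a two-pass awk can record its line number on the first pass and print everything after it on the second. A minimal sketch, reading the file twice:
$ awk 'NR==FNR { if (seen[$0]++) last = FNR; next } FNR > last' file file
g
h
i
The first pass (NR==FNR) bumps last to the line number of every line already seen; the second pass prints only the lines after the final one.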
