select multiple patterns with grep - bash

I have a file that looks like this:
t # 3-7, 1
v 0 104
v 1 92
v 2 95
u 0 1 2
u 0 2 2
u 1 2 2
t # 3-8, 1
v 0 94
v 1 13
v 2 19
v 3 5
u 0 1 2
u 0 2 2
u 0 3 2
t # 3-9, 1
v 0 94
v 1 13
v 2 19
v 3 7
u 0 1 2
u 0 2 2
u 0 3 2
t # corresponds to the header of each block.
I would like to extract multiple patterns from the file and output the transactions (blocks) that contain all of the required patterns together.
I tried the following code:
ps | grep -e 't\|u 0 1 2' file.txt
and it works well to extract the headers and the pattern 'u 0 1 2'. However, when I add one more pattern, the output lists only the headers that start with t #. My modified code looks like this:
ps | grep -e 't\|u 0 1 2 && u 0 2 2' file.txt
I tried sed and awk solutions, but they did not work for me either.
Thank you for your help!
Olha

Use | as the separator before the third alternative, just as you did before the second one.
grep -E 't|u 0 1 2|u 0 2 2' file.txt
Also, it doesn't make sense to specify a filename and also pipe ps to grep. If you provide filename arguments, it doesn't read from the pipe (unless you use - as a filename).
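Note that alternation only ORs the patterns: every matching line is printed, whether or not the other patterns occur in the same block. If you only want the blocks that contain all of the patterns, grep alone won't do it; here is a minimal awk sketch (an assumption for illustration: every block starts with a t # header, as in the sample, and the two wanted patterns are hard-coded):
awk '
  /^t #/ {
    if (ok1 && ok2) print block        # flush the previous block if it matched both
    block = $0; ok1 = ok2 = 0
    next
  }
  { block = block ORS $0 }             # collect the lines of the current block
  /^u 0 1 2$/ { ok1 = 1 }
  /^u 0 2 2$/ { ok2 = 1 }
  END { if (ok1 && ok2) print block }  # handle the final block
' file.txt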

You can use grep with multiple -e expressions to grep for more than one thing at a time:
$ printf '%d\n' {0..10} | grep -e '0' -e '5'
0
5
10

Expanding on @kojiro's answer, you'll want to use an array to collect arguments:
mapfile -t lines < file.txt
for line in "${lines[@]}"
do
arguments+=(-e "$line")
done
grep "${arguments[@]}"
You'll probably need a condition within the loop to check whether the line is one you want to search for, but that's it.
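For example, a sketch that only turns certain lines into patterns (the patterns.txt name and the 'u ' filter are illustrative assumptions):
arguments=()
mapfile -t lines < patterns.txt
for line in "${lines[@]}"
do
# only lines that look like 'u ...' become search patterns
[[ $line == "u "* ]] && arguments+=(-e "$line")
done
grep "${arguments[@]}" file.txt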


Add up every 5 rows in a column of integers BASH

I am writing a parser, and have to do some fancy stuff. I am trying not to use Python, but I might have to at this point.
Given STDOUT that looks like this:
1
0
2
3
0
0
1
0
0
2
0
3
0
4
0
5
0
2
.
.
.
This goes on for about 100,000 lines. What I need to do is add up every 5 rows, like so:
1 - start
0 |
2 | - 6
3 |
0 - end
0 - start
1 |
0 | - 3
0 |
2 - end
0 - start
3 |
0 | - 7
4 |
0 - end
5
0
2
.
.
.
The -, |, start, and end are all just visual representation; I only need the sums in a column list:
6
3
7
.
.
.
I currently have a method of doing this using head -n $i (with an incrementing $i) and tail -n 5 to cut 5 rows out of the list, then paste -sd+ - | bc to add up the values. But this is way too slow because there are 100,000 lines.
If anyone has anything to add I would appreciate it. Let me know if more info is needed.
Thank you
It looks like awk is a natural tool to use:
awk '{ sum += $1 } NR % 5 == 0 { print sum; sum = 0 }'
Add values in column 1 to sum. If the record number modulo 5 is 0, print the sum and reset it to 0. Note that if the last group of records is short (1-4 elements in the group), their sum is not printed. If you want the sum for the short group printed, add END { if (NR % 5 != 0) print sum } to the script.
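Putting it together, the full command with the short-group handling would be (reading from standard input, as in the question):
awk '{ sum += $1 }
     NR % 5 == 0 { print sum; sum = 0 }
     END { if (NR % 5 != 0) print sum }'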
Since this makes a single pass over the data file using a single command, it will be hard to beat it. Using Perl might be a little faster. I don't know how Python would fare against either Awk or Perl.
You can use awk for it.
Say a file named file1 contains:
1
0
2
3
0
0
1
0
0
2
0
3
0
4
0
5
0
.
.
.
Then the awk command goes like this:
awk 'BEGIN{sum=0} {sum=sum+$1; if(NR%5==0){print sum; sum=0}}' file1
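Applied to the first 15 lines of the sample, this prints the expected sums:
$ head -n 15 file1 | awk 'BEGIN{sum=0} {sum=sum+$1; if(NR%5==0){print sum; sum=0}}'
6
3
7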

How do I filter tab-separated input by the count of fields with a given value?

My data (tab-separated):
1 0 0 1 0 1 1 0 1
1 1 0 1 0 1 0 1 1
1 1 1 1 1 1 1 1 1
0 0 0 0 0 0 0 0 0
...
How can I grep the lines with exactly, for example, five '1's?
Ideal output:
1 0 0 1 0 1 1 0 1
Also, how can I grep lines with five or more (>=5) '1's?
Ideal output:
1 0 0 1 0 1 1 0 1
1 1 0 1 0 1 0 1 1
1 1 1 1 1 1 1 1 1
I tried
grep 1$'\t'1$'\t'1$'\t'1$'\t'1
however, this only matches consecutive '1's, which is not all I want.
I wonder if there is a simple method to achieve this. Thank you!
John Bollinger's helpful answer and anishane's answer show that it can be done with grep, but, as has been noted, that is quite cumbersome, given that regular expressions aren't designed for counting.
awk, by contrast, is built for field-based parsing and counting (often combined with regular expressions to identify field separators, or, as below, the fields themselves).
Assuming you have GNU awk, you can use the following:
Exactly 5 1s:
awk -v FPAT='\\<1\\>' 'NF==5' file
5 or more 1s:
awk -v FPAT='\\<1\\>' 'NF>=5' file
Special variable FPAT is a GNU awk extension that allows you to identify fields via a regex that describes the fields themselves, in contrast with the standard approach of using a regex to define the separators between fields (via special variable FS or option -F):
'\\<1\\>' identifies any "isolated" 1 (surrounded by non-word characters) as a field, based on word-boundary assertions \< and \>; the \ must be doubled here so that the initial string parsing performed by awk doesn't "eat" single \s.
Standard variable NF contains the count of input fields in the line at hand, which allows easy numerical comparison. If the conditional evaluates to true, the input line at hand is implicitly printed (in other words: NF==5 is implicitly the same as NF==5 { print } and, more verbosely, NF==5 { print $0 }).
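As a quick sanity check of what FPAT produces, you can print NF itself (GNU awk; the sample line here is an illustration):
$ echo '1 0 1 1 0 1 1' | awk -v FPAT='\\<1\\>' '{ print NF }'
5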
A POSIX-compliant awk solution is a little more complicated:
Exactly 5 1s:
awk '{ l=$0; gsub("[\t0]", "") }; length($0)==5 { print l }' file
5 or more 1s:
awk '{ l=$0; gsub("[\t0]", "") }; length($0)>=5 { print l }' file
l=$0 saves the input line ($0) in its original form in variable l.
gsub("[\t0]", "") replaces all \t and 0 chars. in the input line with the empty string, i.e., effectively removes them, and only leaves (directly concatenated) 1 instances (if any).
length($0)==5 { print l } then prints the original input line (l) only if the resulting string of 1s (i.e., the count of 1s now stored in the modified input line ($0)) matches the specified count.
You can use grep. But that would be an abuse of regex.
$ cat countme
1 0 0 1 0 1 1 0 1
1 1 0 1 0 1 0 1 1
1 1 1 1 1 1 1 1 1
0 0 0 0 0 0 0 0 0
$ grep -P '^[0\t]*(1[0\t]*){5}[0\t]*$' countme # Match exactly 5
1 0 0 1 0 1 1 0 1
$ grep -P '^[0\t]*(1[0\t]*){5,}[0\t]*$' countme # Match >=5
1 0 0 1 0 1 1 0 1
1 1 0 1 0 1 0 1 1
1 1 1 1 1 1 1 1 1
You can do this to get lines with exactly five '1's:
grep '^[^1]*\(1[^1]*\)\{5,5\}[^1]*$'
You can simplify that to this for at least five '1's:
grep '\(1[^1]*\)\{5,\}'
The enumerated quantifier (\{n,m\}) enables you to conveniently specify a particular number or range of numbers of consecutive matches to a sub-pattern. To avoid matching lines with extra matches to such a pattern, however, you must also anchor it to the beginning and end of the line.
The other trick is to make sure the gaps before the first 1, between the 1s, and after the last 1 are matched. In your case, all of those gaps can be represented pretty simply as ranges of zero or more characters other than 1: [^1]*. Putting those pieces together gives you the above regular expressions.
Do
sed -nE '/^([^1]*1[^1]*){5}$/p' your_file
for exactly 5 matches and
sed -nE '/^([^1]*1[^1]*){5,}$/p' your_file
for 5 or more matches.
Note: In GNU sed you may not see the -E option in the manpage, but it is supported. Using -E is for portability to, say, Mac OS X.
with perl
$ perl -ane 'print if (grep {$_==1} @F) == 5' ip.txt
1 0 0 1 0 1 1 0 1
$ perl -ane 'print if (grep {$_==1} @F) >= 5' ip.txt
1 0 0 1 0 1 1 0 1
1 1 0 1 0 1 0 1 1
1 1 1 1 1 1 1 1 1
-a automatically splits the input line on whitespace and saves the fields to the @F array
grep {$_==1} @F returns an array with the elements of @F that are exactly equal to 1
(grep {$_==1} @F) == 5 evaluates the array in scalar context, so the comparison is done on the number of elements of the array
See http://perldoc.perl.org/perlrun.html#Command-Switches for details on -ane options

Grep variable in for loop

I want to grep a specific line in each iteration of a for loop. I've already looked on the internet for answers to my problem and tried them, but they don't seem to work for me, and I can't find what I'm doing wrong.
Here is the code:
for n in 2 4 6 8 10 12 14 ; do
for U in 1 10 100 ; do
for L in 2 4 6 8 ; do
i=0
cat results/output_iteration/occ_"$L"_"$n"_"$U"_it"$i".dat
for k in $(seq 1 1 $L) ; do
${'var'.$k}=`grep " $k " results/output_iteration/occ_"$L"_"$n"_"$U"_it"$i".dat | tail -n 1`
done
which gives me:
%
%
% site density double occupancy
1 0.49791021 0.03866179
2 0.49891438 0.06077808
3 0.50426102 0.05718336
4 0.49891438 0.06077808
./run_deviation_functionL.sh: line 109: ${'var'.$k}=`grep " $k " results/output_iteration/occ_"$L"_"$n"_"$U"_it"$i".dat | tail -n 1`: bad substitution
Then, I would like to take only the density number, with something like:
${'density'.$k}=`echo "${'var'.$k:10:10}" | bc -l`
Does anyone know why it fails?
Use declare to create variable names from variables:
declare density$k="`...`"
Use variable indirection to retrieve them:
var=var$k
echo ${!var:10:10}
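Combining both in the original loop, a minimal sketch (occ.dat stands in for the long path in the question):
for k in $(seq 1 1 "$L") ; do
    # save the last line mentioning " $k "
    declare "var$k=$(grep " $k " occ.dat | tail -n 1)"
done
# read one back through indirection; substring offsets as in the question
name=var1
echo "${!name:10:10}"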

How to produce cartesian product in bash?

I want to produce a file like this (the Cartesian product of [1-3]x[1-5]):
1 1
1 2
1 3
1 4
1 5
2 1
2 2
2 3
2 4
2 5
3 1
3 2
3 3
3 4
3 5
I can do this using nested loops like this:
for i in $(seq 3)
do
for j in $(seq 5)
do
echo $i $j
done
done
Is there any solution without loops?
Combine two brace expansions!
$ printf "%s\n" {1..3}" "{1..5}
1 1
1 2
1 3
1 4
1 5
2 1
2 2
2 3
2 4
2 5
3 1
3 2
3 3
3 4
3 5
This works by using a single brace expansion:
$ echo {1..5}
1 2 3 4 5
and then combining with another one:
$ echo {1..5}+{a,b,c}
1+a 1+b 1+c 2+a 2+b 2+c 3+a 3+b 3+c 4+a 4+b 4+c 5+a 5+b 5+c
A shorter (but hacky) version of Rubens's answer:
join -j 999999 -o 1.1,2.1 file1 file2
Since field 999999 almost certainly does not exist, it is considered equal for both sets, and therefore join has to produce the Cartesian product. It uses O(N+M) memory and produces output at 100-200 MB/sec on my machine.
I don't like the "shell brace expansion" method like echo {1..100}x{1..100} for large datasets because it uses O(N*M) memory and can, when used carelessly, bring your machine to its knees. It is hard to stop because Ctrl+C does not interrupt brace expansion, which is done by the shell itself.
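For the [1-3]x[1-5] product from the question, the same trick can be driven with process substitution (GNU join; output truncated here):
$ join -j 999999 -o 1.1,2.1 <(seq 3) <(seq 5) | head -n 6
1 1
1 2
1 3
1 4
1 5
2 1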
The best alternative for a Cartesian product in bash is surely -- as pointed out by @fedorqui -- to use brace expansion. However, in case your input is not easily producible (i.e., if {1..3} and {1..5} do not suffice), you can simply use join.
For example, if you want to perform the Cartesian product of two regular files, say "a.txt" and "b.txt", you can do the following. First, the two files:
$ echo -en {a..c}"\tx\n" | sed 's/^/1\t/' > a.txt
$ cat a.txt
1 a x
1 b x
1 c x
$ echo -en "foo\nbar\n" | sed 's/^/1\t/' > b.txt
$ cat b.txt
1 foo
1 bar
Notice the sed command is used to prepend each line with an identifier. The identifier must be the same for all lines and for all files, so that the join gives you the full Cartesian product instead of discarding some of the resulting lines. So, the join goes as follows:
$ join -j 1 -t $'\t' a.txt b.txt | cut -d $'\t' -f 2-
a x foo
a x bar
b x foo
b x bar
c x foo
c x bar
After both files are joined, cut is used as an alternative to remove the column of "1"s formerly prepended.

Add to the end of a predetermined line using sed in bash

I have a file in the format:
C 1 1 2
H 2 2 1
C 3 1 2
C 3 3 2
H 2 3 1
I need to add " f" to the end of specific lines, for example the third line, so the output would be:
C 1 1 2
H 2 2 1
C 3 1 2 f
C 3 3 2
H 2 3 1
From Googling, it seems that I need to use sed, but I couldn't find any examples of how to do specifically what I want.
Thanks in advance.
You are looking for sed's ability to restrict a command to a specific line number. An example:
sed '3 s/$/ f/' < yourFile
awk 'NR==3{$0=$0" f"}1' your_file
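If the file should be modified in place and the line number chosen at run time, GNU sed's -i option combines with a shell variable (a sketch; line and file are illustrative names, and BSD/macOS sed needs -i '' instead):
line=3
sed -i "${line} s/$/ f/" file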
