Remove lines containing one pattern but excluding another pattern in bash

I have a file
$ cat File
ce5 xxx 123
ed9 myself,yyy,fail? -
f27 xxx,fail? 145
105 yyy,fail? -
I want to remove all the lines containing the string ",fail?" but not "myself" in bash.
Expected output
$ cat File
ce5 xxx 123
ed9 myself,yyy,fail? -
I can grep the lines, but I'm not sure how to remove them:
cat File | grep -v "myself" | grep ",fail?"
f27 xxx,fail? 145
105 yyy,fail? -

I think you can't do such things (easily) with grep alone.
Print myself and don't print ,fail? with sed:
sed '/myself/n; /,fail?/d' File
The n command prints the current line and reads the next one, so a line containing myself is printed before the d command can delete it. Note that in sed's basic regular expressions a plain ? is literal and needs no escaping.
With awk:
awk '! /,fail\?/ || /myself/' File
(Here the ? does need escaping, because awk uses extended regular expressions, where an unescaped ? is a quantifier.)
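If your grep was built with PCRE support (GNU grep's -P flag), a negative lookahead makes it possible in a single grep after all; a minimal sketch:
grep -vP '^(?!.*myself).*,fail\?' File
The lookahead rejects any line containing myself, so -v keeps those lines along with everything that lacks ,fail?.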


Get the context around a line number in Bash from compressed file

I know that it's possible to search a line number (say, line # 139656504) and return the context around it:
grep -n 139656504 -B 5 -A 5 file.txt
But when I used that on a compressed file it returned nothing:
zgrep -n 139656504 -B 5 -A 5 file.txt.gz
I'm on macOS Mojave.
Is there a way to get the context around that line with compressed files?
I made a test file (test.txt) that contains:
test1
test2
test3
test4
test5
test6
test7
test8
test9
testBB
test11
test12
test13
test14
test15
test16
test17
test17
test18
test19
test20
Compress the file:
pigz test.txt
Now if I run:
zgrep -n 10 -C 5 test.txt.gz
It gives me nothing... (I made sure that line 10 would not contain the number 10; otherwise zgrep "searches" for a 10, not for line # 10.)
If line 10 had been test10 instead of testBB, it would have worked. But this is not what I'm expecting.
If you want to print the content of a specific line number, you can use awk:
awk -v nr=10 'FNR==nr' file
or with line number prefix
awk -v nr=10 'FNR==nr{ print FNR":"$0 }' file
or with 5 context lines
awk -v nr=10 'FNR>=nr-5 && FNR<=nr+5{ print FNR":"$0 }' file
or
zcat file.gz | awk ...
for gzipped files.
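For the gzipped test file above, the pieces combine like this (gunzip -c is used here because the stock zcat on macOS can be fussy about the .gz suffix):
gunzip -c test.txt.gz | awk -v nr=10 'FNR>=nr-5 && FNR<=nr+5{ print FNR":"$0 }'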
Try this:
zgrep -n 139656504 -C 5 file.txt.gz
Notes:
-n means show line numbers in the output
-C 5 means show 5 lines before and 5 lines after each line that satisfies the expression.
139656504 is the pattern you are searching for.
Do man grep or man zgrep; it looks like all the command-line switches work for either.
If all you need is to see surrounding lines of a specific line, you can do something like this:
gunzip -c file.txt.gz | sed -n "139656499,139656509p"
This means: gunzip the file to stdout so you can pipe it, then pipe it into sed, which is an amazing utility; here sed has been told to display the lines from 5 before to 5 after line number 139656504.
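On a file this large it is worth letting sed quit as soon as the range has been printed, so the rest of the stream is never scanned; a small variant of the same command:
gunzip -c file.txt.gz | sed -n '139656499,139656509p;139656509q'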

Extracting multiple lines of data between two delimiters

I have a log file containing multiple lines of data. I need to extract the text between the delimiters on each line and save it to the output file.
input.log
Some data
<delim_begin>ABC<delim_end>
some data
<delim_begin>DEF<delim_end>
some data
The output.log file should look like
ABC
DEF
I tried this code but it does not work; it prints all the content of input.log:
sed 's/<delim_begin>\(.*\)<delim_end>/\1/g' input.log > output.log
Using awk you can do it using custom field separator:
awk -F '<(delim_begin|delim_end)>' 'NF>2{print $2}' file
ABC
DEF
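This works because the field-separator regex splits a matching line into three fields (an empty one before <delim_begin>, the payload, and an empty one after <delim_end>), so NF is 3; lines without the delimiters never reach NF>2 and are skipped.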
Using grep -P (PCRE):
grep -oP '(?<=<delim_begin>).*(?=<delim_end>)' file
ABC
DEF
A sed alternative (your attempt printed everything because, without -n and the p flag, sed prints every line whether or not the substitution matched):
$ sed -nr 's/<delim_begin>(.*)<delim_end>/\1/p' file
ABC
DEF
This should do it, though it will print an empty line for every input line that lacks the delimiters:
awk -F '<(delim_begin|delim_end)>' '{print $2}' file
You can use this command -
grep "<delim_begin>.*<delim_end>" file | sed 's/<delim_begin>//;s/<delim_end>//' > output.log
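If you'd rather avoid external tools entirely, here is a minimal pure-bash sketch using parameter expansion (it assumes at most one delimited field per line, as in the sample):
while IFS= read -r line; do
    # skip lines that don't contain both delimiters
    [[ $line == *'<delim_begin>'*'<delim_end>'* ]] || continue
    line=${line#*<delim_begin>}            # strip everything up to and including <delim_begin>
    printf '%s\n' "${line%<delim_end>*}"   # strip <delim_end> and everything after it
done < input.log > output.log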

Extract lines from a file in bash

I have a file like this
I would like to extract the lines consisting of 0s and 1s (all such lines in the file) into a separate file. The sequence does not have to start with a 0; it could also start with a 1. However, such a line always comes directly after the line starting with SITE:. Moreover, I would like to extract the SITE: line itself into a separate file. Could somebody tell me how that is doable in bash?
Moreover, I would like to extract the SITE: line itself into a separate file.
That’s the easy part:
grep '^SITE:' infile > outfile.site
Extracting the line after that is slightly harder:
grep --after-context=1 '^SITE:' infile \
| grep '^[01]*$' \
> outfile.nr
--after-context (or -A) specifies how many lines after the matching line to print as well. We then use the second grep to print only that line, and not the actually matching line (nor the delimiter which grep puts between each matching entry when specifying an after-context).
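Recent GNU grep also accepts --no-group-separator, which suppresses those -- delimiter lines directly, leaving the second grep only the SITE: lines to filter out.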
Alternatively, you could use the following to match the numeric lines:
grep '^[01]*$' infile > outfile.nr
That’s much easier, but it will find all lines consisting solely of 0s and 1s, regardless of whether they come after a line which starts with SITE:.
You could try something like :
$ egrep -o "^(0|1)+$" test.txt > test2.txt
$ cat test2.txt
0000000000001010000000000000010000000000000000000100000000000010000000000000000000000000000000000000
0000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000
0011010000000000001010000000000000001000010001000000001001001000011000000000000000101000101010101000
$ grep "^SITE:" test.txt > test3.txt
$ cat test3.txt
SITE: 0 0.000340988542 0.0357651018
SITE: 1 0.000529755514 0.00324293642
SITE: 2 0.000577745511 0.052214098
Another solution, using bash :
$ while read; do [[ $REPLY =~ ^(0|1)+$ ]] && echo "$REPLY"; done < test.txt > test2.txt
$ cat test2.txt
0000000000001010000000000000010000000000000000000100000000000010000000000000000000000000000000000000
0000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000
0011010000000000001010000000000000001000010001000000001001001000011000000000000000101000101010101000
To remove the 0 characters at the beginning of each line:
$ egrep "^(0|1)+$" test.txt | sed "s/^0\{1,\}//g" > test2.txt
$ cat test2.txt
1010000000000000010000000000000000000100000000000010000000000000000000000000000000000000
1000000000000000000000000000000000000000000000000000000000
11010000000000001010000000000000001000010001000000001001001000011000000000000000101000101010101000
UPDATE: a new file format was provided in the comments:
$ egrep "^SITE:" test.txt|egrep -o "(0|1)+$"|sed "s/^0\{1,\}//g" > test2.txt
$ cat test2.txt
100000000000000000000001000001000000000000000000000000000000000000
1010010010000000000111101000010000001001010111111100000000000010010001101010100011101011110011100
10000000000
$ egrep "^SITE:" test.txt|sed "s/[01\ ]\{1,\}$//g" > test3.txt
$ cat test3.txt
SITE: 967 0.189021866 0.0169990123
SITE: 968 0.189149593 0.246619149
SITE: 969 0.189172266 6.84752689e-05
Here's a simple awk solution that matches all lines starting with SITE: and outputs the respective next line:
awk '/^SITE:/ { if (getline) print }' infile > outfile
Simply omit the { ... } block part to extract all lines starting with SITE: themselves to a separate file:
awk '/^SITE:/' infile > outfile
If you wanted to combine both operations:
outfile1 and outfile2 are the names of the 2 output files, passed to awk as variables f1 and f2:
awk -v f1=outfile1 -v f2=outfile2 \
'/^SITE:/ { print > f1; if (getline) print > f2 }' infile
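A sed equivalent of the first command, as a sketch (it prints only the line following each SITE: line):
sed -n '/^SITE:/{n;p;}' infile > outfile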

Need to replace a character in file2 with string in file1

Here is what I am trying to do.
File1:
abc
bcd
cde
def
efg
fgh
ghi
File2:
ip:/vol0/scratch/&
ip:/vol0/sysbuild/
ip:/vol0/cde
ip:/vol0/mnt/cm/&
ip:/vol0/&
ip:/vol0/mnt/fgh
ip:/vol0/mnt/&
As you can see, File2 has & at the end of some lines. I need to replace each & with the corresponding line in File1 and ignore the lines without the &. For example, if lines 2 and 3 don't have &, the script would skip lines 2 and 3 in both files and go to line 4 to replace the &.
How would I achieve this with a shell script?
Using paste and awk:
$ paste file2 file1 | awk 'sub(/&\s+/,"")'
ip:/vol0/scratch/abc
ip:/vol0/mnt/cm/def
ip:/vol0/efg
ip:/vol0/mnt/ghi
Wasn't 100% clear if you wanted the lines not ending in & in the output:
$ paste file2 file1 | awk '{sub(/&\s+/,"");print $1}'
ip:/vol0/scratch/abc
ip:/vol0/sysbuild/
ip:/vol0/cde
ip:/vol0/mnt/cm/def
ip:/vol0/efg
ip:/vol0/mnt/fgh
ip:/vol0/mnt/ghi
With sed:
$ paste file2 file1 | sed -rn '/&/s/&\s+//p'
ip:/vol0/scratch/abc
ip:/vol0/mnt/cm/def
ip:/vol0/efg
ip:/vol0/mnt/ghi
awk 'NR==FNR{a[NR]=$0;next} sub(/&/,a[FNR])' file1 file2
paste file1 file2 | awk 'gsub( /&/, $1 )' | cut -f2-
Try this:
awk '{if (NR == FNR){f[NR]= $0;}else {gsub("&",f[FNR],$0); print $0}}' file1.txt file2.txt
This might work for you (GNU sed):
sed = file1 | sed -r 'N;s/(.*)\n(.*)/\1s|\&$|\2|/' | sed -f - file2
sed = file1 generate line numbers
sed -r 'N;s/(.*)\n(.*)/\1s|\&$|\2|/' combine line number with data line and produce a sed substitution command using the line number as an address.
sed -f - file2 feed the above commands into a sed invocation using the -f switch and the standard input -
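If you'd rather stay in plain bash, here is a minimal sketch reading both files in lockstep (it assumes file1 and file2 have the same number of lines, as in the example):
while IFS= read -u 3 -r repl && IFS= read -u 4 -r line; do
    # only lines ending in & get a replacement; other lines are skipped
    [[ $line == *'&' ]] && printf '%s\n' "${line%&}$repl"
done 3<file1 4<file2 > output.txt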

bash process data from two files

file1:
456
445
2323
file2:
433
456
323
I want to get the difference between the data in the two files (file1 minus file2) and output it to output.txt, that is:
23
-11
2000
How can I do this? Thank you.
$ paste file1 file2 | awk '{ print $1 - $2 }'
23
-11
2000
Use paste to create the formulae, and use bc to perform the calculations:
paste -d - file1 file2 | bc
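On the sample data, paste -d - file1 file2 produces the lines 456-433, 445-456 and 2323-323, and bc then evaluates one expression per line.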
In pure bash, with no external tools:
while read -u 4 line1 && read -u 5 line2; do
printf '%s\n' "$(( line1 - line2 ))"
done 4<file1 5<file2
This works by opening both files (attaching them to file descriptors 4 and 5), then looping: each iteration reads one line from each descriptor (exiting the loop when either runs out), then calculates and prints the difference.
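An awk-only sketch in the same spirit as the two-file awk answers elsewhere on this page, reading file1 into an array first and subtracting while scanning file2:
awk 'NR==FNR{a[NR]=$0;next}{print a[FNR]-$0}' file1 file2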
You could use paste and awk to operate between columns:
paste -d" " file1 file2 | awk -F" " '{print ($1-$2)}'
Or even pipe to a file:
paste -d" " file1 file2 | awk -F" " '{print ($1-$2)}' > output.txt
Hope it helps!
