Finding a pattern from a file in a directory of files - shell

I have a first file with 3 lines:
test1
test2
test3
I use the grep command to search for each of these lines in a directory with 10 files:
grep -Ril "test2"
The result is:
/usr/src/files/rog.txt
I need grep to delete 5 lines from the found file: the test2 line itself, the 2 lines before it, and the 2 lines after it.
Please can you help me use grep correctly?

One way is to use the -A and -B options of grep, but this takes two steps.
First you select all matches together with the two preceding and two following lines; then you use that list to filter the original file. This has a side effect (which in your application is probably acceptable).
To do this, you issue the following commands:
grep -A 2 -B 2 "test2" file1.txt > negative.txt
grep -v -f negative.txt file1.txt
The first command outputs all occurrences of test2 in file1.txt, each accompanied by the 2 preceding and 2 succeeding lines. If I understood your question correctly, this is the "negative" of the lines you want. The second command then lists all lines from file1.txt which do not correspond to a "negative" line. This should be close to what you need.
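If you prefer a single command, the two steps can be combined with process substitution. This is only a sketch, assuming GNU grep and bash: -F treats the negative lines as fixed strings, -x requires whole-line matches, and --no-group-separator keeps the "--" separators that grep -A/-B prints between match groups out of the pattern list:
grep -vFxf <(grep --no-group-separator -A2 -B2 "test2" file1.txt) file1.txt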
There is one side effect you should know about. If file1.txt contains duplicate lines like this:
test1
test2
test3
test4
...
test11
test12
test3
test4
The code above would also filter out the last two lines, even though there is no "test2" line nearby, because they are duplicates of lines 3 and 4, which were written to negative.txt on account of line 2. But if you're processing file lists, duplicates are probably no issue.
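Since you located the file with grep -Ril, you could also apply the filter to every matching file in one go. Again only a sketch, assuming GNU grep and bash (-Z and read -d '' handle file names containing spaces), and the .filtered suffix is just an illustrative output name:
grep -RilZ "test2" . | while IFS= read -r -d '' f; do
grep -vFxf <(grep --no-group-separator -A2 -B2 "test2" "$f") "$f" > "$f.filtered"
done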


Is it possible to work with 'for loop grep' commands?

I have lots of files in per-year directories, and each file contains long lines like this, for example:
home/2001/2001ab.txt
the AAAS kill every one not but me and you and etc
the A1CF maybe color of full fill zombie
home/2002/2002ab.txt
we maybe know some how what
home/2003/2003ab.txt
Mr, Miss boston, whatever
aaas will will will long long
In the home directory I have home/reference.txt (a list of words):
A1BG
A1CF
A2M
AAAS
I'd like to count how many of the words in reference.txt appear in every single year file.
This is my code, which I run in each year directory
(home/2001/, home/2002/, home/2003/):
# awk
function search () {
    awk -v pattern="$1" '$0 ~ pattern {print}' *.txt > "$1"
}
# load custom.txt
for i in $(cat reference.txt)
do
search $i
done
# word count
wc -l * > line-count.txt
This is my result:
home/2001/A1BG
$ cat A1BG
0
home/2001/A1CF
$ cat A1CF
1
home/2001/A2M
$ cat A2M
0
home/2001/AAAS
$ cat AAAS
1
home/2001/line-count.txt
$ cat line-count.txt
2001ab.txt 2
A1BG 0
A1CF 1
A2M 0
AAAS 1
The result file line-count.txt has all the information I want,
but I have to repeat this work manually:
cd into a directory,
run my code,
then cd into the next directory.
I have around 500 directories and files, so it is not easy.
The second problem is the wasteful bunch of files:
it creates lots of files and takes too much time.
Because of this, at first I'd like to use the grep command,
but I don't know how to use a list of words from a file instead of a single word;
that is why I use awk.
How can I do it more simply?
at first I'd like to use the grep command but I don't know how to use a
list of words from a file instead of a single word
You might use the --file=FILE option for that purpose; the given file should hold one pattern per line.
How can I do it more simply?
You might use the --count option to avoid the need for wc -l. Consider the following simple example: let file.txt contain
123
456
789
and file1.txt contain
abc123
def456
and file2.txt contain
ghi789
xyz000
and file3.txt contain
xyz000
xyz000
then
grep --count --file=file.txt file1.txt file2.txt file3.txt
gives output
file1.txt:2
file2.txt:1
file3.txt:0
Observe that no files are created, and that a file without matches still appears in the output. Disclaimer: this solution assumes file.txt does not contain characters with special meaning for GNU grep; if that does not hold, do not use this solution.
(tested in GNU grep 3.4)
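The same --count approach extends across all 500 year directories in one command, so there is no need to cd into each one. A minimal sketch, assuming GNU grep and the home/<year>/<year>ab.txt layout shown above; -H (--with-filename) keeps the file name in the output even when the glob matches a single file, and --fixed-strings sidesteps the special-character disclaimer:
grep -H --count --fixed-strings --file=home/reference.txt home/*/*.txt > home/line-count.txt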

How to remove n lines in a log file only after the first match of the pattern

I have a log file which contains several repeats of the pattern Fre --. I need to remove only the first occurrence of this pattern and the next 20 lines after it, and keep the other matches intact. I need to do it in a bash terminal, preferably using sed, or else awk or perl. I would highly appreciate your help.
I tried
sed -e '/Fre --/,+20d' log.log
but it deletes every occurrence of the pattern and the next 20 lines after each. I want only the first occurrence to be removed.
There is a more or less similar question with some answers here: How to remove only the first occurrence of a line in a file using sed, but I don't know how to change it to remove the 20 lines after the first match.
Pretty sure someone will find a nice sed command, but I know awk better.
You can try:
awk '/Fre --/ && !found++{counter=21}--counter<0' log.log
Explanations :
/Fre --/ -> if it finds pattern Fre --
&& !found++ -> and if it hasn't been found before
{counter=21} -> it sets the counter to 21 (because you want to remove the matching line plus the next 20)
--counter<0 -> decrements the counter and prints the line only if the counter is below 0
As mentioned by @Sundeep, @EdMorton's solution is safer on very big files, because it stops decrementing the counter once it reaches zero instead of decrementing it on every subsequent line:
awk '/Fre --/ && !found++{counter=21}!(counter&&counter--)' log.log
NOTE
If you want the deletions to be saved into the original file, you will have to redirect the output of the awk command into a temp file and then move the temp file over the original file. Always be careful before editing the original file, since you may lose precious information.
Run the first command first:
awk '/Fre --/ && !found++{counter=21}!(counter&&counter--)' log.log > log.log.tmp
Then check the .tmp file, and if it looks OK, run the second command to apply the changes:
mv log.log.tmp log.log
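Alternatively, if GNU awk 4.1 or later happens to be available (an assumption; plain awk does not support this), its inplace extension edits the file directly and skips the temp-file step:
gawk -i inplace '/Fre --/ && !found++{counter=21}!(counter&&counter--)' log.log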
$ seq 20 | awk '!f && /3/{c=4; f=1} !(c&&c--)'
1
2
7
8
9
10
11
12
13
14
15
16
17
18
19
20
See Printing with sed or awk a line following a matching pattern

Use grep only on specific columns in many files?

Basically, I have one file with patterns, and I want every line of it to be searched for in all the text files in a certain directory. I also only want exact matches. The many files are zipped.
However, I have one more condition: the first two columns of a line in the pattern file must match the first two columns of a line in any given text file that is searched. If they match, the output I want is the pattern (the entire line) followed by the names of all the text files a match was found in, each with its entire matching line (not just the first two columns).
An output such as:
pattern1
file23:"text from entire line in file 23 here"
file37:"text from entire line in file 37 here"
file156:"text from entire line in file 156 here"
pattern2
file12:"text from entire line in file 12 here"
file67:"text from entire line in file 67 here"
file200:"text from entire line in file 200 here"
I know that grep can take a pattern file as input, but the problem is that it searches for every pattern from that file in a given text file before moving on to the next file, which makes the output above difficult to produce. So I thought it would be better to loop through each line of the pattern file, print the line, and then search for it in the many files, checking whether the first two columns match.
I thought about this:
cat pattern_file.txt | while read line
do
echo $line >> output.txt
zgrep -w -l $line many_files/*txt >> output.txt
done
But with this code, the search is not restricted to the first two columns. Is there a way to specify the first two columns, both for the pattern line and for the lines that grep searches through?
What is the best way to do this? Would something other than grep, like awk, be better to use? There were other questions like this, but none that used columns for both the search pattern and the searched file.
A few lines from the pattern file:
1 5390182 . A C 40.0 PASS DP=21164;EFF=missense_variant(MODERATE|MISSENSE|Aag/Cag|p.Lys22Gln/c.64A>C|359|AT1G15670|protein_coding|CODING|AT1G15670.1|1|1)
1 5390200 . G T 40.0 PASS DP=21237;EFF=missense_variant(MODERATE|MISSENSE|Gcc/Tcc|p.Ala28Ser/c.82G>T|359|AT1G15670|protein_coding|CODING|AT1G15670.1|1|1)
1 5390228 . A C 40.0 PASS DP=21317;EFF=missense_variant(MODERATE|MISSENSE|gAa/gCa|p.Glu37Ala/c.110A>C|359|AT1G15670|protein_coding|CODING|AT1G15670.1|1|1)
A few lines from one of the searched files:
1 10699576 . G A 36 PASS DP=4 GT:GQ:DP 1|1:36:4
1 10699790 . T C 40 PASS DP=6 GT:GQ:DP 1|1:40:6
1 10699808 . G A 40 PASS DP=7 GT:GQ:DP 1|1:40:7
Both are in reality much larger.
It sounds like this might be what you want:
awk 'NR==FNR{a[$1,$2]; next} ($1,$2) in a' patternfile anyfile
If it's not, then update your question to provide a clear, simple statement of your requirements and concise, testable sample input and expected output that demonstrates your problem and that we could test a potential solution against.
If anyfile is actually a zip file, then you'd do something like:
zcat anyfile | awk 'NR==FNR{a[$1,$2]; next} ($1,$2) in a' patternfile -
Replace zcat with whatever command you use to produce text from your zip file if that's not what you use.
Per the question in the comments, if both input files are compressed and your shell supports it (e.g. bash) you could do:
awk 'NR==FNR{a[$1,$2]; next} ($1,$2) in a' <(zcat patternfile) <(zcat anyfile)
otherwise just uncompress patternfile to a tmp file first and use that in the awk command.
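For reference, here is the same awk command restated with comments (no behavior change), in case the idiom is unfamiliar:
awk '
NR==FNR { a[$1,$2]; next } # first file (patternfile): remember each col1,col2 pair
($1,$2) in a               # remaining input: print lines whose first two columns were seen
' patternfile anyfile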
Use read to parse the pattern file's columns, and add an anchor to the zgrep pattern:
while read -r column1 column2 rest_of_the_line
do
echo "$column1 $column2 $rest_of_the_line"
zgrep -w -l "^$column1\s*$column2" many_files/*txt
done < pattern_file.txt >> output.txt
read is able to parse a line into multiple variables passed as parameters, the last of which receives the rest of the line. It splits fields around the characters of $IFS, the Internal Field Separator (by default tab, space and newline; it can be overridden for the read command by using while IFS='...' read ...).
Using -r avoids unwanted escape processing and makes the parsing more reliable, and while ... do ... done < file performs a bit better since it avoids a useless use of cat. Since the output of all the commands inside the while loop is redirected, I also put the redirection on the while rather than on each individual command.
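To also produce the grouped output shown in the question (each pattern line followed by file:matching-line pairs), you could drop -l so zgrep prints the matching lines rather than just the file names, and require whitespace between the two anchored columns. A sketch under the same assumptions (GNU grep behind zgrep, hence the \+; file names as in the question):
while read -r column1 column2 rest_of_the_line
do
echo "$column1 $column2 $rest_of_the_line"
zgrep "^$column1[[:space:]]\+$column2[[:space:]]" many_files/*txt
done < pattern_file.txt >> output.txt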

Read a line from file, replace line to another file

I am new to Linux scripting.
I need to take the lines from one file and, one by one, write each into a specific line of another file.
Example:
File1.txt:
line1
line2
line3
File2.txt:
abc
abc
xxx
I need to first write "line1" in place of the 3rd line of File2.txt, then do some operations with this file, then write "line2" in place of the 3rd line of File2.txt, and so on.
At the moment this is what I have:
for n in {1..5}
do
a=$(sed '24!d' File1.txt) # read line 24
echo $a
sed -i '1s/.*/a/' File2.txt
done
Now, instead of 24 on line 3, I would like to use the variable n from the loop. Is that possible?
The same applies on line 5, where "a" is supposed to be a variable, but the program replaces the first line of File2.txt with the literal string "a".
Can I use these functions, or do I need other ones (and if so, which)?
Try:
sed '2r t2' t1
If you want to perform any operation on file 2 first, you can simply use
sed 2r<(cat t2) t1 ## you can change the command (cat t2) as per your need
I think this will solve your problem.
root@ubuntu:~/T/e/s/t# cat t1
test
asdf
xyza
root@ubuntu:~/T/e/s/t# cat t2
sample line 1
sample line 2
sample line 3
sample line 4
root@ubuntu:~/T/e/s/t# sed 2r<(cat t2) t1
test
asdf
sample line 1
sample line 2
sample line 3
sample line 4
xyza
Details
2: Second line
r: read from the file
If you want to commit this operation to the file, you can use sed's -i option. Refer to man sed for more details.
Thanks to Benjamin W. for the missing scenario (sed '2r t2' t1).
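For the variable substitution the question actually asks about, here is a minimal sketch: double quotes let the shell expand $n and $a inside the sed expressions. It assumes GNU sed for -i, and that the lines of File1.txt contain no characters special to sed (such as / or &):
for n in {1..3}
do
a=$(sed -n "${n}p" File1.txt) # read line n of File1.txt
echo "$a"
sed -i "3s/.*/$a/" File2.txt # replace the 3rd line of File2.txt with it
# ... do some operations with File2.txt here ...
done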

Unix : Head , Tail , Middle of all files recursively

Requirement:
There might be multiple files within a folder. For each file, I want to retrieve the top 10, bottom 10 and middle 10 lines and dump them into one file.
Example:
Input files: APPLE.TXT, ORANGE.TXT, BANANA.TXT
Output file: Final.TXT, which will contain the top 10, bottom 10 and middle 10 lines of each file above.
Final.TXT will have:
Apple.txt
ABC
CDE
EFG
ORANGE.TXT
DEF
GEH
IJK
etc.
Thanks for your help.
Here are a few pointers to get you started:
Use head to get the first ten lines:
head -10 file
To append the output of the command to a file, use >>, e.g. head -10 file >> output
Use tail to get the last ten lines:
tail -10 file
Use sed to get the middle ten lines. You need to work out the line numbers first, as shown below:
total=$(wc -l < file)
middle=$((total/2))
start=$((middle-4))
end=$((middle+5))
sed -n ${start},${end}p file
Of course, you should first check that your file has at least ten lines.
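Putting these pointers together, a minimal sketch (assuming bash, and that every *.TXT file has at least ten lines so all three ranges exist):
for f in *.TXT
do
echo "$f" # file name header, as in the Final.TXT example
head -10 "$f" # top 10 lines
tail -10 "$f" # bottom 10 lines
total=$(wc -l < "$f")
start=$((total/2 - 4))
end=$((total/2 + 5))
sed -n "${start},${end}p" "$f" # middle 10 lines
done > Final.TXT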
