Unix: Perform grep command for every line from inputfile and save output to file [duplicate] - bash

This question already has answers here:
How to make "grep" read patterns from a file?
(2 answers)
Closed 6 years ago.
I'm trying to write a small Unix script that searches one file for the lines that contain certain terms, where the terms are listed in a different file. I would like to save the output of this command to a new file.
I have a file containing terms (terms.txt) which has a term on each line:
term1
term2
term3
term4
For each of these terms, I want to find the line that contains this term in another file (scores.txt) and append the output of this to a new file (output.txt).
The script I have come up with thus far:
#!/bin/bash
for f in `cat terms.txt`;
do
grep -i $f scores.txt >> output.txt;
done
Somehow this does not seem to work properly.
Running just the grep command with the term hard coded does indeed give me the right line I'm searching for:
grep -i "term1" scores.txt
Also, a simple echo does give me the right terms:
for f in `cat terms.txt`; do echo $f; done
However, when I combine the two and use the $f variable to run the same grep command for every term in my terms.txt, it does not work.
Could someone help me out on this one?

Can you try:
grep -if terms.txt scores.txt > output.txt
Basically, grep's -f option treats each line of terms.txt as a pattern to search for in scores.txt.
If your terms.txt has CRLF line endings, try this:
grep -if <(tr -d '\r' < terms.txt) scores.txt > output.txt
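If you are not sure whether terms.txt has CRLF endings, a quick way to check (a sketch, assuming the file utility and GNU coreutils are available):
file terms.txt     # reports "with CRLF line terminators" for DOS-style files
cat -A terms.txt   # CRLF lines end in ^M$ instead of just $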

Related

How to read strings from a text file and use them in grep?

I have a file of strings that I need to search for in another file. So I've tried the following code:
#!/bin/bash
while read name; do
#echo $name
grep "$name" file.txt > results.txt
done < missing.txt
The echo line confirms the file is being read into the variable, but my results file is always empty. Running the grep command on its own works; I'm obviously missing something very basic here, but I have been stuck for a while and can't figure it out.
I've also tried it without quotes around the variable. Can someone tell me what I'm missing? Thanks a bunch.
Edit: the input file was in DOS format; after setting the file format to unix it works fine now.
Use grep's -f option; then you only need a single grep call and no loop.
grep -f missing.txt file.txt > results.txt
If the contents of "missing.txt" are fixed strings, not regular expressions, this will speed up the process:
grep -F -f missing.txt file.txt > results.txt
And if you want to match the entries of missing.txt as whole words in the other file, not as parts of words:
grep -F -w -f missing.txt file.txt > results.txt
My first guess is that you are overwriting your results.txt file on every iteration of the while loop (the single > truncates it each time). If that is the case, you should at least see the result for the very last line of missing.txt. It should then suffice to do something like:
#!/bin/bash
while read name; do
#echo "$name"
grep "$name" file.txt
done < missing.txt > results.txt
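An alternative sketch of the same fix keeps the redirection inside the loop but appends with >> instead of truncating with >, at the cost of reopening results.txt once per line:
#!/bin/bash
: > results.txt                 # start with an empty results file
while IFS= read -r name; do
grep "$name" file.txt >> results.txt
done < missing.txt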

Deleting first n rows and column x from multiple files using Bash script

I am aware that the "deleting n rows" and "deleting column x" questions have both been answered individually before. My current problem is that I'm writing my first bash script, and am having trouble making that script work the way I want it to.
file0001.csv (there are several hundred files like these in one folder)
Data number of lines 540
No.,Profile,Unit
1,1027.84,µm
2,1027.92,µm
3,1028,µm
4,1028.81,µm
Desired output
1,1027.84
2,1027.92
3,1028
4,1028.81
I am able to use sed and cut individually but for some reason the following bash script doesn't take cut into account. It also gives me an error "sed: can't read ls: No such file or directory", yet sed is successful and the output is saved to the original files.
sem2csv.sh
for files in 'ls *.csv' #list of all .csv files
do
sed '1,2d' -i $files | cut -f '1-2' -d ','
done
Actual output:
1,1027.84,µm
2,1027.92,µm
3,1028,µm
4,1028.81,µm
I know there may be awk one-liners but I would really like to understand why this particular bash script isn't running as intended. What am I missing?
The -i option of sed modifies the file in place. Your pipeline to cut receives no input because sed -i produces no output. Without this option, sed would write the results to standard output, instead of back to the file, and then your pipeline would work; but then you would have to take care of writing the results back to the original file yourself.
Moreover, single quotes inhibit expansion -- you are "looping" over the single literal string ls *.csv. The fact that you are not quoting it properly then causes the string to be subject to wildcard expansion inside the loop. So after variable interpolation, your sed command expands to
sed -i 1,2d ls *.csv
and then the shell expands *.csv because it is not quoted. (You should have been receiving a warning that there is no file named ls in the current directory, too.) You probably attempted to copy an example which used backticks (ASCII 96) instead of single quotes (ASCII 39) -- the difference is quite significant.
Anyway, the ls is useless -- the proper idiom is
for files in *.csv; do
sed '1,2d' "$files" ... # the double quotes here are important
done
Mixing sed and cut is usually not a good idea because you can express anything cut can do in terms of a simple sed script. So your entire script could be
for f in *.csv; do
sed -i -e '1,2d' -e 's/,[^,]*$//' "$f"
done
which says to remove the last comma and everything after it. (If your sed does not like multiple -e options, try with a semicolon separator: sed -i '1,2d;s/,[^,]*$//' "$f")
You may use awk:
$ awk 'NR>2{sub(/,[^,]*$/,"",$0);print}' file
1,1027.84
2,1027.92
3,1028
4,1028.81
or
sed -i '1,2d;s/,[^,]*$//' file
1,2d deletes the first two lines.
s/,[^,]*$// removes the last comma and everything after it on the remaining lines.
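To apply the same transformation to every CSV in the folder, a sketch along the lines of the earlier loop (awk has no portable in-place flag, so write to a temporary file and move it back; GNU awk 4.1+ also offers -i inplace):
for f in *.csv; do
awk 'NR>2{sub(/,[^,]*$/,""); print}' "$f" > "$f.tmp" && mv "$f.tmp" "$f"
done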

Loop through text file and execute If Then statement for each line with bash

I have a command that lists the full 8 level deep path of all folders we are backing up.
I also have a command that enumerates all 8 level deep folders on the system.
Both of these are stored as variables in a bash script.
I'm trying to get a loop together that takes the first line of file 1 and uses it as a variable in an if/then/else, and then moves on through to the end of the file.
I've tried so many things, but it's beyond my skill set to provide an example that won't confuse the reader of this post.
TempFile1=/ifs/data/scripts/ConfigMonitor/TempFile1.txt
TempFile2=/ifs/data/scripts/ConfigMonitor/TempFile2.txt
find /ifs/*/*/backup -maxdepth 4 -mindepth 4 -type d > $TempFile1
isi snapshot schedules list -v | grep Path: | awk '{print $2}' > $TempFile2
list line 1 on $TempFile1
Grep for line 1 within $TempFile2
if result yielded then
echo found
else
echo fullpath not being backed up
fi
Use Grep's -f Flag
grep(1) says:
-f FILE, --file=FILE
Obtain patterns from FILE, one per line. The empty file
contains zero patterns, and therefore matches nothing. (-f is
specified by POSIX.)
Therefore, the following should work:
grep -f patterns_to_match.txt file_to_examine.txt
Faster Reporting
Another way to think about this is that you can ask GNU grep to show you all the matches:
echo 'Lines that match a pattern in your pattern file.'
grep -f patterns_to_match.txt file_to_examine.txt
and then show you all the lines that don't match any of the patterns:
echo 'Lines that do not match any patterns in your pattern file.'
grep -f patterns_to_match.txt -v file_to_examine.txt
This is likely to be faster and more efficient than looping through the file one line at a time in Bash. You may or may not get similar results with a grep other than GNU grep; while the -f and -v flags are specified by POSIX, I only tested it against GNU grep 2.16, so your mileage may vary.
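A sketch of that with the temp files from the question, assuming each path should match as a whole line (-x) and be treated as a fixed string rather than a regex (-F):
# paths from TempFile1 that appear in the snapshot schedule list
grep -Fxf "$TempFile2" "$TempFile1"
# paths from TempFile1 that do not, i.e. full paths not being backed up
grep -Fxvf "$TempFile2" "$TempFile1"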
This should iterate through TempFile1.txt and grep for each line in TempFile2.txt.
while read line; do
if grep "$line" /path/to/TempFile2.txt > /dev/null
then
echo "Found $line"
else
echo "Did not find $line"
fi
done < TempFile1.txt
TempFile1.txt:
a
b
c
TempFile2.txt:
b
d
z
Output:
Did not find a
Found b
Did not find c

Searching a file name in file using SHELL SCRIPT [duplicate]

This question already has answers here:
Find lines from a file which are not present in another file [duplicate]
(4 answers)
Closed 8 years ago.
I will fetch the file names from one file, say FILE_A, and will search for these file names in another file, say FILE_B, using a script, say script.sh.
I want to print those file names which are not present in FILE_B.
I used the code below, but it didn't work.
The code in script.sh is as follows:
#!/bin/bash
while read line
do
grep -v "$line" FILE_B
done<FILE_A
Please help me: why is it not working, and what is the solution?
grep can read its input from a file; no need for a loop.
grep -Fxvf FILE_A FILE_B
The -F option specifies that the input is literal strings, not regular expressions. Otherwise an input which contains regex metacharacters would not match itself; or not only itself. For example, the regular expression a.c matches "aac", "abc", etc.
The -x option requires a full-line match. Otherwise, the input "bc" would match on any line containing it as a substring, such as "abcd".
The -v option says to print non-matching lines instead of matching.
Finally, the lowercase -f option specifies a file name as its argument to use as input for the patterns to match.
comm is good for this, but it requires the input files to be sorted. If that's not a problem:
# lines in FILE_A that are not in FILE_B
comm -23 <(sort FILE_A) <(sort FILE_B)
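Should you need the opposite direction, the same tool works with the other column suppressed, for example:
# lines in FILE_B that are not in FILE_A
comm -13 <(sort FILE_A) <(sort FILE_B)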
There should be no extra linefeed between while and do.
grep -v expr file will print all lines of the file that do not contain expr. What you want is just whether the line was found or not, so you need to test the exit status.
Try:
#!/bin/bash
while read line
do
grep -q "$line" FILE_B || echo "$line"
done<FILE_A
grep returns exit status 0 if a line was found. The || concatenation with echo means: execute echo when the exit status != 0, i.e. when $line was not found.
Your script runs but does not print what you want: for each filename in FILE_A it prints all the OTHER filenames in FILE_B. Instead you should print the filename yourself if grep does not find it:
while read line
do
grep "$line" FILE_B >/dev/null || echo "$line"
done <FILE_A
Use this instead:
#!/bin/bash
while read line
do
if ! grep -qw "$line" file_B
then
echo "$line"
fi
done < file_A

How can I remove the first line of a text file using bash/sed script?

I need to repeatedly remove the first line from a huge text file using a bash script.
Right now I am using sed -i -e "1d" $FILE - but it takes around a minute to do the deletion.
Is there a more efficient way to accomplish this?
Try tail:
tail -n +2 "$FILE"
-n x: Just print the last x lines. tail -n 5 would give you the last 5 lines of the input. The + sign inverts the argument and makes tail print everything but the first x-1 lines. tail -n +1 would print the whole file, tail -n +2 everything but the first line, etc.
GNU tail is much faster than sed. tail is also available on BSD and the -n +2 flag is consistent across both tools. Check the FreeBSD or OS X man pages for more.
The BSD version can be much slower than sed, though. I wonder how they managed that; tail should just read a file line by line while sed does pretty complex operations involving interpreting a script, applying regular expressions and the like.
Note: You may be tempted to use
# THIS WILL GIVE YOU AN EMPTY FILE!
tail -n +2 "$FILE" > "$FILE"
but this will give you an empty file. The reason is that the redirection (>) happens before tail is invoked by the shell:
Shell truncates file $FILE
Shell creates a new process for tail
Shell redirects stdout of the tail process to $FILE
tail reads from the now empty $FILE
If you want to remove the first line inside the file, you should use:
tail -n +2 "$FILE" > "$FILE.tmp" && mv "$FILE.tmp" "$FILE"
The && will make sure that the file doesn't get overwritten when there is a problem.
You can use -i to update the file without using the '>' operator. The following command will delete the first line from the file and save the result back to the file (it uses a temp file behind the scenes).
sed -i '1d' filename
For those who are on SunOS, where sed is non-GNU, the following will help:
sed '1d' test.dat > tmp.dat
You can easily do this with:
cat filename | sed 1d > filename_without_first_line
on the command line; or to remove the first line of a file permanently, use the in-place mode of sed with the -i flag:
sed -i 1d <filename>
No, that's about as efficient as you're going to get. You could write a C program which could do the job a little faster (less startup time and processing arguments) but it will probably tend towards the same speed as sed as files get large (and I assume they're large if it's taking a minute).
But your question suffers from the same problem as so many others in that it presupposes the solution. If you were to tell us in detail what you're trying to do rather than how, we may be able to suggest a better option.
For example, if this is a file A that some other program B processes, one solution would be to not strip off the first line, but modify program B to process it differently.
Let's say all your programs append to this file A and program B currently reads and processes the first line before deleting it.
You could re-engineer program B so that it didn't try to delete the first line but maintains a persistent (probably file-based) offset into the file A so that, next time it runs, it could seek to that offset, process the line there, and update the offset.
Then, at a quiet time (midnight?), it could do special processing of file A to delete all lines currently processed and set the offset back to 0.
It will certainly be faster for a program to open and seek a file rather than open and rewrite. This discussion assumes you have control over program B, of course. I don't know if that's the case but there may be other possible solutions if you provide further information.
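A minimal sketch of that offset idea in bash, using hypothetical names: fileA is the shared file, offset.txt stores the number of bytes already processed, and new_lines.txt receives the unprocessed part:
#!/bin/bash
offset=$(cat offset.txt 2>/dev/null || echo 0)   # 0 on the very first run
size=$(wc -c < fileA)                            # snapshot the current size first
# read only the bytes appended since the last run
tail -c +"$((offset + 1))" fileA | head -c "$((size - offset))" > new_lines.txt
# ... process new_lines.txt here ...
echo "$size" > offset.txt                        # the next run starts from here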
The sponge util avoids the need for juggling a temp file:
tail -n +2 "$FILE" | sponge "$FILE"
If you want to modify the file in place, you could always use the original ed instead of its streaming successor sed:
ed "$FILE" <<<$'1d\nwq\n'
The ed command was the original UNIX text editor, before there were even full-screen terminals, much less graphical workstations. The ex editor, best known as what you're using when typing at the colon prompt in vi, is an extended version of ed, so many of the same commands work. While ed is meant to be used interactively, it can also be used in batch mode by sending a string of commands to it, which is what this solution does.
The sequence <<<$'1d\nwq\n' takes advantage of modern shells' support for here-strings (<<<) and ANSI quotes ($'...') to feed input to the ed command consisting of two lines: 1d, which deletes line 1, and then wq, which writes the file back out to disk and then quits the editing session.
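If your shell lacks here-strings, a sketch of the same idea feeds the two commands to ed with printf instead (-s keeps ed from printing byte counts):
printf '%s\n' 1d wq | ed -s "$FILE"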
As Pax said, you probably aren't going to get any faster than this. The reason is that there are almost no filesystems that support truncating from the beginning of the file so this is going to be an O(n) operation where n is the size of the file. What you can do much faster though is overwrite the first line with the same number of bytes (maybe with spaces or a comment) which might work for you depending on exactly what you are trying to do (what is that by the way?).
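A sketch of that overwrite-in-place idea, assuming a line of spaces is acceptable to whatever reads the file afterwards (conv=notrunc tells dd to write at the start of the file without truncating it):
len=$(head -n 1 "$FILE" | wc -c)                          # bytes in line 1, including the newline
printf '%*s' "$((len - 1))" '' | dd of="$FILE" conv=notrunc 2>/dev/null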
You can edit the files in place: Just use perl's -i flag, like this:
perl -ni -e 'print unless $. == 1' filename.txt
This makes the first line disappear, as you ask. Perl will need to read and copy the entire file, but it arranges for the output to be saved under the name of the original file.
This should show all lines except the first line:
cat textfile.txt | tail -n +2
Could use vim to do this:
vim -u NONE +'1d' +'wq!' /tmp/test.txt
This should be faster, since vim won't read the whole file when processing.
How about using csplit?
man csplit
csplit -k file 1 '{1}'
This one liner will do:
echo "$(tail -n +2 "$FILE")" > "$FILE"
It works because tail is executed inside the command substitution before echo opens (and truncates) the file, hence no need for a temp file.
Since it sounds like I can't speed up the deletion, I think a good approach might be to process the file in batches, roughly like this:
while [ -s file1 ]; do
head -n 1000 file1 > file2
# process file2 here
sed -i -e '1,1000d' file1
done
The drawback of this is that if the program gets killed in the middle (or if there's some bad sql in there, causing the "process" part to die or lock up), there will be lines that are either skipped or processed twice.
(file1 contains lines of sql code)
tail +2 path/to/your/file
works for me, no need to specify the -n flag. For reasons, see Aaron's answer.
You can use the sed command to delete arbitrary lines by line number.
# create multi line txt file
echo """1. first
2. second
3. third""" > file.txt
Deleting lines and printing to stdout:
$ sed '1d' file.txt
2. second
3. third
$ sed '2d' file.txt
1. first
3. third
$ sed '3d' file.txt
1. first
2. second
# delete multiple lines
$ sed '1,2d' file.txt
3. third
# delete the last line
$ sed '$d' file.txt
1. first
2. second
Use the -i option to edit the file in place:
$ cat file.txt
1. first
2. second
3. third
$ sed -i '1d' file.txt
$ cat file.txt
2. second
3. third
If what you are looking to do is recover after failure, you could just build up a file that has what you've done so far.
if [[ -f $tmpf ]] ; then
rm -f "$tmpf"
fi
cat "$srcf" |
while IFS= read -r line ; do
# process line
echo "$line" >> "$tmpf"
done
Based on 3 other answers, I came up with this syntax that works perfectly in my macOS bash shell:
line=$(head -n1 list.txt && echo "$(tail -n +2 list.txt)" > list.txt)
Test case:
~> printf "Line #%2d\n" {1..3} > list.txt
~> cat list.txt
Line # 1
Line # 2
Line # 3
~> line=$(head -n1 list.txt && echo "$(tail -n +2 list.txt)" > list.txt)
~> echo $line
Line # 1
~> cat list.txt
Line # 2
Line # 3
Would using tail on N-1 lines and directing that into a file, followed by removing the old file, and renaming the new file to the old name do the job?
If I were doing this programmatically, I would read through the file and remember the file offset after reading each line, so I could seek back to that position to read the file with one less line in it.
