How to display a word from different lines? - bash

I want to be able to display every line that contains the word read from input. At the moment I am only able to display the index position of the word read from input:
echo "Filename"
read file
echo "A word"
read word
echo $file | awk '{print index($0,"'"$word"'")}'

As requested, here is an example of how you could use grep.
The following command uses wget to fetch the content of this Stack Overflow page; the -O - option tells wget to write the HTML to standard output, which is then piped into grep -n grep.
grep prints every line of the HTML in which the term "grep" appears, prefixed with the corresponding line number, and highlights the match.
wget -O - http://stackoverflow.com/questions/27691506/how-to-display-a-word-from-different-lines | grep -n grep
e.g. when I run this in my terminal (ignoring the wget-related output), grep -n grep gives:
42: <span class="comment-copy">why dont you use <code>grep</code> ?</span>
468: <span class="comment-copy">@nu11p01n73R Can you give me any example, on how to use grep?</span>
495: <span class="comment-copy">As per your question <code>grep $word $file</code> will output lines in <code>$file</code> containing <code>$word</code></span>
521: <span class="comment-copy">If you want line numbers too, you can use <code>grep</code> like this: <code>grep -in $word $file</code></span>
Note that Stack Overflow's syntax highlighting is different from what you get in a terminal.

If you also want to print the line number:
read -p 'Filename? ' fname
read -p 'Word to search? ' word
awk /"$word"/'{print NR, $0}' "$fname"
If you don't want the line number:
read -p 'Filename? ' fname
read -p 'Word to search? ' word
awk /"$word"/ "$fname"
BTW, everyone told you "use grep", but look at this:
% time for i in {1..10000} ; do grep alias .bashrc > /dev/null ; done
real 0m22.665s
user 0m2.672s
sys 0m3.564s
% time for i in {1..10000} ; do mawk /alias/ .bashrc > /dev/null ; done
real 0m21.951s
user 0m3.412s
sys 0m3.636s
Of course gawk is slower, in this case 27.9 seconds.

Related

How to use lines in a file as keyword for grep?

I've searched lots of questions on here and other sites, and people have suggested things that should fix my problem, but I think there's something wrong with my code that I just don't recognize.
I have 24 .fasta files from NGS sequencing whose reads are 150 bp long. There are approximately 1M reads in each file. The reads are from targeted sequencing where we electroporated vectors with cDNA for genes of interest and a unique barcode sequence. I need to look through the sequencing files for the presence or absence of the barcode sequence that corresponds to a specific gene.
I have a .txt list of the barcode sequences that I want to pass to grep to look for in the .fasta files. I've tried so many variations of this command. I can give grep each barcode individually, but that's very time consuming; I know it's possible to give it the list of barcode sequences, search each .fasta for each of the barcodes, and record how many times each barcode is found in each file.
Here's my code where I give it each barcode individually:
# Barcode 33
mkdir --mode 755 $dir/BC33
FILES="*.fasta"
for f in $FILES; do
    cat "$f" | tr -d "\n" | tr ">" "\n" | grep 'TATTAGAGTTTGAGAATAAGTAGT' > $dir/BC33/"$f"
done
I tried to adapt it so that I don't have to feed every barcode sequence in individually:
dir="/home/lozzib/AG_Barcode_Seq/"
cd $dir
FILES="*.fasta"
for f in $FILES; do
    cat "$f" | tr -d "\n" | tr ">" "\n" | grep -c -f BarcodeScreenSeq.txt | sort > $dir/Results/"$f"
    echo "Finished $f"
done
But it is not searching for the barcode sequences; with this iteration it just creates new, empty files in the /Results directory. I also tried a nested loop, where I tried to make the barcode sequence a variable that changes like $FILES, but that just gave me a new file with the names of my .fasta files:
dir="/home/lozzib/AG_Barcode_Seq/"
cd $dir
FILES="*.fasta"
for f in $FILES; do
    for b in `cat /home/lozzib/AG_Barcode_Seq/BarcodeScreenSeq.txt`; do
        cat "$f" | grep -c "$b" | sort > $dir/"$f"_Barcode
    done
done
I want an output .txt file that has:
<barcode sequence>: <# of times that bc was found>
for each .fasta file, because I want to put all the samples together into one large Excel sheet that shows each barcode and how many times it was found in each sample.
Please help, I've tried everything I can think of.
EDIT
Here is what the BarcodeScreenSeq.txt file would look like. It's just a txt file where each line is a barcode sequence:
head BarcodeScreenSeq.txt
TATTATGAGAAAGTTGAATAGTAG
ATGAAAGTTAGAGTTTATGATAAG
AATAGATAAGATTGATTGTGTTTG
TGTTAAATGTATGTAGTAATTGAG
ATAGATTTAAGTGAAGAGAGTTAT
GAATGTTTGTAAATGTATAGATAG
AAATTGTGAAAGATTGTTTGTGTA
TGTAAGTGAAATAGTGAGTTATTT
GAATTGTATAAAGTATTAGATGTG
AGTGAGATTATGAGTATTGATTTA
EDIT
lozzib@gliaserver:~/AG_Barcode_Seq$ file BarcodeScreenSeq.txt
BarcodeScreenSeq.txt: ASCII text, with CRLF line terminators
Windows Line Endings
Your BarcodeScreenSeq.txt has Windows line endings. Each line ends with the special characters \r\n. Linux tools such as grep only deal with Linux line endings \n and interpret your file ...
TATTATG\r\n
ATGAAAG\r\n
...
to look for the patterns TATTATG\r, ATGAAAG\r, ... (note the \r at the end). Because of the \r there is no match.
Either: Convert your file once by running dos2unix BarcodeScreenSeq.txt or sed -i 's/\r//g' BarcodeScreenSeq.txt. This will change your file.
Or: replace every BarcodeScreenSeq.txt in the following scripts by <(tr -d '\r' < BarcodeScreenSeq.txt). This won't change the file, but creates more overhead as the file is converted over and over again.
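For instance, a sketch of the non-destructive variant applied to the counting command used below (sample.fasta is a placeholder filename):
# process substitution feeds the CR-stripped patterns to grep as if they
# were a file; BarcodeScreenSeq.txt itself is left untouched
grep -oFf <(tr -d '\r' < BarcodeScreenSeq.txt) sample.fasta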
Command
grep -c has only one counter. If you pass multiple search patterns at once (for instance using -f BarcodeScreenSeq.txt) you still get only one number for all patterns together.
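A quick illustration of the single counter, using hypothetical pattern and data files:
printf 'AAA\nBBB\n' > patterns.txt      # two patterns
printf 'AAA\nBBB\nAAA\n' > data.txt     # three lines, all matching some pattern
grep -cf patterns.txt data.txt          # prints 3: one combined count, not a count per pattern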
To count the occurrences of each pattern individually you can use the following trick:
for file in *.fasta; do
    grep -oFf BarcodeScreenSeq.txt "$file" |
        sort | uniq -c |
        awk '{print $2 ": " $1 }' > "Results/$file"
done
grep -o will print each match as a single line.
sort | uniq -c will count how often each line occurs.
awk is only there to change the format from #matches pattern to pattern: #matches.
Benefit: The command should be fairly fast.
Drawback: Patterns from BarcodeScreenSeq.txt that are not found in $file won't be listed at all. Your result will leave out lines of the form pattern: 0.
If you really need the lines of the form pattern: 0 you could use another trick:
for file in *.fasta; do
    grep -oFf BarcodeScreenSeq.txt "$file" |
        cat - BarcodeScreenSeq.txt |
        sort | uniq -c |
        awk '{print $2 ": " ($1 - 1) }' > "Results/$file"
done
cat - BarcodeScreenSeq.txt will insert the content of BarcodeScreenSeq.txt at the end of grep's output such that #matches is one bigger than it should be. The number is corrected by awk.
You can read a text file one line at a time and process each line separately using a redirect, like so:
for f in *.fasta; do
    while read -r seq; do
        grep -c "${seq}" "${f}"
    done < /home/lozzib/AG_Barcode_Seq/BarcodeScreenSeq.txt > "${dir}"/"${f}"_Barcode
done
# note: the output redirection belongs on the done line, outside the inner
# loop; redirecting inside the loop would overwrite the file on every iteration

Evaluating a log file using a sh script

I have a log file with a lot of lines with the following format:
IP - - [Timestamp Zone] 'Command Weblink Format' - size
I want to write a script.sh that gives me the number of times each website has been clicked.
The command:
awk '{print $7}' server.log | sort -u
should give me a list with each unique weblink on a separate line. The command
grep 'Weblink1' server.log | wc -l
should give me the number of times Weblink1 has been clicked. I want a command that converts each line created by the awk command above into a variable, and then a loop that runs the grep command on each extracted weblink. I could use
while IFS='' read -r line || [[ -n "$line" ]]; do
echo "Text read from file: $line"
done
(source: Read a file line by line assigning the value to a variable) but I don't want to save the output of the Awk script in a .txt file.
My guess would be:
while IFS='' read -r line || [[ -n "$line" ]]; do
grep '$line' server.log | wc -l | ='$variabel' |
echo " $line was clicked $variable times "
done
But I'm not really familiar with connecting commands in a loop, as this is my first time. Would this loop work and how do I connect my loop and the Awk script?
Shell commands in a loop connect the same way they do without a loop, and you aren't very close. But yes, this can be done in a loop if you want the horribly inefficient way for some reason such as a learning experience:
awk '{print $7}' server.log |
sort -u |
while IFS= read -r line; do
    n=$(grep -c "$line" server.log)
    echo "$line" clicked $n times
done
# you only need the read || [ -n ] idiom if the input can end with an
# unterminated partial line (is illformed); awk print output can't.
# you don't really need the IFS= and -r because the data here is URLs
# which cannot contain whitespace and shouldn't contain backslash,
# but I left them in as good-habit-forming.
# in general variable expansions should be doublequoted
# to prevent wordsplitting and/or globbing, although in this case
# $line is a URL which cannot contain whitespace and practically
# cannot be a glob. $n is a number and definitely safe.
# grep -c does the count so you don't need wc -l
or more simply
awk '{print $7}' server.log |
sort -u |
while IFS= read -r line; do
    echo "$line" clicked $(grep -c "$line" server.log) times
done
However if you just want the correct results, it is much more efficient and somewhat simpler to do it in one pass in awk:
awk '{n[$7]++}
END{for(i in n){
print i,"clicked",n[i],"times"}}' server.log |
sort
# or GNU awk 4+ can do the sort itself, see the doc:
awk '{n[$7]++}
END{PROCINFO["sorted_in"]="#ind_str_asc";
for(i in n){
print i,"clicked",n[i],"times"}}' server.log
The associative array n collects the values from the seventh field as keys, and on each line, the value for the extracted key is incremented. Thus, at the end, the keys in n are all the URLs in the file, and the value for each is the number of times it occurred.
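For example, with a hypothetical three-line log (only field 7, the weblink, matters here):
printf '%s\n' \
    '1.1.1.1 - - [ts zone] "GET /a fmt" - 10' \
    '2.2.2.2 - - [ts zone] "GET /b fmt" - 20' \
    '3.3.3.3 - - [ts zone] "GET /a fmt" - 30' |
    awk '{n[$7]++} END{for(i in n) print i,"clicked",n[i],"times"}'
# output (order is unspecified unless you sort):
# /a clicked 2 times
# /b clicked 1 times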

How to read a file word by word and use those words to grep in bash shell?

I want to read a file word by word and use each word in that text file as an input to grep.
To read the file word by word I used the following code:
for word in $(<filename)
do
    echo "$word"
done
now when I replaced
echo "$word"
with
grep -i "$word"
I'm not getting any output.
The following will read the file word by word and run grep once per word, treating each word as the name of a file to search:
#!/bin/bash
while read line; do
    for word in $line; do
        grep -i "<REGULAR_EXPRESSION_HERE>" "$word"
    done
done < filename
The reason you are not getting any output is that grep expects two arguments. If you leave out the filename argument, it will wait for you to type in the text to grep from; it is reading standard input. (This is what allows you to use it in a pipeline, like command | grep error.)
Anyway, what you are attempting is already built into grep. Just pass it the file of search expressions as an argument to -f.
grep -irf filename .
where -r says to search recursively through all the files in a directory and . is the current directory.
Note, however, that this will search for matches anywhere on a line. If your input file contains dog then grep will find a match on lines which contain dogmatic or endogenous; and if it contains an empty line, it will match all lines in all files. Maybe look at the -w and/or -x options (as well as perhaps -F to disarm any regex specials in the input) to address these issues.
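For instance (words.txt is a hypothetical patterns file containing the word dog):
grep -rwFf words.txt .    # -w: match whole words only, so dogmatic/endogenous no longer match
grep -rxFf words.txt .    # -x: match whole lines only; -F: treat patterns as literal strings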
See if this serves your purpose:
$ grep -o "\S*" filename | grep -i "<your regex here>"
The first grep in the pipeline will flatten the file to one word per line. Then the second grep will search those words for your regex.
Note: This answer assumes that the individual words in file are the data you want to grep in. If those are supposed to be interpreted as filenames, refer to higuaro's answer.
This is what worked for me
while read line
do
    output=$(grep -i "$line" /filepath/*)
    if [ $? -eq 0 ]; then
        echo "$line present in file : $output"
    fi
done < filename

Use sed to extract an ASCII hex string from a single line in a file

I have a file that looks like this:
some random
text
00ab46f891c2emore random
text
234324fc234ba253069
and yet more text
Only one line in the file contains only hex characters (234324fc234ba253069); how do I extract that? I tried sed -ne 's/^\([a-f0-9]*\)$/\1/p' file, using line start and line end (^ and $) as anchors, but I am obviously missing something...
Grep does the job,
$ grep '^[a-f0-9]\+$' file
234324fc234ba253069
Through awk,
$ awk '/^[a-f0-9]+$/{print}' file
234324fc234ba253069
Based on the given search pattern, awk and grep print the matched line.
^ # start
[a-f0-9]\+ # hex characters without capital A-F one or more times
$ # End
sed can do it:
sed -n '/^[a-f0-9]*$/p' file
234324fc234ba253069
By the way, your command sed -ne 's/^\([a-f0-9]*\)$/\1/p' file works for me. Note, also, that it is not necessary to use \1 to print back. It is handy in many cases, but here it is unnecessary because you want to print the whole line. Just sed -n '/pattern/p' does the job, as indicated above.
As there is just one match in the whole file, you may want to exit once it is found (thanks NeronLeVelu!):
sed -n '/^[a-f0-9]*$/{p;q}' file
Another approach is to let printf decide when the line is hexadecimal:
while read line
do
    printf "%f\n" "0x$line" >/dev/null 2>&1 && echo "$line"
done < file
Based on Hexadecimal To Decimal in Shell Script, printf "%f" 0xNUMBER executes successfully if the number is indeed hexadecimal. Otherwise, it returns an error.
Hence, using printf ... >/dev/null 2>&1 && echo "$line" does not let printf print anything (redirects to /dev/null) but then prints the line if it was hexadecimal.
For your given file, it returns:
$ while read line; do printf "%f\n" "0x$line" >/dev/null 2>&1 && echo "$line"; done < file
234324fc234ba253069
Using egrep you can restrict your regex to select lines that only match valid hex characters i.e. [a-fA-F0-9]:
egrep '^[a-fA-F0-9]+$' file
234324fc234ba253069

Searching a file name in file using SHELL SCRIPT [duplicate]

I fetch file names from one file, say FILE_A, and search for them in another file, say FILE_B, using a script, say script.sh.
I want to print those file names which are not present in FILE_B.
I used the code below but it didn't work.
Code in the script->script.sh is as follows:
#!/bin/bash
while read line
do
    grep -v "$line" FILE_B
done < FILE_A
Please help me: why is it not working, and what is the solution?
grep can read its input from a file; no need for a loop.
grep -Fxvf FILE_B FILE_A
The -F option specifies that the input is literal strings, not regular expressions. Otherwise an input which contains regex metacharacters would not match itself; or not only itself. For example, the regular expression a.c matches "aac", "abc", etc.
The -x option requires a full-line match. Otherwise, the input "bc" would match on any line containing it as a substring, such as "abcd".
The -v option says to print non-matching lines instead of matching.
Finally, the lowercase -f option specifies a file name as its argument to use as input for the patterns to match.
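A tiny worked example with hypothetical file contents:
$ cat FILE_A
report.txt
notes.md
data.csv
$ cat FILE_B
notes.md
other.log
$ grep -Fxvf FILE_B FILE_A
report.txt
data.csv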
comm is good for this, but it requires the input files to be sorted. If that's not a problem:
# lines in FILE_A that are not in FILE_B
comm -23 <(sort FILE_A) <(sort FILE_B)
No extra linefeed is needed between while and do. More importantly, grep -v expr file prints all lines of the file that do not contain expr. What you want is just the result of whether it was found or not, so you need to test the exit state.
Try:
#!/bin/bash
while read line
do
    grep -q "$line" FILE_B || echo "$line"
done < FILE_A
grep exits with status 0 if a line was found. The || concatenation with echo means: execute echo when the exit state is != 0, i.e. when $line was not found.
This script works but does not print what you want. For each filename in FILE_A it prints all the OTHER filenames in FILE_B. Instead you should print the filename yourself if grep does not find it:
while read line
do
    grep "$line" FILE_B >/dev/null || echo "$line"
done < FILE_A
Use this instead
#!/bin/bash
while read line
do
    if ! grep -qw "$line" file_B
    then
        echo "$line"
    fi
done < file_A
# the ! negates the test: print the name only when it is NOT found in file_B
