Finding presence of substring within a string in BASH - bash

I have a script that is trying to find the presence of a given string inside a file of arbitrary text.
I've settled on something like:
#!/bin/bash
file="myfile.txt"
for j in `cat blacklist.txt`; do
echo Searching for $j...
unset match
match=`grep -i -m1 -o "$j" $file`
if [ $match ]; then
echo "Match: $match"
fi
done
Blacklist.txt contains lines of potential matches, like so:
matchthis
"match this too"
thisisasingleword
"This is multiple words"
myfile.txt could be something like:
I would matchthis if I could match things with grep. I really wish I could.
When I ask it to match this too, it fails to matchthis. It should match this too - right?
If I run this at a bash prompt, like so:
j="match this too"
grep -i -m1 -o "$j" myfile.txt
...I get "match this too".
However, when the batch file runs, despite the variables being set correctly (verified via echo lines), it never greps properly and returns nothing.
Where am I going wrong?

Wouldn't
grep -owF -f blacklist.txt myfile.txt
instead of writing an inefficient loop, do what you want?
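A minimal sketch of the pattern-file approach, using throwaway sample data in a temporary directory (the file contents and the `matches` variable are assumptions made for illustration). It assumes the quotes have been removed from the blacklist, since `-F` would otherwise search for them literally, and it adds `-i` to keep the question's case-insensitive matching:

```shell
#!/bin/bash
# Hypothetical sample data, written to a temp directory so nothing real is touched.
tmpdir=$(mktemp -d)
printf '%s\n' 'matchthis' 'match this too' 'thisisasingleword' > "$tmpdir/blacklist.txt"
printf '%s\n' \
    'I would matchthis if I could match things with grep.' \
    'When I ask it to match this too, it fails to matchthis.' > "$tmpdir/myfile.txt"

# -F: fixed strings, -f: read one pattern per line, -o: print only the match,
# -w: whole words only, -i: ignore case. sort -u collapses repeated hits.
matches=$(grep -oiwF -f "$tmpdir/blacklist.txt" "$tmpdir/myfile.txt" | sort -u)
printf '%s\n' "$matches"

rm -rf "$tmpdir"
```

A single grep invocation scans the file once for all patterns, instead of once per blacklist entry.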

Would you please try:
#!/bin/bash
file="myfile.txt"
while IFS= read -r j; do
j=${j#\"}; j=${j%\"} # remove surrounding double quotes
echo "Searching for $j..."
match=$(grep -i -m1 -o "$j" "$file")
if (( $? == 0 )); then # if match
echo "Match: $match" # then print it
fi
done < blacklist.txt
Output:
Searching for matchthis...
Match: matchthis
Searching for match this too...
Match: match this too
match this too
Searching for thisisasingleword...
Searching for This is multiple words...
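The quote removal in the loop above relies on the `${var#pattern}` and `${var%pattern}` expansions, which strip a leading and a trailing match respectively. A small sketch, wrapped in a helper named `strip_quotes` (a hypothetical name, not part of the answer):

```shell
#!/bin/bash
# strip_quotes STRING: remove one leading and one trailing double quote, if present.
strip_quotes() {
    local s=$1
    s=${s#\"}   # drop a leading "
    s=${s%\"}   # drop a trailing "
    printf '%s\n' "$s"
}

strip_quotes '"match this too"'
strip_quotes 'matchthis'
```

Entries without quotes pass through unchanged, so the same loop can handle both kinds of blacklist line.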

I wound up abandoning grep entirely and using sed instead.
match=`sed -n "s/.*\($j\).*/\1/p" "$file"`
Works well, and I was able to use unquoted multiple word phrases in the blacklist file.

With this:
if [ $match ]; then
you are passing arbitrary arguments to test. This is not how you properly check that a variable is not empty. Use test -n:
if [ -n "$match" ]; then
You might also use grep's exit code instead:
if [ "$?" -eq 0 ]; then
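The difference between the two tests can be seen in a short sketch (the variable names `unquoted_status` and `quoted_status` are made up for illustration):

```shell
#!/bin/bash
match="match this too"

# Unquoted: the value is split into three words, so test sees
# [ match this too ] and reports an error (non-zero status).
[ $match ] 2>/dev/null && unquoted_status=0 || unquoted_status=$?

# Quoted with -n: a single argument, true whenever the string is non-empty.
[ -n "$match" ] && quoted_status=0 || quoted_status=$?

echo "unquoted: $unquoted_status, quoted: $quoted_status"
```

With a single-word match the unquoted form happens to succeed, which is why the original script appeared to work for some blacklist entries.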
for ... in X splits X at spaces by default, and you are expecting the script to match whole lines.
Define IFS properly:
IFS='
'
for j in `cat blacklist.txt`; do
blacklist.txt contains "match this too" with quotes, and it is read like this by for loop and matched literally.
j="match this too" does not cause j variable to contain quotes.
j='"match this too"' does, and then it will not match.
Since whole lines are read properly from the blacklist.txt file now, you can probably remove quotes from that file.
Script:
#!/bin/bash
file="myfile.txt"
IFS='
'
for j in `cat blacklist.txt`; do
echo Searching for $j...
unset match
match=`grep -i -m1 -o "$j" "$file"`
if [ -n "$match" ]; then
echo "Match: $match"
fi
done
Alternative to the for ... in ... loop (no IFS= needed):
while read; do
j="$REPLY"
...
done < 'blacklist.txt'
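To see the splitting behaviour side by side, here is a small sketch on a throwaway two-line file (the file contents and the counter names are assumptions for illustration):

```shell
#!/bin/bash
tmpfile=$(mktemp)
printf '%s\n' 'matchthis' 'match this too' > "$tmpfile"

# for ... in $(cat ...): with the default IFS the output is split on every
# space and newline, so the two lines become four separate words.
split_count=0
for j in $(cat "$tmpfile"); do
    split_count=$((split_count + 1))
done

# while read: one iteration per line, spaces inside the line are kept.
line_count=0
while IFS= read -r j; do
    line_count=$((line_count + 1))
done < "$tmpfile"

rm -f "$tmpfile"
echo "for loop iterations: $split_count, while read iterations: $line_count"
```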

Related

How to check filetype in if statement bash using wildcard and -f

subjects_list=$(ls -l /Volumes/Backup_Plus/PPMI_10 | awk '{ print $NF }')
filepath="/Volumes/Backup_Plus/PPMI_10/$subjects/*/*/S*/"
for subjects in $subjects_list; do
if [[ -f "${filepath}/*.bval" && -f "${filepath}/*.bvec" && -f "${filepath}/*.json" && -f "${filepath}/*.nii.gz" ]]; then
echo "${subjects}" >> /Volumes/Backup_Plus/PPMI_10/keep_subjects.txt
else
echo "${subjects}" >> /Volumes/Backup_Plus/PPMI_10/not_keep_subjects.txt
fi
done
problem is supposedly in the if statement, I tried this...
bvalfile = (*.bval)
bvecfile =(*.bvec)
jsonfile =(*.json)
niigzfile =(*.nii.gz)
if [[ -f "$bvalfile" && -f "$bvecfile" && -f "$jsonfile" && -f "$niigzfile" ]]; then
However, that didn't work. Any help with the syntax or errors would be appreciated, or does it need to be changed completely? I'm trying to separate the subjects that have these file types from those that don't by making two lists.
Thanks.
You're assigning filepath outside the for-subject loop but using the unset variable $subjects in it. You want to move that inside the loop.
Double-quoted wildcards aren't expanded, so both $filepath and your -f test will be looking for filenames with literal asterisks in them.
-f only works on a single file, so even if you fix the quotes, you'll have a syntax error if there's more than one file matching the pattern.
So I think what you want is something like this:
# note: array assignment -
# shell does the wildcard expansion, no ls required
prefix_list=( /Volumes/Backup_Plus/PPMI_10/* )
# and array expansion
for prefix in "${prefix_list[@]}"; do
# the subject is just the last component of the path
subject=${prefix##*/}
# start by assuming we're keeping this one
decision=keep
# in case filepath pattern matches more than one directory, loop over them
for filepath in "$prefix"/*/*/S*/; do
# if any of the files don't exist, switch to not keeping it
for file in "$filepath"/{*.bval,*.bvec,*.json,*.nii.gz}; do
if [[ ! -f "$file" ]]; then
decision=not_keep
# we have our answer and can stop looping now
break 2
fi
done
done
# now append to the correct list
printf '%s\n' "$subject" >>"/Volumes/Backup_Plus/PPMI_10/${decision}_subjects.txt"
done
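The core of the check, whether a directory holds at least one regular file for each required extension, can be isolated in a small function. This is only a sketch: `has_files` is a hypothetical helper and the sample directories are created on the fly, not taken from the question's volume.

```shell
#!/bin/bash
# has_files DIR EXT...: succeed only if DIR contains at least one regular
# file for every extension listed (hypothetical helper, not from the answer).
has_files() {
    local dir=$1 ext f found
    shift
    for ext in "$@"; do
        found=no
        for f in "$dir"/*."$ext"; do
            # An unmatched glob is left as a literal string, so -f rejects it too.
            if [[ -f $f ]]; then
                found=yes
                break
            fi
        done
        [[ $found == yes ]] || return 1
    done
}

# Throwaway sample directories for demonstration.
tmpdir=$(mktemp -d)
mkdir "$tmpdir/complete" "$tmpdir/partial"
touch "$tmpdir/complete"/scan.{bval,bvec,json,nii.gz} "$tmpdir/partial/scan.bval"

has_files "$tmpdir/complete" bval bvec json nii.gz && complete_result=keep || complete_result=not_keep
has_files "$tmpdir/partial" bval bvec json nii.gz && partial_result=keep || partial_result=not_keep
echo "complete: $complete_result, partial: $partial_result"

rm -rf "$tmpdir"
```

Looping over the glob results, rather than passing the pattern to a single `-f`, keeps the test valid when a pattern matches several files.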

Find and replace few words in text file with using bash

I have a script in which one variable holds words that occur in a file, and another variable holds the words I want to use in their place. I need to find i am scatman and replace these words with you are dukenukem. For example, my text file, wwe.txt:
i
am
dsadsa
sda
daaaa
ds
dsds
dsa
d
scatman
I wrote a script that greps for each word, and it works:
words="i am scatman"
echo "$words"
for i in $words; do
if grep -q "$i" wwe.txt; then
echo "these words are exists"
grep "$i" wwe.txt
else
echo "these words are not exists"
exit 1
fi
done
It works. But how can I replace these words? I wrote this:
words="i am scatman"
words2="you are dukenukem"
for i in $words; do
for y in $words2; do
if grep -q "$i" wwe.txt; then
echo "these words are exists"
grep "$i" wwe.txt
sed -i 's/'"$i"'/'"$y"'/g' wwe.txt
else
echo "these words are not exists"
exit 1
fi
done
done
But it does not work. Where is my error? Please help.
This code works. Please try it out.
#!/bin/bash
line1="i am scatman"
line2="you are dukenukem"
words2=($line2)
count=0
for word in $line1; do
sed -i -e "s/^$word\$/${words2[$count]}/" wwe.txt # anchored to the whole line, so "am" does not also hit "scatman"
count=$((count + 1))
done
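Word-by-word substitution is sensitive to partial matches: an unanchored expression such as s/am/are/g would also rewrite the am inside scatman. A minimal sketch that anchors each pattern to the whole line and applies all pairs in one sed pass follows. It works on a throwaway copy of the data rather than editing in place, and it assumes the words contain no regex metacharacters or slashes; the array names are made up for illustration.

```shell
#!/bin/bash
tmpfile=$(mktemp)
printf '%s\n' i am dsadsa scatman > "$tmpfile"

old=(i am scatman)
new=(you are dukenukem)

# Build one "-e s/^old$/new/" expression per word pair.
script=()
for idx in "${!old[@]}"; do
    script+=(-e "s/^${old[idx]}\$/${new[idx]}/")
done

# Apply every substitution in a single pass over the file.
result=$(sed "${script[@]}" "$tmpfile")
printf '%s\n' "$result"

rm -f "$tmpfile"
```

Because each line holds exactly one word here, anchoring with `^` and `$` is enough; text with several words per line would need word-boundary matching instead.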

How can I get the return value and matched line by grep in bash at once?

I am learning bash. I would like to get the return value and matched line by grep at once.
if cat 'file' | grep 'match_word'; then
match_by_grep="$(cat 'file' | grep 'match_word')"
read a b <<< "${match_by_grep}"
fi
In the code above, I used grep twice. I cannot think of how to do it with a single grep. I am also not sure that match_by_grep is always empty when nothing matches, because cat may output an error message.
match_by_grep="$(cat 'file' | grep 'match_word')"
if [[ -n ${match_by_grep} ]]; then
# match_by_grep may be an error message by cat.
# So following a and b may have wrong value.
read a b <<< "${match_by_grep}"
fi
Please tell me how to do it. Thank you very much.
You can avoid the double use of grep by storing the search output in a variable and seeing if it is not empty.
Your version of the script without double grep.
#!/bin/bash
grepOutput="$(grep 'match_word' file)"
if [ ! -z "$grepOutput" ]; then
read a b <<< "${grepOutput}"
fi
A more compact variant of the script above:
#!/bin/bash
grepOutput="$(grep 'match_word' file)"
[[ -n "$grepOutput" ]] && read -r a b <<< "${grepOutput}" # no ( ) around read: a subshell would discard a and b
Using grep twice, once for the if condition and once to parse the search result, would look something like this:
#!/bin/bash
if grep -q 'match_word' file; then
grepOutput="$(grep 'match_word' file)"
read a b <<< "${grepOutput}"
fi
When assigning a variable with a string containing a command expansion, the return code is that of the (rightmost) command being expanded.
In other words, you can just use the assignment as the condition:
if grepOutput="$(cat 'file' | grep 'match_word')"
then
echo "There was a match"
read -r a b <<< "${grepOutput}"
(etc)
else
echo "No match"
fi
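A self-contained sketch of the assignment-as-condition idiom, run against a throwaway file (the file contents and the `hit`/`miss` flags are assumptions for illustration):

```shell
#!/bin/bash
tmpfile=$(mktemp)
printf '%s\n' 'alpha one' 'match_word two' > "$tmpfile"

# The assignment's exit status is grep's, so one call gives both
# the matched line and the success/failure answer.
if line=$(grep 'match_word' "$tmpfile"); then
    read -r a b <<< "$line"
    hit=yes
else
    hit=no
fi

# A pattern that is absent makes the same construct take the else branch.
if missing=$(grep 'no_such_word' "$tmpfile"); then
    miss=yes
else
    miss=no
fi

rm -f "$tmpfile"
echo "hit=$hit a=$a b=$b miss=$miss"
```

Passing the file name to grep directly also removes the concern about cat: grep exits with a status greater than 1 when the file cannot be read, and its error message goes to standard error rather than into the variable.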
Is this what you want to achieve?
grep 'match_word' file ; echo $?
$? has a return value of the command run immediately before.
If you would like to keep track of the return value, it will be also useful to have PS1 set up with $?.
Ref: Bash Prompt with Last Exit Code

Incrementing a variable inside a Bash loop

I'm trying to write a small script that will count entries in a log file, and I'm incrementing a variable (USCOUNTER) which I'm trying to use after the loop is done.
But at that moment USCOUNTER looks to be 0 instead of the actual value. Any idea what I'm doing wrong? Thanks!
FILE=$1
tail -n10 mylog > $FILE
USCOUNTER=0
cat $FILE | while read line; do
country=$(echo "$line" | cut -d' ' -f1)
if [ "US" = "$country" ]; then
USCOUNTER=`expr $USCOUNTER + 1`
echo "US counter $USCOUNTER"
fi
done
echo "final $USCOUNTER"
It outputs:
US counter 1
US counter 2
US counter 3
..
final 0
You are using USCOUNTER in a subshell, that's why the variable is not showing in the main shell.
Instead of cat "$FILE" | while ..., just use while ... done < "$FILE". This way, you avoid the common problem described in "I set variables in a loop that's in a pipeline. Why do they disappear after the loop terminates? Or, why can't I pipe data to read?":
while read country _; do
if [ "US" = "$country" ]; then
USCOUNTER=$(expr $USCOUNTER + 1)
echo "US counter $USCOUNTER"
fi
done < "$FILE"
Note I also replaced the `` expression with a $().
I also replaced while read line; do country=$(echo "$line" | cut -d' ' -f1) with while read country _. In general, while read var1 var2 ... varN puts the first word of the line in var1, the second in var2, and so on, with varN receiving the remainder of the line.
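The effect of the subshell can be reproduced in a few lines. In this sketch the sample log lines and the counter names `piped` and `redirected` are made up; it assumes bash's default settings (the `lastpipe` option is off):

```shell
#!/bin/bash
tmpfile=$(mktemp)
printf '%s\n' 'US a' 'DE b' 'US c' > "$tmpfile"

# Pipeline: the while loop runs in a subshell, so its increments are lost.
piped=0
cat "$tmpfile" | while read -r country _; do
    if [[ $country == US ]]; then
        piped=$((piped + 1))
    fi
done

# Redirection: the loop runs in the current shell, so the counter survives.
redirected=0
while read -r country _; do
    if [[ $country == US ]]; then
        redirected=$((redirected + 1))
    fi
done < "$tmpfile"

rm -f "$tmpfile"
echo "piped=$piped redirected=$redirected"
```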
Always use -r with read.
There is no need to use cut, you can stick with pure bash solutions.
In this case passing read a 2nd var (_) to catch the additional "fields"
Prefer [[ ]] over [ ].
Use arithmetic expressions.
Do not forget to quote variables! Link includes other pitfalls as well
while read -r country _; do
if [[ $country = 'US' ]]; then
((USCOUNTER++))
echo "US counter $USCOUNTER"
fi
done < "$FILE"
minimalist
counter=0
((counter++))
echo $counter
You're getting final 0 because your while loop is being executed in a sub (shell) process and any changes made there are not reflected in the current (parent) shell.
Correct script:
while read -r country _; do
if [ "US" = "$country" ]; then
((USCOUNTER++))
echo "US counter $USCOUNTER"
fi
done < "$FILE"
I had the same $count variable in a while loop getting lost issue.
@fedorqui's answer (and a few others) accurately answers the actual question: the subshell is indeed the problem.
But it led me to another issue: I wasn't piping a file's content but the output of a series of pipes and greps.
My failing sample code:
count=0
cat /etc/hosts | head | while read line; do
((count++))
echo $count $line
done
echo $count
and my fix thanks to the help of this thread and the process substitution:
count=0
while IFS= read -r line; do
((count++))
echo "$count $line"
done < <(cat /etc/hosts | head)
echo "$count"
USCOUNTER=$(grep -c "^US " "$FILE")
Incrementing a variable can be done like that:
_my_counter=$((_my_counter + 1))
Counting the number of occurrence of a pattern in a column can be done with grep
grep -cE "^([^ ]* ){2}US"
-c count
([^ ]* ) matches one space-delimited column
{2} the number of columns to skip before the pattern
US your pattern
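A quick sketch of this pattern on made-up input lines. Note that the pattern as written only requires the third column to start with US; the second variant adds a trailing delimiter (an addition for illustration, not part of the answer) so that a value like USA is not counted:

```shell
#!/bin/bash
# Hypothetical sample data: the country code is in the third column.
sample=$(printf '%s\n' 'a b US x' 'US b c' 'a b USA' 'a b DE')

# Third column starts with US: counts "a b US x" and "a b USA".
loose=$(printf '%s\n' "$sample" | grep -cE '^([^ ]* ){2}US')

# Third column is exactly US: the match must be followed by a space or end of line.
strict=$(printf '%s\n' "$sample" | grep -cE '^([^ ]* ){2}US( |$)')

echo "loose=$loose strict=$strict"
```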
The following one-line command renames many files on Linux by substituting a phrase in their names:
find -type f -name '*.jpg' | rename 's/holiday/honeymoon/'
For all files with the extension ".jpg", if they contain the string "holiday", replace it with "honeymoon". For instance, this command would rename the file "ourholiday001.jpg" to "ourhoneymoon001.jpg".
This example also illustrates how to use the find command to send a list of files (-type f) with the extension .jpg (-name '*.jpg') to rename via a pipe (|). rename then reads its file list from standard input.

How sort recursively by maximum file size and count files?

I'm a beginner in bash programming. I want to display the first $1 results (head -n $1) of sorting the files in /etc/* by size. The problem is that, at the end of the search, I must also know how many directories and files were processed.
I wrote the following code:
#!/bash/bin
let countF=0;
let countD=0;
for file in $(du -sk /etc/* |sort +0n | head $1); do
if [ -f "file" ] then
echo $file;
let countF=countF+1;
else if [ -d "file" ] then
let countD=countD+1;
fi
done
echo $countF
echo $countD
I get errors at execution. How do I use find with du, since I must search recursively?
#!/bin/bash
# directory and program were reversed in the shebang line
let countF=0 # semicolon not needed (several more places)
let countD=0
while read -r _ file; do # du prints "size<TAB>path", so discard the size field
if [ -f "$file" ]; then # missing dollar sign and semicolon
echo "$file"
let countF=countF+1 # could also be: let countF++
elif [ -d "$file" ]; then # missing dollar sign and semicolon; elif needs only one fi
let countD=countD+1
fi
done < <(du -sk /etc/* |sort +0n | head $1) # see below
echo $countF
echo $countD
Changing the loop from a for to a while allows it to work properly in case filenames contain spaces.
I'm not sure what version of sort you have, but I'll take your word for it that the argument is correct.
It's #!/bin/bash not #!/bash/bin.
I don't know what that argument to sort is supposed to be. Maybe you meant sort -r -n?
Your use of head is wrong. Giving head file arguments causes it to ignore its standard input, so in general it's an error to both pipe something to head and give it a file argument. Besides that, "$1" refers to the script's first argument. Did you maybe mean head -n 1, or were you trying to make the number of lines processed configurable from an argument to the script: head -n "$1"?
In your if tests, you're not referencing your loop variable: it should read "$file", not "file".
Not that the bash parser cares, but you should try to indent sanely.
I tried /etc/* instead of the file variable, but I don't see a result. The idea is to sort all files in a directory and its subdirectories by size and display the first $1 results in size order. In the process I must also count how many files and directories the searched directory contains.
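One possible shape for such a script, sketched in bash under stated assumptions: it runs against a throwaway directory tree instead of /etc, measures sizes in bytes with wc -c rather than du, and uses the made-up variable names `top`, `largest`, `file_count` and `dir_count`.

```shell
#!/bin/bash
# Throwaway sample tree; the layout and file sizes are made up.
dir=$(mktemp -d)
mkdir "$dir/sub"
printf '%010d' 0  > "$dir/a"       # 10 bytes
printf '%0100d' 0 > "$dir/sub/b"   # 100 bytes
printf 'x'        > "$dir/sub/c"   # 1 byte
top=2   # how many of the largest files to show (the question's $1)

# Emit "size<TAB>path" for every regular file found recursively,
# then sort numerically, largest first, and keep the first $top lines.
largest=$(
    while IFS= read -r -d '' f; do
        size=$(wc -c < "$f")
        printf '%s\t%s\n' "$((size))" "$f"
    done < <(find "$dir" -type f -print0) | sort -rn | head -n "$top"
)

# Count what was traversed; the directory count includes $dir itself.
file_count=$(( $(find "$dir" -type f | wc -l) ))
dir_count=$(( $(find "$dir" -type d | wc -l) ))

printf '%s\n' "$largest"
echo "files: $file_count, directories: $dir_count"

rm -rf "$dir"
```

Letting find do the recursion avoids the `du -sk /etc/*` limitation of only reporting top-level entries, and the NUL-delimited read keeps file names with spaces intact.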
Ruby(1.9+)
#!/usr/bin/env ruby
fc=0
dc=0
a=Dir["/etc/*"].inject([]) do |x,f|
fc+=1 if File.file?(f)
dc+=1 if File.directory?(f)
x<<f
end
puts a.sort
puts "number of files: #{fc}"
puts "number of directories: #{dc}"

Resources