looping grep function - bash

So I was building a script for a co-worker so she can easily scan files for occurrences of strings. But I am having trouble with my grep command.
#!/bin/bash -x
filepath() {
    echo -n "Please enter the path of the folder you would like to scan, then press [ENTER]: "
    read path
    filepath=$path
}
filename () {
    echo -n "Please enter the path/filename you would like the output saved to, then press [ENTER]: "
    read outputfile
    fileoutput=$outputfile
    touch $outputfile
}
searchstring () {
    echo -n "Please enter the string you would like to search for, then press [ENTER]: "
    read searchstring
    string=$searchstring
}
codeblock() {
    for i in $(ls "${filepath}")
    do
        grep "'${string}'" "$i" | wc -l | sed "s/$/ occurance(s) in "${i}" /g" >> "${fileoutput}"
    done
}
filepath
filename
searchstring
codeblock
exit
I know there are a lot of extra variable "redirects"; I'm just practicing my scripting. Here is the error I receive when I run the script.
+ for i in '$(ls "${filepath}")'
+ grep ''\''<OutageType>'\''' *filename*.DONE.xml
+ wc -l
+ sed 's/$/ occurance(s) in *filename*.DONE.xml /g'
grep: *filename*.DONE.xml: No such file or directory
However, if I run the grep command with the wc and sed pipeline from the CLI, it works fine.
# grep '<OutageNumber>' "*filename*.DONE.xml" | wc -l | sed "s/$/ occurance(s) in "*filename*.DONE.xml" /g"
13766 occurance(s) in *filename*.DONE.xml

There are several things going wrong here.
for i in $(ls "${filepath}")
The value of filepath is *filename*.DONE.xml, and if you expect the * to be expanded there, it won't be: the expansion of a double-quoted variable is not subject to globbing, so the * is taken literally.
If you want wildcard characters to be expanded to match filename patterns,
then you cannot double-quote the variable in the command.
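For example (a minimal illustration with a hypothetical pattern):
pat='*.xml'
echo "$pat"    # quoted: prints the literal string *.xml
echo $pat      # unquoted: the shell glob-expands it to matching filenames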
Next, parsing the output of the ls command is strongly discouraged. This would be better:
for i in ${filepath}
And this still won't be "perfect", because if there are no files matching the pattern,
then grep will fail. To avoid that, you can enable the nullglob option:
shopt -s nullglob
for i in ${filepath}
Finally, I suggest eliminating the for loop and using grep directly:
grep "${string}" ${filepath} | ...
Note that the extra single quotes in your original grep "'${string}'" are a problem of their own: as the trace shows, they were searched for literally, so they should be dropped.

Related

combining all files that contain the same word into a new text file, leaving new lines between individual files

This is my first question here. I have a folder called "materials", which has 40 text files in it. I am basically trying to combine the text files that contain the word "carbon" (in both capitalized and lowercase form) into a single file, leaving newlines between them. I used grep -w carbon * to identify the files that contain the word carbon. I just don't know what to do after this point. I really appreciate all your help!
grep -il carbon materials/*.txt | while read -r line; do
    echo ">> Adding $line"
    cat "$line" >> result.out
    echo >> result.out
done
Explanation
grep searches for the string in the files. -i ignores case for the search string. -l prints only the names of the files containing the string.
The while loop iterates over each file containing the string.
cat with >> appends each file's contents to result.out.
echo >> adds a newline after each file's content is appended to result.out.
Execution
$ ls -1 materials/*.txt
materials/1.txt
materials/2.txt
materials/3.txt
$ grep -i carbon materials/*.txt
materials/1.txt:carbon
materials/2.txt:CARBON
$ grep -irl carbon materials/*txt | while read line; do echo ">> Adding $line"; cat $line >> result.out; echo >> result.out; done
>> Adding materials/1.txt
>> Adding materials/2.txt
$ cat result.out
carbon
CARBON
Try this (assuming your filenames don't contain newline characters):
grep -iwl carbon ./* |
while IFS= read -r f; do cat "$f"; echo; done > /tmp/combined
If it is possible that your filenames may contain newline characters and your shell is bash, then:
grep -iwlZ carbon ./* |
while IFS= read -r -d '' f; do cat "$f"; echo; done > /tmp/combined
grep is assumed to be GNU grep (for the -w and -Z options). Note that these will leave a trailing newline character in the file /tmp/combined.

Bad Substitution error with pdfgrep as variable?

I'm using a bash script to parse information from a PDF and use it to rename the file (with the help of pdfgrep). However, after getting it partially working, I'm receiving a "Bad Substitution" error on line 5. Any ideas on how to reformat it?
shopt -s nullglob nocaseglob
for f in *.pdf; do
    id1=$(pdfgrep -i "ID #: " "$f" | grep -oE "[M][0-9][0-9]+")
    id2=$(pdfgrep -i "Second ID: " "$f" | grep -oE "[V][0-9][0-9]+")
    $({ read dobmonth; read dobday; read dobyear; } < (pdfgrep -i "Date Of Birth: " "$f" | grep -oE "[0-9]+"))
    # Check id1 is found, else do nothing
    if [ ${#id1} ]; then
        mv "$f" "${id1}_${id2}_${printf '%02d-%02d-%04d\n' "$dobmonth" "$dobday" "$dobyear"}.pdf"
    fi
done
There are several unrelated bugs in this code; a corrected version might look like the following:
#!/usr/bin/env bash
shopt -s nullglob nocaseglob
for f in *.pdf; do
    id1=$(pdfgrep -i "ID #: " "$f" | grep -oE "[M][0-9][0-9]+") || continue
    id2=$(pdfgrep -i "Second ID: " "$f" | grep -oE "[V][0-9][0-9]+") || continue
    { read dobmonth; read dobday; read dobyear; } < <(pdfgrep -i "Date Of Birth: " "$f" | grep -oE "[0-9]+")
    printf -v date '%02d-%02d-%04d' "$dobmonth" "$dobday" "$dobyear"
    mv -- "$f" "${id1}_${id2}_${date}.pdf"
done
< (...) isn't meaningful bash syntax. If you want to redirect from a process substitution, you should use the redirection syntax < and the process substitution <(...) separately.
$(...) generates a subshell -- a separate process with its own memory, such that variables assigned in that subprocess aren't exposed to the larger shell as a whole. Consequently, if you want the contents you set with read to be visible, you can't have them be in a subshell.
${printf ...} isn't meaningful syntax. Perhaps you wanted a command substitution? That would be $(printf ...), not ${printf ...}. However, it's more efficient to use printf -v varname 'fmt' ..., which avoids the overhead of forking off a subshell altogether.
Because we put the || continues on the id1=$(... | grep ...) command, we no longer need to test whether id1 is nonempty: The continue will trigger and cause the shell to continue to the next file should the grep fail.
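As a minimal illustration of the first three points (hypothetical values):
{ read a; read b; } < <(printf '%s\n' 3 7)    # process substitution: the reads run in this shell
printf -v pair '%02d-%02d' "$a" "$b"          # printf -v assigns without forking a subshell
echo "$pair"                                  # prints 03-07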
Do what Charles suggests with respect to creating the new file name, but you might consider a different approach to parsing the PDF file, to reduce how many pdfgreps, pipes, and greps you're running on each file. I don't have pdfgrep on my system, nor do I know what your input file looks like, but if we use this input file:
$ cat file
foo
ID #: M13
foo
Date Of Birth: 05 21 1996
foo
Second ID: V27
foo
and use grep -E in place of pdfgrep, then here's how I'd get the info from the input file by reading it just once and parsing that output with awk, instead of reading it multiple times with pdfgrep and using multiple pipes and greps to extract what you need:
$ grep -E -i '(ID #|Second ID|Date Of Birth): ' file |
awk -F': +' '{f[$1]=$2} END{print f["ID #"], f["Second ID"], f["Date Of Birth"]}'
M13 V27 05 21 1996
Given that, you can use the same read approach to save the output in variables (or an array). You obviously may need to massage the awk command depending on what your pdfgrep output actually looks like.
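For instance, a sketch of capturing that single-pass output into variables with read (assuming the fields come out in the order printed above):
read id1 id2 dobmonth dobday dobyear < <(
    grep -E -i '(ID #|Second ID|Date Of Birth): ' file |
    awk -F': +' '{f[$1]=$2} END{print f["ID #"], f["Second ID"], f["Date Of Birth"]}'
)
echo "$id1 $id2 $dobmonth/$dobday/$dobyear"    # M13 V27 05/21/1996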

Duplicate the output of bash script

Below is a piece of my bash script; I want to duplicate the output of that script.
This is how my script runs
#bash check_script -a used_memory
Output is: used_memory: 812632
Desired Output: used_memory: 812632 | used_memory: 812632
get_vals() {
    metrics=`command -h $hostname -p $port -a $pass info | grep -w $opt_var | cut -d ':' -f2 > ${filename}`
}
output() {
    get_vals
    if [ -s ${filename} ];
    then
        val1=`cat ${filename}`
        echo "$opt_var: $val1"
        # rm $filename;
        exit $ST_OK;
    else
        echo "Parameter not found"
        exit $ST_UK
    fi
}
But when I used echo "$opt_var: $val1 | $opt_var: $val1", the output became: | used_memory: 812632
$opt_var is an argument.
I had a similar problem when capturing results from cat with Windows-formatted text files: a trailing carriage return (\r) in the captured value sends the cursor back to the start of the line, so everything after it overwrites the first copy. One way to circumvent this issue is to pipe your result through dos2unix, e.g.:
val1=`cat ${filename} | dos2unix`
Also, if you want to duplicate lines, you can use sed:
sed 's/^\(.*\)$/\1 | \1/'
Then pipe it to your echo command:
echo "$opt_var: $val1" | sed 's/^\(.*\)$/\1 | \1/'
The sed expression works like this:
's/<before>/<after>/' means that you want to substitute <before> with <after>
on the <before> side: ^.*$ is a regular expression meaning you get the entire line, ^\(.*\)$ is basically the same regex but you get the entire line and you capture everything (capturing is performed inside the \(\) expression)
on the <after> side: \1 | \1 means you write the 1st captured expression (\1), then the space character, then the pipe character, then the space character and then the 1st captured expression again
So it captures your entire line and duplicates it with a "|" separator in the middle.
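Putting the two fixes together, a sketch of the corrected output (assuming the stray leading | really was caused by a trailing carriage return):
val1=`cat ${filename} | dos2unix`           # strip the Windows \r while capturing
echo "$opt_var: $val1 | $opt_var: $val1"    # now duplicates cleanly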

Shell script hangs forever grepping from file with name from "read $file"

I have the shell script below, which searches for a string inside a file and returns the count. I'm not sure why it gets stuck in the middle. Can anyone please explain?
#!/bin/bash
read -p "Enter file to be searched: " $file
read -p "Enter the word you want to search for: " $word
count=$(grep -o "^${word}:" $file | wc -l)
echo "The count for `$word`: " $count
OUTPUT:
luckee#zarvis:~/scripts$ ./wordsearch.sh
Enter file to be searched: apple.txt
Enter the word you want to search for: apple
^C
read needs to be passed a variable name: file, not $file.
#!/bin/bash
read -p "Enter file to be searched: " file
read -p "Enter the word you want to search for: " word
count=$(grep -o -e "$word" "$file" | wc -l)
echo "The count for $word: $count"
What was happening previously is that your file variable was empty, so your code was running:
count=$(grep -o "^${word}:" | wc -l)
...with no input specified, so it would wait forever for stdin.
By the way -- you don't need wc for this; grep can emit a count itself, using the -c argument (also called --count in the GNU implementation). If you want that count to go by words rather than lines, you can use tr to put each word on its own line:
count=$(tr '[:space:]' '\n' <"$file" | grep -c -e "$word")
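For example, with a hypothetical apple.txt:
printf 'apple pie\napple tart apple\n' > apple.txt
tr '[:space:]' '\n' < apple.txt | grep -c -e apple    # prints 3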

Pass multiple file names captured in a variable to a command (vim)

I am trying to create a script that opens automatically any files containing a particular pattern.
This is what I achieved so far:
xargs -d " " vim < "$(grep --color -r test * | cut -d ':' -f 1 | uniq | sed ':a;N;$!ba;s/\n/ /g')"
The problem is that vim does not treat the result as a list of separate file names, but as one single long filename instead:
zsh: file name too long: ..............
Is there an easy way to achieve it? What am I missing?
The usual way to call xargs is just to pass the arguments with newlines via a pipe:
grep -Rl test * | xargs vim
Note that I'm also passing the -l argument to grep to list the files that contain my pattern.
Use this:
vim -- `grep -rIl test *`
-I skip matching in binary files
-l print file name at first match
Try to avoid xargs here, because it leads to incorrect behaviour of vim:
Vim: Warning: Input is not from a terminal
What I usually do is append the following to the command, to save the list of files and open them:
> ~/.files.txt && vim $(cat ~/.files.txt | tr "\n" " ")
For example :
grep --color -r test * > ~/.files.txt && vim $(cat ~/.files.txt | tr "\n" " ")
I have the following in my .bashrc to bind VV (twice V in uppercase) to insert that automatically :
insertinreadline() {
    READLINE_LINE=${READLINE_LINE:0:$READLINE_POINT}$1${READLINE_LINE:$READLINE_POINT}
    READLINE_POINT=`expr $READLINE_POINT + ${#1}`
}
bind -x '"VV": insertinreadline " > ~/.files.txt && vim \$(cat ~/.files.txt | tr \"\\n\" \" \")"'
