Bash script being too resource intensive

I wrote a script in bash that takes a wordlist file, checks every line it contains against another list, and outputs the non-matching lines to "uniques.txt". I found, though, that this is VERY resource intensive and takes a lot of time. As I am not a computer scientist or engineer, I don't really know what is going on at the metal level. I heard C was a great language for this kind of issue. Here's a portion of the code:
if [[ "$1" =~ ^\-i(.*)+$ ]]; then
echo "[*] Testing lines in \""$2"\" against \""$3"\"..."
for string in $(cat "$2"); do
if ! cat "$3" | grep -x "$string" &>/dev/null; then
echo "$string" >> uniques.txt
fi
done
fi
A sample use of this script would be: "$script" -i "$wordlist" "$wordlist_to_check_against".
The contents of the files would be strings with no spaces in between, one per line, as in:
johnson
peter
newyork
amsterdam

- The regex you match $1 against makes no sense. It says the first parameter should start with -i followed by anything (including an empty string) repeated at least once, which is identical to ^-i, i.e. it just starts with -i.
- in \""$2"\" is strange. It prints $2 unquoted, i.e. it can show the name wrong if it contains whitespace (e.g. a file named "a  b" with two spaces will be shown as "a b" with one).
- in $(cat "$2") means the words are read from the file one by one, i.e. if there is more than one word per line in $2, they will be matched separately.
You can use grep -f to read the comparison strings from a file and avoid the loop that causes the slowness. Adding -F treats them as fixed strings rather than regexes, -x matches whole lines only, and -v inverts the match, leaving the lines of "$2" that never appear in "$3":
#! /bin/bash
if [[ $1 =~ ^-i ]]; then
    echo "[*] Testing lines in \"$2\" against \"$3\"..."
    grep -vxFf "$3" "$2" > uniques.txt
fi
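A quick sketch of the expected behaviour, using hypothetical file names (candidates.txt holding the four sample words, known.txt holding only peter and amsterdam):

./script.sh -i candidates.txt known.txt
cat uniques.txt    # prints: johnson, newyork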

Related

If a bash command accepts a filename as the last arg, can piping be used to provide file content?

Say a command takes a filename as its last argument:
count-words "$word" file.txt
Is there a way to use pipe in order to provide file content rather than writing to a temp file?
I'm not sure if I'm understanding your requirement correctly, but you may be in the situation that:
- You have a program (say "generate-words") which prints text to stdout.
- You also have a program "count-words" which counts a given word in a specified text file.
- You can combine the two programs by writing the output of "generate-words" to a temporary file file.txt.
- But you want a solution that does not write to a temporary file.
If the assumptions above are correct, please try:
count-words "$word" <(generate-words)
where <(command) is called process substitution: it connects the output (stdout) of the command to another program which requires a filename as an input.
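As an aside, process substitution works with any command that expects file name arguments, not just this one. An illustrative example with made-up file names:

diff <(sort a.txt) <(sort b.txt)    # compare two files ignoring line order

Each <(...) expands to a path such as /dev/fd/63 which the command can open and read like an ordinary file.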
Each argument, regardless of position, is a character string which the program itself interprets once bash has launched it; bash can't intervene in that.
Individual programs may provide an option to read from standard input, or do so by default, but if the program doesn't do that you need to point it at a file on the file system.
Many programs will accept a single hyphen (-) as an instruction to use standard input. Consider, given how you're invoking your example command, the following:
cat file.txt | grep -Fow "$word" - | wc -l
This counts instances of $word in the given file. The -F option speeds up the search by disabling regular expressions (so . matches only a literal dot), -o makes the output show just the matches (one per line), and -w requires a word boundary on both sides of the word (so foo will not match food; remove this flag to change that). wc -l gives you the count of the lines output by grep, which is the number of instances of $word. (I didn't use grep -c because that counts lines with matches, so foo bar foo baz on one line would be counted just once.)
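To make the behaviour concrete, here is a small illustration with a made-up file:

# sample.txt contains the single line: foo food foo bar
word=foo
cat sample.txt | grep -Fow "$word" - | wc -l    # prints 2; "food" fails the -w test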
If count-words is your script, consider one of these options:
# imply standard input when given insufficient arguments
# or when the only argument is a hyphen
# (requires you to `shift` your options and the query term)
if [ "$#" = "0" ] || [ "$*" = "-" ]; then
    set -- /dev/stdin
fi

# convert hyphen(s) to /dev/stdin within the argument list
FIRST=1
for OPT in "$@"; do
    if [ "$FIRST" = 1 ]; then
        unset FIRST
        set --    # empty the argument list before rebuilding it
    fi
    if [ "$OPT" = "-" ]; then
        OPT="/dev/stdin"
    fi
    set -- "$@" "$OPT"    # append the (possibly rewritten) argument
done
This will let you run as any of
count-words "$word" < file.txt
get-input | count-words "$word"
get-input | count-words "$word" -
get-input | count-words "$word" /dev/stdin
echo "$(get-input)" | count-words "$word"
count-words "$word" <<<"list of words as if echoed"
count-words "$word" <(get-input)
The last three are bashisms and will not work in dash or other simpler /bin/sh programs. The very last command tells count-words to use process substitution to supply a named pipe as a temporary filehandle that stores the output of get-input.
As Gordon Davisson mentioned in the comment, you could use the special file /dev/stdin which represents standard input.
some-command | count-words "$word" /dev/stdin

grep, else print message for no matches

In a bash script, I have a list of lines in a file I wish to grep and then display on standard out, which is easiest done with a while read:
grep "regex" "filepath" | while read line; do
printf "$line\n"
done
However, I would like to inform the user if no lines were matched by the grep. I know that one can do this by updating a variable inside the loop but it seems like a much more elegant approach (if possible) would be to try to read a line in an until loop, and if there were no output, an error message could be displayed.
This was my first attempt:
grep "regex" "filepath" | until [[ -z ${read line} ]]; do
if [[ -z $input ]]; then
printf "No matches found\n"
break
fi
printf "$line\n"
done
But in this instance the read command is malformed, and I wasn't sure of another way the phrase the query. Is this approach possible, and if not, is there a more suitable solution to the problem?
You don't need a loop at all if you simply want to display a message when there's no match. Instead you can use grep's return code. A simple if statement will suffice:
if ! grep "regex" "filepath"; then
    echo "no match" >&2
fi
This will display the matching lines (since that's grep's default behavior), and will print the error message if there are none.
A popular alternative to if ! is to use the || operator. foo || bar can be read as "do foo or else do bar", or "if not foo then bar".
grep "regex" "filepath" || echo "no match" >&2
John Kugelman's answer is the correct and succinct one and you should accept it. I am addressing your question about syntax here just for completeness.
You cannot use ${read line} to execute read -- the brace syntax actually means (vaguely) that you want the value of a variable whose name contains a space. Perhaps you were shooting for $(read line) but really, the proper way to write your until loop would be more along the lines of
grep "regex" "filepath" | until read line; [[ -z "$line" ]]; do
... but of course, when there is no output, the pipeline will receive no lines, so while and until are both wrong here.
It is worth emphasizing that the reason you need a separate do is that you can have multiple commands in there. Even something like
while output=$(grep "regex" filepath); echo "grep done, please wait ...";
      count=$(echo "$output" | wc -l); [[ $count -gt 0 ]]
do ...
although again, that is much more arcane than you would ever really need. (And in this particular case, you would probably actually want if, not while.)
As others already noted, there is no reason to use a loop like that here, but I wanted to sort out the question about how to write a loop like this for whenever you actually do want one.
As mentioned by @jordanm, there is no need for a loop in the use case you mentioned.
output=$(grep "regex" "file")
if [[ -n $output ]]; then
    echo "$output"
else
    echo "Sorry, no results..."
fi
If you need to iterate over the results for processing (rather than just displaying to stdout) then you can do something like this:
output=$(grep "regex" "file")
if [[ -n $output ]]; then
    while IFS= read -r line; do
        # do something with $line
    done <<< "$output"
else
    echo "Sorry, no results..."
fi
This method avoids using a pipeline or subshell so that any variable assignments made within the loop will be available to the rest of the script.
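For instance (an illustrative sketch with a made-up counter), a variable incremented inside the loop is still visible afterwards because everything runs in the current shell:

count=0
output=$(grep "regex" "file")
if [[ -n $output ]]; then
    while IFS= read -r line; do
        ((count++))    # runs in the current shell, not a subshell
    done <<< "$output"
fi
echo "$count matching lines processed"    # count keeps its value here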
Also, I'm not sure if this relates to what you are trying to do at all, but grep does have the ability to load patterns from a file (one per line). It is invoked as follows:
grep -f pattern_file.txt file_to_search
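For example (hypothetical files), if pattern_file.txt contains error and warning on separate lines:

grep -f pattern_file.txt server.log     # lines matching any of the patterns
grep -Fxf pattern_file.txt server.log   # whole-line, fixed-string matches only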

Script to call either from file or user input

I'm trying to write a small script that either takes input from a file or from the user, then gets rid of any blank lines in it.
I'm trying to make it so that if no file name is specified it will prompt the user for input. Also, is it better to write the manual input to a file and then process it, or to store it in a variable?
So far I have this, but when I run it with a file it gives one line of error before returning the output I want. The error says ./deblank: line 1: [blank_lines.txt: command not found
if [$# -eq "$NO_ARGS"]; then
    cat > temporary.txt; sed '/^$/d' <temporary.txt
else
    sed '/^$/d' <$#
fi
Where am I going wrong?
You need spaces around [ and ]. In bash, [ is a command and you need spaces around it for bash to interpret it so.
You can also check for the presence of arguments by using (( ... )). So your script could be rewritten as:
if ((!$#)); then
    cat > temporary.txt; sed '/^$/d' <temporary.txt
else
    sed '/^$/d' "$@"
fi
If you want to use only the first argument, then you need to say $1 (and not $#).
Try using this:
if [ $# -eq 0 ]; then
    cat > temporary.txt; sed '/^$/d' <temporary.txt
else
    cat "$@" | sed '/^$/d'
fi
A space is needed between [ and $#, and your usage of $# is not good: $# represents the number of arguments, and -eq is used to compare numeric values.
There are multiple problems here:
- You need to leave a space between the square brackets [ ] and the variables.
- -eq is for comparing integers; for a string comparison use == instead.
- When using == it is safest to use double square brackets [[ ]].
So the code should look like:
if [[ "$#" == "$NO_ARGS" ]]; then
cat > temporary.txt; sed '/^$/d' <temporary.txt
else
sed '/^$/d' <$#
fi
Or else use "$1" if you only want the first argument.
Instead of forcing user input to a file, I'd force the given file to stdin:
#!/bin/bash
if [[ $1 && -r $1 ]]; then
    # it's a file
    exec 0<"$1"
elif ! tty -s; then
    : # input is piped from stdin
else
    # get input from user
    echo "No file specified, please enter your input, ctrl-D to end"
fi
# now, let sed read from stdin
sed '/^$/d'
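Assuming the script is saved as deblank (the name from the question), all three input styles then work:

./deblank blank_lines.txt          # read from the named file
cat blank_lines.txt | ./deblank    # read from a pipe
./deblank                          # prompts and reads typed input until ctrl-D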

bash sed fail in while loop

#!/bin/bash
fname=$2
rname=$1
echo "$(<$fname)" | while read line ; do
    result=`echo "$(<$rname)" | grep "$line"; echo $?`
    if [ $result != 0 ]
    then
        sed '/$line/d' $fname > newkas
    fi 2> /dev/null
done
Hi all, I am new to bash.
I have two lists, one older than the other. I wish to compare the names in 'fname' against 'rname'. 'result' holds the standard output I get if the name is still present in 'rname'; if it is not, I get a non-zero status.
I then use sed to delete that line and route the rest to a new file.
I have tried the code part by part and it works until I add the while loop; sed doesn't seem to work, as the final output 'newkas' is the same as the initial input 'fname'.
Is my method wrong, or did I miss out any parts?
Part 1: What's wrong
The reason your sed expression "doesn't work" is that you used single quotes. You said
sed '/$line/d' $fname > newkas
Supposing fname='input.txt' and line='example text', this will expand to:
sed '/$line/d' input.txt > newkas
Note that $line is still literally present. This is because bash will not interpolate variables inside single quotes, thus sed sees the $ literally.
You could fix this by saying
sed "/$line/d/" $fname > newkas
Because inside double quotes the variable will expand. However, if your sed expression becomes more complicated you could run into difficulty in cases where bash interprets things which you intended to be interpreted by sed. I tend to use the form
sed '/'"$line"'/d/' $fname > newkas
Which is a bit harder to read but, if you look carefully, single-quotes everything I intend to be part of the sed expression and double quotes the variable I want to expand.
Part 2: How to improve it
Your script contains a number of things which could be improved.
echo "$(<$fname)" | while read line ; do
:
done
In the first place you're reading the file with "$(<$fname)" when you could just redirect the stdin of the while loop. This is a bit redundant, but more importantly you're piping to while, which creates an extra subshell and means you can't modify any variables from the enclosing scope. Better to say
while IFS= read -r line ; do
    :
done < "$fname"
Next, consider your grep
echo "$(<$rname)" | grep "$line"
Again you're reading the file and echoing it to grep. But, grep can read files directly.
grep "$line" "$rname"
Afterwards you echo the return code and check its value in an if statement, which is a classic useless construct.
result=$( grep "$line" "$rname" ; echo $?)
Instead you can just pass grep directly to if, which will test its return code.
if grep -q "$line" "$rname" ; then
    sed "/$line/d" "$fname" > newkas
fi
Note here that I have quoted $fname, which is important if it might ever contain a space. I have also added -q to grep, which suppresses its output.
There's now no need to suppress error messages from the if statement here, because we don't have to worry about $result containing an unusual value or grep not returning properly.
The final result is this script
while IFS= read -r line ; do
    if grep -q "$line" "$rname" ; then
        sed "/$line/d" "$fname" > newkas
    fi
done < "$fname"
Which will not work, because newkas is overwritten on every iteration; in the end only the deletion performed for the last matching line survives. Instead you could say:
cp "$fname" newkas
while IFS= read -r line ; do
if grep -q "$line" "$rname" ; then
sed -i '' "/$line/d" newkas
fi
done < "$fname"
Which, I believe, will do what you expect.
Part 3: But don't do that
But this is all tangential to solving your actual problem. It appears to me that you want to simply create a file newkas which contains all the lines of $fname except those that appear in $rname. This is easily done with the comm utility:
comm -2 -3 <(sort "$fname") <(sort "$rname") > newkas
This also changes the sort order of the lines, which may not be good for you. If you want to do it without changing the ordering then using the method @fge suggests is best.
grep -F -v -x -f "$rname" "$fname"
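A quick illustration of the difference, with made-up data:

printf '%s\n' c a b > fname.txt    # hypothetical contents
printf '%s\n' b > rname.txt
comm -2 -3 <(sort fname.txt) <(sort rname.txt)   # prints a, c in sorted order
grep -F -v -x -f rname.txt fname.txt             # prints c, a in original order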
If I understand your need correctly, you want a file newkas which contains the lines in $fname which are also in $rname.
If this is what you want, using sed is overkill. Use fgrep:
fgrep -x -f "$rname" "$fname" > newkas
Also, there are problems with your script:
- you capture the output of grep in result, which means it will never be exactly 0; what you want is to execute the command and simply check $?;
- your echoes are convoluted: just do grep whatever thefilename, or while ... done <thefile;
- finally, you take the line as is from the source file: the line can potentially be a regex, which means you will try to match a regex in $rname, which may lead to unexpected results.
And others.

Execute command after checking file type

I am working on a bash script which executes a command depending on the file type. I want to use the "file" utility rather than the file extension to determine the type, but I am bloody new to this scripting stuff, so if someone can help me I would be very thankful!
Here is the script I want to add the function to:
#!/bin/bash
export PrintQueue="/root/xxx";
IFS=$'\n'
for PrintFile in $(/bin/ls -1 ${PrintQueue}); do
    lpr -r ${PrintQueue}/${PrintFile};
done
The point is, all files which are PDFs should be printed with the lpr command, and all others with ooffice -p.
You are going through a lot of extra work. Here's the idiomatic code, I'll let the man page provide the explanation of the pieces:
#!/bin/sh
for path in /root/xxx/* ; do
    case `file --brief $path` in
        PDF*) cmd="lpr -r" ;;
        *)    cmd="ooffice -p" ;;
    esac
    eval $cmd \"$path\"
done
Some notable points:
- using sh instead of bash increases portability and narrows the choices of how to do things
- don't use ls when a glob pattern will do the same job with less hassle
- the case statement has surprising power
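As a small aside on that last point, case patterns are shell globs, and several patterns can share one branch with |; an illustrative sketch:

case $answer in
    [Yy]|[Yy]es) echo "confirmed" ;;    # two patterns, one branch
    *)           echo "aborted" ;;
esac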
First, two general shell programming issues:
- Do not parse the output of ls. It's unreliable and completely useless. Use wildcards, they're easy and robust.
- Always put double quotes around variable substitutions, e.g. "$PrintQueue/$PrintFile", not $PrintQueue/$PrintFile. If you leave the double quotes out, the shell performs wildcard expansion and word splitting on the value of the variable. Unless you know that's what you want, use double quotes. The same goes for command substitutions $(command).
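A two-line illustration of the quoting point, with a hypothetical file name:

f="my report.pdf"
lpr -r $f      # wrong: lpr receives two arguments, "my" and "report.pdf"
lpr -r "$f"    # right: one argument, the actual file name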
Historically, implementations of file have had different output formats, intended for humans rather than parsing. Most modern implementations have an option to output a MIME type, which is easily parseable (here -i selects MIME output and -b omits the leading file name, so the output starts with the type itself).
#!/bin/bash
print_queue="/root/xxx"
for file_to_print in "$print_queue"/*; do
    case "$(file -bi "$file_to_print")" in
        application/pdf\;*|application/postscript\;*)
            lpr -r "$file_to_print";;
        application/vnd.oasis.opendocument.*)
            ooffice -p "$file_to_print" &&
            rm "$file_to_print";;
        # and so on
        *) echo 1>&2 "Warning: $file_to_print has an unrecognized format and was not printed";;
    esac
done
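For reference, the strings being matched look roughly like this; exact wording varies between versions of file, so treat these outputs as an assumption:

file -bi report.pdf    # prints e.g.: application/pdf; charset=binary
file -bi letter.odt    # prints e.g.: application/vnd.oasis.opendocument.text; charset=binary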
#!/bin/bash
PRINTQ="/root/docs"
OLDIFS=$IFS
IFS=$(echo -en "\n\b")
for file in $(ls -1 $PRINTQ)
do
    type=$(file --brief "$PRINTQ/$file" | awk '{print $1}')
    if [ "$type" == "PDF" ]
    then
        echo "[*] printing $file with LPR"
        lpr "$PRINTQ/$file"
    else
        echo "[*] printing $file with OPEN-OFFICE"
        ooffice -p "$PRINTQ/$file"
    fi
done
IFS=$OLDIFS
