Bash grep, awk or sed to reverse find - bash

I am creating a script to look for commonly used patterns in a password. Although I have security policies in the hosting panel, the servers have become outdated due to incompatibilities.
For example, I put the word test into the file words.txt and then execute grep -c test123 words.txt. I need that search to find a match, but I don't think plain grep will do what I want here.
Script:
EMAILPASS=$(/root/info.sh -c usera | grep '@')
for PAR in ${EMAILPASS} ; do
    EMAIL=$(echo "${PAR}" | grep '@' | cut -f1 -d:)
    PASS=$(echo "${PAR}" | cut -d: -f 2)
    PASS="${PASS,,}"                                   # lowercase the password
    FINDSTRING=$(grep -ic "${PASS}" /root/words.txt)
    echo -e ""
    echo -e "Validating password ${EMAIL}"
    echo -e ""
    if [ "$FINDSTRING" -ge 1 ] ; then
        echo "Insecure"
    else
        echo "Secure"
    fi
done
The current output of the command is as follows:
# grep -c test123 /root/words.txt
0
I think grep is not the right tool for what I need; maybe someone can help me.
I could also use awk or sed, but I can't find an option that helps me.
Regards.

Reverse your application.
echo test123 | grep -f words.txt
Each line of the text file will be used as a pattern to test against the input.
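For a single password, the check could look like this (just a sketch, assuming words.txt holds one lowercase word per line); -q keeps grep quiet, and the exit status tells you whether any word from the file occurs inside the password:
pass="test123"                                   # example password
if printf '%s\n' "$pass" | grep -qif /root/words.txt ; then
    echo "Insecure: contains a dictionary word"
else
    echo "Secure, at least by this test"
fi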
edit
Apparently you actually do want to see if the whole password is an actual word, rather than just checking whether it is based on a dictionary word. That's considerably less secure, but easy enough to do. The logic you have will not report test123 as insecure unless the whole password is an exact match for a word in the dictionary.
You said you were putting test in the dictionary and using test123 as the password, so I assumed you were looking for passwords based on dictionary words, which was the structure I suggested above. I will include it as a commented alternate line below.
Also, since you're doing a case-insensitive search, why bother to downcase the password?
declare -l pass # set as always lowercase
would do it, but there's no need.
Likewise, unless you are using it again later, it isn't necessary to put everything into a variable first, such as the grep results. Try to remove anything not needed -- less is more.
Finally, since we aren't catching the grep output in a variable and testing that, I threw it away with -q. All we need to see is whether it found anything, and the return code, checked by the if, tells us that.
/root/info.sh -c usera | grep '@' |             # only lines with at signs
while IFS="$IFS:" read -r email pass            # parse on read with IFS
do  printf "\n%s\n\n" "Validating password for '$email'"
    if grep -qi "$pass" /root/words.txt         # exact search (-q = quiet)
    #if grep -qif /root/words.txt <<< "$pass"   # 'based on' search
    then echo "Insecure"
    else echo "Secure"                          # well....
    fi
done
I think a better paradigm might be to just report the problematic ones and be silent for those that seem ok, but that's up to you.
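A sketch of that quieter variant, using the same pipeline as above:
/root/info.sh -c usera | grep '@' |
while IFS="$IFS:" read -r email pass
do  grep -qi "$pass" /root/words.txt && echo "WEAK password for '$email'"
done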
Questions?

Related

linux bash insert text at a variable line number in a file

I'm trying to temporarily disable dhcp on all connections in a computer using bash, so I need the process to be reversible. My approach is to comment out lines that contain BOOTPROTO=dhcp, and then insert a line below it with BOOTPROTO=none. I'm not sure of the correct syntax to make sed understand the line number stored in the $insertLine variable.
fileList=$(ls /etc/sysconfig/network-scripts | grep ^ifcfg)
path="/etc/sysconfig/network-scripts/"
for file in $fileList
do
    echo "looking for dhcp entry in $file"
    if [ $(cat $path$file | grep ^BOOTPROTO=dhcp) ]; then
        echo "disabling dhcp in $file"
        editLine=$(grep -n ^BOOTPROTO=dhcp /$path$file | cut -d : -f 1 )
        #comment out the original dhcp value
        sed -i "s/BOOTPROTO=dhcp/#BOOTPROTO=dhcp/g" $path$file
        #insert a line below it with value of none.
        ((insertLine=$editLine+1))
        sed "$($insertLine)iBOOTPROTO=none" $path$file
    fi
done
Any help using sed or other stream editor greatly appreciated. I'm using RHEL 6.
The sed editor should be able to do the job on its own, without having to combine bash, grep, cat, etc. It's easier to test and more reliable.
The whole script can be simplified to the version below. It performs all operations (the substitution and the insert) in a single pass, using multiple sed scriptlets.
#! /bin/sh
for file in $(grep -l "^BOOTPROTO=dhcp" /etc/sysconfig/network-scripts/ifcfg*) ; do
    sed -i -e "s/BOOTPROTO=dhcp/#BOOTPROTO=dhcp/g" -e "/BOOTPROTO=dhcp/i BOOTPROTO=none" $file
done
As a side note, consider NOT using path as a variable name, to avoid possible confusion with the PATH environment variable.
To write it up: your attempt fails with the following line:
sed "$($insertLine)iBOOTPROTO=none" $path$file
because:
$($insertLine) wraps $insertLine in a command substitution; when $insertLine is expanded it yields a number, which is not a command, so the command substitution generates an error.
Your call to sed does not include the -i option, so it does not edit the file $path$file in place.
You can correct the issues with:
sed -i "${insertLine}i BOOTPROTO=none" $path$file
This is just sed -i (edit in place) and Ni, where N is the number of the line at which to insert, followed by the content to insert, and finally the file to insert it in. You wrap insertLine in ${...} to protect the variable name from the i that follows, and the whole expression is double-quoted to allow variable expansion.
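For illustration, here is your loop with just those fixes applied and the test tightened to grep -q (a sketch that keeps the rest of your logic as posted):
fileList=$(ls /etc/sysconfig/network-scripts | grep ^ifcfg)
path="/etc/sysconfig/network-scripts/"
for file in $fileList
do
    if grep -q "^BOOTPROTO=dhcp" "$path$file" ; then
        echo "disabling dhcp in $file"
        editLine=$(grep -n "^BOOTPROTO=dhcp" "$path$file" | cut -d : -f 1)
        # comment out the original dhcp value
        sed -i "s/BOOTPROTO=dhcp/#BOOTPROTO=dhcp/g" "$path$file"
        # insert a line below it with a value of none
        insertLine=$((editLine + 1))
        sed -i "${insertLine}i BOOTPROTO=none" "$path$file"
    fi
done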
Let me know if you have any further questions.
(and see dash-o's answer for refactoring the whole thing to simply use sed to make the change without spawning 10 other subshells)

How to get line WITH tab character using tail and head

I have made a script to practice my Bash, only to realize that this script does not take tabulation into account, which is a problem since it is designed to find and replace a pattern in a Python script (which obviously needs tabulation to work).
Here is my code. Is there a simple way to get around this problem?
pressure=1
nline=$(cat /myfile.py | wc -l) # find the line length of the file
echo $nline
for ((c=0;c<=${nline};c++))
do
    res=$( tail -n $(($(($nline+1))-$c)) myfile.py | head -n 1 | awk 'gsub("="," ",$1){print $1}' | awk '{print$1}')
    #echo $res
    if [ $res == 'pressure_run' ]
    then
        echo "pressure_run='${pressure}'" >> myfile_mod.py
    else
        echo $( tail -n $(($nline-$c)) myfile.py | head -n 1) >> myfile_mod.py
    fi
done
Basically, it finds the line that has pressure_run=something and replaces it by pressure_run=$pressure. The rest of the file should be untouched. But in this case, all tabulation is deleted.
If you want to just do the replacement as quickly as possible, sed is the way to go as pointed out in shellter's comment:
sed "s/\(pressure_run=\).*/\1$pressure/" myfile.py
For Bash training, as you say, you may want to loop manually over your file. A few remarks for your current version:
Is /myfile.py really in the root directory? Later, you don't refer to it at that location.
cat ... | wc -l is a useless use of cat and better written as wc -l < myfile.py.
Your for loop is executed one more time than you have lines.
To get the next line, you do "show me all lines, but counting from the back, don't show me c lines, and then show me the first line of these". There must be a simpler way, right?
To get the left-hand side of an assignment, you say "in the first space-separated field, replace = with a space, then show me the first space-separated field of the result". There must be a simpler way, right? This is, by the way, where you strip out the leading tabs (your first awk command does it).
To print the unchanged line, you do the same complicated thing as before.
A band-aid solution
A minimal change that would get you the result you want would be to modify the awk command: instead of
awk 'gsub("="," ",$1){print $1}' | awk '{print$1}'
you could use
awk -F '=' '{ print $1 }'
"Fields are separated by =; give me the first one". This preserves leading tabs.
The replacements have to be adjusted a little bit as well; you now want to match something that ends in pressure_run:
if [[ $res == *pressure_run ]]
I've used the more flexible [[ ]] instead of [ ] and added a * to pressure_run (which must not be quoted): "if $res ends in pressure_run, then..."
The replacement has to use $res, which has the proper amount of tabs:
echo "$res='${pressure}'" >> myfile_mod.py
Instead of appending each line each loop (and opening the file each time), you could just redirect output of your whole loop with done > myfile_mod.py.
This writes the value wrapped in literal single quotes (e.g., pressure_run='1'), just as your version does. If you want the bare value of $pressure instead, remove the single quotes (the braces aren't needed here, but they don't hurt):
echo "$res=$pressure" >> myfile_mod.py
This fixes your example, but it should be pointed out that enumerating lines and then getting one at a time with tail | head is a really bad idea. You traverse the file for every single line twice, it's very error prone and hard to read. (Thanks to tripleee for suggesting to mention this more clearly.)
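Putting those band-aid changes together, the loop would look something like this (still the tail | head approach, kept only to mirror your structure; the else branch runs tail | head directly so unchanged lines keep their tabs too):
pressure=1
nline=$(wc -l < myfile.py)               # number of lines, without the useless cat
for ((c=0; c<nline; c++))                # exactly one iteration per line
do
    res=$(tail -n $((nline-c)) myfile.py | head -n 1 | awk -F '=' '{ print $1 }')
    if [[ $res == *pressure_run ]]
    then
        echo "$res=$pressure"
    else
        tail -n $((nline-c)) myfile.py | head -n 1   # print the line unchanged
    fi
done > myfile_mod.py                     # one redirection instead of many appends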
A proper solution
This all being said, there are preferred ways of doing what you did. You essentially loop over a file, and if a line matches pressure_run=, you want to replace what's on the right-hand side with $pressure (or the value of that variable). Here is how I would do it:
#!/bin/bash
pressure=1
# Regular expression to match lines we want to change
re='^[[:space:]]*pressure_run='
# Read lines from myfile.py
while IFS= read -r line; do
    # If the line matches the regular expression
    if [[ $line =~ $re ]]; then
        # Replace the line with what we matched (whitespace included), followed by the value of $pressure
        line="${BASH_REMATCH[0]}"$pressure
    fi
    # Print the (potentially modified) line
    echo "$line"
# Read from myfile.py, write to myfile_mod.py
done < myfile.py > myfile_mod.py
For a test file that looks like
blah
test
pressure_run=no_tab
blah
something
	pressure_run=one_tab
		pressure_run=two_tabs
the result is
blah
test
pressure_run=1
blah
something
	pressure_run=1
		pressure_run=1
Recommended reading
How to read a file line-by-line (explains the IFS= and -r business, which is quite essential to preserve whitespace)
BashGuide

BASH Palindrome Checker

This is my first time posting on here so bear with me please.
I received a bash assignment but my professor is completely unhelpful and so are his notes.
Our assignment is to filter and print out palindromes from a file. In this case, the file is:
/usr/share/dict/words
The word lengths range from 3 to 45, and I'm supposed to filter for lowercase letters only (the given dictionary has special characters and uppercase letters as well as lowercase ones), e.g. "-dkas-das"; so something like "q-evvavve-q" might count as a palindrome, but I shouldn't get that as a proper result.
Anyway, I can get it to filter words of a given length and return them (though not filtering for lowercase only):
grep "^...$" /usr/share/dict/words |
grep "\(.\).\1"
And I can use subsequent lines for 5-letter words, 7-letter words, and so on:
grep "^.....$" /usr/share/dict/words |
grep "\(.\)\(.\).\2\1"
But the prof does not want that. We are supposed to use a loop. I get the concept but I don't know the syntax, and like I said, the notes are very unhelpful.
What I tried was setting variables x=... and y=.. and in a while loop, having x=$x$y but that didn't work (syntax error) and neither did x+=..
Any help is appreciated. Even getting my non-lowercase letters filtered out.
Thanks!
EDIT:
If you're providing a solution or a hint to a solution, the simplest method is preferred.
Preferably one that uses 2 grep statements and a loop.
Thanks again.
Like this:
for word in `grep -E '^[a-z]{3,45}$' /usr/share/dict/words`;
do [ $word == `echo $word | rev` ] && echo $word;
done;
Output using my dictionary:
aha
bib
bob
boob
...
wow
Update
As pointed out in the comments, reading in most of the dictionary into a variable in the for loop might not be the most efficient, and risks triggering errors in some shells. Here's an updated version:
grep -E '^[a-z]{3,45}$' /usr/share/dict/words | while read -r word;
do [ $word == `echo $word | rev` ] && echo $word;
done;
Why use grep? Bash will happily do that for you:
#!/bin/bash
is_pal() {
    local w=$1
    while (( ${#w} > 1 )); do
        [[ ${w:0:1} = ${w: -1} ]] || return 1
        w=${w:1:-1}
    done
}
while read word; do
    is_pal "$word" && echo "$word"
done
Save this as banana, chmod +x banana and enjoy:
./banana < /usr/share/dict/words
If you only want to keep the words with at least three characters:
grep ... /usr/share/dict/words | ./banana
If you only want to keep the words that only contain lowercase and have at least three letters:
grep '^[[:lower:]]\{3,\}$' /usr/share/dict/words | ./banana
The multiple greps are wasteful. You can simply do
grep -E '^([a-z])[a-z]\1$' /usr/share/dict/words
in one fell swoop, and similarly, put the expressions on grep's standard input like this:
echo '^([a-z])[a-z]\1$
^([a-z])([a-z])\2\1$
^([a-z])([a-z])[a-z]\2\1$' | grep -E -f - /usr/share/dict/words
However, regular grep does not permit backreferences beyond \9. With grep -P you can use double-digit backreferences, too.
The following script constructs the entire expression in a loop. Unfortunately, grep -P does not allow for the -f option, so we build a big thumpin' variable to hold the pattern. Then we can actually also simplify to a single pattern of the form ^(.)(?:.|(.)(?:.|(.)....\3)?\2)?\1$, except we use [a-z] instead of . to restrict to just lowercase.
head=''
tail=''
for i in $(seq 1 22); do
head="$head([a-z])(?:[a-z]|"
tail="\\$i${tail:+)?}$tail"
done
grep -P "^${head%|})?$tail$" /usr/share/dict/words
The single grep should be a lot faster than individually invoking grep 22 or 43 times on the large input file. If you want to sort by length, just add that as a filter at the end of the pipeline; it should still be way faster than multiple passes over the entire dictionary.
The expression ${tail:+)?} evaluates to a closing parenthesis and question mark only when tail is non-empty, which is a convenient way to force the \1 back-reference to be non-optional. Somewhat similarly, ${head%|} trims the final alternation operator from the ultimate value of $head.
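For illustration, running the loop with seq 1 2 instead of seq 1 22 should expand to roughly this command, which matches palindromes of two to five lowercase letters:
grep -P '^([a-z])(?:[a-z]|([a-z])(?:[a-z])?\2)?\1$' /usr/share/dict/words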
Ok here is something to get you started:
I suggest using the plan you have above; just generate the number of "." characters with a for loop.
This question will explain how to make a for loop from 3 to 45:
How do I iterate over a range of numbers defined by variables in Bash?
for i in {3..45};
do
* put your code above here *
done
Now you just need to figure out how to make "i" number of dots "." in your first grep and you are done.
Also, look into sed, it can nuke the non-lowercase answers for you..
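If generating the dots is the part you get stuck on, one possible trick is printf, which repeats its format once per argument (just a sketch; plug your second grep in where it fits):
for i in {3..45}
do
    dots=$(printf '.%.0s' $(seq 1 "$i"))    # e.g. i=3 gives "..."
    grep "^${dots}$" /usr/share/dict/words  # words of exactly i characters
done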
Another solution that uses a Perl-compatible regular expression (PCRE) with recursion, heavily inspired by this answer:
grep -P '^(?:([a-z])(?=[a-z]*(\1(?(2)\2))$))++[a-z]?\2?$' /usr/share/dict/words

Grep $value `grep $value2 `<command>`` - Nested grep?

I'm a complete noob at awk/sed so forgive me if I'm missing something obvious here.
Basically I'm trying to do a nested grep, i.e. something akin to:
grep $value `exim -Mvh $(`exim -bpru | grep $eximID | more`)`
Breakdown:
grep $value IN COMMAND
--> exim -Mvh (print exim mail headers) FROM RESULTS OF
---> exim -bpru | grep $eximID | more
$value is the string I'm looking for
$eximID is the string I'm looking for within exim -bpru (list all exim thingies)
No idea if what I'm trying to accomplish would be easier with awk/sed hence the question really.
I tried to make that as legible as possible but nested nesting is hard yo
Edit
Tada! My script is now working thanks to you guys! Here it is, unfinished, but working:
#!/usr/bin/bash
echo "Enter the email address you want to search for + compare sender info via exim IDs."
read searchTarget
echo "Enter the target domain the email is coming from."
read searchDomain
# domainList is an array holding the list of exim IDs needed
domainList=($(exim -bpru | grep "$searchDomain" | awk '{ print $3 }'))
for i in "${domainList[@]}"
do
    echo "$(exim -Mvh $i | grep $searchTarget)"
    #echo "$(grep $searchTarget $(exim -Mvh $i))"
done
grep $value `exim -Mvh $(`exim -bpru | grep $eximID | more`)`
This isn't right. The backticks (`command`) and $(command) do the same thing, it's just an alternative syntax. The advantage of using $() is that it's better nestable, so it's a good habit to always use that.
So, let's fix this, we now end up with:
grep "$value" "$(exim -Mvh "$(exim -bpru | grep "$eximID")")" | more
I relocated the more command, for what I think will be obvious reasons: more just paginates data for the user, and feeding the output of more into something else almost never makes sense.
I've also quoted the variables; this is also a good habit, because otherwise things will break when certain characters appear in your variable (the most common one being a space).
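A quick illustration of what goes wrong without the quotes (the value and file name here are made up):
value="postmaster bounce"
grep $value mail.log       # word-splits: grep postmaster bounce mail.log -- "bounce" is treated as a file
grep "$value" mail.log     # searches mail.log for the whole string "postmaster bounce"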
I can't test if this gives you the output you want, if it doesn't, then update your answer with a few lines of example data, and the expected output.
If you're going to do it with back-quotes (not recommended; it is hard work), then you have to write:
grep $value `exim -Mvh $(\`exim -bpru | grep $eximID\`)`
(where I've removed the more since when used like that it behaves like cat and there's no point in using cat at the end of the commands like that either).
It would be more sane to use the $(…) notation throughout:
grep $value $(exim -Mvh $( $(exim -bpru | grep $eximID)))
And it seems more plausible that you don't need quite that many sets of indirection and this is what you're really after:
grep $value $(exim -Mvh $(exim -bpru | grep $eximID))
You should look at:
Why didn't back quotes in a shell script help me cd to a directory?
What is the benefit of using $(…) instead of back ticks in shell scripts?
Why does \$ reduce to $ inside backquotes [though not inside $(…)]?
and no doubt there are other related questions too.

How to calculate a hash for a string (url) in bash for wget caching

I'm building a little tool that will download files using wget, reading the urls from different files. The same url may be present in different files; the url may even be present in one file several times. It would be inefficient to download a page several times (every time its url is found in the list(s)).
Thus, the simple approach is to save the downloaded file and to instruct wget not to download it again if it is already there.
That would be very straightforward; however the urls are very long (many many GET parameters) and therefore cannot be used as such for filenames (wget gives the error 'Cannot write to... [] file name too long').
So, I need to rename the downloaded files. But for the caching mechanism to work, the renaming scheme needs to implement "one url <=> one name": if a given url can have multiple names, the caching does not work (ie, if I simply number the files in the order they are found, I won't let wget identify which urls have already been downloaded).
The simplest renaming scheme would be to calculate an md5 hash of the filename (and not of the file itself, which is what md5sum does); that would ensure the filename is unique and that a given url results in always the same name.
It's possible to do this in Perl, etc., but can it be done directly in bash or using a system utility (RedHat)?
Sounds like you want the md5sum system utility.
URLMD5=`/bin/echo $URL | /usr/bin/md5sum | /bin/cut -f1 -d" "`
If you want to create the hash from only the filename part of the URL, you can extract that first with sed:
FILENAME=`echo $URL | /bin/sed -e 's#.*/##'`
URLMD5=`/bin/echo $FILENAME | /usr/bin/md5sum | /bin/cut -f1 -d" "`
Note that, depending on your distribution, the path to cut may be /usr/bin/cut.
Other options on my Ubuntu (Precise) box:
echo -n $STRING | sha512sum
echo -n $STRING | sha256sum
echo -n $STRING | sha224sum
echo -n $STRING | sha384sum
echo -n $STRING | sha1sum
echo -n $STRING | shasum
Other options on my Mac:
echo -n $STRING | shasum -a 512
echo -n $STRING | shasum -a 256
etc.
I don't have the rep to comment on the answer, but there's one clarification to Epsilon Prime's answer: by default, echo will print a newline at the end of the text. If you want the md5 sums to match what is generated by any other tool (e.g. PHP, Java's MD5, etc.), you need to call
echo -n "$url"
which will suppress the newline.
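In other words, these two pipelines hash different data for the same $url; only the second matches what other tools compute for the bare string:
echo "$url" | md5sum          # hashes the url plus a trailing newline
echo -n "$url" | md5sum       # hashes exactly the url
printf '%s' "$url" | md5sum   # same as the previous line, and portable across shells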
Newer versions of Bash provide associative arrays, as well as indexed arrays. Something like this might work for you:
declare -A myarray
myarray["url1"]="url1_content"
myarray["url2"]=""
if [ ! -z "${myarray["url1"]}" ] ; then
    echo "Cached"
fi
wget will typically rename the files with a filename.html.1, .2, etc., so you could use the associative array to store a list of which one has been downloaded and what the actual filename was.
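Putting the pieces together, a minimal caching wrapper around wget could look something like this; the function name and cache directory are just examples:
#!/bin/bash
cachedir=./wget_cache
mkdir -p "$cachedir"

fetch_cached() {
    local url=$1
    local hash target
    hash=$(printf '%s' "$url" | md5sum | cut -d' ' -f1)   # one url <=> one name
    target="$cachedir/$hash"
    if [ ! -f "$target" ]; then
        wget -O "$target" "$url"                          # download only the first time
    fi
    printf '%s\n' "$target"                               # hand back the cached file's path
}

fetch_cached "http://example.com/page?a=1&b=2"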
