Generating a random string in zsh

I am trying to generate a random string with the following code:
for pic in `ls *.jpg`; do
  rnd=`echo $RANDOM | sha256sum | cut -d" " -f1`
  mv "$pic" "${rnd}.jpg"
done
This part of the script runs from within a directory containing lots of jpeg files and it should randomize their filenames. The problem is that the $RANDOM variable does not update during the iteration, and therefore gives the same hash every time. I tried to use /dev/urandom, and it works, but is a lot slower than $RANDOM. What can I do to "regenerate" $RANDOM every time it is read?
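A likely explanation, for the record: command substitution runs in a subshell, and in zsh reading $RANDOM inside a subshell does not advance the parent shell's seed, so every substitution starts from the same state and hashes the same number. A minimal sketch of a workaround is to read $RANDOM in the current shell first:
for pic in *.jpg; do
  rnd=$RANDOM                                  # expanded in the parent shell, so the seed advances
  rnd=$(echo $rnd | sha256sum | cut -d" " -f1) # hash it as before
  mv "$pic" "${rnd}.jpg"
done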

On my Mac (running macOS High Sierra), /dev/urandom gives binary bytes, so the tr-based solution below fails with tr: Illegal byte sequence. I used base64 to convert the bytes to characters first:
cat /dev/urandom | base64 | tr -dc '0-9a-zA-Z' | head -c100
Alternatively, I found a solution without base64 that also yields punctuation:
cat /dev/urandom | LC_ALL=C tr -dc '[:alnum:]!#$.,-' | head -c40

You can do this more simply just using cat, tr and head. For example:
cat /dev/urandom | tr -dc '0-9a-zA-Z' | head -c100
The tr command in this pipeline will delete any character from stdin which does not match the specified character set. The head will print the first 100 characters and then exit, terminating the overall command.
This will generate a 100 character string containing alphanumeric characters. To make this into a rename you just need to use command substitution:
for file in *.jpg
mv -n ${file} $(cat /dev/urandom | tr -dc '0-9a-zA-Z' | head -c100).jpg
In zsh a for loop with a single statement does not need to be surrounded with do or done. The -n flag to mv will prevent it from overwriting an existing file - just in case you get really unlucky with the random strings.
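If you are not in zsh, a sketch of the same thing in the portable long form (identical pipeline, just wrapped in do/done):
for file in *.jpg; do
  mv -n "$file" "$(cat /dev/urandom | tr -dc '0-9a-zA-Z' | head -c100).jpg"
done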

for pic in *.jpg; do                                      # iterate over the jpgs in the current directory
  dd if=/dev/urandom count=1 2>/dev/null | sha256sum | ( # gather 512 bytes from /dev/urandom and hash them
    read rnd _                                           # read the first "word" of the sha256sum output
    mv "$pic" "${rnd}.jpg"                               # rename the jpg
  )
done
Piping to read causes an implicit subshell, so I create an explicit subshell to guarantee I can still access the rnd parameter. Also, don't parse ls.
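To see the implicit-subshell issue in isolation (a quick check; the behaviour differs between shells):
echo hello | read rnd   # in bash, read runs in a subshell here
echo "$rnd"             # prints an empty line in bash; zsh runs the last
                        # pipeline element in the current shell, so the
                        # same two lines print "hello" there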
By the way, are you sure you don't just want to base64 the output? It's cheaper than sha256sum by far and I don't see what you're getting out of sha256sum. Plus it'd make the code easier to read:
for pic in *.jpg; do
mv "$pic" "$(base64 </dev/urandom | tr -dc 'a-zA-Z0-9' | head -c20).jpg"
done

Counting number of different words in a txt file in Bash

Well, I do not know much about programming in bash; I'm new to it, so I'm struggling to write code that iterates over all the lines in a txt file and counts how many different words there are.
Example: If a txt file has "Nory was a Catholic because her mother was a Catholic"
So the result must be 7
$ grep -o '[^[:space:]]*' file | sort -u | wc -l
7
Sure. I assume you are ok with defining "words" as things that are separated by space? In which case, try something like this:
cat filename | sed -r -e "s/[ ]+/ /g" -e "s/ /\n/g" | sort -u | wc -l
This command says:
Dump contents of filename
Replace multiple spaces with a single space
Replace spaces with newline
Sort and "uniquify" the list
Print out the count of lines
Per the comment, you can technically get away without using cat if you'd like, with the following:
sed -r -e "s/[ ]+/ /g" -e "s/ /\n/g" filename | sort -u | wc -l
Further, from another comment, you could optionally use tr (importantly, with its -s flag to handle repeated spaces) instead of sed, with something like:
tr -s " " "\n" < filename | sort -u | wc -l
The moral of the story is there are several ways this kind of thing can be accomplished, not to mention the other full answers that are given here :-) My personal favorite answer at this point is Ed Morton's which I've upvoted accordingly.
You could also lowercase the text so words compare regardless of casing.
Also, filter words with the [:alnum:] character class rather than [a-zA-Z0-9_], which is only valid for US-ASCII and will fail dramatically with Greek or Turkish text.
#!/usr/bin/env bash
echo "The uniq words are the words that appears at least once, regardless of casing." |
# Turn text to lowercase
tr '[:upper:]' '[:lower:]' |
# Split alphanumeric with newlines
tr -sc '[:alnum:]' '\n' |
# Sort uniq words
sort -u |
# Count lines of unique words
wc -l
I would do it like so, with comments:
echo "Nory was a Catholic because her mother was a Catholic" |
# tr replace
# -s - squeeze
# -c - complementary
# [a-zA-Z0-9_] - all letters, number and underscore
# but complementary set, so all non letters, not numbers and not underscores.
# replace them by newline
tr -sc '[a-zA-Z0-9_]' '\n' |
# and sort unique and display count
sort -u | wc -l
Tested on repl bash.
Decided to use [a-zA-Z0-9_], because this is how GNU sed \w extension matches a word.
cat yourfile.txt | xargs -n1 | sort | uniq -c > youroutputfile.txt
xargs -n1 = put one word per line
sort = sorts
uniq -c = counts occurrences of distinct values
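Note that uniq -c counts how often each distinct word occurs; if you only want the number of different words (7 for the example sentence), sort unique and count lines instead:
xargs -n1 < yourfile.txt | sort -u | wc -l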

Any idea why sort utility gives me incorrect results?

EDIT:
To be clear, we got our STDOUT from a for loop that goes something like this:
for (( i=1; i<="$FILE_AMOUNT"; i++ )); do
MY_FILE=`find $DIR -type f | head -$i | tail -1`
FILE_TYPE=`file -b "$MY_FILE"
FILE_TYPE_COUNT=`echo $FILE_TYPE" | sort | uniq -c`
echo "$FILE_TYPE_COUNT"
done
Hence our STDOUT is basically output from the file utility printed one entry at a time, instead of it actually being a set of strings we can sort as a whole - which is likely the core of all the issues.
So there's a pickle I absolutely can't wrap my head around.
Basically I'm creating a shell script that will print out the various file types we have in our directory. It pretty much works; however, for some odd reason, when I try to use uniq on my output, it doesn't work. This is my output:
POSIX shell script, ASCII text executable
ASCII text
Bourne-Again shell script, ASCII text executable
UTF-8 Unicode text, with overstriking
Bourne-Again shell script, ASCII text executable
Seems fairly self-explanatory; however, when I use
FILE_TYPE_COUNT=`echo "$FILE_TYPE" | sort | uniq -c`
this is the result it prints
1 POSIX shell script, ASCII text executable
1 ASCII text
1 Bourne-Again shell script, ASCII text executable
1 UTF-8 Unicode text, with overstriking
1 Bourne-Again shell script, ASCII text executable
Obviously it should be
1 POSIX shell script, ASCII text executable
1 ASCII text
2 Bourne-Again shell script, ASCII text executable
1 UTF-8 Unicode text, with overstriking
Any idea what I'm doing wrong?
Obviously uniq fails to recognise the repeated lines as identical, but I assume sort is at fault, because it can't sort my STDOUT. So any clue how to sort the list properly alphabetically?
Your approach seems overly complicated; try this:
find $DIR -type f -exec file -b -- {} \; | sort | uniq -c
If you're not familiar with -exec: it executes the given command, in our case file -b -- {}, once per file. The placeholder {} is replaced with the path of the file currently being processed.
Why your approach doesn't work:
You run echo "$FILE_TYPE" | sort | uniq -c within the for loop, but at that point $FILE_TYPE contains the file type of only one file. You need to move the sort | uniq -c out of the loop.
I adjusted your code so it works:
declare -a TYPES=()
for (( i=1; i<="$FILE_AMOUNT"; i++ )); do
MY_FILE=`find $DIR -type f | head -$i | tail -1`
FILE_TYPE=`file -b "$MY_FILE"`
TYPES+=("$FILE_TYPE") # add type of current file to TYPES array
done
# TYPES now contains the types of all files and we can now count them
printf "%s\n" "${TYPES[#]}" | sort | uniq -c
The issue you are seeing is because you are sorting a set of one item, for every iteration of the loop.
You'd need to sort the whole output of the loop instead.
Your (syntactically fixed) script:
for (( i=1; i<="$FILE_AMOUNT"; i++ )); do
MY_FILE=`find $DIR -type f | head -$i | tail -1`
FILE_TYPE=`file -b "$MY_FILE"`
FILE_TYPE_COUNT=`echo "$FILE_TYPE" | sort | uniq -c`
echo "$FILE_TYPE_COUNT"
done
Modified to work properly:
for (( i=1; i<="$FILE_AMOUNT"; i++ )); do
MY_FILE=`find $DIR -type f | head -$i | tail -1`
file -b "$MY_FILE"
done | sort | uniq -c
Optimised once:
for FILE in $(find $DIR -type f); do
file -b "$FILE"
done | sort | uniq -c
Optimised twice (see P. Gerber's answer):
find $DIR -type f -exec file -b -- {} \; | sort | uniq -c
Your original script is horrifically inefficient.
Notes on efficiency & operation:
${FILE_AMOUNT} has to be correct to iterate over the whole dataset
You are running find, which returns the whole dataset, and then discarding everything you're not interested in, on every iteration
You are running sort and uniq, on every iteration, on a dataset of size one
As you are constantly re-computing your dataset, if it changes half way through your script's execution (e.g: file / directory is created / deleted), then your results will become invalid
Remember that every time you start a new program, you pay a performance penalty - this is exacerbated by the fact that you are continually computing your dataset and then discarding "everything that you don't want"
In addition to the other good solutions here, be sure to understand the sorting rule set that you are using. To inspect your current sorting rule, you can do:
echo anything | sort --debug
to see your results with annotations. Consider:
echo -e "a 2\na1" | sort --debug
sort: using ‘en_US.UTF-8’ sorting rules
a1
__
a 2
___
Note that the rule set is sorting with perhaps an unexpected result. If you're looking for a simple byte comparison, then be sure to set LC_ALL=C as in:
LC_ALL=C sort
For example:
echo -e "a 2\na1" | LC_ALL=C sort --debug
sort: using simple byte comparison
a 2
___
a1
__
The use of LC_ALL is important in getting the results you expect. Lastly, run the locale command and read the man page to get locale-specific information.

Get a specific number of char from /dev/urandom?

I'm trying to get past the "lack" of a good terminal tool on OS X to generate passwords.
This script is a mix from various sources, and I've already done some debugging on my side (here, here, here and here too):
pass=`LC_CTYPE=C < /dev/urandom tr -cd [:graph:] | tr -d '\n' | head -c 32`
echo $pass
I opted for the head -c 32 method instead of the fold -w 32 method because head seems clearer to me after reading their corresponding man pages, and because when I use fold, the script just "hangs" (never stops loading).
I clearly understood that the -c flag of head counts bytes, not characters, and that is why I sometimes get more than 32 characters in my resulting password.
The question being: is there a way to replace head so as to get exactly 32 characters? (I would accept using fold, of course, if you explain why it does not work for me and rephrase what fold -w does, so I can understand it more clearly than from the description in the man page.)
Example of problematic return :
^[[A]S9,Bx8>c:fG)xB??<msm^Ot|?C()bd] # 36 characters, sadly
One option would be just to use fold and head together:
pass=$(LC_CTYPE=C < /dev/urandom tr -cd [:graph:] | tr -d '\n' | fold -w 32 | head -n 1)
fold keeps the line length 32 characters, while head exits after capturing the first line. I've updated your backticks to the more modern $( ) syntax as a bonus.
fold -w 32 will insert newlines into the output after every 32 characters of input. As long as there continues to be any input, it will continue to produce output. head -n 1 will exit after capturing one line (using those newlines), closing the pipe and causing all of the commands to terminate.
maybe use dd:
echo "$(LC_CTYPE=C < /dev/urandom tr -cd [:graph:] | tr -d '\n' | dd count=1 bs=32 status=none )"
On Linux, the tr -d '\n' seems useless, and the man page says:
[:graph:]
all printable characters, not including space
But I don't know if OSX's tr has the same behaviour
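A quick way to check whether newlines survive [:graph:] (deleting the complement of the class should already drop them, making the extra tr -d '\n' redundant):
printf 'ab\ncd\n' | tr -cd '[:graph:]' | od -c
# 0000000   a   b   c   d
# 0000004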

Bash sha1 with hex input

I found this solution to build hash-values:
echo -n wicked | shasum | awk '{print $1}'
But this works only with string input. I don't know how to handle input as hex, for example if I want to build the sha1 value of a sha1 value.
upd: I just found out there is an option -b for shasum, but it produces the wrong output. Does it expect bytes with reversed endianness?
upd2: for example, I give the following input:
echo -n 9e38cc8bf3cb7c147302f3e620528002e9dcae82 | shasum -b | awk '{print $1}'
The output is bed846bb1621d915d08eb1df257c2274953b1ad9, but according to the hash calculator the output should be 9d371d148d9c13050057105296c32a1368821717
upd3: the -b option seems not to work at all. There is no difference whether I apply this parameter or not; I get the same result.
upd4: the whole script looks as follows. It doesn't work because the null byte gets removed as I either assign or concatenate.
password="wicked"
scrumble="4d~k|OS7T%YqMkR;pA6("
stage1_hash=$(echo -n $password| shasum | awk '{print $1}')
stage2_hash=$(echo $(echo -n $stage1_hash | xxd -r -p | shasum | awk '{print $1}') | xxd -r -p)
token=$(./xor.sh $(echo -n $scrumble$(echo 9d371d148d9c13050057105296c32a1368821717 | xxd -r -p) | shasum | awk '{print $1}') $stage1_hash)
echo $token
You can use xxd -r -p to convert hexadecimal to binary:
echo -n 9e38cc8bf3cb7c147302f3e620528002e9dcae82 | xxd -r -p | shasum -b | awk '{print $1}'
Note that the output of this is 9d371d148d9c13050057105296c32a1368821717; this matches what I get from hashing 9e38cc8bf3cb7c147302f3e620528002e9dcae82 using the hash calculator. It appears that the value you got from the hash calculator was the result of a copy-paste error, specifically leaving off the final "2" in the hex string.
UPDATE: I'm not sure exactly what the entire script is supposed to do, but I can point out several problems with it:
Shell variables, command arguments, and C strings in general cannot contain null bytes. There are also situations where trailing linefeeds get trimmed, and IIRC some early versions of bash couldn't handle delete characters (hex 7F)... Basically, don't try to store binary data (as in stage2_hash) or pass it as arguments (as in ./xor.sh) in the shell. Pipes, on the other hand, can pass raw binary just fine. So store it in hex, then convert to binary with xxd -r -p and pipe it directly to its destination.
When you expand a shell variable ($password) or use a command substitution ($(somecommand)) without wrapping it in double-quotes, the shell does some additional parsing on it (things like turning spaces into word breaks, expanding wildcards to lists of matching filenames, etc). This is almost never what you want, so always wrap things like variable references in double-quotes.
Don't use echo for anything nontrivial and expect it to behave consistently. Depending on which version of echo you have and/or what the password is, echo -n "$password" might print the password without a linefeed after it, or might print it with "-n " before it and a linefeed after, might do something with any backslash sequences in the password, or (if the password starts with "-") interpret the password itself as more options to the echo command. Use printf "%s" "$password" instead.
Don't use echo $(somecommand) (or even printf "%s" "$(somecommand)"). The echo and $() are mostly canceling each other here, but creating opportunities for problems in between. Just use the command directly.
Clean those up, and if it doesn't work after the cleanup try posting a separate question.
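Putting those points together, a cleaned-up sketch of the first two stages might look like this (the xor.sh step is omitted, since its interface isn't shown):
password="wicked"
# stage 1: hex sha1 of the password; printf avoids echo's portability traps
stage1_hash=$(printf "%s" "$password" | shasum | awk '{print $1}')
# stage 2: keep the value in hex; convert to binary only inside a pipe
stage2_hash=$(printf "%s" "$stage1_hash" | xxd -r -p | shasum | awk '{print $1}')
printf "%s\n" "$stage2_hash"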
The openssl command may help you; see HMAC-SHA1 in bash.
like:
echo -n wicked | openssl dgst -sha1

Average word length of input file

If I use
wc -m filename
it will generate the number of characters,
and
wc -w filename
will generate the number of words.
If I use this info by dividing the number of characters by the number of words, it will give me a misleading result, as the character count includes spaces and punctuation.
Any advice?
The solution I came up with, without writing a script, was to pipe it through a couple of commands like this:
<filename tr -d ' \t\n\r.?!' | wc -m
This removes all of the spacing: newlines, tabs and normal spaces. To make the tr command more rigorous, any other punctuation, such as a colon, can just be added to the list.
Hope That Helps
Subtract out characters you do not want
chars=$(tr -dc '[:alnum:]' < filename | wc -c)
words=$(cat filename | wc -w)
Now do your calculation. I piped into wc to avoid the extra "filename" in the output.
printf "%.2f" $(echo "$chars/$words" | bc -l)
Edit: thanks BMW
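As a quick sanity check, using the sample sentence from the word-count question above (44 alphanumeric characters over 10 words):
$ printf 'Nory was a Catholic because her mother was a Catholic\n' > filename
$ chars=$(tr -dc '[:alnum:]' < filename | wc -c)
$ words=$(cat filename | wc -w)
$ printf "%.2f" $(echo "$chars/$words" | bc -l)
4.40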
