Get a specific number of characters from /dev/urandom? - bash

I'm trying to get past the "lack" of a good terminal tool on OS X to generate passwords.
This script is a mix from various sources, and I have already done some debugging on my side (here, here, here and here too):
pass=`LC_CTYPE=C < /dev/urandom tr -cd '[:graph:]' | tr -d '\n' | head -c 32`
echo $pass
I opted for the head -c 32 method instead of the fold -w 32 method because head seems clearer to me after reading their corresponding man pages, and because when I use fold, the script just "hangs" (never stops loading).
I clearly understood that the -c flag of head is for a number of bytes, not characters, and that is why I sometimes have more than 32 characters in my resulting password.
The question is: is there a way to replace head so that I get exactly 32 characters? (I would accept using fold, of course, if you explain why it does not work for me and rephrase the use of fold -w so I can understand it more clearly than from the man page description.)
Example of a problematic result:
^[[A]S9,Bx8>c:fG)xB??<msm^Ot|?C()bd] # 36 characters, sadly

One option would be just to use fold and head together:
pass=$(LC_CTYPE=C < /dev/urandom tr -cd '[:graph:]' | tr -d '\n' | fold -w 32 | head -n 1)
fold keeps the line length at 32 characters, while head exits after capturing the first line. I've updated your backticks to the more modern $( ) syntax as a bonus.
fold -w 32 will insert newlines into the output after every 32 characters of input. As long as there continues to be any input, it will continue to produce output. head -n 1 will exit after one line (using those newlines), closing the pipe and causing all of the commands to terminate.
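Incidentally, since the pipeline already sets the C locale, every character that survives tr -cd '[:graph:]' is a single printable-ASCII byte, so byte counts and character counts should coincide; a minimal sketch of that reasoning (using LC_ALL to override all locale categories):
# In the C locale, [:graph:] is printable ASCII only, one byte per
# character, so head -c 32 should yield exactly 32 characters.
pass=$(LC_ALL=C tr -cd '[:graph:]' < /dev/urandom | head -c 32)
echo "$pass"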

maybe use dd:
echo "$(LC_CTYPE=C < /dev/urandom tr -cd [:graph:] | tr -d '\n' | dd count=1 bs=32 status=none )"
On Linux, the tr -d '\n' seems unnecessary, and the man page says
[:graph:]
all printable characters, not including space
But I don't know if OSX's tr has the same behaviour
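One caveat worth hedging: when dd reads from a pipe it may get a short read and emit fewer than 32 bytes. GNU dd's iflag=fullblock (not available in BSD/macOS dd) guards against that:
# GNU dd only: keep reading until a full 32-byte block has accumulated.
LC_CTYPE=C tr -cd '[:graph:]' < /dev/urandom | dd count=1 bs=32 iflag=fullblock status=none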

Related

Counting number of different words in a txt file in Bash

Well, I do not know much about bash programming; I'm new to it, so I'm struggling to write code that iterates over all the lines in a txt file and counts how many different words there are.
Example: if a txt file contains "Nory was a Catholic because her mother was a Catholic",
then the result must be 7.
$ grep -o '[^[:space:]]*' file | sort -u | wc -l
7
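In case the one-liner reads as terse, here it is again with comments (same command; "file" is the question's sample file):
grep -o '[^[:space:]]*' file |   # print each run of non-whitespace on its own line
  sort -u |                      # sort the words and drop duplicates
  wc -l                          # count what remains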
Sure. I assume you are OK with defining "words" as things that are separated by spaces? In which case, try something like this:
cat filename | sed -r -e "s/[ ]+/ /g" -e "s/ /\n/g" | sort -u | wc -l
This command says:
Dump contents of filename
Replace multiple spaces with a single space
Replace spaces with newline
Sort and "uniquify" the list
Print out the count of lines
Per the comment, you can technically get away without using cat if you'd like, with the following:
sed -r -e "s/[ ]+/ /g" -e "s/ /\n/g" filename | sort -u | wc -l
Further, from another comment, you could optionally use tr (importantly with its -s flag to handle repeated spaces) instead of sed, with something like:
tr -s " " "\n" < filename | sort -u | wc -l
The moral of the story is that there are several ways this kind of thing can be accomplished, not to mention the other full answers given here :-) My personal favorite answer at this point is Ed Morton's, which I've upvoted accordingly.
You could also lowercase the text so that words compare regardless of case.
Also, filter words with the [:alnum:] character class rather than [a-zA-Z0-9_], which is only valid for US-ASCII and will fail dramatically with Greek or Turkish text.
#!/usr/bin/env bash
echo "The uniq words are the words that appears at least once, regardless of casing." |
# Turn text to lowercase
tr '[:upper:]' '[:lower:]' |
# Split alphanumeric with newlines
tr -sc '[:alnum:]' '\n' |
# Sort uniq words
sort -u |
# Count lines of unique words
wc -l
I would do it like so, with comments:
echo "Nory was a Catholic because her mother was a Catholic" |
# tr replace
# -s - squeeze
# -c - complementary
# [a-zA-Z0-9_] - all letters, digits and underscore,
# but the complementary set, so everything that is not a letter, digit or underscore;
# replace those with newlines
tr -sc '[a-zA-Z0-9_]' '\n' |
# and sort unique and display count
sort -u | wc -l
Tested on repl bash.
Decided to use [a-zA-Z0-9_], because this is how GNU sed \w extension matches a word.
cat yourfile.txt | xargs -n1 | sort | uniq -c > youroutputfile.txt
xargs -n1 = put one word per line
sort = sorts
uniq -c = counts occurrences of distinct values
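If what you want from this variant is just the number of distinct words rather than per-word occurrence counts, a small tweak does it (a sketch reusing the same hypothetical file name):
xargs -n1 < yourfile.txt | sort -u | wc -l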

Asterisk in bash variable

I have a file that contains info that I'm retrieving this way.
Command
cat 2018_02_15_09_01_08_result.tsv | grep -o [A-Z]\\*[0-9]*:[0-9]* | sort | uniq | sed -e 's/^/HLA-/' |tr '\n' ',' | sed '$ s/.$//'
Output
HLA-A*30:02,HLA-B*18:01,HLA-C*05:01
But when I try to save this in a variable, the asterisk and a letter disappear. I've tried several ways, adding/removing commas etc., and I'm still not able to print it properly.
hla=`cat 2018_02_15_09_01_08_result.tsv | grep -o [A-Z]\\*[0-9]*:[0-9]* | sort | uniq | sed -e 's/^/HLA-/' |tr '\n' ',' | sed '$ s/.$//'`
echo $hla
HLA-05:01,HLA-18:01,HLA-30:02
echo "$hla"
HLA-05:01,HLA-18:01,HLA-30:02
There are multiple errors here, most of which will be aptly diagnosed by http://shellcheck.net/ without any human intervention.
You really should single-quote your regular expressions unless you specifically require the shell to perform wildcard expansion and whitespace tokenization on the regex before executing the command.
The obsolescent `command` in backticks introduces some unfortunate additional shell handling on the string inside the backticks. The solution since the 1990s is to prefer the $(command) syntax for command substitution, which does not exhibit this problem.
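A quick sketch of the difference (assuming no file names in the current directory happen to match the glob): backticks consume one level of backslashes before the command even runs, so the \* that grep needs never survives.
echo "`echo \\*`"    # prints *  -- the backticks ate the backslash
echo "$(echo \\*)"   # prints \* -- $( ) leaves the escape intact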
The cat is useless; grep knows full well how to read a file.
Try this refactored code:
hla=$(grep -o '[A-Z]\*[0-9]*:[0-9]*' 2018_02_15_09_01_08_result.tsv |
sort -u | sed -e 's/^/HLA-/' | tr '\n' ',' | sed '$ s/.$//')
echo "$hla"
The double quotes around the variable interpolation in the echo are necessary and useful. Notice also the line wraps for legibility, the single-quoted regex which preserves the literal \* for the asterisk, and the use of sort -u in preference to sort | uniq (and generally try to reduce the number of processes; once I understand what the sed | tr | sed does, I can probably propose a simplification for that, too). Perhaps the simplest fix would be to refactor all of this into a single Awk script, but without access to the input, it's hard to tell you in more detail what that might look like.
(Also, are you really sure you need to capture the value in a variable? Often variable=value; echo "$variable" is just an obscure and inefficient way to say echo "value". And variable=$(command); echo "$variable" is better written simply as command; capturing the command's standard output just so you can print it to standard output is a pure waste of cycles, unless you are planning to do something more with that variable's value.)
I've solved it by saving the output of the command with a redirection:
cat 2018_02_15_09_01_08_result.tsv |
grep -o [A-Z]\\*[0-9]*:[0-9]* |
sort | uniq |
sed -e 's/^/HLA-/' |tr '\n' ',' | sed '$ s/.$//' > out_file
hla=`cat out_file`
echo $hla
which gets me the expected HLA-A*30:02,HLA-B*18:01,HLA-C*05:01. Not the ideal solution, but it works.

Average word length of input file

If I use
wc -m filename
it will give me the number of characters, and
wc -w filename
will give me the number of words.
If I use this info by dividing the number of characters by the number of words, the result will be misleading, as the character count includes spaces and punctuation.
Any advice?
The solution that I came up with, without writing a script, was to pipe it through a couple of commands like this:
<filename tr -d ' \t\n\r.?!' | wc -m
This removes all of the spacing: newlines, tabs and regular spaces. A more rigorous tr command could handle any other sort of punctuation, such as a colon, just by adding it to the deleted set.
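For instance, a sketch that also strips colons, commas and semicolons:
tr -d ' \t\n\r.?!:,;' < filename | wc -m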
Hope That Helps
Subtract out the characters you do not want:
chars=$(tr -dc '[:alnum:]' < filename | wc -c)
words=$(cat filename | wc -w)
Now do your calculation. I piped into wc to avoid the extra "filename" in the output:
printf "%.2f" $(echo "$chars/$words" | bc -l)
Edit: thanks BMW
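For comparison, the same calculation can be done in a single awk pass (a sketch, not from the original answers; it assumes whitespace-separated words and strips non-alphanumerics before counting):
awk '{ for (i = 1; i <= NF; i++) { gsub(/[^[:alnum:]]/, "", $i); chars += length($i); words++ } }
     END { if (words) printf "%.2f\n", chars / words }' filename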

generating random string zsh

I am trying to generate a random string with the following code:
for pic in `ls *.jpg`; do
rdn=`echo $RANDOM | sha256sum | cut -d" " -f1`
mv "$pic" ${rnd}.jpg
done
This part of the script runs from within a directory containing lots of jpeg files and it should randomize their filenames. The problem is that the $RANDOM variable does not update during the iteration, and therefore gives the same hash every time. I tried to use /dev/urandom, and it works, but is a lot slower than $RANDOM. What can I do to "regenerate" $RANDOM every time it is read?
On my Mac (running macOS High Sierra), /dev/urandom gives me binary bytes, so the cat | tr | head solution below results in "tr: Illegal byte sequence"; I used base64 to convert the bytes to characters:
cat /dev/urandom | base64 | tr -dc '0-9a-zA-Z' | head -c100
or I found a solution without base64 so you can get punctuation as well:
cat /dev/urandom | LC_ALL=C tr -dc '[:alnum:]!#$.,-' | head -c40
You can do this more simply just using cat, tr and head. For example:
cat /dev/urandom | tr -dc '0-9a-zA-Z' | head -c100
The tr command in this pipeline will delete any character from stdin which does not match the specified character set. The head will print the first 100 characters and then exit, terminating the overall command.
This will generate a 100 character string containing alphanumeric characters. To make this into a rename you just need to use command substitution:
for file in *.jpg
mv -n ${file} $(cat /dev/urandom | tr -dc '0-9a-zA-Z' | head -c100).jpg
In zsh, a for loop with a single statement does not need to be surrounded by do and done. The -n flag to mv will prevent it from overwriting an existing file, just in case you get really unlucky with the random strings.
for pic in *.jpg; do # Iterate over the jpgs in the current directory.
dd if=/dev/urandom count=1 2>/dev/null | sha256sum | ( # Gather 512 bytes from /dev/urandom
read rnd _ # Read the first "word" in the sha256sum output
mv "$pic" ${rnd}.jpg # rename the jpg.
)
done
Piping to read causes an implicit subshell, so I create an explicit subshell to guarantee I can still access the rnd parameter. Also, don't parse ls.
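A tiny sketch of that pitfall in bash (zsh actually runs the last pipeline component in the current shell, but the explicit subshell keeps this portable):
echo "abc123  -" | read rnd _
echo "outer: '$rnd'"   # prints: outer: '' -- rnd was set in a subshell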
By the way, are you sure you don't just want to base64 the output? It's far cheaper than sha256sum, and I don't see what you're getting out of sha256sum. Plus, it'd make the code easier to read:
for pic in *.jpg; do
mv "$pic" "$(base64 </dev/urandom | tr -dc 'a-zA-Z0-9' | head -c20).jpg"
done

How to remove the last character from a bash grep output

COMPANY_NAME=`cat file.txt | grep "company_name" | cut -d '=' -f 2`
outputs something like this
"Abc Inc";
What I want to do is remove the trailing ";" as well. How can I do that? I am a beginner with bash. Any thoughts or suggestions would be helpful.
This will remove the last character contained in your COMPANY_NAME var, regardless of whether or not it is a semicolon:
echo "$COMPANY_NAME" | rev | cut -c 2- | rev
I'd use sed 's/;$//'. eg:
COMPANY_NAME=`cat file.txt | grep "company_name" | cut -d '=' -f 2 | sed 's/;$//'`
foo="hello world"
echo ${foo%?}
hello worl
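${foo%?} deletes the shortest match of the pattern ? (any single character) from the end of the value. Applied to the question's task (a sketch):
COMPANY_NAME=$(grep "company_name" file.txt | cut -d '=' -f 2)
echo "${COMPANY_NAME%?}"   # "Abc Inc"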
I'd use head --bytes -1, or head -c-1 for short.
COMPANY_NAME=`cat file.txt | grep "company_name" | cut -d '=' -f 2 | head --bytes -1`
head outputs only the beginning of a stream or file. Typically it counts lines, but it can be made to count characters/bytes instead. head --bytes 10 will output the first ten characters, but head --bytes -10 will output everything except the last ten.
NB: you may have issues if the final character is multi-byte, but a semi-colon isn't
I'd recommend this solution over sed or cut because
It's exactly what head was designed to do, thus less command-line options and an easier-to-read command
It saves you having to think about regular expressions, which are cool/powerful but often overkill
It saves your machine having to think about regular expressions, so will be imperceptibly faster
I believe the cleanest way to strip a single character from a string with bash is:
echo ${COMPANY_NAME:: -1}
but I haven't been able to embed the grep piece within the curly braces, so your particular task becomes a two-liner:
COMPANY_NAME=$(grep "company_name" file.txt); COMPANY_NAME=${COMPANY_NAME:: -1}
This will strip any character, semicolon or not, but can get rid of the semicolon specifically, too.
To remove ALL semicolons, wherever they may fall:
echo ${COMPANY_NAME/;/}
To remove only a semicolon at the end:
echo ${COMPANY_NAME%;}
You might expect %% to remove multiple semicolons from the end:
echo ${COMPANY_NAME%%;}
but %% only takes the longest match of the given pattern, and the pattern ; matches exactly one character, so this behaves the same as the % form.
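To actually strip a whole run of trailing semicolons you need an extended glob; a bash-specific sketch (extglob is not enabled by default):
shopt -s extglob
echo "${COMPANY_NAME%%+(;)}"   # removes one or more trailing semicolons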
For great detail and more on this approach, The Linux Documentation Project covers a lot of ground at http://tldp.org/LDP/abs/html/string-manipulation.html
Using sed, if you don't know what the last character actually is:
$ grep company_name file.txt | cut -d '=' -f2 | sed 's/.$//'
"Abc Inc"
Don't abuse cats. Did you know that grep can read files, too?
The canonical approach would be this:
grep "company_name" file.txt | cut -d '=' -f 2 | sed -e 's/;$//'
The smarter approach would use a single perl or awk statement, which can filter and do the different transformations at once. For example, something like this:
COMPANY_NAME=$( perl -ne '/company_name=(.*);/ && print $1' file.txt )
You don't have to chain so many tools. Just one awk command does the job:
COMPANY_NAME=$(awk -F"=" '/company_name/{gsub(/;$/,"",$2) ;print $2}' file.txt)
In Bash, using only one external utility:
IFS='= ' read -r discard COMPANY_NAME <<< $(grep "company_name" file.txt)
COMPANY_NAME=${COMPANY_NAME/%?}
(With IFS set to '= ', read splits off the company_name key into discard, the rest of the line lands in COMPANY_NAME, and ${COMPANY_NAME/%?} deletes the final character.)
Assuming the quotation marks are actually part of the output, couldn't you just use the -o switch to return everything between the quote marks?
COMPANY_NAME="\"ABC Inc\";"
echo "$COMPANY_NAME" | grep -o '".*"'
You can strip the beginning and end of a string by N characters using this bash construct, as someone said already:
$ fred=abcdefg.rpm
$ echo ${fred:1:-4}
bcdefg
HOWEVER, this is not supported in older versions of bash, as I discovered just now while writing a script for a Red Hat EL6 install process. This is the sole reason for posting here.
A hacky way to achieve this is to use sed with an extended regex, like this:
$ fred=abcdefg.rpm
$ echo $fred | sed -re 's/^.(.*)....$/\1/g'
bcdefg
Some refinements to the answer above. To remove more than one character, add multiple question marks. For example, to remove the last two characters from the variable $SRC_IP_MSG, you can use:
SRC_IP_MSG=${SRC_IP_MSG%??}
Another option: a second cut that keeps only what comes before the semicolon:
cat file.txt | grep "company_name" | cut -d '=' -f 2 | cut -d ';' -f 1
I am not finding that sed 's/;$//' works. It doesn't trim anything, though I'm wondering whether it's because the character I'm trying to trim off happens to be a "$". What does work for me is sed 's/.\{1\}$//'.
