How to create a script that takes a string and converts it to another managed string? - macos

My intent is to capture the values of a string that I type and have those values be shifted to other letters. Essentially it would be a fake translation program or custom cipher generation script. Example of function:
I would type the sentence:
Who are you?
and the output would be shifted by, let's say, 1 to the next consonant or vowel, for example. The script would also need to know how to skip vowels or consonants as needed, and for the sake of argument y would always be considered a vowel. So the output would be:
Xju eso auy?
This is something I wanted to attempt for a creative writing project as a means of making another language. Ideally the shift variable could be an input as well to work with to find the best outcome. Possibly even variable shifts for vowels and consonants at the same time?

If you truly are doing this for a creative writing project, then I submit that diving deep into the programming is not warranted. None of the input transformations you described require decisions to be made by the program. That is, once an encoding is chosen, each incoming letter is firmly associated with one outgoing letter. This greatly expands your options for how to achieve this, and greatly simplifies the task.
Since you tagged Terminal, here are a couple commands you could use in action:
echo "Who are you?" | perl -pe 'tr/N-ZA-Mn-za-m/A-Za-z/'
outputs: Jub ner lbh?
This is the famous Rot13 "encoding" (all it does is substitute the letter that is 13 later in the alphabet). It's particularly handy as 13 is half the alphabet's 26, so putting some "encoded" text in will give you back the original text:
echo "Jub ner lbh?" | perl -pe 'tr/N-ZA-Mn-za-m/A-Za-z/'
outputs: Who are you?
echo just sends text to the screen or to other commands. Here we echo our text "Who are you?" into a pipe | to pass it to the next command, perl, which is a very powerful and flexible text-manipulation and reporting program. The rest of the line is just instructions telling perl to replace each letter with the one 13 places later in the alphabet, wrapping around at the end.
Quick note: normally hitting return runs the command in Terminal. If you put a backslash \ at the end of a line and then hit return, it lets you keep typing on the next line but treats it all as one command. Handy for lining things up.
echo "How are you?" | tr \
'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz' \
'DFVBTXEUWZOSHCJMAQYRINKLPGdfvbtxeuwzoshcjmaqyrinklpg'
outputs: Ujk dqt pji?
There's another command, tr. This example demonstrates an arbitrary substitution—in this case, random. It looks through that first long set of letters, and swaps in instead the letter in the second long set that is in the matching position. Since this substitution example is random, you could use this kind of mapping to create "Cryptogram" puzzles.
The great thing about the tr command is that you can tell it to use whatever input-to-output "mapping" you'd like. Sure, it's a bit manual, but hey—no programming needed!
Here's the mapping to achieve your requested "consonants and vowels" example shift:
echo "Who are you?" | tr \
'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz' \
'ECDFIGHJOKLMNPUQRSTVYWXZABecdfighjoklmnpuqrstvywxzab'
Outputs: Xju esi auy? Not doing it by hand has its advantages—you missed a vowel in there.
So if you need to rapidly try different mappings, consider learning a bit more about perl (or simpler: sed. or more complex: awk. Or or or…). If, instead, you don't mind a bit of careful command-construction, just lining up each incoming letter with your desired output letter, I think tr would serve nicely.
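If you want the shift amounts to be inputs, as the question asks, the two tr sets can be generated instead of typed by hand. Here is a minimal bash sketch, assuming y counts as a vowel and that vowels and consonants each wrap around within their own group. The function names rotate and shift_text are made up for this illustration.

```shell
# rotate STRING N: print STRING rotated left by N places (N >= 0).
rotate() {
  local s=$1
  local n=$(( $2 % ${#s} ))
  printf '%s%s' "${s:n}" "${s:0:n}"
}

# shift_text VOWEL_SHIFT CONSONANT_SHIFT TEXT
# Vowels move along "aeiouy", consonants along the other 20 letters;
# anything that is not a letter passes through unchanged.
shift_text() {
  local v=aeiouy c=bcdfghjklmnpqrstvwxz
  local V=AEIOUY C=BCDFGHJKLMNPQRSTVWXZ
  local from="$v$c$V$C"
  local to="$(rotate "$v" "$1")$(rotate "$c" "$2")$(rotate "$V" "$1")$(rotate "$C" "$2")"
  printf '%s\n' "$3" | tr "$from" "$to"
}

shift_text 1 1 'Who are you?'   # prints: Xju esi auy?
```

tr does not care whether its sets are in alphabetical order, only that position N in the first set lines up with position N in the second, which is why the vowels and consonants can simply be concatenated. Trying another combination is then just a matter of changing the two numbers, for example shift_text 2 3 'Who are you?'.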

Related

check for permutation in bash

I have a script wherein you have to input a string with a length greater than or equal to 1 and less than or equal to 26.
If that's not the case I want to return an error. But that's the part I have figured out
lengthAlphabetInput=${#1}
if [ $lengthAlphabetInput -lt 1 ] || [ $lengthAlphabetInput -gt 26 ]
then
echo "error: key needs to be between 1 and 26 characters"
exit 1
fi
Other than that I would like to check if the input the user gave is a permutation of (a part of) the alphabet.
For example if the user inputs "abc" I want to return an error "abc is
not a permutation of the alphabet"
if the user inputs "xxxgsdnoip" I again want to return the same error
because I don't want the user to use the same letter more than once.
But the input "xyz" or "jhcwslaedmviotrgzxkbynpuqf" would be correct
because these are permutations of the alphabet. (x instead of a, y
instead of b and z instead of c).
Can anyone help me transform this idea into code?
I realized that this is a question raised by a student, so I did not write down a detailed answer, since the experience of reading the manual and figuring it out yourself will really help you learn how to use bash (actually the GNU/BSD core utilities), as said by @binaryzebra. What you should do is:
Learn to read the manual in bash with the man command, such as man sort for the manual of the sort utility. Hit the Up/Down arrow keys or PageUp/PageDown to scroll; hit q to exit. Reading the manual is your first step into the Unix world. Sure, you can skip this and find all the information on Google, but learning to read the manual will do you more good in the long run.
Read the manual of sed and learn substitution with regular expressions. The manual is a little too long for a newcomer, but luckily you do not need to read it all; just scan it and find the part about substitution, and read the examples as well, if there are any. Practice with some test file. Now you know how to check whether the input contains only letters (instead of whitespace, symbols, etc.), as well as how to put each character on its own line.
Read the manual of uniq. It is much shorter; reading the whole thing won't take long.
Now learn the pipeline feature in bash. I cannot find a short and focused manual entry, so you may as well just read the online manual from GNU. With the help of a pipeline, you can combine sed and uniq to detect duplicated characters (uniq only compares adjacent lines, so the characters need to be sorted first).
By "permutation", it seems that you do not want the characters in their original order. If so, read the manual of the sort utility and think about how it can help you.
You do not seem to care about whether all 26 letters are there. If this is the case, you probably do not need the wc (word count) utility, unless you require the subset of letters to be contiguous (such as "cdefg" instead of "cdhjk").
That's all the hints; good luck with your homework.
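#!/usr/bin/perl
$_ = shift;   # the key is the first command-line argument
# a character that occurs again later in the key
print "not ok:repeated: $1\n" if /(.).*\1/;
my $i = 0;
# keep every letter that sits at its own alphabet position (a=0, b=1, ...)
my @s = map { ord($_) - 97 != $i++ ? () : ($_) } split(//, $_);
print "not ok:samePlace: @s\n" if @s;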
usage:
$ perl ex.pl rty
$ perl ex.pl abc
not ok:samePlace: a b c
$ perl ex.pl ddss
not ok:repeated: d
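The same checks can also stay in bash. The sketch below assumes a reading of the question that fits all four of its examples: the key must consist of lowercase letters only, must not repeat a letter, and must not be identical to the start of the alphabet (which is why "abc" is rejected). Note that this is looser than the Perl script above, which rejects any letter found in its own alphabet position. The name check_key is made up for this illustration.

```shell
# check_key KEY: return 0 if KEY is acceptable, otherwise print an error
# to stderr and return 1.
check_key() {
  local key=$1 alphabet=abcdefghijklmnopqrstuvwxyz
  local i

  if [ "${#key}" -lt 1 ] || [ "${#key}" -gt 26 ]; then
    echo "error: key needs to be between 1 and 26 characters" >&2
    return 1
  fi

  # Anything other than a lowercase letter is rejected.
  case $key in
    *[!abcdefghijklmnopqrstuvwxyz]*)
      echo "error: key may only contain lowercase letters" >&2
      return 1 ;;
  esac

  # A letter that shows up again later in the key is a duplicate.
  for (( i = 0; i < ${#key}; i++ )); do
    if [[ ${key:i+1} == *"${key:i:1}"* ]]; then
      echo "error: $key is not a permutation of the alphabet" >&2
      return 1
    fi
  done

  # Assumption: a key equal to the first letters of the alphabet changes
  # nothing, so it is rejected too.
  if [ "$key" = "${alphabet:0:${#key}}" ]; then
    echo "error: $key is not a permutation of the alphabet" >&2
    return 1
  fi
  return 0
}
```

In a script it would be called as check_key "$1" || exit 1, right where the length test in the question sits.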

Most reliable way to get text into ruby script

I have a Ruby script that’ll do some text parsing (à la Markdown). It does it in a sequence of steps, like
string = string.gsub # more code here
string = string.gsub # more code here
# and so on
What is the best (i.e. most reliable) way to feed text into string in the first place? It’s a script, and the text it’ll be fed can vary a lot: it can be multilingual, have some characters that might trip a shell (like ", ', ’, &, $, you get the idea), and will likely be multi-line.
Is there some trick along the lines of
cat << EOF
bunch of text here
EOF
Additional considerations
I’m not looking for a markdown parser, this is something I want to do, not something I want a tool for.
I’m not a big ruby user (I’m starting to use it), so the more detailed the answer you can provide, the better.
It must be completely scriptable (i.e., no interrupting to ask the user for information).
The Kernel#gets method will read a string separated using the record separator from stdin or files specified on the command line. So if you use that you can do things like:
yourscript <filename #read from filename
yourscript file1 file2 # read both file1 and file2
yourscript #lets you type at your script
So to run something like:
cat <<'eof' |ruby yourscript.rb
This' & will $all 'eof' be 'fine'''
eof
Script might contain something like:
s = gets() # read a line
lines = readlines() # read all lines into an array
That's fairly standard for command-line scripts. If you want to have a user interface then you'll want something more complex. The Ruby interpreter also has an option (-E, or --encoding) to set the encoding of files as they are read.
Just read from stdin (which is an IO object):
$stdin.read
As you can see, stdin is provided in the global variable $stdin. Since it’s an IO object, there are a lot of other methods available if read doesn’t suit your needs.
Here’s a simple one-line example in the shell:
$ printf 'foo\nbar\n' | ruby -e 'puts $stdin.read.upcase'
FOO
BAR
Obviously reading from stdin is extremely flexible since you can pipe input in from anywhere.
Ruby is very adept at encodings (see e.g. the Encoding docs). To get text into Ruby, one typically uses gets, reads File objects, or uses a GUI, which one can build with the gtk2 gem or rugui (if already finished). If you are getting text from the wild internet, security should be your concern. Ruby used to have several $SAFE levels for this, but they were pared back over time, and as of Ruby 3.0 $SAFE has no special behavior, so it is not something to rely on. In any case, the best strategy for handling strings is to know as much as possible in advance about the properties of the string that you expect. Handling absolutely arbitrary strings is a surprisingly difficult task. Try to limit the number of possible encodings and figure out the maximum size of the string that you expect.
Also, with respect to your original stated goal of writing a markdown-processor-like something, you might want to not reinvent the wheel (unless it is for didactic purposes). There is this SO post:
Better ruby markdown interpreter?
The answer will direct you to the kramdown gem, which gets a lot of praise, though I have not tried it personally.

Take token from this bash string/array...not sure which it is

Hi I am writing a bash script and I have a string
foo=1.0.3
What I want to do is examine the '3'. The first thing I did was get rid of the periods by doing this: bar=`echo $foo|tr '.' ' '`
When I do an echo $bar it prints 1 0 3. Now how do I create a variable that holds only the 3? Thank you very much.
As you are no doubt learning about bash, there are many, many ways to achieve your goals. I think @Mat's answer using bar=${foo##*.} is the best so far, although he doesn't explain how or why it works. I strongly recommend you check out the bash tutorial on tldp; it is my go-to source when I have questions like this. For string manipulation, there is a section there that discusses many of the different ways to go about this sort of thing.
For example, if you know that foo is always going to be 5 characters long, you can simply take the fifth character from it:
bar=${foo:4}
That is, bar becomes everything from offset 4 onward, which for a five-character string is just the fifth character (remember, we start counting from zero, not from one).
If you know it is always going to be the last position of foo, then you can just count backwards:
bar=${foo: -1}
Notice there is a space between the colon and the -1. You need that (or parentheses around the -1), because without it bash reads ${foo:-1} as the "use a default value" expansion rather than as a negative offset.
To explain @Mat's answer, I had to look at the link I provided above. Apparently the double pound signs (hash mark, octothorpe, whatever you want to call them) in the expression:
${string##substring}
mean: delete the longest match of $substring from the front of $string. So you are looking for the longest match of *. which equates to everything up to and including the last dot, leaving only what follows it. Pretty cool, huh?
This should work:
bar=$(echo $foo|cut -d. -f3)
If you know you only want the part after the last dot (not the third item in a .-separated list) you can also do this:
bar=${foo##*.}
Advantage: no extra process or subshell started.
One way: Build an array and take position 2:
array=(`echo $foo | tr . ' '`)
echo ${array[2]}
This should work too:
echo $foo | awk -F. '{print $3}'
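If you need all three fields rather than just the last one, the read builtin can split the string without cut, tr or awk. This is a small sketch rather than anything taken from the answers above; the variable names major, minor and patch are assumptions for illustration.

```shell
foo=1.0.3

# Split on dots into three variables. IFS is changed only for this one
# read command, so the rest of the script is unaffected.
IFS=. read -r major minor patch <<< "$foo"

echo "$patch"        # prints: 3
echo "${foo##*.}"    # same result via parameter expansion: 3
```

The here-string (<<<) feeds the value of foo to read as its standard input, so no pipeline is involved and the variables are set in the current shell.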

Bash script frequency analysis of unique letters and repeating letter pairs how should i build this script?

OK, first post.
So I have this assignment to decrypt cryptograms by hand, but I also wanted to automate the process a little, if not all of it then at least a few parts. So I browsed around and found some sed and awk one-liners to do some of the things I wanted done, but not everything I wanted or needed.
There are some websites that sort of do what I want, but I really want to just do it in bash, because I want to understand it better and such :)
The script would take a filename as a parameter and output another file such as solution$1 when done.
if [ -e "$PWD/$1" ]; then
echo "$1 exists"
else
echo "$1 does not exist"
fi
That would start the script, to see if the file given as a parameter exists.
Then I found this one-liner:
sed -e "s/./\0\n/g" $1 | while read c;do echo -n "$c" ; done
It works fine, but I need the number of occurrences per letter, and I really don't see how to do that.
Here is more or less what I'm trying to achieve, for counting unique letter occurrences and such: http://25yearsofprogramming.com/fun/ciphers.htm
I then need to put all letters in lowercase.
After this I see the script doing these things:
- a subscript that scans a dictionary file for words of a certain pattern and size;
the bigger the words, the better.
For example, let's say the solution is the word "apparel" and the encrypted word is "zxxzgvk".
Is there a regex way to express the pattern that compares those two words and lists the word "apparel" from a dictionary file, because "appa" and "zxxz" are similar patterns and "zxxzgvk" has the same length as "apparel"?
Can this part be done, and is it realistic to view the problem like this, or is this just far-fetched?
- another subscript that takes the letters found in the previous output word and swaps
those letters in the cryptogram.
The swapped letters will be in uppercase to differentiate them over time.
I'll then have to figure out how to proceed, maybe rescanning the newly found words to see if they're found in a dictionary file partly or fully as well, then swapping more letters or not.
Has anyone seen this problem in the past and tried to solve it with the patterns in words
like I described, or is this just too complex?
Should I log any of the swaps?
Maybe just scan through all the encrypted words and swap as I go along, then do another sweep,
with the constraint from the first sweep of not changing uppercase letters (actually using them as more precise patterns!).
Has anyone done a similar script/program in another language? If so, which one? Maybe I can relate somehow :)
Maybe we can use your insight as to how you thought out your code.
I will happily include the cryptograms I have decoded and the one I have yet to decode :)
Again, the focus of my assignment is not to do this script but just to resolve the cryptograms. But doing scripts or at least trying to see how I would do this script does help me understand a little more how to think in terms of code. Feel free to point me in the right directions!
The cryptogram itself is based on simple alphabetic substitution.
I have done a pastebin here with the code to be :) http://pastebin.com/UEQDsbPk
In pseudocode the way I see it is :
call program with an input filename in param and optionally a second filename(dictionary)
verify the input file exists and isnt empty
read the file's content and echo it on screen
transform to lowercase
scan through the text and count the amount of each letter to do a frequency analysis
ask the user what language the text is supposed to be in (english default)
use the response to specify which letter frequencies to use as a baseline
swap letters corresponding to the frequency analysis in uppercase..
print the changed document on screen
ask the user to swap letters in the crypted text
if user had given a dictionary file as the second argument
then scan the cipher for words and find the bigger words
find words with a similar pattern (some repeating letters) in the dictionary file
list on screen the results if any
offer to swap the letters corresponding in the cipher
print modified cipher on screen
ask again to swap letters or find more similar words
That is more or less the way I see the script structured.
Do you see anything that I should add? Did I miss something?
I hope this revised version is clearer for everyone!
TL;DR, to be frank. To the only question I've found, the answer is yes :) Please split it into smaller tasks and we'll be happy to assist you, if you don't find the answers to those smaller questions first.
If you can put it out in pseudocode, it would be easier. There's all kinds of text-manipulating stuff in Unix. The means to employ depend on how big your texts are. I believe they are not so big, or you would have used some compiled language.
For example, the easy but costly gawk way to count frequencies:
awk -F "" '{for(i=1;i<=NF;i++) freq[$i]++;}END{for(i in freq) printf("%c %d\n", i, freq[i]);}'
As for transliterating, there is the tr utility. You can build the actual strings and pass them to it in each case (that holds true for Caesar-like ciphers).
grep -o . inputfile | sort | uniq -c | sort -rn
Example:
$ echo 'aAAbbbBBBB123AB' | grep -o . | sort | uniq -c | sort -rn
5 B
3 b
3 A
1 a
1 3
1 2
1 1
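The word-pattern idea from the question (matching "zxxzgvk" against "apparel") does not strictly need a regex. One possible approach, sketched here under the assumption that words are compared purely by where their letters repeat, is to reduce each word to the position at which each of its letters first appears and then compare those signatures. The function names word_pattern and candidates are made up for this illustration, and the dictionary is assumed to be a plain file with one lowercase word per line.

```shell
# word_pattern WORD: print, for each letter, the position where that letter
# first appears in WORD. Words with the same repetition structure (and the
# same length) produce the same output.
word_pattern() {
  local word=$1 out= i ch prefix
  for (( i = 0; i < ${#word}; i++ )); do
    ch=${word:i:1}
    prefix=${word%%"$ch"*}    # everything before the first occurrence of ch
    out+="${#prefix} "
  done
  printf '%s\n' "${out% }"
}

# candidates CIPHERWORD DICTFILE: list dictionary words sharing the pattern.
candidates() {
  local target w
  target=$(word_pattern "$1")
  while IFS= read -r w; do
    [ "$(word_pattern "$w")" = "$target" ] && printf '%s\n' "$w"
  done < "$2"
  return 0
}

word_pattern zxxzgvk   # prints: 0 1 1 0 4 5 6
word_pattern apparel   # prints: 0 1 1 0 4 5 6
```

Calling word_pattern once per dictionary word is slow on a large word list, so for anything big it would be worth computing the signatures once and saving them to a file that grep can search.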

Replacing huge blocks with sed

I have 2 files that are generated elsewhere. First one is "what to search", and second one is the replacement. Both files are huge, about 2-3mb each.
I need to write a bash script that takes an even bigger file (about 200-300mb) and replaces all occurrences of file1 contents with file2 contents.
Problem is, file1 and file2 can contain any possible characters, including regexp special symbols.
How can I solve this problem using sed?
Thanks in advance.
Maybe have a look at chgrep:
http://www.bmk-it.com/projects/chgrep/
Cheers,
gregx
Since you don't actually need regular expressions, just direct string matching, sed is overkill. What you're really looking for is a fixed-string (maybe even binary) stream editor. Unfortunately, I don't know of one... I hate to suggest possibly reinventing a wheel, but you could write something fairly quickly in C that'd do what you want. A rough draft outline:
read search-file into memory
create a buffer of the same size as search-file
read from stdin (or input-file) into buffer.
For each character, if it does not match the parallel character from search-file, shift the buffer. To find out how much to shift it by, read until you find a match to the first character of search-file, then check to see if the rest matches, repeating until you've found a partial match to search-file (or gotten to the end of the buffer). When you shift, print all the non-matching characters to stdout (or output-file).
If the buffer ever fills up, i.e. totally matches search-file, print replacement-file to stdout (or output-file). Depending on memory vs. speed, you can keep replacement-file in memory or read it from disk each time.
You could also attempt to automatically escape all regex characters from your input file. This could be done with a horribly ugly list of sed substitutions, like
sed -e 's/\\/\\\\/g' -e 's#/#\\/#g' -e 's/\[/\\[/g' ...
(make sure you do the \ one first!)
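A character class can do that escaping in a single pass, which also sidesteps the ordering problem. The sketch below assumes basic (non-extended) sed regular expressions, / as the delimiter of the final s command, and search and replacement strings that fit on one line; multi-line search text like the files in the question is not handled by it. The function names are made up for this illustration.

```shell
# Escape TEXT so sed treats it literally on the pattern side of s/.../.../
escape_sed_pattern() {
  printf '%s\n' "$1" | sed 's|[][\\.*^$/]|\\&|g'
}

# Escape TEXT for the replacement side, where only \, / and & are special.
escape_sed_replacement() {
  printf '%s\n' "$1" | sed 's|[\\/&]|\\&|g'
}

search='a.b*[c]/$1'
replace='x&y/z'
pat=$(escape_sed_pattern "$search")
rep=$(escape_sed_replacement "$replace")

printf '%s\n' 'before a.b*[c]/$1 after' | sed "s/$pat/$rep/g"
# prints: before x&y/z after
```

The helper commands use | as their own delimiter so that / can sit inside the bracket expression without any extra escaping.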
I don't know about sed but in Perl you could do (off the top of my head, untested):
perl -0777 -pe 'BEGIN{local $/ = undef; open FROM, "<", shift @ARGV; $from = <FROM>; open TO, "<", shift @ARGV; $to = <TO>} s/\Q$from\E/$to/sog' file1 file2 bigger-file > new-bigger-file
If you're interested in trying Perl, I could try testing it for you tomorrow.
But it sucks the entire bigger-file into memory, because it ignores line breaks so that your search text can span multiple lines. This means that it uses quite a lot of memory!
This answer assumes that the search file is one long search string over multiple lines which must be matched in its entirety rather than a number of separate search strings, any of which can be matched.
