Caesar cypher indexing - ruby

Can someone briefly explain to me what happens in this line:
new_word += alphabet[alphabet.index(i.downcase) - num]
new_word = current state of new_word variable + what?
This is whole program:
def cipher(word, num)
  alphabet = ('a'..'z').to_a.concat(('A'..'Z').to_a)
  new_word = ""
  word.each_char do |i|
    if !alphabet.include?(i)
      new_word += i
    else
      new_word += alphabet[alphabet.index(i.downcase) - num]
    end
  end
  return new_word.downcase.capitalize
end
puts cipher("Apples? and Oranges!", 2)

new_word is a String, so the value on the right is appended to it. The expression alphabet[alphabet.index(i.downcase) - num] is just an inefficient way of finding the character that lies num places earlier in the alphabet.
alphabet is an Array containing the letters of the alphabet as one-character strings, lowercase letters first, followed by uppercase letters.
The index method here finds the index of the first occurrence of i.downcase in alphabet. This index is then decreased by num. The character at this new position is looked up in alphabet, and the result is appended to new_word.
Note also that the result will "wrap around": if the new index is negative, the array is indexed from the back, which yields capital letters as long as num is not too large. Those uppercase letters are then downcased by new_word.downcase.capitalize.
The downcase part is strange, because it means that the "cipher" is not invertible. Note also that this will not work as you might expect if num's absolute value is so large that the lookup is out of bounds: the lookup then returns nil, and appending nil to a String raises a TypeError.
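To make the indexing behaviour concrete, here is a small sketch. The first part only exercises the array from the question. The shift_back helper is a hypothetical alternative (not part of the original program) that wraps with the modulo operator inside the 26 lowercase letters instead of relying on negative indices.

```ruby
alphabet = ('a'..'z').to_a.concat(('A'..'Z').to_a)  # 52 entries

# A negative index counts from the end of the array, so 'a' shifted by 2
# lands on an uppercase letter.
wrapped = alphabet[alphabet.index('a') - 2]   # => "Y"

# An index beyond the array bounds yields nil rather than a letter.
missing = alphabet[alphabet.index('a') - 60]  # => nil

# Hypothetical helper: shift within the lowercase letters only and wrap
# around with %, so any value of num stays in bounds.
def shift_back(letter, num)
  lowercase = ('a'..'z').to_a
  lowercase[(lowercase.index(letter.downcase) - num) % lowercase.size]
end

shift_back('a', 2)  # => "y"
shift_back('C', 2)  # => "a"
```

With the modulo in place the lookup can never return nil, at the cost of still discarding the original case, just as the program in the question does.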

It's adding to the new word the letter in the alphabet which is represented by (alphabet.index(i.downcase) - num).
i.downcase just converts the letter to lowercase if it wasn't already.
alphabet.index finds the place in the alphabet where i exists.
The num subtraction is the cipher. It changes the letter that would be added by modifying the index in alphabet where the letter is found. A num of 1 would change 'b's into 'a's, because 'a's come before 'b' in the given alphabet.
So it takes the letter, converts it into lowercase, then an array index, modifies that index by num and adds the letter that that new index represents back onto the word.

Related

Feasibility of a bit modified version of Rabin Karp algorithm

I am trying to implement a slightly modified version of the Rabin-Karp algorithm. My idea is to compute the hash of the pattern as a weighted sum, with a weight tied to each letter's position. That way I don't have to worry about anagrams: I can take a window of the string, compute its hash, and compare it with the hash of the pattern. This differs from the traditional approach, where the hashes of the window and the pattern are compared and a match is then verified character by character in case it is only a collision such as an anagram. Here is my code:
string = "AABAACAADAABAABA"
pattern = "AABA"
#string = "gjdoopssdlksddsoopdfkjdfoops"
#pattern = "oops"
#get hash value of the pattern
def gethashp(pattern):
    sum = 0
    #I multiply each letter of the pattern with a weight
    #So for eg CAT will be C*1 + A*2 + T*3 and the resulting
    #value will be unique for the letters CAT and won't match if the
    #letters are rearranged
    for i in range(len(pattern)):
        sum = sum + ord(pattern[i]) * (i + 1)
    return sum % 101 #some prime number 101
def gethashst(string):
    sum = 0
    for i in range(len(string)):
        sum = sum + ord(string[i]) * (i + 1)
    return sum % 101
hashp = gethashp(pattern)
i = 0
def checkMatch(string,pattern,hashp):
    global i
    #check if we actually get the first four characters (comes in handy when you
    #are nearing the end of the string)
    if len(string[:len(pattern)]) == len(pattern):
        #assign the substring to string2
        string2 = string[:len(pattern)]
        #get the hash value of the substring
        hashst = gethashst(string2)
        #if both the hash values match
        if hashst == hashp:
            #print the index of the first character of the match
            print("Pattern found at {}".format(i))
        #delete the first character of the string
        string = string[1:]
        #increment the index
        i += 1 #keep a count of the index
        checkMatch(string,pattern,hashp)
    else:
        #if no match or end of string, return
        return
checkMatch(string,pattern,hashp)
The code is working just fine. My question is: is this a valid way of doing it? Can there be any instance where the logic might fail? None of the Rabin-Karp implementations I have come across use this logic; instead, for every hash match, they further check character by character to ensure it's not an anagram. So is it wrong if I do it this way? My opinion is that with this code, as soon as the hash value matches, you never have to check both strings character by character and can just move on to the next window.
It's not only anagrams that can collide with the hash value of the pattern. Any other string with the same hash value collides too. A matching hash value can lie, so a character-by-character comparison is still required.
For example, in your case you are taking mod 101, so there are only 101 possible hash values. Take any 102 distinct patterns; by the pigeonhole principle, at least two of them have the same hash. If you use one of them as the pattern, the presence of the other in the string would produce a false match if you skip the character comparison.
Moreover, even with the hash you used, two anagrams can have the same hash value which can be obtained by solving two linear equations.
For example, taking A = 1, B = 2, and so on:
DCE = 4*1 + 3*2 + 5*3 = 25
CED = 3*1 + 5*2 + 4*3 = 25
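The collision is easy to reproduce. Below is a minimal sketch in Ruby (a port of the position-weighted hash from the question; the name weighted_hash is made up for illustration). It uses the real character codes, as the question's ord-based code does, rather than A = 1, and the collisions still occur.

```ruby
# Position-weighted hash, mirroring gethashp/gethashst from the question:
# the character code of each letter times its 1-based position, mod 101.
def weighted_hash(text, modulus = 101)
  text.each_char.with_index(1).sum { |char, weight| char.ord * weight } % modulus
end

# Two anagrams with the same hash.
weighted_hash("DCE") == weighted_hash("CED")    # => true

# Two strings that are not anagrams of each other, same hash again.
weighted_hash("AABA") == weighted_hash("BBAA")  # => true
```

The second pair shows that a search for "AABA" would report a match on "BBAA" if the character-by-character check is skipped.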

Permutations of strings takes too long to solve

I'm creating an array of all the unique permutations of the letters in a string, sorting them alphabetically, and then finding the middle element of the set.
def middle_permutation(string)
  length = string.length
  permutation_set = string.split("").permutation(length).to_a.map{|item| item.join}.sort
  permutation_set.length.even? ? permutation_set[(permutation_set.length)/2-1] : permutation_set[(permutation_set.length/2)+1]
end
For example:
middle_permutation("zxcvbnmasd") should equal "mzxvsndcba"
Even for small strings (N >=10), the calculations take pretty long to finish, and I can forget about anything double that; is there a quicker way?
I'm assuming the letters are unique, as in the OP's question.
1. Sort the letters.
2. Pluck the middle letter of the sorted string (rounding down). This is the first letter of the middle permutation.
3. If the original list had an even number of letters, the rest of the permutation is the reverse sort of the remaining letters.
4. If not, take the middle letter again. Now the rest of the result is the reverse sort of the remaining letters.
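The steps above can be sketched directly in Ruby. This is only one possible reading of them, assuming unique letters; the method name middle_permutation_steps is made up for illustration.

```ruby
# Builds the middle permutation without enumerating any permutations.
# Assumes all letters in the string are distinct.
def middle_permutation_steps(string)
  letters = string.chars.sort
  # Take the middle letter, rounding down for even lengths.
  result = [letters.delete_at((letters.size - 1) / 2)]
  # For odd lengths, take the middle letter of what is left as well.
  result << letters.delete_at((letters.size - 1) / 2) if string.size.odd?
  # The rest of the answer is the remaining letters in reverse order.
  (result + letters.reverse).join
end

middle_permutation_steps("abcd")        # => "bdca"
middle_permutation_steps("abcde")       # => "cbeda"
middle_permutation_steps("zxcvbnmasd")  # => "mzxvsndcba"
```

The cost is dominated by the sort, so strings far longer than 10 characters are no problem.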
The method below returns the desired permutation directly, without iterating through permutations.
The asker has stated that the string contains no duplicated letters, which is a requirement for this method. I assume the characters of the string are sorted. If they are not, the creation of a sorted string would be the first step:
str = "ebadc".chars.sort.join
#=> "abcde"
Code
def mid_perm(str)
  return mid_perm_even_length_strings(str) if str.size.even?
  first_char_index = str.size/2
  str[first_char_index] << mid_perm_even_length_strings(str[0,first_char_index] +
    str[first_char_index+1..-1])
end
def mid_perm_even_length_strings(str)
  first_char_index = str.size/2-1
  str[first_char_index] + (str[0,first_char_index] + str[first_char_index+1..-1]).reverse
end
Examples
mid_perm 'abcd'
#=> "bdca"
mid_perm 'abcde'
#=> "cbeda"
mid_perm 'abcdefghijklmnopqrstuvwxyz'
#=> "mzyxwvutsrqponlkjihgfedcba"
Explanation
Let's start by defining a method to produce permutations of the letters of a string.
def perms(str)
  str.chars.permutation(str.size).map(&:join)
end
Strings containing an even number of characters
Consider
a = perms "abcd"
#=> ["abcd", "abdc", "acbd", "acdb", "adbc", "adcb",
# "bacd", "badc", "bcad", "bcda", "bdac", "bdca",
# "cabd", "cadb", "cbad", "cbda", "cdab", "cdba",
# "dabc", "dacb", "dbac", "dbca", "dcab", "dcba"]
a contains 4! #=> 4*3*2 => 24 elements, 4 being the length of the string.
Notice that since the characters in perms' argument are sorted, the array returned is also sorted (see note 1 at the end of this answer).
a == a.sort #=>true
As a.size #=> 24, the "middle" element is either a[11] #=> "bdca" or a[12] #=> "cabd" (where 11 = (24-1)/2 and 12 = 24/2), depending on how we want to round. The question stipulates that, for even-length strings, we are to round down, so that would be "bdca".
Now let's slice a into str.size equal arrays, each containing a.size/str.size #=> 24/4 => 6 elements:
b = a.each_slice(a.size/str.size).to_a
#=> [["abcd", "abdc", "acbd", "acdb", "adbc", "adcb"],
# ["bacd", "badc", "bcad", "bcda", "bdac", "bdca"],
# ["cabd", "cadb", "cbad", "cbda", "cdab", "cdba"],
# ["dabc", "dacb", "dbac", "dbca", "dcab", "dcba"]]
The desired element is therefore
b[str.size/2-1][-1]
#=> "bdca"
This value can be computed more directly as follows.
first_char_index = str.size/2-1
#=> 1
first_char = str[first_char_index]
#=> "b"
remaining_chars = (str[0,first_char_index] + str[first_char_index+1..-1]).reverse
#=> "dca"
first_char + remaining_chars
#=> "bdca"
The same logic applies to all strings having an even number of characters. We therefore can write the method mid_perm_even_length_strings shown in the Code section above.
For example (for a 12-character string)
mid_perm_even_length_strings 'abcdefghijkl'
#=> "flkjihgedcba"
Strings containing an odd number of characters
Now consider
str = "abcde"
a = perms str
#=> ["abcde", "abced", "abdce", "abdec", "abecd", "abedc",
# "acbde", "acbed", "acdbe", "acdeb", "acebd", "acedb",
# "adbce", "adbec", "adcbe", "adceb", "adebc", "adecb",
# "aebcd", "aebdc", "aecbd", "aecdb", "aedbc", "aedcb",
# "bacde", "baced", "badce", "badec", "baecd",..., "bedca",
# "cabde", "cabed", "cadbe", "cadeb", "caebd", "caedb",
# "cbade", "cbaed", "cbdae", "cbdea", "cbead", "cbeda",
# "cdabe", "cdaeb", "cdbae", "cdbea", "cdeab", "cdeba",
# "ceabd", "ceadb", "cebad", "cebda", "cedab", "cedba",
# "dabce", "dabec", "dacbe", "daceb", "daebc",..., "decba",
# "eabcd", "eabdc", "eacbd", "eacdb", "eadbc",..., "edcba"]
Here the permutation contains 5! #=> 120 elements, in 5 blocks of 24. (Again, a.each_cons(2).all? { |s1,s2| s1 < s2 } #=> true.)
The middle element of a is clearly the middle element of the block of elements that begin with
str[str.size/2] #=> "c"
That block would be the array
b = a.each_slice(a.size/str.size).to_a[str.size/2]
#=> ["cabde", "cabed", "cadbe", "cadeb", "caebd", "caedb",
# "cbade", "cbaed", "cbdae", "cbdea", "cbead", "cbeda",
# "cdabe", "cdaeb", "cdbae", "cdbea", "cdeab", "cdeba",
# "ceabd", "ceadb", "cebad", "cebda", "cedab", "cedba"]
which would be 'c' plus the middle element of the array
["abde", "abed", "adbe", "adeb", "aebd", "aedb",
"bade", "baed", "bdae", "bdea", "bead", "beda",
"dabe", "daeb", "dbae", "dbea", "deab", "deba",
"eabd", "eadb", "ebad", "ebda", "edab", "edba"]
That array is merely the permutations of the string "abde". Since that string contains an even number of characters, its middle element is
mid_perm_even_length_strings 'abde'
#=> "beda"
It follows that the middle element of the permutations of the letters of "abcde" is
'c' + 'beda'
#=> "cbeda"
This clearly applies to all strings containing an odd number of characters.
1. The doc for Array#permutation states, "The implementation makes no guarantees about the order in which the permutations are yielded." We therefore might need to tack .sort onto the end of the operative line of perms, but with Ruby v2.4 (and, I suspect, earlier versions) that is in fact not necessary here.
I was able to compact it like this:
def middle_permutation(string)
  list = string.chars.permutation.map(&:join).sort
  list[list.length / 2 - (list.length.even? ? 1 : 0)]
end
Which yields:
middle_permutation('zxcvbnmasd')
# => "mzxvsndcba"
You don't need to generate all permutations. Just find the overall number of permutations, PN = N!, where N is the length of the string (of distinct characters), and compute only the needed PN/2-th permutation from its number, for example using this approach:
public static int[] perm(int n, int k)
{
    int i, ind, m=k;
    int[] permuted = new int[n];
    int[] elems = new int[n];
    for(i=0;i<n;i++) elems[i]=i;
    for(i=0;i<n;i++)
    {
        ind=m%(n-i);
        m=m/(n-i);
        permuted[i]=elems[ind];
        elems[ind]=elems[n-i-1];
    }
    return permuted;
}
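For the original problem the permutations have to be numbered in sorted (lexicographic) order. A minimal Ruby sketch of that idea follows, using the factorial number system to pick each letter in turn. It is an illustration under the assumption of distinct characters, not a translation of the snippet above, and the names kth_permutation and middle_permutation_by_rank are made up.

```ruby
# Returns the permutation with 0-based rank k in lexicographic order.
# Assumes the characters are distinct and 0 <= k < n!.
def kth_permutation(chars, k)
  pool = chars.sort
  block = (1..pool.size).reduce(1, :*)  # n!
  result = []
  pool.size.downto(1) do |remaining|
    block /= remaining                  # permutations per leading letter
    index, k = k.divmod(block)          # which letter, and the rank within its block
    result << pool.delete_at(index)
  end
  result.join
end

# The middle permutation, rounding down, sits at rank (n! - 1) / 2.
def middle_permutation_by_rank(string)
  total = (1..string.size).reduce(1, :*)
  kth_permutation(string.chars, (total - 1) / 2)
end

middle_permutation_by_rank("abcd")        # => "bdca"
middle_permutation_by_rank("zxcvbnmasd")  # => "mzxvsndcba"
```

Each letter is chosen with one division, so the work grows roughly quadratically with the string length instead of factorially.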
So it turns out there are two tracks to this: odd-length strings and even-length strings.
For odd-length strings, you take out the middle element of the sorted array and the one before it, in that order. That leaves two arrays, one to the left and one to the right, both alphabetically sorted. You then tack on the elements of the right array, starting with its last element, and do the same for the left one.
For even-length strings, do the same but take only one character in the first step: the (N/2)th element.
Here's my solution:
def middle_permutation(string)
  string_array = string.chars.sort
  mid_string = []
  length = string.length
  if length.even?
    mid_string << string_array[length/2-1]
    string_array.delete_at(length/2-1)
    (mid_string << string_array.reverse).flatten.join
  else
    mid_string << string_array[(length/2)-1..length/2].reverse
    string_array.slice!((length/2)-1, 2)
    (mid_string << string_array.reverse).flatten.join
  end
end

How to affect only letters, not punctuation in Caesar Cipher code

I am trying to write a Caesar Cipher in Ruby and I hit a snag when trying to change only the letters to numerical values and not the punctuation marks.
Here is my script so far:
def caesar_cipher(phrase, key)
  array = phrase.split("")
  number = array.map {|n| n.upcase.ord - (64-key)}
  puts number
end
puts "Script running"
caesar_cipher("Hey what's up", 1)
I tried to use select but I couldn't figure out how to select only the punctuation marks or only the letters.
Use String#gsub to match only the characters that you want to replace. In this case it's the letters of the alphabet, so you'll use the regular expression /[a-z]/i.
You can pass a block to gsub which will be called for each match in the string, and the return value of the block will be used as the replacement. For example:
"Hello, world!".gsub(/[a-z]/i) {|chr| (chr.ord + 1).chr }
# => "Ifmmp, xpsme!"
Here's a version of your Caesar cipher method that works pretty well:
BASE_ORD = 'A'.ord
def caesar_cipher(phrase, key)
  phrase.gsub(/[a-z]/i) do |letter|
    orig_pos = letter.upcase.ord - BASE_ORD
    new_pos = (orig_pos + key) % 26
    (new_pos + BASE_ORD).chr
  end
end
caesar_cipher("Hey, what's up?", 1) # => "IFZ, XIBU'T VQ?"
Edit:
% is the modulo operator. Here it's used to make new_pos "wrap around" to the beginning of the alphabet if it's greater than 25.
For example, suppose letter is "Y" and key is 5. The position of "Y" in the alphabet is 24 (assuming "A" is 0), so orig_pos + key will be 29, which is past the end of the alphabet.
One solution would be this:
new_pos = orig_pos + key
if new_pos > 25
  new_pos = new_pos - 26
end
This would make new_pos 3, which corresponds to the letter "D", the correct result. We can get the same result more concisely, however, by taking "29 modulo 26", expressed in Ruby (and many other languages) as 29 % 26, which returns the remainder of the division 29 ÷ 26 (we divide by 26 because there are 26 letters in the alphabet). 29 % 26 is 3, the same result as above.
In addition to constraining a number to a certain range, as we do here, the modulo operator is also often used to test whether a number is divisible by another number. For example, you can check if n is divisible by 3 by testing n % 3 == 0.
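One more property worth knowing: in Ruby, % with a positive divisor never returns a negative number, so the same formula also handles negative keys, which makes decoding a matter of negating the key. A small sketch, where shift_letters is a placeholder name for a method with the same shape as the one above:

```ruby
# Same approach as caesar_cipher above, written without the constant so the
# snippet is self-contained. Output is always uppercase.
def shift_letters(phrase, key)
  base = 'A'.ord
  phrase.gsub(/[a-z]/i) do |letter|
    ((letter.upcase.ord - base + key) % 26 + base).chr
  end
end

wrapped = -1 % 26                                # => 25, not -1
encoded = shift_letters("Hey, what's up?", 1)    # => "IFZ, XIBU'T VQ?"
decoded = shift_letters(encoded, -1)             # => "HEY, WHAT'S UP?"
```

The if-based variant shown earlier would need a second branch for values below 0 to get the same behaviour.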

CoderByte CaesarCipher Exercise in Ruby

This is the question : Using the Ruby language, have the function CaesarCipher(str,num) take the str parameter and perform a Caesar Cipher shift on it using the num parameter as the shifting number. A Caesar Cipher works by shifting each letter in the string N places down in the alphabet (in this case N will be num). Punctuation, spaces, and capitalization should remain intact. For example if the string is "Caesar Cipher" and num is 2 the output should be "Ecguct Ekrjgt"
Here is my code
def CaesarCipher(str,num)
  alphabet = ("a".."z").to_a.join("")
  alphabetupcase = ("a".."z").to_a.join("").upcase
  i=0
  result = ""
  while i < str.length
    if alphabet.include?(str[i])
      result += alphabet[alphabet.index(str[i]) + num]
    elsif alphabetupcase.include?(str[i])
      result += alphabetupcase[alphabetupcase.index(str[i]) + num]
    else
      result += str[i]
    end
    i += 1
  end
  # code goes here
  return result
end
I keep getting this error: (eval):11:in `+': can't convert Fixnum into String (TypeError) from (eval):11:in `CaesarCipher' from (eval):26
What is the problem with this and How can I fix this code?
Can you suggest a better solution keeping in mind I am a beginner in Ruby ?
Thank you all in advance
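Your code does not work because it is missing the modulo operation: if you shift the letter 'z' by 2, you should get 'b', and your program fails in this case. The algorithm for computing the new letter index is (index + shift) modulo alphabet_size. The TypeError itself suggests an old Ruby (1.8), where indexing a String with an integer returns a character code (a Fixnum) rather than a one-character String.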
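As a rough sketch, here is the smallest change to the code from the question that applies this formula. It assumes Ruby 1.9 or later, where str[i] returns a one-character String, and the name caesar_cipher_fixed is made up for illustration.

```ruby
# The question's approach with the index wrapped by the alphabet size.
def caesar_cipher_fixed(str, num)
  lower = ("a".."z").to_a.join
  upper = lower.upcase
  result = ""
  str.each_char do |ch|
    if lower.include?(ch)
      # (index + shift) modulo alphabet_size keeps the lookup in bounds
      result += lower[(lower.index(ch) + num) % lower.length]
    elsif upper.include?(ch)
      result += upper[(upper.index(ch) + num) % upper.length]
    else
      # punctuation and spaces are copied unchanged
      result += ch
    end
  end
  result
end

caesar_cipher_fixed("Caesar Cipher", 2)  # => "Ecguct Ekrjgt"
caesar_cipher_fixed("xyz", 3)            # => "abc"
```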
But I would do it like this:
def caesar (str, num)
  str.split('').collect do |character|
    case character
    when 'a'..'z', 'A'..'Z'
      base_ascii = if character == character.upcase then 'A'.ord else 'a'.ord end
      (((character.ord - base_ascii + num) % ('a'..'z').count) + base_ascii).chr
    else
      character
    end
  end.join('')
end
First, iterate over every character of the string. If it is a letter, calculate the shifted letter (note base_ascii, which is the ASCII code for 'A' or 'a', depending on whether the letter is upper- or lowercase): take the index of the letter (character.ord - base_ascii), add the shift (num), and reduce the sum modulo the number of letters in the alphabet (('a'..'z').count). If the character is not a letter (a space or punctuation, for instance), it is returned unchanged.
I recommend taking a look at some other Caesar cypher implementations here: https://codereview.stackexchange.com/questions/55049/learning-ruby-caesar-cipher

How do I modify the Damerau-Levenshtein algorithm, such that it also includes the start index, and the end index of the larger substring?

Here is my code:
#http://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance
# used for fuzzy matching of two strings
# for indexing, seq2 must be the parent string
def dameraulevenshtein(seq1, seq2)
  oneago = nil
  min = 100000000000 #index
  max = 0 #index
  thisrow = (1..seq2.size).to_a + [0]
  seq1.size.times do |x|
    twoago, oneago, thisrow = oneago, thisrow, [0] * seq2.size + [x + 1]
    seq2.size.times do |y|
      delcost = oneago[y] + 1
      addcost = thisrow[y - 1] + 1
      subcost = oneago[y - 1] + ((seq1[x] != seq2[y]) ? 1 : 0)
      thisrow[y] = [delcost, addcost, subcost].min
      if (x > 0 and y > 0 and seq1[x] == seq2[y-1] and seq1[x-1] == seq2[y] and seq1[x] != seq2[y])
        thisrow[y] = [thisrow[y], twoago[y-2] + 1].min
      end
    end
  end
  return thisrow[seq2.size - 1], min, max
end
There has to be some way to get the starting and ending index of the substring, seq1, within the parent string, seq2, right?
I'm not entirely sure how this algorithm works, even after reading the wiki article on it. I mean, I understand the highest-level explanation, as it finds the insertion, deletion, and transposition difference (the lines in the second loop), but beyond that I'm a bit lost.
Here is an example of something that I want to be able to do with this (^):
substring = "hello there"
search_string = "uh,\n\thello\n\t there"
the indexes should be:
start: 5
end: 18 (last char of string)
Ideally, the search_string will never be modified. But, I guess I could take out all the white space characters (since there are only.. 3? \n \r and \t) store the indexes of each white space character, get the indexes of my substring, and then re-add in the white space characters, making sure to compensate the substring's indexes as I offset them with the white space characters that were originally in there in the first place. -- but if this could all be done in the same method, that would be amazing, as the algorithm is already O(n^2).. =(
At some point, I'd like to only allow white space characters to split up the substring (s1).. but one thing at a time
I don't think this algorithm is the right choice for what you want to do. The algorithm is simply calculating the distance between two strings in terms of the number of modifications you need to make to turn one string into another. If we rename your function to dlmatch for brevity and only return the distance, then we have:
dlmatch("hello there", "uh,\n\thello\n\t there")
=> 7
meaning that you can convert one string into the other in 7 steps (effectively by removing seven characters from the second). The problem is that 7 steps is a pretty big difference:
dlmatch("hello there", "panda here")
=> 6
This would actually imply that "hello there" and "panda here" are closer matches than the first example.
If what you are trying to do is "find a substring that mostly matches", I think you are stuck with an O(n^3) algorithm: you compare the first string against each of a series of substrings of the second string, and then select the substring that gives the closest match.
Alternatively, you may be better off trying to do pre-processing on the search string and then doing regexp matching with the substring. For example, you could strip off all special characters and then build a regexp that looks for words in the substring that are case insensitive and can have any amount of whitespace between them.
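A minimal sketch of the regexp idea, assuming that only whitespace may differ between the two strings and that case should be ignored. The method name whitespace_tolerant_span is hypothetical, and it reports the end as an exclusive offset.

```ruby
# Returns [start, end) offsets of the first place where the words of
# substring occur in search_string with any whitespace between them,
# or nil when there is no such place. search_string is not modified.
def whitespace_tolerant_span(substring, search_string)
  words = substring.split.map { |word| Regexp.escape(word) }
  # '\s+' in single quotes is the literal regexp fragment \s+
  pattern = Regexp.new(words.join('\s+'), Regexp::IGNORECASE)
  match = pattern.match(search_string)
  match && [match.begin(0), match.end(0)]
end

whitespace_tolerant_span("hello there", "uh,\n\thello\n\t there")  # => [5, 18]
```

This handles the whitespace case from the question in a single pass, but it does not tolerate typos inside the words, which is where an edit-distance comparison would still be needed.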
