Ruby Byte XOR strange result - help please - ruby

I was XORing some data, and things were going well with my hex-based XOR. It was recommended that I use a byte XOR (^) and work only with bytes. I thought that would take no time to change, but I'm seeing some strange behaviour that I had not expected.
Could someone shed a little light on why I get a different result when I process the strings as bytes? I was expecting it to be the same.
m_hex_string ="124f33e6a118566377f237075354541f0a5a1b"
m_XOR_string ="662756c6c27732065796586974207468652870"
m_expected ="the code don't work"
m_expected_hex ="74686520636f646520646f6e277420776f726b"
def XOR_hex_strings(a,b)
  (a.hex ^ b.hex).to_s(16)
end
def XOR_byte_strings(s1,s2)
  xored = s1.bytes.zip(s2.bytes).map { |(a,b)| a ^ b }.pack('c*')
end
def hex_digest(hexdigest)
  [hexdigest].pack("H*")
end
puts "My strings for stack overflow"
puts "'"+hex_digest(XOR_hex_strings(m_hex_string,m_XOR_string))+"'"
puts "'"+hex_digest(XOR_byte_strings(m_hex_string,m_XOR_string))+"'"
Results:
My strings for stack overflow
'the code don't work'
'tje`#ode ?on't ~mrk'
The text should be the same, 'the code don't work', for both methods. I'd really like to know why, rather than just getting a corrected code fragment. Thanks.

As already said in the comments, bytes doesn't take the hex format into account; it just returns the integer values of the characters "1", "2", "4", "f" etc. You can convert the hex string with pack:
[m_hex_string].pack("H*")
# => "\x12O3\xE6\xA1\x18Vcw\xF27\aSTT\x1F\nZ\e"
unpack converts this into a byte array, just like bytes but more explicit and faster (IIRC):
[m_hex_string].pack("H*").unpack("C*")
# => [18, 79, 51, 230, 161, 24, 86, 99, 119, 242, 55, 7, 83, 84, 84, 31, 10, 90, 27]
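To make the difference concrete, here is a small sketch using a two-digit hex string of my own choosing ("12"). It shows what bytes returns compared with pack("H*"), and why XORing the character codes of hex digits only sometimes gives the right nibble:

```ruby
hex = "12"

# String#bytes sees the two ASCII characters "1" and "2".
p hex.bytes                 # => [49, 50]

# pack("H*") decodes the pair of hex digits into a single byte.
p [hex].pack("H*").bytes    # => [18]

# XORing character codes works by accident for two decimal digits...
p "1".ord ^ "6".ord         # => 7  (same as 0x1 ^ 0x6)

# ...but not once a letter digit is involved.
p "f".ord ^ "7".ord         # => 81 (whereas 0xf ^ 0x7 is 8)
```

That mix of accidentally right and wrong nibbles is consistent with the partly readable output in the question.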
The final method would look like:
def XOR_pack_unpack_strings(s1, s2)
  s1_bytes = [s1].pack("H*").unpack("C*")
  s2_bytes = [s2].pack("H*").unpack("C*")
  s1_bytes.zip(s2_bytes).map { |a, b| a ^ b }.pack('C*')
end
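As a quick check against the values from the question, the sketch below applies the same pack/unpack approach to the two hex strings. The method name xor_hex and the length check are my own additions, not part of the answer; the check is there because zip pads the shorter array with nil, which would otherwise fail inside the block.

```ruby
# Hypothetical helper: XOR two hex-encoded strings byte by byte.
def xor_hex(hex_a, hex_b)
  bytes_a = [hex_a].pack("H*").unpack("C*")
  bytes_b = [hex_b].pack("H*").unpack("C*")
  # zip would pad the shorter array with nil, so reject unequal lengths early.
  raise ArgumentError, "inputs differ in length" unless bytes_a.size == bytes_b.size

  bytes_a.zip(bytes_b).map { |a, b| a ^ b }.pack("C*")
end

plain = xor_hex("124f33e6a118566377f237075354541f0a5a1b",
                "662756c6c27732065796586974207468652870")
puts plain                      # the decoded text
puts plain.unpack("H*").first   # the same bytes, hex-encoded again
```

The second line of output should match m_expected_hex from the question.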
If speed is an issue, take a look at the fast_xor gem:
require 'xor'
def XOR_fast_xor_strings(s1_hex, s2_hex)
  s1 = [s1_hex].pack("H*")
  s2 = [s2_hex].pack("H*")
  s1.xor!(s2)
end

Related

Can't encipher text in Ruby. It shows me the last letter of cipher-key in all my plaintext after iterating

Can't encipher text in Ruby. It shows me the last letter of the cipher key in place of every letter of my plaintext after iterating.
key is: VCHPRZGJNTLSKFBDQWAXEUYMOI
plaintext is: Hello, CS-50!
expected ciphered_text: Jrssb, HA-50!
I got ciphered_text: Iiiii, II-50!
I don't know why I get the last letter of the key (I) for every letter of ciphered_text.
Maybe I need a "break" after every successful "if", but that didn't help.
Here is my code:
# Design and implement a program, substitution, that encrypts messages using a substitution cipher.
plaintext_str = 'Hello, CS-50!'
key_str = 'VCHPRZGJNTLSKFBDQWAXEUYMOI'
# Converting string into array:
plaintext = plaintext_str.split('')
key = key_str.split('')
# Check if letter is alphabetical
def alpha?(char)
  char.match?(/^[[:alpha:]]$/)
end
# Check if letter is in uppercase
def upper?(char)
  char.match?(/^[[:upper:]]$/)
end
# Check if letter is in lowercase
def lower?(char)
  char.match?(/^[[:lower:]]$/)
end
# ASCII arrays value assigned to capital letters for alphabets
capital_letters = [65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77,
                   78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90]
# ASCII arrays value assigned to small letters for alphabets
small_letters = [97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109,
                 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122]
# Define variable for ciphertext:
ciphertext = ''
# iterating each plaintext's char (i-th):
plaintext.each_index do |i|
  # iterating each key's char j-th on i-th char of plaintext:
  key.each_index do |j|
    # If char in plaintext is alphabetical:
    if alpha?(plaintext[i])
      # Check if letter is in uppercase
      if upper?(plaintext[i])
        capital_letters.each_index do |k|
          # Checking if plaintext's letter is equal to alphabet's letter in [j]
          if plaintext[i].ord == capital_letters[k]
            ciphertext[i] = key[j].upcase
            break
          end
        end
      # Check if letter is in lowercase
      elsif lower?(plaintext[i])
        small_letters.each_index do |l|
          if plaintext[i].ord == small_letters[l]
            ciphertext[i] = key[j].downcase
            break
          end
        end
      end
    # if non-alphabetical:
    else
      ciphertext[i] = plaintext[i]
    end
  end
end
puts "ciphertext: #{ciphertext}"
As a preface, the code you have here looks an awful lot like C. There are a lot of concepts that Ruby provides which can make problems like this into significantly shorter solutions. For now, we'll focus on what's wrong with your code as it is.
The main issue is you have three index variables, i from plaintext, j from key, and k from capital_letters. You check that if plaintext[i] == capital_letters[k], then you place key[j] at the ith position. But, since j was never participating in the checks, you will simply pass this check for all indices in your key variable. So, you might as well have ciphertext[i] = key.last.upcase, and if you check key, the last entry is I, hence why your output is nothing but I's.
Some suggestions for how you could simplify your code:
def encipher(plaintext, key)
  # Make a table that maps { plaintext character => target character }
  # With your input example, table = {"A" => "V", "B" => "C", "C" => "H", ... }
  table = key.chars
             .each_with_index
             .map { |sub, index| [Array("A".."Z")[index], sub] }
             .to_h
  # each_char lets this work when plaintext is passed in as a String
  plaintext.each_char.map { |char|
    # TODO: Using char and table, encipher exactly one character
  }.join
end
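For reference, here is one possible way to fill in that TODO. It is only a sketch: it assumes plaintext and key are both strings, that key holds exactly 26 letters, and that the case of each plaintext letter should be preserved (which is what the expected output in the question suggests).

```ruby
def encipher(plaintext, key)
  # Map each uppercase letter to its substitute: {"A" => "V", "B" => "C", ...}
  table = ("A".."Z").zip(key.upcase.chars).to_h

  plaintext.each_char.map { |char|
    sub = table[char.upcase]
    if sub.nil?
      char              # not a letter: copy through unchanged
    elsif char == char.upcase
      sub               # uppercase letter
    else
      sub.downcase      # lowercase letter keeps its case
    end
  }.join
end

puts encipher('Hello, CS-50!', 'VCHPRZGJNTLSKFBDQWAXEUYMOI')
# prints: Jrssb, HA-50!
```

Because the table is a hash, each character is looked up directly, so there is no inner loop over the key and nothing left over from a previous iteration.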

Trying to find a student's grade by assignment number in Ruby

So I have a hash like the following:
data = {bill: [100, 95, 92], frank: [67, 73, 84]}
I'm trying to write a method that returns 95 when I pass in :bill, 2.
I'm getting really caught up in the iteration.
This is what I have, which doesn't work:
def scores (grade_hash, student, assign_number)
  grade_hash.map.with_index {|i, x| puts x-i}
end
Clearly I'm a novice at Ruby. Any suggestions?
Try this:
def scores(grade_hash, student, assign_number)
  grade_hash[student][assign_number - 1]
end
puts scores(data, :bill, 2) #=> 95
Some explanation:
{bill: [100, 95, 92], frank: [67, 73, 84]}[:bill] #=> [100, 95, 92]
[100, 95, 92][1] #=> 95
In newer versions of Ruby (2.3+), you can use the dig method of Array and Hash without needing a custom method for this, though the index you pass in still needs to be 0-based:
data.dig(:bill, 2) # => 92
data.dig(:bill, 1) # => 95
data.dig(:bill, 5) # => nil -- they haven't taken 6 tests, yet
data.dig(:john, 1) # => nil -- there is no student 'john'
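If you want to keep the 1-based assignment numbers and still get nil for missing students or assignments, the two ideas can be combined. This is a sketch with a hypothetical method name, score_for; the guard for numbers below 1 is my addition, because an index of -1 would otherwise silently return the last score.

```ruby
data = { bill: [100, 95, 92], frank: [67, 73, 84] }

# Hypothetical helper: 1-based assignment number, nil when nothing matches.
def score_for(grade_hash, student, assign_number)
  # Without this guard, assignment 0 would become index -1 (the last score).
  return nil if assign_number < 1

  grade_hash.dig(student, assign_number - 1)
end

p score_for(data, :bill, 2)   # => 95
p score_for(data, :bill, 6)   # => nil
p score_for(data, :john, 1)   # => nil
```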

ruby delete hidden characters from entire file

I have a text file which contains the hidden characters below, which I found with sed.
\033[H\033[2J\033
When I open the file with vi, the sequence above appears as:
^[[H^[[2J^[[H^[[2J
Because of these hidden characters I'm running into issues while processing the file. Is there any way to get rid of them in the entire file before processing it?
If the file is not too big, you can read the whole contents in and then remove all the escape sequences.
content = File.read('your_input_file_path')
content.gsub!(/\033\[(?:H|2J)/, '')
content.split(/\r?\n/).each do |line|
  # process line
end
You can generalize the regex according to the escape sequence pattern. In your example, it seems to be \033[ followed by an optional digit and then a letter, which can be written as:
content.gsub!(/\033\[\d?[A-Z]/, '')
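If the file may contain other terminal control sequences (colours, cursor movement with several parameters), a broader pattern can help. The sketch below assumes the sequences are ANSI CSI sequences, i.e. ESC [ followed by parameter bytes (0x30 to 0x3F), optional intermediate bytes (0x20 to 0x2F) and one final byte (0x40 to 0x7E). The names ANSI_CSI and strip_ansi are my own.

```ruby
# ESC [ + parameter bytes + intermediate bytes + one final byte
ANSI_CSI = /\e\[[0-?]*[ -\/]*[@-~]/

# Hypothetical helper: remove every CSI sequence from the text.
def strip_ansi(text)
  text.gsub(ANSI_CSI, '')
end

sample = "\e[H\e[2Jline one\n\e[1;31mred\e[0m\n"
puts strip_ansi(sample)
```

Note that this leaves a lone ESC character (one not followed by "[") in place; that would need a separate gsub.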
One way to remove non-printing characters with ASCII values less than 32 (" ".ord #=> 32) follows.
def remove_invisible(infile, outfile)
  File.write(outfile,
    File.read(infile).
      codepoints.
      reject { |n| n < 32 }. # note: this also drops tabs (9) and newlines (10)
      map(&:chr).
      join
  )
end
Suppose File.read(infile) returns
str = "\033[H\033[2J\033"
#=> "\e[H\e[2J\e"
then
a = str.codepoints
#=> [27, 91, 72, 27, 91, 50, 74, 27]
b = a.reject { |n| n < 32 }
#=> [91, 72, 91, 50, 74]
c = b.map(&:chr)
#=> ["[", "H", "[", "2", "J"]
c.join
#=> "[H[2J"
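Since tab (9), line feed (10) and carriage return (13) are also below 32, the approach above joins all lines together. If you need to keep the line structure, a variant is to spare those three characters. This is only a sketch; the names CONTROL_CHARS and strip_control_chars are hypothetical.

```ruby
# Control characters below 32, except tab (0x09), LF (0x0A) and CR (0x0D)
CONTROL_CHARS = /[\x00-\x08\x0B\x0C\x0E-\x1F]/

# Hypothetical helper: drop control characters but keep line structure.
def strip_control_chars(text)
  text.gsub(CONTROL_CHARS, '')
end

p strip_control_chars("a\tb\n\e[Hc\r\n")
# => "a\tb\n[Hc\r\n"
```

As in the walkthrough above, only the ESC byte itself is removed, so the "[H" that followed it remains in the text.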

How to analyze the "max" method

Can someone explain why "time" is the max value here?
my_array = %w{hello my time here is long}
my_array.max #=> "time"
Because, alphabetically, the t in time is greater than the first letter of every other word in my_array.
Here is how the string comparisons work out:
'hello' > 'time' # => false
'my' > 'time' # => false
'here' > 'time' # => false
'is' > 'time' # => false
'long' > 'time' # => false
To understand the output of the fragment above, see the String#<=> documentation. Since my_array contains only String instances, max calls <=> on them to determine the result.
The documentation for Enumerable#max says:
Enumerable#max, without a block, assumes all objects implement Comparable.
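A short sketch of what that means in practice: max relies on <=>, and if you want a different notion of "largest", such as the longest word, you can supply your own comparison. The variable name words is just for illustration.

```ruby
words = %w[hello my time here is long]

# String#<=> compares character by character; "t" sorts after "l".
p 'time' <=> 'long'                              # => 1

# Without a block, max uses <=> on the strings themselves.
p words.max                                      # => "time"

# With a block (or max_by) you choose what is compared, here the length.
p words.max { |a, b| a.length <=> b.length }     # => "hello"
p words.max_by(&:length)                         # => "hello"
```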
Here's how computers look at the strings and compare them.
If we look at the first characters of each word it'll help a little, because we know how the alphabet orders letters:
%w[hello my time here is long].map{ |s| s[0] }.sort # => ["h", "h", "i", "l", "m", "t"]
But that doesn't really help visualize it, so here's a look at each word's letters as a computer sees them:
%w[time tome].each do |w|
  puts w.chars.map(&:ord).join(', ')
end
# >> 116, 105, 109, 101
# >> 116, 111, 109, 101
Each letter has a value. Over the years there have been many different ways of ordering letters for a computer, which caused the character to value mapping to change. EBCDIC and ASCII have been the most popular but have different orders. We're usually dealing with ASCII, or a derivative, which is set by the OS.
Look at how the characters in the words are represented by the values in the following output. It should make it easy to understand what the computer is doing then.
%w[he hello help holler hollow].sort.each do |w|
  puts '"%6s": %s' % [ w, w.chars.map(&:ord).join(', ') ]
end
# >> "    he": 104, 101
# >> " hello": 104, 101, 108, 108, 111
# >> "  help": 104, 101, 108, 112
# >> "holler": 104, 111, 108, 108, 101, 114
# >> "hollow": 104, 111, 108, 108, 111, 119

Anagrams Code Kata, Ruby Solution very slow

I've been having a play with Ruby recently and I've just completed the Anagrams Code Kata from http://codekata.pragprog.com.
The solution was test driven and utilises the unique prime factorisation theorem, however it seems to run incredibly slow. Just on the 45k file it's been running for about 10 minutes so far. Can anyone give me any pointers on improving the performance of my code?
class AnagramFinder
  def initialize
    @words = self.LoadWordsFromFile("dict45k.txt")
  end
  def OutputAnagrams
    hash = self.CalculatePrimeValueHash
    @words.each_index{|i|
      word = @words[i]
      wordvalue = hash[i]
      matches = hash.select{|key,value| value == wordvalue}
      if(matches.length > 1)
        puts("--------------")
        matches.each{|key,value|
          puts(@words[key])
        }
      end
    }
  end
  def CalculatePrimeValueHash
    hash = Hash.new
    @words.each_index{|i|
      word = @words[i]
      value = self.CalculatePrimeWordValue(word)
      hash[i] = value
    }
    hash
  end
  def CalculatePrimeWordValue(word)
    total = 1
    hash = self.GetPrimeAlphabetHash
    word.downcase.each_char {|c|
      value = hash[c]
      total = total * value
    }
    total
  end
  def LoadWordsFromFile(filename)
    contentsArray = []
    f = File.open(filename)
    f.each_line {|line|
      line = line.gsub(/[^a-z]/i, '')
      contentsArray.push line
    }
    contentsArray
  end
  def GetPrimeAlphabetHash
    hash = { "a" => 2, "b" => 3, "c" => 5, "d" => 7, "e" => 11, "f" => 13, "g" =>17, "h" =>19, "i" => 23, "j" => 29, "k" => 31, "l" => 37, "m" => 41, "n" =>43, "o" =>47, "p" => 53, "q" =>59, "r" => 61, "s" => 67, "t" => 71, "u" => 73, "v" => 79, "w" => 83, "x" => 89, "y" => 97, "z" => 101 }
  end
end
Frederick Cheung has a few good points, but I thought I might provide you with a few descriptive examples.
I think your main problem is that you create your index in a way that forces you to do linear searches in it.
Your word list (@words) seems to look something like this:
[
"ink",
"foo",
"kin"
]
That is, it is just an array of words.
Then you create your hash index with CalculatePrimeValueHash, with hash keys being equal to the word's index in @words.
{
0 => 30659, # 23 * 43 * 31, matching "ink"
1 => 28717, # 13 * 47 * 47, matching "foo"
2 => 30659 # 31 * 23 * 43, matching "kin"
}
I would consider this a good start, but if you keep it like this, you have to iterate through the hash to find which hash keys (i.e. indexes in @words) belong together, and then iterate through those to join them. That is, the basic problem here is that you do things too granularly.
If you instead were to build this hash with the prime values as hash keys, and have them point to an array of the words with that key, you would get a hash index like this instead:
{
30659 => ["ink", "kin"],
28717 => ["foo"]
}
With this kind of structure, the only thing you have to do to write your output, is to just iterate over the hash values and print them, since they are already grouped.
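A minimal sketch of building such an index follows. The names PRIMES, CHAR_PRIMES, prime_signature and index_by_signature are hypothetical, and the sketch assumes the words contain only the letters a to z (fetch raises a KeyError otherwise).

```ruby
# The first 26 primes, one per letter, as in the question's alphabet hash.
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
          43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101].freeze
CHAR_PRIMES = ('a'..'z').zip(PRIMES).to_h.freeze

# Product of the primes assigned to each letter of the word.
def prime_signature(word)
  word.downcase.each_char.inject(1) { |total, c| total * CHAR_PRIMES.fetch(c) }
end

# One pass over the words: signature => array of words with that signature.
def index_by_signature(words)
  index = Hash.new { |hash, key| hash[key] = [] }
  words.each { |word| index[prime_signature(word)] << word }
  index
end

index = index_by_signature(%w[ink foo kin])
index.each_value { |group| puts group.join(', ') if group.size > 1 }
# prints: ink, kin
```

Each word is visited once and lands directly in its group, so no second search through the hash is needed.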
Another thing with your code is that it seems to generate a whole bunch of throwaway objects, which will keep your garbage collector busy, and that is generally quite a big choke point in Ruby.
It might also be a good idea to find a benchmark tool and/or a profiler to analyze your code and see where it could be improved.
Fundamentally your code is slow because for each word (45k of them) you iterate over the entire hash (45k entries) looking for words with the same signature, so you're doing 45k * 45k of these comparisons. Another way of phrasing that is to say that your complexity is n^2 in the number of words.
The code below implements your basic idea but runs in a few seconds on the 236k word file I happen to have lying around. It could definitely be faster: the second pass over the data to find the things with > 1 items could be eliminated, but the result would be less readable.
It's also a lot shorter than your code, around a third, while staying readable, largely because I used more standard library functions and idiomatic ruby.
For example, the load_words method uses collect to turn one array into another, rather than iterating over one array and adding things to a second one. Similarly the signature function uses inject rather than iterating over the characters. Lastly I've used group_by to do the actual grouping. All of these methods happen to be in Enumerable - it's well worth becoming very familiar with these.
signature_for_word could become even pithier with
word.each_char.map {|c| CHAR_MAP[c.downcase]}.reduce(:*)
This takes the word, splits it into characters and then maps each one of those to the right number. reduce(:*) (reduce is an alias for inject) then multiplies them all together.
class AnagramFinder
  CHAR_MAP ={ "a" => 2, "b" => 3, "c" => 5, "d" => 7, "e" => 11, "f" => 13, "g" =>17, "h" =>19, "i" => 23, "j" => 29, "k" => 31, "l" => 37, "m" => 41, "n" =>43, "o" =>47, "p" => 53, "q" =>59, "r" => 61, "s" => 67, "t" => 71, "u" => 73, "v" => 79, "w" => 83, "x" => 89, "y" => 97, "z" => 101 }
  def initialize
    @words = load_words("/usr/share/dict/words")
  end
  def find_anagrams
    words_by_signature = @words.group_by {|word| signature_for_word word}
    words_by_signature.each do |signature, words|
      if words.length > 1
        puts '----'
        puts words.join('; ')
      end
    end
  end
  def signature_for_word(word)
    word.downcase.each_char.inject(1) {| total, c| total * CHAR_MAP[c]}
  end
  def load_words(filename)
    File.readlines(filename).collect {|line| line.gsub(/[^a-z]/i, '')}
  end
end
You can start narrowing down the slowness by using the Benchmark tool. Some examples here:
http://www.skorks.com/2010/03/timing-ruby-code-it-is-easy-with-benchmark/
First of all it would be interesting to see how long CalculatePrimeValueHash takes to run, and after that CalculatePrimeWordValue.
Quite often the slowness boils down to the number of times the inner loops are run, so you can also log how many times they run.
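As a rough illustration of timing two approaches with Benchmark.realtime from the standard library, here is a sketch. To keep it short it uses the sorted letters of a word as a stand-in signature instead of the prime product, and the sample word list and method names are made up.

```ruby
require 'benchmark'

# Stand-in signature (assumption): sorted letters instead of the prime product.
def letter_signature(word)
  word.downcase.chars.sort.join
end

# Quadratic approach: for every word, scan the whole list again.
def groups_by_scanning(words)
  words.map { |w| words.select { |o| letter_signature(o) == letter_signature(w) } }.uniq
end

# Single pass: group_by builds the groups directly.
def groups_by_grouping(words)
  words.group_by { |w| letter_signature(w) }.values
end

sample_words = %w[ink kin foo act cat tac dog god] * 50 # hypothetical sample data

scan_seconds  = Benchmark.realtime { groups_by_scanning(sample_words) }
group_seconds = Benchmark.realtime { groups_by_grouping(sample_words) }

puts format('scanning: %.4fs (%d comparisons)', scan_seconds, sample_words.size**2)
puts format('grouping: %.4fs (%d signatures)', group_seconds, sample_words.size)
```

The absolute numbers depend on your machine; the point is to compare the two timings and see how the comparison count grows with the size of the list.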
One very quick improvement is to make the prime alphabet hash a constant, because it never changes:
PRIME_ALPHABET_HASH = { "a" => 2, "b" => 3, "c" => 5, "d" => 7, "e" => 11, "f" => 13, "g" =>17, "h" =>19, "i" => 23, "j" => 29, "k" => 31, "l" => 37, "m" => 41, "n" =>43, "o" =>47, "p" => 53, "q" =>59, "r" => 61, "s" => 67, "t" => 71, "u" => 73, "v" => 79, "w" => 83, "x" => 89, "y" => 97, "z" => 101 }
