String to BigNum and back again (in Ruby) to allow circular shift - ruby

As a personal challenge I'm trying to implement the SIMON block cipher in Ruby. I'm running into some issues finding the best way to work with the data. The full code related to this question is located at: https://github.com/Rami114/Personal/blob/master/Simon/Simon.rb
SIMON requires both XOR, shift and circular shift operations, the last of which is forcing me to work with BigNums so I can perform the left circular shift with math rather than a more complex/slower double loop on byte arrays.
Is there a better way to convert a string to a BigNum and back again.
String -> BigNum (where N is 64 and pt is a string of plaintext)
pt = pt.chars.each_slice(N/8).map {|x| x.join.unpack('b*')[0].to_i(2)}.to_a
So I break the string into individual characters, slice into N-sized arrays (the word size in SIMON) and unpack each set into a BigNum. That appears to work fine and I can convert it back.
Now my SIMON code is currently broken, but that's more the math I think/hope and not the code. The conversion back is (where ct is an array of bignums representing the ciphertext):
ct.map { |x| [x.to_s(2).rjust(128,'0')].pack('b*') }.join
I seem to have to right-justify pad the string as bignums are of undefined width so I have no leading 0s. Unfortunately the pack requires the defined with to have sensible output.
Is this a valid method of conversion? Is there a better way? I'm not sure on either count and hoping someone here can help out.
E: For #torimus, the circular shift implementation I'm using (From link above)
def self.lcs (bytes, block_size, shift)
((bytes << shift) | (bytes >> (block_size - shift))) & ((1<< block_size)-1)
end

If you would be equally happy with unpack('B*') with msb first binary numbers (which you could well be if all your processing is circular), then you could also use .unpack('Q>') instead of .unpack('B*')[0].to_i(2) for generating pt:
pt = "qwertyuiopasdfghjklzxcvbnmQWERTYUIOPASDFGHJKLZXCVBNM1234567890!#"
# Your version (with 'B' == msb first) for comparison:
pt_nums = pt.chars.each_slice(N/8).map {|x| x.join.unpack('B*')[0].to_i(2)}.to_a
=> [8176115190769218921, 8030025283835160424, 7668342063789995618, 7957105551900562521,
6145530372635706438, 5136437062280042563, 6215616529169527604, 3834312847369707840]
# unpack to 64-bit unsigned integers directly
pt_nums = pt.unpack('Q>8')
=> [8176115190769218921, 8030025283835160424, 7668342063789995618, 7957105551900562521,
6145530372635706438, 5136437062280042563, 6215616529169527604, 3834312847369707840]
There are no native 128-bit pack/unpacks to return in the other direction, but you can use Fixnum to solve this too:
split128 = 1 << 64
ct = pt # Just to show round-trip
ct.map { |x| [ x / split128, x % split128 ].pack('Q>2') }.join
=> "\x00\x00\x00\x00\x00\x00\x00\x00qwertyui . . . " # truncated
This avoids a lot of the temporary stages on your code, but at the expense of using a different byte coding - I don't know enough about SIMON to comment whether this is adaptable to your needs.

Related

`sum` function not working as expected in case of large strings

In Ruby, #sum is used to calculate
Sum of array
Sum of an array based on a function or condition
Sum of ASCII codepoints (ord) in a string (not char array) i.e. 'abcd'.sum # => 394
The problem with the third one is the following
For the string below,
AwotIJHOAIJSRoieJHOjasOIADaoiHAOHJAOIJGOIajdOIQWJTOIGJDOINCOIASORIOGIMAOIMEORIQEMOIGMEOIFMASKDJQOWJGOJOASJOIQWOGIMASOIDMOQWIROQIGJOIAMSFOAIJGIHIWUNVNZMXCNXCKJQOWRIEOGSDGSPOKSDLAMKMROQIJRDFLKMZXOIAJSQPIRKLMAdglkaSFAJOIAJFOIQWJEOIQJKAMCLKACMALKSDLAKWEQANLEIRJRQFIJAOIVAWOTIJHOAIJSROIEJHOJASOIADAOIHAOHJAOIJGOIAJDOIQWJTOIGJDOINCOIASORIOGIMAOIMEORIQEMOIGMASODLQWKEJOIFJLKMALSKQIOWELKMZLXKMFALSFJQOIWEAOISFWIDHGPSODRJAWOPIJHOIDJOIAJTGIOJAORAJWOIJHOFMAOIFMOIPDMOAIPWJTOPIJDOIFjawoiRJOIpjmaioGJIGHAIJRHQHQIUEIvnaksJDNWIORQIOPEGHIDVNAJKNASIPHRQEUITHIUHDNAJSNWIHJQIWJQEOIGOIDVNAKOSDNAOPWPJQOPIWTJQEOIPGDPJFNASPJNQWOIRQWIOTOIVNAKSFNAIOAWOTIJHOAIJSROIEJHOJASOIADAOIHAOHJAOIJGOIAJDOIQWJTOIGJDOINCOIASORIOGIMAOIMEORIQEMOIGMASODLQWKEJOIFJLKMALSKQIOWELKMZLXKMFALSFJQOIWEAOISFWIDHGPSODRJAWOPIJHOIDJoiajTGIOJAORAJWOIJHOFMAOIFMOIPDMOAIPWJTOPIJDOIFJAWOIRJOIPJMAIOGJIGHAIJRHQHQIUEIVNAKSJDNWIORQIOPEGHIDVIPNWIHJQIWJQEOIGOIDVNAKOSDNAOPWPJQOPIWTJqeoIPGDPJFNASPJNQWJQWOIRJgonasKFAWOEJQWOIJOGALKFNASLFKqeqOFIJAOISFJAOISFJAWOI
which is large, (of 1000 characters), the following program doesn't work
putc gets.upcase.sum/~/$/
It works for all other strings of lesser size. The output of the above must be K. But it shows \9
But if I do this
putc gets.upcase.chars.sum(&:ord)/~/$/
It shows K. But the former one gives the correct output for all the other string except the large ones like this.
What is wrong here?
EDIT : Try it Online link
Try it online!
Sum of ASCII codepoints (ord) in a string (not char array) i.e. 'abcd'.sum # => 394
I've actually never heard of String#sum before, despite being fairly knowledgeable in the language. So I looked it up:
Returns a basic n-bit checksum of the characters in str, where n is the optional Integer parameter, defaulting to 16. The result is simply the sum of the binary value of each byte in str modulo 2**n - 1. This is not a particularly good checksum.
And sure enough, using your example input string, that's why we get:
str.chars.map(&:ord).sum
# => 77090
str.sum
# => 11554
The values are different because 77090 > 2**15. Moreover, 77090 % 2**15 == 11554.
If you use a larger value for n, the (check)sum is what you expected:
str.sum(100)
#=> 77090

How do I convert a spreadsheet "letternamed" column coordinate to an integer?

In spreadsheets I have cells named like "F14", "BE5" or "ALL1". I have the first part, the column coordinate, in a variable and I want to convert it to a 0-based integer column index.
How do I do it, preferably in an elegant way, in Ruby?
I can do it using a brute-force method: I can imagine loopping through all letters, converting them to ASCII and adding to a result, but I feel there should be something more elegant/straightforward.
Edit: Example: To simplify I do only speak about the column coordinate (letters). Therefore in the first case (F14) I have "F" as the input and I expect the result to be 5. In the second case I have "BE" as input and I expect getting 56, for "ALL" I want to get 999.
Not sure if this is any clearer than the code you already have, but it does have the advantage of handling an arbitrary number of letters:
class String
def upcase_letters
self.upcase.split(//)
end
end
module Enumerable
def reverse_with_index
self.map.with_index.to_a.reverse
end
def sum
self.reduce(0, :+)
end
end
def indexFromColumnName(column_str)
start = 'A'.ord - 1
column_str.upcase_letters.map do |c|
c.ord - start
end.reverse_with_index.map do |value, digit_position|
value * (26 ** digit_position)
end.sum - 1
end
I've added some methods to String and Enumerable because I thought it made the code more readable, but you could inline these or define them elsewhere if you don't like that sort of thing.
We can use modulo and the length of the input. The last character will
be used to calculate the exact "position", and the remainders to count
how many "laps" we did in the alphabet, e.g.
def column_to_integer(column_name)
letters = /[A-Z]+/.match(column_name).to_s.split("")
laps = (letters.length - 1) * 26
position = ((letters.last.ord - 'A'.ord) % 26)
laps + position
end
Using decimal representation (ord) and the math tricks seems a neat
solution at first, but it has some pain points regarding the
implementation. We have magic numbers, 26, and constants 'A'.ord all
over.
One solution is to give our code better knowlegde about our domain, i.e.
the alphabet. In that case, we can switch the modulo with the position of
the last character in the alphabet (because it's already sorted in a zero-based array), e.g.
ALPHABET = ('A'..'Z').to_a
def column_to_integer(column_name)
letters = /[A-Z]+/.match(column_name).to_s.split("")
laps = (letters.length - 1) * ALPHABET.size
position = ALPHABET.index(letters.last)
laps + position
end
The final result:
> column_to_integer('F5')
=> 5
> column_to_integer('AK14')
=> 36
HTH. Best!
I have found particularly neat way to do this conversion:
def index_from_column_name(colname)
s=colname.size
(colname.to_i(36)-(36**s-1).div(3.5)).to_s(36).to_i(26)+(26**s-1)/25-1
end
Explanation why it works
(warning spoiler ;) ahead). Basically we are doing this
(colname.to_i(36)-('A'*colname.size).to_i(36)).to_s(36).to_i(26)+('1'*colname.size).to_i(26)-1
which in plain English means, that we are interpreting colname as 26-base number. Before we can do it we need to interpret all A's as 1, B's as 2 etc. If only this is needed than it would be even simpler, namely
(colname.to_i(36) - '9'*colname.size).to_i(36)).to_s(36).to_i(26)-1
unfortunately there are Z characters present which would need to be interpreted as 10(base 26) so we need a little trick. We shift every digit 1 more then needed and than add it at the end (to every digit in original colname)
`

Generating an Instagram- or Youtube-like unguessable string ID in ruby/ActiveRecord

Upon creating an instance of a given ActiveRecord model object, I need to generate a shortish (6-8 characters) unique string to use as an identifier in URLs, in the style of Instagram's photo URLs (like http://instagram.com/p/P541i4ErdL/, which I just scrambled to be a 404) or Youtube's video URLs (like http://www.youtube.com/watch?v=oHg5SJYRHA0).
What's the best way to go about doing this? Is it easiest to just create a random string repeatedly until it's unique? Is there a way to hash/shuffle the integer id in such a way that users can't hack the URL by changing one character (like I did with the 404'd Instagram link above) and end up at a new record?
Here's a good method with no collision already implemented in plpgsql.
First step: consider the pseudo_encrypt function from the PG wiki.
This function takes a 32 bits integer as argument and returns a 32 bits integer that looks random to the human eye but uniquely corresponds to its argument (so that's encryption, not hashing). Inside the function, you may change the formula: (((1366.0 * r1 + 150889) % 714025) / 714025.0) with another function known only by you that produces a result in the [0..1] range (just tweaking the constants will probably be good enough, see below my attempt at doing just that). Refer to the wikipedia article on the Feistel cypher for more theorical explanations.
Second step: encode the output number in the alphabet of your choice. Here's a function that does it in base 62 with all alphanumeric characters.
CREATE OR REPLACE FUNCTION stringify_bigint(n bigint) RETURNS text
LANGUAGE plpgsql IMMUTABLE STRICT AS $$
DECLARE
alphabet text:='abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789';
base int:=length(alphabet);
_n bigint:=abs(n);
output text:='';
BEGIN
LOOP
output := output || substr(alphabet, 1+(_n%base)::int, 1);
_n := _n / base;
EXIT WHEN _n=0;
END LOOP;
RETURN output;
END $$
Now here's what we'd get for the first 10 URLs corresponding to a monotonic sequence:
select stringify_bigint(pseudo_encrypt(i)) from generate_series(1,10) as i;
stringify_bigint
------------------
tWJbwb
eDUHNb
0k3W4b
w9dtmc
wWoCi
2hVQz
PyOoR
cjzW8
bIGoqb
A5tDHb
The results look random and are guaranteed to be unique in the entire output space (2^32 or about 4 billion values if you use the entire input space with negative integers as well).
If 4 billion values was not wide enough, you may carefully combine two 32 bits results to get to 64 bits while not loosing unicity in outputs. The tricky parts are dealing correctly with the sign bit and avoiding overflows.
About modifying the function to generate your own unique results: let's change the constant from 1366.0 to 1367.0 in the function body, and retry the test above. See how the results are completely different:
NprBxb
sY38Ob
urrF6b
OjKVnc
vdS7j
uEfEB
3zuaT
0fjsab
j7OYrb
PYiwJb
Update: For those who can compile a C extension, a good replacement for pseudo_encrypt() is range_encrypt_element() from the permuteseq extension, which has of the following advantages:
works with any output space up to 64 bits, and it doesn't have to be a power of 2.
uses a secret 64-bit key for unguessable sequences.
is much faster, if that matters.
You could do something like this:
random_attribute.rb
module RandomAttribute
def generate_unique_random_base64(attribute, n)
until random_is_unique?(attribute)
self.send(:"#{attribute}=", random_base64(n))
end
end
def generate_unique_random_hex(attribute, n)
until random_is_unique?(attribute)
self.send(:"#{attribute}=", SecureRandom.hex(n/2))
end
end
private
def random_is_unique?(attribute)
val = self.send(:"#{attribute}")
val && !self.class.send(:"find_by_#{attribute}", val)
end
def random_base64(n)
val = base64_url
val += base64_url while val.length < n
val.slice(0..(n-1))
end
def base64_url
SecureRandom.base64(60).downcase.gsub(/\W/, '')
end
end
Raw
user.rb
class Post < ActiveRecord::Base
include RandomAttribute
before_validation :generate_key, on: :create
private
def generate_key
generate_unique_random_hex(:key, 32)
end
end
You can hash the id:
Digest::MD5.hexdigest('1')[0..9]
=> "c4ca4238a0"
Digest::MD5.hexdigest('2')[0..9]
=> "c81e728d9d"
But somebody can still guess what you're doing and iterate that way. It's probably better to hash on the content

Calculating the size of an Array pack struct format in Ruby

In the case of e.g. ddddd, d is the native format for the system, so I can't know exactly how big it will be.
In python I can do:
import struct
print struct.calcsize('ddddd')
Which will return 40.
How do I get this in Ruby?
I haven't found a built-in way to do this, but I've had success with this small function when I know I'm dealing with only numeric formats:
def calculate_size(format)
# Only for numeric formats, String formats will raise a TypeError
elements = 0
format.each_char do |c|
if c =~ /\d/
elements += c.to_i - 1
else
elements += 1
end
end
([ 0 ] * elements).pack(format).length
end
This constructs an array of the proper number of zeros, calls pack() with your format, and returns the length (in bytes). Zeros work in this case because they're convertible to each of the numeric formats (integer, double, float, etc).
I don't know of a shortcut but you can just pack one and ask how long it is:
length_of_five_packed_doubles = 5 * [1.0].pack('d').length
By the way, a ruby array combined with the pack method appears to be functionally equivalent to python's struct module. Ruby pretty much copied perl's pack and put them as methods on the Array class.

Code folding on consecutive collect/select/reject/each

I play around with arrays and hashes quite a lot in ruby and end up with some code that looks like this:
sum = two_dimensional_array.select{|i|
i.collect{|j|
j.to_i
}.sum > 5
}.collect{|i|
i.collect{|j|
j ** 2
}.average
}.sum
(Let's all pretend that the above code sample makes sense now...)
The problem is that even though TextMate (my editor of choice) picks up simple {...} or do...end blocks quite easily, it can't figure out (which is understandable since even I can't find a "correct" way to fold the above) where the above blocks start and end to fold them.
How would you fold the above code sample?
PS: considering that it could have 2 levels of folding, I only care about the outer consecutive ones (the blocks with the i)
To be honest, something that convoluted is probably confusing TextMate as much as anyone else who has to maintain it, and that includes you in the future.
Whenever you see something that rolls up into a single value, it's a good case for using Enumerable#inject.
sum = two_dimensional_array.inject(0) do |sum, row|
# Convert row to Fixnum equivalent
row_i = row.collect { |i| i.to_i }
if (row_i.sum > 5)
sum += row_i.collect { |i| i ** 2 }.average
end
sum # Carry through to next inject call
end
What's odd in your example is you're using select to return the full array, allegedly converted using to_i, but in fact Enumerable#select does no such thing, and instead rejects any for which the function returns nil. I'm presuming that's none of your values.
Also depending on how your .average method is implemented, you may want to seed the inject call with 0.0 instead of 0 to use a floating-point value.

Resources