Base-36 representation of Digest - ruby

I would like to be able to take an arbitrary string, run it through a hashing function (like MD5), and then interpret the resulting digest in base-36.
I know there already exists a Digest library in Ruby, but as far as I can tell I can't get at the raw bytes of a digest; the to_s function is mapped to hexdigest, which is, of course, base-16.

Fixnum#to_s accepts a base as the argument. So does string#to_i. Because of this, you can convert from the base-16 string to an int, then to base 36 string:
i = hexstring.to_i(16)
base_36 = i.to_s(36)

You can access the raw digest bytes using Digest::Class#digest:
Digest::SHA1.digest("test")
# => "\xA9J\x8F\xE5\xCC\xB1\x9B\xA6\x1CL\bs\xD3\x91\xE9\x87\x98/\xBB\xD3"
Unfortunately from that point I'm not sure how to get it into base36 without first going via another number base like in Sammy Larbi's answer..
bytes = Digest::SHA1.digest("test")
Digest.hexencode(bytes).to_i(16).to_s(36)
Hopefully you can find a better way to go from raw bytes to base36.

Related

How to read a large file into a string

I'm trying to save and load the states of Matrices (using Matrix) during the execution of my program with the functions dump and load from Marshal. I can serialize the matrix and get a ~275 KB file, but when I try to load it back as a string to deserialize it into an object, Ruby gives me only the beginning of it.
# when I want to save
mat_dump = Marshal.dump(#mat) # serialize object - OK
File.open('mat_save', 'w') {|f| f.write(mat_dump)} # write String to file - OK
# somewhere else in the code
mat_dump = File.read('mat_save') # read String from file - only reads like 5%
#mat = Marshal.load(mat_dump) # deserialize object - "ArgumentError: marshal data too short"
I tried to change the arguments for load but didn't find anything yet that doesn't cause an error.
How can I load the entire file into memory? If I could read the file chunk by chunk, then loop to store it in the String and then deserialize, it would work too. The file has basically one big line so I can't even say I'll read it line by line, the problem stays the same.
I saw some questions about the topic:
"Ruby serialize array and deserialize back"
"What's a reasonable way to read an entire text file as a single string?"
"How to read whole file in Ruby?"
but none of them seem to have the answers I'm looking for.
Marshal is a binary format, so you need to read and write in binary mode. The easiest way is to use IO.binread/write.
...
IO.binwrite('mat_save', mat_dump)
...
mat_dump = IO.binread('mat_save')
#mat = Marshal.load(mat_dump)
Remember that Marshaling is Ruby version dependent. It's only compatible under specific circumstances with other Ruby versions. So keep that in mind:
In normal use, marshaling can only load data written with the same major version number and an equal or lower minor version number.

What kind of encoding does posFlag requires?

How can I encode the position of the form /pathto/file.go:40:32 which is returned by token.Position.String() to a posFlag param required by ParseQueryPos which looks like /pathto/file.go:#550.
Why?
I'm using the Oracle tool to do some static analysis. I need to run Oracle.Query which requires a param of type *QueryPos. The only way to get *QueryPos is using ParseQueryPos.
The source to tools/pos.go called by ParseQueryPos says
// parsePosFlag parses a string of the form "file:pos" or
// file:start,end" where pos, start, end match #%d and represent byte
// offsets, and returns its components.
If you really had to convert from line:column strings, you'd look at the file contents and count up bytes (including newlines) leading to that line:column. But since you're working with a token.Position, it looks like you can get what you need from token.Position.Offset.

Convert a string of 0-F into a byte array in Ruby

I am attempting to decrypt a number encrypted by another program that uses the BouncyCastle library for Java.
In Java, I can set the key like this: key = Hex.decode("5F3B603AFCE22359");
I am trying to figure out how to represent that same step in Ruby.
To get Integer — just str.hex. You may get byte array in several ways:
str.scan(/../).map(&:hex)
[str].pack('H*').unpack('C*')
[str].pack('H*').bytes.to_a
See other options for pack/unpack and examples (by codeweblog).
For a string str:
"".tap {|binary| str.scan(/../) {|hn| binary << hn.to_i(16).chr}}

Ruby on Rails - generating bit.ly style identifiers

I'm trying to generate UUIDs with the same style as bit.ly urls like:
http://bit [dot] ly/aUekJP
or cloudapp ones:
http://cl [dot] ly/1hVU
which are even smaller
how can I do it?
I'm now using UUID gem for ruby but I'm not sure if it's possible to limitate the length and get something like this.
I am currently using this:
UUID.generate.split("-")[0] => b9386070
But I would like to have even smaller and knowing that it will be unique.
Any help would be pretty much appreciated :)
edit note: replaced dot letters with [dot] for workaround of banned short link
You are confusing two different things here. A UUID is a universally unique identifier. It has a very high probability of being unique even if millions of them were being created all over the world at the same time. It is generally displayed as a 36 digit string. You can not chop off the first 8 characters and expect it to be unique.
Bitly, tinyurl et-al store links and generate a short code to represent that link. They do not reconstruct the URL from the code they look it up in a data-store and return the corresponding URL. These are not UUIDS.
Without knowing your application it is hard to advise on what method you should use, however you could store whatever you are pointing at in a data-store with a numeric key and then rebase the key to base32 using the 10 digits and 22 lowercase letters, perhaps avoiding the obvious typo problems like 'o' 'i' 'l' etc
EDIT
On further investigation there is a Ruby base32 gem available that implements Douglas Crockford's Base 32 implementation
A 5 character Base32 string can represent over 33 million integers and a 6 digit string over a billion.
If you are working with numbers, you can use the built in ruby methods
6175601989.to_s(30)
=> "8e45ttj"
to go back
"8e45ttj".to_i(30)
=>6175601989
So you don't have to store anything, you can always decode an incoming short_code.
This works ok for proof of concept, but you aren't able to avoid ambiguous characters like: 1lji0o. If you are just looking to use the code to obfuscate database record IDs, this will work fine. In general, short codes are supposed to be easy to remember and transfer from one medium to another, like reading it on someone's presentation slide, or hearing it over the phone. If you need to avoid characters that are hard to read or hard to 'hear', you might need to switch to a process where you generate an acceptable code, and store it.
I found this to be short and reliable:
def create_uuid(prefix=nil)
time = (Time.now.to_f * 10_000_000).to_i
jitter = rand(10_000_000)
key = "#{jitter}#{time}".to_i.to_s(36)
[prefix, key].compact.join('_')
end
This spits out unique keys that look like this: '3qaishe3gpp07w2m'
Reduce the 'jitter' size to reduce the key size.
Caveat:
This is not guaranteed unique (use SecureRandom.uuid for that), but it is highly reliable:
10_000_000.times.map {create_uuid}.uniq.length == 10_000_000
The only way to guarantee uniqueness is to keep a global count and increment it for each use: 0000, 0001, etc.

Ruby - read bytes from a file, convert to integer

I'm trying to read unsigned integers from a file (stored as consecutive byte) and convert them to Integers. I've tried this:
file = File.new(filename,"r")
num = file.read(2).unpack("S") #read an unsigned short
puts num #value will be less than expected
What am I doing wrong here?
You're not reading enough bytes. As you say in the comment to tadman's answer, you get 202 instead of 3405691582
Notice that the first 2 bytes of 0xCAFEBABE is 0xCA = 202
If you really want all 8 bytes in a single number, then you need to read more than the unsigned short
try
num = file.read(8).unpack("L_")
The underscore is assuming that the native long is going to be 8 bytes, which definitely is not guaranteed.
How about looking in The Pickaxe? (Ruby 1.9, p. 44)
File.open("testfile")
do |file|
file.each_byte {|ch| print "#{ch.chr}:#{ch} " }
end
each_byte iterates over a file byte by byte.
There are a couple of libraries that help with parsing binary data in Ruby, by letting you declare the data format in a simple high-level declarative DSL and then figure out all the packing, unpacking, bit-twiddling, shifting and endian-conversions by themselves.
I have never used one of these, but here's two examples. (There are more, but I don't know them):
BitStruct
BinData
Ok, I got it to work:
num = file.read(8).unpack("N")
Thanks for all of your help.
What format are the numbers stored in the file? Is it in hex? Your code looks correct to me.
When dealing with binary data you need to be sure you're opening the file in binary mode if you're on Windows. This goes for both reading and writing.
open(filename, "rb") do |file|
num = file.read(2).unpack("S")
puts num
end
There may also be issues with "endian" encoding depending on the source platform. For instance, PowerPC-based machines, which include old Mac systems, IBM Power servers, PS3 clusters, or Sun Sparc servers.
Can you post an example of how it's "less"? Usually there's an obvious pattern to the data.
For example, if you want 0x1234 but you get 0x3412 it's an endian problem.

Resources