String is unexpectedly converted to hex - ruby

I am reading some data from a Firebird database. I have a field "UID" of type String(16), whose value is de6c50a94aee524d9d287a43158360f4.
When I fetch it with Ruby, I get:
"UID"=>"\xDElP\xA9J\xEERM\x9D(zC\x15\x83`\xF4"
Why didn't I get a string?
conn.query(:hash , 'SELECT FIRST 1 UID FROM cmd').first

The UID you receive is binary data, which Ruby represents as a packed (binary) string. To unpack it, do the following:
# '%04x' zero-pads each 16-bit group; a plain to_s(16) would drop leading zeros
"\xDElP\xA9J\xEERM\x9D(zC\x15\x83`\xF4".unpack('n*').map { |x| '%04x' % x }.join
# => "de6c50a94aee524d9d287a43158360f4"
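To see that the database value and the hex text are two views of the same 16 bytes, the round trip can be sketched as follows. The variable names are illustrative only, the literal is the value from the question, and `.b` tags it as binary so the comparison is byte for byte.

```ruby
# The 32 hex digits from the question, packed into 16 raw bytes.
hex = 'de6c50a94aee524d9d287a43158360f4'
raw = [hex].pack('H*')

# The literal Ruby printed for the UID column, tagged as binary with .b.
from_db = "\xDElP\xA9J\xEERM\x9D(zC\x15\x83`\xF4".b

puts raw.bytesize          # 16 bytes, not 32 characters
puts raw == from_db        # the packed hex and the query result are the same bytes
puts from_db.unpack1('H*') # back to the hex text
```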

Your UID is a 128-bit value. The hex string representation of the UID can be built with unpack:
str = "%08x%04x%04x%04x%04x%08x" % UID.unpack("NnnnnN")
=> "de6c50a94aee524d9d287a43158360f4"
The field widths look odd because this code is really meant for UUIDs, which are written with dashes in an 8-4-4-4-12 grouping:
str = "%08x-%04x-%04x-%04x-%04x%08x" % UID.unpack("NnnnnN")
=> "de6c50a9-4aee-524d-9d28-7a43158360f4"
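The zero-padded widths in those format strings matter. As a small sketch, using a made-up 16-byte value whose groups begin with zero nibbles, this compares an unpadded conversion with a padded one:

```ruby
# A 16-byte value chosen for illustration; several 16-bit groups start with zeros.
raw = ['0001002a0abc00000000000000000def'].pack('H*')

# Unpadded: each group loses its leading zeros, so the result is too short.
unpadded = raw.unpack('n*').map { |word| word.to_s(16) }.join

# Padded: every 16-bit group is printed as exactly four hex digits.
padded = raw.unpack('n*').map { |word| format('%04x', word) }.join

puts unpadded # 12aabc0000def
puts padded   # 0001002a0abc00000000000000000def
```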

As I commented, I suspect the datatype of UID in your Firebird database is CHAR(16) CHARACTER SET OCTETS, which is a binary datatype. Firebird before version 4 doesn't know the SQL types BINARY or VARBINARY, but fields with CHARACTER SET OCTETS are binary.
The value you are retrieving is probably a UUID. You either need to treat the value as binary, or select a human-readable UUID string using UUID_TO_CHAR:
SELECT FIRST 1 UUID_TO_CHAR(UID) FROM cmd

Related

Ruby: Why does unpack('Q') give a different result than manual conversion?

I'm trying to write a function that will .unpack('Q') (unpack to uint64_t) without access to the unpack method.
When I manually convert from string to binary to uint64, I get a different result than .unpack('Q'):
Integer('abcdefgh'.unpack('B*').first, 2) # => 7017280452245743464
'abcdefgh'.unpack('Q').first # => 7523094288207667809
I don't understand what's happening here.
I also don't understand why the output of .unpack('Q') is fixed regardless of the size of the input. If I add a thousand characters after 'abcdefgh' and then unpack('Q') it, I still just get [7523094288207667809]?
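Byte order matters:
Integer('abcdefgh'.
  each_char.
  flat_map { |c| c.unpack('B*') }.
  reverse.
  join, 2)
#⇒ 7523094288207667809
'abcdefgh'.unpack('Q*').first
#⇒ 7523094288207667809
Your code produces a different result because 'Q' reads the bytes in native byte order, which is little-endian on most machines (x86, for example), while your manual conversion treats the first byte as the most significant. Reversing the bytes after converting to binary makes the two agree.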
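Since the question asks for a version that does not rely on unpack, here is a minimal sketch of both byte orders built from String#bytes. The helper names are made up for illustration, and the input is assumed to hold at least 8 bytes. The unpack1 calls at the end are only there as a cross-check, using the explicit 'Q<' and 'Q>' directives.

```ruby
# Interpret the first 8 bytes as an unsigned 64-bit integer, least significant byte first.
def uint64_le(str)
  str.bytes.first(8).each_with_index.inject(0) do |acc, (byte, index)|
    acc | (byte << (8 * index))
  end
end

# Same bytes, most significant byte first.
def uint64_be(str)
  str.bytes.first(8).inject(0) { |acc, byte| (acc << 8) | byte }
end

puts uint64_le('abcdefgh')    # 7523094288207667809
puts uint64_be('abcdefgh')    # 7017280452245743464
puts 'abcdefgh'.unpack1('Q<') # explicit little-endian directive
puts 'abcdefgh'.unpack1('Q>') # explicit big-endian directive
```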
For the last part of your question: the output of .unpack('Q') doesn't change with a longer input string because the format specifies a single 64-bit value, so any characters after the first 8 are ignored. If you specify a format of Q2 and a 16-character string, you decode 2 values:
> 'abcdefghihjklmno'.unpack('Q2')
=> [7523094288207667809, 8029475498074204265]
and again you'd find adding additional characters wouldn't change the result:
> 'abcdefghihjklmnofoofoo'.unpack('Q2')
=> [7523094288207667809, 8029475498074204265]
A format of Q* returns as many values as there are complete 64-bit groups in the input:
> 'abcdefghihjklmnopqrstuvw'.unpack('Q*')
=> [7523094288207667809, 8029475498074204265, 8608196880778817904]
> 'abcdefghihjklmnopqrstuvwxyz'.unpack('Q*')
=> [7523094288207667809, 8029475498074204265, 8608196880778817904]
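A quick way to predict how many values Q* yields, and how many trailing bytes it leaves unread, is to divide the byte length by 8. A small sketch using the 27-character string from the last example:

```ruby
str = 'abcdefghihjklmnopqrstuvwxyz' # 27 bytes, as in the last example

# Number of complete 8-byte groups, and the bytes left over after them.
complete, leftover = str.bytesize.divmod(8)
values = str.unpack('Q*')

puts complete      # 3 complete 64-bit groups
puts leftover      # 3 trailing bytes that Q* does not decode
puts values.length # 3
```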

How can I convert a UUID to a string using a custom character set in Ruby?

I want to create a valid IFC GUID (IfcGloballyUniqueId) according to the specification here:
http://www.buildingsmart-tech.org/ifc/IFC2x3/TC1/html/ifcutilityresource/lexical/ifcgloballyuniqueid.htm
It's basically a UUID or GUID (128 bit) mapped to a set of 22 characters to limit storage space in a text file.
I currently have this workaround, but it's merely an approximation:
guid = '';22.times{|i|guid<<'0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz_$'[rand(64)]}
It seems best to use ruby SecureRandom to generate a 128 bit UUID, like in this example (https://ruby-doc.org/stdlib-2.3.0/libdoc/securerandom/rdoc/SecureRandom.html):
SecureRandom.uuid #=> "2d931510-d99f-494a-8c67-87feb05e1594"
This UUID needs to be mapped to a string with a length of 22 characters according to this format:
           1         2         3         4         5         6
 0123456789012345678901234567890123456789012345678901234567890123
"0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz_$";
I don't understand this exactly.
Should the 32-character hex number be converted to a 128-bit binary number, then divided into 22 groups of 6 bits (except for one that gets the remaining 2 bits?), each of which can be converted to a decimal number from 0 to 63 and then replaced by the corresponding character from the conversion table?
I hope someone can verify if I'm on the right track here.
And if I am, is there a computationally faster way in Ruby to convert the 128-bit number to the 22 values in the range 0-63 than using all these separate conversions?
Edit: For anyone having the same problem, this is my solution for now:
require 'securerandom'
# possible characters in GUID
guid64 = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz_$'
guid = ""
# SecureRandom.uuid: creates a 128 bit UUID hex string
# tr('-', ''): removes the dashes from the hex string
# pack('H*'): converts the hex string to a binary number (high nibble first) (?) is this correct?
# This reverses the number so we end up with the leftover bit on the end, which helps with chopping the string into pieces.
# It needs to be reversed again to end up with a string in the original order.
# unpack('b*'): converts the binary number to a bit string (128 0's and 1's) and places it into an array
# [0]: gets the first (and only) value from the array
# to_s.scan(/.{1,6}/m): chops the string into pieces 6 characters(bits) with the leftover on the end.
[SecureRandom.uuid.tr('-', '')].pack('H*').unpack('b*')[0].to_s.scan(/.{1,6}/m).each do |num|
  # take the number (0 - 63) and find the matching character in guid64, add the found character to the guid string
  guid << guid64[num.to_i(2)]
end
guid.reverse
Base64 encoding is pretty close to what you want here, but the mappings are different. No big deal, you can fix that:
require 'securerandom'
require 'base64'
# Define the two mappings here, side-by-side
BASE64 = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'
IFCB64 = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz_$'
def ifcb64(hex)
  # Convert from hex to binary, then from binary to Base64
  # Trim off the == padding, then convert mappings with `tr`
  Base64.encode64([ hex.tr('-', '') ].pack('H*')).gsub(/\=*\n/, '').tr(BASE64, IFCB64)
end
ifcb64(SecureRandom.uuid)
# => "fa9P7E3qJEc1tPxgUuPZHm"
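One detail is worth checking against the specification linked in the question. Base64 groups bits from the left and pads on the right, so its first character can be any of the 64 symbols. If the IFC format instead treats the UUID as one 128-bit number written as 22 base-64 digits, the first digit carries only the top 2 bits and is always 0 to 3. That reading is an assumption on my part, not something stated above. Under that assumption, a minimal sketch with a hypothetical helper name looks like this:

```ruby
require 'securerandom'

# Alphabet from the question; index 0 is '0' and index 63 is '$'.
IFC_ALPHABET = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz_$'

# Hypothetical helper: encodes the UUID as a 22-digit base-64 number,
# most significant digit first, so the leading digit holds only 2 bits.
def uuid_to_ifc_digits(uuid)
  number = uuid.delete('-').to_i(16)
  # Peel off the least significant 6 bits 22 times, then reverse the order.
  digits = Array.new(22) do
    number, remainder = number.divmod(64)
    IFC_ALPHABET[remainder]
  end
  digits.reverse.join
end

puts uuid_to_ifc_digits(SecureRandom.uuid)
```

Working on the integer directly also avoids the intermediate bit string, which addresses the question about a computationally cheaper conversion.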

VisualWorks Smalltalk: how to convert ASCII values to characters

Using VisualWorks Smalltalk, I'm receiving a string like '31323334' from a network connection.
I need a string that reads '1234' so I need a way of extracting two characters at a time, converting them to what they represent in ascii, and then building a string of them...
Is there a way to do so?
EDIT(7/24): for some reason many of you are assuming I will only be working with numbers and could just truncate 3s or read every other char. This is not the case, examples of strings read could include any keys on the US standard keyboard (a-z, A-Z,0-9,punctuation/annotation such as {}*&^%$...)
Following along the lines of what Max started to suggest:
x := '31323334'.
in := ReadStream on: x.
out := WriteStream on: String new.
[ in atEnd ] whileFalse: [ out nextPut: (in next digitValue * 16 + (in next digitValue)) asCharacter ].
newX := out contents.
newX will have the result '1234'. Or, if you start with:
x := '454647'
You will get a result of 'EFG'.
Note that digitValue might only recognize upper case hex digits, so an asUppercase may be needed on the string before processing.
There is usually a #fold: or #reduce: method that will let you do that. In Pharo there are also the messages #allPairsDo: and #groupsOf:atATimeCollect:. Using the latter you could do the following (the pairs are hex digits, as in the question, so the first digit is multiplied by 16):
| collectionOfBytes |
collectionOfBytes := '6162'
    groupsOf: 2
    atATimeCollect: [ :group |
        (group first digitValue * 16) + (group second digitValue) ].
collectionOfBytes asByteArray asString "--> 'ab'"
The #digitValue message in Pharo simply returns the value of the digit for numerical characters.
If you're receiving the data on a stream you could replace #groupsOf:atATimeCollect: with a loop (result may be any collection that you then convert to a string like above):
...
[ stream atEnd ] whileFalse: [
    result add: (stream next digitValue * 16) + (stream next digitValue) ]
...
In Smalltalk/X, there is a method called "fromHexString:" which the ByteArray class understands. I am not sure, but I think that something similar exists in other Smalltalk dialects.
If present, you can solve this with:
(ByteArray fromHexString:'68656C6C6F31323334') asString
and the reverse would be:
'hello1234' asByteArray hexPrintString
Another possible solution is to read the string as a hex number,
fetch the digitBytes (which should give you a byte array) and then convert that to a string.
I.e.
(Integer readFrom:'68656C6C6F31323334' radix:16)
digitBytes asString
One problem with that is that I am not sure in which byte order you will get the digitBytes (LSB or MSB first), and whether that is defined to be the same across architectures or converted at image loading time to use the native order. So it may be necessary to reverse the string at the end (to be portable, it may even be necessary to reverse it conditionally, depending on the endianness of the system).
I cannot test this on VisualWorks, but I assume it should work fine there, too.

Is it possible to identify the format of a string?

Is it possible to recognize if a string is formatted as a BSON ObjectID?
For strings we could do:
"hello".is_a?(String) # => true
That would not work since the ObjectID is a String anyway. But is it possible to analyze the string to determine if it's formatted as a BSON ObjectID?
Usually, ObjectIDs have this format.
52f4e2274d6f6865080c0000
The formatting criteria is stated in the docs:
ObjectId is a 12-byte BSON type, constructed using:
a 4-byte value representing the seconds since the Unix epoch,
a 3-byte machine identifier,
a 2-byte process id, and
a 3-byte counter, starting with a random value.
Any 24-character hexadecimal string is a valid BSON object id, so you can check it using this regular expression:
'52f4e2274d6f6865080c0000' =~ /\A\h{24}\z/
# => 0
Both the moped (used by mongoid) and the bson (used by mongo_mapper) gems encapsulate this check in a legal? method:
require 'moped'
Moped::BSON::ObjectId.legal?('00' * 12)
# => true
require 'bson'
BSON::ObjectId.legal?('00' * 12)
# => true
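If neither gem is loaded, the same check can be written in plain Ruby, and the 12-byte layout quoted in the question also lets you read the creation time from the first 4 bytes. The helper names below are made up for illustration and are not part of moped or bson.

```ruby
# True when the string is exactly 24 hexadecimal digits (12 bytes).
def object_id_like?(str)
  str.match?(/\A\h{24}\z/)
end

# Seconds since the Unix epoch, taken from the first 4 bytes (8 hex digits).
def object_id_time(str)
  Time.at(str[0, 8].to_i(16)).utc
end

id = '52f4e2274d6f6865080c0000'
puts object_id_like?(id)      # true
puts object_id_like?('hello') # false
puts object_id_time(id).year  # 2014
```

Note that a match only says the string has the right shape. Any 24 hex digits pass, whether or not they were ever issued as an ObjectId.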
In Mongoid, use the .is_a?(Moped::BSON::ObjectId) syntax.
Example:
some_id = YourModel.first.id
some_id.is_a?(Moped::BSON::ObjectId)
Note that the string form of an id is still just a String:
"52d7874679478f45e8000001".is_a?(String) # => true

Hex to decimal conversion in Ruby

I have "\001\022" as the value of a. My desired decimal value is 274.
I tried the following, but I get ["0112"]:
a.unpack("H*") ==> ["0112"]
When I convert "0112" to decimal using a calculator, it gives me 274. How can I get this result using Ruby methods?
Thanks
The format string in your question, "H*", is for "hex string (high nibble first)". It therefore decoded your bytes into a single string of hexadecimal digits, two per byte, rather than into a number.
You need a different format.
Try this, which decodes it as a "16-bit unsigned, network (big-endian) byte order" integer:
a.unpack("n") # => [274]
For full details on what characters you can use in the format string, check the Ruby Documentation for String#unpack.
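For comparison, here are three ways to arrive at 274 from the same two bytes. This is a short sketch rather than a recommendation of one over the others. The last form does not depend on the input being exactly two bytes long.

```ruby
a = "\001\022" # two bytes: 0x01 and 0x12

puts a.unpack1('n')                                # 274, read directly as a big-endian 16-bit integer
puts a.unpack1('H*').to_i(16)                      # 274, via the hex string "0112"
puts a.bytes.inject(0) { |acc, b| (acc << 8) | b } # 274, folding the bytes manually
```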

Resources