Convert string to hexadecimal in Ruby - ruby

I'm trying to convert a Binary file to Hexadecimal using Ruby.
At the moment I have the following:
File.open(out_name, 'w') do |f|
f.puts "const unsigned int modFileSize = #{data.length};"
f.puts "const char modFile[] = {"
first_line = true
data.bytes.each_slice(15) do |a|
line = a.map { |b| ",#{b}" }.join
if first_line
f.puts line[1..-1]
else
f.puts line
end
first_line = false
end
f.puts "};"
end
This is what the code above generates:
const unsigned int modFileSize = 82946;
const char modFile[] = {
116, 114, 97, 98, 97, 108, 97, 115, 104, 0, 0, 0, 0, 0, 0
, 0, 0, 0, 0, 0, 62, 62, 62, 110, 117, 107, 101, 32, 111, 102
, 32, 97, 110, 97, 114, 99, 104, 121, 60, 60, 60, 8, 8, 130, 0
};
What I need is the following:
const unsigned int modFileSize = 82946;
const char modFile[] = {
0x74, 0x72, etc, etc
};
So I need to be able to convert a string to its hexadecimal value.
"116" => "0x74", etc
Thanks in advance.

Ruby also has String#hex, which goes in the reverse direction:
"0x101".hex parses the hexadecimal number in the string and returns it as an integer (257 here).

Change this line:
line = a.map { |b| ",#{b}" }.join
to this:
line = a.map { |b| sprintf(", 0x%02X",b) }.join
(Change to %02x if necessary, it's unclear from the example whether the hex digits should be capitalized.)
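Putting that change into the whole generator, a minimal sketch could look like the following. The method name c_array_source and the per_line parameter are made up for illustration; the variable names in the generated text are the ones from the question. It joins the rows with commas, so no special handling of the first line is needed.

```ruby
# Hypothetical helper: returns the C source text for a byte string.
def c_array_source(data, per_line = 15)
  # Format every byte as 0xNN and group them into rows of per_line values.
  rows = data.bytes.each_slice(per_line).map do |slice|
    slice.map { |b| format('0x%02X', b) }.join(', ')
  end

  [
    "const unsigned int modFileSize = #{data.bytesize};",
    'const char modFile[] = {',
    rows.join(",\n"),
    '};'
  ].join("\n") + "\n"
end

puts c_array_source("trabalash")
```

Writing the result to a file is then a single File.write(out_name, c_array_source(data)).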

I don't know if this is the best solution, but this is a solution:
class String
def to_hex
"0x" + self.to_i.to_s(16)
end
end
"116".to_hex
=> "0x74"

Binary to hex conversion in four languages (including Ruby) might be helpful.
One of the comments on that page seems to provide a very easy shortcut. The example covers reading input from STDIN, but any string representation should do:
STDIN.read.to_i(base=16).to_s(base=2)

For another approach, check out unpack
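As a rough sketch of the unpack route: the 'H*' directive returns the whole string as hex digits, high nibble first, which can then be split into pairs. The method name hex_literals is an assumption made for this example.

```ruby
# Hypothetical helper: turns a byte string into an array of "0xNN" literals.
def hex_literals(data)
  # unpack('H*') yields one string of hex digits; scan splits it per byte.
  data.unpack('H*').first.scan(/../).map { |pair| "0x#{pair}" }
end

puts hex_literals("tr").join(', ')
```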

Related

ruby - How to change font color array value

Using ERB, I print an array into HTML.
Whenever an array value is divisible by 2, I want to change the font color to red.
Whenever the value contains a 3, I want to change the font color to blue.
How do I write input.txt?
This is code of example.rb
require 'erb'
input_file = "input.txt"
output_file = "output.html"
prime_number = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53,
59, 61, 67, 71, 73, 79, 83, 89, 97]
str = ""
File.open(input_file, "r:UTF-8") do |io|
str = io.read
end
erb = ERB.new(str)
result = erb.result(binding)
File.open(output_file, "w") do |io|
io.write result
end
This is input.txt
<% prime_number.each |a| % 2 == 0 do %>
<% puts a %>
<% end %>
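One possible shape for the template, as a sketch only: the block belongs on each (prime_number.each do |a|), and values are emitted with <%= %> rather than puts, which would print to standard output instead of into the HTML. The template is inlined here so the example is self-contained, the array is shortened, and the color_for lambda is a hypothetical helper, not part of the original example.rb.

```ruby
require 'erb'

prime_number = [2, 3, 5, 7, 11, 13]

# Hypothetical helper: returns a color name, or nil when no rule applies.
color_for = lambda do |n|
  if n.even?
    'red'
  elsif n.to_s.include?('3')
    'blue'
  end
end

# This text is what input.txt could contain.
template = <<~'TEMPLATE'
  <% prime_number.each do |a| %>
  <% color = color_for.call(a) %>
  <% if color %>
  <font color="<%= color %>"><%= a %></font>
  <% else %>
  <%= a %>
  <% end %>
  <% end %>
TEMPLATE

html = ERB.new(template).result(binding)
puts html
```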

Need a Ruby method to convert a binary array to ASCII alpha-numeric string

I have an array of [1, 0, 11, 0, 4, 0, 106, 211, 169, 1, 0, 12, 0, 8, 0, 1, 26, 25, 32, 189, 77, 216, 1, 0, 1, 0, 4, 0, 0, 0, 0, 12, 15].
I would love to create a string version mostly for logging purposes. My end result would be "01000B0004006AD3..."
I could not find a simple way to take each array byte value and pack a string with an ASCII presentation of the byte value.
My solution is cumbersome. I appreciate advice on making the solution slick.
result_string = ''
array.each {|x|
value = (x>>4)&0x0f
if( value>9 ) then
result_string.concat (value-0x0a + 'A'.ord).chr
else
result_string.concat (value + '0'.ord).chr
end
value = (x)&0x0f
if( value>9 ) then
result_string.concat (value-0x0a + 'A'.ord).chr
else
result_string.concat (value + '0'.ord).chr
end
}
Your question isn't very clear, but I guess something like this is what you are looking for:
array.map {|n| n.to_s(16).rjust(2, '0').upcase }.join
#=> "01000B0004006AD3A901000C000800011A1920BD4DD80100010004000000000C0F"
or
array.map(&'%02X'.method(:%)).join
#=> "01000B0004006AD3A901000C000800011A1920BD4DD80100010004000000000C0F"
Which one of the two is more readable depends on how familiar your readers are with sprintf-style format strings, I guess.
It's actually pretty simple:
def hexpack(data)
data.pack('C*').unpack('H*')[0]
end
That packs your integers into a string of unsigned 8-bit bytes (C) and unpacks the resulting string as hex digits, high nibble first (H). In practice:
hexpack([1, 0, 11, 0, 4, 0, 106, 211, 169, 1, 0, 12, 0, 8, 0, 1, 26, 25, 32, 189, 77, 216, 1, 0, 1, 0, 4, 0, 0, 0, 0, 12, 15])
# => "01000b0004006ad3a901000c000800011a1920bd4dd80100010004000000000c0f"
I might suggest you stick to hex or base64 instead of making your own formatting
dat = [1, 0, 11, 0, 4, 0, 106, 211, 169, 1, 0, 12, 0, 8, 0, 1, 26, 25, 32, 189, 77, 216, 1, 0, 1, 0, 4, 0, 0, 0, 0, 12, 15]
Hexadecimal
hex = dat.map { |x| sprintf('%02x', x) }.join
# => 01000b0004006ad3a901000c000800011a1920bd4dd80100010004000000000c0f
Base64
require 'base64'
base64 = Base64.encode64(dat.pack('C*'))
# => AQALAAQAatOpAQAMAAgAARoZIL1N2AEAAQAEAAAAAAwP\n
Proquints
What? Proquints are pronounceable unique identifiers which makes them great for reading/communicating binary data. In your case, maybe not the best because you're dealing with 30+ bytes here, but they're very suitable for smaller byte strings
# proquint.rb
# adapted to ruby from https://github.com/deoxxa/proquint
module Proquint
C = %w(b d f g h j k l m n p r s t v z)
V = %w(a i o u)
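def self.encode(bytes)
# pad with a zero byte on a copy, so the caller's array is not mutated
bytes += [0] if bytes.size.odd?
# 'S*' reads native-endian 16-bit words
bytes.pack('C*').unpack('S*').reduce([]) do |acc, n|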
c1 = n & 0x0f
v1 = (n >> 4) & 0x03
c2 = (n >> 6) & 0x0f
v2 = (n >> 10) & 0x03
c3 = (n >> 12) & 0x0f
acc << C[c1] + V[v1] + C[c2] + V[v2] + C[c3]
end.join('-')
end
def self.decode(str)
# learner's exercise
# or see some proquint library (eg) https://github.com/deoxxa/proquint
end
end
Proquint.encode dat
# => dabab-rabab-habab-potat-nokab-babub-babob-bahab-pihod-bohur-tadot-dabab-dabab-habab-babab-babub-zabab
Of course the entire process is reversible too. You might not need it, so I'll leave it as an exercise for the learner
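For readers who do want the reverse direction, here is one possible sketch. It is written as a standalone method (proquint_decode is a made-up name) and assumes the same bit layout as the encoder above. It uses the explicit little-endian directive 'v*', which matches the encoder's native 'S*' only on little-endian machines. Input validation is left out, so an unknown letter raises an error.

```ruby
# Hypothetical inverse of Proquint.encode, assuming little-endian 16-bit words.
def proquint_decode(str)
  consonants = %w(b d f g h j k l m n p r s t v z)
  vowels = %w(a i o u)

  words = str.split('-').map do |word|
    c1, v1, c2, v2, c3 = word.chars
    # Reassemble the 16-bit word from its 4-, 2-, 4-, 2- and 4-bit fields.
    consonants.index(c1) |
      (vowels.index(v1) << 4) |
      (consonants.index(c2) << 6) |
      (vowels.index(v2) << 10) |
      (consonants.index(c3) << 12)
  end

  # Split every 16-bit word back into two bytes, low byte first.
  words.pack('v*').unpack('C*')
end

p proquint_decode('bagop-rasag')
```

An odd-length input that was padded by the encoder comes back with its trailing zero byte, so the original length has to be tracked separately if it matters.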
It's particularly nice for things like IP addresses, or any other short binary blobs. You gain familiarity too as you see common byte strings in their proquint form
Proquint.encode [192, 168, 11, 51] # bagop-rasag
Proquint.encode [192, 168, 11, 52] # bagop-rabig
Proquint.encode [192, 168, 11, 66] # bagop-ramah
Proquint.encode [192, 168, 22, 19] # bagop-kisad
Proquint.encode [192, 168, 22, 20] # bagop-kibid

sql server 2008 checksum algorithm? [duplicate]

We perform checksums of some data in sql server as follows:
declare @cs int;
select
@cs = CHECKSUM_AGG(CHECKSUM(someid, position))
from
SomeTable
where
userid = @userId
group by
userid;
This data is then shared with clients. We'd like to be able to repeat the checksum at the client end... however there doesn't seem to be any info about how the checksums in the functions above are calculated. Can anyone enlighten me?
On SQL Server Forum, at this page, it's stated:
The built-in CHECKSUM function in SQL Server is built on a series of 4 bit left rotational xor operations. See this post for more explanation.
The CHECKSUM function doesn't provide a very good quality checksum and IMO is pretty useless for most purposes. As far as I know the algorithm isn't published. If you want a check that you can reproduce yourself then use the HashBytes function and one of the standard, published algorithms such as MD5 or SHA.
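On the client side, a published algorithm is straightforward to reproduce. The following is a minimal Ruby sketch using the standard digest library. The method name hashbytes_style and the 0x-prefixed uppercase output format are choices made for this example, and it only matches the server if both sides hash exactly the same byte sequence (for instance, NVARCHAR data is assumed here to correspond to UTF-16LE bytes).

```ruby
require 'digest'

# Hypothetical helper: hashes raw bytes and formats them like a varbinary literal.
def hashbytes_style(algorithm, bytes)
  digest = case algorithm
           when 'MD5'  then Digest::MD5.hexdigest(bytes)
           when 'SHA1' then Digest::SHA1.hexdigest(bytes)
           else raise ArgumentError, "unsupported algorithm: #{algorithm}"
           end
  "0x#{digest.upcase}"
end

puts hashbytes_style('MD5', 'abc')
# Assumption: an NVARCHAR value would be hashed as UTF-16LE bytes.
puts hashbytes_style('SHA1', 'abc'.encode('UTF-16LE'))
```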
// Quick hash sum, mirrored in SQL and C# (Ukraine)
private Int64 HASH_ZKCRC64(byte[] Data)
{
Int64 Result = 0x5555555555555555;
if (Data == null || Data.Length <= 0) return 0;
int SizeGlobalBufer = 8000;
int Ost = Data.Length % SizeGlobalBufer;
int LeftLimit = (Data.Length / SizeGlobalBufer) * SizeGlobalBufer;
for (int i = 0; i < LeftLimit; i += 64)
{
Result = Result
^ BitConverter.ToInt64(Data, i)
^ BitConverter.ToInt64(Data, i + 8)
^ BitConverter.ToInt64(Data, i + 16)
^ BitConverter.ToInt64(Data, i + 24)
^ BitConverter.ToInt64(Data, i + 32)
^ BitConverter.ToInt64(Data, i + 40)
^ BitConverter.ToInt64(Data, i + 48)
^ BitConverter.ToInt64(Data, i + 56);
if ((Result & 0x0000000000000080) != 0)
Result = Result ^ BitConverter.ToInt64(Data, i + 28);
}
if (Ost > 0)
{
byte[] Bufer = new byte[SizeGlobalBufer];
Array.Copy(Data, LeftLimit, Bufer, 0, Ost);
for (int i = 0; i < SizeGlobalBufer; i += 64)
{
Result = Result
^ BitConverter.ToInt64(Bufer, i)
^ BitConverter.ToInt64(Bufer, i + 8)
^ BitConverter.ToInt64(Bufer, i + 16)
^ BitConverter.ToInt64(Bufer, i + 24)
^ BitConverter.ToInt64(Bufer, i + 32)
^ BitConverter.ToInt64(Bufer, i + 40)
^ BitConverter.ToInt64(Bufer, i + 48)
^ BitConverter.ToInt64(Bufer, i + 56);
if ((Result & 0x0000000000000080)!=0)
Result = Result ^ BitConverter.ToInt64(Bufer, i + 28);
}
}
byte[] MiniBufer = BitConverter.GetBytes(Result);
Array.Reverse(MiniBufer);
return BitConverter.ToInt64(MiniBufer, 0);
#region SQL_FUNCTION
/* CREATE FUNCTION [dbo].[HASH_ZKCRC64] (@data as varbinary(MAX)) Returns bigint
AS
BEGIN
Declare @I64 as bigint Set @I64=0x5555555555555555
Declare @Bufer as binary(8000)
Declare @i as int Set @i=1
Declare @j as int
Declare @Len as int Set @Len=Len(@data)
if ((@data is null) Or (@Len<=0)) Return 0
While @i<=@Len
Begin
Set @Bufer=Substring(@data,@i,8000)
Set @j=1
While @j<=8000
Begin
Set @I64=@I64
^ CAST(Substring(@Bufer,@j, 8) as bigint)
^ CAST(Substring(@Bufer,@j+8, 8) as bigint)
^ CAST(Substring(@Bufer,@j+16,8) as bigint)
^ CAST(Substring(@Bufer,@j+24,8) as bigint)
^ CAST(Substring(@Bufer,@j+32,8) as bigint)
^ CAST(Substring(@Bufer,@j+40,8) as bigint)
^ CAST(Substring(@Bufer,@j+48,8) as bigint)
^ CAST(Substring(@Bufer,@j+56,8) as bigint)
if @I64<0 Set @I64=@I64 ^ CAST(Substring(@Bufer,@j+28,8) as bigint)
Set @j=@j+64
End;
Set @i=@i+8000
End
Return @I64
END
*/
#endregion
}
I figured out the CHECKSUM algorithm, at least for ASCII characters. I created a proof of it in JavaScript (see https://stackoverflow.com/a/59014293/9642).
In a nutshell: rotate 4 bits left and xor by a code for each character. The trick was figuring out the "XOR codes". Here's the table of those:
var xorcodes = [
0, 1, 2, 3, 4, 5, 6, 7,
8, 9, 10, 11, 12, 13, 14, 15,
16, 17, 18, 19, 20, 21, 22, 23,
24, 25, 26, 27, 28, 29, 30, 31,
0, 33, 34, 35, 36, 37, 38, 39, // !"#$%&'
40, 41, 42, 43, 44, 45, 46, 47, // ()*+,-./
132, 133, 134, 135, 136, 137, 138, 139, // 01234567
140, 141, 48, 49, 50, 51, 52, 53, 54, // 89:;<=>?@
142, 143, 144, 145, 146, 147, 148, 149, // ABCDEFGH
150, 151, 152, 153, 154, 155, 156, 157, // IJKLMNOP
158, 159, 160, 161, 162, 163, 164, 165, // QRSTUVWX
166, 167, 55, 56, 57, 58, 59, 60, // YZ[\]^_`
142, 143, 144, 145, 146, 147, 148, 149, // abcdefgh
150, 151, 152, 153, 154, 155, 156, 157, // ijklmnop
158, 159, 160, 161, 162, 163, 164, 165, // qrstuvwx
166, 167, 61, 62, 63, 64, 65, 66, // yz{|}~
];
The main thing to note is the bias towards alphanumerics (their codes are similar and ascending). English letters use the same code regardless of case.
I haven't tested high codes (128+) nor Unicode.
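The rotate-and-XOR step described in this answer can be sketched in Ruby as follows. This is an illustration of the description only, not a verified reimplementation of SQL Server's CHECKSUM. The method names are made up, the code lookup covers just the digit and letter rows of the table above, and the 32-bit width and signed result are assumptions.

```ruby
# Rotate a 32-bit value left by the given number of bits.
def rotl32(value, bits)
  ((value << bits) | (value >> (32 - bits))) & 0xFFFFFFFF
end

# Subset of the table above: digits map to 132..141, letters to 142..167
# regardless of case. Other characters are not handled in this sketch.
def xor_code(ch)
  case ch
  when '0'..'9' then 132 + (ch.ord - '0'.ord)
  when 'a'..'z', 'A'..'Z' then 142 + (ch.downcase.ord - 'a'.ord)
  else raise ArgumentError, "no code in this sketch for #{ch.inspect}"
  end
end

# For each character: rotate the running value left by 4 bits, then XOR the code.
def checksum_sketch(str)
  acc = str.each_char.reduce(0) { |sum, ch| rotl32(sum, 4) ^ xor_code(ch) }
  # Assumption: the result is reported as a signed 32-bit integer.
  acc >= 0x80000000 ? acc - 0x1_0000_0000 : acc
end

puts checksum_sketch('abc')
```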

Ruby, XOR random bytes in binary string

I have a string of binary data and want to pick a single character, and ^ it by 0xff. Is there a simple way to do this? For example:
x = "test\223\434t"
r = rand(x.length)
c = x[r].unpack("H*") ^ 0xff # This doesn't work
# Re concat the string
bytes = x.bytes.to_a
# => [116, 101, 115, 116, 147, 28, 116]
bytes[rand(bytes.length)] ^ 0xff
# => 139
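To get a string back with the chosen byte flipped, one option is String#getbyte and String#setbyte, which avoid the round trip through an array. A minimal sketch follows; flip_byte is a hypothetical helper name, and it works on a binary copy so the original string is left unchanged.

```ruby
# Hypothetical helper: returns a copy of str with the byte at index XORed by 0xff.
def flip_byte(str, index)
  copy = str.b # binary (ASCII-8BIT) copy of the string
  copy.setbyte(index, copy.getbyte(index) ^ 0xff)
  copy
end

x = "test\x93\x1Ct".b
r = rand(x.bytesize)
flipped = flip_byte(x, r)
p flipped.bytes
```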

At which n does binary search become faster than linear search on a modern CPU?

Due to the wonders of branch prediction, a binary search can be slower than a linear search through an array of integers. On a typical desktop processor, how big does that array have to get before it would be better to use a binary search? Assume the structure will be used for many lookups.
I've tried a little C++ benchmarking and I'm surprised - linear search seems to prevail up to several dozen items, and I haven't found a case where binary search is better for those sizes. Maybe gcc's STL is not well tuned? But then -- what would you use to implement either kind of search?-) So here's my code, so everybody can see if I've done something silly that would distort timing grossly...:
#include <vector>
#include <algorithm>
#include <iostream>
#include <stdlib.h>
int data[] = {98, 50, 54, 43, 39, 91, 17, 85, 42, 84, 23, 7, 70, 72, 74, 65, 66, 47, 20, 27, 61, 62, 22, 75, 24, 6, 2, 68, 45, 77, 82, 29, 59, 97, 95, 94, 40, 80, 86, 9, 78, 69, 15, 51, 14, 36, 76, 18, 48, 73, 79, 25, 11, 38, 71, 1, 57, 3, 26, 37, 19, 67, 35, 87, 60, 34, 5, 88, 52, 96, 31, 30, 81, 4, 92, 21, 33, 44, 63, 83, 56, 0, 12, 8, 93, 49, 41, 58, 89, 10, 28, 55, 46, 13, 64, 53, 32, 16, 90
, -1 // sentinel: main() stops copying at the first -1
};
int tosearch[] = {53, 5, 40, 71, 37, 14, 52, 28, 25, 11, 23, 13, 70, 81, 77, 10, 17, 26, 56, 15, 94, 42, 18, 39, 50, 78, 93, 19, 87, 43, 63, 67, 79, 4, 64, 6, 38, 45, 91, 86, 20, 30, 58, 68, 33, 12, 97, 95, 9, 89, 32, 72, 74, 1, 2, 34, 62, 57, 29, 21, 49, 69, 0, 31, 3, 27, 60, 59, 24, 41, 80, 7, 51, 8, 47, 54, 90, 36, 76, 22, 44, 84, 48, 73, 65, 96, 83, 66, 61, 16, 88, 92, 98, 85, 75, 82, 55, 35, 46
, -1 // sentinel: the search loop stops at the first negative value
};
bool binsearch(int i, std::vector<int>::const_iterator begin,
std::vector<int>::const_iterator end) {
return std::binary_search(begin, end, i);
}
bool linsearch(int i, std::vector<int>::const_iterator begin,
std::vector<int>::const_iterator end) {
return std::find(begin, end, i) != end;
}
int main(int argc, char *argv[])
{
int n = 6;
if (argc < 2) {
std::cerr << "need at least 1 arg (l or b!)" << std::endl;
return 1;
}
char algo = argv[1][0];
if (algo != 'b' && algo != 'l') {
std::cerr << "algo must be l or b, not '" << algo << "'" << std::endl;
return 1;
}
if (argc > 2) {
n = atoi(argv[2]);
}
std::vector<int> vv;
for (int i=0; i<n; ++i) {
if(data[i]==-1) break;
vv.push_back(data[i]);
}
if (algo=='b') {
std::sort(vv.begin(), vv.end());
}
bool (*search)(int i, std::vector<int>::const_iterator begin,
std::vector<int>::const_iterator end);
if (algo=='b') search = binsearch;
else search = linsearch;
int nf = 0;
int ns = 0;
for(int k=0; k<10000; ++k) {
for (int j=0; tosearch[j] >= 0; ++j) {
++ns;
if (search(tosearch[j], vv.begin(), vv.end()))
++nf;
}
}
std::cout << nf <<'/'<< ns << std::endl;
return 0;
}
and a couple of my timings on a Core Duo:
AmAir:stko aleax$ time ./a.out b 93
1910000/2030000
real 0m0.230s
user 0m0.224s
sys 0m0.005s
AmAir:stko aleax$ time ./a.out l 93
1910000/2030000
real 0m0.169s
user 0m0.164s
sys 0m0.005s
They're pretty repeatable, anyway...
OP says: Alex, I edited your program to just fill the array with 1..n, not run std::sort, and do about 10 million (mod integer division) searches. Binary search starts to pull away from linear search at n=150 on a Pentium 4. Sorry about the chart colors.
binary vs linear search http://spreadsheets.google.com/pub?key=tzWXX9Qmmu3_COpTYkTqsOA&oid=1&output=image
I don't think branch prediction should matter, because a linear search also has branches. And to my knowledge there are no SIMD instructions that can do a linear search for you.
Having said that, a useful model would be to assume that each step of the binary search has a multiplier cost C.
C log2 n = n
So to reason about this without actually benchmarking, you would make a guess for C, solve for n, and round up to the next integer. For example, if you guess C=3, the two costs cross just below n=10, so binary search would be faster from about n=10.
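The cost model above can also be explored numerically instead of solving the equation by hand. This Ruby sketch scans for the first array size at which C * log2(n) drops below n. The method name crossover and the scan limit are assumptions made for illustration, and the result is only as good as the guess for C.

```ruby
# Hypothetical helper: smallest n >= 2 where c * log2(n) is strictly below n.
# Starting at 2 skips the trivial case n = 1, where log2(n) is 0.
def crossover(c, limit = 1_000_000)
  (2..limit).find { |n| c * Math.log2(n) < n }
end

[2, 3, 5, 10].each do |c|
  puts "C=#{c}: binary search is modelled as cheaper from n=#{crossover(c)}"
end
```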
Not many - but hard to say exactly without benchmarking it.
Personally I'd tend to prefer the binary search, because in two years' time, when someone else has quadrupled the size of your little array, you haven't lost much performance. Unless I knew very specifically that it's a bottleneck right now and I needed it to be as fast as possible, of course.
Having said that, remember that there are hash tables too; you could ask a similar question about them vs. binary search.

Resources