Get only Hexadecimal values (bytes) from array in Ruby - ruby

I have the following array that represents decimal values of ASCII and non ASCII characters.
a=[32, 57, 50, 32, 56, 51, 32, 65, 52, 130, 0, 101, 131, 69, 72, 38, 146, 89, 9]
Converting to char looks like this
a.map{|b| b.chr}
=> [" ", "9", "2", " ", "8", "3", " ", "A", "4", "\x82", "\x00", "e", "\x83", "E", "H", "&", "\x92", "Y", "\t"]
and joining in order to create a string with bytes (pairs of hexadecimal numbers, [0-9A-F]) I do this:
a.map{|b| b.chr}.join
=> " 92 83 A4\x82\x00e\x83EH&\x92Y\t"
Then I want to remove everything from the first non-ASCII value (\x82) onward. I tried the following, but nothing happens.
a.map{|b| b.chr}.join.gsub(/\\x.*/,"")
=> " 92 83 A4\x82\x00e\x83EH&\x92Y\t"
My expected output is to have only the hexadecimal numbers below:
92 83 A4
How can I do this?
Thanks for any help.
UPDATE
Testing with a larger array like the one below, I see that the output is correct only for @rewritten's solution. The expected output for this new array is " 92 83 49 26 92 59 00"
a=[32, 57, 50, 32, 56, 51, 32, 52, 57, 32, 50, 54, 32, 57, 50, 32, 53, 57,
32, 48, 48, 0, 0, 0, 0, 2, 130, 0, 0, 8, 254, 70, 124, 0, 6, 0, 3, 0, 3,
27, 0,2, 27, 3, 0, 227, 7, 1, 14, 17, 33, 0, 28, 14, 47, 38, 146, 89, 9]
a.map(&:chr).join.match(/^( \h\h)+/)[0] # rewritten's solution
a.map(&:chr).take_while(&"\x80".method(:>)).join # Aleksei's solution
a.map(&:chr).take_while(&:ascii_only?).join # cremno's solution
irb(main): a.map(&:chr).join.match(/^( \h\h)+/)[0]
=> " 92 83 49 26 92 59 00"
irb(main): a.map(&:chr).take_while(&"\x80".method(:>)).join
=> " 92 83 49 26 92 59 00\x00\x00\x00\x00\x02"
irb(main): a.map(&:chr).take_while(&:ascii_only?).join
=> " 92 83 49 26 92 59 00\x00\x00\x00\x00\x02"
Thanks to all for the help.

Just filter it out before joining an array into a string:
[" ", "9", "2", " ", "8", "3", " ", "A", "4", "\x82", "\x00"].
take_while(&"\x80".method(:>))
#⇒ [" ", "9", "2", " ", "8", "3", " ", "A", "4"]
Then do whatever you want with the resulting array.

Given the comment, I assume that you really want to ask about matching pattern "space, hex, hex" up to the first non-match.
This would be like
a.map(&:chr).join.match(/^( \h\h)+/)[0]
It uses the special \h shorthand for regular expressions, which matches hex digits (0-9, A-F, a-f).
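As a minimal sketch of the "space, hex, hex" match, assuming the input is an array of byte values as in the question: Ruby's regex shorthand for a hex digit is \h. The helper name `leading_hex_pairs` is made up for this illustration, and `pack('C*')` is used here in place of `map(&:chr).join` to build the byte string.

```ruby
# Hypothetical helper: returns the leading run of " HH" groups, or "" if none.
def leading_hex_pairs(bytes)
  str = bytes.pack('C*')               # binary string, one byte per integer
  match = str.match(/\A(?: \h\h)+/)    # \h matches one hex digit (0-9, a-f, A-F)
  match ? match[0] : ''
end

a = [32, 57, 50, 32, 56, 51, 32, 65, 52, 130, 0, 101, 131, 69, 72, 38, 146, 89, 9]
puts leading_hex_pairs(a).inspect  # => " 92 83 A4"
```

Returning an empty string when nothing matches avoids the NoMethodError that calling `[0]` on a nil match would raise.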
Additional info:
Again based on my interpretation of the question, if the original array is long (or a stream) there is no need to consume it all. It is better to stop generating characters as soon as possible:
hexs = "0123456789ABCDEF".chars.map(&:ord)
a.
lazy.
each_slice(3).
take_while { |spc, h1, h2| spc == 32 && hexs.include?(h1) && hexs.include?(h2) }.
flat_map { |slice| slice.map(&:chr) }.
to_a.
join
This way the rest of your integer array is never even examined.

Related

Need a Ruby method to convert a binary array to ASCII alpha-numeric string

I have an array of [1, 0, 11, 0, 4, 0, 106, 211, 169, 1, 0, 12, 0, 8, 0, 1, 26, 25, 32, 189, 77, 216, 1, 0, 1, 0, 4, 0, 0, 0, 0, 12, 15].
I would love to create a string version mostly for logging purposes. My end result would be "01000B0004006AD3..."
I could not find a simple way to take each array byte value and pack a string with an ASCII presentation of the byte value.
My solution is cumbersome. I appreciate advice on making the solution slick.
array.each {|x|
value = (x>>4)&0x0f
if( value>9 ) then
result_string.concat (value-0x0a + 'A'.ord).chr
else
result_string.concat (value + '0'.ord).chr
end
value = (x)&0x0f
if( value>9 ) then
result_string.concat (value-0x0a + 'A'.ord).chr
else
result_string.concat (value + '0'.ord).chr
end
}
Your question isn't very clear, but I guess something like this is what you are looking for:
array.map {|n| n.to_s(16).rjust(2, '0').upcase }.join
#=> "01000B0004006AD3A901000C000800011A1920BD4DD80100010004000000000C0F"
or
array.map(&'%02X'.method(:%)).join
#=> "01000B0004006AD3A901000C000800011A1920BD4DD80100010004000000000C0F"
Which one of the two is more readable depends on how familiar your readers are with sprintf-style format strings, I guess.
It's actually pretty simple:
def hexpack(data)
data.pack('C*').unpack('H*')[0]
end
That packs your bytes using integer values (C) and unpacks the resulting string to hex (H). In practice:
hexpack([1, 0, 11, 0, 4, 0, 106, 211, 169, 1, 0, 12, 0, 8, 0, 1, 26, 25, 32, 189, 77, 216, 1, 0, 1, 0, 4, 0, 0, 0, 0, 12, 15])
# => "01000b0004006ad3a901000c000800011a1920bd4dd80100010004000000000c0f"
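If the hex string is only for logging, it can still be useful to turn a logged line back into bytes. A minimal sketch, assuming the lowercase output of hexpack above; `unhexpack` is a hypothetical helper name, not something Ruby provides:

```ruby
# Same hexpack as above: bytes -> lowercase hex string.
def hexpack(data)
  data.pack('C*').unpack('H*')[0]
end

# Hypothetical inverse: hex string -> array of byte values.
def unhexpack(hex)
  [hex].pack('H*').bytes
end

dat = [1, 0, 11, 0, 4, 0, 106, 211]
hex = hexpack(dat)
puts hex                        # => 01000b0004006ad3
puts unhexpack(hex).inspect     # => [1, 0, 11, 0, 4, 0, 106, 211]
```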
I might suggest you stick to hex or base64 instead of making your own formatting
dat = [1, 0, 11, 0, 4, 0, 106, 211, 169, 1, 0, 12, 0, 8, 0, 1, 26, 25, 32, 189, 77, 216, 1, 0, 1, 0, 4, 0, 0, 0, 0, 12, 15]
Hexadecimal
hex = dat.map { |x| sprintf('%02x', x) }.join
# => 01000b0004006ad3a901000c000800011a1920bd4dd80100010004000000000c0f
Base64
require 'base64'
base64 = Base64.encode64(dat.pack('c*'))
# => AQALAAQAatOpAQAMAAgAARoZIL1N2AEAAQAEAAAAAAwP\n
Proquints
What? Proquints are pronounceable unique identifiers which makes them great for reading/communicating binary data. In your case, maybe not the best because you're dealing with 30+ bytes here, but they're very suitable for smaller byte strings
# proquint.rb
# adapted to ruby from https://github.com/deoxxa/proquint
module Proquint
C = %w(b d f g h j k l m n p r s t v z)
V = %w(a i o u)
def self.encode (bytes)
bytes << 0 if bytes.size & 1 == 1
bytes.pack('c*').unpack('S*').reduce([]) do |acc, n|
c1 = n & 0x0f
v1 = (n >> 4) & 0x03
c2 = (n >> 6) & 0x0f
v2 = (n >> 10) & 0x03
c3 = (n >> 12) & 0x0f
acc << C[c1] + V[v1] + C[c2] + V[v2] + C[c3]
end.join('-')
end
def self.decode(str)
# learner's exercise
# or see some proquint library (eg) https://github.com/deoxxa/proquint
end
end
Proquint.encode dat
# => dabab-rabab-habab-potat-nokab-babub-babob-bahab-pihod-bohur-tadot-dabab-dabab-habab-babab-babub-zabab
Of course the entire process is reversible too. You might not need it, so I'll leave it as an exercise for the learner
It's particularly nice for things like IP address, or any other short binary blobs. You gain familiarity too as you see common byte strings in their proquint form
Proquint.encode [192, 168, 11, 51] # bagop-rasag
Proquint.encode [192, 168, 11, 52] # bagop-rabig
Proquint.encode [192, 168, 11, 66] # bagop-ramah
Proquint.encode [192, 168, 22, 19] # bagop-kisad
Proquint.encode [192, 168, 22, 20] # bagop-kibid
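The decoding left as an exercise could be sketched roughly as follows. This assumes the same bit layout as the encode method above and a little-endian 16-bit word order (which is what the sample outputs correspond to); the module name `ProquintSketch` is made up for this illustration. Note that it cannot tell whether a trailing zero byte was padding added by encode.

```ruby
# Hypothetical decoder mirroring the bit layout of the encode method above.
module ProquintSketch
  C = %w(b d f g h j k l m n p r s t v z)
  V = %w(a i o u)

  def self.decode(str)
    words = str.split('-').map do |word|
      c1, v1, c2, v2, c3 = word.chars
      # Reassemble the 16-bit value: 4 + 2 + 4 + 2 + 4 bits, lowest bits first.
      C.index(c1) |
        (V.index(v1) << 4) |
        (C.index(c2) << 6) |
        (V.index(v2) << 10) |
        (C.index(c3) << 12)
    end
    words.pack('v*').bytes   # 'v' = 16-bit unsigned, little-endian
  end
end

puts ProquintSketch.decode('bagop-rasag').inspect  # => [192, 168, 11, 51]
```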

Ruby 100 doors returning 100 nil

I'm solving the '100 doors' problem from Rosetta Code in Ruby. Briefly,
there are 100 doors, all closed, designated 1 to 100
100 passes are made, designated 1 to 100
on the ith pass, every ith door is "toggled": opened if it's closed, closed if it's open
determine the state of each door after 100 passes have been completed.
Therefore, on the first pass all doors are opened. On the second pass even numbered doors are closed. On the third pass doors i for which i%3 == 0 are toggled, and so on.
Here is my attempt at solving the problem.
visit_number = 0
door_array = []
door_array = 100.times.map {"closed"}
until visit_number == 100 do
door_array = door_array.map.with_index { |door_status, door_index|
if (door_index + 1) % (visit_number + 1) == 0
if door_status == "closed"
door_status = "open"
elsif door_status == "open"
door_status = "closed"
end
end
}
visit_number += 1
end
print door_array
But it keeps printing an array of 100 nil values when I run it.
What am I doing wrong?
That's what your if clauses return. Just add a return value explicitly.
until visit_number == 100 do
door_array = door_array.map.with_index { |door_status, door_index|
if (door_index + 1) % (visit_number + 1) == 0
if door_status == "closed"
door_status = "open"
elsif door_status == "open"
door_status = "closed"
end
end
door_status
}
visit_number += 1
end
OR:
1.upto(10) {|i| door_array[i*i-1] = 'open'}
The problem is that the outer if block doesn't explicitly return anything (and thus implicitly returns nil) when the condition is not met.
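A tiny illustration of that behaviour, independent of the doors code (the variable names are only for this example): an if without an else evaluates to nil when its condition is false, and map collects whatever the block evaluates to.

```ruby
# The block's value is the value of the if expression: nil when n is odd.
without_else = [1, 2, 3].map { |n| if n.even? then "even" end }
puts without_else.inspect   # => [nil, "even", nil]

# Giving the block a value on every path avoids the nils.
with_else = [1, 2, 3].map { |n| n.even? ? "even" : "odd" }
puts with_else.inspect      # => ["odd", "even", "odd"]
```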
A quick fix:
visit_number = 0
door_array = []
door_array = 100.times.map {"closed"}
until visit_number == 100 do
door_array = door_array.map.with_index { |door_status, door_index|
if (door_index + 1) % (visit_number + 1) == 0
if door_status == "closed"
door_status = "open"
elsif door_status == "open"
door_status = "closed"
end
else #<------------- Here
door_status #<------------- And here
end
}
visit_number += 1
end
print door_array
Consider these approaches:
door_array.map { |door|
case door
when "open"
"closed"
when "closed"
"open"
end
}
or
rule = { "open" => "closed", "closed" => "open" }
door_array.map { |door| rule[door] }
or
door_array.map { |door| door == 'open' ? 'closed' : 'open' }
Code
require 'prime'
def even_nbr_divisors?(n)
return false if n==1
arr = Prime.prime_division(n).map { |v,exp| (0..exp).map { |i| v**i } }
arr.shift.product(*arr).map { |a| a.reduce(:*) }.size.even?
end
closed, open = (1..100).partition { |n| even_nbr_divisors?(n) }
closed #=> [ 2, 3, 5, 6, 7, 8, 10, 11, 12, 13, 14, 15, 17, 18, 19, 20, 21, 22,
# 23, 24, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 37, 38, 39, 40,
# 41, 42, 43, 44, 45, 46, 47, 48, 50, 51, 52, 53, 54, 55, 56, 57,
# 58, 59, 60, 61, 62, 63, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74,
# 75, 76, 77, 78, 79, 80, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91,
# 92, 93, 94, 95, 96, 97, 98, 99],
open #=> [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
Explanation
All doors are initially closed. Consider the 24th door. It is toggled during the following passes:
pass 1: opened
pass 2: closed
pass 3: opened
pass 4: closed
pass 6: opened
pass 8: closed
pass 12: opened
pass 24: closed
Notice that the door is toggled once for each of 24's divisors. Therefore, if we had a method divisors(n) that returned an array of n's divisors, we could determine the number of toggles as follows:
nbr_toggles = divisors(24).size
#=> [1,2,3,4,6,8,12,24].size
#=> 8
Since the door is toggled an even number of times, we conclude that it will be in its original state (closed) after all the dust has settled. Similarly, for n = 9,
divisors(9).size
#=> [1,3,9].size
#=> 3
We therefore conclude door #9 will be open at the end, since 3 is odd.
divisors can be defined as follows.
def divisors(n)
arr = Prime.prime_division(n).map { |v,exp| (0..exp).map { |i| v**i } }
arr.first.product(*arr[1..-1]).map { |a| a.reduce(:*) }
end
For example,
divisors 24
#=> [1, 3, 2, 6, 4, 12, 8, 24]
divisors 9
#=> [1, 3, 9]
divisors 1800
#=> [1, 5, 25, 3, 15, 75, 9, 45, 225, 2, 10, 50, 6, 30, 150, 18, 90, 450,
# 4, 20, 100, 12, 60, 300, 36, 180, 900, 8, 40, 200, 24, 120, 600, 72,
# 360, 1800]
Since we only care if there are an odd or even number of divisors, we can instead write
def even_nbr_divisors?(n)
return false if n==1
arr = Prime.prime_division(n).map { |v,exp| (0..exp).map { |i| v**i } }
arr.shift.product(*arr).map { |a| a.reduce(:*) }.size.even?
end
For n = 24, the steps are as follows:
n = 24
a = Prime.prime_division(n)
#=> [[2, 3], [3, 1]]
arr = a.map { |v,exp| (0..exp).map { |i| v**i } }
#=> [[1, 2, 4, 8], [1, 3]]
b = arr.shift
#=> [1, 2, 4, 8]
arr
#=> [[1, 3]]
c = b.product(*arr)
#=> [[1, 1], [1, 3], [2, 1], [2, 3], [4, 1], [4, 3], [8, 1], [8, 3]]
d = c.map { |a| a.reduce(:*) }
#=> [1, 3, 2, 6, 4, 12, 8, 24]
e = d.size
#=> 8
e.even?
#=> true
Lastly,
(1..100).partition { |n| even_nbr_divisors?(n) }
returns the result shown above.
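As a cross-check of the divisor argument, here is a minimal sketch that simply simulates the 100 passes with booleans and compares the open doors with the perfect squares (a number has an odd number of divisors exactly when it is a perfect square). The variable names are only for this illustration.

```ruby
doors = Array.new(100, false)          # false = closed, true = open

(1..100).each do |pass|
  # On pass i, toggle doors i, 2i, 3i, ...
  (pass..100).step(pass) { |door| doors[door - 1] = !doors[door - 1] }
end

open_doors = doors.each_index.select { |i| doors[i] }.map { |i| i + 1 }
squares    = (1..10).map { |k| k * k }

puts open_doors.inspect       # => [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
puts open_doors == squares    # => true
```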

sql server 2008 checksum algorithm? [duplicate]

We perform checksums of some data in sql server as follows:
declare @cs int;
select
@cs = CHECKSUM_AGG(CHECKSUM(someid, position))
from
SomeTable
where
userid = @userId
group by
userid;
This data is then shared with clients. We'd like to be able to repeat the checksum at the client end... however there doesn't seem to be any info about how the checksums in the functions above are calculated. Can anyone enlighten me?
On SQL Server Forum, at this page, it's stated:
The built-in CHECKSUM function in SQL Server is built on a series of 4 bit left rotational xor operations. See this post for more explanation.
The CHECKSUM function doesn't provide a very good quality checksum and IMO is pretty useless for most purposes. As far as I know the algorithm isn't published. If you want a check that you can reproduce yourself then use the HashBytes function and one of the standard, published algorithms such as MD5 or SHA.
// Quick hash sum with mirrored SQL and C# implementations
private Int64 HASH_ZKCRC64(byte[] Data)
{
Int64 Result = 0x5555555555555555;
if (Data == null || Data.Length <= 0) return 0;
int SizeGlobalBufer = 8000;
int Ost = Data.Length % SizeGlobalBufer;
int LeftLimit = (Data.Length / SizeGlobalBufer) * SizeGlobalBufer;
for (int i = 0; i < LeftLimit; i += 64)
{
Result = Result
^ BitConverter.ToInt64(Data, i)
^ BitConverter.ToInt64(Data, i + 8)
^ BitConverter.ToInt64(Data, i + 16)
^ BitConverter.ToInt64(Data, i + 24)
^ BitConverter.ToInt64(Data, i + 32)
^ BitConverter.ToInt64(Data, i + 40)
^ BitConverter.ToInt64(Data, i + 48)
^ BitConverter.ToInt64(Data, i + 56);
if ((Result & 0x0000000000000080) != 0)
Result = Result ^ BitConverter.ToInt64(Data, i + 28);
}
if (Ost > 0)
{
byte[] Bufer = new byte[SizeGlobalBufer];
Array.Copy(Data, LeftLimit, Bufer, 0, Ost);
for (int i = 0; i < SizeGlobalBufer; i += 64)
{
Result = Result
^ BitConverter.ToInt64(Bufer, i)
^ BitConverter.ToInt64(Bufer, i + 8)
^ BitConverter.ToInt64(Bufer, i + 16)
^ BitConverter.ToInt64(Bufer, i + 24)
^ BitConverter.ToInt64(Bufer, i + 32)
^ BitConverter.ToInt64(Bufer, i + 40)
^ BitConverter.ToInt64(Bufer, i + 48)
^ BitConverter.ToInt64(Bufer, i + 56);
if ((Result & 0x0000000000000080)!=0)
Result = Result ^ BitConverter.ToInt64(Bufer, i + 28);
}
}
byte[] MiniBufer = BitConverter.GetBytes(Result);
Array.Reverse(MiniBufer);
return BitConverter.ToInt64(MiniBufer, 0);
#region SQL_FUNCTION
/* CREATE FUNCTION [dbo].[HASH_ZKCRC64] (@data as varbinary(MAX)) Returns bigint
AS
BEGIN
Declare @I64 as bigint Set @I64=0x5555555555555555
Declare @Bufer as binary(8000)
Declare @i as int Set @i=1
Declare @j as int
Declare @Len as int Set @Len=Len(@data)
if ((@data is null) Or (@Len<=0)) Return 0
While @i<=@Len
Begin
Set @Bufer=Substring(@data,@i,8000)
Set @j=1
While @j<=8000
Begin
Set @I64=@I64
^ CAST(Substring(@Bufer,@j, 8) as bigint)
^ CAST(Substring(@Bufer,@j+8, 8) as bigint)
^ CAST(Substring(@Bufer,@j+16,8) as bigint)
^ CAST(Substring(@Bufer,@j+24,8) as bigint)
^ CAST(Substring(@Bufer,@j+32,8) as bigint)
^ CAST(Substring(@Bufer,@j+40,8) as bigint)
^ CAST(Substring(@Bufer,@j+48,8) as bigint)
^ CAST(Substring(@Bufer,@j+56,8) as bigint)
if @I64<0 Set @I64=@I64 ^ CAST(Substring(@Bufer,@j+28,8) as bigint)
Set @j=@j+64
End;
Set @i=@i+8000
End
Return @I64
END
*/
#endregion
}
I figured out the CHECKSUM algorithm, at least for ASCII characters. I created a proof of it in JavaScript (see https://stackoverflow.com/a/59014293/9642).
In a nutshell: rotate 4 bits left and xor by a code for each character. The trick was figuring out the "XOR codes". Here's the table of those:
var xorcodes = [
0, 1, 2, 3, 4, 5, 6, 7,
8, 9, 10, 11, 12, 13, 14, 15,
16, 17, 18, 19, 20, 21, 22, 23,
24, 25, 26, 27, 28, 29, 30, 31,
0, 33, 34, 35, 36, 37, 38, 39, // !"#$%&'
40, 41, 42, 43, 44, 45, 46, 47, // ()*+,-./
132, 133, 134, 135, 136, 137, 138, 139, // 01234567
140, 141, 48, 49, 50, 51, 52, 53, 54, // 89:;<=>?@
142, 143, 144, 145, 146, 147, 148, 149, // ABCDEFGH
150, 151, 152, 153, 154, 155, 156, 157, // IJKLMNOP
158, 159, 160, 161, 162, 163, 164, 165, // QRSTUVWX
166, 167, 55, 56, 57, 58, 59, 60, // YZ[\]^_`
142, 143, 144, 145, 146, 147, 148, 149, // abcdefgh
150, 151, 152, 153, 154, 155, 156, 157, // ijklmnop
158, 159, 160, 161, 162, 163, 164, 165, // qrstuvwx
166, 167, 61, 62, 63, 64, 65, 66, // yz{|}~
];
The main thing to note is the bias towards alphanumerics (their codes are similar and ascending). English letters use the same code regardless of case.
I haven't tested high codes (128+) nor Unicode.
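The "rotate 4 bits left and xor by a code" step described above could be sketched like this. It is only an illustration of the rotate-and-xor scheme as the answer describes it; it has not been checked against SQL Server here, and the names `rotl4`, `rotate_xor_checksum` and `SAMPLE_CODES` are made up. The sample codes for "a" and "b" are taken from the table above.

```ruby
# Rotate a 32-bit value left by 4 bits.
def rotl4(x)
  ((x << 4) | (x >> 28)) & 0xFFFF_FFFF
end

# Fold each byte's xor code into a 32-bit accumulator, then
# reinterpret the result as a signed 32-bit integer.
def rotate_xor_checksum(text, xor_codes)
  acc = text.each_byte.reduce(0) do |sum, byte|
    rotl4(sum) ^ xor_codes.fetch(byte)
  end
  acc >= 0x8000_0000 ? acc - 0x1_0000_0000 : acc
end

# Assumption: only the two entries needed for this example.
SAMPLE_CODES = { 97 => 142, 98 => 143 }.freeze

puts rotate_xor_checksum('ab', SAMPLE_CODES)   # => 2159
```

Here 142 rotated left by 4 bits is 2272, and 2272 xor 143 is 2159.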

Convert string to hexadecimal in Ruby

I'm trying to convert a Binary file to Hexadecimal using Ruby.
At the moment I have the following:
File.open(out_name, 'w') do |f|
f.puts "const unsigned int modFileSize = #{data.length};"
f.puts "const char modFile[] = {"
first_line = true
data.bytes.each_slice(15) do |a|
line = a.map { |b| ",#{b}" }.join
if first_line
f.puts line[1..-1]
else
f.puts line
end
first_line = false
end
f.puts "};"
end
This is what the code above generates:
const unsigned int modFileSize = 82946;
const char modFile[] = {
116, 114, 97, 98, 97, 108, 97, 115, 104, 0, 0, 0, 0, 0, 0
, 0, 0, 0, 0, 0, 62, 62, 62, 110, 117, 107, 101, 32, 111, 102
, 32, 97, 110, 97, 114, 99, 104, 121, 60, 60, 60, 8, 8, 130, 0
};
What I need is the following:
const unsigned int modFileSize = 82946;
const char modFile[] = {
0x74, 0x72, etc, etc
};
So I need to be able to convert a string to its hexadecimal value.
"116" => "0x74", etc
Thanks in advance.
Ruby 1.9 added an even easier way to do this:
"0x101".hex will return the number given in hexadecimal in the string.
Change this line:
line = a.map { |b| ", #{b}" }.join
to this:
line = a.map { |b| sprintf(", 0x%02X",b) }.join
(Change to %02x if necessary, it's unclear from the example whether the hex digits should be capitalized.)
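Putting that together, a minimal sketch of building the array body as one string, which also avoids the special handling of the first line. The helper name `c_array_lines` and its `per_line` parameter are hypothetical; `data` is assumed to be the file contents as a string, as in the question.

```ruby
# Hypothetical helper: formats the bytes of a string as C hex literals,
# per_line values on each line, lines separated by ",\n".
def c_array_lines(data, per_line: 15)
  data.bytes
      .each_slice(per_line)
      .map { |slice| slice.map { |b| format('0x%02X', b) }.join(', ') }
      .join(",\n")
end

puts c_array_lines("tra", per_line: 2)
# 0x74, 0x72,
# 0x61
```

Inside the File.open block, the whole body could then be written with a single f.puts c_array_lines(data).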
I don't know if this is the best solution, but this is a solution:
class String
def to_hex
"0x" + self.to_i.to_s(16)
end
end
"116".to_hex
=> "0x74"
Binary to hex conversion in four languages (including Ruby) might be helpful.
One of the comments on that page seems to provide a very easy shortcut. The example covers reading input from STDIN, but any string representation should do:
STDIN.read.to_i(base=16).to_s(base=2)
For another approach, check out unpack

At which n does binary search become faster than linear search on a modern CPU?

Due to the wonders of branch prediction, a binary search can be slower than a linear search through an array of integers. On a typical desktop processor, how big does that array have to get before it would be better to use a binary search? Assume the structure will be used for many lookups.
I've tried a little C++ benchmarking and I'm surprised - linear search seems to prevail up to several dozen items, and I haven't found a case where binary search is better for those sizes. Maybe gcc's STL is not well tuned? But then -- what would you use to implement either kind of search?-) So here's my code, so everybody can see if I've done something silly that would distort timing grossly...:
#include <vector>
#include <algorithm>
#include <iostream>
#include <stdlib.h>
int data[] = {98, 50, 54, 43, 39, 91, 17, 85, 42, 84, 23, 7, 70, 72, 74, 65, 66, 47, 20, 27, 61, 62, 22, 75, 24, 6, 2, 68, 45, 77, 82, 29, 59, 97, 95, 94, 40, 80, 86, 9, 78, 69, 15, 51, 14, 36, 76, 18, 48, 73, 79, 25, 11, 38, 71, 1, 57, 3, 26, 37, 19, 67, 35, 87, 60, 34, 5, 88, 52, 96, 31, 30, 81, 4, 92, 21, 33, 44, 63, 83, 56, 0, 12, 8, 93, 49, 41, 58, 89, 10, 28, 55, 46, 13, 64, 53, 32, 16, 90
, -1 // sentinel: the fill loop in main() stops at the first -1
};
int tosearch[] = {53, 5, 40, 71, 37, 14, 52, 28, 25, 11, 23, 13, 70, 81, 77, 10, 17, 26, 56, 15, 94, 42, 18, 39, 50, 78, 93, 19, 87, 43, 63, 67, 79, 4, 64, 6, 38, 45, 91, 86, 20, 30, 58, 68, 33, 12, 97, 95, 9, 89, 32, 72, 74, 1, 2, 34, 62, 57, 29, 21, 49, 69, 0, 31, 3, 27, 60, 59, 24, 41, 80, 7, 51, 8, 47, 54, 90, 36, 76, 22, 44, 84, 48, 73, 65, 96, 83, 66, 61, 16, 88, 92, 98, 85, 75, 82, 55, 35, 46
, -1 // sentinel: the search loop in main() runs while tosearch[j] >= 0
};
bool binsearch(int i, std::vector<int>::const_iterator begin,
std::vector<int>::const_iterator end) {
return std::binary_search(begin, end, i);
}
bool linsearch(int i, std::vector<int>::const_iterator begin,
std::vector<int>::const_iterator end) {
return std::find(begin, end, i) != end;
}
int main(int argc, char *argv[])
{
int n = 6;
if (argc < 2) {
std::cerr << "need at least 1 arg (l or b!)" << std::endl;
return 1;
}
char algo = argv[1][0];
if (algo != 'b' && algo != 'l') {
std::cerr << "algo must be l or b, not '" << algo << "'" << std::endl;
return 1;
}
if (argc > 2) {
n = atoi(argv[2]);
}
std::vector<int> vv;
for (int i=0; i<n; ++i) {
if(data[i]==-1) break;
vv.push_back(data[i]);
}
if (algo=='b') {
std::sort(vv.begin(), vv.end());
}
bool (*search)(int i, std::vector<int>::const_iterator begin,
std::vector<int>::const_iterator end);
if (algo=='b') search = binsearch;
else search = linsearch;
int nf = 0;
int ns = 0;
for(int k=0; k<10000; ++k) {
for (int j=0; tosearch[j] >= 0; ++j) {
++ns;
if (search(tosearch[j], vv.begin(), vv.end()))
++nf;
}
}
std::cout << nf <<'/'<< ns << std::endl;
return 0;
}
and a couple of my timings on a Core Duo:
AmAir:stko aleax$ time ./a.out b 93
1910000/2030000
real 0m0.230s
user 0m0.224s
sys 0m0.005s
AmAir:stko aleax$ time ./a.out l 93
1910000/2030000
real 0m0.169s
user 0m0.164s
sys 0m0.005s
They're pretty repeatable, anyway...
OP says: Alex, I edited your program to just fill the array with 1..n, not run std::sort, and do about 10 million (mod integer division) searches. Binary search starts to pull away from linear search at n=150 on a Pentium 4. Sorry about the chart colors.
binary vs linear search http://spreadsheets.google.com/pub?key=tzWXX9Qmmu3_COpTYkTqsOA&oid=1&output=image
I don't think branch prediction should matter, because a linear search also has branches. And to my knowledge there are no SIMD instructions that can do a linear search for you.
Having said that, a useful model would be to assume that each step of the binary search has a multiplier cost C.
C log2 n = n
So to reason about this without actually benchmarking, you would guess a value for C and round n up to the next integer. For example, if you guess C=3, binary search becomes faster at about n=10, since 3 log2 10 is roughly 9.97, which is less than 10.
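The model above can be evaluated numerically. A minimal sketch, assuming the cost of binary search is C * log2(n) and the cost of linear search is n, and looking for the smallest n >= 2 where the binary cost is strictly lower; `break_even_n` and its `limit` parameter are hypothetical names:

```ruby
# Smallest n (from 2 up to limit) where c * log2(n) < n, or nil if none.
def break_even_n(c, limit: 1_000_000)
  (2..limit).find { |n| c * Math.log2(n) < n }
end

puts break_even_n(3)   # => 10
puts break_even_n(2)   # => 5
```

The result is only as good as the guess for C, so this is a way to reason about the order of magnitude, not a substitute for benchmarking.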
Not many - but hard to say exactly without benchmarking it.
Personally I'd tend to prefer the binary search, because in two years time, when someone else has quadrupled the size of your little array, you haven't lost much performance. Unless I knew very specifically that it's a bottleneck right now and I needed it to be as fast as possible, of course.
Having said that, remember that there are hash tables too; you could ask a similar question about them vs. binary search.
