How do I clear every other bit of each byte in a Ruby string and convert the result to a byte array? I understand that I need to AND every byte with the value 0x01010101, but I am having trouble converting the string to binary correctly. Ideally it should be fast and allocate as little as possible.
Later I will need to pass this value to Digest::MD5.hexdigest.
Firstly, note that 0x is for base 16, 0b is for base 2:
0b11111111.to_s(2) #=> "11111111"
0x11111111.to_s(2) #=> "10001000100010001000100010001"
As you are converting bits within bytes you want to use 0b... for your mask.
Next,
0b01010101.to_s(2) #=> "1010101"
showing that, as with all integers, leading zeroes are dropped, meaning you can include them or not. Consider,
0b11111111 & 0 #=> 0
So, as a mask, 0 behaves as if it were padded with leading zero bits (here, 0b00000000). We see that
(0b11111111 &
0b1010101).to_s(2) #=> "1010101"
So, we can define your bitwise mask as
MASK = 0b1010101
We can now use String#unpack with format string "C*" to convert the string to an array of 8-bit unsigned integers, which we then bitwise and with MASK (using &):
str = "Let's party, now!"
str.unpack("C*").map { |u| u & MASK }
#=> [68, 69, 84, 5, 81, 0, 80, 65, 80, 84, 81, 4, 0, 68, 69, 85, 1]
The "C" in "C*" is the format directive for an 8-bit unsigned integer and applies to the first byte; "*" repeats "C" for all remaining bytes.
See also Integer#&.
I see from @DavidKling's answer that one could alternatively write
str.bytes.map { |u| u & MASK }
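Since the question mentions passing the result to Digest::MD5.hexdigest, here is a minimal sketch of the whole pipeline. The helper name masked_bytes is made up for illustration, and it assumes the digest should be computed over the masked bytes repacked into a binary string:

```ruby
require "digest"

# Hypothetical helper: AND every byte of str with the mask.
# 0b01010101 keeps bits 0, 2, 4 and 6 and clears the rest.
def masked_bytes(str, mask = 0b01010101)
  str.bytes.map { |b| b & mask }
end

str    = "Let's party, now!"
masked = masked_bytes(str)
packed = masked.pack("C*")              # array of integers -> binary String
digest = Digest::MD5.hexdigest(packed)  # 32 hex characters

p masked
puts digest
```

Array#pack with "C*" is the inverse of the unpack("C*") call above, so the digest is taken over exactly the masked bytes.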
You can use String#bytes to get an array of the string's bytes as integers (for ASCII characters these are the same as the character codes).
'Roman'.bytes # [82, 111, 109, 97, 110]
I am trying to convert the following byte array to hexadecimal;
[1, 1, 65, -50, 6, 104, -91, -70, -100, 119, -100, 123, 52, -109, -33, 45, -14, 86, -105, -97, -115, 16]
The result should be;
010141CE0668A5BA9C779C7B3493DF2DF256979F8D10
Here is my current attempt;
item.getProperties["Mapi-Conversation-Index"].to_a.map {|s| s.to_s(16)}.join()
But my output is: 010141-320668-5b-46-6477-647b34-6d-212d-e56-69-61-7310
arr = [1, 1, 65, -50, 6, 104, -91, -70, -100, 119, -100, 123, 52, -109, -33, 45, -14, 86]
arr.pack("c*").unpack("H*").first
#=> "010141ce0668a5ba9c779c7b3493df2df256"
See Array#pack and String#unpack.
The directive "c" for pack specifies an 8-bit signed integer. The directive "H" for unpack specifies "hex string (high nibble first)". The asterisk at the end of each directive means that "c" applies to all elements of arr and "H" applies to all bytes of the string produced by pack.
Note that
arr.pack("c*")
#=> "\x01\x01A\xCE\x06h\xA5\xBA\x9Cw\x9C{4\x93\xDF-\xF2V"
and
arr.pack("c*").unpack("H*")
#=> ["010141ce0668a5ba9c779c7b3493df2df256"]
which is why first is needed to extract the string.
This works:
[1, 1, 65, -50].map { |n| '%02X' % (n & 0xFF) }.join
The %02X format specifier makes a 2-character-wide hex number, padded with 0 digits. The & 0xFF is necessary to convert your negative numbers into the standard 0 through 255 range that people usually use when talking about byte values.
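As a quick check, both approaches can be run on the full array from the question. This sketch assumes Ruby 2.4 or later for String#unpack1 (on older versions, unpack("H*").first does the same job) and adds upcase to match the uppercase output requested:

```ruby
signed = [1, 1, 65, -50, 6, 104, -91, -70, -100, 119, -100, 123,
          52, -109, -33, 45, -14, 86, -105, -97, -115, 16]

# pack as signed 8-bit integers, then read the bytes back as one hex string
via_pack = signed.pack("c*").unpack1("H*").upcase

# mask each value into 0..255 and format it as two uppercase hex digits
via_format = signed.map { |n| format("%02X", n & 0xFF) }.join

puts via_pack    # 010141CE0668A5BA9C779C7B3493DF2DF256979F8D10
puts via_format  # 010141CE0668A5BA9C779C7B3493DF2DF256979F8D10
```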
In Ruby I have a string like this:
myString = "mystring"
I want to convert the string to a byte array taking only the first 16 bytes and pad with 0's if shorter.
I can do this the brute force way. But...
Care to share a 'cool' way?
Something like this? You should probably check for edge cases like multibyte chars.
"my string"[0..15].ljust(16,'0')
You can get the string as a byte array by calling bytes on it, and then take the first 16 elements. Finally, pad the array with Array#fill, passing the fill value first and the range of indices to fill as the second argument:
def padded_byte_array(string, length = 16)
  bytes = string.bytes.take(length)
  bytes.fill(0, bytes.length...length)
end
and then you can call it:
padded_byte_array('my string')
# => [109, 121, 32, 115, 116, 114, 105, 110, 103, 0, 0, 0, 0, 0, 0, 0]
padded_byte_array('some super long string longer than 16 bytes')
# => [115, 111, 109, 101, 32, 115, 117, 112, 101, 114, 32, 108, 111, 110, 103, 32]
padded_byte_array('本当に長いマルチバイト文字列')
# => [230, 156, 172, 229, 189, 147, 227, 129, 171, 233, 149, 183, 227, 129, 132, 227]
I assume that if the requested array size is smaller than the number of bytes in str, all of str.bytes is returned. If the bytes should instead be truncated to the requested size, that requires an obvious adjustment.
def padded_bytes(str, arr_size)
  str_bytes = str.bytes
  # compare against the byte count, not str.size (the character count),
  # so that multibyte strings are not cut short
  Array.new([arr_size, str_bytes.size].max) { |i| str_bytes.fetch(i, 0) }
end
padded_bytes("tiger", 8)
#=> [116, 105, 103, 101, 114, 0, 0, 0]
padded_bytes("tiger", 3)
#=> [116, 105, 103, 101, 114]
Thanks folks for your answers.
In the end, I implemented
ba = name[0..15].ljust(16, "\0").bytes.to_a
Aza gave the closest to what I asked.
My original looked like this:
ba = name[0..15].bytes.to_a
while ba.length < 16 do ba.push(0) end
Until I got your answers. Thanks again!
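For completeness, a byte-oriented alternative: name[0..15] slices by character, so a multibyte string can yield more than 16 bytes. The sketch below uses the "a" directive of Array#pack, which truncates or null-pads a string to a fixed number of bytes. The helper name first_bytes_padded is made up for illustration:

```ruby
# Hypothetical helper: first `length` bytes of str, padded with zero bytes.
# pack("a16") counts bytes, not characters, and pads with "\0".
def first_bytes_padded(str, length = 16)
  [str].pack("a#{length}").bytes
end

p first_bytes_padded("mystring")
# => [109, 121, 115, 116, 114, 105, 110, 103, 0, 0, 0, 0, 0, 0, 0, 0]
p first_bytes_padded("some super long string longer than 16 bytes").size
# => 16
```

Note that truncating by bytes can cut a multibyte character in half, which is usually fine for a raw byte array but not if the bytes are later decoded back into text.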
In one of my log files I have the hidden characters shown below, along with ANSI color codes:
...WAITING^H^H^H^H^H^H^H^H^H^H
I was able to remove the color codes using the method below,
gsub(/\e\[(\d+)(;(\d+))?m/, '')
but I am still unable to remove the hidden characters mentioned above. Is there any way to get rid of them?
Theory
Backspaces ?
If the ctrl-H characters are backspaces :
puts "foo\b\b\bbar"
#=> "bar"
puts "foo\b\b\bbar".delete("\b")
#=> "foobar"
NOTE: delete is fine here, because we use it with just one character.
Or "^H" substring ?
If the ctrl-H characters are "^H" :
puts "foo^H^H^Hbar".gsub(/\^H/,'')
#=> "foobar"
NOTE: delete wouldn't work here, because it treats its argument as a set of characters, not as a substring, and a leading ^ negates the set. So delete("^H") means "delete every character that isn't an 'H'":
"foo^H^H^Hbar".delete("^H") #=> "HHH"
Test
With :
bytes = [46, 46, 46, 87, 65, 73, 84, 73, 78, 71, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 91, 32, 32, 32, 79, 75, 32, 32, 32, 93, 10, 27, 91, 63, 49, 50, 108, 27, 91, 63, 50, 53, 104, 68, 111, 110, 101, 33, 10, 10]
We get :
string = bytes.map(&:chr).join
string # => "...WAITING\b\b\b\b\b\b\b\b\b\b[   OK   ]\n\e[?12l\e[?25hDone!\n\n"
puts string
# [   OK   ]
# Done!
#
Bytes equal to 8 are backspaces, and they delete WAITING when displayed with puts.
The first alternative should work fine :
puts string.delete("\b")
# ...WAITING[   OK   ]
# Done!
NOTE: This only works on the original data, in which backspaces are byte 8. Any copy-paste, use of cat, | or text editor might convert those to "^H" or other string.
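The test string also contains the sequences \e[?12l and \e[?25h, which the color-only regex from the question does not match. A rough sketch that strips those as well as the backspaces, assuming a simplified pattern for escape sequences of the form ESC [ parameters letter (it is not a complete parser for terminal control codes):

```ruby
# Simplified pattern (an assumption, not a full grammar):
# ESC, "[", optional digits / ";" / "?", then one final letter.
csi = /\e\[[0-9;?]*[A-Za-z]/

raw = "...WAITING" + ("\b" * 10) + "[   OK   ]\n\e[?12l\e[?25hDone!\n\n"

clean = raw.gsub(csi, "").delete("\b")

puts clean
# ...WAITING[   OK   ]
# Done!
```

The same pattern also covers plain color codes such as \e[1;32m, so it can replace the original gsub call rather than follow it.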
I was under the impression that set() would order a collection much like .sort().
However, it seems that it doesn't. What puzzles me is why it reorders the collection at all.
>>> h = '321'
>>> set(h)
set(['1', '3', '2'])
>>> h
'321'
>>> h = '22311'
>>> set(h)
set(['1', '3', '2'])
Why doesn't it return set(['1', '2', '3'])? It also seems that no matter how many instances of each number I use, or in what order I use them, it always returns set(['1', '3', '2']). Why?
Edit:
So I have read your answers and my counter to that is this.
>>> l = [1,2,3,3]
>>> set(l)
set([1, 2, 3])
>>> l = [3,3,2,3,1,1,3,2,3]
>>> set(l)
set([1, 2, 3])
Why does it order numbers and not strings?
Also
import random
l = []
for itr in xrange(101):
l.append(random.randint(1,101))
print set(l)
Outputs
>>>
set([1, 2, 4, 5, 6, 8, 10, 11, 12, 14, 15, 16, 18, 19, 23, 24, 25, 26, 29, 30, 31, 32, 34, 40, 43, 45, 46, 47, 48, 49, 50, 51, 53, 54, 55, 57, 58, 59, 60, 61, 62, 63, 64, 66, 67, 69, 70, 74, 75, 77, 79, 80, 83, 84, 85, 87, 88, 89, 90, 93, 94, 96, 97, 99, 101])
A Python set is unordered, so there is no guarantee that the elements will come out in the order in which you supplied them, or in sorted order.
If you want a sorted output, then call sorted:
sorted(set(h))
Responding to your edit: it comes down to the implementation of set. In CPython, it boils down to two things:
1) the set will be sorted by hash (the __hash__ function) modulo a limit
2) the limit is generally the next largest power of 2
So let's look at the int case:
x=1
type(x) # int
x.__hash__() # 1
For ints, the hash equals the original value:
[x==x.__hash__() for x in xrange(1000)].count(False) # = 0
Hence, when all the values are ints, the set orders them by the integer value itself, which is why the small ints in your examples appear sorted.
For the string representations, the hashes don't work the same way:
x='1'
type(x)
# str
x.__hash__()
# 6272018864
To understand why the sort breaks for ['1','2','3'], look at those hash values:
[str(x).__hash__() for x in xrange(1,4)]
# [6272018864, 6400019251, 6528019634]
In our example, the mod value is 4 (3 elements; 2^1 = 2 is too small, so the next power of 2 is 2^2 = 4), so
[str(x).__hash__()%4 for x in xrange(1,4)]
# [0, 3, 2]
[(str(x).__hash__()%4,str(x)) for x in xrange(1,4)]
# [(0, '1'), (3, '2'), (2, '3')]
Now if you sort this beast, you get the ordering that you see in set:
[y[1] for y in sorted([(str(x).__hash__()%4,str(x)) for x in xrange(1,4)])]
# ['1', '3', '2']
From the python documentation of the set type:
A set object is an unordered collection of distinct hashable objects.
This means that the set doesn't have a concept of the order of the elements in it. You should not be surprised when the elements are printed on your screen in an unusual order.
A set in Python tries to be a "set" in the mathematical sense of the term. No duplicates, and order shouldn't matter.