I am currently using Ruby's 'base64' but the strings that are created have special characters like /+= .
How do I remove these and still make sure that my decode works in the future?
Essentially I want alphanumeric to be used.
Rather than invent something new, I'd use Base64.urlsafe_encode64 (and its counterpart Base64.urlsafe_decode64), which is basically Base64 with + and / replaced by - and _. This conforms to RFC 4648, so it should be widely understood.
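A minimal round-trip sketch of that suggestion, assuming Ruby 2.3 or later (where urlsafe_encode64 accepts the padding: keyword). The sample bytes and variable names are only illustrative. Turning padding off also drops the trailing =, so only letters, digits, - and _ remain.

```ruby
require 'base64'

# Bytes chosen so that plain Base64 produces "+" and "/".
data = "\xFB\xEF\xBE\xFF".b

plain    = Base64.strict_encode64(data)                   # "++++/w=="
token    = Base64.urlsafe_encode64(data, padding: false)  # "----_w"
restored = Base64.urlsafe_decode64(token)                 # unpadded input is accepted
```

Note that the result is URL-safe rather than strictly alphanumeric, since - and _ can still appear.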
If you want strictly alphanumeric output, I think base 36 is the practical choice. Ruby has built-in encoding/decoding up to base 36 (26 letters and 10 digits).
123456.to_s(36)
# => "qglj"
"qglj".to_i(36)
# => 123456
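The same built-in conversion can be stretched to whole strings by treating the bytes as one large integer. This is only a sketch: to_base36 and from_base36 are hypothetical helper names, and the approach assumes the input does not start with NUL bytes and is not empty, since leading zeros are lost in the integer form.

```ruby
# Hypothetical helpers: interpret the string's bytes as one big integer.
def to_base36(str)
  str.unpack1('H*').to_i(16).to_s(36)
end

def from_base36(token)
  hex = token.to_i(36).to_s(16)
  hex = "0#{hex}" if hex.size.odd?  # pack('H*') expects whole bytes
  [hex].pack('H*')
end

token = to_base36("hello world")   # lowercase letters and digits only
original = from_base36(token)      # the original bytes, as a binary string
```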
class Integer
  Base62_digits = [*("0".."9"), *("a".."z"), *("A".."Z")]
  def base_62
    return "0" if zero?
    sign = self < 0 ? "-" : ""
    n, res = self.abs, ""
    while n > 0
      n, units = n.divmod(62)
      res = Base62_digits[units] + res
    end
    sign + res
  end
end
p 124.base_62 # => "20"
This could be adapted to handle lower bases, but it may be sufficient as is.
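Since the original question also needs decoding to keep working, here is a possible inverse. It is a sketch only: from_base_62 and the two constants are hypothetical names, and the digit order is assumed to match the encoder above (0-9, then a-z, then A-Z).

```ruby
BASE62_DIGITS = [*"0".."9", *"a".."z", *"A".."Z"].freeze
BASE62_VALUES = BASE62_DIGITS.each_with_index.to_h.freeze

# Hypothetical inverse of the base_62 encoder: String -> Integer.
def from_base_62(str)
  negative = str.start_with?("-")
  digits = negative ? str[1..-1] : str
  # fetch raises KeyError on any character outside the 62-digit alphabet
  value = digits.each_char.reduce(0) { |acc, ch| acc * 62 + BASE62_VALUES.fetch(ch) }
  negative ? -value : value
end

from_base_62("20")  # => 124
```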
Related
I've been working with the Ruby chr and ord methods recently and there are a few things I don't understand.
My current project involves converting individual characters to and from ordinal values. As I understand it, if I have a string with an individual character like "A" and I call ord on it I get its position on the ASCII table which is 65. Calling the inverse, 65.chr gives me the character value "A", so this tells me that Ruby has a collection somewhere of ordered character values, and it can use this collection to give me the position of a specific character, or the character at a specific position. I may be wrong on this, please correct me if I am.
Now I also understand that Ruby's default character encoding uses UTF-8 so it can work with thousands of possible characters. Thus if I ask it for something like this:
'好'.ord
I get the position of that character which is 22909. However, if I call chr on that value:
22909.chr
I get "RangeError: 22909 out of char range." I'm only able to get char to work on values up to 255 which is extended ASCII. So my questions are:
Why does Ruby seem to be getting values for chr from the extended ASCII character set but ord from UTF-8?
Is there any way to tell Ruby to use different encodings when it uses these methods? For instance, tell it to use ASCII-8BIT encoding instead of whatever it's defaulting to?
If it is possible to change the default encoding, is there any way of getting the total number of characters available in the set being used?
According to Integer#chr you can use the following to force the encoding to be UTF_8.
22909.chr(Encoding::UTF_8)
#=> "好"
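This also explains the asymmetry in the question. String#ord reads the code point using the string's own encoding, while Integer#chr without an argument only covers 0..255 (unless Encoding.default_internal is set). A short sketch with illustrative variable names:

```ruby
ascii  = 65.chr                       # "A"
binary = 200.chr                      # one raw byte
wide   = 22909.chr(Encoding::UTF_8)   # the character from the question
packed = [22909].pack('U')            # another way to build the same character

ascii.encoding    # US-ASCII for 0..127
binary.encoding   # ASCII-8BIT (BINARY) for 128..255
wide.ord          # 22909 again
```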
To list all available encoding names
Encoding.name_list
#=> ["ASCII-8BIT", "UTF-8", "US-ASCII", "UTF-16BE", "UTF-16LE", "UTF-32BE", "UTF-32LE", "UTF-16", "UTF-32", ...]
A hacky way to get the maximum number of characters
2000000.times.reduce(0) do |x, i|
  begin
    i.chr(Encoding::UTF_8)
    x += 1
  rescue
  end
  x
end
#=> 1112064
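For UTF-8 specifically, the same figure can be reached by arithmetic: code points run from 0 to 0x10FFFF, and the surrogate range 0xD800..0xDFFF is not encodable. The sketch below checks those boundaries; valid_utf8? is a hypothetical helper name.

```ruby
max_code_point = 0x10FFFF
surrogates = 0xD800..0xDFFF
valid_count = (max_code_point + 1) - surrogates.size  # => 1112064

# Hypothetical helper: true if the code point can be encoded as UTF-8.
def valid_utf8?(code_point)
  code_point.chr(Encoding::UTF_8)
  true
rescue RangeError
  false
end

valid_utf8?(0x10FFFF)  # => true
valid_utf8?(0x110000)  # => false
valid_utf8?(0xD800)    # => false
```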
After tooling around with this for a while, I realized that I could get the max number of characters for each encoding by running a binary search to find the highest value that doesn't throw a RangeError.
def get_highest_value(set)
  max = 10000000000
  min = 0
  guess = 5000000000
  while true
    begin
      guess.chr(set)
      if min > max
        return max
      else
        min = guess + 1
        guess = (max + min) / 2
      end
    rescue
      if min > max
        return max
      else
        max = guess - 1
        guess = (max + min) / 2
      end
    end
  end
end
The value input to the method is the name of the encoding being checked.
How do I easily convert a number, e.g. 0x616263, equivalently 6382179 in base 10, into a string by dividing the number up into sequential bytes? So the example above should convert into 'abc'.
I've experimented with Array#pack but can't figure out how to get it to convert more than one byte in the number, e.g. [0x616263].pack("C*") returns 'c'.
I've also tried 0x616263.to_s(256), but that throws an ArgumentError: invalid radix. I guess it needs some sort of encoding information?
(Note: Other datatypes in pack like N work with the example I've given above, but only because it fits within 4 bytes, so e.g. [0x616263646566].pack("N") gives cdef, not abcdef)
This question is vaguely similar to this one, but not really. Also, I sort of figured out how to get the hex representation string from a character string using "abcde".unpack("c*").map{|c| c.to_s(16)}.join(""), which gives '6162636465'. I basically want to go backwards.
I don't think this is an X-Y problem, but in case it is - I'm trying to convert a number I've decoded with RSA into a character string.
Thanks for any help. I'm not too experienced with Ruby. I'd also be interested in a Python solution (for fun), but I don't know if it's right to add tags for two separate programming languages to this question.
To convert a single number 0x00616263 into 3 characters, what you really need to do first is separate them into three numbers: 0x00000061, 0x00000062, and 0x00000063.
For the last number, the hex digits you want are already in the correct place. But for the other two, you have to do a bitshift using >> 16 and >> 8 respectively.
Afterwards, use a bitwise and to get rid of the other digits:
num1 = (0x616263 >> 16) & 0xFF
num2 = (0x616263 >> 8) & 0xFF
num3 = 0x616263 & 0xFF
For the characters, you could then do:
char1 = ((0x616263 >> 16) & 0xFF).chr
char2 = ((0x616263 >> 8) & 0xFF).chr
char3 = (0x616263 & 0xFF).chr
Of course, bitwise operations aren't very Ruby-esque. There are probably more Ruby-like answers that someone else might provide.
64 bit integers
If your number is smaller than 2**64 (8 bytes), you can:
convert the "big-endian unsigned long long" to 8 bytes
remove the leading zero bytes
Ruby
[0x616263].pack('Q>').sub(/\A\x00+/, '')
# "abc"
[0x616263646566].pack('Q>').sub(/\A\x00+/, '')
# "abcdef"
Python 2 & 3
In Python, pack returns bytes, not a string. You can use decode() to convert bytes to a string:
import struct
# lstrip removes only the leading zero bytes, not zero bytes inside the data
print(struct.pack(">Q", 0x616263646566).lstrip(b'\x00').decode())
# abcdef
print(struct.pack(">Q", 0x616263).lstrip(b'\x00').decode())
# abc
Large numbers
With gsub
If your number doesn't fit in 8 bytes, you could use a modified version of your code. This is shorter and outputs the string correctly if the first byte is smaller than 0x10 (e.g. for "\t"), which is the case where the hex string has an odd number of digits:
def decode(int)
  if int < 2**64
    # \A anchors the match so only leading zero bytes are stripped
    [int].pack('Q>').sub(/\A\x00+/, '')
  else
    nhex = int.to_s(16)
    nhex = '0' + nhex if nhex.size.odd?
    nhex.gsub(/../) { |hh| hh.to_i(16).chr }
  end
end
puts decode(0x616263) == 'abc'
# true
puts decode(0x616263646566) == 'abcdef'
# true
puts decode(0x0961) == "\ta"
# true
puts decode(0x546869732073656e74656e63652069732077617920746f6f206c6f6e6720666f7220616e20496e743634)
# This sentence is way too long for an Int64
By the way, here's the reverse method :
def encode(str)
  str.reverse.each_byte.with_index.map { |b, i| b * 256**i }.inject(:+)
end
You should still check if your RSA code really outputs arbitrary large numbers or just an array of integers.
With shifts
Here's another way to get the result. It's similar to @Nathan's answer, but it works for any integer size:
def decode(int)
  a = []
  while int > 0
    a << (int & 0xFF)
    int >>= 8
  end
  a.reverse.pack('C*')
end
According to fruity, it's twice as fast as the gsub solution.
I'm currently rolling with this:
n = 0x616263
nhex = n.to_s(16)
nhexarr = nhex.scan(/.{1,2}/)
nhexarr = nhexarr.map {|e| e.to_i(16)}
out = nhexarr.pack("C*")
But was hoping for a concise/built-in way to do this, so I'll leave this answer unaccepted for now.
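One fairly concise option built on the same idea is the 'H*' directive of pack and unpack, which converts between hex digits and bytes directly. This is a sketch; int_to_bytes and bytes_to_int are hypothetical names, and the padding step covers the odd-digit case (first byte below 0x10) that scan(/.{1,2}/) gets wrong.

```ruby
# Hypothetical helper: Integer -> byte string via its hex digits.
def int_to_bytes(int)
  hex = int.to_s(16)
  hex = "0#{hex}" if hex.size.odd?  # keep whole bytes, e.g. 0x961 -> "0961"
  [hex].pack('H*')
end

# Hypothetical helper: byte string -> Integer.
def bytes_to_int(str)
  str.unpack1('H*').to_i(16)
end

int_to_bytes(0x616263)  # => "abc"
bytes_to_int("abc")     # => 6382179
```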
How can I generate an n-character pseudo random string containing only A-Z, 0-9 like SecureRandom.base64 without "+", "/", and "="? For example:
(1..n).map { (('0'..'9').to_a + ('A'..'Z').to_a)[rand(36)] }.join
Array.new(n){[*"A".."Z", *"0".."9"].sample}.join
An elegant way to do it in Rails 5 (I haven't tested it in other Rails versions):
SecureRandom.urlsafe_base64(n)
where n is the number of random bytes, not the number of characters; the result is about 4/3 of n characters long.
ps: the output can contain - and _ as well as letters and digits, so strip or replace those if you need a strictly alphanumeric string.
ex: 9 random bytes give a 12-character string:
SecureRandom.urlsafe_base64(9)
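If the string must contain only A-Z and 0-9 and have an exact length, one option is to index into a fixed character set with SecureRandom.random_number. This is a sketch; CHARSET and random_code are hypothetical names.

```ruby
require 'securerandom'

CHARSET = [*'A'..'Z', *'0'..'9'].freeze  # the 36 allowed characters

# Hypothetical helper: exactly `length` characters, repeats allowed.
def random_code(length)
  Array.new(length) { CHARSET[SecureRandom.random_number(CHARSET.size)] }.join
end

random_code(8)  # eight characters drawn from A-Z and 0-9
```

Ruby 2.5 and later also ship SecureRandom.alphanumeric(n), but its output includes lowercase letters.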
Even brute force is pretty easy:
n = 20
c = [*?A..?Z, *?0..?9]
size = c.size
n.times.map { c[rand(size)] }.join
#=> "IE210UOTDSJDKM67XCG1"
or, without replacement (each character is used at most once, so n can be at most 36):
c.sample(n).join
#=> a 20-character string with no repeated characters
should that be desired. (I originally had c = [*(?A..?Z)] + [*(?0..?9)], but saw from @sawa's answer that that could be simplified quite a bit.)
To generate a random string of 10 to 20 characters that contains only A-Z and digits, and always includes both:
require 'string_pattern'
puts "10-20:/XN/".gen
You can simply do this:
[*'A'..'Z', *0..9].sample(10).join
Change the number 10 to change the length of the string. Because sample picks without replacement, no character repeats and the length cannot exceed 36.
I have the following function which accepts text and a word count and if the number of words in the text exceeded the word-count it gets truncated with an ellipsis.
# Truncate the passed text. Used for headlines and such
def snippet(thought, wordcount)
  thought.split[0..(wordcount-1)].join(" ") + (thought.split.size > wordcount ? "..." : "")
end
However what this function doesn't take into account is extremely long words, for instance...
"Helloooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo
world!"
I was wondering if there's a better way to approach what I'm trying to do so it takes both word count and text size into consideration in an efficient way.
Is this a Rails project?
Why not use the following helper:
truncate("Once upon a time in a world far far away", :length => 17)
If not, just reuse the code.
This is probably a two step process:
Truncate the string to a max length (no need for regex for this)
Using regex, find a max words quantity from the truncated string.
Edit:
Another approach is to split the string into words, loop through the array adding up
the lengths. When you find the overrun, join 0 .. index just before the overrun.
Hint: regex ^(\s*.+?\b){5} will match first 5 "words"
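The split-and-accumulate approach described above could look roughly like this. It is a sketch, not a drop-in replacement: snippet_by_length and its parameters are hypothetical names, and it assumes an oversized first word should be cut hard rather than dropped.

```ruby
def snippet_by_length(text, max_words, max_chars, omission = '...')
  words = text.split
  kept = []
  length = 0
  words.first(max_words).each do |word|
    extra = word.length + (kept.empty? ? 0 : 1)  # +1 for the joining space
    if length + extra > max_chars
      kept << word[0, max_chars] if kept.empty?  # hard-cut an oversized first word
      break
    end
    kept << word
    length += extra
  end
  result = kept.join(' ')
  result == words.join(' ') ? result : result + omission
end

snippet_by_length("Once upon a time in a world far far away", 5, 100)
# => "Once upon a time in..."
snippet_by_length("Once upon a time in a world far far away", 100, 17)
# => "Once upon a time..."
```

The omission is added on top of max_chars here; subtract its length from the limit first if the total must fit.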
The logic for checking both word and char limits becomes too convoluted to clearly express as one expression. I would suggest something like this:
def snippet str, max_words, max_chars, omission='...'
  max_chars = 1 + omission.size if max_chars <= omission.size # need at least one char plus the omission
  words = str.split
  omit = words.size > max_words || str.length > max_chars ? omission : ''
  snip = words[0...max_words].join ' '
  snip = snip[0...(max_chars - omission.size)] if snip.length > max_chars
  snip + omit
end
As others have pointed out, Rails' String#truncate offers almost the functionality you want (truncating to a given length at a natural boundary), but it doesn't let you independently set a maximum character length and a maximum word count.
First 20 characters:
>> "hello world this is the world".gsub(/.+/) { |m| m[0...20] + (m.size > 20 ? '...' : '') }
=> "hello world this is ..."
First 5 words:
>> "hello world this is the world".gsub(/.+/) { |m| m.split[0...5].join(' ') + (m.split.size > 5 ? '...' : '') }
=> "hello world this is the..."
I am trying to convert a hex value to a binary value (each digit in the hex string should have an equivalent four-bit binary value). I was advised to use this:
num = "0ff" # (say for eg.)
bin = "%0#{num.size*4}b" % num.hex.to_i
This gives me the correct output 000011111111. I am confused with how this works, especially %0#{num.size*4}b. Could someone help me with this?
You can also do:
num = "0ff"
num.hex.to_s(2).rjust(num.size*4, '0')
As you may have already figured out, num.size*4 is the number of digits to which the output is padded with 0, because one hexadecimal digit is represented by four (log_2 16 = 4) binary digits.
You'll find the answer in the documentation of Kernel#sprintf (as pointed out by the docs for String#%):
http://www.ruby-doc.org/core/classes/Kernel.html#M001433
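To make the format string less opaque, here it is built step by step: the interpolation runs first and produces an ordinary sprintf directive, where 0 means zero-pad, 12 is the minimum width and b means binary. The variable names below are only illustrative.

```ruby
num = "0ff"
width = num.size * 4              # 3 hex digits * 4 bits = 12
directive = "%0#{width}b"         # interpolates to "%012b"
bin = directive % num.hex         # num.hex is 255

bin                               # => "000011111111"
format("%012b", 255) == bin       # => true, the same call written out
```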
This is the most straightforward solution I found to convert from hexadecimal to binary:
['DEADBEEF'].pack('H*').unpack('B*').first # => "11011110101011011011111011101111"
And from binary to hexadecimal:
['11011110101011011011111011101111'].pack('B*').unpack1('H*') # => "deadbeef"
Here you can find more information:
Array#pack: https://ruby-doc.org/core-2.7.1/Array.html#method-i-pack
String#unpack1 (similar to unpack): https://ruby-doc.org/core-2.7.1/String.html#method-i-unpack1
This doesn't answer your original question, but I assume a lot of people coming here don't want actual "0s and 1s" binary output. Instead, they want to decode hexadecimal into a byte string (in the spirit of utilities such as hex2bin). Here is a good method for doing exactly that:
def hex_to_bin(hex)
  # Prepend a '0' for padding if you don't have an even number of chars
  # ('0' + hex builds a new string instead of mutating a literal)
  hex = '0' + hex unless hex.length.even?
  hex.scan(/[A-Fa-f0-9]{2}/).inject('') { |encoded, byte| encoded << [byte].pack('H2') }
end
Getting back to hex again is much easier:
def bin_to_hex(bin)
  bin.unpack('H*').first
end
Converting the string of hex digits back to binary is just as easy. Take the hex digits two at a time (since each byte can range from 00 to FF), convert the digits to a character, and join them back together.
def hex_to_bin(s)
  s.scan(/../).map { |x| x.hex.chr }.join
end