I can't iterate over the entire range of Unicode characters.
I searched everywhere...
I am building a fuzzer and want to embed every Unicode character into a URL, one at a time.
For example:
http://www.example.com?a=\uff1c
I know that there are some existing tools, but I need more flexibility.
If I could do something like the following: "\u" + "ff1c" it would be great.
This is the closest I got:
char = "\u0000"
...
#within iteration
char.succ!
...
but after the character "\u0039", which is the number 9, I will get "10" instead of ":"
You could use pack to convert numbers to UTF-8 characters, but I'm not sure if this solves your problem.
You can either create an array with the numeric values of all the characters and use pack to get a UTF-8 string, or you can just loop from 0 to whatever you need and use pack within the loop.
I've written a small example to explain myself. The code below prints out the hex value of each character followed by the character itself.
0.upto(100) do |i|
  puts "%04x" % i + ": " + [i].pack("U*")
end
Here's some simpler code, albeit slightly obfuscated, that takes advantage of the fact that Ruby converts an integer on the right-hand side of the << operator to a character. In Ruby 1.8 this only works for integer values <= 255; in 1.9 it also works for values greater than 255.
0.upto(100) do |i|
  puts "" << i
end
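To tie the pack approach back to the fuzzer in the question, here is a minimal sketch. The helper name fuzz_urls and the choice of URI.encode_www_form_component for percent-encoding are my own assumptions, not something from the posts above. It also skips the surrogate block (U+D800..U+DFFF), since those code points are not valid characters on their own.

```ruby
require "uri"

# Hypothetical helper (not from the original posts): builds one URL per
# code point in `range`, skipping the UTF-16 surrogate block.
def fuzz_urls(base, range)
  range.reject { |cp| (0xD800..0xDFFF).cover?(cp) }.map do |cp|
    char = [cp].pack("U") # code point -> UTF-8 string
    "#{base}?a=#{URI.encode_www_form_component(char)}"
  end
end

fuzz_urls("http://www.example.com", 0xFF1A..0xFF1C).each { |url| puts url }
# last line printed: http://www.example.com?a=%EF%BC%9C
```

For the full range you would probably want to yield each URL instead of collecting them all in one array.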
I've been working with the Ruby chr and ord methods recently and there are a few things I don't understand.
My current project involves converting individual characters to and from ordinal values. As I understand it, if I have a string with an individual character like "A" and I call ord on it I get its position on the ASCII table which is 65. Calling the inverse, 65.chr gives me the character value "A", so this tells me that Ruby has a collection somewhere of ordered character values, and it can use this collection to give me the position of a specific character, or the character at a specific position. I may be wrong on this, please correct me if I am.
Now I also understand that Ruby's default character encoding uses UTF-8 so it can work with thousands of possible characters. Thus if I ask it for something like this:
'好'.ord
I get the position of that character which is 22909. However, if I call chr on that value:
22909.chr
I get "RangeError: 22909 out of char range." I'm only able to get chr to work on values up to 255, which is extended ASCII. So my questions are:
Why does Ruby seem to be getting values for chr from the extended ASCII character set but ord from UTF-8?
Is there any way to tell Ruby to use different encodings when it uses these methods? For instance, tell it to use ASCII-8BIT encoding instead of whatever it's defaulting to?
If it is possible to change the default encoding, is there any way of getting the total number of characters available in the set being used?
According to the documentation for Integer#chr, you can pass an encoding to force the result to be UTF-8:
22909.chr(Encoding::UTF_8)
#=> "好"
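As a small sketch of the round trip (variable names are mine), the following shows ord and chr agreeing once the encoding is given explicitly, and what chr does when no encoding is passed for values below 256:

```ruby
code_point = "好".ord                          # 22909
utf8_char  = code_point.chr(Encoding::UTF_8)   # explicit encoding
packed     = [code_point].pack("U")            # same character via pack

puts utf8_char                                 # 好
puts utf8_char == packed                       # true
puts 65.chr.encoding == Encoding::US_ASCII     # true: below 128 is US-ASCII
puts 200.chr.encoding == Encoding::BINARY      # true: 128..255 is binary (ASCII-8BIT)
```

So without an argument, chr is not really consulting an "extended ASCII" table; it just produces a single byte.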
To list all available encoding names
Encoding.name_list
#=> ["ASCII-8BIT", "UTF-8", "US-ASCII", "UTF-16BE", "UTF-16LE", "UTF-32BE", "UTF-32LE", "UTF-16", "UTF-32", ...]
A hacky way to get the maximum number of characters
2000000.times.reduce(0) do |x, i|
  begin
    i.chr(Encoding::UTF_8)
    x += 1
  rescue
  end
  x
end
#=> 1112064
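That figure can be cross-checked with plain arithmetic, assuming the usual Unicode layout: code points run from U+0000 to U+10FFFF, and the surrogate block U+D800..U+DFFF is excluded. The variable names below are mine.

```ruby
total_code_points = 0x10FFFF + 1            # U+0000..U+10FFFF
surrogates        = (0xD800..0xDFFF).size   # 2048 code points reserved for UTF-16
valid_scalars     = total_code_points - surrogates

puts valid_scalars  # 1112064
```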
After tooling around with this for a while, I realized that I could get the max number of characters for each encoding by running a binary search to find the highest value that doesn't throw a RangeError.
def get_highest_value(set)
  max = 10000000000
  min = 0
  guess = 5000000000
  while true
    begin
      guess.chr(set)
      if (min > max)
        return max
      else
        min = guess + 1
        guess = (max + min) / 2
      end
    rescue
      if min > max
        return max
      else
        max = guess - 1
        guess = (max + min) / 2
      end
    end
  end
end
The value input to the method is the name of the encoding being checked.
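The same idea can probably be written more compactly with Range#bsearch. This is only a sketch: the helper names and the upper bound are my own choices, and it assumes that "chr raises RangeError" flips from false to true exactly once as the value grows. That holds for single-byte encodings, but not strictly for UTF-8, where the surrogate block also raises, so treat the result with care there.

```ruby
# Hypothetical helper: true when `code` cannot be encoded in `encoding`.
def out_of_range?(code, encoding)
  code.chr(encoding)
  false
rescue RangeError
  true
end

# Finds the first value that fails and steps back by one.
def highest_value(encoding, upper = 0x7FFF_FFFF)
  first_invalid = (0..upper).bsearch { |code| out_of_range?(code, encoding) }
  first_invalid ? first_invalid - 1 : upper
end

puts highest_value(Encoding::US_ASCII)    # 127
puts highest_value(Encoding::ASCII_8BIT)  # 255
```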
I am wondering how to make something where if X=5 and Y=2, then have it output something like
Hello 2 World 5.
In Java I would do
String a = "Hello " + Y + " World " + X;
System.out.println(a);
So how would I do that in TI-BASIC?
You have two issues to work out, concatenating strings and converting integers to a string representation.
String concatenation is very straightforward and utilizes the + operator. In your example:
"Hello " + "World"
Will yield the string "Hello World".
Converting numbers to strings is not as easy in TI-BASIC, but a method for doing so compatible with the TI-83+/84+ series is available here. The following code and explanation are quoted from the linked page:
:"?
:For(X,1,1+log(N
:sub("0123456789",ipart(10fpart(N10^(-X)))+1,1)+Ans
:End
:sub(Ans,1,length(Ans)-1→Str1
With our number stored in N, we loop through each digit of N and store
the numeric character to our string that is at the matching position
in our substring. You access the individual digit in the number by
using iPart(10fPart(A/10^(X, and then locate where it is in the string
"0123456789". The reason you need to add 1 is so that it works with
the 0 digit.
In order to construct a string with all of the digits of the number, we first create a dummy string. This is what the "? is used
for. Each time through the For( loop, we concatenate the string from
before (which is still stored in the Ans variable) to the next numeric
character that is found in N. Using Ans allows us to not have to use
another string variable, since Ans can act like a string and it gets
updated accordingly, and Ans is also faster than a string variable.
By the time we are done with the For( loop, all of our numeric characters are put together in Ans. However, because we stored a dummy
character to the string initially, we now need to remove it, which we
do by getting the substring from the first character to the second to
last character of the string. Finally, we store the string to a more
permanent variable (in this case, Str1) for future use.
Once converted to a string, you can simply use the + operator to concatenate your string literals with the converted number strings.
You should also take a look at a related Stack Overflow question which addresses a similar issue.
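If the digit loop is hard to follow in TI-BASIC, here is the same idea sketched in Ruby purely as an illustration (Ruby of course has to_s built in). The method name is mine, and I use integer division and modulo in place of iPart/fPart to avoid floating-point surprises. Like the TI-BASIC routine, it only handles positive integers.

```ruby
DIGITS = "0123456789"

# Illustrative port of the TI-BASIC loop; positive integers only.
def number_to_string(n)
  raise ArgumentError, "positive integers only" unless n.is_a?(Integer) && n > 0
  result = "?"                           # dummy seed, like the "? line
  x = 1
  while 10**(x - 1) <= n                 # one pass per digit
    digit = (n / 10**(x - 1)) % 10       # x-th digit from the right
    result = DIGITS[digit, 1] + result   # prepend, like sub(...)+Ans
    x += 1
  end
  result[0, result.length - 1]           # drop the trailing dummy "?"
end

puts "Hello " + number_to_string(2) + " World " + number_to_string(5)
# Hello 2 World 5
```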
For this issue you can use the toString( function which was introduced in version 5.2.0. This function translates a number to a string which you can use to display numbers and strings together easily. It would end up like this:
Disp "Hello "+toString(Y)+" World "+toString(X)
If you know the length of "Hello" and "World," then you can simply use Output() because Disp creates a new line after every statement.
I have the following:
{:department=>{"Pet Supplies"=>{"Birds"=>"16,414", "Cats"=>"243,384",
  "Dogs"=>"512,186", "Fish & Aquatic Pets"=>"47,018",
  "Horses"=>"14,749", "Insects"=>"359",
  "Reptiles & Amphibians"=>"5,794", "Small Animals"=>"19,797"}}}
Now if I use to_i I get say 16. If I do to_f I get something like 16.0 (and as you can see Ruby is considering the , as a . for some reason).
I want the number to be exactly as in the string but as a number instead: "Birds"=>16,414
How to accomplish that?
Just a notice:
If I do to_f I get something like 16.0 (and as you can see Ruby is considering the , as a . for some reason)
Ruby is not treating the , as a . at all. If it were, the resulting float would be 16.414 and not 16.0. Ruby simply stops at the first extraneous character and ignores ,414.
How to accomplish that?
Well if you want 16,414 to be transformed to 16414 there's nothing as easy as just removing the character:
str = '16,414'
str.delete(',').to_i
# => 16414
In some cultures the , is the decimal separator. In that case, if you want to get 16.414, you can replace the , with a . and convert to Float:
str = '16,414'
str.gsub(/,/, '.').to_f
# => 16.414
Try something like below:
"16,414".gsub(",","_").to_i
# => 16414
or (as @Chris Heald suggested)
"19,797".delete(",").to_i
# => 19797
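To apply this to the whole nested hash from the question rather than one string at a time, something like the following should work. It is a sketch that assumes Ruby 2.4+ (for Hash#transform_values) and exactly the three-level nesting shown in the question; the hash here is shortened.

```ruby
data = {
  :department => {
    "Pet Supplies" => { "Birds" => "16,414", "Cats" => "243,384", "Insects" => "359" }
  }
}

# Walk the three levels and convert only the innermost string values.
converted = data.transform_values do |departments|
  departments.transform_values do |animals|
    animals.transform_values { |count| count.delete(",").to_i }
  end
end

puts converted[:department]["Pet Supplies"]["Birds"]    # 16414
puts converted[:department]["Pet Supplies"]["Insects"]  # 359
```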
as you can see Ruby is considering the , as a . for some reason
Yes, it's all quite confusing:
class String
to_i(base=10) → integer
Returns the result of interpreting leading characters in str as an
integer base base (between 2 and 36). Extraneous characters past the
end of a valid number are ignored.
to_f → float
Returns the result of interpreting leading characters in str as a
floating point number. Extraneous characters past the end of a valid
number are ignored.
The ruby docs are public. They are not secret. In fact, you probably have the docs on your computer. Try this:
$ ri String#to_i
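A quick way to see the "leading characters" rule from those docs in action. The strict Integer() call at the end is my addition, for contrast with to_i:

```ruby
puts "16,414".to_i      # 16     (parsing stops at the comma)
puts "16,414".to_f      # 16.0
puts "16.414abc".to_f   # 16.414 (trailing junk is ignored)
puts "abc".to_i         # 0      (no leading digits at all)

# Kernel#Integer is strict and raises instead of silently truncating.
begin
  Integer("16,414")
rescue ArgumentError => e
  puts "strict parse failed: #{e.message}"
end
```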
I am using Ruby 1.8.7 (and upgrading isn't an option). I would like to create a string of all UTF-8 code points from 0 to 127, written as "\uXXXX".
My problem is that this is being interpreted as (for example): 'u0008'. If I try to use '\u0008', the string becomes "\\u0008" which IS NOT what I want.
I have tried many different ways, but it seems impossible to create a string that is exactly "\uXXXX", i.e. "\u000B". It always is either "\\u000B" or "u000B".
Escaping the '\' isn't an option. I need to send a string to a server, such that the server will receive '\u000B' for example. It is so that the other server can test its parsing of the \uXXXX syntax. This seems impossible to do in Ruby however.
Happy if someone can prove me wrong :)
Use Integer#chr to get the character. Here's a clean version:
value = ""
(1..127).each do |i|
  value << "U+#{i} = #{i.chr}, hex = \\x#{"%02x" % i}; "
end
The "%02x" % i is equivalent to sprintf("%02x", i). It returns the integer as a 2-digit hexadecimal number.
Escaped output (see comments):
value = ""
(1..127).each do |i|
  value << "U+#{i} = \\u#{"%04x" % i}, hex = \\x#{"%02x" % i}; "
end
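If all you need is the bare \uXXXX sequences for 0 to 127, a shorter sketch would be the following (the variable name is mine). Note that "\\u" in the source is a single backslash followed by u in the actual string. The doubled backslash only appears in the literal and in inspect output, never in what puts prints or what gets sent over the wire.

```ruby
# Each element is six characters: backslash, u, and four hex digits.
escaped = (0..127).map { |i| "\\u%04x" % i }.join

puts escaped[0, 12]   # \u0000\u0001
puts escaped[66, 6]   # \u000b
puts escaped.length   # 768
```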
I am trying to convert a hex value to a binary value (each digit in the hex string should map to an equivalent four-bit binary value). I was advised to use this:
num = "0ff" # (say for eg.)
bin = "%0#{num.size*4}b" % num.hex.to_i
This gives me the correct output 000011111111. I am confused with how this works, especially %0#{num.size*4}b. Could someone help me with this?
You can also do:
num = "0ff"
num.hex.to_s(2).rjust(num.size*4, '0')
As you may have already figured out, num.size*4 is the number of digits to pad the output to with zeros, because one hexadecimal digit is represented by four (log_2 16 = 4) binary digits.
You'll find the answer in the documentation of Kernel#sprintf (as pointed out by the docs for String#%):
http://www.ruby-doc.org/core/classes/Kernel.html#M001433
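To make the format string less mysterious, here it is taken apart step by step (variable names are mine). The interpolation happens first and produces an ordinary sprintf template, where 0 means "pad with zeros", 12 is the minimum width, and b means "binary":

```ruby
num      = "0ff"
width    = num.size * 4      # 3 hex digits * 4 bits each = 12
template = "%0#{width}b"     # interpolates to "%012b"

puts template                # %012b
puts template % num.hex      # 000011111111
puts "%b" % num.hex          # 11111111 (no padding, leading zeros lost)
```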
This is the most straightforward solution I found to convert from hexadecimal to binary:
['DEADBEEF'].pack('H*').unpack('B*').first # => "11011110101011011011111011101111"
And from binary to hexadecimal:
['11011110101011011011111011101111'].pack('B*').unpack1('H*') # => "deadbeef"
Here you can find more information:
Array#pack: https://ruby-doc.org/core-2.7.1/Array.html#method-i-pack
String#unpack1 (similar to unpack): https://ruby-doc.org/core-2.7.1/String.html#method-i-unpack1
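One caveat worth checking before relying on the pack approach: as far as I can tell, it works in whole bytes, so a hex string with an odd number of digits gets padded on the right, which gives a different result from the format-string approach. A small comparison (assumes Ruby 2.4+ for unpack1; variable names are mine):

```ruby
hex = "0ff"

via_pack   = [hex].pack("H*").unpack1("B*")    # pads to two full bytes
via_format = "%0#{hex.size * 4}b" % hex.hex    # exactly four bits per digit

puts via_pack     # 0000111111110000
puts via_format   # 000011111111
```

With an even number of hex digits the two approaches agree.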
This doesn't answer your original question, but I assume many people arriving here don't want to turn hexadecimal into actual "0s and 1s" binary output; they want to decode hexadecimal into a byte string (in the spirit of utilities such as hex2bin). Here is a good method for doing exactly that:
def hex_to_bin(hex)
  # Prepend a '0' for padding if you don't have an even number of chars
  hex = '0' << hex unless (hex.length % 2) == 0
  hex.scan(/[A-Fa-f0-9]{2}/).inject('') { |encoded, byte| encoded << [byte].pack('H2') }
end
Getting back to hex again is much easier:
def bin_to_hex(bin)
  bin.unpack('H*').first
end
Converting the string of hex digits back to binary is just as easy. Take the hex digits two at a time (since each byte can range from 00 to FF), convert the digits to a character, and join them back together.
def hex_to_bin(s)
  s.scan(/../).map { |x| x.hex.chr }.join
end
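For completeness, a round-trip sketch that does both directions with a single pack/unpack call each. The method names are mine (chosen so they don't clash with the ones above), and unpack1 assumes Ruby 2.4+. An odd-length input is left-padded with a zero so that "fff" is read as 0x0fff.

```ruby
# Hex digits -> raw byte string.
def hex_to_bytes(hex)
  hex = "0#{hex}" if hex.length.odd?   # left-pad to a whole number of bytes
  [hex].pack("H*")
end

# Raw byte string -> lowercase hex digits.
def bytes_to_hex(bytes)
  bytes.unpack1("H*")
end

raw = hex_to_bytes("DEADBEEF")
puts raw.bytes.inspect                 # [222, 173, 190, 239]
puts bytes_to_hex(raw)                 # deadbeef
```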