Let's say we have the string '\342\200\231' (same as "\\342\\200\\231"). What is a quick way to convert this string to "\342\200\231" (the same as the ’ Unicode character)?
Proposal:
s.gsub(/\\(\d{3})/) { $1.oct.chr }
It depends on what assumptions you can make about your input.
What you appear to be asking is how to change a 12-character string into a three-character string.
'\342\200\231'
is 12 characters long.
"\342\200\231"
is three characters long; actually three bytes long, but in Ruby 1.8 it is about the same since strings are sequences of bytes anyway.
Here is an EVIL answer for you (you did say quick), which takes advantage of eval to do your "parsing":
irb(main):017:0> s = '\342\200\231'
=> "\\342\\200\\231"
irb(main):018:0> t = eval('"' + s + '"')
=> "\342\200\231"
irb(main):019:0> s.length
=> 12
irb(main):020:0> t.length
=> 3
Sorry for the eval!
I should probably give a more helpful answer... EDIT: Someone else just did.
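For an eval-free route, the gsub proposal can be adapted so that it also behaves on Ruby 1.9 and later, where strings carry an encoding. This is only a minimal sketch: `unescape_octal` is a hypothetical helper name, and it assumes Ruby 2.0+ (for `String#b`) and that the escaped bytes form valid UTF-8.

```ruby
# Hypothetical helper: turn literal "\NNN" octal escapes into the bytes they name.
# Assumes Ruby 2.0+ (String#b) and that the decoded bytes are valid UTF-8.
def unescape_octal(str)
  # Work on a binary copy so bytes >= 0x80 can be substituted freely.
  # [0-3][0-7]{2} keeps the value within 0..255, the range Integer#chr accepts.
  bytes = str.b.gsub(/\\([0-3][0-7]{2})/) { Regexp.last_match(1).oct.chr }
  # Relabel the raw bytes as UTF-8; force_encoding does not change any bytes.
  bytes.force_encoding(Encoding::UTF_8)
end

s = '\342\200\231'
t = unescape_octal(s)
puts s.length    # 12
puts t.bytesize  # 3
```

Unlike eval, this only ever interprets three-digit octal escapes, so arbitrary input cannot execute code.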
I have the following:
{:department=>{"Pet Supplies"=>{"Birds"=>"16,414", "Cats"=>"243,384",
"Dogs"=>"512,186", "Fish & Aquatic Pets"=>"47,018",
"Horses"=>"14,749", "Insects"=>"359",
"Reptiles & Amphibians"=>"5,794", "Small Animals"=>"19,797"}}}
Now if I use to_i I get, say, 16. If I do to_f I get something like 16.0 (and as you can see Ruby is considering the , as a . for some reason).
I want the number to be exactly as in the string but as a number instead: "Birds"=>16,414
How to accomplish that?
Just a notice:
If I do to_f I get something like 16.0 (and as you can see Ruby is considering the , as a . for some reason)
Ruby is not treating the , as a . at all. If it did, the resulting float would be 16.414 and not 16.0. Ruby simply hits a character that cannot be part of a number and ignores everything from there on, i.e. ,414.
How to accomplish that?
Well if you want 16,414 to be transformed to 16414 there's nothing as easy as just removing the character:
str = '16,414'
str.delete(',').to_i
# => 16414
In some cultures the , is used as the decimal separator. In that case, if you want to return 16.414 you can just transform the , into . and convert to Float:
str = '16,414'
str.gsub(/,/, '.').to_f
# => 16.414
Try something like below:
"16,414".gsub(",","_").to_i
# => 16414
or (as @Chris Heald suggested)
"19,797".delete(",").to_i
# => 19797
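To apply the same conversion to every value in a nested hash like the one in the question, something along these lines could work. It is a sketch only: `integerize` is a hypothetical name, it assumes Ruby 2.4+ for `Hash#transform_values`, and it assumes the comma is always a thousands separator.

```ruby
# Hypothetical helper: walk a nested hash and convert "16,414"-style strings to integers.
# Assumes Ruby 2.4+ (Hash#transform_values) and that "," is only a thousands separator.
def integerize(value)
  case value
  when Hash   then value.transform_values { |v| integerize(v) }
  when String then value.delete(',').to_i
  else value
  end
end

# Shortened sample of the data from the question.
data = { department: { "Pet Supplies" => { "Birds" => "16,414", "Insects" => "359" } } }
p integerize(data)
```

The original hash is left untouched; a new hash with integer values is returned.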
as you can see Ruby is considering the , as a . for some reason
Yes, it's all quite confusing:
class String
to_i(base=10) → integer
Returns the result of interpreting leading characters in str as an
integer base base (between 2 and 36). Extraneous characters past the
end of a valid number are ignored.
to_f → float
Returns the result of interpreting leading characters in str as a
floating point number. Extraneous characters past the end of a valid
number are ignored.
The ruby docs are public. They are not secret. In fact, you probably have the docs on your computer. Try this:
$ ri String#to_i
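The "extraneous characters are ignored" behaviour described in those docs can be seen next to the stricter `Kernel#Integer`, which raises instead of truncating. A small illustration (the variable name `strict` is just for the example):

```ruby
# String#to_i and String#to_f stop at the first character they cannot parse.
puts "16,414".to_i   # 16
puts "16,414".to_f   # 16.0

# Kernel#Integer is strict: it raises ArgumentError rather than silently truncating.
begin
  Integer("16,414")
rescue ArgumentError => e
  puts "rejected: #{e.message}"
end

# After removing the separator, the strict conversion succeeds.
strict = Integer("16,414".delete(','))
puts strict          # 16414
```

Using the strict form can be helpful when a silently truncated number would be worse than an exception.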
I am currently using Ruby's 'base64' but the strings that are created have special characters like /+= .
How do I remove these and still make sure that my decode works in the future?
Essentially I want alphanumeric to be used.
Rather than invent something new, I'd use Base64.urlsafe_encode64 (and its counterpart Base64.urlsafe_decode64), which is basically base64 with + and / replaced by - and _. This conforms to RFC 4648, so it should be widely understood.
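A short round trip shows the difference between the standard and URL-safe alphabets. This is a sketch with made-up sample bytes; the `padding: false` keyword assumes Ruby 2.3 or later.

```ruby
require 'base64'

# Three sample bytes chosen so the standard encoding contains both + and /.
raw = "\xFB\xFF\xFE".b

standard = Base64.strict_encode64(raw)
urlsafe  = Base64.urlsafe_encode64(raw)
puts standard  # +//+
puts urlsafe   # -__-

# Decoding the URL-safe form returns the original bytes.
round_trip = Base64.urlsafe_decode64(urlsafe)

# Assumption: Ruby 2.3+, where padding can be switched off to drop the trailing "=".
unpadded = Base64.urlsafe_encode64("hi", padding: false)
puts unpadded  # aGk
```

Note that the output can still contain - and _, so it is URL-safe rather than strictly alphanumeric.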
If you want alphanumeric, I think it is better and is practical to use base 36. Ruby has built-in encoding/decoding up to base 36 (26 letters and 10 numbers).
123456.to_s(36)
# => "qglj"
"qglj".to_i(36)
# => 123456
class Integer
  Base62_digits = [*("0".."9"), *("a".."z"), *("A".."Z")]

  def base_62
    return "0" if zero?
    sign = self < 0 ? "-" : ""
    n, res = self.abs, ""
    while n > 0
      n, units = n.divmod(62)
      res = Base62_digits[units] + res
    end
    sign + res
  end
end
p 124.base_62 # => "20"
This could be adapted to handle lower bases, but it may be sufficient as is.
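Since the question also asks that decoding keeps working, here is one possible inverse. It is a sketch under assumptions: `from_base_62` and the two constants are hypothetical names, and the alphabet is rebuilt in the same order as Base62_digits so the block is self-contained.

```ruby
# Same digit order as Base62_digits above, rebuilt so this sketch stands alone.
BASE62_ALPHABET = [*("0".."9"), *("a".."z"), *("A".."Z")].freeze
# Lookup table from character to its numeric value, e.g. "a" => 10.
BASE62_VALUES = BASE62_ALPHABET.each_with_index.to_h.freeze

# Hypothetical inverse of base_62: parse a base-62 string back into an Integer.
def from_base_62(str)
  negative = str.start_with?("-")
  digits = negative ? str[1..-1] : str
  # Hash#fetch raises KeyError for characters outside the alphabet.
  value = digits.each_char.reduce(0) { |acc, ch| acc * 62 + BASE62_VALUES.fetch(ch) }
  negative ? -value : value
end

p from_base_62("20")  # => 124
```

Invalid characters raise a KeyError here rather than being skipped, which seemed the safer default for a decoder.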
Recently I stumbled over this code snippet in Ruby:
@data = 3.chr * 5
which results in "\003\003\003\003\003"
later in the code for example
flag = @data[2] & 2
is used,
I know that it has something to do with bitwise flags. It seems the values 1, 2 and 3 are used as state flags, but because Ruby 1.9, which is the version I am familiar with, changed the Integer#chr method, the code no longer works and I would really like to know what's going on.
Furthermore, what is the purpose of the "\00x" escaped-thing?
Thanks for your answers
To make the code work in Ruby 1.9, try changing that line to:
flag = @data[2].ord & 2
Prior to Ruby 1.9, str[n] would return an integer between 0 and 255, but in Ruby 1.9 with its new Unicode support, str[n] returns a character (a string of length 1). To get the integer instead of the character, you can call .ord on the character.
The & operator is just the standard bitwise AND operator common to C, Ruby, and many other languages.
Byte number three (0x03) is not a printable ASCII character, so when you have that byte in a string and call inspect, Ruby denotes that byte as \003. Just make sure you understand that "\003" is a single-byte string while '\003' is a four-byte string.
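The points above can be checked in a few lines on Ruby 1.9 or later. This sketch uses a local variable `data` in place of the instance variable from the question, and `String#getbyte` as an alternative to calling .ord:

```ruby
data = 3.chr * 5             # five bytes, each with the value 3

# getbyte returns an Integer, much like str[n] did on Ruby 1.8.
flag = data.getbyte(2) & 2
puts flag                    # 2

# Equivalent route via the one-character string and .ord.
puts data[2].ord & 2         # 2

# Double quotes interpret the escape; single quotes keep it literal.
puts "\003".bytesize         # 1
puts '\003'.bytesize         # 4
```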
In Ruby, strings are really sequences of bytes. In Ruby 1.9, there is also encoding information, but they are still really just a sequence of bytes.
The "\00X" thing is an octal representation of the value.
So if we do:
irb(main):001:0> 15.chr
=> "\017"
irb(main):002:0> 16.chr
=> "\020"
Notice how we went from 17 right to 20? Octal.
"\003\003\003\003\003" is 5 bytes of the value 3 and you can then bitwise and them with other bytes, such as 2 or \002.
So 3 (0011 in binary) ANDed with 2 (0010) is 2 (0010).
The 1.9 issue occurs because 1.9 strings are encoding-aware, so indexing a string returns a one-character string rather than a byte value as in 1.8. David Grayson's answer covers that point well.
Note that Ruby 1.9 shows unprintable characters in hexadecimal notation when inspecting a string:
3.chr # => "\x03"
Even more confusing is that sometimes the strings will appear in Unicode (UTF-8):
"\003" # => "\u0003" (utf-8)
3.chr.encoding # => #<Encoding:US-ASCII>
"\003".encoding # => #<Encoding:UTF-8>
"\003" == 3.chr # => true (this is strange because the encoding is different)
If you're trying to understand how these octal and hex strings relate to decimal numbers, you can convert them to binary:
"\003".unpack('B*') # same as "\003".ord.to_s(2)
# => ["00000011"] # the 2 least significant bits are set
2.to_s(2) # convert to base 2
#=> "10"
The expression 3 & 2 is a bitwise-and of binary numbers 11b and 10b, which will yield 10b (because 1 & 1 is 1 for the most significant bit; 1 & 0 is 0 for least significant).
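Individual bits can also be inspected directly with `Integer#[]`, which returns the bit at a given position (0 being the least significant). A brief sketch; `value`, `mask` and `bit_set` are names chosen for the example:

```ruby
value = 3
puts value.to_s(2)   # 11

# Integer#[] reads a single bit; index 0 is the least significant bit.
puts value[0]        # 1
puts value[1]        # 1
puts 2[0]            # 0

# Testing a flag with a mask, as the original code does with "& 2".
mask = 0b10
bit_set = (value & mask) != 0
puts bit_set         # true
```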
Other conversions:
'%x' % 97 # => '61' hex
0x61 # => 97 decimal from raw hex input
'%o' % 97 # => '141' octal
0141 # => 97 decimal from raw octal input
This is sort of a crash course but you should probably google for more in-depth info.
What's the best way to truncate a string to the first n words?
n = 3
str = "your long long input string or whatever"
str.split[0...n].join(' ')
=> "your long long"
str.split[0...n] # note that there are three dots, which excludes n
=> ["your", "long", "long"]
You could do it like this:
s = "what's the best way to truncate a ruby string to the first n words?"
n = 6
trunc = s[/(\S+\s+){#{n}}/].strip
if you don't mind making a copy.
You could also apply Sawa's Improvement (wish I was still a mathematician, that would be a great name for a theorem) by adjusting the whitespace detection:
trunc = s[/(\s*\S+){#{n}}/]
If you have to deal with an n that is greater than the number of words in s then you could use this variant:
s[/(\S+(\s+)?){,#{n}}/].strip
You can use str.split.first(n).join(' ')
with n being any number.
Contiguous white spaces in the original string are replaced with a single white space in the returned string.
For example, try this in irb:
>> a='apple orange pear banana pineaple grapes'
=> "apple orange pear banana pineaple grapes"
>> b=a.split.first(2).join(' ')
=> "apple orange"
This syntax is very clear (it uses neither a regular expression nor an array slice by index). If you program in Ruby, you know that clarity is an important stylistic choice.
A shorthand for join is *.
So the syntax str.split.first(n) * ' ' is equivalent and shorter (more idiomatic, but less clear for the uninitiated).
You can also use take instead of first,
so the following would do the same thing:
a.split.take(2) * ' '
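The split-based approach also copes with the edge cases mentioned earlier, such as n exceeding the number of words. A small sketch, where `first_words` is a hypothetical wrapper name:

```ruby
# Hypothetical wrapper around the split/first/join approach.
def first_words(str, n)
  # split with no arguments splits on any run of whitespace and drops leading blanks.
  str.split.first(n).join(' ')
end

p first_words("one  two\tthree four", 2)  # => "one two"
p first_words("one two", 5)               # => "one two"
p first_words("", 3)                      # => ""
```

By contrast, the plain `s[/(\S+\s+){#{n}}/].strip` form returns nil from the match when there are too few words, so the call to strip raises NoMethodError.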
If you are on Rails 4.2 or later (which provides truncate_words), it could be the following:
string_given.squish.truncate_words(number_given, omission: "")
I'm outputting a set of numbered files from a Ruby script. The numbers come from incrementing a counter, but to make them sort nicely in the directory, I'd like to use leading zeros in the filenames. In other words
file_001...
instead of
file_1
Is there a simple way to add leading zeros when converting a number to a string? (I know I can do "if less than 10.... if less than 100").
Use the % operator with a string:
irb(main):001:0> "%03d" % 5
=> "005"
The left-hand-side is a printf format string, and the right-hand side can be a list of values, so you could do something like:
irb(main):002:0> filename = "%s/%s.%04d.txt" % ["dirname", "filename", 23]
=> "dirname/filename.0023.txt"
Here's a printf format cheat sheet you might find useful in forming your format string. The printf format originally comes from the C function printf, but similar formatting functions are available in Perl, Ruby, Python, Java, PHP, etc.
If the maximum number of digits in the counter is known (e.g., n = 3 for counters 1..876), you can do
str = "file_" + i.to_s.rjust(n, "0")
Can't you just use string formatting of the value before you concat the filename?
"%03d" % number
Use String#next as the counter.
>> n = "000"
>> 3.times { puts "file_#{n.next!}" }
file_001
file_002
file_003
next is relatively 'clever', meaning you can even go for
>> n = "file_000"
>> 3.times { puts n.next! }
file_001
file_002
file_003
As stated in the other answers, "%03d" % number works pretty well, but it goes against the RuboCop Ruby style guide:
Favor the use of sprintf and its alias format over the fairly
cryptic String#% method
We can obtain the same result in a more readable way using the following:
format('%03d', number)
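If the pad width should follow from the highest counter value rather than being hard-coded, the format string can be built from it. A sketch only; `numbered_names` and its parameters are hypothetical:

```ruby
# Hypothetical helper: build zero-padded names, sizing the padding from the count.
def numbered_names(prefix, count)
  # Number of digits needed for the largest counter value, e.g. 3 for 100.
  width = count.to_s.length
  (1..count).map { |i| format("%s_%0#{width}d", prefix, i) }
end

names = numbered_names('file', 100)
puts names.first  # file_001
puts names.last   # file_100
```

This keeps directory listings sorted correctly without having to guess the width in advance.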
filenames = '000'.upto('100').map { |index| "file_#{index}" }
This produces:
["file_000", "file_001", "file_002", "file_003", ..., "file_098", "file_099", "file_100"]