Convert string with comma to integer - ruby

Is there any neat method to convert "1,112" to integer 1112, instead of 1?
I've got one, but not neat:
"1,112".split(',').join.to_i #=> 1112

How about this?
"1,112".delete(',').to_i

You may also want to make sure that your code localizes correctly, or make sure the users are used to the "international" notation. For example, "1,112" actually means different numbers across different countries. In Germany it means the number a little over one, instead of one thousand and something.
Corresponding Wikipedia article is at http://en.wikipedia.org/wiki/Decimal_mark. It seems to be poorly written at this time though. For example as a Chinese I'm not sure where does these description about thousand separator in China come from.

Some more convenient
"1,1200.00".gsub(/[^0-9]/,'')
it makes "1 200 200" work properly aswell

The following is another method that will work, although as with some of the other methods it will strip decimal places.
a = 1,112
b = a.scan(/\d+/).join().to_i => 1112

I would do using String#tr :
"1,112".tr(',','').to_i # => 1112

If someone is looking to sub out more than a comma I'm a fan of:
"1,200".chars.grep(/\d/).join.to_i
dunno about performance but, it is more flexible than a gsub, ie:
"1-200".chars.grep(/\d/).join.to_i

String count = count.replace(",", "");

Related

Ruby, regex, sentences

I'm currently building a code generator, which aims to generate boiler plate for me once I write the templates and/or translations, in whatever language I have to work with.
I have a problem with a regex in Ruby. The regex aims to select whatever is between {{{ and }}}, so I can generate functions according to my needs.
My regex is currently :
/\{\{\{(([a-zA-Z]|\s)+)\}\}\}/m
My test data set is:
{{{Demande aaa}}} => {{{tagadatsouintsouin tutu}}}
The results are:
[["Demande aaa", "a"], ["tagadatsouintsouin tutu", "u"]]
Each time the regex picks the last character twice. That's not exactly what I want, I need something more like this:
/\{\{\{((\w|\W)+)\}\}\}/m
But this has a flaw too, the results are:
[["Demande aaa}}} => {{{tagadatsouintsouin tutu", "u"]]
Whereas, I wish to get:
[["Demande aaa"],["tagadatsouintsouin tutu"]]
How do I correct these regexes? I could use two sets of delimiters, but it won't teach me anything.
Edit :
All your regex run against my data sample, so you all got a point.
Regex may be overkill, and probably are overkill for my purpose. So i have two questions.
First, do the regex keep the same exact indentation ? This should be able to handle whole functions.
Second, is there something fitter for that task ?
Detailled explanation of the purpose of this tool. I'm bored to write boiler plate code in php - symfony. So i wish to generate this according to templates.
My intent is to build some views, some controllers, and even parts of model this way.
Pratical example : In my model, i wish to generate some functions according to the type of an object's attribute. For examples, i have functions displaying correctly money. So i need to build the corect function, according to my attribute, and then put in , inside m output file.
So there is some translations which themselves need translations.
So a fictive example :
{{{euro}}} => {{{ function getMyAttributeEuro()
{
return formating($this->get[[MyAttribute]]);
} }}}
In order to stock my translations, should i use regex, like
I wish to build something a bit clever, so it can build most of the basic code with no bug. So i can work on interesting code.
You have one set of capturing parentheses too many.
/\{\{\{([a-zA-Z\s]+)\}\}\}/
Also, you don't need the /m modifier because there is no dot (.) in your regex whose behaviour would be affected by it.
I'm partial to:
data = '{{{Demande aaa}}} => {{{tagadatsouintsouin tutu}}}'
data.scan(/\{{3}(.+?)}{3}/).flatten.map{ |r| r.squeeze(' ') }
=> ["Demande aaa", "tagadatsouintsouin tutu"]
or:
data.scan(/\{{3}(.+?)}{3}/).flatten.map{ |r| [ r.squeeze(' ') ] }
=> [["Demande aaa"], ["tagadatsouintsouin tutu"]]
or:
data.scan(/\{{3}(.+?)}{3}/).map{ |r| [ r[0].squeeze(' ') ] }
=> [["Demande aaa"], ["tagadatsouintsouin tutu"]]
if you need the sub-arrays.
I'm not big on trying to everything possible inside the regex. I prefer to keep it short and sweet, then polish the output once I've found what I was looking for. It's a maintenance issue, because regex make my head hurt, and I stopped thinking of them as a macho thing years ago. Regex are a very useful tool, but too often they are seen as the answer to every problem, which they're not.
Some people, when confronted with a problem, think “I know,
I'll use regular expressions.” Now they have two problems.
-- Jamie Zawinski
You want non capturing groups (?:...), but here is another way.
/\{\{\{(.*?)\}\}\}/m
Just a shot
/\{\{\{([\w\W]+?)\}\}\}/
Added non-greedyness to your regex
Here this seems to work

Ruby regular expression for asterisks/underscore to strong/em?

As part of a chat app I'm writing, I need to use regular expressions to match asterisks and underscores in chat messages and turn them into <strong> and <em> tags. Since I'm terrible with regex, I'm really stuck here. Ideally, we would have it set up such that:
One to three words, but not more, can be marked for strong/em.
Patterns such as "un*believ*able" would be matched.
Only one or the other (strong OR em) work within one line.
The above parameters are in order of importance, with only #1 being utterly necessary - the others are just prettiness. The closest I came to anything that worked was:
text = text.sub(/\*([(0-9a-zA-Z).*])\*/,'<b>\1<\/b>')
text = text.sub(/_([(0-9a-zA-Z).*])_/,'<i>\1<\/i>')
But it obviously doesn't work with any of our params.
It's odd that there's not an example of something similar already out there, given the popularity of using asterisks for bold and whatnot. If there is, I couldn't find it outside of plugins/gems (which won't work for this instance, as I really only need it in in one place in my model). Any help would be appreciated.
This should help you finish what you are doing:
sub(/\*(.*)\*/,'<b>\1</b>')
sub(/_(.*)_/,'<i>\1</i>')
Firstly, your criteria are a little strange, but, okay...
It seems that a possible algorithm for this would be to find the number of matches in a message, count them to see if there are less than 4, and then try to perform one set of substitutions.
strong_regexp = /\*([^\*]*)\*/
em_regexp = /_([^_]*)_/
def process(input)
if input ~= strong_regexp && input.match(strong_regexp).size < 4
input.sub strong_regexp, "<b>\1<\b>"
elsif input ~= em_regexp && intput.match(em_regexp).size < 4
input.sub em_regexp, "<i>\1<\i>"
end
end
Your specifications aren't entirely clear, but if you understand this, you can tweak it yourself.

Ruby on Rails - generating bit.ly style identifiers

I'm trying to generate UUIDs with the same style as bit.ly urls like:
http://bit [dot] ly/aUekJP
or cloudapp ones:
http://cl [dot] ly/1hVU
which are even smaller
how can I do it?
I'm now using UUID gem for ruby but I'm not sure if it's possible to limitate the length and get something like this.
I am currently using this:
UUID.generate.split("-")[0] => b9386070
But I would like to have even smaller and knowing that it will be unique.
Any help would be pretty much appreciated :)
edit note: replaced dot letters with [dot] for workaround of banned short link
You are confusing two different things here. A UUID is a universally unique identifier. It has a very high probability of being unique even if millions of them were being created all over the world at the same time. It is generally displayed as a 36 digit string. You can not chop off the first 8 characters and expect it to be unique.
Bitly, tinyurl et-al store links and generate a short code to represent that link. They do not reconstruct the URL from the code they look it up in a data-store and return the corresponding URL. These are not UUIDS.
Without knowing your application it is hard to advise on what method you should use, however you could store whatever you are pointing at in a data-store with a numeric key and then rebase the key to base32 using the 10 digits and 22 lowercase letters, perhaps avoiding the obvious typo problems like 'o' 'i' 'l' etc
EDIT
On further investigation there is a Ruby base32 gem available that implements Douglas Crockford's Base 32 implementation
A 5 character Base32 string can represent over 33 million integers and a 6 digit string over a billion.
If you are working with numbers, you can use the built in ruby methods
6175601989.to_s(30)
=> "8e45ttj"
to go back
"8e45ttj".to_i(30)
=>6175601989
So you don't have to store anything, you can always decode an incoming short_code.
This works ok for proof of concept, but you aren't able to avoid ambiguous characters like: 1lji0o. If you are just looking to use the code to obfuscate database record IDs, this will work fine. In general, short codes are supposed to be easy to remember and transfer from one medium to another, like reading it on someone's presentation slide, or hearing it over the phone. If you need to avoid characters that are hard to read or hard to 'hear', you might need to switch to a process where you generate an acceptable code, and store it.
I found this to be short and reliable:
def create_uuid(prefix=nil)
time = (Time.now.to_f * 10_000_000).to_i
jitter = rand(10_000_000)
key = "#{jitter}#{time}".to_i.to_s(36)
[prefix, key].compact.join('_')
end
This spits out unique keys that look like this: '3qaishe3gpp07w2m'
Reduce the 'jitter' size to reduce the key size.
Caveat:
This is not guaranteed unique (use SecureRandom.uuid for that), but it is highly reliable:
10_000_000.times.map {create_uuid}.uniq.length == 10_000_000
The only way to guarantee uniqueness is to keep a global count and increment it for each use: 0000, 0001, etc.

Find email addresses in large data stream

STILL NOT RESOLVED :( [Feb 11th]
I have a large text file full of random data and want to pull out all the email addresses from it.
I would like to do this in Ruby, with pseudo code like this:
monster_data_string = "asfsfsdfsdfsf sfda **joe#example.com** sdfdsf"
monster_data_string.match(EMAIL_REGEX)
Does anyone know what Ruby email regular expression I would use to accomplish this?
Please keep in mind that I'm looking for a Ruby answer to this. I have already tried numerous regex found by googling but most of them cause Ruby runtime errors stating that characters like "+" and "" are invalid/unrecognized.*
What I have already tried is:
monster_data_string.match(/^([^#\s]+)#((?:[-a-z0-9]+\.)+[a-z]{2,})$/i)
but I receive Ruby errors stating that "+" is an invalid character
Thanks in advance
Watch this...
f = File.open("content.txt")
content = f.read
r = Regexp.new(/\b[a-zA-Z0-9._%+-]+#[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}\b/)
emails = content.scan(r).uniq
puts YAML.dump(emails)
If you're getting an error message about + or * being invalid in regexes, you're doing something very wrong. This is a valid regex in Ruby, although it's not the one you want:
/^([^#\s]+)#((?:[-a-z0-9]+\.)+[a-z]{2,})$/i
For one thing, you don't want to anchor the regex to the start and end of lines (^ and $) if you're trying to pluck the addresses from "random" text. But once you've gotten rid of the anchors, your regex will match **joe#example.com in your test string, which I presume you don't want. This regex from Regular-Expressions.info does a better job, but read that page for tips on tweaking it to meet your particular needs.
/\b[A-Z0-9._%+-]+#[A-Z0-9.-]+\.[A-Z]{2,4}\b/i
Finally (and you may already know this), you won't want to use the match() method because that will only find the first match. Try scan() instead.
Given that it is not possible to parse every valid email address using a regexp you are left with two choices:
Make a regexp that matches as many valid email addresses as possible and live with the the fact that some valid but rarely used forms of email address might get overlooked.
or
Make a regexp that Matches anything that "might be" an email address and then live with the false positives
I use the second approach to weed out obviously wrong email addresses when validating user sign up email addresses on a web page
Gleaned from Ruby Cookbook which has a very good section on email address validation:
valid = '[^ #]+'
/^#{valid}##{valid}\.#{valid}/
Apparently there is a 6343 character Perl regexp written by Paul Warren that does a very good job and also works in Ruby, but even that is not foolproof (I think it might also have some performance implications).
What kind of runtime error messages are you gettting? Is it regarding the regexps as invalid, or is it breaking due to the target string being too large?
To try and help you get there (though not very elegantly, I admit):
I think the start and end anchors (^ and $) aren't helping. You may also want to filter the asterisks?:
irb(main):001:0> mds = "asfsfsdfsdfsf sfda **joe#example.com** sdfdsf"
=> "asfsfsdfsdfsf sfda **joe#example.com** sdfdsf"
irb(main):003:0> mds.match(/^([^#\s]+)#((?:[-a-z0-9]+\.)+[a-z]{2,})$/i)
=> nil
irb(main):004:0> mds.match(/([^#\s]+)#((?:[-a-z0-9]+\.)+[a-z]{2,})/i)
=> #<MatchData "**joe#example.com" 1:"**joe" 2:"example.com">
irb(main):005:0> mds.match(/([^#\s*]+)#((?:[-a-z0-9]+\.)+[a-z]{2,})/i)
=> #<MatchData "joe#example.com" 1:"joe" 2:"example.com">
Even better,
require 'yaml'
content = "asfsfsdfsdfsf sfda **joe#example.com.au** sdfdsf cool_me#example.com.fr"
r = Regexp.new(/\b([a-zA-Z0-9._%+-]+)#([a-zA-Z0-9.-]+?)(\.[a-zA-Z.]*)\b/)
emails = content.scan(r).uniq
puts YAML.dump(emails)
will give you
---
- - joe
- example
- .com.au
- - cool_me
- example
- .com.au

Creating a unique alphanumeric 10-character string

I'm looking to create a simple short-lived reservation system, and I'd like to generate confirmation numbers that are
unique
random-looking
alphanumeric
short-ish, at least much shorter than 32 character-long strings returned by sha1
I'm only looking to have ~500 reservations, so I don't imagine high likelyhood of collissions.
One idea I had is generate an sha1 hash based on a date-time stamp and username, then truncating it to its first 10 characters. Would something like that be reliably unique enough for the purposes of processing ~500 reservations?
There should be no difference in the randomness of any given bit of a SHA-1 hash, so that's possible. Another way would be to fold the hash into itself using XOR until you have 60 bits worth of data, then encode it using Base 64 to get a mostly alpha-numeric result.
This is only necessary if you want to be able to generate the same Id repeatedly for the same input data. Otherwise, if a random id that you generate once, and hold onto after that, use Anders' suggestion. If you get a conflict, just generate another one.
You can use whatever, even a plain random number generator; however, you should check that the reservation code isn't already present. If this is the case, add characters ('x') to the string (date+user) until you get a new random/sha1/etc.
I'm only looking to have ~500 reservations, so I don't imagine high likelyhood of collissions.
Another stupid idea: generate 1000 or 2000 unique random numbers with the desired properties, store them somewhere, and assign them to the users as they register :)
Here's one way to do it in Perl:
sub get_random_name()
{
my #chars=('a'..'z','A'..'Z');
my $random_string;
foreach (1..22)
{
# rand #chars will generate a random
# number between 0 and scalar #chars
$random_string .= $chars[rand #chars];
}
return $random_string . "-" . time();
}
I don't remember how long the time() part is, so you may have to adjust the numbers to fit your length. You can also remove that part if you don't need it.
If it's really just 500, then pre-generate 20,000 of them, into a table, then get the "next unused one" when you need it.
Some good tips on this question: How do I create a random alpha-numeric string in C++?
I'd avoid including characters like "1", "l", and "O", "0" and "5", "S", and "Z", "2" in your string, to make it easier for customers when they need to read your reservation code over the phone. The algorithm presented at that link should help you do this.
use a guid? 16 characters, though if you really don't care about collision, you could just choose the first n characters.
In C# you can use http://www.dotnetfunda.com/forums/thread1357-how-do-generate-unique-alpha-numeric-random-number-in-aspnet.aspx (the super easy way, they say)

Resources