Is there a way to break a string into a list using list comprehensions? - python-2.x

Say I have a string of words separated by spaces and commas like
phrase="Hey John, take that"
I would like to split the string into a list using list comprehensions to obtain
["Hey", "John", "take", "that"]
Any suggestion?

Related

Parse many numbers containing commas from string

I have a series of strings that all include 1 or many numbers (a number in this case would be 123,123,123) in the following format
"This is a number 123,124,123"
"These are some more numbers 123,345,123; 231,123,123; 124,152,123"
"This one is an odd situation 123,124,125; 123,123,123; more text"
What is the cleanest way to parse these numbers into either an array or a string that I can split that looks like this?
"123,124,123"
"123,345,123;231,123,123;124,152,123"
"123,124,125;123,123,123;"
Ultimately I want to be able to separate out the numbers like this.
"123,124,123"
"123,345,123" "231,123,123" "124,152,123"
"123,124,125" "123,123,123"
Currently attempting to use
"string".scan( /\d/ )
but obviously this is only giving me the numbers without the commas and also not separated properly.
Do it like this
string.scan(/[\d,]+/)
Another way would be to remove the unwanted characters.
arr = ["This is a number 123,124,123",
"These are some more numbers 123,345,123; 231,123,123; 124,152,123",
"This one is an odd situation 123,124,125; 123,123,123; more text"]
arr.map { |str| str.gsub(/[^\s\d,]+/,'').split }
#=> [["123,124,123"],
# ["123,345,123", "231,123,123", "124,152,123"],
# ["123,124,125", "123,123,123"]]
A regex that matches your numbers is \d{1,3}(,\d{3})*
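With String#scan, a pattern that contains a capture group returns the captured groups rather than the whole match, so this pattern is easier to use with scan when the group is made non-capturing. A minimal sketch, assuming the three sample strings from the question (the names lines, NUMBER and numbers are illustrative only):

```ruby
lines = [
  "This is a number 123,124,123",
  "These are some more numbers 123,345,123; 231,123,123; 124,152,123",
  "This one is an odd situation 123,124,125; 123,123,123; more text"
]

# (?:...) keeps the group from being captured, so scan returns whole matches.
NUMBER = /\d{1,3}(?:,\d{3})*/

numbers = lines.map { |line| line.scan(NUMBER) }
numbers.each { |found| p found }
```

Unlike /[\d,]+/, this pattern will not pick up a stray comma that is not part of a number.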

extracting strings out of one long string in Ruby

I have this really long string and I would like to extract specific strings out of it in a list form.
the string:
[#<User id: 1, login: "test", hash ... ]
I would like to extract everything that appears in between login: " and ", so in this case it would be the word test. This string can be indefinitely long but the pattern will be the same. How can I go about extracting the words out in a list form?
thanks!
string.scan(/login: "(.*?)",/)
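Because the pattern contains a capture group, scan returns one single-element array per match, and flatten turns that into a plain list. A small sketch using a made-up sample string, since the original one is elided (the second user and the hash values are assumptions for illustration):

```ruby
# Hypothetical input; the real string in the question is truncated.
string = '[#<User id: 1, login: "test", hash: "abc">, ' \
         '#<User id: 2, login: "demo", hash: "def">]'

# scan yields [["test"], ["demo"]]; flatten gives ["test", "demo"].
logins = string.scan(/login: "(.*?)",/).flatten
p logins
```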

Split a string with multiple delimiters in Ruby

Take for instance, I have a string like this:
options = "Cake or pie, ice cream, or pudding"
I want to be able to split the string on " or ", ", ", and ", or ".
I have been able to do it, but only by splitting on ", " and ", or " first, then splitting each array item at " or " and flattening the resulting array afterwards, like so:
options = options.split(/(?:\s?or\s)*([^,]+)(?:,\s*)*/).reject(&:empty?);
options.each_index {|index| options[index] = options[index].sub("?","").split(" or "); }
The resultant array is as such: ["Cake", "pie", "ice cream", "pudding"]
Is there a more efficient (or easier) way to split my string on those three delimiters?
What about the following:
options.gsub(/ or /i, ",").split(",").map(&:strip).reject(&:empty?)
replaces every delimiter other than , with ,
splits the result at ,
trims each element, since items like " ice cream" may be left with a leading space
removes all blank strings
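The four steps can be traced one at a time. This is only a sketch of the same chain broken apart; the intermediate variable names are made up for illustration:

```ruby
options = "Cake or pie, ice cream, or pudding"

replaced = options.gsub(/ or /i, ",")  # "Cake,pie, ice cream,,pudding"
pieces   = replaced.split(",")         # note the "" left by ", or"
trimmed  = pieces.map(&:strip)         # removes the space before "ice cream"
result   = trimmed.reject(&:empty?)    # drops the blank entry

p result
```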
First of all, your method could be simplified a bit with Array#flatten:
>> options.split(',').map{|x|x.split 'or'}.flatten.map(&:strip).reject(&:empty?)
=> ["Cake", "pie", "ice cream", "pudding"]
I would prefer using a single regex:
>> options.split /\s*, or\s+|\s*,\s*|\s+or\s+/
=> ["Cake", "pie", "ice cream", "pudding"]
You can use | in a regex to give alternatives, and putting , or first guarantees that it won’t produce an empty item. Consuming the surrounding whitespace in the regex is probably best for efficiency, since you don’t have to scan the array again to strip each item.
As Zabba points out, you may still want to reject empty items, prompting this solution:
>> options.split(/,|\sor\s/).map(&:strip).reject(&:empty?)
=> ["Cake", "pie", "ice cream", "pudding"]
As "or" and "," do the same thing, the best approach is to tell the regex that several delimiters in a row should be treated the same as a single one:
options = "Cake or pie, ice cream, or pudding"
regex = /(?:\s*(?:,|or)\s*)+/
options.split(regex)
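One caveat with this pattern is that a bare or also matches inside words such as "orange" or "port". A possible variant, offered as a sketch rather than as part of the original answer, adds word boundaries around or (the second input string is invented to show the difference):

```ruby
# \b keeps "or" from matching inside longer words.
regex = /(?:\s*(?:,|\bor\b)\s*)+/

menu   = "Cake or pie, ice cream, or pudding".split(regex)
drinks = "orange juice or port".split(regex)  # hypothetical input

p menu
p drinks
```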

Ruby: How can I process a CSV file with "bad commas"?

I need to process a CSV file from FedEx.com containing shipping history. Unfortunately FedEx doesn't seem to actually test its CSV files as it doesn't quote strings that have commas in them.
For instance, a company name might be "Dog Widgets, Inc." but the CSV doesn't quote that string, so any CSV parser thinks that comma before "Inc." is the start of a new field.
Is there any way I can reliably parse those rows using Ruby?
The only differentiating characteristic that I can find is that the commas that are part of a string have a space after them. Commas that separate fields have no spaces. No clue how that helps me parse this, but it is something I noticed.
You can use a negative lookahead:
>> "foo,bar,baz,pop, blah,foobar".split(/,(?![ \t])/)
=> ["foo", "bar", "baz", "pop, blah", "foobar"]
Well, here's an idea: You could replace each instance of comma-followed-by-a-space with a unique character, then parse the CSV as usual, then go through the resulting rows and reverse the replace.
Perhaps something along these lines, using gsub to change the ', ' to something else:
ruby-1.9.2-p0 > "foo,bar,baz,pop, blah,foobar".gsub(/,\ /,'| ').split(',')
[
[0] "foo",
[1] "bar",
[2] "baz",
[3] "pop| blah",
[4] "foobar"
]
and then replace the | with a comma again afterwards.
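The whole round trip could look roughly like the following. This is a sketch under assumptions: the placeholder character is chosen arbitrarily and is assumed never to occur in the real data, and parse_unquoted_row is a made-up helper name. CSV.parse_line comes from Ruby's csv library.

```ruby
require 'csv'

# Hypothetical marker; assumed not to appear anywhere in the file.
PLACEHOLDER = "\u0001"

def parse_unquoted_row(line)
  # Hide every comma-plus-space so the CSV parser does not see it.
  masked = line.gsub(', ', PLACEHOLDER)
  fields = CSV.parse_line(masked) || []
  # Put the original ", " back; empty fields come back as nil and stay nil.
  fields.map { |field| field && field.gsub(PLACEHOLDER, ', ') }
end

row = parse_unquoted_row("foo,bar,baz,pop, blah,foobar")
p row
```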
If you are lucky enough to have only one field like that, you can parse the leading fields off the start and the trailing fields off the end, and assume whatever is left is the offending field. In Python (no habla Ruby) this would look something like:
fields = line.split(',') # doesn't work if some fields are quoted
fields = fields[:5] + [','.join(fields[5:-3])] + fields[-3:]
Whatever you do, you should at a minimum be able to determine the number of offending commas, and that should give you something (a sanity check if nothing else).
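A rough Ruby equivalent of the same idea, which also uses the surplus comma count as the sanity check. The column count and the position of the free-text field are assumptions made for this example, not facts about the FedEx file, and repair_row is a hypothetical helper:

```ruby
# Assumed layout for illustration: 5 columns, free text at index 3.
EXPECTED_FIELDS = 5
TEXT_INDEX = 3

def repair_row(line)
  fields = line.split(',', -1)            # -1 keeps trailing empty fields
  extra = fields.size - EXPECTED_FIELDS   # number of offending commas
  raise ArgumentError, "too few fields in: #{line}" if extra < 0

  head = fields[0...TEXT_INDEX]
  text = fields[TEXT_INDEX..(TEXT_INDEX + extra)].join(',')
  tail = fields[(TEXT_INDEX + extra + 1)..-1]
  head + [text] + tail
end

repaired = repair_row("foo,bar,baz,pop, blah,foobar")
p repaired
```

This only works when a single column can contain stray commas, as the answer above notes.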

How do I match valid words with a ruby regular expression

Using a Ruby regular expression, how do I match all words in a comma-separated list, but only match if the entire word contains valid word characters (i.e. letter, number, or underscore)? For instance, given the string:
"see, jane, run, r#un, j#ne, r!n"
I would like to match the words
'see', 'jane' and 'run',
but not the words
'r#un', 'j#ne' or 'r!n'.
I do not want to match the comma, just the words themselves.
I have started the regex here: http://rubular.com/regexes/12126
s="see, jane, run, r#un, j#ne, r!n, fast"
s.scan(/(?:\A|,\s*)(\w+)(?=,|\Z)/).flatten
# => ["see", "jane", "run", "fast"]
Another way:
result = s.split(/[\s,]+/).select { |w| w =~ /\A\w+\z/ }
