How to split a string in Ruby [duplicate]

I have a string:
"1 chocolate bar at 25"
and I want to split this string into:
[1, "chocolate bar", 25]
I don't know how to write a regex for this split. And I wanted to know whether there are any other functions to accomplish it.

You could use scan with a regex:
"1 chocolate bar at 25".scan(/^(\d+) ([\w ]+) at (\d+)$/).first
The above method doesn't work if item_name has special characters.
If you want a more robust solution, you can use split:
number1, *words, at, number2 = "1 chocolate bar at 25".split
p [number1, words.join(' '), number2]
# ["1", "chocolate bar", "25"]
number1 is the first part, number2 is the last one, at the second to last, and *words is an array with everything in-between. number2 is guaranteed to be the last word.
This method has the advantage of working even if there are numbers in the middle, " at " somewhere in the string or if prices are given as floats.
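As a minimal sketch of the split approach, the destructuring can be wrapped in a method that also converts the numbers. The name parse_line is hypothetical, and the sketch assumes the quantity is always an integer and the price is anything Float accepts:

```ruby
# Hypothetical helper: the first word is the quantity, the last word is the
# price, the second-to-last word is the literal "at", and everything in
# between is the item name.
def parse_line(line)
  quantity, *words, _at, price = line.split
  [Integer(quantity), words.join(" "), Float(price)]
end

p parse_line("1 chocolate bar at 25")
# [1, "chocolate bar", 25.0]
p parse_line("2 bags of 12 sweets at the fair at 3.50")
# [2, "bags of 12 sweets at the fair", 3.5]
```

Integer and Float raise ArgumentError on malformed input, which may or may not be what you want.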

It is not necessary to use a regular expression.
str = "1 chocolate bar, 3 donuts and a 7up at 25"
i1 = str.index(' ')
#=> 1
i2 = str.rindex(' at ')
#=> 35
[str[0,i1].to_i, str[i1+1..i2-1], str[i2+3..-1].to_i]
#=> [1, "chocolate bar, 3 donuts and a 7up", 25]

I would do:
> s="1 chocolate bar at 25"
> s.scan(/[\d ]+|[[:alpha:] ]+/)
=> ["1 ", "chocolate bar at ", "25"]
Then to get the integers and the stripped string:
> s.scan(/[\d ]+|[[:alpha:] ]+/).map {|s| Integer(s) rescue s.strip}
=> [1, "chocolate bar at", 25]
And to remove the " at":
> s.scan(/[\d ]+|[[:alpha:] ]+/).map {|s| Integer(s) rescue s[/.*(?=\s+at\s*)/]}
=> [1, "chocolate bar", 25]

You may try returning the captures of the match method with the regex (\d+) +(\D+) +at +(\d+):
string.match(/(\d+) +(\D+) +at +(\d+)/).captures
Validating input string
If you have not already validated that the input string is in the desired format, it may be better to validate and capture the data in one step. This solution also accepts any type of character in the item_name field and decimal prices at the end:
string.match(/^(\d+) +(.*) +at +(\d+(?:\.\d+)?)$/).captures
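Note that match returns nil when the string does not fit the pattern, so calling captures directly would raise NoMethodError. A rough sketch of handling that case follows; the names LINE_FORMAT and parse_order are made up for illustration, and the anchors are \A and \z rather than ^ and $ so that a multi-line string cannot slip through:

```ruby
# Hypothetical pattern and helper: returns [quantity, item_name, price],
# or nil when the line is not in "<quantity> <name> at <price>" form.
LINE_FORMAT = /\A(\d+) +(.*) +at +(\d+(?:\.\d+)?)\z/

def parse_order(line)
  match = LINE_FORMAT.match(line)
  return nil unless match

  quantity, name, price = match.captures
  [quantity.to_i, name, price.to_f]
end

p parse_order("1 chocolate bar at 25")   # [1, "chocolate bar", 25.0]
p parse_order("3 fish & chips at 2.75")  # [3, "fish & chips", 2.75]
p parse_order("chocolate bar")           # nil
```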

You can also do something like this:
"1 chocolate bar at 25"
.split()
.reject {|string| string == "at" }
.map {|string| string.scan(/^\D+$/).empty? ? string.to_i : string }
Code Example: http://ideone.com/s8OvlC

I live in a country where prices might be floats, hence the more sophisticated matcher for the price.
"1 chocolate bar at 25".
match(/\A(\d+)\s+(.*?)\s+at\s+(\d[.\d]*)\z/).
captures
#⇒ ["1", "chocolate bar", "25"]

Related

How to extract values from string with its formatted mask in Ruby

We can do it in Ruby: "I have %{amount} %{food}" % {amount: 5, food: 'apples'} to get "I have 5 apples". Is there common way for the inverse transformation: using "I have 5 apples" and "I have %{amount} %{food}" to get {amount: 5, food: 'apples'}?
def doit(s1, s2)
a1 = s1.split
a2 = s2.split
a2.each_index.with_object({}) do |i,h|
word = a2[i][/(?<=%\{).+(?=\})/]
h[word.to_sym] = a1[i] unless word.nil?
end.transform_values { |s| s.match?(/\A\-?\d+\z/) ? s.to_i : s }
end
s1 = "I have 5 apples"
s2 = "I have %{amount} %{food}"
doit(s1, s2)
#=> {:amount=>5, :food=>"apples"}
s1 = "223 parcels were delivered last month"
s2 = "%{number} parcels were %{action} last %{period}"
doit(s1, s2)
#=> {:number=>223, :action=>"delivered", :period=>"month"}
The regular expression reads, "match one or more characters (.+), immediately preceded by "%{" ((?<=%\{) being a positive lookbehind) and immediately followed by "}" ((?=\}) being a positive lookahead)".
If the substrings are separated with spaces, you could find the corresponding regex with named captures:
text = "I have 5 apples"
# "I have %{amount} %{food}"
format = /\AI have (?<amount>\S+) (?<food>\S+)\z/
p text.match(format).named_captures
# {"amount"=>"5", "food"=>"apples"}
You didn't show any code, so I'll leave it as an exercise to transform the "I have %{amount} %{food}" string into the /\AI have (?<amount>\S+) (?<food>\S+)\z/ regex.
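One possible way to do that transformation, as a sketch only: escape the mask so its literal text is matched verbatim, then replace each escaped %{name} placeholder with a named group. The method name mask_to_regex is hypothetical, and the sketch assumes every value is a single run of non-whitespace characters:

```ruby
# Hypothetical helper: builds an anchored regex with one named capture
# per %{name} placeholder in the mask.
def mask_to_regex(mask)
  # Regexp.escape turns "%{amount}" into "%\{amount\}", so the pattern
  # below looks for the escaped form of each placeholder.
  pattern = Regexp.escape(mask).gsub(/%\\\{(\w+)\\\}/) { "(?<#{$1}>\\S+)" }
  Regexp.new("\\A#{pattern}\\z")
end

format = mask_to_regex("I have %{amount} %{food}")
p "I have 5 apples".match(format).named_captures
# {"amount"=>"5", "food"=>"apples"}
```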

Split a string by multiple delimiters

I want to split a string by whitespaces, commas, and dots. Given this input :
"hello this is a hello, allright this is a hello."
I want to output:
hello 3
a 2
is 2
this 2
allright 1
I tried:
puts "Enter string "
text=gets.chomp
frequencies=Hash.new(0)
delimiters = [',', ' ', "."]
words = text.split(Regexp.union(delimiters))
words.each { |word| frequencies[word] +=1}
frequencies=frequencies.sort_by {|a,b| b}
frequencies.reverse!
frequencies.each { |wor,freq| puts "#{wor} #{freq}"}
This outputs:
hello 3
a 2
is 2
this 2
allright 1
1
I do not want the last line of the output. It seems to count an empty word too. This may be because there were consecutive delimiters (a comma followed by a space).
Use a regex:
str = 'hello this is a hello, allright this is a hello.'
str.split(/[.,\s]+/)
# => ["hello", "this", "is", "a", "hello", "allright", "this", "is", "a", "hello"]
This allows you to split a string by any of the three delimiters you've requested.
The stop and comma are self-explanatory, and \s matches whitespace. The + matches one or more of these characters in a row, so consecutive delimiters are treated as a single separator and no empty strings are produced between words.
You might find the explanation provided by Regex101 to be handy, available here: https://regex101.com/r/r4M7KQ/3.
Edit: for bonus points, here's a nice way to get the word counts using each_with_object :)
str.split(/[.,\s]+/).each_with_object(Hash.new(0)) { |word, counter| counter[word] += 1 }
# => {"hello"=>3, "this"=>2, "is"=>2, "a"=>2, "allright"=>1}
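To get the printed output from the question, the counts still need sorting. A small sketch, with ties broken alphabetically (an assumption on my part, since the question does not say how ties should be ordered):

```ruby
text = "hello this is a hello, allright this is a hello."

counts = text.split(/[.,\s]+/).each_with_object(Hash.new(0)) do |word, counter|
  counter[word] += 1
end

# Highest count first; words with equal counts are ordered alphabetically.
sorted = counts.sort_by { |word, count| [-count, word] }
sorted.each { |word, count| puts "#{word} #{count}" }
# hello 3
# a 2
# is 2
# this 2
# allright 1
```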

Replace matched lines in a file but ignore commented-out lines using Ruby

How to replace a file in Ruby, but do not touch commented-out lines? To be more specific I want to change variable in configuration file. An example would be:
irb(main):014:0> string = "#replaceme\n\t\s\t\s# replaceme\nreplaceme\n"
=> "#replaceme\n\t \t # replaceme\nreplaceme\n"
irb(main):015:0> puts string.gsub(%r{replaceme}, 'replaced')
#replaced
# replaced
replaced
=> nil
irb(main):016:0>
Desired output:
#replaceme
# replaceme
replaced
I don't fully understand the question. To do a find and replace in each line, disregarding text following a pound sign, one could do the following.
def replace_em(str, source, replacement)
str.split(/(\#.*?$)/).
map { |s| s[0] == '#' ? s : s.gsub(source, replacement) }.
join
end
str = "It was known that # that dog has fleas, \nbut who'd know that that dog # wouldn't?"
replace_em(str, "that", "the")
#=> "It was known the # that dog has fleas, \nbut who'd know the the dog # wouldn't?"
str = "#replaceme\n\t\s\t\s# replaceme\nreplaceme\n"
replace_em(str, "replaceme", "replaced")
#=> "#replaceme\n\t \t # replaceme\nreplaced\n"
For the string
str = "It was known that # that dog has fleas, \nbut who'd know that that dog # wouldn't?"
source = "that"
replacement = "the"
the steps are as follows.
a = str.split(/(\#.*?$)/)
#=> ["It was known that ", "# that dog has fleas, ",
# "\nbut who'd know that that dog ", "# wouldn't?"]
Note that the body of the regular expression must be put in a capture group in order that the text used to split the string be included as elements in the resulting array. See String#split.
b = a.map { |s| s[0] == '#' ? s : s.gsub(source, replacement) }
#=> ["It was known the ", "# that dog has fleas, ",
# "\nbut who'd know the the dog ", "# wouldn't?"]
b.join
#=> "It was known the # that dog has fleas, \nbut who'd know the the dog # wouldn't?"
How about this?
puts string.gsub(%r{^replaceme}, 'replaced')
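The anchored pattern only helps when the text sits at the very start of a line. If whole-line comments are the only thing to skip, another option is to go line by line. This is just a sketch; replace_uncommented is a made-up name, and it does not handle a trailing comment on a line of code:

```ruby
# Hypothetical helper: leaves any line whose first non-whitespace character
# is "#" untouched and applies gsub to every other line.
def replace_uncommented(text, source, replacement)
  text.each_line.map do |line|
    line.lstrip.start_with?("#") ? line : line.gsub(source, replacement)
  end.join
end

string = "#replaceme\n\t \t # replaceme\nreplaceme\n"
puts replace_uncommented(string, "replaceme", "replaced")
# #replaceme
# 	 	 # replaceme
# replaced
```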

Converting string to proper title case

I have this exercise:
Write a Title class which is initialized with a string.
It has one method -- fix -- which should return a title-cased version of the string:
Title.new("a title of a book").fix =
A Title of a Book
You'll need to use conditional logic - if and else statements - to make this work.
Make sure you read the test specification carefully so you understand the conditional logic to be implemented.
Some methods you'll want to use:
String#downcase
String#capitalize
Array#include?
Also, here is the Rspec, I should have included that:
describe "Title" do
describe "fix" do
it "capitalizes the first letter of each word" do
expect( Title.new("the great gatsby").fix ).to eq("The Great Gatsby")
end
it "works for words with mixed cases" do
expect( Title.new("liTTle reD Riding hOOD").fix ).to eq("Little Red Riding Hood")
end
it "downcases articles" do
expect( Title.new("The lord of the rings").fix ).to eq("The Lord of the Rings")
expect( Title.new("The sword And The stone").fix ).to eq("The Sword and the Stone")
expect( Title.new("the portrait of a lady").fix ).to eq("The Portrait of a Lady")
end
it "works for strings with all uppercase characters" do
expect( Title.new("THE SWORD AND THE STONE").fix ).to eq("The Sword and the Stone")
end
end
end
Thank you @simone, I incorporated your suggestions:
class Title
attr_accessor :string
def initialize(string)
@string = string
end
IGNORE = %w(the of a and)
def fix
s = string.split(' ')
s.map do |word|
words = word.downcase
if IGNORE.include?(word)
words
else
words.capitalize
end
end
s.join(' ')
end
end
Although I'm still running into errors when running the code:
expected: "The Great Gatsby"
got: "the great gatsby"
(compared using ==)
exercise_spec.rb:6:in `block (3 levels) in <top (required)>'
From my beginner's perspective, I cannot see what I'm doing wrong?
Final edit: I just wanted to say thanks for all the effort every one put in in assisting me earlier. I'll show the final working code I was able to produce:
class Title
attr_accessor :string
def initialize(string)
@string = string
end
def fix
word_list = %w{a of and the}
a = string.downcase.split(' ')
b = []
a.each_with_index do |word, index|
if index == 0 || !word_list.include?(word)
b << word.capitalize
else
b << word
end
end
b.join(' ')
end
end
Here's a possible solution.
class Title
attr_accessor :string
IGNORES = %w( the of a and )
def initialize(string)
@string = string
end
def fix
tokens = string.split(' ')
tokens.map do |token|
token = token.downcase
if IGNORES.include?(token)
token
else
token.capitalize
end
end.join(" ")
end
end
Title.new("a title of a book").fix
Your starting point was good. Here are a few improvements:
The comparison is always lower-case. This will simplify the if-condition
The list of ignored items is in an array. This will simplify the if-condition because you don't need an if for each ignored string (there could be hundreds)
I use a map to replace the tokens. It's a common Ruby pattern to use blocks with enumerations to loop over items
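For what it's worth, the attempt in the question most likely returned the original string because map builds a new array and leaves its receiver alone, so the result has to be used or assigned. A tiny illustration (variable names are only for the example):

```ruby
words = %w[the great gatsby]

# The return value of map is discarded here, so words is unchanged.
words.map { |word| word.capitalize }
p words.join(" ")        # "the great gatsby"

# Keeping the array that map returns gives the expected result.
capitalized = words.map { |word| word.capitalize }
p capitalized.join(" ")  # "The Great Gatsby"
```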
There are two ways you can approach this problem:
break the string into words, possibly modify each word and join the words back together; or
use a regular expression.
I will say something about the latter, but I believe your exercise concerns the former--which is the approach you've taken--so I will concentrate on that.
Split string into words
You use String#split(' ') to split the string into words:
str = "a title of a\t book"
a = str.split(' ')
#=> ["a", "title", "of", "a", "book"]
That's fine, even when there's extra whitespace, but one normally writes that:
str.split
#=> ["a", "title", "of", "a", "book"]
For this string, both ways give the same result as
str.split(/\s+/)
#=> ["a", "title", "of", "a", "book"]
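One caveat, shown here as a quick sketch: the forms differ when the string starts with whitespace. split and split(' ') ignore leading whitespace, while the regex form returns a leading empty string:

```ruby
padded = "  a title of a\t book"

p padded.split          # ["a", "title", "of", "a", "book"]
p padded.split(" ")     # ["a", "title", "of", "a", "book"]
p padded.split(/\s+/)   # ["", "a", "title", "of", "a", "book"]
```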
Notice that I've used the variable a to signify that an array is returned. Some may feel that is not sufficiently descriptive, but I believe it's better than s, which is a little confusing. :-)
Create enumerators
Next you send the method Enumerable#each_with_index to create an enumerator:
enum0 = a.each_with_index
# => #<Enumerator: ["a", "title", "of", "a", "book"]:each_with_index>
To see the contents of the enumerator, convert enum0 to an array:
enum0.to_a
#=> [["a", 0], ["title", 1], ["of", 2], ["a", 3], ["book", 4]]
You've used each_with_index because the first word--the one with index 0-- is to be treated differently than the others. That's fine.
So far, so good, but at this point you need to use Enumerable#map to convert each element of enum0 to an appropriate value. For example, the first value, ["a", 0] is to be converted to "A", the next is to be converted to "Title" and the third to "of".
Therefore, you need to send the method Enumerable#map to enum0:
enum1 = enum0.map
#=> #<Enumerator: #<Enumerator: ["a", "title", "of", "a",
#     "book"]:each_with_index>:map>
enum1.to_a
#=> [["a", 0], ["title", 1], ["of", 2], ["a", 3], ["book", 4]]
As you see, this creates a new enumerator, which you could think of as a "compound" enumerator.
The elements of enum1 will be passed into the block by Array#each.
Invoke the enumerator and join
You want to capitalize the first word and all other words except articles. We therefore must define some articles:
articles = %w{a of it} # and more
#=> ["a", "of", "it"]
b = enum1.each do |w,i|
case i
when 0 then w.capitalize
else articles.include?(w) ? w.downcase : w.capitalize
end
end
#=> ["A", "Title", "of", "a", "Book"]
and lastly we join the array with one space between each word:
b.join(' ')
#=> "A Title of a Book"
Review details of calculation
Let's go back to the calculation of b. The first element of enum1 is passed into the block and assigned to the block variables:
w, i = ["a", 0] #=> ["a", 0]
w #=> "a"
i #=> 0
so we execute:
case 0
when 0 then "a".capitalize
else articles.include?("a") ? "a".downcase : "a".capitalize
end
which returns "a".capitalize => "A". Similarly, when the next element of enum1 is passed to the block:
w, i = ["title", 1] #=> ["title", 1]
w #=> "title"
i #=> 1
case 1
when 0 then "title".capitalize
else articles.include?("title") ? "title".downcase : "title".capitalize
end
which returns "Title" since articles.include?("title") => false. Next:
w, i = ["of", 2] #=> ["of", 2]
w #=> "of"
i #=> 2
case 2
when 0 then "of".capitalize
else articles.include?("of") ? "of".downcase : "of".capitalize
end
which returns "of" since articles.include?("of") => true.
Chaining operations
Putting this together, we have:
str.split.each_with_index.map do |w,i|
case i
when 0 then w.capitalize
else articles.include?(w) ? w.downcase : w.capitalize
end
end
#=> ["A", "Title", "of", "a", "Book"]
Alternative calculation
Another way to do this, without using each_with_index, is like this:
first_word, *remaining_words = str.split
first_word
#=> "a"
remaining_words
#=> ["title", "of", "a", "book"]
"#{first_word.capitalize} #{ remaining_words.map { |w|
articles.include?(w) ? w.downcase : w.capitalize }.join(' ') }"
#=> "A Title of a Book"
Using a regular expression
str = "a title of a book"
str.gsub(/(^\w+)|(\w+)/) do
$1 ? $1.capitalize :
articles.include?($2) ? $2 : $2.capitalize
end
#=> "A Title of a Book"
The regular expression "captures" [(...)] a word at the beginning of the string [(^\w+)] or [|] a word that is not necessarily at the beginning of string [(\w+)]. The contents of the two capture groups are assigned to the global variables $1 and $2, respectively.
Therefore, stepping through the words of the string, the first word, "a", is captured by capture group #1, so (\w+) is not evaluated. Each subsequent word is not captured by capture group #1 (so $1 => nil), but is captured by capture group #2. Hence, if $1 is not nil, we capitalize the (first) word (of the sentence); else we capitalize $2 if the word is not an article and leave it unchanged if it is an article.
def fix
string.downcase.split(/(\s)/).map.with_index{ |x,i|
( i==0 || x.match(/^(?:a|is|of|the|and)$/).nil? ) ? x.capitalize : x
}.join
end
Meets all conditions:
a, is, of, the, and all lowercase
capitalizes all other words
all first words are capitalized
Explanation
string.downcase calls one operation to make the string you're working with all lower case
.split(/(\s)/) takes the lower case string and splits it on white-space (space, tab, newline, etc) into an array, making each word an element of the array; surrounding the \s (the delimiter) in the parentheses also retains it in the array that's returned, so we don't lose that white-space character when rejoining
.map.with_index{ |x,i| iterates over that returned array, where x is the value and i is the index number; each iteration returns an element of a new array; when the loop is complete you will have a new array
( i==0 || x.match(/^(?:a|is|of|the|and)$/).nil? ) if it's the first element in the array (index of 0), or the word does not match a, is, of, the, or and -- that is, the match is nil -- then x.capitalize (capitalize the word), otherwise (it did match one of the ignore words) just return the word/value, x
.join take our new array and combine all the words into one string again
Additional
Ordinarily, what is inside parentheses in a regex is considered a capture group, meaning that if the pattern inside is matched, a special variable will retain the value after the regex operations have finished. In some cases, such as the \s, we wanted to capture that value because we reuse it; in other cases, like our ignore words, we need to match but do not need to capture them. To avoid capturing a match you can place ?: at the beginning of the group to tell the regex engine not to retain the value. There are many benefits of this that fall outside the scope of this answer.
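A short sketch of the difference between the two kinds of group, using throwaway strings rather than the Title class:

```ruby
# With a capturing group, split keeps the delimiter in the result.
p "a title".split(/(\s)/)     # ["a", " ", "title"]

# With a non-capturing group, the delimiter is dropped as usual.
p "a title".split(/(?:\s)/)   # ["a", "title"]

# The same distinction shows up in the captures of a match.
p "the".match(/^(a|the)$/).captures     # ["the"]
p "the".match(/^(?:a|the)$/).captures   # []
```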
Here is another possible solution to the problem
class Title
attr_accessor :str
def initialize(str)
@str = str
end
def fix
s = str.downcase.split(" ") #convert all the strings to downcase and it will be stored in an array
words_cap = []
ignore = %w( of a and the ) # List of words to be ignored
s.each do |item|
if ignore.include?(item) # check whether word in an array is one of the words in ignore list.If it is yes, don't capitalize.
words_cap << item
else
words_cap << item.capitalize
end
end
sentence = words_cap.join(" ") # convert an array of strings to sentence
new_sentence = sentence.slice(0,1).capitalize + sentence.slice(1..-1) # Capitalize the first word of the sentence, in case it was left lowercase by the ignore list check.
end
end

ruby 1.9 how to convert array to string without brackets

My question is about how to convert array elements to a string in ruby 1.9 without getting the brackets and quotation marks. I've got an array (DB extract), which I want to use to create a periodic report.
myArray = ["Apple", "Pear", "Banana", "2", "15", "12"]
In ruby 1.8 I had the following line
reportStr = "In the first quarter we sold " + myArray[3].to_s + " " + myArray[0].to_s + "(s)."
puts reportStr
Which produced the (wanted) output
In the first quarter we sold 2 Apple(s).
The same two lines in ruby 1.9 produce (not wanted)
In the first quarter we sold ["2"] ["Apple"] (s).
After reading in the documentation
Ruby 1.9.3 doc#Array#slice
I thought I could produce code like
reportStr = "In the first quarter we sold " + myArray[3] + " " + myArray[0] + "(s)."
puts reportStr
which returns a runtime error
/home/test/example.rb:450:in `+': can't convert Array into String (TypeError)
My current solution is to remove brackets and quotation marks with a temporary string, like
tempStr0 = myArray[0].to_s
myLength = tempStr0.length
tempStr0 = tempStr0[2..myLength-3]
tempStr3 = myArray[3].to_s
myLength = tempStr3.length
tempStr3 = tempStr3[2..myLength-3]
reportStr = "In the first quarter we sold " + tempStr3 + " " + tempStr0 + "(s)."
puts reportStr
which in general works.
However, what would be a more elegant "ruby" way how to do that?
You can use the .join method.
For example:
my_array = ["Apple", "Pear", "Banana"]
my_array.join(', ') # returns string separating array elements with arg to `join`
# => "Apple, Pear, Banana"
Use interpolation instead of concatenation:
reportStr = "In the first quarter we sold #{myArray[3]} #{myArray[0]}(s)."
It's more idiomatic, more efficient, requires less typing and automatically calls to_s for you.
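The TypeError in the question suggests that each element is itself an array, which would also explain the brackets, since Array#to_s in Ruby 1.9 and later returns the same text as inspect. Assuming that nested shape (it is a guess based on the error message, and the variable names are illustrative), flattening first gives plain strings to interpolate:

```ruby
# Assumed shape: every element is a one-element array, e.g. a database row.
rows = [["Apple"], ["Pear"], ["Banana"], ["2"], ["15"], ["12"]]

puts "In the first quarter we sold #{rows[3]} #{rows[0]}(s)."
# In the first quarter we sold ["2"] ["Apple"](s).

# Flatten the nested arrays so each element is a plain string.
flat = rows.flatten
puts "In the first quarter we sold #{flat[3]} #{flat[0]}(s)."
# In the first quarter we sold 2 Apple(s).
```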
And if you need to do this for more than one fruit, the best way is to transform the array and then use each.
myArray = ["Apple", "Pear", "Banana", "2", "1", "12"]
num_of_products = 3
transformed = myArray.each_slice(num_of_products).to_a.transpose
p transformed #=> [["Apple", "2"], ["Pear", "1"], ["Banana", "12"]]
transformed.each do |fruit, amount|
puts "In the first quarter we sold #{amount} #{fruit}#{amount=='1' ? '':'s'}."
end
#=>
#In the first quarter we sold 2 Apples.
#In the first quarter we sold 1 Pear.
#In the first quarter we sold 12 Bananas.
You can think of this as arrayToString()
array = array * " "
E.g.,
myArray = ["One.","_1_?! Really?!","Yes!"]
=> ["One.", "_1_?! Really?!", "Yes!"]
myArray = myArray * " "
=> "One. _1_?! Really?! Yes!"
