How to split string across new lines and keep blank lines? - ruby

Given the Ruby code:
"aaaa\nbbbb\n\n".split(/\n/)
This outputs:
["aaaa", "bbbb"]
I would like the output to include the blank line indicated by \n\n -- I want the result to be:
["aaaa", "bbbb", ""]
What is the easiest/best way to get this exact result?

I'd recommend using lines instead of split for this task. lines retains the trailing line break on each line, so the final blank line survives as "\n" instead of being dropped. Use chomp to strip the line breaks:
"aaaa\nbbbb\n\n".lines.map(&:chomp)
[
[0] "aaaa",
[1] "bbbb",
[2] ""
]
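On Ruby 2.4 or later, lines also accepts a chomp: true keyword, which should give the same result in a single call. A minimal sketch (the variable names are only for illustration):

```ruby
text = "aaaa\nbbbb\n\n"

# lines keeps each trailing "\n", so the blank line survives as "\n"
raw_lines = text.lines

# Strip the line breaks afterwards...
chomped = raw_lines.map(&:chomp)

# ...or let lines strip them directly (Ruby 2.4+)
chomped_kw = text.lines(chomp: true)
```

Both chomped and chomped_kw hold the three strings "aaaa", "bbbb" and "".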
Other, more convoluted, ways of getting there are:
"aaaa\nbbbb\n\n".split(/(\n)/).each_slice(2).map{ |ary| ary.join.chomp }
[
[0] "aaaa",
[1] "bbbb",
[2] ""
]
This takes advantage of a capture group in the split pattern, which makes split return the delimiters along with the text between them. each_slice then groups the elements into two-element sub-arrays. map takes each two-element sub-array, joins it, and chomps the result.
Or:
"aaaa\nbbbb\n\n".split(/(\n)/).delete_if{ |e| e == "\n" }
[
[0] "aaaa",
[1] "bbbb",
[2] ""
]
Here's what split is returning:
"aaaa\nbbbb\n\n".split(/(\n)/)
[
[0] "aaaa",
[1] "\n",
[2] "bbbb",
[3] "\n",
[4] "",
[5] "\n"
]
We don't see that used very often, but it can be useful.

You can supply a negative argument for the second parameter of split to avoid stripping trailing empty strings:
"aaaa\nbbbb\n\n".split(/\n/, -1)
Note that this will give you one extra empty string compared to what you want: ["aaaa", "bbbb", "", ""].

You can use the numeric argument, but IMO it's a bit tricky since it's not quite consistent with what I'd expect, and AFAICT you'd want to trim the last empty field:
jruby-1.6.7 :020 > "aaaa\nbbbb\n\n".split(/\n/, -1)[0..-2]
=> ["aaaa", "bbbb", ""]
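One caveat with slicing off the last element unconditionally: if the input does not end with a newline, [0..-2] drops a real line. A small sketch of a guard, where split_keep_blank is a hypothetical helper name and not something from the answers above:

```ruby
# Hypothetical helper: split on "\n", keep blank lines, and drop only the
# single empty string that a trailing newline leaves at the end.
def split_keep_blank(text)
  parts = text.split(/\n/, -1)
  parts.pop if parts.last == ""
  parts
end

with_trailing    = split_keep_blank("aaaa\nbbbb\n\n")
without_trailing = split_keep_blank("aaaa\nbbbb")
```

Here with_trailing is ["aaaa", "bbbb", ""], and without_trailing keeps both of its lines.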

Related

Convert array string objects into hashes?

I have an array:
a = ["us.production => 1", "us.stats => 1", "us.stats.total_active => 1", "us.stats.inactive => 0"]
How can I modify it into a hash object? e.g.:
h = {"us.production" => 1, "us.stats" => 1, "us.stats.total_active" => 1, "us.stats.inactive" => 0}
Thank you,
If the pattern is consistent, you can try the following:
h = a.map { |x| x.split(' => ') }.to_h
# => {"us.production"=>"1", "us.stats"=>"1", "us.stats.total_active"=>"1", "us.stats.inactive"=>"0"}
It is better, though, to use split(/\s*=>\s*/) rather than split(' => '), since the regex tolerates missing or extra whitespace around =>.
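To illustrate why the regex delimiter is more forgiving, here is a quick comparison on an entry with no spaces around => (the sample string is made up for this example):

```ruby
entry = "us.stats=>1"

# The literal ' => ' delimiter never matches, so the string comes back whole
literal_split = entry.split(' => ')

# The regex allows any amount of whitespace (including none) around =>
regex_split = entry.split(/\s*=>\s*/)
```

literal_split has a single element, which to_h cannot turn into a key/value pair, while regex_split is ["us.stats", "1"].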
You can split every string with String#split and then convert an array of pairs into a hash with Array#to_h:
a = ["us.production => 1", "us.stats => 1", "us.stats.total_active => 1", "us.stats.inactive => 0"]
pairs = a.map{|s| s.split(/\s*=>\s*/)}
# => [["us.production", "1"], ["us.stats", "1"], ["us.stats.total_active", "1"], ["us.stats.inactive", "0"]]
pairs.to_h
# => {"us.production"=>"1", "us.stats"=>"1", "us.stats.total_active"=>"1", "us.stats.inactive"=>"0"}
/\s*=>\s*/ is a regular expression that first matches any amount of whitespace with \s*, then =>, then again any amount of whitespace. As it's the String#split delimiter, this part of the string won't be present in either element of the resulting pair.
The other answers to date are incorrect because they leave the values as strings, whereas the spec is that they be integers. That can be easily corrected. One way is to change s.split(/\s*=>\s*/) in @mrzasa's answer to k,v = s.split(/\s*=>\s*/); [k,v.to_i]. Another way is to tack .transform_values(&:to_i) onto the ends of the expressions given in those answers. I expect the authors of those answers either didn't notice that integers were required or intended to leave it as an exercise for the OP to do the (rather uninteresting) conversion.
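Spelled out, the two corrections could look like this. It is only a sketch; by_pair and by_transform are names chosen for the example, and transform_values needs Ruby 2.4 or later:

```ruby
a = ["us.production => 1", "us.stats => 1", "us.stats.total_active => 1", "us.stats.inactive => 0"]

# Option 1: convert each value while building the pairs
by_pair = a.map { |s| k, v = s.split(/\s*=>\s*/); [k, v.to_i] }.to_h

# Option 2: build the string-valued hash first, then convert every value
by_transform = a.map { |s| s.split(/\s*=>\s*/) }.to_h.transform_values(&:to_i)
```

Both produce a hash with integer values, for example "us.stats.inactive" maps to 0.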
To make a single pass through the array and avoid the creation of a temporary array and local variables (other than block variables), I suggest using Enumerable#each_with_object (rather than map and to_h), and use regular expressions to extract both keys and values (rather than using String#split):
a = ["us.production => 1", "us.stats=>1", "us.stats.total_active => 1"]
a.each_with_object({}) { |s,h| h[s[/.*[^ ](?= *=>)/]] = s[/\d+\z/].to_i }
#=> {"us.production"=>1, "us.stats"=>1, "us.stats.total_active"=>1}
The first regular expression reads, "match zero or more characters (.*) followed by a character that is not a space ([^ ]), provided that is followed by zero or more spaces ( *) followed by the string "=>"". (?= *=>) is a positive lookahead.
The second regular expression reads, "match one or more digits (\d+) at the end of the string (the anchor \z)". If that string could represent a negative integer, change that regex to /-?\d+\z/ (? makes the minus sign optional).
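As a quick check of the negative-integer variant, here is the same each_with_object approach with /-?\d+\z/. The input array is invented for the example:

```ruby
a = ["us.delta => -3", "us.stats=>1"]

h = a.each_with_object({}) do |s, acc|
  key = s[/.*[^ ](?= *=>)/]        # everything up to the last non-space before =>
  acc[key] = s[/-?\d+\z/].to_i     # optional minus sign, then digits at the end
end
```

With this input, h maps "us.delta" to -3 and "us.stats" to 1.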

Regex for a word that can't be preceded by another word but must be preceded by certain characters

I want to match occurrences of a word in my string. The word cannot be preceded by the word "per", but must be preceded by a word boundary or numbers. So for example if my word to match were "pie", this would match
"123pie"
"abc pie"
"d-pie"
but this would not
"per pie"
"mpie"
I was able to figure out how to write a regex that specifies my word cannot be preceded by the word "per" ...
regex = /(?<!per\s)#{Regexp.escape(word)}(\s|$)/i
I don't know how to incorporate the other conditions in there. How can I do that?
You've got part of it with the negative lookbehind, just add a positive lookbehind for a word break or digit and that should do it:
/(?<!per\s)(?<=\b|\d)#{Regexp.escape(word)}/i
Example:
> strs
[
[0] "123pie",
[1] "abc pie",
[2] "abc Pie",
[3] "d-pie",
[4] "per pie",
[5] "mpie",
[6] "perpie"
]
> word = "pie"
> regex = /(?<!per\s)(?<=\b|\d)#{Regexp.escape(word)}/i
> strs.select {|str| str =~ regex }
[
[0] "123pie",
[1] "abc pie",
[2] "abc Pie",
[3] "d-pie"
]
See: http://rubular.com/r/slD3Uz9iw9 for an interactive example.
And: https://ruby-doc.org/core-2.1.0/Regexp.html#class-Regexp-label-Anchors

Scanning through a hash and return a value if true

I want to match the keys of my hash wherever they appear in a string:
def conv
str = "I only have one, two or maybe sixty"
hash = {:one => 1, :two => 2, :six => 6, :sixty => 60 }
str.match( Regexp.union( hash.keys.to_s ) )
end
puts conv # => <blank>
The above does not work but this only matches "one":
str.match( Regexp.union( hash[0].to_s ) )
Edited:
Any idea how to match "one", "two" and "sixty" in the string exactly?
If my string has "sixt" it returns "6", and that should not happen, based on @Cary's answer.
You need to convert each element of hash.keys to a string, rather than converting the array hash.keys to a string, and you should use String#scan rather than String#match. You may also need to play around with the regex until it returns everything you want and nothing you don't want.
Let's first look at your example:
str = "I only have one, two or maybe sixty"
hash = {:one => 1, :two => 2, :six => 6, :sixty => 60}
We might consider constructing the regex with word breaks (\b) before and after each word we wish to match:
r0 = Regexp.union(hash.keys.map { |k| /\b#{k.to_s}\b/ })
#=> /(?-mix:\bone\b)|(?-mix:\btwo\b)|(?-mix:\bsix\b)|(?-mix:\bsixty\b)/
str.scan(r0)
#=> ["one", "two", "sixty"]
Without the word breaks, scan would return ["one", "two", "six"], as "sixty" in str would match "six". (Word breaks are zero-width. One before a string requires that the string be preceded by a non-word character or be at the beginning of the string. One after a string requires that the string be followed by a non-word character or be at the end of the string.)
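The difference can be checked side by side, and the matched words can then be mapped back to the hash values. This is a sketch only; loose, strict and values are names picked for the example:

```ruby
str = "I only have one, two or maybe sixty"
hash = { :one => 1, :two => 2, :six => 6, :sixty => 60 }

# Without word breaks, "six" matches inside "sixty"
loose = Regexp.union(hash.keys.map(&:to_s))
loose_matches = str.scan(loose)

# With word breaks, only whole words match
strict = Regexp.union(hash.keys.map { |k| /\b#{k}\b/ })
strict_matches = str.scan(strict)

# Look up the hash value for each matched word
values = strict_matches.map { |w| hash[w.to_sym] }

# "sixt" is not a whole-word match for any key
partial = "I have sixt".scan(strict)
```

Here values is [1, 2, 60] and partial is empty, so a stray "sixt" no longer yields 6.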
Depending on your requirements, word breaks may not be sufficient or suitable. Suppose, for example (with hash above):
str = "I only have one, two, twenty-one or maybe sixty"
and we do not wish to match "twenty-one". However,
str.scan(r0)
#=> ["one", "two", "one", "sixty"]
One option would be to use a regex that demands that matches be preceded by whitespace or be at the beginning of the string, and be followed by whitespace or be at the end of the string:
r1 = Regexp.union(hash.keys.map { |k| /(?<=^|\s)#{k.to_s}(?=\s|$)/ })
str.scan(r1)
#=> ["sixty"]
(?<=^|\s) is a positive lookbehind; (?=\s|$) is a positive lookahead.
Well, that avoided the match of "twenty-one" (good), but we no longer matched "one" or "two" (bad) because of the comma following each of those words in the string.
Perhaps the solution here is to first remove punctuation, which allows us to then apply either of the above regexes:
str.tr('.,?!:;-','')
#=> "I only have one two twentyone or maybe sixty"
str.tr('.,?!:;-','').scan(r0)
#=> ["one", "two", "sixty"]
str.tr('.,?!:;-','').scan(r1)
#=> ["one", "two", "sixty"]
You may also want to change / at the end of the regex to /i to make the match insensitive to case.[1]
[1] Historical note for readers who want to know why 'a' is called lower case and 'A' is called upper case.

Break up variable into array in Ruby

I'd like to take a variable that I have, and turn it into an array separated by the character of my choosing. In the example below, that separator is %
dump = "1%2%3%apple%car%yellow"
into
Array= [1,2,3,apple,car,yellow]
Use String#split:
"1%2%3%apple%car%yellow".split('%')
# => ["1", "2", "3", "apple", "car", "yellow"]
(Note that every element of the returned array is a string, even the ones containing digits.)
From the docs:
split(pattern=$;, [limit]) → anArray
Divides str into substrings based on a delimiter, returning an array of these substrings.
You can pass a string like above ('%'), or a regular expression.
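If the numeric pieces should end up as integers, as the desired output in the question suggests, one possible follow-up is to convert the all-digit elements after splitting. A sketch, assuming Ruby 2.4+ for match?; the second input string is invented to show a regex delimiter:

```ruby
dump = "1%2%3%apple%car%yellow"

# Split on the literal separator; every element is a string
parts = dump.split('%')

# Convert elements made only of digits, leave the rest untouched
mixed = parts.map { |s| s.match?(/\A\d+\z/) ? s.to_i : s }

# A regular expression works as the separator too
by_regex = "1%2;3".split(/[%;]/)
```

mixed is [1, 2, 3, "apple", "car", "yellow"], and by_regex is ["1", "2", "3"].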

putting enumeration with spaces in rails collection

irb(main):001:0> t = %w{this is a test}
=> ["this", "is", "a", "test"]
irb(main):002:0> t.size
=> 4
irb(main):003:0> t = %w{"this is" a test}
=> ["\"this", "is\"", "a", "test"]
irb(main):004:0> t.size
=> 4
In the end I expected t.size to be 3.
As suggested, each space has to be escaped ...which turns out to be a lot of work. What other options are there? I have a list of about 30 words that I need to put in a collection because I am showing them as checkboxes using simple_form
Why not just use a normal array so no one has to visually parse all the escaping to figure out what's going on? This is pretty clear:
t = [
'this is',
'a',
'test'
]
and the people maintaining your code won't hate you for using %w{} when it isn't appropriate or when they mess things up because they didn't see your escaped whitespace.
You need to escape the space with a '\', like t = %w{this\ is a test}, if you don't want that space to act as a separator.
Escape the space using \:
%w{this\ is a test}
You can escape the space %w{this\ is a test} to get ['this is', 'a', 'test'], but in general I wouldn't use %w unless the intention is to split on whitespace.
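For a list of around 30 entries, the options mentioned so far can be compared directly. The heredoc variant below is an additional suggestion rather than something from the answers above; it keeps one entry per line with no quoting or escaping (squiggly heredocs need Ruby 2.3+):

```ruby
# Escaped space inside %w
escaped = %w{this\ is a test}

# Plain array literal
plain = ['this is', 'a', 'test']

# One entry per line in a heredoc, split into an array
from_heredoc = <<~LIST.lines.map(&:strip).reject(&:empty?)
  this is
  a
  test
LIST
```

All three hold the same three elements, so escaped.size is 3.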
As others have pointed out use the %w{} construct when spaces are the separator for the words. If you have items that must be quoted and still want to use the construct you can do:
> %w{a test here}.unshift("This is")
=> ["This is", "a", "test", "here"]
require 'csv'
str = '"this is" a test'
# Pass col_sep as a keyword argument; Ruby 3.0+ rejects a positional options hash here
p CSV.parse_line(str, col_sep: ' ')
#=> ["this is", "a", "test"]
