Ruby regex split: any issues with the code?
I am new to regular expressions in Ruby. I read some tutorials and put together a piece of code.
Please let me know if it can be done in a better way.
Here is my text, which needs to be split at {{iwsection(*)}} and {{usersection}}:
t='{{iwsection(1)}}
This has some sample text 1 - line 1
This has some sample text 1 - line 2
{{iwsection(2)}}
This has some sample text 2
{{iwsection(3)}}
This has some sample text 3
{{usersection}}
This is a user section.
This has some sample text
This has some sample text'
Here is the Ruby regex code I was able to come up with:
t.split(/^({{[i|u][wsection]\w*...}})/)
Thank You.
The desired output:
An array such as:
[ '{{iwsection(1)}}', 'This has some sample text 1\nThis has some sample text 1 - line 2',
'{{iwsection(2)}}', 'This has some sample text 2',
'{{iwsection(3)}}', 'This has some sample text 3',
'{{usersection}}', 'This is a user section\nThis has some sample text\nThis has some sample text.']
With this I will build a Hash:
{
'{{iwsection(1)}}' => 'This has some sample text 1\nThis has some sample text 1 - line 2',
'{{iwsection(2)}}' => 'This has some sample text 2',
'{{iwsection(3)}}' => 'This has some sample text 3',
'{{usersection}}' => 'This is a user section\nThis has some sample text\nThis has some sample text.'
}
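One possible way to get from the text to that Hash in a single pass, offered as a sketch rather than a definitive answer: when the pattern passed to String#split contains a capture group, the captured delimiters are kept in the result, so markers and bodies alternate. The names text, marker, parts and sections, and the exact pattern, are assumptions made for illustration.

```ruby
# Sample input modelled on the question (shortened).
text = [
  "{{iwsection(1)}}",
  "This has some sample text 1 - line 1",
  "This has some sample text 1 - line 2",
  "{{iwsection(2)}}",
  "This has some sample text 2",
  "{{usersection}}",
  "This is a user section.",
  "This has some sample text"
].join("\n")

# The capture group makes split keep each marker in the result array.
marker = /^(\{\{(?:iwsection\(\d+\)|usersection)\}\})\n/
parts = text.split(marker)

# parts[0] is whatever precedes the first marker (an empty string here).
# The remaining elements alternate marker, body, marker, body, ...
sections = parts.drop(1)
                .each_slice(2)
                .map { |name, body| [name, body.to_s.chomp] }
                .to_h

puts sections.inspect
```

If text can appear before the first marker and must be kept, parts[0] would need to be handled separately instead of dropped.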
Edit: .....
The code.
section_array = text.chomp.split(/\r\n|\n/).inject([]) do |a, v|
  if v =~ /{{.*}}/
    a << [v.gsub(/^{{|}}$/, ""), []]
  else
    a.last[1] << v
  end
  a
end.select { |k, v| k.start_with?("iwsection") || k.start_with?("usersection") }.map { |k, v| ["{{#{k}}}", v.join("\n")] }
Using String#scan:
> t.scan(/{{([^}]*)}}\r?\n(.*?)\r?(?=\n{{|\n?$)/)
=> [["iwsection(1)", "This has some sample text 1"], ["iwsection(2)", "This has some sample text 2"], ["iwsection(3)", "This has some sample text 3"], ["usersection", "This is a user section."]]
> h = t.scan(/{{([^}]*)}}\r?\n(.*?)\r?(?=\n{{|\n?$)/).to_h
=> {"iwsection(1)"=>"This has some sample text 1", "iwsection(2)"=>"This has some sample text 2", "iwsection(3)"=>"This has some sample text 3", "usersection"=>"This is a user section."}
> h.values
=> ["This has some sample text 1", "This has some sample text 2", "This has some sample text 3", "This is a user section."]
> h.keys
=> ["iwsection(1)", "iwsection(2)", "iwsection(3)", "usersection"]
> h["usersection"]
=> "This is a user section."
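Note that the results above contain only the first line of each section: without the m flag, . does not match a newline, so the lazy group stops at the end of the first line. A sketch of a multi-line variant follows, assuming a section runs until the next line that starts with {{ or until the end of the string. The pattern and the sample text are illustrations, not the answer's own code.

```ruby
text = "{{iwsection(1)}}\nline 1\nline 2\n{{usersection}}\nuser line 1\nuser line 2"

# /m lets . match newlines; the lookahead stops each body just before the
# next marker line or at the very end of the string (\z).
pattern = /\{\{([^}]*)\}\}\r?\n(.*?)(?=\r?\n\{\{|\z)/m

sections = text.scan(pattern).to_h
puts sections.inspect
```

If the input ends with a trailing newline, the last body keeps it, so a chomp on the input or on each value may be wanted.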
Update:
#!/usr/bin/env ruby
t = "{{iwsection(1)}}\nThis has some sample text 1 - line 1\nThis has some sample text 1 - line 2\n{{iwsection(2)}}\nThis has some sample text 2\n{{iwsection(3)}}\nThis has some sample text 3\nThis has some sample text\nThis has some sample text\n{{usersection}}\nThis is a user section.\nThis has some sample text\nThis has some sample text"
h = t.chomp.split(/\n/).inject([]) do |a, v|
  if v =~ /{{.*}}/
    a << [v.gsub(/^{{|}}$/, ""), []]
  else
    a.last[1] << v
  end
  a
end.select { |k, v| k.start_with?("iwsection") || k == "usersection" }.map { |k, v| [k, v.join("\n")] }.to_h
puts h.inspect
Output:
{"iwsection(1)"=>"This has some sample text 1 - line 1\nThis has some sample text 1 - line 2", "iwsection(2)"=>"This has some sample text 2", "iwsection(3)"=>"This has some sample text 3\nThis has some sample text\nThis has some sample text", "usersection"=>"This is a user section.\nThis has some sample text\nThis has some sample text"}
You can do that like this:
t.split(/{{iwsection\(\d+\)}}|{{usersection}}/)
#=> ["", "\n This has some sample text 1\n ",
# "\n This has some sample text 2\n ",
# "\n This has some sample text 3\n ",
# "\n This is a user section."]
That's what you asked for, but if you want to clean that up, add .map(&:strip):
t.split(/{{iwsection\(\d+\)}}|{{usersection}}/).map(&:strip)
#=> ["", "This has some sample text 1", "This has some sample text 2",
# "This has some sample text 3", "This is a user section."]
You may not want the empty string at offset zero, but that's how String#split works when you are splitting on a substring that is at the beginning of the string. Suppose the string were instead:
t =
'Some text here{{iwsection(1)}}
This has some sample text 1
{{iwsection(2)}}
This has some sample text 2'
t.split(/{{iwsection\(\d+\)}}|{{usersection}}/).map(&:strip)
#=> ["Some text here", "This has some sample text 1",
# "This has some sample text 2"]
Here you want "Some text here", so you can't just delete the first element of the array.
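The behaviour described above can be seen on a tiny input. This is only an illustrative sketch with made-up strings: split keeps a leading empty string when the input starts with the separator, drops trailing empty strings, and reject(&:empty?) removes empty pieces without touching real leading text.

```ruby
leading  = ",a,b".split(",")   # separator at the start: leading "" is kept
trailing = "a,b,".split(",")   # separator at the end: trailing "" is dropped

# Remove only the empty pieces, wherever they are.
cleaned = leading.reject(&:empty?)

# Real text before the first separator survives this clean-up.
kept = "Some text here{{iwsection(1)}}\nText 1"
         .split(/\{\{iwsection\(\d+\)\}\}/)
         .map(&:strip)
         .reject(&:empty?)

puts leading.inspect, trailing.inspect, cleaned.inspect, kept.inspect
```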
Additional requirements
To satisfy your added requirement, you could do this:
t='{{iwsection(1)}}
Text 1 - line 1
Text 1 - line 2
{{iwsection(2)}}
Text 2
{{iwsection(3)}}
Text 3
{{usersection}}
User section.
Text
Text'
h = t.scan(/(?:{{iwsection\(\d+\)}}|{{usersection}})/)
.zip(t.split(/{{iwsection\(\d+\)}}|{{usersection}}/)[1..-1])
.map { |s1,s2| [s1, s2.strip
.lines
.map(&:strip)
.join("\n")] }
.to_h
#=> {"{{iwsection(1)}}"=>"Text 1 - line 1\nText 1 - line 2",
# "{{iwsection(2)}}"=>"Text 2",
# "{{iwsection(3)}}"=>"Text 3",
# "{{usersection}}"=>"User section.\nText\nText"}
Note that this formatting (a method chain continued on lines that begin with a dot) may not be understood when pasted into IRB or Pry, but it works fine when run as a script from the command line.
Explanation
a = t.scan(/(?:{{iwsection\(\d+\)}}|{{usersection}})/)
#=> ["{{iwsection(1)}}", "{{iwsection(2)}}", "{{iwsection(3)}}", "{{usersection}}"]
b = t.split(/{{iwsection\(\d+\)}}|{{usersection}}/)
#=> ["", "\n Text 1 - line 1\n Text 1 - line 2\n ",
# "\n Text 2\n ", "\n Text 3\n ",
# "\n User section.\n Text\n Text"]
c = b[1..-1]
#=> ["\n Text 1 - line 1\n Text 1 - line 2\n ",
# "\n Text 2\n ", "\n Text 3\n ",
# "\n User section.\n Text\n Text"]
h = a.zip(c)
#=> [["{{iwsection(1)}}", "\n Text 1 - line 1\n Text 1 - line 2\n "],
# ["{{iwsection(2)}}", "\n Text 2\n "],
# ["{{iwsection(3)}}", "\n Text 3\n "],
# ["{{usersection}}", "\n User section.\n Text\n Text"]]
d = h.map { |s1,s2| [s1, s2.strip
.lines
.map(&:strip)
.join("\n")] }
#=> [["{{iwsection(1)}}", "Text 1 - line 1\nText 1 - line 2"],
# ["{{iwsection(2)}}", "Text 2"], ["{{iwsection(3)}}", "Text 3"],
# ["{{usersection}}", "User section.\nText\nText"]]
d.to_h
#=> {"{{iwsection(1)}}"=>"Text 1 - line 1\nText 1 - line 2",
# "{{iwsection(2)}}"=>"Text 2",
# "{{iwsection(3)}}"=>"Text 3",
# "{{usersection}}"=>"User section.\nText\nText"}
Related
Sort an array that are merged together
I have two strings that I turn into arrays, merge, sort and then convert back into a string. In my test, however, the string in response.body is sorted differently. My method takes the two strings, removes the headers from both, merges the arrays and sorts the result, but I am getting different results. How can I get the ordering of the response.body string shown below?
string1 = "Category Name,Code,Enabled?,Category Hidden?\nPRESENT AVAIALBLE,PRESENT AVAILABLE,No,No,\nBUG AVAILABLE,BUG,No,No,\nBUG,BUG,No,No,\nPRESENT,PRESENT,No,No\n"
string2 = "Category Name,Code,Enabled?,Category Hidden?\nBUG,BUG,No,No,\nBUG AVAILABLE,BUG,No,No,\nEXAMPLE 1,EXAMPLE 1,Yes,No,\nEXAMPLE 2,EXAMPLE 2,Yes,No,\nPRESENT AVAIALBLE,PRESENT AVAILABLE,No,No,\nPRESENT,PRESENT,No,No\n"
How would I get that array to be sorted like the response.body string before inserting the header "Category Name,Code,Enabled?,Category Hidden?"
response.body string:
"Category Name,Code,Enabled?,Category Hidden?
BUG,BUG,No,No,
BUG AVAILABLE,BUG,No,No,
EXAMPLE 1,EXAMPLE 1,No,No,
EXAMPLE 2,EXAMPLE 2,Yes,No,
PRESENT,PRESENT,No,No"
PRESENT AVAIALBLE,PRESENT AVAILABLE,No,No"
My output from the method:
"Category Name,Code,Enabled?,Category Hidden?
BUG AVAILABLE,BUG,No,No,
BUG,BUG,No,No,
EXAMPLE 1,EXAMPLE 1,No,No,
EXAMPLE 2,EXAMPLE 2,Yes,No,
PRESENT AVAIALBLE,PRESENT AVAILABLE,No,No,
PRESENT,PRESENT,No,No"
The method I wrote:
def merge(string1, string2)
  string1 = string1.split("\n") # Split into array.
  headers = string1.first # Get headers.
  string1.shift # Remove headers.
  string2 = string2.split("\n")[1..-1] # Remove headers.
  final = (string1 + string2).sort.unshift(headers).join("\n") + "\n" # Create merged sorted string.
end
Desired result:
"Category Name,Code,Enabled?,Category Hidden?
BUG,BUG,No,No,
BUG AVAILABLE,BUG,No,No,
EXAMPLE 1,EXAMPLE 1,No,No,
EXAMPLE 2,EXAMPLE 2,Yes,No,
PRESENT,PRESENT,No,No"
PRESENT AVAIALBLE,PRESENT AVAILABLE,No,No"
Here are three ways to do that. I assume you are given two strings:
str1 = "Category Name,Code,Enabled?,Category Hidden?\nBUG,BUG,No,No\nEXAMPLE 1,EXAMPLE 1,No,No\nPRESENT,PRESENT,No,No"
str2 = "Category Name,Code,Enabled?,Category Hidden?\nBUG AVAILABLE,BUG,No,No\nEXAMPLE 2,EXAMPLE 2,Yes,No\nPRESENT AVAILABLE,PRESENT AVAILABLE,No,No"
Then
header, *body1 = str1.split("\n")
#=> ["Category Name,Code,Enabled?,Category Hidden?",
# "BUG,BUG,No,No",
# "EXAMPLE 1,EXAMPLE 1,No,No",
# "PRESENT,PRESENT,No,No"]
so
header
#=> "Category Name,Code,Enabled?,Category Hidden?"
body1
#=> ["BUG,BUG,No,No",
# "EXAMPLE 1,EXAMPLE 1,No,No",
# "PRESENT,PRESENT,No,No"]
and
_, *body2 = str2.split("\n")
#=> ["Category Name,Code,Enabled?,Category Hidden?",
# "BUG AVAILABLE,BUG,No,No",
# "EXAMPLE 2,EXAMPLE 2,Yes,No",
# "PRESENT AVAILABLE,PRESENT AVAILABLE,No,No"]
so
_
#=> "Category Name,Code,Enabled?,Category Hidden?"
body2
#=> ["BUG AVAILABLE,BUG,No,No",
# "EXAMPLE 2,EXAMPLE 2,Yes,No",
# "PRESENT AVAILABLE,PRESENT AVAILABLE,No,No"]
We may then compute the desired string.
str = [header].concat(body1.zip(body2).flatten).join("\n")
#=> "Category Name,Code,Enabled?,Category Hidden?\nBUG,BUG,No,No\nBUG AVAILABLE,BUG,No,No\nEXAMPLE 1,EXAMPLE 1,No,No\nEXAMPLE 2,EXAMPLE 2,Yes,No\nPRESENT,PRESENT,No,No\nPRESENT AVAILABLE,PRESENT AVAILABLE,No,No"
which when displayed appears as follows.
puts str
Category Name,Code,Enabled?,Category Hidden?
BUG,BUG,No,No
BUG AVAILABLE,BUG,No,No
EXAMPLE 1,EXAMPLE 1,No,No
EXAMPLE 2,EXAMPLE 2,Yes,No
PRESENT,PRESENT,No,No
PRESENT AVAILABLE,PRESENT AVAILABLE,No,No
See Array#concat, Array#zip, Array#flatten and Array#join. The variable _ in _, *body2 = str2.split("\n") is so named to tell the reader that it is not used in subsequent calculations. Sometimes one might write _header, *body2 = str2.split("\n") to convey the same message.
Here is a second way of doing that, by treating the strings as comma-delimited CSV strings.
require 'csv'
arr1 = CSV.parse(str1)
#=> [["Category Name", "Code", "Enabled?", "Category Hidden?"],
# ["BUG", "BUG", "No", "No"],
# ["EXAMPLE 1", "EXAMPLE 1", "No", "No"],
# ["PRESENT", "PRESENT", "No", "No"]]
arr2 = CSV.parse(str2)
#=> [["Category Name", "Code", "Enabled?", "Category Hidden?"],
# ["BUG AVAILABLE", "BUG", "No", "No"],
# ["EXAMPLE 2", "EXAMPLE 2", "Yes", "No"],
# ["PRESENT AVAILABLE", "PRESENT AVAILABLE", "No", "No"]]
Then
str = CSV.generate do |csv|
  csv << arr1.shift
  arr2.shift
  until arr2.empty? do
    csv << arr1.shift
    csv << arr2.shift
  end
end
#=> "Category Name,Code,Enabled?,Category Hidden?\nBUG,BUG,No,No\nBUG AVAILABLE,BUG,No,No\nEXAMPLE 1,EXAMPLE 1,No,No\nEXAMPLE 2,EXAMPLE 2,Yes,No\nPRESENT,PRESENT,No,No\nPRESENT AVAILABLE,PRESENT AVAILABLE,No,No\n"
puts str
Category Name,Code,Enabled?,Category Hidden?
BUG,BUG,No,No
BUG AVAILABLE,BUG,No,No
EXAMPLE 1,EXAMPLE 1,No,No
EXAMPLE 2,EXAMPLE 2,Yes,No
PRESENT,PRESENT,No,No
PRESENT AVAILABLE,PRESENT AVAILABLE,No,No
See CSV::parse and CSV::generate.
This can also be done without converting the strings to arrays, manipulating those arrays to form a single array and then converting the single array back to a string.
arr = [str1, str2]
str_indices = 0..str1.count("\n")
arr_indices = 0..arr.size-1
idx_begin = Array.new(arr.size, 0)
# Assign the result first: in "puts x.each_with_object("") do ... end" the do...end block would bind to puts.
merged = str_indices.each_with_object("") do |i, str|
  arr_indices.each do |j|
    idx_end = arr[j].index(/(?:\n|\z)/, idx_begin[j])
    s = arr[j][idx_begin[j]..idx_end]
    s << "\n" unless s[-1] == "\n" || (i == str_indices.last && j == arr_indices.last)
    str << s unless i.zero? && j > 0
    idx_begin[j] = idx_end + 1
  end
end
puts merged
Category Name,Code,Enabled?,Category Hidden?
BUG,BUG,No,No
BUG AVAILABLE,BUG,No,No
EXAMPLE 1,EXAMPLE 1,No,No
EXAMPLE 2,EXAMPLE 2,Yes,No
PRESENT,PRESENT,No,No
PRESENT AVAILABLE,PRESENT AVAILABLE,No,No
The regular expression /(?:\n|\z)/ matches a newline character (\n) or (|) the end of the string (\z).
See the form of String#index that takes an optional second argument that specifies the string index where the search is to begin.
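As a small illustration of that two-argument form, here is a sketch on a made-up string (the names s, pos and lines are assumptions for the example). Each search starts where the previous one stopped, and \z lets the last search succeed at the end of the string.

```ruby
s = "one\ntwo\nthree"

first  = s.index("\n")             # search from the start
second = s.index("\n", first + 1)  # search from just past the first newline

# Walk the string line by line using index with a start offset.
pos = 0
lines = []
while pos < s.length
  stop = s.index(/\n|\z/, pos)  # next newline, or end of string
  lines << s[pos...stop]
  pos = stop + 1
end

puts first, second, lines.inspect
```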
Ruby regex to get text blocks including delimiters
When using scan in Ruby, we are searching for a block within a text file. Sample file:
sometextbefore
begin
sometext
end
sometextafter
begin
sometext2
end
sometextafter2
We want the following result in an array:
["begin\nsometext\nend","begin\nsometext2\nend"]
With this scan method:
textfile.scan(/begin\s.(.*?)end/m)
we get:
["sometext","sometext2"]
We want the begin and end still in the output, not cut off. Any suggestions?
You may remove the capturing group completely:
textfile.scan(/begin\s.*?end/m)
The String#scan method returns only the captured values when the pattern contains capturing groups, so removing the group (or making it non-capturing) should fix the issue.
UPDATE
If the lines inside the blocks must be trimmed of leading/trailing whitespace, you can just use a gsub against each matched block of text to remove all the horizontal whitespace (with the help of the \p{Zs} Unicode category/property class):
.scan(/begin\s.*?end/m).map { |s| s.gsub(/^\p{Zs}+|\p{Zs}+$/, "") }
Here, each match is passed to a block where /^\p{Zs}+|\p{Zs}+$/ matches either 1+ horizontal whitespace characters at the start of a line (see ^\p{Zs}+) or 1+ horizontal whitespace characters at the end of a line (see \p{Zs}+$).
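To make the difference concrete, here is a short sketch on text shaped like the sample file. The variable names are assumptions, and the first pattern is a simplified stand-in for the one in the question.

```ruby
text = "sometextbefore\nbegin\nsometext\nend\nsometextafter\n" \
       "begin\nsometext2\nend\nsometextafter2\n"

# With a capturing group, scan returns only what the group captured.
with_group = text.scan(/begin\s(.*?)\send/m)

# With no group at all, scan returns the whole match.
without_group = text.scan(/begin\s.*?end/m)

# A non-capturing group (?:...) behaves like having no group.
non_capturing = text.scan(/begin\s(?:.*?)end/m)

puts with_group.inspect, without_group.inspect, non_capturing.inspect
```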
Here's another approach, using Ruby's flip-flop operator. I cannot say I would recommend this approach, but Rubyists should understand how the flip-flop operator works. First let's create a file.
str =<<_
some
text
at beginning
begin
 some
 text
 1
end
some text
between
begin
 some
 text
 2
end
some text at end
_
#=> "some\ntext\nat beginning\nbegin\n some\n text\n 1\nend\n...at end\n"
FName = "text"
File.write(FName, str)
Now read the file line-by-line into the array lines:
lines = File.readlines(FName)
#=> ["some\n", "text\n", "at beginning\n", "begin\n", " some\n", " text\n",
# " 1\n", "end\n", "some text\n", "between\n", "begin\n", " some\n",
# " text\n", " 2\n", "end\n", "some text at end\n"]
We can obtain the desired result as follows.
lines.chunk { |line| true if line =~ /^begin\s*$/ .. line =~ /^end\s*$/ }.
  map { |_,arr| arr.map(&:strip).join("\n") }
#=> ["begin\nsome\ntext\n1\nend", "begin\nsome\ntext\n2\nend"]
The two steps are as follows. First, select and group the lines of interest, using Enumerable#chunk with the flip-flop operator.
a = lines.chunk { |line| true if line =~ /^begin\s*$/ .. line =~ /^end\s*$/ }
#=> #<Enumerator: #<Enumerator::Generator:0x007ff62b981510>:each>
We can see the objects that will be generated by this enumerator by converting it to an array.
a.to_a
#=> [[true, ["begin\n", " some\n", " text\n", " 1\n", "end\n"]],
# [true, ["begin\n", " some\n", " text\n", " 2\n", "end\n"]]]
Note that the flip-flop operator is distinguished from a range definition by making it part of a logical expression. For that reason we cannot write
lines.chunk { |line| line =~ /^begin\s*$/ .. line =~ /^end\s*$/ }.to_a
#=> ArgumentError: bad value for range
The second step is the following:
b = a.map { |_,arr| arr.map(&:strip).join("\n") }
#=> ["begin\nsome\ntext\n1\nend", "begin\nsome\ntext\n2\nend"]
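For readers new to the flip-flop operator, a stripped-down sketch may help. It uses an in-memory array of made-up lines instead of a file: the expression turns on when its left condition is true and stays on until, and including, the element for which its right condition is true.

```ruby
lines = %w[before begin a end between begin b end after]

# "true if cond1..cond2" makes the range a flip-flop because it sits in a
# conditional; elements outside a begin..end run yield nil and are dropped.
inside = lines.select { |line| true if (line == "begin")..(line == "end") }

puts inside.inspect
```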
Ruby has some great methods in Enumerable. slice_before and slice_after can help with this sort of problem:
string = <<EOT
sometextbefore
begin
sometext
end
sometextafter
begin
sometext2
end
sometextafter2
EOT
ary = string.split # => ["sometextbefore", "begin", "sometext", "end", "sometextafter", "begin", "sometext2", "end", "sometextafter2"]
.slice_after(/^end/) # => #<Enumerator: #<Enumerator::Generator:0x007fb1e20b42a8>:each>
.map{ |a| a.shift; a } # => [["begin", "sometext", "end"], ["begin", "sometext2", "end"], []]
ary.pop # => []
ary # => [["begin", "sometext", "end"], ["begin", "sometext2", "end"]]
If you want the resulting sub-arrays joined then that's an easy step:
ary.map{ |a| a.join("\n") } # => ["begin\nsometext\nend", "begin\nsometext2\nend"]
Why don't these string expressions print the same result?
Why does this expression:
puts "abc" * 5
=> "abcabcabcabcabc"
not equal this expression?
5.times do
puts "abc"
end
abc
abc
abc
abc
abc
=> 5
Could you please explain why they don't print the same result?
The first writes the string "abc" concatenated to itself five times:
"abc"*5 = "abc"+"abc"+"abc"+"abc"+"abc" = "abcabcabcabcabc"
The second piece of code writes "abc" using the puts method 5 times. puts writes a newline character after each message, meaning that it writes "abc\n" 5 times.
5.times do
puts "abc"
end
turns into
puts "abc" -> also jumps to the next line
puts "abc" -> also jumps to the next line
puts "abc" -> also jumps to the next line
puts "abc" -> also jumps to the next line
puts "abc" -> also jumps to the next line
You can replace puts with print, which doesn't add the newline at the end:
5.times do
print "abc"
end
abcabcabcabcabc => 5
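The difference can be checked by capturing what each call writes. The sketch below writes to StringIO objects instead of the terminal, which is an assumption made only so the output can be inspected as a string.

```ruby
require 'stringio'

# puts once with the repeated string: one line, one trailing newline.
repeated = StringIO.new
repeated.puts "abc" * 5

# puts five times: a newline after every "abc".
with_puts = StringIO.new
5.times { with_puts.puts "abc" }

# print five times: no newline added.
with_print = StringIO.new
5.times { with_print.print "abc" }

p repeated.string, with_puts.string, with_print.string
```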
Ruby String concatenation
I have an array
books = ["Title 1", "Title 2", "Title 3"]
I need to iterate through this array and get a variable like this:
#books_read = "Title 1 \n Title 2 \n Title 3"
I tried this bit of code:
books.each do |book|
#books_read += "#{book} \n"
end
puts #books_read
But the + operator does not concatenate the strings. Any leads on this, please? Cheers!
You can use Array#join: books.join(" \n ").
join(sep=$,) → str
Returns a string created by converting each element of the array to a string, separated by sep.
You can use join: books.join(" \n ")
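A quick sketch of both routes, using a plain local variable named books_read (an assumption; the question's variable is shown differently): join builds the string directly, while += in a loop also works once the variable has been initialized to a string, but leaves a trailing separator.

```ruby
books = ["Title 1", "Title 2", "Title 3"]

# join places the separator only between elements.
joined = books.join(" \n ")

# The loop version needs a starting value and appends after every title.
books_read = ""
books.each { |book| books_read += "#{book} \n" }

p joined, books_read
```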
Read data from yaml file and produce an array in ruby
I have the following data in a yaml file -
---
- :Subject_list
  Subject 1:
    :Act 1: A
    :Act 2: B
  Subject 2:
    :Skill 1:
      :Act 1: B
      :Act 2: B
    :Skill 2:
      :Act 1: B
I need to read data from this file and generate the output given below.
For subject 1 it will be like this, as it has no skill level, meaning the first element of the array is empty:
["","Act 1", "A"], ["","Act 2", "B"]
For the second subject it will be like this -
["Skill 1","Act 1", "B"], ["","Act 2" "B"],["Skill 2","Act 1", "B"]
I am using these values to generate a prawn pdf table. Any help is greatly appreciated. I tried doing this -
data=YAML::load(File.read("file.yaml"));
subject = data[:Subject_list]
sub_list =["Subject 1", "Subject 2"]
sub_list.each do |sub|
  sub_data = []
  sub_data = subject["#{sub}"]
  # I convert the list symbol to an array, so i can loop through the sub activities.
  #I need some direction here as how to check whether the symbol will be a skill or activity
end
Cheers!!
First off, your YAML file does not look right to me. Quoting keys that contain spaces or other unusual characters avoids surprises, and what's up with the : at the beginning?
"Subject_list":
  "Subject 1":
    "Act 1": A
    "Act 2": B
  "Subject 2":
    "Skill 1":
      "Act 1": B
      "Act 2": B
    "Skill 2":
      "Act 1": B
Then you need to load the file properly. You call the method load_file on the YAML module. Calling it with a dot (YAML.load_file) is the usual style; :: also works for method calls in Ruby but is uncommon.
require 'yaml'
data = YAML.load_file "file.yaml"
subject = data["Subject_list"]
require 'pp'
subject.each do |s|
  item = s.last
  if item.keys.first =~ /Skill/
    pp item.keys.inject([]) { |memo,x| item[x].map { |i| memo << i.flatten.unshift(x) } ; memo}
  else
    pp item.map { |k,v| ["", k, v] }
  end
end
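A self-contained variation on the same idea, as a sketch only: it parses the quoted-key YAML from a string instead of a file and, unlike the code above, blanks the skill name on every row after the first so that the rows match the layout asked for in the question. The helper name rows_for is hypothetical, and the sketch assumes that a nested Hash always means a skill level.

```ruby
require 'yaml'

yaml_text = <<~DOC
  "Subject_list":
    "Subject 1":
      "Act 1": A
      "Act 2": B
    "Subject 2":
      "Skill 1":
        "Act 1": B
        "Act 2": B
      "Skill 2":
        "Act 1": B
DOC

data = YAML.load(yaml_text)

# Hypothetical helper: turn one subject's entries into [skill, act, grade] rows.
def rows_for(entries)
  entries.flat_map do |key, value|
    if value.is_a?(Hash)
      # key is a skill; show its name only on the first of its rows.
      value.each_with_index.map { |(act, grade), i| [i.zero? ? key : "", act, grade] }
    else
      # key is an activity with no skill level.
      [["", key, value]]
    end
  end
end

subject1_rows = rows_for(data["Subject_list"]["Subject 1"])
subject2_rows = rows_for(data["Subject_list"]["Subject 2"])
p subject1_rows, subject2_rows
```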
When building up a YAML file for data, especially a complex data structure, I let YAML generate it for me. Then I tweak as necessary:
require 'yaml'
require 'pp'
foo = ["Skill 1","Act 1", "B"], ["","Act 2" "B"],["Skill 2","Act 1", "B"]
puts foo.to_yaml
When I run that code I get this output:
---
- - Skill 1
  - Act 1
  - B
- - ""
  - Act 2B
- - Skill 2
  - Act 1
  - B
(Note that "Act 2" "B" has no comma between the two literals, as in the question, so Ruby concatenates them into "Act 2B".)
You can prove the data is correctly generated by having YAML generate it, then immediately parsing the result and showing what the returned structure looks like, letting you compare it both visually and with an equality check:
bar = YAML.load(foo.to_yaml)
pp bar
puts "foo == bar: #{ foo == bar }"
Which would output:
[["Skill 1", "Act 1", "B"], ["", "Act 2B"], ["Skill 2", "Act 1", "B"]]
foo == bar: true
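The same round-trip check can be written as a short script. This sketch assumes the intended data has three elements in the second row, that is, with a comma between "Act 2" and "B"; the variable names are illustrative.

```ruby
require 'yaml'

rows = [
  ["Skill 1", "Act 1", "B"],
  ["", "Act 2", "B"],
  ["Skill 2", "Act 1", "B"]
]

# Generate YAML from the Ruby structure, then parse it straight back.
yaml_text = rows.to_yaml
round_tripped = YAML.load(yaml_text)

puts yaml_text
puts "round trip ok: #{round_tripped == rows}"
```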