How do I use Parslet with strings not Parslet Slices - ruby

I've started using Parslet to parse some custom data. In the examples, the resulting parsed data is something like:
{ :custom_string => "data"#6 }
And I've created the Transform something like
rule(:custom_string => simple(:x)) { x.to_s }
But it doesn't match, presumably because I'm passing "data"#6 instead of just "data", which isn't a plain string. All the examples for Transform show hashes with strings, not with Parslet::Slice objects, which is what the parser outputs. Maybe I'm missing a step, but I can't see anything in the docs.
EDIT: More sample code (a reduced version, but it should still be explanatory):
original_text = 'MSGSTART/DATA1/DATA2/0503/MAR'
require "parslet"
include Parslet
module ParseExample
class Parser < Parslet::Parser
rule(:fs) { str("/") }
rule(:newline) { str("\n") | str("\r\n") }
rule(:msgstart) { str("MSGSTART") }
rule(:data1) { match("\\w").repeat(1).as(:data1) }
rule(:data2) { match("\\w").repeat(1).as(:data2) }
rule(:serial_number) { match("\\w").repeat(1).as(:serial_number) }
rule(:month) { match("\\w").repeat(1).as(:month) }
rule(:first_line) { msgstart >> fs >> data1 >> fs >> data2 >> fs >> serial_number >> fs >> month >> newline }
rule(:document) { first_line >> newline.maybe }
root(:document)
end
end
module ParseExample
class Transformer < Parslet::Transform
rule(:data1 => simple(:x)) { x.to_s }
rule(:data2 => simple(:x)) { x.to_s }
rule(:serial_number => simple(:x)) { x.to_s }
rule(:month => simple(:x)) { x.to_s }
end
end
# Run by calling...
p = ParseExample::Parser.new
parser_result = p.parse(original_text)
# => {:data1=>"data1"#6, :data2=>"data2"#12, :serial_number=>"0503"#18, :month=>"MAR"#23}
t = ParseExample::Transformer.new
transformed = t.apply(parser_result)
# Actual result => {:data1=>"data1"#6, :data2=>"data2"#12, :serial_number=>"0503"#18, :month=>"MAR"#23}
# Expected result => {:data1=>"data1", :data2=>"data2", :serial_number=>"0503", :month=>"MAR"}

You can't replace individual key/value pairs. You have to replace the whole hash at once.
I fell for this the first time I wrote transformers too. The key is that transform rules match a whole node and replace it in its entirety. Once a node has been matched, it's not visited again.
If a rule consumed a hash by matching a single key/value pair and replacing it with a value, you would lose all the other key/value pairs in the same hash.
However... There is a way!
If you do want to pre-process all the nodes in a hash before matching the whole hash, then the hash's values need to be hashes themselves. Then you can match those and convert them to strings. You can usually do this by simply adding another 'as' in your parser.
For example:
original_text = 'MSGSTART/DATA1/DATA2/0503/MAR'
require "parslet"
include Parslet
module ParseExample
class Parser < Parslet::Parser
rule(:fs) { str("/") }
rule(:newline) { str("\n") | str("\r\n") }
rule(:msgstart) { str("MSGSTART") }
rule(:string) { match("\\w").repeat(1).as(:string) } # Notice the as!
rule(:data1) { string.as(:data1) }
rule(:data2) { string.as(:data2) }
rule(:serial_number) { string.as(:serial_number) }
rule(:month) { string.as(:month) }
rule(:first_line) {
msgstart >> fs >>
data1 >> fs >>
data2 >> fs >>
serial_number >> fs >>
month >> newline.maybe
}
rule(:document) { first_line >> newline.maybe }
root(:document)
end
end
# Run by calling...
p = ParseExample::Parser.new
parser_result = p.parse(original_text)
puts parser_result.inspect
# => {:data1=>{:string=>"DATA1"#9},
#     :data2=>{:string=>"DATA2"#15},
#     :serial_number=>{:string=>"0503"#21},
#     :month=>{:string=>"MAR"#26}}
# See how the values in the hash are now all hashes themselves.
module ParseExample
class Transformer < Parslet::Transform
rule(:string => simple(:x)) { x.to_s }
end
end
# We just need to match the "{:string => x}" hashes now...and replace them with strings
t = ParseExample::Transformer.new
transformed = t.apply(parser_result)
puts transformed.inspect
# => {:data1=>"DATA1", :data2=>"DATA2", :serial_number=>"0503", :month=>"MAR"}
# Tada!!!
If you wanted to handle the whole line and make an object from it, say:
class Entry
def initialize(data1:, data2:, serial_number:, month:)
@data1 = data1
@data2 = data2
@serial_number = serial_number
@month = month
end
end
module ParseExample
class Transformer < Parslet::Transform
rule(:string => simple(:x)) { x.to_s }
# match the whole hash
rule(:data1 => simple(:d1),
:data2 => simple(:d2),
:serial_number => simple(:s),
:month => simple(:m)) {
Entry.new(data1: d1, data2: d2, serial_number: s, month: m) }
end
end
t = ParseExample::Transformer.new
transformed = t.apply(parser_result)
puts transformed.inspect
# => #<Entry:0x007fd5a3d26bf0 @data1="DATA1", @data2="DATA2", @serial_number="0503", @month="MAR">
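As an aside, the inspect output shows the instance variables but the class above has no readers. Giving Entry some attr_readers makes the result easier to consume downstream. A minimal standalone sketch (the keyword arguments simply mirror the hash keys from the parse tree):

```ruby
# A small value object holding one parsed line, with readers for each field.
class Entry
  attr_reader :data1, :data2, :serial_number, :month

  def initialize(data1:, data2:, serial_number:, month:)
    @data1 = data1
    @data2 = data2
    @serial_number = serial_number
    @month = month
  end
end

entry = Entry.new(data1: "DATA1", data2: "DATA2", serial_number: "0503", month: "MAR")
puts entry.serial_number
# => 0503
```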

Related

How to create a Hash from a nested CSV in Ruby?

I have a CSV in the following format:
name,contacts.0.phone_no,contacts.1.phone_no,codes.0,codes.1
YK,1234,4567,AB001,AK002
As you can see, this is a nested structure. The CSV may contain multiple rows. I would like to convert this into an array of hashes like this:
[
{
name: 'YK',
contacts: [
{
phone_no: '1234'
},
{
phone_no: '4567'
}
],
codes: ['AB001', 'AK002']
}
]
The structure uses numbers in the given format to represent arrays. There can be hashes inside arrays. Is there a simple way to do that in Ruby?
The CSV headers are dynamic. It can change. I will have to create the hash on the fly based on the CSV file.
There is a similar node library called csvtojson to do that for JavaScript.
Just read and parse it line-by-line. The arr variable in the code below will hold the array of hashes you need (replace the filename with your CSV's path):
arr = []
File.readlines('data.csv').drop(1).each do |line|
fields = line.split(',').map(&:strip)
hash = { name: fields[0],
contacts: [{ phone_no: fields[1] }, { phone_no: fields[2] }],
codes: [fields[3], fields[4]] }
arr.push(hash)
end
Let's first construct a CSV file.
str = <<~END
name,contacts.0.phone_no,contacts.1.phone_no,codes.0,IQ,codes.1
YK,1234,4567,AB001,173,AK002
ER,4321,7654,BA001,81,KA002
END
FName = 't.csv'
File.write(FName, str)
#=> 121
I have written a helper method that constructs a pattern used to convert each row of the CSV file (after the first, which contains the headers) to an element (hash) of the desired array.
require 'csv'
def construct_pattern(csv)
csv.headers.group_by { |col| col[/[^.]+/] }.
transform_values do |arr|
case arr.first.count('.')
when 0
arr.first
when 1
arr
else
key = arr.first[/(?<=\d\.).*/]
arr.map { |v| { key=>v } }
end
end
end
In the code below, for the example being considered:
construct_pattern(csv)
#=> {"name"=>"name",
# "contacts"=>[{"phone_no"=>"contacts.0.phone_no"},
# {"phone_no"=>"contacts.1.phone_no"}],
# "codes"=>["codes.0", "codes.1"],
# "IQ"=>"IQ"}
By tacking if pattern.empty? onto the above expression we ensure the pattern is constructed only once.
We may now construct the desired array.
pattern = {}
CSV.foreach(FName, headers: true).map do |csv|
pattern = construct_pattern(csv) if pattern.empty?
pattern.each_with_object({}) do |(k,v),h|
h[k] =
case v
when Array
case v.first
when Hash
v.map { |g| g.transform_values { |s| csv[s] } }
else
v.map { |s| csv[s] }
end
else
csv[v]
end
end
end
#=> [{"name"=>"YK",
# "contacts"=>[{"phone_no"=>"1234"}, {"phone_no"=>"4567"}],
# "codes"=>["AB001", "AK002"],
# "IQ"=>"173"},
# {"name"=>"ER",
# "contacts"=>[{"phone_no"=>"4321"}, {"phone_no"=>"7654"}],
# "codes"=>["BA001", "KA002"],
# "IQ"=>"81"}]
The CSV methods I've used are documented in CSV. See also Enumerable#group_by and Hash#transform_values.

How do I split an atom in Parslet?

I'm building an SQL-like query language. I would like to be able to handle lists of items delimited by commas. I have successfully achieved this with this code:
class QueryParser < Parslet::Parser
rule(:space) { match('\s').repeat(1) }
rule(:space?) { space.maybe }
rule(:delimiter) { space? >> str(',') >> space? }
rule(:select) { str('SELECT') >> space? }
rule(:select_value) { str('*') | match('[a-zA-Z]').repeat(1) }
rule(:select_arguments) do
space? >>
(select_value >> (delimiter >> select_value).repeat).maybe.as(:select) >>
space?
end
rule(:from) { str('FROM') >> space? }
rule(:from_arguments) { match('[a-zA-Z]').repeat(1).as(:from) >> space? }
rule(:query) { select >> select_arguments >> from >> from_arguments }
root(:query)
end
Where something like SELECT id,name,fork FROM forks correctly outputs the {:select=>"id,name,fork"#7, :from=>"forks"#25} tree.
Now, instead of messing around with this later, I would like to convert the SELECT arguments (id,name,fork in this case) into an Array. I can do this by running 'id,name,fork'.split ','. But I cannot get the Parslet transformer to do this for me when applied. This is my code for my query transformer:
class QueryTransformer < Parslet::Transform
rule(select: simple(:args)) { args.split(',') }
end
When applied like so:
QueryTransformer.new.apply(
QueryParser.new.parse('SELECT id,name,fork FROM forks')
)
The result is the same as when I didn't apply it: {:select=>"id,name,fork"#7, :from=>"forks"#25}.
I was hoping the value of :select would be an Array like this: ["id", "name", "fork"].
My question is: how do I split the value of :select into an Array using transformers?
You need to put "as(:xxx)" on whatever part of the parse tree you want to be able to play with later.
Here I changed your rule(:select_value) to remember the values as a :value
rule(:select_value) { (str('*') | match('[a-zA-Z]').repeat(1)).as(:value) }
Now your parser outputs :
{:select=>[{:value=>"id"#7}, {:value=>"name"#10}, {:value=>"fork"#15}], :from=>"forks"#25}
Which is easy to transform using:
class QueryTransformer < Parslet::Transform
rule(:value => simple(:val)) { val }
end
Then you get:
{:select=>["id"#7, "name"#10, "fork"#15], :from=>"forks"#25}
So in full the code is as follows :-
require 'parslet'
class QueryParser < Parslet::Parser
rule(:space) { match('\s').repeat(1) }
rule(:space?) { space.maybe }
rule(:delimiter) { space? >> str(',') >> space? }
rule(:select) { str('SELECT') >> space? }
rule(:select_value) { (str('*') | match('[a-zA-Z]').repeat(1)).as(:value) }
rule(:select_arguments) do
space? >>
(select_value >> (delimiter >> select_value).repeat).maybe.as(:select) >>
space?
end
rule(:from) { str('FROM') >> space? }
rule(:from_arguments) { match('[a-zA-Z]').repeat(1).as(:from) >> space? }
rule(:query) { select >> select_arguments >> from >> from_arguments }
root(:query)
end
puts QueryParser.new.parse('SELECT id,name,fork FROM forks')
# => {:select=>[{:value=>"id"#7}, {:value=>"name"#10}, {:value=>"fork"#15}], :from=>"forks"#25}
class QueryTransformer < Parslet::Transform
rule(:value => simple(:val)) { val }
end
puts QueryTransformer.new.apply(
QueryParser.new.parse('SELECT id,name,fork FROM forks')
)
# => {:select=>["id"#7, "name"#10, "fork"#15], :from=>"forks"#25}

How to merge multiple hashes?

Right now, I'm merging two hashes like this:
department_hash = self.parse_department html
super_saver_hash = self.parse_super_saver html
final_hash = department_hash.merge(super_saver_hash)
Output:
{:department=>{"Pet Supplies"=>{"Birds"=>16281, "Cats"=>245512,
"Dogs"=>513926, "Fish & Aquatic Pets"=>46811, "Horses"=>14805,
"Insects"=>364, "Reptiles & Amphibians"=>5816, "Small
Animals"=>19769}}, :super_saver=>{"Free Super Saver
Shipping"=>126649}}
But now I want to merge more in the future. For example:
department_hash = self.parse_department html
super_saver_hash = self.parse_super_saver html
categories_hash = self.parse_categories html
How to merge multiple hashes?
How about:
[department_hash, super_saver_hash, categories_hash].reduce &:merge
You can just call merge again:
h1 = {foo: :bar}
h2 = {baz: :qux}
h3 = {quux: :garply}
h1.merge(h2).merge(h3)
#=> {:foo=>:bar, :baz=>:qux, :quux=>:garply}
You can do it as below, using Enumerable#inject:
h = {}
arr = [{:a=>"b"},{"c" => 2},{:a=>4,"c"=>"Hi"}]
arr.inject(h,:update)
# => {:a=>4, "c"=>"Hi"}
arr.inject(:update)
# => {:a=>4, "c"=>"Hi"}
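When the hashes share keys, the value from the later hash wins by default; both merge and update accept a block for resolving conflicts yourself. A small sketch (the keys and values here are made up for illustration):

```ruby
hashes = [{ a: 1, c: 12 }, { b: 3, c: 10 }]

# Default behaviour: the value from the later hash wins for :c.
merged = hashes.reduce(:merge)
puts merged.inspect
# => {:a=>1, :c=>10, :b=>3}

# With a block you decide conflicts yourself, e.g. keep the larger value.
resolved = hashes.reduce { |acc, h| acc.merge(h) { |_key, old, new| [old, new].max } }
puts resolved.inspect
# => {:a=>1, :c=>12, :b=>3}
```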
It took me a while to figure out how to merge multi-nested hashes after going through this question and its answers. It turned out I was iterating through the collections of hashes incorrectly, causing all kinds of problems with nil values.
This sample command-line app shows how to merge multiple hashes with a combination of store and merge!, depending on whether or not they were top-level hash keys. It uses command-line args with a few known key names for categorization purposes.
The full code is provided below:
# Ruby - A nested hash example
# Load each pair of args on the command-line as a key-value pair
# For example from CMD.exe:
# call ruby.exe ruby_nested_hash_example.rb Age 30 Name Mary Fav_Hobby Ataraxia Fav_Number 42
# Output would be:
# {
# "data_info": {
# "types": {
# "nums": {
# "Age": 30,
# "Fav_Number": 42
# },
# "strings": {
# "Name": "Mary",
# "Fav_Hobby": "Ataraxia"
# }
# },
# "data_id": "13435436457"
# }
# }
if (ARGV.count % 2 != 0) || (ARGV.count < 2)
STDERR.puts "You must provide an even amount of command-line args to make key-value pairs.\n"
abort
end
require 'json'
cmd_hashes = {}
nums = {}
strings = {}
types = {}
#FYI `tl` == top-level
all_tl_keys = {}
data_info = {}
data_id = {:data_id => "13435436457"}
_key = ""
_value = ""
element = 0
ARGV.each do |i|
if element % 2 == 0
_key=i
else
if i.to_i.to_s == i # numeric string?
_value=i.to_i
else
_value=i
end
end
if (_key != "") && (_value != "")
cmd_hashes.store(_key, _value)
_key = ""
_value = ""
end
element+=1
end
cmd_hashes.each do |key, value|
if value.is_a? Numeric
nums.store(key, value)
else
strings.store(key, value)
end
end
if nums.size > 0; types.merge!(:nums => nums) end
if strings.size > 0; types.merge!(:strings => strings) end
if types.size > 0; all_tl_keys.merge!(:types => types) end
if data_id.size > 0; all_tl_keys.merge!(data_id) end
if all_tl_keys.size > 0; data_info.merge!(:data_info => all_tl_keys) end
if data_info.size > 0; puts JSON.pretty_generate(data_info) end
Suppose you have arr = [{x: 10}, {y: 20}, {z: 30}]. Then do:
arr.reduce(:merge)

Indentation sensitive parser using Parslet in Ruby?

I am attempting to parse a simple indentation sensitive syntax using the Parslet library within Ruby.
The following is an example of the syntax I am attempting to parse:
level0child0
level0child1
  level1child0
  level1child1
    level2child0
  level1child2
The resulting tree would look like so:
[
{
:identifier => "level0child0",
:children => []
},
{
:identifier => "level0child1",
:children => [
{
:identifier => "level1child0",
:children => []
},
{
:identifier => "level1child1",
:children => [
{
:identifier => "level2child0",
:children => []
}
]
},
{
:identifier => "level1child2",
:children => []
},
]
}
]
The parser that I have now can parse nesting level 0 and 1 nodes, but cannot parse past that:
require 'parslet'
class IndentationSensitiveParser < Parslet::Parser
rule(:indent) { str(' ') }
rule(:newline) { str("\n") }
rule(:identifier) { match['A-Za-z0-9'].repeat.as(:identifier) }
rule(:node) { identifier >> newline >> (indent >> identifier >> newline.maybe).repeat.as(:children) }
rule(:document) { node.repeat }
root :document
end
require 'ap'
require 'pp'
begin
input = DATA.read
puts '', '----- input ----------------------------------------------------------------------', ''
ap input
tree = IndentationSensitiveParser.new.parse(input)
puts '', '----- tree -----------------------------------------------------------------------', ''
ap tree
rescue IndentationSensitiveParser::ParseFailed => failure
puts '', '----- error ----------------------------------------------------------------------', ''
puts failure.cause.ascii_tree
end
__END__
user
 name
 age
recipe
 name
foo
bar
It's clear that I need a dynamic counter that expects 3 indentation nodes to match an identifier at nesting level 3.
How can I implement an indentation sensitive syntax parser using Parslet in this way? Is it possible?
There are a few approaches:
1. Parse the document by recognising each line as a collection of indents and an identifier, then apply a transformation afterwards to reconstruct the hierarchy based on the number of indents.
2. Use captures to store the current indent and expect the next node to include that indent plus more in order to match as a child. (I didn't dig into this approach much, as the next one occurred to me.)
3. Rules are just methods, so you can define 'node' as a method, which means you can pass parameters (as follows).
The third option lets you define node(depth) in terms of node(depth+1). The problem with this approach, however, is that the node method doesn't match a string; it generates a parser. So a recursive call would never finish.
This is why dynamic exists. It returns a parser that isn't resolved until the point it tries to match, allowing you to recurse without problems.
See the following code:
require 'parslet'
class IndentationSensitiveParser < Parslet::Parser
def indent(depth)
str(' '*depth)
end
rule(:newline) { str("\n") }
rule(:identifier) { match['A-Za-z0-9'].repeat(1).as(:identifier) }
def node(depth)
indent(depth) >>
identifier >>
newline.maybe >>
(dynamic{|s,c| node(depth+1).repeat(0)}).as(:children)
end
rule(:document) { node(0).repeat }
root :document
end
This is my favoured solution.
I don't like the idea of weaving knowledge of the indentation process through the whole grammar. I would rather just have INDENT and DEDENT tokens produced that other rules could use, similarly to just matching "{" and "}" characters. So the following is my solution: a class IndentParser that any parser can extend to get nl, indent, and dedent tokens generated.
require 'parslet'
# Atoms returned from a dynamic that aren't meant to match anything.
class AlwaysMatch < Parslet::Atoms::Base
def try(source, context, consume_all)
succ("")
end
end
class NeverMatch < Parslet::Atoms::Base
attr_accessor :msg
def initialize(msg = "ignore")
self.msg = msg
end
def try(source, context, consume_all)
context.err(self, source, msg)
end
end
class ErrorMatch < Parslet::Atoms::Base
attr_accessor :msg
def initialize(msg)
self.msg = msg
end
def try(source, context, consume_all)
context.err(self, source, msg)
end
end
class IndentParser < Parslet::Parser
##
# Indentation handling: when matching a newline we check the following indentation. If
# that indicates an indent token or dedent tokens (1+) then we stick these in instance
# variables and the high-priority indent/dedent rules will match as long as these
# remain. The nl rule consumes the indentation itself.
rule(:indent) { dynamic {|s,c|
if @indent.nil?
NeverMatch.new("Not an indent")
else
@indent = nil
AlwaysMatch.new
end
}}
rule(:dedent) { dynamic {|s,c|
if @dedents.nil? or @dedents.length == 0
NeverMatch.new("Not a dedent")
else
@dedents.pop
AlwaysMatch.new
end
}}
def checkIndentation(source, ctx)
# See if next line starts with indentation. If so, consume it and then process
# whether it is an indent or some number of dedents.
indent = ""
while source.matches?(Regexp.new("[ \t]"))
indent += source.consume(1).to_s # returns a Slice
end
if @indentStack.nil?
@indentStack = [""]
end
currentInd = @indentStack[-1]
return AlwaysMatch.new if currentInd == indent # no change, just match nl
if indent.start_with?(currentInd)
# Getting deeper
@indentStack << indent
@indent = indent # tells the indent rule to match one
return AlwaysMatch.new
else
# Either some number of de-dents or an error
# Find first match starting from back
count = 0
@indentStack.reverse.each do |level|
break if indent == level # found it
if level.start_with?(indent)
# New indent is a prefix, so we de-dented this level.
count += 1
next
end
# Not a match, not a valid prefix. So an error!
return ErrorMatch.new("Mismatched indentation level")
end
@dedents = [] if @dedents.nil?
count.times { @dedents << @indentStack.pop }
return AlwaysMatch.new
end
end
rule(:nl) { anynl >> dynamic {|source, ctx| checkIndentation(source,ctx) }}
rule(:unixnl) { str("\n") }
rule(:macnl) { str("\r") }
rule(:winnl) { str("\r\n") }
rule(:anynl) { unixnl | macnl | winnl }
end
I'm sure a lot can be improved, but this is what I've come up with so far.
Example usage:
class MyParser < IndentParser
rule(:colon) { str(':') >> space? }
rule(:space) { match('[ \t]').repeat(1) }
rule(:space?) { space.maybe }
rule(:number) { match['0-9'].repeat(1).as(:num) >> space? }
rule(:identifier) { match['a-zA-Z'] >> match["a-zA-Z0-9"].repeat(0) }
rule(:block) { colon >> nl >> indent >> stmt.repeat.as(:stmts) >> dedent }
rule(:stmt) { identifier.as(:id) >> nl | number.as(:num) >> nl | testblock }
rule(:testblock) { identifier.as(:name) >> block }
rule(:prgm) { testblock >> nl.repeat }
root :prgm
end

How to update a Ruby nested hash inside a loop?

I'm creating a nested hash in Ruby with REXML and want to update the hash inside a loop.
My code is like:
hash = {}
doc.elements.each('//address') do |n|
a = # ...
b = # ...
hash = { "NAME" => { a => { "ADDRESS" => b } } }
end
When I execute the above code the hash gets overwritten and I get only the info in the last iteration of the loop.
I don't want to do it the following way, as it makes my code verbose:
hash["NAME"] = {}
hash["NAME"][a] = {}
and so on...
So could someone help me out on how to make this work...
Assuming the names are unique:
hash.merge!({"NAME" => { a => { "ADDRESS" => b } } })
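The "assuming the names are unique" caveat matters because merge! replaces the entire value of an existing key rather than combining the nested hashes. A quick illustration with made-up names:

```ruby
hash = {}
hash.merge!("NAME" => { "alice" => { "ADDRESS" => "1 Main St" } })
hash.merge!("NAME" => { "bob" => { "ADDRESS" => "2 Side St" } })

# The second merge! replaced the entire "NAME" entry, so "alice" is gone.
puts hash.inspect
# => {"NAME"=>{"bob"=>{"ADDRESS"=>"2 Side St"}}}
```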
You create a brand-new hash in each iteration and assign it to hash, which discards what was stored there before.
Just assign the key directly in the existing hash:
hash["NAME"] = { a => { "ADDRESS" => b } }
hash = {"NAME" => {}}
doc.elements.each('//address') do |n|
a = ...
b = ...
hash['NAME'][a] = {'ADDRESS' => b, 'PLACE' => ...}
end
blk = proc { |hash, key| hash[key] = Hash.new(&blk) }
hash = Hash.new(&blk)
doc.elements.each('//address').each do |n|
a = # ...
b = # ...
hash["NAME"][a]["ADDRESS"] = b
end
This basically creates a lazily instantiated, infinitely recurring hash of hashes.
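The autovivifying-hash trick is independent of REXML; here is a minimal self-contained sketch (the keys are made up):

```ruby
# Each missing key creates another hash wired with the same default block,
# so arbitrarily deep assignments just work.
blk = proc { |hash, key| hash[key] = Hash.new(&blk) }
hash = Hash.new(&blk)

hash["NAME"]["alice"]["ADDRESS"] = "1 Main St"
hash["NAME"]["bob"]["ADDRESS"] = "2 Side St"

puts hash.inspect
# => {"NAME"=>{"alice"=>{"ADDRESS"=>"1 Main St"}, "bob"=>{"ADDRESS"=>"2 Side St"}}}
```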
EDIT: Just thought of something that could work. This is only tested with a couple of very simple hashes, so it may have some problems.
class Hash
def can_recursively_merge? other
Hash === other
end
def recursive_merge! other
other.each do |key, value|
if self.include? key and self[key].can_recursively_merge? value
self[key].recursive_merge! value
else
self[key] = value
end
end
self
end
end
Then use hash.recursive_merge!("NAME" => { a => { "ADDRESS" => b } }) in your code block. (Note the parentheses: a braced hash literal immediately after the method name would be parsed as a block.)
This simply recursively merges a hierarchy of hashes, and any other types if you define the recursive_merge! and can_recursively_merge? methods on them.
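A quick end-to-end check of the deep merge; the monkey patch is repeated here so the snippet runs on its own, and the key names are illustrative:

```ruby
class Hash
  def can_recursively_merge?(other)
    Hash === other
  end

  # Merge other into self, recursing wherever both sides hold hashes.
  def recursive_merge!(other)
    other.each do |key, value|
      if include?(key) && self[key].can_recursively_merge?(value)
        self[key].recursive_merge!(value)
      else
        self[key] = value
      end
    end
    self
  end
end

hash = { "NAME" => { "alice" => { "ADDRESS" => "1 Main St" } } }
hash.recursive_merge!("NAME" => { "bob" => { "ADDRESS" => "2 Side St" } })

# Both entries survive because the nested hashes were merged, not replaced.
puts hash.inspect
# => {"NAME"=>{"alice"=>{"ADDRESS"=>"1 Main St"}, "bob"=>{"ADDRESS"=>"2 Side St"}}}
```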
