Indentation sensitive parser using Parslet in Ruby?

I am attempting to parse a simple indentation sensitive syntax using the Parslet library within Ruby.
The following is an example of the syntax I am attempting to parse:
level0child0
level0child1
 level1child0
 level1child1
  level2child0
 level1child2
The resulting tree would look like so:
[
  {
    :identifier => "level0child0",
    :children => []
  },
  {
    :identifier => "level0child1",
    :children => [
      {
        :identifier => "level1child0",
        :children => []
      },
      {
        :identifier => "level1child1",
        :children => [
          {
            :identifier => "level2child0",
            :children => []
          }
        ]
      },
      {
        :identifier => "level1child2",
        :children => []
      }
    ]
  }
]
The parser that I have now can parse nesting level 0 and 1 nodes, but cannot parse past that:
require 'parslet'

class IndentationSensitiveParser < Parslet::Parser
  rule(:indent) { str(' ') }
  rule(:newline) { str("\n") }
  rule(:identifier) { match['A-Za-z0-9'].repeat.as(:identifier) }
  rule(:node) { identifier >> newline >> (indent >> identifier >> newline.maybe).repeat.as(:children) }
  rule(:document) { node.repeat }
  root :document
end
require 'ap'
require 'pp'

begin
  input = DATA.read

  puts '', '----- input ----------------------------------------------------------------------', ''
  ap input

  tree = IndentationSensitiveParser.new.parse(input)

  puts '', '----- tree -----------------------------------------------------------------------', ''
  ap tree
rescue IndentationSensitiveParser::ParseFailed => failure
  puts '', '----- error ----------------------------------------------------------------------', ''
  puts failure.cause.ascii_tree
end
__END__
user
 name
 age
recipe
 name
  foo
  bar
It's clear that I need a dynamic counter that expects three indent tokens to match an identifier at nesting level 3.
How can I implement an indentation sensitive syntax parser using Parslet in this way? Is it possible?

There are a few approaches.
Parse the document by recognising each line as a collection of indents and an identifier, then apply a transformation afterwards to reconstruct the hierarchy based on the number of indents.
Use captures to store the current indent, and expect the next node to include that indent plus more to match as a child. (I didn't dig into this approach much, as the next one occurred to me.)
Rules are just methods. So you can define 'node' as a method, which means you can pass it parameters (as follows).
This lets you define node(depth) in terms of node(depth+1). The problem with this approach, however, is that the node method doesn't match a string; it generates a parser. So a recursive call would never finish.
This is why dynamic exists. It returns a parser that isn't resolved until the point it tries to match, allowing you to recurse without problems.
See the following code:
require 'parslet'

class IndentationSensitiveParser < Parslet::Parser
  def indent(depth)
    str(' ' * depth)
  end

  rule(:newline) { str("\n") }
  rule(:identifier) { match['A-Za-z0-9'].repeat(1).as(:identifier) }

  def node(depth)
    indent(depth) >>
    identifier >>
    newline.maybe >>
    (dynamic { |s, c| node(depth + 1).repeat(0) }).as(:children)
  end

  rule(:document) { node(0).repeat }
  root :document
end
This is my favoured solution.
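For comparison, the first approach (recognise each line as indentation plus an identifier, then rebuild the hierarchy afterwards) can be sketched in plain Ruby without Parslet at all. build_tree here is a hypothetical helper, not part of any library; it assumes one space per nesting level, matching the example input above.

```ruby
# Fold a flat list of (depth, identifier) lines into a tree using a stack
# of "currently open" nodes.
def build_tree(source)
  root  = { :children => [] }
  stack = [root]
  source.each_line do |line|
    depth = line[/\A */].size          # one space per level, as in the example
    node  = { :identifier => line.strip, :children => [] }
    stack = stack[0..depth]            # drop frames deeper than this line
    stack.last[:children] << node
    stack << node
  end
  root[:children]
end

doc = "level0child0\nlevel0child1\n level1child0\n level1child1\n  level2child0\n level1child2\n"
tree = build_tree(doc)
tree[1][:children].map { |c| c[:identifier] }
# => ["level1child0", "level1child1", "level1child2"]
```

The stack slice `stack[0..depth]` is what makes a dedent work: returning to a shallower depth discards all deeper open nodes in one step.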

I don't like the idea of weaving knowledge of the indentation process through the whole grammar. I would rather just have INDENT and DEDENT tokens produced that other rules could use, similarly to matching "{" and "}" characters. So the following is my solution: a class IndentParser that any parser can extend to get nl, indent, and dedent tokens.
require 'parslet'

# Atoms returned from a dynamic that aren't meant to match anything.
class AlwaysMatch < Parslet::Atoms::Base
  def try(source, context, consume_all)
    succ("")
  end
end

class NeverMatch < Parslet::Atoms::Base
  attr_accessor :msg

  def initialize(msg = "ignore")
    self.msg = msg
  end

  def try(source, context, consume_all)
    context.err(self, source, msg)
  end
end

class ErrorMatch < Parslet::Atoms::Base
  attr_accessor :msg

  def initialize(msg)
    self.msg = msg
  end

  def try(source, context, consume_all)
    context.err(self, source, msg)
  end
end

class IndentParser < Parslet::Parser
  ##
  # Indentation handling: when matching a newline we check the following indentation. If
  # that indicates an indent token or dedent tokens (1+) then we stick these in an
  # instance variable and the high-priority indent/dedent rules will match as long as
  # these remain. The nl rule consumes the indentation itself.

  rule(:indent) { dynamic { |s, c|
    if @indent.nil?
      NeverMatch.new("Not an indent")
    else
      @indent = nil
      AlwaysMatch.new
    end
  }}

  rule(:dedent) { dynamic { |s, c|
    if @dedents.nil? or @dedents.length == 0
      NeverMatch.new("Not a dedent")
    else
      @dedents.pop
      AlwaysMatch.new
    end
  }}

  def checkIndentation(source, ctx)
    # See if the next line starts with indentation. If so, consume it and then process
    # whether it is an indent or some number of dedents.
    indent = ""
    while source.matches?(Regexp.new("[ \t]"))
      indent += source.consume(1).to_s # returns a Slice
    end
    if @indentStack.nil?
      @indentStack = [""]
    end
    currentInd = @indentStack[-1]
    return AlwaysMatch.new if currentInd == indent # no change, just match nl

    if indent.start_with?(currentInd)
      # Getting deeper
      @indentStack << indent
      @indent = indent # tells the indent rule to match one
      return AlwaysMatch.new
    else
      # Either some number of dedents or an error.
      # Find the first match starting from the back.
      count = 0
      @indentStack.reverse.each do |level|
        break if indent == level # found it
        if level.start_with?(indent)
          # New indent is a prefix, so we dedented past this level.
          count += 1
          next
        end
        # Not a match, not a valid prefix. So an error!
        return ErrorMatch.new("Mismatched indentation level")
      end
      @dedents = [] if @dedents.nil?
      count.times { @dedents << @indentStack.pop }
      return AlwaysMatch.new
    end
  end

  rule(:nl) { anynl >> dynamic { |source, ctx| checkIndentation(source, ctx) } }

  rule(:unixnl) { str("\n") }
  rule(:macnl) { str("\r") }
  rule(:winnl) { str("\r\n") }
  rule(:anynl) { winnl | unixnl | macnl } # try "\r\n" first so "\r" alone doesn't shadow it
end
I'm sure a lot can be improved, but this is what I've come up with so far.
Example usage:
class MyParser < IndentParser
  rule(:colon) { str(':') >> space? }
  rule(:space) { match[' \t'].repeat(1) }
  rule(:space?) { space.maybe }
  rule(:number) { match['0-9'].repeat(1).as(:num) >> space? }
  rule(:identifier) { match['a-zA-Z'] >> match['a-zA-Z0-9'].repeat(0) }
  rule(:block) { colon >> nl >> indent >> stmt.repeat.as(:stmts) >> dedent }
  rule(:stmt) { identifier.as(:id) >> nl | number.as(:num) >> nl | testblock }
  rule(:testblock) { identifier.as(:name) >> block }
  rule(:prgm) { testblock >> nl.repeat }
  root :prgm
end


In Parslet, how to reconstruct substrings from parse subtrees?

I'm writing a parser for strings with interpolated name-value arguments, e.g.: 'This sentence #{x: 2, y: (2 + 5) + 3} has stuff in it.' The argument values are code, which has its own set of parse rules.
Here's a version of my parser, simplified to only allow basic arithmetic as code:
require 'parslet'
require 'ap'

class TestParser < Parslet::Parser
  rule :integer do match('[0-9]').repeat(1).as :integer end
  rule :space do match('[\s\\n]').repeat(1) end
  rule :parens do str('(') >> code >> str(')') end
  rule :operand do integer | parens end
  rule :addition do (operand.as(:left) >> space >> str('+') >> space >> operand.as(:right)).as :addition end
  rule :code do addition | operand end
  rule :name do match('[a-z]').repeat 1 end
  rule :argument do name.as(:name) >> str(':') >> space >> code.as(:value) end
  rule :arguments do argument >> (str(',') >> space >> argument).repeat end
  rule :interpolation do str('#{') >> arguments.as(:arguments) >> str('}') end
  rule :text do (interpolation.absent? >> any).repeat(1).as(:text) end
  rule :segments do (interpolation | text).repeat end
  root :segments
end
string = 'This sentence #{x: 2, y: (2 + 5) + 3} has stuff in it.'
ap TestParser.new.parse(string), index: false
Since the code has its own parse rules (to ensure valid syntax), the argument values are parsed into a subtree (with parentheses etc. replaced by nesting within the subtree):
[
  {
    :text => "This sentence "@0
  },
  {
    :arguments => [
      {
        :name => "x"@16,
        :value => {
          :integer => "2"@19
        }
      },
      {
        :name => "y"@22,
        :value => {
          :addition => {
            :left => {
              :addition => {
                :left => {
                  :integer => "2"@26
                },
                :right => {
                  :integer => "5"@30
                }
              }
            },
            :right => {
              :integer => "3"@35
            }
          }
        }
      }
    ]
  },
  {
    :text => " has stuff in it."@37
  }
]
However, I want to store the argument values as strings, so this would be the ideal result:
[
  {
    :text => "This sentence "@0
  },
  {
    :arguments => [
      {
        :name => "x"@16,
        :value => "2"
      },
      {
        :name => "y"@22,
        :value => "(2 + 5) + 3"
      }
    ]
  },
  {
    :text => " has stuff in it."@37
  }
]
How can I use the Parslet subtrees to reconstruct the argument-value substrings? I could write a code generator, but that seems overkill -- Parslet clearly has access to the substring position information at some point (although it might discard it).
Is it possible to leverage or hack Parslet to return the substring?
The tree produced is based on the use of as in your parser.
You can try removing them from anything in an expression so you get a single string match for the expression. This seems to be what you are after.
If you want the parsed tree for these expressions too, then you need to either:
Transform the expression trees back to the matched text.
Re-Parse the matched text back into an expression tree.
Neither of these is ideal, but if speed is not vital, I would go with the re-parse option: i.e., remove the as atoms, and then later re-parse the expressions into trees as needed.
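For the first option, here is a rough sketch of turning one of the expression subtrees shown above back into text. tree_to_s is a hypothetical helper; it only knows the :integer and :addition node shapes from the example grammar, and it re-adds parentheses around nested additions (so original spacing and redundant parentheses are not preserved exactly).

```ruby
# Walk a parsed expression hash and emit source-like text.
def tree_to_s(node)
  if node.key?(:integer)
    node[:integer].to_s
  elsif node.key?(:addition)
    add   = node[:addition]
    left  = tree_to_s(add[:left])
    right = tree_to_s(add[:right])
    # Parenthesize nested additions so grouping survives the round trip.
    left  = "(#{left})"  if add[:left].key?(:addition)
    right = "(#{right})" if add[:right].key?(:addition)
    "#{left} + #{right}"
  end
end

tree = { :addition => {
  :left  => { :addition => { :left  => { :integer => "2" },
                             :right => { :integer => "5" } } },
  :right => { :integer => "3" } } }
tree_to_s(tree) # => "(2 + 5) + 3"
```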
If you rightly want to reuse the same rules, but this time with as captures throughout, you could implement this by deriving a parser from your existing parser and redefining the rules with the same names in terms of rule(:x) { super.x.as(:x) }.
OR
You could have a general rule for expression that matches the whole expression without knowing what is in it.
e.g. str('#{') >> ((str('}').absent? >> any) | str('\\}')).repeat(0) >> str('}')
Then later you can parse each expression into a tree as needed. That way you are not repeating your rules. It assumes you can tell when your expression is complete without parsing the whole expression subtree.
Failing that, it leaves us with hacking parslet.
I don't have a solution here, just some hints.
Parslet has a module called "CanFlatten" that implements flatten and is used by as to convert the captured tree back to a single string. You are going to want to do something like this.
Alternatively you need to change the succ method in Atoms::Base to return "[success/fail, result, consumed_upto_position]" so each match knows where it consumed up to. Then you can read from the source between the start position and end position to get the raw text back. The current position of the source at the point the parser matches should be the value you want.
Good Luck.
Note: My example expression parser doesn't handle escaping of the escape character. (Left as an exercise for the reader.)
Here's the hack I ended up with. There are better ways to accomplish this, but they'd require more extensive changes. Parser#parse now returns a Result. Result#tree gives the normal parse result, and Result#strings is a hash that maps subtree structures to source strings.
module Parslet
  class Parser
    class Result < Struct.new(:tree, :strings); end

    def parse(source, *args)
      source = Source.new(source) unless source.is_a? Source
      value = super source, *args
      Result.new value, source.value_strings
    end
  end

  class Source
    prepend Module.new{
      attr_reader :value_strings

      def initialize(*args)
        super *args
        @value_strings = {}
      end
    }
  end

  class Atoms::Base
    prepend Module.new{
      def apply(source, *args)
        old_pos = source.bytepos
        super.tap do |success, value|
          next unless success
          string = source.instance_variable_get(:@str).string.slice(old_pos ... source.bytepos)
          source.value_strings[flatten(value)] = string
        end
      end
    }
  end
end

How do I use Parslet with strings not Parslet Slices

I've started using Parslet to parse some custom data. In the examples, the resulting parsed data is something like:
{ :custom_string => "data"@6 }
And I've created the Transform something like
rule(:custom_string => simple(:x)) { x.to_s }
But it doesn't match, presumably because I'm passing "data"@6 instead of just "data", which isn't a simple string. All the examples for the Transform have hashes with strings, not with Parslet::Slices, which is what the parser outputs. Maybe I'm missing a step, but I can't see anything in the docs.
EDIT: More sample code (reduced version, but should still be explanatory)
original_text = 'MSGSTART/DATA1/DATA2/0503/MAR'

require "parslet"
include Parslet

module ParseExample
  class Parser < Parslet::Parser
    rule(:fs) { str("/") }
    rule(:newline) { str("\n") | str("\r\n") }
    rule(:msgstart) { str("MSGSTART") }
    rule(:data1) { match("\\w").repeat(1).as(:data1) }
    rule(:data2) { match("\\w").repeat(1).as(:data2) }
    rule(:serial_number) { match("\\w").repeat(1).as(:serial_number) }
    rule(:month) { match("\\w").repeat(1).as(:month) }
    rule(:first_line) { msgstart >> fs >> data1 >> fs >> data2 >> fs >> serial_number >> fs >> month >> newline }
    rule(:document) { first_line >> newline.maybe }
    root(:document)
  end
end
module ParseExample
  class Transformer < Parslet::Transform
    rule(:data1 => simple(:x)) { x.to_s }
    rule(:data2 => simple(:x)) { x.to_s }
    rule(:serial_number => simple(:x)) { x.to_s }
    rule(:month => simple(:x)) { x.to_s }
  end
end

# Run by calling...
p = ParseExample::Parser.new
parser_result = p.parse(original_text)
# => {:data1=>"data1"@6, :data2=>"data2"@12, :serial_number=>"0503"@18, :month=>"MAR"@23}

t = ParseExample::Transformer.new
transformed = t.apply(parser_result)
# Actual result   => {:data1=>"data1"@6, :data2=>"data2"@12, :serial_number=>"0503"@18, :month=>"MAR"@23}
# Expected result => {:data1=>"data1", :data2=>"data2", :serial_number=>"0503", :month=>"MAR"}
You can't replace individual key/value pairs; you have to replace the whole hash at once.
I fell for this the first time I wrote transformers too. The key is that transform rules match a whole node and replace it, in its entirety. Once a node has been matched, it's not visited again.
If you did consume a hash and only matched a single key/value pair, replacing it with a value, you would just lose all the other key/value pairs in the same hash.
However... There is a way!
If you want to pre-process all the nodes in a hash before matching the whole hash, then the hash's values need to be hashes themselves. Then you can match those and convert them to strings. You can usually do this by simply adding another 'as' in your parser.
For example:
original_text = 'MSGSTART/DATA1/DATA2/0503/MAR'

require "parslet"
include Parslet

module ParseExample
  class Parser < Parslet::Parser
    rule(:fs) { str("/") }
    rule(:newline) { str("\n") | str("\r\n") }
    rule(:msgstart) { str("MSGSTART") }
    rule(:string) { match("\\w").repeat(1).as(:string) } # Notice the as!
    rule(:data1) { string.as(:data1) }
    rule(:data2) { string.as(:data2) }
    rule(:serial_number) { string.as(:serial_number) }
    rule(:month) { string.as(:month) }
    rule(:first_line) {
      msgstart >> fs >>
      data1 >> fs >>
      data2 >> fs >>
      serial_number >> fs >>
      month >> newline.maybe
    }
    rule(:document) { first_line >> newline.maybe }
    root(:document)
  end
end

# Run by calling...
p = ParseExample::Parser.new
parser_result = p.parse(original_text)
puts parser_result.inspect
# => {:data1=>{:string=>"DATA1"@9},
#     :data2=>{:string=>"DATA2"@15},
#     :serial_number=>{:string=>"0503"@21},
#     :month=>{:string=>"MAR"@26}}
# See how the values in the hash are now all hashes themselves.

module ParseExample
  class Transformer < Parslet::Transform
    rule(:string => simple(:x)) { x.to_s }
  end
end

# We just need to match the "{:string => x}" hashes now... and replace them with strings.
t = ParseExample::Transformer.new
transformed = t.apply(parser_result)
puts transformed.inspect
# => {:data1=>"DATA1", :data2=>"DATA2", :serial_number=>"0503", :month=>"MAR"}
# Tada!!!
If you wanted to handle the whole line, making an object from it, say:
class Entry
  def initialize(data1:, data2:, serial_number:, month:)
    @data1 = data1
    @data2 = data2
    @serial_number = serial_number
    @month = month
  end
end

module ParseExample
  class Transformer < Parslet::Transform
    rule(:string => simple(:x)) { x.to_s }
    # match the whole hash
    rule(:data1 => simple(:d1),
         :data2 => simple(:d2),
         :serial_number => simple(:s),
         :month => simple(:m)) {
      Entry.new(data1: d1, data2: d2, serial_number: s, month: m) }
  end
end

t = ParseExample::Transformer.new
transformed = t.apply(parser_result)
puts transformed.inspect
# => #<Entry:0x007fd5a3d26bf0 @data1="DATA1", @data2="DATA2", @serial_number="0503", @month="MAR">

How to pass data along with a ruby block?

I'm trying to pass some data as a block to some external API. It would be a hassle to adapt it to accept additional parameters. If it were JavaScript, I might do it like so:
var callback = function() {
  // do something
};
callback['__someData'] = options;
someExternalAPI(callback);
Is this possible with Ruby? Or how should I go about associating some data with a block?
Not sure if the edits to the question were correct. First, I'd like to specifically pass some data along with a block if that is possible. Not sure if it is though. And probably the only way to do it in ruby is to pass some data as a block.
Additionally, here might be some useful info.
Okay, it probably makes sense to show the whole picture. I'm trying to adapt webmock to my needs. I have a function which checks whether a request's params (be they POST or GET) match specified criteria:
def check_params params, options
  options.all? do |k,v|
    return true unless k.is_a? String
    case v
    when Hash
      return false unless params[k]
      int_methods = ['<', '<=', '>', '>=']
      v1 = int_methods.include?(v.first[0]) ? params[k].to_i : params[k]
      v2 = int_methods.include?(v.first[0]) \
        ? v.first[1].to_i : v.first[1].to_s
      v1.send(v.first[0], v2)
    when TrueClass, FalseClass
      v ^ ! params.key?(k)
    else
      params[k] == v.to_s
    end
  end
end
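For instance, the matcher handles exact values, numeric comparisons, and key-presence checks. The sample params and options below are hypothetical (check_params is repeated so the snippet runs standalone):

```ruby
# check_params as defined above, repeated for a self-contained example.
def check_params params, options
  options.all? do |k, v|
    return true unless k.is_a? String
    case v
    when Hash
      return false unless params[k]
      int_methods = ['<', '<=', '>', '>=']
      v1 = int_methods.include?(v.first[0]) ? params[k].to_i : params[k]
      v2 = int_methods.include?(v.first[0]) ? v.first[1].to_i : v.first[1].to_s
      v1.send(v.first[0], v2)
    when TrueClass, FalseClass
      v ^ ! params.key?(k)
    else
      params[k] == v.to_s
    end
  end
end

params = { "year" => "2015", "q" => "ruby" }
check_params(params, "year" => 2015)            # exact match       => true
check_params(params, "year" => { ">" => 2014 }) # numeric compare   => true
check_params(params, "q" => true)               # key must exist    => true
check_params(params, "page" => "2")             # missing key       => false
```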
It's not perfect, but it suffices for my particular needs, for now. I'm calling it like this:
stub_request(:post, 'http://example.com/')
  .with { |request|
    check_params Rack::Utils.parse_query(request.body), options
  }
And the thing is, generally there is no sensible way to display 'with' block conditions in webmock's output. But in my particular case one can just output the options hash. So instead of this:
registered request stubs:
stub_request(:post, "http://example.com")
to have this:
stub_request(:post, "http://example.com").
  with(block: {"year"=>2015})
Which is what I'm trying to do.
Okay, I ended up doing this:
p = Proc.new {}
p.class.module_eval { attr_accessor :__options }
p.__options = {a: 1}
# ...
pp p.__options
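A slightly narrower variant of the same idea (a sketch, not from the original post): add the accessor via the one proc's singleton class instead of reopening Proc, so every other Proc in the process is left untouched.

```ruby
callback = proc { |x| x * 2 }

# Define __options on this object only, not on the Proc class itself.
callback.singleton_class.class_eval { attr_accessor :__options }
callback.__options = { "year" => 2015 }

callback.__options               # => {"year"=>2015}
proc {}.respond_to?(:__options)  # => false
```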
Or to be more specific:
def mk_block_cond options, &block_cond
  options = options.select { |k,v| ! k.is_a?(Symbol) }
  return block_cond if options.empty?
  block_cond.class.module_eval { attr_accessor :__options }
  block_cond.__options = options
  block_cond
end
module WebMock
  class RequestPattern
    attr_reader :with_block
  end
end
module StubRequestSnippetExtensions
  def to_s(with_response = true)
    request_pattern = @request_stub.request_pattern
    string = "stub_request(:#{request_pattern.method_pattern.to_s},"
    string << " \"#{request_pattern.uri_pattern.to_s}\")"

    with = ""
    if (request_pattern.body_pattern)
      with << ":body => #{request_pattern.body_pattern.to_s}"
    end
    if (request_pattern.headers_pattern)
      with << ",\n " unless with.empty?
      with << ":headers => #{request_pattern.headers_pattern.to_s}"
    end
    if request_pattern.with_block \
        && request_pattern.with_block.respond_to?('__options') \
        && request_pattern.with_block.__options
      with << ",\n " unless with.empty?
      with << "block: #{request_pattern.with_block.__options}"
    end
    string << ".\n with(#{with})" unless with.empty?
    if with_response
      string << ".\n to_return(:status => 200, :body => \"\", :headers => {})"
    end
    string
  end
end

module WebMock
  class StubRequestSnippet
    prepend StubRequestSnippetExtensions
  end
end
module RequestPatternExtensions
  def to_s
    string = "#{@method_pattern.to_s.upcase}"
    string << " #{@uri_pattern.to_s}"
    string << " with body #{@body_pattern.to_s}" if @body_pattern
    string << " with headers #{@headers_pattern.to_s}" if @headers_pattern
    if @with_block
      if @with_block.respond_to?('__options') \
          && @with_block.__options
        string << " with block: %s" % @with_block.__options.inspect
      else
        string << " with given block"
      end
    end
    string
  end
end

module WebMock
  class RequestPattern
    prepend RequestPatternExtensions
  end
end
And now I stub requests this way:
stub_request(:post, 'http://example.com/')
  .with &mk_block_cond(options) { |request|
    check_params Rack::Utils.parse_query(request.body), options
  }
P.S. github issue

Using field as input to Logstash Grok filter pattern

I'm wondering if it is possible to use a field in the Logstash message as the input to the Grok pattern. Say I have an entry that looks like:
{
  "message":"10.1.1.1",
  "grok_filter":"%{IP:client}"
}
I want to be able to do something like this:
filter {
  grok {
    match => ["message", ["%{grok_filter}"]]
  }
}
The problem is this crashes Logstash as it appears to treat "%{grok_filter}" as the Grok filter itself instead of the value of grok_filter. I get the following after Logstash has crashed:
The error reported is:
pattern %{grok_filter} not defined
Is there any way to get the value of a field from inside the Grok filter block and use it as the input to the Grok pattern?
The answer is no -- the grok filter compiles its pattern when the filter is initialized. If you need to do something like that, you'll have to write your own filter that compiles the pattern every time (and pay the performance penalty).
Without knowing more about why you want to do this, it's hard to recommend the best course of action. If you have a limited number of patterns, you can just set a grok_filter_type field and then have a bunch of if [grok_filter_type] == 'ip' { grok { ... } } blocks.
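The limited-patterns alternative could look roughly like this (a sketch; grok_filter_type and the pattern set are hypothetical names, not part of the question):

```
filter {
  if [grok_filter_type] == "ip" {
    grok { match => ["message", "%{IP:client}"] }
  } else if [grok_filter_type] == "number" {
    grok { match => ["message", "%{NUMBER:value}"] }
  }
}
```

Each grok pattern is then compiled once at startup, as usual, and the conditional just routes events to the right one.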
Here's a custom filter that will allow you to do what you want -- it's mostly a copy of the grok code, but there are some changes/simplifications. I've tested it and it seems to work for me.
# encoding: utf-8
require "logstash/filters/base"
require "logstash/namespace"
require "logstash/environment"
require "set"

# A version of grok that can parse from a log-defined pattern. Not really
# recommended for high usage patterns, but for the occasional pattern it
# should work.
#
# filter {
#   grok_dynamic {
#     match_field   => "message"
#     pattern_field => "message_pattern"
#   }
# }
#
class LogStash::Filters::GrokDynamic < LogStash::Filters::Base
  config_name "grok_dynamic"
  milestone 1

  # The field that contains the data to match against
  config :match_field, :validate => :string, :required => true

  # The field that contains the pattern
  config :pattern_field, :validate => :string, :required => true

  # Where the patterns are
  config :patterns_dir, :validate => :array, :default => []

  # If true, only store named captures from grok.
  config :named_captures_only, :validate => :boolean, :default => true

  # If true, keep empty captures as event fields.
  config :keep_empty_captures, :validate => :boolean, :default => false

  # Append values to the 'tags' field when there has been no
  # successful match
  config :tag_on_failure, :validate => :array, :default => ["_grokparsefailure"]

  # The fields to overwrite.
  #
  # This allows you to overwrite a value in a field that already exists.
  config :overwrite, :validate => :array, :default => []

  # Detect if we are running from a jarfile, pick the right path.
  @@patterns_path ||= Set.new
  @@patterns_path += [LogStash::Environment.pattern_path("*")]

  public
  def initialize(params)
    super(params)
    @handlers = {}
  end

  public
  def register
    require "grok-pure" # rubygem 'jls-grok'
    @patternfiles = []

    # Have @@patterns_path show first. Last-in pattern definitions win; this
    # will let folks redefine built-in patterns at runtime.
    @patterns_dir = @@patterns_path.to_a + @patterns_dir
    @logger.info? and @logger.info("Grok patterns path", :patterns_dir => @patterns_dir)
    @patterns_dir.each do |path|
      if File.directory?(path)
        path = File.join(path, "*")
      end

      Dir.glob(path).each do |file|
        @logger.info? and @logger.info("Grok loading patterns from file", :path => file)
        @patternfiles << file
      end
    end

    @patterns = Hash.new { |h,k| h[k] = [] }
    @grok = Grok.new
    @patternfiles.each { |path| @grok.add_patterns_from_file(path) }
  end # def register

  public
  def filter(event)
    return unless filter?(event)
    return if event[@match_field].nil? || event[@pattern_field].nil?

    @logger.debug? and @logger.debug("Running grok_dynamic filter", :event => event)
    @grok.compile(event[@pattern_field])

    if match(@grok, @match_field, event)
      filter_matched(event)
    else
      # Tag this event if we can't parse it. We can use this later to
      # reparse+reindex logs if we improve the patterns given.
      @tag_on_failure.each do |tag|
        event["tags"] ||= []
        event["tags"] << tag unless event["tags"].include?(tag)
      end
    end

    @logger.debug? and @logger.debug("Event now: ", :event => event)
  end # def filter

  private
  def match(grok, field, event)
    input = event[field]
    if input.is_a?(Array)
      success = true
      input.each do |input|
        match = grok.match(input)
        if match
          match.each_capture do |capture, value|
            handle(capture, value, event)
          end
        else
          success = false
        end
      end
      return success
    #elsif input.is_a?(String)
    else
      # Convert anything else to string (number, hash, etc)
      match = grok.match(input.to_s)
      return false if !match

      match.each_capture do |capture, value|
        handle(capture, value, event)
      end
      return true
    end
  rescue StandardError => e
    @logger.warn("Grok regexp threw exception", :exception => e.message)
  end

  private
  def handle(capture, value, event)
    handler = @handlers[capture] ||= compile_capture_handler(capture)
    return handler.call(value, event)
  end

  private
  def compile_capture_handler(capture)
    # SYNTAX:SEMANTIC:TYPE
    syntax, semantic, coerce = capture.split(":")

    # each_capture do |fullname, value|
    #   capture_handlers[fullname].call(value, event)
    # end

    code = []
    code << "# for capture #{capture}"
    code << "lambda do |value, event|"
    #code << "  p :value => value, :event => event"
    if semantic.nil?
      if @named_captures_only
        # Abort early if we are only keeping named (semantic) captures
        # and this capture has no semantic name.
        code << "  return"
      else
        field = syntax
      end
    else
      field = semantic
    end
    code << "  return if value.nil? || value.empty?" unless @keep_empty_captures

    if coerce
      case coerce
      when "int";   code << "  value = value.to_i"
      when "float"; code << "  value = value.to_f"
      end
    end

    code << "  # field: #{field}"
    if @overwrite.include?(field)
      code << "  event[field] = value"
    else
      code << "  v = event[field]"
      code << "  if v.nil?"
      code << "    event[field] = value"
      code << "  elsif v.is_a?(Array)"
      code << "    event[field] << value"
      code << "  elsif v.is_a?(String)"
      # Promote to array since we aren't overwriting.
      code << "    event[field] = [v, value]"
      code << "  end"
    end
    code << "  return"
    code << "end"

    #puts code
    return eval(code.join("\n"), binding, "<grok capture #{capture}>")
  end # def compile_capture_handler
end # class LogStash::Filters::GrokDynamic

How to use custom RR wildcard matcher?

I have created a wildcard matcher for RR that matches JSON strings by parsing them into hashes. This is because JSON (de)serialization doesn't preserve order; if we have:
{ 'foo': 42, 'bar': 123 }
... then after (de)serialization, we might find that our update method is called with:
{ 'bar': 123, 'foo': 42 }
The wildcard matcher looks like this:
class RR::WildcardMatchers::MatchesJsonString
  attr_reader :expected_json_hash

  def initialize(expected_json_string)
    @expected_json_hash = JSON.parse(expected_json_string)
  end

  def ==(other)
    other.respond_to?(:expected_json_hash) && other.expected_json_hash == self.expected_json_hash
  end

  def wildcard_matches?(actual_json_string)
    actual_json_hash = JSON.parse(actual_json_string)
    @expected_json_hash == actual_json_hash
  end
end

module RR::Adapters::RRMethods
  def matches_json(expected_json_string)
    RR::WildcardMatchers::MatchesJsonString.new(expected_json_string)
  end
end
... and we're using it like:
describe 'saving manifests' do
  before do
    @manifests = [
      { :sections => [], 'title' => 'manifest1' },
      { :sections => [], 'title' => 'manifest2' }
    ]
    mock(manifest).create_or_update!(matches_json(@manifests[0].to_json)) { raise 'uh oh' }
    mock(manifest).create_or_update!(matches_json(@manifests[1].to_json))

    parser = ContentPack::ContentPackParser.new({
      'manifests' => @manifests
    })
    @errors = parser.save
  end

  it 'updates manifests' do
    manifest.should have_received.create_or_update!(anything).twice
  end
end
This is in accordance with the RR documentation. However, instead of mock() expecting an argument that matches the JSON, it expects the argument to be the MatchesJsonString object itself:
1) ContentPack::ContentPackParser saving manifests updates manifests
   Failure/Error: mock(Manifest).create_or_update!(matches_json(@manifests[0].to_json)) { raise 'uh oh' }
   RR::Errors::TimesCalledError:
     create_or_update!(#<RR::WildcardMatchers::MatchesJsonString:0x13540def0 @expected_json_hash={"title"=>"manifest1", "sections"=>[]}>)
     Called 0 times.
     Expected 1 times.
   # ./spec/models/content_pack/content_pack_parser_spec.rb:196
The answer is that there's a typo in the documentation to which I linked. This (my emphasis):
#wildcard_matches?(other)
wildcard_matches? is the method that actually checks the argument against the expectation. It should return true if other is considered to match, false otherwise. In the case of DivisibleBy, wildcard_matches? reads:
... should actually read:
#wildcard_match?(other)
...
One of my colleagues suggested that we compare our code with one of the matchers defined in the rr gem, and then the difference stood out.
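For reference, a minimal standalone sketch of the corrected matcher (the class is shown outside the RR namespaces so it runs on its own; the only change that matters is the method name):

```ruby
require 'json'

class MatchesJsonString
  attr_reader :expected_json_hash

  def initialize(expected_json_string)
    @expected_json_hash = JSON.parse(expected_json_string)
  end

  # Renamed from wildcard_matches? -- wildcard_match? is the name RR calls.
  def wildcard_match?(actual_json_string)
    JSON.parse(actual_json_string) == @expected_json_hash
  end
end

m = MatchesJsonString.new('{"foo": 42, "bar": 123}')
m.wildcard_match?('{"bar": 123, "foo": 42}') # => true (order-insensitive)
m.wildcard_match?('{"foo": 42}')             # => false
```

Comparing parsed hashes rather than raw strings is what makes the match insensitive to key order.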
