I came across this XPath string...
courses[?(#.id==101)].students[?(#.id==111)]
I'm not even sure if it is valid XPath because of how '?' and '.' are used. And it doesn't work in many online XPath evaluators.
That said, XPath 3.1 does have a lookup operator ('?') and some JSON-handling features; see https://www.altova.com/training/xpath3/xpath-31#lookup-operator
So I'm wondering what exactly '?()' and '.' are doing here.
So far my guess is that this expression is being used to search JSON content embedded in an XML document.
Guessing the JSON object is like this:
{
  "courses": [
    {
      'id': '101',
      'name': 'Course 101',
      'students': [
        {
          'id': '111',
          'name': 'Student 111'
        }
      ]
    }
  ]
}
In short, is it a valid XPath 3 expression? And if yes, then what exactly are '?()' and '.' doing?
Note that JSON only works with double quotes, so the sample object above isn't strictly valid JSON.
New to Go. My first project is to compare a NodeJS proxy and a Go proxy for account number tokenization. I have been doing NodeJS for a few years and am very comfortable with it. My proxies will not know the format of any request or response from the target servers, but they do have configurations coming from Redis/MongoDB that are similar to JSONPath expressions. These configurations can change things like the target server/path, query parameters, headers, request body and response body.
For NodeJS, I am using deepdash's paths function to get an array of all the leaves in a JSON object in JSONPath format. I am using this array and RegEx to find my matching paths that I need to process from any request or response body. So far, it looks like I will be using gjson for my JSONPath needs, but it does not have anything for the paths command I was using in deepdash.
Will I need to create a recursive function to build this JSONPath array myself, or does anyone know of a library that will produce something similar?
for example:
{
  "response": {
    "results": [
      {
        "acctNum": 1234,
        "someData": "something"
      },
      {
        "acctNum": 5678,
        "someData": "something2"
      }
    ]
  }
}
I will get an array back in the format:
[
  "response.results[0].acctNum",
  "response.results[0].someData",
  "response.results[1].acctNum",
  "response.results[1].someData"
]
and I can then use my filter of response.results[*].acctNum which translates to response\.results\[.*\]\.acctNum in Regex to get my acctNum fields.
From this filtered array, I should be able to use gjson to get the actual value, process it and then set the new value (I am using lodash in NodeJS)
There are a number of JSONPath implementations in Go, and I can't really recommend one over the others.
However, I think all you need is this basic path: $..*
Pretty much any implementation that is able to return paths instead of values should return:
[
  "$['response']",
  "$['response']['results']",
  "$['response']['results'][0]",
  "$['response']['results'][1]",
  "$['response']['results'][0]['acctNum']",
  "$['response']['results'][0]['someData']",
  "$['response']['results'][1]['acctNum']",
  "$['response']['results'][1]['someData']"
]
If I understand correctly, this should still work with your approach of filtering via RegEx.
Go JSONPath implementations:
https://github.com/PaesslerAG/jsonpath
https://github.com/bhmj/jsonslice
https://github.com/ohler55/ojg
https://github.com/oliveagle/jsonpath
https://github.com/spyzhov/ajson
https://github.com/vmware-labs/yaml-jsonpath
Is it possible to search for a uri whose document contains a certain XPath using cts:uris()? I thought it may be quicker than returning uris from a cts:search. Here is what I have currently:
declare function local:xpath-search($collection) {
  for $i in cts:search(//a/b, cts:and-query((cts:collection-query($collection))))[1]
  return fn:base-uri($i)
};
Is there a quicker way to return documents that contain a match to the XPath //a/b, using cts:uris()?
You can use cts:element-query() to construct a cts:query that functions similarly to the XPath expression //a/b, searching for documents that have a elements with b element descendants. It isn't exactly the same and might give you some false positives, because it is really more akin to //a//b, but it might be acceptable, and it can be used with cts:uris().
xquery version "1.0-ml";
declare function local:xpath-search($collection) {
cts:uris("", (),
cts:and-query((
cts:collection-query($collection),
cts:element-query(xs:QName("a"),
cts:element-query(xs:QName("b"), cts:and-query(()) ) ) )) )
};
The Logstash filter regular expression to parse our syslog stream is getting more and more complicated, which led me to write tests. I simply copied the structure of a Grok test in the main Logstash repository, modified it a bit, and ran it with bin/logstash rspec as explained here. After a few hours of fighting with the regular expression syntax, I found out that there is a difference in how modifier characters have to be escaped. Here is a simple test for a filter involving square brackets in the log message, which you have to escape in the filter regular expression:
require "test_utils"
require "logstash/filters/grok"
describe LogStash::Filters::Grok do
extend LogStash::RSpec
describe "Grok pattern difference" do
config <<-CONFIG
filter {
grok {
match => [ "message", '%{PROG:theprocess}(?<forgetthis>(: )?(\\[[\\d:|\\s\\w/]*\\])?:?)%{GREEDYDATA:message}' ]
add_field => { "process" => "%{theprocess}" "forget_this" => "%{forgetthis}" }
}
}
CONFIG
sample "uwsgi: [pid: 12345|app: 0|req: 21/93281] BLAHBLAH" do
insist { subject["tags"] }.nil?
insist { subject["process"] } == "uwsgi"
insist { subject["forget_this"] } == ": [pid: 12345|app: 0|req: 21/93281]"
insist { subject["message"] } == "BLAHBLAH"
end
end
end
Save this as e.g. grok_demo.rb and test it with bin/logstash rspec grok_demo.rb, and it will work. If you remove the double escapes in the regexp, though, it won't.
I wanted to try the same thing in straight Ruby, using the same regular expression library that Logstash uses, and followed the directions given here. The following test worked as expected, without the need for double escape:
require 'rubygems'
require 'grok-pure'
grok = Grok.new
grok.add_patterns_from_file("/Users/ulas/temp/grok_patterns.txt")
pattern = '%{PROG:theprocess}(?<forgetthis>(: )?(\[[\d:|\s\w/]*\])?:?)%{GREEDYDATA:message}'
grok.compile(pattern)
text1 = 'uwsgi: [pid: 12345|app: 0|req: 21/93281] BLAHBLAH'
puts grok.match(text1).captures()
I'm not a Ruby programmer, and am a bit lost as to what causes this difference. Is it possible that the heredoc config specification necessitates double escapes? Or does it have to do with the way the regular expression gets passed to the regexp library within Logstash?
I've never written tests for Logstash before, but my guess is that the double escape is needed because you have strings embedded in strings.
The section:
<<-CONFIG
# stuff here
CONFIG
is a heredoc in Ruby (which is a fancy way to generate a string). So the filter, grok, match, add_field lines and all the brackets/braces are actually part of that string. Inside this string, you are escaping the escape character, so the resulting string contains a single literal backslash. I'm guessing this string gets eval'd somewhere so that the filter configuration takes effect, and that's where the single backslash is consumed.
When using "straight ruby" you aren't doing this double interpretation. You're just passing a string directly into the method to compile it.
I want to create a regex field in my Mongoid document so that I can have a behavior something like this:
MagicalDoc.create(myregex: /abc\d+xyz/)
MagicalDoc.where(myregex: 'abc123xyz')
I'm not sure if this is possible or what kind of effect it would have. How can I achieve this sort of functionality?
Update: I've learned from the documentation that Mongoid supports Regexp fields but it does not provide an example of how to query for them.
class MagicalDoc
include Mongoid::Document
field :myregex, type: Regexp
end
I would also accept a pure MongoDB answer. I can find a way to convert it to Mongoid syntax.
Update: Thanks to SuperAce99 for helping find this solution. Pass a string to Mongoid's where method and it will be run as a JavaScript $where function:
search_string = 'abc123xyz'
MagicalDoc.where(%Q{ return this.myregex.test("#{search_string}") })
%Q is Ruby's alternative double-quoted string literal syntax, which avoids having to escape the quotes inside.
BSON does have a regular-expression type, but MongoDB's query operators offer no direct way to match a stored regex against a supplied string, so you'll have to look at how Mongoid represents the field to devise a proper query.
Query String using Regex
If you want to send MongoDB a regular expression and return documents MongoDB provides the $regex query operator, which allows you to return documents where a string matches your regular expression.
Query Regex using String
If you want to send Mongo a string and return all documents whose stored regular expression matches the provided string, you'll probably need the $where operator. This allows you to run a JavaScript function against each document:
db.myCollection.find( { $where: function() { return (this.credits == this.debits) } } )
You can define a function that returns true when the provided string matches the regex stored in the document. Obviously this can't use an index, because it has to execute code for every document in the collection, so these queries will be very slow.
How can I get the H1, H2 and H3 contents in a single XPath expression?
I know I could do this.
//html/body/h1/text()
//html/body/h2/text()
//html/body/h3/text()
and so on.
Use:
/html/body/*[self::h1 or self::h2 or self::h3]/text()
The following expression is incorrect:
//html/body/*[local-name() = "h1"
or local-name() = "h2"
or local-name() = "h3"]/text()
because it may select text nodes that are children of unwanted:h1, different:h2, someWeirdNamespace:h3.
Another recommendation: always avoid using // when the structure of the XML document is statically known. Using // most often results in significant inefficiency, because it causes the complete document (sub)tree rooted at the context node to be traversed.