I'm trying to find a means of working through Elasticsearch query parse exceptions in a fashion that doesn't treat me like a machine. I want to be a human and that means that when I have a syntax error in my query I want an informative and concise message.
Is there an existing service / gem / technology that makes this possible? Maybe it's a parser that you feed it ES gibberish query parse exception messages and it gives you back something (more) helpful? Or maybe it's an ES plugin that simply changes how parse exceptions are rendered.
My most wanted characteristics -
Concise, no more than 80 characters to summarize the problem and another 200 to explain how to fix it
It tells me exactly where in my query the error occurred (too often the error coarsely directs my debugging efforts; as in from[-1],size[-1]: Parse Failure [Expected [START_OBJECT] under [and], but got a [START_ARRAY] in [filter]]], which vaguely directs my attention to help me debug, but surely it could tell me at which line or character the syntax error occurred)
Human readable - it gets rid of the machine friendly cruft like {, (, and ; and uses proper English instead of jargon.
Your help in reducing the cognitive burden imparted by these exceptions is greatly appreciated.
I'm not sure if this helps, but if I have a query that is failing for a reason I don't understand, I like to use the Elasticsearch validate API.
So for my queries, I will do
curl -XPOST 'localhost:9200/<index>/_validate/query?explain=true&pretty=true' -d '{
"query": {"match_all": {}}
}'
It will take the query and run it through the validator, and if it fails, it will display the specific error it detected without all that cruft.
Hope that is helpful!
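If you would rather script this than use curl, here is a minimal Python sketch of the same call. The helper names (build_validate_request, summarize_validation), the index name "my-index", and the sample response are hypothetical; the response is only assumed to be shaped like the validate API's explain output, with a top-level "valid" flag and an "explanations" list whose entries may carry an "error" string.

```python
import json
import urllib.request


def build_validate_request(base_url, index, query_body):
    """Build a POST request for <base_url>/<index>/_validate/query?explain=true."""
    url = f"{base_url}/{index}/_validate/query?explain=true"
    data = json.dumps(query_body).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize_validation(response):
    """Return the error strings found in a validate-API response dict."""
    if response.get("valid"):
        return []
    return [
        item["error"]
        for item in response.get("explanations", [])
        if "error" in item
    ]


# Hypothetical response, assumed to match the shape of the explain output.
sample = {
    "valid": False,
    "explanations": [
        {"index": "my-index", "valid": False, "error": "example parse error text"}
    ],
}
req = build_validate_request(
    "http://localhost:9200", "my-index", {"query": {"match_all": {}}}
)
errors = summarize_validation(sample)
```

To actually send the request, pass req to urllib.request.urlopen and decode the body with json.load before handing it to summarize_validation.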
I'm using zerolog in Go, which outputs JSON-formatted logs. The app is running on k8s, and the logs are in the CRI-O format, as follows.
actual log screenshot on Grafana loki
My question: since some non-JSON text is prepended to my JSON log, I can't query the log effectively. For example, when I tried to pipe the log into logfmt, exceptions were thrown.
What I want is to be able to query into the sub-fields of the JSON.
My intuition is that, for each log line, I could select only the part starting from { (the start of the JSON), and then do more interesting manipulation. I'm a bit stuck and not sure what the best way to proceed is.
Any help or comments are appreciated.
After some head scratching, problem solved.
I was using the promtail setup from https://raw.githubusercontent.com/grafana/loki/master/tools/promtail.sh directly.
In that setup the default parser is docker, but it needs to be changed to cri. After that change, the logs are properly parsed as JSON in my Grafana dashboard.
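To illustrate why the parser choice matters, here is a small Python sketch of what stripping the CRI prefix amounts to. It assumes the usual CRI line layout of timestamp, stream, a one-letter tag, and then the message; the function name parse_cri_line and the sample line are hypothetical, and this is not promtail's actual implementation.

```python
import json


def parse_cri_line(line):
    """Split a CRI-format line into (timestamp, stream, tag, payload)."""
    # Only the first three spaces are separators; the payload keeps its own.
    timestamp, stream, tag, payload = line.split(" ", 3)
    return timestamp, stream, tag, payload


# Hypothetical log line: CRI prefix followed by a zerolog-style JSON object.
line = (
    "2021-03-01T10:15:30.123456789Z stdout F "
    '{"level":"info","message":"started","port":8080}'
)
timestamp, stream, tag, payload = parse_cri_line(line)
record = json.loads(payload)
```

Once the prefix is gone, the remainder is plain JSON, which is why sub-fields become queryable after switching the parser.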
I'm doing some custom error handling/logging with Elasticsearch in a .NET environment using Elasticsearch.NET. Given an IResponse object, I'm trying to arrive at the best strategy for plucking out a short, succinct, and useful "root cause" message. I originally arrived at this, which works great when we come across indexing errors specifically:
shortMsg = response.ServerError?.Error?.RootCause?.FirstOrDefault()?.Reason;
But I recently ran into a query-time error where the above gave me this:
"failed to create query: { ... }"
(Details left out, but it effectively dumped the entire query.)
Since that isn't particularly useful, I spent a little time traversing the response to see what else is available. response.ServerError.Error.Reason, for example, returns "all shards failed", which is also not particularly useful. response.DebugInformation is much bigger than what I'd like for this particular purpose, but I did find the needle in the haystack I was looking for toward the end of it:
"Can't parse boolean value [True], expected [true] or [false]"
That's perfect, and to avoid parsing it out of DebugInformation I also managed to find it here:
response.ServerError.Error.Metadata.FailedShards.First().Reason.CausedBy.Reason
So at this point I've arrived at this strategy to get my shortMsg:
shortMsg =
response.ServerError?.Error?.Metadata?.FailedShards?.FirstOrDefault()?.Reason?.CausedBy?.Reason ??
response.ServerError?.Error?.RootCause?.FirstOrDefault()?.Reason;
My concern with this is that it might be naive to assume that if something exists along the first path, it'll always be "better" than the second. A better understanding of the response structure itself might be key to arriving at the best strategy here.
Any suggestions on improving this?
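One possible refinement, sketched in Python against the raw JSON error body rather than the Elasticsearch.NET object model: follow the caused_by chain of each failed shard to its innermost reason, and fall back to root_cause and then the top-level reason. The function names and the sample body are hypothetical, and "innermost cause is most specific" is a heuristic I am assuming, not documented behavior.

```python
def deepest_cause(reason):
    """Follow the caused_by chain and return the innermost reason string."""
    node = reason
    while isinstance(node.get("caused_by"), dict):
        node = node["caused_by"]
    return node.get("reason")


def short_message(body):
    """Pick a short message from an Elasticsearch error response body."""
    error = body.get("error") or {}
    if not isinstance(error, dict):
        # Some responses carry the error as a plain string.
        return error
    for shard in error.get("failed_shards") or []:
        msg = deepest_cause(shard.get("reason") or {})
        if msg:
            return msg
    for cause in error.get("root_cause") or []:
        if cause.get("reason"):
            return cause["reason"]
    return error.get("reason")


# Hypothetical body modeled on the query-time failure described above.
query_failure = {
    "error": {
        "root_cause": [{"reason": "failed to create query: { ... }"}],
        "reason": "all shards failed",
        "failed_shards": [
            {
                "reason": {
                    "reason": "failed to create query: { ... }",
                    "caused_by": {
                        "reason": "Can't parse boolean value [True], expected [true] or [false]"
                    },
                }
            }
        ],
    }
}
message = short_message(query_failure)
```

When a failed shard has no caused_by, this returns the shard's own reason, which is usually the same text as the root cause, so the ordering does no harm in that case.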
I have this regular expression to find syntax errors on webpages (I'm a pentester for a living):
SQL_REGEX = %r((?-mix:SQL query error)|(?-mix:MySQL Query Error)|(?-mix:expects parameter)|(?-mix:You have an error in your SQL syntax))
I would like a regex that will find the error messages on a website if they have incorrectly closed SQL syntax. The one above works, but it seems to me that it's a little slower than it could be. Any suggestions on how to make a better, more reliable regex?
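For comparison, here is the same idea as a Python sketch rather than Ruby: one precompiled alternation, matched case-insensitively. Note that this changes behavior relative to the original pattern, which is case-sensitive; with case folding, "SQL query error" also covers "MySQL Query Error", so that alternative is dropped. The name find_sql_error is hypothetical, and whether this is measurably faster would need benchmarking on real pages.

```python
import re

# Literal phrases from the original pattern; IGNORECASE lets
# "SQL query error" also match inside "MySQL Query Error".
SQL_ERROR_RE = re.compile(
    r"SQL query error|expects parameter|You have an error in your SQL syntax",
    re.IGNORECASE,
)


def find_sql_error(html):
    """Return the first matching error phrase in the page text, or None."""
    match = SQL_ERROR_RE.search(html)
    return match.group(0) if match else None


hit = find_sql_error("Warning: MySQL Query Error near line 3")
miss = find_sql_error("<p>all good</p>")
```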
I'm running a very standard ELK server to parse my Python application's logs. I set up Python to output the logs in JSON with the log message string in a field 'msg'. This has been working really well for me, but someone accidentally spammed the logs last night with a dictionary passed directly to the message field. Because not much else was being logged last night, the first 'msg' the new index saw was parsed as an object. Now all the properly formatted log messages are being rejected with the error:
"error"=>{"type"=>"mapper_parsing_exception", "reason"=>"object mapping for [msg] tried to parse field [msg] as object, but found a concrete value"}}}, :level=>:warn}
I understand that a single Elasticsearch field can't hold both objects and strings. Does anyone know the best way to set the field type? Should this be done by mutating the messages with a Logstash filter, by setting the Elasticsearch mapping, or both? Or should I pre-process the logs in the Python formatter to ensure msg can't be parsed as JSON? All three options seem relatively straightforward, but I really don't understand the trade-offs.
Any recommendations?
Specifying the mapping is decidedly the best practice.
Specifying a "text" or "keyword" type would not only prevent the error that you saw, but would have other beneficial effects in performance.
I would recommend the logstash json_encode filter only if you knew the input was always json and for some reason didn't want it parsed into its constituents (for example, if it was very sparse that would be bad for performance).
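As a complement to an explicit mapping, the third option from the question (guarding in the Python formatter) could look roughly like the sketch below. The class name StringMsgJsonFormatter and the two output fields are hypothetical; the only point is that a dict or list passed as the message is serialized to a string before it reaches the 'msg' field.

```python
import json
import logging


class StringMsgJsonFormatter(logging.Formatter):
    """Hypothetical formatter: emits JSON in which 'msg' is always a string."""

    def format(self, record):
        msg = record.msg
        if isinstance(msg, (dict, list)):
            # Serialize structured messages so 'msg' never becomes an object.
            msg = json.dumps(msg, default=str, sort_keys=True)
        else:
            msg = record.getMessage()
        return json.dumps({"level": record.levelname, "msg": msg})


# Simulate the accident from the question: a dict logged as the message.
record = logging.LogRecord(
    "app", logging.INFO, "example.py", 0, {"user": "alice"}, None, None
)
output = StringMsgJsonFormatter().format(record)
parsed = json.loads(output)
```

This does not replace the mapping: the mapping protects the index from any producer, while the formatter only protects against this one application.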
I am using the Ruby gem json_pure, and when I get parsing errors I am not able to determine the line number where the error is occurring. I was expecting to find a validator written in Ruby that would tell me the line number. What is the best Ruby approach to finding JSON errors quickly?
Thanks!
You could try Kwalify: http://www.kuwata-lab.com/kwalify/ruby/users-guide.html
It's not just a JSON/YAML validator, but you give it a schema and it validates against that.
You could use that to verify that your config file is correct (as per your definition of "correct") as well as correct JSON (or YAML), and it will tell you what line number the error happened on, and a bit of context for the error as well.
I removed a ']' from a sample in their documentation and the error message it gave me was
ERROR: document12a.json:10:1 [/favorite] flow sequence is not closed by ']'.
It also does data binding class generation if you want. It seems like a pretty good configuration management/validation tool.
You might find it easier to use http://www.jsonlint.com/ to check the JSON is valid - it will highlight any problematic lines.
It's an old question, but it's still an issue in current Ruby.
The default JSON parser in Ruby does not tell you the line number or column number where the parse error occurred. It just reports a 'parse' error and that's it.
You could change the parser to Oj and use with MultiJson gem.
Oj is great: it's very fast, and it will show the line and even the column number where the parsing error occurred.
Here is a sample error for a very big one-line JSON document (170KB+), where Oj reports column 102421:
.../adapters/oj.rb:15:in `load': unexpected character at line 1, column 102421 [parse.c:666] (MultiJson::ParseError)
JSON is supposed to be sent over the network as a string, so there's actually only one line.
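If stepping outside Ruby is an option for a quick check, Python's standard json module exposes the position of a parse error directly through the lineno, colno, and msg attributes of json.JSONDecodeError. A minimal sketch follows; the function name locate_json_error and the broken sample document are hypothetical.

```python
import json


def locate_json_error(text):
    """Return (line, column, message) for the first parse error, or None."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        return exc.lineno, exc.colno, exc.msg
    return None


# Hypothetical multi-line document with an unclosed array.
broken = '{\n  "name": "a",\n  "favorite": [1, 2\n}'
result = locate_json_error(broken)
valid = locate_json_error('{"name": "a"}')
```

For a one-line document the line number is always 1, so the column is the useful part, as in the Oj example above.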