I'm running a very standard ELK server to parse my Python application's logs. I set up Python to output the logs in JSON with the log message string in a field 'msg'. This has been working really well for me, but someone accidentally spammed the logs last night with a dictionary passed directly to the message field. Because not much else was being logged last night, the first 'msg' the new index saw was parsed as an object. Now all the properly formatted log messages are being rejected with the error:
"error"=>{"type"=>"mapper_parsing_exception", "reason"=>"object mapping for [msg] tried to parse field [msg] as object, but found a concrete value"}}}, :level=>:warn}
I understand that Elasticsearch can't handle both objects and strings in the same field. Does anyone know the best way to set the field type? Should this be done by mutating them with a Logstash filter, by setting the Elasticsearch mapping, or both? Or should I pre-process the logs in the Python formatter to ensure the msg can't be parsed as JSON? All three options seem relatively straightforward, but I really don't understand the trade-offs.
Any recommendations?
Specifying the mapping is decidedly the best practice.
Specifying a "text" or "keyword" type would not only prevent the error that you saw, but would also have other beneficial effects on performance.
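As a minimal sketch (the index name here is a placeholder, and since Logstash typically writes to daily indices you would normally put this into an index template rather than onto a single index), the explicit mapping could look like:

# on Elasticsearch 5.x/6.x you would nest "properties" under your document type
curl -XPUT 'localhost:9200/python-logs' -H 'Content-Type: application/json' -d '{
  "mappings": {
    "properties": {
      "msg": { "type": "text" }
    }
  }
}'

With that in place, a dictionary logged into msg gets rejected for that one document instead of poisoning the mapping for everything that follows.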
I would recommend the Logstash json_encode filter only if you knew the input was always JSON and for some reason didn't want it parsed into its constituents (for example, if it was very sparse, that would be bad for performance).
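If you did go the json_encode route, a rough sketch would be something like the following (this assumes the logstash-filter-json_encode plugin is installed and, if I recall its options correctly, that omitting target makes it overwrite the source field with the encoded string):

filter {
  # re-serialize whatever ended up in msg so Elasticsearch always sees a string
  json_encode {
    source => "msg"
  }
}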
Related
I'm using zerolog in Go, which outputs JSON-formatted logs. The app is running on k8s, and the log lines end up in CRI-O format, as in the screenshot below.
[screenshot of the actual log in Grafana Loki]
My question is: since there's some non-JSON text prepended to my JSON log, I can't seem to query the log effectively. One example: when I tried to pipe the log into logfmt, exceptions were thrown.
What I want is to be able to query into the sub-fields of the JSON.
My intuition is that, for each log line, I could maybe select only the part starting from { (the start of the JSON); then maybe I can do more interesting manipulation. I'm a bit stuck and not sure of the best way to proceed.
Any help and comments are appreciated.
After some head scratching, the problem is solved.
I'm directly using the promtail setup from here: https://raw.githubusercontent.com/grafana/loki/master/tools/promtail.sh
Within this setup, the default parser is docker, but it needs to be changed to cri; afterwards, the logs are properly parsed as JSON in my Grafana dashboard.
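For reference, the relevant piece of the generated promtail config ends up looking roughly like this (the job_name is just whatever the default setup already uses; older promtail versions express the same thing as an entry_parser: cri setting instead of a pipeline stage):

scrape_configs:
  - job_name: kubernetes-pods   # name taken from the default setup, adjust to yours
    pipeline_stages:
      - cri: {}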
I have a field in my logs called json_path containing data like /nfs/abc/123/subdir/blah.json and I want to create a count plot on part of the string, abc here, i.e. the third chunk when splitting on the / token. I have tried all sorts of online answers, but they're all partial answers (nothing I can easily understand how to use or integrate). I've tried running POST/GET queries in the Console, which all failed due to syntax errors I couldn't manage to debug (they were complaining about newline control chars, when there were none that I could obviously see, or see in a text editor explicitly showing control characters). I also tried Management -> Index Patterns -> Scripted Field, but after adding my code there, basically the whole of Kibana crashed (stopped working temporarily) until I removed that Scripted Field.
All this Elasticsearch and Kibana stuff is annoyingly difficult; all the docs expect you to be an expert in their tool, rather than just an engineer needing to visualize some data.
I don't really want to add a new data field in my log-generation code, because then all my old logs will be unsupported (they have the relevant data; it just needs a bit of string processing before data viz). I know I could probably back-annotate the old logs, but the whole Kibana/Elasticsearch experience is just frustrating, and I don't use it enough to justify learning such detailed procedures (I actually learned a bunch of this stuff a year ago, and then promptly forgot it due to lack of use).
You cannot plot on a substring of a field unless you extract that substring into a new field. I can understand the frustration in learning a new product, but to achieve what you want you need to have that substring value in a new field. Scripted fields are generally used to modify a field. To extract a substring from a field, I'd recommend using an Ingest Node processor such as the grok processor. This will add a new field which you can use to plot in Kibana visualizations.
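As a rough sketch (the pipeline and target field names here are made up, and you'd still need to attach the pipeline to your index, or reindex old data through it, before the new field shows up in Kibana):

curl -XPUT 'localhost:9200/_ingest/pipeline/extract-json-path-part' -H 'Content-Type: application/json' -d '{
  "description": "copy the third /-separated chunk of json_path into its own field",
  "processors": [
    {
      "grok": {
        "field": "json_path",
        "patterns": ["^/%{DATA}/%{DATA:json_path_part}/"]
      }
    }
  ]
}'

For /nfs/abc/123/subdir/blah.json this would populate json_path_part with abc, which you can then use directly in a Kibana count plot or terms aggregation.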
I'm looking into creating a mapping for a variety of data that's mostly simple, and have not done this before. One of the fields I need to map is a string representation of JSON that ideally would be JSON in Elasticsearch.
The json string value sample:
"{\"United States\":{\"original\":\"United States\",\"importance\":\"1\"},\"Protestantism\":{\"original\":\"Protestantism\",\"importance\":\"1\"}}"
The size and values of the map representation are dynamic, so I understand that having it as a string makes sense, but I'm hoping there is a way to have it as JSON.
EDIT:
I just discovered there is a way to set up a pipeline processor designed to make this conversion: https://www.elastic.co/guide/en/elasticsearch/reference/5.3/json-processor.html
I have not tried it yet, but this seems to be the closest thing so far to what I was looking for.
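For what it's worth, a minimal sketch of that json processor (the field names are placeholders; leaving out target_field writes the parsed object back into the source field instead):

curl -XPUT 'localhost:9200/_ingest/pipeline/parse-json-string' -H 'Content-Type: application/json' -d '{
  "processors": [
    {
      "json": {
        "field": "my_json_string",
        "target_field": "my_json_object"
      }
    }
  ]
}'

Keep in mind that once the string is expanded into an object, every distinct key (like the country and religion names in the sample) becomes its own mapped field, which can get expensive if the keys are very dynamic.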
I have just started with Logstash. I have log files in which a whole object is printed. Since my object is huge, I can't write grok patterns for the whole object, and I only expect two values out of that object. Can you please let us know how I can get that?
My log files look like below:
2015-06-10 13:02:57,903 your done OBJ[name:test;loc:blr;country:india,acc:test#abe.com]
This is just an example; my object has a lot of attributes in it. From those objects I need to get only name and acc.
Regards
Mohan.
You can use the following pattern for the same
%{GREEDYDATA}\[name:%{WORD:name};%{GREEDYDATA},acc:%{NOTSPACE:account}\]
GREEDYDATA is defined as follows:
GREEDYDATA .*
The key lies in understanding the GREEDYDATA macro.
It eats up as many characters as possible.
Logstash patterns don't have to match the entire line. You could also pull the leading information off (date, time, etc) in one grok{} and then use a different grok{} to pull off just the two fields that you want.
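As a sketch of that two-grok approach (the temporary field name rest and the timestamp handling are assumptions based on the sample line, not something from the original answer):

filter {
  # first grok: peel off the timestamp and keep the remainder in a temporary field
  grok {
    match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{GREEDYDATA:rest}" }
  }
  # second grok: pull only the two wanted values out of the object dump
  grok {
    match => { "rest" => "\[name:%{WORD:name};%{GREEDYDATA},acc:%{NOTSPACE:account}\]" }
  }
  # optionally drop the temporary field afterwards
  mutate { remove_field => ["rest"] }
}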
I'm trying to find a means of working through Elasticsearch query parse exceptions in a fashion that doesn't treat me like a machine. I want to be a human and that means that when I have a syntax error in my query I want an informative and concise message.
Is there an existing service / gem / technology that makes this possible? Maybe it's a parser that you feed ES's gibberish query parse exception messages and it gives you back something (more) helpful? Or maybe it's an ES plugin that simply changes how parse exceptions are rendered.
My most wanted characteristics -
Concise, no more than 80 characters to summarize the problem and another 200 to explain how to fix it
It tells me exactly where in my query the error occurred (too often the error coarsely directs my debugging efforts; as in from[-1],size[-1]: Parse Failure [Expected [START_OBJECT] under [and], but got a [START_ARRAY] in [filter]]], which vaguely directs my attention to help me debug, but surely it could tell me at which line or character the syntax error occurred)
Human readable - it gets rid of the machine friendly cruft like {, (, and ; and uses proper English instead of jargon.
Your help in reducing the cognitive burden imparted by these exceptions is greatly appreciated.
I'm not sure if this helps, but if I have a query that is failing for a reason I don't understand, I like to use the Elasticsearch validate API.
So for my queries, I will do:
curl -XPOST 'localhost:9200/<index>/_validate/query?explain=true&pretty=true' -H 'Content-Type: application/json' -d '{
  "query": {"match_all": {}}
}'
It will take the query and run it through the validator, and if it fails, it will display the specific error it detected without all that cruft.
Hope that is helpful!