How to ignore highlighted fields in a kibana query? - elasticsearch

I'm trying to make a query in kibana that shows all the errors in a service, but the results only shows the data with the field "highlight", how can I ignore it?
I've tried making a DSL query like this:
{
"query": {
"exists": {
"field": "payload.error"
}
}
}
but it does not work as I expect
This is the structure of the data that is show in the query answer:
"payload": {
"method": "standardError",
"error": {
"code": "300",
"detail": "{\"Cliente no posee fecha\"}",
"message": "BUS_ERROR"
}
},
"highlight": {
"payload.error.code.keyword": [
"#kibana-highlighted-field#107#/kibana-highlighted-field#"
]
}
The data that is not show in the query result does not have the field "highlight" but have the exact same payload structure
I expect a query that shows all the data with the field payload.error, no matter if it has a highlight field or not

Not sure what you are looking for, but can't you just use the Search Box on the top and write something like:
payload.error:* AND highlight.payload.error.code.keyword:
That will give you all the hits that fulfill them both. Or:
payload.error:*
That will give you all the hits where "payload.error" is used.
As I read your example, then highlight doesn't really have anything to do with the first payload, you have above and therefore searching only for "payload.error:*" should be enough?

Related

Elastic query bool must match issue

Below is the query part in Elastic GET API via command line inside openshift pod , i get all the match query as well as unmatch element in the fetch of 2000 documents. how can i limit to only the match element.
i want to specifically get {\"kubernetes.container_name\":\"xyz\"}} only.
any suggestions will be appreciated
-d ' {\"query\": { \"bool\" :{\"must\" :{\"match\" :{\"kubernetes.container_name\":\"xyz\"}},\"filter\" : {\"range\": {\"#timestamp\": {\"gte\": \"now-2m\",\"lt\": \"now-1m\"}}}}},\"_source\":[\"#timestamp\",\"message\",\"kubernetes.container_name\"],\"size\":2000}'"
For exact matches there are two things you would need to do:
Make use of Term Queries
Ensure that the field is of type keyword datatype.
Text datatype goes through Analysis phase.
For e.g. if you data is This is a beautiful day, during ingestion, text datatype would break down the words into tokens, lowercase them [this, is, a, beautiful, day] and then add them to the inverted index. This process happens via Standard Analyzer which is the default analyzer applied on text field.
So now when you query, it would again apply the analyzer at querying time and would search if the words are present in the respective documents. As a result you see documents even without exact match appearing.
In order to do an exact match, you would need to make use of keyword fields as it does not goes through the analysis phase.
What I'd suggest is to create a keyword sibling field for text field that you have in below manner and then re-ingest all the data:
Mapping:
PUT my_sample_index
{
"mappings": {
"properties": {
"kubernetes":{
"type": "object",
"properties": {
"container_name": {
"type": "text",
"fields":{ <--- Note this
"keyword":{ <--- This is container_name.keyword field
"type": "keyword"
}
}
}
}
}
}
}
}
Note that I'm assuming you are making use of object type.
Request Query:
POST my_sample_index
{
"query":{
"bool": {
"must": [
{
"term": {
"kubernetes.container_name.keyword": {
"value": "xyz"
}
}
}
]
}
}
}
Hope this helps!

How to extract and visualize values from a log entry in OpenShift EFK stack

I have an OKD cluster setup with EFK stack for logging, as described here. I have never worked with one of the components before.
One deployment logs requests that contain a specific value that I'm interested in. I would like to extract just this value and visualize it with an area map in Kibana that shows the amount of requests and where they come from.
The content of the message field basically looks like this:
[fooServiceClient#doStuff] {"somekey":"somevalue", "multivalue-key": {"plz":"12345", "foo": "bar"}, "someotherkey":"someothervalue"}
This plz is a German zip code, which I would like to visualize as described.
My problem here is that I have no idea how to extract this value.
A nice first success would be if I could find it with a regexp, but Kibana doesn't seem to work the way I think it does. Following its docs, I expect this /\"plz\":\"[0-9]{5}\"/ to deliver me the result, but I get 0 hits (time interval is set correctly). Even if this regexp matches, I would only find the log entry where this is contained and not just the specifc value. How do I go on here?
I guess I also need an external geocoding service, but at which point would I include it? Or does Kibana itself know how to map zip codes to geometries?
A beginner-friendly step-by-step guide would be perfect, but I could settle for some hints that guide me there.
It would be possible to parse the message field as the document gets indexed into ES, using an ingest pipeline with grok processor.
First, create the ingest pipeline like this:
PUT _ingest/pipeline/parse-plz
{
"processors": [
{
"grok": {
"field": "message",
"patterns": [
"%{POSINT:plz}"
]
}
}
]
}
Then, when you index your data, you simply reference that pipeline:
PUT plz/_doc/1?pipeline=parse-plz
{
"message": """[fooServiceClient#doStuff] {"somekey":"somevalue", "multivalue-key": {"plz":"12345", "foo": "bar"}, "someotherkey":"someothervalue"}"""
}
And you will end up with a document like the one below, which now has a field called plz with the 12345 value in it:
{
"message": """[fooServiceClient#doStuff] {"somekey":"somevalue", "multivalue-key": {"plz":"12345", "foo": "bar"}, "someotherkey":"someothervalue"}""",
"plz": "12345"
}
When indexing your document from Fluentd, you can specify a pipeline to be used in the configuration. If you can't or don't want to modify your Fluentd configuration, you can also define a default pipeline for your index that will kick in every time a new document is indexed. Simply run this on your index and you won't need to specify ?pipeline=parse-plz when indexing documents:
PUT index/_settings
{
"index.default_pipeline": "parse-plz"
}
If you have several indexes, a better approach might be to define an index template instead, so that whenever a new index called project.foo-something is created, the settings are going to be applied:
PUT _template/project-indexes
{
"index_patterns": ["project.foo*"],
"settings": {
"index.default_pipeline": "parse-plz"
}
}
Now, in order to map that PLZ on a map, you'll first need to find a data set that provides you with geolocations for each PLZ.
You can then add a second processor in your pipeline in order to do the PLZ/ZIP to lat,lon mapping:
PUT _ingest/pipeline/parse-plz
{
"processors": [
{
"grok": {
"field": "message",
"patterns": [
"%{POSINT:plz}"
]
}
},
{
"script": {
"lang": "painless",
"source": "ctx.location = params[ctx.plz];",
"params": {
"12345": {"lat": 42.36, "lon": 7.33}
}
}
}
]
}
Ultimately, your document will look like this and you'll be able to leverage the location field in a Kibana visualization:
{
"message": """[fooServiceClient#doStuff] {"somekey":"somevalue", "multivalue-key": {"plz":"12345", "foo": "bar"}, "someotherkey":"someothervalue"}""",
"plz": "12345",
"location": {
"lat": 42.36,
"lon": 7.33
}
}
So to sum it all up, it all boils down to only two things:
Create an ingest pipeline to parse documents as they get indexed
Create an index template for all project* indexes whose settings include the pipeline created in step 1

Search within the results got from elasticsearch

Is it possible to search within the results that I get from elasticsearch?
To achieve that currently I need to run & wait for two searches on elasticsearch: the first search is
{ "match": { "title": "foo" } }
It takes 5 seconds and returns 500 docs etc.. And then a second search
{
"bool": {
"must": [
{ "match": { "title": "foo" } },
{ "match": { "title": "bar" } }
]
}
}
It takes another 5 seconds and returns 200 docs, which basically has nothing to do with the first search from elasticsearch's perspective.
Instead of doing it this way, I'd like to offer a "search further within the result" option to my users. Hopefully with this option, users can make a search with more keyword provided based on the result returned from the first search.
So my scenario is that a user makes a first search with keyword "foo", and gets 500 results on the webpage, and then selects "search further within the result", to make a second search within the 500 results, and hope to get some refined results really quick.
How can I achive it? Thanks!
What you could do is use the IDS query. Collect all document IDs from the first request, and then post them with a new Bool query that includes an IDS query in a must clause next to the original query. You could efficiently collect the IDs in the first request using the Scroll API. Since you will return the second result sorted anyway, it does not make sense to do any sorting in the first request, so you can speed up the first request.
See:
Scroll API: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html
IDS Query: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-ids-query.html
post filter is a way to search inside an other search.
In your case :
GET _search
{
"query": {
"match": {
"title": "foo"
}
},
"post_filter": {
"match": {
"title": "bar"
}
}
}
post_filter will be executed on the query result.

Find documents in Elasticsearch where `ignore_malformed` was triggered

Elasticsearch by default throws an exception if inserting data to a field which does not fit the existing type. For example, if a field has been created as number type, inserting a document with a string value for that field causes an error.
This behavior can be changed by enabling then ignore_malformed setting, which means such fields are silently ignored for indexing purposes, but retained in the _source document - meaning that the invalid values cannot be searched or aggregated, but are still included in the returned document.
This is preferable behavior in our use case, but we would wish to be able to locate such documents somehow so we can fix them in the future.
Is there any way to somehow flag documents for which some malformed fields were ignored? We control the document insertion process fully, so we can modify all insertion flags, or do a trial insert, or anything, to reach our goal.
You can use the exists query to find document where this field does not exist, see this example
PUT foo
{
"mappings": {
"bar": {
"properties": {
"baz": {
"type": "integer",
"ignore_malformed": true
}
}
}
}
}
PUT foo/bar/1
{
"baz": "field"
}
GET foo/bar/_search
{
"query": {
"bool": {
"filter": {
"bool": {
"must_not": [
{
"exists": {
"field": "baz"
}
}
]
}
}
}
}
}
There is no dedicated mechanism though, so this search finds also documents where the field is not set intentionally
You cannot, when you search on elasticsearch, you don't search on document source but on the inverted index, which contains the analyzed data.
ignore_malformed flag is saying "always store document, analyze if possible".
You can try, create a mal-formed document, and use _termvectors API to see how the document is analyzed and stored in the inverted index, in a case of a string field, you can see an "Array" is stored as an empty string etc.. but the field will exists.
So forget the inverted index, let's use the source!
Scroll all your data until you find the anomaly, I use a small python script that search scroll, unserialize and I test field type for every documents (very long) but I can have a list of wrong document IDs.
Use a script query can be very long and crash your cluster, use with caution, maybe as a post_filter:
Here I want to retrieve the document where country_name is not a string:
{
"_source": false,
"timeout" : "30s",
"query" : {
"query_string" : {
"query" : "locale:de_ch"
}
},
"post_filter": {
"script": {
"script": "!(_source.country_name instanceof String)"
}
}
}
"_source:false" => I want only document ID
"timeout" => prevent crash
As you notice, this is a missing feature, I know logstash will tag
document that fail, so elasticsearch could implement the same thing.

Sorting a match query with ElasticSearch

I'm trying to use ElasticSearch to find all records containing a particular string. I'm using a match query for this, and it's working fine.
Now, I'm trying to sort the results based on a particular field. When I try this, I get some very unexpected output, and none of the records even contain my initial search query.
My request is structured as follows:
{
"query":
{
"match": {"_all": "some_search_string"}
},
"sort": [
{
"some_field": {
"order": "asc"
}
}
] }
Am I doing something wrong here?
In order to sort on a string field, your mapping must contain a non-analyzed version of this field. Here's a simple blog post I found that describes how you can do this using the multi_field mapping type.

Resources