Elastic query bool must match issue - elasticsearch

Below is the query part in Elastic GET API via command line inside openshift pod , i get all the match query as well as unmatch element in the fetch of 2000 documents. how can i limit to only the match element.
i want to specifically get {\"kubernetes.container_name\":\"xyz\"}} only.
any suggestions will be appreciated
-d ' {\"query\": { \"bool\" :{\"must\" :{\"match\" :{\"kubernetes.container_name\":\"xyz\"}},\"filter\" : {\"range\": {\"#timestamp\": {\"gte\": \"now-2m\",\"lt\": \"now-1m\"}}}}},\"_source\":[\"#timestamp\",\"message\",\"kubernetes.container_name\"],\"size\":2000}'"

For exact matches there are two things you would need to do:
Make use of Term Queries
Ensure that the field is of type keyword datatype.
Text datatype goes through Analysis phase.
For e.g. if you data is This is a beautiful day, during ingestion, text datatype would break down the words into tokens, lowercase them [this, is, a, beautiful, day] and then add them to the inverted index. This process happens via Standard Analyzer which is the default analyzer applied on text field.
So now when you query, it would again apply the analyzer at querying time and would search if the words are present in the respective documents. As a result you see documents even without exact match appearing.
In order to do an exact match, you would need to make use of keyword fields as it does not goes through the analysis phase.
What I'd suggest is to create a keyword sibling field for text field that you have in below manner and then re-ingest all the data:
Mapping:
PUT my_sample_index
{
"mappings": {
"properties": {
"kubernetes":{
"type": "object",
"properties": {
"container_name": {
"type": "text",
"fields":{ <--- Note this
"keyword":{ <--- This is container_name.keyword field
"type": "keyword"
}
}
}
}
}
}
}
}
Note that I'm assuming you are making use of object type.
Request Query:
POST my_sample_index
{
"query":{
"bool": {
"must": [
{
"term": {
"kubernetes.container_name.keyword": {
"value": "xyz"
}
}
}
]
}
}
}
Hope this helps!

Related

Elasticsearch: why can't find by `term` query but can find by `match` query?

I am using Elasticsearch 11 for query text.
I have below query but it doesn't return any document.
POST/_search
{
"query": {
"term":{
"metric_name" : {"value": "ConsumedReadCapacityUnits","boost": 1.0}
}
}
}
Then I change it to text query like below which can find the matched document:
POST/_search
{
"query": {
"match":{
"metric_name" : "ConsumedReadCapacityUnits"
}
}
}
Based on the doc in term query, it matches exact term but the value ConsumedReadCapacityUnits is an exact one for metric_name, so why term query doesn't return anything?
Match query analyzes the search term, based on the standard analyzer (if no analyzer is specified) and then matches the analyzed term with the terms stored in the inverted index. By default text type field uses a standard analyzer if no analyzer is specified. For eg. SchooL gets analyzed to school
Term query returns documents that contain an exact term in a provided field. If you have not defined any explicit index mapping, then you need to add .keyword to the field. This uses the keyword analyzer instead of the standard analyzer.
As mentioned in the comments above mapping type of ConsumedReadCapacityUnits is text, so you can perform term query on ConsumedReadCapacityUnits by updating your index mapping
If you want to store the ConsumedReadCapacityUnits field as of both text and keyword type, then you can update your index mapping as shown below to use multi fields
PUT /_mapping
{
"properties": {
"ConsumedReadCapacityUnits": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
}
And then reindex the data again. After this, you will be able to perform term query using the "ConsumedReadCapacityUnits.keyword" field as of keyword type and "ConsumedReadCapacityUnits" as of text type
OR the other way is to create a new index, with the below index mapping
{
"mappings": {
"properties": {
"ConsumedReadCapacityUnits": {
"type": "keyword"
}
}
}
}
And then index the data in this new index

Elasticsearch 6.2: terms query require lowercase input when searching on keyword

I've created an example index, with the following mapping:
{
"_doc": {
"_source": {
"enabled": False
},
"properties": {
"status": { "type": "keyword" }
}
}
}
And indexed a document:
{"status": "CMP"}
When searching the documents with this status with a terms query, I find no results:
{
"query" : {
"terms": { "status": ["CMP"]}
}
}
However, if I make the same query by putting the input in lowercase, I will find my document:
{
"query" : {
"terms": { "status": ["cmp"]}
}
}
Why is it? Since I'm searching on a keyword field, the indexed content should not be analyzed and should match an uppercase value...
no more #Oliver Charlesworth Now - in Elastic 6.x - you could continue to use a keyword datatype, lowercasing your text with a normalizer,doc here. However in every cases you should change your index mapping and reindex your docs
The index and mapping creation and the search were part of a test suite. It seems that the setup part of the test suite was not executed, and the mapping was not applied to the index.
The index was then using the default types instead of the mapping types, resulting of the use of string fields instead of keywords.
After changing the setup method of the automated tests, the mappings are well applied to the index, and the uppercase values for the status "CMP" are now matching documents.
The symptoms you're seeing shouldn't occur, unless something else is wrong.
A keyword index is not analysed, so your index should contain only CMP. A terms query is also not analysed, etc. so your index is searched only for CMP. Hence there should be a match.

Find documents in Elasticsearch where `ignore_malformed` was triggered

Elasticsearch by default throws an exception if inserting data to a field which does not fit the existing type. For example, if a field has been created as number type, inserting a document with a string value for that field causes an error.
This behavior can be changed by enabling then ignore_malformed setting, which means such fields are silently ignored for indexing purposes, but retained in the _source document - meaning that the invalid values cannot be searched or aggregated, but are still included in the returned document.
This is preferable behavior in our use case, but we would wish to be able to locate such documents somehow so we can fix them in the future.
Is there any way to somehow flag documents for which some malformed fields were ignored? We control the document insertion process fully, so we can modify all insertion flags, or do a trial insert, or anything, to reach our goal.
You can use the exists query to find document where this field does not exist, see this example
PUT foo
{
"mappings": {
"bar": {
"properties": {
"baz": {
"type": "integer",
"ignore_malformed": true
}
}
}
}
}
PUT foo/bar/1
{
"baz": "field"
}
GET foo/bar/_search
{
"query": {
"bool": {
"filter": {
"bool": {
"must_not": [
{
"exists": {
"field": "baz"
}
}
]
}
}
}
}
}
There is no dedicated mechanism though, so this search finds also documents where the field is not set intentionally
You cannot, when you search on elasticsearch, you don't search on document source but on the inverted index, which contains the analyzed data.
ignore_malformed flag is saying "always store document, analyze if possible".
You can try, create a mal-formed document, and use _termvectors API to see how the document is analyzed and stored in the inverted index, in a case of a string field, you can see an "Array" is stored as an empty string etc.. but the field will exists.
So forget the inverted index, let's use the source!
Scroll all your data until you find the anomaly, I use a small python script that search scroll, unserialize and I test field type for every documents (very long) but I can have a list of wrong document IDs.
Use a script query can be very long and crash your cluster, use with caution, maybe as a post_filter:
Here I want to retrieve the document where country_name is not a string:
{
"_source": false,
"timeout" : "30s",
"query" : {
"query_string" : {
"query" : "locale:de_ch"
}
},
"post_filter": {
"script": {
"script": "!(_source.country_name instanceof String)"
}
}
}
"_source:false" => I want only document ID
"timeout" => prevent crash
As you notice, this is a missing feature, I know logstash will tag
document that fail, so elasticsearch could implement the same thing.

Elasticsearch find missing word in phrase

How can i use Elasticsearch to find the missing word in a phrase? For example i want to find all documents which contain this pattern make * great again, i tried using a wildcard query but it returned no results:
{
"fields": [
"file_name",
"mime_type",
"id",
"sha1",
"added_at",
"content.title",
"content.keywords",
"content.author"
],
"highlight": {
"encoder": "html",
"fields": {
"content.content": {
"number_of_fragments": 5
}
},
"order": "score",
"tags_schema": "styled"
},
"query": {
"wildcard": {
"content.content": "make * great again"
}
}
}
If i put in a word and use a match_phrase query i get results, so i know i have data which matches the pattern.
Which type of query should i use? or do i need to add some type of custom analyzer to the field?
Wildcard queries operate on terms, so if you use it on an analyzed field, it will actually try to match every term in that field separately. In your case, you can create a not_analyzed sub-field (such as content.content.raw) and run the wildcard query on that. Or just map the actual field to not be analyzed, if you don't need to query it in other ways.

Indexing a multi-field property in Elastic Search

I am trying to re-index my documents in order for them to be sortable which requires making the sortable fields Multi-field properties with a "raw" version of the string which does not get analyzed.
I am following this article, but I am still getting errors when searching my documents with a sorting query.
I have a question then regarding the re-indexing of the data... if I re-index the doucments into this new index, then do I need to have some extra logic to set the analyzed version and the non_analyzed or "raw" version of the string as well? Or does elastic search automatically fill that one? Here is what my field looks like:
{
"entityName": {
"type":"string",
"fields": {
"raw": {
"type":"string",
"index":"not_analyzed"
}
}
}
}
So when I index a document with a _source like:
{
...
"entityName":"Ned Stark"
...
}
Will the mapping to both the analyzed field and the not_analyzed field complete or is there something else I have to do to tell the indexing to fill in the "raw" property as well?
No, you don't need to do anything else.
After reindexing your documents, you must tell which fields the query should use like in your given documentation article.
Raw subfield:
POST /_search
{
"query": {
"match": {
"entityName.raw": "foo-bar"
}
}
}
or original analysed type:
POST /_search
{
"query": {
"match": {
"entityName": "foo-bar"
}
}
}

Resources