Querying against an array only works for first element - elasticsearch

I am having an issue searching for items within an array in a document; a simple tagging system in my case. I have a relatively simple document representing a recipe. This is a truncated version of the data in the index:
{
"_index" : "recipes",
"_type" : "recipe",
"_id" : "37",
"_version" : 1,
"found" : true,
"_source" : {
"id" : 37,
"title" : "Crab Cakes",
"tags" : [
"seafood",
"appetizer"
]
}
}
When I search for the tag seafood it matches this recipe. However, when I search for the tag appetizer, I get nothing. Here is the explain for a very basic appetizer query:
curl -XGET 'http://localhost:9200/recipes/recipe/37/_explain?pretty' -H 'Content-Type: application/json' -d'{"query":{"term":{"tags":"appetizer"}}}'
Which results in this:
{
"_index" : "recipes",
"_type" : "recipe",
"_id" : "37",
"matched" : false,
"explanation" : {
"value" : 0.0,
"description" : "no matching term",
"details" : [ ]
}
}

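The correct answer came in the comments from sramalingam24: change the query from a term query to a match query. A term query looks up the exact value without analyzing it, while a match query runs the search text through the field's analyzer first, so it lines up with the tokens that were actually indexed.
[updated]
I also tested switching the tags to a keyword field, and that works as well. This is the solution I ended up going with.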
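As a minimal sketch of the difference, the two request bodies can be built as plain Python dictionaries. The helper names and the mapping fragment are illustrative assumptions, not part of the original question, and the mapping assumes a cluster version that supports the keyword type.

```python
import json


def term_query(field, value):
    """Exact-value lookup: the search value is not analyzed."""
    return {"query": {"term": {field: value}}}


def match_query(field, value):
    """Full-text lookup: the search value goes through the field's analyzer."""
    return {"query": {"match": {field: value}}}


# Hypothetical mapping fragment that stores tags as exact, unanalyzed values,
# so that a term query on "tags" compares against the original strings.
KEYWORD_TAGS_PROPERTIES = {"properties": {"tags": {"type": "keyword"}}}

# Body to send in place of the original term query.
print(json.dumps(match_query("tags", "appetizer")))
```

Either body can be passed to the same _explain call used in the question to compare how the two query types behave against document 37.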

Related

Check documents not existing at elasticsearch

I have millions of indexed documents. After indexing, I found a document count mismatch. I want to send an array of hundreds of document IDs to Elasticsearch, check whether those IDs exist, and get back the IDs that have not been indexed.
Example:
These are the indexed documents:
[497499, 497550, 498370, 498476, 498639, 498726, 498826, 500479, 500780, 500918]
I'm sending four at a time:
[497599, 88888, 497550, 77777]
The response should contain only the IDs that are not there:
[88888, 77777]
Consider using the _mget endpoint and then parsing the result, for instance:
GET someidx/_mget?_source=false
{
"docs" : [
{
"_id" : "c37m5W4BifZmUly9Ni-X"
},
{
"_id" : "2"
}
]
}
Result:
{
"docs" : [
{
"_index" : "someidx",
"_type" : "_doc",
"_id" : "c37m5W4BifZmUly9Ni-X",
"_version" : 1,
"_seq_no" : 0,
"_primary_term" : 1,
"found" : true
},
{
"_index" : "someidx",
"_type" : "_doc",
"_id" : "2",
"found" : false
}
]
}
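The parsing step can be sketched in a few lines of Python. This is a minimal sketch that works on an already-decoded _mget response; the function names are hypothetical, and sending the request (with any HTTP or Elasticsearch client) is left out on purpose.

```python
def build_mget_body(ids):
    """Build the request body for _mget from a list of document IDs."""
    return {"docs": [{"_id": str(doc_id)} for doc_id in ids]}


def missing_ids(mget_response):
    """Return the IDs that _mget reported as not found."""
    return [
        doc["_id"]
        for doc in mget_response.get("docs", [])
        if not doc.get("found", False)
    ]


# The response shown above, reduced to the keys the parser looks at.
SAMPLE_RESPONSE = {
    "docs": [
        {"_index": "someidx", "_id": "c37m5W4BifZmUly9Ni-X", "found": True},
        {"_index": "someidx", "_id": "2", "found": False},
    ]
}

print(missing_ids(SAMPLE_RESPONSE))  # ['2']
```

For hundreds of IDs per call, the same two helpers can be applied batch by batch, collecting the missing IDs from each response.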

Using field instead of "_id" for more-like-this query

I have a slug field that I want to use, instead of the "_id" field, to identify the reference document. But instead of treating it as a reference, the doc in like seems to be used as query text to compare against. Since slug is a unique field with a simple analyzer, the query returns exactly one result, as shown below. As far as I know, there is no way to use a custom field as the _id field:
https://github.com/elastic/elasticsearch/issues/6730
So is a double lookup (finding Elasticsearch's ID first, then running more_like_this) the only way to achieve what I am looking for? Someone seems to have asked a similar question three years ago, but it doesn't have an answer.
ArticleDocument.search().query("bool",
should=Q("more_like_this",
fields= ["slug", "text"],
like={"doc": {"slug": "OEXxySDEPWaUfgTT54QvBg",
}, "_index":"article", "_type":"doc"},
min_doc_freq=1,
min_term_freq=1
)
).to_queryset()
Returns:
<ArticleQuerySet [<Article: OEXxySDEPWaUfgTT54QvBg)>]>
You can use one of your document's fields as the _id when ingesting data.
Logstash
output {
elasticsearch {
hosts => ["localhost:9200"]
index => "my_name"
document_id => "%{some_field_id}"
}
}
Spark (Scala)
DF.saveToEs("index_name" + "/some_type", Map("es.mapping.id" -> "some_field_id"))
Index API
PUT twitter/_doc/1
{
"user" : "kimchy",
"post_date" : "2009-11-15T14:12:12",
"message" : "trying out Elasticsearch"
}
Response:
{
"_shards" : {
"total" : 2,
"failed" : 0,
"successful" : 2
},
"_index" : "twitter",
"_type" : "_doc",
"_id" : "1",
"_version" : 1,
"_seq_no" : 0,
"_primary_term" : 1,
"result" : "created"
}
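Once the slug is stored as the document's _id, the more_like_this query can point at the reference document directly instead of passing it as an artificial doc. The following is a minimal sketch of such a request body in Python; the function name and defaults are assumptions, the index name is taken from the question, and older clusters that still use mapping types may also expect a "_type" key inside the like item.

```python
def more_like_slug(slug, index="article", fields=("text",)):
    """Build a more_like_this body that references an indexed document by _id.

    Assumes the slug was used as the document _id at ingest time.
    """
    return {
        "query": {
            "more_like_this": {
                "fields": list(fields),
                # Reference an existing document instead of supplying text.
                "like": [{"_index": index, "_id": slug}],
                "min_term_freq": 1,
                "min_doc_freq": 1,
            }
        }
    }


body = more_like_slug("OEXxySDEPWaUfgTT54QvBg")
print(body["query"]["more_like_this"]["like"])
```

The slug field itself is left out of fields here, since comparing on a unique value would not contribute to finding similar documents.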

Get from ElasticSearch why a result is a hit

In the Elasticsearch query below, I search for the word Balances in two fields, name and notes:
GET /_search
{ "query": {
"multi_match": { "query": "Balances",
"fields": ["name","notes"]
}
}
}
And the result, where the match is in the name field:
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.673515,
"hits" : [
{
"_index" : "idx",
"_type" : "_doc",
"_id" : "25",
"_score" : 1.673515,
"_source" : {
"name" : "Deposits checking accounts balances",
"notes" : "These are the notes",
"@timestamp" : "2019-04-18T21:05:00.387Z",
"id" : 25,
"@version" : "1"
}
}
]
}
Now, I want to know in which field ElasticSearch found the value. I could evaluate the result and see if the searched text is in name or notes, but I cannot do that if it's a fuzzy search.
Can ElasticSearch tell me in which field the text was found, and in addition provide a snippet with 5 words to the left and to the right of the result to tell the user why the result is a hit?
What I want to achieve is similar to Google highlighting in bold the text that was found within a phrase.
I think the two solutions from "Find out which fields matched in a multi match query" are still valid:
Use highlighting to find the matching field.
Split the query up into multiple named match queries.
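Both options can be sketched as request bodies built in Python. This is a minimal sketch: the helper names, the query names, and the sample highlight fragment are illustrative assumptions. Note that the highlighter's fragment_size is measured in characters, not words, so "5 words on each side" can only be approximated.

```python
def highlighted_multi_match(text, fields, fragment_size=80):
    """multi_match plus a highlight section for every searched field."""
    return {
        "query": {"multi_match": {"query": text, "fields": list(fields)}},
        "highlight": {
            "fragment_size": fragment_size,  # characters, not words
            "fields": {field: {} for field in fields},
        },
    }


def named_match_queries(text, fields):
    """One named match query per field; hits then list 'matched_queries'."""
    return {
        "query": {
            "bool": {
                "should": [
                    {"match": {field: {"query": text, "_name": "match_" + field}}}
                    for field in fields
                ]
            }
        }
    }


def matched_fields(hit):
    """Fields that produced at least one highlight fragment for this hit."""
    return sorted(hit.get("highlight", {}).keys())


# Hypothetical hit, shaped like a search hit that carries a highlight section.
sample_hit = {
    "_id": "25",
    "highlight": {"name": ["Deposits checking accounts <em>balances</em>"]},
}
print(matched_fields(sample_hit))  # ['name']
```

With highlighting, the fragments double as the snippet to show the user; with named queries, each hit reports which of the per-field queries matched.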

Kibana - given an index, how to find saved objects relying on it?

In Kibana I have many dozens of indices.
Given one of them, I want a way to find all the saved objects (searches/dashboards/visualizations) that rely on this index.
Thanks
You can retrieve the document ID of your index pattern and then use it to search your .kibana index:
{
"_index" : ".kibana",
"_type" : "index-pattern",
"_id" : "AWBWDmk2MjUJqflLln_o", <---- take this id...
You can use this query on Kibana 5:
GET .kibana/_search?q=AWBWDmk2MjUJqflLln_o <---- ...and use it here
You'll find your visualizations:
{
"_index" : ".kibana",
"_type" : "visualization",
"_id" : "AWBZNJNcMjUJqflLln_s",
"_score" : 6.2450323,
"_source" : {
"title" : "CA groupe",
"visState" : """{"title":"XXX","type":"pie","params":{"addTooltip":true,"addLegend":true,"legendPosition":"right","isDonut":false,"type":"pie"},"aggs":[{"id":"1","enabled":true,"type":"sum","schema":"metric","params":{"field":"XXX","customLabel":"XXX"}},{"id":"2","enabled":true,"type":"terms","schema":"segment","params":{"field":"XXX","size":5,"order":"desc","orderBy":"1","customLabel":"XXX"}}],"listeners":{}}""",
"uiStateJSON" : "{}",
"description" : "",
"version" : 1,
"kibanaSavedObjectMeta" : {
"searchSourceJSON" : """{"index":"AWBWDmk2MjUJqflLln_o","query":{"match_all":{}},"filter":[]}"""
^
|
this is where your index pattern is used
}
}
},
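To go through many hits, the check can be scripted. The following is a minimal Python sketch that assumes the Kibana 5 document layout shown above, where searchSourceJSON is a JSON string with an "index" key; the function name is hypothetical. Objects that reference the index pattern only indirectly (for example, a dashboard that embeds a visualization) are not detected by it.

```python
import json


def objects_using_pattern(hits, pattern_id):
    """Return (_type, _id) pairs whose searchSourceJSON points at pattern_id."""
    matches = []
    for hit in hits:
        meta = hit.get("_source", {}).get("kibanaSavedObjectMeta", {})
        raw = meta.get("searchSourceJSON")
        if not raw:
            continue
        try:
            search_source = json.loads(raw)
        except ValueError:
            continue  # skip objects with unparsable metadata
        if isinstance(search_source, dict) and search_source.get("index") == pattern_id:
            matches.append((hit.get("_type"), hit.get("_id")))
    return matches


# Hit reduced from the visualization shown above.
sample_hits = [
    {
        "_type": "visualization",
        "_id": "AWBZNJNcMjUJqflLln_s",
        "_source": {
            "kibanaSavedObjectMeta": {
                "searchSourceJSON": '{"index":"AWBWDmk2MjUJqflLln_o","query":{"match_all":{}},"filter":[]}'
            }
        },
    }
]
print(objects_using_pattern(sample_hits, "AWBWDmk2MjUJqflLln_o"))
```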

Elasticsearch: Nested query under a boolean 'should' not returning results

I'm running the following query (it has been shortened for clarity):
body : {
query : {
bool : {
must : [
{
match : {
active : 1
}
},
],
should : [
{
term : {
apply : '2'
}
},
{
nested : {
path : 'items',
query : {
terms : {
'items.product' : ["1","2"]
}
}
}
}
],
minimum_should_match : 1
}
}
}
};
When I run this query, I don't pull back the documents that match the nested query in the should clause; I only pull back documents matching the first condition. What am I doing wrong? Why can't the terms query test the field against an array of input values and return results?
When I change the nested query to a match_all or match the items.product field to an exact value, I do get results.
Replacing the nested query with the following (while everything else stays the same) gives me no results either.
nested : {
path : 'items',
query : {
bool : {
must : [
{
terms : {
'items.product' : ["1","2"],
minimum_should_match : 1
}
},
]
}
}
}
Any help would be greatly appreciated - this has been driving me crazy for a couple days now!
EDITED to include discussion of the index mapping
Given that the terms condition expects a non-analyzed field (per the documentation here), I would recommend you verify that your index mapping explicitly marks the field as not analyzed. For instance:
{"mappings" : {
"your_doc_type" : {
"properties" : {
"items" : {
"type" : "nested",
"properties" : {
"product" : {"type" : "string", "index" : "not_analyzed"},
...
... Other properties of the nested object
...
}
},
...
... Mappings for the other fields in your document type
...
}
}
}
}
That should enable the terms to do what they are supposed to do when checking items.product.
My earlier suspicion was that there is something else in your query (min_score perhaps) that is filtering out results based upon score, and that threshold is weeding out the documents that match the items.product condition but not the apply condition due to the underlying Lucene scoring model. In other words, if all other things are equal for documents meeting only one item of the should query, the ones that meet the "apply":"2" condition will score higher than the documents for which items.product is 1 or 2. This was my empirical observation querying a trivially small test set of data with your query.
Test data set:
{"active":1, "apply":"2", "items" : [{"product": "3"}]}
{"active":0, "apply":"2", "items" : [{"product": "3"}]}
{"active":1, "apply":"3", "items" : [{"product": "3"}]}
{"active":1, "apply":"3", "items" : [{"product": "1"}]}
{"active":1, "apply":"3", "items" : [{"product": "2"}]}
Based on the conditions in your query, we should see three documents returned - the first, fourth, and fifth documents.
"hits" : [ {
"_index" : "test",
"_type" : "test",
"_id" : "AUtrND1rIJ0nixSnh_cG",
"_score" : 0.731233,
"_source":{"active":1, "apply":"2", "items" : [{"product": "3"}]}
}, {
"_index" : "test",
"_type" : "test",
"_id" : "AUtrND1sIJ0nixSnh_cK",
"_score" : 0.4601705,
"_source":{"active":1, "apply":"3", "items" : [{"product": "2"}]}
}, {
"_index" : "test",
"_type" : "test",
"_id" : "AUtrND1sIJ0nixSnh_cJ",
"_score" : 0.35959372,
"_source":{"active":1, "apply":"3", "items" : [{"product": "1"}]}
} ]
The expected documents came back, but you can see that the first document (for which apply is 2, meeting the first criterion of the should query) scored much higher.
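The expected result set can also be checked by hand. The sketch below re-implements the query's boolean logic in plain Python over the five test documents; it mirrors only the inclusion conditions (must on active, at least one should clause) and says nothing about scoring.

```python
TEST_DOCS = [
    {"active": 1, "apply": "2", "items": [{"product": "3"}]},
    {"active": 0, "apply": "2", "items": [{"product": "3"}]},
    {"active": 1, "apply": "3", "items": [{"product": "3"}]},
    {"active": 1, "apply": "3", "items": [{"product": "1"}]},
    {"active": 1, "apply": "3", "items": [{"product": "2"}]},
]


def qualifies(doc, products=("1", "2")):
    """must: active == 1; should (at least one): apply == '2' or a matching item."""
    if doc["active"] != 1:
        return False
    apply_matches = doc["apply"] == "2"
    nested_matches = any(item["product"] in products for item in doc["items"])
    return apply_matches or nested_matches


# 1-based positions of the documents that satisfy the conditions.
matching_positions = [
    position for position, doc in enumerate(TEST_DOCS, start=1) if qualifies(doc)
]
print(matching_positions)  # [1, 4, 5]
```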
If your intent is for these conditions to not affect the scoring of the documents but to use them instead as simple inclusion/exclusion criteria, you may want to switch to a filtered query. Something like:
{
"query" : {"filtered" : {
"query" : {"match_all" : {}},
"filter" : {"bool" : {
"must" : [
{"term" : {"active" : 1}}
],
"should" : [
{"term" : {"apply" : "2"}},
{"nested" : {
"path": "items",
"query" : {
"terms" : {"items.product" : ["1", "2"]}
}
}}
]
}}
}}
}
Since you are now specifying a filter instead, these conditions should not impact the scoring of the returned documents but instead only determine whether a document qualifies at all for the result set (the scores are then calculated independently of the conditions above). Using this filtered query, the results from my dumb data set are:
"hits" : [ {
"_index" : "test",
"_type" : "test",
"_id" : "AUtrND1rIJ0nixSnh_cG",
"_score" : 1.0,
"_source":{"active":1, "apply":"2", "items" : [{"product": "3"}]}
}, {
"_index" : "test",
"_type" : "test",
"_id" : "AUtrND1sIJ0nixSnh_cK",
"_score" : 1.0,
"_source":{"active":1, "apply":"3", "items" : [{"product": "2"}]}
}, {
"_index" : "test",
"_type" : "test",
"_id" : "AUtrND1sIJ0nixSnh_cJ",
"_score" : 1.0,
"_source":{"active":1, "apply":"3", "items" : [{"product": "1"}]}
} ]
The scores are now identical for all returned documents, without regard for which part of the should was satisfied.
Note that the query property above is match_all - if you had other conditions in your query that are not represented in the original question, then you would need to modify this accordingly.
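The filtered query was removed in Elasticsearch 5.0, so on newer clusters the same inclusion/exclusion logic is usually expressed with the filter clause of a bool query. The following is a minimal sketch of such a body built in Python; the function name is hypothetical, and the field names are taken from the question.

```python
import json


def filter_only_query(product_ids):
    """bool query whose conditions all run in filter context (no scoring)."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"active": 1}},
                    {
                        # At least one of the two alternatives must hold.
                        "bool": {
                            "should": [
                                {"term": {"apply": "2"}},
                                {
                                    "nested": {
                                        "path": "items",
                                        "query": {
                                            "terms": {"items.product": list(product_ids)}
                                        },
                                    }
                                },
                            ],
                            "minimum_should_match": 1,
                        }
                    },
                ]
            }
        }
    }


print(json.dumps(filter_only_query(["1", "2"]), indent=2))
```

Wrapping the two alternatives in an inner bool with minimum_should_match keeps the "either condition" requirement explicit, rather than relying on defaults for should clauses.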
