Need Elasticsearch guidance for searching on an array of chemical compounds

I have a list of products, each with an array of chemical compounds, e.g. ['Sodium', 'Sodium bicarbonate', .....]. In this example 'Sodium' and 'Sodium bicarbonate' are two different values that can be searched on independently, which complicates things, so the usual text/keyword field approach did not help.
I need some guidance on the best way to handle such arrays of strings within Elasticsearch while retaining Elasticsearch's indexing magic. I appreciate any help you can provide.
FYI
I'm currently using Elasticsearch 6.3

You can use the multi_match query, which builds on the match query to allow multi-field queries.
Adding a working example with index data, search query, and search result.
Index Data:
{
"product": "product1",
"compounds": [
"Sodium",
"Sodium bicarbonate"
]
}
{
"product": "product2",
"compounds": [
"Sodium"
]
}
{
"product": "product3",
"compounds": [
"Sodium bicarbonate"
]
}
{
"product": "product4",
"compounds": [
"Chlorine"
]
}
Search Query:
{
"query": {
"multi_match" : {
"query": "Sodium AND Sodium bicarbonate",
"fields": [ "compounds", "compounds.keyword" ]
}
}
}
Search Result:
"hits": [
{
"_index": "65513968",
"_type": "_doc",
"_id": "1",
"_score": 1.0897084,
"_source": {
"product": "product1",
"compounds": [
"Sodium",
"Sodium bicarbonate"
]
}
},
{
"_index": "65513968",
"_type": "_doc",
"_id": "3",
"_score": 1.0659102,
"_source": {
"product": "product3",
"compounds": [
"Sodium bicarbonate"
]
}
},
{
"_index": "65513968",
"_type": "_doc",
"_id": "2",
"_score": 0.7032229,
"_source": {
"product": "product2",
"compounds": [
"Sodium"
]
}
}
]
You can use the terms query if you want to return documents that contain one or more exact terms in a field.
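A minimal sketch of such a terms query, built as a plain Python dict so it can be serialized and sent with any client. It assumes the default dynamic mapping, under which `compounds` gets a `compounds.keyword` sub-field holding the unanalyzed value; the helper name `build_terms_query` is made up for illustration.

```python
import json


def build_terms_query(field, values):
    """Return a search body matching documents whose field holds any of the exact values."""
    return {"query": {"terms": {field: list(values)}}}


# Assumption: "compounds.keyword" exists (default dynamic mapping for strings).
body = build_terms_query("compounds.keyword", ["Sodium", "Sodium bicarbonate"])
print(json.dumps(body, indent=2))
```

Because the keyword sub-field is not analyzed, "Sodium" here matches only the array element "Sodium" and not "Sodium bicarbonate", and the case has to match exactly.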
A unique list of chemical compounds
To find the unique list of chemical compounds, you can use the terms aggregation.
{
"size": 0,
"aggs": {
"compounds": {
"terms": {
"field": "compounds.keyword"
}
}
}
}
Result:
"aggregations": {
"compounds": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Sodium",
"doc_count": 2
},
{
"key": "Sodium bicarbonate",
"doc_count": 2
},
{
"key": "Chlorine",
"doc_count": 1
}
]
}
}

Related

Elasticsearch - Find documents missing two fields

I'm trying to create a query that returns how many documents don't have data for two fields (date.new and date.old). I have tried the query below, but it works with OR logic: all documents missing either date.new or date.old are counted. Does anyone know how I can make this only return documents missing both fields?
{
"aggs":{
"Missing_field_count1":{
"missing":{
"field":"date.new"
}
},
"Missing_field_count2":{
"missing":{
"field":"date.old"
}
}
}
}
Aggregations are not the right feature for this. You need to use the exists query wrapped in a bool/must_not query, like this:
GET index/_count
{
"query": {
"bool": {
"must_not": [
{
"exists": {
"field": "date.new"
}
},
{
"exists": {
"field": "date.old"
}
}
]
}
}
}
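The same bool/must_not idea can be sketched as a Python dict for an arbitrary list of fields. This is only an illustration: the helper name `build_missing_all_query` is hypothetical, and the body is meant to be sent to `_count` or `_search` with whatever client you use.

```python
import json


def build_missing_all_query(fields):
    """Match documents in which none of the given fields has a value."""
    # Every must_not clause has to fail to match, so one "exists" clause
    # per field reads as "field A is missing AND field B is missing".
    return {
        "query": {
            "bool": {
                "must_not": [{"exists": {"field": f}} for f in fields]
            }
        }
    }


body = build_missing_all_query(["date.new", "date.old"])
print(json.dumps(body, indent=2))
```

Two separate missing aggregations, by contrast, each count their own field independently, which is why the original attempt behaved like OR.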
In a _search response, hits.total.value is the number of documents that match the request, and hits.total.relation indicates whether that value is exact (eq) or a lower bound (gte).
Index Data:
{
"date": {
"new": 1501,
"old": 10
}
}
{
"title": "elasticsearch"
}
{
"title": "elasticsearch-query"
}
{
"date": {
"new": 1400
}
}
The search query given by @Val shows how to achieve your use case.
Search Result:
"hits": {
"total": {
"value": 2, <-- note this
"relation": "eq"
},
"max_score": 0.0,
"hits": [
{
"_index": "65112793",
"_type": "_doc",
"_id": "2",
"_score": 0.0,
"_source": {
"title": "elasticsearch"
}
},
{
"_index": "65112793",
"_type": "_doc",
"_id": "5",
"_score": 0.0,
"_source": {
"title": "elasticsearch-query"
}
}
]
}

Elasticsearch query filter combination issue

I'm trying to understand why the Elasticsearch query below does not work.
EDIT:
The fields mentioned in the query are from different indices. For example, the filter uses the classification field, which is in a different index from the fields mentioned in the query string.
The expectation is that when the user searches specifically on the classification field, i.e. secret or protected, the matching values are displayed. Otherwise, if the user searches on any other field from a different index, for example firstname or person, no filter should be applied, since firstname and person are not part of the filter.
{
"query": {
"bool": {
"filter": {
"terms": {
"classification": [
"secret",
"protected"
]
}
},
"must": {
"query_string": {
"query": "*john*",
"fields": [
"classification",
"firstname",
"releasability",
"person"
]
}
}
}
}
}
The expected result is that john in the person field is returned. This works when no filter is applied, as in:
{
"query": {
"query_string": {
"query": "*john*",
"fields": [
"classification",
"firstname",
"releasability",
"person"
]
}
}
}
The purpose of the filter is only to restrict records when the said fields contain the values mentioned; otherwise the query should work for all values.
Why does it not return the results for john, and only return results for the classification values?
Adding a working example with sample index data and search query.
To learn more about the bool query, refer to the official documentation.
Index Data:
Index data in my_index index
{
"name":"John",
"title":"b"
}
{
"name":"Johns",
"title":"a"
}
Index data in my_index1 index
{
"classification":"protected"
}
{
"classification":"secret"
}
Search Query :
POST http://localhost:9200/_search
{
"query": {
"bool": {
"should": [
{
"bool": {
"filter": [
{
"terms": {
"classification": [
"secret",
"protected"
]
}
}
]
}
},
{
"bool": {
"must": [
{
"query_string": {
"query": "*john*",
"fields": [
"name",
"title"
]
}
}
]
}
}
]
}
}
}
Search Result:
"hits": [
{
"_index": "my_index",
"_type": "_doc",
"_id": "1",
"_score": 1.0,
"_source": {
"name": "John",
"title": "b"
}
},
{
"_index": "my_index",
"_type": "_doc",
"_id": "2",
"_score": 1.0,
"_source": {
"name": "Johns",
"title": "a"
}
},
{
"_index": "my_index1",
"_type": "_doc",
"_id": "1",
"_score": 0.0,
"_source": {
"classification": "secret"
}
},
{
"_index": "my_index1",
"_type": "_doc",
"_id": "2",
"_score": 0.0,
"_source": {
"classification": "protected"
}
}
]
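To see why the original filter + must combination returned nothing for john while the should version returns all four documents, here is a plain-Python imitation of the two boolean combinations on the sample data. It is only a sketch of the logic, not how Elasticsearch evaluates queries internally; the helper names and the use of `fnmatchcase` as a stand-in for the wildcard are assumptions made for illustration.

```python
from fnmatch import fnmatchcase

# The four sample documents from my_index and my_index1.
DOCS = [
    {"name": "John", "title": "b"},
    {"name": "Johns", "title": "a"},
    {"classification": "protected"},
    {"classification": "secret"},
]


def in_classification(doc, allowed=("secret", "protected")):
    """Stand-in for the terms filter on classification."""
    return doc.get("classification") in allowed


def wildcard_hit(doc, pattern="*john*", fields=("name", "title")):
    """Stand-in for the query_string clause: lowercase wildcard on any listed field."""
    return any(fnmatchcase(str(doc.get(f, "")).lower(), pattern) for f in fields)


# filter + must: both clauses have to hold for the same document.
both = [d for d in DOCS if in_classification(d) and wildcard_hit(d)]
# should: one matching clause is enough.
either = [d for d in DOCS if in_classification(d) or wildcard_hit(d)]

print(len(both), len(either))  # prints "0 4"
```

No single document has both a classification value and a name containing john, so requiring both clauses at once can never match across the two indices.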

Elasticsearch associating exact match terms

I have a search index of filenames containing over 100,000 entries that share about 500 unique variations of the main filename field. I have recently made some modifications to certain filename values that are being generated from my data. I was wondering if there is a way to link certain queries to return an exact match. In the following query:
"query": {
"bool": {
"must": [
{
"match": {
"filename": "foo-bar"
}
}
]
}
}
How can I modify the index and associate the results so that the above query also matches foo-bar-baz, but not foo-bar-foo or any other variation?
Thanks in advance for your help
You can use a term query instead of a match query. It is well suited to keyword fields:
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-term-query.html
Adding a working example with index data and search query. (Using the default mapping)
Index Data:
{
"fileName": "foo-bar"
}
{
"fileName": "foo-bar-baz"
}
{
"fileName": "foo-bar-foo"
}
Search Query:
{
"query": {
"bool": {
"should": [
{
"match": {
"fileName.keyword": "foo-bar"
}
},
{
"match": {
"fileName.keyword": "foo-bar-baz"
}
}
]
}
}
}
Search Result:
"hits": [
{
"_index": "test",
"_type": "_doc",
"_id": "1",
"_score": 0.9808291,
"_source": {
"fileName": "foo-bar"
}
},
{
"_index": "test",
"_type": "_doc",
"_id": "2",
"_score": 0.9808291,
"_source": {
"fileName": "foo-bar-baz"
}
}
]
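The first answer suggests term queries, while the example above uses match on the keyword sub-field. A minimal sketch of the term-based variant follows, assuming the default mapping (so that `fileName.keyword` exists) and assuming the application keeps its own list of names it wants to associate; `build_exact_match_query` is a hypothetical helper.

```python
import json


def build_exact_match_query(field, values):
    """OR together one term clause per allowed exact value."""
    return {
        "query": {
            "bool": {
                "should": [{"term": {field: v}} for v in values],
                # At least one of the term clauses has to match.
                "minimum_should_match": 1,
            }
        }
    }


# Assumption: foo-bar should also bring back foo-bar-baz, and nothing else.
body = build_exact_match_query("fileName.keyword", ["foo-bar", "foo-bar-baz"])
print(json.dumps(body, indent=2))
```

Since term clauses compare against the unanalyzed keyword value, foo-bar-foo is not matched unless it is added to the list explicitly.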

Elasticsearch: Top k results per keyword

We have the following document in Elasticsearch.
from elasticsearch_dsl import DocType, Keyword, Text

class Query(DocType):
    text = Text(analyzer='snowball', fields={'raw': Keyword()})
    src = Keyword()
Now we want the top k results for each src. How can we achieve this?
Example: Let's assume we index the following:
# src: place_order
Query(text="I want to order food", src="place_order")
Query(text="Take my order", src="place_order")
...
# src: payment
Query(text="How to pay ?", src="payment")
Query(text="Do you accept credit card ?", src="payment")
...
Now if the user writes the query "take my order please along with the credit card details", and k=1, then we should return the following two results:
[{"text": "Take my order", "src": "place_order"},
{"text": "Do you accept credit card ?", "src": "payment"}
]
Here, since k=1, we return just one result for each src.
You may try the top hits aggregation, which returns the top N matching documents for each bucket of an aggregation.
For the example in your post the query might look like this:
POST queries/query/_search
{
"query": {
"match": {
"text": "take my order please along with the credit card details"
}
},
"aggs": {
"src types": {
"terms": {
"field": "src"
},
"aggs": {
"best hit": {
"top_hits": {
"size": 1
}
}
}
}
}
}
The search on the text query restricts the set of documents for the aggregation. "src types" aggregation groups all src values found in the matched documents, and "best hit" selects one most relevant document per bucket (size parameter can be changed according to your needs).
The result of the query would be like the following:
{
"hits": {
"total": 3,
"max_score": 1.3862944,
"hits": [
{
"_index": "queries",
"_type": "query",
"_id": "VD7QVmABl04oXt2HGbGB",
"_score": 1.3862944,
"_source": {
"text": "Do you accept credit card ?",
"src": "payment"
}
},
{
"_index": "queries",
"_type": "query",
"_id": "Uj7PVmABl04oXt2HlLFI",
"_score": 0.8630463,
"_source": {
"text": "Take my order",
"src": "place_order"
}
},
{
"_index": "queries",
"_type": "query",
"_id": "UT7PVmABl04oXt2HKLFy",
"_score": 0.6931472,
"_source": {
"text": "I want to order food",
"src": "place_order"
}
}
]
},
"aggregations": {
"src types": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "place_order",
"doc_count": 2,
"best hit": {
"hits": {
"total": 2,
"max_score": 0.8630463,
"hits": [
{
"_index": "queries",
"_type": "query",
"_id": "Uj7PVmABl04oXt2HlLFI",
"_score": 0.8630463,
"_source": {
"text": "Take my order",
"src": "place_order"
}
}
]
}
}
},
{
"key": "payment",
"doc_count": 1,
"best hit": {
"hits": {
"total": 1,
"max_score": 1.3862944,
"hits": [
{
"_index": "queries",
"_type": "query",
"_id": "VD7QVmABl04oXt2HGbGB",
"_score": 1.3862944,
"_source": {
"text": "Do you accept credit card ?",
"src": "payment"
}
}
]
}
}
}
]
}
}
}
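As a rough sketch, the request above can be parameterized on k and the per-src results pulled out of the response in Python. The aggregation names "src types" and "best hit" are the ones used in this answer; the two function names are made up for illustration.

```python
import json


def build_top_k_per_src(text, k, group_field="src"):
    """Search body: match on text, then keep the k best hits per group_field value."""
    return {
        "query": {"match": {"text": text}},
        "aggs": {
            "src types": {
                "terms": {"field": group_field},
                "aggs": {"best hit": {"top_hits": {"size": k}}},
            }
        },
    }


def top_hits_by_src(response):
    """Map each bucket key to the _source of its top hits."""
    buckets = response["aggregations"]["src types"]["buckets"]
    return {
        b["key"]: [h["_source"] for h in b["best hit"]["hits"]["hits"]]
        for b in buckets
    }


body = build_top_k_per_src(
    "take my order please along with the credit card details", k=1
)
print(json.dumps(body, indent=2))
```

Applied to the response shown above, `top_hits_by_src` would give one entry for place_order and one for payment, each holding a single document.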
Hope that helps!

Elastic Search- Fetch Distinct Tags

I have documents of the following format:
{
_id :"1",
tags:["guava","apple","mango", "banana", "gulmohar"]
}
{
_id:"2",
tags: ["orange","guava", "mango shakes", "apple pie", "grammar"]
}
{
_id:"3",
tags: ["apple","grapes", "water", "gulmohar","water-melon", "green"]
}
Now I want to fetch the unique tag values from the 'tags' field across all documents that start with the prefix g*, so that these unique tags can be displayed by a tag suggester (the Stack Overflow site is an example).
For example, whenever the user types 'g':
"guava", "gulmohar", "grammar", "grapes" and "green" should be returned as a result.
i.e. the query should return the distinct tags with prefix g*.
I looked everywhere, browsed the whole documentation, and searched the ES forum, but much to my dismay I didn't find any clue.
I tried aggregations, but they return the distinct count for every word/token in the tags field. They do not return the unique list of tags starting with 'g'.
"query": {
"filtered": {
"query": {
"bool": {
"should": [
{
"query_string": {
"allow_leading_wildcard": false,
"fields": [
"tags"
],
"query": "g*",
"fuzziness":0
}
}
]
}
},
"filter": {
//some condition on other field...
}
}
},
"aggs": {
"distinct_tags": {
"terms": {
"field": "tags",
"size": 10
}
}
},
result of above: guava(w), apple(q), mango(1),...
Can someone please suggest the correct way to fetch all the distinct tags with the prefix input_prefix*?
It's a bit of a hack, but this seems to accomplish what you want.
I created an index and added your docs:
DELETE /test_index
PUT /test_index
{
"settings": {
"number_of_shards": 1,
"number_of_replicas": 0
}
}
POST /test_index/_bulk
{"index":{"_index":"test_index","_type":"doc","_id":1}}
{"tags":["guava","apple","mango", "banana", "gulmohar"]}
{"index":{"_index":"test_index","_type":"doc","_id":2}}
{"tags": ["orange","guava", "mango shakes", "apple pie", "grammar"]}
{"index":{"_index":"test_index","_type":"doc","_id":3}}
{"tags": ["guava","apple","grapes", "water", "grammar","gulmohar","water-melon", "green"]}
Then I used a combination of prefix query and highlighting as follows:
POST /test_index/_search
{
"query": {
"prefix": {
"tags": {
"value": "g"
}
}
},
"fields": [ ],
"highlight": {
"pre_tags": [""],
"post_tags": [""],
"fields": {
"tags": {}
}
}
}
...
{
"took": 5,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 1,
"hits": [
{
"_index": "test_index",
"_type": "doc",
"_id": "1",
"_score": 1,
"highlight": {
"tags": [
"guava",
"gulmohar"
]
}
},
{
"_index": "test_index",
"_type": "doc",
"_id": "2",
"_score": 1,
"highlight": {
"tags": [
"guava",
"grammar"
]
}
},
{
"_index": "test_index",
"_type": "doc",
"_id": "3",
"_score": 1,
"highlight": {
"tags": [
"guava",
"grapes",
"grammar",
"gulmohar",
"green"
]
}
}
]
}
}
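The response above still lists the matching tags per document, so guava and grammar appear more than once. A small client-side step can merge them into one distinct list. This is an addition to the answer, not part of it: the helper name is hypothetical, and it relies on the empty pre_tags/post_tags so that each highlighted fragment is exactly the tag.

```python
def distinct_highlighted_tags(response, field="tags"):
    """Collect the highlighted values of `field` across all hits, without duplicates."""
    seen = set()
    for hit in response["hits"]["hits"]:
        seen.update(hit.get("highlight", {}).get(field, []))
    return sorted(seen)


# Trimmed copy of the response above.
response = {"hits": {"hits": [
    {"_id": "1", "highlight": {"tags": ["guava", "gulmohar"]}},
    {"_id": "2", "highlight": {"tags": ["guava", "grammar"]}},
    {"_id": "3", "highlight": {"tags": ["guava", "grapes", "grammar", "gulmohar", "green"]}},
]}}

print(distinct_highlighted_tags(response))
# ['grammar', 'grapes', 'green', 'guava', 'gulmohar']
```

Note that this only sees the hits returned in the current page of results, so the size of the search request limits which tags can show up.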
Here is the code I used:
http://sense.qbox.io/gist/c14675ee8bd3934389a6cb0c85ff57621a17bf11
What you're trying to do amounts to autocomplete, of course, and there are perhaps better ways of going about that than what I posted above (though they are a bit more involved). Here are a couple of blog posts we did about ways to set up autocomplete:
http://blog.qbox.io/quick-and-dirty-autocomplete-with-elasticsearch-completion-suggest
http://blog.qbox.io/multi-field-partial-word-autocomplete-in-elasticsearch-using-ngrams
Following @Sloan Ahrens's advice, I did the following:
Updated the mapping:
"tags": {
"type": "completion",
"context": {
"filter_color": {
"type": "category",
"default": "",
"path": "fruits.color"
},
"filter_type": {
"type": "category",
"default": "",
"path": "fruits.type"
}
}
}
Reference: ES API Guide
Inserted these documents:
{
_id :"1",
tags:{"input" :["guava","apple","mango", "banana", "gulmohar"]},
fruits:{color:'bar',type:'alice'}
}
{
_id:"2",
tags:{"input" :["orange","guava", "mango shakes", "apple pie", "grammar"]},
fruits:{color:'foo',type:'bob'}
}
{
_id:"3",
tags:{"input" :["apple","grapes", "water", "gulmohar","water-melon", "green"]},
fruits:{color:'foo',type:'alice'}
}
I didn't need to modify my original index much; I just added input before the tags array.
POST rescu1/_suggest?pretty
{
"suggest": {
"text": "g",
"completion": {
"field": "tags",
"size": 10,
"context": {
"filter_color": "bar",
"filter_type": "alice"
}
}
}
}
gave me the desired output.
I accepted @Sloan Ahrens's answer, as his suggestions worked like a charm for me and he pointed me in the right direction.

Resources