Elasticsearch: searching a subset of nested objects with simple_query_string

Let's say I have the following data format:
{
  "name": "John Smith",
  "publications": [
    {
      "category": "cat-aaa",
      "text": "car airplane boat"
    },
    {
      "category": "cat-bbb",
      "text": "pen chain headphones"
    },
    {
      "category": "cat-ccc",
      "text": "mouse screen computer"
    }
  ]
}
Right now, I am using simple_query_string to search across the publications.text field. A query such as car AND mouse would return the document above.
What I want is to search for people by publications' texts, but only for some categories. For instance, a query car AND mouse searching through categories cat-aaa and cat-ccc should return the above document. However, searching for car AND mouse through category cat-aaa should not match the above document.
The two approaches that I have tried so far did not work:
Without nested queries, it is not possible to filter publications, because ES flattens the array of objects.
With nested queries, I cannot search across all the publications of a person, so the example query above wouldn't work. Nested queries seem to process each nested document individually, but I need to search across all nested documents matching the categories I choose.
Can such a query be done using ES?
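One possible direction, offered as a sketch rather than a confirmed answer: if the search is a plain conjunction of terms, each term can get its own nested clause restricted to the chosen categories, and the clauses can then be combined in a bool must. This assumes publications is mapped as nested and publications.category is a keyword field (neither is stated above). build_query is a hypothetical helper, and the full simple_query_string syntax is not preserved.

```python
import json


def build_query(terms, categories):
    """Build a request body that requires every term to appear in some
    publication whose category is one of `categories`.

    Assumptions (not stated in the question): `publications` is mapped
    as `nested` and `publications.category` is a keyword field.
    """
    def clause(term):
        # One nested clause per term: the term match and the category
        # filter must both hold inside the same nested publication.
        return {
            "nested": {
                "path": "publications",
                "query": {
                    "bool": {
                        "must": [{"match": {"publications.text": term}}],
                        "filter": [
                            {"terms": {"publications.category": list(categories)}}
                        ],
                    }
                },
            }
        }

    # Different terms may be satisfied by different publications,
    # as long as each of those publications is in an allowed category.
    return {"query": {"bool": {"must": [clause(t) for t in terms]}}}


if __name__ == "__main__":
    body = build_query(["car", "mouse"], ["cat-aaa", "cat-ccc"])
    print(json.dumps(body, indent=2))
```

With the sample document, car is found in a cat-aaa publication and mouse in a cat-ccc one, so both clauses can be satisfied. Restricting the categories to cat-aaa alone leaves the mouse clause with no matching publication.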

Related

Fuzzy Results Score in Complete Suggester Elastic Search

I have just started using Elasticsearch and am stuck on the following use case.
I am using the completion suggester in Elasticsearch with the auto fuzziness setting to get city suggestions as output. Each city name in the completion field has a weight according to its popularity. The problem is the ordering of fuzzy results.
For example, if the user types "dilh", I would want "delhi" to rank above "digha" or "dighwara" owing to its popularity, i.e. the weights assigned to the different cities.
Right now "digha", "dighwara", "Dihira" etc. come above more relevant cities like "delhi" or "dalhousie". Since the edit distance is the same, can anyone tell me how to configure this so that the order follows the weights of the cities?
Attaching a sample request:
{
  "suggest": {
    "loc-suggest2": {
      "prefix": "dilh",
      "completion": {
        "field": "suggestedNames",
        "size": 20,
        "fuzzy": {
          "fuzziness": "AUTO"
        }
      }
    }
  }
}

Retrieve distinct values for search as you type in Elasticsearch

We have a field title and the type is search_as_you_type,
{
  "mappings": {
    "properties": {
      "title": {
        "type": "search_as_you_type"
      }
    }
  }
}
and when we are searching
{
  "query": {
    "match_phrase_prefix": {
      "title": "red"
    }
  }
}
we are getting duplicate results
red car
red icecream
red car
This is because we have documents with the same title values.
Is there a way to indicate that the results must have distinct values?
You can check whether a terms aggregation on your title field works with search_as_you_type by following the example given in this SO answer. You can also check this blog, which explains how to get unique values from Elasticsearch.
Also, make sure the documents coming back in your results are really the same document, and not different documents that happen to have the same values.
Edit: As discussed in the comments, the completion suggester was more useful in this case, as it deals with duplicates, and it solved the issue.
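For the terms aggregation route mentioned above, a minimal sketch of the request body might look like the following. It assumes the titles are also indexed in a keyword field, here given the hypothetical name title_raw, since a terms aggregation needs a field it can bucket on. distinct_titles_request is an illustrative helper, not part of any library.

```python
import json


def distinct_titles_request(prefix, keyword_field="title_raw", size=10):
    """Build a search body that returns distinct titles as buckets.

    `keyword_field` is a hypothetical keyword copy of `title`; the
    mapping shown in the question does not define it.
    """
    return {
        # Skip the hits themselves; only the buckets are of interest.
        "size": 0,
        "query": {"match_phrase_prefix": {"title": prefix}},
        "aggs": {
            "distinct_titles": {
                # Each bucket key is one distinct title value.
                "terms": {"field": keyword_field, "size": size}
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(distinct_titles_request("red"), indent=2))
```

Each bucket then carries one distinct title together with its document count, so duplicates collapse into a single entry.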

How to boost Elasticsearch results based on another field?

Kind of a simple use case, but I cannot come up with a good solution.
Basically I have two indexed fields: content and keywords (keyword tokenizer), where content is a long text field and keywords contain important terms within that content. When I query with some long text, I have to boost those results based on the keywords present in the matching document.
I tried querying the complete text on both content and keywords field, but it is too slow or it throws too_many_clauses error for text with more than 40 words.
{
  "query": {
    "match": {
      "keywords": {
        "query": "some long text",
        "analyzer": "custom_analyzer"
      }
    }
  }
}
Is there any better way? Would percolator work here?
I can relate this to my application, which is similar to Stack Overflow: it consists of questions and answers, and each question has a subject, body, tags, etc.
The subject here corresponds to your keywords field and the body corresponds to your content field. Normally the subject contains the important keywords of the post, which is also the case for you.
Now, coming to the solution:
We solve it by querying both the subject and body fields, but boosting subject by a factor of 15, which is configurable.
The ES query we use:
{
  "query": {
    "multi_match": {
      "query": "this is a test",
      "fields": ["subject^15", "message"]
    }
  }
}
This ES doc also has a similar example, where a subject field is boosted by a factor of 3 in a multi_match query.
Let me know if you have any questions.
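Applied to the field names from the question, the same idea could be sketched as below. The boost of 15 is simply the value used above and would need tuning; boosted_multi_match is a hypothetical helper that only builds the request body.

```python
import json


def boosted_multi_match(text, boost=15):
    """Build a multi_match body that searches `keywords` and `content`,
    weighting matches on `keywords` by `boost`.

    Field names come from the question; the default boost mirrors the
    factor used in the answer and is an assumption, not a recommendation.
    """
    return {
        "query": {
            "multi_match": {
                "query": text,
                # "field^N" is the per-field boost syntax.
                "fields": [f"keywords^{boost}", "content"],
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(boosted_multi_match("some long text"), indent=2))
```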

Elasticsearch: querying on both nested object properties and parent properties

I have some documents which have nested objects inside nested objects:
{
  "started_at": 1455088063966,
  "ended_at": 1455088131966,
  "tags": [{
    "type": "transfer",
    "at": 1455088064462,
    "events": [{
      "type": "transfer_processed",
      "at": 1455088131981
    }]
  }, {
    "at": 1455088138232,
    "item": "tag",
    "type": "info"
  }]
}
Here, the main document has several nested objects (the tags), and for each tag there are several nested objects (the events).
I would like to get all the documents where the events of type transfer_processed occurred 60000 milliseconds after the tags of type transfer.
For this, I would need to query on both tags.at, tags.type, tags.events.at and tags.events.type. And I can't figure out how: I only manage to query on the tags.events properties, or only on the tags properties, not both.
Nested objects are actually separate Lucene documents under the hood, so you are essentially trying to "join" multiple documents together to do your comparisons. Unfortunately, this is not supported by Elasticsearch.
Have a look at this similar question and answer which explain it well.

ElasticSearch query referencing document

I read some time ago that there was a way to build a query that references another document in your index. At the time, this wasn't helpful to me, but I now have very large GIS areas that I need to query against and sending this data to ElasticSearch in the query body every time seems wasteful.
While my specific use-case relates to GIS, geo_shape, etc, it's a general issue that can be applied to other types of queries.
I have a document type areas that holds all of the predefined search areas (these are things like suburbs, states, etc) and entities that hold all of my search data, including a geo_point type field with lat/lon.
I need to be able to construct a geo_shape query for entities documents that references the mpoly attribute (which is a GeoShape type) on an areas document for its shape coordinates.
Unfortunately, neither Google nor reading the ElasticSearch docs have proved useful in this case, because generally nested documents (related, but not what I'm looking for) is what people seem to be more interested in.
Finally found the answer myself while looking for something different. Unfortunately, the information about the GeoShape filter is not in the GeoShape query manual pages:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-geo-shape-filter.html#_pre_indexed_shape
{
  "filtered": {
    "query": {
      "match_all": {}
    },
    "filter": {
      "geo_shape": {
        "location": {
          "indexed_shape": {
            "id": "DEU",
            "type": "countries",
            "index": "shapes",
            "path": "location"
          }
        }
      }
    }
  }
}
If anyone has better information about how to do this generically, I will happily accept their answer instead.
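Note that the filtered query used above was removed in Elasticsearch 5.0. On newer versions, the same pre-indexed shape lookup would typically be placed in a bool filter. The following is a sketch of that request body, reusing the index, id and path values from the example; it omits the type parameter on the assumption that a recent version without mapping types is in use. pre_indexed_shape_query is a hypothetical helper.

```python
import json


def pre_indexed_shape_query(field, shape_id, index, path):
    """Build a geo_shape filter that references a shape stored in
    another document instead of sending the coordinates inline.

    The `type` parameter from the original example is left out on the
    assumption that the cluster no longer uses mapping types.
    """
    return {
        "query": {
            "bool": {
                # A filter clause does not contribute to scoring.
                "filter": {
                    "geo_shape": {
                        field: {
                            "indexed_shape": {
                                "id": shape_id,
                                "index": index,
                                "path": path,
                            }
                        }
                    }
                }
            }
        }
    }


if __name__ == "__main__":
    body = pre_indexed_shape_query("location", "DEU", "shapes", "location")
    print(json.dumps(body, indent=2))
```

For the setup described in the question, the index would be areas and the path mpoly, with the id of the relevant areas document.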
