ElasticSearch query referencing document - elasticsearch

I read some time ago that there was a way to build a query that references another document in your index. At the time, this wasn't helpful to me, but I now have very large GIS areas that I need to query against and sending this data to ElasticSearch in the query body every time seems wasteful.
While my specific use-case relates to GIS, geo_shape, etc, it's a general issue that can be applied to other types of queries.
I have a document type areas that holds all of the predefined search areas (these are things like suburbs, states, etc) and entities that hold all of my search data, including a geo_point type field with lat/lon.
I need to be able to construct a geo_shape query for entities documents that references the mpoly attribute (which is a GeoShape type) on an areas document for its shape coordinates.
Unfortunately, neither Google nor the ElasticSearch docs have proved useful here, because most people seem to be more interested in nested documents (related, but not what I'm looking for).

Finally found the answer myself while looking for something different. Unfortunately, the information about the GeoShape filter is not in the GeoShape query manual pages:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-geo-shape-filter.html#_pre_indexed_shape
{
  "filtered": {
    "query": {
      "match_all": {}
    },
    "filter": {
      "geo_shape": {
        "location": {
          "indexed_shape": {
            "id": "DEU",
            "type": "countries",
            "index": "shapes",
            "path": "location"
          }
        }
      }
    }
  }
}
If anyone has better information about how to do this generically, I will happily accept their answer instead.
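For anyone on a newer Elasticsearch version: the filtered query was removed in 5.0 in favour of bool with a filter clause, and since mapping types were dropped, indexed_shape is given without type. Below is a minimal sketch in Python that builds such a request body for the setup in the question. The mpoly path comes from the question; using areas as an index name, location as the geo field on entities, the function name build_area_query and the example id are all assumptions for illustration.

```python
def build_area_query(area_id, area_index="areas", shape_path="mpoly",
                     entity_field="location"):
    """Build a search body that filters on a shape stored in another document.

    area_index, shape_path and entity_field are assumed names; adjust them
    to match your own mapping.
    """
    return {
        "query": {
            "bool": {
                # filter context: no scoring, results can be cached
                "filter": {
                    "geo_shape": {
                        entity_field: {
                            # Elasticsearch fetches the shape from this
                            # document instead of reading it from the request
                            "indexed_shape": {
                                "index": area_index,
                                "id": area_id,
                                "path": shape_path,
                            }
                        }
                    }
                }
            }
        }
    }


# hypothetical area id, for illustration only
body = build_area_query("some-suburb-id")
```

The resulting dict can be sent as the body of a search against the entities index with whatever client you use, so the large polygon never has to travel in the request.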

Related

Position aware search results in Elasticsearch autocompletion

I want to implement address autocompletion using Elasticsearch.
The current approach I am investigating is based on search_as_you_type field type.
Consider these two addresses:
3543JN Carl Zellerhof 8 Utrecht (3543JN is postcode)
1234JN The Street 3543 Utrecht
It is important to prioritize some address parts over others. For instance, postcode should have more weight than number, e.g. when a user types 3543, the first address should come first in the search results.
I see two solutions here:
Combine address into one string and give weight based on position within the combined string
Do search on multiple fields (then weight can be adjusted per field, but it seems more complex to me, how to ensure the same address part is not matched several times?)
I am leaning more towards the one-string solution, but the implementation below gives both addresses the same weight for the 3543 search query.
Please advise how to implement this.
(It is also desirable to allow some fuzziness)
UPD:
It seems that adding a postcode field to the multi_match fields gives me what I want. Are there any disadvantages to this approach?
the index
{
  "mappings": {
    "properties": {
      "search": {
        "type": "search_as_you_type"
      }
    }
  }
}
the search query
{
  "query": {
    "multi_match": {
      "query": "3543",
      "type": "bool_prefix",
      "fields": [
        "search",
        "search._2gram",
        "search._3gram"
      ]
    }
  }
}
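The idea from the UPD (a separate postcode field added to the multi_match fields) might be sketched in Python like this. It assumes the mapping also has a postcode field next to search; the boost of 3, the function name build_address_query and the optional fuzziness argument are illustrative choices, not something the question confirms. As far as I recall from the multi_match documentation, fuzziness in a bool_prefix query applies to the complete terms but not to the final prefix term.

```python
def build_address_query(text, postcode_boost=3, fuzziness=None):
    """Build a bool_prefix multi_match that weights postcode matches higher.

    Assumes a `postcode` field exists alongside the search_as_you_type
    field `search`; the default boost is an arbitrary starting point.
    """
    fields = [
        f"postcode^{postcode_boost}",  # caret syntax boosts a single field
        "search",
        "search._2gram",
        "search._3gram",
    ]
    multi_match = {"query": text, "type": "bool_prefix", "fields": fields}
    if fuzziness is not None:
        # e.g. "AUTO"; only added when the caller asks for it
        multi_match["fuzziness"] = fuzziness
    return {"query": {"multi_match": multi_match}}


query = build_address_query("3543")
```

With this shape, an address whose postcode starts with 3543 should score above one where 3543 only appears as a house number, because the postcode clause contributes a boosted score.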

Most performant way to update a single document in Elasticsearch via an alias

I have an Elasticsearch setup with an alias that points to many indices. I need to update a single document, but I don't know which index it resides in.
There are two ways I can accomplish this as far as I can see:
_update_by_query:
POST my-alias/_update_by_query
{
  "query": {
    "terms": {
      "_id": ["my-id-to-update"]
    }
  },
  "script": {
    "source": "ctx._source['Field'] = 'new value'"
  }
}
read (which returns the specific index) then write:
GET my-alias/_search
{
  "query": {
    "terms": {
      "_id": ["my-id-to-update"]
    }
  }
}

POST my-index-returned-from-the-get/_update/my-id-to-update
{
  "doc": {
    "Field": "new value"
  }
}
Which method is more performant?
Which method is preferred?
Is there a better way than either of these two?
Both approaches should perform about the same. The one difference is that the first approach sends a single request while the second sends two, so the first is preferable because it halves the number of API calls.
Also, in my opinion the first approach is much cleaner and fits the concept of Elasticsearch aliases better, because it hides the exact index name from your application: the application doesn't need to know which index your documents are in.
An important note about updates: documents in Elasticsearch are not updated in place. The existing document is flagged as deleted and a new document is created (this is due to the Lucene implementation), and the deleted document is only physically removed later, during Lucene segment merging.
You can find a good blog post about segment merging here.
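If you do go with the second approach (read, then write), the glue between the two requests is small. Here is a hedged Python sketch that takes the parsed JSON of the search response and works out the _update path. The function name update_target and the sample response are made up for illustration; only the hits.hits[]._index and _id fields of the standard search response are relied on.

```python
def update_target(search_response, doc_id):
    """Return the _update path for doc_id from a parsed search response.

    Raises LookupError when the id is missing or appears in more than one
    index behind the alias, since the target is ambiguous in that case.
    """
    hits = [h for h in search_response["hits"]["hits"] if h["_id"] == doc_id]
    if len(hits) != 1:
        raise LookupError(
            f"expected exactly one hit for {doc_id!r}, found {len(hits)}"
        )
    return f"/{hits[0]['_index']}/_update/{doc_id}"


# trimmed, hypothetical search response for illustration
sample = {"hits": {"hits": [{"_index": "my-index-000003", "_id": "my-id-to-update"}]}}
path = update_target(sample, "my-id-to-update")
```

The explicit check matters because the same _id can exist in several indices behind one alias. The _update_by_query variant would silently update all of them, whereas this helper refuses to guess.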

How to boost Elasticsearch results based on another field?

Kinda simple use case, but I cannot come up with a good solution.
Basically I have two indexed fields: content and keywords (keyword tokenizer), where content is a long text field and keywords contains the important terms within that content. When I query with some long text, I have to boost the results based on the keywords present in the matching document.
I tried querying the complete text on both the content and keywords fields, but it is either too slow or it throws a too_many_clauses error for text with more than 40 words.
{
  "query": {
    "match": {
      "keywords": {
        "query": "some long text",
        "analyzer": "custom_analyzer"
      }
    }
  }
}
Is there any better way? Would percolator work here?
I can relate this to my application, which is similar to Stack Overflow: it consists of questions and answers, and each question has a subject, body, tags, etc.
The subject here corresponds to your keywords field and the body corresponds to your content field. Normally the subject contains the important keywords about the post, which is also the case for you.
Now, coming to the solution:
We solve it by querying both the subject and body fields, but boosting subject by a factor of 15, which is configurable.
ES query which we use:
{
  "query": {
    "multi_match": {
      "query": "this is a test",
      "fields": ["subject^15", "message"]
    }
  }
}
This ES doc also has a similar example where they are boosting a subject field in multi_match query by a factor of 3.
Let me know if you have any questions.
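Since the boost factor is meant to be configurable, the query body can be generated rather than hard-coded. A small Python sketch of that idea follows. The function name build_boosted_query and the shape of the boosts argument are assumptions for illustration, and the field names are the ones from the query above.

```python
def build_boosted_query(text, boosts):
    """Build a multi_match body from a {field_name: boost} mapping.

    A boost of 1 is the default weight, so it is written without the
    caret suffix.
    """
    fields = [
        name if boost == 1 else f"{name}^{boost}"
        for name, boost in boosts.items()
    ]
    return {"query": {"multi_match": {"query": text, "fields": fields}}}


# reproduces the body shown above
body = build_boosted_query("this is a test", {"subject": 15, "message": 1})
```

Keeping the boosts in one mapping makes it easy to tune them from configuration without touching the query code.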

Is it possible to make elasticsearch highlights linkable?

I'm successfully using ES for indexing documents and highlighting searched text. But now I have a new requirement: make all yellow highlights linkable, i.e. the user has to be able to jump to the page containing the selected occurrence.
I haven't implemented a page preview of the document yet, but I'm sure there is some software that takes a page number or byte offset and returns a docx or pdf page as an image. So, I want elastic to return the index of each occurrence (most likely, a byte offset from the beginning). After that I can probably use indexToImage software to show the occurrence page to the user. Even if such software does not exist, I can open a RandomAccessFile, read the occurrence page and somehow show it to the user. But either way I need the occurrence index. Is it possible to get it from elastic?
My search request looks like:
http://localhost:9200/mongofilesindex/_search?pretty&source={
  "_source": ["filename", "metadata"],
  "query": {
    "filtered": {
      "query": {
        "query_string": {
          "query": "*test*"
        }
      }
    }
  },
  "highlight": {
    "pre_tags": ["<mark>"],
    "post_tags": ["</mark>"],
    "fields": {
      "content": {
        "fragment_size": 200,
        "number_of_fragments": 10
      }
    }
  }
}&size=10&from=0
Of course, I could use ES just for extracting the matching documents and then manually run KMP over the input stream, which works in linear time. But I want something better than linear, because I know that suffix automata and other complex data structures can return occurrences in O(search_string_len+occurrences_count), which is much better than O(doc_len).
I'm sure that elastic uses such cool data structures and I'm probably missing some API for getting the indices of occurrences.

Elastic search paginating on multiple types in an index

I have an index with multiple types, like below:
songs
books
movies
I am building an API for suggesting indexed items grouped by their type. The problem is that I want a size option inside each aggregation, just like the completion suggester approach, which returns an exact number of items for each type. I ended up with a multi-index query approach that queries each type separately. Is there a better way to handle this?
Each aggregation you specify can have a filter associated with it, so you could
reduce the context of an aggregation to a specific type that way. Additionally,
you can use the filters aggregation to create buckets for each filter, and run
an aggregation with a certain size on each sub-bucket, like this:
GET /_search
{
  "aggs": {
    "alltypes": {
      "filters": {
        "filters": {
          "songs": {"term": {"_type": "songs"}},
          "books": {"term": {"_type": "books"}},
          "movies": {"term": {"_type": "movies"}}
        }
      },
      "aggs": {
        ... your aggregation for each individual type here ...
      }
    }
  }
}
More info about the filters aggregation can be found at
http://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-filters-aggregation.html
Hopefully that helps, let me know if I misunderstood your question (it was a
little uncertain whether you were talking about suggesters or aggregations since
both were mentioned in the question).
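One way to get a fixed number of items per type, assuming plain documents are what you want back, is a top_hits sub-aggregation with a size inside each filter bucket. Here is a rough Python sketch that builds such a body. The function name build_grouped_suggestions, the sub-aggregation name top_items and the default size are illustrative assumptions; on Elasticsearch versions without mapping types you would filter on a field of your own instead of _type, which is why the field name is a parameter.

```python
def build_grouped_suggestions(types, per_type_size=5, type_field="_type"):
    """Build a search body returning up to per_type_size hits per type."""
    return {
        "size": 0,  # skip the top-level hits, only the buckets are needed
        "aggs": {
            "alltypes": {
                "filters": {
                    # one named bucket per type
                    "filters": {t: {"term": {type_field: t}} for t in types}
                },
                "aggs": {
                    # runs once inside every bucket
                    "top_items": {"top_hits": {"size": per_type_size}}
                },
            }
        },
    }


body = build_grouped_suggestions(["songs", "books", "movies"], per_type_size=3)
```

Each bucket in the response (songs, books, movies) would then carry its own top_items hits, so a single request replaces the separate per-type queries.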
