Sorting results based on location using elastica - elasticsearch

I am trying to learn Elasticsearch, using Elastica to connect to it, and I am finding it hard to track down the information I need to understand how to query it.
Basically, I have inserted data into Elasticsearch, including geo coordinates, and now I need to run a query that sorts the results from closest to farthest.
I want to find all the stores in my state, then order them by how close they are to my current location.
So, given a field called "state" and a field called "point" (an array holding lon/lat), what would the query be using Elastica?
Thanks for any help that you can give me.

First, you need to map your location field as type geo_point (this needs to be done before inserting any data):
{
  "stores" : {
    "properties" : {
      "point" : {
        "type" : "geo_point"
      }
    }
  }
}
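With Elastica, the same mapping can be applied programmatically. A minimal sketch, assuming the older Type/Mapping API and an existing Elastica\Index instance in $index (names taken from the question):

use Elastica\Type\Mapping;

// apply the geo_point mapping before indexing any data
$type = $index->getType('stores');
$mapping = new Mapping($type, array(
    'point' => array('type' => 'geo_point'),
));
$mapping->send();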
After that, you can simply sort your search by _geo_distance:
{
  "sort" : [
    {
      "_geo_distance" : {
        "stores.point" : [-70, 40], // <- reference starting position, as [lon, lat]
        "order" : "asc",
        "unit" : "km"
      }
    }
  ],
  "query" : {
    "match_all" : {}
  }
}
For Elastica, have a look at their docs regarding mapping and query building, and read the unit tests.
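As a starting point, here is a minimal Elastica sketch of the sorted search; the client/index setup is assumed, and the state value is hypothetical:

use Elastica\Query;
use Elastica\Query\Term;

// restrict to stores in the given state...
$term = new Term();
$term->setTerm('state', 'NSW'); // hypothetical state value

// ...and sort the hits by distance from the current location
$query = new Query($term);
$query->addSort(array(
    '_geo_distance' => array(
        'point' => array(-70, 40), // [lon, lat] of the reference position
        'order' => 'asc',
        'unit'  => 'km',
    ),
));

$resultSet = $index->search($query); // $index is an Elastica\Index instance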

For those wanting to sort by distance, this cut-down snippet shows how to use a custom score:
// assumes aliases for the Elastica query classes, e.g.:
//   use Elastica\Query\BoolQuery;                     // newer Elastica
//   use Elastica\Query\MultiMatch as MultiMatchQuery;
//   use Elastica\Query\CustomScore;                   // older Elastica only

$q = new BoolQuery();

// the full-text part of the search
$subQ = new MultiMatchQuery();
$subQ->setQuery('find me')->setFields(array('foo', 'bar'));
$q->addShould($subQ);

// score each hit by negative distance from the given point,
// so the closest documents rank first
$cs = new CustomScore();
$cs->setScript('-doc["location"].distanceInKm(lat,lon)');
$cs->addParams(array(
    'lat' => -33.882583,
    'lon' => 151.209737
));
$cs->setQuery($q);
Hope it helps.
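(Note: the custom_score query was replaced by function_score in Elasticsearch 1.0, so the CustomScore class above is only available in older Elastica releases.)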

Related

How to dynamically change destination index in continuous Elasticsearch transforms?

I am trying to extract some high level metrics from the log data we store in Elasticsearch. To achieve this I am running a number of continuous transforms to generate more meaningful high level logs.
I have included a dest block in my transform definition JSON, as follows:
"dest": {
"index": "transform_index" + date
}
But the aforementioned code is evaluated only once, at transform creation time, and is not updated in subsequent sync cycles.
I am looking for a way to change the destination index on a monthly basis, and I think it is doable using a pipeline, but I am not sure how.
Any pointers are appreciated.
I've read through the documentation and found my answer: I managed to achieve what I needed using an ingest pipeline, which I created as follows:
PUT /_ingest/pipeline/add_timestamp_pipeline
{
  "processors" : [
    {
      // copy the timestamp field from the transform source
      "set" : {
        "field" : "@timestamp",
        "value" : "{{@timestamp}}"
      }
    },
    {
      // route documents to an index based on @timestamp rounded to the month
      "date_index_name" : {
        "field" : "@timestamp",
        "index_name_prefix" : "hourly-activity-index-",
        "date_rounding" : "M",
        "date_formats" : ["UNIX_MS"]
      }
    }
  ]
}
Then you use the created pipeline in your transform:
PUT /_transform/hourly_transform
{
  "dest" : {
    "index" : "hourly_activity_index",
    "pipeline" : "add_timestamp_pipeline"
  },
  // rest of the transform definition
}
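The date_index_name processor rewrites the target index of each document it processes, overriding the static dest.index value; that is what makes the destination roll over monthly.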

Delete by Query with Sort in Elasticsearch

I want to delete the most recent item in my Elasticsearch index, sorted by myDateField, which is a date type. Is that possible? I want something like the query below, but this would delete all matching items even though I have set the size to 1.
{
  "query" : {
    "match_all" : {}
  },
  "size" : "1",
  "sort" : [
    {
      "myDateField" : {
        "order" : "desc"
      }
    }
  ]
}
Delete by query does not support sorting: if you include a "sort" clause, the request fails with the error "request does not support [sort]", even though I couldn't find any documentation saying that the "sort" parameter is unsupported in delete by query.
I have one idea for how to do it, though I don't know whether it's the best way:
Step 1: Run a normal search with your conditions and sorting, and collect the ids of the matching documents.
Step 2: Issue a bulk request that deletes those documents by the ids you collected in Step 1, as sketched below.
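A minimal Elastica sketch of that two-step approach; the index name and type are placeholders, the field name is from the question:

use Elastica\Client;
use Elastica\Query;
use Elastica\Query\MatchAll;

$client = new Client();
$index  = $client->getIndex('myindex'); // placeholder index name

// Step 1: find the id of the newest document
$query = new Query(new MatchAll());
$query->setSize(1);
$query->addSort(array('myDateField' => array('order' => 'desc')));

$ids = array();
foreach ($index->search($query)->getResults() as $result) {
    $ids[] = $result->getId();
}

// Step 2: delete the matched documents by id in a single bulk request
if ($ids) {
    $client->deleteIds($ids, 'myindex', 'mytype'); // 'mytype' is a placeholder
}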

How to run Elasticsearch completion suggester query on limited set of documents

I'm using a completion suggester in Elasticsearch on a single field. The type contains documents of several users. Is there a way to limit the returned suggestions to documents that match a specific query?
I'm currently using this query:
{
  "name" : {
    "text" : "Peter",
    "completion" : {
      "field" : "name_suggest"
    }
  }
}
Is there a way to combine this query with a different one, e.g.
{
  "query" : {
    "term" : {
      "user_id" : "590c5bd2819c3e225c990b48"
    }
  }
}
Have a look at the context suggester, which is a specialized completion suggester with filtering capabilities. Keep in mind, though, that this is still not a regular query filter.
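For reference, a context-enabled completion field would be mapped roughly like this (ES 5.x+ context syntax, sketched with the older Elastica Type/Mapping API; $type is an existing type object and the field names are from the question):

use Elastica\Type\Mapping;

// completion field with a category context keyed on user_id
$mapping = new Mapping($type, array(
    'name_suggest' => array(
        'type'     => 'completion',
        'contexts' => array(
            array('name' => 'user_id', 'type' => 'category'),
        ),
    ),
));
$mapping->send();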
You can specify both the query and the suggester in your query, like this:
{
  "query" : {
    "term" : {
      "user_id" : "590c5bd2819c3e225c990b48"
    }
  },
  "suggest" : {
    "name" : {
      "text" : "Peter",
      "completion" : {
        "field" : "name_suggest"
      }
    }
  }
}
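In Elastica, the equivalent combined request could be sketched like this (the client/index setup is assumed):

use Elastica\Query;
use Elastica\Query\Term;
use Elastica\Suggest;
use Elastica\Suggest\Completion;

// the regular query part
$term = new Term();
$term->setTerm('user_id', '590c5bd2819c3e225c990b48');
$query = new Query($term);

// the completion suggester part
$completion = new Completion('name', 'name_suggest');
$completion->setText('Peter');
$suggest = new Suggest();
$suggest->addSuggestion($completion);
$query->setSuggest($suggest);

$resultSet = $index->search($query);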
I have a similar use case, and I've posted my question on the Elasticsearch forum; see here.
From what I've read so far, I don't think you can limit documents with the completion suggester. It essentially builds a finite state transducer (a prefix tree) at index time; this makes it fast, but you lose the flexibility of filtering on additional fields. I don't think the context suggester would work in your case either (let me know if I am wrong), because the cardinality of user_id is very high.
I think edge n-gram partial matching is more flexible and might actually work in your use case; a minimal setup is sketched below.
Let me know what you end up implementing.
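For completeness, a minimal edge n-gram setup sketched with Elastica; the analyzer and tokenizer names are made up, and min_gram/max_gram should be tuned to your data:

// create an index whose "autocomplete" analyzer tokenizes text into edge n-grams
$index->create(array(
    'settings' => array(
        'analysis' => array(
            'analyzer' => array(
                'autocomplete' => array(
                    'tokenizer' => 'autocomplete_tokenizer',
                    'filter'    => array('lowercase'),
                ),
            ),
            'tokenizer' => array(
                'autocomplete_tokenizer' => array(
                    'type'        => 'edge_ngram',
                    'min_gram'    => 2,
                    'max_gram'    => 10,
                    'token_chars' => array('letter', 'digit'),
                ),
            ),
        ),
    ),
));
// map the suggest field with "analyzer": "autocomplete", then query it with a plain match query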

Elasticsearch: filter by document count in a nested field

I have this schema in Elasticsearch:
{
  'ID' : '1233',
  'Geomtries' : [
    {
      'doc1' : 'F1',
      'doc2' : 'F2'
    },
    // (optional for some of the documents)
    {
      'doc2' : 'F1',
      'doc3' : 'F2'
    }
  ]
}
The Geomtries field is a nested element.
I want to get all of the documents that have exactly one object inside Geomtries.
Tried so far:
  "script" : {"script" : "if (Geomtries.size < 2) return true"}
But I get exceptions: no such property GEOMTRIES.
If you have the field mapped as type nested, the typical doc[fieldkey].values.size() approach does not seem to work. I found the following script to work:
{
  "from" : 0,
  "size" : <SIZE>,
  "query" : {
    "filtered" : {
      "filter" : {
        "script" : {
          "script" : "_source.containsKey('Geomtries') && _source['Geomtries'].size() == 1"
        }
      }
    }
  }
}
NB: You must use _source instead of doc.
The problem is in the way you access fields in your script; use:
  doc['Geomtries'].size()
or
  _source.Geomtries.size()
By the way, for performance reasons I would denormalize the data and add a GeometryNumber field. You can use the transform mapping to compute the size at index time (a client-side variant is sketched below).
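Here is a client-side sketch of that denormalization, computing the count before indexing (newer Elastica, where documents are added directly on the index; names are from the question, GeometryNumber as suggested above):

use Elastica\Document;

// compute the nested-object count up front and index it as a plain integer
$geometries = array(
    array('doc1' => 'F1', 'doc2' => 'F2'),
);
$data = array(
    'ID'             => '1233',
    'Geomtries'      => $geometries,
    'GeometryNumber' => count($geometries),
);
$index->addDocument(new Document('1233', $data));
// filtering then becomes a cheap term query: {"term": {"GeometryNumber": 1}}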

Full-text schema in ElasticSearch

I'm (extremely) new to ElasticSearch, so forgive my potentially ridiculous question. I currently use MySQL to perform full-text searches and want to move this to ElasticSearch. Currently my table has a fulltext index spanning three columns:
title,description,tags
In ES, each document would therefore have title, description and tags fields, allowing me to do a fulltext search for a general phrase, or filter on a given tag.
I also want to add further searchable fields such as username (so I can retrieve posts by a given user). So, how do I specify that a fulltext search should match title OR description OR tags but not username?
From the OR filter example, I'd assume I'd have to use something like this:
{
  "filtered" : {
    "query" : {
      "match_all" : {}
    },
    "filter" : {
      "or" : [
        {
          "term" : { "title" : "foobar" }
        },
        {
          "term" : { "description" : "foobar" }
        },
        {
          "term" : { "tags" : "foobar" }
        }
      ]
    }
  }
}
Coming at this fresh, it doesn't seem very efficient. Is there a better way of doing this, or do I need to move the username field to a separate index?
This is fine.
In general, I would suggest getting familiar with ElasticSearch mapping types and options:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping.html
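Alternatively, instead of an or filter over term clauses, a full-text multi_match query across just those three fields keeps username out of the search entirely. A sketch in Elastica, with the setup assumed:

use Elastica\Query;
use Elastica\Query\MultiMatch;

// full-text search over title, description and tags only
$multiMatch = new MultiMatch();
$multiMatch->setQuery('foobar');
$multiMatch->setFields(array('title', 'description', 'tags'));

$resultSet = $index->search(new Query($multiMatch));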
