Trouble with Elasticsearch nested query & date calculations

I'm having trouble writing a query to find users with active events.
In short: I have users who have events with start dates and end dates. Given a specific date, I need to know which users do NOT have active events on that day. Events are indexed as nested objects because they have their own models.
Here is some sample data:
[
  {
    id: 1,
    name: 'MyUser',
    events: [
      { id: 1, start: '02/01/2016', end: '02/05/2016' },
      { id: 2, start: '02/09/2016', end: '02/10/2016' }
    ]
  },
  {
    id: 2,
    name: 'MyUser2',
    events: [
      { id: 3, start: '02/02/2016', end: '02/04/2016' }
    ]
  },
  {
    id: 3,
    name: 'MyUser3',
    events: []
  }
]
The mapping looks like this:
'events' => [
    'type' => 'nested',
    'properties' => [
        'start' => [
            'type' => 'date',
            "format" => "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd"
        ],
        'end' => [
            'type' => 'date',
            "format" => "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd"
        ]
    ]
],
So for an example query date of 02/08/2016, all users should be returned as free; for 02/04/2016, only user 3; and for 02/09/2016, only users 2 and 3.
My current query looks like this:
{
  "filtered": {
    "filter": {
      "bool": {
        "should": [{
          "term": {
            "events_count": 0
          }
        }, {
          "nested": {
            "path": "events",
            "query": {
              "bool": {
                "must_not": [{
                  "range": {
                    "events.start": {
                      "lte": "2016-02-08"
                    }
                  }
                }, {
                  "range": {
                    "events.end": {
                      "gte": "2016-02-08"
                    }
                  }
                }]
              }
            }
          }
        }]
      }
    }
  }
}
I indexed events_count separately because I had already given up on combining a missing filter with nested objects; it just didn't work as expected.
Actual Problem:
The trouble is getting the start and end dates to be matched together: currently User1 matches the start criterion (lte $search_date) when it shouldn't.
The logic I'm trying to express is: when events.start <= $search_date AND events.end >= $search_date, consider it a match.
What seems to be happening is that the start and end conditions are evaluated separately, so if start <= $search_date, the event counts as a match even when end < $search_date.

You need to wrap your range queries in another bool query with a must clause (the equivalent of SQL AND).
must_not excludes every document that matches any one of its queries.
So rather than having
must_not => range_queries
make it like so:
must_not => bool => must => range_queries
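The effect of where must_not sits can be checked without a cluster. Below is a minimal in-memory sketch in plain JavaScript; the helper names are hypothetical, the request body uses the bool syntax of Elasticsearch 2.x and later rather than the question's filtered query, and the dates are assumed to be stored as yyyy-MM-dd strings (the mapping's format), which compare correctly as plain strings. The model relies on the documented nested behavior that a parent document matches when at least one of its nested objects matches. Under that model, a must_not placed inside the nested query still returns User1 on 2016-02-04 (his second event is not active), while wrapping the whole nested query in must_not excludes every user with any active event and keeps users with no events at all, which would make the separate events_count clause unnecessary.

```javascript
// Sample data from the question, dates rewritten as yyyy-MM-dd strings
// (assumption: the format declared in the mapping).
const users = [
  { id: 1, events: [{ start: "2016-02-01", end: "2016-02-05" },
                    { start: "2016-02-09", end: "2016-02-10" }] },
  { id: 2, events: [{ start: "2016-02-02", end: "2016-02-04" }] },
  { id: 3, events: [] },
];

// Both ranges must hold for the SAME event: bool/must acts as AND.
function activeOn(date) {
  return {
    bool: {
      must: [
        { range: { "events.start": { lte: date } } },
        { range: { "events.end": { gte: date } } },
      ],
    },
  };
}

// Request body that excludes users with ANY active event:
// must_not wraps the nested query instead of sitting inside it.
function freeUsersQuery(date) {
  return {
    query: {
      bool: {
        must_not: [{ nested: { path: "events", query: activeOn(date) } }],
      },
    },
  };
}

// In-memory model of one event matching activeOn(date).
function isActive(event, date) {
  return event.start <= date && event.end >= date;
}

// Model of freeUsersQuery: keep users where no event is active.
function freeUsers(list, date) {
  return list.filter((u) => !u.events.some((e) => isActive(e, date))).map((u) => u.id);
}

// Model of must_not INSIDE the nested query: a user matches as soon as one
// event is not active, and users without events never match the nested query.
function mustNotInsideNested(list, date) {
  return list.filter((u) => u.events.some((e) => !isActive(e, date))).map((u) => u.id);
}

console.log(freeUsers(users, "2016-02-04"));           // [ 3 ]
console.log(mustNotInsideNested(users, "2016-02-04")); // [ 1 ]
console.log(JSON.stringify(freeUsersQuery("2016-02-04")));
```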

Related

Elasticsearch - Sort query based on collapse results

I'm trying to group/stack items based on their SKU.
Currently, when sorting from high to low, a SKU that is sold for both $10 and $1 shows the $1 item near the front (because the SKU is also sold for $10, it is of course placed at the front of the results). The sort should only consider the lowest_price of each SKU.
Is there a way to sort by the lowest_price of every SKU and return only a single item per SKU?
If the collapse results could be used as a variable for the sort, this would be solved, but I haven't been able to find out how that works.
My item object looks like this:
{
  itemId: String,
  sku: String,
  price: Number
}
This is my query:
let itemsPerPage = 25;
let searchQuery = {
  from: itemsPerPage * page,
  size: itemsPerPage,
  _source: ['itemId'],
  sort: [{ 'sale.price': 'desc' }],
  query: {
    bool: {
      must: [],
      must_not: []
    }
  },
  collapse: {
    field: 'sku',
    inner_hits: [{
      name: 'lowest_price',
      size: 1,
      _source: ['itemId'],
      sort: [{
        'price': 'asc'
      }]
    }]
  }
};
You need to add the sort underneath collapse, at the top level of the request.
Example:
GET /test/_search
{
  "query": {
    "function_score": {
      "query": {
        "constant_score": {
          "filter": {
            "bool": {
              "must": [
                {
                  "match": {
                    "job_status": "SUCCESS"
                  }
                }
              ]
            }
          }
        }
      }
    }
  },
  "collapse": {
    "field": "run_id.keyword"
  },
  "sort": [
    {
      "#timestamp": {
        "order": "desc"
      }
    }
  ]
}
This may solve your issue.
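Field collapsing keeps only the top sorted document per collapse key, so the top-level sort decides both which item represents each SKU and the order of the groups. The sketch below is a hypothetical in-memory model of that behavior (the collapse helper and the sample items are made up; no Elasticsearch client is involved). Under this model, sorting by price ascending represents each SKU by its lowest price, while sorting descending represents it by its highest price. Ordering SKUs by their lowest price from high to low is therefore not something a single top-level sort expresses together with collapse; a terms aggregation on sku with a min sub-aggregation, ordered by that sub-aggregation descending, is one alternative worth checking against your Elasticsearch version.

```javascript
// Hypothetical sample items: SKU "A" is sold at two prices.
const items = [
  { itemId: "a1", sku: "A", price: 10 },
  { itemId: "a2", sku: "A", price: 1 },
  { itemId: "b1", sku: "B", price: 5 },
];

// Model of collapse: sort all hits, then keep the first hit seen per key.
// Because hits are already sorted, the groups come out in sort order too.
function collapse(hits, field, compare) {
  const seen = new Set();
  const result = [];
  for (const hit of [...hits].sort(compare)) {
    if (!seen.has(hit[field])) {
      seen.add(hit[field]);
      result.push(hit);
    }
  }
  return result;
}

const byPriceAsc = (x, y) => x.price - y.price;
const byPriceDesc = (x, y) => y.price - x.price;

// Request body in the spirit of the answer: sort at the top level, next to collapse.
const body = { query: { match_all: {} }, collapse: { field: "sku" }, sort: [{ price: "asc" }] };

// Ascending: each SKU is represented by its lowest price.
console.log(collapse(items, "sku", byPriceAsc).map((h) => `${h.sku}:${h.price}`));  // [ 'A:1', 'B:5' ]
// Descending: each SKU is represented by its highest price instead.
console.log(collapse(items, "sku", byPriceDesc).map((h) => `${h.sku}:${h.price}`)); // [ 'A:10', 'B:5' ]
```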

How to aggregate only the results returned by a query in Elasticsearch

I'm trying to run bucket aggregations in Elasticsearch only on the results returned by the query.
It seems like the aggregation runs on every hit but only returns a portion of them. That is fine, but the problem is that the documents returned by the aggregation don't match the documents returned by the query.
Here is the mapping:
LOCATION_MAPPING = {
  id: { type: 'long' },
  name: { type: 'text' },
  street: { type: 'text' },
  city: { type: 'text' },
  state: { type: 'text' },
  zip: { type: 'text' },
  price: { type: 'text' },
  geolocation: { type: 'geo_point' },
  amenities: { type: 'nested' },
  reviews: { type: 'nested' },
};
Here is the query:
{
  "sort": [
    {
      "_score": {
        "order": "desc"
      }
    }
  ],
  "query": {
    "bool": {
      "filter": {
        "geo_distance": {
          "distance": "1000yd",
          "geolocation": [
            -73.990768410025,
            40.713144830193
          ]
        }
      },
      "must": {
        "multi_match": {
          "query": "new york",
          "fields": [
            "name^2",
            "city",
            "state",
            "zip"
          ],
          "type": "best_fields"
        }
      }
    }
  },
  "aggs": {
    "reviews": {
      "nested": {
        "path": "reviews"
      },
      "aggs": {
        "location": {
          "terms": {
            "field": "reviews.locationId"
          },
          "aggs": {
            "avg_rating": {
              "avg": {
                "field": "reviews.rating"
              }
            }
          }
        }
      }
    }
  }
}
The following should help explain the behavior you are observing and answer your questions:
It seems like the aggregation runs on every hit but only returns a portion of them.
Yes: by default, the terms aggregation returns only the top 10 buckets, and you can change that with its size parameter. In Elasticsearch 1.x and 2.x, size 0 returns all buckets; from 5.0 on, size 0 is no longer accepted and you need an explicit, large enough size. See Show all Elasticsearch aggregation buckets, a related post.
the problem is that the documents returned by the aggregation don't match the documents returned by the query.
Aggregations are computed over every document that matches the query, not only over the hits included in the response. The response contains the top 10 scoring hits (the root-level size parameter also defaults to 10; see the Elasticsearch From/Size docs) and the top 10 buckets of your terms aggregation. The top-scoring hits are not necessarily the documents with the most common reviews.locationId values.
I think your options are:
Specify a size n to say you only want the top n results, and run the aggregations on those top n results: review this post on the sampler aggregation, or this post on combining a filter aggregation with a limit filter, for aggregating on the top n results. Pay attention to the notes on shards.
Fetch ALL results (specify a ridiculously large size) and fetch ALL buckets (size 0 within the terms aggregation, on versions that still accept it).
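For the first option, here is a sketch of the question's request body with the nested reviews aggregation wrapped in a sampler aggregation (available since Elasticsearch 2.0). The buildSampledReviewQuery helper, the top_hits_sample name, and the chosen sizes are illustrative assumptions. Because shard_size applies per shard, the aggregation sees up to shard_size top-scoring documents on each shard, not exactly the n hits returned overall, which is the shard caveat mentioned above.

```javascript
// Hypothetical builder: aggregate reviews only over the top-scoring hits per shard.
function buildSampledReviewQuery(query, shardSize) {
  return {
    size: shardSize, // number of hits returned at the root level
    query,
    aggs: {
      top_hits_sample: {
        sampler: { shard_size: shardSize }, // top-scoring docs collected on each shard
        aggs: {
          reviews: {
            nested: { path: "reviews" },
            aggs: {
              location: {
                terms: { field: "reviews.locationId", size: 10 },
                aggs: { avg_rating: { avg: { field: "reviews.rating" } } },
              },
            },
          },
        },
      },
    },
  };
}

// The question's query, unchanged apart from layout.
const query = {
  bool: {
    filter: { geo_distance: { distance: "1000yd", geolocation: [-73.990768410025, 40.713144830193] } },
    must: {
      multi_match: { query: "new york", fields: ["name^2", "city", "state", "zip"], type: "best_fields" },
    },
  },
};

console.log(JSON.stringify(buildSampledReviewQuery(query, 10), null, 2));
```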

Elasticsearch custom function score

I am trying to run a search with custom functions that modify the document score.
I have a mapping with specialities stored inside a hospital and every speciality has a priority with it:
Something like:
hospital: {
  name: 'Fortis',
  specialities: [
    {
      name: 'Cardiology',
      priority: 10
    },
    {
      name: 'Oncology',
      priority: 15
    }
  ]
}
Now I have a function score:
functions: [{
  filter: { terms: { 'specialities.name' => params[:search_text] } },
  script_score: { script: "_score * doc['specialities.priority'].value" }
},
I have a filter query to match the search text to any speciality.
For example, if I search for Oncology, the filter matches, and the script_score should take the priority of that speciality and add it to the document's final score.
But it takes the priority of the first speciality it encounters, which is 10, plus a score of 1 for the filter match, so the final score is 11, not 16 (the priority of Oncology + 1 for the filter match).
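I solved it by using a nested mapping in Elasticsearch.
Lucene has no concept of inner objects: by default, an array of objects is flattened into independent multi-valued fields, so the link between each speciality's name and its priority is lost. To store a priority for every speciality, the mapping should look like this: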
hospital: {
  properties: {
    specialities: {
      type: 'nested',
      properties: {
        name: {
          type: 'string'
        },
        priority: {
          type: 'long'
        }
      }
    }
  }
}
Reference: https://www.elastic.co/guide/en/elasticsearch/reference/2.0/nested.html
After that, I was able to define the function score inside a nested query; my query looks like this:
"query": {
"filtered": {
"query": {
"bool": {
"must": [
{
"nested": {
"path": "specialities",
"query": {
"function_score": {
"score_mode": "sum",
"boost_mode": "sum",
"filter": {
"terms": {
"specialities.name.raw": ["Oncology"]
}
},
"functions": [
{
"field_value_factor": {
"field": "specialities.priority"
}
}
]
}
}
}
}
]
}
}
}
}
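The difference between the two mappings can be modelled in plain JavaScript. This is a simplified, hypothetical model, not Elasticsearch internals: with a plain object mapping, the array is flattened into independent multi-valued fields, and numeric doc values for a multi-valued field are kept sorted, so doc['specialities.priority'].value reads the smallest value, 10, regardless of which name matched. With a nested mapping, each speciality is matched and scored as its own hidden document, so the priority comes from the Oncology entry. The arithmetic assumes, as the question describes, that the filter contributes a score of 1 and that boost_mode sum adds the function result to it.

```javascript
const hospital = {
  name: "Fortis",
  specialities: [
    { name: "Cardiology", priority: 10 },
    { name: "Oncology", priority: 15 },
  ],
};

// Object mapping (model): the array becomes two independent multi-valued
// fields, so the link between a name and its priority is lost.
function flatten(doc) {
  return {
    "specialities.name": doc.specialities.map((s) => s.name),
    "specialities.priority": doc.specialities.map((s) => s.priority).sort((a, b) => a - b),
  };
}

// .value on a sorted multi-valued field reads the first (smallest) value.
function flattenedScore(doc, searchText, filterScore = 1) {
  const fields = flatten(doc);
  if (!fields["specialities.name"].includes(searchText)) return 0;
  return filterScore + fields["specialities.priority"][0];
}

// Nested mapping (model): only the speciality whose name matched is scored.
function nestedScore(doc, searchText, filterScore = 1) {
  const match = doc.specialities.find((s) => s.name === searchText);
  return match ? filterScore + match.priority : 0;
}

console.log(flattenedScore(hospital, "Oncology")); // 11
console.log(nestedScore(hospital, "Oncology"));    // 16
```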

Elasticsearch: Labelling documents with matching search term

I'm using Elasticsearch 1.7 and need a way to label documents with the part of a query_string query they match.
I've been experimenting with highlighting, but found that it gets a bit messy in some cases. I'd love to have each document tagged with the search terms it matched.
Here is the query I'm using (note: this is a Ruby hash that later gets encoded to JSON):
{
  query: {
    query_string: {
      fields: ["title^10", "keywords^4", "content"],
      query: query_string,
      use_dis_max: false
    }
  },
  size: 20,
  from: 0,
  sort: [
    { pub_date: { order: :desc } },
    { _score: { order: :desc } }
  ]
}
The query_string variable is built from the topics the user follows and might look something like this: "(the AND walking AND dead) OR (iphone) OR (video AND games)"
Is there any option I can use so that each returned document has a property naming the search term it matched, such as the walking dead or (the AND walking AND dead)?
If you're ready to switch to bool/should queries, you can split the match per field and use named queries; the results will then include the name of each query that matched.
It basically goes like this: in a bool/should query, add one query_string query per field and give each query a name that identifies its field (e.g. title_query for the title field, and so on).
{
  "query": {
    "bool": {
      "should": [
        {
          "query_string": {
            "fields": ["title^10"],
            "query": "query_string",
            "use_dis_max": false,
            "_name": "title_query"
          }
        },
        {
          "query_string": {
            "fields": ["keywords^4"],
            "query": "query_string",
            "use_dis_max": false,
            "_name": "keywords_query"
          }
        },
        {
          "query_string": {
            "fields": ["content"],
            "query": "query_string",
            "use_dis_max": false,
            "_name": "content_query"
          }
        }
      ]
    }
  }
}
In the results, each hit then has, below its _source, an array called matched_queries that contains the names of the queries that matched the returned document.
"_source": {
...
},
"matched_queries": [
"title_query"
],
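The same named-query idea can label documents by followed topic rather than by field: one query_string clause per topic, each with its own _name, so matched_queries lists the topics a document matched. A minimal sketch, assuming the topics are already available as separate strings instead of one OR-joined string; the buildTopicQuery helper and the sample topics are hypothetical, and the fields, boosts, paging, and sort are taken from the question's query.

```javascript
// Hypothetical helper: one named query_string clause per followed topic.
function buildTopicQuery(topics) {
  return {
    query: {
      bool: {
        should: topics.map((topic) => ({
          query_string: {
            fields: ["title^10", "keywords^4", "content"],
            query: topic,
            use_dis_max: false,
            _name: topic, // the topic string itself becomes the label in matched_queries
          },
        })),
      },
    },
    size: 20,
    from: 0,
    sort: [{ pub_date: { order: "desc" } }, { _score: { order: "desc" } }],
  };
}

const topics = ["the AND walking AND dead", "iphone", "video AND games"];
const body = buildTopicQuery(topics);
console.log(body.query.bool.should.map((clause) => clause.query_string._name));
```

A bool query with only should clauses still requires at least one of them to match, so documents that match none of the topics are not returned.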

Elasticsearch returning wrong results

I am running a query against Elasticsearch, but the results returned are wrong. The idea is that I can check a range of fields with individual queries. But when I pass the following query, events whose lineup doesn't include that artist are returned.
query: {
  bool: {
    must: [
      { match: { "lineup.name": { query: "The 1975" } } }
    ]
  }
}
The objects are events that look like this:
{
  title: 'Glastonbury',
  country: 'UK',
  lineup: [
    {
      name: 'The 1975',
      genre: 'Indie',
      headliner: false
    }
  ]
},
{
  title: 'Reading',
  country: 'UK',
  lineup: [
    {
      name: 'The Strokes',
      genre: 'Indie',
      headliner: true
    }
  ]
}
In my case both of these events are returned.
The mapping can be seen here:
https://jsonblob.com/567e8f10e4b01190df45bb29
You need to use a match_phrase query. A match query looks for either The or 1975; it finds The in The Strokes, so it returns that event too.
Try:
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "lineup.name": {
              "query": "The 1975",
              "type": "phrase"
            }
          }
        }
      ]
    }
  }
}
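To see why the plain match query returns both festivals, here is a simplified model of the two behaviors. It approximates analysis as lowercasing and splitting on whitespace (an assumption; real analyzers do more), and the helper names are hypothetical. A match query ORs its terms by default, so any shared token such as "the" is enough, whereas a phrase query requires the tokens to appear consecutively and in order. On newer Elasticsearch versions, where match no longer accepts "type": "phrase", the equivalent request uses match_phrase directly, as phraseQuery builds it.

```javascript
// Approximation of analysis: lowercase and split on whitespace.
const tokenize = (text) => text.toLowerCase().split(/\s+/).filter(Boolean);

// match (default operator OR): any query token present is enough.
function matchesAny(fieldValue, query) {
  const tokens = new Set(tokenize(fieldValue));
  return tokenize(query).some((t) => tokens.has(t));
}

// Phrase: all query tokens must appear consecutively, in order.
function matchesPhrase(fieldValue, query) {
  const field = tokenize(fieldValue);
  const q = tokenize(query);
  for (let i = 0; i + q.length <= field.length; i++) {
    if (q.every((t, j) => field[i + j] === t)) return true;
  }
  return false;
}

// Request body using match_phrase.
function phraseQuery(field, text) {
  return { query: { bool: { must: [{ match_phrase: { [field]: text } }] } } };
}

const events = [
  { title: "Glastonbury", lineup: [{ name: "The 1975" }] },
  { title: "Reading", lineup: [{ name: "The Strokes" }] },
];

// Titles of events where some lineup entry satisfies the given predicate.
const hits = (pred) =>
  events.filter((e) => e.lineup.some((a) => pred(a.name, "The 1975"))).map((e) => e.title);

console.log(hits(matchesAny));    // [ 'Glastonbury', 'Reading' ]
console.log(hits(matchesPhrase)); // [ 'Glastonbury' ]
console.log(JSON.stringify(phraseQuery("lineup.name", "The 1975")));
```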
