Elasticsearch grouping facet by owner, mine vs others - elasticsearch

I am using Elasticsearch to index documents that have an owner which is stored in a userId property of the source object. I can easily do a facet on the userId and get facets for each owner that there is, but I'd like to have the facets for owner show up like so:
Documents owned by me (X)
Documents owned by others (Y)
I could handle this on the client side: take all of the facets returned by Elasticsearch, work out which ones belong to the current user and which do not, and display them appropriately. But I was hoping there was a way to tell Elasticsearch to handle this in the query itself.

You can use filtered facets to do this:
curl -XGET "http://localhost:9200/_search" -d'
{
"query": {
"match_all": {}
},
"facets": {
"my_docs": {
"filter": {
"term": { "user_id": "my_user_id" }
}
},
"others_docs": {
"filter": {
"not": {
"term": { "user_id": "my_user_id" }
}
}
}
}
}'
One of the nice things about this is that the term filter used in both facets is identical, so it is only executed once. The not filter simply inverts the results of the cached term filter.
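For reference, the facets section of the response should then contain two plain counts, along these lines (the numbers are just illustrative):
"facets": {
  "my_docs": { "_type": "filter", "count": 12 },
  "others_docs": { "_type": "filter", "count": 88 }
}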

You're right, ElasticSearch has a way to do that. Take a look at scripted terms facets, especially the second example ("using the boolean feature"). You should be able to do something like:
{
"query" : {
"match_all" : { }
},
"facets" : {
"userId" : {
"terms" : {
"field" : "userId",
"size" : 10,
"script" : "term == '<your user id>' ? true : false"
}
}
}
}
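With the boolean script, the facet groups documents into just two buckets, one for true (owned by you) and one for false (owned by everyone else), instead of one entry per user id. The relevant part of the response should look something like this (shape and counts are illustrative):
"userId": {
  "_type": "terms",
  "terms": [
    { "term": true, "count": 12 },
    { "term": false, "count": 88 }
  ]
}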

Related

Elasticsearch: How to filter results with a specific word in a value using elasticsearch

I need to add a parameter to my search that filters results containing a specific word in a value. The query is searching for user history records and contains a url key. I need to filter out /history and any other url containing that string.
Here's my current query:
GET /user_log/_search
{
"size" : 50,
"query": {
"match": {
"user_id": 56678
}
}
}
Here's an example of a record, boiled down to just the value we're looking at:
"_source": {
"url": "/history?page=2&direction=desc",
},
How can the parameters of the search be changed to filter out this result?
You can use the filter clause of the bool query in Elasticsearch.
If your url field is of type keyword, you can use the query below (note the filter clause):
{
"query": {
"bool": {
"must": {
"match": {
"user_id": 56678
}
},
"filter": { --> note filter
"term": {
"url": "/history"
}
}
}
}
}
I found a way to solve my specific issue. Instead of filtering on the url I'm filtering on a different value. Here's what I'm using now:
{
"size" : 50,
"query": {
"bool" : {
"must" : {
"match" : { "user_id" : 56678 }
},
"must_not": {
"match" : { "controller": "History" }
}
}
}
}
I'm still going to leave this question open for a while to see if anyone has other ways of solving the original problem.
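One way to attack the original problem directly (a sketch, assuming url is a keyword field): a term filter only matches the exact value "/history", so to exclude every URL that starts with that string you can put a prefix query inside must_not:
{
  "size": 50,
  "query": {
    "bool": {
      "must": {
        "match": { "user_id": 56678 }
      },
      "must_not": {
        "prefix": { "url": "/history" }
      }
    }
  }
}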

Elastic Search Multiple Filter values for the same field

Say that I have to filter car makers in an Elasticsearch index (ES 7.15), where the field car_maker is mapped as a keyword and has a limited number of possible values among car maker names:
{
"mappings": {
"properties": {
"car_maker": {
"type": "keyword"
}
}
}
}
GET /cars/_search
{
"query": {
"bool": {
"filter": [{
"term": {
"car_maker": "Honda"
}
}]
}
}
}
This, along with a match query, works fine. The filter does not participate in score calculation, as desired.
Now I would like to filter on more car makers in that query (say, a should between them):
{
"query": {
"bool": {
"filter" : [
{"term" : { "car_maker" : "Honda"}},
{"term" : { "car_maker" : "Ferrari"}}
]
}
}
}
This is not going to work: I won't get any error from the ES query engine, but no results either. Of course it is always possible to apply more filters to different fields, like car_maker and car_color, but how do I do the opposite: apply more values (Honda, Ferrari, etc.) to the same filter field car_maker, as in the example above, without affecting the score calculation?
You might want to try the following terms filter, which matches documents whose car_maker is any one of the listed values (a single keyword value can never satisfy two separate term filters at once, which is why the previous query returned nothing):
{
"query" : {
"bool" : {
"filter" : {
"terms" : {
"car_maker" : ["Honda", "Ferrari"]
}
}
}
}
}
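Combined with a scoring clause, it might look like the sketch below; only the must part influences the score, the terms filter just restricts the result set (the model field here is only an illustrative name, not from the original mapping):
{
  "query": {
    "bool": {
      "must": {
        "match": { "model": "hybrid" }
      },
      "filter": {
        "terms": { "car_maker": ["Honda", "Ferrari"] }
      }
    }
  }
}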

Elasticsearch filter multiple terms with only matching results and not any of them

How can I get only the results that match all of the terms in a multi-term search? I have this sample table, where titleid is mapped as an integer field and personid as a keyword:
titleid:1,personid:a
titleid:3,personid:a
titleid:1,personid:b
titleid:2,personid:b
titleid:1,personid:c
titleid:5,personid:c
The expected result is:
titleid:1
With a sample query like this one:
{
  "query": {
    "bool": {
      "filter": [
        { "terms": { "personid": ["a", "b", "c"] } }
      ]
    }
  }
}
I have the following results:
titleid: 1,2,3,5
Maybe this will help: I wrote the query in SQL and got the expected result. What I did was ask for the titleid values whose count matches the number of searched parameters. This is only to make the intent clearer; the idea is still to use Elasticsearch.
select titleid
from (
select count(titleid) as title_count, titleid
from table1
where personid in ('a','b','c')
group by titleid
) as vw
where title_count = 3
If you only want records with titleid == 1 AND personid == 'a', you can filter on both fields. Only the bool query uses must, should, and must_not; with a filter, since it is filtering (i.e. removing) by definition, it acts as a must:
"query": {
"bool": {
"filter": [
{
"term": {
"titleId": { "value": 1 }
}
},
{
"term": {
"personid": { "value": "a" }
}
}
]
}
}
UPDATE:
Now it looks like you want to filter your results and then aggregate on them. There are a few relevant metrics and bucket aggregations.
Using a bucket_selector aggregation (this isn't tested, but should be very close if not correct):
{
  "size": 0,
  "query": {
    "bool": {
      "filter": { "terms": { "personid": ["a", "b", "c"] } }
    }
  },
  "aggs": {
    "title_id": {
      "terms": { "field": "titleid" },
      "aggs": {
        "count_filter": {
          "bucket_selector": {
            "buckets_path": { "the_doc_count": "_count" },
            "script": "params.the_doc_count == 3"
          }
        }
      }
    }
  }
}
However, be aware that pipeline aggregations work on the outputs produced by other aggregations, so the overall amount of work needed to calculate the initial doc counts stays the same. Since the script has to be executed for each input bucket, the operation can be slow for high-cardinality fields with many thousands of terms.
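One caveat, as an untested sketch: if the same person can click the same title more than once, the raw document count will overcount. A cardinality sub-aggregation on personid can stand in for the bucket's doc count:
{
  "size": 0,
  "query": {
    "bool": {
      "filter": { "terms": { "personid": ["a", "b", "c"] } }
    }
  },
  "aggs": {
    "title_id": {
      "terms": { "field": "titleid" },
      "aggs": {
        "distinct_persons": { "cardinality": { "field": "personid" } },
        "count_filter": {
          "bucket_selector": {
            "buckets_path": { "persons": "distinct_persons" },
            "script": "params.persons == 3"
          }
        }
      }
    }
  }
}
Keep in mind that cardinality is an approximate metric, although it is exact for low counts like these.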

how to achieve an exists filter on ES5.0?

The exists filter has been replaced by an exists query in ES5.0.
So how can we achieve the equivalent within the same query? In other words, we don't want to run two queries, just one that covers various aggregations, including the exists count.
So I want to count the number of times the field "the_field" exists (i.e. is not null):
"aggregation":{
"exists_count":{
"filter":{
"exists":{
"field":"the_field"
}
}
}
}
I think you can use the stats aggregation; the count it returns only includes documents that have a value for the field:
{ "aggs" :
{ "time_stats" :
{ "extended_stats" :
{ "field" : "time" }
}
}
}
Have a look at the Elasticsearch stats aggregation docs.
With Elastic 5.0, filters didn't so much get replaced by queries as get combined with them. Syntactically they look the same, but the context in which you use one determines whether it is interpreted as a query (factoring into scoring) or as a filter that simply weeds out documents. The code below should achieve exactly what you want:
{
"query": {
"match_all": {}
},
"aggs": {
"field_exists": {
"filter": {
"exists": {
"field": "name"
}
}
}
}
}
The aggregation returned will look something like this, with doc_count representing the number of documents where the "name" field exists. Hope this helps!
{
"aggregations": {
"field_exists": {
"doc_count": 11984
}
}
}
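If you also need the opposite count, i.e. documents where the field is missing or null, a missing aggregation can sit alongside the filter aggregation in the same request, for example:
{
  "query": { "match_all": {} },
  "aggs": {
    "field_exists": {
      "filter": { "exists": { "field": "name" } }
    },
    "field_missing": {
      "missing": { "field": "name" }
    }
  }
}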

Is it possible to figure out whether a 'user' is 'roaming'

I have a question about whether it is possible to write a query to figure out whether a user is or was roaming.
I have a type users that has a home geo location:
curl -XGET "xxxxxxxxx/users/_mapping?pretty=true"
{
"xxxxx" : {
"mappings" : {
"users" : {
"properties" : {
....
"location" : {
"type" : "geo_point"
},
....
}
}
}
}
}
I also have a type clicks that has the geo location of where the click happened and when it happened (eventTimestamp). clicks is also set up as a child of users:
curl -XGET "xxxxxx/clicks/_mapping?pretty=true"
{
"xxxxx" : {
"mappings" : {
"clicks" : {
"_parent" : {
"type" : "users"
},
"_routing" : {
"required" : true
},
"properties" : {
....
"eventTimestamp" : {
"type" : "date",
"format" : "dateOptionalTime"
},
"location" : {
"type" : "geo_point"
},
....
}
}
}
}
}
What I am interested in is getting all the users who were outside of their home locations in the past x days, for example.
When I say outside of their home locations, let's say outside a 250 mile radius from their home geo point.
Any suggestions would be highly appreciated.
I think you'll need to do two queries to accomplish this. First, run a simple query for all users. Then iterate over the results and, for each user, do a query for clicks that filters on an eventTimestamp greater than the date x days ago and uses a geo_distance_range filter to test for click locations more than 250mi from the current user's home. This second query might look something like this:
{
"query": {
"filtered": {
"query": {"match_all": {}},
"filter": {
"and": [
{
"range": {
"eventTimestamp": {"gte": "2015-11-01"}
}
},
{
"geo_distance_filter": {
"gte": "250mi",
"location": {
"lat": <latitude from current user>,
"lon": <longitude from current user>
}
}
}
]
}
}
}
}
The reason you have to use two queries is that Elasticsearch has no way to compare two fields without using a script. Of course, you could try using a script... but I'm not sure if there's a way to calculate geo distance with scripts.
Another option would be to include the eventTimestamp filtering in the first query (using a has_child query to check the clicks made after the given date). Then again iterate over those results and filter this time only by the geo_distance_range.
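A rough sketch of that first-pass query (untested, and assuming the parent/child mapping shown above) could use a has_child query with the date range, so you only iterate over users who clicked at all in the window (now-30d stands in for your x days):
{
  "query": {
    "has_child": {
      "type": "clicks",
      "query": {
        "range": {
          "eventTimestamp": { "gte": "now-30d" }
        }
      }
    }
  }
}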
Hopefully this helps!
