Is it possible to figure out whether a 'user' is 'roaming' - elasticsearch

Is it possible to write a query that figures out whether a user is (or was) roaming?
I have a type users that has home geo location:
curl -XGET "xxxxxxxxx/users/_mapping?pretty=true"
{
"xxxxx" : {
"mappings" : {
"users" : {
"properties" : {
....
"location" : {
"type" : "geo_point"
},
....
}
}
}
}
}
I also have a type clicks that records where each click happened (location) and when it happened (eventTimestamp). clicks is also set as a child of users:
curl -XGET "xxxxxx/clicks/_mapping?pretty=true"
{
"xxxxx" : {
"mappings" : {
"clicks" : {
"_parent" : {
"type" : "users"
},
"_routing" : {
"required" : true
},
"properties" : {
....
"eventTimestamp" : {
"type" : "date",
"format" : "dateOptionalTime"
},
"location" : {
"type" : "geo_point"
},
....
}
}
}
}
}
What I am interested in is getting all the users who were outside of their home locations in the past x days, for example.
By "outside of their home locations" I mean, let's say, outside a 250-mile radius from their home geo location.
Any suggestions would be highly appreciated.

I think you'll need to do two queries to accomplish this. First, run a simple query for all users. Then iterate over the results and for each user do a query for clicks that uses a filter that checks if the eventTimestamp is greater than the date x days ago and a geo_distance_range filter to test for click locations greater than 250mi from the current user. This second query might look something like this:
{
"query": {
"filtered": {
"query": {"match_all": {}},
"filter": {
"and": [
{
"range": {
"eventTimestamp": {"gte": "2015-11-01"}
}
},
{
"geo_distance_range": {
"gte": "250mi",
"location": {
"lat": <latitude from current user>,
"lon": <longitude from current user>
}
}
}
]
}
}
}
}
The reason you have to use two queries is that Elasticsearch has no way to compare two fields without using a script. Of course, you could try using a script... but I'm not sure if there's a way to calculate geo distance with scripts.
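To make the second query concrete, here is a minimal Python sketch that fills in the placeholders for one user. The helper name build_roaming_query and its parameters are made up for illustration. The has_parent clause is my own assumption, added so that only the current user's clicks are counted; it is not part of the query above.

```python
from datetime import date, timedelta


def build_roaming_query(user_id, home_lat, home_lon, days_back,
                        min_distance="250mi", today=None):
    """Build the clicks query for one user (hypothetical helper)."""
    today = today or date.today()
    # Oldest click date to consider, e.g. "2015-11-01".
    since = (today - timedelta(days=days_back)).isoformat()
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {
                    "and": [
                        # Assumption: restrict to clicks whose parent is this user.
                        {"has_parent": {"parent_type": "users",
                                        "query": {"ids": {"values": [user_id]}}}},
                        {"range": {"eventTimestamp": {"gte": since}}},
                        # Clicks at least min_distance away from the home location.
                        {"geo_distance_range": {
                            "gte": min_distance,
                            "location": {"lat": home_lat, "lon": home_lon},
                        }},
                    ]
                },
            }
        }
    }
```

If the query built this way returns at least one hit for a user, that user was roaming in the given window.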
Another option would be to include the eventTimestamp filtering in the first query (using a has_child query to check the clicks made after the given date). Then again iterate over those results and filter this time only by the geo_distance_range.
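A rough sketch of what that first query could look like, built as a Python dict. The helper name is hypothetical, and the body assumes the ES 1.x has_child syntax that matches the mappings in the question:

```python
def build_recent_clickers_query(since):
    """Select users with at least one click on or after `since` (hypothetical helper)."""
    return {
        "query": {
            "has_child": {
                # Child type from the question's mapping.
                "type": "clicks",
                "query": {"range": {"eventTimestamp": {"gte": since}}},
            }
        }
    }
```

This only narrows down the candidate users; the distance check still has to run once per user.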
Hopefully this helps!

Related

Elasticsearch: How to filter results with a specific word in a value using elasticsearch

I need to add a parameter to my search that filters results containing a specific word in a value. The query is searching for user history records and contains a url key. I need to filter out /history and any other url containing that string.
Here's my current query:
GET /user_log/_search
{
"size" : 50,
"query": {
"match": {
"user_id": 56678
}
}
}
Here's an example of a record, boiled down to just the value we're looking at:
"_source": {
"url": "/history?page=2&direction=desc",
},
How can the parameters of the search be changed to filter out this result?
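You can use the filter clause of the bool query in Elasticsearch.
If your url field is of type keyword, you can use the query below. Note that a term clause under filter keeps only documents whose url is exactly "/history"; to exclude such documents, put the clause under must_not instead.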
{
"query": {
"bool": {
"must": {
"match": {
"user_id": 56678
}
},
"filter": {
"term": {
"url": "/history"
}
}
}
}
}
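Since the question asks to exclude every url that contains "/history", one possible variant is a must_not clause with a wildcard query. This is only a sketch: the helper name is made up, it assumes url is a keyword field, and leading wildcards can be slow on large indices.

```python
def build_history_excluding_query(user_id, fragment="/history", size=50):
    """Match a user's records but drop urls containing `fragment` (hypothetical helper)."""
    return {
        "size": size,
        "query": {
            "bool": {
                "must": {"match": {"user_id": user_id}},
                # Exclude any url that contains the fragment anywhere.
                "must_not": {"wildcard": {"url": "*" + fragment + "*"}},
            }
        },
    }
```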
I found a way to solve my specific issue. Instead of filtering on the url I'm filtering on a different value. Here's what I'm using now:
{
"size" : 50,
"query": {
"bool" : {
"must" : {
"match" : { "user_id" : 56678 }
},
"must_not": {
"match" : { "controller": "History" }
}
}
}
}
I'm still going to leave this question open for a while to see if anyone has other ways of solving the original problem.

Elastic(search): How to structure nested queries correctly?

I'm currently quite confused about how queries are structured in Elasticsearch. Let me explain what I mean with the following template, which works fine for me:
{
"template" : {
"query" : {
"filtered" : {
"query" : {
"bool" : {
"must" : [
{ "match" : {
"user" : "{{param_user}}"
} },
{ "match" : {
"session" : "{{param_session}}"
} },
{ "range" : {
"date" : {
"gte" : "{{param_from}}",
"lte" : "{{param_to}}"
}
} }
]
}
}
}
}
}
}
OK, so I want to get the entries of a specific user session within a certain time period. Now if you take a look at this link http://www.elastic.co/guide/en/elasticsearch/guide/current/combining-filters.html you can find the following query:
{
"query" : {
"filtered" : {
"filter" : {
"bool" : {
"should" : [
{ "term" : {"price" : 20}},
{ "term" : {"productID" : "XHDK-A-1293-#fJ3"}}
],
"must_not" : {
"term" : {"price" : 30}
}
}
}
}
}
}
In this example, the "filter" keyword comes right after "filtered". However, if I replace my second "query" with a "filter" as in the example, my template no longer works. This is really counterintuitive, and I spent a lot of time figuring it out. (Struck out by the author: Also, I don't understand why we need to put every filter in separate { } even though they are already separated by the array syntax.)
Another issue: I assumed that to match several fields I could just write something like:
{
"query" : {
"match" : {
"user" : "{{param_user}}",
"session" : "{{param_session}}"
}
}
}
but it seems I have to use a bool query, which I didn't know about, so I searched for 'elastic multi match' but got something completely different.
My question: where can I find out how to structure a query properly (something like a PEG)? The documentation only gives basic examples but doesn't state what we can actually do and how.
Best regards,
Jan
Edit: Ok I just found by accident that I cannot exchange "query" with "filter" as "match" is a query and not a filter. But then again what about "range"? It seems to be a query as well as a filter... Is there a summary of keywords specifying in which context they can be used?
"Is there a summary of keywords specifying in which context they can be used?"
I wouldn't think of those as keywords. It's just that some queries and filters share the same name (but not all of them).
Here is everything you need. For example, range exists both as a query and as a filter. All you need is to understand the difference between filters and queries.
For example, if you want to move the range section from the query to the filter, you can do it as shown in the code below (not tested). Since your code already uses a filtered query, you can just add a filter section right after the query section.
{
"template": {
"query": {
"filtered": {
"query": {
"bool": {
"must": [
{
"match": {
"user": "{{param_user}}"
}
},
{
"match": {
"session": "{{param_session}}"
}
}
]
}
},
"filter": {
"range": {
"date": {
"gte": "{{param_from}}",
"lte": "{{param_to}}"
}
}
}
}
}
}
}
Just remember that term-level filters compare against the exact indexed terms, so they behave predictably only on not_analyzed fields.
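The same structure can be generated in client code, which also shows why a single match clause cannot take several fields: each field gets its own match clause inside bool/must. A minimal Python sketch, with a made-up helper name, assuming the ES 1.x filtered query used in this thread:

```python
def build_filtered_query(match_fields, date_from, date_to):
    """Scored match clauses in the query part, date range in the filter part (hypothetical helper)."""
    # One match clause per field; "match" accepts only a single field each.
    must = [{"match": {field: value}} for field, value in match_fields.items()]
    return {
        "query": {
            "filtered": {
                "query": {"bool": {"must": must}},
                "filter": {"range": {"date": {"gte": date_from, "lte": date_to}}},
            }
        }
    }
```

For example, build_filtered_query({"user": "jan", "session": "abc"}, "2015-01-01", "2015-01-31") yields two match clauses and one range filter.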

Elasticsearch query on array index

How do I query/filter by index of an array in elasticsearch?
I have a document like this:-
PUT /edi832/record/1
{
"LIN": [ "UP", "123456789" ]
}
I want to search if LIN[0] is "UP" and LIN[1] exists.
Thanks.
This might look like a hack, but it will work.
First, we add a token_count sub-field (via multi-fields) to capture the number of tokens as a field.
So the mapping will look like this:
{
"record" : {
"properties" : {
"LIN" : {
"type" : "string",
"fields" : {
"word_count": {
"type" : "token_count",
"store" : "yes",
"analyzer" : "standard"
}
}
}
}
}
}
LINK - http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-core-types.html#token_count
So checking whether the second element exists is as easy as checking that this field's value is greater than or equal to 2.
Next, we need to check that the token "up" (the standard analyzer lowercases "UP") exists at position 0.
We can use a script filter to check this.
Hence a query like the one below should work:
{
"query": {
"filtered": {
"query": {
"range": {
"LIN.word_count": {
"gte": 2
}
}
},
"filter": {
"script": {
"script": "for(pos : _index['LIN'].get('up',_POSITIONS)){ if(pos.position == 0) { return true}};return false;"
}
}
}
}
}
Advanced scripting - http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-advanced-scripting.html
Script filters - http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-script-filter.html
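If the token or position changes, the query above can be generated instead of edited by hand. A small Python sketch under the same assumptions as the answer (a word_count sub-field and the standard analyzer); the helper name is made up, and the token is not escaped, so it should not contain quotes:

```python
def build_position_query(field, token, position, min_tokens):
    """Build the token-count plus position-script query (hypothetical helper)."""
    # The standard analyzer lowercases terms at index time, so look up the lowercase form.
    term = token.lower()
    script = (
        "for(pos : _index['%s'].get('%s',_POSITIONS))"
        "{ if(pos.position == %d) { return true}};return false;"
        % (field, term, position)
    )
    return {
        "query": {
            "filtered": {
                "query": {"range": {field + ".word_count": {"gte": min_tokens}}},
                "filter": {"script": {"script": script}},
            }
        }
    }
```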

Elasticsearch grouping facet by owner, mine vs others

I am using Elasticsearch to index documents that have an owner which is stored in a userId property of the source object. I can easily do a facet on the userId and get facets for each owner that there is, but I'd like to have the facets for owner show up like so:
Documents owned by me (X)
Documents owned by others (Y)
I could handle this on the client side and take all of the facets returned by elasticsearch and go through them and figure out those owned by the current user and not and display it appropriately, but I was hoping there was a way to tell elasticsearch to handle this in the query itself.
You can use filtered facets to do this:
curl -XGET "http://localhost:9200/_search" -d'
{
"query": {
"match_all": {}
},
"facets": {
"my_docs": {
"filter": {
"term": { "user_id": "my_user_id" }
}
},
"others_docs": {
"filter": {
"not": {
"term": { "user_id": "my_user_id" }
}
}
}
}
}'
One of the nice things about this is that the two term filters are identical and so are only executed once. The not filter just inverts the results of the cached term filter.
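On the client side this could be wrapped roughly as follows. It is a sketch only: both helper names are made up, and owner_labels assumes that each filter facet in the response carries a "count" entry.

```python
def build_owner_facets(user_id, field="user_id"):
    """Request body with one facet for my documents and one for everyone else's."""
    return {
        "query": {"match_all": {}},
        "facets": {
            "my_docs": {"filter": {"term": {field: user_id}}},
            "others_docs": {"filter": {"not": {"term": {field: user_id}}}},
        },
    }


def owner_labels(response):
    """Turn the facet counts into the two display labels from the question."""
    facets = response["facets"]
    return [
        "Documents owned by me (%d)" % facets["my_docs"]["count"],
        "Documents owned by others (%d)" % facets["others_docs"]["count"],
    ]
```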
You're right, Elasticsearch has a way to do that. Take a look at scripting term facets, especially the second example ("using the boolean feature"). You should be able to do something like:
{
"query" : {
"match_all" : { }
},
"facets" : {
"userId" : {
"terms" : {
"field" : "userId",
"size" : 10,
"script" : "term == '<your user id>' ? true : false"
}
}
}
}

Trouble with has_parent query containing scripted function_score

I have two document types, in a parent-child relationship:
"myParent" : {
"properties" : {
"weight" : {
"type" : "double"
}
}
}
"myChild" : {
"_parent" : {
"type" : "myParent"
},
"_routing" : {
"required" : true
}
}
The weight field is to be used for custom scoring/sorting. This query directly against the parent documents works as intended:
{
"query" : {
"function_score" : {
"script_score" : {
"script" : "_score * doc['weight'].value"
}
}
}
}
However, when trying to do similar scoring for the child documents with a has_parent query, I get an error:
{
"query" : {
"has_parent" : {
"query" : {
"function_score" : {
"script_score" : {
"script" : "_score * doc['weight'].value"
}
}
},
"parent_type" : "myParent",
"score_type" : "score"
}
}
}
The error is:
QueryPhaseExecutionException[[myIndex][3]: query[filtered(ParentQuery[myParent](filtered(function score (ConstantScore(*:*),function=script[_score * doc['weight'].value], params [null]))->cache(_type:myParent)))->cache(_type:myChild)],from[0],size[10]: Query Failed [failed to execute context rewrite]]; nested: ElasticSearchIllegalArgumentException[No field found for [weight] in mapping with types [myChild]];
It seems like instead of applying the scoring function to the parent and then passing its result to the child, ES is trying to apply the scoring function itself to the child, causing the error.
If I don't use score for score_type, the error doesn't occur, although the results scores are then all 1.0, as documented.
What am I missing here? How can I query these child documents with custom scoring based on a parent field?
I would say this is a bug: it is using the myChild mapping as the default context, even though you are inside a has_parent query. But I'm not sure how easy the bug would be to fix properly.
However, you can work around it by including the type name in the full field name:
curl -XGET "http://localhost:9200/t/myChild/_search" -d'
{
"query": {
"has_parent": {
"query": {
"function_score": {
"script_score": {
"script": "_score * doc[\"myParent.weight\"].value"
}
}
},
"parent_type": "myParent",
"score_type": "score"
}
}
}'
I've opened an issue (#4914) to see if we can get this fixed.
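The workaround boils down to prefixing the field with the parent type inside the script. A small Python sketch that builds the same request body; the helper name is hypothetical:

```python
def build_parent_scored_query(parent_type, field):
    """has_parent query whose score script uses the type-qualified field name."""
    # e.g. "_score * doc['myParent.weight'].value"
    script = "_score * doc['%s.%s'].value" % (parent_type, field)
    return {
        "query": {
            "has_parent": {
                "query": {"function_score": {"script_score": {"script": script}}},
                "parent_type": parent_type,
                "score_type": "score",
            }
        }
    }
```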
I think the problem is that you are trying to score child documents based on a field in the parent document and that the function score should really be the other way round.
To solve the problem my idea would be to store the parent/child relation and the score with the child documents. Then you would filter for child documents and score them according to the weight in the child document.
An example:
"myParent" : {
"properties" : {
"name" : {
"type" : "string"
}
}
}
"myChild" : {
"_parent" : {
"type" : "myParent"
},
"_routing" : {
"required" : true
},
"properties": {
"weight" : {
"type" : "double"
}
}
}
Now you could use a has_parent filter to select all child documents that have a certain parent and then score them using the function score:
{
"query": {
"filtered": {
"query": {
"function_score" : {
"script_score" : {
"script" : "_score * doc['weight'].value"
}
}
},
"filter": {
"has_parent": {
"parent_type": "myParent",
"query": {
"term": {
"name": "something"
}
}
}
}
}
}
}
So if the parent documents were blog posts and the children comments, you could filter the posts and score the comments based on weight. I doubt that scoring children based on parents is possible, though I might be wrong :)
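If the weight conceptually belongs to the parent, storing it with the child means copying it at index time. A minimal sketch of that step in Python; the helper name and document shapes are made up, and the children have to be reindexed whenever the parent's weight changes:

```python
def child_with_parent_weight(child_source, parent_source, field="weight"):
    """Return a copy of the child document with the parent's weight copied in."""
    enriched = dict(child_source)  # leave the original child document untouched
    enriched[field] = parent_source[field]
    return enriched
```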
Disclaimer: 1st post to stack overflow...

Resources