Store website URL hit count in Elasticsearch

I want to keep a record of the pages requested by users, along with a count of how often each page was requested. I currently store the page visits of my website's users by updating an index in Elasticsearch.
I do this by updating a document similar to:
{
userid : 'id1234',
url : 'website.com/url-1',
count : 23,
}
Here, the count of 23 is the total number of times the URL was requested by the user with id 'id1234'.
To achieve this, I retrieve the document, increment the current count, and push it back. My question is: is it possible to do this with a single query?
I saw a similar approach using scripts here.
Can we do this without scripts?

Elasticsearch is not well suited to frequent updates. Even when an update like this is possible, internally it marks the old record as deleted, then adds the whole document again and reindexes it.
Probably the closest thing here is the partial update feature.
Here is an example from the documentation:
POST /metrics/users/1/_update
{
"script" : "ctx._source.count+=1"
}
But you have already mentioned this in the question (the link to the relevant documentation is available here).
Even if you use scripts, the problem remains that it's relatively slow.
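To make the scripted route concrete, here is a minimal sketch in Python that only builds the request body for a single-request increment. It still relies on a script, so it does not answer the "without scripts" part. Assumptions that are not in the question: the helper names hit_doc_id and build_increment_request are made up, the document id is derived from the user id and URL, and the script uses the object form with "source" and "params" accepted by newer Elasticsearch versions (older versions take a plain string, as in the example above). The "upsert" part creates the document on the first hit, so no prior read is needed.

```python
import hashlib
import json


def hit_doc_id(userid: str, url: str) -> str:
    """Derive a stable document id from the (user, URL) pair (hypothetical scheme)."""
    return hashlib.sha1(f"{userid}|{url}".encode("utf-8")).hexdigest()


def build_increment_request(userid: str, url: str, step: int = 1) -> dict:
    """Return the JSON body for a scripted update with an upsert fallback."""
    return {
        # Runs when the document already exists: add `step` to the stored count.
        "script": {
            "source": "ctx._source.count += params.step",
            "lang": "painless",
            "params": {"step": step},
        },
        # Indexed as-is when the document does not exist yet.
        "upsert": {"userid": userid, "url": url, "count": step},
    }


if __name__ == "__main__":
    # The body would be sent to the update endpoint of the document
    # identified by hit_doc_id(...); sending it is left out of this sketch.
    print(hit_doc_id("id1234", "website.com/url-1"))
    print(json.dumps(build_increment_request("id1234", "website.com/url-1"), indent=2))
```

With this shape, the client sends one request per page view and never has to fetch the current count first.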

Related

Fetch data less than a score in Elasticsearch

I am trying to build an Instagram-like Explore page using Elasticsearch. The contents are scored based on time as well as the number of likes. Since the likes are updated frequently, pagination is difficult with From/Size and Search After. Suppose I fetch the first 10 posts using From 0, Size 10, and by the time I fetch the second page, another 10 posts have gained more likes. The posts I already fetched on the first page now sit at positions 10 to 20, which creates a lot of duplicates on my Explore page.
I am more concerned about avoiding duplicates in pagination than about missing some content, because if the user refreshes the Explore page, the top contents will be displayed again. The best way, I think, is to fetch all posts below a particular score. Is there anything like a max_score API? If not, how can I solve this problem?

Elasticsearch : search for sets of items instead of items

I created a website where I log user actions: visit page, download document, log in, etc. Each action is timestamped, attached to a user, and indexed in Elasticsearch.
I would like to recognize predefined patterns in those actions, e.g.:
find users who visited this page, this other page and downloaded 2 documents in the last 3 weeks
find users who logged in and visited at least 5 pages in the same day
The problem is that I have always used ES to find items that match criteria, but never to find sets of items.
How would you start to solve this problem?
Thank you for your help.
For the second query I would suggest aggregations (like SQL GROUP BY): count the number of page visits aggregated per user and day.
Then add conditions on these aggregated results (like SQL HAVING).
To filter on aggregation results I found this (I have not tested it or tried to understand it in detail):
https://www.elastic.co/guide/en/elasticsearch/reference/master/search-aggregations-pipeline-bucket-selector-aggregation.html
Hope it helps
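As a rough illustration of the GROUP BY / HAVING idea, the following Python sketch builds a search body that groups page visits per user and per day and then drops day buckets with too few visits using a bucket_selector. It is untested against a real cluster. The field names user_id, action and timestamp, the action value "visit_page", and the function name are assumptions, and "calendar_interval" is the spelling used by newer Elasticsearch versions (older ones use "interval"). The "logged in" condition from the question is not covered.

```python
import json


def build_daily_visits_query(min_visits: int = 5) -> dict:
    """Search body: page visits per user and day, keeping days with >= min_visits."""
    return {
        "size": 0,  # only the aggregation results are needed, not the hits
        "query": {"term": {"action": "visit_page"}},
        "aggs": {
            "per_user": {  # GROUP BY user
                "terms": {"field": "user_id"},
                "aggs": {
                    "per_day": {  # ... and by day
                        "date_histogram": {
                            "field": "timestamp",
                            "calendar_interval": "day",
                        },
                        "aggs": {
                            "enough_visits": {  # HAVING count >= min_visits
                                "bucket_selector": {
                                    "buckets_path": {"visits": "_count"},
                                    "script": f"params.visits >= {min_visits}",
                                }
                            }
                        },
                    }
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_daily_visits_query(5), indent=2))
```

Note that the selector only removes day buckets; a user with no qualifying day would still appear with an empty per_day list, so the client has to skip those users.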

Couchdb get the changed document with each change notification

I want to be notified with the inserted document on each insertion into CouchDB.
Something like this:
http://localhost:5058/db-name/_changes/_view/inserted-document
I would like the response to be something like the following:
{
"id":"0552065465",
"name":"james"
.
.
.
}
Fetching the actual document from the database again on each notification can cause performance issues.
Can I define a view that returns the actual document with each change?
There are 3 possible ways to determine whether a document was just added:
1. Add a status field to your document, with a specific status for new documents.
2. Check whether the revision starts with 1-. According to this, it is not 100% accurate if you use replication.
3. In the changes response, check whether the number of revisions of the document is equal to one. If so, it was just added (best solution IMO).
If you want to query the _changes endpoint and directly get the newly inserted documents, you can use approach #1 with a filter function that only returns documents with status="new".
Otherwise, you should go with approach #3 and filter the _changes responses locally. E.g., your application would receive all changes and only handle documents whose revisions array has exactly one entry.
As you mentioned, you want to receive the document, not only the _id and the _rev. To do so, simply add the query parameter include_docs=true.
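The local filtering could look roughly like the Python sketch below. It builds a _changes URL with include_docs=true (optionally with a filter given as "designdoc/filtername") and then keeps only rows whose document revision has generation 1. That is the revision-prefix check, used here as a simple stand-in for the "first revision" test, so the replication caveat above applies. The function names, the design document name in the usage comment, and the sample rows are assumptions; the host and port are taken from the question.

```python
from typing import Iterable, List, Optional
from urllib.parse import quote, urlencode


def changes_url(base_url: str, db: str, filter_name: Optional[str] = None) -> str:
    """Build a _changes URL that returns the full document with each change."""
    params = {"include_docs": "true"}
    if filter_name is not None:
        # Server-side filter in the form "designdoc/filtername".
        params["filter"] = filter_name
    return f"{base_url.rstrip('/')}/{quote(db, safe='')}/_changes?{urlencode(params)}"


def newly_inserted_docs(rows: Iterable[dict]) -> List[dict]:
    """Keep the documents whose revision generation is 1 (local filtering)."""
    docs = []
    for row in rows:
        doc = row.get("doc")
        if doc is None or row.get("deleted"):
            continue  # no document body, or the change is a deletion
        generation = int(doc["_rev"].split("-", 1)[0])
        if generation == 1:
            docs.append(doc)
    return docs


if __name__ == "__main__":
    # "app/new_docs" is a made-up design document / filter name.
    print(changes_url("http://localhost:5058", "db-name", "app/new_docs"))
```

The rows passed to newly_inserted_docs are the entries of the "results" array in the _changes response, already parsed from JSON.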

Elasticsearch Jest update a whole document

I have an Elasticsearch server which I'm accessing from a Java server using the Jest client, and I am looking for the best way to update multiple fields of a document at once.
I have looked at the documentation so far, and I have found that there are two ways of doing it:
Partial update via a script: I don't think it is suitable for updating multiple fields (because I don't know which fields were modified).
Whole document update: via re-indexing the whole document.
My question is: how can I update the whole document, given that Jest only provides update via a script?
Is deleting the document and indexing the updated version the best way?
I already answered this in the GitHub issue you also opened, but again:
You should use the second way you linked (whole document update). There is no special API for it; it's just a regular index request. So you can do it simply by sending your Index request against the id of the document you want to update.
For example, assuming you have the document below already indexed in Elasticsearch in index people, type food, id 9:
{"user": "kramer", "fav_food": "jello"}
Then you would do:
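import io.searchbox.client.JestResult;
import io.searchbox.core.Index;

// Indexing against an existing id replaces the stored document as a whole.
String source = "{\"user\": \"kramer\", \"fav_food\": \"pizza\"}";
JestResult result = client.execute(
        new Index.Builder(source)
                .index("people")
                .type("food")
                .id("9") // the builder takes the document id as a String
                .build()
);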

Updating filtered documents in elasticsearch

I want to know if there is a way to update Elasticsearch documents selected by a filter.
Let's say I have a user collection with the following documents:
[
{ "name":"u1","age":23},
{ "name":"u2","age":31},
{ "name":"u3","age":27},
{ "name":"u4","age":33}
]
Now what I need to do is update the names of all the users whose age is above 30.
I have looked at a lot of documentation and searched Google for hours, including the following document:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/_updating_documents.html
I couldn't find a way to do it. In the docs, an update is always given the id of the document, so it doesn't suit my need. Is there a way to do this sort of thing in Elasticsearch?
From the link you provided:
Note that as of this writing, updates can only be performed on a
single document at a time. In the future, Elasticsearch will provide
the ability to update multiple documents given a query condition (like
an SQL UPDATE-WHERE statement).
So, this is not supported at the moment. But you can consider taking a look at this plugin: https://github.com/yakaz/elasticsearch-action-updatebyquery/.
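For readers on later Elasticsearch releases: an Update By Query API (POST to the index's _update_by_query endpoint) was added after this answer was written, so the plugin may no longer be needed there. The Python sketch below only builds a request body for it and mirrors the effect on the sample documents locally; it does not talk to a cluster. The function names and the replacement name "senior" are made up for illustration, and the script object form with "source" and "params" is the one used by newer versions.

```python
import json
from typing import List


def build_update_by_query(min_age: int, new_name: str) -> dict:
    """Body for an update-by-query request: rename every user older than min_age."""
    return {
        "query": {"range": {"age": {"gt": min_age}}},
        "script": {
            "source": "ctx._source.name = params.new_name",
            "lang": "painless",
            "params": {"new_name": new_name},
        },
    }


def apply_locally(docs: List[dict], min_age: int, new_name: str) -> List[dict]:
    """Pure-Python equivalent, to show which documents would change."""
    return [
        {**doc, "name": new_name} if doc["age"] > min_age else dict(doc)
        for doc in docs
    ]


if __name__ == "__main__":
    users = [
        {"name": "u1", "age": 23},
        {"name": "u2", "age": 31},
        {"name": "u3", "age": 27},
        {"name": "u4", "age": 33},
    ]
    print(json.dumps(build_update_by_query(30, "senior"), indent=2))
    print(apply_locally(users, 30, "senior"))
```

With the sample data from the question, only u2 and u4 match the range condition.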
