How to get most retrieved items from an elasticsearch index? - elasticsearch

I have uploaded some data to elasticsearch and I would like to keep track of how many times a data point is returned by past searches, that is to say, the most popular searched items.
Does elasticsearch provide such functionality to achieve this without implementing and updating a counter myself?
Cheers.

Related

Get last document from index in Elasticsearch

I'm playing around with the package github.com/olivere/elastic. Everything works fine, but I have a question: is it possible to get the last N inserted documents?
The From option defaults to 0 as the starting point for the Search action, and I couldn't tell whether it is possible to omit it in a search.
TL;DR
I am not aware of a feature in the Elasticsearch API that retrieves the latest inserted documents.
There is a way to achieve something similar if you store the ingest time of each document.
You can then sort on the ingest time in descending order and retrieve the top N documents.
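
A minimal sketch of that idea, written in Python rather than Go for brevity. It only builds the search request body; the field name `ingested_at` is an assumption (use whatever field holds the ingest time in your mapping), and the body can be sent with any client.

```python
import json


def latest_documents_query(n, time_field="ingested_at"):
    """Build a search body returning the n most recently ingested documents.

    time_field is a hypothetical name for the field that stores ingest time.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return {
        "size": n,                                  # top N only
        "query": {"match_all": {}},                 # no filtering
        "sort": [{time_field: {"order": "desc"}}],  # newest first
    }


if __name__ == "__main__":
    print(json.dumps(latest_documents_query(5), indent=2))
```

With this body, From can be left at its default of 0, since the newest documents are already at the top of the sorted results.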

Filter result in memory to search in elasticsearch from multiple indexes

I have 2 indexes and they both have one common field (basically a relationship).
Since Elasticsearch does not support filtering across multiple indexes, should we store the results in memory in a variable and filter them in node.js (which basically means that my application itself is now working as a database server)?
We were previously using MongoDB, which is also a NoSQL DB, and we were able to manage this through aggregate queries, but it seems Elasticsearch does not provide that.
So even if we use both databases combined, we have to store their results somewhere to filter them further, as we give users advanced search functionality that lets them filter data from multiple collections.
So should we store results in memory to filter the data further? We currently provide advanced search over 100 million records to customers, but without the advanced text search that Elasticsearch provides; now we are planning to offer Elasticsearch text search to customers.
What approach do you suggest for making MongoDB and Elasticsearch work together? We are using node.js to serve data.
Or which of these options should we choose?
Denormalizing: Flatten your data
Application-side joins: Run multiple queries on normalized data
Nested objects: Store arrays of objects
Parent-child relationships: Store multiple documents through joins
https://blog.mimacom.com/parent-child-elasticsearch/
https://spoon-elastic.com/all-elastic-search-post/simple-elastic-usage/denormalize-index-elasticsearch/
Storing things client side in memory is not the solution.
First of all, the simplest way to solve this problem is to make one combined index. It is trivial to do: just insert all the documents from index 2 into index 1. Prefix all fields coming from index 2 with something like "idx2". That way you won't overwrite any fields that share a name. You can use an ingest pipeline to do this, or just do it client side. You will only ever do this once.
After that you can perform aggregations on the single index, since you have all the data in one index.
If you are using something other than ES as your primary data store, you need to reconfigure the indexing operation so that everything that was previously going into index 2 goes into index 1 as well (with the prefixed fields).
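
A rough client-side sketch of the prefixing step, in Python for brevity (the same logic is a few lines in node.js). The function name, the "_" separator and the `keep` parameter are assumptions made for illustration; `keep` lets the common relationship field stay unprefixed so both kinds of documents can still be matched on it.

```python
def prefix_fields(doc, prefix="idx2", keep=()):
    """Return a copy of doc with top-level fields renamed to '<prefix>_<field>'.

    Fields listed in keep (hypothetical option) are copied unchanged.
    """
    return {
        (name if name in keep else f"{prefix}_{name}"): value
        for name, value in doc.items()
    }


if __name__ == "__main__":
    order = {"user_id": 7, "name": "first order"}
    # Keep the shared field as-is, prefix everything else.
    print(prefix_fields(order, keep=("user_id",)))
```

Each document from index 2 would be passed through such a function before being indexed into index 1.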
100 million records is trivial for something like Elasticsearch. Doing any kind of "joins" client side is NOT RECOMMENDED, as this will negate the entire value of using ES.
If you need any further help on executing this, feel free to contact me. I have 11 years of experience in ES, and I have seen people struggle with "joins" 99% of the time. :)
The first thing to do when coming from MySQL/Postgres or even MongoDB is to restructure the indices to suit your querying needs. Never try to join across multiple indices; ES is not built for that.
HTH.

Elasticsearch slow performance for huge data retrieval with source field

I'm using Elasticsearch to search more than 10 million records; most records contain 1 to 25 words. The method I'm using now is drastically slow for retrieving large result sets, as I'm trying to get the data from the source field. I want a method that makes this process faster. I'm free to use another database or anything else alongside Elasticsearch. Can anyone suggest some good ideas and examples for this?
I've tried searching for a solution on Google, and one solution I found was pagination. I've already applied it wherever possible, but pagination is not an option when I want to retrieve many (5000+) hits in one query.
Thanks in advance.
Try using scroll
While a search request returns a single “page” of results, the scroll
API can be used to retrieve large numbers of results (or even all
results) from a single search request, in much the same way as you
would use a cursor on a traditional database.
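
A sketch of how a scroll loop might look in Python, assuming the request shapes of recent Elasticsearch versions (a `scroll` query parameter on the first search, then `POST /_search/scroll` with the scroll ID in the body). The `send` argument is a hypothetical transport callable, not part of any client library: it takes a method, a path and a JSON body, performs the HTTP request, and returns the decoded response.

```python
def scroll_all(send, index, query, page_size=1000, keep_alive="1m"):
    """Yield every hit matching query, fetching one scroll page at a time.

    send(method, path, body) is a hypothetical callable that performs the
    HTTP request against Elasticsearch and returns the parsed JSON response.
    """
    # The initial search opens the scroll context and returns the first page.
    page = send("POST", f"/{index}/_search?scroll={keep_alive}",
                {"size": page_size, "query": query})
    scroll_id = page.get("_scroll_id")
    try:
        while page["hits"]["hits"]:
            yield from page["hits"]["hits"]
            # Ask for the next page; an empty page means we are done.
            page = send("POST", "/_search/scroll",
                        {"scroll": keep_alive, "scroll_id": scroll_id})
            scroll_id = page.get("_scroll_id", scroll_id)
    finally:
        # Release the scroll context on the server once iteration stops.
        if scroll_id is not None:
            send("DELETE", "/_search/scroll", {"scroll_id": scroll_id})
```

Because this is a generator, the caller can process 5000+ hits without holding more than one page in memory at a time.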

Is it possible to know when some data is available for being searched in Elasticsearch?

I'm implementing software in which data is sent to a web server, stored in Elasticsearch, and then queried right away. I know that Elasticsearch is a NoSQL store following BASE (Basically Available, Soft state, Eventual consistency) principles, which means there's no guarantee about when your data will be available for searching.
That's why, when I query for data that has just been added to Elasticsearch, I have to wait some time before it is found. Right now all I can do is implement a polling mechanism to detect when the data has been completely applied. It is worth mentioning that if I use _id to retrieve a document, it is found right away. But if I search for it using some type of Elasticsearch query (like term or query_string), it takes a while before the document is found.
So my question is: Is there a cheaper way to detect when data is completely indexed in Elasticsearch?
Search visibility is handled by refreshes (see the Refresh API), but that API does not provide a way to know when the indexed data has become available. The folks at Elastic are working on a hack to let a request wait for a refresh.
I think it would be better if you took a look here: https://www.elastic.co/blog/refreshing_news
This post has a good overview of the issues and the things they are working on to improve.
Hope it helps :D
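
For readers on newer versions: since Elasticsearch 5.0 the index, update, delete and bulk APIs accept a `refresh=wait_for` parameter, which makes the request return only after the change is visible to search. A small Python sketch that builds such a request URL follows; the helper name is made up for illustration, and the `_doc` endpoint assumes a recent, typeless Elasticsearch version.

```python
from urllib.parse import quote, urlencode


def index_url(base_url, index, doc_id, wait_for_refresh=True):
    """Build the URL for indexing one document.

    With wait_for_refresh=True the request blocks until the document is
    searchable (refresh=wait_for). The helper itself is hypothetical.
    """
    url = f"{base_url.rstrip('/')}/{quote(index, safe='')}/_doc/{quote(doc_id, safe='')}"
    if wait_for_refresh:
        url += "?" + urlencode({"refresh": "wait_for"})
    return url


if __name__ == "__main__":
    print(index_url("http://localhost:9200", "events", "42"))
```

Sending a PUT with the document body to that URL removes the need for a polling loop, at the cost of a slower indexing call.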

Can I narrow results from Elastic Search _stats get?

I am using Elasticsearch for the project I'm working on, and I was wondering if there is a way to narrow the results I get from an indices stats request.
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-stats.html
I currently use the docs to narrow the data I get back about the indices, but now I want to get back only the ones with a doc count greater than 0. Does anyone know if this is possible and, if so, how?
Thanks!
For Elasticsearch 1.5.2.
If you're concerned about the size of the response (i.e. if you have many indices with many shards), the best you can do is use response filtering (available only since ES 1.7) and retrieve only the docs field, which you can then filter further on the client side:
curl 'localhost:9200/_stats/docs?pretty&filter_path=**.docs.count'
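
The client-side step could look roughly like this in Python. It assumes the filtered response keeps the usual `indices.<name>.primaries.docs.count` layout of the stats API; the function name is made up for illustration.

```python
def non_empty_indices(stats):
    """Return the names of indices whose primary doc count is greater than 0.

    stats is the parsed JSON response of the _stats/docs request.
    """
    return sorted(
        name
        for name, index_stats in stats.get("indices", {}).items()
        if index_stats.get("primaries", {}).get("docs", {}).get("count", 0) > 0
    )


if __name__ == "__main__":
    sample = {
        "indices": {
            "logs": {"primaries": {"docs": {"count": 12}}},
            "empty": {"primaries": {"docs": {"count": 0}}},
        }
    }
    print(non_empty_indices(sample))
```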
