Get last document from index in Elasticsearch - go

I'm playing around with the package github.com/olivere/elastic; all works fine, but I have a question: is it possible to get the last N inserted documents?
The From statement has 0 as the default starting point for the Search action, and I didn't understand whether it is possible to omit it in a search.

Tldr;
I am not aware of a feature in the Elasticsearch API to retrieve the latest inserted documents.
However, there is a way to achieve something similar if you store the ingest time of the documents.
Then you can sort on the ingest time and retrieve the top N documents.
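One way to do what this answer describes, as an added example that is not from the original post: an ingest pipeline whose set processor stores the ingest time, and the search body that sorts on it. The field name `ingested_at` and the function names are assumptions; the bodies are built with the standard library only.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// ingestField is an assumed field name; the page does not name one.
const ingestField = "ingested_at"

type obj = map[string]interface{}

// PipelineBody is the body for PUT _ingest/pipeline/<name>: a set processor
// that stamps every document with the time the ingest node handled it.
func PipelineBody() ([]byte, error) {
	set := obj{"field": ingestField, "value": "{{_ingest.timestamp}}"}
	return json.Marshal(obj{"description": "store ingest time",
		"processors": []obj{{"set": set}}})
}

// LastNBody is the _search body for the n most recently ingested documents:
// sort descending on the ingest time and cap the page with size.
// "from" is left out on purpose; it defaults to 0, the top of the sorted list.
func LastNBody(n int) ([]byte, error) {
	if n <= 0 {
		return nil, errors.New("n must be positive")
	}
	return json.Marshal(obj{"size": n, "query": obj{"match_all": obj{}},
		"sort": []obj{{ingestField: obj{"order": "desc"}}}})
}

func main() {
	p, _ := PipelineBody()
	q, _ := LastNBody(3)
	fmt.Printf("%s\n%s\n", p, q)
}
```

With github.com/olivere/elastic the same search is expressed with the search service's `Sort(field, false)` and `Size(n)` calls.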

Related

How to get all the index patterns which never had any documents?

For Kibana server decommissioning purposes, I want to get a list of index patterns which never had any single document and had documents.
How to achieve this using Kibana only?
I tried this but it doesn't give the list based on the document count.
GET /_cat/indices
Also, getting the count at the individual level to check whether documents are there is time-consuming.
GET index-pattern*/_count
You can try this. v is for verbose and s stands for sort.
GET /_cat/indices?v&s=store.size:desc
From the docs:
These metrics are retrieved directly from Lucene, which Elasticsearch uses internally to power indexing and search. As a result, all document counts include hidden nested documents.

Ways to only process new(index after last run) data in Elasticsearch?

Is there a way to get the date and time that an elastic search document was written?
I am running ES queries via Spark and would prefer NOT to look through all documents that I have already processed. Instead, I would like to read only the documents that were ingested between the last time the program ran and now.
What is the best, most efficient way to do this?
I have looked at:
updating to add a field with an array of booleans for whether it's been looked at by which analytic. The negative is waiting for the update to occur.
the index-per-time-frame method, which would be to break down the current indexes into smaller ones, so by hour. The negative I see is the number of open file descriptors.
??
Elasticsearch version 5.6
I posted the question on the elasticsearch discussion board and it appears using the ingest pipeline is the best option.
I am running es queries via spark and would prefer NOT to look through
all documents that I have already processed. Instead I would like read
the only documents that were ingested between the last time the
program ran and now.
A workaround could be :
While inserting data into Elasticsearch using Logstash, Logstash appends a @timestamp key to the document, which represents the time (in UTC) at which the document is created; alternatively, we can use an ingest pipeline.
After that we can query based on the timestamp.
For more on this, please have a look at:
Mapping changes
There is no way to ask ES to insert a timestamp at index time
Elasticsearch doesn't have such functionality.
You need to manually save a date with each document. In this case you will be able to search by date range.
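The timestamp-range approach from this thread, as an added example that is not part of any of the posts: a search body for the window between the previous run and a fixed "now", plus the checkpoint to persist for the next run. The field name `@timestamp` follows the Logstash convention mentioned above; `WindowBody` is a hypothetical helper name.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"time"
)

// tsField is the timestamp field; "@timestamp" is what Logstash writes.
const tsField = "@timestamp"

// WindowBody builds the _search body for documents stamped in (lastRun, now].
// A zero lastRun means a first run, so there is no lower bound. Both bounds are
// explicit instants rather than the string "now", so consecutive runs neither
// overlap nor leave a gap. The returned checkpoint is what the caller persists
// and passes back as lastRun on the next run.
func WindowBody(lastRun, now time.Time) (body []byte, checkpoint time.Time, err error) {
	if !lastRun.IsZero() && !now.After(lastRun) {
		return nil, lastRun, errors.New("now is not after the last run")
	}
	bounds := map[string]interface{}{"lte": now.UTC().Format(time.RFC3339Nano)}
	if !lastRun.IsZero() {
		bounds["gt"] = lastRun.UTC().Format(time.RFC3339Nano)
	}
	body, err = json.Marshal(map[string]interface{}{
		"query": map[string]interface{}{"range": map[string]interface{}{tsField: bounds}},
		"sort":  []interface{}{map[string]interface{}{tsField: "asc"}},
	})
	return body, now, err
}

func main() {
	// Constructed instants for the demonstration.
	last := time.Date(2024, 1, 1, 12, 0, 0, 0, time.UTC)
	body, next, err := WindowBody(last, last.Add(time.Hour))
	fmt.Printf("%s\nnext checkpoint: %s err=%v\n", body, next.Format(time.RFC3339), err)
}
```

Documents become searchable only after a refresh, so a run that ends its window at the current instant can miss documents stamped just before it; ending the window slightly in the past avoids that.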

Bulk read of all documents in an elasticsearch alias

I have the following elasticsearch setup:
4 to 6 small-ish indices (<5 million docs, <5Gb each)
they are unioned through an alias
they all contain the same doc type
they change very infrequently (i.e. >99% of the indexing happens when the index is created)
One of the use cases for my app requires reading all documents for the alias, ordered by a field, doing some magic and serving the result.
I understand using deep pagination will most likely bring down my cluster, or at the very least have dismal performance so I'm wondering if the scroll API could be the solution. I know the documentation says it is not intended for use in real-time user queries, but what are the actual reasons for that?
Generally, how are people dealing with having to read through all the documents in an index? Should I look for another way to chunk the data?
When you use the scroll API, Elasticsearch creates a sort of a cursor for the current state of the index, so the reason for it not being recommended for real time search is because you will not see any new documents that were inserted after you created the scroll token.
Since your use case indicates that you rarely update or insert new documents into your indices, that may not be an issue for you.
When generating the scroll token you can specify a query with a sort, so if your documents have some sort of timestamp, you could create one scroll context for all documents with timestamp: { lte: "now" } and another scroll (or even a simple query) for the rest of the documents that were not included in the first search context by specifying a certain date range filter.

Can I narrow results from Elastic Search _stats get?

I am using elastic search for the project I'm working on and I was wondering if there was a way to narrow the results I get from an indices stats search.
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-stats.html
I currently use the docs to narrow the data I get back about the indices but now I want to only get back ones with a doc count greater than 0. Does anyone know if this is possible or how to?
Thanks!
For elastic search 1.5.2
If you're concerned about the size of the response (i.e. if you have many, many indices with many shards), the best you can do is to use response filtering (available only since ES 1.7) and only retrieve the docs field that you can further filter on the client-side:
curl 'localhost:9200/_stats/docs?pretty&filter_path=**.docs.count'

How to create an index from search results, all on the server?

I will be getting documents from a filtered query (quite a lot of documents). I will then immediately create an index from them (in Python, using requests to directly query the REST API), without any modification.
Is it possible to make this operation directly on the server, without the round-trip of data to the script and back?
Another question was similar (in the intent) and the only answer is to go via Logstash (equivalent to using my code, though possibly more efficient)
Refer to http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/reindex.html
In short, what you need to do is:
0.) ensure you have _source set to true
1.) use the scan and scroll API, passing your filtered query with search type scan
2.) fetch documents using the scroll id
3.) bulk index the result using the source field, which returns the JSON used to index the data
refer:
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/scan-scroll.html
guide/en/elasticsearch/guide/current/bulk.html
guide/en/elasticsearch/guide/current/reindex.html
es 2.3 has an experimental feature that allows reindex from a query
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html
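The scroll-then-bulk steps listed above, as an added example that is not part of the original answers: one scroll page is converted into a `_bulk` request body for the destination index, reusing each hit's `_source` unchanged. The sample page is constructed, and `BulkFromScroll` and the index name `copy` are hypothetical.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// hit is one scroll hit; it doubles as the bulk action metadata, where Source
// is left empty and therefore omitted.
type hit struct {
	Index  string          `json:"_index,omitempty"`
	Type   string          `json:"_type,omitempty"`
	ID     string          `json:"_id"`
	Source json.RawMessage `json:"_source,omitempty"`
}

// BulkFromScroll turns one scroll page into a _bulk body for index dest and
// returns the scroll id to send when asking for the next page.
func BulkFromScroll(page []byte, dest string) (string, string, error) {
	var p struct {
		ScrollID string `json:"_scroll_id"`
		Hits     struct{ Hits []hit } `json:"hits"`
	}
	if err := json.Unmarshal(page, &p); err != nil {
		return "", "", err
	}
	var buf bytes.Buffer
	for _, h := range p.Hits.Hits {
		if len(h.Source) == 0 { // step 0 not met: nothing to re-index
			return "", "", fmt.Errorf("hit %q has no _source", h.ID)
		}
		action, _ := json.Marshal(map[string]hit{"index": {Index: dest, Type: h.Type, ID: h.ID}})
		buf.Write(append(action, '\n'))
		// _bulk is newline-delimited, so each source must sit on a single line.
		if err := json.Compact(&buf, h.Source); err != nil {
			return "", "", err
		}
		buf.WriteByte('\n')
	}
	return buf.String(), p.ScrollID, nil
}

// samplePage is a constructed scroll response, not output from a real cluster.
const samplePage = `{"_scroll_id":"EXAMPLE_SCROLL_ID","hits":{"hits":[{"_id":"1","_type":"doc",
 "_source":{"user":"a", "n":1}},{"_id":"2","_type":"doc","_source":{"user":"b","n":2}}]}}`

func main() {
	body, next, err := BulkFromScroll([]byte(samplePage), "copy")
	fmt.Printf("%snext=%s err=%v\n", body, next, err)
}
```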
