I have data with the following format:
{
timestamp: Date,
x: number
}
I want to display these values simply as a line, without any aggregation over x, but Kibana always requires me to select some kind of aggregation, such as average.
It is possible to create the line chart you describe, but for Kibana to create a visualization, I'm afraid an aggregation is necessary.
Kibana bases its visualizations on buckets (Date, x-axis) and metrics (x, y-axis). Buckets are aggregations of documents over a specified search (almost 30 aggregation methods). Metrics are values computed from the documents contained in each bucket (almost 20 aggregation methods).
See https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations.html for the full list.
However, you could try to create buckets with 'date_histogram' using a time interval small enough that each bucket contains one document. Then for the metric aggregation you could select the min or max aggregation (note: this assumes that your timestamp is unique for each document).
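As a rough sketch of what that suggestion corresponds to at the Elasticsearch level, the request body below buckets by a very small interval and takes the max of x in each bucket. The function name, the aggregation names, and the 1-second interval are illustrative assumptions; `fixed_interval` is the parameter name in newer Elasticsearch releases, while older ones use `interval`.

```python
def per_document_line_body(interval="1s"):
    """Build a search body with one bucket per small time slice.

    Assumes the fields are named 'timestamp' and 'x' as in the question,
    and that no two documents share a timestamp within one interval.
    """
    return {
        "size": 0,  # only the aggregation result is needed, not the hits
        "aggs": {
            "per_timestamp": {
                "date_histogram": {
                    "field": "timestamp",
                    "fixed_interval": interval,  # 'interval' on older versions
                    "min_doc_count": 1,  # skip empty buckets
                },
                # With one document per bucket, max (or min) is just x itself.
                "aggs": {"x_value": {"max": {"field": "x"}}},
            }
        },
    }


body = per_document_line_body()
```

In Kibana itself the same idea is expressed through the UI: a Date Histogram on the x-axis with a small interval, and Max (or Min) of x as the y-axis metric.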
Related
I know how Elasticsearch indexes words and strings, but I wonder if there's a different behaviour for timestamps?
We have an internal Elasticsearch instance that indexes events (millions of events per day).
Every X seconds, I want to pull all the events that we received in the last X seconds.
Does Elasticsearch index the timestamp in an efficient way, such that we don't need to traverse all the documents to return the relevant results? How does it index this data?
Anything numeric, like date fields, integer fields, geo fields, etc., is not stored in the inverted index but in BKD trees (since ES 5), which are especially suited to range queries and to finding the collection of unordered docIDs that meet the time range conditions.
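The polling pattern from the question maps onto a plain range query, which is the kind of query the BKD tree serves. A minimal sketch, assuming the event time lives in a field called `timestamp` (the question does not name it) and using a hypothetical helper name:

```python
def recent_events_query(window_seconds, field="timestamp"):
    """Build a range query for events from the last `window_seconds` seconds.

    'now-30s' style date math is resolved by Elasticsearch at query time.
    """
    return {
        "query": {
            "range": {
                field: {
                    "gte": f"now-{window_seconds}s",
                    "lt": "now",
                }
            }
        }
    }


body = recent_events_query(30)
```

Polling with a relative window can miss or duplicate events if the poller drifts, so in practice it may be safer to pass explicit start and end timestamps taken from the previous poll.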
If I have a set of documents as a response from Elasticsearch,
how can I aggregate the results based on the score? The results should have two buckets, where the first bucket has documents whose score is greater than 1 and the other has documents whose score is less than 1.
I am new to Elasticsearch and have seen that I can use a script for this, but could not get that working.
I have a terms aggregation and I need to sort the result buckets by another field (date). Or I need to add 2 sub-aggregations with max (and top hit) and min (and top hit).
I didn't find any API that allows me to do this.
I think I can add a max sub-aggregation with top hits to the main terms aggregation, and create another terms aggregation with a min and top hits sub-aggregation, but that would be a very heavy job.
Suppose I have an index for cars on a dealer's car lot. Each document resembles the following:
{
color: 'red',
model_year: '2015',
date_added: '2015-07-20'
}
Suppose I have a million cars.
Suppose I want to present a view of the most recently added 1000 cars, along with facets over those 1000 cars.
I could just use from and size to paginate the results up to a fixed limit of 1000, but in doing so the totals and facets on model_year and color (i.e. aggregations) I get back from Elasticsearch aren't right--they're over the entire matched set.
How do I limit my search to the most recently added 1000 documents for pagination and aggregation?
As you probably saw in the documentation, aggregations are performed on the scope of the query itself. If no query is given, the aggregations are performed on a match_all list of results. Even if you use size at the query level, it will still not give you what you need, because size only controls how many of the matched documents are returned. Aggregations operate on everything the query matches.
This feature request is not new; it has been asked for before.
In 1.7 there is no straightforward solution. Maybe you can use the limit filter or the terminate_after in-body request parameter, but neither takes the sort order into account. You get the first terminate_after docs that matched the query, and this number is per shard. The cut-off is not applied after sorting.
In ES 2.0 there is also the sampler aggregation, which works more or less the same way as terminate_after, but it takes the score of the documents into consideration when choosing them from each shard. If you just sort on date_added and the query is just a match_all, all the documents will have the same score and it will return an irrelevant set of documents.
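To make the per-shard behaviour concrete, here is a small sketch of a sampler-based request body. The helper name and aggregation names are made up for illustration, the per-shard size is derived by dividing the wanted total by the number of primary shards, and the caveat above still applies: with a match_all query the sample is not the most recent documents.

```python
import math


def sampled_facets_body(total_wanted, primary_shards):
    """Facets over a per-shard sample of roughly `total_wanted` docs overall."""
    # The sampler limit is applied on every shard, so split the total.
    shard_size = math.ceil(total_wanted / primary_shards)
    return {
        "size": 0,
        "query": {"match_all": {}},
        "aggs": {
            "sample": {
                "sampler": {"shard_size": shard_size},
                "aggs": {
                    "colors": {"terms": {"field": "color"}},
                    "model_years": {"terms": {"field": "model_year"}},
                },
            }
        },
    }


body = sampled_facets_body(1000, 5)  # 200 documents sampled per shard
```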
In conclusion:
there is no good solution for this; there are only workarounds based on the number of docs per shard. So, if you want 1000 cars, you need to divide this number by the number of primary shards, use the result in the sampler aggregation or with terminate_after, and get a set of documents.
my suggestion is to use a query to limit the number of documents (cars) by a different criterion instead. For example, show (and aggregate) the cars added in the last 30 days or something similar. That is, the criterion should be included in the query itself, so that the resulting set of documents is the one you want aggregated. Applying aggregations to a certain number of documents, after they have been sorted, is not easy.
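The second suggestion could look roughly like the request body below: the 30-day restriction lives in the query, so the hits, the total, and the facets all cover the same set of cars. The helper name, the aggregation names, and the page size are illustrative assumptions; the field names come from the example document in the question.

```python
def recent_cars_body(days=30, page_from=0, page_size=20):
    """Paginated, newest-first cars from the last `days` days, with facets."""
    return {
        "from": page_from,
        "size": page_size,
        # The restriction is part of the query, so aggregations see it too.
        "query": {"range": {"date_added": {"gte": f"now-{days}d/d"}}},
        "sort": [{"date_added": {"order": "desc"}}],
        "aggs": {
            "colors": {"terms": {"field": "color"}},
            "model_years": {"terms": {"field": "model_year"}},
        },
    }


first_page = recent_cars_body()
second_page = recent_cars_body(page_from=20)
```

The facets stay the same across pages because they depend only on the query, not on from and size.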
I have a series of JSON documents like {"type":"A","value":2}, {"type":"B","value":3}, and {"type":"C","value":7}, and I feed them into Elasticsearch.
Let's say I want to run one query to average the value over all documents with "type": "A".
What is the difference between how Elasticsearch calculates this and how, let's say, Mongo would?
Is Elasticsearch:
1. Automatically creating a "rolling count" for all those types and incrementing something like "typeA_sum", "typeA_count", "typeA_avg" as new data is fed in? If so that would be sweet, because then it's not actually having to calculate anything.
2. Just creating an index over type and actually calculating the sum each time the query is run?
3. Doing #2 in the background (i.e. precalculating) and just updating some cache value so that when the query runs it has the result pretty quickly?
It is closest to your #2; however, the results are cached, so if they are useful in a subsequent query, that query will be very quick. There is no way Elasticsearch could know beforehand what query you are going to run, so #1 is impossible, and #3 would be wasteful.
However, for your example use case you probably do not need two queries, one would be enough. See for instance the stats aggregation that will return count, min, max, average and sum. Combine that with a terms aggregation (and perhaps a missing aggregation) to group the documents on your type field, and you'll get count and average (and min, max, sum) for all types separately with a single query.
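A minimal sketch of that single query, written as a Python dict. The aggregation names `by_type` and `value_stats` are arbitrary, and the sketch assumes `type` is mapped so that a terms aggregation can run on it (for example as a keyword or not-analyzed string).

```python
def stats_per_type_body():
    """One request: group by 'type', then compute stats on 'value' per group."""
    return {
        "size": 0,  # skip the hits, only the aggregations are of interest
        "aggs": {
            "by_type": {
                "terms": {"field": "type"},
                "aggs": {
                    # stats returns count, min, max, avg and sum together
                    "value_stats": {"stats": {"field": "value"}}
                },
            }
        },
    }


body = stats_per_type_body()
```

Each bucket in the response then carries the count, min, max, avg and sum for one type, so the average for type A is read from the bucket whose key is "A".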