Why is the unique count in a Kibana visualize chart incorrect? - elasticsearch

My Kibana version is 4.5.
My Elasticsearch version is 2.3.1.
In pic1, the uv is 7665.
But in pic2, the uv is 7845.
Why are they different?
The Kibana unique count does not seem to be correct.

If these charts are based on live data, then I doubt the two graphs can show the same count, since they use two different time ranges.
In the first one your time range is yesterday, whereas in the second one you're using an auto-refresh every minute, which shows as paused. I'm assuming that you're dealing with live data, so some records might have slipped in by the time you paused. If not, I can't see any reason for these two to show different values.
Just being curious, how do you know that the correct count for uv should be 7665, since I can't see the exact value of uv from the snapshot of the graph? Did you double-check against your ES index with a query?
EDIT:
Interestingly, unique counts are based on the cardinality aggregation, which is designed to work efficiently across very large amounts of data and delivers an approximate result, which may be why your results vary. You could try increasing the precision_threshold.
To get a more accurate value, add something like {"precision_threshold": 1000} to the "JSON Input" box for the aggregation.
Hope this helps!
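To double-check the count outside Kibana, the same cardinality aggregation can be sent to Elasticsearch directly. The snippet below is only a sketch that builds the request body. The field name "user_id" and the aggregation name "unique_values" are hypothetical placeholders, and nothing is sent to a cluster. Counts below precision_threshold are expected to be close to accurate, so a value above the expected number of unique values (here around 8000) is more likely to help. Elasticsearch caps the setting at 40000.

```python
import json


def unique_count_agg(field, precision_threshold=1000):
    """Build a search body with a cardinality aggregation on `field`."""
    return {
        "size": 0,  # only the aggregation result is needed, not the hits
        "aggs": {
            "unique_values": {  # hypothetical aggregation name
                "cardinality": {
                    "field": field,
                    "precision_threshold": precision_threshold,
                }
            }
        },
    }


# "user_id" is an assumed field name; replace it with the field behind "uv".
body = unique_count_agg("user_id", precision_threshold=10000)
print(json.dumps(body, indent=2))
```

The inner object, {"precision_threshold": 10000}, is the part that would go into Kibana's "JSON Input" box.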

Related

How to sort records by distance using Algolia?

Using Algolia Search, I'm trying to sort my records by distance without filtering them by a radius, but I fail to do so. I can successfully filter by a radius by setting aroundLatLng and aroundRadius in my query, but the records are not sorted in the result. According to this part of Algolia's documentation, a ranking is created based on the distance from the central point, so records should be sorted by distance if there isn't any other sorting in the query. According to this, to sort by distance without filtering, one needs to set aroundRadius to "all". I've tried setting aroundRadius to "all" but it didn't change anything.
Also, in this part of the docs, it's stated that the geo criterion must be present in the ranking formula to sort by distance, and I confirmed that it is present. So what could be the problem here? Any help would be appreciated. Note that there is location info in every record under the key "_geoloc".
If filtering is working as expected for aroundLatLng, then it's safe to assume you have the _geoloc data populated appropriately for your records.
Since you want to sort by geo proximity, this sounds like a ranking issue.
I'm unfamiliar with setting aroundRadius to "all" -- the engine will need a lat/long to compare distances for the ranking, so you'll need a specific set of coordinates for the search (that's why geo ranking doesn't work for area searches such as insideBoundingBox).
If you're supplying specific coordinates in aroundLatLng and still not getting the geo ranking you expect (and you've confirmed geo is a ranking criterion) -- is it possible the default precision of 10 meters doesn't work well with your data set? Maybe tweaking the precision would help here?
If sort/ranking config and geo precision aren't the problem, I would recommend adding getRankingInfo=true to your query. This will return all of the ranking criteria evaluated against each record -- you can confirm that the index took _geoloc into account.
If none of this helps -- maybe drop me your index configuration and the query you're executing so I can dig in further.
https://www.algolia.com/doc/guides/managing-results/refine-results/geolocation/how-to/geo-ranking-info/
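The query-side suggestions in this thread (a specific aroundLatLng, aroundRadius set to "all" as described in the question, an optional aroundPrecision, and getRankingInfo for debugging) could be combined roughly as follows. This is a sketch that only builds the search parameters. The helper name and the coordinates are made up for illustration, and no Algolia client call is made.

```python
def geo_sort_params(lat, lng, precision_m=None):
    """Build search parameters for distance ranking without radius filtering."""
    params = {
        "aroundLatLng": f"{lat}, {lng}",  # central point used for the geo ranking
        "aroundRadius": "all",            # rank by distance, do not filter by radius
        "getRankingInfo": True,           # return per-record ranking details
    }
    if precision_m is not None:
        # Distances are grouped into ranges of this many meters for ranking.
        params["aroundPrecision"] = precision_m
    return params


# Example coordinates are arbitrary.
params = geo_sort_params(40.71, -74.01, precision_m=100)
print(params)
```

These parameters would be passed along with the query text to whichever search client is in use. As the accepted resolution in this thread shows, they only have an effect if the index's ranking formula still contains the geo criterion.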
The problem was that the default ranking criteria were missing. In the Ranking and Sorting section of my index, I reset the settings to default and the default ranking criteria appeared. After that, I was able to sort records by distance. Below is a screenshot that shows the Ranking and Sorting section after resetting the settings to default.

Elasticsearch calculation with data from different indexes

Good day, everyone. I have a slightly unusual Elasticsearch use case.
There are two different indexes, and each index contains one data type.
The first type contains the following fields that matter for this case:
keyword (text, keyword),
URL (text, keyword),
position (number).
The second type contains the following fields:
keyword (text, keyword),
numberValue (number).
I need to do the following:
1. Group the data from the first index by URL.
2. For each object in a group, calculate a new metric (metric A) using this simple formula: position*numberValue*Param
3. For each group, calculate the sum of the metric A values calculated in step 2.
4. Order the resulting groups in descending order by the sums calculated in step 3.
5. Take some interval of the resulting groups.
Param is a parameter that I need to set for the calculation; it is not stored in Elasticsearch.
The algorithm is not difficult, but the data is in different indices and I don't know how to do it fast, and I would prefer to do it at the Elasticsearch level.
I don't know how to build an efficient search or data processing pipeline that would help me implement this case.
I use ES version 6.2.3, if that matters.
Please give me some advice on how I can implement this algorithm.
Reading step 2, you seem to assume keyword is some sort of primary key. Elasticsearch is not a relational database and can only reason over one document at a time, so unless numberValue and position are (indexed) fields of the same document, you can't combine them.
The rest of the items seem achievable with aggregations.
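As a rough sketch of what such an aggregation could look like, assume numberValue has first been copied into the documents of the first index at indexing time, which is not the case in the question as asked. The body below groups by URL, sums a scripted metric A, orders the groups by that sum, and pages through them. The sub-field name "URL.keyword" and the aggregation names are assumptions, and the body is only built here, not sent to a cluster.

```python
import json


def metric_a_by_url(param, offset=0, page_size=10):
    """Build an aggregation body for steps 1-5, assuming denormalized documents."""
    script = {
        "lang": "painless",
        # Metric A for one document: position * numberValue * Param
        "source": "doc['position'].value * doc['numberValue'].value * params.param",
        "params": {"param": param},
    }
    return {
        "size": 0,
        "aggs": {
            "by_url": {
                "terms": {
                    "field": "URL.keyword",               # assumed keyword sub-field
                    "size": offset + page_size,           # enough buckets to cover the page
                    "order": {"metric_a_sum": "desc"},    # step 4: order by the sum
                },
                "aggs": {
                    "metric_a_sum": {"sum": {"script": script}},  # steps 2 and 3
                    # Step 5: keep only the requested slice of the ordered buckets.
                    "page": {"bucket_sort": {"from": offset, "size": page_size}},
                },
            }
        },
    }


body = metric_a_by_url(param=2.5, offset=20, page_size=10)
print(json.dumps(body, indent=2))
```

Ordering a terms aggregation by a sub-aggregation can be inaccurate when the index has several shards, so the result is worth checking against a known sample.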

Kibana graphing just the difference of a metric instead of total

I was just wondering if anybody knew of a way to show a graph of the difference between successive values of a metric like system.network.in.bytes.
If you look at this graph you can just see that the value continuously gets bigger (at around the same speed) - but I just want to graph the difference between each value not the total.
Example
Anyone have any ideas?
Try a timeseries visualization or timelion.
Assuming your field name is 'bytesIn' (for simplicity) and taking 1 minute intervals (as IMO 30s isn't possible in timelion), your timelion expression should look something like:
.es(*,metric='avg:bytesIn').subtract(.es(*,metric='avg:bytesIn',offset='-1m'))
Explanation
.es(*,metric='avg:bytesIn') gives the average of bytesIn over each time interval (here I'm assuming 1m).
Adding offset='-1m' shifts the series retrieval back by one minute, so values from one minute ago are treated as if they were happening now.
.subtract just subtracts the values of one series from another.
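To see what subtracting the offset series does to an ever-growing counter, here is a small standalone sketch in Python. The sample values are invented, and the function only mimics the arithmetic on a list of per-minute values; it is not Timelion itself.

```python
def subtract_offset(series, offset=1):
    """Subtract from each point the value `offset` buckets earlier."""
    # zip pairs each value with the one `offset` positions later in the list.
    return [cur - prev for prev, cur in zip(series, series[offset:])]


# Hypothetical cumulative byte counter sampled once per minute.
bytes_in = [1000, 1600, 2300, 2900, 3800]
deltas = subtract_offset(bytes_in)
print(deltas)
```

The first bucket has no earlier value to subtract, so the result is one point shorter than the input.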

How to create value over time line chart in Kibana 4?

I'm facing the following problem. In Kibana 4 I've created a line chart based on my input from Elasticsearch, but I can only display the average, min or max instead of the actual value of the field over time, e.g. sent bytes.
Most answers to that question on Stack Overflow are about Kibana 3 (How to create value over time chart with Kibana 3?) and seem to involve a Histogram on the X axis, yet I can't find one that I can apply to Kibana 4. I was unable to find the histogram panel, and once I click on the Discover tab it shows a constant "Searching" loading state.
If I have the following fields in my _source:
{"timestamp":"2015-06-02T10:16:44.0855","time":587,"threadName":"Thread Group 1-957","byte":1372,"status":"false","latence":306,"registerCall":"404"}
and I would like to have the number of bytes on the Y-axis and on the X-axis my timestamp.
Any help in the right direction will be appreciated :)
To create a value over time line chart in Kibana, follow these steps:
Go to the Visualize tab and select Line chart.
For the X-axis, select X-Axis as the bucket type, Date Histogram as the aggregation, and your timestamp field as the date field.
For the Y-axis, select Sum as the aggregation and your bytes field (byte in the sample document above) as the field.
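For reference, the chart configured in the steps above corresponds roughly to a date histogram with a sum sub-aggregation. The sketch below builds such a request body using the field names from the sample document. The aggregation names are made up, and the "interval" parameter is assumed to match the older Elasticsearch versions used with Kibana 4.

```python
import json


def bytes_over_time(interval="1m"):
    """Build a date_histogram + sum body for the sample document's fields."""
    return {
        "size": 0,  # only aggregation results are needed
        "aggs": {
            "over_time": {  # X axis: one bucket per interval
                "date_histogram": {"field": "timestamp", "interval": interval},
                "aggs": {
                    # Y axis: sum of the "byte" field within each bucket
                    "total_bytes": {"sum": {"field": "byte"}}
                },
            }
        },
    }


body = bytes_over_time("1m")
print(json.dumps(body, indent=2))
```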
For the X axis, what Alcanzar said is good, but as you noticed, the Y axis is problematic.
Sum (suggested by "Limit") works, but since it's an aggregation, it shows the total for each bucket, which may be meaningless depending on what you are trying to show. Your question isn't clear on what you want, so I'm just guessing here. One hour of requests, each of which ran for one minute and sent 1 megabyte, is indeed 60 megabyte-minutes, which is right if you are trying to show total capacity used over that hour (maybe you are paying a bill based on usage per time). On the other hand, if you are trying to show peak usage in each interval, it would be wrong.
You said you already looked at Max and Min and they don't meet your needs. I don't suppose Standard Deviation would be any better?
I have the same concern. The best I've been able to do so far is display Min and Max simultaneously in the Y axis. When they diverge, I know I'm zoomed out too far, so I zoom in until they align.
This is how I know I'm seeing individual events.
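The Min/Max trick can be illustrated with a small standalone sketch: events are grouped into time buckets, and the check passes once every bucket has identical min and max. The event data and function names are invented for illustration. Note that min and max also coincide when several events in a bucket happen to share the same value, so this is a heuristic rather than a guarantee.

```python
from collections import defaultdict


def min_max_per_bucket(events, bucket_seconds):
    """Group (timestamp_seconds, value) pairs into buckets; return (min, max) per bucket."""
    buckets = defaultdict(list)
    for ts, value in events:
        buckets[ts - ts % bucket_seconds].append(value)
    return {start: (min(vals), max(vals)) for start, vals in sorted(buckets.items())}


def min_and_max_align(events, bucket_seconds):
    """True when min equals max in every bucket, i.e. the two lines overlap."""
    return all(lo == hi for lo, hi in min_max_per_bucket(events, bucket_seconds).values())


# Hypothetical events: (seconds since start, bytes sent)
events = [(0, 1372), (20, 900), (65, 1500)]
print(min_and_max_align(events, 60))  # coarse buckets: the first one holds two events
print(min_and_max_align(events, 10))  # fine buckets: one event per bucket
```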
In any case, I share your frustration. I too would like to be able to show time series as easily as I can in, say, Excel.

trend of ratio in kibana 4.0

I have documents under two daily indexes. Both have a count field which is >= 1.
I want to create a graph which shows the trend of the ratio of these two fields aggregated over time.
Data will be sampled based on the time duration selected in the dashboard, e.g. for one day each sample would be 10 minutes: the two fields are summed separately over that interval, their ratio is calculated, and the result is shown as one data point. So for 24 hours there would be 24*6 = 144 points in the graph.
How can I achieve this in Kibana 4?
We tried something similar, but it turns out this is not possible in Kibana.
As of now you cannot plot a calculated field based on two different fields in Kibana.
To work around this, we implemented a plugin that modifies the data before it is sent to Elasticsearch, and we carry out the calculations in that plugin. The plugin also periodically sends data to Elasticsearch, so Kibana gets the latest values.
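The kind of pre-computation such a plugin might do can be sketched as follows. This is not the plugin mentioned above, only an illustration with invented data and function names: counts from the two sources are summed per 10-minute bucket, and one ratio is produced per bucket, ready to be indexed as a single field.

```python
from collections import defaultdict


def summed_counts(docs, bucket_seconds):
    """Sum the count of (timestamp_seconds, count) pairs per time bucket."""
    totals = defaultdict(int)
    for ts, count in docs:
        totals[ts - ts % bucket_seconds] += count
    return totals


def ratio_trend(docs_a, docs_b, bucket_seconds=600):
    """Return (bucket_start, sum_a / sum_b) for buckets where sum_b is non-zero."""
    sums_a = summed_counts(docs_a, bucket_seconds)
    sums_b = summed_counts(docs_b, bucket_seconds)
    return [
        (start, sums_a[start] / sums_b[start])
        for start in sorted(sums_a)
        if sums_b.get(start)  # skip buckets with no data in the second source
    ]


# Hypothetical documents from the two indexes: (seconds since start, count)
docs_a = [(0, 4), (300, 2), (700, 3)]
docs_b = [(100, 2), (650, 1), (900, 2)]
trend = ratio_trend(docs_a, docs_b)
print(trend)
```

Each resulting pair could then be indexed as its own document with a timestamp and a ratio field, which Kibana can plot directly with a date histogram.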
