How to retrieve unique count of a field using Kibana + Elastic Search - elasticsearch

Is it possible to query for a distinct/unique count of a field using Kibana? I am using Elasticsearch as my backend to Kibana.
If so, what is the syntax of the query? Here's a link to the Kibana interface where I would like to make my query: http://demo.kibana.org/#/dashboard
I am parsing nginx access logs with Logstash and storing the data in Elasticsearch. Then I use Kibana to run queries and visualize my data in charts. Specifically, I want to know the count of unique IP addresses for a specific time frame using Kibana.

For Kibana 4 go to this answer
This is easy to do with a terms panel:
To get the count of distinct IPs in your logs, set the field to clientip, set the length to a large enough number (otherwise different IPs will be lumped together in one group), and choose the table style. After adding the panel, you get a table listing each IP and the count for that IP:

Kibana 4 now lets you use aggregations. Apart from building a panel like the one explained in this answer for Kibana 3, we can now see the number of unique IPs in different periods, which is (IMO) what the OP wanted in the first place.
To build a dashboard like this, go to Visualize -> select your index -> select a Vertical Bar chart, and then in the visualize panel:
On the Y axis we want the unique count of IPs (select the field where you stored the IP), and on the X axis we want a date histogram on our time field.
After pressing the Apply button, we should have a graph that shows the unique count of IPs over time. We can change the time interval on the X axis to see the unique IPs hourly, daily, and so on.
Just take into account that the unique counts are approximate. For more information, see also this answer.
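For anyone who wants the same numbers outside the UI, here is a minimal sketch of the kind of request body this visualization corresponds to: a date histogram with a cardinality sub-aggregation. The field names (`clientip`, `@timestamp`), the aggregation names, and the time range are assumptions for illustration, and `calendar_interval` assumes Elasticsearch 7 or later (older versions use `interval`).

```python
import json


def unique_ips_over_time(ip_field="clientip", time_field="@timestamp",
                         start="now-24h", end="now", interval="1h"):
    """Build a search body: one bucket per interval, each holding an
    approximate count of distinct values of ip_field."""
    return {
        "size": 0,  # only aggregation results are needed, not the hits
        "query": {"range": {time_field: {"gte": start, "lte": end}}},
        "aggs": {
            "per_interval": {  # hypothetical aggregation name
                "date_histogram": {
                    "field": time_field,
                    "calendar_interval": interval,  # assumption: ES 7+
                },
                "aggs": {
                    "unique_ips": {"cardinality": {"field": ip_field}}
                },
            }
        },
    }


if __name__ == "__main__":
    # Paste the printed JSON after "GET <your-index>/_search" in the console.
    print(json.dumps(unique_ips_over_time(), indent=2))
```

Dropping the `per_interval` level and keeping only the `cardinality` aggregation gives a single unique count for the whole time frame.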

Be aware that Unique Count uses the 'cardinality' metric, which does not always guarantee an exact unique count. :-)
the cardinality metric is an approximate algorithm. It is based on the
HyperLogLog++ (HLL) algorithm. HLL works by hashing your input and
using the bits from the hash to make probabilistic estimations on the
cardinality.
Depending on the amount of data, I have seen Unique Count in Elasticsearch come up 700+ entries short on a 300k dataset whose values really are all unique.
Read more here: https://www.elastic.co/guide/en/elasticsearch/guide/current/cardinality.html
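If the error matters, the cardinality aggregation accepts a `precision_threshold` that trades memory for accuracy. Below is a small sketch of building that aggregation. The helper name is hypothetical, and the cap of 40000 is my reading of the documented maximum (larger values are described as behaving like 40000), so check the docs for your version.

```python
MAX_PRECISION_THRESHOLD = 40000  # assumption: documented upper bound


def unique_count_agg(field, precision_threshold=None):
    """Build a cardinality aggregation, optionally with a precision threshold.

    Counts below the threshold are expected to be close to exact; above it
    the estimate gets fuzzier. A higher threshold costs more memory.
    """
    agg = {"cardinality": {"field": field}}
    if precision_threshold is not None:
        if precision_threshold < 0:
            raise ValueError("precision_threshold must be non-negative")
        # Cap at the documented maximum instead of sending a larger value.
        agg["cardinality"]["precision_threshold"] = min(
            precision_threshold, MAX_PRECISION_THRESHOLD
        )
    return agg


if __name__ == "__main__":
    print(unique_count_agg("clientip", precision_threshold=40000))
```

Even at the maximum, a dataset with hundreds of thousands of distinct values still gets an estimate, not an exact count.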

Create a "topN" query on "clientip", then add a histogram with count on "clientip" and set the "topN" query as its source. You will then see the count of different IPs over time.

Unique counts of field values are achieved by using facets. (Facets only exist in old Elasticsearch versions; they were removed in 2.0 in favor of aggregations.) See the ES documentation for the full story, but the gist is that you create a query and then ask ES to prepare facets on the results, counting the values found in fields. It's up to you to customize the fields used and even describe how you want the values returned. The most basic facet type just groups by terms, such as the IP address above. Each facet needs a name; ip_addresses below is an arbitrary one. You can get pretty complex with these, even requiring a query within your facet!
{
  "query": {
    "match_all": {}
  },
  "facets": {
    "ip_addresses": {
      "terms": {
        "field": "ip_address"
      }
    }
  }
}

Using aggs you can easily do that.
Here is the query for now.
GET index/_search
{
  "size": 0,
  "aggs": {
    "source": {
      "terms": {
        "field": "field",
        "size": 100000
      }
    }
  }
}
This returns the different values of the field with their doc counts.
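To turn that response into an actual unique count, you count the buckets. Here is a rough sketch that works on the parsed JSON response. The sample response is made up for illustration, and the default aggregation name `source` is the one used in the query above. The count is only complete if the number of distinct values does not exceed the `size` you asked for.

```python
def distinct_values(response, agg_name="source"):
    """Map each distinct field value to its document count."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


def is_complete(response, agg_name="source"):
    """True if no documents fell outside the returned buckets."""
    return response["aggregations"][agg_name].get("sum_other_doc_count", 0) == 0


if __name__ == "__main__":
    # Hypothetical response, trimmed to the parts used here.
    sample = {
        "aggregations": {
            "source": {
                "sum_other_doc_count": 0,
                "buckets": [
                    {"key": "10.0.0.1", "doc_count": 42},
                    {"key": "10.0.0.2", "doc_count": 7},
                ],
            }
        }
    }
    values = distinct_values(sample)
    print(len(values), "distinct values, complete:", is_complete(sample))
```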

For Kibana 7.x, Unique Count is available in most visualizations: in Lens, in aggregation-based visualizations, and even in TSVB (which supports normal fields as well as Runtime Fields; Scripted Fields are not supported).

Related

Elasticsearch Join

I have two indices. One index, "indications", holds a set of values.
The other is "projects". In this index, I add an indication value like "indication = oncology".
Now I want to show all indications, which I can do using a terms aggregation. But my issue is that I also want to show the count of projects in which each indication is used.
So for that, I need to write a join query.
Can anyone help me resolve this issue?
Expected result example:
[{name:"oncology",projectCount:"12"}]
You cannot have joins in Elasticsearch. What you can do is store the indication name in the project index and then apply a terms aggregation on the project index. That will get you the different indications from all the project documents and the count of each indication.
Something of the sort:
GET /project/_search
{
  "size": 0,
  "query": {
    "match_all": {}
  },
  "aggs": {
    "indication": {
      "terms": {
        "field": "indication_name"
      }
    }
  }
}
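The buckets of that terms aggregation map almost directly onto the expected result from the question. Here is a small sketch of the reshaping step, assuming the response has already been parsed into a dict. The sample response is invented, and the aggregation name is passed in so it can match whatever name the query used. Note that `projectCount` comes out as a number here, not as a string.

```python
def indications_with_counts(response, agg_name):
    """Reshape terms-aggregation buckets into [{name, projectCount}, ...]."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return [
        {"name": bucket["key"], "projectCount": bucket["doc_count"]}
        for bucket in buckets
    ]


if __name__ == "__main__":
    # Hypothetical response, trimmed to the parts used here.
    sample = {
        "aggregations": {
            "indication": {
                "buckets": [
                    {"key": "oncology", "doc_count": 12},
                    {"key": "cardiology", "doc_count": 3},
                ]
            }
        }
    }
    print(indications_with_counts(sample, "indication"))
```

Indications that no project uses will not show up in this result, since the aggregation only sees values present in the project documents.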
Elasticsearch does not support joins. That's the whole point of NoSQL: you keep the data as denormalised as you can and make the documents more and more self-sufficient.
There are some ways to add some sort of relationship between your data. This is a nice blog on it.

Painless script with Spring Data Elasticsearch

We are using Spring Data Elasticsearch to build a 'fan out on read' user content feed. Our first attempt is currently showing content based on keyword matching and latest content using NativeSearchQueryBuilder.
We want to further improve the relevancy order of what is shown to the user based on additional factors (e.g. user engagement, what the user is currently working on, etc.).
Can this custom ordering be done using NativeSearchQueryBuilder or do we get more control using a painless script? If it's a painless script, can we call this from Spring Data ElasticSearch?
Any examples, recommendations would be most welcome.
Elasticsearch orders its results by relevance score, which marks how relevant a result is to your search query: each document in the result set carries a number that signifies how relevant the document is to the given query.
If the data you want to base your ordering on is part of your indexed data (document fields, for example), you can use the Query DSL to boost the _score. A few options I can think of:
boost a search clause depending on its criteria: a user searches for a 3-room flat, but a 4-room flat at the same price would be a much better match, so we can add: { "range": { "rooms": { "gte": 4, "boost": 1 }}}
field-value-factor lets you favor results by a field value: more 'clicks' by users, more 'likes', etc.
random-score if you want randomness in your results: a different result every time a user refreshes your page, or you can mix it with existing scoring.
decay functions (Gauss!) to boost/unboost results that are close to/far from a central point. Let's say we want to search apartments and our budget is set to 1700. { "gauss": { "price": { "origin": "1700", "scale": "300" } } } gives us a measure of how close we are to our budget of 1,700. Any flat with a much higher price (let's say 2,300) gets penalized much more by the gauss function, as it is far from our origin. The decay and the behavior of the gauss function will separate our results according to their distance from our origin.
I don't think Spring Data Elasticsearch has any abstraction for this, so I would use FunctionScoreQueryBuilder with the NativeSearchQueryBuilder.
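To make the gauss example concrete, here is a sketch in Python (not Spring code) that computes the decay value the way I understand the documented formula, and assembles a function_score body combining field-value-factor with the gauss function. The `likes` field, the helper names, and the choice of `score_mode` and `boost_mode` are assumptions for illustration. The default decay of 0.5 means a document exactly one `scale` away from the origin gets a factor of 0.5.

```python
import math


def gauss_decay(value, origin, scale, offset=0.0, decay=0.5):
    """Gaussian decay factor in (0, 1]; equals `decay` at distance `scale`."""
    distance = max(0.0, abs(value - origin) - offset)
    sigma_squared = -(scale ** 2) / (2.0 * math.log(decay))
    return math.exp(-(distance ** 2) / (2.0 * sigma_squared))


def relevance_query(text_query, popularity_field="likes"):
    """Wrap a query in function_score with popularity and price proximity."""
    return {
        "function_score": {
            "query": text_query,
            "functions": [
                # Favor documents with more engagement (hypothetical field).
                {"field_value_factor": {"field": popularity_field,
                                        "modifier": "log1p",
                                        "missing": 0}},
                # Favor prices close to the budget from the example above.
                {"gauss": {"price": {"origin": "1700", "scale": "300"}}},
            ],
            "score_mode": "sum",       # how the function results combine
            "boost_mode": "multiply",  # how they combine with the query score
        }
    }


if __name__ == "__main__":
    for price in (1700, 2000, 2300):
        print(price, round(gauss_decay(price, origin=1700, scale=300), 4))
```

With these numbers a flat at 2,000 gets half the factor of a flat at 1,700, and one at 2,300 gets about one sixteenth, which is the "much more penalized" behavior described above. The printed JSON body can be passed to whatever builder or raw-query mechanism your client offers.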

How to find what index a field belongs to in elasticsearch?

I am new to elasticsearch. I have to write a query using a given field but I don't know how to find the appropriate index. How would I find this information?
Edit:
Here's an easier/better way using the mapping API:
GET _mapping/field/<fieldname>
Another way to find out is to fetch records where the field exists.
Replace <fieldName> with your field's name. /_search will search across all indices and return any document that has the field. Set _source to false, since you don't care about the document contents, only the index name.
GET /_search
{
"_source": false,
"query": {
"exists": {
"field": "<fieldName>"
}
}
}
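One caveat with reading index names off the hits is that a search returns only the first 10 hits by default, so some indices may not show up. A variation, sketched here under the assumption that a terms aggregation on the `_index` metadata field is available in your version, returns one bucket per index instead. The aggregation name `indices` and the sample response are made up for illustration.

```python
def indices_with_field_body(field_name, max_indices=1000):
    """Build a search body that groups documents having field_name by index."""
    return {
        "size": 0,  # no hits needed, only the per-index buckets
        "query": {"exists": {"field": field_name}},
        "aggs": {
            "indices": {  # hypothetical aggregation name
                "terms": {"field": "_index", "size": max_indices}
            }
        },
    }


def index_names(response):
    """Extract the sorted index names from the aggregation buckets."""
    buckets = response["aggregations"]["indices"]["buckets"]
    return sorted(bucket["key"] for bucket in buckets)


if __name__ == "__main__":
    # Hypothetical response, trimmed to the parts used here.
    sample = {
        "aggregations": {
            "indices": {
                "buckets": [
                    {"key": "logs-2021", "doc_count": 120},
                    {"key": "audit", "doc_count": 4},
                ]
            }
        }
    }
    print(index_names(sample))
```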
Another, more visual way to do that is through the Kibana Index Management UI (assuming you have privileges to access the site).
There you can click on an index and open the mappings tab to get all fields of that particular index. Then just search for the desired field.
Summary:
@Polynomial Proton's answer is the way to go 90% of the time. I just wanted to show you another way to solve your issue. It requires more manual steps than @Polynomial Proton's answer. Also, if you have a large number of indices, this way is not appropriate.

Grafana - Show metric by field value

I'm currently trying to create a graph on Grafana to monitor the status of my servers, however, I can't seem to find a way to use the value of a field as the value to be displayed on the graph. (Datasource is ElasticSearch)
The following "document" is going to be sent to GrayLog (which saves to Elastic) every 1 minute for an array of regions.
{
"region_key": "some_key",
"region_name": "Some Name",
"region_count": 1610
}
By using the following settings, I can get Grafana to display the count of messages it received for each region, however, I want to display the number on the region_count field instead.
Result:
How can I accomplish this? Is this even possible using Elasticsearch as the datasource?
1) Make sure that your document includes a timestamp in ElasticSearch.
2) In the Query box, provide the Lucene query which narrows down the documents to only those related to this metric
3) In the Metric line, press "Count" and change that to one which takes a specific field: for example, "Average"
4) Next to the "Average" box will appear "select field", which is a dropdown of the available fields. If you see unexpected fieldnames here, it's probably because your Lucene query isn't specific enough. (Kibana can be useful for getting this query right)
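For reference, the steps above correspond roughly to an Elasticsearch request like the one sketched below: group by region, bucket by time, and average the field instead of counting documents. This is not the exact request Grafana sends. The time field name `timestamp`, the aggregation names, and `fixed_interval` (Elasticsearch 7 or later) are assumptions. Since one document arrives per region per minute, the average over a one-minute bucket is simply that document's region_count.

```python
import json


def region_count_body(value_field="region_count", group_field="region_key",
                      time_field="timestamp", interval="1m"):
    """Build a search body averaging value_field per region per interval."""
    return {
        "size": 0,
        "aggs": {
            "by_region": {  # hypothetical aggregation name
                "terms": {"field": group_field},
                "aggs": {
                    "over_time": {
                        "date_histogram": {
                            "field": time_field,
                            "fixed_interval": interval,  # assumption: ES 7+
                        },
                        "aggs": {
                            "value": {"avg": {"field": value_field}}
                        },
                    }
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(region_count_body(), indent=2))
```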

Elasticsearch query too many results

I'm trying to set up a simple search that returns results with a custom ordering. The ordering I get back is fine; it is based on a custom score.
The problem is that for this query
"query": {
    "query_string": {
        "query": query_term,
        "fields": ["name_auto"],
    }
}
NOTE: name_auto is an edge n-gram field in Elasticsearch.
I always get a result set, even if the query does not make any sense.
Example:
I have an Elasticsearch index populated with the names of all the Android applications.
If I search for face, I get back all the results related to it, ordered by number of comments on the Play Store, meaning [facebook, facebook messenger, ...]
The problem is that when I query for something like facesomeuselesschars I still get the same results as before, even though for sure nothing matches "someuselesschars".
Can anybody help with this?
Elasticsearch will always return results that match your query, even if the scores of those results are poor. Your query for 'facesomeuselesschars' will match anything that has 'face' in it because of your n-grams (e.g. the first four characters of your query will match multiple tokens in your index).
The rest of the characters in your query will simply lower the score of the returned match, but not prevent it from being returned.
If you want to set a minimum score that a result must reach, you can use the min_score parameter.
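A minimal sketch of where min_score goes: it sits at the top level of the search body, next to the query. The helper name and the threshold in the usage line are placeholders. Scores are not normalized, so a useful cutoff has to be found by looking at the scores your own good and bad matches receive.

```python
import json


def search_with_min_score(query_term, min_score):
    """Build a search body that drops hits scoring below min_score."""
    return {
        "min_score": min_score,  # top-level, alongside "query"
        "query": {
            "query_string": {
                "query": query_term,
                "fields": ["name_auto"],
            }
        },
    }


if __name__ == "__main__":
    # 2.0 is an arbitrary placeholder threshold, not a recommended value.
    print(json.dumps(search_with_min_score("facesomeuselesschars", 2.0), indent=2))
```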
