How to use DSL query from Kibana dev-tools in visualisation? - elasticsearch

I have successfully queried and aggregated the data I need in Kibana Dev Tools. However, I need this information in tabular form, as either CSV or PDF. For this, I need to run the DSL query I constructed in Dev Tools in Kibana's visualisation tool, but I have not been able to.
I tried copying the DSL into the Lucene query text box at the top of the visualisation page, and also into the "Add filter" option. Both return an error.
The query that works in Dev Tools:
{
  "query": {
    "bool": {
      "must": [
        { "match": { "start_datetime": "1569868200" } }
      ]
    }
  },
  "aggs": {
    "state_location": {
      "terms": {
        "field": "state_location"
      },
      "aggs": {
        "stakeholder_category": {
          "terms": {
            "field": "stakeholder_category"
          },
          "aggs": {
            "coverage_category": {
              "terms": {
                "field": "category_paragraph_name.keyword"
              }
            }
          }
        }
      }
    }
  }
}
I expect to get the result on the visualisation screen as a table, so that I can export it to CSV or PDF.

The search bar in Discover does not accept the JSON syntax of a search request sent to the REST API. Instead, it uses a simple Lucene query syntax.
However, you can still edit your search in Discover manually:
You should see a button labelled "Inspect" (the original answer showed it in a screenshot).
Note that Kibana's look and feel has been significantly updated, so depending on the version you are using, you may find the Inspect button somewhere else in Discover.
Clicking the button opens a pane on the right with three tabs (Statistics, Request and Response). In the Request tab you can paste your query. Be sure NOT to paste the root "query" node of your JSON.
Hope this helps :-)
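If the goal is only a CSV file, another option is to copy the JSON response from Dev Tools and flatten the nested buckets outside Kibana. The following is a minimal Python sketch, assuming the standard terms-aggregation response shape ("buckets" with "key" and "doc_count") and the three aggregation names from the question. The sample response and its values are made up for illustration.

```python
import csv
import io


def flatten_buckets(aggs, agg_names, prefix=()):
    """Yield one row per innermost bucket of a nested terms aggregation."""
    name, rest = agg_names[0], agg_names[1:]
    for bucket in aggs[name]["buckets"]:
        keys = prefix + (bucket["key"],)
        if rest:
            # Sub-aggregations live inside each bucket under their own name.
            yield from flatten_buckets(bucket, rest, keys)
        else:
            yield keys + (bucket["doc_count"],)


def aggregations_to_csv(response, agg_names):
    """Return the flattened buckets as CSV text with a header row."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(list(agg_names) + ["doc_count"])
    writer.writerows(flatten_buckets(response["aggregations"], agg_names))
    return out.getvalue()


AGG_NAMES = ["state_location", "stakeholder_category", "coverage_category"]

# Hypothetical response, trimmed to the parts used here.
sample_response = {
    "aggregations": {
        "state_location": {"buckets": [
            {"key": "Kerala", "doc_count": 3, "stakeholder_category": {"buckets": [
                {"key": "NGO", "doc_count": 3, "coverage_category": {"buckets": [
                    {"key": "Health", "doc_count": 2},
                    {"key": "Education", "doc_count": 1},
                ]}},
            ]}},
        ]}
    }
}

print(aggregations_to_csv(sample_response, AGG_NAMES))
```

Each CSV row holds one combination of the three bucket keys plus the document count of the innermost bucket.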

Related

How to correctly denormalize one-to-many indexes coming from multiple sources

How can I restructure below elastic indexes to be able to search for registrations that had certain mailing events?
In our application we have the Profile entity which can have one to multiple Registration entities.
The registrations index is used in the majority of searches and contains the data we want to return.
Then we have multiple *Events indexes that contain events that relate to profiles.
A simplified version would look like this:
Registrations
- RegistrationId
- ProfileId
- Location
MailEvents
- ProfileId
- Template
- Actions
A simplified search might be: all the registrations in a certain location with any mailevent action for templates starting with "Solar".
Joining as in a classical relational database is an anti-pattern in Elasticsearch.
We are considering denormalizing by adding all the various events for profiles to the registrations index. This will result in an explosion of data in the registrations index.
Nested objects are also bad for searching, so we should somehow make them into arrays. But how?
We have hundreds of rows in the events indexes for every related row in registrations. The change rate of the events indexes is far higher than that of the registrations index.
We are considering doing two requests: one against all the *Events indexes to gather and de-duplicate the profileIds, then one against the registrations index using the result of the first.
It feels wrong and introduces complicated edge cases where there are more results than the maximum number of rows returned by the first request, or more than the maximum number of terms values allowed in the second.
By searching around I see many people struggling with this and looking for a way to do join queries.
It feels like de-normalizing is the way to go, but what would be the recommended approach?
What other approaches am I missing?
One approach to consider is Elasticsearch's parent-child relationship (a join field in current versions), which lets you relate documents without copying the event data into every registration. You would make Registrations the parent and MailEvents the child. Note that parent and child documents must live in the same index, and each child must be routed to the same shard as its parent. The events remain separate documents, so they can be updated without reindexing the registration.
You can then use the has_child query to find all Registrations documents that have a child matching certain MailEvent criteria. For example, to find all Registrations with a MailEvent action for templates starting with "Solar", you could write a query like this:
GET /registrations/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "Location": "some_location"
          }
        },
        {
          "has_child": {
            "type": "mailevents",
            "query": {
              "bool": {
                "must": [
                  {
                    "prefix": {
                      "Template": "Solar"
                    }
                  },
                  {
                    "exists": {
                      "field": "Actions"
                    }
                  }
                ]
              }
            }
          }
        }
      ]
    }
  }
}
This approach avoids both the data explosion of full denormalization and the complexity and edge cases of multiple requests, at the cost of slower queries than a flat, denormalized index.
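For the has_child query to work, registrations and mail events have to be indexed into one index with a join field. Here is a rough Python sketch of what the mapping and a bulk payload could look like, assuming Elasticsearch 7.x or later (typeless mappings). The join field name "relation", the document IDs and the sample values are hypothetical. The relation names reuse "registrations" and "mailevents" from the query.

```python
import json

# Hypothetical names: the join field "relation" is illustrative; the relation
# names reuse "registrations" and "mailevents" from the query.
JOIN_MAPPING = {
    "mappings": {
        "properties": {
            "relation": {
                "type": "join",
                "relations": {"registrations": "mailevents"},
            }
        }
    }
}


def bulk_lines(index, registration_id, registration, events):
    """Build a _bulk payload (NDJSON) for one registration and its events."""
    lines = [
        {"index": {"_index": index, "_id": registration_id}},
        # Parent documents only name their side of the relation.
        {**registration, "relation": "registrations"},
    ]
    for event_id, event in events:
        # Children must be routed to the parent's shard, hence "routing".
        lines.append({"index": {"_index": index, "_id": event_id,
                                "routing": registration_id}})
        lines.append({**event, "relation": {"name": "mailevents",
                                            "parent": registration_id}})
    return "\n".join(json.dumps(line) for line in lines) + "\n"


payload = bulk_lines(
    "registrations",
    "r1",
    {"ProfileId": "p1", "Location": "some_location"},
    [("e1", {"Template": "SolarPromo", "Actions": ["opened"]})],
)
print(payload)
```

Because the events stay separate child documents, a new mail event only requires indexing one small document rather than rewriting the registration.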
Another approach is to use Elasticsearch's aggregation feature. In this approach, you would perform a single search query on the Registrations index, filtered by the desired location. Then, you would use the ProfileId field to aggregate the data and retrieve the related MailEvents information. You can achieve this by using a nested aggregation, where you group by ProfileId and retrieve the relevant MailEvents data for each profile. This assumes the mail events are stored on each registration document in a field named MailEvents that is mapped as nested.
Here's an example query that performs this aggregation:
GET /registrations/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "Location": "some_location"
          }
        }
      ]
    }
  },
  "aggs": {
    "profiles": {
      "terms": {
        "field": "ProfileId"
      },
      "aggs": {
        "mail_events": {
          "nested": {
            "path": "MailEvents"
          },
          "aggs": {
            "filtered_mail_events": {
              "filter": {
                "bool": {
                  "must": [
                    {
                      "prefix": {
                        "MailEvents.Template": "Solar"
                      }
                    },
                    {
                      "exists": {
                        "field": "MailEvents.Actions"
                      }
                    }
                  ]
                }
              },
              "aggs": {
                "actions": {
                  "terms": {
                    "field": "MailEvents.Actions"
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
This query will return the Registrations documents that match the desired location, and also provide aggregated information about the related MailEvents data. You can further manipulate the aggregated data to get the information that you need.
Note that this approach can be more complex than the parent-child relationship approach and may have performance implications if your data is large and complex. However, it may be a good solution if you need to perform complex aggregations on the MailEvents data.
As far as I know, Elasticsearch aggregations might be another way to do this. You can run a search across multiple indices, aggregate the list of ProfileId values from MailEvents, and use them to filter Registrations.
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations.html
https://discuss.elastic.co/t/aggregation-across-multiple-indices/271350
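For the second step of that two-request idea, the "too many terms" edge case can be handled by splitting the collected IDs into several terms queries. Below is a minimal Python sketch, assuming the ProfileId values have already been gathered from the events indexes. The function names are hypothetical, and the default of 65536 mirrors the index.max_terms_count default in recent Elasticsearch versions, so check the value for your cluster.

```python
def unique_in_order(ids):
    """De-duplicate IDs while keeping their first-seen order."""
    return list(dict.fromkeys(ids))


def registration_queries(profile_ids, location, max_terms=65536):
    """Yield one search body per chunk of at most max_terms profile IDs."""
    ids = unique_in_order(profile_ids)
    for start in range(0, len(ids), max_terms):
        chunk = ids[start:start + max_terms]
        yield {
            "query": {
                "bool": {
                    "filter": [
                        {"term": {"Location": location}},
                        {"terms": {"ProfileId": chunk}},
                    ]
                }
            }
        }


# Small max_terms only to make the chunking visible.
bodies = list(registration_queries(["p1", "p2", "p1", "p3"],
                                   "some_location", max_terms=2))
for body in bodies:
    print(body["query"]["bool"]["filter"][1]["terms"]["ProfileId"])
```

Each body is sent as its own search against the registrations index and the hits are merged client-side, which is part of the complexity the question is worried about.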

Elasticsearch delete By Query not completing deletes

I need to delete a large number of documents in a 5.5 Elasticsearch cluster. I know the optimal way to do this is to rebuild the cluster without the intended documents, but that's not possible in our case. I run the following query that deletes documents from a subset of the indexes in the cluster:
POST myindex_1*/doc_type/_delete_by_query
{
  "query": {
    "bool": {
      "filter": [
        {
          "terms": {
            "typeCode": [
              "Filtered_Type"
            ]
          }
        }
      ],
      "must": [
        {
          "range": {
            "createdDateUTC": {
              "lt": "2017-10-28"
            }
          }
        }
      ]
    }
  }
}
It starts deleting documents for a couple of hours but then just stops and I have to kick it off again. Any ideas why it stops running the delete query?
Just a note: I'm using Kibana to run the query, and the request times out on the client side even though I can see it continues deleting on the backend.
From the Elasticsearch documentation:
By default _delete_by_query uses scroll batches of 1000. You can change the batch size with the scroll_size URL parameter:
POST twitter/_delete_by_query?scroll_size=5000
{
  "query": {
    "term": {
      "user": "kimchy"
    }
  }
}
You can find more information about batching and batch sizes here:
batches and requests_per_second in ElasticSearch Delete By Query API
And since you'll need to scroll through one or more batches to delete all of the documents found by your query, you can find more information about scrolling here:
https://www.elastic.co/guide/en/elasticsearch/reference/5.5/search-request-scroll.html
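Since the Kibana request times out on the client side, it may also help to start the delete as a background task and poll it, rather than holding the HTTP connection open. The sketch below only builds the request pieces. It assumes the wait_for_completion URL parameter and the _tasks endpoint that the delete-by-query API provides; the helper names and the task ID are made up, and the document type from the question's URL is left out for brevity.

```python
from urllib.parse import urlencode


def delete_by_query_request(index_pattern, query, scroll_size=1000,
                            wait_for_completion=False):
    """Return (method, path, body) for a _delete_by_query call."""
    params = urlencode({
        "scroll_size": scroll_size,
        # Elasticsearch expects lowercase true/false in the URL.
        "wait_for_completion": str(wait_for_completion).lower(),
    })
    path = f"/{index_pattern}/_delete_by_query?{params}"
    return "POST", path, {"query": query}


def task_status_path(task_id):
    """Path to poll for a task started with wait_for_completion=false."""
    return f"/_tasks/{task_id}"


method, path, body = delete_by_query_request(
    "myindex_1*",
    {"range": {"createdDateUTC": {"lt": "2017-10-28"}}},
    scroll_size=5000,
)
print(method, path)
# Hypothetical task ID of the form "<node_id>:<task_number>".
print("GET", task_status_path("node_id_example:12345"))
```

With wait_for_completion=false the response contains a task identifier, and the task status shows how many documents have been deleted so far and whether the task is still running.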

How to create visualization from data inside a hit in kibana?

I am looking to create a dashboard in Kibana using data from a PostgreSQL database. However, the data shows up in a single hit in Kibana, so I am unable to create the visualization.
I would like to create a visualization in Kibana from the data I fetched from PostgreSQL by comparing the values within a single PostgreSQL column. In Kibana, however, the data from that column appears inside a single hit, so I cannot build the visualization from it. Is there any way to filter the data inside a hit in Kibana, or to check the word count within a single hit?
To count the documents containing "FAILED", you can use a match query in Kibana like this:
GET <YOUR_INDEX>/_count
{
  "query": {
    "bool": {
      "must": {
        "match": {
          "<YOUR_FIELD>": "FAILED"
        }
      }
    }
  }
}
or if you have many fields:
GET <YOUR_INDEX>/_count
{
  "query": {
    "multi_match": {
      "query": "FAILED",
      "fields": [ "<FIELD1>", "<FIELD2>" ]
    }
  }
}

Is it possible to run an elasticsearch aggregation query in Kibana?

I would like to run the following aggregation query in Kibana:
GET _search
{
  "size": 0,
  "aggs": {
    "group_by_host": {
      "terms": {
        "field": "host",
        "size": 20
      }
    }
  }
}
I can run it in the Dev Tools console (what used to be called Sense), but I would like to run it in Kibana proper. I'm having a hard time figuring it out.
Just create a chart from the Visualize tab.
Then Buckets => X-Axis (or Split Rows, or whatever applies to your chart type) => Terms => choose your field.
Then click the Advanced link and enter {"size":10} in the JSON Input field.
Hope that helps!

Elasticsearch query with nested aggregations causing out of memory

I have Elasticsearch installed with 16 GB of memory. I started using aggregations, but ran into a "java.lang.OutOfMemoryError: Java heap space" error when I attempted to issue the following query:
POST /test-index-syslog3/type-syslog/_search
{
  "query": {
    "query_string": {
      "default_field": "DstCountry",
      "query": "CN"
    }
  },
  "aggs": {
    "whatever": {
      "terms": {
        "field": "SrcIP"
      },
      "aggs": {
        "destination_ip": {
          "terms": {
            "field": "DstIP"
          },
          "aggs": {
            "port": {
              "terms": {
                "field": "DstPort"
              }
            }
          }
        }
      }
    }
  }
}
The query_string itself only returns 1266 hits so I'm a bit confused by the OOM error.
Am I using aggregations incorrectly? If not, what can I do to troubleshoot this issue?
Thanks!
You are loading the entire SrcIP, DstIP, and DstPort fields into memory in order to aggregate on them. This is because Elasticsearch un-inverts the entire field to be able to rapidly look up a document's value for a field given its ID.
If you're going to largely be aggregating on a very small set of data, you should look into using doc values. Then a document's value is stored on disk in a way that makes it easy to look up given the document's ID. There's a bit more overhead to it, but that way you leave it to the operating system's file system cache to keep the relevant pages in memory, instead of having to load the entire field onto the heap.
Not sure about the mapping of course, but judging by its values, the field DstCountry can be not_analyzed. Then you could replace the query with a filter within the aggregation. Maybe that helps.
Also check whether the fields you use in your aggregation are mapped as not_analyzed.
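To make the doc values and not_analyzed suggestions concrete, here is a small Python sketch that builds a mapping body for the three aggregated fields. It assumes an older Elasticsearch version that still uses the "string" type and mapping types, as the URL in the question suggests. The field types (two strings and an integer) are guesses, since the question does not show the mapping.

```python
import json

# Assumed field types; the real mapping is not shown in the question.
AGG_FIELDS = {"SrcIP": "string", "DstIP": "string", "DstPort": "integer"}


def doc_values_mapping(doc_type, fields):
    """Build a mapping body that enables doc values on the given fields."""
    properties = {}
    for name, field_type in fields.items():
        field = {"type": field_type, "doc_values": True}
        if field_type == "string":
            # Only strings are analyzed; keep them as single exact terms.
            field["index"] = "not_analyzed"
        properties[name] = field
    return {doc_type: {"properties": properties}}


mapping = doc_values_mapping("type-syslog", AGG_FIELDS)
print(json.dumps(mapping, indent=2))
```

The mapping of an existing field generally cannot be changed in place, so a mapping like this would be applied to a new index and the data reindexed into it.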
