I am very new to Elasticsearch and come from a SQL background. We are trying to use an ELK stack to monitor a Jenkins server. We use the Elasticsearch report plugin to send a bunch of information about each job, but we also have some custom information that we need to send separately. How can I join these two pieces of information in Kibana? In a SQL database I would have two tables and join them on a key, but I don't know how to do that in Elasticsearch. Any suggestions?
Generally speaking, join is the strong suit of relational DBs (aka SQL DBs) and the weak spot of NoSQL stores (Elasticsearch among them). Having said that, ES does support such operations, and if performance is not critical, you can try it: Elasticsearch joining queries. In a nutshell:
Create a join-field mapping. This is the equivalent of a foreign key constraint in SQL. Since you have control over the Logstash part, I suggest you make it the parent and the ES report info the child.
Use the has_child query when you query the logs. This type of query acts like the join query in SQL (a sketch of both steps follows below).
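For illustration, a minimal sketch of both steps, assuming the 7.x-style Python client (elasticsearch-py); the index name jenkins-jobs, the relation names logstash_info/report_info, and the build_status field are assumptions, not your actual schema:

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Join-field mapping: the Logstash document is the parent,
# the ES report info is the child.
es.indices.create(index="jenkins-jobs", body={
    "mappings": {
        "properties": {
            "job_relation": {
                "type": "join",
                "relations": {"logstash_info": "report_info"}
            }
        }
    }
})

# has_child query: return the parent (Logstash) documents that have
# a child report matching some condition, much like a SQL join + WHERE.
results = es.search(index="jenkins-jobs", body={
    "query": {
        "has_child": {
            "type": "report_info",
            "query": {"match": {"build_status": "FAILURE"}}
        }
    }
})

Note that child documents have to be indexed with routing set to their parent's id so they end up on the same shard as the parent.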
Related
I have 2 indexes that share one common field (basically a relationship).
Since Elasticsearch does not let us filter across multiple indexes, should we store the results in memory in a variable and filter them in Node.js (which basically means that my application itself is now working as a database server)?
We were previously using MongoDB, which is also a NoSQL DB, but there we could manage this with aggregation queries; Elasticsearch does not seem to provide that.
So even if we use both databases together, we have to store their results somewhere to filter them further, because we give users advanced search functionality that lets them filter data across multiple collections.
So should we store results in memory to filter the data further? We currently offer advanced search over 100 million records to customers, but it lacks the full-text search that Elasticsearch provides, and we now plan to offer Elasticsearch text search to customers.
What approach do you suggest for using MongoDB and Elasticsearch together? We are using Node.js to serve data.
Or which of these options should we choose:
Denormalizing: Flatten your data
Application-side joins: Run multiple queries on normalized data
Nested objects: Store arrays of objects
Parent-child relationships: Store multiple documents through joins
https://blog.mimacom.com/parent-child-elasticsearch/
https://spoon-elastic.com/all-elastic-search-post/simple-elastic-usage/denormalize-index-elasticsearch/
Storing things client-side in memory is not the solution.
First of all, the simplest way to solve this problem is to make one combined index. It's trivial to do this: just insert all the documents from index-2 into index-1, prefixing every field coming from index-2 with something like "idx2" so you don't overwrite any similarly named fields. You can use an ingest pipeline to do this, or just do it client-side. You will only ever do this once.
After that you can perform aggregations on the single index, since you have all the data in one index.
If you are using something other than ES as your primary data store, you need to reconfigure the indexing operation so that everything that was previously going into index-2 also goes into index-1 (with the prefixed fields).
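A rough client-side sketch of that one-time copy with the Python client, assuming placeholder index names index-1/index-2 and an idx2_ field prefix:

from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")

def prefixed_docs():
    # Scan every document in index-2 and re-emit it into index-1,
    # prefixing each field with "idx2_" so nothing in index-1 gets overwritten.
    for hit in helpers.scan(es, index="index-2"):
        yield {
            "_index": "index-1",
            "_source": {f"idx2_{field}": value for field, value in hit["_source"].items()},
        }

# One-time bulk copy; afterwards every aggregation runs against index-1 alone.
helpers.bulk(es, prefixed_docs())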
100 million records is trivial for something like Elasticsearch. Doing any kind of "join" client-side is NOT RECOMMENDED, as it will obviate the entire value of using ES.
If you need any further help executing this, feel free to contact me. I have 11 years of experience with ES, and I have seen people struggle with "joins" 99% of the time. :)
The first thing to do when coming from MySQL/Postgres or even MongoDB is to restructure the indices to suit the needs of data querying. Never try to work with multiple indices; ES is not built for that.
HTH.
So, I have 2 indexes in my Elasticsearch server.
I need to gather the results from the first index, and for each result I need to gather info from the second index.
How can I do that? I tried the foreach processor, but no luck so far.
Thanks
I need to gather the results from the first index, and for each result I need to gather info from the second index.
Unless you create parent/child relationships, that's not possible in Elasticsearch.
However, note:
In Elasticsearch the key to good performance is to de-normalize your data into documents. Each join field, has_child or has_parent query adds a significant tax to your query performance.
Handle reading from multiple indexes within your application or rethink your index mapping.
The foreach processor is for ingest pipelines, meaning stuff that gets done at indexing time, so it won't help you when you are trying to gather the results.
In general, it's not going to be possible to query another index (which might live on another shard) from within a query.
In some cases, you can use a join field. There are performance implications, so it's only recommended in specific cases.
If you are not in the join field use case, and you can restructure your data to use nested objects, it will be more performant than join fields.
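For example, here is a minimal sketch of the nested-object option with the Python client; the index name primary and the related/name/value fields are purely hypothetical:

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Instead of a second index, store the related records as an array of
# nested objects inside each primary document.
es.indices.create(index="primary", body={
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            "related": {
                "type": "nested",
                "properties": {
                    "name": {"type": "keyword"},
                    "value": {"type": "integer"}
                }
            }
        }
    }
})

# A nested query treats each related object as its own unit, so "name"
# and "value" must match within the same object.
results = es.search(index="primary", body={
    "query": {
        "nested": {
            "path": "related",
            "query": {
                "bool": {
                    "must": [
                        {"term": {"related.name": "foo"}},
                        {"range": {"related.value": {"gte": 10}}}
                    ]
                }
            }
        }
    }
})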
Otherwise, you'll be better off running multiple queries in the application code (maybe you can fetch all the "secondary" results using just one query, so you'd have 2 queries in total?)
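As a sketch of that two-query, application-side join with the Python client (the index names index-a/index-b and the field names are assumptions):

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Query 1: fetch the primary results.
primary = es.search(index="index-a", body={
    "query": {"match": {"status": "active"}},
    "size": 100
})
hits = primary["hits"]["hits"]

# Collect the keys pointing at the secondary index.
keys = [hit["_source"]["common_field"] for hit in hits]

# Query 2: fetch all matching secondary documents in one shot
# (adjust the size or paginate if there can be many matches per key).
secondary = es.search(index="index-b", body={
    "query": {"terms": {"common_field": keys}},
    "size": 1000
})

# "Join" in application code by grouping secondary docs on the shared key.
by_key = {}
for hit in secondary["hits"]["hits"]:
    by_key.setdefault(hit["_source"]["common_field"], []).append(hit["_source"])

joined = [
    {**hit["_source"], "related": by_key.get(hit["_source"]["common_field"], [])}
    for hit in hits
]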
Let's imagine that we have an Elastic index and we want to get all the documents of that index plus a calculated field holding the result of filtering a different Elastic index.
I can explain it better in SQL; even though Elastic is NoSQL, it conveys the goal:
select id, name, (id IN (select customer_id from invoices where customer_id = 123)) as hasBought
from customers;
Elasticsearch doesn't support table joins. You'll need to denormalize your data one way or another, even if it results in data duplication. That's the "downside" of NoSQL stores like ES.
Quoting the docs:
Performing full SQL-style joins in a distributed system like Elasticsearch is prohibitively expensive. Instead, Elasticsearch offers two forms of join which are designed to scale horizontally.
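So the practical answer to the example above is to compute the flag when the customer document is indexed or updated, not at query time. A minimal sketch with the Python client, reusing the names from the SQL example (everything else is assumed):

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

customer_id = 123

# Denormalize: check the invoices index once, then store the result
# on the customer document itself instead of "joining" at query time.
invoice_count = es.count(index="invoices", body={
    "query": {"term": {"customer_id": customer_id}}
})["count"]

es.update(index="customers", id=customer_id, body={
    "doc": {"hasBought": invoice_count > 0}
})

# Querying is now a plain filter on the customers index.
customers = es.search(index="customers", body={
    "query": {"term": {"hasBought": True}}
})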
I have two Elasticsearch indexes. I want to be able to search them in a similar way to an SQL join.
One index stores data for Lessons and contains a reference to the Locations index using the id of the location document.
What I'm trying to do in essence is a typical SQL join.
SELECT * FROM Lessons L JOIN Locations LC ON L.location_id = LC.id
My first solution would be to add the location info into the Lessons index when I update a document. This would be the correct approach in the Elasticsearch methodology - flat data. However, the problem is that the two sets of data are maintained independently, so when a Location is updated, all the relevant Lesson documents would need to be updated as well.
The second solution I've looked at is the joining queries in Elasticsearch https://www.elastic.co/guide/en/elasticsearch/reference/current/joining-queries.html , however from what I understand from the documentation this is not possible across different indexes.
You might be able to use a terms lookup.
https://www.elastic.co/guide/en/elasticsearch/reference/master/query-dsl-terms-query.html#query-dsl-terms-lookup
But if too many terms are involved in the join, performance would be a concern. Also, Elastic limits them to 65,536 (in newer versions, I guess).
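A hedged sketch of a terms lookup for the Lessons/Locations case with the Python client; it assumes you keep a document (here in a hypothetical location_groups index) whose location_ids field lists the location ids you want to join against:

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Hypothetical "lookup" document holding the ids to match against.
es.index(index="location_groups", id="city-center", body={
    "location_ids": ["loc-1", "loc-7", "loc-42"]
})

# Terms lookup: filter lessons whose location_id appears in that document,
# without copying the ids into the query by hand.
lessons = es.search(index="lessons", body={
    "query": {
        "terms": {
            "location_id": {
                "index": "location_groups",
                "id": "city-center",
                "path": "location_ids"
            }
        }
    }
})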
I have 2 indexes, one called assignment and the other called user. In SQL I had a foreign-key data field, but I do not know how to perform an inner join in Elasticsearch. Can someone help me?
So you have a couple of options which might be useful; without knowing your specific use case, I'm going to list some potentially useful links.
1)
Parent-child mapping, which is really useful when you want to return all documents associated with a specific document. To make the mapping process a bit easier, I typically index the data, retrieve the mapping using the /_mapping endpoint, modify the mapping, delete the index, then re-ingest the data. Sometimes that isn't an option in the case of short-lived data.
https://www.elastic.co/guide/en/elasticsearch/guide/current/parent-child-mapping.html
After updating the mapping, it's possible to use one of the joining queries.
https://www.elastic.co/guide/en/elasticsearch/reference/current/joining-queries.html
2)
When deleting the index and re-ingesting the data isn't an option, create a new index, modify the mapping as described above, and instead of deleting the old index, use the reindex API to move the information to the new index (see the sketch at the end of this answer).
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html
3)
It might also be possible to use an ingest processor to join the tables.
https://www.elastic.co/guide/en/elasticsearch/reference/master/ingest-processors.html
4)
Possibly the quickest option, until you get your head wrapped around how Elasticsearch works, is to either join the information prior to ingesting it, or write a script that joins the tables using one of the SDKs.
https://elasticsearch-py.readthedocs.io/en/master/
https://www.elastic.co/guide/en/elasticsearch/client/javascript-api/current/index.html
plus a lot more built by the community.
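For option 2, a minimal sketch of the reindex call via the Python client (the index names are placeholders; create the destination index with the modified mapping before running it):

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Copy every document from the old index into the new one that carries
# the updated (e.g. parent-child) mapping.
es.reindex(body={
    "source": {"index": "assignment"},
    "dest": {"index": "assignment-v2"}
}, wait_for_completion=True)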