Can I make an Elasticsearch index like Hive's partitioned table?

Hive tables can be partitioned by a date field, so that its values become partition keys within a table.
Can I do the same with an Elasticsearch index?
I would like to partition an index by date, using the values of a specific field in the documents.
I would appreciate any technique that achieves this, even one that does not partition on specific field values.
Thank you.

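Sure: you can create one index per day, with the date in the index name in YYYY.MM.dd format (for example logstash-2018.03.05).
This is what Logstash does by default.
In Kibana, you can then search across these indices with wildcard patterns such as logstash-* or logstash-2018.*. The regular search API accepts the same wildcards in the index name, e.g. GET logstash-2018.*/_search.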
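A minimal Python sketch of the "one index per day" idea, assuming each document carries an ISO-8601 date field. The field name order_date, the prefix orders and the helper names are hypothetical. The document's own date field (not the ingestion time) picks the target index, which is the closest analogue to a Hive partition key; the returned name would then be used as the target index of an index or bulk request.

```python
from datetime import date, datetime


def daily_index_name(prefix: str, day: date) -> str:
    """Build a Logstash-style daily index name: <prefix>-YYYY.MM.dd."""
    return f"{prefix}-{day:%Y.%m.%d}"


def index_for_document(prefix: str, doc: dict, date_field: str) -> str:
    """Pick the daily index from a date field inside the document itself.

    Assumes the field holds an ISO-8601 string such as "2018-03-05T14:02:11"
    (datetime.fromisoformat only accepts a trailing "Z" from Python 3.11 on).
    """
    timestamp = datetime.fromisoformat(doc[date_field])
    return daily_index_name(prefix, timestamp.date())


def year_pattern(prefix: str, year: int) -> str:
    """Wildcard pattern matching every daily index of one year."""
    return f"{prefix}-{year}.*"


# Hypothetical example document; "order_date" is an assumed field name.
doc = {"order_id": 42, "order_date": "2018-03-05T14:02:11"}
print(index_for_document("orders", doc, "order_date"))  # orders-2018.03.05
print(year_pattern("orders", 2018))                      # orders-2018.*
```

A search limited to orders-2018.* then only touches that year's indices, which is roughly the partition pruning Hive gives you.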

Related

How do I delete Elasticsearch indices within a date range using the delete index API?

I have Elasticsearch indices named in the format "abc-xyz-yyyy-mm-dd". I need to write a script that takes two dates as input and then, using the Elasticsearch delete index API, deletes every index of that format whose date falls between them.
Can anyone suggest how to implement this?
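One way to approach this, sketched in Python under a few assumptions: the list of existing index names has already been fetched (for example with GET _cat/indices/abc-xyz-*), the naming scheme is exactly abc-xyz-YYYY-MM-DD, and both input dates are inclusive. The helper indices_in_range is hypothetical.

```python
import re
from datetime import date


def indices_in_range(index_names, prefix, start, end):
    """Return the <prefix>YYYY-MM-DD indices whose date lies in [start, end].

    Filtering the names that actually exist (rather than generating one name
    per calendar day) avoids "index not found" errors for days with no index.
    """
    # re.escape so the hyphens in the prefix are matched literally
    pattern = re.compile(re.escape(prefix) + r"(\d{4}-\d{2}-\d{2})")
    selected = []
    for name in index_names:
        match = pattern.fullmatch(name)
        if match is None:
            continue  # some other index: leave it alone
        try:
            day = date.fromisoformat(match.group(1))
        except ValueError:
            continue  # looks like a date but is not one, e.g. 2020-13-45
        if start <= day <= end:
            selected.append(name)
    # zero-padded ISO dates sort chronologically as plain strings
    return sorted(selected)


existing = [
    "abc-xyz-2020-01-31",
    "abc-xyz-2020-02-01",
    "abc-xyz-2020-02-29",
    "abc-xyz-2020-03-01",
    "abc-xyz-archive",
]
to_delete = indices_in_range(existing, "abc-xyz-", date(2020, 2, 1), date(2020, 2, 29))
print(to_delete)  # ['abc-xyz-2020-02-01', 'abc-xyz-2020-02-29']

# Each name would then be sent to the delete index API, DELETE /<index>.
# With the official Python client that is es.indices.delete(index=name)
# (not executed in this sketch).
```

Listing the names first and printing them before deleting also gives you a dry run to check the range.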

Elasticsearch data comparison

I have two different Elasticsearch clusters: one running Elasticsearch 6.x that holds the data, and a second, new cluster running Elasticsearch 7.7.1 with pre-created indices.
I reindexed the data from Elasticsearch 6.x to Elasticsearch 7.7.1.
Is there a way to fetch each document from the source and compare it with the corresponding target document, to check that all the data arrived and was not altered?
When you reindex, documents are indexed according to the destination index's mapping. The _source of each document will be identical in both indices, but that does not mean your search results will be the same: you should only expect identical search results if the mappings are the same. If you really want to be sure everything is OK, you would have to compare the inverted indices that both indices build for full-text search. This data can be very large and there is no easy way to retrieve it; you can check this for getting a term-document matrix.
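Checking that the _source came across unchanged is the easy part. A minimal Python sketch, assuming each index has already been read into a dict mapping _id to _source (for example by iterating over elasticsearch.helpers.scan from the official Python client against each cluster; its hits carry "_id" and "_source"). compare_sources is a hypothetical helper, and, as the answer says, it checks stored documents only, not the inverted index.

```python
def compare_sources(source_docs, target_docs):
    """Compare two {_id: _source} dicts and report what differs."""
    source_ids = set(source_docs)
    target_ids = set(target_docs)
    return {
        # documents that never reached the target index
        "missing": sorted(source_ids - target_ids),
        # documents present only in the target index
        "unexpected": sorted(target_ids - source_ids),
        # same _id, different _source (dict equality is deep and ignores key order)
        "changed": sorted(
            doc_id
            for doc_id in source_ids & target_ids
            if source_docs[doc_id] != target_docs[doc_id]
        ),
    }


source = {"1": {"name": "a", "tags": ["x", "y"]}, "2": {"name": "b"}, "3": {"name": "c"}}
target = {"1": {"tags": ["x", "y"], "name": "a"}, "2": {"name": "B"}, "4": {"name": "d"}}
print(compare_sources(source, target))
# {'missing': ['3'], 'unexpected': ['4'], 'changed': ['2']}
```

Reindex keeps each document's _id by default, which is what makes keying by _id work. For very large indices you would compare in batches, or compare a hash of each _source, instead of holding both indices in memory.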

Can I add a calculated boolean column to an Elasticsearch Kibana query from data from another query?

Let's imagine that we have an Elastic index and we want to get all the documents of that index, along with a calculated field holding the result of filtering a different Elastic index.
It is easier to explain in SQL; even though Elastic is NoSQL, this shows the goal:
select id, name, (id IN (select customer_id from invoices where customer_id = 123)) as hasBought
from customers;
Elasticsearch doesn't support SQL-style joins between indices. You'll need to denormalize your data one way or another, even if it results in data duplication. That's the "downside" of a NoSQL store like ES.
Quoting the docs:
Performing full SQL-style joins in a distributed system like Elasticsearch is prohibitively expensive. Instead, Elasticsearch offers two forms of join which are designed to scale horizontally.
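Short of a real join, one common workaround is an application-side join: run one search against customers, one against invoices, and combine the hits in code. A minimal Python sketch, assuming both result sets are small enough to hold in memory and have already been reduced to plain dicts; the field names follow the SQL above and add_has_bought is a hypothetical helper. Denormalizing instead means writing hasBought into each customer document at index time, so a single query returns it.

```python
def add_has_bought(customers, invoices):
    """Emulate the SQL column: id IN (select customer_id from invoices ...)."""
    # A set gives constant-time membership checks, like the IN (subquery).
    buyer_ids = {invoice["customer_id"] for invoice in invoices}
    return [
        {**customer, "hasBought": customer["id"] in buyer_ids}
        for customer in customers
    ]


customers = [{"id": 123, "name": "Ann"}, {"id": 456, "name": "Bob"}]
invoices = [{"invoice_id": 1, "customer_id": 123}]  # hits of the invoices search
for row in add_has_bought(customers, invoices):
    print(row)
# {'id': 123, 'name': 'Ann', 'hasBought': True}
# {'id': 456, 'name': 'Bob', 'hasBought': False}
```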

What is the equivalent of creating MySQL indexes in Elasticsearch?

As you probably know, in MySQL you can create indexes to improve the performance of your queries. Is there any such equivalent in Elastic? (I already know that an index in Elastic is roughly the equivalent of a database.)
I just need confirmation from black-belt Elastic users ;)
From the documentation:
Relational databases add an index, such as a B-tree index, to specific columns in order to improve the speed of data retrieval. Elasticsearch and Lucene use a structure called an inverted index for exactly the same purpose.
By default, every field in a document is indexed (has an inverted index) and thus is searchable. A field without an inverted index is not searchable. We discuss inverted indexes in more detail in Inverted Index.
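To make the comparison with a B-tree concrete, here is a toy inverted index in Python: a map from each term to the sorted ids of the documents containing it. It is a deliberately simplified model (lower-casing and whitespace splitting stand in for a real analyzer, and build_inverted_index and search_all are hypothetical names), not how Lucene stores data on disk, but it shows why finding documents by term does not require scanning every document.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the sorted ids of the documents that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        # Toy "analyzer": lower-case and split on whitespace.
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search_all(index, query):
    """Ids of documents containing every query term (an AND query)."""
    terms = query.lower().split()
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


docs = {
    1: "The quick brown fox",
    2: "Quick brown dogs",
    3: "A lazy dog",
}
index = build_inverted_index(docs)
print(index["brown"])                    # [1, 2]
print(search_all(index, "quick brown"))  # [1, 2]
print(search_all(index, "lazy fox"))     # []
```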

Elasticsearch: Is it possible to build indices based on fields?

In the context of ELK (Elasticsearch, Logstash, Kibana), I learnt that Logstash has a filter stage that can use grok to split log messages into separate fields. As I understand it, this only turns unstructured log data into more structured data. But I have no idea how Elasticsearch can make use of these fields (produced by grok) to improve query performance. Is it possible to build indices based on fields, as in a traditional relational database?
From Elasticsearch: The Definitive Guide
Inverted index
Relational databases add an index, such as a B-tree index, to specific columns in order to improve the speed of data retrieval. Elasticsearch and Lucene use a structure called an inverted index for exactly the same purpose.
By default, every field in a document is indexed (has an inverted index) and thus is searchable. A field without an inverted index is not searchable. We discuss inverted indexes in more detail in Inverted Index.
So you do not need to do anything special: Elasticsearch already indexes all fields by default, including the ones grok extracts.
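To connect this to grok: once a log line has been split into fields, each field gets its own set of indexed terms, so a query on one field only consults that field's entries. A toy Python model, assuming the grok stage has already produced dicts like the ones below. The field names clientip, verb and status are illustrative, and index_fields and find are hypothetical helpers; this is a simplification of what Lucene actually does.

```python
from collections import defaultdict


def index_fields(docs):
    """Build one {value -> doc ids} postings map per field."""
    per_field = defaultdict(lambda: defaultdict(set))
    for doc_id, doc in enumerate(docs):
        for field, value in doc.items():
            # Values are stored as strings so 404 and "404" match alike.
            per_field[field][str(value)].add(doc_id)
    return per_field


def find(per_field, field, value):
    """Exact-value lookup on a single field, similar in spirit to a term query."""
    return sorted(per_field.get(field, {}).get(str(value), set()))


# Output of a (hypothetical) grok filter: one dict per log line.
parsed = [
    {"clientip": "10.0.0.1", "verb": "GET", "status": 200},
    {"clientip": "10.0.0.2", "verb": "POST", "status": 404},
    {"clientip": "10.0.0.1", "verb": "GET", "status": 404},
]
per_field = index_fields(parsed)
print(find(per_field, "status", 404))            # [1, 2]
print(find(per_field, "clientip", "10.0.0.1"))   # [0, 2]
print(find(per_field, "verb", "DELETE"))         # []
```

Without grok, the whole line would sit in a single text field and a lookup like "status is 404" could only be approximated by full-text matching.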
