Is it possible to build indices on the basis of fields in Elasticsearch?

In the context of ELK (Elasticsearch, Logstash, Kibana), I learnt that Logstash has a filter stage that can use grok to split log messages into separate fields. As I understand it, this only turns unstructured log data into more structured data. But I have no idea how Elasticsearch can make use of the fields produced by grok to improve query performance. Is it possible to build indices on the basis of those fields, as in a traditional relational database?

From Elasticsearch: The Definitive Guide
Inverted index
Relational databases add an index, such as a B-tree index, to specific columns in
order to improve the speed of data retrieval. Elasticsearch and Lucene use a
structure called an inverted index for exactly the same purpose.
By default, every field in a document is indexed (has an inverted
index) and thus is searchable. A field without an inverted index is
not searchable. We discuss inverted indexes in more detail in Inverted Index.
So you do not need to do anything special. Elasticsearch already indexes all fields by default.
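To make the quoted idea concrete, here is a toy sketch of a per-field inverted index in Python. It is not how Lucene is implemented; the function names, the whitespace tokenizer, and the sample log documents are all assumptions made for illustration.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map (field, term) -> set of ids of documents containing that term in that field."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for field, value in doc.items():
            # Naive analysis (an assumption): lowercase and split on whitespace.
            for term in str(value).lower().split():
                index[(field, term)].add(doc_id)
    return index


def search(index, field, term):
    """Return the sorted ids of documents whose field contains the term."""
    return sorted(index.get((field, term.lower()), set()))


# Hypothetical documents, shaped like the fields grok might extract from log lines.
docs = {
    1: {"level": "ERROR", "message": "disk full on node1"},
    2: {"level": "INFO", "message": "backup finished on node1"},
    3: {"level": "ERROR", "message": "connection refused"},
}

index = build_inverted_index(docs)
print(search(index, "level", "error"))    # [1, 3]
print(search(index, "message", "node1"))  # [1, 2]
```

A lookup goes straight from a term to the matching document ids without scanning every document, which is why splitting a log line into fields with grok pays off at query time.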

Related

ElasticSearch - search by property from concrete index while going through multiple indexes

We're using Elasticsearch and have two indexes with different data. Recently, we wanted to run a query that needs data from both. ES allows searching across multiple indexes: /index1,index2/_search. The problem is that both indexes have properties with the same name, so there could be collisions because ES doesn't know which index to look in.
How can we tell ES to look up a property from a specific index?
For example: index1.myProperty and index2.otherProperty
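One possible approach, sketched under assumptions: the _index metadata field can be used in a term query, so each field condition can be paired with a filter on the index it belongs to. The helper name per_index_query and the search values "foo" and "bar" are hypothetical. The code only builds the request body and does not contact a cluster.

```python
import json


def per_index_query(clauses):
    """Build a bool query in which each field match is restricted to one index.

    clauses: list of (index_name, field_name, value) tuples.
    """
    should = []
    for index_name, field_name, value in clauses:
        should.append({
            "bool": {
                # Only documents stored in this index may satisfy the clause.
                "filter": [{"term": {"_index": index_name}}],
                "must": [{"match": {field_name: value}}],
            }
        })
    return {"query": {"bool": {"should": should, "minimum_should_match": 1}}}


body = per_index_query([
    ("index1", "myProperty", "foo"),     # placeholder search values
    ("index2", "otherProperty", "bar"),
])
print(json.dumps(body, indent=2))
```

The resulting JSON would be sent as the body of a search against /index1,index2/_search, so a property with the same name in the other index cannot match by accident.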

ElasticSearch as primary DB for document library

My task is to build a full-text search system for a really large number of documents. Currently I have the documents as RTF files plus their metadata, and all of this will be indexed in Elasticsearch. These documents are immutable (they can only be deleted), and I don't expect many new documents per day. So is it a good idea to use Elasticsearch as the primary DB in this case?
Maybe I'll store the RTF file separately, but I really don't see the point of storing all this data somewhere else.
This question was solved here. So it's a good case for elasticsearch as the primary DB
Elasticsearch is better known as a distributed full-text search engine than as a database.
If you preserve the document _source, it can be used as a database, since almost any time you apply document changes or mapping changes you need to re-index the documents in the index (roughly a table in the relational world). It is not possible to update parts of the Lucene inverted index in place; you need to re-index the whole document.
Elasticsearch's index recovery mechanism is one of the best: if you lose a node, the lost replicas are automatically re-created on some of the other available nodes, so you don't need to do anything manually.
If you take regular backups and have no requirement for the data to be available 24/7, it is perfectly acceptable to hold both the data and the full-text index in Elasticsearch, as you would in a database.
But if you need a highly available combination, I would recommend keeping the documents in, for example, MongoDB (known as best for distributed document storage) and using Elasticsearch only for its original purpose as a full-text search engine.

Elasticsearch data comparison

I have two different Elasticsearch clusters.
One cluster runs Elasticsearch 6.x and holds the data; the second, new cluster runs Elasticsearch 7.7.1 with pre-created indexes.
I reindexed the data from Elasticsearch 6.x to Elasticsearch 7.7.1.
Is there any way to fetch a doc from the source and compare it with the target doc, to check that the data is there and has not been altered?
When you perform a reindex, the data is indexed according to the destination index mapping, so if the mappings are the same you should get the same search results. The _source value will be identical in both indices, but that alone doesn't mean your search results will be the same. If you really want to be sure everything is OK, you should compare the inverted indexes generated by both indices for full-text search. This data can be really big and there is no easy way to retrieve it; you can check this for getting a term-document matrix.
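For the simpler part of the check, comparing _source between the two clusters, a minimal sketch could look like the following. It assumes the documents have already been fetched from each cluster into plain {doc_id: _source} dictionaries; the function names and the sample data are hypothetical, and the sketch does not compare the inverted indexes themselves.

```python
import hashlib
import json


def source_fingerprint(source):
    """Hash a _source dict so that key order does not affect the result."""
    canonical = json.dumps(source, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def compare_sources(old_docs, new_docs):
    """Compare two {doc_id: _source} mappings and report the differences."""
    missing = sorted(set(old_docs) - set(new_docs))
    extra = sorted(set(new_docs) - set(old_docs))
    changed = sorted(
        doc_id
        for doc_id in set(old_docs) & set(new_docs)
        if source_fingerprint(old_docs[doc_id]) != source_fingerprint(new_docs[doc_id])
    )
    return {"missing": missing, "extra": extra, "changed": changed}


# Hypothetical sample data standing in for documents fetched from each cluster.
old_docs = {"a": {"title": "x", "n": 1}, "b": {"title": "y"}, "c": {"title": "z"}}
new_docs = {"a": {"n": 1, "title": "x"}, "b": {"title": "Y"}}

print(compare_sources(old_docs, new_docs))
# {'missing': ['c'], 'extra': [], 'changed': ['b']}
```

Hashing a canonical JSON form keeps the comparison cheap for large documents and ignores differences in key order.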

Confusion about Elasticsearch

I have some confusion about Elasticsearch's index.
In some places I read that it's the equivalent of an RDBMS database, and in others that an index is like what we have at the end of books: a list of words with the corresponding documents that contain each word.
Can someone clarify?
Thanks
An Elasticsearch cluster can contain multiple indices (databases). These indices hold multiple documents (rows), and each document has properties or fields (columns).
You can check the list of your available indices at http://localhost:9200/_cat/indices?v
But in general (in computer science and databases), indexing means what you said: a "list of words with corresponding documents that contain the word". This structure improves the speed of data retrieval operations on a database table. The concept is used in many databases, such as MySQL or Oracle. In Elasticsearch, all fields of a document are indexed by default (you can change the mapping so that some fields are not indexed).
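As a rough illustration of opting a field out of indexing, the mapping below sets "index" to false on one field. The field names are hypothetical, and the typeless mapping layout assumes Elasticsearch 7.x or later. The code only builds and prints the request body.

```python
import json

# Hypothetical index mapping; the field names are illustrative only.
mapping = {
    "mappings": {
        "properties": {
            # Indexed by default, so full-text searchable.
            "message": {"type": "text"},
            # Indexed by default, so searchable by exact value.
            "level": {"type": "keyword"},
            # Kept in _source but has no inverted index, so it is not searchable.
            "raw_payload": {"type": "keyword", "index": False},
        }
    }
}

print(json.dumps(mapping, indent=2))
```

This body would be supplied when creating the index. Fields that are not listed still get indexed by default through dynamic mapping.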

What is the equivalent of creating MySQL indexes in Elasticsearch?

As you probably know, in MySQL you can create indexes to improve the performance of your queries. Is there an equivalent in Elastic? (I already know that an index in Elastic is roughly the equivalent of a database.)
I just need confirmation from black-belt Elastic users ;)
From the documentation:
Relational databases add an index, such as a B-tree index, to specific
columns in order to improve the speed of data retrieval. Elasticsearch
and Lucene use a structure called an inverted index for exactly the
same purpose.
By default, every field in a document is indexed (has an inverted index) and thus is searchable. A field without an inverted index is
not searchable. We discuss inverted indexes in more detail in Inverted
Index.
