Does Elasticsearch store only structured data? - elasticsearch

I am a newcomer to Elasticsearch. I want to store a large number of log files in Elasticsearch, and I am confused about how the data is stored. Which types of files should be stored in Elasticsearch? Does it store only structured data (JSON or some other structured format), or can it store unstructured data as well?
Thanks.

Elasticsearch does not implement storage itself; it relies on Apache Lucene for this. Each Elasticsearch shard is in itself a fully functional and independent Lucene index that can be hosted on any node in the cluster.
https://lucene.apache.org/core/ : "Apache Lucene™ is a high-performance, full-featured text search engine library written entirely in Java."
More about what elasticsearch stores: https://www.elastic.co/blog/found-dive-into-elasticsearch-storage
To understand how the data is stored, read about the inverted index: https://www.elastic.co/guide/en/elasticsearch/guide/current/inverted-index.html
"Elasticsearch uses a structure called an inverted index, which is designed to allow very fast full-text searches. An inverted index consists of a list of all the unique words that appear in any document, and for each word, a list of the documents in which it appears."
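As a rough illustration of the quoted definition, here is a minimal Python sketch of an inverted index. It is a toy model only: the function name and sample documents are made up for this example, and real Lucene analysis (tokenizing, stemming, scoring) is far more involved.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase word to the sorted list of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Naive tokenization: lowercase and split on whitespace.
        for word in text.lower().split():
            index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}


# Hypothetical sample documents, keyed by document ID.
docs = {
    1: "the quick brown fox",
    2: "the lazy dog",
    3: "quick brown dogs",
}
index = build_inverted_index(docs)

# Looking up a word is a single dictionary access instead of a scan of every document.
print(index["quick"])
```

Note that "dog" and "dogs" end up as separate entries here; handling that kind of variation is the job of an analyzer in a real search engine.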

Related

Elastic search with hadoop

Currently my organization holds semi-structured data in Elasticsearch, and we use queries for fast text search and aggregation. We also have other products whose data lives in other databases, so we want to put all the data in a data lake such as HDFS.
If I use HDFS as a data lake to hold raw data, how would I use Elasticsearch with it? Elasticsearch indexes data before it can be searched, so is it possible to keep the data in the data lake and have Elasticsearch query it there directly, without storing it in Elasticsearch? Or do I keep the data in the data lake, then process it and store it again in Elasticsearch so that it can be indexed?
To summarize, I want to understand the concepts behind Elasticsearch and Hadoop integration.
Both Spark and Hive offer Elasticsearch connectors; there's no need to export documents into HDFS, other than possibly for backup purposes.
https://www.elastic.co/guide/en/elasticsearch/hadoop/current/reference.html
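For the Spark side, a minimal sketch of what reading an index through the connector might look like from PySpark. This assumes the elasticsearch-hadoop (elasticsearch-spark) jar is on the Spark classpath; the helper names, host, and index name are hypothetical, and only `es.nodes`, `es.port`, and the `org.elasticsearch.spark.sql` format name come from the connector itself.

```python
def es_read_options(nodes, port=9200):
    """Build connector settings; es.nodes and es.port are elasticsearch-hadoop keys."""
    return {"es.nodes": nodes, "es.port": str(port)}


def read_index(spark, index, nodes="localhost"):
    """Load an Elasticsearch index as a Spark DataFrame.

    `spark` is an existing SparkSession; the connector jar must be available,
    otherwise Spark cannot resolve the format name below.
    """
    return (
        spark.read.format("org.elasticsearch.spark.sql")
        .options(**es_read_options(nodes))
        .load(index)
    )


# Example settings for a hypothetical host name.
options = es_read_options("es-host.example.internal")
print(options)
```

With a running cluster, something like `read_index(spark, "logs")` would give a DataFrame that Spark can join with data read from HDFS, which is roughly the integration the answer describes.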

ElasticSearch as primary DB for document library

My task is to build a full-text search system for a really large number of documents. I have the documents as RTF files together with their metadata, and all of this will be indexed in Elasticsearch. The documents are immutable (they can only be deleted), and I don't expect many new documents per day. So is it a good idea to use Elasticsearch as the primary DB in this case?
Maybe I'll store the RTF files separately, but I really don't see the point of storing all this data somewhere else.
This question was solved here. So it's a good case for Elasticsearch as the primary DB.
Elasticsearch is better known as a distributed full-text search engine than as a database.
If you preserve the document _source, it can be used as a database: almost any time you apply document changes or mapping changes, you need to re-index the documents in the index (the counterpart of a table in the relational world). There is no way to update parts of the Lucene inverted index in place; the whole document has to be re-indexed.
Elasticsearch's index recovery mechanism is one of the best: if you lose a node, the replicas that were lost with it are automatically re-created on the other available nodes, so you don't need to do anything manually.
If you take regular backups and have no requirement for the data to be available 24/7, it is completely acceptable to hold both the data and the full-text index in Elasticsearch, as you would in a database.
But if you need a highly available combination, I would recommend keeping the documents in, for example, MongoDB (known as one of the best distributed document stores) and using Elasticsearch only for its original purpose as a full-text search engine.

Indexing .png/JPG/PDF files in elastic search from fileserver

I have a search-based requirement. I am able to index Oracle database tables into Elasticsearch using Logstash. In the same way, I have to index PNG/JPG/PDF files that currently live on a file server.
I am using Elasticsearch version 6.2.3. Does anyone have an idea how to index files from a file server into Elasticsearch?
Purpose (why I am looking to index PNG/JPG/PDF files):
I have to search for and display products with their product information, and along with that I have to display the product picture, which is stored on the file server.
I have a feature to search for documents (PDF). If I search with any keywords, the search should also look inside the contents of the documents and return those documents as search results. The document file paths are available in the DB; only the files themselves are on the file server.
For these two purposes, I am looking to index PNG/JPG/PDF files.
You just have to get the bytes of your image (you can do this in any programming language) and save them in a field of type binary. But that is not a good idea; try saving a link to the image instead.
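To make the two options concrete, here is a small Python sketch using only the standard library. The field names (`image`, `image_path`), the helper functions, and the sample path are made up for illustration; the only Elasticsearch-specific fact relied on is that a `binary` field takes a Base64-encoded string. The dictionaries would still need to be sent to Elasticsearch by whatever client you use.

```python
import base64
import json

# Mapping fragment (hypothetical field names): "binary" holds Base64 strings,
# "keyword" holds the exact path or URL of the file on the file server.
mapping = {
    "properties": {
        "image": {"type": "binary"},
        "image_path": {"type": "keyword"},
    }
}


def doc_with_bytes(name, raw):
    """Option 1: embed the file content as a Base64 string."""
    return {"name": name, "image": base64.b64encode(raw).decode("ascii")}


def doc_with_link(name, path):
    """Option 2 (the one recommended above): store only a link to the file."""
    return {"name": name, "image_path": path}


raw = b"\x89PNG\r\n\x1a\n"  # first bytes of a PNG file, used as stand-in data
embedded = doc_with_bytes("product-1", raw)
linked = doc_with_link("product-1", "/fileserver/images/product-1.png")

print(json.dumps(linked))
```

The linked document stays small regardless of the image size, which is the main reason storing the path tends to be the better choice for displaying product pictures.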

Solr to return response as a document or rich text

I am new to Solr, and below is my requirement.
I have loads of emails stored in text format (semi-structured).
Using Solr, I have to index these documents so that when I search for a particular string (it could be a name), Solr returns the entire matching document(s) as the response.
Kindly let me know how to do this in Solr. Is it advisable to store the indexes in HDFS?
Solr can store the original representation of a field when the field has the stored flag. So you could store your text in one field and index it, or split it up and index it in multiple fields.
However, you may be better off storing those documents outside of Solr and structuring the content in Solr specifically for searching. Your middleware then combines the results returned from Solr with the original documents stored elsewhere.
The bigger the emails are, the more it pays to store them outside of Solr.
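A minimal sketch of that middleware step in Python, assuming the standard Solr JSON response layout (`response.docs`) and an external store keyed by the same `id`. The function name, the `subject` field, and the in-memory dictionary standing in for the external store are all hypothetical.

```python
def attach_originals(solr_response, store):
    """Combine Solr hits with the full documents kept outside of Solr."""
    results = []
    for doc in solr_response["response"]["docs"]:
        original = store.get(doc["id"])
        if original is None:
            continue  # indexed in Solr but missing from the external store
        results.append({"id": doc["id"], "subject": doc.get("subject"), "body": original})
    return results


# Stand-in for a parsed Solr JSON response to a search for a name.
solr_response = {
    "response": {
        "numFound": 2,
        "docs": [
            {"id": "mail-1", "subject": "Quarterly report"},
            {"id": "mail-7", "subject": "Re: Quarterly report"},
        ],
    }
}

# Stand-in for the external store (a file system, HDFS, or a database in practice).
store = {"mail-1": "Full text of the first email...", "mail-7": "Full text of the reply..."}

emails = attach_originals(solr_response, store)
print([e["id"] for e in emails])
```

Solr then only needs the searchable fields plus an identifier, and the entire email is fetched by ID from wherever it is actually stored.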

couchbase data replication elasticsearch

I went through the Couchbase XDCR replication documentation, but failed to understand the points below:
1. Couchbase replicates all the data in a bucket to Elasticsearch in batches, and Elasticsearch indexes this data to provide real-time statistical data. My question is: if all the data is replicated to Elasticsearch, then Elasticsearch is acting like a database that can hold a huge amount of data. So can we replace Couchbase with Elasticsearch?
2. How is the data, in JSON form, sent to d3.js to display statistical graphs?
All of the data is replicated to Elasticsearch, but it is not held there by default. The indexes and such are created, but the documents are discarded. Elasticsearch is not a database and does not perform like one, and certainly not on the level of Couchbase. Take a look at this presentation where it talks about performance and stuff and why Cochbas
If your data are not critical or if you have another source of truth, you can use Elasticsearch only.
Otherwise, I'd keep Couchbase and Elasticsearch.
There is a resiliency page on Elastic.co website which describes potential known problems. https://www.elastic.co/guide/en/elasticsearch/resiliency/current/index.html
My 2 cents.

Resources