We have several GB of data stored as JSON files in Amazon S3, and we also index that data in Elasticsearch for searching.
We want to be able to reindex the data in Elasticsearch for the following reasons:
1] ES outage
2] Fields added to or removed from the documents.
We tried reading the data from S3 in our code and then sending it to ES, but it was too slow to be usable.
What is the best way to reindex data from Amazon S3 into Elasticsearch?
Regards,
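One thing that is often worth checking when indexing is slow is whether documents are sent one request at a time. Elasticsearch has a Bulk API (`POST /_bulk`) that accepts many documents per request as newline-delimited JSON. Below is a minimal sketch of building such request bodies in batches. The function name `bulk_bodies`, the index name `my-index`, and the batch size are assumptions for illustration, and reading from S3 and the actual HTTP call are left out. Batching may not be the cause of the slowness described above.

```python
import json
from typing import Dict, Iterable, Iterator, List


def bulk_bodies(docs: Iterable[Dict], index: str, batch_size: int = 500) -> Iterator[str]:
    """Yield newline-delimited JSON bodies suitable for POST /_bulk.

    `index` and `batch_size` are illustrative; tune the batch size for your cluster.
    """
    lines: List[str] = []
    count = 0
    for doc in docs:
        # Action line: tells Elasticsearch to index the following line into `index`.
        lines.append(json.dumps({"index": {"_index": index}}))
        # Source line: the document itself.
        lines.append(json.dumps(doc))
        count += 1
        if count == batch_size:
            # The Bulk API requires the body to end with a newline.
            yield "\n".join(lines) + "\n"
            lines, count = [], 0
    if lines:
        # Flush the final, possibly smaller, batch.
        yield "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical documents standing in for JSON files read from S3.
    sample_docs = [{"n": i} for i in range(5)]
    for body in bulk_bodies(sample_docs, "my-index", batch_size=2):
        print(body)
```

Each yielded string would be sent as the body of one bulk request with the `Content-Type: application/x-ndjson` header.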
Currently my organization holds semi-structured data in Elasticsearch, and we use queries for fast text search and aggregation. We also have other products whose data lives in other databases, so we want to put all the data in a data lake such as HDFS.
If I use HDFS as a data lake to hold the raw data, how will I use Elasticsearch with it? Elasticsearch indexes data before it can be used, so is it possible to keep the data in the data lake and have Elasticsearch query it there directly, without storing the data in Elasticsearch? Or will I hold the data in the data lake, process it, and then store it again in Elasticsearch so that it can be indexed?
To summarize, I want to understand the concepts of Elasticsearch and Hadoop integration.
Both Spark and Hive offer Elasticsearch connectors; there's no need to export documents into HDFS, other than possibly backup functionality.
https://www.elastic.co/guide/en/elasticsearch/hadoop/current/reference.html
When I search in Elasticsearch, does Elasticsearch send a REST API request to the original DB?
Or does Elasticsearch hold the original data?
I found that Elasticsearch holds indexed data, but I am not certain whether it also holds the original data.
Elasticsearch is a data store in its own right, so if you want data from an external source (e.g. a SQL database) to be searchable, you need to index that data into Elasticsearch first and then search against the indexed copy.
So no, the REST API will not query the original DB; it queries the data you have in Elasticsearch.
You can read more about the process here:
https://www.elastic.co/guide/en/cloud/current/ec-getting-started-search-use-cases-db-logstash.html
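As a rough illustration of the "index first, then search" step, here is a minimal sketch that reads rows from a SQL database and turns each one into an indexing action. It uses an in-memory SQLite database as a stand-in for the original DB. The table `products`, its `id` column, and the function names are assumptions made up for this example. The `_index` / `_id` / `_source` dictionary shape is the one accepted by the bulk helpers in the official Python client, but no client call is made here.

```python
import sqlite3
from typing import Dict, List


def make_demo_db() -> sqlite3.Connection:
    """Create a small in-memory database standing in for the original DB."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO products (id, name) VALUES (?, ?)",
        [(1, "keyboard"), (2, "mouse")],
    )
    conn.commit()
    return conn


def rows_to_actions(conn: sqlite3.Connection, table: str, index: str) -> List[Dict]:
    """Turn every row of `table` into one indexing action for `index`.

    Assumes the table has an `id` column and that `table` is a trusted name
    (it is interpolated into the SQL text).
    """
    conn.row_factory = sqlite3.Row  # rows become mapping-like objects
    rows = conn.execute(f"SELECT * FROM {table} ORDER BY id").fetchall()
    actions = []
    for row in rows:
        doc = dict(row)
        # Reuse the primary key as the document id so re-running overwrites
        # the existing copy instead of creating duplicates.
        actions.append({"_index": index, "_id": str(doc["id"]), "_source": doc})
    return actions


if __name__ == "__main__":
    for action in rows_to_actions(make_demo_db(), "products", "products"):
        print(action)
```

Once these actions are indexed, searches run against the copy in Elasticsearch; later changes to the SQL rows are not visible until they are indexed again.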
Disclaimer: I am very new to the ELK Stack, so this question may be very basic.
I am setting up the ELK stack now. I have the following basic questions about Elasticsearch.
1. What storage model does Elasticsearch follow?
For example, Oracle uses the relational model, Alfresco uses a "document model", and Apache Jackrabbit uses a "hierarchical model".
2. Is log data stored in Elasticsearch persistent/permanent, or does Elasticsearch delete log data after a certain period?
3. How do we manage/back up this data?
4. Are the log/data files in Elasticsearch human-readable?
Any help or pointers to documentation will be appreciated.
1. The storage model is a document model. Everything is a document. Each document is stored in an index; in older Elasticsearch versions a document also had a particular type.
2. Data sent to ES is stored on disk. It can then be read, searched, or deleted through a REST API.
3. The data is managed through the REST API. For log centralisation, the logs are usually stored in date-based indices (one index for today, one for yesterday, and so on), so to delete the logs from one day, you delete the relevant index. Curator can help in this case. ES also offers a backup and restore (snapshot and restore) module.
4. To access the data in ES, you have to use the REST API or the Kibana client.
Documentation:
https://www.elastic.co/guide/en/elasticsearch/guide/current/index.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html
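To make the date-based index idea concrete, here is a minimal sketch of choosing which daily indices have passed a retention period. The `logs-YYYY.MM.DD` naming pattern, the `logs-` prefix, and the function name `expired_indices` are assumptions for illustration; your index names may differ. It only computes the names; nothing is deleted.

```python
from datetime import date, datetime, timedelta
from typing import Iterable, List


def expired_indices(
    names: Iterable[str],
    today: date,
    keep_days: int,
    prefix: str = "logs-",
) -> List[str]:
    """Return the daily indices whose date is older than `keep_days` days.

    Assumes index names look like "<prefix>YYYY.MM.DD"; anything else is ignored.
    """
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in names:
        if not name.startswith(prefix):
            continue
        try:
            day = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        except ValueError:
            # Name has the prefix but no parseable date; leave it alone.
            continue
        if day < cutoff:
            expired.append(name)
    return sorted(expired)


if __name__ == "__main__":
    existing = ["logs-2024.01.01", "logs-2024.01.09", "logs-2024.01.10"]
    print(expired_indices(existing, today=date(2024, 1, 10), keep_days=7))
```

Each returned name could then be removed with a `DELETE /<index-name>` request through the REST API, which is the kind of housekeeping Curator automates.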
I have several GB of data to transfer to ES. I know of one way: first dump the data from Redshift to S3, and then load it into ES.
Is there any other way?
You can check this Elastic Dump
For reference: how to move elasticsearch data from one server to another
I started looking into ElasticSearch, and most examples of creating and reading involve POSTing data to the ElasticSearch server and then doing a GET to retrieve them.
Is the data that is POSTed stored separately by the ElasticSearch server? So, if I want to use ElasticSearch with MongoDB, does the raw data, not including the search indices, get stored twice (one copy for MongoDB and one for ElasticSearch)?
In conjunction with an answer to this question, a description or a link to a description of how ElasticSearch and the primary data store interact would be very helpful.
Yes, ElasticSearch can only search within its own data store, so a separate copy will be there.
You can use the mongodb connector to keep the data in elastic in sync with the mongo database: https://github.com/mongodb-labs/mongo-connector
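To show what "a separate copy" means in practice, here is a minimal sketch of turning one MongoDB document into the document that would be stored in Elasticsearch. This is not how mongo-connector is implemented; the function name `mongo_doc_to_action` and the index name `articles` are made up for the example. The one detail it illustrates is that `_id` is a metadata field in Elasticsearch, so it is passed as the document id rather than kept inside the document body.

```python
from typing import Any, Dict


def mongo_doc_to_action(doc: Dict[str, Any], index: str) -> Dict[str, Any]:
    """Build an indexing action for one MongoDB document.

    The Mongo `_id` becomes the Elasticsearch document id (as a string), and
    the remaining fields are copied into `_source`. The input is not modified.
    """
    source = {key: value for key, value in doc.items() if key != "_id"}
    return {"_index": index, "_id": str(doc["_id"]), "_source": source}


if __name__ == "__main__":
    # Hypothetical document; a real one would usually carry an ObjectId as `_id`.
    mongo_doc = {"_id": "abc123", "title": "hello", "tags": ["a", "b"]}
    print(mongo_doc_to_action(mongo_doc, "articles"))
```

Using the same id on both sides means an update in MongoDB can overwrite the matching Elasticsearch document instead of adding a second one.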