Selecting elasticsearch memory storage - elasticsearch

I need to know which setting I have to set to select an on-heap or off-heap memory index. It seems that index.store.type=memory stores index data off-heap, but I need to store my data on-heap.
I looked at the documentation and wasn't able to find this setting.
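For reference, this is roughly how I am applying the setting when creating the index (the index name here is just a placeholder):

curl -XPUT 'http://localhost:9200/myindex' -d '{
  "settings": {
    "index.store.type": "memory"
  }
}'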
Thanks,
Joan.

Related

How to configure Elasticsearch ILM rollover to create indexes with a date?

My data source writes to the index MyIndex-%{+YYYY.MM.dd.HH.mm}, but the data indexed each day is too big.
I need rollover to create a new index whenever the data exceeds 10 GB.
For example
MyIndex-2022.12.23-1 size 10GB
MyIndex-2022.12.23-2 size 10GB
MyIndex-2022.12.23-3 size 10GB
...
MyIndex-2022.12.24-1 size 10GB
MyIndex-2022.12.24-2 size 10GB
...
MyIndex-2022.12.25-1 size 10GB
etc.
Can someone help me? I am using Logstash to put data into Elasticsearch.
Do you have a Kibana instance?
If so see this article:
https://www.elastic.co/guide/en/elasticsearch/reference/8.5/index-lifecycle-management.html
You need to create an index template matching the pattern of your index.
Then create an ILM policy in Stack Management. There you should be able to set the shard and index sizes for rollover; just open the advanced options in the hot phase.
See here:
https://www.elastic.co/guide/en/elasticsearch/reference/8.5/getting-started-index-lifecycle-management.html
You don't need to change anything in Logstash for that.
If you don't have Kibana, you need to use the APIs and some REST calls to add the policy.
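For example, the policy and a matching template could be added with REST calls roughly like these (a sketch only; the policy name, alias and the lowercase index pattern are assumptions, and note that Elasticsearch index names must be lowercase anyway):

# ILM policy that rolls over in the hot phase once a primary shard reaches ~10 GB
curl -XPUT 'http://localhost:9200/_ilm/policy/myindex-policy' -H 'Content-Type: application/json' -d '{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_primary_shard_size": "10gb" }
        }
      }
    }
  }
}'

# index template that attaches the policy and the rollover alias to matching indices
curl -XPUT 'http://localhost:9200/_index_template/myindex-template' -H 'Content-Type: application/json' -d '{
  "index_patterns": ["myindex-*"],
  "template": {
    "settings": {
      "index.lifecycle.name": "myindex-policy",
      "index.lifecycle.rollover_alias": "myindex"
    }
  }
}'

# bootstrap the first managed index with the write alias so rollover has something to roll
curl -XPUT 'http://localhost:9200/myindex-000001' -H 'Content-Type: application/json' -d '{
  "aliases": { "myindex": { "is_write_index": true } }
}'

With that in place, writes go through the myindex alias and rollover creates myindex-000002, myindex-000003 and so on once the size threshold is reached.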
Hope that helps!

What kind of persistent data does Kibana store in path.data?

There is a Kibana configuration called path.data that says:
The path where Kibana stores persistent data not saved in Elasticsearch.
What kind of persistent data is stored there?
As far as I am aware, Kibana stores most of its information in Elasticsearch under its index (.kibana by default).
There isn't much documentation on this; however, I could find an Elasticsearch team member's response to another question, stating:
You're correct that all Kibana saved objects are stored in Elasticsearch, in the .kibana index. It doesn't write anything to the filesystem (save for maybe some temporary files, but even that I'm pretty sure doesn't happen).
Therefore, I would say that only temporary information is stored in path.data, and all the relevant information (whether for persistence or for monitoring) is stored under the .kibana index.
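One quick way to check is to look at the directory on a running instance (the path below is the default for package installs and may differ in your setup):

ls -la /var/lib/kibana

Typically it holds little more than the instance uuid file and the occasional temporary artifact, which fits the interpretation above.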
Can someone else confirm this?

Memory consumed by persistent index

We are using ArangoDB 3.1.3 for our project and we have created a collection with 1 GB of data.
When we uploaded the data without creating a persistent index for the attributes in the documents, the memory consumed by the indexes, as shown in the web console, was 225.4 MB.
When we uploaded the data after creating a persistent index for one of the attributes present in all the documents, the memory size was still the same. We assumed that the persistent index would consume more memory, but it did not.
How should we measure memory usage in ArangoDB, especially index memory?
I believe you can get the index sizes through arangosh, as in:
db._collection("collectionName").figures()
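The same numbers are also exposed over ArangoDB's HTTP API, which can be easier to script (host, port, database and collection name below are placeholders):

curl 'http://localhost:8529/_db/_system/_api/collection/collectionName/figures'

The indexes.size value in the response should correspond to the index memory shown in the web console.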
There's another SO question similar to this, but I can't seem to find it now.

Is Elasticsearch suitable as a final storage solution?

I'm currently learning Elasticsearch, and I have noticed that a lot of operations for modifying indices require reindexing all documents, such as adding a field to all documents, which from my understanding means retrieving each document, performing the desired operation, deleting the original document from the index and reindexing it. This seems somewhat dangerous, and a backup of the original index seems preferable before performing this (obviously).
This made me wonder whether Elasticsearch is actually suitable as a final storage solution at all, or whether I should keep the raw documents that make up an index stored separately, so that I can recreate the index from scratch if necessary. Or is a regular backup of the index safe enough?
You are talking about two issues here:
Deleting old documents and re-indexing on schema change: you don't always have to delete old documents when you add new fields. There are various options for changing the schema. Have a look at this blog, which explains how to change the schema without any downtime.
http://www.elasticsearch.org/blog/changing-mapping-with-zero-downtime/
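The core trick in that post is to read and write through an alias, so a new index with the changed mapping can be filled in the background and then swapped in atomically, roughly like this (index and alias names are made up):

curl -XPOST 'http://localhost:9200/_aliases' -H 'Content-Type: application/json' -d '{
  "actions": [
    { "remove": { "index": "myindex_v1", "alias": "myindex" } },
    { "add":    { "index": "myindex_v2", "alias": "myindex" } }
  ]
}'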
Also, look at the Update API which gives you the ability to add/remove fields.
The update API allows you to update a document based on a provided script. The operation gets the document (collocated with the shard) from the index, runs the script (with optional script language and parameters), and indexes back the result (it also allows you to delete, or ignore the operation). It uses versioning to make sure no updates have happened between the "get" and the "reindex".
Note that this operation still means a full reindex of the document; it just removes some network roundtrips and reduces the chance of version conflicts between the get and the index. The _source field needs to be enabled for this feature to work.
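A minimal sketch of such a scripted partial update (the URL form varies between Elasticsearch versions; the index, document id and field names here are made up):

curl -XPOST 'http://localhost:9200/myindex/_update/1' -H 'Content-Type: application/json' -d '{
  "script": {
    "source": "ctx._source.new_field = params.value",
    "params": { "value": "some default" }
  }
}'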
Using Elasticsearch as a final storage solution: it depends on how you intend to use Elasticsearch as storage. Do you need an RDBMS, a key-value store, a column-based datastore, or a document store like MongoDB? Elasticsearch is definitely well suited when you need a distributed document store (JSON, HTML, XML, etc.) with Lucene-based advanced search capabilities. Have a look at the various use cases for ES, especially the usage at The Guardian: http://www.elasticsearch.org/case-study/guardian/
I'm pretty sure that search engines shouldn't be viewed as a storage solution, because of the nature of these applications. I've never heard of backing up a search engine's index as a common practice.
The usual setup when you use Elasticsearch, Solr, or whatever search engine you have:
You have some kind of datasource (it could be a database, a legacy mainframe, Excel sheets, some REST service with data, or whatever).
You have a search engine that indexes this datasource to add search capability to your system. When the datasource changes, you can reindex it completely, or index only the changed part with the help of incremental indexing.
If something happens to the search engine's index, you can easily reindex all your data.

How can I copy Hadoop data to Solr

I have a Solr search which uses a Lucene index as a backend.
I also have some data in Hadoop that I would like to use.
How do I copy this data into Solr?
Upon googling, the only links I can find tell me how to use an HDFS index instead of a local index in Solr.
I don't want to read the index directly from Hadoop; I want to copy the data to Solr and read it from there.
How do I copy it? And it would be great if there were some incremental copy mechanism.
If you have a standalone Solr instance, you could face some scaling issues, depending on the volume of data.
I am assuming a high volume, given that you are using Hadoop/HDFS; in that case, you might need to look at SolrCloud.
As for reading from HDFS, here is a tutorial from LucidImagination that addresses this issue and recommends the use of Behemoth.
You might also want to look at the Katta project, which claims to integrate with Hadoop and provide near-real-time read access to large datasets. The architecture is illustrated here.
EDIT 1
Solr has an open ticket for this. Support for HDFS is scheduled for Solr 4.9. You can apply the patch if you feel like it.
You cannot just copy custom data to Solr; you need to index* it. Your data may be of any type and format (free text, XML, JSON or even binary data). To use it with Solr, you need to create documents (flat maps with key/value pairs as fields) and add them to Solr. Take a look at this simple curl-based example.
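For instance, JSON documents can be pushed to Solr's update handler along these lines (a sketch; the collection name and field names are made up and must match your schema):

curl 'http://localhost:8983/solr/mycollection/update?commit=true' \
  -H 'Content-Type: application/json' \
  -d '[
    { "id": "1", "title_s": "first record exported from HDFS" },
    { "id": "2", "title_s": "second record exported from HDFS" }
  ]'

A Hadoop job (or a simple script reading from HDFS) just needs to produce batches of such documents and POST them.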
Note that reading data from HDFS is a different question. For Solr, it doesn't matter where you read the data from, as long as you provide it with documents.
Storing the index on local disk or in HDFS is also a different question. If you expect your index to be really large, you can configure Solr to use HDFS; otherwise you can use the default properties and local disk.
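For the HDFS-backed index case, the Solr reference guide shows starting a node with system properties along these lines (a sketch; the namenode address and paths are placeholders, and the property names should be checked against your Solr version):

bin/solr start -Dsolr.directoryFactory=HdfsDirectoryFactory \
  -Dsolr.lock.type=hdfs \
  -Dsolr.data.dir=hdfs://namenode:8020/solr/data \
  -Dsolr.updatelog=hdfs://namenode:8020/solr/ulog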
* - "Indexing" is a common term for adding documents to Solr, but in fact adding documents to Solr's internal storage and indexing them (making fields searchable) are two distinct things and can be configured separately.
