Memory consumed by persistent index - memory-management

We are using ArangoDB 3.1.3 for our project and we have created a collection with 1 GB of data.
When we uploaded the data without creating a persistent index for the attributes in the documents, the memory consumed by the indexes, as shown in the web console, was 225.4 MB.
When we uploaded the data after creating a persistent index on one of the attributes, which is present in all the documents, the memory size was still the same. We assumed that the persistent index would consume more memory, but it did not.
How should we measure memory usage in ArangoDB, especially index memory?

I believe you can get the index sizes through arangosh, as in:
db.<collectionName>.figures()
There's another SO question similar to this, but I can't seem to find it now.
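In case it helps, the same figures are also exposed over the HTTP API; here is a minimal sketch, assuming a default local server, with the credentials and the collection name "mycollection" purely illustrative:
# Fetch the collection figures over ArangoDB's HTTP API (3.1 uses the MMFiles engine).
# Host, database, user/password and the collection name are illustrative.
curl -u root:openSesame "http://localhost:8529/_db/_system/_api/collection/mycollection/figures"
# In the response, figures.indexes.size is the memory used by the collection's
# indexes in bytes, and figures.indexes.count is the number of indexes.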

Related

Indexing 7TB of data with Elasticsearch. FSCrawler stops after some time

I am using FSCrawler to create an index of data above 7TB. The indexing starts fine but then stops when the index size gets to 2.6 GB. I believe this is a memory issue; how do I configure the memory?
My machine has 40 GB of memory and I have assigned 12 GB to Elasticsearch.
You might also have to assign enough memory to FSCrawler itself, using FS_JAVA_OPTS. For example:
FS_JAVA_OPTS="-Xmx4g -Xms4g" bin/fscrawler
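For what it's worth, the Elasticsearch heap mentioned in the question is configured separately; a sketch, assuming Elasticsearch 5.x or later (where ES_JAVA_OPTS is honoured), with illustrative values:
# Give Elasticsearch an explicit heap before starting it
# (equivalently, edit the -Xms/-Xmx lines in config/jvm.options).
ES_JAVA_OPTS="-Xms12g -Xmx12g" bin/elasticsearch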

Elasticsearch reindex store sizes vary greatly

I am running Elasticsearch 6.2.4. I have a program that will automatically create an index for me as well as the mappings necessary for my data. For this issue, I created an index called "landsat" but it needs to actually be named "landsat_8", so I chose to reindex. The original "landsat" index has 2 shards and 0 read replicas. The store size is ~13.4gb with ~6.6gb per shard and the index holds just over 515k documents.
I created a new index called "landsat_8" with 5 shards, 1 read replica, and started a reindex with no special options. On a very small Elastic Cloud cluster (4GB RAM), it finished in 8 minutes. It was interesting to see that the final store size was only 4.2gb, yet it still held all 515k documents.
After it was finished, I realized that I had failed to create my mappings before reindexing, so I blew it away and started over. I was shocked to find that after an hour, the _cat/indices endpoint showed that only 7.5 GB of data and 154,800 documents had been reindexed. Four hours later, the entire job seemed to have died at 13.1 GB, but it showed that only 254,000 documents had been reindexed.
On this small 4 GB cluster, the reindex operation was maxing out the CPU. I increased the cluster to the biggest one Elastic Cloud offered (64 GB RAM), kept 5 shards and 0 replicas, and started the job again. This time, I set the refresh_interval on the new index to -1 and changed the batch size for the reindex operation to 2000. Long story short, the job finished somewhere between 1h10m and 1h19m. However, this time I ended up with a total store size of 25 GB, where each shard held ~5 GB.
I'm very confused as to why the reindex operation causes such wildly different results in store size and reindex performance. Why, when I don't explicitly define any mappings and let ES automatically create mappings, is the store size so much smaller? And why, when I use the exact same mappings as the original index, is the store so much bigger?
Any advice would be greatly appreciated. Thank you!
UPDATE 1:
Here are the only differences in the mappings (compared side by side for "landsat" and "landsat_8"):
There is a root-level "type" field and a nested "properties.type" field in the original "landsat" index. I forgot that one of my goals was to remove the field "properties.type" from the data during the reindex. I seem to have been successful in doing so, but at the same time I accidentally renamed the root-level "type" field mapping to "provider", so "landsat_8" has an unused "provider" mapping and an auto-created "type" mapping.
So there are some problems here, but I wouldn't think this would nearly double my store size...
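For reference, the reindex settings described above look roughly like this against the 6.2 API (the host is illustrative; the index names, the batch size of 2000 and the refresh_interval of -1 are the ones from the question, with the default restored afterwards):
# Disable refresh on the target index while bulk-loading.
curl -XPUT 'http://localhost:9200/landsat_8/_settings' -H 'Content-Type: application/json' -d '
{ "index": { "refresh_interval": "-1" } }'
# Reindex, pulling 2000 documents per batch from the source.
curl -XPOST 'http://localhost:9200/_reindex' -H 'Content-Type: application/json' -d '
{
  "source": { "index": "landsat", "size": 2000 },
  "dest":   { "index": "landsat_8" }
}'
# Re-enable refresh once the job is done.
curl -XPUT 'http://localhost:9200/landsat_8/_settings' -H 'Content-Type: application/json' -d '
{ "index": { "refresh_interval": "1s" } }'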

Solr Performance Issues (Caching/RAM usage)

We are using Solr 5.2 (on Windows Server 2012 / JDK 1.8) for document content indexing and querying. We found that querying slows down intermittently under load.
In our analysis we found the following two issues.
Solr is not effectively using caches
Whenever a new document is indexed, Solr opens a new searcher and the caches become invalid (as they were associated with the old index searcher). In our scenario, new documents are indexed very frequently (at least 10 documents per minute), so the caches are effectively useless, because a new searcher is opened frequently to make new documents available for searching. How can we improve cache usage?
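Not from the original post, but two standard levers for this situation are cache autowarming and soft commits; a sketch using the Solr Config API (available since 5.0), with an illustrative core name "mycore" and illustrative values:
# Autowarm: when a new searcher opens, repopulate the filter cache with the
# most recently used entries from the old searcher.
curl 'http://localhost:8983/solr/mycore/config' -H 'Content-Type: application/json' -d '
{ "set-property": { "query.filterCache.autowarmCount": 128 } }'
# Control how often new documents become visible (and therefore how often a new
# searcher is opened) via autoSoftCommit, e.g. at most once a minute.
curl 'http://localhost:8983/solr/mycore/config' -H 'Content-Type: application/json' -d '
{ "set-property": { "updateHandler.autoSoftCommit.maxTime": 60000 } }'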
RAM is not utilized
We observed that Solr is using only 1-2 GB of heap even though we have assigned 50 GB. It seems like it is not loading the index into RAM, which leads to high I/O. Is it possible to configure Solr to fully load indexes into memory? We can't find any documentation about this.

Selecting elasticsearch memory storage

I need to know which setting to use to select an on-heap or off-heap memory index. It seems that index.store.type=memory stores index data off-heap, but I need to store my data on-heap.
I looked at the documentation and wasn't able to find this setting.
Thanks,
Joan.
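For context, in the Elasticsearch releases that still had it (1.x; the in-memory store type was removed in 2.0), index.store.type was an index-level setting, set for example at index creation; the host and index name below are illustrative:
# Create an index whose store type is the "memory" store mentioned in the question.
curl -XPUT 'http://localhost:9200/my_index' -d '
{ "settings": { "index.store.type": "memory" } }'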

How to free up unused space after deleting documents in ElasticSearch?

When deleting records in Elasticsearch, I heard that the disk space is not freed up. So if I only want to keep a rolling three months of documents in a type, how do I ensure that the disk space is reused?
The system will naturally re-use the space that is freed up as it needs to, provided the files have been marked as such by Elasticsearch.
However, Elasticsearch goes through a series of stages when retiring data, and even 'retiring' the data will not remove it from the system, only hide it away.
This command should do what you need:
DELETE /<index_name>
See here for more information: https://www.elastic.co/guide/en/elasticsearch/guide/current/retiring-data.html
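Concretely, for the rolling three-month case the linked guide's approach is to use time-based indices and drop old ones whole; a sketch with an illustrative host and index name:
# Deleting an entire old index frees its disk space immediately, unlike deleting
# individual documents, which are only marked as deleted until segments are merged.
curl -XDELETE 'http://localhost:9200/logs-2016-01'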
