As we know, a new document written into Elasticsearch usually becomes visible after about 1 second, because Elasticsearch makes data visible only after it has been moved from the in-memory buffer into segments.
However, since the newly added data is already stored in the in-memory cache, why not take it into consideration directly, without waiting for it to be moved to segments?
The official documentation explains pretty well why it's designed the way it is: https://www.elastic.co/guide/en/elasticsearch/reference/8.6/near-real-time.html
Sitting between Elasticsearch and the disk is the filesystem cache. Documents in the in-memory indexing buffer (Figure 1) are written to a new segment (Figure 2). The new segment is written to the filesystem cache first (which is cheap) and only later is it flushed to disk (which is expensive). However, after a file is in the cache, it can be opened and read just like any other file.
So, between the moment a document to be indexed arrives in the Elasticsearch in-memory indexing buffer (i.e. inside the JVM heap) and the moment it is written into a segment on the physical disk, it transits through the filesystem cache (i.e. the remaining 50% of the physical RAM), where it is already searchable.
The transition from the indexing buffer to the filesystem cache is carried out by the refresh operation, which happens every second by default, hence "near real time". Moving the data from the filesystem cache to the disk then requires a Lucene commit, which is a much more expensive operation and is performed less frequently.
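To make the distinction concrete, here is a minimal Lucene sketch (not from the original answer; it uses a RAM-backed directory and a made-up field name) showing how a near-real-time reader corresponds to a refresh, while a commit corresponds to the expensive flush to durable storage:

```java
// A minimal sketch, assuming Lucene 8+; "body" is a made-up field name.
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class NrtRefreshDemo {
    public static void main(String[] args) throws Exception {
        Directory dir = new ByteBuffersDirectory();
        IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()));

        Document doc = new Document();
        doc.add(new TextField("body", "hello world", Field.Store.YES));
        writer.addDocument(doc);

        // "Refresh": open a near-real-time reader on the writer. The new segment is
        // only in memory / filesystem cache at this point, yet already searchable.
        try (DirectoryReader nrtReader = DirectoryReader.open(writer)) {
            IndexSearcher searcher = new IndexSearcher(nrtReader);
            int hits = searcher.count(new TermQuery(new Term("body", "hello")));
            System.out.println("visible before any commit: " + hits); // 1
        }

        // "Flush": a Lucene commit fsyncs the segment files to durable storage.
        // Much more expensive, so it is done far less often than a refresh.
        writer.commit();
        writer.close();
    }
}
```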
Related
I want to use a Lucene MMapDirectory as a primary file store. Each file would be stored in a separate document as a byte array in a StoredField. All file properties that should be searchable, like file name, size etc., would be stored in indexable fields in the same document.
My questions would be:
What are the drawbacks of using Lucene directories for storing files, especially with regards to indexing and search performance and memory (RAM) consumption?
If this is not a "no-go", is there a better/faster way of storing files in the directory than as a byte array?
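For reference, a minimal sketch of the setup being described might look like this (the index path, file name and field names are all made up for illustration): the file's raw bytes go into a StoredField, and the searchable properties go into indexed fields of the same document.

```java
// A minimal sketch of the proposed setup; index path, file name and field names are made up.
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.LongPoint;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.MMapDirectory;

public class StoreFileAsDocument {
    public static void main(String[] args) throws Exception {
        Path file = Paths.get("report.pdf");        // hypothetical input file
        byte[] content = Files.readAllBytes(file);  // note: the whole file lands on the heap

        try (MMapDirectory dir = new MMapDirectory(Paths.get("file-index"));
             IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
            Document doc = new Document();
            // Searchable file properties go into indexed fields...
            doc.add(new StringField("name", file.getFileName().toString(), Field.Store.YES));
            doc.add(new LongPoint("size", content.length));
            // ...and the raw bytes go into a stored-only field.
            doc.add(new StoredField("content", content));
            writer.addDocument(doc);
            writer.commit();
        }
    }
}
```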
Short Answer
I really love Lucene and consider it one of the best open-source libraries out there, but I'm afraid it's not a good decision to use it as a primary file store, due to:
high CPU/memory overhead
slow index/query performance
high HDD utilization and doubled index size
weak recovery capabilities
Long Answer
Under the hood, Lucene uses the following files to keep all stored fields in one segment:
the fields index file (.fdx),
the fields data file (.fdt).
You can read more about how it works in Lucene50StoredFieldsFormat’s docs.
This means that in case of any I/O issue, it is almost impossible to restore any single file.
In order to return one file, Lucene has to read and decompress binary data from the disk block by block. This means high CPU overhead for decompression and a high memory footprint to keep the whole file in Java heap space. There is also no streaming available, unlike file and network storage (see the retrieval sketch below).
The maximum document size is limited by the codec implementation: 2 GB per document.
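A minimal retrieval sketch (assuming the same made-up field names as the storage sketch above) shows why: the stored-fields block is decompressed and the whole byte[] is materialized on the heap in one call, with no streaming API.

```java
// A minimal retrieval sketch; field names match the made-up ones from the storage sketch above.
import java.nio.file.Paths;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.MMapDirectory;
import org.apache.lucene.util.BytesRef;

public class ReadFileFromDocument {
    public static void main(String[] args) throws Exception {
        try (MMapDirectory dir = new MMapDirectory(Paths.get("file-index"));
             DirectoryReader reader = DirectoryReader.open(dir)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            TopDocs hits = searcher.search(new TermQuery(new Term("name", "report.pdf")), 1);
            if (hits.scoreDocs.length > 0) {
                // Loading the document decompresses its stored-fields block and
                // materializes the complete byte[] on the Java heap in one call.
                Document stored = searcher.doc(hits.scoreDocs[0].doc);
                BytesRef bytes = stored.getBinaryValue("content");
                System.out.println("file size: " + bytes.length + " bytes");
            }
        }
    }
}
```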
Lucene has a unique write-once segmented architecture: recently indexed documents are written to a new self-contained segment, in append-only, write-once fashion: once written, those segment files will never again change. This happens either when too much RAM is being used to hold recently indexed documents, or when you ask Lucene to refresh your searcher so you can search all recently indexed documents. Over time, smaller segments are merged away into bigger segments, and the index has a logarithmic "staircase" structure of active segment files at any time. This architecture becomes a big problem for file storage:
you cannot delete a file, only mark it as unavailable
the merge operation requires 2x disk space and consumes a lot of resources and disk throughput: it creates a new .fdt file and copies the content of other .fdt files through Java code and Java heap memory
So you won't be using just an MMapDirectory, but an actual Lucene index.
I have had good experiences using Lucene as the primary data store for some projects.
Just be sure to also include a generated or natural unique ID, because Lucene's internal document IDs are not constant or reliable (see the sketch below).
Also make sure you use a Directory implementation that fits your use case. I have switched to the normal RandomAccess implementation in the low-load case, since it uses less memory and is almost as fast.
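A minimal sketch of that advice (the "uuid" field name is made up): store your own stable identifier in an indexed StringField and always address documents through it, never through Lucene's internal docIDs, which get renumbered as segments merge.

```java
// A minimal sketch; the "uuid" field name is made up.
import java.util.UUID;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;

public class StableIds {
    // Attach an application-level unique ID so the document can be found and
    // replaced later, regardless of how Lucene renumbers its internal docIDs.
    static String addWithId(IndexWriter writer, Document doc) throws Exception {
        String id = UUID.randomUUID().toString();
        doc.add(new StringField("uuid", id, Field.Store.YES));
        writer.addDocument(doc);
        return id;
    }

    // Update (or delete) through the stable ID, never through an internal docID.
    static void replace(IndexWriter writer, String id, Document newDoc) throws Exception {
        newDoc.add(new StringField("uuid", id, Field.Store.YES));
        writer.updateDocument(new Term("uuid", id), newDoc);
    }
}
```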
I have one question about the following quote from the official ES docs:
But if you give all available memory to Elasticsearch’s heap, there won’t be any left over for Lucene. This can seriously impact the performance of full-text search.
My server has 80G of memory, and I issued the following command to start the ES node: bin/elasticsearch -Xmx30g
That means I only give the ES process a maximum of 30g of memory. How can Lucene use the remaining 50G, given that Lucene runs inside the ES process and is just a part of it?
The Xmx parameter simply indicates how much heap you allocate to the ES Java process. But allocating RAM to the heap is not the only way to use the available memory on a server.
Lucene does indeed run inside the ES process, but Lucene doesn't only make use of the allocated heap, it also uses memory by heavily leveraging the file system cache for managing index segment files.
There are two great blog posts (this one and this other one) from Lucene's main committer which explain in greater detail how Lucene leverages all the remaining available memory.
The bottom line is to allocate 30GB heap to the ES process (using -Xmx30g) and then Lucene will happily consume whatever is left to do what needs to be done.
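As a small illustrative sketch (not from the original answer): -Xmx only bounds the heap visible to the JVM, while the rest of the machine's RAM stays available to the operating system for the filesystem cache.

```java
// Illustrative only: compare the heap cap (-Xmx) with the machine's physical RAM.
import java.lang.management.ManagementFactory;
import com.sun.management.OperatingSystemMXBean; // HotSpot-specific extension

public class HeapVsPhysicalRam {
    public static void main(String[] args) {
        long heapMax = Runtime.getRuntime().maxMemory(); // bounded by -Xmx
        OperatingSystemMXBean os =
                (OperatingSystemMXBean) ManagementFactory.getOperatingSystemMXBean();
        long physical = os.getTotalPhysicalMemorySize();  // the whole machine
        System.out.printf("heap max: %d MB, physical RAM: %d MB%n",
                heapMax / (1024 * 1024), physical / (1024 * 1024));
        // Everything between those two numbers is what the OS can spend on
        // caching Lucene segment files.
    }
}
```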
Lucene uses off-heap memory via the OS. This is described in the Elasticsearch guide, in the section about heap sizing and swapping.
Lucene is designed to leverage the underlying OS for caching in-memory data structures. Lucene segments are stored in individual files. Because segments are immutable, these files never change. This makes them very cache friendly, and the underlying OS will happily keep hot segments resident in memory for faster access.
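A standalone sketch of the mechanism the guide describes (plain Java NIO rather than Elasticsearch code; the file name is a placeholder): memory-mapping an immutable segment-like file lets the OS page cache back the reads, and none of that memory counts against the JVM heap.

```java
// Standalone Java NIO sketch (not Elasticsearch code); "_0.cfs" is a placeholder file name.
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class MmapSketch {
    public static void main(String[] args) throws IOException {
        Path segment = Paths.get("_0.cfs"); // an immutable segment-like file
        try (FileChannel ch = FileChannel.open(segment, StandardOpenOption.READ)) {
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            // Touching the buffer pulls pages into the OS page cache; because the
            // file never changes, those cached pages stay valid ("cache friendly"),
            // and none of this memory counts against the JVM heap (-Xmx).
            System.out.println("first byte: " + map.get(0));
        }
    }
}
```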
I will be creating a 5-node MongoDB cluster. It will be more read-heavy than write-heavy, and I had a question about which design would bring better performance. These nodes will be dedicated to MongoDB only. For the sake of an example, say each node will have 64GB of RAM.
The MongoDB docs state:
MongoDB automatically uses all free memory on the machine as its cache
Does this mean as long as my data is smaller than the available ram it will be like having an in-memory database?
I also read that it is possible to run MongoDB purely in memory:
http://edgystuff.tumblr.com/post/49304254688/how-to-use-mongodb-as-a-pure-in-memory-db-redis
If my data is quite dynamic (it can range from 50GB to 75GB every few hours), would it theoretically perform better to let MongoDB manage itself with its cache (the default setup), or to put MongoDB into memory initially and, if the data grows beyond the size of RAM, use swap space (on SSD)?
MongoDB's default storage engine maps the data files into memory. This provides an efficient way to access the data while avoiding double caching (i.e. MongoDB's cache is actually the page cache of the OS).
Does this mean as long as my data is smaller than the available ram it will be like having an in-memory database?
For read traffic, yes. For write traffic, it is different, since MongoDB may have to journal the write operation (depending on the configuration) and maintain the oplog.
Is it better to run MongoDB from memory only (leveraging tmpfs)?
For read traffic, it should not be better. Putting the files on tmpfs will also avoid double caching (which is good), but the data can still be paged out. Using a regular filesystem instead will be as fast once the data have been paged in.
For write traffic, it is faster, provided the journal and oplog are also put on tmpfs. Note that in that case, a system crash will result in total data loss. Usually, the performance gain is not worth the risk.
It is mentioned here (http://www.couchbase.com/memcached) that Couchbase can be used as a caching layer. I am supposed to use the Community Edition for my caching layer. From what I have found on the Internet, many large-scale organizations use it for heavy workloads, but the size of their cached objects is around 1KB to 100KB. I want to know:
will there be performance drawbacks when large objects (1MB-10MB in size) are cached and replicated?
will data be synchronized/replicated among nodes as soon as it is updated?
does anyone have experience with this?
To answer your questions:
Will there be performance drawbacks when large objects (1MB-10MB in size) are cached and replicated?
Couchbase has a maximum document size of 20MB for Couchbase-type buckets. Depending on your settings, each document will need to be written both to disk and across the network to each replica node. Other than the actual disk/network bandwidth required for this, you shouldn't see any particular performance issues.
Will data be synchronized/replicated among nodes as soon as it is updated?
As documented in the Couchbase Admin Guide, data is queued to be replicated to replica nodes as soon as it is received by the master.
Couchbase automatically shards each Bucket into a number of vBuckets, and each vBucket is "owned" by just a single master node, so a client will normally only need to communicate with one node for a particular document; therefore replication time isn't relevant for consistency (it's mainly there to provide backup copies in the event of a node failure).
You may also want to look at the high level Architecture and Concepts of Couchbase to see how it all fits together.
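As a hedged sketch assuming the Couchbase Java SDK 3.x (the connection string, credentials, bucket and document are placeholders): the master node for a vBucket acknowledges the write and replication happens in the background, but a client that needs to wait for replication can request observe-based durability per operation.

```java
// Hedged sketch, assuming the Couchbase Java SDK 3.x; all names and credentials are placeholders.
import com.couchbase.client.java.Bucket;
import com.couchbase.client.java.Cluster;
import com.couchbase.client.java.Collection;
import com.couchbase.client.java.json.JsonObject;
import com.couchbase.client.java.kv.PersistTo;
import com.couchbase.client.java.kv.ReplicateTo;
import com.couchbase.client.java.kv.UpsertOptions;

public class CouchbaseReplicationDemo {
    public static void main(String[] args) {
        Cluster cluster = Cluster.connect("127.0.0.1", "user", "password");
        Bucket bucket = cluster.bucket("cache-bucket");
        Collection collection = bucket.defaultCollection();

        JsonObject doc = JsonObject.create().put("payload", "large cached value");

        // Default: the vBucket's master node acknowledges the write,
        // replication to replica nodes happens asynchronously.
        collection.upsert("item::1", doc);

        // Optional: block until at least one replica has received the mutation.
        collection.upsert("item::1", doc,
                UpsertOptions.upsertOptions().durability(PersistTo.NONE, ReplicateTo.ONE));

        cluster.disconnect();
    }
}
```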
I came to know that Cassandra uses Bloom filters for performance, and it stores this filter data in physical memory.
1) Where does Cassandra store these filters? (In heap memory?)
2) How much memory do these filters consume?
When running, the Bloom filters must be held in memory, since their whole purpose is to avoid disk IO.
However, each filter is saved to disk with the other files that make up each SSTable - see http://wiki.apache.org/cassandra/ArchitectureSSTable
The filters are typically a very small fraction of the data size, though the actual ratio seems to vary quite a bit. On the test node I have handy here, the biggest filter I can find is 3.3MB, which is for 1GB of data. For another 1.3GB data file, however, the filter is just 93KB...
If you are running Cassandra, you can check the size of your filters yourself by looking in the data directory for files named *-Filter.db
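For illustration only (this uses Guava's BloomFilter, not Cassandra's own implementation), here is a sketch of why the filters are small relative to the data they guard and why they must sit in memory: a membership check can answer "definitely not present" without any disk IO.

```java
// Illustrative only: Guava's BloomFilter, not Cassandra's own implementation.
import java.nio.charset.StandardCharsets;
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;

public class BloomFilterSketch {
    public static void main(String[] args) {
        // Sized for one million row keys at a 1% false-positive rate: only a few MB,
        // a small fraction of the data it guards, which is why it can live in memory.
        BloomFilter<CharSequence> filter = BloomFilter.create(
                Funnels.stringFunnel(StandardCharsets.UTF_8), 1_000_000, 0.01);
        filter.put("row-key-42");

        // "false" means definitely absent, so the SSTable can be skipped with no disk IO;
        // "true" may occasionally be a false positive, which only costs an extra read.
        System.out.println(filter.mightContain("row-key-42"));  // true
        System.out.println(filter.mightContain("missing-key")); // almost certainly false
    }
}
```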