These are the statistics for my Ehcache.
I have it configured to use only memory (no persistence, no overflow to disk).
cacheHits = 50
onDiskHits = 0
offHeapHits = 0
inMemoryHits = 50
misses = 1194
onDiskMisses = 0
offHeapMisses = 0
inMemoryMisses = 1138
size = 69
averageGetTime = 0.061597
evictionCount = 0
As you can see, misses is higher than onDiskMisses + offHeapMisses + inMemoryMisses. I do have statistics strategy set to best effort:
cache.setStatisticsAccuracy(Statistics.STATISTICS_ACCURACY_BEST_EFFORT)
But the hits add up, and the difference between misses is rather large. Is there a reason why the misses do not add up correctly?
This quesiton is similar to Ehcache misses count and hitrate statistics, but the answer attributes the differences to the multiple tiers. There is only one tier here.
I'm almost certain that you're seeing this because inMemoryMisses does not include misses due to expired elements. On a get if the value is stored, but expired then you will not see an inMemoryMiss recorded, but you will see a cache miss.
Related
I am using a standalone EHcache 3.10 with heap tier, and disk tier.
The heap is configured to contain 100 entries with no expirations.
Actually, the algorithm insert only 30 entries into the cache, but it perform many "updates" and "reads" of these 30 entries.
Before I insert a new entry to the cache, there is a check if the entry already exist.
Therefore when I check ehcache statistics, I expect to see only 30 - misses on the heap tier, and 30 misses on the disk tier. But,
Instead I get 9806 misses on the heap tier, and 9806 hits on the disk tier. (Meaning 9806 times ehcache did not found the entry on the heap, but instead it was found on the disk)
These numbers make no sense to me, since the heap tier should contain 100 entries, so, why there are so many misses?
Here is my configuration:
statisticsService = new DefaultStatisticsService();
// create temp dir
Path cacheDirectory = getCacheDirectory();
ResourcePoolsBuilder resourcePoolsBuilder =
ResourcePoolsBuilder.newResourcePoolsBuilder()
.heap(100, EntryUnit.ENTRIES)
.disk(Long.MAX_VALUE, MemoryUnit.B, true)
;
CacheConfigurationBuilder cacheConfigurationBuilder =
CacheConfigurationBuilder.newCacheConfigurationBuilder(
String.class, // The cache key
ARFileImpl.class, // The cache value
resourcePoolsBuilder)
.withExpiry(ExpiryPolicyBuilder.noExpiration()) // No expiration
.withResilienceStrategy(new ThrowingResilienceStrategy<>())
.withSizeOfMaxObjectGraph(100000);
// Create the cache manager
cacheManager =
CacheManagerBuilder.newCacheManagerBuilder()
// Set the persistent directory
.with(CacheManagerBuilder.persistence(cacheDirectory.toFile()))
.withCache(ARFILE_CACHE_NAME, cacheConfigurationBuilder)
.using(statisticsService)
.build(true);
Here is the result of the statistics:
Cache stats:
CacheExpirations: 0
CacheEvictions: 0
CacheGets: 6669684
CacheHits: 6669684
CacheMisses: 0
CacheHitPercentage: 100.0
CacheMissPercentage: 0.0
CachePuts: 10525
CacheRemovals: 0
Heap stats:
AllocatedByteSize: -1
Mappings: 30
Evictions: 0
Expirations: 0
Hits: 6659878
Misses: 9806
OccupiedByteSize: -1
Puts: 0
Removals: 0
Disk stats:
AllocatedByteSize: 22429696
Mappings: 30
Evictions: 0
Expirations: 0
Hits: 9806
Misses: 0
OccupiedByteSize: 9961952
Puts: 10525
Removals: 0
The reason for asking this question is that there are a lot of redundant disk reads, which result with performance degradation.
We want to increase the speed of bulk-load.
Now we used JAVA to bulk load documents to Elasticsearch. We planned to import 10m documents each document size is almost 8M. Now we only can import 400K documents each day/ 5 documents every second.
Our ES infrastructure is 3 master node with 4G ES_JAVA_OPTS(heap size) 2 data nodes and 2 client nodes with 2G memory. When I want to increase the speed of bulk-load, we will get over the heap size issue. we set up the es cluster on Kubernetes.
The I/O is below.
dd if=/dev/zero of=/data/tmp/test1.img bs=1G count=10 oflag=dsync
10737418240 bytes (11 GB) copied, 50.7528 s, 212 MB/s
dd if=/dev/zero of=/data/tmp/test2.img bs=512 count=100000 oflag=dsync
51200000 bytes (51 MB) copied, 336.107 s, 152 kB/s
Any advice for the improvement?
for (int x =0; x<200000;x++) {
BulkRequest bulkRequest = new BulkRequest();
for (int k = 0; k < 50; k++) {
Order order = generateOrder();
IndexRequest indexRequest = new IndexRequest("orderpot", "orderpot");
Object esDataMap = objectToMap(order);
String source = JSONObject.valueToString(esDataMap);
indexRequest.source(source, XContentType.JSON);
bulkRequest.add(indexRequest);
}
rhlclient.bulk(bulkRequest, RequestOptions.DEFAULT);
over heap size
Seems you need more memory for data node.10m documents with 8M each will cost a lot of memory.And you can reduce the memory of master node and add on data nodes, master node need less memory than data nodes, and if there is no more nodes, you can combine the client nodes with data nodes, more data nodes with share the pressure.
Some other advise:
1. disable refresh by setting index.refresh_interval to -1 and set index.number_of_replicas to 0, when indexing.
2. set a mapping for your index, do not use default mapping, for example:some fields can be integer no need to use long, some fields can be text but keyword will never be used, and some fields will only be used as text.
[tune-for-indexing-speed given by official][1]https://www.elastic.co/guide/en/elasticsearch/reference/master/tune-for-indexing-speed.html
I am starting a discussion, which I hope, will become one place to discuss data loading method using mutators Vs. loading using flat file via 'LOAD DATA INFILE'.
I have been baffled to get enormous performance gain using mutators (using batch size = 1000 or 10000 or 100K et cetera).
My project involved loading close to 400 million rows of social media data into HyperTable to be used for real time analytics. It took me close to 3 days to just load just 1 million row of data (code sample below). Each row is approximately 32 byte. So, in order to avoid taking 2-3 weeks to load this much data, I prepared a flat file with rows and used DATA LOAD INFILE method. Performance gain was amazing. Using this method, loading rate was 368336 cells/sec.
See below for actual snapshot of action:
hypertable> LOAD DATA INFILE "/data/tmp/users.dat" INTO TABLE users;
Loading 7,113,154,337 bytes of input data...
0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************
Load complete.
Elapsed time: 508.07 s
Avg key size: 8.92 bytes
Total cells: 218976067
Throughput: 430998.80 cells/s
Resends: 2210404
hypertable> LOAD DATA INFILE "/data/tmp/graph.dat" INTO TABLE graph;
Loading 12,693,476,187 bytes of input data...
0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************
Load complete.
Elapsed time: 1189.71 s
Avg key size: 17.48 bytes
Total cells: 437952134
Throughput: 368118.13 cells/s
Resends: 1483209
Why is performance difference between 2 method is so vast? What's the best way to enhance mutator performance. Sample mutator code is below:
my $batch_size = 1000000; # or 1000 or 10000 make no substantial difference
my $ignore_unknown_cfs = 2;
my $ht = new Hypertable::ThriftClient($master, $port);
my $ns = $ht->namespace_open($namespace);
my $users_mutator = $ht->mutator_open($ns, 'users', $ignore_unknown_cfs, 10);
my $graph_mutator = $ht->mutator_open($ns, 'graph', $ignore_unknown_cfs, 10);
my $keys = new Hypertable::ThriftGen::Key({ row => $row, column_family => $cf, column_qualifier => $cq });
my $cell = new Hypertable::ThriftGen::Cell({key => $keys, value => $val});
$ht->mutator_set_cell($mutator, $cell);
$ht->mutator_flush($mutator);
I would appreciate any input on this? I don't have tremendous amount of HyperTable experience.
Thanks.
If it's taking three days to load one million rows, then you're probably calling flush() after every row insert, which is not the right thing to do. Before I describe hot to fix that, your mutator_open() arguments aren't quite right. You don't need to specify ignore_unknown_cfs and you should supply 0 for the flush_interval, something like this:
my $users_mutator = $ht->mutator_open($ns, 'users', 0, 0);
my $graph_mutator = $ht->mutator_open($ns, 'graph', 0, 0);
You should only call mutator_flush() if you would like to checkpoint how much of the input data has been consumed. A successful call to mutator_flush() means that all data that has been inserted on that mutator has durably made it into the database. If you're not checkpointing how much of the input data has been consumed, then there is no need to call mutator_flush(), since it will get flushed automatically when you close the mutator.
The next performance problem with your code that I see is that you're using mutator_set_cell(). You should use either mutator_set_cells() or mutator_set_cells_as_arrays() since each method call is a round-trip to the ThriftBroker, which is expensive. By using the mutator_set_cells_* methods, you amortize that round-trip over many cells. The mutator_set_cells_as_arrays() method can be more efficient for languages where object construction overhead is large in comparison to native datatypes (e.g. string). I'm not sure about Perl, but you might want to give that a try to see if it boosts performance.
Also, be sure to call mutator_close() when you're finished with the mutator.
I dont seem to have any caching enabled when checking in Opscenter or cfstats. Im running Cassandra 1.1.7 with Solandra on Debian. I have set the required global options in cassandra.yaml:
key_cache_size_in_mb: 800
key_cache_save_period: 14400
row_cache_size_in_mb: 800
row_cache_save_period: 15400
row_cache_provider: SerializingCacheProvider
Column Families were created as follows:
create column family example
with column_type = 'Standard'
and comparator = 'BytesType'
and default_validation_class = 'BytesType'
and key_validation_class = 'BytesType'
and read_repair_chance = 1.0
and dclocal_read_repair_chance = 0.0
and gc_grace = 864000
and min_compaction_threshold = 4
and max_compaction_threshold = 32
and replicate_on_write = true
and compaction_strategy = 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'
and caching = 'ALL';
Opscenter shows no data available on caching graphs and CFSTATS doesn't show any cache related fields:
Column Family: charsets
SSTable count: 1
Space used (live): 5558
Space used (total): 5558
Number of Keys (estimate): 128
Memtable Columns Count: 0
Memtable Data Size: 0
Memtable Switch Count: 0
Read Count: 61381
Read Latency: 0.123 ms.
Write Count: 0
Write Latency: NaN ms.
Pending Tasks: 0
Bloom Filter False Postives: 0
Bloom Filter False Ratio: 0.00000
Bloom Filter Space Used: 16
Compacted row minimum size: 1917
Compacted row maximum size: 2299
Compacted row mean size: 2299
Any help or suggestions are appreciated.
Sam
The caching stats have been moved from cfstats to info in Cassandra 1.1. If you run nodetool info you should see something like:
Key Cache : size 5552 (bytes), capacity 838860800 (bytes), 38 hits, 47 requests, 0.809 recent hit rate, 14400 save period in seconds
Row Cache : size 0 (bytes), capacity 838860800 (bytes), 0 hits, 0 requests, NaN recent hit rate, 15400 save period in seconds
This is because there are now global caches, rather than per-CF. It seems that Opscenter needs updating for this change - maybe there is a later version available that will work.
I upgraded my pc from 2.1gh and 2gb ram to dual coure 2.6gh processor and 4GB RAM, magento runs faster but I am still not happy with it (takes 4-6 seconds to open a page).
My memory usage is around 40% alltogether.
Would upgrating to 8GB RAM speed up my magento locally?
I would say, by itself, No. The fact that you are sharing resources on a local machine between MySQL and PHP with Magento is inherently slow within itself. Will you get more throughput? Probably, but not enough to notice.
You will get more of a performance gain by installing Varnish, and enabling Magento full page caching AFTER you install more RAM. Magento cache stores itself in the RAM and so does Varnish. Also make sure you have APC cache installed. Those three COMBINED with more RAM will make all the difference in the world.
For Varnish .. Give it about 1GB RAM in the VCL settings .. Sounds like a lot, but it'll save your life.
For APC, give it at least 256MB of room in the APC settings ... It would probably behove you to do 512MB if you can afford it.
I am also going to include my PHP.INI magento optimized settings as well as my MySQL settings:
PHP.INI
max_execution_time = 18000
max_input_time = 60
memory_limit = 1024M
max_input_vars = 10000
post_max_size = 102M
upload_max_filesize =100 M
max_file_uploads = 20
default_socket_timeout = 60
pdo_mysql.cache_size = 2000
mysql.cache_size = 2000
mysqli.cache_size = 2000
apc.enabled = 1
apc.shm_segments = 1
apc.shm_size = 1024M
apc.num_files_hint = 10000
apc.user_entries_hint = 10000
apc.ttl = 0
apc.user_ttl = 0
apc.gc_ttl = 3600
apc.cache_by_default = 1
apc.filters = "apc\.php$"
apc.mmap_file_mask = "/tmp/apc.XXXXXX"
apc.slam_defense = 0
apc.file_update_protection = 2
apc.enable_cli = 0
apc.max_file_size = 10M
apc.use_request_time = 1
apc.stat = 1
apc.write_lock = 1
apc.report_autofilter = 0
apc.include_once_override = 0
apc.localcache = 0
apc.localcache.size = 256M
apc.coredump_unmap = 0
apc.stat_ctime = 0
apc.canonicalize = 1
apc.lazy_functions = 1
apc.lazy_classes = 1
And MySQL
MY.CNF
key_buffer = 256M
max_allowed_packet = 16M
thread_stack = 192K
thread_cache_size = 32
myisam-recover = BACKUP
max_connections = 2500
query_cache_limit = 2M
query_cache_size = 64M
expire_logs_days = 10
max_binlog_size = 100M
[mysqldump]
quick
quote-names
max_allowed_packet = 16M
[isamchk]
key_buffer = 16M
I hope that helps you
If your memory usage is 40% now, then no. Sufficient RAM does make a difference, but in this case the extra 4 wouldn't make much of a difference.
Magento is quite slow due do its complexity and the fact that it uses thousands of files.
To increase Magento's load speed, try to disable stuff you don't need in the admin section, and perhaps Google for other tips. Also, the load speed will differ between different browsers.