Hazelcast IMap.values() giving OutOfMemoryError on Tomcat - Spring

I'm still trying to get to know Hazelcast and have to decide whether to use it or not.
I wrote a simple application in which I start up the cache on (single-node) server startup and load the map with about 400 entries at the same time. The object itself has two String fields. I have a service class that looks up the cache and tries to get all the values from the map.
However, I'm getting an OutOfMemoryError on Java heap space while trying to get the values out of the Hazelcast map. Eventually we plan to start with a 5-node cluster.
Following is the Spring config for the cache:
<hz:hazelcast id="instance">
    <hz:config>
        <hz:group name="dev" password=""/>
        <hz:properties>
            <hz:property name="hazelcast.merge.first.run.delay.seconds">5</hz:property>
            <hz:property name="hazelcast.merge.next.run.delay.seconds">5</hz:property>
        </hz:properties>
        <hz:network port="5701" port-auto-increment="false">
            <hz:join>
                <hz:multicast enabled="true" />
            </hz:join>
        </hz:network>
    </hz:config>
</hz:hazelcast>
<hz:map instance-ref="instance" id="statusMap" name="statuses" />
Following is where the error occurs:
map = instance.getMap("statuses");
Set<Status> statuses = (Set<Status>) map.values();
return statuses;
Any other method of IMap works fine. I tried getting the keySet and the size and both worked fine. It is only when I try to get the values that the OutOfMemoryError shows up.
java.lang.OutOfMemoryError: Java heap space
I've tried the above with a standalone Java application and it works fine. I've also monitored with VisualVM and don't see any spike in used heap memory when the error occurs, which is all the more confusing. The available heap is 1 GB and the used heap was about 70 MB when the error occurred.
However, when I take the cache implementation out of the application, it works fine going to the database and getting the data.
I've also tried playing around with the Tomcat VM args, with no success. It is the same OutOfMemoryError when I access IMap.values() with or without a SqlPredicate. Any help or direction in this matter will be greatly appreciated.
Thanks.

As the exception says, you're running out of heap space because the values() method tries to return all deserialized values at once. If they don't fit into memory, you're likely to get an OOME.
You can use paging to prevent this from happening: http://hazelcast.org/docs/latest/manual/html-single/hazelcast-documentation.html#paging-predicate-order-limit-
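For illustration, here is a minimal sketch of paging over the values of the "statuses" map from the question. It assumes Hazelcast 3.x, a String key, and that Status is the map's value type; if Status has no natural ordering, pass a Comparator via new PagingPredicate(comparator, 50).
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IMap;
import com.hazelcast.query.PagingPredicate;

import java.util.Collection;

public class StatusPager {

    // Iterates over the "statuses" map 50 entries at a time instead of
    // pulling every value into memory with a single values() call.
    public static void printAllStatuses(HazelcastInstance instance) {
        IMap<String, Status> map = instance.getMap("statuses");
        PagingPredicate pagingPredicate = new PagingPredicate(50);

        Collection<Status> page = map.values(pagingPredicate);
        while (!page.isEmpty()) {
            for (Status status : page) {
                System.out.println(status);
            }
            pagingPredicate.nextPage();          // move to the next page
            page = map.values(pagingPredicate);  // fetch it
        }
    }
}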

How big are your 400 entries?
And as Chris said, all of the data is being pulled into memory.
In the future we'll replace this with an iteration-based approach where we only pull small chunks into memory instead of the whole thing.

I figured out the issue. The Status object was implementing com.hazelcast.nio.serialization.Portable for serialization, but I had not configured the corresponding serialization factory. After I configured the factory as follows, it worked fine:
<hz:serialization>
    <hz:portable-factories>
        <hz:portable-factory factory-id="1" class-name="ApplicationPortableFactory" />
    </hz:portable-factories>
</hz:serialization>
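For completeness, a portable factory like ApplicationPortableFactory usually looks roughly like the sketch below. The class ID constant is hypothetical and has to match whatever Status.getClassId() returns in your code.
import com.hazelcast.nio.serialization.Portable;
import com.hazelcast.nio.serialization.PortableFactory;

public class ApplicationPortableFactory implements PortableFactory {

    public static final int FACTORY_ID = 1;      // must match factory-id in the XML above
    public static final int STATUS_CLASS_ID = 1; // hypothetical; must match Status.getClassId()

    @Override
    public Portable create(int classId) {
        if (classId == STATUS_CLASS_ID) {
            return new Status(); // requires a no-arg constructor on Status
        }
        throw new IllegalArgumentException("Unknown class id: " + classId);
    }
}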
Apologies for not giving the complete background initially; I only noticed it later myself. Thanks for replying though. I wasn't aware of the PagingPredicate, and I'm now using it for sorting and paging results. Thanks again.

Related

Hazelcast ensure Near Cache preload on Member

I have a use case where I need to ensure that the data is loaded into the Near Cache before working with it. I am using Hazelcast as the cache manager provider. I store data in the Near Cache in OBJECT in-memory format and I cache local entries. On application startup I warm the cache by calling a cached service method:
@Override
public void onApplicationEvent(final ApplicationReadyEvent applicationReadyEvent) {
    applicationReadyEvent.getApplicationContext().getBean(MyService.class).myCachedMethod(); // 1: warm cache
    applicationReadyEvent.getApplicationContext().getBean(MyService.class).myCachedMethod(); // 2: warm near cache here
    MyData myData = applicationReadyEvent.getApplicationContext().getBean(MyService.class).myCachedMethod(); // 3: warm near cache here
    // 4: work with data from near cache here
    myData.doStuff();
}
After that I call the same method several times to load the data into the Near Cache. The issue is that sometimes the Near Cache is not loaded by the time I try to work with the method's data. It looks like the Near Cache is populated asynchronously, and at step 4 I keep receiving data that is not served from the Near Cache. So my question is: is there any way to preload the Near Cache, or to ensure that at a particular point the Near Cache is populated, even if that means waiting some time for it to be populated before using it? Adding Thread.sleep() does the trick for me, but that is by no means the way to go. The Hazelcast version is 3.11.4.
Any help appreciated.
You can preload the Near Cache using the <preloader> configuration element. See the following example config, similar to the one in the Reference Manual:
<near-cache name="myDataStructure">
    <in-memory-format>OBJECT</in-memory-format>
    <preloader enabled="true"
               directory="nearcache-example"
               store-initial-delay-seconds="600"
               store-interval-seconds="600"/>
</near-cache>
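The preloader periodically stores the Near Cache key set to disk and repopulates the Near Cache from it when the data structure is recreated, so it helps across restarts. If you configure Hazelcast programmatically instead of via XML, a roughly equivalent sketch for 3.11 could look like this (the map name "myDataStructure" is just carried over from the example above; adjust it to your setup):
import com.hazelcast.config.Config;
import com.hazelcast.config.InMemoryFormat;
import com.hazelcast.config.MapConfig;
import com.hazelcast.config.NearCacheConfig;
import com.hazelcast.config.NearCachePreloaderConfig;

public class NearCachePreloaderExample {

    // Builds a Config with a Near Cache (OBJECT format, local entries cached)
    // whose preloader persists and reloads the key set.
    public static Config buildConfig() {
        NearCachePreloaderConfig preloaderConfig = new NearCachePreloaderConfig()
                .setEnabled(true)
                .setDirectory("nearcache-example")
                .setStoreInitialDelaySeconds(600)
                .setStoreIntervalSeconds(600);

        NearCacheConfig nearCacheConfig = new NearCacheConfig("myDataStructure")
                .setInMemoryFormat(InMemoryFormat.OBJECT)
                .setCacheLocalEntries(true)
                .setPreloaderConfig(preloaderConfig);

        MapConfig mapConfig = new MapConfig("myDataStructure")
                .setNearCacheConfig(nearCacheConfig);

        return new Config().addMapConfig(mapConfig);
    }
}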

ElasticSearch randomly fails when running tests

I have a test Elasticsearch box (2.3.0), and my tests that use ES fail in random order, which is really frustrating (they fail with an "All shards failed" exception).
Looking at the elastic_search.log file, it only showed me this:
[2017-05-04 04:19:15,990][DEBUG][action.search.type ] [es-testing-1] All shards failed for phase: [query]
RemoteTransportException[[es-testing-1][127.0.0.1:9300][indices:data/read/search[phase/query]]]; nested: IllegalIndexShardStateException[CurrentState[RECOVERING] operations only allowed when shard state is one of [POST_RECOVERY, STARTED, RELOCATED]];
Caused by: [derp_test][[derp_test][3]] IllegalIndexShardStateException[CurrentState[RECOVERING] operations only allowed when shard state is one of [POST_RECOVERY, STARTED, RELOCATED]]
at org.elasticsearch.index.shard.IndexShard.readAllowed(IndexShard.java:993)
at org.elasticsearch.index.shard.IndexShard.acquireSearcher(IndexShard.java:814)
at org.elasticsearch.search.SearchService.createContext(SearchService.java:641)
at org.elasticsearch.search.SearchService.createAndPutContext(SearchService.java:618)
at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:369)
at org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryTransportHandler.messageReceived(SearchServiceTransportAction.java:368)
at org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryTransportHandler.messageReceived(SearchServiceTransportAction.java:365)
at org.elasticsearch.transport.TransportService$4.doRun(TransportService.java:350)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
Any idea what's going on? So far my research only told me this is most likely due to a corrupt translog, but I don't think deleting the translog will help because the test drops the test index for every namespace.
The ES test box has 3.5 GB RAM and uses a 2.5 GB heap; CPU usage is quite normal during the test (it peaked at 15%).
To clarify: when I said failing tests, I meant errors with the weird exception mentioned above (not tests failing because of incorrect values). I do a manual refresh after every insert/update operation, so the values are correct.
After investigating the Elasticsearch log file (at DEBUG level) and the source code, it turns out that after an index is created, the shards enter the RECOVERING state, and sometimes my test tried to perform a query on Elasticsearch while the shards were not yet active, hence the exception.
The fix is simple: after creating an index, just wait until the shards are active using the setWaitForActiveShards function; to be more paranoid I also added setWaitForYellowStatus.
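For illustration, with the 2.x Java client the create-and-wait step could look roughly like this (a sketch only; the index name and shard count below are placeholders):
import org.elasticsearch.action.admin.cluster.health.ClusterHealthResponse;
import org.elasticsearch.client.Client;

public class IndexHelper {

    // Creates the index, then blocks until its shards are active and the index
    // is at least yellow, so queries don't hit shards still in RECOVERING state.
    public static void createIndexAndWait(Client client, String index) {
        client.admin().indices().prepareCreate(index).get();

        ClusterHealthResponse health = client.admin().cluster()
                .prepareHealth(index)
                .setWaitForActiveShards(1)   // adjust to your shard count
                .setWaitForYellowStatus()
                .get();

        if (health.isTimedOut()) {
            throw new IllegalStateException("Index " + index + " did not become ready in time");
        }
    }
}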
It's recommended to use ESIntegTestCase for integration tests.
ESIntegTestCase has helper methods such as ensureGreen and refresh to make sure Elasticsearch is ready before the test continues, and it lets you configure node settings for the test.
If you use Elasticsearch directly as a test box, it may cause various problems:
like your exception, where it seems the shards of the derp_test index are still recovering;
or a search run immediately after indexing data may fail because the cluster still needs to flush or refresh;
...
Most of these problems can be worked around by using Thread.sleep to wait for some time :), but that's a bad way to do it.
Try manually refreshing your indices after inserting the data and before performing a query to ensure the data is searchable.
Either:
As part of the index request - https://www.elastic.co/guide/en/elasticsearch/reference/2.3/docs-index_.html#index-refresh
Or separately - https://www.elastic.co/guide/en/elasticsearch/reference/2.3/indices-refresh.html
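For example, with the 2.x Java client both options could look roughly like this (a sketch; the index, type, id, and document below are placeholders):
import org.elasticsearch.client.Client;

public class RefreshExample {

    public static void indexAndRefresh(Client client, String index) {
        // Option 1: refresh as part of the index request itself
        client.prepareIndex(index, "tweet", "1")
                .setSource("{\"text\":\"hello\"}")
                .setRefresh(true) // make this document searchable immediately
                .get();

        // Option 2: refresh the whole index separately, e.g. once after bulk inserts
        client.admin().indices().prepareRefresh(index).get();
    }
}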
There could be another reason. I had the same problem with my Elasticsearch unit tests. At first I thought the root cause was somewhere in .NET Core or NEST or elsewhere outside my code, because the tests would run successfully in Debug mode (when debugging tests) but randomly fail in Release mode (when running tests).
After a lot of investigation and trial and error, I found out that the root cause (in my case) was concurrency, in other words a race condition.
Since the tests run concurrently, and I used to recreate and seed my index (initializing and preparing it) in the test class constructor, that code executed at the beginning of every test; with the tests running concurrently, race conditions were likely to happen and make my tests fail.
Here is my initialization code that caused the tests to fail randomly when running them (in Release mode):
public BaseElasticDataTest(RootFixture fixture)
    : base(fixture)
{
    ElasticHelper = fixture.Builder.Build<ElasticDataProvider<FakePersonIndex>>();
    deleteFakeIndex();
    createFakeIndex();
    fillFakeIndexData();
}
The code above used to run for every test, concurrently. I fixed my problem by executing the initialization code only once per test class (once for all the test cases inside the test class), and the problem went away.
Here is my fixed test class constructor code:
static bool initialized = false;

public BaseElasticDataTest(RootFixture fixture)
    : base(fixture)
{
    ElasticHelper = fixture.Builder.Build<ElasticDataProvider<FakePersonIndex>>();
    if (!initialized)
    {
        deleteFakeIndex();
        createFakeIndex();
        fillFakeIndexData();
        // for concurrency
        System.Threading.Thread.Sleep(100);
        initialized = true;
    }
}
Hope it helps

How to Fix Read timed out in Elasticsearch

I used Elasticsearch-1.1.0 to index tweets.
The indexing process is okay.
Then I upgraded the version. Now I use Elasticsearch-1.3.2, and I get this message randomly:
Exception happened: Error raised when there was an exception while talking to ES.
ConnectionError(HTTPConnectionPool(host='127.0.0.1', port=8001): Read timed out. (read timeout=10)) caused by: ReadTimeoutError(HTTPConnectionPool(host='127.0.0.1', port=8001): Read timed out. (read timeout=10)).
Snapshot of the randomness:
Happened --33s-- Happened --27s-- Happened --22s-- Happened --10s-- Happened --39s-- Happened --25s-- Happened --36s-- Happened --38s-- Happened --19s-- Happened --09s-- Happened --33s-- Happened --16s-- Happened
--XXs-- = after XX seconds
Can someone point out on how to fix the Read timed out problem?
Thank you very much.
It's hard to give a direct answer since the error you're seeing might be associated with the client you are using. However, a solution might be one of the following:
1. Increase the default timeout globally when you create the ES client by passing the timeout parameter. Example in Python:
es = Elasticsearch(timeout=30)
2. Set the timeout per request made by the client. Taken from the Elasticsearch Python docs:
# only wait for 1 second, regardless of the client's default
es.cluster.health(wait_for_status='yellow', request_timeout=1)
The above will give the cluster some extra time to respond
Try this:
es = Elasticsearch(timeout=30, max_retries=10, retry_on_timeout=True)
It might not fully avoid ReadTimeoutError, but it minimizes them.
Read timeouts can also happen when the query size is large. For example, in my case with a pretty large ES index (> 3M documents), a search for a query with 30 words took around 2 seconds, while a search for a query with 400 words took over 18 seconds. So for a sufficiently large query even timeout=30 won't save you. An easy solution is to crop the query to a size that can be answered within the timeout.
For what it's worth, I found that this seems to be related to a broken index state.
It's very difficult to reliably recreate this issue, but I've seen it several times; operations run as normal except for certain ones that periodically seem to hang ES (specifically refreshing an index, it seems).
Deleting an index (curl -XDELETE http://localhost:9200/foo) and reindexing from scratch fixed this for me.
I recommend periodically clearing and reindexing if you see this behaviour.
Increasing various timeout options may immediately resolve issues, but does not address the root cause.
Provided the Elasticsearch service is available and the indexes are healthy, try increasing the Java minimum and maximum heap sizes: see https://www.elastic.co/guide/en/elasticsearch/reference/current/jvm-options.html.
TL;DR: edit /etc/elasticsearch/jvm.options and set -Xms1g and -Xmx1g.
You should also check that everything is fine with Elasticsearch itself. Some shards can be unavailable; here is a nice doc about possible reasons for unassigned shards: https://www.datadoghq.com/blog/elasticsearch-unassigned-shards/

FastRWeb performance on Ubuntu with built-in web server

I have installed FastRWeb 1.1-0 on an installation of R 2.15.2 (Trick or Treat) running on an Ubuntu 10.04 box. I hope to use the resulting system to run a web service.
I've configured the system by setting http.port to 8181 in rserve.conf and unsetting the socket destination. I've assigned .http.request to FastRWeb::.http.request. I exchange JSON blobs between the client and the server using HTTP POST (the second blob can exceed 150KB in size, and will not fit in an HTTP GET query string.)
Everything works end to end -- I have a little client-side R script which generates JSON RPC calls across the channel. I see the run function invoked, and see it returned.
I've run into a significant performance problem, however: the return path takes in excess of 12 seconds between the time run() returns (including the call to done()) and the time the R client gets the return value. RCurl doesn't seem to be the culprit; it appears that something is taking twelve seconds to do the return.
Does anybody have any suggestions of where to look? I can easily shift over to using Apache 2.0 and CGI, but, honestly, I'd rather keep everything R centric.
Answering my own question.
I wrapped .http.request with an Rprof()/Rprof(NULL) pair and looked at the time spent in each routine. It turns out that the system spends ~11 seconds inside URLDecode in the standard implementation of .run. This looks like a scaling problem in URLDecode in the core.

Flex big data volume performance (ADEP/LCDS dataservice)

Since we have found a solution for Hibernate, the server side loads the data very fast: less than a second for thousands of records and more. Now the problem is transporting the data from the server to the browser. Two issues:
1. The datagrid always waits until the ArrayCollection is fully loaded. We overcame this by specifying:
<destination id="someAssemble">
    <properties>
        <use-transactions>true</use-transactions>
        <source>com.assembler.SomeAssembler</source>
        <scope>application</scope>
        <item-class>vo.SomeVo</item-class>
        <network>
            <paging enabled="true" pageSize="50" />
        </network>
    </properties>
</destination>
==> the datagrid started displaying quickly. The problem is that the server stops loading data from row 51 onward. Is there a way to force Flex to keep loading the data in the background (by config or by code)?
2. If I try to load a big ArrayCollection (for example, more than 20K records), it locks up the whole browser. Is it possible to load it smoothly behind the scenes?
Please help! Thank you
