Apache Ignite: is my data read from memory or disk? - spring-boot

I need to understand if the following configuration of Ignite will serve my data from memory or from disk.
Ignite configuration:
<property name="dataStorageConfiguration">
<bean class="org.apache.ignite.configuration.DataStorageConfiguration">
<property name="defaultDataRegionConfiguration">
<bean class="org.apache.ignite.configuration.DataRegionConfiguration">
<property name="persistenceEnabled" value="true"/>
</bean>
</property>
</bean>
</property>
Java Code:
ClientConfiguration cfg = new ClientConfiguration().setAddresses("127.0.0.1:10800");
try (IgniteClient client = Ignition.startClient(cfg)) {
ClientCache<Long, SensorsWaiting> cache = client.cache("SQL_PUBLIC_FOO");
FieldsQueryCursor<List<?>> query = cache.query(new SqlFieldsQuery("select * from foo"));
}
Background of the question:
I expect a large number of queries and need the results to be served from memory. At the same time, I need the data to be stored on disk in case the Ignite server crashes or needs to be restarted.
Is my understanding correct that in this case my data is served from memory?
What if I use the JDBC driver? Is it still the same? What is the difference between the cache API and the JDBC driver?

It'll be served from memory if it's in memory; otherwise it will be pulled in from disk. (This distinction is important if you have more data than you have memory, or when the cluster has just started up.)
The different APIs all access the same data. Whether you use the JCache API (get, put), SqlFieldsQuery or JDBC/ODBC, it's all the same.

Related

Apache Ignite cache write is visible to other client after a delay

We have an 8-node Ignite cluster in production. Below is the cache configuration for one of the caches.
<bean id="cache-template-bean" abstract="true"
class="org.apache.ignite.configuration.CacheConfiguration">
<property name="name" value="inputDataCacheTemplate*"/>
<property name="cacheMode" value="PARTITIONED"/>
<property name="backups" value="1"/>
<property name="atomicityMode" value="ATOMIC"/>
<property name="dataRegionName" value="dr.prod.input"/>
<property name="partitionLossPolicy" value="READ_WRITE_SAFE"/>
<property name="writeSynchronizationMode" value="PRIMARY_SYNC"/>
<property name="statisticsEnabled" value="true"/>
<property name="affinity">
<bean class="org.apache.ignite.cache.affinity.rendezvous.RendezvousAffinityFunction">
<property name="partitions" value="256"/>
</bean>
</property>
<property name="expiryPolicyFactory">
<bean class="javax.cache.expiry.CreatedExpiryPolicy" factory-method="factoryOf">
<constructor-arg>
<bean class="javax.cache.expiry.Duration">
<constructor-arg value="DAYS"/>
<constructor-arg value="7"/>
</bean>
</constructor-arg>
</bean>
</property>
</bean>
We are seeing strange behaviour:
1. Application A writes a record to the cache.
2. Application B tries to read that record.
3. Application B is unable to find the record in the cache, so it inserts a new one, thereby wiping the data entered by Application A.
Step 3 happens very rarely: there are about 1000 such cache misses for the roughly 50M events we receive daily.
The gap between steps 1 and 2 is at least 20 ms.
We tried adding code to Application B that waits about 20 ms on the first cache miss. That reduced the misses by a great margin, but some still occurred. The fact that Application B could read, after a delay, the same record it could not find at first means that Application A is not failing to insert the record, that no other network factor is affecting the inserts, and that eviction or expiry is not the cause. We have also ensured that the key used for the put in step 1 and the get in step 2 is the same.
What could be going on here? Please help.
I think it's more likely that you have a race condition:
1. Application B tries to read that record.
2. Application A writes a record to the cache.
3. Application B is unable to find the record in the cache, so it inserts a new one, thereby wiping the data entered by Application A.
Clients generally go to the primary partition to retrieve data, so it's incredibly unlikely that applications A and B are seeing different data.
The traditional way of dealing with this is with transactions, which would also work in Ignite.
Better might be using different APIs. For example, there's IgniteCache#getAndPutIfAbsent() and IgniteCache#putIfAbsent(), both of which do the check and write atomically without needing transactions.
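The difference between a check-then-write sequence and an atomic put-if-absent can be sketched in plain Java. This is only an illustration, not Ignite code: a java.util.concurrent.ConcurrentHashMap stands in for the Ignite cache, and the class, method and key names are made up. ConcurrentMap#putIfAbsent returns the previous value, so it behaves more like getAndPutIfAbsent() than like IgniteCache#putIfAbsent(), which returns a boolean.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class PutIfAbsentSketch {
    // Stand-in for the Ignite cache; an assumption for illustration only.
    static final ConcurrentMap<String, String> cache = new ConcurrentHashMap<>();

    // Racy: another writer can insert between the get and the put,
    // and this put then silently overwrites that record.
    static String checkThenPut(String key, String value) {
        String existing = cache.get(key);
        if (existing == null) {
            cache.put(key, value);
            return value;
        }
        return existing;
    }

    // Atomic: the check and the write happen as one operation, so an
    // existing record is never overwritten.
    static String atomicPut(String key, String value) {
        String existing = cache.putIfAbsent(key, value);
        return existing != null ? existing : value;
    }

    public static void main(String[] args) {
        cache.put("sensor-1", "written by A");
        // Application B arrives later and keeps A's record instead of wiping it.
        System.out.println(atomicPut("sensor-1", "written by B")); // prints "written by A"
    }
}
```

With the atomic variant, it no longer matters whether Application B's read happens just before or just after Application A's write: whichever write lands first wins, and the other caller gets the stored record back.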

Prevent duplicates across restarts in spring integration

I have to poll a directory and write entries to an RDBMS.
I wired up a Redis metadata store for the duplicates check. I see that the framework updates the Redis store with entries for all files in the folder (about 140 files) long before the RDBMS entries get written. At the time of application termination, the RDBMS had logged only 90 files. On application restart, no more files are picked up from the folder.
Properties: msgs.per.poll=10, polling.interval=2000
How can I ensure that entries are made in Redis only after writing to the DB, so that both stay in sync and I don't miss any files?
<code>
<task:executor id="executor" pool-size="5" />
<int-file:inbound-channel-adapter channel="filesIn" directory="${input.Dir}" scanner="dirScanner" filter="compositeFileFilter" prevent-duplicates="true">
<int:poller fixed-delay="${polling.interval}" max-messages-per-poll="${msgs.per.poll}" task-executor="executor">
</int:poller>
</int-file:inbound-channel-adapter>
<int:channel id="filesIn" />
<bean id="dirScanner" class="org.springframework.integration.file.RecursiveLeafOnlyDirectoryScanner" />
<bean id="compositeFileFilter" class="org.springframework.integration.file.filters.CompositeFileListFilter">
<constructor-arg ref="persistentFilter" />
</bean>
<bean id="persistentFilter" class="org.springframework.integration.file.filters.FileSystemPersistentAcceptOnceFileListFilter">
<constructor-arg ref="metadataStore" />
</bean>
<bean name="metadataStore" class="org.springframework.integration.redis.metadata.RedisMetadataStore">
<constructor-arg name="connectionFactory" ref="redisConnectionFactory"/>
</bean>
<bean id="redisConnectionFactory" class="org.springframework.data.redis.connection.jedis.JedisConnectionFactory" p:hostName="localhost" p:port="6379" />
<int-jdbc:outbound-channel-adapter channel="filesIn" data-source="dataSource" query="insert into files values (:path,:name,:size,:crDT,:mdDT,:id)"
sql-parameter-source-factory="spelSource">
</int-jdbc:outbound-channel-adapter>
....
</code>
Artem is correct. You could also extend RedisMetadataStore and, at initialization time, flush the entries that are not in your database; this way you could keep using Redis and stay in sync with the DB. But this couples things a little.
How can I ensure entries to redis are made after writing to db
It isn't possible, because FileSystemPersistentAcceptOnceFileListFilter runs before any message is sent, and only once, when FileReadingMessageSource.toBeReceived is empty. Of course, it tries to refetch the files on the next application restart, but it can't, because your RedisMetadataStore already contains entries for those files.
I think that in your case we have no choice other than to use a custom JdbcFileListFilter based on your files table. Fortunately, your logic ends up with a file entry anyway.
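A database-backed filter of the kind suggested above could look roughly like the following. This is a hedged, framework-free sketch: the class name and the lookup predicate are hypothetical, and it does not implement the Spring Integration filter interface; it only mirrors the shape of a filterFiles method. The predicate is assumed to be backed by something like "select count(*) from files where path = ?".

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class DbBackedFileFilterSketch {
    // Hypothetical lookup: true if the path already has a row in the files table.
    private final Predicate<String> alreadyInDatabase;

    public DbBackedFileFilterSketch(Predicate<String> alreadyInDatabase) {
        this.alreadyInDatabase = alreadyInDatabase;
    }

    // Keep only the files that the database does not know about yet.
    public List<File> filterFiles(File[] files) {
        List<File> accepted = new ArrayList<>();
        for (File file : files) {
            if (!alreadyInDatabase.test(file.getPath())) {
                accepted.add(file);
            }
        }
        return accepted;
    }
}
```

Because the database is the only source of truth here, a file that was scanned but never inserted is simply offered again after a restart. One caveat: a file that has been handed to the flow but not yet inserted would also be accepted again on the next poll, so a real filter would probably need some in-memory tracking on top of this.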

JPA 2 + EclipseLink : Caching Issue

I see strange caching behavior with JPA entities (EclipseLink 2.4.1) and Guice Persist.
I do not want to use caching; nevertheless, I randomly get an old instance that has already been changed in the MySQL database.
I have tried the following:
Add @Cacheable(false) to the JPA entity.
Disable the cache properties in the persistence.xml file:
<class>MyEntity</class>
<shared-cache-mode>NONE</shared-cache-mode>
<properties>
<property name="eclipselink.cache.shared.default" value="false"/>
<property name="eclipselink.cache.size.default" value="0"/>
<property name="eclipselink.cache.type.default" value="None"/>
<property name="eclipselink.refresh" value="true"/>
<property name="eclipselink.query-results-cache" value="false"/>
<property name="eclipselink.weaving" value="false"/>
</properties>
Even with EclipseLink trace logging activated, I can see the query being executed:
Execute query ReadObjectQuery(name="readObject" referenceClass=XX sql="...")
(just from calling find on the EntityManager), but it still randomly returns an old value of that class.
Note
Perhaps this happens because I use different EntityManager instances and each one has its own cache?
I have seen the following related post: Disable JPA EclipseLink 2.4 cache
If so, is it possible to clear the cache of ALL EntityManagers without using:
em.getEntityManagerFactory().getCache().evictAll();
Is it possible to clear ALL caches without using evictAll()?
evictAll() is for the shared cache, which you have already disabled anyway. EntityManager instances are required by default to have a first-level cache of their own to keep track of all the managed instances they have created. An EntityManager is meant to represent a logical transaction and so should not be long-lived. You need to throw away your EntityManagers and re-obtain them, or just clear them at logical points, rather than let the number of managed entities in their caches grow endlessly. This will also help limit the stale data issue, though nothing other than pessimistic locking can eliminate it. I recommend using optimistic locking, if you aren't already, to avoid overwriting with stale data.
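As a rough illustration of the optimistic locking mentioned above, here is a plain-Java sketch of the version check that a JPA @Version column gives you. It is not EclipseLink code: the in-memory map stands in for the database table, and all class and method names are invented for the example.

```java
import java.util.HashMap;
import java.util.Map;

public class OptimisticLockSketch {
    // An immutable snapshot of one row: its value plus a version counter.
    static final class Row {
        final String value;
        final long version;

        Row(String value, long version) {
            this.value = value;
            this.version = version;
        }
    }

    // Stand-in for a database table keyed by id (an assumption for illustration).
    private final Map<Long, Row> table = new HashMap<>();

    synchronized void insert(long id, String value) {
        table.put(id, new Row(value, 0L));
    }

    synchronized Row read(long id) {
        return table.get(id);
    }

    // Behaves like "UPDATE ... SET value = ?, version = version + 1
    // WHERE id = ? AND version = ?": the write is applied only if nobody
    // changed the row since the caller read it.
    synchronized boolean update(long id, String newValue, long expectedVersion) {
        Row current = table.get(id);
        if (current == null || current.version != expectedVersion) {
            return false; // stale copy; JPA would raise OptimisticLockException here
        }
        table.put(id, new Row(newValue, expectedVersion + 1));
        return true;
    }

    public static void main(String[] args) {
        OptimisticLockSketch db = new OptimisticLockSketch();
        db.insert(1L, "original");
        Row staleCopy = db.read(1L); // held by a long-lived EntityManager, say
        System.out.println(db.update(1L, "first", db.read(1L).version)); // prints "true"
        System.out.println(db.update(1L, "second", staleCopy.version));  // prints "false"
    }
}
```

The stale reader is not prevented from seeing old data, but its write is rejected instead of silently overwriting the newer row.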

How to make eclipselink perform better about bulk insert

I am testing EclipseLink for bulk data inserts into Derby.
For the same set of data, EclipseLink takes twice as long as a JDBC batch update.
I have enabled EclipseLink's batch writing feature, along with these other properties:
<property name="eclipselink.jdbc.batch-writing" value="JDBC"/>
<property name="eclipselink.jdbc.batch-writing.size" value="1000"/>
<property name="eclipselink.jdbc.cache-statements" value="true"/>
<property name="eclipselink.jdbc.cache-statements.size" value="30"/>
<property name="eclipselink.cache.shared.default" value="false"/>
<property name="eclipselink.jdbc.read-connections.max" value="20"/>
<property name="eclipselink.jdbc.read-connections.min" value="1"/>
<property name="eclipselink.jdbc.write-connections.min" value="1"/>
<property name="eclipselink.jdbc.write-connections.max" value="30"/>
The question is: how can I make EclipseLink faster?
Please include the code, and the SQL log. Also include your JDBC code, and make sure it is kosher (closing statements, etc.).
Are you using sequence preallocation? If not then you will not be getting any batching (check your SQL log to see if the batch is occurring).
I would not change the connection pooling defaults; yours are less efficient than the default (initial 1, min 32, max 32, no separate read/write pool). Having a different min and max will cause connection throttling, which is bad.
See,
http://java-persistence-performance.blogspot.com/2011/06/how-to-improve-jpa-performance-by-1825.html
Since JPA operates on top of JDBC, it will always take more time than fully optimized JDBC code. But it does have the advantage of letting you use Java objects instead of writing JDBC code, and of making major optimizations such as batch writing just by changing a flag instead of rewriting the code.
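To illustrate why the sequence preallocation asked about above matters for batching: if ids are reserved in blocks, most inserts get their id from memory and can be queued up, rather than each insert needing its own trip to the database. The following is a simplified sketch, not EclipseLink's implementation; the AtomicLong stands in for a database sequence and all names are hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;

public class SequencePreallocationSketch {
    // Stand-in for a database sequence (an assumption for illustration).
    private final AtomicLong databaseSequence = new AtomicLong(0);
    private final int allocationSize;
    private long next;      // next id to hand out
    private long blockEnd;  // exclusive end of the currently reserved block
    private int roundTrips; // how many times the "database" was asked for ids

    public SequencePreallocationSketch(int allocationSize) {
        this.allocationSize = allocationSize;
    }

    public synchronized long nextId() {
        if (next == blockEnd) {
            // One simulated database call reserves a whole block of ids.
            next = databaseSequence.getAndAdd(allocationSize) + 1;
            blockEnd = next + allocationSize;
            roundTrips++;
        }
        return next++;
    }

    public synchronized int roundTrips() {
        return roundTrips;
    }

    public static void main(String[] args) {
        SequencePreallocationSketch ids = new SequencePreallocationSketch(50);
        for (int i = 0; i < 1000; i++) {
            ids.nextId();
        }
        System.out.println(ids.roundTrips()); // prints 20
    }
}
```

With a block size of 50, assigning 1000 ids costs 20 simulated round trips instead of 1000, which is what leaves room for the inserts themselves to be sent as batches.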

OracleDataSource vs. Oracle UCP PoolDataSource

I was researching some JDBC Oracle connection pooling items and came across a new(er) Oracle pool implementation called Universal Connection Pool (UCP). It uses a new class, PoolDataSource, for connection pooling rather than the OracleDataSource [with the cache option enabled]. I am debating whether to switch to this new implementation but can't find any good documentation of what (if any) fixes/upgrades this would buy me. Does anyone have experience with both? Pluses/minuses? Thanks.
The latest Oracle JDBC driver (11.2.0.1.0) explicitly states that the Oracle Implicit Connection Cache (the one that uses OracleDataSource) is deprecated:
Oracle JDBC Drivers release 11.2.0.1.0 production Readme.txt
What Is New In This Release ?
Universal Connection Pool
In this release the Oracle Implicit Connection Cache feature is
deprecated. Users are strongly encouraged to use the new Universal
Connection Pool instead. The UCP has all of the features of the
ICC, plus much more. The UCP is available in a separate jar file,
ucp.jar.
So I think it's better to start using UCP, but the documentation is not that good.
For example, I didn't find a way to use UCP with Spring...
UPDATE: I think I've found the right Spring configuration:
<bean id="dataSource" class="oracle.ucp.jdbc.PoolDataSourceFactory" factory-method="getPoolDataSource">
<property name="URL" value="jdbc:oracle:thin:@myserver:1521:mysid" />
<property name="user" value="myuser" />
<property name="password" value="mypassword" />
<property name="connectionFactoryClassName" value="oracle.jdbc.pool.OracleDataSource" />
<property name="connectionPoolName" value="ANAG_POOL" />
<property name="minPoolSize" value="5" />
<property name="maxPoolSize" value="10" />
<property name="initialPoolSize" value="5" />
<property name="inactiveConnectionTimeout" value="120" />
<property name="validateConnectionOnBorrow" value="true" />
<property name="maxStatements" value="10" />
</bean>
The key is to specify the right factory class and the right factory method
PDS is 'universal' as it provides the same level of pooling functionality you get in ODS for non-Oracle databases, e.g. MySQL.
See UCP Dev Guide, an article on Oracle website and UCP Transition Guide
I don't see any immediate benefit of moving to UCP (PDS) from ODS, but perhaps in the future Oracle will deprecate some of the functionality in ODS. I used ODS for a while and I'm quite happy with it for the time being, but if I started fresh I'd go with PDS.
I did an extensive evaluation of UCP and decided to NOT use UCP - please have a look at this post for details.
I tested the UCP and deployed it to production in a Spring 3.0.5 Hibernate app using Spring JMS listener containers and Spring-managed sessions and transactions via the @Transactional annotation. The data sometimes causes SQL constraint errors, due to separate listener threads trying to update the same record. When that happens, the exception is thrown by one method annotated with @Transactional, and the error is logged to the database by another method annotated with @Transactional. For whatever reason, this process seems to result in a cursor leak that eventually adds up and triggers the ORA-01000 open cursor limit exceeded error, causing the thread to cease processing anything.
OracleDataSource running in the same code doesn't seem to leak cursors, so it doesn't cause this problem.
This is a pretty weird scenario, but it indicates to me that it's a little too early to be using the UCP in an application with this kind of structure.
I too am testing UCP and am finding performance issues in a thread-pool-based application. Initially I tried OracleDataSource, but I had trouble configuring it for batch processing. I kept getting NullPointerExceptions on the connections, leading me to believe I have some sort of connection leak, but only with some applications; there are other applications we manage that are not batch-oriented, where OracleDataSource works well.
Based on this post and a few others I found while researching this subject, I tried UCP. I found that with enough tweaking I could get rid of the closed-connection and NullPointerException style errors on connections, but garbage collection was taking a beating. The long-term GC space fills up fast and never seems to free up until the application finishes running. This can sometimes take as long as a day or more if the load is really heavy. I also noticed that it takes progressively longer to process data. I compare that to the now deprecated OracleCacheImpl class (which we currently use in production because it still "just works"), which used a third of the GC memory that UCP does and processed files much faster. In all other applications UCP seems to work just fine and handles just about everything I throw at it, but the thread pool application is a major app and I could not risk GC exceptions in production.
The implicit connection caching performs quite a bit better than UCP if you use the connection validation. This corresponds to bug 16723836, which is scheduled to be fixed in 12.1.0.2.
UCP pooling becomes increasingly more expensive to get/return
connections as the concurrent load increases. The test compares the oracle
implicit connection caching, tomcat's pooling, and UCP. All 3 are
configured to allow a max of 200 connections, a minimum of 20 connections and
an initial size of 2. All 3 are configured to validate the connections as
they are removed from the pool. The tomcat pool uses the statement "select
sysdate from dual" for validation.
These results were obtained on a 64-bit Red Hat node with 64 logical cores (32 physical) and 128 GB of RAM.
At 5 concurrent threads, UCP is the slowest, but total connection management
time (get and close) is under 1 ms on average.
As the concurrency is increased, UCP falls further and further behind the
other solutions:
Threads   Implicit   Tomcat    UCP
25        0.58ms     0.92ms    1.50ms
50        0.92ms     1.60ms    6.80ms
100       2.60ms     3.20ms    21.40ms
180       13.86ms    15.34ms   40.70ms
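To make the gap easier to read, the timings above can be turned into ratios of UCP to the implicit connection cache. The small program below only does that arithmetic on the reported numbers; the class and method names are made up for the example.

```java
public class PoolTimingRatios {
    // Timings in milliseconds, copied from the results above.
    static final int[] THREADS = {25, 50, 100, 180};
    static final double[] IMPLICIT_MS = {0.58, 0.92, 2.60, 13.86};
    static final double[] UCP_MS = {1.50, 6.80, 21.40, 40.70};

    // How many times longer UCP takes than the implicit cache at row i.
    static double ratio(int i) {
        return UCP_MS[i] / IMPLICIT_MS[i];
    }

    public static void main(String[] args) {
        for (int i = 0; i < THREADS.length; i++) {
            System.out.printf("%d threads: UCP takes %.1fx the implicit cache time%n",
                    THREADS[i], ratio(i));
        }
    }
}
```

By this measure UCP is roughly 2.6x slower at 25 threads, 7.4x at 50, 8.2x at 100 and 2.9x at 180, so in relative terms the gap peaks around 100 threads even though the absolute difference keeps growing.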
There are two possible ways to use UCP in a Spring bean XML file.
If the database settings are kept in a db.properties file, load it first, then use one of the two options:
<bean class="org.springframework.beans.factory.config.PropertyPlaceholderConfigurer">
<property name="location">
<value>classpath:resources/db.properties</value>
</property>
</bean>
First one, with oracle.ucp.jdbc.PoolDataSourceImpl:
<bean id="dataSource" class="oracle.ucp.jdbc.PoolDataSourceImpl">
<property name="URL" value="${jdbc.url}" />
<property name="user" value="${jdbc.username}" />
<property name="password" value="${jdbc.password}" />
<property name="validateConnectionOnBorrow" value="true"/>
<property name="connectionFactoryClassName" value="oracle.jdbc.pool.OracleDataSource" />
<property name="connectionPoolName" value="TEST_POOL" />
<property name="minPoolSize" value="10" />
<property name="maxPoolSize" value="20" />
<property name="initialPoolSize" value="12" />
</bean>
Second one, with oracle.ucp.jdbc.PoolDataSourceFactory:
<bean id="dataSource" class="oracle.ucp.jdbc.PoolDataSourceFactory"
factory-method="getPoolDataSource">
<property name="URL" value="${jdbc.url}" />
<property name="user" value="${jdbc.username}" />
<property name="password" value="${jdbc.password}" />
<property name="validateConnectionOnBorrow" value="true"/>
<property name="connectionFactoryClassName" value="oracle.jdbc.pool.OracleDataSource" />
<property name="connectionPoolName" value="TEST_POOL" />
<property name="minPoolSize" value="10" />
<property name="maxPoolSize" value="20" />
<property name="initialPoolSize" value="12" />
</bean>
That's It :)
Here is the link to the detailed documentation:
https://docs.oracle.com/cd/E11882_01/java.112/e12265/connect.htm#CHDDCICA
I tried UCP and the performance is better. Maybe the key is using this:
oracle.ucp.jdbc.PoolDataSource ds = (oracle.ucp.jdbc.PoolDataSource)envContext.lookup(url_r);
MyConnectionLabelingCallback callback = new MyConnectionLabelingCallback();
ds.registerConnectionLabelingCallback( callback );
Properties label = new Properties();
label.setProperty(pname, KEY);
conn = ds.getConnection(label);
This lets you borrow the connection without ever closing it, so the performance is great.
The code for the callback class is:
import java.sql.Connection;
import java.util.Map;
import java.util.Properties;
import java.util.Set;
import oracle.ucp.ConnectionLabelingCallback;
import oracle.ucp.jdbc.LabelableConnection;

public class MyConnectionLabelingCallback
    implements ConnectionLabelingCallback {

  public MyConnectionLabelingCallback() {
  }

  // Lower cost means a better match; Integer.MAX_VALUE rejects the connection.
  public int cost(Properties reqLabels, Properties currentLabels) {
    // Case 1: exact match
    if (reqLabels.equals(currentLabels)) {
      System.out.println("## Exact match found!! ##");
      return 0;
    }
    // Case 2: some labels match with no unmatched labels
    String iso1 = (String) reqLabels.get("TRANSACTION_ISOLATION");
    String iso2 = (String) currentLabels.get("TRANSACTION_ISOLATION");
    boolean match =
        (iso1 != null && iso2 != null && iso1.equalsIgnoreCase(iso2));
    Set<Object> rKeys = reqLabels.keySet();
    Set<Object> cKeys = currentLabels.keySet();
    if (match && rKeys.containsAll(cKeys)) {
      System.out.println("## Partial match found!! ##");
      return 10;
    }
    // No label matches the application's preference.
    // Do not choose this connection.
    System.out.println("## No match found!! ##");
    return Integer.MAX_VALUE;
  }

  // Brings a borrowed connection in line with the requested labels.
  public boolean configure(Properties reqLabels, Object conn) {
    System.out.println("Configure################");
    try {
      String isoStr = (String) reqLabels.get("TRANSACTION_ISOLATION");
      ((Connection) conn).setTransactionIsolation(Integer.valueOf(isoStr));
      LabelableConnection lconn = (LabelableConnection) conn;
      // Find the unmatched labels on this connection
      Properties unmatchedLabels =
          lconn.getUnmatchedConnectionLabels(reqLabels);
      // Apply each label <key,value> in unmatchedLabels to conn
      for (Map.Entry<Object, Object> label : unmatchedLabels.entrySet()) {
        String key = (String) label.getKey();
        String value = (String) label.getValue();
        lconn.applyConnectionLabel(key, value);
      }
    } catch (Exception exc) {
      // Any failure (including a missing isolation label) marks the configuration as failed.
      return false;
    }
    return true;
  }
}
