Spring Boot + Ignite creates null-value tombstones in Cassandra

I'm using Cassandra as the 3rd-party persistence store for Ignite. To persist data to Ignite I use Spring Boot's ignite-jpa implementation, and my objects often have fields that are not populated and are therefore null. Naturally, when the data is persisted into Cassandra, each of these null values creates a tombstone, and in my case quite a few of them, given the volume of data.
Does anyone know whether there is a configuration option that tells the store not to set (i.e. to leave unset) fields (columns) with null values during an insert?
I'm aware that null has a special meaning in Cassandra and that this approach would mean a column value is not overwritten when an update sets it to null. However, I'm not concerned with that case.
What I have done so far is implement my own custom CassandraCacheStoreFactory, but I was wondering if there is a simpler way.

If you insert a null value, there is no way on the cluster side to tell Cassandra to ignore it.
But on the client side you can tell your driver to skip null values, so that they are never sent to Cassandra.
Take a look at InsertOptions.InsertOptionsBuilder: its withInsertNulls(boolean insertNulls) method controls whether null values are written, so I think it fits your use case:
https://docs.spring.io/spring-data/cassandra/docs/current/api/org/springframework/data/cassandra/core/InsertOptions.InsertOptionsBuilder.html#withInsertNulls-boolean-
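As a minimal sketch of that approach, assuming the entity is written through Spring Data Cassandra's CassandraTemplate rather than Ignite's Cassandra store (the Person entity, its columns and the PersonWriter class are placeholders):

    import org.springframework.data.cassandra.core.CassandraTemplate;
    import org.springframework.data.cassandra.core.InsertOptions;
    import org.springframework.data.cassandra.core.mapping.PrimaryKey;
    import org.springframework.data.cassandra.core.mapping.Table;

    // Placeholder entity: only the id is mandatory, the other fields may stay null.
    @Table("person")
    class Person {
        @PrimaryKey
        String id;
        String firstName;
        String lastName;
    }

    public class PersonWriter {

        private final CassandraTemplate cassandraTemplate;

        public PersonWriter(CassandraTemplate cassandraTemplate) {
            this.cassandraTemplate = cassandraTemplate;
        }

        public void save(Person person) {
            // withInsertNulls(false) keeps null fields out of the generated INSERT,
            // so no tombstone is written for the unpopulated columns.
            InsertOptions options = InsertOptions.builder()
                    .withInsertNulls(false)
                    .build();
            cassandraTemplate.insert(person, options);
        }
    }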

Related

How spring data Cassandra handles null while saving

I have a Spring Boot application with spring-boot-starter-data-cassandra version 2.1.2.RELEASE.
I need to understand how Spring Data Cassandra internally handles nulls in an entity while performing an insert.
I'm using the AsyncCassandraOperations.insert(<T>) method to persist these entities, and in some cases a few of an entity's fields may be null. Does this approach impact Cassandra performance, or can it create tombstones in Cassandra? Otherwise, please suggest a better approach.
There are no issues if some of the individual fields of an object are null. There are problems if one of the fields that comprise the key is null.
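As a small illustration of that point, here is a sketch against Spring Data Cassandra 2.1.x, where the async API returns a ListenableFuture; the User entity and its fields are made up:

    import org.springframework.data.cassandra.core.AsyncCassandraOperations;
    import org.springframework.data.cassandra.core.mapping.PrimaryKey;
    import org.springframework.data.cassandra.core.mapping.Table;
    import org.springframework.util.concurrent.ListenableFuture;

    // Hypothetical entity: "id" is the key and must be set, "email" may be null.
    @Table("user")
    class User {
        @PrimaryKey
        String id;
        String email;
    }

    public class UserWriter {

        private final AsyncCassandraOperations asyncOps;

        public UserWriter(AsyncCassandraOperations asyncOps) {
            this.asyncOps = asyncOps;
        }

        public ListenableFuture<User> save(User user) {
            // A null non-key field such as "email" is fine; a null "id" would fail.
            return asyncOps.insert(user);
        }
    }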

How to Increment the ID value in Cassandra Table Automatically?

I have a challenge when inserting values into a Cassandra table. I have a column named "ID", and I want its values to increase automatically, like a MySQL auto_increment column. I think the counter data type is not suitable in this scenario. Can anyone please help me design the schema? I also don't want to use UUIDs as a replacement for the ID column.
In short, I don't believe it is possible. The nature of Cassandra is that it does not do a read before a write. There is only one exception, lightweight transactions, but all they do is what's called "compare and swap"; there is no way an auto-increment can be implemented on the server side.
Even with counters you won't be able to achieve the desired result: if you increase the counter every time you add a record to the table, you will not know whether the current value (even if it is totally consistent) is the result of an increment from your process or from a concurrent process.
The only way is to implement this mechanism on the application side.
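Purely as a hedged illustration of such an application-side mechanism, the sketch below uses the DataStax Java driver and a lightweight transaction (compare and swap) against a dedicated table; the id_allocator table and all names are hypothetical, and under contention every allocation may need several retried round trips:

    import com.datastax.oss.driver.api.core.CqlSession;
    import com.datastax.oss.driver.api.core.cql.ResultSet;
    import com.datastax.oss.driver.api.core.cql.Row;

    // Assumes a table: CREATE TABLE id_allocator (name text PRIMARY KEY, next_id bigint);
    public class IdAllocator {

        private final CqlSession session;

        public IdAllocator(CqlSession session) {
            this.session = session;
        }

        /** Reserves the next ID for the given sequence name, retrying on contention. */
        public long nextId(String name) {
            while (true) {
                Row row = session.execute(
                        "SELECT next_id FROM id_allocator WHERE name = ?", name).one();
                if (row == null) {
                    // Bootstrap the sequence row; IF NOT EXISTS keeps this safe under concurrency.
                    ResultSet init = session.execute(
                            "INSERT INTO id_allocator (name, next_id) VALUES (?, 1) IF NOT EXISTS",
                            name);
                    if (init.wasApplied()) {
                        return 1L;
                    }
                    continue; // another process created the row first, retry
                }
                long current = row.getLong("next_id");
                // Compare and swap: only advance the counter if nobody else did in the meantime.
                ResultSet cas = session.execute(
                        "UPDATE id_allocator SET next_id = ? WHERE name = ? IF next_id = ?",
                        current + 1, name, current);
                if (cas.wasApplied()) {
                    return current + 1;
                }
                // Lost the race; loop and try again.
            }
        }
    }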

Issue with SSIS Lookup Cache Mode and NULL values

I’m hoping that someone may be able to help me.
My question relates to SSIS, specifically the Lookup Data Flow item and how it handles NULL values depending on the selected Cache Mode.
I have a very large dataset (72 columns, 37,000,000 records) which uses a Type 2 update methodology.
I use a lookup in the data flow to identify updates to existing records. I match on all of the relevant fields; if all the fields match, the incoming record obviously matches the existing record in the table and is therefore discarded. If there isn't a match, a Type 2 update is performed.
Due to the large dataset and limited server resources, the process fails with insufficient memory if the Cache Mode of the Lookup is set to Full Cache, so I have had to switch the Cache Mode to Partial Cache. This resolves the memory issue but causes another one: for some reason, in Partial Cache mode a NULL value from the table does not match a NULL value in the incoming records, whereas with Full Cache it does.
This behaviour seems quite odd and I am unable to find it documented anywhere. One way around it would be to coalesce the NULL values, but that is something I would like to avoid.
Any help would be much appreciated.
Cheers
Ben
No Cache and Partial Cache modes use the database engine to do the matching. In most database engines (SQL Server included) NULL does not equal NULL; NULL means an unknown value, so you will never get a match. Do an ISNULL on all your nullable columns.

Is jdbcType necessary in a MyBatis mapper?

I've been searching and this still isn't clear to me. When using a MyBatis mapper, is it necessary to set the jdbcType? I'm using it with MySQL.
From what I've read, it's needed when you pass null values, but I don't know whether this is still necessary or whether it's something old. For example, both of these queries work:
SELECT <include refid="columns"/> FROM user WHERE uid=#{uid, jdbcType=INTEGER}
SELECT <include refid="columns"/> FROM user WHERE uid=#{uid}
As you mentioned yourself, you need to specify the jdbcType when passing null values for parameters.
Some databases need to know the value's type even if the value itself is NULL. For this reason, and for maximum portability, the JDBC specification itself requires the type to be specified, and MyBatis needs to pass it along since it's built on top of JDBC.
From the MyBatis documentation:
The JDBC type is only required for nullable columns upon insert, update or delete. This is a JDBC requirement, not a MyBatis one. So even if you were coding JDBC directly, you'd need to specify this type – but only for nullable values.
Most of the time you don't need to specify the jdbcType, as MyBatis is smart enough to figure out the type from the objects you are working with. But if, for example, you send your parameters to the MyBatis statement inside a HashMap and one of the parameters is null, MyBatis won't be able to determine its type by looking at the HashMap, because the HashMap is just a generic container and null itself carries no type information. In that case it is a good idea to provide the jdbcType, so that switching the database implementation later on does not cause any issues with null values.
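As a small, hedged illustration (the UserMapper interface, table and column names are made up), an annotation-based mapper might declare the type only on the parameter that can be null:

    import org.apache.ibatis.annotations.Param;
    import org.apache.ibatis.annotations.Update;

    public interface UserMapper {

        // "email" may legitimately be null, so its JDBC type is declared explicitly;
        // "uid" is always set, so MyBatis can infer its type on its own.
        @Update("UPDATE user SET email = #{email, jdbcType=VARCHAR} WHERE uid = #{uid}")
        int updateEmail(@Param("uid") int uid, @Param("email") String email);
    }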

Default values in target tables

I have some mappings where business entities are populated after the transformation logic. The row volumes are on the higher side, and there are quite a few business attributes that are defaulted to certain static values.
Therefore, in order to reduce the data pushed from the mapping, I created a "default" clause on the target table and stopped feeding those columns from the mapping itself. This works out just fine when I run the session in "Normal" mode: it gives me target table rows with some columns fed by the mapping and the rest taking values from the "default" clause in the table DDL.
However, since we are dealing with the higher end of volumes, I want to run my session in bulk mode (there are no pre-existing indexes on the target tables).
As soon as I switch the session to bulk mode, this particular feature (the default values) stops working. As a result, I get NULL values in the target columns instead of the defined "default" values.
I wonder -
Is this expected behavior?
If not, am I missing out on some configuration somewhere?
Should I raise a ticket with Oracle, or with Informatica?
My configuration: Informatica 9.5.1 64-bit, with Oracle 11g R2 (11.2.0.3), running on Solaris (SunOS 5.10).
Looking forward to some help here...
This could be expected behavior.
It seems that bulk mode in Informatica uses the "Direct Path" API in Oracle (see for example https://community.informatica.com/thread/23522 ).
From this document ( http://docs.oracle.com/cd/B10500_01/server.920/a96652/ch09.htm , search for "Defaults on the Direct Path") I gather that:
Default column specifications defined in the database are not available when you use direct path loading. Fields for which default values are desired must be specified with the DEFAULTIF clause. If a DEFAULTIF clause is not specified and the field is NULL, then a null value is inserted into the database.
This could be the reason for the behaviour you are seeing.
I don't believe you'll see a great benefit from omitting the defaults, particularly in comparison to the benefits of a direct path load. If the data is going to be read-only, then consider compression as well.
You should also note that SQL*Net features compression for repeated values in the same column, so even with conventional path inserts the network overhead is not as high as you might think.
