Spring Data Cassandra batch increase a counter column

I am trying to batch-update (increase or decrease) a counter column using CassandraBatchOperations in spring-data-cassandra.
I can't find a way to batch-increment a counter using CassandraBatchOperations:
cassandraOperation.batchOps(BatchType.COUNTER)
.update(...) // only accepts entities as parameters
.execute()
Does it support incrementing a counter as part of a batch?
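For context, here is a minimal sketch of the situation; the PageView entity and its field names are hypothetical, made up purely for illustration. The batch API only exposes entity-based overloads, so there is no obvious place to express an increment or decrement:

// Hypothetical entity, only to illustrate the question.
@Table("page_views")
public class PageView {
    @PrimaryKey
    private String pageId;

    @Column("view_count")   // backed by a Cassandra counter column
    private long viewCount;
}

// The batch API takes mapped entities rather than increment expressions:
cassandraOperation.batchOps(BatchType.COUNTER)
        .update(pageView)   // entity only; no explicit "counter = counter + delta"
        .execute();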

Related

Using Counter I want unique values to be picked three times within the same thread. Is it possible? Data is read from CSV Data Set Config

How to prevent timeout error when importing large CSV data set into Memgraph?

I'm trying to load a rather large CSV file. It has more than 700K entries, and it times out every time I try to import the data.
I am currently using a for loop over the data to load it, but it's quite time-consuming.
You can try changing the query execution timeout flag in your configuration settings:
--query-execution-timeout-sec=180
The default setting is 180 seconds. You can set a larger value, and if you set it to 0 there will be no time limit for query execution.

Need an approach to run Spring Batch continuously

We are using Spring Batch to fetch x records from the database, make an API call, and then update the database for those x records. We run this process in two steps. We don't want to run it on a scheduler; we want it to execute continuously, so that as soon as step 2 finishes the job is re-executed. Is running the batch continuously for a fixed batch size of x records a recommended approach, and what is its impact on performance?
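One way to make "continuous" concrete is to re-launch the job as soon as the previous execution completes, instead of driving it from a scheduler. A minimal sketch, assuming Spring Batch's JobLauncher and a configured Job are injected; the class and parameter names below are illustrative, not a prescribed solution:

import org.springframework.batch.core.BatchStatus;
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;

public class ContinuousJobRunner {

    private final JobLauncher jobLauncher;
    private final Job job;

    public ContinuousJobRunner(JobLauncher jobLauncher, Job job) {
        this.jobLauncher = jobLauncher;
        this.job = job;
    }

    public void runUntilFailure() throws Exception {
        while (true) {
            // Unique parameters so each launch creates a new JobInstance
            JobParameters params = new JobParametersBuilder()
                    .addLong("run.timestamp", System.currentTimeMillis())
                    .toJobParameters();
            JobExecution execution = jobLauncher.run(job, params);
            if (execution.getStatus() != BatchStatus.COMPLETED) {
                break; // stop re-launching on failure instead of spinning forever
            }
        }
    }
}

Whether this is advisable depends on the batch size x and on how much load the API and database can absorb; adding a fixed delay between runs may be gentler on both.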

GetHBase processor: filter rows by timestamp

I'm trying to use the HBase get processor in NiFi, and I want to run this command through the HBase processor. Is it possible?
scan 't1', {COLUMNS => 'c1', TIMERANGE => [1303668804, 1303668904]}
The GetHBase processor is made to do incremental extraction from an HBase table based on the timestamp. The Initial Time Range property determines whether the min time starts at 0 or at the current time; after that, the processor keeps track of the max time seen in the previous execution and uses it as the min time in the next execution. So you can't provide your own time range, since the processor manages that for you.
The GetHBase processor always looks for incremental updates based on the timestamp. Basically it recognizes the new/updated data automatically.
But if you still want to read rows for specific timestamp(s), you have to use a filter expression in the following format in the "Filter Expression" property:
TimestampsFilter(timestamp1, timestamp2, ..., timestampN)
You can find a list of these filters in: https://www.cloudera.com/documentation/enterprise/5-3-x/topics/admin_hbase_filtering.html
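For reference, outside NiFi the two styles of query look roughly like this with the plain HBase Java client (table and column family names are taken from the question; this is a sketch of the equivalent scan, not what GetHBase does internally):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class TimeRangeScanExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("t1"))) {

            Scan scan = new Scan();
            scan.addFamily(Bytes.toBytes("c1"));
            // Range scan, like TIMERANGE => [1303668804, 1303668904]
            scan.setTimeRange(1303668804L, 1303668904L);
            // Exact-timestamp matching, like the Filter Expression above:
            // scan.setFilter(new org.apache.hadoop.hbase.filter.TimestampsFilter(
            //         java.util.Arrays.asList(1303668804L, 1303668904L)));

            try (ResultScanner scanner = table.getScanner(scan)) {
                for (Result result : scanner) {
                    System.out.println(result);
                }
            }
        }
    }
}

Note that TimestampsFilter matches exact timestamps only, while TIMERANGE (and GetHBase's internal tracking) works on a min/max range.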

How does PutHiveQL work on batches?

I am trying to send multiple insert statements to PutHiveQL via the ReplaceText processor. Each insert statement is a flowfile coming out of ReplaceText. I set the Batch Size in PutHiveQL to 100; however, it seems it still sends one flowfile at a time. How do I best implement this batching?
I don't think the PutHiveQL processor batches statements at the JDBC layer as you expect, not in the way that processors like PutSQL do. From the code, it looks like the Batch Size property is used to control how many flowfiles the processor works on before yielding, but the statements for each flowfile are still executed individually.
That might be a good topic for a NiFi feature request.
The version of Hive supported by NiFi doesn't allow for batching/transactions. The Batch Size parameter is meant to try to move multiple incoming flow files a bit faster than having the processor invoked every so often. So if you schedule the PutHiveQL processor for every 5 seconds with a Batch Size of 100, then every 5 seconds (if there are 100 flow files queued), the processor will attempt to process those during one "session".
Alternatively, you can specify a Batch Size of 0 or 1 and schedule the processor as fast as you like; unfortunately this will have no effect on the Hive side of things, as each HiveQL statement is auto-committed and that version of Hive doesn't support transactions or batching.
Another (possibly more performant) alternative is to put the entire set of rows as a CSV file into HDFS and use the HiveQL "LOAD DATA" DML statement to create a table on top of the data: https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-DMLOperations
