Tombstone message handling in KGroupedTable - apache-kafka-streams

For a KGroupedTable aggregation, the documentation says:
"When a tombstone record – i.e. a record with a null value – is received for a key (e.g., DELETE), then only the subtractor is called. Note that, whenever the subtractor returns a null value itself, then the corresponding key is removed from the resulting KTable. If that happens, any next input record for that key will trigger the initializer again."...
How do we identify in the subtractor method whether it was called because of a tombstone message?
I receive the old value and the store value, just as would be the case for an update.
I want to know whether the call was triggered by a tombstone message, so I can skip it.
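For context, here is a minimal Java sketch of the kind of KGroupedTable aggregation the quoted documentation describes; the ordersTable source, the Order type, and its customerId()/amount() accessors are made up for illustration. On a tombstone for an existing key, only the subtractor is invoked with the old value:
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;

// ordersTable is assumed to be a KTable<String, Order> built elsewhere in the topology.
KTable<String, Long> totalsByCustomer = ordersTable
    .groupBy((orderId, order) -> KeyValue.pair(order.customerId(), order.amount()),
             Grouped.with(Serdes.String(), Serdes.Long()))
    .aggregate(
        () -> 0L,                                        // initializer
        (customerId, amount, agg) -> agg + amount,       // adder
        (customerId, oldAmount, agg) -> agg - oldAmount, // subtractor; the only callback on a tombstone
        Materialized.with(Serdes.String(), Serdes.Long()));
The subtractor has the same Aggregator signature for updates and for tombstones, which is exactly why the question arises.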

Related

Difference between keyedTable and keyedStreamTable as the output table generated by aggregation

When the output table generated by aggregation is a keyedTable versus a keyedStreamTable, the results are different.
When the aggregation engine writes its results into a table created with keyedTable versus keyedStreamTable, the behavior differs. The former receives the aggregated results, but it cannot be used as a data source for a larger period; the latter does not aggregate at all, and only captures the first record of the ticks data in each minute.
The code executed by the GUI is as follows:
barColNames=`ActionTime`InstrumentID`Open`High`Low`Close`Volume`Amount`OpenPosition`AvgPrice`TradingDay
barColTypes=[TIMESTAMP,SYMBOL,DOUBLE,DOUBLE,DOUBLE,DOUBLE,INT,DOUBLE,DOUBLE,DOUBLE,DATE]
Choose one of the following two lines of code; the results turn out to be inconsistent:
// Generate a 1-minute K line (barsMin01); this is an empty table
share keyedTable(`ActionTime`InstrumentID,100:0, barColNames, barColTypes) as barsMin01
// Option 1: aggregation works, but the table cannot be used as a data source for other periods
share keyedStreamTable(`ActionTime`InstrumentID,100:0, barColNames, barColTypes) as barsMin01
// Option 2: no aggregation effect; only the first tick of every minute is captured
// Define the aggregation metrics
metrics=<[first(LastPrice), max(LastPrice), min(LastPrice), last(LastPrice), sum(Volume), sum(Amount), sum(OpenPosition), sum(Amount)/sum(Volume)/300, last(TradingDay) ]>
// Aggregation engine: generate the 1-minute K line
nMin01=1*60000
tsAggrKlineMin01 = createTimeSeriesAggregator(name="aggr_kline_min01", windowSize=nMin01, step=nMin01, metrics=metrics, dummyTable=ticks, outputTable=barsMin01, timeColumn=`ActionTime, keyColumn=`InstrumentID,updateTime=500, useWindowStartTime=true)
// Subscribe; the 1-minute K line will be generated
subscribeTable(tableName="ticks", actionName="act_tsaggr_min01", offset=0, handler=append!{getStreamEngine("aggr_kline_min01")}, batchSize=1000, throttle=1, hash=0, msgAsTable=true)
There are some differences between keyedTable and keyedStreamTable:
keyedTable: When adding a new record to the table, the system will automatically check the primary key of the new record. If the primary key of the new record is the same as the primary key of the existing record, the corresponding record in the table will be updated.
keyedStreamTable: When adding a new record to the table, the system will automatically check the primary key of the new record. If the primary key of the new record is the same as the primary key of the existing record, the corresponding record will not be updated.
That is, one of them is for updating and the other is for filtering.
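For illustration, a minimal DolphinDB sketch with made-up table and column names: inserting a second row with an existing key updates the row in a keyedTable, while a keyedStreamTable silently drops it.
// Hypothetical example: identical inserts into both table types
kt  = keyedTable(`id, 100:0, `id`price, [INT, DOUBLE])
kst = keyedStreamTable(`id, 100:0, `id`price, [INT, DOUBLE])
insert into kt values(1, 10.0)
insert into kt values(1, 20.0)   // same key: the existing row is updated, price becomes 20.0
insert into kst values(1, 10.0)
insert into kst values(1, 20.0)  // same key: the insert is ignored, price stays 10.0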
The keyedStreamTable behavior you mention ("does not play an aggregation role, but only intercepts the first record of ticks data per minute") happens precisely because you set updateTime=500 in createTimeSeriesAggregator. If updateTime is specified, the calculation may occur multiple times within the current window.
Since you use a keyedStreamTable here to receive the results, updateTime cannot be used. If you want to force-trigger the calculation, you can specify the forceTrigger parameter instead.

Epicor, sending an email when price is changed, but not the initial entry

I am trying to create a BPM that sends an email when a field is updated.
I have a condition checking if "The Field has been changed from 'any' to 'another'".
This works to fire off the email, but it also goes when the price in the sales order is initially created. How would I make it so that it only goes when the price is updated, but not originally set?
[BPM workflow screenshot]
By definition, a new record does not change from any value to another; it's just a new record. So it takes the false branch of your condition block. If you had the logic reversed, you'd only get the email when the field was changed from something to something else, but never when a new record is created.
To handle this, you should add another condition block that checks for added rows. If that's false, point that to the existing condition block you have there for the field changing from any to another.
Add another condition checking the field RowMod = "U" for updated rows.
Add a condition block after start that contains the following:
The Field had been changed from 0 to 'another'.
OR There is at least one added row in the OrderDtl table
Connect the false condition to your existing condition block. Remove your false condition connection and connect your true condition to the email. After that, the email will only be executed when the field is changed after being populated for the first time.
Resetting the price to zero will trigger an email, but the subsequent setting will not. If this is undesirable, you can mitigate it by adding a UD field to track "first time populated", or by enabling ChangeLog tracking and retraining users to avoid any undesirable behavior.

Persist calculated master-block field which depends on details block, within commit

I have a master-block with a details-block. One of the fields in the master-block holds a calculated value which depends on the details-block, and is persisted to the database.
The details-block has POST-INSERT, POST-UPDATE and POST-DELETE form triggers, in which the value of the master-block field is calculated and set:
MASTERBLOCK.FIELD1 := FUNC1; -- DB function that queries the detail block's table
When a form is committed, the following happens:
the master block is saved with the stale value
the details-block is saved
the form triggers are executed and the value of the master block is calculated and set.
the master-block field now contains the updated value, but the master-block's record status is not CHANGED and the updated value is not saved.
How can I force the persistence of the calculated field in the master-block?
"One of the fields in the master-block holds a calculated value which depends on the details-block"
Generally the ongoing maintenance of calculated totals exceeds the effort required to calculate them on-demand. But there are exceptions, so let's assume this is the case here.
I think this is your problem: --DB Function that queries the details block's table. Your processing is split between the client and the server in an unhelpful manner. A better approach would be to either:
maintain the total in the master block by capturing the relevant changes in the detail block as they happen (say in navigation triggers); or
calculate the total and update the master record in a database procedure, returning the total for display in the form.
It's not possible to give a definitive answer without knowing more about the specifics of your case. The key thing is you need to understand the concept of a Transaction as the Unit Of Work, and make sure that all the necessary changes are readied before the database issues the COMMIT.
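As a rough illustration of the second option, here is a hedged PL/SQL sketch; master_table, detail_table, and the column names are all hypothetical. The procedure recomputes the total from the detail rows, updates the master row within the same transaction, and returns the value so the form can display it.
-- Hypothetical names; adapt to the real master/detail tables.
CREATE OR REPLACE PROCEDURE refresh_master_total (
    p_master_id IN  master_table.id%TYPE,
    p_total     OUT master_table.total_amount%TYPE
) AS
BEGIN
    SELECT NVL(SUM(d.amount), 0)
      INTO p_total
      FROM detail_table d
     WHERE d.master_id = p_master_id;

    UPDATE master_table m
       SET m.total_amount = p_total
     WHERE m.id = p_master_id;
END refresh_master_total;
/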

Cassandra and Tombstones: Creating a Row, Deleting the Row, Recreating the Row = Performance?

Could someone please explain, what effect the following process has on tombstones:
1.) Creating a "Row" with Key "1" ("Fields": user, password, date)
2.) Deleting the "Row" with Key "1"
3.) Creating a "Row" with Key "1" ("Fields": user, password, logincount)
The sequence is executed in one thread sequentially (so this happens with a relatively high "speed" = no long pauses between the actions).
My Questions:
1.) What effect does this have on the creation of a tombstone? After step 2 a tombstone is created/exists. But what happens to the existing tombstone if the new (slightly changed) row is created again under the same key in step 3? Can Cassandra "reanimate" the tombstone efficiently?
2.) How much worse is the process described above compared to deleting only the "date" field in a targeted way and then creating the "logincount" field instead? (The latter will most likely be more performant, but it is also much more complex to find out which fields have been deleted, compared to simply deleting the whole row and recreating it from scratch with the correct data...)
Remark/Update:
What I actually want to do is set the "date" field to null, but this does not work in Cassandra: nulls are not allowed as values. So if I want to set it to null, I have to delete it instead. But I am afraid that this explicit second delete request will have a negative performance impact (compared to just setting it to null)... And, as described, I first have to find out which fields are nullified but previously had a value (I have to compare all attributes to determine this...).
Thank you very much!
Markus
I would like to belatedly clarify some things here.
First, with respect to Theodore's answer:
1) All rows have a tombstone field internally for simplicity, so when the new row is merged with the tombstone, it just becomes "row with new data, that also remembers that it was once deleted at time X." So there is no real penalty in that respect.
2) It is incorrect to say that "If you create and delete a column value rapidly enough that no flush takes place in the middle... the tombstone [is] simply discarded"; tombstones are always persisted, for correctness. Perhaps the situation Theodore was thinking of was the other way around: if you delete, then insert a new column value, the new column replaces the tombstone (just as it would replace any obsolete value). This is different from the row case, since the Column is the "atom" of storage.
3) Given (2), the delete-row-and-insert-new-one is likely to be more performant if there are many columns to be deleted over time. But for a single column the difference is negligible.
Finally, regarding Tyler's answer, in my opinion it is more idiomatic to simply delete the column in question than to change its value to an empty [byte]string.
1). If you delete the whole row, then the tombstone is still kept and is not reanimated by the subsequent insertion in step 3. This is because there may have been an insertion for the row a long time ago (e.g. step 0: key "1", field "name"). The "name" field of row "1" needs to stay deleted, while the "user" field of row "1" is reanimated.
2). If you create and delete a column value rapidly enough that no flush takes place in the middle, there is no performance impact. The column will be updated in-place in the Memtable, and the tombstone simply discarded. Only a single value will end up being written persistently to an SSTable.
However, if the Memtable is flushed to disk between steps 2 and 3, then the tombstone will be written to the resulting SSTable. A subsequent flush will write the new value to the next SSTable. This will make subsequent reads slower, since the column now needs to be read from both SSTables and reconciled. (Similarly if a flush occurs between steps 1 and 2.)
Just set the "date" column to hold an empty string. That's what's typically used instead of null.
If you want to delete the column, just delete the column explicitly instead of deleting the entire row. The performance effect of this is similar to writing an empty string for the column value.
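For illustration, a hedged CQL sketch (the users table and its columns are hypothetical): deleting a single column only tombstones that cell, whereas deleting the whole row writes a row-level tombstone.
-- Hypothetical schema: users(key PRIMARY KEY, user, password, date, logincount)
DELETE date FROM users WHERE key = '1';   -- column-level delete: tombstones only the "date" cell
DELETE FROM users WHERE key = '1';        -- row-level delete: tombstones the entire row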

ActiveRecord: ensure only one record has specific attribute value?

I have a Races table with many races in various states. But I need to ensure that only one race is marked as current = true. Here is what I have been using in the Race model validation.
# current: boolean
validate :only_one_current

private

def only_one_current
  if self.current && (Race.current_race.id != self.id)
    errors.add(:base, "Races can have only one current race")
  end
end
This seems to work most of the time, but occasionally it does not and I'm not sure why. When it doesn't work it disallows the saving of a new record with current = t just after a different record that was current is deleted. I think it has to do with AR's persistence.
There must be a better way to do this?
Your problem actually extends beyond ActiveRecord. No matter how you implement your before_save method, it will always be possible for a race condition to occur (no pun intended) and for two records to end up with current = true in the database. See the Concurrency and Integrity section of the validates_uniqueness_of documentation for more information.
The core problem is that the check for whether a record has current = true and the operation that sets the record to current = true are not atomic. This issue comes up often in concurrent systems.
To solve this, you need a unique key index in the database. I'd recommend that you change your current flag to a priority field. The priority is an integer which has a unique key index. The database will guarantee that no two records exist at the same time with the same priority value. The "current" race will always be the one that has the highest priority value.
The race condition will actually still exist - you now just have a way of detecting it. When you set a race to current (by querying the table for the largest priority value), an exception will be generated if another record currently holds the same priority value as the one you're trying to save. Simply catch the duplicate key exception and try again.
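A minimal sketch of this approach; the priority column, the make_current! helper, and the retry limit are all made up for illustration. The unique index enforces the invariant, and the duplicate-key exception is caught and retried.
# Migration sketch: a unique index on priority guarantees at most one race per priority value.
add_column :races, :priority, :integer
add_index  :races, :priority, unique: true

class Race < ActiveRecord::Base
  # The "current" race is the one with the highest priority.
  def self.current_race
    order(priority: :desc).first
  end

  # Hypothetical helper: claim the next-highest priority, retrying if another
  # process grabbed the same value first (the unique index raises RecordNotUnique).
  def make_current!
    attempts = 0
    begin
      update!(priority: (Race.maximum(:priority) || 0) + 1)
    rescue ActiveRecord::RecordNotUnique
      attempts += 1
      retry if attempts < 3
      raise
    end
  end
end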
You need to call this as before_save, not as a validator:
before_save :only_one_current
