State Manager not persisting/retrieving data - apache-nifi

NiFi 1.1.1
I am trying to persist a byte[] using the State Manager.
private byte[] lsnUsedDuringLastLoad;

@Override
public void onTrigger(final ProcessContext context,
        final ProcessSession session) throws ProcessException {
    ...
    final StateManager stateManager = context.getStateManager();
    try {
        StateMap stateMap = stateManager.getState(Scope.CLUSTER);
        final Map<String, String> newStateMapProperties = new HashMap<>();
        newStateMapProperties.put(ProcessorConstants.LAST_MAX_LSN,
                new String(lsnUsedDuringLastLoad));
        logger.debug("Persisting stateMap : " + newStateMapProperties);
        stateManager.replace(stateMap, newStateMapProperties, Scope.CLUSTER);
    } catch (IOException ioException) {
        logger.error("Error while persisting the state to NiFi", ioException);
        throw new ProcessException("The state (LSN) couldn't be persisted", ioException);
    }
    ...
}
I don't get any exception or even an error log entry, and the processor continues to run.
The following load code always returns a null value for the persisted field (the log shows "Retrieved the statemap : {}"):
try {
    stateMap = stateManager.getState(Scope.CLUSTER);
    stateMapProperties = new HashMap<>(stateMap.toMap());
    logger.debug("Retrieved the statemap : " + stateMapProperties);
    lastMaxLSN = (stateMapProperties.get(ProcessorConstants.LAST_MAX_LSN) == null
            || stateMapProperties.get(ProcessorConstants.LAST_MAX_LSN).isEmpty())
                    ? null
                    : stateMapProperties.get(ProcessorConstants.LAST_MAX_LSN).getBytes();
    logger.debug("Attempted to load the previous lsn from NiFi state : " + lastMaxLSN);
} catch (IOException ioe) {
    logger.error("Couldn't load the state map", ioe);
    throw new ProcessException(ioe);
}
I am wondering whether ZooKeeper is at fault or I have missed something while using the StateMap.

The docs for replace say:
"Updates the value of the component's state to the new value if and only if the value currently is the same as the given oldValue."
https://github.com/apache/nifi/blob/master/nifi-api/src/main/java/org/apache/nifi/components/state/StateManager.java#L79-L92
I would suggest something like this:
if (stateMap.getVersion() == -1) {
    stateManager.setState(stateMapProperties, Scope.CLUSTER);
} else {
    stateManager.replace(stateMap, stateMapProperties, Scope.CLUSTER);
}
The first time through, when you retrieve the state, the version should be -1 because nothing has ever been stored, and in that case you use setState; on every run after that you can use replace.
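Applied to the persist block from the question, a minimal (untested) sketch could look like this; ProcessorConstants, logger and lsnUsedDuringLastLoad are the names from the question:
final StateManager stateManager = context.getStateManager();
try {
    final StateMap stateMap = stateManager.getState(Scope.CLUSTER);
    final Map<String, String> newStateMapProperties = new HashMap<>(stateMap.toMap());
    newStateMapProperties.put(ProcessorConstants.LAST_MAX_LSN, new String(lsnUsedDuringLastLoad));

    if (stateMap.getVersion() == -1) {
        // Nothing has ever been stored for this component, so an initial setState is required.
        stateManager.setState(newStateMapProperties, Scope.CLUSTER);
    } else if (!stateManager.replace(stateMap, newStateMapProperties, Scope.CLUSTER)) {
        // replace() returns false when the state changed since getState(); handle the conflict as needed.
        logger.warn("State was updated concurrently; the LSN was not persisted this time");
    }
} catch (IOException ioException) {
    throw new ProcessException("The state (LSN) couldn't be persisted", ioException);
}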

The idea behind replace() and its return value is to be able to react to conflicts. Another task on the same node or on another node (in a cluster) might have changed the state in the meantime. When replace() returns false, you can react to the conflict, sort out what can be sorted out automatically, and inform the user when it cannot be sorted out.
This is the code I use:
/**
 * Set or replace a key-value pair in the state, cluster wide. In case of a conflict, it retries setting the state
 * when the given key does not yet exist in the map. If the key exists and its value is equal to the given value,
 * it does nothing. Otherwise it fails and returns false.
 *
 * @param stateManager that controls state cluster wide.
 * @param key of the key-value pair to be put in the state map.
 * @param value of the key-value pair to be put in the state map.
 * @return true if the state map contains the key with a value equal to the given value, possibly set by this function;
 *         false if a conflict occurred and the key-value pair is different.
 * @throws IOException if the underlying state mechanism throws an exception.
 */
private boolean setState(StateManager stateManager, String key, String value) throws IOException {
    boolean somebodyElseUpdatedWithoutConflict;
    do {
        somebodyElseUpdatedWithoutConflict = false; // reset on every attempt so the loop ends after a successful replace
        StateMap stateMap = stateManager.getState(Scope.CLUSTER);
        // While the next two lines run, another thread might change the state.
        Map<String, String> map = new HashMap<>(stateMap.toMap()); // make mutable
        String oldValue = map.put(key, value);
        if (!stateManager.replace(stateMap, map, Scope.CLUSTER)) {
            // Conflict happened. Sort out which action to take.
            if (oldValue == null) {
                somebodyElseUpdatedWithoutConflict = true; // a different key was changed, retry
            } else if (oldValue.equals(value)) {
                break; // lazy case, the value is already set
            } else {
                return false; // unsolvable conflict
            }
        }
    } while (somebodyElseUpdatedWithoutConflict);
    return true;
}
You can replace the part after // Conflict happened... with whatever conflict resolution you need.
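For example, a caller could then persist a value and react to an unresolvable conflict like this (a small sketch, reusing the key name from the question):
// Persist the latest LSN; false means another node stored a different value for the same key.
if (!setState(stateManager, ProcessorConstants.LAST_MAX_LSN, new String(lsnUsedDuringLastLoad))) {
    logger.warn("Another node stored a different LSN; the state was not overwritten");
}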

Related

How to modify/update the data before sending it downstream

I have a topic which has data in the format
{
before: {...},
after: {...},
source: {...},
op: 'u'
}
The data was produced by Debezium. I want to send the data to a SQL Server table, so I selected the JDBC Sink Connector. I need to process the data before sending it downstream.
Logic that needs to be applied:
if op = 'u' or op = 'c' or op = 'r' // update or insert or snapshot
select all the fields present in 'after' and perform upsert to downstream.
if op = 'd' // delete
select all the fields present in 'before' + add a field IsActive=false and perform upsert to downstream.
How can I achieve this?
If it is not mandatory for you to receive the complex Debezium message in the Kafka topic, check Debezium's New Record State Extraction SMT. You'll need to configure it in Debezium's connector configuration, and if you use it with delete.handling.mode:rewrite you will get a field __deleted in your messages, which will serve the purpose of the IsActive field you have indicated in your question.
The simplified format of the messages you will receive in Kafka will match the format that the JDBC sink connector expects, although you might still need to apply some of the Single Message Transforms for Confluent Platform to the JDBC sink connector's configuration in order to filter some fields, replace some fields, etc.
As a side benefit, you'll also get much less data to kafka.
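As a rough sketch of what that could look like in the Debezium source connector configuration (property names as I recall them from the Debezium docs; double-check them for your connector version):
transforms=unwrap
transforms.unwrap.type=io.debezium.transforms.ExtractNewRecordState
# keep deleted rows as regular records and add a __deleted=true field instead of dropping them
transforms.unwrap.delete.handling.mode=rewrite
transforms.unwrap.drop.tombstones=true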
I was able to achieve this using a custom transform in the JDBC sink connector.
I extracted the after field and the op field and applied the logic. There is no direct way to update the record, i.e. there is no method to set the schema and the value on an existing record, so I used reflection to update them.
Below are the code snippets:
private final ExtractField<R> afterDelegate = new ExtractField.Value<R>();
private final ExtractField<R> beforeDelegate = new ExtractField.Value<R>();
private final ExtractField<R> operationDelegate = new ExtractField.Value<R>();

public R apply(R record) {
    R operationRecord = operationDelegate.apply(record);
    String op = String.valueOf(operationRecord.value());
    boolean isDeletedRecord = op.equalsIgnoreCase(Operation.DELETE.getValue());
    ...
    finalRecord = afterDelegate.apply(record);
    if (isDeletedRecord) {
        addDeletedFlag(finalRecord);
    }
    return finalRecord;
}

private void addDeletedFlag(R finalRecord) {
    // Rebuild the value schema with the extra boolean flag appended.
    final SchemaBuilder builder = SchemaBuilder.struct();
    builder.name(finalRecord.valueSchema().name());
    for (Field f : finalRecord.valueSchema().fields()) {
        builder.field(f.name(), f.schema());
    }
    builder.field(deleteFlagName, Schema.BOOLEAN_SCHEMA).optional();
    Schema newValueSchema = builder.build();
    try {
        // ConnectRecord has no setter for valueSchema, so overwrite the private field via reflection.
        java.lang.reflect.Field s = finalRecord.getClass().getSuperclass().getDeclaredField("valueSchema");
        s.setAccessible(true);
        s.set(finalRecord, newValueSchema);
    } catch (Exception e) {
        e.printStackTrace();
    }
    Struct s = (Struct) finalRecord.value();
    updateValueSchema(s, finalRecord.valueSchema());
    updateValue(finalRecord.value(), true);
}

private void updateValueSchema(Object o, Schema newSchema) {
    if (!(o instanceof Struct)) {
        return;
    }
    Struct value = (Struct) o;
    try {
        java.lang.reflect.Field s = value.getClass().getDeclaredField("schema");
        s.setAccessible(true);
        s.set(value, newSchema);
    } catch (Exception e) {
        e.printStackTrace();
    }
}

private void updateValue(Object o, Object newValue) {
    if (!(o instanceof Struct)) {
        return;
    }
    Struct value = (Struct) o;
    try {
        // Append the flag value to the Struct's internal values array.
        java.lang.reflect.Field s = value.getClass().getDeclaredField("values");
        s.setAccessible(true);
        Object[] newValueArray = ((Object[]) s.get(value)).clone();
        List<Object> newValueList = new ArrayList<>(Arrays.asList(newValueArray));
        newValueList.add(newValue);
        s.set(value, newValueList.toArray());
    } catch (Exception e) {
        e.printStackTrace();
    }
}
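For completeness, each ExtractField delegate has to be told which envelope field to extract. A sketch of how that configuration might look (the "after", "before" and "op" names come from the Debezium envelope, and "field" is the standard ExtractField setting):
@Override
public void configure(Map<String, ?> configs) {
    afterDelegate.configure(Collections.singletonMap("field", "after"));
    beforeDelegate.configure(Collections.singletonMap("field", "before"));
    operationDelegate.configure(Collections.singletonMap("field", "op"));
}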

Update KTable based on partial data attributes

I am trying to update a KTable with partial data of an object.
Eg. User object is
{"id":1, "name":"Joe", "age":28}
The object is being streamed into a topic and grouped by key into a KTable.
Now the user object is partially updated as follows, {"id":1, "age":33}, and streamed into the table. But the updated table looks as follows: {"id":1, "name":null, "age":28}.
The expected output is {"id":1, "name":"Joe", "age":33}.
How can I use Kafka Streams and Spring Cloud Stream to achieve the expected output? Any suggestions would be appreciated. Thanks.
Here is the code
@Bean
public Function<KStream<String, User>, KStream<String, User>> process() {
    return input -> input.map((key, user) -> new KeyValue<String, User>(user.getId(), user))
            .groupByKey(Grouped.with(Serdes.String(), new JsonSerde<>(User.class)))
            .reduce((user1, user2) -> {
                user1.merge(user2);
                return user1;
            }, Materialized.as("allusers"))
            .toStream();
}
and I modified the User object with the code below:
public void merge(Object newObject) {
    assert this.getClass().getName().equals(newObject.getClass().getName());
    for (Field field : this.getClass().getDeclaredFields()) {
        for (Field newField : newObject.getClass().getDeclaredFields()) {
            if (field.getName().equals(newField.getName())) {
                try {
                    field.set(this, newField.get(newObject) == null ? field.get(this) : newField.get(newObject));
                } catch (IllegalAccessException ignore) {
                }
            }
        }
    }
}
Is this the right approach or any other approach in KStreams?
I've tested your merge code, and it seems to be working as expected. But since your result after the reduce is {"id":1, "name":null, "age":28}, I can think of two things:
Your state isn't being updated at all, since no attribute has changed.
Maybe you have a serialization problem, since the string attribute is null, but the other int attributes are fine.
My guess is that, because you are mutating the original object and returning the same instance, Kafka Streams doesn't detect that as a change and won't store the new state. In fact, you shouldn't mutate your object at all, since it could lead to non-determinism depending on your pipeline.
Try to change your merge function to create a new User object, and see if the behavior changes.
So here is the recommended generic approach for merging the two objects; please feel free to comment. For this to work, the object being merged should have an empty constructor.
public <T> T mergeObjects(T first, T second) {
    Class<?> clazz = first.getClass();
    Field[] fields = clazz.getDeclaredFields();
    Object newObject = null;
    try {
        newObject = clazz.getDeclaredConstructor().newInstance();
        for (Field field : fields) {
            field.setAccessible(true);
            Object value1 = field.get(first);
            Object value2 = field.get(second);
            Object value = (value2 == null) ? value1 : value2;
            field.set(newObject, value);
        }
    } catch (InstantiationException | IllegalAccessException | IllegalArgumentException
            | InvocationTargetException | NoSuchMethodException | SecurityException e) {
        e.printStackTrace();
    }
    return (T) newObject;
}
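For reference, a sketch of how this could be plugged into the reduce shown earlier (assuming mergeObjects is reachable from the topology class); the new object returned by mergeObjects is what gets stored as the table state:
@Bean
public Function<KStream<String, User>, KStream<String, User>> process() {
    return input -> input.map((key, user) -> new KeyValue<>(user.getId(), user))
            .groupByKey(Grouped.with(Serdes.String(), new JsonSerde<>(User.class)))
            // Merge the partial update into the previous state without mutating either object.
            .reduce((previous, update) -> mergeObjects(previous, update), Materialized.as("allusers"))
            .toStream();
}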

How to repeat Job with Partitioner when data is dynamic with Spring Batch?

I am trying to develop a batch process using Spring Batch + Spring Boot (Java config), but I have a problem doing so. I have a piece of software that has a database and a Java API, and I read records from there. The batch process should retrieve all the documents whose expiration date is earlier than a certain date, update the date, and save them again in the same database.
My first approach was to read the records 100 by 100; the ItemReader retrieves 100 records, I process them one by one, and finally I write them again. In the reader, I put this code:
public class DocumentItemReader implements ItemReader<Document> {

    public List<Document> documents = new ArrayList<>();

    @Override
    public Document read() throws Exception, UnexpectedInputException, ParseException, NonTransientResourceException {
        if (documents.isEmpty()) {
            getDocuments(); // This method retrieves 100 documents and stores them in the "documents" list.
            if (documents.isEmpty()) return null;
        }
        Document doc = documents.get(0);
        documents.remove(0);
        return doc;
    }
}
So, with this code, the reader reads from the database until no records are found. When the getDocuments() method doesn't retrieve any documents, the list is empty and the reader returns null (so the job finishes). Everything worked fine here.
However, the problem appears if I want to use several threads. In this case, I started using the Partitioner approach instead of multi-threading. The reason for doing that is that I read from the same database, so if I repeat the full step with several threads, all of them will find the same records, and I cannot use pagination (see below).
Another problem is that database records are updated dynamically, so I cannot use pagination. For example, let's suppose I have 200 records, and all of them are going to expire soon, so the process is going to retrieve them. Now imagine I retrieve 10 with one thread, and before anything else, that thread processes one and updates it in the same database. The next thread cannot retrieve records 11 to 20, as the first record no longer appears in the search (it has been processed, its date has been updated, and it no longer matches the query).
It is a little difficult to understand, and some things may sound strange, but in my project:
I am forced to use the same database to read and write.
I can have millions of documents, so I cannot read all the records at the same time. I need to read them 100 by 100, or 500 by 500.
I need to use several threads.
I cannot use pagination, as the query to the database will retrieve different documents each time it is executed.
So, after hours of thinking, I believe the only possible solution is to repeat the job until the query retrieves no documents. Is this possible? I want to do something like what the step does: do something until null is returned - repeat the job until the query returns zero records.
If this is not a good approach, I will appreciate other possible solutions.
Thank you.
Maybe you can add a partitioner to your step that will:
Select all the ids of the data that needs to be updated (and other columns if needed)
Split them into x partitions (x = the gridSize parameter) and write them to temporary files (one per partition)
Register the file name to read in the executionContext
Then your reader no longer reads from the database but from the partition file (see the reader sketch after the example below).
It seems complicated but it is not that much work. Here is an example which handles millions of records using a JDBC query, but it can easily be transposed to your use case:
public class JdbcToFilePartitioner implements Partitioner {

    /** number of records per database fetch */
    private int fetchSize = 100;
    /** working directory */
    private File tmpDir;
    /** limit on the number of items to select */
    private Long nbItemMax;
    /** extra parameters to copy into every partition context (assumed to be injected elsewhere) */
    private Map<String, Object> contextParameters;
    /** Hibernate session backing the scrollable query (assumed to be injected elsewhere) */
    private Session session;

    @Override
    public Map<String, ExecutionContext> partition(final int gridSize) {
        // Create a context for each partition
        Map<String, ExecutionContext> executionsContexte = createExecutionsContext(gridSize);
        // Fill each partition file with the ids to handle
        getIdsAndFillPartitionFiles(executionsContexte);
        return executionsContexte;
    }

    /**
     * @param gridSize number of partitions
     * @return map of execution contexts, one for each partition
     */
    private Map<String, ExecutionContext> createExecutionsContext(final int gridSize) {
        final Map<String, ExecutionContext> map = new HashMap<>();
        for (int partitionId = 0; partitionId < gridSize; partitionId++) {
            map.put(String.valueOf(partitionId), createContext(partitionId));
        }
        return map;
    }

    /**
     * @param partitionId id of the partition to create a context for
     * @return the created executionContext
     */
    private ExecutionContext createContext(final int partitionId) {
        final ExecutionContext context = new ExecutionContext();
        String fileName = tmpDir + File.separator + "partition_" + partitionId + ".txt";
        context.put(PartitionerConstantes.ID_GRID.getCode(), partitionId);
        context.put(PartitionerConstantes.FILE_NAME.getCode(), fileName);
        if (contextParameters != null) {
            for (Entry<String, Object> entry : contextParameters.entrySet()) {
                context.put(entry.getKey(), entry.getValue());
            }
        }
        return context;
    }

    private void getIdsAndFillPartitionFiles(final Map<String, ExecutionContext> executionsContexte) {
        List<BufferedWriter> fileWriters = new ArrayList<>();
        try {
            // One BufferedWriter per partition
            for (int i = 0; i < executionsContexte.size(); i++) {
                BufferedWriter bufferedWriter = new BufferedWriter(new FileWriter(
                        executionsContexte.get(String.valueOf(i)).getString(PartitionerConstantes.FILE_NAME.getCode())));
                fileWriters.add(bufferedWriter);
            }
            // Fetch the data
            ScrollableResults results = runQuery();
            // Distribute the ids round-robin over the partition files
            int currentPartition = 0;
            int nbWriting = 0;
            while (results.next()) {
                fileWriters.get(currentPartition).write(results.get(0).toString());
                fileWriters.get(currentPartition).newLine();
                currentPartition++;
                nbWriting++;
                // Once we have written to every partition, start again at the first one
                if (currentPartition >= executionsContexte.size()) {
                    currentPartition = 0;
                }
                // Stop when the maximum number of items has been reached
                if (nbItemMax != null && nbItemMax != 0 && nbWriting >= nbItemMax) {
                    break;
                }
            }
            // closing
            results.close();
            session.close();
            for (BufferedWriter bufferedWriter : fileWriters) {
                bufferedWriter.close();
            }
        } catch (IOException | SQLException e) {
            throw new UnexpectedJobExecutionException("Error writing partition file", e);
        }
    }

    private ScrollableResults runQuery() {
        ...
    }
}
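To illustrate the reading side (this is not part of the original answer), a step-scoped reader could pick up the partition file name from the step execution context. Here the literal key "FILE_NAME" stands in for whatever PartitionerConstantes.FILE_NAME.getCode() returns:
@Bean
@StepScope
public FlatFileItemReader<Long> partitionFileReader(
        @Value("#{stepExecutionContext['FILE_NAME']}") String fileName) {
    // Each partition reads only the ids that the partitioner wrote to its own temporary file.
    return new FlatFileItemReaderBuilder<Long>()
            .name("partitionFileReader")
            .resource(new FileSystemResource(fileName))
            .lineMapper((line, lineNumber) -> Long.valueOf(line))
            .build();
}
The processor/writer of the partitioned step can then load each document by id and update it.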

How to handle duplicate messages using Kafka streaming DSL functions

My requirement is to skip or avoid duplicate messages (having the same key) received from the INPUT topic, using the Kafka Streams DSL API.
There is a possibility of the source system sending duplicate messages to the INPUT topic in case of failures.
FLOW -
Source System --> INPUT Topic --> Kafka Streaming --> OUTPUT Topic
Currently I am using flatMap to generate multiple keys out of the payload, but flatMap is stateless, so I am not able to avoid duplicate message processing upon receiving from the INPUT topic.
I am looking for a DSL API which can skip duplicate records received from the INPUT topic and also generate multiple key/values before sending them to the OUTPUT topic.
I thought the exactly-once configuration would be useful here to deduplicate messages received from the INPUT topic based on keys, but it looks like it's not working; probably I did not understand the usage of exactly-once.
Could you please shed some light on it?
My requirement is to skip or avoid duplicate messages (having the same key) received from the INPUT topic, using the Kafka Streams DSL API.
Take a look at the EventDeduplication example at https://github.com/confluentinc/kafka-streams-examples, which does that. You can then adapt the example with the required flatMap functionality that is specific to your use case.
Here's the gist of the example:
final KStream<byte[], String> input = builder.stream(inputTopic);
final KStream<byte[], String> deduplicated = input.transform(
        // In this example, we assume that the record value as-is represents a unique event ID by
        // which we can perform de-duplication. If your records are different, adapt the extractor
        // function as needed.
        () -> new DeduplicationTransformer<>(windowSize.toMillis(), (key, value) -> value),
        storeName);
deduplicated.to(outputTopic);
and
/**
 * @param maintainDurationPerEventInMs how long to "remember" a known event (or rather, an event
 *                                     ID), during the time of which any incoming duplicates of
 *                                     the event will be dropped, thereby de-duplicating the
 *                                     input.
 * @param idExtractor extracts a unique identifier from a record by which we de-duplicate input
 *                    records; if it returns null, the record will not be considered for
 *                    de-duping but forwarded as-is.
 */
DeduplicationTransformer(final long maintainDurationPerEventInMs, final KeyValueMapper<K, V, E> idExtractor) {
    if (maintainDurationPerEventInMs < 1) {
        throw new IllegalArgumentException("maintain duration per event must be >= 1");
    }
    leftDurationMs = maintainDurationPerEventInMs / 2;
    rightDurationMs = maintainDurationPerEventInMs - leftDurationMs;
    this.idExtractor = idExtractor;
}

@Override
@SuppressWarnings("unchecked")
public void init(final ProcessorContext context) {
    this.context = context;
    eventIdStore = (WindowStore<E, Long>) context.getStateStore(storeName);
}

public KeyValue<K, V> transform(final K key, final V value) {
    final E eventId = idExtractor.apply(key, value);
    if (eventId == null) {
        return KeyValue.pair(key, value);
    } else {
        final KeyValue<K, V> output;
        if (isDuplicate(eventId)) {
            output = null;
            updateTimestampOfExistingEventToPreventExpiry(eventId, context.timestamp());
        } else {
            output = KeyValue.pair(key, value);
            rememberNewEvent(eventId, context.timestamp());
        }
        return output;
    }
}

private boolean isDuplicate(final E eventId) {
    final long eventTime = context.timestamp();
    final WindowStoreIterator<Long> timeIterator = eventIdStore.fetch(
            eventId,
            eventTime - leftDurationMs,
            eventTime + rightDurationMs);
    final boolean isDuplicate = timeIterator.hasNext();
    timeIterator.close();
    return isDuplicate;
}

private void updateTimestampOfExistingEventToPreventExpiry(final E eventId, final long newTimestamp) {
    eventIdStore.put(eventId, newTimestamp, newTimestamp);
}

private void rememberNewEvent(final E eventId, final long timestamp) {
    eventIdStore.put(eventId, timestamp, timestamp);
}

@Override
public void close() {
    // Note: The store should NOT be closed manually here via `eventIdStore.close()`!
    // The Kafka Streams API will automatically close stores when necessary.
}
}
I am looking for a DSL API which can skip duplicate records received from the INPUT topic and also generate multiple key/values before sending them to the OUTPUT topic.
The DSL doesn't include such functionality out of the box, but the example above shows how you can easily build your own de-duplication logic by combining the DSL with the Processor API of Kafka Streams, with the use of Transformers.
I thought the exactly-once configuration would be useful here to deduplicate messages received from the INPUT topic based on keys, but it looks like it's not working; probably I did not understand the usage of exactly-once.
As Matthias J. Sax mentioned in his answer, from Kafka's perspective these "duplicates" are not duplicates from the point of view of its exactly-once processing semantics. Kafka ensures that it will not introduce any such duplicates itself, but it cannot make such decisions out of the box for upstream data sources, which are a black box for Kafka.
Exactly-once can be used to ensure that consuming and processing an input topic does not result in duplicates in the output topic. However, from an exactly-once point of view, the duplicates in the input topic that you describe are not really duplicates but two regular input messages.
To remove input topic duplicates, you can use a transform() step with an attached state store (there is no built-in operator in the DSL that does what you want). For each input record, you first check if you find the corresponding key in the store. If not, you add it to the store and forward the message. If you find it in the store, you drop the input as a duplicate. Note that this will only work with a 100% correctness guarantee if you enable exactly-once processing in your Kafka Streams application. Otherwise, even if you try to deduplicate, Kafka Streams could re-introduce duplicates in case of a failure.
Additionally, you need to decide how long you want to keep entries in the store. You could use a punctuation to remove old data from the store if you are sure that no further duplicates can be in the input topic. One way to do this would be to store the record timestamp (or maybe the offset) in the store, too. This way, you can compare the current time with the stored record time within punctuate() and delete old records (i.e., you would iterate over all entries in the store via store#all()).
After the transform() you apply your flatMap() (or you could also merge your flatMap() code into transform() directly).
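A minimal sketch of that idea (not from the original answer): a Transformer backed by a plain key-value store, with a wall-clock punctuation that expires old entries. The store name "dedup-store" is illustrative and assumes the store has been added to the topology via builder.addStateStore(...):
import java.time.Duration;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.kstream.Transformer;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.processor.PunctuationType;
import org.apache.kafka.streams.state.KeyValueIterator;
import org.apache.kafka.streams.state.KeyValueStore;

public class KeyDeduplicationTransformer<K, V> implements Transformer<K, V, KeyValue<K, V>> {

    private final long retentionMs;
    private ProcessorContext context;
    private KeyValueStore<K, Long> seenKeys;

    public KeyDeduplicationTransformer(final long retentionMs) {
        this.retentionMs = retentionMs;
    }

    @Override
    @SuppressWarnings("unchecked")
    public void init(final ProcessorContext context) {
        this.context = context;
        this.seenKeys = (KeyValueStore<K, Long>) context.getStateStore("dedup-store");
        // Periodically purge keys that are older than the retention period.
        context.schedule(Duration.ofMinutes(1), PunctuationType.WALL_CLOCK_TIME, now -> {
            try (KeyValueIterator<K, Long> it = seenKeys.all()) {
                while (it.hasNext()) {
                    KeyValue<K, Long> entry = it.next();
                    if (now - entry.value > retentionMs) {
                        seenKeys.delete(entry.key);
                    }
                }
            }
        });
    }

    @Override
    public KeyValue<K, V> transform(final K key, final V value) {
        if (seenKeys.get(key) != null) {
            return null; // key already seen within the retention period, drop the duplicate
        }
        seenKeys.put(key, context.timestamp());
        return KeyValue.pair(key, value);
    }

    @Override
    public void close() {
    }
}
It would be wired in before the flatMap(), for example input.transform(() -> new KeyDeduplicationTransformer<>(retentionMs), "dedup-store").flatMap(...).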
It is achievable with the DSL only as well, using a SessionWindows changelog without caching:
Wrap the value with a duplicate flag
Turn the flag to true in reduce() within the time window
Filter out values with a true flag
Unwrap the original key and value
Topology:
Serde<K> keySerde = ...;
Serde<V> valueSerde = ...;
Duration dedupWindowSize = ...;
Duration gracePeriod = ...;
DedupValueSerde<V> dedupValueSerde = new DedupValueSerde<>(valueSerde);
new StreamsBuilder()
        .stream("input-topic", Consumed.with(keySerde, valueSerde))
        .mapValues(v -> new DedupValue<>(v, false))
        .groupByKey()
        .windowedBy(SessionWindows.ofInactivityGapAndGrace(dedupWindowSize, gracePeriod))
        .reduce(
                (value1, value2) -> new DedupValue<>(value1.value(), true),
                Materialized
                        .<K, DedupValue<V>, SessionStore<Bytes, byte[]>>with(keySerde, dedupValueSerde)
                        .withCachingDisabled()
        )
        .toStream()
        .filterNot((wk, dv) -> dv == null || dv.duplicate())
        .selectKey((wk, dv) -> wk.key())
        .mapValues(DedupValue::value)
        .to("output-topic", Produced.with(keySerde, valueSerde));
Value wrapper:
record DedupValue<V>(V value, boolean duplicate) { }
Value wrapper SerDe (example):
public class DedupValueSerde<V> extends WrapperSerde<DedupValue<V>> {

    public DedupValueSerde(Serde<V> vSerde) {
        super(new DvSerializer<>(vSerde.serializer()), new DvDeserializer<>(vSerde.deserializer()));
    }

    private record DvSerializer<V>(Serializer<V> vSerializer) implements Serializer<DedupValue<V>> {
        @Override
        public byte[] serialize(String topic, DedupValue<V> data) {
            byte[] vBytes = vSerializer.serialize(topic, data.value());
            return ByteBuffer
                    .allocate(vBytes.length + 1)
                    .put(data.duplicate() ? (byte) 1 : (byte) 0)
                    .put(vBytes)
                    .array();
        }
    }

    private record DvDeserializer<V>(Deserializer<V> vDeserializer) implements Deserializer<DedupValue<V>> {
        @Override
        public DedupValue<V> deserialize(String topic, byte[] data) {
            ByteBuffer buffer = ByteBuffer.wrap(data);
            boolean duplicate = buffer.get() == (byte) 1;
            int remainingSize = buffer.remaining();
            byte[] vBytes = new byte[remainingSize];
            buffer.get(vBytes);
            V value = vDeserializer.deserialize(topic, vBytes);
            return new DedupValue<>(value, duplicate);
        }
    }
}

Copy Document/Page excluding field/column or setting new value

I'm using version 8 of Kentico and I have a custom document/page that has a unique numeric identity field. Unfortunately this data comes from an existing source, and because I cannot set the primary key ID of the page's coupled data when using the API, I was forced to have this separate field.
I ensure the field is new and unique during the DocumentEvents.Insert.Before event using node.SetValue("ItemIdentifier", newIdentifier); if the node's class name matches, etc. So that workflow is handled as well, I also implemented the same method for WorkflowEvents.SaveVersion.Before.
This works great when creating a new item; however, if we attempt to copy an existing node, the source identifier remains unchanged. I was hoping I could exclude the field from being copied, but have yet to find an example of that.
So I went ahead and implemented a solution to ensure a new identifier is created when a node is being copied by handling the DocumentEvents.Copy.Before and DocumentEvents.Copy.After.
Unfortunately, in my case the e.Node from these event args is useless. I could not for the life of me get the field modified, and when I opened ILSpy I realized why: the node copy method always grabs a fresh copy of the node from the database, which renders DocumentEvents.Copy.Before useless if you want to modify fields before a node is copied.
So instead I pass the identifier along in a RequestStockHelper entry that the Insert event, further down the cycle, handles to generate a new identifier for the cloned node.
Unfortunately, unbeknownst to me, if we copy a published node, the value in the database is correct, but its NodeXML value is not.
This sounds like a Kentico bug to me: it is either retaining the source node's NodeXML/version, or for some reason node.SetValue("ItemIdentifier", newIdentifier); is not working properly in WorkflowEvents.SaveVersion.Before since it's a published and workflowed node.
Has anyone come across a similar issue? Is there any other way I can configure a field to be a unique numeric identity field that is not the primary key and is automatically incremented when inserted? Or exclude a field from the copy procedure?
As a possible solution, could you create a new document in DocumentEvents.Copy.Before and copy the values over from the copied document, then cancel the copy event itself?
OK, it turns out this is not a Kentico issue but rather the way versions are saved.
If you want to compute a unique value in DocumentEvents.Insert.Before, you need to pass it along to WorkflowEvents.SaveVersion.Before, because the node that is sent to the latter is the same as the original from the former; i.e., whatever changes you make to the node in Insert are not carried over to SaveVersion, so you need to handle this manually.
So here's the pseudo code that handles the copy scenario and insert of a new item of compiled type CineDigitalAV:
protected override void OnInit()
{
    base.OnInit();
    DocumentEvents.Insert.Before += Insert_Before;
    DocumentEvents.Copy.Before += Copy_Before;
    WorkflowEvents.SaveVersion.Before += SaveVersion_Before;
}

private void Copy_Before(object sender, DocumentEventArgs e)
{
    if (e.Node != null)
    {
        SetCopyCineDigitalIdentifier(e.Node);
    }
}

private void SaveVersion_Before(object sender, WorkflowEventArgs e)
{
    if (e.Document != null)
    {
        EnsureCineDigitalIdentifier(e.Document);
    }
}

private void Insert_Before(object sender, DocumentEventArgs e)
{
    if (e.Node != null)
    {
        EnsureCineDigitalIdentifier(e.Node);
    }
}

private void SetCopyCineDigitalIdentifier(TreeNode node)
{
    int identifier = 0;
    if (node.ClassName == CineDigitalAV.CLASS_NAME)
    {
        identifier = node.GetValue<int>("AVCreation_Identifier", 0);
        // flag the next insert to create a new identifier
        if (identifier > 0)
            RequestStockHelper.Add("Copy-Identifier-" + identifier, true);
    }
}

private void EnsureCineDigitalIdentifier(TreeNode node)
{
    int identifier = 0;
    if (node.ClassName == CineDigitalAV.CLASS_NAME)
    {
        identifier = node.GetValue<int>("AVCreation_Identifier", 0);
    }
    if (identifier == 0 || (identifier != 0 && RequestStockHelper.Contains("Copy-Identifier-" + identifier)))
    {
        // generate a new identifier for new items or those being copied
        RequestStockHelper.Remove("Copy-Identifier-" + identifier);
        int newIdentifier = GetNewCineDigitalIdentifierAV(node.NodeSiteName);
        node.SetValue("AVCreation_Identifier", newIdentifier);
        // store the new identifier so that SaveVersion includes it
        RequestStockHelper.Add("Version-Identifier-" + identifier, newIdentifier);
    }
    else if (RequestStockHelper.Contains("Version-Identifier-" + identifier))
    {
        // handle SaveVersion with the value from the insert
        int newIdentifier = ValidationHelper.GetInteger(RequestStockHelper.GetItem("Version-Identifier-" + identifier), 0);
        RequestStockHelper.Remove("Version-Identifier-" + identifier);
        node.SetValue("AVCreation_Identifier", newIdentifier);
    }
}

private int GetNewCineDigitalIdentifierAV(string siteName)
{
    return (DocumentHelper.GetDocuments<CineDigitalAV>()
        .OnSite(siteName)
        .Published(false)
        .Columns("AVCreation_Identifier")
        .OrderByDescending("AVCreation_Identifier")
        .FirstObject?
        .AVCreation_Identifier ?? 0) + 1;
}
