Update KTable based on partial data attributes - apache-kafka-streams

I am trying to update a KTable with partial data of an object.
Eg. User object is
{"id":1, "name":"Joe", "age":28}
The object is being streamed into a topic and grouped by key into a KTable.
Now the user object is partially updated as {"id":1, "age":33} and streamed into the table, but the updated table then looks like {"id":1, "name":null, "age":28}.
The expected output is {"id":1, "name":"Joe", "age":33}.
How can I use Kafka Streams and Spring Cloud Stream to achieve the expected output? Any suggestions would be appreciated. Thanks.
Here is the code
@Bean
public Function<KStream<String, User>, KStream<String, User>> process() {
    return input -> input
            .map((key, user) -> new KeyValue<String, User>(user.getId(), user))
            .groupByKey(Grouped.with(Serdes.String(), new JsonSerde<>(User.class)))
            .reduce((user1, user2) -> {
                user1.merge(user2);
                return user1;
            }, Materialized.as("allusers"))
            .toStream();
}
and added the following merge method to the User object:
public void merge(Object newObject) {
    assert this.getClass().getName().equals(newObject.getClass().getName());
    for (Field field : this.getClass().getDeclaredFields()) {
        for (Field newField : newObject.getClass().getDeclaredFields()) {
            if (field.getName().equals(newField.getName())) {
                try {
                    field.set(this, newField.get(newObject) == null
                            ? field.get(this)
                            : newField.get(newObject));
                } catch (IllegalAccessException ignore) {
                }
            }
        }
    }
}
Is this the right approach, or is there another approach in KStreams?

I've tested your merge code, and it seems to work as expected. But since your result after the reduce is {"id":1, "name":null, "age":28}, I can think of two things:
1. Your state isn't being updated at all, since no attribute has changed.
2. You may have a serialization problem, since the String attribute is null while the int attributes are fine.
My guess is that, because you are mutating the original object and returning the same instance, Kafka Streams doesn't detect that as a change and won't store the new state. In any case, you shouldn't mutate your objects, since that can lead to non-determinism depending on your pipeline.
Try to change your merge function to create a new User object, and see if the behavior changes.
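For example, a minimal sketch of a non-mutating merge (assuming User's fields are boxed types such as Integer, so an absent attribute is represented by null, and that the usual getters and setters exist; none of this is shown in the question):

// Always build a fresh instance instead of mutating the stored one.
public static User merge(User current, User update) {
    User merged = new User();
    merged.setId(update.getId() != null ? update.getId() : current.getId());
    merged.setName(update.getName() != null ? update.getName() : current.getName());
    merged.setAge(update.getAge() != null ? update.getAge() : current.getAge());
    return merged;
}

The reducer then becomes (current, update) -> User.merge(current, update), so the state store always receives a distinct new value.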

So here is the recommended generic approach for merging the two objects; please feel free to comment. For this to work, the objects being merged must have a no-argument constructor.
@SuppressWarnings("unchecked")
public <T> T mergeObjects(T first, T second) {
    Class<?> clazz = first.getClass();
    Field[] fields = clazz.getDeclaredFields();
    Object newObject = null;
    try {
        newObject = clazz.getDeclaredConstructor().newInstance();
        for (Field field : fields) {
            field.setAccessible(true);
            Object value1 = field.get(first);
            Object value2 = field.get(second);
            // Non-null fields of the second (newer) object win.
            Object value = (value2 == null) ? value1 : value2;
            field.set(newObject, value);
        }
    } catch (InstantiationException | IllegalAccessException | IllegalArgumentException
            | InvocationTargetException | NoSuchMethodException | SecurityException e) {
        e.printStackTrace();
    }
    return (T) newObject;
}
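For illustration, a hypothetical call (assuming User has a no-arg constructor and conventional setters; the exact accessors are not shown in the question):

User stored = new User();
stored.setId("1"); stored.setName("Joe"); stored.setAge(28);

User partial = new User();
partial.setId("1"); partial.setAge(33); // name deliberately left null

User merged = mergeObjects(stored, partial);
// merged now holds {"id":"1", "name":"Joe", "age":33}: non-null fields of the
// second (newer) object win, everything else is carried over from the first.

Two caveats worth noting: getDeclaredFields() does not include inherited fields, and final fields cannot be reassigned this way, so the approach fits flat POJOs best.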

Related

How is it possible that the Map finds the right element when the hashCode() of that element has changed?

From my previous question, Hibernate: Cannot fetch data back to Map<>, I was getting a NullPointerException when I tried to fetch the data back. I thought the reason was the primary key: when the entity was added to the Map via put(K,V), the primary key was null, but the JPA persist generated the primary key and thus changed the hashCode(). I had these equals and hashCode implementations:
User.java:
@Override
public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof User)) return false;
    User user = (User) o;
    return Objects.equals(id, user.id) && Objects.equals(username, user.username)
            && Objects.equals(about, user.about) && Objects.equals(friendships, user.friendships)
            && Objects.equals(posts, user.posts);
}

@Override
public int hashCode() {
    return Objects.hash(id, username, about, friendships, posts);
}
I used all fields in the hash calculation. That caused the NullPointerException, but not because of id (the primary key): it was because of the collections involved in the hash (friendships and posts). So I changed both methods to use only database identity:
@Override
public boolean equals(Object o) {
    if (this == o) return true;
    if (id == null) return false;
    if (!(o instanceof User)) return false;
    User user = (User) o;
    return this.id.equals(user.getId());
}

@Override
public int hashCode() {
    return id == null ? System.identityHashCode(this) : id.hashCode();
}
So now only the id field is involved in the hash, and fetching the data no longer gave me a NullPointerException. I used this code to test it:
(from User.java):
public void addFriend(User friend) {
    Friendship friendship = new Friendship();
    friendship.setOwner(this);
    friendship.setFriend(friend);
    this.friendships.put(friend, friendship);
}
DemoApplication.java:
@Bean
public CommandLineRunner dataLoader(UserRepository userRepo, FriendshipRepository friendshipRepo) {
    return new CommandLineRunner() {
        @Override
        public void run(String... args) throws Exception {
            User f1 = new User("friend1");
            User f2 = new User("friend2");
            User u1 = new User("user1");
            System.out.println(f1);
            System.out.println(f1.hashCode());
            u1.addFriend(f1);
            u1.addFriend(f2);
            userRepo.save(u1);
            User fetchedUser = userRepo.findByUsername("user1");
            System.out.println(fetchedUser.getFriendships().get(f1).getFriend());
            System.out.println(fetchedUser.getFriendships().get(f1).getFriend().hashCode());
        }
    };
}
You can see I am:
1. putting the f1 User into a friendship of user1 (the owner of the friendship), at which point f1.getId() == null
2. saving user1, at which point f1's id gets assigned its primary key value by Hibernate (the friendship relation is Cascade.ALL, so persisting cascades too)
3. fetching the f1 User back by getting it from the Map, which does the look-up with the hashCode, which is now broken because f1.getId() != null.
But even then, I got the right element. The output:
User{id=null, username='friend1', about='null', friendships={}, posts=[]}
-935581894
...
User{id=3, username='friend1', about='null', friendships={}, posts=[]}
3
As you can see: the id is null, then 3, and the hashCode is -935581894, then 3... So how is it possible that I was able to get the right element?
Not all Map implementations use the hashCode (a TreeMap, for example, does not use it; it uses a Comparator to sort entries into a tree).
So I would first check that Hibernate is not replacing the field:
private Map<User, Friendship> friendships = new HashMap<>();
with its own implementation of Map.
Then, even if Hibernate keeps the HashMap and the hashCode of the object changed, you might be lucky: the old and new hashCodes can land in the same bucket of the HashMap.
As the object is the same (the Hibernate session guarantees that), the equals used to find the object within the bucket will work. (If a bucket holds more than 8 elements, it is converted from a linked list into a balanced tree ordered on hashCode; in that case the entry would not be found, but your map seems to hold only 2-3 elements, so that can't be the case here.)
Now I understood your question.
Looking at the Map documentation we read the following:
Note: great care must be exercised if mutable objects are used as map keys. The behavior of a map is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is a key in the map.
It looks like there is no definitive answer for this, and as @Thierry already said, it seems you just got lucky. The key takeaway is: do not use mutable objects as Map keys.
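The failure mode is easy to reproduce outside Hibernate. A minimal sketch (a hypothetical MutableKey class, not the question's entities):

import java.util.HashMap;
import java.util.Map;

// Why mutable keys break HashMap: the entry stays in the bucket chosen by the
// hash at insertion time, so a later get() probes the wrong bucket.
class MutableKey {
    int id;

    @Override
    public int hashCode() { return id; }

    @Override
    public boolean equals(Object o) {
        return o instanceof MutableKey && ((MutableKey) o).id == id;
    }

    public static void main(String[] args) {
        Map<MutableKey, String> map = new HashMap<>();
        MutableKey key = new MutableKey();
        map.put(key, "value");   // stored in the bucket for hashCode() == 0
        key.id = 42;             // the mutation changes the hash after insertion
        System.out.println(map.get(key));         // null: the wrong bucket is probed
        System.out.println(map.containsKey(key)); // false
    }
}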

How to retrieve data by property in Couchbase Lite?

My documents have a docType property that separates them based on the purpose of each type, in this specific case template or audit. However, when I do the following:
document.getProperty("docType").equals("template");
document.getProperty("docType").equals("audit");
the results are always the same: every time it returns all stored documents, without filtering them by docType.
Below, you can check the query function.
public static Query getData(Database database, final String type) {
    View view = database.getView("data");
    if (view.getMap() == null) {
        view.setMap(new Mapper() {
            @Override
            public void map(Map<String, Object> document, Emitter emitter) {
                if (String.valueOf(document.get("docType")).equals(type)) {
                    emitter.emit(document.get("_id"), null);
                }
            }
        }, "4");
    }
    return view.createQuery();
}
Any hint?
This is not a valid way to do it. Your view's map function must be pure: it cannot reference external state such as type. Once the view is created, you can query it for what you want by setting start and end keys, or a set of keys in general, to filter on.
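In Couchbase Lite 1.x terms that could look roughly like this (an untested sketch: the view emits docType as the key, and the filtering moves to the query via setKeys):

public static Query getData(Database database, final String type) {
    View view = database.getView("data");
    if (view.getMap() == null) {
        view.setMap(new Mapper() {
            @Override
            public void map(Map<String, Object> document, Emitter emitter) {
                // Pure map function: no reference to external state.
                Object docType = document.get("docType");
                if (docType != null) {
                    emitter.emit(docType, null);
                }
            }
        }, "5"); // new version string so the view re-indexes
    }
    Query query = view.createQuery();
    // Filter at query time instead of inside the map function.
    query.setKeys(Collections.<Object>singletonList(type));
    return query;
}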

Parallel Stream repeating items

I am retrieving big chunks of data from a DB and using this data to write it somewhere else. To avoid a long processing time, I'm trying to use parallel streams to write it. When I run this as a sequential stream, it works perfectly. However, if I change it to parallel, the behavior is odd: it prints the same object multiple times (more than 10).
@PostConstruct
public void retrieveAllTypeRecords() throws SQLException {
    logger.info("Retrieve batch of Type records.");
    try {
        Stream<TypeRecord> typeQueryAsStream = jdbcStream.getTypeQueryAsStream();
        typeQueryAsStream.forEach((type) -> {
            // the same object gets printed here multiple times
            logger.info("Printing Type with field1: {} and field2: {}.", type.getField1(), type.getField2());
            // write this object somewhere else
        });
        logger.info("Completed full retrieval of Type data.");
    } catch (Exception e) {
        logger.error("error: " + e);
    }
}

public Stream<TypeRecord> getTypeQueryAsStream() throws SQLException {
    String sql = typeRepository.getQueryAllTypesRecords(); // retrieves the SQL query as a String
    TypeMapper typeMapper = new TypeMapper();
    JdbcStream.StreamableQuery query = jdbcStream.streamableQuery(sql);
    Stream<TypeRecord> stream = query.stream()
            .map(row -> typeMapper.mapRow(row)); // maps column values to object values
    return stream;
}
public class StreamableQuery implements Closeable {
    (...)

    public Stream<SqlRow> stream() throws SQLException {
        final SqlRowSet rowSet = new ResultSetWrappingSqlRowSet(preparedStatement.executeQuery());
        final SqlRow sqlRow = new SqlRowAdapter(rowSet);
        Supplier<Spliterator<SqlRow>> supplier = () -> Spliterators.spliteratorUnknownSize(new Iterator<SqlRow>() {
            @Override
            public boolean hasNext() {
                return !rowSet.isLast();
            }

            @Override
            public SqlRow next() {
                if (!rowSet.next()) {
                    throw new NoSuchElementException();
                }
                return sqlRow;
            }
        }, Spliterator.CONCURRENT);
        return StreamSupport.stream(supplier, Spliterator.CONCURRENT, true); // this boolean makes the stream parallel
    }
}
I've also tried using typeQueryAsStream.parallel().forEach((type) -> ...), but the result is the same.
Example of output:
[ForkJoinPool.commonPool-worker-1] INFO TypeService - Saving Type with field1: L6797 and field2: P1433.
[ForkJoinPool.commonPool-worker-1] INFO TypeService - Saving Type with field1: L6797 and field2: P1433.
[main] INFO TypeService - Saving Type with field1: L6797 and field2: P1433.
[ForkJoinPool.commonPool-worker-1] INFO TypeService - Saving Type with field1: L6797 and field2: P1433.
Well, look at your code:
final SqlRow sqlRow = new SqlRowAdapter(rowSet);
Supplier<Spliterator<SqlRow>> supplier = () -> Spliterators.spliteratorUnknownSize(new Iterator<SqlRow>() {
    …
    @Override
    public SqlRow next() {
        if (!rowSet.next()) {
            throw new NoSuchElementException();
        }
        return sqlRow;
    }
}, Spliterator.CONCURRENT);
You are returning the same object every time. You achieve your desired effect by implicitly modifying the state of this one object when calling rowSet.next().
This obviously can't work when multiple threads try to access that single object concurrently. Even buffering some items to hand them over to another thread will cause trouble. Such interference can therefore cause problems with sequential streams as well, as soon as stateful intermediate operations are involved, like sorted or distinct.
Assuming that typeMapper.mapRow(row) produces an actual data item with no interference with other data items, you should integrate this step into the stream source to create a valid stream.
public Stream<TypeRecord> stream(TypeMapper typeMapper) throws SQLException {
    SqlRowSet rowSet = new ResultSetWrappingSqlRowSet(preparedStatement.executeQuery());
    SqlRow sqlRow = new SqlRowAdapter(rowSet);
    Spliterator<TypeRecord> sp = new Spliterators.AbstractSpliterator<TypeRecord>(
            Long.MAX_VALUE, Spliterator.CONCURRENT | Spliterator.ORDERED) {
        @Override
        public boolean tryAdvance(Consumer<? super TypeRecord> action) {
            if (!rowSet.next()) return false;
            action.accept(typeMapper.mapRow(sqlRow));
            return true;
        }
    };
    return StreamSupport.stream(sp, true); // this boolean makes the stream parallel
}
Note that for a lot of use cases, like this one, implementing a Spliterator is simpler than implementing an Iterator (which needs to be wrapped via spliteratorUnknownSize anyway). There is also no need to encapsulate the instantiation in a Supplier.
As a final note, the current implementation does not perform well for streams with an unknown size, as it treats Long.MAX_VALUE like a very large number, ignoring the "unknown" semantics the specification assigns to it. Providing an estimated size will be very beneficial to the parallel performance; it doesn't need to be precise. In fact, with the current implementation, even a completely made-up number, say 1000, may perform better than correctly using Long.MAX_VALUE to denote an entirely unknown size.
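Applied to the sketch above, only the size argument changes (1000 is a made-up estimate; any rough row count will do):

// Same spliterator as before, but with a rough size estimate. The estimate only
// guides how the work is split across threads; it does not have to be exact.
Spliterator<TypeRecord> sp = new Spliterators.AbstractSpliterator<TypeRecord>(
        1000, Spliterator.CONCURRENT | Spliterator.ORDERED) {
    @Override
    public boolean tryAdvance(Consumer<? super TypeRecord> action) {
        if (!rowSet.next()) return false;
        action.accept(typeMapper.mapRow(sqlRow));
        return true;
    }
};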

State Manager not persisting/retrieving data

NiFi 1.1.1
I am trying to persist a byte[] using the State Manager.
private byte[] lsnUsedDuringLastLoad;
@Override
public void onTrigger(final ProcessContext context, final ProcessSession session) throws ProcessException {
    ...
    final StateManager stateManager = context.getStateManager();
    try {
        StateMap stateMap = stateManager.getState(Scope.CLUSTER);
        final Map<String, String> newStateMapProperties = new HashMap<>();
        newStateMapProperties.put(ProcessorConstants.LAST_MAX_LSN, new String(lsnUsedDuringLastLoad));
        logger.debug("Persisting stateMap : " + newStateMapProperties);
        stateManager.replace(stateMap, newStateMapProperties, Scope.CLUSTER);
    } catch (IOException ioException) {
        logger.error("Error while persisting the state to NiFi", ioException);
        throw new ProcessException("The state(LSN) couldn't be persisted", ioException);
    }
    ...
}
I don't get any exception or even an error log entry, and the processor continues to run.
The following load code always returns a null value (Retrieved the statemap : {}) for the persisted field:
try {
    stateMap = stateManager.getState(Scope.CLUSTER);
    stateMapProperties = new HashMap<>(stateMap.toMap());
    logger.debug("Retrieved the statemap : " + stateMapProperties);
    lastMaxLSN = (stateMapProperties.get(ProcessorConstants.LAST_MAX_LSN) == null
            || stateMapProperties.get(ProcessorConstants.LAST_MAX_LSN).isEmpty())
            ? null
            : stateMapProperties.get(ProcessorConstants.LAST_MAX_LSN).getBytes();
    logger.debug("Attempted to load the previous lsn from NiFi state : " + lastMaxLSN);
} catch (IOException ioe) {
    logger.error("Couldn't load the state map", ioe);
    throw new ProcessException(ioe);
}
I am wondering whether ZooKeeper is at fault, or whether I have missed something while using the State Map.
The docs for replace say:
"Updates the value of the component's state to the new value if and only if the value currently is the same as the given oldValue."
https://github.com/apache/nifi/blob/master/nifi-api/src/main/java/org/apache/nifi/components/state/StateManager.java#L79-L92
I would suggest something like this:
if (stateMap.getVersion() == -1) {
    stateManager.setState(stateMapProperties, Scope.CLUSTER);
} else {
    stateManager.replace(stateMap, stateMapProperties, Scope.CLUSTER);
}
The first time through, when you retrieve the state, the version should be -1 since nothing was ever stored before; in that case you use setState. All the times after that you can use replace.
The idea behind replace() and its return value is to be able to react to conflicts. Another task on the same node, or on another node in a cluster, might have changed the state in the meantime. When replace() returns false, you can react to the conflict: sort out what can be sorted out automatically, and inform the user when it cannot be.
This is the code I use:
/**
 * Set or replace a key-value pair in the cluster-wide state. In case of a conflict it
 * retries, as long as the given key does not yet exist in the map. If the key exists and
 * its value equals the given value, it does nothing. Otherwise it fails and returns false.
 *
 * @param stateManager that controls the cluster-wide state.
 * @param key of the key-value pair to be put in the state map.
 * @param value of the key-value pair to be put in the state map.
 * @return true if the state map contains the key with a value equal to the given value,
 *         possibly set by this function; false if a conflict occurred and the key-value
 *         pair is different.
 * @throws IOException if the underlying state mechanism throws an exception.
 */
private boolean setState(StateManager stateManager, String key, String value) throws IOException {
    boolean somebodyElseUpdatedWithoutConflict;
    do {
        somebodyElseUpdatedWithoutConflict = false; // reset each attempt, so a successful replace ends the loop
        StateMap stateMap = stateManager.getState(Scope.CLUSTER);
        // While the next two lines run, another thread might change the state.
        Map<String, String> map = new HashMap<String, String>(stateMap.toMap()); // make mutable
        String oldValue = map.put(key, value);
        if (!stateManager.replace(stateMap, map, Scope.CLUSTER)) {
            // Conflict happened. Sort out which action to take.
            if (oldValue == null)
                somebodyElseUpdatedWithoutConflict = true; // a different key was changed: retry
            else if (oldValue.equals(value))
                break; // lazy case: the value is already set
            else
                return false; // unsolvable conflict
        }
    } while (somebodyElseUpdatedWithoutConflict);
    return true;
}
You can replace the part after // Conflict happened... with whatever conflict resolution you need.
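A call site could then look like this (a sketch reusing the question's names; throwing a ProcessException is just one way to surface an unresolved conflict):

// Hypothetical usage inside onTrigger(); assumes lsnUsedDuringLastLoad and
// ProcessorConstants.LAST_MAX_LSN from the question, plus java.nio.charset.StandardCharsets.
boolean stored = setState(context.getStateManager(),
        ProcessorConstants.LAST_MAX_LSN,
        new String(lsnUsedDuringLastLoad, StandardCharsets.UTF_8));
if (!stored) {
    throw new ProcessException("Conflicting LSN already stored in cluster state");
}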

Learning Java streams: how to pass a value from the outer loop to the nested loop in a functional way

I have a map of maps of strings. This map comes from parsing a JSON object and represents the criteria entered by the user to filter a list in the UI.
In the REST service I want to populate an object with the data that comes from this map. Unfortunately I cannot change the QueryModel object. The QueryModel object has a list of filters; each filter has a list of fields and a list of operators to be applied to those fields. My goal is to convert the following code to a Java 8 stream.
for (Map.Entry<String, Map<String, String>> entry : filters.entrySet()) {
    Filter filter = new Filter();
    filter.setFields(new ArrayList<String>());
    filter.getFields().add(entry.getKey());
    filter.setValues(new ArrayList<String>());
    filter.setOperators(new ArrayList<String>());
    if (entry.getValue() != null) {
        for (String key : entry.getValue().keySet()) {
            if (key.equals("value")) {
                filter.getValues().add(entry.getValue().get(key));
            } else if (key.equals("matchMode")) {
                filter.getOperators().add(entry.getValue().get(key));
            }
        }
        queryModel.getFilters().add(filter);
    }
}
As you can see, I first set the name of the field in the fields list, and then for that field I loop over the inner map to get the entered value and the match mode. In a functional style, I don't know how to keep hold of the field from the outer loop so I can set it on the filter object created for the inner loop.
This was my attempt:
public static Filter getFilter(Map.Entry<String, String> entry) {
    Filter filter = new Filter();
    filter.setFields(new ArrayList<String>());
    filter.getFields().add(entry.getKey());
    filter.setValues(new ArrayList<String>());
    filter.setOperators(new ArrayList<String>());
    if (entry.getKey().equals("value")) {
        filter.getValues().add(entry.getValue());
    } else if (entry.getKey().equals("matchMode")) {
        filter.getOperators().add(entry.getValue());
    }
    return filter;
}

List<Filter> filterList = filters.entrySet().stream()
        .filter(stringMapEntry -> stringMapEntry.getValue() != null)
        .flatMap(entry -> entry.getValue().entrySet().stream())
        .map(innerEntry -> QueryModelAdapter.getFilter(innerEntry))
        .collect(Collectors.toList());
queryModel.setFilters(filterList);
In QueryModelAdapter.getFilter I need the outer entry that the flatMap flattened away. How can I do that?
Before I say anything: be polite when asking questions. Nobody gets paid for answering questions here; everyone does it for pleasure, so be nice to them, at least with your words.
Alright, I think your question is more suitable for Code Review than Stack Overflow.
One thing to note: you can't rewrite your legacy Java projects so that every single line uses lambdas and streams. Sometimes the old-fashioned way is better than the new features.
You don't need to iterate over a Map to retrieve a matching value, so the inner loop can go entirely; a direct lookup does the same job, as the sketch below shows.
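For instance (using the question's key names):

Map<String, String> inner = entry.getValue();
String value = inner.get("value");         // null when the key is absent
String matchMode = inner.get("matchMode");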
Let's take your current class (whatever class you copied that code from) and call it RespectOthers.java:
private static Filter getEmptyFilter() {
    Filter filter = new Filter();
    filter.setFields(new ArrayList<String>());
    filter.setValues(new ArrayList<String>());
    filter.setOperators(new ArrayList<String>());
    return filter;
}

private static Filter setKeyAndValues(Filter inputFilterObj, Map.Entry<String, Map<String, String>> entry, QueryModel queryModel) {
    inputFilterObj.getFields().add(entry.getKey());
    if (entry.getValue() != null) {
        inputFilterObj.getValues().add(entry.getValue().get("value"));
        inputFilterObj.getOperators().add(entry.getValue().get("matchMode"));
        queryModel.getFilters().add(inputFilterObj);
    }
    return inputFilterObj;
}

List<Filter> finalOutput = filters.entrySet().stream()
        .map(e -> RespectOthers.setKeyAndValues(RespectOthers.getEmptyFilter(), e, myQueryModel))
        .collect(Collectors.toList());
