Kafka stream groupByKey not working for count() - apache-kafka-streams

I am trying to generate a count based on keys, using the code below, which is based on the word count example. Strangely, if the mapValues function returns a String, then the groupBy works, as noted in the commented line; but it does not when I emit a key pair of String as key and GenericRecord as value.
final Serde<String> stringSerde = Serdes.String();
final Serde<Long> longSerde = Serdes.Long();
final Map<String, String> serdeConfig = Collections.singletonMap("schema.registry.url", "http://localhost:8081");
stringSerde.configure(serdeConfig, true); // `true` for record keys
final Serde<GenericRecord> valueGenericAvroSerde = new GenericAvroSerde();
valueGenericAvroSerde.configure(serdeConfig, false); // `false` for record values

StreamsBuilder builder = new StreamsBuilder();
KStream<String, GenericRecord> textLines =
    builder.stream("ora-query-in", Consumed.with(stringSerde, valueGenericAvroSerde));

final KTable<String, Long> wordCounts = textLines
    .mapValues(new ValueMapperWithKey<String, GenericRecord, KeyValue<String, GenericRecord>>() {
        @Override
        public KeyValue<String, GenericRecord> apply(String arg0, GenericRecord arg1) {
            return new KeyValue<String, GenericRecord>(arg1.get("KEY_FIELD").toString(), arg1);
        }
    })
    // .groupBy((key, value) -> value) // THIS WORKS if value is a STRING
    // .groupBy((key, value) -> key)   // DOES NOT WORK EITHER
    .groupByKey() // THIS does nothing
    .count();
wordCounts.toStream().to("test.topic.out", Produced.with(stringSerde, longSerde));
Am I missing something in the configuration?
streamsConfiguration.put(AbstractKafkaAvroSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG, "http://localhost:8081");
streamsConfiguration.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
streamsConfiguration.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());

You haven't written what exactly goes wrong, but it looks like a serialization issue.
You can use:
KStream::groupBy(final KeyValueMapper<? super K, ? super V, KR> selector, final Grouped<KR, V> grouped)
someStream.groupBy((key, value) -> value, Grouped.with(newKeySerde, valueSerde));
KGroupedStream::count(final Materialized<K, Long, KeyValueStore<Bytes, byte[]>> materialized)
someGroupedStream.count(Materialized.with(newKeySerde, Serdes.Long()));
It could be the same cause as in:
Kafka Streams 2.1.1 class cast while flushing timed aggregation to store
KafkaStreams: Getting Window Final Results
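For reference, here is a minimal sketch of the whole counting topology with serdes passed explicitly at the grouping and counting steps, so nothing falls back to the String default serdes set in the question's StreamsConfig. It reuses the question's builder, serdes, topics, and KEY_FIELD accessor; selectKey replaces the mapValues re-keying attempt. Treat it as an untested illustration of the calls above, not a verified fix:

KStream<String, GenericRecord> records =
    builder.stream("ora-query-in", Consumed.with(stringSerde, valueGenericAvroSerde));

KTable<String, Long> counts = records
    // selectKey re-keys the stream and marks it for repartitioning
    .selectKey((key, value) -> value.get("KEY_FIELD").toString())
    // explicit serdes for the repartition topic created by the grouping
    .groupByKey(Grouped.with(stringSerde, valueGenericAvroSerde))
    // explicit serdes for the state store backing the count
    .count(Materialized.with(stringSerde, Serdes.Long()));

counts.toStream().to("test.topic.out", Produced.with(stringSerde, Serdes.Long()));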

Related

How to convert Generic List with Predicate Interface into a Lambda Expression?

I am learning Java 8 functional interfaces and was trying out some examples.
I am trying to create a method which will accept a generic List as one argument and a String data filter as another.
The code below works as expected, but I am struggling to convert the Predicate into a lambda expression.
@SuppressWarnings("unchecked")
public static <T> List<T> filter_and_find_only_selected_Data1(List<T> genericList, String dataFilter) {
    Stream<List<T>> list = genericList.stream().map(eachListObj -> {
        if (eachListObj instanceof Employee) {
            return genericList.stream().filter((Predicate<? super T>) new Predicate<Employee>() {
                public boolean test(Employee eachEmpObj) {
                    return eachEmpObj.getEmpDept().equalsIgnoreCase(dataFilter);
                }
            }).collect(Collectors.toList());
        } else if (eachListObj instanceof Customer) {
            return genericList.stream().filter((Predicate<? super T>) new Predicate<Customer>() {
                public boolean test(Customer eachCust) {
                    return !eachCust.getCustomerName().equalsIgnoreCase(dataFilter);
                }
            }).collect(Collectors.toList());
        }
        return null;
    });
    return list.findAny().get();
}
Is there any way I can convert the Predicate into a lambda, and is there also a way to convert the if-else-if into a ternary operator?
Like: (if condition) ? return Value : (else-if condition) ? return value : null;
I think you actually want something like this:
public static <T> List<T> filter_and_find_only_selected_Data(
        List<T> list, Function<? super T, String> stringProperty, String filterValue) {
    return list.stream()
        .filter(t -> filterValue.equalsIgnoreCase(stringProperty.apply(t)))
        .collect(Collectors.toList());
}
Then, the caller can use
List<Employee> source = …;
List<Employee> filtered
= filter_and_find_only_selected_Data(source, Employee::getEmpDept, "value");
or
List<Customer> source = …;
List<Customer> filtered
= filter_and_find_only_selected_Data(source, Customer::getCustomerName, "Bob");
or
List<File> source = Arrays.asList(new File("foo", "bar"), new File("foo", "test"),
new File("xyz"), new File("TEST"), new File("abc", "bar"), new File("bla", "Test"));
List<File> filtered = filter_and_find_only_selected_Data(source, File::getName, "test");
to demonstrate the flexibility of a truly generic method.
Why not put it all in the filter? Try this:
return genericList.stream()
    .filter(item ->
        (item instanceof Customer && ((Customer) item).getCustomerName().equalsIgnoreCase(dataFilter))
        || (item instanceof Employee && ((Employee) item).getEmpDept().equalsIgnoreCase(dataFilter)))
    .collect(Collectors.toList());
or extract a method for this filter:
public <T> boolean isAllow(T item, String dataFilter) {
    return (item instanceof Customer && ((Customer) item).getCustomerName().equalsIgnoreCase(dataFilter))
        || (item instanceof Employee && ((Employee) item).getEmpDept().equalsIgnoreCase(dataFilter));
}
// then use it in the filter
return genericList.stream()
    .filter(item -> isAllow(item, dataFilter))
    .collect(Collectors.toList());
Hope it helps
Generics don't help you much here, since Customer and Employee are not mutually compatible. As long as you want to use a generic type <T>, you have to ensure that this type is consistent across the whole method execution. All you can do is use explicit casts.
I'd start with a static Map providing a mapping function based on the incoming Class<?>. The Function<Object, String> results in a String, since you want to compare against dataFilter:
static Map<Class<?>, Function<Object, String>> extractionMap() {
    Map<Class<?>, Function<Object, String>> map = new HashMap<>();
    map.put(Customer.class, item -> Customer.class.cast(item).getCustomerName());
    map.put(Employee.class, item -> Employee.class.cast(item).getEmpDept());
    return map;
}
Putting this static map aside for a while, I think your whole stream might be simplified anyway. This should work together:
static List<String> findSelectedData(List<?> genericList, String dataFilter) {
    return genericList.stream()                      // Stream<Object>
        .map(item -> extractionMap()                 // Stream<String> using the function
            .get(item.getClass())                    // ... looked up through Class<Object>
            .apply(item))                            // ... applied Function<Object, String>
        .filter(s -> s.equalsIgnoreCase(dataFilter)) // keep only values equal to dataFilter
        .collect(Collectors.toList());               // List<String>
}
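A quick usage sketch (the Employee and Customer constructors here are hypothetical, just to show the dispatch):
List<Object> mixed = Arrays.asList(
        new Employee("Alice", "sales"),   // hypothetical constructor
        new Customer("Bob"));             // hypothetical constructor
// Extracts "sales" from the Employee and "Bob" from the Customer,
// then keeps the values matching the filter (case-insensitively).
List<String> selected = findSelectedData(mixed, "sales"); // ["sales"]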
A note: please respect Java naming conventions and name the method filterAndFindOnlySelectedData1.

Convert a list of objects to a map of key and list of objects in java 8 [duplicate]

I want to translate a List of objects into a Map using Java 8's streams and lambdas.
This is how I would write it in Java 7 and below.
private Map<String, Choice> nameMap(List<Choice> choices) {
    final Map<String, Choice> hashMap = new HashMap<>();
    for (final Choice choice : choices) {
        hashMap.put(choice.getName(), choice);
    }
    return hashMap;
}
I can accomplish this easily using Java 8 and Guava but I would like to know how to do this without Guava.
In Guava:
private Map<String, Choice> nameMap(List<Choice> choices) {
    return Maps.uniqueIndex(choices, new Function<Choice, String>() {
        @Override
        public String apply(final Choice input) {
            return input.getName();
        }
    });
}
And Guava with Java 8 lambdas.
private Map<String, Choice> nameMap(List<Choice> choices) {
    return Maps.uniqueIndex(choices, Choice::getName);
}
Based on the Collectors documentation, it's as simple as:
Map<String, Choice> result =
choices.stream().collect(Collectors.toMap(Choice::getName,
Function.identity()));
If your key is NOT guaranteed to be unique for all elements in the list, you should collect to a Map<String, List<Choice>> instead of a Map<String, Choice>:
Map<String, List<Choice>> result =
choices.stream().collect(Collectors.groupingBy(Choice::getName));
Use getName() as the key and Choice itself as the value of the map:
Map<String, Choice> result =
choices.stream().collect(Collectors.toMap(Choice::getName, c -> c));
Most of the answers listed miss the case when the list has duplicate items; in that case, their answer will throw an IllegalStateException. Refer to the code below to handle list duplicates as well:
public Map<String, Choice> convertListToMap(List<Choice> choices) {
    return choices.stream()
        .collect(Collectors.toMap(Choice::getName, choice -> choice,
                (oldValue, newValue) -> newValue));
}
Here's another one, in case you don't want to use Collectors.toMap():
Map<String, Choice> result = choices.stream().collect(
        HashMap<String, Choice>::new,
        (m, c) -> m.put(c.getName(), c),
        Map::putAll); // the combiner must merge partial maps, or parallel streams lose entries
One more option, done the simple way:
Map<String, Choice> map = new HashMap<>();
choices.forEach(e -> map.put(e.getName(), e));
For example, if you want to convert object fields to a map:
Example object:
class Item {
    private String code;
    private String name;

    public Item(String code, String name) {
        this.code = code;
        this.name = name;
    }
    // getters and setters
}
And the operation converting the List to a Map:
List<Item> list = new ArrayList<>();
list.add(new Item("code1", "name1"));
list.add(new Item("code2", "name2"));
Map<String, String> map = list.stream()
    .collect(Collectors.toMap(Item::getCode, Item::getName));
If you don't mind using 3rd-party libraries, AOL's cyclops-react lib (disclosure: I am a contributor) has extensions for all JDK Collection types, including List and Map.
ListX<Choice> choices;
Map<String, Choice> map = choices.toMap(c -> c.getName(), c -> c);
You can create a Stream of the indices using an IntStream and then convert them to a Map:
Map<Integer, Item> map =
    IntStream.range(0, items.size())
        .boxed()
        .collect(Collectors.toMap(i -> i, i -> items.get(i)));
I was trying to do this and found that, when using the answers above with Functions.identity() for the key to the Map, I had issues using a local method reference like this::localMethodName because of typing issues.
Functions.identity() actually does something to the typing in this case, so the method would only work by returning Object and accepting an Object parameter.
To solve this, I ended up ditching Functions.identity() and using s -> s instead.
So my code, which in my case lists all directories inside a directory, uses each directory's name as the map key, and then calls a method with the directory name that returns a collection of items, looks like:
Map<String, Collection<ItemType>> items =
    Arrays.stream(itemFilesDir.listFiles(File::isDirectory))
        .map(File::getName)
        .collect(Collectors.toMap(s -> s, this::retrieveBrandItems));
I will show how to convert a list to a map using generics and inversion of control: just one universal method!
Maybe we have a list of Integers or a list of objects. So the question is the following: what should the key of the map be?
Create an interface:
public interface KeyFinder<K, E> {
    K getKey(E e);
}
Now, using inversion of control:
static <K, E> Map<K, E> listToMap(List<E> list, KeyFinder<K, E> finder) {
    return list.stream().collect(Collectors.toMap(e -> finder.getKey(e), e -> e));
}
For example, if we have Book objects, this class chooses the key for the map:
public class BookKeyFinder implements KeyFinder<Long, Book> {
    @Override
    public Long getKey(Book e) {
        return e.getPrice();
    }
}
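Usage then looks like this (assuming a Book class whose getPrice() returns a Long, as above):
List<Book> books = …;
Map<Long, Book> booksByPrice = listToMap(books, new BookKeyFinder());
// each book is now indexed by its price; duplicate prices would throw,
// exactly as with plain Collectors.toMap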
I use this syntax:
Map<Integer, List<Choice>> choiceMap =
choices.stream().collect(Collectors.groupingBy(choice -> choice.getName()));
It's possible to use streams to do this. To remove the need to explicitly use Collectors, it's possible to import toMap statically (as recommended by Effective Java, third edition).
import static java.util.stream.Collectors.toMap;

private static Map<String, Choice> nameMap(List<Choice> choices) {
    return choices.stream().collect(toMap(Choice::getName, it -> it));
}
Another possibility, so far only present in the comments:
Map<String, Choice> result =
    choices.stream().collect(Collectors.toMap(c -> c.getName(), c -> c));
This is useful if you want to use a property of a sub-object as the key:
Map<String, Choice> result =
    choices.stream().collect(Collectors.toMap(c -> c.getUser().getName(), c -> c));
Map<String, Set<String>> collect = Arrays.stream(Locale.getAvailableLocales())
    .collect(Collectors.toMap(l -> l.getDisplayCountry(),
            l -> Collections.singleton(l.getDisplayLanguage()),
            (a, b) -> Stream.concat(a.stream(), b.stream()).collect(Collectors.toSet()))); // merge sets for duplicate countries
This can be done in two ways. Let Person be the class we are going to use to demonstrate it.
public class Person {
    private String name;
    private int age;

    public int getAge() {
        return age;
    }
}
Let persons be the list of Persons to be converted to the map.
1. Using a simple forEach and a lambda expression on the list:
Map<Integer, Person> mapPersons = new HashMap<>();
persons.forEach(p -> mapPersons.put(p.getAge(), p));
2. Using Collectors on a stream of the given list:
Map<Integer, List<Person>> mapPersons =
    persons.stream().collect(Collectors.groupingBy(Person::getAge));
Here is a solution using StreamEx:
StreamEx.of(choices).toMap(Choice::getName, c -> c);
Map<String, Choice> map = list.stream().collect(Collectors.toMap(Choice::getName, s -> s));
This also serves the purpose for me:
Map<String, Choice> map = list1.stream().collect(
        () -> new HashMap<String, Choice>(),
        (r, s) -> r.put(s.getName(), s),
        (r, s) -> r.putAll(s));
If every new value for the same key name has to override the previous one:
public Map<String, Choice> convertListToMap(List<Choice> choices) {
    return choices.stream()
        .collect(Collectors.toMap(Choice::getName,
                Function.identity(),
                (oldValue, newValue) -> newValue));
}
If all choices have to be grouped in a list for a name:
public Map<String, List<Choice>> convertListToMap(List<Choice> choices) {
    return choices.stream().collect(Collectors.groupingBy(Choice::getName));
}
List<V> choices; // your list
// assuming class V has a method getKey(); it must yield a unique key,
// or you must also handle duplicates (e.g. with a merge function)
Map<K, V> result = choices.stream().collect(Collectors.toMap(V::getKey, choice -> choice));
As an alternative to Guava, one can use kotlin-stdlib:
private Map<String, Choice> nameMap(List<Choice> choices) {
return CollectionsKt.associateBy(choices, Choice::getName);
}
List<Integer> listA = new ArrayList<>();
listA.add(1);
listA.add(5);
listA.add(3);
listA.add(4);
System.out.println(listA.stream().collect(Collectors.toMap(x -> x, x -> x)));
String array[] = {"ASDFASDFASDF", "AA", "BBB", "CCCC", "DD", "EEDDDAD"};
List<String> list = Arrays.asList(array);
Map<Integer, String> map = list.stream()
    .collect(Collectors.toMap(s -> s.length(), s -> s, (x, y) -> {
        System.out.println("Duplicate key " + x);
        return x;
    }, () -> new TreeMap<>((s1, s2) -> s2.compareTo(s1))));
System.out.println(map);
Duplicate key AA
{12=ASDFASDFASDF, 7=EEDDDAD, 4=CCCC, 3=BBB, 2=AA}

can I modify the consumer auto-offset-reset to latest of kafka stream?

Working with Kafka 0.10.1.0, I used this config:
val props = new Properties
props.put(StreamsConfig.APPLICATION_ID_CONFIG, applicationId)
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, broker)
props.put(StreamsConfig.KEY_SERDE_CLASS_CONFIG, Serdes.String.getClass)
props.put(StreamsConfig.VALUE_SERDE_CLASS_CONFIG, Serdes.Integer.getClass)
props.put(StreamsConfig.consumerPrefix(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG), "latest")
but this line, props.put(StreamsConfig.consumerPrefix(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG), "latest"),
does not work. What is the reason?
I read the code of org.apache.kafka.streams.StreamsConfig; it contains the following:
private static final Map<String, Object> CONSUMER_DEFAULT_OVERRIDES;
static {
    Map<String, Object> tempConsumerDefaultOverrides = new HashMap<>();
    tempConsumerDefaultOverrides.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, "1000");
    tempConsumerDefaultOverrides.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
    tempConsumerDefaultOverrides.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
    CONSUMER_DEFAULT_OVERRIDES = Collections.unmodifiableMap(tempConsumerDefaultOverrides);
}
public Map<String, Object> getConsumerConfigs(StreamThread streamThread, String groupId, String clientId) throws ConfigException {
    final Map<String, Object> consumerProps = getClientPropsWithPrefix(CONSUMER_PREFIX, ConsumerConfig.configNames());

    // disable auto commit and throw exception if there is user overridden values,
    // this is necessary for streams commit semantics
    if (consumerProps.containsKey(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG)) {
        throw new ConfigException("Unexpected user-specified consumer config " + ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG
                + ", as the streams client will always turn off auto committing.");
    }

    consumerProps.putAll(CONSUMER_DEFAULT_OVERRIDES);

    // bootstrap.servers should be from StreamsConfig
    consumerProps.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, this.originals().get(BOOTSTRAP_SERVERS_CONFIG));
    // add client id with stream client id prefix, and group id
    consumerProps.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);
    consumerProps.put(CommonClientConfigs.CLIENT_ID_CONFIG, clientId + "-consumer");

    // add configs required for stream partition assignor
    consumerProps.put(StreamsConfig.InternalConfig.STREAM_THREAD_INSTANCE, streamThread);
    consumerProps.put(StreamsConfig.REPLICATION_FACTOR_CONFIG, getInt(REPLICATION_FACTOR_CONFIG));
    consumerProps.put(StreamsConfig.NUM_STANDBY_REPLICAS_CONFIG, getInt(NUM_STANDBY_REPLICAS_CONFIG));
    consumerProps.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG, StreamPartitionAssignor.class.getName());
    consumerProps.put(StreamsConfig.WINDOW_STORE_CHANGE_LOG_ADDITIONAL_RETENTION_MS_CONFIG, getLong(WINDOW_STORE_CHANGE_LOG_ADDITIONAL_RETENTION_MS_CONFIG));

    if (!getString(ZOOKEEPER_CONNECT_CONFIG).equals("")) {
        consumerProps.put(StreamsConfig.ZOOKEEPER_CONNECT_CONFIG, getString(ZOOKEEPER_CONNECT_CONFIG));
    }

    consumerProps.put(APPLICATION_SERVER_CONFIG, getString(APPLICATION_SERVER_CONFIG));
    return consumerProps;
}
Will the CONSUMER_DEFAULT_OVERRIDES override the config I set?
This is a bug in 0.10.1.0, which is fixed in 0.10.1.1 and later:
https://issues.apache.org/jira/browse/KAFKA-4361
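In other words, on 0.10.1.0 the putAll of CONSUMER_DEFAULT_OVERRIDES clobbers the user's prefixed setting. On 0.10.1.1 and later, the consumer-prefixed key from the question is honored, so the original config works as intended; a minimal Java sketch (assuming a fixed client version):

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, applicationId);
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, broker);
// On 0.10.1.1+ this prefixed key reaches the internal consumer instead of
// being overwritten by CONSUMER_DEFAULT_OVERRIDES.
props.put(StreamsConfig.consumerPrefix(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG), "latest");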

Gson: How do I deserialize an inner JSON object to a map if the property name is not fixed?

My client retrieves JSON content as below:
{
    "table": "tablename",
    "update": 1495104575669,
    "rows": [
        {"column5": 11, "column6": "yyy"},
        {"column3": 22, "column4": "zzz"}
    ]
}
In the rows array, the key names are not fixed. I want to retrieve the keys and values and save them into a Map using Gson 2.8.x.
How can I configure Gson to deserialize this simply?
Here is my idea:
public class Dataset {
    private String table;
    private long update;
    private List<Rows> lists; // <-- little confused here,
    // or private List<HashMap<String, Object>> lists

    // Setter/Getter
}
public class Rows {
    private HashMap<String, Object> map;
    ....
}
Dataset k = gson.fromJson(jsonStr, Dataset.class);
log.info(k.getRows().size()); // <-- I got two null objects
Thanks.
Gson does not support such a thing out of the box. It would be nice if you could make the property name fixed. If not, you have a few options that could help:
1. Just rename the Dataset.lists field to Dataset.rows, if the property name is fixed as rows.
2. If the possible name set is known in advance, tell Gson to pick up alternative names using @SerializedName.
3. If the possible name set is really unknown and may change in the future, you might want to make it fully dynamic using a custom TypeAdapter (streaming mode; requires less memory, but is harder to use) or a custom JsonDeserializer (object mode; requires more memory to store intermediate tree views, but is easy to use) registered with GsonBuilder.
For option #2, you can simply add the alternative names:
@SerializedName(value = "lists", alternate = "rows")
final List<Map<String, Object>> lists;
For option #3, bind a downstream List<Map<String, Object>> type adapter and try to detect the name dynamically. Note that I omit the Rows class deserialization strategy for simplicity; I believe you might want to remove the Rows class in favor of a simple Map<String, Object>. Another note: declare the field as Map, and try not to specify collection implementations. Hash maps are unordered, but telling Gson you are going to deal with a Map lets it pick an ordered map like LinkedTreeMap (Gson internals) or LinkedHashMap, which might be important for datasets.
// Type tokens are immutable and can be declared constants
private static final TypeToken<String> stringTypeToken = new TypeToken<String>() {
};
private static final TypeToken<Long> longTypeToken = new TypeToken<Long>() {
};
private static final TypeToken<List<Map<String, Object>>> stringToObjectMapListTypeToken = new TypeToken<List<Map<String, Object>>>() {
};
private static final Gson gson = new GsonBuilder()
        .registerTypeAdapterFactory(new TypeAdapterFactory() {
            @Override
            public <T> TypeAdapter<T> create(final Gson gson, final TypeToken<T> typeToken) {
                if ( typeToken.getRawType() != Dataset.class ) {
                    return null;
                }
                // If the actual type token represents the Dataset class, then pick the bunch of downstream type adapters
                final TypeAdapter<String> stringTypeAdapter = gson.getDelegateAdapter(this, stringTypeToken);
                final TypeAdapter<Long> primitiveLongTypeAdapter = gson.getDelegateAdapter(this, longTypeToken);
                final TypeAdapter<List<Map<String, Object>>> stringToObjectMapListTypeAdapter = gson.getDelegateAdapter(this, stringToObjectMapListTypeToken);
                // And compose the bunch into a single dataset type adapter
                final TypeAdapter<Dataset> datasetTypeAdapter = new TypeAdapter<Dataset>() {
                    @Override
                    public void write(final JsonWriter out, final Dataset dataset) {
                        // Omitted for brevity
                        throw new UnsupportedOperationException();
                    }

                    @Override
                    public Dataset read(final JsonReader in)
                            throws IOException {
                        in.beginObject();
                        String table = null;
                        long update = 0;
                        List<Map<String, Object>> lists = null;
                        while ( in.hasNext() ) {
                            final String name = in.nextName();
                            switch ( name ) {
                            case "table":
                                table = stringTypeAdapter.read(in);
                                break;
                            case "update":
                                update = primitiveLongTypeAdapter.read(in);
                                break;
                            default:
                                lists = stringToObjectMapListTypeAdapter.read(in);
                                break;
                            }
                        }
                        in.endObject();
                        return new Dataset(table, update, lists);
                    }
                }.nullSafe(); // Making the type adapter null-safe
                @SuppressWarnings("unchecked")
                final TypeAdapter<T> typeAdapter = (TypeAdapter<T>) datasetTypeAdapter;
                return typeAdapter;
            }
        })
        .create();
final Dataset dataset = gson.fromJson(jsonReader, Dataset.class);
System.out.println(dataset.lists);
The code above would then print:
[{column5=11.0, column6=yyy}, {column3=22.0, column4=zzz}]
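As a sketch of the JsonDeserializer route mentioned in option #3 (object mode; the field names and the Dataset constructor follow the question and the adapter above, but this is an illustrative draft, not tested code):

final Gson gson = new GsonBuilder()
        .registerTypeAdapter(Dataset.class, (JsonDeserializer<Dataset>) (json, type, context) -> {
            final JsonObject object = json.getAsJsonObject();
            // "table" and "update" are assumed to always be present
            final String table = object.get("table").getAsString();
            final long update = object.get("update").getAsLong();
            List<Map<String, Object>> lists = null;
            // Whatever property is neither "table" nor "update" is treated as the rows array
            for ( final Map.Entry<String, JsonElement> e : object.entrySet() ) {
                if ( !e.getKey().equals("table") && !e.getKey().equals("update") ) {
                    lists = context.deserialize(e.getValue(), stringToObjectMapListTypeToken.getType());
                }
            }
            return new Dataset(table, update, lists);
        })
        .create();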

Implement Hadoop Map with JavaPairRDD as Spark Way

I have an RDD:
JavaPairRDD<Long, ViewRecord> myRDD
which is created via the newAPIHadoopRDD method. I have an existing map function which I want to implement the Spark way:
LongWritable one = new LongWritable(1L);

protected void map(Long key, ViewRecord viewRecord, Context context)
        throws IOException, InterruptedException {
    String url = viewRecord.getUrl();
    long day = viewRecord.getDay();
    tuple.getKey().set(url);
    tuple.getValue().set(day);
    context.write(tuple, one);
}
PS: tuple is derived from:
KeyValueWritable<Text, LongWritable>
and can be found here: TextLong.java
I don't know what tuple is, but if you just want to map each record to a tuple with key (url, day) and value 1L, you can do it like this:
result = myRDD
    .values()
    .mapToPair(viewRecord -> {
        String url = viewRecord.getUrl();
        long day = viewRecord.getDay();
        return new Tuple2<>(new Tuple2<>(url, day), 1L);
    });
// Java 7 style
JavaPairRDD<Pair, Long> result = myRDD
    .values()
    .mapToPair(new PairFunction<ViewRecord, Pair, Long>() {
        @Override
        public Tuple2<Pair, Long> call(ViewRecord record) throws Exception {
            String url = record.getUrl();
            Long day = record.getDay();
            return new Tuple2<>(new Pair(url, day), 1L);
        }
    });
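If the goal, as in the original Hadoop job, is to count occurrences per (url, day) key, the natural follow-up is a reduceByKey over these pairs; a small sketch assuming the JavaPairRDD<Pair, Long> result from the Java 7 variant above:

// sum the 1L values per key, like a counting reducer would
JavaPairRDD<Pair, Long> counts = result.reduceByKey((a, b) -> a + b);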
