Compile error while calling updateStateByKey - spark-streaming

Compile error:
The method updateStateByKey(Function2<List<Integer>,Optional<S>,Optional<S>>) in the type JavaPairDStream<String,Integer> is not applicable for the arguments (Function2<List<Integer>,Optional<Integer>,Optional<Integer>>)
In a simple word count example, each word is mapped to 1:
JavaPairDStream<String, Integer> wordCounts = words.mapToPair(s -> new Tuple2<>(s,1));
Then updateStateByKey is applied to wordCounts:
JavaPairDStream<String, Integer> finalcount = wordCounts.updateStateByKey(updateFunction);
The updateFunction is defined as follows:
final Function2<List<Integer>, Optional<Integer>, Optional<Integer>> updateFunction =
new Function2<List<Integer>, Optional<Integer>, Optional<Integer>>() {
@Override
public Optional<Integer> call(List<Integer> values, Optional<Integer> state) {
Integer newSum = state.orElse(0);
for (Integer value : values) {
newSum += value;
}
return Optional.of(newSum);
}
};
updateStateByKey has the following recommended signatures available:

Check which package you import Optional from. Spark's Java API expects com.google.common.base.Optional (Guava), not the JDK's java.util.Optional. From Spark 2.0 onward the expected type is org.apache.spark.api.java.Optional instead.

Related

Convert a list of objects to a map of key and list of objects in java 8 [duplicate]

I want to translate a List of objects into a Map using Java 8's streams and lambdas.
This is how I would write it in Java 7 and below.
private Map<String, Choice> nameMap(List<Choice> choices) {
final Map<String, Choice> hashMap = new HashMap<>();
for (final Choice choice : choices) {
hashMap.put(choice.getName(), choice);
}
return hashMap;
}
I can accomplish this easily using Java 8 and Guava but I would like to know how to do this without Guava.
In Guava:
private Map<String, Choice> nameMap(List<Choice> choices) {
return Maps.uniqueIndex(choices, new Function<Choice, String>() {
@Override
public String apply(final Choice input) {
return input.getName();
}
});
}
And Guava with Java 8 lambdas.
private Map<String, Choice> nameMap(List<Choice> choices) {
return Maps.uniqueIndex(choices, Choice::getName);
}
Based on Collectors documentation it's as simple as:
Map<String, Choice> result =
choices.stream().collect(Collectors.toMap(Choice::getName,
Function.identity()));
If your key is NOT guaranteed to be unique for all elements in the list, you should convert it to a Map<String, List<Choice>> instead of a Map<String, Choice>
Map<String, List<Choice>> result =
choices.stream().collect(Collectors.groupingBy(Choice::getName));
Use getName() as the key and Choice itself as the value of the map:
Map<String, Choice> result =
choices.stream().collect(Collectors.toMap(Choice::getName, c -> c));
Most of the answers listed miss the case where the list contains duplicate keys. In that case they throw an IllegalStateException. The code below handles duplicates as well:
public Map<String, Choice> convertListToMap(List<Choice> choices) {
return choices.stream()
.collect(Collectors.toMap(Choice::getName, choice -> choice,
(oldValue, newValue) -> newValue));
}
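To make the difference between these variants concrete, here is a minimal sketch that runs all three against a list containing a repeated name. The Choice class and its getId() accessor are hypothetical stand-ins, since the question does not show the real class.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class DuplicateKeyDemo {

    // Hypothetical stand-in for the Choice class from the question.
    static final class Choice {
        private final String name;
        private final int id;

        Choice(String name, int id) {
            this.name = name;
            this.id = id;
        }

        String getName() { return name; }
        int getId() { return id; }
    }

    // Two-argument toMap: throws IllegalStateException on a repeated key.
    static Map<String, Choice> strict(List<Choice> choices) {
        return choices.stream()
                .collect(Collectors.toMap(Choice::getName, c -> c));
    }

    // Three-argument toMap: the merge function keeps the later element.
    static Map<String, Choice> lastWins(List<Choice> choices) {
        return choices.stream()
                .collect(Collectors.toMap(Choice::getName, c -> c,
                        (oldValue, newValue) -> newValue));
    }

    // groupingBy: keeps every element, collected into one list per key.
    static Map<String, List<Choice>> grouped(List<Choice> choices) {
        return choices.stream()
                .collect(Collectors.groupingBy(Choice::getName));
    }

    public static void main(String[] args) {
        List<Choice> choices = Arrays.asList(
                new Choice("a", 1), new Choice("b", 2), new Choice("a", 3));

        try {
            strict(choices);
        } catch (IllegalStateException e) {
            System.out.println("strict toMap rejected the duplicate key");
        }
        System.out.println(lastWins(choices).get("a").getId()); // prints 3
        System.out.println(grouped(choices).get("a").size());   // prints 2
    }
}
```

Which variant fits depends on whether a repeated key is an error, an update, or a legitimate one-to-many relation.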
Here's another one in case you don't want to use Collectors.toMap()
Map<String, Choice> result =
choices.stream().collect(HashMap<String, Choice>::new,
(m, c) -> m.put(c.getName(), c),
(m, u) -> {});
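Note that the empty combiner `(m, u) -> {}` is only safe on a sequential stream; on a parallel stream the partial maps would be dropped. A minimal sketch of the same three-argument collect with a combiner that merges partial results follows. The word-length mapping is just an illustrative stand-in for Choice::getName.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Stream;

public class CollectCombinerDemo {

    // Maps each word to its length with the three-argument collect:
    // supplier, accumulator, combiner.
    static Map<String, Integer> lengths(List<String> words, boolean parallel) {
        Stream<String> stream = parallel ? words.parallelStream() : words.stream();
        return stream.collect(
                HashMap<String, Integer>::new,
                (map, word) -> map.put(word, word.length()),
                // Merges the partial map built by another thread into this one.
                (left, right) -> left.putAll(right));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "bb", "ccc", "dddd");
        // Sequential and parallel runs produce the same map contents.
        System.out.println(lengths(words, false).equals(lengths(words, true))); // prints true
    }
}
```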
One more option, in a simple way:
Map<String,Choice> map = new HashMap<>();
choices.forEach(e->map.put(e.getName(),e));
For example, if you want to convert object fields to a map:
Example object:
class Item{
private String code;
private String name;
public Item(String code, String name) {
this.code = code;
this.name = name;
}
//getters and setters
}
And the operation that converts the List to a Map:
List<Item> list = new ArrayList<>();
list.add(new Item("code1", "name1"));
list.add(new Item("code2", "name2"));
Map<String,String> map = list.stream()
.collect(Collectors.toMap(Item::getCode, Item::getName));
If you don't mind using 3rd party libraries, AOL's cyclops-react lib (disclosure I am a contributor) has extensions for all JDK Collection types, including List and Map.
ListX<Choice> choices;
Map<String, Choice> map = choices.toMap(c-> c.getName(),c->c);
You can create a Stream of the indices using an IntStream and then convert them to a Map :
Map<Integer,Item> map =
IntStream.range(0,items.size())
.boxed()
.collect(Collectors.toMap (i -> i, i -> items.get(i)));
I was trying to do this and found that, with the answers above, using Function.identity() for the key of the Map caused typing issues when the value mapper was a local method reference such as this::localMethodName.
Function.identity() affects type inference in this case, so the method would only compile if it accepted an Object parameter and returned Object.
To solve this, I ended up ditching Function.identity() and using s->s instead.
In my case the code lists all directories inside a directory, uses each directory name as the key of the map, and calls a method with the directory name that returns a collection of items. It looks like this:
Map<String, Collection<ItemType>> items = Arrays.stream(itemFilesDir.listFiles(File::isDirectory))
.map(File::getName)
.collect(Collectors.toMap(s->s, this::retrieveBrandItems));
Here is how to convert a list to a map using generics and inversion of control, as one universal method.
We may have a list of Integers or a list of objects, so the question is: what should the key of the map be?
Create an interface:
public interface KeyFinder<K, E> {
K getKey(E e);
}
Now, using inversion of control:
static <K, E> Map<K, E> listToMap(List<E> list, KeyFinder<K, E> finder) {
return list.stream().collect(Collectors.toMap(e -> finder.getKey(e) , e -> e));
}
For example, if we have Book objects, this class chooses the key for the map:
public class BookKeyFinder implements KeyFinder<Long, Book> {
@Override
public Long getKey(Book e) {
return e.getPrice();
}
}
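Because KeyFinder has a single abstract method, a lambda or method reference can also be passed instead of a dedicated class. The sketch below repeats the interface and helper so that it compiles on its own; the Book class with getId() and getTitle() is hypothetical and only exists for this demo.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class KeyFinderDemo {

    // Same shape as the KeyFinder interface above.
    interface KeyFinder<K, E> {
        K getKey(E e);
    }

    // Hypothetical Book class with just enough state for the demo.
    static final class Book {
        private final Long id;
        private final String title;

        Book(Long id, String title) {
            this.id = id;
            this.title = title;
        }

        Long getId() { return id; }
        String getTitle() { return title; }
    }

    // Generic helper: the caller decides what the key is.
    static <K, E> Map<K, E> listToMap(List<E> list, KeyFinder<K, E> finder) {
        return list.stream().collect(Collectors.toMap(e -> finder.getKey(e), e -> e));
    }

    public static void main(String[] args) {
        List<Book> books = Arrays.asList(new Book(1L, "First"), new Book(2L, "Second"));

        // A method reference satisfies KeyFinder<Long, Book>.
        Map<Long, Book> byId = listToMap(books, Book::getId);
        System.out.println(byId.get(2L).getTitle()); // prints Second

        // A different key for the same list, chosen at the call site.
        Map<String, Book> byTitle = listToMap(books, Book::getTitle);
        System.out.println(byTitle.get("First").getId()); // prints 1
    }
}
```

Like any two-argument toMap, this helper throws an IllegalStateException if the chosen key is not unique.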
I use this syntax
Map<String, List<Choice>> choiceMap =
choices.stream().collect(Collectors.groupingBy(choice -> choice.getName()));
It's possible to use streams to do this. To remove the need to explicitly use Collectors, it's possible to import toMap statically (as recommended by Effective Java, third edition).
import static java.util.stream.Collectors.toMap;
private static Map<String, Choice> nameMap(List<Choice> choices) {
return choices.stream().collect(toMap(Choice::getName, it -> it));
}
Another possibility, so far only present in comments:
Map<String, Choice> result =
choices.stream().collect(Collectors.toMap(c -> c.getName(), c -> c));
Useful if you want to use a parameter of a sub-object as the key:
Map<String, Choice> result =
choices.stream().collect(Collectors.toMap(c -> c.getUser().getName(), c -> c));
Map<String, Set<String>> collect = Arrays.asList(Locale.getAvailableLocales()).stream().collect(Collectors
.toMap(l -> l.getDisplayCountry(), l -> Collections.singleton(l.getDisplayLanguage())));
This can be done in two ways. Let Person be the class used to demonstrate it.
public class Person {
private String name;
private int age;
public int getAge() {
return age;
}
}
Let persons be the list of Person objects to be converted to the map.
1. Using a simple forEach and a lambda expression on the List:
Map<Integer, List<Person>> mapPersons = new HashMap<>();
persons.forEach(p -> mapPersons.computeIfAbsent(p.getAge(), k -> new ArrayList<>()).add(p));
2. Using Collectors on a Stream over the given List:
Map<Integer, List<Person>> mapPersons =
persons.stream().collect(Collectors.groupingBy(Person::getAge));
Here is a solution using StreamEx:
StreamEx.of(choices).toMap(Choice::getName, c -> c);
Map<String,Choice> map=list.stream().collect(Collectors.toMap(Choice::getName, s->s));
This also serves the purpose for me:
Map<String,Choice> map= list1.stream().collect(()-> new HashMap<String,Choice>(),
(r,s) -> r.put(s.getString(),s),(r,s) -> r.putAll(s));
If every new value for the same key name has to override the previous one:
public Map<String, Choice> convertListToMap(List<Choice> choices) {
return choices.stream()
.collect(Collectors.toMap(Choice::getName,
Function.identity(),
(oldValue, newValue) -> newValue));
}
If all choices have to be grouped in a list for a name:
public Map<String, List<Choice>> convertListToMap(List<Choice> choices) {
return choices.stream().collect(Collectors.groupingBy(Choice::getName));
}
List<V> choices; // your list
Map<K, V> result = choices.stream().collect(Collectors.toMap(V::getKey, choice -> choice));
// Assumes class V has a getKey() method returning a K. It must yield a unique key per element, because toMap throws on duplicate keys.
As an alternative to Guava, one can use kotlin-stdlib:
private Map<String, Choice> nameMap(List<Choice> choices) {
return CollectionsKt.associateBy(choices, Choice::getName);
}
List<Integer> listA = new ArrayList<>();
listA.add(1);
listA.add(5);
listA.add(3);
listA.add(4);
System.out.println(listA.stream().collect(Collectors.toMap(x ->x, x->x)));
String array[] = {"ASDFASDFASDF","AA", "BBB", "CCCC", "DD", "EEDDDAD"};
List<String> list = Arrays.asList(array);
Map<Integer, String> map = list.stream()
.collect(Collectors.toMap(s -> s.length(), s -> s, (x, y) -> {
System.out.println("Duplicate key " + x);
return x;
},()-> new TreeMap<>((s1,s2)->s2.compareTo(s1))));
System.out.println(map);
Duplicate key AA
{12=ASDFASDFASDF, 7=EEDDDAD, 4=CCCC, 3=BBB, 2=AA}

Stream operation returns an Object instead of a List

I have the following code that executes as I intend:
import java.util.*;
import java.util.stream.Collectors;
public class HelloWorld{
public static void main(String []args){
HelloWorld.TreeNode rootNode = new HelloWorld().new TreeNode<Integer>(4);
List<Integer> traversal = rootNode.inorderTraversal();
// Prints 4
System.out.println(
String.join(",",
traversal
.stream()
.map(Object::toString)
.collect(Collectors.toList())
)
);
}
class TreeNode<K extends Comparable<K>> {
TreeNode<K> left;
TreeNode<K> right;
K val;
TreeNode(K val, TreeNode<K> left, TreeNode<K> right) {
this.val = val;
this.left = left;
this.right = right;
}
TreeNode(K val) {
this(val, null, null);
}
List<K> inorderTraversal() {
List<K> list = new ArrayList<>();
list.add(this.val);
return list;
}
}
}
However, if I replace the commented line with
System.out.println(
String.join(",",
rootNode.inorderTraversal()
.stream()
.map(Object::toString)
.collect(Collectors.toList())
)
);
I get the following error:
HelloWorld.java:14: error: no suitable method found for join(String,Object)
String.join(",",
^
method String.join(CharSequence,CharSequence...) is not applicable
(varargs mismatch; Object cannot be converted to CharSequence)
method String.join(CharSequence,Iterable<? extends CharSequence>) is not
applicable
(argument mismatch; Object cannot be converted to Iterable<? extends
CharSequence>)
Note: HelloWorld.java uses unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.
1 error
I saw this very similar question (Why does this java 8 stream operation evaluate to Object instead of List<Object> or just List?), but I would have expected my code to avoid the problem that user had, because rootNode.inorderTraversal() returns a List<Integer> rather than a raw List.
Thanks in advance for any assistance!
This is because you are using raw types. Parameterize the declaration with its generic type, like so:
HelloWorld.TreeNode<Integer> rootNode = new HelloWorld().new TreeNode<>(4);
This will fix the issue. Without a type argument on the left-hand side, rootNode is declared as a raw TreeNode, so inorderTraversal() returns a raw List and the whole stream chain evaluates to Object.
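As a minimal, self-contained sketch of the fix: the TreeNode below is a simplified static version of the class from the question, and the joined helper is hypothetical. It only exists to show the chained call compiling once the node is parameterized.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class RawTypeDemo {

    // Simplified version of the TreeNode class from the question.
    static class TreeNode<K extends Comparable<K>> {
        final K val;

        TreeNode(K val) {
            this.val = val;
        }

        List<K> inorderTraversal() {
            List<K> list = new ArrayList<>();
            list.add(val);
            return list;
        }
    }

    static String joined(TreeNode<Integer> node) {
        // node is a TreeNode<Integer>, so inorderTraversal() is a List<Integer>
        // and collect(...) yields a List<String>, which String.join accepts.
        return String.join(",",
                node.inorderTraversal()
                        .stream()
                        .map(Object::toString)
                        .collect(Collectors.toList()));
    }

    public static void main(String[] args) {
        // Declaring this as a raw `TreeNode rootNode` would erase the generics of
        // every member, and the same chain would be typed as Object.
        TreeNode<Integer> rootNode = new TreeNode<>(4);
        System.out.println(joined(rootNode)); // prints 4
    }
}
```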

How to concat Java Flux lists into one list from external sources

In a spring-boot 2.0 rest controller, I have created the following code which works as desired:
@ResponseBody
@GetMapping("/test3")
Mono<List<String>> test3(){
List<String> l1 = Arrays.asList("one","two","three");
List<String> l2 = Arrays.asList("four","five","six");
return Flux
.concat(Flux.fromIterable(l1),Flux.fromIterable(l2))
.collectList();
}
My problem comes from trying to do the same thing from an external datasource. I have created the following test case:
@ResponseBody
@GetMapping("/test4")
Flux<Object> test4(){
List<String> indecies = Arrays.asList("1","2");
return Flux.concat(
Flux.fromIterable(indecies)
.flatMap(k -> Flux.just(myRepository.getList(k))
.subscribeOn(Schedulers.parallel()),2
)
).collectList();
}
Where myRepository is the following:
@Repository
public class MyRepository {
List<String> l1 = Arrays.asList("one","two","three");
List<String> l2 = Arrays.asList("four","five","six");
Map<String, List<String>> pm = new HashMap<String, List<String>>();
MyRepository(){
pm.put("1", l1);
pm.put("2", l2);
}
List<String> getList(String key){
List<String> list = pm.get(key);
return list;
}
}
My code labeled test4 gives me the code hint error:
Type mismatch: cannot convert from Flux< List < String >> to Publisher < ?
extends Publisher < ? extends Object >>
So a few questions:
I thought that a Flux was a publisher? So why the error?
What am I doing wrong in test 4 so that it will output the same result as in test3?
The expected output is: [["one","two","three","four","five","six"]]
Using M. Deinum's comment, here is what works:
@ResponseBody
@GetMapping("/test6")
Mono<List<String>> test6(){
List<String> indecies = Arrays.asList("1","2");
return Flux.fromIterable(indecies)
.flatMap(k -> Flux.fromIterable(myRepository.getList(k)).subscribeOn(Schedulers.parallel()),2)
.collectList();
}
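The key change is Flux.fromIterable instead of Flux.just: just emits each list as a single element, whereas fromIterable emits the list's elements. The same distinction exists in java.util.stream, so the sketch below uses plain streams as an analogy only. It is not Reactor code, and it ignores the concurrency and ordering aspects of subscribeOn. The getList helper mirrors MyRepository.getList.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FlattenDemo {

    private static final Map<String, List<String>> DATA = new HashMap<>();

    static {
        DATA.put("1", Arrays.asList("one", "two", "three"));
        DATA.put("2", Arrays.asList("four", "five", "six"));
    }

    // Mirrors MyRepository.getList from the question.
    static List<String> getList(String key) {
        return DATA.get(key);
    }

    // Analogue of flatMap(k -> Flux.just(getList(k))):
    // each key contributes ONE element, namely the whole list.
    static List<List<String>> wrapped(List<String> keys) {
        return keys.stream()
                .flatMap(k -> Stream.of(getList(k)))
                .collect(Collectors.toList());
    }

    // Analogue of flatMap(k -> Flux.fromIterable(getList(k))):
    // each key contributes the list's individual elements.
    static List<String> flattened(List<String> keys) {
        return keys.stream()
                .flatMap(k -> getList(k).stream())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("1", "2");
        System.out.println(wrapped(keys).size());   // prints 2
        System.out.println(flattened(keys).size()); // prints 6
    }
}
```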

Using map of maps as Maven plugin parameters

Is it possible to use a map of maps as a Maven plugin parameter? For example:
@Parameter
private Map<String, Map<String, String>> converters;
and then to use it like
<converters>
<json>
<indent>true</indent>
<strict>true</strict>
</json>
<yaml>
<stripComments>false</stripComments>
</yaml>
</converters>
If I use it like this, converters only contains the keys json and yaml with null as values.
I know it is possible to have complex objects as values, but is it also somehow possible to use maps for variable element values like in this example?
This is apparently a limitation of the sisu.plexus project internally used by the Mojo API. If you peek inside the MapConverter source, you'll find out that it first tries to fetch the value of the map by trying to interpret the configuration as a String (invoking fromExpression), and when this fails, looks up the expected type of the value. However this method doesn't check for parameterized types, which is our case here (since the type of the map value is Map<String, String>). I filed the bug 498757 on the Bugzilla of this project to track this.
Using a custom wrapper object
One workaround would be to not use a Map<String, String> as value but use a custom object:
@Parameter
private Map<String, Converter> converters;
with a class Converter, located in the same package as the Mojo, being:
public class Converter {
@Parameter
private Map<String, String> properties;
@Override
public String toString() { return properties.toString(); } // to test
}
You can then configure your Mojo with:
<converters>
<json>
<properties>
<indent>true</indent>
<strict>true</strict>
</properties>
</json>
<yaml>
<properties>
<stripComments>false</stripComments>
</properties>
</yaml>
</converters>
This configuration will correctly inject the values in the inner-maps. It also keeps the variable aspect: the object is only introduced as a wrapper around the inner-map. I tested this with a simple test mojo having
public void execute() throws MojoExecutionException, MojoFailureException {
getLog().info(converters.toString());
}
and the output was the expected {json={indent=true, strict=true}, yaml={stripComments=false}}.
Using a custom configurator
I also found a way to keep a Map<String, Map<String, String>> by using a custom ComponentConfigurator.
So we want to fix MapConverter by inheriting from it; the trouble is how to register this new FixedMapConverter. By default, Maven uses a BasicComponentConfigurator to configure the Mojo, and it relies on a DefaultConverterLookup to look up the converter to use for a specific class. In this case, we want to provide a custom converter for Map that will return our fixed version. Therefore, we need to extend this basic configurator and register our new converter.
import org.codehaus.plexus.classworlds.realm.ClassRealm;
import org.codehaus.plexus.component.configurator.BasicComponentConfigurator;
import org.codehaus.plexus.component.configurator.ComponentConfigurationException;
import org.codehaus.plexus.component.configurator.ConfigurationListener;
import org.codehaus.plexus.component.configurator.expression.ExpressionEvaluator;
import org.codehaus.plexus.configuration.PlexusConfiguration;
public class CustomBasicComponentConfigurator extends BasicComponentConfigurator {
@Override
public void configureComponent(final Object component, final PlexusConfiguration configuration,
final ExpressionEvaluator evaluator, final ClassRealm realm, final ConfigurationListener listener)
throws ComponentConfigurationException {
converterLookup.registerConverter(new FixedMapConverter());
super.configureComponent(component, configuration, evaluator, realm, listener);
}
}
Then we need to tell Maven to use this new configurator instead of the basic one. This is a 2-step process:
Inside your Maven plugin, create a file src/main/resources/META-INF/plexus/components.xml registering the new component:
<?xml version="1.0" encoding="UTF-8"?>
<component-set>
<components>
<component>
<role>org.codehaus.plexus.component.configurator.ComponentConfigurator</role>
<role-hint>custom-basic</role-hint>
<implementation>package.to.CustomBasicComponentConfigurator</implementation>
</component>
</components>
</component-set>
Note a few things: we declare a new component with the hint "custom-basic", which serves as an id to refer to it, and the <implementation> refers to the fully qualified class name of our configurator.
Tell our Mojo to use this configurator with the configurator attribute of the @Mojo annotation:
@Mojo(name = "test", configurator = "custom-basic")
The configurator passed here corresponds to the role-hint specified in the components.xml above.
With such a set-up, you can finally declare
@Parameter
private Map<String, Map<String, String>> converters;
and everything will be injected properly: Maven will use our custom configurator, that will register our fixed version of the map converter and will correctly convert the inner-maps.
Full code of FixedMapConverter (which pretty much copy-pastes MapConverter because we can't override the faulty method):
public class FixedMapConverter extends MapConverter {
public Object fromConfiguration(final ConverterLookup lookup, final PlexusConfiguration configuration,
final Class<?> type, final Type[] typeArguments, final Class<?> enclosingType, final ClassLoader loader,
final ExpressionEvaluator evaluator, final ConfigurationListener listener)
throws ComponentConfigurationException {
final Object value = fromExpression(configuration, evaluator, type);
if (null != value) {
return value;
}
try {
final Map<Object, Object> map = instantiateMap(configuration, type, loader);
final Class<?> elementType = findElementType(typeArguments);
if (Object.class == elementType || String.class == elementType) {
for (int i = 0, size = configuration.getChildCount(); i < size; i++) {
final PlexusConfiguration element = configuration.getChild(i);
map.put(element.getName(), fromExpression(element, evaluator));
}
return map;
}
// handle maps with complex element types...
final ConfigurationConverter converter = lookup.lookupConverterForType(elementType);
for (int i = 0, size = configuration.getChildCount(); i < size; i++) {
Object elementValue;
final PlexusConfiguration element = configuration.getChild(i);
try {
elementValue = converter.fromConfiguration(lookup, element, elementType, enclosingType, //
loader, evaluator, listener);
}
// TEMP: remove when http://jira.codehaus.org/browse/MSHADE-168
// is fixed
catch (final ComponentConfigurationException e) {
elementValue = fromExpression(element, evaluator);
Logs.warn("Map in " + enclosingType + " declares value type as: {} but saw: {} at runtime",
elementType, null != elementValue ? elementValue.getClass() : null);
}
// ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
map.put(element.getName(), elementValue);
}
return map;
} catch (final ComponentConfigurationException e) {
if (null == e.getFailedConfiguration()) {
e.setFailedConfiguration(configuration);
}
throw e;
}
}
@SuppressWarnings("unchecked")
private Map<Object, Object> instantiateMap(final PlexusConfiguration configuration, final Class<?> type,
final ClassLoader loader) throws ComponentConfigurationException {
final Class<?> implType = getClassForImplementationHint(type, configuration, loader);
if (null == implType || Modifier.isAbstract(implType.getModifiers())) {
return new TreeMap<Object, Object>();
}
final Object impl = instantiateObject(implType);
failIfNotTypeCompatible(impl, type, configuration);
return (Map<Object, Object>) impl;
}
private static Class<?> findElementType( final Type[] typeArguments )
{
if ( null != typeArguments && typeArguments.length > 1 )
{
if ( typeArguments[1] instanceof Class<?> )
{
return (Class<?>) typeArguments[1];
}
// begin fix here
if ( typeArguments[1] instanceof ParameterizedType )
{
return (Class<?>) ((ParameterizedType) typeArguments[1]).getRawType();
}
// end fix here
}
return Object.class;
}
}
One solution is quite simple and works for one level of nesting. A more sophisticated approach can be found in the alternative answer, which possibly also allows for deeper nesting of Maps.
Instead of using an interface as the type parameter, simply use a concrete class like TreeMap:
@Parameter
private Map<String, TreeMap> converters;
The reason is this check in MapConverter, which fails for an interface but succeeds for a concrete class:
private static Class<?> findElementType( final Type[] typeArguments )
{
if ( null != typeArguments && typeArguments.length > 1
&& typeArguments[1] instanceof Class<?> )
{
return (Class<?>) typeArguments[1];
}
return Object.class;
}
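What the instanceof Class<?> check sees can be reproduced with plain reflection. In the sketch below, which is an illustration rather than part of the Maven or Plexus code, the Holder class and valueTypeOf helper are hypothetical. It prints how the value type argument is reported for a parameterized Map<String, String> and for a raw TreeMap.

```java
import java.lang.reflect.Field;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.Map;
import java.util.TreeMap;

public class ElementTypeDemo {

    // Hypothetical holder with the two field declarations discussed above.
    static class Holder {
        Map<String, Map<String, String>> nested;

        @SuppressWarnings("rawtypes")
        Map<String, TreeMap> rawValue;
    }

    // Returns the second type argument (the map's value type) of the named field.
    static Type valueTypeOf(String fieldName) {
        try {
            Field field = Holder.class.getDeclaredField(fieldName);
            ParameterizedType mapType = (ParameterizedType) field.getGenericType();
            return mapType.getActualTypeArguments()[1];
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException("Unknown field: " + fieldName, e);
        }
    }

    public static void main(String[] args) {
        // Map<String, String> is a ParameterizedType, so `instanceof Class<?>` is false.
        System.out.println(valueTypeOf("nested") instanceof Class);   // prints false
        // A raw TreeMap is reported as the Class object TreeMap.class.
        System.out.println(valueTypeOf("rawValue") instanceof Class); // prints true
    }
}
```

This also shows why a fix along the lines of getRawType() on the ParameterizedType recovers a usable Class for the nested map.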
As a side note, and since it is related to this answer: for Maven > 3.3.x it also works to install a custom converter by subclassing BasicComponentConfigurator and using it as a Plexus component. BasicComponentConfigurator has the DefaultConverterLookup as a protected member variable, so it is easily accessible for registering custom converters.

Implement Hadoop Map with JavaPairRDD as Spark Way

I have an RDD:
JavaPairRDD<Long, ViewRecord> myRDD
which is created via the newAPIHadoopRDD method. I have an existing map function that I want to implement the Spark way:
LongWritable one = new LongWritable(1L);
protected void map(Long key, ViewRecord viewRecord, Context context)
throws IOException ,InterruptedException {
String url = viewRecord.getUrl();
long day = viewRecord.getDay();
tuple.getKey().set(url);
tuple.getValue().set(day);
context.write(tuple, one);
};
PS: tuple is derived from:
KeyValueWritable<Text, LongWritable>
and can be found here: TextLong.java
I don't know what tuple is, but if you just want to map each record to a tuple with key (url, day) and value 1L, you can do it like this:
JavaPairRDD<Tuple2<String, Long>, Long> result = myRDD
.values()
.mapToPair(viewRecord -> {
String url = viewRecord.getUrl();
long day = viewRecord.getDay();
return new Tuple2<>(new Tuple2<>(url, day), 1L);
});
// Java 7 style
JavaPairRDD<Pair, Long> result = myRDD
.values()
.mapToPair(new PairFunction<ViewRecord, Pair, Long>() {
@Override
public Tuple2<Pair, Long> call(ViewRecord record) throws Exception {
String url = record.getUrl();
Long day = record.getDay();
return new Tuple2<>(new Pair(url, day), 1L);
}
}
);

Resources