Effects on output when different sets are used in Java - Set

import java.util.*;
public class Example {
    public static void main(String[] args) {
        // insert code here
        set.add(new Integer(2));
        set.add(new Integer(1));
        System.out.println(set);
    }
}
Which code, inserted at line 4, guarantees that this program will output [1, 2]?
Set set = new TreeSet();
Set set = new HashSet();
Set set = new SortedSet();
List set = new SortedList();
Set set = new LinkedHashSet();
This question was on the OCJP exam; please help me with it.

The answer is Set set = new TreeSet();
TreeSet keeps its elements in their natural ordering, which the wrapper classes define via the Comparable interface, so the integers are sorted automatically. HashSet makes no ordering guarantee, LinkedHashSet preserves insertion order (which here would be [2, 1]), SortedSet is an interface and cannot be instantiated, and SortedList does not exist.
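To see the difference at a glance, here is a small sketch (the SetOrderDemo class name is mine) comparing TreeSet with HashSet and LinkedHashSet:
import java.util.*;
public class SetOrderDemo {
    public static void main(String[] args) {
        // TreeSet sorts by natural ordering (Comparable), so this always prints [1, 2]
        Set<Integer> treeSet = new TreeSet<>();
        treeSet.add(2);
        treeSet.add(1);
        System.out.println(treeSet);       // [1, 2]
        // LinkedHashSet preserves insertion order, so this prints [2, 1]
        Set<Integer> linkedHashSet = new LinkedHashSet<>();
        linkedHashSet.add(2);
        linkedHashSet.add(1);
        System.out.println(linkedHashSet); // [2, 1]
        // HashSet makes no ordering guarantee at all
        Set<Integer> hashSet = new HashSet<>();
        hashSet.add(2);
        hashSet.add(1);
        System.out.println(hashSet);       // order unspecified
    }
}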

Related

How to transform pattern in jFugue 5.0?

In jFugue 4.0 there's a nice function:
Transforming Patterns with PatternTransformer
but all pattern transformers were removed in jFugue 5.0. I understand they must have been replaced with something better, but what should I use in jFugue 5.0? I have no clue; I googled but have found nothing so far.
The class "PatternTransformer" is gone, but transforming patterns has never been easier!
In older versions of JFugue, there was actually very little difference between a PatternTransformer and a ParserListener. Older versions of JFugue also referred to a PatternTool, which was like a Transformer but instead of transforming a pattern, it would just measure it; for example, you could write a tool to tell you what instruments were used in a piece.
To transform a Pattern in JFugue, just create a class that implements ParserListener (or extends ParserListenerAdapter), and add it as a listener to a parser - such as a StaccatoParser:
For example, here's a tool that finds what instruments are used in a piece:
public class InstrumentTool extends ParserListenerAdapter {

    private List<String> instrumentNames;

    public InstrumentTool() {
        super();
        instrumentNames = new ArrayList<String>();
    }

    @Override
    public void onInstrumentParsed(byte instrument) {
        String instrumentName = MidiDictionary.INSTRUMENT_BYTE_TO_STRING.get(instrument);
        if (!instrumentNames.contains(instrumentName)) {
            instrumentNames.add(instrumentName);
        }
    }

    public List<String> getInstrumentNames() {
        return this.instrumentNames;
    }
}
and here's how to use it:
MidiParser midiParser = new MidiParser();
InstrumentTool instrumentTool = new InstrumentTool();
midiParser.addParserListener(instrumentTool);
midiParser.parse(MidiSystem.getSequence(new File("filename")));
List<String> instrumentNames = instrumentTool.getInstrumentNames();
for (String name : instrumentNames) {
    System.out.println(name);
}
There's a new class in JFugue 5 that lets you chain ParserListeners together. This would let you create a chain of listeners that each modify a pattern before sending events to the next listener in the chain. For example, suppose you have a pattern, and you want to transform all of the instruments (say, change GUITAR to PIANO); then you want to take any note played with PIANO and stretch its duration by two; then you want to take any note with a new duration greater than 2.0 (two whole notes) and you want to change its octave. A bit of a crazy example, but it shows the need for a "chaining" series of parser listeners.
Here's a demo example that uses chaining. This class reads a MIDI pattern; it then changes all of the instruments, and then it creates a Staccato pattern from the original MIDI.
public class ChainingParserListenerDemo {
    public static void main(String[] args) throws InvalidMidiDataException, IOException {
        MidiParser parser = new MidiParser();
        InstrumentChangingParserListener instrumentChanger = new InstrumentChangingParserListener();
        StaccatoParserListener staccatoListener = new StaccatoParserListener();
        instrumentChanger.addParserListener(staccatoListener);
        parser.addParserListener(instrumentChanger);
        parser.parse(MidiSystem.getSequence(new File("filename")));
        System.out.println("Changed " + instrumentChanger.counter + " Pianos to Guitars! "
                + staccatoListener.getPattern().toString());
    }
}

class InstrumentChangingParserListener extends ChainingParserListenerAdapter {
    int counter = 0;

    @Override
    public void onInstrumentParsed(byte instrument) {
        if (instrument == MidiDictionary.INSTRUMENT_STRING_TO_BYTE.get("PIANO")) {
            instrument = MidiDictionary.INSTRUMENT_STRING_TO_BYTE.get("GUITAR");
            counter++;
        }
        super.onInstrumentParsed(instrument);
    }
}

Gson: How do I deserialize an inner JSON object to a map if the property name is not fixed?

My client retrieves JSON content as below:
{
"table": "tablename",
"update": 1495104575669,
"rows": [
{"column5": 11, "column6": "yyy"},
{"column3": 22, "column4": "zzz"}
]
}
In rows array content, the key is not fixed. I want to retrieve the key and value and save into a Map using Gson 2.8.x.
How can I configure Gson to deserialize this simply?
Here is my idea:
public class Dataset {
private String table;
private long update;
private List<Rows> lists;   // <-- a little confused here;
// or: private List<HashMap<String, Object>> lists
// Setter/Getter
}
public class Rows {
private HashMap<String, Object> map;
....
}
Dataset k = gson.fromJson(jsonStr, Dataset.class);
log.info(k.getRows().size()); // <-- I got two null objects
Thanks.
Gson does not support such a thing out of the box. It would be nice if you could make the property name fixed. If not, you have a few options that will probably help you.
Option #1: Just rename the Dataset.lists field to Dataset.rows, if the property name is in fact fixed as rows (a minimal sketch follows below).
Option #2: If the possible name set is known in advance, tell Gson to pick up the alternative names using @SerializedName.
Option #3: If the possible name set is really unknown and may change in the future, you might want to make it fully dynamic using a custom TypeAdapter (streaming mode; requires less memory but is harder to use) or a custom JsonDeserializer (object mode; requires more memory to store the intermediate tree view but is easier to use) registered with GsonBuilder.
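For option #1, the change is only the field name; a minimal sketch (getters and setters omitted):
import java.util.List;
import java.util.Map;
public class Dataset {
    private String table;
    private long update;
    // The field name now matches the JSON property, so no annotation is needed
    private List<Map<String, Object>> rows;
}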
For option #2, you can simply list the alternative names:
@SerializedName(value = "lists", alternate = "rows")
final List<Map<String, Object>> lists;
For option #3, bind a downstream List<Map<String, Object>> type adapter and try to detect the name dynamically. Note that I omit a deserialization strategy for the Rows class for simplicity (and I believe you may want to remove the Rows class in favor of a simple Map<String, Object>). Another note: declare the field as Map rather than a concrete implementation; hash maps are unordered, but telling Gson you want a Map lets it pick an ordered implementation such as LinkedTreeMap (a Gson internal) or LinkedHashMap, which can matter for datasets.
// Type tokens are immutable and can be declared constants
private static final TypeToken<String> stringTypeToken = new TypeToken<String>() {
};
private static final TypeToken<Long> longTypeToken = new TypeToken<Long>() {
};
private static final TypeToken<List<Map<String, Object>>> stringToObjectMapListTypeToken = new TypeToken<List<Map<String, Object>>>() {
};

private static final Gson gson = new GsonBuilder()
        .registerTypeAdapterFactory(new TypeAdapterFactory() {
            @Override
            public <T> TypeAdapter<T> create(final Gson gson, final TypeToken<T> typeToken) {
                if ( typeToken.getRawType() != Dataset.class ) {
                    return null;
                }
                // If the actual type token represents the Dataset class, then pick the bunch of downstream type adapters
                final TypeAdapter<String> stringTypeAdapter = gson.getDelegateAdapter(this, stringTypeToken);
                final TypeAdapter<Long> primitiveLongTypeAdapter = gson.getDelegateAdapter(this, longTypeToken);
                final TypeAdapter<List<Map<String, Object>>> stringToObjectMapListTypeAdapter = gson.getDelegateAdapter(this, stringToObjectMapListTypeToken);
                // And compose the bunch into a single dataset type adapter
                final TypeAdapter<Dataset> datasetTypeAdapter = new TypeAdapter<Dataset>() {
                    @Override
                    public void write(final JsonWriter out, final Dataset dataset) {
                        // Omitted for brevity
                        throw new UnsupportedOperationException();
                    }

                    @Override
                    public Dataset read(final JsonReader in)
                            throws IOException {
                        in.beginObject();
                        String table = null;
                        long update = 0;
                        List<Map<String, Object>> lists = null;
                        while ( in.hasNext() ) {
                            final String name = in.nextName();
                            switch ( name ) {
                            case "table":
                                table = stringTypeAdapter.read(in);
                                break;
                            case "update":
                                update = primitiveLongTypeAdapter.read(in);
                                break;
                            default:
                                lists = stringToObjectMapListTypeAdapter.read(in);
                                break;
                            }
                        }
                        in.endObject();
                        return new Dataset(table, update, lists);
                    }
                }.nullSafe(); // Making the type adapter null-safe
                @SuppressWarnings("unchecked")
                final TypeAdapter<T> typeAdapter = (TypeAdapter<T>) datasetTypeAdapter;
                return typeAdapter;
            }
        })
        .create();
final Dataset dataset = gson.fromJson(jsonReader, Dataset.class);
System.out.println(dataset.lists);
The code above would then print:
[{column5=11.0, column6=yyy}, {column3=22.0, column4=zzz}]
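For completeness, the JsonDeserializer route from option #3 might look roughly like this. This is only a sketch in object mode; the DatasetDeserializer name is mine, and the Dataset(table, update, lists) constructor is the same one assumed by the type adapter above:
import java.lang.reflect.Type;
import java.util.List;
import java.util.Map;
import com.google.gson.*;
import com.google.gson.reflect.TypeToken;

final class DatasetDeserializer implements JsonDeserializer<Dataset> {

    private static final Type listOfMapsType = new TypeToken<List<Map<String, Object>>>() {
    }.getType();

    @Override
    public Dataset deserialize(final JsonElement json, final Type typeOfT, final JsonDeserializationContext context)
            throws JsonParseException {
        final JsonObject object = json.getAsJsonObject();
        // Assumes "table" and "update" are always present
        final String table = object.get("table").getAsString();
        final long update = object.get("update").getAsLong();
        List<Map<String, Object>> lists = null;
        // Whatever property is neither "table" nor "update" is treated as the rows array
        for (final Map.Entry<String, JsonElement> entry : object.entrySet()) {
            if (!entry.getKey().equals("table") && !entry.getKey().equals("update")) {
                lists = context.deserialize(entry.getValue(), listOfMapsType);
            }
        }
        return new Dataset(table, update, lists);
    }
}
It would be registered with new GsonBuilder().registerTypeAdapter(Dataset.class, new DatasetDeserializer()).create(), at the cost of building the intermediate JsonElement tree in memory.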

Is there a way to make a custom implementation of Nashorn JSObject work with Object.keys()?

I recently asked this question How can I pass a proper method reference in so Nashorn can execute it? and got an answer that helped me get much further along with my project, but I discovered a limitation around providing a custom JSObject implementation that I don't know how to resolve.
Given this simple working JSObject that can handle most of the methods JS would invoke on it such as map:
import javax.script.*;
import jdk.nashorn.api.scripting.*;
import java.util.*;
import java.util.function.*;
public class scratch_6 {
    public static void main(String[] args) throws Exception {
        ScriptEngineManager m = new ScriptEngineManager();
        ScriptEngine e = m.getEngineByName("nashorn");

        // The following JSObject wraps this list
        List<Object> l = new ArrayList<>();
        l.add("hello");
        l.add("world");
        l.add(true);
        l.add(1);

        JSObject jsObj = new AbstractJSObject() {
            @Override
            public Object getMember(String name) {
                if (name.equals("map")) {
                    // return a functional interface object - nashorn will treat it like
                    // a script function!
                    final Function<JSObject, Object> jsObjectObjectFunction = callback -> {
                        List<Object> res = new ArrayList<>();
                        for (Object obj : l) {
                            // call callback on each object and add the result to the new list
                            res.add(callback.call(null, obj));
                        }
                        // return the fresh list as the result of map (or this could be another wrapper)
                        return res;
                    };
                    return jsObjectObjectFunction;
                } else {
                    // unknown property
                    return null;
                }
            }
        };

        e.put("obj", jsObj);
        // map each element to a quoted string and print the result of map
        e.eval("print(obj.map(function(x) '\"'+x.toString()+'\"'))");

        //PROBLEM
        //e.eval("print(Object.keys(obj))");
    }
}
If you uncomment the last line where Object.keys(obj) is called, it will fail with the error ... is not an Object.
This appears to be because Object.keys() [ NativeObject.java:376 ] only checks whether the object is an instance of ScriptObject or of ScriptObjectMirror. If it is neither of those things, it throws the notAnObject error. :(
Ideally, user-implemented JSObject objects should be exactly equivalent to script objects. But user-implemented JSObjects are almost script objects - not quite. This is documented here: https://wiki.openjdk.java.net/display/Nashorn/Nashorn+jsr223+engine+notes
Object.keys is one such case where it breaks. However, if you just want for..in JavaScript iteration support for your objects, you can implement JSObject.keySet in your class.
Example code:
import javax.script.*;
import jdk.nashorn.api.scripting.*;
import java.util.*;
public class Main {
    public static void main(String[] args) throws Exception {
        ScriptEngineManager m = new ScriptEngineManager();
        ScriptEngine e = m.getEngineByName("nashorn");

        // This JSObject wraps the following Properties object
        Properties props = System.getProperties();

        JSObject jsObj = new AbstractJSObject() {
            @Override
            public Set<String> keySet() {
                return props.stringPropertyNames();
            }

            @Override
            public Object getMember(String name) {
                return props.getProperty(name);
            }
        };

        e.put("obj", jsObj);
        e.eval("for (i in obj) print(i, ' = ', obj[i])");
    }
}

How to write multiple outputs of different formats in a hadoop reducer?

How do you use the MultipleOutputs class in a reducer to write multiple outputs, each of which can have its own unique configuration? There is some documentation in the MultipleOutputs javadoc, but it seems limited to Text outputs. It turns out that MultipleOutputs can handle the output path, key class and value class of each output, but attempts to use output formats that require the use of other configuration properties fail.
(This question has come up several times but my attempts to answer it have been thwarted because the asker actually had a different problem. Since this question has taken more than a few days of investigation for me to answer, I'm answering my own question here as suggested by this Meta Stack Overflow question.)
I've crawled through the MultipleOutputs implementation and have found that it doesn't support any OutputFormatType that has properties other than outputDir, key class and value class. I tried to write my own MultipleOutputs class, but that failed because it needs to call a private method somewhere in the Hadoop classes.
I'm left with only one workaround that seems to work in all cases and all combinations of output formats and configurations: Write subclasses of the OutputFormat classes that I want to use (these turn out to be reusable). These classes understand that other OutputFormats are in use concurrently and know how to store away their properties. The design exploits the fact that an OutputFormat can be configured with the context just before being asked for its RecordWriter.
I've got this to work with Cassandra's ColumnFamilyOutputFormat:
package com.myorg.hadoop.platform;
import org.apache.cassandra.hadoop.ColumnFamilyOutputFormat;
import org.apache.hadoop.conf.Configurable;
import org.apache.hadoop.conf.Configuration;
public abstract class ConcurrentColumnFamilyOutputFormat
        extends ColumnFamilyOutputFormat
        implements Configurable {

    private static String[] propertyName = {
        "cassandra.output.keyspace",
        "cassandra.output.keyspace.username",
        "cassandra.output.keyspace.passwd",
        "cassandra.output.columnfamily",
        "cassandra.output.predicate",
        "cassandra.output.thrift.port",
        "cassandra.output.thrift.address",
        "cassandra.output.partitioner.class"
    };

    private Configuration configuration;

    public ConcurrentColumnFamilyOutputFormat() {
        super();
    }

    public Configuration getConf() {
        return configuration;
    }

    // Called by Hadoop just before this output's RecordWriter is requested:
    // restore the properties saved for this named output into the live configuration.
    public void setConf(Configuration conf) {
        configuration = conf;
        String prefix = "multiple.outputs." + getMultiOutputName() + ".";
        for (int i = 0; i < propertyName.length; i++) {
            String property = prefix + propertyName[i];
            String value = conf.get(property);
            if (value != null) {
                conf.set(propertyName[i], value);
            }
        }
    }

    // Called from the job setup code: save this output's properties
    // under a prefixed key so other outputs cannot overwrite them.
    public void configure(Configuration conf) {
        String prefix = "multiple.outputs." + getMultiOutputName() + ".";
        for (int i = 0; i < propertyName.length; i++) {
            String property = prefix + propertyName[i];
            String value = conf.get(propertyName[i]);
            if (value != null) {
                conf.set(property, value);
            }
        }
    }

    public abstract String getMultiOutputName();
}
For each Cassandra (in this case) output you want for your reducer, you'd have a class:
package com.myorg.multioutput.ReadCrawled;
import com.myorg.hadoop.platform.ConcurrentColumnFamilyOutputFormat;
public class StrongOutputFormat extends ConcurrentColumnFamilyOutputFormat {

    public StrongOutputFormat() {
        super();
    }

    @Override
    public String getMultiOutputName() {
        return "Strong";
    }
}
and you'd configure it in your mapper/reducer configuration class:
// This is how you'd normally configure the ColumnFamilyOutputFormat
ConfigHelper.setOutputColumnFamily(job.getConfiguration(), "Partner", "Strong");
ConfigHelper.setOutputRpcPort(job.getConfiguration(), "9160");
ConfigHelper.setOutputInitialAddress(job.getConfiguration(), "localhost");
ConfigHelper.setOutputPartitioner(job.getConfiguration(), "org.apache.cassandra.dht.RandomPartitioner");
// This is how you tell the MultipleOutput-aware OutputFormat that
// it's time to save off the configuration so no other OutputFormat
// steps all over it.
new StrongOutputFormat().configure(job.getConfiguration());
// This is where we add the MultipleOutput-aware ColumnFamilyOutputFormat
// to our set of outputs
MultipleOutputs.addNamedOutput(job, "Strong", StrongOutputFormat.class, ByteBuffer.class, List.class);
Just to give another example, the MultipleOutput subclass for FileOutputFormat uses these properties:
private static String[] propertyName = {
"mapred.output.compression.type" ,
"mapred.output.compression.codec" ,
"mapred.output.compress" ,
"mapred.output.dir"
};
and would be implemented just like ConcurrentColumnFamilyOutputFormat above, except that it would use the properties listed above; a sketch follows.
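For instance, that file-based variant might look like the sketch below. I'm basing it on TextOutputFormat so the per-output subclasses stay as small as StrongOutputFormat; the ConcurrentTextOutputFormat name is mine, not from the original:
package com.myorg.hadoop.platform;

import org.apache.hadoop.conf.Configurable;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public abstract class ConcurrentTextOutputFormat extends TextOutputFormat<Text, Text>
        implements Configurable {

    private static String[] propertyName = {
        "mapred.output.compression.type",
        "mapred.output.compression.codec",
        "mapred.output.compress",
        "mapred.output.dir"
    };

    private Configuration configuration;

    public Configuration getConf() {
        return configuration;
    }

    // Restore this output's saved properties just before its RecordWriter is created
    public void setConf(Configuration conf) {
        configuration = conf;
        String prefix = "multiple.outputs." + getMultiOutputName() + ".";
        for (String name : propertyName) {
            String value = conf.get(prefix + name);
            if (value != null) {
                conf.set(name, value);
            }
        }
    }

    // Save this output's properties under a prefixed key during job setup
    public void configure(Configuration conf) {
        String prefix = "multiple.outputs." + getMultiOutputName() + ".";
        for (String name : propertyName) {
            String value = conf.get(name);
            if (value != null) {
                conf.set(prefix + name, value);
            }
        }
    }

    public abstract String getMultiOutputName();
}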
I have implemented MultipleOutputs support for Cassandra (see this JIRA ticket); it is currently scheduled for release in 1.2. If you need it now, you can apply the patch in the ticket. Also check out this presentation on the topic, which gives examples of its usage.

Any suggestions for reading two different dataset into Hadoop at the same time?

Dear hadooper:
I'm new to Hadoop, and I recently tried to implement an algorithm.
This algorithm needs to calculate a matrix which represents the rating difference between every pair of songs. I already did this, and the output is a 600000*600000 sparse matrix which I stored in HDFS. Let's call this dataset A (size = 160 GB).
Now I need to read the users' profiles to predict their ratings for a specific song. So I need to read the users' profiles first (which are 5 GB in size); let's call this dataset B. Then I need to do the calculation using dataset A.
But I don't know how to read the two datasets from a single Hadoop program. Or can I read dataset B into RAM and then do the calculation? (I guess I can't, because HDFS is a distributed system, and I can't read dataset B into a single machine's memory.)
Any suggestions?
You can use two map functions; each map function can process one dataset if you want different processing for each. You need to register each mapper with your job conf. For example:
public static class FullOuterJoinStdDetMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {

    private String person_name, book_title, file_tag = "person_book#";
    private String emit_value = new String();
    //emit_value = "";

    public void map(LongWritable key, Text values, OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {
        String line = values.toString();
        try {
            String[] person_detail = line.split(",");
            person_name = person_detail[0].trim();
            book_title = person_detail[1].trim();
        } catch (ArrayIndexOutOfBoundsException e) {
            person_name = "student name missing";
        }
        emit_value = file_tag + person_name;
        output.collect(new Text(book_title), new Text(emit_value));
    }
}
public static class FullOuterJoinResultDetMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {

    private String author_name, book_title, file_tag = "auth_book#";
    private String emit_value = new String();
    // emit_value = "";

    public void map(LongWritable key, Text values, OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {
        String line = values.toString();
        try {
            String[] author_detail = line.split(",");
            author_name = author_detail[1].trim();
            book_title = author_detail[0].trim();
        } catch (ArrayIndexOutOfBoundsException e) {
            author_name = "Not Appeared in Exam";
        }
        emit_value = file_tag + author_name;
        output.collect(new Text(book_title), new Text(emit_value));
    }
}
public static void main(String args[]) throws Exception {
    if (args.length != 3) {
        System.out.println("Input/output file missing");
        System.exit(-1);
    }
    Configuration conf = new Configuration();
    String[] argum = new GenericOptionsParser(conf, args).getRemainingArgs();
    conf.set("mapred.textoutputformat.separator", ",");

    JobConf mrjob = new JobConf();
    mrjob.setJobName("Inner_Join");
    mrjob.setJarByClass(FullOuterJoin.class);

    MultipleInputs.addInputPath(mrjob, new Path(argum[0]), TextInputFormat.class, FullOuterJoinStdDetMapper.class);
    MultipleInputs.addInputPath(mrjob, new Path(argum[1]), TextInputFormat.class, FullOuterJoinResultDetMapper.class);
    FileOutputFormat.setOutputPath(mrjob, new Path(args[2]));

    mrjob.setReducerClass(FullOuterJoinReducer.class);
    mrjob.setOutputKeyClass(Text.class);
    mrjob.setOutputValueClass(Text.class);
    JobClient.runJob(mrjob);
}
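The FullOuterJoinReducer referenced above is not shown in the original; a rough sketch of what it might look like, assuming it simply splits the values by their file tag and joins them on the shared key (the class body below is my assumption, using java.util.* and org.apache.hadoop.mapred.* like the mappers above):
public static class FullOuterJoinReducer extends MapReduceBase
        implements Reducer<Text, Text, Text, Text> {

    public void reduce(Text key, Iterator<Text> values,
                       OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {
        // Collect the tagged values coming from the two mappers
        List<String> persons = new ArrayList<String>();
        List<String> authors = new ArrayList<String>();
        while (values.hasNext()) {
            String value = values.next().toString();
            if (value.startsWith("person_book#")) {
                persons.add(value.substring("person_book#".length()));
            } else if (value.startsWith("auth_book#")) {
                authors.add(value.substring("auth_book#".length()));
            }
        }
        // Emit a record for every pairing of the two sides sharing this key
        // (a true full outer join would also emit unmatched entries from either list)
        for (String person : persons) {
            for (String author : authors) {
                output.collect(key, new Text(person + "," + author));
            }
        }
    }
}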
Hadoop allows you to use different map input formats for different folders, so you can read from several data sources and then cast to the specific type in the Map function, i.e. in one case you get (String, User) and in the other (String, SongSongRating), while your Map signature is (String, Object).
The second step is selecting the recommendation algorithm and joining the data in some way, so that the aggregator has at least enough information to calculate the recommendation.
