Hadoop Multiple Outputs with CQL3 - hadoop

I need to output the results of a MR job to multiple CQL3 column families.
In my reducer, I specify the CF using MultipleOutputs, but all the results are written to the one CF defined in the job's OutputCQL statement.
Job definition:
...
job.setOutputFormatClass(CqlOutputFormat.class);
ConfigHelper.setOutputKeyspace(job.getConfiguration(), "keyspace1");
MultipleOutputs.addNamedOutput(job, "CF1", CqlOutputFormat.class, Map.class, List.class);
MultipleOutputs.addNamedOutput(job, "CF2", CqlOutputFormat.class, Map.class, List.class);
CqlConfigHelper.setOutputCql(job.getConfiguration(), "UPDATE keyspace1.CF1 SET value = ? ");
...
Reducer class setup:
mos = new MultipleOutputs(context);
Reduce method (pseudocode):
Map<String, ByteBuffer> keys = new LinkedHashMap<>();
keys.put("key", ByteBufferUtil.bytes("rowKey"));
keys.put("name", ByteBufferUtil.bytes("columnName"));
List<ByteBuffer> variables = new ArrayList<>();
variables.add(ByteBufferUtil.bytes("columnValue"));
mos.write("CF2", keys, variables);
The problem is that my reducer ignores the CF I specify in mos.write() and instead just runs the outputCQL, so in the example above everything is written to CF1.
I've tried using a prepared statement to inject the CF into the outputCQL, along the lines of "UPDATE keyspace1.? SET value = ?", but I don't think it's possible to use a placeholder for the CF like this.
Is there any way I can overwrite the outputCQL inside the reducer class?

So the simple answer is that you cannot output the results of an MR job to multiple CFs. However, needing to do this actually highlights a flaw in the approach, rather than a missing feature in Hadoop.
Instead of processing a bunch of records and trying to produce two different result sets in one pass, a better approach is to arrive at the desired result sets iteratively. Basically, this means having multiple jobs iterate over the results of previous jobs until the desired results are achieved, for example as sketched below.
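For illustration, a minimal sketch of that chained-job approach (job names and wiring are hypothetical; each pass gets its own output CQL, so each pass writes to a single CF):
Job job1 = new Job(new Configuration(), "write-cf1");
job1.setOutputFormatClass(CqlOutputFormat.class);
ConfigHelper.setOutputKeyspace(job1.getConfiguration(), "keyspace1");
CqlConfigHelper.setOutputCql(job1.getConfiguration(), "UPDATE keyspace1.CF1 SET value = ? ");
// ... set mapper, reducer and input path for the first pass ...
job1.waitForCompletion(true);

Job job2 = new Job(new Configuration(), "write-cf2");
job2.setOutputFormatClass(CqlOutputFormat.class);
ConfigHelper.setOutputKeyspace(job2.getConfiguration(), "keyspace1");
CqlConfigHelper.setOutputCql(job2.getConfiguration(), "UPDATE keyspace1.CF2 SET value = ? ");
// ... the second pass can iterate over the output of the first (or the original input) ...
job2.waitForCompletion(true);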

Related

Nested Parameter substitution problems in jmeter

I am trying to create a jmx to test some database entries. The exact queries are not known beforehand and need to be picked from one of many sets of queries that I have declared in separate pre-processor units.
For eg. one of the set of queries is:
String[][] animals = new String[][]{
{"VLS_CATS_ASSOC","CAT_ID_AS = '${CAT_ID_AS}'"},
{"VLS_DOGS_EXCH","DOG_ID_ST = '${DOG_ID_ST}' and DOG_TXT_REF = '${DOG_TXT_REF}'"},
};
Each row of the 2D array has two entries: the table name and the where clause.
So, the JDBC request is select * from ${table_name} where ${where_clause}
I've set up a Loop Controller to iterate through the tables one by one, and as a child I have the JDBC sampler, which has the CSV table config that will contain data for CAT_ID_AS, DOG_ID_ST, DOG_TXT_REF.
Currently, when I view my requests in a results listener, I see that the queries sent are:
select * from VLS_CATS_ASSOC where DOG_ID_ST = '${DOG_ID_ST}' and DOG_TXT_REF = '${DOG_TXT_REF}'
It is clear from the output that the first level of substitution has worked, but not the second one.
Can anybody please help me on this?
Edit: Adding an image of the test plan. Jmeter Test Plan
Don't use the ${CAT_ID_AS} notation in (I presume) the BeanShell strings: while it generally works, it's an undocumented and error-prone feature.
The legit way is vars.get("CAT_ID_AS").
vars is a pre-defined object available in any BeanShell piece inside your test plan.
So your code would look like:
String[][] animals = new String[][]{ {"VLS_CATS_ASSOC","CAT_ID_AS = '" + vars.get("CAT_ID_AS") + "'"}, ... etc
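Applied to both rows from the question, the array would then be built entirely from vars.get() calls (a sketch using the variable names above):
String[][] animals = new String[][]{
    {"VLS_CATS_ASSOC", "CAT_ID_AS = '" + vars.get("CAT_ID_AS") + "'"},
    {"VLS_DOGS_EXCH", "DOG_ID_ST = '" + vars.get("DOG_ID_ST") + "' and DOG_TXT_REF = '" + vars.get("DOG_TXT_REF") + "'"},
};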

Flume: HDFSEventSink - how to multiplex dynamically?

Summary: I have a multiplexing scenario, and would like to know how to multiplex dynamically - not based on a statically configured value, but based on the variable value of a field (e.g. dates).
Details:
I have an input, that is separated by an entityId.
As I know the entities that I am working with, I can configure it in typical Flume multi-channel selection.
agent.sources.jmsSource.channels = chan-10 chan-11 # ...
agent.sources.jmsSource.selector.type = multiplexing
agent.sources.jmsSource.selector.header = EntityId
agent.sources.jmsSource.selector.mapping.10 = chan-10
agent.sources.jmsSource.selector.mapping.11 = chan-11
# ...
Each of the channels goes to a separate HDFSEventSink, "hdfsSink-n":
agent.sinks.hdfsSink-10.channel = chan-10
agent.sinks.hdfsSink-10.hdfs.path = hdfs://some/path/
agent.sinks.hdfsSink-10.hdfs.filePrefix = entity10
# ...
agent.sinks.hdfsSink-11.channel = chan-11
agent.sinks.hdfsSink-11.hdfs.path = hdfs://some/path/
agent.sinks.hdfsSink-11.hdfs.filePrefix = entity11
# ...
This generates a file per entity, which is fine.
Now I want to introduce a second variable, which is dynamic: a date. Depending on event date, I want to create files per-entity per-date.
Date is a dynamic value, so I cannot preconfigure a number of sinks so each one sends to a separate file. Also, you can only specify one HDFS output per Sink.
So it's as if a "Multiple Outputs HDFSEventSink" were needed (in a similar way to Hadoop's MultipleOutputs library). Is there such functionality in Flume?
If not, is there any elegant way to fix this or work around it? Another option is to modify HDFSEventSink; it seems this could be implemented by creating a different "realName" (String) for each event.
Actually you can specify the variable in your HDFS sink's path or filePrefix.
For example, if the variable's key is "date" in event's headers, then you can configure like this:
agent.sinks.hdfsSink-11.hdfs.filePrefix = entity11-%{date}
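For example, to bucket by both entity and date, a sketch of the sink config (this assumes each event already carries a "date" header, e.g. added by an upstream interceptor or by the producer):
agent.sinks.hdfsSink-11.hdfs.path = hdfs://some/path/%{date}/
agent.sinks.hdfsSink-11.hdfs.filePrefix = entity11-%{date}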

How do I make the mapper process the entire file from HDFS

This is the code where I read the file that contains HL7 messages and iterate through them using the HAPI iterator (from http://hl7api.sourceforge.net):
File file = new File("/home/training/Documents/msgs.txt");
InputStream is = new FileInputStream(file);
is = new BufferedInputStream(is);
Hl7InputStreamMessageStringIterator iter = new Hl7InputStreamMessageStringIterator(is);
I want to do this inside the map function. Obviously I need to prevent splitting in the InputFormat so that the entire file is read at once as a single value and converted to a String (the file size is 7 KB), because, as you know, HAPI can only parse complete messages.
I am newbie to all of this so please bear with me.
You will need to implement your own FileInputFormat subclass:
It must override the isSplitable() method to return false, which means the number of mappers will be equal to the number of input files: one input file per mapper.
You also need to implement the getRecordReader() method. The record reader is exactly the class where you need to put your parsing logic from above.
If you do not want your data file to be split, or you want a single mapper to process your entire file (so that one file is processed by only one mapper), then extending the map/reduce InputFormat and overriding the isSplitable() method to return false will help you.
For ref : ( Not based on your code )
https://gist.github.com/sritchie/808035
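A minimal sketch of that approach using the newer mapreduce API, where the factory method is createRecordReader() rather than getRecordReader() (class names are hypothetical):
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class WholeFileInputFormat extends FileInputFormat<NullWritable, BytesWritable> {

    @Override
    protected boolean isSplitable(JobContext context, Path file) {
        return false; // never split: one mapper per input file
    }

    @Override
    public RecordReader<NullWritable, BytesWritable> createRecordReader(InputSplit split, TaskAttemptContext context) {
        return new WholeFileRecordReader();
    }
}

public class WholeFileRecordReader extends RecordReader<NullWritable, BytesWritable> {
    private FileSplit split;
    private Configuration conf;
    private final BytesWritable value = new BytesWritable();
    private boolean processed = false;

    @Override
    public void initialize(InputSplit split, TaskAttemptContext context) {
        this.split = (FileSplit) split;
        this.conf = context.getConfiguration();
    }

    @Override
    public boolean nextKeyValue() throws IOException {
        if (processed) {
            return false;
        }
        // read the whole file into a single value
        byte[] contents = new byte[(int) split.getLength()];
        Path file = split.getPath();
        FileSystem fs = file.getFileSystem(conf);
        try (FSDataInputStream in = fs.open(file)) {
            IOUtils.readFully(in, contents, 0, contents.length);
        }
        value.set(contents, 0, contents.length);
        processed = true;
        return true;
    }

    @Override public NullWritable getCurrentKey() { return NullWritable.get(); }
    @Override public BytesWritable getCurrentValue() { return value; }
    @Override public float getProgress() { return processed ? 1.0f : 0.0f; }
    @Override public void close() { }
}
In the mapper you could then wrap the value's bytes in a ByteArrayInputStream and feed that to the Hl7InputStreamMessageStringIterator from the question.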
As the input is coming from a text file, you can override the isSplitable() method of FileInputFormat. With this, one mapper will process the whole file.
@Override
protected boolean isSplitable(JobContext context, Path file) {
    return false;
}

JMeter set variable to random option

I've been using JMeter and I'm aware of the __Random and __RandomString functions. I need to pick a random option and store it in a variable because it will be used as part of a parameter path for multiple calls. For example:
http://www.example.com/pets/{random option such as: cat, dog, parakeet}/
I've tried doing something simple like this, where I set the variable ${query} to one, two, or three using a Random Controller with User Defined Variables as children. This seems like it should work; however, I always get ${query} set to three.
Any insight or ideas will be well received. Thanks to all in advance.
You can use a Beanshell PreProcessor to generate a random value:
String[] query = new String[]{"cat", "dog", "parakeet"};
Random random = new Random();
int i = random.nextInt(query.length);
vars.put("randomOption",query[i]);
After that in your HTTP Request
http://www.example.com/pets/${randomOption}
As an alternative to String[] query = new String[]{"cat", "dog", "parakeet"}; you can use the Beanshell pre-defined Parameters stanza.
Random random = new Random();
int i = random.nextInt(query.length);
vars.put("randomOption",bsh.args[i]);
I know this is an old post, but there is now a new function available:
__RandomFromMultipleVars(animalCat|animalDog|animalParakeet, query)
somewhere you need to define the variables:
animalCat=cat
animalDog=dog
animalParakeet=parakeet
It looks like this is not a native feature of JMeter. I'm using a plugin that accomplishes this goal: http://jmeter-plugins.org/wiki/Functions/ implements a new function that lets you choose a random string from a list of strings. From their website:
${__chooseRandom(red,green,blue,orange,violet,magenta,randomColor)}
See also:
Get random values from an array
This however requires writing some code in a PreProcessor.

Hadoop Map-Reduce , Need to combine two mapper with one common Reducer

I need to implement the below functionality using Hadoop Map-Reduce:
1) I am reading one input for a mapper from one source and another input from a different input source.
2) I need to pass both mapper outputs into a single reducer for further processing.
Is there any way to do the above in Hadoop Map-Reduce?
MultipleInputs.addInputPath is what you are looking for. This is how your configuration would look. Make sure both AnyMapper1 and AnyMapper2 emit the same output key/value types expected by MergeReducer:
JobConf conf = new JobConf(Merge.class);
conf.setJobName("merge");
conf.setOutputKeyClass(IntWritable.class);
conf.setOutputValueClass(Text.class);
conf.setReducerClass(MergeReducer.class);
conf.setOutputFormat(TextOutputFormat.class);
MultipleInputs.addInputPath(conf, inputDir1, SequenceFileInputFormat.class, AnyMapper1.class);
MultipleInputs.addInputPath(conf, inputDir2, TextInputFormat.class, AnyMapper2.class);
FileOutputFormat.setOutputPath(conf, outputPath);
You can create a custom Writable and populate it in the Mapper. Later, in the Reducer, you can get the custom Writable object and do the necessary business operations.
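For example, a minimal sketch of such a custom Writable (class and field names are hypothetical); both mappers would emit it as their map output value so the reducer sees a single value type:
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;

public class MergedRecord implements Writable {
    private final Text source = new Text();   // which mapper/input produced the record
    private final Text payload = new Text();  // the record content

    public void set(String sourceName, String data) {
        source.set(sourceName);
        payload.set(data);
    }

    @Override
    public void write(DataOutput out) throws IOException {
        source.write(out);
        payload.write(out);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        source.readFields(in);
        payload.readFields(in);
    }

    public Text getSource() { return source; }
    public Text getPayload() { return payload; }
}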
