java.lang.ArrayIndexOutOfBoundsException during execution - Hadoop

package com.HadoopExpert;
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
public class SumMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    public void map(LongWritable key, Text value, Context con) throws IOException, InterruptedException {
        String s = value.toString();
        String[] words = s.split(",");
        String gender = words[4];
        int sal = Integer.parseInt(words[2]);
        con.write(new Text(gender), new IntWritable(sal));
    }
}
This is my mapper class code. I want to fetch values from the array by index, but I am getting an ArrayIndexOutOfBoundsException.
Thanks in advance.

According to the data mentioned in your comment, the index of the gender field should be 3. Note that array indices start from 0 in Java.
You should also always check your data before using it, for example:
if (words.length > 3) {
    String gender = words[3];
    ......
}
And you should think about how to handle bad records (count them and then ignore them, try to fix them, and so on).
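Putting both suggestions together, a minimal sketch of the corrected mapper could look like the following (the field indices are assumptions based on your comment: salary at index 2, gender at index 3):
package com.HadoopExpert;
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
public class SumMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    public void map(LongWritable key, Text value, Context con) throws IOException, InterruptedException {
        String[] words = value.toString().split(",");
        // Guard against records with too few fields instead of letting the job fail.
        if (words.length > 3) {
            String gender = words[3];             // gender assumed to be the 4th field
            int sal = Integer.parseInt(words[2]); // salary assumed to be the 3rd field
            con.write(new Text(gender), new IntWritable(sal));
        }
    }
}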

Related

Optional class in Spring Boot?

I'm developing an application that manages accounts. I made a package named org.sid.metier that contains the IBanqueMetier interface; below is its code:
package org.sid.metier;
import org.sid.entities.Compte;
import org.sid.entities.Operation;
import org.springframework.data.domain.Page;
public interface IBanqueMetier {
    public Compte consulterCompte(String CodeCompte);
    public void verser(String CodeCompte, double mt);
    public void retirer(String CodeCompte, double mt);
    public void virement(String Cp1, String Cp2, double mt);
    public Page<Operation> listOperation(String cp, int page, int size);
}
Below is the code of the implementation of this interface:
package org.sid.metier;
import java.util.Date;
import org.sid.dao.CompteRepository;
import org.sid.dao.OperationRepository;
import org.sid.entities.Compte;
import org.sid.entities.Operation;
import org.sid.entities.Versement;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.domain.Page;
import org.springframework.lang.Nullable;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;
@Service
@Transactional
public class BanqueMetierImpl implements IBanqueMetier {
    @Autowired
    private CompteRepository compteRepository;
    @Autowired
    private OperationRepository operationRepository;

    @Override
    public Compte consulterCompte(String CodeCompte) {
        Compte cp = compteRepository.findById(CodeCompte);
        if (cp == null)
            throw new RuntimeException("compte introuvable");
        return cp;
    }
I have an error on the line below that says "Type mismatch: cannot convert from Optional to Compte":
Compte cp = compteRepository.findById(CodeCompte);
findById returns an Optional, meaning that we either have a single result for the given id or we have nothing at all.
In order to unpack this Optional, what I usually advise is the following:
Compte cp = compteRepository.findById(CodeCompte).orElseThrow(() -> new RuntimeException("Element not found!"));
This will throw an exception in case we don't find anything for that specific id.
In some cases it is more beneficial to return a default value instead of throwing an exception. In that case we can use this:
Compte cp = compteRepository.findById(CodeCompte).orElse(new Compte());
or with a supplier:
Compte cp = compteRepository.findById(CodeCompte).orElseGet(() -> new Compte());
compteRepository.findById returns an Optional because CompteRepository most likely extends the CrudRepository interface.
So please use
Optional<Compte> cp = compteRepository.findById(CodeCompte);
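Applied to the method from the question, a possible rewrite (just a sketch; the unchecked exception and its message are choices, not requirements) would be:
@Override
public Compte consulterCompte(String CodeCompte) {
    // findById returns Optional<Compte>; unwrap it or fail with an unchecked exception.
    return compteRepository.findById(CodeCompte)
            .orElseThrow(() -> new RuntimeException("compte introuvable"));
}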

Instantiate a field level HashMap in JCodeModel

I want to declare and instantiate a HashMap in one go in JCodeModel.
I do:
jc.field(JMod.PRIVATE, HashMap.class, "initAttributes");
which declares it but doesn't instantiate it. How do I instantiate it?
Thanks
In the simplest case, you can just append the initialization directly to your creation of the field:
jc.field(JMod.PRIVATE, HashMap.class, "initAttributes")
.init(JExpr._new(codeModel.ref(HashMap.class)));
Some further hints:
Considering that you should usually program to an interface, it is a good practice to declare the variable using a type that is "as basic as possible". You should hardly ever declare a variable as
private HashMap map;
but basically always only as
private Map map;
because Map is the interface that is relevant here.
You can also add generics in JCodeModel. This usually involves some calls to narrow on certain types. It is a bit more effort, but it will generate code that compiles without warnings caused by raw types.
An example is shown here. (It uses String as the key type and Integer as the value type of the map. You may adjust this accordingly)
import java.util.HashMap;
import java.util.Map;
import com.sun.codemodel.CodeWriter;
import com.sun.codemodel.JClass;
import com.sun.codemodel.JCodeModel;
import com.sun.codemodel.JDefinedClass;
import com.sun.codemodel.JExpr;
import com.sun.codemodel.JMod;
import com.sun.codemodel.writer.SingleStreamCodeWriter;
public class InitializeFieldInCodeModel
{
    public static void main(String[] args) throws Exception
    {
        JCodeModel codeModel = new JCodeModel();
        JDefinedClass definedClass = codeModel._class("com.example.Example");

        JClass keyType = codeModel.ref(String.class);
        JClass valueType = codeModel.ref(Integer.class);

        JClass mapClass =
            codeModel.ref(Map.class).narrow(keyType, valueType);
        JClass hashMapClass =
            codeModel.ref(HashMap.class).narrow(keyType, valueType);

        definedClass.field(JMod.PRIVATE, mapClass, "initAttributes")
            .init(JExpr._new(hashMapClass));

        CodeWriter codeWriter = new SingleStreamCodeWriter(System.out);
        codeModel.build(codeWriter);
    }
}
The generated class looks as follows:
package com.example;
import java.util.HashMap;
import java.util.Map;
public class Example {
    private Map<String, Integer> initAttributes = new HashMap<String, Integer>();
}

Error in importing hive packages

I am new to Hive UDFs. I have downloaded "apache-hive-2.1.0-bin" and configured the build path of my project to apache-hive-2.1.0-bin\lib (all jars).
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text; // <= I am not able to import this package.
public class replace extends UDF {
    private Text result = new Text();

    public Text evaluate(String str, String str1, String str2) {
        String rep = str.replace(str1, str2);
        result.set(rep);
        return result;
    }
}
Add hadoop-common-2.2.0.jar to your lib path. It is available at:
http://central.maven.org/maven2/org/apache/hadoop/hadoop-common/2.2.0/hadoop-common-2.2.0.jar

Explanation of mapper program for search in hadoop

I am new to Hadoop, so I am having a little difficulty understanding the programs. Could someone help me understand this mapper program?
package SearchTxn;
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
public class MyMap extends Mapper<LongWritable, Text, NullWritable, Text> {
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String Txn = value.toString();
        String TxnParts[] = Txn.split(",");
        Double Amt = Double.parseDouble(TxnParts[3]);
        String Uid = TxnParts[2];
        if (Uid.equals("4000010") && Amt > 100) {
            context.write(null, value);
        }
    }
}
The code basically filters lines in which Uid (TxnParts[2], the third column in your CSV) is "4000010" and Amt (presumably the amount; TxnParts[3], the fourth column) is greater than 100.
Along with the answer from @Thomas Jungblut, the line of your program below declares the Mapper class's overall input and output types. Here nothing is returned as a key, but Text is returned as a value.
public class MyMap extends Mapper<LongWritable, Text, NullWritable, Text>{
The same applies to the parameters of the write method.
context.write(null, value);
It's not always necessary to write a key from the Mapper class. Depending on your use case, either the key, the value, or both can be passed to context.write, as sketched below.
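For completeness, here is a sketch of the same map method that emits NullWritable.get() instead of a raw null (variable names lower-cased; otherwise the logic is the same as the question's code):
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
    String[] txnParts = value.toString().split(",");
    double amt = Double.parseDouble(txnParts[3]);
    String uid = txnParts[2];
    if (uid.equals("4000010") && amt > 100) {
        // NullWritable.get() is the conventional "no key" placeholder.
        context.write(NullWritable.get(), value);
    }
}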

Run and reduce methods in the Reducer class

Can anybody help me explain the execution flow of the run() and reduce() methods in a Reducer class? I am trying to calculate the average of word counts in my MapReduce job. My Reducer class receives "word" and "iterable of occurrences" as key-value pairs.
My objective is to calculate the average number of occurrences per word across all the words in the document. Can the run() method in the reducer iterate through all the keys and count the total number of words? I could then use this sum to find the average by looping through each iterable of values provided with the keys.
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
public class AverageReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable average = new IntWritable();
    private static int count = 0;

    protected void run()
    {
        //loop through all the keys and increment count
    }

    public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException
    {
        int sum = 0;
        for (IntWritable val : values)
        {
            sum = sum + val.get();
        }
        average.set(sum / count);
        context.write(key, average);
    }
}
As described here, you can't iterate over the values twice. And I think it is a bad idea to override the run method; by default it just iterates through the keys and calls the reduce method for every pair (source; a simplified sketch is shown below). So you can't calculate the average of word occurrences with only one map-reduce job.
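For reference, the default run() in org.apache.hadoop.mapreduce.Reducer looks roughly like this (a simplified sketch, not the verbatim source):
public void run(Context context) throws IOException, InterruptedException {
    setup(context);
    // Walks the grouped keys once and delegates each (key, values) pair to reduce();
    // there is no second pass over the data that could be used to pre-count the keys.
    while (context.nextKey()) {
        reduce(context.getCurrentKey(), context.getValues(), context);
    }
    cleanup(context);
}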

Resources