I'm trying to test a Hadoop .mapreduce (new API) Avro job using MRUnit, but I am getting a NullPointerException, shown below. I've attached a portion of the POM and the source code. Any assistance would be appreciated.
Thanks
The error I'm getting is:
java.lang.NullPointerException
at org.apache.hadoop.mrunit.internal.io.Serialization.copy(Serialization.java:73)
at org.apache.hadoop.mrunit.internal.io.Serialization.copy(Serialization.java:91)
at org.apache.hadoop.mrunit.internal.io.Serialization.copyWithConf(Serialization.java:104)
at org.apache.hadoop.mrunit.TestDriver.copy(TestDriver.java:608)
at org.apache.hadoop.mrunit.MapDriverBase.setInputKey(MapDriverBase.java:64)
at org.apache.hadoop.mrunit.MapDriverBase.setInput(MapDriverBase.java:104)
at org.apache.hadoop.mrunit.MapDriverBase.withInput(MapDriverBase.java:218)
at org.lab41.project.mapreduce.ParseMetadataAsTextIntoAvroTest.testMap(ParseMetadataAsTextIntoAvroTest.java:115)
.....
pom snippet:
<dependency>
<groupId>org.apache.mrunit</groupId>
<artifactId>mrunit</artifactId>
<version>0.9.0-incubating</version>
<classifier>hadoop2</classifier>
<scope>test</scope>
</dependency>
<avro.version>1.7.4</avro.version>
<hadoop.version>2.0.0-mr1-cdh4.1.3</hadoop.version>
<dependency>
<groupId>org.apache.avro</groupId>
<artifactId>avro</artifactId>
<version>${avro.version}</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<version>${hadoop.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-core</artifactId>
<version>${hadoop.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.avro</groupId>
<artifactId>avro-mapred</artifactId>
<version>${avro.version}</version>
<classifier>hadoop2</classifier>
</dependency>
Here is an excerpt of the test:
import static org.junit.Assert.*;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import org.apache.avro.mapred.AvroKey;
import org.apache.avro.hadoop.io.AvroSerialization;
import org.apache.avro.mapred.AvroValue;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.apache.hadoop.mrunit.types.Pair;
import org.junit.After;
import org.junit.AfterClass;
import org.junit.Before;
import org.junit.BeforeClass;
import org.junit.Test;
import org.lab41.project.domain.DataRecord;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
public class ParseMetadataAsTextIntoAvroTest {
Logger logger = LoggerFactory
.getLogger(ParseMetadataAsTextIntoAvroTest.class);
private MapDriver<LongWritable, Text, AvroKey<Long>, AvroValue<DataRecord>> mapDriver;
@BeforeClass
public static void setUpClass() {
}
@AfterClass
public static void tearDownClass() {
}
@Before
public void setUp() throws IOException {
ParseMetadataAsTextIntoAvroMapper mapper = new ParseMetadataAsTextIntoAvroMapper();
mapDriver = new MapDriver<LongWritable, Text, AvroKey<Long>, AvroValue<DataRecord>>();
mapDriver.setMapper(mapper);
mapDriver.getConfiguration().setStrings("io.serializations", new String[]{
AvroSerialization.class.getName()
});
}
@Test
public void testMap() throws ParseException, IOException {
Text testInputText = new Text(test0);
DataRecord record = new DataRecord();
….
AvroKey<Long> expectedPivot = new AvroKey<Long>(1L);
AvroValue<DataRecord> expectedRecord = new AvroValue<DataRecord>(record);
mapDriver.withInput(new Pair<LongWritable, Text>(new LongWritable(1), testInputText));
mapDriver.withOutput(new Pair<AvroKey<Long>, AvroValue<DataRecord>>(expectedPivot, expectedRecord));
mapDriver.runTest();
}
}
In order to get this to work, you have to add AvroSerialization to the default serializations. You also have to configure AvroSerialization itself.
@Before
public void setUp() throws IOException {
ParseMetadataAsTextIntoAvroMapper mapper = new ParseMetadataAsTextIntoAvroMapper();
mapDriver = new MapDriver<LongWritable, Text, AvroKey<Long>, AvroValue<NetworkRecord>>();
mapDriver.setMapper(mapper);
//Copy over the default io.serializations. If you don't do this then you will
//not be able to deserialize the inputs to the mapper
String[] strings = mapDriver.getConfiguration().getStrings("io.serializations");
String[] newStrings = new String[strings.length +1];
System.arraycopy( strings, 0, newStrings, 0, strings.length );
newStrings[newStrings.length-1] = AvroSerialization.class.getName();
//Now you have to configure AvroSerialization by specifying the key
//writer Schema and the value writer schema.
mapDriver.getConfiguration().setStrings("io.serializations", newStrings);
mapDriver.getConfiguration().setStrings("avro.serialization.key.writer.schema", Schema.create(Schema.Type.LONG).toString(true));
mapDriver.getConfiguration().setStrings("avro.serialization.value.writer.schema", NetworkRecord.SCHEMA$.toString(true));
}
This also solves the problem, with the merit of shorter and clearer code.
MapDriver driver = MapDriver.newMapDriver(your mapper);
Configuration conf = driver.getConfiguration();
AvroSerialization.addToConfiguration(conf);
AvroSerialization.setKeyWriterSchema(conf, your schema);
AvroSerialization.setKeyReaderSchema(conf, your schema);
Job job = new Job(conf);
job.set... your job settings;
AvroJob.set... your avro job settings;
It may be a bug in MRUnit that it doesn't set io.serializations correctly.
Instead, it should have been set by job.setInputFormatClass(AvroKeyInputFormat.class), I think.
You have to add AvroSerialization to the default serializations and configure AvroSerialization.
@Before
public void setUp() throws IOException {
ParseMetadataAsTextIntoAvroMapper mapper = new ParseMetadataAsTextIntoAvroMapper();
mapDriver = new MapDriver<LongWritable, Text, AvroKey<Long>, AvroValue<NetworkRecord>>();
mapDriver.setMapper(mapper);
Configuration configuration = mapDriver.getConfiguration();
// Add AvroSerialization to the configuration
// (copy over the default serializations for deserializing the mapper inputs)
String[] serializations = configuration.getStrings(CommonConfigurationKeysPublic.IO_SERIALIZATIONS_KEY);
String[] newSerializations = Arrays.copyOf(serializations, serializations.length + 1);
newSerializations[serializations.length] = AvroSerialization.class.getName();
configuration.setStrings(CommonConfigurationKeysPublic.IO_SERIALIZATIONS_KEY, newSerializations);
//Configure AvroSerialization by specifying the key writer and value writer schemas
AvroSerialization.setKeyWriterSchema(configuration, Schema.create(Schema.Type.LONG));
AvroSerialization.setValueWriterSchema(configuration, NetworkRecord.SCHEMA$);
}
Answered here: https://issues.apache.org/jira/browse/MRUNIT-181 specifically: https://cwiki.apache.org/confluence/display/MRUNIT/MRUnit+with+Avro
I am new to Kafka and am currently looking at Kafka Streams, especially joining two streams.
The samples I browsed worked with rather simple messages / text messages.
So I constructed another simple sample that applies more to traditional ETL.
Let's say we have two "datasets": Contract (= Vertrag) and Cashflow, with a cardinality of 1 to n.
In my sample I created a topic for each and sent objects (Vertrag, Cashflow) to each.
And I managed a first join of them.
KStream<String, String> joined = srcVertrag.leftJoin(srcCashflow,
(leftValue, rightValue) -> "left=" + leftValue + ", right=" + rightValue, /* ValueJoiner */
JoinWindows.of(5000),
Joined.with(
Serdes.String(), /* key */
Serdes.String(), /* left value */
Serdes.String()) /* right value */
);
The result looks like this:
left={"name":"Vertrag123","vertragId":"123"}, right={"buchungstag":1560715764709,"betrag":12.0,"vertragId":"123"}
Now my questions:
is this the right way to do this?
should I create Objects at all or rather process just Strings?
After your hints and further research, I came up with the following test.
- I created POJOs for "Vertrag" and "Cashflow"
- I created Serdes for each
- I stream them as objects
- Finally I try to join them into a wrapper class (and this is where I get stuck)
I can't find samples that do something like this. Is this so exotic?
package tki.bigdata.kafkaetl;
import java.time.Duration;
import java.util.Date;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.CountDownLatch;
import org.apache.kafka.common.serialization.Deserializer;
import org.apache.kafka.common.serialization.Serde;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.serialization.Serializer;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.JoinWindows;
import org.apache.kafka.streams.kstream.Joined;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Printed;
import org.apache.kafka.streams.kstream.ValueJoiner;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.CommandLineRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.ComponentScan;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.scheduling.annotation.EnableScheduling;
import tki.bigdata.domain.Cashflow;
import tki.bigdata.domain.Vertrag;
import tki.bigdata.serde.JsonPOJODeserializer;
import tki.bigdata.serde.JsonPOJOSerializer;
@ComponentScan(basePackages = { "tki.bigdata.domain", "tki.bigdata.config", "tki.bigdata.app" }, basePackageClasses = App.class)
@SpringBootApplication
@EnableScheduling
public class App implements CommandLineRunner {
private static String bootstrapServers = "tobi0179.westeurope.cloudapp.azure.com:9092";
@Autowired
private KafkaTemplate<String, Object> template;
// @Autowired
// ExcelReader excelReader;
public static void main(String[] args) {
SpringApplication.run(App.class, args).close();
}
private void populateSampleData() {
Vertrag v = new Vertrag();
v.setVertragId("123");
v.setName("Vertrag123");
template.send("Vertrag", "123", v);
//template.send("Vertrag", "124", "124;Vertrag12");
Cashflow c = new Cashflow();
c.setVertragId("123");
c.setBetrag(12);
c.setBuchungstag(new Date());
template.send("Cashflow", "123", c);
}
// @Override
public void run(String... args) throws Exception {
// Populate the topics with demo data
populateSampleData();
Properties streamsConfiguration = new Properties();
streamsConfiguration.put(StreamsConfig.APPLICATION_ID_CONFIG, "streams-pipe");
streamsConfiguration.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
streamsConfiguration.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
streamsConfiguration.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
// TODO: the following can be removed with a serialization factory
Map<String, Object> serdeProps = new HashMap<>();
// prepare Serde for Vertrag
final Serializer<Vertrag> vertragSerializer = new JsonPOJOSerializer<Vertrag>();
serdeProps.put("JsonPOJOClass", Vertrag.class);
vertragSerializer.configure(serdeProps, false);
final Deserializer<Vertrag> vertragDeserializer = new JsonPOJODeserializer<Vertrag>();
serdeProps.put("JsonPOJOClass", Vertrag.class);
vertragDeserializer.configure(serdeProps, false);
final Serde<Vertrag> vertragSerde = Serdes.serdeFrom(vertragSerializer, vertragDeserializer);
// prepare Serde for Cashflow
final Serializer<Cashflow> cashflowSerializer = new JsonPOJOSerializer<Cashflow>();
serdeProps.put("JsonPOJOClass", Vertrag.class);
cashflowSerializer.configure(serdeProps, false);
final Deserializer<Cashflow> cashflowDeserializer = new JsonPOJODeserializer<Cashflow>();
serdeProps.put("JsonPOJOClass", Vertrag.class);
cashflowDeserializer.configure(serdeProps, false);
final Serde<Cashflow> cashflowSerde = Serdes.serdeFrom(cashflowSerializer, cashflowDeserializer);
// streamsConfiguration.put(StreamsConfig.STATE_DIR_CONFIG,
// TestUtils.tempDir().getAbsolutePath());
StreamsBuilder builder = new StreamsBuilder();
KStream<String, Vertrag> srcVertrag = builder.stream("Vertrag");
KStream<String, Cashflow> srcCashflow = builder.stream("Cashflow");
// print to sysout
//srcVertrag.print(Printed.toSysOut());
KStream<String, MyValueContainer> joined = srcVertrag.leftJoin(srcCashflow,
(leftValue, rightValue) -> new MyValueContainer(leftValue , rightValue), /* ValueJoiner */
JoinWindows.of(600),
Joined.with(
Serdes.String(), /* key */
vertragSerde, /* left value */
cashflowSerde) /* right value */
);
joined.to("Output");
final Topology topology = builder.build();
System.out.println(topology.describe());
final KafkaStreams streams = new KafkaStreams(topology, streamsConfiguration);
final CountDownLatch latch = new CountDownLatch(1);
// attach shutdown handler to catch control-c
Runtime.getRuntime().addShutdownHook(new Thread("streams-shutdown-hook") {
@Override
public void run() {
streams.close();
latch.countDown();
}
});
try {
streams.start();
latch.await();
} catch (Throwable e) {
System.exit(1);
}
System.exit(0);
}
}
When executed, it produces the error:
2019-06-17 22:18:31.892 ERROR 1599 --- [-StreamThread-1] o.a.k.s.p.i.AssignedStreamsTasks : stream-thread [streams-pipe-0638d359-94df-43bd-9ef7-eb6769ed8a1c-StreamThread-1] Failed to process stream task 0_0 due to the following error:
java.lang.ClassCastException: java.lang.String cannot be cast to tki.bigdata.domain.Vertrag
at org.apache.kafka.streams.kstream.internals.KStreamKStreamJoin$KStreamKStreamJoinProcessor.process(KStreamKStreamJoin.java:98) ~[kafka-streams-2.0.1.jar:na]
at org.apache.kafka.streams.processor.internals.ProcessorNode$1.run(ProcessorNode.java:50) ~[kafka-streams-2.0.1.jar:na]
at org.apache.kafka.streams.processor.internals.ProcessorNode.runAndMeasureLatency(ProcessorNode.java:244) ~[kafka-streams-2.0.1.jar:na]
at org.apache.kafka.streams.processor.internals.ProcessorNode.process(ProcessorNode.java:133) ~[kafka-streams-2.0.1.jar:na]
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:143) ~[kafka-streams-2.0.1.jar:na]
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:126) ~[kafka-streams-2.0.1.jar:na]
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:90) ~[kafka-streams-2.0.1.jar:na]
at org.apache.kafka.streams.kstream.internals.KStreamJoinWindow$KStreamJoinWindowProcessor.process(KStreamJoinWindow.java:63) ~[kafka-streams-2.0.1.jar:na]
at org.apache.kafka.streams.processor.internals.ProcessorNode$1.run(ProcessorNode.java:50) ~[kafka-streams-2.0.1.jar:na]
at org.apache.kafka.streams.processor.internals.ProcessorNode.runAndMeasureLatency(ProcessorNode.java:244) ~[kafka-streams-2.0.1.jar:na]
at org.apache.kafka.streams.processor.internals.ProcessorNode.process(ProcessorNode.java:133) ~[kafka-streams-2.0.1.jar:na]
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:143) ~[kafka-streams-2.0.1.jar:na]
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:129) ~[kafka-streams-2.0.1.jar:na]
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:90) ~[kafka-streams-2.0.1.jar:na]
at org.apache.kafka.streams.processor.internals.SourceNode.process(SourceNode.java:87) ~[kafka-streams-2.0.1.jar:na]
at org.apache.kafka.streams.processor.internals.StreamTask.process(StreamTask.java:302) ~[kafka-streams-2.0.1.jar:na]
at org.apache.kafka.streams.processor.internals.AssignedStreamsTasks.process(AssignedStreamsTasks.java:94) ~[kafka-streams-2.0.1.jar:na]
at org.apache.kafka.streams.processor.internals.TaskManager.process(TaskManager.java:409) [kafka-streams-2.0.1.jar:na]
at org.apache.kafka.streams.processor.internals.StreamThread.processAndMaybeCommit(StreamThread.java:964) [kafka-streams-2.0.1.jar:na]
at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:832) [kafka-streams-2.0.1.jar:na]
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:767) [kafka-streams-2.0.1.jar:na]
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:736) [kafka-streams-2.0.1.jar:na]
is this the right way to do this?
Yes.
should I create Objects at all or rather process just Strings?
Yes. Look at Avro as a good example of a data format for serializing/deserializing your POJOs. Here, you are looking for an Avro "serde" (serializer/deserializer). Confluent provides such an Avro serde for Kafka Streams, for instance (this serde requires the use of Confluent Schema Registry). See the sketch below for how serdes get wired into the source streams.
what should I do with the above result?
It's unclear to me what your question is.
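As an illustration only (not part of the original answer), here is a minimal sketch of wiring the POJO serdes from the question into the source streams via Consumed.with, so the join receives typed Vertrag/Cashflow values instead of values read with the default String serde, which is the likely cause of the ClassCastException in the question. The topic names, serdes, and MyValueContainer are taken from the question's code; org.apache.kafka.streams.kstream.Consumed and Produced would need to be imported.
// Sketch: consume the topics with explicit POJO serdes instead of the default String serde.
// vertragSerde and cashflowSerde are the Serdes built earlier in the question's run() method.
KStream<String, Vertrag> srcVertrag = builder.stream(
        "Vertrag", Consumed.with(Serdes.String(), vertragSerde));
KStream<String, Cashflow> srcCashflow = builder.stream(
        "Cashflow", Consumed.with(Serdes.String(), cashflowSerde));
KStream<String, MyValueContainer> joined = srcVertrag.leftJoin(srcCashflow,
        (vertrag, cashflow) -> new MyValueContainer(vertrag, cashflow), /* ValueJoiner */
        JoinWindows.of(600),
        Joined.with(Serdes.String(), vertragSerde, cashflowSerde));
// Writing the joined stream out would likewise need a serde for MyValueContainer, e.g.:
// joined.to("Output", Produced.with(Serdes.String(), myValueContainerSerde));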
How can I send a file in Feign as a byte array?
#RequestLine("POST /api/files/{num}/push")
#Headers({"Content-Type: application/zip"})
void pushFile(#Param("num") String num, #Param("file") byte[] file);
This is not working; it passes the data as JSON with a top-level element named file.
What can I do to properly receive the array of bytes on the other side, using this controller method parameter annotation?
@RequestBody byte[] file
You can try OpenFeign/feign-form; here is a simple example:
pom.xml dependencies
<dependencies>
<!--feign dependencies-->
<dependency>
<groupId>io.github.openfeign.form</groupId>
<artifactId>feign-form</artifactId>
<version>3.8.0</version>
</dependency>
<dependency>
<groupId>io.github.openfeign</groupId>
<artifactId>feign-core</artifactId>
<version>10.1.0</version>
</dependency>
<!-- jetty dependencies to check the feign request -->
<dependency>
<groupId>org.eclipse.jetty</groupId>
<artifactId>jetty-server</artifactId>
<version>9.4.3.v20170317</version>
</dependency>
<dependency>
<groupId>org.eclipse.jetty</groupId>
<artifactId>jetty-servlet</artifactId>
<version>9.4.3.v20170317</version>
</dependency>
</dependencies>
FeignUploadFileExample.java:
import feign.*;
import feign.codec.EncodeException;
import feign.codec.Encoder;
import org.eclipse.jetty.http.HttpStatus;
import org.eclipse.jetty.server.Server;
import org.eclipse.jetty.servlet.ServletContextHandler;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import java.io.*;
import java.lang.reflect.Type;
import java.util.Map;
import static java.nio.charset.StandardCharsets.UTF_8;
public class FeignUploadFileExample {
public static void main(String[] args) throws Exception {
// start jetty server to check feign request
startSimpleJettyServer();
SimpleUploadFileApi simpleUploadFileApi = Feign.builder()
.encoder(new SimpleFileEncoder())
.target(SimpleUploadFileApi.class, "http://localhost:8080/upload");
// upload file bytes (simple string bytes)
byte[] fileBytes = "Hello World".getBytes();
String response = simpleUploadFileApi.uploadFile(fileBytes);
System.out.println(response);
}
public static final String FILE_PARAM = "file";
// encode @Param("file") to request body bytes
public static class SimpleFileEncoder implements Encoder {
public void encode(Object object, Type type, RequestTemplate template)
throws EncodeException {
template.body(Request.Body.encoded(
(byte[]) ((Map) object).get(FILE_PARAM), UTF_8));
}
}
// feign interface to upload file
public interface SimpleUploadFileApi {
#RequestLine("POST /upload")
#Headers("Content-Type: application/zip")
String uploadFile(#Param(FILE_PARAM) byte[] file);
}
// embedded jetty server
public static void startSimpleJettyServer() throws Exception {
Server server = new Server(8080);
ServletContextHandler handler = new ServletContextHandler(server, "/upload");
handler.addServlet(SimpleBlockingServlet.class, "/");
server.start();
}
// simple servlet get request and return info of received data
public static class SimpleBlockingServlet extends HttpServlet {
protected void doPost(
HttpServletRequest request,
HttpServletResponse response) throws IOException {
String data = new BufferedReader(
new InputStreamReader(request.getInputStream())).readLine();
response.setStatus(HttpStatus.OK_200);
response.getWriter().println("Request header 'Content-Type': " +
request.getHeaders("Content-Type").nextElement());
response.getWriter().println("Request downloaded file data: " + data);
}
}
}
response output:
Request header 'Content-Type': application/zip
Request downloaded file data: Hello World
Also, @RequestBody is the annotation for a REST JSON body; for files use:
@RequestParam("file") MultipartFile file
Take a look at Spring Boot Uploading Files.
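For illustration only, a minimal sketch (assuming a Spring MVC controller and a multipart upload, as suggested above) of a handler that receives the file as a MultipartFile; the path is taken from the question:
// Hypothetical controller sketch (imports assumed: org.springframework.web.bind.annotation.*,
// org.springframework.web.multipart.MultipartFile, org.springframework.http.ResponseEntity).
@PostMapping("/api/files/{num}/push")
public ResponseEntity<Void> pushFile(@PathVariable String num,
                                     @RequestParam("file") MultipartFile file) throws IOException {
    byte[] bytes = file.getBytes(); // raw content of the uploaded file
    // ... store or process the bytes ...
    return ResponseEntity.ok().build();
}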
You can use FormData from feign-form to upload the file and specify the Content-Type and file name:
<dependency>
<groupId>io.github.openfeign.form</groupId>
<artifactId>feign-form</artifactId>
<version>3.8.0</version>
</dependency>
#RequestLine("POST /api/files/{num}/push")
void pushFile(#Param("num") String num, #Param("file") FormData file);
Usage example:
byte[] bytes = getFileContent();
FormData file = new FormData("application/zip", "example.zip", bytes);
client.pushFile(num, file);
Or in case you are using spring-cloud-starter-openfeign:
@PostMapping("/api/files/{num}/push")
void pushFile(@PathVariable String num, @RequestPart FormData file);
I tested this with spring-cloud-starter-openfeign, but I assume it works without Spring as well, considering the FormData class lives in the feign-form dependency.
Hi, I just wanted to ask for advice; I am really thankful to anyone who can suggest something. Thanks in advance. I am trying to send an email with an attachment. It works if I run it as a desktop application from NetBeans (Maven); however, if I run it in Tomcat it sends without any error, but when I check the email there is no attachment, or the attachment is not sent correctly. If I run the main class in NetBeans, it sends with the attachment.
I didn't get any error in Tomcat or NetBeans.
I don't receive the attachment in the email when running in Tomcat; I only receive the message below:
------=_Part_0_1437359590.1537185365793--
but if I run the main class in NetBeans I receive the correct attachment.
I don't know why.
<dependencies>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.4</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.springframework</groupId>
<artifactId>spring</artifactId>
<version>2.5.6.SEC03</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.springframework</groupId>
<artifactId>spring-test</artifactId>
<version>2.5.6.SEC03</version>
<scope>test</scope>
</dependency>
<!-- Change plugin specific dependencies here -->
<dependency>
<groupId>javax.servlet</groupId>
<artifactId>jsp-api</artifactId>
<version>2.0</version>
</dependency>
<dependency>
<groupId>mysql</groupId>
<artifactId>mysql-connector-java</artifactId>
<version>5.1.3</version>
</dependency>
<dependency>
<groupId>javax.servlet</groupId>
<artifactId>servlet-api</artifactId>
<version>2.3</version>
</dependency>
<!-- End change plugin specific dependencies here -->
<dependency>
<groupId>net.sourceforge.jexcelapi</groupId>
<artifactId>jxl</artifactId>
<version>2.6.12</version>
<type>jar</type>
</dependency>
<dependency>
<groupId>com.sun.mail</groupId>
<artifactId>javax.mail</artifactId>
<version>1.5.5</version>
<type>jar</type>
</dependency>
</dependencies>
code
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Properties;
import javax.activation.DataHandler;
import javax.activation.DataSource;
import javax.mail.Message;
import javax.mail.MessagingException;
import javax.mail.Multipart;
import javax.mail.PasswordAuthentication;
import javax.mail.Session;
import javax.mail.Transport;
import javax.mail.internet.AddressException;
import javax.mail.internet.InternetAddress;
import javax.mail.internet.MimeBodyPart;
import javax.mail.internet.MimeMessage;
import javax.mail.internet.MimeMultipart;
import javax.mail.util.ByteArrayDataSource;
import jxl.Workbook;
import jxl.format.Alignment;
import jxl.format.Border;
import jxl.format.BorderLineStyle;
import jxl.format.Colour;
import jxl.format.VerticalAlignment;
import jxl.write.Label;
import jxl.write.WritableCellFormat;
import jxl.write.WritableFont;
import jxl.write.WritableSheet;
import jxl.write.WritableWorkbook;
public class Stack {
public static void main(String[] args) throws IOException {
try {
// *** for Database Connected ***//
Connection connect = null;
Statement s = null;
Class.forName("com.mysql.jdbc.Driver");
connect = DriverManager.getConnection("jdbc:mysql://localhost/mydatabase?user=root&password=root");
s = connect.createStatement();
String sql = "SELECT * FROM customer ORDER BY CustomerID ASC";
ResultSet rec = s.executeQuery(sql);
ByteArrayOutputStream baos = new ByteArrayOutputStream();
WritableWorkbook workbook = Workbook.createWorkbook(baos);
// *** Create Font ***//
WritableFont fontBlue = new WritableFont(WritableFont.TIMES, 10);
fontBlue.setColour(Colour.BLUE);
WritableFont fontRed = new WritableFont(WritableFont.TIMES, 10);
fontRed.setColour(Colour.RED);
// *** Sheet 1 ***//
WritableSheet ws1 = workbook.createSheet("mySheet1", 0);
// *** Header ***//
WritableCellFormat cellFormat1 = new WritableCellFormat(fontRed);
// cellFormat2.setBackground(Colour.ORANGE);
cellFormat1.setAlignment(Alignment.CENTRE);
cellFormat1.setVerticalAlignment(VerticalAlignment.CENTRE);
cellFormat1.setBorder(Border.ALL, BorderLineStyle.THIN);
// *** Data ***//
WritableCellFormat cellFormat2 = new WritableCellFormat(fontBlue);
// cellFormat2.setWrap(true);
cellFormat2.setAlignment(jxl.format.Alignment.CENTRE);
cellFormat2.setVerticalAlignment(VerticalAlignment.CENTRE);
cellFormat2.setWrap(true);
cellFormat2.setBorder(jxl.format.Border.ALL, jxl.format.BorderLineStyle.HAIR, jxl.format.Colour.BLACK);
ws1.mergeCells(0, 0, 5, 0);
Label lable = new Label(0, 0, "Customer Report", cellFormat1);
ws1.addCell(lable);
// *** Header ***//
ws1.setColumnView(0, 10); // Column CustomerID
ws1.addCell(new Label(0, 1, "CustomerID", cellFormat1));
ws1.setColumnView(1, 15); // Column Name
ws1.addCell(new Label(1, 1, "Name", cellFormat1));
ws1.setColumnView(2, 25); // Column Email
ws1.addCell(new Label(2, 1, "Email", cellFormat1));
ws1.setColumnView(3, 12); // Column CountryCode
ws1.addCell(new Label(3, 1, "CountryCode", cellFormat1));
ws1.setColumnView(4, 10); // Column Budget
ws1.addCell(new Label(4, 1, "Budget", cellFormat1));
ws1.setColumnView(5, 10); // Column Used
ws1.addCell(new Label(5, 1, "Used", cellFormat1));
int iRows = 2;
while((rec!=null) && (rec.next())) {
ws1.addCell(new Label(0, iRows, rec.getString("CustomerID"), cellFormat2));
ws1.addCell(new Label(1, iRows, rec.getString("Name"), cellFormat2));
ws1.addCell(new Label(2, iRows, rec.getString("Email"), cellFormat2));
ws1.addCell(new Label(3, iRows, rec.getString("CountryCode"), cellFormat2));
ws1.addCell(new Label(4, iRows, rec.getString("Budget"), cellFormat2));
ws1.addCell(new Label(5, iRows, rec.getString("Used"), cellFormat2));
++iRows;
}
workbook.write();
workbook.close();
System.out.println("Excel file created.");
// Close
try {
if (connect != null) {
s.close();
connect.close();
}
} catch (SQLException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
sendMail(baos);
} catch (Exception e) {
e.printStackTrace();
}
}
private static void sendMail(ByteArrayOutputStream baos) throws AddressException, MessagingException {
final String username = "your.mail.id#gmail.com";
final String password = "your.password";
Properties props = new Properties();
props.put("mail.smtp.auth", true);
props.put("mail.smtp.starttls.enable", true);
props.put("mail.smtp.host", "smtp.gmail.com");
props.put("mail.smtp.port", "587");
props.put("protocol", "smtp");
Session session = Session.getInstance(props, new javax.mail.Authenticator() {
protected PasswordAuthentication getPasswordAuthentication() {
return new PasswordAuthentication(username, password);
}
});
Message message = new MimeMessage(session);
message.setFrom(new InternetAddress("from.mail.id@g_mail.com"));
message.setRecipients(Message.RecipientType.TO, InternetAddress.parse("to.your.mail@g_mail.com"));
message.setSubject("Testing Subject");
message.setText("PFA");
MimeBodyPart messageBodyPart = new MimeBodyPart();
Multipart multipart = new MimeMultipart();
messageBodyPart = new MimeBodyPart();
String fileName = "attachmentName.xls";
DataSource aAttachment = new ByteArrayDataSource(baos.toByteArray(), "application/octet-stream");
messageBodyPart.setDataHandler(new DataHandler(aAttachment));
messageBodyPart.setFileName(fileName);
multipart.addBodyPart(messageBodyPart);
message.setContent(multipart);
System.out.println("Sending");
Transport.send(message);
System.out.println("Done");
}
}
At the moment I'm writing a couple of evaluation programs with iText.
I have an issue with AES encryption.
STANDARD_ENCRYPTION_128 works fine, but ENCRYPTION_AES_128 produces a runtime error.
I've tried a lot but nothing worked. Does anybody have a clue what's wrong here?
Thanks, Dirk
import java.awt.Desktop;
import java.io.File;
import java.io.IOException;
import com.itextpdf.kernel.pdf.EncryptionConstants;
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfWriter;
import com.itextpdf.kernel.pdf.WriterProperties;
import com.itextpdf.layout.Document;
import com.itextpdf.layout.element.Paragraph;
public class problem3 {
public static void main(String[] args) throws IOException {
String fnPdf = "results/problem3.pdf";
WriterProperties properties = new WriterProperties();
// ENCRYPTION_AES_128 produces a runtime error; STANDARD_ENCRYPTION_128 is working.
properties.setStandardEncryption("Hello".getBytes(), "World".getBytes(), EncryptionConstants.ALLOW_PRINTING,
EncryptionConstants.ENCRYPTION_AES_128);
PdfWriter writer = new PdfWriter(fnPdf, properties);
PdfDocument pdf = new PdfDocument(writer);
Document document = new Document(pdf);
Paragraph paragraph = new Paragraph("Hello AES-128!");
document.add(paragraph);
document.close();
pdf.close();
Desktop.getDesktop().open(new File(fnPdf));
}
}
Your error at runtime is probably
java.lang.NoClassDefFoundError: org/bouncycastle/crypto/BlockCipher
This is because iText uses the BouncyCastle library to provide some of its encryption capabilities. The dependency is optional, which means you have to add it manually if you need it.
If you use Maven for building, then make sure that you have the following dependencies:
<dependency>
<groupId>org.bouncycastle</groupId>
<artifactId>bcpkix-jdk15on</artifactId>
<version>1.49</version>
</dependency>
<dependency>
<groupId>org.bouncycastle</groupId>
<artifactId>bcprov-jdk15on</artifactId>
<version>1.49</version>
</dependency>
If you are adding jars to the classpath manually, which is not recommended, then you can go to Maven Central and download the necessary artifact jars manually.
The mapper reads files from two places:
1) Articles visited by users (sorted by country)
2) Statistics of countries (country-wise)
The output of both mappers is Text, Text.
I am running the program on an Amazon cluster.
My aim is to read data from the two different sets, combine the results, and store them in HBase.
HDFS to HDFS is working.
The job gets stuck at reduce 67% and gives the following error:
17/02/24 10:45:31 INFO mapreduce.Job: map 0% reduce 0%
17/02/24 10:45:37 INFO mapreduce.Job: map 100% reduce 0%
17/02/24 10:45:49 INFO mapreduce.Job: map 100% reduce 67%
17/02/24 10:46:00 INFO mapreduce.Job: Task Id : attempt_1487926412544_0016_r_000000_0, Status : FAILED
Error: java.lang.IllegalArgumentException: Row length is 0
at org.apache.hadoop.hbase.client.Mutation.checkRow(Mutation.java:565)
at org.apache.hadoop.hbase.client.Put.<init>(Put.java:110)
at org.apache.hadoop.hbase.client.Put.<init>(Put.java:68)
at org.apache.hadoop.hbase.client.Put.<init>(Put.java:58)
at com.happiestminds.hadoop.CounterReducer.reduce(CounterReducer.java:45)
at com.happiestminds.hadoop.CounterReducer.reduce(CounterReducer.java:1)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:635)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:390)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
The driver class is:
package com.happiestminds.hadoop;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.MasterNotRunningException;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
public class Main extends Configured implements Tool {
/**
* @param args
* @throws Exception
*/
public static String outputTable = "mapreduceoutput";
public static void main(String[] args) throws Exception {
int exitCode = ToolRunner.run(new Main(), args);
System.exit(exitCode);
}
@Override
public int run(String[] args) throws Exception {
Configuration config = HBaseConfiguration.create();
try{
HBaseAdmin.checkHBaseAvailable(config);
}
catch(MasterNotRunningException e){
System.out.println("Master not running");
System.exit(1);
}
Job job = Job.getInstance(config, "Hbase Test");
job.setJarByClass(Main.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Text.class);
MultipleInputs.addInputPath(job, new Path(args[0]), TextInputFormat.class, ArticleMapper.class);
MultipleInputs.addInputPath(job, new Path(args[1]), TextInputFormat.class, StatisticsMapper.class);
TableMapReduceUtil.addDependencyJars(job);
TableMapReduceUtil.initTableReducerJob(outputTable, CounterReducer.class, job);
//job.setReducerClass(CounterReducer.class);
job.setNumReduceTasks(1);
return job.waitForCompletion(true) ? 0 : 1;
}
}
The reducer class is:
package com.happiestminds.hadoop;
import java.io.IOException;
import org.apache.hadoop.hbase.client.Mutation;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
public class CounterReducer extends TableReducer<Text, Text, ImmutableBytesWritable> {
public static final byte[] CF = "counter".getBytes();
public static final byte[] COUNT = "combined".getBytes();
@Override
protected void reduce(Text key, Iterable<Text> values,
Reducer<Text, Text, ImmutableBytesWritable, Mutation>.Context context)
throws IOException, InterruptedException {
String vals = values.toString();
int counter = 0;
StringBuilder sbr = new StringBuilder();
System.out.println(key.toString());
for (Text val : values) {
String stat = val.toString();
if (stat.equals("***")) {
counter++;
} else {
sbr.append(stat + ",");
}
}
sbr.append("Article count : " + counter);
Put put = new Put(Bytes.toBytes(key.toString()));
put.addColumn(CF, COUNT, Bytes.toBytes(sbr.toString()));
if (counter != 0) {
context.write(null, put);
}
}
}
Dependencies
<dependencies>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<version>2.7.3</version>
</dependency>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-client</artifactId>
<version>1.2.2</version>
</dependency>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-common</artifactId>
<version>1.2.2</version>
</dependency>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-server</artifactId>
<version>1.2.2</version>
</dependency>
</dependencies>
A good practice is to validate your values before submitting them anywhere. In your particular case, you can validate your key and sbr, or wrap them in a try-catch block with a proper notification policy. You should write them to a log if they are not correct, and update your unit tests with new test cases:
try
{
Put put = new Put(Bytes.toBytes(key.toString()));
put.addColumn(CF, COUNT, Bytes.toBytes(sbr.toString()));
if (counter != 0) {
context.write(null, put);
}
}
catch (IllegalArgumentException ex)
{
System.err.println("Error processing record - Key: "+ key.toString() +", values: " +sbr.ToString());
}
According to the exception thrown by the program, it is clear that the key length is 0, so before putting into HBase you should check whether the key length is 0; only then can you put the record into HBase.
More clarity on why a key of length 0 is not supported by HBase:
The HBase data model does not allow a 0-length row key; it must be at least 1 byte. A 0-byte row key is reserved for internal usage (to designate the empty start and end keys).
Can you check whether you are inserting any null values?
The HBase data model does not allow a zero-length row key; it must be at least 1 byte.
Please check in your reducer code, before executing the put command, whether some of the values are null.
The error you get is quite self-explanatory. Row keys in HBase can't be empty (though values can be).
@Override
protected void reduce(Text key, Iterable<Text> values,
Reducer<Text, Text, ImmutableBytesWritable, Mutation>.Context context)
throws IOException, InterruptedException {
if (key == null || key.getLength() == 0) {
// Log a warning about the empty key.
return;
}
// Rest of your reducer follows.
}