Not able to convert the byte[] to string in scala - spark-streaming

**I'm trying to stream data from Kafka and convert it into a data frame. I followed this link. But when I run both the producer and consumer applications, this is the output on my console:**
(0,[B@370ed56a) (1,[B@2edd3e63) (2,[B@3ba2944d) (3,[B@2eb669d1)
(4,[B@49dd304c) (5,[B@4f6af565) (6,[B@7714e29e)
This is literally the same output the Kafka producer prints; the topic was empty before the messages were pushed.
Here is the producer code snippet:
Properties props = new Properties();
props.put("bootstrap.servers", "##########:9092");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
props.put("producer.type", "async");

Schema.Parser parser = new Schema.Parser();
Schema schema = parser.parse(EVENT_SCHEMA);
Injection<GenericRecord, byte[]> records = GenericAvroCodecs.toBinary(schema);
KafkaProducer<String, byte[]> producer = new KafkaProducer<String, byte[]>(props);

for (int i = 0; i < 100; i++) {
    GenericData.Record avroRecord = new GenericData.Record(schema);
    setEventValues(i, avroRecord);
    byte[] messages = records.apply(avroRecord);
    ProducerRecord<String, byte[]> producerRecord = new ProducerRecord<String, byte[]>(
            "topic", String.valueOf(i), messages);
    System.out.println(producerRecord);
    producer.send(producerRecord);
}
And its output is:
key=0, value=[B@680387a key=1, value=[B@32bfb588 key=2,
value=[B@2ac2e1b1 key=3, value=[B@606f4165 key=4, value=[B@282e7f59
Here is my consumer code snippet, written in Scala:
val kafkaConf = Map(
  // remaining Kafka configuration entries are omitted in the question
  "group.id" -> "KafkaConsumer",
  "zookeeper.connection.timeout.ms" -> "1000000")
val topicMaps = Map("topic" -> 1)
val messages = KafkaUtils.createStream[String, Array[Byte], StringDecoder, DefaultDecoder](ssc, kafkaConf, topicMaps, StorageLevel.MEMORY_ONLY_SER)
messages.print()
I've tried both StringDecoder and DefaultDecoder in createStream(). I'm sure that the producer and consumer are in compliance with each other.
Any help from anybody?
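The (0,[B@370ed56a) entries are just the default toString of Array[Byte]: the values arrive intact but are never decoded, and because they are Avro binary, new String(bytes) would not give readable output either. Below is a minimal Scala sketch of decoding them with the same Bijection Injection the producer uses; EVENT_SCHEMA and the surrounding setup are assumed to be available on the consumer side and are not part of the question's code.
import com.twitter.bijection.Injection
import com.twitter.bijection.avro.GenericAvroCodecs
import org.apache.avro.Schema
import org.apache.avro.generic.GenericRecord

// Build the same Injection the producer used (assumes EVENT_SCHEMA is the identical schema string).
val schema = new Schema.Parser().parse(EVENT_SCHEMA)
val recordInjection: Injection[GenericRecord, Array[Byte]] = GenericAvroCodecs.toBinary(schema)

// messages is the DStream[(String, Array[Byte])] returned by createStream above.
val decoded = messages.map { case (key, bytes) =>
  // invert returns a Try[GenericRecord]; .get is acceptable for a sketch, handle failures properly in real code
  val record: GenericRecord = recordInjection.invert(bytes).get
  (key, record.toString) // renders the GenericRecord as JSON-like text
}
decoded.print()
If the Injection or schema turns out not to be serializable in your setup, build it inside a mapPartitions closure (or a lazy singleton object) so it is created on the executors instead of being shipped from the driver.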

Related

QueueBrowser vs MessageConsumer

When we compare QueueBrowser with MessageListener, QueueBrowser is very slow.
QueueBrowser takes approximately 1 minute to process 100 messages, whereas the consumer processes ~840 messages.
Is this much difference expected? Can you please suggest if anything needs to be changed in the code below:
queueEnum = queueBrowserIn.GetEnumerator();
while (true)
{
    if (queueEnum.MoveNext())
    {
        messageCount++;
        LogWrite($"Message No - {messageCount} - Method: ProcessNewMesage" + DateTime.Now);
        IBytesMessage bytesMessage = queueEnum.Current as IBytesMessage;
        if (bytesMessage != null)
        {
            byte[] arrayMessage = new byte[bytesMessage.BodyLength];
            bytesMessage.ReadBytes(arrayMessage);
            string message = System.Text.Encoding.Default.GetString(arrayMessage);
        }
    }
}
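For reference, here is the consumer-style approach the timing is being compared against. The question's code uses the .NET/XMS API; the sketch below is a rough JMS equivalent in Scala, and the ConnectionFactory setup, queue name, and method name are assumptions rather than anything from the question.
import javax.jms.{BytesMessage, ConnectionFactory, Message, MessageListener, Session}

// connectionFactory is assumed to come from your provider (e.g. an IBM MQ JMS connection factory).
def startListener(connectionFactory: ConnectionFactory, queueName: String): Unit = {
  val connection = connectionFactory.createConnection()
  val session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE)
  val consumer = session.createConsumer(session.createQueue(queueName))

  consumer.setMessageListener(new MessageListener {
    override def onMessage(message: Message): Unit = message match {
      case bytesMessage: BytesMessage =>
        // Read the body into a byte array and decode it, mirroring the browser code above.
        val body = new Array[Byte](bytesMessage.getBodyLength.toInt)
        bytesMessage.readBytes(body)
        println(new String(body))
      case other =>
        println("Ignoring non-bytes message: " + other)
    }
  })

  connection.start() // messages are now pushed asynchronously to onMessage
}
Unlike a QueueBrowser, which repeatedly scans the queue without removing anything, a listener/consumer has messages delivered to it, which is generally why it is so much faster.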

MQHeaderList size is 0, but reading a specific MQRFH2 works

I am writing custom Java code to read messages from WebSphere MQ (version 8) and read all the headers from the MQ message.
When I use MQHeaderList to parse all the headers, the list size is 0:
MQMessage message = new MQMessage();
queue.get(message, getOptions);
DataInput in = new DataInputStream (new ByteArrayInputStream (b));
MQHeaderList headersfoundlist = null;
headersfoundlist = new MQHeaderList (in);
System.out.println("headersfoundlist size: " + headersfoundlist.size());
However, if I read only a specific MQRFH2, it works:
MQMessage message = new MQMessage();
queue.get(message, getOptions);
DataInput in = new DataInputStream (new ByteArrayInputStream (b));
MQRFH2 rfh2 = new MQRFH2(in);
Element usrfolder = rfh2.getFolder("usr", false);
System.out.println("usr folder" + usrfolder);
How can I parse all the headers of the MQ Message?
DataInput in = new DataInputStream (new ByteArrayInputStream (b));
What's that about? Not sure why you want to do that.
It should just be:
MQMessage message = new MQMessage();
queue.get(message, getOptions);
MQHeaderList headersfoundlist = new MQHeaderList(message);
System.out.println("headersfoundlist size: " + headersfoundlist.size());
Read more here.
Update:
Regarding @anshu's comment about it not working: I've always found the MQHeaderList class to be very buggy, which is why I don't use it.
Also, 99.99% of messages in MQ will only ever have one embedded MQ header (i.e. MQRFH2). Note: a JMS message == an MQRFH2 message. The only case where you will find two embedded MQ headers is for messages on the Dead Letter Queue, i.e.
{MQDLH}{MQRFH2}{message payload}
Is there a real need for your application to process multiple embedded MQ headers? Is your application putting/getting JMS messages (aka MQRFH2 messages)?
If so then you should do something like the following:
queue.get(receiveMsg, gmo);
if (CMQC.MQFMT_RF_HEADER_2.equals(receiveMsg.format))
{
   receiveMsg.seek(0);
   MQRFH2 rfh2 = new MQRFH2(receiveMsg);
   int strucLen = rfh2.getStrucLength();
   int encoding = rfh2.getEncoding();
   int CCSID = rfh2.getCodedCharSetId();
   String format = rfh2.getFormat();
   int flags = rfh2.getFlags();
   int nameValueCCSID = rfh2.getNameValueCCSID();
   String[] folderStrings = rfh2.getFolderStrings();
   for (String folder : folderStrings)
      System.out.println("Folder: " + folder);
   if (CMQC.MQFMT_STRING.equals(format))
   {
      String msgStr = receiveMsg.readStringOfByteLength(receiveMsg.getDataLength());
      System.out.println("Data: " + msgStr);
   }
   else if (CMQC.MQFMT_NONE.equals(format))
   {
      byte[] b = new byte[receiveMsg.getDataLength()];
      receiveMsg.readFully(b);
      System.out.println("Data: " + new String(b));
   }
}
else if ( (CMQC.MQFMT_STRING.equals(receiveMsg.format)) ||
          (CMQC.MQFMT_NONE.equals(receiveMsg.format)) )
{
   Enumeration<String> props = receiveMsg.getPropertyNames("%");
   if (props != null)
   {
      System.out.println("Named Properties:");
      while (props.hasMoreElements())
      {
         String propName = props.nextElement();
         Object o = receiveMsg.getObjectProperty(propName);
         System.out.println(" Name=" + propName + " : Value=" + o);
      }
   }
   if (CMQC.MQFMT_STRING.equals(receiveMsg.format))
   {
      String msgStr = receiveMsg.readStringOfByteLength(receiveMsg.getMessageLength());
      System.out.println("Data: " + msgStr);
   }
   else
   {
      byte[] b = new byte[receiveMsg.getMessageLength()];
      receiveMsg.readFully(b);
      System.out.println("Data: " + new String(b));
   }
}
else
{
   byte[] b = new byte[receiveMsg.getMessageLength()];
   receiveMsg.readFully(b);
   System.out.println("Data: " + new String(b));
}
I found the mistake in my code. I have a few more steps before reading the headers, and they were moving the cursor in the message buffer to the end.
I added message.setDataOffset(0); before reading the headers and it worked.
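Putting that fix together with the MQHeaderList(message) form suggested above, here is a minimal Scala sketch; it assumes queue and getOptions are already set up exactly as in the question.
import com.ibm.mq.MQMessage
import com.ibm.mq.headers.MQHeaderList

val message = new MQMessage()
queue.get(message, getOptions) // same get call as in the question

// Any earlier reads move the cursor, so rewind it before parsing the headers.
message.setDataOffset(0)

val headers = new MQHeaderList(message) // parses all chained headers from the message
println("headersfoundlist size: " + headers.size())
for (i <- 0 until headers.size()) println(headers.get(i))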

Java HivePreparedStatement example

Can someone show an example Java program that uses HivePreparedStatement to connect to Hive tables and retrieve data from them?
https://hive.apache.org/javadocs/r1.2.2/api/org/apache/hive/jdbc/HivePreparedStatement.html
I was trying with the following code and am unable to complete it, as the constructor takes many parameters:
TCLIService.Iface client = null;
EmbeddedThriftBinaryCLIService embeddedClient = new EmbeddedThriftBinaryCLIService();
embeddedClient.init(null);
client = embeddedClient;
TOpenSessionReq openReq = new TOpenSessionReq();
JdbcConnectionParams connParams = Utils.parseURL("jdbc:hive2://168.61.32.157:10000/nadb");
Map<String, String> openConf = new HashMap<String, String>();
// for remote JDBC client, try to set the conf var using 'set foo=bar'
for (Map.Entry<String, String> hiveConf : connParams.getHiveConfs().entrySet()) {
    openConf.put("set:hiveconf:" + hiveConf.getKey(), hiveConf.getValue());
}
// For remote JDBC client, try to set the hive var using 'set hivevar:key=value'
for (Map.Entry<String, String> hiveVar : connParams.getHiveVars().entrySet()) {
    openConf.put("set:hivevar:" + hiveVar.getKey(), hiveVar.getValue());
}
// switch the database
openConf.put("use:database", connParams.getDbName());
// set the fetchSize
openConf.put("set:hiveconf:hive.server2.thrift.resultset.default.fetch.size",
    Integer.toString(fetchSize));
// set the session configuration
Map<String, String> sessVars = connParams.getSessionVars();
if (sessVars.containsKey(HiveAuthFactory.HS2_PROXY_USER)) {
    openConf.put(HiveAuthFactory.HS2_PROXY_USER,
        sessVars.get(HiveAuthFactory.HS2_PROXY_USER));
}
openReq.setConfiguration(openConf);
// Store the user name in the open request in case no non-sasl authentication
if (JdbcConnectionParams.AUTH_SIMPLE.equals(sessConfMap.get(JdbcConnectionParams.AUTH_TYPE))) {
    openReq.setUsername(sessConfMap.get(JdbcConnectionParams.AUTH_USER));
    openReq.setPassword(sessConfMap.get(JdbcConnectionParams.AUTH_PASSWD));
}
TOpenSessionResp openResp = client.OpenSession(openReq);
// validate connection
Utils.verifySuccess(openResp.getStatus());
if (!supportedProtocols.contains(openResp.getServerProtocolVersion())) {
    throw new TException("Unsupported Hive2 protocol");
}
protocol = openResp.getServerProtocolVersion();
sessHandle = openResp.getSessionHandle();
HivePreparedStatement hivePreparedStatement = new HivePreparedStatement(con, client, sessHandle, sql);
Is there any simple method for doing this?
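For what it's worth, a Hive JDBC Connection's prepareStatement already returns a HivePreparedStatement, so it rarely needs to be constructed by hand. Here is a minimal sketch using the plain JDBC API (Scala syntax, the same calls work from Java); the table, column, and credentials are made-up placeholders, and the URL reuses the one from the question.
import java.sql.DriverManager

// Hive JDBC driver; hive-jdbc (standalone) must be on the classpath.
Class.forName("org.apache.hive.jdbc.HiveDriver")

val conn = DriverManager.getConnection("jdbc:hive2://168.61.32.157:10000/nadb", "user", "password")
try {
  // prepareStatement on a Hive connection hands back a HivePreparedStatement.
  val ps = conn.prepareStatement("SELECT col1, col2 FROM some_table WHERE col1 = ?")
  ps.setString(1, "someValue")
  val rs = ps.executeQuery()
  while (rs.next()) {
    println(rs.getString("col1") + "\t" + rs.getString("col2"))
  }
  rs.close()
  ps.close()
} finally {
  conn.close()
}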

jmeter kafka consumer throwing error as [ClassCastException: [Ljava.lang.String; cannot be cast to java.util.List]

I am trying to read Kafka messages using a Kafka consumer in JMeter via a JSR223 Sampler. I am unable to understand this error:
[Response message: javax.script.ScriptException: javax.script.ScriptException: java.lang.ClassCastException: [Ljava.lang.String; cannot be cast to java.util.List]
Please help me solve the issue so that I can subscribe and consume messages using the Kafka consumer.
import java.util.Properties;
import java.util.Arrays;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.ConsumerRecord;
Properties props = new Properties();
String groupID = "REQUEST_RESPONSE_JOB_GROUP";
String clientID = "REQUEST_RESPONSE_JOB_CLIENT";
String BSID = "kafka:9092";
String topic = "PROC_REST_EVENTS";
props.put("bootstrap.servers", BSID);
props.put("group.id", groupID);
props.put("client.id", clientID);
props.put("enable.auto.commit", "true");
props.put("auto.commit.interval.ms", "1000");
props.put("session.timeout.ms", "30000");
props.put("key.deserializer","org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer","org.apache.kafka.common.serialization.StringDeserializer");
props.put("partition.assignment.strategy","org.apache.kafka.clients.consumer.RangeAssignor");
KafkaConsumer<String, String> consumer = new KafkaConsumer<String, String>(props);
//Kafka Consumer subscribes list of topics here.
consumer.subscribe(Arrays.asList(topic));
//print the topic name
System.out.println("Subscribed to topic " + topic);
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(100);
    for (ConsumerRecord<String, String> record : records)
        // print the offset, key and value for the consumer records.
        System.out.printf("offset = %d, key = %s, value = %s\n",
            record.offset(), record.key(), record.value());
    return records;
}
Most probably you're getting a List from the Kafka topic while your consumer expects a String; you need to amend the consumer configuration to match the types which come from the topic.
Try out the following Groovy code, which sends 3 messages to the test topic (if it doesn't exist you will need to create it) and then reads them back.
import org.apache.kafka.clients.consumer.ConsumerConfig
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.clients.producer.KafkaProducer
import org.apache.kafka.clients.producer.ProducerConfig
import org.apache.kafka.clients.producer.ProducerRecord
import org.apache.kafka.common.serialization.LongDeserializer
import org.apache.kafka.common.serialization.LongSerializer
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.kafka.common.serialization.StringSerializer
def BOOTSTRAP_SERVERS = 'localhost:9092'
def TOPIC = 'test'
Properties kafkaProps = new Properties()
kafkaProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, BOOTSTRAP_SERVERS)
kafkaProps.put(ProducerConfig.CLIENT_ID_CONFIG, 'KafkaExampleProducer')
kafkaProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, LongSerializer.class.getName())
kafkaProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName())
kafkaProps.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, BOOTSTRAP_SERVERS)
kafkaProps.put(ConsumerConfig.GROUP_ID_CONFIG, 'KafkaExampleConsumer')
kafkaProps.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, LongDeserializer.class.getName())
kafkaProps.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName())
def producer = new KafkaProducer<>(kafkaProps)
def consumer = new KafkaConsumer<>(kafkaProps)
1.upto(3) {
    def record = new ProducerRecord<>(TOPIC, it as long, 'Hello from JMeter ' + it)
    producer.send(record)
    log.info('Sent record(key=' + record.key() + 'value=' + record.value() + ')')
}
consumer.subscribe(Collections.singletonList(TOPIC))
final int giveUp = 100
int noRecordsCount = 0
while (true) {
    def consumerRecords = consumer.poll(1000)
    if (consumerRecords.count() == 0) {
        noRecordsCount++
        if (noRecordsCount > giveUp) break
        else continue
    }
    consumerRecords.each { record ->
        log.info('Received Record:(' + record.key() + ', ' + record.value() + ')')
    }
    consumer.commitAsync()
}
consumer.close()
You should see the Sent record(...) and Received Record:(...) entries from the script in the JMeter log.
Once done, you should be able to use the above code as a basis for your own Kafka message-consumption testing. See the Apache Kafka - How to Load Test with JMeter article for more information on Kafka load testing with JMeter.

When creating and loading HFile programmatically to HBase new entries are unavailable

I'm trying to create HFiles programmatically and load them into a running HBase instance. I found a lot of info in HFileOutputFormat and in LoadIncrementalHFiles.
I managed to create the new HFile and send it to the cluster. In the cluster web interface the new store file appears, but the new key range is unavailable.
InputStream stream = ProgrammaticHFileGeneration.class.getResourceAsStream("ga-hourly.txt");
BufferedReader reader = new BufferedReader(new InputStreamReader(stream));
String line = null;
Map<byte[], String> rowValues = new HashMap<byte[], String>();
while ((line = reader.readLine()) != null) {
    String[] vals = line.split(",");
    String row = new StringBuilder(vals[0]).append(".").append(vals[1]).append(".").append(vals[2]).append(".").append(vals[3]).toString();
    rowValues.put(row.getBytes(), line);
}
List<byte[]> keys = new ArrayList<byte[]>(rowValues.keySet());
Collections.sort(keys, byteArrComparator);
HBaseTestingUtility testingUtility = new HBaseTestingUtility();
testingUtility.startMiniCluster();
testingUtility.createTable("table".getBytes(), "data".getBytes());
Writer writer = new HFile.Writer(testingUtility.getTestFileSystem(),
    new Path("/tmp/hfiles/data/hfile"),
    HFile.DEFAULT_BLOCKSIZE, Compression.Algorithm.NONE, KeyValue.KEY_COMPARATOR);
for (byte[] key : keys) {
    writer.append(new KeyValue(key, "data".getBytes(), "d".getBytes(), rowValues.get(key).getBytes()));
}
writer.appendFileInfo(StoreFile.BULKLOAD_TIME_KEY, Bytes.toBytes(System.currentTimeMillis()));
writer.appendFileInfo(StoreFile.MAJOR_COMPACTION_KEY, Bytes.toBytes(true));
writer.close();
Configuration conf = testingUtility.getConfiguration();
LoadIncrementalHFiles loadTool = new LoadIncrementalHFiles(conf);
HTable hTable = new HTable(conf, "table".getBytes());
loadTool.doBulkLoad(new Path("/tmp/hfiles"), hTable);
ResultScanner scanner = hTable.getScanner("data".getBytes());
Result next = null;
System.out.println("Scanning");
while ((next = scanner.next()) != null) {
    System.out.format("%s %s\n", new String(next.getRow()), new String(next.getValue("data".getBytes(), "d".getBytes())));
}
Did anyone actually make this work? I have a compilable/testable version up on my GitHub.
Take a look at the LoadIncrementalHFiles test in the HBase source code: https://github.com/apache/hbase/blob/7c46646994b7a9d6f947cf12796579ef48d0b0bd/src/test/java/org/apache/hadoop/hbase/mapreduce/TestLoadIncrementalHFiles.java
