Hi, I am using Camel to consume messages from a JMS queue, process each message, store it in Hadoop using the FileSystem Java API, and then forward it to another queue.
My JMS concurrency is currently 20, so Camel consumes up to 20 messages at a time. For every message, I open an fs connection and then create and write a file.
Here is the issue:
Sometimes, while the content is being written to the file, my NameNode goes down for some reason. In that case, I want the client to switch to the active NameNode.
Here is the log I get:
[Camel (camel-1) thread #1 a failover occurred since the last call #169 ClientNamenodeProtocolTranslatorPD.getFileInfo over {name_node_address_allias/ip:port}
When I debug it, I see: Operation category WRITE is not supported in state standby
At that point I want to switch to the active NameNode.
Hadoop Java API sample code:
package org.myorg;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileStatus;
public class HdfsTest {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // A Path argument loads the file from the local filesystem;
        // a plain String would be looked up on the classpath instead
        conf.addResource(new Path("path-of-core-site.xml"));
        conf.addResource(new Path("path-of-hdfs-site.xml"));
        conf.set("fs.defaultFS", "hdfs://cloudera:8020");
        conf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(conf);
        UserGroupInformation.loginUserFromKeytab("hdfs@CLOUDERA", "/etc/hadoop/conf/hdfs.keytab");
        FileSystem fs = FileSystem.get(conf);
        // logic to create a file and write
        // close the file and connection
    }
}
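One direction that may be worth checking, assuming the cluster runs NameNode HA: the sample pins fs.defaultFS to a single host, so the client has nowhere else to go when that NameNode becomes standby. With HA, the client normally addresses a logical nameservice and lets a failover proxy provider find the active NameNode. The sketch below only assembles the standard HA client property names into a map so it runs without Hadoop on the classpath; the nameservice ID "mycluster", the IDs "nn1"/"nn2", and the host names are placeholders I made up, not values from the post.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Builds the client-side properties for addressing an HA nameservice
// instead of one fixed NameNode host. All concrete names are placeholders.
final class HaClientSettings {

    static Map<String, String> forNameservice(String nameservice,
                                              Map<String, String> rpcAddressById) {
        Map<String, String> props = new LinkedHashMap<>();
        // The client talks to the logical nameservice, not to a host:port.
        props.put("fs.defaultFS", "hdfs://" + nameservice);
        props.put("dfs.nameservices", nameservice);
        props.put("dfs.ha.namenodes." + nameservice,
                String.join(",", rpcAddressById.keySet()));
        for (Map.Entry<String, String> e : rpcAddressById.entrySet()) {
            props.put("dfs.namenode.rpc-address." + nameservice + "." + e.getKey(),
                    e.getValue());
        }
        // Proxy provider that tries the configured NameNodes to reach the active one.
        props.put("dfs.client.failover.proxy.provider." + nameservice,
                "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");
        return props;
    }

    public static void main(String[] args) {
        Map<String, String> nameNodes = new LinkedHashMap<>();
        nameNodes.put("nn1", "namenode1.example.com:8020");
        nameNodes.put("nn2", "namenode2.example.com:8020");
        // In the real client, each pair would be applied with conf.set(key, value).
        forNameservice("mycluster", nameNodes)
                .forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

If the core-site.xml and hdfs-site.xml loaded by the client already contain these entries, dropping the hard-coded conf.set("fs.defaultFS", ...) line may be all that is needed.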
I didn't find enough examples on the web that use Apache Camel with WebSphere MQ to send and receive messages. I have some example code, but I got stuck in the middle of it. Could anyone help with this?
import org.apache.camel.CamelContext;
import org.apache.camel.Endpoint;
import org.apache.camel.Exchange;
import org.apache.camel.ExchangePattern;
import org.apache.camel.Producer;
import org.apache.camel.util.IOHelper;
import org.springframework.context.support.AbstractApplicationContext;
import org.springframework.context.support.ClassPathXmlApplicationContext;
/**
* Client that uses the Message Endpoint
* pattern to easily exchange messages with the Server.
* <p/>
* Notice this very same API can use for all components in Camel, so if we were using TCP communication instead
* of JMS messaging we could just use <code>camel.getEndpoint("mina:tcp://someserver:port")</code>.
* <p/>
* Requires that the JMS broker is running, as well as CamelServer
*/
public final class CamelClientEndpoint {
private CamelClientEndpoint() {
//Helper class
}
// START SNIPPET: e1
public static void main(final String[] args) throws Exception {
System.out.println("Notice this client requires that the CamelServer is already running!");
AbstractApplicationContext context = new ClassPathXmlApplicationContext("camel-client.xml");
CamelContext camel = context.getBean("camel-client", CamelContext.class);
// get the endpoint from the camel context
Endpoint endpoint = camel.getEndpoint("jms:queue:numbers");
// create the exchange used for the communication
// we use the in out pattern for a synchronized exchange where we expect a response
Exchange exchange = endpoint.createExchange(ExchangePattern.InOut);
// set the input on the in body
// must be correct type to match the expected type of an Integer object
exchange.getIn().setBody(11);
// to send the exchange we need an producer to do it for us
Producer producer = endpoint.createProducer();
// start the producer so it can operate
producer.start();
// let the producer process the exchange where it does all the work in this one line of code
System.out.println("Invoking the multiply with 11");
producer.process(exchange);
// get the response from the out body and cast it to an integer
int response = exchange.getOut().getBody(Integer.class);
System.out.println("... the result is: " + response);
// stopping the JMS producer has the side effect of the "ReplyTo Queue" being properly
// closed, making this client not to try any further reads for the replies from the server
producer.stop();
// we're done so let's properly close the application context
IOHelper.close(context);
}
}
I got stuck at this point in the code:
exchange.getIn()
Do I have to use exchange.getOut() to send a message? And how do I construct a message from a string and add headers to it?
Welcome to Stack Overflow!
I am still not sure what exactly the problem is that you are stuck on, which prevents me (and possibly others as well) from helping you resolve your roadblock.
Perhaps you need to familiarize yourself a bit more with what Camel is and how it works. Camel in Action is a great book to help you with that.
If you are unable to get a copy at this point, a preview of the first few chapters of the book is available online and should give you much better leverage. The source code repository for chapter 2 should give you some more ideas about how to process JMS messages.
In addition, please don't expect full-blown solutions from Stack Overflow. You may want to read the page on how to ask a good question.
The HDFS client sits outside the HDFS cluster. When the client writes a file to Hadoop, it splits the file into blocks and then writes the blocks to the DataNodes.
The question is: how does the HDFS client know the block size? The block size is configured on the NameNode and the client has no idea about it, so how does it split the file into blocks?
HDFS is designed in a way where the block size for a particular file is part of the MetaData.
Let's check what this means.
The client can tell the NameNode that it will put data to HDFS with a particular block size.
The client has its own hdfs-site.xml that can contain this value, and can specify it on a per-request basis as well using the -Ddfs.blocksize parameter.
If the client configuration does not define this parameter, then it defaults to the org.apache.hadoop.hdfs.DFSConfigKeys.DFS_BLOCK_SIZE_DEFAULT value which is 128MB.
The NameNode can return an error to the client if it specifies a block size smaller than dfs.namenode.fs-limits.min-block-size (1MB by default).
There is nothing magical in this: the NameNode knows nothing about the data and lets the client decide the optimal splitting, as well as define the replication factor for the blocks of a file.
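To make the arithmetic concrete, here is a minimal sketch of how a client-side block size could be resolved and applied. It is not Hadoop code: the class and method names are made up, the precedence (per-request value, then client configuration, then the 128MB default) is my reading of the explanation above, and the 1MB lower bound stands in for the NameNode-side check.

```java
// Illustrative arithmetic only; none of these names exist in Hadoop.
final class BlockSizeSketch {
    static final long MB = 1024L * 1024L;
    static final long DEFAULT_BLOCK_SIZE = 128 * MB; // default cited above
    static final long MIN_BLOCK_SIZE = 1 * MB;       // default lower bound cited above

    // Assumed precedence: per-request value, then client config, then default.
    // A null argument means "not set".
    static long effectiveBlockSize(Long perRequest, Long clientConfig) {
        long size;
        if (perRequest != null) {
            size = perRequest;
        } else if (clientConfig != null) {
            size = clientConfig;
        } else {
            size = DEFAULT_BLOCK_SIZE;
        }
        if (size < MIN_BLOCK_SIZE) {
            // Stands in for the error the NameNode would return.
            throw new IllegalArgumentException("block size " + size + " is below the minimum");
        }
        return size;
    }

    // Number of blocks needed for a file of the given length (ceiling division).
    static long blockCount(long fileLength, long blockSize) {
        return (fileLength + blockSize - 1) / blockSize;
    }

    // Length of the final block, which may be shorter than the others.
    static long lastBlockLength(long fileLength, long blockSize) {
        if (fileLength == 0) {
            return 0;
        }
        long remainder = fileLength % blockSize;
        return remainder == 0 ? blockSize : remainder;
    }

    public static void main(String[] args) {
        long blockSize = effectiveBlockSize(null, null);
        long fileLength = 300 * MB;
        System.out.println(blockCount(fileLength, blockSize) + " blocks, last block "
                + lastBlockLength(fileLength, blockSize) / MB + " MB");
    }
}
```

With the defaults, a 300MB file ends up as two full 128MB blocks plus one 44MB block.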
In simple words: when you deploy the client configuration, the server URI is placed into the client, or you download the configuration and replace it in the client manually. Whenever the client requests information, it goes to the NameNode and fetches the required info, or places new data on the DataNodes.
P.S: Client = EdgeNode
Some more details below (from Hadoop: The Definitive Guide, 4th edition):
"The client creates the file by calling create() on DistributedFileSystem (step 1 in
Figure 3-4). DistributedFileSystem makes an RPC call to the namenode to create a new
file in the filesystem’s namespace, with no blocks associated with it (step 2). The
namenode performs various checks to make sure the file doesn’t already exist and that the
client has the right permissions to create the file. If these checks pass, the namenode
makes a record of the new file; otherwise, file creation fails and the client is thrown an
IOException. The DistributedFileSystem returns an FSDataOutputStream for the client
to start writing data to. Just as in the read case, FSDataOutputStream wraps a
DFSOutputStream, which handles communication with the datanodes and namenode.
As the client writes data (step 3), the DFSOutputStream splits it into packets, which it writes to an internal queue called the data queue."
Adding more info in response to comment on this post:
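Here is a sample client program to copy a file to HDFS (source: Hadoop: The Definitive Guide):
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.util.Progressable;
public class FileCopyWithProgress {
    public static void main(String[] args) throws Exception {
        String localSrc = args[0];
        String dst = args[1];
        InputStream in = new BufferedInputStream(new FileInputStream(localSrc));
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(dst), conf);
        // progress() is called back periodically while data is written
        OutputStream out = fs.create(new Path(dst), new Progressable() {
            public void progress() {
                System.out.print(".");
            }
        });
        // the final argument closes both streams when the copy finishes
        IOUtils.copyBytes(in, out, 4096, true);
    }
}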
If you look at the implementation of the create() method in the FileSystem class, it passes getDefaultBlockSize() as one of its arguments, which in turn fetches the value from the configuration, which in turn is provided by the NameNode.
This is how the client gets to know the block size configured on the Hadoop cluster.
Hope this helps.
I am aware of the following methods for generating thread dumps in Java:
kill -3
jstack
JMX from inside the JVM
JMX remote
JPDA (remote)
JVMTI (C API)
Of these methods, which is the least detrimental to the JVM's performance?
If you just need to dump all stack traces to stdout, kill -3 and jstack should be the cheapest. The functionality is implemented natively in JVM code. No intermediate structures are created: the VM prints everything itself while it walks through the stacks.
Both commands perform the same VM operation, except that the signal handler prints stack traces locally to the stdout of the Java process, while jstack receives the output from the target VM through IPC (a Unix domain socket on Linux or a named pipe on Windows).
jstack uses the Dynamic Attach mechanism under the hood. You can also use Dynamic Attach directly if you wish to receive the stack traces as a plain stream of bytes.
import com.sun.tools.attach.VirtualMachine;
import sun.tools.attach.HotSpotVirtualMachine;
import java.io.InputStream;
public class StackTrace {
public static void main(String[] args) throws Exception {
String pid = args[0];
HotSpotVirtualMachine vm = (HotSpotVirtualMachine) VirtualMachine.attach(pid);
try (InputStream in = vm.remoteDataDump()) {
byte[] buf = new byte[8000];
for (int bytes; (bytes = in.read(buf)) > 0; ) {
System.out.write(buf, 0, bytes);
}
} finally {
vm.detach();
}
}
}
Note that all of the mentioned methods operate in a VM safepoint anyway. This means that all Java threads are stopped while the stack traces are collected.
The most performant option is likely to be the use of the ThreadMXBean.dumpAllThreads() API rather than requesting a text thread dump written to disk:
http://docs.oracle.com/javase/7/docs/api/java/lang/management/ThreadMXBean.html#dumpAllThreads(boolean,%20boolean)
Of course, whether you can use that depends on whether you need a thread dump file, or just the data.
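For reference, a minimal in-process sketch of the ThreadMXBean route. The class name and the text format are my own; only ManagementFactory.getThreadMXBean() and dumpAllThreads(boolean, boolean) come from the JDK. Passing false for both flags skips the locked-monitor and ownable-synchronizer information.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Collects a thread dump as data inside the running JVM and renders it as text.
final class InProcessThreadDump {

    static ThreadInfo[] dump() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // false, false: do not collect locked monitors or ownable synchronizers
        return threads.dumpAllThreads(false, false);
    }

    // Renders thread name, state, and stack frames in a simple ad hoc layout.
    static String format(ThreadInfo[] infos) {
        StringBuilder sb = new StringBuilder();
        for (ThreadInfo info : infos) {
            sb.append('"').append(info.getThreadName()).append("\" ")
              .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(format(dump()));
    }
}
```

Because the result is an array of ThreadInfo objects, the caller can filter or aggregate it directly instead of parsing a text dump.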
I'm using WebLogic 10.3.3. When I send messages to a queue, they show up as pending messages when they should be current messages. I'm using this code:
import javax.jms.QueueSender;
import javax.jms.QueueSession;
import javax.jms.Session;
import javax.jms.TextMessage;
//.....
qSession = qConnect.createQueueSession(
false, Session.AUTO_ACKNOWLEDGE);
//.....
TextMessage tmsg= qSession.createTextMessage();
tmsg.setText(message);
QueueSender qSender = qSession.createSender(requestQ);
qSender.send(tmsg);
I searched Google but did not find a helpful solution.
To quote the manual entry:
A pending message is one that has either been sent in a transaction
and not committed, or that has been received and not committed or
acknowledged.
As you're using AUTO_ACKNOWLEDGE, I guess either you're sending messages in a transaction that has not been committed or message processing takes so long that it is still in process.
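To illustrate the definition quoted above, here is a toy in-memory model of the two counters. It is neither the JMS API nor WebLogic code, and every name in it is invented; it only shows that a message counts as pending while its send is uncommitted or its receipt is unacknowledged, and as current otherwise.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy model of "pending" versus "current" messages on a queue.
final class PendingQueueModel {
    private final Deque<String> current = new ArrayDeque<>();
    private final List<String> uncommittedSends = new ArrayList<>();
    private final List<String> unacknowledgedReceives = new ArrayList<>();

    // A send inside a transaction stays pending until commit() is called.
    void sendInTransaction(String message) {
        uncommittedSends.add(message);
    }

    // Committing makes the sent messages visible as current messages.
    void commit() {
        current.addAll(uncommittedSends);
        uncommittedSends.clear();
    }

    // A received message is pending again until it is acknowledged.
    // Returns null when there is no current message.
    String receive() {
        String message = current.poll();
        if (message != null) {
            unacknowledgedReceives.add(message);
        }
        return message;
    }

    void acknowledge() {
        unacknowledgedReceives.clear();
    }

    int pendingCount() {
        return uncommittedSends.size() + unacknowledgedReceives.size();
    }

    int currentCount() {
        return current.size();
    }

    public static void main(String[] args) {
        PendingQueueModel queue = new PendingQueueModel();
        queue.sendInTransaction("hello");
        System.out.println("before commit: pending=" + queue.pendingCount()
                + " current=" + queue.currentCount());
        queue.commit();
        System.out.println("after commit: pending=" + queue.pendingCount()
                + " current=" + queue.currentCount());
    }
}
```

In the model, the message sent in the question would stay in the pending column for as long as commit() is never called, which matches the first possibility mentioned above.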
My experimental application is quite simple, trying what can be done with Actors and Akka.
After the JVM starts, it creates an actor system with a couple of plain actors, a JMS consumer (akka.camel.Consumer) and a JMS producer (akka.camel.Producer). It sends a couple of messages between the actors and also along the path JMS producer -> JMS server -> JMS consumer. It basically talks to itself via the JMS service.
From time to time I experienced weird behaviour: the first of the messages that were supposed to be sent to the JMS server was somehow lost. Looking at my application logs, I could see that the application was trying to send the message, but it was never received by the JMS server. (For each run I have to start the JVM and the application again.)
The Akka Camel documentation mentions that some components may not be fully initialized at the beginning: "Some Camel components can take a while to startup, and in some cases you might want to know when the endpoints are activated and ready to be used."
I tried to implement the following to wait for Camel initialization:
val system = ActorSystem("actor-system")
val camel = CamelExtension(system)
val jmsConsumer = system.actorOf(Props[JMSConsumer])
val activationFuture = camel.activationFutureFor(jmsConsumer)(timeout = 10 seconds, executor = system.dispatcher)
val result = Await.result(activationFuture,10 seconds)
which seems to help with this issue. (Although, when I remove this step now, I am no longer able to reproduce the issue... :/)
My question is whether this is the correct way to ensure that all components are fully initialized.
Should I use
val future = camel.activationFutureFor(actor)(timeout = 10 seconds, executor = system.dispatcher)
Await.result(future, 10 seconds)
for each akka.camel.Producer and akka.camel.Consumer actor to be sure that everything is initialized properly?
Is that all I should do, or should something else be done as well? The documentation is not clear on that, and it's not easy to test because the issue happened only occasionally...
You need to initialize the Camel JMS component and also the Producer before sending any messages.
import static java.util.concurrent.TimeUnit.SECONDS;
import scala.concurrent.Future;
import scala.concurrent.duration.Duration;
import akka.actor.ActorRef;
import akka.actor.Props;
import akka.dispatch.OnComplete;
import akka.util.Timeout;
// final so that the anonymous callback below can reference it
final ActorRef producer = system.actorOf(new Props(SimpleProducer.class), "simpleproducer");
Timeout timeout = new Timeout(Duration.create(15, SECONDS));
Future<ActorRef> activationFuture = camel.activationFutureFor(producer, timeout, system.dispatcher());
activationFuture.onComplete(new OnComplete<ActorRef>() {
    @Override
    public void onComplete(Throwable arg0, ActorRef arg1)
            throws Throwable {
        // runs once the activation future has completed
        producer.tell("First!!");
    }
}, system.dispatcher());
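The same "send only after activation" idea can be sketched with nothing but the JDK, which may make the ordering easier to see. This is not Akka code: FakeProducer is a hypothetical stand-in for a producer endpoint, and CompletableFuture plays the role of the activation future.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.TimeUnit;

final class SendAfterActivation {

    // Hypothetical endpoint that rejects messages until it has been activated.
    static final class FakeProducer {
        final CompletableFuture<Void> activated = new CompletableFuture<>();
        final List<String> delivered = new CopyOnWriteArrayList<>();

        void activate() {
            activated.complete(null);
        }

        void send(String message) {
            if (!activated.isDone()) {
                throw new IllegalStateException("endpoint not activated yet");
            }
            delivered.add(message);
        }
    }

    // Registers the send as a callback, so it cannot run before activation.
    static CompletableFuture<Void> sendWhenActive(FakeProducer producer, String message) {
        return producer.activated.thenRun(() -> producer.send(message));
    }

    public static void main(String[] args) throws Exception {
        FakeProducer producer = new FakeProducer();
        CompletableFuture<Void> sent = sendWhenActive(producer, "First!!");
        // Activation happens later, on another thread.
        new Thread(producer::activate).start();
        // Bounded wait, similar in spirit to the 15 second timeout above.
        sent.get(15, TimeUnit.SECONDS);
        System.out.println(producer.delivered);
    }
}
```

Registering a callback avoids blocking the calling thread, while a bounded wait such as Await.result in the question is simpler when startup code can afford to pause. Either way, the first message is not sent before the endpoint reports that it is ready.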