Storm SQS messages not getting acked - apache-storm

I have a topology with one spout reading from two SQS queues and five bolts. After processing, when I try to ack from the second bolt, the tuple is not getting acked.
I'm running it in reliable mode and acking in the last bolt. I see the log message below, as if the messages were being acked, but they are not deleted from the queue and my overridden ack() method is never called. It looks like Storm calls the default ack method in backtype.storm.task.OutputCollector instead of the overridden method in my spout.
8240 [Thread-24-conversionBolt] INFO backtype.storm.daemon.task - Emitting: conversionBolt__ack_ack [-7578372739434961741 -8189877254603774958]
I have anchored the message ID to the tuple in my SQS queue spout and am emitting to the first bolt:
collector.emit(getStreamId(message), new Values(jsonObj.toString()), message.getReceiptHandle());
I have the ack() and fail() methods overridden in my queue spout. The Default Visibility Timeout has been set to 30 seconds.
Code snippet from my topology:
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("firstQueueSpout",
new SqsQueueSpout(StormConfigurations.getQueueURL()
+ StormConfigurations.getFirstQueueName(), true),
StormConfigurations.getAwsQueueSpoutThreads());
builder.setSpout("secondQueueSpout",
new SqsQueueSpout(StormConfigurations.getQueueURL()
+ StormConfigurations.getSecondQueueName(),
true), StormConfigurations.getAwsQueueSpoutThreads());
builder.setBolt("transformerBolt", new TransformerBolt(),
StormConfigurations.getTranformerBoltThreads())
.shuffleGrouping("firstQueueSpout")
.shuffleGrouping("secondQueueSpout");
builder.setBolt("conversionBolt", new ConversionBolt(),
StormConfigurations.getTranformerBoltThreads())
.shuffleGrouping("transformerBolt");
// To dispatch it to the corresponding bolts based on packet type
builder.setBolt("dispatchBolt", new DispatcherBolt(),
StormConfigurations.getDispatcherBoltThreads())
.shuffleGrouping("conversionBolt");
Code snippet from SqsQueueSpout (extends BaseRichSpout):
@Override
public void nextTuple()
{
if (queue.isEmpty()) {
ReceiveMessageResult receiveMessageResult = sqs.receiveMessage(
new ReceiveMessageRequest(queueUrl).withMaxNumberOfMessages(10));
queue.addAll(receiveMessageResult.getMessages());
}
Message message = queue.poll();
if (message != null)
{
try
{
JSONParser parser = new JSONParser();
JSONObject jsonObj = (JSONObject) parser.parse(message.getBody());
// ack(message.getReceiptHandle());
if (reliable) {
collector.emit(getStreamId(message), new Values(jsonObj.toString()), message.getReceiptHandle());
} else {
// Delete it right away
sqs.deleteMessageAsync(new DeleteMessageRequest(queueUrl, message.getReceiptHandle()));
collector.emit(getStreamId(message), new Values(jsonObj.toString()));
}
}
catch (ParseException e)
{
LOG.error("SqsQueueSpout SQLException in SqsQueueSpout.nextTuple(): ", e);
}
} else {
// Still empty, go to sleep.
Utils.sleep(sleepTime);
}
}
public String getStreamId(Message message) {
return Utils.DEFAULT_STREAM_ID;
}
public int getSleepTime() {
return sleepTime;
}
public void setSleepTime(int sleepTime)
{
this.sleepTime = sleepTime;
}
@Override
public void ack(Object msgId) {
System.out.println("......Inside ack in sqsQueueSpout..............."+msgId);
// Only called in reliable mode.
try {
sqs.deleteMessageAsync(new DeleteMessageRequest(queueUrl, (String) msgId));
} catch (AmazonClientException ace) { }
}
@Override
public void fail(Object msgId) {
// Only called in reliable mode.
try {
sqs.changeMessageVisibilityAsync(
new ChangeMessageVisibilityRequest(queueUrl, (String) msgId, 0));
} catch (AmazonClientException ace) { }
}
@Override
public void close() {
sqs.shutdown();
((AmazonSQSAsyncClient) sqs).getExecutorService().shutdownNow();
}
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("message"));
}
Code snippet from my first bolt (extends BaseRichBolt):
public class TransformerBolt extends BaseRichBolt
{
private static final long serialVersionUID = 1L;
public static final Logger LOG = LoggerFactory.getLogger(TransformerBolt.class);
private OutputCollector collector;
@Override
public void prepare(Map stormConf, TopologyContext context,
OutputCollector collector) {
this.collector = collector;
}
@Override
public void execute(Tuple input) {
try {
String eventStr = input.getString(0);
// some code here to convert the JSON string to a map;
// Map dataMap and long packetId are sent to the next bolt
this.collector.emit(input, new Values(dataMap, packetId));
}
catch (Exception e) {
LOG.warn("Exception while converting AWS SQS to HashMap :{}", e);
}
}
@Override
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("dataMap", "packetId"));
}
}
Code snippet from second Bolt:
public class ConversionBolt extends BaseRichBolt
{
private static final long serialVersionUID = 1L;
private OutputCollector collector;
@Override
public void prepare(Map stormConf, TopologyContext context,
OutputCollector collector) {
this.collector = collector;
}
@Override
public void execute(Tuple input)
{
try{
Map dataMap = (Map)input.getValue(0);
Long packetId = (Long)input.getValue(1);
//this ack is not working
this.collector.ack(input);
}catch(Exception e){
this.collector.fail(input);
}
}
@Override
public void declareOutputFields(OutputFieldsDeclarer declarer) {
}
}
Kindly let me know if you need more information. Can somebody shed some light on why the overridden ack in my spout is not getting called (from my second bolt)?

You must ack all incoming tuples in all bolts, i.e., add collector.ack(input) to TransformerBolt.execute(Tuple input).
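A minimal sketch of that change, reusing the field names from the question (the JSON-to-map conversion stays elided, as in the original):
@Override
public void execute(Tuple input) {
    try {
        String eventStr = input.getString(0);
        // ... convert the JSON string to dataMap/packetId as before ...
        this.collector.emit(input, new Values(dataMap, packetId)); // anchored emit
        this.collector.ack(input); // completes this bolt's part of the tuple tree
    } catch (Exception e) {
        LOG.warn("Exception while converting AWS SQS to HashMap :{}", e);
        this.collector.fail(input); // lets the spout's fail() restore SQS visibility
    }
}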
The log message you see is correct: your code calls collector.ack(...) and that call gets logged. But a call to ack in your topology is not a call to Spout.ack(...): each time a spout emits a tuple with a message ID, that ID gets registered with the running ackers of your topology. The ackers receive a message for each ack from a bolt, collect them, and notify the spout once all acks for a tuple have been received. Only when the spout receives this notification from an acker does it call its own ack(Object messageId) method.
See here for more details: https://storm.apache.org/documentation/Guaranteeing-message-processing.html

Related

Spring Boot WebSocket URL Not Responding and RxJS Call Repetition?

I'm trying to follow a guide to WebSockets at https://www.devglan.com/spring-boot/spring-boot-angular-websocket
I'd like it to respond to ws://localhost:8448/wsb/softlayer-cost-file, but I'm sure I misunderstood something. I'd like to get it to receive a binary file and issue periodic updates as the file is being processed.
Questions are:
Why does Spring not respond to my requests, despite all the URLs I have tried (see below)?
Does my RxJS call run once and then conclude, or does it keep running until some closure has happened? Sorry to ask what might be obvious to others.
When my Spring Boot server starts, I see no errors. After about 5-7 minutes of running, I saw the following log message:
INFO o.s.w.s.c.WebSocketMessageBrokerStats - WebSocketSession[0 current WS(0)-HttpStream(0)-HttpPoll(0), 0 total, 0 closed abnormally (0 connect failure, 0 send limit, 0 transport error)], stompSubProtocol[processed CONNECT(0)-CONNECTED(0)-DISCONNECT(0)], stompBrokerRelay[null], inboundChannel[pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0], outboundChannel[pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0], sockJsScheduler[pool size = 6, active threads = 1, queued tasks = 0, completed tasks = 5]
I've pointed my browser at these URLs and can't get the Spring Boot server to show any reaction:
ws://localhost:8448/app/message
ws://localhost:8448/greeting/app/message
ws://localhost:8448/topic
ws://localhost:8448/queue
(I got the initial request formed in Firefox, then clicked edit/resend to try again).
WebSocketConfig.java
@Configuration
@EnableWebSocketMessageBroker
public class WebSocketConfig extends AbstractWebSocketMessageBrokerConfigurer {
@Autowired
CostFileUploadWebSocketHandler costFileUploadWebSocketHandler;
public void registerWebSocketHandlers(WebSocketHandlerRegistry registry) {
registry.addHandler(new SocketTextHandler(), "/wst");
registry.addHandler(costFileUploadWebSocketHandler, "/wsb/softlayer-cost-file");
}
@Override
public void configureMessageBroker(MessageBrokerRegistry config) {
config.enableSimpleBroker("/topic/", "/queue/");
config.setApplicationDestinationPrefixes("/app");
}
@Override
public void registerStompEndpoints(StompEndpointRegistry registry) {
registry.addEndpoint("/greeting").setAllowedOrigins("*");
// .withSockJS();
}
}
CostFileUploadWebSocketHandler.java
@Component
public class CostFileUploadWebSocketHandler extends BinaryWebSocketHandler {
private final Logger logger = LoggerFactory.getLogger(this.getClass());
private SoftLayerJobService softLayerJobService;
private SoftLayerService softLayerService;
private AuthenticationFacade authenticationFacade;
@Autowired
CostFileUploadWebSocketHandler(SoftLayerJobService softLayerJobService, SoftLayerService softLayerService,
AuthenticationFacade authenticationFacade) {
this.softLayerJobService = softLayerJobService;
this.softLayerService = softLayerService;
this.authenticationFacade = authenticationFacade;
}
Map<WebSocketSession, FileUploadInFlight> sessionToFileMap = new WeakHashMap<>();
@Override
public boolean supportsPartialMessages() {
return true;
}
class WebSocketProgressReporter implements ProgressReporter {
private WebSocketSession session;
public WebSocketProgressReporter(WebSocketSession session) {
this.session = session;
}
@Override
public void reportCurrentProgress(BatchStatus currentBatchStatus, long currentPercentage) {
try {
session.sendMessage(new TextMessage("BatchStatus "+currentBatchStatus));
session.sendMessage(new TextMessage("Percentage Complete "+currentPercentage));
} catch(IOException e) {
throw new RuntimeException(e);
}
}
}
@Override
protected void handleBinaryMessage(WebSocketSession session, BinaryMessage message) throws Exception {
ByteBuffer payload = message.getPayload();
FileUploadInFlight inflightUpload = sessionToFileMap.get(session);
if (inflightUpload == null) {
throw new IllegalStateException("This is not expected");
}
inflightUpload.append(payload);
if (message.isLast()) {
File fileNameSaved = save(inflightUpload.name, "websocket", inflightUpload.bos.toByteArray());
BatchStatus currentBatchStatus = BatchStatus.UNKNOWN;
long percentageComplete;
ProgressReporter progressReporter = new WebSocketProgressReporter(session);
SoftLayerCostFileJobExecutionThread softLayerCostFileJobExecutionThread =
new SoftLayerCostFileJobExecutionThread(softLayerService, softLayerJobService, fileNameSaved,progressReporter);
logger.info("In main thread about to begin separate thread");
ForkJoinPool.commonPool().submit(softLayerCostFileJobExecutionThread);
while(!softLayerCostFileJobExecutionThread.jobDone());
// softLayerCostFileJobExecutionThread.run();
// Wait for above to complete somehow
// StepExecution foundStepExecution = jobExplorer.getJobExecution(
// jobExecutionThread.getJobExecutionResult().getJobExecution().getId()
// ).getStepExecutions().stream().filter(stepExecution->stepExecution.getStepName().equals("softlayerUploadFile")).findFirst().orElseGet(null);
// if (!"COMPLETED".equals(jobExecutionResult.getExitStatus())) {
// throw new UploadFileException(file.getOriginalFilename() + " exit status: " + jobExecutionResult.getExitStatus());
// }
logger.info("In main thread after separate thread submitted");
session.sendMessage(new TextMessage("UPLOAD "+inflightUpload.name));
session.close();
sessionToFileMap.remove(session);
logger.info("Uploaded "+inflightUpload.name);
}
String response = "Upload Chunk: size "+ payload.array().length;
logger.debug(response);
}
private File save(String fileName, String prefix, byte[] data) throws IOException {
Path basePath = Paths.get(".", "uploads", prefix, UUID.randomUUID().toString());
logger.info("Saving incoming cost file "+fileName+" to "+basePath);
Files.createDirectories(basePath);
FileChannel channel = new FileOutputStream(Paths.get(basePath.toString(), fileName).toFile(), false).getChannel();
channel.write(ByteBuffer.wrap(data));
channel.close();
return new File(basePath.getFileName().toString());
}
@Override
public void afterConnectionEstablished(WebSocketSession session) throws Exception {
sessionToFileMap.put(session, new FileUploadInFlight(session));
}
static class FileUploadInFlight {
private final Logger logger = LoggerFactory.getLogger(this.getClass());
String name;
String uniqueUploadId;
ByteArrayOutputStream bos = new ByteArrayOutputStream();
/**
* Fragile constructor - beware not prod ready
* @param session
*/
FileUploadInFlight(WebSocketSession session) {
String query = session.getUri().getQuery();
String uploadSessionIdBase64 = query.split("=")[1];
String uploadSessionId = new String(Base64Utils.decodeUrlSafe(uploadSessionIdBase64.getBytes()));
List<String> sessionIdentifiers = Splitter.on("\\").splitToList(uploadSessionId);
String uniqueUploadId = session.getRemoteAddress().toString()+sessionIdentifiers.get(0);
String fileName = sessionIdentifiers.get(1);
this.name = fileName;
this.uniqueUploadId = uniqueUploadId;
logger.info("Preparing upload for "+this.name+" uploadSessionId "+uploadSessionId);
}
public void append(ByteBuffer byteBuffer) throws IOException{
bos.write(byteBuffer.array());
}
}
}
Below is a snippet of Angular code where I make the call to the websocket. The service is intended to receive a file, then provide regular updates of percentage complete until the service is completed. Does this call need to be in a loop, or does the socket run until it's closed?
Angular Snippet of call to WebSocket:
this.softlayerService.uploadBlueReportFile(this.blueReportFile)
.subscribe(data => {
this.showLoaderBlueReport = false;
this.successBlueReport = true;
this.blueReportFileName = "No file selected";
this.responseBlueReport = 'File '.concat(data.fileName).concat(' ').concat('is ').concat(data.exitStatus);
this.blueReportSelected = false;
this.getCurrentUserFiles();
},
(error)=>{
if(error.status === 504){
this.showLoaderBlueReport = false;
this.stillProcessing = true;
}else{
this.showLoaderBlueReport = false;
this.displayUploadBlueReportsError(error, 'File upload failed');
}
});
}

my rocketMQ 2m-noslave can not consumer messages

I have built a RocketMQ service on my server as a 2m-noslave cluster. I can send messages to RocketMQ, but my consumer cannot receive them. Can somebody tell me where I went wrong? Thanks. This is my
Consumer class code:
public class Consumer{
public static final String CONSUMER_GROUP_NAME = "broker-b";
public static final String CLUSTER_ADDR = "120.27.128.207:9876;120.27.146.42:9876";
public static final String SUBSCRIBE = "dzg_topic_001";
private void consumerMessage() throws MQClientException {
DefaultMQPushConsumer consumer = new DefaultMQPushConsumer(CONSUMER_GROUP_NAME);
consumer.setNamesrvAddr(CLUSTER_ADDR);
consumer.setConsumeFromWhere(ConsumeFromWhere.CONSUME_FROM_FIRST_OFFSET);
consumer.setMessageModel(MessageModel.CLUSTERING);
// set the batch consumption size
consumer.subscribe(SUBSCRIBE, "*");
consumer.registerMessageListener((List<MessageExt> msgList, ConsumeConcurrentlyContext context)->{
MessageExt msg = msgList.get(0);
System.out.println( "received new message: topic===="+msg.getTopic()+" tag==="+msg.getTags()+" body=="+new String(msg.getBody()));
return ConsumeConcurrentlyStatus.CONSUME_SUCCESS;
});
consumer.start();
System.out.println("ConsumerStarted.");
}
public static void main(String[] args) {
try {
new Consumer().consumerMessage();
} catch (MQClientException e) {
e.printStackTrace();
}
}
}
My RocketMQ server setup is shown in a screenshot (image not included here).
When I set the properties autoCreateTopicEnable and autoCreateSubscriptionGroup to true, the consumer works correctly. Why does the consumer stop working when they are set to false?

Wicket 7 WebSocketBehavior

I am extending WebSocketBehavior in order to send logging data to a client. I have created the logging handler, and it fires as and when needed.
I am having trouble understanding how exactly to push the log entries to the clients and update the console panel. I already know that the onMessage method is what I need to override, with the console taking the WebSocketRequestHandler as an argument along with the message I want to send. How exactly do I get onMessage to fire properly? Here is the code I am using:
public class LogWebSocketBehavior extends WebSocketBehavior {
private static final long serialVersionUID = 1L;
Console console;
private Handler logHandler;
private Model model;
public LogWebSocketBehavior(Console console) {
super();
configureLogger();
this.console = console;
}
private void configureLogger() {
Logger l = Logger.getLogger(AppUtils.loggerName);
logHandler = getLoggerHandler();
l.addHandler(logHandler);
}
@Override
protected void onMessage(WebSocketRequestHandler handler, TextMessage message) {
console.info(handler, model.getObject());
}
private Handler getLoggerHandler() {
return new Handler() {
@Override
public void publish(LogRecord record) {
model.setObject(record);
}
@Override
public void flush() {
throw new UnsupportedOperationException("Not supported yet."); //To change body of generated methods, choose Tools | Templates.
}
@Override
public void close() throws SecurityException {
throw new UnsupportedOperationException("Not supported yet."); //To change body of generated methods, choose Tools | Templates.
}
};
}
private Collection<IWebSocketConnection> getConnectedClients() {
IWebSocketConnectionRegistry registry = new SimpleWebSocketConnectionRegistry();
return registry.getConnections(getApplication());
}
private void sendToAllConnectedClients(String message) {
Collection<IWebSocketConnection> wsConnections = getConnectedClients();
for (IWebSocketConnection wsConnection : wsConnections) {
if (wsConnection != null && wsConnection.isOpen()) {
try {
wsConnection.sendMessage("test");
} catch (IOException e) {
}
}
}
}
}
The logger works as I want it to, providing messages as needed, but I cannot figure out how to actually fire the onMessage method to update my console. Any help is appreciated...
#onMessage() is called by Wicket whenever the browser pushes a message via Wicket.WebSocket.send("some message").
It is not very clear but I guess you need to push messages from the server to the clients (the browsers). If this is the case then you need to get a handle to IWebSocketRequestHandler and use its #push(String) method. You can do this with WebSocketSettings.Holder.get(Application.get()).getConnectionRegistry().getConnection(...).push("message").
Here is the class working as I need. Thank you Martin!!
public class LogWebSocketBehavior extends WebSocketBehavior {
private static final long serialVersionUID = 1L;
Console console;
private Handler logHandler;
private IModel model;
public LogWebSocketBehavior(Console console, IModel model) {
super();
configureLogger();
this.console = console;
this.model = model;
}
private void configureLogger() {
Logger l = Logger.getLogger(AppUtils.loggerName);
logHandler = getLoggerHandler();
l.addHandler(logHandler);
}
@Override
protected void onPush(WebSocketRequestHandler handler, IWebSocketPushMessage message) {
super.onPush(handler, message);
console.info(handler, model);
}
private Handler getLoggerHandler() {
return new Handler() {
@Override
public void publish(LogRecord record) {
model.setObject(record);
sendToAllConnectedClients(record.toString());
}
@Override
public void flush() {
throw new UnsupportedOperationException("Not supported yet."); //To change body of generated methods, choose Tools | Templates.
}
@Override
public void close() throws SecurityException {
throw new UnsupportedOperationException("Not supported yet."); //To change body of generated methods, choose Tools | Templates.
}
};
}
private Collection<IWebSocketConnection> getConnectedClients() {
IWebSocketConnectionRegistry registry = new SimpleWebSocketConnectionRegistry();
return registry.getConnections(getApplication());
}
private void sendToAllConnectedClients(String message) {
IWebSocketConnectionRegistry registry = new SimpleWebSocketConnectionRegistry();
WebSocketPushBroadcaster b = new WebSocketPushBroadcaster(registry);
IWebSocketPushMessage msg = new Message();
b.broadcastAll(getApplication(), msg);
}
class Message implements IWebSocketPushMessage {
public Message(){
}
}
}

RxJava cache last item for future subscribers

I have implemented a simple RxEventBus which starts emitting events even if there are no subscribers. I want to cache the last emitted event, so that when the first/next subscriber subscribes, it receives only one (the last) item.
I created a test class which demonstrates my problem:
public class RxBus {
ApplicationsRxEventBus applicationsRxEventBus;
public RxBus() {
applicationsRxEventBus = new ApplicationsRxEventBus();
}
public static void main(String[] args) {
RxBus rxBus = new RxBus();
rxBus.start();
}
private void start() {
ExecutorService executorService = Executors.newScheduledThreadPool(2);
Runnable runnable0 = () -> {
while (true) {
long currentTime = System.currentTimeMillis();
System.out.println("emiting: " + currentTime);
applicationsRxEventBus.emit(new ApplicationsEvent(currentTime));
try {
Thread.sleep(500);
} catch (InterruptedException e) {
e.printStackTrace();
}
}
};
Runnable runnable1 = () -> applicationsRxEventBus
.getBus()
.subscribe(new Subscriber<ApplicationsEvent>() {
@Override
public void onCompleted() {
}
@Override
public void onError(Throwable throwable) {
}
@Override
public void onNext(ApplicationsEvent applicationsEvent) {
System.out.println("runnable 1: " + applicationsEvent.number);
}
});
Runnable runnable2 = () -> applicationsRxEventBus
.getBus()
.subscribe(new Subscriber<ApplicationsEvent>() {
@Override
public void onCompleted() {
}
@Override
public void onError(Throwable throwable) {
}
@Override
public void onNext(ApplicationsEvent applicationsEvent) {
System.out.println("runnable 2: " + applicationsEvent.number);
}
});
executorService.execute(runnable0);
try {
Thread.sleep(3000);
} catch (InterruptedException e) {
e.printStackTrace();
}
executorService.execute(runnable1);
try {
Thread.sleep(3000);
} catch (InterruptedException e) {
e.printStackTrace();
}
executorService.execute(runnable2);
}
private class ApplicationsRxEventBus {
private final Subject<ApplicationsEvent, ApplicationsEvent> mRxBus;
private final Observable<ApplicationsEvent> mBusObservable;
public ApplicationsRxEventBus() {
mRxBus = new SerializedSubject<>(BehaviorSubject.<ApplicationsEvent>create());
mBusObservable = mRxBus.cache();
}
public void emit(ApplicationsEvent event) {
mRxBus.onNext(event);
}
public Observable<ApplicationsEvent> getBus() {
return mBusObservable;
}
}
private class ApplicationsEvent {
long number;
public ApplicationsEvent(long number) {
this.number = number;
}
}
}
runnable0 emits events even when there are no subscribers. runnable1 subscribes after 3 seconds and receives the last item (which is fine). But runnable2 subscribes 3 seconds after runnable1 and receives all the items that runnable1 received. I need runnable2 to receive only the last item. I have tried caching events in RxBus:
private class ApplicationsRxEventBus {
private final Subject<ApplicationsEvent, ApplicationsEvent> mRxBus;
private final Observable<ApplicationsEvent> mBusObservable;
private ApplicationsEvent event;
public ApplicationsRxEventBus() {
mRxBus = new SerializedSubject<>(BehaviorSubject.<ApplicationsEvent>create());
mBusObservable = mRxBus;
}
public void emit(ApplicationsEvent event) {
this.event = event;
mRxBus.onNext(event);
}
public Observable<ApplicationsEvent> getBus() {
return mBusObservable.doOnSubscribe(() -> emit(event));
}
}
But the problem is that when runnable2 subscribes, runnable1 receives the event twice:
emiting: 1447183225122
runnable 1: 1447183225122
runnable 1: 1447183225122
runnable 2: 1447183225122
emiting: 1447183225627
runnable 1: 1447183225627
runnable 2: 1447183225627
I am sure there is an RxJava operator for this. How can I achieve it?
Your ApplicationsRxEventBus does extra work by re-emitting a stored event on every subscribe, in addition to all the cached events.
You only need a single BehaviorSubject + toSerialized, as it holds onto the very last event and re-emits it to subscribers by itself.
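For illustration, a minimal sketch of the bus with the cache() call dropped (same RxJava 1.x types as in the question):
private class ApplicationsRxEventBus {
    // BehaviorSubject keeps only the latest event and replays it to each new
    // subscriber; toSerialized() makes onNext() safe to call from multiple threads.
    private final Subject<ApplicationsEvent, ApplicationsEvent> mRxBus =
            BehaviorSubject.<ApplicationsEvent>create().toSerialized();

    public void emit(ApplicationsEvent event) {
        mRxBus.onNext(event);
    }

    public Observable<ApplicationsEvent> getBus() {
        return mRxBus;
    }
}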
You are using the wrong interface. When you subscribe to a cold Observable you get all of its events. You need to turn it into a hot Observable first. This is done by creating a ConnectableObservable from your Observable using its publish method. Your observers then call connect to start receiving events.
You can also read more about this in the Hot and Cold Observables section of the tutorial.
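If you prefer the hot-observable route, a sketch under the same assumptions (mRxBus as in the question): publish() yields a ConnectableObservable that emits regardless of subscribers, and replay(1) additionally hands each late subscriber only the last item:
// Hot stream: late subscribers only see events emitted after they subscribe.
ConnectableObservable<ApplicationsEvent> hot = mRxBus.publish();
hot.connect(); // upstream starts emitting now

// Hot stream that also replays exactly the last item to each late subscriber.
ConnectableObservable<ApplicationsEvent> lastOnly = mRxBus.replay(1);
lastOnly.connect();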

storm processing data extremely slow

We have 1 spout and 1 bolt on a single node. The spout reads data from RabbitMQ and emits it to the only bolt, which writes the data to Cassandra.
Our data source generates 10,000 messages per second, and Storm takes around 10 seconds to process them, which is too slow for us.
We tried increasing the parallelism of the topology, but that doesn't make any difference.
What is the ideal number of messages that can be processed on a single-node machine with 1 spout and 1 bolt? And what are the possible ways to increase the processing speed of a Storm topology?
Update:
This is sample code; it doesn't include the RabbitMQ and Cassandra code, but it exhibits the same performance issue.
// Topology Class
public class SimpleTopology {
public static void main(String[] args) throws InterruptedException {
System.out.println("hiiiiiiiiiii");
TopologyBuilder topologyBuilder = new TopologyBuilder();
topologyBuilder.setSpout("SimpleSpout", new SimpleSpout());
topologyBuilder.setBolt("SimpleBolt", new SimpleBolt(), 2).setNumTasks(4).shuffleGrouping("SimpleSpout");
Config config = new Config();
config.setDebug(true);
config.setNumWorkers(2);
LocalCluster localCluster = new LocalCluster();
localCluster.submitTopology("SimpleTopology", config, topologyBuilder.createTopology());
Thread.sleep(2000);
}
}
// Simple Bolt
public class SimpleBolt implements IRichBolt{
private OutputCollector outputCollector;
public void prepare(Map map, TopologyContext tc, OutputCollector oc) {
this.outputCollector = oc;
}
public void execute(Tuple tuple) {
this.outputCollector.ack(tuple);
}
public void cleanup() {
// TODO
}
public void declareOutputFields(OutputFieldsDeclarer ofd) {
// TODO
}
public Map<String, Object> getComponentConfiguration() {
return null;
}
}
// Simple Spout
public class SimpleSpout implements IRichSpout{
private SpoutOutputCollector spoutOutputCollector;
private boolean completed = false;
private static int i = 0;
public void open(Map map, TopologyContext tc, SpoutOutputCollector soc) {
this.spoutOutputCollector = soc;
}
public void close() {
// Todo
}
public void activate() {
// Todo
}
public void deactivate() {
// Todo
}
public void nextTuple() {
if(!completed)
{
if(i < 100000)
{
String item = "Tag" + Integer.toString(i++);
System.out.println(item);
this.spoutOutputCollector.emit(new Values(item), item);
}
else
{
completed = true;
}
}
else
{
try {
Thread.sleep(2000);
} catch (InterruptedException ex) {
Logger.getLogger(SimpleSpout.class.getName()).log(Level.SEVERE, null, ex);
}
}
}
public void ack(Object o) {
System.out.println("\n\n OK : " + o);
}
public void fail(Object o) {
System.out.println("\n\n Fail : " + o);
}
public void declareOutputFields(OutputFieldsDeclarer ofd) {
ofd.declare(new Fields("word"));
}
public Map<String, Object> getComponentConfiguration() {
return null;
}
}
Update:
Is it possible that with shuffle grouping the same tuple will be processed more than once? Configuration used: spouts = 4, bolts = 4. The problem now is that performance decreases as the number of bolts increases.
You should find out what the bottleneck is here -- RabbitMQ or Cassandra. Open the Storm UI and take a look at the latency times for each component.
If increasing parallelism didn't help (it normally should), there's definitely a problem with RabbitMQ or Cassandra, so you should focus on them.
In your code you only emit one tuple per call to nextTuple(). Try emitting more tuples per call, something like:
public void nextTuple() {
int max = 1000;
int count = 0;
GetResponse response = channel.basicGet(queueName, autoAck);
while ((response != null) && (count < max)) {
String item = new String(response.getBody()); // process the message body
spoutOutputCollector.emit(new Values(item), item);
count++;
response = channel.basicGet(queueName, autoAck);
}
try { Thread.sleep(2000); } catch (InterruptedException ex) {
}
}
We are successfully using RabbitMQ and Storm. The result gets stored in a different DB, but anyway: we first used basic_get in the spout and had terrible performance, but then we switched to basic_consume, and performance is actually very good. So take a look at how you are consuming messages from Rabbit.
Some important factors:
basic_consume instead of basic_get
prefetch_count (make it high enough)
If you want to increase performance and you don't care about losing messages, do not ack messages and set delivery_mode to 1.
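As an illustration of the basic_consume approach, here is a hedged sketch (the queueName field, the buffer, and the prefetch value are assumptions, not code from this answer; it uses com.rabbitmq.client.* and java.util.concurrent.LinkedBlockingQueue): the consumer callback buffers deliveries, and nextTuple() only drains the buffer.
// Consume with basicConsume + prefetch instead of polling basicGet.
private final LinkedBlockingQueue<String> buffer = new LinkedBlockingQueue<>();

public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
    this.spoutOutputCollector = collector;
    try {
        ConnectionFactory factory = new ConnectionFactory(); // host/credentials assumed
        Channel channel = factory.newConnection().createChannel();
        channel.basicQos(1000); // prefetch_count: deliveries Rabbit pushes ahead of acks
        channel.basicConsume(queueName, false, new DefaultConsumer(channel) {
            @Override
            public void handleDelivery(String consumerTag, Envelope envelope,
                    AMQP.BasicProperties properties, byte[] body) throws IOException {
                buffer.offer(new String(body)); // hand off to nextTuple()
                getChannel().basicAck(envelope.getDeliveryTag(), false);
            }
        });
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}

public void nextTuple() {
    String item = buffer.poll();
    if (item != null) {
        spoutOutputCollector.emit(new Values(item), item);
    } else {
        Utils.sleep(50); // avoid busy-spinning while the buffer is empty
    }
}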
