I am new to Apache Storm and would like to know the key differences between Storm 1.1 and Storm 2.0 - apache-storm

I have been trying to find any major differences between Storm 1.1 and Storm 2.0.
Is there any difference in how a cluster is set up for either version?
(I read on the official website about the new Java-based implementation, but has anyone seen any practical difference between these two versions?)

In addition to reading the changelog at https://www.apache.org/dist/storm/apache-storm-2.0.0/RELEASE_NOTES.html, you can look at https://issues.apache.org/jira/browse/STORM-2306?focusedCommentId=16291947&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16291947 for some performance numbers. You can also run your own benchmarks of course.

Related

What is the best alternative to Thrift API with Cassandra 3.X or AWS managed Cassandra?

The Thrift API no longer seems to be supported in newer versions of Cassandra or in AWS managed Cassandra.
Could someone please let me know what the best alternative would be?
Our application is built on the Spring framework and is currently tightly coupled to Thrift-based data models, so I am trying to understand whether there is a similar API design we could use. Our plan is to migrate the application to AWS and use managed Cassandra there.
You should absolutely be using CQL. Modern versions of Spring [Boot|Data] use and work very well with the CQL native binary protocol.
Here's a repo we built with Spring Boot that uses Spring Data underneath for access to DataStax Astra DB (managed Serverless Cassandra-as-a-Service). It should be a decent guide for you to see how the repositories and data objects are annotated, built and used.
https://github.com/datastaxdevs/workshop-ecommerce-app
To add to Aaron's response, enterprises definitely needed to get off Thrift years ago. The drivers based on the Thrift API have not been maintained for 5-7 years or more, so it is a massive risk to your organisation to still be using them. For example, Nate McCall (who was the Cassandra project Chair until this year) retired the Hector client in 2015 in favour of CQL. Netflix did the same for Astyanax in 2016.
For a bit of background, CQL was introduced in Apache Cassandra 0.8 all the way back in 2011 (CASSANDRA-1703) as a replacement for Thrift. It quickly evolved to CQL2 in Cassandra 1.0.
Cassandra 1.2 added CQL3 in 2012 (CASSANDRA-3761), and support for CQL2 was dropped in favour of CQL3 in C* 2.0 in 2013 (CASSANDRA-5585). C* 2.2 stopped using Thrift in 2015 (CASSANDRA-8358, CASSANDRA-9319).
Five years after CQL was first introduced, Thrift was removed completely from the Cassandra 4.0 codebase in 2016 (CASSANDRA-11115). This should convince any enterprise to migrate to CQL. There hasn't been support for the Thrift API for at least 6 years, and this alone should motivate organisations to get off it.
CQL has been around for 10 years now so you shouldn't have any concerns with its maturity. Cheers!
Here is an example of using Amazon Keyspaces with Spring Boot. It uses CQL and the latest drivers. Although Spring is supposed to be an abstraction, you will most likely need to refactor your code.
https://github.com/aws-samples/amazon-keyspaces-examples/tree/main/java/datastax-v4/spring

Versions for integration of apache flink, elasticsearch and kafka

I am having trouble choosing compatible versions of Flink, Kafka and Elasticsearch. I'm using Flink 1.8.1, but I don't know which version of Kafka to use. I also want to use version 6 of Elasticsearch. Which versions do you think are suitable for Flink, Kafka and Elasticsearch?
The linked page lists a version of Kafka, but the comments section describes it as beta.
As listed in the table, Kafka 0.11 (and higher) will work fine. The beta label applies to the Flink connector, not to Kafka itself.
Plus, Kafka Connect for Elasticsearch, should you choose to use it, works with Elasticsearch 6.
As @cricket_007 said, it's safe to use the Kafka connector even though it is labeled beta (a label that should be removed, as this connector has now been battle-tested in production for over a year).
The setup Kafka -> Flink -> ES6 is quite common, so you can and should use recent versions of all the components involved.

Guava version conflict with HBase 1 and ES 2

I have a project that uses both HBase 1.0.0 (Cloudera version) and Elasticsearch. Since upgrading to ES 2.0 I have been running into a problem with the Guava version: ES 2.0 requires Guava 18.0, but Cloudera requires Guava 14.0.1.
No matter which version I define in the dependency management of my parent pom, one of the two won't work.
Looking around, I see that this problem has been around for quite some time (e.g. http://gbif.blogspot.co.at/2014/11/upgrading-our-cluster-from-cdh4-to-cdh5.html)
1) Any ideas on how to solve this problem without any complex re-design of my application?
If not, I'm thinking of moving all the ES-related work into a separate application and communicating with it via messaging (I am already using AMQ) for indexing. I'm not sure, though, how to communicate for search/filter requests (at the moment these are implemented via the Java API).
2) Any other ideas?
3) Any ideas/hints on how to solve the communication issue?
While googling this problem together with the maven-shade plugin, I found the following blog post, so shading might be another option.
https://www.elastic.co/blog/to-shade-or-not-to-shade
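Before choosing between shading and splitting the application, it may help to confirm which Guava jar actually wins on the classpath at runtime. The following is only a minimal diagnostic sketch using standard JDK calls; the class name `ClasspathProbe` and the helper `locate` are hypothetical names made up for this example, and `com.google.common.collect.ImmutableList` is used purely as a probe class that should exist in any Guava version.

```java
import java.security.CodeSource;

class ClasspathProbe {

    /**
     * Returns the location (typically a jar URL) a class was loaded from.
     * Returns "<bootstrap>" when the JVM reports no code source (core JDK classes)
     * and "<not found>" when the class is not on the classpath at all.
     */
    public static String locate(String className) {
        try {
            // Load without initializing, so no static initializers of the probed class run.
            Class<?> cls = Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            if (source == null || source.getLocation() == null) {
                return "<bootstrap>";
            }
            return source.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return "<not found>";
        }
    }

    public static void main(String[] args) {
        // Probe class is an assumption; any class from the conflicting library would do.
        System.out.println("Guava loaded from: "
                + locate("com.google.common.collect.ImmutableList"));
    }
}
```

Running this inside the application (or calling `locate` from a test) shows whether the 14.0.1 or the 18.0 jar was picked up, which makes it easier to judge whether relocating Guava with the shade plugin would resolve the clash.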

Apache Storm Installation without ZeroMQ/JZMQ

I am trying to set up a multi-cluster Storm system. I have found several third-party step-by-step guides on this. They all list Java, Python, ZeroMQ 2.1.7 and JZMQ as requirements for the Nimbus and Supervisor/Slave nodes. But on the official Apache Storm website, the only requirements for the Nimbus and Supervisor nodes are Java 6 and Python 2.6.6 (https://storm.apache.org/documentation/Setting-up-a-Storm-cluster.html).
Does anyone know if ZeroMQ and JZMQ are required for Storm cluster configuration? And is there an advantage to having these two pieces of software installed?
From Storm 0.9.0 onwards, 0MQ should no longer be needed: you can use Netty instead, but it needs to be configured. Please see http://storm.apache.org/2013/12/08/storm090-released.html for a quick configuration example.
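As a rough illustration of what "needs to be configured" means, the sketch below prints a `storm.yaml` fragment that switches the transport to Netty. The keys and values are reproduced from memory of the 0.9.0 release post linked above and should be treated as assumptions to verify there; the class `NettyTransportSettings` and its methods are hypothetical helpers for this example and are not part of Storm. Note that later Storm releases moved the package from `backtype.storm` to `org.apache.storm`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class NettyTransportSettings {

    /** Netty transport settings as recalled from the Storm 0.9.0 announcement (verify before use). */
    public static Map<String, Object> settings() {
        Map<String, Object> conf = new LinkedHashMap<>();
        conf.put("storm.messaging.transport", "backtype.storm.messaging.netty.Context");
        conf.put("storm.messaging.netty.server_worker_threads", 1);
        conf.put("storm.messaging.netty.client_worker_threads", 1);
        conf.put("storm.messaging.netty.buffer_size", 5242880);
        conf.put("storm.messaging.netty.max_retries", 100);
        conf.put("storm.messaging.netty.max_wait_ms", 1000);
        conf.put("storm.messaging.netty.min_wait_ms", 100);
        return conf;
    }

    /** Renders the map as flat "key: value" lines, quoting string values. */
    public static String toYaml(Map<String, Object> conf) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Object> entry : conf.entrySet()) {
            Object value = entry.getValue();
            String rendered = (value instanceof String) ? "\"" + value + "\"" : String.valueOf(value);
            sb.append(entry.getKey()).append(": ").append(rendered).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Paste the printed lines into storm.yaml on the Nimbus and Supervisor nodes.
        System.out.print(toYaml(settings()));
    }
}
```

With these lines in `storm.yaml`, the nodes would not need ZeroMQ or JZMQ installed, which matches the shorter requirements list on the official site.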

Elasticsearch / Storm integration methods

I am looking for a simple integration path between Elasticsearch and Apache Storm. Support for this is included in the elasticsearch-hadoop library, but that pulls in tons of dependencies from the Hadoop stack, from Hive to Cascading, that I simply don't need. Has anyone out there succeeded in this integration without bringing in elasticsearch-hadoop? Thanks.
In my project we're using the RabbitMQ river to index the Storm output. It's a very efficient and convenient way to write to Elasticsearch. You basically put the messages on the queue and the river does the rest. If something gets stuck, the data is simply buffered on the queue.
So I would say: use this river approach for writing and the Elasticsearch Java API for reading, as Kit Menke suggests (or the Jest client, which we've found to be cool and which offers an async API based on Apache HttpAsyncClient, though we're not reading from Elasticsearch in the Storm topology but in different services).

Resources