I recently installed the distributed real-time computation system "Storm".
With the Storm UI enabled, anyone can access it through http://xxxxx:8080 and kill my topology.
How do I enable authentication?
You might want to have a look at Apache Storm Security in general, and particularly the section UI/LogViewer.
Also, have a look at Storm's defaults.yaml configuration file. It contains entries related to securing an Apache Storm topology.
And please check which release of Apache Storm you are using.
I've seen Apache NiFi compared to similar ETL tools like Apache Flume, Airflow and Kafka. These are ETL tools rather than ESBs or request mediators.
ESBs/request mediators can be used to orchestrate web services and expose a single service (a proxy service) which is expected to serve concurrent HTTP requests efficiently.
My question is: can I use Apache NiFi for the same purpose, that is, to provide service orchestration and serve proxy service endpoints using NiFi processors such as HandleHttpRequest? Is it designed to handle real-time concurrent requests efficiently?
You brought up a few technologies that are quite different.
Apache NiFi is a dataflow management tool. Unlike Kafka Streams, Airflow or Apache Flume, it does not require you to write your own code. You can do almost anything you need using the existing processors developed by Apache.
Airflow, on the other hand, is a workflow management tool and is better compared with Oozie.
NiFi is built for real-time performance, but not for serving as a REST API. As you said, though, it can start a flow based on an HTTP request.
Hope it helps
I have a streaming use case: a Spring Boot application that should read data from a Kafka topic and write it to an HDFS path. Kafka and Hadoop run on two distinct clusters.
The application worked fine when the Kafka cluster had no Kerberos authentication and only Hadoop was kerberized.
The issues started when both clusters were kerberized: I could authenticate to only one cluster at a time.
I did some analysis and googling, but could not find much help.
My theory is that we cannot log in to two kerberized clusters from the same JVM instance, because the REALM and KDC details have to be set in code and they are JVM-specific rather than client-specific.
It may be that I did not use the proper APIs; I am very new to Spring Boot.
I know we can do this by setting up cross-realm trust between the clusters, but I am looking for an application-level solution if possible.
I have a few questions:
Is it possible to authenticate to two separate kerberized clusters from the same JVM instance? If so, please help me; use of Spring Boot is preferred.
What would be the best solution to stream data from the Kafka cluster to the Hadoop cluster?
Kafka's Connect API is for streaming integration of sources and targets with Kafka, using just configuration files - no coding! The HDFS connector is what you want, and supports Kerberos authentication. It is open source and available standalone or as part of Confluent Platform.
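As a rough illustration of what "just configuration" could look like, here is a minimal Python sketch that builds the JSON body you would POST to a Kafka Connect worker's /connectors endpoint for the Confluent HDFS sink connector. The connector name, topic names, host names, principals and keytab path are all placeholders invented for this example, and the property names should be checked against the documentation for the connector version you install.

```python
import json


def build_hdfs_sink_request(name, topics, hdfs_url, principal, keytab,
                            namenode_principal):
    """Build the JSON body for POST /connectors on a Kafka Connect worker.

    All argument values are supplied by the caller; nothing here talks to
    Kafka or HDFS, it only assembles the configuration.
    """
    config = {
        "connector.class": "io.confluent.connect.hdfs.HdfsSinkConnector",
        "tasks.max": "1",
        "topics": ",".join(topics),
        "hdfs.url": hdfs_url,
        # Number of records written before a file is committed to HDFS.
        "flush.size": "1000",
        # Kerberos settings for the Hadoop side of the pipeline.
        "hdfs.authentication.kerberos": "true",
        "connect.hdfs.principal": principal,
        "connect.hdfs.keytab": keytab,
        "hdfs.namenode.principal": namenode_principal,
    }
    return {"name": name, "config": config}


if __name__ == "__main__":
    # Placeholder values (assumptions); replace with your own cluster details.
    request = build_hdfs_sink_request(
        name="hdfs-sink",
        topics=["events"],
        hdfs_url="hdfs://namenode.example.com:8020",
        principal="connect@EXAMPLE.COM",
        keytab="/etc/security/keytabs/connect.keytab",
        namenode_principal="nn/_HOST@EXAMPLE.COM",
    )
    print(json.dumps(request, indent=2))
```

The Kerberos settings above cover only the HDFS side. Authentication against the kerberized Kafka cluster is normally configured on the Connect worker itself, not in the connector.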
I am trying to understand the difference between Apache NiFi and Hortonworks Data Flow (HDF).
How do they differ from each other in terms of capability and overall design? What are possible use cases for NiFi and HDF?
Hortonworks Data Flow (HDF) is a platform for data collection, curation, analysis, and delivery. It is made up of Apache NiFi, Apache Kafka, Apache Storm, and Apache Ranger. You can read more about it here: https://hortonworks.com/products/data-center/hdf/
Apache NiFi is an open-source data flow tool, and is one of the tools included in HDF.
I'm having difficulty understanding the ZooKeeper Hadoop framework. The main aspects of ZooKeeper I find confusing are how it handles consistency across its nodes, and how it uses its distributed in-memory file system to handle coordination. Any help with these points would be great.
As @Yann said, ZooKeeper is not related to Hadoop. ZooKeeper is a coordination engine for services. With it, you can build distributed configuration, service discovery, and leader election, among other features.
I suggest reading this post to see how it works for load balancing, and this other one for service discovery. Another nice resource is Apache Curator, a framework that makes it easy to use ZooKeeper from Java or any other JVM language.
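To make the leader-election idea a bit more concrete, here is a toy, in-memory Python sketch of the usual recipe: every candidate creates a sequentially numbered entry, the lowest number is the leader, and when a candidate's entry disappears the next lowest takes over. This is only a model of the idea. It does not talk to a real ZooKeeper ensemble, and the class and method names are made up for illustration.

```python
class ToyEnsemble:
    """In-memory stand-in for a ZooKeeper election path (illustration only)."""

    def __init__(self):
        self._next_seq = 0
        self._candidates = {}  # sequence number -> client name

    def join(self, client):
        """Register a candidate and return its sequence number.

        Mirrors creating an ephemeral sequential node: each new entry
        gets the next, strictly increasing number.
        """
        seq = self._next_seq
        self._next_seq += 1
        self._candidates[seq] = client
        return seq

    def leave(self, seq):
        """Remove a candidate, as happens when its session ends."""
        self._candidates.pop(seq, None)

    def leader(self):
        """The candidate holding the lowest sequence number leads."""
        if not self._candidates:
            return None
        return self._candidates[min(self._candidates)]


if __name__ == "__main__":
    ensemble = ToyEnsemble()
    a = ensemble.join("service-a")
    ensemble.join("service-b")
    print(ensemble.leader())  # service-a
    ensemble.leave(a)         # service-a's session ends
    print(ensemble.leader())  # service-b
```

In a real deployment a client library such as Apache Curator provides this recipe, so you would not normally write it yourself.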
I am working with Spring XD and GemFire XD. I want to understand how Spring XD's distributed environment works. I know Spring XD uses either Redis or RabbitMQ as the transport.
That part is clear to me: I have installed Spring XD and RabbitMQ on one machine, and I changed the redis.properties file to add hostnames.
Do I need to install Spring XD on all the machines? If so, how do I bring them up after installing?
On the master machine, I will run ./xd-admin and ./xd-container.
How do you start up the nodes (Spring XD instances/workers) so that they can listen for instructions from xd-admin?
Please help me on this.
Thanks,
-Suyodhan
Redis is the only supported platform for analytics. For transport, you need either Redis or Rabbit.
Basically, you just need to install Redis and RabbitMQ per their respective documentation. They can be on the same or different servers; ideally you would use their high-availability options, for example Redis Sentinel. You don't need RabbitMQ unless you want to change the default transport from Redis to Rabbit. Once Redis and Rabbit are installed, bring them up, provide their host:port info (and any additional settings as applicable) in servers.yml in the XD install on all nodes, and bring up the admin and containers. Everything should then work automatically, with ZooKeeper managing the distributed runtime.
If you use Spring XD in distributed mode, I assume you have set up ZooKeeper as well. (If not, check this: http://docs.spring.io/spring-xd/docs/1.0.0.M7/reference/html/#_setting_up_zookeeper )
Admin and Container instances register themselves with ZooKeeper as they come up. Admin queries ZooKeeper for available containers and assigns tasks such as deploying modules. ZooKeeper is the trick behind distributed mode.
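The register-then-assign flow described above can be pictured with a small in-memory Python model. It is a toy sketch, not Spring XD's actual implementation. The registry class, the round-robin assignment and the container and module names are all assumptions made for illustration; the only ZooKeeper behaviour borrowed is that an ephemeral node is removed when its owner's session ends.

```python
import itertools


class ToyRegistry:
    """In-memory stand-in for ZooKeeper: tracks which containers are alive."""

    def __init__(self):
        self._containers = set()

    def register(self, container_id):
        """A container announces itself when it comes up."""
        self._containers.add(container_id)

    def unregister(self, container_id):
        """Models an ephemeral node vanishing when the session ends."""
        self._containers.discard(container_id)

    def live_containers(self):
        return sorted(self._containers)


def assign_modules(registry, modules):
    """Admin role: spread modules over live containers, round-robin."""
    containers = registry.live_containers()
    if not containers:
        raise RuntimeError("no containers registered")
    targets = itertools.cycle(containers)
    return {module: next(targets) for module in modules}


if __name__ == "__main__":
    registry = ToyRegistry()
    registry.register("container-a")
    registry.register("container-b")
    print(assign_modules(registry, ["http", "transform", "hdfs"]))
    # {'http': 'container-a', 'transform': 'container-b', 'hdfs': 'container-a'}
```

The point of the model is that the admin never needs a static list of workers: whatever has registered is what it deploys to.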
Hope this helps.
You install Spring XD once, on one machine; Spring XD will then connect to your distributed, scaled-out HDFS environment.
You need to start the following:
1. Redis or RabbitMQ, in your case
2. hsqldb server
3. container
4. admin
When you start Spring XD, you first need to register the name node using the command:
hadoop config fs --namenode hdfs://serverip:8020
Then you can use any module defined in Spring XD (stream or batch) by specifying its parameters directly, without specifying them in the servers.yml file.
Moha.