How to design templates for clustered NiFi - apache-nifi

Do we need to think about the underlying cluster while designing NiFi templates?
Here is my simple flow:
[READ FROM KAFKA] --> [MERGE FILES] --> [PUT HDFS]
I have a 3-node cluster. When the system is running I check the "cluster" menu and see that only the master node is utilizing resources; the other cluster nodes seem idle. The question is: in such a cluster, should I design the template according to the cluster, or should NiFi do the load balancing?
I saw that one of my colleagues created remote process groups for each node in the cluster and put a load balancer in front of them within the template. Is that required? (Like below.)
[READ FROM KAFKA] --> [LOAD BALANCER] --> [REMOTE PROCESS GROUPS FOR NODE 1/2/3] --> [input port (RPG)] --> [MERGE FILES] --> [PUT HDFS]
And what is the use case for the load balancer apart from remote clusters? Can I use a load balancer to split traffic across several processors to speed up the operation?

Apache NiFi does not do any automatic load balancing or moving of data, so it is up to you to design the data flow in a way that utilizes your cluster. How to do this will depend on the data flow and how the data is being brought into the cluster.
I wrote this article once to try and summarize the approaches:
https://community.hortonworks.com/articles/16120/how-do-i-distribute-data-across-a-nifi-cluster.html
In your case with Kafka, you should be able to have the flow run as shown in your first picture (without remote process groups). This is because Kafka is a data source that allows each node to consume different data.
If ConsumeKafka appears to be running on only one node, there could be a couple of reasons for this...
First, make sure ConsumeKafka is not scheduled for primary node only.
Second, figure out how many partitions you have for your Kafka topic. The Kafka client (used by NiFi) assigns 1 consumer to 1 partition, so if you have only 1 partition then you can only ever have 1 NiFi node consuming from it (a small partition-count check is sketched after the link below). Here is an article that further describes this behavior:
http://bryanbende.com/development/2016/09/15/apache-nifi-and-apache-kafka
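For the partition check itself, here is a minimal sketch using the kafka-python package; the broker address and topic name are assumptions, not taken from the question:

from kafka import KafkaConsumer

# Ask the cluster for the topic's partition metadata.
consumer = KafkaConsumer(bootstrap_servers="localhost:9092")
partitions = consumer.partitions_for_topic("my-topic")  # set of partition ids, or None if the topic is unknown
print(f"my-topic has {len(partitions or [])} partition(s)")
consumer.close()

If this reports only one partition, only one NiFi node can ever consume from the topic, no matter how many nodes the cluster has.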

Related

How can all tables of a database be sent to Elasticsearch?

Here's my situation:
food_database is in MySQL.
There are 130 tables in food_database.
I would like to send all 130 tables to Elasticsearch via logstash_jdbc.
-> How can all tables of the database be sent to Elasticsearch?
My conf file (attempt):
input {
  jdbc {
    clean_run => true
    jdbc_driver_library => "C:\ElasticSearch\mysql-connector-java-8.0.23\mysql-connector-java-8.0.23.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/food_database?useSSL=false&user=root&password=1234"
    jdbc_user => "root"
    jdbc_password => "1234"
    schedule => "* * * * *"
    statement => "select * from ??????"
    #use_column_value => true
    #tracking_column => "jobid"
  }
}
output {
  elasticsearch {
    hosts => "localhost:9200"
    index => "test_indexfile"
  }
  stdout {
    codec => rubydebug
  }
}
But I don't know how to send all 130 tables in food_database.
I found a similar question through googling, but I couldn't solve it.
-> save whole database to elasticsearch using logstash
-> https://dzone.com/articles/migrating-mysql-data-to-elasticsearch-using-logsta
Please help me.
Update (tables in food_database):
+--------------------------------------+
| Tables_in_food_database |
+--------------------------------------+
| access_token |
| activity |
| address |
| answer_abuse_reason |
| answer_report_abuse |
| attribute |
| attribute_group |
| banner |
| banner_group |
| banner_image |
| banner_image_description |
| blog |
| blog_related |
| category |
| category_commission |
| category_description |
| category_path |
| contact |
| country |
| coupon |
| coupon_product_category |
| coupon_usage |
| coupon_usage_product |
| currency |
| customer |
| customer_activity |
| customer_cart |
| customer_document |
| customer_group |
| customer_ip |
| customer_transaction |
| customer_wishlist |
| delivery_allocation |
| delivery_location |
| delivery_location_to_location |
| delivery_person |
| delivery_person_to_location |
| delivery_status |
| email_template |
| geo_zone |
| jobs |
| language |
| login_log |
| manufacturer |
| migrations |
| order |
| order_cancel_reason |
| order_history |
| order_log |
| order_product |
| order_product_log |
| order_status |
| order_total |
| page |
| page_group |
| payment |
| payment_archive |
| payment_items |
| payment_items_archive |
| paypal_order |
| paypal_order_transaction |
| permission_module |
| permission_module_group |
| plugins |
| price_update_file_log |
| product |
| product_answer |
| product_answer_like_dislike |
| product_attribute |
| product_description |
| product_discount |
| product_image |
| product_price_log |
| product_question |
| product_rating |
| product_related |
| product_special |
| product_stock_alert |
| product_tag |
| product_tire_price |
| product_to_category |
| product_varient |
| product_varient_option |
| product_varient_option_details |
| product_varient_option_image |
| product_view_log |
| quotation |
| razorpay_order |
| razorpay_order_transaction |
| service |
| service_category |
| service_category_path |
| service_enquiry |
| service_image |
| service_to_category |
| sessions |
| settings |
| settlement |
| settlement_item |
| site_filter |
| site_filter_category |
| site_filter_section |
| site_filter_section_item |
| sku |
| stock_log |
| stock_status |
| stripe_order |
| stripe_order_transaction |
| tax |
| trend |
| trend_image |
| trend_recommend |
| user_group |
| users |
| varients |
| varients_value |
| vendor |
| vendor_category |
| vendor_coupon |
| vendor_coupon_product_category |
| vendor_global_setting |
| vendor_invoice |
| vendor_invoice_item |
| vendor_order_archive |
| vendor_order_archive_log |
| vendor_order_products |
| vendor_order_status |
| vendor_orders |
| vendor_orders_log |
| vendor_payment |
| vendor_payment_archive |
| vendor_product |
| widget |
| widget_item |
| zone |
| zone_to_geo_zone |
+--------------------------------------+
136 rows in set (0.00 sec)
My goal is to send the values of all 136 tables to Elasticsearch via Logstash.
If running a script next to Logstash is an option, I would go for the following approach:
- Create a bash script (or whatever language you prefer), run it from cron to do a simple 'SHOW TABLES', and use the output to create 130 config files containing only the INPUT part for Logstash, with a naming convention like 'INPUT_tablename.conf'. The script should create the config as shown above for each table that exists.
- Make sure it lists the INPUT_* files in the directory and deletes the ones whose tables no longer exist.
- Make sure that when a file already exists the script does not touch it.
- Keep your FILTER.conf and OUTPUT.conf in the same directory.
- Put your Logstash in auto-reload config mode.
By doing it this way you separate out the part you are struggling with, and you allow the database to change over time: tables can be added, and old ones can be deleted or renamed. A sketch of such a generator script is shown below.
I've learned to do it this way on clusters that I know will become very large, where I need to see when maximum I/O is being hit so that I know when to add new nodes to which layer without killing the complete setup.
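As a concrete illustration of the generator step, here is a rough sketch in Python rather than bash; the PyMySQL package, the /etc/logstash/conf.d path, and the credentials reused from the question are assumptions to adjust for your environment:

#!/usr/bin/env python3
# Hypothetical generator sketch: run it from cron; it creates one INPUT_<table>.conf
# per table, removes configs for dropped tables, and never touches existing files.
import os
import pymysql

CONF_DIR = "/etc/logstash/conf.d"  # assumption: the directory Logstash loads configs from

TEMPLATE = """input {{
  jdbc {{
    jdbc_driver_library => "C:\\ElasticSearch\\mysql-connector-java-8.0.23\\mysql-connector-java-8.0.23.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/food_database?useSSL=false"
    jdbc_user => "root"
    jdbc_password => "1234"
    schedule => "* * * * *"
    statement => "SELECT * FROM `{table}`"
  }}
}}
"""

conn = pymysql.connect(host="localhost", user="root", password="1234", database="food_database")
with conn.cursor() as cur:
    cur.execute("SHOW TABLES")
    tables = {row[0] for row in cur.fetchall()}
conn.close()

wanted = {f"INPUT_{t}.conf" for t in tables}
existing = {f for f in os.listdir(CONF_DIR) if f.startswith("INPUT_") and f.endswith(".conf")}

# Delete configs whose tables no longer exist.
for stale in existing - wanted:
    os.remove(os.path.join(CONF_DIR, stale))

# Create configs only for new tables; files that already exist are left untouched.
for name in wanted - existing:
    table = name[len("INPUT_"):-len(".conf")]
    with open(os.path.join(CONF_DIR, name), "w") as fh:
        fh.write(TEMPLATE.format(table=table))

FILTER.conf and OUTPUT.conf stay shared in the same directory, and with config.reload.automatic enabled Logstash picks up new or removed input files on its own.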

Spring-Google OAuth2 for stand-alone applications

I have a JavaFX client application that can log in to the service using Google OpenID.
1. The client user clicks on Google Sign In.
2. The following URL is opened in the default browser:
https://accounts.google.com/o/oauth2/v2/auth?
response_type=code&
client_id=
scope=email&
redirect_uri=localhost:8080/login/oauth2&
state=qwerrffadfadf
3. The user logs in to Google and authorizes the application.
4. The page is auto-redirected to localhost:8080/login/oauth2 with the auth code.
5. The server exchanges the auth code for an access token.
6. The server validates it and gets the user's email address.
Now, when the JavaFX client sends a request, the server should allow the authenticated user access to its resources. But, as you can see, there is no way to connect the Google user to the JavaFX client session. I have seen applications using similar methods to allow social login (Postman, Nvidia Experience).
How should I handle this?
[Diagram: on the client's computer, the JavaFX client opens the client web browser when "Google Sign In" is clicked; the browser sends the authorization request to the Google resource server; Google returns the authorization code via the redirect, which reaches MyServer; MyServer then performs the authorization-code-to-access-token exchange with Google.]
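For concreteness, here is a minimal sketch of the code-to-token exchange in step 5 using Python's requests library; the client credentials, the session store keyed by the state value, and the callback wiring are assumptions for illustration, not part of the original setup:

import requests

CLIENT_ID = "your-client-id"          # placeholder
CLIENT_SECRET = "your-client-secret"  # placeholder
sessions_by_state = {}                # state value handed to the JavaFX client -> its server-side session

def handle_oauth_callback(code: str, state: str):
    # Exchange the authorization code at Google's token endpoint.
    resp = requests.post(
        "https://oauth2.googleapis.com/token",
        data={
            "code": code,
            "client_id": CLIENT_ID,
            "client_secret": CLIENT_SECRET,
            "redirect_uri": "http://localhost:8080/login/oauth2",
            "grant_type": "authorization_code",
        },
        timeout=10,
    )
    resp.raise_for_status()
    tokens = resp.json()  # access_token, expires_in, id_token, ...

    # The state parameter generated before the browser was opened is one way to tie
    # the redirect back to the JavaFX client's session on the server.
    session = sessions_by_state.pop(state)
    session["tokens"] = tokens
    return session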

Configuration of I2C bus over a switch in Linux

I have the following simplified schematic for my system.
[Schematic: four I2C buses (I2C1-I2C4) hang off an I2C switch; the switch's common side connects to the CPU's I2C controller, and the CPU's GPIO1-GPIO4 lines select which bus the switch routes to the CPU.]
What is the best way to implement/configure this in Linux, and why? Is it possible to configure this purely in the device tree, exporting 4 I2C buses to user space, so that whenever someone reads from or writes to one of those buses the corresponding GPIO is asserted?
Or can it only be implemented in a kernel module/driver?
From the comments:
There is an "i2c-mux-gpio" driver which should help with this. – Ian Abbott, Sep 12 '19 at 11:35

How to share Hazelcast cache over multi-war Tomcats

We have multiple Tomcats, each with multiple .war files (= Spring Boot apps) deployed in it.
We now need some distributed caching between app1 on tomcat1 and app1 on tomcat2. It's essential that app2 on tomcat1 (and app2 on tomcat2) cannot see the Hazelcast cache of the other deployed apps.
The following image shows this situation:
[Diagram: Tomcat 1 and Tomcat 2 each host app1.war and app2.war; the two app1 instances share one Hazelcast cache, and the two app2 instances share a separate Hazelcast cache.]
Is this possible with Hazelcast? And if so, how?
Right now I can only find solutions talking about shared web sessions via Hazelcast, but this doesn't seem to be a solution for me here, or am I wrong?
If your applications must be strictly isolated, then you probably need to use different cluster groups. Cluster groups make it possible for different clusters to coexist on the same network, while being completely unreachable to one another (assuming correct configuration).
If, however, you just need application data to be separate, then you can just make sure that app1 instances use caches with names that do not clash with app2 cache names. This is the simplest implementation.
If you are deploying a sort of multitenant environment where you have security boundaries between the two groups of applications, then going for the cluster group option is better as you can protect clusters with passwords, and applications will be using distinct ports to talk to one another in those groups.
Yes, this is possible.
You can configure the cache name.
Application app1 uses a cache named app1. Application app2 uses a cache named app2.
If you configure it correctly then they won't see each other's data.
If by "essential" you mean that you have a stronger requirement than preventing accidental misconfiguration, then you need to use role-based security.
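For illustration only: the question is about embedded members inside Spring Boot wars, but the cluster-isolation idea above can be shown with the hazelcast-python-client, purely as a sketch. The cluster names are assumptions, and members with those cluster names are assumed to be running already; clients (or members) configured with different cluster names form separate clusters and never see each other's maps.

import hazelcast

# Assumes Hazelcast members with these cluster names are already running.
app1_client = hazelcast.HazelcastClient(cluster_name="app1-cluster")
app2_client = hazelcast.HazelcastClient(cluster_name="app2-cluster")

# Even if both apps use a map called "cache", the maps live in different clusters.
app1_cache = app1_client.get_map("cache").blocking()
app2_cache = app2_client.get_map("cache").blocking()

app1_cache.put("key", "visible only inside app1-cluster")
print(app2_cache.get("key"))  # None: app2's cluster never sees app1's data

app1_client.shutdown()
app2_client.shutdown()

In the Spring Boot wars themselves the equivalent is to give each app's Hazelcast configuration its own cluster name (group name and password in Hazelcast 3.x).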

How to launch an LXD container on another node and exchange ssh keys with the container?

How do I launch an LXD container on another node and exchange SSH keys with the container?
That is, how do I give Ansible direct access to the LXD container using SSH?
I am aware of the authorized_key module; however, this would only exchange keys between the host and Ansible, not between Ansible and the LXD container.
Please see the diagram below, which describes the machine layout:
[Diagram: the Ansible machine connects to the bare-metal machine, which in turn hosts the LXD container; Ansible needs SSH access to the container itself.]
Start containers from images that support some sort of provisioning system.
The most common is cloud-init – it's already included in many official cloud images.
When you create such a container, just add the required configuration settings via the user.user-data config option and they will be applied automatically when the container starts.
The lxd_container module supports a config parameter for setting container configuration options.
You can find useful cloud-config examples here.
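As a rough sketch of the cloud-init approach, here the lxc CLI is driven from Python instead of the Ansible lxd_container module; the image alias, container name, and key path are assumptions:

import subprocess
from pathlib import Path

# Public key of the Ansible control machine (assumed path).
public_key = Path.home().joinpath(".ssh", "id_rsa.pub").read_text().strip()

# cloud-config that authorizes that key for the image's default user.
user_data = f"""#cloud-config
ssh_authorized_keys:
  - {public_key}
"""

# Launch a cloud-init-enabled image and pass the user data via user.user-data.
subprocess.run(
    ["lxc", "launch", "ubuntu:22.04", "ansible-target",
     "--config", f"user.user-data={user_data}"],
    check=True,
)

With the Ansible lxd_container module, the same user.user-data string goes into the module's config parameter, after which Ansible can SSH straight into the container.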
