How to get data from multiple tables using a single flow? - apache-nifi

I want to fetch data from multiple tables using a single processor and push the data into the respective target tables using NiFi. Which processors do I need to use for that, and what is the advantage of a single flow over multiple flows?
Thank you.

As per the NIFI-5221 Jira, starting from NiFi 1.7 you can use the DatabaseLookupService controller service.
Using that lookup service you can get data from more than one table in a single flow.
Refer to this and this for more references.
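A rough sketch of how such a flow might be wired (LookupRecord and the lookup services are real NiFi components, but the table name, key column, and property values below are illustrative assumptions; check the controller service documentation for the exact property names):

    QueryDatabaseTable / GenerateTableFetch   -> pull the driving table's records
      -> LookupRecord
           Record Reader:   AvroReader (or a reader matching your data format)
           Record Writer:   JsonRecordSetWriter
           Lookup Service:  DatabaseRecordLookupService
                              Database Connection Pooling Service: your DBCPConnectionPool
                              Table Name:        customer_details   (assumed)
                              Lookup Key Column: customer_id        (assumed)
           key: /customer_id   (RecordPath to the field used as the lookup key)
      -> PutDatabaseRecord   -> write the enriched records to the respective target table

Because the lookup happens per record inside one flow, a single pipeline can pull related data from several tables instead of duplicating the flow once per table.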

Related

Looking for multi-database data processing for Spring Batch

I have a situation where I need to call 3 databases and create a CSV.
I have created a batch step where I can get the data from my first database.
This gives around 10000 records.
Now from each of these records I need to get the id and use it to fetch data from another data source. I have not been able to find the best solution.
Any help in finding a solution is appreciated.
I tried two steps, one for each data source, but I am not sure how to pass the ids to the next step (we are talking about 10000 ids).
Is it possible to connect to all 3 databases in the same step? I am new to Spring Batch, so I don't have a full grasp of all the concepts.
You can do the second call to fetch the details of each item in an item processor. This is a common pattern and is described in the Common Batch Patterns section of the reference documentation.
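A minimal sketch of that pattern, assuming the first step's reader hands each row to the processor as a Map (e.g. via a ColumnMapRowMapper) and the id lives in an "id" column; the table, column, and class names below are illustrative only:

    import java.util.Map;

    import org.springframework.batch.item.ItemProcessor;
    import org.springframework.jdbc.core.JdbcTemplate;

    // Illustrative processor: for each record read from the first database,
    // fetch its details from the other two databases by id and merge everything
    // into the map that the writer flattens into a CSV line.
    public class EnrichingItemProcessor
            implements ItemProcessor<Map<String, Object>, Map<String, Object>> {

        private final JdbcTemplate secondDb; // JdbcTemplate bound to the 2nd data source
        private final JdbcTemplate thirdDb;  // JdbcTemplate bound to the 3rd data source

        public EnrichingItemProcessor(JdbcTemplate secondDb, JdbcTemplate thirdDb) {
            this.secondDb = secondDb;
            this.thirdDb = thirdDb;
        }

        @Override
        public Map<String, Object> process(Map<String, Object> item) {
            Long id = ((Number) item.get("id")).longValue();
            // One lookup per item; the chunk-oriented step still commits in chunks,
            // so the 10000 ids never have to be passed between steps.
            item.putAll(secondDb.queryForMap(
                    "SELECT name, status FROM details WHERE id = ?", id));  // assumed query
            item.putAll(thirdDb.queryForMap(
                    "SELECT amount FROM extras WHERE id = ?", id));         // assumed query
            return item;
        }
    }

Wire this processor into the single chunk-oriented step (reader on the first database, this processor, a FlatFileItemWriter for the CSV) and Spring Batch handles the chunking; there is no need for a second step or for passing the 10000 ids around.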

What is the best way to consume a WebSocket API for PowerBI via OracleDB?

My data source is a WebSocket API that provides a channel to listen to.
The final destination is for use in PowerBI for near Real-Time reporting.
Ideally I need to first write this data to an Oracle DB for data transformation before using DirectQuery in PowerBI.
Also, I have Talend at my disposal for ETL.
What would a best-practice solution look like?
I don't know if this is the best practice, but here's how I would do it with Talend:
tRESTClient (API) --> format/extract data (JSON, XML, etc.) --> tDBOutput (Oracle)
Afterwards, depending on the amount of data to be processed, we could do a first step of collecting the data and saving it in the DB.
In a second step, we prepare the desired data for PowerBI, either with Talend in new tables or with views in the DB.

Getting transformation configuration from a custom processor: NiFi

I am trying to test some functionality in NiFi. The data I pull from the database contains specific columns, say "id". I need to use NiFi to transform the column name to "customer_id". I understand this is an easy job using something like Jolt. But my problem is that I need to pull these configurations or rules from somewhere else, say another database or some other place. I don't want to hard-code the column names in the Jolt transform; instead I want to get them from some other location. Is there any best practice or best way of doing this? Will I have to write a custom processor for this, and if so, what is the best place to start reading about writing custom processors?
There are many different ways you can do transforms besides JOLT - it is worth researching the use of Records and Schemas in NiFi.
But on to your problem: you could use LookupRecord with a LookupService to pull the configuration; for example, you could pull it out of a database or from a REST endpoint. There are many LookupServices - read the LookupRecord docs page for a list of them.
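For example, one way to wire this up (the attribute name, the rule storage, and the expression-language usage below are assumptions for illustration, not the only option):

    QueryDatabaseTable / ExecuteSQL   -> pulls the data with the original column names, e.g. "id"
      -> LookupAttribute
           Lookup Service: a string-returning lookup service such as SimpleDatabaseLookupService,
                           pointed at the table (or endpoint) that stores your mapping rules
           jolt.spec: <key of the rule to fetch>        (attribute name assumed)
      -> JoltTransformJSON
           Jolt Specification: ${jolt.spec}
             (assumes your NiFi version supports expression language on this property;
              if it does not, a record-based route with LookupRecord and UpdateRecord
              is an alternative)

This keeps the rename rules ("id" -> "customer_id") outside the flow definition, so they can be changed in the database or REST endpoint without editing the JOLT spec by hand, and no custom processor is required.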

Which NiFi processor to use for RDBMS extract

I will explain my use case to help determine which DB extract utility to use.
I need to extract data from SQL Server tables with varying frequency each day. Each extract query is a complex SQL statement involving joins across 5-10 tables with multiple clauses. I have around 20-30 such statements overall.
All these extract queries might be required to run multiple times a day, with varying frequencies each day, depending on how many times we receive data from the source system or other cases.
We are planning to use Kafka to publish a message that lets the NiFi workflow know whenever an RDBMS table is updated and the flow needs to be triggered (I can't just trigger the NiFi flow based on an "incremental" column value; there might only be all-row update scenarios, and we might not create new rows in the tables).
How should I go about designing my NiFi flow? There are ExecuteSQL/GenerateTableFetch/ExecuteSQLRecord/QueryDatabaseTable and all sorts of components available. Which one is going to fit my requirement best?
Thanks!
I suggest that you use ExecuteSQL. You can set the query from an attribute or compose it using attributes. The easiest way is to create a JSON message, then parse that JSON and create attributes from it. Check this example; there the SQL is created from a file, and you can adjust it to create it from Kafka instead: link
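A sketch of that idea, assuming the Kafka message is a small JSON document naming the table to extract (topic, field, and attribute names below are illustrative):

    ConsumeKafka_2_6 (use the processor version matching your brokers)
      -> EvaluateJsonPath
           table.name: $.table                      (JSON field and attribute names assumed)
      -> LookupAttribute / UpdateAttribute          -> resolve the full extract SQL for that
                                                       table into an attribute, e.g. extract.sql
      -> ExecuteSQLRecord (or ExecuteSQL)
           Database Connection Pooling Service: your SQL Server connection pool
           SQL select query: ${extract.sql}
           Record Writer: a writer matching your downstream format
      -> downstream processing / PutDatabaseRecord, etc.

ExecuteSQL/ExecuteSQLRecord fits this case better than QueryDatabaseTable or GenerateTableFetch because the flow is event-triggered by the Kafka message rather than driven by an incrementing column.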

How to implement chunk processing using a custom ItemReader

I am using Spring Batch 2.1.9.RELEASE.
I need to configure a job step which reads data from a MySQL DB, processes it and writes it back to MySQL. I want to do it in chunks.
I considered using JdbcCursorItemReader, but the SQL is a complex one. I need to fetch data from three other tables to create the actual SQL to use in the reader.
But if I use a custom ItemReader with JdbcTemplate/NamedParameterJdbcTemplate, how can I make sure the step processes the data in chunks? I am not using JPA/DAOs.
Many thanks,
In Spring Batch, data is normally processed in chunks; the easy way is to declare a commit-interval in the step definition; see Configuring a Step.
Another way to define a custom chunk policy is to implement your own CompletionPolicy.
To answer your question: use the driving query based ItemReader pattern to read from the main table and build the complex object (reading from the other tables), define a commit-interval, and use the standard read/process/write step pattern.
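A minimal sketch of that driving-query approach, assuming a hypothetical orders driving table (table, column, and class names are illustrative, and the generic signatures shown match recent Spring Batch/Spring JDBC releases, so they may need small adjustments on 2.1.9):

    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.util.Collections;
    import java.util.Map;

    import javax.sql.DataSource;

    import org.springframework.batch.item.ItemProcessor;
    import org.springframework.batch.item.database.JdbcCursorItemReader;
    import org.springframework.jdbc.core.RowMapper;
    import org.springframework.jdbc.core.namedparam.NamedParameterJdbcTemplate;

    public class DrivingQueryComponents {

        // Reader over the driving table: fetches only the keys, so the complex
        // SQL built from the three other tables stays out of the reader.
        public static JdbcCursorItemReader<Long> orderIdReader(DataSource dataSource) {
            JdbcCursorItemReader<Long> reader = new JdbcCursorItemReader<Long>();
            reader.setDataSource(dataSource);
            reader.setSql("SELECT order_id FROM orders WHERE status = 'NEW'"); // assumed driving query
            reader.setRowMapper(new RowMapper<Long>() {
                public Long mapRow(ResultSet rs, int rowNum) throws SQLException {
                    return rs.getLong(1);
                }
            });
            return reader;
        }

        // Processor assembles the full record by running the complex query per id;
        // it runs inside the chunk, so the commit-interval on the step is all that
        // is needed to process and write in chunks.
        public static ItemProcessor<Long, Map<String, Object>> orderAssembler(
                final NamedParameterJdbcTemplate jdbc) {
            return new ItemProcessor<Long, Map<String, Object>>() {
                public Map<String, Object> process(Long orderId) {
                    return jdbc.queryForMap(
                            "SELECT o.order_id, c.name, p.total "                      // assumed complex SQL
                            + "FROM orders o JOIN customers c ON c.id = o.customer_id "
                            + "JOIN payments p ON p.order_id = o.order_id "
                            + "WHERE o.order_id = :id",
                            Collections.singletonMap("id", orderId));
                }
            };
        }
    }

In the step definition, wire the reader and processor into a chunk element with a writer and commit-interval="100" (or whatever chunk size fits); Spring Batch then takes care of the chunk boundaries.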
I hope I was clear; English is not my first language.
