NiFi processor to run multiple instances - apache-nifi

I have a use case where we use NiFi to extract data from multiple sources. The database information for the DBCPConnectionPool and the ExecuteSQL processor is updated through the NiFi REST API when a user selects a table in our custom UI. The problem is this: suppose we have triggered an Oracle extraction for one database and, at the same time, another user logs in and triggers the Oracle process for a different database. The REST calls are addressed by processor ID, and since there is only one flow for Oracle, the second call stops the first flow.
Is there any way to fix this, for example by dynamically creating the DBCPConnectionPool controller service or a process group for each request?
The current flow is attached.
The call to ListenHTTP contains the name of the table to fetch. Before that call is made, a REST call updates the DBCPConnectionPool with the URL and connection details and also updates the ExecuteSQL processor with the necessary properties. How should a second request be handled if it arrives while the first one is still being served?
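One stopgap, while there is only a single shared flow, is to serialize each "configure, then trigger, then wait for completion" sequence behind one lock in whatever service sends the REST calls, so a second request waits instead of overwriting the first request's settings. The sketch below assumes such a service exists; `shared_config`, `configure_flow`, `run_extraction` and the JDBC URLs are hypothetical stand-ins for the real NiFi REST calls and settings, which are not shown.

```python
import threading
import time

# Hypothetical stand-in for the shared NiFi settings
# (DBCPConnectionPool URL and the table used by ExecuteSQL).
shared_config = {"db_url": None, "table": None}

# One lock per shared flow: only one request may reconfigure and run it at a time.
flow_lock = threading.Lock()

results = []  # (db_url requested, db_url the run actually used)


def configure_flow(db_url, table):
    """Hypothetical placeholder for the REST calls that update the
    controller service and the ExecuteSQL processor."""
    shared_config["db_url"] = db_url
    shared_config["table"] = table


def run_extraction():
    """Hypothetical placeholder for the ListenHTTP call plus waiting until
    the flow has finished; the sleep widens the race window."""
    time.sleep(0.05)
    return shared_config["db_url"], shared_config["table"]


def handle_request(db_url, table):
    # Hold the lock for the whole configure -> run -> finish sequence,
    # so a concurrent request cannot change the settings mid-run.
    with flow_lock:
        configure_flow(db_url, table)
        used_url, _used_table = run_extraction()
        results.append((db_url, used_url))


threads = [
    threading.Thread(
        target=handle_request,
        args=(f"jdbc:oracle:thin:@host{i}:1521/ORCL", f"TABLE_{i}"),
    )
    for i in range(3)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

for requested, used in results:
    print(requested == used)
```

The trade-off is that extractions run one at a time. If truly concurrent extractions are needed, the table name and connection choice have to travel with each request (for example as FlowFile attributes) instead of living in shared processor settings.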

Related

JMeter - Performance Testing - Third Party Backend GET Response

I am currently load testing an application where users can create orders. Once an order is created, the request reaches the middleware, which triggers a scheduler. The scheduler sends a GET Status request to a third-party API, and the response is stored in the backend database. The GET Status response is only visible in the backend, not in the user interface. How can I record this backend GET Status response using the latest version of JMeter?
You can use the JDBC Request sampler to read information from the database:
1. Download the JDBC driver for the database you're using and drop it into the JMeter classpath (for example, JMeter's lib folder).
2. Restart JMeter to pick up the driver.
3. Add a JDBC Connection Configuration element and specify the database URL, credentials and a pool name.
4. In the JDBC Request sampler, set the same pool name as in step 3 and write an SQL SELECT query that fetches the response from the database. If you need the response later on, you can store it in a JMeter variable.
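Because the scheduler writes the response asynchronously, the SELECT in the JDBC Request sampler may run before the row exists, so in a load test you usually retry it (in JMeter, for example, by wrapping the JDBC Request in a While Controller). The Python sketch below shows that query-and-retry logic with `sqlite3` standing in for the backend database; the table `order_status`, its columns, the order IDs and the timeouts are hypothetical.

```python
import sqlite3
import time

# sqlite3 stands in for the real backend database; the table and column
# names are hypothetical and must be replaced with the real schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE order_status (order_id TEXT PRIMARY KEY, status_response TEXT)")

# The same parameterized SELECT you would put into the JDBC Request sampler
# (with Query Type "Prepared Select Statement", JMeter binds "?" from the
# sampler's Parameter values / Parameter types fields).
QUERY = "SELECT status_response FROM order_status WHERE order_id = ?"


def fetch_status(order_id, timeout_s=5.0, interval_s=0.1):
    """Poll until the scheduler has stored the GET Status response,
    or return None if it does not appear within timeout_s."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        row = conn.execute(QUERY, (order_id,)).fetchone()
        if row is not None:
            return row[0]
        time.sleep(interval_s)
    return None


# Simulate the scheduler storing the third-party response for one order.
conn.execute("INSERT INTO order_status VALUES (?, ?)", ("ORD-1", '{"status": "DELIVERED"}'))
conn.commit()

print(fetch_status("ORD-1"))
print(fetch_status("ORD-2", timeout_s=0.3))
```

The timeout keeps a missing response from stalling the whole test; a `None` result can be turned into a failed sample instead.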

Apache Kafka for an existing GET request with Oracle DB

I'm trying to learn about streaming services and am reading the Kafka docs:
https://kafka.apache.org/quickstart
https://kafka.apache.org/24/documentation/streams/quickstart
As a simple example, I'm attempting to refactor a Spring web service GET request that accepts an ID parameter and returns a list of attributes associated with that ID. The backend database is Oracle.
What is the approach for loading a single Oracle DB table so that it can be served by Kafka? The docs above don't cover this. Do I need to replicate the Oracle DB to a NoSQL DB such as MongoDB? (Related question: "Why we require Apache Kafka with NoSQL databases?")
Kafka is an event streaming platform, not a database. Instead of thinking about "loading a single Oracle DB table which can be served by Kafka", think in terms of which events should trigger processing.
Change Data Capture (CDC) products such as Oracle GoldenGate (there are others too) detect changes to rows and send a message to Kafka each time a row changes.
Alternatively, you could configure a Kafka Connect JDBC source connector to run a query periodically and pull the data into Kafka.
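To make the query-based option concrete: in its "incrementing" mode, the JDBC source connector repeatedly queries for rows whose strictly increasing ID column is greater than the last value it saw, and publishes each new row as a message. The Python sketch below imitates that polling loop with `sqlite3` in place of Oracle and a plain list in place of a Kafka topic. It illustrates the idea only and is not the connector's code; the table `attributes`, its columns and `poll_once` are hypothetical.

```python
import sqlite3

# sqlite3 stands in for Oracle; a list stands in for a Kafka topic.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE attributes (id INTEGER PRIMARY KEY, entity_id INTEGER, name TEXT)")

topic = []       # messages "published" so far
last_offset = 0  # highest id already published (Kafka Connect keeps this as the source offset)


def poll_once():
    """One polling cycle: fetch rows newer than the stored offset,
    publish them in id order, then advance the offset."""
    global last_offset
    rows = db.execute(
        "SELECT id, entity_id, name FROM attributes WHERE id > ? ORDER BY id",
        (last_offset,),
    ).fetchall()
    for row_id, entity_id, name in rows:
        # Illustrative key choice: in Kafka, messages with the same key
        # go to the same partition, so all attributes of one ID stay ordered.
        topic.append({"key": entity_id, "value": {"id": row_id, "name": name}})
        last_offset = row_id
    return len(rows)


db.executemany(
    "INSERT INTO attributes (entity_id, name) VALUES (?, ?)",
    [(42, "colour"), (42, "size")],
)
print(poll_once())  # publishes the two existing rows
db.execute("INSERT INTO attributes (entity_id, name) VALUES (?, ?)", (7, "weight"))
print(poll_once())  # publishes only the new row
print(poll_once())  # nothing new
```

Incrementing mode only sees newly inserted rows; picking up updates needs a timestamp column (timestamp mode), and deletes are not visible to query-based polling at all, which is where log-based CDC tools differ.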

Row Level Locks in HBase

I started using HBase recently and wanted to check whether anyone has come across the scenario I'm facing right now.
I have a web service deployed on a couple of servers that accesses HBase to update a field. The update is conditional: I have to read the field from HBase and, if its value is "A", update it to "B"; if a concurrent update has set it to "C", do not update. But because requests arrive concurrently on different servers, both may read the existing value as "A", after which one writes "B" and the other writes "C".
Since requests come concurrently from different servers, as well as several at once on the same server, thread-level locking is of no use.
Is there a way to lock at the HBase level, so that I can acquire a row lock in the service layer and then update the row?
The HBase API used to have RowLock, but we are on a newer HBase version (1.1.2.3) where that class has been removed.
I'd appreciate it if someone could point me in the right direction.
Thanks in advance.
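What is described here is a compare-and-set: write "B" only if the cell still holds "A". One approach worth checking is HBase's own atomic, server-side `Table.checkAndPut(row, family, qualifier, expectedValue, put)` in the 1.x client API, which applies the Put only if the current value equals the expected one and returns whether it did, so no client-side row lock is needed. The Python sketch below only simulates those semantics in memory, with one lock playing the role of HBase's row-level atomicity, to show that two racing servers can no longer both succeed; `FakeRowStore`, the row key and the values are hypothetical.

```python
import threading


class FakeRowStore:
    """In-memory stand-in for one HBase column. check_and_put imitates the
    semantics of HBase's checkAndPut: compare and write as one atomic step."""

    def __init__(self):
        self._cells = {}
        self._lock = threading.Lock()  # plays the role of HBase's row-level atomicity

    def put(self, row, value):
        with self._lock:
            self._cells[row] = value

    def get(self, row):
        with self._lock:
            return self._cells.get(row)

    def check_and_put(self, row, expected, new_value):
        """Write new_value only if the current value equals expected.
        Returns True if the write happened, False otherwise."""
        with self._lock:
            if self._cells.get(row) != expected:
                return False
            self._cells[row] = new_value
            return True


store = FakeRowStore()
store.put("order-1", "A")

outcomes = {}


def server(name, new_value):
    # Each "server" attempts the A -> new_value transition; at most one can win.
    outcomes[name] = store.check_and_put("order-1", "A", new_value)


threads = [
    threading.Thread(target=server, args=("s1", "B")),
    threading.Thread(target=server, args=("s2", "C")),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(outcomes, store.get("order-1"))
```

A server whose check fails gets `False` back and can re-read the row to decide what to do, which covers the "it is already C, so do not update" case without any explicit lock.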

Manually logging a database event in a DataStage job

I have a parallel job that writes to an Oracle table. I want to manually write warnings to the DataStage log when some event occurs. For example, if a certain value is inserted into a certain column, I want to record that in the log. Can this be done?
To write custom messages into the log for a particular job's data stream, you can use a combination of a Copy stage, a Transformer stage and a Peek stage. The Peek stage is the one that writes to the log. I like to set the Peek stage to run in sequential mode, so that your messages are kept together in single log entries instead of being spread across nodes.
You can also peek the rejects of the Oracle stage, and perhaps combine this with the option above (using a Funnel stage and a standard column schema).
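Conceptually, the Copy, Transformer and Peek arrangement splits off a copy of the data stream, keeps only the rows you care about, and writes those rows to the job log. Purely to illustrate that dataflow, here is the same idea in plain Python, with the standard `logging` module standing in for the DataStage job log. DataStage jobs are built in the Designer rather than coded like this, and the column `STATUS`, the value `"REJECTED"` and the function names are hypothetical.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("job_log")  # stand-in for the DataStage job log


def copy_stage(rows):
    # Copy stage: two identical outputs, one to the Oracle target, one to the log branch.
    rows = list(rows)
    return list(rows), list(rows)


def transformer_stage(rows, column, value):
    # Transformer constraint: pass through only the rows that represent the event.
    return [r for r in rows if r.get(column) == value]


def peek_stage(rows):
    # Peek stage: write each matching row to the job log; returns how many were logged.
    for r in rows:
        log.info("Event row: %s", r)
    return len(rows)


incoming = [
    {"ID": 1, "STATUS": "OK"},
    {"ID": 2, "STATUS": "REJECTED"},
    {"ID": 3, "STATUS": "OK"},
]
to_oracle, to_log = copy_stage(incoming)
flagged = peek_stage(transformer_stage(to_log, "STATUS", "REJECTED"))
```

The main branch (`to_oracle`) is untouched by the filtering, which is why the Copy stage comes first: the target still receives every row.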
Lastly, if you'd like to query the logs themselves and write them out somewhere else, or use them in a job (among all the other data kept about jobs in the repository), you can directly query the DSODB schema in the XMETA database, i.e. the DataStage repository (DB2 by default).
You would need to have the DataStage Operations Console up and running for that (I'm not sure which version of DataStage you're running). If DataStage runs on a single tier with the default DB2 database, you can simply catalog the DSODB database so that it's available as a connection in the DB2 connector. Otherwise, you'd need to install a DB2 client on the DataStage engine tier and catalog the database there.
All the best!
Twitter: #InforgeAcademy
DataStage tips and Tricks: https://www.inforgeacademy.com/blog/

Manipulating Data Within AWS Redshift to a Schedule

Current Setup:
SQL Server OLTP database
AWS Redshift OLAP database, updated from the OLTP database via SSIS every 20 minutes
Our customers only have access to the OLAP database
Requirement:
One customer requires some additional tables to be created and populated on a schedule; this can be done by aggregating data already in AWS Redshift.
Challenge:
This is only for one customer, so I cannot leverage the core process that populates AWS; the process must be independent and will be handed over to the customer, who does not use SSIS and doesn't wish to start. I was considering AWS Data Pipeline, but it is not yet available in the AWS region where the customer is based.
Question:
What are my alternatives? I am aware of numerous partners who offer ETL-like solutions, but this seems over the top: ultimately, all I want to do is execute a series of SQL statements on a schedule with some form of error handling or alerting. The preference of both the customer and management is not to use a bespoke app for this, hence the intended use of Data Pipeline.
To export data from AWS Redshift to another data source with AWS Data Pipeline, you can follow a template similar to https://github.com/awslabs/data-pipeline-samples/tree/master/samples/RedshiftToRDS, which transfers data from Redshift to RDS. Instead of using RdsDatabase as the sink, you could add a JdbcDatabase (http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-object-jdbcdatabase.html). The template https://github.com/awslabs/data-pipeline-samples/blob/master/samples/oracle-backup/definition.json shows in more detail how to use the JdbcDatabase.
Many more templates are available at https://github.com/awslabs/data-pipeline-samples/tree/master/samples for reference.
I do exactly the same thing as you, but I use AWS Lambda to perform my ETL. One drawback of Lambda is its maximum execution time: 5 minutes at the time of writing (initially 1 minute). AWS has since raised this limit to 15 minutes.
So for ETL jobs that take longer than 5 minutes, I am planning to set up a PHP server in AWS that runs my queries against Redshift, scheduled at any time with cron.
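Whichever scheduler ends up running it (a Lambda function triggered by a scheduled EventBridge/CloudWatch Events rule, cron, or something else), the core job the question asks for is small: run a fixed series of SQL statements, roll back on the first failure, and raise an alert. The sketch below uses Python's `sqlite3` as a stand-in for a Redshift connection so that it runs anywhere; the statements, the `sales` and `daily_totals` tables, `get_connection` and `send_alert` are hypothetical. In practice you would connect with a PostgreSQL-compatible driver and send the alert through something like SNS or email. `handler(event, context)` is the usual shape of a Python Lambda entry point.

```python
import sqlite3

# Hypothetical statements; on Redshift these would build the customer's aggregate tables.
STATEMENTS = [
    "CREATE TABLE IF NOT EXISTS daily_totals (day TEXT, total REAL)",
    "DELETE FROM daily_totals",
    "INSERT INTO daily_totals SELECT day, SUM(amount) FROM sales GROUP BY day",
]

alerts = []  # collected alert messages, so the behaviour can be inspected


def send_alert(message):
    """Hypothetical alert hook; replace with SNS, email, etc."""
    alerts.append(message)


def get_connection():
    """Hypothetical: on AWS this would open a connection to Redshift.
    isolation_level=None lets us issue BEGIN/COMMIT/ROLLBACK ourselves."""
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE sales (day TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("2024-01-01", 10.0), ("2024-01-01", 5.0), ("2024-01-02", 7.5)],
    )
    return conn


def run_statements(conn, statements):
    """Run all statements in one transaction; roll back and alert on the first error."""
    cur = conn.cursor()
    cur.execute("BEGIN")
    try:
        for sql in statements:
            cur.execute(sql)
        cur.execute("COMMIT")
        return True
    except sqlite3.Error as exc:
        cur.execute("ROLLBACK")
        send_alert(f"Scheduled SQL failed: {exc}")
        return False


def handler(event, context):
    """Entry point in the (event, context) shape used by Python Lambda functions."""
    conn = get_connection()
    ok = run_statements(conn, STATEMENTS)
    rows = conn.execute("SELECT day, total FROM daily_totals ORDER BY day").fetchall() if ok else []
    conn.close()
    return {"ok": ok, "rows": rows}


print(handler({}, None))
```

Running everything in one transaction means a failed run leaves the customer's tables as they were before the run, and the alert is the only side effect.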
