Error when running NiFi ExecuteSQL processor. Datetime column cannot be converted to Avro format - apache-nifi

Hi everyone. I'm learning about some NiFi processors.
I want to obtain all the data of several tables automatically.
So I used a ListDatabaseTables processor with the aim of getting the names of the tables that are in a specific catalog.
After that, I used other processors to generate the queries, like GenerateTableFetch and
ReplaceText. Everything works perfectly up to this point.
Finally, the ExecuteSQL processor comes into play, and here an error is displayed. It says that a datetime column cannot be converted to Avro format.
The problem is that there are several tables, so specifying those columns in order to cast them would be complicated.
Is there a possible solution to fix the error?
The connection is with Microsoft SQL Server.
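Just to illustrate the kind of per-table workaround I would like to avoid, on SQL Server the cast would have to be written out for every datetime column of every table, something like this (the table and column names here are only an example):

    -- Example only: casting each datetime column to a string so the
    -- Avro conversion does not fail; "orders" and "created_at" are made-up names.
    SELECT id,
           CONVERT(varchar(23), created_at, 121) AS created_at
    FROM dbo.orders;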
Here is the image of my flow:

Related

Read data from multiple tables at a time and combine the data based on a where clause using NiFi

I have a scenario where I need to extract data from multiple database tables, including the schema, combine the data, and then write it to an Excel file. How can I do this?
In NiFi the general strategy is to read in from something like a fact table with ExecuteSQL or some other SQL processor, then use LookupRecord to enrich the data with a lookup table. The catch in NiFi is that you can only do one lookup table at a time, so you'd need one LookupRecord for each enrichment table. You could then write to a CSV file that you could open in Excel. There might be some extensions elsewhere that can write directly to Excel, but I'm not aware of any in the standard NiFi distro.
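To make the pattern concrete, the record-level enrichment described above is roughly equivalent to the following join, performed inside NiFi one record at a time, with one LookupRecord per lookup table (all table and column names here are invented for illustration):

    -- Illustrative sketch only: what one LookupRecord enrichment effectively reproduces.
    SELECT f.order_id,
           f.customer_id,
           c.customer_name      -- the value pulled in from the lookup table
    FROM fact_orders f
    LEFT JOIN dim_customer c
           ON c.customer_id = f.customer_id;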

Import Sqoop column names issue

I have a question on Kylo and Nifi.
The version of Kylo used is 0.10.1
The version of Nifi used is 1.6.0
When we create a feed for database ingest (using a database as the source), there is no provision in the Additional Options step to enter the source table column names.
However, on the NiFi side, we use an ImportSqoop processor which has a mandatory field called Source Fields, and it requires that the columns be entered, separated by commas. If this is not done, we get an error:
ERROR tool.ImportTool: Imported Failed: We found column without column name. Please verify that you've entered all column names in your query if using free form query import (consider adding clause AS if you're using column transformation)
For our requirement, we want ImportSqoop to take all the columns from the table into this property automatically, without manual intervention at the NiFi level. Is there any option to include all columns of a database table in the background automatically? Or is there any other way of supplying this value, for example in an UpdateAttribute processor?
As mentioned in the comments, ImportSqoop is not a normal NiFi processor. This does not have to be a problem, but it does mean it is probably not possible to troubleshoot the issue without involving its creator.
Also, though I am still debating whether driving Sqoop from NiFi is an antipattern, it is certainly not necessary.
Please look into the standard options first:
1. The standard way to get data into NiFi from tables is with standard processors such as ExecuteSQL.
2. If that doesn't suffice, the standard way to use Sqoop (a batch tool) is with a batch scheduler, such as Oozie or Airflow.
This thread may take away further doubts on point 1: http://apache-nifi.1125220.n5.nabble.com/Sqoop-Support-in-NIFI-td5653.html
Yes, the Teradata Kylo ImportSqoop is not a standard NiFi processor, but it's there for us to use. Looking deeper at the processor's properties, we can see that indeed SOURCE_TABLE_FIELDS is required there. You then have the option to manually hard-code the list of columns or to set up a method to generate the list dynamically.
The typical solution is to provide the list of fields by querying the table's metadata. The particular solution depends on where the source and target tables are set up and how the mapping is defined between source and target columns. For example, one could use the databases' INFORMATION_SCHEMA tables and match columns by name. Because Sqoop's output should match the source, one has to find a way to generate the column list and provide it to the ImportSqoop processor. A better approach yet could involve a separate metadata store that would hold the source and target information along with mappings and possible transforms (many tools are available for that purpose, for example Wherescape).
More specifically, I would use LookupAttribute paired with a database or scripted lookup service to retrieve the column list from some metadata provider.
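As a rough sketch of that metadata-driven approach, assuming the source database exposes INFORMATION_SCHEMA and treating the schema and table names below as placeholders, the lookup could run a query along these lines to build the comma-separated value for SOURCE_TABLE_FIELDS (the aggregation function varies by database: STRING_AGG, LISTAGG, GROUP_CONCAT, ...):

    -- Hypothetical metadata query (SQL Server syntax shown): builds the
    -- comma-separated column list that ImportSqoop expects in SOURCE_TABLE_FIELDS.
    SELECT STRING_AGG(COLUMN_NAME, ',') WITHIN GROUP (ORDER BY ORDINAL_POSITION) AS source_fields
    FROM INFORMATION_SCHEMA.COLUMNS
    WHERE TABLE_SCHEMA = 'dbo'
      AND TABLE_NAME = 'my_source_table';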

NiFi Fetching Data From Oracle Issue

I have a requirement to fetch data from Oracle and upload it into Google Cloud Storage.
I am using the ExecuteSQL processor, but it is failing for large tables; even for a table with 1 million records of approximately 45 MB, it takes 2 hours to pull the data.
The table names are passed via a REST API to ListenHTTP, which passes them on to ExecuteSQL. I can't use QueryDatabaseTable because the number of tables is dynamic, and the calls to start the fetch are also made dynamically through a UI and the NiFi REST API.
Please suggest any tuning parameters for the ExecuteSQL processor.
I believe you are talking about the capability to produce smaller flow files and possibly send them downstream while the processor is still working on the (large) result set. For QueryDatabaseTable this was added in NiFi 1.6.0 (via NIFI-4836), and in an upcoming release (NiFi 1.8.0, via NIFI-1251) this capability will be available for ExecuteSQL as well.
You should be able to use GenerateTableFetch to do what you want. There you can set the Partition Size (which will end up being the number of rows per flow file), and you don't need a Maximum Value Column if you want to fetch the entire table each time a flow file comes in (which also allows you to handle multiple tables as you described). GenerateTableFetch will generate the SQL statements to fetch "pages" of data from the table, which should give you better, incremental performance on very large tables.
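To illustrate the paging, the statements GenerateTableFetch emits look roughly like the following; the exact syntax depends on the configured Database Type, and the table and column names are placeholders:

    -- Illustrative only: with Partition Size = 10000, each outgoing flow file
    -- carries one paged statement similar to this (Oracle 12c / SQL:2008 style).
    SELECT *
    FROM my_table
    ORDER BY id
    OFFSET 20000 ROWS FETCH NEXT 10000 ROWS ONLY;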

Read data from multiple tables and evaluate results and generate report

I have a question regarding an effective way of reading values in a DB and generating a report.
I use Hadoop to look at data from multiple tables and do data analysis based on the results.
I want to know if there is an effective tool or way to read data from multiple tables, evaluate whether the values of certain columns are the same across tables, and send a report if they are not... I have two options: either I can read the data from Hadoop, or I can connect to the DB2 database and do it there. Without creating a new Java program, is there a tool which helps with this? Something like Talend, which reads XML and writes the output to a DB?
You can use Talend for this. Using Talend, you can read data from Hadoop as well as from a database. In between, you can perform your operations on the fetched data and generate the report.
If you're working with a lot of data and do this sort of thing often, Elasticsearch is also a great help in this area. Use the ELK stack, although you would not necessarily need the 'L' (Logstash) part of it.
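If the comparison itself is the sticking point, the check described in the question can often be expressed as plain SQL before any reporting tool gets involved; a minimal sketch, with invented table and column names, might be:

    -- Hypothetical consistency check: keys present in both tables whose
    -- "amount" values disagree; any rows returned here would go into the report.
    SELECT a.id, a.amount AS amount_in_table_a, b.amount AS amount_in_table_b
    FROM table_a a
    JOIN table_b b ON b.id = a.id
    WHERE a.amount <> b.amount;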

Getting execution time from QueryDatabaseTable in NiFi

I am using the QueryDatabaseTable processor in NiFi for incrementally getting data from a DB2 database. QueryDatabaseTable is scheduled to run every 5 minutes. Maximum-value Columns is set to "rep" (which corresponds to a date in the DB2 database).
I have a separate MySQL database I want to update with the value of "rep" that QueryDatabaseTable uses to query the DB2 database. How can I get this value?
In the log files I've found that the attributes of the FlowFiles do not contain this value.
QueryDatabaseTable doesn't currently accept incoming flow files or allow the use of Expression Language to define the table name; I've written up an improvement Jira to handle this:
https://issues.apache.org/jira/browse/NIFI-2340
