I have a requirement in which i have to execute multiple sql queries in Nifi using executesql processor.
Those queries are dependent on each other as I am storing data in session based temporary table.
So I need to know if I do so , whether all the queries will get executed in single database session.
Example:
Query a data is inserted into Temp_A, Now I need that data in next query so will it be possible.
Note: I am talking about session based Temporary table only.
you can use ExecuteSQL processor where in SQL Pre-Query parameter that could contain multiple commands separated with semicolon ;
all those commands with main sql query will be executed using the same connection for one flow file.
note that the same sql connection could be used for the next flow file.
Related
I've a small program in Linq that inserts records in my database based on a number of conditions (depending on existing data in the same db). The SQL output are lots of SELECT and INSERT statements. In my production environment this should be executing from a SQL script and reviewed first. Is it possible to generate the SQL output without actually running the query?
Background: I am really new. Informatica Developer for PowerCenter Express Version: 9.6.1 HotFix 2
I want to execute a t-sql statement as one step in a work flow:
truncate table dbo.stage_customer
I tried create a mapping, add a sql transformation on it. Input above query in sql query window. I added the mapping to a workflow of just start, the mapping, and the end. When I validate the flow I got this error:
The group [Input] in transformation xxx must have at least one port
I have no idea what ports are needed since this (the truncate statement) basically doesn't need input or output.
Use your query " truncate table dbo.stage_customer" in Pre-SQL command
As Aswin suggested use the built in option in the session property.
But in the production environments user may not have truncate table access for the table in a database. In this case, informatica workflow will fail if you check the truncate target table option. It is good to have a stored procedure to truncate the target table and use that stored procedure in informatica mapping to avoid workflow failures in case of user having no truncate access to the database.
if you would like to truncate a target table before loading why don't you use the in-built option present in session properties?
goto workflow manager-> open session->mapping tab->click on target table listed left side->choose the property "Truncate table option" just enable it
to answer you question, I think you have to connect at least one input and output port into SQL transformation (because it is not unconnected). Just create dummy ports and try again
try this article - click here
HIVE 0.13 will SHARED lock the entire database(I see a node like LOCK-0000000000 as a child of the database node in Zookeeper) when running a select statement on any table in the database. HIVE creates a shared lock on the entire schema even when running a select statement - this results in a freeze on CREATE/DELETE statements on other tables in the database until the original query finishes and the lock is released.
Does anybody know a way around this? Following link suggests concurrency to be turned off but we can't do that as we are replacing the entire table and we have to make sure that no select statement is accessing the table before we replace the entire contents.
http://mail-archives.apache.org/mod_mbox/hive-user/201408.mbox/%3C0eba01cfc035$3501e4f0$9f05aed0$#com%3E
use mydatabase;
select count(*) from large_table limit 1; # this table is very large and hive.support.concurrency=true`
In another hive shell, meanwhile the 1st query is executing:
use mydatabase;
create table sometable (id string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE ;
The problem is that the “create table” does not execute untill the first query (select) has finished.
Update:
We are using Cloudera's distribution of Hive CDH-5.2.1-1 and we are seeing this issue.
I think they never made such that in Hive 0.13. Please verify your Resource manager and see that you have enough memory when you are executing multiple Hive queries.
As you know each Hive query will trigger a map reduce job and if YARN doesn't have enough resources it will wait till the previous running job completes. Please approach your issue from memory point of view.
All the best !!
When we are talking about parallel mode with sqlloader what does that actually mean? When i have in my script to execute:
Sqlldr control=first.ctl parallel=true direct=true data=first.unl
Sqlldr control=second.ctl parallel=true direct=true data=second.unl
I am inserting into 2 tables using as a data file for the inserts of the first table the first.unl and for the 2nd table the second.unl
By having parallel=true and direct=true will this run the 2 instances of sqlloader for first.unl and second.unl in parallel or will it run the first instance and do multiple inserts based on the first.unl and the run the second instance and do multiple inserts again based on the second.unl?
From the documentation
"PARALLEL specifies whether direct loads can operate in multiple concurrent sessions to load data into the same table."
So, one instance of SQL Loader uses multiple sessions to insert into one table. The actual degree of parallelism is governed by the usual parallelization parameters.
"So i cannot have inserting into multiple tables in parallel?"
If you kick off two SQL Loader instances they will run simultaneously. You need to be careful that you have enough CPU to handle the number of threads you're spawning.
Hey EXPERIENCED SSIS DEVELOPERS, I need your help.
High-Level Requirements
Query SQL Server table (on a different server than my SSIS server) resulting in about 200-300k records results set.
Use three output colums for each row to lookup date in Oracle database.
Insert or Update SQL Server table with results.
Use SSIS.
SQL Server 2008
Sounds easy, right?
Here is what I have done:
Created on Control Flow Execute SQL Task that gets a recordset from SQL Server. Very fast, easy query, like select field1, field2, field 3 from table where condition > 0. That's it. Takes less than a second.
Created a variable (evaluated as expression) for the Oracle query that uses the results set from the above in the WHERE clause.
Created a ForEachLoop Container that takes the results (from #1 above) for each row in the recordset and runs it through a Data Flow that uses the Oracle query (from #2 above) with Data access mode: SQL command from variable against an Oracle data source. Fast, simple query with only about 6 columns returned.
Data Conversion - obvious reasons - changing 3 columns from Oracle data types to SQL Server data types.
OLE DB Destination to insert to SQL Server using Fast Load to staging table.
It works perfectly! Hooray! Bad news - it is very, very slow. When I say slow, I mean it process 3000 records per hour. Holy moly - so freaking slow.
Question: am I missing a way to speed it up? It seems like the ForEachLoop Container is the bottleneck. Growl.
Important Points:
- I have NO write access in Oracle environment, so don't even suggest a potential solution that requires it. Not a possibility. At all.
Oracle sources do not allow for direct parameter definition. So no SELECT FIELD FROM TABLE WHERE ?. Don't suggest it - doesn't work.
Ideas
- Should I find a way to break down the results of the Execute SQL task and send them through several ForEachLoop Containers for faster processing?
Is there another design that is more appropriate?
Is there a script I can use that is faster?
Would it be faster to create a temporary table in memory and populate it - then use the results to bulk insert to SQL Server? Does this work when using an Oracle data source?
ANY OTHER IDEAS?