Running multiple Hive queries using the tHiveRow component in Talend - hadoop

Hi, I want to run multiple Hive queries through a single component. With tHiveRow I am able to run a single query, but I am unable to run multiple queries at a time.
I know that multiple SQL queries can be run this way from the following link: http://www.vikramtakkar.com/2013/05/example-to-execute-multiple-sql-queries.html
But does anyone have an idea how to run multiple Hive queries?

The link you reference shows a MySQL connection; that says nothing about the Hive JDBC driver's capabilities, since running multiple statements in one JDBC statement is a driver-specific feature.
To run multiple queries:
Start with a tFixedFlowInput component. Configure one String column and choose the table input option; you will get a table with one column. Each line you add will be one Hive statement. Now connect it to a tHiveRow component and reference the column of the incoming flow in the SQL textarea as <flowName>.<columnName>, e.g. row1.sqlStatement (if the String column in your tFixedFlowInput is named "sqlStatement" and the connection between the tFixedFlowInput and the tHiveRow component is called "row1").
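In plain JDBC terms, this design sends each statement to Hive as its own execute() call, one per incoming row, which is what tHiveRow does with row1.sqlStatement. A minimal sketch of that pattern, assuming a HiveServer2 endpoint at localhost:10000 and placeholder statements (not part of the original answer):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class MultiHiveStatements {
    public static void main(String[] args) throws Exception {
        // Placeholder statements; each row of the tFixedFlowInput plays this role.
        String[] statements = {
            "CREATE TABLE IF NOT EXISTS demo (id INT)",
            "INSERT INTO demo VALUES (1)"
        };
        // HiveServer2 JDBC URL; adjust host, port, and database for your cluster.
        try (Connection conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default");
             Statement stmt = conn.createStatement()) {
            for (String sql : statements) {
                stmt.execute(sql); // one statement per execute() call
            }
        }
    }
}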

Related

H2 database table getting cleared automatically

I am using an H2 database to test my Spring Boot application. I do not use a file to store the data; instead I just use the in-memory database. In the properties file, my JDBC URL looks like below:
spring.datasource.url=jdbc:h2:mem:;MODE=MSSQLServer;INIT=runscript from 'classpath:/schema.sql'\\;runscript from 'classpath:/data.sql'
Now when I run the tests, I have the following test scenario:
1. Add some entities in a table (this adds some rows to a table)
2. Search those entities by some criteria
3. Do the assertion
Sometimes this runs successfully, but sometimes the search query returns an empty list, which causes the test to fail.
I tried to add print statements just to check whether my entities are getting inserted properly. In the insert function, after each insertion, I run the query below:
SELECT * FROM tableName;
This returns the correct list, which means each insertion is inserting into the table correctly. Now in the search function, before running the actual search query, I run the same query again:
SELECT * from tableName;
Surprisingly, this returns an empty result as well, which means there is no data in the table. Please suggest what I should check for.
Pretty sure @Evgenij Ryazanov's comment is correct here.
Closing the last connection to a database closes the database.
When using in-memory databases this means the content is lost.
After step 1 (add some entities in a table), is the connection closing?
If so, to keep the database open, add ;DB_CLOSE_DELAY=-1 to the database URL.
e.g.
spring.datasource.url=jdbc:h2:mem:;DB_CLOSE_DELAY=-1;MODE=MSSQLServer;INIT=runscript from 'classpath:/schema.sql'\\;runscript from 'classpath:/data.sql'
Note, this can create a memory leak!
see: http://www.h2database.com/html/features.html#in_memory_databases
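A minimal sketch of the behavior described above, assuming a named in-memory database and a throwaway table (names are illustrative, not from the original answer):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class H2CloseDelayDemo {
    public static void main(String[] args) throws Exception {
        // With DB_CLOSE_DELAY=-1 the database survives until the JVM exits;
        // drop that flag and the second connection below would find no table.
        String url = "jdbc:h2:mem:testdb;DB_CLOSE_DELAY=-1";
        try (Connection c1 = DriverManager.getConnection(url);
             Statement s = c1.createStatement()) {
            s.execute("CREATE TABLE t(id INT)");
            s.execute("INSERT INTO t VALUES (1)");
        } // the last open connection closes here

        try (Connection c2 = DriverManager.getConnection(url);
             Statement s = c2.createStatement();
             ResultSet rs = s.executeQuery("SELECT COUNT(*) FROM t")) {
            rs.next();
            System.out.println("rows = " + rs.getInt(1)); // prints rows = 1
        }
    }
}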

Cloudera Hadoop Impala - Extracting last refresh date

Is there a way to get the list of all tables with the last refresh date from a database in Cloudera Hadoop Impala?
I'm trying to write a custom SQL query that can do that, so I can use it to build a dashboard (in Tableau) where we can track whether a table has been refreshed and take action accordingly. I tried using a join, but there are so many tables that I believe there is a better way to do it. (The database name is Core_research, and there are more than 500 tables.)
I used to run a script that refreshed column stats on tables every Sunday. We couldn't run all the tables, but we did as many as time permitted. You could do the same, but actually record in a database/table when the script ran. This would give you the functionality you are looking for.
Another option would be to create a table out of the Impala logs and keep track of things that way (with some fancy regex to track refreshes).
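A minimal sketch of the first idea: refresh each table, then record the run in a tracking table the dashboard can query. The Impala JDBC URL and the core_research.refresh_log table are assumptions for illustration, not part of the original answer:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import java.util.List;

public class RefreshTracker {
    public static void main(String[] args) throws Exception {
        // Tables to refresh; in practice this list could come from the metastore.
        List<String> tables = List.of("core_research.table_a", "core_research.table_b");
        // Cloudera Impala JDBC endpoint; adjust host/port for your cluster.
        try (Connection conn = DriverManager.getConnection("jdbc:impala://localhost:21050/core_research");
             Statement stmt = conn.createStatement()) {
            for (String t : tables) {
                stmt.execute("REFRESH " + t);
                // Record the refresh time so the dashboard can report staleness.
                stmt.execute("INSERT INTO core_research.refresh_log VALUES ('" + t + "', now())");
            }
        }
    }
}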

Which Nifi processor to use for RDBMS Extract

I will explain my use case to help decide which DB extract utility to use.
I need to extract data from SQL Server tables with varying frequency each day. Each extract query is a complex SQL statement involving 5-10 tables in joins etc. with multiple clauses. I have around 20-30 such statements overall.
All these extract queries might be required to run multiple times a day, with varying frequencies each day; it depends on how many times we receive data from the source system, among other cases.
We are planning to use Kafka to publish a message to let the Nifi workflow know whenever an RDBMS table is updated and the flow needs to be triggered (I can't just trigger the Nifi flow based on an "incremental" column value; there might only be all-row-update scenarios, and we might not create new rows in the tables).
How should I go about designing my Nifi flow? There are ExecuteSQL/GenerateTableFetch/ExecuteSQLRecord/QueryDatabaseTable and all sorts of components available. Which one is going to fit my requirement best?
Thanks!
I suggest that you use ExecuteSQL. You can set the query from an attribute or compose it using attributes. The easiest way is to create JSON, then parse that JSON and create attributes. Check this example (link); there the SQL is created from a file, and you can adjust it to create it from Kafka instead.
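For instance, the Kafka side of that design could publish a small JSON trigger message, which the Nifi flow (ConsumeKafka -> EvaluateJsonPath -> ExecuteSQL) parses into attributes and then uses as the query text. The topic name, JSON fields, and query below are illustrative assumptions, not part of the original answer:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class PublishExtractTrigger {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        // One message per source-system update; the "sql" field carries the extract query.
        String message = "{\"table\":\"orders\",\"sql\":\"SELECT o.*, c.name FROM orders o JOIN customers c ON o.cust_id = c.id\"}";

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // In Nifi, EvaluateJsonPath can pull $.sql into an attribute (e.g. sql.query),
            // which ExecuteSQL can then reference as ${sql.query}.
            producer.send(new ProducerRecord<>("rdbms-extract-triggers", message));
        }
    }
}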

Role of H2 database in Apache Ignite

I have an Apache Spark job, and one of its components fires queries at an Apache Ignite data grid using Ignite SQL; the query is a SqlFieldsQuery. I was going through the thread dump, and in one of the executor logs I saw the following:
org.h2.mvstore.db.TransactionStore.begin(TransactionStore.java:229)
org.h2.engine.Session.getTransaction(Session.java:1580)
org.h2.engine.Session.getStatementSavepoint(Session.java:1588)
org.h2.engine.Session.setSavepoint(Session.java:793)
org.h2.command.Command.executeUpdate(Command.java:252)
org.h2.jdbc.JdbcStatement.executeUpdateInternal(JdbcStatement.java:130)
org.h2.jdbc.JdbcStatement.executeUpdate(JdbcStatement.java:115)
org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.connectionForThread(IgniteH2Indexing.java:428)
org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.connectionForSpace(IgniteH2Indexing.java:360)
org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.queryLocalSqlFields(IgniteH2Indexing.java:770)
org.apache.ignite.internal.processors.query.GridQueryProcessor$5.applyx(GridQueryProcessor.java:892)
org.apache.ignite.internal.processors.query.GridQueryProcessor$5.applyx(GridQueryProcessor.java:886)
org.apache.ignite.internal.util.lang.IgniteOutClosureX.apply(IgniteOutClosureX.java:36)
org.apache.ignite.internal.processors.query.GridQueryProcessor.executeQuery(GridQueryProcessor.java:1666)
org.apache.ignite.internal.processors.query.GridQueryProcessor.queryLocalFields(GridQueryProcessor.java:886)
org.apache.ignite.internal.processors.cache.IgniteCacheProxy.query(IgniteCacheProxy.java:698)
com.test.ignite.cache.CacheWrapper.queryFields(CacheWrapper.java:1019)
The last line in my code executes a SQL fields query as follows:
SqlFieldsQuery sql = new SqlFieldsQuery(queryString).setArgs(args);
cache.query(sql);
According to my understanding, Ignite has its own data grid, which it uses to store the cache data and indices. It only makes use of the H2 database to parse the SQL query and get a query execution plan.
But the thread dump shows that updates are being executed and transactions are involved. I don't understand the need for transactions or updates in a SQL SELECT query.
I want to know the following about the role of the H2 database in Ignite:
1. I went into the open-source code of Apache Ignite (version 1.7.0) and saw that it tries to open a connection to a specific schema in the H2 database by executing the query SET SCHEMA schema_name (the connectionForThread() method of the IgniteH2Indexing class). Is one schema or one table created for every cache? If yes, what information does it contain, since all the data is stored in Ignite's data grid?
2. I also came across another interesting thing in the source code: Ignite tries to derive the H2 schema name from the space name (see the queryLocalSqlFields() method of the IgniteH2Indexing class). What does this space name indicate, and is it internal to Ignite or configurable?
3. Would the setting of the schema and the connection to the H2 database happen for each of my SQL queries, and if so, is there any way to avoid this?
Yes, we call executeUpdate to set the schema. In Ignite 2.x we will be able to switch to Connection.setSchema for that. Right now we create a SQL schema for each cache, and you can create multiple tables in it, but this is going to change in the future. It does not actually contain anything; we just utilize some H2 APIs.
Space name is basically the same thing as cache name. You can configure the SQL schema name for a cache using CacheConfiguration.setSqlSchema, as sketched below.
If you run queries using the same cache instance, the schema will not change.
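A minimal sketch of that configuration option; the cache name and schema name are illustrative assumptions:

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.CacheConfiguration;

public class SqlSchemaConfigExample {
    public static void main(String[] args) {
        // Give the cache an explicit SQL schema instead of relying on the
        // space (cache) name that Ignite would otherwise derive it from.
        CacheConfiguration<Integer, String> ccfg =
                new CacheConfiguration<Integer, String>("myCache")
                        .setSqlSchema("MY_SCHEMA")
                        .setIndexedTypes(Integer.class, String.class);

        try (Ignite ignite = Ignition.start()) {
            IgniteCache<Integer, String> cache = ignite.getOrCreateCache(ccfg);
            cache.put(1, "hello");
            // SQL against this cache now targets MY_SCHEMA rather than the cache name.
        }
    }
}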

Using ColumnName instead of ColumnLabel with clojure.java.jdbc

I am currently using clojure.java.jdbc to connect to a SAS server and pull data from it. I can connect and run queries. The one problem I have run into is that the clojure.java.jdbc library uses the column labels when constructing the keys in the results map. In many instances this field comes back blank, and I would prefer to use the column name instead.
One workaround would be to specify the label as part of the query, which I can do if there are no simpler options.
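The distinction lives at the JDBC level: ResultSetMetaData exposes both getColumnName (the underlying column) and getColumnLabel (the alias, which is what clojure.java.jdbc keys results by). A minimal Java sketch, using an in-memory H2 database as a stand-in for the SAS driver:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.Statement;

public class LabelVsName {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:h2:mem:demo");
             Statement stmt = conn.createStatement()) {
            stmt.execute("CREATE TABLE t(real_name INT)");
            try (ResultSet rs = stmt.executeQuery("SELECT real_name AS alias FROM t")) {
                ResultSetMetaData md = rs.getMetaData();
                System.out.println(md.getColumnName(1));  // underlying column name (driver-dependent)
                System.out.println(md.getColumnLabel(1)); // the alias: what becomes the map key
            }
        }
    }
}

Aliasing every column in the query, as the workaround above suggests, guarantees a non-blank label regardless of what the driver reports for the column name.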
