I am having a weird issue. I am building an automated A/B testing tool to compare Impala to other sources. I have 100 queries I am trying to run; they run fine through the other sources, and within Impala through Hue. But when I issue certain of those queries to Impala from Java, the call hangs. Running the same text in Hue works. I figured it was a non-printable character and removed all of them before issuing the query; it still hangs. No exception bubbles up; it just sits there for an hour, and there is no evidence of the query in Cloudera Manager. I am at a loss. This is for SQL A/B testing: I have Jethro running consistently in the 1-2 second range on almost all of the queries we tested, but I can't document the comparison to Impala across the 100 queries until I get over this hump. I tried enabling Simba JDBC logging through the connection string, and that did nothing.
Any thoughts on next steps for debugging the issue? Believe it or not, I am on a locked-down corporate machine and do not have a hex or ASCII character view of my query.
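One workaround for the missing hex view, offered only as an illustration (the query text, connection URL and timeout below are placeholders, not taken from the original post): a small throwaway Java class can print the code point of every suspicious character in the query string, and a JDBC query timeout can be set so that a hang surfaces as an exception instead of an hour of silence, assuming the Impala driver honors setQueryTimeout.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class QueryCharDump {
    public static void main(String[] args) throws Exception {
        String sql = "SELECT 1"; // paste one of the hanging queries here

        // Print the code point of every character outside the printable
        // ASCII range so stray tabs, NBSPs or zero-width characters show up.
        for (int i = 0; i < sql.length(); i++) {
            char c = sql.charAt(i);
            if (c < 0x20 || c > 0x7E) {
                System.out.printf("offset %d: U+%04X%n", i, (int) c);
            }
        }

        // Placeholder connection URL; adjust host/port/auth for your cluster.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:impala://impala-host:21050/default");
             Statement stmt = conn.createStatement()) {
            stmt.setQueryTimeout(120); // fail fast instead of hanging, if honored
            try (ResultSet rs = stmt.executeQuery(sql)) {
                while (rs.next()) {
                    // consume rows
                }
            }
        }
    }
}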
I am doing a POC of Apache Zeppelin for reporting purposes. I created a JDBC interpreter to connect to an Oracle database so that our users can execute queries and generate reports. Currently they have to log in through Toad to fetch data/reports every day, and we must restrict that.
But the problem is that they can also execute UPDATE/DELETE queries. I want our users to run only SELECT queries and not any DML queries or INSERTs.
Is there a way to restrict this? I expected some setting in the configuration and looked through the documentation, but couldn't find any clue.
I use DataGrip to move some data from a MySQL installation to another PostgreSQL database.
That worked like a charm for 3 other tables. The next one, over 500,000 rows, could not be imported.
I use the function "Copy Table To... (F5)".
This is the log.
16:28 Connected
16:30 user#localhost: tmp_post imported to forum_post: 1999 rows (1m 58s 206ms)
16:30 Can't save current transaction state. Check connection and database settings and try again.
For other errors, like wrong data types or null data in not-null columns, a very helpful log is created. But not this time.
The problem is also relevant when using the database plugin in IntelliJ-based IDEs, not only in DataGrip.
The simplest way to solve the issue is just to add "prepareThreshold=0" to your connection string as in this answer:
jdbc:postgresql://ip:port/db_name?prepareThreshold=0
Or, for example, if you are using several settings in the connection string:
jdbc:postgresql://hostmaster.com:6432,hostsecond.com:6432/dbName?targetServerType=master&prepareThreshold=0
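If the same database is also reached from application code rather than from the IDE, the property can be passed programmatically as well. A minimal sketch, assuming placeholder host, database name and credentials:

import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

public class NoServerSidePrepares {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.setProperty("user", "app_user");   // placeholder credentials
        props.setProperty("password", "secret");
        // 0 disables server-side prepared statements entirely, which is what
        // PgBouncer's transaction pooling mode needs.
        props.setProperty("prepareThreshold", "0");

        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://pgbouncer-host:6432/db_name", props)) {
            System.out.println("Connected, autocommit = " + conn.getAutoCommit());
        }
    }
}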
It's a well-known problem when connecting to the PostgreSQL server via PgBouncer, rather than a problem with IntelliJ itself. When loading massive amounts of data into the database, IntelliJ splits the data into chunks and loads them sequentially, each time executing the query and committing the data. By default, the PostgreSQL JDBC driver starts using server-side prepared statements after 5 executions of a query.
The driver uses server side prepared statements by default when
PreparedStatement API is used. In order to get to server-side prepare,
you need to execute the query 5 times (that can be configured via
prepareThreshold connection property). An internal counter keeps track
of how many times the statement has been executed and when it reaches
the threshold it will start to use server side prepared statements.
Probably your PgBouncer runs with transaction pooling, and PgBouncer doesn't support prepared statements with transaction pooling.
How to use prepared statements with transaction pooling?
To make prepared statements work in this mode would need PgBouncer to
keep track of them internally, which it does not do. So the only way
to keep using PgBouncer in this mode is to disable prepared statements
in the client
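To see the mechanism described above in isolation, here is a small reproduction sketch (endpoint and credentials are placeholders; it assumes PgBouncer is running with transaction pooling). The same statement is executed more than five times so that the driver switches to a named server-side prepared statement, which a transaction-pooled PgBouncer may then route to a backend that has never seen it:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class PrepareThresholdRepro {
    public static void main(String[] args) throws Exception {
        // Placeholder PgBouncer endpoint configured with pool_mode = transaction.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://pgbouncer-host:6432/db_name", "app_user", "secret");
             PreparedStatement ps = conn.prepareStatement("SELECT ?")) {
            // With the default prepareThreshold of 5, runs 1-5 use unnamed
            // statements; from run 6 onwards the driver uses a named
            // server-side prepared statement, which can trigger
            // "prepared statement ... does not exist" behind PgBouncer.
            for (int run = 1; run <= 10; run++) {
                ps.setInt(1, run);
                try (ResultSet rs = ps.executeQuery()) {
                    rs.next();
                    System.out.println("run " + run + " -> " + rs.getInt(1));
                }
            }
        }
    }
}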
You can verify that the issue is indeed caused by prepared statements being used with PgBouncer by looking at the IntelliJ log files. Go to Help -> Show Log in Explorer and search for the "org.postgresql.util.PSQLException: ERROR: prepared statement" exception.
2022-04-08 12:32:56,484 [693272684] WARN - j.database.dbimport.ImportHead - ERROR: prepared statement "S_3649" does not exist
java.sql.SQLException: ERROR: prepared statement "S_3649" does not exist
at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2440)
at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2183)
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:308)
at org.postgresql.jdbc.PgConnection.executeTransactionCommand(PgConnection.java:755)
at org.postgresql.jdbc.PgConnection.commit(PgConnection.java:777)
How does OBIEE generate the SQL statements that are then run against the target database? I have a report that generates one SQL statement when executed against an Oracle database and a completely different one when executed via a JDBC driver against Apache Drill. My problem is that in the second case the query is not even syntactically valid.
I've read this - http://gerardnico.com/wiki/dat/obiee/query_compiler
but I still don't understand the mechanism through which OBIEE decides on the actual query to be executed based on the driver.
OBIEE uses a "common metadata model" known as the RPD. This has a logical model of your data, along with the physical data source(s) for it. When a user runs a report it is submitted as a "logical" query that the BI Server then compiles using the RPD to generate the necessary SQL query (or queries) against the data sources.
Whilst Hive and Impala definitely work with OBIEE, I've not heard of Drill being used successfully. If you've got the connectivity working, then to sort out the query syntax it generates you need to adjust the DBFeatures configuration, which OBIEE uses to understand what SQL statements are valid for a given database. So if Drill doesn't support, for example, INTERSECT, you simply untick INTERSECT_SUPPORTED (I'm paraphrasing the exact dialog terminology).
Team,
I am using HUE-BEESWAX (the Hive UI) to execute Hive queries. So far I have always been able to access the results of queries executed on the same day, but today I see a lot of query results shown as expired despite having run them just an hour ago.
My questions are:
When does a query result set become expired?
What settings control this?
Is it possible to retain this result set somewhere in HDFS? (How?)
Regards
My understanding is that it's controlled by Hive, not Hue (Beeswax). When HiveServer is restarted it cleans up the scratch directories.
This is controlled by the setting hive.start.cleanup.scratchdir.
Are you restarting your HiveServers?
Looking through some code, I found that Beeswax sets the scratch directory to "/tmp/hive-beeswax-" + Hadoop Username.
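On the last question about retaining results in HDFS, which the answer above does not cover: one option (not from the original answer; the table, columns, endpoint and path below are purely illustrative) is to materialize the query output yourself instead of relying on the scratch directory, for example by wrapping the query in an INSERT OVERWRITE DIRECTORY executed over the Hive JDBC driver:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class PersistHiveResult {
    public static void main(String[] args) throws Exception {
        // Placeholder HiveServer2 endpoint and HDFS output directory.
        String url = "jdbc:hive2://hiveserver-host:10000/default";
        String hdfsDir = "/user/reports/daily_result";

        try (Connection conn = DriverManager.getConnection(url, "hive_user", "");
             Statement stmt = conn.createStatement()) {
            // The output directory lives in HDFS and survives HiveServer
            // restarts, unlike the /tmp scratch directories.
            stmt.execute(
                "INSERT OVERWRITE DIRECTORY '" + hdfsDir + "' "
                + "SELECT col1, col2 FROM some_table WHERE dt = '2015-01-01'");
        }
    }
}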
We have a problem with a slow INSERT statement that uses 40 bind variables as column values. It takes several seconds when run over a WAN link, and we were not able to nail down the problem until we used a network analyzer: every single execution of this prepared query required exchanging over 120 packets between client and server to complete. What can we do to execute it more efficiently?
When I run the same insert with actual values (without bind variables) from the same host, it completes in tens of milliseconds. There is nothing special about the parameters; they are only short varchars and numbers.
We are using Delphi 6 with ODAC; we tried various versions of ODAC and the Oracle client to no avail. On the server side we tried both Oracle 10 and 11.
TNS is not designed to work well over a WAN.
If possible, rewrite your application to use another network layer, like HTTP, which is more efficient.
You can do that using Oracle HTTP Server, for instance.
Have you looked at External Tables? They replace the need for SQL*Loader.
They require Oracle 9i or above, though.