Hive JDBC getConnection does not return - jdbc

I'm following the Hive JDBC tutorial, but I could not get it working. When it tries to get the connection, it just hangs. It does not report any error either. I'm sure the Hive server is running. Any help?
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
public class HiveJdbcClient {
private static String driverName = "org.apache.hadoop.hive.jdbc.HiveDriver";
public static void main(String[] args){
try {
Class.forName(driverName);
} catch (ClassNotFoundException e) {
e.printStackTrace();
System.exit(1);
}
try{
Connection con = DriverManager.getConnection("jdbc:hive://localhost:10000/default", "", "");
System.out.println("got the connection");
}catch(SQLException e){
e.printStackTrace();
}
}
}
output of the netstat:
$ sudo netstat -anlp | grep 10000
Password:
tcp 0 0 0.0.0.0:10000 0.0.0.0:* LISTEN 27738/java
tcp 107 0 127.0.0.1:10000 127.0.0.1:45910 ESTABLISHED 27738/java
tcp 0 0 127.0.0.1:33665 127.0.0.1:10000 ESTABLISHED 24475/java
tcp 0 0 127.0.0.1:45910 127.0.0.1:10000 ESTABLISHED 7445/java
tcp 107 0 127.0.0.1:10000 127.0.0.1:33665 ESTABLISHED 27738/java

Naresh: Try stopping the Thrift server, then move to the HIVE_HOME/bin directory in your terminal and start the Hive Thrift server with the ./hive --service hiveserver 10000 & command. Then try running the program. Create a table as in the Hive client wiki example, then run a show tables query in the next step. Let us know the result once these steps are followed. We can have a discussion after that.

You can do the following to pinpoint where the hang is happening. Here is an example that I did to trace it in my broken Hive JDBC connection. Note that this is not a concrete solution to any generic hive connection hanging error.
This is an answer to the question: "how can I find out where my JDBC hive connection is hanging? "
What makes this hard to trace is the JDBC dynamic invocation. Instead, you can manually create a HiveConnection instance. That allows you to add some tracing into the code directly, to see where the hang is happening.
I've traced this by doing the following.
* USING LOG4J *
The Thrift and other Hive JDBC classes use log4j when connecting. If you turn DEBUG logging on, you can see fine-grained errors. You can do this easily by adding
BasicConfigurator.configure();
to your client app. In any case, doing this led me to find that the connection seems to stall in the SASL transport layer. I guess it could be security related, but I would assume that a security error would still return rather than hang, so I think this may be worthy of a JIRA. I've posted a follow-up question:
How can I trace the failure of TSaslTransport (hive related)
* ANOTHER METHOD *
1) You can grab a copy of the "HiveConnection" class from github, or wherever, and instantiate a new one:
String url = String.format("jdbc:hive2://%s:%s/default",
server,
port);
Properties p = new Properties();
p.setProperty("host", con);
Connection jdbc = new HiveConnection(url,p);
Then, you can add your debugger hooks or log statements to the HiveConnection() class.
Ultimately, when I had this error, I traced it to:
openTransport
Which ultimately creates a
org.apache.thrift.transport.TSaslClientTransport
instance.
And the hang happens in this code block:
try {
System.out.println(".....1 carrying on... attempting to open. " + transport.getClass().getName());
transport.open();
System.out.println("done open.");
}
catch (TTransportException e) {
System.out.println("2 fail .");
e.printStackTrace();
throw new SQLException("Could not establish connection to "
+ uri + ": " + e.getMessage(), " 08S01");
}
FYI, I've posted a follow-up regarding why my connection failed. It might be related to yours as well: How can I trace the failure of TSaslTransport (hive related)
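If you cannot patch HiveConnection, another way to see where the call is stuck is to run it on a separate thread and print that thread's stack once a deadline passes. The sketch below is only an illustration of that idea: the class name HangProbe, the method stackIfHung, and the 10 second deadline are made up for this example, and the URL is the one from the question.

```java
import java.sql.DriverManager;
import java.util.concurrent.Callable;

public class HangProbe {

    /**
     * Runs the task on a daemon thread. If it is still running after
     * timeoutMillis (must be > 0), returns that thread's current stack trace;
     * otherwise returns an empty array.
     */
    static StackTraceElement[] stackIfHung(Callable<?> task, long timeoutMillis)
            throws InterruptedException {
        Thread worker = new Thread(() -> {
            try {
                task.call();
            } catch (Exception e) {
                e.printStackTrace(); // the call failed instead of hanging
            }
        }, "hang-probe");
        worker.setDaemon(true); // do not keep the JVM alive if the call never returns
        worker.start();
        worker.join(timeoutMillis);
        return worker.isAlive() ? worker.getStackTrace() : new StackTraceElement[0];
    }

    public static void main(String[] args) throws Exception {
        // URL taken from the question; adjust it to your own server.
        StackTraceElement[] frames = stackIfHung(
                () -> DriverManager.getConnection("jdbc:hive://localhost:10000/default", "", ""),
                10_000);
        for (StackTraceElement frame : frames) {
            System.out.println("  at " + frame);
        }
    }
}
```

If the connection hangs, the printed frames show which method is blocked (in my case that would have pointed at the transport open call). Running jstack against the hung process gives the same information without any code changes.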

I had the same problem. Check these params:
driverName = "org.apache.hive.jdbc.HiveDriver"
con = DriverManager.getConnection("jdbc:hive2://192.168.1.93:10000/default", "", "");
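The two settings above belong together: the org.apache.hadoop.hive.jdbc.HiveDriver class goes with jdbc:hive:// URLs (the original HiveServer), while org.apache.hive.jdbc.HiveDriver goes with jdbc:hive2:// URLs (HiveServer2). Here is a small sketch that keeps the pair consistent. The class and method names are hypothetical, and it assumes the hang comes from mixing the two generations, which may not be the cause in every setup.

```java
import java.sql.Connection;
import java.sql.DriverManager;

public class HiveJdbcSettings {

    /** Driver class that matches the server generation. */
    static String driverClass(boolean hiveServer2) {
        return hiveServer2
                ? "org.apache.hive.jdbc.HiveDriver"
                : "org.apache.hadoop.hive.jdbc.HiveDriver";
    }

    /** JDBC URL whose scheme matches the same server generation. */
    static String url(boolean hiveServer2, String host, int port, String database) {
        String scheme = hiveServer2 ? "jdbc:hive2" : "jdbc:hive";
        return scheme + "://" + host + ":" + port + "/" + database;
    }

    /** Loads the matching driver and opens a connection with empty credentials. */
    static Connection connect(boolean hiveServer2, String host, int port, String database)
            throws Exception {
        Class.forName(driverClass(hiveServer2));
        // Only a hint: not every driver version honors the login timeout.
        DriverManager.setLoginTimeout(10);
        return DriverManager.getConnection(url(hiveServer2, host, port, database), "", "");
    }
}
```

For example, HiveJdbcSettings.connect(true, "192.168.1.93", 10000, "default") uses the HiveServer2 driver and URL from this answer.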

Related

MariaDB Connector J : autoReconnect does not work for Basic Failover

From https://mariadb.com/kb/en/library/about-mariadb-connector-j/, for the option autoReconnect: when this parameter is enabled and a Failover and Load Balancing Mode is not in use, the connector will simply try to reconnect to its host after a failure. This is referred to as Basic Failover.
But the problem is that the reconnect does not work after server failure. The test code is as follows:
@Test
public void waitTimeoutResultSetTest() throws SQLException, InterruptedException {
try (Connection connection = setBlankConnection("&autoReconnect=true")) {
Statement stmt = connection.createStatement();
ResultSet rs = stmt.executeQuery("SELECT 1");
assertTrue(rs.next());
stmt.execute("set session wait_timeout=1");
Thread.sleep(3000); // Wait for the server to kill the connection
try {
rs = stmt.executeQuery("show databases;");
assertTrue(rs.next());
System.out.println("position 1");
} catch (SQLException e) {
//normal exception
System.out.println("position 2");
}
}
}
With autoReconnect, I expected execution to reach position 1, but it actually reaches position 2, with the exception "Connection reset by peer: socket write error."
My question is whether basic failover does not work or my test code is wrong. I cannot find other information on the web; could you kindly give me some explanation if you know about it?

Connection time out in jpos client

I am using a jPOS client (in one of the classes of a Java Spring MVC program) to connect to an ISO 8583 based server. For some reason the server is sometimes unable to respond, so my program keeps waiting for the response and hangs. What is the proper way to implement a connection timeout?
My client program looks like this:
public FieldsModal sendFundTransfer(FieldsModal field){
try {
JposLogger logger = new JposLogger(ISO_LOG_LOCATION);
org.jpos.iso.ISOPackager customPackager = new GenericPackager(ISO_PACKAGER);
ISOChannel channel = new PostChannel(ISO_SERVER_IP, Integer.parseInt(ISO_SERVER_PORT), customPackager);// live
logger.jposlogconfig(channel);
channel.connect();
log4j.info("Connection established using PostChannel");
ISOMsg m = new ISOMsg();
m.set(0, field.getMti());
//m.set(2, field.getField2());
m.set(3, field.getField3());
m.set(4, field.getField4());
m.set(11, field.getField11());
m.set(12, field.getField12());
m.set(17, field.getField17());
m.set(24, field.getField24());
m.set(32, field.getField32());
m.set(34, field.getField34());
m.set(41, field.getField41());
m.set(43, field.getField43());
m.set(46, field.getField46());
m.set(49, field.getField49());
m.set(102,field.getField102());
m.set(103,field.getField103());
m.set(123, field.getField123());
m.set(125, field.getField125());
m.set(126, field.getField126());
m.set(127, field.getField127());
m.setPackager(customPackager);
System.out.println(ISOUtil.hexdump(m.pack()));
channel.send(m);
log4j.info("Message has been send");
ISOMsg r = channel.receive();
r.setPackager(customPackager);
System.out.println(ISOUtil.hexdump(r.pack()));
channel.disconnect();
}catch (Exception err) {
System.out.println("sendFundTransfer : " + err);
}
return field;
}
Well, the real proper way would be to use Q2. Given that you don't need a persistent connection, you could just set a timeout for the channel.
PostChannel channel = new PostChannel(ISO_SERVER_IP, Integer.parseInt(ISO_SERVER_PORT), customPackager);// live
channel.setTimeout(timeout); // timeout in milliseconds
This way the channel will auto-disconnect if nothing happens during the time specified by timeout, and your call to receive will throw an exception.
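To see the mechanism in isolation, here is a plain java.net sketch with no jPOS involved. It assumes the channel timeout ends up as a socket read timeout, which is how I understand it but have not verified for every jPOS version. The class name ReadTimeoutDemo and the silent local server are made up for the demonstration.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class ReadTimeoutDemo {

    /** Returns true if a read from a server that never replies times out. */
    static boolean readTimesOut(int timeoutMillis) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // The server socket accepts the TCP handshake but never sends a byte.
        try (ServerSocket silentServer = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, silentServer.getLocalPort())) {
            client.setSoTimeout(timeoutMillis); // read timeout in milliseconds
            try {
                client.getInputStream().read(); // would block forever without the timeout
                return false;
            } catch (SocketTimeoutException expected) {
                return true; // the blocked read gave up after timeoutMillis
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("timed out: " + readTimesOut(500));
    }
}
```

In the jPOS client the equivalent is to catch the exception thrown by channel.receive() and disconnect the channel in a finally block, so a silent server no longer hangs the request thread.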
The alternative is using Q2 and a mux (see QMUX, for which you need to run Q2, or ISOMUX which is kind of deprecated).

Using JDBC Pool in Oracle WebLogic Server

I have a question that has puzzled me for days. I am using a JDBC connection pool in Oracle WebLogic Server for my REST API calls. The package was deployed and handles incoming requests correctly.
But after each new request, I see a new session row with "INACTIVE" status at the DB session level, even though I deliberately close the DB connection in the code. It seems this session is kept forever, and eventually it kills the pool.
Here is the sample of my code, where "apple" is my connection pool name.
Connection connection = JDBCConnection.getJDBCConnction(apple);
Statement stmt = null;
String query ="select name from user";
String hosts="";
try {
stmt = connection.createStatement();
ResultSet rs = stmt.executeQuery(query);
while (rs.next()) {
name = rs.getString("name");
}
} finally {
connection.close();
}
Is there anything extra I need to do ?
Thanks,
Jack
You are likely running into an issue where you are closing the Connection but it does not result in closing the ResultSet or the Statement.
The topic has been explained extensively here and here on SO.
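One way to make sure all three objects are closed is try-with-resources, which closes them in reverse order of declaration even when an exception is thrown. This is a sketch only: it assumes you can get the pool as a javax.sql.DataSource (for example via JNDI), and the class and method names are invented for the example. The query is the one from the question.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;
import javax.sql.DataSource;

public class CloseAllResources {

    /** Reads the "name" column; ResultSet, Statement and Connection are all closed on exit. */
    static List<String> readNames(DataSource pool) throws SQLException {
        List<String> names = new ArrayList<>();
        try (Connection connection = pool.getConnection();
             Statement stmt = connection.createStatement();
             ResultSet rs = stmt.executeQuery("select name from user")) {
            while (rs.next()) {
                names.add(rs.getString("name"));
            }
        } // closed here in reverse order: rs, stmt, connection
        return names;
    }
}
```

With a pooled data source, closing the connection returns it to the pool rather than ending the database session, so an INACTIVE session that stays around can simply be an idle pooled connection.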

JDBC statement pooling with DB2 does not have significant time difference

I'm using the JDBC DB2 driver, a.k.a. JT400, to connect to a DB2 server on Application System/400, a midrange computer system.
My goal is to insert into three tables from outside the IBM mainframe, i.e. from a cloud instance (e.g. Amazon WS).
To make the performance better:
1) I am using already established connections to connect to db2.
(using org.apache.commons.dbcp.BasicDataSource or com.ibm.as400.access.AS400JDBCManagedConnectionPoolDataSource, both are working fine.)
public class AS400JDBCManagedConnectionPoolDataSource extends AS400JDBCManagedDataSource implements ConnectionPoolDataSource, Referenceable, Serializable {
}
public class AS400JDBCManagedDataSource extends ToolboxWrapper implements DataSource, Referenceable, Serializable, Cloneable {
}
2) I want to cache the INSERT INTO statements for all three tables, so that I don't have to send and compile the query every time, which is expensive. I would instead pass only the parameters. (Obviously I am doing this using JDBC prepared statements.)
Based on the official IBM document Optimize Access to DB2 for i5/OS from Java and WebSphere, pages 17-20 (Enabling Extended Dynamic Support), it's possible to cache the statement with AS400JDBCManagedConnectionPoolDataSource.
BUT the problem is that the INSERT INTO queries are being compiled each time, which takes 200ms * 3 queries = 600ms each time.
The example I'm using:
public class CustomerOrderEventHandler extends MultiEventHandler {
private static Logger logger = LogManager.getLogger(CustomerOrderEventHandler.class);
//private BasicDataSource establishedConnections = new BasicDataSource();
//private DB2SimpleDataSource nativeEstablishedConnections = new DB2SimpleDataSource();
private AS400JDBCManagedConnectionPoolDataSource dynamicEstablishedConnections =
new AS400JDBCManagedConnectionPoolDataSource();
private State3 state3;
private State2 state2;
private State1 state1;
public CustomerOrderEventHandler() throws SQLException {
dynamicEstablishedConnections.setServerName(State.server);
dynamicEstablishedConnections.setDatabaseName(State.DATABASE);
dynamicEstablishedConnections.setUser(State.user);
dynamicEstablishedConnections.setPassword(State.password);
dynamicEstablishedConnections.setSavePasswordWhenSerialized(true);
dynamicEstablishedConnections.setPrompt(false);
dynamicEstablishedConnections.setMinPoolSize(3);
dynamicEstablishedConnections.setInitialPoolSize(5);
dynamicEstablishedConnections.setMaxPoolSize(50);
dynamicEstablishedConnections.setExtendedDynamic(true);
Connection connection = dynamicEstablishedConnections.getConnection();
connection.close();
}
public void onEvent(CustomerOrder orderEvent){
long start = System.currentTimeMillis();
Connection dbConnection = null;
try {
dbConnection = dynamicEstablishedConnections.getConnection();
long connectionSetupTime = System.currentTimeMillis() - start;
state3 = new State3(dbConnection);
state2 = new State2(dbConnection);
state1 = new State1(dbConnection);
long initialisation = System.currentTimeMillis() - start - connectionSetupTime;
int[] state3Result = state3.apply(orderEvent);
int[] state2Result = state2.apply(orderEvent);
long state1Result = state1.apply(orderEvent);
dbConnection.commit();
logger.info("eventId="+ getEventId(orderEvent) +
",connectionSetupTime=" + connectionSetupTime +
",queryPreCompilation=" + initialisation +
",insertionOnlyTimeTaken=" +
(System.currentTimeMillis() - (start + connectionSetupTime + initialisation)) +
",insertionTotalTimeTaken=" + (System.currentTimeMillis() - start));
} catch (SQLException e) {
logger.error("Error updating the order states.", e);
if(dbConnection != null) {
try {
dbConnection.rollback();
} catch (SQLException e1) {
logger.error("Error rolling back the state.", e1);
}
}
throw new CustomerOrderEventHandlerRuntimeException("Error updating the customer order states.", e);
}
}
private Long getEventId(CustomerOrder order) {
return Long.valueOf(order.getMessageHeader().getCorrelationId());
}
}
And the States with insert commands look like below,
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
public class State2 extends State {
private static Logger logger = LogManager.getLogger(DetailState.class);
Connection connection;
PreparedStatement preparedStatement;
String detailsCompiledQuery = "INSERT INTO " + DATABASE + "." + getStateName() +
"(" + DetailState.EVENT_ID + ", " +
State2.ORDER_NUMBER + ", " +
State2.SKU_ID + ", " +
State2.SKU_ORDERED_QTY + ") VALUES(?, ?, ?, ?)";
public State2(Connection connection) throws SQLException {
this.connection = connection;
this.preparedStatement = this.connection.prepareStatement(detailsCompiledQuery); // this is taking ~200ms each time
this.preparedStatement.setPoolable(true); //might not be required, not sure
}
public int[] apply(CustomerOrder event) throws StateException {
event.getMessageBody().getDetails().forEach(detail -> {
try {
preparedStatement.setLong(1, getEventId(event));
preparedStatement.setString(2, getOrderNo(event));
preparedStatement.setInt(3, detail.getSkuId());
preparedStatement.setInt(4, detail.getQty());
preparedStatement.addBatch();
} catch (SQLException e) {
logger.error(e);
throw new StateException("Error setting up data", e);
}
});
long startedTime = System.currentTimeMillis();
int[] inserted = new int[0];
try {
inserted = preparedStatement.executeBatch();
} catch (SQLException e) {
throw new StateException("Error updating allocations data", e);
}
logger.info("eventId="+ getEventId(event) +
",state=details,insertionTimeTaken=" + (System.currentTimeMillis() - startedTime));
return inserted;
}
@Override
protected String getStateName() {
return properties.getProperty("state.order.details.name");
}
}
So the flow is: each time an event is received (e.g. CustomerOrder), it gets the established connection and then asks the states to initialise their statements.
The metrics for timing look as below,
for the first event, it takes 580ms to create the preparedStatements for 3 tables.
{"timeMillis":1489982655836,"thread":"ScalaTest-run-running-CustomerOrderEventHandlerSpecs","level":"INFO","loggerName":"com.xyz.customerorder.events.handler.CustomerOrderEventHandler",
"message":"eventId=1489982654314,connectionSetupTime=1,queryPreCompilation=580,insertionOnlyTimeTaken=938,insertionTotalTimeTaken=1519","endOfBatch":false,"loggerFqcn":"org.apache.logging.log4j.spi.AbstractLogger","threadId":1,"threadPriority":5}
for the second event, it takes 470ms to prepare the statements for 3 tables, which is less than for the first event, but only by < 100ms. I assumed it would be drastically less, as it should not even get to compilation.
{"timeMillis":1489982667243,"thread":"ScalaTest-run-running-PurchaseOrderEventHandlerSpecs","level":"INFO","loggerName":"com.xyz.customerorder.events.handler.CustomerOrderEventHandler",
"message":"eventId=1489982665456,connectionSetupTime=0,queryPreCompilation=417,insertionOnlyTimeTaken=1363,insertionTotalTimeTaken=1780","endOfBatch":false,"loggerFqcn":"org.apache.logging.log4j.spi.AbstractLogger","threadId":1,"threadPriority":5}
What I'm thinking is that since I'm closing the preparedStatement for that particular connection, it does not even exist for a new connection. If that's the case, what's the point of having statement caching at all in a multi-threaded environment?
The documentation has a similar example, where it's making transactions inside the same connection, which is not the case for me, as I need to have multiple connections at the same time.
Questions
Primary
Q1) Is DB2 JDBC driver caching the statements at all, between multiple connections? Because I don't see much difference while preparing the statement. (see example, first one takes ~600ms, second one takes ~500ms)
References
ODP = Open Data Path
SQL packages
SQL packages are permanent objects used to store information related
to prepared SQL statements. They can be used by the IBM iSeries Access
for the IBM Toolbox for
Java JDBC driver. They are also used by applications which use the
QSQPRCED (SQL Process Extended Dynamic) API interface.
In the case JDBC, the existence of the SQL package is
checked when the client application issues the first prepare of a SQL
Statement. If the package does not exist, it is created at that time
(even though it may not yet contain any SQL statements)
Tomcat jdbc connection pool configuration - DB2 on iSeries(AS400)
IBM Data Server Driver for JDBC and SQLJ statement caching
A couple of important things to note regarding statement caching:
Because Statement objects are child objects of a given Connection, once the Connection is closed all child objects (e.g. all Statement objects) must also be closed.
It is not possible to associate a statement from one connection with a different connection.
Statement pooling may or may not be done by a given JDBC driver. Statement pooling may also be performed by a connection management layer (i.e. application server)
Per JDBC spec, default value for Statement.isPoolable() == false and PreparedStatement.isPoolable() == true, however this flag is only a hint to the JDBC driver. There is no guarantee from the spec that statement pooling will occur.
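To make the first two notes concrete, here is a rough sketch of what a connection management layer does: it keeps one small cache per physical connection, keyed by SQL text, and closes the statements when that connection goes away. StatementCache is a made-up class for illustration, not part of JT400, DBCP or WebSphere, and it only helps if the same physical connection is reused across events.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical per-connection statement cache with least-recently-used eviction. */
public class StatementCache implements AutoCloseable {
    private final Connection connection;
    private final Map<String, PreparedStatement> cache;

    public StatementCache(Connection connection, int maxSize) {
        this.connection = connection;
        // accessOrder=true makes iteration order least-recently-used first.
        this.cache = new LinkedHashMap<String, PreparedStatement>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, PreparedStatement> eldest) {
                if (size() <= maxSize) {
                    return false;
                }
                try {
                    eldest.getValue().close(); // release the evicted statement
                } catch (SQLException ignored) {
                    // best effort: the entry is dropped either way
                }
                return true;
            }
        };
    }

    /** Returns the cached statement for this SQL text, preparing it only on a miss. */
    public PreparedStatement prepare(String sql) throws SQLException {
        PreparedStatement ps = cache.get(sql);
        if (ps == null || ps.isClosed()) {
            ps = connection.prepareStatement(sql);
            cache.put(sql, ps);
        }
        return ps;
    }

    /** Closes every cached statement; call this before the connection itself is closed. */
    @Override
    public void close() throws SQLException {
        for (PreparedStatement ps : cache.values()) {
            ps.close();
        }
        cache.clear();
    }
}
```

Because the cache is tied to one Connection, a statement prepared on one connection is never handed to another, which matches the second note above.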
First off, I am not sure if the JT400 driver does statement caching. The document you referenced in your question comment, Optimize Access to DB2 for i5/OS from Java and WebSphere, is specific to using the JT400 JDBC driver with WebSphere application server, and on slide #3 it indicates that statement caching comes from the WebSphere connection management layer, not the native JDBC driver layer. Given that, I'm going to assume that the JT400 JDBC driver doesn't support statement caching on its own.
So at this point you are probably going to want to plug into some sort of app server (unless you want to implement statement caching on your own, which is sort of re-inventing the wheel). I know for sure that both WebSphere Application Server products (traditional and Liberty) support statement caching for any JDBC driver.
For WebSphere Liberty (the newer product), the data source config is the following:
<dataSource jndiName="jdbc/myDS" statementCacheSize="10">
<jdbcDriver libraryRef="DB2iToolboxLib"/>
<properties.db2.i.toolbox databaseName="YOURDB" serverName="localhost"/>
</dataSource>
<library id="DB2iToolboxLib">
<fileset dir="/path/to/jdbc/driver/dir" includes="jt400.jar"/>
</library>
The key bit being the statementCacheSize attribute of <dataSource>, which has a default value of 10.
(Disclaimer, I'm a WebSphere dev, so I'm going to talk about what I know)
First off, the IBM i Java documentation is here: IBM Toolbox for Java
Secondly, I don't see where you are setting the "extended dynamic" property to true which provides
a mechanism for caching dynamic SQL statements on the server. The first
time a particular SQL statement is prepared, it is stored in a SQL
package on the server. If the package does not exist, it is
automatically created. On subsequent prepares of the same SQL
statement, the server can skip a significant part of the processing by
using information stored in the SQL package. If this is set to "true",
then a package name must be set using the "package" property.
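For reference, the same properties can also be passed on the JDBC URL. The sketch below just builds such a URL; the host, package and library names are placeholders, and the helper class is hypothetical. The property names ("extended dynamic", "package", "package library", "package cache") are the Toolbox ones as far as I know, so check them against the property list for your driver version.

```java
public class Jt400Url {

    /** Builds a JT400 URL that enables extended dynamic support with a named SQL package. */
    static String extendedDynamicUrl(String host, String packageName, String packageLibrary) {
        return "jdbc:as400://" + host
                + ";extended dynamic=true"               // store prepared statements in a SQL package
                + ";package=" + packageName               // required when extended dynamic is true
                + ";package library=" + packageLibrary    // library that holds the package
                + ";package cache=true";                  // also cache the package on the client
    }

    public static void main(String[] args) {
        // Placeholder values, not taken from the question.
        System.out.println(extendedDynamicUrl("myhost", "ORDPKG", "MYLIB"));
    }
}
```

The data source classes expose the same settings through setters such as setExtendedDynamic, which the question already calls; the package name is the piece that still has to be supplied.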
I think you're missing some steps in using the managed pool...here's the first example in the IBM docs
import javax.naming.Context;
import javax.naming.InitialContext;
import javax.sql.DataSource;
import com.ibm.as400.access.AS400JDBCManagedConnectionPoolDataSource;
import com.ibm.as400.access.AS400JDBCManagedDataSource;
public class TestJDBCConnPoolSnippet
{
void test()
{
AS400JDBCManagedConnectionPoolDataSource cpds0 = new AS400JDBCManagedConnectionPoolDataSource();
// Set general datasource properties. Note that both connection pool datasource (CPDS) and managed
// datasource (MDS) have these properties, and they might have different values.
cpds0.setServerName(host);
cpds0.setDatabaseName(host);//iasp can be here
cpds0.setUser(userid);
cpds0.setPassword(password);
cpds0.setSavePasswordWhenSerialized(true);
// Set connection pooling-specific properties.
cpds0.setInitialPoolSize(initialPoolSize_);
cpds0.setMinPoolSize(minPoolSize_);
cpds0.setMaxPoolSize(maxPoolSize_);
cpds0.setMaxLifetime((int)(maxLifetime_/1000)); // convert to seconds
cpds0.setMaxIdleTime((int)(maxIdleTime_/1000)); // convert to seconds
cpds0.setPropertyCycle((int)(propertyCycle_/1000)); // convert to seconds
//cpds0.setReuseConnections(false); // do not re-use connections
// Set the initial context factory to use.
System.setProperty(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.fscontext.RefFSContextFactory");
// Get the JNDI Initial Context.
Context ctx = new InitialContext();
// Note: The following is an alternative way to set context properties locally:
// Properties env = new Properties();
// env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.fscontext.RefFSContextFactory");
// Context ctx = new InitialContext(env);
ctx.rebind("mydatasource", cpds0); // We can now do lookups on cpds, by the name "mydatasource".
// Create a standard DataSource object that references it.
AS400JDBCManagedDataSource mds0 = new AS400JDBCManagedDataSource();
mds0.setDescription("DataSource supporting connection pooling");
mds0.setDataSourceName("mydatasource");
ctx.rebind("ConnectionPoolingDataSource", mds0);
DataSource dataSource_ = (DataSource)ctx.lookup("ConnectionPoolingDataSource");
AS400JDBCManagedDataSource mds_ = (AS400JDBCManagedDataSource)dataSource_;
boolean isHealthy = mds_.checkPoolHealth(false); //check pool health
Connection c = dataSource_.getConnection();
}
}

kerberos auth and connection pooling in jdbc

I've got a Java web application running on Tomcat with SSO via SPNEGO/Kerberos, and I want to pass the Kerberos ticket to the database, Oracle DB in my case (like impersonation in MS products). I've found an example implementation (http://docs.oracle.com/cd/B28359_01/java.111/b31224/clntsec.htm):
Connection conn = (Connection)Subject.doAs(specificSubject, new PrivilegedExceptionAction() {
public Object run() {
Connection con = null;
Properties prop = new Properties();
prop.setProperty(AnoServices.AUTHENTICATION_PROPERTY_SERVICES,"("+AnoServices.AUTHENTICATION_KERBEROS5 + ")");
try {
OracleDriver driver = new OracleDriver();
con = driver.connect(url, prop);
}catch (Exception except){
except.printStackTrace();
}
return con;
}
});
String auth = ((OracleConnection)conn).getAuthenticationAdaptorName();
System.out.println("Authentication adaptor="+auth);
printUserName(conn);
conn.close();
But, as is well known, creating a new connection is an expensive operation. Connection pooling (like c3p0) is commonly used to solve this problem, but I can't find an example of how to combine the code above with a connection pool. Is there any example?
