PyHive ignoring Hive config

I'm intermittently getting the error message
DAG did not succeed due to VERTEX_FAILURE.
when running Hive queries via PyHive. Hive is running on an EMR cluster where hive.vectorized.execution.enabled is set to false in hive-site.xml specifically to avoid this error.
I can set this property through the configuration on the Hive connection, and my query has run successfully every time since. However, I want to confirm that this is what fixed the issue and that hive-site.xml really is being ignored.
Can anyone confirm whether this is the expected behavior? Alternatively, is there any way to inspect the Hive configuration via PyHive? I've not been able to find one.
Thanks!

PyHive is a thin client that connects to HiveServer2, just like a Java or C client (via JDBC or ODBC). It does not use any Hadoop configuration files on your local machine. The HS2 session starts with whatever properties are set server-side.
The same goes for Impyla, by the way.
So it's your responsibility to set custom session properties from your Python code, e.g. by executing this statement...
SET hive.vectorized.execution.enabled=false
... before running your SELECT.
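Here is a minimal sketch of both approaches with PyHive (the host, credentials and table name are placeholders, and the SET-with-no-value trick for reading a property back is worth verifying against your Hive version):
from pyhive import hive

# Option 1: pass the override when the HS2 session is opened.
# It applies only to this session; hive-site.xml on the cluster is untouched.
conn = hive.connect(
    host='emr-master.example.com',   # placeholder
    port=10000,
    username='hadoop',               # placeholder
    configuration={'hive.vectorized.execution.enabled': 'false'},
)
cur = conn.cursor()

# Option 2: issue the SET statement yourself before the query.
cur.execute('SET hive.vectorized.execution.enabled=false')

# To inspect a property, execute SET with just the key; HS2 should return
# a single "key=value" row, which answers the "can I check it" part.
cur.execute('SET hive.vectorized.execution.enabled')
print(cur.fetchall())

cur.execute('SELECT * FROM my_table LIMIT 10')   # placeholder query
print(cur.fetchall())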

Related

Setting up JDBC password dynamically on Apache Zeppelin

Is it possible to set the default.password dynamically, e.g. from a file? We have connected Presto to Zeppelin with a JDBC connector successfully; however, we are using a different authentication method that requires us to renew the password every day. I have checked the current GitHub repository and found that there is an interpreter.json that takes in default.password from the interpreter settings on Zeppelin. If I change the default.password to an environment variable, will it affect other JDBC interpreters? Is there a workaround?
Links to the repository:
https://github.com/apache/zeppelin/blob/e63ba8e897a522c6cad099286110c2eaa1496912/jdbc/src/main/resources/interpreter-setting.json
https://github.com/apache/zeppelin/blob/8f45fefb1c45ab163bedb94e3d9a9ef8a35afd91/jdbc/src/main/java/org/apache/zeppelin/jdbc/JDBCInterpreter.java
I figured out the problem. The interpreter.json file in the Zeppelin conf directory stores all the information for each JDBC connection. So, by updating the password there with the jq command and restarting Zeppelin every day, the password gets updated dynamically.
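The same idea in Python instead of jq, as a rough sketch: the install path and the fetch_todays_password helper are hypothetical, and the exact JSON layout of interpreter.json varies between Zeppelin versions, so check your own file first.
import json
from pathlib import Path

CONF = Path('/opt/zeppelin/conf/interpreter.json')   # hypothetical install path

def fetch_todays_password():
    # Hypothetical helper: read the renewed password from wherever your auth system writes it.
    return Path('/run/secrets/presto_password').read_text().strip()

data = json.loads(CONF.read_text())
for setting in data.get('interpreterSettings', {}).values():
    props = setting.get('properties', {})
    if 'default.password' in props:
        entry = props['default.password']
        # Some Zeppelin versions wrap each property in {"name": ..., "value": ..., "type": ...};
        # others store the raw string, so handle both shapes.
        if isinstance(entry, dict):
            entry['value'] = fetch_todays_password()
        else:
            props['default.password'] = fetch_todays_password()

CONF.write_text(json.dumps(data, indent=2))
# Restart Zeppelin afterwards (e.g. zeppelin-daemon.sh restart) so the
# interpreter picks up the new value, as described above.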

Disabling/Pause database replication using ML-Gradle

I want to disable the Database Replication from the replica cluster in MarkLogic 8 using ML-Gradle. After updating the configurations, I also want to re-enable it.
There are tasks for enabling and disabling flexrep in ml-gradle, but I couldn't find any such thing for database replication. How can this be done?
ml-gradle uses the Management API to handle configuration changes. Database replication is controlled by sending a PUT request to /manage/v2/databases/[id-or-name]/properties. Update your ml-config/databases/content-database.json file (the sample file does not include that property) to add a database-replication object with replication-enabled set to true.
To see what that object should look like, you can send a GET request to the properties endpoint.
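A minimal sketch of doing the same thing directly against the Management API with Python's requests library; the host, port, credentials and database name are placeholders, and since the exact shape of the database-replication object can vary, inspect the GET output first as suggested above.
import requests
from requests.auth import HTTPDigestAuth

# The Management API listens on port 8002 and uses digest auth by default;
# the credentials and database name below are placeholders.
auth = HTTPDigestAuth('admin', 'admin')
url = 'http://localhost:8002/manage/v2/databases/my-content-db/properties'

# GET the current properties to see what the database-replication object looks like.
props = requests.get(url, params={'format': 'json'}, auth=auth).json()
replication = props.get('database-replication')
print(replication)

# Flip the flag described above and PUT the object back.
if replication:
    replication['replication-enabled'] = False   # or True to re-enable
    resp = requests.put(url, json={'database-replication': replication}, auth=auth)
    resp.raise_for_status()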
You can create your own command to set replication-enabled - see https://github.com/rjrudin/ml-gradle/wiki/Writing-your-own-management-task
I'll also add a ticket for making official commands - e.g. mlEnableReplication and mlDisableReplication, with those defaulting to the content database, and allowing for any database to be specified.

Login Hive, log4j file

I'm trying to access Hive from the command window.
I just run "hive" in the appropriate directory but I get a "Login denied" error.
I've read that log4j is used to log in, but I don't know whether I have to create an account and write my user data there or not.
Thank you very much
The Hive service should be working right now. From an FI-LAB VM of your own, you simply have to log into the Head Node using your Cosmos credentials (if you have no Cosmos credentials, get them by registering first):
[root@your_filab_vm]$ ssh cosmos.lab.fi-ware.org
Once logged into the Head Node, type the following command:
[your_cosmos_username@cosmosmaster-gi]$ hive
Logging initialized using configuration in jar:file:/usr/local/hive-0.9.0-shark-0.8.0-bin/lib/hive-common-0.9.0-shark-0.8.0.jar!/hive-log4j.properties
Hive history file=/tmp/<your_cosmos_username>/hive_job_log_<your_cosmos_username>_201407212017_1797291774.txt
hive>
As you can see, in this example your Hive history will be written within:
/tmp/<your_cosmos_username>/hive_job_log_<your_cosmos_username>_201407212017_1797291774.txt

hbase client scan could not initialize org.apache.hadoop.hbase.util.Classes

I use the HBase client Scan to get data from a remote HBase cluster. When I set a Filter on the scan, the client throws an exception:
org.apache.hadoop.ipc.RemoteException: IPC server unable to read call parameters:
Could not init org.apache.hadoop.hbase.util.Classes.
The server-side HBase log shows:
java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.hbase.util.Classes
at org.apache.hadoop.hbase.client.Scan.readFields(Scan.java:590)
But it works fine without a Filter. By the way, the filter is NOT a custom Filter.
My HBase version is 0.94.10 and Hadoop is 1.2.1. I have copied hadoop-core.jar to the lib directory under HBase.
Don't forget to check your JDK version. In IntelliJ's database tool, for example, I was having this issue because I had both JDK 15 and JDK 8 installed, and the Phoenix version I was using was intended for Java 8, so once I changed the JVM it worked.
In IntelliJ you can change the driver VM by going to Data Sources and Drivers -> NameYourCustomDriver -> Advanced (tab) -> VM home path (field).
org.apache.hadoop.hbase.util.Classes runs some setup code in its static initializer block, so it is initialized only once. If a RuntimeException is thrown during that first initialization, the class will never initialize again until you restart your HBase cluster. In that static block it creates a directory, and if creating the directory fails, a RuntimeException is thrown.
During initialization of org.apache.hadoop.hbase.util.Classes, if the directory configured as "hbase.local.dir" doesn't exist, a runtime exception is thrown and org.apache.hadoop.hbase.util.Classes fails to initialize. That runtime exception is what eventually surfaces as the NoClassDefFoundError reported in the logs.
Make sure these exist and are writable by HBase (a quick check is sketched after the list):
hbase.local.dir (default: ${hbase.tmp.dir}/local)
hbase.tmp.dir (default: ${java.io.tmpdir}/hbase-${user.name})
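As a quick server-side sanity check, here is a small Python sketch; the paths mirror the defaults above and are only examples, so substitute whatever hbase-site.xml actually sets on each region server.
import getpass
import os
import tempfile

# Example values mirroring the defaults listed above.
hbase_tmp_dir = os.path.join(tempfile.gettempdir(), 'hbase-' + getpass.getuser())
hbase_local_dir = os.path.join(hbase_tmp_dir, 'local')

for path in (hbase_tmp_dir, hbase_local_dir):
    os.makedirs(path, exist_ok=True)              # create it if it is missing
    if not os.access(path, os.W_OK):
        raise SystemExit(path + ' exists but is not writable by this user')
    print('OK: ' + path)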

JDBC URL for Oracle XA client

Using the JDBC driver oracle.jdbc.xa.client.OracleXADataSource, what is the correct format of the JDBC URL? The thin format of
jdbc:oracle:thin:@host:port:sid
does not work. WebSphere is reporting that the given URL (which is otherwise correct) is invalid.
The test connection operation failed for data source Oracle MyDB (XA) on
server nodeagent at node MY_node with the following exception:
java.sql.SQLException: Invalid Oracle URL specifiedDSRA0010E: SQL State = 99999,
Error Code = 17,067. View JVM logs for further details.
There is nothing in the JVM logs.
Whether you use an XA driver or not, the JDBC connection string is the same (and the format in your question is correct).
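For reference, the two common thin-driver URL forms are the same regardless of the data source class; host, port, SID and service name here are placeholders:
jdbc:oracle:thin:@host:port:sid
jdbc:oracle:thin:@//host:port/service_name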
For me the issue was resolved by adding an alias name, username and password under JAAS - J2C authentication data, and also selecting this entry as the component-managed authentication alias.
In case this happens to anyone else: the problem went away after restarting WebSphere.
In my case, the problem went away when I changed the authentication property of the JDBC resource reference from Authentication=Application to Authentication=Container.
Had the same issue. Don't know about simple deployments, but on a two-node cluster I restarted the first node, and the connection started working on it (not on the second). I restarted the second node, and the connection started working there too.
So just restart the nodes (I also restarted the node agents, but I don't know if that's necessary).
If you are doing this with the wsadmin command, you need to stop the manager, stop the node, start the manager, sync the node, and then start the node (I mean a full sync). Hopefully this will resolve the issue; I don't know why, but it resolved mine.
