Pyhive error after upgrading from CDH to CDP private cloud - hadoop

May I have your help resolving the error below from the PyHive module?
Issue: We have upgraded our Cloudera cluster from CDH to CDP. We use the PyHive Python module to open a connection to Impala with hive.Connection(host, port, username, password, auth="LDAP").
We get the error below for some of the queries submitted through pandas read_sql, while other queries execute fine and return a DataFrame.
Before the upgrade everything worked fine: none of the queries had issues and all of them returned results.
conn = hive.Connection(host=impala_host, port=impala_port, username=user, password=password, auth="LDAP")
data = pd.read_sql(sql, conn)
Please find below the stack trace:
File "/usr/local/lib/python3.7/site-packages/pandas/io/sql.py", line 608, in read_sql
chunksize=chunksize,
File "/usr/local/lib/python3.7/site-packages/pandas/io/sql.py", line 2130, in read_query
data = self._fetchall_as_list(cursor)
File "/usr/local/lib/python3.7/site-packages/pandas/io/sql.py", line 2144, in _fetchall_as_list
result = cur.fetchall()
File "/usr/local/lib/python3.7/site-packages/pyhive/common.py", line 137, in fetchall
return list(iter(self.fetchone, None))
File "/usr/local/lib/python3.7/site-packages/pyhive/common.py", line 106, in fetchone
self._fetch_while(lambda: not self._data and self._state != self._STATE_FINISHED)
File "/usr/local/lib/python3.7/site-packages/pyhive/common.py", line 46, in _fetch_while
self._fetch_more()
File "/usr/local/lib/python3.7/site-packages/pyhive/hive.py", line 477, in _fetch_more
_check_status(response)
File "/usr/local/lib/python3.7/site-packages/pyhive/hive.py", line 585, in _check_status
raise OperationalError(response)
pyhive.exc.OperationalError: TFetchResultsResp(status=TStatus(statusCode=2, infoMessages=None, sqlState=None, errorCode=None, errorMessage=None), hasMoreRows=True, results=None)
We have checked the PyHive source code: cursor.fetchall() does not wait (no sleep/retry) and returns immediately, even though the response status (statusCode=2, i.e. still executing) shows the query is still running on the backend.
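For reference, PyHive also supports submitting a query asynchronously and polling the operation state before fetching; a minimal sketch of that pattern (adapted to the connection above, not a confirmed fix for this issue; on older PyHive versions the flag is spelled async instead of async_) looks like this:
import time
from TCLIService.ttypes import TOperationState

# conn is the hive.Connection created above, sql is the query string.
cursor = conn.cursor()
cursor.execute(sql, async_=True)

# Wait until the server reports the operation has left the running states.
state = cursor.poll().operationState
while state in (TOperationState.INITIALIZED_STATE, TOperationState.RUNNING_STATE):
    time.sleep(1)
    state = cursor.poll().operationState

rows = cursor.fetchall()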

Related

How to install pyspark without hadoop?

I want to install pyspark, but I don't want to use Hadoop because I just want to test out some functions. I followed instructions from a number of websites: I used pip to install pyspark, installed JDK 8, and set the JAVA_PATH, SPARK_HOME, and PATH variables, but it's not working.
My program is:
from pyspark import *
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
I am getting this exception:
\Java\jdk1.8.0_291\bin\java was unexpected at this time.
Traceback (most recent call last):
File "c:\Users\ankit\Untitled-1.py", line 4, in <module>
spark = SparkSession.builder.getOrCreate()
File "C:\Users\ankit\AppData\Local\Programs\Python\Python39\lib\site-packages\pyspark\sql\session.py", line 228, in getOrCreate
sc = SparkContext.getOrCreate(sparkConf)
File "C:\Users\ankit\AppData\Local\Programs\Python\Python39\lib\site-packages\pyspark\context.py", line 384, in getOrCreate
SparkContext(conf=conf or SparkConf())
File "C:\Users\ankit\AppData\Local\Programs\Python\Python39\lib\site-packages\pyspark\context.py", line 144, in __init__
SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
File "C:\Users\ankit\AppData\Local\Programs\Python\Python39\lib\site-packages\pyspark\context.py", line 331, in _ensure_initialized
SparkContext._gateway = gateway or launch_gateway(conf)
File "C:\Users\ankit\AppData\Local\Programs\Python\Python39\lib\site-packages\pyspark\java_gateway.py", line 108, in launch_gateway
raise Exception("Java gateway process exited before sending its port number")
Exception: Java gateway process exited before sending its port number
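On Windows, the "... was unexpected at this time" message is usually cmd.exe tripping over a malformed JAVA_HOME (spaces, parentheses, or stray quotes in the path), which then kills the Java gateway before it can report its port. A minimal local-mode sketch, assuming a JDK installed at a space-free path (C:\Java\jdk1.8.0_291 is only an illustration), would be:
import os
from pyspark.sql import SparkSession

# Assumption: JDK 8 installed at a path with no spaces, parentheses, or quotes.
os.environ["JAVA_HOME"] = r"C:\Java\jdk1.8.0_291"
os.environ["PATH"] = os.environ["JAVA_HOME"] + r"\bin;" + os.environ["PATH"]

# pip-installed pyspark bundles Spark itself, so no Hadoop install is needed
# for local testing.
spark = SparkSession.builder.master("local[*]").getOrCreate()
print(spark.range(5).count())
spark.stop()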

How do I connect to a Netcool / Omnibus “Object Server” using JayDeBeApi module along with SAP Sybase JDBC drivers (jconn4.jar) in Python3?

I am new to Python programming. I'm trying to connect to a Netcool Object Server using Python 3, with the JayDeBeApi module and the SAP Sybase JDBC driver (jconn4.jar).
Following is the sample script:
import jaydebeapi
server="xxx"
database="xx"
user="xx"
password="xx"
jclassname='com.sybase.jdbc4.jdbc.SybDriver'
url='jdbc:sybase:Tds://'+server+'/'+database
driver_args=[url,user,password]
jars="path/jconn4.jar"
conn=jaydebeapi.connect(jclassname,driver_args,jars)
curs = conn.cursor()
curs.execute("select * from status")
curs.fetchall()
When I execute the script, it shows the following error:
File "sample.py", line 12, in <module>
conn=jaydebeapi.connect(jclassname,driver_args,jars)
File "/usr/local/lib/python3.5/site-packages/jaydebeapi/__init__.py", line 381, in connect
jconn = _jdbc_connect(jclassname, url, driver_args, jars, libs)
File "/usr/local/lib/python3.5/site-packages/jaydebeapi/__init__.py", line 199, in _jdbc_connect_jpype
return jpype.java.sql.DriverManager.getConnection(url, *dargs)
RuntimeError: No matching overloads found. at native/common/jp_method.cpp:117
Has anyone successfully connected to a Netcool Object Server using the JayDeBeApi module in Python 3? If so, please share a sample script.
Thanks.
The URL format you specified is not correct. The format below works for me:
url = jdbc:sybase:Tds:<hostname>:<dbport>/<dbname>
e.g.
conn = jaydebeapi.connect('com.sybase.jdbc4.jdbc.SybDriver', ['jdbc:sybase:Tds:hostA:8888/db1','root',''],['path/jconn4.jar'])
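Applied to the variables in the question, a sketch of the corrected connection (keeping the jaydebeapi.connect call style used above; newer jaydebeapi releases take the JDBC URL as a separate second argument) looks like this:
import jaydebeapi

server = "xxx"      # Object Server host
port = "xx"         # Object Server port
database = "xx"
user = "xx"
password = "xx"

# Host and port separated by ':', and no '//' after 'Tds:'.
url = "jdbc:sybase:Tds:{}:{}/{}".format(server, port, database)

conn = jaydebeapi.connect(
    "com.sybase.jdbc4.jdbc.SybDriver",
    [url, user, password],
    ["path/jconn4.jar"],
)
curs = conn.cursor()
curs.execute("select * from status")
rows = curs.fetchall()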

HAWQ stop cluster failed

I installed HAWQ from source code. After initializing and starting the HAWQ cluster, I tried to stop it with "hawq stop cluster", but it failed.
The error shows:
[hadoop#Master ~]$ hawq stop cluster
20161217:19:59:31:004594 hawq_stop:Master:hadoop-[INFO]:-Prepare to do 'hawq stop'
20161217:19:59:31:004594 hawq_stop:Master:hadoop-[INFO]:-You can check log in /home/hadoop/hawqAdminLogs/hawq_stop_20161217.log
20161217:19:59:31:004594 hawq_stop:Master:hadoop-[INFO]:-Stop hawq with args: ['stop', 'cluster']
Continue with HAWQ service stop Yy|Nn (default=N):
20161217:19:59:38:004594 hawq_stop:Master:hadoop-[INFO]:-No standby host configured
20161217:19:59:38:004594 hawq_stop:Master:hadoop-[INFO]:-Stop hawq cluster
Traceback (most recent call last):
File "/home/hadoop/hawq/bin/hawq_ctl", line 1276, in <module>
stop_hawq(opts, hawq_dict)
File "/home/hadoop/hawq/bin/hawq_ctl", line 1043, in stop_hawq
instance.run()
File "/home/hadoop/hawq/bin/hawq_ctl", line 891, in run
check_return_code(self._stopAll())
File "/home/hadoop/hawq/bin/hawq_ctl", line 816, in _stopAll
master_result = self._stop_master()
File "/home/hadoop/hawq/bin/hawq_ctl", line 760, in _stop_master
self._stop_master_checks()
File "/home/hadoop/hawq/bin/hawq_ctl", line 712, in _stop_master_checks
self.conn = dbconn.connect(self.dburl, utility=True)
File "/home/hadoop/hawq/lib/python/gppylib/db/dbconn.py", line 211, in connect
cnx = pgdb._connect_(cstr, dbhost, dbport, dbopt, dbtty, dbuser, dbpasswd)
AttributeError: 'module' object has no attribute '_connect_'
At present I am using an alternative way to stop the cluster, namely stopping the master and the segments separately with pg_ctl:
pg_ctl stop -D <master_data_dir>/<segment_data_dir>
Any information about this error would be helpful. Thanks!
If you run a plain 'pip install pygresql', it installs the latest version of PyGreSQL (5.0.3). In the error above, pgdb._connect_() is the old 4.2.2 routine; in 5.0.3 it has been renamed to pgdb._connect().
The solution is:
pip install pygresql==4.2.2
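As a quick check (a sketch, not part of the original answer), you can verify which PyGreSQL API is installed before re-running hawq stop cluster:
import pgdb

# hawq_ctl's dbconn.py calls pgdb._connect_(), which only exists in the
# PyGreSQL 4.x series; 5.x renamed it to pgdb._connect().
if hasattr(pgdb, "_connect_"):
    print("PyGreSQL 4.x-style API found; hawq stop cluster should work")
else:
    print("Newer PyGreSQL found; pin it with: pip install pygresql==4.2.2")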
Before stopping the cluster, unless it is a '-M immediate' stop, hawq connects to the database to check for running connections.
From your log, the connection to the master node fails because of a Python module issue; it looks like the pygresql module is not installed properly. Please try reinstalling it.

Python (boto) TypeError launching Spark Cluster

Following is an attempt to launch a cluster with ten slaves.
12:13:44/sparkup $ec2/spark-ec2 -k sparkeast -i ~/.ssh/myPem.pem \
-s 10 -z us-east-1a -r us-east-1 launch spark2
Here is the output. Note that the same command had been successful with the February master code; today I updated to the latest 1.4.0-SNAPSHOT.
Setting up security groups...
Searching for existing cluster spark2 in region us-east-1...
Spark AMI: ami-5bb18832
Launching instances...
Launched 10 slaves in us-east-1a, regid = r-68a0ae82
Launched master in us-east-1a, regid = r-6ea0ae84
Waiting for AWS to propagate instance metadata...
Waiting for cluster to enter 'ssh-ready' state.........unable to load cexceptions
TypeError
p0
(S''
p1
tp2
Rp3
(dp4
S'child_traceback'
p5
S'Traceback (most recent call last):\n File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 1280, in _execute_child\n sys.stderr.write("%s %s (env=%s)\\n" %(executable, \' \'.join(args), \' \'.join(env)))\nTypeError\n'
p6
sb.Traceback (most recent call last):
File "ec2/spark_ec2.py", line 1444, in <module>
main()
File "ec2/spark_ec2.py", line 1436, in main
real_main()
File "ec2/spark_ec2.py", line 1270, in real_main
cluster_state='ssh-ready'
File "ec2/spark_ec2.py", line 869, in wait_for_cluster_state
is_cluster_ssh_available(cluster_instances, opts):
File "ec2/spark_ec2.py", line 833, in is_cluster_ssh_available
if not is_ssh_available(host=dns_name, opts=opts):
File "ec2/spark_ec2.py", line 807, in is_ssh_available
stderr=subprocess.STDOUT # we pipe stderr through stdout to preserve output order
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 709, in __init__
errread, errwrite)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 1328, in _execute_child
raise child_exception
TypeError
The AWS console shows that the instances are actually running, so it is unclear what actually failed.
Any hints or workarounds appreciated.
UPDATE: The same error occurs when running the login command. It seems to be a problem with the boto API, but the cluster itself appears to be OK.
ec2/spark-ec2 -i ~/.ssh/sparkeast.pem login spark2
Searching for existing cluster spark2 in region us-east-1...
Found 1 master, 10 slaves.
Logging into master ec2-54-87-46-170.compute-1.amazonaws.com...
unable to load cexceptions
TypeError
p0
(.. same exception stacktrace as above )
The issue is that the Python 2.7.6 installation on my Yosemite MacBook appears to have become corrupted.
I reset PATH and PYTHONPATH to point to a custom Homebrew-installed Python, and then boto and the other Python commands (including building the Spark performance project) work fine.
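As a quick sanity check (a sketch, not from the original answer), you can confirm which interpreter and boto module the spark-ec2 script will pick up after changing PATH and PYTHONPATH:
import sys
import subprocess
import boto

# The traceback above points at the system Python under /System/Library;
# after the PATH change this should show the Homebrew interpreter instead.
print(sys.executable)
print(subprocess.check_output(["which", "python"]).decode().strip())
print(boto.__version__, boto.__file__)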

using rabbitmqadmin to access CloudAMQP / Heroku

I'm starting out learning about AMQP and RabbitMQ.
To get going, I used the CLI tool rabbitmqadmin to successfully publish data to a RabbitMQ development install on my Mac OS X box. So far so good: I can publish messages and watch them dequeue.
However, when I try the exact same functionality against the Heroku / CloudAMQP instance, the rabbitmqadmin client seems to fall over.
This is the call:
rabbitmqadmin --host lemur.cloudamqp.com --vhost app4444444_heroku.com --user app4444444_heroku.com --password <withheld> publish routing_key=test payload="hello"
...and this is the output:
Traceback (most recent call last):
File "/usr/local/bin/rabbitmqadmin", line 828, in <module>
main()
File "/usr/local/bin/rabbitmqadmin", line 325, in main
method()
File "/usr/local/bin/rabbitmqadmin", line 428, in invoke_get
result = self.post(uri, json.dumps(upload))
File "/usr/local/bin/rabbitmqadmin", line 354, in post
return self.http("POST", path, body)
File "/usr/local/bin/rabbitmqadmin", line 377, in http
resp = conn.getresponse()
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 1013, in getresponse
response.begin()
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 402, in begin
version, status, reason = self._read_status()
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 366, in _read_status
raise BadStatusLine(line)
httplib.BadStatusLine: ''
Any thoughts or ideas gratefully received!
Add --ssl to the command line. CloudAMQP's web UI is HTTPS only.
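Assuming the other flags stay the same, the original call with --ssl added would be:
rabbitmqadmin --ssl --host lemur.cloudamqp.com --vhost app4444444_heroku.com --user app4444444_heroku.com --password <withheld> publish routing_key=test payload="hello"
If it still fails, the HTTPS management API may be served on a different port (for example 443), which rabbitmqadmin can target with --port.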
