HAWQ stop cluster failed - hawq

I installed HAWQ from source code. After initializing and starting the HAWQ cluster, I tried to stop it with "hawq stop cluster", but it failed.
The error output is:
[hadoop@Master ~]$ hawq stop cluster
20161217:19:59:31:004594 hawq_stop:Master:hadoop-[INFO]:-Prepare to do 'hawq stop'
20161217:19:59:31:004594 hawq_stop:Master:hadoop-[INFO]:-You can check log in /home/hadoop/hawqAdminLogs/hawq_stop_20161217.log
20161217:19:59:31:004594 hawq_stop:Master:hadoop-[INFO]:-Stop hawq with args: ['stop', 'cluster']
Continue with HAWQ service stop Yy|Nn (default=N):
20161217:19:59:38:004594 hawq_stop:Master:hadoop-[INFO]:-No standby host configured
20161217:19:59:38:004594 hawq_stop:Master:hadoop-[INFO]:-Stop hawq cluster
Traceback (most recent call last):
File "/home/hadoop/hawq/bin/hawq_ctl", line 1276, in <module>
stop_hawq(opts, hawq_dict)
File "/home/hadoop/hawq/bin/hawq_ctl", line 1043, in stop_hawq
instance.run()
File "/home/hadoop/hawq/bin/hawq_ctl", line 891, in run
check_return_code(self._stopAll())
File "/home/hadoop/hawq/bin/hawq_ctl", line 816, in _stopAll
master_result = self._stop_master()
File "/home/hadoop/hawq/bin/hawq_ctl", line 760, in _stop_master
self._stop_master_checks()
File "/home/hadoop/hawq/bin/hawq_ctl", line 712, in _stop_master_checks
self.conn = dbconn.connect(self.dburl, utility=True)
File "/home/hadoop/hawq/lib/python/gppylib/db/dbconn.py", line 211, in connect
cnx = pgdb._connect_(cstr, dbhost, dbport, dbopt, dbtty, dbuser, dbpasswd)
AttributeError: 'module' object has no attribute '_connect_'
For now, I am using an alternative way to stop the cluster, that is, stopping the master and segments separately with pg_ctl:
pg_ctl stop -D <master_data_dir>/<segment_data_dir>
Any insight into this error would be helpful. Thanks!

This happens because running 'pip install pygresql' directly installs the latest version of PyGreSQL (5.0.3). In the error above, pgdb._connect_() is the routine from the old version (4.2.2); in 5.0.3 it is pgdb._connect().
The solution is:
pip install pygresql==4.2.2
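As a quick sanity check after pinning the version, here is a minimal sketch that confirms the installed pgdb module exposes the private routine hawq_ctl calls; the distribution name passed to pkg_resources is an assumption about how pip registered the package:
# Check that the installed PyGreSQL is the version hawq_ctl expects.
# 'pygresql' as the distribution name is an assumption; adjust if pip lists it differently.
import pkg_resources
import pgdb

print(pkg_resources.get_distribution('pygresql').version)  # should print 4.2.2
print(hasattr(pgdb, '_connect_'))  # True on 4.2.2; the routine was renamed in 5.x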

Before stopping the cluster, unless it is a '-M immediate' stop, hawq connects to the database to check for running connections.
From your log, the connection to the master node failed due to a Python module issue. It seems the pygresql module is not installed properly; please try reinstalling it.
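To see what that pre-stop check is looking at, here is a small sketch that lists the sessions still connected to the master; the host, port, database, and user values are placeholders, and it assumes the pg_stat_activity view is available as in other PostgreSQL-derived databases:
# List sessions still connected to the master (placeholder connection settings).
import pgdb

conn = pgdb.connect(host='localhost:5432', database='template1', user='gpadmin')
cur = conn.cursor()
cur.execute("SELECT datname, usename FROM pg_stat_activity")
for datname, usename in cur.fetchall():
    print(datname, usename)
conn.close()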

Related

Pyhive error after upgrading from CDH to CDP private cloud

Could you help me resolve the error below from the PyHive module?
Issue: We have upgraded the Cloudera cluster from a CDH version to a CDP version. We use the PyHive Python module to get an Impala connection via hive.connect(User, password, host, port, auth=LDAP).
We get the error below for some queries submitted through pandas read_sql, while other queries execute fine and return a DataFrame.
Before the upgrade this worked fine; all queries ran without issues and returned results.
conn = pyhive.Connection(host=impala_host, port=impala_port, username=user, password=password, auth="LDAP")
Please find the stack trace below.
data = pd.read_sql(sql, conn)
File "/usr/local/lib/python3.7/site-packages/pandas/io/sql.py", line 608, in read_sql
chunksize=chunksize,
File "/usr/local/lib/python3.7/site-packages/pandas/io/sql.py", line 2130, in read_query
data = self._fetchall_as_list(cursor)
File "/usr/local/lib/python3.7/site-packages/pandas/io/sql.py", line 2144, in _fetchall_as_list
result = cur.fetchall()
File "/usr/local/lib/python3.7/site-packages/pyhive/common.py", line 137, in fetchall
return list(iter(self.fetchone, None))
File "/usr/local/lib/python3.7/site-packages/pyhive/common.py", line 106, in fetchone
self._fetch_while(lambda: not self._data and self._state != self._STATE_FINISHED)
File "/usr/local/lib/python3.7/site-packages/pyhive/common.py", line 46, in _fetch_while
self._fetch_more()
File "/usr/local/lib/python3.7/site-packages/pyhive/hive.py", line 477, in _fetch_more
_check_status(response)
File "/usr/local/lib/python3.7/site-packages/pyhive/hive.py", line 585, in _check_status
raise OperationalError(response)
pyhive.exc.OperationalError: TFetchResultsResp(status=TStatus(statusCode=2, infoMessages=None, sqlState=None, errorCode=None, errorMessage=None), hasMoreRows=True, results=None)
We have reviewed the PyHive source code: cursor.fetchall() does not wait (sleep) and returns immediately because the query status (code=2) indicates the query is still running on the backend.
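In case it helps others hitting the same state, here is a minimal sketch based on the asynchronous execution pattern in the PyHive README: run the query with async_=True and poll the operation state until it leaves the running states before fetching. The connection parameters and the sql variable are the same placeholders as above.
# Poll the query until it reaches a terminal state before fetching results.
from pyhive import hive
from TCLIService.ttypes import TOperationState

conn = hive.connect(host=impala_host, port=impala_port, username=user,
                    password=password, auth='LDAP')
cursor = conn.cursor()
cursor.execute(sql, async_=True)

state = cursor.poll().operationState
while state in (TOperationState.INITIALIZED_STATE, TOperationState.RUNNING_STATE):
    state = cursor.poll().operationState

rows = cursor.fetchall()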

Hyperledger Indy: unable to init node, init_indy_node failed with error

I am following the instruction at https://github.com/hyperledger/indy-node/blob/master/docs/source/start-nodes.md
My OS is Ubuntu 16.04.6 LTS.
I have managed to install indy-node by following
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys CE7709D068DB5E88
sudo bash -c 'echo "deb https://repo.sovrin.org/deb xenial stable" >> /etc/apt/sources.list'
sudo apt-get update
sudo apt-get install indy-node
Now at step 2, it says:
set Network name in config file
the location of the config depends on how a Node was installed. It's usually inside /etc/indy for Ubuntu.
the following needs to be added: NETWORK_NAME={network_name} where {network_name} matches the one in genesis transaction files above
I am confused because the document never mentions any "genesis transaction files" above, so I decided to set NETWORK_NAME = sandbox in my case.
Now I try to generate keys with the command init_indy_node Alpha 0.0.0.0 9701 0.0.0.0 9702 --seed 111111111111111111111111111Alpha, and I get this error:
Traceback (most recent call last):
File "/usr/local/bin/init_indy_keys", line 6, in <module>
from plenum.common.keygen_utils import initNodeKeysForBothStacks
File "/usr/local/lib/python3.5/dist-packages/plenum/__init__.py", line 87, in <module>
setup_plugins()
File "/usr/local/lib/python3.5/dist-packages/plenum/__init__.py", line 50, in setup_plugins
config = getConfigOnce()
File "/usr/local/lib/python3.5/dist-packages/plenum/common/config_util.py", line 106, in getConfigOnce
return _getConfig(general_config_dir)
File "/usr/local/lib/python3.5/dist-packages/plenum/common/config_util.py", line 87, in _getConfig
config.GENERAL_CONFIG_FILE))
File "/usr/local/lib/python3.5/dist-packages/plenum/common/config_util.py", line 32, in extend_with_external_config
config = getInstalledConfig(*extender)
File "/usr/local/lib/python3.5/dist-packages/plenum/common/config_util.py", line 26, in getInstalledConfig
spec.loader.exec_module(config)
File "/etc/indy/indy_config.py", line 2, in <module>
NETWORK_NAME = sandbox
NameError: name 'sandbox' is not defined
Can anyone help me create the node keys and initialize them?
I have referred to Hyperledger Indy: Create genesis transaction file, and it looks like my problem occurs before that step.
I have also tried von-network and am able to start the network in the docker image.
The only problem is that I am not able to generate my own keys.
Thanks
OK, it looks like I have found the reason: indy_config.py is actually a Python file, not a "pure" config file like JSON or YAML, so I need to quote the value of NETWORK_NAME.
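For reference, here is the one-line fix in the config file as I understand it; the network name is just the value I chose:
# /etc/indy/indy_config.py is executed as Python, so the value must be a string literal.
NETWORK_NAME = 'sandbox'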

Running Pyspark on Pycharm

On a Mac (v. 10.14.5), I am trying to run PySpark programs in PyCharm (professional edition, v. 19.2).
I know my simple PySpark program is fine, because when I run it with spark-submit outside PyCharm from the terminal, using the Spark I installed via brew, it works as expected. I have tried linking PyCharm to this version of Spark, but am getting other issues.
I followed multiple sets of instructions online (for example, a Stack Overflow thread) to install pyspark within PyCharm (Preferences -> Project Interpreter) and set the SPARK_HOME environment variable to the appropriate venv directory (Run -> Edit Configurations -> Environment Variables).
But I get an error message when I run the program:
Failed to find Spark jars directory (/Users/rahul/PycharmProjects/spark-demoII/venv/assembly/target/scala-2.12/jars).
You need to build Spark with the target "package" before running this program.
Traceback (most recent call last):
File "/Users/rahul/PycharmProjects/spark-demoII/run.py", line 6, in <module>
sc = SparkContext("local", "SimpleApp")
File "/Users/rahul/virtualenvs/pyspark/lib/python3.7/site-packages/pyspark/context.py", line 133, in __init__
SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
File "/Users/rahul/virtualenvs/pyspark/lib/python3.7/site-packages/pyspark/context.py", line 316, in _ensure_initialized
SparkContext._gateway = gateway or launch_gateway(conf)
File "/Users/rahul/virtualenvs/pyspark/lib/python3.7/site-packages/pyspark/java_gateway.py", line 46, in launch_gateway
return _launch_gateway(conf)
File "/Users/rahul/virtualenvs/pyspark/lib/python3.7/site-packages/pyspark/java_gateway.py", line 108, in _launch_gateway
raise Exception("Java gateway process exited before sending its port number")
Exception: Java gateway process exited before sending its port number
Process finished with exit code 1
Does anyone know how to get PyCharm to run PySpark programs on a similar machine?
In response to @pissal's suggestion:
I had tried that previously, but that version of Spark does not work. I tried it again anyway: after switching to a virtual environment, I did a pip install pyspark. To check whether this version of Spark works, I ran spark-submit run.py (outside of PyCharm), and here is the error message.
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/Users/rahul/.virtualenvs/test1/lib/python3.7/site-packages/pyspark/jars/spark-unsafe_2.11-2.4.4.jar) to method java.nio.Bits.unaligned()
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
Exception in thread "main" java.lang.ExceptionInInitializerError
at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:80)
at org.apache.hadoop.security.SecurityUtil.getAuthenticationMethod(SecurityUtil.java:611)
at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:273)
at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:261)
at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:791)
at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:761)
at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:634)
at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2422)
at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2422)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2422)
at org.apache.spark.SecurityManager.<init>(SecurityManager.scala:79)
at org.apache.spark.deploy.SparkSubmit.secMgr$lzycompute$1(SparkSubmit.scala:348)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$secMgr$1(SparkSubmit.scala:348)
at org.apache.spark.deploy.SparkSubmit$$anonfun$prepareSubmitEnvironment$7.apply(SparkSubmit.scala:356)
at org.apache.spark.deploy.SparkSubmit$$anonfun$prepareSubmitEnvironment$7.apply(SparkSubmit.scala:356)
at scala.Option.map(Option.scala:146)
at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:355)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:774)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.StringIndexOutOfBoundsException: begin 0, end 3, length 2
at java.base/java.lang.String.checkBoundsBeginEnd(String.java:3720)
at java.base/java.lang.String.substring(String.java:1909)
at org.apache.hadoop.util.Shell.<clinit>(Shell.java:52)
... 25 more
So the reason this was happening was that PySpark had not been updated to work with the latest version of Java. After removing Java 13, I made sure my Homebrew installation of Spark uses Java 1.8. Then I added the following to the Environment Variables in Run -> Edit Configurations in PyCharm:
SPARK_HOME=/usr/local/Cellar/apache-spark/2.4.4/libexec
With these settings I can run pyspark jobs in PyCharm.
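For completeness, here is a minimal sketch of the kind of script this fixes, pinning the two environment variables from inside the code instead of the run configuration; the Java and Spark paths below are assumptions for a Homebrew-style setup and need adjusting to your machine:
import os
from pyspark import SparkContext

# Assumed local paths; adjust to your own Java 1.8 and Spark installations.
os.environ.setdefault("JAVA_HOME",
                      "/Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home")
os.environ.setdefault("SPARK_HOME", "/usr/local/Cellar/apache-spark/2.4.4/libexec")

sc = SparkContext("local", "SimpleApp")
print(sc.parallelize(range(10)).sum())  # quick smoke test
sc.stop()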

graphlab create: unable to start cluster in aws

At the moment I am trying to create a cluster in AWS EC2 with GraphLab Create. The code is as follows:
import graphlab as gl

ec2config = gl.deploy.Ec2Config(region='us-west-2', instance_type='m3.large',
                                aws_access_key_id='secret-acces-key-id',
                                aws_secret_access_key='secret-access-key')
ec2 = gl.deploy.ec2_cluster.create(name='Test Cluster',
                                   s3_path='s3://test-big-data-2016',
                                   ec2_config=ec2config,
                                   idle_shutdown_timeout=3600,
                                   num_hosts=1)
When the above code is executed I get the following error:
Traceback (most recent call last):
File "test.py", line 59, in
ec2 = gl.deploy.ec2_cluster.create(name='Test Cluster', s3_path='s3://test-big-data-2016', ec2_config=ec2config, idle_shutdown_timeout=36000, num_hosts=1)
File "/Users/remco/anaconda/envs/gl-env/lib/python2.7/site-packages/graphlab/deploy/ec2_cluster.py", line 83, in create
cluster.start()
File "/Users/remco/anaconda/envs/gl-env/lib/python2.7/site-packages/graphlab/deploy/ec2_cluster.py", line 233, in start
self.idle_shutdown_timeout
File "/Users/remco/anaconda/envs/gl-env/lib/python2.7/site-packages/graphlab/deploy/_executionenvironment.py", line 372, in _start_commander_host
raise RuntimeError('Unable to start host(s). Please terminate '
RuntimeError: Unable to start host(s). Please terminate manually from the AWS console.
When I look in the EC2 Management Console, a new instance has launched and is running, but I am still getting the error in the terminal.
I really don't know what I'm doing wrong here. I followed the exact instructions from: https://turi.com/learn/userguide/deployment/pipeline-example.html

Python (boto) TypeError launching Spark Cluster

The following is an attempt to launch a cluster with ten slaves.
12:13:44/sparkup $ec2/spark-ec2 -k sparkeast -i ~/.ssh/myPem.pem \
-s 10 -z us-east-1a -r us-east-1 launch spark2
Here is the output. Note that the same command had been successful with the February master code; today I updated to the latest 1.4.0-SNAPSHOT.
Setting up security groups...
Searching for existing cluster spark2 in region us-east-1...
Spark AMI: ami-5bb18832
Launching instances...
Launched 10 slaves in us-east-1a, regid = r-68a0ae82
Launched master in us-east-1a, regid = r-6ea0ae84
Waiting for AWS to propagate instance metadata...
Waiting for cluster to enter 'ssh-ready' state.........unable to load cexceptions
TypeError
p0
(S''
p1
tp2
Rp3
(dp4
S'child_traceback'
p5
S'Traceback (most recent call last):\n File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 1280, in _execute_child\n sys.stderr.write("%s %s (env=%s)\\n" %(executable, \' \'.join(args), \' \'.join(env)))\nTypeError\n'
p6
sb.Traceback (most recent call last):
File "ec2/spark_ec2.py", line 1444, in <module>
main()
File "ec2/spark_ec2.py", line 1436, in main
real_main()
File "ec2/spark_ec2.py", line 1270, in real_main
cluster_state='ssh-ready'
File "ec2/spark_ec2.py", line 869, in wait_for_cluster_state
is_cluster_ssh_available(cluster_instances, opts):
File "ec2/spark_ec2.py", line 833, in is_cluster_ssh_available
if not is_ssh_available(host=dns_name, opts=opts):
File "ec2/spark_ec2.py", line 807, in is_ssh_available
stderr=subprocess.STDOUT # we pipe stderr through stdout to preserve output order
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 709, in __init__
errread, errwrite)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 1328, in _execute_child
raise child_exception
TypeError
The AWS console shows that the instances are actually running, so it is unclear what actually failed.
Any hints or workarounds appreciated.
UPDATE: The same error occurs when running the login command. It seems to be a problem with the boto API, but the cluster itself appears to be OK.
ec2/spark-ec2 -i ~/.ssh/sparkeast.pem login spark2
Searching for existing cluster spark2 in region us-east-1...
Found 1 master, 10 slaves.
Logging into master ec2-54-87-46-170.compute-1.amazonaws.com...
unable to load cexceptions
TypeError
p0
(.. same exception stacktrace as above )
The issue is that the Python 2.7.6 installation on my Yosemite MacBook appears to have become corrupted.
I reset PATH and PYTHONPATH to point to a custom Homebrew-installed Python, and then boto and the other Python commands, including building the Spark performance project, work fine.
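As a quick diagnostic for this kind of interpreter mix-up, here is a small sketch that prints which Python and which boto installation are actually being picked up; run it with the same interpreter the spark-ec2 script uses:
# Show which interpreter and boto installation are on the current path.
import sys
print(sys.executable)
print(sys.version)

try:
    import boto
    print(boto.__version__, boto.__file__)
except ImportError as exc:
    print("boto is not importable:", exc)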
