graphlab create: unable to start cluster in aws - amazon-ec2

At the moment I'm trying to create a cluster in AWS EC2 with GraphLab Create. The code is as follows:
import graphlab as gl
ec2config = gl.deploy.Ec2Config(region='us-west-2',
                                instance_type='m3.large',
                                aws_access_key_id='secret-access-key-id',
                                aws_secret_access_key='secret-access-key')
ec2 = gl.deploy.ec2_cluster.create(name='Test Cluster',
                                   s3_path='s3://test-big-data-2016',
                                   ec2_config=ec2config,
                                   idle_shutdown_timeout=3600,
                                   num_hosts=1)
When the above code is executed I get the following error:
Traceback (most recent call last):
File "test.py", line 59, in
ec2 = gl.deploy.ec2_cluster.create(name='Test Cluster', s3_path='s3://test-big-data-2016', ec2_config=ec2config, idle_shutdown_timeout=36000, num_hosts=1)
File "/Users/remco/anaconda/envs/gl-env/lib/python2.7/site-packages/graphlab/deploy/ec2_cluster.py", line 83, in create
cluster.start()
File "/Users/remco/anaconda/envs/gl-env/lib/python2.7/site-packages/graphlab/deploy/ec2_cluster.py", line 233, in start
self.idle_shutdown_timeout
File "/Users/remco/anaconda/envs/gl-env/lib/python2.7/site-packages/graphlab/deploy/_executionenvironment.py", line 372, in _start_commander_host
raise RuntimeError('Unable to start host(s). Please terminate '
RuntimeError: Unable to start host(s). Please terminate manually from the AWS console.
When I look in the EC2 Management Console, a new instance is launched and running, but I still get the error in the terminal.
I really don't know what I'm doing wrong here. I followed the exact instructions from https://turi.com/learn/userguide/deployment/pipeline-example.html

Related

Keep getting an error while trying to upload to ESP32-CAM

I just got my new ESP32-CAM and it keeps giving me the error below, even though I followed the tutorial video I found on YouTube correctly and watched several others, and it still gives the error:
Traceback (most recent call last):
File "esptool.py", line 3682, in <module>
File "esptool.py", line 3675, in _main
File "esptool.py", line 3329, in main
File "esptool.py", line 263, in __init__
File "site-packages\serial\__init__.py", line 88, in serial_for_url
File "site-packages\serial\serialwin32.py", line 78, in open
File "site-packages\serial\serialwin32.py", line 222, in _reconfigure_port
serial.serialutil.SerialException: Cannot configure port, something went wrong. Original message: WindowsError(31, 'A device attached to the system is not functioning.')
Failed to execute script esptool
the selected serial port Failed to execute script esptool
does not exist or your board is not connected

boto3 attach_volume throwing volume not available

I am trying to attach a volume to an instance using boto3, but it fails to attach with the error below:
File "/usr/local/lib/python3.7/site-packages/botocore/client.py", line 357, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/usr/local/lib/python3.7/site-packages/botocore/client.py", line 661, in _make_api_call
raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (IncorrectState) when calling the AttachVolume operation: vol-xxxxxxxxxxxxxxx is not 'available'.
I can see the volume exists in the AWS console, but somehow boto3 is not able to attach it.
import os
import boto3

os.environ['AWS_DEFAULT_REGION'] = "us-west-1"
client = boto3.client('ec2',
                      aws_access_key_id=access_key,
                      aws_secret_access_key=secret_key,
                      region_name='us-west-1')
response1 = client.attach_volume(
    VolumeId=volume_id,
    InstanceId=instance_id,
    Device='/dev/sdg',
)
I tried attaching the same volume using the AWS CLI and it works fine after exporting AWS_DEFAULT_REGION="us-west-1".
I also tried exporting the same variable in the Python script with os.environ['AWS_DEFAULT_REGION'] = "us-west-1", but the script still fails with the same error as above.
I figured it out: I was not giving the EBS volume enough time to become available after creating it. I am able to attach it now after adding a sleep.
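Rather than sleeping for a fixed interval, a boto3 waiter can block until the volume actually reaches the available state. A minimal sketch, assuming volume_id and instance_id are the same variables used in the code above:

import boto3

client = boto3.client('ec2', region_name='us-west-1')

# Block until the newly created EBS volume leaves the 'creating' state
# and becomes 'available' (polls DescribeVolumes under the hood).
waiter = client.get_waiter('volume_available')
waiter.wait(VolumeIds=[volume_id])

# The attach call should no longer fail with IncorrectState.
response = client.attach_volume(
    VolumeId=volume_id,
    InstanceId=instance_id,
    Device='/dev/sdg',
)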

Getting permissions exception when calling amazondax service

I'm using the amazondax service from an AWS Lambda method, and getting an exception that indicates missing permissions - but I don't know what permissions are necessary for this. Both the Lambda method and my DAX cluster are set up with the same VPC subnets and security groups. I'm getting the following exception:
[ERROR] 2018-12-11T23:06:50.457Z 70c80374-fd99-11e8-bac1-318371e7b8ed Failed to retrieve endpoints
Traceback (most recent call last):
File "/var/task/amazondax/Cluster.py", line 211, in _pull
new_endpoints = self._pull_from(ip, port)
File "/var/task/amazondax/Cluster.py", line 222, in _pull_from
endpoints = client.endpoints()
File "/var/task/amazondax/DaxClient.py", line 192, in endpoints
return self._decode_result('endpoints', None, Assemblers.endpoints_455855874_1, tube)
File "/var/task/amazondax/DaxClient.py", line 227, in _decode_result
return self._handle_error(operation_name, tube)
File "/var/task/amazondax/DaxClient.py", line 233, in _handle_error
raise DaxServiceError(operation_name, message, codes, *exc_info)
amazondax.DaxError.DaxServiceError: An error occurred (Unknown) when calling the endpoints operation: Client does not have permission to invoke Endpoints
I assume the last line "Client does not have permission..." is the key to this, but I'm having trouble figuring out exactly what permission(s) are required.
Here is the code that is breaking:
dax = amazondax.AmazonDaxClient(session, region_name='us-east-1', endpoint_url='mydaxcluster.blahblahblah.cache.amazonaws.com:8111')
You will need to add "dax:" permissions for the operations you need to the policy associated with the IAM role or user whose credentials the session uses - for a Lambda function, that is the function's execution role.
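As an illustration, here is a minimal sketch of what such an inline policy could look like when attached to the Lambda execution role with boto3. The role name, policy name, account ID, and cluster ARN below are placeholders, and "dax:*" can be narrowed to the specific actions (GetItem, Query, etc.) your code actually invokes:

import json
import boto3

# Hypothetical names -- substitute your Lambda execution role and the
# ARN of your DAX cluster.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["dax:*"],  # narrow to the operations you actually use
        "Resource": "arn:aws:dax:us-east-1:123456789012:cache/mydaxcluster"
    }]
}

iam = boto3.client('iam')
iam.put_role_policy(
    RoleName='my-lambda-execution-role',
    PolicyName='dax-access',
    PolicyDocument=json.dumps(policy)
)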

Python (boto) TypeError launching Spark Cluster

Following is an attempt to launch a cluster with ten slaves.
12:13:44/sparkup $ec2/spark-ec2 -k sparkeast -i ~/.ssh/myPem.pem \
-s 10 -z us-east-1a -r us-east-1 launch spark2
Here is the output. Note that the same command had been successful with the February master code; today I updated to the latest 1.4.0-SNAPSHOT.
Setting up security groups...
Searching for existing cluster spark2 in region us-east-1...
Spark AMI: ami-5bb18832
Launching instances...
Launched 10 slaves in us-east-1a, regid = r-68a0ae82
Launched master in us-east-1a, regid = r-6ea0ae84
Waiting for AWS to propagate instance metadata...
Waiting for cluster to enter 'ssh-ready' state.........unable to load cexceptions
TypeError
p0
(S''
p1
tp2
Rp3
(dp4
S'child_traceback'
p5
S'Traceback (most recent call last):\n File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 1280, in _execute_child\n sys.stderr.write("%s %s (env=%s)\\n" %(executable, \' \'.join(args), \' \'.join(env)))\nTypeError\n'
p6
sb.Traceback (most recent call last):
File "ec2/spark_ec2.py", line 1444, in <module>
main()
File "ec2/spark_ec2.py", line 1436, in main
real_main()
File "ec2/spark_ec2.py", line 1270, in real_main
cluster_state='ssh-ready'
File "ec2/spark_ec2.py", line 869, in wait_for_cluster_state
is_cluster_ssh_available(cluster_instances, opts):
File "ec2/spark_ec2.py", line 833, in is_cluster_ssh_available
if not is_ssh_available(host=dns_name, opts=opts):
File "ec2/spark_ec2.py", line 807, in is_ssh_available
stderr=subprocess.STDOUT # we pipe stderr through stdout to preserve output order
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 709, in __init__
errread, errwrite)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 1328, in _execute_child
raise child_exception
TypeError
The AWS console shows that the instances are actually running, so it is unclear what actually failed.
Any hints or workarounds appreciated.
UPDATE: The same error occurs when running the login command. It seems to be a problem with the boto API, but the cluster itself appears to be OK.
ec2/spark-ec2 -i ~/.ssh/sparkeast.pem login spark2
Searching for existing cluster spark2 in region us-east-1...
Found 1 master, 10 slaves.
Logging into master ec2-54-87-46-170.compute-1.amazonaws.com...
unable to load cexceptions
TypeError
p0
(.. same exception stacktrace as above )
The issue was that the Python 2.7.6 installation on my Yosemite MacBook appears to have become corrupted.
I reset PATH and PYTHONPATH to point to a Homebrew-installed Python, and then boto and the other Python commands, including building the Spark performance project, work fine.
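As a quick sanity check that PATH and PYTHONPATH now resolve to the intended interpreter and boto installation, a small sketch like this (nothing spark-ec2-specific) can be run from the same shell:

import sys
import boto

# After fixing PATH/PYTHONPATH, these should point at the Homebrew
# installation rather than the system Python.
print(sys.executable)
print(sys.version)
print(boto.__file__)
print(boto.__version__)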

nginx + python app -- how to enable error logging/stack trace

I have a Flask app running on nginx + uWSGI.
On my local server (non-nginx), I get a nice stack trace + error reporting for exceptions.
Like this:
$ python run.py
Traceback (most recent call last):
File "run.py", line 1, in <module>
from myappname import app
File "/home/me/myappname/myappname/__init__.py", line 27, in <module>
file_handler.setLevel(logging.debug)
File "/usr/lib/python2.7/logging/__init__.py", line 710, in setLevel
self.level = _checkLevel(level)
File "/usr/lib/python2.7/logging/__init__.py", line 190, in _checkLevel
raise TypeError("Level not an integer or a valid string: %r" % level)
On nginx, there is next to no logging whatsoever (in /var/log/nginx/error.log).
This post suggests adding app.logger.exception('Failed') to my script, which didn't help.
How do I enable this sort of logging for debugging purposes?
Nginx will capture your app's console output, but you must make the app recover from exceptions; otherwise you'll only get 500 or 400 errors from Nginx.
Try running the app without Nginx (e.g. with the built-in development server) until it seems stable.
Use the logging module to capture app status information in your own log file. This strategy will be useful in the long run.
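For that last point, a minimal sketch of file-based logging inside the Flask app itself; the log path is an assumption, and note that, unlike the snippet in the traceback, it passes logging.DEBUG (the constant) rather than logging.debug (the function) to setLevel:

import logging
from logging.handlers import RotatingFileHandler

from myappname import app  # the Flask app from the question

# Write app logs (including tracebacks logged via app.logger.exception)
# to a file that nginx/uWSGI won't swallow.
file_handler = RotatingFileHandler('/var/log/myappname/app.log',
                                   maxBytes=1024 * 1024, backupCount=5)
file_handler.setLevel(logging.DEBUG)  # the constant, not logging.debug
file_handler.setFormatter(logging.Formatter(
    '%(asctime)s %(levelname)s: %(message)s [in %(pathname)s:%(lineno)d]'))
app.logger.addHandler(file_handler)
app.logger.setLevel(logging.DEBUG)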
