DataStax Enterprise Amazon AMI - amazon-ec2

I'm trying to launch a new instance of the DataStax AMI on Amazon EC2. I tried this in two different regions (us-east and eu-west), using these AMIs: ami-ada2b6c4 and ami-814ec2e8 (us-east), and ami-7f33cd08 and ami-b2212dc6 (eu-west).
I followed this documentation:
http://www.datastax.com/documentation/datastax_enterprise/4.6/datastax_enterprise/install/installAMI.html
So this is what I've done so far:
I've created a new security group (with the specific ports required - I cannot upload the screenshot because I have just created this account)
I've created a new key pair
I've launched the DataStax AMI with these configuration details:
--clustername cluster --totalnodes 4 --version enterprise --username my_name
--password my_password --searchnodes 2
(I have verified my credentials - I can log in here: http://debian.datastax.com/enterprise/ )
After selecting the previously created security group and key pair, I launched the instance.
I've connected to my DataStax Enterprise EC2 instance and this is the displayed log:
Cluster started with these options:
--clustername cluster --totalnodes 4 --version enterprise --username my_name
--password **** --searchnodes 2
03/12/15-08:59:23 Reflector: Received 1 of 2 responses from: [u'172.31.34.171']...
Exception seen in ds1_launcher.py. Please check ~/datastax_ami/ami.log for more info.
Please visit ....
and the ami.log shows these messages:
[INFO] 03/12/15-08:59:23 Reflector: Received 1 of 2 responses from: [u'172.31.34.171']
[ERROR] EC2 is experiencing some issues and has not allocated all of the resources in under 10 minutes.
Aborting the clustering of this reservation. Please try again.
[ERROR] Exception seen in ds1_launcher.py:
Traceback (most recent call last):
File "/home/ubuntu/datastax_ami/ds1_launcher.py", line 22, in initial_configurations
ds2_configure.run()
File "/home/ubuntu/datastax_ami/ds2_configure.py", line 1135, in run
File "/home/ubuntu/datastax_ami/ds2_configure.py", line 57, in exit_path
AttributeError: EC2 is experiencing some issues and has not allocated all of the resources in under 10 minutes.
Aborting the clustering of this reservation. Please try again.
Any suggestions on how to fix this?
Any help will be greatly appreciated!
Have a nice day!

This seems to be an issue on the Amazon EC2 side: it looks like there were no available instances of that type in that region and availability zone at that moment, so EC2 could not allocate all of the requested resources within the 10-minute window and the AMI aborted the clustering.
Another thing you can try is installing just OpsCenter from the package repositories on an EC2 instance you already have (or a new one), and then creating the new nodes/cluster through OpsCenter. It's pretty simple and lets you choose the AMI, security group, and key pair you already have.
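If you want to script retries when EC2 reports capacity problems, a minimal boto3 sketch is below. It is only a sketch under the question's assumptions: the AMI ID and option string are taken from the question, the instance type, key pair, and security group names are hypothetical placeholders, and the DataStax AMI reads its cluster options from the EC2 user data field (as in the linked documentation).

import boto3

ec2 = boto3.resource("ec2", region_name="us-east-1")

# The DataStax AMI parses its cluster options from the instance user data.
user_data = ("--clustername cluster --totalnodes 4 --version enterprise "
             "--username my_name --password my_password --searchnodes 2")

instances = ec2.create_instances(
    ImageId="ami-ada2b6c4",          # one of the AMIs from the question
    InstanceType="m3.large",         # hypothetical type; pick one with capacity
    MinCount=4,
    MaxCount=4,
    KeyName="my-key-pair",           # placeholder key pair name
    SecurityGroups=["datastax-sg"],  # placeholder security group name
    UserData=user_data,
)
print([i.id for i in instances])

If the region or availability zone is out of capacity, retrying later or switching the instance type or region is usually the quickest workaround.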

Related

How to use databricks dbx with an Azure VPN?

I am using dbx to deploy and launch jobs on ephemeral clusters on Databricks.
I have initialized the cicd-sample-project and connected it to a fresh, empty Databricks Free trial environment, and everything works. That is, I can successfully deploy the Python package with
python -m dbx deploy cicd-sample-project-sample-etl --assets-only
and execute it through
python -m dbx launch cicd-sample-project-sample-etl --from-assets --trace
When I try to launch the exact same job on my company's Databricks environment, the deploy command goes through. The only difference is that my company's Databricks environment connects to Azure through a VPN.
Therefore, I added some rules to my firewall (screenshots of the firewall rules omitted),
but when I give the dbx launch command the node fails to start (screenshot omitted), and this message appears in the log:
WARN MetastoreMonitor: Failed to connect to the metastore InternalMysqlMetastore(DbMetastoreConfig{host=consolidated-westeuropec2-prod-metastore-3.mysql.database.azure.com, port=3306, dbName=organization5367007285973203, user=[REDACTED]}). (timeSinceLastSuccess=0) org.skife.jdbi.v2.exceptions.UnableToObtainConnectionException: java.sql.SQLTransientConnectionException: metastore-monitor - Connection is not available, request timed out after 15090ms. at org.skife.jdbi.v2.DBI.open(DBI.java:230)
I am not even trying to write to the metastore; I am just logging some data:
from cicd_sample_project.common import Task

class SampleETLTask(Task):
    def launch(self):
        self.logger.info("Launching sample ETL task")
        self.logger.info("Sample ETL task finished!")

def entrypoint():  # pragma: no cover
    task = SampleETLTask()
    task.launch()

if __name__ == "__main__":
    entrypoint()
Has anyone encountered the same problem? Were you able to use Databricks dbx with an Azure VPN?
Please let me know, and thanks for your help.
PS: If needed I can provide the full log
In your case the egress traffic isn't configured correctly. It's not a dbx problem but a general Databricks networking problem: the cluster nodes cannot reach the Hive metastore, as the MetastoreMonitor warning shows. Just make sure that outgoing traffic is allowed to the ports and destinations described in the documentation.
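To check egress quickly, a small connectivity probe like the sketch below can help; run it from the cluster side (for example in a notebook). This is only a generic test, assuming the metastore host and port taken from the log message above; add any other destinations from the Databricks networking documentation to the list.

import socket

# Host/port taken from the MetastoreMonitor error in the question's log.
targets = [
    ("consolidated-westeuropec2-prod-metastore-3.mysql.database.azure.com", 3306),
]

for host, port in targets:
    try:
        with socket.create_connection((host, port), timeout=10):
            print(f"OK: egress to {host}:{port} works")
    except OSError as exc:
        print(f"BLOCKED: {host}:{port} -> {exc}")

A timeout here, like the 15-second timeout in the log, points at a firewall or VPN route dropping the outbound connection rather than anything dbx-specific.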

How to safely fix an AWOL ambari system user?

I'm a student working on a test cluster consisting of around 25 hosts. We installed using Ambari and have FreeIPA running on one host as a DNS and LDAP server. The rest are typical Hadoop infrastructure. Hive was failing, and I wondered whether the DB connection parameters used during the Ambari installation were incorrect, so I tried to find a way to re-run the DB connection process. I didn't get anywhere and it was late, so I left it with the Ambari interface working.
The next morning, the Ambari web UI seems to be down. I thought that maybe the web server needed a restart, so I tried the following:
[akidd@dw ~]$ sudo ambari-server start
Using python /usr/bin/python
Starting ambari-server
ERROR: Exiting with exit code 1.
REASON: Unable to detect a system user for Ambari Server.
- If this is a new setup, then run the "ambari-server setup" command to create the user
- If this is an upgrade of an existing setup, run the "ambari-server upgrade" command.
Refer to the Ambari documentation for more information on setup and upgrade.
Can anyone help me to understand what could have happened?
If I run ambari-server setup, will the existing cluster be OK, assuming I configure everything like-for-like with how it was originally?
Thanks for your help!
@user3535074 You should try to start it with the user that installed it.
If you do run ambari-server setup as the current user, remember to answer No to the following options:
Customize user account for ambari-server daemon [y/n] (n)? n
Do you want to change Oracle JDK [y/n] (n)? n
Enter advanced database configuration [y/n] (n)? n
More info in the following post, including how to back up the Ambari database before running setup again:
https://community.cloudera.com/t5/Support-Questions/Ambari-server-failed-to-start-after-system-reboot-Below-is/td-p/203806
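The "Unable to detect a system user" message generally means Ambari's run-as user entry went missing from its configuration. Before re-running setup, you can check for it with a sketch like the one below; it assumes a default installation where the property ambari-server.user lives in /etc/ambari-server/conf/ambari.properties.

# Check whether Ambari still knows which system user to run as.
# Path and property name assume a default Ambari installation.
PROPS = "/etc/ambari-server/conf/ambari.properties"

with open(PROPS) as fh:
    props = dict(
        line.strip().split("=", 1)
        for line in fh
        if "=" in line and not line.lstrip().startswith("#")
    )

user = props.get("ambari-server.user")
print(f"ambari-server.user = {user!r}" if user else "ambari-server.user is missing")

If the property is missing, re-running ambari-server setup with the answers above should restore it without touching the cluster configuration.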

Could not connect VM on port 22 in Google cloud

I have installed Hadoop (HDP) on a Google Cloud VM instance. After some time, when I tried to connect to the machine again, it showed the error:
"We are unable to connect to the VM on port 22."
To get additional debug logs, try to SSH using the verbose flag with the following command:
$ gcloud compute ssh INSTANCE_NAME --zone ZONE --ssh-flag="-vvv"
If the above step doesn't help, connect to the affected instance using its serial console (for example with gcloud compute connect-to-serial-port) and check whether this issue has to do with the port not being open, as Abhinav mentioned.
You can find additional SSH troubleshooting information in the Help Center article.

salt master minion communication error in aws instances

I have installed salt-master and salt-minion on AWS EC2 instances and configured the cloud settings as shown below (screenshot omitted),
and I came up with an error like "Permission denied (publickey)". Can anyone suggest a solution?
The owner of the key file must be the same user that starts your services (salt-master/salt-minion).
Otherwise the OS won't allow it to read the file.
The command is:
chown this_maybe_salt_user /path/to/your/key
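As a hedged Python equivalent (reusing the placeholder user name and key path from the answer), the sketch below sets both the owner and the 0600 permissions that SSH requires on a private key:

import os
import pwd
import stat

key_path = "/path/to/your/key"                 # placeholder path from the answer
user = pwd.getpwnam("this_maybe_salt_user")    # placeholder user name from the answer

os.chown(key_path, user.pw_uid, user.pw_gid)     # same effect as the chown above
os.chmod(key_path, stat.S_IRUSR | stat.S_IWUSR)  # 0600: SSH refuses keys with looser permissions

Note that "Permission denied (publickey)" can also mean the wrong key is configured for the remote user, so double-check the key pair referenced in your cloud profile.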

windows cluster - SSH seems to be failing

Two physical systems, each is running Server 2008
Installed DataStax Community (version 2.0.7, 64-bit) on each; that is the version number in the DataStax package I downloaded, according to the file name.
OpsCenter running locally shows a running 1-node cluster. I can execute I/O on the system at the command line (using cassandra-stress).
The system names are "5017-cassandra-1" and "5017-cassandra-2"
I'd like to create a cluster in which both nodes participate. This is not a production environment (I'm just trying to learn).
From OpsCenter on 5017-cassandra-1 I go to Nodes (I see 1 node, of course), then Add Nodes.
I leave the "Package" drop-down at its default (though the latest version shown in the drop-down is 2.0.6) and enter the IP address of 5017-cassandra-2. I add the Administrator user name and password in the "Node Credentials (sudo)" fields and press "Add Nodes", and get:
Error provisioning cluster: Unable to SSH to some of the hosts
Unable to SSH to 10.108.14.224:
global name 'get_output' is not defined
Reading that I needed to add OpenSSL, I installed the runtime redistributables (on both systems) and Win64 OpenSSL-1_0_1h.
The error persists.
Any suggestions or a link to a step-by-step guide would be appreciated.
