How to restart a single-node Hadoop cluster on EC2 - hadoop

I have installed a single-node Hadoop cluster using Hortonworks/Ambari on an Amazon EC2 host.
Since I don't want this cluster running 24/7, I stop the instance when I'm done. When I start the instance again later, it gets a new IP address, and Ambari is no longer able to start the Hadoop-related services.
Is there a way, other than completely redeploying, to reconfigure the cluster so the services will start?
It looks like the IP address lives in various XML files under /etc, in the Postgres ambari database, and possibly in other places I haven't found yet.
I tried updating the XML files and the Postgres database with the new IP address and the new internal and external DNS names wherever I could find them, but to no avail. I have not been able to restart the services.
The reason I am doing this is to save the deployment time, the data and configuration on HDFS, and other project-specific setup each time I restart the host.
Any suggestions?
Thanks!

An Elastic IP can be used. Also, since you mentioned it is a single-node cluster, you can use localhost or the private IP.
If you use an Elastic IP, your UIs will always be on the same public IP. However, if you use the private IP or localhost and do not associate your instance with an Elastic IP, you will have to look up the public IP every time you start the instance and then connect to the web UI using that IP.

Thanks for the help; both Harman and TJ are correct. I haven't used an Elastic IP because I might have more than one of these running at a time, and for now, at least, I don't mind looking up the public IP address.
Harman's suggestion of using "localhost" as the FQDN when setting up Ambari in the first place is a really good idea in retrospect. Unless I go through the whole setup again, that's water under the bridge for me, but I recommend it to others who might read this post.
In my case, I figured this out on my own before coming back to the page. The specific step I took was insanely simple after all, thanks to Occam's Razor.
I added the following line in /etc/hosts:
<new internal IP> <old internal dns name>
and then ran
ambari-server restart
from the command line. After logging into Ambari, I was able to restart all services.
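If you want to script the /etc/hosts change described above, here is a minimal Python sketch. It assumes you already know the new private IP and the old internal DNS name (the example values are hypothetical), works on the file contents as a string, and leaves writing the file back (which needs root) and running ambari-server restart to you.

```python
# Sketch: keep the old internal DNS name resolving after a stop/start by
# mapping it to the new private IP in /etc/hosts content.
# Example IP and hostname are hypothetical.


def map_old_hostname(hosts_text: str, new_ip: str, old_name: str) -> str:
    """Return hosts_text with every line that already lists old_name removed
    and a '<new_ip> <old_name>' line appended."""
    kept = []
    for line in hosts_text.splitlines():
        # Hostnames are the fields after the IP, ignoring any trailing comment.
        names = line.split("#", 1)[0].split()[1:]
        if old_name in names:
            continue  # stale entry for the old name; replaced below
        kept.append(line)
    kept.append(f"{new_ip} {old_name}")
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    hosts = "127.0.0.1 localhost\n"
    # Hypothetical: old name from before the stop, new IP after the start.
    print(map_old_hostname(hosts, "10.0.0.25", "ip-10-0-0-12.ec2.internal"), end="")
```

Note that a line listing the old name together with other aliases is dropped as a whole, so check the result before saving it over /etc/hosts.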

Related

EC2 Private IP changes on every server restart

We have a RHEL 7.2 EC2 instance and we are trying to install Oracle 12C EE server. We have assigned an Elastic IP to the instance to make sure that the Public IP address does not change when we restart the server. But we saw that the hostname of the instance gets changed on a server restart.
Problem: A few steps in the Oracle installation require the hostname of the EC2 instance (i.e. the private DNS name), so we are hardcoding the hostname during the Oracle installation. But if the hostname changes on every server restart, the installed software won't work (since it holds the previous hostname). How can we resolve this issue?
Please let us know the best practices for resolving this issue.
IP addresses do not change in EC2 with a simple restart. They only change with a complete stop, followed later by a start. If you are using a VPC, which you most likely are, then the private IP address will not change even with a stop/start.
If you want a solution that will work even if you move the installation to a different EC2 instance, then you should create a Route53 private hosted zone, attach it to your VPC, and then create a custom DNS name for this server.
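As a rough illustration of the Route53 approach, the Python sketch below builds the ChangeBatch argument that boto3's route53 client accepts in change_resource_record_sets, with an UPSERT of an A record pointing a custom name at the private IP. The record name, IP and hosted zone ID are hypothetical, and the actual API call is only shown in a comment so the sketch runs without AWS credentials.

```python
# Sketch: build a Route 53 UPSERT for an A record in a private hosted zone.
# Record name, IP and zone ID are hypothetical placeholders.


def upsert_a_record(name: str, private_ip: str, ttl: int = 300) -> dict:
    """Return a ChangeBatch dict for Route 53's ChangeResourceRecordSets."""
    return {
        "Comment": f"Point {name} at {private_ip}",
        "Changes": [
            {
                "Action": "UPSERT",  # create the record, or overwrite it
                "ResourceRecordSet": {
                    "Name": name,
                    "Type": "A",
                    "TTL": ttl,
                    "ResourceRecords": [{"Value": private_ip}],
                },
            }
        ],
    }


# With credentials configured, the call would look like:
#   import boto3
#   boto3.client("route53").change_resource_record_sets(
#       HostedZoneId="Z_HYPOTHETICAL_ZONE_ID",
#       ChangeBatch=upsert_a_record("oradb.corp.internal", "10.0.1.5"),
#   )
```

Oracle would then be installed against the custom name, which you can repoint at a different instance later without touching the installation.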
If you are using a VPC (which is the default now), the private IP should not change on a restart or a stop/start.
My understanding is that you're having an issue with the hostname resetting to the default ip-x-y-z-k on OS reboot, which causes problems with the Oracle database.
This is usually caused by cloud-init (embedded in the AMI).
I suggest going through these steps:
First set the hostname in your os:
$: hostnamectl set-hostname Your-New-Host-Name-Here --static
Edit your '/etc/hosts' to match the private IP:
<private_ip> <hostname>
Check the value of HOSTNAME in '/etc/sysconfig/network'; it should match your hostname.
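To double-check that the static hostname, /etc/hosts and /etc/sysconfig/network all agree before you stop the instance, something like this Python sketch could help. It takes the file contents as strings so it runs anywhere; the hostname and IP used with it are hypothetical.

```python
# Sketch: report mismatches between the intended hostname and what
# /etc/hosts and /etc/sysconfig/network contain (passed in as strings).


def sysconfig_hostname(network_text: str):
    """Return HOSTNAME from /etc/sysconfig/network content, or None."""
    for line in network_text.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep and key.strip() == "HOSTNAME":
            return value.strip().strip('"')  # tolerate HOSTNAME="name"
    return None


def hosts_maps(hosts_text: str, ip: str, hostname: str) -> bool:
    """True if a non-comment /etc/hosts line maps ip to hostname."""
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()
        if fields and fields[0] == ip and hostname in fields[1:]:
            return True
    return False


def check_hostname_config(hostname, private_ip, hosts_text, network_text):
    """Return a list of human-readable problems (empty if consistent)."""
    problems = []
    if not hosts_maps(hosts_text, private_ip, hostname):
        problems.append(f"/etc/hosts has no '{private_ip} {hostname}' entry")
    if sysconfig_hostname(network_text) != hostname:
        problems.append("HOSTNAME in /etc/sysconfig/network does not match")
    return problems
```

Feed it the output of reading both files and the name you passed to hostnamectl; an empty list means the three agree.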
Finally, to solve the issue, I suggest removing the following lines from '/etc/cloud/cloud.cfg':
set_hostname
update_hostname
update_etc_hosts
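A small Python sketch of that cloud.cfg edit follows. It assumes the stock layout where modules are listed as "- name" entries under cloud_init_modules, and that the third module is named update_etc_hosts (its name in stock cloud.cfg files). It transforms the text only; back up the original before writing anything.

```python
# Sketch: drop the hostname-related cloud-init modules from cloud.cfg text.
# Assumes modules appear as "- name" list items, as in stock cloud.cfg.

MODULES_TO_DROP = {"set_hostname", "update_hostname", "update_etc_hosts"}


def drop_hostname_modules(cfg_text: str) -> str:
    """Return cfg_text without list items naming the modules above."""
    out = []
    for line in cfg_text.splitlines(keepends=True):
        stripped = line.strip()
        if stripped.startswith("- ") and stripped[2:].strip() in MODULES_TO_DROP:
            continue  # skip this module entry
        out.append(line)
    return "".join(out)
```

cloud-init also documents a preserve_hostname: true setting in cloud.cfg, which may be a less invasive way to stop it from resetting the hostname.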
To test whether it works, stop and start the instance: the private IP should stay the same as before, and the hostname should be the one you defined.
I hope this helps.
G.

Can't join a cluster on MarkLogic

I'm working with a MarkLogic database and I tried to create a cluster.
I already have a development key. The OS is the same on all the nodes (Win 7 x64).
When you add a node to the cluster, you need to type the host name or the IP address. For some reason, when I type the host name, MarkLogic sometimes can't find the node, but that doesn't matter, because with the IP the connection is successful.
The main problem comes later in the process. At the end, when MarkLogic tries to transfer the cluster configuration information to the new host, the process never finishes, and eventually a message like "No data received" appears in the web browser.
I know that this message doesn't mean the process failed, because when I change the host name, for example, the same message appears.
When I check the summary on the first node, the second node appears, which means the node "joins" the cluster, but I'm not able to start the Admin interface, and the second node always appears disconnected, even if I restart the service.
Additionally, I can ping any computer from any other.
I tried creating another network, because some ports are blocked at my school. I also tried using different development keys as well as the same key on all my nodes,
and I already have all the services enabled, but the problem persists.
Any help or comments would be appreciated.
Make sure ports 7998 - 8003 are open on both computers for both inbound and outbound traffic and that you don't have a firewall (Windows firewall, or iptables) blocking these.
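To check the port advice quickly, here is a Python sketch that tries a TCP connection to each port in 7998-8003 on a given host and lists the ones that refuse or time out. It only tests reachability from the machine it runs on, so run it on each node against the other; the host address you pass in is up to you.

```python
import socket

MARKLOGIC_PORTS = range(7998, 8004)  # 7998-8003 inclusive


def closed_ports(host, ports=MARKLOGIC_PORTS, timeout=2.0):
    """Return the ports on host that do not accept a TCP connection."""
    closed = []
    for port in ports:
        try:
            # Connect and immediately close; success means the port is open.
            with socket.create_connection((host, port), timeout=timeout):
                pass
        except OSError:  # refused, unreachable, or timed out
            closed.append(port)
    return closed


if __name__ == "__main__":
    import sys

    target = sys.argv[1] if len(sys.argv) > 1 else "127.0.0.1"
    print(closed_ports(target) or "all ports reachable")
```

A port that shows up as closed even though MarkLogic is running on the target is a strong hint that a firewall is in the way.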
You can also start looking into the Logs/ErrorLog.txt file and see if something obvious shows up.
Stick to IP addresses for now as it seems your DNS isn't fully working.
Your error looks like a kind of networking connectivity problem between the hosts.
Also, you might get more detailed, or at least different, answers from the MarkLogic developer mailing list.
http://developer.marklogic.com/discuss
-David Lee
Make sure the host names in the MarkLogic configuration match the DNS names at which the hosts can see each other. If those are unreliable, simply use IP addresses as host names. Go to the Admin interface on both ends, look up the host name, change the DNS name to the IP address, and try again.
Also look at DALDEI's suggestion about ports and firewalls, that could be interfering as well.
HTH!

Configuring Amazon EC2 for a dynamic website

I am curious about Amazon Web Services, so I thought of creating a dynamic webpage with Amazon EC2. I created an instance, installed Apache and PHP, and made sure they work in EC2 (using remote access). I have assigned an Elastic IP to the instance. My question is how to access the webpage that I created on the instance. I am not sure what to set as ServerName in httpd.conf. My goal is to access the page like http://amazonaddress/test.php
I am using Windows Server, but I think it is basically the same. My documents are in the same folder as mentioned in the conf file. But when I use my Elastic IP, it isn't working, not even the basic index page in Apache's htdocs (that's the document root according to the conf). To throw more light on this, I will explain what I have done so far.
I created a micro instance (EC2) and logged into it using Remote Desktop. I installed the Apache MSI file and then PHP. I created an Elastic IP, attached it to the instance, and added an HTTP rule for port 80 to my security group. I tested that localhost works on the remote machine (it serves index.html). After that, I tried accessing it using the Elastic IP, and it just times out. Is there any step I have missed?
You can access it via http://255.255.255.255 where you replace the 255.255.255.255 with your elastic IP address.
Then you want to setup DNS for your domain name. So you'll need to create an A Record mapping www.yourdomain.com to whatever your elastic IP address is. You can usually do this via your domain name registrar as most of them also run basic DNS services for free.
You can access an EC2 instance using its public DNS name (or the Elastic IP, since you already have one), which can be seen in the instance's Description tab. Configuring your personal domain name to point to that server involves creating an A record that maps to that public IP.
Assuming apache has been setup correctly, that's all you should need to do to get started (and your test.php page is in /var/www/). For your purposes, you probably shouldn't even need to modify the httpd.conf file at all.
Also, be sure to remember to open a port on the security group (under Network & Security from the EC2 Console) that the instance belongs to. In your example, you will want to open port 80 inbound with source 0.0.0.0/0 (unless you want to limit access to a specific IP range).
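If you want to verify the security group rule programmatically rather than in the console, this Python sketch checks whether a list of rules allows a TCP port from 0.0.0.0/0. The rule dicts follow the shape of the IpPermissions field that boto3's describe_security_groups returns; fetching them is only shown in a comment so the sketch runs offline. A matching rule does not rule out a host firewall (such as Windows Firewall on the instance) still blocking the port.

```python
# Sketch: does a security group's IpPermissions list open `port` to `cidr`?
# Rule dicts mirror boto3 describe_security_groups output, e.g.
#   perms = boto3.client("ec2").describe_security_groups(
#       GroupIds=["sg-HYPOTHETICAL"])["SecurityGroups"][0]["IpPermissions"]


def allows_port_from_anywhere(ip_permissions, port, cidr="0.0.0.0/0"):
    """True if some inbound rule allows TCP `port` from `cidr`."""
    for perm in ip_permissions:
        proto = perm.get("IpProtocol")
        if proto == "-1":
            in_range = True  # "-1" means all protocols and all ports
        elif proto in ("tcp", "6"):
            in_range = perm.get("FromPort", 0) <= port <= perm.get("ToPort", -1)
        else:
            continue  # UDP, ICMP, etc. do not help HTTP
        ranges = perm.get("IpRanges", [])
        if in_range and any(r.get("CidrIp") == cidr for r in ranges):
            return True
    return False
```

For the web server case, allows_port_from_anywhere(perms, 80) returning False would explain a browser timeout; True points the investigation at the instance itself.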
Hope this helps.

Solution for local IP changes of AWS EC2 instances

Amazon only gives you a certain number of static IP addresses, and the local (private) IP of each EC2 instance can change when the machine is restarted. As far as I can tell, this makes it ridiculously hard to create a stable platform where EC2 instances depend on each other.
I've searched online a lot for various solutions, and so far I have found nothing reasonable other than assigning an Elastic IP address to every EC2 instance, even if it's not public facing. Does anyone have any other good ideas that are actually easy to execute?
Thanks!
See the AWS team's response to question Static local IP:
The internal IP address of EC2 instances is allocated via DHCP. On
instance shutdown, or when the DHCP lease expires, the IP address is
returned to the general EC2 DHCP pool of addresses available for other
instances.
There is no way to guarantee that you will obtain the same DHCP
address across reboots.
Edit: The answer is to use Amazon VPC. There is no downside except a trivial amount of extra setup because now you control the router. It's a world apart from plain old EC2 instance on AWS. It's so necessary in fact that VPC will be enabled for all future AWS setups by default. See this post for more information: http://www.reddit.com/r/aws/comments/1a3n0r/ec2_update_virtual_private_clouds_for_everyone/
The stock answers are:
Use AWS VPC so you have complete control over instance addressing
Use Elastic IPs, whose public DNS names resolve to the instance's local address (not the public one, as you'd expect) when used to communicate between EC2 instances
I stumbled upon a third option: ec2-ssh by the Instagram folks. It's a Python shell script that you install globally; it lets you both query the public DNS of your EC2 instances by tag name and SSH in by tag name.
The documentation for it is virtually nonexistent. I've written down the steps to install below:
To install ec2-ssh:
sudo yum install python-boto (python wrapper for ec2 api)
git clone https://github.com/Instagram/ec2-ssh
In your ~/.bash_profile set your AWS access key and secret like so:
export AWS_ACCESS_KEY_ID=XYZ123
export AWS_SECRET_ACCESS_KEY=XYZ123
cd into the bin folder of the repo; there will be two files:
ec2-host and ec2-ssh
Copy them to /usr/bin or /usr/local/bin.
Now you can do awesome stuff like:
$ ec2-host ZenWorker
ec2-999-xy-999-99.compute-1.amazonaws.com
and
$ ec2-ssh ZenWorker
Connecting to ec2-999-xy-999-99.compute-1.amazonaws.com.
Note that in your regular shell scripts you can use backticks to call these global tools. I've timed these calls, and they take between 0.25 and 0.5 seconds from an EC2 instance, so that's really the only downside. Perhaps you can live with the delay, or use the fact that an instance's public DNS only changes on a stop/start to work up a solution.
Note that these two programs are command-line scripts, and you don't need any Python knowledge to use them. For PHP fans, or those who also want an easy way to scp files without knowing the changing public DNS, you can check out ec2dns.
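One way to "work up a solution" around that delay is to cache lookups for a while, since the public DNS only changes when the instance is stopped and started. The Python sketch below wraps any lookup function in a time-based cache; the ec2_host helper simply shells out to the ec2-host script described above and assumes it is on your PATH. The TTL value is an arbitrary choice.

```python
import subprocess
import time


class HostCache:
    """Cache tag-name -> public DNS lookups for `ttl` seconds."""

    def __init__(self, lookup, ttl=300.0, clock=time.monotonic):
        self._lookup = lookup  # function: tag name -> DNS name
        self._ttl = ttl
        self._clock = clock
        self._entries = {}  # tag -> (expires_at, dns_name)

    def get(self, tag):
        now = self._clock()
        hit = self._entries.get(tag)
        if hit and hit[0] > now:
            return hit[1]  # still fresh, skip the slow lookup
        value = self._lookup(tag)
        self._entries[tag] = (now + self._ttl, value)
        return value


def ec2_host(tag):
    """Call the ec2-host script and return its trimmed output."""
    result = subprocess.run(["ec2-host", tag], capture_output=True, text=True, check=True)
    return result.stdout.strip()


# Usage: hosts = HostCache(ec2_host); hosts.get("ZenWorker")
```

The trade-off is staleness: if an instance is stopped and started within the TTL, the cache returns the old name until the entry expires.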
I was in the same situation once. I still don't have the expertise to solve it properly. My ugly solution was to use an ELB, not really for load balancing but just as a stable endpoint.
But I think a good solution can be obtained by using AWS VPC.
Here's another Ruby solution for Updating Route 53 DNS from instance on AWS. You shouldn't reference raw 3rd party system IP addresses in your applications or server configurations.
You can change the public IP address using an Elastic IP.
You can do it with C# code (AWS SDK for .NET, namespaces Amazon.EC2 and Amazon.EC2.Model):
var associateRequest = new AssociateAddressRequest
{
    PublicIp = "YOUR_ELASTIC_IP",    // your Elastic IP
    InstanceId = "YOUR_INSTANCE_ID"  // the instance you assign it to
};
amazonEc2Client.AssociateAddress(associateRequest);
After that, disassociate it:
var disassociateRequest = new DisassociateAddressRequest
{
    PublicIp = "YOUR_ELASTIC_IP"
};
amazonEc2Client.DisassociateAddress(disassociateRequest);
Your public IP will then change.

How do I connect up my Amazon EC2 instances without manually modifying config files?

I have a three-tier Windows-based web application bundled into 3 AMIs on Amazon EC2 that I use for load testing.
An ASP.NET web application on IIS
An .NET application server
SQL Server
After I launch them, the config files of each tier need modifying to update the IP addresses.
At the moment I am doing this manually: I connect to the webserver instance via remote desktop and modify the config file to point to the new IP of the application server instance. Then I do the same with the application server to change the IP in the connection string.
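The manual edit described here boils down to a search-and-replace of one IP by another in a config file. Below is a Python sketch of that step (not PowerShell, which the question mentions); it matches the old IP only as a whole address, so replacing 10.0.0.1 does not touch 10.0.0.12. File names and addresses are hypothetical, and how you get the new IPs to each machine is left open.

```python
import re
from pathlib import Path


def replace_ip(config_text: str, old_ip: str, new_ip: str) -> str:
    """Replace old_ip with new_ip only where it appears as a whole address."""
    # Not preceded by a digit/dot, not followed by a digit or ".digit".
    pattern = r"(?<![\d.])" + re.escape(old_ip) + r"(?!\.?\d)"
    return re.sub(pattern, lambda _m: new_ip, config_text)


def update_config_file(path: str, old_ip: str, new_ip: str) -> bool:
    """Rewrite the file in place; return True if anything changed."""
    p = Path(path)
    text = p.read_text(encoding="utf-8")
    new_text = replace_ip(text, old_ip, new_ip)
    if new_text != text:
        p.write_text(new_text, encoding="utf-8")
    return new_text != text
```

For example, update_config_file(r"C:\inetpub\wwwroot\web.config", "10.0.0.1", "10.0.0.7") would repoint a hypothetical web tier at a new application server address.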
This must be a common requirement and I must be missing something obvious. There must be a better way!
I could use Elastic IP addresses, but these machines are only provisioned for a couple of hours at a time, and I would be charged for the addresses when they were NOT in use (which would be most of the time).
Is there some way of persistently naming the machines? Can I somehow get all the machines on the same network and use machine names instead of IP addresses?
I could write some nifty PowerShell script that would perform the modifications remotely. Is there an example somewhere?
I could use a dynamic IP address service. I'm not sure if this would have any negative effect on performance or availability... Are there any downsides to this approach?
I could install some sort of self-configuring service on each machine (which connects to S3? SNS? SimpleDB?) to publish/retrieve the addresses of the other machines and update the config files automatically. Is there an example somewhere?
What is best practice?
You could use Amazon Virtual Private Cloud (Amazon VPC). You get a private subnet where you can assign a fixed IP address to an instance, although it may require launching the instance from the command line to assign the IP. VPC is charged the same way as EC2.
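With fixed private IPs in a VPC subnet, each tier's config can point at a known address. The Python sketch below hands out addresses to tiers in order, skipping the addresses AWS reserves in every subnet (the first four and the last). The subnet CIDR and tier names are hypothetical; the launch itself is only shown as a comment using boto3's run_instances parameters.

```python
import ipaddress

# Launching with a chosen address would look roughly like:
#   boto3.client("ec2").run_instances(ImageId=..., InstanceType=...,
#       MinCount=1, MaxCount=1, SubnetId="subnet-HYPOTHETICAL",
#       PrivateIpAddress=ips["web"])


def assign_tier_ips(subnet_cidr, tiers):
    """Map each tier name to a fixed private IP inside subnet_cidr."""
    net = ipaddress.ip_network(subnet_cidr)
    first = 4                       # AWS reserves the first four addresses
    last = net.num_addresses - 2    # ...and the last one
    if len(tiers) > last - first + 1:
        raise ValueError("subnet too small for the requested tiers")
    return {tier: str(net[first + i]) for i, tier in enumerate(tiers)}


if __name__ == "__main__":
    print(assign_tier_ips("10.0.1.0/24", ["web", "app", "sql"]))
```

Because the addresses are chosen up front, the web tier's config and the application server's connection string can be baked into the AMIs once.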
