How to properly configure the Amazon EC2 AMI for 'hadoop-ec2'? - hadoop

I am trying to launch an instance on Amazon EC2. I have researched this problem extensively, but I have not found any helpful information.
When I run the command hadoop-ec2 launch-cluster mycluster 2, I receive the following error message:
Starting master with AMI.
Required parameter 'AMI' missing (-h for usage)
I have entered my AWS key, AWS secret key, AWS key pairs, etc. I am using hadoop-1.0.4. I am using the default S3 bucket (hadoop-images), but I have tried many other AMIs and I always get the same error message.
Has anybody experienced this problem before?

The basic issue is that the search for images that the launch-hadoop-master script performs is not returning any results. The most likely cause is that different AMIs are available in different regions (but it could also be due to any changes you've made to S3_BUCKET and HADOOP_VERSION in hadoop-ec2-env.sh).
From the launch-hadoop-master script:
# Finding Hadoop image
AMI_IMAGE=`ec2-describe-images -a | grep $S3_BUCKET |
grep $HADOOP_VERSION |
grep $ARCH |
grep available |
awk '{print $2}'`
# Start a master
echo "Starting master with AMI $AMI_IMAGE"
So it appears that the search for AMIs matching the various grep filters is returning nothing, and thus AMI_IMAGE is not being set to a valid image (the defaults for the Hadoop 1.0.4 distribution are S3_BUCKET is hadoop-images, HADOOP_VERSION is 0.19.0 and ARCH is x86 if you're using m1.small instances). If you search the public AMIs in the US-West-2 region, you'll see that there aren't many Hadoop images, but if you search the public AMIs in the US-East-1 region, you'll see that there are quite a few. Thus, one way around this issue is to work in the US-East-1 region (this is simplest) or, alternatively, to set EC2_URL in your login script via export EC2_URL=https://ec2.us-east-1.amazonaws.com, but then you need to make sure your key pairs exist in this region (check in the AWS console).
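As a rough illustration of the EC2_URL workaround, the endpoint can be derived from the region name. The helper name ec2_url_for_region is hypothetical (it is not part of hadoop-ec2), and the URL pattern is simply assumed from the us-east-1 example above, so it may not hold for every AWS partition.

```shell
#!/bin/bash
# Hypothetical helper (not part of hadoop-ec2): build the EC2 endpoint URL
# for a region, following the pattern of the us-east-1 example above.
ec2_url_for_region() {
  local region=$1
  if [ -z "$region" ]; then
    echo "usage: ec2_url_for_region REGION" >&2
    return 1
  fi
  echo "https://ec2.${region}.amazonaws.com"
}

# Usage (for example in a login script):
#   export EC2_URL=$(ec2_url_for_region us-east-1)
```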
If you did indeed change HADOOP_VERSION to 1.0.4, I'll note that
ec2-describe-images -a | grep hadoop-images |
grep "1.0.4" |
grep x86 |
grep available
doesn't return any images in the US-East-1 region. Note that the version (HADOOP_VERSION) of the Hadoop distribution that you are running the hadoop-ec2 command from does not need to be the same as the version of Hadoop that the images will be running.
Lastly, as a blunt fix, you could find the AMI that you want to use, and force set AMI_IMAGE to the image name in the launch-hadoop-master and launch-hadoop-cluster scripts.
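To see why the launch fails, it can help to run the same filter chain by hand and stop early when nothing matches. Below is a minimal sketch of that idea; pick_ami is a hypothetical helper, not something shipped with hadoop-ec2, and it assumes the image ID is the second whitespace-separated column, as the awk call in launch-hadoop-master does.

```shell
#!/bin/bash
# Hypothetical helper: filter `ec2-describe-images -a` style output read from
# stdin and print the image ID (second column) of the first matching line.
# Arguments: S3 bucket, Hadoop version, architecture.
pick_ami() {
  local bucket=$1 version=$2 arch=$3 ami
  ami=$(grep -F -- "$bucket" | grep -F -- "$version" | grep -F -- "$arch" \
        | grep -F available | awk 'NR == 1 {print $2}')
  if [ -z "$ami" ]; then
    # This is the situation that leads to "Required parameter 'AMI' missing".
    echo "No AMI matches $bucket / $version / $arch in this region" >&2
    return 1
  fi
  echo "$ami"
}

# Usage:
#   AMI_IMAGE=$(ec2-describe-images -a | pick_ami "$S3_BUCKET" "$HADOOP_VERSION" "$ARCH") || exit 1
```

Failing at this point gives a clearer message than letting the empty AMI_IMAGE reach the launch command.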

Related

EC2 instance region is not populated in user-data script

I want to set some tags on an EC2 spot instance. Since it is impossible to do this directly in the spot request, I do it via a user data script. Everything works fine when I specify the region statically, but that is not a universal approach. When I try to detect the current region from the instance user data, the region variable is always empty. I do it in the following way:
#!/bin/bash
region=$(ec2-metadata -z | awk '{print $2}' | sed 's/[a-z]$//')
aws ec2 create-tags \
--region $region \
--resources `wget -q -O - http://169.254.169.254/latest/meta-data/instance-id` \
--tags Key=sometag,Value=somevalue Key=sometag,Value=somevalue
I tried adding a delay before populating the region
/bin/sleep 30
but this had no result.
However, when I run this script manually after start, the tags are added fine. What is going on?
And why doesn't aws-cli get the default region from the profile? I have run aws configure properly inside the instance, but without the --region option it throws an error saying that the region is not specified.
I suspect the ec2-metadata command is not available when your userdata script is executed. Try getting the region from the metadata server directly (which is what ec2-metadata does anyway)
region=$(curl -fsq http://169.254.169.254/latest/meta-data/placement/availability-zone | sed 's/[a-z]$//')
AWS CLI does use the region from default profile.
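The availability-zone approach boils down to dropping the trailing zone letter. A small sketch of just that step follows; az_to_region is a hypothetical name, and it assumes a zone name of the form region plus one lowercase letter (such as us-east-1a).

```shell
#!/bin/bash
# Hypothetical helper: turn an availability zone name such as "us-east-1a"
# into its region by dropping the single trailing zone letter.
az_to_region() {
  printf '%s\n' "$1" | sed 's/[a-z]$//'
}

# Usage on an instance (the URL is the one from the answer above):
#   az=$(curl -fs http://169.254.169.254/latest/meta-data/placement/availability-zone)
#   region=$(az_to_region "$az")
```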
You can now use this endpoint to get only the instance region (no parsing needed):
http://169.254.169.254/latest/meta-data/placement/region
So in this case:
region=`curl -s http://169.254.169.254/latest/meta-data/placement/region`
I ended up with
region=$(curl -s http://169.254.169.254/latest/dynamic/instance-identity/document | python -c "import json,sys; print(json.load(sys.stdin)['region'])")
which worked fine. However, it would be great if somebody could explain the nuts and bolts.
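On the nuts and bolts: the instance identity document is a JSON document served by the metadata service, and region is one of its fields, so any JSON-aware tool can pull it out. Here is a dependency-free sketch of the extraction; region_from_identity_doc is a hypothetical helper, and a real JSON parser (jq, or Python's json module as above) is the more robust choice.

```shell
#!/bin/bash
# Hypothetical helper: read an instance identity document (JSON) on stdin
# and print the value of its "region" field. sed is used only to avoid
# extra dependencies; it is not a general JSON parser.
region_from_identity_doc() {
  sed -n 's/.*"region"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Usage on an instance:
#   region=$(curl -s http://169.254.169.254/latest/dynamic/instance-identity/document \
#            | region_from_identity_doc)
```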

Is it secure to store EC2 User-Data shell scripts in a private S3 bucket?

I have an EC2 ASG on AWS and I'm interested in storing the shell script that's used to instantiate any given instance in an S3 bucket, and having it downloaded and run upon instantiation. But it all feels a little rickety, even though I'm using an IAM Instance Role, transferring via HTTPS, and encrypting the script itself while at rest in the S3 bucket using S3 Server Side Encryption (I used this because the KMS method was throwing an 'Unknown' error).
The Setup
Created an IAM Instance Role that gets assigned to any instance in my ASG upon instantiation, resulting in my AWS creds being baked into the instance as ENV vars
Uploaded and encrypted my Instance-Init.sh script to S3 resulting in a private endpoint like so : https://s3.amazonaws.com/super-secret-bucket/Instance-Init.sh
In The User-Data Field
I input the following into the User Data field when creating the Launch Configuration I want my ASG to use:
#!/bin/bash
apt-get update
apt-get -y install python-pip
apt-get -y install awscli
cd /home/ubuntu
aws s3 cp s3://super-secret-bucket/Instance-Init.sh . --region us-east-1
chmod +x Instance-Init.sh
. Instance-Init.sh
shred -u -z -n 27 Instance-Init.sh
The above does the following:
Updates package lists
Installs pip (Python is required to run aws-cli)
Installs aws-cli
Changes to the /home/ubuntu user directory
Uses the aws-cli to download the Instance-Init.sh file from S3. Due to the IAM Role assigned to my instance, my AWS creds are automagically discovered by aws-cli. The IAM Role also grants my instance the permissions necessary to decrypt the file.
Makes it executable
Runs the script
Deletes the script after it's completed.
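One possible variation on the steps above, offered as a sketch rather than a recommendation: stream the script from S3 into a private temporary file, source it, and remove the file whether or not the script succeeds. The function name run_fetched_script is hypothetical; the bucket and key in the usage line are the ones from the example. It relies on aws s3 cp accepting - as the destination to write to standard output.

```shell
#!/bin/bash
# Hypothetical helper: run the given fetch command, capture its output in a
# temp file readable only by the current user, source that file, and delete
# it afterwards regardless of whether fetching or sourcing failed.
run_fetched_script() {
  local tmp rc=0
  tmp=$(mktemp) || return 1
  chmod 600 "$tmp"
  if "$@" > "$tmp"; then
    # Sourced (not executed) so that exported variables stay in this shell.
    . "$tmp" || rc=$?
  else
    rc=$?
  fi
  rm -f -- "$tmp"
  return "$rc"
}

# Usage in the User Data field:
#   run_fetched_script aws s3 cp s3://super-secret-bucket/Instance-Init.sh - --region us-east-1
```

Note that if the fetched script itself calls exit, the shell ends before the cleanup runs, and the secrets still end up in environment variables either way.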
The Instance-Init.sh Script
The script itself will do stuff like setting env vars and docker run the containers that I need deployed on my instance. Kinda like so:
#!/bin/bash
export MONGO_USER='MyMongoUserName'
export MONGO_PASS='Top-Secret-Dont-Tell-Anyone'
docker login -u <username> -p <password> -e <email>
docker run -e MONGO_USER=${MONGO_USER} -e MONGO_PASS=${MONGO_PASS} --name MyContainerName quay.io/myQuayNameSpace/MyAppName:latest
Very Handy
This creates a very handy way to update User-Data scripts without the need to create a new Launch Config every time you need to make a minor change. And it does a great job of getting env vars out of your codebase and into a narrow, controllable space (the Instance-Init.sh script itself).
But it all feels a little insecure. The idea of putting my master DB creds into a file on S3 is unsettling to say the least.
The Questions
Is this a common practice or am I dreaming up a bad idea here?
Does the fact that the file is downloaded and stored (albeit briefly) on the fresh instance constitute a vulnerability at all?
Is there a better method for deleting the file in a more secure way?
Does it even matter whether the file is deleted after it's run? Considering the secrets are being transferred to env vars it almost seems redundant to delete the Instance-Init.sh file.
Is there something that I'm missing in my nascent days of ops?
Thanks for any help in advance.
What you are describing is almost exactly what we are using to instantiate Docker containers from our registry (we now use v2 self-hosted/private, s3-backed docker-registry instead of Quay) into production. FWIW, I had the same "this feels rickety" feeling that you describe when first treading this path, but after almost a year now of doing it -- and compared to the alternative of storing this sensitive configuration data in a repo or baked into the image -- I'm confident it's one of the better ways of handling this data. Now, that being said, we are currently looking at using Hashicorp's new Vault software for deploying configuration secrets to replace this "shared" encrypted secret shell script container (say that five times fast). We are thinking that Vault will be the equivalent of outsourcing crypto to the open source community (where it belongs), but for configuration storage.
In fewer words, we haven't run across many problems with a very similar situation we've been using for about a year, but we are now looking at using an external open source project (Hashicorp's Vault) to replace our homegrown method. Good luck!
An alternative to Vault is to use credstash, which leverages AWS KMS and DynamoDB to achieve a similar goal.
I actually use credstash to dynamically import sensitive configuration data at container startup via a simple entrypoint script - this way the sensitive data is not exposed via docker inspect or in docker logs etc.
Here's a sample entrypoint script (for a Python application) - the beauty here is you can still pass in credentials via environment variables for non-AWS/dev environments.
#!/bin/bash
set -e
# Activate virtual environment
. /app/venv/bin/activate
# Pull sensitive credentials from AWS credstash if CREDENTIAL_STORE is set with a little help from jq
# AWS_DEFAULT_REGION must also be set
# Note values are Base64 encoded in this example
if [[ -n $CREDENTIAL_STORE ]]; then
  items=$(credstash -t $CREDENTIAL_STORE getall -f json | jq 'to_entries | .[]' -r)
  keys=$(echo $items | jq .key -r)
  for key in $keys
  do
    export $key=$(echo $items | jq 'select(.key=="'$key'") | .value' -r | base64 --decode)
  done
fi
exec "$@"
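The export loop can also be written so that the JSON is parsed only once. This is a sketch under the same assumption as the script above (values are Base64 encoded); export_b64_pairs is a hypothetical helper that reads "KEY BASE64VALUE" lines from stdin.

```shell
#!/bin/bash
# Hypothetical helper: read "KEY BASE64VALUE" lines from stdin and export
# each key with its decoded value. It must read from a redirect, not from a
# pipe, so that the exports happen in the current shell.
export_b64_pairs() {
  local key value
  while read -r key value; do
    [ -n "$key" ] || continue
    export "$key=$(printf '%s' "$value" | base64 --decode)"
  done
}

# Usage inside the entrypoint, replacing the for loop:
#   export_b64_pairs < <(credstash -t "$CREDENTIAL_STORE" getall -f json \
#                        | jq -r 'to_entries[] | "\(.key) \(.value)"')
```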

How to identify Amazon AWS EC2 instance?

Is there a system file that authoritatively tells me if a host is an Amazon AWS EC2 instance?
Bonus point: without installing anything new, is there a command that will tell me some basic parameters of an EC2 instance?
Context:
~~~~~~~~
I have a script that gathers information.
If I run the script on an EC2 instance, some "standard" commands are not available,
for example: dmidecode (this gives me practically nothing), or virt-what.
I am aware of /usr/bin/ec2-describe-instances, or wget for metadata script, or wget
for individual components reported by metadata, but I don't want to install anything
new, and I need the tool to describe the host itself (since the script runs locally),
not to inquire another host, or pass a key that I must obtain from yet another script
that I must install first.
Maybe wget for a specific metadata info is the best tool I could use?
Thanks
I cannot guarantee this will work on any other AMI than Ubuntu 12.04 (I am using ami-8e987ef9) - please test yourself.
Here is what you might want to check out if you really want to avoid getting this info from the magic IP 169.254.169.254 and metadata as @Rico suggested:
### Datasource EC2
ubuntu@ip-10-33-59-70:~$ cat /var/lib/cloud/instance/datasource
cloudinit.DataSourceEc2.DataSourceEc2: DataSourceEc2
ubuntu@ip-10-33-59-70:~$ cat /var/lib/cloud/data/previous-datasource
cloudinit.DataSourceEc2.DataSourceEc2: DataSourceEc2
### Hostname
ubuntu@ip-10-33-59-70:~$ cat /var/lib/cloud/data/previous-hostname
ip-10-33-59-70
### Instance ID
ubuntu@ip-10-33-59-70:~$ cat /var/lib/cloud/data/previous-instance-id
i-280ace69
### Instance ID also (check out instance symlink)
ubuntu@ip-10-33-59-70:/var/lib/cloud$ ls -al | grep instance
lrwxrwxrwx 1 root root 22 Jan 29 22:00 instance -> ./instances/i-280ace69
drwxr-xr-x 3 root root 4096 Jan 29 22:00 instances
Maybe looking around /var/lib/cloud will give you some info that you need without using curl.
I would rather suggest using the magic IP with metadata though.
On Ubuntu 12.04 there's also /usr/bin/ec2metadata, a util written in Python that actually queries 169.254.169.254.
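If you do go the /var/lib/cloud route, the instance symlink shown above can be resolved in a couple of lines. This is only a sketch: cloud_init_instance_id is a hypothetical helper, and the directory layout is assumed to match the Ubuntu 12.04 listing in this answer.

```shell
#!/bin/bash
# Hypothetical helper: print the instance ID that cloud-init recorded, by
# resolving the "instance" symlink inside the given cloud-init directory
# (default /var/lib/cloud). Returns non-zero if the symlink is missing.
cloud_init_instance_id() {
  local dir=${1:-/var/lib/cloud} target
  target=$(readlink "$dir/instance") || return 1
  basename "$target"
}

# Usage:
#   id=$(cloud_init_instance_id) && echo "cloud-init instance: $id"
```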
Just use the standard way of querying the metadata
curl http://169.254.169.254/latest/meta-data/instance-id
For a list of all the metadata options:
curl http://169.254.169.254/latest/meta-data/
If you get anything other than a '200' then it means it's not an EC2 instance.
From the documentation:
[ec2-user ~]$ cat /sys/hypervisor/uuid
For HVM:
[ec2-user ~]$ sudo dmidecode --string system-uuid
[ec2-user ~]$ sudo cat /sys/devices/virtual/dmi/id/product_uuid
If the strings start with "ec2" or "EC2", the OS is running on an EC2 instance.
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/identify_ec2_instances.html
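The prefix check from the documentation is easy to script. A minimal sketch follows; is_ec2_uuid is a hypothetical name, and the check is exactly the "starts with ec2 or EC2" rule quoted above, nothing more.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the given UUID string starts with "ec2"
# or "EC2", which is the rule quoted from the documentation above.
is_ec2_uuid() {
  case $1 in
    ec2*|EC2*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage:
#   if is_ec2_uuid "$(cat /sys/hypervisor/uuid 2>/dev/null)"; then echo "EC2"; fi
```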

How to get the instance Name from the instance in AWS?

I'm trying to set up a means to register an instance in route53 automatically when the instance is created, using salt and this article.
The article uses ec2-metadata to get the instance-id and the hostname. I'm wondering if there is a way, using bash within the instance, to get the instance Name instead. ec2-metadata only seems to show the instance-id. Thanks in advance.
First, you need to get the instance-id.
AWS_INSTANCE_ID=`curl -s http://169.254.169.254/latest/meta-data/instance-id`
Then you can get the EC2 instance name using the command below (REGION must be set to the instance's region).
EC2_NAME=$(aws ec2 describe-tags --region $REGION --filters "Name=resource-id,Values=$AWS_INSTANCE_ID" "Name=key,Values=Name" --output text | cut -f5)
Please ensure that you have the AWS CLI installed.
I hope this helps.
Thanks!
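The cut -f5 step depends on the tab-separated text output of describe-tags. Here is a slightly more defensive sketch of that parsing step; name_from_tags_text is a hypothetical helper, and the column order (TAGS, key, resource ID, resource type, value) is an assumption based on what cut -f5 relies on.

```shell
#!/bin/bash
# Hypothetical helper: read tab-separated `aws ec2 describe-tags --output text`
# lines on stdin and print the value (5th field) of the first tag whose key
# (2nd field) is "Name". Prints nothing if there is no such tag.
name_from_tags_text() {
  awk -F '\t' '$2 == "Name" {print $5; exit}'
}

# Usage:
#   EC2_NAME=$(aws ec2 describe-tags --region "$REGION" \
#                --filters "Name=resource-id,Values=$AWS_INSTANCE_ID" \
#                --output text | name_from_tags_text)
```

Checking the key column means the result stays correct even if the key filter is dropped from the query.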
First and foremost, the Amazon EC2 Instance Metadata Service also provides several other names besides the instance-id, in case one of these is what you are looking for; see Instance Metadata Categories:
hostname - The private hostname of the instance. In cases where multiple network interfaces are present, this refers to the eth0 device (the device for which the device number is 0).
local-hostname - The private DNS hostname of the instance. In cases where multiple network interfaces are present, this refers to the eth0 device (the device for which the device number is 0).
public-hostname - The instance's public DNS. If the instance is in a VPC, this category is only returned if the enableDnsHostnames attribute is set to true.
If you are looking for the Name as exposed in the AWS Management Console though, you would indeed need to resort to using one of the Tools for Amazon Web Services to retrieve it - that Name is in fact just a regular tag with the key Name (see Tagging Your Amazon EC2 Resources), which happens to be used across most AWS services for the obvious purpose.
Here's how to get it with the AWS Command Line Interface for example (skipping region and credentials):
aws ec2 describe-tags \
--filters Name=resource-id,Values=i-abcd1234 Name=key,Values=Name \
--query Tags[].Value --output text
For more advanced CLI JSON output processing than what's possible with the built in --query option, you could resort to jq (a lightweight and flexible command-line JSON processor).
Overthink's answer provides an example based on the now legacy Amazon EC2 API Tools (please note the comments, which correctly point out that you'd nowadays deal with credentials differently, see Tell the CLI Tools Who You Are and IAM Roles for EC2 instances for details).
Not sure what it would look like with bash, but you could use an SDK from the instance itself if you can get the instance id. You would query the ec2 resource and pass in the ec2 instance id. Using the Ruby SDK it would look like:
i = ec2.instances["i-12345678"]
puts i.dns_name
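I found that describe-tags did not work in my configuration; it failed with an 'UnauthorizedOperation' error. I got it working with describe-instances instead (note that this query prints the values of all tags on the instance, not only Name):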
aws ec2 describe-instances --filters Name=instance-id,Values=$(wget -qO- http://instance-data/latest/meta-data/instance-id) --query Reservations[].Instances[].Tags[].Value --output text
This command uses the region and access keys from the [default] section of the current user's AWS config file, ~/.aws/config. If you need to use another user's region or keys (which can be found in the IAM dashboard of the AWS console), you can add them to another section of that file, for example [profile user2] (in ~/.aws/config, named profiles take the profile prefix), and use it in the command like this:
aws --profile user2 ec2 describe-instances --filters Name=instance-id,Values=$(wget -qO- http://instance-data/latest/meta-data/instance-id) --query Reservations[].Instances[].Tags[].Value --output text
Use this command to show which metadata is available
TOKEN=`curl -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600"` \
&& curl -H "X-aws-ec2-metadata-token: $TOKEN" -v http://169.254.169.254/latest/meta-data/
You can append any of the files/folders below to the URL to display the required info
ami-id
ami-launch-index
ami-manifest-path
block-device-mapping/
events/
hostname
iam/
identity-credentials/
instance-action
instance-id
instance-life-cycle
instance-type
local-hostname
local-ipv4
mac
metrics/
network/
placement/
profile
public-keys/
reservation-id
For example, instance-type can be appended to the above command as follows:
TOKEN=`curl -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600"` \
&& curl -H "X-aws-ec2-metadata-token: $TOKEN" -v http://169.254.169.254/latest/meta-data/instance-type
Reference from AWS
The ec2metadata tool is useful for getting information about the EC2 server.
You can use the following:
ec2metadata --instance-id
In bash:
INSTANCE_ID=$(ec2metadata --instance-id)
You can also access other useful information with options like the following:
--ami-id
--ami-launch-index
--ami-manifest-path
--ancestor-ami-ids
--availability-zone
--block-device-mapping
--instance-action
--instance-id
--instance-type
--local-hostname
--local-ipv4
--kernel-id
--mac
--profile
--product-codes
--public-hostname
--public-ipv4
--public-keys
--ramdisk-id
--reservation-id
--security-groups

Describing EC2 instance doesn't return anything

I've started an EC2 instance and installed the ec2-api-tools. Environment variables (JAVA_HOME, EC2_PRIVATE_KEY, EC2_CERT) are set up.
Running ec2-describe-instances doesn't return anything. According to the EC2 command line reference, information on all currently running (and terminated) instances should be returned. What's going wrong?
In general ec2-describe-images -o self -o amazon works, so the EC2 tools are working. Adding explicitly -K and -C parameters to ec2-describe-instances doesn't change the situation.
A little more detail:
You don't need to set the EC2_URL directly. You can use the more friendly command-line option:
--region eu-west-1
(substituting the name of the region you want to address).
This way you don't need to look up the region's URL endpoint.
Here are the EC2 Command Line API Tools general options where this is explained.
If all your instances are in eu-west-1, configure your AWS CLI to use this region by default.
Just type: aws configure
You'll be prompted to enter your credentials, and then you can set the default region.
