I added the following configuration in spark-env:
--configurations '[
  {
    "Classification": "spark-env",
    "Properties": {},
    "Configurations": [
      {
        "Classification": "export",
        "Properties": {
          "MY_VARIABLE": "MY_VARIABLE"
        }
      }
    ]
  }
]'
But if I just do echo $MY_VARIABLE in bash, I can't see it in the terminal.
Basically what I want to do is the following:
schedule the creation of an AWS EMR cluster with AWS Lambda (where I would set all my environment variables such as git credentials)
in the bootstrapping of the machine, install a bunch of things, including git
git clone a repository (so I need to use the credentials stored in the environment variables)
execute some code from this repository
Pass the environment variables as arguments to the bootstrap action.
The reason you can't find MY_VARIABLE using echo is that it is only made available to the Spark environment (spark-env), not to your login shell.
Assuming you are using pyspark: if you open a pyspark shell (whilst ssh'd into one of the nodes of your cluster) and run os.getenv("MY_VARIABLE"), you'll see the value you assigned to that variable.
An alternative solution for your use case: instead of using credentials (which in general is not the preferred way), you could use a set of keys that lets you clone the repo over SSH rather than HTTPS. You can store those keys in AWS SSM and retrieve them in the EMR bootstrap script. An example could be:
bootstrap.sh
# $REDSHIFT_DWH_PUBLIC_KEY holds the name of the SSM parameter; $AUTHORIZED_KEYS is the path to the target authorized_keys file
export SSM_VALUE=$(aws ssm get-parameter --name $REDSHIFT_DWH_PUBLIC_KEY --with-decryption --query 'Parameter.Value' --output text)
echo $SSM_VALUE >> $AUTHORIZED_KEYS
In my case, I needed to connect to a Redshift instance, but this would work nicely also with your use case.
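Adapted to the git use case, a rough sketch of the same idea might look like this (the SSM parameter name, key path, and repository URL are hypothetical placeholders, and it assumes the EMR instance profile is allowed to read and decrypt that parameter):
# One-off, from your workstation: store the private deploy key as a SecureString
aws ssm put-parameter --name "/emr/git-deploy-key" --type SecureString \
    --value "$(cat ~/.ssh/id_rsa_deploy)"

# In the EMR bootstrap script: retrieve the key and clone over SSH
mkdir -p ~/.ssh
aws ssm get-parameter --name "/emr/git-deploy-key" --with-decryption \
    --query 'Parameter.Value' --output text > ~/.ssh/git_deploy_key
chmod 600 ~/.ssh/git_deploy_key
GIT_SSH_COMMAND="ssh -i ~/.ssh/git_deploy_key -o StrictHostKeyChecking=no" \
    git clone git@github.com:my-org/my-repo.git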
Running
aws ecs update-service --cluster cluster-name --service service-name --force-new-deployment --region ap-southeast-2 opens:
{
"service": {
"serviceArn": "arn:aws:ecs:ap-southeast-2:000000000000:service/service-name",
"serviceName": "service-name",
"clusterArn": "arn:aws:ecs:ap-southeast-2:000000000000:cluster/clustername",
"loadBalancers": [
{
"targetGroupArn": "arn:aws:elasticloadbalancing:ap-southeast-2:000000000000:targetgroup/targetgroupname",
"containerName": "containername",
"containerPort": 5000
}
],
"serviceRegistries": [],
"status": "ACTIVE",
"desiredCount": 1,
"runningCount": 1,
"pendingCount": 1,
"launchType": "FARGATE",
"platformVersion": "LATEST",
"taskDefinition": "arn:aws:ecs:ap-southeast-2:000000000000:task-definition/service-name:43",
:
and blocks my shell until I press q. This used to work just fine. I think I updated my AWS CLI and that's what caused this. Why is this happening, and how can I avoid it when updating my service from my CI scripts?
as described at https://docs.aws.amazon.com/cli/latest/userguide/cli-usage-pagination.html:
you can add the --no-cli-pager option to your command to disable the pager for a single command use.
in your specific example:
aws ecs update-service --cluster cluster-name --service service-name --force-new-deployment --region ap-southeast-2 --no-cli-pager
it will still print the output to the command line, but it won't halt and block your script's further commands by opening a pager (less by default, not vim) like before.
this is better than disabling the pager globally, or redirecting the output to /dev/null, since you still get to see the output if you want to (you could also redirect the output to a file, e.g. ./output, if you want to keep it).
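in a CI script you can also combine it with --query so you only capture the part of the response you care about; a small sketch, reusing the placeholder names from the question:
# force a new deployment without opening the pager, and record only the service status
aws ecs update-service \
  --cluster cluster-name \
  --service service-name \
  --force-new-deployment \
  --region ap-southeast-2 \
  --no-cli-pager \
  --query 'service.status' \
  --output text > ./output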
I ran into this behavior and tried a number of things to work around it before finding the AWS documentation on how to use the AWS_PAGER environment variable:
Windows:
C:\> setx AWS_PAGER ""
Linux and MacOS:
export AWS_PAGER=""
Yes, that is the default behavior of AWS CLI v2. This should solve your problem (as I don't think there is a built-in way to make it "quiet" or silent):
aws ecs update-service --service svcName --desired-count 1 > /dev/null
I wouldn't recommend going back to an older version except as a last resort.
I have saved an Azure storage key in Key Vault, and I want to retrieve the key using the Azure CLI and set it as an environment variable in the Windows cmd prompt before I run my Terraform script.
The command listed below doesn't work; can anyone tell me what needs to be changed?
set ARM_ACCESS_KEY=$(az keyvault secret show --name terraform-backend-key --vault-name myKeyVault)
Error on initializing
Main.tf
variable "count" {}
variable "prefix" {
default="RG"
}
terraform {
backend "azurerm" {
container_name = "new"
storage_account_name = "mfarg"
key = "terraform.tfstate"
}}
resource "azurerm_resource_group" "test" {
count ="${var.count}"
name = "${var.prefix}-${count.index}"
location = "West US 2"
}
Command prompt output
To set the environment variable on Windows, I suggest you use PowerShell. In PowerShell, you can just do it like this:
$env:ARM_ACCESS_KEY=$(az keyvault secret show -n terraform-backend-key --vault-name myKeyVault --query value -o tsv)
Also, your original CLI command does not give you the secret directly: without --query value -o tsv it outputs the whole secret object, not just the access key you want. See the difference between the two commands.
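Roughly, the difference looks like this (using the same secret and vault names as in your command):
# Returns the whole secret object as JSON (id, attributes, value, ...):
az keyvault secret show --name terraform-backend-key --vault-name myKeyVault

# Returns only the secret's value as plain text, suitable for an environment variable:
az keyvault secret show --name terraform-backend-key --vault-name myKeyVault --query value -o tsv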
A late answer, but perhaps useful to those who still have the same problem.
This method will work in the Windows Command Prompt (cmd).
REM In a .bat/.cmd file use %%i as below; if you run this interactively at the prompt, use %i instead.
For /f %%i in ('az keyvault secret show --vault-name "Your-KeyVault-Name" --name "Your-Secret-Name" --query "value"') do set "password=%%i"
Now if you just run "echo %password%" you will see your secret value.
Remember that the az command has to be placed between single quotes, like 'az keyvault secret etc'.
USE CASE
I wanted to provision an EFS file system in a given region, along with an EC2 instance, and then mount the EFS on that instance; if EFS isn't available in that region, I need to provision an alternative to EFS.
PROBLEMS
Amazon does not offer the EFS service in every region.
There is no way to identify the availability of the service using the aws-cli. I already asked about this but haven't gotten an answer yet (link).
WHAT I TRIED
This is the case where Amazon provides the service: if I run curl -ls https://elasticfilesystem.us-east-1.amazonaws.com, the system returns the following output.
<MissingAuthenticationTokenException>
<Message>Missing Authentication Token</Message>
</MissingAuthenticationTokenException>
And if I change the region to eu-west-3 (Paris) and run the command, the system returns nothing.
So this is the WORKAROUND I have in mind to check the availability of the service in a particular region. But if I write a bash script for this and run it via the Terraform external data source, it shows an error that it is unable to parse JSON (error parsing '<'). I have no clue why the data source is seeing the returned XML, since I'm only checking the exit code.
bash script
function check_efs() {
  # Discard curl's output (the XML error page) so that only the JSON emitted
  # by produce_output reaches Terraform's external data source on stdout.
  curl -ls https://elasticfilesystem.us-east-1.amazonaws.com > /dev/null 2>&1
  if [ $? -eq 0 ]; then
    output=1
  else
    output=0
  fi
}

function produce_output() {
  value=$output
  jq -n \
    --arg is_efs_exist "$value" \
    '{"is_efs_exist":$is_efs_exist}'
}

check_efs
produce_output
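For reference, running the script on its own (assuming curl and jq are installed) should print a single JSON object on stdout, which is exactly what the external data source expects:
$ bash scripts/efschecker.sh
{"is_efs_exist":"1"}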
main.tf
provider "external" {}

data "external" "this" {
  program = ["bash", "${path.module}/scripts/efschecker.sh"]
}
For example: upon launching my EC2 instance, I would like to automatically run
docker login
so I can pull a private image from dockerhub and run it. To login to dockerhub I need to input a username and password, and this is what I would like to automate but haven't been able to figure out how.
I do know that you can pass in a script to be run on launch via User Data. The issue is that my script expects input, and I would like to automate entering that input.
Thanks in advance!
If just entering a password for docker login is your problem, then I would suggest searching for the manual for docker login. 30 seconds on Google gave me this link:
https://docs.docker.com/engine/reference/commandline/login/
It suggests something of the form
docker login --username foo --password-stdin < ~/my_password.txt
Which will read the password from a file my_password.txt in the current user's home directory.
Seems like the easiest solution for you here is to modify your script to accept command line parameters, and pass those in with the UserData string.
Keep in mind that this will require you to change your launch configs every time your password changes.
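For example, a minimal sketch of that kind of user data (the script path and flag names below are hypothetical placeholders):
#!/bin/bash
# Call an existing setup script with the registry credentials passed as
# command-line arguments instead of prompting for them interactively.
/home/ubuntu/setup.sh --docker-user mydockeruser --docker-pass 'MyS3cretPassw0rd'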
The better solution here is to store your containers in ECS, and let AWS handle the authentication for you (as far as pulling the correct containers from a repo).
Your UserData then turns into something along the lines of:
#!/bin/bash
mkdir -p /etc/ecs
rm -f /etc/ecs/ecs.config # cleans up any old files on this instance
echo ECS_LOGFILE=/log/ecs-agent.log >> /etc/ecs/ecs.config
echo ECS_LOGLEVEL=info >> /etc/ecs/ecs.config
echo ECS_DATADIR=/data >> /etc/ecs/ecs.config
echo ECS_CONTAINER_STOP_TIMEOUT=5m >> /etc/ecs/ecs.config
echo ECS_CLUSTER=<your-cluster-goes-here> >> /etc/ecs/ecs.config
docker pull amazon/amazon-ecs-agent
docker run --name ecs-agent --detach=true --restart=on-failure:10 --volume=/var/run/docker.sock:/var/run/docker.sock --volume=/var/log/ecs/:/log --volume=/var/lib/ecs/data:/data --volume=/sys/fs/cgroup:/sys/fs/cgroup:ro --volume=/var/run/docker/execdriver/native:/var/lib/docker/execdriver/native:ro --publish=127.0.0.1:51678:51678 --env-file=/etc/ecs/ecs.config amazon/amazon-ecs-agent:latest
You may or may not need all the volumes specified above.
This setup lets the AWS ecs-agent handle your container orchestration for you.
Below is what I could suggest at this moment -
Create an S3 bucket, e.g. mybucket.
Put a text file (doc_pass.txt) with your password into that S3 bucket.
Create an IAM policy that has GET access to just that particular S3 bucket and add this policy to the EC2 instance role (a sample policy is sketched further below).
Put the below script in your user data -
aws s3 cp s3://mybucket/doc_pass.txt doc_pass.txt
cat doc_pass.txt | docker login --username=YOUR_USERNAME --password-stdin
This way you just need to keep your S3 bucket secure, and no secrets get displayed in the user data.
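For the policy in step 3, attaching a minimal read-only policy to the instance role could look roughly like this (the role name, policy name, and bucket/key are placeholders):
# Grant the EC2 instance role read access to just the password object
cat > docker-pass-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::mybucket/doc_pass.txt"
    }
  ]
}
EOF

aws iam put-role-policy \
  --role-name my-ec2-instance-role \
  --policy-name docker-pass-read \
  --policy-document file://docker-pass-policy.json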
I have an EC2 ASG on AWS, and I'm interested in storing the shell script that's used to instantiate any given instance in an S3 bucket, then having it downloaded and run upon instantiation. It all feels a little rickety, even though I'm using an IAM instance role, transferring via HTTPS, and encrypting the script at rest in the S3 bucket with S3 server-side encryption (because the KMS method was throwing an 'Unknown' error).
The Setup
Created an IAM Instance Role that gets assigned to any instance in my ASG upon instantiation, resulting in my AWS creds being baked into the instance as ENV vars
Uploaded and encrypted my Instance-Init.sh script to S3, resulting in a private endpoint like so: https://s3.amazonaws.com/super-secret-bucket/Instance-Init.sh
In The User-Data Field
I input the following into the User Data field when creating the Launch Configuration I want my ASG to use:
#!/bin/bash
apt-get update
apt-get -y install python-pip
apt-get -y install awscli
cd /home/ubuntu
aws s3 cp s3://super-secret-bucket/Instance-Init.sh . --region us-east-1
chmod +x Instance-Init.sh
. Instance-Init.sh
shred -u -z -n 27 Instance-Init.sh
The above does the following:
Updates package lists
Installs Python (required to run aws-cli)
Installs aws-cli
Changes to the /home/ubuntu user directory
Uses the aws-cli to download the Instance-Init.sh file from S3. Due to the IAM Role assigned to my instance, my AWS creds are automagically discovered by aws-cli. The IAM Role also grants my instance the permissions necessary to decrypt the file.
Makes it executable
Runs the script
Deletes the script after it's completed.
The Instance-Init.sh Script
The script itself will do stuff like setting env vars and docker run the containers that I need deployed on my instance. Kinda like so:
#!/bin/bash
export MONGO_USER='MyMongoUserName'
export MONGO_PASS='Top-Secret-Dont-Tell-Anyone'
docker login -u <username> -p <password> -e <email>
docker run -e MONGO_USER=${MONGO_USER} -e MONGO_PASS=${MONGO_PASS} --name MyContainerName quay.io/myQuayNameSpace/MyAppName:latest
Very Handy
This creates a very handy way to update User-Data scripts without the need to create a new Launch Config every time you need to make a minor change. And it does a great job of getting env vars out of your codebase and into a narrow, controllable space (the Instance-Init.sh script itself).
But it all feels a little insecure. The idea of putting my master DB creds into a file on S3 is unsettling to say the least.
The Questions
Is this a common practice or am I dreaming up a bad idea here?
Does the fact that the file is downloaded and stored (albeit briefly) on the fresh instance constitute a vulnerability at all?
Is there a better method for deleting the file in a more secure way?
Does it even matter whether the file is deleted after it's run? Considering the secrets are being transferred to env vars it almost seems redundant to delete the Instance-Init.sh file.
Is there something that I'm missing in my nascent days of ops?
Thanks for any help in advance.
What you are describing is almost exactly what we are using to instantiate Docker containers from our registry (we now use v2 self-hosted/private, s3-backed docker-registry instead of Quay) into production. FWIW, I had the same "this feels rickety" feeling that you describe when first treading this path, but after almost a year now of doing it -- and compared to the alternative of storing this sensitive configuration data in a repo or baked into the image -- I'm confident it's one of the better ways of handling this data. Now, that being said, we are currently looking at using Hashicorp's new Vault software for deploying configuration secrets to replace this "shared" encrypted secret shell script container (say that five times fast). We are thinking that Vault will be the equivalent of outsourcing crypto to the open source community (where it belongs), but for configuration storage.
In fewer words, we haven't run across many problems with a very similar situation we've been using for about a year, but we are now looking at using an external open source project (Hashicorp's Vault) to replace our homegrown method. Good luck!
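For illustration only, pulling a single secret with the Vault CLI in an init script might look roughly like this; the secret path and field name are hypothetical, and it assumes VAULT_ADDR and VAULT_TOKEN are already available on the instance:
#!/bin/bash
# Hypothetical sketch: read one field from Vault's KV store at boot and export it,
# instead of baking the value into an S3-hosted init script.
export MONGO_PASS=$(vault kv get -field=mongo_pass secret/myapp)
docker run -e MONGO_PASS="${MONGO_PASS}" --name MyContainerName quay.io/myQuayNameSpace/MyAppName:latest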
An alternative to Vault is to use credstash, which leverages AWS KMS and DynamoDB to achieve a similar goal.
I actually use credstash to dynamically import sensitive configuration data at container startup via a simple entrypoint script - this way the sensitive data is not exposed via docker inspect or in docker logs etc.
Here's a sample entrypoint script (for a Python application) - the beauty here is you can still pass in credentials via environment variables for non-AWS/dev environments.
#!/bin/bash
set -e
# Activate virtual environment
. /app/venv/bin/activate
# Pull sensitive credentials from AWS credstash if CREDENTIAL_STORE is set with a little help from jq
# AWS_DEFAULT_REGION must also be set
# Note values are Base64 encoded in this example
if [[ -n $CREDENTIAL_STORE ]]; then
items=$(credstash -t $CREDENTIAL_STORE getall -f json | jq 'to_entries | .[]' -r)
keys=$(echo $items | jq .key -r)
for key in $keys
do
export $key=$(echo $items | jq 'select(.key=="'$key'") | .value' -r | base64 --decode)
done
fi
exec "$@"
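For example, the container could then be started like this (the table name, region, and image are placeholders); AWS credentials come from the instance or task role, so only the table name and region are passed in, not the secrets themselves:
docker run \
  -e CREDENTIAL_STORE=my-credential-table \
  -e AWS_DEFAULT_REGION=us-east-1 \
  myorg/myapp:latest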