AWS Spark Cluster setup errors - hadoop

I have created an AWS keypair.
I am following the instructions here word for word: https://aws.amazon.com/articles/4926593393724923
When I type in "aws emr create-cluster --name SparkCluster --ami-version 3.2 --instance-type m3.xlarge --instance-count 3 --ec2-attributes KeyName=MYKEY --applications Name=Hive --bootstrap-actions Path=s3://support.elasticmapreduce/spark/install-spark"
replacing MYKEY with both the full path and just the name of my key pair (I've tried everything), I get the following error:
`A client error (InvalidSignatureException) occurred when calling the RunJobFlow operation: The request signature we calculated does not match the signature you provided. Check your AWS Secret Access Key and signing method. Consult the service documentation for details.
The Canonical String for this request should have been
'POST
/
content-type:application/x-amz-json-1.1
host:elasticmapreduce.us-east-1.amazonaws.com
user-agent:aws-cli/1.7.5 Python/2.7.8 Darwin/14.1.0
x-amz-date:20150210T180927Z
x-amz-target:ElasticMapReduce.RunJobFlow
content-type;host;user-agent;x-amz-date;x-amz-target
dbb58908194fa8deb722fdf65ccd713807257deac18087025cec9a5e0d73c572'
The String-to-Sign should have been
'AWS4-HMAC-SHA256
20150210T180927Z
20150210/us-east-1/elasticmapreduce/aws4_request
c83894ad3b43c0657dac2c3ab7f53d384b956087bd18a3113873fceeabc4ae26'`
What am I doing wrong?

GOT IT. Sadly, the page above says nothing about having to set the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY. You must do this first. I learned that from a completely different setup guide: http://spark.apache.org/docs/1.2.0/ec2-scripts.html.
Once I set them, the Amazon instructions worked.
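A quick pre-flight check along these lines can catch missing variables before the CLI is called. It is only a sketch: the helper name missing_aws_credentials is made up for illustration, the variable names are the ones the AWS CLI documents (AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY), and it looks only at environment variables even though the CLI can also read credentials from ~/.aws/credentials.

```python
import os

# Environment variables the AWS CLI reads for static credentials.
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_aws_credentials(env=None):
    """Return the names of credential variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_aws_credentials()
    if missing:
        print("Set these before running aws emr create-cluster:", ", ".join(missing))
    else:
        print("Credential environment variables are set.")
```

An empty list means both variables are present; it does not prove that the key pair itself is valid.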


Specifying MSK credentials in an AWS CDK stack

I have code that seems to "almost" deploy. It will fail with the following error:
10:55:25 AM | CREATE_FAILED | AWS::Lambda::EventSourceMapping | QFDSKafkaEventSour...iltynotifyEFE73996
Resource handler returned message: "Invalid request provided: The secret provided in 'sourceAccessConfigurations' is not associated with cluster some-valid-an. Please provide a secret associated with the cluster. (Service: Lambda, Status Code: 400, Request ID: some-uuid )" (RequestToken: some-uuid, HandlerErrorCode: InvalidRequest)
I've cobbled together the cdk stack from multiple tutorials, trying to learn CDK. I've gotten it to the point that I can deploy a lambda, specify one (or more) layers for the lambda, and even specify any of several different sources for triggers. But our production Kafka requires credentials... and I can't figure out for the life of me how to supply those so that this will deploy correctly.
Obviously, those credentials shouldn't be included in the git repo of my codebase. I assume I will have to set up a Secrets Manager secret with part or all of the values. We're using scram-sha-512, and it includes a user/pass pair. The 'secret_name' value to Secret() is probably the name/path of the Secrets Manager secret. I have no idea what the second, unnamed param is for, and I'm having trouble figuring that out. Can anyone point me in the right direction?
Stack code follows:
#!/usr/bin/env python3
from aws_cdk import (
    aws_lambda as lambda_,
    App, Duration, Stack
)
from aws_cdk.aws_lambda_event_sources import ManagedKafkaEventSource
from aws_cdk.aws_secretsmanager import Secret
class ExternalRestEndpoint(Stack):
    def __init__(self, app: App, id: str) -> None:
        super().__init__(app, id)
        secret = Secret(self, "Secret", secret_name="integrations/msk/creds")
        msk_arn = "some valid and confirmed arn"
        # Lambda layer.
        lambdaLayer = lambda_.LayerVersion(self, 'lambda-layer',
            code=lambda_.AssetCode('utils/lambda-deployment-packages/lambda-layer.zip'),
            compatible_runtimes=[lambda_.Runtime.PYTHON_3_7],
        )
        # Source for the lambda.
        with open("src/path/to/sourcefile.py", encoding="utf8") as fp:
            mysource_code = fp.read()
        # Config for it.
        lambdaFn = lambda_.Function(
            self, "QFDS",
            code=lambda_.InlineCode(mysource_code),
            handler="lambda_handler",
            timeout=Duration.seconds(300),
            runtime=lambda_.Runtime.PYTHON_3_7,
            layers=[lambdaLayer],
        )
        # Set up the event (managed Kafka).
        lambdaFn.add_event_source(ManagedKafkaEventSource(
            cluster_arn=msk_arn,
            topic="foreign.endpoint.availabilty.notify",
            secret=secret,
            batch_size=100,  # default
            starting_position=lambda_.StartingPosition.TRIM_HORIZON
        ))
Looking at the code sample, I understand that you are using Amazon MSK as the event source, not self-managed (cross-account) Kafka.
I assume I will have to set up a Secrets Manager secret with part or all of the values
You don't need to set up new credentials. If you use MSK with SASL/SCRAM, you already have credentials, which must be associated with the MSK cluster.
As you can see from the doc, your secret name must start with AmazonMSK_, for example AmazonMSK_LambdaSecret.
So, in the code above, you will need to fix this line:
secret = Secret(self, "Secret", secret_name="AmazonMSK_LambdaSecret")
I assume you are already aware of the CDK Python docs, but I'll mention them here for reference.
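As a small guard, a check like the following could run before cdk deploy to catch a secret name that does not follow the AmazonMSK_ naming rule. This is a sketch only: is_msk_scram_secret_name is a hypothetical helper, and passing the check does not prove the secret is actually associated with the cluster, which is what the Lambda error complains about.

```python
MSK_SECRET_PREFIX = "AmazonMSK_"


def is_msk_scram_secret_name(secret_name: str) -> bool:
    """Check the naming rule for secrets used with MSK SASL/SCRAM."""
    # A Secrets Manager ARN ends with ":secret:<name>-<random suffix>", so
    # take the text after the last ":" to accept either a name or an ARN.
    name = secret_name.rsplit(":", 1)[-1]
    return name.startswith(MSK_SECRET_PREFIX)


print(is_msk_scram_secret_name("integrations/msk/creds"))
print(is_msk_scram_secret_name("AmazonMSK_LambdaSecret"))
```

The name from the question fails the check, while the name suggested in the answer passes it.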

Unable to create Azure-keyvault-backed secret scope on Azure Databricks

I am not able to create a secret scope on Azure Databricks from the Databricks CLI. I run a command like this:
databricks secrets "create-scope" --scope "edap-dev-kv" --scope-backend-type AZURE_KEYVAULT --resource-id "/subscriptions/ba426b6f-65cb-xxxx-xxxx-9a1e1656xxxx/resourceGroups/edap-dev-rg/providers/Microsoft.KeyVault/vaults/edap-dev-kv" --profile profile_edap_dev2_dbx --dns-name "https://edap-dev-kv.vault.azure.net/"
I get this error message:
Error: b'<html>\n<head>\n<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>\n<title>
Error 400 io.jsonwebtoken.IncorrectClaimException:
Expected aud claim to be: 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d, but was: https://management.core.windows.net/.
</title>\n</head>\n<body><h2>HTTP ERROR 400</h2>\n<p>
Problem accessing /api/2.0/secrets/scopes/create.
Reason:\n<pre> io.jsonwebtoken.IncorrectClaimException:
Expected aud claim to be: 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d,
but was: https://management.core.windows.net/.</pre></p>\n</body>\n</html>\n'
I have tried doing it with both a user's (personal) and a service principal's AAD token. (I've read somewhere that it should be an AAD token of a user account.)
I am able to do it in the GUI using the same parameters.
In your case, the AAD token was issued for the wrong service: its audience is https://management.core.windows.net/, but it must be issued for the resource ID of Azure Databricks, 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d.
The simplest way to do that is to use the Azure CLI with the following command:
az account get-access-token -o tsv --query accessToken \
--resource 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d
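To confirm which service a token was issued for before handing it to the Databricks CLI, the aud claim can be read straight out of the JWT. The sketch below assumes the token is a standard three-part JWT with a string aud claim. The helper names token_audience and is_databricks_token are illustrative, and the signature is deliberately not verified, so this is a diagnostic and not a security check.

```python
import base64
import json

# Resource ID of Azure Databricks, as quoted in the error message above.
DATABRICKS_RESOURCE_ID = "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d"


def token_audience(jwt_token: str) -> str:
    """Return the 'aud' claim of a JWT without verifying its signature."""
    payload_b64 = jwt_token.split(".")[1]
    # JWT segments are base64url-encoded with the padding stripped; restore it.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    return claims["aud"]


def is_databricks_token(jwt_token: str) -> bool:
    """True if the token's audience is the Azure Databricks resource ID."""
    return token_audience(jwt_token) == DATABRICKS_RESOURCE_ID
```

A token obtained without the --resource argument would be expected to report https://management.core.windows.net/ here, matching the error in the question.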

How to write a policy in .yaml for a python lambda to read from S3 using the aws sam cli

I am trying to deploy a python lambda to aws. This lambda just reads files from s3 buckets when given a bucket name and file path. It works correctly on the local machine if I run the following command:
sam build && sam local invoke --event testfile.json GetFileFromBucketFunction
The data from the file is printed to the console. Next, if I run the following command, the lambda is packaged and sent to my-bucket.
sam build && sam package --s3-bucket my-bucket --template-file .aws-sam\build\template.yaml --output-template-file packaged.yaml
The next step is to deploy in prod so I try the following command:
sam deploy --template-file packaged.yaml --stack-name getfilefrombucket --capabilities CAPABILITY_IAM --region my-region
The lambda can now be seen in the Lambda console. I can run it, but no contents are returned. If I manually change the service role to one that allows s3 get/put, then the lambda works. However, this undermines the whole point of using the aws sam cli.
I think I need to add a policy to the template.yaml file. This link here seems to say that I should add a policy such as one shown here. So, I added:
Policies: S3CrudPolicy
under 'Resources:GetFileFromBucketFunction:Properties:'. I then rebuilt the app and re-deployed, and the deployment failed with the following errors in CloudFormation:
1 validation error detected: Value 'S3CrudPolicy' at 'policyArn' failed to satisfy constraint: Member must have length greater than or equal to 20 (Service: AmazonIdentityManagement; Status Code: 400; Error Code: ValidationError; Request ID: unique number
and
The following resource(s) failed to create: [GetFileFromBucketFunctionRole]. . Rollback requested by user.
I deleted the stack to start again. My thought was that 'S3CrudPolicy' is not an off-the-shelf policy that I can just use, but something I would have to define myself in the template.yaml file?
I'm not sure how to do this, and the docs don't seem to show any very simple use-case examples (from what I can see). If anyone knows how to do this, could you post a solution?
I tried the following:
S3CrudPolicy:
  PolicyDocument:
    -
      Action: "s3:GetObject"
      Effect: Allow
      Resource: !Sub arn:aws:s3:::${cloudtrailBucket}
      Principal: "*"
But it failed with the following error:
Failed to create the changeset: Waiter ChangeSetCreateComplete failed: Waiter encountered a terminal failure state Status: FAILED. Reason: Invalid template property or properties [S3CrudPolicy]
If anyone can help write a simple policy to read/write from s3, that would be amazing. I'll need to write another one to get lambdas to invoke other lambdas as well, so a solution here (I imagine something similar?) would be great. Or a decent, easy-to-use guide on how to write these policy statements?
Many thanks for your help!
Found it!! In case anyone else struggles with this you need to add the following few lines to Resources:YourFunction:Properties in the template.yaml file:
Policies:
  - S3CrudPolicy:
      BucketName: "*"
The "*" will allow your lambda to talk to any bucket; you could switch it for something specific if required. If you leave out 'BucketName', it doesn't work, and CloudFormation returns an error saying that S3CrudPolicy is invalid.
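For the narrower read-only case from the question, a hand-written policy document would look roughly like the one this sketch builds. The helper s3_read_only_policy and the bucket name my-bucket are placeholders. The points it illustrates are that s3:GetObject applies to object ARNs (bucket/*) and not to the bucket ARN itself, and that an identity policy attached to the function's role has no Principal element.

```python
import json


def s3_read_only_policy(bucket_name: str) -> dict:
    """Build an IAM policy document that allows reading objects from one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                # Object-level actions need the object ARN pattern, not the bucket ARN.
                "Resource": [f"arn:aws:s3:::{bucket_name}/*"],
            }
        ],
    }


# Print the document as JSON for inspection.
print(json.dumps(s3_read_only_policy("my-bucket"), indent=2))
```

Compared with the failed attempt in the question, the Resource gains the /* suffix and the Principal key is dropped.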

How to download AWS Lambda Layer

Using the AWS CLI is it possible to download a Lambda Layer?
I have seen this documented command.
https://docs.aws.amazon.com/lambda/latest/dg/API_GetLayerVersion.html
But when I try to run it with something like below.
aws lambda get-layer-version --layer-name arn:aws:lambda:us-east-1:209497400698:layer:php-73 --version-number 7
I get this error.
An error occurred (InvalidParameterValueException) when calling the
GetLayerVersion operation: Invalid Layer name:
arn:aws:lambda:us-east-1:209497400698:layer:php-73
Is downloading a layer possible via the CLI?
As an extra note I am trying to download any of these layers
https://runtimes.bref.sh/
It should be possible to download a layer programmatically using the AWS CLI. For example
# https://docs.aws.amazon.com/cli/latest/reference/lambda/get-layer-version.html
URL=$(aws lambda get-layer-version --layer-name YOUR_LAYER_NAME_HERE --version-number YOUR_LAYERS_VERSION --query Content.Location --output text)
curl $URL -o layer.zip
For the ARNs on that web page, I had to use the other API, which takes a full layer version ARN. For example:
# https://docs.aws.amazon.com/cli/latest/reference/lambda/get-layer-version-by-arn.html
URL=$(aws lambda get-layer-version-by-arn --arn arn:aws:lambda:us-east-1:209497400698:layer:php-73:7 --query Content.Location --output text)
curl $URL -o php.zip
HTH
-James
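Note that the ARN passed to get-layer-version-by-arn ends in the version number (:7), whereas the ARN in the question stops at the layer name. A small helper like this one could split a full layer version ARN into its two parts. It is a sketch: split_layer_version_arn is a made-up name, and the ARN layout it assumes is the one seen in the examples above.

```python
def split_layer_version_arn(arn: str):
    """Split a layer version ARN into (layer_arn, version_number)."""
    parts = arn.split(":")
    # Expected layout: arn:aws:lambda:<region>:<account>:layer:<name>:<version>
    if len(parts) != 8 or parts[2] != "lambda" or parts[5] != "layer":
        raise ValueError(f"not a layer version ARN: {arn}")
    return ":".join(parts[:7]), int(parts[7])


layer_arn, version = split_layer_version_arn(
    "arn:aws:lambda:us-east-1:209497400698:layer:php-73:7"
)
print(layer_arn, version)
```

An ARN without the trailing version raises ValueError, which is a hint that the version still has to be supplied.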

I have code which runs in Lambda but not in Python

I have code that runs in Lambda, but the same code does not work on my system.
import boto3
asgName="test"
def lambda_handler(event, context):
    client = boto3.client('autoscaling')
    asgName="test"
    response = client.describe_auto_scaling_groups(AutoScalingGroupNames=[asgName])
    if not response['AutoScalingGroups']:
        return 'No such ASG'
    ...
    ...
    ...
When I run the code below on Linux, it reports the error "No such ASG":
import boto3
asgName="test"
client = boto3.client('autoscaling')
response = client.describe_auto_scaling_groups(AutoScalingGroupNames=[asgName])
if not response['AutoScalingGroups']:
    print('No such ASG')
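The first thing to check is that you are connecting to the correct AWS region. If you do not pass one in code, boto3 uses whatever region is configured on that machine (for example the AWS_DEFAULT_REGION environment variable or the ~/.aws/config file), which may differ from the region your Lambda function runs in.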
In your code, you can specify the region with:
client = boto3.client('autoscaling', region_name = 'us-west-2')
The next thing to check is that the credentials are associated with the correct account. The AWS Lambda function is obviously running in your desired account, but you should confirm that the code running "in linux" is using the same AWS account.
You can do this by using the AWS Command-Line Interface (CLI), which will use the same credentials as your Python code on the Linux computer. Run:
aws autoscaling describe-auto-scaling-groups --auto-scaling-group-names test
It should give the same result as the Python code running on that computer.
You might need to specify the region:
aws autoscaling describe-auto-scaling-groups --auto-scaling-group-names test --region us-west-2
(Of course, change your region as appropriate.)
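To compare what each environment would use when region_name is omitted, a rough stand-in for the lookup can help. This sketch is a simplification and not boto3's actual resolution logic: configured_region is a hypothetical helper that considers only the AWS_DEFAULT_REGION environment variable and the region key of a profile in ~/.aws/config-style text.

```python
import configparser


def configured_region(env, config_text="", profile="default"):
    """Return the region found in env or config text, or None if neither sets one."""
    region = env.get("AWS_DEFAULT_REGION")
    if region:
        return region
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # In ~/.aws/config, non-default profiles use sections named "profile <name>".
    section = "default" if profile == "default" else f"profile {profile}"
    if parser.has_option(section, "region"):
        return parser.get(section, "region")
    return None


sample_config = "[default]\nregion = us-west-2\n"
print(configured_region({}, sample_config))
```

Running it with os.environ and the contents of ~/.aws/config on the Linux machine shows whether a region is configured there at all.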
