Handle credentials in a kubeflow's ContainerOp - continuous-integration

I'm trying to run a kubeflow pipeline setup and I have several environements (dev, staging, prod).
In my pipeline I'm using kfp.components.func_to_container_op to get a pipeline task instance (ContainerOp), and then execute it with the appropriate arguments that allows it to integrate with my s3 bucket:
from utils.test import test
test_op = comp.func_to_container_op(test, base_image='my_image')
read_data_task = read_data_op(
bucket,
aws_key,
aws_pass,
)
arguments = {
'bucket': 's3',
'aws_key': 'key',
'aws_pass': 'pass',
}
kfp.Client().create_run_from_pipeline_func(pipeline, arguments=arguments)
Each one of the environments is using different credentials to connect to it and those credentials are being passed in the function:
def test(s3_bucket: str, aws_key: str, aws_pass: str):
....
s3_client = boto3.client('s3', aws_access_key_id=aws_key, aws_secret_access_key=aws_pass)
s3_client.upload_file(from_filename, bucket_name, to_filename)
so for each environment I need to update the arguments to contain the correct credentials and it makes it very hard to maintain since each time that I want to update from dev to stg to prod I can't simply copy the code.
My question is what is the best approach to pass those credentials?

Ideally you should push any env-specific configurations as close to the cluster as possible (as far away from components).
You can create Kubernetes secret in each environemnt with different creadentials. Then use that AWS secret in each task:
from kfp import aws
def my_pipeline():
...
conf = kfp.dsl.get_pipeline_conf()
conf.add_op_transformer(aws.use_aws_secret('aws-secret', 'AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY'))
Maybe boto3 can auto-load the credentials using the secret files and the environment variables.
At least all GCP libraries and utilities do that with GCP credentials.
P.S. It's better to create issues in the official repo: https://github.com/kubeflow/pipelines/issues

Related

How to download data with minio uri

When I run simple kubeflow pipeline on minikube like first pod's output is second pod's input, data seems to be saved in minio. But I did not intend to use it. So to check the output data in minino, I got access to http://localhost:9000/ . Then I reached to login page.
When I run kubectl get secrets to find secret information, I could not find any minio secrets. Also minioadmin and minioadmin for Access Key and Secret Key did not work. How can I fetch data from minio uri?
I define the pipeline like this;
import kfp
import kfp.components as comp
from kfp.components import load_component_from_file
example_component1_op = load_component_from_file("./pipelines/components/example_component1/example_component1.yaml")
example_component2_op = load_component_from_file("./pipelines/components/example_component2/example_component2.yaml")
#kfp.dsl.pipeline(name='example_pipeline_20220820')
def example_pipeline():
example_component1_task = example_component1_op(
input_1='/app/input.txt',
input_2=8,
)
example_component2_task = example_component2_op(
input_1=example_component1_task.outputs['output_1'],
input_2=5,
)
I found the Access Key and the Secret Key.
Access Key: minio
Secret Key: minio123
Ref
https://github.com/kubeflow/pipelines/blob/master/developer_guide.md

Serverless stage environment variables using dotenv (.env)

I'm new to serverless,
So far I was be able to deploy and use .env for the app.
then, under provider in stage property in serverless.yml file, I change it to different stage. I also made new.env.{stage}.
after re-deploy using sls deploy, It still reads the default .env file.
the documentation states:
The framework looks for .env and .env.{stage} files in service directory and then tries to load them using dotenv. If .env.{stage} is found, .env will not be loaded. If stage is not explicitly defined, it defaults to dev.
So, I still don't understand "If stage is not explicitly defined, it defaults to dev". How to explicitly define it?
The dotenv File is choosen based on your stage property configuration. You need to explicitly define the stage property in your serverless.yaml or set it within your deployment command.
This will use the .env.dev file
useDotenv: true
provider:
name: aws
stage: dev # dev [default], stage, prod
memorySize: 3008
timeout: 30
Or you set the stage property via deploy command.
This will use the .env.prod file
sls deploy --stage prod
In your serverless.yml you need to define the stage property inside the provider object.
Example:
provider:
name: aws
[...]
stage: prod
As Feb 2023 I'm going to attempt to give my solution. I'm using the Nx tootling for monorepo (this shouldn't matter but just in case) and I'm using the serverless.ts instead.
I see the purpose of this to be to enhance the developer experience in the sense that it is nice to just nx run users:serve --stage=test (in my case using Nx) or sls offline --stage=test and serverless to be able to load the appropriate variables for that specific environment.
Some people went the route of using several .env.<stage> per environment. I tried to go this route but because I'm not that good of a developer I couldn't make it work. The approach that worked for the was to concatenate variable names inside the serverless.ts. Let me explain...
I'm using just one .env file instead but changing variable names based on the --stage. The magic is happening in the serverless.ts
// .env
STAGE_development=test
DB_NAME_development=mycraftypal
DB_USER_development=postgres
DB_PASSWORD_development=abcde1234
DB_PORT_development=5432
READER_development=localhost // this could be aws rds uri per db instances
WRITER_development=localhost // this could be aws rds uri per db instances
# TEST
STAGE_test=test
DB_NAME_test=mycraftypal
DB_USER_test=postgres
DB_PASSWORD_test=abcde1234
DB_PORT_test=5433
READER_test=localhost // this could be aws rds uri per db instances
WRITER_test=localhost // this could be aws rds uri per db instances
// serverless.base.ts or serverless.ts based on your configuration
...
useDotenv: true, // this property is at the root level
...
provider: {
...
stage: '${opt:stage, "development"}', // get the --stage flag value or default to development
...,
environment: {
STAGE: '${env:STAGE_${self:provider.stage}}}',
DB_NAME: '${env:DB_NAME_${self:provider.stage}}',
DB_USER: '${env:DB_USER_${self:provider.stage}}',
DB_PASSWORD: '${env:DB_PASSWORD_${self:provider.stage}}',
READER: '${env:READER_${self:provider.stage}}',
WRITER: '${env:WRITER_${self:provider.stage}}',
DB_PORT: '${env:DB_PORT_${self:provider.stage}}',
AWS_NODEJS_CONNECTION_REUSE_ENABLED: '1',
}
...
}
When one is utilizing the useDotenv: true, serverless loads your variables from the .env and puts them in the env variable so you can access them env:STAGE.
Now I can access the variable with dynamic stage like so ${env:DB_PORT_${self:provider.stage}}. If you look at the .env file each variable has the ..._<stage> at the end. In this way I can retrieve dynamically each value.
I'm still figuring it out since I don't want to have the word production in my url but still get the values dynamically and since I'm concatenating this value ${env:DB_PORT_${self:provider.stage}}... then the actual variable becomes DB_PORT_ instead of DB_PORT.

AWS CDK Cross Account Lambda Deployment Permission Issue

I followed the following tutorial to create a Lambda deploy pipeline using CDK. When I try to keep everything in the same account it works well.
https://docs.aws.amazon.com/cdk/latest/guide/codepipeline_example.html
But my scenario is slightly different from the example because it involves two AWS accounts instead one. I maintain application source code and pipeline
in the OPS account and this pipeline will deploy the Lambda application to the UAT account.
OPS Account (12345678) - CodeCommit repo & CodePipeline
UAT Account (87654321) - Lambda application
As per the aws following aws documentation (Cross-account actions section) I made the following changes to source code.
https://docs.aws.amazon.com/cdk/api/latest/docs/aws-codepipeline-actions-readme.html
Lambda stack expose deploy action role as follows
export class LambdaStack extends cdk.Stack {
public readonly deployActionRole: iam.Role;
constructor(scope: cdk.Construct, id: string, props?: cdk.StackProps) {
...
this.deployActionRole = new iam.Role(this, 'ActionRole', {
assumedBy: new iam.AccountPrincipal('12345678'), //pipeline account
// the role has to have a physical name set
roleName: 'DeployActionRole',
});
}
}
In the pipeline stack,
new codePipeline.Pipeline(this, 'MicroServicePipeline', {
pipelineName: 'MicroServicePipeline',
stages: [
{
stageName: 'Deploy',
actions: [
new codePipelineAction.CloudFormationCreateUpdateStackAction({
role: props.deployActionRole,
....
})
]
}
]
});
Following is how I initiate stacks
const app = new cdk.App();
const opsEnv: cdk.Environment = {account: '12345678', region: 'ap-southeast-2'};
const uatEnv: cdk.Environment = {account: '87654321', region: 'ap-southeast-2'};
const lambdaStack = new LambdaStack(app, 'LambdaStack', {env: uatEnv});
const lambdaCode = lambdaStack.lambdaCode;
const deployActionRole = lambdaStack.deployActionRole;
new MicroServicePipelineStack(app, 'MicroServicePipelineStack', {
env: opsEnv,
stackName: 'MicroServicePipelineStack',
lambdaCode,
deployActionRole
});
app.synth();
AWS credentials profiles looks liks
[profile uatadmin]
role_arn=arn:aws:iam::87654321:role/PigletUatAdminRole
source_profile=opsadmin
region=ap-southeast-2
when I run cdk diff or deploy I get an error saying,
➜ infra git:(master) ✗ cdk diff MicroServicePipelineStack --profile uatadmin
Including dependency stacks: LambdaStack
Stack LambdaStack
Need to perform AWS calls for account 87654321, but no credentials have been configured.
What have I done wrong here? Is it my CDK code or is it the way I have configured my AWS profile?
Thanks,
Kasun
The problem is with your AWS CLI configuration. You cannot use the CDK CLI natively to deploy resources in two separate accounts with one CLI command. There is a recent blog post on how to tell CDK which credentials to use, depending on the stack environment parameter:
https://aws.amazon.com/blogs/devops/cdk-credential-plugin/
The way we use it is to deploy stacks into separate accounts with multiple CLI commands specifying the required profile. All parameters that need to be exchanged (such as the location of your lambdaCode) is passed via e.g. environment variables.
Just try to use using the environment variables:
AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
AWS_DEFAULT_REGION
https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-envvars.html
Or
~/.aws/credentials
[default]
aws_access_key_id=****
aws_secret_access_key=****
~/.aws/config
[default]
region=us-west-2
output=json
https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html
It works for me.
I'm using cdk version 1.57.0
The issue is in the fact that you have resources that exist in multiple accounts and hence there are different credentials required to create those resources. However, CDK does not understand natively how to get credentials for those different accounts or when to swap between the different credentials. One way to fix this is to use cdk-assume-role-credential-plugin, which will allow you to use a single CDK deploy command to deploy to many different accounts.
I wrote a detailed tutorial here: https://johntipper.org/aws-cdk-cross-account-deployments-with-cdk-pipelines-and-cdk-assume-role-credential-plugin/

Aws Ruby SDK credentials from file

I would like to store my credentials in ~/.aws/credentials and not in environmental variables, but I am struggling.
To load the credentials I use (from here)
credentials = Aws::SharedCredentials.new({region: 'myregion', profile_name: 'myprofile'})
My ~/.aws/credentials is
[myprofile]
AWS_ACCESS_KEY = XXXXXXXXXXXXXXXXXXX
AWS_SECRET_KEY = YYYYYYYYYYYYYYYYYYYYYYYYYYY
My ~/.aws/config is
[myprofile]
output = json
region = myregion
I then define a resource with
aws = Aws::EC2::Resource.new(region: 'eu....', credentials: credentials)
but if I try for example
aws.instances.first
I get the error Error: #<Aws::Errors::MissingCredentialsError: unable to sign request without credentials set>
Everything works if I hard code the keys
According to the source code aws loads credentials automatically only from ENV.
You can create credentials with custom attributes.
credentials = Aws::Credentials.new(AWS_ACCESS_KEY, AWS_SECRET_KEY)
aws = Aws::EC2::Resource.new(region: 'eu-central-1', credentials: credentials)
In your specific case, it seems there is no way to pass custom credentials to SharedCredentials.
If you just do
credentials = Aws::SharedCredentials.new()
it loads the default profile. You should be able to load myprofile by passing in :profile_name as an option.
I don't know if you can also override the region though. You might want to try to loose that option, see how it works.

How can I get the name of an automatically-created Heroku instance?

When an integrated service, like CodeShip CI, runs tests, an instance is spun-up on Heroku to run the CI suite.
How can I get the name of that branch/build-specific Heroku app, in a programmatic manner?
My use case: I want to give a developer heroku-cli access to the staging instance that was spun-up for their branch. Also, I want the instance URL so that QA can check it for accuracy.
I don't know if this is principally a CodeShip question, or a Heroku question. I can solve the rest of the integration, if I simply can get the name/info for this new instance.
Codeship defines certain environment variables with each run.
Thus I can build a URL based on all of the known pieces, per env:
app_name = 'foo'
env = 'staging'
name_parts = [
app_name,
env,
'pr'
ENV['CI_BUILD_NUMBER']
]
testing_url = 'http://' + name_parts.join('-') + '.herokuapp.com'
#=> foo-staging-pr-2729.herokuapp.com
send_to_chatroom(app_name + '_dev_notification')

Resources