Okay, after a week or more, my Aurora cluster is finally running. It was not easy, but I got there.
I have a simple Aurora file:
# copy frontend into the local sandbox
clone_service = Process(
  name = 'copy service',
  cmdline = 'git clone https://citrullin@bitbucket.org/jakiku/frontend.git frontend')
install_npm_deps = Process(
  name = 'install npm dependencies',
  cmdline = 'cd frontend && npm install'
)
run_server = Process(
  name = 'run server',
  cmdline = 'node server.js'
)
# describe the task
run_frontend_service = SequentialTask(
  processes = [clone_service, install_npm_deps, run_server],
  resources = Resources(cpu = 1, ram = 128*MB, disk=64*MB))
jobs = [
  Service(cluster = 'mesos-fr',
          environment = 'devel',
          role = 'www-data',
          name = 'frontend_service',
          task = run_frontend_service)
]
Nothing special. I only want to define which port the service uses. I tried Resources(port = 3000), but it doesn't work. A port is not really a resource; it's an attribute in Mesos.
Generally speaking, you want to avoid static ports with Aurora jobs. Since any number of tasks could land on the same host, there's no good way to guarantee that two tasks won't request the same port, which would cause one of them to fail unpredictably.
The recommended way to solve this problem is to request a port from Mesos using the thermos namespace in your aurora config. For example, if you were to do something like:
run_server = Process(
  name = 'run server',
  cmdline = 'node server.js --port={{thermos.ports[http]}}'
)
Then Aurora will assign a random port to your task when it is assigned to a host.
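For this to work, the server process has to read its port from the command line instead of hardcoding one. The question's server is a Node script whose contents are not shown, so the following is only a minimal Python sketch of the same idea; the function names parse_port and make_server are hypothetical and chosen for illustration.

```python
import argparse
from http.server import HTTPServer, SimpleHTTPRequestHandler


def parse_port(argv):
    """Read the port handed over on the command line, e.g. --port=31234."""
    parser = argparse.ArgumentParser(description="hypothetical frontend server")
    parser.add_argument("--port", type=int, required=True,
                        help="port assigned by the scheduler")
    return parser.parse_args(argv).port


def make_server(port):
    """Bind an HTTP server to whichever port was passed in."""
    return HTTPServer(("0.0.0.0", port), SimpleHTTPRequestHandler)


# Typical entry point (left as a comment so importing this file has no side effects):
#   import sys
#   make_server(parse_port(sys.argv[1:])).serve_forever()
```

Because the port is a required flag, the process fails fast if it is started without one rather than silently falling back to a fixed port.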
The obvious question this raises is how other things find your service if it's running on a randomly assigned port that can change as your task moves between hosts. The answer is service discovery. If you add announce = Announcer() to your job configuration, your task will be added to a ServerSet, which other tasks can use to discover and communicate with it.
Reference:
Mesos documentation on configuring agents to offer ports.
Aurora documentation on requesting ports.
I'm trying to run a Kubeflow pipeline setup and I have several environments (dev, staging, prod).
In my pipeline I'm using kfp.components.func_to_container_op to get a pipeline task instance (ContainerOp), and then execute it with the appropriate arguments that allows it to integrate with my s3 bucket:
from utils.test import test
test_op = comp.func_to_container_op(test, base_image='my_image')
read_data_task = read_data_op(
    bucket,
    aws_key,
    aws_pass,
)
arguments = {
    'bucket': 's3',
    'aws_key': 'key',
    'aws_pass': 'pass',
}
kfp.Client().create_run_from_pipeline_func(pipeline, arguments=arguments)
Each environment uses different credentials to connect to the bucket, and those credentials are passed into the function:
def test(s3_bucket: str, aws_key: str, aws_pass: str):
    ....
    s3_client = boto3.client('s3', aws_access_key_id=aws_key, aws_secret_access_key=aws_pass)
    s3_client.upload_file(from_filename, bucket_name, to_filename)
So for each environment I need to update the arguments to contain the correct credentials. This makes the pipeline very hard to maintain, since I can't simply copy the code when promoting from dev to staging to prod.
My question is: what is the best approach to pass those credentials?
Ideally, you should push any environment-specific configuration as close to the cluster as possible (and as far away from the components as possible).
You can create a Kubernetes secret in each environment with different credentials, then use that AWS secret in each task:
import kfp
from kfp import aws

def my_pipeline():
    ...
    conf = kfp.dsl.get_pipeline_conf()
    conf.add_op_transformer(aws.use_aws_secret('aws-secret', 'AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY'))
Maybe boto3 can auto-load the credentials using the secret files and the environment variables.
At least all GCP libraries and utilities do that with GCP credentials.
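boto3 does read AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY from the environment when no keys are passed explicitly, so the component no longer needs credential arguments. A minimal sketch, assuming the secret has been mapped to those two environment variables as in the snippet above; the helper missing_aws_env and the function name upload are hypothetical and only illustrate the shape of the change.

```python
import os

# Environment variable names that boto3's default credential lookup reads.
AWS_ENV_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_aws_env(env=None):
    """Return the credential variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in AWS_ENV_VARS if not env.get(name)]


def upload(from_filename, bucket_name, to_filename):
    """Upload a file to S3 without taking credentials as arguments."""
    missing = missing_aws_env()
    if missing:
        raise RuntimeError("missing AWS credentials: " + ", ".join(missing))
    import boto3  # imported here so the check above works without boto3 installed
    s3_client = boto3.client("s3")  # no explicit keys: taken from the environment
    s3_client.upload_file(from_filename, bucket_name, to_filename)
```

With this shape, the pipeline code is identical in dev, staging, and prod; only the Kubernetes secret in each cluster differs.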
P.S. It's better to create issues in the official repo: https://github.com/kubeflow/pipelines/issues
I have set up IKS and logged in to the command line of one of the containers.
I need to execute a script on that container that connects to the Redis cache as a client.
Here is the script (testScript.py) I want to execute:
import redis
r = redis.Redis(host='master.some.path.of.redis.url.amazonaws.com', port=6323, password='somePassword', ssl=True)
r.set('foo', 'bar')
value = r.get('foo')
print(value)
I need help understanding how to set up Redis on IKS.
I have tried a few ways to get SonarQube running in our AWS environment, all successfully. However, SonarQube is unstable: whenever Elastic Beanstalk recycles an instance, my SonarQube environment is wiped out.
Here is what I tried:
Attempt 1: EC2 instance. I created the EC2 instance from a Bitnami AMI, imageId: ami-0f9cf81913a6dce27
This seemed like a pretty simple process, but I would prefer an Elastic Beanstalk environment to manage our SonarQube EC2 instances.
Attempt 2: Create an EB environment using a single Docker instance, with this Dockerrun.aws.json file:
{
  "AWSEBDockerrunVersion": "1",
  "Image": {
    "Name": "sonarqube:7.1"
  },
  "Ports": [{
    "ContainerPort": "9000"
  }]
}
This created the EB environment. It creates an RDS instance (with MySQL 5.x) to store the scan data (in a database called ebdb). The SonarQube server hosts an internal Elasticsearch instance locally for its search data.
I then have to add a few environment variables to support the RDS instance (jdbc username, password, url endpoint, etc).
I then have to configure the sonarQube security side.
No marketplace features are installed. So I add SonarJava, Groovy, and SonarJS.
I add a login user for scans. All good.
Except, occasionally Elastic Beanstalk will have a health issue, drop the current instance, and re-create a new one.
In this case, the security configuration (users, passwords, etc.) is still intact, but the marketplace features are gone, so code scans fail until I manually add them back.
The schema for a single-instance Docker container is pretty sparse; I did not see any way to customize it further with the Docker file.
Attempt 3: Use a multi-instance Docker container. The schema is more robust, so perhaps I can configure SonarQube more explicitly, e.g. you can pass environment variables, MySQL settings, etc.
I was unable to get this to work. I did learn that I needed to set the memory above 2 GB for Elasticsearch to start up, but I was unable to get the SonarQube environment to come up.
I might revisit this later.
Attempt 4: use AMI in elastic beanstalk (with terraform aws provider)
main.tf
resource "aws_elastic_beanstalk_application" "sonarqube" {
name = "SonarQube"
description = "SonarQube for nano-services"
}
resource "aws_elastic_beanstalk_environment" "nonprod" {
name = "${var.application-name}"
application = "${aws_elastic_beanstalk_application.sonarqube.name}"
solution_stack_name = "64bit Amazon Linux 2018.03 v2.10.0 running Docker 17.12.1-ce"
wait_for_ready_timeout = "30m"
setting {
namespace = "aws:autoscaling:updatepolicy:rollingupdate"
name = "Timeout"
value = "PT1H"
}
setting {
namespace = "aws:elasticbeanstalk:environment"
name = "ServiceRole"
value = "aws-elasticbeanstalk-service-role"
}
setting {
namespace = "aws:elasticbeanstalk:command"
name = "DeploymentPolicy"
value = "Rolling"
}
setting {
namespace = "aws:elasticbeanstalk:command"
name = "BatchSizeType"
value = "Fixed"
}
setting {
namespace = "aws:elasticbeanstalk:command"
name = "BatchSize"
value = "1"
}
setting {
namespace = "aws:elasticbeanstalk:command"
name = "IgnoreHealthCheck"
value = "true"
}
setting {
namespace = "aws:autoscaling:launchconfiguration"
name = "EC2KeyName"
value = "web-aws-key"
}
setting {
namespace = "aws:autoscaling:launchconfiguration"
name = "IamInstanceProfile"
value = "arn:aws:iam::<redacted>:instance-profile/aws-elasticbeanstalk-ec2-role"
}
setting {
namespace = "aws:autoscaling:launchconfiguration"
name = "instanceType"
value = "t2.xlarge"
}
setting {
namespace = "aws:elb:listener:443"
name = "ListenerProtocol"
value = "SSL"
}
setting {
namespace = "aws:elb:listener:443"
name = "InstanceProtocol"
value = "SSL"
}
setting {
namespace = "aws:elb:listener:443"
name = "SSLCertificateId"
value = "arn:aws:acm:<redacted>"
}
setting {
namespace = "aws:elb:listener:443"
name = "ListenerEnabled"
value = "true"
}
}
Initially I included the sonarQube AMI:
setting {
namespace = "aws:autoscaling:launchconfiguration"
name = "imageId"
value = "ami-0f9cf81913a6dce27"
}
This does create everything. However, the EC2 instances respond too slowly, and EB goes to Grey status. Even though SonarQube is up and running, EB is unaware of it. So I commented this out, and manually modified the image id as a one-off.
wait_for_ready_timeout does assist with this, as that simply keeps terraform from timing out. e.g. It finishes in 22.5 minutes instead of a hard stop at 20 minutes.
In this case, it creates SonarQube with a local MySQL database (no RDS instance), with Elasticsearch being local as well.
SonarQube's marketplace features are also included, except for Groovy, which I added.
However, I hit the same issue as before: when EB drops an instance and re-creates it, the SonarQube environment is wiped out. This time everything is lost, including the credentials and marketplace features.
Has anyone run into this problem and figured it out?
I resolved the issue by using ECS (Fargate), instead of the Elastic Beanstalk container.
Steps:
Create an RDS mysql instance in AWS for sonar
Open a mysql shell for this instance, and configure it for sonar, see: Sonar setup with MySql
Create a dockerfile with the plugins you care about, e.g:
FROM sonarqube:latest
ENV SONARQUBE_JDBC_USERNAME=[YOUR-USERNAME] \
    SONARQUBE_JDBC_PASSWORD=[YOUR-PASSWORD] \
    SONARQUBE_JDBC_URL=jdbc:mysql://[YOUR-RDS-ENDPOINT]:3306/sonar?useSSL=false&useUnicode=true&characterEncoding=utf8&rewriteBatchedStatements=true&useConfigs=maxPerformance
RUN wget "https://sonarsource.bintray.com/Distribution/sonar-java-plugin/sonar-java-plugin-5.7.0.15470.jar" \
    && wget "https://sonarsource.bintray.com/Distribution/sonar-javascript-plugin/sonar-javascript-plugin-4.2.1.6529.jar" \
    && wget "https://sonarsource.bintray.com/Distribution/sonar-groovy-plugin/sonar-groovy-plugin-1.4.jar" \
    && mv *.jar $SONARQUBE_HOME/extensions/plugins \
    && ls -lah $SONARQUBE_HOME/extensions/plugins
EXPOSE 9000
EXPOSE 9092
I exposed 9092 in case I wanted to comment out the MySQL connection and test locally with the internal H2 database at some point.
Verify the docker image runs locally
eval $(docker-machine env)
docker build -t sonar .
docker run -it -d --rm --name sonar -p 9000:9000 -p 9092:9092 sonar:latest
echo $DOCKER_HOST
Open a browser to this IP address on port 9000, e.g. http://192.x.x.x:9000
Create a new ECR repository called sonar to store the Docker image.
The AWS interface actually tells you how to publish your Docker image, so this should be self-evident.
Tag and push the Docker image to the sonar repository
$(aws ecr get-login --no-include-email --region [YOUR-AWS-REGION])
docker tag sonar:latest [YOUR-ECS-DOCKER-IMAGE-URI]/sonar:latest
docker push [YOUR-ECS-DOCKER-IMAGE-URI]/sonar:latest
Create a new fargate cluster, called sonar
Create a new task definition.
For your container, use the ECR Docker image URI. I gave mine 6 GB memory and 2 cpus, with 1024 cpu units. Here I exposed ports 9000 and 9092. I also added the environment variables from the Dockerfile here.
Create an ECS service and include the task. Run it, verify the logs in CloudWatch, hit the public endpoint on port 9000, and you're done.
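The task definition step above can also be written down as data rather than clicked together in the console. This is a rough sketch, not the exact definition I used: the function name sonar_task_definition is hypothetical, the family and container name "sonar" are assumptions, and fields such as the execution role and log configuration are left out and would need to be added for a real deployment.

```python
import json


def sonar_task_definition(image_uri, jdbc_url, jdbc_username, jdbc_password):
    """Build a Fargate task definition dict for the SonarQube container."""
    return {
        "family": "sonar",                      # assumed name
        "networkMode": "awsvpc",                # required for Fargate tasks
        "requiresCompatibilities": ["FARGATE"],
        "cpu": "1024",                          # cpu units
        "memory": "6144",                       # 6 GB, expressed in MiB
        "containerDefinitions": [
            {
                "name": "sonar",                # assumed name
                "image": image_uri,
                "essential": True,
                "portMappings": [
                    {"containerPort": 9000, "protocol": "tcp"},
                    {"containerPort": 9092, "protocol": "tcp"},
                ],
                # Same variables as in the Dockerfile.
                "environment": [
                    {"name": "SONARQUBE_JDBC_USERNAME", "value": jdbc_username},
                    {"name": "SONARQUBE_JDBC_PASSWORD", "value": jdbc_password},
                    {"name": "SONARQUBE_JDBC_URL", "value": jdbc_url},
                ],
            }
        ],
    }


def to_json(task_definition):
    """Serialize the definition, e.g. to save it to a file for later registration."""
    return json.dumps(task_definition, indent=2)
```

Keeping the definition in a file makes it easier to recreate the task if the service ever has to be rebuilt.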
I largely borrowed from this: https://www.infralovers.com/en/articles/2018/05/04/sonarqube-on-aws-fargate/
I hope this helps others.
I need to provision Windows VMs with knife and run the initial configuration once with Chef, but have the chef-client disabled after that. Unfortunately, setting the interval and task variables to 0 in the default.rb attributes file of the chef-client cookbook does not seem to work:
# log_file has no effect when using runit
default['chef_client']['log_file'] = 'client.log'
default['chef_client']['interval'] = '0'
default['chef_client']['splay'] = '0'
default['chef_client']['conf_dir'] = '/etc/chef'
default['chef_client']['bin'] = '/usr/bin/chef-client'
...
# Configuration for Windows scheduled task
default['chef_client']['task']['frequency'] = 'minute'
default['chef_client']['task']['frequency_modifier'] = node['chef_client'] ['interval'].to_i / 0
default['chef_client']['task']['user'] = 'SYSTEM'
default['chef_client']['task']['password'] = nil # Password is only required for none system users
Any ideas on what I need to do?
Just don't run the chef-client recipe at all.
What I ended up doing was altering the windows_service.rb recipe to disable the service:
service 'chef-client' do
  action [:disable, :stop]
end
coderanger's answer is probably fine, but this approach makes it easier to run the client again in the future if some event requires it.
I have one (EC2) Ubuntu server where bluepill works just fine to start and monitor resque processes (and it has done so on other nodes in the past).
I'm setting up a new node, and for some reason on this node bluepill does not recognize that the processes have started and are running, and so keeps creating new ones. I'm a little baffled by what's causing this. The 2 nodes are almost identical; they're both EC2 servers provisioned by the same chef scripts. It is true that the one not working is 'production' and the other 'staging', but there's almost no difference due to that.
Any thoughts or suggestions before I fork the github project and start inserting more monitoring, to try and figure out what's going on? There's been discussion on this list in the past about troubles w/ bluepill and resque, but as I said this is working fine on my staging server, and has worked fine on earlier production servers (although I will note that this new production server is ruby 1.9.3 (vs 1.9.2) and rails 3.2 (vs. 3.1)).
Here's my .pill file (or more specifically, my chef cookbook's template file):
ENV["RAILS_ENV"] = "<%= node.chef_environment %>"
ENV["QUEUE"] = "*"
Bluepill.application("zmx_app") do |app|
  app.working_dir = "/srv/zmx/current"
  app.uid = "root"
  app.gid = "root"
  2.times do |i|
    app.process("resque-#{i}") do |process|
      process.group = "resque"
      process.start_command = "rake resque:work"
      process.pid_file = "/srv/zmx/current/tmp/pids/resque_workers-#{i}.pid"
      process.stop_command = "kill -QUIT {{PID}}"
      process.daemonize = true
    end
  end
end
This turned out to be a bug in bluepill. I have forked the project, fixed the bug, and submitted a pull request.
I'm not sure why I didn't realize that there was, in fact, a difference between my two environments: staging and the old production servers were on bluepill 0.0.55, while my new production environment is on 0.0.58.
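For anyone debugging a similar symptom (a monitor that keeps spawning workers it already started), a quick way to narrow it down is to check by hand whether the pid files point at live processes. The following is a generic sketch for a POSIX system, not bluepill's implementation and not the fix from the pull request; the function names are hypothetical.

```python
import os


def pid_from_file(path):
    """Return the integer pid stored in a pid file, or None if unreadable."""
    try:
        with open(path) as handle:
            return int(handle.read().strip())
    except (OSError, ValueError):
        return None


def process_is_running(path):
    """Report whether the pid recorded in the pid file belongs to a live process."""
    pid = pid_from_file(path)
    if pid is None or pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence, nothing is sent
    except ProcessLookupError:
        return False     # no such process: the pid file is stale
    except PermissionError:
        return True      # process exists but belongs to another user
    return True
```

If the pid files are missing or stale while the workers are clearly running, the monitor and the workers disagree about the pid, which is where to look next.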