I have been looking for a while now on setting up worker nodes in a cloud native application. I plan to have an autoscaling group of worker nodes pulling jobs from a queue, nothing special there.
I am just wondering, is there any best practice way to ensure that a (eg. ruby) script is running at all times? My current assumption is that you have a script running that polls the queue for jobs and sleeps for a few seconds or so if a job query returns no new job.
What really caught my attention was the Services key in the Linux Custom Config section of AWS Elastic Beanstalk Documentation.
00_start_service.config
services:
sysvinit:
<name of service>:
enabled: true
ensureRunning: true
files: "<file name>"
sources: "<directory>"
packages:
<name of package manager>:
<package name>: <version>
commands:
<name of command>:
http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/customize-containers-ec2.html
The example they give is this..
services:
sysvinit:
myservice:
enabled: true
ensureRunning: true
I find the example and documentation extremely vague and I have no idea how to get my own service up and running using this config key, which means I do not even know if this is what I want or need to use. I have tried creating a ruby executable file and putting the name in the field, but no luck.
I asked the AWS forums for more clarification and have received no response.
If anyone has any insight or direction on how this can be achieved, I would greatly appreciate it. Thank you!
I decided not to use the "services" section of the EB config files, instead just using the "commands" ..
I build a service monitor in ruby that monitors a given system process (in this case my service).
The service itself is a script looping infinitely with delays based on long polling times to the queue service.
A cron job runs the monitor every minute and if the service is down it is restarted.
The syntax for files in the documentation seems to be wrong. The following works for me (note square brackets instead of quotation marks):
services:
sysvinit:
my_service:
enabled: true
ensureRunning: true
files : [/etc/init.d/my_service]
Related
I have currently dockerized my DBT solution and I launch it in AWS Fargate (triggered from Airflow). However, Fargate requires about 1 minute to start running (image pull + resource provisioning + etc.), which is great for long running executions (hours), but not for short ones (1-5 minutes).
I'm trying to run my docker container in AWS Lambda instead of in AWS Fargate for short executions, but I encountered several problems during this migration.
The one I cannot fix is related to the bellow message, at the time of running the dbt deps --profiles-dir . && dbt run -t my_target --profiles-dir . --select my_model
Running with dbt=0.21.0
Encountered an error:
[Errno 38] Function not implemented
It says there is no function implemented but I cannot see anywhere which is that function. As it appears at the time of installing dbt packages (redshift and dbt_utils), I tried to download them and include them in the docker image (set local paths in packages.yml), but nothing changed. Moreover, DBT writes no logs at this phase (I set the log-path to /tmp in the dbt_project.yml so that it can have write permissions within the Lambda), so I'm blind.
Digging into this problem, I've found that this can be related to multiprocessing issues within AWS Lamba (my docker image contains python scripts), as stated in https://github.com/dbt-labs/dbt-core/issues/2992. I run DBT from python using the subprocess library.
Since it may be a multiprocessing issue, I have also tried to set "threads": 1 in profiles.yml but it did not solve the problem.
Does anyone succeeded in deploying DBT in AWS Lambda?
I've recently been trying to do this, and the summary of what I've found is that it seems to be possible, but isn't worth it.
You can pretty easily build a Lambda Layer that includes dbt & the provider you want to use, but you'll also need to patch the multiprocessing behavior and invoke dbt.main from within the Lambda code. Once you've jumped through all those hops, you're left with a dbt instance that is limited to a relatively small upper bound on memory, a 15 minute maximum runtime, and is throttled to a single thread.
This discussion gives an rough example of what's needed to get it running in Lambda: https://github.com/dbt-labs/dbt-core/issues/2992#issuecomment-919288906
All that said, I'd love to put dbt on a Lambda and I hope dbt's multiprocessing will one day support it.
You might not understand what I am trying to say in the question. Before stating the problem, I would give an example to understand even more.
Let's take terraform for the example.
Imagine we use terraform to spin up an EC2 instance in the aws. So, we are currently storing the state locally. Let's say we need to add a tag to the ec2 instance.
So what we do is, we add the this small block to achieve that.
tags = {
Name = "HelloWorld"
}
Nothing complex, just simple as it is. So, as we add a tag, the terraform will look for any changes in the state and as it found a tag has been added, it runs and only add the tag without recreating the whole instance. (There can be other scenarios where it needs to recreate the instance but let's leave those for now)
So, as I mentioned, terraform doesn't recreate the instance for adding a tag. If we want to add another tag, it will add the tag without recreating the instance. Just as simple as that.
doesn't matter how many times we run the terraform apply it doesn't do any changes to existing resources unless we made any in the terraform files.
So, let's now come to the real question.
Let's say we want to install a httpd using ansible. So I write ansible playbook. And it will use the yum package and install and then start and enable the service.
Okay it is simple like that.
Let's say we we run the same playbook for the second time, now will try to execute the same commands from scratch without checking first whether it has executed this playbook before?
I have couple of experiences where I try to install some services and when I try to execute the same playbook again, it fails.
Is it possible to preserve the state in ansible just like we do in terraform so it always only execute the newer changes other than executing from the start on the second run?
You seem to be describing idempotence:
Idempotence is the property of certain operations in mathematics and computer science whereby they can be applied multiple times without changing the result beyond the initial application.
It's up to you to make sure your playbooks are idempotent in ansible. You could do this for your example using the package and service modules.
- name: Setup Apache
hosts: webservers
tasks:
- name: Install Apache
package:
name: httpd
state: present
- name: Start and enable Apache
service:
name: httpd
state: started
enabled: yes
Ansible doesn't achieve idempotence by retaining state on the the controller. Each module should check the state of the host it's operating on to determine what changes it needs to make to achieve the state specified by the playbook. If it determines that it doesn't need to make any changes then that's what it should report back to the controller.
I want to write a playbook to synchronize a source file to a destination host and restart tomcat/apache if the file changed. The documentation on synchronize does not give any example on if this is possible. Can anyone provide some pointers?
If you're only changing one file, you probably want to use copy instead of synchronize. However, this approach should work either way.
The handler system is designed for this sort of thing. The documentation there provides an example of bouncing memcached after a configuration file change:
Here’s an example of restarting two services when the contents of a
file change, but only if the file changes:
- name: template configuration file
template: src=template.j2 dest=/etc/foo.conf
notify:
- restart memcached
- restart apache
The things listed in the notify section of a task are called handlers.
Handlers are lists of tasks, not really any different from regular
tasks, that are referenced by a globally unique name, and are notified
by notifiers. If nothing notifies a handler, it will not run.
Regardless of how many tasks notify a handler, it will run only once,
after all of the tasks complete in a particular play.
Here’s an example handlers section:
handlers:
- name: restart memcached
service: name=memcached state=restarted
- name: restart apache
service: name=apache state=restarted
I am writing a small microservices based app, and in it I have a redis instance that some ruby code/containers access to use for via Resque. Currently in my docker compose I am linking like so:
redis:
image: redis:latest
ports:
- '6379:6379'
ruby_worker:
image: my_user/my_image:latest
links:
- redis:db
This works fine(I only name it :db for now cause that is the example I found when looking up linking).
In my ruby code, I have to set up my Resque redis server like:
Resque.redis = ENV['DB_PORT_6379_TCP_ADDR'] + ':6379'
But this just doesn't seem right. It is dependent on that exact redis name, and if I had to spin up a different redis instance(like how I did while playing with docker cloud today), it doesn't find the redis server. In total I have 3 containers(so far) connecting to this redis for resque. A small sinatra based front end, and 2 workers. I am not much of a rails person, and have never used resque before 3 days ago. So sorry if I missed some of the basics.
Is there a better way to connect to my redis instance in my ruby code? Is there a way to pass the redis name in my docker-compose? Right now my resque-web container is configured like below and it seems to work fine:
resque:
image: ennexa/resque-web
links:
- redis:redisserver
ports:
- "5678:5678"
command: "-r redis://redisserver:6379"
Don't use those link environment variables, they are deprecated, and they don't exist in user defined networks anymore.
A good way to do this would be first to default to the name hostname redis. You can always alias any service to have that name, so you should almost never need to change it from the default. For example:
links:
- someservice:redis
To override the default, you can create some application specific env var like REDIS_HOSTNAME, and set it from the environment section (but again, this is almost never necessary):
environment:
REDIS_HOSTNAME=foo
I'm creating an init.d script that will run a couple of tasks when the instance starts up.
it will create a new volume with our code repository and mount it if it doesn't exist already.
it will tag the instance
The tasks above being complete will be crucial for our site (i.e. without the code repository mounted the site won't work). How can I make sure that the server doesn't end up being publicly visible? Should I start my init.d script with de-registering the instance from the ELB (I'm not even sure if it will be registered at that point), and then register it again when all the tasks finished successfully?
What is the best practice?
Thanks!
You should have a health check on your ELB. So your server shouldn't get in unless it reports as happy. And it shouldn't report happy if the boot script errors out.
(Also, you should look into using cloud-init. That way you can change the boot script without making a new AMI.)
I suggest you use CloudFormation instead. You can bring up a full stack of your system by representing it in a JSON format template.
For example, you can create an autoscale group that has an instances with unique tags and the instances have another volume attached (which presumably has your code)
Here's a sample JSON template attaching an EBS volume to an instance:
https://s3.amazonaws.com/cloudformation-templates-us-east-1/EC2WithEBSSample.template
And here many other JSON templates that you can use for your guidance and deploy your specific Stack and Application.
http://aws.amazon.com/cloudformation/aws-cloudformation-templates/
Of course you can accomplish the same using init.d script or using the rc.local file in your instance but I believe CloudFormation is a cleaner solution from the outside (not inside your instance)
You can also write your own script that brings up your stack from the outside by why reinvent the wheel.
Hope this helps.