I'm starting work on a new serverless project using AWS Lambda and API Gateway.
What is the best way to organize my project without being locked into one framework, such as the Serverless Framework or Chalice?
Here's what I'm using so far.
project-dir/
    serverless.yaml (config file)
    functions/
        function1.py
        function2.py
    lib/
        common_helper_functions.py
    tests/
        unit/
            test1.py
            test2.py
        functional/
            test1.py
            test2.py
    migrations/
    resources/
        cloudformation.templates.json
Do any of you recommend a better way to organize my project? Does each micro-service get a separate git repo? Am I missing other important folders?
Your structure looks good, if a bit flat. I like putting code flows together: there are usually multiple functions involved in producing a result, and those should be grouped. Common functions that cross flows but don't cross projects go into a common folder in the project. I base my repo organization on scope: if lambdas cross projects, they go in a common repo; project-specific ones stay in their project's repo.
Many times the hardest part of using a serverless architecture is finding the code being called. With a good logical grouping you will save yourself many headaches later.
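As a toy illustration of that grouping (module and helper names here are hypothetical, not from the question), each handler in a flow imports shared helpers from the project's common folder rather than reaching into another flow:

# functions/orders/create_order.py -- one step of a hypothetical "orders" flow.
# Helpers shared across flows live in lib/, never in another flow's folder.
from lib.common_helper_functions import make_response  # make_response is hypothetical

def handler(event, context):
    # Entry point wired to API Gateway; validates input and records the order.
    order_id = (event.get("pathParameters") or {}).get("order_id")
    if not order_id:
        return make_response(400, {"error": "order_id is required"})
    # ... write the order, then hand off to the next function in the flow ...
    return make_response(200, {"order_id": order_id, "status": "created"})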
Related
So we're suffering a lot with a microservice architecture.
Each microservice has its own repo and CI/CD (a GitHub Action deploys the service on merge to main).
But there are a lot of caveats, among them: the services share a lot of code, so we're using an internal Artifactory to publish shared libraries, and that comes with its own trouble.
For shared artifacts we used a monorepo and that made everything way easier, and that got us thinking: Well, let's make a monorepo for all the microservices (they're not that many anyway).
But then, how do we do CI/CD?
A branch per microservice (like a "main" for every microservice)?
A smart script that identifies which subprojects were modified (see the sketch below)?
Some info in the commit message that tells the GitHub Action which subproject to deploy?
Those all seem like ugly workarounds. Has anyone experienced this, and solved it elegantly?
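For what it's worth, GitHub Actions can scope a workflow to a subproject natively with "paths" filters on the trigger, and the "smart script" option can be a few lines of Python. A minimal sketch (the service names and deploy command are hypothetical placeholders):

#!/usr/bin/env python3
# Diff the last merge against its parent and deploy only the microservices
# whose top-level directory changed. SERVICES and deploy.sh are hypothetical.
import subprocess

SERVICES = {"app1", "app2", "app3"}  # one top-level directory per microservice

def changed_services(base="HEAD~1", head="HEAD"):
    out = subprocess.check_output(
        ["git", "diff", "--name-only", base, head], text=True
    )
    touched = {path.split("/", 1)[0] for path in out.splitlines() if "/" in path}
    return touched & SERVICES

if __name__ == "__main__":
    for service in sorted(changed_services()):
        print("deploying", service)
        # subprocess.run(["./deploy.sh", service], check=True)  # hypothetical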
I have a question about microservices and repositories. We are a small team (5 people) creating a new project with microservices. We expect between 10 and 15 microservice applications in the project.
We are thinking about one repository for all microservices, with a structure like this:
/
    app1/
    app2/
    app3/
    script.sh
    script.bat
What do you think about this design? Can you recommend something better? We think a repository per app would be overkill for such a small project within one team. You can imagine our applications as Spring Boot services or Angular SPAs. Thanks in advance.
In general you can keep all your micro-services in one repository, but as the code for each of them grows, I think it can become difficult to manage.
Here are some things that you might want to consider before deciding to put all your micro-services in one repository:
Developer discipline:
Be careful with coupling of code. Since the code for all your micro-services lives in one repository, there is no real physical boundary between them, so developers can simply reference code from another micro-service. Keeping all micro-services in one repository therefore requires discipline and rules so developers don't cross those boundaries and misuse them.
Temptation to create and misuse shared code:
This is not a bad thing if you do it in a proper, structured way, but it leaves a lot of room for doing it wrong. If people just start depending on the same shared jar or similar, that can lead to a lot of problems. Shared code should be isolated, packaged, and ideally versioned with support for backwards compatibility, so that when the library is updated, each micro-service still has working code against the previous version. This is doable in a single repository, but as with the first point above, it requires planning and management.
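As a tiny, hypothetical illustration of what backwards compatibility looks like in a shared library: new releases extend the API with defaults so that callers written against the old version keep working:

# shared_lib/users.py -- hypothetical shared library module.
# v1.0.0 shipped fetch_user(user_id); v1.1.0 adds an optional flag with a
# default value, so every caller written against v1.0.0 still works unchanged.
def fetch_user(user_id, include_profile=False):
    user = {"id": user_id}
    if include_profile:
        user["profile"] = {}  # placeholder: load the extended profile here
    return user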
Git considerations:
Managing a lot of pull requests and branches in one repository can be challenging and can lead to "I am blocked by someone else" situations. As more people work on the project and commit to your source branch, you will have to rebase and/or merge the source branch into your development or feature branch much more often, even when you don't need the changes from other services. Email notifications configured for the repository can also become very noisy, since you will receive emails about code outside your micro-service; you may need filters or rules in your email client to screen out the ones you're not interested in.
Growth in the number of micro-services: Can the number grow beyond your initial 10-15? If not, all fine. But if it does, at some point you may want to split each micro-service into a dedicated repository. Doing this at a later stage of the project can be challenging and will require some work; in the worst case you will discover couplings that people introduced over time, which you will then have to untangle.
CI pipeline considerations:
If you use something like Jenkins to build, test, and/or deploy your code, you may run into small configuration difficulties, such as the integration between Jenkins and GitHub. You would need a pipeline that builds/tests only a specific part of the code (one micro-service) when someone opens a merge/pull request against that micro-service. I have never set this up myself, so you would have to figure out how to script and automate it; it is doable, I guess, but it will require some work.
Conclusion
Still, all or most of these points can be addressed with some extra management and configuration, but it is worth knowing what additional effort you may encounter. There are surely other points to consider as well, but my general advice would be to use a separate repository for each micro-service if you can (private repository pricing and similar constraints permitting). Ultimately, this is a decision made project by project.
I'm working on a micro-services project. For this, I'd like to have one Go package per service, all included in a parent package for the project. It looks like this:
.
└── github.com
└── username
└── project
├── service1
└── service2
I think this structure complies with the Go conventions on package names and import paths. A consequence is that all my microservices end up in the same repository on GitHub, since the repository sits at depth 3 in the URL. I think this may become an issue if the codebase grows large. It may also add complexity to the CI/CD pipeline: a change to one service would trigger a build of all the other services, and the code to clone would be unnecessarily large.
Is there a way to avoid this conflict between the Go conventions and the way GitHub works? Or is this a problem that has to be solved as part of the CI/CD work?
What you're talking about is popularly called a "monorepo" these days. While I personally like having all my projects in their own independent repositories (including microservices and everything else), there are a number of proponents of keeping all of a company's code in a single repository. Interestingly, both Google and Facebook use monorepos, though it must be said they have built a lot of fancy tooling to make that work for them.
One important thing to note is that your repository is a separate thing from your architecture. There is not necessarily any correlation between them. You can have microservices all in a single repo and you can have a monolith divided up into several repos; the repository is only a tool to store and document your code base, nothing more.
While researching the topic, here are some of the advantages and disadvantages taken from a number of articles across the web:
Monorepo Advantages
Ease of sharing modules between projects (even in microservices, there are often cross-cutting concerns)
One single place to see and know what code exists - especially useful in large companies with lots of code
Simplifies automated and manual code review processes
Simplifies documentation rather than pulling from multiple, disconnected repos
Monorepo Disadvantages
A massive codebase can be challenging and slow to clone and check out locally
Without very clear, strict guidelines it can be easy to cause tight coupling between products
Requires (slightly) more complex CI/CD tooling to partial-release
Depending on repository platform, very large codebases can affect performance
There are good discussions elsewhere on the pros and cons of monorepos, including some specifically about switching TO a monorepo with a microservices architecture, with plenty of material both pro- and anti-monorepo.
Like so many other things in programming and especially in SOA, the right solution for you depends on a number of factors that only you can determine. The main takeaway is that big and small companies have been successful with both options and many in between, so choose carefully but don't worry too much about it.
Versioning of sub-packages that a Go project depends on can be tracked with git tags. So with Go modules, one is encouraged to move sub-packages into their own git repos.
If the majority of your solution will be written in Go, I'd suggest leveraging Go modules.
This blog post explains how to manage a module's go.mod with regard to package dependencies and their associated version tags:
https://blog.golang.org/migrating-to-go-modules
There is a good way to work with separate repositories and Go microservices: Go plugins. In short:
Build a base image that implements shared functionality.
Have that base image look for a Go plugin somewhere in the container
In derived images, compile the functionality as a Go plugin and place it where the base image can find it.
For Go/gRPC, I have published a base image that does this.
In my company, we have a system organized as microservices, with a dedicated git repository per service. We would like to introduce gRPC, and we were wondering how to share protobuf files and build libraries for our various languages. Based on examples we collected, we decided in the end to go with a single repository containing all our protobuf files; it seems to be the most common way of doing it, and easier to maintain and use.
I would like to know if you have examples on your side.
Do you have counter-examples of companies doing the exact opposite, i.e. hosting protobuf files in a distributed way?
We have a separate repo for proto files (called schema) and one repo per microservice. We also never store generated code; server and client files are generated from scratch by protoc during every CI build.
Actually this approach works and fits our needs well. But there are two potential pitfalls:
Inconsistency between the schema and microservice repositories. Commits to two different git repos are not atomic, so when the schema is updated there is always a short window during which the schema repo has changed but a microservice's repo has not yet.
If you use Go, there is a potential problem with the move to Go modules introduced in Go 1.11. We haven't researched it comprehensively yet.
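For what it's worth, in a Python service such a CI generation step can be a short script; the grpcio-tools package ships protoc as a module. A minimal sketch (the repo layout and proto file name are hypothetical):

# Regenerate gRPC stubs from the shared "schema" repo on every CI build;
# nothing generated is ever committed. Paths and proto names are hypothetical.
import os
from grpc_tools import protoc

os.makedirs("generated", exist_ok=True)
exit_code = protoc.main([
    "protoc",                        # argv[0] placeholder expected by protoc.main
    "--proto_path=../schema",        # the shared schema repo, checked out by CI
    "--python_out=generated",
    "--grpc_python_out=generated",
    "../schema/user_service.proto",  # hypothetical service definition
])
if exit_code != 0:
    raise SystemExit("protoc failed")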
Each of our microservices has its own API (one or several protobuf files), and each API gets its own repository. We also have a CI job that builds the generated proto classes into a jar (and artifacts for other languages too) and publishes it to our central repository. Then you just add a dependency on the API you need.
For example, for microservice A we also have a repository a-api (containing only proto files), which the CI job builds into a jar (and artifacts for other languages) published as com.api.a-service.<version>.
I am trying to build a microservice API using AWS (Lambda + S3 + API Gateway), and I have noticed that all my Lambda packages are the same size, so it seems I am uploading the full project to every Lambda instead of only the resources each function needs.
Is there any way to upload just the resources I need for each function? Will this minimize execution time? Is it worth doing?
Going to answer this in 2 parts:
(1) The obligatory "Why do you care?"
I ask this because I was really concerned too. But after testing, the size of the uploaded bundle (the jars in the lib folder of the Lambda distribution bundle) didn't seem to really affect anything except maybe initial upload time (or S3 usage if you are going that route).
For the sake of sanity, rather than having a bunch of nano-projects and bundles, I keep a single Java Lambda API module and upload the same artifact for every Lambda.
At some point, if it makes sense to separate for whatever reason (micro service architecture, separation of code, etc), then I plan on splitting.
Now having said that, the one thing that REALLY seems to affect Java-based Lambdas is class-loading time. You mentioned you use Spring. I would recommend not using Spring configuration loading, as you will probably end up executing a bunch of code you never really need.
Remember, ideally your lambdas should be in the 100ms range.
I had a case where I was using the AWS SDK and initializing the AWSClient was taking 13 seconds (13000 ms)! When I switched to Python or Node, it went to 56ms...
Remember that you get charged by time, and a 1000x factor is no laughing matter :)
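A side note in Python terms, since that anecdote is about client initialization: constructing SDK clients once at module scope, outside the handler, lets warm invocations reuse them instead of paying the setup cost on every call. A minimal sketch (bucket and key are hypothetical):

import boto3

# Created once per container at cold start; warm invocations reuse it.
s3 = boto3.client("s3")

def handler(event, context):
    # Bucket and key are hypothetical, just to show the client being reused.
    obj = s3.get_object(Bucket="example-bucket", Key="example.txt")
    return {"statusCode": 200, "body": obj["Body"].read().decode("utf-8")}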
(2) If you've decided on splitting, I'd recommend using the Gradle distribution plugin with child projects, to keep each child project and its zip distribution "light". I went down this road but realized I would be splitting my components very fine, and I'd either be duplicating configuration across projects or, if I made a project dependency, simply end up bundling the entire dependency tree again.
If you already know what you need to cherry-pick, without relying on Gradle/Maven to handle the dependencies for you, you can create Gradle zip tasks to produce different Lambda distribution packages.
AWS documentation: http://docs.aws.amazon.com/lambda/latest/dg/create-deployment-pkg-zip-java.html
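The same cherry-picking idea, sketched in Python rather than Gradle to match the original poster's stack (the per-function file lists below are hypothetical): build one slim zip per function instead of one super-set bundle.

# Build one slim deployment zip per Lambda function instead of one super-set
# bundle. The per-function file lists are hypothetical placeholders.
import zipfile

PACKAGES = {
    "function1.zip": ["functions/function1.py", "lib/common_helper_functions.py"],
    "function2.zip": ["functions/function2.py"],
}

for archive_name, files in PACKAGES.items():
    with zipfile.ZipFile(archive_name, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in files:
            zf.write(path)
    print("built", archive_name, "with", len(files), "files")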
You will need to create and build a separate jar for each of your three Lambda functions, and in each jar package only the classes and resources that function requires, rather than one super-set jar that contains the classes and resources for all of them.
This way your jars will be lighter.
For more details about building lambda jars see Building AWS Lambda jar