How can I minimize the weight of my lambda functions? - aws-lambda

I am trying to build a microservice API using AWS (lambda + s3 + apiGateway) and I have noticed that all my lambdas have the same weight, so it seems like I am uploading the full project to every lambda instead of the needed resources.
Is there any way to upload just the resources I need for each function? Will this minimize the execution time? Is it worth to do it?

Going to answer this in 2 parts:
(1) The obligatory "Why do you care?"
I ask this because I was really concerned too. But after testing, it doesn't seem like the size of the bundle uploaded (the jars in the lib folder of the lambda distribution bundle) seemed to really affect anything expect maybe initial upload time (or maybe S3 usage if you are going that route).
For the sake of sanity, rather than having a bunch of nano projects and bundles, I have a single Java Lambda API module and then I upload the same artifact for every Lambda.
At some point, if it makes sense to separate for whatever reason (micro service architecture, separation of code, etc), then I plan on splitting.
Now having said that, the one things that REALLY seems to affect Java based lambdas is class loading time. You mentioned you use Spring. I would recommend you not use Spring configuration loading as you will probably end up executing a bunch of code you never really need.
Remember, ideally your lambdas should be in the 100ms range.
I had a case where I was using the AWS SDK and initializing the AWSClient was taking 13 seconds! (13000 ms). When I switched to using Python of Node, it went to 56ms...
Remember that you get charged by time, and a 1000x factor is no laughing matter :)
(2) If you've decided on splitting, I'd recommend using the gradle distribution plugin with child projects to make each child project and child project zip distribution "light". I went down this road but realized I would really be splitting my components really fine... and I'd either be duplicating configurations across projects. Or if I made a project dependency, I would simply end up bundling up the entire dependency tree again.
If you already know what you need to cherry pick without relying on gradle / maven to handle the dependencies for you, you can create gradle zip tasks to create different Lambda distribution packages.
AWS documentation: http://docs.aws.amazon.com/lambda/latest/dg/create-deployment-pkg-zip-java.html

You will need to create and build 3 different jars for each of your lambda functions and in each of the jar simply package the classes and their required resources rather than creating a super-set jar that has classes and resources for each of the lambda functions.
This way your jars will get lighter.
For more details about building lambda jars see Building AWS Lambda jar

Related

How to keep service/componnet running while update bundle OSGI

I have implemented 2 services A,B in my bundle. I would like to change the code of service A by building a new jar file and do update command but keep the service B running without start it again.
Sounds like you have 2 services in 1 bundle. The unit of deployment is a bundle, so my recommendation is to split the two services into two bundles. Otherwise, undeploying your existing bundle will naturally also tear down Service B.
Alternatively, in case the API/interface resides in a separate bundle, you could deploy a new service-implementation for A in a separate bundle, with a higher priority, and rewire all uses of the service. Which typically is rather confusing, so it's a distant second place recommendation.
Edit: You comment that you are combining services in a bundle to minimize the number of jars, but you want to update the services independently. Specifically for minimizing the number of jars: Are you trying to solve a problem that you indeed had? I'm mainly working with Liferay, which is fully OSGi, and a plain vanilla installation comes with more than 1000 bundles - the runtime handles it just fine. Make sure you're not preemptively optimizing something that doesn't need optimization.
If your components have different maintenance intervals, then deploy them in different bundles. Period. No use working against the system, which has no problem with the number of bundles at all.

Organization of protobuf files in a microservice architecture

In my company, we have a system organized with microservices with a dedicated git repository per service. We would like to introduce gRPC and we were wondering how to share protobuf files and build libs for our various languages. Based on some examples we collected, we decided at the end to go for a single repository with all our protobuf inside, it seems the most common way of doing it and it seems easier to maintain and use.
I would like to know if you have some examples on your side ?
Do you have some counter examples of companies doing the exact opposite, meaning hosting protobuf in a distributed way ?
We have a distinct repo for protofiles (called schema) and multiple repos for every microservice. Also we never store generated code. Server and client files are generated from scratch by protoc during every build on CI.
Actually this approach works and fits our needs well. But there are two potential pitfalls:
Inconsistency between schema and microservice repositories. Commits to two different git repos are not atomic, so, at the time of schema updates, there is always a little time period when schema is updated, while microservice's repo is not yet.
In case if you use Go, there is a potential problem of moving to Go modules introduced in Go 1.11. We didn't make a comprehensive research on it yet.
Each of our microservices has it's own API (protobuf or several protobuf files). For each API we have separate repository. Also we have CI job which build protoclasses into jar (and not only for Java but for another language too) and publish it into our central repository. Than you just add dependencies to API you need.
For example, we have microservice A, we also have repository a-api (contains only protofiles) which build by job into jar (and to another languages) com.api.a-service.<version>

What is the best folder structure for a serverless project?

I'm starting to work on a new serverless project using AWS Lambda and API gateway.
What is the best way to organize my project, without being locked into one framework such as the serverless framework or chalice?
Here's what I'm using so far.
project-dir/
serverless.yaml (config file)
functions/
function1.py
function2.py
lib/
common_helper_functions.py
tests/
unit/
test1.py
test2.py
functional/
test1.py
test2.py
migrations
resources
cloudformation.templates.json
Do any of you recommend a better way to organize my project? Does each micro-service get a separate git repo? Am I missing other important folders?
Your structure looks good if a bit flat. I like putting code flows together. There are usually multiple functions to get to a result. Those should be grouped. Common functions that cross flows but don't cross projects go into a common folder in project. I base my repo organization on overall ideas. If lambdas cross projects they go in a common repo. Project specific stay in their repo.
Many times the hardest part of using a serverless architecture is finding the code being called. With a good logical grouping you will save yourself many headaches later.

Cloud based CI server that can handle concurrency blocking based on external resources

I've been researching cloud based CI systems for a while now and cannot seem to find any systems that can address a major need of mine.
I'm building CI processes for development on Salesforce, but this question is more generally about builds which rely on an external resource. In our builds, we deploy code into a cloud hosted Salesforce instance and then run the tests in that instance. During a build, the external resource is effectively locked and build failures will occur if two builds target the same external resource at the same time. This means that the normal concurrency model of cloud based CI systems would start tripping over the Salesforce instance (external resource) with a concurrency greater than 1.
To complicate things a bit more, we actually have 5 different external resources for each project (feature, master, packaging, beta, and release) and need to control the concurrency of any builds relying on an external resource to 1. For example, all our feature branches build against the feature external resource. We can identify these builds by the branch name which uses the pattern feature/* and need to ensure that only one feature build runs at a time. However, the feature build doesn't tie up the other 4 external resources so ideally any builds that would need those resources should still be able to run concurrently.
I currently accomplish this in Jenkins using the Throttle Concurrent Builds plugin and assign a throttle group to each build identifying the external resource it relies on. This has been successful at preventing concurrent builds from tripping over external resources.
A few clarifications:
I'm not asking how to reduce concurrency to 1 at the repo level. I know every cloud CI system can do that. I should be able to set repo concurrency to N external resources (in my case, 5).
Ideally, I'd like to be able to use a regex pattern on branch name as the "group" with which to block concurrence. So, a setting like: If branch name matches 'feature/.*' then limit concurrency to 1. I want to avoid having to manually configure new feature branches in the build system and instead match on pattern.
I have to say, it's been nearly impossible to find a restrictive Google search term that would help me answer this question. Hopefully someone out there has faced this problem before and can shed some light for me :)
With Jenkins Pipeline plugin you can set the stage concurrency to 1 - and only 1 thing will pass through that stage at a time. The stage was designed to be able to represent things like this.
https://www.cloudbees.com/blog/parallelism-and-distributed-builds-jenkins
stage "build"
node {
sh './test-the-awesome'
}
stage name: "environment test", concurrency: 1
node {
sh 'tests that lock the environment'
}
You can put the build pipeline in a Jenkinsfile in a repo too: https://documentation.cloudbees.com/docs/cookbook/pipeline-as-code.html (so any branches that build, also obey that lock).
As pointed out by #Jesse Glick in the comments below, perhaps a more general solution (not yet compatible with pipeline) is to use the Lockable Resources Plugin - which will then work across jobs, of any type.
I accomplish this with a Drone.io setup.
Essentially, I use a grunt plugin to access a Redis db hosted externally. It provides semaphore locking on any param you'd like.
Determine if the lock is free for that Env.
If so that Env's Key with a reasonable timeouts
Run the tests
Clear the lock
If the lock is held, get it's expiration time, and sleep until then.
I am not aware of any cloud based CI tools that can manage external resources the way you want to, unless you include the logic as part of the build script, which you've already said you'd prefer not to do. If you decide you want to do that you could do it with Snap CI or Drone or any of the other cloud tools I imagine.
In this sort of situation, I would usually recommend an agent-based system such as Go.cd

How to use R-Osgi to get remote "exported-package"?

R-Osgi provides us a way to call service from a remote OGSi container. WebSite: http://r-osgi.sourceforge.net.
I'm new to R-OSGi and now I want to split my OSGi container into small ones and interact each other by R-Osgi because it's too huge. But it seems R-OSGi only provides a way for Registered Service. We know that the most popular way to interaction between 2 bundles besides Service, "exported-package" is also widely used.
So, is there anyone familiar with R-OSGi and know how to use "exported-package" from remote OSGi container?
Thanks for any response.
If you think about it, attempting to handle the remote importing/exporting of packages is very complex, fragile and prone to error; you'd need to send all bundle lifecycle events over the wire, and honour them in the importing system (which would require caching).
Additionally the framework would need to know ahead of time to use these class definitions (you cannot instantiate a class that references classes that aren't available to your classloader). A remote bundle's classloader may depend on classes from another classloader, this chain could go round and round a network making classloading take ages.
Put another way; your local bundles will never resolve without the class definitions they depend on, and considering there could be thousands+ of potential remote exporters on a network/hardware with very poor SLA, this wouldn't scale well or be very robust considering the fallacies of distributed computing.
If we tried to do remote packages, the framework would need to import all exported packages from all available remote nodes and then select just one to import each bundle export from (this would be arbitrary and if the select node goes down, the whole import remote package process would have to triggered again).
What you need to do is separate you api/interfaces from your implementation, you then distribute the api bundle to all nodes that need it and then use dOSGi to import services.
Apologies if this unclear or waffly but it should explain why it's simply not practical to have remote exported packages.
On a side note; I'm not sure if r-osgi is being actively maintained or is up-to-date with the latest Remote Services Admin spec, from looking at the last commit to SVN trunk was 14/02/2011. There's some alternative dOSGi implementations listed here (but avoid CXF).
EDIT: In terms of deployment, distributing your bundles (and configuration) can be done from an OBR (there are number of public ones and several implementations Felix/Eclipse), or a maven repository can be reappropriated with pax url handler.

Resources