Why does AWS Lambda suggest setting up two subnets when a VPC is configured? - aws-lambda

Is this because of IP availability?
I've always thought that creating a single large subnet instead of two of the same total size is exactly the same. I haven't experienced any performance issues by doing this, but I haven't found anything in the docs to confirm that it is a valid approach.
Why does AWS Lambda suggest configuring these two subnets? Is there a technical reason for it?
Thanks in advance.

It is not only about performance; it is for high availability (fault tolerance), according to the documentation:
It's a best practice to create multiple private subnets across different Availability Zones for redundancy and so that Lambda can ensure high availability for your function.
Resilience documentation
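To illustrate what the recommendation looks like in practice, here is a minimal sketch (all IDs are placeholders) of a VPC configuration spanning two Availability Zones; the commented boto3 call shows how it would be applied:

```python
# Sketch: a Lambda VPC config with one private subnet per Availability Zone,
# so the function stays reachable if one AZ fails. IDs are placeholders.
vpc_config = {
    "SubnetIds": [
        "subnet-0aaa111",  # private subnet in AZ a
        "subnet-0bbb222",  # private subnet in AZ b
    ],
    "SecurityGroupIds": ["sg-0123456789abcdef0"],
}

# With boto3 and valid credentials, this could be applied roughly as:
# import boto3
# boto3.client("lambda").update_function_configuration(
#     FunctionName="my-function", VpcConfig=vpc_config
# )
```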

Is there a feature for setting a min/max/fixed number of function/action replicas in OpenWhisk?

I have an OpenWhisk setup on Kubernetes using [1]. For study purposes, I want a fixed number of replicas/pods for each action that I deploy, essentially disabling the auto-scaling feature.
Similar facility exists for OpenFaas [2], where during deployment of a function, we can configure the system to have N function replicas at all times. These N function replicas (or pods) for the given function will always be present.
I assume this can be configured somewhere while deploying an action, but being a beginner in OpenWhisk, I could not find a way to do this. Is there a specific configuration that I need to change?
What can I do to achieve this in OpenWhisk? Thanks :)
https://github.com/apache/openwhisk-deploy-kube
https://docs.openfaas.com/architecture/autoscaling/#minmax-replicas
OpenWhisk serverless functions follow a model closer to AWS Lambda's: you don't set the number of replicas. OpenWhisk uses various heuristics and can specialize a container in milliseconds, so elasticity on demand is more practical than in Kubernetes-based solutions. There is no mechanism in the system today to set minimums or maximums. A function scales in proportion to the resources available in the system, and when that capacity is maxed out, requests will queue.
Note that while AWS allows one to set the max concurrency, this isn’t the same as what you’re asking for, which is a fixed number of pre-provisioned resources.
Update to answer your two questions specifically:
Is there a specific configuration that I need to change?
There isn’t. This feature isn’t available at user level or deployment time.
What can I do to achieve this in Openwhisk?
You can modify the implementation in several ways to achieve what you’re after. For example, one model is to extend the stem-cell pool for specific users or functions. If you were interested in doing something like this, the project Apache dev list is a great place to discuss this idea.

Block assignment using network topology

If I have understood the principles of network topology correctly, block replicas are written:
On the client server if hosting a datanode
On a second server defined on a different rack
On a third server defined on the same rack as #2
Is this policy configurable, or is it hard-coded in a class? Of course, I do not want to modify any class myself…
Basically, I would like to:
Take the datacenter into account (from what I have read, HDFS does not consider datacenters even when using network topology)
Force the write in 3 distinct racks
How do I do that?
There is a capability to override the baseline block allocation algorithm, but it involves writing quite a bit of Java code, and there aren't many good examples out there. Here is a blog post with a link to the JIRA ticket explaining the enhancement:
http://hadoopblog.blogspot.com/2009/09/hdfs-block-replica-placement-in-your.html
https://issues.apache.org/jira/browse/HDFS-385
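Since the JIRA above (HDFS-385) made the placement policy pluggable, a custom policy class can be wired in through `hdfs-site.xml`. A sketch, assuming a hypothetical class `com.example.RackSpreadPolicy` that extends Hadoop's `BlockPlacementPolicy`:

```xml
<!-- hdfs-site.xml: plug in a custom block placement policy.
     com.example.RackSpreadPolicy is a hypothetical class that would extend
     org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy. -->
<property>
  <name>dfs.block.replicator.classname</name>
  <value>com.example.RackSpreadPolicy</value>
</property>
```

The custom class is the part that requires the Java work described above; the configuration itself is just this one property.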

S3 Ruby Client - when to specify regional endpoint

I have buckets in two AWS regions. I'm able to perform puts and gets against both buckets without specifying the regional endpoint (the Ruby client defaults to us-east-1).
I haven't found much relevant info on how requests on a bucket reach the proper regional endpoint when the region is not specified. From what I've found (https://github.com/aws/aws-cli/issues/223#issuecomment-22872906), it appears that requests are routed to the bucket's proper region via DNS.
Does specifying the region have any advantages when performing puts and gets against existing buckets? I'm trying to decide whether I need to specify the appropriate region for operations against a bucket or if I can just rely on it working.
Note that the buckets are long lived so the DNS propagation delays mentioned in the linked github issue are not an issue.
SDK docs for region:
http://docs.aws.amazon.com/AWSRubySDK/latest/AWS/Core/Configuration.html#region-instance_method
I do not think there is any performance benefit to putting/getting data if you specify the region. All bucket names are unique across all regions, and I don't think that lookup adds much overhead compared to data throughput.
I welcome comments to the contrary.
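For completeness, here is what being explicit looks like. This is a sketch with placeholder bucket/region names: the helper only illustrates the virtual-hosted-style regional endpoint format that AWS documents, and the commented client call is shown in Python (boto3) for illustration; the Ruby SDK linked above exposes the same choice through its `region` configuration option.

```python
# Sketch: being explicit about a bucket's region instead of relying on
# DNS-based routing. Bucket and region names are placeholders.

def regional_endpoint(bucket: str, region: str) -> str:
    """Build the virtual-hosted-style regional endpoint for a bucket."""
    return f"https://{bucket}.s3.{region}.amazonaws.com"

# With boto3 (the Ruby SDK takes an equivalent `region` option):
# import boto3
# s3 = boto3.client("s3", region_name="eu-west-1")
# s3.get_object(Bucket="my-eu-bucket", Key="data.bin")
```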

How can I assign an Elastic IP to one of the balanced instances?

If you have one instance and Auto Scaling needs to create one more, you end up with two instances. But when Auto Scaling removes one because it is no longer needed, either the new or the old one may be terminated.
So the instance that had the Elastic IP may now be gone...
How can I ensure an Elastic IP is always attached to one of the instances managed by an Auto Scaling group?
Thank you
You could run a small script that checks whether the Elastic IP is available and attaches it to one of your instances. For example, each instance can attach the Elastic IP to itself at launch if the IP is unassociated.
Alternatively, you could create two Auto Scaling groups as described here.
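The self-attach idea can be sketched as follows. This is illustrative only: the helper assumes the documented shape of the EC2 DescribeAddresses response, and the boto3/instance-metadata calls at the bottom are commented out because they need AWS credentials and only work from inside an instance.

```python
# Sketch of the "attach the Elastic IP to yourself at boot" pattern.

def pick_free_allocation(addresses):
    """Return the AllocationId of the first address not attached to an instance.

    `addresses` follows the shape of EC2 DescribeAddresses: a list of dicts
    where an associated address carries an "InstanceId" key.
    """
    for addr in addresses:
        if "InstanceId" not in addr:  # unassociated Elastic IP
            return addr["AllocationId"]
    return None

# At instance launch (e.g. from a user-data script) this could be applied
# roughly as:
# import boto3, urllib.request
# instance_id = urllib.request.urlopen(
#     "http://169.254.169.254/latest/meta-data/instance-id").read().decode()
# ec2 = boto3.client("ec2")
# alloc = pick_free_allocation(ec2.describe_addresses()["Addresses"])
# if alloc:
#     ec2.associate_address(InstanceId=instance_id, AllocationId=alloc)
```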

How fast is the network between EC2 nodes?

I am looking to set up Amazon EC2 nodes running Rails with Riak. I want to sync the Riak DBs so that if the cluster gets a query, it can tell where the data lies and retrieve it quickly. In your opinion(s), is EC2 fast enough between nodes to query a Riak DB, return the results, and get them back to the client in a timely manner? I am new to all of this, so please be kind :)
I'm not a Riak expert, but if you keep all of your EC2 instances in the same availability zone you should get more than adequate performance. AWS has a gigabit internal network and people have been able to get the full gigabit out of it; see this blog post for an example.
Marc-Andre,
Your best bet is to ask on the mailing list: http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
There's also lots of information on the wiki: http://wiki.basho.com/display/RIAK/Home