Why would I want fewer Kinesis shards when consuming using Lambda? - aws-lambda

I'm using AWS Lambda to consume from Kinesis. My Lambda function doesn't have any requirements on max concurrency. Is there any reason for me not to have the maximum possible number of shards for my stream? I can't see how the number of shards would affect cost.

The Amazon Kinesis pricing page shows price per shard hour. Data retention (beyond 24 hours) is also charged per shard hour.
So, yes, it will affect your cost.
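A rough sketch of why shard count drives cost: with provisioned capacity, the shard-hour charge grows linearly with the number of shards. The prices below are placeholders rather than real rates, 730 hours is an assumed average month, and other charges (such as PUT payload units) are ignored.

```python
# Rough monthly shard-hour cost for a Kinesis stream (prices are placeholders).
HOURS_PER_MONTH = 730  # assumed average month: 365 * 24 / 12


def monthly_shard_cost(shards: int,
                       price_per_shard_hour: float,
                       extended_retention_price_per_shard_hour: float = 0.0,
                       hours: float = HOURS_PER_MONTH) -> float:
    """Shard-hour charges only; PUT payload units and other fees are ignored."""
    hourly_rate = price_per_shard_hour + extended_retention_price_per_shard_hour
    return shards * hours * hourly_rate


if __name__ == "__main__":
    # Hypothetical rate of 0.015 per shard hour, just to show the linear growth.
    for shard_count in (1, 10, 100):
        print(shard_count, round(monthly_shard_cost(shard_count, 0.015), 2))
```

Ten times the shards means ten times the shard-hour bill, whether or not the extra shards carry any traffic.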

Related

What does overProvisioned Memory mean for Lambda in AWS Cloudwatch?

I am trying to learn more about monitoring and analysis of Lambda functions in my serverless environment, to understand how to identify 'suspect' Lambdas that need attention. I have been running through some sample queries in the Logs Insights section, and I have a few Lambdas that show this result.
I'm basically trying to understand if this is something that needs fixing quickly, or if it's not a big deal if there is so much overProvisioned memory?
Should I be more worried looking at Duration/Concurrency issues than this metric?
TLDR: overprovisioned memory and duration affect billing cost. Both can be tuned to cost-effective values where possible.
Allocated memory, together with the duration and the number of times the Lambda is executed per month, is used to compute the monthly billing cost. [1]
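A minimal sketch of that billing formula, assuming the usual model of a compute charge per GB-second plus a charge per million requests. Both prices are placeholders to be filled in from the Lambda pricing page; the free tier and any rounding of duration are ignored.

```python
def lambda_monthly_cost(invocations: int,
                        avg_duration_ms: float,
                        memory_mb: int,
                        price_per_gb_second: float,
                        price_per_million_requests: float) -> float:
    """Compute charge (GB-seconds) plus request charge; free tier ignored."""
    # GB-seconds = number of executions x duration in seconds x memory in GB
    gb_seconds = invocations * (avg_duration_ms / 1000) * (memory_mb / 1024)
    compute_charge = gb_seconds * price_per_gb_second
    request_charge = invocations / 1_000_000 * price_per_million_requests
    return compute_charge + request_charge


if __name__ == "__main__":
    # Hypothetical workload and prices, only to compare two memory settings.
    for memory in (1024, 512):
        print(memory, lambda_monthly_cost(3_000_000, 200, memory, 0.0000166667, 0.20))
```

Halving memory halves the compute charge only if the duration stays the same. Lambda allocates CPU in proportion to memory, so a CPU-bound function may run longer with less memory; measure before and after.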
Currently, the Lambda uses roughly 14% of its provisioned memory at maximum load, so the remaining ~86% is allocated (and billed) but unused.
If you're serving a large number of requests, reducing over-provisioned memory and duration can be cost-effective.
My recommendation is to provision memory equal to the maximum load plus 50% to 75% of the maximum load (i.e. 1.5x to 1.75x peak usage), and to review the duration.
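The sizing rule above, as a small helper. It assumes Lambda's configurable memory range is 128 MB to 10,240 MB (check the current quotas), and the 1024 MB allocation in the example is hypothetical, chosen only to match the "roughly 14%" figure.

```python
import math

# Assumed configurable range for Lambda memory; check the current Lambda quotas.
LAMBDA_MIN_MB = 128
LAMBDA_MAX_MB = 10_240


def recommended_memory_mb(max_used_mb: float, headroom: float = 0.5) -> int:
    """Peak memory plus 50-75% headroom, rounded up and clamped to the allowed range."""
    if not 0.5 <= headroom <= 0.75:
        raise ValueError("headroom should be between 0.5 and 0.75")
    target = math.ceil(max_used_mb * (1 + headroom))
    return max(LAMBDA_MIN_MB, min(LAMBDA_MAX_MB, target))


if __name__ == "__main__":
    allocated_mb = 1024             # hypothetical current allocation
    peak_mb = allocated_mb * 0.14   # roughly the 14% utilisation discussed above
    print(recommended_memory_mb(peak_mb), recommended_memory_mb(peak_mb, 0.75))
```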
Concurrency doesn't factor into the monthly billing cost.
Some numbers: [2]
Default concurrency limit for functions = 100
Default concurrency limit for the account = 1000
Reducing the duration means you can serve more requests with the same concurrency.
The concurrency limit per account can be increased on request to AWS Support.
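Why a shorter duration helps: in steady state, the number of executions in flight is roughly the arrival rate multiplied by the average duration (Little's law). This is an approximation for reasoning about limits, not AWS's exact accounting; the durations in the example are hypothetical.

```python
import math


def required_concurrency(requests_per_second: float, avg_duration_s: float) -> int:
    """Little's law: executions in flight ~= arrival rate x average duration."""
    return math.ceil(requests_per_second * avg_duration_s)


def sustainable_rate(concurrency_limit: int, avg_duration_s: float) -> float:
    """Requests per second a concurrency limit can sustain at a given duration."""
    return concurrency_limit / avg_duration_s


if __name__ == "__main__":
    # Same 1000-execution limit, two hypothetical average durations.
    print(sustainable_rate(1000, 2.0))   # 500.0
    print(sustainable_rate(1000, 0.5))   # 2000.0
```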
Another typical workaround for concurrency issues is to throttle requests using a queue. This may be more costly.
The Lambda receiving the request creates a new SNS topic, wraps the topic together with the request in an envelope, pushes the envelope to a message queue, and returns the topic to the caller.
Caller receives and subscribes to topic.
Another Lambda processes the queue and reports the job's status to the topic.
Caller receives message.
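The four steps above can be sketched in memory to show the message flow. `InMemoryTopics`, `front_lambda` and `worker_lambda` are hypothetical stand-ins: in the real design the queue would be SQS and the topics SNS, and nothing here calls AWS.

```python
from __future__ import annotations

import queue
import uuid
from collections import defaultdict
from typing import Callable


class InMemoryTopics:
    """Hypothetical stand-in for SNS: named topics with subscriber callbacks."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[str], None]]] = defaultdict(list)

    def create_topic(self) -> str:
        return f"job-{uuid.uuid4()}"

    def subscribe(self, topic: str, callback: Callable[[str], None]) -> None:
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, message: str) -> None:
        # Only current subscribers receive the message; nothing is stored.
        for callback in list(self._subscribers.get(topic, [])):
            callback(message)

    def delete_topic(self, topic: str) -> None:
        self._subscribers.pop(topic, None)


def front_lambda(request: str, topics: InMemoryTopics, jobs: queue.Queue) -> str:
    """Receives the request, enqueues it with a fresh topic, returns the topic."""
    topic = topics.create_topic()
    jobs.put({"topic": topic, "request": request})
    return topic


def worker_lambda(jobs: queue.Queue, topics: InMemoryTopics) -> None:
    """Drains the queue and reports each job's status to its topic."""
    while not jobs.empty():
        envelope = jobs.get()
        result = envelope["request"].upper()  # placeholder for the real work
        topics.publish(envelope["topic"], f"done: {result}")
        jobs.task_done()


if __name__ == "__main__":
    topics, jobs = InMemoryTopics(), queue.Queue()
    topic = front_lambda("resize image 42", topics, jobs)
    topics.subscribe(topic, print)   # the caller subscribes to its topic
    worker_lambda(jobs, topics)      # prints "done: RESIZE IMAGE 42"
    topics.delete_topic(topic)       # clean up topics that are no longer needed
```

The caller has to subscribe before the worker publishes: SNS does not keep messages for late subscribers, so a status published too early is lost.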
The account limit on the number of topics is 100,000 [3].
This limit can be increased on request to AWS Support, although cleaning up topics that no longer need to be kept around may be more suitable.
Having to design around these workarounds for concurrency limits could mean that the application's requirements are better suited to a traditional web application backed by a long-running server.

Elasticsearch drops too many requests -- would a buffer improve things?

We have a cluster of workers that send indexing requests to a 4-node Elasticsearch cluster. The documents are indexed as they are generated, and since the workers have a high degree of concurrency, Elasticsearch is having trouble handling all the requests. To give some numbers, the workers process up to 3,200 tasks at the same time, and each task usually generates about 13 indexing requests. This generates an instantaneous rate that is between 60 and 250 indexing requests per second.
From the start, Elasticsearch had problems and requests were timing out or returning 429. To get around this, we increased the timeout on our workers to 200 seconds and increased the write thread pool queue size on our nodes to 700.
That's not a satisfactory long-term solution though, and I was looking for alternatives. I have noticed that when I copied an index within the same cluster with elasticdump, the write thread pool was almost empty and I attributed that to the fact that elasticdump batches indexing requests and (probably) uses the bulk API to communicate with Elasticsearch.
That gave me the idea that I could write a buffer that receives requests from the workers, batches them in groups of 200-300 requests, and then sends each group to Elasticsearch as a single bulk request.
Does such a thing already exist, and does it sound like a good idea?
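A minimal sketch of such a buffer, assuming the standard Bulk API body format: one action/metadata line and one source line per document, newline-delimited, with a trailing newline. The `send` callable is hypothetical (for example, something that POSTs the body to `/_bulk`). The official Python client already offers `elasticsearch.helpers.bulk` and `helpers.streaming_bulk`, which may cover this without custom code.

```python
import json
from typing import Any, Callable, Dict, List, Optional


class BulkBuffer:
    """Collects index operations and sends them as one newline-delimited _bulk body."""

    def __init__(self, send: Callable[[str], None], max_actions: int = 250) -> None:
        self._send = send            # hypothetical: e.g. POSTs the body to /_bulk
        self._max_actions = max_actions
        self._lines: List[str] = []
        self._count = 0

    def add(self, index: str, doc: Dict[str, Any], doc_id: Optional[str] = None) -> None:
        meta: Dict[str, Any] = {"_index": index}
        if doc_id is not None:
            meta["_id"] = doc_id
        self._lines.append(json.dumps({"index": meta}))  # action/metadata line
        self._lines.append(json.dumps(doc))              # document source line
        self._count += 1
        if self._count >= self._max_actions:
            self.flush()

    def flush(self) -> None:
        if not self._lines:
            return
        body = "\n".join(self._lines) + "\n"  # the Bulk API expects a final newline
        self._lines, self._count = [], 0
        self._send(body)


if __name__ == "__main__":
    buffer = BulkBuffer(send=lambda body: print(body, end=""), max_actions=2)
    buffer.add("tasks", {"task": 1, "status": "done"}, doc_id="1")
    buffer.add("tasks", {"task": 2, "status": "done"}, doc_id="2")  # triggers a flush
    buffer.flush()  # nothing left to send
```

At 250 indexing requests per second with batches of 250, this turns 250 small requests into roughly one bulk request per second. A production version would also flush on a timer, guard against concurrent access (this class is not thread-safe), and inspect the per-item results in the bulk response so rejected items can be retried.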
First of all, to troubleshoot the issue or find the root cause, it's important to understand what happens behind the scenes when you send an index request to Elasticsearch.
Elasticsearch has several thread pools, but indexing requests (single and bulk) use the write thread pool. Please check this for your Elasticsearch version, as Elastic keeps changing the thread pools (earlier there was a separate thread pool for single and bulk requests, with different queue capacities).
In the latest ES version (7.10), the write thread pool's queue capacity has been increased significantly, from 200 in earlier releases to 10,000. There may be the following reasons for this:
Elasticsearch now prefers to buffer more indexing requests instead of rejecting them.
Increasing queue capacity means more latency, but it's a trade-off: it reduces data loss if the client doesn't have a retry mechanism.
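A client-side retry mechanism can be as simple as backing off when Elasticsearch rejects a request with HTTP 429. A minimal sketch, assuming `send` is a hypothetical callable that performs the request and returns the status code; it uses exponential backoff with full jitter.

```python
import random
import time
from typing import Callable


def send_with_retry(send: Callable[[], int],
                    max_attempts: int = 5,
                    base_delay_s: float = 0.5,
                    max_delay_s: float = 30.0,
                    sleep: Callable[[float], None] = time.sleep) -> int:
    """Call send() until it returns something other than HTTP 429 or attempts run out.

    Waits between attempts using exponential backoff with full jitter.
    Returns the last status code seen.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    status = send()
    for attempt in range(1, max_attempts):
        if status != 429:
            break
        delay = min(max_delay_s, base_delay_s * 2 ** (attempt - 1))
        sleep(random.uniform(0, delay))
        status = send()
    return status


if __name__ == "__main__":
    responses = iter([429, 429, 200])  # simulated: rejected twice, then accepted
    print(send_with_retry(lambda: next(responses), sleep=lambda s: None))  # 200
```

The jitter spreads retries from many concurrent workers over time, so they do not all hit the write queue again at the same moment.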
I assume you have not yet moved to ES 7.9, where the capacity was increased, but you can increase the size of this queue gradually and allocate more processors (if you have spare capacity) through the config change mentioned in this official example. This is a very debatable topic, and a lot of people consider it a band-aid rather than a proper fix, but now that Elastic itself has increased the queue size, you can try it too; if you only see short bursts of increased traffic, it makes even more sense.
Another critical thing is to find the root cause of why your ES nodes are queuing up more requests. It can be legitimate, for example when indexing traffic increases and the infrastructure reaches its limit. If it's not legitimate, have a look at my short tips to improve one-time indexing performance and overall indexing performance; implementing them should give you a better indexing rate, which will reduce the pressure on the write thread pool queue.
Edit: As mentioned by @Val in the comments, if you are also indexing docs one by one, moving to the bulk index API will give you the biggest boost.

Kinesis stream / shard - multiple consumers

I have already read some questions about kinesis shard and multiple consumers but I still don't understand how it works.
My use case: I have a Kinesis stream with just one shard. I would like to consume this shard using different Lambda functions, each of them independently. In effect, each Lambda function would have its own shard iterator.
Is it possible to set up multiple Lambda consumers (stream-based) reading from the same stream/shard?
Hey Mr Magalhaes, I believe the following picture should answer some of your questions.
So, to clarify: you can set multiple Lambdas as consumers on a Kinesis stream, but the Lambdas will block each other on processing. If your stream has only one shard, it will only have one concurrent Lambda.
If you have one kinesis stream, you can connect as many lambda functions as you want through an event source mapping.
All functions will run simultaneously and fully independent of each other and will constantly be invoked if new records arrive in the stream.
The number of shards does not matter.
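A simplified in-memory model of why consumers are independent: reading a shard does not remove records (they stay until the retention period expires), and each consumer tracks its own position, much like a separate shard iterator. `Shard` and `Consumer` are hypothetical classes, not the Kinesis or Lambda API, and real consumers still share the shard's read limits.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Shard:
    """Simplified shard: reading does not remove records (they expire with retention)."""
    records: List[str] = field(default_factory=list)

    def read(self, position: int, limit: int) -> List[str]:
        return self.records[position:position + limit]


@dataclass
class Consumer:
    """Each consumer keeps its own position, like a separate shard iterator."""
    name: str
    position: int = 0
    seen: List[str] = field(default_factory=list)

    def poll(self, shard: Shard, limit: int = 100) -> None:
        batch = shard.read(self.position, limit)
        self.seen.extend(batch)
        self.position += len(batch)


if __name__ == "__main__":
    shard = Shard(["r1", "r2", "r3"])
    fast, slow = Consumer("lambda-a"), Consumer("lambda-b")
    fast.poll(shard)
    slow.poll(shard, limit=1)
    print(fast.seen, slow.seen)  # ['r1', 'r2', 'r3'] ['r1']
```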
For a single lambda function:
"For Lambda functions that process Kinesis or DynamoDB streams the number of shards is the unit of concurrency. If your stream has 100 active shards, there will be at most 100 Lambda function invocations running concurrently. This is because Lambda processes each shard’s events in sequence." [https://docs.aws.amazon.com/lambda/latest/dg/scaling.html]
But there is no limit on how many different Lambda consumers you can attach to Kinesis.
Yes, no problem with this !
The number of shards doesn't limit the number of consumers a stream can have.
In your case, it will just limit the number of concurrent invocations of each Lambda: each consumer can have at most as many concurrent executions as the stream has shards.
See this doc for more details.
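The same rule as arithmetic. The quoted documentation corresponds to the default parallelization factor of 1; Lambda's Kinesis event source mappings also accept a `ParallelizationFactor` setting (1 to 10), included here as an optional parameter. Treat this as an upper bound, not a guarantee of how many invocations will actually run.

```python
def max_concurrent_invocations(active_shards: int,
                               consumer_functions: int = 1,
                               parallelization_factor: int = 1) -> int:
    """Upper bound on concurrent Lambda invocations driven by one Kinesis stream.

    With the default parallelization factor of 1, each function processes one
    batch per shard at a time, so the shard count is the per-function limit.
    Separate functions have separate event source mappings, so their limits add up.
    """
    if not 1 <= parallelization_factor <= 10:
        raise ValueError("ParallelizationFactor must be between 1 and 10")
    per_function = active_shards * parallelization_factor
    return per_function * consumer_functions


if __name__ == "__main__":
    # One shard, three independent consumer functions.
    print(max_concurrent_invocations(1, consumer_functions=3))  # 3
```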
Short answer:
Yes it will work, and will work concurrently.
Long answer:
Each shard in a Kinesis stream has 2 MiB/sec of read throughput:
https://docs.aws.amazon.com/streams/latest/dev/building-consumers.html
If you have multiple applications (in your case, Lambdas), they will share the throughput.
A description taken from the link above:
Fixed at a total of 2 MiB/sec per shard. If there are multiple consumers reading from the same shard, they all share this throughput. The sum of the throughput they receive from the shard doesn't exceed 2 MiB/sec.
If you write less than 1 MiB/sec of data, you should be able to support two "applications" with a single shard.
In general, if you have Y shards and X applications, it should work properly assuming your total write throughput is less than 2 MiB/sec * Y / X and that data is spread evenly between shards.
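The 2 MiB/sec * Y / X rule as a helper. It also applies the per-shard write limit of 1 MiB/sec, which caps the stream regardless of how many readers there are, and it ignores the per-shard limit of 5 read transactions per second. Even distribution across shards is assumed, as in the answer.

```python
SHARD_WRITE_MIB_PER_S = 1.0  # per-shard write limit
SHARD_READ_MIB_PER_S = 2.0   # per-shard read limit, shared by standard consumers


def max_total_write_rate(shards: int, applications: int,
                         enhanced_fan_out: bool = False) -> float:
    """Highest total write rate (MiB/s) at which every application can keep up.

    Assumes data is spread evenly across shards and ignores the per-shard
    limit of 5 read transactions per second.
    """
    write_cap = SHARD_WRITE_MIB_PER_S * shards
    if enhanced_fan_out:
        # Each registered consumer gets its own 2 MiB/s per shard.
        return write_cap
    read_cap = SHARD_READ_MIB_PER_S * shards / applications
    return min(write_cap, read_cap)


if __name__ == "__main__":
    for apps in (1, 2, 4):
        print(apps, max_total_write_rate(shards=1, applications=apps))
```

With one shard, one or two applications can keep up with the full 1 MiB/sec write rate, while four applications sharing the read throughput can only keep up with 0.5 MiB/sec.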
If you require each "application" to get 2 MiB/sec, you can enable "Consumers with Enhanced Fan-Out", which fans out the stream, giving each application a dedicated 2 MiB/sec per shard (instead of sharing the throughput).
This is described in the following link:
https://docs.aws.amazon.com/streams/latest/dev/introduction-to-enhanced-consumers.html
In Amazon Kinesis Data Streams, you can build consumers that use a feature called enhanced fan-out. This feature enables consumers to receive records from a stream with throughput of up to 2 MiB of data per second per shard. This throughput is dedicated, which means that consumers that use enhanced fan-out don't have to contend with other consumers that are receiving data from the stream.

multiple consumers per kinesis shard

I read you can have multiple consumer apps per kinesis stream.
http://docs.aws.amazon.com/kinesis/latest/dev/developing-consumers-with-kcl.html
However, I heard you can only have one consumer per shard. Is this true? I can't find any documentation to support this, and I can't imagine how that could be if multiple consumers are reading from the same stream. Surely it doesn't mean the producer needs to repeat content in different shards for different consumers.
The Kinesis Client Library starts threads in the background, each listening to one shard in the stream. You cannot connect to a shard over multiple threads; that is by design.
http://docs.aws.amazon.com/kinesis/latest/dev/kinesis-record-processor-scaling.html
For example, if your application is running on one EC2 instance, and is processing one Amazon Kinesis stream that has four shards. This one instance has one KCL worker and four record processors (one record processor for every shard). These four record processors run in parallel within the same process.
In the explanation above, the term "KCL worker" refers to a Kinesis consumer application, not to the threads.
But below, the same term "KCL worker" refers to a "Worker" thread in the application, which is a Runnable.
Typically, when you use the KCL, you should ensure that the number of instances does not exceed the number of shards (except for failure standby purposes). Each shard is processed by exactly one KCL worker and has exactly one corresponding record processor, so you never need multiple instances to process one shard.
See the Worker.java class in KCL source.
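A simplified sketch of the "exactly one owner per shard" rule within one KCL application. The real KCL coordinates this through leases stored in a DynamoDB table named after the application, which workers acquire and rebalance; the round-robin `assign_shards` helper below is a hypothetical stand-in for that mechanism. A second KCL application has its own lease table, so it gets its own owner for each shard, which is why several applications can read the same shard.

```python
from typing import Dict, List


def assign_shards(shard_ids: List[str], workers: List[str]) -> Dict[str, List[str]]:
    """Give every shard exactly one owning worker, spreading shards round-robin."""
    if not workers:
        raise ValueError("at least one worker is required")
    assignment: Dict[str, List[str]] = {worker: [] for worker in workers}
    for i, shard_id in enumerate(sorted(shard_ids)):
        assignment[workers[i % len(workers)]].append(shard_id)
    return assignment


if __name__ == "__main__":
    shards = ["shard-0", "shard-1", "shard-2", "shard-3"]
    print(assign_shards(shards, ["worker-a"]))            # one worker owns all four
    print(assign_shards(shards[:2], ["w1", "w2", "w3"]))  # w3 sits idle
```

The second call shows why running more instances than shards only makes sense as failure standby: the extra worker has nothing to process.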
Late to the party, but the answer is that you can have multiple consumers per kinesis shard. A KCL instance will only start one process per shard, but you can have another KCL instance consuming the same stream (and shard), assuming the second one has permission.
There are limits, though, as laid out in the docs, including:
Each shard can support up to 5 transactions per second for reads, up to a maximum total data read rate of 2 MB per second.
If you want a stream with multiple consumers where each message will be processed once, you're probably better off with something like Amazon Simple Queue Service.
To keep it simple, you can have multiple different Lambda functions triggered on Kinesis data. This way both Lambdas will get all the data from Kinesis. The downside is that you will now have to increase the throughput at the Kinesis level, which is going to be pricey. Use SQS instead for your use case.

monitor incoming http requests to a website with a loadbalancer

I am stuck with the problem of monitoring HTTP requests to a website behind an internet-facing load balancer. To be specific, I have hosted a website that uses a server farm of AWS EC2 instances with a load balancer (ELB) in front. Now I want to get an idea of the request arrival rate per second (or per minute) to scale the server farm.
I have thought of an approach to perform this task online. The idea is to fetch the ELB log each minute and parse it for the HTTP request count for the last minute. I'm just wondering whether there is a more efficient way to do it online.
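The log-parsing idea from the question, sketched for Classic Load Balancer access logs, whose entries start with an ISO 8601 timestamp. Application Load Balancer entries begin with a type field instead, so they would need a different offset. The sample lines are illustrative, not real traffic.

```python
from collections import Counter
from typing import Iterable


def requests_per_minute(log_lines: Iterable[str]) -> Counter:
    """Count requests per minute in Classic Load Balancer access log lines.

    Assumes each entry starts with an ISO 8601 timestamp such as
    2015-05-13T23:39:43.945958Z.
    """
    counts: Counter = Counter()
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        timestamp = line.split(" ", 1)[0]
        counts[timestamp[:16]] += 1  # keep 'YYYY-MM-DDTHH:MM'
    return counts


if __name__ == "__main__":
    sample = [
        '2015-05-13T23:39:43.945958Z my-elb 192.168.131.39:2817 10.0.0.1:80 '
        '0.000073 0.001048 0.000057 200 200 0 29 "GET http://www.example.com:80/ HTTP/1.1" '
        '"curl/7.38.0" - -',
        '2015-05-13T23:39:59.100000Z my-elb 192.168.131.39:2818 10.0.0.1:80 ...',
        '2015-05-13T23:40:01.200000Z my-elb 192.168.131.39:2819 10.0.0.1:80 ...',
    ]
    for minute, count in sorted(requests_per_minute(sample).items()):
        print(minute, count, f"{count / 60:.3f} req/s")
```

One caveat for the "online" goal: Classic Load Balancers publish access logs to S3 at 5- or 60-minute intervals, so this approach suits after-the-fact analysis better than per-minute scaling decisions.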
Any help would be highly appreciated.
Your best bet is to use AWS CloudWatch to do the monitoring for you:
http://docs.aws.amazon.com/ElasticLoadBalancing/latest/DeveloperGuide/US_MonitoringLoadBalancerWithCW.html
Elastic Load Balancing publishes data points to Amazon CloudWatch about your load balancers and your back-end application instances. CloudWatch allows you to retrieve statistics about those data points as an ordered set of time-series data, known as metrics. Think of a metric as a variable to monitor, and the data points represent the values of that variable over time. Each data point has an associated time stamp and (optionally) a unit of measurement. For example, total number of healthy EC2 instances behind a load balancer over a specified time period can be a metric.
Amazon CloudWatch provides statistics based on the metric data points published by Elastic Load Balancing. Statistics are metric data aggregations over specified periods of time. The following statistics are available: Minimum (min), Maximum (max), Sum, Average, and Count. When you request statistics, the returned data stream is identified by the metric name and a dimension. A dimension is a name/value pair that helps you to uniquely identify a metric. For example, you can request statistics of all the healthy EC2 instances behind a load balancer launched in a specific Availability Zone.
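For the request arrival rate specifically, the relevant Classic Load Balancer metric is `RequestCount` in the `AWS/ELB` namespace, where `Sum` over each period is the meaningful statistic. A minimal sketch that builds the keyword arguments for CloudWatch's GetMetricStatistics (as accepted by boto3's `get_metric_statistics`) and converts the returned datapoints into requests per second; the load balancer name is hypothetical, and Application Load Balancers use the `AWS/ApplicationELB` namespace with a `LoadBalancer` dimension instead.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List, Tuple


def request_count_query(load_balancer_name: str, minutes: int = 10,
                        period_s: int = 60) -> Dict[str, Any]:
    """Keyword arguments for CloudWatch GetMetricStatistics, e.g.
    boto3.client("cloudwatch").get_metric_statistics(**query)."""
    end = datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/ELB",  # Classic Load Balancer metrics
        "MetricName": "RequestCount",
        "Dimensions": [{"Name": "LoadBalancerName", "Value": load_balancer_name}],
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": period_s,
        "Statistics": ["Sum"],  # total requests per period
    }


def requests_per_second(datapoints: List[Dict[str, Any]],
                        period_s: int = 60) -> List[Tuple[Any, float]]:
    """Convert per-period Sum datapoints into (timestamp, requests/second), oldest first."""
    ordered = sorted(datapoints, key=lambda point: point["Timestamp"])
    return [(point["Timestamp"], point["Sum"] / period_s) for point in ordered]


if __name__ == "__main__":
    query = request_count_query("my-elb")  # hypothetical load balancer name
    print(query["MetricName"], query["Period"])
    # With boto3 installed and credentials configured, one could then call:
    # datapoints = boto3.client("cloudwatch").get_metric_statistics(**query)["Datapoints"]
    # print(requests_per_second(datapoints))
```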
