I'm trying to figure out how throttling policies affect EWS.
For EWS, we have these values:
EWSMaxSubscription: the number of active subscriptions created by the impersonated user.
EWSMaxConcurrency: how many concurrent connections or actions a single client may take.
EwsMaxBurst: how far above the standard resource limit a client may go in short bursts (in milliseconds). It probably comes into effect when the percentage of CPU/memory usage by Exchange exceeds the defined threshold (depending on the setup, I suppose).
EwsRechargeRate: the speed at which the user’s resource budget recharges or refills (in milliseconds).
I understand each of the above throttling parameters. However, I'm not sure I clearly understand EwsCutoffBalance. This parameter defines the resource consumption limit an EWS user may reach before that user is completely blocked from performing operations on a specific component...
My questions...
How is this value used relative to the EwsMaxBurst and
EwsRechargeRate values?
What is the unit of this parameter?
How can I determine the right value if I need to change the throttling
policy of a specific user account (instead of using "Unlimited")?
Both EwsMaxBurst and EwsRechargeRate are expressed in milliseconds.
By default, a client will be blocked after 5 minutes (300000 ms / 1000 ms / 60 s) of heavy usage. With the default recharge rate of 900000 ms, the system will recharge in 15 minutes. If you want client requests to keep being processed, you can either increase EwsMaxBurst above the recharge rate or decrease the recharge rate below EwsMaxBurst.
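To make the interaction concrete, here is a minimal model of how the two defaults play out over time. This is an illustrative sketch, not the actual Exchange implementation; it assumes the budget drains 1:1 under heavy use and refills linearly over the recharge interval.

```python
# Illustrative model (not the actual Exchange code) of how EwsMaxBurst
# and EwsRechargeRate interact, using the defaults quoted above.

BURST_MS = 300_000      # EwsMaxBurst: 5 minutes of heavy usage allowed
RECHARGE_MS = 900_000   # EwsRechargeRate: a full recharge takes 15 minutes

def budget_after(heavy_use_ms: int, idle_ms: int) -> int:
    """Remaining budget after heavy usage followed by idle time."""
    remaining = max(0, BURST_MS - heavy_use_ms)     # burn 1 ms of budget per ms of heavy use
    refill = idle_ms * BURST_MS // RECHARGE_MS      # refill linearly over 15 minutes
    return min(BURST_MS, remaining + refill)

# After 5 minutes of heavy usage the budget is exhausted (client blocked):
print(budget_after(300_000, 0))        # -> 0
# After a further 15 idle minutes the budget is fully recharged:
print(budget_after(300_000, 900_000))  # -> 300000
```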
I am trying to learn more about monitoring and analysis of Lambda functions in my serverless environment, to understand how to pinpoint 'suspect' Lambdas that need attention. I have been running through some sample queries in the Logs Insights section, and I have a few Lambdas that produce this result.
I'm basically trying to understand whether this is something that needs fixing quickly, or whether it's not a big deal if there is so much overprovisioned memory.
Should I be more worried about Duration/Concurrency issues than about this metric?
TL;DR: overprovisioned memory and duration affect billing cost. Both parameters can be tuned, where possible, to cost-effective values.
Allocated memory, together with duration and the number of times the Lambda is executed per month, is used to compute the monthly billing cost. [1]
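A rough back-of-the-envelope version of that calculation, assuming illustrative per-GB-second and per-request rates (check current AWS pricing for your region; the numbers below are not authoritative):

```python
# Rough monthly Lambda cost estimate. Both rates are illustrative,
# not the actual current AWS pricing.

GB_SECOND_RATE = 0.0000166667    # USD per GB-second (illustrative)
REQUEST_RATE = 0.20 / 1_000_000  # USD per request (illustrative)

def monthly_cost(invocations: int, avg_duration_s: float, memory_mb: int) -> float:
    gb_seconds = invocations * avg_duration_s * (memory_mb / 1024)
    return gb_seconds * GB_SECOND_RATE + invocations * REQUEST_RATE

# 5M invocations/month, 200 ms average duration, 1024 MB allocated:
print(round(monthly_cost(5_000_000, 0.2, 1024), 2))  # -> 17.67
```

Halving the allocated memory roughly halves the GB-second component, which is why trimming overprovisioned memory shows up directly on the bill.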
Currently, the Lambda uses roughly 14% of provisioned memory at maximum load; the remaining fraction goes unused.
If you're serving a huge number of requests, reducing over-provisioned memory and duration can be cost-effective.
My recommendation is to provision memory at max load plus 50-75% of max load (i.e. 1.5x-1.75x the observed maximum), and to review the duration.
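Applying that sizing rule to the numbers in the question (the current 1024 MB allocation is an assumption for illustration; only the ~14% utilisation figure comes from the question):

```python
# Sizing rule: provision max load + 50-75% headroom.
# The 1024 MB current allocation is a hypothetical example value.

provisioned_mb = 1024                # hypothetical current allocation
max_load_mb = provisioned_mb * 0.14  # ~143 MB observed at peak (14% of provisioned)

low = max_load_mb * 1.5              # max load + 50% headroom
high = max_load_mb * 1.75            # max load + 75% headroom
print(f"recommended: {low:.0f}-{high:.0f} MB")  # -> recommended: 215-251 MB
```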
Concurrency doesn't factor into the monthly billing cost.
Some numbers: [2]
Default concurrency limit for functions = 100
Hard set concurrency limit for account = 1000
Reducing the duration means you can serve more requests at a time.
The concurrency limit per account can be increased by request to AWS Support.
Another typical workaround for concurrency issues is to throttle requests using a queue. This may be more costly.
The Lambda receiving the request creates a new SNS topic, envelopes it together with the request, pushes it to a message queue, and returns the topic to the caller.
Caller receives and subscribes to topic.
Another Lambda processes the queue and reports the status of the job to the topic.
Caller receives message.
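The steps above can be sketched as a local, in-process simulation, using Python's stdlib queue in place of SNS/SQS. The names are illustrative; a real deployment would use boto3 against actual SNS topics and queues.

```python
# In-process sketch of the queue workaround: a front handler enqueues
# the job and hands back a "topic" id, and a worker drains the queue
# and publishes results to each topic. Purely illustrative.
import queue
import uuid

work_queue: queue.Queue = queue.Queue()
topics: dict[str, list[str]] = {}   # topic id -> messages delivered to it

def receive_request(payload: str) -> str:
    """Front Lambda: create a 'topic', enqueue the job, return the topic id."""
    topic = str(uuid.uuid4())
    topics[topic] = []
    work_queue.put((topic, payload))
    return topic

def process_queue() -> None:
    """Worker Lambda: drain the queue and publish a result to each topic."""
    while not work_queue.empty():
        topic, payload = work_queue.get()
        topics[topic].append(f"done: {payload}")

topic = receive_request("resize image 42")
process_queue()
print(topics[topic])   # -> ['done: resize image 42']
```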
The account limit on the number of topics is 100,000 [3].
This limit can be increased by request to AWS Support, although cleaning up topics that no longer need to be kept around can be more suitable.
Having to design around these workarounds for concurrency limits could mean that the application requirements are better suited to a traditional web application backed by a long-running server.
My question is related to telecommunications, but it's still a pure programming challenge since I'm using a soft-switch.
Goal:
create an algorithm used by the call routing engine to fully saturate
the available link capacity with traffic sold at the highest possible rate
Situation:
there is a communications link (E1/T1) with a fixed capacity of 30 voice
channels (1 channel = one voice call between end users, so we can have a maximum of 30 concurrent calls on each link)
the link has a fixed monthly running cost, so it's best when it's fully utilized all the time (fixed cost divided across more minutes results in higher profit)
there are users "fighting" for link capacity by sending calls to Call Routing Engine
each user can consume a random amount of link capacity at a given time; it's possible that one user takes the whole capacity at one time (i.e. peak
hours) but consumes no capacity in off-peak hours
each user has different call rate per minute
ideal situation: the link is fully utilized (24/7/365) with calls made by the users with the highest call rate per minute
Available control:
the call routing engine can accept a call and send it over this link, or reject the call
Available data:
current link usage
user rate per minute
recent calls per minute per user
user call history (access is costly, but possible)
Example:
user A has a rate of 1 cent per minute, B 0.8 cents, C 0.7 cents
it's best to accept user A's calls and reject the others if user A can fill the full link capacity
BUT user A usually can't fill the whole link capacity, and we need to accept calls from the others to fill the gap
we have no control over how many calls users will send at a given moment, so it's hard to plan which calls to accept and which to reject
Any ideas or suggested approach to this problem?
I suspect that the simplest algorithm you can come up with may be the best - for example, if you get a call from a user of type B or C, simply check whether there are any calls from a user of type A and, if not, accept the call.
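A minimal sketch of this kind of rule, using the rates from the question's example. The reserved-channel count is an illustrative tuning knob I've added, not something from the question: it lets lower-rate calls fill the link while still keeping some headroom free for the top-rate user.

```python
# Simple acceptance rule: top-rate calls always get a free channel;
# lower-rate calls are accepted only while some channels remain in
# reserve for high-rate traffic. RESERVED_FOR_TOP is an assumption.

CAPACITY = 30          # E1 link: 30 voice channels
RESERVED_FOR_TOP = 5   # channels held back for the top-rate user (illustrative)

RATES = {"A": 1.0, "B": 0.8, "C": 0.7}   # cents per minute, from the example
TOP_USER = max(RATES, key=RATES.get)

def accept_call(user: str, active_calls: int) -> bool:
    if active_calls >= CAPACITY:
        return False   # link saturated, reject
    if user == TOP_USER:
        return True    # top-rate calls take any free channel
    # lower-rate calls may not eat into the reserved headroom
    return active_calls < CAPACITY - RESERVED_FOR_TOP

print(accept_call("A", 29))  # -> True  (last channel goes to the top rate)
print(accept_call("B", 26))  # -> False (reserve protected)
print(accept_call("C", 10))  # -> True
```

Setting RESERVED_FOR_TOP to CAPACITY reproduces the strict "reject B/C whenever A might call" rule; tuning it between 0 and CAPACITY trades utilisation against average rate.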
The reasons why it may be best to go with the simplest approach:
It's easier!
Rejecting calls like this may not be allowed by the regulator depending on the area.
If there really is a strong business opportunity here, then a VoIP solution is likely going to be easier, and if your client doesn't ask you to do this, someone else will likely do it anyway. VoIP as an alternative transport for high-cost TDM legs of calls is a very common approach.
I want to know the limit for API calls per minute/hour for Office365 REST APIs (People, Mail).
Is there any documentation for it?
I have information about the current throttling algorithm, but please be aware that we are constantly revisiting and tweaking it. Any algorithm we share now does not mean we are committing to supporting that mechanism in the future.
There is no hard request rate enforced on the Exchange side. The way current throttling works is that each caller is allowed, by default, 30 minutes (1,800,000 ms) per hour of solid thread time on the server. It is a lazily evaluated "token bucket" implementation with a rolling window - basically you "recharge" at a rate of 1 second per 2 clock seconds, and you spend clock time on the server. When you get to zero, you are throttled for about 5 minutes.
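A lazily evaluated token bucket along those lines can be sketched as follows. The numbers come from the description above; the class itself is illustrative, not the actual Exchange code (in particular, the 5-minute penalty window is omitted for brevity).

```python
# Lazy token bucket: the balance is only recomputed when the caller
# shows up, recharging 1 second of budget per 2 elapsed clock seconds,
# capped at 30 minutes of server thread time.

class LazyTokenBucket:
    CAP = 1_800_000        # 30 minutes of thread time, in ms
    RECHARGE_RATIO = 0.5   # 1 s of budget per 2 clock seconds

    def __init__(self) -> None:
        self.balance = float(self.CAP)
        self.last_seen_ms = 0

    def spend(self, now_ms: int, cost_ms: int) -> bool:
        # lazy evaluation: recharge only when the caller reappears
        elapsed = now_ms - self.last_seen_ms
        self.balance = min(self.CAP, self.balance + elapsed * self.RECHARGE_RATIO)
        self.last_seen_ms = now_ms
        if self.balance <= 0:
            return False   # throttled
        self.balance -= cost_ms
        return True

bucket = LazyTokenBucket()
print(bucket.spend(0, 1_800_000))    # -> True  (burns the whole budget)
print(bucket.spend(0, 1))            # -> False (balance at zero -> throttled)
print(bucket.spend(3_600_000, 1))    # -> True  (fully recharged after an hour)
```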
Hope that helps!
I am going to develop an Android application that enables tracking (and monitoring on a map interface) of multiple users by a specific user. For this reason, I want to study an mBaaS, Parse. However, I cannot figure out how many requests per second such an app would perform given the number of users. To exemplify, if I choose the free option for the monthly cost, the limit will be 30 requests per second. I have some doubts about whether this number is sufficient for this app.
In other words, there will be periodic API requests (let's say every 30 seconds) for all users being tracked. I think it is highly possible to exceed the limit of 30 requests per second with very few active users. Even if 5 different users track 10 different users at the same time, the probability of hitting 30 requests per second is very high.
Considering all this, what kind of strategy do you advise? How can I manage periodic geolocation requests in this system? Is Parse the right choice? If not, is there a better alternative?
The approach used in the Traccar GPS tracking system is to return all of a user's objects in one request. So, if you want one user to track 100 other users, you still need only one request to get all 100 locations.
You can optimize it further by not sending a location if it hasn't changed. So, if only 10 users out of 100 changed their location since the last request, you can return only those 10 location items in the response.
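A minimal sketch of that batched, delta-only polling. The data structures and names are illustrative; the point is that one poll returns every changed location at once instead of one request per tracked user.

```python
# One request per poll returns all locations that changed since the
# tracker's previous poll. Illustrative in-memory model.

locations: dict[str, tuple[float, float]] = {}             # user -> (lat, lon)
last_sent: dict[str, dict[str, tuple[float, float]]] = {}  # tracker -> last snapshot sent

def poll(tracker: str) -> dict[str, tuple[float, float]]:
    """Return only the locations that changed since this tracker's last poll."""
    seen = last_sent.setdefault(tracker, {})
    changed = {u: pos for u, pos in locations.items() if seen.get(u) != pos}
    seen.update(changed)
    return changed

locations["alice"] = (52.5, 13.4)
locations["bob"] = (48.8, 2.3)
print(sorted(poll("tracker1")))   # -> ['alice', 'bob']  (first poll returns everything)
locations["bob"] = (48.9, 2.3)
print(sorted(poll("tracker1")))   # -> ['bob']           (only the change)
```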
We are building a new application on Parse and are trying to estimate our requests/second and optimize the application to keep it below 30/second. Our app, still in development, makes various calls to Parse. Some use only 1 request, and a few as many as 5 requests. We have tested and verified this in the Analytics > Events > API Requests tab.
However, when I go to the Analytics > Performance > Total Requests section, the requests/second rarely go above 0.2 and are often much lower. I assume this is because it is an average over a minute or more. So I have two questions:
1) Does anyone know what the number on this total requests/second screen represents? Is it an average over a certain time period? If so, how long?
2) When Parse denies a request due to the rate limit, does it deny based on the actual per-second rate, or on an average over a certain time period?
Thanks!
I suppose you have your answer by now, but just in case:
You're allowed 30 req/s on the free plan, but Parse actually counts it on a per-minute basis, i.e. 1800 requests per minute.
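The practical consequence is that short bursts well above 30 req/s are fine as long as the minute total stays under 1800. A sketch of a per-minute sliding-window check (illustrative, not Parse's actual implementation):

```python
# Rate limiting counted per minute rather than per second: a burst of
# 100 requests in one second passes, because only the trailing-minute
# total is compared against the 1800 budget.

LIMIT_PER_MINUTE = 1800   # 30 req/s averaged over 60 s

def allowed(request_times_s: list[float], now_s: float) -> bool:
    """Allow a new request if fewer than 1800 landed in the past 60 s."""
    recent = [t for t in request_times_s if now_s - t < 60]
    return len(recent) < LIMIT_PER_MINUTE

# a 100-request burst within one second is fine on a per-minute budget:
print(allowed([0.0] * 100, 1.0))      # -> True
# but 1800 requests in the trailing minute exhausts the budget:
print(allowed([30.0] * 1800, 59.0))   # -> False
```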