Google Cloud Storage API write Limits - google-api

Reading the documentation here, I am still not clear on the following points:
Are there any limits on the size of an API request to a Google Cloud Storage bucket? (We need to transfer PDFs from a CRM to Google Cloud Storage.)
How many files can we send? The documentation mentions a limit of 1,000 writes per second; is that the same thing?

Files are the objects you store in Cloud Storage, so the 5 TB maximum object size applies to each individual file. There is no limit on the total amount written across multiple objects; however, a bucket initially supports roughly 1,000 writes per second and then scales up as needed.
For parallel uploads, note that up to 32 objects/files can be combined in a single compose request, and there is a per-project composition rate limit of approximately 1,000 source objects per second.
I also recommend that you take a look at the best practices on how to ramp up the request rate.

If you go above 1,000 writes per second per bucket, the GCS infrastructure will scale automatically to accommodate it. The only requirement is not to ramp up the load too quickly, namely to "double the request rate no faster than every 20 minutes".
https://cloud.google.com/storage/docs/request-rate#ramp-up
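For reference, here is a minimal sketch of a single PDF upload with the google-cloud-storage Java client, assuming a recent version of the library; the bucket name, object name, and local path are placeholders, and the client is assumed to pick up Application Default Credentials:

    import com.google.cloud.storage.BlobId;
    import com.google.cloud.storage.BlobInfo;
    import com.google.cloud.storage.Storage;
    import com.google.cloud.storage.StorageOptions;
    import java.io.IOException;
    import java.nio.file.Paths;

    public class PdfUploader {
        public static void main(String[] args) throws IOException {
            // Client picks up Application Default Credentials.
            Storage storage = StorageOptions.getDefaultInstance().getService();
            // Bucket, object name, and local path are placeholders for this sketch.
            BlobId blobId = BlobId.of("crm-pdf-archive", "crm-exports/invoice-123.pdf");
            BlobInfo blobInfo = BlobInfo.newBuilder(blobId)
                    .setContentType("application/pdf")
                    .build();
            // createFrom streams the file from disk, so large PDFs are not held in
            // memory; each individual object can be up to 5 TB.
            storage.createFrom(blobInfo, Paths.get("/tmp/invoice-123.pdf"));
        }
    }

Each upload like this counts as one object write toward the roughly 1,000 writes per second a bucket starts with, so a steady stream of CRM PDFs stays well within the limit.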

Related

Does Oracle NoSQL Cloud Service have provision to set max read units consumption per second?

Does Oracle NoSQL Cloud Service have a provision to set maximum read-unit consumption per second? For example, out of 40K read units, I want to reserve 20K for the first operation and the remaining 20K for the second operation. To make sure 20K is always reserved for the first operation, I want to set a maximum read-unit consumption per second for the second operation. Is this something that is possible to do?
The provisioned values apply to the entire table, so if a table has 40K read units per second, it is up to the application to apportion them per operation. The SDKs have rate-limiting support that can help with this. For example, see https://oracle.github.io/nosql-java-sdk/oracle/nosql/driver/NoSQLHandleConfig.html#setDefaultRateLimitingPercentage(double).
Sets a default percentage of table limits to use. This may be useful for cases where a client should only use a portion of full table limits. This only applies if rate limiting is enabled using setRateLimitingEnabled(boolean).
You could use this method in your case.
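As a rough sketch of that approach with the Java SDK (the endpoint here is a placeholder and credential/authorization setup is omitted), reserving about half of the table limits for the second operation's client might look like this:

    import oracle.nosql.driver.NoSQLHandle;
    import oracle.nosql.driver.NoSQLHandleConfig;
    import oracle.nosql.driver.NoSQLHandleFactory;

    public class RateLimitedClient {
        public static void main(String[] args) {
            // Endpoint is a placeholder; authorization provider setup is omitted.
            NoSQLHandleConfig config =
                new NoSQLHandleConfig("https://nosql.us-ashburn-1.oci.oraclecloud.com");
            // Enable client-side rate limiting, then cap this client at 50% of the
            // table's provisioned units, leaving the rest for the other operation.
            config.setRateLimitingEnabled(true);
            config.setDefaultRateLimitingPercentage(50.0);
            NoSQLHandle handle = NoSQLHandleFactory.createNoSQLHandle(config);
            // ... issue the second operation's reads through this handle ...
            handle.close();
        }
    }

The percentage applies to requests made through this handle, so the first operation would use its own handle and configuration.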

EC2 host type for a DynamoDB batchWrite call

I have a requirement to bulk upload an Excel sheet to a DynamoDB table, and the maximum number of rows is 200,000. The website for bulk upload will be used infrequently, so we can assume only 1 - 2 bulk uploads are being processed at a given time. In the backend, I am using the Apache POI API to parse the Excel sheet into DynamoDB items.
Because we can only send up to 25 items in a batchWriteItem call, the current latency is around 15 minutes (900 seconds) to completely upload all 200,000 items. Hence I am planning to implement multi-threading to execute multiple batchWriteItem API calls in parallel. Can you help me understand which EC2 host types are best suited for multi-threading for this purpose?
Any references will be really helpful.
Normally, multi-threading would benefit from an instance type that has multiple CPUs.
However, you are describing behaviour that is waiting on the network rather than the CPU, so the operation you describe is likely not heavily impacted by CPU utilization.
The best way to answer your question is to recommend that you experiment with different instance types to find the one that is best for your application's combination of needs:
Pick an instance family (e.g. m5) and try a few different sizes
Compare this against another family (e.g. c5) to see whether the improved performance is worth the extra cost
Monitor the application to find the bottleneck, which would either be RAM, CPU, Network or Disk access
Please note that smaller instances have less Network bandwidth, so you might need to choose a larger instance type to avoid being throttled on network bandwidth. This might result in excess CPU that isn't being fully utilized.
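For context, here is a minimal sketch of the parallel batchWriteItem pattern the question describes, using the AWS SDK for Java v2; the table name, thread count, and Excel-parsing step are placeholders:

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;
    import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
    import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
    import software.amazon.awssdk.services.dynamodb.model.BatchWriteItemRequest;
    import software.amazon.awssdk.services.dynamodb.model.BatchWriteItemResponse;
    import software.amazon.awssdk.services.dynamodb.model.PutRequest;
    import software.amazon.awssdk.services.dynamodb.model.WriteRequest;

    public class ParallelBatchLoader {
        private static final String TABLE = "BulkUploadTable"; // placeholder table name

        public static void main(String[] args) throws InterruptedException {
            DynamoDbClient ddb = DynamoDbClient.create();
            List<Map<String, AttributeValue>> items = parseExcelItems();

            // The work is network-bound, so more threads than vCPUs is reasonable.
            ExecutorService pool = Executors.newFixedThreadPool(8);
            for (int i = 0; i < items.size(); i += 25) { // 25 items is the batchWriteItem maximum
                List<Map<String, AttributeValue>> slice =
                    items.subList(i, Math.min(i + 25, items.size()));
                pool.submit(() -> writeBatch(ddb, slice));
            }
            pool.shutdown();
            pool.awaitTermination(30, TimeUnit.MINUTES);
        }

        private static void writeBatch(DynamoDbClient ddb, List<Map<String, AttributeValue>> slice) {
            List<WriteRequest> writes = new ArrayList<>();
            for (Map<String, AttributeValue> item : slice) {
                writes.add(WriteRequest.builder()
                        .putRequest(PutRequest.builder().item(item).build())
                        .build());
            }
            Map<String, List<WriteRequest>> pending = new HashMap<>();
            pending.put(TABLE, writes);
            // DynamoDB may return some writes as unprocessed (e.g. when throttled); retry them.
            // A production loader would add exponential backoff here.
            while (!pending.isEmpty()) {
                BatchWriteItemResponse resp = ddb.batchWriteItem(
                        BatchWriteItemRequest.builder().requestItems(pending).build());
                pending = resp.unprocessedItems();
            }
        }

        private static List<Map<String, AttributeValue>> parseExcelItems() {
            return new ArrayList<>(); // placeholder for the existing Apache POI parsing
        }
    }

The threads spend most of their time waiting on DynamoDB responses, which matches the point above that the workload is network-bound rather than CPU-bound.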

What does the S3 prefix mean with respect to scale?

From Request Rate and Performance Guidelines - Amazon Simple Storage Service:
Amazon S3 automatically scales to high request rates. For example, your application can achieve at least 3,500 PUT/POST/DELETE and 5,500 GET requests per second per prefix in a bucket. There are no limits to the number of prefixes in a bucket. It is simple to increase your read or write performance exponentially. For example, if you create 10 prefixes in an Amazon S3 bucket to parallelize reads, you could scale your read performance to 55,000 read requests per second.
Assume an S3 bucket with the folder s3bucket/sample/ that contains multiple objects, for example s3bucket/sample/object_1 and s3bucket/sample/object_2.
What does the prefix mean in this statement? Is it the full object path like s3bucket/sample or s3bucket/sample/object_1?
Is the 5,500 requests per second for the folder s3bucket/sample, or are 5,500 requests allowed for every object in the folder?
We have multiple asset types belonging to one piece of content and would like to understand which of the two options below will scale better.
Option 1
s3bucket/contentId_1/assetType_1
s3bucket/contentId_1/assetType_2
s3bucket/contentId_1/assetType_3
s3bucket/contentId_2/assetType_1
s3bucket/contentId_2/assetType_2
s3bucket/contentId_3/assetType_3
or
Option 2
s3bucket/contentId_1_assetType_1
s3bucket/contentId_1_assetType_2
s3bucket/contentId_1_assetType_3
s3bucket/contentId_2_assetType_1
s3bucket/contentId_2_assetType_2
s3bucket/contentId_3_assetType_3
The page says requests per second per prefix in a bucket, which is effectively the same as saying "per directory per bucket".
Frankly, you are unlikely to go anywhere near these performance limits. Large companies with millions of customers might want to use these performance hints, but the vast majority of AWS customers would not approach such levels of usage.
I would suggest you deploy your data in the most meaningful way for your application rather than having to conform to these techniques, at least until you scale to very large usage patterns.
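To make the quoted arithmetic concrete for Option 1, where each contentId_N/ acts as its own prefix under the "per directory" reading above, a back-of-the-envelope estimate looks like this (the prefix count is just an example):

    public class PrefixThroughputEstimate {
        public static void main(String[] args) {
            int prefixes = 10;               // e.g. ten contentId_N/ "folders"
            int getLimitPerPrefix = 5_500;   // documented baseline GET rate per prefix
            int writeLimitPerPrefix = 3_500; // documented baseline PUT/POST/DELETE rate per prefix
            System.out.println("Aggregate GET ceiling:   " + prefixes * getLimitPerPrefix + " req/s");
            System.out.println("Aggregate write ceiling: " + prefixes * writeLimitPerPrefix + " req/s");
        }
    }

Under the same reading, Option 2 keeps every object in a single prefix, so Option 1 has the higher theoretical ceiling, though as noted above most workloads never get close to either limit.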

Reaching quota too soon on YouTube Data API V3 - optimizing search.list [duplicate]

I'm building a pretty large app for a client that is going to aggregate feeds from various sources. My client estimates around 900 follow-able users will be in this system to start out, with more being added over time. He wants to update the feed data every 15 minutes, so we would need to update one user feed per second, assuming 900 feeds and a 15 minute TTL. As the requests take a few seconds to complete, we would then need to load balance across a few threads to tackle the queue asynchronously.
Should I be worried about quota errors or hitting any kind of limitations? If so, what are our options?
I've already read their help pages and documentation, but it's very vague; I need concrete numbers. It's not feasible to load test their API to figure out the limitation.
Version 3 of the YouTube Data API has concrete quota numbers listed in the Google API Console where you register for your API key. By default, you can use 10,000 units per day. Projects that had enabled the YouTube Data API before April 20, 2016, have a default quota of 50,000,000 units per day.
You can read about what a unit is here:
https://developers.google.com/youtube/v3/getting-started#quota
A simple read operation that only retrieves the ID of each returned resource has a cost of approximately 1 unit.
A write operation has a cost of approximately 50 units.
A video upload has a cost of approximately 1600 units.
If you hit the limits, Google will stop returning results until your quota is reset. You can apply for more than 1,000,000 requests per day, but you will have to pay for those extra requests.
YouTube also provides a calculator that is a good tool for estimating your usage:
https://developers.google.com/youtube/v3/determine_quota_cost
If you need to make more requests than allotted, you can request a higher quota here: https://support.google.com/youtube/contact/yt_api_form
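To put the question's numbers against the default quota, here is a rough worked estimate; the per-call unit cost is an assumption (a simple read), so the real value should come from the quota calculator linked above:

    public class QuotaEstimate {
        public static void main(String[] args) {
            int feeds = 900;                    // follow-able users from the question
            int refreshesPerDay = 24 * 60 / 15; // one refresh every 15 minutes = 96/day
            int unitsPerCall = 1;               // assumption: a simple read; check the real cost
                                                // for your endpoint/parts in the calculator
            long unitsPerDay = (long) feeds * refreshesPerDay * unitsPerCall;
            System.out.println("Estimated daily usage: " + unitsPerDay + " units");
            System.out.println("Default daily quota:   10000 units");
        }
    }

Even at one unit per call, 900 feeds refreshed every 15 minutes comes to roughly 86,400 units per day, well above the 10,000-unit default, so batching requests or requesting a higher quota would be necessary.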
