I'm creating an app that works like a DMS (Document Management System), so my client will be uploading PDFs, XLSs and DOCs.
You don't want to be uploading anything to Heroku: it has an ephemeral filesystem which is reset on restarts/deploys. Anything uploaded should go to a permanent file store like Amazon S3.
https://devcenter.heroku.com/articles/dynos#ephemeral-filesystem
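As a rough illustration of that approach, here is a minimal Python sketch of accepting an upload and streaming it straight to S3 instead of writing it to the dyno's disk. It assumes a Flask app and boto3; the bucket name, route and form field are placeholders, not anything from the question:

    # Minimal sketch: stream an uploaded document directly to S3 so nothing
    # is kept on the dyno's ephemeral filesystem. Bucket/field names are
    # placeholders.
    import boto3
    from flask import Flask, request

    app = Flask(__name__)
    s3 = boto3.client("s3")  # credentials come from the environment / IAM role

    @app.route("/documents", methods=["POST"])
    def upload_document():
        f = request.files["document"]      # the uploaded PDF/XLS/DOC
        key = "uploads/" + f.filename      # where it will live in S3
        s3.upload_fileobj(f, "my-dms-bucket", key)
        return {"stored_at": key}, 201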
From "How much disk space on the Dyno can I use?" on the Heroku help site:
Issue
You need to store temporary files on the Dyno
Resolution
Application processes have full access to the available, unused space on the mounted /app disc, allowing your application to write gigabytes of temporary data files. To find approximately how much space is available for your current Dyno type, run the CLI command heroku run "df -h" --size=standard-1x -a APP_NAME, and check the value for the volume mounted at /app.
Different Dyno types might have different size discs, so it's important that you check with the correct Dyno size
Please note:
Due to the Dyno's ephemeral filesystem, any files written to the disc will be permanently destroyed when the Dyno is restarted or cycled. To ensure your files persist between restarts, we recommend using a third party file storage service.
The important part here is that it is not the same value for every plan, and it is possibly subject to change over time:
Different Dyno types might have different size discs, so it's important that you check with the correct Dyno size
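If you would rather check the available space from inside the running dyno instead of via the CLI command above, a small Python sketch using only the standard library (an assumption on my part; the help article only mentions df -h) would be:

    # Minimal sketch: report space on the mounted /app disk from inside the
    # dyno, roughly equivalent to the df -h check quoted above.
    import shutil

    usage = shutil.disk_usage("/app")
    print("total: %.1f GB, used: %.1f GB, free: %.1f GB"
          % (usage.total / 1e9, usage.used / 1e9, usage.free / 1e9))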
The correct answer is that it would appear you have 620 GB.
According to this answer: https://stackoverflow.com/a/16938926/3973137
https://policy.heroku.com/aup#quota
Network Bandwidth: 2TB/month - Soft
Shared DB processing: Max 200msec per second CPU time - Soft
Dyno RAM usage: Determined by Dyno type - Hard
Slug Size: 500MB - Hard
Request Length: 30 seconds - Hard
Maybe you should think about storing the data on Amazon S3?
Related
I currently want to deploy a deep learning REST API using Flask on Heroku. The weights (it's a pre-trained BERT model) are stored here as a .zip file. Is there a way I can directly deploy these?
From what I currently understand, I have to have these uploaded to GitHub/S3. That's a bit of a hassle and seems pointless since they are already hosted. Do let me know!
Generally you can write a bash script that unzips the content and then executes your program (a sketch of this idea follows at the end of this answer). However...
Time concern: unpacking costs time, and free-tier Heroku workers only run for roughly a day before being forcefully restarted. If you are operating a web dyno the restarts will be even more frequent, and if it takes too long to boot up, the process fails (60 seconds to bind to $PORT).
Size concern: that zip file is 386 MB and is likely to be even bigger when unpacked.
Heroku has a slug size limit of 500 MB, see: https://devcenter.heroku.com/changelog-items/1145
Once the zip file is unpacked you will be over the limit: the zip file itself plus its unpacked content is well over 500 MB. You would need to pre-unpack it and make sure the files stay under 500 MB, but given that the data is already 386 MB zipped, it will only be bigger unpacked. Furthermore you will rely on some buildpack (Python, JavaScript, ...), which also takes up space. You will go well over 500 MB.
Which means: You will need to pay for Heroku services or look for a different hosting provider.
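For what it's worth, a Python sketch of the unpack-at-boot idea mentioned above (the URL and paths are placeholders, and fetching the zip over HTTP is my assumption):

    # Minimal sketch: download and unpack the pre-trained weights at boot
    # time instead of shipping them in the slug. URL and paths are placeholders.
    import os
    import urllib.request
    import zipfile

    WEIGHTS_URL = "https://example.com/bert-weights.zip"   # placeholder
    WEIGHTS_DIR = "/tmp/bert-weights"

    def ensure_weights():
        if os.path.isdir(WEIGHTS_DIR):       # already unpacked this boot
            return WEIGHTS_DIR
        zip_path = "/tmp/bert-weights.zip"
        urllib.request.urlretrieve(WEIGHTS_URL, zip_path)
        with zipfile.ZipFile(zip_path) as zf:
            zf.extractall(WEIGHTS_DIR)
        os.remove(zip_path)                  # keep disk usage down
        return WEIGHTS_DIR

This keeps the weights out of the slug entirely (they land on the ephemeral disk instead), but it has to re-download and unpack them on every restart, which is exactly the time concern above.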
My Heroku app seems to have more than 3 GB of storage capacity, and I want to know: is that true?
[screenshot]
Based on the Heroku site it should have 500 MB, as you can see here:
Heroku has certain soft and hard limits in using its service. Hard limits are automatically enforced by the Service. Soft limits are consumable resources that you agree not to exceed.
Network Bandwidth: 2TB/month - Soft
Shared DB processing: Max 200msec per second CPU time - Soft
Dyno RAM usage: Determined by Dyno type - Hard
Slug Size: 500MB - Hard
Request Length: 30 seconds - Hard
Excuse me, I googled "free dyno max storage size" but I get sites like this which have no information about the max capacity of free Heroku apps!
[screenshot]
I must add that someone else added the my.sassy.girl.s1.web.48-pahe.in file to this site, and I don't know if its size is really 3 GB (when I try to download it, Firefox shows its size as 3 GB). Any idea how to find out whether that file is really 3 GB?
Thanks.
The 500 MB limit is that of the slug - the code and other assets in your Git repository that you're deploying.
Heroku dynos also have temporary storage you can utilize, but it's important to note that any files placed on these dynos disappear after any dyno restart. That means every 24 hours (as well as after any deployment), any files not in your Git repository will go away.
https://devcenter.heroku.com/articles/dynos#ephemeral-filesystem
User-uploaded files should go to static storage like Amazon S3.
https://devcenter.heroku.com/articles/s3
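For serving those private uploads back to users, a hedged sketch of handing out a short-lived presigned S3 URL (bucket and key are placeholders) so the download doesn't pass through the dyno at all:

    # Minimal sketch: generate a short-lived presigned URL for a private
    # S3 object. Bucket and key are placeholders.
    import boto3

    s3 = boto3.client("s3")

    def download_link(key, expires_in=300):
        return s3.generate_presigned_url(
            "get_object",
            Params={"Bucket": "my-dms-bucket", "Key": key},
            ExpiresIn=expires_in,  # seconds
        )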
I recently increased the IOPS for a commit log volume; should I restart the Cassandra process for the new IOPS to take effect? I can't find this in the docs.
Please let me know if you have come across it anywhere in the docs.
When the IOPS is modified on a Provisioned IOPS Elastic Block Store (EBS) volume, the resources are reprovisioned automatically.
It will take time to adjust the IOPS of the volume -- progress can be monitored via the management console.
Once the volume modification is complete, the volume will operate with a higher IOPS. There is no need to restart an application since the modification is transparent. It will simply operate faster if more requests are sent to the volume.
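If you want to watch the progress outside the console, a minimal Python sketch (assuming boto3; the volume ID and region are placeholders) that polls the modification state:

    # Minimal sketch: poll an in-progress EBS volume modification (such as
    # an IOPS increase) until it completes. Volume ID/region are placeholders.
    import time
    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    def wait_for_modification(volume_id="vol-0123456789abcdef0"):
        while True:
            resp = ec2.describe_volumes_modifications(VolumeIds=[volume_id])
            mod = resp["VolumesModifications"][0]
            print(mod["ModificationState"], mod.get("Progress"))
            if mod["ModificationState"] in ("completed", "failed"):
                return mod
            time.sleep(30)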
Azure - How do I increase performance on the same single blob download for 3000 - 18,000 clients all downloading in a 5 minute range? (Can't use CDN because we need the files to be private with SAS).
Requirements:
We can't use CDN because the file or "blob" needs to be private. We'll generate SAS keys for all the simultaneous download requests.
The files/blobs will be the encrypted exams uploaded 24 or 48 hours before an Exam start time.
3,000 - 18,000 downloads at the same start time, in a 5-10 minute window before the exam start time.
172 - 1,000 blobs, ranging in size from 53 KB to 10 MB.
We have a web service that verifies that the student's info, PIN, and exam date/time are correct. If they are correct, it generates the URI & SAS.
The Azure site says only 480 Mbit/s for a single blob.
But another part of the Azure site mentions as high as 20,000 transactions/sec at 20 Mbit/sec.
Ideas?
Would snapshot of the blob help?
I thought a snapshot is only helpful if you know the source blob is being updated during a download?
Would premium help?
I read that premium just means it's stored on an SSD (for more $), but we need more bandwidth and many clients hitting the same blob.
Would creating say 50 copies of the same Exam help?
Then rotate each client browser through each copy of the file.
Also listed on the Azure forums:
https://social.msdn.microsoft.com/Forums/azure/en-US/7e5e4739-b7e8-43a9-b6b7-daaea8a0ae40/how-do-i-increase-performance-on-the-same-single-blob-download-for-3000-18000-clients-all?forum=windowsazuredata
I would cache the blobs in memory using a Redis Cache instead of using the blobs as the source. In Azure you can launch a Redis Cache of the appropriate size for your volume. Then you are not limited by the blob service.
When the first file is requested:
1. Check the Redis cache for the file.
   a. Found: serve the file from the cache.
   b. Not found: get the file from the blob and put it in the cache. Serve the file.
The next request will use the file from the cache, freeing up the Azure blob storage.
This is better than duplicating the file on blob storage since you can set an expire time in the Redis cache and it will clean itself up.
https://azure.microsoft.com/en-us/documentation/articles/cache-configure/
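A rough Python sketch of that cache-aside pattern, assuming the redis and azure-storage-blob packages; the connection details, container name and TTL are placeholders:

    # Minimal sketch: serve the blob from Redis if cached, otherwise fetch it
    # from blob storage, cache it with a TTL, and serve it. Names/credentials
    # are placeholders.
    import redis
    from azure.storage.blob import BlobServiceClient

    cache = redis.Redis(host="mycache.redis.cache.windows.net",
                        port=6380, password="...", ssl=True)
    blobs = BlobServiceClient.from_connection_string("...")

    def get_exam(blob_name, container="exams", ttl_seconds=900):
        data = cache.get(blob_name)
        if data is None:                               # cache miss -> go to origin
            blob = blobs.get_blob_client(container, blob_name)
            data = blob.download_blob().readall()
            cache.setex(blob_name, ttl_seconds, data)  # expires on its own
        return data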
Duplication. Rather than rotating though, give the client a list and have them pick randomly. That will also let them fall back to another server if the first request fails.
You can use SAS keys with the CDN, assuming that you will be using the same SAS key for all users and that you aren't generating a unique SAS for each user. If you are expecting the users to come within a 5-10 minute window then you could generate a single 15 minute SAS and use that with the CDN. Just make sure you also set the cache TTL on the blob to the same duration that the SAS specifies because the CDN won't actually validate the SAS permissions (blob storage will validate it any time the CDN has to fetch the object from origin). See Using Azure CDN with Shared Access Signatures for more information.
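A minimal sketch of that single shared SAS, using the current azure-storage-blob SDK (my assumption; the account, container, blob and key values are placeholders):

    # Minimal sketch: one 15-minute read-only SAS that every client reuses via
    # the CDN URL. Account/container/blob names and the key are placeholders.
    from datetime import datetime, timedelta
    from azure.storage.blob import generate_blob_sas, BlobSasPermissions

    sas = generate_blob_sas(
        account_name="examstorage",
        container_name="exams",
        blob_name="exam-2016-math.bin",
        account_key="...",
        permission=BlobSasPermissions(read=True),
        expiry=datetime.utcnow() + timedelta(minutes=15),
    )
    cdn_url = "https://examcdn.azureedge.net/exams/exam-2016-math.bin?" + sas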
Jason's suggestion of using multiple blobs is also a good solution.
You could also spin up several Webrole instances and host the file locally on those instances, then instead of sending users a SAS URL (which could be used by non-authorized users) you could actually authenticate the user and serve the file directly from the Webrole. If all of your traffic will be within a ~10 minute window then you could spin up hundreds of instances and still keep the cost very low.
So far I get an average of 700 kilobytes per second for downloads via Chrome hitting an EC2 instance in Virginia (us-east region). If I download directly from S3 in Virginia (us-east region) I get 2 megabytes per second.
I've simplified this way down to simply running Apache and reading a file from a mounted EBS volume. Less than one percent of the time I've seen the download hit around 1,800 kilobytes per second.
I also tried nginx, no difference. I also tried running a large instance with 7 GB of RAM. I tried allocating 6 GB of RAM to the JVM and running Tomcat, streaming the files in memory from S3 to avoid the disk. I tried enabling sendfile in Apache. None of this helps.
When I run Apache reading from the file system and use a download manager such as DownThemAll, I always get 2 megabytes per second when downloading from an EC2 instance in Virginia (us-east region). It's as if my Apache is configured to only allow 700 kilobytes per second per thread. I don't see any configuration options relating to this though.
What am I missing here? I also benchmarked Dropbox downloads, since they use EC2 as well, and I noticed I get roughly 700 kilobytes per second there too, which is also way slow. I imagine they must host their EC2 instances in the Virginia / us-east region as well, based on the speed. If I use a download manager to download files from Dropbox I get 2 megabytes a second as well.
Is this just the case with TCP, where if you are far away from the server you have to split transfers into chunks and download them in parallel to saturate your network connection?
I think your last sentence is right: your 700 KB/s is probably a limitation of a single TCP connection ... maybe a throttle imposed by EC2, or perhaps your ISP, or the browser, or a router along the way -- dunno. Download managers likely split the request over multiple connections (I think this is called "multi-source"), gluing things together in the right order after they arrive. Whether this is the case depends on the software you're using, of course.
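To make that concrete, a minimal Python sketch of what a multi-source download manager does: fetch byte ranges over parallel connections and reassemble them (assumes the server honours HTTP Range requests; the URL is a placeholder):

    # Minimal sketch: split the file into byte ranges and download them on
    # parallel connections, then reassemble. URL is a placeholder.
    from concurrent.futures import ThreadPoolExecutor
    import requests

    URL = "https://example.com/big-file.bin"
    PARTS = 4

    def fetch_range(byte_range):
        start, end = byte_range
        headers = {"Range": "bytes=%d-%d" % (start, end)}
        return requests.get(URL, headers=headers).content

    def parallel_download():
        size = int(requests.head(URL).headers["Content-Length"])
        chunk = size // PARTS
        ranges = [(i * chunk,
                   size - 1 if i == PARTS - 1 else (i + 1) * chunk - 1)
                  for i in range(PARTS)]
        with ThreadPoolExecutor(max_workers=PARTS) as pool:
            parts = list(pool.map(fetch_range, ranges))
        return b"".join(parts)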