Heroku Rack::Cache + memcached vs. Amazon S3 + Amazon CloudFront for caching

Using this reference about Heroku Cedar:
https://devcenter.heroku.com/articles/rack-cache-memcached-rails31#rackcache-storage
They recommend using Rack::Cache with memcached as the meta store and a file-based entity store. The actual response bodies are kept in the entity store, I believe; in the guide they set it to "file:tmp/cache/rack/body".
Say I want to cache static HTML files and have them expire after 7 days. Am I better off with the Rack::Cache + memcached combination above, or with storing all my HTML files in Amazon S3 behind the CloudFront CDN and running a cron job that deletes every HTML file from my S3 bucket every seven days to keep the pages fresh?
The logic would be as follows:
If a user requests a particular HTML file and it does not exist in S3, my app generates a new HTML file, stores it in S3, and it stays there until the next mass-delete cron run.
Files are only generated when users request them. I have about 5 million possible static files, each of which needs to be at most a week old, but I do not want my S3 bucket to fill up with all 5 million HTML files unless visitors are actually requesting all 5 million every week. I estimate they will only request about 10k unique files per week.
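To make the flow concrete, the on-miss path would look roughly like this (a minimal sketch using the AWS SDK for PHP purely for illustration; the region, bucket name, key scheme, $pageId and renderPage() are all placeholders):
use Aws\S3\S3Client;

require 'vendor/autoload.php';

$s3     = new S3Client(['region' => 'us-east-1', 'version' => 'latest']); // assumed region
$bucket = 'my-html-cache';                                                // placeholder bucket
$key    = 'pages/' . $pageId . '.html';                                   // placeholder key scheme

if ($s3->doesObjectExist($bucket, $key)) {
    // Cache hit: serve the stored page (in practice CloudFront would answer this).
    $html = (string) $s3->getObject(['Bucket' => $bucket, 'Key' => $key])['Body'];
} else {
    // Cache miss: generate the page, store it, and let the weekly cron delete expire it.
    $html = renderPage($pageId); // hypothetical page generator
    $s3->putObject([
        'Bucket'      => $bucket,
        'Key'         => $key,
        'Body'        => $html,
        'ContentType' => 'text/html',
    ]);
}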
So which would be more efficient and faster: storing all my HTML files in the Rack::Cache entity store with memcached as the meta store, or serving them from Amazon S3 + CloudFront?
I'm looking at two angles here:
Which is better for reducing total delivery time to the user?
Which is better for reducing load on my web server?
One solution might address both.

Related

Laravel Lumen: directly download and extract a ZIP file to Google Cloud Storage

My goal is to download a large zip file (15 GB) and extract it to Google Cloud Storage using Laravel's Storage facade (https://laravel.com/docs/8.x/filesystem) and https://github.com/spatie/laravel-google-cloud-storage.
My "wish" is to somehow stream the file to Cloud Storage so that I do not need to store it locally on my server (the app runs on multiple instances, and I want to keep the disk size as small as possible).
Currently, there does not seem to be a way to do this without saving the zip file on the server first, which is not ideal in my situation.
Another idea is to use a Google Cloud Function (e.g. with Python) to download, extract and store the file. However, Google Cloud Functions seem to be limited to a maximum timeout of 9 minutes (540 seconds), and I don't think that will be enough time to download and extract 15 GB.
Any ideas on how to approach this?
You should be able to use streams for uploading big files. Here’s the example code to achieve it:
// Open the source zip as a read stream and hand the stream to the GCS disk,
// so the whole file never has to be loaded into memory at once.
$disk = Storage::disk('gcs');
$disk->put($destFile, fopen($sourceZipFile, 'r+'));
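If the zip is reachable over HTTP, you may even be able to skip the local copy entirely by opening the remote URL as a stream (a sketch, assuming allow_url_fopen is enabled; the URL and destination path are placeholders, and this still doesn't cover the extraction step):
// Stream the remote zip straight into GCS so nothing is written to local disk.
$disk = Storage::disk('gcs');

$source = fopen('https://example.com/archive.zip', 'r'); // placeholder source URL
if ($source === false) {
    throw new RuntimeException('Could not open the source stream');
}

$disk->put('archives/archive.zip', $source);

if (is_resource($source)) {
    fclose($source);
}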

20 images get uploaded instead of 30

I'm using Laravel with the league/flysystem-aws-s3-v3 package to create files in AWS S3.
I'm having an issue where:
I have an API call with a method in a controller that receives an array of files.
The method reads all the files and uploads them to S3.
For some reason, if I send more than 20 files, only 20 files get uploaded to AWS S3.
Since the AWS S3 plugin uses Guzzle under the hood, I was thinking it could be related to a timeout or a maximum number of calls allowed within a certain period.
Any ideas of what might be causing this?
Looks like a limitation in your php.ini file.
When you install PHP, this is the default configuration:
; Maximum number of files that can be uploaded via a single request
max_file_uploads = 20
Try changing this limit and then restarting your server (Apache, Nginx, etc.).
Please verify the following two values in your php.ini file.
Try increasing upload_max_filesize:
; Maximum allowed size for uploaded files.
upload_max_filesize = 2M
Also check "max_file_uploads" is greater than 20.
; Maximum number of files that can be uploaded via a single request
max_file_uploads = 20
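To see which limits the server handling the request actually applies, you can dump the effective values at runtime (a quick sketch; post_max_size is included because it also caps the total request body):
// Print the effective upload-related limits as PHP sees them at runtime.
echo ini_get('max_file_uploads') . PHP_EOL;    // default is 20
echo ini_get('upload_max_filesize') . PHP_EOL; // per-file size limit
echo ini_get('post_max_size') . PHP_EOL;       // caps the whole request body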

How best to queue file uploads to S3 in a multi server environment?

In short, my API accepts file uploads. I (ultimately) store them in S3, but to avoid uploading to S3 during the same request, I queue the upload and do it in the background.
Originally I stored the file on the server and queued only the file path; the job then read the contents from that path on the server and sent them to S3.
I develop/stage on a single server, but my production environment will sit behind a load balancer with 2-3 servers. I realised that my jobs would fail roughly 2/3 of the time, because the file referenced in the job may live on a different server from the one running the job.
I realised I could base64_encode the file contents and store that in the job payload in Redis (as opposed to storing just the path of the file), using the following:
// Encode the uploaded file's contents so the queued payload carries the data itself.
$contents = base64_encode(file_get_contents($file));
// Any worker can now run the job without needing access to the original file on disk.
UploadFileToCloud::dispatch($filePath, $contents, $lead)->onQueue('s3-uploads');
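For context, the job on the other end is roughly this shape (a trimmed-down sketch: the class name and the $lead argument come from the dispatch call above, the body is assumed, and constructor property promotion needs PHP 8+):
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\SerializesModels;
use Illuminate\Support\Facades\Storage;

class UploadFileToCloud implements ShouldQueue
{
    use Dispatchable, Queueable, SerializesModels;

    public function __construct(
        public string $filePath,
        public string $contents, // base64-encoded file body
        public $lead             // passed through from the controller
    ) {}

    public function handle(): void
    {
        // Decode the payload and write it to S3 from whichever worker picks up the job.
        Storage::disk('s3')->put($this->filePath, base64_decode($this->contents));
    }
}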
I have quite a large Redis store, so I'm confident I can do this for lots of small files (the most likely case for me), but some files can be quite large.
I'm starting to worry that this method will run into issues, most likely my Redis store running out of memory.
I have thought about putting a shared drive behind all my instances and reverting to my original method of storing just the file path, but I'm unsure.
Another concern is failure handling: if a big upload fails, can the failed_jobs table handle the payload of, say, a base64-encoded 20 MB PDF?
Is base64-encoding the file contents and queuing that the best method, or can anyone recommend an alternative way to queue file uploads in a multi-server environment?

Update extension for multiple files at once on Amazon S3

I have about 1 million files in my S3 bucket, and unfortunately they were uploaded with the wrong extension. I need to add a '.gz' extension to every file in the bucket.
I can manage to do that using the AWS CLI:
aws s3 mv bucket_name/name_1 bucket_name/name_1.gz
This works, but the script runs very slowly because it moves the files one by one; by my calculation it will take up to a week, which is not acceptable.
Is there a better and faster way to achieve this?
You can try S3 Browser, which supports multi-threaded calls:
http://s3browser.com/
I suspect other tools can multi-thread as well, but the CLI doesn't.
There's no rename operation for S3 objects or buckets, so you have to copy and then delete each file. If the files are big, that can indeed be a bit slow.
However, nothing forces you to wait for one request to complete before "renaming" the next file in your list; you can have many requests in flight at once.
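If you'd rather script it than use a GUI tool, here is a rough sketch of that idea with the AWS SDK for PHP, issuing the copies concurrently and deleting the originals in batches (the region, concurrency level and key list are assumptions; keys with special characters would need URL-encoding in CopySource):
use Aws\CommandPool;
use Aws\S3\S3Client;

require 'vendor/autoload.php';

$bucket = 'bucket_name';
$client = new S3Client(['region' => 'us-east-1', 'version' => 'latest']); // assumed region

// Build the key list from ListObjectsV2 pagination in practice; shortened here.
$keys = ['name_1', 'name_2'];

$commands = [];
foreach ($keys as $key) {
    $commands[] = $client->getCommand('CopyObject', [
        'Bucket'     => $bucket,
        'Key'        => $key . '.gz',
        'CopySource' => "{$bucket}/{$key}",
    ]);
}

// Run up to 25 copy requests in flight at a time.
CommandPool::batch($client, $commands, ['concurrency' => 25]);

// Delete the old keys once the copies are done (up to 1000 keys per DeleteObjects call).
foreach (array_chunk($keys, 1000) as $chunk) {
    $client->deleteObjects([
        'Bucket' => $bucket,
        'Delete' => ['Objects' => array_map(function ($k) { return ['Key' => $k]; }, $chunk)],
    ]);
}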

GAE/J file store

I am working on a GAE/J based project. One of the requirements is to let users upload files (doc, ppt, pdf, xls, etc.).
What options do I have for storing files, besides http://code.google.com/p/gae-filestore/ ?
Is it possible to make these files "searchable"?
Blobstore service. It stores files up to 2 GB, and its API was intended for user-uploaded files. See: http://code.google.com/appengine/docs/java/blobstore/overview.html It isn't indexed, but I believe some people have been able to use Map/Reduce with it: http://ikaisays.com/2010/08/11/using-the-app-engine-mapper-for-bulk-data-import/
Datastore. You can store up to 1 MB as a "blob property".
