Amazon S3 Missing Files - caching

We're developing user widgets that our members can embed on their websites and blogs. To reduce the load on our servers, we'd like to compile the necessary data file for each user ahead of time and store it on our Amazon S3 account.
While that sounds simple enough, I'm hoping there might be a way for S3 to automatically ping our script if a file is requested that for some reason doesn't exist (say, for instance, if it failed to upload properly). Basically, we want Amazon S3 to act as our cache and to notify a script on a cache miss. I don't believe Amazon provides a way to do this directly, but I was hoping that some hacker out there could come up with a smart way to accomplish it (such as mod_rewrite, a hash table, etc.).
Thanks in advance for your help & advice!

Amazon doesn't currently support this, but it looks like it might be coming. For now, what you could do is enable logging and parse the log for 404s once an hour or so.
It's certainly not instant, but it would prevent long-term 404s and give you some visibility into which files are missing.
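If you go the log-parsing route, the job is straightforward: read the access-log objects, pull out the requests that came back 404, and hand the missing keys to whatever regenerates them. A rough Python/boto3 sketch, where the log bucket, prefix, and handle_missing_key() hook are all placeholders:

```python
import re
import boto3

# Assumptions: server access logging for the data bucket is enabled and
# delivered to LOG_BUCKET under LOG_PREFIX; handle_missing_key() is whatever
# hook regenerates or re-uploads the missing widget file.
LOG_BUCKET = "my-s3-access-logs"   # placeholder
LOG_PREFIX = "widgets-bucket/"     # placeholder

# One S3 access-log line contains e.g. ... "GET /bucket/key HTTP/1.1" 404 NoSuchKey ...
LOG_LINE = re.compile(r'"(?:GET|HEAD) (\S+) HTTP/[^"]*" (\d{3})')

def handle_missing_key(key):
    # Placeholder: regenerate the data file and re-upload it.
    print("missing:", key)

def scan_logs_for_404s():
    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=LOG_BUCKET, Prefix=LOG_PREFIX):
        for obj in page.get("Contents", []):
            body = s3.get_object(Bucket=LOG_BUCKET, Key=obj["Key"])["Body"].read()
            for line in body.decode("utf-8", errors="replace").splitlines():
                m = LOG_LINE.search(line)
                if m and m.group(2) == "404":
                    handle_missing_key(m.group(1))

if __name__ == "__main__":
    scan_logs_for_404s()
```

Run it from cron roughly once an hour, as suggested above.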

Related

Storing files in a webserver

I have a project using the MEAN stack that uploads image files to a server and stores the image names in the database. The images are then shown to users of the application, like an image gallery.
I have been trying to figure out an efficient way of storing the image files. At the moment I'm storing them under the Angular application in a folder, /var/www/app/files.
What are the usual ways of storing them on a cloud server like DigitalOcean, Heroku, and many others?
I'm a bit thrown off by the fact that they offer many options for data storage.
Let's say that hundreds of thousands of images were uploaded by the application to the server.
Saving all of them inside your front-end app in a subfolder might not be the best solution? Or am I wrong about this?
I am very new to these cloud web servers and how they actually operate.
Can someone clarify what the optimal solution would be?
Thanks!
Saving all of them inside your front-end app in a subfolder might not be the best solution?
You're very right about this. Over time this will get cluttered and, unless you use some very convoluted logic, it will slow down your server.
If you're using Angular and this is in the public folder sent to every user, this is even worse.
The best solution to this is using something like an AWS S3 Bucket (DigitalOcean has Block Storage and I believe Heroku has something a bit different). These services offer storage of files, and essentially act as logic-less servers. You can set some privacy policies and other settings, but there's no runtime like NodeJS that can do logic for you.
Your Node server (or any other server setup) interfaces with this storage server, and handles most of the fetching and storing of files. You can optionally limit these storage services so they can only communicate with your Node server, so any file traffic would be done through your Node server.
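The poster's stack is Node, but the flow is the same in any language. Here is a minimal Python/boto3 sketch of the pattern described above, where the app server accepts the upload, pushes the bytes to a private bucket, and hands back a time-limited URL instead of serving the file itself (the bucket name and key scheme are made up):

```python
import uuid
import boto3

# Assumptions: a private S3 bucket named "my-gallery-uploads" exists and the
# app server has credentials for it; key layout is invented for this sketch.
BUCKET = "my-gallery-uploads"
s3 = boto3.client("s3")

def store_image(file_bytes, content_type="image/jpeg"):
    """Push an uploaded image to object storage and return its key."""
    key = f"images/{uuid.uuid4().hex}.jpg"   # adjust extension to the real content type
    s3.put_object(Bucket=BUCKET, Key=key, Body=file_bytes, ContentType=content_type)
    return key   # persist this key in the database next to the image metadata

def image_url(key, expires=3600):
    """Return a time-limited URL the gallery front end can use directly."""
    return s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": BUCKET, "Key": key},
        ExpiresIn=expires,
    )
```

The gallery front end only ever sees the pre-signed URL, so the bucket itself can stay locked down to the app server's credentials.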

Streaming specific setup for audio streaming

Thanks for any help and suggestions.
So I have an Amazon EC2 instance (an m3.medium, which I have suddenly realized only gives me 4 GB of storage) running Wowza server for live streaming audio and audio/video on demand. Everything is running fine... things seem to be going swimmingly, as we have been live for a week now.
Every day we have close to 80 people listen in on the live stream, and that usually falls to 10-20 concurrent users listening to archived streams at any given time. We hope to increase this number in time.
We have the live/record app and the vod app that we use for streaming and vod/aod respectively. After the streaming show is done it saves the file to the content folder as you know.
So I was cruising the file system checking out the content folder, thinking that eventually this folder is going to fill up, and was curious how people handle this part of streaming: the storage part. Keeping everything on the instance definitely feels like the easiest route as far as storage goes, though I know of the perils involved in keeping all of these files there.
For files like these, which need to remain available in perpetuity and tend to add up space-wise, how do people commonly store them?
I tried briefly to mount an S3 bucket and it just didn't work for me. I'm sure that can be fixed, but I kept reading that it's not recommended to write to or stream from S3 that way.
Thanks... any info and leads are a big help. Total newbie here and surprised I even made it this far.
I'll probably need to start a new instance and transfer everything over, unless there is a way for me to attach more storage to the existing instance.
Thanks.
S3 is going to be your best option. But S3 is not a block storage device like an EBS volume, so you can't really open a file on it and write to it like a stream.
Instead, you need to break your files apart (on some breakpoint that works for you) and upload each file to S3 after it's been completed. With that, you can have users download directly from S3 and remove load from your current instance.
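A rough sketch of that "upload once it's finished" step in Python with boto3 (the content directory and bucket name are placeholders); upload_file handles multipart uploads for large recordings automatically:

```python
import os
import boto3

# Assumptions: finished recordings land in Wowza's content directory at
# CONTENT_DIR and should end up in ARCHIVE_BUCKET; both names are placeholders.
CONTENT_DIR = "/usr/local/WowzaStreamingEngine/content"   # placeholder path
ARCHIVE_BUCKET = "my-show-archive"                        # placeholder bucket

s3 = boto3.client("s3")

def archive_finished_recordings():
    for name in os.listdir(CONTENT_DIR):
        if not name.endswith(".mp4"):
            continue
        path = os.path.join(CONTENT_DIR, name)
        # upload_file uses multipart uploads under the hood for big files
        s3.upload_file(path, ARCHIVE_BUCKET, f"vod/{name}")
        os.remove(path)   # free the instance's small disk once it's archived

if __name__ == "__main__":
    archive_finished_recordings()
```

Run it on a schedule, or trigger it from whatever hook tells you a recording has finished, so you never upload a file Wowza is still writing.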

Temporary cloud storage and sending script

I'm looking for a way to store small data packages in a temporary place in the cloud, to be sent from there later and then erased. It would also be wonderful to be able to create a cloud-side script for the sending/erasing task.
This is the scenario: data in the cloud will be sent to different places in different formats, so I need it to stay up there until the destination is assigned. When that happens, that particular piece of data will be formatted for the particular destination's structure and then sent.
I've taken a look at several options, and maybe OpenKeyval could be the best one, but something seems wrong with it because it doesn't let me post anything to any new location.
Ideas?
Thanks in advance!
You could use Amazon S3 with the curl command, or, if you have more freedom to install additional tools, s3cmd gives you more power.
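If you go the S3 route, the whole store/send/erase cycle is only a few API calls. A minimal Python/boto3 sketch (the bucket name, keys, and the format_and_send callback are placeholders):

```python
import boto3

# Assumption: a private bucket named "my-staging-bucket" is used as the
# temporary holding area; keys and payloads here are placeholders.
BUCKET = "my-staging-bucket"
s3 = boto3.client("s3")

def stage(key, payload: bytes):
    """Park a small data package in the cloud until its destination is known."""
    s3.put_object(Bucket=BUCKET, Key=key, Body=payload)

def send_and_erase(key, format_and_send):
    """Fetch the package, hand it to the formatting/sending step, then erase it."""
    payload = s3.get_object(Bucket=BUCKET, Key=key)["Body"].read()
    format_and_send(payload)            # format for the assigned destination
    s3.delete_object(Bucket=BUCKET, Key=key)

# Example usage:
# stage("package-0001", b'{"foo": 1}')
# send_and_erase("package-0001", lambda data: print("sending", data))
```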
I found a free and nice solution using Dropbox:
https://www.dropbox.com/developers/datastore/tutorial/http
Data can be sent and retrieved using curl. Storage is in the cloud (no need to keep a copy on any computer) and the data can be retrieved from any machine. Just what I was looking for! :D

Is there an equivalent to Amazon S3's s3cmd sync for Rackspace Cloud Files?

I'm currently using Amazon S3 to host all static content on my site. The site has a lot of static files, so I need an automated way of syncing the files on my localhost with the remote files. I currently do this with s3cmd's sync feature, which works wonderfully. Whenever I run my deploy script, only the files that have changed are uploaded, and any files that have been deleted are also deleted in S3.
I'd like to try Rackspace Cloud Files; however, I can't seem to find anything that offers the same functionality. Is there any way to accomplish this on Rackspace Cloud Files short of writing my own syncing utility? It needs to have a command-line interface and work on OS X.
The pyrax SDK for the Rackspace Cloud has the sync_folder_to_container() method for cloudfiles that sounds like what you're looking for. It will only upload new/changed files, and will optionally remove files from the cloud that are deleted locally.
As far as the initial upload is concerned, I usually use eventlet to upload files in as asynchronous a manner as possible. The total time will still be limited by your upload speeds (I don't know of any SDK that can get around that), but the non-blocking code will certainly help overall performance.
If you have any other questions, feel free to ask here or on the GitHub page.
-- Ed Leafe
The Rackspace Python SDK can do that for you. There's a script called cf_pyrax.py that does, more or less, what I think you're trying to do. There's a write-up on it in this blog post.
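For reference, a minimal pyrax sketch of the sync approach described in the answers above (the credentials, container name, and local path are placeholders):

```python
import pyrax

# Placeholders: substitute your own Rackspace credentials, container name,
# and local static-files directory.
pyrax.set_setting("identity_type", "rackspace")
pyrax.set_credentials("my_username", "my_api_key")

cf = pyrax.cloudfiles
container = cf.create_container("static-assets")   # returns it if it already exists

# Uploads only new/changed files; delete=True also removes remote objects
# that no longer exist locally, mirroring `s3cmd sync --delete-removed`.
cf.sync_folder_to_container("/path/to/local/static", container, delete=True)
```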

What's a good alternative to page caching on Heroku?

I understand page caching isn't a good option on Heroku, since each dyno has an ephemeral file system (so dynos wouldn't share files, and the cache would get wiped out on each restart).
So I'm wondering what the best alternative is. I have a large number of potential files that could get generated in a traditional page caching scenario (say 10GB-100GB), so Redis/memcached don't seem like good options here. Redis can write out to disk, but my understanding is that once you exceed its memory capacity, it's not the right solution for reading off of disk.
Has anyone found a good solution here? I'm thinking maybe MongoStore. (And some way to run this in conjunction with redis since I'm using redis for some other scenarios.) Thanks!
If your site is 100% static content and never going to be dynamic, S3 may be a good option. You can then create a CNAME to the S3 domain. This allows you to leverage CloudFront should you need it. Otherwise, 100GB would have to go into the database, which your application would then have to pull from.
Heroku's cedar stack allows for custom buildpacks. This one vendors nginx. This would be good if you envision transitioning to a more dynamic site.
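If you go the S3 route for the cached/static pages, the main detail is setting a sensible Content-Type and Cache-Control on each object so the CNAME'd domain (or CloudFront) serves them correctly. A hedged Python/boto3 sketch of the upload step (the directory and bucket names are placeholders):

```python
import mimetypes
import os
import boto3

# Assumptions: generated pages live under BUILD_DIR locally and are served
# from BUCKET via a CNAME (and optionally CloudFront); names are placeholders.
BUILD_DIR = "public/cache"       # placeholder
BUCKET = "cache.example.com"     # placeholder; bucket name matches the CNAME

s3 = boto3.client("s3")

def push_cached_pages():
    for root, _dirs, files in os.walk(BUILD_DIR):
        for name in files:
            path = os.path.join(root, name)
            key = os.path.relpath(path, BUILD_DIR)
            content_type = mimetypes.guess_type(path)[0] or "text/html"
            s3.upload_file(
                path,
                BUCKET,
                key,
                ExtraArgs={
                    "ContentType": content_type,
                    "CacheControl": "public, max-age=300",  # tune to taste
                },
            )

if __name__ == "__main__":
    push_cached_pages()
```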
