Which technology is recommended to use as Audio Streaming Web Server - performance

I have a client side software which directly streams audio from a fileserver with public multimedia files exposed.
I'm using AWS S3 like web services, and I'm trying to maintain file hosting costs the lowest (Currently 0$). So any paid solution for data storage has been already reviewed.
The file collection size is really increasing. It might be close to 10TB of files during the next 12 months.
For now I manage around 250Gb of diverse quality mp3 files and images.
I would like to implement a server for streaming multimedia files, and I would like some advice in which server architecture/technology to use for this purpose (Hadoop, Nginx, ..)
First requisites might be:
good I/O management
handling many persistent and durable connections
for streaming.
The file security is not an issue in this question
Any help is welcome.

There's nothing special about audio files vs. any others for this use case. Any web server will do.
You're already using S3, just use that. S3 can serve your files directly, but with any decent load you're going to want to use CloudFront in front of your S3 bucket. CloudFront is a CDN that will distribute your media files from geographically distributed points, keeping things nice and fast for your users. It's also often cheaper to use CloudFront than S3 directly, when you have more traffic.

Related

Where to store thousands of images in S3 or EFS of AWS?

I will make a project in the not too distant future, a project where we will be storing thousands of thousands of images in the course of time. I'm on a hard decision whether to use Amazon S3 or EFS to store those images. Both I think are a very good option, but my question goes to what would be the best service or what would be the best practice?
My application will be done with Laravel and I already did the integration of both services.
Most of the characteristics of the project are:
Most of the files I will store will be photos about 95%.
Approximately 1.5k photos would be stored daily.
The photos are very large (professional cameras).
Traffic to the application will not be much, approx. 100 users at a time.
Each user would consult about 100 photos per day.
What do you recommend?
S3 is absolutely the right answer and practice. I have built numerous applications like you describe, some with 100s of millions of images, and S3 is superior. It also allows for flexibility such as your API returning the images as pre-signed URLs which will reduce load to your servers, images can be linked directly via static web hosting, and it provides lifecycle policies to archive less used data. Additionally, further integration with other AWS services is easy using event triggers.
As for storing/uploading, S3 multi-part upload is very useful to both increase performance and increase reliability.
EFS would make sense for your type of scenario if you were doing some intensive processing where you had a cluster of severs that needed lower latency with a shared file system - think HPC. EFS would also come at a higher cost and doesn't provide as many extensibility options or built-in features as S3. Your scenario doesn't sound like it requires EFS.
http://docs.aws.amazon.com/AmazonS3/latest/dev/WebsiteHosting.html
http://docs.aws.amazon.com/AmazonS3/latest/dev/ShareObjectPreSignedURL.html
http://docs.aws.amazon.com/AmazonS3/latest/dev/object-lifecycle-mgmt.html
http://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html
http://docs.aws.amazon.com/AmazonS3/latest/dev/mpuoverview.html
For the scenario you proposed AWS S3 is the choice. Why?
Since images are more often added, it costs roughly 1/10 th of EFS.
Less overhead on your web servers since files can be directly uploaded and downloaded with S3.
You can leverage event driven processing with Lambda e.g Generating thumbnail, Image processing filters by S3 Lambda trigger.
Higher level of SLA for availability and durability.
Supporting for inbuilt lifecycle management to archival and reduce cost.
AWS EFS can also be an option if it happens to frequently modify the images (Where EBS is also an option)
You can also consider using AWS CloudFront with either the option to cache images.
Note: At the end its not about using a single service. Based on your upcoming requrements you can choose either one of them or both.

Storing files in a webserver

I have a project using MEAN stack that uploads imagefiles to a server and the names of the images to db. Then the images are shown for users of the applications kinda like an image gallery.
I have been trying to figure out an effiecent way of storing the imagefiles. atm im storing them under the angular application in a folder /var/www/app/files
What are the usual ways of storing them in a cloud server like digital ocean, heroku and many others.
Im a bit thrown off by the fact they offer many options for datastorage.
Lets say that hundres of thousands of images were uploaded by the application to the server.
Saving all of them in inside your front end app in a subfolder might not be the best solution? or am i wrong with this.
I am very new to these webserver cloud services and how they actually operate.
Can someone clarify on what would be the optimal solution.
Thanks!
Saving all of them in inside your front end app in a subfolder might not be the best solution?
You're very right about this. Over time this will get cluttered, and unless you use some very convoluted logic, will slow down your server.
If you're using Angular and this is in the public folder sent to every user, this is even worse.
The best solution to this is using something like an AWS S3 Bucket (DigitalOcean has Block Storage and I believe Heroku has something a bit different). These services offer storage of files, and essentially act as logic-less servers. You can set some privacy policies and other settings, but there's no runtime like NodeJS that can do logic for you.
Your Node server (or any other server setup) interfaces with this storage server, and handles most of the fetching and storing of files. You can optionally limit these storage services so they can only communicate with your Node server, so any file traffic would be done through your Node server.

Caching for I/O intensive S3?

I am writing a video messaging service and the videos will be stored on amazon S3. The nature of video messaging will involve a lot of writing and reading from the S3 storage. Basically as soon as it's written it will be read by another client. I am worried that S3 cannot keep up with the speed and will delay the message delivery time. I already have CloudFront CDN + S3 setup, I wonder if CloudFront is enough to serve as a cache or do I need to setup some sort of memcaching layer above the S3?
CloudFront + S3 should be enough, but do test your assumptions, use multipart upload and measure it all, as this guy did: http://improve.dk/pushing-the-limits-of-amazon-s3-upload-performance/
At the top, I was pushing more than one gigabyte of data to S3 every second - 1117,9 megs/sec to be precise. That is an awful lot of data, all coming from a single machine. Now imagine you scale this out over multiple machines, and you have the network infrastructure to support it - you can really send a lot of data.

Amazon AWS and usage model for S3 storage

There is this example on amazon, a high traffic web application. I noticed that they are using S3 as their content delivery method. I was wondering if I need to have a Web Server for the content delivery and a Web App for my application. I don't understand why they have 2 web servers and 2 web app in the diagram.
And what is the best way to set up a website that serves images and static contents through S3 and the rest of the content through the regular storage.
My last question is, can I consider S3 as a main storage, reliable enough that I can only keep my static content there and don't have a normal storage as a backup ?
That is a very general diagram, specific diagrams will vary depending on the specifics of the overall architecture.
Having said that, I believe the Web Server represents something like Apache or Nginx and the App Server represent something like Rails, Rack Server, Unicorn, Gunicorn, Django, Sinatra, Flask, Jetty, Tomcat, etc. In some cases you can merge the Web Server and the App Server together like for example deploying Apache with python mod_wsgi to run your Django app. (So depends on Architecture)
what is the best way to set up a website that serves images and static
contents through S3 and the rest of the content through the regular
storage.
There's no really best way other than just point your dynamic content to your Databases (SQL and NoSQL) and point your static files to an S3 bucket (images, css, Jquery code, etc) You can also use third party modules depending on your application stack. For example you can accomplish this in Django with the django-storages module. You can find similar modules for other app stacks like Rails.
My last question is, can I consider S3 as a main storage, reliable
enough that I can only keep my static content there and don't have a
normal storage as a backup ?
S3 is pretty reliable, they provide a 99.999999999% reliability of your data. That goes down if you use their RRS (Reduced Redundancy Storage), but if you want to use it you probably want to back up your data in a non RRS bucket anyways. Anyhow, if it's extremely critical data, you are more than free to backup your data somewhere else just in case.
Notice in the diagram that they also recommend using CloudFront for your static files and this is especially useful if your users will be accessing your application from different geographical areas.
Hope this helps.

Simple and Scalable Non-Hosted alternative to Amazon S3

I'm giving users the ability to attach images, videos, audio clips and other file attachments inside an existing web application. Some installs of our product have thousands of users so the volume of data will get very high.
Amazon S3 is the obvious solution but, due to legal reasons, cannot always be used. I need a solution that my customers can host themselves.
I'm therefore looking to build or adopt a file storage system with the following traits:
Not hosted. Must be installable on my customers' Windows servers.
Horizontally scalable to terabytes of storage.
Similar operation to S3 such that I can make both approaches part of my product.
Proven architecture
I've seen several suggestions for this on StackOverflow and other forums (Eucalyptus Walrus, Hadoop HDFS, MongoDB + GridFS, CouchDB, MogileFS) but couldn't find enough information to identify one as simple and proven.
I have experience with CouchDB and would jump on it if I could be sure that it'll do well with terabytes of video files but I haven't found a good success story to lean on.
The closest open source project is OpenStack swift (https://github.com/openstack/swift). It power's RackSpace CloudFiles.
While S3 design is not exactly known OpenStack is built to give very similar functionality.
It is scale-out, have no single point of failure or bottlenecks.

Resources