No permanent filesystem for Heroku? - heroku

The app I am currently hosting on Heroku allows users to submit photos. Initially, I was thinking about storing those photos on the filesystem, as storing them in the database is apparently bad practice.
However, it seems there is no permanent filesystem on Heroku, only an ephemeral one. Is this true and, if so, what are my options with regard to storing photos and other files?

It is true. Heroku runs your app on dynos, which are disposable containers started from a compiled copy of your app (the "slug") and which can be replicated many times on Amazon's EC2 (that's why scaling is so easy with Heroku). Each dyno gets its own ephemeral filesystem. When you push a new version of your app, the slug is recompiled and the dynos are replaced, so any files saved to the filesystem by the previous dynos are lost. The same happens whenever a dyno restarts, which Heroku does at least once a day.
Your best bet (whether on Heroku or otherwise) is to save user-submitted photos to external object storage and serve them through a CDN. Since you are on Heroku, and Heroku runs on AWS, I'd recommend Amazon S3, optionally with CloudFront in front of it.
This is beneficial not only because it gets around Heroku's ephemeral "limitation", but also because a CDN serves files much faster than your app can, which gives your users a better experience.

Depending on the technology you're using, your best bet is likely to stream the uploads to S3 (Amazon's storage service). You can interact with S3 with a client library to make it simple to post and retrieve the files. Boto is an example client library for Python - they exist for all popular languages.
Another thing to keep in mind is that Heroku filesystems are not shared between dynos either. This means the file has to be sent to S3 by the same dyno that handles the upload (rather than, say, by a worker process, which runs on a different dyno and cannot see the file). If you can, keep the upload in memory, never write it to disk, and post it directly to S3. This will speed up your uploads.
Because Heroku is hosted on AWS, transfers from a dyno to S3 are very fast. Keep that in mind when you're developing locally, where the same uploads will be noticeably slower.
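As a rough illustration of the "keep it in memory and post straight to S3" advice, here is a minimal Python sketch. It assumes the request body is already available as bytes and that `s3_client` is a boto3 S3 client (for example from `boto3.client("s3")`); the function name, bucket, and key are hypothetical and not from the original answer.

```python
import io


def upload_photo(s3_client, bucket, key, data, content_type="image/jpeg"):
    """Send an upload held in memory to S3 without touching the local disk.

    s3_client is assumed to be a boto3 S3 client; bucket and key are
    placeholder names chosen by the caller.
    """
    buffer = io.BytesIO(data)  # file-like wrapper around the uploaded bytes
    s3_client.upload_fileobj(
        buffer,
        bucket,
        key,
        ExtraArgs={"ContentType": content_type},  # stored as object metadata
    )
    return key  # keep this in the database instead of the file itself
```

The web handler would call this with the bytes it received and then store only the returned key in the database, so nothing depends on the dyno's filesystem.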

Related

Umbraco 8 application with load balancing

I have an Umbraco 8.2.2 application which is hosted on an AWS EC2 server.
Recently, I have run into server availability issues that cause occasional downtime.
One of the solutions I've thought about is to maintain an additional AWS EC2 server which hosts the same application (same code, same database) and configure load balancing between them.
It will host both client and server.
To what extent is this possible, in your experience?
How can I handle obstacles like shared media & cache folders, as they should be the same?
I've heard about S3 as an option.
What additional obstacles may I face, and what should I put my focus on?
Thanks.
This sounds like a good use case for Amazon EFS, which offers you a shared POSIX file system. You can move your media and cache folders onto an EFS share and then mount that share on each of the backend EC2 instances behind the load balancer. This solution requires few or no changes to the application itself; you are only changing where certain files are stored.
As for obstacles, EFS is a network filesystem, so it is generally not recommended to execute code from your EFS share or to use it for applications that require very low storage latency. If that's the case, you can consider Amazon FSx, but that's a very expensive solution.
If you can't avoid executing your code from EFS, just try it out and see how it affects your application's performance. EFS works fine for plenty of web application use cases. Here is a tutorial on how to host a simple website using EFS behind a load balanced environment to get you started.
If EFS is not an option, then you could try to offload your static content to Amazon S3 and serve it through CloudFront. This is probably a cheaper option and offloads a lot of traffic from your load balancer and EC2 instances, but it is also probably more work because you have to refactor your application to serve your content through CloudFront. Here is a tutorial (there are plenty more online) on how to create a static website that serves content through CloudFront. In your case, you would serve the content (i.e. your media files) through S3/CloudFront and then update the links used in your application to retrieve that content from the CloudFront endpoint instead of directly from your application/load balancer endpoint. So the work falls on two fronts: setting up the S3/CloudFront environment, and configuring your application to offload the content to S3 and serve it through CloudFront.
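The "update the links" step can be sketched as a small helper that turns a site-relative media path into a CDN URL. This is only an illustration in Python (an Umbraco site would do the equivalent in .NET); the distribution domain and the function name are hypothetical.

```python
from urllib.parse import quote

# Hypothetical CloudFront distribution domain; replace with your own.
CDN_BASE = "https://d111111abcdef8.cloudfront.net"


def media_url(path, cdn_base=CDN_BASE):
    """Map a site-relative media path to the same path on the CDN."""
    clean = path.lstrip("/")  # avoid a double slash after the domain
    # quote() percent-encodes spaces and similar characters but keeps "/".
    return f"{cdn_base.rstrip('/')}/{quote(clean)}"


print(media_url("/media/1001/team photo.jpg"))
# https://d111111abcdef8.cloudfront.net/media/1001/team%20photo.jpg
```

Keeping the base URL in one place means templates call the helper rather than hard-coding either endpoint, so switching back to the load balancer is a one-line change.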

Has anyone ever used Felix Cloud Storage on heroku?

Felix Cloud Storage is a Heroku add-on which lets you store files on AWS S3.
On the free tier you get up to 100 GB-Mo of shared storage (monthly).
It looks like there is enough space on the shared bucket, so I should be able to upload at least one file.
The issue I have is that when I try to create a shared space, it will throw an error:
I wanted to contact the Felix support team, but I could not find any contact information. I was wondering whether anyone has used felix-cloud in their Heroku app and, if so, what you did differently.
This is a very rare and special case where using AWS S3 or a similar service directly for file storage is not feasible; the storage has to be provided through a Heroku add-on.

Persistent volume with Heroku Dynos

My Java application is built around an embedded database which stores its data directly on disk. I understand that Heroku by default is built on an ephemeral filesystem and that anything stored on it is lost when dynos restart.
What is the workaround to make such an application available and deployed in Heroku?
"I understand that Heroku by default is built on an ephemeral filesystem"
This isn't "by default". It's fundamental to Heroku's architecture and cannot be changed. Heroku is designed to be trivially horizontally scalable, and part of that design is that state should exist apart from any one dyno. Dynos are disposable.
"What is the workaround to make such an application available and deployed in Heroku?"
As far as I know, none exists. Either change how you save your data or choose another host.
(You might be able to mount a shared persistent filesystem on your dynos, but that's awkward and undermines Heroku's architecture. I don't advise it. None of Heroku's official add-ons provides a persistent filesystem, and a quick search finds a few blog posts outlining attempts, but I don't see any successes.)

Upload videos to Amazon S3 using ruby with sinatra

I am building an Android app which has a backend written in Ruby/Sinatra. The data from the Android app arrives as JSON.
The database being used is MongoDB.
I am able to receive the data on the backend. Now what I want to do is upload to Amazon S3 a video that the Android app sends as a byte array.
I also want to store the video in the form of a string in the local database.
I have been using the carrierwave, fog and carrierwave-mongoid gems but didn't have any luck.
These are some of the blogs I followed:
https://blog.engineyard.com/2011/a-gentle-introduction-to-carrierwave/
http://www.javahabit.com/2012/06/03/saving-files-in-amazon-s3-using-carrierwave-and-fog-gem/
If someone could guide me on how to go about this specifically with Sinatra and MongoDB, that would help, because that's where I am facing the main issue.
You might think about using the AWS SDK for Android to upload directly to S3, so that your app server thread doesn't get stuck while a user is uploading a file. If you are using a service like Heroku, you would be paying extra $$$ just because your user had a lousy connection.
However, in this scenario:
Uploading to S3 should be straightforward once you have your uploader mounted using carrierwave.
You should never store your video in the database, as it will slow you down! DBs are not optimised for files; OSs are. Video is binary data and cannot be stored as text as-is; you would need a blob type if you want to do this crime.
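To put a number on the "video as a string" idea: binary data has to be text-encoded first, and base64 is a common choice (an assumption here; the question does not say which encoding was intended). A small Python sketch of the size cost, using zero bytes as a stand-in for real video data:

```python
import base64


def base64_length(raw_size):
    """Length of the base64 text produced for raw_size bytes of input."""
    # Base64 emits 4 characters per 3 input bytes, rounding up to a full group.
    return 4 * ((raw_size + 2) // 3)


video = bytes(3_000_000)  # stand-in for 3 MB of binary video data
encoded = base64.b64encode(video)

print(len(video), len(encoded))  # 3000000 4000000
assert len(encoded) == base64_length(len(video))
```

So the string form is about a third larger than the original before the database does anything with it, which is one more reason to keep only the S3 key or URL in the database.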
IMO, uploading to S3 is good enough, as you can then use the Amazon CloudFront CDN to copy and distribute your content in a more optimised way.
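One way to get the direct-to-S3 upload mentioned at the start of this answer without shipping AWS credentials in the app is a presigned URL. That is a different technique from the AWS SDK for Android approach above: the backend asks S3 for a short-lived URL and the client does a plain HTTP PUT to it. A minimal sketch in Python (the Ruby AWS SDK offers an equivalent), assuming `s3_client` is a boto3 S3 client; the function name, bucket, and key are hypothetical.

```python
def presigned_upload_url(s3_client, bucket, key, expires_in=900):
    """Return a time-limited URL that accepts an HTTP PUT for this one key.

    s3_client is assumed to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    return s3_client.generate_presigned_url(
        "put_object",                          # S3 operation the URL authorises
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_in,                  # validity in seconds
    )
```

The backend would return this URL in its JSON response; the app uploads the video bytes to it, and the server only ever handles the small metadata request.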

Do I need Amazon's EC2, Cloudfront, RDS?

I want to publish a web site on Amazon's servers, that:
Runs CakePHP
Uses MySQL to store data
Lets users upload audio through flash (currently using a hosted Flash Media Server), and listen to the files later
Do I need Amazon's EC2 for the website, RDS for the MySQL database, and CloudFront for the FMS? I'd really like a walkthrough of which services I should use.
Thanks.
First of all, you need the EC2 service in order to have a virtual machine where you can install Apache, PHP and your web application.
Then you also need a database server and a data repository for the media files. The recommended way is exactly what you suggest: RDS for MySQL and CloudFront (backed by S3) for the media files.
Initially none of the above services (RDS, CloudFront and even EBS) were available. Developers had no reliable way to run a MySQL database, because even if it was installed on an EC2 instance, the instance wasn't guaranteed to stay up and running, and if the instance was lost, the data was lost too. For this reason EBS was introduced. It provides mounted storage with guaranteed persistence that you can access from the EC2 instance. Theoretically you could install MySQL there and also use it to store the Flash files. If you only want to serve files over HTTP, there is no problem using EBS.
CloudFront however has some advantages:
Users are automatically routed to the nearest edge location for high performance delivery of your content.
You can also use it to stream content through the RTMP protocol.
You don't have to worry about the size of the storage. With EBS you create a volume with a specific size. This could be a problem if you later find out that you need more storage. With CloudFront the files are stored in S3 and you do not need to worry about their size.
You do not waste web server capacity. If you use EBS, the files will be served by the server in EC2.
You could also use S3 on its own, but you wouldn't be able to use the RTMP protocol and you would need to manually create links to your files. Also, it wouldn't be possible to use your domain name for the files.
RDS also has some advantages over installing MySQL in EC2, EBS:
Automated database backups
You can monitor your database with Amazon CloudWatch (free service)
You need EC2 to launch an instance and create your LAMP server. RDS is good if you don't want to manage the MySQL database yourself, but one limiting factor of RDS is that you can't have DB replication.
For persistent storage, you can use EBS or S3 for data files.
One thing not mentioned in any of these replies is the security that may (or may not) need to go around your file access. Cloud networks are good for publicly accessible data, but I haven't seen a cloud network yet that will provide a granular level of file access on a per-user basis. While you may be able to obfuscate the URLs used to access files so that it isn't easy to sequentially guess audio file IDs, that may not be enough if people are keeping private audio. Not saying don't do it, just make the decision with care.
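A common step beyond obfuscated URLs is a signed, expiring link: the server signs the path, the user, and an expiry time with a secret, and refuses any request whose signature does not match. The Python sketch below shows the generic HMAC technique only. It is not the API of any particular cloud service (S3 presigned URLs and CloudFront signed URLs are managed features along similar lines), and the secret and function names are hypothetical.

```python
import hashlib
import hmac
import time

SECRET = b"change-me"  # hypothetical server-side secret, never sent to clients


def sign(path, user_id, expires, secret=SECRET):
    """HMAC-SHA256 over the path, the user, and the expiry timestamp."""
    message = f"{path}|{user_id}|{expires}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def make_token(path, user_id, ttl=300, now=None, secret=SECRET):
    """Return (expires, signature) for a link valid for ttl seconds."""
    issued = int(now if now is not None else time.time())
    expires = issued + ttl
    return expires, sign(path, user_id, expires, secret)


def is_valid(path, user_id, expires, signature, now=None, secret=SECRET):
    """Accept only an unexpired link whose signature matches this user and path."""
    current = int(now if now is not None else time.time())
    if current > int(expires):
        return False
    expected = sign(path, user_id, int(expires), secret)
    return hmac.compare_digest(expected, signature)  # constant-time comparison
```

The expiry and signature would travel as query parameters on the file link. Because the user ID is part of the signed message, a link issued to one user fails validation when presented for another, even if the file ID is guessed.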
