How to pipe an upload stream of large files to Azure Blob storage from a Node.js app on Azure Websites?

I am building a Video service using Azure Media Services and Node.js
Everything went semi-fine until now; however, when I deployed the app to Azure Web Apps for hosting, any large upload fails with a 404.13 error.
Yes, I know about maxAllowedContentLength, and no, that is NOT a solution. It only goes up to 4 GB, which is pathetic: even HDR environment maps can easily exceed that these days. I need to enable users to upload files up to 150 GB in size. However, when Azure Web Apps receives a multipart request, it appears to buffer it into memory until it hits a certain threshold of either bytes or seconds (at which point it returns me a 404.13, or a 502 if my connection is slow) BEFORE running any of my server logic.
I tried the Transfer-Encoding: chunked header in the server code, but even if that would help, it doesn't matter, since Web Apps never lets my code run in the first place.
For the record: I am using Sails.js on the backend, and Skipper is handling the stream piping to the Azure Blob Service. Localhost obviously works just fine regardless of file size. I made a duplicate of this question on the MSDN forums, but those are as slow as always; you can go there to see what I have found so far.
Client-side, I am using Ajax FormData to serialize the fields (one text field and one file) and send them, using the progress event to track upload progress.
Is there ANY way to make this work? I just want it to let my server-side logic handle the data stream without buffering the bloody thing.
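For reference, here is a minimal sketch of the client-side code described above; the endpoint path and field names are made up, not my actual ones:

```typescript
// Minimal sketch of the FormData upload with a progress event handler.
// "/video/upload" and the field names are placeholders.
function uploadVideo(file: File, title: string): void {
  const form = new FormData();
  form.append("title", title);
  form.append("video", file);

  const xhr = new XMLHttpRequest();
  // Track upload progress so the UI can show a percentage.
  xhr.upload.addEventListener("progress", (e: ProgressEvent) => {
    if (e.lengthComputable) {
      console.log(`Uploaded ${Math.round((e.loaded / e.total) * 100)}%`);
    }
  });
  xhr.addEventListener("load", () => console.log("Done, status:", xhr.status));
  xhr.open("POST", "/video/upload");
  xhr.send(form);
}
```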

Rather than running all this data through your web application, you would be better off having your clients upload directly to a container in your Azure blob storage account.
You will need to enable CORS on your Azure Storage account to support this. Then, in your web application, when a user needs to upload data you would instead generate a SAS token for the storage account container you want the client to upload to and return that to the client. The client would then use the SAS token to upload the file into your storage account.
On the back end, you could fire off a WebJob to do whatever processing you need on the file after it has been uploaded.
Further details and sample AJAX code for doing this are available in this blog post from the Azure Storage team.
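As a rough sketch of the server-side piece, generating a short-lived container SAS could look roughly like this with the current @azure/storage-blob package (a newer SDK than existed when this was asked); the account name, key, and container name are placeholders:

```typescript
import {
  StorageSharedKeyCredential,
  generateBlobSASQueryParameters,
  ContainerSASPermissions,
} from "@azure/storage-blob";

// Placeholder credentials; in practice these come from configuration.
const accountName = "myaccount";
const accountKey = process.env.AZURE_STORAGE_KEY!;
const credential = new StorageSharedKeyCredential(accountName, accountKey);

// Create a short-lived SAS that only allows creating/writing blobs in one container.
function createUploadSas(containerName: string): string {
  const sas = generateBlobSASQueryParameters(
    {
      containerName,
      permissions: ContainerSASPermissions.parse("cw"), // create + write only
      expiresOn: new Date(Date.now() + 60 * 60 * 1000), // valid for one hour
    },
    credential
  );
  return sas.toString(); // append as a query string to the container or blob URL on the client
}
```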

Related

Shared key for authenticating file uploads in a separate server

In my current project I need to be able to upload images, save them in a store like S3, and perform some operations (resizing, etc.) before saving them. I'm still trying to grasp how this should be done.
I was thinking of creating a separate server to handle the image processing and uploads in order to lower the load on my main application server. I don't know whether I should do this, or whether I'm just solving an imaginary scalability problem.
Anyway, I need a way to restrict uploads to the image server. Since I don't need to distribute keys, I was thinking of creating a shared secret between the application server and the image server. This secret would be used to create tokens that would be given to clients to let them upload images for a limited amount of time.
If my endpoint on the image server receives the shared secret for authenticating and creating the upload token, is that sufficient for security?
Is it enough to have both servers on HTTPS to ensure there is no way to steal the secret in a man-in-the-middle attack?
I may have some misconceptions here about security and cryptography, but I would be really glad if someone could help me out or point me to some reading that would be good for this case.
Thank you!
To my understanding, you want to allow users to upload images to your site, and once uploaded you would like to process them in some way (resizing, etc.). If that is correct, here is the workflow I suggest.
Have your main application server create a pre-signed URL for the S3 upload. Send this pre-signed URL to the client, and the client will be able to upload to S3 directly. Once the upload is done, do the processing in the background with a worker (fetch from S3, process, upload the result). A sketch of the pre-signed URL generation is included at the end of this answer.
This way, you avoid having to deal with your own upload tokens. S3 also supports time limits on pre-signed URLs.
I was thinking of creating a separate server to handle the image processing and uploads in order to lower the load on my main application server. I don't know whether I should do this, or whether I'm just solving an imaginary scalability problem.
It makes sense to separate the logic in this case. I wouldn't create a new server, just a background process for the image processing. If you need to scale, you can always spin up multiple such background processes.
If my endpoint on the image server receives the shared secret for authenticating and creating the upload token, is that sufficient for security?
As I suggested above, use S3 for receiving the image upload. Clients will upload to S3 directly, and it will deal with security for you.
Is it enough to have both servers on HTTPS to ensure there is no way to steal the secret in a man-in-the-middle attack?
HTTPS provides confidentiality, which is enough to prevent man-in-the-middle sniffing of your traffic. With AWS's pre-signed URLs you also never send the secret itself over the wire, only a derived signature. Here is an intro to the topic, and Amazon's own documentation has more info.
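For illustration, pre-signed URL generation on the application server could look roughly like this with the current AWS SDK for JavaScript (v3); the bucket name and expiry are placeholders:

```typescript
import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";
import { getSignedUrl } from "@aws-sdk/s3-request-presigner";

const s3 = new S3Client({ region: "us-east-1" });

// Returns a URL the client can PUT the image to, valid for 15 minutes.
async function createUploadUrl(key: string, contentType: string): Promise<string> {
  const command = new PutObjectCommand({
    Bucket: "my-image-uploads", // placeholder bucket name
    Key: key,
    ContentType: contentType,
  });
  return getSignedUrl(s3, command, { expiresIn: 900 });
}
```

The background worker then only needs the object key to fetch the original from S3, resize it, and write the processed versions back.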

How to upload huge files to a web server

I have a virtual machine on Google Cloud, and I have created a web server on this machine (Ubuntu 12.04). I will serve my website from this machine.
My website shows huge images in the JPEG 2000 format. The website also lets users upload their own images and share them with other people.
But the problem is that the images are about 1–3 GB in size, and I cannot use standard file upload methods (PHP file upload), because when the connection drops, the upload has to start over from the beginning. So I need a better way.
I am thinking about the Google Drive API. What if I create a common Google Drive account and users upload to this account from my website using the Google Drive API? Would that be a good approach?
Since you're uploading files to Drive, you can use the Upload API with uploadType=resumable.
Resumable upload: uploadType=resumable. For reliable transfer, especially important with larger files. With this method, you use a session initiating request, which optionally can include metadata. This is a good strategy to use for most applications, since it also works for smaller files at the cost of one additional HTTP request per upload.
However, do note that there is a storage limit for the account. If you want more capacity, you'll have to purchase it.
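As a rough sketch (not the official sample code), initiating a resumable upload session against the Drive v3 REST API looks roughly like this; the access token is assumed to come from your OAuth flow, and the file bytes are then sent to the returned session URI in one or more PUT requests:

```typescript
// Start a resumable upload session and return the session URI.
async function startResumableUpload(
  accessToken: string,
  fileName: string,
  mimeType: string
): Promise<string> {
  const res = await fetch(
    "https://www.googleapis.com/upload/drive/v3/files?uploadType=resumable",
    {
      method: "POST",
      headers: {
        Authorization: `Bearer ${accessToken}`,
        "Content-Type": "application/json; charset=UTF-8",
      },
      body: JSON.stringify({ name: fileName, mimeType }), // file metadata
    }
  );
  // The Location header holds the session URI to which the chunks are PUT.
  const sessionUri = res.headers.get("Location");
  if (!sessionUri) throw new Error("No resumable session URI returned");
  return sessionUri;
}
```

If a chunk transfer fails, the client can query the session URI to find out how many bytes arrived and resume from there, which is exactly what avoids restarting 1–3 GB uploads from scratch.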

Upload videos to Amazon S3 using Ruby with Sinatra

I am building an Android app which has a backend written in Ruby/Sinatra. The data from the Android app comes in as JSON.
The database being used is MongoDB.
I am able to receive the data on the backend. Now what I want to do is upload a video to Amazon S3, sent from the Android app in the form of a byte array.
I also want to store the video as a string in the local database.
I have been using the carrierwave, fog, and carrierwave-mongoid gems, but didn't have any luck.
These are some of the blogs I followed:
https://blog.engineyard.com/2011/a-gentle-introduction-to-carrierwave/
http://www.javahabit.com/2012/06/03/saving-files-in-amazon-s3-using-carrierwave-and-fog-gem/
I would appreciate it if someone could guide me on how to go about this specifically with Sinatra and MongoDB, because that's where I am facing the main issue.
You might think about using the AWS SDK for Android to upload directly to S3, so that your app server thread doesn't get stuck while a user is uploading a file. If you are using a service like Heroku, you would be paying extra $$$ just because your user had a lousy connection.
However, in this scenario:
Uploading to S3 should be straightforward once you have your CarrierWave mount in place.
You should never store your video in the database, as it will slow you down! Databases are not optimised for files; operating systems are. Video is binary data and cannot be stored as text; you would need a blob type if you really want to commit this crime.
IMO, uploading to S3 is good enough, as you can then use the Amazon CloudFront CDN to copy and distribute your content in a more optimised way.

Saving third-party images on third-party server

I am writing a service as part of which a user chooses an image from a URL (not my domain), and later he and others can view that image.
I need to save this image to a third-party server (S3).
After a lot of wasted time I found I cannot do it from the client side due to security issues (I can't fetch the third-party image data and send it from the client side without alerting the client, which is just bad).
I also do not want to do the uploading on my own server, because I run Rails on Heroku and the workers are expensive.
So I thought of two options:
use something like transloadit.com,
or write a service on EC2 that will run over my database, find the rows where the images have not been uploaded yet, and upload them.
I decided to go with EC2 and S3 because the solution I am writing is meant for enterprise, and it seems it will sound better as part of the architecture when presented to customers.
My question is: what is the setup I need so I can access the Heroku database from an external service?
Any better ideas on how to solve this?
So you want to effectively write a worker, but instead of doing it on Heroku you want to do it on EC2? That feels like more work.
As for the database, did you see the documentation? It shows how to get the URL.
P.S. Did you not find it in the docs?
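For what it's worth, a minimal sketch of connecting to a Heroku Postgres database from an external worker, assuming you copy the DATABASE_URL shown by `heroku config` into the worker's environment (the table and column names here are made up for illustration):

```typescript
import { Pool } from "pg";

// Heroku Postgres requires SSL for external connections.
const pool = new Pool({
  connectionString: process.env.DATABASE_URL, // from `heroku config:get DATABASE_URL`
  ssl: { rejectUnauthorized: false },
});

// Hypothetical query: find rows whose images have not been mirrored to S3 yet.
async function pendingImages(): Promise<{ id: number; source_url: string }[]> {
  const result = await pool.query(
    "SELECT id, source_url FROM images WHERE uploaded_to_s3 = false"
  );
  return result.rows;
}
```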

Heroku architecture for running different applications but on the same domain

I have a unique setup, and I am trying to determine whether Heroku can accommodate it. There is so much marketing around polyglot applications, but I can only actually find one example!
My application consists of:
A website written in Django
A separate Java application, which takes files uploaded by users, parses them, and stores the data in a database
A shared database accessible by both applications
Because these user-uploaded files can be enormous, I want the uploaded file to go directly to the Java application. My preferred architecture is:
The Django-generated webpage displays the upload form.
The form does an AJAX submit to the Java application
The browser starts polling the database to see if the Java application has inserted the data
Meanwhile, the Java application does its thing with the user-uploaded file and updates the database when it's done
The Django webpage AJAX-refreshes a div with the results of the user upload once the polling mechanism sees that the upload is complete
The big issue I can't figure out here is whether I can get the Django and Java apps running either on the same set of dynos, or on different dynos but under the same domain, to avoid AJAX cross-domain issues. Does Heroku support URL-level routing? For example:
Django application available at http://www.myawesomewebsite.com
Java application available at http://www.myawesomewebsite.com/javaurl/
If this is not possible, does anyone have any ideas for workarounds? I know I could have the user upload the file to Django and have Django send the request to Java from the server side instead of the client side, but that's an awful lot of passing around of enormous files.
Thanks so much!
Heroku does not support routing by URL path. Polyglot components should exist on their own subdomains and operate in a cross-domain fashion.
As a side note: have you considered uploading directly to S3, instead of uploading to your app on Heroku, which will then (presumably) upload to S3? If you're dealing with cross-domain file uploads, this is worth considering for its high level of scalability.
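For example, a browser-side upload straight to S3 could look roughly like this, assuming your Heroku app exposes an endpoint (here called /sign-upload, a made-up name) that returns a pre-signed URL, and that the bucket's CORS rules allow PUT from your domain:

```typescript
async function uploadDirectToS3(file: File): Promise<void> {
  // Ask the app server for a pre-signed URL (endpoint name is an assumption).
  const signResponse = await fetch(`/sign-upload?name=${encodeURIComponent(file.name)}`);
  const { url } = await signResponse.json();

  // PUT the file bytes straight to S3, bypassing the Heroku dyno entirely.
  const res = await fetch(url, {
    method: "PUT",
    headers: { "Content-Type": file.type },
    body: file,
  });
  if (!res.ok) throw new Error(`Upload failed: ${res.status}`);
}
```

The Java application would then only receive a notification (or poll the bucket) with the object key, rather than the file itself.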
