Upload large files using Ruby - ruby

I'm wondering what is the best pattern to allow large files to be uploaded to a server using Ruby.
I've found Rails and Large, Large file Uploads: Looking at the alternative but it doesn't give any concrete solutions.
I don't want to use Rails since I'm working on a simple upload server that'll run in standalone mode. I'm guessing that Sinatra could be the key but I don't know which web server I should use to run it without raising a Timeout.
I also need this web server to allow simultaneous upload.
UPDATE: By "large files" I mean between 200MB and 5GB.
UPDATE2: Since those files are videos (in my case), I can deal with a max size of 2GB like youtube.

ok i am taking a bit of a strech here but:
if you would use a couchdb as a target for your uploads you would get rid of the timeout problem.
consider the couchdb as some "temp" memory in this example.
so if a downloads finishes you can take the file from the couchdb and do with it whatever you want.
i managed to upload files as big as 9gb over a dsl line into couchdb without any drama.
it may take a bit of reading but i think you could make it work.
couchdb has many rails gems so it plays nice with others ;)
let me know if you wanna go down that rabbit hole so i can give you some more pointers

passenger recommends using a separate apache/nginx module to handle uploads.

Related

Implementing "extreme" bandwidth saving for web browsing with a compression proxy

I have a network connection where I pay per megabyte, so I'm interested in reducing my bandwidth usage as far as possible while still having a reasonably good browsing experience. I use this wonderful extension (https://bandwidth-hero.com/). This extension runs a image-compression proxy on my heroku account that accepts images URLs, and returns a low-quality version of those images.This reduces bandwidth usage by 30-40% when images are loaded.
To further reduce usage, I typically browse with both JavaScript and images disabled (there are various extensions for doing this in firefox/firefox-esr/google-chrome). This has an added bonus of blocking most ads (since they usually need JavaScript to run).
For daily browsing, the most efficient solution is using a text-mode browser in a virtual console such as elinks/lynx/links2 running over ssh (with zlib compression) on a VPS server. But sometimes using JavaScript becomes necessary, as sites will not render without it .Elinks is the only text-mode browser that even tries to support JavaScript, and even that support is quite rudimentary. When I have to come back to using firefox/chrome, I find my bandwidth usage shooting up. I would like to avoid this.
I find that bandwidth is used partially to get the 'raw' html files of the sites I'm browsing, but more often for the associated .js/.css files. These are typically highly compressible. On my local workstation, html+css+javascript files typically compress by a factor of more than 10x when using lzma(2) compression.
It seems to me that one way to do drastically reduce bandwidth consumption would be to use the same template as the bandwidth-hero extension, i.e. run a compression proxy either on a vps or on my heroku account but do so for text content (.html/.js/.css).
Ideally, I would like to run a compression proxy on my local machine. When I open a site (say www.stackoverflow.com), the browser should send a request to this local proxy. This local proxy then sends a request to a back-end running on heroku/vps. The heroku/vps back-end actually fetches all the content, and compresses it (lzma/bzip/gzip). The compressed content is sent back to my local proxy. The local proxy decompresses the content and finally gives it to the browser.
There is something like this mentioned in this answer (https://stackoverflow.com/a/42505732/10690958) for node.js . I am thinking of the same for python.
From what google searches show, HTTP can "automatically" ask for gzip versions of pages. But does this also apply for the associated files that are loaded by JavaScript, and for the css files? Perhaps, what I am thinking about is already implemented by default ?
Any pointers would be welcome. I was thinking of writing a local proxy in python,as I am reasonably fluent in it. But I know little about heroku or the intricacies of HTTP.
thanks.
Update: I found a possible solution here https://github.com/barnacs/compy
which does almost exactly what I need (minify+compress with brotli/gzip+transcode jpeg/gif/png). It uses go instead of python, but that does not really matter. It also has a docker image here https://hub.docker.com/r/andrewgaul/compy/ . Since I'm not very familiar with heroku, I cant figure out how to use this to run the compression proxy service on my account. The heroku docs also weren't of much help to me. Any pointers would be welcome.

How can I compress the JSON content from CouchDB's HTTP responses?

I am making a lot of _all_docs requests to CouchDB's HTTP server. One thing I'm realizing is that the data is not compressed, so this results in large file sizes. Even by using limit and skip, the files can sometimes be 10MB each. That doesn't cause any problems for my app, but it does mean that if a connection to our CouchDB server is slower than our office connection, it will go rather slow.
Is there any way I can enable HTTP compression? I am not referring to attachments - just the JSON files.
Also, I am using Windows Server - not Linux/Unix.
Thanks!
There is no support in CouchDB directly, but it has been requested. (so voice your support there if you want this included)
That being said, there are a number of options you have. First, you can set up nginx as a reverse proxy and allow it to compress (and possibly cache) responses for you. After a quick search, I found this plugin that you install in CouchDB directly.
Another thing is that CouchDB does a pretty solid job of allowing clients to cache reliably. You can leverage this to prevent repeatedly downloading the same large resource.

Streaming specific setup for audio streaming

Thanks for any help and suggestions.
So I have an amazon ec2 instance (m3.medium which I have suddenly realized yields me 4gb of storage) running with wowza server installed for live streaming audio and audio/video on demand. Everything is running fine...things seem to be going swimmingly as we have been live for a week now.
Everyday we have close to 80 people listen in on the live stream and usually that falls to 20-10 concurrent users listening to archived streams at any given time. We hope to increase this number in time.
We have the live/record app and the vod app that we use for streaming and vod/aod respectively. After the streaming show is done it saves the file to the content folder as you know.
So I was cruising the file system checking out the content folder and thinking that eventually this folder is going to fill up and was curious how people navigate this part of streaming- the storage part. It definitely feels like the easiest route as far as storage goes, though I know of the perils involved in keeping all of these files on the instance.
For storing these sorts of files that need to remain available in perpetuity, which tend to add up space-wise, what is the manner in which people commonly do this?
I tried briefly to mount an s3 and it just didn't work for me. I'm sure that can be fixed but I kept reading that its not recommended to write or stream from s3.
Thanks..any info and leads is a big help. Total newbie here and surprised I even made it this far.
I'll probably need to start a new instance and transfer everything over to a new instance unless there is a way for me to attach more storage to the instance.
Thanks.
S3 is going to be your best option. But s3 is not a block storage device like an EBS volume. So you can really open a file on it and write to like a stream.
Instead, you need break your files apart (on some breakpoint that works for you) and upload the file to s3 after its been completed. With that you can have users directly download from S3 and remove load from your current instance.

What's a good alternative to page caching on Heroku?

I understand page caching isn't a good option on heroku since each dyno has an emepheral file system (so they wouldn't share files and it would get wiped out on each restart).
So I'm wondering what the best alternative is. I have a large amount of potential files that could get generated in a traditional page caching scenario (say 10GB-100GB) so redis/memcached don't seem like good options here. Redis can write out to disk, but my understanding is that once you exceed it's memory capacity, it's not the right solution to start reading off of disk.
Has anyone found a good solution here? I'm thinking maybe MongoStore. (And some way to run this in conjunction with redis since I'm using redis for some other scenarios.) Thanks!
If your site is 100% static content and never going to be dynamic, S3 may be a good option. You can then create a CNAME to the s3 domain. This allows you to leverage CloudFront should you need it. Otherwise, 100GB would have to go into the database, which is in turn then pulled up by your application.
Heroku's cedar stack allows for custom buildpacks. This one vendors nginx. This would be good if you envision transitioning to a more dynamic site.

Downloading large files to PC from OAS Server

We have an Oracle 10g forms application running on a Solaris OAS server, with the forms displaying in IE. Part of the application involves uploading and downloading files (Word docs and PDFs, mainly) from the PC to the OAS server, using Oracle's webutil utility.
The problem is with large files (anything over 25Megs or so), it takes a long time, sometimes many minutes. Uploading seems to work, even with large files. Downloading large files, though, will cause it to error out part way through the download.
I've been testing with a 189Meg file in our development system. Using WEBUTIL_FILE_TRANSFER.Client_To_DB (or Client_To_DB_with_Progress), the download would error out after about 24Megs. I switched to WEBUTIL_FILE_TRANSFER.URL_To_Client_With_Progress, and finally got the entire file to download, but it took 22 minutes. Doing without the progress bar got it down to 18 minutes, but that's still too long.
I can display files in the browser, and my test file displayed in about 5 seconds, but many files need to be downloaded for editing and then re-uploaded.
Any thoughts on how to accomplish this uploading and downloading faster? At this point, I'm open to almost any idea, whether it uses webutil or not. Solutions that are at least somewhat native to Oracle are preferred, but I'm opn to suggestions.
Thanks,
AndyDan
This may be totally out to lunch, but since you're looking for any thoughts that might help, here are mine.
First of all, I'm assuming that the actual editing of the files happens outside the browser, and that you're just looking for a better way to get the files back and forth.
In that case, one option I've used in the past is just to route around the web application using Apache, or any other vanilla web server you like. For downloading, create a unique file session token, remember it in the web application, and place a copy of the file, named with the token (e.g. <unique token>.doc), in a download directory visible to Apache. Then provide a link to the file that will be served via Apache.
For upload, you have a couple of options. One is to use the mechanism you've got, then when a file is uploaded, you just have to match on the token in the name to patch the file back into your archive. Alternately, you could create a very simple file upload form separate from your application that will upload the file to a temp directory via Apache, then route the user back into your application and provide the token in the URL HTTP GET-style or else in a cookie.
Before you go to all that trouble, you'll want to make sure that your vanilla web server will provide better upload and download speed and reliability than your current solution, but it should.
As an aside, I don't know whether the application server you're using provides HTTP compression, but if it does, you should make sure it's enabled and working. This is probably the best single thing you can do to increase transfer speed of large files, assuming they're fairly compressible. If your application server doesn't support it, then most any vanilla web server will.
I hope that helps.
I ended up using CLIENT_HOST to call an FTP command to download the files. My 189MB test file took 20-22 minutes to download using WEBUTIL_FILE_TRANSFER.URL_To_Client_With_Progress, and only about 20 seconds using FTP. It's not the best solution because it leaves the FTP password exposed on the PC temporarily, but only for as long as the download takes, and even then the user would have to know where to find it.
So, we're implementing this for now, and looking for a more secure but still performant long term solution.

Resources