Thanks for any help and suggestions.
So I have an Amazon EC2 instance (an m3.medium, which I have suddenly realized gives me only 4 GB of storage) running Wowza server for live streaming audio and audio/video on demand. Everything seems to be going swimmingly; we have been live for a week now.
Every day we have close to 80 people listening to the live stream, and at any given time that usually falls to 10-20 concurrent users listening to archived streams. We hope to grow this number in time.
We have the live/record app and the vod app, which we use for streaming and vod/aod respectively. After a streamed show is done, it saves the file to the content folder, as you know.
So I was cruising the file system, checking out the content folder, and realized that eventually this folder is going to fill up. I'm curious how people handle this part of streaming: the storage. Keeping everything on the instance definitely feels like the easiest route, though I know the perils involved in keeping all of these files there.
These sorts of files need to remain available in perpetuity, and they add up space-wise, so how do people commonly store them?
I briefly tried to mount an S3 bucket and it just didn't work for me. I'm sure that can be fixed, but I kept reading that it's not recommended to write to or stream from S3.
Thanks... any info and leads are a big help. Total newbie here, and surprised I even made it this far.
I'll probably need to start a new instance and transfer everything over, unless there is a way for me to attach more storage to the existing instance.
Thanks.
S3 is going to be your best option. But S3 is not a block storage device like an EBS volume, so you can't really open a file on it and write to it like a stream.
Instead, you need to break your files apart (on some breakpoint that works for you) and upload each file to S3 after it's been completed. With that, you can have users download directly from S3 and take load off your current instance.
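For example, here's a minimal sketch of that upload step using boto3; the local path, bucket, and key names are placeholders, not anything from your setup:

import boto3

s3 = boto3.client("s3")

# Once Wowza has finished writing a recording to the content folder,
# push the completed file to S3. upload_file switches to multipart
# uploads automatically for large files.
s3.upload_file(
    "/usr/local/WowzaStreamingEngine/content/show.mp4",  # hypothetical local path
    "my-archive-bucket",                                 # hypothetical bucket
    "archives/show.mp4",                                 # key users will fetch
)

Once the object is in S3, listeners can fetch it straight from there (as a public object or via a signed URL), so the instance's small disk only ever has to hold the show currently being recorded.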
Related
My question might sound weird if this is not possible, but just bear with me.
I would like to know if it is possible to download stuff to a cloud storage (generally) like you can to your local storage.
I want to build a small bot that can download media (pdf, video, audio, ...) and send it to me. As of now, it downloads the file to my local storage before sending it. However, when I host it (I plan to do so on a free service, since it's small), I suspect that might not be possible, and even if it were, the storage for the app itself would be too small to accommodate more than a few files.
As such, I want to use a cloud service of some sort as an intermediary where I can download the file before sending it. But I don't know if that is possible or even makes sense.
After looking around, I have learnt of some cloud-to-cloud services that can directly extract data from a link and store it in my cloud.
However, this is not applicable in my case, since some modifications have to be made to the files before sending. For example, I have some code below that downloads the audio from a YouTube video:
from pytube import YouTube
import os

def download(url: str) -> str:
    yt = YouTube(url)
    # Grab the first audio-only stream available for the video
    video = yt.streams.filter(only_audio=True).first()
    out_file = video.download(output_path="./music")
    # Rename the downloaded file to have an .mp3 extension
    base, ext = os.path.splitext(out_file)
    new_file = base + '.mp3'
    os.rename(out_file, new_file)
    return new_file
As in this case, I only want to download the audio of the video. So my question is: is it possible for me to download to some cloud storage the same way I would download to my local storage, as in out_file = video.download(output_path="path/to/some/cloud/storage")?
Thanks for helping out! :)
If you were dealing with a straightforward file download, then maybe. You'd need the cloud storage service to have an API which would let you specify a URL to pull a file from. The specifics would depend on which cloud storage service you were using.
You aren't dealing with a straightforward file download, though. You're pulling a stream from YouTube and stitching the pieces together before saving it as a single file.
There's no way to do that sort of manipulation without pulling it down to the machine the program is running on.
It is possible to treat some cloud storage systems as if they were local directories, either by using software which synchronises a local directory with the cloud (which is pretty standard for the official clients) or software which moves files up and down on demand (such as s3fs). The latter seems to fit most of your needs, but might have performance issues. That said, both of these are scuppered for you, since you intend to use free hosting which wouldn't allow you to install such software.
You are going to have to store the files locally.
To avoid running into capacity limits, you can upload them to whatever cloud storage service you are using (via whatever API it provides for that) as soon as they are downloaded … and then delete the local copy.
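As a rough sketch of that flow, assuming S3 as the cloud storage and boto3 as the client (the bucket name and helper are made up for illustration):

import os
import boto3

def ship_to_cloud(local_path: str, bucket: str = "my-bot-bucket") -> str:
    """Upload a finished file to S3, then free the local disk."""
    s3 = boto3.client("s3")
    key = os.path.basename(local_path)
    s3.upload_file(local_path, bucket, key)  # push the completed file up
    os.remove(local_path)                    # delete the local copy right away
    return key

# e.g. chained onto the download() function above:
# key = ship_to_cloud(download("https://www.youtube.com/watch?v=..."))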
We need to provide large files to clients on a daily or weekly basis. Currently our process is this:
Internal process creates the file and places it in a specific folder
Our client connects via SFTP and downloads the file
This works well when the files are small. As they get bigger (50-100 GB in size), we keep running into network interruptions and internal disk space issues.
What I'd like to see is the following:
Our internal process creates the file.
This file is copied to an intermediary service (similar to something like FileDropper).
Our client will download the file from this intermediary service.
I'd like to know if other people have had similar issues and what solutions they have in place. FileDropper works great for non-business files, but obviously I won't be putting client data on there. We also have an Office 365 subscription; I tried to see what I could use with that, but I haven't found anything yet that would help solve this.
Any hints, suggestions or feedback is much appreciated!
Consider Amazon S3.
I have used it several times in the past, and it is very reliable both for processing a lot of files and for processing large files.
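Here's a minimal sketch of how that could look with boto3; the bucket, key, and paths are assumptions for illustration. upload_file transparently uses multipart uploads, which helps with 50-100 GB files, and a presigned URL gives the client a time-limited download link in place of SFTP:

import boto3

s3 = boto3.client("s3")
bucket, key = "client-deliveries", "exports/weekly.zip"  # hypothetical names

# Multipart upload is handled automatically for large files.
s3.upload_file("/data/exports/weekly.zip", bucket, key)

# Hand the client a time-limited link instead of SFTP access.
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": bucket, "Key": key},
    ExpiresIn=7 * 24 * 3600,  # link valid for one week
)
print(url)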
I'm currently using Amazon S3 to host all static content on my site. The site has a lot of static files, so I need an automated way of syncing the files on my localhost with the remote files. I currently do this with s3cmd's sync feature, which works wonderfully. Whenever I run my deploy script, only the files that have changed are uploaded, and any files that have been deleted are also deleted in S3.
I'd like to try Rackspace Cloud Files; however, I can't seem to find anything that offers the same functionality. Is there any way to accomplish this on Rackspace Cloud Files short of writing my own syncing utility? It needs to have a command-line interface and work on OS X.
The pyrax SDK for the Rackspace Cloud has the sync_folder_to_container() method for Cloud Files, which sounds like what you're looking for. It will only upload new/changed files, and will optionally remove files from the cloud that are deleted locally.
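A minimal sketch, assuming pyrax is installed; the credentials, container name, and local path are placeholders:

import pyrax

pyrax.set_setting("identity_type", "rackspace")
pyrax.set_credentials("my_username", "my_api_key")  # hypothetical credentials

cf = pyrax.cloudfiles
container = cf.create_container("static-assets")  # returned if it already exists

# Uploads only new/changed files; delete=True also removes remote objects
# whose local counterparts no longer exist.
cf.sync_folder_to_container("/path/to/local/static", container, delete=True)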
As far as the initial upload is concerned, I usually use eventlet to upload files in as asynchronous a manner as possible. The total time will still be limited by your upload speed (I don't know of any SDK that can get around that), but the non-blocking code will certainly help overall performance.
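For instance, a rough sketch of that pattern with eventlet; the pool size, credentials, container, and file list are all made up for illustration:

import eventlet
eventlet.monkey_patch()  # make network I/O cooperative before importing pyrax

import pyrax

pyrax.set_setting("identity_type", "rackspace")
pyrax.set_credentials("my_username", "my_api_key")  # hypothetical credentials
cf = pyrax.cloudfiles
container = cf.create_container("static-assets")    # hypothetical container

pool = eventlet.GreenPool(size=10)  # up to 10 concurrent uploads
for path in ["css/site.css", "js/app.js", "img/logo.png"]:  # hypothetical files
    pool.spawn_n(cf.upload_file, container, path)
pool.waitall()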
If you have any other questions, feel free to ask here or on the GitHub page.
-- Ed Leafe
The Rackspace Python SDK can do that for you. There's a script called cf_pyrax.py that does, more or less, what I think you're trying to do. There's a write-up on it in this blog post.
I have a ton of video in this app I'm designing, and I want to come in under the 20 MB limit for the initial download, but I would love to have the thing work for folks when they don't have access to WiFi/3G. I want the video to stream in and download the first time it's viewed, but then be saved and played from the local file every time afterward.
Anybody done something like this before? Thanks!
A basic approach to caching could work as follows: when you want to play a video, first check whether it is in the cache. If it is there already, load it from the cache; otherwise, download it and save it to the cache. You can use the video URI (or its hash) as the file name when you save it to storage. Implementing the file operations won't be a big problem, as there are lots of resources out there to help you. You might want to create a utility class to handle all the caching operations.
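The lookup logic is the same in any language; here's the pattern sketched in Python for brevity (the cache directory and hashing choice are assumptions), which maps directly onto the file APIs of whatever platform you're building on:

import hashlib
import os
import urllib.request

CACHE_DIR = "./video-cache"  # hypothetical cache location

def cached_video_path(uri: str) -> str:
    os.makedirs(CACHE_DIR, exist_ok=True)
    # Hash the URI so it's always safe to use as a file name.
    name = hashlib.sha1(uri.encode()).hexdigest()
    path = os.path.join(CACHE_DIR, name)
    if not os.path.exists(path):              # cache miss: download once
        urllib.request.urlretrieve(uri, path)
    return path                               # play from the local file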
We're working on developing user widgets that our members can embed on their websites and blogs. To reduce the load on our servers we'd like to be able to compile the necessary data file for each user ahead of time and store it on our Amazon S3 account.
While that sounds simple enough, I'm hoping there might be a way for S3 to automatically ping our script if a file is requested that for some reason doesn't exist (say, for instance, if it failed to upload properly). Basically, we want Amazon S3 to act as our cache and notify a script on a cache miss. I don't believe Amazon provides a way to do this directly, but I was hoping that some hacker out there could come up with a smart way to accomplish this (such as mod_rewrite, a hash table, etc.).
Thanks in advance for your help & advice!
Amazon doesn't currently support this, but it looks like it might be coming. For now, what you could do is enable logging and parse the log for 404s once an hour or so.
It's certainly not instant, but it would prevent long-term 404s and give you some visibility about what files are missing.
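As a rough sketch of that hourly job (the log path is a placeholder, and the field positions follow the documented space-delimited S3 server access log format, in which the request URI is quoted):

import shlex

def missing_keys(log_path: str) -> set:
    """Collect object keys that returned HTTP 404 in an S3 access log."""
    misses = set()
    with open(log_path) as f:
        for line in f:
            fields = shlex.split(line)
            # shlex keeps the quoted "GET /key HTTP/1.1" request as one
            # field; the bracketed timestamp still splits in two, leaving
            # the object key at index 8 and the HTTP status at index 10.
            if len(fields) > 10 and fields[10] == "404":
                misses.add(fields[8])
    return misses

# Feed the result to whatever script regenerates and re-uploads the files.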