can I 'download' to a cloud storage?

My question might sound weird if this is not possible but just bear with me.
I would like to know if it is possible to download stuff to a cloud storage (generally) like you can to your local storage.
I want to build a small bot that can download media (PDF, video, audio, ...) and send it to me. As of now, it downloads the file to my local storage before sending it. However, once I host it (I plan to use a free service since it's small), I suspect that might not be possible, and even if it were, the app's own storage would be too small to hold more than a few files.
As such, I want to use a cloud service of some sort as an intermediary where I can download the file before sending it. But I don't know if that is possible or even makes sense.
After looking around, I have learnt of some cloud-to-cloud services that can directly extract data from a link and store it in my cloud.
However, this is not applicable in my case, since some modifications will have to be made to the files before sending. For example, I have some code below that downloads the audio from a YouTube video:
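from pytube import YouTube
import os

def download(url: str) -> str:
    # Look up the video and pick the first audio-only stream
    yt = YouTube(url)
    video = yt.streams.filter(only_audio=True).first()
    # Save the stream under ./music; download() returns the saved file's path
    out_file = video.download(output_path="./music")
    # Swap the extension for .mp3 (a rename only; the audio is not re-encoded)
    base, ext = os.path.splitext(out_file)
    new_file = base + '.mp3'
    os.rename(out_file, new_file)
    return new_file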
As in this case, I only want to download the audio of the video. So my question is: is it possible for me to download to some cloud storage (the same way I would download to my local storage) ... as in out_file = video.download(output_path="path/to/some/cloud/storage")?
Thanks for helping out! :)

If you were dealing with a straightforward file download, then maybe. You'd need the cloud storage service to have an API which would let you specify a URL to pull a file from. The specifics would depend on which cloud storage service you were using.
You aren't dealing with a straightforward file download though. You're pulling a stream from YouTube and stitching the pieces together before saving it as a single file.
There's no way to do that sort of manipulation without pulling it down to the machine the program is running on.
It is possible to treat some cloud storage systems as if they were local directories, either by using software which synchronises a local directory with the cloud (which is pretty standard for the official software) or software which moves files up and down on demand (such as s3fs). The latter seems to fit most of your needs, but might have performance issues. That said, both of these are scuppered for you since you intend to use free hosting, which wouldn't allow you to install such software.
You are going to have to store the files locally.
To avoid running into capacity limits, you can upload them to whatever cloud storage service you are using (via whatever API they provide for that) as soon as they are downloaded … and then delete the local copy.
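A minimal sketch of that download, upload, delete cycle, assuming the upload step is a function you write against your storage provider's API. The names `process_and_forward`, `fetch`, and `upload` are hypothetical and used only for illustration; with pytube, `fetch` could wrap a version of the `download` function from the question that saves into the directory it is given.

```python
import os
import tempfile
from typing import Callable


def process_and_forward(
    fetch: Callable[[str], str],
    upload: Callable[[str], str],
) -> str:
    """Download into a temporary directory, upload the result, then clean up.

    fetch(directory) -> path of the file it saved inside `directory`
    upload(path)     -> identifier (URL, key, ...) of the uploaded copy
    Both callables are placeholders for your own code.
    """
    # The temporary directory and everything in it are deleted when the
    # `with` block exits, even if fetch() or upload() raises.
    with tempfile.TemporaryDirectory() as workdir:
        local_path = fetch(workdir)
        return upload(local_path)
```

Because the local copy only lives for the duration of one call, disk usage stays bounded by the largest single file rather than by the number of files handled.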

Related

Storing files in a webserver

I have a project using the MEAN stack that uploads image files to a server and saves the names of the images to the db. The images are then shown to users of the application, kind of like an image gallery.
I have been trying to figure out an efficient way of storing the image files. At the moment I'm storing them under the Angular application in a folder, /var/www/app/files.
What are the usual ways of storing them on a cloud server like DigitalOcean, Heroku and many others?
I'm a bit thrown off by the fact that they offer many options for data storage.
Let's say that hundreds of thousands of images were uploaded by the application to the server.
Saving all of them inside your front end app in a subfolder might not be the best solution? Or am I wrong about this?
I am very new to these webserver cloud services and how they actually operate.
Can someone clarify on what would be the optimal solution.
Thanks!
Saving all of them inside your front end app in a subfolder might not be the best solution?
You're very right about this. Over time this will get cluttered, and unless you use some very convoluted logic, will slow down your server.
If you're using Angular and this is in the public folder sent to every user, this is even worse.
The best solution to this is using something like an AWS S3 bucket (DigitalOcean has Spaces, its S3-compatible object storage, and I believe Heroku has something a bit different). These services offer storage of files, and essentially act as logic-less servers. You can set some privacy policies and other settings, but there's no runtime like Node.js that can do logic for you.
Your Node server (or any other server setup) interfaces with this storage server, and handles most of the fetching and storing of files. You can optionally limit these storage services so they can only communicate with your Node server, so any file traffic would be done through your Node server.
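As a rough illustration of that split (the application server decides where a file goes, the storage service only holds bytes), here is a sketch in Python rather than Node. `put_object` and `save_record` are hypothetical stand-ins for your storage client's upload call and your database insert, and the hash-based key layout is an assumption, not something the storage services require.

```python
import hashlib
import os
from typing import Callable


def make_object_key(data: bytes, original_name: str) -> str:
    """Build a storage key from the file's content hash.

    The first hash characters are used as prefixes, so keys are spread out
    instead of piling up in one flat folder.
    """
    digest = hashlib.sha256(data).hexdigest()
    ext = os.path.splitext(original_name)[1].lower()
    return f"images/{digest[:2]}/{digest[2:4]}/{digest}{ext}"


def store_image(
    data: bytes,
    original_name: str,
    put_object: Callable[[str, bytes], None],
    save_record: Callable[[str, str], None],
) -> str:
    """Upload the bytes to storage, then record only the key in the database."""
    key = make_object_key(data, original_name)
    put_object(key, data)            # e.g. a call into your storage SDK
    save_record(original_name, key)  # e.g. an insert into your images collection
    return key
```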

Streaming specific setup for audio streaming

Thanks for any help and suggestions.
So I have an Amazon EC2 instance (m3.medium, which I have suddenly realized gives me 4 GB of storage) running with Wowza server installed for live streaming audio and audio/video on demand. Everything is running fine... things seem to be going swimmingly, as we have been live for a week now.
Every day we have close to 80 people listening in on the live stream, and that usually falls to 10-20 concurrent users listening to archived streams at any given time. We hope to increase this number in time.
We have the live/record app and the vod app that we use for streaming and vod/aod respectively. After the streaming show is done it saves the file to the content folder as you know.
So I was cruising the file system checking out the content folder and thinking that eventually this folder is going to fill up and was curious how people navigate this part of streaming- the storage part. It definitely feels like the easiest route as far as storage goes, though I know of the perils involved in keeping all of these files on the instance.
For storing these sorts of files that need to remain available in perpetuity, which tend to add up space-wise, what is the manner in which people commonly do this?
I tried briefly to mount an S3 bucket and it just didn't work for me. I'm sure that can be fixed, but I kept reading that it's not recommended to write to or stream from S3.
Thanks..any info and leads is a big help. Total newbie here and surprised I even made it this far.
I'll probably need to start a new instance and transfer everything over to a new instance unless there is a way for me to attach more storage to the instance.
Thanks.
S3 is going to be your best option. But S3 is not a block storage device like an EBS volume, so you can't really open a file on it and write to it like a stream.
Instead, you need to break your files apart (on some breakpoint that works for you) and upload each file to S3 after it's been completed. With that, you can have users download directly from S3 and remove load from your current instance.
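One way to sketch the "upload after it's been completed" step is to scan the content folder periodically and ship any recording that has not been modified for a while. The quiet period used as the "finished" test, the function name, and the `upload` callable are all assumptions; with boto3, `upload` could wrap `s3_client.upload_file(path, bucket, key)`.

```python
import os
import time
from typing import Callable, List, Optional


def ship_finished_recordings(
    content_dir: str,
    upload: Callable[[str, str], None],
    quiet_seconds: float = 300.0,
    now: Optional[float] = None,
) -> List[str]:
    """Upload files untouched for `quiet_seconds`, then delete them locally.

    upload(local_path, key) is a placeholder for your storage call.
    Returns the names of the files that were shipped.
    """
    now = time.time() if now is None else now
    shipped = []
    for name in sorted(os.listdir(content_dir)):
        path = os.path.join(content_dir, name)
        if not os.path.isfile(path):
            continue
        if now - os.path.getmtime(path) < quiet_seconds:
            continue  # modified recently, so it may still be recording
        upload(path, name)  # if this raises, the local copy is kept
        os.remove(path)
        shipped.append(name)
    return shipped
```

Run from cron or a small loop, this keeps the instance's disk usage roughly constant while the archive grows in S3.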

Is FTP file sharing faster than cloud storage alternatives, e.g. Dropbox / Google Drive / MediaFire?

I work doing DCP (digital cinema packages) for trailers, the files are usually a zip of 1-2 gig.
I have just been uploading them to an FTP server on a cloud host and sending the links with a username/password. That works most of the time, but lately there have been some clients that experience timeouts while downloading and are unable to resume (the clients being local cinemas downloading the files).
I know some foreign production houses use Dropbox and similar web-based file sharing to send their big files, but I wonder if there is any alternative to FTP and web-based file sharing aside from torrents?
I have had FTP timeout issues delivering broadcast-sized media, especially with distant clients. In some cases, I use 7Zip volume split archives to deliver the large file in smaller pieces, which speeds up the overall transfer (multiple downloads at once) while preventing timeouts. The client needs to be somewhat technically inclined, as it involves using a 7z archiver like 7Zip or PeaZip. They basically download all the pieces into a single folder, and when they open the first file, it shows the whole file they can then extract (I usually go with 256MB pieces).
Here's a how-to I found real quick, but there are plenty others out there: http://www.linglom.com/it-support/how-to-split-a-large-file-using-7-zip/
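The same split-and-rejoin idea can be sketched in plain Python. Note the assumption: this cuts the file into raw byte pieces and does not produce a 7z volume archive, so the recipient would need the matching `join_files` step instead of 7-Zip. The function names and the `.001` style suffix are illustrative.

```python
import shutil
from typing import List


def split_file(path: str, piece_size: int = 256 * 1024 * 1024,
               block: int = 1024 * 1024) -> List[str]:
    """Split `path` into numbered pieces of at most `piece_size` bytes each."""
    pieces = []
    with open(path, "rb") as src:
        index = 1
        while True:
            first = src.read(min(block, piece_size))
            if not first:
                break  # nothing left to read
            piece_path = f"{path}.{index:03d}"
            with open(piece_path, "wb") as dst:
                dst.write(first)
                written = len(first)
                # Copy in small blocks so a whole piece is never held in memory.
                while written < piece_size:
                    chunk = src.read(min(block, piece_size - written))
                    if not chunk:
                        break
                    dst.write(chunk)
                    written += len(chunk)
            pieces.append(piece_path)
            index += 1
    return pieces


def join_files(pieces: List[str], out_path: str) -> None:
    """Concatenate the pieces, in the given order, back into one file."""
    with open(out_path, "wb") as dst:
        for piece in pieces:
            with open(piece, "rb") as src:
                shutil.copyfileobj(src, dst)
```

A failed download then costs the client one piece rather than the whole package, which is the point of the approach described above.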

Is there an equivalent to Amazon S3's s3cmd sync for Rackspace Cloud Files?

I'm currently using Amazon S3 to host all static content on my site. The site has a lot of static files, so I need an automated way of syncing the files on my localhost with the remote files. I currently do this with s3cmd's sync feature, which works wonderfully. Whenever I run my deploy script, only the files that have changed are uploaded, and any files that have been deleted are also deleted in S3.
I'd like to try Rackspace Cloud Files; however, I can't seem to find anything that offers the same functionality. Is there any way to accomplish this on Rackspace Cloud Files short of writing my own syncing utility? It needs to have a command-line interface and work on OS X.
The pyrax SDK for the Rackspace Cloud has the sync_folder_to_container() method for cloudfiles that sounds like what you're looking for. It will only upload new/changed files, and will optionally remove files from the cloud that are deleted locally.
As far as the initial upload is concerned, I usually use eventlet to upload files in as asynchronous a manner as possible. The total time will still be limited by your upload speeds (I don't know of any SDK that can get around that), but the non-blocking code will certainly help overall performance.
If you have any other questions, feel free to ask here or on the GitHub page.
-- Ed Leafe
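The behaviour described above (upload only new or changed files, optionally delete remote objects that no longer exist locally) boils down to comparing two listings. The sketch below shows only that comparison, using MD5 checksums; it is not pyrax's implementation, and `local_checksums`, `plan_sync`, and the shape of `remote` (object name mapped to checksum) are assumptions made for illustration.

```python
import hashlib
import os
from typing import Dict, List, Tuple


def local_checksums(folder: str) -> Dict[str, str]:
    """Map each file's path (relative to `folder`, '/'-separated) to its MD5."""
    result = {}
    for root, _dirs, files in os.walk(folder):
        for name in files:
            path = os.path.join(root, name)
            rel = os.path.relpath(path, folder).replace(os.sep, "/")
            digest = hashlib.md5()
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(65536), b""):
                    digest.update(chunk)
            result[rel] = digest.hexdigest()
    return result


def plan_sync(local: Dict[str, str], remote: Dict[str, str],
              delete: bool = False) -> Tuple[List[str], List[str]]:
    """Return (names to upload, names to delete remotely)."""
    # Upload anything missing remotely or whose checksum differs.
    to_upload = sorted(n for n, s in local.items() if remote.get(n) != s)
    # Only remove remote-only objects when the caller asks for it.
    to_delete = sorted(n for n in remote if n not in local) if delete else []
    return to_upload, to_delete
```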
The Rackspace Python SDK can do that for you. There's a script called cf_pyrax.py that does, more or less, what I think you're trying to do. There's a write up on it in this blog post.

downloading large amount of files

I'm researching solutions for a potential client. They're requesting the ability to download a large amount of MP3's (1000+) from their online catalog.
I've researched/tested building a zip containing all MP3s using ZipArchive but ran into obvious memory leak issues that have ruled that solution out.
I'm now trying to think out of the box.
One idea was to create an FTP queue or a Torrent type download link for them. Is there anything out there that can pull something like this off?
Any help or suggested direction would be greatly appreciated! Thanks!!
Edit: Here is the overall process/goal that we're trying to achieve.
The client creates music for TV/Film placement. They maintain an online catalog AND a local copy they send to potential buyers. The online catalog and the offline catalog need to mirror each other. The problem is that they have multiple offices, in many different locations, that will have to update their local copy with the new files added to the online catalog.
Example: East Coast User updates catalog with 100 new files. West Coast User needs to update the offline catalog with the new files retrieved from the online catalog.
We had hoped to create custom zips of the files each user needed to update their catalog, based on the user's download history that we'd maintain in MySQL. We were testing ZipArchive but we couldn't seem to build zips over 175 MB (give or take). We're in the process of testing ZipStreaming but are having some issues.
I hope this clears up the overall goal and problems we are facing.
GNU wget?
It can download recursively. Just give wget a list of all files on the server, e.g.
http://www.example.org/filelist.html, which contains links like file1.mp3, file2.mp3 etc. (Apache normally generates such an index automatically when a directory without an index.html/index.php in it is requested).
http://linux.die.net/man/1/wget
Frankly speaking, I can't identify the actual problem/question from your post. If you are looking to minimize network load, then you need to remember that MP3 files do not compress well because they are already compressed (not as well as possible, but well). If you are looking for a transport, then any file transfer protocol will do (FTP, SFTP, HTTP, WebDAV).
If you need flexibility and features, I'd recommend SFTP: this is a protocol for remote file system access, so besides "get file" operation it has plenty of useful operations including machine-readable directory listing (not always available in FTP and not available in standard HTTP), built-in ZLib compression, built-in possibility to resume file transfers and more bonuses. HTTP also has ZLib compression, but this one is not always available.
Update: your approach doesn't take into account what is really available on the client; you are going to prepare ZIP files based on your (possibly incorrect) knowledge of what the client already has.
If the client and server are both applications that you develop, then you should use the rsync protocol or something similar to update data online (not using any ZIP files) and download the files that are missing on the client. If direct communication between the client and the server is not possible, you can make the client send its state to the server, and the server will prepare an individual package after that.
As for ZIP functionality, it's needed only when you use batch updates (no real-time communication between the client and the server). I don't know what technology you are using, but if your only problem is with the ZIP component, you can use something else for data packing: either a different ZIP component (for .NET and VCL we have a ZIP component) or some other packing solution (for example, our SolFS product doesn't have size limits). Unfortunately I am not aware of an rsync-like implementation available as a component.
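The "client sends its state, server prepares an individual package" idea might look like this in Python (the question does not say which stack the server runs on, so treat this as a sketch of the logic rather than a drop-in). `build_update_package` and the manifest format, a collection of relative file names, are assumptions. Files are stored uncompressed because MP3s barely shrink, and `zipfile` copies each file from disk to the archive in chunks rather than building the archive in memory.

```python
import os
import zipfile
from typing import Iterable, List


def build_update_package(catalog_dir: str, client_has: Iterable[str],
                         package_path: str) -> List[str]:
    """Write a ZIP containing every catalog file the client did not report.

    `client_has` is the client's manifest: relative, '/'-separated file names.
    Returns the sorted list of names that were packaged.
    """
    have = set(client_has)
    missing = []
    for root, _dirs, files in os.walk(catalog_dir):
        for name in files:
            full = os.path.join(root, name)
            rel = os.path.relpath(full, catalog_dir).replace(os.sep, "/")
            if rel not in have:
                missing.append(rel)
    missing.sort()
    # ZIP_STORED skips compression, which saves CPU on already-compressed audio.
    with zipfile.ZipFile(package_path, "w", compression=zipfile.ZIP_STORED) as zf:
        for rel in missing:
            zf.write(os.path.join(catalog_dir, rel), arcname=rel)
    return missing
```

Driving the package from the client's reported manifest, rather than from a download history kept on the server, avoids the mismatch described in the update above.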
