I'm looking for a way to store small data packages in a temporary place in the cloud, to be sent from there later and then erased. It would also be wonderful to be able to create a cloud-side script for the sending/erasing task.
This is the scenario: data in the cloud will be sent to different places in different formats, so I need it to stay up there until the destination is assigned. When that happens, that particular piece of data will be formatted to the destination's structure and then sent.
I've taken a look at several options, and OpenKeyval might be the best one, but something seems wrong with it because it doesn't let me post anything to any new location.
Ideas?
Thanks in advance!
You could use Amazon S3 with the curl command, or, if you have more freedom to install additional tools, s3cmd gives you more power.
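If a scripting language is easier to wire in than raw curl, the same stage/send/erase cycle can be sketched with boto3, the AWS SDK for Python. This is just a minimal sketch; the bucket and key names are placeholders, not anything from the question:

import boto3

s3 = boto3.client("s3")
BUCKET = "my-staging-bucket"  # placeholder bucket name

# stage a small payload until its destination is assigned
s3.put_object(Bucket=BUCKET, Key="pending/package-001.json", Body=b'{"example": 1}')

# later: fetch it, format it for the assigned destination, send it, then erase it
obj = s3.get_object(Bucket=BUCKET, Key="pending/package-001.json")
payload = obj["Body"].read()
# ... reformat `payload` for the destination and send it ...
s3.delete_object(Bucket=BUCKET, Key="pending/package-001.json")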
I found a free and nice solution using Dropbox:
https://www.dropbox.com/developers/datastore/tutorial/http
Data can be sent and retrieved using curl. Storage lives in the cloud (no need to keep a copy on any computer) and can be retrieved from any machine. Just what I was looking for! :D
My question might sound weird if this is not possible, but just bear with me.
I would like to know if it is possible to download things to cloud storage (generally) the way you can to your local storage.
I want to build a small bot that can download media (PDF, video, audio, ...) and send it to me. As of now, it downloads the file to my local storage before sending it. However, when I host it (I plan to do so on a free service since it's small), I suspect that might not be possible, and even if it were, the storage for the app itself will be too small to accommodate more than a few files.
As such, I want to use a cloud service of some sort as an intermediary where I can download the file before sending it. But I don't know if that is possible or even makes sense.
After looking around, I have learnt of some cloud-to-cloud services that can directly extract data from a link and store it in my cloud.
However, this is not applicable in my case since some modifications will have to be made to the files before sending. For example, I have some code below that downloads the audio from a YouTube video:
from pytube import YouTube
import os

def download(URL: str) -> str:
    """Download the audio-only stream of a YouTube video and save it as .mp3."""
    yt = YouTube(URL)  # the parameter is URL, not url
    # grab the first audio-only stream available for the video
    video = yt.streams.filter(only_audio=True).first()
    out_file = video.download(output_path="./music")
    # give the downloaded file an .mp3 extension
    base, ext = os.path.splitext(out_file)
    new_file = base + '.mp3'
    os.rename(out_file, new_file)
    return new_file
As in this case, I only want to download the audio of the video. So my question is: is it possible for me to download to some cloud storage (the same way I would download to my local storage), as in out_file = video.download(output_path="path/to/some/cloud/storage")?
Thanks for helping out! :)
If you were dealing with a straightforward file download, then maybe. You'd need the cloud storage service to have an API which would let you specify a URL to pull a file from. The specifics would depend on which cloud storage service you were using.
You aren't dealing with a straightforward file download, though. You're pulling a stream from YouTube and stitching the pieces together before saving it as a single file.
There's no way to do that sort of manipulation without pulling it down to the machine the program is running on.
It is possible to treat some cloud storage systems as if they were local directories, either with software which synchronises a local directory with the cloud (which is pretty standard for the official clients) or with software which moves files up and down on demand (such as s3fs). The latter seems to fit most of your needs, but might have performance issues. That said, both of these are scuppered for you, since you intend to use free hosting which wouldn't allow you to install such software.
You are going to have to store the files locally.
To avoid running into capacity limits, you can upload them to whatever cloud storage service you are using (via whatever API they provide for that) as soon as they are downloaded … and then delete the local copy.
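For example, a minimal sketch with boto3 and S3 of that download/upload/delete cycle; the bucket name and helper function are my own placeholders, not anything from the question:

import os
import boto3

s3 = boto3.client("s3")
BUCKET = "my-bot-files"  # placeholder bucket name

def ship_to_cloud(local_path: str) -> str:
    """Upload a freshly downloaded file to S3, then remove the local copy."""
    key = os.path.basename(local_path)
    s3.upload_file(local_path, BUCKET, key)  # push the file to cloud storage
    os.remove(local_path)                    # free local disk space immediately
    return key

# usage after the pytube download shown in the question:
# key = ship_to_cloud(download("https://www.youtube.com/watch?v=..."))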
We need to regularly provide large files to clients on a daily or weekly basis. Currently our process is this:
Internal process creates the file and places it in a specific folder
Our client connects via SFTP and downloads the file
This works well when the files are small. As they get bigger (50-100 GB in size), we keep running into network interruptions and internal disk-space issues.
What I'd like to see is the following:
Our internal process creates the file.
This file is copied to an intermediary service (similar to something like FileDropper).
Our client will download the file from this intermediary service.
I'd like to know if other people have had similar issues and what possible solutions are in place. File Dropper works great for non-business-related files, but obviously I won't be putting client data on there. We also have an Office 365 subscription. I tried to see what I could use with that, but I haven't found anything yet that would help solve this.
Any hints, suggestions or feedback is much appreciated!
Consider Amazon S3.
I have used it several times in the past, and it is very reliable both for handling a lot of files and for handling large files.
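If credentials for the clients are a concern, S3 presigned URLs let them download over HTTPS without needing an AWS account. A rough sketch with boto3, assuming a bucket you create for these exports (the names below are placeholders):

import boto3

s3 = boto3.client("s3")

def share_large_file(local_path: str, bucket: str, key: str, days: int = 7) -> str:
    """Upload an export and return a time-limited download link for the client."""
    # upload_file switches to multipart uploads automatically for large files
    s3.upload_file(local_path, bucket, key)
    return s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=days * 24 * 3600,  # link validity in seconds (SigV4 caps this at 7 days)
    )

# url = share_large_file("export_2024-01-01.csv.gz", "client-exports", "acme/export.csv.gz")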
We're working on developing user widgets that our members can embed on their websites and blogs. To reduce the load on our servers we'd like to be able to compile the necessary data file for each user ahead of time and store it on our Amazon S3 account.
While that sounds simple enough, I'm hoping there might be a way for S3 to automatically ping our script if a file is requested that for some reason doesn't exist (say, for instance, if it failed to upload properly). Basically we want Amazon S3 to act as our cache and to notify a script on a cache miss. I don't believe Amazon provides a way to do this directly, but I was hoping that some hacker out there could come up with a smart way to accomplish this (such as mod_rewrite, a hash table, etc.).
Thanks in advance for your help & advice!
Amazon doesn't currently support this, but it looks like it might be coming. For now, what you could do is enable logging and parse the log for 404s once an hour or so.
It's certainly not instant, but it would prevent long-term 404s and give you some visibility about what files are missing.
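As a rough illustration of the log-parsing approach (not an official tool), a small Python script could scan the access logs for 404 responses; the regex assumes the standard S3 server access log layout, so the field positions may need adjusting for your logs:

import re

# matches the quoted request plus the status code that follows it, e.g.
# "GET /bucket/widgets/user123.json HTTP/1.1" 404
LINE_RE = re.compile(r'"(?:GET|HEAD) (\S+) HTTP/[\d.]+" (\d{3})')

def missing_keys(log_path):
    """Yield request paths that returned 404 in an S3 access log file."""
    with open(log_path) as fh:
        for line in fh:
            m = LINE_RE.search(line)
            if m and m.group(2) == "404":
                yield m.group(1)

# for key in missing_keys("s3_access.log"):
#     regenerate_and_upload(key)  # hypothetical hook into your own build script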
I've finally had a second to look into streaming, daemons, and cron tasks and all the neat gems built around them! But I'm not clear on how/when to use these things.
I have a few questions:
1) If I wanted to have a website that stayed constantly updated, in real time, with my Facebook friends' activity feeds, up-to-the-minute Amazon book reviews on my favorite books, and my Twitter feed, would I just create some custom streaming implementation using the Daemons gem, the ruby-yali gem for streaming the content, and the Whenever gem, which could, say, check those sites every 3-10 seconds to see if the content I'm looking for has changed? Is that how it would work? Or is it typically/preferably done differently?
2) Is (1) too processor-intensive? Is there a better way to do it, a better way for live content streaming, given that the website you want realtime updates from doesn't have a streaming API? I'm thinking about just sending a request every few seconds from a separate small Ruby app (with daemons and cron jobs), getting the JSON/XML result, using Nokogiri to remove the stuff I don't need, and then just going through the small list of comments/books/posts/etc., building a feed of what's changed, and using Juggernaut or something to push those changes to some Rails app. Would that work?
I guess it all boils down to the question:
How does real-time streaming of the latest content of some website work? How do YOU do it?
...so if someone is on my site, they can see in real time the new message or new book that just came out?
Looking forward to your answers,
Lance
Well, first: if a website doesn't provide an API, that's a strong indication that it's not legal to parse and extract its data; however, you'd better check its terms of use and privacy policy.
Personally, I'm not aware of something called a "streaming API", but supposing that they have an API, you still need to pull the results it provides (XML, JSON, ...), parse them, and present them back to the user. The strategy will vary depending on your app type:
Desktop app: you can just pull the data directly, parse it, and present it to the user; many apps work that way, such as Twhirl.
Web app: you need to cut down the time spent extracting the data. Typically you will pull the data from the API and store it. However, storing the data is a bit tricky! You don't want your database to become a bottleneck for the app under the heavy pull queries it is going to get to retrieve the data. One way to handle this is to use a push methodology: follow option 2 in this case to get the data, then push it to the user. If you want instant updates, like chat for example, you can have a look at Orbited. If it's OK to save the data to some kind of user and followers' 'inboxes', then the simplest way, as far as I can tell, is to use IMAP to send the updates to the user's inbox.
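To make the pull-and-push idea concrete, here is a sketch of the polling loop (written in Python for illustration; the asker's Ruby stack would do the equivalent): fetch, diff against what was already seen, and hand only the new items to the push layer. The endpoint URL and the assumption that items carry an "id" field are placeholders of mine:

import time
import requests

SEEN = set()  # ids of items already pushed out

def poll_once(api_url):
    """Fetch the latest items and return only the ones not seen before."""
    items = requests.get(api_url, timeout=10).json()
    fresh = [item for item in items if item["id"] not in SEEN]
    SEEN.update(item["id"] for item in fresh)
    return fresh

def push_to_clients(item):
    # stand-in for the real push layer (Juggernaut, Orbited, WebSockets, ...)
    print("new item:", item)

while True:
    for item in poll_once("https://example.com/api/latest.json"):  # placeholder endpoint
        push_to_clients(item)
    time.sleep(10)  # poll interval; tune it to respect the site's rate limits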
My computer at home is set up to automatically download some stuff from RSS feeds (mostly torrents and podcasts). However, I don't always keep this computer on. The sites I subscribe to have a relatively large throughput, so when I turn the computer back on it has no idea what it missed between the time it was turned off and the latest update.
How would you go about storing the feeds entries for a longer period of time than they're available on the actual sites?
I've checked out Yahoo Pipes and found no such functionality; Google Reader can sort of do it, but it requires manually marking each item. Magpie RSS for PHP can do caching, but that's only to avoid retrieving the feed too often, not really to store more entries.
I have access to a webserver (LAMP) that's on 24/7, so a solution using PHP/MySQL would be excellent; any existing web service would be great too.
I could write my own code to do this, but I'm sure this has to be an issue previously encountered by someone?
What I did:
I wasn't aware you could share an entire tag using Google reader, thanks to Mike Wills for pointing this out.
Once I knew I could do this, it was simply a matter of adding the feed to a separate Google account (so as not to clog up my personal reading list). I also did some selective matching using Yahoo Pipes to get just the specific entries I was interested in, this too to minimize the risk that anything would be missed.
It sounds like Google Reader does everything you're wanting. Not sure what you mean by marking individual items--you'd have to do that with any RSS aggregator.
I use Google Reader for my podiobooks.com subscriptions. I add all of the feeds to a tag, in this case podiobooks.com, that I share (but don't share the URL). I then add the RSS feed to iTunes. Example here.
Sounds like you want some sort of service that checks the RSS feed every X minutes, so you can download every single article/item published to the feed while you are "watching" it, rather than only seeing the items displayed on the feed when you go to view it. Do I have that correct?
Instead of coming up with a full-blown software solution, can you just use cron or some other sort of job scheduling on the webserver with whatever solution you are already using to read the feeds and download their content?
Otherwise it sounds like you'll end up coming close to re-writing a full-blown service like Google Reader.
Writing an aggregator for keeping longer history shouldn't be too hard with a good RSS library.
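For example, a minimal archiver in Python using feedparser and SQLite, run from cron every few minutes (the same idea translates directly to PHP/MySQL on the LAMP box); the feed URLs and table layout are my own placeholders:

import sqlite3
import feedparser

FEEDS = ["https://example.com/feed.xml"]  # placeholder feed URLs
DB = "feed_archive.db"

def archive():
    """Fetch each feed and store every entry that has not been seen before."""
    con = sqlite3.connect(DB)
    con.execute(
        "CREATE TABLE IF NOT EXISTS entries ("
        "guid TEXT PRIMARY KEY, feed TEXT, title TEXT, link TEXT, published TEXT)"
    )
    for url in FEEDS:
        parsed = feedparser.parse(url)
        for e in parsed.entries:
            guid = e.get("id") or e.get("link")
            con.execute(
                "INSERT OR IGNORE INTO entries VALUES (?, ?, ?, ?, ?)",
                (guid, url, e.get("title", ""), e.get("link", ""), e.get("published", "")),
            )
    con.commit()
    con.close()

if __name__ == "__main__":
    archive()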