Feature: maximum privacy - Transloadit

Transloadit uses a temporary URL for our files. I currently need to upload some important files to my Rackspace Cloud private container. I have everything set up, but it seems that, after uploading, users can read the assembly result with JavaScript. The Rackspace URLs are visible there, but because it is a private container, the Rackspace url (and ssl_url) are inaccessible to users.
The problem is that there is also a Transloadit temporary URL that still serves the file.
Is there any way to disable that temporary URL, so that we can guarantee to our users that their files are not publicly accessible? If not, could such a flag be implemented so that we can use it in our template?
best
FA

We use that temporary URL for machine-to-machine transfer of files. I realize this is not ideal, and we could implement a flag that auto-deletes all temporary files of an assembly after its execution.
In the future we will also likely move to Amazon EFS which will remove the need for the temporary URLs entirely.
Kind regards,
Tim
Co-Founder Transloadit
#tim_kos

Related

can I 'download' to a cloud storage?

My question might sound weird if this is not possible but just bear with me.
I would like to know if it is possible to download things to cloud storage (generally), the way you can to your local storage.
I want to build a small bot that can download media (PDF, video, audio, ...) and send it to me. As of now, it downloads the file to my local storage before sending it. However, once I host it (I plan to do so on a free service, since it's small), I suspect that might not be possible, and even if it were, the storage for the app itself would be too small to accommodate more than a few files.
As such, I want to use a cloud service of some sort as an intermediate location where I can download the file before sending it. But I don't know if that is possible or even makes sense.
After looking around, I have learnt of some cloud-to-cloud services that can directly extract data from a link and store it in my cloud.
However, this is not applicable in my case, since some modifications have to be made to the files before sending. For example, I have some code below that downloads the audio from a YouTube video:
from pytube import YouTube
import os

def download(url: str) -> str:
    # Grab the first audio-only stream of the video.
    yt = YouTube(url)
    video = yt.streams.filter(only_audio=True).first()
    # Download into ./music, then rename the file to .mp3.
    out_file = video.download(output_path="./music")
    base, ext = os.path.splitext(out_file)
    new_file = base + '.mp3'
    os.rename(out_file, new_file)
    return new_file
In this case, I only want to download the audio of the video. So my question is: is it possible for me to download to some cloud storage the same way I would download to my local storage … as in out_file = video.download(output_path="path/to/some/cloud/storage")?
Thanks for helping out! :)
If you were dealing with a straightforward file download, then maybe. You'd need the cloud storage service to have an API which would let you specify a URL to pull a file from. The specifics would depend on which cloud storage service you were using.
You aren't dealing with a straightforward file download, though. You're pulling a stream from YouTube and stitching the pieces together before saving it as a single file.
There's no way to do that sort of manipulation without pulling it down to the machine the program is running on.
It is possible to treat some cloud storage systems as if they were local directories. Either by using software which synchronises a local directory with the cloud (which is pretty standard for the official software) or one which moves files up and down on demand (such as s3fs). The latter seems to fit most of your needs, but might have performance issues. That said, both of these are scuppered for you since you intend to use free hosting which wouldn't allow you to install such software.
You are going to have to store the files locally.
To avoid running into capacity limits, you can upload them to whatever cloud storage service you are using (via whatever API they provide for that) as soon as they are downloaded … and then delete the local copy.
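As a rough illustration of that workflow, here is a minimal sketch using Amazon S3 via boto3 purely as an example; the bucket name and paths are hypothetical, and other storage providers have equivalent upload calls:

import os
import boto3

def download_and_offload(url: str) -> str:
    # Download to local (temporary) storage first, e.g. with the pytube helper above.
    local_path = download(url)

    # Push the finished file to cloud storage, then free the local disk space.
    s3 = boto3.client("s3")
    key = os.path.basename(local_path)
    s3.upload_file(local_path, "my-example-bucket", key)  # hypothetical bucket name
    os.remove(local_path)
    return key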

Will Cloudflare still cache files behind XSendFile?

I have set up a WordPress WooCommerce storefront. I want to set up downloadable products which will be downloaded via the XSendFile module.
However, my download files are quite big (approx. 50 MB), so I am planning to set up Cloudflare to cache the download files so I don't exceed the bandwidth limit from my hosting service.
My question is, will Cloudflare cache files that are linked through Apache's XSendFile module?
Sorry if this is a basic question. I'm just trying to figure out whether this set up will work or whether I will need to find an alternative solution.
NOTE: Forgot to add that the download files are PDF files.
It really depends on whether we are proxying the record the file is served on (www, for example). It is also important to note that we wouldn't cache third-party resources at all, i.e. if the file is hosted somewhere that is not directly on your domain.
I would also recommend reviewing what Cloudflare caches by default.
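If you want to verify this empirically once the proxy is in place, one quick check (a sketch, assuming the download URL is on the proxied hostname) is to request the file and inspect the CF-Cache-Status header that Cloudflare adds to proxied responses:

import requests

# Hypothetical download URL served through the Cloudflare-proxied hostname.
url = "https://www.example.com/downloads/sample.pdf"

resp = requests.get(url, stream=True)  # stream=True avoids pulling the whole 50 MB body
print(resp.headers.get("CF-Cache-Status"))  # e.g. MISS on the first request, HIT once cached
print(resp.headers.get("Cache-Control"))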

Is it advisable to use Redis or Memcached as a cache for FILES?

I have multiple configuration files which I need to read from disk and apply to many records.
I need to improve this to increase performance.
I have two processes.
Process 1: Update Configuration
This updates the content of the configuration files.
It can run from multiple locations.
Process 2: Apply Configuration
This uses the content of the configuration files.
It can run from multiple locations.
At present, this uses direct file and network I/O to read the updated configuration files.
Both processes are back-end and there is no browser involved here.
Should I use Redis or Memcached as a cache for FILES ?
Note that the files need to be read from a common location. They are updated by another background process, and an update can happen at any time. The configuration files are 1 KB to 10 KB in size.
I want Process2 to access updated configuration files in fastest way possible.
Redis is a good choice, as it keeps data in memory with optional persistence, so this approach does not have to touch the hard drive.
The problem I can see here is that every client needs to understand Redis and use some support library, e.g. in Java or whatever language you use.
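For what it's worth, the Redis side of this is small; a minimal sketch with the Python redis client (the connection details, file name, and key name are assumptions):

import redis

# Assumed connection details; point this at your shared Redis instance.
r = redis.Redis(host="localhost", port=6379, db=0)

# Process 1 (updater): write the new configuration contents into Redis.
with open("app.conf", "rb") as f:
    r.set("config:app.conf", f.read())

# Process 2 (consumer): read the latest contents from memory instead of the shared disk.
config_bytes = r.get("config:app.conf")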
Why not use HTTP itself, e.g. deploy a simple HTTP file server? You can also add version checking and caching, so the client stores the version of the file it fetched from the server, keeps using its cached content while the server still has the same file, and downloads it again only when it has changed. This is what the HEAD method is for; see http://www.tutorialspoint.com/http/http_methods.htm
You should use the same approach the web itself uses. Every browser downloads content: HTML, CSS, images, etc. The best improvement for you is client-side caching, e.g. CSS and images are stored in the browser's cache and downloaded only the first time or when they have changed.
And if you want, you can use exactly the REST approach itself.
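A sketch of that version-check idea over HTTP, here expressed as a conditional GET with ETags (the server URL is hypothetical; a HEAD request comparing Last-Modified would work the same way):

import requests

url = "http://config-server.example/app.conf"  # hypothetical HTTP file server

# First fetch: keep the body and remember the ETag the server sent.
resp = requests.get(url)
etag = resp.headers.get("ETag")
cached_body = resp.content

# Later fetches: a 304 Not Modified means the cached copy is still current.
resp = requests.get(url, headers={"If-None-Match": etag} if etag else {})
if resp.status_code == 304:
    body = cached_body
else:
    body = resp.content
    etag = resp.headers.get("ETag")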

Is there an equivalent to Amazon S3's s3cmd sync for Rackspace Cloud Files?

I'm currently using Amazon S3 to host all static content on my site. The site has a lot of static files, so I need an automated way of syncing the files on my localhost with the remote files. I currently do this with s3cmd's sync feature, which works wonderfully. Whenever I run my deploy script, only the files that have changed are uploaded, and any files that have been deleted are also deleted in S3.
I'd like to try Rackspace Cloud Files; however, I can't seem to find anything that offers the same functionality. Is there any way to accomplish this on Rackspace Cloud Files, short of writing my own syncing utility? It needs to have a command-line interface and work on OS X.
The pyrax SDK for the Rackspace Cloud has the sync_folder_to_container() method for cloudfiles that sounds like what you're looking for. It will only upload new/changed files, and will optionally remove files from the cloud that are deleted locally.
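For reference, a minimal sketch of how that might look with pyrax (the credentials, container name, and folder path are placeholders):

import pyrax

pyrax.set_setting("identity_type", "rackspace")
pyrax.set_credentials("my_username", "my_api_key")  # placeholder credentials
cf = pyrax.cloudfiles

container = cf.create_container("static-assets")  # returns the container if it already exists
# Uploads new/changed files only; delete=True also removes objects that no longer exist locally.
cf.sync_folder_to_container("/path/to/local/static", container, delete=True)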
As far as the initial upload is concerned, I usually use eventlet to upload files in as asynchronous a manner as possible. The total time will still be limited by your upload speeds (I don't know of any SDK that can get around that), but the non-blocking code will certainly help overall performance.
If you have any other questions, feel free to ask here or on the GitHub page.
-- Ed Leafe
The Rackspace Python SDK can do that for you. There's a script called cf_pyrax.py that does, more or less, what I think you're trying to do. There's a write up on it in this blog post.

Amazon S3 Missing Files

We're working on developing user widgets that our members can embed on their websites and blogs. To reduce the load on our servers we'd like to be able to compile the necessary data file for each user ahead of time and store it on our Amazon S3 account.
While that sounds simple enough, I'm hoping there might be a way for S3 to automatically ping our script if a file is requested that for some reason doesn't exist (say, for instance, if it failed to upload properly). Basically, we want Amazon S3 to act as our cache and to notify a script on a cache miss. I don't believe Amazon provides a way to do this directly, but I was hoping that some hacker out there could come up with a smart way to accomplish this (such as mod_rewrite, a hash table, etc.).
Thanks in advance for your help & advice!
Amazon doesn't currently support this, but it looks like it might be coming. For now, what you could do is enable logging and parse the log for 404s once an hour or so.
It's certainly not instant, but it would prevent long-term 404s and give you some visibility about what files are missing.
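A rough sketch of that log-parsing step in Python, assuming the standard space-delimited S3 server access log format and a log file that has already been downloaded locally:

import re

missing_keys = set()
with open("s3_access.log") as log:
    for line in log:
        # S3 access logs put the HTTP status right after the quoted request line,
        # so a 404 on a GET shows up as: "GET /some/key HTTP/1.1" 404 NoSuchKey ...
        if "REST.GET.OBJECT" in line and re.search(r'" 404 ', line):
            m = re.search(r'"GET ([^ ]+) HTTP', line)
            if m:
                missing_keys.add(m.group(1))

for key in sorted(missing_keys):
    print("missing:", key)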
