Reliable Ways to Send Large Files to Clients - ftp

We need to provide large files to clients on a daily or weekly basis. Currently our process is this:
Internal process creates the file and places it in a specific folder
Our client connects via SFTP and downloads the file
This works well when the files are small. As they get bigger (50-100 GB in size), we keep running into network interruptions and internal disk-space issues.
What I'd like to see is the following:
Our internal process creates the file.
This file is copied to an intermediary service (something like FileDropper).
Our client will download the file from this intermediary service.
I'd like to know if other people have had similar issues and what solutions they have in place. File Dropper works great for non-business files, but obviously I won't be putting client data on there. We also have an Office 365 subscription; I looked at what I could use with that, but I haven't found anything yet that would help solve this.
Any hints, suggestions or feedback is much appreciated!

Consider Amazon S3.
I have used it several times in the past, and it is very reliable both for handling a lot of files and for handling large files.
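As a rough sketch of what the S3-based handoff could look like (assuming boto3 with configured AWS credentials; the bucket name, object key, and file paths below are placeholders): the export is uploaded with multipart settings so an interrupted transfer only retries a single part, and the client gets a time-limited presigned URL instead of an SFTP login.

```python
# Sketch: push a large export to S3 with multipart upload, then hand the
# client a time-limited presigned URL instead of an SFTP drop folder.
# Assumes boto3 is installed and AWS credentials are configured; the bucket
# and key names below are placeholders.
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

BUCKET = "example-client-deliveries"          # hypothetical bucket name
KEY = "acme-corp/2024-05-31-export.dat"       # hypothetical object key

# Multipart settings: anything over 64 MB is split into 64 MB parts, so a
# dropped connection only retries one part, not the whole 50-100 GB file.
config = TransferConfig(
    multipart_threshold=64 * 1024 * 1024,
    multipart_chunksize=64 * 1024 * 1024,
    max_concurrency=4,
)

s3.upload_file("/exports/2024-05-31-export.dat", BUCKET, KEY, Config=config)

# Presigned URL the client can download with any HTTP client; expires after 7 days.
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": BUCKET, "Key": KEY},
    ExpiresIn=7 * 24 * 3600,
)
print(url)
```

Because the client downloads over plain HTTPS and S3 honors Range requests, an interrupted download can be resumed with an ordinary download manager or `curl -C -` against the presigned URL.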

Related

Reading and parsing of a text file on the server side

I need to tackle the problem of parsing a text file on the server side only.
There is a company server that is used for storing files. I have access to it through Windows Explorer with a folder address like "\folder\folder". There I have the following permissions: modify, read & execute, read, and write.
A big txt file (250-300 MB, roughly 1-2 million lines) is stored there as the input. I need to keep only the lines that contain a specific substring; the resulting output will contain only 2-3k lines.
The input must be located on the server to be accessible to many people. Each person is supposed to work with the result and then download it to their own PC.
The crucial point of the problem is the terrible speed of the connection between server and client. It may take up to 20 minutes to download and open this huge input file.
Therefore, I'm curious whether it is possible to do the parsing on the server side only, to avoid the client-server connection issues, and then download only the output, which is a small file.
Could anybody, please, help to find out any solution to this problem?
Also, as it is a company server, the problem should be solved with a limited set of tools: built-in Windows functionality (perhaps a scripting language like PowerShell or VBScript), Notepad++, or Excel/VBA. Any suggested method that complies with this restriction is greatly welcomed.
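The filtering step itself is a one-pass scan, which is cheap to run entirely on the server. The sketch below uses Python purely to illustrate the logic (under the question's constraints the same one-pass filter would be expressed as a PowerShell Select-String or Get-Content pipeline executed on the server, e.g. via a scheduled task or remote session); the paths and substring are placeholders.

```python
# Illustration only: a one-pass filter that keeps just the lines containing
# a given substring. Must run on the server itself for the slow link to be
# avoided. Paths and the substring are placeholders.
INPUT_PATH = r"\\folder\folder\big-input.txt"      # UNC path on the file server
OUTPUT_PATH = r"\\folder\folder\filtered.txt"
NEEDLE = "SPECIFIC_SUBSTRING"                      # whatever marks the wanted lines

def filter_file(src, dst, needle):
    # Stream line by line so the 250-300 MB input is never held in memory.
    kept = 0
    with open(src, "r", encoding="utf-8", errors="replace") as fin, \
         open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            if needle in line:
                fout.write(line)
                kept += 1
    return kept

if __name__ == "__main__":
    print(filter_file(INPUT_PATH, OUTPUT_PATH, NEEDLE), "lines kept")
```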

Sync a local folder with a server via REST API calls?

I currently have the following problem and can't decide which way to go:
I have a local directory with subfolders and files and want to mirror and sync it with a remote directory on a server. The problem is that I don't have any direct access to the server itself. The only access I have is a handful of REST API calls: uploading a file, downloading a file, getting a file's metadata (including creation and change date), and getting a file/directory list.
I have already spent some time looking for possible programs/implementations, but none of them has really convinced me. Here are some of the possibilities I have considered so far:
Use a PowerShell or Python script and manually check each file and folder for changes; schedule a task to call the script every x minutes/hours (a rough sketch of this approach follows after the question)
Use the Microsoft Sync Framework (MSF) and implement a custom SyncProvider which handles the REST calls and translates them into MSF format. Here I can't really tell whether it's feasible at all and how complex it would be
Use tools like Syncthing or similar, but I couldn't find one that supports a remote sync directory accessible only via REST calls; as there are quite a lot of tools, I might have missed some that do
I'm working under Windows 10, so the solution should run on Windows and preferably not require too many additional resources.
Furthermore, the solution should be somewhat resilient to errors, as the REST API calls seem to have a tendency to fail occasionally (roughly 1 in 10 calls fails).
Any ideas and suggestions are welcome :)
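For what it's worth, here is a rough sketch of the first option above: a scheduled script that compares local modification times against the remote metadata and retries the flaky calls. The base URL, endpoints, and JSON fields are invented, since the actual REST API isn't specified.

```python
# Rough sketch of the "scheduled script" option: walk the local tree,
# compare modification times against the remote metadata, and upload
# what changed. The REST endpoints and JSON fields are placeholders --
# the real API in the question is not specified.
import os
import time
import requests

API = "https://example.invalid/api"      # placeholder base URL
LOCAL_ROOT = r"C:\data\to-sync"

def call_with_retry(method, url, retries=5, **kwargs):
    # The API reportedly fails roughly 1 in 10 calls, so retry with backoff.
    for attempt in range(retries):
        try:
            resp = requests.request(method, url, timeout=60, **kwargs)
            resp.raise_for_status()
            return resp
        except requests.RequestException:
            if attempt == retries - 1:
                raise
            time.sleep(2 ** attempt)

def remote_index():
    # Assumed endpoint returning [{"path": ..., "modified": <epoch seconds>}, ...]
    return {e["path"]: e["modified"]
            for e in call_with_retry("GET", f"{API}/files").json()}

def sync():
    remote = remote_index()
    for dirpath, _, names in os.walk(LOCAL_ROOT):
        for name in names:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, LOCAL_ROOT).replace(os.sep, "/")
            if remote.get(rel, 0) < os.path.getmtime(full):
                with open(full, "rb") as fh:
                    call_with_retry("POST", f"{API}/files/{rel}", data=fh)

if __name__ == "__main__":
    sync()   # run from Task Scheduler every x minutes/hours
```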

Is FTP file sharing faster than cloud storage alternatives, e.g. Dropbox / Google Drive / MediaFire?

I work doing DCPs (digital cinema packages) for trailers; the files are usually a zip of 1-2 GB.
I have just been uploading them to an FTP server on cloud hosting and sending the links with a username/password. That works most of the time, but lately some clients have experienced timeouts while downloading and have been unable to resume (the clients being local cinemas downloading the files).
I know some foreign production houses use Dropbox and similar web-based file sharing to send their big files, but I wonder: is there any alternative to FTP and web-based file sharing, aside from torrents?
I have had FTP timeout issues delivering broadcast-sized media, especially with distant clients. In some cases, I use 7-Zip split-volume archives to deliver the large file in smaller pieces, which speeds up the overall transfer (multiple downloads at once) while preventing timeouts. The client needs to be somewhat technically inclined, as it involves using a 7z-capable archiver like 7-Zip or PeaZip. They basically download all the pieces into a single folder, and when they open the first piece, they see the whole file, which they can then extract (I usually go with 256 MB pieces).
Here's a how-to I found quickly, but there are plenty of others out there: http://www.linglom.com/it-support/how-to-split-a-large-file-using-7-zip/
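For reference, the split step described above boils down to a single 7z invocation; here is a minimal sketch that drives it from Python (assuming the 7z executable is on PATH; file names are placeholders).

```python
# Sketch of the split step: drive 7-Zip's command line from Python to
# produce 256 MB volumes (archive.7z.001, .002, ...). Assumes the 7z
# executable is on PATH; file names are placeholders.
import subprocess

SOURCE = "trailer_dcp.zip"          # the 1-2 GB deliverable
ARCHIVE = "trailer_dcp_split.7z"    # clients download all .7z.00N pieces

subprocess.run(
    ["7z", "a",            # "a" = add to archive
     "-v256m",             # split into 256 MB volumes
     ARCHIVE, SOURCE],
    check=True,
)
```

The client then drops all the resulting .7z.001, .7z.002, ... pieces into one folder and opens the first one in 7-Zip or PeaZip to extract the original file.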

Building a file upload site that scales

I'm attempting to build a file upload site as a side project, and I've never built anything that needed to handle a large amount of files like this. As far as I can tell, there are three major options for storing and retrieving the files (note that there can be multiple files per upload, so, for example, website.com/a23Fc may let you download a single or multiple files, depending on how many the user originally uploaded - similar to imgur.com):
Stick all the files in one huge files directory, and use a (relational) DB to figure out which files belong to which URLs, then return a list of filenames based on that. Example: user loads website.com/abcde, so the site queries the DB for all files related to the abcde upload, gets their filenames back, and outputs those.
Use CouchDB because it allows you to actually attach files to individual records in the DB, so each URL/upload could be a DB record with files attached to it. Example, user loads website.com/abcde, CouchDB grabs the document with the ID of abcde, grabs the files attached to that document, and gives them to the user.
Skip out on using a DB completely, and for each upload, create a new directory and stick the files in that. Example: user loads website.com/abcde, site looks for a /files/abcde/ directory, grabs all the files out of there, and gives them to the user, so a database isn't involved at all.
Which of these seems the most scalable? Like I said, I have very little experience in this area, so if I'm completely off, or if there is an obvious fourth option, I'm more than open to it. Having thousands or millions of files in a single directory (i.e., option 1) doesn't seem very smart, but having thousands or millions of directories in a directory (i.e., option 3) doesn't seem much better.
A company I used to work for faced this exact problem with about a petabyte of image files. Their solution was to use the Andrew File System (see http://en.wikipedia.org/wiki/Andrew_File_System for more) to store the files in a directory structure that matched the URL structure. This scaled very well in practice.
They also recorded the existence of the files in a database for other reasons that were internal to their application.
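Whatever the backing store (AFS, a plain filesystem, or object storage), the usual way to keep any single directory from accumulating millions of entries is to fan the upload ID out into nested subdirectories. A minimal sketch, with the root path and the two-character fan-out depth chosen arbitrarily:

```python
# Minimal sketch of mapping an upload ID like "a23Fc" to a fanned-out
# directory path so no single directory accumulates millions of entries.
# The root path and the 2-character fan-out depth are arbitrary choices.
import os

FILES_ROOT = "/srv/uploads"

def upload_dir(upload_id: str) -> str:
    # "a23Fc" -> /srv/uploads/a2/3F/a23Fc
    return os.path.join(FILES_ROOT, upload_id[:2], upload_id[2:4], upload_id)

def list_upload(upload_id: str):
    # Everything the user attached to website.com/<upload_id>
    d = upload_dir(upload_id)
    return [os.path.join(d, name) for name in sorted(os.listdir(d))]
```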
I recommend whichever solution you can personally complete in the shortest amount of time. If you already have working CouchDB prototypes, go for it! Same for a relational-oriented or filesystem-oriented solution.
Time-to-market is more important than architecture for two reasons:
This is a side project, so you should try to get as far along as possible.
If the site becomes popular, since the primary purpose is file upload, you are likely to rebuild the core service at least once, perhaps more, during the life of the site.
If you are going to use ASP.NET, here is an article that describes how to use the Distributed File System (DFS) for a web farm: http://weblogs.asp.net/owscott/archive/2006/06/07/DFS-for-Webfarm-Usage---Content-Replication-and-Failover.aspx

Downloading a large number of files

I'm researching solutions for a potential client. They're requesting the ability to download a large number of MP3s (1000+) from their online catalog.
I've researched/tested building a zip containing all MP3s using ZipArchive but ran into obvious memory leak issues that have ruled that solution out.
I'm now trying to think out of the box.
One idea was to create an FTP queue or a torrent-type download link for them. Is there anything out there that can pull something like this off?
Any help or suggested direction would be greatly appreciated! Thanks!!
Edit: Here is the overall process/goal that we're trying to achieve.
The client creates music for TV/film placement. They maintain an online catalog AND a local copy they send to potential buyers. The online catalog and the offline catalog need to mirror each other. The problem is that they have multiple offices that will have to update their local copies with the new files added to the online catalog from many different locations.
Example: East Coast User updates catalog with 100 new files. West Coast User needs to update the offline catalog with the new files retrieved from the online catalog.
We had hoped to create custom zips of the files each user needed to update their catalog, based on the user's download history, which we'd maintain in MySQL. We were testing ZipArchive, but we couldn't seem to build zips over 175 MB (give or take). We're in the process of testing ZipStreaming but are having some issues.
I hope this clears up the overall goal and problems we are facing.
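As an illustration of that archive-size ceiling: building the zip incrementally on disk, rather than assembling it in memory, keeps memory use flat no matter how many MP3s go in. The sketch below uses Python's zipfile only to show the idea (the stack in the question is PHP/ZipArchive); the file list and paths are placeholders.

```python
# Illustration only (the stack in the question is PHP/ZipArchive): build the
# per-user ZIP incrementally on disk so memory stays flat regardless of how
# many MP3s go in. File lists and paths are placeholders.
import os
import zipfile

def build_delta_zip(missing_files, catalog_root, out_path):
    # ZIP_STORED: MP3s are already compressed, so skip recompression.
    with zipfile.ZipFile(out_path, "w", compression=zipfile.ZIP_STORED) as zf:
        for rel in missing_files:
            zf.write(os.path.join(catalog_root, rel), arcname=rel)
    return out_path

# e.g. build_delta_zip(rows_from_mysql, "/srv/catalog", "/tmp/west_coast_update.zip")
```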
GNU wget?
It can download recursively. Just give wget a list of all files on the server, e.g.
http://www.example.org/filelist.html, which contains links like file1.mp3, file2.mp3, etc. (Apache normally generates such an index page automatically when a directory without an index.html/php in it is requested.)
http://linux.die.net/man/1/wget
Frankly speaking, I can't identify the actual problem/question from your post. If you are looking to minimize network load, then you need to remember that MP3 files don't compress well, because they are already compressed (not as well as possible, but well enough). If you are looking for a transport, then any file transfer protocol will do (FTP, SFTP, HTTP, WebDAV).
If you need flexibility and features, I'd recommend SFTP: this is a protocol for remote file system access, so besides the "get file" operation it has plenty of useful operations, including machine-readable directory listings (not always available in FTP and not available in standard HTTP), built-in zlib compression, built-in support for resuming file transfers, and more. HTTP also has zlib compression, but it is not always available.
Update: your approach doesn't take into account what is actually available on the client; you are going to prepare ZIP files based on your (possibly incorrect) knowledge of what the client already has.
If the client and server are both applications that you develop, then you should use the rsync protocol or something similar to update the data online (not using any ZIP files) and download only the files that are missing on the client. If direct communication between the client and the server is not possible, you can have the client send its state to the server, and the server will prepare an individual package after that. As for ZIP functionality, it's needed only when you do batch updates (no real-time communication between the client and the server). I don't know what technology you are using, but if your only problem is with the ZIP component, you can use something else for data packing: either a different ZIP component (for .NET and VCL we have a ZIP component) or some other packaging solution (for example, our SolFS product doesn't have size limits). Unfortunately, I am not aware of an rsync-like implementation available as a component.
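A minimal sketch of the "client sends its state, server prepares an individual package" idea mentioned above; the payload shape and paths are invented for illustration. The client reports the list of files its offline copy already holds, and the server computes exactly which files are missing, which can then be packaged or served individually.

```python
# Minimal sketch of the "client sends its state, server prepares an
# individual package" idea: the client reports which catalog files it
# already has, and the server answers with just the missing ones.
# Paths and payload shapes are invented for illustration.
import os

CATALOG_ROOT = "/srv/catalog"    # server-side master copy (placeholder)

def catalog_listing():
    files = []
    for dirpath, _, names in os.walk(CATALOG_ROOT):
        for name in names:
            files.append(os.path.relpath(os.path.join(dirpath, name), CATALOG_ROOT))
    return files

def missing_for_client(client_files):
    # client_files: relative paths the office copy already holds, e.g.
    # posted as JSON by the client rather than inferred from download
    # history stored in MySQL.
    have = set(client_files)
    return [f for f in catalog_listing() if f not in have]
```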
