I have an FTP directory with Akamai now and I need to upload images as fast as possible (possibly 1+ million per day).
What would be the fastest way to sync local files to an FTP site?
thanks
Instead of FTP, use Rsync. It has lower overhead than FTP and is well suited to synchronising a large filebase.
Rsync documentation
Akamai Netstorage supports Rsync as an upload method. It may need to be enabled in the Akamai control panel - whoever administers your Netstorage user accounts can enable it.
Rsync is included in all Linux distributions; if you are on Windows, you can get it as part of Cygwin.
1 million a day sure is a lot; it's hard to imagine what requires such a huge number of resources. All I can suggest is solving this purely at the FTP sync level, using an off-the-shelf tool (maybe http://www.ftpsynchronizer.com/?).
Failing that, knocking up a directory-watching FTP uploader wouldn't be a hugely difficult programming job in most common languages that have FTP libraries.
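A minimal Python sketch of such a directory-watching uploader, using the standard-library ftplib. The function names, the polling interval, and the in-memory set of uploaded names are assumptions made for illustration; at a million files a day you would likely also want persistent state, retries, and several parallel connections.

```python
import ftplib
import os
import time


def find_new_files(directory, already_uploaded):
    """Return sorted names of regular files in `directory` not yet uploaded."""
    names = []
    for entry in os.scandir(directory):
        if entry.is_file() and entry.name not in already_uploaded:
            names.append(entry.name)
    return sorted(names)


def upload_files(ftp, directory, names):
    """Upload each named file over an already-open ftplib.FTP connection."""
    for name in names:
        with open(os.path.join(directory, name), "rb") as fh:
            ftp.storbinary("STOR " + name, fh)


def watch_and_upload(host, user, password, directory, poll_seconds=5):
    """Poll `directory` forever, pushing new files over one reused connection."""
    uploaded = set()  # hypothetical in-memory state; lost on restart
    with ftplib.FTP(host, user, password) as ftp:
        while True:
            new_names = find_new_files(directory, uploaded)
            upload_files(ftp, directory, new_names)
            uploaded.update(new_names)
            time.sleep(poll_seconds)
```

Reusing one connection avoids paying the FTP login cost per file, which matters more than raw bandwidth when the files are small.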
The other alternative is that if you can get these files onto an internet-facing server, you can switch to using Akamai HTTP Content Delivery and get Akamai to pull the images rather than you having to continuously push them.
If you have such a huge number of files and you want to upload faster, then I would suggest the Signiant product, which improves the upload time drastically. It's a third-party upload service that works very well with Akamai, and many customers use it.
I have a network connection where I pay per megabyte, so I'm interested in reducing my bandwidth usage as far as possible while still having a reasonably good browsing experience. I use this wonderful extension (https://bandwidth-hero.com/). This extension runs an image-compression proxy on my Heroku account that accepts image URLs and returns a low-quality version of those images. This reduces bandwidth usage by 30-40% when images are loaded.
To further reduce usage, I typically browse with both JavaScript and images disabled (there are various extensions for doing this in Firefox/Firefox ESR/Google Chrome). This has the added bonus of blocking most ads (since they usually need JavaScript to run).
For daily browsing, the most efficient solution is using a text-mode browser in a virtual console such as elinks/lynx/links2 running over ssh (with zlib compression) on a VPS. But sometimes using JavaScript becomes necessary, as sites will not render without it. Elinks is the only text-mode browser that even tries to support JavaScript, and even that support is quite rudimentary. When I have to come back to using Firefox/Chrome, I find my bandwidth usage shooting up. I would like to avoid this.
I find that bandwidth is used partially to get the 'raw' html files of the sites I'm browsing, but more often for the associated .js/.css files. These are typically highly compressible. On my local workstation, html+css+javascript files typically compress by a factor of more than 10x when using lzma(2) compression.
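One rough way to check that figure on your own files is to run them through Python's standard compressors and compare. This is only a sketch; `compression_ratios` is a made-up helper name, and the ratios depend entirely on the input you feed it.

```python
import bz2
import gzip
import lzma


def compression_ratios(data: bytes) -> dict:
    """Return original_size / compressed_size for gzip, bzip2 and lzma (xz)."""
    compressed_sizes = {
        "gzip": len(gzip.compress(data)),
        "bzip2": len(bz2.compress(data)),
        "lzma": len(lzma.compress(data)),
    }
    return {name: len(data) / size for name, size in compressed_sizes.items()}
```

For example, `compression_ratios(open("page.html", "rb").read())` returns one ratio per algorithm for a saved page.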
It seems to me that one way to drastically reduce bandwidth consumption would be to use the same template as the bandwidth-hero extension, i.e. run a compression proxy either on a VPS or on my Heroku account, but do so for text content (.html/.js/.css).
Ideally, I would like to run a compression proxy on my local machine. When I open a site (say www.stackoverflow.com), the browser should send a request to this local proxy. This local proxy then sends a request to a back-end running on heroku/vps. The heroku/vps back-end actually fetches all the content, and compresses it (lzma/bzip/gzip). The compressed content is sent back to my local proxy. The local proxy decompresses the content and finally gives it to the browser.
There is something like this mentioned in this answer (https://stackoverflow.com/a/42505732/10690958) for Node.js. I am thinking of the same for Python.
From what Google searches show, HTTP can "automatically" ask for gzip versions of pages. But does this also apply to the associated files that are loaded by JavaScript, and to the CSS files? Perhaps what I am thinking about is already implemented by default?
Any pointers would be welcome. I was thinking of writing a local proxy in Python, as I am reasonably fluent in it. But I know little about Heroku or the intricacies of HTTP.
thanks.
Update: I found a possible solution here https://github.com/barnacs/compy
which does almost exactly what I need (minify + compress with brotli/gzip + transcode jpeg/gif/png). It uses Go instead of Python, but that does not really matter. It also has a Docker image here: https://hub.docker.com/r/andrewgaul/compy/. Since I'm not very familiar with Heroku, I can't figure out how to use this to run the compression proxy service on my account. The Heroku docs also weren't of much help to me. Any pointers would be welcome.
I've made a website for an arts organisation. The website allows people to browse a database of artists' work. The database is large and the image files for the artists' work come to about 150 GB. I have my own server that is currently just being used to keep the images on its hard drive.
I'm going to purchase hosting so I don't have to worry about bandwidth etc... but would it be better to purchase hosting that allows me to upload my entire image database or should I use the website to get the images from my server? If so how would I do that?
Sorry I am very new to this
I think it could be better to have the data on the same server so you avoid calls to another server for images which are quite big as you say and this can slow you down overall.
I assume you will need to set up some API on your server to deliver the images or at least URLs for them but then you must make sure they are accessible.
You'll want the image files on the same server as your website, as requests elsewhere to pull in images will definitely hinder your site's performance - especially if you have large files.
Given the large size of the database and the bandwidth considerations, a dedicated server would be suitable, as these include large amounts of disk space and bandwidth. You can install the web server and the database server on the same machine instead of managing them separately. Managing database backups and service monitoring becomes much easier.
For instance, you can review dedicated server configurations and resources here: https://www.accuwebhosting.com/dedicated-servers
I work doing DCP (digital cinema packages) for trailers, the files are usually a zip of 1-2 gig.
I have just been uploading them to an FTP server on a cloud host and sending the links with a username/password. That works most of the time, but lately some clients (local cinemas downloading the files) have experienced timeouts while downloading and have been unable to resume.
I know some foreign production houses use Dropbox and similar web-based file sharing to send their big files, but I wonder if there is any alternative to FTP and web-based file sharing aside from torrents?
I have had FTP timeout issues delivering broadcast-sized media, especially with distant clients. In some cases, I use 7Zip volume split archives to deliver the large file in smaller pieces, which speeds up the overall transfer (multiple downloads at once) while preventing timeouts. The client needs to be somewhat technically inclined, as it involves using a 7z archiver like 7Zip or PeaZip. They basically download all the pieces into a single folder, and when they open the first file, it shows the whole file they can then extract (I usually go with 256MB pieces).
Here's a how-to I found real quick, but there are plenty of others out there: http://www.linglom.com/it-support/how-to-split-a-large-file-using-7-zip/
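If installing an archiver on the client side is a problem, the same split-into-pieces idea can be sketched in plain Python. This is an illustration only: `split_file` and `join_parts` are made-up names, the `.001`/`.002` suffixes are just a naming choice here, and the pieces are raw byte ranges that should be reassembled with `join_parts`. The 256 MB default mirrors the piece size mentioned above.

```python
import shutil


def split_file(path, piece_size=256 * 1024 * 1024, buffer_size=1024 * 1024):
    """Write path.001, path.002, ... of at most piece_size bytes; return part paths."""
    parts = []
    with open(path, "rb") as src:
        index = 1
        while True:
            first = src.read(min(buffer_size, piece_size))
            if not first:
                break  # nothing left, so no empty trailing part is created
            part_path = f"{path}.{index:03d}"
            with open(part_path, "wb") as dst:
                dst.write(first)
                remaining = piece_size - len(first)
                while remaining > 0:
                    chunk = src.read(min(buffer_size, remaining))
                    if not chunk:
                        break
                    dst.write(chunk)
                    remaining -= len(chunk)
            parts.append(part_path)
            index += 1
    return parts


def join_parts(parts, out_path):
    """Concatenate the parts, in the order given, back into one file."""
    with open(out_path, "wb") as dst:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, dst)
```

Copying through a small buffer keeps memory use flat regardless of the piece size.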
I'm researching solutions for a potential client. They're requesting the ability to download a large number of MP3s (1000+) from their online catalog.
I've researched/tested building a zip containing all MP3s using ZipArchive but ran into obvious memory leak issues that have ruled that solution out.
I'm now trying to think out of the box.
One idea was to create an FTP queue or a Torrent type download link for them. Is there anything out there that can pull something like this off?
Any help or suggested direction would be greatly appreciated! Thanks!!
Edit: Here is the overall process/goal that we're trying to achieve.
The client creates music for TV/film placement. They maintain an online catalog AND a local copy they send to potential buyers. The online catalog and the offline catalog need to mirror each other. The problem is that they have multiple offices in many different locations, each of which will have to update its local copy with the new files added to the online catalog.
Example: East Coast User updates catalog with 100 new files. West Coast User needs to update the offline catalog with the new files retrieved from the online catalog.
We had hoped to create custom zips of the files each user needed to update their catalog, based on the user's download history that we'd maintain in MySQL. We were testing ZipArchive but we couldn't seem to build zips over 175 MB (give or take). We're in the process of testing ZipStreaming but are having some issues.
I hope this clears up the overall goal and problems we are facing.
GNU wget?
It can download recursively. Just give wget a list of all files on the server, e.g.
http://www.example.org/filelist.html, which contains links like file1.mp3, file2.mp3, etc. (Apache normally generates such an index page automatically when a directory without an index.html/index.php in it is requested.)
http://linux.die.net/man/1/wget
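If you would rather build the file list yourself than let wget recurse, a small standard-library Python sketch can pull the MP3 links out of such an index page. `LinkCollector` and `mp3_links` are made-up names, and the exact HTML of the index page is an assumption; the resulting list could be saved to a text file and passed to wget with its `-i` option.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs of <a href> links ending in a given suffix."""

    def __init__(self, base_url, suffix=".mp3"):
        super().__init__()
        self.base_url = base_url
        self.suffix = suffix
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # Compare case-insensitively so FILE.MP3 is also picked up.
            if name == "href" and value and value.lower().endswith(self.suffix):
                self.links.append(urljoin(self.base_url, value))


def mp3_links(html_text, base_url):
    """Return the MP3 links found in html_text, resolved against base_url."""
    collector = LinkCollector(base_url)
    collector.feed(html_text)
    return collector.links
```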
Frankly speaking, I can't identify the actual problem/question from your post. If you are looking to minimize network load, then you need to remember that MP3 files do not compress well because they are already compressed (not as well as possible, but well). If you are looking for a transport, then any file transfer protocol will do (FTP, SFTP, HTTP, WebDAV).
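Since the MP3s are already compressed, any packaging step can store them as-is rather than spend time recompressing. A hedged Python illustration of that idea (the question's own stack appears to be PHP, so this is not a drop-in fix, and `pack_stored` is a made-up name): the standard zipfile module streams each member to the archive on disk, so the archive never has to fit in memory.

```python
import os
import zipfile


def pack_stored(file_paths, archive_path):
    """Write the files into a ZIP using ZIP_STORED (no recompression)."""
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_STORED) as archive:
        for path in file_paths:
            # arcname drops the directory part so the archive is flat.
            archive.write(path, arcname=os.path.basename(path))
    return archive_path
```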
If you need flexibility and features, I'd recommend SFTP: this is a protocol for remote file system access, so besides "get file" operation it has plenty of useful operations including machine-readable directory listing (not always available in FTP and not available in standard HTTP), built-in ZLib compression, built-in possibility to resume file transfers and more bonuses. HTTP also has ZLib compression, but this one is not always available.
Update: your approach doesn't take into account what is really available on the client; you are going to prepare ZIP files based on your (possibly incorrect) knowledge of what the client already has.
If the client and server are both applications that you develop, then you should use the RSync protocol or something similar to update data online (not using any ZIP files) and download only the files that are missing on the client. If direct communication between the client and the server is not possible, you can make the client send its state to the server, and the server will prepare an individual package after that. As for ZIP functionality, it's needed only when you use batch updates (no real-time communication between the client and the server). I don't know what technology you are using, but if your only problem is with the ZIP component, you can use something else for data packing: either a different ZIP component (for .NET and VCL we have a ZIP component) or some other packing solution (for example, our SolFS product doesn't have size limits). Unfortunately, I am not aware of an RSync-like implementation available as a component.
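The "client sends its state, server prepares an individual package" idea can be sketched roughly as follows. This is not RSync itself, just a whole-file comparison by hash; `build_manifest` and `files_to_send` are made-up names, and using SHA-256 of the file contents as the "state" is an assumption.

```python
import hashlib
import os


def build_manifest(root):
    """Map each file's path (relative to root, '/'-separated) to its SHA-256 hex digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            digest = hashlib.sha256()
            with open(full_path, "rb") as fh:
                # Hash in 1 MiB blocks so large files do not need to fit in memory.
                for block in iter(lambda: fh.read(1 << 20), b""):
                    digest.update(block)
            relative = os.path.relpath(full_path, root).replace(os.sep, "/")
            manifest[relative] = digest.hexdigest()
    return manifest


def files_to_send(server_manifest, client_manifest):
    """Return sorted paths the client lacks or holds with different content."""
    return sorted(
        path
        for path, digest in server_manifest.items()
        if client_manifest.get(path) != digest
    )
```

Each office would run `build_manifest` on its local catalog and send the result; the server compares it with its own manifest and packages only what `files_to_send` returns, instead of relying on a download history.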
We have an Oracle 10g forms application running on a Solaris OAS server, with the forms displaying in IE. Part of the application involves uploading and downloading files (Word docs and PDFs, mainly) from the PC to the OAS server, using Oracle's webutil utility.
The problem is with large files (anything over 25Megs or so), it takes a long time, sometimes many minutes. Uploading seems to work, even with large files. Downloading large files, though, will cause it to error out part way through the download.
I've been testing with a 189Meg file in our development system. Using WEBUTIL_FILE_TRANSFER.Client_To_DB (or Client_To_DB_with_Progress), the download would error out after about 24Megs. I switched to WEBUTIL_FILE_TRANSFER.URL_To_Client_With_Progress, and finally got the entire file to download, but it took 22 minutes. Doing without the progress bar got it down to 18 minutes, but that's still too long.
I can display files in the browser, and my test file displayed in about 5 seconds, but many files need to be downloaded for editing and then re-uploaded.
Any thoughts on how to accomplish this uploading and downloading faster? At this point, I'm open to almost any idea, whether it uses webutil or not. Solutions that are at least somewhat native to Oracle are preferred, but I'm open to suggestions.
Thanks,
AndyDan
This may be totally out to lunch, but since you're looking for any thoughts that might help, here are mine.
First of all, I'm assuming that the actual editing of the files happens outside the browser, and that you're just looking for a better way to get the files back and forth.
In that case, one option I've used in the past is just to route around the web application using Apache, or any other vanilla web server you like. For downloading, create a unique file session token, remember it in the web application, and place a copy of the file, named with the token (e.g. <unique token>.doc), in a download directory visible to Apache. Then provide a link to the file that will be served via Apache.
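To make the token step concrete, here is a rough sketch in Python (your application is Oracle Forms, so treat this purely as an illustration of the idea). `publish_for_download`, the download directory, and the base URL are all hypothetical; the only real requirement is that the directory is one Apache serves.

```python
import os
import secrets
import shutil


def publish_for_download(source_path, download_dir, base_url):
    """Copy the file into download_dir under a random token name; return (token, url)."""
    token = secrets.token_urlsafe(16)  # unguessable and safe in file names and URLs
    extension = os.path.splitext(source_path)[1]  # keep ".doc", ".pdf", ...
    published_name = token + extension
    shutil.copyfile(source_path, os.path.join(download_dir, published_name))
    return token, base_url.rstrip("/") + "/" + published_name
```

The application stores the returned token against the original document, so a file that later comes back named with that token can be matched to its record.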
For upload, you have a couple of options. One is to use the mechanism you've got, then when a file is uploaded, you just have to match on the token in the name to patch the file back into your archive. Alternately, you could create a very simple file upload form separate from your application that will upload the file to a temp directory via Apache, then route the user back into your application and provide the token in the URL HTTP GET-style or else in a cookie.
Before you go to all that trouble, you'll want to make sure that your vanilla web server will provide better upload and download speed and reliability than your current solution, but it should.
As an aside, I don't know whether the application server you're using provides HTTP compression, but if it does, you should make sure it's enabled and working. This is probably the best single thing you can do to increase transfer speed of large files, assuming they're fairly compressible. If your application server doesn't support it, then most any vanilla web server will.
I hope that helps.
I ended up using CLIENT_HOST to call an FTP command to download the files. My 189MB test file took 20-22 minutes to download using WEBUTIL_FILE_TRANSFER.URL_To_Client_With_Progress, and only about 20 seconds using FTP. It's not the best solution because it leaves the FTP password exposed on the PC temporarily, but only for as long as the download takes, and even then the user would have to know where to find it.
So, we're implementing this for now, and looking for a more secure but still performant long-term solution.