I have a transactional database with a very large number of records and concurrent access. I need to provide a download facility to clients. A download file can be as large as 300 MB, so if I serve downloads directly from server memory there will be a performance issue. Is there an alternative way to achieve this?
I have read many questions and comments about saving images in a DB versus the file system on the server side, but I'm still confused. For now I allow users to upload images (limited to 10MB), save them in a folder on the server, and serve them through an Apache context path pointed at that location. However, because of the number of images and the high load, we want to provide load balancing and failover. So I have 2 options.
Add code to replicate each uploaded image to all servers, or use rsync to do that.
Use CouchDB or MongoDB and save each image as an attachment of a document, so I get replication out of the box.
Can anyone show me the pros and cons of these approaches? Can CouchDB/MongoDB have the same read performance as the file system?
You can also store files in a distributed file system. The benefit over a DB-backed image server is that you do not have to alter the application. Obviously, storing all the data the same way, including images, may be a benefit for you, but changing the architecture of an already working system may also be problematic.
For example, GlusterFS can be installed on top of a "normal" file system to give you distributed features while minimizing changes to the system itself. Through its plugins (translators) it is supposed to support all the features you would potentially expect from a cloud system: replication, load balancing, striping of files into relocated parts, and failover.
Can CouchDB/MongoDB have the same read performance compared to file system ?
No, there will be lag between file system timers and database timers; this is an unfortunate reality.
I have no idea of your current setup, load and performance so I cannot really advise on what to do, however, Apache isn't really a good image server anyway.
Your best bet might be to look into a CDN cache for your images.
I have multiple configuration files which I need to read from disk and apply to many records.
I need to improve this to increase performance.
I have two processes.
Process1: Update Configuration:
This updates content configuration files.
This can run from multiple locations.
Process2: Apply Configuration:
This uses content of configuration files.
This can run from multiple locations.
At present, this uses direct file and network I/O to read the updated configuration files.
Both processes are back-end and there is no browser involved here.
Should I use Redis or Memcached as a cache for FILES ?
Note that the files need to be read from a common location. They are being updated by another background process. An update can happen at any time. The size of the configuration files is 1K to 10K.
I want Process2 to access updated configuration files in fastest way possible.
Redis is a good choice, as it keeps data in memory with optional persistence, so this approach does not have to touch the hard drive.
The problem I can see here is that every client needs to understand Redis and has to use some support library, e.g. in Java or whatever language you use.
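As a rough illustration of the Redis option, the two processes could share file contents through one key per file. This is only a sketch: `publish_config`, `load_config` and the `config:` key prefix are hypothetical names, and `client` is assumed to be any object with Redis-style `set` and `get` methods, such as a `redis.Redis` instance from the redis-py library.

```python
def publish_config(client, name, content):
    """Process1: store the updated file content under a per-file key.

    `client` is assumed to expose Redis-style set/get (for example a
    redis.Redis instance from redis-py); the key layout is hypothetical.
    """
    client.set(f"config:{name}", content)


def load_config(client, name):
    """Process2: read the latest content; returns None if never published."""
    return client.get(f"config:{name}")


# Hypothetical usage with redis-py (host name is a placeholder):
#   import redis
#   client = redis.Redis(host="cache-host", port=6379)
#   publish_config(client, "pricing.conf", b"rate=5")
#   load_config(client, "pricing.conf")
```

With files of 1K to 10K, storing the whole file as a single value keeps both processes simple; Process1 would still write the file to its common location if that copy must remain the source of truth.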
Why not use HTTP itself, e.g. deploy an HTTP file server? You can also add version checking and caching: the client stores the version of the file it got from the server, keeps using its cached content while the server still has the same version, and downloads the file again only when it has changed. One way to check is a HEAD request, which returns only the headers; look at http://www.tutorialspoint.com/http/http_methods.htm
You should just use the same approach as the web itself. Every browser downloads content: HTML, CSS, images, etc. The best improvement for you is client-side caching: CSS or images, for example, are stored in the browser's cache and downloaded only the first time or when they have changed.
And if you don't want to, you don't have to use the REST approach exactly.
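The version check plus client-side cache described in this answer can also be done with a conditional GET instead of a separate HEAD request. The sketch below assumes the file server sends an `ETag` header and honours `If-None-Match`; `fetch_if_changed` and the shape of the `cache` dictionary are hypothetical names chosen for illustration.

```python
import urllib.error
import urllib.request


def fetch_if_changed(url, cache, opener=urllib.request.urlopen):
    """Return the file body, downloading it again only if the server copy changed.

    `cache` maps url -> (etag, body). Assumes the server supports ETag and
    If-None-Match; without an ETag the file is simply downloaded every time.
    """
    request = urllib.request.Request(url)
    cached = cache.get(url)
    if cached is not None:
        # Ask the server to reply 304 if our cached version is still current.
        request.add_header("If-None-Match", cached[0])
    try:
        with opener(request) as response:
            body = response.read()
            etag = response.headers.get("ETag")
            if etag:
                cache[url] = (etag, body)
            return body
    except urllib.error.HTTPError as err:
        # urllib reports "304 Not Modified" as an HTTPError.
        if err.code == 304 and cached is not None:
            return cached[1]
        raise
```

Process2 would call `fetch_if_changed` each time it needs a file; when nothing changed, the server's reply carries no body, so only headers cross the network.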
I have a large file on Windows Azure and I want to download it and save it on my disk. The maximum lifetime of each link on Windows Azure is 60 minutes. If I download directly from the link, that may not be enough time. How can I download it?
Nathan, your question isn't very clear, but I suspect you are referring to the time allowed for a Shared Access Signature, and being concerned that the client might not download the file within the time allowed?
There are 2 scenarios here:
1. Once a storage transaction (i.e. a file download) which uses a SAS begins, the transfer will be able to continue past the expiration of the SAS. It is only new requests which are authenticated using the SAS and which will fail if they are attempted past the expiration time on the SAS.
2. If the client has to resume the download (or is downloading in blocks), then the client has to be smart enough to detect the failed authentication after the SAS expires and then re-request a new SAS from the issuer.
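The second scenario can be sketched as a small retry wrapper. This assumes that a request made with an expired SAS fails with HTTP 403, and that the application has some way to ask the issuer for a fresh SAS URL; `get_with_sas_renewal`, `http_get` and `request_new_sas_url` are hypothetical names, not part of any Azure library.

```python
import urllib.error
import urllib.request


def http_get(url):
    """Plain HTTP GET; urllib raises HTTPError for 4xx/5xx responses."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def get_with_sas_renewal(sas_url, request_new_sas_url, fetch=http_get, max_renewals=3):
    """Fetch sas_url; on HTTP 403 (assumed here to mean the SAS expired),
    obtain a new SAS URL from the issuer and retry.

    Returns (data, sas_url) so the caller can reuse the renewed URL for
    the next block or resumed request.
    """
    for renewals_used in range(max_renewals + 1):
        try:
            return fetch(sas_url), sas_url
        except urllib.error.HTTPError as err:
            if err.code != 403 or renewals_used == max_renewals:
                raise
            # Hypothetical callback: however your application asks the
            # issuer for a fresh SAS for the same blob.
            sas_url = request_new_sas_url()
```

A client downloading in blocks would route every block request through this wrapper and carry the returned URL forward.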
Try using a download accelerator like FlashGot or something similar.
One option would be to download the file in pieces and reassemble it once you have the pieces. There are a couple of ways to do that.
If the blob was uploaded in multiple blocks, then you could download each block individually. This is supported directly in the client libraries, so if you can do this it's probably easier. You can also download the blocks in parallel to reduce the total time it takes to download.
You could use HTTP Range headers to get certain byte ranges. I don't believe this is supported in the clients, so you'd probably have to code it yourself. But it will work even if the blob was not uploaded in blocks. I think this could also be done in parallel, but I'm not sure.
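A minimal sketch of the Range-header approach, written against plain HTTP rather than any Azure client library. It assumes the total size is already known (for example from a Content-Length header) and that the server answers range requests with status 206; `byte_ranges`, `fetch_range` and `download_in_ranges` are hypothetical helper names.

```python
import urllib.request


def byte_ranges(total_size, chunk_size):
    """Yield inclusive (start, end) byte offsets covering total_size bytes."""
    for start in range(0, total_size, chunk_size):
        yield start, min(start + chunk_size, total_size) - 1


def fetch_range(url, start, end):
    """Download bytes start..end (inclusive) using an HTTP Range header."""
    request = urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})
    with urllib.request.urlopen(request) as response:
        if response.status != 206:
            # A 200 here would mean the server ignored the Range header.
            raise ValueError(f"expected 206 Partial Content, got {response.status}")
        return response.read()


def download_in_ranges(url, total_size, chunk_size, destination, fetch=fetch_range):
    """Fetch the file piece by piece and reassemble it in order on disk."""
    with open(destination, "wb") as out:
        for start, end in byte_ranges(total_size, chunk_size):
            out.write(fetch(url, start, end))
```

This version fetches the pieces sequentially; since each range is an independent request, the same `byte_ranges` split could be handed to a thread pool, with every piece written at its own offset.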
I'm researching solutions for a potential client. They're requesting the ability to download a large number of MP3s (1000+) from their online catalog.
I've researched/tested building a zip containing all MP3s using ZipArchive but ran into obvious memory leak issues that have ruled that solution out.
I'm now trying to think out of the box.
One idea was to create an FTP queue or a Torrent type download link for them. Is there anything out there that can pull something like this off?
Any help or suggested direction would be greatly appreciated! Thanks!!
Edit: Here is the overall process/goal that we're trying to achieve.
The client creates music for TV/film placement. They maintain an online catalog AND a local copy that they send to potential buyers. The online catalog and the offline catalog need to mirror each other. The problem is that they have multiple offices, in many different locations, that will have to update their local copies with the new files added to the online catalog.
Example: East Coast User updates catalog with 100 new files. West Coast User needs to update the offline catalog with the new files retrieved from the online catalog.
We had hoped to create custom zips of the files each user needed to update their catalog, based on the user's download history that we'd maintain in MySQL. We were testing ZipArchive but we couldn't seem to build zips over 175 MB (give or take). We're in the process of testing ZipStreaming but are having some issues.
I hope this clears up the overall goal and problems we are facing.
GNU wget?
It can download recursively. Just give wget a list of all the files on the server, e.g.
http://www.example.org/filelist.html, which contains links like file1.mp3, file2.mp3, etc. (Apache normally generates such an index file automatically when a directory without an index.html/php in it is requested).
http://linux.die.net/man/1/wget
Frankly speaking, I can't identify the actual problem/question from your post. If you are looking to minimize network load, then you need to remember that MP3 files do not compress well because they are already compressed (not as well as possible, but well). If you are looking for a transport, then any file transfer protocol will do (FTP, SFTP, HTTP, WebDAV).
If you need flexibility and features, I'd recommend SFTP: this is a protocol for remote file system access, so besides "get file" operation it has plenty of useful operations including machine-readable directory listing (not always available in FTP and not available in standard HTTP), built-in ZLib compression, built-in possibility to resume file transfers and more bonuses. HTTP also has ZLib compression, but this one is not always available.
Update: your approach doesn't take into account what is really available on the client; you are going to prepare ZIP files based on your (possibly incorrect) knowledge of what the client already has.
If the client and server are both applications that you develop, then you should use RSync protocol or something similar to update data online (not using any ZIP files) and download the files that are missing on the client. If direct communication between the client and the server is not possible, you can make the client send his state to the server and the server will prepare an individual package after that. As for ZIP functionality - it's needed only when you use batch update (no real-time communication between the client and the server). I don't know what technology you are using but if your only problem is with ZIP component, you can use something else for data packing - either different ZIP component (for .NET and VCL we have ZIP component) or some other packing solution (for example, our SolFS product doesn't have size limits). Unfortunately I am not aware of RSync-like implementation available as a component.
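The "client sends its state to the server" idea can be sketched with content hashes. This is not the rsync protocol itself, only an illustration of comparing two manifests to find what is missing or changed; `file_digest`, `build_manifest` and `files_to_send` are hypothetical names.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large files stay out of memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its content hash."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): file_digest(path)
        for path in root.rglob("*")
        if path.is_file()
    }


def files_to_send(server_manifest, client_manifest):
    """Names the client lacks, or holds with different content."""
    return sorted(
        name
        for name, digest in server_manifest.items()
        if client_manifest.get(name) != digest
    )
```

Each office would build a manifest of its local catalog and send it to the server, which replies with the list from `files_to_send`; the listed files can then be fetched individually over any of the transports mentioned above, with no ZIP step.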
Is it possible to organize asynchronous data exchange through separate files (transportable tablespaces, maybe) using Oracle Streams? That is, is it possible to organize offline replication using files?
You may want to consider using DataPump or imp/exp if you want to do this in batches with files. You would use these tools to export the things you want, then get the files over to the other database and import them.
If you have a poor connection between the two hosts, however, you're going to run into the same problem you're running into with your snapshots and database links. You still need to get the data across to the other box.
Oracle Streams is another option that may be able to basically queue up transactions while the link is unusable, but it is a far more advanced topic and one you may want to consider hiring a consultant for.