MinIO slow read speed for small files - performance

I was under the impression that MinIO is well suited for storing and reading small files (https://blog.min.io/minio-optimizes-small-objects/), so I finally migrated my 2 million small text files, but the read speed is surprisingly slower than reading directly from the disk... Is there a way to compact/merge those small files, or is there something I am doing wrong?
My usual use case: reading 10,000 files picked at random.
Reading directly from the disk, I average around 120 seconds.
I then transferred them to a local network share: it took around 500-600 seconds to read them.
Now with MinIO it's around 600 seconds.
Note: the disk is capable of much higher throughput, but with large files; MinIO likewise works great with large files.
Do you have any ideas? I am really stuck :(

MinIO was never a good fit for that kind of problem, I think; you just need faster hardware (a faster drive). An SSD should work fine for you.
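One thing worth checking before buying new hardware: whether the 600 seconds comes from fetching the 10,000 objects one blocking HTTP GET at a time. Each GET pays a fixed request latency that small objects cannot amortise, so issuing the requests in parallel often helps considerably. A minimal sketch with the Python minio client (endpoint, credentials, bucket and object names below are placeholders, not taken from the question):

from concurrent.futures import ThreadPoolExecutor
from minio import Minio

# Placeholder endpoint and credentials; adjust for your deployment.
client = Minio("localhost:9000", access_key="minioadmin",
               secret_key="minioadmin", secure=False)

def fetch(name):
    # One GET per object; returns the object body as bytes.
    resp = client.get_object("my-bucket", name)
    try:
        return resp.read()
    finally:
        resp.close()
        resp.release_conn()

object_names = [f"doc-{i}.txt" for i in range(10_000)]  # hypothetical names
with ThreadPoolExecutor(max_workers=32) as pool:
    bodies = list(pool.map(fetch, object_names))

If parallel reads do not close the gap, packing the small files into larger archives (for example tar bundles) before upload is one common way to "merge" them, at the cost of having to fetch and unpack a whole bundle to read one file.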

Related

High throughput in-memory database for temporary storage of images

I'm looking for a high-throughput in-memory database for storing binary chunks of between 1.5 MB and 3 MB (images).
The use case is a live video stream computer vision inference pipeline, where we have multiple deep models doing inference on 720p video at 25 FPS in real time. Our current solution is Amazon FSx for Lustre, which can handle the task (average throughput is 180 MB/s). The models run in their own K8s pods and read the decoded video frames from the FSx. The problem is that it takes a long time to set up for each run, and it is not optimal: to increase the throughput you also need to pay for extra space, which we don't really need, since the storage is temporary and most of the time fewer than 1,000 frames are stored at once. Ideally, we would have an in-memory database on an instance that can be brought up fast and can sustain very high throughput (up to 500 MB/s).
I've tested Redis and Memcached as alternatives, but both fail to achieve similar performance, which I assume is due to the large chunk sizes (as far as I know, both are meant for many smaller chunks rather than larger ones).
Any suggestions on what else to test or in what direction to look would be very helpful.
Thank you!
You could take a look at eXtremeDB. I work for the vendor (McObject), so hopefully this won't get flagged as 'commercial' since you asked for ideas. eXtremeDB has been used for facial and fingerprint recognition in access control systems. Not exactly the same use case, but perhaps similar enough to warrant a look.
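If the Redis and Memcached numbers came from a single connection doing one blocking SET or GET at a time, it may be worth re-measuring with pipelining and a few parallel connections before ruling them out; large values mostly stress the network and serialization path rather than the store itself. A rough throughput-check sketch with redis-py (the host, port, key names and the 2 MB frame size are assumptions, not from the question):

import time
import redis  # redis-py

r = redis.Redis(host="localhost", port=6379)  # assumed local instance
frame = b"\x00" * (2 * 1024 * 1024)           # dummy ~2 MB "frame"
n = 200

start = time.time()
pipe = r.pipeline(transaction=False)
for i in range(n):
    pipe.set(f"frame:{i}", frame, ex=60)      # expire: storage is temporary
    if (i + 1) % 20 == 0:
        pipe.execute()                        # flush every 20 commands
pipe.execute()
elapsed = time.time() - start
print(f"write throughput ~{n * 2 / elapsed:.0f} MB/s")

The same loop with r.get() gives the read side; running a few copies of it in parallel processes approximates what the inference pods would see.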

Copy a file every 10 seconds using a batch script - only once

I'm trying to find a way to run a batch script on Windows that backs up my project directory to our local network file share server.
Example of what I would usually run:
robocopy /mir "C:\PROJECT_FOLDER_PATH" "\\NETWORK_FOLDER_PATH"
But, every now and then, my IT admin approaches me about a massive copy operation that is slowing down the network.
As my projects folder grows over time, this becomes more of an annoyance. I try to run the script only while signing off later in the day to minimize the number of people affected in the office, but I was trying to come up with a better solution.
I've written a script that uses 7zip to create an archive and split it into volumes of 250 MB. So now I have a folder that just contains several smaller files and no subfolders to worry about. But if I batch copy all of these to the server, I'm concerned I'll still run into the same problem.
So my initial idea was to copy one file at a time every 5-10 seconds rather than all at once, but I would only want the script to run once. I know I could write a loop and rely on robocopy's /mir switch to skip files that have already been backed up, but I don't want to have to monitor the script once I start it.
I want to run the script when I'm ready to do a backup and then have it copy the files up to the network at intervals, to avoid overtaxing our small network.
Robocopy has a special option to throttle data traffic while copying.
/ipg:n - Specifies the inter-packet gap to free bandwidth on slow lines.
The number n is the number of milliseconds for Robocopy to wait after each block of 64 KB.
The higher the number, the slower Robocopy gets, but also: the less likely you will run into a conflict with your IT admin.
Example:
robocopy /mir /ipg:50 "C:\PROJECT_FOLDER_PATH" "\\NETWORK_FOLDER_PATH"
On a file of 1 GB (about 16,000 blocks of 64 KB each), this will increase the time it takes to copy the file by 800 seconds (16,000 x 50 ms).
Suppose it normally takes 80 seconds to copy this file; this might well be the case on a 100 Mbit connection.
Then the total time becomes 80 + 800 = 880 seconds (almost 15 minutes).
The bandwidth used is 8000 Mbit / 880 sec = 9.1 Mbit/s.
This leaves more than 90 Mbit/s of bandwidth for other processes to use.
Other options you may find useful:
/rh:hhmm-hhmm - Specifies run times when new copies may be started.
/pf - Checks run times on a per-file (not per-pass) basis.
Source:
https://technet.microsoft.com/en-us/library/cc733145(v=ws.11).aspx
http://www.zeda.nl/index.php/en/copy-files-on-slow-links
http://windowsitpro.com/windows-server/robocopy-over-network
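The /ipg approach throttles inside robocopy itself. If you would rather keep the original "one archive volume, then pause" idea from the question, here is a minimal sketch of it in Python (the paths and the 10-second gap are placeholders; it assumes robocopy is on the PATH):

import subprocess
import time
from pathlib import Path

SRC = Path(r"C:\PROJECT_BACKUP")                 # folder holding the split 7zip volumes
DST = r"\\NETWORK_FOLDER_PATH\PROJECT_BACKUP"    # network share target

for volume in sorted(SRC.glob("*.7z.*")):
    # Copy one volume, then give the network a breather before the next one.
    # robocopy uses non-zero exit codes even on success, so don't use check=True.
    subprocess.run(["robocopy", str(SRC), DST, volume.name])
    time.sleep(10)

Because robocopy skips files that are already up to date on the destination, re-running the script after an interruption simply resumes where it left off.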

Can multiple clients download the same file using FTP without performance impact?

I have a file at location "A" which will be downloaded by multiple clients via FTP. The clients can access the file at the same time. The host server (where the file is stored) is a Solaris server with a 100BT (100 Mbit/s) link. The clients can support up to 1 Gbps. The size of the file is nearly 700 mb.
When 5 to 6 clients downloaded the file, the download took around 20 minutes. But when the number of clients was increased to about 40, the download took more than an hour.
My question is: when the number of clients increases, does it have an impact on download speed? If yes, what factors are responsible for this impact? Please clarify...
This question would be better asked on Super User because it is not about programming.
But if your server has a 100 BT link, it can support about 10 MB / sec. Distribute this over 5 clients and each gets 2 MB/sec. Use 40 clients and each gets 250 KB/sec. Of course it gets slower the more clients you have.
Imagine a load of sections of pipe of varying thicknesses joined together with your server at one end and your client(s) at the other. The pieces of pipe here are:
the disk where your file is stored on the server
the CPU and memory bandwidth on your server
the network connection from your server (and all switches and hubs on the way)
the CPU and memory bandwidth on your client
the disk where the file will be saved on your client
Basically, the transfer is going to go as fast as the thinnest piece of pipe allows data to flow through it. As a rough guide, the performances will be
60-150 MBytes/s
several GBytes/s
100 Mbits/s or around 10-12 MBytes/s
several GBytes/s
60-150 MBytes/s
As you can see, the server's 100Mb/s network interface is the biggest bottleneck by a massive factor (5-15x). Also, you say your file is 700mb (millibits), but I suspect you mean 700MB (megabytes). So, if your server's network interface is only 100 Mb/s (or 10MB/s) the 700MB file is going to take at least 70s to pass through the network and it will need to do so once for each client, so 5 clients are going to take at least 350s assuming no overheads.
Short answer:
try compressing the file,
or go on eBay and get a faster network interface for the server,
or distribute from the server to one (or more) of your 1 Gb/s clients and then from there to the other clients.
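To put rough numbers on the "thinnest pipe" point, here is the lower-bound arithmetic from the answer as a small calculation (the ~10 MB/s effective rate for a 100 Mbit/s link is the same assumption used above):

# The server link is shared, so the time to serve every client grows
# linearly with the number of clients, ignoring FTP and disk overheads.
FILE_MB = 700
LINK_MB_PER_S = 10   # ~100 Mbit/s effective

def serve_time_s(clients):
    return clients * FILE_MB / LINK_MB_PER_S

print(serve_time_s(5))    # 350 s   lower bound for 5 clients
print(serve_time_s(40))   # 2800 s  (~47 min) lower bound for 40 clients

The observed 20 minutes for 5-6 clients and over an hour for 40 sit above these lower bounds, which is what you would expect once protocol overhead and disk contention are added.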

Storing and processing millions of images in a project

I have a project that will generate a huge number of images (1,000,000; sorry, I mis-stated the number earlier).
I need to process every image through an algorithm.
Can you advise me on an architecture for this project?
It is a proprietary algorithm in the area of computer vision.
The average image size is around 20 kB.
I need to process them when they are uploaded, and 1 or 2 times on request.
On average, once a day I get a million images, each of which I will need to run through the algorithm 1-2 times per day.
Yes, most often the images will be stored on a local disk.
When I process an image, I will generate a new image.
Current view:
Most likely I will have a few servers (which I do not own); on each of the servers I have to perform the procedure described above.
Internet bandwidth between the servers is very thin (about 1 Mb/s), but I need to exchange messages between them (to update the neural network coefficients) and to push algorithm updates.
On current hardware (Intel family 6, model 26) it takes about 10 minutes to complete the full procedure for 50,000 images.
Maybe there will be wide internet channels, so that I can upload these images to the servers I have.
I don't know much about images, but I guess this should help: http://www.cloudera.com/content/cloudera/en/why-cloudera/hadoop-and-big-data.html
Also, please let us know what kind of processing you are talking about and, when you say a huge number of images, how many you expect per hour or per day.
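Since the algorithm itself is proprietary, the sketch below only shows the shape of the local part of such an architecture: a process pool running a placeholder process_image() over files on the local disk, which is roughly what is needed to get through about 1,000,000 x 20 kB images per day on one machine (all paths and names are hypothetical):

from concurrent.futures import ProcessPoolExecutor
from pathlib import Path

IN_DIR = Path("/data/incoming")     # hypothetical local image store
OUT_DIR = Path("/data/processed")

def process_image(path: Path) -> Path:
    # Placeholder for the proprietary CV algorithm: read one ~20 kB image,
    # write one new image, return the output path.
    out = OUT_DIR / path.name
    out.write_bytes(path.read_bytes())  # stand-in for the real transform
    return out

if __name__ == "__main__":
    OUT_DIR.mkdir(parents=True, exist_ok=True)
    images = list(IN_DIR.glob("*.jpg"))
    with ProcessPoolExecutor() as pool:   # one worker per CPU core by default
        for _ in pool.map(process_image, images, chunksize=256):
            pass

Only the small artifacts (updated network coefficients, algorithm updates) would cross the 1 Mb/s links; the images themselves stay on the server that received them, since 1,000,000 x 20 kB is roughly 20 GB/day, which is more than a 1 Mb/s link can move.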

How to increase ClickOnce download speed?

I have created an application and deployed it using ClickOnce. Its size is about 30 MB; at first the download took about 2-3 hours on a 128 kbit/s leased line. Then I enabled file compression on IIS and the download time was reduced to about 40 minutes.
I want to decrease the download time to 10-20 minutes, but I have no idea how to do that.
My solution has 4 projects (a, b, c and d); two of these projects are very large (nearly 15 MB). I tried to decompose these two projects, but their design is very tightly coupled and I have no time to do that.
If anyone has an idea or a solution for this problem, please help me.
Well, just about your only option is to find ways to compress/reduce the size of the projects themselves. Although, if you really have 128 kbit/s, it should not have taken anywhere near 2-3 hours for 30 MB, unless you are running other data across that connection at the same time. As for the project itself, unless you can find a way to reduce the project size (you've already enabled compression in IIS), there's really nothing else open to you.
A 128 kilobits per second connection should be able to transfer a 30 MB file in about 33 minutes, so it sounds like your connection is OK. You can possibly decrease the size of the file by getting rid of unnecessary dependencies (Visual Studio can perform this task for you, under Project Settings). You probably won't be able to decrease the file size very much, however.
I think the most reasonable thing for you to do is to try and upgrade your connection. Even my crappy consumer-grade upload rate nearly quadruples your 128kbps line.
Good luck.
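For reference, the "about 33 minutes" above is just the raw line-speed arithmetic for the 30 MB payload on a 128 kbit/s line; anything beyond that is overhead or other traffic on the connection:

payload_mb = 30      # ClickOnce deployment size
line_kbit_s = 128    # leased line speed

seconds = payload_mb * 8 * 1024 / line_kbit_s
print(f"{seconds / 60:.0f} minutes")   # ~32 minutes at full line speed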
