Approach used to estimate remaining download time in applications (such as browsers)

I am wondering why, in most (if not all) applications, the initial estimate of the remaining time for each download is based only on that download's current speed and does not take the other concurrent downloads into account.
For example, suppose two downloads start at the same time (t=0): download A is 10 MB, download B is 5 MB, and the total available bandwidth is 1 MB/s, shared equally between the two (i.e. 0.5 MB/s per download while both are active). According to the commonly used approach, the estimated remaining time for each download at t=0 will be:
Download A: will be finished in 20 seconds
Download B: will be finished in 10 seconds
However, if the initial estimate for download A took into account that download B will finish after 10 s, at which point download A's allocated bandwidth increases from 0.5 MB/s to 1 MB/s, then the following, more accurate initial estimate could be made at t=0:
Download A: will be finished in 15 seconds (at t=10 s, 5 MB of download A will have been downloaded, and the remaining 5 MB will be downloaded at 1 MB/s)
Download B: will be finished in 10 seconds
Thus, the second approach can give us a more accurate initial estimate at time t=0.
Does anybody know why this approach is not utilized?
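For illustration, a minimal sketch of the two estimates (assuming, as above, that the total bandwidth is shared equally among active downloads and is reassigned instantly when one finishes; the function names are made up for the example):

// Sizes in MB, bandwidth in MB/s, times in seconds.
function estimateNaive(remainingMB, currentRate) {
    return remainingMB / currentRate;                 // estimate from the current rate only
}
function estimateBandwidthAware(sizesMB, totalRate) {
    // Simulate equal sharing: repeatedly let the smallest remaining download
    // finish, then redistribute the total bandwidth among those still active.
    let remaining = [...sizesMB].sort((a, b) => a - b);
    let elapsed = 0;
    const finishTimes = [];
    while (remaining.length > 0) {
        const rate = totalRate / remaining.length;    // equal share per active download
        const dt = remaining[0] / rate;               // time until the smallest finishes
        elapsed += dt;
        finishTimes.push(elapsed);
        remaining = remaining.slice(1).map(mb => mb - rate * dt);
    }
    return finishTimes;                               // completion times, smallest download first
}
console.log(estimateNaive(10, 0.5), estimateNaive(5, 0.5));   // 20 and 10 seconds
console.log(estimateBandwidthAware([10, 5], 1));              // [10, 15]: B at 10 s, A at 15 s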

Related

How do programs estimate how long a process will take to complete?

In some loading bars there will be something like "2 minutes remaining". Does the programmer time how long the process takes on their computer and use that value somehow? Or does the program calculate it all itself? Or another method?
This calculation is done at run time, because how long it takes to execute or download a program depends on internet speed, RAM, processor speed, and so on, so it would be hard to have one universal prediction based on the programmer's computer. Typically it is calculated from how much of the file has already been downloaded relative to its total size, together with how long it took to download that much data. From there the program extrapolates how much longer it will take to finish your download, based on how fast it has progressed up to that point in time.
Those 'x minutes remaining' interface elements, which (ideally) indicate how much time it will take to complete a certain task, are simply forecasts based on how long that task has taken so far and how much of the task has been accomplished.
For example, suppose I have an app that will upload a batch of images to a server (all of which are the same size, for simplicity). Here's a general idea of how the code that indicates time remaining will work:
Before we begin, we assign the current time to a variable. Also, at this point the time-remaining indicator is not visible. Then, in a for... loop:
// (Runs inside an async function; uploadImage(), formatTime(), and the
// indicator element are assumed helpers, not defined here.)
const startTime = Date.now();                        // record when the batch started
for (let i = 0; i < batchOfImages.length; i++)
{
    await uploadImage(batchOfImages[i]);                              // upload an image
    const elapsed = Date.now() - startTime;                           // total time expended so far
    const avgPerImage = elapsed / (i + 1);                            // average upload time per image so far
    const remaining = avgPerImage * (batchOfImages.length - (i + 1)); // likely time for the remaining images
    indicator.textContent = formatTime(remaining);                    // update the time-remaining text
    indicator.hidden = false;                                         // make sure the indicator is visible
}
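Note that this running-average approach implicitly assumes the remaining images will upload at roughly the average rate observed so far; if the connection speeds up or slows down, the estimate adjusts only gradually as new measurements are folded into the average.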

Concurrent users projected to actual users

I need to provide the business with a report estimating the number of users (devices in this case) the system can cope with without extensive delays and errors.
Assuming each device polls and communicates with the server every 5 seconds or so, would it be acceptable to multiply the number of concurrent users I stress test with by 5 to get the figure required by the business?
In general what are the best means of answering such a question considering the above factors?
I am guessing that the collision rate (which is what makes the requests concurrent) may well exceed the ratio of 5 (the number of seconds a device waits before it asks to communicate with the server again).
Any advice?
I am using JMeter to produce concurrent user/device throughput.
Edit as requested to explain further:
From an analytical point of view: if each device attempts to connect and communicate with the server every 5 seconds, and we want it to receive a response before it is ready to communicate again (in other words, within the next 4 seconds), then the chance of colliding with other devices running the same software comes down to the elapsed time between the two calls, no?
What I am really looking for is a statistical methodology for finding a factor by which to multiply the concurrent test results in order to project them onto a real environment.
I know it is a general question without a specific / explicit answer; what I am after is the methodology, if there is one, for projecting the number of "active" users the system can cope with from the known number of "concurrent" users. I would have thought that, given that the frequency of calls is known and that each call takes 300 ms on average, one could somehow project the actual users (maybe by an industry-standard multiplier?).
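For what it's worth, the naive arithmetic hinted at above would look something like this (a sketch only: it ignores queueing, collisions, and variability; the 5-second interval and 300 ms average call time are the figures from the question, and the 100 concurrent test users is a made-up number):

// Rough projection of "active" devices from "concurrent" test users.
const pollIntervalSec = 5.0;      // each device calls the server every 5 seconds
const callDurationSec = 0.3;      // each call occupies the server for ~300 ms on average
// Fraction of the time a single device actually occupies the server:
const utilisationPerDevice = callDurationSec / pollIntervalSec;        // 0.06
// If the stress test shows the system copes with this many truly concurrent calls...
const concurrentUsersTested = 100;                                     // hypothetical test result
// ...then a naive projection of the number of active polling devices is:
const projectedActiveDevices = concurrentUsersTested / utilisationPerDevice;
console.log(projectedActiveDevices);   // ~1667 devices, before allowing for bursts and queueing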

Estimating maximum users that an application can support

I am analyzing a web application and want to predict the maximum number of users the application can support. I have the numbers below from my load test execution:
1. Response Time
2. Throughput
3. CPU
I have the following SLA for the application use case:
Response Time - 4 Secs
CPU - 65%
When I execute a load test with 10 concurrent users (without think time) for a particular use case, the average response time reaches 3.5 seconds and CPU touches 50%. Next I execute a load test with 20 concurrent users, and the response time reaches 6 seconds and CPU 70%, thus surpassing the SLA.
The application server configuration is 4 core 7 GB RAM.
Going by this data, does it suggest that the web application can support only 10 users at a time? Is there any formula or procedure that can suggest the maximum number of users the application can support?
TIA
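For reference, a naive linear interpolation of those two data points against the SLA limits looks like this (a sketch only; response time rarely scales linearly with load, and think time, discussed below, changes the picture):

// Naive linear interpolation of the two load-test data points against the SLA.
// Data from the question: 10 users -> 3.5 s, 50% CPU; 20 users -> 6 s, 70% CPU.
// SLA: response time <= 4 s, CPU <= 65%.
function usersAtLimit(u1, v1, u2, v2, limit) {
    return u1 + (limit - v1) * (u2 - u1) / (v2 - v1);   // straight-line interpolation
}
const byResponseTime = usersAtLimit(10, 3.5, 20, 6, 4);   // ~12 users
const byCpu = usersAtLimit(10, 50, 20, 70, 65);           // ~17.5 users
console.log(Math.min(byResponseTime, byCpu));             // ~12 users is the binding limit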
"Concurrent users" is not a meaningful measurement, unless you also model "think time" and a couple of other things.
Think about the case of people reading books on a Kindle. An average reader will turn the page every 60 seconds, sending a little ping to a central server. If the system can support 10,000 of those pings per second, how many "concurrent users" is that? About 10,000 * 60, or 600,000. Now imagine that people read faster, turning pages every 30 seconds. The same system will only be able to support half as many "concurrent users". Now imagine a game like Halo online. Each user will be emitting multiple transactions / requests per second. In other words, user behavior matters a lot, and you can't control it. You can only model it.
So, for your application, you have to make a reasonable guess at the "think time" between requests, and add that to your benchmark. Only then will you start to approach a reasonable simulation. Other things to think about are session time, variability, time of day, etc.
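To make the arithmetic behind the Kindle example concrete, here is a small sketch of the usual relationship (roughly Little's law: supported users ≈ throughput × (think time + response time)); the 10,000 requests/s and 60 s / 30 s page-turn figures are the ones used above, and the response time of a page-turn ping is treated as negligible:

// Supported users from throughput and user behaviour (think time).
// users ≈ throughput × (think time + response time)
function supportedUsers(throughputPerSec, thinkTimeSec, responseTimeSec) {
    return throughputPerSec * (thinkTimeSec + responseTimeSec);
}
console.log(supportedUsers(10000, 60, 0));   // 600000 readers turning a page every 60 s
console.log(supportedUsers(10000, 30, 0));   // 300000 when pages turn every 30 s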
Chapter 4 of the "Mature Optimization Handbook" discusses a lot of these issues: http://carlos.bueno.org/optimization/mature-optimization.pdf

Storing and processing millions of images in a project

I have a project that will generate a huge number of images (1,000,000; sorry, I erred).
I need to process every image through an algorithm.
Can you advise me on an architecture for this project?
It is a proprietary algorithm in the area of computer vision.
The average image size is about 20 kB.
I need to process them when they are uploaded, and 1 or 2 times on request.
On average I get a million images a day, each of which I will need to run through the algorithm 1-2 times per day.
Yes, most often the images will be stored on a local disk.
When I process an image, I generate a new image.
Current view:
Most likely I will have a few servers (which I do not own), and on each of those servers I have to perform the procedure described above.
Internet bandwidth between the servers is very thin (about 1 Mb/s), but I need to exchange messages between servers (to update the coefficients of the neural network) and to update the algorithm.
On current hardware (Intel family 6, model 26) it takes about 10 minutes to complete the full procedure for 50,000 images.
Maybe there will be wide internet channels, so I can upload these images to the servers I have.
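As a back-of-the-envelope check on the figures quoted above (1,000,000 images of roughly 20 kB per day, about 10 minutes per 50,000 images, and a 1 Mb/s link between servers), a rough sketch of the arithmetic, not a sizing recommendation:

// Rough capacity check based on the numbers in the question.
const imagesPerDay = 1000000;
const avgImageKB = 20;                      // average image size
const minutesPer50k = 10;                   // measured processing time for 50,000 images
const linkMbps = 1;                         // inter-server bandwidth

// Processing: time for one machine to work through the daily volume.
const processingMinutes = (imagesPerDay / 50000) * minutesPer50k;        // 200 minutes
// Transfer: time to ship the daily volume over the thin link.
const totalMegabits = imagesPerDay * avgImageKB * 8 / 1000;              // ~160,000 Mb
const transferHours = totalMegabits / linkMbps / 3600;                   // ~44 hours

console.log(processingMinutes, transferHours);
// Processing (~3.3 hours) fits in a day on one machine, but moving the raw images
// over the 1 Mb/s link would take ~44 hours, so the images need to stay local and
// only small messages (e.g. the network coefficients) should cross the link.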
I don't know much about images, but I guess this should help: http://www.cloudera.com/content/cloudera/en/why-cloudera/hadoop-and-big-data.html
Also, please let us know what kind of processing you are talking about and what you mean by a huge number of images. How many do you expect per hour or per day?

Performance Counters - Tool for monitoring in Windows Server 2008

I am able to collect performance counters every two seconds on a Windows Server 2008 machine using a PowerShell script. But when I go to Task Manager and check the CPU usage, powershell.exe is taking 50% of the CPU. So I am trying to collect those performance counters using other third-party tools. I have searched and found this and this, but those two need to be refreshed manually and do not collect automatically every two seconds. Can anyone please suggest a tool that collects the performance counters every two seconds, analyzes the maximum and average of those counters, and stores the results in text/xls or any other format? Please help me.
I found some Performance tools from here, listed below:
Apache JMeter
NeoLoad
LoadRunner
LoadUI
WebLOAD
WAPT
Loadster
LoadImpact
Rational Performance Tester
Testing Anywhere
OpenSTA
QEngine (ManageEngine)
Loadstorm
CloudTest
Httperf.
There are a number of tools that do this -- Google for "server monitor". Off the top of my head:
PA Server Monitor
Tembria FrameFlow
ManageEngine
SolarWinds Orion
GFI Max Nagios
SiteScope. This tool leverages either the perfmon API or the SNMP interface to collect the stats without having to run an additional non-native app on the box. If you go the open source route then you might consider Hyperic. Hyperic does require an agent to be on the box.
In either case I would look at your sample window as part of the culprit for the high CPU, and not PowerShell. The higher your sample rate, the higher you will drive the CPU, independent of the tool. You can see this yourself just by running perfmon: use the same set of stats and watch what happens to the CPU as you adjust the sample rate from once every 30 seconds, to once every 20, then 10, 5, and finally 2 seconds. When engaged in performance testing we rarely go below ten seconds on a host, as a higher rate causes the sampling tool itself to distort the performance of the host. If we have a particularly long test, say 24 hours, then adjusting the interval to once every 30 seconds is enough to spot long-term trends in resource utilization.
If you are looking to collect information over a long period of time, 12 hours or more, consider a longer sampling interval. If you are sampling for a short period, an hour for instance, you may want to run a couple of one-hour periods at lesser and greater levels of sampling (2 seconds vs 10 seconds) to check that the shorter sample interval is actually generating additional value for the additional overhead on the system.
To repeat, tools just to collect OS stats:
Commercial: SiteScope (Agentless). Leverages native interfaces
Open Source: Hyperic (Agent)
Native: Perfmon. Can dump data to a file for further analysis
This should be possible without third-party tools. You should be able to collect the data using Windows Performance Monitor (see Creating Data Collector Sets) and then translate that data to a custom format using Tracerpt.
If you are still looking for other tools, I have compiled a list of Windows Server performance monitoring tools that also includes third-party solutions.

Resources