I have a script which do different things as per the requirement.
It takes options from the user and execute the commands accordingly.
So the question is how can i find the remaining time of total execution.
Thanks
You'll probably have to make an estimate of how much time each command will take before you run them and then subtract from the total as each command completes.
You can improve on this by readjusting the total and remaining time when each command completes based on the difference between your estimate and the actual time.
To get it even sharper, you can record times over invocations and use that to make a better guess but I think that would be overkill for just a script.
It's not possible however to get it exactly correct. Cf. http://xkcd.com/612/
Related
I have a series of file-watchers that trigger jobs. The file-watchers look, every fixed interval of time, in their list and, if they find a file, they trigger a job. If not, they wait, coming back after that mentioned interval.
Some jobs are dependent on others, so running them in a proper order and with proper parallelism would be a good optimization. But I do not want to think about this myself.
What data structure and algorithms should I use to ask a computer to tell me what job to assign to what file-watcher (and in what order to put them)?
As input, I have the dependencies between the jobs, the arrival time of files for each job and a number of watchers. (For starter, I will pretend each jobs takes same amount of time). How do I spread the jobs between the watchers, to avoid unnecessary waiting gaps and to obtain faster run time?
(I am looking forward tackling this optimization in an algorithmic way, but would like to start with some expert advice)
EDIT : so far I understood the fact the I need a DAG (Directed acyclic graph) to represent the dependencies and that I need to play with Topological sorting in order to optimize. But this responds with a one execution line, one thread. What if I have more, say 7?
Lets say I am going to run process X and see how long it takes.
I am going to save into a database a date I ran this process, and the time it took. I want to know what to put into the DB.
Process X almost always runs under 1500ms, so this is a short process. It usually runs between 500 and 1500ms, quite a range (3x difference).
My question is, how many "runs" should be saved into the DB as a single run?
Every run saved into the DB as its
own row?
5 Runs, averaged, then save that
time?
10 Runs averaged?
20 Runs, remove anything more than 2
std deviations away, and save
everything inside that range?
Does anyone have any good info backing them up on this?
Save the data for every run into its own row. Then later you can use and analyze the data however you like... ie, all you the other options you listed can be performed after the fact. It's not really possible for someone else to draw meaningful conclusions about how to average/analyze the data without knowing more about what's going on.
The fastest run is the one that most accurately times only your code.
All slower runs are slower because of noise introduced by the operating system scheduler.
The variance you experience is going to differ from machine to machine, and even on identical machines, the set of runnable processes will introduce noise.
None of the above. Bran is close though. You should save every measurment. But don't average them. The average (arithmetic mean) can be very misleading in this type of analysis. The reason is that some of your measurments will be much longer than the others. This will happen becuse things can interfere with your process - even on 'clean' test systems. It can also happen becuse your process may not be as deterministic as you might thing.
Some people think that simply taking more samples (running more iterations) and averaging the measurmetns will give them better data. It doesn't. The more you run, the more likelty it is that you will encounter a perturbing event, thus making the average overly high.
A better way to do this is to run as many measurments as you can (time permitting). 100 is not a bad number, but 30-ish can be enough.
Then, sort these by magnitude and graph them. Note that this is not a standard distribution. Compute compute some simple statistics: mean, median, min, max, lower quaertile, upper quartile.
Contrary to some guidance, do not 'throw away' outside vaulues or 'outliers'. These are often the most intersting measurments. For example, you may establish a nice baseline, then look for departures. Understanding these departures will help you fully understand how your process works, how the sytsem affecdts your process, and what can interfere with your process. It will often readily expose bugs.
Depends what kind of data you want. I'd say one line per run initially, then analyze the data, go from there. Maybe store a min/max/average of X runs if you want to consolidate it.
http://en.wikipedia.org/wiki/Sample_size
Bryan is right - you need to investigate more. if your code has that much variance even "most" of the time then you might have a lot of fluctuation in your test environment because of other processes, os paging or other factors. If not it seems that you have code paths doing wildly varying amount of work and coming up with a single number/run data to describe the performance of such a multi-modal system is not going to tell you much. So i'd say isolate your setup as much as possible, run at least 30 trials and get a feel for what your performance curve looks like. Once you have that, you can use that wikipedia page to come up with a number that will tell you how many trials you need to run per code-change to see if the performance has increased/decreased with some level of statistical significance.
While saying, "Save every run," is nice, it might not be practical in your case. However, I do think that storing only the average eliminates too much data. I like storing the average of ten runs, but instead of storing just the average, I'd also store the max and min values, so that I can get a feel for the spread of the data in addition to its center.
The max and min information in particular will tell you how often corner cases arise. Is the 1500ms case a one-in-1000 outlier? Or is it something that recurs on a regular basis?
(First of all, sorry for my english, it's not my first language)
I have a list of tasks/jobs, each task must start after a specific start time, needs to run for a certain time and has to be finished after a certain end time.
I can dynamically add and remove workers, so it is possible to execute 2 or more tasks at the same time if I have to. My Goal is to find a scheduling plan that executes each job successfully and uses the minimal amount of workers possible.
I'm currently using an EDF (http://en.wikipedia.org/wiki/Earliest_deadline_first_scheduling) Algorithm and recursively call the function with a higher Worker Limit if it can't schedule all jobs correctly, but I think this doesn't work right because I don't have a real way to measure when I can lower the ressource limit again.
Are there any Algorithms that work for my problem, or any other clever ideas?
Thanks for your help.
A scheduling problem can often be solved very effectively by formulating it either as mixed-integer program (MIP)
http://en.wikipedia.org/wiki/Mixed_integer_programming#Integer_unknowns
or expressing it using constraint programming (CP)
http://en.wikipedia.org/wiki/Constraint_programming
For either MIP or CP, you will find both free and commercial solvers that can address your problem.
In both of these approaches, you put your effort into stating the properties that the solution must have, and the hard work of applying an appropriate algorithm is left to a specialized solver.
In a GUI application I would like to show a progess dialog displaying how much time left for the task to accomplish, how may I get the remaining time before the task ends and count it down please? thanks
How to get the remaining time is something no-one but your application (or you) can know.
Assuming you have the code for this GUI application, to determine remaining time you simply need to know the total time a task takes and subtract the amount of time that passed since the start of the task.
When we copy files in windows, we get an expected time of completion. Is that time the best time or the worst time? Also are you assuming the environmental variables?
Raymond Chen had something to say about this...
If you implement such feature, let progress bar grow quicky to 90%. Then you can perform real job, no matter how long it will take. User experience will be much better than showing current progress ;-)
I can only guess, how the time is calculated. But many hours spent watching the copying window and seeing how the time estimate changes, here is my best estimate:
Windows is keeping a list of all the files to be copied
It is keeping track of the time and number of files already copied
The remaining time is calculated as :
Average time per file so far = time passed so far / files already copied
Estimated time needed for all files = avg time per file * number of files.
The calculation is repeated after a fixed time span has passed (maybe 5 seconds, maybe 30?)
It probably is a little more complex that I explained above, I suppose that the size of the file currently being copied, and the percentage that has been copied, goes into the calculation as well. That would explain why we see an estimate when only one file is being copied ;-).
So, in a direct answer to your question: It is neither the best nor the worst time, it is just a very weak estimate that is the less exact the more the file sizes differ from each other.
Or in other words: It was probably the fastest way (in terms of fast programming, as well as low cpu usage when running) that a programmer could think of that implemented a specified feature. I wouldn't be surprised if it was coded on a Friday afternoon...