I have a series of spreadsheets that manage a golf league. When the scorer has entered scores in a Scores Book, a script posts the scores by marking the sheet as available. Another book imports the available scores and attempts to produce a PDF. From time to time, the script does not finish before it runs out of execution time, and when an execution transcript is available, it shows extensive amounts of time being spent on what appear to be trivial steps. Here is an example:
[17-08-03 08:20:15:893 PDT] SpreadsheetApp.openById([1T6-EVpw0GH_oP-kjuJUnezhzgLOQPGU-QgTkuBD8Nog]) [0.15 seconds]
[17-08-03 08:20:15:944 PDT] Spreadsheet.getName() [0.051 seconds]
[17-08-03 08:20:15:945 PDT] Logger.log([System book is: Gaffers - System (2017), []]) [0 seconds]
[17-08-03 08:20:15:946 PDT] Spreadsheet.getSheetByName([Week Results]) [0 seconds]
[17-08-03 08:21:27:748 PDT] Sheet.hideRows([2, 4]) [71.801 seconds]
[17-08-03 08:21:27:749 PDT] Sheet.clearNotes() [0 seconds]
[17-08-03 08:22:57:737 PDT] Sheet.getRange([A3]) [89.987 seconds]
[17-08-03 08:22:57:807 PDT] Executi
When the problem occurs and a transcript is available, the point at which the excessive times appear is not consistent.
Can anyone advise what might be causing these kinds of delays?
FYI: My structure is set up this way because if the poor scorer tries to enter hole-by-hole scores for multiple players on a sheet with all the calculations in it, there is a substantial delay before keyboard entries seem to be accepted. When it was originally built in Excel, calculations could be turned off to avoid that issue.
Should I set shuffle=True in sklearn.model_selection.KFold?
I'm in this situation where I'm trying to evaluate the cross_val_score of my model on a given dataset.
If I write
cross_val_score(estimator=model, X=X, y=y, cv=KFold(shuffle=False), scoring='r2')
I get back:
array([0.39577543, 0.38461982, 0.15859382, 0.3412703 , 0.47607428])
Instead, by setting
cross_val_score(estimator=model, X=X, y=y, cv=KFold(shuffle=True), scoring='r2')
I obtain:
array([0.49701477, 0.53682238, 0.56207702, 0.56805794, 0.61073587])
So, in light of this, I want to understand whether setting shuffle=True in KFold may lead to over-optimistic cross-validation scores.
Reading the documentation, it says that shuffling happens only once, at the beginning, before the data is split into K folds; training then proceeds on K-1 folds and testing on the one left out, repeating for each fold without re-shuffling. So, according to this, one shouldn't worry too much. Of course, if the shuffle occurred at each iteration of the cross-validation, one would end up estimating generalization error on points that had already been seen during training, which would be a serious mistake, but is this the case?
How can I interpret the fact that in this case I get slightly better values when shuffle is True?
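As a sanity check on my side (this snippet is just my own sketch, not something from the scikit-learn docs), the folds produced with shuffle=True are still disjoint, so shuffling by itself cannot leak training points into the test folds:
import numpy as np
from sklearn.model_selection import KFold
X = np.arange(20).reshape(-1, 1)
kf = KFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in kf.split(X):
    # within each split, the train and test indices never overlap
    assert len(np.intersect1d(train_idx, test_idx)) == 0
    print(sorted(test_idx))
# the five printed test sets partition 0..19 exactly once, only in shuffled
# groups, so shuffle=True merely changes which samples end up in the same fold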
I'm working on a project where I need to solve thousands of small to large "simple" instances of a knapsack-like problem. All my instances have the same structure and the same constraints but vary in the number of items (and thus variables). The domains can be fixed for all the instances.
I've successfully integrated MiniZinc into a workflow which
extracts the relevant data from a database,
generates the .mzn model (including the initial variable assignments),
calls the MiniZinc driver for the CBC solver, and
parses and interprets the solution.
I've now hit a performance bottleneck in the flattening phase for large instances (cf. the verbose flattening stats below): flattening takes up to 30 seconds, while solving to optimality requires less than 1 s.
I've tried disabling the "optimized flattening" to no avail; I've also tried splitting the toolchain (mzn2fzn then mzn-cbc) and, finally, separating the model and data definitions, but didn't observe any significant improvement.
If it helps, I've identified the set of constraints causing the problem:
...
array[1..num_items] of int: item_label= [1,12,81,12, [10 000 more ints between 1..82], 53 ];
int: num_label = 82;
array[1..num_label] of var 0..num_items: cluster_distribution;
constraint forall(i in 1..num_label)(cluster_distribution[i]==sum(j in 1..num_items)(item_indicator[j] /\ item_label[j]==i));
var 1..num_label: non_zero_label;
constraint non_zero_label = sum(i in 1..num_label)(cluster_distribution[i] > 0);
....
Basically, each of the 5000 items has a label (out of ~100 possible labels), and I try (amongst other objectives) to maximise the number of different labels (the non_zero_label var). I've tried various formulations but this one seems to be the most efficient.
Could anyone provide me with some guidance on how to accelerate this flattening phase? How would you approach the task of solving thousands of "similar" instances?
Would it be beneficial to directly generate the MPS file and then call CBC natively? I expect MiniZinc to be quite efficient at compiling MPS files, but maybe I could get a speedup by exploiting the repeated structure of the instances? However, I fear this would more or less boil down to re-coding a poorly written, half-baked, pseudo-custom version of MiniZinc, which doesn't feel right.
Thanks!
My system info: OS X and Ubuntu, mzn 2.1.7.
Compiling instance-2905-gd.mzn
MiniZinc to FlatZinc converter, version 2.1.7
Copyright (C) 2014-2018 Monash University, NICTA, Data61
Parsing file(s) 'instance-2905-gd.mzn' ...
processing file '../share/minizinc/std/stdlib.mzn'
processing file '../share/minizinc/std/builtins.mzn'
processing file '../share/minizinc/std/redefinitions-2.1.1.mzn'
processing file '../share/minizinc/std/redefinitions-2.1.mzn'
processing file '../share/minizinc/linear/redefinitions-2.0.2.mzn'
processing file '../share/minizinc/linear/redefinitions-2.0.mzn'
processing file '../share/minizinc/linear/redefinitions.mzn'
processing file '../share/minizinc/std/nosets.mzn'
processing file '../share/minizinc/linear/redefs_lin_halfreifs.mzn'
processing file '../share/minizinc/linear/redefs_lin_reifs.mzn'
processing file '../share/minizinc/linear/domain_encodings.mzn'
processing file '../share/minizinc/linear/redefs_bool_reifs.mzn'
processing file '../share/minizinc/linear/options.mzn'
processing file '../share/minizinc/std/flatzinc_builtins.mzn'
processing file 'instance-2905-gd.mzn'
done parsing (70 ms)
Typechecking ... done (13 ms)
Flattening ... done (16504 ms), max stack depth 14
MIP domains ...82 POSTs [ 82,0,0,0,0,0,0,0,0,0, ], LINEQ [ 0,0,0,0,0,0,0,0,82, ], 82 / 82 vars, 82 cliques, 2 / 2 / 2 NSubIntv m/a/m, 0 / 127.085 / 20322 SubIntvSize m/a/m, 0 clq eq_encoded ... done (28 ms)
Optimizing ... done (8 ms)
Converting to old FlatZinc ... done (37 ms)
Generated FlatZinc statistics:
Variables: 21258 int, 20928 float
Constraints: 416 int, 20929 float
This is a minimization problem.
Printing FlatZinc to '/var/folders/99/0zvzbfcj3h16g04d07w38wrw0000gn/T/MiniZinc IDE (bundled)-RzF4wk/instance-2905-gd.fzn' ... done (316 ms)
Printing .ozn to '/var/folders/99/0zvzbfcj3h16g04d07w38wrw0000gn/T/MiniZinc IDE (bundled)-RzF4wk/instance-2905-gd.ozn' ... done (111 ms)
Maximum memory 318 Mbytes.
Flattening done, 17.09 s
If you are solving many instantiations of the same problem class and you are running into large flattening times, there are two general approaches you can take: either you optimise the MiniZinc model to minimise flattening time, or you implement the model directly in a solver API.
The first option is best if you want to keep generality between solvers. To reduce the flattening time, the main thing you would like to eliminate is "temporary" variables: variables that are created and then thrown away. A lot of the flattening time goes into resolving variables that are not actually necessary; this is done to minimise the search space while solving. Temporary variables might be generated in comprehensions, loops, let-expressions, and reifications. For your specific model, Gleb posted an optimisation for the loop in your constraint:
constraint forall(i in 1..num_label)(cluster_distribution[i]==sum(j in 1..num_items where item_label[j]==i)(item_indicator[j]));
The other option might be best if you want to integrate your model into a software product, as you can offer direct interaction with a solver. You will have to "flatten" your model/data manually, but you can do it in a more limited way, specific only to your purpose. Because it is not general purpose, it can be very quick. Hints for the model can be found by looking at the generated FlatZinc for different instances.
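As a rough illustration of that second route, here is a sketch in Python using PuLP with its bundled CBC (the library choice, the variable names, the weights and the single capacity constraint are placeholders of mine, not taken from your model); building the MIP directly skips the FlatZinc round trip entirely:
from pulp import LpProblem, LpVariable, LpMaximize, lpSum, PULP_CBC_CMD

def solve_instance(item_label, num_label, weight, capacity):
    n = len(item_label)
    prob = LpProblem("label_coverage", LpMaximize)
    # pick[j] = 1 if item j is selected (plays the role of item_indicator)
    pick = [LpVariable("pick_%d" % j, cat="Binary") for j in range(n)]
    # used[i] = 1 if at least one selected item carries label i+1
    used = [LpVariable("used_%d" % i, cat="Binary") for i in range(num_label)]
    # placeholder knapsack-style side constraint
    prob += lpSum(weight[j] * pick[j] for j in range(n)) <= capacity
    # used[i] can only be 1 if some item with that label is picked;
    # because we maximise sum(used), this one-sided link is sufficient
    for i in range(num_label):
        prob += used[i] <= lpSum(pick[j] for j in range(n) if item_label[j] == i + 1)
    prob += lpSum(used)  # objective: number of distinct labels covered
    prob.solve(PULP_CBC_CMD(msg=False))
    return [int(v.value()) for v in pick]
Because all instances share the same structure, the constraint-building code is written once and only the data changes per call, which is essentially the limited, purpose-specific "flattening" described above.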
There is a Bear. The Bear either sleeps in its cave or hunts in the forest. If the Bear is hungry then it does not sleep. If the Bear is tired then it does not hunt.
Question A
Formulate the story, above, in FOPC, using your predicates and/or objects.
Attempt A
SleepsInCave(Bear) v HuntsInForest(Bear)
Hungry(Bear) -> ~SleepsInCave(Bear)
Tired(Bear) -> ~HuntsInForest(Bear)
Question B
Convert your FOPC into conjunctive normal form
Attempt B
Not sure how to convert to CNF because I was unable to complete part A!
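My only lead: each implication p -> q can be rewritten as ~p v q, so if Attempt A were accepted, the CNF would presumably be something like the following sketch (in LaTeX):
% One possible CNF, assuming the three Attempt A formulas are kept as-is:
% each implication becomes a disjunction, giving a conjunction of three clauses.
\begin{align*}
          & SleepsInCave(Bear) \lor HuntsInForest(Bear)       \\
  \land\; & \lnot Hungry(Bear) \lor \lnot SleepsInCave(Bear)  \\
  \land\; & \lnot Tired(Bear)  \lor \lnot HuntsInForest(Bear)
\end{align*}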
I have a large-scale multi-objective optimization problem to solve with the fmincon solver of MATLAB. I tried different solvers to get better and faster output. Here is the challenge:
With the active-set algorithm I am getting exit flags 1, 0, 4 and 5 for different Pareto points (as it is a multi-objective optimization problem). I then tried other algorithms, interior-point and sqp, for generating the Pareto points. I observed that sqp returns a few exit flag 1, some exit flag 2 and a few exit flag 0, but never 4 or 5. I should also note that its 0- and 2-flagged solutions are correct answers. However, whenever it returns any exit flag other than 1, it takes a long time to solve that Pareto point.
As the interior-point algorithm is designed for large-scale problems, it is much faster than sqp at generating the Pareto solutions. However, it only returns solutions with exit flag 0, and unfortunately its 0-flagged solutions are wrong, unlike sqp, whose 0- and 2-flagged solutions are correct.
0) Is there any way to configure fmincon to solve my problem with interior-point and also get correct solutions? In the literature I have seen problems similar to mine solved with the interior-point algorithm.
1) Are there any settings (TolX, TolCon, ...) that I can use to get exit flag 1 more often?
2) Is there any setting that speeds up the optimization process at the cost of lower accuracy?
3) For 2 Pareto points I am getting exit flag -2, which means the problem is infeasible for them. That is expected from the nature of the problem, but it takes ages for fmincon to determine exit flag -2. Is there any option I can set that satisfies (1) and (2) and also abandons these infeasible points faster?
I couldn't do this, because I can only set the options once and all Pareto points have to use the same options.
To describe the problem I should say:
I have about 300 equality and inequality constraints, both linear and nonlinear (.^2, sin, ...), and about 400 optimization variables. All objective functions of this multi-objective optimization problem are linear.
These are the options that I currently use; please help me modify them:
options = optimset('Algorithm', 'sqp', 'Display', 'off');
options = optimset('Algorithm', 'sqp', 'Display', 'off', 'TolX',1e-6,...
'TolFun',1e-6,'MaxIter',1e2, 'MaxFunEvals', 1e4);
The first option takes about 500 sec to generate 15 Pareto points, meaning each fmincon optimization spends about 33 sec.
The second option takes 200 sec, which is about 13 sec for each fmincon optimization.
Your help will be highly appreciated.
Recently, I implemented parallelisation in my MATLAB program, following the suggestions offered in Slow xlsread in MATLAB. However, implementing the parallelism has surfaced another problem: non-linearly increasing processing time with increasing scale.
The culprit seems to be the java.util.concurrent.LinkedBlockingQueue method, as can be seen from the attached profiler images and the corresponding condensed graphs.
Problem: How do I remove this non-linearity? My work involves processing more than 1000 sheets in a single run, which would otherwise take an insanely long time.
Note: The parallelised part of the program involves just reading all the .xls files and storing them in matrices, after which I start the remainder of my program. dlmwrite is used towards the end of the program and optimizing its time is not really required, although suggestions are welcome.
Culprit (profiler screenshot, dominated by java.util.concurrent.LinkedBlockingQueue):
Code being parallelised:
parfor i = 1:runs
sin = 'Sheet';
sno = num2str(i);
sna = strcat(sin, sno);
data(i, :, :) = xlsread('Processes.xls', sna, '' , 'basic');
end
Doing parallel I/O is likely to be a problem (it could in fact be slower), unless perhaps you keep everything on an SSD. If you are always reading the same file and it's not enormous, you may want to try reading it before your loop and doing just the data manipulation in parallel.