I need to find a suitable method on which to base an optimization algorithm that does the following:
Say we have N tasks and M rooms, each of which contains a specific set of infrastructure/conditions.
Each task requires a room with suitable conditions for that task.
For example, to get task A done we need a water tap and gas piping, so we can only use rooms that contain both.
Also, each task has a predefined due date.
I hope I've explained it well enough.
So, I need to develop an algorithm that allocates the tasks to rooms and schedules them properly, so that all tasks are done in the minimum total time without exceeding their deadlines (and, if exceeding them is inevitable, gives the least bad answer).
What existing methods or algorithms can I build on and learn from?
I thought about 'Job Shop', but I wonder if there are other suitable algorithms that can handle problems like this.
This is not an algorithm but a Mixed Integer Programming model. I am not sure if this is what you are looking for.
Assumptions: only one job can execute in a room at a time. Jobs in different rooms can execute in parallel. Also, to keep things simple, I assume the problem is feasible (the model will detect infeasible problems, but it does not return a solution in that case).
So we introduce a number of decision variables:
assign(i,j) = 1 if task i is assigned to room j
0 otherwise
finish(i) = time job i is done processing
makespan = finishing time of the last job
With this we can formulate the MIP model:
The following data is used:
Length(i) = processing time of job i
M = a large enough constant (say the planning horizon)
DueDate(i) = time job i must be finished
Allowed(i,j) = Yes if job i can be executed in room j
Importantly, I assume jobs are ordered by due date.
The first constraint says: if job i runs in room j then it finishes just after the previous jobs running in that room. The second constraint is a bound: a job must finish before its due date. The third constraint says: each job must be assigned to exactly one room where it is allowed to execute. Finally, the makespan is the last finish time.
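Written out, the four constraints described above could look as follows. This is a reconstruction from the description and from the variable and data names defined earlier, not a copy of the original formulation; the index $i'$ and the restriction of the sums to Allowed pairs are assumptions.

```latex
\begin{align*}
\min\ & \mathit{makespan} \\
\text{s.t.}\ & \mathit{finish}_i \ge \sum_{i' \le i,\ \mathit{Allowed}(i',j)} \mathit{Length}_{i'}\,\mathit{assign}_{i',j} - M\,(1 - \mathit{assign}_{i,j}) && \forall (i,j) \text{ with } \mathit{Allowed}(i,j) \\
& \mathit{finish}_i \le \mathit{DueDate}_i && \forall i \\
& \sum_{j:\ \mathit{Allowed}(i,j)} \mathit{assign}_{i,j} = 1 && \forall i \\
& \mathit{makespan} \ge \mathit{finish}_i && \forall i \\
& \mathit{assign}_{i,j} \in \{0,1\}, \quad \mathit{finish}_i \ge 0
\end{align*}
```

The sum over $i' \le i$ is where the ordering by due date enters: only jobs that come earlier in that ordering can precede job $i$ in the same room.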
To test this, I generated some random data:
---- 37 SET use resource usage
resource1 resource2 resource3 resource4 resource5
task2 YES
task3 YES
task5 YES
task7 YES
task9 YES YES
task11 YES
task12 YES YES
task13 YES
task14 YES
task15 YES
task16 YES YES
task17 YES
task20 YES YES
task21 YES YES
task23 YES
task24 YES
task25 YES YES
task26 YES
task28 YES
---- 37 SET avail resource availability
resource1 resource2 resource3 resource4 resource5
room1 YES YES YES YES
room2 YES YES
room3 YES YES
room4 YES YES YES YES
room5 YES YES YES YES
The set Allowed is calculated from use(i,r) and avail(j,r) data:
---- 41 SET allowed task is allowed to be executed in room
room1 room2 room3 room4 room5
task1 YES YES YES YES YES
task2 YES YES YES YES
task3 YES YES YES YES
task4 YES YES YES YES YES
task5 YES YES YES YES
task6 YES YES YES YES YES
task7 YES YES
task8 YES YES YES YES YES
task9 YES
task10 YES YES YES YES YES
task11 YES YES YES YES
task12 YES
task13 YES YES
task14 YES YES
task15 YES YES YES YES
task16 YES YES YES
task17 YES YES
task18 YES YES YES YES YES
task19 YES YES YES YES YES
task20 YES
task21 YES
task22 YES YES YES YES YES
task23 YES YES
task24 YES YES YES YES
task25 YES YES
task26 YES YES YES YES
task27 YES YES YES YES YES
task28 YES YES YES YES
task29 YES YES YES YES YES
task30 YES YES YES YES YES
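A minimal sketch of how such an Allowed set can be derived, assuming a task may run in a room exactly when the room offers every resource the task uses. The toy data and the function name compute_allowed are hypothetical and not taken from the model above.

```python
# Hypothetical toy data; task, room and resource names are illustrative only.
use = {
    "task1": set(),                       # uses no special resource
    "task2": {"resource1"},
    "task3": {"resource1", "resource3"},
}
avail = {
    "room1": {"resource1", "resource2"},
    "room2": {"resource1", "resource3"},
    "room3": set(),
}


def compute_allowed(use, avail):
    """Return all (task, room) pairs where the room offers every resource the task uses."""
    return {
        (task, room)
        for task, needed in use.items()
        for room, offered in avail.items()
        if needed <= offered  # subset test on the resource sets
    }


allowed = compute_allowed(use, avail)
```

A task that uses no resource is allowed in every room, which matches the rows above that show YES in all five rooms.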
We also have random due dates and processing times:
---- 33 PARAMETER length job length
task1 2.335, task2 4.935, task3 4.066, task4 1.440, task5 4.979, task6 3.321, task7 1.666
task8 3.573, task9 2.377, task10 4.649, task11 4.600, task12 1.065, task13 2.475, task14 3.658
task15 3.374, task16 1.138, task17 4.367, task18 4.728, task19 3.032, task20 2.198, task21 2.986
task22 1.180, task23 4.095, task24 3.132, task25 3.987, task26 3.880, task27 3.526, task28 1.460
task29 4.885, task30 3.827
---- 33 PARAMETER due job due dates
task1 5.166, task2 5.333, task3 5.493, task4 5.540, task5 6.226, task6 8.105
task7 8.271, task8 8.556, task9 8.677, task10 8.922, task11 10.184, task12 11.711
task13 11.975, task14 12.814, task15 12.867, task16 14.023, task17 14.200, task18 15.820
task19 15.877, task20 16.156, task21 16.438, task22 16.885, task23 17.033, task24 17.813
task25 21.109, task26 21.713, task27 23.655, task28 23.977, task29 24.014, task30 24.507
When I run this model, I get as results:
---- 129 PARAMETER results
start length finish duedate
room1.task1 2.335 2.335 5.166
room1.task9 2.335 2.377 4.712 8.677
room1.task11 4.712 4.600 9.312 10.184
room1.task20 9.312 2.198 11.510 16.156
room1.task23 11.510 4.095 15.605 17.033
room1.task30 15.605 3.827 19.432 24.507
room2.task6 3.321 3.321 8.105
room2.task10 3.321 4.649 7.971 8.922
room2.task15 7.971 3.374 11.344 12.867
room2.task24 11.344 3.132 14.476 17.813
room2.task29 14.476 4.885 19.361 24.014
room3.task2 4.935 4.935 5.333
room3.task8 4.935 3.573 8.508 8.556
room3.task18 8.508 4.728 13.237 15.820
room3.task22 13.237 1.180 14.416 16.885
room3.task27 14.416 3.526 17.943 23.655
room3.task28 17.943 1.460 19.403 23.977
room4.task3 4.066 4.066 5.493
room4.task4 4.066 1.440 5.506 5.540
room4.task13 5.506 2.475 7.981 11.975
room4.task17 7.981 4.367 12.348 14.200
room4.task21 12.348 2.986 15.335 16.438
room4.task25 15.335 3.987 19.322 21.109
room5.task5 4.979 4.979 6.226
room5.task7 4.979 1.666 6.645 8.271
room5.task12 6.645 1.065 7.710 11.711
room5.task14 7.710 3.658 11.367 12.814
room5.task16 11.367 1.138 12.506 14.023
room5.task19 12.506 3.032 15.538 15.877
room5.task26 15.538 3.880 19.418 21.713
Detail: based on the assignment I recalculated the start and finish times. The model can allow some slack here and there as long as it does not interfere with the objective and the due dates. To get rid of any possible slacks, I just execute all jobs as early as possible. Just back-to-back execution of jobs in the same room using the job ordering (remember I sorted jobs according to due date).
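The back-to-back recalculation described above can be sketched as follows. The function name and the dictionary layout are assumptions; the three job lengths are the first three room1 entries from the results table.

```python
def schedule_back_to_back(assignment, length):
    """Compute (start, finish) per job by running each room's jobs without gaps.

    assignment maps room -> list of jobs, already sorted by due date.
    length maps job -> processing time.
    """
    times = {}
    for room, jobs in assignment.items():
        clock = 0.0  # every room starts at time zero
        for job in jobs:
            start = clock
            clock = start + length[job]
            times[job] = (start, clock)
    return times


length = {"task1": 2.335, "task9": 2.377, "task11": 4.600}
assignment = {"room1": ["task1", "task9", "task11"]}
times = schedule_back_to_back(assignment, length)
```

For this input task9 starts when task1 finishes, and task11 finishes at about 9.312, as in the table.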
This model with 30 jobs and 10 rooms took 20 seconds using Cplex. Gurobi was about the same.
Augmenting the model to handle infeasible models is not very difficult. Allow jobs to violate the due date but at a price. A penalty term needs to be added to the objective. The due date constraint is in the above example a hard constraint, and with this technique, we make it a soft constraint.
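One possible way to write this soft version is sketched below. The tardiness variable $\mathit{tardy}_i$ and the penalty weight $w$ are assumptions introduced for illustration; all other constraints stay unchanged.

```latex
\begin{align*}
\min\ & \mathit{makespan} + w \sum_i \mathit{tardy}_i \\
\text{s.t.}\ & \mathit{finish}_i - \mathit{tardy}_i \le \mathit{DueDate}_i && \forall i \\
& \mathit{tardy}_i \ge 0 && \forall i
\end{align*}
```

A large $w$ makes the solver violate due dates only when it has to. Once jobs can be late, the ordering by due date inside a room may no longer be optimal, so that assumption deserves a second look in the soft version.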
I used a small variant of Alex's OPL CP Optimizer model on the data and it finds the optimal solution (makespan=19.432) within a couple of seconds and proves optimality in about 5s on my laptop. I think a big advantage of a CP Optimizer model is that it would scale to much larger instances and easily produce good quality solutions even if, for large instances, proving optimality may be challenging of course.
Here is my version of the CP Optimizer model:
using CP;
int N = 30; // Number of tasks
int M = 5; // Number of rooms
int Length [1..N] = ...; // Task length
int DueDate[1..N] = ...; // Task due date
{int} Rooms[1..N] = ...; // Possible rooms for task
tuple Alloc { int job; int room; }
{Alloc} Allocs = {<i,r> | i in 1..N, r in Rooms[i]};
dvar interval task[i in 1..N] in 0..DueDate[i] size Length[i];
dvar interval alloc[a in Allocs] optional;
minimize max(i in 1..N) endOf(task[i]);
subject to {
forall(i in 1..N) { alternative(task[i], all(r in Rooms[i]) alloc[<i,r>]); }
forall(r in 1..M) { noOverlap(all(a in Allocs: a.room==r) alloc[a]); }
}
Note also that the MIP model exploits a problem-specific dominance rule: the tasks allocated to a particular room can be ordered by increasing due date. While this is perfectly true for this simple version of the problem, the assumption may no longer hold in the presence of additional constraints (for instance, a minimal start time for the tasks). The CP Optimizer formulation does not make this assumption.
Within CPLEX you can rely on MIP but you could also use CPOptimizer scheduling.
In OPL your model would look like
using CP;
int N = 30; // nbTasks
int M = 10; // rooms
range Tasks = 1..N;
range Rooms = 1..M;
int taskDuration[i in Tasks]=rand(20);
int dueDate[i in Tasks]=20+rand(20);
int possible[j in Tasks][m in Rooms] = (rand(10)>=8);
dvar interval itvs[j in Tasks][o in Rooms] optional in 0..100 size taskDuration[j] ;
dvar interval itvs_task[Tasks];
dvar sequence rooms[m in Rooms] in all(j in Tasks) itvs[j][m];
execute {
cp.param.FailLimit = 10000;
}
minimize max(j in Tasks) endOf(itvs_task[j]);
subject to {
  // alternative: each task runs in exactly one room
  forall(t in Tasks) alternative(itvs_task[t], all(m in Rooms) itvs[t][m]);
  // a room hosts at most one task at a time
  forall(m in Rooms)
    noOverlap(rooms[m]);
  // due dates
  forall(j in Tasks) endOf(itvs_task[j]) <= dueDate[j];
}
In Pyomo Erwin's MIP can be implemented like:
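from pyomo.environ import (Boolean, ConcreteModel, Constraint, NonNegativeReals,
                           Objective, Param, Set, Var, minimize)

# Excerpt from a class method: the self.* attributes are mappings prepared elsewhere.
# Creating a ConcreteModel here is an assumption; the original excerpt does not show it.
model = ConcreteModel()
################################################################################
# Sets
################################################################################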
model.I = Set(initialize=self.resource_usage.keys(), doc='jobs to run')
model.J = Set(initialize=self.resource_availability.keys(), doc='rooms')
model.ok = Set(initialize=self.ok.keys())
################################################################################
# Params put at model
################################################################################
model.length = Param(model.I, initialize=self.length)
model.due_date = Param(model.I, initialize=self.due_date)
################################################################################
# Var
################################################################################
model.x = Var(model.I, model.J, domain=Boolean, initialize=0, doc='job i is assigned to room j')
model.finish = Var(model.I, domain=NonNegativeReals, initialize=0, doc='finish time of job i')
model.makespan = Var(domain=NonNegativeReals, initialize=0)
################################################################################
# Constraints
################################################################################
M = 100  # big-M constant, large enough to cover the planning horizon
def all_jobs_assigned_c(model, i):
    # each job is assigned to exactly one room in which it is allowed
    return sum(model.x[ii, jj] for (ii, jj) in model.ok if ii == i) == 1
model.all_jobs_assigned_c = Constraint(model.I, rule=all_jobs_assigned_c)
def finish1_c(model, i, j):
    # if job i runs in room j, it finishes after the earlier jobs assigned to that room
    return sum(
        model.length[ii] * model.x[ii, jj] for (ii, jj) in model.ok if jj == j and ii <= i
    ) - M * (1 - model.x[i, j]) <= model.finish[i]
model.finish1_c = Constraint(model.I, model.J, rule=finish1_c)
model.finish2_c = Constraint(
    model.I, rule=lambda model, i: model.finish[i] <= model.due_date[i]
)
model.makespan_c = Constraint(
    model.I, rule=lambda model, i: model.makespan >= model.finish[i]
)
################################################################################
# Objective
################################################################################
def obj_profit(model):
    return model.makespan
model.objective = Objective(rule=obj_profit, sense=minimize)
Solving with CBC on 4 cores took about 2 minutes.
So for my summer task for A Level Computer Science before starting Year 12, I have been given a task where I have to convert a flowchart into pseudocode.
This is the task: https://www.leggott.ac.uk/wp-content/uploads/2019/07/SummerTask-JLC-Programming.pdf
So far I have got this:
// checking the patient's skin condition
skin_condition = input "Does skin appear normal? (Y/N)"
if skin_condition = N then
    output "check pulse, call doctor"
endif
respiratory_status = input "Is the patient breathing normally? (Y/N)"
// checking the patient's respiratory status
if respiratory_status = N then
    output "check for obstructions, call doctor"
endif
temperature = input "What is the patient's body temperature?"
// checking the patient's body temperature
if temperature < 95 then
    output "add additional blankets to warm patient"
endif
neurological_status = input "Can the patient move or respond? (Y/N)"
// checking the patient's neurological status
if neurological_status = N then
    output "check consciousness, call doctor"
endif
cardiovascular_status = input "Does the patient have a normal pulse rate? (Y/N)"
// checking the patient's cardiovascular status
if cardiovascular_status = N then
    output "check consciousness, call doctor"
endif
output "monitor patient every hour or as necessary"
I think you've got the general idea. Just a couple of points:
You never set skin_condition prior to using it.
You're asking for inputs before they're required in the flowchart (e.g. respiratory_status). Ask the question when you get to that point in the chart.
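To illustrate the second point, here is a small Python sketch in which each question is asked only at the moment its decision is reached. It mirrors the first two checks of the pseudocode in the question, not the flowchart itself, and the helper names ask_yes_no and assess are made up for this example.

```python
def ask_yes_no(prompt, ask=input):
    """Ask a Y/N question and return True for an answer starting with 'Y' or 'y'."""
    return ask(prompt + " (Y/N) ").strip().upper().startswith("Y")


def assess(ask=input, out=print):
    # Each question is asked right where its decision is made.
    if not ask_yes_no("Does skin appear normal?", ask):
        out("check pulse, call doctor")
    if not ask_yes_no("Is the patient breathing normally?", ask):
        out("check for obstructions, call doctor")
    out("monitor patient every hour or as necessary")
```

Calling assess() prompts on the console. Passing your own ask and out functions makes it possible to try the flow with scripted answers.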
I'm using StanfordNLP to do text classification. I have a training set with two labels: YES and NO. Both labels have roughly the same number of examples (~120K each).
The problem is that StanfordNLP is misclassifying some text, and I'm not able to identify why. How do I debug it?
My training file looks like this:
YES guarda-roupa/roupeiro 2 portas de correr
YES guarda-roupa/roupeiro 3 portas
YES guarda roupa , roupeiro 3 portas
YES guarda-roupa 4 portas
YES guarda roupa 6p mdf
YES guardaroupas 3 portas
YES jogo de quarto com guarda-roupa 2 portas + cômoda + berço
YES guarda roupa 4pts
NO base para guarda-sol
NO guarda-sol alumínio
NO guarda chuva transparente
NO coifa guarda po alavanca cambio
NO lancheira guarda do leao vermelha
NO hard boiled: queima roupa
NO roupa nova do imperador
NO suporte para passar roupa
The YES label identifies "guarda roupa" (wardrobe) and NO identifies things that aren't "guarda roupa" but share one or more words with it (such as "guarda chuva", umbrella, or "roupa", clothes).
I don't know why, but my model insists on classifying "guarda roupa" (and its variations such as "guardaroupa", "guarda-roupas", etc.) as NO...
How do I debug it? I have already double-checked my training file to see whether I mislabeled something and introduced an error, but I could not find anything...
Any advice is welcome.
UPDATE 1
I'm using the following properties to control feature creation:
useClassFeature=false
featureMinimumSupport=2
lowercase=true
1.useNGrams=false
1.usePrefixSuffixNGrams=false
1.splitWordsRegexp=\\s+
1.useSplitWordNGrams=true
1.minWordNGramLeng=2
1.maxWordNGramLeng=5
1.useAllSplitWordPairs=true
1.useAllSplitWordTriples=true
goldAnswerColumn=0
displayedColumn=1
intern=true
sigma=1
useQN=true
QNsize=10
tolerance=1e-4
UPDATE 2
Searching the API, I discovered that ColumnDataClassifier has a method getClassifier() that gives access to the underlying LinearClassifier, which has a dump() method. The dump produces output that looks like the excerpt below. From the API: "Print all features in the classifier and the weight that they assign to each class."
YES NO
1-SW#-guarda-roupa-roupeiro-2portas 0,01 -0,01
1-ASWT-guarda-roupa-roupeiro 0,19 -0,19
1-SW#-guarda-roupa-roupeiro 0,19 -0,19
If I call toString() on the LinearClassifier, it prints:
[-0.7, -0.7+0.1): 427.0 [(1-SW#-guarda-roupa-roupeiro-2portas,NO), ...]
[0.6, 0.6+0.1): 427.0 [(1-SW#-guarda-roupa-roupeiro-2portas,YES), ...]
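One way to make such a dump easier to inspect is to load it into a small script and rank the features that mention a given word. The sketch below assumes the layout shown in the excerpt above (a header row with the labels, then one whitespace-separated line per feature, with decimal commas). The function names parse_dump and top_features are hypothetical.

```python
def parse_dump(text):
    """Parse lines of the form '<feature> <weight per label>' that use decimal commas."""
    rows = [line.split() for line in text.strip().splitlines() if line.strip()]
    labels = rows[0]  # header row, e.g. ['YES', 'NO']
    weights = {}
    for parts in rows[1:]:
        values = [float(v.replace(",", ".")) for v in parts[-len(labels):]]
        feature = " ".join(parts[:-len(labels)])
        weights[feature] = dict(zip(labels, values))
    return weights


def top_features(weights, needle, label, n=10):
    """Return up to n features containing needle, highest weight for label first."""
    hits = [(feat, w[label]) for feat, w in weights.items() if needle in feat]
    return sorted(hits, key=lambda item: item[1], reverse=True)[:n]


dump = """
YES NO
1-SW#-guarda-roupa-roupeiro-2portas 0,01 -0,01
1-ASWT-guarda-roupa-roupeiro 0,19 -0,19
1-SW#-guarda-roupa-roupeiro 0,19 -0,19
"""
weights = parse_dump(dump)
ranked = top_features(weights, "guarda-roupa", "NO")
```

Running this on the full dump for a misclassified string shows which of its features carry the largest NO weights.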
I am working on a large-ish dataframe collection with some machine data in several tables. The goal is to add a column to every table which expresses the row's "class", considering its vicinity to a certain time stamp.
seconds = 1800
for i in range(len(tables)): # looping over 20 equally structured tables containing machine data
    table = tables[i]
    table['Class'] = 'no event'
    for event in events[i].values: # looping over 20 equally structured tables containing events
        event_time = event[1] # get integer time stamp
        start_time = event_time - seconds
        table.loc[(table.Time<=event_time) & (table.Time>=start_time), 'Class'] = 'event soon'
The event times and the entries in table.Time are integers. The point is to assign the class "event soon" to all rows within a specific time frame (the given number of seconds) before an event.
The code takes quite long to run, and I am not sure what is to blame or what can be fixed. The number of seconds does not have much impact on the runtime, so the part where the table is actually changed is probably working fine, and the problem may lie with the nested loops instead. However, I don't see how to get rid of them. Hopefully, there is a faster, more pandas-like way to add this class column.
I am working with Python 3.6 and Pandas 0.19.2
You can use numpy broadcasting to do this in a vectorised way instead of looping.
Dummy data generation
import numpy as np
import pandas as pd

num_tables = 5
seconds = 1800
def gen_table(count):
    for i in range(count):
        times = [(100 + j)**2 for j in range(i, 50 + i)]
        df = pd.DataFrame(data={'Time': times})
        yield df
def gen_events(count, num_tables):
    for i in range(num_tables):
        times = [1E4 + 100 * (i + j)**2 for j in range(count)]
        yield pd.DataFrame(data={'events': times})
tables = list(gen_table(num_tables)) # a list of 5 DataFrames of length 50
events = list(gen_events(5, num_tables)) # a list of 5 DataFrames of length 5
Comparison
For debugging, I added a dict of verification DataFrames. They are not needed, I just used them for debugging
verification = {}
for i, (table, event_df) in enumerate(zip(tables, events)):
    event_list = event_df['events']
    time_diff = event_list.values - table['Time'].values[:,np.newaxis] # This is where the magic happens
    events_close = np.any( (0 < time_diff) & (time_diff < seconds), axis=1)
    table['Class'] = np.where(events_close, 'event soon', 'no event')
    # The stuff after this line can be deleted since it's only used for the verification
    df = pd.DataFrame(data=time_diff, index=table['Time'], columns=event_list)
    df['event'] = np.any((0 < time_diff) & (time_diff < seconds), axis=1)
    verification[i] = df
newaxis
A good explanation on broadcasting is in Jakevdp's book
table['Time'].values[:,np.newaxis]
gives a (50,1) 2-d array
array([[10000],
[10201],
[10404],
....
[21609],
[21904],
[22201]], dtype=int64)
Verification
For the first step the verification df looks like this:
events 10000.0 10100.0 10400.0 10900.0 11600.0 event
Time
10000 0.0 100.0 400.0 900.0 1600.0 True
10201 -201.0 -101.0 199.0 699.0 1399.0 True
10404 -404.0 -304.0 -4.0 496.0 1196.0 True
10609 -609.0 -509.0 -209.0 291.0 991.0 True
10816 -816.0 -716.0 -416.0 84.0 784.0 True
11025 -1025.0 -925.0 -625.0 -125.0 575.0 True
11236 -1236.0 -1136.0 -836.0 -336.0 364.0 True
11449 -1449.0 -1349.0 -1049.0 -549.0 151.0 True
11664 -1664.0 -1564.0 -1264.0 -764.0 -64.0 False
11881 -1881.0 -1781.0 -1481.0 -981.0 -281.0 False
12100 -2100.0 -2000.0 -1700.0 -1200.0 -500.0 False
12321 -2321.0 -2221.0 -1921.0 -1421.0 -721.0 False
12544 -2544.0 -2444.0 -2144.0 -1644.0 -944.0 False
....
20449 -10449.0 -10349.0 -10049.0 -9549.0 -8849.0 False
20736 -10736.0 -10636.0 -10336.0 -9836.0 -9136.0 False
21025 -11025.0 -10925.0 -10625.0 -10125.0 -9425.0 False
21316 -11316.0 -11216.0 -10916.0 -10416.0 -9716.0 False
21609 -11609.0 -11509.0 -11209.0 -10709.0 -10009.0 False
21904 -11904.0 -11804.0 -11504.0 -11004.0 -10304.0 False
22201 -12201.0 -12101.0 -11801.0 -11301.0 -10601.0 False
Small optimizations of original answer.
You can shave a few lines and some assignments of the original algorithm
for table, event_df in zip(tables, events):
    table['Class'] = 'no event'
    for event_time in event_df['events']: # looping over the event times that belong to this table
        start_time = event_time - seconds
        table.loc[table['Time'].between(start_time, event_time), 'Class'] = 'event soon'
You might shave some more if instead of the text 'no event' and 'event soon' you would just use booleans
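A sketch of the boolean variant, combined with the broadcasting idea from the first answer. The function name event_soon_mask is hypothetical. The bounds are inclusive, as in the original loop from the question, whereas the broadcasting answer uses strict inequalities.

```python
import numpy as np


def event_soon_mask(times, event_times, seconds):
    """Return a boolean array: True where a row lies at most `seconds` before some event."""
    times = np.asarray(times)
    event_times = np.asarray(event_times)
    # diff[r, e] = time from row r until event e (negative if the event is already past)
    diff = event_times[np.newaxis, :] - times[:, np.newaxis]
    return np.any((diff >= 0) & (diff <= seconds), axis=1)


mask = event_soon_mask([0, 100, 2000, 5000], [1900, 9000], seconds=1800)
```

With a DataFrame this could be assigned directly, for example table['event_soon'] = event_soon_mask(table['Time'].values, event_df['events'].values, seconds), where the column name is only an illustration.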
Given a trained contextual bandit model, how can I retrieve a prediction vector on test samples?
For example, let's say I have a train set named "train.dat" containing lines formatted as below
1:-1:0.3 | a b c # <action:cost:probability | features>
2:2:0.3 | a d d
3:-1:0.3 | a b e
....
And I run the command below.
vw -d train.dat --cb 30 -f cb.model --save_resume
This produces a file, 'cb.model'. Now, let's say I have a test dataset as below
| a d d
| a b e
I'd like to see probabilities as below
0.2 0.7 0.1
The interpretation of these probabilities would be that action 1 should be picked 20% of the time, action 2 - 70%, and action 3 - 10% of the time.
Is there a way to get something like this?
When you use "--cb K", the prediction is the optimal arm/action based on argmax policy, which is a static policy.
When using "--cb_explore K", the prediction output contains the probability for each arm/action. Depending on the policy you pick, the probabilities are calculated differently.
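As an illustration of how an exploration policy turns a single best action into a probability vector, here is a sketch for an epsilon-greedy policy. It is a plain Python illustration of the idea, not Vowpal Wabbit's own code; the function name and the default epsilon are assumptions.

```python
def epsilon_greedy_probs(best_action, num_actions, epsilon=0.05):
    """Spread epsilon uniformly over all actions and give the rest to the best one.

    best_action is 1-based, like the action ids in the training file.
    The default epsilon is an illustrative value.
    """
    probs = [epsilon / num_actions] * num_actions
    probs[best_action - 1] += 1.0 - epsilon
    return probs


probs = epsilon_greedy_probs(best_action=2, num_actions=3, epsilon=0.3)
```

With these inputs the result is roughly 0.1, 0.8, 0.1: action 2 is the greedy choice, and the other two actions are each explored about 10% of the time.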
If you send those lines to a daemon running your model, you'd get just that. You send a context, and the reply is a probability distribution across the number of allowed actions, presumably comprising the "recommendation" provided by the model.
Say you have 3 actions, like in your example. Start a contextual bandits daemon:
vowpalwabbit/vw -d train.dat --cb_explore 3 -t --daemon --quiet --port 26542
Then send a context to it:
| a d d
You'll get just what you want as the reply.
In the Workspace class, initialize the object and then call the method predict(prediction_type: int). Below are the corresponding parameter values:
class PredictionType(IntEnum):
    SCALAR = pylibvw.vw.pSCALAR
    SCALARS = pylibvw.vw.pSCALARS
    ACTION_SCORES = pylibvw.vw.pACTION_SCORES
    ACTION_PROBS = pylibvw.vw.pACTION_PROBS
    MULTICLASS = pylibvw.vw.pMULTICLASS
    MULTILABELS = pylibvw.vw.pMULTILABELS
    PROB = pylibvw.vw.pPROB
    MULTICLASSPROBS = pylibvw.vw.pMULTICLASSPROBS
    DECISION_SCORES = pylibvw.vw.pDECISION_SCORES
    ACTION_PDF_VALUE = pylibvw.vw.pACTION_PDF_VALUE
    PDF = pylibvw.vw.pPDF
    ACTIVE_MULTICLASS = pylibvw.vw.pACTIVE_MULTICLASS
    NOPRED = pylibvw.vw.pNOPRED
I have a three-asset portfolio. I need to set the target return for my second asset.
Whenever I try, I get this error:
asset.ts <- as.timeSeries(asset.ret)
spec <- portfolioSpec()
setSolver(spec) <- "solveRshortExact"
constraints <- c("Short")
setTargetReturn(Spec) = mean(colMeans(asset.ts[,2]))
efficientPortfolio(asset.ts, spec, constraints)
Error: is.numeric(targetReturn) is not TRUE
Title:
MV Efficient Portfolio
Estimator: covEstimator
Solver: solveRquadprog
Optimize: minRisk
Constraints: Short
Portfolio Weights:
MSFT AAPL NORD
0 0 0
Covariance Risk Budgets:
MSFT AAPL NORD
Target Return and Risks:
mean mu Cov Sigma CVaR VaR
0 0 0 0 0 0
Description:
Sat Apr 19 15:03:24 2014 by user: Usuario
I have tried and I have searched the web, but I have no idea how to set the target return
to a specific expected return from the data set. I could copy in the mean of my second asset, but I think rounding the decimals could affect the answer.
I ran into this error when using 2 assets.
It appears to be a bug in the PortOpt methods.
When there are 2 assets, it runs .mvSolveTwoAssets,
which looks for the targetReturn in the portfolio spec.
But as you know, a targetReturn isn't always needed.
In your code, however, you have 2 separate variables for the spec:
'spec' and 'Spec'.
Assuming 'Spec' is a typo, this line needs to be changed to use 'spec':
setTargetReturn(spec) <- mean(colMeans(asset.ts[,2]))