Managing time limit in Deep Q-learning

I'm trying to implement a Deep RL program in Python, where the agent has to solve a problem (approach a target) before a time limit expires.
What is the best way to manage the time? Is it a good idea to pass the remaining time as an input to the neural network?
I tried that (remaining time as one of the entries describing the state of the environment), but the algorithm does not converge.
Any ideas or tips?
Thanks a lot!

Assuming you are trying to implement deep Q-learning, I think it's better to subtract the time remaining from the reward, like:
Q_target = (reward - time_remaining) + gamma * max_a' Q(s', a')
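
A minimal sketch of that target computation in plain Python. The function name, the argument names and the `done` flag are assumptions made for illustration; they are not part of the original answer.

```python
def q_target(reward, time_remaining, gamma, next_q_values, done=False):
    """Target value following the formula above.

    `done` is an assumption added for this sketch: when the episode has
    ended (target reached or time expired) there is no next state, so the
    bootstrap term gamma * max Q(s', a') is dropped.
    """
    shaped_reward = reward - time_remaining
    if done:
        return shaped_reward
    # next_q_values holds Q(s', a') for every action a' in the next state.
    return shaped_reward + gamma * max(next_q_values)
```

For example, `q_target(1.0, 0.5, 0.9, [1.0, 2.0])` combines the shaped reward 0.5 with the discounted best next-state value 0.9 * 2.0.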

Related

What is the difference between Genetic Algorithm and Iterated Local Search Algorithm?

I'm trying to use either a Genetic Algorithm or an Iterated Local Search algorithm to get an optimal solution to a problem. Can someone please explain the basic difference between these two algorithms, and whether there are situations where one of them is better than the other?

Let me start with the second question. I believe there is no way to determine the better algorithm for a given problem without trials and tests. The behavior of an algorithm depends heavily on the problem's properties. If we are talking about complex problems with hundreds or thousands of variables, it is just too difficult to predict anything. I am leaving aside engineering intuition, deep understanding of the problem, previous experience and so on, since they are not really measurable.
The main difference between global and local search is quite straightforward: local search considers just one or a few possible solutions at any point in time and tries to improve them with some modifications. Thus, in each iteration it considers just a small portion of the search space (the local neighbourhood). Global search tries to take into account the whole problem with all its parameters at the same time. For example, PSO samples a large number of candidates and tries to move all of them towards the global optimum using a simple update formula.
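
To make the local-search half of that description concrete, here is a small sketch that keeps a single candidate and only ever looks at its immediate neighbours. The toy cost function and the neighbourhood (x - 1 and x + 1 on the integers) are assumptions chosen purely for illustration.

```python
def local_search(cost, start, neighbours, max_iters=1000):
    """Keep one candidate and move to its best neighbour until none improves."""
    current = start
    for _ in range(max_iters):
        best = min(neighbours(current), key=cost)
        if cost(best) >= cost(current):
            return current  # no neighbour is better: a local optimum
        current = best
    return current


def cost(x):
    # Hypothetical toy objective with its minimum at x = 7.
    return (x - 7) ** 2


def step_neighbours(x):
    # The local neighbourhood: only the two adjacent integers.
    return [x - 1, x + 1]


result = local_search(cost, 0, step_neighbours)
```

On a cost function with several valleys this procedure stops in whichever valley it starts in, which is why iterated local search restarts it from perturbed solutions and why population-based methods keep many candidates at once.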

Measure expected time to execute any function

Often in Machine Learning, training consumes a lot of time, and although this is measurable, it can only be measured after training ends.
Is there some method that can be used to estimate the time it might take to complete the training (or, more generally, any function), something like a before_call?
Sure, it depends on the machine and even more on the inputs, but I am thinking of an approximation based on all the IO the algorithm will perform, measured on simple inputs and then scaled to the size of the actual inputs. Something like this?
PS - JS, Ruby or any other OO language
PPS - I see that in Oracle there is a way, described here. That is cool. How is it done?

Let Ci be the complexity of the i'th learning step. Let Pi be the probability that the thing to be learned will be learned at or before the i'th step. Let k be the step where Pk > 0.5.
In this case the complexity, C is
C = sum(Ci, i=1..k)
The problem is that k is difficult to find. In this case it is a good idea to keep a stored set of previously learned similar patterns and compute the median of their step counts, which serves as an estimate of k. If the set is large enough, it will be pretty accurate.
Pi = the number of instances when things were learned by step i / total number of instances
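
A minimal sketch of that estimate, assuming the stored history is simply a list with the number of steps each previous run needed. The function name and the sample history are hypothetical.

```python
def estimate_k(steps_needed):
    """Return (k, P_k): the first step i at which the empirical P_i exceeds 0.5."""
    if not steps_needed:
        raise ValueError("need at least one recorded run")
    total = len(steps_needed)
    for i in range(1, max(steps_needed) + 1):
        # P_i = runs that had learned by step i / total number of runs
        p_i = sum(1 for s in steps_needed if s <= i) / total
        if p_i > 0.5:
            return i, p_i
    # Unreachable for a non-empty history: P_i is 1.0 at i = max(steps_needed).
    raise ValueError("no step with P_i > 0.5")


# Hypothetical history: five earlier runs and the step at which each one learned.
k, p_k = estimate_k([3, 5, 5, 8, 10])
```

With this history, three of the five runs had learned by step 5, so that is the first step at which P_i exceeds 0.5.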

If you have not set a limit on time or number of steps (in which case the estimate is trivial), there is no way to estimate the required time in general.
For example, neural network training is basically a problem of global high-dimensional optimization. You are trying to find a set of parameters that minimizes a given loss function. This problem is NP-hard in general and very difficult to solve exactly. A common approach is to randomly change some parameters by a small value in the hope that it will improve overall performance. It works great in practice, but the required runtime can vary greatly from problem to problem. I would recommend reading about NP-completeness, stochastic gradient descent and optimisation in general.

Algorithm to find resource-free slot

I am designing a project management app for a factory. The app is expected to produce draft project plans. To schedule a task, the app should check three conditions:
task dependency - do not start before,
machine availability, and
shift work hours
I keep track of machine engagement in machine_allocations table:
machine_allocations
+------------+--------------+-----------------+---------------+
| machine_id | operation_id | start_timestamp | end_timestamp |
+------------+--------------+-----------------+---------------+
Shift hours follow a pattern.
Now, to find the earliest possible date-time for an operation I am thinking of a function:
function earliest_slot($machine_id, $for_duration, $no_sooner_than) {
// pseudo code
1. get records for the machine in question for after $no_sooner_than
2. put start and end timestamps into $unavailable array
3. add non-working times as new elements to the array
4. in a loop find timeslots which are not in the array
5. if a timeslot is found which is equal to or bigger than $for_duration, return that
}
My question is, is this a good approach? Are there simpler ways to do this?

Finding the earliest date-time for one operation at a time may not give you the best result. Consider the example where operation A uses machine 1 for a long time, operation B uses machine 1 for a short time and operation C uses machine 2 for a short time, but operation C must be done after B.
In this case, it is better to schedule B before A on machine 1, but your approach would not achieve this. Of course, writing and using software to manage this would be more difficult than what you have suggested, so you need to decide whether the benefit is worth the extra effort.
Have a look at Scheduling, Job Shop Scheduling and Scheduling algorithm.
First you need to think about what sort of information you can collect about tasks (such as dependencies, priorities, deadlines) and then decide how best to put it together.
You may find that an approach like you propose is good enough in your case. My addition to your proposed algorithm would be to sort the list of existing machine operations to make searching through them faster, that is you can stop as soon as you find a time where your operation fits because it's guaranteed to be the earliest time.
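
A minimal sketch of the proposed function with the sorting suggestion applied. It assumes times are plain numbers (for example Unix timestamps), that the caller has already added non-working shift hours to the list of busy intervals, and that intervals are (start, end) pairs; the names are illustrative and not taken from the app.

```python
def earliest_slot(busy, duration, no_sooner_than):
    """Earliest start >= no_sooner_than with `duration` free time on one machine.

    busy: iterable of (start, end) pairs covering machine allocations and,
    by assumption, non-working hours. Intervals may overlap.
    """
    candidate = no_sooner_than
    for start, end in sorted(busy):  # sorted by start time
        if end <= candidate:
            continue  # interval lies entirely before the candidate start
        if start - candidate >= duration:
            return candidate  # the gap before this interval is large enough
        candidate = max(candidate, end)  # otherwise try right after it
    return candidate  # free from here on


allocations = [(8, 10), (11, 15)]
```

Because the intervals are visited in start order, the first gap that fits is guaranteed to be the earliest one, so the loop can return immediately. With the sample allocations, `earliest_slot(allocations, 2, 7)` skips both gaps of length 1 and lands after the last interval.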
A relatively simple extension would be a priority system that allows you to bump lower-priority tasks forward (which may require the adjustment of their dependencies as well), but more complicated algorithms would consider multiple tasks at once and try to optimise the outcome. In the end it comes down to what's appropriate for your specific problem.

That depends on when you want to plan the work. If it is before the machines start working, then maybe a Branch & Bound algorithm or something like it (maybe dynamic programming). If the work has to be planned while the machines are running and you cannot tell in advance which jobs will be performed, then you cannot count on an optimal solution (well, I can't think of one). Maybe put the next job on the machine with the smallest maximum time? Maybe a dynamic version of the Bellman-Ford algorithm (if you have a couple of layers of production). Hard to say.
I would try a couple of approaches and determine which is best. Then you can write an article about it :)

Scheduling with variable Resources

(First of all, sorry for my English, it's not my first language.)
I have a list of tasks/jobs. Each task must start after a specific start time, needs to run for a certain time and has to be finished before a certain end time.
I can dynamically add and remove workers, so it is possible to execute 2 or more tasks at the same time if I have to. My goal is to find a scheduling plan that executes each job successfully and uses the minimal number of workers possible.
I'm currently using an EDF (http://en.wikipedia.org/wiki/Earliest_deadline_first_scheduling) algorithm and recursively call the function with a higher worker limit if it can't schedule all jobs correctly, but I think this doesn't work right because I don't have a real way to determine when I can lower the resource limit again.
Are there any Algorithms that work for my problem, or any other clever ideas?
Thanks for your help.

A scheduling problem can often be solved very effectively by formulating it either as a mixed-integer program (MIP)
http://en.wikipedia.org/wiki/Mixed_integer_programming#Integer_unknowns
or expressing it using constraint programming (CP)
http://en.wikipedia.org/wiki/Constraint_programming
For either MIP or CP, you will find both free and commercial solvers that can address your problem.
In both of these approaches, you put your effort into stating the properties that the solution must have, and the hard work of applying an appropriate algorithm is left to a specialized solver.
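
As an illustration of what "stating the properties" could look like for the worker-minimisation question, here is one possible time-indexed MIP formulation. It is a sketch under assumptions not stated in the question: time is discretised into unit slots, jobs are not preempted, and the symbols r_j (earliest start), p_j (duration), d_j (deadline), x_{j,t} (1 if job j starts in slot t) and W (number of workers) are introduced here for illustration.

```latex
\begin{aligned}
\min\quad & W \\
\text{s.t.}\quad
  & \sum_{t=r_j}^{d_j-p_j} x_{j,t} = 1
      && \forall j
      && \text{(each job starts exactly once, inside its window)} \\
  & \sum_{j}\;\sum_{t=\max(r_j,\,\tau-p_j+1)}^{\min(\tau,\,d_j-p_j)} x_{j,t} \le W
      && \forall \tau
      && \text{(jobs running in slot } \tau \text{ never exceed } W\text{)} \\
  & x_{j,t}\in\{0,1\},\quad W\in\mathbb{Z}_{\ge 0}
\end{aligned}
```

A job started in slot t occupies slots t through t + p_j - 1, which is why the inner sum in the second constraint runs over the start times that keep job j busy during slot tau.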

Resource scheduling problem

I'm developing a motorcycle hire website. My problem is how to assign guests to motorcycles in an efficient way. I know how to do it in a "silly" way, but I want to know if there is a classical algorithm that solves this kind of problem. It's the same problem as assigning guests to rooms in a hotel. In that example, the goal is to achieve maximum occupancy by never rejecting a reservation due to inefficient scheduling.
I'm pretty sure that this problem has to be a classic problem that has a known solution.
Thanks a lot.

What you're interested in is called Interval Scheduling. Assuming all reservations have the same weight (none are favored over any other), you'd want a greedy algorithm.
Here (pdf) are some good slides about the topic.
Basically, you want to schedule the earliest-ending reservations first.
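
A minimal sketch of that greedy rule for a single resource (one motorcycle or one room), assuming reservations are (start, end) pairs and that a reservation may begin exactly when the previous one ends. The function name and sample data are hypothetical.

```python
def max_compatible(reservations):
    """Greedy interval scheduling: take the earliest-ending reservation that fits."""
    chosen = []
    last_end = float("-inf")
    for start, end in sorted(reservations, key=lambda r: r[1]):
        if start >= last_end:  # does not overlap the last accepted reservation
            chosen.append((start, end))
            last_end = end
    return chosen


# Hypothetical requests for one motorcycle, as (pickup, return) times.
requests = [(1, 4), (3, 5), (0, 6), (5, 7), (6, 10), (8, 11)]
accepted = max_compatible(requests)
```

This selects the largest possible set of non-overlapping reservations for one vehicle. Spreading reservations over several vehicles is a related but different step that this sketch does not cover.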

This is interval scheduling, but in its online form, since reservations arrive over time. If you want to read further, you can read here:
http://www-bcf.usc.edu/~dkempe/teaching/online.pdf
