I have a predicate that checks whether a room is available within a given schedule (consisting of events).
The check that the room is available and not already taken by another event currently runs, I believe, in exponential time, and I'd like to optimise this.
What I'm currently doing is:
I take the first event and verify that it doesn't overlap with any of the other events. Then I take the second event and verify that it doesn't overlap with any of the remaining events, and so on until the list is empty.
I've been thinking about it, but the only way I can see to make this more performant is by using asserts.
Is there any way, other than using asserts, to improve the efficiency?
Optimal scheduling is a genuinely hard (NP-hard) problem, akin to optimal bin packing. This is an entire research area.
It sounds to me like what you're doing is O(n²): you're comparing every element in the list to every other element, but only once each, because you only compare each element to the elements after it in the list. So element 1 gets compared to N-1 other elements, while element N-1 only gets compared to 1 other element, about N(N-1)/2 comparisons in total. This is not an absurd time complexity for your problem.
An interval tree approach is potentially a significant improvement, because you do not actually compare every element to every other element. This lowers your worst-case time complexity to O(N log N), which is a big improvement, assuming your set of events is large enough that the constant-factor cost of using a balanced tree is outweighed.
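You can reach the same O(N log N) bound without a full interval tree by sorting the events by start time and checking only adjacent pairs. A minimal sketch of that idea (Python is used purely for illustration; the tuple representation of an event is an assumption, not something from your Prolog code):

# Hypothetical sketch: detect any overlapping pair in O(N log N) by sorting,
# instead of comparing every pair of events.
def has_overlap(events):
    """Return True if any two (start, end) events overlap, treating intervals as half-open."""
    ordered = sorted(events)                      # sort by start time: O(N log N)
    for (s1, e1), (s2, e2) in zip(ordered, ordered[1:]):
        if s2 < e1:                               # next event starts before the previous one ends
            return True
    return False

print(has_overlap([(9, 10), (10, 11), (13, 14)]))   # False
print(has_overlap([(9, 11), (10, 12)]))             # True

In Prolog you can get the same effect by sorting the events (e.g. with msort/2 or predsort/3) and then checking each consecutive pair in a single pass over the sorted list.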
I suspect this isn't really where your performance problem lies, though. You probably don't want the first schedule you can build; you probably want to see which schedule you can make that has the fewest conflicts, which will mean trying different permutations. This is where your algorithm really runs into trouble, and unfortunately it's where my knowledge runs dry; I don't know how one optimizes this process further. But I do know there is a lot written about process theory and scheduling theory that can assist you if you look for it. :)
I don't think your problem comes down to needing to use certain Prolog technologies better, such as the dynamic store. But, you can always profile your code and see where it is spending its time, and maybe there is some low-hanging fruit there that we could solve.
To go much further I think we're going to need to know more about your problem.
I have learned the two ways of doing DP, but I am confused now: how do we choose between them in different situations? I also find that most of the time top-down feels more natural to me. Can anyone tell me how to make the choice?
PS: I have read this older post but am still confused. I need help. Please don't mark my question as a duplicate; as I mentioned, they are different. I want to know how to choose, and when to approach a problem in a top-down or bottom-up way.
To keep it simple, I will explain based on my summary from a few sources:
Top-down: the recurrence looks something like a(n) = a(n-1) + a(n-2). With this equation, you can implement it in about 4-5 lines of code by making the function call itself. Its advantage, as you said, is that it is quite intuitive to most developers, but it costs more space (the call stack) to execute.
Bottom-up: you first calculate a(0), then a(1), and save them to some array (for instance); then you repeatedly compute and store a(i) = a(i-1) + a(i-2). With this approach you can significantly improve the performance of your code, and with a big n you avoid stack overflow.
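As a concrete sketch of those two descriptions, assuming Fibonacci-style base cases a(0) = 0 and a(1) = 1 (Python used only for illustration):

def a_top_down(n):
    """Top-down: the function calls itself; intuitive, but without a cache the
    call tree grows exponentially and a deep n can overflow the stack."""
    if n < 2:
        return n
    return a_top_down(n - 1) + a_top_down(n - 2)

def a_bottom_up(n):
    """Bottom-up: fill a table from a(0) upwards; O(n) time, no recursion."""
    table = [0] * (n + 1)
    if n >= 1:
        table[1] = 1
    for i in range(2, n + 1):
        table[i] = table[i - 1] + table[i - 2]
    return table[n]

print(a_top_down(10), a_bottom_up(10))   # 55 55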
A slightly longer answer, but I have tried to explain my own approach to dynamic programming and what I have come to understand after solving such questions. I hope future users find it helpful. Please do feel free to comment and discuss:
A top-down solution comes more naturally when thinking about a dynamic programming problem. You start with the end result and try to figure out the ways you could have gotten there. For example, for fib(n), we know that we could have gotten here only through fib(n-1) and fib(n-2). So we call the function recursively again to calculate the answer for these two cases, which goes deeper and deeper into the tree until the base case is reached. The answer is then built back up until all the stacks are popped off and we get the final result.
To reduce duplicate calculations, we use a cache that stores a new result and returns it if the function tries to calculate it again. So, if you imagine a tree, the function call does not have to go all the way down to the leaves, it already has the answer and so it returns it. This is called memoization and is usually associated with the top-down approach.
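A minimal sketch of the top-down version with such a cache added (a plain dictionary plays the role of the memo; the names are illustrative):

def a_memo(n, cache=None):
    """Top-down with a cache: compute each subproblem once, then reuse it."""
    if cache is None:
        cache = {}
    if n < 2:
        return n
    if n not in cache:
        cache[n] = a_memo(n - 1, cache) + a_memo(n - 2, cache)
    return cache[n]

print(a_memo(50))   # 12586269025, computed with only O(n) calls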
Now, one important point for the bottom-up approach, I think, is that you must know the order in which the final solution has to be built. In the top-down case, you just keep breaking one thing down into many, but in the bottom-up case, you must know the number and order of states that need to be involved in a calculation to go from one level to the next. In some simpler problems (e.g. fib(n)) this is easy to see, but for more complex cases it does not lend itself naturally. The approach I usually follow is to think top-down, break the final case into previous states, and try to find a pattern or order so I can then build it back up.
Regarding when to choose either of these, I would suggest the approach above to identify how the states are related to each other and how the solution is built up. One important distinction you can find this way is how many calculations are really needed and how many might just be redundant. In the bottom-up case, you have to fill an entire level before you go to the next. However, in the top-down case, an entire subtree can be skipped if it is not needed, and in that way a lot of extra calculations can be saved.
Hence, the choice obviously depends on the problem, but also on the inter-relation between states. It is usually the case that bottom-up is recommended because it saves you stack space compared to the recursive approach. However, if you feel the recursion isn't too deep but is very wide, so that tabulation would lead to a lot of unnecessary calculations, you can go for the top-down approach with memoization.
For example, in this question: https://leetcode.com/problems/partition-equal-subset-sum/, if you look at the discussions, it is mentioned that top-down is faster than bottom-up; basically, the binary-tree approach with a cache versus the bottom-up knapsack build-up. I leave it as an exercise to work out the relation between the states.
Bottom-up and top-down DP approaches are the same for many problems in terms of time and space complexity. The differences are that bottom-up is a little bit faster, because you don't pay the overhead of recursion, and, yes, top-down is more intuitive and natural.
But the real advantage of the top-down approach shows up on those problems where you don't need to calculate the answers to all of the smaller subproblems! In such cases you can actually reduce the time complexity.
For example, you can use the top-down approach with memoization to find the N-th Fibonacci number, where the sequence is defined as a[n] = a[n-1] + a[n-2]. Both approaches take O(N) time to compute it (I'm not comparing with the O(log N) solution for this particular problem). But look at the sequence a[n] = a[n/2] + a[n/2-1], with some edge cases for small N. In the bottom-up approach you can't do better than O(N), whereas the top-down algorithm works in O(log N) (or maybe some poly-logarithmic complexity, I am not sure).
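A rough sketch of that second recurrence, assuming a[0] = a[1] = 1 as the small-N edge cases (the base values are made up for illustration): the memoized top-down version only ever touches a handful of distinct states, while a bottom-up table would have to fill in all N entries.

def a(n, cache=None):
    """Top-down with memoization for a[n] = a[n//2] + a[n//2 - 1].
    Only roughly O(log n) distinct states are ever touched."""
    if cache is None:
        cache = {}
    if n < 2:                # assumed edge cases for small n
        return 1
    if n not in cache:
        cache[n] = a(n // 2, cache) + a(n // 2 - 1, cache)
    return cache[n]

cache = {}
print(a(10**9, cache))       # computed almost instantly
print(len(cache))            # far fewer than 10**9 stored states (roughly logarithmic)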
To add on to the previous answers,
Optimal time:
if all sub-problems need to be solved
→ bottom-up approach
else
→ top-down approach
Optimal space:
Bottom-up approach
The question Nikhil_10 linked (i.e. https://leetcode.com/problems/partition-equal-subset-sum/) doesn't require all subproblems to be solved. Hence the top-down approach is the better choice there.
If top-down feels more natural to you, use it, as long as you know you can implement it. Bottom-up is faster than top-down, and much of the time bottom-up is also easy to write. Make your decision based on your situation.
I am a first-year undergraduate CS student who is looking to get into competitive programming.
Recursion involves defining and solving sub-problems. As I understand it, top-down dynamic programming (DP) involves memoizing the solutions to sub-problems to reduce the time complexity of the algorithm.
Can top-down DP be used to improve the efficiency of every recursive algorithm with overlapping sub-problems? Where would DP fail to work, and how can I identify this?
The short answer is: Yes.
However, there are some constraints. The most obvious one is that the recursive calls must overlap, i.e. during the execution of the algorithm the recursive function must be called multiple times with the same parameters. This lets you truncate the recursion tree through memoization. So you can always use memoization to reduce the number of calls.
However, this reduction of calls comes at a price: you need to store the results somewhere. The next obvious constraint is that you need to have enough memory. That comes with a not-so-obvious constraint: memory access always takes some time. You first need to find where the result is stored and then maybe even copy it to some location. So in some cases it might be faster to let the recursion recompute the result instead of loading it from somewhere. But this is very implementation-specific and can even depend on the operating system and hardware setup.
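In Python, for instance, that caching can be bolted onto an existing recursive function with a decorator. A sketch with a made-up grid-paths function (the memory cost mentioned above is exactly the size of this cache):

from functools import lru_cache

@lru_cache(maxsize=None)          # unbounded cache: trades memory for fewer calls
def ways(i, j):
    """A hypothetical recursive function with overlapping subproblems:
    the number of monotone lattice paths from (0, 0) to (i, j)."""
    if i == 0 or j == 0:
        return 1
    return ways(i - 1, j) + ways(i, j - 1)

print(ways(20, 20))               # answers instantly; without the cache the call tree explodes
print(ways.cache_info())          # shows how many results are being stored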
I recently learnt about anytime algorithms but couldn't find any good explanation of them.
Can anyone explain what an anytime algorithm is and how it works?
Traditionally, an algorithm is some process that, when followed, eventually will stop and return a result (think about something like binary search, mergesort, Dijkstra's algorithm, etc.)
An anytime algorithm is an algorithm that, rather than producing a final answer, continuously searches for better and better answers to a particular problem. The "anytime" aspect means that at any point in time, you can ask the algorithm for its current best guess.
For example, suppose that you have some mathematical function and you want to find the minimum value that the function obtains. There are many numerical algorithms that you can use to do this - gradient descent, Newton's method, etc. - that under most circumstances never truly reach the ultimate answer. Instead, they converge closer and closer to the true value. These algorithms can be made into anytime algorithms. You can just run them indefinitely, and at any point in time, you can ask the algorithm what its best guess is so far.
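A toy sketch of that pattern, with a made-up 1-D function and plain gradient descent (the function, gradient, and step size are assumptions for illustration): the generator yields its current best guess after every step, so the caller can stop it at any time.

def anytime_minimize(f, grad, x0, step=0.1):
    """Generator version of gradient descent: yields the current best guess
    after every iteration, so the caller decides when to stop."""
    x = x0
    while True:
        x = x - step * grad(x)
        yield x

# Hypothetical example: minimize f(x) = (x - 3)^2, whose gradient is 2*(x - 3).
guesses = anytime_minimize(lambda x: (x - 3) ** 2, lambda x: 2 * (x - 3), x0=0.0)
for i, x in enumerate(guesses):
    if i == 50:                 # "any time": here we happen to stop after 50 steps
        print(x)                # close to the true minimizer 3.0
        break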
Note that there is no one single algorithm called the "anytime algorithm." It's a class of algorithms, just in the same way that there's no one "randomized algorithm" or no one "approximation algorithm."
Hope this helps!
An anytime algorithm belongs to a class of computational procedures that compute a solution to some problem and that also need to have three technical properties.
(1) It needs to be an algorithm, meaning it is guaranteed to terminate.
(2) It needs to be stoppable at any time, and at that time it needs to provide an answer to the problem (think of this as an approximation to the ideal solution).
(3) As more time passes, the result you get from stopping the algorithm gets uniformly and continuously better (i.e. it never comes up with a worse solution, which can happen for some optimization procedures that might oscillate or occasionally restart from scratch).
http://katemats.com/interview-questions/ says:
You are given a sorted array and you want to find the number N. How do you do the search as quickly as possible (not just traversing each element)?
How would the performance of your algorithm change if there were lots of duplicates in the array?
My answer to the first question is binary search, which is O(log(n)), where n is the number of elements in the array.
According to this answer, "we have a maximum of log_2(n-1) steps" in the worst case when "element K is not present in A and smaller than all elements in A".
I think the answer to the second question is that it doesn't affect the performance. Is this correct?
If you are talking worst case / big O, then you are correct - log(n) is your bound. However, if your data is fairly uniformly distributed (or you can map it to such a distribution), interpolating where to pick your partition can get log(log(n)) behavior. When you do the interpolation, you also get rid of the worst cases where you are looking for one of the end elements (of course there are new pathological cases, though).
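A sketch of what that interpolation idea looks like on a sorted array of integers (illustrative only; it assumes reasonably uniform values):

def interpolation_search(a, key):
    """Probe where `key` is *expected* to be, assuming roughly uniform values.
    Roughly log(log(n)) probes on uniform data, but it degrades on skewed data."""
    lo, hi = 0, len(a) - 1
    while lo <= hi and a[lo] <= key <= a[hi]:
        if a[hi] == a[lo]:                        # avoid division by zero on a flat run
            break
        # interpolate the probe position instead of always taking the midpoint
        mid = lo + (key - a[lo]) * (hi - lo) // (a[hi] - a[lo])
        if a[mid] == key:
            return mid
        if a[mid] < key:
            lo = mid + 1
        else:
            hi = mid - 1
    return lo if lo < len(a) and a[lo] == key else -1

print(interpolation_search([1, 3, 5, 7, 9, 11, 13], 9))   # 4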
With many, many duplicates you might be willing to stride further away from the direct center on the next probe. With more duplicates, you get a better margin for guessing correctly. While always choosing the half-way point gets you there in good time, educated guesses can get you some really excellent average behavior.
When I interview, I like to hear those answers: knowledge of the book and the theoretical bound, but also what can be done to specialize to the given situation. Often these constant factors can be really helpful (look at quicksort and its pivot-selection schemes).
I don't think having duplicates matters.
You're looking for a particular number N, what matters is whether or not the current node matches N.
If I'm looking for the number 1 in the list 1-2-3-4-5-6 the performance would be identical to searching the list 1-9-9-9-9-9.
If the number N is duplicated, then you have a chance of finding it a couple of steps sooner, for example if the same search were done on the list 1-1-1-1-1-9.
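For reference, a sketch of the plain binary search being discussed: it returns as soon as a probe hits N, which is why duplicates can only make it finish a step or two sooner.

def binary_search(a, n):
    """Classic binary search on a sorted list; returns an index of n, or -1."""
    lo, hi = 0, len(a) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if a[mid] == n:
            return mid              # stop immediately; any duplicate hit works
        if a[mid] < n:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

print(binary_search([1, 2, 3, 4, 5, 6], 1))   # 0
print(binary_search([1, 1, 1, 1, 1, 9], 1))   # hits a 1 on the very first probe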
Sports-tracker applications usually record a timestamp and a location at regular intervals in order to store the entire track. Analytical applications then let you compute certain statistics, such as the fastest track subsection of a fixed length (e.g. the time needed for 5 miles), or, vice versa, the longest distance traversed in a certain time span (e.g. the Cooper distance in 12 minutes).
I'm wondering what's the most elegant and/or efficient approach to compute such sections.
In a naive approach, I'd normalize and interpolate the waypoints to get a more fine-grained list of waypoints, either with a fixed time interval or with fixed distance steps. Then I'd move a sliding window representing my time span or distance segment, respectively, over the list and search for the best sub-list matching my criteria. Is there any better way?
Elegance and efficiency are in the eye of the beholder.
Personally, I think your interpolation idea is elegant.
I imagine the interpolation algorithm is easy to build and the search you'll perform on the resulting data is easy to perform. This can lead to tight code whose correctness can be easily verified. Furthermore, interpolation algorithms probably already exist and are multi-purpose, so you don't have to repeat yourself (DRY). Your suggested solution has the benefit of separating data processing from data analysis. Modularity of this nature is often considered a component of elegance.
Efficiency - are we talking about speed, space, or lines of code? You could try to combine the interpolation step with the search step to save space, but this will probably sacrifice speed and code simplicity. Certainly speed is sacrificed in the sense that multiple queries cannot take advantage of previous calculations.
When you consider the efficiency of your code, worry not so much about how the computer will handle it, or how you will code it. Think more deeply about the intrinsic time complexity of your approach. I suspect both the interpolation and search can be made to take place in O(N) time, in which case it would take vast amounts of data to bog you down: it is difficult to make an O(N) algorithm perform very badly.
In support of the above, interpolation is just estimating intermediate points between two values, so this is linear in the number of values and linear in the number of intermediate points. Searching could probably be done with a numerical variant of the Knuth-Morris-Pratt Algorithm, which is also linear.
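To make the linear-time claim concrete, here is a sketch of the search step on waypoints that have already been interpolated to a fixed distance step (the input format and names are assumptions, not from the question): a fixed-width sliding window over the timestamps finds the fastest stretch of the target length in a single pass.

def fastest_section(times, step_miles, target_miles):
    """times[i] is the timestamp (seconds) at the i-th interpolated waypoint,
    where consecutive waypoints are exactly step_miles apart.
    Returns (best_duration, start_index) for the fastest stretch of target_miles."""
    k = round(target_miles / step_miles)          # number of steps covering the target distance
    best = None
    for i in range(len(times) - k):               # O(N) pass over the waypoints
        duration = times[i + k] - times[i]        # time to cover target_miles from waypoint i
        if best is None or duration < best[0]:
            best = (duration, i)
    return best

# Hypothetical track: one waypoint every 0.5 miles, timestamps in seconds.
times = [0, 200, 410, 600, 790, 1000, 1180, 1370, 1580, 1800, 1980]
print(fastest_section(times, step_miles=0.5, target_miles=2.0))   # (770, 2)

The "longest distance in a fixed time span" variant works the same way with the roles of time and distance swapped, using a two-pointer window over time instead of a fixed index offset.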