Best Algorithm for Minimum Cost Maximum Flow?

Can someone tell me which algorithm is best for minimum cost maximum flow (and easy to implement), and where I can read about it? I searched online and found the names of many algorithms, but I was unable to decide which one to study.

From my experience benchmarking MCF solvers in an industry setting, there are three publicly available implementations that are competitive:
Andrew V. Goldberg's cost-scaling implementation.
COIN-OR's LEMON library cost-scaling implementation.
COIN-OR's network simplex implementation.
I would try those, in that order, if you are short on time. Other honorable mentions are:
Google OR-Tools' cost-scaling implementation. I haven't benchmarked it, but I'd expect it to be competitive with those above.
MCFClass, which lists several implementations under licenses that restrict commercial use. RelaxIV is very competitive but restrictively licensed.
In terms of literature and a survey of competitive algorithms, the work of Király and Kovács is an excellent starting point.
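For small instances none of the above is needed: for quick experimentation, or for checking a fast implementation's output, NetworkX ships a min-cost-max-flow routine (network simplex based, so nowhere near the benchmarked codes in speed). A minimal sketch with made-up capacities and costs:

    import networkx as nx

    # build a small directed network; 'capacity' and 'weight' (cost per
    # unit of flow) are the attribute names NetworkX expects by default
    G = nx.DiGraph()
    G.add_edge("s", "a", capacity=4, weight=2)
    G.add_edge("s", "b", capacity=3, weight=5)
    G.add_edge("a", "b", capacity=2, weight=1)
    G.add_edge("a", "t", capacity=3, weight=1)
    G.add_edge("b", "t", capacity=4, weight=1)

    flow = nx.max_flow_min_cost(G, "s", "t")  # maximize flow, then minimize cost
    print(flow)                               # nested dict: flow[u][v] = units on (u, v)
    print(nx.cost_of_flow(G, flow))           # total cost of that flow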

Related

When to use a certain Reinforcement Learning algorithm?

I'm studying Reinforcement Learning and reading Sutton's book for a university course. Besides the classic DP, MC, TD and Q-Learning algorithms, I'm reading about policy gradient methods and genetic algorithms for solving decision problems.
I have no prior experience in this topic and I'm having trouble understanding when one technique should be preferred over another. I have a few ideas, but I'm not sure about them. Can someone briefly explain, or point me to a source on, the typical situations where a certain method should be used? As far as I understand:
Dynamic Programming and Linear Programming should be used only when the MDP has few actions and states and the model is known, since they are very expensive. But when is DP better than LP?
Monte Carlo methods are used when I don't have a model of the problem but I can generate samples. They have no bias but high variance.
Temporal Difference methods should be used when MC methods need too many samples to achieve low variance. But when should I use TD and when Q-Learning?
Policy Gradient and Genetic algorithms are good for continuous MDPs. But when is one better than the other?
More precisely, I think that to choose a learning method a programmer should ask himself the following questions:
does the agent learn online or offline?
can we separate exploring and exploiting phases?
can we perform enough exploration?
is the horizon of the MDP finite or infinite?
are states and actions continuous?
But I don't know how these details of the problem affect the choice of a learning method.
I hope that some programmer has already had some experience about RL methods and can help me to better understand their applications.
Briefly:
does the agent learn online or offline? This helps you decide between on-line and off-line algorithms (e.g. on-line: SARSA; off-line: Q-learning, which as an off-policy method can also learn from stored experience). On-line methods have more limitations and need more attention.
can we separate exploring and exploiting phases? These two phases are normally kept in balance. For example, in epsilon-greedy action selection you explore with probability epsilon (choosing a random action) and exploit with probability 1-epsilon (choosing the greedy action); see the sketch after this list. You can separate the two and ask the algorithm to explore first (e.g. choosing random actions) and then exploit. But this separation is only possible when you are learning off-line, probably using a model of the dynamics of the system, and it normally means collecting a lot of sample data in advance.
can we perform enough exploration? The affordable level of exploration depends on the definition of the problem. For example, if you have a simulation model of the problem in memory, then you can explore as much as you want. But real-world exploration is limited by the amount of resources you have (e.g. energy, time, ...).
are states and actions continuous? This assumption helps you choose the right approach (algorithm). There are both discrete and continuous algorithms developed for RL. Some of the "continuous" algorithms internally discretize the state or action spaces.
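To make the explore/exploit split above concrete, here is a minimal sketch (illustrative only; the bandit setup and all numbers are made up) of epsilon-greedy action selection with incremental value estimates:

    import random

    def epsilon_greedy(q_values, epsilon):
        """Explore with probability epsilon, exploit otherwise."""
        if random.random() < epsilon:
            return random.randrange(len(q_values))                   # explore: random action
        return max(range(len(q_values)), key=q_values.__getitem__)   # exploit: greedy action

    # toy 3-armed bandit with made-up reward means
    true_means = [0.2, 0.5, 0.8]
    q, n = [0.0] * 3, [0] * 3
    for _ in range(1000):
        a = epsilon_greedy(q, epsilon=0.1)
        reward = random.gauss(true_means[a], 1.0)
        n[a] += 1
        q[a] += (reward - q[a]) / n[a]    # incremental running average
    print(q)                              # estimates should approach true_means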

Course Scheduling Algorithms: why is the use of DFS or graph coloring not suggested?

I need to develop course-timetabling software that can allot timeslots and rooms efficiently. This is a curriculum-based routine, not post-enrollment based. "Efficiently" means classes are assigned timeslots according to staff time preferences, and I also need to minimize the overlap between 1st-year and 2nd-year classes so that 2nd-year students can retake the courses they failed to pass (and likewise for the 3rd-4th year pair).
At first I thought this would be an easy problem, but now it seems otherwise. Most of the papers I've looked at use Genetic Algorithms/PSO/Simulated Annealing or algorithms of that type, and I'm still unable to translate the problem into a GA formulation.
What I'm confused about is why almost none of them suggests DFS or a graph-coloring algorithm.
Can someone explain the scenario if DFS/graph coloring is used, or why they aren't suggested or tried?
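For reference, the graph-coloring formulation the question alludes to treats courses as vertices, "taken by the same students" as edges, and timeslots as colors. A minimal sketch of greedy coloring on made-up data (it handles the overlap constraint, but not rooms or weighted preferences, which is one reason plain coloring falls short):

    def greedy_color(vertices, edges):
        """Welsh-Powell-style greedy coloring: highest-degree vertices first."""
        neighbors = {v: set() for v in vertices}
        for a, b in edges:
            neighbors[a].add(b)
            neighbors[b].add(a)
        color = {}
        for v in sorted(vertices, key=lambda v: -len(neighbors[v])):
            used = {color[u] for u in neighbors[v] if u in color}
            color[v] = next(c for c in range(len(vertices)) if c not in used)
        return color

    # colors are timeslot indices; conflicting courses get different slots
    print(greedy_color(["Calc1", "Calc2", "Physics"],
                       [("Calc1", "Physics"), ("Calc1", "Calc2")]))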
My experience with solving this problem for a complex department is that the hard constraints (like no overlapping of courses taken by the same population, and the hard constraints of the teachers) are rather easily solvable by exact methods. I modeled the problem with 0-1 integer linear programming and solved it with a SAT-based tool called minisat+. Competitive commercial tools like CPLEX can also solve it.
So with today's tools there is no need to approximate as suggested above, even when the input is rather large.
Now, optimizing the solution is a different story. There can be many (weighted) objectives, and finding the solution that minimizes the combined objective is indeed very hard computationally (no tool that I tried could solve it within 24 hours), but the tools reach a near-optimum in a few hours (I know it is near-optimum because I can compute a theoretical bound on the solution).
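As an illustration of that 0-1 ILP formulation (a minimal sketch, not the answerer's actual model, using the open-source PuLP library with its bundled CBC solver instead of minisat+ or CPLEX; all course names, slots, conflicts, and weights are made up):

    from pulp import LpProblem, LpVariable, LpMinimize, LpBinary, lpSum

    courses = ["Calc1", "Calc2", "Physics"]
    slots = ["Mon9", "Mon11", "Tue9"]
    conflicts = [("Calc1", "Physics")]   # taken by the same population
    undesired = {("Calc2", "Mon9"): 1}   # weighted soft teacher preferences

    # x[c][t] = 1 iff course c is assigned to timeslot t
    x = LpVariable.dicts("x", (courses, slots), cat=LpBinary)
    prob = LpProblem("timetable", LpMinimize)

    # soft objective: penalize undesired (course, slot) assignments
    prob += lpSum(w * x[c][t] for (c, t), w in undesired.items())

    # hard: every course gets exactly one timeslot
    for c in courses:
        prob += lpSum(x[c][t] for t in slots) == 1

    # hard: conflicting courses never share a timeslot
    for a, b in conflicts:
        for t in slots:
            prob += x[a][t] + x[b][t] <= 1

    prob.solve()
    for c in courses:
        for t in slots:
            if x[c][t].value() == 1:
                print(c, "->", t)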
This document describes applying a GA approach to university time-tabling, so it should be directly applicable to your requirement: Using a GA to solve university time-tabling

Details of all MPI algorithms?

Is there any documentation about how MPI functions such as MPI_Allgather, MPI_Alltoall, MPI_Allreduce, etc. are implemented?
I would like to learn about their algorithms and compute their complexity in terms of uni-directional or bi-directional bandwidth and total data transfer size, for a given number of nodes and a fixed data size.
I think the exact implementation of those algorithms varies depending on the communication mechanism: for example, a network will have tree-based reduction algorithms, while shared-memory models will have different ones.
I'm not exactly sure where to find answers to such questions, but I think a targeted search on Google Scholar or a look at this paper list at open-mpi.org should be useful.
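To make the tree-based reduction mentioned above concrete, here is a minimal sketch, assuming mpi4py, of how an MPI_Reduce-style sum over P ranks might be implemented internally with a binomial tree in O(log P) communication rounds (real MPI libraries switch between several such algorithms based on message size and node count):

    # run with e.g.: mpiexec -n 4 python tree_reduce.py
    from mpi4py import MPI

    def tree_reduce(value, comm):
        """Binomial-tree sum reduction to rank 0 in O(log P) rounds."""
        rank = comm.Get_rank()
        size = comm.Get_size()
        step = 1
        while step < size:
            if rank % (2 * step) == 0:
                partner = rank + step
                if partner < size:
                    value += comm.recv(source=partner, tag=step)
            elif rank % (2 * step) == step:
                comm.send(value, dest=rank - step, tag=step)
                break  # this rank is done after sending its partial sum
            step *= 2
        return value

    if __name__ == "__main__":
        comm = MPI.COMM_WORLD
        total = tree_reduce(comm.Get_rank() + 1, comm)
        if comm.Get_rank() == 0:
            print("sum =", total)  # should equal P*(P+1)/2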
http://www.amazon.com/Parallel-Programming-MPI-Peter-Pacheco/dp/1558603395/ref=sr_1_10?s=books&ie=UTF8&qid=1314807638&sr=1-10
The book linked above (Parallel Programming with MPI by Peter Pacheco) is a great resource that explains all the basic MPI algorithms and lets you implement a simple version yourself. However, when comparing the algorithms you implement against the MPI library's, you will see that the library makes many optimizations depending on the message size and the number of nodes you are running on. Hopefully this helps.

What are some of the applications of Order Statistics Algorithms?

I am watching the MIT lectures and Erik Demaine says that they discussed some of the applications of order statistics algorithms. I was wondering if the SO community could help me figure out some of the applications of selection algorithms.
Finding the median is a common application of such algorithms. E.g., I've used it in image processing for the median filter. Min, max, and k-NN also use order-statistic algorithms, so those are other applications.
Here are some other applications that I can think of in addition to what Jacob said:
Most services care about the 95th or 99th percentile of latency rather than the mean, because they want to keep most of their users happy.
In machine learning, if you want to convert a continuous-valued feature into Boolean features by bucketing it, one common approach is to partition it by percentile so that the cardinality of each Boolean feature is roughly similar.
There are probably hundreds of applications of order statistics. The algorithms to compute them change based on what kind of scaling you need and what approximations you can tolerate. If you can give more context on what Erik Demaine says, you can probably get better answers.
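Since most of these applications reduce to finding the k-th smallest element, here is a minimal sketch of quickselect, the classic expected-O(n) selection algorithm (the latency data below is made up for illustration):

    import random

    def quickselect(items, k):
        """Return the k-th smallest element (0-indexed) in expected O(n) time."""
        items = list(items)
        while True:
            pivot = random.choice(items)
            lows = [x for x in items if x < pivot]
            pivots = [x for x in items if x == pivot]
            highs = [x for x in items if x > pivot]
            if k < len(lows):
                items = lows                     # answer is among the smaller elements
            elif k < len(lows) + len(pivots):
                return pivot                     # the pivot is the k-th smallest
            else:
                k -= len(lows) + len(pivots)     # answer is among the larger elements
                items = highs

    # e.g. the 99th-percentile latency of a sample:
    latencies = [random.expovariate(1.0) for _ in range(10_000)]
    p99 = quickselect(latencies, int(0.99 * len(latencies)))
    print(p99)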

Nesting the maximum number of shapes on a surface

In industry there is often a problem where you need to calculate the most efficient use of material, be it fabric, wood, metal, etc. The starting point is X shapes of given dimensions, made out of polygons and/or curved lines, and the target is another polygon of given dimensions.
I assume many current CAM suites implement this, but having no experience using them or with their internals: what kind of computational algorithm is used to find the most efficient use of space? Can someone point me to a book or other reference that discusses this topic?
After Andrew, in his answer, pointed me in the right direction and named the problem for me, I decided to dump my research results here in a separate answer.
This is indeed a packing problem, and to be more precise, it is a nesting problem. The problem is NP-hard, so the algorithms currently in use are heuristic approaches. There do not seem to be any solutions that solve the problem in linear time, except for trivial problem sets. Solving complex problems takes from minutes to hours on current hardware if you want to achieve good material utilization. There are tens of commercial software packages that offer nesting of shapes, but I was not able to locate any open-source solutions, so there are no real examples where one could see the algorithms actually implemented.
An excellent description of the nesting and strip-nesting problems, with historical solutions, can be found in a paper by Benny Kjær Nielsen of the University of Copenhagen (Nielsen).
The general approach seems to be to mix and combine multiple known algorithms in order to find the best nesting solution. These algorithms include (Guided/Iterated) Local Search, Fast Neighborhood Search based on the No-Fit Polygon, and jostling heuristics. I found a great paper on this subject with pictures of how the algorithms work, along with benchmarks of the different software implementations so far. It was presented at the International Symposium on Scheduling 2006 by S. Umetani et al. (Umetani).
A relatively new and possibly the best approach to date is based on a Hybrid Genetic Algorithm (HGA), a hybrid of simulated annealing and a genetic algorithm described by Wu Qingming et al. of Wuhan University (Qingming). They implemented it using Visual Studio, a SQL database, and the genetic algorithm optimization toolbox (GAOT) in MATLAB.
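Since no open-source nesting implementations were easy to find, here is a minimal illustrative sketch of the heuristic flavor involved: a greedy bottom-left strip-packing of axis-aligned rectangles (a drastic simplification; real nesting software handles arbitrary polygons via no-fit polygons and local search):

    def bottom_left_pack(rects, strip_width):
        """Greedy bottom-left heuristic for strip packing of rectangles.

        rects: list of (w, h); returns list of (x, y, w, h) placements.
        Assumes every rectangle fits within strip_width.
        """
        placed = []
        for w, h in sorted(rects, key=lambda r: -r[0] * r[1]):  # biggest first
            best = None
            # candidate x positions: 0 and the right edges of placed rectangles
            xs = {0} | {px + pw for px, py, pw, ph in placed}
            for x in sorted(xs):
                if x + w > strip_width:
                    continue
                # lowest y at which [x, x+w) is clear of all placed rectangles
                y = 0
                for px, py, pw, ph in placed:
                    if px < x + w and x < px + pw:  # horizontal overlap
                        y = max(y, py + ph)
                if best is None or (y, x) < (best[1], best[0]):
                    best = (x, y)
            placed.append((best[0], best[1], w, h))
        return placed

    print(bottom_left_pack([(3, 2), (2, 2), (4, 1)], strip_width=5))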
You are referring to the well-known computer science domain of packing, for which a variety of problems have been defined and researched, in both two-dimensional and three-dimensional space.
There is considerable material on the net for the defined problems, but to find it you kind of have to know the name of the problem to search for.
Some packages may well adopt a heuristic approach (which I suspect they do), and some might go to the lengths of enumerating all the possibilities to get the absolutely right answer.
http://en.wikipedia.org/wiki/Packing_problem
