Finding an optimal solution for targeting ships in a naval engagement - algorithm

I have an engagement between two fleets of n and m ships: each ship in the friendly fleet has its own salvo damage, and each ship in the enemy fleet has its own hp amount. The goal of this algorithm is to find the optimal solution (if one exists) for how to assign targets to your ships (e.g., ship 1 in my fleet targets ship 3 in your fleet) in such a way that the salvo maximizes the amount of damage done to the enemy fleet.
Important: by damage, I mean the value of an enemy ship destroyed, where a ship's value is its hp divided by the damage it deals. If an enemy ship has 100 hp and deals 20 dmg, its "value" is 100/20 = 5, so destroying that ship scores 5. And lastly, only the score of destroyed ships is taken into account; if it is impossible to destroy any ship with a single salvo, the score then falls back to including the damaged ships.
I have attempted a greedy method, an iterative improvement method, and a hill ascent method too, but none of them are capable of reaching an optimal solution. I have also tried another method, where a large number of randomized target choice sets are generated and evaluated, and the best one is picked out of all of them. This is the one that has produced the best results, but it is incredibly inefficient and almost never produces the optimal result.
I believe there has to be a way of calculating an optimal solution that does not require checking every single possible targeting choice, but I cannot find a way of doing so. It also seems like this problem is a weird form of the multiple knapsack problem, with the knapsacks being the enemy hp pools and the items the dmg values of the shots. Except this time the last item placed into a knapsack can exceed the size limit of the knapsack, but only the fraction of the item's value that fits into the knapsack is useful.
Even if it is not a solution to the problem, any thoughts or help are very much appreciated!

Linear programming will do the job perfectly here. In this case the decision variables are integers, so we are dealing with ILP.
Here is a short description of how to model your problem as a linear program.
Data:
F_dmg[n] // an array containing the damage of friendly ships
E_hp[m] // an array containing the hp points of the enemy ships
M // constant, the highest hp among all enemy ships
V[m] // the 'value' of enemy ships
Decision variables:
X[n][m] // a matrix of booleans (0 or 1)
// X[i][j] = 1 if the ship i attacks the ship j, 0 otherwise
Dmg[m] // an array of integers, representing the total damage taken by each enemy ship
IsAlive[m] // an array of booleans, representing whether each ship survives the salvo (0 if destroyed, 1 if alive)
Constraints:
// a friendly ship can attack at most one enemy ship
for all i in 1..n, sum(j in 1..m) X[i][j] <= 1
// the damage sustained by a ship cannot exceed its hp
for all j in 1..m, Dmg[j] <= E_hp[j]
// the damage sustained by a ship has to be consistent with the attacks it receives
for all j in 1..m, Dmg[j] <= sum(i in 1..n) X[i][j] * F_dmg[i]
// a ship must be marked alive if the damage it sustains falls short of its hp
// (nothing forces IsAlive[j] to 0 when a ship dies; the maximization below handles that)
for all j in 1..m, M * IsAlive[j] >= E_hp[j] - Dmg[j]
Objective function:
maximize sum(j in 1..m) (1 - IsAlive[j]) * V[j]
Write that in OPL, feed it to an ILP solver, and you'll get an optimal answer fast unless your input is absolutely gigantic.
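If you don't have OPL handy, here is a minimal sketch of the same model in Python with OR-Tools CP-SAT (my solver choice, not the answer's; note CP-SAT needs integer coefficients, so scale V if your values are fractional):
from ortools.sat.python import cp_model

def best_salvo(f_dmg, e_hp, value):
    # value[] must be integer for CP-SAT; scale (e.g. x100) if fractional
    model = cp_model.CpModel()
    n, m, big_m = len(f_dmg), len(e_hp), max(e_hp)
    x = [[model.NewBoolVar("") for _ in range(m)] for _ in range(n)]
    dmg = [model.NewIntVar(0, e_hp[j], "") for j in range(m)]  # Dmg[j] <= E_hp[j]
    alive = [model.NewBoolVar("") for _ in range(m)]
    for i in range(n):
        model.Add(sum(x[i]) <= 1)  # each friendly ship picks at most one target
    for j in range(m):
        # damage must be consistent with the attacks received
        model.Add(dmg[j] <= sum(x[i][j] * f_dmg[i] for i in range(n)))
        # a ship must stay marked alive while it has hp left
        model.Add(big_m * alive[j] >= e_hp[j] - dmg[j])
    model.Maximize(sum(value[j] * (1 - alive[j]) for j in range(m)))
    solver = cp_model.CpSolver()
    solver.Solve(model)
    return [[solver.Value(v) for v in row] for row in x]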

This either is, or is very similar to, the Weapon Target Assignment Problem.
Unfortunately that problem is NP-hard, and according to the 2003 paper "Exact and Heuristic Algorithms for the Weapon Target Assignment Problem" (Ahuja, Kumar et al.), even instances as small as 20 weapons and 20 targets can't be solved to provable optimality. (I only read the abstract.)

It must be quite similar to a depth-first search that tries to find the optimal assignment; it would then return an array of targets, one target for each ship.

Related

Football Guaranteed Relegation/Promotion Algorithm

I'm wondering if there is a way to speed up the calculation of guaranteed promotion in football (soccer) for a given football league table. It seems like there is a lot of structure to the problem so the exhaustive solution is perhaps slower than necessary.
In the problem there is a schedule (a list of pairs of teams that will play each other in the future) and a table (map) of points each team has earned in games in the past. I've included a sample real life points table below. Each future game can be won, lost or tied; teams earn 3 points for a win and 1 for a tie. Points (Pts) is what ultimately matters for promotion, and num_of_promoted_teams (a positive integer, usually around 1-3) teams are promoted at the end of each season.
The problem is to determine which (if any) teams are currently guaranteed promotion, where guaranteed promotion means that no matter the outcome of the remaining games, the team must end up promoted.
def promoted(num_of_promoted_teams, table, schedule):
    return guaranteed_promotions
I've been thinking about using depth-first search (over the future game results) to eliminate teams, which would improve the average case but not the worst case. This certainly helps early in the season, but the problem could become large in mid-season before shrinking again near the end. It seems like there might be a better way.
A constraint solver should be fast enough in practice thanks to clever pruning algorithms, and hard to screw up. Here’s some sample code with the OR-Tools CP-SAT solver.
from ortools.sat.python import cp_model

def promoted(num_promoted_teams, table, schedule):
    for candidate in table.keys():
        model = cp_model.CpModel()
        final_table = table.copy()
        for home, away in schedule:
            # exactly one of the three outcomes happens for each remaining game
            home_win = model.NewBoolVar("")
            draw = model.NewBoolVar("")
            away_win = model.NewBoolVar("")
            model.AddBoolOr([home_win, draw, away_win])
            model.AddBoolOr([home_win.Not(), draw.Not()])
            model.AddBoolOr([home_win.Not(), away_win.Not()])
            model.AddBoolOr([draw.Not(), away_win.Not()])
            final_table[home] += 3 * home_win + draw
            final_table[away] += draw + 3 * away_win
        candidate_points = final_table[candidate]
        num_not_behind = 0
        for team, team_points in final_table.items():
            if team == candidate:
                continue
            is_behind = model.NewBoolVar("")
            model.Add(team_points < candidate_points).OnlyEnforceIf(is_behind)
            model.Add(candidate_points <= team_points).OnlyEnforceIf(is_behind.Not())
            num_not_behind += is_behind.Not()
        # ask the solver for a scenario where the candidate is NOT promoted
        model.Add(num_promoted_teams <= num_not_behind)
        solver = cp_model.CpSolver()
        status = solver.Solve(model)
        if status == cp_model.INFEASIBLE:
            # no such scenario exists, so promotion is guaranteed
            yield candidate

print(*promoted(2, {"A": 10, "B": 8, "C": 8}, [("B", "C")]))
Here’s an alternative solution that is less extensible and probably slower in exchange for being self-contained and predictable.
This solution consists of an algorithm to test whether a particular team can finish behind a particular set of other teams (assuming unfavorable tie-breaking), wrapped in a loop over pairs consisting of a top-k team ℓ and a set of k teams W that might or might not finish ahead of ℓ (where k is the number of promoted teams).
If there were no draws, then we could use bipartite matching. Mark ℓ as having lost its remaining matches and mark W as having won their matches against teams not in W. On one side of the bipartite graph, there are nodes corresponding to matches between members of W. On the other side, there are zero or more nodes for each team in W, corresponding to the number of matches that that team must win to pull ahead of ℓ. If there is a matching that completely matches the latter side, then W can finish collectively in front of ℓ, and ℓ is not guaranteed promotion.
This could be extended easily if wins were 2 points instead of 3, but alas, 3 points causes the problem not to be convex, and we’re going to need some branching. The simplest branching method depends on the observation that it’s better for two teams to each win once and lose once against each other than draw twice. Hence, loop over all subsets of at most k choose 2 pairs of teams and run the algorithm above after marking each pair in the subset as having drawn once.
(I could propose improvements, but k is small, computers are cheap, programmers are expensive, and sports fans are relentless.)
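In case it helps, here is a possible sketch of the inner no-draws test with networkx (all identifiers are mine, not part of the answer above): internal_matches is the list of remaining matches between members of W, and wins_needed[t] says how many of those matches team t must win to pull ahead of ℓ.
import networkx as nx

def w_can_finish_ahead(internal_matches, wins_needed):
    g = nx.Graph()
    match_nodes = []
    need_nodes = [("need", t, k) for t, n in wins_needed.items() for k in range(n)]
    for idx, (u, v) in enumerate(internal_matches):
        node = ("match", idx)
        match_nodes.append(node)
        # the match can be awarded as a win to either of its two participants
        for team in (u, v):
            for k in range(wins_needed.get(team, 0)):
                g.add_edge(node, ("need", team, k))
    g.add_nodes_from(match_nodes + need_nodes)
    matching = nx.bipartite.hopcroft_karp_matching(g, top_nodes=set(match_nodes))
    # W can collectively finish ahead iff every needed win can be supplied
    return all(node in matching for node in need_nodes)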

Setting priority queue values to optimize the probability of finding a 'gift'

I have a priority queue of "door numbers". I get the next door number from the priority queue (i.e. the door with the lowest corresponding priority value), and then open the door. Behind the door, there may be a gift or not. Based on the presence / absence of a gift, I update the priority for this door number, and put it back into the priority queue. I then repeat, getting the next door number to open, and so on.
Assuming every door has a different gift-replenishment rate (i.e. some may get a new gift daily, others never at all), how should I update the priority values in order to maximize the number of gifts I find? That is, I want to maximize the ratio of doors I open with gifts to doors I open without gifts.
I should note that replenishment rates are not guaranteed to be fixed over time / there is random variation. But I'm okay with simplifying assumptions here.
This almost seems like a Monte-Carlo problem to me, except that the more often I explore a node (door), the lower its expected value. (And of course there's no tree to build; we only need to figure out the value of depth-1 nodes.)
The most trivial way is to keep track of last priority (LP) and current priority (CP), with delta = CP - LP. If we find a gift, set the next priority NP = CP + delta - 1; otherwise set NP = CP + delta + 1. This works I guess, but seems rather slow in its optimization.
Or we could have a multiplicative value instead: NP = CP + delta * shrink or NP = CP + delta * grow, where shrink < 1 and grow > 1. This is what I currently have, and it seemed to work fine for months, but now I'm getting the situation where some doors are being opened back-to-back (i.e. open door D, found gift, put back on priority queue, D is now best choice again, no gift found of course, now put back on queue with worse priority) which seems pretty bad. For reference, I used shrink = 0.9 and grow = 1.3.
Is there a math formula (as with Monte-Carlo) expressing the optimal way to explore doors?
Multi-armed bandit theory runs deep and is not my specialty, so there's probably a reference that I don't know about. That being said, my first instinct is:
Simplify the math with the spherical-cow assumption that, for each door, the replenishment time is exponentially distributed with some unknown rate that stays constant over time.
Separate out our estimate of the replenishment rate from the history.
Set the priority of each door to 1 − exp(−λx) where λ is the estimated replenishment rate and x is the time since we last opened the door. (Higher is better.)
Multi-armed bandits typically have to balance exploration with exploitation, but my hunch here is that we'll get this naturally from the replenishment process.
Most of the technical detail is in doing the estimate. We have a bunch of examples (x, b) where x is the time since we last opened the door and b is whether there was a gift. For a given rate λ, the formula above for the priority gives the expected value of b. I'll suggest a maximum likelihood estimator for λ. This means maximizing the sum of log(exp(−λx)) = −λx over all (x, 0) examples plus the sum of log(1 − exp(−λx)) over all (x, 1) examples. This function can be optimized directly, but there are two issues:
The more times we open a door, the more expensive the optimization gets.
If there are no positive or negative examples, then the solution is degenerate. Probably we should require λ be at least monthly or something to avoid giving up on a door entirely.
What I would actually recommend is picking a small set of λ values to make this a discrete optimization problem.
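For example, a minimal sketch of that discrete estimate (names are mine; assumes every x > 0): history is a list of (x, b) pairs as above, and candidate_rates is the small fixed set of λ values.
import math

def estimate_rate(history, candidate_rates=(1/30, 1/7, 1/3, 1.0, 3.0)):
    def log_likelihood(lam):
        total = 0.0
        for x, b in history:
            p = 1 - math.exp(-lam * x)  # probability a gift has replenished by now
            total += math.log(p) if b else math.log(1 - p)
        return total
    return max(candidate_rates, key=log_likelihood)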
(Another potential problem is that the priority formula could be inefficient for many doors. What you could do instead is pick a target threshold for priority and then calculate when the priority will exceed that threshold.)
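Solving 1 − exp(−λx) ≥ T for x gives that crossing time in closed form, so each door could be scheduled once rather than re-scored repeatedly. A tiny hypothetical helper:
import math

def time_to_reach(lam, threshold):
    # earliest x at which priority 1 - exp(-lam * x) reaches the threshold
    return -math.log(1 - threshold) / lam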

Best approach to a variation of a bucketing problem

Find the most appropriate team compositions for the days on which a match is possible. There is a set of n participants and k days, and a team has m slots. Each participant specifies how many days he wants to play and on which days he is available.
Result constraints:
Participants must not be participating in more days than they want
Participants must not be scheduled in days they are not available in.
Algorithm should do its best to include as many unique participants as possible.
A day will not be scheduled if less than m participants are available for that day.
I find myself solving this problem manually every week at work for my football team scheduling, and I'm sure there is a smart programmatic approach to solve it. Currently we consider only 2 days per week and colleagues write down their names for the days they want to participate; each day ends up with a big list, and it is impossible to please everyone.
I considered a new approach in which each colleague writes down his name, desired times per week to play and which days he is available, an example below:
Kane 3 1 2 3 4 5
The above line means that Kane wants to play 3 times this week and he is available Monday through Friday. The first number represents days to play; the following numbers represent available days (1 to 7, Monday to Sunday).
Days with fewer than m (in my case, m = 12) participants are not going to be scheduled. What would be the best way to approach this problem in order to find a solution that does its best to include each participant at least once and also considers their desires (when to play, how much to play)?
I can do programming, I just need to know what kind of algorithm to implement and maybe have a brief logical explanation for the choice.
Scheduling problems can get pretty gnarly, but yours isn't too bad actually. (Well, at least until you put out the first automated schedule and people complain about it and you start adding side constraints.)
The fact that a day can have a match or not creates the kind of non-convexity that makes these problems hard, but if k is small (e.g., k = 7), it's easy enough to brute force through all of the 2^k possibilities for which days have a match, as sketched below. For the rest of this answer, assume we know which days those are.
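For example, the outer loop can be a plain subset enumeration (a sketch; the inner solve is the integer program below):
from itertools import combinations

def day_subsets(days):
    # all 2**k choices of which days host a match (k is small, so this is cheap)
    for r in range(len(days) + 1):
        yield from combinations(days, r)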
Figuring out how to assign people to specific matches can be formulated as a min-cost circulation problem. I'm going to write it as an integer program because it's easier to understand in my opinion, and once you add side constraints you'll likely be reaching for an integer program solver anyway.
Let P be the set of people and M be the set of matches. For p in P and m in M let p ~ m if p is willing to play in m. Let U(p) be the upper bound on the number of matches for p. Let D be the number of people demanded by each match.
For each p ~ m, let x(p, m) be a 0-1 variable that is 1 if p plays in m and 0 if p does not play in m. For all p in P, let y(p) be a 0-1 variable (intuitively 1 if p plays in at least one match and 0 if p plays in no matches, but hold on a sec). We have constraints
# player doesn't play in too many matches
for all p in P, sum_{m in M | p ~ m} x(p, m) ≤ U(p)
# match has the right number of players
for all m in M, sum_{p in P | p ~ m} x(p, m) = D
# y(p) = 1 only if p plays in at least one match
for all p in P, y(p) ≤ sum_{m in M | p ~ m} x(p, m)
The objective is to maximize
sum_{p in P} y(p)
Note that we never actually force y(p) to be 1 if player p plays in at least one match. The maximization objective takes care of that for us.
You can write code to programmatically formulate and solve a given instance as a mixed-integer program (MIP) like this. With a MIP formulation, the sky's the limit for side constraints, e.g., avoid playing certain people on consecutive days, biasing the result to award at least two matches to as many people as possible given that as many people as possible got their first, etc., etc.
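As a rough sketch with OR-Tools CP-SAT (my solver choice; any MIP solver works), where willing[p] is the set of matches p can play in, upper[p] is U(p), and demand is D:
from ortools.sat.python import cp_model

def assign(people, matches, willing, upper, demand):
    model = cp_model.CpModel()
    x = {(p, m): model.NewBoolVar("") for p in people for m in willing[p]}
    y = {p: model.NewBoolVar("") for p in people}
    for p in people:
        played = sum(x[p, m] for m in willing[p])
        model.Add(played <= upper[p])  # player doesn't play in too many matches
        model.Add(y[p] <= played)      # y(p) = 1 only if p plays at least once
    for m in matches:
        model.Add(sum(x[p, m] for p in people if m in willing[p]) == demand)
    model.Maximize(sum(y.values()))
    solver = cp_model.CpSolver()
    if solver.Solve(model) in (cp_model.OPTIMAL, cp_model.FEASIBLE):
        return {m: [p for p in people if m in willing[p] and solver.Value(x[p, m])]
                for m in matches}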
I have an idea if you need a basic solution that you can optimize and refine in small steps: flow networks. Those who already know them may be turning up their noses, because flow networks are usually used to solve maximization problems, not optimization problems, and they are right in a sense. But I think the problem can initially be seen as maximizing the number of players who play each day. Needless to say, it is a kind of greedy approach if we stop there.
No more introduction, the purpose is to find the maximum flow inside this graph:
Each player has a number of days on which he wants to play, represented as the capacity of the edge from the source to the node for player x. Each player node has an edge of capacity 1 to each day_of_week node on which he is available. The third level is filled by the edges that link each day_of_week to the sink node, with capacity equal to the day's player limit. Quick example: player 2 is available on 2 days, Monday and Tuesday, and both days have a limit of 12 players.
So far the 1st, 2nd and 4th constraints are satisfied (that was the easy part): after you have found the maximum flow of the entire graph, you select only those paths with no residual capacity on both the 2nd level (from players to day_of_weeks) and the 3rd level (from day_of_weeks to the sink). It is easy to prove that, with this level of "optimization" and under certain conditions, it may fail to find an acceptable assignment even though one would have been found had it made different choices while visiting the graph.
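For reference, a sketch of that graph with networkx (identifiers are mine): wants[p] is the number of days player p wants to play, avail[p] the days he is available, and limit the per-day cap.
import networkx as nx

def max_flow_assignment(wants, avail, limit=12):
    g = nx.DiGraph()
    for p in wants:
        g.add_edge("source", p, capacity=wants[p])          # 1st level: desired days
        for d in avail[p]:
            g.add_edge(p, ("day", d), capacity=1)           # 2nd level: one slot per day
            g.add_edge(("day", d), "sink", capacity=limit)  # 3rd level: day capacity
    _, flow = nx.maximum_flow(g, "source", "sink")
    days = {d for ds in avail.values() for d in ds}
    return {d: [p for p in wants if flow[p].get(("day", d), 0) > 0] for d in days}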
This part is the optimization problem that I meant before. I came up with at least two heuristic improvements:
While you visit the graph, store the day_of_weeks in a priority queue where days with more players assigned have higher priority. This way the residual capacity of the entire graph is less evenly distributed.
Randomness is your friend. You are not obliged to run this algorithm only once, and on every run you can pick a random edge out of a node at the player level. At the end, average the results and choose the most common outcome. This is a situation where majority rule applies perfectly.
Better to be clear: everything above is just a starting point; the purpose of a heuristic is to find the best approximation possible. With this type of problem, and given your probably small input, this is not the right way, but it is the easiest one when you do not know where to start.

Expectation Maximization coin toss examples

I've been self-studying the Expectation Maximization lately, and grabbed myself some simple examples in the process:
http://cs.dartmouth.edu/~cs104/CS104_11.04.22.pdf
There are 3 coins 0, 1 and 2 with P0, P1 and P2 probability landing on Head when tossed. Toss coin 0, if the result is Head, toss coin 1 three times else toss coin 2 three times. The observed data produced by coin 1 and 2 is like this: HHH, TTT, HHH, TTT, HHH. The hidden data is coin 0's result. Estimate P0, P1 and P2.
http://ai.stanford.edu/~chuongdo/papers/em_tutorial.pdf
There are two coins A and B with PA and PB being the probability landing on Head when tossed. Each round, select one coin at random and toss it 10 times then record the results. The observed data is the toss results provided by these two coins. However, we don't know which coin was selected for a particular round. Estimate PA and PB.
While I can get the calculations, I can't relate the ways they are solved to the original EM theory. Specifically, during the M-Step of both examples, I don't see how they're maximizing anything. It just seems they are recalculating the parameters and somehow, the new parameters are better than the old ones. Moreover, the two E-Steps don't even look similar to each other, not to mention the original theory's E-Step.
So how exactly do these examples work?
The second PDF won't download for me, but I also visited the wikipedia page http://en.wikipedia.org/wiki/Expectation%E2%80%93maximization_algorithm which has more information. http://melodi.ee.washington.edu/people/bilmes/mypapers/em.pdf (which claims to be a gentle introduction) might be worth a look too.
The whole point of the EM algorithm is to find parameters which maximize the likelihood of the observed data. This is the only bullet point on page 8 of the first PDF, the equation for capital Theta subscript ML.
The EM algorithm comes in handy where there is hidden data which would make the problem easy if you knew it. In the three coins example this is the result of tossing coin 0. If you knew the outcome of that you could (of course) produce an estimate for the probability of coin 0 turning up heads. You would also know whether coin 1 or coin 2 was tossed three times in the next stage, which would allow you to make estimates for the probabilities of coin 1 and coin 2 turning up heads. These estimates would be justified by saying that they maximized the likelihood of the observed data, which would include not only the results that you are given, but also the hidden data that you are not - the results from coin 0. For a coin that gets A heads and B tails you find that the maximum likelihood for the probability of A heads is A/(A+B) - it might be worth you working this out in detail, because it is the building block for the M step.
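For the record, that building block is one line of calculus: with A heads and B tails the log likelihood is A·log(p) + B·log(1−p), its derivative A/p − B/(1−p) vanishes at p = A/(A+B), and that critical point is the maximum.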
In the EM algorithm you say that although you don't know the hidden data, you come in with probability estimates which allow you to write down a probability distribution for it. For each possible value of the hidden data you could find the parameter values which would optimize the log likelihood of the data including the hidden data, and this almost always turns out to mean calculating some sort of weighted average (if it doesn't the EM step may be too difficult to be practical).
What the EM algorithm asks you to do is to find the parameters maximizing the weighted sum of log likelihoods given by all the possible hidden data values, where the weights are given by the probability of the associated hidden data given the observations using the parameters at the start of the EM step. This is what almost everybody, including the Wikipedia algorithm, calls the Q-function. The proof behind the EM algorithm, given in the Wikipedia article, says that if you change the parameters so as to increase the Q-function (which is only a means to an end), you will also have changed them so as to increase the likelihood of the observed data (which you do care about). What you tend to find in practice is that you can maximize the Q-function using a variation of what you would do if you know the hidden data, but using the probabilities of the hidden data, given the estimates at the start of the EM-step, to weight the observations in some way.
In your example it means totting up the number of heads and tails produced by each coin. In the PDF they work out P(Y=H|X=HHH) = 0.6967. This means that you use weight 0.6967 for the case Y=H, which means that you increment the counts for Y=H by 0.6967 and increment the counts for X=H in coin 1 by 3*0.6967, and you increment the counts for Y=T by 0.3033 and increment the counts for X=H in coin 2 by 3*0.3033. If you have a detailed justification for why A/(A+B) is the maximum likelihood estimate of coin probabilities in the standard case, you should be ready to turn it into a justification for why this weighted updating scheme maximizes the Q-function.
Finally, the log likelihood of the observed data (the thing you are maximizing) gives you a very useful check. It should increase with every EM step, at least until you get so close to convergence that rounding error comes in, in which case you may have a very small decrease, signalling convergence. If it decreases dramatically, you have a bug in your program or your maths.
As luck would have it, I have been struggling with this material recently as well. Here is how I have come to think of it:
Consider a related, but distinct algorithm called the classify-maximize algorithm, which we might use as a solution technique for a mixture model problem. A mixture model problem is one where we have a sequence of data that may be produced by any of N different processes, of which we know the general form (e.g., Gaussian) but we do not know the parameters of the processes (e.g., the means and/or variances) and may not even know the relative likelihoods of the processes. (Typically we do at least know the number of processes. Without that, we are into so-called "non-parametric" territory.) In a sense, the process which generated each data point is the "missing" or "hidden" data of the problem.
Now, what this related classify-maximize algorithm does is start with some arbitrary guesses at the process parameters. Each data point is evaluated according to each one of those parameter processes, and a set of probabilities is generated-- the probability that the data point was generated by the first process, the second process, etc, up to the final Nth process. Then each data point is classified according to the most likely process.
At this point, we have our data separated into N different classes. So, for each class of data, we can, with some relatively simple calculus, optimize the parameters of that cluster with a maximum likelihood technique. (If we tried to do this on the whole data set prior to classifying, it is usually analytically intractable.)
Then we update our parameter guesses, re-classify, update our parameters, re-classify, etc, until convergence.
What the expectation-maximization algorithm does is similar, but more general: Instead of a hard classification of data points into class 1, class 2, ... through class N, we are now using a soft classification, where each data point belongs to each process with some probability. (Obviously, the probabilities for each point need to sum to one, so there is some normalization going on.) I think we might also think of this as each process/guess having a certain amount of "explanatory power" for each of the data points.
So now, instead of optimizing the guesses with respect to points that absolutely belong to each class (ignoring the points that absolutely do not), we re-optimize the guesses in the context of those soft classifications, or those explanatory powers. And it so happens that, if you write the expressions in the correct way, what you're maximizing is a function that is an expectation in its form.
With that said, there are some caveats:
1) This sounds easy. It is not, at least to me. The literature is littered with a hodge-podge of special tricks and techniques-- using likelihood expressions instead of probability expressions, transforming to log-likelihoods, using indicator variables, putting them in basis vector form and putting them in the exponents, etc.
These are probably more helpful once you have the general idea, but they can also obfuscate the core ideas.
2) Whatever constraints you have on the problem can be tricky to incorporate into the framework. In particular, if you know the probabilities of each of the processes, you're probably in good shape. If not, you're also estimating those, and the sum of the probabilities of the processes must be one; they must live on a probability simplex. It is not always obvious how to keep those constraints intact.
3) This is a sufficiently general technique that I don't know how I would go about writing code that is general. The applications go far beyond simple clustering and extend to many situations where you are actually missing data, or where the assumption of missing data may help you. There is a fiendish ingenuity at work here, for many applications.
4) This technique is proven to converge, but the convergence is not necessarily to the global maximum; be wary.
I found the following link helpful in coming up with the interpretation above: Statistical learning slides
And the following write-up goes into great detail of some painful mathematical details: Michael Collins' write-up
I wrote the code below in Python, which works through the example given in your second paper, by Do and Batzoglou.
I recommend that you read this link first for a clear explanation of how and why the 'weightA' and 'weightB' in the code below are obtained.
Disclaimer: the code works, but I am certain it is not coded optimally. I am not normally a Python coder and started using it two weeks ago.
import math

import numpy as np

#### E-M Coin Toss Example as given in the EM tutorial paper by Do and Batzoglou* ####

def get_mn_log_likelihood(obs, probs):
    """Return the (log)likelihood of obs, given the probs"""
    # Multinomial Distribution Log PMF
    # ln[f(x|n, p)] = [ln(n!) - (ln(x1!)+ln(x2!)+...+ln(xk!))] + [x1*ln(p1)+x2*ln(p2)+...+xk*ln(pk)]
    multinomial_coeff_denom = 0
    prod_probs = 0
    for x in range(len(obs)):  # loop through state counts in each observation
        multinomial_coeff_denom += math.log(math.factorial(obs[x]))
        prod_probs += obs[x] * math.log(probs[x])
    multinomial_coeff = math.log(math.factorial(sum(obs))) - multinomial_coeff_denom
    return multinomial_coeff + prod_probs

# 1st: Coin B, {HTTTHHTHTH}, 5H,5T
# 2nd: Coin A, {HHHHTHHHHH}, 9H,1T
# 3rd: Coin A, {HTHHHHHTHH}, 8H,2T
# 4th: Coin B, {HTHTTTHHTT}, 4H,6T
# 5th: Coin A, {THHHTHHHTH}, 7H,3T
# so, from MLE: pA(heads) = 0.80 and pB(heads) = 0.45

# represent the experiments
head_counts = np.array([5, 9, 8, 4, 7])
tail_counts = 10 - head_counts
experiments = list(zip(head_counts, tail_counts))  # list() needed under Python 3

# initialise pA(heads) and pB(heads)
pA_heads = np.zeros(100); pA_heads[0] = 0.60
pB_heads = np.zeros(100); pB_heads[0] = 0.50

# E-M begins!
delta = 0.001
j = 0  # iteration counter
improvement = float('inf')
while improvement > delta:
    expectation_A = np.zeros((5, 2), dtype=float)
    expectation_B = np.zeros((5, 2), dtype=float)
    for i in range(len(experiments)):
        e = experiments[i]  # i'th experiment
        # E step: loglikelihood of e given coin A and given coin B
        ll_A = get_mn_log_likelihood(e, np.array([pA_heads[j], 1 - pA_heads[j]]))
        ll_B = get_mn_log_likelihood(e, np.array([pB_heads[j], 1 - pB_heads[j]]))
        # weights proportional to the likelihood of each coin
        weightA = math.exp(ll_A) / (math.exp(ll_A) + math.exp(ll_B))
        weightB = math.exp(ll_B) / (math.exp(ll_A) + math.exp(ll_B))
        expectation_A[i] = np.dot(weightA, e)
        expectation_B[i] = np.dot(weightB, e)
    # M step: re-estimate the head probabilities from the weighted counts
    pA_heads[j + 1] = sum(expectation_A)[0] / sum(sum(expectation_A))
    pB_heads[j + 1] = sum(expectation_B)[0] / sum(sum(expectation_B))
    improvement = max(abs(np.array([pA_heads[j + 1], pB_heads[j + 1]])
                          - np.array([pA_heads[j], pB_heads[j]])))
    j = j + 1
The key to understanding this is knowing what the auxiliary variables are that make estimation trivial. I will explain the first example quickly, the second follows a similar pattern.
Augment each sequence of heads/tails with two binary variables, which indicate whether coin 1 was used or coin 2. Now our data looks like the following:
c_11 c_12
c_21 c_22
c_31 c_32
...
For each i, either c_i1=1 or c_i2=1, with the other being 0. If we knew the values these variables took in our sample, estimation of parameters would be trivial: p1 would be the proportion of heads in samples where c_i1=1, likewise for c_i2, and \lambda would be the mean of the c_i1s.
However, we don't know the values of these binary variables. So, what we basically do is guess them (in reality, take their expectation), and then update the parameters in our model assuming our guesses were correct. So the E step is to take the expectation of the c_i1s and c_i2s. The M step is to take maximum likelihood estimates of p_1, p_2 and \lambda given these cs.
Does that make a bit more sense? I can write out the updates for the E and M step if you prefer. EM then just guarantees that by following this procedure, likelihood will never decrease as iterations increase.
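To make that concrete, here is a small sketch (notation mine) of one E step and M step for the three-coin example, where samples holds (heads, tails) for each three-toss block and p0 plays the role of \lambda:
def em_step(samples, p0, p1, p2):
    # E step: expected c_i1 per block, i.e. P(coin 0 = H | the block's tosses)
    w = []
    for h, t in samples:
        l1 = p0 * p1**h * (1 - p1)**t        # block produced via coin 1
        l2 = (1 - p0) * p2**h * (1 - p2)**t  # block produced via coin 2
        w.append(l1 / (l1 + l2))
    # M step: weighted maximum likelihood estimates (the weighted A/(A+B))
    p0 = sum(w) / len(w)
    p1 = (sum(wi * h for wi, (h, t) in zip(w, samples))
          / sum(wi * (h + t) for wi, (h, t) in zip(w, samples)))
    p2 = (sum((1 - wi) * h for wi, (h, t) in zip(w, samples))
          / sum((1 - wi) * (h + t) for wi, (h, t) in zip(w, samples)))
    return p0, p1, p2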

Is there a well understood algorithm or solution model for this meeting scheduling scenario?

I have a complex problem and I want to know if an existing and well understood solution model exists or applies, like the Traveling Salesman problem.
Input:
A calendar of N time events, defined by starting and finishing time, and place.
The capacity of each meeting place (maximum amount of people it can simultaneously hold)
A set of pairs (Ai, Aj) which indicates that attendant Ai wishes to meet with attendant Aj, and Aj accepted that invitation.
Output:
For each attendant A, a schedule of all the events he will attend. The main criterion is that each attendant should meet as many of the attendants who accepted his invites as possible, while satisfying the space constraints.
So far we have thought of solving this with backtracking (trying out all possible solutions) and with linear programming (i.e. defining a model and solving with the simplex algorithm).
Update: If Ai already met Aj in some event, they don't need to meet anymore (they have already met).
Your problem is as hard as the minimum maximal matching problem in interval graphs. W.l.o.g. assume the capacity of each room is 2, meaning it can handle only one meeting at a time. You can model your problem with interval graphs: each interval (one per person) is a node, and there is an edge between A_i and A_j if they have a common free time and want to see each other; set the weight of each edge to the amount of time they should spend together. If you find the minimum maximal matching in this graph, you can find the solution for your restricted case. But notice that this graph is n-partite and each part is an interval graph.
P.S.: note that if the amount of time that people should spend with each other is fixed, this is easier than the weighted case.
If you have access to a good MIP solver (cplex/gurobi via academic initiative, but COIN-OR and LP_solve are open-source, and not bad either), I would definitely give simplex a try. I took a look at formulating your problem as a mixed integer program, and my feeling is that it will have pretty strong relaxations, so branch and cut and price will go a long way for you. These solvers give remarkably scalable solutions nowadays, especially the commercial ones. An advantage is that they also provide an upper bound, so you get an idea of solution quality, which is not the case for heuristics.
Formulation:
Define z(i,j) (binary) as a variable indicating that i and j are together in at least one event m in {1,2,...,N}.
Define z(i,j,m) (binary) to indicate they are together in event m.
Define z(i,m) to indicate that i is attending m.
z(i,j) and z(i,j,m) only exist if i and j are supposed to meet.
For each t, M^t is the subset of time events that are held simultaneously.
So if event 1 is from 9 to 11, event 2 is from 10 to 12 and event 3 is from 11 to 13, then
M^1 = {event 1, event 2} and M^2 = {event 2, event 3}. I.e. no person can attend both 1 and 2, or 2 and 3, but 1 and 3 is fine.
Maximize sum z(i,j), subject to:
z(i,j)<= sum_m z(i,j,m)
(every i,j)(i and j can meet if they are in the same location m at least once)
z(i,j,m)<= z(i,m) (for every i,j,m)
(if i and j attend m, then i attends m)
z(i,j,m)<= z(j,m) (for every i,j,m)
(if i and j attend m, then j attends m)
sum_i z(i,m) <= C(m) (for every m)
(only C(m) persons can visit event m)
sum_(m in M^t) z(i,m) <= 1 (for every t and i)
(if m and m' are both overlapping time t, then no person can visit them both. )
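If it helps, here is a rough sketch of this formulation in Python with OR-Tools CP-SAT (my solver choice; the constraints map one-to-one onto the lines above). pairs holds the accepted invitations, cap[m] is C(m), and overlap_groups the sets M^t:
from ortools.sat.python import cp_model

def build_model(people, events, pairs, cap, overlap_groups):
    model = cp_model.CpModel()
    attend = {(i, m): model.NewBoolVar("") for i in people for m in events}
    meet = {(i, j): model.NewBoolVar("") for i, j in pairs}
    meet_at = {(i, j, m): model.NewBoolVar("") for i, j in pairs for m in events}
    for i, j in pairs:
        model.Add(meet[i, j] <= sum(meet_at[i, j, m] for m in events))
        for m in events:
            model.AddImplication(meet_at[i, j, m], attend[i, m])  # z(i,j,m) <= z(i,m)
            model.AddImplication(meet_at[i, j, m], attend[j, m])  # z(i,j,m) <= z(j,m)
    for m in events:
        model.Add(sum(attend[i, m] for i in people) <= cap[m])    # capacity C(m)
    for group in overlap_groups:  # no one can attend two simultaneous events
        for i in people:
            model.Add(sum(attend[i, m] for m in group) <= 1)
    model.Maximize(sum(meet.values()))
    return model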
As pointed out by @SaeedAmiri, this looks like a complex problem.
My guess would be that the backtracking and linear programming options you are considering will explode as soon as the number of attendants grows a bit (maybe on the order of tens of attendants).
Maybe you should consider a (meta)heuristic approach if optimality is not a requirement, or constraint programming to build an initial model and see how it scales.
To give you a more precise answer: why do you need to solve this problem? What would be the typical number of attendees? Number of rooms?
