Generating chains from an order of lists - algorithm

I am searching for how to accomplish something I have somewhat of a grasp on, but not entirely:
I have n number of lists of varying size:
{A, B, C, D}
{1,2}
{X, Y, Z}
...to the nth potentially
How do I generate all possible chains of one item from each level: A1X, A1Y, A1Z, etc.? It's an algorithmic and mathematical task. No, it's not homework (I know school is starting); it's part of something I'm working on. I have no code --- I just need to be pointed in the right direction to formulate my terms.

(You didn't ask for code, but I tend to use Python for executable pseudo-code. You still need to translate the core algorithm to the language of your choice).
In effect, you are talking about forming the Cartesian Product of the lists. It can be done in various ways. If the number of lists isn't known ahead of time, a recursive approach is the most natural.
Let L1*L2*...*Ln denote the list of all strings of the form
s1+s2+...+sn, where si is in Li and + is the concatenation operator. For a basis you could take either n == 1, a single list, or n == 0, no lists at all. In many ways the latter is more elegant, in which case it is natural to define the product of an empty list of lists to be the list whose sole element is the empty string.
Then:
Return [''] if n == 0.
Otherwise return
[a+b | a ranges over L1 and b ranges over (L2 * L3 * ... * Ln)]
where (L2 * L3 * ... * Ln) was already computed recursively (and is just [''], the list containing the empty string, when n is 1).
The last list can easily be built up in a nested loop, or expressed more directly in any language which supports list comprehensions.
Here is a Python implementation which returns the list of all products given a list of lists of strings (abbreviated as lls in the code):
def product(lls):
    # Base case: the product of no lists is the list containing the empty string.
    if len(lls) == 0:
        return ['']
    # Compute the product of the remaining lists once, then prepend each
    # element of the first list to each of its elements.
    rest = product(lls[1:])
    return [a + b for a in lls[0] for b in rest]
Tested like this:
lists_of_strings = [['A','B','C','D'],['1','2','3'],['X','Y','Z']]
print(product(lists_of_strings))
With output:
['A1X', 'A1Y', 'A1Z', 'A2X', 'A2Y', 'A2Z', 'A3X', 'A3Y', 'A3Z', 'B1X', 'B1Y', 'B1Z', 'B2X', 'B2Y', 'B2Z', 'B3X', 'B3Y', 'B3Z', 'C1X', 'C1Y', 'C1Z', 'C2X', 'C2Y', 'C2Z', 'C3X', 'C3Y', 'C3Z', 'D1X', 'D1Y', 'D1Z', 'D2X', 'D2Y', 'D2Z', 'D3X', 'D3Y', 'D3Z']
In Python itself there isn't much motivation to do this since the itertools module has a nice product and the same product can be expressed as:
[''.join(p) for p in itertools.product(*lists_of_strings)]
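For completeness, here is that one-liner as a self-contained snippet:

import itertools

lists_of_strings = [['A', 'B', 'C', 'D'], ['1', '2', '3'], ['X', 'Y', 'Z']]
# itertools.product yields tuples like ('A', '1', 'X'); join each into a string.
print([''.join(p) for p in itertools.product(*lists_of_strings)])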

How do I find the right optimisation algorithm for my problem?

Disclaimer: I'm not a professional programmer or mathematician and this is my first time encountering the field of optimisation problems. Now that that's out of the way, let's get to the problem at hand:
I've got several lists, each containing various items and a number called 'mandatoryAmount':
listA (mandatoryAmountA, itemA1, itemA2, itemA3, ...)
Each item has certain values (each value is a number >= 0):
itemA1 (M, E, P, C, Al, Ac, D, Ab, S)
I have to choose a certain number of items from each list determined by 'mandatoryAmount'.
Within each list I can choose every item multiple times.
Once I have all of the items from each list, I'll add up the values of each.
For example:
totalM = listA (itemA1 (M) + itemA1 (M) + itemA3 (M)) + listB (itemB1 (M) + itemB2 (M))
The goals are:
-To have certain values (totalAl, totalAc, totalAb, totalS) reach a certain cap while going over that cap as little as possible. Anything over the cap is wasted.
-To maximize the remaining values with different weightings each
The output should be the best possible selection of items to meet the goals stated above. I imagine the evaluation function to just add up all non-waste values times their respective weightings while subtracting all wasted stats times their respective weightings.
edit:
The total amount of items across all lists should be somewhere between 500 and 1000, the number of lists is around 10 and the mandatoryAmount for each list is between 0 and 14.
Here's some sample code that uses Python 3 and OR-Tools. Let's start by
defining the input representation and a random instance.
import collections
import random
Item = collections.namedtuple("Item", ["M", "E", "P", "C", "Al", "Ac", "D", "Ab", "S"])
List = collections.namedtuple("List", ["mandatoryAmount", "items"])
def RandomItem():
    return Item(
        random.random(),
        random.random(),
        random.random(),
        random.random(),
        random.random(),
        random.random(),
        random.random(),
        random.random(),
        random.random(),
    )
lists = [
    List(
        random.randrange(5, 10),
        [RandomItem() for j in range(random.randrange(5, 10))],
    )
    for i in range(random.randrange(5, 10))
]
Time to formulate the optimization as a mixed-integer program. Let's import
the solver library and initialize the solver object.
from ortools.linear_solver import pywraplp
solver = pywraplp.Solver.CreateSolver("SCIP")
Make constraints for the totals that must reach a certain cap.
AlCap = random.random()
totalAl = solver.Constraint(AlCap, solver.infinity())
AcCap = random.random()
totalAc = solver.Constraint(AcCap, solver.infinity())
AbCap = random.random()
totalAb = solver.Constraint(AbCap, solver.infinity())
SCap = random.random()
totalS = solver.Constraint(SCap, solver.infinity())
We want to maximize the other values subject to some weighting.
MWeight = random.random()
EWeight = random.random()
PWeight = random.random()
CWeight = random.random()
DWeight = random.random()
solver.Objective().SetMaximization()
Create variables and fill in the constraints. For each list there is an
equality constraint on the number of items.
associations = []
for list_ in lists:
    amount = solver.Constraint(list_.mandatoryAmount, list_.mandatoryAmount)
    for item in list_.items:
        x = solver.IntVar(0, solver.infinity(), "")
        amount.SetCoefficient(x, 1)
        totalAl.SetCoefficient(x, item.Al)
        totalAc.SetCoefficient(x, item.Ac)
        totalAb.SetCoefficient(x, item.Ab)
        totalS.SetCoefficient(x, item.S)
        solver.Objective().SetCoefficient(
            x,
            MWeight * item.M
            + EWeight * item.E
            + PWeight * item.P
            + CWeight * item.C
            + DWeight * item.D,
        )
        associations.append((item, x))
if solver.Solve() != solver.OPTIMAL:
    raise RuntimeError
solution = []
for item, x in associations:
    solution += [item] * round(x.solution_value())
print(solution)
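If you also want to inspect the objective achieved, pywraplp exposes it on the objective object:

print(solver.Objective().Value())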
I think David Eisenstat has the right idea with integer programming, but let's see if we can get some good solutions otherwise, and perhaps provide a starting point for optimization. I think the fact that we can choose all of one item in each list may make this easier to solve than it normally would be. Basically that turns it into more of a subset-sum problem, especially with the cap.
There are two possibilities here:
There is no solution, no condition satisfies the requirement.
There is a solution that we need to be optimized.
We really want to try to find a solution first; if we can find one (regardless of the amount of waste), that's a start.
So let's reframe the problem: we aim to minimize waste, but we also need to meet a minimum requirement. So let's try to direct as much of the unavoidable overshoot as possible toward values we actually need.
I'm going to propose an algorithm you could use that should work "fairly well" and is polynomial time, though could probably have some optimizations. I'll be using K to mean mandatoryAmount as it's a bit of a customary variable in this situation. Also I'll be using N to mean the number of lists. Lastly, Z to represent the total number of items (across all lists).
1. Get the list of all items and sort them by the amount of each value they have (first the goal values, then the bonus values). If an item has 100A, 300C, 200B, 400D, 150E and the required values are [B, D], then the sort key would look like: [400, 200, 300, 150, 100]. Repeat, but for one goal value at a time: using the same example we would have [400, 300, 150, 100] for goal D and [200, 300, 150, 100] for goal B. Create a boolean variable for optimization mode (we start by seeking a solution; once we find one, we can try to optimize it). Create a counter/hash to track how often each item has been unassigned. An item cannot be unassigned more than K times (to avoid infinite loops). This isn't strictly needed, but it works as an optimization for step 5, as it prioritizes goals you actually need.
2. For each list, keep a counter of the number of assignable slots, set to K, as well as a counter of total assignable slots, set to K * N. These will be adjusted as needed along the way. You want O(1) lookups for: a) which list a (sorted) item belongs to, b) how many available slots that item's list has, c) how many times the item has been unassigned, and d) where the item sits in the sorted list.
3. General assignment. While there are total slots available, go through the sorted list from highest to lowest. If the list for an item has slots available, assign as many slots as possible to that item. Update the per-list and total slot counts. If the result is a valid solution, record it and trip the optimization-mode flag. If slots remain unassigned, revert the previous unassignment (but do not change the unassignment count).
4. Waste optimization. Find the most wasteful item that can still be unassigned (unassignment count < K) and unassign one slot of it. If in optimization mode, do not allow any of the goal values to drop below their cap (skip the item if it would). Update the item's unassignment count. Go to step 3, but start just after the wasteful item. If no assignment is made, reassign this item until its list has no remaining slots, but do not update the unassignment count (otherwise we might end up in an invalid state).
5. Goal-value optimization. Skip this step if the current state is a valid solution. Find the value furthest from its goal (i.e. A/B/C/D/E above) that can be unassigned, and unassign one slot of that item. Update the unassignment count. Go to step 3, beginning the search at the start of the list (unlike step 4), and stop searching the list once you go below the value of this item (not the item itself, as others may share the same value). If no assignment is made, reassign this item until its list has no remaining slots, but do not update the unassignment count (otherwise we might end up in an invalid state).
6. No assignments remain. Return the current state as the best solution found.
The algorithm should end with the best solution this approach can come up with. Increasing the maximum unassignment count may improve the solution; decreasing it will speed up the algorithm. The algorithm runs until it has maxed out its unassignment counts.
This is a bit of a greedy algorithm, so I'm not sure it's optimal (in the sense that it will always yield the best result), but it may give you some ideas as to how to approach the problem. It also feels like it should yield fairly good results, as it is basically trying to bound the result. Performance is something like O(Z^2 * K), where K is the mandatoryAmount and Z is the total number of items: each item is unassigned at most K times, and each unassignment can require O(Z) checks before the slot is reassigned.
As an optimization, store the sorted lists in a data structure with O(log Z) or better delete/next operations. That makes it practical to delete items once their unassignment count reaches K (rendering them no longer assignable), giving O(Z * log(Z) * K) performance instead.
Edit:
Hmmm, the above only works within a single list (i.e. a removed item can only be added back to its own list, as only that list has room). To avoid this, do step 4 (remove too heavy), then step 5 (remove too light), and then go to step 3 (using step 5's rules for searching, but also disallowing adding back the too-heavy ones).
So basically we remove the heaviest one, then the lightest one, and then try to assign something as heavy as possible to make up for the lightest one we removed.
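To make step 3 concrete, here is a minimal, hypothetical sketch of the initial greedy pass only (steps 1 and 3, with no unassignment bookkeeping); the (K, items) representation and the goal keys are made up for illustration:

def greedy_seed(lists, goal_keys=("Al", "Ac", "Ab", "S")):
    # lists: sequence of (K, items) pairs, where K is the list's
    # mandatoryAmount and each item is a dict of its values.
    assignment = []
    for K, items in lists:
        # Step 1: rank items by their combined contribution to the goal values.
        ranked = sorted(items, key=lambda it: sum(it[g] for g in goal_keys),
                        reverse=True)
        # Step 3: pour all K slots into the best-ranked item; the same item
        # may be chosen repeatedly, so this always fills the list.
        assignment.append([ranked[0]] * K)
    return assignment

Steps 4 and 5 would then repeatedly unassign the most wasteful slot from this starting point and walk back down the ranking to reassign it.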

Fast way to compare cyclical data

Suppose I have the data set {A,B,C,D}, of arbitrary type, and I want to compare it to another data set. I want the comparison to be true for {A,B,C,D}, {B,C,D,A}, {C,D,A,B}, and {D,A,B,C}, but not for {A,C,B,D} or any other set that is not ordered similarly. What is a fast way to do this?
Storing them in arrays, rotating, and comparing that way is an O(n^2) task, so that's not very good.
My first intuition would be to store the data as {A,B,C,D,A,B,C} and then search for the other list as a sub-list, which is only O(n). Can this be done any faster?
There is a fast algorithm for finding the minimum rotation of a string - https://en.wikipedia.org/wiki/Lexicographically_minimal_string_rotation. So you can store and compare the minimum rotation.
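A sketch of that idea, substituting a naive O(n^2) canonicalization where Booth's algorithm would give O(n):

def canonical_rotation(seq):
    # Lexicographically smallest rotation; Booth's algorithm computes
    # the same thing in linear time.
    s = list(seq)
    return min(tuple(s[i:] + s[:i]) for i in range(len(s)))

print(canonical_rotation('BCDA') == canonical_rotation('ABCD'))  # True
print(canonical_rotation('ACBD') == canonical_rotation('ABCD'))  # False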
One option is to use a directed graph. Set up a graph with the following transitions:
A -> B
B -> C
C -> D
D -> A
All other transitions will put you in an error state. Thus, provided each member is unique (which is implied by your use of the word set), you can determine membership by checking that you end on the same graph node on which you started.
If a value can appear multiple times in your search, you'll need a smarter set of states and transitions.
This approach is useful if you precompute a single search and then match it to many data points. It's not so useful if you have to constantly regenerate the graph. It could also be cache-inefficient if your state table is large.
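A small sketch of the graph approach, assuming unique members as stated; the transition table is a plain dict here rather than an explicit graph:

def matches_cycle(pattern, candidate):
    # Transitions A -> B, B -> C, C -> D, D -> A for pattern 'ABCD'.
    nxt = {pattern[i]: pattern[(i + 1) % len(pattern)]
           for i in range(len(pattern))}
    if len(candidate) != len(pattern):
        return False
    # Every step of the candidate, including the wrap-around, must be a
    # legal transition; ending where we started comes for free.
    return all(nxt.get(candidate[i]) == candidate[(i + 1) % len(candidate)]
               for i in range(len(candidate)))

print(matches_cycle('ABCD', 'CDAB'))  # True
print(matches_cycle('ABCD', 'ACBD'))  # False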
Well Dr Zoidberg, if you are interested in order, as you are, then you need to store your data in a structure that preserves order and also allows for easy rotation.
In Python a list would do.
Find the smallest element of the list, then rotate each list you want to compare until its smallest element is at the beginning. Note: this is not a sort, but a rotation. With all the lists normalised this way, a straightforward list compare between any two will tell if they are the same after rotation.
>>> def rotcomp(lst1, lst2):
        while min(lst1) != lst1[0]:
            lst1 = lst1[1:] + [lst1[0]]
        while min(lst2) != lst2[0]:
            lst2 = lst2[1:] + [lst2[0]]
        return lst1 == lst2
>>> rotcomp(list('ABCD'), list('CDAB'))
True
>>> rotcomp(list('ABCD'), list('CDBA'))
False
>>>
>>> rotcomp(list('AABC'), list('ABCA'))
False
>>> def rotcomp2(lst1, lst2):
        return repr(lst1)[1:-1] in repr(lst2 + lst2)
>>> rotcomp2(list('ABCD'), list('CDAB'))
True
>>> rotcomp2(list('ABCD'), list('CDBA'))
False
>>> rotcomp2(list('AABC'), list('ABCA'))
True
>>>
NEW SECTION: WITH DUPLICATES?
If the input may contain duplicates, then (from the possible twin question mentioned under the question) one algorithm is to check whether one list is a sub-list of the other list repeated twice.
Function rotcomp2 uses that algorithm, doing a textual comparison of the repr of the list contents.

Algorithm to find "ordered combinations"

I need an algorithm to find what I call "ordered combinations" (maybe someone knows the real name for this if there is one).
Of course I already tried to come up with an algorithm on my own but I'm really stuck.
How it should work:
Given 2 lists (not sets, order is important here!) of elements that are guaranteed to contain the same elements, find all ordered combinations.
An ordered combination is a 2-tuple, 3-tuple, ..., n-tuple (no limit on n) of elements that appear in the same order in both lists.
It's entirely possible that an element occurs more than once in a list.
But every element from one list is guaranteed to appear at least once in the other list.
It does not matter if the output contains a combination more than once.
I'm not really sure if that makes it clear so here are multiple examples:
(List1, List2, Expected Result, Annotation)
ASDF
ADSF
Result: AS, AD, AF, SF, DF, ASF, ADF
Note: ASD is not a valid result because there is no way to have ascending indices in the second list for this combination
ADSD
ASDD
Result: AD, AS, AD, DD, SD, ASD, ADD
Note: AD appears twice because it can be created from indices 1,2 and 1,4 and in the second list 1,3 and 1,4. But it would also be correct if it only appears once. Also D appears twice in both lists in an order, so this allows ADD as a valid combination too.
SDFG
SDFG
Result: SD, SF, SG, DF, DG, FG, SDF, SFG, SDG, DFG, SDFG
Note: Same input; all combinations are possible
ABCDEFG
GFEDCBA
Result: <empty>
Note: There are no combinations that appear in the same order in both lists
QWRRRRRRR
WRQ
Result: WR
Note: The only combination that appears in the same order in both lists is WR
Notes:
While it's a language agnostic algorithm I'd prefer answers that contain either C# or pseudo-code so I can understand them.
I realized that longer combinations are always made up from shorter combinations. Example: SDF can only be a valid result if SD and DF are possible too. Maybe this helps to make the algorithm more performant by building the longer combinations from the shorter ones.
Speed is of great importance here. This algorithm will be used in real time!
If it's not clear how the algorithm works, drop a comment. I'll add an example to clarify it.
Maybe this problem is already known and solved, but I don't know the proper name for it.
I would describe this problem as enumerating common subsequences of two strings. As a first cut, make a method like this, which chooses the first letter nondeterministically and recurses (Python, sorry).
def commonsubseqs(word1, word2, prefix=''):
    if len(prefix) >= 2:
        print(prefix)
    for letter in set(word1) & set(word2):  # set intersection
        # figure out what's left after consuming the first instance of letter
        remainder1 = word1[word1.index(letter) + 1:]
        remainder2 = word2[word2.index(letter) + 1:]
        # take letter and recurse
        commonsubseqs(remainder1, remainder2, prefix + letter)
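Run against the first example from the question, this prints AS, AD, AF, SF, DF, ASF, ADF, each exactly once (the order depends on set iteration):

commonsubseqs('ASDF', 'ADSF')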
If this simple solution is not fast enough for you, then it can be improved as follows. For each pair of suffixes of the two words, we precompute the list of recursive calls. In Python again:
def commonsubseqshelper(table, prefix, i, j):
    if len(prefix) >= 2:
        print(''.join(prefix))
    for letter, i1, j1 in table[i][j]:
        prefix.append(letter)
        commonsubseqshelper(table, prefix, i1, j1)
        del prefix[-1]  # delete the last item

def commonsubseqs(word1, word2):
    table = [[[(letter, word1.index(letter, i) + 1, word2.index(letter, j) + 1)
               for letter in set(word1[i:]) & set(word2[j:])]
              for j in range(len(word2) + 1)]  # 0..len(word2)
             for i in range(len(word1) + 1)]  # 0..len(word1)
    commonsubseqshelper(table, [], 0, 0)
This polynomial-time preprocessing step improves the speed of enumeration to its asymptotic optimum.

Algorithm for discrete similarity metric

Given that I have two lists that each contain a separate subset of a common superset, is
there an algorithm to give me a similarity measurement?
Example:
A = { John, Mary, Kate, Peter } and B = { Peter, James, Mary, Kate }
How similar are these two lists? Note that I do not know all elements of the common superset.
Update:
I was unclear and I have probably used the word 'set' in a sloppy fashion. My apologies.
Clarification: Order is of importance.
If identical elements occupy the same position in the list, we have the highest similarity for that element.
The similarity decreases the farther apart the identical elements are.
The similarity is even lower if the element only exists in one of the lists.
I could even add the extra dimension that lower indices are of greater value, so a[1] == b[1] is worth more than a[9] == b[9], but that is mainly because I am curious.
The Jaccard Index (aka Tanimoto coefficient) is used precisely for the use case recited in the OP's question.
The Tanimoto coefficient, tau, is equal to Nc divided by Na + Nb - Nc:
tau = Nc / (Na + Nb - Nc)
Na: the number of items in the first set
Nb: the number of items in the second set
Nc: the number of unique items common to both sets (the size of the intersection)
Here's Tanimoto coded as a Python function:
def tanimoto(x, y):
    # w is the intersection: the items of x that also appear in y
    w = [ns for ns in x if ns in y]
    return len(w) / float(len(x) + len(y) - len(w))
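For the OP's example lists, three of the five distinct names are shared, giving:

print(tanimoto(['John', 'Mary', 'Kate', 'Peter'],
               ['Peter', 'James', 'Mary', 'Kate']))  # 3 / (4 + 4 - 3) = 0.6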
I would explore two strategies:
Treat the lists as sets and apply set ops (intersection, difference)
Treat the lists as strings of symbols and apply the Levenshtein algorithm
If you truly have sets (i.e., an element is simply either present or absent, with no count attached) and only two of them, just adding the number of shared elements and dividing by the total number of elements is probably about as good as it gets.
If you have (or can get) counts and/or more than two of them, you can do a bit better than that with something like cosine similarity or TF-IDF (term frequency times inverse document frequency).
The latter attempts to give lower weight to words that appear in all (or nearly all) the "documents", i.e., sets of words.
What is your definition of "similarity measurement"? If all you want is how many items the two sets have in common, find the cardinalities of A and B, add them together, and subtract the cardinality of the union of A and B: |A ∩ B| = |A| + |B| - |A ∪ B|.
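In Python set terms, that is simply:

A = {'John', 'Mary', 'Kate', 'Peter'}
B = {'Peter', 'James', 'Mary', 'Kate'}
print(len(A) + len(B) - len(A | B))  # 3 elements in common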
If order matters, you can use Levenshtein distance or another kind of edit distance.
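A minimal sketch of the Levenshtein distance mentioned above (the standard dynamic program, keeping one row of the table at a time); it works on lists as well as strings, since it only compares elements:

def levenshtein(a, b):
    # prev[j] holds the edit distance between the processed prefix of a
    # and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

print(levenshtein(['A', 'B', 'C', 'D'], ['A', 'C', 'B', 'D']))  # 2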

Algorithm to merge two lists lacking comparison between them

I am looking for an algorithm to merge two sorted lists,
but they lack a comparison operator between elements of one list and elements of the other.
The resulting merged list may not be unique, but any result which satisfies the relative sort order of each list will do.
More precisely:
Given:
Lists A = {a_1, ..., a_m}, and B = {b_1, ..., b_n}. (They may be considered sets, as well).
A precedence operator < defined among elements of each list such that
a_i < a_{i+1} and b_j < b_{j+1} for 1 <= i < m and 1 <= j < n.
The precedence operator is undefined between elements of A and B:
a_i < b_j is not defined for any valid i and j.
An equality operator = defined among all elements of either A or B
(it is defined between an element from A and an element from B).
No two elements from list A are equal, and the same holds for list B.
Produce:
A list C = {c_1, ..., c_r} such that:
C = union(A, B); the elements of C are the union of elements from A and B.
If c_p = a_i, c_q = a_j, and a_i < a_j, then c_p < c_q, and likewise for elements of B. (The order of the elements of C corresponding to each of A and B is preserved.)
There exist no i and j such that c_i = c_j.
(All elements duplicated between A and B are removed.)
I hope this question makes sense and that I'm not asking something either terribly obvious,
or something for which there is no solution.
Context:
A constructible number can be represented exactly in finitely many quadratic extensions to the field of rational numbers (using a binary tree of height equal to the number of field extensions).
A representation of a constructible number must therefore "know" the field it is represented in.
Lists A and B represent successive quadratic extensions of the rational numbers.
Elements of A and B themselves are constructible numbers, which are defined in the context
of previous smaller fields (hence the precedence operator). When adding/multiplying constructible numbers,
the quadratically extended fields must first be merged so that the binary arithmetic
operations can be performed; the resulting list C is the quadratically extended field which
can represent numbers representable by both fields A and B.
(If anyone has a better idea of how to programmatically work with constructible numbers, let me know. A question concerning constructible numbers has arisen before, and also here are some interesting responses about their representation.)
Before anyone asks, no, this question does not belong on mathoverflow; they hate algorithm (and generally non-graduate-level math) questions.
In practice, lists A and B are linked lists (stored in reverse order, actually).
I will also need to keep track of which elements of C corresponded to which in A and B, but that is a minor detail.
The algorithm I seek is not the merge operation in mergesort,
because the precedence operator is not defined between elements of the two lists being merged.
Everything will eventually be implemented in C++ (I just want the operator overloading).
This is not homework, and will eventually be open sourced, FWIW.
I don't think you can do it better than O(N*M), although I'd be happy to be wrong.
That being the case, I'd do this:
Take the first (remaining) element of A.
Look for it in (what's left of) B.
If you don't find it in B, move it to the output
If you do find it in B, move everything from B up to and including the match, and drop the copy from A.
Repeat the above until A is empty
Move anything left in B to the output
If you want to detect incompatible orderings of A and B, then remove "(what's left of)" from step 2. Search the whole of B, and raise an error if you find it "too early".
The problem is that given a general element of A, there is no way to look for it in B in better than linear time (in the size of B), because all we have is an equality test. But clearly we need to find the matches somehow and (this is where I wave my hands a bit, I can't immediately prove it) therefore we have to check each element of A for containment in B. We can avoid a bunch of comparisons because the orders of the two sets are consistent (at least, I assume they are, and if not there's no solution).
So, in the worst case the intersection of the lists is empty, and no elements of A are order-comparable with any elements of B. This requires N*M equality tests to establish, hence the worst-case bound.
For your example problem A = (1, 2, c, 4, 5, f), B = (a, b, c, d, e, f), this gives the result (1,2,a,b,c,4,5,d,e,f), which seems good to me. It performs 24 equality tests in the process (unless I can't count): 6 + 6 + 3 + 3 + 3 + 3. Merging with A and B the other way around would yield (a,b,1,2,c,d,e,4,5,f), in this case with the same number of comparisons, since the matching elements just so happen to be at equal indices in the two lists.
As can be seen from the example, the operation can't be repeated. merge(A,B) results in a list with an order inconsistent with that of merge(B,A). Hence merge(merge(A,B), merge(B,A)) is undefined. In general, the output of a merge is arbitrary, and if you go around using arbitrary orders as the basis of new complete orders, you will generate mutually incompatible orders.
This sounds like it would use a degenerate form of topological sorting.
EDIT 2:
Now with a combined routine:
import itertools

list1 = [1, 2, 'c', 4, 5, 'f', 7]
list2 = ['a', 'b', 'c', 'd', 'e', 'f', 'g']

ibase = 0  # index of the first not-yet-output element of list2
result = []
for i1 in list1:
    for n2, i2 in enumerate(itertools.islice(list2, ibase, None, 1)):
        if i1 == i2:
            # Shared element: flush everything in list2 before the match,
            # then the match itself, and advance past it.
            result.extend(itertools.islice(list2, ibase, ibase + n2))
            result.append(i2)
            ibase = n2 + ibase + 1
            break
    else:
        # Not found in the rest of list2, so the element is unique to list1.
        result.append(i1)
result.extend(itertools.islice(list2, ibase, None, 1))
print(result)
Would concatenating the two lists be sufficient? It does preserve the relative sortedness of elements from a and elements from b.
Then it's just a matter of removing duplicates.
EDIT: Alright, after the comment discussion (and given the additional condition that a_i = b_i & a_j = b_j & a_i < a_j => b_i < b_j), here's a reasonable solution:
Identify entries common to both lists. This is O(n^2) for the naive algorithm - you might be able to improve it.
(optional) verify that the common entries are in the same order in both lists.
Construct the result list: All elements of a that are before the first shared element, followed by all elements of b before the first shared element, followed by the first shared element, and so on.
Given the problem as you have expressed it, I have a feeling that the problem may have no solution. Suppose that you have two pairs of elements {a_1, b_1} and {a_2, b_2} where a_1 < a_2 in the ordering of A, and b_1 > b_2 in the ordering of B. Now suppose that a_1 = b_1 and a_2 = b_2 according to the equality operator for A and B. In this scenario, I don't think you can create a combined list that satisfies the sublist ordering requirement.
Anyway, there's an algorithm that should do the trick. (Coded in Java-ish ...)
List<A> alist = ...
List<B> blist = ...
List<Object> mergedList = new SomeList<Object>(alist);
int mergePos = 0;
for (B b : blist) {
    boolean found = false;
    for (int i = mergePos; i < mergedList.size(); i++) {
        if (equals(mergedList.get(i), b)) {
            // Shared element: later elements of blist must be placed after it.
            mergePos = i + 1;
            found = true;
            break;
        }
    }
    if (!found) {
        mergedList.insertBefore(b, mergePos);
        mergePos++;
    }
}
This algorithm is O(N**2) in the worst case, and O(N) in the best case. (I'm skating over some Java implementation details ... like combining list iteration and insertion without a major complexity penalty ... but I think it can be done in this case.)
The algorithm neglects the pathology I mentioned in the first paragraph and other pathologies; e.g. that an element of B might be "equal to" multiple elements of A, or vice versa. To deal with these, the algorithm needs to check each b against all elements of the mergedList that are not instances of B. That makes the algorithm O(N**2) in the best case.
If the elements are hashable, this can be done in O(N) time where N is the total number of elements in A and B.
def merge(A, B):
    # Walk A and build a hash table mapping its values to indices.
    amap = {}
    for i, a in enumerate(A):
        amap[a] = i
    # Now walk B building C.
    C = []
    ai = 0
    bi = 0
    for i, b in enumerate(B):
        if b in amap:
            # b is in both lists.
            new_ai = amap[b]
            assert new_ai >= ai  # check for consistent input
            C += A[ai:new_ai]    # add non-shared elements from A
            C += B[bi:i]         # add non-shared elements from B
            C.append(b)          # add the shared element b
            ai = new_ai + 1
            bi = i + 1
    C += A[ai:]  # add remaining non-shared elements from A
    C += B[bi:]  # from B
    return C
A = [1, 2, 'c', 4, 5, 'f', 7]
B = ['a', 'b', 'c', 'd', 'e', 'f', 'g']
print(merge(A, B))
(This is just an implementation of Anon's algorithm. Note that you can check for inconsistent input lists without hurting performance and that random access into the lists is not necessary.)
