Merge several pre-sorted lists into a joint list in the most efficient way (and take the top elements of it) - algorithm

I have the following setup:
many pre-sorted lists, e.g., of 1000 elements each, sorted by some criterion (for instance, "last" based on a time)
I need to make a joint list that maintains the sort order and contains the 1000 last elements (so elements of the original lists that do not fit into the top 1000 can be discarded).
However, selecting the top 1000 can also be done separately.
Merging needs to be as fast and as efficient as possible. Re-sorting the full merged list is not an option.

Use any priority queue-based data structure:
priority queue q = empty
for each list add first element to q
create an array next that contains the next element for every list (initially, the second element of each list)
while result list is not full and q is not empty
    take top element from the q and add to the result list
    add next element of the corresponding list to the q (if any)
    update next element of the corresponding list
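A minimal C++ sketch of the priority-queue merge above, assuming the input is a std::vector of ascending std::vector<int> lists (the name mergeTopK is hypothetical). Instead of a separate next array, each heap entry carries its list index and position in that list, which serves the same purpose.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <tuple>
#include <vector>

// Merge k ascending lists, producing at most `limit` elements of the combined
// ascending order. Heap entries are (value, list index, position in list);
// std::greater turns std::priority_queue into a min-heap.
std::vector<int> mergeTopK(const std::vector<std::vector<int>>& lists,
                           std::size_t limit) {
    using Entry = std::tuple<int, std::size_t, std::size_t>;
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> q;

    // "for each list add first element to q"
    for (std::size_t i = 0; i < lists.size(); ++i)
        if (!lists[i].empty()) q.emplace(lists[i][0], i, 0);

    std::vector<int> result;
    result.reserve(limit);
    // "while result list is not full": stop as soon as `limit` items are taken
    while (result.size() < limit && !q.empty()) {
        auto [value, list, pos] = q.top();
        q.pop();
        result.push_back(value);
        // push the next element of the same list, if it has one
        if (pos + 1 < lists[list].size())
            q.emplace(lists[list][pos + 1], list, pos + 1);
    }
    assert(result.size() <= limit);
    return result;
}
```

If the lists are sorted newest-first (descending), using the default std::less instead of std::greater makes the heap a max-heap, and the same loop yields the largest `limit` items first. Each step costs O(log k) for k lists, so taking the top 1000 costs O(1000 log k) after seeding the heap, regardless of how long the lists are.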

This problem is known as merging k sorted arrays.
In Java it is quite simple to solve: just add the elements into a SortedSet (adding to a TreeSet is fast, O(log n) per element), and then keep only the top 1000.
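SortedSet<Integer> s = new TreeSet<>(); // note: a TreeSet silently drops duplicate values
// s1, s2, s3 are the input lists (List<Integer>); limit is the number of elements to keep
int n = Math.max(Math.max(s2.size(), s1.size()), s3.size());
for (int i = 0; i < n; i++) {
    // List.get throws IndexOutOfBoundsException past the end instead of returning null,
    // so compare against each list's size
    if (i < s1.size()) {
        s.add(s1.get(i));
    }
    if (i < s2.size()) {
        s.add(s2.get(i));
    }
    if (i < s3.size()) {
        s.add(s3.get(i));
    }
}
// keep only the first 'limit' elements in sort order (remove s.first() instead to keep the largest)
while (s.size() > limit) {
    s.remove(s.last());
}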

Related

Grouping numbers in a list

I came across the following question,
You are given an array A of n elements. These elements are added to a new list L, which is initially empty, in a certain order based on the given q queries.
In each query you are given an integer i that corresponds to A[i] in the array A. This means that you have to add the element A[i] to the list L.
After each element is added to the list L, make groups among the elements in the list L. Two elements will be in same group if their indexes in the array A are consecutive.
For each group we define the group’s value as a × b, where a is the largest value in that group and b is the size of that group.
Print the maximum group value among all the groups that are formed after each element is added to the list L.
My approach was to use a map<int,vector<int>>, where the key is the group number and the value is a vector containing the group's size and maximum. I also had an array g, where g[i] is the group number of a[i], or -1 if it is not in any group. The code below is part of my implementation. It gave TLE (time limit exceeded) and WA (wrong answer) on some cases, and I'm sure there are better ways to solve this question, but I can't figure out the correct approach. Please suggest an optimal way to solve it.
vector<int> g(a.size()+2, -1); //+2 because queries start with index 1; g[i] corresponds to a[i-1], -1 = not in any group
int gno=1;                      //next group number to hand out
map<int,vector<int> > m;        //group number -> {size, max}
vector<int> ans;
int mx=0;
for(unsigned int i=0;i<queries.size();i++){
    int q = queries[i];
    if(g[q-1]==-1 && g[q+1]==-1){
        //create a new group with the current element as its first element
        g[q] = gno;
        vector<int> v;
        v.push_back(1);
        v.push_back(a[q-1]);
        m[gno]=v;
        mx = max(mx,m[gno][0]*m[gno][1]);
        gno++;
    }
    else if(g[q-1]!=-1 && g[q+1]==-1){
        //join the current element to the left group
        g[q] = g[q-1];
        m[g[q]][0]++;
        m[g[q]][1] = max(m[g[q]][1],a[q-1]);
        mx = max(mx,m[g[q]][0]*m[g[q]][1]);
    }
    else if(g[q-1]==-1 && g[q+1]!=-1){
        //join the current element to the right group
        g[q] = g[q+1];
        m[g[q]][0]++;
        m[g[q]][1] = max(m[g[q]][1],a[q-1]);
        mx = max(mx,m[g[q]][0]*m[g[q]][1]);
    }
    else{
        //merge the left and right groups through the current element
        g[q]=g[q-1];
        int g1 = g[q];
        int j;
        m[g[q]][0] += 1 + m[g[q+1]][0];
        m[g[q]][1] = max(m[g[q]][1],max(a[q-1],m[g[q+1]][1]));
        //relabel every element of the right group (linear in its size)
        for(j=q+1;g[j]==g[j+1];j++){
            g[j]=g1;
        }
        g[j]=g1;
        mx = max(mx,m[g[q]][0]*m[g[q]][1]);
    }
    ans.push_back(mx);
}
I would not actually build list L. It may be too costly in time to find what to do with a new value: is it a new group on itself, does it extend an existing group, do two groups need to merge into one? If the first values are all far apart, you'll have many groups, and you need to iterate them with each new incoming value: this is not efficient.
I would just collect all the values first and only then see how they fit in groups.
There are two ways to collect the values:
Store them in a list, and when all values have been collected, sort the list in ascending order
Flag the entry in an array of booleans of size n. This way you do not have to sort it, but afterwards you do need to iterate the whole array to find the values in ascending order.
Method 1 will be the best when q is a lot less than n. Method 2 will be better for greater q.
With both methods you'll be able to iterate over the found values in ascending order, and while doing so you can identify the groups, their value, and also keep track of the largest group-value. Only one sweep is needed to find the answer.
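A sketch of the second collection method (a boolean flag array followed by one ascending sweep), assuming 0-based, in-range query indices and 64-bit values so the product cannot overflow; the function name is hypothetical. As written, it produces the maximum group value only for the final contents of L, not after each individual query.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Flag every queried index, then sweep once in ascending order: each maximal
// run of flagged indices is a group worth (largest element) * (run length).
std::int64_t finalMaxGroupValue(const std::vector<std::int64_t>& a,
                                const std::vector<std::size_t>& queries) {
    assert(!queries.empty());
    std::vector<bool> added(a.size(), false);
    for (std::size_t q : queries) added[q] = true;  // duplicates are harmless here

    bool found = false;
    std::int64_t best = 0;
    std::size_t i = 0;
    while (i < a.size()) {
        if (!added[i]) { ++i; continue; }
        const std::size_t start = i;  // first index of this group
        std::int64_t largest = a[i];
        while (i < a.size() && added[i]) largest = std::max(largest, a[i++]);
        const std::int64_t value = largest * static_cast<std::int64_t>(i - start);
        if (!found || value > best) { best = value; found = true; }
    }
    return best;
}
```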
Let's start with two simplifying assumptions:
no duplicates. Once a given index i has been "queried", it will never be queried again.
no negative numbers. All elements are positive or zero, so the largest value in a group is always positive or zero, so expanding a group (or merging two groups) will never cause the overall "maximum group value" to decrease.
(Further below I'll show how to not require those assumptions, but for now this will simplify the picture.)
So, whenever we "query" an index i, there are four cases:
i-1 is currently the right-endpoint of a group (by which I mean its greatest index) and i+1 is currently the left-endpoint of another group.
In this case, we need to merge the two groups into a single group, with i bridging the gap between them.
i-1 is currently the right-endpoint of a group, but i+1 is not currently in any group.
In this case we need to extend the group to cover i.
i-1 is not currently in any group, but i+1 is currently the left-endpoint of a group.
In this case, as in the previous case, we need to extend the group to cover i.
Neither i-1 nor i+1 is in a group.
In this case, we have a new group with just one element.
In all cases, the key thing to note is that we're only interested in the endpoints of groups. So we don't need a general mapping from indices to their groups . . . which is good, because when we merge two groups, it would be expensive to then go and update every single index from one group to point to the other.
So we just need three mappings:
std::unordered_map<int, int> map_from_left_endpoint_to_right_endpoint;
std::unordered_map<int, int> map_from_right_endpoint_to_left_endpoint;
std::unordered_map<int, int> map_from_left_endpoint_to_largest_value;
To distinguish the four cases, we use e.g. map_from_right_endpoint_to_left_endpoint.find(i - 1) (which returns an iterator pointing to the left-endpoint of the group that i-1 is the right-endpoint of, if applicable; otherwise it returns map_from_right_endpoint_to_left_endpoint.end()). We then delete entries as they become no-longer-applicable (due to groups being extended or merged in a given direction), in addition to (obviously) inserting new entries, and updating the values of existing entries.
In addition to those values, we also need an
int maximum_group_value = 0;
and whenever we create a new group, extend a group, or merge two groups, we check whether the value of the resulting group (meaning its largest_value * (right_endpoint - left_endpoint + 1)) is greater than maximum_group_value. If so, we update maximum_group_value and return it; if not, we return maximum_group_value as-is.
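The endpoint-map scheme can be sketched in C++ as follows, under the same two simplifying assumptions (no repeated queries, no negative values). The map names come from the answer; the GroupTracker class wrapper, 0-based indexing, and std::int64_t values are assumptions of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

class GroupTracker {
public:
    explicit GroupTracker(std::vector<std::int64_t> a) : a_(std::move(a)) {}

    // Adds A[i] to L (0-based) and returns the maximum group value so far.
    std::int64_t query(int i) {
        assert(i >= 0 && static_cast<std::size_t>(i) < a_.size());
        int left = i, right = i;
        std::int64_t largest = a_[i];

        // i-1 is the right endpoint of a group: absorb that group.
        auto l = map_from_right_endpoint_to_left_endpoint.find(i - 1);
        if (l != map_from_right_endpoint_to_left_endpoint.end()) {
            left = l->second;
            largest = std::max(largest, map_from_left_endpoint_to_largest_value[left]);
            map_from_right_endpoint_to_left_endpoint.erase(l);
        }
        // i+1 is the left endpoint of a group: absorb that group.
        auto r = map_from_left_endpoint_to_right_endpoint.find(i + 1);
        if (r != map_from_left_endpoint_to_right_endpoint.end()) {
            right = r->second;
            largest = std::max(largest, map_from_left_endpoint_to_largest_value[i + 1]);
            map_from_left_endpoint_to_largest_value.erase(i + 1);
            map_from_left_endpoint_to_right_endpoint.erase(r);
        }
        // Record the new, extended, or merged group [left, right]; stale
        // entries for the absorbed endpoints were erased or are overwritten here.
        map_from_left_endpoint_to_right_endpoint[left] = right;
        map_from_right_endpoint_to_left_endpoint[right] = left;
        map_from_left_endpoint_to_largest_value[left] = largest;

        maximum_group_value =
            std::max(maximum_group_value, largest * (right - left + 1));
        return maximum_group_value;
    }

private:
    std::vector<std::int64_t> a_;
    std::unordered_map<int, int> map_from_left_endpoint_to_right_endpoint;
    std::unordered_map<int, int> map_from_right_endpoint_to_left_endpoint;
    std::unordered_map<int, std::int64_t> map_from_left_endpoint_to_largest_value;
    std::int64_t maximum_group_value = 0;
};
```

For A = {2, 7, 1, 8, 2} and queries 0, 2, 1, 4, 3, successive calls return 2, 2, 21, 21, 40: the third query bridges [0] and [2] into a group of size 3 with largest value 7, and the last one joins everything into a group of size 5 with largest value 8. Only endpoint entries are touched, so each query is expected O(1).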
Now, what if duplicates are allowed, such that a given index i might be "queried" after it already belongs to a group?
The simplest approach is to simply keep track of which i-s have already been queried; but a more elegant approach, if desired, might be to change map_from_left_endpoint_to_right_endpoint from a std::unordered_map to a std::map, and then use something like this:
#include <map>

bool is_already_in_a_group(
        std::map<int, int> const & map_from_left_endpoint_to_right_endpoint,
        int const index) {
    // get iterator to first element *after* index (or to 'end()' if no such):
    auto iter = map_from_left_endpoint_to_right_endpoint.upper_bound(index);
    // if that iterator points to 'begin()', then there are no elements
    // at or before index:
    if (iter == map_from_left_endpoint_to_right_endpoint.begin()) {
        return false;
    }
    // otherwise, move iterator to point to the last element whose key is
    // less than or equal to index:
    --iter;
    // . . . and check whether the value of that element is greater than
    // or equal to index (meaning that [key, value] spans index):
    return iter->second >= index;
}
to check if the greatest key in map_from_left_endpoint_to_right_endpoint that is less than or equal to i is mapped to a value that is greater than or equal to i.
This adds a fifth case to our case analysis above — "if i is already inside a group, just do nothing and return maximum_group_value" — but other than that, has no effect.
Note that this same approach also lets us eliminate map_from_right_endpoint_to_left_endpoint, if we want: the above function could easily be tweaked to int get_left_endpoint_for_right_endpoint by changing its return statement to return iter->second == index ? iter->first : -1;.
At this point it becomes sensible to define a Group class with three fields (left_endpoint, right_endpoint, and largest_value), and just keep a single map_from_left_endpoint_to_group.
Lastly — what if negative values are allowed, such that the "maximum group value" can actually decrease as the result of a query? (For example, if the array elements are [-1, -10] and the queries are i=0, i=1, then the results are maximum_group_value=-1, maximum_group_value=-2.) In such a case, we need to keep track of the values of all current groups, because any one of them might suddenly become the maximum.
For that, instead of storing a single int maximum_group_value, we can maintain a heap of groups, ordered by value, that we push into every time we create/extend/merge groups. (We can just use a std::vector<Group> for this, plus std::push_heap with an appropriate comparator, or with an appropriate definition for operator<(Group const &, Group const &).) After each query, we check if the top group on the heap (the first element in the vector) is still a group that actually exists; if so, we return its value, otherwise we pop it (using std::pop_heap) and repeat.
As an optimization, we can also store int maximum_group_value, and eliminate the heap once we've encountered a nonnegative array-element (since as soon as a given group contains a nonnegative array-element, its value can never decrease again, and obviously the maximum group value will be the value of one of those groups).

Binary search with gaps

Let's imagine two arrays like this:
[8,2,3,4,9,5,7]
[0,1,1,0,0,1,1]
How can I perform a binary search only on the numbers with a 1 below them, ignoring the rest?
I know this can be done in O(log n) comparisons, but my current method is slower because it has to go through all the 0s until it hits a 1.
If you hit a number with a 0 below, you need to scan in both directions for a number with a 1 below until you find it -- or the local search space is exhausted. As the scan for a 1 is linear, the ratio of 0s to 1s determines whether the resulting algorithm can still be faster than linear.
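A sketch of that scan-around-the-midpoint idea in C++, assuming the values whose mask entry is 1 are sorted ascending; searchWithGaps is a hypothetical name. When the midpoint has a 0 below it, the search probes outward (left first on ties) within the current window for the nearest position with a 1, and gives up if the window contains none.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the index of `target` among positions whose mask is 1, or -1.
long searchWithGaps(const std::vector<int>& values, const std::vector<int>& mask,
                    int target) {
    assert(values.size() == mask.size());
    std::size_t lo = 0, hi = values.size();  // current window [lo, hi)
    while (lo < hi) {
        std::size_t mid = lo + (hi - lo) / 2;
        // Outward scan: mid, mid-1, mid+1, mid-2, ... while inside [lo, hi).
        std::size_t probe = hi;  // hi means "no position with a 1 found"
        for (std::size_t d = 0; mid >= lo + d || mid + d < hi; ++d) {
            if (mid >= lo + d && mask[mid - d]) { probe = mid - d; break; }
            if (mid + d < hi && mask[mid + d]) { probe = mid + d; break; }
        }
        if (probe == hi) return -1;  // local search space exhausted
        if (values[probe] == target) return static_cast<long>(probe);
        if (values[probe] < target) lo = probe + 1;
        else hi = probe;
    }
    return -1;
}
```

On the question's arrays, searching for 5 returns index 5, while searching for 4 returns -1 because 4 sits above a 0. Each outward scan is linear, so long runs of 0s can push a single step toward O(n).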
This question is very old, but I've just discovered a wonderful little trick to solve this problem in most cases where it comes up. I'm writing this answer so that I can refer to it elsewhere:
Fast Append, Delete, and Binary Search in a Sorted Array
The need to dynamically insert or delete items from a sorted collection, while preserving the ability to search, typically forces us to switch from a simple array representation using binary search to some kind of search tree -- a far more complicated data structure.
If you only need to insert at the end, however (i.e., you always insert a largest or smallest item), or you don't need to insert at all, then it's possible to use a much simpler data structure. It consists of:
A dynamic (resizable) array of items, the item array; and
A dynamic array of integers, the set array. The set array is used as a disjoint set data structure, using the single-array representation described here: How to properly implement disjoint set data structure for finding spanning forests in Python?
The two arrays are always the same size. As long as there have been no deletions, the item array just contains the items in sorted order, and the set array is full of singleton sets corresponding to those items.
If items have been deleted, though, items in the item array are only valid if there is a root set at the corresponding position in the set array. All sets that have been merged into a single root will be contiguous in the set array.
This data structure supports the required operations as follows:
Append (O(1))
To append a new largest item, just append the item to the item array, and append a new singleton set to the set array.
Delete (amortized effectively O(log N))
To delete a valid item, first call search to find the adjacent larger valid item. If there is no larger valid item, then just truncate both arrays to remove the item and all adjacent deleted items. Since merged sets are contiguous in the set array, this will leave both arrays in a consistent state.
Otherwise, merge the sets for the deleted item and adjacent item in the set array. If the deleted item's set is chosen as the new root, then move the adjacent item into the deleted item's position in the item array. Whichever position isn't chosen will be unused from now on, and can be nulled-out to release a reference if necessary.
If less than half of the item array is valid after a delete, then deleted items should be removed from the item array and the set array should be reset to an all-singleton state.
Search (amortized effectively O(log N))
Binary search proceeds normally, except that we need to find the representative item for every test position:
// find_set(set_array, i) returns the root of i's set, i.e. the position of i's representative item
int find(item_array, set_array, itemToFind) {
    int pos = 0;
    int limit = item_array.length;
    while (pos < limit) {
        int testPos = pos + (limit - pos) / 2; //integer division rounds down
        if (item_array[find_set(set_array, testPos)] < itemToFind) {
            pos = testPos + 1; //testPos is too low
        } else {
            limit = testPos; //testPos is not too low
        }
    }
    if (pos >= item_array.length) {
        return -1; //not found
    }
    pos = find_set(set_array, pos);
    return (item_array[pos] == itemToFind) ? pos : -1;
}
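A hedged C++ sketch of the whole structure, for distinct int items appended in ascending order. It departs from the answer in two labeled ways: it keeps parent pointers and set sizes in two separate arrays rather than the single-array encoding from the linked question, and it uses path halving plus union by size. The class and method names (GapArray, append, find, erase) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class GapArray {
public:
    void append(int item) {  // item must be larger than every valid item
        assert(valid_ == 0 || items_[findSet(items_.size() - 1)] < item);
        items_.push_back(item);
        parent_.push_back(parent_.size());  // new singleton set
        size_.push_back(1);
        ++valid_;
    }

    // Binary search as in the answer; returns the position holding x, or -1.
    long find(int x) {
        std::size_t pos = bound(x, false);
        if (pos >= items_.size()) return -1;
        pos = findSet(pos);
        return items_[pos] == x ? static_cast<long>(pos) : -1;
    }

    bool erase(int x) {  // returns whether x was present
        long found = find(x);
        if (found < 0) return false;
        const std::size_t p = static_cast<std::size_t>(found);
        const std::size_t next = bound(x, true);  // first position representing an item > x
        if (next >= items_.size()) {
            // No larger valid item: truncate x's set (the last one) off the end.
            std::size_t start = items_.size();
            while (start > 0 && findSet(start - 1) == p) --start;
            items_.resize(start);
            parent_.resize(start);
            size_.resize(start);
        } else {
            const std::size_t q = findSet(next);  // root of the adjacent larger item
            if (size_[p] >= size_[q]) {  // deleted item's root survives: move item in
                items_[p] = items_[q];
                parent_[q] = p;
                size_[p] += size_[q];
            } else {
                parent_[p] = q;
                size_[q] += size_[p];
            }
        }
        --valid_;
        if (valid_ * 2 < items_.size()) compact();  // under half valid: rebuild
        return true;
    }

    std::size_t validCount() const { return valid_; }

private:
    std::vector<int> items_;
    std::vector<std::size_t> parent_, size_;
    std::size_t valid_ = 0;

    std::size_t findSet(std::size_t i) {
        while (parent_[i] != i) {
            parent_[i] = parent_[parent_[i]];  // path halving
            i = parent_[i];
        }
        return i;
    }

    // First position whose representative is >= x (or > x when upper is true).
    std::size_t bound(int x, bool upper) {
        std::size_t pos = 0, limit = items_.size();
        while (pos < limit) {
            std::size_t testPos = pos + (limit - pos) / 2;
            int v = items_[findSet(testPos)];
            if (v < x || (upper && v == x)) pos = testPos + 1;  // testPos is too low
            else limit = testPos;                               // testPos is not too low
        }
        return pos;
    }

    void compact() {  // keep only valid items; reset to all-singleton sets
        std::vector<int> kept;
        for (std::size_t i = 0; i < items_.size(); ++i)
            if (parent_[i] == i) kept.push_back(items_[i]);
        items_.swap(kept);
        parent_.resize(items_.size());
        for (std::size_t i = 0; i < parent_.size(); ++i) parent_[i] = i;
        size_.assign(items_.size(), 1);
        valid_ = items_.size();
    }
};
```

The binary search stays valid because every position's representative is the next valid item at or after it, so representatives are non-decreasing from left to right.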

Parallel sorting of a singly-linked list

Is there any algorithm that makes parallel sorting of a linked list worthwhile?
It's well known that Merge Sort is the best algorithm to use for sorting a linked list.
Most merge sorts are explained in terms of arrays, with each half recursively sorted. This would make it trivial to parallelize: sort each half independently then merge the two halves.
But a linked list doesn't have a "half-way" point; a linked list goes until it ends:
Head → [a] → [b] → [c] → [d] → [e] → [f] → [g] → [h] → [i] → [j] → ...
The implementation i have now walks the list once to get a count, then recursively splits the counts until we're comparing a node with its NextNode. The recursion takes care of remembering where the two halves are.
This means the MergeSort of a linked list progresses linearly through the list. Since it seems to demand linear progression through a list, i would think it then cannot be parallelized. The only way i could imagine it is by:
walk the list to get a count O(n)
walk half the list to get to the halfway point O(n/2)
then sort each half O(n log n)
But even if we did parallelize sorting (a,b) and (c,d) in separate threads, i would think that the false sharing during NextNode reordering would kill any virtue of parallelization.
Are there any parallel algorithms for sorting a linked list?
Array merge sort algorithm
Here is the standard algorithm for performing a merge sort on an array:
algorithm Merge-Sort
    input:
        an array, A (the values to be sorted)
        an integer, p (the lower bound of the values to be sorted)
        an integer, r (the upper bound of the values to be sorted)
    define variables:
        an integer, q (the midpoint of the values to be sorted)
    if p < r then
        q ← ⌊(p+r)/2⌋
        Merge-Sort(A, p, q)     //sort the lower half
        Merge-Sort(A, q+1, r)   //sort the upper half
        Merge(A, p, q, r)
This algorithm is designed, and meant, for arrays, with arbitrary index access. To make it suitable for linked lists, it has to be modified.
Linked-list merge sort algorithm
This is the (single-threaded) singly-linked-list merge sort algorithm i currently use to sort the singly linked list. It comes from the Gonnet + Baeza-Yates Handbook of Algorithms:
algorithm sort:
    input:
        a reference to a list, r (pointer to the first item in the linked list)
        an integer, n (the number of items to be sorted)
    output:
        a reference to a list (pointer to the sorted list)
    define variables:
        a reference to a list, A (pointer to the sorted top half of the list)
        a reference to a list, B (pointer to the sorted bottom half of the list)
        a reference to a list, temp (temporary variable used to swap)
    if r = nil then
        return nil
    if n > 1 then
        A ← sort(r, ⌊n/2⌋ )
        B ← sort(r, ⌊(n+1)/2⌋ )
        return merge( A, B )
    temp ← r
    r ← r.next
    temp.next ← nil
    return temp
A Pascal implementation would be:
function MergeSort(var r: list; n: integer): list;
begin
  if r = nil then
    Result := nil
  else if n > 1 then
    Result := Merge(MergeSort(r, n div 2), MergeSort(r, (n+1) div 2) )
  else
  begin
    Result := r;
    r := r.next;
    Result.next := nil;
  end
end;
And if my transcoding works, here's an on-the-fly C# translation:
static list MergeSort(ref list r, int n)
{
    if (r == null)
        return null;
    if (n > 1)
    {
        //r is passed by ref: each call consumes its nodes and advances r past them
        list A = MergeSort(ref r, n / 2);
        list B = MergeSort(ref r, (n + 1) / 2);
        return Merge(A, B);
    }
    else
    {
        //detach and return a single node
        list temp = r;
        r = r.next;
        temp.next = null;
        return temp;
    }
}
What i need now is a parallel algorithm to sort a linked list. It doesn't have to be merge sort.
Some have suggested copying the next n items, where n items fit into a single cache-line, and spawn a task with those.
Sample data
algorithm GenerateSampleData
    input:
        an integer, n (the number of items to generate in the linked list)
    output:
        a reference to a node (the head of the linked list of random data to be sorted)
    define variables:
        a reference to a node, head (the returned head)
        a reference to a node, item (an item in the linked list)
        an integer, i (a counter)
    head ← new node
    item ← head
    for i ← 1 to n do
        item.value ← Random()
        if i < n then           //don't leave an extra, value-less node at the end
            item.next ← new node
            item ← item.next
    return head
So we could generate a list of 300,000 random items by calling:
head := GenerateSampleData(300000);
Benchmarks
Time to generate 300,000 items: 568 ms
Algorithm                                  Time        Relative
MergeSort, count-splitting variation       3,888 ms    (baseline)
MergeSort, slow-fast midpoint finding      3,920 ms    0.8% slower
QuickSort (total)                          5,625 ms    44% slower
    copy linked list to array                  4 ms
    quicksort array                        5,609 ms
    relink list                                5 ms
Bonus Reading
Stackoverflow: What's the fastest algorithm for sorting a linked list?
Stackoverflow: Merge Sort a Linked List
Mergesort For Linked Lists
Parallel Merge Sort O(log n) pdf, 1986
Stackoverflow: Parallel Merge Sort (Closed, in typical SO nerd-rage fashion)
Parallel Merge Sort Dr. Dobbs, 3/24/2012
Eliminate False Sharing Dr. Dobbs, 3/14/2009
Mergesort For Linked Lists, by Simon Tatham (of Putty fame)
Mergesort is perfect for parallel sorting. Split the list into two halves, sort each of them in parallel, then merge the results. If you need more than two parallel sorting processes, do so recursively. If you don't happen to have infinitely many CPUs, you can omit parallelization beyond a certain recursion depth (which you will have to determine by testing).
BTW, the usual approach to splitting a list in two halves of roughly the same size is Floyd's Cycle Finding algorithm, also known as the hare and tortoise approach:
Node MergeSort(Node head)
{
    if ((head == null) || (head.Next == null))
        return head; //zero or one node: already sorted (return head itself, not null)
    Node firstHalf = head;
    Node middle = GetMiddle(head);
    Node secondHalf = middle.Next;
    middle.Next = null; //cut the two halves
    //Sort the lower and upper halves
    firstHalf = MergeSort(firstHalf);
    secondHalf = MergeSort(secondHalf);
    //Merge the sorted halves
    return Merge(firstHalf, secondHalf);
}
Node GetMiddle(Node head)
{
    if (head == null || head.Next == null)
        return null;
    Node slow = head;
    Node fast = head;
    //fast moves two nodes per step and slow one, so slow stops at the middle
    while ((fast.Next != null) && (fast.Next.Next != null))
    {
        slow = slow.Next;
        fast = fast.Next.Next;
    }
    return slow;
}
After that, firstHalf and secondHalf are two lists of roughly the same size; concatenating them would yield the original list. The loop condition in GetMiddle checks fast.Next and fast.Next.Next before advancing, so fast = fast.Next.Next is always safe here; this is just to demonstrate the general principle.
Merge-Sort is a Divide and Conquer Algorithm.
Arrays are good at dividing in the middle.
Linked lists are inefficient to divide at the middle, so instead, let's divide them while we walk through the list.
Take the first element and put it in list 1.
Take the second element and put it in list 2.
Take the third element and put it in list 1.
...
You have now divided the list in half efficiently, and with a bottom-up merge sort you can start the merging steps while you are still walking over the first list dividing it into odds and evens.
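A C++ sketch of the alternating split, assuming a minimal Node struct with an int value (all names are hypothetical). For brevity it is top-down recursive: it deals the nodes into two lists in one walk, then sorts and merges them, rather than starting bottom-up merging during the walk as suggested above. Because dealing interleaves the nodes, equal keys can change relative order, so this sort is not stable.

```cpp
#include <cassert>

struct Node {
    int value;
    Node* next;
};

// Deal the nodes alternately into two lists (1st, 3rd, 5th, ... and
// 2nd, 4th, ...) in a single walk, without looking for the middle first.
static void splitAlternating(Node* head, Node*& first, Node*& second) {
    Node* heads[2] = {nullptr, nullptr};
    Node* tails[2] = {nullptr, nullptr};
    int which = 0;
    while (head != nullptr) {
        Node* rest = head->next;
        head->next = nullptr;
        if (tails[which] != nullptr) tails[which]->next = head;
        else heads[which] = head;
        tails[which] = head;
        which ^= 1;  // alternate between the two output lists
        head = rest;
    }
    first = heads[0];
    second = heads[1];
}

// Merge two sorted lists into one sorted list.
static Node* mergeSorted(Node* a, Node* b) {
    Node dummy{0, nullptr};
    Node* tail = &dummy;
    while (a != nullptr && b != nullptr) {
        if (b->value < a->value) { tail->next = b; b = b->next; }
        else { tail->next = a; a = a->next; }
        tail = tail->next;
    }
    tail->next = (a != nullptr) ? a : b;
    return dummy.next;
}

Node* mergeSortAlternating(Node* head) {
    if (head == nullptr || head->next == nullptr) return head;  // 0 or 1 node
    Node* first;
    Node* second;
    splitAlternating(head, first, second);
    assert(first != nullptr && second != nullptr);  // at least 2 nodes were dealt
    return mergeSorted(mergeSortAlternating(first), mergeSortAlternating(second));
}
```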
You can do merge sort in two ways. The first divides the list into two halves, applies merge sort recursively to both parts, and merges the results. But there is another approach: split the list into pairs, then merge pairs of pairs repeatedly until you get a single list, which is the result. See, for example, the implementation of Data.List.sort in GHC's Haskell library. This algorithm can be made parallel by spawning processes (or threads) for an appropriate number of pairs at the start, and then also for merging their results until there is only one.
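A sequential C++ sketch of the pairwise (bottom-up) variant, with a hypothetical Node struct and function names. Each round merges neighbouring runs, and the merges within one round touch disjoint runs, so they are the independent units a parallel version could spawn threads for; this sketch runs them one after another. (GHC's Data.List.sort starts from existing ascending and descending runs rather than from single elements.)

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Node {
    int value;
    Node* next;
};

static Node* mergeTwo(Node* a, Node* b) {
    Node dummy{0, nullptr};
    Node* tail = &dummy;
    while (a != nullptr && b != nullptr) {
        // take from `a` on ties, so the merge is stable
        if (b->value < a->value) { tail->next = b; b = b->next; }
        else { tail->next = a; a = a->next; }
        tail = tail->next;
    }
    tail->next = (a != nullptr) ? a : b;
    return dummy.next;
}

// Start from one-node runs and merge neighbouring runs pairwise, round after
// round, until a single run remains.
Node* pairwiseMergeSort(Node* head) {
    std::vector<Node*> runs;
    while (head != nullptr) {  // cut the list into one-node runs
        Node* rest = head->next;
        head->next = nullptr;
        runs.push_back(head);
        head = rest;
    }
    if (runs.empty()) return nullptr;
    while (runs.size() > 1) {
        std::vector<Node*> merged;
        // These merges touch disjoint runs: each could run on its own thread.
        for (std::size_t i = 0; i + 1 < runs.size(); i += 2)
            merged.push_back(mergeTwo(runs[i], runs[i + 1]));
        if (runs.size() % 2 == 1) merged.push_back(runs.back());  // odd run waits a round
        runs.swap(merged);
    }
    assert(runs.size() == 1);
    return runs[0];
}
```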
For an inefficient solution, use the Quicksort algorithm: the first element in the list is used as a pivot to partition the unsorted list into three sublists (less than, equal to, and greater than the pivot), which takes O(n) time. Then recursively sort the lower and higher sublists in separate threads. The result is obtained by concatenating the lower sublist, the sublist of keys equal to the pivot, and the upper sublist, in O(1) additional steps (instead of the slow merging).
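A sequential C++ sketch of this three-way partition quicksort, using a hypothetical Node struct. It returns both head and tail so the concatenation really is O(1). The two recursive calls are independent, and they are where separate threads would go. With the first element as pivot, already-sorted input takes O(n²) time and O(n) recursion depth.

```cpp
#include <cassert>
#include <utility>

struct Node {
    int value;
    Node* next;
};

// Append node n to the list described by (head, tail).
static void pushBack(Node*& head, Node*& tail, Node* n) {
    n->next = nullptr;
    if (tail != nullptr) tail->next = n;
    else head = n;
    tail = n;
}

// Returns {head, tail} of the sorted list, so concatenation is O(1).
std::pair<Node*, Node*> quickSortList(Node* head) {
    if (head == nullptr) return {nullptr, nullptr};
    const int pivot = head->value;  // first element is the pivot
    Node *lessH = nullptr, *lessT = nullptr;
    Node *equalH = nullptr, *equalT = nullptr;
    Node *greaterH = nullptr, *greaterT = nullptr;
    // One O(n) pass partitions the list into three sublists.
    while (head != nullptr) {
        Node* rest = head->next;
        if (head->value < pivot) pushBack(lessH, lessT, head);
        else if (pivot < head->value) pushBack(greaterH, greaterT, head);
        else pushBack(equalH, equalT, head);
        head = rest;
    }
    assert(equalH != nullptr);  // the pivot itself is always in the "equal" list
    // Independent recursive calls: candidates for separate threads.
    auto [lowH, lowT] = quickSortList(lessH);
    auto [highH, highT] = quickSortList(greaterH);
    // Concatenate low ++ equal ++ high by relinking tails.
    equalT->next = highH;
    Node* tail = (highT != nullptr) ? highT : equalT;
    if (lowH == nullptr) return {equalH, tail};
    lowT->next = equalH;
    return {lowH, tail};
}
```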
i wanted to include a version that actually handles the parallel work (using native Windows thread-pool).
You don't want to put work into threads all the way down the dividing recursion tree. You only want to schedule as much work as there are CPUs. This means you have to know how many (logical) CPUs there are. For example, if you had 8 cores, then:
initial call: 1 thread
first recursion: becomes 2 threads
second recursion: becomes 4 threads
third recursion: becomes 8 threads
fourth recursion: perform the work without splitting into more threads
Handle this by querying for the number of processors in the system:
Int32 GetNumberOfProcessors()
{
SYSTEM_INFO systemInfo;
GetSystemInfo(ref systemInfo);
return systemInfo.dwNumberOfProcessors;
}
And then we change the recursive MergeSort function to support a numberOfProcessors argument:
public Node MergeSort(Node head)
{
return MergeSort(head, GetNumberOfProcessors());
}
Each time we recurse, we divide numberOfProcessors/2. When the recursive function stops seeing at least two processors available, it stops putting work into the thread pool, and calculates it on the same thread.
Node MergeSort(Node head, Int32 numberOfProcessors)
{
    if ((head == null) || (head.Next == null))
        return head;
    Node firstHalf = head;
    Node middle = GetMiddle(head);
    Node secondHalf = middle.Next;
    middle.Next = null;
    //Sort the lower and upper halves
    if (numberOfProcessors >= 2)
    {
        //Throw the work into the thread pool, since we have CPUs left
        MergeSortOnThreadPool(ref firstHalf, ref secondHalf, numberOfProcessors / 2);
        //i only split this into a separate function to keep
        //the code short and easily readable
    }
    else
    {
        firstHalf = MergeSort(firstHalf, numberOfProcessors);
        secondHalf = MergeSort(secondHalf, numberOfProcessors);
    }
    //Merge the sorted halves
    return Merge(firstHalf, secondHalf);
}
This parallel work could be done using your favorite language-provided mechanism. Since the language i actually use (which isn't the C# this code looks to be written in) doesn't support async/await, the Task Parallel Library, or any other language-integrated parallel system, we do it the old-fashioned way: an Event with Interlocked operations. It's a technique i read in an AMD whitepaper once, complete with their tricks to eliminate the subtle race conditions:
void MergeSortOnThreadPool(ref Node listA, ref Node listB, Int32 numberOfProcessors)
{
    //numberOfProcessors has already been halved by the caller: each half gets that many
    Int32 nActiveThreads = 1; //Yes 1, to stop a race condition
    Context contextA = new Context(); //declared outside 'using' so the results are readable afterwards
    Context contextB = new Context();
    using (Event doneEvent = new Event())
    {
        //Put everything the first thread will need into a holder object
        contextA.DoneEvent = doneEvent;
        contextA.ActiveThreads = AddressOf(nActiveThreads);
        contextA.List = listA;
        contextA.NumberOfProcessors = numberOfProcessors;
        InterlockedIncrement(nActiveThreads);
        QueueUserWorkItem(MergeSortThreadProc, contextA);
        //Put everything the second thread will need into another holder object
        contextB.DoneEvent = doneEvent;
        contextB.ActiveThreads = AddressOf(nActiveThreads);
        contextB.List = listB;
        contextB.NumberOfProcessors = numberOfProcessors;
        InterlockedIncrement(nActiveThreads);
        QueueUserWorkItem(MergeSortThreadProc, contextB);
        //Drop our own extra count; if a worker is still running, wait for the last one to signal
        Int32 altDone = InterlockedDecrement(nActiveThreads);
        if (altDone > 0)
            doneEvent.WaitFor(INFINITE);
    }
    listA = contextA.Result; //returned to the caller as ref parameters
    listB = contextB.Result;
}
The thread pool thread procedure has to do some housekeeping as well: watching to see if it's the last thread, and setting the event on its way out:
void MergeSortThreadProc(Pointer Data)
{
    Context context = Context(Data);
    Node sorted = MergeSort(context.List, context.NumberOfProcessors);
    context.Result = sorted;
    //Now do lifetime management: the last thread out sets the event
    Int32 altDone = InterlockedDecrement(context.ActiveThreads^);
    if (altDone <= 0)
        context.DoneEvent.SetEvent();
}
Note: Any code is released into the public domain. No attribution required.
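For comparison, here is a sketch of the same processor-budget idea in standard C++17, where std::async replaces QueueUserWorkItem, the Event, and the Interlocked counter, and future::get() does the waiting. The Node struct and function names are assumptions; std::thread::hardware_concurrency() stands in for GetNumberOfProcessors and may return 0, in which case the sketch falls back to a single thread. On some toolchains this needs to be linked with -pthread.

```cpp
#include <cassert>
#include <future>
#include <thread>

struct Node {
    int value;
    Node* next;
};

// Slow/fast pointers: slow stops at the last node of the first half.
static Node* middleOf(Node* head) {
    assert(head != nullptr);
    Node* slow = head;
    Node* fast = head;
    while (fast->next != nullptr && fast->next->next != nullptr) {
        slow = slow->next;
        fast = fast->next->next;
    }
    return slow;
}

static Node* mergeSortedLists(Node* a, Node* b) {
    Node dummy{0, nullptr};
    Node* tail = &dummy;
    while (a != nullptr && b != nullptr) {
        if (b->value < a->value) { tail->next = b; b = b->next; }
        else { tail->next = a; a = a->next; }
        tail = tail->next;
    }
    tail->next = (a != nullptr) ? a : b;
    return dummy.next;
}

// `processors` is the thread budget: while it is at least 2, the first half
// is sorted asynchronously and each half gets half of the budget.
Node* sortWithBudget(Node* head, unsigned processors) {
    if (head == nullptr || head->next == nullptr) return head;
    Node* middle = middleOf(head);
    Node* second = middle->next;
    middle->next = nullptr;  // cut the two halves
    if (processors >= 2) {
        std::future<Node*> first =
            std::async(std::launch::async, sortWithBudget, head, processors / 2);
        Node* sortedSecond = sortWithBudget(second, processors / 2);
        return mergeSortedLists(first.get(), sortedSecond);  // get() waits
    }
    return mergeSortedLists(sortWithBudget(head, 1), sortWithBudget(second, 1));
}

Node* parallelMergeSort(Node* head) {
    unsigned n = std::thread::hardware_concurrency();  // 0 means "unknown"
    return sortWithBudget(head, n == 0 ? 1 : n);
}
```

As in the answer, threads are only created near the top of the recursion tree; below that, each thread sorts its half sequentially.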

Return the kth element from the tail (or end) of a singly linked list

[Interview Question]
Write a function that would return the 5th element from the tail (or end) of a singly linked list of integers, in one pass, and then provide a set of test cases against that function.
It is similar to question : How to find nth element from the end of a singly linked list?, but I have an additional requirement that we should traverse the linked list only once.
This is my solution:
struct Lnode
{
int val;
Lnode* next;
Lnode(int val, Lnode* next=NULL) : val(val), next(next) {}
};
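Lnode* kthFromTail(Lnode* start, int k)
{
    //note: this static counter is never reset, so only the first call in a program behaves as intended
    static int count = 0;
    if(!start)
        return NULL;
    Lnode* end = kthFromTail(start->next, k);
    //count is 0 at the last node, so k is zero-based here (k == 0 returns the last node)
    if(count==k) end = start;
    count++;
    return end;
}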
I'm traversing the linked list only once and using the implicit recursion stack. Another way would be to use two pointers, fast and slow, with the fast one k nodes ahead of the slow one. Which one is better? I think the two-pointer solution will be complicated, with many cases to handle, e.g. odd-length lists, even-length lists, k greater than the list length, etc. This recursive one is clean and covers all such cases.
The 2-pointer solution doesn't fit your requirements as it traverses the list twice.
Yours uses a lot more memory - O(n) to be exact. You're creating a recursion stack equal to the number of items in the list, which is far from ideal.
To find the kth from last item...
A better (single-traversal) solution - Circular buffer:
Uses O(k) extra memory.
Have an array of length k.
For each element, insert at the next position into the array (with wrap-around).
At the end, just return the item at the next position in the array.
2-pointer solution:
Traverses the list twice, but uses only O(1) extra memory.
Start p1 and p2 at the beginning.
Increment p1 k times.
while p1 is not at the end
increment p1 and p2
p2 points to the kth from last element.
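Both approaches in C++, using a simplified Lnode without the constructor from the question so that test lists can be built as plain arrays. Here k is 1-based (k = 1 is the last node), and both functions (hypothetical names) return nullptr when the list has fewer than k nodes.

```cpp
#include <cassert>
#include <vector>

struct Lnode {
    int val;
    Lnode* next;
};

// Circular buffer: one traversal, O(k) extra memory.
Lnode* kthFromEndBuffer(Lnode* head, int k) {
    assert(k >= 1);
    std::vector<Lnode*> ring(k, nullptr);
    int slot = 0;  // next slot to overwrite; after the loop, the oldest entry
    for (Lnode* p = head; p != nullptr; p = p->next) {
        ring[slot] = p;
        slot = (slot + 1) % k;
    }
    return ring[slot];  // still nullptr if fewer than k nodes were seen
}

// Two pointers: O(1) extra memory, but nodes are visited twice in total.
Lnode* kthFromEndTwoPointers(Lnode* head, int k) {
    assert(k >= 1);
    Lnode* lead = head;
    for (int i = 0; i < k; ++i) {  // move the leading pointer k nodes ahead
        if (lead == nullptr) return nullptr;  // fewer than k nodes
        lead = lead->next;
    }
    Lnode* trail = head;
    while (lead != nullptr) {  // advance both until the leader runs off the end
        lead = lead->next;
        trail = trail->next;
    }
    return trail;
}
```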
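'n' is the user-provided value, e.g. 5 for the 5th from last; head is the first node of the list.
myNode *currNode = head;
myNode *tempNode = NULL;   //trails currNode by n nodes once the gap is large enough
int gap = 0;
while (currNode != NULL)
{
    currNode = currNode->next;
    gap = gap + 1;
    if (gap == n)
        tempNode = head;              //currNode is now n nodes past head
    else if (gap > n)
        tempNode = tempNode->next;
}
return tempNode;                      //NULL if the list has fewer than n nodes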

Check if two linked lists merge. If so, where?

This question may be old, but I couldn't think of an answer.
Say, there are two lists of different lengths, merging at a point; how do we know where the merging point is?
Conditions:
We don't know the length
We should parse each list only once.
The following is by far the best solution I have seen: O(N), no counters. I got it during an interview from a candidate, S.N., at VisionMap.
Make an iterating pointer like this: it goes forward every time until the end, and then jumps to the beginning of the opposite list, and so on.
Create two of these, pointing to two heads.
Advance each of the pointers by 1 every time, until they meet. This will happen after either one or two passes.
I still use this question in the interviews - but to see how long it takes someone to understand why this solution works.
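A C++ sketch of the pointer-switching trick, with a hypothetical Node struct. If list A has a nodes before the merge point, list B has b, and they share c nodes, then counting the step off the end as one move, both pointers reach the merge point after exactly a + c + 1 + b moves (or after a moves when a = b), so they arrive together. If the lists never merge, they reach nullptr together instead.

```cpp
#include <cassert>

struct Node {
    int value;
    Node* next;
};

Node* mergePoint(Node* headA, Node* headB) {
    Node* p = headA;
    Node* q = headB;
    while (p != q) {
        // after running off its own list, each pointer continues on the other one
        p = (p == nullptr) ? headB : p->next;
        q = (q == nullptr) ? headA : q->next;
    }
    return p;  // the merge point, or nullptr if the lists do not merge
}
```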
Pavel's answer requires modification of the lists as well as iterating each list twice.
Here's a solution that only requires iterating each list twice (the first time to calculate their length; if the length is given you only need to iterate once).
The idea is to ignore the starting entries of the longer list (merge point can't be there), so that the two pointers are an equal distance from the end of the list. Then move them forwards until they merge.
lenA = count(listA) //iterates list A
lenB = count(listB) //iterates list B
ptrA = listA
ptrB = listB
//now we adjust either ptrA or ptrB so that they are equally far from the end
while(lenA > lenB):
    ptrA = ptrA->next
    lenA--
while(lenB > lenA):
    ptrB = ptrB->next
    lenB--
while(ptrA != NULL):
    if (ptrA == ptrB):
        return ptrA //found merge point
    ptrA = ptrA->next
    ptrB = ptrB->next
return NULL //the lists do not merge
This is asymptotically the same (linear time) as my other answer but probably has smaller constants, so is probably faster. But I think my other answer is cooler.
If
by "modification is not allowed" it was meant "you may change but in the end they should be restored", and
we could iterate the lists exactly twice
the following algorithm would be the solution.
First, the numbers. Assume the first list is of length a+c and the second one is of length b+c, where c is the length of their common "tail" (after the merge point). Let's denote them as follows:
x = a+c
y = b+c
Since we don't know the length, we will calculate x and y without additional iterations; you'll see how.
Then we iterate each list and reverse it while iterating! If both iterators reach the merge point at the same time, we find it by simple comparison. Otherwise, one pointer will reach the merge point before the other one.
After that, when the other iterator reaches the merge point, it won't proceed to the common tail. Instead, it will go back to the former beginning of the list that reached the merge point first! So, before it reaches the end of the changed list (i.e. the former beginning of the other list), it will make a+b+1 iterations in total. Let's call it z+1.
The pointer that reached the merge point first will keep iterating until it reaches the end of the list. The number of iterations it made should be counted, and it is equal to x.
Then this pointer iterates back and reverses the lists again. But now it won't go back to the beginning of the list it originally started from! Instead, it will go to the beginning of the other list! The number of iterations it made should be counted, and it is equal to y.
So we know the following numbers:
x = a+c
y = b+c
z = a+b
From which we determine that
a = (+x-y+z)/2
b = (-x+y+z)/2
c = (+x+y-z)/2
Which solves the problem.
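To see why the three formulas hold, substitute the definitions of x, y, and z; a small worked example follows the derivation.

```latex
\begin{aligned}
x - y + z &= (a+c) - (b+c) + (a+b) = 2a \\
-x + y + z &= -(a+c) + (b+c) + (a+b) = 2b \\
x + y - z &= (a+c) + (b+c) - (a+b) = 2c
\end{aligned}
```

For example, with a = 2, b = 4, c = 3 we get x = 5, y = 7, z = 6, and then (5 - 7 + 6)/2 = 2, (-5 + 7 + 6)/2 = 4, (5 + 7 - 6)/2 = 3, recovering a, b, and c.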
Well, if you know that they will merge:
Say you start with:
A-->B-->C
|
V
1-->2-->3-->4-->5
1) Go through the first list setting each next pointer to NULL.
Now you have:
A   B   C
1-->2-->3   4   5
2) Now go through the second list and stop at the first node whose next pointer is NULL; that node is your merge point.
If you can't be sure that they merge you can use a sentinel value for the pointer value, but that isn't as elegant.
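A minimal Python sketch of the two steps above, assuming (as the answer does) that the lists are known to merge; if they don't, it simply returns the last node of the second list. `Node` and `merge_point_by_cutting` are hypothetical names. Note that step 1 destroys the first list.

```python
class Node:
    def __init__(self, data, next=None):
        self.data = data
        self.next = next


def merge_point_by_cutting(head_a, head_b):
    # Step 1: set every next pointer of the first list to None (destroys list A).
    node = head_a
    while node is not None:
        nxt = node.next
        node.next = None
        node = nxt
    # Step 2: nodes that belong only to list B still have their links,
    # so the first node of B whose next is None is the merge point.
    node = head_b
    while node.next is not None:
        node = node.next
    return node
```

With the picture above, the first list is A, B, C, 3, 4, 5 and the second is 1, 2, 3, so the walk over 1, 2, 3 stops at node 3.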
If we can iterate the lists exactly twice, then I can provide a method for determining the merge point:
1) iterate both lists and calculate their lengths A and B;
2) calculate the difference of the lengths, C = |A-B|;
3) start iterating both lists simultaneously, but first make C additional steps on the longer list;
4) these two pointers will meet each other at the merging point.
Here's a solution that is computationally quick (it iterates each list once) but uses O(m + n) extra memory:
for each item in list a
    push pointer to item onto stack_a
for each item in list b
    push pointer to item onto stack_b
merge = null
while (stack_a not empty and stack_b not empty and stack_a top == stack_b top) // top is the item to be popped next
    merge = pop stack_a
    pop stack_b
// merge is the first shared item (null if the lists don't merge);
// the values at the top of each stack are the items prior to the merged item
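A Python sketch of the stack approach, using plain lists as stacks. `Node` and `merge_point_with_stacks` are hypothetical names. Nodes are compared with `is` (identity), because two different nodes may hold equal data.

```python
class Node:
    def __init__(self, data, next=None):
        self.data = data
        self.next = next


def merge_point_with_stacks(head_a, head_b):
    stack_a, stack_b = [], []
    node = head_a
    while node is not None:   # push every node of list a
        stack_a.append(node)
        node = node.next
    node = head_b
    while node is not None:   # push every node of list b
        stack_b.append(node)
        node = node.next
    merge = None
    # Pop while both tops are the very same node; the last one popped is the merge point.
    while stack_a and stack_b and stack_a[-1] is stack_b[-1]:
        merge = stack_a.pop()
        stack_b.pop()
    return merge              # None if the lists share no node
```

The emptiness checks matter when the merge point is the head of one list: that stack runs out before the tops ever differ.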
You can use a set of Nodes. Iterate through one list and add each Node to the set. Then iterate through the second list and for every iteration, check if the Node exists in the set. If it does, you've found your merge point :)
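A minimal Python sketch of the set-of-nodes idea. `Node` and `merge_point_with_set` are hypothetical names. Since `Node` does not define `__eq__` or `__hash__`, Python's set hashes and compares the node objects by identity, which is exactly what is needed here.

```python
class Node:
    def __init__(self, data, next=None):
        self.data = data
        self.next = next


def merge_point_with_set(head_a, head_b):
    seen = set()
    node = head_a
    while node is not None:   # remember every node of the first list
        seen.add(node)
        node = node.next
    node = head_b
    while node is not None:   # the first node of B already seen is the merge point
        if node in seen:
            return node
        node = node.next
    return None               # the lists do not merge
```

Each list is traversed once; the cost is O(m) extra memory for the set, where m is the length of the first list.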
This arguably violates the "parse each list only once" condition, but you can implement the tortoise and hare algorithm (used to find the start and the length of the cycle in a cyclic list): start at List A, and when you reach the NULL at the end, pretend it's a pointer to the beginning of List B, thus creating the appearance of a cyclic list. The algorithm will then tell you exactly how far down List A the merge is (the variable 'mu' in the Wikipedia description).
Also, the "lambda" value tells you the length of List B, and if you want, you can work out the length of List A during the algorithm (when you redirect the NULL link).
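A Python sketch of this idea, following the usual three-phase Floyd cycle detection. `Node`, `merge_point_floyd` and the helper `step` are hypothetical names. The lists are never modified: `step` only pretends that the NULL at the end of the shared tail points to the head of List B. The sketch assumes the lists do merge; if they don't, the "cycle" is just List B and the node found is B's head, not a merge point.

```python
class Node:
    def __init__(self, data, next=None):
        self.data = data
        self.next = next


def merge_point_floyd(head_a, head_b):
    def step(node):
        # Pretend the NULL at the end of the shared tail points to head_b.
        return node.next if node.next is not None else head_b

    # Phase 1: find a meeting point inside the cycle.
    tortoise = step(head_a)
    hare = step(step(head_a))
    while tortoise is not hare:
        tortoise = step(tortoise)
        hare = step(step(hare))
    # Phase 2: mu = distance from head_a to the cycle start (the merge node).
    mu = 0
    tortoise = head_a
    while tortoise is not hare:
        tortoise = step(tortoise)
        hare = step(hare)
        mu += 1
    # Phase 3: lambda = cycle length, which equals the length of List B.
    lam = 1
    hare = step(tortoise)
    while tortoise is not hare:
        hare = step(hare)
        lam += 1
    return tortoise, mu, lam
```

For a first list 1, 2, 3 followed by a shared tail 7, 8 and a second list 4, 5 joining that tail, this yields the node 7 with mu = 3 and lambda = 4.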
Maybe I am oversimplifying this, but why not simply iterate the smallest list and use the last node's Link as the merging point?
So, where Data->Link->Link == NULL is the end point, Data->Link would be the merging point (at the end of the list).
EDIT:
Okay, from the picture you posted: you parse the two lists, the smaller one first. While parsing the smaller list, you keep the references to its nodes. Then, when you parse the second list, you compare each node's reference against the stored ones, i.e. you look for the point where Reference[i] is the reference at LinkedList[i]->Link. This gives the merge point. Time to explain with pictures (superimpose the values on the OP's picture).
You have a linked list (references shown below):
A->B->C->D->E
You have a second linked list:
1->2->
With the merged list, the references would then go as follows:
1->2->D->E->
Therefore, you map the first, "smaller" list (the merged list, which is the one we are counting, has a length of 4, and the main list has a length of 5).
Loop through the first list and keep a list of its references.
The list will contain the following references: Pointers { 1, 2, D, E }.
We now go through the second list:
-> A - Contains reference in Pointers? No, move on
-> B - Contains reference in Pointers? No, move on
-> C - Contains reference in Pointers? No, move on
-> D - Contains reference in Pointers? Yes, merge point found, break.
Sure, you maintain a new list of pointers, but that's not outside the specification. The first list is parsed exactly once, and the second list is only fully parsed if there is no merge point; otherwise, parsing ends sooner (at the merge point).
I tested a merge case on my FC9 x86_64 machine and printed every node address, as shown below:
Head A 0x7fffb2f3c4b0
0x214f010
0x214f030
0x214f050
0x214f070
0x214f090
0x214f0f0
0x214f110
0x214f130
0x214f150
0x214f170
Head B 0x7fffb2f3c4a0
0x214f0b0
0x214f0d0
0x214f0f0
0x214f110
0x214f130
0x214f150
0x214f170
Note that because I had aligned the node structure, every address returned by malloc() for a node is 16-byte aligned; look at the lowest 4 bits.
They are all 0s, i.e., 0x0 or 0000b.
So if you are in the same special case (aligned node addresses), you can use these lowest 4 bits.
For example, when traversing both lists from head to tail, set 1 or 2 of those 4 bits in the address of each visited node, that is, set a flag:
next_node = node->next;
node = (struct node*)((unsigned long)node | 0x1UL);
Note that these flags won't affect the real node address, only your SAVED node pointer value.
Once you find a node on which the flag bit(s) have already been set, that first such node is the merge point.
When done, restore the node addresses by clearing the flag bits you set. Importantly, be careful when iterating (e.g. node = node->next): remember that you have set flag bits, so clear them first, like this:
real_node = (struct node*)((unsigned long)node & ~0x1UL);
real_node = real_node->next;
node = real_node;
Because this proposal will restore the modified node addresses, it could be considered as "no modification".
There is a simple solution, but it requires auxiliary space. The idea is to traverse one list and store each node address in a hash map, then traverse the other list and check whether each address is in the hash map. Each list is traversed only once. There's no modification to any list. The lengths are still unknown. Auxiliary space used: O(n), where n is the length of the first list traversed.
This solution iterates each list only once and requires no modification of the lists, though you may complain about space (and the linear search makes it O(m·n) time):
1) Iterate list1 and store the address of each node in an array (an array of node pointers; an unsigned int cannot hold a 64-bit address).
2) Then iterate list2 and, for each node's address, search the array for a match; if you find one, that node is the merging node.
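// C version; the node layout below is an assumption
#include <stddef.h>

struct node { int data; struct node *next; };

/* returns 1 if p is one of the len addresses stored in addr[] */
int search(struct node *addr[], size_t len, const struct node *p) {
    for (size_t i = 0; i < len; i++) {
        if (addr[i] == p)
            return 1;
    }
    return 0;
}

/* addr must have room for every node of list1 */
struct node *find_merge_node(struct node *list1, struct node *list2, struct node *addr[]) {
    size_t len = 0;
    //for the first list: store each node's address (p1 itself, not &p1)
    for (struct node *p1 = list1; p1 != NULL; p1 = p1->next)
        addr[len++] = p1;
    //for the second list: look each node up in the array
    for (struct node *p2 = list2; p2 != NULL; p2 = p2->next) {
        if (search(addr, len, p2) == 1)   //match found
            return p2;                    //this is the merging node
    }
    return NULL;                          //the lists do not merge
}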
Hope it is a valid solution...
There is no need to modify any list, and there is a solution in which we only have to traverse each list once.
Create two stacks, let's say stck1 and stck2.
Traverse the 1st list and push a reference to each node you traverse onto stck1 (references, not copies of the nodes: copies would never compare as the same node).
Do the same for the 2nd list, pushing references onto stck2.
Now pop from both stacks and check whether the two popped entries are the same node; if yes, keep a reference to it. If not, the previously popped common node is the merge point we were looking for.
int FindMergeNode(Node headA, Node headB) {
    Node currentA = headA;
    Node currentB = headB;
    // Do till the two nodes are the same
    // (assumes the lists do merge; otherwise this loops forever)
    while (currentA != currentB) {
        // If you reached the end of one list, start at the beginning of the other one
        // currentA: after list A, continue with list B
        if (currentA.next == null) {
            currentA = headB;
        } else {
            currentA = currentA.next;
        }
        // currentB: after list B, continue with list A
        if (currentB.next == null) {
            currentB = headA;
        } else {
            currentB = currentB.next;
        }
    }
    return currentB.data;
}
We can use two pointers and move them so that whenever one of them becomes null, we point it to the head of the other list (and likewise for the other pointer). This way, if the list lengths are different, they will meet during the second pass.
If list1 has length n and list2 has length m, their difference is d = abs(n-m). Switching lists lets the pointer that started on the shorter list make up exactly this distance, so both pointers reach the merge point after the same number of steps.
Code:
int findMergeNode(SinglyLinkedListNode* head1, SinglyLinkedListNode* head2) {
    SinglyLinkedListNode* start1 = head1;
    SinglyLinkedListNode* start2 = head2;
    while (start1 != start2) {
        start1 = start1->next;
        start2 = start2->next;
        if (!start1)
            start1 = head2;
        if (!start2)
            start2 = head1;
    }
    return start1->data;
}
Here is a naive solution; there is no need to traverse the whole lists.
If your node structure has three fields, like
struct node {
    int data;
    int flag; //initially set the flag to zero for all nodes
    struct node *next;
};
say you have two heads (head1 and head2) pointing to the heads of the two lists.
Traverse both lists at the same pace and set flag = 1 (the visited flag) on each node you visit, checking:
if (node->next != NULL && node->next->flag == 1) //probably the longer list will hit this case
    //node->next is your required node.
How about this:
If you are only allowed to traverse each list only once, you can create a new node, traverse the first list to have every node point to this new node, and traverse the second list to see if any node is pointing to your new node (that's your merge point). If the second traversal doesn't lead to your new node then the original lists don't have a merge point.
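A minimal Python sketch of the new-node idea. `Node`, `merge_point_with_sentinel` and `sentinel` are hypothetical names. Like the answer, it destroys the first list: every node of list A ends up pointing at the sentinel.

```python
class Node:
    def __init__(self, data, next=None):
        self.data = data
        self.next = next


def merge_point_with_sentinel(head_a, head_b):
    sentinel = Node(None)          # the extra "new node"
    node = head_a
    while node is not None:        # make every node of list A point to the sentinel
        nxt = node.next
        node.next = sentinel
        node = nxt
    node = head_b
    while node is not None:
        if node.next is sentinel:  # first node of B that was part of list A
            return node
        node = node.next
    return None                    # B never touched list A: no merge point
```

Unlike setting the links to NULL, the sentinel distinguishes "former A node" from "natural end of list B", so the no-merge case is detected.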
If you are allowed to traverse the lists more than once, then you can traverse each list to find out their lengths and, if they are different, skip the "extra" nodes at the beginning of the longer list. Then just traverse both lists one step at a time and find the first merging node.
Steps in Java:
Create a map.
Traverse both lists and put every traversed node into the map, using something unique to each node (say, a node ID) as the key, with an initial value of 1.
Whenever a key that is already present comes up again, increment its value (it becomes 2, which is > 1).
The key whose value is greater than 1 is the node where the two lists merge.
We can solve it efficiently by introducing an "isVisited" field. Traverse the first list and set "isVisited" to true for every node until the end. Then traverse the second list and find the first node whose flag is true; boom, that's your merging point.
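A Python sketch of the visited-flag approach. `Node`, its `is_visited` attribute and `merge_point_with_flags` are hypothetical names; adding such a field means changing the node type, and the flags are left set afterwards.

```python
class Node:
    def __init__(self, data, next=None):
        self.data = data
        self.next = next
        self.is_visited = False   # the extra field this approach requires


def merge_point_with_flags(head_a, head_b):
    node = head_a
    while node is not None:       # mark every node of the first list
        node.is_visited = True
        node = node.next
    node = head_b
    while node is not None:       # the first marked node of B is the merge point
        if node.is_visited:
            return node
        node = node.next
    return None                   # no merge point
```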
Step 1: Find the length of both lists.
Step 2: Find the difference and advance the longer list by that difference.
Step 3: Now both lists are the same distance from their end.
Step 4: Iterate through both lists in lockstep to find the merge point.
# Python (assumes each node has a .next attribute)
def length(head):
    n = 0
    while head is not None:
        n += 1
        head = head.next
    return n

def findmergepoint(list1, list2):
    len1, len2 = length(list1), length(list2)
    lendiff = abs(len1 - len2)
    # biggerlist is the longer list, smallerlist the shorter one
    biggerlist, smallerlist = (list1, list2) if len1 >= len2 else (list2, list1)
    # advance the longer list by the difference to level both lists
    for _ in range(lendiff):
        biggerlist = biggerlist.next
    # walk both lists in lockstep
    while biggerlist is not None and smallerlist is not None:
        if biggerlist is smallerlist:
            return biggerlist  # point of intersection
        biggerlist = biggerlist.next
        smallerlist = smallerlist.next
    return None  # no intersection found
int FindMergeNode(Node *headA, Node *headB)
{
    Node *tempB = headB;   // no need to allocate a new Node (that would leak)
    while (headA->next != NULL)
    {
        while (tempB->next != NULL)
        {
            if (tempB == headA)
                return tempB->data;
            tempB = tempB->next;
        }
        headA = headA->next;
        tempB = headB;
    }
    // Only A's last node is left; if the lists merge, they share it
    return headA->data;
}
Use a Map or Dictionary to store node addresses vs. node values. If an address already exists in the Map/Dictionary, then the value for that key is the answer.
I did this:
import java.util.HashMap;
import java.util.Map;

int FindMergeNode(Node headA, Node headB) {
    // Keys are the nodes themselves; assuming Node does not override
    // equals()/hashCode(), lookups compare node identity, not data
    Map<Node, Integer> map = new HashMap<Node, Integer>();
    while (headA != null || headB != null)
    {
        if (headA != null)
        {
            if (map.containsKey(headA))
            {
                return map.get(headA);   // list B already passed this node
            }
            map.put(headA, headA.data);
            headA = headA.next;
        }
        if (headB != null)
        {
            if (map.containsKey(headB))
            {
                return map.get(headB);   // list A already passed this node
            }
            map.put(headB, headB.data);
            headB = headB.next;
        }
    }
    return 0;   // no merge point found
}
An O(n) solution, but based on an assumption.
Assumption: both lists contain only positive integers.
Logic: make all the integers of list1 negative. Then walk through list2 until you find a negative integer. Once found, take it, change the sign back to positive and return it.
static int findMergeNode(SinglyLinkedListNode head1, SinglyLinkedListNode head2) {
    SinglyLinkedListNode current = head1; //head1 is given to be not null.
    //mark all head1 nodes as negative
    while (true) {
        current.data = -current.data;
        current = current.next;
        if (current == null) break;
    }
    current = head2; //given as not null
    while (true) {
        if (current.data < 0) return -current.data;
        current = current.next;
    }
}
You can add the nodes of list1 to a HashSet and then loop through the second list, checking whether each node of list2 is already present in the set. If it is, that's the merge node.
import java.util.HashSet;

static int findMergeNode(SinglyLinkedListNode head1, SinglyLinkedListNode head2) {
    HashSet<SinglyLinkedListNode> set = new HashSet<SinglyLinkedListNode>();
    while (head1 != null)
    {
        set.add(head1);
        head1 = head1.next;
    }
    while (head2 != null) {
        if (set.contains(head2)) {
            return head2.data;
        }
        head2 = head2.next;   // advance, otherwise the loop never ends
    }
    return -1;
}
Solution using JavaScript:
var getIntersectionNode = function(headA, headB) {
    if (headA == null || headB == null) return null;
    let countA = listCount(headA);
    let countB = listCount(headB);
    let diff = 0;
    if (countA > countB) {
        diff = countA - countB;
        for (let i = 0; i < diff; i++) {
            headA = headA.next;
        }
    } else if (countA < countB) {
        diff = countB - countA;
        for (let i = 0; i < diff; i++) {
            headB = headB.next;
        }
    }
    return getIntersectValue(headA, headB);
};

function listCount(head) {
    let count = 0;
    while (head) {
        count++;
        head = head.next;
    }
    return count;
}

function getIntersectValue(headA, headB) {
    while (headA && headB) {
        if (headA === headB) {
            return headA;
        }
        headA = headA.next;
        headB = headB.next;
    }
    return null;
}
If editing the linked list is allowed:
Just set the next pointers of all the nodes of list 2 to null.
Then find the data value of the last node of list 1 (the first node of list 1 whose next pointer is now null).
This gives you the intersecting node in a single traversal of each list, with "no hi-fi logic".
Follow this simple logic to solve the problem:
Since pointers A and B travel at the same speed, they can only meet at the same node if they have covered the same distance. We achieve this by adding the length of one list to the other: each pointer traverses its own list and then continues with the other list.
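To make the "same distance" argument concrete, write the first list as a + c nodes and the second as b + c nodes, where c is the length of the shared tail (so the merge node is the (a+1)-th node of the first list and the (b+1)-th node of the second). Counting the steps each pointer takes to reach the merge node when it switches to the other list's head after its own list ends:

```latex
\begin{aligned}
\text{pointer A: } & (a + c) + b \\
\text{pointer B: } & (b + c) + a
\end{aligned}
```

Both totals equal a + b + c, so after that many steps both pointers stand on the merge node at the same time.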
