Algorithm to match roots between two string lists


The problem:
I am using a watch service to monitor a directory for input so I can fire an event once I have two (semi-)matching input files. The problem is this: given two lists of strings that may differ, how can I find matching roots between the lists as they occur?
The filename structure looks like this:
<companyname>-<ordernum><postfix>.csv
so for example:
list1 could contain:
mycomp-1234.csv
mycomp-4567.csv
newcomp-7891.csv
oldcomp-3376.csv
list2 could contain:
mycomp-2232_items.csv
newcomp-13123_items.csv
oldcomp-87078777_items.csv
mycomp-1234_items.csv
I want to find, and fire the event as soon as a match occurs between lists. A match being any filename, less the suffix. i.e. mycomp-1234 would return a match for both lists.
What I'm looking for
I'm looking to find the most efficient manner to do this. I know I can iterate over each list comparing values, but I am sure there is a more efficient way to do this.
I do not need code, I'd rather learn this by myself, so a push in the right direction is perfect. If your fingers make you write code, please write pseudo code so it can benefit as many languages as possible.
And no, this is not homework. For those of you intensely curious folk this is to perform EDI transformations from csv to X12 EDI files.

Sort the lists alphabetically then compare the values and step forward in the list that has the smaller value. If the lists have any elements in common the values will match.

A side-by-side comparison of two sorted lists.
// Assumes the full-string sort order agrees with the (company, order number)
// order compared below.
Collections.sort(list1);
Collections.sort(list2);
int i1 = 0;
int i2 = 0;
while (i1 < list1.size() && i2 < list2.size()) {
    String name1 = list1.get(i1);
    String name2 = list2.get(i2);
    // "mycomp-1234_items.csv" -> ["mycomp", "1234", "items", "csv"]
    String[] parts1 = name1.split("[-_.]");
    String[] parts2 = name2.split("[-_.]");
    if (parts1.length < 3) { // not a well-formed name: skip it
        ++i1;
        continue;
    }
    if (parts2.length < 3) {
        ++i2;
        continue;
    }
    // Compare company names first, then order numbers.
    int cmp = parts1[0].compareTo(parts2[0]);
    if (cmp == 0) {
        cmp = parts1[1].compareTo(parts2[1]);
    }
    if (cmp < 0) {
        ++i1;
        continue;
    }
    if (cmp > 0) {
        ++i2;
        continue;
    }
    // Found match:
    ...
    ++i1;
    ++i2;
}

An online method: Maintain a binary search tree containing all the current filenames. Use as keys the relevant bits of filenames. For example, the key for either newcomp-7891.csv or newcomp-7891_items.csv is newcomp-7891. Each time the watch service reports a directory event, you can delete disused names and can attempt to add new names to the tree. If a key already is in the tree, fire your desired event.
A hash table can be used similarly, if the hash implementation supports deletion of keys when filenames are removed.
The question asks for “the most efficient manner to do this”. Note that this method is far more efficient than sorting the lists from scratch each time a directory event occurs. At an event with k additions and deletions, it uses O(k·lg n) time if the dataset has n entries, so over a period of time where the average tree size is n and m additions/deletions occur, in u directory events, it will do O(m·lg n) work. By contrast, the sort-each-time methods suggested in other answers will do O(u·n·lg n) work, which is much more.
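The keyed-lookup idea above could be sketched with a hash table roughly as follows. This is only an illustration: the `RootMatcher` class, the `root_of` helper, and the filename pattern are assumptions made for the example, not part of the question or of any watch-service API.

```python
import re

# Hypothetical pattern: "<company>-<ordernum>", an optional "_postfix",
# then the ".csv" extension, as in the question's examples.
_NAME = re.compile(r"^(?P<root>[^-]+-\d+)(?:_[^.]*)?\.csv$")


def root_of(filename):
    """Return the shared root (e.g. 'mycomp-1234'), or None if the name does not fit."""
    m = _NAME.match(filename)
    return m.group("root") if m else None


class RootMatcher:
    """Tracks the files currently present and reports roots seen under two names."""

    def __init__(self):
        self._files_by_root = {}  # root -> set of filenames currently present

    def add(self, filename):
        """Record a new file; return the matching pair once the root has two files."""
        root = root_of(filename)
        if root is None:
            return None
        names = self._files_by_root.setdefault(root, set())
        names.add(filename)
        return tuple(sorted(names)) if len(names) == 2 else None

    def remove(self, filename):
        """Forget a file that was deleted from the directory."""
        names = self._files_by_root.get(root_of(filename))
        if names is not None:
            names.discard(filename)
            if not names:
                del self._files_by_root[root_of(filename)]
```

Each directory event then costs one lookup per added or removed name, and the caller fires its event whenever `add` returns a pair.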

Related

abstract inplace mergesort for effective merge sort

I am reading about merge sort in Algorithms in C++ by Robert Sedgewick and have the following questions.
static void mergeAB(ITEM[] c, int cl, ITEM[] a, int al, int ar, ITEM[] b, int bl, int br )
{
int i = al, j = bl;
for (int k = cl; k < cl+ar-al+br-bl+1; k++)
{
if (i > ar) { c[k] = b[j++]; continue; }
if (j > br) { c[k] = a[i++]; continue; }
c[k] = less(a[i], b[j]) ? a[i++] : b[j++];
}
}
The characteristic of the basic merge that is worthy of note is that
the inner loop includes two tests to determine whether the ends of the
two input arrays have been reached. Of course, these two tests usually
fail, and the situation thus cries out for the use of sentinel keys to
allow the tests to be removed. That is, if elements with a key value
larger than those of all the other keys are added to the ends of the a
and aux arrays, the tests can be removed, because when the a (b) array
is exhausted, the sentinel causes the next elements for the c array to
be taken from the b (a) array until the merge is complete.
However, it is not always easy to use sentinels, either because it
might not be easy to know the largest key value or because space might
not be available conveniently.
For merging, there is a simple remedy. The method is based on the
following idea: Given that we are resigned to copying the arrays to
implement the in-place abstraction, we simply put the second array in
reverse order when it is copied (at no extra cost), so that its
associated index moves from right to left. This arrangement leads to
the largest element—in whichever array it is—serving as sentinel for
the other array.
My questions on the above text:
What does the statement "when the a (b) array is exhausted" mean? What is 'a (b)' here?
Why does the author mention that it is not easy to determine the largest key, and how is space related to determining the largest key?
What does the author mean by "Given that we are resigned to copying the arrays"? What does "resigned" mean in this context?
Could someone give a simple example to help me understand the idea described as a simple remedy?

"When the a (b) array is exhausted" is a shorthand for "When either the a array or the b array is exhausted".
The interface is dealing with sub-arrays of a bigger array, so you can't simply go writing beyond the ends of the arrays.
The code copies the data from two arrays into one other array. Since this copy is inevitable, we are 'resigned to copying the arrays' means we reluctantly accept that it is inevitable that the arrays must be copied.
Tricky...that's going to take some time to work out what is meant.
Tangentially: that's probably not the way I'd write the loop. I'd be inclined to use:
int i = al, j = bl, k;  // k is declared outside the loop because the trailing loops use it
for (k = cl; i <= ar && j <= br; k++)
{
    if (a[i] < b[j])
        c[k] = a[i++];
    else
        c[k] = b[j++];
}
while (i <= ar)
    c[k++] = a[i++];
while (j <= br)
    c[k++] = b[j++];
One of the two trailing loops does nothing. The revised main merge loop has 3 tests per iteration versus 4 tests per iteration for the original algorithm. I've not formally measured it, but the simpler merge loop is likely to be quicker than the original single-loop algorithm.
The first three questions are almost best suited for English Language Learners.

a(b) and b(a)
Sometimes parentheses are used to express two similar phrases at once:
when a (b) is exhausted we copy elements from b (a)
means:
when a is exhausted we copy elements from b,
when b is exhausted we copy elements from a
What is difficult about sentinels
Two annoying things about sentinels are
sometimes your array data may potentially contain every possible value, so there is no value you can use as a sentinel that is guaranteed to be bigger than all the values in the array
using a sentinel, instead of checking the index to see if you are done with an array, requires that you have room for one extra element in the array to store the sentinel
Resigning
We programmers are never happy to copy (or move) things around; leaving them where they already are is better when possible (because we are lazy).
In this version of merge sort we have already given up on trying not to copy things around... we are resigned to it.
Given that we must copy, we can copy things in the opposite order if we like (and of course use the copy in the opposite order) because that is free(*).
(*) It is free at this level of abstraction; the cost on a real CPU may be high. As almost always in the performance area, YMMV.
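Since the question asked for a simple example of the remedy, here is a minimal sketch of the reversed-copy idea. The function name and the use of a combined auxiliary list are choices made for this illustration, not Sedgewick's code.

```python
def merge_with_reversed_half(a, b):
    """Merge two sorted lists without end-of-array tests in the loop.

    b is copied in reverse, so the largest element overall, whichever
    input it came from, stops the opposite index from running past it.
    """
    aux = list(a) + list(reversed(b))  # ascending run, then descending run
    i, j = 0, len(aux) - 1             # i moves right, j moves left
    out = []
    for _ in range(len(aux)):
        if aux[j] < aux[i]:
            out.append(aux[j])
            j -= 1
        else:
            out.append(aux[i])
            i += 1
    return out
```

The loop body has a single comparison: when one run is used up, its index simply walks into the other run, where the largest element blocks it until the merge is complete.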

Algorithm to find duplicate in an array

I have an assignment to create an algorithm to find duplicates in an array of numeric values, but it does not say which kind of numbers: integers or floats. I have written the following pseudocode:
FindingDuplicateAlgorithm(A) // A is the array
mergeSort(A);
for int i <- 0 to i<A.length
if A[i] == A[i+1]
i++
return A[i]
else
i++
Have I created an efficient algorithm?
I think there is a problem in my algorithm: it returns duplicate numbers several times. For example, if the array includes 2 at two indexes, I will have ...2, 2,... in the output. How can I change it to return each duplicate only once?
I think it is a good algorithm for integers, but does it work well for float numbers too?

To handle duplicates, you can do the following:
if A[i] == A[i+1]:
result.append(A[i]) # collect found duplicates in a list
while A[i] == A[i+1]: # skip the entire range of duplicates
i++ # until a new value is found
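Put together with the sort and a bounds check, the skip-the-run idea might look like the sketch below. The function name is made up for the example, and Python's built-in sort stands in for merge sort.

```python
def duplicates_sorted(values):
    """Return each value that occurs more than once, reported a single time."""
    a = sorted(values)          # any O(n log n) sort, e.g. merge sort
    result = []
    i = 0
    while i < len(a) - 1:       # stop one short so a[i + 1] stays in range
        if a[i] == a[i + 1]:
            result.append(a[i])
            # skip the rest of this run of equal values
            while i < len(a) - 1 and a[i] == a[i + 1]:
                i += 1
        i += 1
    return result
```

For floating-point input the `==` comparison is exact, so values that differ only by rounding error are not treated as duplicates.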

Do you want to find Duplicates in Java?
You may use a HashSet.
Set<Object> seen = new HashSet<>();
for (Object a : A) {
    // add() returns false when the element was already in the set
    boolean duplicate = !seen.add(a);
    if (duplicate) {
        // do something with a
    }
}
The return-Value of add() is defined as:
true if the set did not already
contain the specified element.
EDIT:
I know HashSet is optimized for insert and contains operations, but I'm not sure if it's fast enough for your needs.
EDIT2:
I've seen you recently added the homework tag. I would not recommend my answer if it's homework, because it may be too "high-level" for an algorithms lesson.
http://download.oracle.com/javase/1.4.2/docs/api/java/util/HashSet.html#add%28java.lang.Object%29

Your answer seems pretty good. First sorting and then simply checking neighboring values gives you O(n log(n)) complexity, which is quite efficient.
Merge sort is O(n log(n)) while checking neighboring values is simply O(n).
One thing though (as mentioned in one of the comments): you are going to run past the end of the array with your pseudocode (not quite a stack overflow, lol). The loop should be (in Java):
for (int i = 0; i < array.length - 1; i++) {
...
}
Then also, if you actually want to display which numbers (and or indexes) are the duplicates, you will need to store them in a separate list.

I'm not sure what language you need to write the algorithm in, but there are some really good C++ solutions in response to my question here. Should be of use to you.

O(n) algorithm: traverse the array and try to insert each element into a hashtable/set with the number as the hash key. If you cannot insert it, then that's a duplicate.
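A minimal sketch of that hash-set pass, assuming the values are hashable; the second set (`reported`) is an addition for this example so that each duplicate is listed only once.

```python
def duplicates_hashed(values):
    """Return the values that occur more than once, in order of first repeat."""
    seen = set()
    reported = set()
    result = []
    for v in values:
        if v in seen and v not in reported:
            reported.add(v)
            result.append(v)
        seen.add(v)
    return result
```

This runs in expected linear time but, unlike the sort-based approach, needs extra memory proportional to the number of distinct values.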

Your algorithm contains a buffer overrun. i starts with 0, so I assume the indexes into array A are zero-based, i.e. the first element is A[0], the last is A[A.length-1]. Now i counts up to A.length-1, and in the loop body accesses A[i+1], which is out of the array for the last iteration. Or, simply put: If you're comparing each element with the next element, you can only do length-1 comparisons.
If you only want to report duplicates once, I'd use a bool variable firstDuplicate, that's set to false when you find a duplicate and true when the number is different from the next. Then you'd only report the first duplicate by only reporting the duplicate numbers if firstDuplicate is true.

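// Assumes every value v satisfies 0 < v < inputArray.length, so each value can be
// used as an index. A value is marked as "seen" by negating the entry at that
// index, which modifies the input array.
public void printDuplicates(int[] inputArray) {
    if (inputArray == null) {
        throw new IllegalArgumentException("Input array can not be null");
    }
    int length = inputArray.length;
    if (length < 2) {
        return; // fewer than two elements: nothing can be duplicated
    }
    for (int i = 0; i < length; i++) {
        int index = Math.abs(inputArray[i]);
        if (inputArray[index] >= 0) {
            inputArray[index] = -inputArray[index]; // first sighting of this value
        } else {
            System.out.print(index + " "); // seen before: duplicate
        }
    }
}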

An efficient way to find matching items in N lists?

Given a number of lists of items, find the lists with matching items.
The brute force pseudo-code for this problem looks like:
foreach list L
    foreach item I in list L
        foreach list L2 such that L2 != L
            foreach item I2 in L2
                if I == I2
                    return new 3-tuple(L, L2, I) //not important for the algorithm
I can think of a number of different ways of going about this - creating a list of lists and removing each candidate list after searching the others for example - but I'm wondering if there is a better algorithm for this?
I'm using Java, if that makes a difference to your implementation.
Thanks

Create a Map<Item,List<List>>.
Iterate through every item in every list.
each time you touch an item, add the current list to that item's entry in the Map.
You now have a Map entry for each item that tells you what lists that item appears in.
This algorithm is about O(N), where N is the total number of items across all lists (the exact complexity will be affected by how good your Map implementation is). I believe your algorithm was at least O(N^2).
Caveat: I am comparing number of comparisons, not memory use. If your lists are super huge and full of mostly non duplicated items, the map that my method creates might become too big.
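The map-building steps above can be sketched as follows. Lists are identified by their position in the input, and the function names are made up for the example.

```python
from collections import defaultdict


def lists_by_item(lists):
    """Map each item to the indices of the lists that contain it."""
    index = defaultdict(set)
    for list_id, items in enumerate(lists):
        for item in items:
            index[item].add(list_id)
    return index


def shared_items(lists):
    """Return {item: sorted list indices} for items found in two or more lists."""
    return {item: sorted(ids)
            for item, ids in lists_by_item(lists).items()
            if len(ids) > 1}
```

Every item is touched once, so the work is proportional to the total number of items, assuming the items are hashable.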

As per your comment you want a MultiMap implementation. A multimap is like a Map but it can map each key to multiple values. Store each value together with references to all the lists that contain that value.
Map<Object, List>
Of course you should use a type-safe key type instead of Object and a type-safe List as the value. What you are trying to build is called an inverted index.

I'll start with the assumption that the datasets can fit in memory. If not, then you will need something fancier.
I refer below to a "set", where I am thinking of something like a C++ std::set. I don't know the Java equivalent, but any storage scheme that permits rapid lookup (tree, hash table, whatever).
Comparing three lists: L0, L1 and L2.
Read L0, placing each element in a set: S0.
Read L1, placing items that match an element of S0 into a new set: S1, and discarding others.
Discard S0.
Read L2, keeping items that match an element of S1 and discarding others.
Update
Just realised that the question was for "n" lists, not three. However the extension should be obvious. (I hope)
Update 2
Some untested C++ code to illustrate the algorithm
#include <string>
#include <vector>
#include <set>
#include <cassert>
typedef std::vector<std::string> strlist_t;
strlist_t GetMatches(std::vector<strlist_t> vLists)
{
assert(vLists.size() > 1);
std::set<std::string> s0, s1;
std::set<std::string> *pOld = &s1;
std::set<std::string> *pNew = &s0;
// unconditionally load first list as "new"
s0.insert(vLists[0].begin(), vLists[0].end());
for (size_t i=1; i<vLists.size(); ++i)
{
//swap recently read "new" to "old" now for comparison with new list
std::swap(pOld, pNew);
pNew->clear();
// only keep new elements if they are matched in old list
for (size_t j=0; j<vLists[i].size(); ++j)
{
if (pOld->end() != pOld->find(vLists[i][j]))
{
// found match
pNew->insert(vLists[i][j]);
}
}
}
return strlist_t(pNew->begin(), pNew->end());
}

You can use a trie, modified to record what lists each node belongs to.

C/C++/Java/C#: help parsing numbers

I've got a real problem (it's not homework, you can check my profile). I need to parse data whose formatting is not under my control.
The data look like this:
6,852:6,100,752
So there's first a number made of up to 9 digits, followed by a colon.
Then I know for sure that, after the colon:
there's at least one valid combination of numbers that add up to the number before the colon
I know exactly how many numbers add up to the number before the colon (two in this case, but it can go as high as ten numbers)
In this case, 6852 is 6100 + 752.
My problem: I need to find these numbers (in this example, 6100 + 752).
It is unfortunate that in the data I'm forced to parse, the separator between the numbers (the comma) is also the separator used inside the numbers themselves (6100 is written as 6,100).
Once again: that unfortunate formatting is not under my control and, once again, this is not homework.
I need to solve this for up to 10 numbers that need to add up.
Here's an example with three numbers adding up to 6855:
6,855:360,6,175,320
I fear that there are cases where there would be two possible different solutions. HOWEVER if I get a solution that works "in most cases" it would be enough.
How do you typically solve such a problem in a C-style bracket language?

Well, I would start with the brute force approach and then apply some heuristics to prune the search space. Just split the list on the right by commas and iterate over all possible ways to group them into n terms (where n is the number of terms in the solution). You can use the following two rules to skip over invalid possibilities.
(1) You know that any group of 1 or 2 digits must begin a term.
(2) You know that no candidate term in your comma delimited list can be greater than the total on the left. (This also tells you the maximum number of digit groups that any candidate term can have.)
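A rough sketch of that brute-force search with both pruning rules, written in Python for brevity. The function name `split_terms` and its return format are assumptions for the example; rule (2) is applied against what is left of the total, which is a slightly stronger form of the same check.

```python
def split_terms(line, count):
    """Return every way to read the right-hand side as `count` numbers summing to the total."""
    left, right = line.split(":")
    total = int(left.replace(",", ""))
    groups = right.split(",")
    solutions = []

    def search(pos, terms, remaining):
        if len(terms) == count:
            if pos == len(groups) and remaining == 0:
                solutions.append(list(terms))
            return
        text = ""
        for end in range(pos, len(groups)):
            # Rule 1: only the first group of a term may have fewer than 3 digits.
            if end > pos and len(groups[end]) != 3:
                break
            text += groups[end]
            value = int(text)
            # Rule 2: no term may exceed what is left of the total.
            if value > remaining:
                break
            terms.append(value)
            search(end + 1, terms, remaining - value)
            terms.pop()

    search(0, [], total)
    return solutions
```

Returning all solutions rather than the first makes the ambiguous inputs visible: a result list with more than one entry means the line can be read in several valid ways.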

Recursive implementation (pseudo code):
int total; // The total read before the colon
// Takes the list of tokens as integers after the colon
// tokens is the set of tokens left to analyse,
// partialList is the partial list of numbers built so far
// sum is the sum of numbers in partialList
// Aggregate takes 2 ints XXX and YYY and returns XXX,YYY (= XXX*1000+YYY)
function getNumbers(tokens, sum, partialList) =
    if isEmpty(tokens)
        if sum = total return partialList
        else return null // Got to the end with the wrong sum
    var result1 = getNumbers(tokens[1:end], sum+tokens[0], Append(partialList, tokens[0]))
    if result1 <> null return result1
    if length(tokens) < 2 return null // Nothing left to aggregate with
    var result2 = getNumbers(tokens[2:end], sum+Aggregate(tokens[0], tokens[1]), Append(partialList, Aggregate(tokens[0], tokens[1])))
    if result2 <> null return result2
    return null // No solution overall
You can do a lot better from different points of view, like tail recursion, pruning (you can have XXX,YYY only if YYY has 3 digits)... but this may work well enough for your app.
Divide-and-conquer would make for a nice improvement.

I think you should try all possible ways to parse the string and calculate the sum and return a list of those results that give the correct sum. This should be only one result in most cases unless you are very unlucky.
One thing to note that reduces the number of possibilities is that there is only an ambiguity if you have aa,bbb and bbb is exactly 3 digits. If you have aa,bb there is only one way to parse it.

Reading in C++:
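#include <iostream>
#include <sstream>
#include <utility>
#include <vector>

// Returns the total before the colon and the digit groups after it.
std::pair<int, std::vector<int> > read_numbers(std::istream& is)
{
    std::pair<int, std::vector<int> > result;
    int group;
    char ch;
    // The total also uses ',' as a thousands separator, so accumulate
    // its digit groups until the ':' is reached.
    result.first = 0;
    for (;;) {
        if (!(is >> group)) throw "foo!";
        result.first = result.first * 1000 + group;
        if (!(is >> ch)) throw "foo!";
        if (ch == ':') break;
        if (ch != ',') throw "foo!";
    }
    // After the colon: one integer per comma-separated digit group.
    for (;;) {
        if (!(is >> group)) throw "bar!";
        result.second.push_back(group);
        if (!(is >> ch)) return result; // end of input
        if (ch != ',') throw "foobar!";
    }
}

// Left as an exercise (see below); declared here so that f() compiles.
std::vector<int> get_winning_combination(int total,
                                         std::vector<int>::const_iterator first,
                                         std::vector<int>::const_iterator last);

void f()
{
    std::istringstream iss("6,852:6,100,752");
    std::pair<int, std::vector<int> > foo = read_numbers(iss);
    std::vector<int> result = get_winning_combination(foo.first,
                                                      foo.second.begin(),
                                                      foo.second.end());
    for (std::vector<int>::const_iterator i = result.begin(); i != result.end(); ++i)
        std::cout << *i << " ";
}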
The actual cracking of the numbers is left as an exercise to the reader. :)

I think your main problem is deciding how to actually parse the numbers. The rest is just rote work with strings->numbers and iteration over combinations.
For instance, in the examples you gave, you could heuristically decide that a single-digit number followed by a three-digit number is, in fact, a four-digit number. Does a heuristic such as this hold true over a larger dataset? If not, you're also likely to have to iterate over the possible input parsing combinations, which means the naive solution is going to have a big polynomial complexity (O(n^x), where x is >4).
Actually checking for which numbers add up is easy to do using a recursive search.
// Requires: using System.Collections.Generic; using System.Linq;
List<int> GetSummands(int total, int numberOfElements, List<int> values)
{
    if (numberOfElements == 0)
    {
        // Success only if the whole total has been used up.
        return total == 0 ? new List<int>() : null;
    }
    if (total < 0)
    {
        return null; // Overshot (assumes non-negative values): no solution.
    }
    for (int i = 0; i < values.Count; ++i)
    {
        List<int> summands = GetSummands(
            total - values[i], numberOfElements - 1, values.Skip(i + 1).ToList());
        if (summands != null)
        {
            // Found solution.
            summands.Add(values[i]);
            return summands;
        }
    }
    return null; // No combination of the remaining values works.
}

Check if two linked lists merge. If so, where?

This question may be old, but I couldn't think of an answer.
Say, there are two lists of different lengths, merging at a point; how do we know where the merging point is?
Conditions:
We don't know the length
We should parse each list only once.

The following is by far the greatest of all I have seen: O(N), no counters. I got it during an interview with a candidate, S.N., at VisionMap.
Make an iterating pointer like this: it goes forward every time until the end, and then jumps to the beginning of the opposite list, and so on.
Create two of these, pointing to the two heads.
Advance each of the pointers by 1 every time, until they meet. This will happen after either one or two passes.
I still use this question in interviews, but to see how long it takes someone to understand why this solution works.
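A small sketch of the two switching pointers, assuming a bare-bones `Node` class invented for the example. Both pointers travel the same total distance (the length of one list plus the other), which is why they line up on the second pass.

```python
class Node:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next


def merge_point(head_a, head_b):
    """Return the first shared node of two singly linked lists, or None."""
    p, q = head_a, head_b
    while p is not q:
        # On falling off the end of one list, continue from the head of the other.
        p = p.next if p is not None else head_b
        q = q.next if q is not None else head_a
    return p
```

If the lists never merge, both pointers become `None` on the same step, so the loop still terminates and the function returns `None`.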

Pavel's answer requires modification of the lists as well as iterating each list twice.
Here's a solution that only requires iterating each list twice (the first time to calculate their length; if the length is given you only need to iterate once).
The idea is to ignore the starting entries of the longer list (merge point can't be there), so that the two pointers are an equal distance from the end of the list. Then move them forwards until they merge.
lenA = count(listA) //iterates list A
lenB = count(listB) //iterates list B
ptrA = listA
ptrB = listB
//now we adjust either ptrA or ptrB so that they are equally far from the end
while(lenA > lenB):
ptrA = ptrA->next
lenA--
while(lenB > lenA):
ptrB = ptrB->next
lenB--
while(ptrA != NULL):
if (ptrA == ptrB):
return ptrA //found merge point
ptrA = ptrA->next
ptrB = ptrB->next
This is asymptotically the same (linear time) as my other answer but probably has smaller constants, so is probably faster. But I think my other answer is cooler.

If
by "modification is not allowed" it was meant "you may change but in the end they should be restored", and
we could iterate the lists exactly twice
the following algorithm would be the solution.
First, the numbers. Assume the first list is of length a+c and the second one is of length b+c, where c is the length of their common "tail" (after the mergepoint). Let's denote them as follows:
x = a+c
y = b+c
Since we don't know the length, we will calculate x and y without additional iterations; you'll see how.
Then, we iterate each list and reverse them while iterating! If both iterators reach the merge point at the same time, then we find it out by mere comparing. Otherwise, one pointer will reach the merge point before the other one.
After that, when the other iterator reaches the merge point, it won't proceed to the common tail. Instead it will go back to the former beginning of the list that had reached the merge point before! So, before it reaches the end of the changed list (i.e. the former beginning of the other list), it will make a+b+1 iterations in total. Let's call it z+1.
The pointer that reached the merge point first will keep iterating until it reaches the end of the list. The number of iterations it makes should be counted and is equal to x.
Then, this pointer iterates back and reverses the lists again. But now it won't go back to the beginning of the list it originally started from! Instead, it will go to the beginning of the other list! The number of iterations it makes should be counted and is equal to y.
So we know the following numbers:
x = a+c
y = b+c
z = a+b
From which we determine that
a = (+x-y+z)/2
b = (-x+y+z)/2
c = (+x+y-z)/2
Which solves the problem.
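As a quick check of the arithmetic only (not of the list reversal itself), the three formulas can be written as a small helper; the function name is made up for the example.

```python
def segment_lengths(x, y, z):
    """Solve x = a + c, y = b + c, z = a + b for (a, b, c)."""
    a = (x - y + z) // 2
    b = (-x + y + z) // 2
    c = (x + y - z) // 2
    return a, b, c
```

For instance, with a = 2, b = 1 and c = 3 the measured counts are x = 5, y = 4 and z = 3, and the helper recovers (2, 1, 3).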

Well, if you know that they will merge:
Say you start with:
    A-->B-->C
            |
            V
1-->2-->3-->4-->5
1) Go through the first list setting each next pointer to NULL.
Now you have:
    A   B   C
1-->2-->3   4   5
2) Now go through the second list and wait until you see a NULL, that is your merge point.
If you can't be sure that they merge you can use a sentinel value for the pointer value, but that isn't as elegant.

If we could iterate the lists exactly twice, then I can provide a method for determining the merge point:
iterate both lists and calculate lengths A and B
calculate the difference of lengths C = |A-B|
start iterating both lists simultaneously, but first make C additional steps on the longer list
these two pointers will meet each other at the merge point
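The steps above might look like this in code; the `Node` class and function names are assumptions made for the sketch.

```python
class Node:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next


def length(head):
    """Count the nodes in a list (first pass)."""
    n = 0
    while head is not None:
        n += 1
        head = head.next
    return n


def merge_point_by_length(head_a, head_b):
    """Return the first shared node, or None if the lists do not merge."""
    len_a, len_b = length(head_a), length(head_b)
    # Advance the longer list by the difference C = |A - B|;
    # range() of a negative number is empty, so only one loop runs.
    for _ in range(len_a - len_b):
        head_a = head_a.next
    for _ in range(len_b - len_a):
        head_b = head_b.next
    # Both pointers are now equally far from the end (second pass).
    while head_a is not head_b:
        head_a = head_a.next
        head_b = head_b.next
    return head_a
```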

Here's a solution, computationally quick (iterates each list once) but uses a lot of memory:
for each item in list a
push pointer to item onto stack_a
for each item in list b
push pointer to item onto stack_b
while (stack_a top == stack_b top) // where top is the item to be popped next
pop stack_a
pop stack_b
// values at the top of each stack are the items prior to the merged item
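A sketch of the two-stack idea with Python lists as stacks. Instead of stopping at the items before the merge, this version remembers the last identical pair it popped, which is the merge point itself; the `Node` class is invented for the example.

```python
class Node:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next


def merge_point_with_stacks(head_a, head_b):
    """Return the first shared node, or None if the tails differ."""
    stack_a, stack_b = [], []
    node = head_a
    while node is not None:
        stack_a.append(node)
        node = node.next
    node = head_b
    while node is not None:
        stack_b.append(node)
        node = node.next
    merge = None
    # Pop the shared tail from the back; the last identical pair is the merge point.
    while stack_a and stack_b and stack_a[-1] is stack_b[-1]:
        merge = stack_a.pop()
        stack_b.pop()
    return merge
```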

You can use a set of Nodes. Iterate through one list and add each Node to the set. Then iterate through the second list and for every iteration, check if the Node exists in the set. If it does, you've found your merge point :)
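A minimal sketch of the set-based lookup. It stores `id(node)` so that nodes are compared by identity rather than by value; that choice, like the `Node` class, is an assumption of this example.

```python
class Node:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next


def merge_point_with_set(head_a, head_b):
    """Return the first node of list B that also belongs to list A, or None."""
    seen = set()
    node = head_a
    while node is not None:
        seen.add(id(node))      # identity, not value: equal data is not a merge
        node = node.next
    node = head_b
    while node is not None:
        if id(node) in seen:
            return node
        node = node.next
    return None
```

Each list is walked at most once, at the cost of extra memory proportional to the length of the first list.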

This arguably violates the "parse each list only once" condition, but implement the tortoise and hare algorithm (used to find the merge point and cycle length of a cyclic list) so you start at List A, and when you reach the NULL at the end you pretend it's a pointer to the beginning of list B, thus creating the appearance of a cyclic list. The algorithm will then tell you exactly how far down List A the merge is (the variable 'mu' according to the Wikipedia description).
Also, the "lambda" value tells you the length of list B, and if you want, you can work out the length of list A during the algorithm (when you redirect the NULL link).

Maybe I am over simplifying this, but simply iterate the smallest list and use the last nodes Link as the merging point?
So, where Data->Link->Link == NULL is the end point, giving Data->Link as the merging point (at the end of the list).
EDIT:
Okay, from the picture you posted, you parse the two lists, the smallest first. With the smallest list you can maintain the references to the following node. Now, when you parse the second list you do a comparison on the reference to find where Reference[i] is the reference at LinkedList[i]->Link. This will give the merge point. Time to explain with pictures (superimpose the values on the picture the OP posted).
You have a linked list (references shown below):
A->B->C->D->E
You have a second linked list:
1->2->
With the merged list, the references would then go as follows:
1->2->D->E->
Therefore, you map the "smaller" list first (the merged list 1->2->D->E, which is the one we are counting, has a length of 4 and the main list has a length of 5).
Loop through the first list, maintain a reference of references.
The list will contain the following references Pointers { 1, 2, D, E }.
We now go through the second list:
-> A - Contains reference in Pointers? No, move on
-> B - Contains reference in Pointers? No, move on
-> C - Contains reference in Pointers? No, move on
-> D - Contains reference in Pointers? Yes, merge point found, break.
Sure, you maintain a new list of pointers, but that's not outside the specification. However, the first list is parsed exactly once, and the second list will only be fully parsed if there is no merge point. Otherwise, it will end sooner (at the merge point).

I have tested a merge case on my FC9 x86_64, and print every node address as shown below:
Head A 0x7fffb2f3c4b0
0x214f010
0x214f030
0x214f050
0x214f070
0x214f090
0x214f0f0
0x214f110
0x214f130
0x214f150
0x214f170
Head B 0x7fffb2f3c4a0
0x214f0b0
0x214f0d0
0x214f0f0
0x214f110
0x214f130
0x214f150
0x214f170
Note that because I had aligned the node structure, every address returned by malloc() for a node is aligned to 16 bytes; see the lowest 4 bits.
The lowest bits are 0s, i.e., 0x0 or 0000b.
So if you are in the same special case (aligned node addresses) too, you can use these lowest 4 bits.
For example, when travelling both lists from head to tail, set 1 or 2 of the 4 bits of the visited node's address, that is, set a flag:
next_node = node->next;
node = (struct node*)((unsigned long)node | 0x1UL);
Note that the flags above won't affect the real node address, only your SAVED node pointer value.
Once you find that somebody has already set the flag bit(s), the first such node should be the merge point.
When done, restore the node addresses by clearing the flag bits you had set. An important point is that you must be careful to clear the flag when iterating (e.g. node = node->next). Remember that you have set flag bits, so do it this way:
real_node = (struct node*)((unsigned long)node & ~0x1UL);
real_node = real_node->next;
node = real_node;
Because this proposal will restore the modified node addresses, it could be considered as "no modification".

There can be a simple solution, but it will require auxiliary space. The idea is to traverse one list and store each address in a hash map, then traverse the other list and check whether each address is in the hash map or not. Each list is traversed only once. There's no modification to any list. Length is still unknown. Auxiliary space used: O(n), where 'n' is the length of the first list traversed.

this solution iterates each list only once...no modification of list required too..though you may complain about space..
1) Basically you iterate over list1 and store the address of each node in an array of node pointers
2) Then you iterate over list2, and for each node's address you search the array for a match; if you find one, this is the merging node
//pseudocode
//for the first list
p1=list1;
node *addr[];//to store node addresses
i=0;
while(p1!=null){
    addr[i++]=p1;//store the node's address, not the address of the pointer variable
    p1=p1->next;
}
int len=i;//number of addresses stored
//for the second list
p2=list2;
while(p2!=null){
    if(search(addr,len,p2)==1)//match found
    {
        //this is the merging node
        return (p2);
    }
    p2=p2->next;
}
int search(addr,len,p2){
i=0;
while(i<len){
if(addr[i]==p2)
return 1;
i++;
}
return 0;
}
Hope it is a valid solution...

There is no need to modify any list. There is a solution in which we only have to traverse each list once.
Create two stacks, lets say stck1 and stck2.
Traverse the 1st list and push a reference to each node you traverse onto stck1.
Same as step two, but this time traverse the 2nd list and push the node references onto stck2.
Now, pop from both stacks and check whether the two nodes are the same; if yes, then keep a reference to them. If no, then the previous pair of nodes, which were the same, is actually the merge point we were looking for.

int FindMergeNode(Node headA, Node headB) {
    Node currentA = headA;
    Node currentB = headB;
    // Do till the two nodes are the same (assumes the lists do merge)
    while (currentA != currentB) {
        // If you reached the end of one list start at the beginning of the other
        // one currentA
        if (currentA.next == null) {
            currentA = headB;
        } else {
            currentA = currentA.next;
        }
        // currentB
        if (currentB.next == null) {
            currentB = headA;
        } else {
            currentB = currentB.next;
        }
    }
    return currentB.data;
}

We can use two pointers and move in a fashion such that if one of the pointers is null we point it to the head of the other list and same for the other, this way if the list lengths are different they will meet in the second pass.
If length of list1 is n and list2 is m, their difference is d=abs(n-m). They will cover this distance and meet at the merge point.
Code:
int findMergeNode(SinglyLinkedListNode* head1, SinglyLinkedListNode* head2) {
SinglyLinkedListNode* start1=head1;
SinglyLinkedListNode* start2=head2;
while (start1!=start2){
start1=start1->next;
start2=start2->next;
if (!start1)
start1=head2;
if (!start2)
start2=head1;
}
return start1->data;
}

Here is a naive solution; no need to traverse the whole lists.
Suppose your node structure has three fields, like
struct node {
    int data;
    int flag; //initially set the flag to zero for all nodes
    struct node *next;
};
Say you have two heads (head1 and head2) pointing to the heads of the two lists.
Traverse both lists at the same pace and set flag = 1 (the visited flag) for each node you visit,
if (node->next->flag==1)//possibly the longer list will have this opportunity
//this will be your required node.

How about this:
If you are allowed to traverse each list only once, you can create a new node, traverse the first list and make every node point to this new node, then traverse the second list to see whether any node points to your new node (that node is your merge point). If the second traversal never leads to your new node, the original lists don't have a merge point.
If you are allowed to traverse the lists more than once, you can traverse each list to find out its length and, if the lengths differ, skip the "extra" nodes at the beginning of the longer list. Then traverse both lists one step at a time and find the first node they share.
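A minimal sketch of the single-traversal idea, assuming it is acceptable to overwrite the `next` pointers of the first list (the list is destroyed in the process). The `Node` class and the name `SentinelMergeSketch` are hypothetical and exist only for this illustration.

```java
class SentinelMergeSketch {
    // Hypothetical minimal node type, used only for this illustration.
    static class Node {
        int data;
        Node next;
        Node(int data) { this.data = data; }
    }

    // Returns the merge node, or null if the lists do not merge.
    // WARNING: rewires every node of list A to point at a sentinel.
    static Node findMergeNode(Node headA, Node headB) {
        Node sentinel = new Node(0);
        Node n = headA;
        while (n != null) {
            Node next = n.next;   // remember the real successor first
            n.next = sentinel;    // then redirect this node to the sentinel
            n = next;
        }
        // The first node of list B that points at the sentinel belongs to list A too.
        for (Node m = headB; m != null; m = m.next) {
            if (m.next == sentinel) {
                return m;
            }
        }
        return null;
    }
}
```

If the lists must stay intact, the length-based variant is the safer choice, since it only reads the nodes.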

Steps in Java:
Create a map.
Traverse both branches of the list and put every traversed node into the map, using something unique to the node (say, a node ID) as the key and 1 as the initial value.
When the first duplicate key turns up, increment the value for that key (its value becomes 2, which is > 1).
The key whose value is greater than 1 is the node where the two lists merge.
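These steps could look roughly like the sketch below. It is an assumption-laden illustration: instead of a node ID it uses the node reference itself as the key (via `IdentityHashMap`), and the `Node` class and the name `CountingMergeSketch` are hypothetical.

```java
import java.util.IdentityHashMap;
import java.util.Map;

class CountingMergeSketch {
    // Hypothetical minimal node type, used only for this illustration.
    static class Node {
        int data;
        Node next;
        Node(int data) { this.data = data; }
    }

    // Returns the first node seen twice, or null if the lists do not merge.
    static Node findMergeNode(Node headA, Node headB) {
        // Keys are compared by reference, so equal data values do not collide.
        Map<Node, Integer> visits = new IdentityHashMap<>();
        Node a = headA;
        Node b = headB;
        while (a != null || b != null) {
            if (a != null) {
                // merge() stores 1 on first sight and adds 1 on every repeat.
                if (visits.merge(a, 1, Integer::sum) > 1) return a;
                a = a.next;
            }
            if (b != null) {
                if (visits.merge(b, 1, Integer::sum) > 1) return b;
                b = b.next;
            }
        }
        return null;
    }
}
```

Because both lists advance in step, the search can stop before either list has been walked to the end.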

We can solve it efficiently by introducing an "isVisited" field. Traverse the first list and set "isVisited" to true on every node until the end. Then walk the second list from its head; the first node whose flag is already true is your merging point.
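A small sketch of the "isVisited" idea, assuming the node type can be changed to carry the extra field. The `Node` class and the name `VisitedFlagSketch` are hypothetical, and the final reset pass is an addition so the lists can be searched again.

```java
class VisitedFlagSketch {
    // Hypothetical node type with the extra flag the approach requires.
    static class Node {
        int data;
        boolean isVisited;
        Node next;
        Node(int data) { this.data = data; }
    }

    // Returns the merge node, or null if the lists do not merge.
    static Node findMergeNode(Node headA, Node headB) {
        // Pass 1: mark every node of the first list.
        for (Node n = headA; n != null; n = n.next) {
            n.isVisited = true;
        }
        // Pass 2: the first marked node reached from the second list is shared.
        Node result = null;
        for (Node n = headB; n != null; n = n.next) {
            if (n.isVisited) {
                result = n;
                break;
            }
        }
        // Pass 3 (not in the original description): clear the flags again.
        for (Node n = headA; n != null; n = n.next) {
            n.isVisited = false;
        }
        return result;
    }
}
```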

Step 1: Find the length of both lists.
Step 2: Find the difference and advance the longer list by that many nodes.
Step 3: Now both lists are at the same distance from their ends.
Step 4: Iterate through both lists together to find the merge point.
# Pseudocode
def findmergepoint(list1, list2):
    lendiff = abs(list1.length() - list2.length())
    biggerlist = list1 if list1.length() > list2.length() else list2   # list with the biggest length
    smallerlist = list2 if list1.length() > list2.length() else list1  # list with the smallest length
    # advance the longer list by lendiff nodes to level both lists at the same position
    for i in range(lendiff):
        biggerlist = biggerlist.next
    # from here each list is walked only once
    while biggerlist is not None and smallerlist is not None:
        if biggerlist is smallerlist:
            return biggerlist  # point of intersection
        biggerlist = biggerlist.next
        smallerlist = smallerlist.next
    return None  # no intersection found

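int FindMergeNode(Node *headA, Node *headB)
{
    // Brute force, O(n*m): for each node of A, scan B for the same node.
    Node *tempB = headB;
    while (headA->next != NULL)
    {
        while (tempB->next != NULL)
        {
            if (tempB == headA)
                return tempB->data;
            tempB = tempB->next;
        }
        headA = headA->next;
        tempB = headB;
    }
    // Only the last node is left; if the lists merge, it must be the shared one.
    return headA->data;
}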

Use a Map or Dictionary to store each node's address against its value. If the address already exists in the Map/Dictionary, the value for that key is the answer.
I did this:
int FindMergeNode(Node headA, Node headB) {
    // Keys rely on node identity (Node is assumed not to override equals/hashCode).
    Map<Node, Integer> map = new HashMap<Node, Integer>();
    while (headA != null || headB != null)
    {
        if (headA != null)
        {
            if (map.containsKey(headA))
            {
                return map.get(headA);
            }
            map.put(headA, headA.data);
            headA = headA.next;
        }
        if (headB != null)
        {
            if (map.containsKey(headB))
            {
                return map.get(headB);
            }
            map.put(headB, headB.data);
            headB = headB.next;
        }
    }
    return 0; // no merge point found
}

An O(n) solution, but based on an assumption.
The assumption is that both lists hold only positive integers.
Logic: negate every integer in list1. Then walk through list2 until you find a negative integer. Once found, take it, change the sign back to positive and return it.
static int findMergeNode(SinglyLinkedListNode head1, SinglyLinkedListNode head2) {
    SinglyLinkedListNode current = head1; // head1 is given to be not null
    // mark all head1 nodes as negative (the list is left in this state)
    while (true) {
        current.data = -current.data;
        current = current.next;
        if (current == null) break;
    }
    current = head2; // given as not null
    // assumes the lists merge, so a negative value is always reached
    while (true) {
        if (current.data < 0) return -current.data;
        current = current.next;
    }
}

You can add the nodes of list1 to a HashSet and then loop through list2, checking whether each node is already present in the set. If it is, that is the merge node.
static int findMergeNode(SinglyLinkedListNode head1, SinglyLinkedListNode head2) {
    HashSet<SinglyLinkedListNode> set = new HashSet<SinglyLinkedListNode>();
    while (head1 != null)
    {
        set.add(head1);
        head1 = head1.next;
    }
    while (head2 != null) {
        if (set.contains(head2)) {
            return head2.data;
        }
        head2 = head2.next; // advance, otherwise the loop never ends
    }
    return -1;
}

Solution using JavaScript:
var getIntersectionNode = function(headA, headB) {
    if (headA == null || headB == null) return null;
    let countA = listCount(headA);
    let countB = listCount(headB);
    let diff = 0;
    if (countA > countB) {
        diff = countA - countB;
        for (let i = 0; i < diff; i++) {
            headA = headA.next;
        }
    } else if (countA < countB) {
        diff = countB - countA;
        for (let i = 0; i < diff; i++) {
            headB = headB.next;
        }
    }
    return getIntersectValue(headA, headB);
};
function listCount(head) {
    let count = 0;
    while (head) {
        count++;
        head = head.next;
    }
    return count;
}
function getIntersectValue(headA, headB) {
    while (headA && headB) {
        if (headA === headB) {
            return headA;
        }
        headA = headA.next;
        headB = headB.next;
    }
    return null;
}

If editing the linked list is allowed,
then just set the next pointer of every node of list 2 to null.
Then find the data value of the last node of list 1, which now ends at the first node it shares with list 2.
This gives you the intersecting node in a single traversal of both lists, with no "hi-fi" logic.
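As a rough sketch of this destructive approach: the `Node` class and the name `NullingMergeSketch` are hypothetical, list 2 is torn apart in the process, and the lists are assumed to merge (without that assumption the last node of list 1 is simply its own tail).

```java
class NullingMergeSketch {
    // Hypothetical minimal node type, used only for this illustration.
    static class Node {
        int data;
        Node next;
        Node(int data) { this.data = data; }
    }

    // Assumes the lists merge. WARNING: cuts every link in list 2.
    static Node findMergeNode(Node head1, Node head2) {
        // Traversal of list 2: null out every next pointer.
        Node n = head2;
        while (n != null) {
            Node next = n.next;  // remember the successor before cutting the link
            n.next = null;
            n = next;
        }
        // Traversal of list 1: it now stops at the first node shared with list 2.
        Node last = head1;
        while (last != null && last.next != null) {
            last = last.next;
        }
        return last;
    }
}
```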

Follow this simple logic to solve the problem:
Both pointers A and B travel at the same speed, so to meet at the same point they must cover the same distance. We can achieve this by adding the length of the other list to each pointer's path.
