Weighted union rule - data-structures

Weighted union rule - data-structures

Can someone check with me if I'm using the rule right in the last step (7)?
UPDATE:
Numbers inside the parentheses are the number of elements (weight(?)) of each set that takes part in the Union. Uppercase letters are names of sets.
As I understand this: we are using as our rank the number of elements? This is getting confusing, each one is using different terms for the same stuff.
We have Unions:
U(1,2,A)
U(3,4,B)
U(A,B,C)
U(5,6,D)
U(7,8,E)
U(D,C,F)
U(E,F,G)

Step 7 (and the others) looks correct, but step 6 doesn't.
In step 6, 4 should be the root, as that's the bigger tree.

void combine(int x,int y)
{
int xroot=find(x),yroot=find(y);
if(rank[xroot]<rank[yroot])
parent[xroot]=yroot;
else if(rank[xroot]>rank[yroot])
parent[yroot]=xroot;
else
{///rank of both is equal..
parent[yroot]=xroot;
rank[xroot]++;
}
}
Using rank, you see the size of set, not sum of vertices, so step 6 is wrong.
But why the size?
Because if we make root of bigger set the root of smaller set , we need to update parents of smaller number of nodes.
For the best explanation, I would recommend CLRS (Introduction to Algorithms).
Hope it helps you!

Related

Number Bases & Ruby - What is the fastest way to increment a variable intended for a specific number base?

I'd like to be able to have a variable "num" that I can increment and have it be understood what number base its in.
For example, if num is base 7 and is equal to 66, if I do num+= 1, num should be set to 100.
One solution involves to_s and to_i, however, theres so much converting going on it doesn't seem like it would be very efficient.
def increment_with_base(number, base)
number_base_ten = number.to_s.to_i(base)
number_base_ten += 1
number_base_ten.to_s(base).to_i
end
Is there something more appropriate than this? Is it possible to tell Ruby which number base I'm using so I don't have to do so many conversions?
As I mentioned in a comment below, I'm very familiar with number bases - just not number bases in Ruby. I actually need to display each incremented number (and I'll be incrementing a lot).
Reading this next part isn't necessary if you know the answer to the question. However, I've added it to clarify why I'm doing what I'm doing. No need to read what follows unless you're looking for more information.
For further information, what I'm doing is generating a set of graphs where each node only has either 0 or 1 transition. Each node is represented by a digit, and the specific digit represents which other node there is a directed edge towards. For example, the number 4.3.1.0 is a graph with four nodes where the first node has an edge to the fourth node, the second has an edge to the third, the third has an edge to the first, and the fourth node does not have any transition.
So if I wanted to generate all four node graphs, where each node only has one exiting edge, I'd need to count from 0.0.0.0 to 4.4.4.4.

Bases don't matter for arithmetic. Numbers are just numbers, all bases are just slightly different ways of representing them. For example, 667 (xy means x is written in base y) is 4810 and regardless of what base you think in while adding one to it, the result is always the same number: 1007 = 4910 = 3116 = ... and so on for any other base (even the really weird ones like the golden ratio base). It's the same operation with the same result.
Just do your arithmetic on numbers and only worry about base when it actually matters (e.g. when rendering it for a user). Even if you need to display it after every update, you still save a couple conversions, which is not only faster but also much simpler and cleaner.

If you simply want to enumerate in a different base, you don't need to explicitly increment:
BASE = 4
MAX = 100
(0..MAX).each do |x|
puts x.to_s( BASE )
end
This isn't much code, and it's pretty fast. Is that suitable for requirements?
And to better match your underlying problem (as well as I understand it?)
NODES = 4
BASE = NODES + 1
MAX = BASE ** NODES
(0...MAX).each do |x|
puts ("0" * NODES + x.to_s( BASE )).chars.to_a[(-NODES..-1)].join('.')
end
I timed the above at 0.01 seconds on my laptop. But if you try with 9 nodes, it takes much much longer (no surprise, you would be looping one billion times!)

Decision Tree learning algorithm

I want to preface this by saying that this is a homework assignment.
I am given a set of Q binary input variables that will be used to classify output of Y which is also binary.
The first part of the question is: at most how many examples do I need to enumarate all possibile combinations of Q? I am currently think that since it asks for at most I will need Q as it is possible that all values up to Q-1 are the same for instance 1 and the item at Q is 0 .
The second part of the question is: at most how many leaf nodes can the tree have given Z examples?
My current answer is that at most the tree would have 2 leaf nodes, one representing true and one representing false since it is dealing with binary inputs and binary outputs.
Is this the correct way of examining this problem or am I generalizing my answers too deeply?
Edit
After looking at Cameron's response, I would now turn my first answer into 2^Q and to build on his example of Q = 3, I would get 2^3 or 8 (2*2*2). Please correct if that is incorrect thinking.
Edit #2
The second part of the question it appears as though it should be (2^Q) * Z or to provide an example: (2^3) * 3) or 8*3 = 24 leaf nodes. To recap if I have 3 inputs that are binary I would initially take 2^3 and get 8 now I want to go over 3 examples. Therefore I should get 8*3 or 24.
Edit #3
In hindsight it seems that no matter how many examples I use the number of leaf nodes should never increase, as it is a per tree basis.

I'd suggest you approach the problem by working out small example cases by hand.
For the first part, choose a small value for Q, say 3, and write down all possible combinations of Q. Then you can figure out how many examples you need. Increase Q and do it again.
For the second part of your question, pick a small Z and run the decision tree algorithm by hand. See how many leaves you get. Then pick another Z and see if/how it changes. Try generating different examples (with the same Z) and see if you can change the number of leaves.

Tricky programming problem that I'm having trouble getting my head around

First off, let me say that this is not homework (I am an A-Level student, this is nothing close to what we problem solve (this is way harder)), but more of a problem I'm trying to suss out to improve my programming logic.
I thought of a scenario where there is an array of random integers, let's for example say 10 integers. The user will input a number he wants to count to, and the algorithm will try and work out what numbers are needed to make that sum. For example if I wanted to make the sum 44 from this array of integers:
myIntegers = array(1, 5, 9, 3, 7, 12, 36, 22, 19, 63);
The output would be:
36 + 3 + 5 = 44
Or something along those lines. I hope I make myself clear. As an added bonus I would like to make the algorithm pick as few numbers as possible to make the required sum, or give out an error if the sum cannot be made with the numbers supplied.
I thought about using recursion and iterating through the array, adding numbers over and over until the sum is met or gone past. But what I can't get my head around is what to do if the algorithm goes past the sum and needs to be selective about what numbers to pick from the array.
I'm not looking for complete code, or a complete algorithm, I just want your opinions on how I should proceed with this and perhaps share a few tips or something. I'll probably start work on this tonight. :P
As I said, not homework. Just me wanting to do something a bit more advanced.
Thanks for any help you're able to offer. :)

You are looking at the Knapsack Problem
The knapsack problem or rucksack problem is a problem in combinatorial optimization: Given a set of items, each with a weight and a value, determine the number of each item to include in a collection so that the total weight is less than a given limit and the total value is as large as possible. It derives its name from the problem faced by someone who is constrained by a fixed-size knapsack and must fill it with the most useful items.
Edit: Your special case is the Subset Sum Problem

Will subset sum do? ;]

This is the classic Knapsack problem that you would see in college level algorithms course (or at least I saw it then). Best to work this out on paper and the solution in code should be relatively easy to work out.
EDIT: One thing to consider is dynamic programming.

Your Problem is related to the subset sum problem.
You have to try all possible combinations in the worst case.

No shortcuts here I'm afraid. In addition to what other people have said, about what specific problem this is etc., here's some practical advice to offer you a starting point:
I would sort the array and given the input sum m, would find the first number in the array less than m, call it n (this is your first possible number for the sum), and start from the highest possible complement (m-n), working your way down.
If you don't find a precise match, pick the highest available, call it o, (that now is your 2nd number) and look for the 3rd one starting from (m-n-o) and work your way down again.
If you don't find a precise match, start with the next number n (index of original n at index-1) and do the same. You can keep doing this until you find a precise match for two numbers. If no match for the sum is found for two numbers, start the process again, but expand it to include a 3rd number. And so on.
That could be done recursively. At least this approach ensures that when you find a match, it will be the one with the least possible numbers in the set forming the total input sum.
Potentially though, worst case, you end up going through the whole lot.
Edit: As Venr correctly points out, my first approach was incorrect. Edited approach to reflect this.

There is a very efficient randomized algorithm for this problem. I know you already accepted an answer, but I'm happy to share anyway, I just hope people will still check this question :).
Let Used = list of numbers that you sum.
Let Unused = list of numbers that you DON'T sum.
Let tmpsum = 0.
Let S = desired sum you want to reach.
for ( each number x you read )
toss a coin:
if it's heads and tmpsum < S
add x to Used
else
add x to Unused
while ( tmpsum != S )
if tmpsum < S
MOVE one random number from Unused to Used
else
MOVE one random number from Used to Unused
print the Used list, containing the numbers you need to add to get S
This will be much faster than the dynamic programming solution, especially for random inputs. The only problems are that you cannot reliably detect when there is no solution (you could let the algorithm run for a few seconds and if it doesn't finish, assume there is no solution) and that you cannot be sure you will get the solution with minimum number of elements chosen. Again, you could add some logic to make the algorithm keep going and trying to find a solution with less elements until certain stop conditions are met, but this will make it slower. However, if you are only interested in a solution that works and you have a LOT of numbers and the desired sum can be VERY big, this is probably better than the DP algorithm.
Another advantage of this approach is that it will also work for negative and rational numbers with no modifications, which is not true for the DP solution, because the DP solution involves using partial sums as array indexes, and indexes can only be natural numbers. You can of course use hashtables for example, but that will make the DP solution even slower.

I don't know exactly what's this task is called, but it seems that it's kind of http://en.wikipedia.org/wiki/Knapsack_problem.

Heh, I'll play the "incomplete specification" card (nobody said that numbers couldn't appear more than once!) and reduce this to the "making change" problem. Sort your numbers in decreasing order, find the first one less than your desired sum, then subtract that from your sum (division and remainders could speed this up). Repeat until sum = 0 or no number less than the sum is found.
For completeness, you would need to keep track of the number of addends in each sum, and of course generate the additional sequences by keeping track of the first number you use, skipping that, and repeating the process with the additional numbers. This would solve the (7 + 2 + 1) over (6 + 4) problem.

Repeating the answer of others: it is a Subset Sum problem.
It could be efficiently solved by Dynamic Programing technique.
The following has not been mentioned yet: the problem is Pseudo-P (or NP-Complete in weak sense).
Existence of an algorithm (based on dynamic programming) polynomial in S (where S is the sum) and n (the number of elements) proves this claim.
Regards.

Ok, I wrote a C++ program to solve the above problem. The algorithm is simple :-)
First of all arrange whatever array you have in descending order(I have hard-coded the array in descending form but you may apply any of the sorting algorithms ).
Next I took three stacks n, pos and sum. The first one stores the number for which a possible sum combination is to be found, the second holds the index of the array from where to start the search, the third stores the elements whose addition will give you the number you enter.
The function looks for the largest number in the array which is smaller than or equal to the number entered. If it is equal, it simply pushes the number onto the sum stack. If not, then it pushes the encountered array element to the sum stack(temporarily), and finds the difference between the number to search for and number encountered, and then it performs recursion.
Let me show an example:-
to find 44 in {63,36,22,19,12,9,7,5,3,1}
first 36 will be pushed in sum(largest number less than 44)
44-36=8 will be pushed in n(next number to search for)
7 will be pushed in sum
8-7=1 will be pushed in n
1 will be pushed in sum
thus 44=36+7+1 :-)
#include <iostream>
#include<conio.h>
using namespace std;
int found=0;
void func(int n[],int pos[],int sum[],int arr[],int &topN,int &topP,int &topS)
{
int i=pos[topP],temp;
while(i<=9)
{
if(arr[i]<=n[topN])
{
pos[topP]=i;
topS++;
sum[topS]=arr[i];
temp=n[topN]-arr[i];
if(temp==0)
{
found=1;
break;
}
topN++;
n[topN]=temp;
temp=pos[topP]+1;
topP++;
pos[topP]=temp;
break;
}
i++;
}
if(i==10)
{
topP=topP-1;
topN=topN-1;
pos[topP]+=1;
topS=topS-1;
if(topP!=-1)
func(n,pos,sum,arr,topN,topP,topS);
}
else if(found!=1)
func(n,pos,sum,arr,topN,topP,topS);
}
main()
{
int x,n[100],pos[100],sum[100],arr[10]={63,36,22,19,12,9,7,5,3,1},topN=-1,topP=-1,topS=-1;
cout<<"Enter a number: ";
cin>>x;
topN=topN+1;
n[topN]=x;
topP=topP+1;
pos[topP]=0;
func(n,pos,sum,arr,topN,topP,topS);
if(found==0)
cout<<"Not found any combination";
else{
cout<<"\n"<<sum[0];
for(int i=1;i<=topS;i++)
cout<<" + "<<sum[i];
}
getch();
}
You can copy the code and paste it in your IDE, works fine :-)

Shortest branch in a binary tree?

A binary tree can be encoded using two functions l and r
such that for a node n, l(n) give the left child of n, r(n)
give the right child of n.
A branch of a tree is a path from the root to a leaf, the
length of a branch to a particular leaf is the number of
arcs on the path from the root to that leaf.
Let MinBranch(l,r,x) be a simple recursive algorithm for
taking a binary tree encoded by the l and r functions
together with the root node x for the binary tree and
returns the length of the shortest branch of the binary
tree.
Give the pseudocode for this algorithm.
OK, so basically this is what I've come up with so far:
MinBranch(l, r, x)
{
if x is None return 0
left_one = MinBranch(l, r, l(x))
right_one = MinBranch(l, r, r(x))
return {min (left_one),(right_one)}
}
Obviously this isn't great or perfect. I'd be greatful if
people can help me get this perfect and working - any help
will be appreciated.

I doubt anyone will solve homework for you straight-up. A clue: the return value must surely grow higher as the tree gets bigger, right? However I don't see any numeric literals in your function except 0, and no addition operators either. How will you ever return larger numbers?
Another angle on the same issue: anytime you write a recursive function, it helps to enumerate "what are all the conditions where I should stop calling myself? what I return in each circumstance?"

You're on the right approach, but you're not quite there; your recursive algorithm will always return 0. (the logic is almost right, though...)
note that the length of the sub-branches is one less than the length of the branch; so left_one and right_one should be 1 + MinBranch....
Steping through the algorithm with some sample trees will help uncover off-by-one errors like this one...

It looks like you almost have it, but consider this example:
4
3 5
When you trace through MinBranch, you'll see that in your
MinBranch(l,r,4) call:
left_one = MinBranch(l, r, l(x))
= MinBranch(l, r, l(4))
= MinBranch(l, r, 3)
= 0
That makes sense, after all, 3 is a leaf node, so of course the distance
to the closest leaf node is 0. The same happens for right_one.
But you then wind up here:
return {min (left_one),(right_one)}
= {min (0), (0) }
= 0
but that's clearly wrong, because this node (4) is not a leaf node. Your
code forgot to count the current node (oops!). I'm sure you can manage
to fix that.
Now, actually, they way you're doing this isn't the fastest, but I'm not
sure if that's relevant for this exercise. Consider this tree:
4
3 5
2
1
Your algorithm will count up the left branch recursively, even though it
could, hypothetically, bail out if you first counted the right branch
and noted that 3 has a left, so its clearly longer than 5 (which is a
leaf). But, of course, counting the right branch first doesn't always
work!
Instead, with more complicated code, and probably a tradeoff of greater
memory usage, you can check nodes left-to-right, top-to-bottom (just
like English reading order) and stop at the first leaf you find.

What you've created can be thought of as a depth-first search. However, given what you're after (shortest branch), this may not be the most efficent approach. Think about how your algorithm would perform on a tree that was very heavy on the left side (of the root node), but had only one node on the right side.
Hint: consider a breadth-first search approach.

What you have there looks like a depth first search algorithm which will have to search the entire tree before you come up with a solution. what you need is the breadth first search algorithm which can return as soon as it finds the solution without doing a complete search

Ordering a dictionary to maximize common letters between adjacent words

This is intended to be a more concrete, easily expressable form of my earlier question.
Take a list of words from a dictionary with common letter length.
How to reorder this list tto keep as many letters as possible common between adjacent words?
Example 1:
AGNI, CIVA, DEVA, DEWA, KAMA, RAMA, SIVA, VAYU
reorders to:
AGNI, CIVA, SIVA, DEVA, DEWA, KAMA, RAMA, VAYU
Example 2:
DEVI, KALI, SHRI, VACH
reorders to:
DEVI, SHRI, KALI, VACH
The simplest algorithm seems to be: Pick anything, then search for the shortest distance?
However, DEVI->KALI (1 common) is equivalent to DEVI->SHRI (1 common)
Choosing the first match would result in fewer common pairs in the entire list (4 versus 5).
This seems that it should be simpler than full TSP?

What you're trying to do, is calculate the shortest hamiltonian path in a complete weighted graph, where each word is a vertex, and the weight of each edge is the number of letters that are differenct between those two words.
For your example, the graph would have edges weighted as so:
DEVI KALI SHRI VACH
DEVI X 3 3 4
KALI 3 X 3 3
SHRI 3 3 X 4
VACH 4 3 4 X
Then it's just a simple matter of picking your favorite TSP solving algorithm, and you're good to go.

My pseudo code:
Create a graph of nodes where each node represents a word
Create connections between all the nodes (every node connects to every other node). Each connection has a "value" which is the number of common characters.
Drop connections where the "value" is 0.
Walk the graph by preferring connections with the highest values. If you have two connections with the same value, try both recursively.
Store the output of a walk in a list along with the sum of the distance between the words in this particular result. I'm not 100% sure ATM if you can simply sum the connections you used. See for yourself.
From all outputs, chose the one with the highest value.
This problem is probably NP complete which means that the runtime of the algorithm will become unbearable as the dictionaries grow. Right now, I see only one way to optimize it: Cut the graph into several smaller graphs, run the code on each and then join the lists. The result won't be as perfect as when you try every permutation but the runtime will be much better and the final result might be "good enough".
[EDIT] Since this algorithm doesn't try every possible combination, it's quite possible to miss the perfect result. It's even possible to get caught in a local maximum. Say, you have a pair with a value of 7 but if you chose this pair, all other values drop to 1; if you didn't take this pair, most other values would be 2, giving a much better overall final result.
This algorithm trades perfection for speed. When trying every possible combination would take years, even with the fastest computer in the world, you must find some way to bound the runtime.
If the dictionaries are small, you can simply create every permutation and then select the best result. If they grow beyond a certain bound, you're doomed.
Another solution is to mix the two. Use the greedy algorithm to find "islands" which are probably pretty good and then use the "complete search" to sort the small islands.

This can be done with a recursive approach. Pseudo-code:
Start with one of the words, call it w
FindNext(w, l) // l = list of words without w
Get a list l of the words near to w
If only one word in list
Return that word
Else
For every word w' in l do FindNext(w', l') //l' = l without w'
You can add some score to count common pairs and to prefer "better" lists.

You may want to take a look at BK-Trees, which make finding words with a given distance to each other efficient. Not a total solution, but possibly a component of one.

This problem has a name: n-ary Gray code. Since you're using English letters, n = 26. The Wikipedia article on Gray code describes the problem and includes some sample code.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio