Adjacency Set Representation in Python


So I got this really cool book from the university library today, Python Algorithms by Magnus Lie Hetland, and in Chapter 2 he creates the adjacency list as follows, which was kind of cool:
a,b,c,d,e,f,g,h = range(8)
N = [{b,c,d,e,f},{c,e},{d},{e},{f},{c,g,h},{f,h},{f,g}]
And when I do:
N[a], I get the first element of N, and it surprises me: how did it get mapped like that?
I found this question, but it's different from what I am asking; still, let me know if it's a duplicate:
Adjacency List and Adjacency Matrix in Python
Thanks,
Prerit

It's just Python.
a,b,c,d,e,f,g,h = range(8)
is tuple assignment. It assigns 0 to a, 1 to b, etc.
N = [{b,c,d,e,f},{c,e},{d},{e},{f},{c,g,h},{f,h},{f,g}]
creates a list named N where the 0th element is the set {b,c,d,e,f}, etc.
So when you say N[a], you're also saying N[0], and that's the set you're seeing.
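A quick way to check the mapping yourself; nothing here is specific to the book, it is ordinary sequence unpacking plus list indexing:

```python
a, b, c, d, e, f, g, h = range(8)  # unpacking: a = 0, b = 1, ..., h = 7
N = [{b, c, d, e, f}, {c, e}, {d}, {e}, {f}, {c, g, h}, {f, h}, {f, g}]

print(a, h)            # 0 7
print(N[a] is N[0])    # True: a is just the integer 0
print(sorted(N[a]))    # [1, 2, 3, 4, 5], i.e. the neighbours b, c, d, e, f
print(h in N[a])       # False: there is no edge a -> h
print(len(N[f]))       # 3: f has the three out-neighbours c, g, h
```

The letters only exist to make the hard-coded lists readable; once unpacked they are plain integers, so every vertex is also a valid list index.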
It's a cool trick for building a constant graph by hard-coding in Python, but if you need to build the graph dynamically based on input or output from another algorithm, then you'll want a different representation.
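If the graph has to be built at run time, one common alternative (not from the book; a sketch assuming edges arrive as (source, target) pairs of hashable vertex labels) is a dict that maps each vertex to a set of its neighbours:

```python
def build_adjacency_sets(edges):
    """Build a dict mapping each vertex to the set of its out-neighbours."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set())  # make sure sink vertices also get an entry
    return graph


g = build_adjacency_sets([("a", "b"), ("a", "c"), ("c", "a")])
print(g["a"])       # the set {'b', 'c'} (set print order may vary)
print("b" in g)     # True, even though b has no outgoing edges
```

Membership tests (`v in g[u]`) stay O(1) on average, just like with the list-of-sets version, but vertices no longer need to be numbered 0..n-1 in advance.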

Related

Using Linked List to represent a Matrix Class

I'm having trouble initializing the linked list for the matrix based on the parameters I input. If I input the parameters (3,3), it should actually make a 4x4 grid so I can use the first column and first row for indexing, and the top-left corner node as an entry point.
def __init__(self, m, n, default=0):
    self._head = MatrixNode(None)
    for node in range(m - 1):
        node = MatrixNode(0)
        node._right = node
    for node in range(n - 1):
        node = MatrixNode(0)
        node._down = node
This is what I have so far, but I'm sure it's horrible.

First, it would be useful to know what a MatrixNode is. I guess you just want to store a value there?
Then I see two linear loops, while a matrix is an n*m data structure. Are you sure your loops don't need to be nested to initialize your structure correctly?
For linked lists I would expect something like row.next = nextrow and row.startnode.next = nextnode; I don't see anything like this here.
That said, I want to ask whether you really want to implement a matrix yourself, and in such an object-oriented (inefficient!) way.
You can use two-dimensional arrays (a=[[1,2], [3,4]];a[0][0]==1) or a good implementation from a numerics library like numpy/scipy.
There you have numpy.array for storing n-dimensional data (with nice addressing like matrix[1,2] and syntax similar to MATLAB) or numpy.matrix, which is like an array with some methods overloaded for matrix operations (i.e. matrix-matrix multiplication of arrays is pointwise, and for matrices it's the usual matrix multiplication).
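To illustrate the pointwise vs. matrix multiplication distinction without any library, here is a small sketch on plain lists of lists (the function names are made up for this example; with numpy, `*` on arrays is the pointwise version):

```python
def elementwise(A, B):
    """Pointwise product of two equally sized matrices."""
    return [[x * y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(A, B)]


def matmul(A, B):
    """Usual matrix product: row i of A dotted with column j of B."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*B)] for row in A]


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(elementwise(A, B))  # [[5, 12], [21, 32]]
print(matmul(A, B))       # [[19, 22], [43, 50]]
```

`zip(*B)` transposes B, which is why the inner loop walks over its columns.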

You are right, it's horrible :)
First things first, a linked list is a very bad way of representing a matrix.
If you want to represent a matrix, start with a list of lists, and work from there if that's not enough (see the other answer mentioning numpy, for example).
If you want to learn to use linked lists, choose a better example.
Then: you are reusing the variable name "node" for different things:
- Your loop index. The code for node in range(...) assigns an integer from the range to node on every iteration.
- Then you assign a new MatrixNode to node, and then you set that node's neighbor (_right or _down) not to the actual neighbor, but to itself (node._right = node).
You also never save the nodes you create inside the loops anywhere, so they will be garbage-collected.
And you never use the optional argument default.
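For comparison, a minimal sketch of how the constructor could fix those points. It assumes a hypothetical MatrixNode with value, _right and _down attributes (the question never shows the real class), uses nested loops, keeps every node in a temporary grid while linking, and actually uses default:

```python
class MatrixNode:
    """Hypothetical node class; the question never shows the real MatrixNode."""

    def __init__(self, value):
        self.value = value
        self._right = None  # next node in the same row
        self._down = None   # next node in the same column


class LinkedMatrix:
    def __init__(self, m, n, default=0):
        # Build an (m + 1) x (n + 1) grid: row 0 and column 0 are header nodes
        # (value None) used for indexing, and grid[0][0] is the entry point.
        grid = [[MatrixNode(None if i == 0 or j == 0 else default)
                 for j in range(n + 1)]
                for i in range(m + 1)]
        # Nested loops: link every node to its right and lower neighbours.
        for i in range(m + 1):
            for j in range(n + 1):
                if j < n:
                    grid[i][j]._right = grid[i][j + 1]
                if i < m:
                    grid[i][j]._down = grid[i + 1][j]
        # Only the head is stored; every other node stays reachable through the links.
        self._head = grid[0][0]

    def get(self, i, j):
        """Return the value at data row i, column j (1-based; 0 means the header)."""
        node = self._head
        for _ in range(i):
            node = node._down
        for _ in range(j):
            node = node._right
        return node.value


matrix = LinkedMatrix(3, 3, default=7)
print(matrix.get(1, 1), matrix.get(3, 3))  # 7 7
```

Every lookup walks O(i + j) links, which is exactly why the answers recommend a list of lists (or numpy) instead.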

Algorithm for finding basis of a set of bitstrings?

This is for a diff utility I'm writing in C++.
I have a list of n character-sets {"a", "abc", "abcde", "bcd", "de"} (taken from an alphabet of k=5 different letters). I need a way to observe that the entire list can be constructed by disjunctions of the character-sets {"a", "bc", "d", "e"}. That is, "b" and "c" are linearly dependent, and every other pair of letters is independent.
In the bit-twiddling version, the character-sets above are represented as {10000, 11100, 11111, 01110, 00011}, and I need a way to observe that they can all be constructed by ORing together bitstrings from the smaller set {10000, 01100, 00010, 00001}.
In other words, I believe I'm looking for a "discrete basis" of a set of n different bit-vectors in {0,1}^k. This paper claims the general problem is NP-complete... but luckily I'm only looking for a solution to small cases (k < 32).
I can think of really stupid algorithms for generating the basis. For example: for each of the k^2 pairs of letters, try to demonstrate (by an O(n) search) that they're dependent. But I really feel like there's an efficient bit-twiddling algorithm that I just haven't stumbled upon yet. Does anyone know it?
EDIT: I ended up not really needing a solution to this problem after all. But I'd still like to know if there is a simple bit-twiddling solution.

I'm thinking of a disjoint-set data structure, like union-find turned on its head (rather than combining nodes, we split them).
Algorithm:
Create an array main where you assign all the positions to the same group, then:
for each bitstring curr
    for each position i
        if (curr[i] == 1)
            // max of main can be stored for constant time access
            main[i] += max of main from previous iteration
Then all the distinct numbers in main are your different sets (possibly using the actual union-find algorithm).
Example:
So, main = 22222. (I won't use 1 as groups to reduce possible confusion, as curr uses bitstrings).
curr = 10000
main = 42222 // first bit (=2) += max (=2)
curr = 11100
main = 86622 // first 3 bits (=422) += max (=4)
curr = 11111
main = 16-14-14-10-10
curr = 01110
main = 16-30-30-26-10
curr = 00011
main = 16-30-30-56-40
Then split by distinct numbers:
{10000, 01100, 00010, 00001}
Improvement:
To reduce the speed at which main increases, we can replace
main[i] += max of main from previous iteration
with
main[i] += 1 + (max - min) of main from previous iteration
EDIT: Edit based on j_random_hacker's comment
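A short Python sketch of this splitting scheme, assuming the character sets are given as strings of '0'/'1' of equal length; compact=True switches to the "Improvement" update:

```python
def split_groups(bitstrings, compact=False):
    """Group letter positions that are 1 in exactly the same bitstrings.

    Each bitstring is a str of '0'/'1' characters, all of the same length.
    """
    k = len(bitstrings[0])
    main = [2] * k  # every position starts in one group (value 2, as in the answer)
    for curr in bitstrings:
        # Amount to add, computed from main as it was before this bitstring.
        if compact:
            bump = 1 + max(main) - min(main)  # the "Improvement" variant
        else:
            bump = max(main)
        main = [v + bump if bit == "1" else v for v, bit in zip(main, curr)]
    # Positions with equal final values form one group; emit each as a bitstring.
    groups = {}
    for i, v in enumerate(main):
        groups.setdefault(v, []).append(i)
    return ["".join("1" if i in members else "0" for i in range(k))
            for members in groups.values()]


sets = ["10000", "11100", "11111", "01110", "00011"]
print(split_groups(sets))  # ['10000', '01100', '00010', '00001']
```

Why nothing merges by accident: all values stay positive, so in both variants every bumped value ends up larger than every value that was not bumped. Positions that already differed therefore stay different, and groups can only split, never collide; this is essentially partition refinement.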

You could combine the passes of the stupid algorithm at the cost of space.
Make a bit vector called violations that is (k - 1) k / 2 bits long (so, 496 for k = 32.) Take a single pass over character sets. For each, and for each pair of letters, look for violations (i.e. XOR the bits for those letters, OR the result into the corresponding position in violations.) When you're done, negate and read off what's left.
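A sketch of that single pass, with a Python int standing in for the violations bit vector and the character sets again written as '0'/'1' strings (an assumption for readability):

```python
from itertools import combinations


def dependent_pairs(bitstrings):
    """Single pass over the character sets, one violation bit per letter pair."""
    k = len(bitstrings[0])
    pairs = list(combinations(range(k), 2))  # k * (k - 1) / 2 pairs, e.g. 496 for k = 32
    violations = 0  # a Python int used as a bit vector
    for s in bitstrings:
        for bit, (i, j) in enumerate(pairs):
            if s[i] != s[j]:  # XOR of the two letters' bits is 1
                violations |= 1 << bit
    # "Negate and read off": pairs whose bit never got set always agreed.
    return [pair for bit, pair in enumerate(pairs) if not violations >> bit & 1]


print(dependent_pairs(["10000", "11100", "11111", "01110", "00011"]))  # [(1, 2)]
```

Positions 1 and 2 are the letters b and c. Because "always agree" is an equivalence relation, the surviving pairs can then be merged into groups (for example with union-find) to read off the basis.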

You could give Principal Component Analysis a try. There are some flavors of PCA designed for binary or more generally for categorical data.

Since someone showed it to be NP-complete, for large vocabs I doubt you will do better than a brute-force search (with various pruning possible) of the entire set of possibilities, O((2^k - 1) * n), at least in the worst case; some heuristics will probably help in many cases, as outlined in the paper you linked. This is your "stupid" approach generalized to all possible basis strings instead of just bases of length 2.
However, for small vocabs, I think an approach like this would do a lot better:
1. Are your words disjoint? If so, you are done (simple case of independent words like "abc" and "def").
2. Perform a bitwise AND on each possible pair of words. This gives you an initial set of candidate basis strings.
3. Go to step 1, but instead of using the original words, use the current candidate basis strings.
Afterwards you also need to include any individual letter which is not a subset of one of the final accepted candidates. Maybe some other minor bookkeeping for things like unused letters (using something like a bitwise OR on all possible words).
Considering your simple example:
First pass gives you a, abc, bc, bcd, de, d
Second pass gives you a, bc, d
Bookkeeping gives you a, bc, d, e
I don't have a proof that this is right but I think intuitively it is at least in the right direction. The advantage lies in using the words instead of the brute force's approach of using possible candidates. With a large enough set of words, this approach would become terrible, but for vocabularies up to say a few hundred or maybe even a few thousand I bet it would be pretty quick. The nice thing is that it will still work even for a huge value of k.
If you like the answer and bounty it, I'd be happy to try to solve it in 20 lines of code :) and come up with a more convincing proof. Seems very doable to me.
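A sketch that follows the numbered steps literally, using frozensets of letters in place of bitstrings (a readability choice; bitwise AND on masks would behave the same):

```python
from itertools import combinations


def pairwise_disjoint(sets):
    """Step 1: no two candidate strings share a letter."""
    return all(not (x & y) for x, y in combinations(sets, 2))


def heuristic_basis(words):
    """Follow the answer's steps literally; each word is a string of letters."""
    candidates = {frozenset(w) for w in words}
    while not pairwise_disjoint(candidates):
        # Step 2: AND (intersect) every pair of distinct candidates, drop empties.
        candidates = {x & y for x, y in combinations(candidates, 2) if x & y}
    # Bookkeeping: add used letters not covered by any accepted candidate.
    used = set().union(*words)
    covered = set().union(*candidates)
    candidates |= {frozenset(letter) for letter in used - covered}
    return candidates


basis = heuristic_basis(["a", "abc", "abcde", "bcd", "de"])
print(sorted("".join(sorted(s)) for s in basis))  # ['a', 'bc', 'd', 'e']
```

The loop always terminates: the intersection of two different candidates is strictly smaller than the larger of the two, so the largest candidate size drops every round. As the answer says, though, there is no proof of correctness, and this literal version does go wrong on some inputs: for ["a", "abc", "xyz"] it returns the single letters a, b, c, x, y, z instead of a, bc, xyz.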

How to find all possible pairs from three subsets of a set with constraints in Erlang?

I have a set M which consists of three subsets A,B and C.
Problem: I would like to calculate all possible subsets S(1)...S(N) of M which contain all possible pairs between elements of A, B and C in such manner that:
elements of A and B can happen in a pair only once for each of two positions in a pair (that is {a1,a2} and {b1,a1} can be in one subset S, but no more elements {a1,_} and {_,a1} are allowed in this subset S);
elements of C can happen 1-N times in a subset S (that is {a,c}, {b,c}, {x,c} can happen in one subset S), but I would like to get subsets S for all possible numbers of elements of C in a subset S.
For example, if we have A = [a1,a2], B = [b1,b2], C = [c1,c2], then some of the resulting subsets S would be (remember, they should contain pairs of elements):
- {a1,b1}, {b1,a2}, {a2,b2}, {b2,c1};
- {a1,b1}, {b1,a2}, {a2,b2}, {b2,c1}, {c1,c2};
- {a1,c1}, {c1,a2}, {c1,b2}, {b1,c1};
- etc.
I tend to think that first I need to find all possible subsets of M, which contain only one element of A, one element of B and 1..N elements of C (1). And after that I should somehow generate sets of pairs (2) from that. But I am not sure that this is the right strategy.
So, the more elaborated question would be:
what is the best way to create sets and find subsets in Erlang if the elements of the set M are integers?
are there any ready-made tools to find subsets of a set in Erlang?
are there any ready-made tools to generate all possible pairs of elements of a set in Erlang?
How can I solve the aforementioned problem in Erlang?

There is a sets module*, but I suspect you're better off thinking up an algorithm first; its implementation in Erlang is the problem (or not) that comes after this. (Maybe you notice it's actually a graph algorithm (like bipartite matching something something), and you'll be happy with Erlang's digraph module.)
Long story short, when you come up with an algorithm, Erlang can very probably be used to implement it. Yes, there is some support for sets. But solutions to a problem requiring "all possible subsets" tend to be exponential (i.e., given n elements, there are 2^n subsets; for every element you either have it in your subset or not) and thus bad.
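To make the 2^n argument concrete, a small sketch (in Python rather than Erlang, purely for illustration) that enumerates subsets exactly as described: bit i of a counter decides whether element i is in or out.

```python
def all_subsets(elements):
    """Enumerate every subset: bit i of mask says whether elements[i] is included."""
    n = len(elements)
    return [[elements[i] for i in range(n) if mask >> i & 1]
            for mask in range(2 ** n)]


print(all_subsets([1, 2, 3]))
# [[], [1], [2], [1, 2], [3], [1, 3], [2, 3], [1, 2, 3]]
print(len(all_subsets(list(range(10)))))  # 1024
```

Each extra element doubles the output, so 30 elements already mean about a billion subsets; that is why it pays to find a smarter algorithm before worrying about the Erlang implementation.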
(* there are some modules concerning sets)

What is determining the items that make the difference between two arrays called?

I want to find which elements of two arrays make the two arrays different.
For example, if I start off with
known_unacceptable_array = [bad, bad, good, good, good, bad, good]
known_acceptable_array = []
and an array is only unacceptable if it contains three bads (but I don't know that at the time). Since I'm able to evaluate whether an array is acceptable or unacceptable, I would like to find the smallest array that is still unacceptable:
possibly_minimal_unacceptable = [bad, bad, bad]
maximal_acceptable = [bad, bad] # Third bad required to make the array unacceptable
What is this problem called, and what algorithms are there for this?
Edit: The elements can't be changed in order, and adding an element can only either change the list from being acceptable to unacceptable or have no effect - it can't change it from being unacceptable to acceptable.
Background: I've randomly generated thousands of instructions that make a Ruby interpreter crash, and I want to isolate the specific instructions that cause the crash; at the time, I thought that multiple bad instructions were required to make it crash. A very naive attempt to determine what the bad instructions are is at this link.
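Neither answer below proposes a search procedure, so here is only a hedged sketch of one straightforward option under the monotonicity stated in the edit: try dropping each element in turn and keep the drop whenever the array is still unacceptable. The crashes predicate is a made-up stand-in for actually running the interpreter.

```python
def shrink_unacceptable(items, is_unacceptable):
    """Greedily drop elements while the array stays unacceptable.

    Assumes the monotonicity from the edit: removing elements can only turn
    an unacceptable array acceptable, never the reverse. Order is preserved.
    """
    if not is_unacceptable(items):
        raise ValueError("the starting array must be unacceptable")
    current = list(items)
    i = 0
    while i < len(current):
        candidate = current[:i] + current[i + 1:]  # try without element i
        if is_unacceptable(candidate):
            current = candidate  # element i was not needed; stay at i for the next one
        else:
            i += 1  # element i is needed for the failure
    return current


def crashes(arr):  # hypothetical stand-in; the real rule is unknown
    return arr.count("bad") >= 3


start = ["bad", "bad", "good", "good", "good", "bad", "good"]
print(shrink_unacceptable(start, crashes))  # ['bad', 'bad', 'bad']
```

Thanks to monotonicity, one pass is enough: an element that was needed earlier is still needed in any smaller array, so removing any single element of the result makes it acceptable. The pass costs len(items) + 1 evaluations; for thousands of instructions, the "delta debugging" family of algorithms is designed to need fewer runs by removing chunks at a time.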

What is determining the elements that make the difference between two arrays called?
Differencing is often called subtraction.
I want to determine which elements of two arrays make the two arrays different.
Again, that's subtraction (at least some form of it):
Given A = { x, y, z } and B = { x, y, a },
A - B = { z, -a }
or "only A has z and only B has a", or "z and a" make them different.
For example, if I start off with
known_bad = [bad, bad, good, good, good, bad, good]
known_good = []
Why start with a full array and an empty one? Isn't this an extreme case, or are these "two arrays" not the two of which you are trying to determine the "difference"?
possibly_minimal_bad = [bad, bad, bad]
maximal_good = [bad, bad] # Third bad required to make the list bad
Is this just a set of rules? Or is this the result of finding the difference between the two arrays of the previous (known_good, bad) set?
What is this problem called, and what algorithms are there for this?
If it isn't called "difference" or "subtraction", then why introduce it that way?
Is the problem:
a. going from the first two arrays (known_xx) to the second two (min, max); or
b. classifying finite sequences of the words "good" and "bad"?
a) I can't see a relation between the first two arrays and the second two. How did you get from the first two to the second?
b) Classifying a sequence of words could be "parsing a language", or decoding a message, recognizing a pattern, etc.
Is it "Pattern Recognition"?
It appears that you are looking for a pattern in test input (or test point) data and its relationship to product failure, and want to represent the relationship in some codified form for further analysis. Or you are searching for a correlation between certain test points and product failure. That makes this question rather interesting. However, the presentation of the question is quite confusing. Maybe those groups of equations could be explained a little more, clarifying whether they are related, and if so, in what way.

I'm not entirely sure if I understand the question. If my answer is unsatisfactory, please rephrase your question to be more clear. I'll base my answer on this.
I want to determine which elements of two arrays make the two arrays different.
This is a combination of the three set operations union, intersection and difference. Different combinations can achieve the same result.
The (relative) complement, or difference, A\B is the subset of A that is not in B.
The intersection A∩B is the set of elements that are in both A and B, not just in one of them.
The union A∪B is the set of elements that are in either A or B (without duplicates).
It sounds like you want the union of both complements, which is:
A\B ∪ B\A
Or, equivalently, the complement of the intersection within the union:
(A∪B) \ (A∩B)
See http://en.wikipedia.org/wiki/Set_operations_(Boolean) for more information.
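This combination is known as the symmetric difference. Python's built-in sets have it as the ^ operator, so the two formulations can be checked against each other directly (reusing the x, y, z / x, y, a example from the other answer):

```python
A = {"x", "y", "z"}
B = {"x", "y", "a"}

only_in_one = (A - B) | (B - A)  # A\B ∪ B\A
same_thing = (A | B) - (A & B)   # (A∪B) \ (A∩B)
print(only_in_one == same_thing == A ^ B)  # True; ^ is the symmetric difference
print(sorted(A ^ B))                       # ['a', 'z']
```

Note that sets ignore order and duplicates, so this only fits if the order of array elements doesn't matter.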

Decision Tree learning algorithm

I want to preface this by saying that this is a homework assignment.
I am given a set of Q binary input variables that will be used to classify output of Y which is also binary.
The first part of the question is: at most how many examples do I need to enumerate all possible combinations of Q? I currently think that since it asks for "at most", I will need Q, as it is possible that all values up to Q-1 are the same, for instance 1, and the item at Q is 0.
The second part of the question is: at most how many leaf nodes can the tree have given Z examples?
My current answer is that at most the tree would have 2 leaf nodes, one representing true and one representing false since it is dealing with binary inputs and binary outputs.
Is this the correct way of examining this problem or am I generalizing my answers too deeply?
Edit
After looking at Cameron's response, I would now change my first answer to 2^Q, and building on his example of Q = 3, I would get 2^3 or 8 (2*2*2). Please correct me if that reasoning is wrong.
Edit #2
For the second part of the question, it appears as though it should be (2^Q) * Z; to provide an example: (2^3) * 3 = 8*3 = 24 leaf nodes. To recap: if I have 3 binary inputs, I would initially take 2^3 and get 8; now I want to go over 3 examples, so I should get 8*3 = 24.
Edit #3
In hindsight, it seems that no matter how many examples I use, the number of leaf nodes should never increase, as it is on a per-tree basis.

I'd suggest you approach the problem by working out small example cases by hand.
For the first part, choose a small value for Q, say 3, and write down all possible combinations of Q. Then you can figure out how many examples you need. Increase Q and do it again.
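If writing the combinations out by hand gets tedious as Q grows, a short Python helper can list them so you can count them (itertools.product with repeat=q yields every tuple of q values drawn from (0, 1)):

```python
from itertools import product


def all_input_combinations(q):
    """Every assignment of Q binary input variables, as tuples of 0s and 1s."""
    return list(product((0, 1), repeat=q))


for row in all_input_combinations(3):
    print(row)  # (0, 0, 0), (0, 0, 1), ..., (1, 1, 1)
print(len(all_input_combinations(3)))  # 8
```

Running it for Q = 1, 2, 3, 4 and comparing the counts is a quick way to spot the pattern.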
For the second part of your question, pick a small Z and run the decision tree algorithm by hand. See how many leaves you get. Then pick another Z and see if/how it changes. Try generating different examples (with the same Z) and see if you can change the number of leaves.
