Compute theta join in relational algebra - relational-algebra

I'm having trouble with this problem:
Suppose relation R(A,B) has the following tuples:
A B
1 2
3 4
5 6
and relation S(B,C,D) has the following tuples:
B C D
2 4 6
4 6 8
4 7 9
Compute the theta-join of R and S with the condition R.A < S.C AND R.B < S.D. Which of the following tuples is in the result? Assume each tuple has schema (A, R.B, S.B, C, D).
Choose from the following answers:
(3,4,2,4,6)
(1,2,4,4,6)
(1,2,2,6,8)
(3,4,4,7,8)
So when I try it, I see that
(1, 2) matches (2, 4, 6)
(3, 4) matches (4, 6, 8)
(3, 4) matches (4, 7, 9)
so I found the following tuples (they all respect the condition):
(1, 2, 2, 4, 6)
(3, 4, 4, 6, 8)
(3, 4, 4, 7, 9)
The problem is that none of these are found in the multiple choices...
Am I doing something wrong?
Thanks for the help!

To compute a theta-join, take the Cartesian product of the two relations (here, R and S) to get all possible combinations of tuples, then apply the condition theta to each combined tuple and keep only those for which it is true.
Here, the Cartesian product gives 3x3 = 9 tuples. Of them, 8 tuples satisfy the condition (R.A < S.C AND R.B < S.D); the only one that fails is (5,6,2,4,6), because 5 < 4 is false. That makes the tuple (3,4,2,4,6) an element of the theta-join result.
What you have computed is a theta-join for (R.B = S.B AND R.A < S.C AND R.B < S.D), i.e. you added an equality on B that the problem does not ask for. Hope that helps you see the difference.
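The procedure described above can be sketched in Python. This is only an illustration: the helper name `theta_join` and the representation of relations as lists of tuples are assumptions made for the example, not part of the original problem.

```python
from itertools import product

# Relations from the question, as lists of tuples.
R = [(1, 2), (3, 4), (5, 6)]            # schema (A, B)
S = [(2, 4, 6), (4, 6, 8), (4, 7, 9)]   # schema (B, C, D)

def theta_join(r, s, condition):
    """Cartesian product of r and s, keeping only pairs that satisfy condition."""
    return [rt + st for rt, st in product(r, s) if condition(rt, st)]

# Condition theta: R.A < S.C AND R.B < S.D
result = theta_join(R, S, lambda rt, st: rt[0] < st[1] and rt[1] < st[2])

for row in result:
    print(row)  # each row has schema (A, R.B, S.B, C, D)
```

Note that the condition never mentions S.B, so tuples such as (3, 4, 2, 4, 6), where R.B and S.B differ, are kept.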

Related

Data Structure to convert a stream of integers into a list of range

A function is given with a method to get the next integer from a stream of integers. The numbers are fetched sequentially from the stream. How will we go about producing a summary of integers encountered till now?
Given a list of numbers, the summary will consist of the ranges of numbers. Example: The list till now = [1,5,4,2,7] then summary = [[1-2],[4-5],7]
Put the number in ranges if they are continuous.
My Thoughts:
Approach 1:
Maintain the sorted numbers. So when we fetch a new number from a stream, we can use binary search to find the location of the number in the list and insert the element so that the resulting list is sorted. But since this is a list, I think inserting the element will be an O(N) operation.
Approach 2:
Use a balanced binary search tree such as a red-black tree or an AVL tree. Each insertion will be O(log N),
and an in-order traversal will yield the sorted sequence from which one can compute the ranges in O(N).
Approach 2 looks like a better approach if I am not making any mistakes. I am unsure if there is a better way to solve this issue.
I'd not keep the original numbers, but aggregate them to ranges on the fly. This has the potential to reduce the number of elements by quite some factor (depending on the ordering and distribution of the incoming values). The task itself seems to imply that you expect contiguous ranges of integers to appear quite frequently in the input.
Then a newly incoming number can fall into one of a few cases:
It is already contained in some range: then simply ignore the number (this is only relevant if duplicate inputs can happen).
It is adjacent to none of the ranges so far: create a new single-element range.
It is adjacent to exactly one range: extend that range by 1, downward or upward.
It is adjacent to two ranges (i.e. fills the gap): merge the two ranges.
For the data structure holding the ranges, you want a good performance for the following operations:
Find the place (position) for a given number.
Insert a new element (range) at a given place.
Merge two (neighbor) elements. This can be broken down into:
Remove an element at a given place.
Modify an element at a given place.
Depending on the expected number and sparsity of ranges, a sorted list of ranges might do. Otherwise, some kind of search tree might turn out to be helpful.
Anyway, start with the most readable approach, measure performance for typical cases, and decide whether some optimization is necessary.
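As a rough sketch of the "sorted list of ranges" variant, assuming Python and the standard `bisect` module: the class name `RangeSummary` and its two parallel lists are hypothetical choices for illustration. Finding the position is O(log k) for k ranges, but inserting into or deleting from a Python list is still O(k).

```python
from bisect import bisect_right

class RangeSummary:
    """Hypothetical helper: keeps disjoint, non-adjacent ranges sorted by start."""

    def __init__(self):
        self.starts = []  # sorted start of each range
        self.ends = []    # ends[i] is the inclusive end of the range starting at starts[i]

    def add(self, x):
        i = bisect_right(self.starts, x)  # ranges before index i start at or before x
        # Already contained in the range to the left: ignore.
        if i > 0 and self.ends[i - 1] >= x:
            return
        touches_left = i > 0 and self.ends[i - 1] == x - 1
        touches_right = i < len(self.starts) and self.starts[i] == x + 1
        if touches_left and touches_right:
            # x fills the gap: merge the two neighbouring ranges.
            self.ends[i - 1] = self.ends[i]
            del self.starts[i]
            del self.ends[i]
        elif touches_left:
            self.ends[i - 1] = x        # extend the left range upward
        elif touches_right:
            self.starts[i] = x          # extend the right range downward
        else:
            self.starts.insert(i, x)    # new single-element range
            self.ends.insert(i, x)

    def ranges(self):
        return list(zip(self.starts, self.ends))

summary = RangeSummary()
for value in [1, 5, 4, 2, 7]:
    summary.add(value)
print(summary.ranges())  # [(1, 2), (4, 5), (7, 7)]
```

If the number of ranges grows large, the two lists could be swapped for a balanced tree or similar structure without changing the case analysis.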
I suggest maintaining a hashmap that maps each integer seen so far to the interval it belongs to.
Make sure that two numbers that are part of the same interval will point to the same interval object, not to copies; so that if you update an interval to extend it, all numbers can see it.
All operations are O(1), except the operation "merge two intervals", which happens if the stream produces integer x when we have two intervals [a, x - 1] and [x + 1, b]. The cost of a merge is proportional to the length of the shorter of these two intervals.
As a result, for a stream of n integers, the algorithm's complexity is O(n) in the best-case (where at most a few big merges happen) and O(n log n) in the worst-case (when we keep merging lots of intervals).
In python:
def add_element(intervals, x):
    if x in intervals: # do not do anything
        pass
    elif x + 1 in intervals and x - 1 in intervals: # merge two intervals
        i = intervals[x - 1]
        j = intervals[x + 1]
        if i[1]-i[0] > j[1]-j[0]: # j is shorter: update i, and make everything in j point to i
            i[1] = j[1]
            for y in range(j[0] - 1, j[1]+1):
                intervals[y] = i
        else: # i is shorter: update j, and make everything in i point to j
            j[0] = i[0]
            for y in range(i[0], i[1] + 2):
                intervals[y] = j
    elif x + 1 in intervals: # extend one interval to the left
        i = intervals[x + 1]
        i[0] = x
        intervals[x] = i
    elif x - 1 in intervals: # extend one interval to the right
        i = intervals[x - 1]
        i[1] = x
        intervals[x] = i
    else: # add a singleton
        intervals[x] = [x,x]
    return intervals

from random import shuffle

def main():
    stream = list(range(10)) * 2
    shuffle(stream)
    print(stream)
    intervals = {}
    for x in stream:
        intervals = add_element(intervals, x)
        print(x)
        print(set(map(tuple, intervals.values()))) # this line terribly inefficient because I'm lazy

if __name__=='__main__':
    main()
Output:
[1, 5, 8, 3, 9, 6, 7, 9, 3, 0, 6, 5, 8, 1, 4, 7, 2, 2, 0, 4]
1
{(1, 1)}
5
{(1, 1), (5, 5)}
8
{(8, 8), (1, 1), (5, 5)}
3
{(8, 8), (1, 1), (5, 5), (3, 3)}
9
{(8, 9), (1, 1), (5, 5), (3, 3)}
6
{(3, 3), (1, 1), (8, 9), (5, 6)}
7
{(5, 9), (1, 1), (3, 3)}
9
{(5, 9), (1, 1), (3, 3)}
3
{(5, 9), (1, 1), (3, 3)}
0
{(0, 1), (5, 9), (3, 3)}
6
{(0, 1), (5, 9), (3, 3)}
5
{(0, 1), (5, 9), (3, 3)}
8
{(0, 1), (5, 9), (3, 3)}
1
{(0, 1), (5, 9), (3, 3)}
4
{(0, 1), (3, 9)}
7
{(0, 1), (3, 9)}
2
{(0, 9)}
2
{(0, 9)}
0
{(0, 9)}
4
{(0, 9)}
You could use a Disjoint Set Forest implementation for this. If well-implemented, it gives a near linear time complexity for inserting 𝑛 elements into it. The amortized running time of each insert operation is Θ(α(𝑛)) where α(𝑛) is the inverse Ackermann function. For all practical purposes we can not distinguish this from O(1).
The extraction of the ranges can have a time complexity of O(𝑘), where 𝑘 is the number of ranges, provided that the disjoint set maintains the set of root nodes. If the ranges need to be sorted, then this extraction will have a time complexity of O(𝑘log𝑘), as it will then just perform the sort-operation on it.
Here is an implementation in Python:
class Node:
    def __init__(self, value):
        self.low = value
        self.parent = self
        self.size = 1

    def find(self): # Union-Find: Path splitting
        node = self
        while node.parent is not node:
            # Re-point node at its grandparent, then step to the old parent.
            # Targets are assigned left to right, so node.parent must come first.
            node.parent, node = node.parent.parent, node.parent
        return node

class Ranges:
    def __init__(self):
        self.nums = dict()
        self.roots = set()

    def union(self, a, b): # Union-Find: Size-based merge
        a = a.find()
        b = b.find()
        if a is not b:
            if a.size > b.size:
                a, b = b, a
            self.roots.remove(a) # Keep track of roots
            a.parent = b
            b.low = min(a.low, b.low)
            b.size = a.size + b.size

    def add(self, n):
        if n not in self.nums:
            self.nums[n] = node = Node(n)
            self.roots.add(node)
            if (n+1) in self.nums:
                self.union(node, self.nums[n+1])
            if (n-1) in self.nums:
                self.union(node, self.nums[n-1])

    def get(self):
        return sorted((node.low, node.low + node.size - 1) for node in self.roots)

# example run
ranges = Ranges()
for n in 4, 7, 1, 6, 2, 9, 5:
    ranges.add(n)
print(ranges.get()) # [(1, 2), (4, 7), (9, 9)]

Filling in Julia matrix with nested for loops

I have two arrays and an empty matrix, I need to perform a function such that the resulting matrix includes every combination of the two arrays.
Unfortunately I cannot run the arrays separately as they are both optional parameters for the function. I thought that the best way to do this was through nested loops but now I am unsure...
I've tried multiplying one of the matrices so that it includes the necessary duplicates, but I struggled with that as the real data is somewhat larger.
I've tried many versions of these nested loops.
a = [ 1 2 3 ]
b = [ 4 5 6 7 ]
ab = zeros(3,4)
for i = 1:length(a)
    for j = 1:length(b)
        ab[??] = function(x = a[??], y = b[??])
    end
end
ab = [1x4 1x5 1x6 1x7, 2x4 2x5 2x6 2x7, 3x4 3x5 3x6 3x7]
Your problem can be solved by broadcasting:
julia> f(x, y) = (x,y) # trivial example
f (generic function with 1 method)
julia> f.([1 2 3]', [4 5 6 7])
3×4 Array{Tuple{Int64,Int64},2}:
(1, 4) (1, 5) (1, 6) (1, 7)
(2, 4) (2, 5) (2, 6) (2, 7)
(3, 4) (3, 5) (3, 6) (3, 7)
The prime in a' transposes a to make the shapes work out correctly.
But note that a = [ 1 2 3 ] constructs a 1×3 Array{Int64,2}, which is a matrix. For a vector (what you probably call "array"), use commas: a = [ 1, 2, 3 ] etc. If you have your data in that form, you have to transpose the other way round:
julia> f.([1,2,3], [4,5,6,7]')
3×4 Array{Tuple{Int64,Int64},2}:
(1, 4) (1, 5) (1, 6) (1, 7)
(2, 4) (2, 5) (2, 6) (2, 7)
(3, 4) (3, 5) (3, 6) (3, 7)
BTW, this is called an "outer product" (for f = *), or a generalization of it. And if f is an infix operator ∘, you can use dotted infix broadcasting, where the dot goes before the operator: a' .∘ b.
Isn't that just
a'.*b
?
Oh, now I have to write some more characters to get past the minimum acceptable answer length but I don't really have anything to add, I hope the code is self-explanatory.
You can also use an array comprehension:
julia> a = [1,2,3];
julia> b = [4,5,6,7];
julia> ab = [(x,y) for x in a, y in b]
3×4 Array{Tuple{Int64,Int64},2}:
(1, 4) (1, 5) (1, 6) (1, 7)
(2, 4) (2, 5) (2, 6) (2, 7)
(3, 4) (3, 5) (3, 6) (3, 7)

Exact number of comparisons in Insertion Sort

I want to get the number of permutations of {1, ..., n} for which Insertion Sort does exactly n(n-1)/2 comparisons.
For example, for {1, 2, 3, 4} we get (4, 3, 2, 1), (3, 4, 2, 1), (4, 2, 3, 1) etc. - for all of them Insertion Sort does 4*3/2 = 6 comparisons.
Does anybody know an exact formula for that?
I am thinking about something like (n-1) + 1 = n, where
1 stands for reverse sequence and then we can swap all of (n-1) pairs in reverse sequence.
Here is a hint. The complete list for (1, 2, 3, 4) are:
(4, 3, 2, 1)
(3, 4, 2, 1)
(4, 2, 3, 1)
(2, 4, 3, 1)
(4, 3, 1, 2)
(3, 4, 1, 2)
(4, 1, 3, 2)
(1, 4, 3, 2)
Look at it from last column to first.
Walk step by step through the insertion sorts. See where they merge. Do you see a pattern there?
Reversing it, can you figure out how I generated this list? Can you prove that the list is complete?
The why is what matters here. Just saying 2^(n-1) is useless.
n(n-1)/2 is the sum of the integers from 1 to n - 1. Since your sequence has length n, you can extend that range to 0 through n - 1, one term per inserted element.
The number of swaps for each insertion would be:
run #   list         value   swaps
1       []           a       0 (no swaps possible)
2       [a]          b       1
3       [b, a]       c       2
...
10      [i,...,a]    j       9
...
n       [...]        ?       n - 1
So every inserted element has to be compared against the entire sorted prefix in order to reach the required count. For a single insertion, the number of comparisons can be at most one higher than the number of swaps, which means each value being inserted must end up at either the first or the second index of the resulting list.
Put differently, assuming ascending ordering of the output:
The input list should in general be a nearly descending list, where each element in the list may be preceded by at most one element that is not larger than the element in question.
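A brute-force check can make the pattern concrete for small n. The sketch below assumes a textbook insertion sort that scans the sorted prefix from right to left and counts one comparison per element inspected; the function names are made up for this example, and other insertion sort variants (for instance binary insertion) would count differently.

```python
from itertools import permutations

def insertion_sort_comparisons(values):
    """Sort a copy with a textbook insertion sort and count key comparisons."""
    a = list(values)
    comparisons = 0
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0:
            comparisons += 1      # compare key with a[j]
            if a[j] <= key:
                break             # key belongs right after a[j]
            a[j + 1] = a[j]       # shift the larger element one slot right
            j -= 1
        a[j + 1] = key
    return comparisons

def worst_case_permutations(n):
    """All permutations of 1..n that need exactly n(n-1)/2 comparisons."""
    target = n * (n - 1) // 2
    return [p for p in permutations(range(1, n + 1))
            if insertion_sort_comparisons(p) == target]

for n in range(1, 7):
    print(n, len(worst_case_permutations(n)))
```

For n = 4 this finds the same 8 permutations as the list in the hint above, and the count doubles with each additional element, which is consistent with the "first or second index" argument.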

combination of elements from vectors [duplicate]

This question already has answers here:
Generating Combinations in python
(3 answers)
Closed 9 years ago.
I have several vectors a=[1 2 3 ...], b=[1 2 3 ...], c=[1 2 3 ...]. I have to find all possible combinations composed from elements taken from each of these vectors like:
[1 1 1]
[1 1 2]
[3 3 3]
etc.
The problem is that I have to exclude combinations containing the same elements, since order does not matter. For example, if the combination [1 2 1] is present, the combination [2 1 1] should be excluded. How can I do that in any programming language (Python is preferred)?
I am not sure I have completely understood your requirements, but you may find that itertools is helpful.
For example:
from itertools import combinations_with_replacement as cr
for a in cr([1,2,3],3):
    print(a)
prints
(1, 1, 1)
(1, 1, 2)
(1, 1, 3)
(1, 2, 2)
(1, 2, 3)
(1, 3, 3)
(2, 2, 2)
(2, 2, 3)
(2, 3, 3)
(3, 3, 3)
This might work if you're not that worried about efficiency.
from itertools import product
def specialCombinations(*vectors):
    return {tuple(sorted(i)): i for i in product(*vectors)}.values()
It takes the Cartesian product of the input vectors and filters the
ones equivalent under permutation.

How to compute a natural join?

Could someone explain to me what is going on here and how to solve this problem?
Suppose relation R(A,B) has the tuples:
A B
1 2
3 4
5 6
and the relation S(B,C,D) has tuples:
B C D
2 4 6
4 6 8
4 7 9
Compute the natural join of R and S. Then, identify which of the following tuples is in the
natural join R |><| S. You may assume each tuple has schema (A,B,C,D).
I don't know what a natural join truly means. Can you explain it to me?
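A natural join joins ("sticks together") tuples from two relations wherever they agree on all attributes the relations have in common; the shared attributes appear only once in the result. In this example the only common attribute is B, so a tuple of R matches a tuple of S when their B values are equal: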
(1, 2) matches (2, 4, 6) so you get (1, 2, 4, 6)
(3, 4) matches (4, 6, 8) so you get (3, 4, 6, 8)
(3, 4) matches (4, 7, 9) so you get (3, 4, 7, 9)
So the natural join is {(1, 2, 4, 6), (3, 4, 6, 8), (3, 4, 7, 9)}
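The matching above can be reproduced with a few lines of Python. This is just a sketch for this particular pair of schemas: the function name `natural_join_on_b` is invented for the example, and it hard-codes B as the only shared attribute.

```python
R = [(1, 2), (3, 4), (5, 6)]            # schema (A, B)
S = [(2, 4, 6), (4, 6, 8), (4, 7, 9)]   # schema (B, C, D)

def natural_join_on_b(r, s):
    """Join R(A, B) with S(B, C, D) on the shared attribute B; B is kept once."""
    return [(a, b, c, d)
            for (a, b) in r
            for (b2, c, d) in s
            if b == b2]

joined = natural_join_on_b(R, S)
print(joined)  # [(1, 2, 4, 6), (3, 4, 6, 8), (3, 4, 7, 9)]
```

The tuple (5, 6) from R contributes nothing because no tuple in S has B = 6.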
I assume R(A,B) is the master, S(B,C,D) is the detail and B is the foreign key.
SQL: select R.A, R.B, S.C, S.D from R, S where R.B = S.B
Then the result is:
A B C D
1 2 4 6
3 4 6 8
3 4 7 9
