Relational algebra statement to eliminate duplicates - relational-algebra

Given the following table schema
customer (name: string, credit: integer)
allowance (no: string, type: string)
asker(cname: string, lno: string)
asker.cname and asker.lno are foreign keys referencing customer and allowance respectively, whose keys are name and no (number).
I am trying to write the relational algebra for the query that finds pairs of names of customers who share the same allowance. It should avoid pairing a customer with himself, e.g. (Tim, Tim), and avoid listing the same pair twice, e.g. (Tim, Jane) and (Jane, Tim) should appear only once.
What I have tried is:
ρ(Cust1, π no (allowance))
ρ(Cust2, π no (allowance))
π name, name((Cust1 ∩ Cust2)(customer))
I believe this is incorrect. Specifically, I am having trouble finding customers with the same allowance while also avoiding pairing a customer with himself and avoiding repeated pairs.

Below, name1 and name2 are the two name columns you get by joining asker with a renamed copy of itself on lno.
Eliminate (Tim, Tim) via σ (name1 <> name2). (The resulting relation(ship) is irreflexive.)
Eliminate just one of (Tim, Jane) & (Jane, Tim) but not (Tim, Tim) via σ (name1 op name2) where op is one of <= or >=. (The resulting relation(ship) is anti-symmetric.)
Eliminate both cases via σ (name1 op name2) where op is one of < or >. (The resulting relation(ship) is irreflexive and anti-symmetric.)
(Re querying & algebra see this answer.)
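To make the last selection concrete, here is a minimal Python sketch of the self-join followed by σ (name1 < name2). The sample rows and the function name sharing_pairs are made up for illustration; only the asker(cname, lno) schema comes from the question.

```python
# Hypothetical sample rows for asker(cname, lno); not taken from the question.
asker = {("Tim", "A1"), ("Jane", "A1"), ("Bob", "A2"), ("Tim", "A2")}


def sharing_pairs(asker_rows):
    """Self-join asker on lno, then keep only rows where name1 < name2.

    The strict comparison drops (Tim, Tim) and keeps exactly one of
    (Tim, Jane) / (Jane, Tim).
    """
    return {
        (name1, name2)
        for (name1, lno1) in asker_rows
        for (name2, lno2) in asker_rows
        if lno1 == lno2 and name1 < name2
    }


pairs = sharing_pairs(asker)
print(sorted(pairs))
```

Because the result is a set, a pair that shares more than one allowance still shows up only once, which matches the set semantics of relational algebra.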

Related

Using multiple OR conditions in ArrayFormula

I've a column of names
Smith John Sr
Smith John R
Smith Jr John L
Smith III John
Smith John IV
I know the Last Name is always the first word.
I know the First is sometimes in the second word.
However, sometimes "Jr" or "III" may be there instead.
Otherwise, the title can be in the fourth word if it is not the middle initial.
How do I use an ArrayFormula with multiple OR conditions so that I can extract into a second column the First Name, the Last Name, and the applicable Title (e.g. Jr, Sr, III, IV)? I want to disregard the initial, which I figured I could do with a character count "=1" within the OR logic, correct?
If you already have a working formula, you can use + to join conditions of which at least one must hold (like OR) and * to join conditions that must all hold (like AND). For example:
=IF(OR(A1=1,B1=1),1,0)
As an array formula you could write it as:
=ArrayFormula(IF((A1:A=1)+(B1:B=1),TRUE,FALSE))
Since TRUE conditions are considered as 1 and FALSE conditions as 0, when you sum two conditions it will return 1 or more if one of them is TRUE; and with that positive result, it will apply the true statement of the IF. Hope it's useful
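The same trick can be tried outside Sheets. This is only a Python sketch with made-up sample columns a and b (the helper names are mine too), but it shows why adding booleans behaves like OR and multiplying them behaves like AND:

```python
# Hypothetical sample columns standing in for A1:A and B1:B.
a = [1, 0, 2, 1]
b = [0, 0, 1, 1]


def or_by_sum(col_a, col_b):
    """TRUE counts as 1 and FALSE as 0, so a sum > 0 means at least one condition held."""
    return [((x == 1) + (y == 1)) > 0 for x, y in zip(col_a, col_b)]


def and_by_product(col_a, col_b):
    """A product is non-zero only when every condition held."""
    return [((x == 1) * (y == 1)) > 0 for x, y in zip(col_a, col_b)]


print(or_by_sum(a, b))
print(and_by_product(a, b))
```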

Relational Algebra: Problems with division operator

I am writing a query, where I am using the division operator. For some reason I can't get it to work properly, and I can't see why.
pol = pi allergen (sigma allergy_type = 'pollen' (allergies))
tmp = (patient_allergies/pol)
tmp
The above is my query. In pol I am retrieving all allergens that have the allergy type pollen. This gives me a table with one column and two rows, containing the two allergens whose allergy_type is pollen.
tmp:
Patient_allergies is a table with 2 columns and 23 rows. The first column holds the allergens, the second column the ssn of the people with those allergens.
What I am trying to do is get everyone in patient_allergies who has both of the allergens I found in pol. I'm pretty sure I need the division operator for this, but it returns an empty list, which is incorrect for what I am trying to do.
EDIT: I am using this relational algebra calculator, provided by our university: dbis-uibk.github.io/relax/calc/local/uibk/local/0 There is a division operator with another symbol, but it yields the same result.
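For reference, this is what division is expected to compute, as a small Python sketch. The sample rows are invented and the function name divide is mine; it is not how the calculator implements the operator.

```python
# Hypothetical sample data; the real table has 23 rows.
patient_allergies = {
    ("birch", "111"),
    ("grass", "111"),
    ("birch", "222"),
    ("nuts", "333"),
}
pol = {"birch", "grass"}


def divide(dividend, divisor):
    """Relational division for rows of the form (allergen, ssn).

    Returns every ssn that is paired with *all* allergens in the divisor.
    """
    ssns = {ssn for (_, ssn) in dividend}
    return {
        ssn
        for ssn in ssns
        if all((allergen, ssn) in dividend for allergen in divisor)
    }


print(divide(patient_allergies, pol))
```

Note that the allergen values have to match exactly: in this sketch, if no ssn is paired with every value in pol, the result is empty.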

Most common "denominators" in a two column list in Google Sheets

How can I find the most commonly found 'Code' (Col B) associated with each unique 'Name' in (Col A) and find the closest value if the 'Code' in Col B is unique?
The image below shows the shared Google Sheet with the starting data in columns A & B and the desired output in columns C and D. Each unique Name has associated codes. Column D displays the most commonly occurring Code for each unique name. For example, Buick La Sabre 1 has 3 associated codes in B3, B4, B5, but D3 contains only 98761 because it appears more frequently than the other 2 codes do in B2:B. I will explain what I mean by the closest value below.
The Codes that have a count = 1 are unique so the output in column D tries to find the closest match.
However, when the count of the code in B2:B is > 1, the output in column D equals the most frequent code associated with the Name.
Approach when there are 2 or more of the same values in column B
Query
I thought I might use a QUERY with an ORDER BY count(B) DESC LIMIT 2 in a fashion similar to this working formula:
QUERY($A$1:$D$25,"SELECT A, B ORDER BY B DESC Limit 2",1)
but I could not get it to work when I substituted in the Count function.
SORT & INDEX OR VLOOKUP
If the query function can't be fixed to work, then I thought another approach might be to combine a Vlookup/Index after sorting column B in a descending order.
UNIQUE(sort($B$3:$B,if(len($B$3:$B),countif($B$3:$B,$B$3:$B),),0,1,1))
Since a VLOOKUP or INDEX using multiple criteria just pulls the first matching value it finds, looking up each name in the sorted data would then return the most frequent value.
Approach when there is < 2 of the same values in column B
This is a little more complicated since the values can be numbers and letters.
A solution like the one in the image below could be used if everything were a number. In our case there will usually be a 3 to 5 character alphanumeric code starting with 0 or 1 letters and followed by numbers. I'm not sure what the best way to match a code like A1234 would be. I imagine a solution might be to SPLIT off the letters and try to match those first. For example, A1234 would be split into A | 1234, then matched to the closest letter and then the closest number. But I really am not sure what the best solution might be within the constraints of Google Sheets.
In the event that a number is equidistant between two numbers, the lower number should be chosen. For example, if 8 is the number and the closest match would be 6 or 10, then 6 should be selected.
In the event that a letter is being used, it should work in a similar fashion. For example, thinking of {A, B, C} as {1, 2, 3}, B should preferentially match to A since it comes before C.
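To pin down the tie rule described above, here is a minimal Python sketch (not a Sheets formula). The function name closest is hypothetical, and letters are compared by character code, which is an assumption that agrees with the {A, B, C} as {1, 2, 3} idea:

```python
def closest(target, candidates):
    """Return the candidate nearest to target; on a tie, prefer the lower one."""
    # Sort key: distance first, then the value itself, so ties go to the smaller value.
    return min(candidates, key=lambda c: (abs(c - target), c))


# Numbers: 6 and 10 are both 2 away from 8, so the lower one wins.
print(closest(8, [6, 10, 3]))

# Letters, compared by character code: A and C are both 1 away from B.
print(chr(closest(ord("B"), [ord("A"), ord("C")])))
```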
In summary, looking for a way to find the most frequently associated code in col B that is associated with unique names in col A in this sheet and; In the event where there are none of the same codes in B2:B, a formula that will find the closest match for a number or alphanumeric code.
You can use this formula:
=QUERY({range of numerators & denominators}, "select Col2, count(Col2) group by Col2 label Col2 'Denominator', count(Col2) 'Count'")
That outputs something like this:
Denominator    Count
Den 1          Count 1
Den 2          Count 2
use:
=ARRAY_CONSTRAIN(SORTN(QUERY({A3:B},
"select Col1,Col2,count(Col2)
where Col1 is not null
group by Col1,Col2
order by count(Col2) desc,Col2 asc
label count(Col2)''"), 9^9, 2, 1, 1), 9^9, 2)
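As a cross-check of what the formula is meant to return, here is the same logic as a Python sketch: count each (name, code) pair, then keep one code per name, preferring the highest count and, on ties, the smaller code. The sample rows and the function name most_common_code are assumptions for illustration only.

```python
from collections import Counter, defaultdict

# Hypothetical rows standing in for A3:B (name, code).
rows = [
    ("Buick La Sabre 1", "98761"),
    ("Buick La Sabre 1", "98761"),
    ("Buick La Sabre 1", "12345"),
    ("Ford Pinto", "A1234"),
    ("Ford Pinto", "A1000"),
]


def most_common_code(pairs):
    """Map each name to its most frequent code; ties go to the smaller code."""
    counts = defaultdict(Counter)
    for name, code in pairs:
        counts[name][code] += 1
    return {
        name: min(counter, key=lambda code: (-counter[code], code))
        for name, counter in counts.items()
    }


print(most_common_code(rows))
```

Codes are compared as strings here, so "A1000" sorts before "A1234"; purely numeric codes of different lengths would need converting to numbers first.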

range restriction/domain restriction in Isabelle

I am trying to input a schema into Isabelle. However, when I add range restriction or domain restriction, the theorem prover fails to parse it. I have the following schema in LaTeX:
\begin{schema}{VideoShop}
members: \power PERSON \\
rented: PERSON \rel TITLE \\
stockLevel: TITLE \pfun \nat
\where
\dom rented \subseteq members \\
\ran rented \subseteq \dom stockLevel \\
\forall t: \ran rented # \# (rented \rres \{t\}) \leq stockLevel~t
\end{schema}
When inputting this into Isabelle I get the following:
locale videoshop =
fixes members :: "PERSON set"
and rented :: "(PERSON * TITLE) set"
and stockLevel :: "(TITLE * nat) set"
assumes "Domain rented \<subseteq> members"
and "Range rented \<subseteq> Domain stockLevel"
and "(\<forall> t. (t \<in> Range rented) \<and> (card (rented \<rhd> {t}) \<le> stockLevel t))"
begin
.....
It all parses except for the last expression \<forall> t.....
I just don't understand how to add range restriction into Isabelle.
There are multiple problems with your input.
The ⊳ symbol you are using in the expression
(rented ⊳ {t})
is not associated with any operator, so it can't be parsed. I'm not quite sure what it's supposed to mean. From the high-level idea of the specification I'm guessing something along the lines of "all persons who rented a specific title". This can be expressed most easily with a set comprehension:
{p. (p, t) ∈ rented}
You translated the bounded universal quantifier into a quantifier containing a conjunction. This is likely not what you want, because it says "for all t, t is in the range of rented and something else". Isabelle has notation for bounded quantifiers.
∀t ∈ Range rented. ...
You are trying to use stockLevel as a function, which it isn't. From your LaTeX input I gather that it's supposed to be a partial function. Isabelle calls these maps. The appropriate type is:
TITLE ⇀ nat
Note the "harpoon" symbol instead of a function arrow. The domain function for maps is called dom. The second locale assumption can be expressed as:
Range rented ⊆ dom stockLevel
Given that, you can use stockLevel as a function from TITLE to nat option.
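If it helps to see what the three assumptions are supposed to mean, here is an executable Python sketch of the schema's invariant. It is a model, not Isabelle code: rented is a set of (person, title) pairs, the partial function stockLevel is modelled as a dict, and the names range_restrict and invariant_holds are made up for this illustration.

```python
def range_restrict(rel, values):
    """Z range restriction: keep the pairs of rel whose second component is in values."""
    return {(p, t) for (p, t) in rel if t in values}


def invariant_holds(members, rented, stock_level):
    """Check the three predicates of the VideoShop schema."""
    dom_rented = {p for (p, _) in rented}
    ran_rented = {t for (_, t) in rented}
    return (
        dom_rented <= members
        # The keys of the dict play the role of "dom stockLevel".
        and ran_rented <= set(stock_level)
        # Bounded quantifier: only titles in the range of rented are checked.
        and all(
            len(range_restrict(rented, {t})) <= stock_level[t]
            for t in ran_rented
        )
    )


members = {"ann", "bob"}
rented = {("ann", "Alien"), ("bob", "Alien")}
print(invariant_holds(members, rented, {"Alien": 2}))
print(invariant_holds(members, rented, {"Alien": 1}))
```

The number of pairs in the restricted relation equals the size of the set comprehension {p. (p, t) ∈ rented}, which is why the comprehension can stand in for the restriction inside card.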

Oracle NOT BETWEEN for string comparison does not give same result as <= and >=

Using Oracle 11gR2 Express Edition.
My data looks like the following:
ordertype
---------
ZOCO
ZOSA
ZOST
We are trying to find out records where the column is not between a certain range of values.
If I run a query with <= and >= operators:
SELECT * FROM table where ordertype <= 'ZAAA' OR ordertype >= 'ZZZZ';
then I get 0 results. This is the right answer.
However, if I use NOT BETWEEN:
SELECT * FROM table where ordertype NOT BETWEEN 'ZAAA' AND 'ZZZZ';
then it gives multiple hits.
My understanding is that both syntaxes should give the same result, but they do not. What am I missing? The reason I want to use NOT BETWEEN is that a lot of our existing code already uses this syntax and I do not want to change it without understanding the reasons.
Thank you.
Thanks for all those who posted. I ran the queries again and after fixing the "OR" in the first query, the results are the same. I still have the question of why Oracle character sorting is not recognizing it as expected, but my question which is about difference between NOT BETWEEN and <> was a false alarm. I apologize for confusion.
SELECT * FROM table where ordertype <= 'ZAAA' AND ordertype >= 'ZZZZ';
No string can be <= 'ZAAA' and >= 'ZZZZ'.
You need to use a disjunction instead:
SELECT * FROM table where ordertype < 'ZAAA' OR ordertype > 'ZZZZ';
BTW, given that BETWEEN is inclusive, NOT BETWEEN is exclusive.
This is a common pitfall. You have to remember De Morgan's laws:
not (A and B) is the same as (not A) or (not B)
Feel free to experiment with this simple live example to convince yourself that those results are quite coherent: http://sqlfiddle.com/#!4/d41d8/38326
That being said, the only ways (I can see) for a string like ZOCO not to be between ZAAA and ZZZZ would be:
having some hidden character just behind the Z (i.e.: 'Z'||CHR(0)||'OCO')
or using a locale in which Z-something is actually considered a different letter, with a collation order outside of the given range. I don't know if such a locale exists, but for example, in Welsh, LL is considered a single letter that should be sorted after the plain L. See http://en.wikipedia.org/wiki/Alphabetical_order#Language-specific_conventions
or having homoglyphs such as 0, 𐒠 or О instead of O in your data.
If it's not between the values, it has to be either < OR >, not AND.
In the first query, you ask for the records that are at the same time less than 'ZAAA' and also greater than 'ZZZZ'. Of course, there is no value that fulfills both requirements, hence zero records are returned.
In the second query, you ask for records, that are either less than 'ZAAA' or greater than 'ZZZZ' (ie not between those boundaries [not between...]). There is a possibility that such records exist, and as your select statement proves, there are indeed such records, that are returned by the statement.
Your understanding that both statements are same is incorrect. NOT BETWEEN is not evaluated the way you're thinking. It simply returns the results which fall outside evaluation of BETWEEN for the parameters.
If you check the Oracle documentation for BETWEEN, it says:
The value of
expr1 NOT BETWEEN expr2 AND expr3
is the value of the expression
NOT (expr1 BETWEEN expr2 AND expr3)
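The equivalence quoted above, together with De Morgan's laws, can be checked with a small Python sketch. This assumes plain code-point string comparison, which is how Python compares strings; Oracle's result can differ depending on the session's sort and comparison settings. The helper between and the two extra sample values are additions for illustration.

```python
def between(x, lo, hi):
    """Inclusive range test, like SQL's BETWEEN."""
    return lo <= x <= hi


lo, hi = "ZAAA", "ZZZZ"
# The three values from the question, plus one with a hidden CHR(0) after the Z
# and one that is genuinely below the range.
values = ["ZOCO", "ZOSA", "ZOST", "Z\x00OCO", "YZZZ"]

not_between = [v for v in values if not between(v, lo, hi)]
disjunction = [v for v in values if v < lo or v > hi]      # De Morgan form
conjunction = [v for v in values if v <= lo and v >= hi]   # the mistaken AND form

print(not_between == disjunction)
print(conjunction)
```

With this ordering the three ordertype values from the question fall inside the range, the NOT BETWEEN form and the OR form agree, and the AND form never matches anything.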
