how to write a query in relational algebra having two tables with a common attribute with same name but different relevance - relational-algebra

I have three schemas:
SUPPLIER(SNO,SNAME,STATUS,CITY)
PROJECT(JNO,JNAME,CITY)
SPJ(SNO,JNO,QTY)
The query is:
Get jno values for projects supplied to any Project in 'BOMBAY' by Supplier in 'DELHI'.
How do I write this query in Relational Algebra?

Aman, welcome to StackOverflow. Here's a possible answer:
WITH SSUPPLIER := SUPPLIER RENAME {SCITY := CITY}
PPROJECT := PROJECT RENAME {PCITY := CITY}
:
JOIN( SSUPPLIER, PPROJECT, SPJ )
WHERE PCITY = 'BOMBAY' AND SCITY = 'DELHI'
{ JNO }
(This is using the Tutorial D variety of Relational Algebra, as described in the textbooks of Chris Date & Hugh Darwen. I see that the given schemas are taken from those books.)
As Philipxy points out, it's a good idea to say what textbook you are using, and what variety of RA.

Related

Is MNL the right model to use when the choice options vary across observations?

In a survey of 100 people, I am asking each person to choose between product A and product B. I ask each person this question 3 times, but each time I present a different set of products. Say, first time, Person 1 is asked to choose between 'Phone 1' and 'Phone 2', given certain attributes of each phone. The second time the choice is again 'Phone 1' vs. 'Phone 2', but a different set of attributes for each phone.
A person is presented three attributes associated with the two phone alternatives every time the question is asked. So, each time between Phone 1 and Phone 2, the attributes of the phone such as cost, memory and camera pixels are presented so that user can choose which set of attributes is most attractive, Phone 1's or Phone 2's.
Overall, 3*100 = 300 responses; 3 responses per person. Each time the attributes cost, memory and camera pixels presented and user asked to choose the feature set they prefer.
My goal is to analyze how users value features of a phone vs. cost of the phone.
In this scenario, can I use a MNL - even though each time I asked the person a question, I only presented two choices ? My understanding is that MNL is sued when (a) there are multiple choices and (b) the choice options do not change across observations, i.e. each person is asked to choose between multiple products, say A, B, C and A, B, C do not change across observations.
In the scenario described above, the two choices varied across the three times the same person was asked the question ? If not MNL, should I rather create a binary logit model given that user only had to choose between two options when the question was asked (even though he was asked the question three times)? If I can use binary logit, should I be concerned that the choice set of products change across observations ? or should I let the attributes defined in each of the rows address the differences in product choices across observations.
I have setup the data as follows (thinking I can do MNL but may be I should set it up differently and use another modeling approach?):
I am working on designing and analyzing similar survey but mine is related to transportation. I am at the beginning level and I am still new to the whole concept, however I will give you an advice and reference maybe it is helpful.
First point: I have come cross 3 models as following from a useful video on YouTube:
MNL refers to Multinomial Logit Model. MNL is used with
alternative-invariant regressors (for example salary of participant
in the survey, or his/her gender …).
Conditional logit model is used with alternative-invariant (gender,
salary, education level …) and alternative-variant regressors (cost
of the product, memory, camera pixel …)
Mixed logit model which uses random parameters. It is also used with
alternative-invariant (gender, salary, education level …) and
alternative-variant regressors (cost of the product, memory, camera
pixel …)
Note regarding alternative-invariant and alternative-variant regressors:
The gender of person participating in the survey will NOT vary between Product A or Product P, so it is alternative-invariant regressor. While price of product could vary between Product A and Product B so it is called alternative-variant regressors.
Based on above I assume you need to use conditional logit model or mixed logit model.
For me I couldn’t find a special function in R for the conditional logit model or mixed logit model. The same mlogit function is used, refer to the examples below for the help of mlogit package:
a pure "multinomial model"
summary(mlogit(mode ~ 0 | income, data = Fish))
a pure "conditional" model
summary(mlogit(mode ~ price + catch, data = Fish))
a "mixed" model
m <- mlogit(mode ~ price+ catch | income, data = Fish)
summary(m)
same model with charter as the reference level
m <- mlogit(mode ~ price+ catch | income, data = Fish, reflevel = "charter")
From the examples above, I think (but NOT sure) that in the Manual of mlogit package, they refer to mixed logit when you used both alternative-invariant and alternative-variant regressors. While conditional model when you have only alternative-variant regressors. On the other hand, multinomial model when you have only multinomial alternative-invariant regressors.
Second point: There is something called “panel data” when you are asking the same person to choose one product for each choice-set. Same person here means that in your model you are taking into consideration, the gender, the salary, the education level … which they will stay the same for the same person. Check this: https://en.wikipedia.org/wiki/Panel_data
To use panel techniques please refer to help in mlogit package in R. I am quoting from it the following:
“panel only relevant if rpar is not NULL and if the data are repeated observations of the same unit ; if TRUE, the mixed-logit model is estimated using panel techniques”
So in my understanding, if you want to use the panel techniques you have to use random draws because panel will be true and rpar will not be NULL.
Moreover, for example about using the panel data, please refer to the below example from “Estimation of multinomial logit models in R : The mlogit Packages” by Yves Croissant
data("Train", package = "mlogit")
Tr <- mlogit.data(Train, shape = "wide", varying = 4:11, choice = "choice", sep = "_", opposite = c("price", "time", "change", "comfort"), alt.levels=c("A", "B"), id.var ="id")
Train.ml <- mlogit(choice ~ price + time + change + comfort, Tr)
Train.mxlc <- mlogit(choice ~ price + time + change + comfort, Tr, panel = TRUE, rpar = c(time = "cn", change = "n", comfort = "ln"), correlation = TRUE, R = 100, halton = NA)
Train.mxlu <- update(Train.mxlc, correlation = FALSE)
I hope that is useful to you.

Long queries in golang

This is a question based on absolute no knowledge of golang and the aim is to find if there is a way to make long queries readable.
My attempt is to put the sql text in a variable and then execute the variable.
Pseudocode (no real code):
var query =
SELECT * FROM foo
UNION ALL
SELECT * FROM bar
UNION ALL
SELECT * FROM other
...
db.prepare (var query)
db.query (var query)
This is maybe a dumb question, but I have searched and found no clue how to make long queries more "readable" in go. Most examples are based on "hello world" level. In the real world queries can be quite long.
TIA,
You can declare the query as a multiline string literal.
query := `
SELECT * FROM foo
UNION ALL
SELECT * FROM bar
UNION ALL
SELECT * FROM other`
And use it with DB.Query.
rows, err := db.Query(query)
There are many different drivers for many different sql databases you can use. They all would give you a DB object to work with. So you can use DB.Prepare, DB.Query appropriately. Check docs of database/sql package for more info.
There are two methods for this
1. Using backquote:
Use (backquote\backtick) which is with symbol ` in keyboard.
So that your query will look like
query := `INSERT INTO Customers (CustomerName, ContactName, Address, City, PostalCode, Country)
VALUES ('Cardinal', 'Tom B. Erichsen', 'Skagen 21', 'Stavanger', '4006', 'Norway');`
2. Using string concatenation:
Use string concatenation characters between lines So that your query will look like
query :="INSERT INTO Customers (CustomerName, ContactName, Address, City, PostalCode, Country)"+
"VALUES ('Cardinal', 'Tom B. Erichsen', 'Skagen 21', 'Stavanger', '4006', 'Norway');"
I prefer the 1st one :)

Relational algebra for one-to-many relations

Suppose I have the following relations:
Academic(academicID (PK), forename, surname, room)
Contact (contactID (PK), forename, surname, phone, academicNO (FK))
I am using Java & I want to understand the use of the notation.
Π( relation, attr1, ... attrn ) means project the n attributes out of the relation.
σ( relation, condition) means select the rows which match the condition.
⊗(relation1,attr1,relation2,attr2) means join the two relations on the named attributes.
relation1 – relation2 is the difference between two relations.
relation1 ÷ relation2 divides one relation by another.
Examples I have seen use three tables. I want to know the logic when only two tables are involved (academic and contact) as opposed to three (academic, contact, owns).
I am using this structure:
LessNumVac = Π( σ( job, vacancies < 2 ), type )
AllTypes = Π( job, type )
AllTypes – LessNumVac
How do I construct the algebra for:
List the names of all contacts owned by academic "John"
List the names of all contacts who is owned by academic "John".
For that, you would join the Academic and Conctact relations, filter for John, and project the name attributes. For efficiency, select John before joining:
πforename, surename (Contact ⋈academicNO = academicID (πacademicID (σforename = "John" Academic))))
You have to extend your operations set with natural join ⋈, Left outer join ⟕ and/or Right outer join ⟖ to show joins.
There is a great Wikipedia article about Relational Algebra. You should definitely read that one!

Oracle 9i Sub query

Hi Can any one help me out of this query forming logic
SELECT C.CPPID, c.CPP_AMT_MANUAL
FROM CPP_PRCNT CC,CPP_VIEW c
WHERE
CC.CPPYR IN (
SELECT C.YEAR FROM CPP_VIEW_VIEW C WHERE UPPER(C.CPPNO) = UPPER('123')
AND C.CPP_CODE ='CPP000000000053'
and TO_CHAR(c.CPP_DATE,'YYYY/Mon')='2012/Nov'
)
AND UPPER(C.CPPNO) = UPPER('123')
AND C.CPP_CODE ='CPP000000000053'
and TO_CHAR(c.CPP_DATE,'YYYY/Mon') = '2012/Nov';
Please Correct me if i formed wrong query structure, in terms of query Performance and Standards. Thanks in Advance
If you have some indexes or partitioned tables I would not use functions on columns but on variables, to be able to use indexes/select partitions.
Also I use ANSI 92 SQL syntax. You don't specify(or not directly) a join contition between cpp_prcnt and cpp_view so it is actually a cartesian product(cross join)
SELECT C.CPPID, c.CPP_AMT_MANUAL
FROM CPP_PRCNT CC
CROSS JOIN CPP_VIEW c
WHERE
CC.CPPYR IN (
SELECT C.YEAR
FROM CPP_VIEW_VIEW C
WHERE C.CPPNO = '123'
AND C.CPP_CODE ='CPP000000000053'
AND trunc(c.CPP_DATE,'MM')=to_date('2012/Nov','YYYY/Mon')
)
AND C.CPPNO = '123'
AND C.CPP_CODE ='CPP000000000053'
AND trunc(c.CPP_DATE,'MM')=to_date('2012/Nov','YYYY/Mon')
If you show us the definition of cpp_view_view(seems to be a view over cpp_view), the definition(if simple) of CPP_VIEW and what you're trying to achieve, I bet there are more things to be improved/fixed.
There are a couple of things you could improve:
if possible, get rid of the UPPER() in the comparison - this will render any indices useless. If that's not possible, consider a function-based index on UPPER(CPPNO)
do not convert your DATE column to a string to compare it with a string - do it the other way round (i.e. convert your string to a date => only one conversion needed instead of one per table row, use of indices possible)
play around with EXISTS instead of IN, as suggested by Dileep - might be faster

SQL to Relational Algebra

How do I go about writing the relational algebra for this SQL query?
Select patient.name,
patient.ward,
medicine.name,
prescription.quantity,
prescription.frequency
From patient, medicine, prescription
Where prescription.frequency = "3perday"
AND prescription.end-date="08-06-2010"
AND canceled = "Y"
Relations...
prescription
prescription-ref
patient-ref
medicine-ref
quantity
frequency
end-date
cancelled (Y/N))
medicine
medicine-ref
name
patient
Patient-ref
name
ward
I will just point you out the operators you should use
Projection (π)
π(a1,...,an): The result is defined as the set that is obtained when all tuples in R are restricted to the set {a1,...,an}.
For example π(name) on your patient table would be the same as SELECT name FROM patient
Selection (σ)
σ(condition): Selects all those tuples in R for which condition holds.
For example σ(frequency = "1perweek") on your prescription table would be the same as SELECT * FROM prescription WHERE frequency = "1perweek"
Cross product(X)
R X S: The result is the cross product between R and S.
For example patient X prescription would be SELECT * FROM patient,prescription
You can combine these operands to solve your exercise. Try posting your attempt if you have any issues.
Note: I did not include the natural join as there are no joins. The cross product should be enough for this exercise.
An example would be something like the following. This is only if you accidentally left out the joins between patient, medicine, and prescription. If not, you will be looking for cross product (which seems like a bad idea in this case...) as mentioned by Lombo. I gave example joins that may fit your tables marked as "???". If you could include the layout of your tables that would be helpful.
I also assume that canceled comes from prescription since it is not prefixed.
Edit: If you need it in standard RA form, it's pretty easy to get from a diagram.
alt text http://img532.imageshack.us/img532/8589/diagram1b.jpg

Resources