Relational algebra for banking scenario - relational-algebra

I don't know how to solve the relational algebra questions.
Deposit (Branch, Acc-No, Cust-Name, Balance)
Loan (Branch, Loan-No, Cust-Name, Balance)
Branch (Branch, Assets, Branch-County)
Customer (Cust-Name, Cust-County, Branch)
Produce a relation that shows the branch, customer name, balance and account number for all
customers that have a loan bigger than £2000.00 and all customers that have a deposit account
with a balance smaller than £150.00. All customers should be at the Romford branch.
This is what I came up with so far. Is it correct?
π Branch, Acc-No, Cust-Name, Balance (
σ Loan.Balance > 2000 ∧ branch = 'Romford' (Loan)
∪ σ Deposit.Balance < 150 ∧ branch = 'Romford' (Customer ∩ Deposit)
)
My tutor gave this but later said it was wrong:
π Branch, Cust-Name, Balance, Acc-No (
σ Balance < 150 ∧ branch = 'Romford' (Deposit)
∪
π Branch, Cust-Name, Balance, Loan-No
σ Balance > 2000 ∧ branch = 'Romford' (Loan)
)

Given statements. Every table/relation has a statement parameterized by columns/attributes. (Its "characteristic predicate".) The rows/tuples that make the statement true go in the table/relation. First find the statements for the given tables/relations:
// customer [Cust-Name] has £[Balance] in account [Acc-No] at branch [Branch]
Deposit (Branch, Acc-No, Cust-Name, Balance)
// customer [Cust-Name] loan [Loan-No] balance is £[Balance] at branch [Branch]
Loan(Branch, Loan-No, Cust-Name, Balance)
. . .
Notice that the table/relation definition is shorthand for the statement.
Query statements. Now put these given statements together to get a statement that only the rows we want satisfy. Use AND, OR, AND NOT, AND condition. Keep or drop names. Use a new name if you need one.
I will do an example like part of your assignment:
-- informal style version
branch, customer name, account balance and account number for
customers that have a loan bigger than £2000
at Romford branch
I want those rows. So I want a statement that exactly those rows make true. So I make statements that get closer and closer to the one I want. So I start:
-- columns/attributes Cust-Name, Loan-No, Balance, Branch
customer [Cust-Name] loan [Loan-No] balance is £[Balance] at branch [Branch]
Now I want to use a different name for the loan balances because I want to end up using Balance for account balances only:
-- columns/attributes Cust-Name, Loan-No, Loan-Balance, Branch
customer [Cust-Name] loan [Loan-No] balance is £[Loan-Balance] at branch [Branch]
Now I want account balances too:
-- columns/attributes Cust-Name, Loan-No, Loan-Balance, Branch, Balance, Acc-No
customer [Cust-Name] loan [Loan-No] balance is £[Loan-Balance] at branch [Branch]
AND customer [Cust-Name] has £[Balance] in account [Acc-No] at branch [Branch]
If I had only used one name Balance then it would have had to be a loan balance and an account balance. The rows/tuples would have been for customers with a loan balance the same as an account balance. And the Balance column/attribute would have been those values.
Now I want to limit the balances and the branch:
-- columns/attributes Cust-Name, Loan-No, Loan-Balance, Branch, Balance, Acc-No
customer [Cust-Name] loan [Loan-No] balance is £[Loan-Balance] at branch [Branch]
AND customer [Cust-Name] has £[Balance] in account [Acc-No] at branch [Branch]
AND [Loan-Balance]>2000 AND [Branch]='Romford'
Now I only want some of the columns/attributes:
-- statement style version
-- columns/attributes Cust-Name, Branch, Balance, Acc-No
Keeping Branch, Cust-Name, Balance, Acc-No: (
customer [Cust-Name] loan [Loan-No] balance is £[Loan-Balance] at branch [Branch]
AND customer [Cust-Name] has £[Balance] in account [Acc-No] at branch [Branch]
AND [Loan-Balance]>2000 AND [Branch]='Romford')
This is a statement for the example rows.
You can use "Keeping names to keep" or "Dropping names to drop". (In logic, Dropping is called FOR SOME or THERE EXISTS, because we want the statement inside it to be true FOR SOME value(s) for a name, i.e. we want that THERE EXISTS value(s) for a name that make the statement true.)
Query shorthand. Now replace each statement by its shorthand.
In my example I get:
-- shorthand style version
-- columns/attributes Cust-Name, Branch, Balance, Acc-No
Keeping Branch, Cust-Name, Balance, Acc-No: (
Loan (Branch, Loan-No, Cust-Name, Loan-Balance)
AND Deposit (Branch, Acc-No, Cust-Name, Balance)
AND Loan-Balance>2000 AND Branch='Romford')
(Notice that in the second half of your question's attempt you don't need Customer. Because you don't need its statement. Because you can state everything about the rows you want without it. So it just adds Cust-County which you eventually throw away without using.)
Query algebra. Now to get the algebra, replace:
every statement by its table/relation
every AND of table/relation statements by ⋈ (natural join)
every OR of table/relation statements (which must have the same columns/attributes) by ∪ (union)
every AND NOT of statements (which must have the same columns/attributes) by \ (difference)
every AND condition by σ condition
every Keeping names to keep by π names to keep (projection) (and Dropping by π names to keep)
every column/attribute renaming in a given statement by ρ (rename).
∩ (intersection) and × (product) are special cases of ⋈ (∩ when both sides have the same columns/attributes and × when there are no shared columns/attributes).
Remember that column/attribute names get introduced by table/relation statements & tables/relations but removed by Keeping/Dropping & π. Remember that a renaming in a given statement becomes a ρ.
I get:
-- algebra style version
π Branch, Cust-Name, Balance, Acc-No
σ Branch='Romford' σ Loan-Balance>2000
((ρ Loan-Balance/Balance Loan) ⋈ Deposit)
(I don't know what particular algebra notation you are supposed to use. Learn its rules for dotting names and using equijoin vs natural join. Also I don't know what kind of σ conditions it allows.)
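If it helps to see the operators actually calculate something, here is a minimal Python sketch of the example above. The sample rows and the helper names (rel, rename, select, project, join) are my own assumptions for illustration; they are not part of your course's notation or of any library.

```python
def rel(*rows):
    """A relation is a set of rows; each row is a frozenset of (attribute, value) pairs."""
    return {frozenset(row.items()) for row in rows}

def rename(r, old, new):          # ρ new/old
    return {frozenset((new if k == old else k, v) for k, v in t) for t in r}

def select(r, cond):              # σ condition
    return {t for t in r if cond(dict(t))}

def project(r, names):            # π names
    return {frozenset((k, v) for k, v in t if k in names) for t in r}

def join(r, s):                   # ⋈ (natural join on the shared attribute names)
    out = set()
    for a in r:
        for b in s:
            da, db = dict(a), dict(b)
            if all(da[k] == db[k] for k in da.keys() & db.keys()):
                out.add(frozenset({**da, **db}.items()))
    return out

# Hypothetical sample data.
Loan = rel(
    {"Branch": "Romford", "Loan-No": "L1", "Cust-Name": "Ann", "Balance": 2500},
    {"Branch": "Romford", "Loan-No": "L2", "Cust-Name": "Bob", "Balance": 1000},
    {"Branch": "Ilford", "Loan-No": "L3", "Cust-Name": "Cy", "Balance": 9000},
)
Deposit = rel(
    {"Branch": "Romford", "Acc-No": "A1", "Cust-Name": "Ann", "Balance": 40},
    {"Branch": "Romford", "Acc-No": "A2", "Cust-Name": "Bob", "Balance": 700},
    {"Branch": "Ilford", "Acc-No": "A3", "Cust-Name": "Cy", "Balance": 10},
)

# π Branch, Cust-Name, Balance, Acc-No
#   σ Branch='Romford' σ Loan-Balance>2000
#     ((ρ Loan-Balance/Balance Loan) ⋈ Deposit)
joined = join(rename(Loan, "Balance", "Loan-Balance"), Deposit)
result = project(
    select(select(joined, lambda t: t["Loan-Balance"] > 2000),
           lambda t: t["Branch"] == "Romford"),
    {"Branch", "Cust-Name", "Balance", "Acc-No"},
)
print([dict(t) for t in result])
```

Without the rename, the join would also match on Balance, so only customers whose loan balance equals their account balance would survive.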
Follow the example. So take a description of rows and write a statement that exactly those rows make true. Then convert to given statements. Then replace by shorthands. Then replace by algebra.
What are your given statements?
What are their shorthands?
What are their table/relation names?
What are their columns/attributes?
What are the columns/attributes of the rows/tuples you want?
What is a clear, plain, natural language statement that the rows you want make true but the rows you don't want don't? But avoid pronouns, because they don't translate to algebra; just reuse column/attribute names. And if you need a new name then just invent one and make statements about it.
What is one given statement that is part of your overall statement?
What is the shorthand version?
What is the algebra version?
Did you change a name in a given statement? Then rename it in its table/relation via ρ.
Continue for another part of your overall statement.
Do you want a combination of given statements? Then use AND, OR, AND NOT and Keeping/Dropping.
Do you not want to know the value of a column/attribute that you mentioned? Then use Keeping/dropping (then π).
Did you mention too many names? Then keep the ones you want via π (and corresponding Keeping/Dropping).
Keep going.
You will have to find the right order to say things in. Try different orders. Because you have to use NOT via AND NOT of statements/tables or via a condition. And OR and AND NOT of statements/tables must have the same columns/attributes on each side. And a name in a condition has to be mentioned in a statement it is ANDed with.
Your question. It took me a while to parse and correct the goal you gave:
Show the branch, customer name, balance and account number for all the
customers that have a loan bigger than £2000 and all the customers
with deposit account with a balance smaller than £150. All these
customers should be at Romford branch.
This is:
Show the branch, customer name, account balance and account number for ( all the customers that have a loan bigger than £2000 and all the customers with deposit account with a balance smaller than £150 ) . All these customers should be at Romford Branch.
(I had to add a word to make sense of this. But it is unbelievable that this is supposed to mean "branch, customer name, balance and number [labelled what??] where balance and number are loan balance and number for customers with loans > 2000 or balance and number are account balance and number for customers with account balances < 150".)
This has an AND in the middle so you might think it will give an algebra ⋈ (natural join) or ∩ (intersection). But you must describe your columns/attributes only in terms of given statements and conditions. It turns out that the AND gets turned into an OR. Also it turns out that we have to add an extra Deposit statement, so that we have the loan customers' account info. Remember that you have to have the same columns/attributes on both sides of an OR (or AND NOT).
First "branch, customer name, account balance, account number for" "all the customers that have a loan bigger than £2000". This looks like what we did up above. But this time let's limit the branches later:
-- statement A
-- columns/attributes Cust-Name, Branch, Balance, Acc-No
Keeping Branch, Cust-Name, Balance, Acc-No: (
customer [Cust-Name] loan [Loan-No] balance is £[Loan-Balance] at branch [Branch]
AND customer [Cust-Name] has £[Balance] in account [Acc-No] at branch [Branch]
AND [Loan-Balance]>2000)
Now "branch, customer name, account balance, account number for" "all the customers with deposit account with a balance smaller than £150". This is simpler than before so I hope you can understand it directly:
-- statement B
-- columns/attributes Cust-Name, Branch, Balance, Acc-No
customer [Cust-Name] has £[Balance] in account [Acc-No] at branch [Branch]
AND [Balance]<150
Now we want the rows that make the statement "statement A OR statement B" true:
-- columns/attributes Cust-Name, Branch, Balance, Acc-No
Keeping Branch, Cust-Name, Balance, Acc-No: (
customer [Cust-Name] loan [Loan-No] balance is £[Loan-Balance] at branch [Branch]
AND customer [Cust-Name] has £[Balance] in account [Acc-No] at branch [Branch]
AND [Loan-Balance]>2000)
OR customer [Cust-Name] has £[Balance] in account [Acc-No] at branch [Branch]
AND [Balance]<150
Now we limit the branch:
-- statement for goal
-- columns/attributes Cust-Name, Branch, Balance, Acc-No
( Keeping Branch, Cust-Name, Balance, Acc-No: (
customer [Cust-Name] loan [Loan-No] balance is £[Loan-Balance] at branch [Branch]
AND customer [Cust-Name] has £[Balance] in account [Acc-No] at branch [Branch]
AND [Loan-Balance]>2000)
OR customer [Cust-Name] has £[Balance] in account [Acc-No] at branch [Branch]
AND [Balance]<150
)
AND [Branch]='Romford'
This is a statement for the wanted rows. Now we replace by shorthands:
-- shorthand for goal
-- columns/attributes Cust-Name, Branch, Balance, Acc-No
( Keeping Branch, Cust-Name, Balance, Acc-No: (
Loan (Branch, Loan-No, Cust-Name, Loan-Balance)
AND Deposit (Branch, Acc-No, Cust-Name, Balance)
AND [Loan-Balance]>2000)
OR Deposit (Branch, Acc-No, Cust-Name, Balance)
AND [Balance]<150
)
AND [Branch]='Romford'
An answer. Now we replace by algebra:
-- algebra style version
σ Branch='Romford'
( π Branch, Cust-Name, Balance, Acc-No
σ Loan-Balance>2000
((ρ Loan-Balance/Balance Loan) ⋈ Deposit)
∪ σ Balance<150 Deposit)
PS: Algebra = loopless calculation. The whole point of the relational algebra is that statements exactly correspond to algebraic expressions: statements correspond to tables/relations and (statements') logic operators correspond to algebra operators. But the algebra version is a loopless description that can be automatically calculated. The rows that make a statement true are the value of its algebraic version. We give the rows for the table/relation statements and the algebra calculates the rows for any other statement we combine from them.
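To illustrate that "automatically calculated" point, here is a rough Python sketch that evaluates the final algebra expression on made-up rows. The data and the helper names (rel, rename, select, project, join) are assumptions of mine; union is just Python's set union.

```python
def rel(*rows):
    """A relation is a set of rows; each row is a frozenset of (attribute, value) pairs."""
    return {frozenset(row.items()) for row in rows}

def rename(r, old, new):          # ρ new/old
    return {frozenset((new if k == old else k, v) for k, v in t) for t in r}

def select(r, cond):              # σ condition
    return {t for t in r if cond(dict(t))}

def project(r, names):            # π names
    return {frozenset((k, v) for k, v in t if k in names) for t in r}

def join(r, s):                   # ⋈ natural join
    out = set()
    for a in r:
        for b in s:
            da, db = dict(a), dict(b)
            if all(da[k] == db[k] for k in da.keys() & db.keys()):
                out.add(frozenset({**da, **db}.items()))
    return out

# Hypothetical sample data.
Loan = rel(
    {"Branch": "Romford", "Loan-No": "L1", "Cust-Name": "Ann", "Balance": 2500},
    {"Branch": "Romford", "Loan-No": "L2", "Cust-Name": "Bob", "Balance": 1000},
    {"Branch": "Ilford", "Loan-No": "L3", "Cust-Name": "Cy", "Balance": 9000},
)
Deposit = rel(
    {"Branch": "Romford", "Acc-No": "A1", "Cust-Name": "Ann", "Balance": 400},
    {"Branch": "Romford", "Acc-No": "A2", "Cust-Name": "Bob", "Balance": 100},
    {"Branch": "Ilford", "Acc-No": "A3", "Cust-Name": "Cy", "Balance": 10},
    {"Branch": "Romford", "Acc-No": "A4", "Cust-Name": "Dee", "Balance": 50},
)

wanted = {"Branch", "Cust-Name", "Balance", "Acc-No"}

# Statement A: account info of customers with a loan over 2000.
part_a = project(
    select(join(rename(Loan, "Balance", "Loan-Balance"), Deposit),
           lambda t: t["Loan-Balance"] > 2000),
    wanted,
)
# Statement B: accounts with a balance under 150.
part_b = select(Deposit, lambda t: t["Balance"] < 150)

# (A OR B) AND Branch='Romford'; both sides of the union have the same attributes.
answer = select(part_a | part_b, lambda t: t["Branch"] == "Romford")
print(sorted(dict(t)["Cust-Name"] for t in answer))  # ['Ann', 'Bob', 'Dee']
```

Ann appears because of her loan, Bob and Dee because of their low account balances, and Cy is dropped by the branch condition.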

This is the answer I came up with:
π Branch, Cust-Name, Balance, Acc-No, (σ Balance < 100^branch=”Romford” (Deposit))
∪
π Branch, Cust-Name, Balance, Loan-No, (σ Balance > 2500 ^branch=”Romford”(Loan))

Related

Is this natural join operation used correctly? (Relational Algebra)

I have the following task from my professor (R-E model):
Assume the companies may be located in several cities. Find all companies located in every city in which “Small Bank Corporation” is
located.
Now the professor's solution is the following:
s ← Π city (σ company_name=’Small Bank Corporation’ (company))
temp1 ← Π comp_id, company_name (company)
temp2 ← Π comp_id, company_name ((temp1 × s) − company)
result ← Π company_name (temp1 − temp2)
I found a completely different solution using a natural join, which seems much simpler:
What I tried to do was use the natural join operation, which we defined as joining relations r and s on their common attributes. So I tried to get all city names by using a projection on a selection of all companies with the company_name "Small Bank Corporation". After that I joined the table with the city names with the company table, so that I get all company entries which have the city names in it.
company ⋈ Π city (σ company_name=”Small Bank Corporation” (company))
My question now is whether my solution is also valid, since it seems a little bit too trivial?
Yours isn't the same.
My answer here says how to query relationally. It uses a version of the relational algebra where headings are sets of attribute names. My answer here summarizes it:
Every query expression has an associated (characteristic)
predicate--statement template parameterized by attributes. The tuples
that make the predicate into a true proposition--statement--are in
the relation.
We are given the predicates for expressions that are relation names.
Let query expression E have predicate e. Then:
R ⨝ S has predicate r and s
R ∪ S has predicate r or s
R - S has predicate r and not s
σ p (R) has predicate r and p
π A (R) has predicate exists non-A attributes of R [r]
When we want the tuples satisfying a certain predicate we find a way
to express that predicate in terms of relation operator
transformations of given relation predicates. The corresponding query
returns/calculates the tuples.
Your solution
company ⋈ Π city (σ company_name=”Small Bank Corporation” (company))
is rows where
company company_id named company_name is in city
AND FOR SOME company_id & company_name [
company company_id named company_name is in city
AND company_name=”Small Bank Corporation”]
ie
company company_id named company_name is in city
AND FOR SOME company_id [
company company_id named ”Small Bank Corporation” is in city]
ie
company company_id named company_name is in city
AND some company named ”Small Bank Corporation” is in city
You are returning rows that have more columns than just company_name. But your companies are not the requested companies.
Projecting your rows on company_name gives rows where
some company named company_name is in some city
AND some company named ”Small Bank Corporation” is in that city
After that I joined the table with the city names with the company
table, so that I get all company entries which have the city names in
it.
That isn't clear about what you get. However the companies in your rows are those in at least one of the SBC cities. The request was for those in all of the SBC cities:
companies located in every city in which “Small Bank Corporation” is located
The links I gave tell you how to compose queries but also convert between query result specifications & relational algebra expressions returning a result.
When you see a query for rows matching "every" or "all" of some other rows you can expect that that part of your query involves relational-division or some related idiom. The exact algebra depends on what is intended by the--frequently poorly/ambiguously expressed--requirements. Eg whether "companies located in every city in which" is supposed to be no companies (division) or all companies (related idiom) when there are no such cities. (The normal mathematical interpretation of your assignment is the latter.) Eg whether they want companies in exactly all such cities or at least all such cities.
(It helps to avoid "all" & "every" after "find" & "return", where it is redundant anyway.)
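To make the "at least one city" versus "every city" difference concrete, here is a small Python sketch that runs both the professor's expression and the join-only attempt on invented rows. The data and the helper names (rel, select, project, join) are assumptions for illustration only.

```python
def rel(*rows):
    """A relation is a set of rows; each row is a frozenset of (attribute, value) pairs."""
    return {frozenset(row.items()) for row in rows}

def select(r, cond):              # σ condition
    return {t for t in r if cond(dict(t))}

def project(r, names):            # Π names
    return {frozenset((k, v) for k, v in t if k in names) for t in r}

def join(r, s):                   # ⋈ natural join; acts as × when no attributes are shared
    out = set()
    for a in r:
        for b in s:
            da, db = dict(a), dict(b)
            if all(da[k] == db[k] for k in da.keys() & db.keys()):
                out.add(frozenset({**da, **db}.items()))
    return out

SBC = "Small Bank Corporation"
# Hypothetical sample data.
company = rel(
    {"comp_id": 1, "company_name": SBC, "city": "London"},
    {"comp_id": 1, "company_name": SBC, "city": "Paris"},
    {"comp_id": 2, "company_name": "Acme", "city": "London"},
    {"comp_id": 2, "company_name": "Acme", "city": "Paris"},
    {"comp_id": 2, "company_name": "Acme", "city": "Rome"},
    {"comp_id": 3, "company_name": "Beta", "city": "London"},
    {"comp_id": 4, "company_name": "Gamma", "city": "Rome"},
)

# The professor's solution (a division idiom).
s = project(select(company, lambda t: t["company_name"] == SBC), {"city"})
temp1 = project(company, {"comp_id", "company_name"})
temp2 = project(join(temp1, s) - company, {"comp_id", "company_name"})
every_city = project(temp1 - temp2, {"company_name"})

# The join-only attempt, projected on company_name.
some_city = project(join(company, s), {"company_name"})

names = lambda r: sorted(dict(t)["company_name"] for t in r)
print(names(every_city))  # ['Acme', 'Small Bank Corporation']
print(names(some_city))   # ['Acme', 'Beta', 'Small Bank Corporation']
```

Beta is in London only, so the join keeps it while the division idiom removes it.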
Database Relational Algebra: How to find actors who have played in ALL movies produced by “Universal Studios”?
How to understand u=r÷s, the division operator, in relational algebra?
How to find all pizzerias that serve every pizza eaten by people over 30?

oracle's analytical function issue

Please let me know if the following is off topic, or not clear, or too specific, or too complex to understand. I think the following is a challenge to describe, understand and solve.
CIF = cost, insurance, freight (basically it is the import value)
The simplified version of the input table (Import) looks like this:
[image: Import table]
So from January to June the value 1 is assigned to SixMonthPeriod column, and the rest of the months are given the value 2.
I then want to calculate the unit price for a six-month period, thus I use
select SixMonthPeriod, ProductDescrip, Sum(weight), Sum (CIF), (Sum (CIF))/(Sum(weight)) as UnitPrice
from Import
group by SixMonthPeriod, ProductDescrip;
This is fine, but I then want to calculate inflation for each product (over a six-month period), where I need to use lag (an Oracle analytic function). The six-month period has to be fixed. Thus, if the previous period for a particular product is missing, then the unit price should be zero. I want to begin the calculation of inflation again for each product. The unit price and inflation equations look like the following, respectively:
unit price = (Sum(CIF) over a six-month period)/(Sum(weight) over a six-month period)
inflation = (current unit price - previous unit price)/(previous unit price)
I use the following SQL to calculate inflation for a six month period for each product, where the calculation begins again for each product:
select Yr, SixMthPeriod, Product, UnitPrice, LagUnitPrice, ((UnitPrice -LagUnitPrice)/LagUnitPrice) as inflation
from (select Year as Yr, SixMonthPeriod as SixMthPeriod,
ProductDescrip as product, (Sum (CIF))/(Sum(weight)) as UnitPrice,
lag((Sum (CIF))/(Sum(weight)))
over (partition by ProductDescrip order by YEAR, SixMonthPeriod) as LagUnitPrice
From Import
group by Year, SixMonthPeriod, ProductDescrip)
The problem is the inflation period is not fixed.
For example, for the result, I get the following:
[image: query result]
The first two rows are fine and there should be null values because they are my first line, thus there is no LagUnitPrice and inflation.
The third line has a problem: it has taken 0.34 as the LagUnitPrice, but actually it should be zero (for the period 2016 where SixMthPeriod=1 for the product barley). Oracle's analytic functions do not take missing rows into account (e.g. for the period 2016 where SixMthPeriod=1 for the product barley).
How do I fix this problem (if you understand me)?
I have 96 rows, thus I can export the file to excel, and use excel's formulas to fix these exceptions.
You can autogenerate missing periods with nullable price, attach them to your data and do the rest as you did:
select product, year, smp, price, prev_price, (price - prev_price) / prev_price inflation
from (
select product, year, smp, price,
lag(price) over (partition by product order by year, smp) prev_price
from (
select year, ProductDescrip product, SixMonthPeriod smp, sum(CIF)/sum(weight) price
from Import
group by year, SixMonthPeriod, ProductDescrip) a
full join (
select distinct year, productdescrip product, column_value smp
from import cross join table(sys.odcinumberlist(1, 2))) b
using (product, year, smp))
order by product, year, smp
Subquery b is responsible for generating all periods, you can run it separately to see what it produces.
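If the SQL is hard to follow, the same idea can be sketched in plain Python: generate both periods for every (product, year) that occurs, then let the "lag" see the gaps. The rows below are invented and the variable names are my own; this is only an illustration of the technique, not a replacement for the query.

```python
from collections import defaultdict

# Hypothetical rows: (year, six_month_period, product, cif, weight).
imports = [
    (2015, 1, "barley", 34.0, 100.0),
    (2015, 2, "barley", 40.0, 100.0),
    (2016, 2, "barley", 50.0, 100.0),  # 2016 period 1 is missing
]

# Like subquery a: unit price = sum(CIF) / sum(weight) per product and period.
totals = defaultdict(lambda: [0.0, 0.0])
for year, smp, product, cif, weight in imports:
    totals[(product, year, smp)][0] += cif
    totals[(product, year, smp)][1] += weight
price = {key: cif / weight for key, (cif, weight) in totals.items()}

# Like subquery b: both periods for every (product, year) seen in the data.
periods = sorted({(product, year, smp)
                  for year, _, product, _, _ in imports
                  for smp in (1, 2)})

# Like lag(): the previous row's price within the same product,
# which is None for the first row or when the previous period had no data.
rows = []
last = {}
for product, year, smp in periods:
    p = price.get((product, year, smp))
    prev_p = last.get(product)
    # "prev_p" being None or 0 both leave inflation undefined (avoids dividing by zero).
    inflation = (p - prev_p) / prev_p if p is not None and prev_p else None
    rows.append((product, year, smp, p, prev_p, inflation))
    last[product] = p

for row in rows:
    print(row)
```

With these rows, the 2016 period 2 line gets no previous price, because the generated 2016 period 1 line sits between it and 2015 period 2.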

Find the names of students who are not enrolled in any course - Students, Faculty, Courses, Offerings, Enrolled

Given the database below, project the names of the students who are not enrolled in a course using relational algebra.
Students(snum, sname, major, standing, age, gpa)
Faculty(fid, fname, deptid)
Courses(cnum, cname, course_level, credits)
Offerings(onum, cnum, day, starttime, endtime, room, max_occupancy, fid)
Enrolled(snum, onum)
I can get the snum of all students not enrolled in a course with:
π snum Students - π snum Enrolled
But how do I project the sname of the student with the snums that I find?
Every base table holds the rows that make a true proposition (statement) from some (characteristic) predicate (statement template parameterized by columns). The designer gives the predicates. The users keep the tables updated.
-- rows where student [snum] is named [sname] and has major [major] and ...
Students
-- rows where student [snum] is enrolled in offering [onum]
Enrolled
Every query result holds the rows that make a true proposition from some predicate. The predicate of a relation expression is combined from the predicates of its argument expressions depending on its operator. The DBMS evaluates the result.
/* rows where
student [snum] is named [sname] and has major [major] and ...
AND student [snum] is enrolled in offering [onum]
*/
Students ⨝ Enrolled
AND gives NATURAL JOIN, AND condition gives RESTRICT condition, EXISTS columns gives PROJECT other columns. OR & AND NOT with the same columns on both sides give UNION & MINUS. Etc.
/* rows where
THERE EXISTS sname, major, standing, age & gpa SUCH THAT
student [snum] is named [sname] and has major [major] and ...
*/
π snum Students
/* rows where
THERE EXISTS onum SUCH THAT
student [snum] is enrolled in offering [onum]
*/
π snum Enrolled
/* rows where
( THERE EXISTS sname, major, standing, age & gpa SUCH THAT
student [snum] is named [sname] and has major [major] and ...
AND NOT
THERE EXISTS onum SUCH THAT
student [snum] is enrolled in offering [onum]
)
AND student [snum] is named [sname] and has major [major] and ...
*/
(π snum Students - π snum Enrolled) ⨝ Students
You can project out any columns that you don't want from that.
(Notice that we don't need to know constraints to query.)
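As a quick check of that last expression, here is a minimal Python sketch with invented rows. Students is cut down to three attributes, and the helper names (rel, project, join) are assumptions of mine; set difference plays the role of MINUS.

```python
def rel(*rows):
    """A relation is a set of rows; each row is a frozenset of (attribute, value) pairs."""
    return {frozenset(row.items()) for row in rows}

def project(r, names):            # π names
    return {frozenset((k, v) for k, v in t if k in names) for t in r}

def join(r, s):                   # ⨝ natural join
    out = set()
    for a in r:
        for b in s:
            da, db = dict(a), dict(b)
            if all(da[k] == db[k] for k in da.keys() & db.keys()):
                out.add(frozenset({**da, **db}.items()))
    return out

# Hypothetical sample data (Students abbreviated to snum, sname, major).
Students = rel(
    {"snum": 1, "sname": "Ann", "major": "CS"},
    {"snum": 2, "sname": "Bob", "major": "Math"},
    {"snum": 3, "sname": "Cy", "major": "CS"},
)
Enrolled = rel(
    {"snum": 1, "onum": 10},
    {"snum": 3, "onum": 11},
)

# (π snum Students - π snum Enrolled) ⨝ Students, then keep only sname.
not_enrolled = join(project(Students, {"snum"}) - project(Enrolled, {"snum"}), Students)
snames = project(not_enrolled, {"sname"})
print(sorted(dict(t)["sname"] for t in snames))  # ['Bob']
```

Projecting on sname straight away would merge two unenrolled students who share a name, so keep snum as well if that matters.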
Relational algebra for banking scenario
Forming a relational algebra query from an English description
Is there any rule of thumb to construct SQL query from a human-readable description?

Forming a relational algebra query from an English description

I am preparing for an upcoming test in my school.
While I was going through some example questions, I got stuck with one particular question.
Passenger {p_id, p_name, p_nation} with key {p_id}
Flight {f_no, f_date, f_orig, f_dest} with key {f_no, f_date}
Trip {p_id, f_no, f_date, class} with key {p_id, f_no, f_date}
and foreign keys [p_id] ⊆ Passenger[p_id] and [f_no, f_date] ⊆ Flight[f_no, f_date]
The question asks:
Consider classes that passengers have occupied on flights from Narita.
Write in relational algebra: What are the ids of passengers who have
flown from Narita in each of these classes at least once?
What I did so far is:
-- rename class to class' in Trip and join with Trip
Q1 = Trip JOIN RENAME class\class' (Trip)
-- select those Q1 tuples where class = class'
Q2 = RESTRICT class = class' (Q1)
-- Project for those who traveled in different classes more than once
Q3 = PROJECT p_id (Q1 - Q2)
Q3 will show me (if I've done it correctly) all the ids of passengers who traveled more than once in different classes.
Can someone help me to get further from this point?
This is as far as I got.
The Q3 you calculate actually holds passengers who traveled in more than one class on the same flight number on the same day. Moreover, according to the constraints there aren't any such passengers. Here's why:
According to your code Q1 is
/* (tuples where)
p_id took f_no on f_date in class
AND p_id took f_no on f_date in class'
*/
Trip JOIN RENAME class\class' Trip
For Q1, passenger p_id took f_no on f_date in class and (for that flight number and date) in class'. (Note that under common sense, with a person only able to fly a trip in one class at a time, if class <> class' then they must have flown multiple trips with the same flight number on the same date, in different classes.)
Q1 - Q2 is just SELECT class <> class' Q1. So Q3 holds ids of passengers who traveled with different classes with the same flight number on the same date. But those people aren't relevant to a sensible interpretation of your overall query "passengers who have flown from Narita in each of these classes at least once".
But anyway since {f_no, f_date} is a CK (candidate key) of Flight, there's only one flight for a given flight number and date, so no passengers can have flown the same flight number & date more than once. So Q3 is empty anyway.
Forming a relational algebra query from an English description
Always characterize a relation--the value of a given one or of a query (sub)expression--via a statement template--predicate--parameterized by attributes. The relation holds the tuples that make it into a statement--proposition--that is true of the situation.
You must have been given the predicate for each base relation. Eg:
-- (tuples where) p_id took f_no on f_date in class
Trip
Then you need to express your query (sub)expression predicates in terms of the base predicates so that the (sub)expression relations can be calculated in terms of the base relations:
Consider classes that passengers have occupied on flights from Narita.
/* (tuples where)
FOR SOME p_id, f_no, f_date, f_orig & f_dest,
p_id took f_no on f_date in class
AND f_no flew on f_date from f_orig to f_dest
AND f_orig = 'Narita'
*/
PROJECT class SELECT f_orig = 'Narita' (Trip JOIN Flight)
The predicate of r JOIN s is predicate-of-r AND predicate-of-s. The predicate of SELECT c r is predicate-of-r AND c. Every relation operator has such a predicate transform. The predicate of PROJECT some-attributes-of-r r is FOR SOME other-attributes-of-r predicate-of-r. The predicate of RENAME a\a' r is predicate-of-r with (appropriate occurrences of) a replaced by a'.
To query, find some predicate equivalent to your desired predicate, then replace its parts by corresponding relation expressions. See this.
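As one possible way to carry this through to the full question (my own sketch, not something the assignment or the text above prescribes), the "in each of these classes" part can be handled with the usual division idiom. The rows and the helper names (rel, select, project, join) are assumptions for illustration.

```python
def rel(*rows):
    """A relation is a set of rows; each row is a frozenset of (attribute, value) pairs."""
    return {frozenset(row.items()) for row in rows}

def select(r, cond):              # SELECT / RESTRICT
    return {t for t in r if cond(dict(t))}

def project(r, names):            # PROJECT
    return {frozenset((k, v) for k, v in t if k in names) for t in r}

def join(r, s):                   # JOIN (natural); acts as a product with no shared attributes
    out = set()
    for a in r:
        for b in s:
            da, db = dict(a), dict(b)
            if all(da[k] == db[k] for k in da.keys() & db.keys()):
                out.add(frozenset({**da, **db}.items()))
    return out

# Hypothetical sample data.
Flight = rel(
    {"f_no": "F1", "f_date": "d1", "f_orig": "Narita", "f_dest": "LAX"},
    {"f_no": "F2", "f_date": "d2", "f_orig": "Narita", "f_dest": "SFO"},
    {"f_no": "F3", "f_date": "d3", "f_orig": "LAX", "f_dest": "Narita"},
)
Trip = rel(
    {"p_id": "p1", "f_no": "F1", "f_date": "d1", "class": "economy"},
    {"p_id": "p1", "f_no": "F2", "f_date": "d2", "class": "business"},
    {"p_id": "p2", "f_no": "F1", "f_date": "d1", "class": "economy"},
    {"p_id": "p2", "f_no": "F3", "f_date": "d3", "class": "business"},
    {"p_id": "p3", "f_no": "F3", "f_date": "d3", "class": "first"},
)

# (tuples where) p_id flew from Narita in class
from_narita = project(
    select(join(Trip, Flight), lambda t: t["f_orig"] == "Narita"),
    {"p_id", "class"},
)
classes = project(from_narita, {"class"})
candidates = project(from_narita, {"p_id"})

# Passengers for whom some Narita class is missing, then everyone else.
missing = project(join(candidates, classes) - from_narita, {"p_id"})
result = candidates - missing
print(sorted(dict(t)["p_id"] for t in result))  # ['p1']
```

Here p2 flew business only on a flight into Narita, so p2 lacks a business trip from Narita and is removed.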
Constraints & querying
We must know the predicates in order to query. The constraints (including FDs, CKs, PKs and FKs) are truths in every situation/state that can arise, expressed in terms of the predicates. We only need to know constraints when querying if the query's predicate can only be phrased in terms of base predicates because those constraints hold. Eg given Trip & Flight but no constraints we can't query for "classes that passengers have occupied on flights from Narita", ie the classes in tuples where:
p_id took f_no on f_date in class from f_orig to f_dest
The closest we can get is (Trip JOIN Flight):
p_id took f_no on f_date in class
AND f_no flew on f_date from f_orig to f_dest
but that doesn't necessarily tell us what class(es) were used on what flights. But if {f_no, f_date} is unique in Flight, which is implied by {f_no, f_date} being a CK of Flight, then the two predicates mean the same thing (ie have the same truth value for every tuple & situation).
On the other hand, since we can express that query given the CK constraint, we don't also need to be told that {f_no, f_date} is a FK from Trip to Flight. The FK says that if some passenger took f_no on f_date in some class then f_no flew on f_date from some origin to some destination and that {f_no, f_date} is a CK of Flight. So a Trip {f_no, f_date} is a Flight {f_no, f_date}. But whether or not that first conjunct of the FK also holds, or any other constraint also holds, the query returns the tuples satisfying its predicate.

using alias for cast in HIVE

I have a table called loan with loan amount, annual income, year (MMM-YY format) and member id. I am trying to find the highest loan amount in a year along with the annual income and member id details.
I tried to group the highest loan amount by year using the code
select max(cast(loan_amt as int)),issue_d from loan group by issue_d;
then I wanted also to fetch the member id and annual income information so I wrote the following code
but it is giving me error message for using alias for a column which is cast.
Code:
select a.loan_amt,a.member_id,a.annual_inc,a.issue_d
from
(select loan_amt,member_id,annual_inc,issue_d from loan) a
join
(select max(cast(loan_amt as int)) as ml,issue_d from loan group by issue_d) c
where ((a.issue_d=c.issue_d) and (a.loan_amt=a.ml));
What you want to do is rank the records based on the Amount, per Period, then keep only the top 1 record for each Period.
Use one of the analytic functions that are designed exactly for that purpose -- Hive has a pretty good support of the SQL standard on that topic.
Since you don't say what to do about ties (i.e. what if several loans have the same Amount???) I assume you want just one record chosen randomly...
select X, Y, Z, Period, Amount as TopAmount
from
(select X, Y, Z, Period, cast(StrAmt as double) as Amount,
row_number() over (partition by Period order by cast(StrAmt as double) desc) as TmpRank
from WTF
) TMPWTF
where TmpRank =1
If you want all the records with top Amount then replace row_number with rank or dense_rank (the "dense" stuff would make a difference for the top 2, but not for the top 1).
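For anyone who wants to see the ranking logic outside Hive, here is a rough Python sketch of the same idea: order within each period by the cast amount, then keep the first row (like row_number) or all rows tied for first (like rank). The rows and the function name top_per_period are invented for illustration.

```python
from itertools import groupby

# Hypothetical rows; loan_amt is stored as a string, as in the question.
loans = [
    {"member_id": 1, "annual_inc": 50000, "issue_d": "Dec-15", "loan_amt": "9000"},
    {"member_id": 2, "annual_inc": 60000, "issue_d": "Dec-15", "loan_amt": "12000"},
    {"member_id": 3, "annual_inc": 70000, "issue_d": "Jan-16", "loan_amt": "5000"},
    {"member_id": 4, "annual_inc": 80000, "issue_d": "Jan-16", "loan_amt": "5000"},
]

def top_per_period(rows, keep_ties=False):
    """Return the top loan row(s) per issue_d, comparing amounts as numbers."""
    # Sort by period, then by descending numeric amount (the cast matters:
    # as strings, "9000" would sort above "12000").
    ordered = sorted(rows, key=lambda r: (r["issue_d"], -float(r["loan_amt"])))
    out = []
    for _, group in groupby(ordered, key=lambda r: r["issue_d"]):
        group = list(group)
        best = float(group[0]["loan_amt"])
        if keep_ties:
            # Like rank() = 1: every row tied for the top amount.
            out.extend(r for r in group if float(r["loan_amt"]) == best)
        else:
            # Like row_number() = 1: a single row per period.
            out.append(group[0])
    return out

print([r["member_id"] for r in top_per_period(loans)])
print([r["member_id"] for r in top_per_period(loans, keep_ties=True)])
```

The first call returns one member per period; the second also returns both Jan-16 members because their amounts tie.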
