I have 2 tables, and I just wanna select the "Active" value only. PeriodID for Table1 is fixed with D1,D2,D3, and D4 only. It won't increase anymore. The active status might change.
My table 1 as follow:
PeriodID | Active
------------------
D1 | Y
D2 | Y
D3 | N
D4 | Y
My table 2 as follow:
RateID | CompanyCode | D1 | D2 | D3 | D4
------------------------------------------------
1 | C1 | 1.1 | 1.5 | 1.9 | 2.0
2 | C2 | 2.1 | 2.5 | 2.9 | 3.0
How do I achievement expected result as below? Thanks
RateID | CompanyCode | D1 | D2 | D4
----------------------------------------
1 | C1 | 1.1 | 1.5 | 2.0
2 | C2 | 2.1 | 2.5 | 3.0
Below is my query without filter:
From c In DB.Table2 Select c.D1,c.D2,c.D3,c.D4
Related
Lets say i have a table as follows:
| id | dir | p1 | p2 |
|----------------------|
| a | x | 1.2 | 1.3 |
| a | x | 1.2 | 1.3 |
| a | z | 2.1 | 3 |
| a | z | 2.1 | 3 |
| b | x | 1 | null|
| b | z | 4 | null|
I would like to have unique rows of row a and b where dir = x and dir = z. So two rows each.
Then when dir = z. Take the value in p1 - (p2 of the previous row for that id) as newval1 and the value in p2 - (p1 of the previous row for that id) as new val2.
Treating nulls as zeroes.
In steps I suppose it will be:
| id | dir | p1 | p2 |
|----------------------|
| a | x | 1.2 | 1.3 |
| a | z | 2.1 | 3 |
| b | x | 1 | null|
| b | z | 4 | null|
Desired result will be:
| id | newval1 | newval2 |
|--------------------------------|
| a | 0.8(2.1-1.3) | 1.8(3-1.2 |
| b | 4 (4-0) | -1(0-1) |
Is it possible to do this in SQL?
select id,
nvl(max(case when dir = 'z' then p1 end), 0)
- nvl(max(case when dir = 'x' then p2 end), 0) as newval1,
nvl(max(case when dir = 'z' then p2 end), 0)
- nvl(max(case when dir = 'x' then p1 end), 0) as newval2
from tbl
where dir in ('x', 'z')
group by id
;
ID NEWVAL1 NEWVAL2
-- ---------- ----------
a .8 1.8
b 4 -1
Or, if you are on version 11.1 or higher, you can use the pivot operator:
select id, z_p1 - x_p2 as newval1, z_p2 - x_p1 as newval2
from tbl
pivot ( max(nvl(p1, 0)) as p1, max(nvl(p2, 0)) as p2
for dir in ('x' as x, 'z' as z)
)
;
I have a consumer table like so.
consumer | product | quantity
-------- | ------- | --------
a | x | 3
a | y | 4
a | z | 1
b | x | 3
b | y | 5
c | x | 4
What I want is a 'normalized' rank assigned to each consumer so that I can split the table easily for testing and training. I used the dense_rank() in hive, so I got the below table.
rank | consumer | product | quantity
---- | -------- | ------- | --------
1 | a | x | 3
1 | a | y | 4
1 | a | z | 1
2 | b | x | 3
2 | b | y | 5
3 | c | x | 4
This is well and good, but I want to scale this to use with any number of consumers, so I would ideally like the range of ranks between 0 and 1, like so.
rank | consumer | product | quantity
---- | -------- | ------- | --------
0.33 | a | x | 3
0.33 | a | y | 4
0.33 | a | z | 1
0.67 | b | x | 3
0.67 | b | y | 5
1 | c | x | 4
This way, I'd always know what the range of ranks is, and can split the data in a standard way (rank <= 0.7 training, and rank > 0.7 testing)
Is there a way to achieve this in hive?
Or, is there a different and better approach to my original issue of splitting the data?
I tried to do a select * where rank < 0.7*max(rank), but hive says the MAX UDAF is not yet available in where clause.
percent_rank
select percent_rank() over (order by consumer) as pr
,*
from mytable
;
+-----+----------+---------+----------+
| pr | consumer | product | quantity |
+-----+----------+---------+----------+
| 0.0 | a | z | 1 |
| 0.0 | a | y | 4 |
| 0.0 | a | x | 3 |
| 0.6 | b | y | 5 |
| 0.6 | b | x | 3 |
| 1.0 | c | x | 4 |
+-----+----------+---------+----------+
For filtering you'll need a sub-query / CTE
select *
from (select percent_rank() over (order by consumer) as pr
,*
from mytable
) t
where pr <= ...
;
I have a table a show below
Date | Customer | Count | Daily_Count | ITD_Count
d1 | A | 3 | 3 |
d2 | B | 4 | 4 |
d3 | A | 7 | 16 |
d3 | B | 9 | 16 |
d4 | A | 8 | 9 |
d4 | B | 1 | 9 |
Descrption of Fields:
Date : date
customer : name of customer
Count : # of customers
daily_Count : # of customers on daily basis calculated as
SUM(count) OVER (partition BY date )as Daily_Count
Question :
How do I calculate the Running Total or Rolling Total in the ITD_Count ?
The output should look like
Date | Customer | Count | Daily_Count | ITD_Count
d1 | A | 3 | 3 | 3
d2 | B | 4 | 4 | 7
d3 | A | 7 | 16 | 23
d3 | B | 9 | 16 | 23
d4 | A | 8 | 9 | 31
d4 | B | 1 | 9 | 31
I have tried several variations of using the Window functionality.. But hit a road-block in all my attempts.
Attempt 1 ;
SUM(daily_COunt) OVER (partition BY date order by date rows between unbounded preceding and current row ) as ITD_account_linking
Attempt 2 :
SUM(daily_COunt) OVER (partition BY date, daily_count order by date rows between unbounded preceding and current row ) as ITD_account_linking
and several more attempts following this. :(
Any possible suggestions to guide me in the right direction are welcome.
Please let me know if you need more details.
Use Hive Windowing and Analytics functions.
SELECT Date, Customer, Count, Daily_Count,
SUM(Daily_Count) OVER (ORDER BY Date ROWS UNBOUNDED PRECEDING) AS ITD_Count
FROM table;
I have two tables A1,A2
A1 (primary key ID):
| ID | NAME |
|-------|---------|
| 1 | Cat1 |
| 2 | Cat2 |
| 3 | Cat3 |
| 4 | Cat4 |
| 5 | Cat5 |
and A2 (primary key ID, foreign key A1_ID=A1.ID)
| ID | NAME | A1_ID | TYPE |
|-------|---------|--------|--------|
| 1 | Sub1 | 1 | L |
| 2 | Sub2 | 2 | F |
| 3 | Sub3 | 3 | V |
| 4 | Sub4 | 4 | L |
| 5 | Sub5 | 4 | V |
| 6 | Sub6 | 5 | |
I am trying to get all the results from both tables where A2.Type is L or F or null
This is what I have up to now:
select a.*, b.*
from a1 a
left join a2 b
on a.id=b.a1_id
where (b.type='L'
or b.type='F'
or b.type is null)
which returns :
| ID | NAME | ID | NAME | A1_ID | TYPE |
|-------|---------|--------|--------|--------|--------|
| 1 | Cat1 | 1 | Sub1 | 1 | L |
| 2 | Cat2 | 2 | Sub2 | 2 | F |
| 4 | Cat4 | 4 | Sub4 | 4 | L |
| 5 | Cat5 | 6 | Sub6 | 5 | |
But I am looking for a query that it will exclude the line with A1.ID = 4 because with the same A1_ID there is a row with TYPE=V
| ID | NAME | ID | NAME | A1_ID | TYPE |
|-------|---------|--------|--------|--------|--------|
| 1 | Cat1 | 1 | Sub1 | 1 | L |
| 2 | Cat2 | 2 | Sub2 | 2 | F |
| 5 | Cat5 | 6 | Sub6 | 5 | |
Any ideas?
You can do this with not exists:
select a.*, b.*
from a1 a left join
a2 b
on a.id = b.a1_id
where (b.type = 'L' or b.type='F' or b.type is null) and
not exists (select 1 from a2 where a2.id = a.id and a2.type = 'V');
Your original query doesn't quite do what your text says. This seems to be what you are describing:
select a.*, b.*
from a1 a join
a2 b
on a.id = b.a1_id and
(b.type = 'L' or b.type='F' or b.type is null)
where not exists (select 1 from a2 where a2.id = a.id and a2.type = 'V');
That is, the conditions in the where clause are moved to the on clause and the join is changed to an inner join. The difference is when there are no matches in a2 for a given id. Your version would return the row. This version will filter it out.
select a.*, b.*
from a1 a
left join a2 b
on a.id=b.a1_id
left join a2 c
on c.a1_ID = b.a1_ID AND c.type = 'V'
where (b.type='L'
or b.type='F'
or b.type is null)
and c.type is null
This is one way. If all you ever need to consider is v this should be efficient. However, if you need to adjust based on other criteria there maybe a better way.
in essence this takes your current results and compares it to another set of a2 that only contains record type "V". If any match is found, it is excluded from the results.
I have the following table which I'd like to turn into a report:
ClientGroup | Product | Client | Quantity
-----------------------------------------
Gr1 | P1 | C1 | 10
Gr1 | P1 | C2 | 20
Gr1 | P1 | C3 | 30
Gr1 | P2 | C1 | 40
Gr1 | P2 | C2 | 50
Gr1 | P2 | C3 | 60
Gr2 | P1 | C4 | 70
Gr2 | P1 | C5 | 80
Gr2 | P1 | C6 | 90
Gr2 | P2 | C4 | 100
Gr2 | P2 | C5 | 110
Gr2 | P2 | C6 | 120
The report would have the following layout:
--------------------
| G1 |
--------------------
Client | P1 | P2 |
--------------------
C1 | 10 | 40 |
C2 | 20 | 50 |
C3 | 30 | 60 |
--------------------
Total | 60 |150 |
--------------------
| G2 |
--------------------
Client | P1 | P2 |
--------------------
C4 | 70 | 100 |
C5 | 80 | 110 |
C5 | 90 | 120 |
--------------------
Total | 240 | 330 |
--------------------
What I'm doing is to create a Matrix, add a row group on ClientGroup, a sub group row on Client, a column group on Product with Quantity as detail. In the designer it looks somewhat like this:
---------------------------------------------
| ClientGroup | Client | [Product] |
---------------------------------------------
| [ClientGroup] | [Client] | Sum([Quantity])|
---------------------------------------------
I then hide the ClientGroup column and it seems I'm almost there. What I can't figure out is how to have a header over the columns Client and [Product] displaying the current ClientGroup.
Is it possible? Any ideas?
You can get pretty close:
Set the Headings row to be hidden.
Right-click the [Client] cell and select Insert Row > Outside Group - Above, twice.
Copy [ClientGroup] into the left-hand cell on the first new row, and set the BorderStyle-Right of the cell to be None.
Select the right-hand cell on the first new row, and set the BorderStyle-Left and -Right of the cell to be None.
Copy the heading Client into the left-hand cell on the second new row.
Copy [Product] into the right-hand cell on the second new row.
Your report should look something like this in the designer:
--------------------------------------------------
| ClientGroup | Client | [Product] |
--------------------------------------------------
| [ClientGroup] | [ClientGroup] | |
| |---------------------------------
| | Client | [Product] |
| |---------------------------------
| | [Client] | Sum([Quantity])|
--------------------------------------------------
If you preview it, the results should be pretty close to the desired layout.