Let's say I have a table as follows:
| id | dir | p1  | p2   |
|----|-----|-----|------|
| a  | x   | 1.2 | 1.3  |
| a  | x   | 1.2 | 1.3  |
| a  | z   | 2.1 | 3    |
| a  | z   | 2.1 | 3    |
| b  | x   | 1   | null |
| b  | z   | 4   | null |
I would like the unique rows for ids a and b where dir = x or dir = z, so two rows for each id.
Then, for the row where dir = z, take p1 minus the p2 of the previous row for that id as newval1, and p2 minus the p1 of the previous row for that id as newval2.
Nulls should be treated as zeroes.
As an intermediate step, I suppose the deduplicated table would be:
| id | dir | p1  | p2   |
|----|-----|-----|------|
| a  | x   | 1.2 | 1.3  |
| a  | z   | 2.1 | 3    |
| b  | x   | 1   | null |
| b  | z   | 4   | null |
Desired result will be:
| id | newval1       | newval2     |
|----|---------------|-------------|
| a  | 0.8 (2.1-1.3) | 1.8 (3-1.2) |
| b  | 4 (4-0)       | -1 (0-1)    |
Is it possible to do this in SQL?
select id,
nvl(max(case when dir = 'z' then p1 end), 0)
- nvl(max(case when dir = 'x' then p2 end), 0) as newval1,
nvl(max(case when dir = 'z' then p2 end), 0)
- nvl(max(case when dir = 'x' then p1 end), 0) as newval2
from tbl
where dir in ('x', 'z')
group by id
;
ID NEWVAL1 NEWVAL2
-- ---------- ----------
a .8 1.8
b 4 -1
Or, if you are on version 11.1 or higher, you can use the pivot operator:
select id, z_p1 - x_p2 as newval1, z_p2 - x_p1 as newval2
from tbl
pivot ( max(nvl(p1, 0)) as p1, max(nvl(p2, 0)) as p2
for dir in ('x' as x, 'z' as z)
)
;
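To try the conditional-aggregation idea without an Oracle instance, here is a minimal sketch that runs the same logic against an in-memory SQLite database from Python. It is an illustration under assumptions, not the answer's exact query: SQLite has no `nvl`, so `coalesce` is swapped in, and the names `pivot_differences` and `SAMPLE_ROWS` are made up for this example.

```python
import sqlite3

# Sample rows copied from the question: (id, dir, p1, p2); None stands for null.
SAMPLE_ROWS = [
    ("a", "x", 1.2, 1.3),
    ("a", "x", 1.2, 1.3),
    ("a", "z", 2.1, 3),
    ("a", "z", 2.1, 3),
    ("b", "x", 1, None),
    ("b", "z", 4, None),
]


def pivot_differences(rows):
    """Return {id: (newval1, newval2)} computed with conditional aggregation."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table tbl (id text, dir text, p1 real, p2 real)")
    conn.executemany("insert into tbl values (?, ?, ?, ?)", rows)
    # max() collapses the duplicate rows; coalesce() plays the role of nvl().
    query = """
        select id,
               coalesce(max(case when dir = 'z' then p1 end), 0)
             - coalesce(max(case when dir = 'x' then p2 end), 0) as newval1,
               coalesce(max(case when dir = 'z' then p2 end), 0)
             - coalesce(max(case when dir = 'x' then p1 end), 0) as newval2
        from tbl
        where dir in ('x', 'z')
        group by id
        order by id
    """
    result = {row[0]: (row[1], row[2]) for row in conn.execute(query)}
    conn.close()
    return result
```

Calling `pivot_differences(SAMPLE_ROWS)` should give values close to 0.8 and 1.8 for id a (subject to floating-point rounding) and 4 and -1 for id b.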
I'm trying to determine how a Turing Machine (consisting of only 0's and 1's, no blanks) could recognize a sequence of 8 1's. Every algorithm I've found has a TM searching for an indeterminate number of 1's or 0's, not a specific number.
Essentially, if you have this tape:
1 1 1 1 1 1 1 1 0 0 0 1 0 0 0 1
How can you recognize that the 8 1's represent addition, and you want to add 0 0 0 1 and 0 0 0 1?
I take it that 11111111 is like an opcode and 0001, 0001 are the operands for that opcode. At least, that's the only interpretation I am seeing.
A TM can look for a fixed, finite number of symbols by using a similar fixed, finite number of states, the sole purpose of each one being to recognize that another of the expected symbols has been seen. For instance, here's a four-tape TM that recognizes addition and does the binary addition:
|----|----|----|----|----||----|-----|-----|-----|-----|----|----|----|----|
| Q | T1 | T2 | T3 | T4 || Q' | T1' | T2' | T3' | T4' | D1 | D2 | D3 | D4 |
|----|----|----|----|----||----|-----|-----|-----|-----|----|----|----|----|
// read the opcode /////////////////////////////////////////////////////////
| qA | 1 | x | y | z || qB | 1 | x | y | z | R | S | S | S |
| qB | 1 | x | y | z || qC | 1 | x | y | z | R | S | S | S |
| qC | 1 | x | y | z || qD | 1 | x | y | z | R | S | S | S |
| qD | 1 | x | y | z || qE | 1 | x | y | z | R | S | S | S |
| qE | 1 | x | y | z || qF | 1 | x | y | z | R | S | S | S |
| qF | 1 | x | y | z || qG | 1 | x | y | z | R | S | S | S |
| qG | 1 | x | y | z || qH | 1 | x | y | z | R | S | S | S |
| qH | 1 | x | y | z || qI | 1 | x | y | z | R | S | S | S |
// read the first addend ///////////////////////////////////////////////////
| qI | w | x | y | z || qJ | w | w | y | z | R | R | S | S |
| qJ | w | x | y | z || qK | w | w | y | z | R | R | S | S |
| qK | w | x | y | z || qL | w | w | y | z | R | R | S | S |
| qL | w | x | y | z || qM | w | w | y | z | R | R | S | S |
// read the second addend //////////////////////////////////////////////////
| qM | w | x | y | z || qN | w | x | w | z | R | S | R | S |
| qN | w | x | y | z || qO | w | x | w | z | R | S | R | S |
| qO | w | x | y | z || qP | w | x | w | z | R | S | R | S |
| qP | w | x | y | z || qQ | w | x | w | z | R | S | R | S |
// prepare the output tape /////////////////////////////////////////////////
| qQ | w | x | y | z || qR | w | x | y | z | S | S | S | R |
| qR | w | x | y | z || qS | w | x | y | z | S | S | S | R |
| qS | w | x | y | z || qT | w | x | y | z | S | S | S | R |
| qT | w | x | y | z || qU | w | x | y | z | S | S | S | R |
// handle addition when no carry ///////////////////////////////////////////
| qU | w | 0 | 0 | z || qU | w | 0 | 0 | 0 | S | L | L | L |
| qU | w | 0 | 1 | z || qU | w | 0 | 1 | 1 | S | L | L | L |
| qU | w | 1 | 0 | z || qU | w | 1 | 0 | 1 | S | L | L | L |
| qU | w | 1 | 1 | z || qV | w | 1 | 1 | 0 | S | L | L | L |
| qU | w | B | B | B || hA | w | B | B | B | S | R | R | R |
// handle addition when carry //////////////////////////////////////////////
| qV | w | 0 | 0 | z || qU | w | 0 | 0 | 1 | S | L | L | L |
| qV | w | 0 | 1 | z || qV | w | 0 | 1 | 0 | S | L | L | L |
| qV | w | 1 | 0 | z || qV | w | 1 | 0 | 0 | S | L | L | L |
| qV | w | 1 | 1 | z || qV | w | 1 | 1 | 1 | S | L | L | L |
| qV | w | B | B | B || hA | w | B | B | B | S | R | R | R |
|----|----|----|----|----||----|-----|-----|-----|-----|----|----|----|----|
Legend:
Q: current state
T1: current tape symbol, input tape
T2: current tape symbol, scratch tape #1
T3: current tape symbol, scratch tape #2
T4: current tape symbol, output tape (not used)
Q': state to transition into
T1': symbol to write to input tape (not used)
T2': symbol to write to scratch tape #1
T3': symbol to write to scratch tape #2
T4': symbol to write to output tape
D1: direction to move input tape head
D2: direction to move scratch tape #1 head
D3: direction to move scratch tape #2 head
D4: direction to move output tape head
Conventions:
w, x, y and z are variables and represent either 0 or 1. A transition using all four of these can be thought of as a shorthand notation for writing sixteen (2^4) concrete transitions.
Directions are L=left, S=same (stay), R=right.
B is a blank symbol; it can be dispensed with if you add more states to assist qU and qV in the addition.
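The "one state per expected symbol" idea can be sketched in a few lines of Python. This is a simplified single-tape simulation, not the four-tape machine in the table: the function names are hypothetical, the tape is passed as a string of 0s and 1s, and dropping the final carry of the 4-bit sum is an assumption.

```python
def build_chain(length=8):
    """One state per expected symbol: state k means 'k ones seen so far'."""
    return {(k, "1"): k + 1 for k in range(length)}


def read_opcode(tape, length=8):
    """Return the head position after `length` leading 1s, or None on reject."""
    transitions = build_chain(length)
    state, head = 0, 0
    while state != length:
        symbol = tape[head] if head < len(tape) else None
        if (state, symbol) not in transitions:
            return None  # no transition defined, so the machine rejects
        state = transitions[(state, symbol)]
        head += 1  # every opcode transition moves the head right
    return head


def add_operands(tape, width=4):
    """Recognize the opcode, then add the two `width`-bit operands after it."""
    head = read_opcode(tape)
    if head is None:
        return None
    first = int(tape[head:head + width], 2)
    second = int(tape[head + width:head + 2 * width], 2)
    # Assumption: keep only `width` bits and drop any final carry.
    return format((first + second) % (2 ** width), "0{}b".format(width))
```

With the tape from the question written without spaces, `add_operands("1111111100010001")` reads eight 1s, then adds 0001 and 0001. A tape with only seven leading 1s is rejected because state 7 has no transition on 0.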
I have a Hive table with an age column containing people's ages.
I need to count the rows per age category and display the top 3 categories.
For example: below 10, 10-15, 15-20, 20-25, 25-30, ...
That is, which age categories appear most often.
Please suggest a query to do this.
select case
when age <= 10 then '0-10'
else concat_ws
(
'-'
,cast(floor(age/5)*5 as string)
,cast((floor(age/5)+1)*5 as string)
)
end as age_group
,count(*) as cnt
from mytable
group by 1
order by cnt desc
limit 3
;
You might need to set this parameter:
set hive.groupby.orderby.position.alias=true;
Demo
with mytable as
(
select floor(rand()*100) as age
from (select 1) x lateral view explode(split(space(100),' ')) pe
)
select case
when age <= 10 then '0-10'
else concat_ws('-',cast(floor(age/5)*5 as string),cast((floor(age/5)+1)*5 as string))
end as age_group
,count(*) as cnt
,sort_array(collect_list(age)) as age_list
from mytable
group by 1
order by cnt desc
;
+-----------+-----+------------------------------+
| age_group | cnt | age_list |
+-----------+-----+------------------------------+
| 0-10 | 9 | [0,0,1,3,3,6,8,9,10] |
| 25-30 | 9 | [26,26,28,28,28,28,29,29,29] |
| 55-60 | 8 | [55,55,56,57,57,57,58,58] |
| 35-40 | 7 | [35,35,36,36,37,38,39] |
| 80-85 | 7 | [80,80,81,82,82,82,84] |
| 30-35 | 6 | [31,32,32,32,33,34] |
| 70-75 | 6 | [70,70,71,71,72,73] |
| 65-70 | 6 | [65,67,67,68,68,69] |
| 50-55 | 6 | [51,53,53,53,53,54] |
| 45-50 | 5 | [45,45,48,48,49] |
| 85-90 | 5 | [85,86,87,87,89] |
| 75-80 | 5 | [76,77,78,79,79] |
| 20-25 | 5 | [20,20,21,22,22] |
| 15-20 | 5 | [17,17,17,18,19] |
| 10-15 | 4 | [11,12,12,14] |
| 95-100 | 4 | [95,95,96,99] |
| 40-45 | 3 | [41,44,44] |
| 90-95 | 1 | [93] |
+-----------+-----+------------------------------+
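For readers who want to check the bucketing rule outside Hive, the following Python sketch mirrors the `case` expression: ages up to and including 10 fall into '0-10', and every other age goes into the 5-year bucket starting at floor(age/5)*5. The function names `age_group` and `top_age_groups` are invented for this illustration.

```python
from collections import Counter


def age_group(age):
    """Mirror the Hive case expression: '0-10' first, then 5-year buckets."""
    if age <= 10:
        return "0-10"
    low = (age // 5) * 5  # same as floor(age/5)*5 for non-negative integers
    return "{}-{}".format(low, low + 5)


def top_age_groups(ages, n=3):
    """Count ages per bucket and return the n most frequent (bucket, count) pairs."""
    return Counter(age_group(age) for age in ages).most_common(n)
```

Note that, as in the query, an age of exactly 10 lands in '0-10' while 15 lands in '15-20', so the upper bound is inclusive only for the first bucket.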
I have a consumer table like so.
consumer | product | quantity
-------- | ------- | --------
a | x | 3
a | y | 4
a | z | 1
b | x | 3
b | y | 5
c | x | 4
What I want is a 'normalized' rank assigned to each consumer so that I can split the table easily for testing and training. I used the dense_rank() in hive, so I got the below table.
rank | consumer | product | quantity
---- | -------- | ------- | --------
1 | a | x | 3
1 | a | y | 4
1 | a | z | 1
2 | b | x | 3
2 | b | y | 5
3 | c | x | 4
This is well and good, but I want to scale this to use with any number of consumers, so I would ideally like the range of ranks between 0 and 1, like so.
rank | consumer | product | quantity
---- | -------- | ------- | --------
0.33 | a | x | 3
0.33 | a | y | 4
0.33 | a | z | 1
0.67 | b | x | 3
0.67 | b | y | 5
1 | c | x | 4
This way, I'd always know the range of ranks and could split the data in a standard way (rank <= 0.7 for training, rank > 0.7 for testing).
Is there a way to achieve this in hive?
Or, is there a different and better approach to my original issue of splitting the data?
I tried to do a select * where rank < 0.7*max(rank), but hive says the MAX UDAF is not yet available in where clause.
You can use percent_rank:
select percent_rank() over (order by consumer) as pr
,*
from mytable
;
+-----+----------+---------+----------+
| pr | consumer | product | quantity |
+-----+----------+---------+----------+
| 0.0 | a | z | 1 |
| 0.0 | a | y | 4 |
| 0.0 | a | x | 3 |
| 0.6 | b | y | 5 |
| 0.6 | b | x | 3 |
| 1.0 | c | x | 4 |
+-----+----------+---------+----------+
For filtering you'll need a sub-query or CTE:
select *
from (select percent_rank() over (order by consumer) as pr
,*
from mytable
) t
where pr <= ...
;
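To see where the 0.0 / 0.6 / 1.0 values come from, here is a small Python sketch of the standard percent_rank definition, (rank - 1) / (rows - 1), next to a normalized dense rank that reproduces the 0.33 / 0.67 / 1 values from the question. Both function names are hypothetical, and the second function is only one possible way to get the question's numbers, not something Hive provides under that name.

```python
def percent_rank(values):
    """(rank - 1) / (rows - 1); tied values share the lowest rank."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 1:
        return [0.0]
    # ordered.index(v) counts the rows strictly smaller than v, i.e. rank - 1.
    return [ordered.index(v) / (n - 1) for v in values]


def normalized_dense_rank(values):
    """dense_rank divided by the number of distinct values, so the top is 1."""
    distinct = sorted(set(values))
    position = {v: i + 1 for i, v in enumerate(distinct)}
    return [position[v] / len(distinct) for v in values]
```

For the six sample rows, the consumer column is a, a, a, b, b, c, so the b rows get (4 - 1) / (6 - 1) = 0.6 from `percent_rank`, while `normalized_dense_rank` gives them 2/3.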
Given two sets, e.g.:
{A B C}, {1 2 3 4 5 6}
I want to generate the Cartesian product in an order that puts as much space as possible between equal elements. For example, [A1, A2, A3, A4, A5, A6, B1…] is no good because all the As are next to each other. An acceptable solution would be going "down the diagonals" and then every time it wraps offsetting by one, e.g.:
[A1, B2, C3, A4, B5, C6, A2, B3, C4, A5, B6, C1, A3…]
Expressed visually:
| | A | B | C | A | B | C | A | B | C | A | B | C | A | B | C | A | B | C |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | | | | | | | | | | | | | | | | | |
| 2 | | 2 | | | | | | | | | | | | | | | | |
| 3 | | | 3 | | | | | | | | | | | | | | | |
| 4 | | | | 4 | | | | | | | | | | | | | | |
| 5 | | | | | 5 | | | | | | | | | | | | | |
| 6 | | | | | | 6 | | | | | | | | | | | | |
| 1 | | | | | | | | | | | | | | | | | | |
| 2 | | | | | | | 7 | | | | | | | | | | | |
| 3 | | | | | | | | 8 | | | | | | | | | | |
| 4 | | | | | | | | | 9 | | | | | | | | | |
| 5 | | | | | | | | | | 10| | | | | | | | |
| 6 | | | | | | | | | | | 11| | | | | | | |
| 1 | | | | | | | | | | | | 12| | | | | | |
| 2 | | | | | | | | | | | | | | | | | | |
| 3 | | | | | | | | | | | | | 13| | | | | |
| 4 | | | | | | | | | | | | | | 14| | | | |
| 5 | | | | | | | | | | | | | | | 15| | | |
| 6 | | | | | | | | | | | | | | | | 16| | |
| 1 | | | | | | | | | | | | | | | | | 17| |
| 2 | | | | | | | | | | | | | | | | | | 18|
or, equivalently but without repeating the rows/columns:
| | A | B | C |
|---|----|----|----|
| 1 | 1 | 17 | 15 |
| 2 | 4 | 2 | 18 |
| 3 | 7 | 5 | 3 |
| 4 | 10 | 8 | 6 |
| 5 | 13 | 11 | 9 |
| 6 | 16 | 14 | 12 |
I imagine there are other solutions too, but that's the one I found easiest to think about. But I've been banging my head against the wall trying to figure out how to express it generically. It's convenient that one set's cardinality is a multiple of the other's, but I want the algorithm to do The Right Thing for sets of, say, size 5 and 7. Or size 12 and 69 (that's a real example!).
Are there any established algorithms for this? I keep getting distracted thinking of how rational numbers are mapped onto the set of natural numbers (to prove that they're countable), but the path it takes through ℕ×ℕ doesn't work for this case.
It so happens the application is being written in Ruby, but I don't care about the language. Pseudocode, Ruby, Python, Java, Clojure, Javascript, CL, a paragraph in English—choose your favorite.
Proof-of-concept solution in Python (soon to be ported to Ruby and hooked up with Rails):
import sys

letters = sys.argv[1]   # e.g. "ABC"
MAX_NUM = 6
letter_pos = 0
for i in range(MAX_NUM):
    for j in range(len(letters)):
        # shift the starting number by one on every pass over the letters
        num = ((i + j) % MAX_NUM) + 1
        symbol = letters[letter_pos % len(letters)]
        print("[%s %s]" % (symbol, num))
        letter_pos += 1
String letters = "ABC";
int MAX_NUM = 6;
int letterPos = 0;
for (int i = 0; i < MAX_NUM; ++i) {
    // the inner loop runs once per letter, as in the Python version
    for (int j = 0; j < letters.length(); ++j) {
        int num = ((i + j) % MAX_NUM) + 1;
        char symbol = letters.charAt(letterPos % letters.length());
        String output = symbol + "" + num;
        System.out.println(output);
        ++letterPos;
    }
}
What about using something fractal/recursive? This implementation divides a rectangular range into four quadrants then yields points from each quadrant. This means that neighboring points in the sequence differ at least by quadrant.
#python3
import sys
import itertools

def interleave(*iters):
    # round-robin over the iterators, skipping the padding from zip_longest
    for elements in itertools.zip_longest(*iters):
        for element in elements:
            if element is not None:
                yield element

def scramblerange(begin, end):
    width = end - begin
    if width == 1:
        yield begin
    else:
        first = scramblerange(begin, int(begin + width/2))
        second = scramblerange(int(begin + width/2), end)
        yield from interleave(first, second)

def scramblerectrange(top=0, left=0, bottom=1, right=1, width=None, height=None):
    if width is not None and height is not None:
        yield from scramblerectrange(bottom=height, right=width)
        # a plain return ends the generator; raising StopIteration here
        # becomes a RuntimeError on Python 3.7+
        return
    if right - left == 1:
        if bottom - top == 1:
            yield (left, top)
        else:
            for y in scramblerange(top, bottom):
                yield (left, y)
    else:
        if bottom - top == 1:
            for x in scramblerange(left, right):
                yield (x, top)
        else:
            halfx = int(left + (right - left)/2)
            halfy = int(top + (bottom - top)/2)
            quadrants = [
                scramblerectrange(top=top, left=left, bottom=halfy, right=halfx),
                reversed(list(scramblerectrange(top=top, left=halfx, bottom=halfy, right=right))),
                scramblerectrange(top=halfy, left=left, bottom=bottom, right=halfx),
                reversed(list(scramblerectrange(top=halfy, left=halfx, bottom=bottom, right=right)))
            ]
            yield from interleave(*quadrants)

if __name__ == '__main__':
    letters = 'abcdefghijklmnopqrstuvwxyz'
    output = []
    indices = dict()
    for i, pt in enumerate(scramblerectrange(width=11, height=5)):
        indices[pt] = i
        x, y = pt
        output.append(letters[x] + str(y))
    table = [[indices[x,y] for x in range(11)] for y in range(5)]
    print(', '.join(output))
    print()
    pad = lambda i: ' ' * (2 - len(str(i))) + str(i)
    header = ' |' + ' '.join(map(pad, letters[:11]))
    print(header)
    print('-' * len(header))
    for y, row in enumerate(table):
        print(pad(y)+'|', ' '.join(map(pad, row)))
Outputs:
a0, i1, a2, i3, e0, h1, e2, g4, a1, i0, a3, k3, e1,
h0, d4, g3, b0, j1, b2, i4, d0, g1, d2, h4, b1, j0,
b3, k4, d1, g0, d3, f4, c0, k1, c2, i2, c1, f1, a4,
h2, k0, e4, j3, f0, b4, h3, c4, j2, e3, g2, c3, j4,
f3, k2, f2
| a b c d e f g h i j k
-----------------------------------
0| 0 16 32 20 4 43 29 13 9 25 40
1| 8 24 36 28 12 37 21 5 1 17 33
2| 2 18 34 22 6 54 49 39 35 47 53
3| 10 26 50 30 48 52 15 45 3 42 11
4| 38 44 46 14 41 31 7 23 19 51 27
If your sets X and Y are sizes n and m, and Xi is the index of the element from X that's in the ith pair in your Cartesian product (and similar for Y), then
Xi = i mod n;
Yi = (i mod n + i div n) mod m;
You could get your diagonals a little more spread out by filling out your matrix like this:
for (int i = 0; i < m*n; i++) {
int xi = i % n;
int yi = i % m;
while (matrix[yi][xi] != 0) {
yi = (yi+1) % m;
}
matrix[yi][xi] = i+1;
}
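Both ideas above can be tried quickly in Python. The sketch below assumes X has n elements (so that Xi = i mod n is a valid index) and Y has m elements; the names `diagonal_pairs` and `spread_matrix` are made up for this illustration. The first function applies the two index formulas directly, and the second ports the matrix-filling loop, assuming the matrix starts out filled with zeros.

```python
def diagonal_pairs(xs, ys):
    """Yield every (x, y) pair, shifting the diagonal by one on each wrap."""
    n, m = len(xs), len(ys)
    for i in range(n * m):
        xi = i % n
        yi = (i % n + i // n) % m
        yield xs[xi], ys[yi]


def spread_matrix(n, m):
    """Fill an m-row, n-column matrix with visit order 1..m*n."""
    matrix = [[0] * n for _ in range(m)]
    for i in range(m * n):
        xi, yi = i % n, i % m
        # Slide down the column until a free cell is found. Each column is
        # visited exactly m times, so a free cell always exists.
        while matrix[yi][xi] != 0:
            yi = (yi + 1) % m
        matrix[yi][xi] = i + 1
    return matrix
```

For example, `list(diagonal_pairs("ABC", [1, 2, 3, 4, 5, 6]))` starts with A1, B2, C3 and then wraps to A2. Because each x is paired with m consecutive offsets, every pair appears exactly once even when the sizes share no common factor, such as 5 and 7.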
I have two tables, A1 and A2.
A1 (primary key ID):
| ID | NAME |
|-------|---------|
| 1 | Cat1 |
| 2 | Cat2 |
| 3 | Cat3 |
| 4 | Cat4 |
| 5 | Cat5 |
and A2 (primary key ID, foreign key A1_ID=A1.ID)
| ID | NAME | A1_ID | TYPE |
|-------|---------|--------|--------|
| 1 | Sub1 | 1 | L |
| 2 | Sub2 | 2 | F |
| 3 | Sub3 | 3 | V |
| 4 | Sub4 | 4 | L |
| 5 | Sub5 | 4 | V |
| 6 | Sub6 | 5 | |
I am trying to get all the results from both tables where A2.TYPE is L, F, or null.
This is what I have up to now:
select a.*, b.*
from a1 a
left join a2 b
on a.id=b.a1_id
where (b.type='L'
or b.type='F'
or b.type is null)
which returns :
| ID | NAME | ID | NAME | A1_ID | TYPE |
|-------|---------|--------|--------|--------|--------|
| 1 | Cat1 | 1 | Sub1 | 1 | L |
| 2 | Cat2 | 2 | Sub2 | 2 | F |
| 4 | Cat4 | 4 | Sub4 | 4 | L |
| 5 | Cat5 | 6 | Sub6 | 5 | |
But I am looking for a query that will also exclude the row with A1.ID = 4, because another row with the same A1_ID has TYPE = V:
| ID | NAME | ID | NAME | A1_ID | TYPE |
|-------|---------|--------|--------|--------|--------|
| 1 | Cat1 | 1 | Sub1 | 1 | L |
| 2 | Cat2 | 2 | Sub2 | 2 | F |
| 5 | Cat5 | 6 | Sub6 | 5 | |
Any ideas?
You can do this with not exists, correlating the subquery on the foreign key a1_id:
select a.*, b.*
from a1 a left join
     a2 b
     on a.id = b.a1_id
where (b.type = 'L' or b.type = 'F' or b.type is null) and
      not exists (select 1 from a2 where a2.a1_id = a.id and a2.type = 'V');
Your original query doesn't quite do what your text says. This seems to be what you are describing:
select a.*, b.*
from a1 a join
     a2 b
     on a.id = b.a1_id and
        (b.type = 'L' or b.type = 'F' or b.type is null)
where not exists (select 1 from a2 where a2.a1_id = a.id and a2.type = 'V');
That is, the conditions in the where clause are moved to the on clause and the join is changed to an inner join. The difference is when there are no matches in a2 for a given id. Your version would return the row. This version will filter it out.
select a.*, b.*
from a1 a
left join a2 b
on a.id=b.a1_id
left join a2 c
on c.a1_ID = b.a1_ID AND c.type = 'V'
where (b.type='L'
or b.type='F'
or b.type is null)
and c.type is null
This is one way. If all you ever need to consider is V, this should be efficient. However, if you need to adjust based on other criteria, there may be a better way.
In essence, this takes your current results and compares them to another set of a2 rows that only contains records of type 'V'. If any match is found, the row is excluded from the results.
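The exclusion logic can be checked against the sample data with an in-memory SQLite database. This is a sketch under assumptions: the function name `rows_without_type_v` is invented, the subquery alias `v` is arbitrary, and the not exists subquery is correlated on `a1_id` (the foreign key) so that any 'V' row under the same A1 row removes it.

```python
import sqlite3


def rows_without_type_v():
    """Return the joined rows whose A1 parent has no A2 child of type 'V'."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        create table a1 (id integer primary key, name text);
        create table a2 (id integer primary key, name text,
                         a1_id integer references a1(id), type text);
        insert into a1 values (1,'Cat1'),(2,'Cat2'),(3,'Cat3'),
                              (4,'Cat4'),(5,'Cat5');
        insert into a2 values (1,'Sub1',1,'L'),(2,'Sub2',2,'F'),
                              (3,'Sub3',3,'V'),(4,'Sub4',4,'L'),
                              (5,'Sub5',4,'V'),(6,'Sub6',5,null);
    """)
    query = """
        select a.id, a.name, b.id, b.name, b.a1_id, b.type
        from a1 a
        left join a2 b on a.id = b.a1_id
        where (b.type = 'L' or b.type = 'F' or b.type is null)
          and not exists (select 1 from a2 v
                          where v.a1_id = a.id and v.type = 'V')
        order by a.id
    """
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows
```

On the sample data this keeps the rows for Cat1, Cat2 and Cat5. Cat4 is dropped because Sub5 has type V, and Cat3 is dropped because its only child is of type V.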