Let's say I have a table as follows:
| id | dir | p1  | p2   |
|----|-----|-----|------|
| a  | x   | 1.2 | 1.3  |
| a  | x   | 1.2 | 1.3  |
| a  | z   | 2.1 | 3    |
| a  | z   | 2.1 | 3    |
| b  | x   | 1   | null |
| b  | z   | 4   | null |
I would like unique rows for ids a and b where dir = x or dir = z, so two rows each.
Then, for the row where dir = z, take p1 minus the p2 of the previous row for that id as newval1, and p2 minus the p1 of the previous row for that id as newval2, treating nulls as zeroes.
In steps, I suppose the intermediate result will be:
| id | dir | p1  | p2   |
|----|-----|-----|------|
| a  | x   | 1.2 | 1.3  |
| a  | z   | 2.1 | 3    |
| b  | x   | 1   | null |
| b  | z   | 4   | null |
The desired result is:
| id | newval1         | newval2       |
|----|-----------------|---------------|
| a  | 0.8 (2.1 - 1.3) | 1.8 (3 - 1.2) |
| b  | 4 (4 - 0)       | -1 (0 - 1)    |
Is it possible to do this in SQL?
select id,
nvl(max(case when dir = 'z' then p1 end), 0)
- nvl(max(case when dir = 'x' then p2 end), 0) as newval1,
nvl(max(case when dir = 'z' then p2 end), 0)
- nvl(max(case when dir = 'x' then p1 end), 0) as newval2
from tbl
where dir in ('x', 'z')
group by id
;
ID NEWVAL1 NEWVAL2
-- ---------- ----------
a .8 1.8
b 4 -1
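The conditional-aggregation query can be checked quickly with Python's built-in sqlite3 module. This is a sketch that assumes SQLite is an acceptable stand-in for Oracle: SQLite has no NVL, so COALESCE is used instead, and the table name tbl and the column types are taken from the example. It should give the same two rows as above, up to floating-point rounding.

```python
import sqlite3

# In-memory copy of the example table (SQLite standing in for Oracle).
conn = sqlite3.connect(":memory:")
conn.execute("create table tbl (id text, dir text, p1 real, p2 real)")
conn.executemany(
    "insert into tbl values (?, ?, ?, ?)",
    [("a", "x", 1.2, 1.3), ("a", "x", 1.2, 1.3),
     ("a", "z", 2.1, 3), ("a", "z", 2.1, 3),
     ("b", "x", 1, None), ("b", "z", 4, None)],
)

# One MAX(CASE ...) per (dir, column) pair: duplicate rows collapse in the
# aggregate, and COALESCE turns a missing/NULL value into 0.
rows = conn.execute("""
    select id,
           coalesce(max(case when dir = 'z' then p1 end), 0)
         - coalesce(max(case when dir = 'x' then p2 end), 0) as newval1,
           coalesce(max(case when dir = 'z' then p2 end), 0)
         - coalesce(max(case when dir = 'x' then p1 end), 0) as newval2
    from tbl
    where dir in ('x', 'z')
    group by id
    order by id
""").fetchall()

for id_, v1, v2 in rows:
    print(id_, round(v1, 2), round(v2, 2))
```

Because MAX ignores NULLs, an id with no dir = 'x' row (or a NULL p2) simply contributes 0 to the subtraction, which is the "treat nulls as zeroes" rule from the question.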
Or, if you are on Oracle 11.1 or higher, you can use the PIVOT operator:
select id, z_p1 - x_p2 as newval1, z_p2 - x_p1 as newval2
from tbl
pivot ( max(nvl(p1, 0)) as p1, max(nvl(p2, 0)) as p2
for dir in ('x' as x, 'z' as z)
)
;
I just stumbled upon a strange (and very annoying) game that I wanted to solve programmatically. It is a bit like a Rubik's cube, but two-dimensional. I'm struggling a bit with how to approach this...
There is a 9x9 square with some circles placed in the inner squares. For instance, one gets the following picture:
A B C D E F G H I
-------------------------------------
9 | | | O | | | O | | | | J
-------------------------------------
8 | | | O | | O | | O | | | K
-------------------------------------
7 | | | | O | | | O | O | | L
-------------------------------------
6 | | | O | | | | O | | | M
-------------------------------------
5 | | | O | | | | | | | N
-------------------------------------
4 | | | | O | | O | O | | | O
-------------------------------------
3 | | | | | O | | O | | | P
-------------------------------------
2 | | | | O | | | | | | Q
-------------------------------------
1 | | | O | | | | | | | R
-------------------------------------
0 Z Y X W V U T S
One can use the numbers and letters around the square to shift entire rows or columns left/right or up/down. Circles that would leave the game area on the right reappear on the left and vice versa; the same applies to top/bottom.
The goal is to rearrange the circles into a given pattern within a maximum number of moves. For instance, one should rearrange the circles in the picture above to match the picture below in at most 17 moves:
A B C D E F G H I
-------------------------------------
9 | | | | | | | | | | J
-------------------------------------
8 | | | O | O | O | O | O | | | K
-------------------------------------
7 | | | O | | | | O | | | L
-------------------------------------
6 | | | O | | | | O | | | M
-------------------------------------
5 | | | O | | | | O | | | N
-------------------------------------
4 | | | O | | | | O | | | O
-------------------------------------
3 | | | O | O | O | O | O | | | P
-------------------------------------
2 | | | | | | | | | | Q
-------------------------------------
1 | | | | | | | | | | R
-------------------------------------
0 Z Y X W V U T S
I would like to feed the start and end positions of the circles to a program that delivers the shortest possible path. I'm struggling to find an approach that doesn't just try all possible moves until a given maximum number of moves is reached.
Also, it doesn't seem easy to adapt the approaches used to solve a Rubik's cube, for instance...
Well, I thought it was a very interesting problem, and maybe somebody here has an illuminating idea.
UPDATE:
After a first attempt, just trying all possible moves doesn't seem realistic; there are simply too many permutations. I think this could be really hard to solve, if it is possible at all.
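To make the rules concrete, here is a minimal Python sketch (an illustration, not a known solution to this puzzle) that models the board as a set of (row, col) circle positions on a 9x9 grid that wraps around, with 0-based indices as an assumption. It searches with bidirectional breadth-first search: expanding from both the start and the goal means each side only goes about half the depth, so roughly 36^k states become about 2 * 36^(k/2) (there are 9 rows x 2 directions + 9 columns x 2 directions = 36 moves). That is still only practical for fairly short solutions. The names apply_move and solve are hypothetical.

```python
N = 9  # board size; rows and columns indexed 0..8 (assumption)

# A move is (axis, index, step): shift one row or column by +1 or -1, wrapping around.
MOVES = [(axis, i, step) for axis in ("row", "col") for i in range(N) for step in (1, -1)]


def apply_move(board, move):
    """Return a new board (frozenset of (row, col) circles) after one cyclic shift."""
    axis, i, step = move
    if axis == "row":
        return frozenset((r, (c + step) % N) if r == i else (r, c) for r, c in board)
    return frozenset(((r + step) % N, c) if c == i else (r, c) for r, c in board)


def _walk(parents, state):
    """Follow parent links back to the search root, collecting (parent, move) pairs."""
    steps = []
    while parents[state] is not None:
        prev, move = parents[state]
        steps.append((prev, move))
        state = prev
    return steps


def solve(start, goal, max_moves):
    """Bidirectional BFS: shortest move list from start to goal, or None if > max_moves."""
    start, goal = frozenset(start), frozenset(goal)
    if start == goal:
        return []
    fwd, bwd = {start: None}, {goal: None}  # state -> (parent state, move) or None
    fwd_layer, bwd_layer = [start], [goal]
    for _ in range(max_moves):
        # Grow the smaller frontier by one full BFS layer.
        forward = len(fwd_layer) <= len(bwd_layer)
        layer, seen, other = (fwd_layer, fwd, bwd) if forward else (bwd_layer, bwd, fwd)
        next_layer = []
        for state in layer:
            for move in MOVES:
                nxt = apply_move(state, move)
                if nxt in seen:
                    continue
                seen[nxt] = (state, move)
                if nxt in other:
                    # Forward half: moves from start to nxt, in order.
                    head = [m for _, m in reversed(_walk(fwd, nxt))]
                    # Backward half: invert each backward move to go from nxt to goal.
                    tail = [(a, i, -s) for _, (a, i, s) in _walk(bwd, nxt)]
                    return head + tail
                next_layer.append(nxt)
        if not next_layer:
            return None  # one side exhausted its reachable states
        if forward:
            fwd_layer = next_layer
        else:
            bwd_layer = next_layer
    return None
```

For example, scrambling the target pattern with two moves and calling solve(start, goal, 4) recovers a two-move solution. For a 17-move instance, this search alone would still be far too large.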
I have a consumer table like so.
consumer | product | quantity
-------- | ------- | --------
a | x | 3
a | y | 4
a | z | 1
b | x | 3
b | y | 5
c | x | 4
What I want is a 'normalized' rank assigned to each consumer so that I can easily split the table for testing and training. I used dense_rank() in Hive and got the table below.
rank | consumer | product | quantity
---- | -------- | ------- | --------
1 | a | x | 3
1 | a | y | 4
1 | a | z | 1
2 | b | x | 3
2 | b | y | 5
3 | c | x | 4
This is well and good, but I want to scale this to use with any number of consumers, so I would ideally like the range of ranks between 0 and 1, like so.
rank | consumer | product | quantity
---- | -------- | ------- | --------
0.33 | a | x | 3
0.33 | a | y | 4
0.33 | a | z | 1
0.67 | b | x | 3
0.67 | b | y | 5
1 | c | x | 4
This way, I'd always know the range of ranks and could split the data in a standard way (rank <= 0.7 for training, rank > 0.7 for testing).
Is there a way to achieve this in Hive?
Or is there a different, better approach to my original problem of splitting the data?
I tried select * where rank < 0.7*max(rank), but Hive says the MAX UDAF is not yet available in the WHERE clause.
Use percent_rank:
select percent_rank() over (order by consumer) as pr
,*
from mytable
;
+-----+----------+---------+----------+
| pr | consumer | product | quantity |
+-----+----------+---------+----------+
| 0.0 | a | z | 1 |
| 0.0 | a | y | 4 |
| 0.0 | a | x | 3 |
| 0.6 | b | y | 5 |
| 0.6 | b | x | 3 |
| 1.0 | c | x | 4 |
+-----+----------+---------+----------+
For filtering, you'll need a sub-query or CTE:
select *
from (select percent_rank() over (order by consumer) as pr
,*
from mytable
) t
where pr <= ...
;
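To see why b gets 0.6 rather than 0.5, here is a small Python sketch that mimics the two rankings on the sample rows. SQL's percent_rank is (rank - 1) / (rows - 1), and rows with equal keys share a rank, so b has rank 4 out of 6 rows and gets 3/5. The 0.33 / 0.67 / 1 values from the question correspond instead to dense_rank divided by the number of distinct consumers. The helper names are hypothetical; this illustrates the arithmetic only and is not Hive code.

```python
def percent_rank(keys):
    """SQL-style PERCENT_RANK over rows ordered by key: (rank - 1) / (rows - 1)."""
    n = len(keys)
    first_index = {}
    for i, key in enumerate(sorted(keys)):
        first_index.setdefault(key, i)  # rank - 1 = number of rows with a smaller key
    return [first_index[k] / (n - 1) if n > 1 else 0.0 for k in keys]


def normalized_dense_rank(keys):
    """DENSE_RANK divided by the number of distinct keys, so the last group gets 1.0."""
    distinct = sorted(set(keys))
    position = {key: i + 1 for i, key in enumerate(distinct)}
    return [position[k] / len(distinct) for k in keys]


rows = [("a", "x", 3), ("a", "y", 4), ("a", "z", 1),
        ("b", "x", 3), ("b", "y", 5), ("c", "x", 4)]
consumers = [r[0] for r in rows]
pr = percent_rank(consumers)
dr = normalized_dense_rank(consumers)
for row, p, d in zip(rows, pr, dr):
    print(row, round(p, 2), round(d, 2))

# All rows of one consumer share a rank value, so a threshold split
# never puts the same consumer on both sides.
train = [row for row, d in zip(rows, dr) if d <= 0.7]
test = [row for row, d in zip(rows, dr) if d > 0.7]
```

Either ranking keeps each consumer entirely in training or entirely in testing; percent_rank weights consumers by their row count, while the normalized dense rank treats every consumer equally.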
Given two sets, e.g.:
{A B C}, {1 2 3 4 5 6}
I want to generate the Cartesian product in an order that puts as much space as possible between equal elements. For example, [A1, A2, A3, A4, A5, A6, B1…] is no good because all the As are next to each other. An acceptable solution would be going "down the diagonals" and then every time it wraps offsetting by one, e.g.:
[A1, B2, C3, A4, B5, C6, A2, B3, C4, A5, B6, C1, A3…]
Expressed visually:
| | A | B | C | A | B | C | A | B | C | A | B | C | A | B | C | A | B | C |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | | | | | | | | | | | | | | | | | |
| 2 | | 2 | | | | | | | | | | | | | | | | |
| 3 | | | 3 | | | | | | | | | | | | | | | |
| 4 | | | | 4 | | | | | | | | | | | | | | |
| 5 | | | | | 5 | | | | | | | | | | | | | |
| 6 | | | | | | 6 | | | | | | | | | | | | |
| 1 | | | | | | | | | | | | | | | | | | |
| 2 | | | | | | | 7 | | | | | | | | | | | |
| 3 | | | | | | | | 8 | | | | | | | | | | |
| 4 | | | | | | | | | 9 | | | | | | | | | |
| 5 | | | | | | | | | | 10| | | | | | | | |
| 6 | | | | | | | | | | | 11| | | | | | | |
| 1 | | | | | | | | | | | | 12| | | | | | |
| 2 | | | | | | | | | | | | | | | | | | |
| 3 | | | | | | | | | | | | | 13| | | | | |
| 4 | | | | | | | | | | | | | | 14| | | | |
| 5 | | | | | | | | | | | | | | | 15| | | |
| 6 | | | | | | | | | | | | | | | | 16| | |
| 1 | | | | | | | | | | | | | | | | | 17| |
| 2 | | | | | | | | | | | | | | | | | | 18|
or, equivalently but without repeating the rows/columns:
|   | A  | B  | C  |
|---|----|----|----|
| 1 | 1  | 17 | 12 |
| 2 | 7  | 2  | 18 |
| 3 | 13 | 8  | 3  |
| 4 | 4  | 14 | 9  |
| 5 | 10 | 5  | 15 |
| 6 | 16 | 11 | 6  |
I imagine there are other solutions too, but that's the one I found easiest to think about. But I've been banging my head against the wall trying to figure out how to express it generically; it's convenient that the cardinalities of the two sets are multiples of each other, but I want the algorithm to do The Right Thing for sets of, say, size 5 and 7. Or size 12 and 69 (that's a real example!).
Are there any established algorithms for this? I keep getting distracted thinking of how rational numbers are mapped onto the set of natural numbers (to prove that they're countable), but the path it takes through ℕ×ℕ doesn't work for this case.
It so happens the application is being written in Ruby, but I don't care about the language. Pseudocode, Ruby, Python, Java, Clojure, Javascript, CL, a paragraph in English—choose your favorite.
Proof-of-concept solution in Python (soon to be ported to Ruby and hooked up with Rails):
import sys

letters = sys.argv[1]  # e.g. "ABC"
MAX_NUM = 6
letter_pos = 0
for i in range(MAX_NUM):
    for j in range(len(letters)):
        # shift the number by i on each pass so the diagonals wrap
        num = ((i + j) % MAX_NUM) + 1
        symbol = letters[letter_pos % len(letters)]
        print("[%s %s]" % (symbol, num))
        letter_pos += 1
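public class DiagonalProduct {
    public static void main(String[] args) {
        String letters = "ABC";
        int MAX_NUM = 6;
        int letterPos = 0;
        for (int i = 0; i < MAX_NUM; ++i) {
            // one pass over the letters per i, shifting the number by i
            for (int j = 0; j < letters.length(); ++j) {
                int num = ((i + j) % MAX_NUM) + 1;
                char symbol = letters.charAt(letterPos % letters.length());
                String output = symbol + "" + num;
                System.out.println("[" + output + "]");
                ++letterPos;
            }
        }
    }
}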
What about using something fractal/recursive? This implementation divides a rectangular range into four quadrants then yields points from each quadrant. This means that neighboring points in the sequence differ at least by quadrant.
# python3
import sys
import itertools

def interleave(*iters):
    # round-robin over the iterators, skipping the padding of exhausted ones
    for elements in itertools.zip_longest(*iters):
        for element in elements:
            if element is not None:
                yield element

def scramblerange(begin, end):
    width = end - begin
    if width == 1:
        yield begin
    else:
        first = scramblerange(begin, int(begin + width/2))
        second = scramblerange(int(begin + width/2), end)
        yield from interleave(first, second)

def scramblerectrange(top=0, left=0, bottom=1, right=1, width=None, height=None):
    if width is not None and height is not None:
        yield from scramblerectrange(bottom=height, right=width)
        # plain return: raising StopIteration inside a generator is a RuntimeError since Python 3.7
        return
    if right - left == 1:
        if bottom - top == 1:
            yield (left, top)
        else:
            for y in scramblerange(top, bottom):
                yield (left, y)
    else:
        if bottom - top == 1:
            for x in scramblerange(left, right):
                yield (x, top)
        else:
            # split into four quadrants and interleave them
            halfx = int(left + (right - left)/2)
            halfy = int(top + (bottom - top)/2)
            quadrants = [
                scramblerectrange(top=top, left=left, bottom=halfy, right=halfx),
                reversed(list(scramblerectrange(top=top, left=halfx, bottom=halfy, right=right))),
                scramblerectrange(top=halfy, left=left, bottom=bottom, right=halfx),
                reversed(list(scramblerectrange(top=halfy, left=halfx, bottom=bottom, right=right)))
            ]
            yield from interleave(*quadrants)

if __name__ == '__main__':
    letters = 'abcdefghijklmnopqrstuvwxyz'
    output = []
    indices = dict()
    for i, pt in enumerate(scramblerectrange(width=11, height=5)):
        indices[pt] = i
        x, y = pt
        output.append(letters[x] + str(y))
    table = [[indices[x,y] for x in range(11)] for y in range(5)]
    print(', '.join(output))
    print()
    pad = lambda i: ' ' * (2 - len(str(i))) + str(i)
    header = ' |' + ' '.join(map(pad, letters[:11]))
    print(header)
    print('-' * len(header))
    for y, row in enumerate(table):
        print(pad(y)+'|', ' '.join(map(pad, row)))
Outputs:
a0, i1, a2, i3, e0, h1, e2, g4, a1, i0, a3, k3, e1,
h0, d4, g3, b0, j1, b2, i4, d0, g1, d2, h4, b1, j0,
b3, k4, d1, g0, d3, f4, c0, k1, c2, i2, c1, f1, a4,
h2, k0, e4, j3, f0, b4, h3, c4, j2, e3, g2, c3, j4,
f3, k2, f2
| a b c d e f g h i j k
-----------------------------------
0| 0 16 32 20 4 43 29 13 9 25 40
1| 8 24 36 28 12 37 21 5 1 17 33
2| 2 18 34 22 6 54 49 39 35 47 53
3| 10 26 50 30 48 52 15 45 3 42 11
4| 38 44 46 14 41 31 7 23 19 51 27
If your sets X and Y have sizes n and m respectively, and Xi is the index of the element from X that's in the ith pair of your Cartesian product (and similarly for Y), then
Xi = i mod n;
Yi = (i mod n + i div n) mod m;
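A direct Python transcription of these two formulas, as a sketch (the generator name diagonal_product is hypothetical). It works for any sizes, not only multiples: for a fixed Xi, the quotient i div n runs through 0..m-1, so Yi takes every value exactly once and every pair appears exactly once.

```python
def diagonal_product(xs, ys):
    """Yield every (x, y) pair using Xi = i mod n, Yi = (i mod n + i div n) mod m,
    where n = len(xs) and m = len(ys)."""
    n, m = len(xs), len(ys)
    for i in range(n * m):
        xi = i % n
        yi = (i % n + i // n) % m
        yield xs[xi], ys[yi]


pairs = [f"{x}{y}" for x, y in diagonal_product("ABC", range(1, 7))]
print(", ".join(pairs))
```

For {A B C} and 1..6 this starts A1, B2, C3, A2, B3, C4, and consecutive pairs never share the same X element as long as n > 1. It also handles sizes such as 5 and 7 or 12 and 69.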
You could get your diagonals a little more spread out by filling out your matrix like this:
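// matrix has m rows and n columns, all zeros initially
int[][] matrix = new int[m][n];
for (int i = 0; i < m*n; i++) {
    int xi = i % n;
    int yi = i % m;
    // walk down column xi (wrapping) until a free cell is found
    while (matrix[yi][xi] != 0) {
        yi = (yi+1) % m;
    }
    matrix[yi][xi] = i+1;
}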
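The matrix-filling loop ported to Python, as a sketch (fill_matrix and order_from_matrix are hypothetical names). Reading the cells back in fill order for m = 6, n = 3 gives A1, B2, C3, A4, B5, C6, A2, B3, C4, A5, B6, C1, A3, ..., which is the sequence from the question. The inner while loop always terminates: column xi is visited exactly m times over the m*n iterations, so it always still has a free cell.

```python
def fill_matrix(m, n):
    """Fill an m x n matrix with 1..m*n using the diagonal-with-probing rule."""
    matrix = [[0] * n for _ in range(m)]
    for i in range(m * n):
        xi = i % n
        yi = i % m
        # Walk down column xi (wrapping) until a free cell is found.
        while matrix[yi][xi] != 0:
            yi = (yi + 1) % m
        matrix[yi][xi] = i + 1
    return matrix


def order_from_matrix(matrix, xs, ys):
    """Return the (x, y) pairs in the order their cells were filled."""
    cells = {value: (row, col)
             for row, line in enumerate(matrix)
             for col, value in enumerate(line)}
    return [(xs[cells[k][1]], ys[cells[k][0]]) for k in sorted(cells)]


grid = fill_matrix(6, 3)
seq = [f"{x}{y}" for x, y in order_from_matrix(grid, "ABC", range(1, 7))]
print(", ".join(seq))
```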
I have a graph (User-[Likes]->Item) with millions of nodes and billions of relationships (roughly 50 GB on disk), built on a powerful machine with 256 GB RAM and 40 cores. Currently, I'm computing allShortestPaths() between two items.
To improve Cypher query performance, I set dbms.pagecache.memory=100g and wrapper.java.additional=-Xmx32g, hoping that the whole Neo4j store could be loaded into memory. However, when I execute the shortest-path query, CPU usage is 1625% while memory usage is only 5.7%, and I didn't see any performance improvement on the Cypher query. Am I missing something in the settings? Or can I set something up to make the query run faster? I have read the Performance Tuning guide in the developer manual but didn't find a solution.
EDIT1:
The Cypher query counts the number of unique users who like both items. The full pattern would be (Brand)-[:Has]->(Item)<-[:LIKES]-(User)-[:LIKES]->(Item)<-[:HAS]-(Brand)
profile
MATCH p = allShortestPaths((p1:Brand {FID:'001'})-[*..4]-(p2:Brand {FID:'002'}))
with [r in RELS(p)|type(r)] as relationshipPath,
[n in nodes(p)|id(n)][2] as user, p1, p2
return p1.FID, p2.FID, count(distinct user);
EDIT2:
Below is a sample query plan. It now seems that I'm not using shortestPath efficiently (380,556,69 db hits). I use shortestPath to get the common user node between the start and end nodes, and then use count(distinct) to get the unique users. Is it possible to tell Cypher to eliminate paths that contain nodes that have already been visited?
Can you try to run this instead:
MATCH (p1:Brand {FID:'001'}),(p2:Brand {FID:'002'})
MATCH (u:User)
WHERE (p1)-[:Has]->()<-[:LIKES]-(u) AND
(p2)-[:Has]->()<-[:LIKES]-(u)
RETURN p1,p2,count(u);
This starts at the user and checks against both brands; the explain plan looks much better:
+----------------------+----------------+------------------------------------------+---------------------------+
| Operator | Estimated Rows | Variables | Other |
+----------------------+----------------+------------------------------------------+---------------------------+
| +ProduceResults | 0 | count(u), p1, p2 | p1, p2, count(u) |
| | +----------------+------------------------------------------+---------------------------+
| +EagerAggregation | 0 | count(u) -- p1, p2 | p1, p2 |
| | +----------------+------------------------------------------+---------------------------+
| +SemiApply | 0 | p2 -- p1, u | |
| |\ +----------------+------------------------------------------+---------------------------+
| | +Expand(Into) | 0 | anon[78] -- anon[87], anon[89], p1, u | (p1)-[:Has]->() |
| | | +----------------+------------------------------------------+---------------------------+
| | +Expand(All) | 0 | anon[87], anon[89] -- p1, u | (u)-[:LIKES]->() |
| | | +----------------+------------------------------------------+---------------------------+
| | +Argument | 1 | p1, u | |
| | +----------------+------------------------------------------+---------------------------+
| +SemiApply | 0 | p1 -- p2, u | |
| |\ +----------------+------------------------------------------+---------------------------+
| | +Expand(Into) | 0 | anon[119] -- anon[128], anon[130], p2, u | (p2)-[:Has]->() |
| | | +----------------+------------------------------------------+---------------------------+
| | +Expand(All) | 0 | anon[128], anon[130] -- p2, u | (u)-[:LIKES]->() |
| | | +----------------+------------------------------------------+---------------------------+
| | +Argument | 1 | p2, u | |
| | +----------------+------------------------------------------+---------------------------+
| +CartesianProduct | 0 | u -- p1, p2 | |
| |\ +----------------+------------------------------------------+---------------------------+
| | +CartesianProduct | 0 | p2 -- p1 | |
| | |\ +----------------+------------------------------------------+---------------------------+
| | | +Filter | 0 | p1 | p1.FID == { AUTOSTRING0} |
| | | | +----------------+------------------------------------------+---------------------------+
| | | +NodeByLabelScan | 0 | p1 | :Brand |
| | | +----------------+------------------------------------------+---------------------------+
| | +Filter | 0 | p2 | p2.FID == { AUTOSTRING1} |
| | | +----------------+------------------------------------------+---------------------------+
| | +NodeByLabelScan | 0 | p2 | :Brand |
| | +----------------+------------------------------------------+---------------------------+
| +NodeByLabelScan | 0 | u | :User |
+----------------------+----------------+------------------------------------------+---------------------------+
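Conceptually, the number being computed is the size of the intersection of two user sets: users who like some item of brand 001, and users who like some item of brand 002. The path-based query can produce one path per item/user/item combination, which is why count(distinct user) was needed; the rewritten query instead tests each user once. Below is a plain-Python illustration of that set arithmetic on made-up edge lists (has, likes and the helper names are hypothetical, not Neo4j APIs).

```python
# Hypothetical edge lists standing in for the graph:
# (brand, item) pairs for :Has and (user, item) pairs for :LIKES.
has = [("001", "i1"), ("001", "i2"), ("002", "i3")]
likes = [("u1", "i1"), ("u1", "i3"), ("u2", "i1"), ("u2", "i2"),
         ("u2", "i3"), ("u3", "i3")]


def users_liking_brand(brand, has, likes):
    """Set of users who like at least one item of the given brand."""
    items = {item for b, item in has if b == brand}
    return {user for user, item in likes if item in items}


def common_users(b1, b2, has, likes):
    """Number of distinct users who like items of both brands."""
    return len(users_liking_brand(b1, has, likes) & users_liking_brand(b2, has, likes))


print(common_users("001", "002", has, likes))
```

Here u2 likes two items of brand 001 and would appear on two paths, but the set intersection counts it once, which is the same result count(distinct user) gives.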