Four-part binary search algorithm

Binary search splits an array into two parts and searches in one of them.
But my teacher asked us to find a solution that splits the array into four parts and then searches in those parts.
Binary search:
binary_search(A, target):
    lo = 1, hi = size(A)
    while lo <= hi:
        mid = lo + (hi-lo)/2
        if A[mid] == target:
            return mid
        else if A[mid] < target:
            lo = mid+1
        else:
            hi = mid-1
But I want to split the array into 4 parts and then search.
Is there a way?

A normal binary search splits the array (container) into two pieces, usually at the midpoint:
+---+---+---+---+---+---+---+---+
| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
+---+---+---+---+---+---+---+---+
|
V
+---+---+---+---+ +---+---+---+---+
| 1 | 2 | 3 | 4 | | 5 | 6 | 7 | 8 |
+---+---+---+---+ +---+---+---+---+
Based on the midpoint value, the search key is either in the lower section (left) or the higher section (right).
If we take the same concept and split into 4 pieces, the key will be in one of the four quadrants:
+---+---+ +---+---+ +---+---+ +---+---+
| 1 | 2 | | 3 | 4 | | 5 | 6 | | 7 | 8 |
+---+---+ +---+---+ +---+---+ +---+---+
By comparing key to the highest quadrant slot, one can determine which quadrant the key lies in.
In a binary search, the midpoint is found by dividing the search range by 2.
In a 4 part search, the quadrants are found by dividing by four.
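The quarter-point idea above can be sketched in Python. This is only a minimal sketch under stated assumptions: the array is sorted in ascending order, indices are 0-based (unlike the 1-based pseudocode in the question), and the function name `four_part_search` is made up for illustration.

```python
def four_part_search(a, target):
    """Return an index of target in the sorted list a, or -1 if absent."""
    lo, hi = 0, len(a) - 1
    while lo <= hi:
        # Three probe points split [lo, hi] into (up to) four parts.
        step = (hi - lo) // 4
        probes = (lo + step, lo + 2 * step, lo + 3 * step)
        new_lo, new_hi = lo, hi
        for p in probes:
            if a[p] == target:
                return p
            if a[p] < target:
                new_lo = p + 1      # target lies to the right of this probe
            else:
                new_hi = p - 1      # target lies to the left of this probe
                break
        lo, hi = new_lo, new_hi
    return -1


print(four_part_search([1, 2, 3, 4, 5, 6, 7, 8], 6))
```

Each pass makes up to three comparisons instead of one, so the search range shrinks faster per pass but the total number of comparisons is not necessarily lower than in a plain binary search.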
Try this algorithm out using pen and paper before coding. When you have developed steps that work, then code. This is called designing then coding, and it is a popular development process.
Nobody should be spoon-feeding you code. Work it out yourself.
Edit 1: Search Trees
Arrays and trees are very different. With an array, you know where all the items are and you can use an index to access the elements. With a binary or other search tree, you need to follow the links, as you don't know where each element is.
A divide-by-4 search tree usually follows the principles of a B-Tree. Instead of single nodes, you have a page of nodes:
+---------------------------+
| Page Details |
+-----+---------------------+
| key | pointer to sub-tree |
+-----+---------------------+
| key | pointer to sub-tree |
+-----+---------------------+
| key | pointer to sub-tree |
+-----+---------------------+
| key | pointer to sub-tree |
+-----+---------------------+
The page node is an array of nodes. Most algorithms use a binary search in the array of nodes. When the key range is found, the algorithm then traverses the link to the appropriate sub-tree. The process repeats until the key is found in the Page node or on a leaf node.
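The page lookup described above could look roughly like the following Python sketch. The `Page` class and `page_search` function are hypothetical names, and the layout is an assumption: each page holds sorted keys and, unless it is a leaf, one more child than it has keys (the classic B-Tree arrangement, which differs slightly from the key/pointer pairs drawn in the diagram).

```python
from bisect import bisect_left


class Page:
    """Hypothetical page node: sorted keys plus optional child pages."""

    def __init__(self, keys, children=None):
        self.keys = keys            # sorted keys stored in this page
        self.children = children    # None for a leaf, else len(keys) + 1 pages


def page_search(page, key):
    """Return True if key is stored somewhere in the tree rooted at page."""
    while page is not None:
        # Binary search inside the page's array of keys.
        i = bisect_left(page.keys, key)
        if i < len(page.keys) and page.keys[i] == key:
            return True
        if page.children is None:
            return False            # reached a leaf without finding the key
        page = page.children[i]     # follow the link to the matching sub-tree
    return False


root = Page([10, 20, 30],
            [Page([1, 5]), Page([12, 15]), Page([22, 25]), Page([35, 40])])
print(page_search(root, 15), page_search(root, 13))
```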
What is your data structure and where lies your confusion?

Related

How to filter by pattern in a variable length Cypher query

I have a very simple graph with 5 nodes (named n1 - n5), 1 node type (:Node) and 2 relationship types (:r1, :r2). The nodes and relationships are arranged as follows (apologies for the ascii art):
(n1)-[:r1]->(n2)-[:r1]->(n3)
(n1)-[:r2]->(n4)-[:r2]->(n3)
(n1)-[:r1]->(n5)-[:r2]->(n3)
I have a query using a variable length path. I expected to be able restrict the paths returned by describing a specific pattern in the WHERE clause:
MATCH p = (n:Node {name: 'n1'})-[*..2]->()
WHERE (n)-[:r1]->()-[:r1]->()
RETURN p
The problem is that the response returns all possible paths. My question: is it possible to filter the returned paths when specifying a variable length path in a query?
If all relationships or nodes have to adhere to the same predicate, this is easy. You'll need a variable for the path, and you'll need to use all() (or none()) in your WHERE clause to apply the predicate for all relationships or nodes in your path:
MATCH p = (n:Node {name: 'n1'})-[*..2]->()
WHERE all(rel in relationships(p) WHERE type(rel) = 'r1')
RETURN p
That said, when all you want is for all relationships in the var-length path to be of the same type (or types, if you want multiple), that's best done in the pattern itself:
MATCH p = (n:Node {name: 'n1'})-[:r1*..2]->()
RETURN p
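To make the effect of the predicate concrete, here is a small Python sketch (not Cypher, and not how Neo4j evaluates the query) that enumerates the same paths over the five-node example graph from the question and then keeps only those whose relationships are all of type r1. The helper name `paths_from` is made up for illustration.

```python
# Relationships of the example graph: (start node, type, end node)
EDGES = [
    ("n1", "r1", "n2"), ("n2", "r1", "n3"),
    ("n1", "r2", "n4"), ("n4", "r2", "n3"),
    ("n1", "r1", "n5"), ("n5", "r2", "n3"),
]


def paths_from(start, max_len):
    """All directed paths of 1..max_len relationships starting at start."""
    found = []

    def expand(node, path):
        for edge in EDGES:
            # A relationship is used at most once per path.
            if edge[0] == node and edge not in path:
                new_path = path + [edge]
                found.append(new_path)
                if len(new_path) < max_len:
                    expand(edge[2], new_path)

    expand(start, [])
    return found


all_paths = paths_from("n1", 2)
# Same idea as all(rel in relationships(p) WHERE type(rel) = 'r1')
only_r1 = [p for p in all_paths if all(rel[1] == "r1" for rel in p)]
for p in only_r1:
    print(p)
```

On this graph the unfiltered expansion yields six paths, while the filter keeps three: n1 to n2, n1 to n2 to n3, and n1 to n5.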
For more complicated cases, such as multiple relationship types (where the order of those types matters in the path), or repeating sequences of types or node labels in the path, then alternate approaches are needed. APOC path expanders may help.
EDIT
You mentioned in the comments that your case deals with sequences of relationships of varying lengths. While the APOC path expanders may help, there are a few restrictions:
The path expanders currently operate on node labels and relationship types, but not properties. If your expansions rely on predicates on properties, the path expanders won't be able to handle that for you during expansion; that would have to be done by filtering the path expander results afterwards.
There are limits to the relationship sequence support for path expanders. We can define sequences of any length, and can accept multiple relationship types at each step in the sequence, but we don't currently support diverging sequences ((r1 then r2 then r3) or (r2 then r5 then r6)).
If we wanted to do a 3-step sequence of r1 (incoming), r2 (outgoing), then r3 or r4 (with r3 in either direction and r4 outgoing), repeating the sequence up to 3 times we could do so like this:
MATCH (n:Node {name: 'n1'})
CALL apoc.path.expandConfig(n, {relationshipFilter:'<r1, r2>, r3 | r4>', minLevel:1, maxLevel:9}) YIELD path
RETURN path
Note that we can provide differing directions per relationship in the filter, or leave off the arrow entirely if we don't care about the direction.
Label filtering is more complex, but I didn't see any need for that present in the examples so far.
Your query returns all paths because your WHERE clause (Filter operator) is applied before the VarLengthExpand operator:
+-----------------------+----------------+------+---------+-----------------+-------------------+----------------------+----------------------------+------------------------------------------------------------------------------------------------------------+
| Operator | Estimated Rows | Rows | DB Hits | Page Cache Hits | Page Cache Misses | Page Cache Hit Ratio | Variables | Other |
+-----------------------+----------------+------+---------+-----------------+-------------------+----------------------+----------------------------+------------------------------------------------------------------------------------------------------------+
| +ProduceResults | 0 | 4 | 0 | 0 | 0 | 0.0000 | anon[32], anon[41], n, p | |
| | +----------------+------+---------+-----------------+-------------------+----------------------+----------------------------+------------------------------------------------------------------------------------------------------------+
| +Projection | 0 | 4 | 0 | 0 | 0 | 0.0000 | p -- anon[32], anon[41], n | {p : PathExpression(NodePathStep(Variable(n),MultiRelationshipPathStep(Variable(),OUTGOING,NilPathStep)))} |
| | +----------------+------+---------+-----------------+-------------------+----------------------+----------------------------+------------------------------------------------------------------------------------------------------------+
| +VarLengthExpand(All) | 0 | 4 | 7 | 0 | 0 | 0.0000 | anon[32], anon[41] -- n | (n)-[:*..2]->() |
| | +----------------+------+---------+-----------------+-------------------+----------------------+----------------------------+------------------------------------------------------------------------------------------------------------+
| +Filter | 0 | 1 | 6 | 0 | 0 | 0.0000 | n | n.name = { AUTOSTRING0}; GetDegree(Variable(n),Some(RelTypeName(KNOWS)),OUTGOING) > 0 |
| | +----------------+------+---------+-----------------+-------------------+----------------------+----------------------------+------------------------------------------------------------------------------------------------------------+
| +NodeByLabelScan | 4 | 4 | 5 | 0 | 0 | 0.0000 | n | :Crew |
+-----------------------+----------------+------+---------+-----------------+-------------------+----------------------+----------------------------+------------------------------------------------------------------------------------------------------------+
This should get you going:
MATCH p = (n:Node {name: 'n1'})-[*..2]->()
WITH n, relationships(p)[0] as rel0, relationships(p)[1] as rel1, p
MATCH (n)-[rel0:r1]->()-[rel1:r1]->()
RETURN p

Randomness Comparison Experiment

I have a drug analysis experiment that needs to generate a value based on a given drug database and a set of 1000 random experiments.
The original database looks like this, where the numbers in the columns represent the rank for the drug. This is a simplified version of the actual database; the actual database will have more drugs and more genes.
+-------+-------+-------+
| Genes | DrugA | DrugB |
+-------+-------+-------+
| A | 1 | 3 |
| B | 2 | 1 |
| C | 4 | 5 |
| D | 5 | 4 |
| E | 3 | 2 |
+-------+-------+-------+
A score is calculated based on user's input: A and C, using the following formula:
# Compute Function
# ['A','C'] as array input
computeFunction(array) {
# do some stuff with the array ...
}
The formula used will be same for any provided value.
For the randomness test, each experiment in the set requires the algorithm to provide randomized values of A and C, so both A and C can have any number from 1 to 5.
I have two methods of selecting values to generate the 1000 sets for the P-value calculation, and I need someone to point out whether one is better than the other, or whether there is a way to compare the two methods.
Method 1
Generate 1000 randomized databases based on the database input shown above, meaning each table should contain a different set of value pairs.
Example for 1 database from 1000 randomized database:
+-------+-------+-------+
| Genes | DrugA | DrugB |
+-------+-------+-------+
| A | 2 | 3 |
| B | 4 | 4 |
| C | 3 | 2 |
| D | 1 | 5 |
| E | 5 | 1 |
+-------+-------+-------+
Next we perform computeFunction() with new A and C value.
Method 2
Pick any random gene from original database and use it as a newly randomized gene value.
For example, we pick the values from E and B as a new value for A and C.
From original database, E is 3, B is 2.
So, now A is 3, C is 2. Next we perform computeFunction() with new A and C value.
Summary
Since both methods produce completely randomized input, it seems to me that they will produce similar 1000-value outcomes. Is there any way I could prove they are similar?
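One way to check the similarity on the small example is to enumerate both methods exhaustively rather than sample them. The Python sketch below does that under loudly stated assumptions that are not in the question: the score depends only on the DrugA ranks handed to A and C, Method 1 is a uniform random permutation of the DrugA column, and Method 2 picks two distinct genes uniformly at random. The function names are made up for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations

GENES = ["A", "B", "C", "D", "E"]
DRUG_A = {"A": 1, "B": 2, "C": 4, "D": 5, "E": 3}   # ranks from the original table


def method1_pairs():
    """Method 1: every permutation of the DrugA column, read off A and C."""
    counts = Counter()
    for perm in permutations(DRUG_A.values()):
        shuffled = dict(zip(GENES, perm))
        counts[(shuffled["A"], shuffled["C"])] += 1
    return counts


def method2_pairs():
    """Method 2: every ordered pair of distinct genes supplies A and C (assumed)."""
    counts = Counter()
    for g1, g2 in permutations(GENES, 2):
        counts[(DRUG_A[g1], DRUG_A[g2])] += 1
    return counts


def normalise(counts):
    """Turn raw counts into exact probabilities."""
    total = sum(counts.values())
    return {pair: Fraction(n, total) for pair, n in counts.items()}


print(normalise(method1_pairs()) == normalise(method2_pairs()))
```

Under these assumptions both methods give every ordered pair of distinct ranks the same probability, so they feed computeFunction() the same input distribution. If Method 2 is allowed to pick the same gene twice, or if the score also uses other columns, the two distributions can differ.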

Knapsack or similar with no values and with limits as to which items can be assigned where?

Say I have a number of weights which I need to spread out across a finite number of knapsacks so that each knapsack has as even a distribution of weights as possible. The catch is that different weights can only be put into the first n bags, where the value of n varies for each weight.
For example, a weight might only be able to be inserted into bags up to bag 4, i.e. bags 1 through 4. Another might have a limit up to 5. The goal, as previously stated, is to attempt an even spread across all bags, with the number of bags set by the weight with the highest limit.
Is there a name for this problem, and what algorithms exist?
EDIT: To help visualise, say I have 4 weights:
+----------+--------+-----------+
| Weight # | Weight | Bag Limit |
+----------+--------+-----------+
| 1 | 2 | 2 |
| 2 | 3 | 3 |
| 3 | 1 | 1 |
| 4 | 2 | 4 |
+----------+--------+-----------+
A solution to the problem might look like this
| 1 |  |   |  |   |  |   |
| 2 |  | 3 |  | 2 |  |   |
|___|  |___|  |___|  |___|
Bag 1  Bag 2  Bag 3  Bag 4
Weights 3 and 1 were placed into Bag 1
Weight 2 was placed into Bag 2
Weight 4 was placed into Bag 3
Here, the load is spread as evenly as possible, and the problem is solved (although perhaps not optimally, as I did this in my head)
Hopefully this might clear up what I'm trying to solve.
I'd describe this problem as bin packing with side constraints -- a lot of NP-hard problems don't have good names because there are so many of them. I would expect the LP-based methods for packing variable-sized bins that decompose the problem into (1) a packing problem over whole bins (2) a knapsack problem within a bin to generate candidate bins to carry over reasonably well.
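As a baseline to compare against, here is a simple greedy heuristic in Python. It is not the LP-based decomposition mentioned above and makes no claim of optimality: it places the heaviest weights first, each into the least-loaded bag it is allowed to use, breaking ties in favour of the highest-numbered bag so that low-numbered bags stay free for tightly constrained weights. The function name `greedy_spread` is made up for illustration.

```python
def greedy_spread(items):
    """items: list of (weight, bag_limit). Returns (placement, loads).

    placement[i] is the 1-based bag chosen for item i,
    loads[b] is the total weight in bag b + 1.
    """
    n_bags = max(limit for _, limit in items)
    loads = [0] * n_bags
    placement = [None] * len(items)
    # Heaviest first; among equal weights, the more constrained item first.
    order = sorted(range(len(items)), key=lambda i: (-items[i][0], items[i][1]))
    for i in order:
        weight, limit = items[i]
        # Least-loaded allowed bag; ties go to the highest-numbered bag.
        bag = min(range(limit), key=lambda b: (loads[b], -b))
        loads[bag] += weight
        placement[i] = bag + 1
    return placement, loads


# The four weights from the question: (weight, bag limit)
placement, loads = greedy_spread([(2, 2), (3, 3), (1, 1), (2, 4)])
print(placement, loads)   # [2, 3, 1, 4] [1, 2, 3, 2]
```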

Programming Puzzle: How to paint a board?

There is an N x M board we should paint. We can paint either an entire row or an entire column at once. Given an N x M matrix of colours of all board cells, find the minimal number of painting operations to paint the board.
For example: we should paint a 3 x 3 board as follows (R - red, B - blue, G - green):
B, B, B
B, R, R
B, G, G
The minimal number of painting operations is 4:
Paint row 0 with Blue
Paint row 1 with Red
Paint row 2 with Green
Paint column 0 with Blue
How would you solve it?
This looks like a fun problem. Let me take a shot at it with some pseudocode.
Function MinPaints(Matrix) Returns Integer
    If the matrix is empty return 0
    Find all rows and columns which have a single color
    If there are none, return infinity, since there is no solution
    Set the current minimum to infinity
    For each row or column with single color:
        Remove the row/column from the matrix
        Call MinPaints with the new matrix
        If the result is less than the current minimum, set the current minimum to the result
    End loop
    Return the current minimum + 1
End Function
I think that will solve your problem, but I didn't try any optimization or anything. This may not be fast enough though, I don't know. I doubt this problem is solvable in sub-exponential time.
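The pseudocode translates fairly directly into Python. The sketch below is one possible rendering, not a tuned solution: the board is assumed to be a list of equal-length strings, the remaining rows and columns are tracked as frozensets of indices instead of physically removing them, and memoisation via `lru_cache` is an addition that is not part of the pseudocode. The name `min_paints` is made up for illustration.

```python
from functools import lru_cache


def min_paints(board):
    """Minimal number of row/column paints, or infinity if impossible."""
    n_rows = len(board)
    n_cols = len(board[0]) if board else 0

    @lru_cache(maxsize=None)
    def solve(rows, cols):
        if not rows or not cols:
            return 0                          # empty matrix
        best = float("inf")
        for r in rows:                        # rows with a single colour
            if len({board[r][c] for c in cols}) == 1:
                best = min(best, 1 + solve(rows - {r}, cols))
        for c in cols:                        # columns with a single colour
            if len({board[r][c] for r in rows}) == 1:
                best = min(best, 1 + solve(rows, cols - {c}))
        return best                           # stays infinite if nothing is removable

    return solve(frozenset(range(n_rows)), frozenset(range(n_cols)))


print(min_paints(["BBB", "BRR", "BGG"]))   # 4
```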
Here is how this algorithm would solve the example:
BBB
BRR
BGG
 |
 +---BRR
 |   BGG
 |    |
 |    +---RR
 |    |   GG
 |    |   |
 |    |   +---GG
 |    |   |   |
 |    |   |   +---[]
 |    |   |   |   |
 |    |   |   |   Solvable in 0
 |    |   |   |
 |    |   |   Solvable in 1
 |    |   |
 |    |   +---RR
 |    |   |   |
 |    |   |   +---[]
 |    |   |   |   |
 |    |   |   |   Solvable in 0
 |    |   |   |
 |    |   |   Solvable in 1
 |    |   |
 |    |   Solvable in 2
 |    |
 |    Solvable in 3
 |                       BB
 +---Another branch with RR ...
 |                       GG
 Solvable in 4
For starters, you can try an informed exhaustive search.
Let your states graph be: G=(V,E) where V = {all possible boards} and E = {(u,v) | you can move from board u to v within a single operation}.
Note that you do not need to generate the graph in advance. You can generate it on the fly, using a successors(board) function that returns all the successors of the given board.
You will also need h:V->R, an admissible heuristic function that evaluates the board (1).
Now, you can run A*, or bi-directional BFS search [or combination of both], your source will be a white board, and your target is the requested board. Because we use admissible heuristic function - A* is both complete (always finds a solution if one exists) and optimal (finds the shortest solution), it will find the best solution. [same goes for bi-directional BFS].
Drawbacks:
Though the algorithm is informed, it will have exponential behavior. But if it is an interview question, I believe a non-efficient solution is better than no solution.
Though complete and optimal, if there is no solution the algorithm may be stuck in an infinite loop, or a very long loop at the very least, until it realizes it has exhausted all possibilities.
(1) An example of an admissible heuristic is h(board) = #(miscolored_squares)/max{m,n}
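The heuristic in the footnote can be written as a short Python function. This is only a sketch of that one formula: boards are assumed to be lists of rows, with `None` standing for an unpainted cell, and the name `heuristic` is made up for illustration. It is admissible because a single operation paints at most max{m,n} cells, so it can fix at most that many miscolored squares.

```python
def heuristic(board, target):
    """h(board) = number of miscolored squares / max(rows, cols)."""
    rows, cols = len(target), len(target[0])
    wrong = sum(
        board[r][c] != target[r][c]
        for r in range(rows)
        for c in range(cols)
    )
    return wrong / max(rows, cols)


target = [list("BBB"), list("BRR"), list("BGG")]
blank = [[None] * 3 for _ in range(3)]
print(heuristic(blank, target))   # 3.0, which does not exceed the true cost of 4
```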

The "Waiting lists problem"

A number of students want to get into sections for a class. Some are already signed up for one section but want to change section, so they all get on the wait lists. A student can get into a new section only if someone drops from that section. No students are willing to drop a section they are already in unless they can be sure to get into a section they are waiting for. The wait list for each section is first come, first served.
Get as many students into their desired sections as you can.
The stated problem can quickly devolve into a gridlock scenario. My question: are there known solutions to this problem?
One trivial solution would be to take each section in turn, force the first student from the waiting list into the section, and then check whether someone ends up dropping out when things are resolved (O(n) or more in the number of sections). This would work for some cases, but I think there might be better options involving forcing more than one student into a section (O(n) or more in the student count) and/or operating on more than one section at a time (O(bad) :-)
Well, this just comes down to finding cycles in the directed graph of classes, right? Each link is a student that wants to go from one node to another, and any time you find a cycle, you delete it, because those students can resolve their needs with each other. You're finished when you're out of cycles.
OK, let's try. We have 8 students (1..8) and 4 sections. Each student is in a section and each section has room for 2 students. Most students want to switch, but not all.
In the table below, we see the students, their current section, their required section and their position in the queue (if any).
+------+-----+-----+-----+
| stud | now | req | que |
+------+-----+-----+-----+
| 1 | A | D | 2 |
| 2 | A | D | 1 |
| 3 | B | B | - |
| 4 | B | A | 2 |
| 5 | C | A | 1 |
| 6 | C | C | - |
| 7 | D | C | 1 |
| 8 | D | B | 1 |
+------+-----+-----+-----+
We can present this information in a graph:
+-----+           +-----+           +-----+
|  C  |---[5]--->1|  A  |2<---[4]---|  B  |
+-----+           +-----+           +-----+
   1               |   |               1
   ^               |   |               ^
   |              [1] [2]              |
   |               |   |               |
  [7]              |   |              [8]
   |               V   V               |
   |               2   1               |
   |              +-----+              |
   \--------------|  D  |--------------/
                  +-----+
We try to find a section with a vacancy, but we find none. Because all sections are full, we need a dirty trick. Let's take a random section with a non-empty queue, in this case section A, and assume it has an extra position. This means student 5 can enter section A, leaving a vacancy in section C, which is taken by student 7. This leaves a vacancy in section D, which is taken by student 2. We now have a vacancy in section A. But we only assumed that section A had an extra position, so we can drop this assumption, and we have gained a simpler graph.
If the path never returns to section A, undo the moves and mark A as an invalid starting point. Retry with another section.
If there are no valid sections left, we are finished.
Right now we have the following situation:
+-----+           +-----+           +-----+
|  C  |           |  A  |1<---[4]---|  B  |
+-----+           +-----+           +-----+
                   |                   1
                   |                   ^
                  [1]                  |
                   |                   |
                   |                  [8]
                   V                   |
                   1                   |
                  +-----+              |
                  |  D  |--------------/
                  +-----+
We repeat the trick with another random section, and this solves the graph.
If you start with several students currently not assigned, you add an extra dummy section as their starting point. Of course, this means that there must be vacancies in some sections or the problem is not solvable.
Note that due to the order in the queue, it can be possible that there is no solution.
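The walk described in this answer can be sketched in Python as follows. It is a minimal sketch under stated assumptions: every student is already in a section and waits for at most one other section, only the student at the head of a queue may move in, and the dummy section for unassigned students is left out. The function names `find_cycle` and `resolve_waiting_lists` are made up for illustration.

```python
def find_cycle(start, current, queues):
    """Pretend start has a spare seat and follow the vacancies.

    Returns a list of (student, new_section) moves if the chain of
    vacancies comes back to start, else None.
    """
    section, seen, chain = start, set(), []
    while section not in seen:
        seen.add(section)
        queue = queues.get(section, [])
        if not queue:
            return None                     # nobody is waiting: chain breaks
        student = queue[0]                  # first come, first served
        chain.append((student, section))
        section = current[student]          # the seat this student frees up
        if section == start:
            return chain
    return None                             # looped without reaching start


def resolve_waiting_lists(current, queues):
    """Apply cycles until none is left; returns the final assignment."""
    current = dict(current)
    queues = {s: list(q) for s, q in queues.items()}
    progress = True
    while progress:
        progress = False
        for start in queues:
            cycle = find_cycle(start, current, queues)
            if cycle:
                for student, section in cycle:
                    queues[section].remove(student)
                    current[student] = section
                progress = True
                break
    return current


# Data from the table above: current section and per-section queues.
current = {1: "A", 2: "A", 3: "B", 4: "B", 5: "C", 6: "C", 7: "D", 8: "D"}
queues = {"A": [5, 4], "B": [8], "C": [7], "D": [2, 1]}
print(resolve_waiting_lists(current, queues))
```

On the example data the first cycle moves students 5, 7 and 2, and the second moves students 4, 8 and 1, after which every student is in the required section.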
This is actually a graph problem. You can think of each of these waiting list dependencies as edges on a directed graph. If this graph has a cycle, then you have one of the situations you described. Once you have identified a cycle, you can choose any point to "break" the cycle by "over filling" one of the classes, and you will know that things will settle correctly because there was a cycle in the graph.

Resources