Algorithm for implementing a survey-like program - algorithm

I wish to create a program in Java which will ask the user a number of questions and report some results. It is pretty much like a survey. In order to explain the problem better consider the following example:
Let’s say that there are currently 4 questions available eg Qa, Qb, Qc and Qd. Each question has a number of possible options:
=> Question A has 4 possible options a1, a2, a3 and a4.
=> Question B has 3 possible options b1, b2 and b3
=> Question C has 5 possible options c1, c2, c3, c4 and c5
=> Question D has 2 possible options d1 and d2
Moreover there are some results available which will be reported based on the user’s answers in the above questions. Let’s assume that there are 5 such results called R1, R2, R3, R4 and R5. Each result has a number of characteristics. These characteristics are really answers to the above questions. More precisely:
=> The characteristics of R1 are the set {Qa = a4, Qb = b2, Qc = c2, Qd = d1}
This says that R1 is related to Qa via the a4 option, to Qb via the b2 option and so on
=> R2: {Qa = a3, Qb = b3, Qc = c3, Qd = d2}
=> R3: {Qa = a4, Qb = b1, Qc = c1, Qd = d2}
=> R4: {Qa = a2, Qb = b2, Qc = c5, Qd = d1}
=> R5: {Qa = a1, Qb = b3, Qc = c4, Qd = d2}
Let’s say that a user U provides the following answers to the questions
{Qa = a4, Qb = b1, Qc = c1, Qd = d1}
The purpose of the program is to report the results which are closest to the user's answers, along with a percentage of how close each one is. For instance, since no result matches the user's answers 100%, the program should report the results which match as many answers as possible (above a certain threshold, eg 50%). In this specific case the program should report the following results:
=> R3 with 75% (since there are 3 matches on the 4 questions)
=> R1 with 50% (since there are 2 matches on the 4 questions)
Notice that R4 has one match (so 25%) whereas R2 and R5 have no matches at all (so 0%).
The main issue on implementing the above program is that there are a lot of questions (approximately 30) with a number of answers each (3-4 answers each). I am not aware of an efficient algorithm which can retrieve the results which are closer to the user answers. Notice that the way that these results are stored is not important at all. You can assume that the results are stored in a relational database and that SQL query is used to retrieve them.
The only solution I can think of is to perform an exhaustive search, but this is not efficient at all. In other words I am thinking of doing the following:
=> First try to retrieve results which match exactly the user answers:
{Qa = a4, Qb = b1, Qc = c1, Qd = d1}
=> If no results exist then change the option of a question (eg Qa) and try again. For example try:
{Qa = a1, Qb = b1, Qc = c1, Qd = d1}
=> If there is still nothing then try the remaining options for Qa, eg a2, a3
=> If there is still nothing then give Qa the initial user answer (that is a4) and move to Qb to do something similar. For example try something like: {Qa = a4, Qb = b2, Qc = c1, Qd = d1}
=> If after trying all the possible options for all questions one by one there are still no results, then try changing the options of COMBINATIONS of questions. For example try changing the options of two questions at the same time (eg Qa and Qb): {Qa = a1, Qb = b2, Qc = c1, Qd = d1}
=> Then try combinations of three questions and so on...
Clearly the above algorithm would be extremely slow on a large number of questions. Is there any known algorithm or heuristic which is more efficient than the above algorithm?
Thanks in advance

"Only" 30 Questions?
Then the following "stupid" algorithm will probably be faster than any highly "intelligent" and complicated algorithm.
iterate over characteristics
    score = 0
    iterate over questions
        if question's answer is right in current characteristic
            score++
Then add a variable which keeps track of the maximum value and matching characteristic and you are set.
Runtime is size of characteristics * size of questions, whereas the algorithm you are describing can have exponential runtime, and on top of that is much more complicated both to program and to execute (due to effects such as branch misprediction).
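The loop above can be sketched in Python using the example data from the question. The names `results`, `user` and `rank` and the default threshold are illustrative assumptions, and in a real program the results would come from wherever they are stored.

```python
# Results and their characteristics, copied from the question's example.
results = {
    "R1": {"Qa": "a4", "Qb": "b2", "Qc": "c2", "Qd": "d1"},
    "R2": {"Qa": "a3", "Qb": "b3", "Qc": "c3", "Qd": "d2"},
    "R3": {"Qa": "a4", "Qb": "b1", "Qc": "c1", "Qd": "d2"},
    "R4": {"Qa": "a2", "Qb": "b2", "Qc": "c5", "Qd": "d1"},
    "R5": {"Qa": "a1", "Qb": "b3", "Qc": "c4", "Qd": "d2"},
}
user = {"Qa": "a4", "Qb": "b1", "Qc": "c1", "Qd": "d1"}


def rank(results, answers, threshold=0.5):
    """Score every result against the answers; keep those at or above threshold."""
    scored = []
    for name, traits in results.items():
        # One pass over the questions: count answers that agree with this result.
        matches = sum(1 for q, a in answers.items() if traits.get(q) == a)
        fraction = matches / len(answers)
        if fraction >= threshold:
            scored.append((name, fraction))
    # Best match first.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored


print(rank(results, user))  # [('R3', 0.75), ('R1', 0.5)]
```

With roughly 30 questions this is one comparison per question per result, so the work grows linearly with the number of results.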

Related

SQL UNION Optimization

I have 4 tables named A1, A2, B1, B2.
To fulfill a requirement, I have two ways to write SQL queries. The first one is:
(A1 UNION ALL A2) A JOIN (B1 UNION ALL B2) B ON A.id = B.a_id WHERE ...
And the second one is:
(A1 JOIN B1 on A1.id = B1.a_id WHERE ...) UNION ALL (A2 JOIN B2 on A2.id = B2.a_id WHERE ... )
I tried both approaches and realized they both give the same execution time and query plans in some specific cases. But I'm unsure whether they will always give the same performance or not.
So my question is when the first/second one is better in terms of performance?
In terms of coding, I prefer the first one because I can create two views on (A1 UNION ALL A2) as well as (B1 UNION ALL B2) and treat them like two tables.
The second one is better:
(A1 JOIN B1 on A1.id = B1.a_id WHERE ...) UNION ALL (A2 JOIN B2 on A2.id = B2.a_id WHERE ... )
It gives more information to Oracle's cost-based optimizer (CBO) about how your tables are related to each other. The CBO can calculate the costs of potential plans more precisely. It's all about cardinality, column statistics, etc.
Purely functionally, and without knowing what's in the tables, the first seems better - if data matches in A1 and B2, your 2nd query won't join it.
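The functional difference can be seen on a tiny data set. The sketch below uses Python's built-in sqlite3 module rather than Oracle, and the single-column table layouts and sample rows are assumptions made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical minimal schemas: only the join columns are modelled.
for name in ("A1", "A2"):
    cur.execute(f"CREATE TABLE {name} (id INTEGER)")
for name in ("B1", "B2"):
    cur.execute(f"CREATE TABLE {name} (a_id INTEGER)")

cur.execute("INSERT INTO A1 VALUES (1)")
cur.execute("INSERT INTO B1 VALUES (1)")  # matches A1 within the same pair
cur.execute("INSERT INTO B2 VALUES (1)")  # matches A1 across pairs
cur.execute("INSERT INTO A2 VALUES (2)")
cur.execute("INSERT INTO B2 VALUES (2)")  # matches A2 within the same pair

# First form: union each side, then join.
first = cur.execute("""
    SELECT A.id, B.a_id
    FROM (SELECT id FROM A1 UNION ALL SELECT id FROM A2) AS A
    JOIN (SELECT a_id FROM B1 UNION ALL SELECT a_id FROM B2) AS B
      ON A.id = B.a_id
""").fetchall()

# Second form: join each pair, then union.
second = cur.execute("""
    SELECT A1.id, B1.a_id FROM A1 JOIN B1 ON A1.id = B1.a_id
    UNION ALL
    SELECT A2.id, B2.a_id FROM A2 JOIN B2 ON A2.id = B2.a_id
""").fetchall()

print(len(first), len(second))  # 3 2
```

The first form also returns the A1/B2 match, so the two queries are interchangeable only when such cross-pair matches cannot occur.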

How to access rule data in PROLOG

I have to determine whether two rectangles overlap or not. I can do this, but I am struggling to figure out how to grab my given data and compare the values to each other to determine which are larger.
% This is what would be happening:
% separate(rectangle(0,10,10,0), rectangle(4,6,6,4))
separate(R1,R2) :-
    % I have to figure out how to take the values from R1 and R2 and compare
    % them to one another.
    .
It is called "pattern matching".
separated(R1, R2) :-
    R1 = rectangle(A1, B1, C1, D1),
    R2 = rectangle(A2, B2, C2, D2),
    /* now just use your As and Bs */
and in many cases it is better to write straight away:
separated(rectangle(A1, B1, C1, D1), rectangle(A2, B2, C2, D2)) :-
    /* now just use your As and Bs */

Data structure traversal

Let's say I have package A version 1 and package A version 2; I'll call them A1 and A2 respectively.
If I have a pool of packages: A1, A2, B1, B2, C1, C2, D1, D2
A1 depends on B1, will represent as (A1, (B1)).
Plus A1 depends on any version of package C "C1 or C2 satisfy A1", will represent as (A1, (C1, C2))
combining A1 deps together, then A1 data-structure becomes: (A1, (B1), (C1, C2))
Also B1 depends on D1: (B1, (D1))
A1 structure becomes: (A1, ((B1, (D1))), (C1, C2))
similarly A2 structure is (A2, ((B2, (D2))), (C1, C2))
My question is: how can I select the best candidate for package A, selecting based on a condition (for example, the condition is that the package does not conflict with currently installed packages)?
by combining A1 and A2: ((A1, ((B1, (D1))), (C1, C2)), (A2, ((B2, (D2))), (C1, C2)))
How can I traverse this data structure
So start with A1; if it doesn't conflict, check B1; if that doesn't conflict, check D1; if that doesn't conflict, check (C1, C2) and take only one of C1 or C2.
With this I end up selecting (A1, B1, D1, C1).
If A1 or any of its deps does not meet the condition (for example, if B1 conflicts with installed packages), then drop A1 entirely and move on to check A2, ending up with (A2, B2, D2, C1).
What kind of traversal would that be?
I have been reading about in-order, pre-order, post-order traversal, and wondering if I need to do something similar here.
Assuming you are asking traversal on a more generic problem rather than working on this instance, I don't think there exists such a traversal.
Note that in-order is only applicable to BINARY trees. Any other kind of tree does not have in-order traversal. If your generic problem has B1, B2, B3, then apparently there wouldn't be a binary tree representation.
One property of traversal is that the tree contains all the information in itself. When you traverse a tree you never worry about "external information". In your case, your tree is not complete in information - you need to depend on external information to see if there is a conflict, e.g. B1 is installed - this information is never in the tree.
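The selection described in the question can be sketched as a depth-first search with backtracking, where the conflict information is passed in from outside the structure. This is a minimal sketch: the `deps` layout (each package maps to a list of requirements, each requirement being a list of acceptable alternatives) and the names `resolve` and `pick` are assumptions for illustration, and it does not handle dependency cycles or conflicts between the chosen packages themselves.

```python
# Each package maps to its requirements; a requirement lists acceptable alternatives.
deps = {
    "A1": [["B1"], ["C1", "C2"]],
    "A2": [["B2"], ["C1", "C2"]],
    "B1": [["D1"]],
    "B2": [["D2"]],
}


def resolve(pkg, deps, conflicts):
    """Return pkg plus chosen dependencies (depth-first), or None if impossible."""
    if pkg in conflicts:  # external information, not stored in the structure
        return None
    chosen = [pkg]
    for alternatives in deps.get(pkg, []):
        for alt in alternatives:
            sub = resolve(alt, deps, conflicts)
            if sub is not None:
                chosen.extend(sub)
                break  # take only one alternative per requirement
        else:
            return None  # no alternative satisfied this requirement
    return chosen


def pick(candidates, deps, conflicts):
    """Try candidates in order; drop a candidate entirely if any dependency fails."""
    for candidate in candidates:
        result = resolve(candidate, deps, conflicts)
        if result is not None:
            return result
    return None


print(pick(["A1", "A2"], deps, conflicts=set()))   # ['A1', 'B1', 'D1', 'C1']
print(pick(["A1", "A2"], deps, conflicts={"B1"}))  # ['A2', 'B2', 'D2', 'C1']
```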
You can use adjacency list to represent the data:
Suppose the packages are A1, A2, B1, B2, C1, C2.
And A1 depends on B1 and C2, A2 depends on B1 and C1 and C2.
The above data can be represented as
[A1] -> [B1, C2]
[A2] -> [B1, C1, C2]
Use Topological Sorting to get the order of dependencies
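As a small illustration, the adjacency list above can be fed to `graphlib.TopologicalSorter` from the Python standard library (3.9 or later), which expects each node mapped to the nodes it depends on. The variable names are assumptions, and this only orders the packages; it does not choose between alternatives.

```python
from graphlib import TopologicalSorter

# Adjacency list from the answer: package -> packages it depends on.
depends_on = {
    "A1": ["B1", "C2"],
    "A2": ["B1", "C1", "C2"],
}

# static_order() yields every node after all of its dependencies.
order = list(TopologicalSorter(depends_on).static_order())
print(order)
```

Several valid orders can exist, so the only property to rely on is that each dependency appears before the packages that need it.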

Show that cross product of a x b is perpendicular to b [closed]

How do I know that the cross product A x B is perpendicular to B?
I'm a little confused because there are 3 vectors instead of 2.
A = (0, -2, 5)
B = (2, 2, -5)
C= ( 7, -4, -5)
In the R2 plane, (a x b) * b = 0 proves that a x b is perpendicular to b, but how do I show that in R3?
So, after some research I finally figured out how to prove that the vectors are perpendicular to each other in R3.
A= (a1, a2, a3)
B= (b1, b2, b3)
C= (c1, c2, c3)
(AB x AC )* AB = 0
(AB x AC )* AC = 0
I don't think you understand what the cross product does. It gives a vector orthogonal to the two vectors.
The cross product a × b is defined as a vector c that is perpendicular
(orthogonal) to both a and b, with a direction given by the right-hand
rule and a magnitude equal to the area of the parallelogram that the
vectors span.
You can simply show this by using the definition of orthogonality, which is that their dot product is zero.
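For the concrete vectors A and B from the question, the dot-product check can be done numerically. The helper names `cross` and `dot` are illustrative, and the coordinates are assumed to be in the standard right-handed basis.

```python
def cross(u, v):
    """Cross product of two 3-component vectors."""
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )


def dot(u, v):
    """Dot product of two 3-component vectors."""
    return sum(a * b for a, b in zip(u, v))


A = (0, -2, 5)
B = (2, 2, -5)

n = cross(A, B)
print(n)          # (0, 10, 4)
print(dot(n, A))  # 0
print(dot(n, B))  # 0
```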
Questions like this come down to precisely what you take to be your definitions.
For instance, one way to define the cross-product A x B is this:
1. By R^3 we mean three-dimensional real space with a fixed orientation.
2. Observe that two linearly independent vectors A and B in R^3 span a plane, so every vector perpendicular to them lies on the (unique) line through the origin perpendicular to this plane.
3. Observe that for any positive magnitude, there are precisely two vectors along this line with that magnitude.
4. Observe that if we consider the ordered basis {A, B, C} of R^3, where C is one of the two vectors from the previous step, then one choice matches the orientation of R^3 and the other does not.
5. Define A x B as the vector C from the previous step for which {A, B, C} matches the orientation of R^3.
For instance, this is how the cross product is defined in the Wikipedia article:
"The cross product a × b is defined as a vector c that is perpendicular (orthogonal) to both a and b, with a direction given by the right-hand rule and a magnitude equal to the area of the parallelogram that the vectors span."
If this is your definition, then there is literally nothing to prove, because the definition already has the word "perpendicular" in it.
Another definition might go like this:
1. By R^3 we mean three-dimensional real space with a fixed orientation.
2. For an ordered orthonormal basis { e1, e2, e3 } of R^3 with the same orientation as R^3, we can write any two vectors A and B as A = a1 e1 + a2 e2 + a3 e3 and B = b1 e1 + b2 e2 + b3 e3.
3. Observe that, regardless of the choice of { e1, e2, e3 } we make in step 2, the vector C := (a2 b3 - b2 a3) e1 - (a1 b3 - b1 a3) e2 + (a1 b2 - b1 a2) e3 is always the same.
4. Take the vector C from the previous step as the definition of A x B.
This isn't a great definition, because step 3 is both a lot of work and complete black magic, but it's one you'll commonly see. If this is your definition, the best way to prove that A x B is perpendicular to A and B would be to show that the other definition gives you the same vector as this one, and then the perpendicularity comes for free.
A more direct way would be to show that vectors with a dot product of zero are perpendicular, and then to calculate the dot product by doing a bunch of algebra. This is, again, a fairly popular way to do it, but it's essentially worthless because it doesn't offer any insight into what's going on.
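For reference, the "bunch of algebra" is short. The following is a sketch assuming A = (a1, a2, a3) and B = (b1, b2, b3) are written in a right-handed orthonormal basis and that the cross product is given by the usual component formula:

```latex
\begin{aligned}
A \times B &= (a_2 b_3 - a_3 b_2,\; a_3 b_1 - a_1 b_3,\; a_1 b_2 - a_2 b_1) \\
(A \times B) \cdot B &= (a_2 b_3 - a_3 b_2)\, b_1 + (a_3 b_1 - a_1 b_3)\, b_2 + (a_1 b_2 - a_2 b_1)\, b_3 \\
&= a_2 b_1 b_3 - a_3 b_1 b_2 + a_3 b_1 b_2 - a_1 b_2 b_3 + a_1 b_2 b_3 - a_2 b_1 b_3 \\
&= 0
\end{aligned}
```

The six terms cancel in pairs, and the same computation with A in place of B gives (A x B) · A = 0.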

Algorithms to create a tabular representation of a DAG?

Given a DAG, in which each node belongs to a category, how can this graph be transformed into a table with a column for each category? The transformation doesn't have to be reversible, but should preserve useful information about the structure of the graph; and should be a 'natural' transformation, in the sense that a person looking at the graph and the table should not be surprised by any of the rows. It should also be compact, i.e. have few rows.
For example given a graph of nodes a1,b1,b2,c1 with edges a1->b1, a1->b2, b1->c1, b2->c1 (i.e. a diamond-shaped graph) I would expect to see the following table:
a b c
--------
a1 b1 c1
a1 b2 c1
I've thought about this problem quite a bit, but I'm having trouble coming up with an algorithm that gives intuitive results on certain graphs. Consider the graph a1,b1,c1 with edges a1->c1, b1->c1. I'd like the algorithm to produce this table:
a b c
--------
a1 b1 c1
But maybe it should produce this instead:
a  b  c
--------
a1    c1
a1 b1
I'm looking for creative ideas and insights into the problem. Feel free to vary to simplify or constrain the problem if you think it will help.
Brainstorm away!
Edit:
The transformation should always produce the same set of rows, although the order of rows does not matter.
The table should behave nicely when sorting and filtering using, e.g., Excel. This means that multiple nodes cannot be packed into a single cell of the table - only one node per cell.
What you need is a variation of topological sorting. This is an algorithm that "sorts" graph vertexes as if an a---->b edge meant a > b. Since the graph is a DAG, there are no cycles in it and this > relation is transitive, so at least one sorting order exists.
For your diamond-shaped graph two topological orders exist:
a1 b1 b2 c1
a1 b2 b1 c1
b1 and b2 items are not connected, even indirectly, therefore, they may be placed in any order.
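The claim about the diamond-shaped graph is small enough to check by brute force. This sketch simply filters all permutations of the four nodes, which is only sensible for tiny graphs; the names `edges`, `nodes` and `is_topological` are illustrative.

```python
from itertools import permutations

# Diamond-shaped graph from the question.
nodes = ["a1", "b1", "b2", "c1"]
edges = [("a1", "b1"), ("a1", "b2"), ("b1", "c1"), ("b2", "c1")]


def is_topological(order):
    """True if every edge u -> v has u placed before v."""
    position = {node: i for i, node in enumerate(order)}
    return all(position[u] < position[v] for u, v in edges)


orders = [order for order in permutations(nodes) if is_topological(order)]
for order in orders:
    print(" ".join(order))
# a1 b1 b2 c1
# a1 b2 b1 c1
```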
After you sorted the graph, you know an approximation of order. My proposal is to fill the table in a straightforward way (1 vertex per line) and then "compact" the table. Perform sorting and pick the sequence you got as output. Fill the table from top to bottom, assigning a vertex to relevant column:
a  b  c
--------
a1
   b2
   b1
      c1
Now compact the table by walking from top to bottom (and then make a similar pass from bottom to top). On each iteration, you take a closer look at the "current" row (marked as =>) and at the "next" row.
If in a column the nodes in the current and next rows differ, do nothing for this column:
   from      ---->      to
   X  b  c            X  b  c
   --------           --------
=> X1 .  .            X1 .  .
   X2 .  .         => X2 .  .
If in a column X in the next row there is no vertex (table cell is empty) and in the current row there is vertex X1, then you sometimes should fill this empty cell with a vertex in the current row. But not always: you want your table to be logical, don't you? So copy the vertex if and only if there's no edge b--->X1, c--->X1, etc, for all vertexes in current row.
   from      --->       to
   X  b  c            X  b  c
   --------           --------
=> X1 b  c            X1 b  c
      b1 c1        => X1 b1 c1
(Edit:) After first (forward) and second (backward) passes, you'll have such tables:
 first         second
a  b  c       a  b  c
--------      --------
a1            a1 b2 c1
a1 b2         a1 b2 c1
a1 b1         a1 b1 c1
a1 b1 c1      a1 b1 c1
Then, just remove equal rows and you're done:
a b c
--------
a1 b2 c1
a1 b1 c1
And you should get a nice table. O(n^2).
How about compacting all nodes reachable from one node together in one cell? For example, your first DAG should look like:
a    b        c
---------------
a1   [b1,b2]
     b1       c1
     b2       c1
It sounds like a train system map with stations within zones (a,b,c).
You could be generating a table of all possible routes in one direction. In which case "a1, b1, c1" would seem to imply a1->b1 so don't format it like that if you have only a1->c1, b1->c1
You could decide to produce a table by listing the longest routes starting in zone a, using each edge only once, ending with the short leftover routes. Or allow edges to be reused only if they connect unused edges or extend a route.
In other words, do a depth first search, trying not to reuse edges (reject any path that doesn't include unused edges, and optionally trim used edges at the endpoints).
Here's what I ended up doing:
Find all paths emanating from a node without in-edges. (Could be expensive for some graphs, but works for mine)
Traverse each path to collect a row of values
Compact the rows
Compacting the rows is done as follows.
For each pair of columns x,y
    Construct a map of every value of x to its possible values of y
    Create another map for entries that only have one distinct value of y, mapping the value of x to its single value of y.
Fill in the blanks using these maps. When filling in a value, check for related blanks that can be filled.
This gives a very compact output and seems to meet all my requirements.
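One possible reading of these steps, sketched in Python. It assumes that a node's category is the first letter of its name, that each path visits a category at most once, and that "check for related blanks" means repeating the fill until nothing changes; the function names are hypothetical and this is not necessarily the exact code used.

```python
from itertools import permutations


def all_paths(nodes, edges):
    """All paths from nodes without in-edges down to nodes without out-edges."""
    successors = {n: [] for n in nodes}
    has_in_edge = set()
    for u, v in edges:
        successors[u].append(v)
        has_in_edge.add(v)
    paths = []

    def walk(node, path):
        path = path + [node]
        if not successors[node]:
            paths.append(path)
        for nxt in successors[node]:
            walk(nxt, path)

    for n in nodes:
        if n not in has_in_edge:
            walk(n, [])
    return paths


def compact(rows, columns):
    """Fill blanks where a value of x always implies one value of y, then dedupe."""
    rows = [dict(r) for r in rows]
    changed = True
    while changed:  # repeat so that a filled value can unlock related blanks
        changed = False
        for x, y in permutations(columns, 2):
            seen = {}
            for r in rows:
                if x in r and y in r:
                    seen.setdefault(r[x], set()).add(r[y])
            single = {xv: next(iter(ys)) for xv, ys in seen.items() if len(ys) == 1}
            for r in rows:
                if x in r and y not in r and r[x] in single:
                    r[y] = single[r[x]]
                    changed = True
    unique = []
    for r in rows:
        if r not in unique:
            unique.append(r)
    return unique


def to_table(nodes, edges, columns):
    # Assumption: the category (column) of a node is its first character.
    rows = [{n[0]: n for n in path} for path in all_paths(nodes, edges)]
    return compact(rows, columns)


print(to_table(["a1", "b1", "c1"], [("a1", "c1"), ("b1", "c1")], "abc"))
# [{'a': 'a1', 'c': 'c1', 'b': 'b1'}]
```

For the diamond-shaped graph from the question the same function returns the two rows a1 b1 c1 and a1 b2 c1.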

Resources