Better model/algorithm to predict multiple targets

I want to use multiple variables to predict multiple targets. Note that "multiple targets" here doesn't mean multi-label.
Let's take an example like this:
# In this example: x1,x2,x3 are used to predict y1,y2
import pandas as pd
pd.DataFrame({'x1': [1, 2, 3], 'x2': [2, 3, 4], 'x3': [1, 1, 1], 'y1': [1, 2, 1], 'y2': [2, 3, 3]})
   x1  x2  x3  y1  y2
0   1   2   1   1   2
1   2   3   1   2   3
2   3   4   1   1   3
In my limited experience with data mining, I have found two solutions that might help:
Build two xgboost models, one to predict y1 and the other y2.
Use a fully-connected layer to map [x1,x2,x3] to [y1,y2], which seems like a promising solution.
I'd like to know whether either is good practice, and what would be a better way to predict multiple targets.

Regardless of your approach, two outputs mean you need two functions. I hope it's clear that a layer producing two outputs is equivalent to two layers producing one output each.
The only thing worth taking into account here (and it is only relevant for deeper models) is whether you want to build intermediate representations of your input that are shared between both outputs, i.e. x → h1 → h2 → ... → hN, hN → y1, hN → y2. Doing so forces your hN representation to act as a task-indifferent, multi-purpose encoder, while avoiding the redundancy of two models learning the same thing.
For shallow architectures, such as the single-layer one you described, this distinction is meaningless.
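
For concreteness, here is a minimal sketch of both options using scikit-learn (my choice for the illustration; xgboost's XGBRegressor would be a drop-in replacement for GradientBoostingRegressor in option 1, and the small MLP stands in for the fully-connected network):

import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.neural_network import MLPRegressor

df = pd.DataFrame({'x1': [1, 2, 3], 'x2': [2, 3, 4], 'x3': [1, 1, 1],
                   'y1': [1, 2, 1], 'y2': [2, 3, 3]})
X, Y = df[['x1', 'x2', 'x3']], df[['y1', 'y2']]

# Option 1: one independent model per target.
models = {t: GradientBoostingRegressor().fit(X, Y[t]) for t in Y.columns}

# Option 2: a single network whose hidden layer is shared by both outputs
# (MLPRegressor supports multi-output regression natively).
net = MLPRegressor(hidden_layer_sizes=(8,), max_iter=2000).fit(X, Y)

print({t: m.predict(X) for t, m in models.items()})
print(net.predict(X))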

Related

Constraint Satisfaction Problems with solvers VS Systematic Search

We have to solve a difficult problem where we need to check a lot of complex rules from multiple sources against a system, in order to decide whether the system satisfies those rules or how it should be changed to satisfy them.
We initially started using constraint satisfaction algorithms (via Choco) to solve it, but since the number of rules and variables turned out to be smaller than anticipated, we are now considering building a list of all possible configurations in a database and running multiple queries based on the rules to filter this list and find the solutions that way.
Are there limitations or disadvantages to doing a systematic search compared to using a CSP solver, for a reasonable number of rules and variables? Will it impact performance significantly? Will it reduce the kinds of constraints we can implement?
As an example:
Imagine it with a much bigger number of variables, much bigger (but always discrete) domains, and many more rules (some much more complex); but instead of describing the problem as:
x in (1,6,9)
y in (2,7)
z in (1,6)
y = x + 1
z = x if x < 5 OR z = y if x > 5
and giving it to a solver, we would build a table:
X | Y | Z
1 | 2 | 1
6 | 2 | 1
9 | 2 | 1
1 | 7 | 1
6 | 7 | 1
9 | 7 | 1
1 | 2 | 6
6 | 2 | 6
9 | 2 | 6
1 | 7 | 6
6 | 7 | 6
9 | 7 | 6
and use queries like the following (just an example to aid understanding; in practice we would use SPARQL against a semantic database):
SELECT X, Y, Z WHERE Y = X + 1
INTERSECT
SELECT X, Y, Z WHERE (Z = X AND X < 5) OR (Z = Y AND X > 5)
CSP allows you to combine deterministic generation of values (through the rules) with heuristic search, and the beauty happens when you customize both of these for your problem. The rules are just one part; equally important is the choice of the search algorithm/generator, which can cull a lot of the search space.
While I cannot make predictions about the performance of your SQL solution, I must say that it strikes me as somewhat roundabout. One specific problem will arise if your rules change: you may find that you have to start over from scratch. Also, the RDBMS will fully materialize all of the subqueries, which may explode in size.
I'd suggest implementing a working prototype with CSP, and another with SQL, for a simple subset of your requirements. You will then get a good feeling for what works and what does not. Be sure to think about long-term maintenance as well.
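As a starting point, the toy problem above takes only a few lines with, e.g., the python-constraint package (my choice for the sketch; Choco is the Java-side equivalent):

from constraint import Problem

p = Problem()
p.addVariable('x', [1, 6, 9])
p.addVariable('y', [2, 7])
p.addVariable('z', [1, 6])
p.addConstraint(lambda x, y: y == x + 1, ('x', 'y'))
p.addConstraint(lambda x, y, z: (x < 5 and z == x) or (x > 5 and z == y),
                ('x', 'y', 'z'))

print(p.getSolutions())   # -> [{'x': 1, 'y': 2, 'z': 1}]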
Full disclaimer: my last contact with CSP was decades ago at university as part of my master's (I implemented a CSP search engine not unlike Choco, if a bit more rudimentary, and did a bit of research on the topic), so the field will certainly have evolved since then.

Solving a puzzle using swi-prolog

I've been given an assignment to write, in Prolog, a solver for the battleships solitaire puzzle. For those unfamiliar with it, the puzzle deals with a 6 by 6 grid on which a series of ships are placed according to the provided constraints on each row and column, e.g. the first row must contain 3 squares with ships, the second row must contain 1 square with a ship, the third row must contain 0 squares, and so on for the other rows and columns.
Each puzzle comes with its own set of constraints and revealed squares, typically two. An example can be seen here:
battleships
So, here's what I've done:
step([ShipCount, Rows, Cols, Tiles], [ShipCount2, Rows2, Cols2, Tiles2]) :-
    ShipCount2 is ShipCount + 1,
    % Pick a column X and a row Y that still need ship squares.
    nth1(X, Cols, X1),
    X1 \== 0,
    nth1(Y, Rows, Y1),
    Y1 \== 0,
    % (X,Y) must be free and not diagonally adjacent to an existing tile.
    not(member([X, Y, _], Tiles)),
    pairs(Tiles, TilesXY),
    notdiaglist(X, Y, TilesXY),
    % Choose a part type T, record the new tile, update the counters.
    member(T, [1, 2, 3, 4, 5, 6]),
    append([X, Y], [T], Tile),
    append([Tile], Tiles, Tiles2),
    dec_elem1(X, Cols, Cols2),
    dec_elem1(Y, Rows, Rows2).

% Decrement the Count-th element of a list.
dec_elem1(1, [A|Tail], [B|Tail]) :- B is A - 1.
dec_elem1(Count, [A|Tail], [A|Tail2]) :-
    Count1 is Count - 1,
    dec_elem1(Count1, Tail, Tail2).

% (X2,Y2) ranges over (X1,Y1) itself and its eight neighbours.
neib(X1, Y1, X2, Y2) :- X2 is X1,     (Y2 is Y1 - 1 ; Y2 is Y1 + 1 ; Y2 is Y1).
neib(X1, Y1, X2, Y2) :- X2 is X1 - 1, (Y2 is Y1 - 1 ; Y2 is Y1 + 1 ; Y2 is Y1).
neib(X1, Y1, X2, Y2) :- X2 is X1 + 1, (Y2 is Y1 - 1 ; Y2 is Y1 + 1 ; Y2 is Y1).

% Succeeds when (X1,Y1) and (X2,Y2) are not diagonally adjacent.
notdiag(X1, Y1, X2, Y2) :- not(neib(X1, Y1, X2, Y2)).
notdiag(X1, Y1, X2, Y2) :-
    neib(X1, Y1, X2, Y2),
    ( X1 == X2, t(Y1, Y2)
    ; Y1 == Y2, t(X1, X2)
    ).

notdiaglist(_, _, []).
notdiaglist(X1, Y1, [[X2, Y2]|Tail]) :-
    notdiag(X1, Y1, X2, Y2),
    notdiaglist(X1, Y1, Tail).

% The two coordinates differ by exactly one.
t(X1, X2) :- X is abs(X1 - X2), X == 1.

% Project [X,Y,Type] tiles down to [X,Y] pairs.
pairs([], []).
pairs([[X, Y, _]|Tail], [[X, Y]|Tail2]) :- pairs(Tail, Tail2).
I represent a state with a list: [Count,Rows,Columns,Tiles]. The last state must be
[10,[0,0,0,0,0,0],[0,0,0,0,0,0], somelist]. A puzzle starts from an initial state, for example
initial([1, [1,3,1,1,1,2], [0,2,2,0,0,5], [[4,4,1],[2,1,0]]]).
I try to find a solution in the following manner:
run :- initial(S), step(S, S1), step(S1, S2), ..., step(S8, F).
Now, here's the difficulty: if I restrict myself to one type of ship part by using member(T,[1])
instead of
member(T,[1,2,3,4,5,6])
it works fine. However, when I use the full range of possible values for T, which are needed later, the query never ends (it runs for too long). This puzzles me, since:
(a) it works for 6 types of ships but only for 8 steps instead of 9
(b) going from a single type of ship to 6 types increases the number
of options for just the last step by a factor of 6, which
shouldn't have such a dramatic effect.
So, what's going on?
To answer your question directly, what's going on is that Prolog is trying to sift through an enormous space of possibilities.
You're correct that altering that line increases the search space of the last call by a factor of six, but note that the size of the search space of, say, nine calls isn't proportional to 9 times the size of one call. Prolog backtracks on failure, so the total is proportional (bounded above, actually) to the number of possible results of one call raised to the ninth power.
That means we can expect the size of the space Prolog needs to search to grow by up to a factor of 6^9 = 10077696 when we allow T to take on 6 times as many values.
Of course, it doesn't help that (as far as I was able to tell) a solution doesn't exist if we call step 9 times starting with initial anyway. Since that last call is going to fail, Prolog will keep trying until it has exhausted all possibilities (of which there are a great many) before it finally gives up.
As far as a solution goes, I'm not sure I know enough about the problem, but if the value of T is the kind of ship part that fits in the grid (e.g. single square, half of a 2-square ship, part of a 3-square ship), note that this gives you a lot more information than the numbers on the rows/columns.
Right now, in pseudocode, your step looks like this:
Find a (X,Y) pair that has non-zero markings on its row/column
Check that there isn't already a ship there
Check that it isn't diagonal to a ship
Pick a kind of ship-part for it to be.
I'd suggest an approach like this:
Finish any already-placed ship parts to form complete ships (if we can't: fail)
Until we're finished:
Find an acceptable place to put a ship
Check that the markings on its row/column aren't zero
Try to place an entire ship there (instead of a single part)
By using the most specific information that we have first (in this case, the previously placed parts), we can reduce the amount of work Prolog has to do and make things return reasonably fast.
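To make the whole-ship strategy concrete, here's a rough, hypothetical sketch in Python rather than Prolog (it ignores the revealed squares and part types for brevity, uses 0-based coordinates, and the fleet composition is my assumption, chosen so the ship lengths sum to the 9 squares implied by the row counts):

from itertools import product

SIZE = 6
SHIPS = [3, 2, 2, 1, 1]   # assumed fleet; lengths sum to the 9 ship squares

def cells_of(length, x, y, horizontal):
    # Cells covered by a ship of the given length starting at (x, y).
    return [(x + i, y) if horizontal else (x, y + i) for i in range(length)]

def fits(cells, occupied, rows, cols):
    # Whole-ship check: on the grid, within row/column budgets, not touching
    # any previously placed ship (not even diagonally).
    if any(not (0 <= x < SIZE and 0 <= y < SIZE) for x, y in cells):
        return False
    for y in range(SIZE):
        if sum(1 for _, cy in cells if cy == y) > rows[y]:
            return False
    for x in range(SIZE):
        if sum(1 for cx, _ in cells if cx == x) > cols[x]:
            return False
    return all((x + dx, y + dy) not in occupied
               for x, y in cells
               for dx, dy in product((-1, 0, 1), repeat=2))

def solve(ships, occupied, rows, cols):
    if not ships:
        return occupied if not any(rows) and not any(cols) else None
    first, rest = ships[0], ships[1:]
    for x, y, horiz in product(range(SIZE), range(SIZE), (True, False)):
        cells = cells_of(first, x, y, horiz)
        if fits(cells, occupied, rows, cols):
            new_rows = [r - sum(1 for _, cy in cells if cy == y2)
                        for y2, r in enumerate(rows)]
            new_cols = [c - sum(1 for cx, _ in cells if cx == x2)
                        for x2, c in enumerate(cols)]
            found = solve(rest, occupied | set(cells), new_rows, new_cols)
            if found is not None:
                return found
    return None

# Row/column counts taken from the question's initial/1 fact.
solution = solve(SHIPS, set(), [1, 3, 1, 1, 1, 2], [0, 2, 2, 0, 0, 5])
print(sorted(solution) if solution else 'no solution')

The same idea carries over to Prolog: generate placements of whole ships, and let the row/column budgets and the no-touching test prune early.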

Algorithm for planning the creation of an object from dependencies

I am an alchemist. I can make things out of other things according to my recipe book. For instance:
2 lead + 1 bismuth -> 1 carbon
1 oxygen + 5 hydrogen + 3 nitrogen -> 2 carbon
5 carbon + 5 titanium -> 1 gold
...etc.
My recipe book contains thousands of recipes, each of which consumes some discrete amount of one or more inputs and produces a discrete amount of one output. Being a lazy alchemist, I don't want to remember all my recipes. I want to write a computer program to solve this problem for me. The input to the program is a description of what I want, like "2 gold", and a description of what I have in stock, like "5 titanium, 6 lead, 3 bismuth, 2 carbon, 1 gold". The output should be either "cannot be made" or a sequence of instructions for creating the thing. For the example given here, the output could be:
make 3 carbon out of 6 lead + 3 bismuth
make 1 gold out of 5 carbon + 5 titanium
Then, combined with the 1 gold I already have, I have the 2 gold I wanted.
One last note: the recipes are weighted; e.g. I prefer to make carbon out of lead and bismuth if I can.
Is there an elegant way to formulate and solve this problem? A naive recursive solution looks tempting, but I can think of recipe sets that would cause it to do an exponential amount of work.
(And, as a followup, someday my research might uncover a circular set of recipes---maybe I can make 1 hydrogen out of 1 helium and 1 helium out of 1 hydrogen---and I would like to be able to handle this case as well.)
The problem is NP-hard.
Given an instance of CNF-SAT, prepare an alchemical table with reagents for:
each variable
each literal
each clause (unsatisfied version)
each clause (satisfied version)
the output.
The reactions are
variable to large supply of corresponding positive literal
variable to large supply of corresponding negative literal
clause (unsatisfied version) and satisfying literal to clause (satisfied version)
all clauses (satisfied versions) to the output.
The question is whether we can make the output given one of each variable and one of each clause (unsatisfied version).
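To make the reduction concrete, here is a hypothetical sketch that emits the recipe set for the tiny formula (x1 OR NOT x2) AND (x2 OR x3); all reagent names are mine:

clauses = [[('x1', True), ('x2', False)],
           [('x2', True), ('x3', True)]]
variables = sorted({v for clause in clauses for v, _ in clause})

BIG = 10  # "large supply"; any bound >= the number of clauses works

recipes = []
for v in variables:
    # a variable converts into a large supply of ONE of its two literals
    recipes.append(({f'var_{v}': 1}, {f'lit_{v}_pos': BIG}))
    recipes.append(({f'var_{v}': 1}, {f'lit_{v}_neg': BIG}))
for i, clause in enumerate(clauses):
    for v, positive in clause:
        lit = f"lit_{v}_{'pos' if positive else 'neg'}"
        # unsatisfied clause + a satisfying literal -> satisfied clause
        recipes.append(({f'clause_{i}_unsat': 1, lit: 1},
                        {f'clause_{i}_sat': 1}))
# all satisfied clauses combine into the output
recipes.append(({f'clause_{i}_sat': 1 for i in range(len(clauses))},
                {'output': 1}))

# Stock: one of each variable and one of each unsatisfied clause; asking
# "can we make 1 output?" is now exactly the satisfiability question.
stock = {f'var_{v}': 1 for v in variables}
stock.update({f'clause_{i}_unsat': 1 for i in range(len(clauses))})
for inputs, outputs in recipes:
    print(inputs, '->', outputs)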
This problem is related to the problem of determining reachability of vector addition systems/Petri nets; my reduction is based in part on reductions that appeared in that literature.

2D PHP Pattern Creator Algorithm

I'm looking for an algorithm to help me build 2D patterns based on rules. The idea is that I could write a script using a given set of parameters, and it would return a random, 2-dimensional sequence up to a given size.
My plan is to use this to generate image patterns based on rules. Things like image fractals or sprites for game levels could possibly use this.
For example, let's say that you can use A, B, C, and D to create the pattern. The rules are that C and A can never be next to each other, and that D always follows C. Next, let's say I want a pattern of size 4x4. The result might be the following, which respects all the rules:
A B C D
B B B B
C D B B
C D C D
Are there any existing libraries that can do calculations like this? Are there any mathematical formulas I can read-up on?
While pretty inefficient concerning runtime, backtracking is an often-used algorithm for such problems.
It follows a simple pattern, and if written correctly, you can easily swap a new rule set into it.
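Here is a rough sketch of that pattern in Python (hypothetical; it hard-codes one reading of the example rules, and checks only the left/up neighbours, since the grid is filled row by row and that covers every adjacent pair exactly once):

import random

SYMBOLS = ['A', 'B', 'C', 'D']
W = H = 4

def ok(grid, x, y, s):
    left = grid.get((x - 1, y))
    up = grid.get((x, y - 1))
    if s == 'A' and 'C' in (left, up):   # C and A never next to each other
        return False
    if s == 'C' and 'A' in (left, up):
        return False
    if left == 'C' and s != 'D':         # D always follows C
        return False
    if s == 'C' and x == W - 1:          # no room left for the required D
        return False
    return True

def fill(grid, i=0):
    if i == W * H:
        return grid
    x, y = i % W, i // W
    for s in random.sample(SYMBOLS, len(SYMBOLS)):  # shuffled for variety
        if ok(grid, x, y, s):
            grid[(x, y)] = s
            if fill(grid, i + 1):
                return grid
            del grid[(x, y)]                        # backtrack
    return None

g = fill({})
for y in range(H):
    print(' '.join(g[(x, y)] for x in range(W)))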
Define your rule data structures; i.e., define the set of operations that the rules can encapsulate, and define the available cross-referencing that can be done. Once you've done this, you should have a clearer view of what type of algorithms to use to apply these rules to a potential result set.
Supposing that your rules are restricted to "type X is allowed to have type Y immediately to its left/right/top/bottom", you potentially have situations where generating possible patterns is computationally difficult. Take a look at Wang tiles (a good source is the book Tilings and Patterns by Grünbaum and Shephard) and you'll see that with such sets of rules you can define sets of Wang tiles. Appropriate sets of these are Turing-complete.
For small rectangles, or for your sets of rules, this may be only of academic interest. As mentioned elsewhere, a backtracking approach is probably appropriate for your ruleset, in which case you may want to consider suitable heuristics for the order in which new components are added to your grid. Depending on your rulesets, other approaches might also work; e.g. if your ruleset admits many solutions, you might get a long way by randomly allocating many items to the grid before attempting to fill in the remaining gaps.

Aggregating automatically-generated feature vectors

I've got a classification system, which I will unfortunately need to be vague about for work reasons. Say we have 5 features to consider; it is basically a set of rules:
A  B  C  D  E  Result
1  2  b  5  3  X
1  2  c  5  4  X
1  2  e  5  2  X
We take a subject and get its values for A-E, then try matching the rules in sequence. If one matches, we return the result of the first matching rule.
C is a discrete value, which could be any of a-e. The rest are just integers.
The ruleset has been automatically generated from our old system and has an extremely large number of rules (~25 million). The old rules were if statements, e.g.
result("X") if $A >= 1 && $A <= 10 && $C eq 'A';
As you can see, the old rules often do not even use some features, or accept ranges. Some are more annoying:
result("Y") if ($A == 1 && $B == 2) || ($A == 2 && $B == 4);
The ruleset needs to be much smaller, as it has to be maintained by humans, so I'd like to shrink the rule set so that the first example would become:
A  B  C    D  E    Result
1  2  bce  5  2-4  X
The upshot is that we can split the ruleset by the Result column and shrink each independently. However, I cannot think of an easy way to identify and shrink down the ruleset. I've tried clustering algorithms but they choke because some of the data is discrete, and treating it as continuous is imperfect. Another example:
A  B  C  Result
1  2  a  X
1  2  b  X
(repeat a few hundred times)
2  4  a  X
2  4  b  X
(ditto)
In an ideal world, this would be two rules:
A  B  C  Result
1  2  *  X
2  4  *  X
That is: not only would the algorithm identify the relationship between A and B, but it would also deduce that C is noise (not important for the rule).
Does anyone have an idea of how to go about this problem? Any language or library is fair game, as I expect this to be a mostly one-off process. Thanks in advance.
Check out the Weka machine learning lib for Java. The API is a little bit crufty but it's very useful. Overall, what you seem to want is an off-the-shelf machine learning algorithm, which is exactly what Weka contains. You're apparently looking for something relatively easy to interpret (you mention that you want it to deduce the relationship between A and B and to tell you that C is just noise.) You could try a decision tree, such as J48, as these are usually easy to visualize/interpret.
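If Java isn't a hard requirement, the same idea is only a few lines with scikit-learn's DecisionTreeClassifier (a stand-in for Weka's J48; the toy rows below are made up to mirror the question's last example):

import pandas as pd
from sklearn.preprocessing import OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier, export_text

# Made-up rows in the question's shape: C is discrete, the rest integers.
df = pd.DataFrame({'A': [1, 1, 2, 2, 3, 3], 'B': [2, 2, 4, 4, 1, 1],
                   'C': ['a', 'b', 'a', 'b', 'a', 'b'],
                   'Result': ['X', 'X', 'X', 'X', 'Y', 'Y']})
df['C'] = OrdinalEncoder().fit_transform(df[['C']]).ravel()

tree = DecisionTreeClassifier().fit(df[['A', 'B', 'C']], df['Result'])
# The printed tree splits on A (or B) and never mentions C, i.e. it has
# effectively learned that C is noise for this rule set.
print(export_text(tree, feature_names=['A', 'B', 'C']))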
Twenty-five million rules? How many features? How many values per feature? Is it possible to iterate through all combinations in practical time? If so, you could begin by separating the rules into groups by result.
Then, for each result, do the following. Considering each feature as a dimension, and the allowed values for a feature as the metric along that dimension, construct a huge Karnaugh map representing the entire rule set.
The map has two uses. One: research automated methods for the Quine-McCluskey algorithm. A lot of work has been done in this area. There are even a few programs available, although probably none of them will deal with a Karnaugh map of the size you're going to make.
Two: when you have created your final reduced rule set, iterate over all combinations of all values for all features again, and construct another Karnaugh map using the reduced rule set. If the maps match, your rule sets are equivalent.
-Al.
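The heart of that reduction is the Quine-McCluskey combine step: two rules that differ in exactly one feature merge into one rule with a don't-care in that position. A toy sketch of just this step (not the full algorithm, which also tracks prime implicants and coverage):

def merge(a, b):
    # Merge two rules (tuples of feature values) differing in one position.
    diff = [i for i, (p, q) in enumerate(zip(a, b)) if p != q]
    if len(diff) != 1:
        return None
    i = diff[0]
    return a[:i] + ('*',) + a[i + 1:]

rules = [(1, 2, 'a'), (1, 2, 'b'), (2, 4, 'a'), (2, 4, 'b')]
changed = True
while changed:
    changed = False
    out = set(rules)
    for a in rules:
        for b in rules:
            m = merge(a, b)
            if m is not None:
                out.discard(a)
                out.discard(b)
                out.add(m)
                changed = True
    rules = sorted(out)
print(rules)   # -> [(1, 2, '*'), (2, 4, '*')]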
You could try a neural network approach, trained via backpropagation, assuming you have, or can randomly generate (based on the old ruleset), a large set of data that hits all your classes. Using a hidden layer of appropriate size will allow you to approximate arbitrary discriminant functions in your feature space. This is more or less the same idea as clustering, but thanks to the training paradigm it should have no issue with your discrete inputs.
This may, however, be a little too "black box" for your case, particularly if you have zero tolerance for false positives and negatives (although, it being a one-off process, you can get an arbitrary degree of confidence by checking a gargantuan validation set).
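A hedged sketch of that route with scikit-learn's MLPClassifier (library choice and the stand-in data generator are my assumptions; in practice you would generate the labels from the old ruleset):

import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.integers(0, 10, size=(5000, 5))                   # stand-in for A-E
y = np.where((X[:, 0] == 1) & (X[:, 1] == 2), 'X', 'Y')   # made-up old rule

clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500).fit(X, y)
print(clf.score(X, y))   # in practice, score a gargantuan held-out set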
