A map should be colored so that no two neighboring regions are given the same color.
Write a Prolog program that tries to assign the given colors to the regions of the given map.
Your program should work with any given map and any given set of colors.
You can have at most two statements (i.e., facts or rules) related to a specific problem instance: one for the given map and one for the given set of colors.
Other than these, your program should be independent of any particular map or any particular set of colors.
The query must be ?- result(X).
You cannot use the predicates assert, asserta, or assertz, nor predicates explicitly defined for I/O, such as read, write, and print. Among predicates for list processing, you may use append, length, and member.
All other predicates you need for list processing must be defined by yourself.
I am currently writing an (exciting!) program in Prolog that aims to return an optimal reforestation plan for a given purpose.
I have now generated, per grid cell, a list of possible tree species to plant, and I have decided which grid cells should actually contain trees in a plan and which should be left "blank" (the density of trees also differs per reforestation purpose). I have now come to the point where I am trying to optimize two things:
The number of different species: if pos(1,2) can house tree_species1, tree_species2, and tree_species3, and pos(4,10) can house tree_species2, tree_species3, and tree_species4, I would prefer that different values (e.g. tree_species1 and tree_species4, respectively) are assigned to them. However, if this is not possible, I would love for the program to plant two of the same species rather than return "false" (which I think would happen using the all_different/1 predicate).
Which tree is best for which purpose: for example, I would rather plant an Aspen than a Whitebeam for biodiversity. I was thinking of connecting the different tree types to a score (biodiversity_score(Tree, Score)), but I am unsure how to use the CLP functions to generate a "maximum" function from that.
I came across "labeling" but failed to see how I can adapt it to my purpose.
I am training multiple word2vec models with Gensim. Each word2vec model will have the same parameters and dimensionality, but will be trained on slightly different data. Then I want to compare how the change in data affected the vector representations of some words.
But every time I train a model, the vector representation of the same word is wildly different. Its similarities to other words remain similar, but the whole vector space seems to be rotated.
Is there any way I can rotate both word2vec representations so that the same words occupy the same positions in vector space, or are at least as close as possible?
Thanks in advance.
That the locations of words vary between runs is to be expected. There's no one 'right' place for words, just mutual arrangements that are good at the training task (predicting words from other nearby words) – and the algorithm involves random initialization, random choices during training, and (usually) multithreaded operation which can change the effective ordering of training examples, and thus final results, even if you were to try to eliminate the randomness by reliance on a deterministically-seeded pseudorandom number generator.
There's a class called TranslationMatrix in gensim that implements the learn-a-projection-between-two-spaces method, as used for machine-translation between natural languages in one of the early word2vec papers. It requires you to have some words that you specify should have equivalent vectors – an anchor/reference set – then lets other words find their positions in relation to those. There's a demo of its use in gensim's documentation notebooks:
https://github.com/RaRe-Technologies/gensim/blob/develop/docs/notebooks/translation_matrix.ipynb
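For a concrete starting point, here is a minimal sketch of that approach. It assumes the gensim 3.x TranslationMatrix API; the saved model files and anchor words are made up for illustration, so see the notebook above for authoritative usage:

    # Minimal sketch, assuming gensim 3.x; file names and anchor words are hypothetical.
    from gensim.models import Word2Vec
    from gensim.models.translation_matrix import TranslationMatrix

    vecs_a = Word2Vec.load("run_a.model").wv   # two runs over slightly different data
    vecs_b = Word2Vec.load("run_b.model").wv

    # Anchor words you declare equivalent across the two runs.
    word_pairs = [("hot", "hot"), ("cold", "cold"), ("good", "good")]

    tm = TranslationMatrix(vecs_a, vecs_b, word_pairs=word_pairs)

    # Where does run A's "skiing" land in run B's space? (topn nearest words)
    print(tm.translate(["skiing"], topn=5))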
But, there are some other techniques you could also consider:
transform & concatenate the training corpuses instead, to both retain some words that are the same across all corpuses (such as very frequent words), but make other words of interest different per segment. For example, you might leave words like "hot" and "cold" unchanged, but replace words like "tamale" or "skiing" with subcorpus-specific versions, like "tamale(A)", "tamale(B)", "skiing(A)", "skiing(B)". Shuffle all data together for training in a single session, then check the distances/directions between "tamale(A)" and "tamale(B)" - since they were each only trained by their respective subsets of the data. (It's still important to have many 'anchor' words, shared between different sets, to force a correlation on those words, and thus a shared influence/meaning for the varying-words.)
create a model for all the data, with a single vector per word. Save that model aside. Then, re-load it, and try re-training it with just subsets of the whole data. Check how much words move, when trained on just the segments. (It might again help comparability to hold certain prominent anchor words constant. There's an experimental property in the model.trainables, with a name ending _lockf, that lets you scale the updates to each word. If you set its values to 0.0, instead of the default 1.0, for certain word slots, those words can't be further updated. So after re-loading the model, you could 'freeze' your reference words, by setting their _lockf values to 0.0, so that only other words get updated by the secondary training, and they're still bound to have coordinates that make sense with regard to the unmoving anchor words. Read the source code to better understand how _lockf works.)
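As a rough illustration of that freezing idea, here is a sketch assuming the gensim 3.x attribute names mentioned above (vectors_lockf, trainables); the model file, anchor words, and subset_sentences are placeholders:

    # Sketch only: freeze anchor words, then re-train on a subset (gensim 3.x names).
    from gensim.models import Word2Vec

    model = Word2Vec.load("all_data.model")      # model trained on the full corpus
    anchor_words = ["hot", "cold"]               # hypothetical reference words

    for word in anchor_words:
        idx = model.wv.vocab[word].index
        model.trainables.vectors_lockf[idx] = 0.0   # 0.0 = frozen, 1.0 = normal updates

    subset_sentences = [["tamale", "is", "hot"]]    # placeholder sub-corpus
    model.train(subset_sentences,
                total_examples=len(subset_sentences),
                epochs=model.epochs)
    # Frozen anchors keep their coordinates, so the movement of other words
    # stays interpretable relative to them.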
This might be a basic or trivial question with a straightforward answer, but I would still like to ask it to clear up my doubt once and for all.
Take the example of Passenger Class in the famous Titanic data. Functionally it is indeed categorical data, so it makes perfect sense to convert it to a categorical variable; as I understand it, algorithms then tend to see a pattern specific to each class. But at the same time, if you treat it as a numeric variable, it might also denote a range for a decision tree, say passengers between first class and second class.
It looks like both are correct, and both will affect the machine learning algorithm's output in different ways.
Which one is appropriate, and is there an extensive discussion about this anywhere? Should we use such ambiguous variables as numeric and also include a copy as a categorical variable, which might prove to be a technique for uncovering more patterns?
I suppose it's up to you whether you'd rather interpret a continuous PassengerClass variable as "for every one-unit increase in PassengerClass, the passenger's likelihood of survival goes up/down X%," versus a categorical (factor) PassengerClass as, "the likelihoods of survival for groups 2 and 3 (for example, leaving 1st-class passengers as the base group) are X and Y% percent higher, respectively, than the base group, holding all else constant."
I think about variables like PassengerClass almost as "treatment groups." Yes, I suppose you could interpret it as continuous, but I think it makes more sense to consider the unique effects of each class like "people who were given the drug versus those who weren't" - you can very easily compare the impacts of being in a higher class (e.g. 2 or 3) to being in the most common class, 1, which again would be left out.
The problem with mapping categorical values to numbers is that some algorithms (e.g. neural networks) will interpret the value itself as having meaning, i.e. you would get different results if you assign the values 1,2,3 to the passenger classes than if you assign, for example, 0,1,2 or 3,2,1. The correspondence between the passenger classes and the numbers is purely conventional and doesn't necessarily convey any additional meaning.
One could argue that the lower the number, the "better" the class is; however, it's still hard to interpret this as "the first class is twice as good as the second class" unless you define some measure of "goodness" that makes the relation between the numbers "1" and "2" sensible.
In this example, you have categorical data that is ordinal - meaning you can rank the categories (from best accommodations to worst, for example) but they're still categories. Regardless of how you label them, there's no actual information about the relative distances among your categories. You can put them in a table, but not (correctly) on a number line. In cases like this, it's generally best to treat your categorical data as independent categories.
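If it helps to see the two treatments side by side, here is a small sketch in pandas (the column name Pclass follows the usual Titanic convention; adjust it to your copy of the data):

    # Sketch: numeric vs. categorical treatment of passenger class in pandas.
    import pandas as pd

    df = pd.DataFrame({"Pclass": [1, 3, 2, 3, 1]})

    # Numeric: a model sees an ordered magnitude (1 < 2 < 3) with equal spacing.
    numeric = df["Pclass"]

    # Categorical: one indicator column per class, dropping class 1 as the base
    # group - the "treatment group" interpretation described above.
    categorical = pd.get_dummies(df["Pclass"], prefix="Pclass", drop_first=True)
    print(categorical)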
I am working on a system that, when given a bank of different types of elements, will create a directed acyclic graph connecting some or all of the elements. Each element has some input A and an output B. When building the graph, the system needs to make sure the output of the previous node matches the input of the current one.
The inputs and outputs of the nodes exist to ensure that only certain types of elements are connected.
The elements would look like this:
ElementName : Input -> Output
Possibly with multiple inputs/outputs, or with no outputs (see below).
One : X -> Y
Two : Y -> Z,F
Three : Y, Z -> W
Four : Z -> F
Five : F -> NULL
Note:
We are talking about a lot of different elements, 30 or so now, but the plan is to add more as time goes on.
This is part of a project to do procedurally generated narrative. The nodes are individual quests. The inputs are what you need to start the quest. The outputs are how the story state is affected.
Problem:
I have seen several different approaches to generating a random DAG, but not one for making a DAG from some preset connection requirements (with rules on connecting them).
I also want some way of limiting the complexity of the graph, i.e. limiting the number of branches a node can have.
Idea of what I want:
You have a bunch of different types of Legos in a bin, say 30. You have rules on connecting the Legos.
Blue -> Red
Blue -> White
Red -> Yellow
Yellow -> Green/Brown
Brown -> Blue
As you all know, in addition to a color, each Lego has a shape, so two blue Legos may not be the same type of Lego. The goal is to build a large structure that fits our rules. Even with our rules, we can still connect the Legos into a bunch of different structures.
P.S. I am hoping this is not too general a question. If it is, please make a note and I will try to make it more specific.
It sounds like an L-system (aka Lindenmayer system) approach would work:
Your collection of Legos is analogous to an alphabet of symbols
Your connection rules correspond to a collection of production rules that expand each symbol into some larger string of symbols
Your starting Lego represents the initial "axiom" string from which to begin construction
The resulting geometric structure is your DAG
The simplest approach would be something like: given a Lego, randomly select a valid connection rule & add a new Lego to the DAG. From there you could add in more complexity as needed. If you need to skew the random selection to favor certain rules, you're essentially building a stochastic grammar. If the selection of a rule depends on previously generated parts of the DAG it's a type of context sensitive grammar.
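Here is a sketch of that simplest approach in Python. The rule table and growth limit are illustrative, not from your spec, and each new node gets a single parent, so strictly this grows a tree; merging nodes with matching inputs would turn it into a true DAG:

    # Sketch: grow a structure by randomly applying production rules.
    import random

    # symbol -> list of possible output tuples; an empty tuple is a terminal.
    RULES = {
        "X": [("Y",)],
        "Y": [("Z", "F"), ("W",)],
        "Z": [("F",)],
        "W": [()],
        "F": [()],
    }

    def grow(axiom="X", max_nodes=20):
        nodes, edges = [axiom], []
        frontier = [0]                      # indices of nodes awaiting expansion
        while frontier and len(nodes) < max_nodes:
            i = frontier.pop(random.randrange(len(frontier)))
            outputs = random.choice(RULES[nodes[i]])  # pick a valid rule at random
            for symbol in outputs:
                nodes.append(symbol)
                edges.append((i, len(nodes) - 1))
                frontier.append(len(nodes) - 1)
        return nodes, edges

    print(grow())

Weighting the random.choice call (e.g. with cumulative probabilities per rule) is where the stochastic-grammar variant mentioned above would come in.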
Graph rewriting, algorithmically creating a new graph out of a base graph, might be a more literal solution, but I personally find L-systems easier to internalize, and researching them yields results that are not overly academic/theoretical in nature.
L-systems themselves are a category of formal grammars. It might be worth checking into some of those related ideas, but it's pretty easy (for me at least) to get sidetracked by theoretical stuff at the expense of core development.
Abstract Description:
I have a set of strings, call it the "active set", and a set of sets of strings - call that the "possible set". When a new string is added to the active set, sets from the possible set may suddenly be subsets of the active set because the active set lacked only that string to be a superset of one of the possibles. I need an algorithm to efficiently find these when I add a new string to the active set. Bonus points if the same data structure allows me to efficiently find which of these possible sets are invalidated (no longer a subset) when a string is removed from the active set.
(The reason I framed the problem described below in terms of sets and subsets of strings in the abstract above is that the language I'm writing this in (Io) is dynamically typed. Objects do have a "type" field but it is a string with the name of the object type in it.)
Background:
In my game engine I have GameObjects which can have several types of Representation objects added to them. For instance if a GameObject has physical presence it might have a PhysicsRepresentation added to it (or not if it's not a solid object). It might have various kinds of GraphicsRepresentations added to it, such as a mesh or particle effect (and you can have more than one if you have multiple visual effects attached to the same game object).
The point of this is to separate subsystems, but you can't completely separate everything: for instance when a GameObject has both a PhysicsRepresentation and a GraphicsRepresentation, something needs to create a 3rd object which connects the position of the GraphicsRepresentation to the location of the PhysicsRepresentation. To serve this purpose while still keeping all the components separate, I have Interaction objects. The Interaction object encapsulates the cross-cutting knowledge about how two system components have to interact.
But in order to protect GameObject from having to know too much about Representations and Interactions, GameObject just provides a generic registry where Interaction prototype objects can register to be called when a particular combination of Representations is present in the GameObject. When a new Representation is added to the GameObject, GameObject should look in its registry and activate just those Interaction objects which are newly enabled by the presence of the new Representation plus the existing Representations.
I'm just stuck on what data structure should be used for this registry and how to search it.
Errata:
The sets of strings are not necessarily sorted, but I can choose to store them sorted.
Although an Interaction most commonly will be between two Representations, I do not want to limit it to that; I should be able to have Interactions that trigger with 3 or more different representations, or even interactions that trigger based on just 1 representation.
I want to optimize this for the case of making it as fast as possible to add/remove representations.
I will have many active sets (each game object has an active set), but I have only one possible set (the set of all registered interaction types). So I don't care how long it takes to build the data structure that represents the possible set, because it only needs to be done once provided the algorithm for comparing different active sets is non-destructive of the possible set data structure.
If your sets are really small, the best representation is bit sets. First, you build a map from strings to consecutive integers 0..N-1, where N is the number of distinct strings. Then you build your sets by bitwise OR-ing 1<<k into a number for each string numbered k. This lets you turn your set operations into bitwise operations, which are extremely fast (an intersection is an &, a union is an |, and so on).
Here is an example: Let's say you have two sets, A={quick, brown, fox} and B={brown, lazy, dog}. First, you build a string-to-number map, like this:
quick - 0
brown - 1
fox - 2
lazy - 3
dog - 4
Then your sets would become A=00111b and B=11010b. Their intersection is A&B = 00010b, and their union is A|B = 11111b. You know a set X is a subset of set Y if X == X&Y.
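The same example as a quick Python sketch (arbitrary-precision integers double as bit sets):

    # Sketch: strings -> bit positions, sets -> integers, set ops -> bit ops.
    def to_bits(words, index):
        bits = 0
        for w in words:
            bits |= 1 << index[w]      # set bit k for string number k
        return bits

    index = {"quick": 0, "brown": 1, "fox": 2, "lazy": 3, "dog": 4}
    A = to_bits({"quick", "brown", "fox"}, index)   # 0b00111
    B = to_bits({"brown", "lazy", "dog"}, index)    # 0b11010

    print(bin(A & B))       # intersection: 0b10, i.e. just "brown"
    print(bin(A | B))       # union:        0b11111
    print(A & B == A)       # is A a subset of B? False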
One way to do this would be to keep, for each possible set, a count of how many of its strings are not yet in the active set, plus a map from each string to the list of possible sets containing it. Then you can update the counts whenever a string is added to or removed from the active set, and notice when a count drops to zero (that possible set has just become a subset of the active set) or rises from zero (it has just been invalidated).
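A sketch of that bookkeeping (the names are illustrative, and it assumes set semantics, i.e. each string is added or removed at most once):

    # Sketch: per-possible-set "missing" counters plus a string -> sets index.
    from collections import defaultdict

    class SubsetWatcher:
        def __init__(self, possible_sets):
            self.possible = [frozenset(s) for s in possible_sets]
            # missing[i] = how many strings of possible set i are absent
            self.missing = [len(s) for s in self.possible]
            self.by_string = defaultdict(list)
            for i, s in enumerate(self.possible):
                for word in s:
                    self.by_string[word].append(i)

        def add(self, word):
            """Return indices of possible sets that just became subsets."""
            hits = []
            for i in self.by_string[word]:
                self.missing[i] -= 1
                if self.missing[i] == 0:
                    hits.append(i)
            return hits

        def remove(self, word):
            """Return indices of possible sets that were just invalidated."""
            lost = []
            for i in self.by_string[word]:
                if self.missing[i] == 0:
                    lost.append(i)
                self.missing[i] += 1
            return lost

    w = SubsetWatcher([{"physics", "graphics"}, {"graphics"}])
    print(w.add("graphics"))    # [1]  -> the second set just became a subset
    print(w.add("physics"))     # [0]
    print(w.remove("physics"))  # [0]  -> the first set is no longer a subset

Each add/remove costs time proportional only to the number of possible sets mentioning that string, which matches the goal of making add/remove as fast as possible.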
This problem reminds me of firing rules in a rule-based system when a fact becomes true, which corresponds to a new string being added to the active set. Many of these systems use the Rete algorithm (http://en.wikipedia.org/wiki/Rete_algorithm). Drools Expert (http://www.jboss.org/drools/drools-expert.html) is an open-source rule-based system, although it looks like there is a lot of enterprise-system wrapping around it these days.