I was thinking earlier today about an idea for a small game and stumbled upon how to implement it. The idea is that the player can make a series of moves that cause a little effect, but if done in a specific sequence would cause a greater effect. So far so good, this I know how to do. Obviously, I had to make it be more complicated (because we love to make it more complicated), so I thought that there could be more than one possible path for the sequence that would both cause greater effects, albeit different ones. Also, part of some sequences could be the beggining of other sequences, or even whole sequences could be contained by other bigger sequences. Now I don't know for sure the best way to implement this. I had some ideas, though.
1) I could implement a circular n-linked list. But since the list of moves never end, I fear it might cause a stack overflow ™. The idea is that every node would have n children and upon receiving a command, it might lead you to one of his children or, if no children was available to such command, lead you back to the beggining. Upon arrival on any children, a couple of functions would be executed causing the small and big effect. This might, though, lead to a lot of duplicated nodes on the tree to cope up with all the possible sequences ending on that specific move with different effects, which might be a pain to maintain but I am not sure. I never tried something this complex on code, only theoretically. Does this algorithm exist and have a name? Is it a good idea?
2) I could implement a state machine. Then instead of wandering around a linked list, I'd have some giant nested switch that would call functions and update the machine state accordingly. Seems simpler to implement, but... well... doesn't seem fun... nor ellegant. Giant switchs always seem ugly to me, but would this work better?
3) Suggestions? I am good, but I am far inexperienced. The good thing of the coding field is that no matter how weird your problem is, someone solved it in the past, but you must know where to look. Someone might have a better idea than those I had, and I really wanted to hear suggestions.
I'm not absolutely completely sure that I understand exactly what you're saying, but as an analagous situation, say someone's inputting an endless stream of numbers on the keyboard. '117' is a magic sequence, '468' is another one, '411799' is another (which contains the first one).
So if the user enters:
55468411799
you want to fire 'magic events' at the *s:
55468*4117*99*
or something like that, right? If that's analagous to the problem you're talking about, then what about something like (Java-like pseudocode):
MagicSequence fireworks = new MagicSequence(new FireworksAction(), 1, 1, 7);
MagicSequence playMusic = new MagicSequence(new MusicAction(), 4, 6, 8);
MagicSequence fixUserADrink = new MagicSequence(new ManhattanAction(), 4, 1, 1, 7, 9, 9);
Collection<MagicSequence> sequences = ... all of the above ...;
while (true) {
int num = readNumberFromUser();
for (MagicSequence seq : sequences) {
seq.handleNumber(num);
}
}
while MagicSequence has something like:
Action action = ... populated from constructor ...;
int[] sequence = ... populated from constructor ...;
int position = 0;
public void handleNumber(int num) {
if (num == sequence[position]) {
// They've entered the next number in the sequence
position++;
if (position == sequence.length) {
// They've got it all!
action.fire();
position = 0; // Or disable this Sequence from accepting more numbers if it's a once-off
}
} else {
position = 0; // missed a number, start again!
}
}
You might want to implement a state machine anyway, but you don't have to hardcode state transitions.
Try to make a graph of states, where link between state A to state B will mean A can lead to B.
Then you can traverse graph at runtime to find where player goes.
Edit: You can define graph node as:
-state-id
-list of links to other states,
where every link defines:
-state-id
-precondition, a list of states what must be visited before going to this state
What you're describing sounds very similar to the technology tree in a game live Civilization.
I don't know how the Civ authors built theirs, but I'd be inclined to use a multigraph to represent possible 'moves' - there will be some you can start at with no 'experience', and once you're in them, there will be multiple paths through to the end.
Draw-out what potential options you can have at each stage of the game, and then draw lines going from some options to others.
That should give you a start on implementation, as graphs are [relatively] easy concepts to implement and utilize.
Sounds like a neural network. You could create one and train it to recognize the patterns that cause the various effects you are looking for.
What you're describing sounds somewhat similar to a dependency graph or a word graph. You might look into those.
#Cowan, #Javier: Nice idea, mind if I add to it?
Let the MagicSequence objects listen to the incoming stream of user input, that is notify them of the input (broadcast) and let each of them add the input to there internal input stream. This stream is cleared when the input is not the expected next input in the pattern that would have the MagicSequence fire its action. As soon as the pattern is completed, fire the action and clear the internal input stream.
Optimize this by only feeding input to the MagicSequences that are waiting for it. This could be done two ways:
You have an object that lets all MagicSequences connect with events that correspond with numbers in their patterns. MagicSequence(1,1,7) would add itself to got1 and got7, for example:
UserInput.got1 += MagicSequnece[i].SendMeInput;
You could optimize this such that after each input MagicSequences deregister from invalid events and register with valid ones.
create a small state machine for each effect that you'd want. at each user action, 'broadcast' it to all state machines. most of then won't care, but some will advance, or maybe go backwards. when one of them reaches it's goal, produce the desired effect.
to keep the code neat, don't hardcode the state machines, instead build a simple data structure that encodes the state graph: each state is a node with a list of interesting events, each one points to the next state's node. Each machine's state is simply a reference to the appropriate state node.
edit: It seems Cowan's advice is equivalent to this, but he optimises his state machines to express only simple sequences. seems enough for your specific problem, but more complex conditions could need a more general solution.
Related
This question doesn't address any programming language in particular but of course I'm happy to hear some examples.
Imagine a big number of files, let's say 5000, that have all kinds of letters and numbers in it. Then, there is a method that receives a user input that acts as an alias in order to display that file. Without having the files sorted in a folder, the method(s) need to return the file name that is associated to the alias the user provided.
So let's say user input "gd322" stands for the file named "k4e23", the method would look like
if(input.equals("gd322")){
return "k4e23";
}
Now, imagine having 4 values in that method:
switch(input){
case gd322: return fw332;
case g344d: return 5g4gh;
case s3red: return 536fg;
case h563d: return h425d;
} //switch on string, no break, no string indicators, ..., pls ignore the syntax, it's just pseudo
Keeping in mind we have 5000 entries, there are probably more than just 2 entries starting with g. Now, if the user input starts with 's', instead of wasting CPU cycles checking all the a's, b's, c's, ..., we could also make another switch for this, which then directs to the 'next' methods like this:
switch(input[0]){ //implying we could access strings like that
case a: switchA(input);
case b: switchB(input);
// [...]
case g: switchG(input);
case s: switchS(input);
}
So the CPU doesn't have to check on all of them, but rather calls a method like this:
switchG(String input){
switch(input){
case gd322: return fw332;
case g344d: return 5g4gh;
// [...]
}
Is there any field of computer science dealing with this? I don't know how to call it and therefore don't know how to search for it but I think my thoughts make sense on a large scale. Pls move the thread if it doesn't belong here but I really wanna see your thoughts on this.
EDIT: don't quote me on that "5000", I am not in the situation described above and I wanted to talk about this completely theoretical, it could also be 3 entries or 300'000, maybe even less or more
If you have 5000 options, you're probably better off hashing them than having hard-coded if / switch statements. In c++ you could also use std::map to pair a function pointer or other option handling information with each possible option.
Interesting, but I don't think you can give a generic answer. It all depends on how the code is executed. Many compilers will have all kinds of optimizations, in the if and switch, but also in the way strings are compared.
That said, if you have actual (disk) files with those lists, then reading the file will probably take much longer than processing it, since disk I/O is very slow compared to memory access and CPU processing.
And if you have a list like that, you may want to build a hash table, or simply a sorted list/array in which you can perform a binary search. Sorting it also takes time, but if you have to do many lookups in the same list, it may be well worth the time.
Is there any field of computer science dealing with this?
Yes, the science of efficient data structures. Well, isn't that what CS is all about? :-)
The algorithm you described resembles a trie. It wouldn't be statically encoded in the source code with switch statements, but would use dynamic lookups in a structure loaded from somewhere and stuff, but the idea is the same.
Yes the problem is known and solved since decades. Hash functions.
Basically you have a set of values (here strings like "gd322", "g344d") and you want to know if some other value v is among them.
The idea is to put the strings in a big array, at an index calculated from their values by some function. Given a value v, you'll compute an index the same way, and check whether the value v is here or not. Much faster than checking the whole array.
Of course there is a problem with different values falling at the same place : collisions. Some magic is needed then : perfect hash functions whose coefficients are tweaked so values from the initial set don't cause any collisions.
I am taking a course on models of computation and currently we are doing finite state machines. One my tasks is to draw out a FSM that performs division of 3; to simplify the model the machine only accepts numbers multiple of 3. I am not sure how this exactly works, especially since I imagine FSM putting out only single binary values. Could you guys give examples (division by 2 or 4) or hints on how to approach this?
This is what you need, I think (sorry about the bad picture). The 'E' represents epsilon/lambda/no-output. The label of the edges denotes 'input/output'. For each symbol read there is also a corresponding output which may be lambda (no output).
Apologies for the non-descriptive question; if you can think of a better one, I'm all ears.
I'm writing some Perl to implement an algorithm and the code I have smells fishy. Since I don't have a CS background, I don't have a lot of knowledge of standard algorithms in my back pocket, but this seems like something that it might be.
Let me describe what I'm doing by way of metaphor:
You have a conveyor belt of oranges. The oranges pass you one by one. You also have an unlimited supply of flat-packed boxes.
For each orange, check it. If it is rotten, dispose of it
If it is good, put it in a box. If you don't have a box, grab a new one and construct it.
If the box has 10 oranges in it, close it up and put it on a pallet. Do not construct a new one.
Repeat until you have no more oranges
If you have a constructed box with some oranges in it, close it up and put it on a pallet
So, we have an algorithm for processing items in a list, if they meet some criteria, they should be added to a structure which, when it meets some other criteria, should be 'closed out'. Also, once the list has been processed, if there's an 'open' structure, it should be 'closed out' as well.
Naively, I assume that the algorithm consists of a loop acting over the list, with a conditional to see if the list element belongs in the structure and a conditional to see if the structure needs to be 'closed'.
Outside the loop, there would be one more conditional to close any outstanding structures.
So, here are my questions:
Is this a description of a well-known algorithm? If so, does it have a name?
Is there an effective way to coalesce the 'closing out the box' activity into a single place, as opposed to once inside the loop and once outside of the loop?
I tagged this as 'Perl' because Perlish approaches are of interest, but I'd be interested to hear of any other languages that have neat solutions to this.
It's a nice fit with a functional approach - you're iterating over a stream of Oranges, testing, grouping and operating on them. In Scala, it would be something like:
val oranges:Stream[Oranges] = ... // generate a stream of Oranges
oranges.filter(_.isNotRotten).grouped(10).foreach{ o => {(new Box).fillBox(o)}}
(grouped does the right thing with the partial box at the end)
There's probably Perl equivalents.
Is there an effective way to coalesce the 'closing out the box' activity into a single place,
as opposed to once inside the loop and once outside of the loop?
Yes. Simply add "... or there are no more oranges" to the "does the structure need to be closed" function. The easiest way of doing this is a do/while construct (technically speaking it's NOT a loop in Perl, though it looks like one):
my $current_container;
my $more_objects;
do {
my $object = get_next_object(); # Easiest implementation returns undef if no more
$more_objects = more_objects($object) # Easiest to implement as "defined $object"
if (!$more_objects || can_not_pack_more($current_container) {
close_container($current_container);
$current_container = open_container() if $more_objects;
}
pack($object, $current_container) if $more_objects;
} while ($more_objects);
IMHO, this doesn't really win you anything if the close_container() is encapsulated into a method - there's no major technical or code quality cost to calling it both inside and outside the loop. Actually, I'm strongly of the opinion that a convoluted workaround like I presented above is WORSE code quality wise than a straightforward:
my $current_container;
while (my $more_objects = more_objects(my $object = get_next_object())) {
if (can_not_pack_more($current_container)) { # false on undef
close_container($current_container);
}
$current_container = open_container_if_closed($current_container); # if defined
pack($object, $current_container);
}
close_container($current_container);
It seems like a bit over-complicated for the problem you are describing, but it sounds theoretically close to Petri Nets. check Petri Nets on wikipedia
A perl implementation can be found here
I hope this will help you,
Jerome Wagner
I don't think there's a name for this algorithm. For a straight-forward implementation you'll need two tests: one to detect a full box while in the processing loop and one after the loop to detect a partially full box. The "closing the box" logic can be made into a subroutine to avoid duplicating it. A functional approach could provide a way around that:
use List::MoreUtils qw(part natatime);
my ($good, $bad) = part { $_->is_rotten() } #oranges;
$_->dispose() foreach #$bad;
my $it = natatime 10, #$good;
while (my #batch = $it->()) {
my $box = Box->new();
$box->add(#batch);
$box->close();
$box->stack();
}
When looking at algorithms, the mainstream CS ones tend to handle very complex situations, or employ very complex approaches (look up NP-Complete for example). Moreover, the algorithms tend to focus on optimization. How can this system be more efficient? How can I use less steps in this production schedule? What is the most amount of foos that I can fit in my bar? etc.
An example of a complex approach in my opinion is quick sort because the recursion is pretty genius. I know it is standard, but I really like it. If you like algorithms, then check out the Simplex Algorithm - it has been very influential.
An example of a complex situation would be if you had oranges that go in, get sorted into 5 orange piles, then went to 5 different places to be peeled, then all came back with another path of oranges to total 10 orange piles, then each orange was individually sliced, and boxed in groups of exactly 2 pounds.
Back to your example. Your example is a simplified version of a flow network. Instead of having so many side paths and options, there is only one path with a capacity of one orange at a time. Of the flow network algorithms, the Ford-Fulkerson algorithm is probably the most influential.
So, you can probably fit one of these algorithms into the example posed, but it would be through a simplification process. Basically there is not enough complexity here to need any optimization. And there is no risk of running at an inefficient time complexity so there is no need to be running the "perfect approach".
The approach you detailed is going to be fine here, and the accepted answer above does a good job suggesting an actual functional solution to the defined problem. I just wanted to add my 2 cents with regards to algorithms.
I have a several series of data points that need to be graphed. For each graph, some points may need to be thrown out due to error. An example is the following:
The circled areas are errors in the data.
What I need is an algorithm to filter this data so that it eliminates the error by replacing the bad points with flat lines, like so:
Are there any algorithms out there that are especially good at detecting error points? Do you have any tips that could point me in the right direction?
EDIT: Error points are any points that don't look consistent with the data on both sides. There can be large jumps, as long as the data after the jump still looks consistent. If it's on the edge of the graph, large jumps should probably be considered error.
This is a problem that is hard to solve generically; your final solution will end up being very process-dependent, and unique to your situation.
That being said, you need to start by understanding your data: from one sample to the next, what kind of variation is possible? Using that, you can use previous data samples (and maybe future data samples) to decide if the current sample is bogus or not. Then, you'll end up with a filter that looks something like:
const int MaxQueueLength = 100; // adjust these two values as necessary
const double MaxProjectionError = 5;
List<double> FilterData(List<double> rawData)
{
List<double> toRet = new List<double>(rawData.Count);
Queue<double> history = new Queue<double>(MaxQueueLength); // adjust queue length as necessary
foreach (double raw_Sample in rawData)
{
while (history.Count > MaxQueueLength)
history.Dequeue();
double ProjectedSample = GuessNext(history, raw_Sample);
double CurrentSample = (Math.Abs(ProjectedSample - raw_Sample) > MaxProjectionError) ? ProjectedSample : raw_Sample;
toRet.Add(CurrentSample);
history.Enqueue(CurrentSample);
}
return toRet;
}
The magic, then, is coming up with your GuessNext function. Here, you'll be getting into stuff that is specific to your situation, and should take into account everything you know about the process that is gathering data. Are there physical limits to how quickly the input can change? Does your data have known bad values you can easily filter?
Here is a simple example for a GuessNext function that works off of the first derivative of your data (i.e. it assumes that your data is a roughly a straight line when you only look at a small section of it)
double lastSample = double.NaN;
double GuessNext(Queue<double> history, double nextSample)
{
lastSample = double.IsNaN(lastSample) ? nextSample : lastSample;
//ignore the history for simple first derivative. Assume that input will always approximate a straight line
double toRet = (nextSample + (nextSample - lastSample));
lastSample = nextSample;
return toRet;
}
If your data is particularly noisy, you may want to apply a smoothing filter to it before you pass it to GuessNext. You'll just have to spend some time with the algorithm to come up with something that makes sense for your data.
Your example data appears to be parametric in that each sample defines both a X and a Y value. You might be able to apply the above logic to each dimension independently, which would be appropriate if only one dimension is the one giving you bad numbers. This can be particularly successful in cases where one dimension is a timestamp, for instance, and the timestamp is occasionally bogus.
If removing the outliers by eye is not possible, try kriging (with error terms) as in http://www.ipf.tuwien.ac.at/cb/publications/pipeline.pdf . This seems to work quite well to automatically deal with occasional extreme noise. I know that French meteorologists use such an approach to remove outliers in their data (like a fire next to a temperature sensor or something kicking a wind sensor for instance).
Please note that it is a difficult problem in general. Any information about the errors is precious. Did someone kick the measuring device ? Then you cannot do much except removing the offending data by hand. Is your noise systematic ? You can do a lot of things then by making (reasonable) hypotheses about it.
I am working on a project that requires the parsing of log files. I am looking for a fast algorithm that would take groups messages like this:
The temperature at P1 is 35F.
The temperature at P1 is 40F.
The temperature at P3 is 35F.
Logger stopped.
Logger started.
The temperature at P1 is 40F.
and puts out something in the form of a printf():
"The temperature at P%d is %dF.", Int1, Int2"
{(1,35), (1, 40), (3, 35), (1,40)}
The algorithm needs to be generic enough to recognize almost any data load in message groups.
I tried searching for this kind of technology, but I don't even know the correct terms to search for.
I think you might be overlooking and missed fscanf() and sscanf(). Which are the opposite of fprintf() and sprintf().
Overview:
A naïve!! algorithm keeps track of the frequency of words in a per-column manner, where one can assume that each line can be separated into columns with a delimiter.
Example input:
The dog jumped over the moon
The cat jumped over the moon
The moon jumped over the moon
The car jumped over the moon
Frequencies:
Column 1: {The: 4}
Column 2: {car: 1, cat: 1, dog: 1, moon: 1}
Column 3: {jumped: 4}
Column 4: {over: 4}
Column 5: {the: 4}
Column 6: {moon: 4}
We could partition these frequency lists further by grouping based on the total number of fields, but in this simple and convenient example, we are only working with a fixed number of fields (6).
The next step is to iterate through lines which generated these frequency lists, so let's take the first example.
The: meets some hand-wavy criteria and the algorithm decides it must be static.
dog: doesn't appear to be static based on the rest of the frequency list, and thus it must be dynamic as opposed to static text. We loop through a few pre-defined regular expressions and come up with /[a-z]+/i.
over: same deal as #1; it's static, so leave as is.
the: same deal as #1; it's static, so leave as is.
moon: same deal as #1; it's static, so leave as is.
Thus, just from going over the first line we can put together the following regular expression:
/The ([a-z]+?) jumps over the moon/
Considerations:
Obviously one can choose to scan part or the whole document for the first pass, as long as one is confident the frequency lists will be a sufficient sampling of the entire data.
False positives may creep into the results, and it will be up to the filtering algorithm (hand-waving) to provide the best threshold between static and dynamic fields, or some human post-processing.
The overall idea is probably a good one, but the actual implementation will definitely weigh in on the speed and efficiency of this algorithm.
Thanks for all the great suggestions.
Chris, is right. I am looking for a generic solution for normalizing any kind of text. The solution of the problem boils down to dynmamically finding patterns in two or more similar strings.
Almost like predicting the next element in a set, based on the previous two:
1: Everest is 30000 feet high
2: K2 is 28000 feet high
=> What is the pattern?
=> Answer:
[name] is [number] feet high
Now the text file can have millions of lines and thousands of patterns. I would like to parse the files very, very fast, find the patterns and collect the data sets that are associated with each pattern.
I thought about creating some high level semantic hashes to represent the patterns in the message strings.
I would use a tokenizer and give each of the tokens types a specific "weight".
Then I would group the hashes and rate their similarity. Once the grouping is done I would collect the data sets.
I was hoping, that I didn't have to reinvent the wheel and could reuse something that is already out there.
Klaus
It depends on what you are trying to do, if your goal is to quickly generate sprintf() input, this works. If you are trying to parse data, maybe regular expressions would do too..
You're not going to find a tool that can simply take arbitrary input, guess what data you want from it, and produce the output you want. That sounds like strong AI to me.
Producing something like this, even just to recognize numbers, gets really hairy. For example is "123.456" one number or two? How about this "123,456"? Is "35F" a decimal number and an 'F' or is it the hex value 0x35F? You're going to have to build something that will parse in the way you need. You can do this with regular expressions, or you can do it with sscanf, or you can do it some other way, but you're going to have to write something custom.
However, with basic regular expressions, you can do this yourself. It won't be magic, but it's not that much work. Something like this will parse the lines you're interested in and consolidate them (Perl):
my #vals = ();
while (defined(my $line = <>))
{
if ($line =~ /The temperature at P(\d*) is (\d*)F./)
{
push(#vals, "($1,$2)");
}
}
print "The temperature at P%d is %dF. {";
for (my $i = 0; $i < #vals; $i++)
{
print $vals[$i];
if ($i < #vals - 1)
{
print ",";
}
}
print "}\n";
The output from this isL
The temperature at P%d is %dF. {(1,35),(1,40),(3,35),(1,40)}
You could do something similar for each type of line you need to parse. You could even read these regular expressions from a file, instead of custom coding each one.
I don't know of any specific tool to do that. What I did when I had a similar problem to solve was trying to guess regular expressions to match lines.
I then processed the files and displayed only the unmatched lines. If a line is unmatched, it means that the pattern is wrong and should be tweaked or another pattern should be added.
After around an hour of work, I succeeded in finding the ~20 patterns to match 10000+ lines.
In your case, you can first "guess" that one pattern is "The temperature at P[1-3] is [0-9]{2}F.". If you reprocess the file removing any matched line, it leaves "only":
Logger stopped.
Logger started.
Which you can then match with "Logger (.+).".
You can then refine the patterns and find new ones to match your whole log.
#John: I think that the question relates to an algorithm that actually recognises patterns in log files and automatically "guesses" appropriate format strings and data for it. The *scanf family can't do that on its own, it can only be of help once the patterns have been recognised in the first place.
#Derek Park: Well, even a strong AI couldn't be sure it had the right answer.
Perhaps some compression-like mechanism could be used:
Find large, frequent substrings
Find large, frequent substring patterns. (i.e. [pattern:1] [junk] [pattern:2])
Another item to consider might be to group lines by edit-distance. Grouping similar lines should split the problem into one-pattern-per-group chunks.
Actually, if you manage to write this, let the whole world know, I think a lot of us would like this tool!
#Anders
Well, even a strong AI couldn't be sure it had the right answer.
I was thinking that sufficiently strong AI could usually figure out the right answer from the context. e.g. Strong AI could recognize that "35F" in this context is a temperature and not a hex number. There are definitely cases where even strong AI would be unable to answer. Those are the same cases where a human would be unable to answer, though (assuming very strong AI).
Of course, it doesn't really matter, since we don't have strong AI. :)
http://www.logparser.com forwards to an IIS forum which seems fairly active. This is the official site for Gabriele Giuseppini's "Log Parser Toolkit". While I have never actually used this tool, I did pick up a cheap copy of the book from Amazon Marketplace - today a copy is as low as $16. Nothing beats a dead-tree-interface for just flipping through pages.
Glancing at this forum, I had not previously heard about the "New GUI tool for MS Log Parser, Log Parser Lizard" at http://www.lizardl.com/.
The key issue of course is the complexity of your GRAMMAR. To use any kind of log-parser as the term is commonly used, you need to know exactly what you're scanning for, you can write a BNF for it. Many years ago I took a course based on Aho-and-Ullman's "Dragon Book", and the thoroughly understood LALR technology can give you optimal speed, provided of course that you have that CFG.
On the other hand it does seem you're possibly reaching for something AI-like, which is a different order of complexity entirely.