Is there a name for this algorithm? - algorithm

Apologies for the non-descriptive question; if you can think of a better one, I'm all ears.
I'm writing some Perl to implement an algorithm and the code I have smells fishy. Since I don't have a CS background, I don't have a lot of knowledge of standard algorithms in my back pocket, but this seems like something that it might be.
Let me describe what I'm doing by way of metaphor:
You have a conveyor belt of oranges. The oranges pass you one by one. You also have an unlimited supply of flat-packed boxes.
For each orange, check it. If it is rotten, dispose of it.
If it is good, put it in a box. If you don't have a box, grab a new one and construct it.
If the box has 10 oranges in it, close it up and put it on a pallet. Do not construct a new one yet.
Repeat until you have no more oranges.
If you have a constructed box with some oranges in it, close it up and put it on a pallet.
So, we have an algorithm for processing items in a list, if they meet some criteria, they should be added to a structure which, when it meets some other criteria, should be 'closed out'. Also, once the list has been processed, if there's an 'open' structure, it should be 'closed out' as well.
Naively, I assume that the algorithm consists of a loop acting over the list, with a conditional to see if the list element belongs in the structure and a conditional to see if the structure needs to be 'closed'.
Outside the loop, there would be one more conditional to close any outstanding structures.
So, here are my questions:
Is this a description of a well-known algorithm? If so, does it have a name?
Is there an effective way to coalesce the 'closing out the box' activity into a single place, as opposed to once inside the loop and once outside of the loop?
I tagged this as 'Perl' because Perlish approaches are of interest, but I'd be interested to hear of any other languages that have neat solutions to this.

It's a nice fit with a functional approach - you're iterating over a stream of Oranges, testing, grouping and operating on them. In Scala, it would be something like:
val oranges: Stream[Orange] = ... // generate a stream of Oranges
oranges.filter(_.isNotRotten).grouped(10).foreach { o => (new Box).fillBox(o) }
(grouped does the right thing with the partial box at the end)
There are probably Perl equivalents.
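For instance, a rough analogue with List::MoreUtils (assuming hypothetical Orange/Box classes with is_rotten and fill_box methods):
use List::MoreUtils qw(natatime);

my $it = natatime 10, grep { !$_->is_rotten } @oranges;
while (my @batch = $it->()) {
    Box->new->fill_box(@batch);    # natatime, like grouped, also yields the final partial batch
}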

Is there an effective way to coalesce the 'closing out the box' activity into a single place,
as opposed to once inside the loop and once outside of the loop?
Yes. Simply add "... or there are no more oranges" to the "does the structure need to be closed" function. The easiest way of doing this is a do/while construct (technically speaking it's NOT a loop in Perl, though it looks like one):
my $current_container;
my $more_objects;
do {
    my $object = get_next_object();           # Easiest implementation returns undef if no more
    $more_objects = more_objects($object);    # Easiest to implement as "defined $object"
    if (!$more_objects || can_not_pack_more($current_container)) {
        close_container($current_container);  # assumed to be a no-op on undef
        $current_container = open_container() if $more_objects;
    }
    pack_object($object, $current_container) if $more_objects;    # renamed from "pack" to avoid Perl's builtin pack()
} while ($more_objects);
IMHO, this doesn't really win you anything if close_container() is encapsulated into a method - there's no major technical or code-quality cost to calling it both inside and outside the loop. Actually, I'm strongly of the opinion that a convoluted workaround like the one I presented above is WORSE, code-quality-wise, than a straightforward:
my $current_container;
while (my $more_objects = more_objects(my $object = get_next_object())) {
    if (can_not_pack_more($current_container)) {    # false on undef
        close_container($current_container);
    }
    $current_container = open_container_if_closed($current_container);    # if defined
    pack_object($object, $current_container);
}
close_container($current_container);

It seems a bit over-complicated for the problem you are describing, but it sounds theoretically close to Petri nets. Check Petri nets on Wikipedia.
A Perl implementation can be found here.
I hope this will help you,
Jerome Wagner

I don't think there's a name for this algorithm. For a straightforward implementation you'll need two tests: one to detect a full box while in the processing loop and one after the loop to detect a partially full box. The "closing the box" logic can be made into a subroutine to avoid duplicating it. A functional approach could provide a way around that:
use List::MoreUtils qw(part natatime);

my ($good, $bad) = part { $_->is_rotten() } @oranges;
$_->dispose() foreach @$bad;
my $it = natatime 10, @$good;
while (my @batch = $it->()) {
    my $box = Box->new();
    $box->add(@batch);
    $box->close();
    $box->stack();
}

When looking at algorithms, the mainstream CS ones tend to handle very complex situations, or employ very complex approaches (look up NP-complete, for example). Moreover, the algorithms tend to focus on optimization: How can this system be more efficient? How can I use fewer steps in this production schedule? What is the greatest number of foos I can fit in my bar? And so on.
An example of a complex approach, in my opinion, is quicksort, because the recursion is pretty genius. I know it is standard, but I really like it. If you like algorithms, then check out the Simplex Algorithm - it has been very influential.
An example of a complex situation would be if you had oranges that go in, get sorted into 5 orange piles, then went to 5 different places to be peeled, then all came back with another path of oranges to total 10 orange piles, then each orange was individually sliced, and boxed in groups of exactly 2 pounds.
Back to your example. Your example is a simplified version of a flow network. Instead of having so many side paths and options, there is only one path with a capacity of one orange at a time. Of the flow network algorithms, the Ford-Fulkerson algorithm is probably the most influential.
So, you could probably fit one of these algorithms onto the example posed, but it would be through a simplification process. Basically, there is not enough complexity here to need any optimization, and there is no risk of an inefficient time complexity, so there is no need for the "perfect" approach.
The approach you detailed is going to be fine here, and the accepted answer above does a good job suggesting an actual functional solution to the defined problem. I just wanted to add my 2 cents with regards to algorithms.

Related

What's the most efficient way of combining switch/if statements

This question doesn't address any programming language in particular but of course I'm happy to hear some examples.
Imagine a big number of files, let's say 5000, that have all kinds of letters and numbers in them. Then, there is a method that receives a user input that acts as an alias in order to display a given file. Without having the files sorted in a folder, the method(s) need to return the file name that is associated with the alias the user provided.
So let's say user input "gd322" stands for the file named "k4e23", the method would look like
if(input.equals("gd322")){
return "k4e23";
}
Now, imagine having 4 values in that method:
switch (input) {
    case gd322: return fw332;
    case g344d: return 5g4gh;
    case s3red: return 536fg;
    case h563d: return h425d;
} // switch on string, no break, no string indicators, ..., pls ignore the syntax, it's just pseudo
Keeping in mind we have 5000 entries, there are probably more than just 2 entries starting with g. Now, if the user input starts with 's', instead of wasting CPU cycles checking all the a's, b's, c's, ..., we could also make another switch for this, which then directs to the 'next' methods like this:
switch (input[0]) { // implying we could access strings like that
    case a: switchA(input);
    case b: switchB(input);
    // [...]
    case g: switchG(input);
    case s: switchS(input);
}
So the CPU doesn't have to check on all of them, but rather calls a method like this:
switchG(String input) {
    switch (input) {
        case gd322: return fw332;
        case g344d: return 5g4gh;
        // [...]
    }
}
Is there any field of computer science dealing with this? I don't know what to call it and therefore don't know how to search for it, but I think my thoughts make sense on a large scale. Please move the thread if it doesn't belong here, but I really want to see your thoughts on this.
EDIT: Don't quote me on that "5000"; I am not in the situation described above, and I wanted to discuss this purely theoretically. It could also be 3 entries or 300,000, maybe even fewer or more.
If you have 5000 options, you're probably better off hashing them than having hard-coded if/switch statements. In C++ you could also use std::map to pair a function pointer or other option-handling information with each possible option.
Interesting, but I don't think you can give a generic answer. It all depends on how the code is executed. Many compilers will have all kinds of optimizations, in the if and switch, but also in the way strings are compared.
That said, if you have actual (disk) files with those lists, then reading the file will probably take much longer than processing it, since disk I/O is very slow compared to memory access and CPU processing.
And if you have a list like that, you may want to build a hash table, or simply a sorted list/array in which you can perform a binary search. Sorting it also takes time, but if you have to do many lookups in the same list, it may be well worth the time.
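To illustrate the hash-table route, here is a minimal Perl sketch using the aliases from the question (in practice the pairs would be loaded from wherever the real mapping lives):
my %file_for = (
    gd322 => 'fw332',
    g344d => '5g4gh',
    s3red => '536fg',
    h563d => 'h425d',
);

my $input = 's3red';
my $file  = $file_for{$input}    # one average-O(1) lookup replaces the whole if/switch cascade
    // die "unknown alias: $input\n";
print "$file\n";                 # prints 536fg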
Is there any field of computer science dealing with this?
Yes, the science of efficient data structures. Well, isn't that what CS is all about? :-)
The algorithm you described resembles a trie. It wouldn't be statically encoded in the source code with switch statements; instead it would use dynamic lookups in a structure loaded from somewhere, but the idea is the same.
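A toy version of that idea in Perl, using nested hashes as trie nodes (the empty-string key marking "a value ends here" is just a convention chosen for this sketch):
my %trie;

sub trie_insert {
    my ($key, $value) = @_;
    my $node = \%trie;
    $node = $node->{$_} //= {} for split //, $key;    # walk/extend one level per character
    $node->{''} = $value;                             # '' marks the end of a stored key
}

sub trie_lookup {
    my ($key) = @_;
    my $node = \%trie;
    for my $ch (split //, $key) {
        $node = $node->{$ch} or return undef;         # dead end: key not present
    }
    return $node->{''};
}

trie_insert('gd322', 'k4e23');
print trie_lookup('gd322'), "\n";    # prints k4e23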
Yes, this problem has been known and solved for decades: hash functions.
Basically you have a set of values (here strings like "gd322", "g344d") and you want to know if some other value v is among them.
The idea is to put the strings in a big array, at an index calculated from their values by some function. Given a value v, you compute an index the same way, and check whether the value v is there or not. Much faster than checking the whole array.
Of course, there is a problem with different values falling in the same place: collisions. Some magic is needed then: perfect hash functions, whose coefficients are tweaked so values from the initial set don't cause any collisions.
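A toy illustration of the index-from-value mechanism (Perl's built-in hashes do all of this for you; this hand-rolled version only shows the idea and ignores collision handling entirely):
my $size = 101;    # table size; a prime, by convention
my @table;

sub slot {
    my ($v) = @_;
    my $h = 0;
    $h = ($h * 31 + ord $_) % $size for split //, $v;    # simple rolling hash
    return $h;
}

$table[ slot('gd322') ] = 'fw332';
print $table[ slot('gd322') ], "\n";    # prints fw332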

Formal approach for text processing

During a coding interview I was asked the following question:
Write a program which counts the number of words and lines in the input stream.
Assume you have a reader with method nextChar().
At first glance it looks simple. But then you realize that you need to handle a lot of states, like:
consecutive word/line delimiters
different end-of-word conditions - words separator or line separator or EOF
a new line which starts with word separators
etc
In the interview I came up with a sort of spaghetti code with many if-elses and flags.
But I think there should be a formal approach for this kind of problem, which allows you to guarantee that you have handled all possible cases, and which gives a structured solution.
I think it may be something from either automata theory or compiler theory (I've never taken a deep dive into either of these areas before).
So, if you recognize a certain type of problem in the problem above, or you know which theory covers problems like this, please let me know.
Finite state machines.
This is essentially a small lexing task. The solution will never be pretty but if you write your code in the following way you can be quite confident of correctness:
curState <- NONE
while (c <- getChar) {
    switch (curState) {
        case NONE:
            switch (c) {
                // ....
            }
            break;
        // .....
    }
}
You can also use a data structure to store the transition function (given a state and a character, what is the next state?) but for your case just writing the code is probably the best bet.
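For the concrete problem in the question, two states are enough. A minimal Perl sketch, where next_char() stands in for the interview's nextChar() reader and is assumed to return undef at end of input:
my ($words, $lines) = (0, 0);
my $state = 'BETWEEN';    # the other state is IN_WORD

while (defined(my $c = next_char())) {
    $lines++ if $c eq "\n";
    if ($c =~ /\s/) {
        $state = 'BETWEEN';    # any run of separators collapses into this state
    }
    elsif ($state eq 'BETWEEN') {
        $words++;              # first character of a new word
        $state = 'IN_WORD';
    }
}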
... don't forget about your text encoding! UTF-16, right? :)
You might want to look into implementations of the Unix wc utility:
Busybox
GNU
DragonFly BSD
Mac OS X (from FreeBSD)
These will be more polished and fully featured than what you would write in an interview, but it's nice to see regardless.

Scala performance question

In the article written by Daniel Korzekwa, he said that the performance of the following code:
list.map(e => e*2).filter(e => e>10)
is much worse than the iterative solution written using Java.
Can anyone explain why? And what is the best solution for such code in Scala (I hope it's not a Java iterative version which is Scala-fied)?
The reason that particular code is slow is because it's working on primitives but it's using generic operations, so the primitives have to be boxed. (This could be improved if List and its ancestors were specialized.) This will probably slow things down by a factor of 5 or so.
Also, algorithmically, those operations are somewhat expensive, because you make a whole list, and then make a whole new list throwing a few components of the intermediate list away. If you did it in one swoop, then you'd be better off. You could do something like:
list collect { case e if e * 2 > 10 => e * 2 }
but what if the calculation e*2 is really expensive? Then you could
(List[Int]() /: list)((ls, e) => { val x = e * 2; if (x > 10) x :: ls else ls })
except that this would appear backwards. (You could reverse it if need be, but that requires creating a new list, which again isn't ideal algorithmically.)
Of course, you have the same sort of algorithmic problems in Java if you're using a singly linked list: your new list will end up backwards, or you have to create it twice (first in reverse and then forwards), or you have to build it with (non-tail) recursion (which is easy in Scala, but inadvisable for this sort of thing in either language since you'll exhaust the stack), or you have to create a mutable list and then pretend afterwards that it's not mutable. (Which, incidentally, you can do in Scala also; see mutable.LinkedList.)
Basically it's traversing a list twice. Once for multiplying every element with two. And then another time to filter the results.
Think of it as one while loop producing a LinkedList with the intermediate results. And then another loop applying the filter to produce the final results.
This should be faster:
list.view.map(e => e * 2).filter(e => e > 10).force
The solution lies mostly with the JVM. Though Scala has a workaround in the shape of the @specialized annotation, that increases the size of any specialized class hugely, and it only solves half the problem; the other half is the creation of temporary objects.
The JVM actually does a good job optimizing a lot of it, or the performance would be even more terrible, but Java does not require the optimizations that Scala does, so the JVM does not provide them. I expect that to change to some extent with the introduction of SAM not-real-closures in Java.
But, in the end, it comes down to balancing the needs. The same while loop that Java and Scala do so much faster than Scala's function equivalent can be done faster yet in C. Yet, despite what the microbenchmarks tell us, people use Java.
The Scala approach is much more abstract and generic; therefore it is hard to optimize every single case.
I could imagine that the HotSpot JIT compiler might apply stream- and loop-fusion to the code in the future if it sees that the intermediate results are not used.
Additionally the Java code just does much more.
If you really just want to mutate a data structure in place, consider transform.
It looks a bit like map but doesn't create a new collection, e.g.:
val array = Array(1,2,3,4,5,6,7,8,9,10).transform(_ * 2)
// array is now WrappedArray(2, 4, 6, 8, 10, 12, 14, 16, 18, 20)
I really hope some additional in-place operations will be added in the future...
To avoid traversing the list twice, I think the for syntax is a nice option here:
val list2 = for(v <- list1; e = v * 2; if e > 10) yield e
Rex Kerr correctly states the major problem: operating on immutable lists, the stated piece of code creates intermediate lists in memory. Note that this is not necessarily slower than equivalent Java code; you just never use immutable data structures in Java.
Wilfried Springer has a nice, Scala idiomatic solution. Using view, no (manipulated) copies of the whole list are created.
Note that using view might not always be ideal. For example, if your first call is a filter that is expected to throw away most of the list, it might be worthwhile to create the shorter version explicitly and use view only after that, in order to improve memory locality for later iterations.
list.filter(e => e*2>10).map(e => e*2)
This attempt filters the list first, so the second traversal runs over fewer elements.

Tree Algorithm

I was thinking earlier today about an idea for a small game and stumbled upon how to implement it. The idea is that the player can make a series of moves that each cause a little effect, but if done in a specific sequence they cause a greater effect. So far so good, this I know how to do. Obviously, I had to make it more complicated (because we love to make it more complicated), so I thought that there could be more than one possible path for the sequence, each of which would cause a greater effect, albeit a different one. Also, part of some sequences could be the beginning of other sequences, or even whole sequences could be contained by other bigger sequences. Now I don't know for sure the best way to implement this. I had some ideas, though.
1) I could implement a circular n-linked list. But since the list of moves never ends, I fear it might cause a stack overflow ™. The idea is that every node would have n children and, upon receiving a command, it might lead you to one of its children or, if no child was available for such a command, lead you back to the beginning. Upon arrival at any child, a couple of functions would be executed, causing the small and big effects. This might, though, lead to a lot of duplicated nodes on the tree to cope with all the possible sequences ending on that specific move with different effects, which might be a pain to maintain, but I am not sure. I never tried something this complex in code, only theoretically. Does this algorithm exist and have a name? Is it a good idea?
2) I could implement a state machine. Then instead of wandering around a linked list, I'd have some giant nested switch that would call functions and update the machine state accordingly. Seems simpler to implement, but... well... doesn't seem fun... nor elegant. Giant switches always seem ugly to me, but would this work better?
3) Suggestions? I am good, but I am fairly inexperienced. The good thing about the coding field is that no matter how weird your problem is, someone has solved it in the past, but you must know where to look. Someone might have a better idea than those I had, and I really want to hear suggestions.
I'm not absolutely completely sure that I understand exactly what you're saying, but as an analogous situation, say someone's inputting an endless stream of numbers on the keyboard. '117' is a magic sequence, '468' is another one, '411799' is another (which contains the first one).
So if the user enters:
55468411799
you want to fire 'magic events' at the *s:
55468*4117*99*
or something like that, right? If that's analogous to the problem you're talking about, then what about something like this (Java-like pseudocode):
MagicSequence fireworks = new MagicSequence(new FireworksAction(), 1, 1, 7);
MagicSequence playMusic = new MagicSequence(new MusicAction(), 4, 6, 8);
MagicSequence fixUserADrink = new MagicSequence(new ManhattanAction(), 4, 1, 1, 7, 9, 9);
Collection<MagicSequence> sequences = ... all of the above ...;
while (true) {
    int num = readNumberFromUser();
    for (MagicSequence seq : sequences) {
        seq.handleNumber(num);
    }
}
where MagicSequence has something like:
Action action = ... populated from constructor ...;
int[] sequence = ... populated from constructor ...;
int position = 0;

public void handleNumber(int num) {
    if (num == sequence[position]) {
        // They've entered the next number in the sequence
        position++;
        if (position == sequence.length) {
            // They've got it all!
            action.fire();
            position = 0; // Or disable this Sequence from accepting more numbers if it's a once-off
        }
    } else {
        // Missed a number, start again! If num matches the first element it can
        // still begin a new run; a fully general matcher would fall back KMP-style.
        position = (num == sequence[0]) ? 1 : 0;
    }
}
You might want to implement a state machine anyway, but you don't have to hardcode the state transitions.
Try to make a graph of states, where a link from state A to state B means A can lead to B.
Then you can traverse the graph at runtime to track where the player goes.
Edit: You can define a graph node as:
- state-id
- list of links to other states, where every link defines:
  - state-id
  - precondition: a list of states that must be visited before going to this state
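A rough Perl sketch of that structure (all state and move names are made up, and preconditions are left out for brevity):
my %graph = (
    start  => { moves => { jab => 'combo1' } },
    combo1 => { moves => { jab => 'combo2', kick => 'start' },
                effect => 'small_spark' },
    combo2 => { moves => { uppercut => 'start' },
                effect => 'big_fireworks' },
);

my $state = 'start';
for my $move (qw(jab jab uppercut)) {
    $state = $graph{$state}{moves}{$move} // 'start';    # unknown move resets to start
    print "effect: $graph{$state}{effect}\n" if $graph{$state}{effect};
}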
What you're describing sounds very similar to the technology tree in a game like Civilization.
I don't know how the Civ authors built theirs, but I'd be inclined to use a multigraph to represent possible 'moves' - there will be some you can start at with no 'experience', and once you're in them, there will be multiple paths through to the end.
Draw-out what potential options you can have at each stage of the game, and then draw lines going from some options to others.
That should give you a start on implementation, as graphs are [relatively] easy concepts to implement and utilize.
Sounds like a neural network. You could create one and train it to recognize the patterns that cause the various effects you are looking for.
What you're describing sounds somewhat similar to a dependency graph or a word graph. You might look into those.
@Cowan, @Javier: Nice idea, mind if I add to it?
Let the MagicSequence objects listen to the incoming stream of user input, that is, notify them of the input (broadcast) and let each of them add the input to their internal input stream. This stream is cleared when the input is not the expected next input in the pattern that would have the MagicSequence fire its action. As soon as the pattern is completed, fire the action and clear the internal input stream.
Optimize this by only feeding input to the MagicSequences that are waiting for it. This could be done in two ways:
You have an object that lets all MagicSequences connect with events that correspond with numbers in their patterns. MagicSequence(1,1,7) would add itself to got1 and got7, for example:
UserInput.got1 += MagicSequence[i].SendMeInput;
You could optimize this such that after each input MagicSequences deregister from invalid events and register with valid ones.
Create a small state machine for each effect that you'd want. At each user action, 'broadcast' it to all state machines. Most of them won't care, but some will advance, or maybe go backwards. When one of them reaches its goal, produce the desired effect.
To keep the code neat, don't hardcode the state machines; instead build a simple data structure that encodes the state graph: each state is a node with a list of interesting events, each one pointing to the next state's node. Each machine's state is simply a reference to the appropriate state node.
Edit: It seems Cowan's advice is equivalent to this, but he optimises his state machines to express only simple sequences. That seems enough for your specific problem, but more complex conditions could need a more general solution.

Looking for algorithm that reverses the sprintf() function output

I am working on a project that requires the parsing of log files. I am looking for a fast algorithm that would take groups of messages like this:
The temperature at P1 is 35F.
The temperature at P1 is 40F.
The temperature at P3 is 35F.
Logger stopped.
Logger started.
The temperature at P1 is 40F.
and puts out something in the form of a printf():
"The temperature at P%d is %dF.", Int1, Int2"
{(1,35), (1, 40), (3, 35), (1,40)}
The algorithm needs to be generic enough to recognize almost any data load in message groups.
I tried searching for this kind of technology, but I don't even know the correct terms to search for.
I think you might have overlooked fscanf() and sscanf(), which are the opposites of fprintf() and sprintf().
Overview:
A naive algorithm keeps track of the frequency of words on a per-column basis, where one can assume that each line can be separated into columns by a delimiter.
Example input:
The dog jumped over the moon
The cat jumped over the moon
The moon jumped over the moon
The car jumped over the moon
Frequencies:
Column 1: {The: 4}
Column 2: {car: 1, cat: 1, dog: 1, moon: 1}
Column 3: {jumped: 4}
Column 4: {over: 4}
Column 5: {the: 4}
Column 6: {moon: 4}
We could partition these frequency lists further by grouping based on the total number of fields, but in this simple and convenient example, we are only working with a fixed number of fields (6).
The next step is to iterate through lines which generated these frequency lists, so let's take the first example.
The: meets some hand-wavy criteria and the algorithm decides it must be static.
dog: doesn't appear to be static based on the rest of the frequency list, and thus it must be dynamic as opposed to static text. We loop through a few pre-defined regular expressions and come up with /[a-z]+/i.
over: same deal as #1; it's static, so leave as is.
the: same deal as #1; it's static, so leave as is.
moon: same deal as #1; it's static, so leave as is.
Thus, just from going over the first line we can put together the following regular expression:
/The ([a-z]+?) jumped over the moon/
Considerations:
Obviously one can choose to scan part or the whole document for the first pass, as long as one is confident the frequency lists will be a sufficient sampling of the entire data.
False positives may creep into the results, and it will be up to the filtering algorithm (hand-waving) to provide the best threshold between static and dynamic fields, or some human post-processing.
The overall idea is probably a good one, but the actual implementation will definitely weigh in on the speed and efficiency of this algorithm.
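A minimal Perl sketch of the per-column counting described above, assuming whitespace-delimited columns and the input already read into @lines:
my @freq;    # $freq[$col]{$word} = count

for my $line (@lines) {
    my @cols = split ' ', $line;
    $freq[$_]{ $cols[$_] }++ for 0 .. $#cols;
}

for my $col (0 .. $#freq) {
    my $distinct = keys %{ $freq[$col] };    # few distinct words suggests a static column
    printf "column %d: %d distinct word(s)\n", $col + 1, $distinct;
}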
Thanks for all the great suggestions.
Chris is right. I am looking for a generic solution for normalizing any kind of text. The solution to the problem boils down to dynamically finding patterns in two or more similar strings.
Almost like predicting the next element in a set, based on the previous two:
1: Everest is 30000 feet high
2: K2 is 28000 feet high
=> What is the pattern?
=> Answer:
[name] is [number] feet high
Now the text file can have millions of lines and thousands of patterns. I would like to parse the files very, very fast, find the patterns and collect the data sets that are associated with each pattern.
I thought about creating some high level semantic hashes to represent the patterns in the message strings.
I would use a tokenizer and give each of the token types a specific "weight".
Then I would group the hashes and rate their similarity. Once the grouping is done I would collect the data sets.
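One cheap approximation of such a signature, as a sketch (this tokenizer only knows about numbers, which is clearly too crude for the Everest/K2 example, where the names vary as well):
my %groups;
for my $line (@lines) {
    (my $sig = $line) =~ s/\d+(?:\.\d+)?/<NUM>/g;    # replace numbers with a type token
    push @{ $groups{$sig} }, $line;                  # group lines by signature
}
# Each key of %groups is now a candidate pattern, e.g.
# "The temperature at P<NUM> is <NUM>F."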
I was hoping that I wouldn't have to reinvent the wheel and could reuse something that is already out there.
Klaus
It depends on what you are trying to do. If your goal is to quickly generate sprintf() input, this works. If you are trying to parse data, maybe regular expressions would do too.
You're not going to find a tool that can simply take arbitrary input, guess what data you want from it, and produce the output you want. That sounds like strong AI to me.
Producing something like this, even just to recognize numbers, gets really hairy. For example, is "123.456" one number or two? How about "123,456"? Is "35F" a decimal number and an 'F', or is it the hex value 0x35F? You're going to have to build something that will parse in the way you need. You can do this with regular expressions, or you can do it with sscanf, or you can do it some other way, but you're going to have to write something custom.
However, with basic regular expressions, you can do this yourself. It won't be magic, but it's not that much work. Something like this will parse the lines you're interested in and consolidate them (Perl):
my @vals = ();
while (defined(my $line = <>))
{
    if ($line =~ /The temperature at P(\d+) is (\d+)F\./)
    {
        push(@vals, "($1,$2)");
    }
}
print "The temperature at P%d is %dF. {";
for (my $i = 0; $i < @vals; $i++)
{
    print $vals[$i];
    if ($i < @vals - 1)
    {
        print ",";
    }
}
print "}\n";
The output from this is:
The temperature at P%d is %dF. {(1,35),(1,40),(3,35),(1,40)}
You could do something similar for each type of line you need to parse. You could even read these regular expressions from a file, instead of custom coding each one.
I don't know of any specific tool to do that. What I did when I had a similar problem to solve was trying to guess regular expressions to match lines.
I then processed the files and displayed only the unmatched lines. If a line is unmatched, it means that the pattern is wrong and should be tweaked or another pattern should be added.
After around an hour of work, I succeeded in finding the ~20 patterns to match 10000+ lines.
In your case, you can first "guess" that one pattern is "The temperature at P[1-3] is [0-9]{2}F.". If you reprocess the file removing any matched line, it leaves "only":
Logger stopped.
Logger started.
Which you can then match with "Logger (.+).".
You can then refine the patterns and find new ones to match your whole log.
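The matching loop itself is trivial; a sketch of the "show me what's left" pass, using the two patterns guessed above:
my @patterns = (
    qr/^The temperature at P[1-3] is [0-9]{2}F\.$/,
    qr/^Logger (.+)\.$/,
);

for my $line (@lines) {
    chomp(my $copy = $line);
    print "unmatched: $copy\n" unless grep { $copy =~ $_ } @patterns;
}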
@John: I think that the question relates to an algorithm that actually recognises patterns in log files and automatically "guesses" appropriate format strings and data for it. The *scanf family can't do that on its own; it can only be of help once the patterns have been recognised in the first place.
@Derek Park: Well, even a strong AI couldn't be sure it had the right answer.
Perhaps some compression-like mechanism could be used:
Find large, frequent substrings
Find large, frequent substring patterns (e.g. [pattern:1] [junk] [pattern:2]).
Another item to consider might be to group lines by edit-distance. Grouping similar lines should split the problem into one-pattern-per-group chunks.
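For the edit-distance grouping, a sketch using the CPAN module Text::Levenshtein (the threshold of 10 is an arbitrary guess and would need tuning per log format):
use Text::Levenshtein qw(distance);

my @groups;    # each group is [representative, members...]
LINE: for my $line (@lines) {
    for my $group (@groups) {
        if (distance($group->[0], $line) < 10) {
            push @$group, $line;    # close enough to this group's representative
            next LINE;
        }
    }
    push @groups, [$line];          # no close group found: start a new one
}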
Actually, if you manage to write this, let the whole world know, I think a lot of us would like this tool!
@Anders
Well, even a strong AI couldn't be sure it had the right answer.
I was thinking that sufficiently strong AI could usually figure out the right answer from the context. e.g. Strong AI could recognize that "35F" in this context is a temperature and not a hex number. There are definitely cases where even strong AI would be unable to answer. Those are the same cases where a human would be unable to answer, though (assuming very strong AI).
Of course, it doesn't really matter, since we don't have strong AI. :)
http://www.logparser.com forwards to an IIS forum which seems fairly active. This is the official site for Gabriele Giuseppini's "Log Parser Toolkit". While I have never actually used this tool, I did pick up a cheap copy of the book from Amazon Marketplace - today a copy is as low as $16. Nothing beats a dead-tree-interface for just flipping through pages.
Glancing at this forum, I had not previously heard about the "New GUI tool for MS Log Parser, Log Parser Lizard" at http://www.lizardl.com/.
The key issue of course is the complexity of your GRAMMAR. To use any kind of log parser, as the term is commonly used, you need to know exactly what you're scanning for: you should be able to write a BNF for it. Many years ago I took a course based on Aho and Ullman's "Dragon Book", and the thoroughly understood LALR technology can give you optimal speed, provided of course that you have that CFG.
On the other hand it does seem you're possibly reaching for something AI-like, which is a different order of complexity entirely.
