How do programmers test their algorithm in TopCoder or other competitions?

Good programmers who solve problems of moderate to high difficulty in competitions such as TopCoder or ACM ICPC have to ensure the correctness of their algorithm before submission.
Although they are provided with some sample test cases, how does passing those guarantee that the program will behave correctly? They can write some test cases of their own, but it won't always be possible to know the correct answer through manual calculation. How do they do it?
Update: It seems it is not quite possible to analyze and guarantee the outcome of an algorithm given the tight constraints of a competitive environment. However, if there are any common manual habits adopted while solving such problems, those should be enough to answer the question. Something like best practices.

In competitions, the top programmers have enough experience to read the question, and think of some test cases that should catch most of the possibilities for input.
This usually catches most of the bugs, but it is NOT 100% safe.
However, in real-life critical applications (critical systems on airplanes or in nuclear reactors, for example) there are methods to PROVE that a piece of code does what it is supposed to do.
This is the field of formal verification, which is way too complex and time-consuming to be done during a contest, but it is used for some systems because mistakes cannot be tolerated.
Some additional information:
Formal verification basically consists of 2 parts:
Manual verification - here we use proof systems such as Hoare logic and manually prove that the program does what we want it to do (a textbook example follows below).
Automatic model checking - modeling the problem as a state machine and using model-checking tools to verify that the module does what it is supposed to do (or never does something "bad").
Specifying "what it should do" is usually done with temporal logic.
This is often used to verify the correctness of hardware models as well. For example, Intel uses it to ensure they won't get the floating-point bug again.
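As a concrete textbook illustration of the manual style: a Hoare triple {P} S {Q} asserts that if precondition P holds before statement S runs, then postcondition Q holds afterwards. For instance,
{ x = a }  x := x + 1  { x = a + 1 }
follows directly from Hoare's assignment axiom, and a whole program is verified by chaining and weakening such triples.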

Picture this: imagine you are a top programmer. That means you know a bunch of algorithms and wouldn't think twice while implementing them. You know how to modify an already known algorithm to suit the problem's needs. You are strong at estimating time and space complexity, and you expect that in the worst case your tailored algorithm will run within the time and memory constraints.
At this level you simply think, use a scratchpad for about five to ten minutes, and have a super clear algorithm before you start to code. Once you finish coding, you hit compile and there is usually no compilation error, because the code is so intuitive to you.
Then, based on the algorithm and data structures used, you expect that there might be one of the following issues:
a corner case
an overflow problem
A corner case is where you have coded for the general case, but for, say, N=1 the answer differs from the general pattern, so you write it as a special case.
An overflow is when intermediate values or results overflow a data type's limits.
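To make the overflow check concrete, here is a minimal sketch (not from the original answer) that uses Python, where integers are unbounded, to test whether the intermediate values of a solution would exceed a 32-bit type as they silently would in C++ or Java:

```python
# Minimal sketch: would the intermediate values of a solution overflow
# a signed 32-bit int? (Python ints are unbounded, so we can just check.)
INT32_MAX = 2**31 - 1

def triangular(n):
    # The intermediate product n * (n + 1) is the risky part: it blows
    # past 32 bits long before the final result "looks" suspicious.
    return n * (n + 1) // 2

for n in (10, 46_341, 100_000):
    intermediate = n * (n + 1)
    fits = intermediate <= INT32_MAX
    print(f"n={n}: n*(n+1)={intermediate}, fits in int32: {fits}")
```

For n = 46341 the check already fails, which tells you to switch the intermediate computation to a 64-bit type in your actual submission.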
You make note of any problems that arise at this point and use this data during the Challenge phase (as in TopCoder).
Once you have checked against these two, you hit Submit.

There's a time element to TopCoder, so it's not possible to test every combination within that constraint. They probably do the best they can and rely on experience for the rest, just as one does in real life. I don't know that it's ever possible to guarantee that a significant piece of code is error-free forever.

Related

Competitive Programming: Generating test cases and validating program correctness

I have been doing sport programming for a while and am still improving day by day. One thing I have always wondered about is that it would be really nice if I could automate the test-case generation and cross-validation of my program. It would definitely be a brute-force approach, as some test cases would be algorithm-specific.
A Google search gives me a nice link on Quora: How do programming contest problem setters make test cases? and the popular testlib used by problem setters.
But isn't this a chicken-egg problem?
Assume I generated 1 million input test cases, but what would I check them against? How will I generate the outputs? I am, after all, still in the process of validating the program... If my script generates the correct outputs as well, then what's the point of writing the program in the first place? I could submit the script itself. Also, it's not possible to write 1 million outputs for generated test cases manually. Can anyone please clear up this confusion?
I hope I have described the problem clearly.
It's common to generate the answer with a slow but obviously correct solution (such as an exhaustive search). It can't be used as the main solution, as it's too slow for large test cases, but you can use it to check the output of your fast (but possibly incorrect) program.
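A minimal sketch of that workflow, assuming a hypothetical maximum-subarray problem where fast_solve is the real submission and brute_solve is the slow but obviously correct reference (both names are made up for illustration):

```python
import random

def brute_solve(xs):
    # Slow but obviously correct: try every contiguous subarray, O(n^2).
    return max(sum(xs[i:j]) for i in range(len(xs))
                            for j in range(i + 1, len(xs) + 1))

def fast_solve(xs):
    # Fast candidate to be validated (Kadane's algorithm, O(n)).
    best = cur = xs[0]
    for x in xs[1:]:
        cur = max(x, cur + x)
        best = max(best, cur)
    return best

for test in range(10_000):
    # Small random inputs: a failing case stays small and easy to debug.
    xs = [random.randint(-10, 10) for _ in range(random.randint(1, 8))]
    if fast_solve(xs) != brute_solve(xs):
        print("MISMATCH on input:", xs)
        break
else:
    print("all random tests passed")
```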
Well, the thing is, it is not as broad as you think. Test generation in competitive programming is guided by the problem's algorithm and its correctness proof.
So when you think there are millions of test cases, if you analyze the different situations the program can be in, you are likely to cover all of them. Maybe in a certain algorithm you sometimes process the even-index elements of an array and sometimes the odd-index elements. What do you do? Divide it into two cases, even and odd, and consider the smallest case for each. This way you are essentially visiting every control-flow path of the program.
In competitive programming, since we first determine the algorithm, then decide on proper input sizes, and only then design the test cases and validation, it is often easy to think of the corner points: a test case with 1000000 elements, or input of 0 or 1, and so on.
Another approach: most of the time we write a brute-force solution, much slower than the real one. Then we generate random medium-size test cases, run them against the slow program, and check the fast solution with our checker, etc.
Correctness is also guided by mathematical proof (heuristics, induction, the box/pigeonhole principle, number theory, etc.). That way we are sure about the correctness of the solution.
I faced the same issue earlier this year and saw some of my colleagues also figuring out a way to deal with this. There are times when I just can't think of any more test cases, and that's when I decided to make a test-case generator tool of my own. It's free and open source, so anyone can use it.
You can easily generate a lot of test cases using this tool and validate the results against the output of the correct but slower approach (in terms of time and space complexity). You can either run the two programs in parallel and check their outputs, or write a simple script to compare the output of both programs (the slower but correct one and the better but unsure one) to validate.
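A minimal sketch of such a comparison script, assuming two compiled programs ./slow (the correct reference) and ./fast (the candidate) that read a test case on stdin and print the answer on stdout; the input format and program names are made up for illustration:

```python
import random
import subprocess

def gen_case():
    # Hypothetical input format: n on one line, then n random integers.
    n = random.randint(1, 100)
    nums = " ".join(str(random.randint(-1000, 1000)) for _ in range(n))
    return f"{n}\n{nums}\n"

def run(cmd, case):
    # Feed the generated case to a program and capture its answer.
    return subprocess.run([cmd], input=case,
                          capture_output=True, text=True).stdout.strip()

for i in range(1000):
    case = gen_case()
    slow_out = run("./slow", case)   # correct but slow reference
    fast_out = run("./fast", case)   # candidate under test
    if slow_out != fast_out:
        print(f"Mismatch on test {i}:\n{case}slow: {slow_out}\nfast: {fast_out}")
        break
else:
    print("all tests agree")
```

Because the generator keeps the cases small, any mismatch it finds is usually easy to debug by hand.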
I believe good coders won’t be needing it anyway, but for the middle level (div2, div3) coders and newbies, it can prove to be a lot helpful.
You can access it on GitHub: Test Case Generator.
Both the Python source code and .exe files are present there with instructions.
If you want to make some changes of your own, you can work with the Python file.
If you just want it to generate some test cases directly, you should prefer the .exe file (inside the zip).
It'll help a lot if you're beginning your competitive programming journey.
Also, any suggestions or improvements are always welcome. Additionally, you can contribute to this project by adding a feature you think it is lacking, or by adding new test-case formats yourself or requesting them.

How do you mitigate proposal-number overflow attacks in Byzantine Paxos?

I've been doing a lot of research into Paxos recently, and one thing I've always wondered about isn't addressed in anything I've read, which means I have to ask.
Paxos includes an increasing proposal number (and possibly also a separate round number, depending on who wrote the paper you're reading). And of course, two would-be leaders can get into duels where each tries to out-increment the other in a vicious cycle. But as I'm working in a Byzantine, P2P environment, it makes me wonder what to do about proposers that would attempt to set the proposal number extremely high - for example, to the maximum 32-bit or 64-bit value.
How should a language-agnostic, platform-agnostic Paxos-based protocol deal with integer maximums for proposal number and/or round number? Especially intentional/malicious cases, which make the modular-arithmetic approach of overflowing back to 0 a bit unattractive?
From what I've read, I think this is still an open question that isn't addressed in literature.
Byzantine Proposer Fast Paxos addresses denial of service, but only of the sort that would delay message sending through attacks not related to flooding with incrementing (proposal) counters.
Having said that, integer overflow is probably the least of your problems. Instead of thinking about integer overflow, you might want to consider membership attacks first (via DoS). Learning about membership after consensus from several nodes may be a viable strategy, but probably still vulnerable to Sybil attacks at some level.
Another strategy may be to incorporate some proof-of-work system for proposals to limit the flood of requests. However, it's difficult to know what metric to balance the work against (in Bitcoin, for example, the cost of mining a block is balanced against the free currency you earn for it). It really depends on what type of system you're trying to build. You should consider the value of information in your system, then create a proof-of-work system that requires slightly more cost to circumvent.
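A minimal sketch of what a hash-based proof-of-work gate on proposals could look like; the difficulty, message format, and names here are assumptions made for illustration, not part of any Paxos paper:

```python
import hashlib
import itertools

DIFFICULTY = 16  # required leading zero bits; tune to your threat model

def pow_ok(proposal: bytes, nonce: int) -> bool:
    # Cheap to verify: one hash per proposal received.
    digest = hashlib.sha256(proposal + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big") >> (256 - DIFFICULTY) == 0

def mine(proposal: bytes) -> int:
    # Expensive to produce: the proposer pays this cost for every
    # proposal, which throttles counter-flooding attacks.
    for nonce in itertools.count():
        if pow_ok(proposal, nonce):
            return nonce

proposal = b"ballot:42|value:foo"   # made-up message format
nonce = mine(proposal)
assert pow_ok(proposal, nonce)      # acceptors verify before processing
print("valid nonce found:", nonce)
```

Verification is a single hash, so acceptors can cheaply drop proposals whose sender didn't pay the mining cost.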
However, once you have the ability to slow down a proposal counter, you still need to worry about integer maximums in any system with a high number of (valid) operations. You should have a strategy for number wrapping, or a multiple-precision scheme in place, and you should be able to clearly determine how many years or decades your network can run without blowing out a fixed-precision counter. If you can determine that your system will run for 100 years (or whatever) without blowing out your fixed-precision counter, even with malicious entities, then you can choose to simplify things.
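One well-known wrapping strategy is serial number arithmetic in the style of RFC 1982 (used for DNS serial numbers, and similar in spirit to TCP sequence-number comparison); a minimal sketch of a wraparound-tolerant ordering test for proposal numbers:

```python
BITS = 32
HALF = 1 << (BITS - 1)
MASK = (1 << BITS) - 1

def later(a: int, b: int) -> bool:
    # True if proposal number a comes "after" b, tolerating wraparound,
    # provided the two numbers are within 2**(BITS-1) of each other.
    d = (a - b) & MASK
    return d != 0 and d < HALF

print(later(5, 3))          # True: the ordinary case
print(later(2, MASK - 1))   # True: 2 is "after" 2**32 - 2 across the wrap
print(later(3, 5))          # False
```

Note that this only makes wrapping well-defined; by itself it does not stop a malicious proposer from jumping far ahead, which is why it has to be combined with the rate-limiting ideas above.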
On another (important) note, the system model used in most papers doesn't reflect everything that makes a real-life implementation practical (Raft is a nice exception to this). If anything, some authors are guilty of creating a system model that is designed to avoid a hard problem they haven't found an answer to. So, if someone says that X will solve everything, please be aware that they only mean it solves everything in the very specific system model that they defined. On the other side of this, you should consider that any statement of the form "Y is impossible" is just as closely tied to its system model. A nice example of this concept is the completely asynchronous message passing of the Ben-Or consensus algorithm, which uses nondeterminism in the system model's state machine to avoid the limits of the FLP impossibility result (which specifies that consensus requires partially asynchronous message passing when the system model's state machine is deterministic).
So, you should continue to consider the "impossible" after you read a proof that says it can't be done. Nancy Lynch did a nice writeup on this concept.
I guess what I'm really saying is that a good solution to your question doesn't really exist yet. If you figure it out, please publish it (or let me know if you find an existing paper).

Comparing algorithmic performance to old methods

I have written a new algorithm for something. Now I need to compare it with existing methods, some of which are about 10 years old.
The idea I had is to look at benchmarks of different processors over the years in order to establish how much faster my processor (i7-920) is than an average processor from 2003. Then I would simply divide the old methods' execution times by the speedup factor and use those numbers to compare with my own algorithm.
Has something like this been done, so that I don't redo existing work?
Can such a comparison be done some other way?
Are there some scientific papers written about such comparisons which I can reference?
I don't know which of these are possible for you, but here's a list of options I can think of:
1. Run their implementation side-by-side with yours on your machine. This is the best option (a minimal timing harness is sketched after this list).
2. Rewrite their implementations and do (1). You should preferably compare your rewrite against their tests to ensure you get vaguely similar results.
3. Find a library that implements their algorithm (or multiple libraries) and do (1). I suggest multiple libraries, if possible, since a single one may not have implemented the algorithm efficiently. You may also want to compare these against their tests.
4. Compare the algorithms mathematically. This may be difficult, but it's not impossible.
5. Do what you presented.
(a) I would not recommend this, as there are determining factors in your computer other than processor speed that affect the speed of an algorithm. Getting an equation that perfectly balances these will likely be very difficult.
(b) There is a massive difference between top- and bottom-of-the-line computers, so using the average is not a particularly good idea. If the author didn't provide details regarding this, I'm afraid your benchmark is not likely to be too accurate.
6. Go out and buy a machine of similar specs to the one used by the original test to benchmark on. A 10-year-old machine should be pretty cheap, if you can find one. Also, see (5.b).
7. Contact the author to allow for any of the other options. Papers often provide contact details of the authors, or you should be able to find them elsewhere if they have any sort of online presence and you're half-decent at using Google.
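For option (1), a minimal side-by-side timing harness might look like the sketch below; the two algorithm bodies are placeholders to be replaced with the real implementations:

```python
import random
import time

def old_algorithm(xs):
    return sorted(xs)            # placeholder for the existing method

def new_algorithm(xs):
    return sorted(xs)            # placeholder for your method

def bench(fn, xs, repeats=5):
    best = float("inf")
    for _ in range(repeats):
        data = list(xs)          # fresh copy so runs don't interfere
        t0 = time.perf_counter()
        fn(data)
        best = min(best, time.perf_counter() - t0)
    return best                  # best-of-N damps scheduler noise

for n in (10_000, 100_000, 1_000_000):
    xs = [random.random() for _ in range(n)]
    print(f"n={n}: old={bench(old_algorithm, xs):.4f}s "
          f"new={bench(new_algorithm, xs):.4f}s")
```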
If I were reviewing your results, I would be annoyed if you attempted to demonstrate less than an order of magnitude speedup this way. There are a lot of variables determining algorithm performance, and I would be skeptical that a generic benchmark could capture the right ones. My gold standard is old and new algorithms implemented by the same programmer, with similar effort made to optimize, running on the same hardware. Using the previous authors' implementation instead of making a new one is commonplace in the experimental algorithms literature, but using different hardware isn't.
Algorithmic performance is usually measured in big-O terms, for which it is better to count basic operations, like comparisons, and do it for a range of input sizes.
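A minimal sketch of that approach, counting comparisons for two textbook sorts across a range of input sizes instead of timing them:

```python
import random

def insertion_sort_comparisons(xs):
    # Sorts a copy of xs; returns the number of element comparisons made.
    xs, count = list(xs), 0
    for i in range(1, len(xs)):
        j = i
        while j > 0:
            count += 1
            if xs[j - 1] <= xs[j]:
                break
            xs[j - 1], xs[j] = xs[j], xs[j - 1]
            j -= 1
    return count

def merge_sort_comparisons(xs):
    # Returns (sorted list, number of element comparisons made).
    if len(xs) <= 1:
        return list(xs), 0
    mid = len(xs) // 2
    left, c1 = merge_sort_comparisons(xs[:mid])
    right, c2 = merge_sort_comparisons(xs[mid:])
    merged, i, j, count = [], 0, 0, c1 + c2
    while i < len(left) and j < len(right):
        count += 1
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    merged.extend(left[i:]); merged.extend(right[j:])
    return merged, count

for n in (100, 1_000, 10_000):
    xs = [random.random() for _ in range(n)]
    print(f"n={n}: insertion={insertion_sort_comparisons(xs)} "
          f"merge={merge_sort_comparisons(xs)[1]}")
```

The counts grow roughly as n^2 and n log n respectively, independent of the machine they run on, which is exactly the point.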
If you must measure overall time, at least eliminate other sources of difference.
As @larsmans said, do it on the same processor.
Also, if there is existing work, there's no harm in repeating it.
Generally, in science, that's a good thing.
You should attempt to reduce the number of differing factors between the two runs. I think run-timing the two algorithms side by side on the same machine and/or comparing their big-O costs are both equally valid and important. You should also attempt to use up-to-date libraries and other external functions; using outdated ones may also skew the timing results.

Logic for software verification

I'm looking at the requirements for automated software verification, i.e. a program that takes in code (ordinary procedural code written in languages like C and Java), generates a bunch of theorems saying that each loop must eventually halt, no assertion will be violated, there will never be a dereference of a null pointer, etc., then passes them to a theorem prover to prove they are actually true (or else finds a counterexample indicating a bug in the code).
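As a concrete, illustrative example: for a C fragment like y = 1 / *p; the generated theorems would include valid(p) (the dereference is safe) and *p != 0 (the division cannot trap). Both are plain first-order formulas over the program state, to be discharged from whatever preconditions are known at that point.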
The question is what kind of logic to use. The two major positions seem to be:
1. First-order logic is just fine.
2. First-order logic isn't expressive enough; you need higher-order logic.
The problem is, there seems to be a lot of support for both positions. So which one is right? If it's the second one, are there any available examples of things you want to do that verifiers based on first-order logic have trouble with?
You can do everything you need in FOL, but it's a lot of extra work - a LOT! Most existing systems were developed by academics or people without a lot of time, so they are tempted to take shortcuts to save time and effort, and thus are attracted to HOLs, functional languages, etc. However, if you want to build a system that is to be used by hundreds of thousands of people, rather than merely hundreds, we believe that FOL is the way to go because it is far more accessible to a wider audience. There's just no substitute for doing the work; we've been at this for 25 years now! Please take a look at our project (http://www.manmademinions.com).
Regards, Aaron.
In my practical experience, it seems to be "1. First-order logic is just fine". For examples of complete specifications for various functions written entirely in a specification language based on first-order logic, see for instance ACSL by Example or this case study.
First-order logic has automated provers (not proof assistants) that have been refined over the years to handle the kinds of properties that come from program verification well. Notable automated provers for these uses are, for instance, Simplify, Z3, and Alt-Ergo. If these provers fail and there is no obvious lemma/assertion you can add to help them, you still have the recourse of starting up a proof assistant for the difficult proof obligations. If you use HOL, on the other hand, you cannot use Simplify, Z3, or Alt-Ergo at all, and while I have heard of automated provers for higher-order logic, I have never heard them praised for their efficiency on properties that come from programs.
We've found that FOL is fine for most verification conditions, but higher-order logic is invaluable for a small number, for example for proving properties about the summation of the elements in a collection. So our theorem prover (used in Perfect Developer and Escher C Verifier) is basically first-order, but with the ability to do some higher-order reasoning as well.

Performance anti-patterns

I am currently working for a client who is petrified of changing lousy, un-testable and un-maintainable code because of "performance reasons". It is clear that many misconceptions are running rife, and that the reasons are not understood but merely followed with blind faith.
One such anti-pattern I have come across is the need to mark as many classes as possible as sealed internal...
*RE-Edit: I see marking everything as sealed internal (in C#) as a premature optimisation.*
I am wondering what are some of the other performance anti-patterns people may be aware of or come across?
The biggest performance anti-pattern I have come across is: not measuring performance before and after the changes.
Collecting performance data will show if a certain technique was successful or not. Not doing so will result in pretty useless activities, because someone has the "feeling" of increased performance when nothing at all has changed.
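A minimal sketch of that discipline using Python's timeit: measure the current code, measure the candidate change, and only keep the change if the numbers actually improve (the two function bodies are placeholders for illustration):

```python
import timeit

def before():
    # The code as it exists today; measure this first and record it.
    return "".join(str(i) for i in range(1000))

def after():
    # The proposed "optimization"; keep it only if the numbers improve.
    return "".join(map(str, range(1000)))

t_before = min(timeit.repeat(before, number=1000, repeat=5))
t_after = min(timeit.repeat(after, number=1000, repeat=5))
print(f"before: {t_before:.4f}s  after: {t_after:.4f}s  "
      f"speedup: {t_before / t_after:.2f}x")
```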
The elephant in the room: Focusing on implementation-level micro-optimization instead of on better algorithms.
Variable re-use.
I used to do this all the time figuring I was saving a few cycles on the declaration and lowering memory footprint. These savings were of minuscule value compared with how unruly it made the code to debug, especially if I ended up moving a code block around and the assumptions about starting values changed.
Premature performance optimization comes to mind. I tend to avoid performance optimizations at all costs, and when I decide I do need them I pass the issue around to my colleagues for several rounds, trying to make sure we put the obfu... eh, optimization in the right place.
One that I've run into was throwing hardware at seriously broken code in an attempt to make it fast enough, sort of the converse of Jeff Atwood's article mentioned in Rulas' comment. I'm not talking about the difference between speeding up a sort that uses a basic, correct algorithm by running it on faster hardware versus using an optimized algorithm. I'm talking about using a not-obviously-correct, home-brewed O(n^3) algorithm when an O(n log n) algorithm is in the standard library. There are also things like hand-coding routines because the programmer doesn't know what's in the standard library. That one's very frustrating.
Using design patterns just to have them used.
Using #defines instead of functions to avoid the penalty of a function call.
I've seen code where expansions of defines turned out to generate huge and really slow code. Of course it was impossible to debug as well. Inline functions are the way to do this, but they should be used with care as well.
I've seen code where independent tests have been converted into bits in a word that can be used in a switch statement. Switch can be really fast, but when people turn a series of independent tests into a bitmask and start writing some 256 optimized special cases, they'd better have a very good benchmark proving that this gives a performance gain. It's really a pain from a maintenance point of view, and treating the different tests independently makes the code much smaller, which is also important for performance.
Lack of clear program structure is the biggest code-sin of them all. Convoluted logic that is believed to be fast almost never is.
Do not refactor or optimize while writing your code. It is extremely important not to try to optimize your code before you finish it.
Julian Birch once told me:
"Yes but how many years of running the application does it actually take to make up for the time spent by developers doing it?"
He was referring to the cumulative amount of time saved during each transaction by an optimisation that would take a given amount of time to implement.
Wise words from the old sage... I often think of this advice when considering doing a funky optimisation. You can extend the same notion a little further by considering how much developer time is being spent dealing with the code in its present state versus how much time is saved by the users. You could even weight the time by hourly rate of the developer versus the user if you wanted.
Of course, sometimes it's impossible to measure. For example, if an e-commerce application takes 1 second longer to respond, you will lose some small percentage of revenue from users getting bored during that 1 second. To make up that one second you need to implement and maintain optimised code. The optimisation impacts gross profit positively and net profit negatively, so it's much harder to balance. You could try - with good stats.
Exploiting your programming language. Things like using exception handling instead of if/else just because in PLSnakish 1.4 it's faster. Guess what? Chances are it's not faster at all and that two years from now someone maintaining your code will get really angry with you because you obfuscated the code and made it run much slower, because in PLSnakish 1.8 the language maintainers fixed the problem and now if/else is 10 times faster than using exception handling tricks. Work with your programming language and framework!
Changing more than one variable at a time. This drives me absolutely bonkers! How can you determine the impact of a change on a system when more than one thing's been changed?
Related to this, making changes that are not warranted by observations. Why add faster/more CPUs if the process isn't CPU bound?
General solutions.
Just because a given pattern/technology performs better in one circumstance does not mean it does in another.
StringBuilder overuse in .Net is a frequent example of this one.
Once I had a former client call me asking for any advice I had on speeding up their apps.
He seemed to expect me to say things like "check X, then check Y, then check Z", in other words, to provide expert guesses.
I replied that you have to diagnose the problem. My guesses might be wrong less often than someone else's, but they would still be wrong, and therefore disappointing.
I don't think he understood.
Some developers believe a fast-but-incorrect solution is sometimes preferable to a slow-but-correct one. So they will ignore various boundary conditions or situations that "will never happen" or "won't matter" in production.
This is never a good idea. Solutions always need to be "correct".
You may need to adjust your definition of "correct" depending upon the situation. What is important is that you know/define exactly what you want the result to be for any condition, and that the code gives those results.
Michael A. Jackson gives two rules for optimizing performance:
Don't do it.
(experts only) Don't do it yet.
If people are worried about performance, tell 'em to make it real - what is good performance and how do you test for it? Then if your code doesn't perform up to their standards, at least it's something the code writer and the application user agree on.
If people are worried about non-performance costs of rewriting ossified code (for example, the time sink) then present your estimates and demonstrate that it can be done in the schedule. Assuming it can.
I believe it is a common myth that super-lean code "close to the metal" is more performant than an elegant domain model.
This was apparently debunked by the creator/lead developer of DirectX, who rewrote the C++ version in C# with massive improvements. [source required]
Appending to an array element by element, using (for example) push_back() in the C++ STL, ~= in D, etc., when you know how big the array is supposed to be ahead of time and could pre-allocate it.
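The same idea translated into Python for illustration (the push_back()/~= examples above are the canonical ones): growing element by element forces repeated reallocation, while preallocating allocates once. In CPython the gap is smaller than in C++, because lists over-allocate geometrically, but the pattern is the same:

```python
import timeit

N = 1_000_000

def grow_by_append():
    xs = []
    for i in range(N):
        xs.append(i * i)       # list reallocates repeatedly as it grows
    return xs

def preallocated():
    xs = [0] * N               # single allocation up front
    for i in range(N):
        xs[i] = i * i
    return xs

print("append:      ", min(timeit.repeat(grow_by_append, number=5, repeat=3)))
print("preallocated:", min(timeit.repeat(preallocated, number=5, repeat=3)))
```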
