My program is being reported as a high-level security threat by AVG?

My program is being reported as a high-level security threat by AVG? - xna-4.0

I recently started distributing my XNA 4.0 game to people. On computers with AVG installed, it detected this game as a false positive virus with a high security threat.
I have no idea of what is going on. All I know is that when it happened, there was an infinite loop issue. Could this be detected by AVG as a criteria for virus detection?
How can I report a false positive to AVG? I found this website http://www.geekstogo.com/forum/topic/273320-where-to-submit-false-positives-to-antivirus-and-security-vendors/
But under AVG, it only shows how to submit false positive websites.

You can get more information in order to handle the false positive with AVG at the below link. Make sure to check out section IV.
http://www.avg.com/us-en/whitelist
As far as the false positive goes, I am not intimately familiar with virus scanners, but there are probably tons of reasons your software could have triggered a false positive. Perhaps the code (coincidentally) matches some part of a virus signature, or perhaps heuristic matching (which looks for certain behavioral patterns in software) flagged it as suspicious (are you deleting files? doing funky things with memory? using polymorphic code?, etc). I doubt an infinite loop in itself could have lead to the issue though.

Related

How do you mitigate proposal-number overflow attacks in Byzantine Paxos?

I've been doing a lot of research into Paxos recently, and one thing I've always wondered about, I'm not seeing any answers to, which means I have to ask.
Paxos includes an increasing proposal number (and possibly also a separate round number, depending on who wrote the paper you're reading). And of course, two would-be leaders can get into duels where each tries to out-increment the other in a vicious cycle. But as I'm working in a Byzantine, P2P environment, it makes me what to do about proposers that would attempt to set the proposal number extremely high - for example, the maximum 32-bit or 64-bit word.
How should a language-agnostic, platform-agnostic Paxos-based protocol deal with integer maximums for proposal number and/or round number? Especially intentional/malicious cases, which make the modular-arithmetic approach of overflowing back to 0 a bit unattractive?

From what I've read, I think this is still an open question that isn't addressed in literature.
Byzantine Proposer Fast Paxos addresses denial of service, but only of the sort that would delay message sending through attacks not related to flooding with incrementing (proposal) counters.
Having said that, integer overflow is probably the least of your problems. Instead of thinking about integer overflow, you might want to consider membership attacks first (via DoS). Learning about membership after consensus from several nodes may be a viable strategy, but probably still vulnerable to Sybil attacks at some level.
Another strategy may be to incorporate some proof-of-work system for proposals to limit the flood of requests. However, it's difficult to know what to use this as a metric to balance against (for example, free currency when you mine the block chain in Bitcoin). It really depends on what type of system you're trying to build. You should consider the value of information in your system, then create a proof of work system that requires slightly more cost to circumvent.
However, once you have the ability to slow down a proposal counter, you still need to worry about integer maximums in any system with a high number of (valid) operations. You should have a strategy for number wrapping or a multiple precision scheme in place where you can clearly determine how many years/decades your network can run without encountering trouble without blowing out a fixed precision counter. If you can determine that your system will run for 100 years (or whatever) without blowing out your fixed precision counter, even with malicious entities, then you can choose to simplify things.
On another (important) note, the system model used in most papers doesn't reflect everything that makes a real-life implementation practical (Raft is a nice exception to this). If anything, some authors are guilty of creating a system model that is designed to avoid a hard problem that they haven't found an answer to. So, if someone says that X will solve everything, please be aware they they only mean that it solves everything in the very specific system model that they defined. On the other side of this, you should consider that the system model is closely tied to a statement that says "Y is impossible". A nice example to explain this concept is the completely asynchronous message passing of the Ben-Or consensus algorithm which uses nondeterminism in the system model's state machine to avoid the limits specified by the FLP impossibility result (which specifies that consensus requires partially asynchronous message passing when the system model's state machine is deterministic).
So, you should continue to consider the "impossible" after you read a proof that says it can't be done. Nancy Lynch did a nice writeup on this concept.
I guess what I'm really saying is that a good solution to your question doesn't really exist yet. If you figure it out, please publish it (or let me know if you find an existing paper).

What is Best for Defect Rate Tracking? Defects per KLOC?

I'm trying to create some internal metrics to demonstrate (determine?) how well TDD improves defect rates in code.
Is there a better way than defects/KLOC? What about a language's 'functional density'?
Any comments or suggestions would be helpful.
Thanks - Jonathan

You may also consider mapping defect discovery rate and defect resolution rates... how long does it take to find bugs, and once they're found, how long do they take to fix? To my knowledge, TDD is supposed to improve on fix times because it makes defects known earlier... right?

Any measure is an arbitrary comparison of defects to code size; so long as the comparison is similar, it should work. E.g., defects/kloc in C to defects/kloc in C. If you changed languages, it would affect the metric in any case, since the same program in another language might be less defect-prone.

Measuring defects isn't an easy thing. One would like to account for the complexity of the code, but that is incredibly messy and unpleasant. When measuring code quality I recommend:
Measure the current state (what is your defect rate now)
Make a change (peer reviews, training, code guidelines, etc)
Measure the new defect rate (Have things improved?)
Goto 2
If you are going to compare coders make sure you compare coders doing similar work in the same language. Don't compare the coder who works in the deep internals of your most complex calculation engine to the coder who writes the code that stores stuff in the database.
I try to make sure that coders know that the process is being measured not the coders. This helps to improve the quality of the metrics.

I suggest to use the ratio between the times :
the time spend fixing bugs
the time spend writing other codes
This seem valid across languages...
It also works if you only have a rough estimation of some big code base. You can still compare it to the new code you are writing, to impress you management ;-)

I'm skeptical of all LOC-related measurements, not just because of different relative expressiveness of languages, but because individual programmers will vary enough in the expressiveness of their code as to make this metric "fuzzy" at best.
The things I would measure in the interests of project management are:
Number of open defects on the project. There's no single scalar that can tell you where the project is and how close it is to a releasable state, but this is still a handy number to have on hand and watch over time.
Defect detection rate. This is not the rate of introduction of new defects into the system, but it's probably the closest proxy you'll find.
Defect resolution rate. If this is less than the detection rate, you're falling behind - if it's greater, you're getting ahead.
All of these numbers are more useful if you combine them with severity information. A product with 20 minor bugs may well closer to release than one with 2 crashing bugs. If you're clearing the minor bugs but not the severe ones, you have to get the developers to refocus their attention.
I would track these numbers per project and per developer. The reason for doing them per project should be clear. The per-developer numbers are certainly not the whole picture of an individual contributor's skill or productivity, but can point you to people who might need training or remediation.
You may also wish to tag all the tickets in your defect tracking system by project module as well (especially for larger projects), so that you can tell when critical modules are in a fragile state.

Why dont you consider defects per use case ? or defects per requirement. We have faced practical issues in arriving at the KLOC.

Tracking and prediciting quality level

What techniques do people recommend to track the quality level of a new program? Are their ways to take a poorly defined term like "quality level", quanitify it and then make predictions? Currently I use bug rates and S curves but I am looking for other ways to evaluate, estimate and predict quality levels.

Are you looking for this measure?
More seriously, for me code quality is about maintainability:
how easy it is to fix a bug, to add/remove/modify a feature,
how easy it is to refactor: is a regression test suite available?
how much technical debt?
Remember that you write code once, but you read it several times.

I am not sure what counting your bugs is going to do with out something to compare it with. What if the software you made was very hard and had lots of edge cases? You need some kind some kind of comparison...
Even though one of the other answers was clearly a joke code reviews are also probably a good idea. If you have too many bugs hire better engineers or have them write less code.
Edit: Added after considering comments...
Every bug is a like a unique sucky snow flake (a suckflake?). They have different levels of impact on your customers and developers. I would at least take this into account. Maybe adding severity (a measure of customer push for fix) and engineering hours spent fixing it might help improve accuracy. My concern I guess is that this is still over simplification of "quality" when it comes to developing software.
Sadly software quality != product quality. A recently released game called Fallout 3 won tons of awards and made lots of money (at least I assume) but was also a buggy hunk of junk on PC at least.
Just make sure you are tracking and optimizing the correct thing. Tracking # bugs vs time is just tracking # of bugs vs time. Reading any more into it requires some level of assumption however correct or incorrect.
What are your goals? Bugs are but one part of software quality. If you want to continue you to support your software maintainability is key. A lot of times bug fixes can make things less maintainable if your coder did things in a rush. This in turn makes future fixes and features harder and fixes can add new bugs.

This depends a lot on what kind of comparisons you're trying to make. If you're looking at a single project over time and the team doesn't change, then bug rates might be meaningful. However, if you're comparing different projects, with different teams, there's really no way to compare things like bug rates, because you are really comparing the rate of known bugs. One team may be much better at identifying bugs than the other, making their bug rate look higher, but they're really the ones with better software quality.

Do you do unit testing?
Code coverage on unit tests can be a decent measure of quality.

Understanding Dijkstra's Mozart programming style

I came across this article about programming styles, seen by Edsger Dijsktra. To quickly paraphrase, the main difference is Mozart, when the analogy is made to programming, fully understood (debatable) the problem before writing anything, while Beethoven made his decisions as he wrote the notes out on paper, creating many revisions along the way. With Mozart programming, version 1.0 would be the only version for software that should aim to work with no errors and maximum efficiency. Also, Dijkstra says software not at that level of refinement and stability should not be released to the public.
Based on his views, two questions. Is Mozart programming even possible? Would the software we write today really benefit if we adopted the Mozart style instead?
My thoughts. It seems, to address the increasing complexity of software, we've moved on from this method to things like agile development, public beta testing, and constant revisions, methods that define web development, where speed matters most. But when I think of all the revisions web software can go through, especially during maintenance, when often patches are applied over patches, to then be refined through a tedious refactoring process—the Mozart way seems very attractive. It would at least lessen those annoying software updates, e.g. Digsby, Windows, iTunes, etc., many the result of unforeseen vulnerabilities that require a new and immediate release.
Edit: Refer to the response below for a more accurate explanation of Dijsktra's views.

The Mozart programming style is a complete myth (everybody has to edit and modify their initial efforts), and although "Mozart" is essentially a metaphor in this example, it's worth noting that Mozart was substantially a myth himself.
Mozart was a supposed magical child prodigy who composed his first sonata at 4 (he was actually 6, and it sucked - you won't ever hear it performed anywhere). It's rarely mentioned, of course, that his father was considered Europe's greatest music teacher, and that he forced all of his children to practice playing and composition for hours each day as soon as they could pick up an instrument or a pen.
Mozart himself was careful to perpetuate the illusion that his music emerged whole from his mind by destroying most of his drafts, although enough survive to show that he was an editor like everyone else. Beethoven was just more honest about the process (maybe because he was deaf and couldn't tell if anyone was sneaking up on him anyway).
I won't even mention the theory that Mozart got his melodies from listening to songbirds. Or the fact that he created a system that used dice to randomly generate music (which is actually pretty cool, but might also explain how much of Mozart's music appeared to come from nowhere).
The moral of the story is: don't believe the hype. Programming is work, followed by more work to fix the mistakes you made the first time around, followed by more work to fix the mistakes you made the second time around, and so on and so forth until you die.

It doesn't scale.
I can figure out a line of code in my head, a routine, and even a small program. But a medium program? There are probably some guys that can do it, but how many, and how much do they cost? And should they really write the next payroll program? That's like wasting Mozart on muzak.
Now, try to imagine a team of Mozarts. Just for a few seconds.
Still it is a powerful instrument. If you can figure out a whole line in your head, do it. If you can figure out a small routine with all its funny cases, do it.
On the surface, it avoids going back to the drawing board because you didn't think of one edge case that requires a completely different interface altogether.
The deeper meaning (head fake?) can be explained by learning another human language. For a long time you thinking which words represent your thoughts, and how to order them into a valid sentence - that transcription costs a lot of foreground cycles.
One day you will notice the liberating feeling that you just talk. It may feel like "thinking in a foregin language", or as if "the words come naturally". You will sometimes stumble, looking for a particular word or idiom, but most of the time translation runs in the vast ressources of the "subconcious CPU".
The "high goal" is developing a mental model of the solution that is (mostly) independent of the implementation language, to separate solution of a problem from transcribing the problem. Transcription is easy, repetetive and easily trained, and abstract solutions can be reused.
I have no idea how this could be taught, but "figuring out as much as possible before you start to write it" sounds like good programming practice towards that goal.

A classic story from Usenet, about a true programming Mozart.
Real Programmers write in Fortran.
Maybe they do now, in this decadent
era of Lite beer, hand calculators and
"user-friendly" software but back in
the Good Old Days, when the term
"software" sounded funny and Real
Computers were made out of drums and
vacuum tubes, Real Programmers wrote
in machine code. Not Fortran. Not
RATFOR. Not, even, assembly language.
Machine Code. Raw, unadorned,
inscrutable hexadecimal numbers.
Directly.
Lest a whole new generation of
programmers grow up in ignorance of
this glorious past, I feel duty-bound
to describe, as best I can through the
generation gap, how a Real Programmer
wrote code. I'll call him Mel, because
that was his name.
I first met Mel when I went to work
for Royal McBee Computer Corp., a
now-defunct subsidiary of the
typewriter company. The firm
manufactured the LGP-30, a small,
cheap (by the standards of the day)
drum-memory computer, and had just
started to manufacture the RPC-4000, a
much-improved, bigger, better, faster
-- drum-memory computer. Cores cost too much, and weren't here to stay,
anyway. (That's why you haven't heard
of the company, or the computer.)
I had been hired to write a Fortran
compiler for this new marvel and Mel
was my guide to its wonders. Mel
didn't approve of compilers.
"If a program can't rewrite its own
code," he asked, "what good is it?"
Mel had written, in hexadecimal, the
most popular computer program the
company owned. It ran on the LGP-30
and played blackjack with potential
customers at computer shows. Its
effect was always dramatic. The LGP-30
booth was packed at every show, and
the IBM salesmen stood around talking
to each other. Whether or not this
actually sold computers was a question
we never discussed.
Mel's job was to re-write the
blackjack program for the RPC-4000.
(Port? What does that mean?) The new
computer had a one-plus-one addressing
scheme, in which each machine
instruction, in addition to the
operation code and the address of the
needed operand, had a second address
that indicated where, on the revolving
drum, the next instruction was
located. In modern parlance, every
single instruction was followed by a
GO TO! Put that in Pascal's pipe and
smoke it.
Mel loved the RPC-4000 because he
could optimize his code: that is,
locate instructions on the drum so
that just as one finished its job, the
next would be just arriving at the
"read head" and available for
immediate execution. There was a
program to do that job, an "optimizing
assembler", but Mel refused to use it.
"You never know where it's going to
put things", he explained, "so you'd
have to use separate constants".
It was a long time before I understood
that remark. Since Mel knew the
numerical value of every operation
code, and assigned his own drum
addresses, every instruction he wrote
could also be considered a numerical
constant. He could pick up an earlier
"add" instruction, say, and multiply
by it, if it had the right numeric
value. His code was not easy for
someone else to modify.
I compared Mel's hand-optimized
programs with the same code massaged
by the optimizing assembler program,
and Mel's always ran faster. That was
because the "top-down" method of
program design hadn't been invented
yet, and Mel wouldn't have used it
anyway. He wrote the innermost parts
of his program loops first, so they
would get first choice of the optimum
address locations on the drum. The
optimizing assembler wasn't smart
enough to do it that way.
Mel never wrote time-delay loops,
either, even when the balky
Flexowriter required a delay between
output characters to work right. He
just located instructions on the drum
so each successive one was just past
the read head when it was needed; the
drum had to execute another complete
revolution to find the next
instruction. He coined an
unforgettable term for this procedure.
Although "optimum" is an absolute
term, like "unique", it became common
verbal practice to make it relative:
"not quite optimum" or "less optimum"
or "not very optimum". Mel called the
maximum time-delay locations the "most
pessimum".
After he finished the blackjack
program and got it to run, ("Even the
initializer is optimized", he said
proudly) he got a Change Request from
the sales department. The program used
an elegant (optimized) random number
generator to shuffle the "cards" and
deal from the "deck", and some of the
salesmen felt it was too fair, since
sometimes the customers lost. They
wanted Mel to modify the program so,
at the setting of a sense switch on
the console, they could change the
odds and let the customer win.
Mel balked. He felt this was patently
dishonest, which it was, and that it
impinged on his personal integrity as
a programmer, which it did, so he
refused to do it. The Head Salesman
talked to Mel, as did the Big Boss
and, at the boss's urging, a few
Fellow Programmers. Mel finally gave
in and wrote the code, but he got the
test backwards, and, when the sense
switch was turned on, the program
would cheat, winning every time. Mel
was delighted with this, claiming his
subconscious was uncontrollably
ethical, and adamantly refused to fix
it.
After Mel had left the company for
greener pa$ture$, the Big Boss asked
me to look at the code and see if I
could find the test and reverse it.
Somewhat reluctantly, I agreed to
look. Tracking Mel's code was a real
adventure.
I have often felt that programming is
an art form, whose real value can only
be appreciated by another versed in
the same arcane art; there are lovely
gems and brilliant coups hidden from
human view and admiration, sometimes
forever, by the very nature of the
process. You can learn a lot about an
individual just by reading through his
code, even in hexadecimal. Mel was, I
think, an unsung genius.
Perhaps my greatest shock came when I
found an innocent loop that had no
test in it. No test. None. Common
sense said it had to be a closed loop,
where the program would circle,
forever, endlessly. Program control
passed right through it, however, and
safely out the other side. It took me
two weeks to figure it out.
The RPC-4000 computer had a really
modern facility called an index
register. It allowed the programmer to
write a program loop that used an
indexed instruction inside; each time
through, the number in the index
register was added to the address of
that instruction, so it would refer to
the next datum in a series. He had
only to increment the index register
each time through. Mel never used it.
Instead, he would pull the instruction
into a machine register, add one to
its address, and store it back. He
would then execute the modified
instruction right from the register.
The loop was written so this
additional execution time was taken
into account -- just as this
instruction finished, the next one was
right under the drum's read head,
ready to go. But the loop had no test
in it.
The vital clue came when I noticed the
index register bit, the bit that lay
between the address and the operation
code in the instruction word, was
turned on-- yet Mel never used the
index register, leaving it zero all
the time. When the light went on it
nearly blinded me.
He had located the data he was working
on near the top of memory -- the
largest locations the instructions
could address -- so, after the last
datum was handled, incrementing the
instruction address would make it
overflow. The carry would add one to
the operation code, changing it to the
next one in the instruction set: a
jump instruction. Sure enough, the
next program instruction was in
address location zero, and the program
went happily on its way.
I haven't kept in touch with Mel, so I
don't know if he ever gave in to the
flood of change that has washed over
programming techniques since those
long-gone days. I like to think he
didn't. In any event, I was impressed
enough that I quit looking for the
offending test, telling the Big Boss I
couldn't find it. He didn't seem
surprised.
When I left the company, the blackjack
program would still cheat if you
turned on the right sense switch, and
I think that's how it should be. I
didn't feel comfortable hacking up the
code of a Real Programmer.

Edsger Dijkstra discusses his views on Mozart vs Beethoven programming in this YouTube video entitled "Discipline in Thought".
People in this thread have pretty much discussed how Dikstra's views are impractical. I'm going to try and defend him some.
Dijkstra is against companies
essentially "testing" their software
on their customers. Releasing
version 1.0 and then immediately
patch 1.1. He felt that the program
should be polished to a degree that
"hotfix" patches are borderline
unethical.
He did not think that software should be written in one fell swoop or that changes would never need to be made. He often discusses his design ideals, one of them being modularity and ease of change. He often thought that individual algorithms should be written in this way however, after you have completely understood the problem. That was part of his discipline.
He found after all his extensive experience with programmers, that programmers aren't happy unless they are pushing the limits of their knowledge. He said that programmers didn't want to program something they completely and 100% understood because there was no challenge in it. Programmers always wanted to be on the brink of their knowledge. While he understood why programmers are like that he stated that it wasn't representative of low-error tolerance programming.
There are some industries or applications of programming that I believe Dijkstra's "discipline" are warranted as well. NASA Rovers, Health Industry embedded devices (ie dispense medication, etc), certain Financial software that transfer our money. These areas don't have the luxuries of incremental change after release and a more "Mozart Approach" is necessary.

I think the Mozart story confuses what gets shipped versus how it is developed. Beethoven did not beta-test his symphonies on the public. (It would be interesting to see how much he changed any of the scores after the first public performance.)
I also don't think that Dijkstra was insisting that it all be done in your head. After all, he wrote books on disciplined programming that involved working it out on paper, and to the same extent that he wanted to see mathematical-quality discipline, have you noticed how much paper and chalk board mathematicians may consume while working on a problem?
I favor Simucal's response, but I think the Mozart-Beethoven metaphor should be discarded. That shoe-horns Dijkstra's insistence on discipline and understanding into a corner where it really doesn't belong.
Additional Remarks:
The TV popularization is not so hot, and it confuses some things about musical composition and what a composer is doing and what a programmer is doing. In Dijkstra's own words, from his 1972 Turing Award Lecture: "We must not forget that it is not our business to make programs; it is our business to design classes of computations that will display a desired behavior." A composer may be out to discover the desired behavior.
Also, in Dijkstra's notion that version 1.0 should be the final version, we too easily confuse how desired behavior and functionality evolve over time. I believe he oversimplifies in thinking that all future versions are because the first one was not thought out and done rigorously and reliably.
Even without time-to-market urgency, I think we now understand much better that important kinds of software evolve along with the users experience with it and the utilitarian purpose they have for it. Obvious counter-examples are games (also consider how theatrical motion pictures are developed). Do you think Beethoven could have written Symphony No. 9 without all of his preceding experience and exploration? Do you think the audience could have heard it for what it was? Should he have waited until he had the perfect Sonata? I'm sure Dijkstra doesn't propose this, but I do think he goes too far with Mozart-Beethoven to make his point.
In addition, consider chess-playing software. The new versions are not because the previous ones didn't play correctly. It is about exploiting advances in chess-playing heuristics and the available computer power. For this and many other situations, the idea that version 1.0 be the final version is off base. I understand that he is rightfully objecting to the release of known-to-be unreliable and maybe impaired software with deficiencies to be made up in maintenance and future releases. But the Mozartian counter-argument doesn't hold up for me.
So, did Dijkstra continue to drive the first automobile he purchased, or clones of exactly that automobile? Maybe there is planned obsolescence, but a lot of it has to do with improvements and reliability that could not have possibly been available or even considered in previous generations of automotive technology.
I am a big Dijkstra fan, but I think the Mozart-Beethoven thing is way too simplistic as well as inappropriate. I am a big Beethoven fan too.

I think it's possible to appear to employ Mozart programming. I know of one company, Blizzard, that doesn't release a software product until they're good and ready. This doesn't mean that Diablo 3 will spring whole and complete from someone's mind in one session of dazzlingly brilliant coding. It does mean that that's how it will appear to the rest of us. Blizzard will test the heck out of their game internally, not showing it to the rest of the world until they've got all the kinks worked out. Most companies don't take this approach, preferring instead to release software when it's good enough to solve a problem, then fix bugs and add features as they come up. This approach works (to varying degrees) for most companies.

Well, we can't all be as good as Mozart, can we? Perhaps Beethoven programming is easier.

If Apple adopted "Mozart" programming, there would be no Mac OS X or iTunes today.
If Google adopted "Mozart" programming, there would be no Gmail or Google Reader.
If SO developers adopted "Mozart" programming, there would be no SO today.
If Microsoft adopted "Mozart" programming, there would be no Windows today (well, I think that would be good).
So the answer is simply NO. Nothing is perfect, and nothing is ever meant to be perfect, and that includes software.

I think the idea is to plan ahead. You need to at least have some kind of outline of what you are trying to do and how you plan to get there. If you just sit down at the keyboard and hope "the muse" will lead you to where your program needs to go, the results are liable to be rather uneven, and it will take you much longer to get there.
This is true with any kind of writing. Very few authors just sit down at a typewriter with no ideas and start banging away until a bestselling novel is produced. Heck, my father-in-law (a high school English teacher) actually writes outlines for his letters.

Progress in computing is worth a sacrifice in glory or genius here and there.

Hardest types of bugs to track? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 7 years ago.
Improve this question
What are some of the nastiest, most difficult bugs you have had to track and fix and why?
I am both genuinely curious and knee deep in the process as we speak. So as they say - misery likes company.

Heisenbugs:
A heisenbug (named after the Heisenberg Uncertainty Principle) is a computer bug that disappears or alters its characteristics when an attempt is made to study it.

Race conditions and deadlocks. I do a lot of multithreaded processes and that is the hardest thing to deal with.

Bugs that happen when compiled in release mode but not in debug mode.

Any bug based on timing conditions. These often come when working with inter-thread communication, an external system, reading from a network, reading from a file, or communicating with any external server or device.

Bugs that are not in your code per se, but rather in a vendor's module on which you depend. Particularly when the vendor is unresponsive and you are forced to hack a work-around. Very frustrating!

We were developing a database to hold words and definitions in another language. It turns out that this language had only recently been added to the Unicode standard and it didn't make it into SQL Server 2005 (though it was added around 2005). This had a very frustrating effect when it came to collation.
Words and definitions went in just fine, I could see everything in Management Studio. But whenever we tried to find the definition for a given word, our queries returned nothing. After a solid 8 hours of debugging, I was at the point of thinking I had lost the ability to write a simple SELECT query.
That is, until I noticed English letters matched other English letters with any amount of foreign letters thrown in. For example, EnglishWord would match E!n#gl##$ish$&Word. (With !##$%^&* representing foreign letters).
When a collation doesn't know about a certain character, it can't sort them. If it can't sort them, it can't tell whether two string match or not (a surprise for me). So frustrating and a whole day down the drain for a stupid collation setting.

Threading bugs, especially race conditions. When you cannot stop the system (because the bug goes away), things quickly get tough.

The hardest ones I usually run into are ones that don't show up in any log trace. You should never silently eat an exception! The problem is that eating an exception often moves your code into an invalid state, where it fails later in another thread and in a completely unrelated manner.
That said, the hardest one I ever really ran into was a C program in a function call where the calling signature didn't exactly match the called signature (one was a long, the other an int). There were no errors at compile time or link time and most tests passed, but the stack was off by sizeof(int), so the variables after it on the stack would randomly have bad values, but most of the time it would work fine (the values following that bad parameter were generally being passed in as zero).
That was a BITCH to track.

Memory corruption under load due to bad hardware.

Bugs that happen on one server and not another, and you don't have access to the offending server to debug it.
Bugs that have to do with threading.

The most frustrating for me have been compiler bugs, where the code is correct but I've hit an undocumented corner case or something where the compiler's wrong. I start with the assumption that I've made a mistake, and then spend days trying to find it.
Edit: The other most frustrating was the time I got the test case set slightly wrong, so my code was correct but the test wasn't. That took days to find.
In general, I guess the worst bugs I've had have been the ones that aren't my fault.

The hardest bugs to track down and fix are those that combine all the difficult cases:
reported by a third party but you can't reproduce it under your own testing conditions;
bug occurs rarely and unpredictably (e.g. because it's caused by a race condition);
bug is on an embedded system and you can't attach a debugger;
when you try to get logging information out the bug goes away;
bug is in third-party code such as a library ...
... to which you don't have the source code so you have to work with disassembly only;
and the bug is at the interface between multiple hardware systems (e.g. networking protocol bugs or bus contention bugs).
I was working on a bug with all these features this week. It was necessary to reverse engineer the library to find out what it was up to; then generate hypotheses about which two devices were racing; then make specially-instrumented versions of the program designed to provoke the hypothesized race condition; then once one of the hypotheses was confirmed it was possible to synchronize the timing of events so that the library won the race 100% of the time.

There was a project building a chemical engineering simulator using a beowulf cluster. It so happened that the network cards would not transmit one particular sequence of bytes. If a packet contained that string, the packet would be lost. They solved the problem by replacing the hardware - finding it in the first place was much harder.

One of the hardest bugs I had to find was a memory corruption error that only occurred after the program had been running for hours. Because of the length of time it took to corrupt the data, we assumed hardware and tried two or three other computers first.
The bug would take hours to appear, and when it did appear it was usually only noticed quite a length of time after when the program got so messed up it started misbehaving. Narrowing down in the code base to where the bug was occurring was very difficult because the crashes due to corrupted memory never occurred in the function that corrupted the memory, and it took so damned long for the bug to manifest itself.
The bug turned out to be an off-by-one error in a rarely called piece of code to handle a data line that had something wrong with it (invalid character encoding from memory).
In the end the debugger proved next to useless because the crashes never occurred in the call tree for the offending function. A well sequenced stream of fprintf(stderr, ...) calls in the code and dumping the output to a file was what eventually allowed us to identify what the problem was.

Concurrency bugs are quite hard to track, because reproducing them can be very hard when you do not yet know what the bug is. That's why, every time you see an unexplained stack trace in the logs, you should search for the reason of that exception until you find it. Even if it happens only one time in a million, that does not make it unimportant.
Since you can not rely on the tests to reproduce the bug, you must use deductive reasoning to find out the bug. That in turn requires a deep understanding of how the system works (for example how Java's memory model works and what are possible sources of concurrency bugs).
Here is an example of a concurrency bug in Guice 1.0 which I located just some days ago. You can test your bug finding skills by trying to find out what is the bug causing that exception. The bug is not too hard to find - I found its cause in some 15-30 min (the answer is here).
java.lang.NullPointerException
at com.google.inject.InjectorImpl.injectMembers(InjectorImpl.java:673)
at com.google.inject.InjectorImpl$8.call(InjectorImpl.java:682)
at com.google.inject.InjectorImpl$8.call(InjectorImpl.java:681)
at com.google.inject.InjectorImpl.callInContext(InjectorImpl.java:747)
at com.google.inject.InjectorImpl.injectMembers(InjectorImpl.java:680)
at ...
P.S. Faulty hardware might cause even nastier bugs than concurrency, because it may take a long time before you can confidently conclude that there is no bug in the code. Luckily hardware bugs are rarer than software bugs.

A friend of mine had this bug. He accidentally put a function argument in a C program in square brackets instead of parenthesis like this: foo[5] instead of foo(5). The compiler was perfectly happy, because the function name is a pointer, and there is nothing illegal about indexing off a pointer.

One of the most frustrating for me was when the algorithm was wrong in the software spec.

Probably not the hardest, but they are extremely common and not trivial:
Bugs concerning mutable state. It is hard to maintain invariants in a data structure if it has many mutable fields. And you have operation order dependency - swap two lines and something bad occurs. One of my recent hard-to-find bugs was when I found that previous developer of the system I maintained used mutable data for hashtable keys - in some rare conditions it lead to infinite loops.
Order of initialization bugs. Can be obvious when found, but not so when coding.

The hardest one ever was actually a bug I was helping a friend with. He was writing C in MS Visual Studio 2005, and forgot to include time.h. He further called time without the required argument, usually NULL. This implicitly declared time like: int time(); This corrupted the stack, and in a completely unpredictable way. It was a large amount of code, and we didn't think to look at the time() call for quite some time.

Buffer overflows ( in native code )

Last year I spent a couple of months tracking a problem that ended up being a bug in a downstream system. The team lead from the offending system kept claiming that it must be something funny in our processing even though we passed the data just like they requested it from us. If the lead would have been a little more cooperative we might have nailed the bug sooner.

Uninitialized variables. (Or have modern languages done away with this?)

For embedded systems:
Unusual behaviour reported by customers in the field, but which we're unable to reproduce.
After that, bugs which turn out to be due to a freak series or concurrence of events. These are at least reproducable, but obviously they can take a long time - and a lot of experimentation - to make happen.

Difficulty of tracking:
off-by-one errors
boundary condition errors

Machine dependent problems.
I'm currently trying to debug why an application has an unhandled exception in a try{} catch{} block (yes, unhandled inside of a try / catch) that only manifests on certain OS / machine builds, and not on others.
Same version of software, same installation media, same source code, works on some - unhandled exception in what should be a very well handled part of code on others.
Gak.

When objects are cached and their equals and hashcode implementations are implemented so poorly that the hash code value isn't unique and the equals returns true when it isn't equal.

Cosmetic web bugs involving styling in various browser O/S configurations, e.g. a page looks fine in Windows and Mac in Firefox and IE but on the Mac in Safari something gets messed up. These are annoying sometimes because they require so much attention to detail and making the change to fix Safari may break something in Firefox or IE so one has to tread carefully and realize that the styling may be a series of hacks to fix page after page. I'd say those are my nastiest ones that sometimes just don't get fixed as they aren't viewed as important.

WAY back in the days, memory leaks. Thankfully, there's a lot of tools to find them, these days.

Memory issues, particularly on older systems. We have some legacy 16-bit C software that must remain 16-bit for the time being. The 64K memory blocks are royal pain to work with, and we constantly add statics or code logic that pushes us past the 64K group limits.
To make matters worse, memory errors usually don't cause the program to crash, but cause certain features to sporadically break (and not always the same features). Debugging is a non-option - the debugger doesn't have the same memory constraints so the programs always run fine in debug mode ... plus, we can't add inline printf statements for testing since that bumps the memory usage even higher.
As a result, we can sometimes spend DAYS trying to find a single block of code to rewrite, or hours moving static chars to files. Luckily the system is slowly being moved offline.

Multithreading, memory leaks, anything requiring extensive mocks, interfacing with third-party software.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio