Have you heard that some network cards that support checksum offload have a bug in that feature? It can cause many problems, such as unexpectedly dropped connections, loss of most UDP packets under certain conditions, or corrupted downloads.
Recently I ran into this kind of problem while developing a network program on some of our company's computers, and I eventually traced it to a checksum offload error. It was really hard to find the cause.
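For readers who have not met it, the checksum that such cards compute in hardware is the 16-bit one's-complement Internet checksum described in RFC 1071. The sketch below is only an illustration in Python: the function names are made up for this example, and it leaves out the pseudo-header that real TCP and UDP checksums also cover. It shows why a card that writes a wrong value makes the receiver discard an otherwise intact packet.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # big-endian 16-bit word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


def is_valid(data_with_checksum: bytes) -> bool:
    """Data followed by its own correct checksum sums to zero."""
    return internet_checksum(data_with_checksum) == 0


if __name__ == "__main__":
    payload = bytes.fromhex("0001f203f4f5f6f7")
    checksum = internet_checksum(payload)
    print(hex(checksum))
    print(is_valid(payload + checksum.to_bytes(2, "big")))
```

Comparing a capture taken on the sending machine with one taken on the receiving machine is one way to tell whether the value on the wire is wrong, because captures on the sender are typically taken before the card fills in an offloaded checksum.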
After learning about this, I asked my ex-colleagues and searched the web. The problem is not very common, but it is widespread, and because the cause is so hard to find, many people have probably suffered from it already.
Although a faulty network card causes the problem, users who are not comfortable with computers can hardly be expected to update a driver or disable the checksum offload option. They probably don't even know what is causing their problem.
So my question is: does Microsoft have any plan to solve this problem? I think Microsoft could diagnose it, build a workaround, and ship the fix via Windows Update. That would make many affected users around the world happy, and network programmers too. ;-)
I have investigated some of the symptoms related to the checksum offload fault. They vary widely, from persistent and serious ones to ones that are hard to notice.
Sadly, I think anyone with a faulty NIC has to try this and that on their own to figure out the cause. We can hardly expect a helping hand, even for users who are computer-illiterate, and the hardware manufacturers are not going to recall their products. It's ridiculous. What the fun...
Are you sure it's faulty hardware and not external influence, e.g. by malware or over-protective antivirus/firewall systems?
As you're working in a company: swap the disks of two identical computers and see if the problem still appears. If it does, you've got a software problem. If not, fix the hardware.
I'm developing a commercial project on an ARM-based embedded board with a custom Linux kernel on it, using Ruby. The target environment of the project and the device is closed: no Ethernet, internet, I/O devices, etc. I want to protect my code/program so that it will only work on the specific machines I allow (so people can't just copy my code/program onto their own embedded boards and run it without permission). This can probably be done with the machine's MAC address, though I don't have any experience on the subject. I guess a simple if(device.MACAddr == "XX:XX....XX") wouldn't be dependable (not to mention that people can easily delete the check from my code). I can't use the Ruby obfuscators I found through Google, because the device doesn't run external C libraries for Ruby or anything like that, only pure Ruby code.
So, what are your suggestions? What type of approach should I take?
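To make the trade-off concrete, here is a minimal sketch of a check that is slightly less naive than a plain string comparison. It is written in Python rather than Ruby, and every name in it (device_fingerprint, is_authorized, the salt) is hypothetical. The program ships only salted hashes of the permitted addresses, so the addresses themselves are not readable in the source. uuid.getnode() is the standard-library call that returns a hardware address as a 48-bit integer, and it can fall back to a random number when no address is readable. None of this stops someone who has the source from deleting the call.

```python
import hashlib
import hmac
import uuid


def device_fingerprint(mac, salt):
    """Salted SHA-256 of a 48-bit hardware address (hypothetical scheme)."""
    return hashlib.sha256(salt + mac.to_bytes(6, "big")).hexdigest()


def is_authorized(allowed_fingerprints, salt, mac=None):
    """Return True if this machine's address hashes to an allowed value."""
    if mac is None:
        mac = uuid.getnode()  # may be random if no real address is found
    fingerprint = device_fingerprint(mac, salt)
    # compare_digest avoids leaking the match position through timing
    return any(hmac.compare_digest(fingerprint, item)
               for item in allowed_fingerprints)


if __name__ == "__main__":
    salt = b"example-salt"  # assumption: any fixed per-product value
    allowed = {device_fingerprint(0x001122334455, salt)}
    print(is_authorized(allowed, salt))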
You can't really protect it. It's hard enough protecting native code, and even that basically fails if someone really wants to copy the software.
Basically, do very little, if anything, to secure it; it's mostly wasted time and effort.
This is isomorphic to the problem of DRM. You're giving a person both a lock and the key to that lock, and trying to stop that person from using the key in a way you don't like.
Therefore, I suggest using the same methods that other DRM users do: put your terms in the license, and sue them if they violate it. You need to use the law to enforce the other terms of the license, anyway.
The question says it all. If you have a bug that multiple users report, but there is no record of the bug occurring in the log, nor can the bug be repeated, no matter how hard you try, how do you fix it? Or even can you?
I am sure this has happened to many of you out there. What did you do in this situation, and what was the final outcome?
Edit:
I am more interested in what was done about an unfindable bug, not an unresolvable bug. Unresolvable bugs are such that you at least know that there is a problem and have a starting point, in most cases, for searching for it. In the case of an unfindable one, what do you do? Can you even do anything at all?
Language
Different programming languages will have their own flavour of bugs.
C
Adding debug statements can make the problem impossible to duplicate because the debug statement itself shifts pointers far enough to avoid a SEGFAULT. Bugs that vanish when observed like this are known as Heisenbugs. Pointer issues are arduous to track and replicate, but debuggers can help (such as GDB and DDD).
Java
An application that has multiple threads might only show its bugs with a very specific timing or sequence of events. Improper concurrency implementations can cause deadlocks in situations that are difficult to replicate.
JavaScript
Some web browsers are notorious for memory leaks. JavaScript code that runs fine in one browser might cause incorrect behaviour in another browser. Using third-party libraries that have been rigorously tested by thousands of users can be advantageous to avoid certain obscure bugs.
Environment
Depending on the complexity of the environment in which the application (that has the bug) is running, the only recourse might be to simplify the environment. Does the application run:
on a server?
on a desktop?
in a web browser?
In what environment does the application produce the problem?
development?
test?
production?
Exit extraneous applications, kill background tasks, stop all scheduled events (cron jobs), eliminate plug-ins, and uninstall browser add-ons.
Networking
As networking is essential to so many applications:
Ensure stable network connections, including wireless signals.
Does the software reconnect after network failures robustly?
Do all connections get closed properly so as to release file descriptors?
Are people using the machine who shouldn't be?
Are rogue devices interacting with the machine's network?
Are there factories or radio towers nearby that can cause interference?
Do packet sizes and frequency fall within nominal ranges?
Are packets being monitored for loss?
Are all network devices adequate for heavy bandwidth usage?
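As an illustration of the reconnect question in the checklist above, here is a minimal sketch of connecting with a bounded number of retries and exponential backoff. The function name and its parameters are assumptions made for this example; socket.create_connection is the real standard-library call. The connect and sleep arguments exist only so the logic can be exercised without a network.

```python
import socket
import time


def connect_with_retry(host, port, attempts=5, base_delay=0.5,
                       connect=socket.create_connection, sleep=time.sleep):
    """Try to connect, doubling the wait after every failed attempt."""
    last_error = None
    for attempt in range(attempts):
        try:
            return connect((host, port), timeout=5)
        except OSError as exc:
            last_error = exc
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)  # 0.5 s, 1 s, 2 s, ...
    raise ConnectionError(f"gave up after {attempts} attempts") from last_error
```

Using the returned socket in a with statement closes it on every exit path, which covers the file descriptor question as well.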
Consistency
Eliminate as many unknowns as possible:
Isolate architectural components.
Remove non-essential, or possibly problematic (conflicting), elements.
Deactivate different application modules.
Remove all differences between production, test, and development. Use the same hardware. Follow the exact same steps, perfectly, to set up the computers. Consistency is key.
Logging
Use liberal amounts of logging to correlate the time events happened. Examine logs for any obvious errors, timing issues, etc.
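A minimal sketch of what such logging could look like in Python, assuming the standard logging module. The helper name make_logger and the exact format string are choices made for this example. Each line carries a millisecond timestamp and the thread name, so events from different components can be lined up afterwards.

```python
import logging


def make_logger(name, stream=None):
    """Logger whose lines start with a sortable millisecond timestamp."""
    handler = logging.StreamHandler(stream)  # stderr when stream is None
    handler.setFormatter(logging.Formatter(
        "%(asctime)s.%(msecs)03d %(levelname)s %(threadName)s "
        "%(name)s: %(message)s",
        datefmt="%Y-%m-%dT%H:%M:%S",
    ))
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.addHandler(handler)
    logger.propagate = False  # avoid duplicate lines via the root logger
    return logger


if __name__ == "__main__":
    log = make_logger("orders")
    log.info("request received id=%s", 42)
```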
Hardware
If the software seems okay, consider hardware faults:
Are the physical network connections solid?
Are there any loose cables?
Are chips seated properly?
Do all cables have clean connections?
Is the working environment clean and free of dust?
Have any hidden devices or cables been damaged by rodents or insects?
Are there bad blocks on drives?
Are the CPU fans working?
Can the motherboard power all components? (CPU, network card, video card, drives, etc.)
Could electromagnetic interference be the culprit?
And mostly for embedded:
Insufficient supply bypassing?
Board contamination?
Bad solder joints / bad reflow?
CPU not reset when supply voltages are out of tolerance?
Bad resets because supply rails are back-powered from I/O ports and don't fully discharge?
Latch-up?
Floating input pins?
Insufficient (sometimes negative) noise margins on logic levels?
Insufficient (sometimes negative) timing margins?
Tin whiskers?
ESD damage?
ESD upsets?
Chip errata?
Interface misuse (e.g. I2C off-board or in the presence of high-power signals)?
Race conditions?
Counterfeit components?
Network vs. Local
What happens when you run the application locally (i.e., not across the network)? Are other servers experiencing the same issues? Is the database remote? Can you use a local database?
Firmware
In between hardware and software is firmware.
Is the computer BIOS up-to-date?
Is the BIOS battery working?
Are the BIOS clock and system clock synchronized?
Time and Statistics
Timing issues are difficult to track:
When does the problem happen?
How frequently?
What other systems are running at that time?
Is the application time-sensitive (e.g., will leap days or leap seconds cause issues)?
Gather hard numerical data on the problem. A problem that might, at first, appear random, might actually have a pattern.
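One simple way to look for such a pattern is to bucket the report times. The sketch below assumes the reports are available as ISO 8601 timestamp strings; the function name is illustrative.

```python
from collections import Counter
from datetime import datetime


def reports_by_hour(timestamps):
    """Count bug reports per hour of day from ISO 8601 timestamps."""
    hours = Counter(datetime.fromisoformat(ts).hour for ts in timestamps)
    return dict(sorted(hours.items()))


if __name__ == "__main__":
    reports = [
        "2024-01-01T02:15:00",
        "2024-01-02T02:40:00",
        "2024-01-03T14:00:00",
    ]
    print(reports_by_hour(reports))
```

A cluster around one hour would suggest checking what else runs at that time, such as backups or scheduled jobs.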
Change Management
Sometimes problems appear after a system upgrade.
When did the problem first start?
What changed in the environment (hardware and software)?
What happens after rolling back to a previous version?
What differences exist between the problematic version and good version?
Library Management
Different operating systems have different ways of distributing conflicting libraries:
Windows has DLL Hell.
Unix can have numerous broken symbolic links.
Java library files can be equally nightmarish to resolve.
Perform a fresh install of the operating system, and include only the supporting software required for your application.
Java
Make sure every library is used only once. Sometimes application containers have a different version of a library than the application itself. This might not be possible to replicate in the development environment.
Use a library management tool such as Maven or Ivy.
Debugging
Code a detection method that triggers a notification (e.g., log, e-mail, pop-up, pager beep) when the bug happens. Use automated testing to submit data into the application. Use random data. Use data that covers known and possible edge cases. Eventually the bug should reappear.
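The idea can be sketched as a small randomized test loop. Everything here is an assumption made for illustration: func is the code under suspicion, check is the detection method, and make_input builds random data from a seeded generator so that a failure can be replayed.

```python
import logging
import random


def find_failing_input(func, check, make_input, trials=10_000, seed=0):
    """Feed random inputs to func until check fails or func raises.

    Returns (trial_number, input) for the first failure, or None.
    """
    rng = random.Random(seed)  # fixed seed makes the run repeatable
    for trial in range(trials):
        data = make_input(rng)
        try:
            ok = check(data, func(data))
        except Exception:
            ok = False
        if not ok:
            logging.error("bug reproduced on trial %d with input %r",
                          trial, data)
            return trial, data
    return None


if __name__ == "__main__":
    def mean(values):
        return sum(values) / len(values)  # fails on an empty list

    def short_list(rng):
        return [rng.randint(-5, 5) for _ in range(rng.randint(0, 3))]

    print(find_failing_input(mean, lambda data, result: True, short_list))
```

The logging call stands in for whatever notification channel is appropriate, such as e-mail or a pager.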
Sleep
It is worth reiterating what others have mentioned: sleep on it. Spend time away from the problem, finish other tasks (like documentation). Be physically distant from computers and get some exercise.
Code Review
Walk through the code, line-by-line, and describe what every line does to yourself, a co-worker, or a rubber duck. This may lead to insights on how to reproduce the bug.
Cosmic Radiation
Cosmic rays can flip bits. This is not as big a problem as it was in the past, thanks to modern error checking of memory. Software for hardware that leaves Earth's protection is subject to issues that simply cannot be replicated due to the randomness of cosmic radiation.
Tools
Sometimes, albeit infrequently, the compiler will introduce a bug, especially for niche tools (e.g. a C micro-controller compiler suffering from a symbol table overflow). Is it possible to use a different compiler? Could any other tool in the tool-chain be introducing issues?
If it's a GUI app, it's invaluable to watch the customer generate the error (or try to). They'll no doubt be doing something you'd never have guessed they were doing (not wrongly, just differently).
Otherwise, concentrate your logging in that area. Log most everything (you can pull it out later) and get your app to dump its environment as well. e.g. machine type, VM type, encoding used.
Does your app report a version number, a build number, etc.? You need this to determine precisely which version you're debugging (or not!).
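A minimal sketch of such an environment dump, here for a Python application. APP_VERSION is a hypothetical constant that would normally come from the build, and the set of fields is only a starting point.

```python
import locale
import platform
import sys

APP_VERSION = "1.4.2"  # hypothetical; normally injected by the build


def environment_report():
    """Collect the facts most often missing from a bug report."""
    return {
        "app_version": APP_VERSION,
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "os": platform.platform(),
        "machine": platform.machine(),
        "default_encoding": sys.getdefaultencoding(),
        "filesystem_encoding": sys.getfilesystemencoding(),
        "preferred_encoding": locale.getpreferredencoding(False),
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}={value}")
```

Writing this once at start-up, at the top of every log file, means each report arrives with its version and environment attached.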
If you can instrument your app (e.g. by using JMX if you're in the Java world) then instrument the area in question. Store stats e.g. requests+parameters, time made, etc. Make use of buffers to store the last 'n' requests/responses/object versions/whatever, and dump them out when the user reports an issue.
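The "last 'n' requests" buffer can be sketched with a bounded deque. The class and method names are made up for this example. Old entries fall off the front automatically once the capacity is reached.

```python
import time
from collections import deque


class RecentRequests:
    """Keep only the most recent request/response pairs in memory."""

    def __init__(self, capacity=100):
        self._items = deque(maxlen=capacity)  # oldest entries are dropped

    def record(self, request, response):
        self._items.append((time.time(), request, response))

    def dump(self):
        """Return a snapshot, oldest first, e.g. to attach to a report."""
        return list(self._items)


if __name__ == "__main__":
    recent = RecentRequests(capacity=3)
    for number in range(5):
        recent.record(f"GET /item/{number}", 200)
    for stamp, request, response in recent.dump():
        print(stamp, request, response)
```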
If you can't replicate it, you may fix it, but can't know that you've fixed it.
I've made my best explanation about how the bug was triggered (even if I didn't know how that situation could come about), fixed that, and made sure that if the bug surfaced again, our notification mechanisms would let a future developer know the things that I wish I had known. In practice, this meant adding log events when the paths which could trigger the bug were crossed, and metrics for related resources were recorded. And, of course, making sure that the tests exercised the code well in general.
Deciding what notifications to add is a feasibility and triage question. So is deciding on how much developer time to spend on the bug in the first place. It can't be answered without knowing how important the bug is.
I've had good outcomes (didn't show up again, and the code was better for it), and bad (spent too much time not fixing the problem, whether the bug ended up fixed or not). That's what estimates and issue priorities are for.
Sometimes I just have to sit and study the code until I find the bug. Try to prove that the bug is impossible, and in the process you may figure out where you might be mistaken. If you actually succeed in convincing yourself it's impossible, assume you messed up somewhere.
It may help to add a bunch of error checking and assertions to confirm or deny your beliefs/assumptions. Something may fail that you'd never expect to.
It can be difficult, and sometimes nearly impossible. But my experience is that you will sooner or later be able to reproduce and fix the bug if you spend enough time on it (whether that time is worth spending is another matter).
General suggestions that might help in this situation.
Add more logging, if possible, so that you have more data the next time the bug appears.
Ask the users whether they can replicate the bug. If they can, have them replicate it while you watch over their shoulder, and hopefully you will find out what triggers it.
Make random changes until something works :-)
Assuming you have already added all the logging that you think would help and it didn't... two things spring to mind:
Work backwards from the reported symptom. Think to yourself: "If I wanted to produce the symptom that was reported, what bit of code would I need to be executing, and how would I get to it, and how would I get to that?" D leads to C leads to B leads to A. Accept that if a bug is not reproducible, then normal methods won't help. I've had to stare at code for many hours with this kind of thought process going on to find some bugs. Usually it turns out to be something really stupid.
Remember Bob's first law of debugging: if you can't find something, it's because you're looking in the wrong place :-)
Think. Hard. Lock yourself away, admit no interruptions.
I once had a bug where the evidence was a hex dump of a corrupt database. The chains of pointers were systematically screwed up. All the user's programs, and our database software, worked faultlessly in testing. I stared at it for a week (it was an important customer), and after eliminating dozens of possible ideas, I realised that the data was spread across two physical files and the corruption occurred where the chains crossed file boundaries. I realized that if a backup/restore operation failed at a critical point, the two files could end up "out of sync", restored to different time points. If you then ran one of the customer's programs on the already-corrupt data, it would produce exactly the knotted chains of pointers I was seeing. I then demonstrated a sequence of events that reproduced the corruption exactly.
Modify the code where you think the problem is happening so that extra debug info is recorded somewhere. The next time it happens, you will have what you need to solve the problem.
There are two types of bugs you can't replicate. The kind you discovered, and the kind someone else discovered.
If you discovered the bug, you should be able to replicate it. If you can't replicate it, then you simply haven't considered all of the contributing factors leading towards the bug. This is why whenever you have a bug, you should document it. Save the log, get a screenshot, etc. If you don't, then how can you even prove the bug really exists? Maybe it's just a false memory?
If someone else discovered a bug, and you can't replicate it, obviously ask them to replicate it. If they can't replicate it, then you try to replicate it. If you can't replicate it quickly, ignore it.
I know that sounds bad, but I think it is justified. The amount of time it will take you to replicate a bug that someone else discovered is very large. If the bug is real, it will happen again naturally. Someone, maybe even you, will stumble across it again. If it is difficult to replicate, then it is also rare, and probably won't cause too much damage if it happens a few more times.
You can be a lot more productive if you spend your time actually working, fixing other bugs and writing new code, than you will be trying to replicate a mystery bug that you can't even guarantee actually exists. Just wait for it to appear again naturally, then you will be able to spend all your time fixing it, rather than wasting your time trying to reveal it.
Discuss the problem, read code, often quite a lot of it. Often we do it in pairs, because you can usually eliminate the possibilities analytically quite quickly.
Start by looking at what tools you have available to you. For example, crashes on a Windows platform go to WinQual, so if this is your case you now have crash dump information. Do you have static analysis tools that spot potential bugs, runtime analysis tools, or profiling tools?
Then look at the input and output. Anything similar about the inputs in situations when users report the error, or anything out of place in the output? Compile a list of reports and look for patterns.
Finally, as David stated, stare at the code.
Ask the user to give you remote access to their computer and see everything for yourself. Or ask the user to make a short video of how they reproduce the bug and send it to you.
Sure, neither is always possible, but when they are, they may clarify some things. The usual ways of finding bugs are still the same: isolating the parts that may cause the bug, trying to understand what's happening, and narrowing down the code that could be responsible.
There are tools such as gotomeeting.com that let you share a screen with your user and observe the behaviour. There could be many potential causes, such as the amount of software installed on their machine or some utility conflicting with your program. GoToMeeting is not the only option, and there could be timeout issues or a slow internet connection to deal with.
Most of the time, I would say, software does not report correct error messages. In Java and C#, for example, track every exception: don't catch everything everywhere, but keep one point where you can catch and log. UI bugs are difficult to solve unless you use remote desktop tools. And much of the time the bug could even be in third-party software.
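The "one point where you can catch and log" idea has a direct analogue in Python, shown here as a hedged sketch rather than the Java or C# mechanism the answer has in mind. sys.excepthook is the real standard-library hook for uncaught exceptions; the installer function name is made up for this example.

```python
import logging
import sys


def install_exception_logger(logger):
    """Log every uncaught exception in one place, with its traceback."""
    def hook(exc_type, exc_value, exc_traceback):
        if issubclass(exc_type, KeyboardInterrupt):
            # let Ctrl+C behave as usual
            sys.__excepthook__(exc_type, exc_value, exc_traceback)
            return
        logger.critical("Unhandled exception",
                        exc_info=(exc_type, exc_value, exc_traceback))

    sys.excepthook = hook
    return hook


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    install_exception_logger(logging.getLogger("app"))
    raise RuntimeError("example failure")  # ends up in the log
```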
If you work on an application of any significant size, you probably have a queue of 1,000 bugs, most of which are definitely reproducible.
Therefore, I'm afraid I'd probably close the bug as WORKSFORME (Bugzilla) and then get on fixing some more tangible bugs. Or doing whatever the project manager decides to do.
Certainly making random changes is a bad idea, even if they're localised, because you risk introducing new bugs.
I'm looking for the concept to spawn a process such that:
it has only access to certain libraries/APIs
it cannot access the file system or only specific parts
it can do least harm should malicious code run in it
This concept is known as sandbox or jail.
This needs to be done for each major operating system (Windows, Mac OS X and Linux), and the question is conceptual (as in what to do, which APIs to use and what to observe) rather than language-specific.
answer requirements
I really want to accept an answer and give you 20 points for that. I cannot accept my own answer, and I don't have it yet anyway. So if you really want your answer to be accepted, please observe:
The answer has to be specific and complete
By specific I mean that it is more than a pointer to some resource on the internet. It has to at least summarize what the resource says about the topic.
It may or may not contain example code, but if it does please write it in C
I cannot accept an answer that is 2/3 complete even if the 2/3 that are there are perfect.
this question FAQ
Is this homework? No.
Why do you ask this like a homework question? If you ask a specific question and want a specific answer, and you know what that answer should look like even though you don't know the answer, that's the style of question you get.
If you know what it should look like, why do you ask? 1) Because I don't know the whole answer. 2) Because there's no single place on the internet that contains all the details for this question. Please also read the stackoverflow FAQ
Why is the main part of your question how to answer this question? Because nobody reads the FAQ.
Mac OS X has a sandbox facility code-named Seatbelt. The public API for it is documented in the sandbox(7), sandbox_init(3), and related manual pages. The public API is somewhat limited, but the facility itself is very powerful. While the public API only lets you choose from some pre-defined sandboxes (e.g. “All sockets-based networking is prohibited”), you can also use the more powerful underlying implementation which allows you to specify exactly what operating system resources are available via a Scheme-like language. For example, here is an excerpt of the sandbox used for portmap:
(allow process-exec (regex #"^/usr/sbin/portmap$"))
(allow file-read-data file-read-metadata (regex
    #"^/etc"
    #"^/usr/lib/.*\.dylib$"
    #"^/var"
    #"^/private/var/db/dyld/"
    #"^/dev/urandom$"))
(allow file-write-data (regex
    #"^/dev/dtracehelper$"))
You can see many sandboxes used by the system in /usr/share/sandbox. It is easy to experiment with sandboxes by using the sandbox-exec(1) command.
For Windows, you may want to have a look at David LeBlanc’s “Practical Sandboxing” talk given at Black Hat USA 2007. Windows has no built-in sandboxing technology per se, so the techniques described leverage an incomplete mechanism introduced with Windows 2000 called SAFER. By using restricted tokens, one can create a process that has limited access to operating system resources.
For Linux, you might investigate the complicated SELinux mechanism (see the SELinux home page and a HOWTO). It is used by Red Hat, for example, to harden some system services in some of their products.
For Windows there is a sandbox in Google Chrome. You may want to investigate it. It uses a liberal BSD-like license.
For Linux there would be good old chroot or more sophisticated http://plash.beasts.org/wiki/.
OS X since Leopard has some SELinux-like protection available.
The site codepad.org has a good "About" page on how they safely allow the execution of any code snippets.
Code execution is handled by a supervisor based on geordi. The strategy is to run everything under ptrace, with many system calls disallowed or ignored. Compilers and final executables are both executed in a chroot jail, with strict resource limits. The supervisor is written in Haskell.
When your app is remote code execution, you have to expect security problems. Rather than rely on just the chroot and ptrace supervisor, I've taken some additional precautions:
The supervisor processes run on virtual machines, which are firewalled such that they are incapable of making outgoing connections.
The machines that run the virtual machines are also heavily firewalled, and restored from their source images periodically.
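The "strict resource limits" part of that description can be sketched on a POSIX system with setrlimit. This is only an illustration, not codepad's supervisor: it does not use ptrace or chroot, the function names are made up for this example, and the limit values are arbitrary. Python is used for brevity. The underlying system call is setrlimit(2), which is equally available from C.

```python
import resource
import subprocess
import sys


def _cap(limit, soft, hard):
    """Lower one resource limit without exceeding the current hard limit."""
    _, current_hard = resource.getrlimit(limit)
    if current_hard != resource.RLIM_INFINITY:
        hard = min(hard, current_hard)
        soft = min(soft, hard)
    resource.setrlimit(limit, (soft, hard))


def run_limited(argv, cpu_seconds=1, max_file_bytes=1 << 20, timeout=10):
    """Run argv as a child process with CPU, file-size and core limits."""
    def apply_limits():
        # runs in the child, after fork and before exec
        _cap(resource.RLIMIT_CPU, cpu_seconds, cpu_seconds + 1)
        _cap(resource.RLIMIT_FSIZE, max_file_bytes, max_file_bytes)
        _cap(resource.RLIMIT_CORE, 0, 0)

    return subprocess.run(argv, preexec_fn=apply_limits,
                          capture_output=True, timeout=timeout)


if __name__ == "__main__":
    result = run_limited([sys.executable, "-c", "while True: pass"])
    print(result.returncode)  # negative: the child was killed by a signal
```

Resource limits only bound how much damage a runaway process can do. They do not restrict which files or system calls it can use, which is why the quoted setup layers chroot, ptrace and firewalled virtual machines on top.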
FreeBSD has specific concepts of jails, and Solaris has containers. Depending on what you're looking for, these may help.
chroot jails can help to limit what an application can do (though any app with root privileges can escape a jail), and they're available on most UNIXen, including OS X.
As for Windows, I'm not sure. If there was an easy way to sandbox a Windows app, most of them would be a lot more secure by now, I'm sure.
On Windows (2000 and later) you can use Job objects to restrict processes.
If you really want a technique that will work with all these platforms, as opposed to a separate solution for each platform, then I think your only answer is to set up a virtual machine for each testing environment. You can restore back to a snapshot at any time.
Another big advantage of using virtualization is that you can have all of the testing environments with their guest operating systems all on the same box.
For Linux, there is AppArmor. Unfortunately, the project is somewhat on hiatus.
Another sandboxing-alternative is VServer, which uses virtualization.
Generally any virtual private server will do:
Linux VServer
http://linux-vserver.org/Welcome_to_Linux-VServer.org
Parallels Virtuozzo Containers
http://www.parallels.com/products/pvc/
and, as was mentioned, FreeBSD and Solaris have their own implementations.
Oh, actually I've just noticed you're asking for it to work on ANY OS. That might be a bit complicated. I think the least effort is to reuse some VM that supports a level of sandboxing, such as:
Java
.NET
I'm not an expert on the topic, but I think the standard answer for Linux is to define an SELinux policy with the right capabilities for the process.