Any pointers to fix the Unix millennium bug or Y2k38 problem? - linux-kernel

I've been reviewing the year 2038 problem (Unix Millennium Bug).
I read the article about this on Wikipedia, where I read about a solution for this problem.
Now I would like to change the time_t data type to an unsigned 32-bit integer, which would keep the system working until 2106. I have Linux kernel 2.6.23 with RTPatch on PowerPC.
Is there any patch available that would allow me to change the time_t data type to an unsigned 32-bit integer for PowerPC? Or any patch available to resolve this bug?

time_t is actually defined in your libc implementation, and not the kernel itself.
The kernel provides various mechanisms for obtaining the current time (in the form of system calls), many of which already support more than 32 bits of precision. The problem is actually your libc implementation (glibc on most desktop Linux distributions), which, after fetching the time from the kernel, returns it to your application as a 32-bit signed integer.
While one could theoretically change the definition of time_t in your libc implementation, in practice it would be fairly complicated: such a change would alter the ABI of libc, in turn requiring every application that uses libc to be recompiled from source.
The easiest solution instead is to upgrade your system to a 64-bit distribution, where time_t is already defined to be a 64-bit data type, avoiding the problem altogether.
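If it helps while evaluating options, here is a minimal sketch (plain C, nothing PowerPC-specific) that simply reports how wide time_t is in the toolchain you are actually building with; it only inspects what your libc provides, it does not change anything:
#include <stdio.h>
#include <time.h>

int main(void)
{
    printf("sizeof(time_t) = %zu bytes\n", sizeof(time_t));
    if (sizeof(time_t) == 4)
        printf("32-bit time_t: the signed counter rolls over in January 2038\n");
    else
        printf("time_t is wider than 32 bits: no year-2038 rollover\n");
    return 0;
}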

Regarding the 64-bit distribution suggested here, may I note the issues with implementing that. There are many 32-bit non-PAE computers in the embedded industry, and replacing them with 64-bit machines is going to be a LARGE problem. Everyone is used to desktops that get replaced or upgraded frequently, but Linux OS suppliers need to get serious about providing a different option. It's not as if a 32-bit computer is flawed, useless, or will wear out in 16 years, and it doesn't take a 64-bit computer to monitor analog inputs, control equipment, and report alarms.

Related

powerpc-elf abi instead of elfv2 on 64 bit powerpc systems, is it possible? [closed]

I have been trying to cross compile gcc for the 64-bit PowerPC architecture. However, the GCC configuration lacks a "powerpc64-elf" target. It has "powerpc64-linux" and "powerpc-rtems" (which can produce 32/64-bit code).
Digging further, I have read the following document (which describes the ABI used by Linux for the powerpc64 architecture):
https://refspecs.linuxfoundation.org/ELF/ppc64/PPC-elf64abi.html
The specification introduces an additional section called the TOC. It also uses the ELFv2 format to address these changes.
My first (and maybe unrelated) question is:
Is using the TOC for access to global variables beneficial? Instead of a single load instruction, we have to go through the TOC table and then use a load instruction.
Does something prevent us from using a single load on PowerPC systems?
At this point, I am fairly uncertain about the advantages of (or even the necessity of) ELFv2 over ELF.
My actual question is: when building GCC, if I were to just change the ABI to the default powerpc ABI, would the compiler it produces still generate valid 64-bit code?
I am guessing that even if this works, I may not be able to utilize some components of the hardware?
Thanks in advance.
Edit:
Clarification:
I forgot to mention that I am not planning to run the programs compiled with the modified gcc on existing Linux targets. Rather, I want to simplify the ABI for an OS architecture support package (possibly using the same architecture support for both platforms).
In this case, I want to run 64-bit code (without the 4 GB memory limit) using the ELFv1 ABI on the powerpc64 architecture with the powerpc-linux target (rather than powerpc64-linux).
TOC/GOT overheads:
Following Bill's answer about GOT and TOC overheads, I compared the dumps of simple programs built with the powerpc32 and powerpc64 compilers. As he described, the GOT also uses an extra level of indirection. The TOC seems to introduce 2 additional instructions (a load immediate followed by an add immediate, which are trivial).
Edit2: In the end, I opted to use the standard ABI; the compiler and the OS need to agree on it at this point.
But I did create a custom configuration of gcc by studying other OS targets (like linux and rtems) and following this tutorial: Structure of a gcc backend.
Short answer: no, you can't use powerpc-linux as your target for building 64-bit PowerPC code.
You need to cross compile for the target of the Linux distribution where your code is intended to run. For most modern distributions, the code will run in little-endian mode, so you need to target powerpc64le-linux. Some older distributions run in big-endian mode, with target powerpc64-linux. Generally speaking, powerpc64-linux uses the ELF v1 ABI, and powerpc64le-linux uses the ELF v2 ABI. Both use a TOC pointer.
PowerPC is a little unique in its use of a compiler-managed table-of-contents (TOC) and a linker-managed global offset table (GOT), as opposed to a single GOT that many targets use. But in practice the overhead is not that different, thanks to a variety of compiler and linker optimizations. In both systems, an extra level of indirection is necessary when accessing global shared variables because addresses of these are not known until run time, and are satisfied by the dynamic linker (except when using pure static linking, which is uncommon today).
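To make the extra level of indirection concrete, here is a minimal C sketch; the commented instruction sequence is only a rough illustration of what a powerpc64 compiler typically emits for a global access through the TOC, and the exact code depends on the compiler, optimization level, and whether the variable can be resolved locally (shared_counter is a hypothetical global defined in another module):
/* shared_counter is a hypothetical global defined in another module. */
extern int shared_counter;

int read_counter(void)
{
    /* Roughly:
       addis r9, r2, shared_counter@toc@ha   # r2 holds the TOC pointer
       ld    r9, shared_counter@toc@l(r9)    # load the variable's address
       lwz   r3, 0(r9)                       # load the value itself
       i.e. one extra load through the TOC/GOT before the data access. */
    return shared_counter;
}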
In short, don't worry about the TOC, and set up your cross-compile for the environment in which your code is expected to run.

ARM softfp vs hardfp performance

I have an ARM based platform with a Linux OS. Even though its gcc-based toolchain supports both hardfp and softfp, the vendor recommends using softfp and the platform is shipped with a set of standard and platform-related libraries which have only softfp version.
I'm writing computation-intensive (NEON) AI code based on OpenCV and TensorFlow Lite. Following the vendor guide, I have built these with the softfp option. However, I have a feeling that my code underperforms compared to otherwise similar hardfp platforms.
Does code performance depend on the softfp/hardfp setting? Do I understand correctly that all the .o and .a files the compiler produces to build my program also use the softfp convention, which is less efficient? If so, is there any trick to using the hardfp calling convention internally but softfp for the external libraries?
Normally, all objects that are linked together need to use the same float ABI. So if you need to use this softfp-only library, I'm afraid you have to compile your own software with softfp too.
I had the same question about mixing ABIs. See here
Regarding performance: the cost of softfp compared to hardfp is that (floating-point) function parameters are passed through the general-purpose registers instead of the FPU registers, which requires some extra copying between registers. As old_timer said, it is impossible to estimate the performance loss in general. If you have a single huge function with many float operations, the performance will be the same. If you have many small function calls with many floating-point arguments and few operations, it will be dramatically slower.
The softfp option only affects the parameter passing.
In other words, unless you are passing lots of float type arguments while calling functions, there won't be any measurable performance hit compared to hardfp.
And since well-designed projects rely heavily on passing pointers to structures instead of many individual values, I would stick with softfp.
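To illustrate the point about parameter passing (a sketch, not a benchmark; the function names are made up):
/* With -mfloat-abi=softfp the four floats cross the call boundary in
   core registers (r0-r3) and must be copied into VFP/NEON registers
   inside the callee; with -mfloat-abi=hard they arrive in s0-s3. */
float sum_by_value(float a, float b, float c, float d)
{
    return a + b + c + d;
}

/* A single pointer is passed identically under either float ABI, so a
   call like this is unaffected by the softfp/hardfp choice. */
typedef struct { float x, y, z, w; } vec4;

float sum_by_pointer(const vec4 *v)
{
    return v->x + v->y + v->z + v->w;
}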

Moving to 64-bit on OS X?

What is the best practice for moving to 64-bit on OS X? Using the 10.6 SDK and 64-bit Intel as my SDK and target.
I have int32 types to change
Does OS X have an 'int64' or would one use a 'long long'?
Where might I find a resource to available data types?
What other issues are there?
Apple has exactly the documentation you want, called the 64-Bit Transition Guide. OS X uses the LP64 model, so you should use
#ifdef __LP64__
etc. to conditionally compile things according to the bit width, especially if you want your code to be 32-bit/64-bit clean.
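A minimal sketch of that kind of conditional compilation (the my_word typedef is just an example name):
#include <stdio.h>

#ifdef __LP64__
typedef long my_word;            /* 64-bit build: long is 64 bits under LP64 */
#else
typedef int my_word;             /* 32-bit build: int and long are both 32 bits */
#endif

int main(void)
{
    printf("my_word is %zu bytes in this build\n", sizeof(my_word));
    return 0;
}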
For Cocoa, see 64 bit Transition Guide for Cocoa. There, NSInteger has an appropriate bit width according to the mode, so you don't have to deal with the bit width yourself.
Both long and long long are 64-bit types when building 64-bit on OS X. In addition, you can use the int64_t and uint64_t types defined in <stdint.h> if you need to specify that an integer is exactly 64 bits wide.
You generally don't need to change existing int32 types in your program, unless they're being used to perform pointer arithmetic (or otherwise depend on them being the same size as a pointer). 32 bit arithmetic continues to work just fine in 64-bit programs. If you do have variables that must be the same size as a pointer, use the uintptr_t type, which will work in both 32- and 64-bit builds.
The other situation where you might need to make changes is if an API expects to be passed (or returns) a size_t, long, or intptr_t, and you've been using int all this time instead of what the function in question actually specifies. It will have worked in 32-bit builds, but may introduce errors when built for 64-bit.
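A short sketch of the two situations described above:
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    char buffer[16] = "hello";

    /* Pointer-sized value: uintptr_t is correct in both 32- and 64-bit builds,
       whereas int would truncate the address in a 64-bit build. */
    uintptr_t addr = (uintptr_t)buffer;

    /* strlen() returns size_t; storing the result in an int can truncate
       (and trigger warnings) when built for 64-bit. */
    size_t len = strlen(buffer);

    printf("address 0x%" PRIxPTR ", length %zu\n", addr, len);
    return 0;
}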
Yuji's suggestion of reading the 64-bit transition guides is excellent advice.

Non-Linux Implementations of boost::random_device

Currently, Boost only implements the random_device class for Linux (maybe *nix) systems. Does anyone know of existing implementations for other OS-es? Ideally, these implementations would be open-source.
If none exist, how should I go about implementing a non-deterministic RNG for Windows as well as Mac OS X? Do API calls exist in either environment that would provide this functionality? Thanks (and sorry for all the questions)!
On Mac OS X, you can use /dev/random (since it's a *nix).
On Windows, you probably want the CryptGenRandom function. I don't know if there's an implementation of boost::random_device that uses it.
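For example, a minimal sketch of pulling raw bytes from /dev/random on any *nix (including Mac OS X):
#include <stdio.h>

int main(void)
{
    unsigned char buf[16];
    FILE *f = fopen("/dev/random", "rb");
    if (f == NULL) {
        perror("/dev/random");
        return 1;
    }
    if (fread(buf, 1, sizeof buf, f) != sizeof buf) {
        fclose(f);
        return 1;
    }
    fclose(f);
    for (size_t i = 0; i < sizeof buf; i++)
        printf("%02x", buf[i]);         /* print the entropy as hex */
    printf("\n");
    return 0;
}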
It depends on what you want to use your RNG for.
In general terms, you'll feed seed data into a buffer, generate hash values of the buffer, mix a counter into the result, and hash it some more. The reason for using a hash function is that good hashes are designed to yield random-looking results from input data that is more structured.
If you want to use it for cryptography, things get a lot hairier. You'll need to jump through more hoops to ensure that your RNG doesn't start repeating patterns within anything less than a reasonably safe period. I can recommend Bruce Schneier's "Practical Cryptography" (for an introduction to RNGs, and a sample implementation). He also has some RNG-related material up about his Yarrow RNG.
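Purely as an illustration of the hash-plus-counter idea above (not a vetted cryptographic RNG; the toy_ function names are made up, and it leans on OpenSSL's SHA256() just to have a hash function available):
#include <openssl/sha.h>
#include <string.h>

static unsigned char pool[SHA256_DIGEST_LENGTH];   /* hashed seed material */
static unsigned long counter;                      /* mixed into each output */

void toy_seed(const void *seed, size_t len)
{
    SHA256(seed, len, pool);                       /* hash the seed into the pool */
}

void toy_bytes(unsigned char out[SHA256_DIGEST_LENGTH])
{
    unsigned char tmp[sizeof pool + sizeof counter];
    memcpy(tmp, pool, sizeof pool);                        /* pool ...          */
    memcpy(tmp + sizeof pool, &counter, sizeof counter);   /* ... plus counter  */
    counter++;
    SHA256(tmp, sizeof tmp, out);                          /* hash it some more */
}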
If boost relies on /dev/random, chances are it works on MacOS also (as it has that).
On Windows there is CryptoAPI as part of the OS, and that provides a crypto quality RNG.
Also, I believe modern Intel CPUs have a hardware RNG on the chip - however you'd have to figure out how to get at that on each OS. Using the higher level APIs is probably a better bet.
edit: Here's a link to how the Intel RNG works
OpenSSL has a decent one.
#include <openssl/rand.h>
#include <time.h>
...
unsigned char buf[16];                      // example destination buffer
time_t now = time(NULL);
RAND_seed(&now, sizeof(now));               // seed before you ask for the first number
int success = RAND_bytes(buf, sizeof(buf)); // returns 1 on success
if (!success) die_loudly();
RAND_cleanup();                             // after you don't need any more numbers
Microsoft CryptoAPI has one on Win32. It requires a few more function calls; I'm not including the details here because each of these calls takes 2 to 5 arguments (a sketch follows the list below). Be careful: CryptoAPI seems to require the user to have a complete local profile (C:\Documents and Settings\user\Local Settings) correctly set up before it can give you a random number.
CryptAcquireContext // see docs
CryptGenRandom
CryptReleaseContext
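For reference, a minimal sketch of that call sequence (error handling trimmed; link against advapi32; CRYPT_VERIFYCONTEXT avoids needing a persistent key container):
#include <windows.h>
#include <wincrypt.h>
#include <stdio.h>

int main(void)
{
    HCRYPTPROV prov;
    unsigned char buf[16];

    if (!CryptAcquireContext(&prov, NULL, NULL, PROV_RSA_FULL,
                             CRYPT_VERIFYCONTEXT))
        return 1;                          /* no usable provider/profile */
    if (!CryptGenRandom(prov, sizeof buf, buf)) {
        CryptReleaseContext(prov, 0);
        return 1;
    }
    CryptReleaseContext(prov, 0);

    for (int i = 0; i < (int)sizeof buf; i++)
        printf("%02x", buf[i]);            /* print the random bytes as hex */
    printf("\n");
    return 0;
}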

What's the difference between Managed/Byte Code and Unmanaged/Native Code?

Sometimes it's difficult to describe to non-programmers and management types some of the things that "us programmers" may think are simple.
So...
How would you describe the difference between Managed Code (or Java Byte Code) and Unmanaged/Native Code to a Non-Programmer?
Managed Code == "Mansion House with an entire staff or Butlers, Maids, Cooks & Gardeners to keep the place nice"
Unmanaged Code == "Where I used to live in University"
Think of your desk: if you clean it up regularly, there's space to put what you're actually working on in front of you. If you don't clean it up, you run out of space.
That space is equivalent to computer resources like RAM, hard disk, etc.
Managed code lets the system automatically choose when and what to clean up. Unmanaged code makes the process "manual", in that the programmer needs to tell the system when and what to clean up.
I'm astonished by what emerges from this discussion (well, not really but rhetorically). Let me add something, even if I'm late.
Virtual machines (VMs) and garbage collection (GC) are decades old and are two separate concepts. Garbage-collected, native-code-compiled languages have also existed for decades (canonical example: ANSI Common Lisp; there is even a compile-time garbage-collected declarative language, Mercury, but apparently the masses scream at Prolog-like languages).
Suddenly GCed, byte-code-based VMs are a panacea for all IT diseases. Sandboxing of existing binaries (other examples here, here and here)? Principle of least authority (POLA)/capability-based security? Slim binaries (or their modern variant, SafeTSA)? Region inference? No, sir: Microsoft & Sun do not authorize us even to think about such perversions. No, better to rewrite our entire software stack for this wonderful(???) new(???) language§/API. As one of our hosts says, it's Fire and Motion all over again.
§ Don't be silly: I know that C# is not the only language that targets .NET/Mono; it's hyperbole.
Edit: it is particularly instructive to look at comments to this answer by S.Lott in the light of alternative techniques for memory management/safety/code mobility that I pointed out.
My point is that non-technical people don't need to be bothered with technicalities at this level of detail.
On the other hand, if they are impressed by Microsoft/Sun marketing, it is necessary to explain to them that they are being fooled: GCed, byte-code-based VMs are not the novelty they are claimed to be, they don't magically solve every IT problem, and alternatives to these implementation techniques exist (some of them better).
Edit 2: Garbage collection is a memory management technique and, like every implementation technique, needs to be understood to be used correctly. Look at how, at ITA Software, they bypass the GC to obtain good performance:
4 - Because we have about 2 gigs of static data we need rapid access to, we use C++ code to memory-map huge files containing pointerless C structs (of flights, fares, etc), and then access these from Common Lisp using foreign data accesses. A struct field access compiles into two or three instructions, so there's not really any performance penalty for accessing C rather than Lisp objects. By doing this, we keep the Lisp garbage collector from seeing the data (to Lisp, each pointer to a C object is just a fixnum, though we do often temporarily wrap these pointers in Lisp objects to improve debuggability). Our Lisp images are therefore only about 250 megs of "working" data structures and code.
...
9 - We can do 10 seconds of Lisp computation on a 800mhz box and cons less than 5k of data. This is because we pre-allocate all data structures we need and die on queries that exceed them. This may make many Lisp programmers cringe, but with a 250 meg image and real-time constraints, we can't afford to generate garbage. For example, rather than using cons, we use "cons!", which grabs cells from an array of 10,000,000 cells we've preallocated and which gets reset every query.
Edit 3: (to avoid misunderstanding) is GC better than fiddling directly with pointers? Most of the time, certainly, but there are alternatives to both. Is there a need to bother users with these details? I don't see any evidence that this is the case, besides dispelling some marketing hype when necessary.
I'm pretty sure the basic interpretation is:
Managed = resource cleanup managed by runtime (i.e. Garbage Collection)
Unmanaged = clean up after yourself (i.e. malloc & free)
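If the audience does want one concrete line of code, the unmanaged side looks roughly like this minimal C sketch (managed runtimes such as .NET or the JVM do the free step for you):
#include <stdlib.h>
#include <string.h>

void unmanaged_example(void)
{
    char *buffer = malloc(100);     /* ask the system for memory yourself */
    if (buffer == NULL)
        return;
    strcpy(buffer, "hello");        /* use it */
    free(buffer);                   /* ...and give it back yourself; forget
                                       this line and the memory leaks */
}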
Perhaps compare it with investing in the stock market.
You can buy and sell shares yourself, trying to become an expert in what will give the best risk/reward - or you can invest in a fund which is managed by an "expert" who will do it for you, at the cost of losing some control, and possibly some commission. (Admittedly I'm more of a fan of tracker funds, and the stock market "experts" haven't exactly done brilliantly recently, but....)
Here's my Answer:
Managed (.NET) or Byte Code (Java) will save you time and money.
Now let's compare the two:
Unmanaged or Native Code
You need to do your own resource (RAM/memory) allocation and cleanup. If you forget something, you end up with what's called a "memory leak" that can crash the computer. A memory leak is when an application keeps using up (eating up) RAM/memory without letting it go so the computer can use it for other applications; eventually this can crash the computer.
In order to run your application on different operating systems (Mac OS X, Windows, etc.) you need to compile your code specifically for each operating system, and possibly change a lot of OS-specific code so it works on each of them.
.NET Managed Code or Java Byte Code
All the resource (RAM/memory) allocation and cleanup is done for you, and the risk of creating "memory leaks" is reduced to a minimum. This allows more time for coding features instead of spending it on resource management.
In order to run your application on different operating systems (Mac OS X, Windows, etc.) you just compile once, and it'll run on each as long as they support the framework your app runs on top of (.NET Framework/Mono or Java).
In Short
Developing with the .NET Framework (managed code) or Java (byte code) makes it cheaper overall to build an application that can target multiple operating systems with ease, and allows more time to be spent building rich features instead of on the mundane tasks of memory/resource management.
Also, before anyone points out that the .NET Framework doesn't support multiple operating systems, I need to point out that technically Windows 98, WinXP 32-bit, WinXP 64-bit, WinVista 32-bit, WinVista 64-bit and Windows Server are all different operating systems, yet the same .NET app will run on each. And there is also the Mono project, which brings .NET to Linux and Mac OS X.
Unmanaged code is a list of instructions for the computer to follow.
Managed code is a list of tasks for the computer to follow, where the computer is free to decide on its own how to accomplish them.
The big difference is memory management. With native code, you have to manage memory yourself. This can be difficult and is the cause of a lot of bugs and a lot of development time spent tracking down those bugs. With managed code, you still have problems, but a lot fewer of them, and they're easier to track down. This normally means less buggy software and less development time.
There are other differences, but memory management is probably the biggest.
If they were still interested I might mention how a lot of exploits come from buffer overruns and that you don't get those with managed code, or that code reuse is now easy, or that we no longer have to deal with COM (if you're lucky, anyway). I'd probably stay away from COM, otherwise I'd launch into a tirade over how awful it is.
It's like the difference between playing pool with and without bumpers along the edges. Unless you and all the other players always make perfect shots, you need something to keep the balls on the table. (Ignore intentional ricochets...)
Or soccer with walls instead of sidelines and endlines, or baseball without a backstop, or hockey without a net behind the goal, or NASCAR without barriers, or football without helmets ...
"The specific term managed code is particularly pervasive in the Microsoft world."
Since I work in MacOS and Linux world, it's not a term I use or encounter.
The Brad Abrams "What is Managed Code" blog post has a definition that say things like ".NET Framework Common Language Runtime".
My point is this: it may not be appropriate to explain it the terms at all. If it's a bug, hack or work-around, it's not very important. Certainly not important enough to work up a sophisticated lay-persons description. It may vanish with the next release of some batch of MS products.
