Processor architecture - processor

While HDDs evolve and offer more and more space on less room, why are we "sticking with" 32-bit or 64-bit?
Why can't there be, e.g., a 128-bit processor?
(This is not my homework; I'm just a student interested beyond the things they teach us in informatics)

Because the difference between 32-bit and 64-bit is astronomical - it's really the difference between 2^32 (a ten-digit number in the billions) and 2^64 (a twenty-digit number in the squillions :-).
64 bits will be more than enough for decades to come.
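To put those two numbers side by side, here is a quick sketch in Python (chosen only for illustration; the variable names are made up):

```python
# Number of distinct values representable in 32 and in 64 bits.
values_32 = 2 ** 32
values_64 = 2 ** 64

print(values_32)  # 4294967296 (10 digits)
print(values_64)  # 18446744073709551616 (20 digits)

# A 64-bit space is itself 2**32 (about 4 billion) times larger
# than a 32-bit one.
ratio = values_64 // values_32
print(ratio == values_32)  # True
```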

There's very little need for this. When do you deal with numbers that large? The current addressable memory space available to 64-bit is well beyond what any machine can handle for at least a few years... and beyond that, it's probably more than any desktop will hold for quite a while.
Yes, desktop memory will continue to increase, but 4 billion times what it is now? That's going to take a while... sure, we'll get to 128-bit, if the whole current model isn't thrown out before then, which I see as equally likely.
Also, it's worth noting that upgrading something from 32-bit to 64-bit puts you in a performance hole immediately in most scenarios (this is a major reason Visual Studio 2010 remains 32-bit only). The same will happen with 64-bit to 128-bit. The more small objects you have, the more pointers you have, and those pointers are now twice as large. That's more data to pass around to do the same thing, which is pure overhead if you don't need that much addressable memory space.
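A rough way to see the pointer overhead is to lay out the same record with 4-byte and 8-byte pointers. The sketch below uses Python's struct module purely as a size calculator; the two record layouts are hypothetical, and the 4 bytes of padding in the 64-bit node are an assumption about a typical ABI that aligns pointers to 8 bytes.

```python
import struct

# Hypothetical doubly linked list node: 4-byte payload plus two pointers.
# "<" turns off automatic padding, so the padding a typical 64-bit ABI
# would insert before the pointers is written out explicitly as "4x".
NODE_32 = struct.calcsize("<i II")     # 4 + 4 + 4
NODE_64 = struct.calcsize("<i 4x QQ")  # 4 + 4 (padding) + 8 + 8
print(NODE_32, NODE_64)  # 12 24

# Hypothetical record dominated by non-pointer data: 64 bytes of text
# plus a single pointer. Here the growth is much smaller.
RECORD_32 = struct.calcsize("<64s I")
RECORD_64 = struct.calcsize("<64s Q")
print(RECORD_32, RECORD_64)  # 68 72
```

The pointer-heavy node doubles in size, while the data-heavy record grows by about 6%, which is why the real-world penalty depends so much on how many small, pointer-linked objects a program has.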

When we talk about an n-bit architecture we are often conflating two rather different things:
(1) n-bit addressing, e.g. a CPU with 32-bit address registers and a 32-bit address bus can address 4 GB of physical memory
(2) size of CPU internal data paths and general purpose registers, e.g. a CPU with 32-bit internal architecture has 32-bit registers, 32-bit integer ALUs, 32-bit internal data paths, etc
In many cases (1) and (2) are the same, but there are plenty of exceptions and this may become increasingly the case, e.g. we may not need more than 64-bit addressing for the foreseeable future, but we may want > 64 bits for registers and data paths (this is already the case with many CPUs with SIMD support).
So, in short, you need to be careful when you talk about, e.g. a "64-bit CPU" - it can mean different things in different contexts.

Cost. Also, what do you think a 128-bit architecture will get you? Memory addressing and such, but to handle it effectively, you need higher-bandwidth buses and basically new instruction sets that handle it. 64-bit is more than enough for addressing (18446744073709551616 bytes).
HDDs still have a bit of ground to catch up to RAM and such. They're still going to be the I/O bottleneck, I think. Plus, newer chips are just supporting more cores rather than making a massive change to the instruction set.

Well, I happen to be a professional computer architect (my inventions are probably in the computer you are reading this on), and although I have not yet been paid to work on any processor with more than 64 bits of address, I know some of my friends who have been.
And I have been playing around with 128 bit architectures for fun for a few decades.
I.e. it's already happening.
Actually, it has already happened to a limited extent. The HP Precision Architecture, Intel Itanium, and the higher end versions of the IBM Power line, have what I call a folded virtual memory. I have described these elsewhere, e.g. in comp.arch posts in some details, http://groups.google.com/group/comp.arch/browse_thread/thread/53a7396f56860e17/f62404dd5782f309?lnk=gst&q=folded+virtual+memory#f62404dd5782f309
I need to create a comp-arch.net wiki post for these.
But you can get the manuals for these processors and read them yourself.
E.g. you might start with a 64 bit user virtual address.
The upper 8 bits may be used to index a region table, that returns an upper 24 bits that is concatenated with the remaining 64-8=56 bits to produce an 80 bit expanded virtual address. Which is then translated by TLBs and page tables and hash lookups, as usual,
to whatever your physical address is.
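The folding step just described can be sketched as follows. This is only an illustration of the bit arithmetic using the example widths above (8-bit region index, 24-bit region value, 56-bit offset); the function name and the table contents are made up, and it does not reproduce the actual register formats of PA-RISC, Itanium, or Power.

```python
REGION_INDEX_BITS = 8                  # upper bits of the user address
OFFSET_BITS = 64 - REGION_INDEX_BITS   # remaining 56 bits
REGION_VALUE_BITS = 24                 # bits returned by the region table


def fold_address(user_va, region_table):
    """Map a 64-bit user virtual address to an 80-bit expanded address."""
    assert 0 <= user_va < 1 << 64
    index = user_va >> OFFSET_BITS               # top 8 bits select a region
    offset = user_va & ((1 << OFFSET_BITS) - 1)  # low 56 bits pass through
    region = region_table[index]
    assert 0 <= region < 1 << REGION_VALUE_BITS
    # Concatenate: 24 region bits followed by the 56 offset bits.
    return (region << OFFSET_BITS) | offset


# Hypothetical table with 256 entries; only entry 0x12 is populated.
table = [0] * (1 << REGION_INDEX_BITS)
table[0x12] = 0xABCDEF

expanded = fold_address((0x12 << 56) | 0x34, table)
print(expanded.bit_length())  # 80
```

Two processes can put different indexes in front of the same library, yet both land on the same expanded address as long as their tables hold the same region value.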
Why go from 64->80?
One reason is shared libraries. You may want the shared libraries to stay at the same expanded virtual address in all processes, so that you can share TLB entries. But you may be required, by your language tools, to relocate them to different user virtual addresses. Folded virtual addresses allow this.
Folded virtual addresses are not true >64 bit virtual addresses usable by the user.
For that matter, there are many proposals for >64 bit pointers: e.g. I worked on one where a pointer consisted of a 64bit address, and 64 bit lower and upper bounds, and metadata, for a total of 128 bits. Bounds checking. But, although these have >64 bit pointers or capabilities, they are not truly >64 bit virtual addresses.
Linus posts about 128 bit virtual addresses at http://www.realworldtech.com/beta/forums/index.cfm?action=detail&id=103574&threadid=103545&roomid=2

I'd also like to offer a computer architect's view of why 128bit is impractical at the moment:
Energy cost. See Bill Dally's presentations on how today, most energy in processors is spent moving data around (dissipated in the wires). However, since the most significant bits of a 128bit computation should change little, it should mitigate this problem.
Most arithmetic operations have a non-linear cost w.r.t operand size:
a. A tree multiplier has space complexity n^2, w.r.t. number of bits.
b. The delay of a hierarchical carry look ahead adder is Log[n] w.r.t number of bits (I think). So a 128bit adder will be slower than a 64bit add. Can anyone give some hard numbers (Log[n] seems very cheap) ?
Few programs use 128bit integers or quad precision floating point, and when they do, there are efficient ways to compose them from 32 or 64bit ops.
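As an illustration of composing a wide operation from narrower ones, here is a sketch of a 64 x 64 -> 128-bit multiply built from 32-bit halves. The function name is made up, and Python's unbounded integers hide the carry handling that real hardware or C code would need when summing the partial products. Note that it takes four 32 x 32 partial products: doubling the operand width quadruples the work, which is the same quadratic growth mentioned for the multiplier above.

```python
MASK32 = 0xFFFFFFFF


def mul_64x64(a, b):
    """Multiply two 64-bit values using only 32x32-bit products."""
    a_lo, a_hi = a & MASK32, (a >> 32) & MASK32
    b_lo, b_hi = b & MASK32, (b >> 32) & MASK32

    # Four partial products, each fitting in 64 bits.
    lo_lo = a_lo * b_lo
    lo_hi = a_lo * b_hi
    hi_lo = a_hi * b_lo
    hi_hi = a_hi * b_hi

    # Shift each partial product to its position and sum them.
    return lo_lo + ((lo_hi + hi_lo) << 32) + (hi_hi << 64)


x = 0xFFFFFFFFFFFFFFFF
y = 0x123456789ABCDEF0
print(mul_64x64(x, y) == x * y)  # True
```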

The next big thing in processor architecture will be quantum computing. Instead of being just 0 or 1, a qubit has a probability of being 0 or 1.
This will lead to huge improvements in the performance of algorithms (for instance, it will be very easy to crack any RSA private/public key).
Check http://en.wikipedia.org/wiki/Quantum_computer for more information and see you in 15 years ;-)

The main need for a 64-bit processor is to address more memory - and that is the driving force behind the switch to 64 bit. On 32-bit systems, you can really only address 4 GB of RAM, at least per process. 4 GB is not much.
64 bits give you an address space of 16 exabytes (though a lot of current 64-bit hardware can address "only" 48 bits - that's still enough to support 256 terabytes of RAM).
Upping the natural integer size for a processor does not automatically make it "better" though. There are tradeoffs. With 128 bits you'd need twice as much storage (registers/RAM/caches/etc.) compared to 64 bit for common data types - with all the drawbacks that might have: more RAM needed to store data, more data to transmit = slower, wider buses that might require more physical space and perhaps more power, etc.
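For reference, the address-space sizes involved can be worked out with a few lines of Python. This is only a sketch: the helper name is made up, and it uses binary units (1 KiB = 1024 bytes) with byte-addressed memory assumed.

```python
UNITS = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]


def address_space(bits):
    """Size of a byte-addressed space with `bits` address bits."""
    size = 2 ** bits
    unit = 0
    # Divide by 1024 until the number is small or units run out.
    while size >= 1024 and unit < len(UNITS) - 1:
        size //= 1024
        unit += 1
    return f"{size} {UNITS[unit]}"


for bits in (32, 48, 64):
    print(bits, address_space(bits))
# 32 4 GiB
# 48 256 TiB
# 64 16 EiB
```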

Related

Are there performance advantages of 32 bit apps over 64 bit ones, on x86-64?

I know the advantages of 64 bit over 32 bit, but except for compatibility, are there any advantages of 32 bit applications over 64 bit ones that could make 32-bit application faster or otherwise more efficient?
There's one big advantage: 32-bit applications use significantly less memory (precisely because pointers are smaller). Not everything is a pointer, e.g. strings and numbers don't change their size, so the effective difference is not 2x. I happen to know about JavaScript engines specifically, where the 64-bit version typically uses around 50% more memory for the same workload than the 32-bit version of the same engine.
V8 has recently addressed this by implementing "pointer compression" in its 64-bit version. In theory, any C/C++ app could do the same thing, but it's a big engineering effort.
That said, this generally isn't a reason not to move to 64-bit, as other benefits (more registers, more address space) typically outweigh this drawback. But it does mean that if you're targeting devices/machines with less than 4GiB memory anyway, you might want to stick with 32-bit builds, if memory consumption is a concern.
(Performance, in my experience, is a mixed bag: smaller code and smaller data mean better cache utilization on 32-bit; OTOH having more and wider registers on 64-bit can save instructions there. In rare extreme cases, a 64-bit app can process twice as much data in the same time; most of the time the difference will only be in the range 1-5%, and can go in either direction: sometimes a 32-bit build is indeed a little faster than a 64-bit build; it really depends on what the app is doing.)
In short, no. To be more accurate, it might theoretically be the case for some processors, but none that I'm aware of.
The only other difference that comes to my mind is that 32-bit instructions are generally smaller (due to not having a REX prefix, at least), so you might save some space this way, but it probably doesn't outweigh the benefits of x64. Considering that instructions unique to x64 also tend to have higher impact, i.e. manipulating more data at once, the code might even turn out to be more compact in x64. And, for the same reason, x32 is generally slower. So no, x32 doesn't have any real advantages over x64, other than compatibility.
For Windows apps, 32-bit builds are viewed as "most portable" (easier to distribute), though that's becoming less of an issue.
For memory hogs like Ruby, it seemed to me that the 32-bit build used about half the RAM, so you could run more apps on boxes that were RAM limited. Not to mention that "all apps" use less RAM (kernel, etc.)
The 32-bit build also ran faster, since it traverses all its memory to do garbage collection, which fit in cache better, and there was less overall RAM to traverse when looking for pointers. At the same time, 64-bit is less likely to find false positives when the GC is looking for pointers, so that's a small win for 64-bit.
If you're really brave, you can try the hybrid x32 ABI (32 bit pointers, 64-bit registers) https://unix.stackexchange.com/questions/121424/linux-and-x32-abi-how-to-use which is meant to kind of get the best of both worlds. I'm really not sure why it isn't a more popular option; it seems like a nice win to me, the trade-off being that a process can't address more than 4 GB of RAM. My guess is that most people aren't in a very RAM-constrained environment, or that the win over just going straight "32 bit kernel" (which is well supported) isn't enough motivation? In essence, most boxes have gobs of RAM, so it's not as much of a priority?

Why 64 bit mode ( Long mode ) doesn't use segment registers?

I'm a beginner-level student :) I'm studying the Intel architecture,
and in particular memory management topics such as segmentation and paging.
I'm reading Intel's manual, and it explains Intel's architectures pretty well.
However, I'm still curious about something fundamental.
Why, in 64-bit long mode, are all segment registers set to 0?
Why doesn't the system use segment registers any longer?
Is it because the system's 64-bit registers (such as the GP registers) are large enough to hold a whole logical address at once?
Does protection still work properly in 64-bit mode?
I tried to search for 64-bit addressing but couldn't find anything on Google. Perhaps my searching skills are terrible, or I need some specific background knowledge to search effectively.
Hence I'd like to know why the 16-bit segment registers are not used in 64-bit mode,
and how protection can work properly in 64-bit mode.
Thank you!
In a manner of speaking, when you perform array ("indexed") type addressing with general registers, you are doing essentially the same thing as the segment registers. In the bad old days of 8-bit and 16-bit programming, many applications required much more data (and occasionally more code) than a 16-bit address could reach.
So many CPUs solved this by having a larger addressable memory space than the 16-bit addresses could reach, and made those regions of memory accessible by means of "segment registers" or similar. A program would set the address in a "segment register" to an address above the (65536 byte) 16-bit address space. Then when certain instructions were executed, they would add the instruction specified address to the appropriate (or specified) "segment register" to read data (or code) beyond the range of 16-bit addresses or 16-bit offsets.
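As one concrete instance of this scheme, here is a sketch of 8086 real-mode address formation. The choice of the 8086 is an assumption for illustration (the description above is generic): there, the 16-bit segment value is shifted left by 4 bits before the 16-bit offset is added, giving a 20-bit physical address. The function name is made up.

```python
def real_mode_address(segment, offset):
    """Form a 20-bit physical address from a 16-bit segment and offset."""
    assert 0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF
    # The segment counts 16-byte "paragraphs"; on the original 8086 the
    # sum wraps around at 1 MiB because there are only 20 address lines.
    return ((segment << 4) + offset) & 0xFFFFF


print(hex(real_mode_address(0xB800, 0x0000)))  # 0xb8000
print(hex(real_mode_address(0x1234, 0x5678)))  # 0x179b8
```

With a fixed segment, a program reaches only 64 KB at a time; changing the segment register moves that 64 KB window around the 1 MB space.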
However, the situation today is opposite!
How so? Today, a 64-bit CPU can address more than (not less than) all addressable memory space. Most 64-bit CPUs today can address something like 40-bits to 48-bits of physical memory. True, there is nothing to stop them from addressing a full 64-bit memory space, but they know nobody (but the NSA) can afford that much RAM, and besides, hanging that much RAM on the CPU bus would load it down with capacitance, and slow down ALL memory accesses outside the CPU chip.
Therefore, the current generation of mainstream CPUs can address 40-bits to 48-bits of memory space, which is more than 99.999% of the market would ever imagine reaching. Note that 32-bits is 4-gigabytes (which some people do exceed today by a factor of 2, 4, 8, 16), but even 40-bits can address 256 * 4GB == 1024GB == 1TB. While 64GB of RAM is reasonable today, and perhaps even 256GB in extreme cases, 1024GB just isn't necessary except for perhaps 0.001% of applications, and is unaffordable to boot.
And if you are in that 0.001% category, just buy one of the CPUs that address 48-bits of physical memory, and you're talking 256TB... which is currently impractical because it would load down the memory bus with vastly too much capacitance (maybe even to the point the memory bus would stop working completely).
The point is this. When your normal addressing modes with normal 64-bit registers can already address vastly more memory than your computer can contain, the conventional reason to add segment registers vanishes.
This doesn't mean people could not find useful purposes for segment registers in 64-bit CPUs. They could. Several possibilities are evident. However, with 64-bit general registers and a 64-bit address space, there is nothing that segment registers can do that general registers could not. And general purpose registers have a great many purposes, which segment registers do not. Therefore, if anyone was planning to add more registers to a modern 64-bit CPU, they would add general purpose registers (which can do "anything") rather than add very limited purpose "segment registers".
And indeed they have. As you may have noticed, AMD and Intel keep adding more [sorta] general-purpose registers to the SIMD register-file, and AMD doubled the number of [truly] general purpose registers when they designed their 64-bit x86_64 CPUs (which Intel copied).
Most answers to questions about the irrelevance of segment registers in a 32/64-bit world center on memory addressing. We all agree that the primary purpose of segment registers was to get around the address space limitation of the 16-bit DOS world. However, from a security perspective, segment registers provide 4 rings of address space isolation, which is not available in 64-bit long mode, say for a 64-bit OS. This is not a problem with current popular OSes such as Windows and Linux, which use only ring 0 and ring 3 for two levels of isolation. Rings 1 and 2 are sometimes part of the kernel and sometimes part of user space, depending on how the code is written. With the advent of hardware virtualization (as opposed to OS virtualization), hypervisors did not quite fit, from an isolation perspective, in either ring 0 or rings 1/2/3. Intel and AMD added additional instructions (e.g., Intel VMX) for root and non-root operation of VMs.
So what is the point being made? If one is designing a new secure OS with 4 rings of isolation, then we run into problems if segmentation is disabled. As an example, we might use one ring each for hardware mux code, hypervisor code/containers/VMs, the OS kernel, and user space. So we can make a case for leveraging the additional security afforded by segmentation, based on the requirements stated above. However, Intel/AMD still allow the FS and GS segment registers to have a non-zero base (i.e., segmentation is not entirely disabled). To the best of my knowledge, no OS exploits this ray of hope to write a more secure OS/hypervisor for hardware virtualization.

Are 64 bit programs bigger and faster than 32 bit versions?

I suppose I am focussing on x86, but I am generally interested in the move from 32 to 64 bit.
Logically, I can see that constants and pointers, in some cases, will be larger so programs are likely to be larger. And the desire to allocate memory on word boundaries for efficiency would mean more white-space between allocations.
I have also heard that 32 bit mode on the x86 has to flush its cache when context switching due to possible overlapping 4G address spaces.
So, what are the real benefits of 64 bit?
And as a supplementary question, would 128 bit be even better?
Edit:
I have just written my first 32/64 bit program. It makes linked lists/trees of 16 byte (32b version) or 32 byte (64b version) objects and does a lot of printing to stderr - not a really useful program, and not something typical, but it is my first.
Size: 81128(32b) v 83672(64b) - so not much difference
Speed: 17s(32b) v 24s(64b) - running on 32 bit OS (OS-X 10.5.8)
Update:
I note that a new hybrid x32 ABI (Application Binary Interface) is being developed that is 64b but uses 32b pointers. For some tests it results in smaller code and faster execution than either 32b or 64b.
https://sites.google.com/site/x32abi/
I typically see a 30% speed improvement for compute-intensive code on x86-64 compared to x86. This is most likely due to the fact that we have 16 x 64 bit general purpose registers and 16 x SSE registers instead of 8 x 32 bit general purpose registers and 8 x SSE registers. This is with the Intel ICC compiler (11.1) on an x86-64 Linux - results with other compilers (e.g. gcc), or with other operating systems (e.g. Windows), may be different of course.
Unless you need to access more memory than 32b addressing will allow you, the benefits will be small, if any.
When running on a 64b CPU, you get the same memory interface no matter whether you are running 32b or 64b code (you are using the same cache and the same bus).
While x64 architecture has a few more registers which allows easier optimizations, this is often counteracted by the fact pointers are now larger and using any structures with pointers results in a higher memory traffic. I would estimate the increase in the overall memory usage for a 64b application compared to a 32b one to be around 15-30 %.
Regardless of the benefits, I would suggest that you always compile your program for the system's default word size (32-bit or 64-bit), since if you compile a library as a 32-bit binary and provide it on a 64-bit system, you will force anyone who wants to link with your library to provide their library (and any other library dependencies) as a 32-bit binary, when the 64-bit version is the default available. This can be quite a nuisance for everyone. When in doubt, provide both versions of your library.
As to the practical benefits of 64-bit... the most obvious is that you get a bigger address space, so if you mmap a file, you can address more of it at once (and load larger files into memory). Another benefit is that, assuming the compiler does a good job of optimizing, many of your arithmetic operations can be parallelized (for example, placing two pairs of 32-bit numbers in two registers and performing two adds in a single add operation), and big number computations will run more quickly. That said, the whole 64-bit vs 32-bit thing won't help you with asymptotic complexity at all, so if you are looking to optimize your code, you should probably be looking at the algorithms rather than constant factors like this.
EDIT:
Please disregard my statement about the parallelized addition. This is not performed by an ordinary add statement... I was confusing that with some of the vectorized/SSE instructions. A more accurate benefit, aside from the larger address space, is that there are more general purpose registers, which means more local variables can be maintained in the CPU register file, which is much faster to access, than if you place the variables in the program stack (which usually means going out to the L1 cache).
I'm coding a chess engine named foolsmate. The best move extraction using a minimax-based tree search to depth 9 (from a certain position) took:
on Win32 configuration: ~17.0s;
after switching to x64 configuration: ~10.3s;
That is roughly a 40% reduction in search time!
In addition to having more registers, 64-bit has SSE2 by default. This means that you can indeed perform some calculations in parallel. The SSE extensions had other goodies too. But I guess the main benefit is not having to check for the presence of the extensions. If it's x64, it has SSE2 available. ...If my memory serves me correctly.
In the specific case of x86 to x86_64, the 64 bit program will be about the same size, if not slightly smaller, use a bit more memory, and run faster. Mostly this is because x86_64 doesn't just have 64 bit registers, it also has twice as many. x86 does not have enough registers to make compiled languages as efficient as they could be, so x86 code spends a lot of instructions and memory bandwidth shifting data back and forth between registers and memory. x86_64 has much less of that, and so it takes a little less space and runs faster. Floating point and bit-twiddling vector instructions are also much more efficient in x86_64.
In general, though, 64 bit code is not necessarily any faster, and is usually larger, both for code and memory usage at runtime.
The only justification for moving your application to 64 bit is the need for more memory, in applications like large databases or ERP applications with at least hundreds of concurrent users, where the 2 GB limit will be exceeded fairly quickly when applications cache for better performance. This is the case especially on Windows, where int and long are still 32 bit (there is a separate __int64 type); only pointers are 64 bit. In fact, WOW64 is highly optimised on Windows x64, so 32-bit applications run with a low penalty on 64-bit Windows. In my experience on Windows x64, the 32-bit version of an application runs 10-15% faster than the 64-bit one, since, at least for proprietary in-memory databases, you can use pointer arithmetic for maintaining the b-tree (the most processor-intensive part of database systems). The exception is computation-intensive applications that require large decimals for accuracy beyond what double affords on 32- or 64-bit operating systems: these can use __int64 natively instead of software emulation. Of course, large disk-based databases will also show improvement over 32 bit, simply due to the ability to use large memory for caching query plans and so on.
Any applications that require CPU usage such as transcoding, display performance and media rendering, whether it be audio or visual, will certainly require (at this point) and benefit from using 64 bit versus 32 bit due to the CPU's ability to deal with the sheer amount of data being thrown at it. It's not so much a question of address space as it is the way the data is being dealt with. A 64 bit processor, given 64 bit code, is going to perform better, especially with mathematically difficult things like transcoding and VoIP data - in fact, any sort of 'math' applications should benefit by the usage of 64 bit CPUs and operating systems. Prove me wrong.
On my machine, the same h265 encode runs almost twice as fast using virtulDub_x64 (with the x64 h265 library) as with virtulDub_x32 (regular x32 h265 library). That's probably because operations on 64-bit integers (e.g. add) can be done in a single instruction on x64, but need two on 32-bit: add the lower part, then add (with carry) the higher part. So unless the integer math is limited to 32-bit integers, most of it will take more time under x32.
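The two-step addition mentioned above can be sketched like this. It is a Python model of what an add followed by an add-with-carry does on a 32-bit machine, not code from any encoder; the function name is made up.

```python
MASK32 = 0xFFFFFFFF


def add64_on_32bit(a, b):
    """Add two 64-bit values using only 32-bit wide operations."""
    a_lo, a_hi = a & MASK32, (a >> 32) & MASK32
    b_lo, b_hi = b & MASK32, (b >> 32) & MASK32

    # Step 1: add the low halves and remember the carry out.
    lo = a_lo + b_lo
    carry = lo >> 32
    lo &= MASK32

    # Step 2: add the high halves plus the carry (overflow is dropped,
    # as it would be in a 64-bit register).
    hi = (a_hi + b_hi + carry) & MASK32

    return (hi << 32) | lo


print(hex(add64_on_32bit(0x00000000FFFFFFFF, 1)))  # 0x100000000
```

A 64-bit CPU does the same thing in one instruction, which is where the advantage for 64-bit integer math comes from.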

64-bits and Memory Bandwidth

Mason asked about the advantages of a 64-bit processor.
Well, an obvious disadvantage is that you have to move more bits around. And given that memory accesses are a serious issue these days[1], moving around twice as much memory for a fair number of operations can't be a good thing.
But how bad is the effect of this, really? And what makes up for it? Or should I be running all my small apps on 32-bit machines?
I should mention that I'm considering, in particular, the case where one has a choice of running 32- or 64-bit on the same machine, so in either mode the bandwidth to main memory is the same.
[1]: And even fifteen years ago, for that matter. I remember talk as far back as that about good cache behaviour, and also particularly that the Alpha CPUs that won all the benchmarks had a giant, for the time, 8 MB of L2 cache.
Whether your app should be 64-bit depends a lot on what kind of computation it does. If you need to process very large data sets, you obviously need 64-bit pointers. If not, you need to know whether your app spends relatively more time doing arithmetic or memory accesses. On x86-64, the general purpose registers are not only twice as wide, there are twice as many and they are more "general purpose". This means that 64-bit code can have much better integer op performance. However, if your code doesn't need the extra register space, you'll probably see better performance by using smaller pointers and data, due to increased cache effectiveness. If your app is dominated by floating point operations, there probably isn't much point in making it 32-bit, because most of the memory accesses will be for wide vectors anyways, and having the extra SSE registers will help.
Most 64-bit programming environments use the "LP64" model, meaning that only pointers and long int variables (if you're a C/C++ programmer) are 64 bits. Integers (ints) remain 32-bits unless you're in the "ILP64" model, which is fairly uncommon.
I only bring it up because most int variables aren't being used for size_t-like purposes--that is, they stay within ranges comfortably held by 32 bits. For variables of that nature, you'll never be able to tell the difference.
If you're doing numerical or data-heavy work with > 4GB of data, you'll need 64 bits anyways. If you're not, you won't notice the difference, unless you're in the habit of using longs where most would use ints.
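To check which data model a given platform uses, one option is to ask for the sizes of the C types directly. The sketch below does that through Python's ctypes module; the function name and the classification rules are my own simplification, and the result depends on the machine and on how the Python interpreter was built, so no output is shown.

```python
import ctypes


def data_model():
    """Guess the C data model from the sizes of basic C types."""
    sizes = {
        "int": ctypes.sizeof(ctypes.c_int),
        "long": ctypes.sizeof(ctypes.c_long),
        "long long": ctypes.sizeof(ctypes.c_longlong),
        "pointer": ctypes.sizeof(ctypes.c_void_p),
    }
    if sizes["pointer"] == 4:
        name = "ILP32"   # int, long and pointers are all 32 bits
    elif sizes["long"] == 8 and sizes["int"] == 4:
        name = "LP64"    # long and pointers are 64 bits, int stays 32
    elif sizes["long"] == 4:
        name = "LLP64"   # only long long and pointers are 64 bits
    else:
        name = "other (e.g. ILP64)"
    return name, sizes


name, sizes = data_model()
print(name, sizes)
```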
I think you're starting off with a bad assumption here. You say:
"moving around twice as much memory for a fair number of operations can't be a good thing"
and the first question to ask is "why not"? In a true 64-bit machine, the data path is 64 bits wide, and so moving 64 bits takes exactly (to a first approximation) as many cycles as moving 32 bits on a 32-bit machine. So, if you need to move 128 bytes, it takes half as many cycles as it would take on a 32-bit machine.

endian-ness of new macs - are all pc platforms the same now?

Does the change of macs over to Intel chips mean we are done with the bit twiddling on numbers in binary resources for cross platform data distributions?
Is that the last of this problem or are there some other platforms I'm not aware of?
Well, actually there are plenty of big endian CPUs left over.
Actually the PPC is not dead. You are aware, that the Xbox360 uses PPC CPUs (and it is a good example, that these CPUs are not as bad as their reputation - the Xbox360 is anything but slow). Okay, this one may not count as a PC.
But does a server count as a PC? There are still plenty of servers using Sun's UltraSparc CPUs, which are generally big endian, though the latest models can be either big or little endian. There are many CPUs that can be either one or the other (e.g. ARM, still used in many devices like mobile phones), as supporting both gives the greatest flexibility to hardware and software vendors. Even the IA64 CPUs (the Itanium, which was intended to replace x86 before AMD invented x86-64; it was true 64 bit and could only emulate 32 bit, unlike x86-64, which can be both) can be switched to big endian. CPUs that can be both are called bi-endian.
Actually if you ignore Intel (and compatible CPUs) for a second, most CPUs on the market are either big endian or at least bi-endian, though most of these are not used in any consumer PCs as far as I know.
However, I don't see endianness as the problem many programmers do. Every modern CPU can swap endianness in hardware. Actually, if you wrote a program on a little endian Intel CPU that swaps the endianness of every integer read from memory, and again when writing back to memory, this would cause a performance penalty of maybe as little as 5%; and in practice you only need to swap endianness for data coming into and going out of your application, as within your application the endianness is constant, of course.
Also note:
Almost all network protocols I know specify byte order to be big endian, TCP/IP being the most familiar family. So if you work on lower network layers, you will always have to continue swapping bytes.
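A minimal sketch of swapping at the application boundary, using Python's standard library (the helper name swap32 is made up for illustration):

```python
import struct
import sys


def swap32(value):
    """Reverse the byte order of a 32-bit unsigned integer."""
    return int.from_bytes(value.to_bytes(4, "little"), "big")


print(sys.byteorder)             # byte order of the host running this
print(hex(swap32(0x12345678)))   # 0x78563412

# "!" means network byte order (big endian), whatever the host uses.
wire = struct.pack("!I", 0x12345678)
print(wire == b"\x12\x34\x56\x78")  # True

# Reading it back with the same format restores the original value.
(value,) = struct.unpack("!I", wire)
print(hex(value))                # 0x12345678
```

Because the format string states the byte order explicitly, the same code behaves identically on little endian and big endian hosts.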
You seem to forget the endianness transcends processor architectures. There are plenty of algorithms and protocols that demand a particular byte order. For example, I spent two weeks trying to get an MD5 hashing algorithm to work, only to realize that I had assumed network byte order (Big Endian) while Ronald Rivest had assumed (without stating so in the RFC) that the implementor would use Little Endian byte order.
I was thinking the same question: since Macs are now Intel, is the endian issue dead? Nope. Aside from certain supercomputers (which, let's face it, us lay-folk will never have to deal with) there is still one major area where big-endian order is used: network protocols, particularly: the Internet Protocol (as in: "IP" of TCP/IP).
This is certainly not the last of this problem, particularly if you are writing for embedded systems, including Pocket PCs, etc. MIPS, ARM, and other architectures support bi-endian architectures which can select their endian-ness on system start-up.
If you're writing code that depends on byte ordering, you need to care about endian-ness. Don't expect this "problem" to go away anytime soon.
Pesky x86's dirtying up my memory registers with their segment pointers! ;)
I believe you don't need to flip words between PCs and Macs anymore, assuming you're eschewing backwards-compatibility with PowerPC.
Now, more than ever, a person's main computer is less likely to be a desktop computer running a general purpose operating system. Although that is still quite common, many other folks are using smartphones or UMPC devices that are purpose built, i.e. for browsing the web. These platforms do not necessarily have x86 CPUs. More often, especially with smartphone devices, they use an ARM core, which is bi-endian (it can run in either byte order).
Define PC, what do you consider a PC?
I am currently typing this from a Linux distribution that is running on an ARM9 processor, which can be configured for either endianness. Little endian is what Intel, AMD and VIA (x86 compatible) use.
Endian-ness won't go away any time soon: any time you transmit anything over the network, you have to make sure that it is in the right byte order, since the byte order specified by the Internet Protocol is big endian.
See the Wikipedia article on Endianness for more information.

Resources