How were 80s computer programmes so small in size? - memory-management

When you look at programs written for old 80s computers like the Commodore 64, the Atari machines, and the NES, they are extremely small, with most weighing in at a few hundred kilobytes or less.
Not to mention these computers had very little memory to run in: the Commodore 64 had 64 KB of RAM and yet managed to run a GUI OS!
How were these programs written to be so small?
Many of them seem unbelievable given the hardware constraints they had.
On a Commodore 64, a 320 x 200 screen at 4 bpp would have eaten up half its 64 K of memory,
while the Atari 2600 had just 128 bytes of RAM.

In today's apps, 80% or more of the on-disk size is graphical elements.
When space was expensive relative to programmer time, programmers spent more time optimizing for size, and often went to raw assembly. Today, space is cheap, so it doesn't pay for a company to save space.
Compare Notepad to Edlin. Both are the simplest reasonable text editors for their paradigms. Edlin fits program and data comfortably into less than 64 K, but there is no way one could claim Notepad is just a graphical Edlin.
The C64 did not have a GUI OS. It had a rudimentary menu system, and a skilled programmer could use custom hardware sprites to overlay small graphical icons.
In multicolour (low-resolution) mode, you had 2 bits per pixel (4 colours); in high-resolution mode, you had 1 bit per pixel (monochrome). Today's systems presume 16 bits per pixel or better at 1080p (1920 x 1080 pixels). Even monochrome displays have ballooned from 8 K to well over 1 MB. With modern displays expecting 24-bit colour depth or better, the minimum storage required for a single frame is several megabytes. Add to that the working space for buffering and the other things a modern graphics card does, and it doesn't take long for your graphics needs to run to gigabytes. There is a reason the high-resolution mode on that generation of computer was rarely used.
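To put numbers on that (a rough sketch in C; exact figures vary by machine and mode):

    #include <stdio.h>

    int main(void) {
        /* C64-era frame buffers versus a modern desktop, in raw bytes. */
        long c64_hires = 320L * 200 * 1 / 8;    /* 1 bpp  ->  8,000 bytes  */
        long c64_multi = 160L * 200 * 2 / 8;    /* 2 bpp  ->  8,000 bytes  */
        long fhd_32bpp = 1920L * 1080 * 32 / 8; /* 32 bpp -> ~8.3 MB/frame */

        printf("C64 hi-res bitmap : %ld bytes\n", c64_hires);
        printf("C64 multicolour   : %ld bytes\n", c64_multi);
        printf("1080p true colour : %ld bytes (%.1f MB)\n",
               fhd_32bpp, fhd_32bpp / 1e6);
        return 0;
    }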
When you loaded a program, you unloaded the OS. You ran only one program at a time. Today I regularly run twenty or more apps at once, not to mention dozens of background processes necessary to do my work.

I can't speak for other systems, but for the C64 (and C128)...
As pojo-guy stated, we often went straight to assembly, which cut down on operating-system overhead: the OS was in ROM, so it used no RAM. Moreover, you could flip ROM in and out by playing with the memory registers, virtually "doubling" the available memory (although some of that "available" memory was read-only, the key point being that you didn't waste precious RAM on OS routines). By utilizing ROM routines and using straight assembly, a large amount of (memory) overhead was eliminated.
For bitmaps you had two choices: high resolution or multi-colour mode. In high resolution (320 x 200), each pixel showed either the foreground or the background colour, so you only needed 320 x 200 = 64,000 bits, or 8,000 bytes.
Standard multi-colour mode offered four colours at the expense of horizontal resolution. To quote the C64 programmer's reference manual:
Each dot in multi-colour mode can be one of 4 colours: screen colour (background colour register #0), the colour in background colour register #1, the colour in background colour register #2, or character colour. The only sacrifice is the horizontal resolution, because each multi-colour mode dot is twice as wide as a high-resolution dot. The minimal resolution is more than compensated for by the extra capabilities of multi-colour mode.
The reduction in overhead, a simpler OS (which could be completely swapped out), and simpler functionality (e.g. 2-bit colour, which allowed you to have four colours) made things much smaller. As techniques improved, coders also applied overlays: loading parts of a program in while other parts were running.
Also, more advanced architectures (e.g. the C128) had separate video RAM (either 16 K or 64 K depending on your model of C128), which gave even more space to flex your coding muscles, since graphics (or text) did not eat into main memory.
Look up any of the 4K demo competitions to see what can really be done on a machine with such a small memory footprint.

As the C64 is an 8-bit computer, all its machine-language opcodes are 8 bits long. In addition, an instruction may have 0-2 operand bytes after the opcode, so each instruction takes 1-3 bytes of RAM.
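For a feel of those sizes, here are a few hand-picked 6502 encodings written out as C byte arrays (illustrative only):

    /* Each array is one 6502 instruction as it sits in RAM: a 1-byte opcode
       followed by 0-2 operand bytes (operands stored low byte first). */
    unsigned char lda_imm[] = { 0xA9, 0x05 };       /* LDA #$05  : 2 bytes */
    unsigned char sta_abs[] = { 0x8D, 0x00, 0x04 }; /* STA $0400 : 3 bytes */
    unsigned char rts_op[]  = { 0x60 };             /* RTS       : 1 byte  */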
Now when we move forward to more modern systems, the CPU is often already 64-bit.
Basically, every CPU has some "preferred size" for variables (which it can handle efficiently), and it's usually exactly the number of bits the processor has. This is usually what "int" is in C (with the exception that int is guaranteed to always be at least 16 bits, while the "preferred" size on an 8-bit CPU is obviously 8 bits).
So for an integer, no matter how small its values might be, it's most efficient to use this "preferred size".
On 8-bit systems that would be 8 bits (which obviously can't be a C int), and on 64-bit systems that would be 64 bits. So that's 8x the size.
Of course you can use smaller types, but it's often less efficient, and it often affects struct padding too.
With pointers, though, you're usually stuck with the CPU's full width (as you need to be able to address the whole memory range).
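A rough C sketch of the effect (exact sizes depend on the compiler and ABI):

    #include <stdio.h>
    #include <stdint.h>

    struct node {
        uint8_t  tag;   /* 1 byte                                    */
        uint16_t count; /* 2 bytes, preceded by 1 padding byte       */
        void    *next;  /* 4 bytes on 32-bit, 8 bytes on 64-bit,
                           preceded by extra padding for alignment   */
    };

    int main(void) {
        printf("sizeof(void *)      = %zu\n", sizeof(void *));
        printf("sizeof(struct node) = %zu\n", sizeof(struct node));
        /* Typically 8 bytes total on a 32-bit target, 16 on a 64-bit one. */
        return 0;
    }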
And while data values are generally bigger, so are the machine instructions. On the other hand, that allows for more complex instructions, which can perform operations that would need several 8-bit instructions.
Of course there are exceptions, like the Thumb instruction set on ARM.
What I am saying is that efficient assembly code on a modern platform takes more space than C64 assembly (but is less restricted, and can do fancy stuff such as multiply/divide natively, etc.).
As for graphical operating systems on the C64, the two best-known ones are GEOS and Contiki. The Final Cartridge III also had a built-in windowing system, but IIRC it only allowed its built-in programs, and there wasn't anything that useful in it.
GEOS is rather "restricted": it doesn't do any real multitasking (you can select which program is displayed in the main area of the window, but e.g. the clock is always running), and that's basically it. Even so, there is for example a rather nice word processor (GEOWrite) for it, which I used back in the day.
Contiki is more "modern" (and actually mostly written in C, IIRC), and it's actually much simpler than you might think. It runs in character graphics mode (so 1,000 bytes for the on-screen characters, 2 K for the charset, and 1 K of colour RAM), so that's only about 4 K spent there.
And I'd say Contiki is more of a proof of concept than an actually useful operating system, but unlike GEOS it does real (co-operative) multitasking.
I guess you're seriously overestimating what would be needed for a really simple graphical operating system. Instead, you could compare to AmigaOS, which was very modern for its time, still rather small, and runs on a CPU that's (internally) 32-bit, so much closer to modern processors.

Related

Are bytes real?

I know that this question may sound stupid, but let me just explain. So...
Everyone knows that a byte is 8 bits. Simple, right? But where exactly is that specified? I mean, physically you don't really use bytes, but bits. For example, drives: as I understand it, a drive is just a reaaaaly long string of ones and zeros and NOT bytes. Sure, there are sectors, but as far as I know they are implemented at the software level (at least in SSDs, I think). Also RAM, which is again a long stream of ones and zeros. Another example is the CPU. It doesn't process 8 bits at a time, but only one.
So where exactly is it specified? Or is it just a general rule that everyone follows? If so, could I make a system (either an operating system or even something at a lower level) that would use, let's say, 9 bits in a byte? Or would I not be able to? Also, why can't you use less than a byte of memory? Or maybe you can? For example: is it possible for two applications to use the same byte (e.g. the first one uses 4 bits and the second one uses the other 4)? And last, but not least, do computer drives really use bytes? Or is it that, for example, bits 1-8 belong to something, next to them there are some 3 random bits, and bits 12-20 belong to something different?
I know that there are a lot of questions here and that knowing the answers doesn't change anything, but I was just wondering.
EDIT: OK, I might not have expressed myself clearly enough. I know that a byte is just a concept (well, even a bit is just a concept that we make real). I'm NOT asking why there are 8 bits in a byte or why bytes exist as a term. What I'm asking is where in a computer a byte is defined, or whether it even is defined at all. If bytes really are defined somewhere, at what level (the hardware level, OS level, programming-language level, or just the application level)? I'm also asking whether computers even care about bytes (in that concept we've made real), and whether they use bytes consistently (e.g. between two bytes, can there be some 3 random bits?).
Yes, they’re real insofar as they have a definition and a standardised use/understanding. The Wikipedia article for byte says:
The modern de facto standard of eight bits, as documented in ISO/IEC 2382-1:1993, is a convenient power of two permitting the values 0 through 255 for one byte (2^8 = 256, where zero counts as a value as well).[7] The international standard IEC 80000-13 codified this common meaning. Many types of applications use information representable in eight or fewer bits and processor designers optimize for this common usage. The popularity of major commercial computing architectures has aided in the ubiquitous acceptance of the eight-bit size.[8] Modern architectures typically use 32- or 64-bit words, built of four or eight bytes.
The full article is probably worth reading. No one set out their stall 50+ years ago, banged a fist on the desk and said ‘a byte shalt be 8 bits’; it became that way over time, with popular microprocessors being able to carry out operations on 8 bits at a time, and subsequent processor architectures carrying out ops on multiples of this. While I’m sure Intel could make their next chip a 100-bit-capable one, I think the next bitness revolution we’ll encounter will be 128.
Everyone knows that a byte is 8 bits?
These days, yes
But where exactly is it specified?
See above for the ISO code
I mean, physically you don't really use bytes, but bits.
Physically we don’t use bits either, but a threshold of detectable magnetic field strength on a rust coated sheet of aluminium, or an amount of electrical charge storage
As I understand, it's just a reaaaaly long string of ones and zeros and NOT bytes.
True, everything to a computer is a really long stream of 0 and 1. What is important in defining anything else is where to stop counting this group of 0 or 1, and start counting the next group, and what you call the group. A byte is a group of 8 bits. We group things for convenience. It’s a lot more inconvenient to carry 24 tins of beer home than a single box containing 24 tins
Sure, there are sectors, but, as far as I know, they are implemented at the software level (at least in SSDs, I think)
Sectors and bytes are analogous in that they represent a grouping of something, but they aren’t necessarily directly related in the way that bits and bytes are because sectors are a level of grouping on top of bytes. Over time the meaning of a sector as a segment of a track (a reference to a platter number and a distance from the centre of the platter) has changed as the march of progress has done away with positional addressing and later even rotational storage. In computing you’ll typically find that there is a base level that is hard to use, so someone builds a level of abstraction on top of it, and that becomes the new “hard to use”, so it’s abstracted again, and again.
Also RAM, which is again - a long stream of ones and zeros
Yes, and it is consequently hard to use, so it’s abstracted, and abstracted again. Your program doesn’t concern itself with raising the charge level of some capacitive area of a memory chip; it uses the abstractions it has access to, and that abstraction fiddles with the next level down, and so on until the magic happens at the bottom of the hierarchy. Where you stop on this downward journey is largely a question of definition and arbitrary choice. I don’t usually consider my RAM chips as something like ice cube trays full of electrons, or as subatomic quanta, but I suppose I could. We normally stop when it ceases to be useful in solving the problem.
Another example is CPU. It doesn't process 8 bits at a time, but only one.
That largely depends on your definition of ‘at a time’ - most of this question is about the definitions of various things. If we arbitrarily decide that ‘at a time’ means the unit block of the few picoseconds it takes the CPU to complete a single cycle, then yes, a CPU can operate on multiple bits of information at once - that’s the whole idea of having a multi-bit CPU that can add two 32-bit numbers together and not forget bits. If you want to slice time up so precisely that you can determine that enough charge has flowed to here but not there, then you could say which bit the CPU is operating on at this pico- (or smaller) second, but it’s not useful to go so fine-grained, because nothing will happen until the end of the time slice the CPU is waiting for.
Suffice to say, when we divide time just finely enough to observe a single CPU cycle from start to finish, we can say the CPU is operating on more than one bit at a time.
If you write at one letter per second, and I close my eyes for 2 out of every 3 seconds, I’ll see you write a whole 3 letter word “at the same time” - you write “the cat sat onn the mat” and to the observer, you generated each word simultaneously.
CPUs run in cycles for similar reasons: they operate on the flow and build-up of electrical charge, and you have to wait a certain amount of time for the charge to build up so that it triggers the next set of logic gates to open/close and direct the charge elsewhere. Faster CPUs are basically more sensitive circuitry; the rate of flow of charge is relatively constant, and it’s the time you’re prepared to wait for input to flow from here to there, for that bucket to fill with just enough charge, that shortens with increasing MHz. Once enough charge has accumulated, bump! Something happens, and multiple things are processed “at the same time”.
So where exactly is it specified? Or is it just general rule, which everyone follows?
It was the general rule, and then it was specified to make sure it carried on being the general rule.
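In C, for example, the rule surfaces at the language level as CHAR_BIT in <limits.h>, which the standard only requires to be at least 8:

    #include <stdio.h>
    #include <limits.h>

    int main(void) {
        printf("bits in a byte here: %d\n", CHAR_BIT); /* 8 on mainstream hardware */
        printf("sizeof(int)        : %zu bytes\n", sizeof(int));
        return 0;
    }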
If so, could I make a system (either an operating system or even something at a lower level) that would use, let's say, 9 bits in a byte? Or would I not be able to?
You could, but you’d essentially have to write an adaptation (abstraction) of an existing processor architecture, and you’d use nine 8-bit bytes to achieve your presentation of eight 9-bit bytes. You’re creating an abstraction on top of an abstraction, and the boundaries of the basic building blocks don’t align. You’d have a lot of work to do to see the system through to completion, and you wouldn’t bother.
In the real world, if ice cube trays made 8 cubes at a time but you thought the optimal number for a person to have in the freezer was 9, you’d buy 9 trays, freeze them and make 72 cubes, then divvy them up into 8 bags, and sell them that way. If someone turned up with 9 cubes worth of water (it melted), you’d have to split it over 2 trays, freeze it, give it back.. this constant adaptation between your industry provided 8 slot trays and your desire to process 9 cubes is the adaptive abstraction
If you do do it, maybe call it a nyte? :)
Also - why can't you use less than a byte of memory? Or maybe you can?
You can, you just have to work within the limitation that the existing abstraction is 8 bits. If you have 8 Boolean values to store, you can code things up so you flip individual bits of the byte on and off; even though you’re stuck with your 8-cube ice tray, you can selectively fill and empty each cube. If your program only ever needs 7 Booleans, you might have to accept the wastage of the other bit. Or maybe you’ll use it in combination with a regular 32-bit int to keep track of a 33-bit integer value. It’s a lot of work though, writing an adaptation that knows to carry into the 33rd bit rather than just throw an overflow error when you try to add 1 to 4,294,967,295. Memory is plentiful enough that you’d waste the bit, and waste another 31 bits by using a 64-bit integer to hold your 4,294,967,296 value.
Generally, resources are so plentiful these days that we don’t care about wasting a few bits. It isn’t always so, of course: take credit card terminals sending data over slow lines. Every bit counts for speed, so the ancient protocols for exchanging information with the bank might well use different bits of the same byte to encode multiple things.
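A minimal C sketch of the “fill and empty each cube” idea - eight independent flags packed into one byte (the flag names are made up for the example):

    #include <stdio.h>
    #include <stdint.h>

    enum {
        FLAG_SOUND_ON   = 1u << 0,
        FLAG_FULLSCREEN = 1u << 1,
        FLAG_PAUSED     = 1u << 2
        /* ... up to 1u << 7, eight flags per byte */
    };

    int main(void) {
        uint8_t flags = 0;
        flags |= FLAG_SOUND_ON;          /* set one bit   */
        flags &= (uint8_t)~FLAG_PAUSED;  /* clear one bit */
        if (flags & FLAG_SOUND_ON)       /* test one bit  */
            printf("sound on, flags = 0x%02X\n", flags);
        return 0;
    }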
For example: is it possible for two applications to use the same byte (e.g. first one uses 4 bits and second one uses other 4)?
No, because hardware and OS memory management these days keep programs separate for security and stability. In the olden days, though, one program could write to another program’s memory (it’s how we cheated at games: see the lives counter go down, just overwrite it with a new value), so in those days, if two programs could behave themselves, and one would only write to the 4 high bits and the other to the 4 low bits, then yes, they could have shared a byte. Access would probably be whole-byte though, so each program would have to read the whole byte, change only its own bits of it, then write the entire result back.
And last, but not least - do computer drives really use bytes? Or is it that, for example, bits 1-8 belong to something, next to them there are some 3 random bits, and bits 12-20 belong to something different?
Probably not, but you’ll never know, because you don’t get to peek far enough down the stack of abstractions to see the disk laid out as a sequence of bits and know where the byte boundaries are, or the sector boundaries, or whether this logical sector follows that logical sector, or whether a defect in the disk surface means the sectors don’t follow on from each other. You don’t typically care though, because you treat the drive as a contiguous array of bytes (etc.) and let its controller worry about where the bits are.

Which library/code is responsible for rendering the terminal in retro computers?

For example, as you type, which library tells the computer screen to display the respective ASCII character and to move the cursor accordingly?
Imagine something like the old-school computers (with no GUI) running DOS or BASIC... what library is responsible for the UI?
Links to source code would be great for understanding how said library(ies) works.
The photo you have posted is of a BBC Micro running in Mode 7. This was an exception to most rules. Mode 7 was a low-memory mode in which there were no pixels, just 256 text characters. 1 K of memory was reserved in RAM to contain what was displayed on the screen at that moment. A special chip on the circuit board, called the Video ULA (Uncommitted Logic Array), read the contents of that memory and encoded it for the output. The ULA was ROM and could not be changed by the programmer.
The ZX81 worked in a similar way: 256 possible text characters and no pixels. However, the ZX81 had fewer dedicated chips and the main CPU did most of the work.
A more common setup was that every pixel was represented by a number of bits in memory (often more than one bit per pixel was needed because colours had to be indicated). Examples are the BBC in modes 1-6, the Acorn Electron, the Spectrum, the C64, and many others. When the user placed text on the screen, the computer's ROM would convert this to the correct pixels. Graphics could often be written directly to the RAM, or 'plotted' via BASIC. Once again, dedicated ROM chips and circuitry would then render this memory to the output. This approach required much more memory for the display.
Every 8-bit computer had its own way of representing the display in RAM. You need to get the manuals for the machine you are trying to program (easy to find on the internet for the better-known micros).
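As a concrete (and hedged) example of writing directly to display RAM, here is what it looks like on a C64 with a cc65-style C compiler - the BBC Micro in the photo lays its screen out differently, so treat this purely as an illustration:

    #include <stdint.h>

    #define SCREEN_RAM ((volatile uint8_t *)0x0400) /* 40 x 25 = 1000 cells */
    #define COLOR_RAM  ((volatile uint8_t *)0xD800) /* one colour per cell  */

    /* Put a character on the C64 text screen by poking video memory directly;
       no OS or library routine is involved - the VIC-II chip simply reads
       this RAM every frame and renders it. */
    void put_char(uint8_t row, uint8_t col, uint8_t screen_code, uint8_t colour)
    {
        uint16_t offset = row * 40u + col;
        SCREEN_RAM[offset] = screen_code; /* e.g. 1 is the screen code for 'A' */
        COLOR_RAM[offset]  = colour;      /* 0-15 */
    }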
Many emulators are open source, if you want to see the internals. For example: https://github.com/stardot/beebem
If you're interested in seeing the internals of a terminal to better understand how it works and renders input/output, Bash is completely open source. You can download its latest source code here.

fastest blit (under winapi)

I am doing a lot of blitting (I have written many 2D game prototypes in recent months), and I am searching for the fastest blit possible. Is there anything faster than SetDIBitsToDevice or StretchDIBits? Those take about 1-5 ms, as far as I remember, for usual window sizes, so they are not terribly fast (hard or impossible to get above 200 fps), though I suppose that is normal, since RAM itself is not so fast.
It depends.
The bottleneck for most machines will be pushing from system memory to graphics memory.
In many cases, there won't be any effective difference between a SetDIBitsToDevice and a BitBlt, but, in some cases, there can be. If you're running in a low color mode (e.g., 256), then it'll be faster to push 1-byte indexes than 32-bpp pixel data and have it remapped on the card. (Whether the remapping is handled in the graphics adapter or the system will depend on the driver--I assume.)
I believe the safest thing you can do is BitBlt from a device-dependent (compatible) bitmap. I don't think this will ever be worse than SetDIBitsToDevice, but it may often be a tie.
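A sketch of that approach in Win32/C (error handling omitted; in a real prototype you would keep the memory DC and the DDB alive between frames and only re-upload when the pixels change):

    #include <windows.h>

    /* Assumes 32-bpp top-down pixel data in `pixels`, w x h in size. */
    void blit_frame(HWND hwnd, const void *pixels, int w, int h)
    {
        HDC hdcWin = GetDC(hwnd);
        HDC hdcMem = CreateCompatibleDC(hdcWin);
        HBITMAP bmp = CreateCompatibleBitmap(hdcWin, w, h); /* device-dependent */
        HGDIOBJ old = SelectObject(hdcMem, bmp);

        BITMAPINFO bmi = {0};
        bmi.bmiHeader.biSize        = sizeof(BITMAPINFOHEADER);
        bmi.bmiHeader.biWidth       = w;
        bmi.bmiHeader.biHeight      = -h;   /* negative = top-down rows */
        bmi.bmiHeader.biPlanes      = 1;
        bmi.bmiHeader.biBitCount    = 32;
        bmi.bmiHeader.biCompression = BI_RGB;

        /* Convert/upload the DIB pixels into the DDB; do this only when the
           pixels actually change, not every frame, to get the benefit. */
        SetDIBits(hdcWin, bmp, 0, h, pixels, &bmi, DIB_RGB_COLORS);

        /* The per-frame cost is then just a straight device-to-device copy. */
        BitBlt(hdcWin, 0, 0, w, h, hdcMem, 0, 0, SRCCOPY);

        SelectObject(hdcMem, old);
        DeleteObject(bmp);
        DeleteDC(hdcMem);
        ReleaseDC(hwnd, hdcWin);
    }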
I would expect (but haven't tested) that any stretching blit may be slightly more expensive than a straight blit, unless the extra pixels are synthesized on the GPU.
You might consider some of the newer APIs, like Direct2D, which are designed to work closer with the hardware rather than present an idealistic software model.
Whichever solution you choose, I'd be prepared for big performance differences between machines.

Does caching to save memory make sense in face of swapping by the OS?

Disclaimer: I know very little about memory management or performance, and I code in C#.
Question:
Does it make sense to "cache" medium-sized data (on the order of, say, dozens of MBs), especially media that may be sent to a device at any time (audio and images), on disk instead of keeping it in (virtual) memory, given that any OS will swap (maybe "page" is the correct word) unused memory to disk anyway?
This may not have been clear, so I'll post examples.
It is mainly related to user interfaces, not network I/O.
Examples of what I'm talking about:
FooSlideshow app could store slides on disk instead of allocating virtual memory for them.
BarGame could store sounds of different, numerous events on disk and load them for playing.
BazRenderer could store bitmaps of the several layers in a composite image if they're not prone to constant changing (If only one layer changes, the rest just have to be read from disk).
Examples of what I'm not talking about:
FooPlayer caches a buffer of the song while it streams from the server.
BarBrowser caches images because the user may visit the same page.
Why I should care:
Because, let's say, a slideshow shown fullscreen on a 1024x768 screen at 32 bits/pixel would take 1024 * 768 * 4 bytes = 3 MiB per slide (about 8 MiB for an HD screen). So a 10-slide slideshow would need 30-80 MiB just to cache the images. A short song, decoded to 16-bit samples at 44.1 kHz (CD quality), would also weigh about that much on average.
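For reference, the back-of-envelope arithmetic (shown in C for brevity; the 3-minute song length is my assumption):

    #include <stdio.h>

    int main(void) {
        long slide = 1024L * 768 * 4;      /* 32 bpp = 4 bytes/pixel -> ~3 MiB */
        long song  = 180L * 44100 * 2 * 2; /* 180 s, 44.1 kHz, stereo, 16-bit  */

        printf("one slide : %.1f MiB\n", slide / (1024.0 * 1024.0));
        printf("one song  : %.1f MiB\n", song  / (1024.0 * 1024.0));
        return 0;
    }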
From my C# code (but it could be Java, Python, whatever), should I care about building a complex caching system to free memory whenever possible, or should I trust the OS to swap it out? (And would the result be the same? Is one approach better than the other? Why?)

Processor architecture

While HDDs evolve and offer more and more space in less room, why are we "sticking with" 32-bit or 64-bit?
Why can't there be, for example, a 128-bit processor?
(This is not my homework; I'm just a student interested beyond the things they teach us in informatics)
Because the difference between 32-bit and 64-bit is astronomical - it's really the difference between 2^32 (a ten-digit number in the billions) and 2^64 (a twenty-digit number in the squillions :-).
64 bits will be more than enough for decades to come.
There's very little need for this; when do you deal with numbers that large? The memory space addressable with 64 bits is well beyond what any machine can handle for at least a few years... and beyond that it's probably more than any desktop will hold for quite a while.
Yes, desktop memory will continue to increase, but 4 billion times what it is now? That's going to take a while... Sure, we'll get to 128-bit eventually, if the whole current model isn't thrown out before then, which I see as equally likely.
Also, it's worth noting that upgrading something from 32-bit to 64-bit puts you in a performance hole immediately in most scenarios (this is a major reason Visual Studio 2010 remains 32-bit only). The same will happen going from 64-bit to 128-bit: the more small objects you have, the more pointers, which are now twice as large - that's more data to push around to do the same thing, especially if you don't need that much addressable memory space.
When we talk about an n-bit architecture we are often conflating two rather different things:
(1) n-bit addressing, e.g. a CPU with 32-bit address registers and a 32-bit address bus can address 4 GB of physical memory
(2) size of CPU internal data paths and general purpose registers, e.g. a CPU with 32-bit internal architecture has 32-bit registers, 32-bit integer ALUs, 32-bit internal data paths, etc
In many cases (1) and (2) are the same, but there are plenty of exceptions, and this may become increasingly the case: e.g. we may not need more than 64-bit addressing for the foreseeable future, but we may want more than 64 bits for registers and data paths (this is already the case with many CPUs with SIMD support).
So, in short, you need to be careful when you talk about, e.g. a "64-bit CPU" - it can mean different things in different contexts.
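A quick illustration of that distinction, assuming an x86-64 build with SSE available (a sketch, not tied to any particular CPU mentioned above):

    #include <stdio.h>
    #include <immintrin.h>

    int main(void) {
        /* (1) addressing: pointers are 8 bytes on a 64-bit target        */
        printf("pointer size        : %zu bytes\n", sizeof(void *));
        /* (2) data path: a single SSE register already holds 16 bytes    */
        printf("__m128 SIMD register: %zu bytes\n", sizeof(__m128));
        return 0;
    }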
Cost. Also, what do you think a 128-bit architecture would get you? Memory addressing and such, but to handle it effectively you would need higher-bandwidth buses and basically new instruction sets to go with it. 64-bit is more than enough for addressing (2^64 = 18,446,744,073,709,551,616 bytes).
HDDs still have a fair bit of ground to catch up to RAM and such; they're still going to be the I/O bottleneck, I think. Plus, newer chips are mostly adding more cores rather than making a massive architectural change.
Well, I happen to be a professional computer architect (my inventions are probably in the computer you are reading this on), and although I have not yet been paid to work on any processor with more than 64 bits of address, I know some of my friends who have been.
And I have been playing around with 128 bit architectures for fun for a few decades.
I.e. it's already happening.
Actually, it has already happened to a limited extent. The HP Precision Architecture, Intel Itanium, and the higher-end versions of the IBM Power line have what I call folded virtual memory. I have described these elsewhere, e.g. in comp.arch posts in some detail: http://groups.google.com/group/comp.arch/browse_thread/thread/53a7396f56860e17/f62404dd5782f309?lnk=gst&q=folded+virtual+memory#f62404dd5782f309
I need to create a comp-arch.net wiki post for these.
But you can get the manuals for these processors and read them yourself.
E.g. you might start with a 64-bit user virtual address. The upper 8 bits may be used to index a region table, which returns an upper 24 bits that are concatenated with the remaining 64 - 8 = 56 bits to produce an 80-bit expanded virtual address. That is then translated by TLBs and page tables and hash lookups, as usual, to whatever your physical address is.
Why go from 64 to 80?
One reason is shared libraries. You may want the shared libraries to stay at the same expanded virtual address in all processes, so that you can share TLB entries. But you may be required, by your language tools, to relocate them to different user virtual addresses. Folded virtual addresses allow this.
Folded virtual addresses are not true >64 bit virtual addresses usable by the user.
For that matter, there are many proposals for >64 bit pointers: e.g. I worked on one where a pointer consisted of a 64bit address, and 64 bit lower and upper bounds, and metadata, for a total of 128 bits. Bounds checking. But, although these have >64 bit pointers or capabilities, they are not truly >64 bit virtual addresses.
Linus posts about 128 bit virtual addresses at http://www.realworldtech.com/beta/forums/index.cfm?action=detail&id=103574&threadid=103545&roomid=2
I'd also like to offer a computer architect's view of why 128-bit is impractical at the moment:
1. Energy cost. See Bill Dally's presentations on how, today, most energy in processors is spent moving data around (dissipated in the wires). However, since the most significant bits of a 128-bit computation should change little, this would mitigate the problem somewhat.
2. Most arithmetic operations have a non-linear cost w.r.t. operand size:
a. A tree multiplier has space complexity n^2 w.r.t. the number of bits.
b. The delay of a hierarchical carry-lookahead adder is log(n) w.r.t. the number of bits (I think), so a 128-bit adder will be slower than a 64-bit adder. Can anyone give some hard numbers (log(n) seems very cheap)?
3. Few programs use 128-bit integers or quad-precision floating point, and when they do, there are efficient ways to compose them from 32- or 64-bit ops (see the sketch below).
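For example, a 128-bit addition composed from 64-bit operations (a portable C sketch):

    #include <stdint.h>

    typedef struct { uint64_t lo, hi; } u128;

    /* Add two 128-bit values using only 64-bit arithmetic: add the low
       halves, then propagate the carry into the high halves. */
    static u128 add128(u128 a, u128 b)
    {
        u128 r;
        r.lo = a.lo + b.lo;
        r.hi = a.hi + b.hi + (r.lo < a.lo); /* carry out of the low word */
        return r;
    }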
The next big thing in processor architecture will be quantum computing. Instead of being just 0 or 1, a qubit has a probability of being 0 or 1.
This will lead to huge improvements in the performance of some algorithms (for instance, it will become much easier to crack any RSA private/public key).
Check http://en.wikipedia.org/wiki/Quantum_computer for more information and see you in 15 years ;-)
The main need for a 64-bit processor is to address more memory - and that is the driving force behind the switch to 64-bit. On 32-bit systems, you can really only address 4 GB of RAM, at least per process. 4 GB is not much.
64 bits give you an address space of 16 exabytes (though a lot of current 64-bit hardware can address "only" 48 bits - that's still enough to support 256 terabytes of RAM).
Upping the natural integer size of a processor does not automatically make it "better", though. There are trade-offs. With 128-bit you'd need twice as much storage (registers/RAM/caches/etc.) compared to 64-bit for common data types, with all the drawbacks that might have: more RAM needed to store data, more data to transmit (= slower), wider buses that might require more physical space and perhaps more power, etc.
