John Mashey on 64-bit computing

John Mashey's post in comp.arch, with formatting added by me.

Date: 19 Dec 2003 21:07:56 -0800
Message-ID: <ce9d692b.0312192107.325394c0@posting.google.com>

as promised, here's the original. I'll follow up (in a couple of weeks) with a 12-years-later retrospective summarizing what happened later.

There are a few typos in this, but most of it was pretty close, unsurprising as lots of people knew all this; I just wrote it down.

In mid-1991, I somehow :-) missed predicting that true 64-bit micros:

a) Would rapidly become crucial to the Internet, i.e., via many Cisco routers.

b) Would ship in low cost videogames (Nintendo N64) by 1996.

c) Would end up in disk controllers, wireless chips, set top boxes, communications processors, laser printers, etc.

64-BIT COMPUTING

What is a 64-bit microprocessor?
Why would you want one, especially in a personal system?

John R. Mashey
BYTE Magazine - September 1991, 135-142.

Today's most popular computers are built around 32-bit microprocessors. The next generation of chips - 64-bit microprocessors - will bring even more power to the desktop.

But what does it mean to call a chip 64-bit? It's easy to get confused, because different numbers of bits are used in different parts of a microprocessor (see the text box "What's in a Chip?" on page 138). Although the MIPS R4000 is currently the only 64-bit microprocessor, 64 bits is almost certainly a coming trend. At microprocessor conferences, sessions on the future of chip technology routinely predict widespread use of true 64-bit microprocessors by 1995 or earlier.

You may be thinking, "My PC software still runs in 16-bit mode and it will be years before the software catches up with 32 bits. But 64 bits? People who predict widespread use of true 64-bit microprocessors by 1995 must be raving lunatics!"

There are two reasons for the prediction: 64-bit integer processing and convenient use of more than 32 bits of address space. The first reason is a straightforward performance issue; the second has more widespread implications. As you'll see, applications for 64-bit microprocessors exist for both servers and desktops.

CPU architectures

When it comes to CPU architectures, it helps to distinguish between Instruction Set Architecture (ISA), which presents an assembly language programmer's view of a processor, and hardware implementations of that ISA. Successful ISAs persist unchanged or evolve in an upward-compatible direction for years. Distinct implementations are often built to yield different cost/performance points. At times people get confused about the difference between ISA and implementation sizes. Table 1 may help clear up the confusion.

In Figure 1, the CPU's integer registers are R bits wide. Address arithmetic starts with R bits, either producing a virtual address of V bits (V is the generated user address size, V <= R) or using a segment register to expand R bits to V bits. The memory management unit translates V bits of virtual address into A bits of physical address that are actually used to access memory. For each access, up to D bits are transferred (i.e., the data bus is D bits wide). For user-level programs, R and V are programmer-visible properties of the ISA; A and D are usually less-visible implementation-specific characteristics. (Floating-point register size is almost always 64 or 80 bits, and so is not included.)

========================================================================
Figure 1.  Efficient address arithmetic is limited by the integer
register width (R).  Actual memory access is limited by the virtual
address size (V) and address bus width (A).  Efficient memory I/O
size is limited by the data bus width (D).

    ------------------------------------------------
    | Segmentation       CPU     Integer registers |
    |(on some machines)                (R bits)    |
    |       |                              |       |
    |       V                              V       |
    |      Generated virtual address (V bits)      |
    |                       |                      |
    |                       V                      |
    |          Memory Management Unit              |
    ------------------------------------------------
           Physical Address |       ^  Data
           (A bits)         |       |  (D Bits)
                            V       V
                  External Memory System
=======================================================================

Table 1 lists numbers for well-known computer families. For simplicity, V is given only for user-level programs. The table shows that physical address size (A) and data bus size (D) can vary within a processor family. The IBM S/360 family included five data bus sizes (8 to 128 bits); the 32-bit Intel 386 is sold with two data bus sizes - 32 bits (the 386DX) and 16 bits (the 386SX).

========================================================
Table 1: The size that a microprocessor is called is generally the
integer register size.

CPU                             ISA          Hardware
                         Characteristics    implementation
                Year       Integer  Gen'd   Phys Data
            Released       register user    addr  bus
                     Size    size   addr    size  size
                   Called     (R)   (V)     (A)  (D)
--------------------------------------------------------
DEC PDP-11/45   1973  16       16   16*     18   32
DEC PDP-11/70   1976  16       16   16*     22   32
DEC VAX 11/780  1978  32       32   31      32   64

IBM S/360       1964  32       32   24      24  8-128
IBM S/370XA     1983  32       32   31      32  128  
IBM ESA/370     1988  32       32   31*     32  128

IBM RISC        1990  32       32   32*     32  64-128
   System/6000

HP Precision    1986  32       32   32*     32  32-64
 
Intel 386DX     1985  32       32   32*     32   32
Intel 386SX     1987  32       32   32*     24   16
Intel 860       1989  64       32   32      32   64
Intel 486DX     1989  32       32   32*     32   32
Intel 486SX     1991  32       32   32*     32   32

MIPS R2000      1986  32       32   31      32   32
MIPS R4000      1990  64       64  40-62    36   64

Motorola 68000  1980  32       32   24      24   16
Motorola 68020  1985  32       32   32      32   32
Motorola 68030  1987  32       32   32      32   32
Motorola 68040  1990  32       32   32      32   32

Sun SPARC       1987  32       32   32      36  32-64
========================================================
* These processors use some form of segmentation to obtain
more bits of user address space when necessary.

Better performance with bigger integers

For years, PDP-11 Unix systems have used 16-bit integers for most applications, as many PCs still do. Sometimes performance can improve merely by switching to larger integers. Integer code has proved resistant to recent speedup techniques that have greatly helped floating-point performance, so any integer improvement is welcome. Applications that benefit from 64-bit integers include moving data in large chunks (byte pushing), graphics, and arithmetic on integers wider than 32 bits (see Table 2).
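As a sketch of the speed argument (mine, not the article's, written in modern C for clarity): copying or scanning data 64 bits at a time halves the loop count compared with 32-bit registers, since each access moves a full register.

    #include <stdint.h>
    #include <stddef.h>

    /* Copy n bytes, moving 64 bits per iteration for the bulk.
       Assumes dst and src are 8-byte aligned; a real memcpy
       must also handle misalignment. */
    void copy64(void *dst, const void *src, size_t n)
    {
        uint64_t *d = dst;
        const uint64_t *s = src;

        while (n >= sizeof(uint64_t)) {   /* 64 bits per trip */
            *d++ = *s++;
            n -= sizeof(uint64_t);
        }

        /* Move any remaining tail bytes one at a time. */
        unsigned char *db = (unsigned char *)d;
        const unsigned char *sb = (const unsigned char *)s;
        while (n--)
            *db++ = *sb++;
    }

With 32-bit registers, the same loop takes twice as many iterations for the bulk of the data.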

Big-time addressing

Perhaps more important than using 64-bit integers for performance is the extension of memory addressing above 32 bits, enabling applications that are otherwise difficult to program. It is especially important to distinguish between virtual addressing and physical addressing.

The virtual addressing scheme often can exceed the limits of possible physical addresses. A 64-bit address can handle literally a mountain of memory: Assuming that 1 megabyte of RAM requires 1 cubic inch of space (using 4-megabit DRAM chips), 2**64 bytes would require a square mile of DRAM piled more than 300 feet high! For now, no one expects to address this much DRAM, even with next-generation 16-Mb DRAM chips, but increasing physical memory slightly beyond 32 bits is definitely a goal. With 16-Mb DRAM chips, 2**32 bytes fits into just over 1 cubic foot (not including cooling) - feasible for deskside systems.
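The arithmetic behind that image is easy to check; here is a back-of-the-envelope verification (mine, not the article's):

    #include <stdio.h>
    #include <math.h>

    int main(void)
    {
        double bytes    = ldexp(1.0, 64);         /* 2^64 bytes           */
        double cubic_in = bytes / ldexp(1.0, 20); /* at 1 MB per in^3     */
        double cubic_ft = cubic_in / 1728.0;      /* 1728 in^3 per ft^3   */
        double sq_mile  = 5280.0 * 5280.0;        /* ft^2 in a sq mile    */

        /* Prints roughly 365 feet: "more than 300 feet high". */
        printf("pile height: %.0f feet\n", cubic_ft / sq_mile);
        return 0;
    }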

An even more important goal is the increase of virtual addresses substantially beyond 32 bits, so that address space can be "wasted" to make programming easier - or even just possible. Although this goal is somewhat independent of the physical memory goal, the two are related.

Database systems often spread a single file across several disks. Current SCSI disks hold up to 2 gigabytes (i.e., they use 31-bit byte addresses). Calculating file locations as virtual memory addresses requires integer arithmetic beyond 32 bits. Operating systems are accustomed to working around such problems, but the workarounds are unpleasant; rather than just making things work well, programmers struggle just to make something work.
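A hedged illustration of that arithmetic (the numbers and names are mine, not the article's): a byte offset into a file spread across several 2-GB disks exceeds 32 bits almost immediately.

    #include <stdio.h>

    int main(void)
    {
        /* A file addressed in 1024-byte blocks, spread across disks. */
        long long blocks = 5000000;            /* block number to locate */
        long long good   = blocks * 1024;      /* 5,120,000,000 bytes    */

        /* The same calculation confined to 32 bits wraps around,
           because 5,120,000,000 > 2^32: */
        unsigned int bad = (unsigned int)blocks * 1024u;

        printf("64-bit: %lld bytes   32-bit: %u bytes\n", good, bad);
        return 0;
    }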

The physical address limit is an implementation choice that is often easier to change than the virtual address limit. For most computers, virtual memory limits often exceed physical limits, because the simplest, cheapest way to solve many performance problems is to add physical memory. If the virtual limit is much smaller than the physical limit, adding memory doesn't help, because software cannot take advantage of it. Of course, some processors use segmentation schemes to extend the natural size of the integer registers until they are equal to or greater than the physical address limit.

The mainframe, minicomputer, and microprocessor

Reflect on this aphorism:

Every design mistake gets made at least three times: once by mainframe people, once by minicomputer people, and then at least once by microprocessor people.

An illustrative sequence is found among IBM mainframes, DEC superminicomputers, and various microprocessors.

IBM S/360 mainframes used 32-bit integers and pointers but computed addresses only to 24 bits, thus limiting virtual (and physical) memory to 16 MB (see reference 1). This seemed reasonable at the time, as systems used core memory, not DRAM chips. A "large" mainframe (such as a 360/75) provided at most 1 MB of memory, although truly huge mainframes (the 360/91) might offer as much as 6 MB. In addition, most S/360s did not support virtual memory, so user programs generated physical addresses directly. There was little need to consider addresses larger than the physical address size. Although it was unfortunate that only 16 MB was addressable, it was even worse to ignore the high-order 8 bits rather than trap when they were nonzero. Assembly language programmers "cleverly" took advantage of this quirk to pack 8 bits of flags with a 24-bit address pointer.

As memory became cheaper, the "adequate" 16-MB limit clearly became inadequate, especially as virtual-addressing S/370s made it possible to run programs larger than physical memory. By 1983, 370-XA mainframes added a 31-bit addressing mode for user programs but were required to retain a 24-bit mode for upward compatibility. Much software had to be rewritten to work in the 31-bit mode. I admit I was one of those "clever" programmers and was somewhat surprised to discover that a large program I wrote in 1970 is still running on many mainframes - in 24-bit compatibility mode, because it won't run any other way. "The evil that men do lives after them, the good is oft interred with their bones."

By the mid-1980s, 31-bit addressing was also viewed as insufficient for certain applications, especially databases. ESA/370 was designed with a form of segmentation to allow code to access multiple 2-gigabyte regions of memory, although it took tricky programming to do so.

In the minicomputer phase of this error, the DEC PDP-11 was a 16-bit minicomputer. Unfortunately, a single task addressed only 64 kilobytes of data and perhaps 64 KB of instructions. Gordon Bell and Craig Mudge wrote, "The biggest and most common mistake that can be made in computer design is that of not providing enough address bits for memory addressing and management. The PDP-11 followed this hallowed tradition of skimping on address bits, but was saved on the principle that a good design can evolve through at least one major change. For the PDP-11, the limited address space was solved for the short run, but not with enough finesse to support a large family of minicomputers. This was indeed a costly oversight." (See reference 2.)

Some PDP-11/70 database applications rapidly grew awkward on machines with 4 MB of memory that could only be addressed in 64-KB pieces, requiring unnatural acts to break up simple programs into pieces that would fit. Although the VAX-11/780 was not much faster than the PDP-11/70, the increased address space was such a major improvement that it essentially ended the evolution of high-end PDP-11s. In discussing the VAX-11/780, William Strecker wrote, "For many purposes, the 65-Kbyte virtual address space typically provided on minicomputers such as the PDP-11 has not been and probably will not continue to be a severe limitation. However, there are some applications whose programming is impractical in a 65-Kbyte address space, and perhaps more importantly, others whose programming is appreciably simplified by having a large address space." (See reference 3.)

Finally, we come to microprocessors. The Intel 8086 was a 16-bit architecture and thus, likely to fall prey to the same issues as the PDP-11. Fortunately, unlike the PDP-11, it at least provided a mechanism for explicit segment manipulation by the program. This made it possible for a single program to access more than 64 KB of data, although it took explicit action to do so. Personal computer programmers are familiar with the multiplicity of memory models, libraries, compiler flags, extenders, and other artifacts needed to deal with the issues.
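For readers who haven't met it, the 8086's real-mode address calculation looks roughly like this (a sketch of the well-known scheme; the C function is mine):

    #include <stdint.h>
    #include <stdio.h>

    /* 8086 real mode: a 16-bit segment register is shifted left 4 bits
       and added to a 16-bit offset, yielding a 20-bit physical address
       (1 MB total). Any single segment still spans only 64 KB, so
       reaching more data takes explicit segment manipulation. */
    uint32_t phys_addr(uint16_t segment, uint16_t offset)
    {
        return ((uint32_t)segment << 4) + offset;
    }

    int main(void)
    {
        /* F000:FFF0 -> FFFF0, the 8086 reset vector. */
        printf("%05X\n", (unsigned)phys_addr(0xF000, 0xFFF0));
        return 0;
    }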

The Motorola MC68000 started with a more straightforward programming model, since it offered 32-bit integer registers and no segmentation. However, by ignoring the high 8 bits of a 32-bit address computation, it repeated the same mistake made 15 years earlier by the IBM S/360. Once again, "clever" programmers found uses for those bits, and when the MC68020 interpreted all 32 bits, programs broke. BYTE readers may recall problems with some applications when moving from the original Macintosh to the Mac II.
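The S/360 and 68000 trick looked roughly like this in C terms (my reconstruction, not code from the article): flags packed into the byte the hardware ignored, which became garbage addresses the day the hardware started using all 32 bits.

    #include <stdint.h>
    #include <stdio.h>

    #define FLAG_MASK 0xFF000000u    /* the 8 bits the hardware ignored */

    static uint32_t tag(uint32_t addr, uint32_t flags)
    {
        return addr | (flags << 24); /* pack flags into the high byte */
    }

    static uint32_t untag(uint32_t addr)
    {
        return addr & ~FLAG_MASK;    /* must mask before dereferencing */
    }

    int main(void)
    {
        uint32_t p = tag(0x00123456u, 0x80u);
        /* On a 24-bit-address machine, using p directly still "works".
           On a 68020 or 370-XA, 0x80123456 is a different -- usually
           invalid -- address: the Mac-to-Mac II failure mode. */
        printf("tagged: %08X  untagged: %08X\n",
               (unsigned)p, (unsigned)untag(p));
        return 0;
    }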

The need for big computers

Two common rules of thumb are that DRAM chips get four times bigger every three years and that virtual memory usage grows by a factor of 1.5 to 2 per year (see reference 4). Additional memory is often the cheapest and easiest solution to performance problems, but only if software can easily take advantage of it.
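To see how fast that compounds, here is a small projection (my illustration; the 2-GB 1991 starting point is taken from the leading-edge virtual images mentioned below):

    #include <stdio.h>

    int main(void)
    {
        /* Virtual memory usage grows 1.5x-2x per year. Start from a
           2-GB leading-edge image in 1991 and find when it crosses
           the 32-bit (4-GB) line, using the conservative 1.5x rate. */
        double gb = 2.0;
        int year;
        for (year = 1991; gb <= 4.0; year++) {
            printf("%d: %.1f GB\n", year, gb);
            gb *= 1.5;
        }
        /* Even at 1.5x/year, the 4-GB limit falls in 1993. */
        return 0;
    }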

As the natural size of code and data reaches and then exceeds some virtual address limit, the level of programming pain increases rapidly, because programmers must use more and more unnatural restructuring. If the virtual address limit is lower than the physical limit, it is especially irritating, since buying DRAM won't do you any good. Fortunately, the virtual address limit is typically larger than the physical limit, so programs may work but perhaps run slowly. In this case, you can at least add physical memory until performance becomes adequate.

There is no definite ratio between maximum task virtual-address limits and physical address limit. Conversations with many people have convinced me that a 4-to-1 ratio is reasonable (i.e., you will actually see practical programs four times bigger than physical memory) if the operating system can support them. Some people claim that a ratio of 4 to 1 is terribly conservative and that advanced file-mapping techniques (as in Multics or Mach) use up virtual memory much faster than physical memory. Certainly, in the process of chip design and simulation at Mips Computer Systems, some of our 256-MB servers routinely run programs with virtual images that are four to eight times larger (1 to 2 gigabytes). Several companies (including Mips) already sell desktops with 128 MB of memory. With 16-Mb DRAM chips, similar designs will soon hit 512 MB - enough to have programs that could use at least 4 gigabytes of virtual memory.
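File mapping is the clearest way to see virtual space vanish faster than physical memory: each mapping consumes address space whether or not the pages are ever touched. A minimal POSIX sketch (mine; the file name is hypothetical, and error handling is trimmed):

    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("chipsim.dat", O_RDONLY);  /* hypothetical data file */
        struct stat st;
        if (fd < 0 || fstat(fd, &st) < 0)
            return 1;

        /* The mapping consumes st.st_size bytes of *virtual* space up
           front; a few multi-GB mappings exhaust a 32-bit address
           space outright, long before physical memory runs out. */
        char *p = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) {
            perror("mmap");
            return 1;
        }
        printf("first byte: %d\n", p[0]);
        munmap(p, st.st_size);
        close(fd);
        return 0;
    }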

32-BIT CRISIS IN 1993

Consider the history of microprocessor-based servers from Mips Computer Systems and Sun Microsystems. Figure 2 shows that the 32-bit limit will become an issue even for physical memory around 1993 or 1994.

As soon as 16-Mb DRAM chips are available, some microprocessor-based systems will be sold with 2 to 4 gigabytes of main memory - in fact, just by replacing memory boards in existing cabinets. You may now be convinced that Sun and Mips designers must be crazy to think of such things; but if so, they have plenty of company from others, like those at Silicon Graphics, Hewlett-Packard, and IBM. Keeping pace with DRAM growth requires appropriate CPU chips in 1991 so that tools can be debugged in 1992 and applications debugged by 1993 or 1994 - barely in time.

========================================================================
HITTING THE 32-BIT LIMIT
Figure 2: The memory sizes of a Mips machine and a Sun machine, year
by year, using a logarithmic scale.  The data points fall on a straight
line, gaining 2 bits every 3 years, as they naturally follow DRAM curves.
The top line shows virtual memory size at four times the maximum physical
memory size, hinting that large leading-edge applications may already be
pushing 32-bit limits in 1991 (and they are).  The line below shows
physical memory size at 50 percent of maximum size.  Vendors actually
sell a substantial number of such machines.

[I can't draw it here: it has a vertical size in number of bits,
with a band of points going from lower left to upper right.]

1991: 32-bit trouble for leading-edge systems
1994: 32-bit trouble for many systems
=========================================================================

Why so much memory?

Finally, look at applications that put pressure on the size of virtual memory addressing. To handle virtual memory greater than 32 bits, you need either segmentation or 64-bit integer registers.

Why 64 and not something smaller, like 48? It is difficult to introduce a new architecture that runs the C language poorly. C prefers byte-addressed machines whose number of 8-bit bytes per word is a power of 2. The use of 6 bytes per word requires slow addressing hardware and breaks many C programs, so 64 is the next step after 32.
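One concrete reason (my example, not the article's): with a power-of-2 word size, the byte offset of element i is a shift; with 6-byte words it is a genuine multiply, and common C idioms that mask addresses stop making sense.

    #include <stddef.h>

    /* Byte offset of array element i, for 8-byte and 6-byte words. */
    size_t offset8(size_t i) { return i << 3; }  /* one shift         */
    size_t offset6(size_t i) { return i * 6; }   /* a real multiply   */

    /* A common C idiom that assumes power-of-2 sizes: rounding a
       size up to a word boundary by masking. There is no equivalent
       mask for a 6-byte word. */
    size_t round_up8(size_t n) { return (n + 7) & ~(size_t)7; }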

Segmentation may or may not be an acceptable solution, but there is insufficient space here to debate the relative merits. Suffice it to say that many people with segmentation experience consider it a close encounter of a strange kind.

The application areas listed in Table 2 - among them databases, imaging, CAD, GIS, and number crunching - tend to consume virtual memory space quickly and generally prefer convenient addressing of a large memory space, whether it's contiguous or sparse.

On the desktop?

Perhaps you now believe that 64-bit servers may be reasonable, but you still wonder about the desktop. Table 2 lists the application areas discussed, showing whether the primary use of 64-bit systems is for speed (either on the desktop or in a server); for addressing large amounts of data simultaneously; or for using software on a desktop system identically to its use on a server but with less actual data. Such compatibility is likely to be crucial for CAD applications but is also important for others, if only to get software development done.

===========================================================================
APPLICABILITY OF 64 BITS
Table 2: The applicability of 64 bits differs for servers and desktop systems.

                  Server         Workstation
            ----------------  ----------------
Application Speed Addressing  Speed Compatibility
----------------------------------------------
Byte pushing  X                 X
Graphics                        X
Big integers  X       X         X
Database              X                  X
Video                           X
Image                 X                  X
CAD                   X                  X
GIS*                  X                  X
Number crunch         X         X

* Geographic information systems
==========================================================================

For most readers, 64 bits is likely to be most important as an enabling technology to bring powerful new applications to the desktop. The history of the computing industry, especially of personal computers, shows there is some merit to thinking ahead. Some of us remember when a 640-KB limit was considered huge.

As 64-bit systems become available, some of the number-crunching people will recompile their FORTRAN programs immediately, and some other developers will start working in this direction. However, I'd expect only a small fraction of applications to jump to 64 bits quickly. For example, I do not expect to see 64-bit word processors soon. [Editor's note: However, see "ASCII Goes Global," July BYTE.] As a result, an important part of 64-bit chip and software design is the ability to mix 32-bit and 64-bit programs on the same system.
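What mixing 32-bit and 64-bit programs means for C programmers, in a sketch using the now-standard ILP32/LP64 model names (which postdate the article): the same source sees different type sizes, so code must not assume sizeof(long) == 4 or that a pointer fits in an int.

    #include <stdio.h>

    int main(void)
    {
        /* Typical sizes (bytes):   ILP32 (32-bit)   LP64 (64-bit)
           int                            4                4
           long                           4                8
           void *                         4                8       */
        printf("int: %zu  long: %zu  void*: %zu\n",
               sizeof(int), sizeof(long), sizeof(void *));
        return 0;
    }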

Although 64-bit applications may be relatively few, some are absolutely crucial and some are indirectly important to many people. You've probably seen vendors' predictions of huge numbers of transistors per chip over the next few years. Although you may not do electrical CAD yourself, you may buy a system with those big chips; so, somewhere people will be running programs to simulate those big chips, and those programs are huge.

I often give talks that compare computers to cars, using the CPU chip as the engine, exception handling as the brakes, and so forth. What kind of car is a 64-bit computer? Think of it as a car with four-wheel drive that you engage when necessary for better performance, but especially when faced with really tough problems, like driving up mountainsides. You wouldn't engage four-wheel drive to go to the grocery store, but when you need it, you need it very badly. Some people already have problems that require 64-bit processing, and soon more will. The necessary vehicles - 64-bit microprocessors - are on the way.

REFERENCES

1. Prasad, N.S. IBM Mainframes: Architecture and Design. New York: McGraw-Hill, 1989.

2. Bell, C. Gordon, and J. Craig Mudge. "The Evolution of the PDP-11." In Computer Engineering: A DEC View of Computer System Design, edited by C. Gordon Bell, J. Craig Mudge, and John E. McNamara. Bedford, MA: Digital Press, 1978.

3. Strecker, William D. "VAX-11/780: A Virtual Address Extension to the DEC PDP-11 Family." In Computer Engineering: A DEC View of Computer System Design, edited by C. Gordon Bell, J. Craig Mudge, and John E. McNamara. Bedford, MA: Digital Press, 1978.

4. Hennessy, John L., and David A. Patterson. Computer Architecture: A Quantitative Approach. San Mateo, CA: Morgan Kaufmann, 1990.