Lecture 8: Memory Management

1. Foundational Concepts

1.1 Address Binding

Address binding is the general process of mapping program addresses from one address space to another. This occurs at multiple stages in a program's lifecycle. Source code uses symbolic addresses (e.g., variable_x). The compiler binds these to relocatable addresses (e.g., '14 bytes from the start of this module'). Finally, the linker or loader binds these relocatable addresses to absolute physical addresses (e.g., 74014).

Stages of Address Binding

  1. Compile time: If the final memory location of a process is known in advance, the compiler can generate absolute code, but this code must be recompiled if the starting location ever changes.
  2. Load time: If the final memory location is not known at compile time, the compiler must generate relocatable code, and the final binding is delayed until the program is loaded into memory.
  3. Execution time: If a process can be moved from one memory segment to another during its execution, binding must be delayed until run time, which requires dedicated hardware support.

A key distinction in memory management is between a logical address (also called a virtual address), which is generated by the CPU, and a physical address, which is the address seen by the memory hardware. In compile-time and load-time binding schemes, the logical and physical addresses are identical. However, in an execution-time binding scheme, they differ, allowing the OS far greater flexibility. The hardware component responsible for managing this translation at runtime is the Memory-Management Unit.

1.2 Hardware Support for Address Protection

To ensure the correct operation of the system, the operating system must protect processes from one another's memory spaces, as well as protect itself from user processes. This protection is typically implemented in hardware.

Base and Limit Registers

A simple and effective hardware protection mechanism uses a pair of registers: a base register and a limit register. Together, they define the logical address space of a process. The base register holds the starting physical address of the process, and the limit register specifies the size of its address range. The CPU hardware checks every memory access generated in user mode to ensure it is greater than or equal to the base address and less than the base + limit address, preventing any process from accessing memory outside its allocated partition.
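The check described above can be sketched in a few lines of Python. This is only an illustration of the comparison the hardware performs; the function name and the register values are hypothetical and not taken from any particular system.

```python
def is_legal_access(address: int, base: int, limit: int) -> bool:
    """Return True if address lies in the range [base, base + limit)."""
    return base <= address < base + limit


# Hypothetical register values, chosen only for illustration.
BASE = 300040
LIMIT = 120900

print(is_legal_access(300040, BASE, LIMIT))  # True: first legal address
print(is_legal_access(420939, BASE, LIMIT))  # True: last legal address
print(is_legal_access(420940, BASE, LIMIT))  # False: real hardware would trap
```

Real hardware performs both comparisons on every user-mode access and traps to the operating system on failure instead of returning a value.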

Memory-Management Unit (MMU)

The Memory-Management Unit (MMU) is the hardware device responsible for mapping virtual addresses to physical addresses at runtime. In a simple scheme, the MMU's relocation register (which is effectively a base register) holds the starting physical address of a process. This value is added to every logical address generated by the process to create the corresponding physical address before it is sent to main memory.
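A minimal sketch of relocation-register translation follows. The limit check is an added assumption (the paragraph above describes only the addition), and the names `relocate` and `ProtectionFault` as well as the sample values are hypothetical.

```python
class ProtectionFault(Exception):
    """Stands in for the trap that real hardware would raise."""


def relocate(logical_address: int, relocation: int, limit: int) -> int:
    """Map a logical address to a physical one with a relocation register."""
    # Assumed protection step: the logical address must fall inside the process.
    if not 0 <= logical_address < limit:
        raise ProtectionFault(f"logical address {logical_address} out of range")
    # The relocation value is added to every logical address.
    return relocation + logical_address


# Hypothetical values: relocation register 14000, process size 1000 bytes.
print(relocate(346, relocation=14000, limit=1000))  # 14346
```

The process itself only ever sees logical addresses in the range 0 to limit - 1; the physical location can change simply by loading a different relocation value.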

With these foundational hardware concepts in place, we can now discuss the techniques used to move processes between main memory and secondary storage.

2. Core Memory Management Techniques

2.1 Swapping

Swapping is a technique that enables the total physical memory space required by all processes to exceed the actual physical memory available on the system. It works by temporarily moving a process from main memory to a backing store (a fast disk) and bringing it back into memory for continued execution at a later time.

Key Characteristics of Swapping

  • Backing Store: The backing store must be a fast, large storage device (typically a disk) capable of accommodating the memory images of all users and providing direct access to them.
  • Roll out, roll in: This is a variant of swapping often used in priority-based scheduling. When a higher-priority process needs to run, a lower-priority process can be "rolled out" (swapped to disk) to free up memory.
  • Context Switch Time: The time it takes to transfer a process to and from the backing store is a major component of swapping and can make context switching very slow. For example, swapping a 100 MB process to a disk with a transfer rate of 50 MB/sec would take 2 seconds to swap out and another 2 seconds to swap in, for a total of 4 seconds just for the memory transfer.
  • Constraints: Swapping is constrained by certain operations. For instance, a process cannot be swapped out if it has a pending I/O operation that is using its memory space as a buffer.
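The transfer-time figure in the context-switch example can be reproduced with a short calculation. This sketch assumes transfer time is simply size divided by rate and ignores seek time and latency; the function name is hypothetical.

```python
def swap_transfer_seconds(process_mb: float, rate_mb_per_s: float) -> float:
    """Time to move a process image one way, ignoring seek time and latency."""
    return process_mb / rate_mb_per_s


one_way = swap_transfer_seconds(100, 50)  # swap out (or in) a 100 MB process
round_trip = 2 * one_way                  # swap out plus swap in

print(one_way)     # 2.0
print(round_trip)  # 4.0
```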

While standard swapping is not used in modern operating systems like UNIX, Linux, or Windows, a modified version is common. This version is normally disabled but can be activated by the OS if the amount of free memory becomes extremely low, and it is disabled again once the demand for memory is reduced.

Standard swapping is also generally not supported on mobile systems due to the limitations of their flash memory, which include relatively small capacity, a limited number of write cycles, and poor throughput between flash and the CPU. We now turn to an earlier, simpler memory allocation method that predates modern techniques.

2.2 Contiguous Memory Allocation

Contiguous allocation is an early memory management method where each process is contained within a single, continuous section of physical memory. In this scheme, main memory is typically partitioned into two areas: one for the resident operating system and one for user processes.

Multiple-Partition Allocation

In a variable-partition scheme, the OS keeps track of available blocks of memory, known as holes. When a new process arrives, the OS allocates it a hole that is large enough to accommodate it. When a process terminates, it frees its partition of memory. If this newly freed partition is adjacent to other holes, the OS combines them into a single, larger hole.

Dynamic Storage-Allocation Strategies

When multiple holes are available, the OS must decide which one to allocate. Common strategies include:

  • First-fit: Allocate the first hole in the list that is big enough.
  • Best-fit: Allocate the smallest hole that is big enough, which requires searching the entire list of holes.
  • Worst-fit: Allocate the largest hole, which also requires a full search.
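The three strategies can be compared with a small sketch. It assumes the free list is just a sequence of hole sizes and returns the index of the chosen hole; a real allocator would also track addresses and split the chosen hole. All names and sizes here are hypothetical.

```python
from typing import Optional, Sequence


def first_fit(holes: Sequence[int], request: int) -> Optional[int]:
    """Index of the first hole that is large enough, or None."""
    for i, size in enumerate(holes):
        if size >= request:
            return i
    return None


def best_fit(holes: Sequence[int], request: int) -> Optional[int]:
    """Index of the smallest hole that is large enough, or None."""
    candidates = [(size, i) for i, size in enumerate(holes) if size >= request]
    return min(candidates)[1] if candidates else None


def worst_fit(holes: Sequence[int], request: int) -> Optional[int]:
    """Index of the largest hole, provided it is large enough, or None."""
    candidates = [(size, i) for i, size in enumerate(holes) if size >= request]
    return max(candidates)[1] if candidates else None


holes = [100, 500, 200, 300, 600]  # hypothetical hole sizes
print(first_fit(holes, 212))  # 1 (the 500-unit hole)
print(best_fit(holes, 212))   # 3 (the 300-unit hole)
print(worst_fit(holes, 212))  # 4 (the 600-unit hole)
```

First-fit can stop at the first match, whereas the other two must examine every hole unless the list is kept sorted by size.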

Fragmentation

Contiguous allocation suffers from a significant problem known as fragmentation.

  • External Fragmentation: This occurs when there is enough total free memory to satisfy a request, but the available space is not contiguous. The memory is fragmented into many small, unusable holes. A common heuristic for first-fit, known as the 50-percent rule, states that with N allocated blocks, another 0.5N blocks will be lost to fragmentation. In other words, up to one-third of memory (0.5N out of 1.5N blocks) may be unusable.
  • Internal Fragmentation: This occurs when an allocated block of memory is larger than the requested amount. The size difference is memory that is internal to the allocated partition but is not being used.

External fragmentation can be reduced by compaction, which involves shuffling the memory contents to place all free memory together in one large block. However, compaction is only possible if address binding is dynamic and performed at execution time. The persistence of external fragmentation in contiguous allocation, even with compaction available, is a critical flaw that directly motivates the development of non-contiguous schemes like segmentation and paging.
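Compaction can be illustrated with a minimal sketch. It assumes allocated blocks are described by hypothetical (name, start, size) tuples and that every block may be relocated, which is only true under execution-time binding.

```python
def compact(blocks, memory_size):
    """Slide allocated blocks toward address 0.

    blocks: list of (name, start, size) tuples for allocated regions.
    Returns the relocated blocks and the single remaining hole (start, size).
    """
    next_free = 0
    relocated = []
    # Preserve the blocks' relative order by processing them by start address.
    for name, _old_start, size in sorted(blocks, key=lambda b: b[1]):
        relocated.append((name, next_free, size))
        next_free += size
    return relocated, (next_free, memory_size - next_free)


# Hypothetical layout: holes at 100-149, 200-299, and 400-499.
blocks = [("P1", 0, 100), ("P2", 150, 50), ("P3", 300, 100)]
relocated, hole = compact(blocks, memory_size=500)
print(relocated)  # [('P1', 0, 100), ('P2', 100, 50), ('P3', 150, 100)]
print(hole)       # (250, 250)
```

Before compaction the largest hole is 100 units; afterwards a single 250-unit hole is available. The sketch omits the cost of copying every moved block, which is what makes compaction expensive in practice.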

2.3 Segmentation

Segmentation is a memory-management scheme that supports the user's view of memory. Instead of seeing memory as a linear array of bytes, a programmer views it as a collection of logical units called segments.

A segment can represent various logical components of a program, such as:

  • A main program
  • A procedure or function
  • A stack
  • A symbol table or common block
  • Global variables

Segmentation Architecture

In a segmented system, a logical address is a two-part tuple: <segment-number, offset>. This address is translated to a physical address using a Segment Table. Each entry in this table contains the base (the starting physical address of the segment) and the limit (the length of the segment). The system uses a Segment-table base register (STBR) to locate the segment table in memory and a Segment-table length register (STLR) to ensure the segment number is valid.
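The translation of a <segment-number, offset> pair can be sketched as follows. The table contents are hypothetical, and the length of the Python list stands in for the STLR.

```python
from typing import List, NamedTuple


class SegmentEntry(NamedTuple):
    base: int   # starting physical address of the segment
    limit: int  # length of the segment


class SegmentationFault(Exception):
    """Stands in for the trap raised on an illegal segment or offset."""


def translate(table: List[SegmentEntry], segment: int, offset: int) -> int:
    # STLR check: the segment number must index a real table entry.
    if not 0 <= segment < len(table):
        raise SegmentationFault(f"invalid segment {segment}")
    entry = table[segment]
    # Limit check: the offset must fall inside the segment.
    if not 0 <= offset < entry.limit:
        raise SegmentationFault(f"offset {offset} exceeds limit {entry.limit}")
    return entry.base + offset


# Hypothetical segment table with three segments.
segment_table = [
    SegmentEntry(base=1400, limit=1000),
    SegmentEntry(base=6300, limit=400),
    SegmentEntry(base=4300, limit=400),
]
print(translate(segment_table, 2, 53))  # 4353
```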

Segmentation provides a natural way to implement protection. Each entry in the segment table can have a validation bit to indicate if the segment is legal, along with read, write, and execute privileges to control access. While segmentation elegantly supports the user's view of a program, it reintroduces the dynamic storage-allocation problem. Because segments are of variable size, the system can suffer from external fragmentation, which is the exact problem that paging is designed to eliminate.

2.4 Paging

Paging is a memory-management scheme that permits a process's physical address space to be non-contiguous. This technique completely avoids external fragmentation and the need for compaction.

The core terms in paging are:

  • Frames: Fixed-sized blocks of physical memory.
  • Pages: Fixed-sized blocks of logical memory, which are the same size as frames.
  • Page Table: The data structure the OS uses to translate logical addresses into physical addresses by mapping pages to frames.
  • Address Translation: A CPU-generated logical address is divided into two parts: a page number (p) and a page offset (d). The page number is used as an index into the process's page table to find the corresponding frame number. This frame number is then combined with the page offset to form the final physical address.
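The address-translation step can be sketched with bit operations. The 4 KB page size and the page-table contents are assumptions chosen for illustration, and a Python dict stands in for the per-process page table.

```python
PAGE_SIZE = 4096                          # assumed page size (a power of two)
OFFSET_BITS = PAGE_SIZE.bit_length() - 1  # 12 bits of offset for 4 KB pages


def translate(logical: int, page_table: dict) -> int:
    """Translate a logical address using a page -> frame mapping."""
    p = logical >> OFFSET_BITS        # page number: the high-order bits
    d = logical & (PAGE_SIZE - 1)     # page offset: the low-order bits
    frame = page_table[p]             # KeyError here models an unmapped page
    return (frame << OFFSET_BITS) | d


page_table = {0: 5, 1: 6, 2: 1}  # hypothetical mappings
print(hex(translate(0x1ABC, page_table)))  # 0x6abc
```

The offset passes through unchanged; only the page number is replaced by a frame number.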

While paging eliminates external fragmentation, it can still suffer from internal fragmentation. Because the size of a process is rarely an exact multiple of the page size, the final page allocated will likely be only partially full. For example, consider a process of 72,766 bytes and a page size of 2,048 bytes. The process requires 35 full pages and one partial page of 1,086 bytes (72,766 / 2,048 = 35 with a remainder of 1,086), so it occupies 36 frames. The final frame will have 2,048 - 1,086 = 962 bytes of unused space. Choosing a page size involves a trade-off: smaller pages reduce internal fragmentation but increase the number of entries needed in the page table, which increases memory overhead. A major challenge with paging is the performance overhead of address translation, which requires a hardware-based solution.
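The worked example above can be checked with a short calculation. The function name is hypothetical; the numbers are the ones from the text.

```python
def paging_footprint(process_bytes: int, page_size: int):
    """Return (frames needed, bytes wasted in the last frame)."""
    full_pages, remainder = divmod(process_bytes, page_size)
    frames = full_pages + (1 if remainder else 0)
    wasted = page_size - remainder if remainder else 0
    return frames, wasted


print(paging_footprint(72_766, 2_048))  # (36, 962)
```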

3. Advanced Paging: Performance and Structure

3.1 Optimizing Paging with TLBs

The standard implementation of paging introduces a significant performance problem: every data or instruction access requires two memory accesses—one to retrieve the frame number from the page table, and a second to access the actual data or instruction in that frame.

To solve this, modern systems use Translation Look-aside Buffers (TLBs), a special, fast-lookup hardware cache also known as associative memory. The TLB contains a small number of recently used page-to-frame mappings. When the CPU generates a logical address, the hardware first checks the TLB for the page number. If it is present (a "TLB hit"), the frame number is retrieved almost instantly, and only one memory access is needed. If it is not present (a "TLB miss"), the hardware performs a standard lookup in the page table in main memory and then adds the mapping to the TLB, evicting an existing entry if the TLB is full.
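The hit and miss paths can be modeled with a small software sketch. Real TLBs are associative hardware; the least-recently-used replacement policy, the class name, and the tiny capacity below are all assumptions made for illustration.

```python
from collections import OrderedDict


class TLB:
    """Toy model of a TLB with LRU replacement (the policy is an assumption)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = OrderedDict()  # page -> frame, oldest entry first
        self.hits = 0
        self.misses = 0

    def lookup(self, page: int, page_table: dict) -> int:
        if page in self.entries:            # TLB hit: no page-table access
            self.hits += 1
            self.entries.move_to_end(page)  # mark as most recently used
            return self.entries[page]
        self.misses += 1                    # TLB miss: consult the page table
        frame = page_table[page]
        self.entries[page] = frame
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used
        return frame


page_table = {0: 7, 1: 3, 2: 9}  # hypothetical mappings
tlb = TLB(capacity=2)
for page in [0, 1, 0, 2, 1]:
    tlb.lookup(page, page_table)
print(tlb.hits, tlb.misses)  # 1 4
```

With only two entries, the reference to page 2 evicts page 1, so the final reference to page 1 misses again. Larger TLBs and programs with good locality produce far higher hit ratios.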

The Effective Access Time (EAT) demonstrates the benefit of a TLB. For example, with a 100-nanosecond memory access time and assuming a near-instantaneous TLB lookup, the time for a TLB hit is 100ns (for the single memory access) and the time for a TLB miss is 200ns (one access for the page table, and a second for the data). If the TLB has a 99% hit ratio, the EAT would be (0.99 * 100ns) + (0.01 * 200ns) = 101ns. This is a dramatic improvement over the 200ns required for every access without a TLB.
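The EAT calculation generalizes to other hit ratios. The sketch below follows the example's model; the optional `tlb_ns` parameter is an added assumption for cases where the TLB lookup time is not negligible.

```python
def effective_access_time(hit_ratio: float, memory_ns: float,
                          tlb_ns: float = 0.0) -> float:
    """Weighted average of TLB-hit and TLB-miss access times."""
    hit_time = tlb_ns + memory_ns       # one memory access
    miss_time = tlb_ns + 2 * memory_ns  # page-table access plus data access
    return hit_ratio * hit_time + (1 - hit_ratio) * miss_time


for ratio in (0.80, 0.99):
    print(ratio, round(effective_access_time(ratio, 100), 2))
```

For the example in the text (99% hit ratio, 100 ns memory), the result is approximately 101 ns; at an 80% hit ratio it rises to about 120 ns.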

To further improve performance, some TLBs use Address-space identifiers (ASIDs). An ASID uniquely identifies each process, allowing the TLB to hold entries for multiple processes simultaneously and avoiding the need to flush the entire cache on every context switch. From optimizing performance, we now turn to implementing protection and sharing in a paged environment.

3.2 Protection and Shared Pages

Memory protection in a paged environment is typically implemented by associating protection information with each entry in the page table.

A valid-invalid bit is attached to each entry. When the bit is set to "valid", the associated page is part of the process's legal logical address space. When it is set to "invalid", the page is not, and any attempt to access it will trigger a trap to the operating system.

Paging also enables an efficient mechanism for sharing common code. Shared Pages allow multiple processes to map to the same physical frames of memory. This is particularly useful for read-only (reentrant) code, such as text editors, compilers, or system libraries. Each process has its own page table, but the entries for the shared code all point to the same physical frames, saving a significant amount of memory. While the concept of paging is powerful, the page tables themselves can become very large, necessitating more sophisticated structures.

3.3 Advanced Page Table Structures

A simple, flat page table is often impractical for systems with large logical address spaces. For example, a 32-bit address space with 4 KB pages would require a page table with over 1 million entries. If each entry takes 4 bytes, the page table alone would consume 4 MB of memory for every single process. To address this, modern systems use more advanced page table structures.
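The size estimate above follows directly from the address width and page size, as this short calculation shows. The function name is hypothetical.

```python
def flat_page_table(address_bits: int, page_size: int, entry_bytes: int):
    """Return (number of entries, table size in bytes) for a flat page table."""
    entries = 2 ** address_bits // page_size
    return entries, entries * entry_bytes


entries, size = flat_page_table(address_bits=32, page_size=4096, entry_bytes=4)
print(entries)            # 1048576 entries
print(size // (1024**2))  # 4 (MB per process)
```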

  • Hierarchical Paging: This technique breaks the logical address space into multiple levels of page tables. In a common two-level scheme, the page table itself is paged. The page number is split into two parts: one indexes an outer page table, and the second indexes an inner page table. This allows the OS to only allocate memory for the inner page tables that are actually in use.
  • Hashed Page Tables: Common for address spaces larger than 32 bits, this method hashes the virtual page number into a hash table. Each table entry contains a chain of elements that hash to the same location, which handles collisions. The system searches this chain for a matching virtual page number to find the corresponding physical frame.
  • Inverted Page Tables: This approach reverses the standard structure. Instead of one page table per process, the system has a single inverted page table with one entry for each physical frame of memory. Each entry contains the virtual address of the page stored in that frame and the process that owns it. This drastically reduces the memory needed for page tables but increases the time required to search for a mapping, often requiring a hash table to speed up the lookup.
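The two-level scheme from the first bullet can be sketched as follows. The 10/10/12 bit split (for a 32-bit address with 4 KB pages) is an assumption, and nested dicts stand in for the outer and inner tables so that unused inner tables simply do not exist.

```python
OFFSET_BITS = 12  # assumed: 4 KB pages
INNER_BITS = 10   # assumed: 1024 entries per inner page table


def split(logical: int):
    """Split a 32-bit logical address into (p1, p2, d)."""
    d = logical & ((1 << OFFSET_BITS) - 1)
    p2 = (logical >> OFFSET_BITS) & ((1 << INNER_BITS) - 1)
    p1 = logical >> (OFFSET_BITS + INNER_BITS)
    return p1, p2, d


def translate(logical: int, outer_table: dict) -> int:
    p1, p2, d = split(logical)
    inner_table = outer_table[p1]  # first lookup: outer page table
    frame = inner_table[p2]        # second lookup: inner page table
    return (frame << OFFSET_BITS) | d


# Only one inner table is allocated; the rest of the address space is unmapped.
outer_table = {1: {2: 0x42}}
print(split(0x402345))                        # (1, 2, 837)
print(hex(translate(0x402345, outer_table)))  # 0x42345
```

Each translation now costs two table lookups instead of one, which is the price paid for not allocating the unused inner tables.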

These structures represent a fundamental trade-off: Hierarchical Paging reduces memory usage for sparse address spaces at the cost of multiple memory accesses per lookup. Inverted Page Tables drastically reduce table memory overhead system-wide, but at the cost of a much slower lookup process that necessitates its own optimizations like hashing.
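The lookup cost of an inverted page table, and the hashing optimization, can be illustrated with a toy model. The class and method names are hypothetical, and a Python dict stands in for the hash table that real systems place in front of the frame array.

```python
class InvertedPageTable:
    """Toy inverted page table: one entry per physical frame."""

    def __init__(self, num_frames: int):
        self.frames = [None] * num_frames  # frames[i] = (pid, page) or None
        self.index = {}                    # (pid, page) -> frame

    def map(self, pid: int, page: int, frame: int) -> None:
        key = (pid, page)
        if key in self.index:                    # page moved: clear old frame
            self.frames[self.index[key]] = None
        if self.frames[frame] is not None:       # frame reused: drop old owner
            del self.index[self.frames[frame]]
        self.frames[frame] = key
        self.index[key] = frame

    def lookup_linear(self, pid: int, page: int):
        """Search every frame entry: time grows with physical memory size."""
        for frame, owner in enumerate(self.frames):
            if owner == (pid, page):
                return frame
        return None

    def lookup_hashed(self, pid: int, page: int):
        """Hash on (pid, page) to find the frame without a full scan."""
        return self.index.get((pid, page))


ipt = InvertedPageTable(num_frames=4)
ipt.map(pid=1, page=0, frame=2)
ipt.map(pid=2, page=0, frame=3)
print(ipt.lookup_linear(2, 0), ipt.lookup_hashed(2, 0))  # 3 3
```

Both lookups return the same frame; the difference is that the linear search touches every frame entry in the worst case, while the hashed lookup does not.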

4. Real-World Architecture Examples

This section provides a high-level overview of how the memory management concepts discussed are implemented in the dominant commercial CPU architectures from Intel and ARM.

4.1 Intel IA-32 and x86-64 Architectures

The Intel IA-32 architecture supports both segmentation and paging. In this model, the CPU generates a logical address that is first passed to a segmentation unit. This unit produces a linear address, which is then passed to a paging unit. The paging unit translates the linear address into the final physical address in main memory.

To overcome the 4 GB memory limit of 32-bit systems, Intel introduced the Physical Address Extension (PAE). PAE uses a three-level paging scheme to increase the physical address space to 36 bits, allowing access to up to 64 GB of memory.

The modern x86-64 architecture extends this further. In practice, it implements 48-bit virtual addressing using a four-level paging hierarchy, which covers 256 TB of virtual address space. By using PAE, x86-64 systems can also pair 48-bit virtual addresses with 52-bit physical addresses, so the physical address space can be larger than the virtual address space of any single process.

4.2 ARM Architecture

The ARM architecture, which dominates mobile platforms like iOS and Android devices, also features a sophisticated MMU. It supports multiple page and section sizes (4 KB, 16 KB, 1 MB, 16 MB). The memory management approach is flexible: it uses one-level paging for large 1 MB or 16 MB sections and a two-level paging scheme for smaller 4 KB or 16 KB pages.

To optimize performance, ARM CPUs implement a two-level TLB structure. This consists of very fast "micro" TLBs (one for instructions and one for data) and a larger, unified main TLB. On a lookup, the micro TLBs are checked first before the main TLB, minimizing the need for a full, time-consuming page table walk.

5. Key Terms and Concepts Summary

  • Logical Address: A CPU-generated address, also referred to as a virtual address.
  • Physical Address: The address seen by the memory unit that corresponds to a physical location in RAM.
  • MMU (Memory-Management Unit): The hardware device that maps virtual addresses to physical addresses at runtime.
  • Swapping: A technique to temporarily move a process out of memory to a backing store.
  • Contiguous Allocation: An early memory allocation method where each process occupies a single, continuous block of memory.
  • External Fragmentation: Occurs when total free memory exists to satisfy a request, but it is not in a single continuous block.
  • Internal Fragmentation: Unused memory that is internal to an allocated partition because the partition is larger than requested.
  • Compaction: The process of shuffling memory contents to place all free memory together in one large block.
  • Segmentation: A memory-management scheme that views memory as a collection of logical units called segments.
  • Paging: A memory-management scheme that allows a process's physical address space to be non-contiguous.
  • Page Table: A data structure used to store the mapping between the logical pages of a process and physical frames.
  • Frame: A fixed-sized block of physical memory.
  • TLB (Translation Look-aside Buffer): A special, fast-lookup hardware cache used to accelerate the translation of logical to physical addresses.
  • Shared Pages: A mechanism where a single copy of read-only code can be shared among multiple processes.
  • Hierarchical Paging: A multi-level page table structure, such as a two-level page table, used to manage large address spaces.
  • Inverted Page Table: A page table structure that has one entry for each physical frame of memory in the system.