Lecture 8: Memory Management
1.0 Fundamental Concepts
1.1 Core Requirements for Program Execution
For a program to be executed, it must adhere to several fundamental requirements related to memory:
- The program must be brought from storage (disk) and loaded into main memory.
- It must be placed within the context of a process.
- The CPU can only directly access main memory and its own registers; therefore, all instructions and data must reside in one of these locations to be processed.
1.2 Key Terminology
- Main Memory: The primary, volatile storage where programs and data are kept when they are running.
- Registers: The fastest memory, located directly on the CPU, used to hold data currently being processed.
- Cache: A small, fast memory that sits between the CPU and main memory to reduce the average time to access data.
1.3 Memory Hierarchy Performance
Accessing registers is extremely fast, often taking one CPU clock cycle or less. In contrast, accessing main memory can take many cycles, forcing the CPU to wait, an event known as a stall. To mitigate this performance bottleneck, a cache is used. It stores frequently accessed data from main memory, allowing the CPU to retrieve it much faster and avoid stalls.
1.4 Memory Protection
Hardware address protection is essential to ensure that a process operates only within its own allocated memory space. This is achieved using two dedicated registers:
- A base register holds the smallest legal physical memory address for a process.
- A limit register specifies the size of the process's logical address space.
To ensure protection, every memory access generated by a user process is checked by the hardware. The logical address (addr) must satisfy the condition 0 <= addr < limit_register. If this check fails, a trap to the operating system occurs. If the check succeeds, the logical address is added to the base register (base_register + addr) to create the final physical address, which is then sent to memory.
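The base/limit check described above can be sketched in a few lines of Python. The register values here are made up purely for illustration; real hardware performs this comparison and addition on every access, in parallel with instruction execution.

```python
# Hypothetical register contents for one process (illustrative values).
BASE = 300040      # base register: smallest legal physical address
LIMIT = 120900     # limit register: size of the logical address space

def translate(logical_addr):
    """Mimic the hardware check: trap if out of range, else relocate."""
    if not (0 <= logical_addr < LIMIT):
        raise MemoryError("trap: addressing error")  # trap to the OS
    return BASE + logical_addr  # physical address sent to memory

translate(0)        # -> 300040 (first byte of the partition)
translate(120899)   # -> 420939 (last legal address)
```

Any access at or beyond `LIMIT` raises the trap before a physical address is ever formed, which is exactly the guarantee the hardware provides.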
2.0 Address Binding
2.1 Definition
Address Binding is the process of mapping program addresses from one address space to another. This typically involves translating symbolic addresses used in source code to relocatable addresses during compilation, and finally to absolute physical addresses when the program is loaded or executed.
2.2 Binding Stages
The binding of instructions and data to memory addresses can occur at three distinct stages in a program's lifecycle:
- Compile time: If the final memory location of the process is known in advance, the compiler can generate absolute code containing the final physical addresses. If the starting location ever changes, the code must be recompiled.
- Load time: If the final memory location is not known at compile time, the compiler must generate relocatable code. The final binding to absolute physical addresses is delayed until the program is loaded into memory.
- Execution time: If a process can be moved from one memory segment to another during its execution, binding is delayed until run time. This approach requires hardware support, such as base and limit registers, to map logical addresses to physical ones dynamically.
2.3 Address Types
- Symbolic addresses: Addresses used in source code, such as variable or function names.
- Relocatable addresses: Addresses relative to the start of a program module (e.g., "14 bytes from the beginning of this module").
- Absolute addresses: The final physical addresses in main memory.
3.0 Logical vs. Physical Address Space
3.1 Definitions
The distinction between logical and physical addresses is central to modern memory management. Logical and physical addresses are the same in compile-time and load-time address-binding schemes; logical (virtual) and physical addresses differ in execution-time address-binding schemes.
| Address Type | Definition |
|---|---|
| Logical Address | An address generated by the CPU, also known as a virtual address. It represents the address from the program's perspective. |
| Physical Address | An address seen by the memory unit itself. It is the actual address in the main memory hardware. |
3.2 MMU Function
The Memory-Management Unit (MMU) is a hardware device that maps logical (virtual) addresses to physical addresses at run time. In a simple scheme, this is achieved using a relocation register (another name for the base register). The value in the relocation register is added to every logical address generated by a user process to produce the corresponding physical address before it is sent to memory. This ensures a user program only deals with logical addresses and never sees the real physical addresses. As a historical example, MS-DOS on the Intel 80x86 used four relocation registers.
4.0 Dynamic Loading and Linking
4.1 Dynamic Loading
Dynamic Loading is a technique where a routine is not loaded into memory until it is called. This provides a significant advantage: better memory-space utilization, as an unused routine is never loaded. It is particularly useful for handling infrequently occurring cases that require large amounts of code. No special support from the operating system is required, though the OS can help by providing libraries to implement dynamic loading.
4.2 Dynamic vs. Static Linking
- Static Linking: The traditional approach where system libraries are combined with the program code by the loader to create a single binary program image before execution.
- Dynamic Linking: The linking process is postponed until execution time. Systems using this approach are also known as shared libraries. When a library routine is first called, a small piece of code called a stub is used to locate the memory-resident library routine. The stub then replaces itself with the routine's actual address and executes it. This allows multiple processes to share a single copy of a library, saving memory.
5.0 Swapping
5.1 Definition and Purpose
Swapping is a mechanism where a process can be temporarily moved out of main memory to a Backing Store to free up memory for other processes. The process can then be brought back into memory for continued execution. A backing store is typically a fast disk that is large enough to hold copies of the memory images for all users.
5.2 Process and Performance
A common swapping variant is "Roll out, roll in," which is used in priority-based scheduling. A lower-priority process is swapped out so that a higher-priority process can be loaded and executed. The most significant component of swap time is transfer time, which is directly proportional to the amount of memory being moved between main memory and the backing store.
5.3 Constraints on Swapping
A key constraint on swapping is handling pending I/O. A process that has an I/O operation in progress cannot be swapped out, as the I/O would then occur into the memory space of a different process. This can be managed by only performing I/O into kernel buffers, but this adds overhead (double buffering).
5.4 Swapping in Mobile Systems
Traditional swapping is generally not supported on mobile systems like iOS and Android. This is due to the limitations of flash memory, which has a finite number of write cycles and lower throughput. Instead, these operating systems use alternative strategies:
- iOS asks applications to voluntarily relinquish memory when levels are low. If an app fails to comply, it may be terminated.
- Android may terminate an application to free memory, but it first saves the application's state to flash storage, allowing for a fast restart.
6.0 Contiguous Memory Allocation
6.1 Definition
Contiguous Allocation is an early memory management method where each process is contained in a single, contiguous section of memory. Main memory is typically divided into two partitions:
- The resident operating system.
- User processes, held in the remaining high memory.
6.2 Dynamic Storage-Allocation Problem
In a variable-partition scheme, memory consists of allocated partitions and available blocks, called holes. The dynamic storage-allocation problem is how to satisfy a request for memory from a list of free holes. There are three common solutions:
- First-fit: Allocate the first hole that is big enough to satisfy the request.
- Best-fit: Allocate the smallest hole that is big enough; this requires searching the entire list of holes and produces the smallest leftover hole.
- Worst-fit: Allocate the largest hole; this also requires a full search and produces the largest leftover hole.
First-fit and best-fit are generally superior to worst-fit in terms of speed and storage utilization.
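The three placement strategies can be sketched as list searches over hole sizes. The hole list below is an invented example; a real allocator would track (start, size) pairs and split the chosen hole.

```python
def first_fit(holes, request):
    # Stop at the first hole large enough for the request.
    for i, size in enumerate(holes):
        if size >= request:
            return i
    return None

def best_fit(holes, request):
    # Smallest hole that still fits; requires scanning the entire list.
    candidates = [(size, i) for i, size in enumerate(holes) if size >= request]
    return min(candidates)[1] if candidates else None

def worst_fit(holes, request):
    # Largest hole; also a full scan.
    candidates = [(size, i) for i, size in enumerate(holes) if size >= request]
    return max(candidates)[1] if candidates else None

holes = [100, 500, 200, 300, 600]   # free-hole sizes in KB (illustrative)
first_fit(holes, 212)   # -> 1 (the 500 KB hole)
best_fit(holes, 212)    # -> 3 (the 300 KB hole, smallest leftover)
worst_fit(holes, 212)   # -> 4 (the 600 KB hole, largest leftover)
```

Note that first-fit can stop early, while best-fit and worst-fit must examine every hole unless the free list is kept sorted by size.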
6.3 Fragmentation
- External Fragmentation: This occurs when total free memory space exists to satisfy a request, but it is not contiguous and is instead scattered in small, non-adjacent holes.
- Internal Fragmentation: This occurs when an allocated block of memory is larger than the requested amount. The size difference is memory that is internal to a partition but is not being used.
External fragmentation can be resolved using Compaction, which involves shuffling memory contents to place all free memory together in one large block. Compaction is only possible if address binding is dynamic and occurs at execution time.
7.0 Segmentation
7.1 Concept
Segmentation is a memory-management scheme that aligns with the user's view of memory. A program is seen as a collection of logical units called segments, such as a main program, procedures, functions, local variables, a stack, and so on.
7.2 Logical Address Structure
A logical address in a segmentation scheme is a two-tuple: **<segment-number, offset>**. The segment number specifies the segment, and the offset indicates the location within that segment.
7.3 Hardware Support
- Segment Table: This table maps the logical two-dimensional addresses to one-dimensional physical addresses. Each entry in the segment table contains:
  - base: The starting physical address where the segment resides in memory.
  - limit: The length of the segment.
- Segment-table base register (STBR): A register that points to the segment table's location in memory.
- Segment-table length register (STLR): A register that indicates the number of segments used by a program, used to verify that a segment number is legal.
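Segmented address translation can be sketched as a table lookup plus two checks: the STLR check on the segment number and the limit check on the offset. The table entries below are illustrative values, not from any real system.

```python
# Hypothetical segment table: each entry is (base, limit).
segment_table = [
    (1400, 1000),  # segment 0
    (6300, 400),   # segment 1
    (4300, 1100),  # segment 2
]

def translate(segment, offset):
    if segment >= len(segment_table):   # STLR check: illegal segment number
        raise MemoryError("trap: invalid segment")
    base, limit = segment_table[segment]
    if offset >= limit:                 # offset must fall within the segment
        raise MemoryError("trap: offset beyond segment limit")
    return base + offset

translate(2, 53)    # -> 4353 (byte 53 of segment 2)
```

A reference to byte 400 or beyond of segment 1 would trap, since that segment is only 400 bytes long.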
8.0 Paging
8.1 Core Concepts
- Paging is a memory-management scheme that permits a process's physical address space to be noncontiguous. Its primary advantage is that it avoids external fragmentation.
- Frames: Physical memory is divided into fixed-sized blocks called frames.
- Pages: Logical memory is divided into blocks of the same fixed size, called pages.
8.2 Address Translation
A logical address generated by the CPU is divided into two parts:
- Page number (p): Used as an index into a page table.
- Page offset (d): Combined with the base address from the page table to form the final physical address.
The page table stores the base address of each page in physical memory. The page number p is used to look up the corresponding frame number f in the page table, which is then combined with the page offset d to create the physical address.
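The split-and-lookup described above reduces to bit operations when the page size is a power of two. The page table contents here are an invented example; only the shifting/masking pattern is the point.

```python
PAGE_SIZE = 4096       # 4 KB pages -> 12-bit offset
OFFSET_BITS = 12

# Hypothetical page table: page_table[p] = frame number f.
page_table = [5, 6, 1, 2]

def translate(logical_addr):
    p = logical_addr >> OFFSET_BITS      # page number (high-order bits)
    d = logical_addr & (PAGE_SIZE - 1)   # page offset (low-order bits)
    f = page_table[p]                    # page-table lookup
    return (f << OFFSET_BITS) | d        # physical = f * PAGE_SIZE + d

translate(0x1ABC)   # page 1, offset 0xABC -> frame 6 -> 0x6ABC
```

Because the offset bits pass through unchanged, the hardware only has to replace the page number with the frame number.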
8.3 Fragmentation in Paging
Paging is susceptible to internal fragmentation. Because a process may not require an exact number of pages, the last allocated frame may not be completely full.
- Worst-case fragmentation: one frame minus 1 byte (the process needs only a single byte of its last page, so nearly an entire frame is wasted).
- Average fragmentation: one-half frame per process.
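Internal fragmentation is straightforward to compute: round the process size up to a whole number of pages and take the difference. The process size below is an arbitrary example.

```python
import math

PAGE_SIZE = 2048  # bytes (illustrative page size)

def internal_fragmentation(process_size):
    # Pages must be allocated whole, so round up.
    pages = math.ceil(process_size / PAGE_SIZE)
    return pages * PAGE_SIZE - process_size

internal_fragmentation(72766)  # 36 pages allocated, 962 bytes wasted
internal_fragmentation(2048)   # exact fit -> 0 bytes wasted
```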
8.4 Page Table Implementation
- The Two Memory Access Problem: Because the page table is stored in main memory, every data or instruction access requires two memory accesses: one to fetch the frame number from the page table, and a second to access the actual data or instruction.
- Translation Look-aside Buffer (TLB): This problem is solved using a special, fast-lookup hardware cache called a Translation Look-aside Buffer (TLB) or associative memory. The TLB contains recently used page-to-frame mappings.
- Hit Ratio: The percentage of times that a page number is found in the TLB. A high hit ratio is crucial for performance.
- Effective Access Time (EAT): The average time for a memory access depends on the TLB hit ratio. Let α be the hit ratio, t_tlb be the time for a TLB lookup, and t_mem be the time for a memory access.
  - On a TLB hit (probability α), the access time is t_tlb + t_mem.
  - On a TLB miss (probability 1 - α), the access time is t_tlb + t_mem (to read the page table) + t_mem (to access the data).
  - The formula is: **EAT = α * (t_tlb + t_mem) + (1 - α) * (t_tlb + 2*t_mem)**
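The EAT formula can be checked with concrete numbers. The timings below (20 ns TLB lookup, 100 ns memory access) are assumed values for illustration only.

```python
def effective_access_time(hit_ratio, t_tlb, t_mem):
    hit = t_tlb + t_mem            # TLB hit: one memory access
    miss = t_tlb + 2 * t_mem       # TLB miss: page-table read, then data
    return hit_ratio * hit + (1 - hit_ratio) * miss

effective_access_time(0.80, 20, 100)  # -> 140.0 ns
effective_access_time(0.98, 20, 100)  # -> 122.0 ns
```

Raising the hit ratio from 80% to 98% cuts the average access time from 140 ns to 122 ns, which is why a high hit ratio matters so much.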
8.5 Memory Protection
Memory protection in a paged environment is implemented using two primary mechanisms:
- Protection bits: These are associated with each frame and can define access rights, such as read-only or read-write.
- Valid-invalid bit: This bit is attached to each entry in the page table. A "valid" bit indicates the page is part of the process's legal logical address space. An "invalid" bit indicates it is not.
8.6 Shared Pages
Paging facilitates efficient code sharing. A single copy of read-only, reentrant code (code that does not modify itself) can be shared among multiple processes. Each process has its own page table, but the entries for the shared code all point to the same physical frames in memory.
9.0 Advanced Page Table Structures
9.1 The Problem with Large Address Spaces
For modern computer systems with large logical address spaces (e.g., 32-bit or 64-bit), a simple, flat page table can become enormous. For example, a 32-bit system with 4 KB pages would require a page table with over one million entries, consuming 4 MB of memory for the table alone. Advanced structures are needed to manage this.
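The arithmetic behind the 4 MB figure is worth seeing explicitly. The 4-byte entry size is an assumption (a frame number plus protection bits typically fits in one 32-bit word).

```python
address_bits = 32
page_size = 4 * 1024     # 4 KB pages
entry_size = 4           # bytes per page-table entry (assumed)

num_pages = 2**address_bits // page_size   # 2^32 / 2^12 = 2^20 entries
table_size = num_pages * entry_size        # 2^20 * 4 bytes = 4 MB

num_pages    # -> 1048576 entries
table_size   # -> 4194304 bytes = 4 MB per process
```

Since every process needs its own page table, a flat table at this scale is clearly impractical, motivating the structures below.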
9.2 Solution Techniques
- Hierarchical Paging: The page table itself is paged. For example, in a two-level scheme, an outer page table points to pages of a second-level (inner) page table.
- Hashed Page Tables: The virtual page number is hashed into a page table. The table contains a chain of elements that hash to the same location, which are then searched for a match. This is common for address spaces larger than 32 bits.
- Inverted Page Tables: This structure maintains one entry for each physical frame of memory rather than for each logical page of a process. This reduces the memory needed for page tables but increases the time required to search for a page.
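The hierarchical approach can be illustrated by how a two-level scheme splits a 32-bit address. The 10/10/12 split below is one common assumption (4 KB pages, 4-byte entries, so each level's table fits in one page).

```python
# Assumed two-level split for a 32-bit address with 4 KB pages:
# 10-bit outer index (p1), 10-bit inner index (p2), 12-bit offset (d).
def split(addr):
    d = addr & 0xFFF            # low 12 bits: offset within the page
    p2 = (addr >> 12) & 0x3FF   # next 10 bits: index into inner table
    p1 = addr >> 22             # top 10 bits: index into outer table
    return p1, p2, d

split(0x00403ABC)   # -> (1, 3, 0xABC)
```

The outer table entry for `p1` locates an inner table page, whose entry at `p2` holds the frame number; only the inner pages actually in use need to exist.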
10.0 Architecture-Specific Examples
10.1 Intel IA-32 (32-bit)
- Address Translation: The IA-32 architecture uses a two-stage process that combines segmentation with paging. A logical address is first passed to a segmentation unit to produce a linear address. This linear address is then passed to a paging unit, which generates the final physical address.
- Physical Address Extension (PAE): To overcome the 4 GB memory limit of 32-bit addresses, Intel introduced PAE. This feature implements a three-level page table hierarchy and expands the physical address size to 36 bits, allowing access to up to 64 GB of physical memory.
10.2 Intel x86-64 (64-bit)
- Addressing: While the architecture is 64-bit, in practice, current implementations use 48 bits for virtual addressing.
- Paging Hierarchy: The x86-64 architecture uses a four-level paging hierarchy to manage the vast address space.
10.3 ARM Architecture
- Unit Sizes: The ARM architecture, dominant on mobile platforms, supports flexible memory unit sizes, including 4 KB and 16 KB pages as well as larger 1 MB and 16 MB sections.
- TLB Structure: It features a two-level TLB structure. The outer level has two micro TLBs (one for instructions, one for data), and the inner level is a single, larger main TLB.