Lecture 8: Memory Management
1.0 Fundamental Concepts
1.1 Core Requirements for Program Execution
For a program to be executed, it must adhere to several fundamental requirements related to memory:
- The program must be brought from storage (disk) and loaded into main memory.
- It must be placed within the context of a process.
- The CPU can only directly access main memory and its own registers; therefore, all instructions and data must reside in one of these locations to be processed.
1.2 Key Terminology
- Main Memory: The primary, volatile storage where programs and data are kept when they are running.
- Registers: The fastest memory, located directly on the CPU, used to hold data currently being processed.
- Cache: A small, fast memory that sits between the CPU and main memory to reduce the average time to access data.
1.3 Memory Hierarchy Performance
Accessing registers is extremely fast, often taking one CPU clock cycle or less. In contrast, accessing main memory can take many cycles, forcing the CPU to wait, an event known as a stall. To mitigate this performance bottleneck, a cache is used. It stores frequently accessed data from main memory, allowing the CPU to retrieve it much faster and avoid stalls.
1.4 Memory Protection
Hardware address protection is essential to ensure that a process operates only within its own allocated memory space. This is achieved using two dedicated registers:
- A base register holds the smallest legal physical memory address for a process.
- A limit register specifies the size of the process's logical address space.
To ensure protection, every memory access generated by a user process is checked by the hardware. The logical address (addr) must satisfy the condition 0 <= addr < limit_register. If this check fails, a trap to the operating system occurs. If the check succeeds, the logical address is added to the base register (base_register + addr) to create the final physical address, which is then sent to memory.
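The base/limit check described above can be sketched in a few lines of Python. The register values here are made up purely for illustration; real hardware performs this comparison and addition on every access, in parallel with instruction execution.

```python
# Hypothetical register contents for one process (illustrative values).
BASE = 300040      # base register: smallest legal physical address
LIMIT = 120900     # limit register: size of the logical address space

def translate(logical_addr):
    """Mimic the hardware check: trap if out of range, else relocate."""
    if not (0 <= logical_addr < LIMIT):
        raise MemoryError("trap: addressing error")  # trap to the OS
    return BASE + logical_addr  # physical address sent to memory

translate(0)        # -> 300040 (first byte of the partition)
translate(120899)   # -> 420939 (last legal address)
```

Any access at or beyond `LIMIT` raises the trap before a physical address is ever formed, which is exactly the guarantee the hardware provides.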
2.0 Address Binding
2.1 Definition
Address Binding is the process of mapping program addresses from one address space to another. This typically involves translating symbolic addresses used in source code to relocatable addresses during compilation, and finally to absolute physical addresses when the program is loaded or executed.
2.2 Binding Stages
The binding of instructions and data to memory addresses can occur at three distinct stages in a program's lifecycle:
- Compile time: If the final memory location of the process is known in advance, the compiler can generate absolute code containing the final physical addresses. If the starting location ever changes, the code must be recompiled.
- Load time: If the final memory location is not known at compile time, the compiler must generate relocatable code. The final binding to absolute physical addresses is delayed until the program is loaded into memory.
- Execution time: If a process can be moved from one memory segment to another during its execution, binding is delayed until run time. This approach requires hardware support, such as base and limit registers, to map logical addresses to physical ones dynamically.
2.3 Address Types
- Symbolic addresses: Addresses used in source code, such as variable or function names.
- Relocatable addresses: Addresses relative to the start of a program module (e.g., "14 bytes from the beginning of this module").
- Absolute addresses: The final physical addresses in main memory.
3.0 Logical vs. Physical Address Space
3.1 Definitions
The distinction between logical and physical addresses is central to modern memory management. Logical and physical addresses are the same in compile-time and load-time address-binding schemes; logical (virtual) and physical addresses differ in execution-time address-binding schemes.
| Address Type | Definition |
|---|---|
| Logical Address | An address generated by the CPU, also known as a virtual address. It represents the address from the program's perspective. |
| Physical Address | An address seen by the memory unit itself. It is the actual address in the main memory hardware. |
3.2 MMU Function
The Memory-Management Unit (MMU) is a hardware device that maps logical (virtual) addresses to physical addresses at run time. In a simple scheme, this is achieved using a relocation register (another name for the base register). The value in the relocation register is added to every logical address generated by a user process to produce the corresponding physical address before it is sent to memory. This ensures a user program only deals with logical addresses and never sees the real physical addresses. As a historical example, MS-DOS on the Intel 80x86 used four relocation registers.
4.0 Dynamic Loading and Linking
4.1 Dynamic Loading
Dynamic Loading is a technique where a routine is not loaded into memory until it is called. This provides a significant advantage: better memory-space utilization, as an unused routine is never loaded. It is particularly useful for handling infrequently occurring cases that require large amounts of code. No special support from the operating system is required, though the OS can help by providing libraries to implement dynamic loading.
4.2 Dynamic vs. Static Linking
- Static Linking: The traditional approach where system libraries are combined with the program code by the loader to create a single binary program image before execution.
- Dynamic Linking: The linking process is postponed until execution time. Systems using this approach are also known as shared libraries. When a library routine is first called, a small piece of code called a stub is used to locate the memory-resident library routine. The stub then replaces itself with the routine's actual address and executes it. This allows multiple processes to share a single copy of a library, saving memory.
5.0 Swapping
5.1 Definition and Purpose
Swapping is a mechanism where a process can be temporarily moved out of main memory to a Backing Store to free up memory for other processes. The process can then be brought back into memory for continued execution. A backing store is typically a fast disk that is large enough to hold copies of the memory images for all users.
5.2 Process and Performance
A common swapping variant is "Roll out, roll in," which is used in priority-based scheduling. A lower-priority process is swapped out so that a higher-priority process can be loaded and executed. The most significant component of swap time is transfer time, which is directly proportional to the amount of memory being moved between main memory and the backing store.
5.3 Constraints on Swapping
A key constraint on swapping is handling pending I/O. A process that has an I/O operation in progress cannot be swapped out, as the I/O would then occur into the memory space of a different process. This can be managed by only performing I/O into kernel buffers, but this adds overhead (double buffering).
5.4 Swapping in Mobile Systems
Traditional swapping is generally not supported on mobile systems like iOS and Android. This is due to the limitations of flash memory, which has a finite number of write cycles and lower throughput. Instead, these operating systems use alternative strategies:
- iOS asks applications to voluntarily relinquish memory when levels are low. If an app fails to comply, it may be terminated.
- Android may terminate an application to free memory, but it first saves the application's state to flash storage, allowing for a fast restart.
6.0 Contiguous Memory Allocation
6.1 Definition
Contiguous Allocation is an early memory management method where each process is contained in a single, contiguous section of memory. Main memory is typically divided into two partitions:
- The resident operating system.
- User processes, held in the remaining high memory.
6.2 Dynamic Storage-Allocation Problem
In a variable-partition scheme, memory consists of allocated partitions and available blocks, called holes. The dynamic storage-allocation problem is how to satisfy a request for memory from a list of free holes. There are three common solutions:
- First-fit: Allocate the first hole that is big enough to satisfy the request.
- Best-fit: Allocate the smallest hole that is big enough; this requires searching the entire list of holes and produces the smallest leftover hole.
- Worst-fit: Allocate the largest hole; this also requires a full search and produces the largest leftover hole.
First-fit and best-fit are generally superior to worst-fit in terms of speed and storage utilization.
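The three placement strategies can be sketched as list searches over hole sizes. The hole list below is an invented example; a real allocator would track (start, size) pairs and split the chosen hole.

```python
def first_fit(holes, request):
    # Stop at the first hole large enough for the request.
    for i, size in enumerate(holes):
        if size >= request:
            return i
    return None

def best_fit(holes, request):
    # Smallest hole that still fits; requires scanning the entire list.
    candidates = [(size, i) for i, size in enumerate(holes) if size >= request]
    return min(candidates)[1] if candidates else None

def worst_fit(holes, request):
    # Largest hole; also a full scan.
    candidates = [(size, i) for i, size in enumerate(holes) if size >= request]
    return max(candidates)[1] if candidates else None

holes = [100, 500, 200, 300, 600]   # free-hole sizes in KB (illustrative)
first_fit(holes, 212)   # -> 1 (the 500 KB hole)
best_fit(holes, 212)    # -> 3 (the 300 KB hole, smallest leftover)
worst_fit(holes, 212)   # -> 4 (the 600 KB hole, largest leftover)
```

Note that first-fit can stop early, while best-fit and worst-fit must examine every hole unless the free list is kept sorted by size.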
6.3 Fragmentation
- External Fragmentation: This occurs when total free memory space exists to satisfy a request, but it is not contiguous and is instead scattered in small, non-adjacent holes.
- Internal Fragmentation: This occurs when an allocated block of memory is larger than the requested amount. The size difference is memory that is internal to a partition but is not being used.
External fragmentation can be resolved using Compaction, which involves shuffling memory contents to place all free memory together in one large block. Compaction is only possible if address binding is dynamic and occurs at execution time.
7.0 Segmentation
7.1 Concept
Segmentation is a memory-management scheme that aligns with the user's view of memory. A program is seen as a collection of logical units called segments, such as a main program, procedures, functions, local variables, a stack, and so on.
7.2 Logical Address Structure
A logical address in a segmentation scheme is a two-tuple: **<segment-number, offset>**. The segment number specifies the segment, and the offset indicates the location within that segment.
7.3 Hardware Support
- Segment Table: This table maps the logical two-dimensional addresses to one-dimensional physical addresses. Each entry in the segment table contains:
  - base: The starting physical address where the segment resides in memory.
  - limit: The length of the segment.
- Segment-table base register (STBR): A register that points to the segment table's location in memory.
- Segment-table length register (STLR): A register that indicates the number of segments used by a program, used to verify that a segment number is legal.
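Segmented address translation can be sketched as a table lookup plus two checks: the STLR check on the segment number and the limit check on the offset. The table entries below are illustrative values, not from any real system.

```python
# Hypothetical segment table: each entry is (base, limit).
segment_table = [
    (1400, 1000),  # segment 0
    (6300, 400),   # segment 1
    (4300, 1100),  # segment 2
]

def translate(segment, offset):
    if segment >= len(segment_table):   # STLR check: illegal segment number
        raise MemoryError("trap: invalid segment")
    base, limit = segment_table[segment]
    if offset >= limit:                 # offset must fall within the segment
        raise MemoryError("trap: offset beyond segment limit")
    return base + offset

translate(2, 53)    # -> 4353 (byte 53 of segment 2)
```

A reference to byte 400 or beyond of segment 1 would trap, since that segment is only 400 bytes long.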
8.0 Paging
8.1 Core Concepts
- Paging is a memory-management scheme that permits a process's physical address space to be noncontiguous. Its primary advantage is that it avoids external fragmentation.
- Frames: Physical memory is divided into fixed-sized blocks called frames.
- Pages: Logical memory is divided into blocks of the same fixed size, called pages.
8.2 Address Translation
A logical address generated by the CPU is divided into two parts:
- Page number (p): Used as an index into a page table.
- Page offset (d): Combined with the base address from the page table to form the final physical address.
The page table stores the base address of each page in physical memory. The page number p is used to look up the corresponding frame number f in the page table, which is then combined with the page offset d to create the physical address.
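The split-and-lookup described above reduces to bit operations when the page size is a power of two. The page table contents here are an invented example; only the shifting/masking pattern is the point.

```python
PAGE_SIZE = 4096       # 4 KB pages -> 12-bit offset
OFFSET_BITS = 12

# Hypothetical page table: page_table[p] = frame number f.
page_table = [5, 6, 1, 2]

def translate(logical_addr):
    p = logical_addr >> OFFSET_BITS      # page number (high-order bits)
    d = logical_addr & (PAGE_SIZE - 1)   # page offset (low-order bits)
    f = page_table[p]                    # page-table lookup
    return (f << OFFSET_BITS) | d        # physical = f * PAGE_SIZE + d

translate(0x1ABC)   # page 1, offset 0xABC -> frame 6 -> 0x6ABC
```

Because the offset bits pass through unchanged, the hardware only has to replace the page number with the frame number.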
8.3 Fragmentation in Paging
Paging is susceptible to internal fragmentation. Because a process may not require an exact number of pages, the last allocated frame may not be completely full.
- Worst-case fragmentation: one frame minus 1 byte (the process needs only a single byte of its last page, so nearly an entire frame is wasted).
- Average fragmentation: one-half frame per process.
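Internal fragmentation is straightforward to compute: round the process size up to a whole number of pages and take the difference. The process size below is an arbitrary example.

```python
import math

PAGE_SIZE = 2048  # bytes (illustrative page size)

def internal_fragmentation(process_size):
    # Pages must be allocated whole, so round up.
    pages = math.ceil(process_size / PAGE_SIZE)
    return pages * PAGE_SIZE - process_size

internal_fragmentation(72766)  # 36 pages allocated, 962 bytes wasted
internal_fragmentation(2048)   # exact fit -> 0 bytes wasted
```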
8.4 Page Table Implementation
- The Two Memory Access Problem: Because the page table is stored in main memory, every data or instruction access requires two memory accesses: one to fetch the frame number from the page table, and a second to access the actual data or instruction.
- Translation Look-aside Buffer (TLB): This problem is solved using a special, fast-lookup hardware cache called a Translation Look-aside Buffer (TLB) or associative memory. The TLB contains recently used page-to-frame mappings.
- Hit Ratio: The percentage of times that a page number is found in the TLB. A high hit ratio is crucial for performance.
- Effective Access Time (EAT): The average time for a memory access depends on the TLB hit ratio. Let α be the hit ratio, t_tlb be the time for a TLB lookup, and t_mem be the time for a memory access.
  - On a TLB hit (probability α), the access time is t_tlb + t_mem.
  - On a TLB miss (probability 1 - α), the access time is t_tlb + t_mem (to read the page table) + t_mem (to access the data).
  - The formula is: **EAT = α * (t_tlb + t_mem) + (1 - α) * (t_tlb + 2*t_mem)**
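The EAT formula can be checked with concrete numbers. The timings below (20 ns TLB lookup, 100 ns memory access) are assumed values for illustration only.

```python
def effective_access_time(hit_ratio, t_tlb, t_mem):
    hit = t_tlb + t_mem            # TLB hit: one memory access
    miss = t_tlb + 2 * t_mem       # TLB miss: page-table read, then data
    return hit_ratio * hit + (1 - hit_ratio) * miss

effective_access_time(0.80, 20, 100)  # -> 140.0 ns
effective_access_time(0.98, 20, 100)  # -> 122.0 ns
```

Raising the hit ratio from 80% to 98% cuts the average access time from 140 ns to 122 ns, which is why a high hit ratio matters so much.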
8.5 Memory Protection
Memory protection in a paged environment is implemented using two primary mechanisms:
- Protection bits: These are associated with each frame and can define access rights, such as read-only or read-write.
- Valid-invalid bit: This bit is attached to each entry in the page table. A "valid" bit indicates the page is part of the process's legal logical address space. An "invalid" bit indicates it is not.
8.6 Shared Pages
Paging facilitates efficient code sharing. A single copy of read-only, reentrant code (code that does not modify itself) can be shared among multiple processes. Each process has its own page table, but the entries for the shared code all point to the same physical frames in memory.
9.0 Advanced Page Table Structures
9.1 The Problem with Large Address Spaces
For modern computer systems with large logical address spaces (e.g., 32-bit or 64-bit), a simple, flat page table can become enormous. For example, a 32-bit system with 4 KB pages would require a page table with over one million entries, consuming 4 MB of memory for the table alone. Advanced structures are needed to manage this.
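The arithmetic behind the 4 MB figure is worth seeing explicitly. The 4-byte entry size is an assumption (a frame number plus protection bits typically fits in one 32-bit word).

```python
address_bits = 32
page_size = 4 * 1024     # 4 KB pages
entry_size = 4           # bytes per page-table entry (assumed)

num_pages = 2**address_bits // page_size   # 2^32 / 2^12 = 2^20 entries
table_size = num_pages * entry_size        # 2^20 * 4 bytes = 4 MB

num_pages    # -> 1048576 entries
table_size   # -> 4194304 bytes = 4 MB per process
```

Since every process needs its own page table, a flat table at this scale is clearly impractical, motivating the structures below.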
9.2 Solution Techniques
- Hierarchical Paging: The page table itself is paged. For example, in a two-level scheme, an outer page table points to pages of a second-level (inner) page table.
- Hashed Page Tables: The virtual page number is hashed into a page table. The table contains a chain of elements that hash to the same location, which are then searched for a match. This is common for address spaces larger than 32 bits.
- Inverted Page Tables: This structure maintains one entry for each physical frame of memory rather than for each logical page of a process. This reduces the memory needed for page tables but increases the time required to search for a page.
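The hierarchical approach can be illustrated by how a two-level scheme splits a 32-bit address. The 10/10/12 split below is one common assumption (4 KB pages, 4-byte entries, so each level's table fits in one page).

```python
# Assumed two-level split for a 32-bit address with 4 KB pages:
# 10-bit outer index (p1), 10-bit inner index (p2), 12-bit offset (d).
def split(addr):
    d = addr & 0xFFF            # low 12 bits: offset within the page
    p2 = (addr >> 12) & 0x3FF   # next 10 bits: index into inner table
    p1 = addr >> 22             # top 10 bits: index into outer table
    return p1, p2, d

split(0x00403ABC)   # -> (1, 3, 0xABC)
```

The outer table entry for `p1` locates an inner table page, whose entry at `p2` holds the frame number; only the inner pages actually in use need to exist.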
10.0 Architecture-Specific Examples
10.1 Intel IA-32 (32-bit)
- Address Translation: The IA-32 architecture uses a two-stage process that combines segmentation with paging. A logical address is first passed to a segmentation unit to produce a linear address. This linear address is then passed to a paging unit, which generates the final physical address.
- Physical Address Extension (PAE): To overcome the 4 GB memory limit of 32-bit addresses, Intel introduced PAE. This feature implements a three-level page table hierarchy and expands the physical address size to 36 bits, allowing access to up to 64 GB of physical memory.
10.2 Intel x86-64 (64-bit)
- Addressing: While the architecture is 64-bit, in practice, current implementations use 48 bits for virtual addressing.
- Paging Hierarchy: The x86-64 architecture uses a four-level paging hierarchy to manage the vast address space.
10.3 ARM Architecture
- Unit Sizes: The ARM architecture, dominant on mobile platforms, supports flexible memory unit sizes, including 4 KB and 16 KB pages as well as larger 1 MB and 16 MB sections.
- TLB Structure: It features a two-level TLB structure. The outer level has two micro TLBs (one for instructions, one for data), and the inner level is a single, larger main TLB.