
issued for an updated view. You can use the nm(1) command to examine an object’s
symbol table; use /dev/ksyms to examine the kernel’s table.
# nm -x /dev/ksyms | grep modload
[1953]  |0xf011086c|0x000000ac|FUNC |LOCL |0 |ABS |modctl_modload
[10072] |0xf011113c|0x000000a0|FUNC |GLOB |0 |ABS |modload
[1973]  |0xf0111398|0x000000b8|FUNC |LOCL |0 |ABS |modload_now
[1972]  |0xf01111dc|0x000000c8|FUNC |LOCL |0 |ABS |modload_thread
[9926]  |0xf0111450|0x000000a4|FUNC |GLOB |0 |ABS |modloadonly

The preceding example searches the symbol table of the running kernel for modload, a kernel function we discussed earlier. The command returned several
matches that contain the modload string, including the desired modload function
symbol. (For more information on symbol tables and specific information on the
columns listed, see the nm(1), a.out(4), and elf(3E) manual pages. Also, refer to
any number of texts that describe the Executable and Linking Format (ELF) file,
which is discussed in more detail in Chapter 4.)
In step 5, we indicate that the module install code is invoked indirectly through
the module’s _init() function. Several functions must be included in any loadable kernel module to facilitate dynamic loading. Device drivers and STREAMS
modules must be coded for dynamic loading. As such, a loadable driver interface is
defined. In general, the required routines and data structures that are documented apply to all loadable kernel modules, not just to drivers and STREAMS modules, although some components are specific to drivers and do not
apply to objects such as loadable system calls, file systems, or scheduling classes.
Within a loadable kernel object, an initialization, information, and finish routine must be coded, as per the definitions in the _init(9E), _info(9E), and
_fini(9E) manual pages. A module’s _init() routine is called to complete the
process of making the module usable after it has been loaded. The module’s
_info() and _fini() routines also invoke corresponding kernel module management interfaces, as shown in Table 4-2.
Table 4-2 Module Management Interfaces

Kernel Module Routine    Module Facility Interface    Description
_init()                  mod_install()                Loads a kernel module.
_info()                  mod_info()                   Retrieves module information.
_fini()                  mod_remove()                 Unloads a kernel module.
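The following minimal sketch, based on the _init(9E), _fini(9E), _info(9E) and mod_install(9F), mod_remove(9F), mod_info(9F) manual page interfaces, shows how a module's three required entry points simply hand off to the module facility (error handling omitted; initialization of the modlinkage structure is illustrated later in this section):

#include <sys/modctl.h>                 /* mod_install(), mod_remove(), mod_info() */

static struct modlinkage modlinkage;    /* carries the module's linkage information */

int
_init(void)
{
        /* Complete the load by vectoring to the type-specific install code. */
        return (mod_install(&modlinkage));
}

int
_info(struct modinfo *modinfop)
{
        /* Report the module's name, ID, and load state. */
        return (mod_info(&modlinkage, modinfop));
}

int
_fini(void)
{
        /* Prepare for unload; fails if the module is still in use. */
        return (mod_remove(&modlinkage));
}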

Module installation is abstracted to define a generic set of structures and interfaces within the kernel. Module operations function pointers for installing, removing, and information gathering (the generic interfaces shown in Table 4-2) are
maintained in a mod_ops structure, which is extended to provide a definition for
each type of loadable module. For example, there is a mod_installsys() function specific to loading system calls, a mod_installdrv() function specific to loading device drivers, and so forth.
For each of these module types, a module linkage structure is defined; it contains a pointer to the operations structure, a pointer to a character string describing the module, and a pointer to a module-type-specific structure. For example, the
linkage structure for loadable system calls, modlsys, contains a pointer to the system entry table, which is the entry point for all system calls. Each loadable kernel
module is required to declare and initialize the appropriate type-specific linkage
structure, as well as a generic modlinkage structure that provides the generic
abstraction for all modules.
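As an illustration of these declarations, the following sketch shows the linkage structures a minimal miscellaneous module might declare, complementing the entry-point sketch shown earlier. It is based on the modlmisc and modlinkage definitions in <sys/modctl.h>, with a placeholder description string; a loadable system call would instead declare a modlsys structure referencing its sysent entry, a driver a modldrv referencing its dev_ops, and so on.

#include <sys/modctl.h>

extern struct mod_ops mod_miscops;      /* generic operations for miscellaneous modules */

/* Type-specific linkage: the operations structure and a descriptive string
 * (drivers, system calls, etc. add a pointer to their type-specific table). */
static struct modlmisc modlmisc = {
        &mod_miscops,
        "example miscellaneous module"
};

/* Generic linkage: the revision and a null-terminated list of the
 * type-specific linkage structures provided by this module. */
static struct modlinkage modlinkage = {
        MODREV_1, (void *)&modlmisc, NULL
};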

Within the module facility is a module type-specific routine for installing modules, entered through the MODL_INSTALL macro called from the generic
mod_install() code. More precisely, a loadable module’s _init() routine calls
mod_install(), which vectors to the appropriate module-specific routine through
the MODL_INSTALL macro. This procedure is shown in Figure 4.6.

Figure 4.6 Module Operations Function Vectoring (a loadable kernel module's _init() calls the generic mod_install(), and the MODL_INSTALL(&modlinkage) macro in the kernel module support code resolves to the type-specific install function)
Figure 4.6 shows the data structures defined in a loadable kernel module: the
generic modlinkage, through which is referenced a type-specific linkage structure (modlxxx), which in turn links to a type-specific operations structure that
contains pointers to the type-specific functions for installing, removing, and gathering information about a kernel module. The MODL_INSTALL macro is passed the
address of the module’s generic linkage structure and from there vectors in to the
appropriate function. The module-specific installation steps are summarized in
Table 4-3.

Table 4-3 Module Install Routines

Module Type         Install Function       Summary
Device driver       mod_installdrv         Wrapper for ddi_installdrv(); installs the driver
                                           entry in the kernel devops table.
System call         mod_installsys         Installs the system call's sysent table entry in
                                           the kernel sysent table.
File system         mod_installfs          Installs the file system Virtual File System (VFS)
                                           switch table entry.
STREAMS module      mod_installstrmod      Installs the STREAMS entry in the kernel fmodsw
                                           switch table.
Scheduling class    mod_installsched       Installs the scheduling class in the kernel sclass
                                           array.
Exec module         mod_installexec        Installs the exec entry in the kernel execsw
                                           switch table.

The summary column in Table 4-3 shows a definite pattern to the module installation functions. In many subsystems, the kernel implements a switch table mechanism to vector to the correct kernel functions for a specific file system, scheduling
class, exec function, etc. The details of each implementation are covered in subsequent areas of the book, as applicable to a particular chapter or heading.
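The switch-table pattern itself is simple. The sketch below is purely illustrative (the structure and function names are made up, not the kernel's definitions); it shows a table of named entries carrying function pointers through which callers vector, in the spirit of execsw, fmodsw, and the sclass array.

#include <stdio.h>
#include <string.h>

/* Illustrative only: a table of named entries carrying function pointers. */
struct xxxsw {
        const char *sw_name;                    /* e.g., an exec format name */
        int       (*sw_install)(void);          /* type-specific operation   */
};

static int install_elf(void)  { printf("installing elf support\n");  return (0); }
static int install_aout(void) { printf("installing aout support\n"); return (0); }

static struct xxxsw xxxsw[] = {
        { "elf",  install_elf  },
        { "aout", install_aout },
};

/* Vector to the entry whose name matches. */
static int
xxx_dispatch(const char *name)
{
        for (size_t i = 0; i < sizeof (xxxsw) / sizeof (xxxsw[0]); i++) {
                if (strcmp(xxxsw[i].sw_name, name) == 0)
                        return (xxxsw[i].sw_install());
        }
        return (-1);
}

int
main(void)
{
        return (xxx_dispatch("elf"));
}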
As we’ve seen, the dynamic loading of a kernel module is facilitated through two
major kernel subsystems: the module management code and the kernel runtime
linker. These kernel components make use of other kernel services, such as the
kernel memory allocator, kernel locking primitives, and the kernel ksyms driver,
taking advantage of the modular design of the system and providing a good example of the layered model discussed earlier.

Part Two
THE SOLARIS MEMORY SYSTEM

• Solaris Memory Architecture
• Kernel Memory
• Memory Monitoring


5
SOLARIS MEMORY ARCHITECTURE

The virtual memory system can be considered the core of a Solaris system, and
the implementation of Solaris virtual memory affects just about every other subsystem in the operating system. In this chapter, we’ll take a look at some of the
memory management basics and then step into a more detailed analysis of how
Solaris implements virtual memory management. Subsequent chapters in Part
Two discuss kernel memory management and the tools that can be used to monitor and
manage virtual memory.

5.1 Why Have a Virtual Memory System?
A virtual memory system offers the following benefits:
• It presents a simple memory programming model to applications so that
application developers need not know how the underlying memory hardware
is arranged.
• It allows processes to see linear ranges of bytes in their address space,
regardless of the physical layout or fragmentation of the real memory.
• It gives us a programming model with a larger memory size than available
physical storage (e.g., RAM) and enables us to use slower but larger secondary storage (e.g., disk) as a backing store to hold the pieces of memory that
don’t fit in physical memory.


A virtual view of memory storage, known as an address space, is presented to the
application while the VM system transparently manages the virtual storage
between RAM and secondary storage. Because RAM is significantly faster than
disk (100 ns versus 10 ms, or approximately 100,000 times faster), the job of the
VM system is to keep the most frequently referenced portions of memory in the
faster primary storage. In the event of a RAM shortage, the VM system is required
to free RAM by transferring infrequently used memory out to the backing store. By
so doing, the VM system optimizes performance and removes the need for users to
manage the allocation of their own memory.
Multiple users’ processes can share memory within the VM system. In a multiuser environment, multiple processes can be running the same process executable binaries; in older Unix implementations, each process had its own copy of the
binary—a vast waste of memory resources. The Solaris virtual memory system
optimizes memory use by sharing program binaries and application data among
processes, so memory is not wasted when multiple instances of a process are executed. The Solaris kernel extended this concept further when it introduced dynamically linked libraries in SunOS, allowing C libraries to be shared among
processes.
To properly support multiple users, the VM system implements memory protection. For example, a user’s process must not be able to access the memory of another
process; otherwise, security could be compromised or a program fault in one program could cause another program (or the entire operating system) to fail. Hardware facilities in the memory management unit perform the memory protection
function by preventing a process from accessing memory outside its legal address
space (except for memory that is explicitly shared between processes).
Physical memory (RAM) is divided into fixed-sized pieces called pages. The size
of a page can vary across different platforms; the common size for a page of memory on an UltraSPARC Solaris system is 8 Kbytes. Each page of physical memory
is associated with a file and offset; the file and offset identify the backing store for
the page. The backing store is the location to which the physical page contents will
be migrated (known as a page-out) should the page need to be taken for another
use; it’s also the location from which the page is read back in (known as a page-in) if it is needed again. Pages used for regular process heap and stack, known as
anonymous memory, have the swap file as their backing store. A page can also be a
cache of a page-sized piece of a regular file. In that case, the backing store is simply the file it’s caching—this is how Solaris uses the memory system to cache files.
If the virtual memory system needs to take a dirty page (a page that has had its
contents modified), its contents are migrated to the backing store. Anonymous
memory is paged out to the swap device when the page is freed. If a file page needs
to be freed and the page size piece of the file hasn’t been modified, then the page
can simply be freed; if the piece has been modified, then it is first written back out
to the file (the backing store in this case), then freed.
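Conceptually, then, every physical page is named by the file and offset of its backing store, and a dirty page must be written there before the page can be reused. The sketch below is illustrative only; the field names are simplified stand-ins, not the actual Solaris page structure.

/* Illustrative sketch only, not the Solaris definitions. */
struct vnode;                           /* the backing object: a file or swap */

struct page_ident {
        struct vnode       *p_backing;  /* file (or swap) backing this page   */
        unsigned long long  p_offset;   /* byte offset of the page within it  */
        int                 p_dirty;    /* modified since last written back?  */
};

/* A dirty page must be pushed to its backing store before it is freed;
 * a clean file page can simply be freed and read back in later on demand. */
static int
must_pageout_before_free(const struct page_ident *pp)
{
        return (pp->p_dirty);
}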
Rather than managing every byte of memory, we use page-sized pieces of memory to minimize the amount of work the virtual memory system has to do to maintain virtual-to-physical memory mappings. Figure 5.1 shows how the management
and translation of the virtual view of memory (the address space) to physical memory is performed by hardware, known as the virtual memory management unit
(MMU).

Figure 5.1 Solaris Virtual-to-Physical Memory Management (the process’s linear virtual address space is divided into virtual memory segments and page-size pieces of virtual memory, which virtual-to-physical translation tables map to physical memory pages)
The Solaris kernel breaks up the linear virtual address space into segments, one
for each type of memory area in the address space. For example, a simple process
has a memory segment for the process binary and one for the scratch memory
(known as heap space). Each segment manages the mapping for the virtual
address range mapped by that segment and converts that mapping into MMU
pages. The hardware MMU maps those pages into physical memory by using a
platform-specific set of translation tables. Each entry in the table has the physical
address of the page of memory in RAM, so that memory accesses can be converted
on-the-fly in hardware. We cover more on how the MMU works later in the chapter when we discuss the platform-specific implementations of memory management.
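To make the page granularity concrete, a small user-level illustration follows (assuming the 8-Kbyte UltraSPARC page size mentioned earlier): the low 13 bits of a virtual address are the offset within a page and pass through translation unchanged, while the remaining bits select the virtual page that the translation tables map to a physical page.

#include <stdio.h>

#define PAGESIZE   8192UL               /* 8-Kbyte pages, as on UltraSPARC */
#define PAGEOFFSET (PAGESIZE - 1)

int
main(void)
{
        unsigned long vaddr = 0x2a1234;                 /* an arbitrary virtual address */
        unsigned long vpage = vaddr & ~PAGEOFFSET;      /* virtual page base address    */
        unsigned long off   = vaddr &  PAGEOFFSET;      /* offset within the page       */

        /* The MMU maps the virtual page to a physical page; the offset
         * within the page is identical in virtual and physical memory. */
        printf("vaddr 0x%lx -> page 0x%lx + offset 0x%lx\n", vaddr, vpage, off);
        return (0);
}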
Recall that we can have more virtual address space than physical address space
because the operating system can overflow memory onto a slower medium, like a
disk. The slower medium in Unix is known as swap space. Two basic types of memory management manage the allocation and migration of physical pages of memory to and from swap space: swapping and demand paging.


The swapping algorithm for memory management uses a user process as the
granularity for managing memory. If there is a shortage of memory, then all of the
pages of memory of the least active process are swapped out to the swap device,
freeing memory for other processes. This method is easy to implement, but performance suffers badly during a memory shortage because a process cannot resume
execution until all of its pages have been brought back from secondary storage.
The demand-paged model uses a page as the granularity for memory management. Rather than swapping out a whole process, the memory system just swaps
out small, least used chunks, allowing processes to continue while an inactive part
of the process is swapped out.
The Solaris kernel uses a combined demand-paged and swapping model.
Demand paging is used under normal circumstances, and swapping is used only as
a last resort when the system is desperate for memory. We cover swapping and
paging in more detail in “The Page Scanner” on page 178.
The Solaris VM system implements many more functions than just management of application memory. In fact, the Solaris virtual memory system is responsible for managing most objects related to I/O and memory, including the kernel,
user applications, shared libraries, and file systems. This strategy differs significantly from other operating systems like earlier versions of System V Unix, where
file system I/O used a separate buffer cache.
One of the major advantages of using the VM system to manage file system
buffering is that all free memory in the system is used for file buffering, providing
significant performance improvements for applications that use the file system and
removing the need for tuning the size of the buffer cache. The VM system can allocate all free memory for file system buffers, meaning that on a typical system with
file system I/O, the amount of free memory available is almost zero. This number
can often be misleading and has resulted in numerous bogus memory-leak bugs
being logged over the years. Don’t worry; “almost zero” is normal. (Note that free
memory is no longer always low with Solaris 8.)
In summary, a VM system performs these major functions:
• It manages virtual-to-physical mapping of memory
• It manages the swapping of memory between primary and secondary storage
to optimize performance
• It handles requirements of shared images between multiple users and processes

5.2 Modular Implementation
Early SunOS versions (SunOS 3 and earlier) were based on the old BSD-style memory system, which was not modularized, and thus it was difficult to move the memory system to different platforms. The virtual memory system was completely
redesigned at that time, with the new memory system targeted at SunOS 4.0. The
new SunOS 4.0 virtual memory system was built with the following goals in mind:
• Use of a new object-oriented memory management framework
• Support for shared and private memory (copy-on-write)
• Page-based virtual memory management
The VM system that resulted from these design goals provides an open framework
that now supports many different memory objects. The most important objects of
the memory system are segments, vnodes, and pages. For example, all of the following have been implemented as abstractions of the new memory objects:
• Physical memory, in chunks called pages
• A new virtual file object, known as the vnode
• File systems as hierarchies of vnodes
• Process address spaces as segments of mapped vnodes
• Kernel address space as segments of mapped vnodes
• Mapped hardware devices, such as frame buffers, as segments of hardware-mapped pages
The Solaris virtual memory system we use today is implemented according to the
framework of the SunOS 4.0 rewrite. It has been significantly enhanced to provide
scalable performance on multiprocessor platforms and has been ported to many
platforms. Figure 5.2 shows the layers of the Solaris virtual memory implementation.
Physical memory management is done by the hardware MMU and a hardware-specific address translation layer known as the Hardware Address Translation (HAT) layer. Each type of memory management unit (MMU) has its own specific HAT
implementation. Thus, we can separate the common machine-independent memory management layers from the hardware-specific components to minimize the
amount of platform-specific code that must be written for each new platform.
The next layer is the address space management layer. Address spaces are made up of segments, which are created and managed by segment drivers. Each segment
driver manages the mapping of a linear virtual address space into memory pages
for different device types (for example, a device such as a graphics frame buffer
can be mapped into an address space). The segment layers manage virtual memory as an abstraction of a file. The segment drivers call into the HAT layer to create the translations between the address space they are managing and the
underlying physical pages.
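The object-oriented flavor of this layering can be sketched as a per-segment operations vector. The sketch below is much simpler than, and uses different names from, the kernel's actual segment interface; it is shown only to illustrate how the address space layer hands work to a segment driver, which in turn would call the HAT layer.

#include <stddef.h>

/* Illustrative only: a per-segment-type operations vector. */
struct seg;

struct seg_ops_sketch {
        int (*fault)(struct seg *seg, void *vaddr);     /* resolve a page fault */
        int (*sync)(struct seg *seg);                   /* flush dirty mappings */
};

/* A segment covers one linear virtual address range within an address space. */
struct seg {
        void                        *s_base;    /* first virtual address covered */
        size_t                       s_size;    /* length of the range, in bytes */
        const struct seg_ops_sketch *s_ops;     /* driver for this segment type  */
};

/* The address space layer vectors a fault to the owning segment's driver. */
static int
as_fault_sketch(struct seg *seg, void *vaddr)
{
        return (seg->s_ops->fault(seg, vaddr));
}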


Figure 5.2 Solaris Virtual Memory Layers. From top to bottom, the figure shows the global page replacement manager (the page scanner); the address space management layer with its segment drivers (segkmem for the kernel memory segment, segmap for the file cache memory segment, and segvn for the process memory segment); and the Hardware Address Translation (HAT) layer with its platform-specific implementations: sun4c (sun4-mmu, 32/32-bit, 4-Kbyte pages), sun4m (sr-mmu, 32/36-bit, 4-Kbyte pages), sun4d (sr-mmu, 32/36-bit, 4-Kbyte pages), sun4u (sf-mmu, 64/64-bit, 8-Kbyte/4-Mbyte pages), and x86 (i386 mmu, 32/36-bit, 4-Kbyte pages).

5.3 Virtual Address Spaces
The virtual address space of a process is the range of memory addresses that are
presented to the process as its environment; some addresses are mapped to physical memory, some are not. A process’s virtual address space skeleton is created by
the kernel at the time the fork() system call creates the process. (See “Process
Creation” on page 293.) The virtual address layout within a process is set up by
the dynamic linker and sometimes varies across different hardware platforms. As
we saw in Figure 5.1 on page 127, virtual address spaces are assembled from a
series of memory segments. Each process has at least four segments: