6 Procfs — The Process File System



• /proc/<pid>/as — The process’s address space, as defined by the p_as link
to an address space structure (struct as) in the process’s proc structure. In
other words, the process’s address space as represented by the
/proc/<pid>/as file is not a /proc-specific representation of the address
space. Rather, /proc provides a path to address space mappings through the
proc structure’s p_as pointer.
• /proc/<pid>/ctl — A process control file. It can be opened write-only and
is used to send control messages to a process to initiate a specific event
or to enable a particular behavior. Examples include stopping or starting a
process, setting stops on specific events, and turning on microstate
accounting. This file exemplifies the power and elegance of procfs; you can
accomplish process control and event tracing by opening the control file for
the target process and writing a control message (or multiple control
messages) to inject the desired behavior. See the proc(4) man page for a
detailed list of control messages and functions.
• /proc/<pid>/status — General state and status information about the
process. The specific contents are defined in the pstatus structure, defined
in /usr/include/sys/procfs.h. pstatus is also described in proc(4).
Note that pstatus embeds an lwpstatus structure (the pr_lwp field of
pstatus). This structure describes a representative LWP. A nonthreaded
process has only one LWP, so selecting a representative LWP is simple. For
threaded processes with multiple LWPs, an internal kernel routine loops
through all the LWPs in the process and selects one on the basis of its
state. The first choice is an executing LWP. If no LWP is executing, the
routine looks for a runnable, sleeping, or stopped LWP.
• /proc/<pid>/lstatus — An array of lwpstatus structures, one for each
LWP in the process.
• /proc/<pid>/psinfo — Process information as provided by the ps(1) command.
Similar to the status data as described above, in that a representative
LWP is included with an embedded lwpsinfo structure.
• /proc/<pid>/lpsinfo — Per-LWP ps(1) information.
• /proc/<pid>/map — Address space map information. The data displayed by
the pmap(1) command.
• /proc/<pid>/rmap — Reserved address space segments of the process.
• /proc/<pid>/xmap — Extended address space map information. The data
displayed when the pmap(1) command is run with the -x flag.
• /proc/<pid>/cred — Process credentials, as described in the prcred
structure (/usr/include/sys/procfs.h).


The Solaris Multithreaded Process Architecture

• /proc/<pid>/sigact — Array of sigaction structures, one for each signal,
each describing the disposition of that signal for the process.
• /proc/<pid>/auxv — An array of auxv (auxiliary vector, defined in
/usr/include/sys/auxv.h) structures, with the initial values as passed to
the dynamic linker when the process was exec’d.
• /proc/<pid>/ldt — Local descriptor table. Intel x86 architecture only.
• /proc/<pid>/usage — Process resource usage data. See “Process Resource
Usage” on page 318.
• /proc/<pid>/lusage — Array of LWP resource usage data. See “Process
Resource Usage” on page 318.
• /proc/<pid>/pagedata — Another representation of the process’s address
space. Provides page-level reference and modification tracking.
• /proc/<pid>/watch — An array of prwatch structures (defined in
/usr/include/sys/procfs.h), as created when a PCWATCH operation is written
to the control file. Allows for monitoring (watching) one or more address
space ranges, such that a trap is generated when a memory reference is made
to a watched page.
• /proc/<pid>/cwd — Symbolic link to the process’s current working directory.
• /proc/<pid>/root — Symbolic link to the process’s root directory.
• /proc/<pid>/fd — Directory that contains references to the process’s open
files.
• /proc/<pid>/fd/nn — The process’s open file descriptors. Directory files
are represented as symbolic links.
• /proc/<pid>/object — Subdirectory containing binary shared object files
the process is linked to.
• /proc/<pid>/object/nn — Binary object files. The process’s executable
binary (a.out), along with shared object files the process is linked to.
In addition to the file objects and subdirectories maintained at the process level,
each /proc/<pid>/ directory has an lwp subdirectory, which contains several file
objects that contain per-LWP data. Subdirectories are described below.
• /proc/<pid>/lwp — Subdirectory containing files that represent all the
LWPs in the process.
• /proc/<pid>/lwp/<lwpid> — Subdirectory containing the procfs files specific to an LWP.
• /proc/<pid>/lwp/<lwpid>/lwpctl — Control file for issuing control operations on a per-LWP basis.

• /proc/<pid>/lwp/<lwpid>/lwpstatus — LWP state and status information,
as defined in the lwpstatus structure in /usr/include/sys/procfs.h.
• /proc/<pid>/lwp/<lwpid>/lwpsinfo — LWP ps(1) command information, as
defined in lwpsinfo, also in /usr/include/sys/procfs.h.
• /proc/<pid>/lwp/<lwpid>/lwpusage — LWP resource usage data. See
“Process Resource Usage” on page 318.
• /proc/<pid>/lwp/<lwpid>/xregs — Extra general state registers. This
file is processor-architecture specific and may not be present on some
platforms. On SPARC-based systems, the data contained in this file is defined in
the prxregset structure, in /usr/include/sys/procfs_isa.h.
• /proc/<pid>/lwp/<lwpid>/gwindows — General register windows. This
file exists on SPARC-based systems only and represents the general register
set of the LWP (part of the hardware context), as defined in the gwindows
structure in /usr/include/sys/regset.h.
• /proc/<pid>/lwp/<lwpid>/asrs — Ancillary register set. SPARC V9
architecture (UltraSPARC) only. An additional set of hardware registers
defined by the SPARC V9 architecture. This file, representing the ancillary
registers, is present only on sun4u-based systems running a 64-bit kernel
(Solaris 7 or later), and only for 64-bit processes. (Remember, a 64-bit kernel
can run 32-bit processes. A 32-bit process will not have this file in its lwp
subdirectory space.)
That completes the listing of files and subdirectories in the procfs directory space.
Once again, please refer to the proc(4) manual page for more detailed information on the various files in /proc and for a complete description of the control messages available.
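The representative-LWP selection described above for /proc/<pid>/status can be sketched in a few lines of C. This is a minimal illustration, not the kernel routine: the lwp_state enum, the demo_lwp structure, and pick_representative() are hypothetical names, and the strict preference order (executing, then runnable, sleeping, stopped) is an assumption read from the order in which the text lists the states.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical LWP states, declared in the assumed order of preference. */
enum lwp_state { LWP_EXECUTING, LWP_RUNNABLE, LWP_SLEEPING, LWP_STOPPED };

struct demo_lwp {
    int id;                 /* illustrative LWP identifier */
    enum lwp_state state;
};

/*
 * Loop through all LWPs and return the one with the most preferred state,
 * or NULL if there are none. On a tie the earliest LWP in the array wins.
 */
const struct demo_lwp *pick_representative(const struct demo_lwp *lwps, size_t n)
{
    const struct demo_lwp *best = NULL;

    assert(lwps != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (best == NULL || lwps[i].state < best->state)
            best = &lwps[i];
    }
    return best;
}
```

With a single LWP the loop trivially returns that LWP, which matches the nonthreaded case in the text.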

8.6.1 Procfs Implementation
Procfs is implemented as a dynamically loadable kernel module, /kernel/fs/procfs, and is loaded automatically by the system at boot time. /proc is
mounted during system startup by virtue of the default /proc entry in the
/etc/vfstab file. The mount phase causes the invocation of the procfs prinit()
(initialize) and prmount() file-system-specific functions, which initialize the vfs
structure for procfs and create and initialize a vnode for the top-level directory
file, /proc.
The kernel memory space for the /proc files is, for the most part, allocated
dynamically, with an initial static allocation for the number of directory slots
required to support the maximum number of processes the system is configured to
support (see “The Kernel Process Table” on page 290).



A kernel procdir (procfs directory) pointer is initialized as a pointer to an
array of procent (procfs directory entry) structures. The size of this array is
derived from the v.v_proc variable established at boot time, representing the
maximum number of processes the system can support. Each entry in procdir
maintains a pointer to the process structure and a link to the next entry
in the array. The procdir array is indexed through the pid_prslot field in the
process’s pid structure. The procdir slot is allocated to the process from the
array and initialized at process creation time (fork()), as shown in Figure 8.16.

[Figure not reproduced. Recoverable labels: the kernel process table and the procdir array; the pid_prslot field in the process’s PID structure indexes the procdir array.]
Figure 8.16 procfs Kernel Process Directory Entries
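The slot bookkeeping just described can be modeled with a fixed array whose unused entries are chained together. The sketch below is only an illustration under stated assumptions: dir_entry, dir_table, dir_alloc(), and dir_release() are hypothetical names rather than the kernel’s, MAX_PROCS stands in for v.v_proc, and treating the "link to the next entry" as a free list is our reading of the text.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PROCS 8                 /* stands in for v.v_proc */

struct demo_proc { int pid; };      /* placeholder for the proc structure */

struct dir_entry {                  /* loosely modeled on procent */
    struct demo_proc *proc;         /* owning process, NULL while free */
    struct dir_entry *next;         /* next free entry (assumed use of the link) */
};

struct dir_entry dir_table[MAX_PROCS];
struct dir_entry *dir_free_list;

/* Chain every entry onto the free list. */
void dir_init(void)
{
    for (size_t i = 0; i < MAX_PROCS; i++) {
        dir_table[i].proc = NULL;
        dir_table[i].next = (i + 1 < MAX_PROCS) ? &dir_table[i + 1] : NULL;
    }
    dir_free_list = &dir_table[0];
}

/* Claim a slot at process creation; returns its index, or -1 if none is free. */
int dir_alloc(struct demo_proc *p)
{
    struct dir_entry *e = dir_free_list;

    if (e == NULL)
        return -1;
    dir_free_list = e->next;
    e->next = NULL;
    e->proc = p;
    return (int)(e - dir_table);
}

/* Return a slot to the free list when the process goes away. */
void dir_release(int slot)
{
    assert(slot >= 0 && slot < MAX_PROCS);
    dir_table[slot].proc = NULL;
    dir_table[slot].next = dir_free_list;
    dir_free_list = &dir_table[slot];
}
```

The index returned by dir_alloc() plays the role of the slot number stored in the PID structure: given the index, the directory entry and its process pointer are found with a single array access.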
The specific format of the procfs directory entries is described in the procfs kernel
code. It is modeled after a typical on-disk file system: each directory entry in the
kernel is described with a directory name, offset into the directory, a length field,
and an inode number. The inode number for a /proc file object is derived internally from the file object type and process PID. Note that /proc directory entries
are not cached in the directory name lookup cache (dnlc); by definition they are
already in physical memory.
Because procfs is a file system, it is built on the Virtual File System (VFS) and
vnode framework. In Solaris, an instance of a file system is described by a vfs
object, and the underlying files are each described by a vnode. The vfs/vnode
architecture is described in “Solaris File System Framework” on page 541. Procfs
builds the vfs and vnode structures, which are used to reference the
file-system-specific functions for operations on the file system (e.g., mount,
unmount) and on the /proc directories and file objects (e.g., open, read,
write).
Beyond the vfs and vnode structures, the procfs implementation defines two
primary data structures used to describe file objects in the /proc file system. The
first, prnode, is the file-system-specific data linked to the vnode. Just as the
kernel UFS implementation defines an inode as a file-system-specific structure that
describes a UFS file, procfs defines a prnode to describe a procfs file. Every file in
the /proc directory has a vnode and prnode. A second structure, prcommon,
exists at the directory level for /proc directory files. That is, the /proc/<pid>
and /proc/<pid>/lwp/<lwpid> directories each have a link to a prcommon
structure. The underlying nondirectory file objects within /proc/<pid> and
/proc/<pid>/lwp/<lwpid> do not have an associated prcommon structure. The
reason is that prcommon’s function is the synchronization of access to the file
objects associated with a process or an LWP within a process. The prcommon
structure provides procfs clients with a common file abstraction of the underlying
data files within a specific directory (see Figure 8.17).
[Figure not reproduced. Recoverable labels: prnode and prcommon pairs (three shown) and the file objects beneath them.]
Figure 8.17 procfs Directory Hierarchy
Refer to /usr/include/sys/proc/prdata.h for definitions for the prnode and
prcommon structures.
Structure linkage is maintained at the proc structure and LWP level, which
reference their respective /proc file vnodes. Every process links to its primary
/proc vnode (that is, the vnode that represents the /proc/<pid> file), and each
LWP in the process links to the vnode that represents its
/proc/<pid>/lwp/<lwpid> file, as shown in Figure 8.18.

[Figure not reproduced. Recoverable labels: a multithreaded process; kthread/LWP pairs, each with a vnode; a link back to the proc; an array of pointers to vnodes for all files within the directory.]
Figure 8.18 procfs Data Structures
Figure 8.18 provides a partial view of what the procfs data structures and related
links look like when a procfs file is opened for reading or writing. Note that all of
the vnodes associated with a process are linked through the pr_next pointer in
the prnode. When a reference is made to a procfs directory and underlying file
object, the kernel dynamically creates the necessary structures to service a client
request for file I/O. More succinctly, the procfs structures and links are created and
torn down dynamically. They are not created when the process is created (aside
from the procdir procfs directory entry and directory slot allocation). They
appear to be always present because the files are available whenever an open(2)
request is made or a lookup is done on a procfs directory or data file object. (It is
something like the light in your refrigerator—it’s always on when you look, but not
when the door is closed.)
The data made available through procfs is, of course, always present in the kernel proc structures and other data structures that, combined, form the complete
process model in the Solaris kernel. This model, shown in Figure 8.5 on page 270,
represents the true source of the data extracted and exported by procfs. By hiding
the low-level details of the kernel process model and abstracting the interesting
information and control channels in a relatively generic way, procfs provides a
service to client programs interested in extracting bits of data about a process or
somehow controlling the execution flow. The abstractions are created when
requested and are maintained as long as necessary to support file access and
manipulation requests for a particular file.
File I/O operations through procfs follow the conventional methods of first opening a file to obtain a file descriptor, then performing subsequent read/write operations and closing the file when completed. The creation and initialization of the
prnode and prcommon structures occur when the procfs-specific vnode operations are entered through the vnode switch table mechanism as a result of a client (application program) request. The actual procfs vnode operations have
specific functions for the lookup and read operations on the directory and data files
within the /proc directory.
Procfs implements lookup and read requests through arrays of function pointers
that resolve to the procfs file-type-specific routines, using a lookup table
indexed by file type. The file type is maintained at two levels. At the vnode
level, procfs files are defined as VPROC file types (v_type field in the vnode).
The prnode includes a type field (pr_type) that defines the specific procfs file
type being described by the prnode. The procfs file types correspond directly to
the description of /proc files and directories that are listed at the beginning of
this section (“Procfs — The Process File System” on page 306). Examples include
the various directory types (a process PID directory, an LWPID directory, etc.)
and data files (status, psinfo, address space, etc.).
The basic execution flow of a procfs file open is shown in Figure 8.19.
[Figure not reproduced. Recoverable code flow: VOP_LOOKUP() -> prlookup(), constructing the full path name by looking up each element in the path; specific procfs directory object lookup functions are invoked through pr_lookup_function[], with the index based on type; then VOP_OPEN() -> propen().]
Figure 8.19 procfs File Open
The function flow in Figure 8.19 starts at the application program level, where an
open(2) system call is issued on a procfs file. The vnode kernel layer is entered
(vn_open()), and a series of lookups is performed to construct the full path name
of the desired /proc file. Macros in the vnode layer invoke file-system-specific
operations. In this example, VOP_LOOKUP() will resolve to the procfs
prlookup() function. prlookup() will do an access permissions check and
vector to the appropriate procfs function based on the directory file type, for
example, pr_lookup_piddir() to perform a lookup on a /proc/<pid> directory. Each
of the pr_lookup_xxx() directory lookup functions does some directory-type-specific
work and calls prgetnode() to fetch the prnode.
prgetnode() creates the prnode (which includes the embedded vnode) for the
/proc file and initializes several of the prnode and vnode fields. For /proc PID
and LWPID directories (/proc/<pid>, /proc/<pid>/lwp/<lwpid>), the
prcommon structure is created, linked to the prnode, and partially initialized. Note
that for /proc directory files, the vnode type will be changed from VPROC (set
initially) to VDIR, to correctly reflect the file type as a directory (it is a procfs
directory, but a directory file nonetheless).
Once the path name is fully constructed, the VOP_OPEN() macro invokes the
file-system-specific open() function. The procfs propen() code does some additional prnode and vnode field initialization and file access testing for specific file
types. Once propen() completes, control is returned to vn_open() and ultimately a file descriptor representing a procfs file is returned to the caller.
The reading of a procfs data file object is similar in flow to the open scenario,
where the execution of a read system call on a procfs file will ultimately cause the
code to enter the procfs prread() function. The procfs implementation defines a
data-file-object-specific read function for each of the file objects (data structures)
available: pr_read_psinfo(), pr_read_pstatus(), pr_read_lwpsinfo(),
etc. The specific function is entered from prread() through an array of function
pointers indexed by the file type—the same method employed for the previously
described lookup operations.
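The dispatch technique used for both lookups and reads, an array of function pointers indexed by file type, can be sketched as follows. This is a minimal user-level illustration and not the procfs source: the file_type enum, read_table[], dispatch_read(), and the three reader functions are hypothetical, and each reader merely writes the name of the structure it would return.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical subset of procfs file types. */
enum file_type { FT_PSINFO, FT_STATUS, FT_LWPSINFO, FT_NTYPES };

/* Every type-specific reader shares one signature. */
typedef int (*read_fn)(char *buf, size_t len);

int read_psinfo(char *buf, size_t len)   { return snprintf(buf, len, "psinfo"); }
int read_status(char *buf, size_t len)   { return snprintf(buf, len, "pstatus"); }
int read_lwpsinfo(char *buf, size_t len) { return snprintf(buf, len, "lwpsinfo"); }

/* The lookup table: one function pointer per file type. */
read_fn const read_table[FT_NTYPES] = {
    [FT_PSINFO]   = read_psinfo,
    [FT_STATUS]   = read_status,
    [FT_LWPSINFO] = read_lwpsinfo,
};

/* Generic entry point: validate the type, then vector through the table. */
int dispatch_read(enum file_type type, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    if ((unsigned)type >= FT_NTYPES || read_table[type] == NULL)
        return -1;
    return read_table[type](buf, len);
}
```

The generic entry point never needs a switch statement; supporting a new file type means adding one enum value and one table entry.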
The Solaris 7 implementation of procfs, where both 32-bit and 64-bit binary executables can run on a 64-bit kernel, provides 32-bit versions of the data files available in the /proc hierarchy. For each data structure that describes the contents of
a /proc file object, a 32-bit equivalent is available in a 64-bit Solaris 7 kernel (e.g.,
lwpstatus and lwpstatus32, psinfo and psinfo32). In addition to the 32-bit
structure definitions, each of the pr_read_xxx() functions has a 32-bit equivalent in the procfs kernel module, more precisely, a function that deals specifically
with the 32-bit data model of the calling program. Procfs users are not exposed to
the multiple data model implementation in the 64-bit kernel. When prread() is
entered, it checks the data model of the calling program and invokes the correct
function as required by the data model of the caller. An exception to this is a read
of the address space (/proc/<pid>/as) file; the caller must use the same data
model as the target process. A 32-bit binary cannot read the as file of a 64-bit
process. A 32-bit process can read the as file of another 32-bit process running
on a 64-bit kernel.
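To make the data-model handling concrete, here is a small sketch of a read routine that returns either the native or the 32-bit layout of a record, depending on the caller. Everything in it is hypothetical: the two-field info and info32 structures, the data_model enum, and read_info() are stand-ins for the real structure pairs (such as psinfo and psinfo32), and refusing values that do not fit in 32 bits is an assumption of this sketch, not a documented procfs behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum data_model { MODEL_ILP32, MODEL_LP64 };

/* A hypothetical record in its native (64-bit) and 32-bit layouts. */
struct info   { int64_t size; int64_t start; };
struct info32 { int32_t size; int32_t start; };

/*
 * Copy src into buf using the layout that matches the caller's data model.
 * Returns the number of bytes written, or 0 if buf is too small or a value
 * cannot be represented in the 32-bit layout.
 */
size_t read_info(enum data_model caller, const struct info *src,
                 void *buf, size_t len)
{
    assert(src != NULL && buf != NULL);

    if (caller == MODEL_LP64) {
        if (len < sizeof (*src))
            return 0;
        memcpy(buf, src, sizeof (*src));
        return sizeof (*src);
    }

    /* 32-bit caller: narrow each field, refusing values that do not fit. */
    if (len < sizeof (struct info32) ||
        src->size > INT32_MAX || src->size < INT32_MIN ||
        src->start > INT32_MAX || src->start < INT32_MIN)
        return 0;

    struct info32 out = { (int32_t)src->size, (int32_t)src->start };
    memcpy(buf, &out, sizeof (out));
    return sizeof (out);
}
```

The caller sees only the layout it was compiled for, which is the sense in which procfs users are not exposed to the multiple data models.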
The pr_read_xxx() functions essentially read the data from its original
source in the kernel and write it to the corresponding procfs data structure fields,
thereby making the requested data available to the caller. For example,
pr_read_psinfo() will read data from the targeted process’s proc structure,
credentials structure, and address space (as) structure and will write it to the
corresponding fields in the psinfo structure. Access to the kernel data required to
satisfy the client requests is synchronized with the proc structure’s mutex lock,
p_lock. This approach protects the per-process or LWP kernel data from being
accessed by more than one client thread at a time.
Writes to procfs files are much less frequent. Aside from writing to the directories
to create data files on command, writes are predominantly to the process or
LWP control file (ctl) to issue control messages. Control messages (documented in
proc(4)) include stop/start messages, signal tracing and control, fault management,
execution control (e.g., system call entry and exit stops), and address space
monitoring.
Note: We’ve discussed I/O operations on procfs files in terms of standard system calls because currently those calls are the only way to access the /proc
files from developer-written code. However, there exists a set of interfaces specific to procfs that are used by the proc(1) commands that ship with Solaris.
These interfaces are bundled into a libproc.so library and are not currently
documented or available for public use. The libproc.so library is included in
the /usr/lib distribution in Solaris 7, but the interfaces are evolving and not
yet documented. Plans are under way to document these libproc.so interfaces and make them available as a standard part of the Solaris APIs.
The interface layering of the kernel procfs module functions covered in the previous pages is shown in Figure 8.20.

[Figure not reproduced. Recoverable layer labels: custom /proc code, stdio interfaces, system calls, vnode layer.]
Figure 8.20 procfs Interface Layers
The diagram in Figure 8.20 shows more than one path into the procfs kernel
routines. Typical developer-written code makes use of the shorter system call path,
passing through the vnode layer as previously described. The proc(1) commands are
built largely on the libproc.so interfaces. The need for a set of library-level
interfaces specific to procfs is twofold. First, an easy-to-use set of routines for
code development reduces the complexity of using a powerful kernel facility. Second,
the complexity in controlling the execution of a process, especially a multithreaded
process, requires a layer of code that really belongs at the application programming
interface (as opposed to kernel) level.
The developer controls a process by writing an operation code and (optional)
operand to the first 8 bytes of the control file (or 16 bytes if it’s an LP64 kernel).
The control file write path is also through the vnode layer and ultimately enters
the procfs prwritectl() function. The implementation allows multiple control
messages (operations and operands) to be sent to the control file in a single write.
The prwritectl() code breaks multiple messages into distinct operation/operand pairs and passes them to the kernel pr_control() function, where the
appropriate flags are set at the process or LWP level as a notification that a control mechanism has been injected (e.g., a stop on an event).
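The splitting of a single write into operation/operand pairs can be sketched at user level. This is an illustration only, not the prwritectl() code: the OP_* codes are placeholders (the real values are defined in /usr/include/sys/procfs.h), the ctl_msg structure and split_ctl() are hypothetical, and the sketch assumes every operand is a single long, which is not true of all real control messages.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder operation codes for this sketch only. */
enum { OP_STOP = 1, OP_TIMED_WAIT = 2, OP_RUN = 3 };

struct ctl_msg {
    long op;        /* operation code */
    long arg;       /* operand, valid only if has_arg is nonzero */
    int  has_arg;
};

/* Does this (hypothetical) operation carry one long operand? */
int op_has_arg(long op)
{
    return op == OP_TIMED_WAIT || op == OP_RUN;
}

/*
 * Break a buffer of nwords longs into distinct messages. Returns the number
 * of messages stored in out, or -1 on an unknown operation, a missing
 * operand, or more than max messages.
 */
int split_ctl(const long *words, size_t nwords, struct ctl_msg *out, size_t max)
{
    size_t i = 0;
    int n = 0;

    assert(words != NULL && out != NULL);
    while (i < nwords) {
        long op = words[i++];

        if (op != OP_STOP && !op_has_arg(op))
            return -1;                  /* unknown operation */
        if ((size_t)n == max)
            return -1;                  /* no room for another message */
        out[n].op = op;
        out[n].arg = 0;
        out[n].has_arg = op_has_arg(op);
        if (out[n].has_arg) {
            if (i == nwords)
                return -1;              /* operand missing from the buffer */
            out[n].arg = words[i++];
        }
        n++;
    }
    return n;
}
```

A buffer holding a stop request, a timed wait with a 2000-millisecond operand, and a run request with a zero operand is split into three messages, each of which could then be handed to a per-message control routine.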
Table 8-3 lists the possible control messages (operations) that are currently
implemented. We include them here to provide context for the subsequent
descriptions of control functions, as well as to illustrate the power of procfs. See
also the proc(4) manual page and /usr/include/sys/procfs.h.
Table 8-3 procfs Control Messages

Control Message   Description
PCSTOP            Requests process or LWP to stop; waits for stop.
PCDSTOP           Requests process or LWP to stop.
PCWSTOP           Waits for the process or LWP to stop. No timeout.
PCTWSTOP          Waits for stop, with millisecond timeout arg.
PCRUN             Sets process or LWP runnable. Long arg can specify
                  clearing of signals or faults, setting single step mode, etc.
PCCSIG            Clears current signal from LWP.
PCCFAULT          Clears current fault from LWP.
PCSSIG            Sets current signal from siginfo_t.
PCKILL            Posts a signal to process or LWP.
PCUNKILL          Deletes a pending signal from the process or LWP.
PCSHOLD           Sets LWP signal mask from arg.
PCSTRACE          Sets traced signal set from arg.
PCSFAULT          Sets traced fault set from arg.
PCSENTRY          Sets tracing of system calls (on entry) from arg.
PCSEXIT           Sets tracing of system calls (on exit) from arg.
PCSET             Sets mode(s) in process/LWP.
PCUNSET           Unsets mode(s) in process/LWP.
PCSREG            Sets LWP’s general registers from arg.
PCSFPREG          Sets LWP’s floating-point registers from arg.
PCSXREG           Sets LWP’s extra registers from arg.
PCNICE            Sets nice value from arg.
PCSVADDR          Sets PC (program counter) to virtual address in arg.
PCWATCH           Sets or clears watched memory area from arg.
PCAGENT           Creates agent LWP with register values from arg.
PCREAD            Reads from the process address space through arg.
PCWRITE           Writes to process address space through arg.
PCSCRED           Sets process credentials from arg.
PCSASRS           Sets ancillary state registers from arg.

As you can see from the variety of control messages provided, the implementation
of process/LWP control is tightly integrated with the kernel process/LWP subsystem. Various fields in the process, user (uarea), LWP, and kernel thread structures facilitate process management and control with procfs. Establishing process
control involves setting flags and bit mask fields to track events that cause a process or thread to enter or exit the kernel. These events are signals, system calls,
and fault conditions. The entry and exit points for these events are well defined
and thus provide a natural inflection point for control mechanisms.
The system calls, signals, and faults of interest are specified through set data
types: sigset_t, sysset_t, and fltset_t operands have values set by the calling
(controlling) program to specify the signal, system call, or fault condition of
interest. A stop on a system call entry occurs when the kernel is first entered (the
system call trap), before the argument list for the system call is read from the
process. System call exit stops have the process stop after the return value from the
system call has been saved. Fault stops also occur when the kernel is first entered;
fault conditions generate traps, which force the code into a kernel trap handler.
Signal stops are tested for at all the points where a signal is detected: on a return
from a system call or trap, and on a wakeup (see “Signals” on page 324).
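A set data type of this kind is, in essence, a bit mask wide enough to hold one bit per event number. The sketch below shows the idea with a hypothetical event_set_t and helper functions; it does not reproduce the definitions of sigset_t, sysset_t, or fltset_t, and the 128-event capacity and 1-based numbering are assumptions chosen for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SET_WORDS 4
#define SET_MAX   (SET_WORDS * 32)      /* events are numbered 1..SET_MAX */

typedef struct { uint32_t word[SET_WORDS]; } event_set_t;

int set_valid(unsigned n) { return n >= 1 && n <= SET_MAX; }

/* Clear every event in the set. */
void set_empty(event_set_t *s) { memset(s, 0, sizeof (*s)); }

/* Mark event n as being of interest. */
void set_add(event_set_t *s, unsigned n)
{
    assert(set_valid(n));
    s->word[(n - 1) / 32] |= (uint32_t)1 << ((n - 1) % 32);
}

/* Remove event n from the set. */
void set_del(event_set_t *s, unsigned n)
{
    assert(set_valid(n));
    s->word[(n - 1) / 32] &= ~((uint32_t)1 << ((n - 1) % 32));
}

/* Nonzero if event n is in the set; out-of-range numbers are never members. */
int set_has(const event_set_t *s, unsigned n)
{
    return set_valid(n) && ((s->word[(n - 1) / 32] >> ((n - 1) % 32)) & 1u);
}
```

A controlling program would fill such a set with the events it cares about and pass it as the operand of a tracing control message; the membership test is then a single shift and mask at the point where the event is detected.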
Address space watch directives allow a controlling process to specify a virtual
address, range (in bytes), and access type (e.g., read or write access) for a segment
of a process’s virtual address space. When a watched event occurs, a watchpoint
trap is generated, which typically causes the process or LWP to stop, either
through a trace of a FLTWATCH fault or by an unblocked SIGTRAP signal.
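The test implied by a watch directive, whether a given memory reference touches a watched range with a watched access type, can be sketched as follows. The watch_area structure, the W_READ and W_WRITE flags, and watch_hit() are hypothetical names for this illustration; they mirror the three parameters named in the text (virtual address, size in bytes, access type) but are not the prwatch definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { W_READ = 1, W_WRITE = 2 };       /* hypothetical access-type flags */

struct watch_area {
    uintptr_t vaddr;    /* start of the watched range */
    size_t    size;     /* length of the range in bytes */
    int       flags;    /* access types being watched */
};

/*
 * Nonzero if an access of len bytes at addr overlaps the watched range and
 * its access type is one of those being watched. The comparisons are written
 * as differences so that they cannot overflow.
 */
int watch_hit(const struct watch_area *w, uintptr_t addr, size_t len, int access)
{
    assert(w != NULL);
    if (len == 0 || w->size == 0 || (w->flags & access) == 0)
        return 0;
    if (addr >= w->vaddr)
        return addr - w->vaddr < w->size;   /* access starts inside the range */
    return w->vaddr - addr < len;           /* access starts below and runs into it */
}
```

For a 16-byte write watch at 0x1000, a 4-byte write at 0x1008 hits, a 4-byte read at the same address does not, and an 8-byte write starting at 0x0ffc hits because its last four bytes fall inside the range.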
In some cases, the extraction of process information and process control requires
the controlling process to have the target process perform specific instructions on