Figure 10.8 door_call() Flow with Shuttle Switching

• The kernel door_return() code copies the return data back to the caller
and places the server thread back in the door server pool. The calling (client)
thread, which we left in a sleep state back in door_call(), is set back to a
T_ONPROC state, and the shuttle code (shuttle_resume()) is called to give
the processor back to the caller and have it resume execution.
A few final points are worth making regarding doors. There is a fair amount of
code in the kernel doorfs module designed to deal with error conditions and the
premature termination of the calling thread or server thread. In general, if the
calling thread is awakened early, that is, before door_call() has completed, the
code determines why the wakeup occurred (signal, exit call, etc.) and sends a
cancel signal (SIGCANCEL) to the server thread. If a server thread is interrupted
because of a signal, exit, error condition, etc., the door_call() code bails out. In
the client, an EINTR (interrupted system call) error is set, signifying that
door_call() terminated prematurely.
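
Client code should therefore expect door_call(3DOOR) to fail with EINTR. The
fragment below is a minimal sketch of that pattern, assuming a server has
already created a door and fattach(3C)ed it at the hypothetical path
/tmp/mydoor; whether simply retrying an interrupted call is safe depends on the
server's semantics (on older releases, compile with -ldoor).

    #include <door.h>
    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>

    int
    main(void)
    {
            door_arg_t arg;
            char rbuf[128];
            /* Hypothetical door path; the server must have fattach()ed it. */
            int did = open("/tmp/mydoor", O_RDONLY);

            if (did == -1) {
                    perror("open");
                    return (1);
            }
            (void) memset(&arg, 0, sizeof (arg));
            arg.rbuf = rbuf;                /* server results are copied here */
            arg.rsize = sizeof (rbuf);

            /* door_call() returns -1 with errno set to EINTR if the client
             * thread was awakened before the call completed. */
            while (door_call(did, &arg) == -1) {
                    if (errno != EINTR) {
                            perror("door_call");
                            return (1);
                    }
            }
            return (0);
    }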

Part Four
FILES AND FILE SYSTEMS







Files and File I/O
File System Overview
File System Framework
The UFS File System
File System Caching


[Figure: The Solaris File I/O System — application file I/O interfaces
(read()/write(), fread()/fwrite() through the stdio libraries,
aioread()/aiowrite() through the async I/O library, mmap(), and direct I/O)
flow through the regular file systems (ufs, vxfs, pcfs, etc.), the special
file system (specfs) for /dev/rdsk/... device files, and the NFS server into
the paged vnode VM core (file system cache and page cache), supported by the
vnode segment driver (seg_vn), the file segment driver (seg_map), the
directory name lookup cache, the metadata (inode) cache, and the old buffer
cache (BUFHWM). Below these sit the block I/O interface, the volume manager
layer (Disk Suite, VxVM), and the SCSI/device driver framework, driving the
data and swap disks.]
11
SOLARIS FILES AND FILE I/O

From its inception, Unix has been built around two fundamental entities: processes and files. Everything that is executed on the system is a process, and all
process I/O is done to a file. We saw in previous chapters how the process model has
evolved and how the kernel thread is the unit of execution in the Solaris kernel.
The implementation of files and file I/O facilities has also seen some changes since
the early versions of UNIX. The notion of a file now includes more abstract types,
and the interfaces available for doing file I/O have expanded.
In this chapter, we look at the implementation of files in Solaris and discuss
some of the abstract file types and the file I/O facilities.

11.1 Files in Solaris
Generically defined, a file is an entity that stores data as an array of bytes, beginning at byte zero and extending to the end of the file. The contents of the file (the
data) can take any number of forms: a simple text file, a binary executable file, a
directory file, etc. Solaris supports many types of files, several of which are defined
at the kernel level, meaning that some component of the kernel has intimate
knowledge of the file’s format by virtue of the file type. An example is a directory
file on a UFS file system—directory files have a specific format that is known to
the UFS kernel routines designed for directory I/O.
The number of file types in the kernel has increased over the last several years
with the addition of new kernel abstractions in the form of pseudofiles. Pseudofiles
provide a means by which the kernel can abstract as a file a binary object, like a
data structure in memory. Users and programmers view the object as a file, in that
the traditional file I/O operations are supported on it (for the most part). It’s a
pseudofile because it is not an on-disk file; it’s not a real file in the traditional
sense.
Under the covers, the operations performed on the object are managed by the
file system on which the file resides. A specific file type often belongs to an underlying file system that manages the storage and retrieval of the file and defines the
kernel functions for I/O and control operations on the file. (See Chapter 14, “The
Unix File System,” for details about file systems.) Table 11-1 lists the various
types of files implemented in Solaris.
Table 11-1 Solaris File Types

File Type          File System  Character    Description
                                Designation
Regular            UFS          -            A traditional on-disk file. Can be a text file, binary shared object, or executable file.
Directory          UFS          d            A file that stores the names of other files and directories. Other file systems can implement directories within their own file hierarchy.
Symbolic Link      UFS          l            A file that represents a link to another file, potentially in another directory or on another file system.
Character Special  specfs       c            A device special file for devices capable of character-mode I/O. Device files represent I/O devices on the system and provide a means of indexing into the device driver and uniquely identifying a specific device.
Block Special      specfs       b            As above, a device special file for devices capable of block-mode I/O, such as disk and tape devices.
Named Pipe (FIFO)  fifofs       p            A file that provides a bidirectional communication path between processes running on the same system.
Door               doorfs       D            Part of the door interprocess communication facility. Doors provide a means of doing very fast interprocess procedure calling and message and data passing.
Socket             sockfs       s            A communication endpoint for network I/O, typically used for TCP or UDP connections between processes on different systems. UNIX domain sockets are also supported for interprocess communication between processes on the same system. The “s” character designation appears only for AF_UNIX sockets.

The character designation column in Table 11-1 refers to the character produced in
the left-hand column of ls -l output. When a long file listing is executed, a
single character designates the type of each file in the listing.
Within a process, a file is identified by a file descriptor: an integer value
returned to the process by the kernel when a file is opened. An exception is made if
the standard I/O interfaces are used. In that case, the file is represented in the
process as a pointer to a FILE structure, and the file descriptor is embedded in the
FILE structure. The file descriptor references an array of per-process file entry
(uf_entry) structures, which form the list of open files within the process. These
per-process file entries link to a file structure, which is a kernel structure that
maintains specific status information about the file on behalf of the process that
has the file opened. If a specific file is opened by multiple processes, the kernel
maintains a file structure for each process; that is, the same file may have multiple file structures referencing it. The primary reason for this behavior is to maintain a per-process read/write file pointer for the file, since different processes may
be reading different segments of the same file.
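
As a simple user-level illustration of the per-open read/write pointer, the
sketch below opens the same file twice; each open(2) allocates its own file
structure, so the two descriptors advance their offsets independently even
though one vnode underlies both.

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int
    main(void)
    {
            char c;
            /* Two opens of one file: one vnode, two file structures. */
            int fd1 = open("/etc/passwd", O_RDONLY);
            int fd2 = open("/etc/passwd", O_RDONLY);

            if (fd1 == -1 || fd2 == -1)
                    return (1);

            (void) read(fd1, &c, 1);        /* advances fd1's f_offset only */
            (void) read(fd1, &c, 1);
            (void) read(fd2, &c, 1);        /* fd2 starts from byte 0 */

            /* Prints 2 and 1: each descriptor has its own f_offset. */
            printf("fd1 offset: %ld, fd2 offset: %ld\n",
                (long)lseek(fd1, 0, SEEK_CUR), (long)lseek(fd2, 0, SEEK_CUR));
            return (0);
    }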
The kernel implements a virtual file abstraction in the form of a vnode, where
every opened file in Solaris is represented by a vnode in the kernel. A given file
has but one vnode that represents it in the kernel, regardless of the number of
processes that have the file opened. The vnode implementation is discussed in
detail in “The vnode” on page 543. In this discussion, we allude to the vnode and
other file-specific structures as needed for clarity.
Beyond the vnode virtual file abstraction, a file-type-specific structure describes
the file. The structure is implemented as part of the file system on which the file
resides. For example, files on the default Unix File System (UFS) are described by
an inode that is linked to the v_data pointer of the vnode.
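
The UFS kernel code recovers the inode from the vnode (and back) by casting
v_data, along the lines of the sketch below; the macro names VTOI and ITOV
appear in the UFS headers, though the exact definitions may vary by release.

    /* Sketch of the UFS vnode-to-inode mapping through v_data. */
    #define VTOI(vp)        ((struct inode *)(vp)->v_data)   /* vnode to inode */
    #define ITOV(ip)        ((struct vnode *)&(ip)->i_vnode) /* inode to vnode */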
Figure 11.1 illustrates the relationships of the various file-related components,
providing a path from the file descriptor to the actual file. The figure shows how a
file is viewed at various levels. Within a process, a file is referenced as a file
descriptor. The file descriptor indexes the per-process u_flist array of uf_entry
structures, which link to the kernel file structure. The file is abstracted in the kernel as a virtual file through the vnode, which links to the file-specific structures
(based on the file type) through the v_data pointer in the vnode.

[Figure: the process view of an opened file — a file descriptor indexing the
per-process u_flist array of uf_entry structures (uf_ofile, uf_pofile,
uf_refcnt) — links through the kernel file structures (f_vnode, f_offset,
f_cred, f_count) to the virtual file abstraction, the kernel vnode (v_flags,
v_type, v_count, v_data), whose v_data pointer leads to the file-type-specific
structures: a UFS inode and its disk blocks, an rnode for a network file, or a
pnode for a kernel object.]

Figure 11.1 File-Related Structures
The process-level uf_entry structures are allocated dynamically in groups of 24
as files are opened, up to the per-process open file limit. The uf_entry structure
contains a pointer to the file structure (uf_ofile) and a uf_pofile flag field
used by the kernel to maintain file state information. The possible flags are FRESERVED, to indicate that the slot has been allocated, FCLOSING, to indicate that a
file-close is in progress, and FCLOSEXEC, a user-settable close-on-exec flag, which
instructs the kernel to close the file descriptor if an exec(2) call is executed.
uf_entry also maintains a reference count in the uf_refcnt member. This count
provides a means of tracking multiple references to the file in multithreaded processes.
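
Of these, the close-on-exec flag is the one applications set directly, with
fcntl(2); a minimal sketch:

    #include <fcntl.h>
    #include <unistd.h>

    int
    main(void)
    {
            int flags;
            int fd = open("/etc/passwd", O_RDONLY);

            if (fd == -1)
                    return (1);

            /* Mark the descriptor close-on-exec; the kernel records this in
             * the uf_entry flags and closes the file across exec(2). */
            flags = fcntl(fd, F_GETFD, 0);
            (void) fcntl(fd, F_SETFD, flags | FD_CLOEXEC);
            return (0);
    }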
The kernel establishes a default hard and soft limit for the number of files a
process can have opened at any time. rlim_fd_max is the hard limit, and
rlim_fd_cur is the current limit (or soft limit). A process can have up to
rlim_fd_cur file descriptors and can increase the number up to rlim_fd_max.
You can set these parameters systemwide by placing entries in the /etc/system
file:
set rlim_fd_max=8192
set rlim_fd_cur=1024


You can alter the per-process limits either directly from the command line with the
limit(1) or ulimit(1) shell commands or programmatically with setrlimit(2).
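
For example, the sketch below uses getrlimit(2) and setrlimit(2) with
RLIMIT_NOFILE, whose soft and hard values are initialized from rlim_fd_cur and
rlim_fd_max, to raise a process's soft limit to its hard limit.

    #include <sys/resource.h>
    #include <stdio.h>

    int
    main(void)
    {
            struct rlimit rl;

            /* Fetch the current soft (rlim_cur) and hard (rlim_max) limits. */
            if (getrlimit(RLIMIT_NOFILE, &rl) == -1)
                    return (1);
            (void) printf("soft: %lu  hard: %lu\n",
                (unsigned long)rl.rlim_cur, (unsigned long)rl.rlim_max);

            rl.rlim_cur = rl.rlim_max;  /* raise soft limit to the hard limit */
            if (setrlimit(RLIMIT_NOFILE, &rl) == -1) {
                    perror("setrlimit");
                    return (1);
            }
            return (0);
    }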
The actual number of open files that a process can maintain is driven largely by
the file APIs used. For 32-bit systems, if the stdio(3S) interfaces are used, the
limit is 256 open files. This limit results from the data type used in the FILE
structure for the actual file descriptor: an unsigned 8-bit value, with a range of
0–255. Thus, the maximum number of file descriptors is limited to 256 for 32-bit
stdio(3S)-based programs. For 64-bit systems (and 64-bit processes), the
stdio(3S) limit is 64 Kbytes.
The select(3C) interface, which provides a mechanism for file polling, imposes
another API limit. select(3C) limits the number of open files to 1024 on 32-bit
systems, with the exception of 32-bit Solaris 7, where select(3C) can poll up to
65,536 file descriptors. If you use file descriptor values greater than 1024
with select(3C) on 32-bit Solaris 7, then you must define FD_SETSIZE in the
program code (see the sketch following Table 11-2). On 64-bit Solaris 7, a 64-bit
process has a default file descriptor set size (FD_SETSIZE) of 64 Kbytes.
Table 11-2 summarizes file descriptor limitations.
Table 11-2 File Descriptor Limits

Interface (API)  Limit         Notes
stdio(3S)        256           All 32-bit systems.
stdio(3S)        64K (65536)   64-bit programs only (Solaris 7 and later).
select(3C)       1K (1024)     All 32-bit systems. Default value for 32-bit Solaris 7.
select(3C)       64K (65536)   Attainable value on 32-bit Solaris 7. Requires adding "#define FD_SETSIZE 65536" to program code before inclusion of additional system header files.
select(3C)       64K (65536)   Default for 64-bit Solaris 7 (and beyond).
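
The sketch below illustrates the FD_SETSIZE technique noted in the table for
32-bit Solaris 7; the define must precede the inclusion of the system headers
that size the fd_set type.

    /* Must precede the system headers so fd_set is sized accordingly. */
    #define FD_SETSIZE      65536

    #include <sys/types.h>
    #include <sys/time.h>
    #include <unistd.h>

    int
    main(void)
    {
            fd_set readfds;

            FD_ZERO(&readfds);
            FD_SET(0, &readfds);            /* watch stdin for readability */

            /* Block until stdin is readable; nfds is highest fd plus one. */
            if (select(1, &readfds, NULL, NULL, NULL) == -1)
                    return (1);
            return (0);
    }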

Those limitations aside, there remain only the practical limits that govern the
number of files that can be opened on a per-process and systemwide basis. From a
per-process perspective, the practical limit really comes down to two things: how
the application software is designed, and what constitutes a manageable number of
file descriptors within a single process, such that the maintenance, performance,
portability, and availability requirements of the software can be met. The file
descriptors and uf_entry structures do not require a significant amount of memory, even in large numbers, so per-process address space limitations are typically not an issue when it comes to the number of open files.


11.1.1 Kernel File Structures
The Solaris kernel does not implement a system file table in the traditional sense.
That is, the systemwide list of file structures is not maintained in an array or as a
linked list. A kernel object cache segment is allocated to hold file structures, and
they are simply allocated and linked to the process and vnode as files are created
and opened.
We can see in Figure 11.1 that each process uses file descriptors to reference a
file. The file descriptors ultimately link to the kernel file structure, defined as a
file_t data type, shown below.
typedef struct file {
        kmutex_t        f_tlock;        /* short-term lock */
        ushort_t        f_flag;         /* file open flags */
        ushort_t        f_pad;          /* Explicit pad to 4-byte boundary */
        struct vnode    *f_vnode;       /* pointer to vnode structure */
        offset_t        f_offset;       /* read/write character pointer */
        struct cred     *f_cred;        /* credentials of user who opened it */
        caddr_t         f_audit_data;   /* file audit data */
        int             f_count;        /* reference count */
} file_t;

Header File <sys/file.h>
The fields maintained in the file structure are, for the most part, self-explanatory.
The f_tlock kernel mutex lock protects the various structure members. These
include the f_count reference count, which tracks how many threads have the file
opened, and the f_flag file flags, described in “File Open Modes and File Descriptor Flags” on page 495.
Solaris allocates file structures for opened files as needed, growing the open file
count dynamically to meet the requirements of the system load. Therefore, the
maximum number of files that can be opened systemwide at any time is limited by
available kernel address space, and nothing more. The actual size to which the
kernel address space can grow depends on the hardware architecture of the system and the
Solaris version the system is running. The key point is that a fixed kernel limit on
a maximum number of file structures does not exist.
The system initializes space for file structures during startup by calling
file_cache(), a routine in the kernel memory allocator code that creates a kernel object cache. The initial allocation simply sets up the file_cache pointer with
space for one file structure. However, the kernel will have allocated several file
structures by the time the system has completed the boot process and is available
for users, as all of the system processes that get started have some opened files. As
files are opened/created, the system either reuses a freed cache object for the file

Files in Solaris

487

entry or creates a new one if needed. You can use /etc/crash as root to examine
the file structures.
# crash
dumpfile = /dev/mem, namelist = /dev/ksyms, outfile = stdout
> file
ADDRESS          RCNT  TYPE/ADDR         OFFSET  FLAGS
3000009e008      1     FIFO/300009027e0  0       read write
3000009e040      1     UFS /3000117dc68  535     write appen
3000009e078      1     SPEC/300008ed698  3216    write appen
3000009e0b0      1     UFS /300010d8c98  0       write
3000009e0e8      1     UFS /30001047ca0  4       read write
3000009e120      2     DOOR/30000929348  0       read write
3000009e158      1     SPEC/30000fb45d0  0       read
3000009e1c8      1     UFS /300014c6c98  106     read write
3000009e200      1     SPEC/30000c376a0  0       write
3000009e238      2     DOOR/30000929298  0       read write
3000009e270      3     SPEC/300008ecf18  0       read
3000009e2a8      1     UFS /30000f5e0f0  0       read
3000009e2e0      1     SPEC/30000fb46c0  0       read write
3000009e318      1     UFS /300001f9dd0  0       read
3000009e350      1     FIFO/30000902c80  0       read write

The ADDRESS column is the kernel virtual memory address of the file structure.
RCNT is the reference count field (f_count). TYPE is the type of file, and ADDR is
the kernel virtual address of the vnode. OFFSET is the current file pointer, and
FLAGS are the flags bits currently set for the file.
You can use sar(1M) for a quick look at how many files are opened systemwide.
# sar -v 3 3
SunOS devhome 5.7 Generic sun4u    08/01/99

11:38:09  proc-sz    ov  inod-sz      ov  file-sz  ov  lock-sz
11:38:12  100/5930   0   37181/37181  0   603/603  0   0/0
11:38:15  100/5930   0   37181/37181  0   603/603  0   0/0
11:38:18  101/5930   0   37181/37181  0   607/607  0   0/0

This example shows 603 opened files. The format of the sar output is a holdover
from the early days of static tables, which is why it is displayed as 603/603. Originally, the value on the left represented the current number of occupied table slots,
and the value on the right represented the maximum number of slots. Since file
structure allocation is completely dynamic in nature, both values will always be
the same.