Figure 15.4 In-Memory Inodes (Referred to as the “Inode Cache”)

Inode Caches

looking at the buf_inuse and buf_total parameters in the inode kernel memory statistics.
# sar -v 3 3

SunOS devhome 5.7 Generic sun4u    08/01/99

11:38:09  proc-sz    ov  inod-sz      ov  file-sz  ov  lock-sz
11:38:12  100/5930    0  37181/37181   0  603/603   0  0/0
11:38:15  100/5930    0  37181/37181   0  603/603   0  0/0
11:38:18  101/5930    0  37181/37181   0  607/607   0  0/0

# netstat -k ufs_inode_cache
ufs_inode_cache:
buf_size 440 align 8 chunk_size 440 slab_size 8192 alloc 1221573 alloc_fail 0
free 1188468 depot_alloc 19957 depot_free 21230 depot_contention 18 global_alloc 48330
global_free 7823 buf_constructed 3325 buf_avail 3678 buf_inuse 37182
buf_total 40860 buf_max 40860 slab_create 2270 slab_destroy 0 memory_class 0
hash_size 0 hash_lookup_depth 0 hash_rescale 0 full_magazines 219
empty_magazines 332 magazine_size 15 alloc_from_cpu0 579706 free_to_cpu0 588106
buf_avail_cpu0 15 alloc_from_cpu1 573580 free_to_cpu1 571309 buf_avail_cpu1 25

The buf_inuse field of the inode memory statistics shows how many inodes are
allocated. The buf_size field shows that each UFS inode is 440 bytes. We can use
this value to calculate the amount of kernel memory required for the desired
number of inodes when setting ufs_ninode and the directory name cache size.
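As a worked example of that calculation, the short Python sketch below multiplies the 440-byte inode size by the buf_inuse and buf_total counts from the netstat -k output above. The helper names are illustrative, not part of any Solaris tool, and the result counts only the inode structures themselves, not slab overhead or the file data pages cached for those inodes.

```python
# Values copied from the netstat -k ufs_inode_cache output above.
BUF_SIZE = 440      # bytes per in-memory UFS inode (buf_size)
BUF_INUSE = 37182   # inodes currently allocated (buf_inuse)
BUF_TOTAL = 40860   # inode buffers held by the cache (buf_total)


def inode_cache_bytes(n_inodes: int, inode_size: int = BUF_SIZE) -> int:
    """Kernel memory consumed by n_inodes in-memory inode structures."""
    return n_inodes * inode_size


def to_megabytes(nbytes: int) -> float:
    """Convert bytes to megabytes (1 MB = 1024 * 1024 bytes)."""
    return nbytes / (1024 * 1024)


if __name__ == "__main__":
    for label, count in (("in use", BUF_INUSE), ("total", BUF_TOTAL)):
        nbytes = inode_cache_bytes(count)
        print(f"{label}: {count} inodes x {BUF_SIZE} bytes = "
              f"{nbytes} bytes ({to_megabytes(nbytes):.1f} MB)")
```

For the sample output, the 37,182 inodes in use occupy 16,360,080 bytes (about 15.6 MB). To size a proposed ufs_ninode, pass the planned value instead; inode_cache_bytes(60000), for instance, is 26,400,000 bytes.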
The ufs_ninode parameter controls the size of the hash table that is used to
look up inodes and indirectly sizes the inode idle queue (ufs_ninode / 4). The
inode hash table is ideally sized to match the total number of inodes expected to be
in memory—a number that is influenced by the size of the directory name cache.
By default, ufs_ninode is set to the size of the directory name cache, which provides approximately the correct size for the inode hash table. In an ideal world, we
could set ufs_ninode to four-thirds the size of the DNLC, to take into account the
size of the idle queue, but practice has shown this to be unnecessary.
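One way to read the four-thirds figure (our interpretation): if the cache must hold the ncsize inodes referenced by the DNLC plus an idle queue of one quarter of the cache, then ufs_ninode = ncsize + ufs_ninode / 4, which solves to ufs_ninode = (4/3) × ncsize. The sketch below computes both sizings, taking the default ncsize formulas from the ncsize entry in Table A-5; the maxusers value in the usage lines is hypothetical.

```python
def default_ncsize(maxusers: int, release: str) -> int:
    """Default DNLC size, using the ncsize formulas from Table A-5."""
    if release == "2.5.1":
        return 17 * maxusers + 90
    if release in ("2.6", "7"):
        return 68 * maxusers + 360
    raise ValueError(f"no ncsize default listed for Solaris {release}")


def ufs_ninode_for(ncsize: int, allow_for_idle_queue: bool = False) -> int:
    """Inode cache size to pair with a DNLC of ncsize entries.

    By default ufs_ninode simply equals ncsize. With allow_for_idle_queue,
    solve ufs_ninode = ncsize + ufs_ninode / 4, i.e. 4/3 * ncsize, rounded up.
    """
    if not allow_for_idle_queue:
        return ncsize
    return (4 * ncsize + 2) // 3  # integer ceiling of 4 * ncsize / 3


if __name__ == "__main__":
    ncsize = default_ncsize(maxusers=256, release="7")  # hypothetical maxusers
    print(ncsize, ufs_ninode_for(ncsize), ufs_ninode_for(ncsize, True))
```

With maxusers = 256 on Solaris 7, the default ncsize is 17,768; the four-thirds sizing would raise ufs_ninode to 23,691.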
We typically set ufs_ninode indirectly by setting the directory name cache size
(ncsize) to the expected number of files accessed concurrently, but it is possible to
set ufs_ninode separately in /etc/system.
* Set number of inodes stored in UFS inode cache
*
set ufs_ninode = new_value

15.7.2 VxFS Inode Cache
The Veritas file system uses a similar parameter, vxfs_ninode, to control the size
of the inode cache. It also attempts to keep one-fourth of vxfs_ninode (that is,
vxfs_ninode / 4 inodes) on the inode idle queue.
* Set number of inodes stored in VxFS inode cache
*
set vxfs:vxfs_ninode = new_value

A Kernel Tunables, Switches, and Limits

In this appendix, we provide several tables showing the various kernel settable
parameters. The variables listed here do not represent every kernel variable that
can be altered. Almost any kernel variable that is visible to the kernel linker can
be altered with an entry in the /etc/system file or with a debugger like adb(1).
Certainly, it was never intended that each and every kernel variable, along with its
meaning and possible values, be documented as user settable. Most were never
intended for public use, but rather exist for debugging or experimentation.
Here we list what we consider the mainstream variables—those that have always
been intended to be user settable—and several that are not as well known but that
have proved to be useful for some installations.

A.1 Setting Kernel Parameters
You set kernel tunable parameters by adding an entry to the
/etc/system file, in one of two forms:
set parameter = value
or
set kernel_module:parameter = value
The second form applies to kernel variables that are part of a loadable
kernel module; the module name is separated from the variable name by a colon.
The values in the /etc/system file are read at boot time, so
any changes made to this file require a reboot to take effect.
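Because both entry forms are regular, checking a proposed /etc/system before the required reboot is easy to script. The following Python sketch extracts the set entries; the SystemSetting and parse_system_file names are hypothetical. It treats lines beginning with * as comments, as in the examples in this book, and silently skips any other directive.

```python
import re
from typing import List, NamedTuple, Optional


class SystemSetting(NamedTuple):
    module: Optional[str]  # None for variables in the core kernel
    parameter: str
    value: str             # kept as text, e.g. "20000" or "0x80000000"


# Matches "set parameter = value" and "set module:parameter = value".
_SET_LINE = re.compile(
    r"^set\s+(?:(?P<module>\w+):)?(?P<param>\w+)\s*=\s*(?P<value>\S+)$"
)


def parse_system_file(text: str) -> List[SystemSetting]:
    """Extract set entries from /etc/system-style text."""
    settings = []
    for line in text.splitlines():
        stripped = line.strip()
        # Blank lines and '*' comment lines carry no settings.
        if not stripped or stripped.startswith("*"):
            continue
        match = _SET_LINE.match(stripped)
        if match:
            settings.append(
                SystemSetting(match["module"], match["param"], match["value"])
            )
    return settings


if __name__ == "__main__":
    sample = """\
* Set number of inodes stored in UFS inode cache
*
set ufs_ninode = 20000
set vxfs:vxfs_ninode = 20000
"""
    for setting in parse_system_file(sample):
        print(setting)
```

The value 20000 in the sample is only a placeholder. The module field is None for the first entry and "vxfs" for the second, which mirrors the two entry forms described above.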
These settable kernel variables are traditionally referred to as kernel tunable
parameters. The settable kernel variables can be more accurately categorized into
one of three groups:
• Switches: Kernel parameters that simply turn a particular behavior or functional component on or off, which, of course, affects system
behavior and performance. An example of a switch is the priority_paging
parameter, which is either on (value of 1) or off (value of 0).
• Limits: Kernel variables that impose hard limits on a particular
resource. The System V IPC tunables fall into the limit category, as do several others.
• Tunables: Kernel variables that alter performance or behavior. Think of these as a tuning knob that has a range of values (0 to N, where
N represents the maximum allowable value).
Kernel parameters can be further divided into those parameters that are set on
typical installations and impose minimal risk, and those that are less well known
and not well understood. Changing the value of any kernel parameter imposes
some level of risk. However, many of the kernel limit parameters, such as those set
for System V IPC resources, are set on many installations and are generally well
understood. Others can alter system behavior and performance, and sometimes it
is not easy (or even possible) to predict which direction performance will move in
(better or worse) as a result of changing a particular value.
In the tables that follow, we list the various kernel settable parameters, indicating their category (switch, limit, tunable) and whether or not we believe that the
parameter is something that may impact system behavior in an unpredictable way,
where such a warning is applicable. We also provide a reference to the page number in the book where more information about the kernel variable can be found.
As a practice, you should never change a kernel settable parameter in a
production system without first trying the value in a lab environment and
then testing extensively.

A.2 System V IPC - Shared Memory Parameters
Table A-1 describes shared memory parameters. For more information, refer to
“System V Shared Memory” on page 433.

Table A-1 System V IPC - Shared Memory

| Parameter | Default | Category | Description / Notes |
|---|---|---|---|
| shmmax | 1048576 | Limit | System V IPC shared memory. Maximum shared memory segment size, in bytes. |
| shmmin | 1 | Limit | System V IPC shared memory. Minimum shared memory segment size, in bytes. |
| shmmni | 100 | Limit | System V IPC shared memory. Maximum number of shared memory segments, systemwide. |
| shmseg | 6 | Limit | System V IPC shared memory. Maximum number of shared segments, per process. |
| segspt_minfree | 5% of available memory | Limit | Number of pages of physical memory not available for allocation as ISM shared segments. The default value allows up to 95% of available memory to be allocated to ISM shared segments. |
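To show how the Table A-1 limits interact, the Python sketch below reports which limits a request for one more segment would exceed, and how many pages remain available for ISM once segspt_minfree is held back. The function names are hypothetical, and the sketch simply applies the table's rules; it does not reproduce the kernel's exact checks or error codes.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ShmLimits:
    shmmax: int = 1048576  # maximum segment size, bytes
    shmmin: int = 1        # minimum segment size, bytes
    shmmni: int = 100      # maximum segments, systemwide
    shmseg: int = 6        # maximum segments per process


def violated_limits(size: int, segments_in_system: int,
                    segments_in_process: int,
                    limits: ShmLimits = ShmLimits()) -> List[str]:
    """Names of the limits that one more segment of `size` bytes would break."""
    problems = []
    if not limits.shmmin <= size <= limits.shmmax:
        problems.append("shmmin/shmmax")
    if segments_in_system + 1 > limits.shmmni:
        problems.append("shmmni")
    if segments_in_process + 1 > limits.shmseg:
        problems.append("shmseg")
    return problems


def ism_page_limit(available_pages: int, minfree_percent: int = 5) -> int:
    """Pages that may back ISM segments when minfree_percent is held back."""
    return available_pages - available_pages * minfree_percent // 100
```

With the defaults, a request for a 512 MB segment fails the size check: violated_limits(512 * 1024 * 1024, 0, 0) returns ["shmmin/shmmax"]. This is why shmmax is almost always raised on database servers.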

Table A-2 lists System V IPC semaphores. For more information, refer to “System
V Semaphores” on page 444.
Table A-2 System V IPC - Semaphores

| Parameter | Default | Category | Description / Notes |
|---|---|---|---|
| semmap | 10 | Limit | Size of the semaphore map. |
| semmni | 10 | Limit | Maximum number of semaphore identifiers, systemwide. |
| semmns | 60 | Limit | Maximum number of semaphores, systemwide. Should be the product of semmni and semmsl. |
| semmnu | 30 | Limit | Maximum number of semaphore undo structures, systemwide. |
| semmsl | 25 | Limit | Maximum number of semaphores per semaphore ID. |
| semopm | 10 | Limit | Maximum number of semaphore operations per semop() call. |
| semume | 10 | Limit | Maximum per-process undo structures. |
| semvmx | 32767 | Limit | Maximum value of a semaphore. |
| semaem | 16384 | Limit | Maximum adjust-on-exit value. |
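The semmns rule in Table A-2 is simple arithmetic, but the listed defaults do not satisfy it. The Python sketch below checks the rule and counts how many full-size semaphore sets fit. The helper names are hypothetical.

```python
def semmns_for(semmni: int, semmsl: int) -> int:
    """Systemwide semaphores needed for semmni sets of semmsl semaphores."""
    return semmni * semmsl


def full_sets_available(semmns: int, semmsl: int) -> int:
    """How many maximum-size semaphore sets fit within semmns."""
    return semmns // semmsl


if __name__ == "__main__":
    semmni, semmsl, semmns = 10, 25, 60  # defaults from Table A-2
    print(semmns_for(semmni, semmsl))           # 250
    print(full_sets_available(semmns, semmsl))  # 2
```

At the defaults, the product rule calls for 250 semaphores, but semmns is only 60, so only two identifiers could hold a full set of 25 at once. When you raise semmni or semmsl, raise semmns to match.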

Table A-3 describes message queues. For more information, refer to “System V
Message Queues” on page 451.

Table A-3 System V IPC - Message Queues

| Parameter | Default | Category | Description/Notes |
|---|---|---|---|
| msgmap | 100 | Limit | Maximum size of resource map for messages. |
| msgmax | 2048 | Limit | Maximum size, in bytes, of a message. |
| msgmnb | 4096 | Limit | Maximum number of bytes on a message queue. |
| msgmni | 50 | Limit | Maximum number of message queue identifiers, systemwide. |
| msgssz | 8 | Limit | Message segment size. |
| msgtql | 40 | Limit | Maximum number of message headers, systemwide. |
| msgseg | 1024 | Limit | Maximum number of message segments. |
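The following Python sketch shows how msgssz and msgseg combine. It assumes, as in the classic System V implementation, that message text is drawn from a single systemwide pool of msgseg segments of msgssz bytes each. Table A-3 does not state this outright, and the function names are hypothetical.

```python
def message_pool_bytes(msgssz: int = 8, msgseg: int = 1024) -> int:
    """Total message text space: msgseg segments of msgssz bytes each."""
    return msgssz * msgseg


def segments_for_message(nbytes: int, msgssz: int = 8) -> int:
    """Segments one message occupies, rounded up to whole segments."""
    return -(-nbytes // msgssz)  # ceiling division on integers


if __name__ == "__main__":
    pool = message_pool_bytes()            # 8192 bytes with the defaults
    per_max = segments_for_message(2048)   # a msgmax-sized message
    print(pool, per_max, 1024 // per_max)  # 8192 256 4
```

Under that assumption, the defaults leave room for only four maximum-size (msgmax) messages in the whole system. Applications that queue many messages therefore usually need msgseg (and often msgssz) raised along with msgmax and msgmnb.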

A.3 Virtual Memory Parameters
Table A-4 lists parameters that relate to the virtual memory system and paging
activity. Such activity is closely tied to file system I/O because of the buffer caching done by file systems such as UFS and VxFS.
You can read more about the memory paging parameters in “Summary of Page
Scanner Parameters” on page 186, “Solaris File System Cache” on page 601, “Page
Cache and Virtual Memory System” on page 605, and in the discussion on page 612
of the strong relationship between the VM system and file system behavior. That
discussion explains that the paging parameters have the most influence on file
system performance, since they govern the way pages are made available to the file
system, and Figure 15.3 there depicts the paging parameters that come into play as
the amount of free memory falls.
Table A-4 Virtual Memory

| Parameter | Default | Category | Description/Notes |
|---|---|---|---|
| fastscan | 1/4th of physical memory, or 64 MB, whichever is larger | Tunable | The maximum number of pages per second the page scanner will scan. |
| slowscan | 100 | Tunable | Initial page scan rate, in pages per second. |
| lotsfree | 1/64th of physical memory, or 512 KB, whichever is larger | Tunable | Desired size of the memory free list (number of free pages). When the free list drops below lotsfree, the page scanner runs. |
| desfree | lotsfree / 2 | Tunable | Free memory desperation threshold. When freemem drops below desfree, the page scan rate increases and the system alters its default behavior for specific events. desfree must be less than lotsfree. |
| minfree | desfree / 2 | Tunable | Minimum acceptable amount of free memory. minfree must be less than desfree. |
| throttlefree | minfree | Tunable | Memory threshold at which the kernel blocks memory allocation requests. Must not exceed minfree. |
| pageout_reserve | throttlefree / 2 | Tunable | Memory pages reserved for the pageout and memory scheduler threads. When freemem drops below pageout_reserve, memory allocations are denied for anything other than pageout and sched. |
| priority_paging | 0 | Switch | Enables priority paging when set to 1. Priority paging relieves memory pressure on executable pages due to cached file system activity. Priority paging is available for Solaris 2.6 with kernel jumbo patch 105181-10 or later, and is included in Solaris 7. |
| cachefree | lotsfree if priority_paging is 0; lotsfree * 2 if priority_paging is 1 | Tunable | The memory page threshold that triggers priority paging behavior, in which file system cache pages are marked for pageout only as long as freemem is below cachefree but above lotsfree. With priority paging enabled, cachefree must be greater than lotsfree. |
| pages_pp_maximum | 200, tune_t_minarmem, or 10% of available memory, whichever is greater | Limit | Number of pages the system requires to remain unlocked. |
| tune_t_minarmem | 25 | Limit | Minimum number of memory pages reserved for the kernel. A safeguard to ensure that a minimum amount of nonswappable memory is available to the kernel. |
| min_percent_cpu | 4 | Tunable | Minimum percentage of CPU time pageout can consume. |
| handspreadpages | fastscan | Tunable | Number of pages between the first and second hand of the page scanner. |
| pages_before_pager | 200 | Tunable | Used in conjunction with lotsfree to establish the point at which the kernel frees file system pages after an I/O. If available memory is less than lotsfree + pages_before_pager, the kernel frees pages after an I/O (rather than keeping them in the page cache for reuse). |
| maxpgio | 40 | Tunable | Maximum number of pageout operations per second the kernel will schedule. Set to 100 times the number of disks with swap files or swap partitions. |
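Most of the Table A-4 defaults derive from one another, starting with physical memory. The Python sketch below applies the table's rules literally to show the resulting cascade of thresholds in pages. It assumes 8-Kbyte pages, as on UltraSPARC, and ignores any adjustments the kernel may make at boot. The PagingDefaults and paging_defaults names are hypothetical.

```python
from dataclasses import dataclass

PAGESIZE = 8192  # assumption: 8-Kbyte pages, as on UltraSPARC


@dataclass(frozen=True)
class PagingDefaults:
    fastscan: int
    handspreadpages: int
    lotsfree: int
    desfree: int
    minfree: int
    throttlefree: int
    pageout_reserve: int
    cachefree: int
    free_behind_below: int  # lotsfree + pages_before_pager


def paging_defaults(physmem_pages: int, priority_paging: bool = False,
                    pages_before_pager: int = 200,
                    pagesize: int = PAGESIZE) -> PagingDefaults:
    """Derive the Table A-4 defaults (all in pages) from physical memory."""
    fastscan = max(physmem_pages // 4, (64 * 1024 * 1024) // pagesize)
    lotsfree = max(physmem_pages // 64, (512 * 1024) // pagesize)
    desfree = lotsfree // 2
    minfree = desfree // 2
    throttlefree = minfree
    pageout_reserve = throttlefree // 2
    cachefree = lotsfree * 2 if priority_paging else lotsfree
    return PagingDefaults(
        fastscan=fastscan,
        handspreadpages=fastscan,
        lotsfree=lotsfree,
        desfree=desfree,
        minfree=minfree,
        throttlefree=throttlefree,
        pageout_reserve=pageout_reserve,
        cachefree=cachefree,
        free_behind_below=lotsfree + pages_before_pager,
    )


if __name__ == "__main__":
    # 1 GB of memory with 8-Kbyte pages is 131072 pages.
    print(paging_defaults(131072, priority_paging=True))
```

For 1 GB with priority paging enabled, the sketch yields lotsfree = 2048, desfree = 1024, minfree = throttlefree = 512, pageout_reserve = 256, and cachefree = 4096 pages. These values make the ordering cachefree > lotsfree > desfree > minfree easy to see.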

A.4 File System Parameters
The file system and page flushing parameters in Table A-5 let you tune file
system performance and control the flushing of dirty pages from memory to disk.
You can read more about fsflush and its associated parameters in “Bypassing
the Page Cache with Direct I/O” on page 614. “Directory Name Cache” on page 615
and “Inode Caches” on page 617 have more information on the parameters for the directory and inode caches.
Table A-5 File System and Page Flushing Parameters

| Parameter | Default | Category | Description/Notes |
|---|---|---|---|
| tune_t_fsflushr | 5 | Tunable | fsflush interval; the fsflush daemon runs every tune_t_fsflushr seconds. |
| autoup | 30 | Tunable | Age, in seconds, of dirty pages. Used in conjunction with tune_t_fsflushr; modified pages that are older than autoup are written to disk. |
| dopageflush | 1 | Switch | When set, enables dirty page flushing by fsflush. Can be set to zero to disable page flushing. |
| doiflush | 1 | Switch | Flag to control flushing of the inode cache during fsflush syncs. Set to 0 to disable inode cache flushing. |
| ncsize | (17 * maxusers) + 90 on Solaris 2.5.1; (68 * maxusers) + 360 on Solaris 2.6 and 7 | Limit | Size of the directory name lookup cache (DNLC), a kernel cache that caches path names for vnodes for UFS and NFS files. |
| bufhwm | 2% of physical memory | Limit | Maximum amount of memory (in Kbytes) allocated to the I/O buffer cache, which caches file system inodes, superblocks, indirect blocks, and directories. |
| ndquot | ((maxusers * 40) / 4) + max_nprocs | Limit | Number of UFS quota structures to allocate. Applies only if quotas are enabled for UFS. |
| maxphys | 126976 (sun4m and sun4d), 131072 (sun4u), 57344 (x86) | Limit | Maximum physical I/O size, in bytes. For some devices, the maximum physical I/O size is set dynamically when the driver loads. |
| ufs_ninode | ncsize | Limit | Number of inodes to cache in memory. |
| ufs:ufs_WRITES | 1 | Switch | Enables the UFS per-file write throttle. See ufs:ufs_HW. |
| ufs:ufs_LW | 256 Kbytes | Tunable | UFS write throttle low watermark. See ufs:ufs_HW. |
| ufs:ufs_HW | 384 Kbytes | Tunable | UFS write throttle high watermark. If the number of outstanding bytes to be written to a file exceeds ufs_HW, writes are deferred until ufs_LW or fewer bytes are pending. |
| nrnode | ncsize | Limit | Maximum number of rnodes allocated. rnodes apply to NFS files and are the NFS equivalent of a UFS inode. |
| tmpfs_maxkmem | Set dynamically when tmpfs is first used | Limit | Maximum amount of kernel memory for tmpfs data structures. The value is set the first time tmpfs is used, to a value between the memory page size of the platform and 25% of the available kernel memory. |
| tmpfs_minfree | 256 pages | Limit | Minimum amount of swap space tmpfs will leave for non-tmpfs use (i.e., the rest of the system). |
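The ufs_HW and ufs_LW pair forms a hysteresis band. Once a file has more than ufs_HW bytes outstanding, writers are held off until the backlog drains to ufs_LW or less, rather than being released and blocked again on every I/O completion (our reading of the design). The Python sketch below is a toy model of that watermark rule only. The WriteThrottle class is hypothetical and does not reflect how the kernel actually blocks or wakes threads.

```python
UFS_HW = 384 * 1024  # default ufs:ufs_HW, bytes
UFS_LW = 256 * 1024  # default ufs:ufs_LW, bytes


class WriteThrottle:
    """Toy model of the per-file UFS write throttle watermark rule."""

    def __init__(self, hw: int = UFS_HW, lw: int = UFS_LW) -> None:
        if lw >= hw:
            raise ValueError("ufs_LW must be below ufs_HW")
        self.hw = hw
        self.lw = lw
        self.pending = 0      # bytes written to the file but not yet on disk
        self.blocked = False  # True while new writes are deferred

    def write(self, nbytes: int) -> bool:
        """Try to queue a write; False means the writer must wait."""
        if self.blocked:
            return False
        self.pending += nbytes
        if self.pending > self.hw:
            self.blocked = True  # over the high watermark: defer later writes
        return True

    def io_done(self, nbytes: int) -> None:
        """Account for completed disk I/O; release writers at or below ufs_LW."""
        self.pending = max(0, self.pending - nbytes)
        if self.blocked and self.pending <= self.lw:
            self.blocked = False


if __name__ == "__main__":
    t = WriteThrottle()
    print(t.write(300 * 1024), t.blocked)  # True False
    print(t.write(100 * 1024), t.blocked)  # True True (400 KB > 384 KB)
    print(t.write(4096))                   # False: deferred
    t.io_done(150 * 1024)                  # 250 KB pending, below 256 KB
    print(t.blocked)                       # False
```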

Table A-6 lists parameters related to swapfs, the pseudo file system that is a key component in the management of kernel anonymous memory pages. These parameters are generally not changed from their defaults.

Table A-6 Swapfs Parameters

| Parameter | Default | Category | Description/Notes |
|---|---|---|---|
| swapfs_reserve | 4 MB or 1/16th of memory, whichever is smaller | Limit | The amount of swap reserved for system processes, that is, processes owned by root (UID 0). |
| swapfs_minfree | 2 MB or 1/8 of physical memory, whichever is larger | Limit | Amount of memory the kernel keeps available for the rest of the system (all processes). |
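The two swapfs defaults scale in opposite directions: swapfs_reserve is capped at 4 MB, while swapfs_minfree grows with memory. The Python sketch below evaluates both rules as stated in Table A-6. It works in bytes for readability, although the kernel keeps these values in pages, and the function names are hypothetical.

```python
MB = 1024 * 1024


def swapfs_reserve_bytes(memory_bytes: int) -> int:
    """Swap reserved for root processes: 4 MB or 1/16 of memory, smaller wins."""
    return min(4 * MB, memory_bytes // 16)


def swapfs_minfree_bytes(physmem_bytes: int) -> int:
    """Memory kept for the rest of the system: 2 MB or 1/8 of memory, larger wins."""
    return max(2 * MB, physmem_bytes // 8)


if __name__ == "__main__":
    for mem_mb in (32, 1024):
        mem = mem_mb * MB
        print(mem_mb, swapfs_reserve_bytes(mem) // MB,
              swapfs_minfree_bytes(mem) // MB)
```

A 32 MB system reserves 2 MB for root and keeps 4 MB free. A 1 GB system reserves the 4 MB maximum but keeps 128 MB out of reach of swap allocations.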

A.5 Miscellaneous Parameters
Table A-7 lists miscellaneous kernel parameters. You can read more about many of
the tunable parameters in “Kernel Bootstrap and Initialization” on page 107.

Table A-7 Miscellaneous Parameters

| Parameter | Default | Category | Description/Notes |
|---|---|---|---|
| maxusers | MB of RAM | Limit | Generic tunable for sizing various kernel resources. |
| ngroups_max | 16 | Limit | Maximum number of supplementary groups a user can belong to. |
| npty | 48 | Limit | Number of pseudodevices: /dev/pts slave devices and /dev/pty controller devices. |
| pt_cnt | 48 | Limit | Number of pseudodevices: /dev/pts slave devices and /dev/ptm master devices. |
| rstchown | 1 | Switch | Enables POSIX_CHOWN_RESTRICTED behavior. Only a root process can change file ownership, and a process must be a current member of the group to which it wishes to change a file's group, unless it is root. |
| rlim_fd_cur | 64 | Limit | Maximum per-process open files (soft limit). |
| rlim_fd_max | 1024 | Limit | Per-process open files hard limit. rlim_fd_cur can never be larger than rlim_fd_max. |
| physmem | Number of pages of RAM | Limit | Can be set to reduce the effective amount of usable physical memory. Values are in pages. |
| kobj_map_space_len | 1 MB | Limit | Amount of kernel memory allocated to store symbol table information. In Solaris 2.6, it defines the total space for the kernel symbol table. In Solaris 7, space is dynamically allocated as needed, in units of kobj_map_space_len. |
| kmem_flags | 0 | Switch | Solaris 2.6 and later. Enables some level of debugging of kernel memory allocation. Values: 0x1 (AUDIT), maintain an activity audit log; 0x2 (TEST), the allocator tests memory prior to allocation; 0x4 (REDZONE), the allocator adds extra memory to the end of an allocated buffer and tests whether the extra memory was written into when the buffer is freed; 0x8 (CONTENTS), logs up to 256 bytes of buffer contents when the buffer is freed, and requires AUDIT also be set. |
| kmem_debug_enable | 0 | Switch | Kernel memory allocator debug flag. Allows kernel memory allocator (kma) debug information for any or all kmem caches; a value of -1 enables it in all caches. Solaris 2.6 and 7 only; removed in Solaris 7 3/99. |
| moddebug | 0 | Switch | Turns on kernel module debugging messages. The many possible values for moddebug can be found in /usr/include/sys/modctl.h. Some useful values are 0x80000000, print loading/unloading messages, and 0x40000000, print detailed error messages. |
| timer_max | 32 | Limit | Number of POSIX timers (timer_create(2) system call) available. |
| consistent_coloring | 0 | Switch | sun4u (UltraSPARC) only. Establishes the page placement policy for physical pages and L2 cache blocks. Possible values are 0 (page coloring), 1 (virtual address = physical address), and 2 (bin-hopping). |
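Because kmem_flags combines independent debugging options, decoding a value before placing it in /etc/system is a useful sanity check. The Python sketch below assumes kmem_flags is a bitmask in which AUDIT is 0x1, TEST is 0x2, REDZONE is 0x4, and CONTENTS is 0x8. It also enforces the rule that CONTENTS requires AUDIT. The helper names are hypothetical.

```python
from typing import List

# Assumption: kmem_flags bit values as used by the Solaris kmem allocator.
KMEM_FLAGS = {
    0x1: "AUDIT",
    0x2: "TEST",
    0x4: "REDZONE",
    0x8: "CONTENTS",
}
KNOWN_BITS = sum(KMEM_FLAGS)  # 0xf, the OR of all known bits


def decode_kmem_flags(value: int) -> List[str]:
    """Return the names of the known bits set in a kmem_flags value."""
    names = [name for bit, name in KMEM_FLAGS.items() if value & bit]
    extra = value & ~KNOWN_BITS
    if extra:
        names.append(f"unknown(0x{extra:x})")
    return names


def validate_kmem_flags(value: int) -> None:
    """Reject CONTENTS without AUDIT, since CONTENTS requires AUDIT."""
    if value & 0x8 and not value & 0x1:
        raise ValueError("CONTENTS (0x8) requires AUDIT (0x1)")
```

For example, decode_kmem_flags(0x9) returns ["AUDIT", "CONTENTS"], a valid combination, whereas validate_kmem_flags(0x8) raises an error because CONTENTS is requested without AUDIT.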