Chapter 20. More Devices and Drivers
In This Chapter
ECC Reporting
Frequency Scaling
Embedded Controllers
ACPI
ISA and MCA
FireWire
Intelligent Input/Output
Amateur Radio
Voice over IP
High-Speed Interconnects

So far, we have devoted a full chapter to each major device driver class, but there are several
subdirectories under drivers/ that we haven't yet descended into. In this chapter let's venture
inside some of them at a brisk pace.

ECC Reporting
Several memory controllers contain special silicon to measure the fidelity of stored data using error correcting
codes (ECCs). The Error Detection And Correction (EDAC) driver subsystem announces occurrences of memory
error events generated by ECC-aware memory controllers. Typical ECC DRAM chips have the capability to
correct single-bit errors (SBEs) and detect multibit errors (MBEs). In EDAC parlance, the former errors are
correctable errors (CEs), whereas the latter are uncorrectable errors (UEs).
ECC operations are transparent to the operating system. This means that if your DRAM controller supports ECC,
error correction and detection occur silently without operating system participation. EDAC's task is to report
such events and allow users to fashion error-handling policies (such as replacing a suspect DRAM chip).
The EDAC driver subsystem consists of the following:

A core module called edac_mc that provides a set of library routines.

Separate drivers for interacting with supported memory controllers. For example, the driver module that
works with the memory controller that is part of the Intel 82860 North Bridge is called i82860_edac.

EDAC reports errors via files in the sysfs directory, /sys/devices/system/edac/. It also generates messages that
can be gleaned from the kernel error log.
The layout of DRAM chips is specified in terms of the number of chip-selects emanating from the memory
controller and the data-transfer width (channels) between the memory controller and the CPU. The number of
rows in the DRAM chip array depends on the former, whereas the number of columns hinges on the latter. One of
the main aims of EDAC is to point the needle of suspicion at problem DRAM chips, so the EDAC sysfs node
structure is designed according to the physical chip layout: /sys/devices/system/edac/mc/mcX/csrowY/
corresponds to chip-select row Y in memory controller X. Each such directory contains details such as the
number of detected CEs (ce_count), UEs (ue_count), channel location, and other attributes.
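
The per-row counters can be read like any other sysfs file. The following is a minimal user-space sketch, assuming the mcX/csrowY layout described above. The helper names edac_attr_path(), read_counter(), and edac_report_csrow() are made up for this illustration and are not part of EDAC.

```c
#include <stdio.h>

/* Root of the EDAC memory-controller nodes, as described in the text. */
#define EDAC_MC_ROOT "/sys/devices/system/edac/mc"

/*
 * Build the path of one attribute (for example "ce_count") under
 * <root>/mcX/csrowY/. Returns 0 on success, -1 if buf is too small.
 * Hypothetical helper, not an EDAC API.
 */
static int edac_attr_path(char *buf, size_t len, const char *root,
                          unsigned mc, unsigned csrow, const char *attr)
{
    int n = snprintf(buf, len, "%s/mc%u/csrow%u/%s", root, mc, csrow, attr);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Read a single unsigned decimal counter from a file. Returns 0 on success. */
static int read_counter(const char *path, unsigned long *value)
{
    FILE *fp = fopen(path, "r");
    int ok;

    if (!fp)
        return -1;
    ok = (fscanf(fp, "%lu", value) == 1);
    fclose(fp);
    return ok ? 0 : -1;
}

/* Print the CE and UE counts of one chip-select row. Returns 0 on success. */
static int edac_report_csrow(const char *root, unsigned mc, unsigned csrow)
{
    char path[256];
    unsigned long ce, ue;

    if (edac_attr_path(path, sizeof(path), root, mc, csrow, "ce_count") ||
        read_counter(path, &ce))
        return -1;
    if (edac_attr_path(path, sizeof(path), root, mc, csrow, "ue_count") ||
        read_counter(path, &ue))
        return -1;
    printf("mc%u/csrow%u: CE=%lu UE=%lu\n", mc, csrow, ce, ue);
    return 0;
}
```

Calling edac_report_csrow(EDAC_MC_ROOT, 0, 0) on a machine with a loaded EDAC driver would print the counters for csrow0 of mc0. On a machine without EDAC nodes, it returns -1.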

Device Example: ECC-Aware Memory Controller
Let's add EDAC support for a yet-unsupported memory controller. Assume that you're putting Linux onto a
medical grade device that is an embedded x86 derivative. The North Bridge chipset (which includes the memory
controller as discussed in the sidebar "The North Bridge" in Chapter 12, "Video Drivers") on your board is the
Intel 855GME, which is capable of ECC reporting. All DRAM banks connected to the 855GME on this system are
ECC-enabled chips because this is a life-critical device. EDAC does not yet support the 855GME, so let's take a
stab at implementing it.
ECC DRAM controllers have two major ECC-related registers: an error status register and an error address
pointer register, as shown in Table 20.1. When an ECC error occurs, the former contains the status (whether the
error is an SBE or an MBE), whereas the latter contains the physical address where the error occurred. The
EDAC core periodically checks these registers and reports results to user space via sysfs. From a configuration
standpoint, all devices inside the 855GME appear to be on PCI bus 0. The DRAM controller resides on device 0 of
this bus. DRAM interface control registers (including the ECC-specific registers) map into the corresponding PCI
configuration space. To add EDAC support for the 855GME, add hooks to read these registers, as shown in
Listing 20.1. Refer back to Chapter 10, "Peripheral Component Interconnect," for explanations of PCI device
driver methods and data structures.
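
Before writing kernel code, it can be handy to peek at the two registers from user space. The sketch below assumes that the DRAM controller's configuration space is exposed through the sysfs config file of the PCI device (for example, /sys/bus/pci/devices/0000:00:00.0/config, where function 0 is a guess for this board) and reuses the register offsets from Listing 20.1. The helper names are invented for this illustration, and reading past the first 64 bytes of configuration space typically requires root privileges.

```c
#include <stdint.h>
#include <stdio.h>

#define I855_ERRSTS_REGISTER 0x62  /* Offsets taken from Listing 20.1 */
#define I855_EAP_REGISTER    0x98
#define PCI_CFG_SIZE         256   /* Conventional PCI configuration space */

/* Load up to len bytes of a sysfs "config" file. Returns bytes read. */
static size_t pci_cfg_load(const char *path, uint8_t *cfg, size_t len)
{
    FILE *fp = fopen(path, "rb");
    size_t n;

    if (!fp)
        return 0;
    n = fread(cfg, 1, len, fp);
    fclose(fp);
    return n;
}

/* Configuration space is little-endian, so assemble values byte by byte. */
static int pci_cfg_word(const uint8_t *cfg, size_t len, size_t off,
                        uint16_t *out)
{
    if (off + 2 > len)
        return -1;
    *out = (uint16_t)(cfg[off] | (cfg[off + 1] << 8));
    return 0;
}

static int pci_cfg_dword(const uint8_t *cfg, size_t len, size_t off,
                         uint32_t *out)
{
    if (off + 4 > len)
        return -1;
    *out = (uint32_t)cfg[off] | ((uint32_t)cfg[off + 1] << 8) |
           ((uint32_t)cfg[off + 2] << 16) | ((uint32_t)cfg[off + 3] << 24);
    return 0;
}

/* Print the two ECC registers of the device whose config file is at path. */
static int i855_dump_ecc_regs(const char *path)
{
    uint8_t cfg[PCI_CFG_SIZE];
    uint16_t errsts;
    uint32_t eap;
    size_t n = pci_cfg_load(path, cfg, sizeof(cfg));

    if (pci_cfg_word(cfg, n, I855_ERRSTS_REGISTER, &errsts) ||
        pci_cfg_dword(cfg, n, I855_EAP_REGISTER, &eap))
        return -1;
    printf("ERRSTS=0x%04x EAP=0x%08lx\n", (unsigned)errsts,
           (unsigned long)eap);
    return 0;
}
```

The bounds checks matter here: if the file is missing or the read is truncated to the first 64 bytes, both offsets fall outside the buffer and i855_dump_ecc_regs() returns -1 rather than printing stale data.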

Table 20.1. ECC-Related Registers on the DRAM Controller

ECC-Specific Register Residing in the DRAM Controller's PCI Configuration Space | Description
I855_ERRSTS_REGISTER | The error status register, which signals occurrence of an ECC error. Shows whether the error is an SBE or an MBE.
I855_EAP_REGISTER | The error address pointer register, which contains the physical address where the most recent ECC error occurred.

Listing 20.1. An EDAC Driver for the 855GME
/* Based on drivers/edac/i82860_edac.c */
#define I855_PCI_DEVICE_ID   0x3584 /* PCI Device ID of the memory
                                       controller in the 855 GME */
#define I855_ERRSTS_REGISTER 0x62   /* Error Status Register's offset
                                       in the PCI configuration space */
#define I855_EAP_REGISTER    0x98   /* Error Address Pointer Register's
                                       offset in the PCI configuration space */

struct i855_error_info {
  u16 errsts; /* Error Type */
  u32 eap;    /* Error Location */
};

/* Get error information */
static void
i855_get_error_info(struct mem_ctl_info *mci,
                    struct i855_error_info *info)
{
  struct pci_dev *pdev;

  pdev = to_pci_dev(mci->dev);

  /* Read error type */
  pci_read_config_word(pdev, I855_ERRSTS_REGISTER, &info->errsts);

  /* Read error location */
  pci_read_config_dword(pdev, I855_EAP_REGISTER, &info->eap);
}

/* Process errors */
static int
i855_process_error_info(struct mem_ctl_info *mci,
                        struct i855_error_info *info,
                        int handle_errors)
{
  int row;

  info->eap >>= PAGE_SHIFT; /* Convert the error address to a page number */
  row = edac_mc_find_csrow_by_page(mci, info->eap); /* Find culprit row */

  /* Handle using services provided by the EDAC core.
     Populate sysfs, generate error messages, and so on */
  if (is_MBE()) {
    /* is_MBE() looks at I855_ERRSTS_REGISTER and checks
       for an MBE. Implementation not shown */
    edac_mc_handle_ue(mci, info->eap, 0, row, "i855 UE");
  } else if (is_SBE()) {
    /* is_SBE() looks at I855_ERRSTS_REGISTER and checks
       for an SBE. Implementation not shown */
    /* The syndrome argument is 0 because struct i855_error_info
       does not capture one */
    edac_mc_handle_ce(mci, info->eap, 0, 0, row, 0,
                      "i855 CE");
  }
  return 1;
}

/* This method is registered with the EDAC core from i855_probe() */
static void
i855_check(struct mem_ctl_info *mci)
{
  struct i855_error_info info;

  i855_get_error_info(mci, &info);
  i855_process_error_info(mci, &info, 1);
}

/* The PCI driver probe method, part of the pci_driver structure.
   The second argument is the matching entry from the PCI device ID table */
static int
i855_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
{
  struct mem_ctl_info *mci;
  /* ... */
  pci_enable_device(pdev);

  /* Allocate control memory for this memory controller.
     The 3 arguments to edac_mc_alloc() correspond to the
     amount of requested private storage, number of chip-select
     rows, and number of channels in your memory layout */
  mci = edac_mc_alloc(0, CSROWS, CHANNELS);
  /* ... */
  mci->edac_check = i855_check; /* Supply the check method to the
                                   EDAC core */

  /* Do other memory controller initializations */
  /* ... */

  /* Register this memory controller with the EDAC core */
  edac_mc_add_mc(mci, 0);
  /* ... */
}

/* Remove method */
static void __devexit
i855_remove(struct pci_dev *pdev)
{
  struct mem_ctl_info *mci = edac_mc_find_mci_by_pdev(pdev);

  if (mci && !edac_mc_del_mc(mci)) {
    edac_mc_free(mci); /* Free memory for this controller. Reverse
                          of edac_mc_alloc() */
  }
}
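
/* PCI Device ID Table. The vendor and device IDs are spelled out
   because PCI_VEND_DEV() pastes its arguments into
   PCI_DEVICE_ID_* names and so cannot take I855_PCI_DEVICE_ID */
static const struct pci_device_id i855_pci_tbl[] __devinitdata = {
  {PCI_VENDOR_ID_INTEL, I855_PCI_DEVICE_ID,
   PCI_ANY_ID, PCI_ANY_ID, 0, 0,},
  {0,},
};
MODULE_DEVICE_TABLE(pci, i855_pci_tbl);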

/* pci_driver structure for this device.
   Re-visit Chapter 10 for a detailed explanation */
static struct pci_driver i855_driver = {
  .name     = "855",
  .probe    = i855_probe,
  .remove   = __devexit_p(i855_remove),
  .id_table = i855_pci_tbl,
};

/* Driver Initialization */
static int __init
i855_init(void)
{
  /* ... */
  pci_rc = pci_register_driver(&i855_driver);
  /* ... */
}
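
Listing 20.1 leaves is_MBE() and is_SBE() unimplemented. A minimal sketch of such helpers, written so that it can be tried out in user space, might look like the following. Unlike the calls in the listing, these versions take the status word as an argument. The bit positions are placeholders chosen for illustration, not values from the 855GME datasheet, so substitute the real masks before use. The shift of 12 assumes 4KB pages.

```c
#include <stdint.h>

/* HYPOTHETICAL masks: consult the chipset datasheet for the real bits. */
#define I855_ERRSTS_SBE 0x0001u  /* Assumed: single-bit (correctable) error */
#define I855_ERRSTS_MBE 0x0002u  /* Assumed: multibit (uncorrectable) error */
#define I855_PAGE_SHIFT 12       /* Assumed 4KB pages, like PAGE_SHIFT on x86 */

enum i855_err { I855_ERR_NONE, I855_ERR_CE, I855_ERR_UE };

static int is_MBE(uint16_t errsts)
{
    return (errsts & I855_ERRSTS_MBE) != 0;
}

static int is_SBE(uint16_t errsts)
{
    return (errsts & I855_ERRSTS_SBE) != 0;
}

/* Classify a status word, checking the more severe condition first. */
static enum i855_err i855_classify(uint16_t errsts)
{
    if (is_MBE(errsts))
        return I855_ERR_UE;
    if (is_SBE(errsts))
        return I855_ERR_CE;
    return I855_ERR_NONE;
}

/* Convert the error address into a page frame number. */
static uint32_t i855_eap_to_page(uint32_t eap)
{
    return eap >> I855_PAGE_SHIFT;
}
```

Checking for an MBE before an SBE mirrors the order used in the listing, so a status word with both bits set is reported as an uncorrectable error.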

Look at drivers/edac/* for EDAC source files and at Documentation/drivers/edac/edac.txt for detailed semantics
of EDAC sysfs nodes.

Frequency Scaling
The CPU frequency (cpufreq) driver subsystem aids power management by scaling CPU frequencies on the fly.
If you use a suitable scaling algorithm (called a governor), your device's battery can potentially last longer.
Cpufreq supports several architectures such as x86, ARM, and PowerPC. To obtain cpufreq capabilities, you also
need to enable a suitable processor driver (say, the Intel Enhanced SpeedStep driver if you are using a
SpeedStep-enabled CPU such as Pentium M).
You can control cpufreq's behavior via files in the /sys/devices/system/cpu/cpuX/cpufreq/ directory, where X is
the CPU number. To set maximum and minimum frequency scaling limits, write desired values to
scaling_max_freq and scaling_min_freq, respectively. To see a list of supported cpufreq governors, look at
the contents of scaling_available_governors. The kernel supports several governors:

The performance governor statically sets the CPU frequency to scaling_max_freq.

Powersave sets the CPU frequency to scaling_min_freq.

Ondemand adjusts the frequency depending on CPU load.

Conservative is a variant of ondemand where the speed change occurs smoothly in gradual steps.

Userspace lets applications dictate the scaling technique. Some distributions set the governor to userspace
and implement the scaling algorithm via a daemon called cpuspeed, which is spawned during boot.

You may also implement your own kernel governor using the cpufreq_register_governor() interface.
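
To make the difference between the governors concrete, here is a deliberately simplified model of how each might pick the next frequency from a table of available frequencies. This is not the kernel's algorithm: the load thresholds of 80 and 20 percent, the one-step policy, and all names are assumptions made for illustration. The userspace governor is left out because its policy lives in an application.

```c
#include <stddef.h>

enum governor {
    GOV_PERFORMANCE,
    GOV_POWERSAVE,
    GOV_ONDEMAND,
    GOV_CONSERVATIVE
};

#define UP_THRESHOLD   80  /* Assumed: percent load that triggers a speed-up */
#define DOWN_THRESHOLD 20  /* Assumed: percent load that allows a slow-down */

/*
 * Frequencies are indexed from fastest (0) to slowest (n - 1), with n >= 1.
 * cur is the index currently in use. Returns the index to run at next.
 */
static size_t next_freq_index(enum governor gov, size_t cur, size_t n,
                              unsigned load_percent)
{
    switch (gov) {
    case GOV_PERFORMANCE:
        return 0;                    /* Always the maximum */
    case GOV_POWERSAVE:
        return n - 1;                /* Always the minimum */
    case GOV_ONDEMAND:
        if (load_percent > UP_THRESHOLD)
            return 0;                /* Jump straight to the maximum */
        if (load_percent < DOWN_THRESHOLD && cur < n - 1)
            return cur + 1;          /* Ease off when mostly idle */
        return cur;
    case GOV_CONSERVATIVE:
        if (load_percent > UP_THRESHOLD && cur > 0)
            return cur - 1;          /* One step faster */
        if (load_percent < DOWN_THRESHOLD && cur < n - 1)
            return cur + 1;          /* One step slower */
        return cur;
    }
    return cur;
}
```

With six available frequencies and the slowest one in use, a burst of 95 percent load moves this model's ondemand governor directly to index 0, whereas conservative moves only to index 4 and needs further busy intervals to climb higher.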

Each supported governor is implemented as a kernel module. To see cpufreq in action, assign a governor and
vary the system load:
bash> cd /sys/devices/system/cpu/cpu0/cpufreq
bash> cat scaling_max_freq                     # Maximum frequency
1700000
bash> cat scaling_min_freq                     # Minimum frequency
600000
bash> cat cpuinfo_cur_freq                     # Current frequency
600000
bash> cat scaling_governor                     # Scaling algorithm in use
powersave
bash> cat scaling_available_frequencies
1700000 1400000 1200000 1000000 800000 600000
bash> cat scaling_available_governors
conservative ondemand powersave userspace performance
bash> echo conservative > scaling_governor     # Assign 'conservative' governor
bash> ls -lR /                                 # Switch to another terminal and
                                               # load your system by recursively
                                               # traversing all directories

If you now monitor the running frequency by looking at
/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq, you will see it dancing to the tune of the CPU load.
The CPU scaling code lives in the drivers/cpufreq/ directory. Look at Documentation/cpu-freq/* for the detailed
semantics of cpufreq sysfs nodes.
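
The shell session above can also be driven from a program. Here is a minimal C sketch, assuming the sysfs files behave as shown in the transcript (one value per file, frequencies in kHz). cpufreq_read_khz() and cpufreq_set_governor() are hypothetical helper names, and writing the governor normally requires root privileges.

```c
#include <stdio.h>

/* Read one frequency value (in kHz) from <dir>/<attr>. Returns 0 on success. */
static int cpufreq_read_khz(const char *dir, const char *attr,
                            unsigned long *khz)
{
    char path[256];
    FILE *fp;
    int n, ok;

    n = snprintf(path, sizeof(path), "%s/%s", dir, attr);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;
    fp = fopen(path, "r");
    if (!fp)
        return -1;
    ok = (fscanf(fp, "%lu", khz) == 1);
    fclose(fp);
    return ok ? 0 : -1;
}

/* Select a governor by writing its name to <dir>/scaling_governor. */
static int cpufreq_set_governor(const char *dir, const char *name)
{
    char path[256];
    FILE *fp;
    int n, rc;

    n = snprintf(path, sizeof(path), "%s/scaling_governor", dir);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;
    fp = fopen(path, "w");
    if (!fp)
        return -1;
    rc = (fputs(name, fp) == EOF) ? -1 : 0;
    if (fclose(fp) == EOF)  /* A rejected write may only surface on flush */
        rc = -1;
    return rc;
}
```

For example, cpufreq_set_governor("/sys/devices/system/cpu/cpu0/cpufreq", "conservative") followed by repeated calls to cpufreq_read_khz() on cpuinfo_cur_freq would reproduce the experiment from the transcript.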