SOC Kernel Thermal Manager


The kernel thermal framework already exists in mainline kernel builds and has been carried over into the various ARM trees, including linux-omap. Based on discussions with interested parties about their requirements for software-based thermal management, the kernel thermal framework appears to offer the interface that best meets all of them.

The thermal framework can best be described as an abstraction layer between the hardware components that either source temperature (“temperature zone”) or affect temperature (“cooling devices”). It also links and groups temperature sources with cooling devices. It does not provide any direct means to read a temperature or cool a device. Instead, it provides the data structures and API calls that allow a standardized abstraction between these devices. Policies may be placed either directly in the kernel or in user space. The default policy is in-kernel and consists of adjusting the rate of cooling for each group tied to a particular temperature sensor; it is explained in more detail later on. The kernel framework also provides a consistent filesystem interface for all temperature sensors and cooling devices. There is no standard for hwmon interfaces to the filesystem, so data may be represented in centigrade, centi-centigrade, kelvin, etc., and as either floating-point or integer values. The thermal zone expects the temperature in millidegrees Celsius. For a more complete explanation of the thermal framework, see Documentation/thermal/ in any mainline kernel tree.
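Because sensor drivers may report in different units while the thermal zone expects millidegrees Celsius, each sensor binding needs a small normalization step. The sketch below assumes integer readings and a hypothetical `enum temp_unit` covering a few of the units mentioned above. It treats "centi-centigrade" as hundredths of a degree Celsius, which is an assumption. This is an illustration, not an existing kernel helper.

```c
#include <stdint.h>

/* Hypothetical units a sensor driver might report in (illustrative only). */
enum temp_unit {
	TEMP_UNIT_CELSIUS,       /* whole degrees Celsius */
	TEMP_UNIT_CENTI_CELSIUS, /* assumed: hundredths of a degree Celsius */
	TEMP_UNIT_MILLI_CELSIUS, /* already what the thermal zone expects */
	TEMP_UNIT_KELVIN,        /* whole kelvin */
};

/* Convert an integer reading to millidegrees Celsius. */
static int32_t to_millicelsius(int32_t raw, enum temp_unit unit)
{
	switch (unit) {
	case TEMP_UNIT_CELSIUS:
		return raw * 1000;
	case TEMP_UNIT_CENTI_CELSIUS:
		return raw * 10;
	case TEMP_UNIT_MILLI_CELSIUS:
		return raw;
	case TEMP_UNIT_KELVIN:
		/* 0 degC = 273.15 K = 273150 millikelvin */
		return raw * 1000 - 273150;
	}
	return 0; /* unknown unit */
}
```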

File Structure Overview

Like most frameworks, the thermal framework is split into a common component and a platform-specific component. Platform-specific components may be per processor and/or per board. The common components come in the form of drivers and are built as such. The platform-specific functions provide the actual binding of temperature sensors to cooling devices, and invoke the common component data structures and function calls. The common components consist of the following:

../include/linux/thermal.h – holds the predefined structures and prototypes; it must be included in any code that uses the thermal framework.

../build/drivers/thermal/thermal_sys.c – provides sysfs support for thermal management. Devices that register themselves as cooling devices or temperature sensors receive abstracted sysfs entries via these functions.

../build/drivers/acpi/thermal.c – provides the binding structures for temperature sources to cooling devices, as well as the main processing algorithm for thermal management. As the directory path suggests, this is tied specifically to ACPI (it is even called the “acpi thermal zone driver”). For that reason it is not directly applicable to ARM-based systems, which have no ACPI. This functionality will need to be cloned for ARM-based thermal management; proposals for doing so are outlined below.

The platform specific files are listed here:

../platform/x86/acerhdf.c – used by the Acer Aspire One netbook (Atom-based); a good example of a single temperature sensor controlling a single cooling device.

../platform/x86/intel_mid_thermal.c – provided by Intel for their MID platforms. It is more complex than acerhdf.c, as it has four thermal sensors and multiple cooling zones. It also shows how to map external hardware in as a thermal zone.
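A single sensor driving a single fan, as in acerhdf.c, can be modelled as a threshold check with hysteresis. The hysteresis keeps the fan from toggling around one temperature. This is a minimal sketch of that idea, not acerhdf.c's actual code. The `FAN_ON_MC`/`FAN_OFF_MC` values and the `fan_next_state()` helper are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical thresholds in millidegrees Celsius. */
#define FAN_ON_MC  60000
#define FAN_OFF_MC 55000

/* Turn the fan on at or above FAN_ON_MC and off at or below FAN_OFF_MC.
 * Between the two thresholds, keep the previous state (hysteresis). */
static bool fan_next_state(int temp_mc, bool fan_on)
{
	if (temp_mc >= FAN_ON_MC)
		return true;
	if (temp_mc <= FAN_OFF_MC)
		return false;
	return fan_on;
}
```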

As the above suggests, making use of the thermal framework is fairly straightforward. A separate driver is created, usually per board but possibly per device (e.g., OMAP4460) as a default. This driver defines all temperature sensors, binds them to cooling devices, and sets the period at which each resource is sampled and all boundaries on the temperature sensors. Platform-specific algorithms can also be added in this driver, overriding the default policy algorithm in thermal.c.
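The contents of such a per-board driver can be pictured as a table: a sensor, a polling period, and trip temperatures, each bound to a cooling device and a state. The sketch below models this in plain C with hypothetical types (`struct zone_desc`, `struct trip_binding`), a made-up sensor name, and made-up values. It does not use the kernel's real thermal structures or registration calls.

```c
#include <stddef.h>

/* Hypothetical, simplified model of a per-board thermal binding table. */
struct cooling_dev {
	const char *name;
	unsigned int max_state;
};

struct trip_binding {
	int temp_mc;                   /* trip temperature, millidegrees Celsius */
	const struct cooling_dev *dev; /* cooling device bound to this trip */
	unsigned int state;            /* cooling state requested once tripped */
};

struct zone_desc {
	const char *sensor;            /* temperature source (made-up name) */
	unsigned int polling_ms;       /* sampling period */
	const struct trip_binding *trips;
	size_t ntrips;
};

static const struct cooling_dev cpu_freq = { "cpufreq", 3 };
static const struct cooling_dev fan = { "fan", 1 };

static const struct trip_binding board_trips[] = {
	{ 70000, &cpu_freq, 1 },
	{ 80000, &cpu_freq, 3 },
	{ 85000, &fan, 1 },
};

static const struct zone_desc board_zone = {
	"soc-temp-sensor", 1000, board_trips,
	sizeof(board_trips) / sizeof(board_trips[0]),
};

/* Highest state requested for `dev` by any trip crossed at temp_mc (0 = none). */
static unsigned int requested_state(const struct zone_desc *z,
				    const struct cooling_dev *dev, int temp_mc)
{
	unsigned int state = 0;

	for (size_t i = 0; i < z->ntrips; i++)
		if (z->trips[i].dev == dev && temp_mc >= z->trips[i].temp_mc &&
		    z->trips[i].state > state)
			state = z->trips[i].state;
	return state;
}
```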

Design Architecture Overview

The thermal framework offers a clean abstract definition and data structure implementation for tying temperature sensors and cooling devices together into “thermal zones”, with the possibility of separate algorithmic control for each thermal zone. The current implementations all assume that the cooling device is some sort of fan. Even so, it is quite possible to define a processor or other SOC component that can enter off mode or scale its frequency as a cooling device in this framework. The actual control for this, however, must still be implemented, as the current thermal framework does not take it into account. If SOC-level control is added to the thermal framework, it would provide a single resource that controls all thermal conditions, from the device to the system.
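To make a CPU act as a cooling device, its cooling states can be mapped onto operating points. The following sketch is loosely modelled on the get_max_state/set_cur_state style of callbacks in the kernel's `struct thermal_cooling_device_ops`, but it is standalone C. The OPP table, `struct cpu_cooling`, and the rule that state N removes the N highest OPPs are assumptions for illustration.

```c
/* Hypothetical OPP table, ascending frequency in kHz. */
static const unsigned long opp_khz[] = { 300000, 600000, 800000, 1008000 };
#define NUM_OPPS (sizeof(opp_khz) / sizeof(opp_khz[0]))

/* Cooling state N removes the N highest OPPs; state 0 means no cooling. */
struct cpu_cooling {
	unsigned long cur_state;
};

/* The lowest OPP is always kept, so the maximum state is NUM_OPPS - 1. */
static unsigned long cpu_cooling_max_state(void)
{
	return NUM_OPPS - 1;
}

static int cpu_cooling_set_state(struct cpu_cooling *c, unsigned long state)
{
	if (state > cpu_cooling_max_state())
		return -1; /* out of range: leave the current state unchanged */
	c->cur_state = state;
	return 0;
}

/* Highest frequency the CPU may use under the current cooling state. */
static unsigned long cpu_cooling_max_khz(const struct cpu_cooling *c)
{
	return opp_khz[NUM_OPPS - 1 - c->cur_state];
}
```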

The thermal framework does not yet consider “independently” controlled thermal domains, such as the LPDDR2 thermal requirements. The individual device driver is expected to be fully responsible for these, and the cooling actions it applies are expected not to affect the rest of the system. However, if there are cross-functional effects (such as turning off the graphics core), the thermal framework would be the resource to implement this control.

In the existing thermal framework, the main control module (thermal.c) expects to take input from the BIOS regarding temperature state. It only concerns itself with controlling thermal state in “valid” regions. If the device is running too hot or another critical thermal state occurs (such as overheating during boot), the BIOS is expected to take over and reset the device or otherwise ensure the system is handled correctly. There is currently no mechanism within the thermal framework to handle these critical out-of-bounds conditions. Any other external temperature sensor NOT covered by the BIOS is handled in platform-specific files and uses hwmon directly. The platform-specific implementation then exports these hwmon-based sensors as thermal-framework-compliant sysfs entries (via thermal_sys.c).

Because it assumes fan control, the default thermal algorithm in thermal.c provides “aggressive” cooling control. It anticipates the device heating up and proactively applies cooling states to reduce the rate of heating. Once heating is no longer detected, the cooling states are removed. This algorithm works well when there is no real perceived “downside” to applying cooling (other than a noisy fan or increased power consumption in a laptop). However, if an “off” mode or frequency reduction (via OPP removal) is applied, this may have a much bigger impact on the user experience than simply cranking up a fan. The default policy algorithm should therefore be modified.
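The anticipatory behaviour described above can be approximated by a step-wise rule evaluated on every sample. This is a simplified sketch, not the actual thermal.c algorithm. `step_policy()` and its parameters are hypothetical, and the rule only checks whether the temperature is above the trip and still rising.

```c
/* Hypothetical step-wise policy, evaluated once per sample:
 * - at or above the trip and still heating: add one cooling step;
 * - otherwise (heating no longer detected): remove one cooling step. */
static unsigned int step_policy(int temp_mc, int last_temp_mc, int trip_mc,
				unsigned int state, unsigned int max_state)
{
	if (temp_mc >= trip_mc && temp_mc > last_temp_mc) {
		if (state < max_state)
			state++;
	} else if (state > 0) {
		state--;
	}
	return state;
}
```

Every rising sample above the trip adds a cooling step, so a rule like this reacts quickly. That is harmless for a fan. When each step removes an OPP, however, every step directly costs performance.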

Requirements for an ARM-based Implementation

There are three major changes to the thermal framework that must be considered to make it usable by ARM-based (or any non-fan-based) systems:

  1. Remove the dependency on ACPI and on BIOS-exported information.
  2. Add default implementations to control frequency or retention/off mode for modules that can be considered cooling devices.
  3. A policy algorithm that does not assume that the mechanism used to cool a device has no adverse effect on the system.

For 1), this would be done by mapping the temperature sensor driver information directly to the thermal framework and making the functions called more generic. This is fairly straightforward, as the hwmon information already exists in the temperature driver; it just needs to be read and formatted to the thermal framework definitions. This requires a new module in the thermal framework. Instead of relying on an ACPI driver implementation (as today), a new driver, say “build/drivers/<something ARM power>/thermal.c”, would need to be created. The largest headache is not the implementation itself, but getting it mainlined in a way that pleases everyone. The thermal framework today does NOT consider a non-ACPI implementation, and deciding how best to manage its inclusion would require maintainer support. Critical functions (such as the default algorithm) are included in the “thermal.c” module, so this functionality does need to be cloned somewhere. The actual implementation, however, will be far different from the present ACPI version.
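Removing the ACPI dependency mostly means the zone obtains temperatures from a callback registered by the sensor driver rather than from BIOS-provided data. The sketch below assumes a hypothetical `struct temp_sensor_ops` with a `get_temp` callback, loosely analogous to the `get_temp` operation of the kernel's `struct thermal_zone_device_ops`. The fake sensor stands in for a real temperature driver.

```c
/* Hypothetical sensor-side interface: the temperature driver supplies a
 * callback that reports millidegrees Celsius; no ACPI or BIOS involved. */
struct temp_sensor_ops {
	int (*get_temp)(void *priv, int *temp_mc);
};

struct generic_zone {
	const struct temp_sensor_ops *ops;
	void *priv;        /* sensor driver's private data */
	int last_temp_mc;  /* most recent valid reading */
};

/* Poll the sensor through its callback and cache the value. */
static int zone_update(struct generic_zone *z)
{
	int t;
	int ret = z->ops->get_temp(z->priv, &t);

	if (ret)
		return ret; /* propagate the sensor error, keep the old value */
	z->last_temp_mc = t;
	return 0;
}

/* Example sensor: a hypothetical register holding whole degrees Celsius. */
struct fake_sensor {
	int reg_celsius;
};

static int fake_get_temp(void *priv, int *temp_mc)
{
	const struct fake_sensor *s = priv;

	if (s->reg_celsius < -40)
		return -1; /* treat as an invalid reading */
	*temp_mc = s->reg_celsius * 1000;
	return 0;
}

static const struct temp_sensor_ops fake_ops = { fake_get_temp };
```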

For 2), this has been prototyped with the user-space thermal manager in Linaro (see the appropriate git trees). The basic premise is to remove operating points if the device is getting too hot, but this is not the only possible approach. Forcing the device into idle (for example, allowing the device to run at only 80% load) has also been considered. More data needs to be collected to see whether it provides a real benefit. Capping load has the benefit of affecting “known” use cases less, since operating points are not removed. However, it removes “total available MIPS” from the system by forcing an idle state. This could starve certain processes and would require modifications to other kernel components to allow scheduled work to be “skipped”. Removing an operating point allows all MIPS to be available at all times. If overheating is detected, though, it may impact user experience in known use cases by removing operating points from the system (and therefore the total MIPS possible at that point). Unlike load capping, it would not cause scheduled work to be missed. Both methods have merit and ultimately should be supported, letting the user or system engineer decide based on how they want the system to behave. Regardless of the specific mechanism, the chosen method will need to be coded and placed in a “build/drivers/<something ARM power>/thermal.c”. That would then be the driver called for the cooling device when a thermal event is detected.
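The trade-off between the two mechanisms can be made concrete with a rough capacity estimate. The sketch assumes a hypothetical four-entry OPP table and approximates available capacity as frequency times allowed busy fraction, ignoring memory-bound workloads. Neither function corresponds to an existing driver.

```c
/* Hypothetical OPP table in MHz, ascending. Capacity is approximated as
 * frequency x allowed busy fraction (a deliberate simplification). */
static const unsigned int opp_mhz[] = { 300, 600, 800, 1000 };
#define N_OPPS (sizeof(opp_mhz) / sizeof(opp_mhz[0]))

/* Strategy A: remove the `removed` highest OPPs. The CPU may still be 100%
 * busy at the highest remaining OPP, so no work has to be skipped. */
static unsigned int capacity_opp_removal(unsigned int removed)
{
	if (removed >= N_OPPS)
		removed = N_OPPS - 1; /* always keep the lowest OPP */
	return opp_mhz[N_OPPS - 1 - removed];
}

/* Strategy B: keep all OPPs but cap utilisation at `cap_pct` percent by
 * forcing idle time. Peak speed is preserved; total throughput is not. */
static unsigned int capacity_idle_cap(unsigned int cap_pct)
{
	if (cap_pct > 100)
		cap_pct = 100;
	return opp_mhz[N_OPPS - 1] * cap_pct / 100;
}
```

With this table, removing one OPP and capping load at 80% both leave 800 units of capacity, but in different ways. `capacity_opp_removal(1)` lowers peak speed while keeping every cycle usable. `capacity_idle_cap(80)` keeps the 1000 MHz peak and instead forces 20% idle time, which is where the starvation concern comes from.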

For 3), this has been partially explained above. Beyond providing a possibly less aggressive thermal policy, a means should be added to let the user choose the trade-offs of the cooling solution used. This should take the form of exporting thermal states to user space and allowing a policy manager of some kind to set “constraints” or trade-offs for the system. Default constraints could also be provided.
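One way to express such user-chosen trade-offs is a small set of constraints that the policy honours when turning a cooling level into concrete actions. In this sketch, `struct thermal_constraints`, `apply_level()`, the OPP table, and the "remove OPPs first, then cap utilisation in 10% steps" ordering are all hypothetical choices, not an existing interface.

```c
#include <stddef.h>

/* Hypothetical constraints a user-space policy manager might set. */
struct thermal_constraints {
	unsigned int min_freq_mhz; /* never remove OPPs below this frequency */
	unsigned int min_busy_pct; /* never cap utilisation below this */
};

struct cooling_action {
	unsigned int max_freq_mhz;
	unsigned int busy_cap_pct;
};

static const unsigned int opps[] = { 300, 600, 800, 1000 }; /* MHz */
#define N_OPP (sizeof(opps) / sizeof(opps[0]))

/* Map a cooling level (0 = none) to an action within the constraints:
 * first remove OPPs down to min_freq_mhz, then cap utilisation in 10%
 * steps down to min_busy_pct. Leftover levels are ignored. */
static struct cooling_action apply_level(const struct thermal_constraints *c,
					 unsigned int level)
{
	struct cooling_action a = { opps[N_OPP - 1], 100 };
	size_t idx = N_OPP - 1;

	while (level > 0 && idx > 0 && opps[idx - 1] >= c->min_freq_mhz) {
		idx--;
		level--;
	}
	a.max_freq_mhz = opps[idx];

	while (level > 0 && a.busy_cap_pct >= c->min_busy_pct + 10) {
		a.busy_cap_pct -= 10;
		level--;
	}
	return a;
}
```

With a 600 MHz frequency floor, cooling levels 1 and 2 remove OPPs. Any further levels fall back to capping utilisation, stopping at the user's minimum.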

Finally, as stated above, critical thresholds are generally not handled by the thermal framework, but directly by lower-level functionality. The thermal framework should NOT be the last resort protecting the system from overheating. That protection should be implemented in hardware or via an external monitoring circuit, and it should reset the device if critical thermal states are reached.

Limitations and Feature Additions in the Current Thermal APIs and Logic

1) A separate cooling device is associated with each trip point of the thermal zone. This is fine if the cooling devices are of different types. It does not work when the same cpufreq cooling device is used and only the frequency level differs from one trip point to the next. Supporting this requires some modification to the core thermal framework. Basically, the cooling device needs some information about the thermal zone and its trip points.

2) Currently, cooling happens in a direct and symmetric way, where a particular temperature triggers a particular cooling action. This may work well for larger systems. In the SOC and embedded space, however, everything is packaged together. There it is difficult to predict the cooling effect precisely, whether the right level is chosen in one shot at the beginning or reached by starting from the highest cooling level and stepping down. Heuristics and predictions based on the rate of cooling achieved in the past may therefore help select the cooling level more intelligently.

3) Currently, each thermal zone is identified by a single temperature sensor. This may change when multiple sensors are placed inside the SOC in some hierarchical order, linked to one another and prioritized. Examples include the CPU, memory, ambient and battery temperature sensors. Handling this many temperature sensors might be done more efficiently inside a single thermal zone instead of multiple thermal zones.
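For item 3), one possible way to fold several sensors into a single thermal zone is to let the sensor closest to its own limit drive cooling, using priority only to break ties. The sketch below illustrates that rule; `struct zone_sensor` and `critical_sensor()` are hypothetical. Other aggregation rules, such as weighted averages, would also be possible.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical multi-sensor zone entry. */
struct zone_sensor {
	const char *name;
	int temp_mc;   /* latest reading, millidegrees Celsius */
	int limit_mc;  /* temperature this sensor must stay below */
	int priority;  /* larger wins when headroom is equal */
};

/* Index of the sensor with the least headroom (limit - reading); ties go
 * to the higher priority. Returns 0 if n == 0. */
static size_t critical_sensor(const struct zone_sensor *s, size_t n)
{
	size_t best = 0;
	int best_headroom = INT_MAX;

	for (size_t i = 0; i < n; i++) {
		int headroom = s[i].limit_mc - s[i].temp_mc;

		if (headroom < best_headroom ||
		    (headroom == best_headroom && s[i].priority > s[best].priority)) {
			best = i;
			best_headroom = headroom;
		}
	}
	return best;
}
```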

WorkingGroups/PowerManagement/Archives/OldSpecs/ThermalArchitecture (last modified 2013-08-22 07:58:35)