Diagnostic Routines

This guide details each of the diagnostic routines provided by cros_healthd, along with any options the routine supports, a sample invocation run via the diag component of cros-health-tool, and sample output from running the routine. Routines can be run through crosh or directly through cros-health-tool. Each sample invocation below is shown in both forms, and both forms run the same routine with the same options.

Routine Availability

Not all routines are available on all devices. For example, battery-related routines are not available on Chromeboxes, which do not have batteries. To get a list of all routines available on a given device, run one of the following commands:

From crosh:

crosh> diag list

From cros-health-tool:

$ cros-health-tool diag --action=get_routines

Sample output:

Available routine: battery_capacity
Available routine: battery_health
...
Available routine: floating_point_accuracy
Available routine: prime_search
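
When scripting against this list, it can help to turn the output into a set of routine names. The following is a minimal Python sketch that assumes every routine is reported on its own line in the `Available routine: <name>` form shown above. The function name is hypothetical and not part of cros_healthd.

```python
def parse_available_routines(output):
    """Collect routine names from `Available routine: <name>` lines."""
    prefix = "Available routine:"
    routines = set()
    for line in output.splitlines():
        line = line.strip()
        # Lines that do not carry the prefix (such as "...") are ignored.
        if line.startswith(prefix):
            routines.add(line[len(prefix):].strip())
    return routines


sample = """\
Available routine: battery_capacity
Available routine: battery_health
Available routine: floating_point_accuracy
Available routine: prime_search
"""

available = parse_available_routines(sample)
print("battery_health" in available)  # True
print("nvme_self_test" in available)  # False
```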

Battery and Power Routines

ac_power

Confirms that the AC power adapter is being recognized properly by the system.

Parameters:

  • --ac_power_is_connected - Whether or not a power supply is expected to be connected. Type: bool. Default: true.
  • --expected_power_type - The type of power supply expected to be connected. Only valid when --ac_power_is_connected=true. Type: string. Default: ""

To ensure that a power supply of type USB_PD is connected and recognized:

From crosh:

crosh> diag ac_power --expected_power_type="USB_PD"

From cros-health-tool:

$ cros-health-tool diag --action=run_routine --routine=ac_power --expected_power_type="USB_PD"

Sample output:

Progress: 33
Plug in the AC adapter.
Press ENTER to continue.

Progress: 100
Status: Passed
Status message: AC Power routine passed.
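
The cros-health-tool invocations in this guide all follow the same shape: `--action=run_routine`, a `--routine` name, then one `--name=value` flag per parameter. As a rough sketch of that pattern, the hypothetical helper below builds the argument list in Python without running anything. Rendering booleans as `true` and `false` is an assumption about how bool flags are spelled.

```python
import shlex


def build_diag_command(routine, **flags):
    """Return the argv list for running one routine through cros-health-tool."""
    argv = ["cros-health-tool", "diag", "--action=run_routine", f"--routine={routine}"]
    for name, value in flags.items():
        if isinstance(value, bool):
            # Assumption: bool flags accept the literals "true" and "false".
            value = "true" if value else "false"
        argv.append(f"--{name}={value}")
    return argv


argv = build_diag_command("ac_power", expected_power_type="USB_PD")
print(shlex.join(argv))
# cros-health-tool diag --action=run_routine --routine=ac_power --expected_power_type=USB_PD
```

The resulting list can be handed to `subprocess.run` on a device where cros-health-tool is installed.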

battery_capacity

Confirms that the device's battery design capacity lies within the given limits.

Parameters:

  • --low_mah - Lower bound for the allowable design capacity of the battery, in mAh. Type: uint32_t. Default: 1000.
  • --high_mah - Upper bound for the allowable design capacity of the battery, in mAh. Type: uint32_t. Default: 10000.

To ensure the device's battery capacity lies within the default range of (1000, 10000) mAh:

From crosh:

crosh> diag battery_capacity

From cros-health-tool:

$ cros-health-tool diag --action=run_routine --routine=battery_capacity

Sample output:

Progress: 100
Status: Passed
Status message: Battery design capacity within given limits.
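
The check itself amounts to a range comparison. The sketch below illustrates it in Python with the default bounds; the function and argument names are hypothetical, and treating both bounds as inclusive is an assumption, since this guide does not say how the endpoints are handled.

```python
def design_capacity_within_limits(design_mah, low_mah=1000, high_mah=10000):
    """Return True if the design capacity lies within [low_mah, high_mah]."""
    if low_mah > high_mah:
        raise ValueError("low_mah must not exceed high_mah")
    # Assumption: both bounds are inclusive.
    return low_mah <= design_mah <= high_mah


print(design_capacity_within_limits(5275))   # True
print(design_capacity_within_limits(12000))  # False
```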

battery_discharge

Checks to see if the battery discharges excessively during a period of time.

Parameters:

  • --length_seconds - Length of time to run the routine for, in seconds. Type: uint32_t. Default: 10.
  • --maximum_discharge_percent_allowed - Maximum discharge percent allowed during the runtime of the routine. If, after the routine ends, the battery has discharged more than this percent, then the routine fails. Type: uint32_t. Default: 100.

To ensure the battery discharges less than 10 percent in 600 seconds:

From crosh:

crosh> diag battery_discharge --length_seconds=600 --maximum_discharge_percent_allowed=10

From cros-health-tool:

$ cros-health-tool diag --action=run_routine --routine=battery_discharge --length_seconds=600 --maximum_discharge_percent_allowed=10

Sample output:

Progress: 0
Unplug the AC adapter.
Press ENTER to continue.

Progress: 0
Progress: 1
...
Progress: 99
Progress: 100
Output: Battery discharged 7% in 600 seconds.
Status: Passed
Status message: Battery discharge routine passed.
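
The pass/fail rule described above can be sketched as a comparison between the charge at the start and at the end of the run. The function name and the 80% and 73% readings are illustrative values, not output from a real device.

```python
def check_discharge(start_percent, end_percent, maximum_discharge_percent_allowed):
    """Return (passed, discharged_percent) for one run of the routine."""
    discharged = start_percent - end_percent
    if discharged < 0:
        # A rising charge suggests the AC adapter was still connected.
        raise ValueError("battery charge increased during the run")
    # The routine fails only if the battery discharged MORE than the limit.
    return discharged <= maximum_discharge_percent_allowed, discharged


passed, discharged = check_discharge(
    start_percent=80.0, end_percent=73.0, maximum_discharge_percent_allowed=10
)
print(passed, discharged)  # True 7.0
```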

battery_health

Provides some basic information on the status of the battery, and determines whether the battery's cycle count or wear percentage exceeds the given limits.

Parameters:

  • --maximum_cycle_count - Upper bound for the battery's cycle count. Type: uint32_t. Default: 0.
  • --percent_battery_wear_allowed - Upper bound for the battery's wear percentage. Type: uint32_t. Default: 100.

To ensure the device's battery has a cycle count less than 5 and wear percentage less than 15:

From crosh:

crosh> diag battery_health --maximum_cycle_count=5 --percent_battery_wear_allowed=15

From cros-health-tool:

$ cros-health-tool diag --action=run_routine --routine=battery_health --maximum_cycle_count=5 --percent_battery_wear_allowed=15

Sample output:

Progress: 100
Output: Charge Full: 4759000
Charge Full Design: 5275000
Charge Now: 4759000
Current Now: 0
Cycle Count: 10
Manufacturer: 333-22-
Present: 1
Status: Charging
Voltage Now: 13055000
Wear Percentage: 10

Status: Failed
Status message: Battery cycle count is too high.
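
The sample above reports a wear percentage of 10 and fails on the cycle count. The Python sketch below shows one way those numbers could fit together. The wear formula is an assumption that is not stated in this guide; it was chosen because integer arithmetic on the sample's Charge Full and Charge Full Design values reproduces the reported 10. The function names are hypothetical.

```python
def wear_percentage(charge_full, charge_full_design):
    """Wear as the percentage of design capacity that has been lost."""
    # Assumption: integer division, which matches the sample's value of 10.
    return 100 - charge_full * 100 // charge_full_design


def exceeded_limits(cycle_count, wear, maximum_cycle_count, percent_battery_wear_allowed):
    """Return the names of the limits that were exceeded (empty means pass)."""
    problems = []
    if cycle_count > maximum_cycle_count:
        problems.append("cycle count")
    if wear > percent_battery_wear_allowed:
        problems.append("wear percentage")
    return problems


wear = wear_percentage(charge_full=4759000, charge_full_design=5275000)
print(wear)  # 10
print(exceeded_limits(10, wear, maximum_cycle_count=5, percent_battery_wear_allowed=15))
# ['cycle count']
```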

CPU Routines

cpu_cache

Performs cache coherency testing via stressapptest --cc_test.

Parameters:

  • --length_seconds - Length of time to run the routine for, in seconds. Type: uint32_t. Default: 10.

To run cache coherency testing for 600 seconds:

From crosh:

crosh> diag cpu_cache --length_seconds=600

From cros-health-tool:

$ cros-health-tool diag --action=run_routine --routine=cpu_cache --length_seconds=600

Sample output:

Progress: 0
Progress: 1
Progress: 2
Progress: 3
Progress: 4
...
Progress: 100
Progress: 100
Status: Passed
Status message: Routine passed.
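
Most routines in this guide print the same three kinds of lines: `Progress:`, `Status:`, and `Status message:`. For automated runs, a small parser along the following lines may be enough. It is a sketch that assumes the line format shown in the samples, and the function name is hypothetical.

```python
import re


def parse_routine_output(output):
    """Extract progress values, the final status, and the status message."""
    result = {"progress": [], "status": None, "status_message": None}
    for line in output.splitlines():
        line = line.strip()
        if m := re.fullmatch(r"Progress: (\d+)", line):
            result["progress"].append(int(m.group(1)))
        elif line.startswith("Status message:"):
            result["status_message"] = line.split(":", 1)[1].strip()
        elif line.startswith("Status:"):
            # The last "Status:" line wins, so a "Status:" field inside a
            # routine's Output section is overwritten by the final verdict.
            result["status"] = line.split(":", 1)[1].strip()
    return result


sample = "Progress: 0\nProgress: 100\nProgress: 100\nStatus: Passed\nStatus message: Routine passed.\n"
parsed = parse_routine_output(sample)
print(parsed["status"], parsed["progress"][-1])  # Passed 100
```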

cpu_stress

Performs CPU stress-testing via stressapptest -W, which mimics a realistic high-load situation.

Parameters:

  • --length_seconds - Length of time to run the routine for, in seconds. Type: uint32_t. Default: 10.

To run the stress test for the default 10 seconds:

From crosh:

crosh> diag cpu_stress

From cros-health-tool:

$ cros-health-tool diag --action=run_routine --routine=cpu_stress

Sample output:

Progress: 0
Progress: 1
Progress: 2
Progress: 3
Progress: 4
...
Progress: 100
Progress: 100
Status: Passed
Status message: Routine passed.

floating_point_accuracy

Repeatedly checks the accuracy of millions of floating-point operations against known good values for the duration of the routine.

Parameters:

  • --length_seconds - Length of time to run the routine for, in seconds. Type: uint32_t. Default: 10.

To perform floating-point operations for 300 seconds:

From crosh:

crosh> diag floating_point_accuracy --length_seconds=300

From cros-health-tool:

$ cros-health-tool diag --action=run_routine --routine=floating_point_accuracy --length_seconds=300

Sample output:

Progress: 0
Progress: 1
Progress: 2
Progress: 3
Progress: 4
...
Progress: 100
Progress: 100
Status: Passed
Status message: Routine passed.
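
To illustrate the idea of checking operations against known good values until a time budget expires, here is a much smaller Python sketch. It is not the routine's implementation: the three operations and their expected results are chosen for this example because they are exactly representable in binary floating point.

```python
import operator
import time

# (operation, left operand, right operand, expected result).
# Operands are passed at run time so the arithmetic is actually executed.
KNOWN_GOOD = [
    (operator.add, 0.5, 0.25, 0.75),
    (operator.mul, 1.5, 4.0, 6.0),
    (operator.truediv, 9.0, 4.0, 2.25),
]


def floating_point_accuracy(length_seconds):
    """Re-check every known good value until the time budget runs out."""
    deadline = time.monotonic() + length_seconds
    while True:
        for operation, left, right, expected in KNOWN_GOOD:
            if operation(left, right) != expected:
                return False
        if time.monotonic() >= deadline:
            return True


print(floating_point_accuracy(0.1))
```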

prime_search

Repeatedly checks the CPU's brute-force calculations of prime numbers from 2 to the given maximum number for the duration of the routine.

Parameters:

  • --length_seconds - Length of time to run the routine for, in seconds. Type: uint32_t. Default: 10.
  • --max_num - Upper bound of the search. Primes between 2 and this value are calculated. Type: uint64_t. Default: 1000000.

To search for prime numbers between 2 and 10000 for the default 10 seconds:

From crosh:

crosh> diag prime_search --max_num=10000

From cros-health-tool:

$ cros-health-tool diag --action=run_routine --routine=prime_search --max_num=10000

Sample output:

Progress: 0
Progress: 1
Progress: 2
Progress: 3
Progress: 4
...
Progress: 100
Progress: 100
Status: Passed
Status message: Routine passed.
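
As an illustration of checking brute-force prime calculations, the Python sketch below tests each number from 2 to `max_num` by trial division and compares the answer with a sieve of Eratosthenes. Using a sieve as the reference is an assumption made for this example; this guide does not say what the routine compares against.

```python
def is_prime_brute_force(n):
    """Trial division by every candidate divisor up to sqrt(n)."""
    if n < 2:
        return False
    divisor = 2
    while divisor * divisor <= n:
        if n % divisor == 0:
            return False
        divisor += 1
    return True


def sieve(max_num):
    """Reference answers: flags[n] is True exactly when n is prime."""
    flags = [True] * (max_num + 1)
    flags[0:2] = [False, False]
    for i in range(2, int(max_num ** 0.5) + 1):
        if flags[i]:
            for multiple in range(i * i, max_num + 1, i):
                flags[multiple] = False
    return flags


def prime_search(max_num):
    """Return True if brute force agrees with the reference for 2..max_num."""
    reference = sieve(max_num)
    return all(is_prime_brute_force(n) == reference[n] for n in range(2, max_num + 1))


print(prime_search(10000))  # True
print(sum(sieve(10000)))    # 1229 primes between 2 and 10000
```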

urandom

Stresses the CPU by reading from /dev/urandom for the specified length of time.

Parameters:

  • --length_seconds - Length of time to run the routine for, in seconds. Type: uint32_t. Default: 10.

To stress the CPU for 120 seconds:

From crosh:

crosh> diag urandom --length_seconds=120

From cros-health-tool:

$ cros-health-tool diag --action=run_routine --routine=urandom --length_seconds=120

Sample output:

Progress: 0
Progress: 1
Progress: 2
Progress: 3
Progress: 4
...
Progress: 100
Progress: 100
Status: Passed
Status message: Routine passed.
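
The core of the routine, reading from /dev/urandom until the time is up, can be sketched in a few lines of Python. The 1 MiB chunk size and the treatment of a short read as a failure are assumptions made for this example.

```python
import time


def urandom_stress(length_seconds, chunk_size=1024 * 1024):
    """Read chunks from /dev/urandom until the time budget runs out."""
    deadline = time.monotonic() + length_seconds
    total = 0
    with open("/dev/urandom", "rb") as source:
        while time.monotonic() < deadline:
            data = source.read(chunk_size)
            if len(data) != chunk_size:
                return False, total  # assumption: a short read counts as failure
            total += len(data)
    return True, total


passed, bytes_read = urandom_stress(0.1)
print(passed, bytes_read > 0)
```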

Storage Routines

disk_read

Uses the fio utility to write a temporary file with random data, then repeatedly reads the file either randomly or linearly for the duration of the routine, checking that the data read matches the data written.

Parameters:

  • --length_seconds - Length of time to run the routine for, in seconds. Type: uint32_t. Default: 10.
  • --disk_read_routine_type - Type of reading to perform. Type: string. Default: linear. Allowable values: [linear|random]
  • --file_size_mb - Size of the file to read and write, in MB. Type: int32_t. Default: 1024.

To read a test file of size 10MB randomly for 120 seconds:

From crosh:

crosh> diag disk_read --length_seconds=120 --disk_read_routine_type="random" --file_size_mb=10

From cros-health-tool:

$ cros-health-tool diag --action=run_routine --routine=disk_read --length_seconds=120 --disk_read_routine_type="random" --file_size_mb=10

Sample output:

Progress: 0
Progress: 1
Progress: 2
Progress: 3
Progress: 4
...
Progress: 99
Progress: 100
Status: Passed
Status message: Routine passed.
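
The write-then-verify pattern can be illustrated without fio. The Python sketch below is a stand-in, not the routine's implementation: it writes random bytes to a temporary file and re-reads it block by block in linear or random order. It runs a fixed number of passes rather than a fixed duration, and on most systems the reads will be served from the page cache rather than the disk.

```python
import os
import random
import tempfile


def disk_read_check(file_size_bytes, read_type="linear", block_size=4096, passes=3):
    """Write random data to a temp file, then re-read and compare each block."""
    if read_type not in ("linear", "random"):
        raise ValueError("read_type must be 'linear' or 'random'")
    data = os.urandom(file_size_bytes)
    offsets = list(range(0, file_size_bytes, block_size))
    with tempfile.NamedTemporaryFile() as f:
        f.write(data)
        f.flush()
        for _ in range(passes):
            if read_type == "random":
                random.shuffle(offsets)  # visit blocks in a new random order
            else:
                offsets.sort()           # visit blocks front to back
            for offset in offsets:
                f.seek(offset)
                if f.read(block_size) != data[offset:offset + block_size]:
                    return False
    return True


print(disk_read_check(64 * 1024, "random"))  # True
```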

nvme_self_test

Conducts either a short or a long self-test of the device's NVMe storage.

Parameters:

  • --nvme_self_test_long - Whether or not to conduct a long self-test. Type: bool. Default: false.

To conduct a short self-test of the device's NVMe storage:

From crosh:

crosh> diag nvme_self_test

From cros-health-tool:

$ cros-health-tool diag --action=run_routine --routine=nvme_self_test

Sample output:

Progress: 0
Progress: 1
Progress: 2
Progress: 3
Progress: 4
...
Progress: 99
Progress: 100
Output: AQAAABAAAAA7AAAAAAAAAA==
Status: Passed
Status message: SelfTest status: Test PASS
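
The Output value in the sample appears to be base64-encoded binary data. Assuming that is the case, it can be decoded for inspection as sketched below. The layout and meaning of the decoded bytes are not described in this guide, so the example stops at a hex dump.

```python
import base64


def decode_routine_output(encoded):
    """Decode a base64 Output field into raw bytes."""
    # validate=True rejects characters outside the base64 alphabet.
    return base64.b64decode(encoded, validate=True)


raw = decode_routine_output("AQAAABAAAAA7AAAAAAAAAA==")
print(len(raw))   # 16
print(raw.hex())  # 01000000100000003b00000000000000
```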

nvme_wear_level

Compares the device's NVMe storage's wear level against the input threshold.

Parameters:

  • --wear_level_threshold - Acceptable wear level for the device's NVMe storage. Type: uint32_t. Default: 50. Allowable values: (0,99)

To ensure the device's NVMe storage has a wear level no more than 20:

From crosh:

crosh> diag nvme_wear_level --wear_level_threshold=20

From cros-health-tool:

$ cros-health-tool diag --action=run_routine --routine=nvme_wear_level --wear_level_threshold=20

Sample output:

Progress: 0
Progress: 100
Output: AAAAAAAAAADxBAAAAAAAAA==
Progress: 100
Status: Passed
Status message: Wear-level status: PASS.

smartctl_check

Checks to see if the drive's remaining spare capacity is high enough to protect against asynchronous event completion.

The smartctl_check routine has no parameters.

To check that the device's spare capacity is sufficient:

From crosh:

crosh> diag smartctl_check

From cros-health-tool:

$ cros-health-tool diag --action=run_routine --routine=smartctl_check

Sample output:

Progress: 33
Progress: 100
Progress: 100
Status: Passed
Status message: Routine passed