| # Chromium OS Metrics |
| |
| The Chromium OS "metrics" package contains utilities for client-side user metric |
| collection. |
| |
| When Chromium is installed, Chromium will take care of aggregating and uploading |
| the metrics to the UMA server. |
| |
| When Chromium is not installed (e.g. an embedded/headless build) and the |
| metrics_uploader USE flag is set, `metrics_daemon` will aggregate and upload the |
| metrics itself. |
| |
| [TOC] |
| |
| |
| ## The Metrics Library: libmetrics |
| |
| libmetrics implements the basic C and C++ API for metrics collection. All |
| metrics collection is funneled through this library. The easiest and |
| recommended way for a client-side module to collect user metrics is to link |
| libmetrics and use its APIs to send metrics to Chromium for transport to |
| UMA. In order to use the library in a module, you need to do the following: |
| |
| - Add a dependence (DEPEND and RDEPEND) on chromeos-base/metrics to the module's |
| ebuild. |
| |
| - Link the module with libmetrics (for example, by passing `-lmetrics` to the |
| module's link command). Both `libmetrics.so` and `libmetrics.a` are built |
| and installed into the sysroot libdir (e.g. `$SYSROOT/usr/lib/`). By default |
| `-lmetrics` links against `libmetrics.so`, which is preferred. |
| |
| - Make sure `/var/lib/metrics` is writable by the daemon. For example, if you |
| are using libmetrics in a daemon, you can achieve this by adding |
| `-b /var/lib/metrics,,1` to the `minijail0` command that starts the daemon. |
| |
| - To access the metrics library API in the module, include the |
| `<metrics/metrics_library.h>` header file. The file is installed in |
| `$SYSROOT/usr/include/` when the metrics library is built and installed. |
| |
| - The API is documented in [metrics_library.h](./metrics_library.h). Before |
| using the API methods, a MetricsLibrary object needs to be constructed. A |
| quick example: |
| |
| ```c++ |
| MetricsLibrary metrics; |
| bool result = metrics.SendToUMA( |
| /*name=*/"Platform.MyModule.MyLabel", |
| /*sample=*/3, |
| /*min=*/1, |
| /*max=*/10, |
| /*num_buckets=*/10); |
| if (!result) { |
| LOG(ERROR) << "Failed to send to UMA"; |
| } |
| ``` |
| |
| For more information on the C API, see |
| [c_metrics_library.h](./c_metrics_library.h). |
| |
| - On the target platform, shortly after the sample is sent, it should be visible |
| in Chromium through `chrome://histograms`. |
| |
| - The library includes a CumulativeMetrics class which can be used for |
| histograms whose samples represent accumulation of quantities on the |
| same device across a period of time: for instance, how much time was spent |
| playing music on each device and each day of use. Please see the |
| CumulativeMetrics section below. |
| |
| ### How metrics are actually sent |
| |
| libmetrics always writes histogram data to `/var/lib/metrics/uma-events` using a |
| custom format. flock() is used to avoid races. |
| |
| *** note |
| **Warning:** All metrics are written synchronously to disk and may block if |
| another process has the uma-events file locked. Unlike UMAs in Chrome, care must |
| be taken to not to update UMAs in performance-critical sections. |
| *** |
| |
| *** aside |
| **Note:** libmetrics does not check consent before writing to |
| /var/lib/metrics/uma-events, leaving that to the sender. |
| *** |
| |
| On most boards, the uma-events file is processed by Chromium's |
| `chromeos::ExternalMetrics` class. `chromeos::ExternalMetrics` periodically |
| flock's the file, reads all the metrics in it, and truncates the file. The |
| `chromeos::ExternalMetrics` sends the metrics from the file into Chrome's UMA |
| histogram collection system, after which they are treated like any other Chrome |
| UMA. In particular, Chrome will check consent before uploading the histograms. |
| |
| However, on the few boards that do not run a Chrome browser, uploading is |
| handled by the UploadService inside metrics_daemon. The UploadService is only |
| instatiated if `--uploader` is passed to `metric_daemon`. Similar to Chrome, the |
| UploadService will periodically lock-read-truncate-unlock the uma-events |
| file. If we have user permission to upload stats, the UploadService will then |
| send the metrics after unlocking the file. Here, user permission is controlled |
| by the device policy's `metrics_enabled` field. (If the `metrics_enabled` field |
| is not set, this falls back to enabling stats if the device is enterprise |
| enrolled; if that isn't the case, the existence of the "/home/chronos/Consent To |
| Send Stats" file is used.) |
| |
| ### Required paths and permissions for sandboxing |
| |
| Various functions in MetricsLibrary need to be able to access certain paths in |
| order to work. Programs that are sandboxed (e.g. by minijail) may not allow |
| access to these paths. In some cases, this will cause MetricsLibrary to silently |
| misbehave. As such, we are documenting the needed paths: |
| |
| * `IsGuestMode` needs read access to `/run/state` and permission to make dbus |
| calls to SessionManager. |
| * `AreMetricsEnabled` needs access to everything `IsGuestMode` needs, plus |
| read access to everything under `/run/daemon-store/uma-consent`, |
| `/var/lib/devicesettings`, and `/etc/ssl/openssl.cnf`. |
| * `IsAppSyncEnabled` needs read access to everything under |
| `/run/daemon-store/appsync-optin`. |
| * `EnableMetrics` needs everything `AreMetricsEnabled` needs access to, plus |
| write access to `/home/chronos/`. |
| * `DisableMetrics` needs write access to `/home/chronos/`. |
| * All `Send..ToUMA` functions must be able to access `/var/lib/metrics` and |
| `/var/lib/metrics/uma-events.d`. Specifically, they need to be able to: |
| * Create `/var/lib/metrics/uma-events`. |
| * Write to `/var/lib/metrics/uma-events`, regardless of whether or not |
| they created it. |
| * Create and write to files in `/var/lib/metrics/uma-events.d`. |
| * Exception: If a `MetricsWriter` is passed into |
| the `MetricsLibrary` constructor, and `MetricsWriter::SetOutputFile()` |
| was called, then the `MetricsLibrary` needs the ability to create and |
| write to that output file instead. |
| |
| In all cases, these need to be the "real" files, not a namespaced shadow copy, |
| if the functions are expected to work correctly. |
| |
| ## The Metrics Client: metrics_client |
| |
| `metrics_client` is a command-line utility for sending histogram samples and |
| user actions. It is installed under /usr/bin on the target platform and uses |
| libmetrics. It is typically used for generating metrics from shell scripts. |
| |
| For usage information and command-line options, run `metrics_client` on the |
| target platform or look for `Usage:` in |
| [metrics_client.cc](./metrics_client.cc). |
| |
| |
| ## The Metrics Daemon: metrics_daemon |
| |
| metrics_daemon is a daemon that runs in the background on the target platform |
| and is intended for passive or ongoing metrics collection, or metrics collection |
| requiring input from other modules. For example, it listens to D-Bus |
| signals related to the user session and screen saver states to determine if the |
| user is actively using the device or not and generates the corresponding |
| data. The metrics daemon also uses libmetrics. |
| |
| The recommended way to generate metrics data from a module is to link and use |
| libmetrics directly. However, the module could instead send signals to or |
| communicate in some alternative way with the metrics daemon. Then the metrics |
| daemon needs to monitor for the relevant events and take appropriate action -- |
| for example, aggregate data and send the histogram samples. |
| |
| ## Cumulative Metrics |
| |
| The CumulativeMetrics class in libmetrics helps keep track of quantities across |
| boot sessions, so that the quantities can be accumulated over stretches of time |
| (for instance, a day or a week) without concerns about intervening reboots or |
| version changes, and then reported as samples. For this purpose, some |
| persistent state (i.e. partial accumulations) is maintained as files on the |
| device. These "backing files" are typically placed in |
| `/var/lib/<daemon-name>/metrics`. (The metrics daemon is an exception, with its |
| backing files being in `/var/lib/metrics`.) |
| |
| ## Memory Daemon |
| |
| The [memd](./memd/) subdirectory contains a daemon that collects data at high |
| frequency during episodes of heavy memory pressure. |
| |
| ## vmlog |
| |
| [vmlog_writer](./vmlog_writer.cc) writes `/var/log/vmlog` files. It is a |
| space-delimited format. In order to parse, use the the first line to obtain the |
| list of items, because the number of columns depends on number of CPU cores. |
| |
| - time: current time |
| - From /proc/vmstat |
| - pgmajfault: major faults |
| - pgmajfault_f: major faults served from disk |
| - pgmajfault_a: major faults served from zram |
| - pswpin: number of swap in (pages). |
| - pswpout: number of swap out (pages). |
| - From /proc/stat |
| - cpuusage: all cpu usage ticks from `cpu` line excluding idle and iowait. |
| - gpufreq: GPU frequency; how it's obtained depends on device. |
| - From `/sys/devices/system/cpu/cpuN/cpufreq/scaling_cur_freq` |
| - cpufreqN: frequency of core in kHz. |
| |
| `vmlog_writer` manages a number of symbolic links, which are not immediately |
| intuitive: |
| |
| * `vmlog.LATEST` is a symbolic link to the current logfile that `vmlog_writer` |
| is writing to. It is rotated whenever the size gets above 256KiB. The |
| underlying log file is **not** renamed during this rotation, so the date |
| suffix reflects the time when `vmlog_writer` started during the most recent |
| run of `metrics_daemon` (most likely at boot, unless `metrics_daemon` |
| restarted). |
| * `vmlog.1.LATEST` is a symbolic link to the **last** log file written to. |
| When `vmlog.LATEST` would exceed 256KiB, its contents are copied and |
| overwrite the current contents of the file `vmlog.1.LATEST` points to. |
| Similarly, the date suffix on the target of this link only changes when |
| `metrics_daemon` starts. |
| * `vmlog.PREVIOUS` is a symbolic link that points to the vmlog contents as of |
| the previous shutdown. That is, when `metrics_daemon` starts, it renames |
| `vmlog.LATEST` to `vmlog.PREVIOUS` and creates a new `vmlog.LATEST`, which |
| points to the new log file it just created. |
| * `vmlog.1.PREVIOUS` is a symbolic link that points to the `vmlog.1.LATEST` |
| contents as of the previous shutdown. That is, when `metrics_daemon` starts, |
| it renames `vmlog.1.LATEST` to `vmlog.1.PREVIOUS`. |
| |
| An illustrative example: |
| |
| ``` |
| # ls -l /var/log/vmlog |
| total 660 |
| -rw-r--r--. 1 metrics metrics 262060 Oct 25 12:11 vmlog.1.20231024-213014 |
| -rw-r--r--. 1 metrics metrics 262074 Oct 25 14:15 vmlog.1.20231025-163712 |
| lrwxrwxrwx. 1 metrics metrics 38 Oct 25 14:15 vmlog.1.LATEST -> /var/log/vmlog/vmlog.1.20231025-163712 |
| lrwxrwxrwx. 1 metrics metrics 38 Oct 24 19:12 vmlog.1.PREVIOUS -> /var/log/vmlog/vmlog.1.20231024-213014 |
| -rw-r--r--. 1 metrics metrics 74182 Oct 24 17:29 vmlog.20231024-210100 |
| -rw-r--r--. 1 metrics metrics 66750 Oct 25 12:37 vmlog.20231024-213014 |
| -rw-r--r--. 1 metrics metrics 1622 Oct 25 14:16 vmlog.20231025-163712 |
| lrwxrwxrwx. 1 metrics metrics 21 Oct 25 12:37 vmlog.LATEST -> vmlog.20231025-163712 |
| lrwxrwxrwx. 1 metrics metrics 21 Oct 24 17:30 vmlog.PREVIOUS -> vmlog.20231024-213014 |
| ``` |
| |
| From this set of logs and symlinks, we can understand the following sequence of |
| events: |
| |
| 1. 2023-10-24 at 21:30:14 UTC: `metrics_daemon` starts up, and creates |
| `vmlog.20231024-213014`. |
| 2. 2023-10-25 at 12:11 **local** time: The size of `vmlog.20231024-213014` |
| would grow too large after the next data write, so `vmlog_writer` copies it |
| to `vmlog.1.20231024-213014`, overwriting its contents, and truncates |
| `vmlog.20231024-213014`. |
| 3. 2023-10-25 at 16:37:12 UTC: `metrics_daemon` starts up, and creates |
| `vmlog.20231025-163712`. It updates `vmlog.LATEST` to point to this new file |
| and vmlog.PREVIOUS to point to `vmlog.20231024-213014`, and renames the |
| previous `vmlog.1.LATEST` to be `vmlog.1.PREVIOUS`, keeping it pointing to |
| `vmlog.1.20231024-213014`. |
| 4. 2023-10-24 at 14:15 **local** time: The size of `vmlog.20231025-163712` |
| would grow too large after the next data write, so `vmlog_writer` copies it |
| to `vmlog.1.20231025-163712`, overwriting its contents. At this time, it |
| creates a new `vmlog.1.LATEST` file, pointing it to |
| `vmlog.1.20231025-163712`. Finally, it truncates `vmlog.20231025-163712`. |
| 5. 2023-10-25 at 14:16 **local** time: `vmlog_writer` continues writing to the |
| existing file, `vmlog.20231025-163712`. |
| |
| Periodically, `chromeos-cleanup-logs` will remove old `vmlog.<TIMESTAMP>` files. |
| |
| ## Further Information |
| |
| See |
| |
| https://chromium.googlesource.com/chromium/src.git/+/HEAD/tools/metrics/histograms/README.md |
| |
| for more information on choosing name, type, and other parameters of new |
| histograms. The rest of this README is a super-short synopsis of that |
| document, and with some luck it won't be too out of date. |
| |
| |
| ## Synopsis: Histogram Naming Convention |
| |
| Use TrackerArea.MetricName. For example: |
| |
| * Platform.DailyUseTime |
| * Network.TimeToDrop |
| |
| |
| ## Synopsis: Server Side |
| |
| If the histogram data is visible in `chrome://histograms`, it will be sent by an |
| official Chromium build to UMA, assuming the user has opted into metrics |
| collection. To make the histogram visible on "chromedashboard", the histogram |
| description XML file needs to be updated (steps 2 and 3 after following the |
| "Details on how to add your own histograms" link under the Histograms tab). |
| Include the string "Chrome OS" in the histogram description so that it's easier |
| to distinguish Chromium OS specific metrics from general Chromium histograms. |
| |
| The UMA server logs and keeps the collected field data even if the metric's name |
| is not added to the histogram XML. However, the dashboard histogram for that |
| metric will show field data as of the histogram XML update date; it will not |
| include data for older dates. If past data needs to be displayed, manual |
| server-side intervention is required. In other words, one should assume that |
| field data collection starts only after the histogram XML has been updated. |
| |
| ## Synopsis: FAQ |
| |
| ### What should my histogram's |min| and |max| values be set at? |
| |
| You should set the values to a range that covers the vast majority of samples |
| that would appear in the field. Values below |min| are collected in the |
| "underflow bucket" and values above |max| end up in the "overflow bucket". The |
| reported mean of the data is precise, i.e. it does not depend on range and |
| number of buckets. |
| |
| ### How many buckets should I use in my histogram? |
| |
| You should allocate as many buckets as necessary to perform proper analysis on |
| the collected data. Most data is fairly noisy: 50 buckets are plenty, 100 |
| buckets are probably overkill. Also consider that the memory allocated in |
| Chromium for each histogram is proportional to the number of buckets, so don't |
| waste it. |
| |
| ### When should I use an enumeration (linear) histogram vs. a regular (exponential) histogram? |
| |
| Enumeration histograms should really be used only for sampling enumerated |
| events and, in some cases, percentages. Normally, you should use a regular |
| histogram with exponential bucket layout that provides higher resolution at |
| the low end of the range and lower resolution at the high end. Regular |
| histograms are generally used for collecting performance data (e.g., timing, |
| memory usage, power) as well as aggregated event counts. |