Chrome OS Machine Learning Service

Summary

The Machine Learning (ML) Service provides a common runtime for evaluating machine learning models on device. The service wraps the TensorFlow Lite runtime and provides infrastructure for deployment of trained models. The TFLite runtime runs in a sandboxed process. Chromium communicates with ML Service via a Mojo interface.

How to use ML Service

First, make your trained model available to ML Service; then load and use it from Chromium via the client library provided at //chromeos/services/machine_learning/public/cpp/. See this doc for more detailed instructions.
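
For illustration, here is a minimal sketch of the Chromium side, assuming the builtin TEST_MODEL and the Mojo client library layout above; the exact method names and signatures are assumptions and may have drifted, so check the headers under //chromeos/services/machine_learning/public/cpp/ before relying on them.

```cpp
// Illustrative sketch only, not the authoritative client API.
#include "base/functional/bind.h"
#include "base/logging.h"
#include "chromeos/services/machine_learning/public/cpp/service_connection.h"
#include "chromeos/services/machine_learning/public/mojom/machine_learning_service.mojom.h"
#include "chromeos/services/machine_learning/public/mojom/model.mojom.h"
#include "mojo/public/cpp/bindings/remote.h"

namespace mlm = chromeos::machine_learning::mojom;

void LoadTestModel(mojo::Remote<mlm::Model>& model) {
  // Bootstraps the D-Bus/Mojo connection to the ML Service daemon (if it is
  // not already up) and asks it to load a builtin model into the sandboxed
  // worker process.
  chromeos::machine_learning::ServiceConnection::GetInstance()
      ->GetMachineLearningService()
      .LoadBuiltinModel(
          mlm::BuiltinModelSpec::New(mlm::BuiltinModelId::TEST_MODEL),
          model.BindNewPipeAndPassReceiver(),
          base::BindOnce([](mlm::LoadModelResult result) {
            LOG_IF(ERROR, result != mlm::LoadModelResult::OK)
                << "Failed to load the builtin TEST_MODEL";
          }));
}
```

Once loaded, the |Model| remote can be asked to create a |GraphExecutor|, whose Execute() call runs the actual TFLite inference; these calls report the per-request metrics described below.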

Note: The sandboxed process hosting TFLite models is currently shared between all users of ML Service. If this isn't acceptable from a security perspective for your model, follow this bug about switching ML Service to having a separate sandboxed process per loaded model.

Metrics

The following metrics are currently recorded by the daemon process in order to understand its resource costs in the wild:

  • MachineLearningService.MojoConnectionEvent: Success/failure of the D-Bus->Mojo bootstrap.
  • MachineLearningService.TotalMemoryKb: Total (shared+unshared) memory footprint every 5 minutes.
  • MachineLearningService.PeakTotalMemoryKb: Peak value of MachineLearningService.TotalMemoryKb per 24 hour period. Daemon code can also call ml::Metrics::UpdateCumulativeMetricsNow() at any time to take a peak-memory observation, to catch short-lived memory usage spikes.
  • MachineLearningService.CpuUsageMilliPercent: Fraction of total CPU resources consumed by the daemon every 5 minutes, in units of milli-percent (1/100,000).
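
As a unit check, here is a hypothetical helper (a sketch, not the daemon's actual code) showing the conversion: 100% of one core corresponds to 100,000 milli-percent.

```cpp
#include <cstdint>

// Hypothetical illustration of the CpuUsageMilliPercent units: 100% of one
// CPU core equals 100,000 milli-percent.
int64_t CpuUsageMilliPercent(double cpu_seconds_used, double window_seconds) {
  return static_cast<int64_t>(cpu_seconds_used / window_seconds * 100000.0);
}

// Example: 15 s of CPU time within a 300 s (5 minute) window is 5% of one
// core, recorded as 5,000.
```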

Additional metrics are recorded in order to understand the resource costs of each request for a particular model:

  • MachineLearningService.|MetricsModelName|.|request|.Event: OK/ErrorType of the request.
  • MachineLearningService.|MetricsModelName|.|request|.TotalMemoryDeltaKb: Total (shared+unshared) memory delta caused by the request.
  • MachineLearningService.|MetricsModelName|.|request|.CpuTimeMicrosec: CPU time usage of the request, which is scaled to one CPU core, i.e. the units are CPU-core*microsec (10 CPU cores for 1 microsec = 1 CPU core for 10 microsec = recorded value of 10).
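
To make the CpuTimeMicrosec scaling concrete, here is a hypothetical helper (not the daemon's implementation) showing that the recorded value is simply CPU time summed across cores:

```cpp
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical helper: the recorded value is the total CPU time consumed by
// the request across all cores, in microseconds.
int64_t CpuTimeMicrosec(const std::vector<int64_t>& per_core_busy_microsec) {
  return std::accumulate(per_core_busy_microsec.begin(),
                         per_core_busy_microsec.end(), int64_t{0});
}

// Example: 10 cores busy for 1 microsec each and 1 core busy for 10 microsec
// both record a value of 10.
```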

|MetricsModelName| is specified in the model's metadata for builtin models, and in |FlatBufferModelSpec| by the client for flatbuffer models. The above |request| can be one of the following:

  • LoadModelResult
  • CreateGraphExecutorResult
  • ExecuteResult (model inference)

For example, a model whose |MetricsModelName| is “MyModel” (a hypothetical name) records its inference events to the histogram “MachineLearningService.MyModel.ExecuteResult.Event”.

The request name “LoadModelResult” is used whether the model is loaded by |LoadBuiltinModel| or by |LoadFlatBufferModel|. This is unambiguous because any particular model is only ever loaded by one of those two methods, never both.

There is also an enum histogram “MachineLearningService.LoadModelResult” which records a generic model specification error when a |LoadBuiltinModel| or |LoadFlatBufferModel| request names an unknown model.

Original design docs

Note that aspects of the design may have evolved since the original design docs were written.