blob: 49451a0533cb76e48197b7c7de75a17d2a8a6195 [file] [log] [blame] [view] [edit]
# Chrome OS Machine Learning Service
## Summary
The Machine Learning (ML) Service provides a common runtime for evaluating
machine learning models on device. The service wraps the TensorFlow Lite runtime
and provides infrastructure for deployment of trained models. The TFLite runtime
runs in a sandboxed process. Chromium communicates with ML Service via a Mojo
interface.
## How to use ML Service
You need to provide your trained models to ML Service first, then load and use
your model from Chromium using the client library provided at
[//chromeos/services/machine_learning/public/cpp/]. See [this
doc](docs/publish_and_use_model.md) for more detailed instructions.
Note: The sandboxed process hosting TFLite models is currently shared between
all users of ML Service. If this isn't acceptable from a security perspective
for your model, follow [this bug](http://crbug.com/933017) about switching ML
Service to having a separate sandboxed process per loaded model.
## Metrics
The following metrics are currently recorded by the daemon process in order to
understand its resource costs in the wild:
* MachineLearningService.MojoConnectionEvent: Success/failure of the
D-Bus->Mojo bootstrap.
* MachineLearningService.TotalMemoryKb: Total (shared+unshared) memory footprint
every 5 minutes.
* MachineLearningService.PeakTotalMemoryKb: Peak value of
MachineLearningService.TotalMemoryKb per 24 hour period. Daemon code can
also call ml::Metrics::UpdateCumulativeMetricsNow() at any time to take a
peak-memory observation, to catch short-lived memory usage spikes.
* MachineLearningService.CpuUsageMilliPercent: Fraction of total CPU resources
consumed by the daemon every 5 minutes, in units of milli-percent (1/100,000).
Additional metrics added in order to understand the resource costs of each
request for a particular model:
* MachineLearningService.|MetricsModelName|.|request|.Event: OK/ErrorType of the
request.
* MachineLearningService.|MetricsModelName|.|request|.TotalMemoryDeltaKb: Total
(shared+unshared) memory delta caused by the request.
* MachineLearningService.|MetricsModelName|.|request|.CpuTimeMicrosec: CPU time
usage of the request, which is scaled to one CPU core, i.e. the units are
CPU-core\*microsec (10 CPU cores for 1 microsec = 1 CPU core for 10 microsec =
recorded value of 10).
|MetricsModelName| is specified in the model's [metadata][model_metadata.cc] for
builtin models and is specified in |FlatBufferModelSpec| by the client for
flatbuffer models.
The above |request| can be following:
* LoadModelResult
* CreateGraphExecutorResult
* ExecuteResult (model inference)
The request name "LoadModelResult" is used no matter the model is loaded by
|LoadBuiltinModel| or by |LoadFlatBufferModel|. This is valid based on the fact
that for a particular model, it is either loaded by |LoadBuiltinModel| or by
|LoadFlatBufferModel| and never both.
There is also an enum histogram "MachineLearningService.LoadModelResult"
which records a generic model specification error event during a
|LoadBuiltinModel| or |LoadFlatBufferModel| request when the model name is
unknown.
## Original design docs
Note that aspects of the design may have evolved since the original design docs
were written.
* [Overall design](https://docs.google.com/document/d/1ezUf1hYTeFS2f5JUHZaNSracu2YmSBrjLkri6k6KB_w/edit#)
* [Mojo interface](https://docs.google.com/document/d/1pMXTG-OIhkNifR2DCPa2bCF0X3jrAM-U6UK230pBv5I/edit#)
* [Deamon\<-\>Chromium IPC implementation](https://docs.google.com/document/d/1EzBKLotvspe75GUB0Tdk_Namstyjm6rJHKvNmRCCAdM/edit#)
* [Model publishing](https://docs.google.com/document/d/1LD8sn8rMOX8y6CUGKsF9-0ieTbl97xZORZ2D2MjZeMI/edit#)
[//chromeos/services/machine_learning/public/cpp/]: https://cs.chromium.org/chromium/src/chromeos/services/machine_learning/public/cpp/service_connection.h
[model_metadata.cc]: https://chromium.googlesource.com/chromiumos/platform2/+/HEAD/ml/model_metadata.cc