The Machine Learning (ML) Service provides a common runtime for evaluating machine learning models on device. The service wraps the TensorFlow Lite (TFLite) runtime and provides infrastructure for the deployment of trained models. The TFLite runtime runs in a sandboxed process. Chromium communicates with ML Service via a Mojo interface.
You need to make your trained model available to ML Service first, then load and use it from Chromium via the client library provided at //chromeos/services/machine_learning/public/cpp/. See this doc for more detailed instructions; a minimal sketch of the client side is shown below.
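The following is a minimal sketch, not production code, of loading a builtin model from the browser process. The class and method names used here (|ServiceConnection::GetInstance()->LoadBuiltinModel|, |BuiltinModelSpec|, |BuiltinModelId::TEST_MODEL|) are assumptions based on one revision of the client library and may have moved (for example behind a |GetMachineLearningService()| accessor); the doc linked above is authoritative.

```c++
// Sketch only: load a builtin model via the ML Service client library.
// Exact class/method names may differ between revisions of the library.
#include "base/functional/bind.h"
#include "chromeos/services/machine_learning/public/cpp/service_connection.h"
#include "chromeos/services/machine_learning/public/mojom/machine_learning_service.mojom.h"
#include "chromeos/services/machine_learning/public/mojom/model.mojom.h"
#include "mojo/public/cpp/bindings/remote.h"

namespace ml = chromeos::machine_learning;

void OnLoadModel(ml::mojom::LoadModelResult result) {
  // On LoadModelResult::OK, proceed to create a GraphExecutor and run
  // inference on the loaded model.
}

void LoadBuiltinModelExample(mojo::Remote<ml::mojom::Model>* model) {
  ml::ServiceConnection::GetInstance()->LoadBuiltinModel(
      ml::mojom::BuiltinModelSpec::New(ml::mojom::BuiltinModelId::TEST_MODEL),
      model->BindNewPipeAndPassReceiver(), base::BindOnce(&OnLoadModel));
}
```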
Note: The sandboxed process hosting TFLite models is currently shared between all users of ML Service. If this isn't acceptable from a security perspective for your model, follow this bug about switching ML Service to having a separate sandboxed process per loaded model.
The following metrics are currently recorded by the daemon process in order to understand its resource costs in the wild:
The following additional metrics are recorded in order to understand the resource costs of each request for a particular model:
|MetricsModelName| is specified in the model's metadata for builtin models, and is specified by the client in |FlatBufferModelSpec| for flatbuffer models. The above |request| is the name of the Mojo request being recorded, for example “LoadModelResult” for model-loading calls.
The request name “LoadModelResult” is used regardless of whether the model is loaded by |LoadBuiltinModel| or by |LoadFlatBufferModel|. This is unambiguous because any particular model is only ever loaded by one of the two methods, never both. A sketch of supplying |MetricsModelName| for a flatbuffer model follows.
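For flatbuffer models, a client might set |MetricsModelName| roughly as sketched below (continuing the sketch above, with the same includes and |ml| namespace alias). The |FlatBufferModelSpec| field names used here (|model_string|, |inputs|, |outputs|, |metrics_model_name|) and the |LoadFlatBufferModel| call are assumptions from one revision of the Mojo interface.

```c++
// Sketch only: load a flatbuffer model, supplying the name used for
// per-model UMA metrics. Field/method names may differ between revisions.
#include <string>
#include <utility>

void LoadFlatBufferModelExample(std::string serialized_tflite_model,
                                mojo::Remote<ml::mojom::Model>* model) {
  auto spec = ml::mojom::FlatBufferModelSpec::New();
  spec->model_string = std::move(serialized_tflite_model);
  spec->inputs["x"] = 0;   // hypothetical input node name -> tensor index
  spec->outputs["y"] = 1;  // hypothetical output node name -> tensor index
  spec->metrics_model_name = "MyModel";  // used as |MetricsModelName| in UMA
  ml::ServiceConnection::GetInstance()->LoadFlatBufferModel(
      std::move(spec), model->BindNewPipeAndPassReceiver(),
      base::BindOnce(&OnLoadModel));  // "LoadModelResult" is recorded either way
}
```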
There is also a generic enum histogram, “MachineLearningService.LoadModelResult”, which records a model specification error event during a |LoadBuiltinModel| or |LoadFlatBufferModel| request when the model name is unknown (so no per-model histogram name can be formed).
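For illustration only, a daemon-side report of that generic histogram could look roughly like the following. The enum and its values are hypothetical; only the histogram name comes from this doc, and the platform2 |MetricsLibrary::SendEnumToUMA| call is used as a representative UMA reporting API.

```c++
// Hypothetical sketch of reporting the generic histogram from the daemon
// when a request names an unknown model. The enum here is illustrative and
// is not the actual enum used by ML Service.
#include <metrics/metrics_library.h>

enum class LoadModelResultEvent {
  kOk = 0,
  kModelSpecError = 1,  // e.g. unknown model name
  kMaxValue = kModelSpecError,
};

void ReportModelSpecError(MetricsLibrary* metrics) {
  metrics->SendEnumToUMA(
      "MachineLearningService.LoadModelResult",
      static_cast<int>(LoadModelResultEvent::kModelSpecError),
      static_cast<int>(LoadModelResultEvent::kMaxValue) + 1);
}
```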
Note that aspects of the design may have evolved since the original design docs were written.