ml/README.md - third_party/platform2 - Git at Google

 # Chrome OS Machine Learning Service

 ## Summary

 The Machine Learning (ML) Service provides a common runtime for evaluating
 machine learning models on device. The service wraps the TensorFlow Lite runtime
 and provides infrastructure for deployment of trained models. The TFLite runtime
 runs in a sandboxed process. Chromium communicates with ML Service via a Mojo
 interface.

 ## How to use ML Service

 You need to provide your trained models to ML Service first, then load and use
 your model from Chromium using the client library provided at
 [//chromeos/services/machine_learning/public/cpp/]. See [this
 doc](docs/publish_and_use_model.md) for more detailed instructions.

 Note: The sandboxed process hosting TFLite models is currently shared between
 all users of ML Service. If this isn't acceptable from a security perspective
 for your model, follow [this bug](http://crbug.com/933017) about switching ML
 Service to having a separate sandboxed process per loaded model.

 ## Metrics

 The following metrics are currently recorded by the daemon process in order to
 understand its resource costs in the wild:

 * MachineLearningService.MojoConnectionEvent: Success/failure of the
   D-Bus->Mojo bootstrap.
 * MachineLearningService.TotalMemoryKb: Total (shared+unshared) memory footprint
   every 5 minutes.
 * MachineLearningService.PeakTotalMemoryKb: Peak value of
   MachineLearningService.TotalMemoryKb per 24 hour period. Daemon code can
   also call ml::Metrics::UpdateCumulativeMetricsNow() at any time to take a
   peak-memory observation, to catch short-lived memory usage spikes.
 * MachineLearningService.CpuUsageMilliPercent: Fraction of total CPU resources
   consumed by the daemon every 5 minutes, in units of milli-percent (1/100,000).

 Additional metrics added in order to understand the resource costs of each
 request for a particular model:

 * MachineLearningService.|MetricsModelName|.|request|.Event: OK/ErrorType of the
   request.
 * MachineLearningService.|MetricsModelName|.|request|.TotalMemoryDeltaKb: Total
   (shared+unshared) memory delta caused by the request.
 * MachineLearningService.|MetricsModelName|.|request|.CpuTimeMicrosec: CPU time
   usage of the request, which is scaled to one CPU core, i.e. the units are
   CPU-core\*microsec (10 CPU cores for 1 microsec = 1 CPU core for 10 microsec =
   recorded value of 10).

 |MetricsModelName| is specified in the model's [metadata][model_metadata.cc] for
 builtin models and is specified in |FlatBufferModelSpec| by the client for
 flatbuffer models.
 The above |request| can be following:

 * LoadModelResult
 * CreateGraphExecutorResult
 * ExecuteResult (model inference)

 The request name "LoadModelResult" is used no matter the model is loaded by
 |LoadBuiltinModel| or by |LoadFlatBufferModel|. This is valid based on the fact
 that for a particular model, it is either loaded by |LoadBuiltinModel| or by
 |LoadFlatBufferModel| and never both.

 There is also an enum histogram "MachineLearningService.LoadModelResult"
 which records a generic model specification error event during a
 |LoadBuiltinModel| or |LoadFlatBufferModel| request when the model name is
 unknown.

 ## Original design docs

 Note that aspects of the design may have evolved since the original design docs
 were written.

 * [Overall design](https://docs.google.com/document/d/1ezUf1hYTeFS2f5JUHZaNSracu2YmSBrjLkri6k6KB_w/edit#)
 * [Mojo interface](https://docs.google.com/document/d/1pMXTG-OIhkNifR2DCPa2bCF0X3jrAM-U6UK230pBv5I/edit#)
 * [Deamon\<-\>Chromium IPC implementation](https://docs.google.com/document/d/1EzBKLotvspe75GUB0Tdk_Namstyjm6rJHKvNmRCCAdM/edit#)
 * [Model publishing](https://docs.google.com/document/d/1LD8sn8rMOX8y6CUGKsF9-0ieTbl97xZORZ2D2MjZeMI/edit#)


 [//chromeos/services/machine_learning/public/cpp/]: https://cs.chromium.org/chromium/src/chromeos/services/machine_learning/public/cpp/service_connection.h
 [model_metadata.cc]: https://chromium.googlesource.com/chromiumos/platform2/+/HEAD/ml/model_metadata.cc
	# Chrome OS Machine Learning Service

	## Summary

	The Machine Learning (ML) Service provides a common runtime for evaluating
	machine learning models on device. The service wraps the TensorFlow Lite runtime
	and provides infrastructure for deployment of trained models. The TFLite runtime
	runs in a sandboxed process. Chromium communicates with ML Service via a Mojo
	interface.

	## How to use ML Service

	You need to provide your trained models to ML Service first, then load and use
	your model from Chromium using the client library provided at
	[//chromeos/services/machine_learning/public/cpp/]. See [this
	doc](docs/publish_and_use_model.md) for more detailed instructions.

	Note: The sandboxed process hosting TFLite models is currently shared between
	all users of ML Service. If this isn't acceptable from a security perspective
	for your model, follow [this bug](http://crbug.com/933017) about switching ML
	Service to having a separate sandboxed process per loaded model.

	## Metrics

	The following metrics are currently recorded by the daemon process in order to
	understand its resource costs in the wild:

	* MachineLearningService.MojoConnectionEvent: Success/failure of the
	D-Bus->Mojo bootstrap.
	* MachineLearningService.TotalMemoryKb: Total (shared+unshared) memory footprint
	every 5 minutes.
	* MachineLearningService.PeakTotalMemoryKb: Peak value of
	MachineLearningService.TotalMemoryKb per 24 hour period. Daemon code can
	also call ml::Metrics::UpdateCumulativeMetricsNow() at any time to take a
	peak-memory observation, to catch short-lived memory usage spikes.
	* MachineLearningService.CpuUsageMilliPercent: Fraction of total CPU resources
	consumed by the daemon every 5 minutes, in units of milli-percent (1/100,000).

	Additional metrics added in order to understand the resource costs of each
	request for a particular model:

	* MachineLearningService.\|MetricsModelName\|.\|request\|.Event: OK/ErrorType of the
	request.
	* MachineLearningService.\|MetricsModelName\|.\|request\|.TotalMemoryDeltaKb: Total
	(shared+unshared) memory delta caused by the request.
	* MachineLearningService.\|MetricsModelName\|.\|request\|.CpuTimeMicrosec: CPU time
	usage of the request, which is scaled to one CPU core, i.e. the units are
	CPU-core\*microsec (10 CPU cores for 1 microsec = 1 CPU core for 10 microsec =
	recorded value of 10).

	\|MetricsModelName\| is specified in the model's [metadata][model_metadata.cc] for
	builtin models and is specified in \|FlatBufferModelSpec\| by the client for
	flatbuffer models.
	The above \|request\| can be following:

	* LoadModelResult
	* CreateGraphExecutorResult
	* ExecuteResult (model inference)

	The request name "LoadModelResult" is used no matter the model is loaded by
	\|LoadBuiltinModel\| or by \|LoadFlatBufferModel\|. This is valid based on the fact
	that for a particular model, it is either loaded by \|LoadBuiltinModel\| or by
	\|LoadFlatBufferModel\| and never both.

	There is also an enum histogram "MachineLearningService.LoadModelResult"
	which records a generic model specification error event during a
	\|LoadBuiltinModel\| or \|LoadFlatBufferModel\| request when the model name is
	unknown.

	## Original design docs

	Note that aspects of the design may have evolved since the original design docs
	were written.

	* [Overall design](https://docs.google.com/document/d/1ezUf1hYTeFS2f5JUHZaNSracu2YmSBrjLkri6k6KB_w/edit#)
	* [Mojo interface](https://docs.google.com/document/d/1pMXTG-OIhkNifR2DCPa2bCF0X3jrAM-U6UK230pBv5I/edit#)
	* [Deamon\<-\>Chromium IPC implementation](https://docs.google.com/document/d/1EzBKLotvspe75GUB0Tdk_Namstyjm6rJHKvNmRCCAdM/edit#)
	* [Model publishing](https://docs.google.com/document/d/1LD8sn8rMOX8y6CUGKsF9-0ieTbl97xZORZ2D2MjZeMI/edit#)


	[//chromeos/services/machine_learning/public/cpp/]: https://cs.chromium.org/chromium/src/chromeos/services/machine_learning/public/cpp/service_connection.h
	[model_metadata.cc]: https://chromium.googlesource.com/chromiumos/platform2/+/HEAD/ml/model_metadata.cc