Expose system debugging information over DBus to allow better sandboxing of the user session and more detailed diagnostic availability through Chrome.
Currently, our debugging and diagnostic tools (specifically those implemented in crosh and in chrome://system) work by shelling out to run binaries. This exposes a lot of attack surface via crosh (and, to a lesser extent, Chrome) and forces us to allow those contexts to execute programs and read files, which they otherwise have no need to do. Another concern is that some of these diagnostics (for example, crosh's 'ping' command) rely on executing setuid binaries. Removing the ability to use setuid binaries altogether from the user session and from crosh removes a lot of attack surface that is otherwise exposed in the linker and kernel.
Safely expose system debugging information over DBus. This lets us restrict contexts that would otherwise need very broad access to exec() and setuid binaries to simply communicating over DBus.
The debug daemon will be implemented as a single daemon, running as an unprivileged user, communicating over DBus. It will accept commands over DBus and either compute the information itself or run a helper program, then hand the result back over DBus. The debug daemon does not cache results for repeated requests. The debug daemon will run under strict seccomp system-call filtering rules, which will reduce the kernel ABI exposed to debugd and its helpers.
The debug daemon will present its functionality as a single object at a fixed path /org/chromium/debugd, implementing the interface described in /dbus_bindings/org.chromium.debugd.xml. All the debugd methods can be synchronous, since the daemon is used only to fetch debugging info. We don't need to worry about concurrent users: it is unlikely that the user will run two debug commands from two different crosh instances at once, and even if they do, the commands will be queued. Making chrome://system slower, however, is something we do need to be concerned about. An example method might be:
CellularStatus : () -> a{sv}
“CellularStatus takes nothing and returns a map from string to variant.”
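To make the signature notation concrete, here is a hedged sketch (in Python terms, not actual debugd output) of the shape a reply with DBus signature a{sv} has: a map from string keys to values of varying types. The key names and values are illustrative assumptions, not the real CellularStatus fields.

```python
# Hypothetical shape of a CellularStatus reply. The DBus signature a{sv}
# is an array of dict entries mapping string keys to variant (arbitrarily
# typed) values -- in Python terms, a dict from str to anything.
status = {
    "carrier": "Test Carrier",   # s (string)
    "signal_strength": 87,       # i (int32)
    "roaming": False,            # b (boolean)
}

# Every key must be a string; the values need not share a type.
assert all(isinstance(k, str) for k in status)
```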
The implementation is documented in /doc/implementation. In general, the debug daemon blocks inside DBus, waiting for incoming messages; when it receives a message, it looks up the incoming message name in a method table and calls the associated function. The function gathers information and replies to the DBus message as needed.
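The method-table dispatch described above can be sketched as follows. This is an illustrative Python model, not debugd's actual (C++) code; the handler and its return values are assumptions.

```python
# Sketch of debugd's dispatch loop: incoming DBus method names are looked
# up in a fixed table and the associated handler computes the reply.

def cellular_status():
    # Hypothetical handler; the real one would query the modem stack.
    return {"carrier": "unknown", "roaming": False}

METHOD_TABLE = {
    "CellularStatus": cellular_status,
}

def dispatch(method_name):
    handler = METHOD_TABLE.get(method_name)
    if handler is None:
        # In real debugd this would become a DBus error reply.
        raise KeyError("unknown method: %s" % method_name)
    return handler()
```

Adding a new method then amounts to writing a handler and registering it in the table.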
The debug daemon also has a list of helpers, fixed at compile time; when debugd starts up, it creates a new tmpfs, visible only to it and its descendants, and mounts it at /mnt/debugd. Each of the helper programs is then launched and can spool information into the tmpfs as desired, presumably for collection by some method inside debugd. Some helpers are launched on demand instead of running persistently. Helper sources live in /src/helpers.
Files stored in the tmpfs can be written as JSON. Doing so makes it easier to write helpers, since a utility function is available for "reply to this DBus message with this JSON structure". Protocol buffers are unsuitable for this because they are not self-describing; we would need to compile a separate protobuf deserializer for each method into debugd and choose which one to use for each file.
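The helper-spools-JSON, debugd-reads-it-back pattern can be sketched like this. The file names and data are made up, and a temporary directory stands in for the private tmpfs at /mnt/debugd:

```python
import json
import os
import tempfile

# A temp directory standing in for the tmpfs mounted at /mnt/debugd.
spool_dir = tempfile.mkdtemp()

def helper_write(name, data):
    # A helper spools its output into the shared tmpfs as JSON.
    with open(os.path.join(spool_dir, name + ".json"), "w") as f:
        json.dump(data, f)

def debugd_read(name):
    # debugd reads the spooled file back when answering a DBus request.
    # JSON is self-describing, so no per-method deserializer is needed.
    with open(os.path.join(spool_dir, name + ".json")) as f:
        return json.load(f)

helper_write("modem", {"state": "connected", "signal": 42})
```

The self-describing property is what makes the generic "reply with this JSON structure" utility possible.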
Some methods have to return data structures that are not simple (for example, the 'GetModemStatus' method). For these methods, we have three choices for moving the complex data structure across DBus; of these, we use JSON: although it makes more work for debugd, it makes it easier for Chrome and crosh to use debugd.
This daemon will have its own attack surface which we need to take care of. Argument sanitization is of paramount importance, although using execve() instead of /bin/sh to run commands will remove an entire class of attacks that crosh currently has.
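The two mitigations mentioned above, strict argument sanitization plus execve()-style argv execution rather than a /bin/sh command line, can be sketched as follows. The hostname pattern and the ping flags are assumptions for illustration:

```python
import re

# Accept only plausible host arguments: must start with an alphanumeric
# character (so it cannot be mistaken for an option) and contain only
# hostname/IPv6 characters. This is an illustrative pattern, not debugd's.
HOST_RE = re.compile(r"^[A-Za-z0-9][A-Za-z0-9.:-]{0,252}$")

def ping_argv(host):
    if not HOST_RE.match(host):
        raise ValueError("rejected host argument: %r" % host)
    # Build an argv list to hand to execve() directly. Because no shell is
    # involved, metacharacters in `host` cannot inject extra commands.
    return ["ping", "-c", "4", host]

# subprocess.run(ping_argv("127.0.0.1")) would then execute it shell-free.
```

An argument like `127.0.0.1; reboot` fails the pattern check, and even an unvalidated argument could not be interpreted as shell syntax, which is the class of attack this removes.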
There are some security mitigations we can apply to debugd itself:
There are some mitigations we can't apply yet:
We can broadly divide debugd's functionality into two classes for testing purposes: functions that generate new information (like ping or traceroute), and functions that return already-generated information (like reading information out of sysfs).
Functions that generate new information are often sensitive to the surrounding hardware/network environment - for example, pinging an outside host relies on a working network. We can sometimes test these functions by relying only on things we know exist in any reasonable test environment (like pinging 127.0.0.1 and making sure we get properly-formatted output), but some of them (3G status, for example) depend on hardware state, and for these we need a human to verify that the output lines up with the hardware.
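The "properly-formatted output" check for the loopback ping can be sketched as below. The regex encodes an assumption about common iputils-style ping output, not a debugd contract; a real test would tune it to the shipped ping:

```python
import re

# Check only that the output is well-formed ping output for 127.0.0.1;
# deliberately ignore the actual timing values, which vary per run.
PING_LINE = re.compile(
    r"\d+ bytes from 127\.0\.0\.1: icmp_seq=\d+ ttl=\d+ time=[\d.]+ ms"
)

def looks_like_ping_output(text):
    return PING_LINE.search(text) is not None
```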
Functions that return already-generated information can be tested by using minijail's chroot-and-bind functionality to fake the already-generated information, then testing debugd's returns against the known fake data.
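The shape of such a test can be sketched as follows. In the real test, minijail bind-mounts a tree of fake files over the real sysfs path; here the path is simply parameterized, and the file name and value are made-up examples:

```python
import os
import tempfile

def read_battery_charge(sysfs_root):
    # Stand-in for a debugd function that returns already-generated
    # information by reading it out of sysfs.
    with open(os.path.join(sysfs_root, "charge_now")) as f:
        return int(f.read().strip())

# Build a directory of known fake data; minijail's chroot-and-bind would
# make this appear at the real sysfs location inside the jail.
fake_sysfs = tempfile.mkdtemp()
with open(os.path.join(fake_sysfs, "charge_now"), "w") as f:
    f.write("123456\n")
```

The test then asserts that the function's return matches the planted fake data exactly, with no dependence on real hardware.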
ellyjones: add more detail here