# Design doc: debugd
## Objective
Expose system debugging information over DBus to allow better sandboxing
of the user session and more detailed diagnostic availability through
Chrome.
## Background
Currently, our debugging and diagnostic tools (specifically those
implemented in `crosh` and in `chrome://system`) work by shelling out
to run binary code. This exposes a lot of surface area via crosh (and
Chrome, to a lesser extent) and forces us to allow those contexts to
execute programs and read files, which they otherwise have no need to
do. Another concern is that some of these diagnostics (for example,
crosh's 'ping' command) rely on executing setuid binaries. Removing
the ability to use setuid altogether from the user session and from
crosh removes a lot of attack surface that is otherwise exposed in the
linker and kernel.
## Overview
Safely expose system debugging information over DBus. This lets us
confine contexts that would otherwise need very broad access to exec()
and setuid binaries to communicating over DBus alone.
## Detailed Design
The debug daemon will be implemented as a single daemon, running as an
unprivileged user, communicating over DBus. It will accept commands over DBus
and either compute the information itself or run a helper program, then hand the
result back over DBus. The debug daemon does not cache results for repeated
requests. The debug daemon will run under strict seccomp system-call
filtering rules, which will reduce the kernel ABI exposed to debugd and
its helpers.
The debug daemon will present its functionality as a single object at a
fixed path `/org/chromium/debugd` implementing the interface described
in [`/dbus_bindings/org.chromium.debugd.xml`][iface]. All the debugd
methods can be synchronous, since debugd is used only to fetch debugging info. We
don't need to worry about concurrent users, since it is unlikely that the user
will run two debug commands from two different crosh instances at once,
and even if they do, the commands will be queued. Making `chrome://system`
slower is something we do need to be concerned about. An example method
might be:
`CellularStatus : () -> a{sv}`
Read this as: "CellularStatus takes nothing and returns a map from string to variant."
The implementation is documented in [`/doc/implementation`][impl]. In general,
the debug daemon blocks inside DBus, waiting for incoming messages; when
it receives a message, it looks up the incoming message name in a method
table and calls the associated function. The function gathers
information and replies to the DBus message as needed.
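
The lookup-and-call step described above can be sketched as follows. This is a minimal illustration in Python, not debugd's actual code: the handler body, its return values, and the error reply shape are all assumptions made for the example.

```python
from typing import Any, Callable, Dict


def cellular_status() -> Dict[str, Any]:
    """Hypothetical handler; a real one would query the modem."""
    return {"carrier": "unknown", "connected": False}


# Method table mapping incoming message names to handler functions.
METHOD_TABLE: Dict[str, Callable[[], Dict[str, Any]]] = {
    "CellularStatus": cellular_status,
}


def dispatch(method_name: str) -> Dict[str, Any]:
    """Look up the message name and call the associated function."""
    handler = METHOD_TABLE.get(method_name)
    if handler is None:
        # Assumption: a real daemon would send a DBus error reply here.
        return {"error": f"unknown method: {method_name}"}
    return handler()
```

Keeping the table as plain data means adding a method is a one-line change, and unknown names fail in a single, predictable place.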
The debug daemon also has a list of helpers, fixed at compile time; when
debugd starts up, it creates a new tmpfs, visible only to it and its
descendants, and mounts it at `/mnt/debugd`. Each of the helper programs
is then launched, and can spool information into the tmpfs as desired,
presumably for collection by some method inside debugd. Some helpers are
launched as needed instead of running persistently. Helper sources live
in [`/src/helpers`](../src/helpers/).
Files stored in the tmpfs can be written as JSON. Doing so makes it
easier to write helpers, since a utility function is available for
"reply to this DBus message with this JSON structure". Protocol buffers
are unsuitable for this because they are not self-describing; we would
need to compile separate protobuf deserializers for each method into
debugd and choose which one to use for each file.
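
A rough sketch of the spooling idea, written in Python for brevity: a helper writes a JSON file into the spool directory, and the daemon loads it without needing a per-method schema. The function names, the `.json` suffix, and the write-then-rename step are assumptions for illustration; the spool directory is passed in rather than hard-coded to `/mnt/debugd`.

```python
import json
import os
import tempfile
from typing import Any


def spool_write(spool_dir: str, name: str, data: Any) -> str:
    """Helper side: write `data` as JSON into the spool directory."""
    final_path = os.path.join(spool_dir, name + ".json")
    fd, tmp_path = tempfile.mkstemp(dir=spool_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(data, f)
    # Rename into place so a reader never sees a partially written file.
    os.replace(tmp_path, final_path)
    return final_path


def spool_read(spool_dir: str, name: str) -> Any:
    """Daemon side: load a spooled file; JSON is self-describing."""
    with open(os.path.join(spool_dir, name + ".json")) as f:
        return json.load(f)
```

Because the reader needs no compiled-in deserializer, the same `spool_read` works for every helper's output, which is the property the paragraph above relies on.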
## Returning Complex Data Structures
Some methods have to return data structures that are not simple (for
example, the 'GetModemStatus' method). For these methods, we have three
choices for moving the complex data structure across DBus:
1. Transport them in DBus' wire format directly.
    * P: No conversions needed in debugd
    * P: Everyone talking to us implicitly speaks it
    * C: Chrome needs to turn DBus wire format into its internal Value type
      for use/display
2. Transport them as protocol buffers.
    * P: Typesafe on the wire
    * C: Need to convert DBus to protobuf in debugd
    * C: Need a C/C++ helper for crosh to print these
    * C: Chrome needs to turn these into its internal Value type
3. Transport them as JSON.
    * P: Chrome can serialize/deserialize directly.
    * P: Human-readable; can be shown directly to user by crosh
    * P: Parseable from JavaScript; can manipulate it from an
      extension.
    * C: Typesafe only at endpoints
    * C: Need to convert DBus to JSON in debugd
We use JSON; although it makes more work for debugd, it makes it easier for
Chrome and crosh to use debugd.
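
The "convert DBus to JSON in debugd" cost might look roughly like the sketch below. It assumes the DBus reply has already been decoded into ordinary Python values (dicts, lists, byte strings, scalars). The function names and the choice to hex-encode byte arrays are assumptions for illustration, not debugd's behaviour.

```python
import json
from typing import Any


def to_jsonable(value: Any) -> Any:
    """Recursively convert a decoded DBus-like value into JSON-safe types."""
    if isinstance(value, dict):
        # DBus dict keys may be non-strings; JSON object keys must be strings.
        return {str(k): to_jsonable(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_jsonable(v) for v in value]
    if isinstance(value, (bytes, bytearray)):
        # Assumption: byte arrays have no JSON equivalent, so hex-encode them.
        return bytes(value).hex()
    if isinstance(value, (str, int, float, bool)) or value is None:
        return value
    raise TypeError(f"cannot convert {type(value).__name__} to JSON")


def reply_json(value: Any) -> str:
    """Serialize a converted value for transport as a single string."""
    return json.dumps(to_jsonable(value), sort_keys=True)
```

The conversion is lossy in places (tuples become arrays, bytes become strings), which is one concrete form of the "typesafe only at endpoints" drawback listed above.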
## Security Considerations
This daemon will have its own attack surface which we need to take care
of. Argument sanitization is of paramount importance, although using
execve() instead of /bin/sh to run commands will remove an entire class
of attacks that crosh currently has.
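
As an illustration of both points, the sketch below validates its arguments and then runs the program from an argument list rather than a shell string. It is written in Python for brevity; the function names, the literal-IP-only rule, and the count range are assumptions for the example rather than debugd policy.

```python
import ipaddress
import subprocess
from typing import List


def build_ping_argv(host: str, count: int = 1) -> List[str]:
    """Validate arguments and return an argv list, never a shell string."""
    # Accept only literal IP addresses in this sketch; anything else
    # (including shell metacharacters) raises ValueError.
    addr = ipaddress.ip_address(host)
    if not 1 <= count <= 10:
        raise ValueError("count out of range")
    return ["/bin/ping", "-c", str(count), str(addr)]


def run_ping(host: str) -> str:
    # Passing a list with the default shell=False makes subprocess exec the
    # program directly, so no shell ever interprets the arguments.
    result = subprocess.run(build_ping_argv(host), capture_output=True,
                            text=True, timeout=10)
    return result.stdout
```

With this shape, an input such as `127.0.0.1; rm -rf /` is rejected during validation, and even an unvalidated argument would reach the program as a single argv entry rather than as shell syntax.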
There are some security mitigations we can apply to debugd itself:
1. We can drop to a different uid/gid.
    * If we use a dedicated gid for debugd, we can take a lot of
      files that are currently world-readable and instead make them
      root:debugd 0640.
2. We can chroot and put ourselves in a bare vfs namespace.
    * If we do this, we have to bring the things we need into our
      namespace with us, although we can make their mounts
      read-only.
    * This doesn't really buy us anything over seccomp-filter if
      our policy is appropriately tight, but eventually we might
      need to allow writes for some debug tools, which would make
      this a good line of defense.
3. We can seccomp-sandbox ourselves with syscall filtering, since we
   should only need to do a fairly restricted set of things.
    * This will probably involve a lot of effort. Tracking down
      which syscalls various helper programs use and keeping the
      filter policy up-to-date will take time.
    * The decrease in kernel and platform (filesystem permissions,
      etc.) attack surface gained is worth it.
4. We can set rlimits, if we feel so inclined.
    * The particular gain we might get here is that we can restrict
      the number of outstanding helper programs we can have running
      at a time, which might avoid systemwide denial-of-service
      attacks.
    * On the other hand, it opens us up to much easier denial of
      service against the debug daemon. The debug daemon would have
      to kill helper programs that ran past a certain time limit,
      but perhaps it has to do this already.
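
The time-limit point in the last item can be sketched as below. This Python example covers only the "kill helpers that overrun" part and does not set any rlimits; `run_helper` and its return convention are hypothetical.

```python
import subprocess
from typing import List, Optional


def run_helper(argv: List[str], timeout_s: float) -> Optional[str]:
    """Run a helper; return its stdout, or None if it overran and was killed."""
    try:
        result = subprocess.run(argv, capture_output=True, text=True,
                                timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child and waits for it before re-raising,
        # so no helper process is left behind at this point.
        return None
    return result.stdout
```

A caller would treat `None` as "helper timed out" and reply with an error, which bounds how long any single request can tie up the daemon.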
There are some mitigations we can't apply yet:
1. We can't enable SECURE_NOROOT, since some of our helper programs
(e.g. /bin/ping) are setuid. Fixing this is going to require some
fairly major legwork.
2. We can't use a pid namespace, because this destroys the crash
reporter on 2.6.38. There's a patch floating around to fix this that
we'd need to apply.
3. We can't use a network namespace, because some of our tools (ping,
traceroute) need access to the real network.
## Testing Plan
We can broadly divide debugd's functionality into two classes for
testing purposes: functions that generate new information (like ping or
traceroute), and functions that return already-generated information
(like reading information out of sysfs).
Functions that generate new information are often sensitive to the
surrounding hardware/network environment - for example, pinging an
outside host relies on working networking and such. We can sometimes
test these functions by relying only on things we know exist in any reasonable
test environment (like pinging 127.0.0.1 and making sure we get
properly-formatted output), but some of them (3g status, for example)
rely on hardware state, and for these we need a human to ensure the
output lines up with hardware.
Functions that return already-generated information can be tested by
using minijail's chroot-and-bind functionality to fake the
already-generated information, then testing debugd's returns against the
known fake data.
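
The fake-data approach can be illustrated with a lighter stand-in for chroot-and-bind: make the filesystem root a parameter and point it at a temporary tree in tests. This is a sketch under that assumption, not a use of minijail itself, and both function names are hypothetical.

```python
import os
import tempfile


def read_sysfs_value(relative_path: str, root: str = "/") -> str:
    """Read one value from sysfs; `root` lets tests substitute a fake tree."""
    with open(os.path.join(root, "sys", relative_path)) as f:
        return f.read().strip()


def make_fake_sysfs(root: str, relative_path: str, contents: str) -> None:
    """Populate a fake sysfs entry under `root` with known data."""
    full_path = os.path.join(root, "sys", relative_path)
    os.makedirs(os.path.dirname(full_path), exist_ok=True)
    with open(full_path, "w") as f:
        f.write(contents)


if __name__ == "__main__":
    fake_root = tempfile.mkdtemp()
    make_fake_sysfs(fake_root, "class/net/eth0/operstate", "up\n")
    # The function under test sees only the known fake data.
    assert read_sysfs_value("class/net/eth0/operstate", root=fake_root) == "up"
```

A real chroot-and-bind setup has the advantage that the code under test needs no `root` parameter at all; the sketch trades that fidelity for something runnable without privileges.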
ellyjones: add more detail here
[iface]: ../dbus_bindings/org.chromium.debugd.xml
[impl]: implementation.md