Crash Reporter Security

As a tool that usually runs as root in order to access the data in various log and proc and sys files, we have to be extra careful while also processing untrusted data -- the coredumps and process state are entirely under the control of the untrusted processes. This doesn‘t mean we can’t take steps to reduce privilege and access to runtime state we don't need, but we do have to be careful as we reduce access to not also break our ability to collect data.

Status

With that out of the way, the current state of the world: all processes run as root all the time. Oops. That doesn't mean we want to stay here -- this section includes various ideas, suggestions, and brainstorms for how we can reduce even further.

Our ultimate goals at all times are to run with minimal access to resources and capabilities. Further splitting up these tools into discrete helpers/components will probably help with keeping clear lines for which tools need which privileges.

anomaly_detector needs read access to /var/log/messages and write access to /var/spool/crash/. crash_reporter_logs.conf needs read access to dmesg and some sysfs PCI registers for the kernel_warning_collector. At the very least, we should be able to have this minijail itself.

crash_sender runs in a limited minijail, but still as root, and accesses servers over the network. It needs write access to all the crash spool directories, as well as its own internal /var/lib/crash_sender/ state, and /run/lock/crash_sender. It only needs read access to /run/crash_reporter/ for testing state. Otherwise, the only mutable paths it needs are entirely self contained in the spool paths, so dropping access to everything else should be doable. Another idea is to fork a child for parsing the crash reports and communicating with the network -- it would be able to drop privs and run under a restrictive seccomp filter. The only content it would need access to is the specific set of crash reports.

crash_reporter is a bit of a beast. Not only does it need write access to /var/lib/crash_reporter/ and the crash spool directories, but it also needs read access to many /proc/ files (especially the /proc/<pid>/ of the crashing process which usually have read restrictions in place based on the crashing process's uid/gid), as well as all the random supplemental log sources in crash_reporter_logs.conf. Perhaps during early startup, we setuid to an account for most work, and we only setuid(0) again when we need to read a restricted path, and then we setuid back. Or we drop all caps except CAP_DAC_OVERRIDE assuming the kernel still allows us to access all the paths we need to. For helper programs we run (most notably core2md), we should be able to run them in a more restrictive environment as they are data-in/data-out tools.

Access to the spool dirs is needed by only these tools and feedback reports (which are gathered as root by debugd). So we should be able to change all of these from root:root to a new dedicated account like crash:crash, as well as using that account for dropping privs. This is tracked in https://crbug.com/441427.

Cryptohome Protected Crash Reports

Some crash report are considered to be especially likely to contain sensitive user information and are stored in the cryptohome. Right now these are stored in /home/user/<user_hash>/crash, but /home/user/<user_hash> is only traversable by user chronos and group chronos-access. This means that anything writing into that spool directory currently must also acquire permission to read large parts of the user data stored in the cryptohome, which limits our ability to have lower privilege processes record crashes here.

Worse, it creates a potential privilege escalation vector because crash_sender may end up processing reports with a higher privilege level then is required to write to the directory. This means a lower privileged process could set up the spool directory in an unexpected way to trick crash_sender by e.g. creating symlinks, or modifying it at the same time as crash_sender is accessing it. This has been a source of many historical vulnerabilities.

Fortunately, we now have another set of paths that form part of the cryptohome, mounted under /home/root/<user_hash>. Directories under this path are created there by cryptohomed and bind mounted to /var/daemon-store/*/<user_hash>, which can be traversed to by any process. Therefore, we can create a new crash sub-directory owned by crash:crash-user-access and processes that need to produce encrypted crash reports can be given access to only this path by making them members of crash-user-access. crash_sender will also be able to access this directory while having strictly less privilege then any process that creates crash reports. The new /home/root/<user_hash>/crash directory should eventually replace the /home/user/<user_hash>/crash directory entirely.

This leaves some residual risk that one crash reporting process will compromise another via this shared directory. Crash reporters interact with it in two ways. Once by reading the filenames to determine if the directory is full, which is unlikely to be exploitable by writing things to the directory, and later by writing out files into the directory. This could be exploited by tricking the process into writing to a symlink, but most (possibly all) writes to spool directories open files using O_CREAT|O_EXCL to ensure they only write to newly created ordinary files.

Historical Vulnerabilities

Here we cover some vulnerabilities that were found in crash-reporter. Hopefully by understanding the types of bugs that hit us in the past, we can design a system that disables entire classes of bugs rather than simply fixing each of these in a one-off fashion.

crbug.com/678365 (info leak)

https://crbug.com/678365

This bug allowed the chronos user to read any file as root (including memory of processes via /proc/ symlinks).

The scenario is as follows:

chronos user has full read/write access to the path /home/chronos/Consent To Send Stats.
chronos user deletes the consent file and symlinks it to desired file.
chronos user creates a valid crash (such as visiting chrome://crash).
chronos user triggers crash uploading (a valid & authorized request).
crash_sender (as root) reads that path to get the UID.
crash_sender passes the value read as a command line option to curl.
chronos user is able to observe the commandline of all processes.
chronos user simply polls and watches /proc/*/cmdline files.

The fix for this was a directed one:

https://chromium-review.googlesource.com/422851: (1) metrics_client no longer follows symlinks, (2) metrics_client now validates the input file, and (3) all other projects replaced their own ad-hoc logic with a single robust implementation (calling metrics code).

There are a few alternative ways this could have been addressed, albeit with a lot more disruption to the overall system.

https://crbug.com/359207: Lock down /proc so users can only see their own processes.
https://crbug.com/655606: Disallow following symlinks in the kernel on writable partitions (stateful).

crbug.com/766275 (priv escalation)

https://crbug.com/766275

This bug allowed any user on the system to get root execution.

The scenario is as follows:

crash_reporter uses /tmp/crash_reporter/<pid>/ with hardcoded filenames (e.g. environ) to hold intermediate state.
User starts a process that does:
- Creates /home/chronos/drop/.
- Symlinks /home/chronos/drop/environ to /proc/sys/kernel/core_pattern.
- Symlinks /tmp/crash_reporter/getpid() to /home/chronos/drop.
- Modifies own environment with content to be written to core_pattern (e.g. |/bin/bash /home/...).
- Forces self to crash.
crash_reporter is run by kernel as root to handle the crash.
crash_reporter sees /tmp/crash_reporter/<pid>/ already exists.
crash_reporter copies /proc/<pid>/environ to /tmp/crash_reporter/<pid>/environ.
This dereferences the symlink and writes the content to core_pattern.
crash_reporter finishes its execution.
User triggers another crash.
Kernel executes user's custom code and they now have root.

A few directed fixes went in first:

https://chromium-review.googlesource.com/673485: /tmp/crash_reporter/ is created and owned as root during early boot so no one else could hijack it.
https://chromium-review.googlesource.com/716878 & https://chromium-review.googlesource.com/721564: Move some /tmp state to /run which is only accessible by root and the daemon that initialized the specific subdir (e.g. crash-reporter).

After that, some class fixes went in:

https://chromium-review.googlesource.com/672943 & https://chromium-review.googlesource.com/678294 & https://chromium-review.googlesource.com/723869: Have crash_reporter & crash_sender always run in unique mount namespaces and create unique & empty /tmp mounts. Now any attacks via shared /tmp are impossible.
We also mount /proc read-only so any write attacks to sysctl paths are impossible.
https://chromium-review.googlesource.com/753406: Make /proc/self/mem read-only for all processes (a noexec bypass).

There are a few class fixes that would help with this:

https://crbug.com/655606: Disallow following symlinks in the kernel on writable partitions (stateful).
https://crbug.com/873733: Following symlinks from sticky dirs are disallowed, but only when the last component of the path is a symlink. If an intermediate path is a symlink, it is still traversed.

crbug.com/817993 (priv escalation)

https://crbug.com/817993

This bug allowed the chronos user to get root execution.

The scenario is as follows:

User creates a valid crash report (e.g. visit chrome://crash).
User edits the .meta file created under their chronos-owned /home/chronos/<user_hash>/crash/ spool directory.
Add a line starting with upload_ and followed by a sed script. e.g. /p;s^.*^setsid${IFS}bash${IFS}<a-shell-script>${IFS}\&^ep;/=1.
crash_sender (as root) parses that .meta file.
The upload_... key is extracted and passed to sed verbatim.
The e sed command executes an arbitrary command via system().

The quick directed fix was to validate all input .meta reports:

https://chromium-review.googlesource.com/944788: If the .meta file contains any bad content, we just delete the report.

After that, some class fixes went in:

https://chromium-review.googlesource.com/949523: Hard enable sed's existing “sandbox” mode all the time for boards. This makes the r/w/e commands (read file/write file/execute) always unavailable and prevents arbitrary code exec ever again.
https://crbug.com/767182: Add a sandbox mode to awk and always enable it for all boards. This disables system(), file redirects, and pipelines.

There are a few class fixes that would help with this:

If the code parsing the report wasn‘t run as root (and it doesn’t really have to), then that'd limit the damage of arbitrary code exec.
https://crbug.com/391887: Rewrite crash_sender in C++ to avoid all usage of external programs like sed or awk and thus any arbitrary code execution they introduce.

crbug.com/866895 (priv escalation)

https://crbug.com/866895

This bug allowed people to chown arbitrary paths to chronos as root.

The scenario is as follows:

User deletes their crash spool directory in their profile.
User symlinks crash to an arbitrary path.
User triggers a crash.
crash_reporter is invoked by the kernel as root. It derefs the symlink and then chowns the target to chronos.

The fix was to improve the spool directory walking code to avoid any TOCTOU races, and to have all filesystem operations avoid derefing any symlinks.

https://crrev.com/c/1156064: All paths are processed relative to the spool directory instead of absolute paths. This allows us to guarantee the directory we open isn‘t a symlink, and that it never changes on us while we’re processing it.
https://crrev.com/c/1152078: We use filesystem funcs that reject symlinks to initialize all spool directories.

The larger class fix involved blocking symlinks entirely.

https://crbug.com/655606: Disallow following symlinks in the kernel on writable partitions (stateful).