devserver: check presence of actual files when asserting staging

We have been seeing failures of lab tests due to staged files that are
missing despite the devserver reporting their artifact as staged. Since
the devserver uses an (empty) marker file to denote that an artifact has
been staged, this probably means that we suffer from either (i) a rare
bug that causes us to place a marker file for an artifact that was not
fully staged; or (ii) a rare race condition where files of a previously
staged artifact are being deleted, but not the marker file.

This CL works around both problems using brute force: it stores the list
of files that were actually "installed" (i.e. found their way to the
cache directory) explicitly in the artifact's marker file. When checking
whether an artifact has been staged, it reads the list of files and
ensures that each one of them is present. Some technical notes:

* When we fail to find one or more of the listed files, a loud error
  message is logged (which includes the list of missing files) and the
  marker file is removed entirely.  This shortens the time needed for
  subsequent queries.

* To keep the logic simple, we store absolute file paths, which are easy
  to check, albeit not portable; this shouldn't be a problem assuming we
  do not expect a devserver's cache to be relocated.

* We list only files (including symlinks), hence the presence of
  directories is required only as implied by file paths. We further list
  all files that were copied to the cache directory, which may include
  (for example) the archive that was downloaded from GS and used for
  extracting one or more files. This may be relaxed in the future if we
  agree that such temporary files are not being used further; it has to
  be relaxed if we decide to remove these archives after extraction
  (e.g. to preserve cache space).

* The staging verification logic is written such that the files listed
  in the marker file constitute the smallest set of files whose presence
  entails the artifact being staged. This means that existing devserver
  caches (e.g. in the lab) will remain consistent wrt staging queries.

* Some artifacts (currently, the Autotest tarball artifact) opt not to
  store the full list of files. As noted above, this is a valid choice
  and simply means that, for that artifact, staging verification will
  have the same semantics it had before this change.  Storing explicit
  file lists is harder to justify for artifacts that contain a large
  number of files, such as the Autotest tarball. We may decide to undo
  this exception in the future if we keep getting inconsistent staging
  verification results for these artifacts.

* Unit tests were extended to validate the content of marker files for
  each tested artifact. A new test was added for the staging
  verification logic.
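
The marker file scheme described in the notes above can be sketched
roughly as follows. This is a minimal illustration, not the code in this
CL: the function names write_marker and is_staged are hypothetical, and
the one-path-per-line format and the use of os.path.lexists (so that
symlinks count as present) are assumptions.

```python
import logging
import os


def write_marker(marker_path, installed_files):
    """Write the absolute path of each installed file, one per line.

    Hypothetical helper; an empty list yields an empty marker file.
    """
    with open(marker_path, "w") as marker:
        for path in installed_files:
            marker.write(os.path.abspath(path) + "\n")


def is_staged(marker_path):
    """Return True if the marker exists and every listed file is present.

    An empty marker lists no files, so it is treated as staged, which
    matches the behavior of marker files written before this change.
    """
    if not os.path.exists(marker_path):
        return False

    with open(marker_path) as marker:
        listed = [line.strip() for line in marker if line.strip()]

    # lexists() also accepts dangling symlinks (an assumption here).
    missing = [path for path in listed if not os.path.lexists(path)]
    if missing:
        logging.error("Staged files missing, removing marker %s: %s",
                      marker_path, ", ".join(missing))
        # Drop the marker so later queries fail fast without rescanning.
        os.remove(marker_path)
        return False
    return True
```

An artifact that opts out of listing its files would simply write an
empty marker, and is_staged would then only check that the marker
exists.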

BUG=chromium:277839
TEST=Unit testing of new staging logic.
TEST=All existing unit tests.

Change-Id: Ib2ee1d56dbe31da095c3afd23f05d5fdf1200b0f
Reviewed-on: https://chromium-review.googlesource.com/176557
Tested-by: Gilad Arnold <garnold@chromium.org>
Reviewed-by: Don Garrett <dgarrett@chromium.org>
Reviewed-by: Chris Sosa <sosa@chromium.org>
Commit-Queue: Gilad Arnold <garnold@chromium.org>
3 files changed