Add automatic zone retries to fix test stockout failures

Running 20+ test builds at the same time frequently causes zone resource
exhaustion (stockout) errors. To make testing more scalable and
reliable, run_builds.sh now automatically retries stockout failures in
different zones.

Additionally, because retrying builds takes extra time, the overall
timeout limit in postsubmit-cloudbuild.yaml has been increased to
prevent the main job from timing out.

A _ZONE substitution has been added to the tests so that the test zone is set in one place.

Remote test result:
https://paste.googleplex.com/5721419085381632

BUG=b/509619277
TEST=presubmit
RELEASE_NOTE=None

Change-Id: I5fcc682e52474e0992bd3d2d4644970239d7b39c
Reviewed-on: https://cos-review.googlesource.com/c/cos/tools/+/151929
Reviewed-by: Robert Kolchmeyer <rkolchmeyer@google.com>
Cloud-Build: 228075978874@cloudbuild.gserviceaccount.com <228075978874@cloudbuild.gserviceaccount.com>
Tested-by: Chenglong Tang <chenglongtang@google.com>
30 files changed
tree: d6b4564a84c2aaff535101c785aa6b6748ea055c
  1. coverage/
  2. release/
  3. src/
  4. testing/
  5. vendor/
  6. .bazelignore
  7. .gitignore
  8. BUILD.bazel
  9. cloudbuild.yaml
  10. CONTRIBUTING.md
  11. deps.bzl
  12. go.mod
  13. go.sum
  14. LICENSE
  15. postsubmit-cloudbuild.yaml
  16. README.md
  17. run_builds.sh
  18. run_unit_tests.sh
  19. WORKSPACE
README.md

Tools for Container-Optimized OS

This is a repository of various tools developed for Container-Optimized OS, such as cos-gpu-installer and cos-toolbox.

See CONTRIBUTING.md for how to contribute.