check_ethernet: defer action if recently (un)locked

See $BUG for more info. When a test releases the PAUSE_FILE lock, that
doesn't guarantee that upstream connectivity is completely restored; it
only guarantees that the test has finished its destructive actions
(e.g., it has finished restarting Shill, resetting its IP address, or
running its suspend/resume cycle). The DUT could still be renegotiating
DHCP, for instance.

Tests will be changed to update the pause file's modification time just
before releasing the lock, so we can use that timestamp to determine
whether a test may have recently disturbed connectivity. If the
timestamp is less than 30 seconds old, we skip any attempt at
detection/recovery and just wait for the next time around.
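
As a rough illustration of the test side of this handshake, the sketch
below runs a disruptive command under the lock and refreshes the file's
timestamp before the lock is dropped. It is not the code from the
dependent change: the helper names (with_pause_lock,
seconds_since_touch) and the default path are hypothetical, and it
assumes util-linux flock and GNU stat are available.

```shell
#!/bin/sh
# Hypothetical sketch; the default path and both function names are
# assumptions made for illustration, not taken from the real tests.
PAUSE_FILE="${PAUSE_FILE:-/tmp/check_ethernet.pause.example}"

# Run a command while holding an exclusive lock on PAUSE_FILE, then
# refresh the file's timestamp before the lock is released.
with_pause_lock() {
  (
    flock 9 || exit 1
    status=0
    "$@" || status=$?
    # Touch while still holding the lock, so a reader that acquires the
    # lock afterwards can see how recently connectivity was disturbed.
    touch "${PAUSE_FILE}"
    exit "${status}"
  ) 9>>"${PAUSE_FILE}"
}

# Print how many seconds ago PAUSE_FILE was last touched, using the same
# stat format that the hook reads.
seconds_since_touch() {
  local now stamp
  stamp="$(stat -c%Z "${PAUSE_FILE}")" || return 1
  now="$(date +%s)"
  echo $((now - stamp))
}
```

A test would wrap its destructive step as `with_pause_lock <command>`;
the lock is released when the subshell exits, which happens only after
the touch.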

BUG=chromium:1083044
TEST=run a network test that stops shill, in a loop, while
     simultaneously running check_ethernet.hook in a tight loop;
     we expect not to see check_ethernet.hook triggering any recovery
     actions (and so, never see it log "successful after XX seconds")
     `while : ; do tast -verbose run -failfortests \
        ${HOST} network.DefaultProfile || break; done`
     on DUT:
     `stop recover_duts
      while : ; do
        /usr/local/libexec/recover-duts/hooks/check_ethernet.hook 2>&1 | \
        tee /dev/tty | grep -q 'successful after [0-9]* seconds' && break
      done`

Cq-Depend: chromium:2293100
Change-Id: Ifc25e33347f0cfbe300a66c3081a76c0088c4aa9
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/platform/crostestutils/+/2293019
Tested-by: Brian Norris <briannorris@chromium.org>
Reviewed-by: Grant Grundler <grundler@chromium.org>
Reviewed-by: Matthew Wang <matthewmwang@chromium.org>
Commit-Queue: Brian Norris <briannorris@chromium.org>
diff --git a/recover_duts/hooks/check_ethernet.hook b/recover_duts/hooks/check_ethernet.hook
index 8fba061..ebf78a1 100755
--- a/recover_duts/hooks/check_ethernet.hook
+++ b/recover_duts/hooks/check_ethernet.hook
@@ -243,26 +243,36 @@
   # "Pause" the ethernet check for up to 30 minutes at the
   # request of any test that creates and flocks PAUSE_FILE.
 
+  local start_time
+  local paused_time
+
+  # Check the modification time before locking, in case we're the first to
+  # create it.
+  start_time=$(stat -c%Z "${PAUSE_FILE}") || true
+  if [ -n "${start_time}" ]; then
+    local now
+    now="$(date +%s)"
+    paused_time=$((now - start_time))
+  fi
+
   # Acquire the lock and hold it until exit, if possible.
   if try_pause_lock; then
+    if [ -n "${paused_time}" ] && [ "${paused_time}" -lt 30 ]; then
+      # We were recently paused. Skip this iteration, in case the Ethernet link
+      # is still coming up.
+      info_msg "Last locked ${paused_time} seconds ago; skipping"
+      return 0
+    fi
     # File wasn't locked - no need to pause.
     return 1
   fi
 
-  local now
-  local start_time
-
-  now="$(date +%s)"
-  start_time=$(stat -c%Z "${PAUSE_FILE}") || true
-
   if [ -z "${start_time}" ]; then
     # Couldn't figure out lock time - just clobber it.
     force_pause_lock
     return 1
   fi
 
-  local paused_time
-  paused_time=$((now - start_time))
   if [ ${paused_time} -gt $((30*60)) ] ; then
     critical_msg "Pause request exceeded 30 minutes. Checking lab network link."
     force_pause_lock