recover_duts: add ability to lock out shill restart

Some tests deliberately stop shill. For these tests, we need
the ability to keep recover_duts from restarting shill. At the
same time, we want to ensure that a bug in those tests doesn't
hang a DUT indefinitely.

To that end: add a locking facility to check_ethernet.hook.
If /var/lock/shill-start.lock is present (and not a dangling
symlink), then check_ethernet.hook will not attempt to
(re-)start shill.
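The lock check can be sketched in Python as follows. This is a minimal illustration, not the code in check_ethernet.hook itself, and the names `shill_start_locked` and `maybe_restart_shill` (and its `restart` callback) are hypothetical. The sketch relies on `os.path.exists()` following symlinks, so a dangling symlink is reported the same way as a missing file.

```python
import os

# Lock path named in this change.
SHILL_START_LOCK = "/var/lock/shill-start.lock"


def shill_start_locked(lock_path=SHILL_START_LOCK):
    """Return True if the lock is held, i.e. shill should not be (re-)started.

    os.path.exists() follows symlinks: it returns True for a regular file
    or for a symlink whose target exists, and False both when lock_path is
    absent and when it is a dangling symlink.
    """
    return os.path.exists(lock_path)


def maybe_restart_shill(restart, lock_path=SHILL_START_LOCK):
    """Call restart() unless the shill-start lock is held.

    `restart` is a hypothetical callback standing in for whatever actually
    (re-)starts shill. Returns True if it was called.
    """
    if shill_start_locked(lock_path):
        return False
    restart()
    return True
```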

Indefinite hangs should not occur, because
 i) check_ethernet.hook eventually reboots the system, and
ii) /var/lock is on a tmpfs, so the lock file disappears on
    reboot.

As a small optimization, if the lock file is a dangling symlink,
then the lock is ignored. This means that the locking process
can point /var/lock/shill-start.lock at /proc/<own PID>, and be
reasonably confident that the lock will be ignored if the locking
process exits. ("Reasonably confident" because there is a small
chance of PID reuse.)

BUG=chromium:530791
TEST=manual (see below)

unlocked case
- change TIMEOUT_MINUTES to 2
- change LONG_BOOT_DELAY to 10 seconds (in recover_duts.py)
- copy recover_duts.py to DUT
- copy check_ethernet.hook to DUT
- touch /mnt/stateful_partition/.labmachine
- stop shill; restart recover_duts
- wait 3 minutes
- observe that the system is back online, and has _not_ rebooted

locked case
- change TIMEOUT_MINUTES to 2
- change LONG_BOOT_DELAY to 10 seconds (in recover_duts.py)
- copy recover_duts.py to DUT
- copy check_ethernet.hook to DUT
- touch /mnt/stateful_partition/.labmachine
- ln -s /proc/1 /var/lock/shill-start.lock
- stop shill; restart recover_duts
- wait 4 minutes
- observe that the system is back online, and _has_ rebooted

stale lock case
- change TIMEOUT_MINUTES to 2
- change LONG_BOOT_DELAY to 10 seconds (in recover_duts.py)
- copy recover_duts.py to DUT
- copy check_ethernet.hook to DUT
- touch /mnt/stateful_partition/.labmachine
- ln -s /proc/0 /var/lock/shill-start.lock
- stop shill; restart recover_duts
- wait 4 minutes
- observe that the system is back online, and has _not_ rebooted
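The three lock states exercised above (no lock; a link to /proc/1, which always exists because PID 1 is init; and a link to /proc/0, which never exists) can be told apart with a small helper like the one below. It is a hypothetical diagnostic, not part of this change: `os.path.lexists()` sees the link itself, while `os.path.exists()` follows it.

```python
import os


def lock_state(lock_path="/var/lock/shill-start.lock"):
    """Classify the shill-start lock (hypothetical diagnostic helper).

    "unlocked": nothing at lock_path, so shill is restarted.
    "locked":   lock present and, if a symlink, its target exists,
                so shill is left stopped.
    "stale":    lock_path is a dangling symlink, so the lock is
                ignored and shill is restarted.
    """
    if not os.path.lexists(lock_path):
        return "unlocked"
    if os.path.exists(lock_path):
        return "locked"
    return "stale"
```

On a DUT, after `ln -s /proc/1 /var/lock/shill-start.lock` this should report "locked", and after `ln -s /proc/0 /var/lock/shill-start.lock` it should report "stale".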

Change-Id: Ic938923a529bf2db7d9b6913224f4f452f7da0cb
Reviewed-on: https://chromium-review.googlesource.com/299484
Commit-Ready: mukesh agrawal <quiche@chromium.org>
Tested-by: mukesh agrawal <quiche@chromium.org>
Reviewed-by: Paul Stewart <pstew@chromium.org>