Checking in a small tool that has customizable hooks to recover DUTs.

Checking in a new tool that runs after failsafe.conf has started
(30s into boot or after Chrome has booted). Every 10 minutes it runs a set of
hooks on Chromium/Chrome OS test images that check for "bricked" conditions and
fix them. I'm starting with a hook that toggles eth0 and restarts shill if we
can't reach www.google.com.
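
For reviewers, here is a minimal sketch of the contract between the runner and
its hooks: each executable file ending in .hook is run once per pass, and a
non-zero exit status marks that hook as failed. The function name
run_hooks_once and the sorted() ordering are illustrative assumptions and are
not part of this change; recover_duts.py inlines this logic in main() and logs
the results instead of returning them.

```python
import os
import subprocess


def run_hooks_once(hooks_dir):
  """Runs every *.hook file in hooks_dir once (hypothetical helper).

  Returns a list of (script, returncode, output) tuples. A returncode of 0
  means the hook found nothing wrong; anything else means it reported a
  problem or attempted a repair.
  """
  results = []
  # sorted() is an assumption made here for a predictable order; the real
  # tool uses whatever order os.listdir() returns.
  for name in sorted(os.listdir(hooks_dir)):
    script = os.path.join(hooks_dir, name)
    if not (os.path.isfile(script) and name.endswith('.hook')):
      continue
    # Merge stderr into stdout so one blob of output is kept per hook.
    popen = subprocess.Popen([script], stdout=subprocess.PIPE,
                             stderr=subprocess.STDOUT)
    output = popen.communicate()[0]
    results.append((script, popen.returncode, output))
  return results
```

A caller would log each tuple at debug level when returncode is 0 and at
warning level otherwise, then sleep before the next pass.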

BUG=chromium-os:34178
TEST=Ran it and checked logs

Change-Id: I65f1c0eec61ff4dac04478d8bf2e94abd0609dbe
Reviewed-on: https://gerrit.chromium.org/gerrit/32453
Commit-Ready: Chris Sosa <sosa@chromium.org>
Reviewed-by: Chris Sosa <sosa@chromium.org>
Tested-by: Chris Sosa <sosa@chromium.org>
diff --git a/recover_duts/README b/recover_duts/README
new file mode 100644
index 0000000..c76a5b4
--- /dev/null
+++ b/recover_duts/README
@@ -0,0 +1,15 @@
+# Copyright (c) 2012 The Chromium OS Authors. All rights reserved.
+# Use of this source code is governed by a BSD-style license that can be
+# found in the LICENSE file.
+
+Recover DUTs is a Python utility that runs on test images. It periodically runs
+a set of hooks that detect and repair conditions that would otherwise leave a
+DUT bricked, so that it can be reached again without manual recovery.
+
+To add a hook, place an executable file in the hooks/ dir whose name ends with
+the .hook suffix. It can be written in anything that a test image can execute:
+Python, a Dash script, or a binary program.
+
+Hook output is logged under /var/log/recover_duts/; failures are logged as warnings.
+
+The init script for this tool is stored in platform/init.
diff --git a/recover_duts/hooks/check_ethernet.hook b/recover_duts/hooks/check_ethernet.hook
new file mode 100755
index 0000000..19fc00a
--- /dev/null
+++ b/recover_duts/hooks/check_ethernet.hook
@@ -0,0 +1,43 @@
+#!/bin/sh
+#
+# Copyright (c) 2012 The Chromium OS Authors. All rights reserved.
+# Use of this source code is governed by a BSD-style license that can be
+# found in the LICENSE file.
+
+set -e
+
+# Only run this script on test machines that run in the lab.
+# See autotest/server/hosts/site_host.py for more information.
+if [ ! -f /mnt/stateful_partition/.labmachine ]; then
+  exit 0
+fi
+
+# Ping itself doesn't work on test images in a VM.
+PING="curl --interface eth0 -o /dev/null www.google.com"
+
+if ${PING};  then
+  exit 0
+fi
+
+ifconfig eth0 down
+ifconfig eth0 up
+sleep 5
+
+if ${PING}; then
+  echo "Reconfigured using ifconfig down/up."
+  exit 1
+fi
+
+initctl stop flimflam || echo "Flimflam was not running."
+initctl start flimflam
+
+sleep 5
+if ${PING}; then
+  echo "Reconfigured by restarting flimflam."
+  exit 1
+fi
+
+# Last chance - reboot if we can't get any connectivity.
+echo "All efforts to recover ethernet have been exhausted. Rebooting."
+(sleep 5 && reboot) &
+exit 1
\ No newline at end of file
diff --git a/recover_duts/recover_duts.py b/recover_duts/recover_duts.py
new file mode 100755
index 0000000..47720c6
--- /dev/null
+++ b/recover_duts/recover_duts.py
@@ -0,0 +1,59 @@
+#!/usr/bin/python
+#
+# Copyright (c) 2012 The Chromium OS Authors. All rights reserved.
+# Use of this source code is governed by a BSD-style license that can be
+# found in the LICENSE file.
+
+# This module runs at system startup on Chromium OS test images. It runs through
+# a set of hooks to keep a DUT from being bricked without manual intervention.
+# Example hook:
+#   Check to see if ethernet is connected. If it's not, unload and reload the
+#     ethernet driver.
+
+import logging
+import os
+import subprocess
+import time
+
+LOGGING_SUBDIR = '/var/log/recover_duts'
+LOG_FILENAME_FORMAT = os.path.join(LOGGING_SUBDIR,
+                                   'recover_duts_log_%Y%m%d_%H%M%S.txt')
+LOGGING_FORMAT = '%(asctime)s - %(levelname)s - %(message)s'
+SLEEP_DELAY = 600
+
+
+def main():
+  if not os.path.isdir(LOGGING_SUBDIR):
+    os.makedirs(LOGGING_SUBDIR)
+
+  log_filename = time.strftime(LOG_FILENAME_FORMAT)
+  logging.basicConfig(filename=log_filename, level=logging.DEBUG,
+                      format=LOGGING_FORMAT)
+  hooks_dir = os.path.join(os.path.dirname(__file__), 'hooks')
+  try:
+    while True:
+      for script in os.listdir(hooks_dir):
+        script = os.path.join(hooks_dir, script)
+        if os.path.isfile(script) and script.endswith('.hook'):
+          logging.debug('Running hook: %s', script)
+          popen = subprocess.Popen([script], stdout=subprocess.PIPE,
+                                   stderr=subprocess.STDOUT)
+          output = popen.communicate()[0]
+          if popen.returncode == 0:
+            logging.debug('Running of %s succeeded with output:\n%s', script,
+                          output)
+          else:
+            logging.warning('Running of %s failed with output:\n%s', script,
+                            output)
+      # Wait before running the hooks again.
+      time.sleep(SLEEP_DELAY)
+
+  except Exception as e:
+    # Since this is run from an upstart job we want to ensure we log this into
+    # our log file before dying.
+    logging.fatal(str(e))
+    raise
+
+
+if __name__ == '__main__':
+  main()