[autotest] limit repair failed count to the same host and hqe
The limit is added so we won't repeatedly repair a host for a job created
from AFE. The code path has a bug that sets the host to Repair Failed
status even for jobs created with a meta_host, and even when the host is
being repaired for the first time.
This CL limits the repair job count to the jobs with the same host and hqe.
Thus, a host can still be repaired when an hqe has failed on multiple hosts.
DEPLOY=scheduler
BUG=chromium:392496,chromium:426905
TEST=local
Set max_repair_limit in the global config to 0, and raise an exception in
reset to force reset to fail.
test frontend job:
Create a job from AFE with a given host. Confirm that the DUT goes into
Repair Failed status and no repair job is queued.
test suite job:
create a suite job
When max_repair_limit is set to 0, confirm that the DUTs go into Repair
Failed status and no repair job is queued.
When max_repair_limit is set to 2, confirm that a repair job is created
after the reset failure.
Change-Id: Icf737f7ff90a96edd6f08b5d79f431b66313d242
Reviewed-on: https://chromium-review.googlesource.com/225442
Reviewed-by: Dan Shi <dshi@chromium.org>
Commit-Queue: Dan Shi <dshi@chromium.org>
Tested-by: Dan Shi <dshi@chromium.org>
diff --git a/scheduler/prejob_task.py b/scheduler/prejob_task.py
index 64c63c3..4524fd7 100644
--- a/scheduler/prejob_task.py
+++ b/scheduler/prejob_task.py
@@ -125,9 +125,13 @@
# limit, since then we overwrite the PARSING state of the HQE.
self.queue_entry.requeue()
+ # Limit repairs of a host when a prejob task fails, e.g., reset or
+ # verify. The repair job count is limited to the tasks with the
+ # specific HQE and host.
previous_repairs = models.SpecialTask.objects.filter(
task=models.SpecialTask.Task.REPAIR,
- queue_entry_id=self.queue_entry.id).count()
+ queue_entry_id=self.queue_entry.id,
+ host_id=self.queue_entry.host_id).count()
if previous_repairs >= scheduler_config.config.max_repair_limit:
self.host.set_status(models.Host.Status.REPAIR_FAILED)
self._fail_queue_entry()
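The fix above can be sketched without Django as follows. `SpecialTask`, `REPAIR`, and `MAX_REPAIR_LIMIT` here are simplified stand-ins for autotest's `models.SpecialTask` and `scheduler_config.config.max_repair_limit`, assumed purely for illustration; the real code filters a Django queryset instead of a list.

```python
# Sketch of the repair-limit check, using stand-ins for the real
# autotest models and scheduler config.
from dataclasses import dataclass

REPAIR = "Repair"          # stand-in for models.SpecialTask.Task.REPAIR
MAX_REPAIR_LIMIT = 2       # stand-in for scheduler_config.config.max_repair_limit


@dataclass
class SpecialTask:
    task: str
    queue_entry_id: int
    host_id: int


def count_previous_repairs(tasks, queue_entry_id, host_id):
    """Count repair tasks for this specific HQE *and* host.

    Before the fix, the count filtered only on queue_entry_id, so
    repairs of other hosts tried for the same meta_host HQE counted
    against this host's limit, and a never-repaired host could be put
    straight into Repair Failed.
    """
    return sum(1 for t in tasks
               if t.task == REPAIR
               and t.queue_entry_id == queue_entry_id
               and t.host_id == host_id)


def should_fail_queue_entry(tasks, queue_entry_id, host_id):
    """True when the host has exhausted its repair attempts for this HQE."""
    return count_previous_repairs(tasks, queue_entry_id, host_id) >= MAX_REPAIR_LIMIT


# One HQE (id 7) whose earlier scheduling already triggered repairs on
# two *other* hosts. Host 3 has no repairs yet for this HQE, so it may
# still be repaired rather than marked Repair Failed.
tasks = [
    SpecialTask(REPAIR, queue_entry_id=7, host_id=1),
    SpecialTask(REPAIR, queue_entry_id=7, host_id=2),
]
print(should_fail_queue_entry(tasks, queue_entry_id=7, host_id=3))  # False
```

Under the pre-fix behavior the same scenario would have counted two prior repairs for host 3 and skipped repairing it entirely.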