)]}'
{
  "commit": "894f17dcc8a978b3bd6e85e3cfe41e97d55a87d8",
  "tree": "202d75563348b5b95e62bce815e5b963f81a7eb3",
  "parents": [
    "7ccb6135081f892aaa5b3b94bc90050a536ccab2"
  ],
  "author": {
    "name": "Harshit Agarwal",
    "email": "harshit@nutanix.com",
    "time": "Tue Feb 25 18:05:53 2025 +0000"
  },
  "committer": {
    "name": "Derek Taylor",
    "email": "ddtaylor@google.com",
    "time": "Fri Mar 06 09:49:14 2026 -0800"
  },
  "message": "sched/rt: Fix race in push_rt_task\n\ncommit 690e47d1403e90b7f2366f03b52ed3304194c793 upstream.\n\nOverview\n\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\nWhen a CPU chooses to call push_rt_task and picks a task to push to\nanother CPU\u0027s runqueue, it calls the find_lock_lowest_rq method,\nwhich takes a double lock on both CPUs\u0027 runqueues. If one of the\nlocks isn\u0027t readily available, it may lead to dropping the current\nrunqueue lock and reacquiring both the locks at once. During this window\nit is possible that the task is already migrated and is running on some\nother CPU. These cases are already handled. However, if the task is\nmigrated and has already been executed and another CPU is now trying to\nwake it up (ttwu) such that it is queued again on the runqueue\n(on_rq is 1) and also if the task was run by the same CPU, then the\ncurrent checks will pass even though the task was migrated out and is no\nlonger in the pushable tasks list.\n\nCrashes\n\u003d\u003d\u003d\u003d\u003d\u003d\u003d\nThis bug resulted in quite a few flavors of crashes triggering kernel\npanics with various crash signatures such as assert failures, page\nfaults, null pointer dereferences, and queue corruption errors all\ncoming from the scheduler itself.\n\nSome of the crashes:\n-\u003e kernel BUG at kernel/sched/rt.c:1616! BUG_ON(idx \u003e\u003d MAX_RT_PRIO)\n   Call Trace:\n   ? __die_body+0x1a/0x60\n   ? die+0x2a/0x50\n   ? do_trap+0x85/0x100\n   ? pick_next_task_rt+0x6e/0x1d0\n   ? do_error_trap+0x64/0xa0\n   ? pick_next_task_rt+0x6e/0x1d0\n   ? exc_invalid_op+0x4c/0x60\n   ? pick_next_task_rt+0x6e/0x1d0\n   ? asm_exc_invalid_op+0x12/0x20\n   ? pick_next_task_rt+0x6e/0x1d0\n   __schedule+0x5cb/0x790\n   ? 
update_ts_time_stats+0x55/0x70\n   schedule_idle+0x1e/0x40\n   do_idle+0x15e/0x200\n   cpu_startup_entry+0x19/0x20\n   start_secondary+0x117/0x160\n   secondary_startup_64_no_verify+0xb0/0xbb\n\n-\u003e BUG: kernel NULL pointer dereference, address: 00000000000000c0\n   Call Trace:\n   ? __die_body+0x1a/0x60\n   ? no_context+0x183/0x350\n   ? __warn+0x8a/0xe0\n   ? exc_page_fault+0x3d6/0x520\n   ? asm_exc_page_fault+0x1e/0x30\n   ? pick_next_task_rt+0xb5/0x1d0\n   ? pick_next_task_rt+0x8c/0x1d0\n   __schedule+0x583/0x7e0\n   ? update_ts_time_stats+0x55/0x70\n   schedule_idle+0x1e/0x40\n   do_idle+0x15e/0x200\n   cpu_startup_entry+0x19/0x20\n   start_secondary+0x117/0x160\n   secondary_startup_64_no_verify+0xb0/0xbb\n\n-\u003e BUG: unable to handle page fault for address: ffff9464daea5900\n   kernel BUG at kernel/sched/rt.c:1861! BUG_ON(rq-\u003ecpu !\u003d task_cpu(p))\n\n-\u003e kernel BUG at kernel/sched/rt.c:1055! BUG_ON(!rq-\u003enr_running)\n   Call Trace:\n   ? __die_body+0x1a/0x60\n   ? die+0x2a/0x50\n   ? do_trap+0x85/0x100\n   ? dequeue_top_rt_rq+0xa2/0xb0\n   ? do_error_trap+0x64/0xa0\n   ? dequeue_top_rt_rq+0xa2/0xb0\n   ? exc_invalid_op+0x4c/0x60\n   ? dequeue_top_rt_rq+0xa2/0xb0\n   ? asm_exc_invalid_op+0x12/0x20\n   ? dequeue_top_rt_rq+0xa2/0xb0\n   dequeue_rt_entity+0x1f/0x70\n   dequeue_task_rt+0x2d/0x70\n   __schedule+0x1a8/0x7e0\n   ? blk_finish_plug+0x25/0x40\n   schedule+0x3c/0xb0\n   futex_wait_queue_me+0xb6/0x120\n   futex_wait+0xd9/0x240\n   do_futex+0x344/0xa90\n   ? get_mm_exe_file+0x30/0x60\n   ? audit_exe_compare+0x58/0x70\n   ? audit_filter_rules.constprop.26+0x65e/0x1220\n   __x64_sys_futex+0x148/0x1f0\n   do_syscall_64+0x30/0x80\n   entry_SYSCALL_64_after_hwframe+0x62/0xc7\n\n-\u003e BUG: unable to handle page fault for address: ffff8cf3608bc2c0\n   Call Trace:\n   ? __die_body+0x1a/0x60\n   ? no_context+0x183/0x350\n   ? spurious_kernel_fault+0x171/0x1c0\n   ? exc_page_fault+0x3b6/0x520\n   ? plist_check_list+0x15/0x40\n   ? 
plist_check_list+0x2e/0x40\n   ? asm_exc_page_fault+0x1e/0x30\n   ? _cond_resched+0x15/0x30\n   ? futex_wait_queue_me+0xc8/0x120\n   ? futex_wait+0xd9/0x240\n   ? try_to_wake_up+0x1b8/0x490\n   ? futex_wake+0x78/0x160\n   ? do_futex+0xcd/0xa90\n   ? plist_check_list+0x15/0x40\n   ? plist_check_list+0x2e/0x40\n   ? plist_del+0x6a/0xd0\n   ? plist_check_list+0x15/0x40\n   ? plist_check_list+0x2e/0x40\n   ? dequeue_pushable_task+0x20/0x70\n   ? __schedule+0x382/0x7e0\n   ? asm_sysvec_reschedule_ipi+0xa/0x20\n   ? schedule+0x3c/0xb0\n   ? exit_to_user_mode_prepare+0x9e/0x150\n   ? irqentry_exit_to_user_mode+0x5/0x30\n   ? asm_sysvec_reschedule_ipi+0x12/0x20\n\nAbove are some of the common examples of the crashes that were observed\ndue to this issue.\n\nDetails\n\u003d\u003d\u003d\u003d\u003d\u003d\u003d\nLet\u0027s look at the following scenario to understand this race.\n\n1) CPU A enters push_rt_task\n  a) CPU A has chosen next_task \u003d task p.\n  b) CPU A calls find_lock_lowest_rq(Task p, CPU Z’s rq).\n  c) CPU A identifies CPU X as a destination CPU (X \u003c Z).\n  d) CPU A enters double_lock_balance(CPU Z’s rq, CPU X’s rq).\n  e) Since X is lower than Z, CPU A unlocks CPU Z’s rq. Someone else has\n     locked CPU X’s rq, and thus, CPU A must wait.\n\n2) At CPU Z\n  a) Previous task has completed execution and thus, CPU Z enters\n     schedule, locks its own rq after CPU A releases it.\n  b) CPU Z dequeues previous task and begins executing task p.\n  c) CPU Z unlocks its rq.\n  d) Task p yields the CPU (ex. 
by doing IO or waiting to acquire a\n     lock) which triggers the schedule function on CPU Z.\n  e) CPU Z enters schedule again, locks its own rq, and dequeues task p.\n  f) As part of dequeue, it sets p.on_rq \u003d 0 and unlocks its rq.\n\n3) At CPU B\n  a) CPU B enters try_to_wake_up with input task p.\n  b) Since CPU Z dequeued task p, p.on_rq \u003d 0, and CPU B updates\n     p.state \u003d WAKING.\n  c) CPU B via select_task_rq determines CPU Y as the target CPU.\n\n4) The race\n  a) CPU A acquires CPU X’s lock and relocks CPU Z.\n  b) CPU A reads task p.cpu \u003d Z and incorrectly concludes task p is\n     still on CPU Z.\n  c) CPU A fails to notice task p had been dequeued from CPU Z while\n     CPU A was waiting for locks in double_lock_balance. If CPU A knew\n     that task p had been dequeued, it would return NULL, forcing\n     push_rt_task to give up on migrating task p.\n  d) CPU B updates task p.cpu \u003d Y and calls ttwu_queue.\n  e) CPU B locks Y\u0027s rq. CPU B enqueues task p onto Y and sets task\n     p.on_rq \u003d 1.\n  f) CPU B unlocks CPU Y, triggering memory synchronization.\n  g) CPU A reads task p.on_rq \u003d 1, cementing its assumption that task p\n     has not migrated.\n  h) CPU A decides to migrate p to CPU X.\n\nThis leads to A dequeuing p from Y\u0027s queue and various crashes down the\nline.\n\nSolution\n\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\nThe solution here is fairly simple. After obtaining the lock (at 4a),\nthe check is enhanced to make sure that the task is still at the head of\nthe pushable tasks list. If it is not, it is no longer suitable for\nbeing pushed out anyway.\n\nTesting\n\u003d\u003d\u003d\u003d\u003d\u003d\u003d\nThe fix was tested on a cluster of 3 nodes, where the panics due to this\nbug were hit every couple of days. 
A fix similar to this was deployed on such a\ncluster and was stable for more than 30 days.\n\nBUG\u003db/462066962\nTEST\u003dpresubmit\nRELEASE_NOTE\u003dFixed CVE-2025-38234 in the Linux kernel.\n\ncos-patch: security-moderate\nCo-developed-by: Jon Kohler \u003cjon@nutanix.com\u003e\nChange-Id: I6e2afec9f5bf22566b569e26fa4788eb2e378de0\nSigned-off-by: Jon Kohler \u003cjon@nutanix.com\u003e\nCo-developed-by: Gauri Patwardhan \u003cgauri.patwardhan@nutanix.com\u003e\nSigned-off-by: Gauri Patwardhan \u003cgauri.patwardhan@nutanix.com\u003e\nCo-developed-by: Rahul Chunduru \u003crahul.chunduru@nutanix.com\u003e\nSigned-off-by: Rahul Chunduru \u003crahul.chunduru@nutanix.com\u003e\nSigned-off-by: Harshit Agarwal \u003charshit@nutanix.com\u003e\nSigned-off-by: Peter Zijlstra (Intel) \u003cpeterz@infradead.org\u003e\nReviewed-by: \"Steven Rostedt (Google)\" \u003crostedt@goodmis.org\u003e\nReviewed-by: Phil Auld \u003cpauld@redhat.com\u003e\nTested-by: Will Ton \u003cwilliam.ton@nutanix.com\u003e\nCc: stable@vger.kernel.org\nLink: https://lore.kernel.org/r/20250225180553.167995-1-harshit@nutanix.com\nSigned-off-by: Rajani Kantha \u003c681739313@139.com\u003e\nSigned-off-by: Greg Kroah-Hartman \u003cgregkh@linuxfoundation.org\u003e\nSigned-off-by: Kernel CVE Triage Automation \u003ccloud-image-kernel-cve-triage-automation@prod.google.com\u003e\nReviewed-on: https://cos-review.googlesource.com/c/third_party/kernel/+/136321\nTested-by: Cusky Presubmit Bot \u003cpresubmit@cos-infra-prod.iam.gserviceaccount.com\u003e\nReviewed-by: Robert Kolchmeyer \u003crkolchmeyer@google.com\u003e\nReviewed-by: Derek Taylor \u003cddtaylor@google.com\u003e\n",
  "tree_diff": [
    {
      "type": "modify",
      "old_id": "91b1ee0d81fce47497c39413b10c0a66faf893c9",
      "old_mode": 33188,
      "old_path": "kernel/sched/rt.c",
      "new_id": "2d0acdd32108ab5afb540b315b07a90d1dbc2618",
      "new_mode": 33188,
      "new_path": "kernel/sched/rt.c"
    }
  ]
}