Welcome to mirror list, hosted at ThFree Co, Russian Federation.

github.com/torvalds/linux.git - Unnamed repository; edit this file 'description' to name the repository.
summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorOded Gabbay <ogabbay@kernel.org>2021-10-21 14:02:40 +0300
committerOded Gabbay <ogabbay@kernel.org>2021-12-26 09:59:03 +0300
commit4cd454a205069965463515e2068190f56b0e4206 (patch)
treea666387eb2b655c27385c5b96fe457e685c1e4cd /drivers/misc
parentc9d1383c75c95be55d9207e8a8d5c7c1659a029e (diff)
habanalabs/gaudi: recover from CPU WD event
There are rare cases where the device CPU's watchdog has expired and as a result, the watchdog reset has happened and the CPU will now move to running its preboot f/w. When that happens, the driver will only know that a heartbeat failure occurred. As a result, the driver will send a message to the CPU's main f/w asking it to reset the device, but because the CPU is now running preboot, it won't respond and the re-initialization process will later fail when trying to load the f/w. The solution is to send the request to the preboot as well, only if the reset was caused because of HB failure. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Diffstat (limited to 'drivers/misc')
-rw-r--r--drivers/misc/habanalabs/gaudi/gaudi.c20
1 files changed, 19 insertions, 1 deletions
diff --git a/drivers/misc/habanalabs/gaudi/gaudi.c b/drivers/misc/habanalabs/gaudi/gaudi.c
index 825737dfe381..d2b7ecb45497 100644
--- a/drivers/misc/habanalabs/gaudi/gaudi.c
+++ b/drivers/misc/habanalabs/gaudi/gaudi.c
@@ -1,7 +1,7 @@
// SPDX-License-Identifier: GPL-2.0
/*
- * Copyright 2016-2020 HabanaLabs, Ltd.
+ * Copyright 2016-2021 HabanaLabs, Ltd.
* All Rights Reserved.
*/
@@ -4296,6 +4296,24 @@ static void gaudi_hw_fini(struct hl_device *hdev, bool hard_reset, bool fw_reset
WREG32(irq_handler_offset,
gaudi_irq_map_table[GAUDI_EVENT_HALT_MACHINE].cpu_id);
+
+ /* This is a hail-mary attempt to revive the card in the small chance that the
+ * f/w has experienced a watchdog event, which caused it to return back to preboot.
+ * In that case, triggering reset through GIC won't help. We need to trigger the
+ * reset as if Linux wasn't loaded.
+ *
+ * We do it only if the reset cause was HB, because that would be the indication
+ * of such an event.
+ *
+ * In case watchdog hasn't expired but we still got HB, then this won't do any
+ * damage.
+ */
+ if (hdev->curr_reset_cause == HL_RESET_CAUSE_HEARTBEAT) {
+ if (hdev->asic_prop.hard_reset_done_by_fw)
+ hl_fw_ask_hard_reset_without_linux(hdev);
+ else
+ hl_fw_ask_halt_machine_without_linux(hdev);
+ }
} else {
if (hdev->asic_prop.hard_reset_done_by_fw)
hl_fw_ask_hard_reset_without_linux(hdev);