fix(core): optimize warm cache performance for task execution #35172

FrozenPandaz wants to merge 8 commits into master
Conversation
Batch daemon calls, add a bulk cache resolution fast path, and parallelize output hash checking to dramatically improve warm cache hit performance.

- Batch daemon calls for recordOutputsHash and outputsHashesMatch
- Add resolveCachedTasksBulk fast path for bulk cache resolution
- Cache readProjectsConfigurationFromProjectGraph in getExecutorForTask
- Add verified-match cache in daemon to skip redundant filesystem scans
- Add Rayon-parallel get_files_for_outputs_batch in Rust
- Batch-hash unhashed tasks per topological level
- Skip recordOutputsHash for local-cache-kept-existing
- Add NxCache.get_batch() in Rust: single SQL query + Rayon-parallel file reads instead of N individual JS→Rust round-trips
- Wire resolveCachedTasksBulk into the coordinator loop to bulk-resolve cache hits with batched daemon calls
- Batch task scheduling with a single sort instead of re-sorting per insert
- Remove unused executeDiscreteTaskLoop
Co-authored-by: FrozenPandaz <FrozenPandaz@users.noreply.github.com>
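The batching idea in these commits can be sketched as follows; `FakeDaemon` and the entry shape are hypothetical stand-ins for the daemon IPC client, used only to show that N per-task round-trips collapse into one:

```typescript
// Hypothetical sketch: one daemon round-trip carrying every hash entry
// instead of one round-trip per task. Only the call count matters here.
type HashEntry = { taskId: string; hash: string };

class FakeDaemon {
  roundTrips = 0;
  private recorded = new Map<string, string>();

  // Per-task call: one round-trip each (the old hot path).
  async outputsHashesMatch(e: HashEntry): Promise<boolean> {
    this.roundTrips++;
    return this.recorded.get(e.taskId) === e.hash;
  }

  // Batched call: a single round-trip carrying every entry.
  async outputsHashesMatchBatch(entries: HashEntry[]): Promise<boolean[]> {
    this.roundTrips++;
    return entries.map((e) => this.recorded.get(e.taskId) === e.hash);
  }

  async recordOutputsHashBatch(entries: HashEntry[]): Promise<void> {
    this.roundTrips++;
    for (const e of entries) this.recorded.set(e.taskId, e.hash);
  }
}

async function demo() {
  const daemon = new FakeDaemon();
  const entries: HashEntry[] = Array.from({ length: 1100 }, (_, i) => ({
    taskId: `proj${i}:build`,
    hash: `h${i}`,
  }));

  await daemon.recordOutputsHashBatch(entries); // 1 round-trip for 1,100 tasks
  const matches = await daemon.outputsHashesMatchBatch(entries); // 1 round-trip

  return { roundTrips: daemon.roundTrips, allMatch: matches.every(Boolean) };
}
```

With the per-task API, the same workload would cost ~2,200 round-trips; the batched API does it in two.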
Two bugs in the batch scheduling optimization:

1. Collecting all schedulable roots at once bypassed parallelism checks: scheduling a non-parallel task must block subsequent tasks from being scheduled in the same pass.
2. The sort comparator was non-transitive when two tasks both lacked historical timing data, causing non-deterministic ordering.
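The comparator bug can be illustrated with a minimal sketch; the names here are hypothetical, not the actual Nx scheduler code. A comparator must be consistent: if `compare(a, b)` is positive, `compare(b, a)` must be negative.

```typescript
interface SchedulableTask {
  id: string;
  avgDuration?: number; // historical timing; undefined if never recorded
}

// Buggy shape: when both durations are undefined, it returns 1 for both
// compare(a, b) and compare(b, a), so the final order depends on input order.
function buggyCompare(a: SchedulableTask, b: SchedulableTask): number {
  if (a.avgDuration === undefined) return 1;
  if (b.avgDuration === undefined) return -1;
  return b.avgDuration - a.avgDuration; // longer historical tasks first
}

// Fixed: a deterministic tiebreak (task id) makes the comparator a total
// order, so sorting no longer depends on insertion order.
function fixedCompare(a: SchedulableTask, b: SchedulableTask): number {
  if (a.avgDuration === undefined && b.avgDuration === undefined) {
    return a.id.localeCompare(b.id);
  }
  if (a.avgDuration === undefined) return 1;
  if (b.avgDuration === undefined) return -1;
  return b.avgDuration - a.avgDuration;
}
```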
The coordinator loop used a separate workerCompletedCallbacks list while the continuous task loop used waitingForTasks. This meant when a task completed via scheduleNextTasksAndReleaseThreads, only the continuous loop was woken — the coordinator would stay blocked, causing deadlocks in e2e scenarios with both discrete and continuous tasks. Consolidate both loops to use waitingForTasks.
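A minimal sketch of the consolidated wake-list pattern, with hypothetical names: both loops park on the same array of resolvers, so one completion notifies every waiter regardless of which loop is blocked.

```typescript
// Illustrative only, not Nx's actual TaskOrchestrator. Loops with nothing
// to do park on `waitingForTasks`; a completing task wakes all of them.
class WakeList {
  private waitingForTasks: Array<(v: null) => void> = [];

  // Park the calling loop until someone calls notifyAll().
  wait(): Promise<null> {
    return new Promise((res) => this.waitingForTasks.push(res));
  }

  // Wake every parked loop and clear the list (mirrors the pattern in the
  // diff below: forEach(f => f(null)) then length = 0).
  notifyAll(): void {
    this.waitingForTasks.forEach((f) => f(null));
    this.waitingForTasks.length = 0;
  }
}
```

With two separate lists, a completion that only notified one list would leave the other loop parked forever, which is exactly the deadlock described above.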
…ht count

Race condition: scheduleNextTasksAndReleaseThreads wakes the coordinator before .finally() decrements inFlightWorkers. The coordinator sees inFlightWorkers > 0, goes back to sleep, then .finally() decrements but nobody wakes the coordinator again — deadlock. Fix: also fire waitingForTasks from .finally() so the coordinator re-evaluates the exit condition after the decrement.
Important
At least one additional CI pipeline execution has run since the conclusion below was written and it may no longer be applicable.
Nx Cloud is proposing a fix for your failed CI:
We added a .catch() handler on the fire-and-forget applyFromCacheOrRunTask call to capture errors (e.g. remote cache 401/connection failures from cache.put) that were previously silently swallowed by .finally(). The captured error is re-thrown after the coordinator loop exits, restoring the original behavior where these errors propagated to the CLI, printed the diagnostic message, and exited with a non-zero code.
Note
⏳ We are verifying this fix by re-running e2e-nx:e2e-ci--src/cache.test.ts.
Suggested fix
diff --git a/packages/nx/src/tasks-runner/task-orchestrator.ts b/packages/nx/src/tasks-runner/task-orchestrator.ts
index 0c75521158..bbf924e089 100644
--- a/packages/nx/src/tasks-runner/task-orchestrator.ts
+++ b/packages/nx/src/tasks-runner/task-orchestrator.ts
@@ -227,6 +227,7 @@ export class TaskOrchestrator {
parallelism: number
) {
let inFlightWorkers = 0;
+ let firstWorkerError: unknown = null;
while (true) {
if (this.bailed || this.stopRequested) break;
@@ -287,8 +288,18 @@ export class TaskOrchestrator {
dispatched = true;
inFlightWorkers++;
const groupId = this.closeGroup();
- this.applyFromCacheOrRunTask(doNotSkipCache, task, groupId).finally(
- () => {
+ this.applyFromCacheOrRunTask(doNotSkipCache, task, groupId)
+ .catch((e) => {
+ // Capture the first worker error so it can be re-thrown after
+ // the coordinator loop exits. This preserves the old behavior
+ // where errors (e.g. remote cache 401/connection failures) from
+ // cache.put propagated to crash the build with a visible message.
+ if (!firstWorkerError) firstWorkerError = e;
+ this.bailed = true;
+ this.waitingForTasks.forEach((f) => f(null));
+ this.waitingForTasks.length = 0;
+ })
+ .finally(() => {
this.openGroup(groupId);
inFlightWorkers--;
// Wake coordinator — the decrement above may satisfy the
@@ -296,8 +307,7 @@ export class TaskOrchestrator {
// when scheduleNextTasksAndReleaseThreads fired earlier.
this.waitingForTasks.forEach((f) => f(null));
this.waitingForTasks.length = 0;
- }
- );
+ });
}
if (dispatched) continue;
@@ -309,6 +319,11 @@ export class TaskOrchestrator {
// 7. Wait for a worker to finish (woken by scheduleNextTasksAndReleaseThreads)
await new Promise((res) => this.waitingForTasks.push(res));
}
+
+ // Re-throw any error captured from a fire-and-forget worker so it
+ // propagates to run() and ultimately the CLI (which prints the message
+ // and exits with non-zero code).
+ if (firstWorkerError) throw firstWorkerError;
}
private async executeContinuousTaskLoop(continuousTaskCount: number) {
When applyFromCacheOrRunTask rejects (e.g. remote cache 401), the fire-and-forget dispatch silently swallowed the error. The task would never complete and the error message was lost. Add .catch() that prints the error via the lifecycle and marks the task as failed through postRunSteps, matching the behavior of the original sequential dispatch.
Current Behavior
When running cached tasks, the task orchestrator processes each task individually: separate cache lookups, individual daemon IPC calls for output hash checking/recording, and per-task scheduling with a full array re-sort on each insert. For a workspace with 1,110 projects, this means ~1,100 individual JS→Rust→SQLite cache lookups, ~2,200 sequential daemon round-trips, and O(n² log n) scheduling overhead.
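The scheduling portion of that overhead comes from re-sorting the queue on every insert. A minimal sketch of the difference (numbers stand in for tasks; this is illustrative, not Nx's actual scheduler):

```typescript
// Per-insert re-sort: O(n^2 log n) over a run of n tasks.
function schedulePerInsert(queue: number[], ready: number[]): number[] {
  for (const task of ready) {
    queue.push(task);
    queue.sort((a, b) => a - b); // full re-sort on every insert
  }
  return queue;
}

// Batched: O(n log n) per scheduling pass, same final order.
function scheduleBatched(queue: number[], ready: number[]): number[] {
  queue.push(...ready);
  queue.sort((a, b) => a - b); // single sort per pass
  return queue;
}
```

Both produce the same ordering; only the amount of sorting work differs.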
Expected Behavior
Warm cache runs should resolve quickly by batching all hot-path operations: cache lookups, daemon calls, scheduling, and filesystem scans.
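The bulk cache lookup can be sketched in TypeScript as follows. This is a hypothetical stand-in, not Nx's real schema or native binding: `db`, the table and column names, and the `terminalOutput` file name are invented, and the real native query is an `UPDATE ... RETURNING` (simplified here to a `SELECT`). The point is one query with N placeholders plus concurrent file reads (Rayon on the Rust side; `Promise.all` here).

```typescript
import { readFile } from "node:fs/promises";

interface CachedRow {
  hash: string;
  code: number;
  outputsPath: string;
}

async function getBatch(
  db: { all(sql: string, params: string[]): CachedRow[] }, // stand-in handle
  hashes: string[]
): Promise<Map<string, { code: number; terminalOutput: string }>> {
  if (hashes.length === 0) return new Map();

  // Single round-trip: one query with N placeholders instead of N queries.
  const placeholders = hashes.map(() => "?").join(", ");
  const rows = db.all(
    `SELECT hash, code, outputs_path AS outputsPath
       FROM cache_outputs
      WHERE hash IN (${placeholders})`,
    hashes
  );

  // Read all terminal outputs concurrently; a missing file yields "".
  const entries = await Promise.all(
    rows.map(async (row) => {
      const terminalOutput = await readFile(
        `${row.outputsPath}/terminalOutput`,
        "utf8"
      ).catch(() => "");
      return [row.hash, { code: row.code, terminalOutput }] as const;
    })
  );
  return new Map(entries);
}
```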
Key Changes
**Rust Native (`cache.rs`)**

- `NxCache.get_batch()`: single SQL `UPDATE ... WHERE hash IN (...) RETURNING` query + Rayon-parallel terminal output file reads. Replaces N individual JS→Rust boundary crossings and SQLite queries with 1.
- `get_files_for_outputs_batch()`: Rayon-parallel filesystem scanning for output expansion (from first commit).

**Task Orchestrator (`task-orchestrator.ts`)**

- Wires `resolveCachedTasksBulk` into the coordinator loop: bulk-resolves all cache hits before falling through to individual workers. Uses batched daemon calls for output hash checking.
- Skips `processTask` lifecycle calls for tasks resolved from cache — only processes remaining cache misses.
- Removes `executeDiscreteTaskLoop`: unused after the coordinator introduction.

**Task Scheduler (`tasks-schedule.ts`)**

- Batches task scheduling with a single sort per pass instead of re-sorting on each insert.

**Cache (`cache.ts`)**

- `DbCache.getBatch()`: TypeScript wrapper for the native batch cache lookup.

**Daemon Outputs Tracking (`outputs-tracking.ts`)**

- Short-circuits `outputsHashesMatch` and `outputsHashesMatchBatch` when the daemon has no recorded hashes, avoiding unnecessary Rayon filesystem scans after `nx reset`.

**From First Commit**

- `recordOutputsHashBatch` and `outputsHashesMatchBatch`: batched daemon calls
- Cached `readProjectsConfigurationFromProjectGraph`: avoids rebuilding the project map per task
- Skip `recordOutputsHash` for `local-cache-kept-existing`: already has correct hashes

**Benchmark Results (1,110 projects, warm cache)**
Related Issue(s)