fix(core): optimize warm cache performance for task execution #35172

Draft
FrozenPandaz wants to merge 8 commits into master from cat-wram

@FrozenPandaz FrozenPandaz commented Apr 3, 2026

Current Behavior

When running cached tasks, the task orchestrator processes each task individually: separate cache lookups, individual daemon IPC calls for output hash checking/recording, and per-task scheduling with a full array re-sort on each insert. For a workspace with 1,110 projects, this means ~1,100 individual JS→Rust→SQLite cache lookups, ~2,200 sequential daemon round-trips, and O(n² log n) scheduling overhead.

Expected Behavior

Warm cache runs should resolve quickly by batching all hot-path operations: cache lookups, daemon calls, scheduling, and filesystem scans.

Key Changes

Rust Native (cache.rs)

  • NxCache.get_batch(): A single SQL UPDATE ... WHERE hash IN (...) RETURNING query plus Rayon-parallel reads of the terminal output files. Replaces N individual JS→Rust boundary crossings and N SQLite queries with one of each.
  • get_files_for_outputs_batch(): Rayon-parallel filesystem scanning for output expansion (from first commit).
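
The batched lookup can be pictured as one parameterized statement with a placeholder per hash. This is a minimal sketch in TypeScript for illustration only (the change itself lives in Rust), and the table name `cache_outputs` and the columns `accessed_at` and `code` are assumptions, not the schema used in cache.rs.

```typescript
// Hypothetical schema: table and column names are assumptions for illustration.
interface BatchQuery {
  sql: string;
  params: string[];
}

function buildBatchLookup(hashes: string[]): BatchQuery | null {
  // De-duplicate so each hash is bound exactly once.
  const unique = Array.from(new Set(hashes));
  // Nothing to look up: skip the query entirely.
  if (unique.length === 0) return null;

  // One positional placeholder per hash, e.g. "?, ?, ?".
  const placeholders = unique.map(() => "?").join(", ");
  const sql =
    "UPDATE cache_outputs SET accessed_at = CURRENT_TIMESTAMP " +
    `WHERE hash IN (${placeholders}) RETURNING hash, code`;
  return { sql, params: unique };
}

const batchQuery = buildBatchLookup(["a1", "b2", "a1"]);
console.log(batchQuery?.sql);
console.log(batchQuery?.params);
```

SQLite caps the number of bound parameters per statement, so a real implementation may need to split very large batches into chunks.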

Task Orchestrator (task-orchestrator.ts)

  • Wire resolveCachedTasksBulk into coordinator loop: Bulk-resolves all cache hits before falling through to individual workers. Uses batched daemon calls for output hash checking.
  • Reorder coordinator: bulk resolve before processTask: Skips N processTask lifecycle calls for tasks resolved from cache — only processes remaining cache misses.
  • Remove dead executeDiscreteTaskLoop: Unused after coordinator introduction.
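
The reordering amounts to splitting the ready tasks into cache hits and misses with one lookup, then handing only the misses to the per-task path. A minimal sketch under assumed shapes: `Task`, `CachedResult`, `BatchLookup`, and `partitionByCache` are hypothetical names and do not mirror the signature of resolveCachedTasksBulk.

```typescript
// Hypothetical shapes; the real orchestrator types are richer.
interface Task {
  id: string;
  hash: string;
}
interface CachedResult {
  terminalOutput: string;
}
// Stand-in for a batched cache lookup: one call returns every hit, keyed by hash.
type BatchLookup = (hashes: string[]) => Map<string, CachedResult>;

interface Partition {
  hits: Array<{ task: Task; result: CachedResult }>;
  misses: Task[];
}

function partitionByCache(tasks: Task[], lookup: BatchLookup): Partition {
  const found = lookup(tasks.map((t) => t.hash)); // single round-trip
  const partition: Partition = { hits: [], misses: [] };
  for (const task of tasks) {
    const result = found.get(task.hash);
    if (result !== undefined) partition.hits.push({ task, result });
    else partition.misses.push(task); // only these need the per-task path
  }
  return partition;
}

// Usage with an in-memory stand-in for the cache.
const store = new Map<string, CachedResult>([
  ["h1", { terminalOutput: "built core" }],
]);
let lookupCalls = 0;
const inMemoryLookup: BatchLookup = (hashes) => {
  lookupCalls++;
  const out = new Map<string, CachedResult>();
  for (const h of hashes) {
    const r = store.get(h);
    if (r !== undefined) out.set(h, r);
  }
  return out;
};

const { hits, misses } = partitionByCache(
  [
    { id: "core:build", hash: "h1" },
    { id: "app:build", hash: "h2" },
  ],
  inMemoryLookup
);
console.log(hits.length, misses.length, lookupCalls);
```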

Task Scheduler (tasks-schedule.ts)

  • Batch scheduling with single sort: Collect all schedulable roots, push them all at once, sort once. Previously re-sorted the entire array per task insert.
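
The difference between the two strategies is easiest to see by counting sorts. A minimal sketch, assuming a plain priority number per task; `ScheduledTask`, `byPriority`, and both functions are hypothetical and ignore the parallelism constraints the real scheduler must still honor.

```typescript
interface ScheduledTask {
  id: string;
  priority: number;
}

// Higher priority first; ties broken by id so the order is deterministic.
function byPriority(a: ScheduledTask, b: ScheduledTask): number {
  if (a.priority !== b.priority) return b.priority - a.priority;
  return a.id < b.id ? -1 : a.id > b.id ? 1 : 0;
}

// Per-insert approach: every insert re-sorts the whole queue.
function scheduleOneByOne(queue: ScheduledTask[], roots: ScheduledTask[]): number {
  let sorts = 0;
  for (const root of roots) {
    queue.push(root);
    queue.sort(byPriority);
    sorts++;
  }
  return sorts;
}

// Batched approach: push everything, then sort once.
function scheduleInBatch(queue: ScheduledTask[], roots: ScheduledTask[]): number {
  if (roots.length === 0) return 0;
  for (const root of roots) queue.push(root);
  queue.sort(byPriority);
  return 1;
}

const schedulableRoots: ScheduledTask[] = [
  { id: "lib-a:build", priority: 2 },
  { id: "app:build", priority: 1 },
  { id: "core:build", priority: 3 },
];
const perInsertQueue: ScheduledTask[] = [];
const batchedQueue: ScheduledTask[] = [];
const sortsPerInsert = scheduleOneByOne(perInsertQueue, schedulableRoots);
const sortsBatched = scheduleInBatch(batchedQueue, schedulableRoots);
console.log(sortsPerInsert, sortsBatched);
console.log(batchedQueue.map((t) => t.id).join(", "));
```

Both queues end up in the same order; only the number of sorts differs.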

Cache (cache.ts)

  • DbCache.getBatch(): TypeScript wrapper for the native batch cache lookup.

Daemon Outputs Tracking (outputs-tracking.ts)

  • Skip filesystem scan when no recorded hash: Short-circuit outputsHashesMatch and outputsHashesMatchBatch when the daemon has no recorded hashes, avoiding unnecessary Rayon filesystem scans after nx reset.
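
The short-circuit can be sketched as an early return placed before the expensive expansion step. This is an illustration under assumptions: `createOutputsTracker` and the injected `expand` function are hypothetical stand-ins for the daemon's state and for the Rayon-backed filesystem scan.

```typescript
// Stand-in for the expensive scan that expands output globs into file paths.
type Expand = (outputs: string[]) => string[];

function createOutputsTracker(expand: Expand) {
  const recordedHashes = new Map<string, string>();
  return {
    record(outputs: string[], hash: string): void {
      for (const file of expand(outputs)) recordedHashes.set(file, hash);
    },
    reset(): void {
      recordedHashes.clear();
    },
    outputsHashesMatch(outputs: string[], hash: string): boolean {
      // Nothing recorded (for example right after a reset): no scan can match.
      if (recordedHashes.size === 0) return false;
      return expand(outputs).every((file) => recordedHashes.get(file) === hash);
    },
  };
}

let expandCalls = 0;
const tracker = createOutputsTracker((outputs) => {
  expandCalls++;
  return outputs; // pretend each output is a single file
});

const matchedBeforeRecord = tracker.outputsHashesMatch(["dist/core"], "h1");
const scansBeforeRecord = expandCalls;
tracker.record(["dist/core"], "h1");
const matchedAfterRecord = tracker.outputsHashesMatch(["dist/core"], "h1");
console.log(matchedBeforeRecord, scansBeforeRecord, matchedAfterRecord, expandCalls);
```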

From First Commit

  • Batch daemon methods: recordOutputsHashBatch and outputsHashesMatchBatch
  • Cache readProjectsConfigurationFromProjectGraph: Avoids rebuilding project map per task
  • Skip recordOutputsHash for local-cache-kept-existing: Already has correct hashes
  • Batch hashing per topological level
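
Batching per topological level means grouping tasks so that everything in a level depends only on earlier levels, then issuing one hashing call per level. A minimal sketch, assuming a dependency map as input; `TaskGraph`, `topologicalLevels`, `hashByLevel`, and the `hashBatch` callback are hypothetical names.

```typescript
// Task id -> ids of the tasks it depends on. Every dependency must also be a key.
type TaskGraph = Map<string, string[]>;

function topologicalLevels(graph: TaskGraph): string[][] {
  const unmet = new Map<string, Set<string>>();
  graph.forEach((deps, id) => unmet.set(id, new Set(deps)));

  const levels: string[][] = [];
  while (unmet.size > 0) {
    // Tasks whose dependencies are all satisfied form the next level.
    const ready: string[] = [];
    unmet.forEach((deps, id) => {
      if (deps.size === 0) ready.push(id);
    });
    if (ready.length === 0) {
      throw new Error("cycle or unknown dependency in task graph");
    }
    ready.sort(); // deterministic order inside a level
    for (const id of ready) unmet.delete(id);
    unmet.forEach((deps) => {
      for (const id of ready) deps.delete(id);
    });
    levels.push(ready);
  }
  return levels;
}

// One batched hashing call per level instead of one call per task.
function hashByLevel(graph: TaskGraph, hashBatch: (ids: string[]) => void): number {
  let calls = 0;
  for (const level of topologicalLevels(graph)) {
    hashBatch(level);
    calls++;
  }
  return calls;
}

const exampleGraph: TaskGraph = new Map<string, string[]>([
  ["app:build", ["lib-a:build", "lib-b:build"]],
  ["lib-a:build", ["core:build"]],
  ["lib-b:build", ["core:build"]],
  ["core:build", []],
]);
const hashedBatches: string[][] = [];
const batchCalls = hashByLevel(exampleGraph, (ids) => hashedBatches.push(ids));
console.log(batchCalls, JSON.stringify(hashedBatches));
```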

Benchmark Results (1,110 projects, warm cache)

| Metric | Before | After | Improvement |
| --- | --- | --- | --- |
| build-warm | 6.27s | 2.12s | -66% |

Related Issue(s)

netlify bot commented Apr 3, 2026

Deploy Preview for nx-docs ready!

Name Link
🔨 Latest commit 0437035
🔍 Latest deploy log https://app.netlify.com/projects/nx-docs/deploys/69d87dbba874270008157c16
😎 Deploy Preview https://deploy-preview-35172--nx-docs.netlify.app

netlify bot commented Apr 3, 2026

Deploy Preview for nx-dev ready!

Name Link
🔨 Latest commit 0437035
🔍 Latest deploy log https://app.netlify.com/projects/nx-dev/deploys/69d87dbba0b7a200080e8266
😎 Deploy Preview https://deploy-preview-35172--nx-dev.netlify.app

nx-cloud bot commented Apr 3, 2026

View your CI Pipeline Execution ↗ for commit 0437035

| Command | Status | Duration | Result |
| --- | --- | --- | --- |
| nx affected --targets=lint,test,build,e2e,e2e-c... | ✅ Succeeded | 51m 13s | View ↗ |
| nx run-many -t check-imports check-lock-files c... | ✅ Succeeded | 4s | View ↗ |
| nx-cloud record -- pnpm nx conformance:check | ✅ Succeeded | 7s | View ↗ |
| nx build workspace-plugin | ✅ Succeeded | <1s | View ↗ |
| nx-cloud record -- nx format:check | ✅ Succeeded | 2s | View ↗ |
| nx-cloud record -- nx sync:check | ✅ Succeeded | <1s | View ↗ |

☁️ Nx Cloud last updated this comment at 2026-04-10 05:29:37 UTC

@FrozenPandaz FrozenPandaz force-pushed the cat-wram branch 8 times, most recently from 1115e1c to 771cc55 Compare April 4, 2026 13:50
nx-cloud[bot]

This comment was marked as outdated.

@FrozenPandaz FrozenPandaz force-pushed the cat-wram branch 3 times, most recently from 7221ff0 to 6962f00 Compare April 8, 2026 00:54

@FrozenPandaz FrozenPandaz force-pushed the cat-wram branch 2 times, most recently from 6962f00 to 3aa24b3 Compare April 8, 2026 14:16

FrozenPandaz and others added 5 commits April 8, 2026 13:52
Batch daemon calls, add bulk cache resolution fast path, and parallelize
output hash checking to dramatically improve warm cache hit performance.

- Batch daemon calls for recordOutputsHash and outputsHashesMatch
- Add resolveCachedTasksBulk fast path for bulk cache resolution
- Cache readProjectsConfigurationFromProjectGraph in getExecutorForTask
- Add verified match cache in daemon to skip redundant filesystem scans
- Add Rayon-parallel get_files_for_outputs_batch in Rust
- Batch-hash unhashed tasks per topological level
- Skip recordOutputsHash for local-cache-kept-existing
- Add NxCache.get_batch() in Rust: single SQL query + Rayon parallel
  file reads instead of N individual JS→Rust round-trips
- Wire resolveCachedTasksBulk into coordinator loop to bulk-resolve
  cache hits with batched daemon calls
- Batch task scheduling with single sort instead of re-sorting per insert
- Remove unused executeDiscreteTaskLoop
Co-authored-by: FrozenPandaz <FrozenPandaz@users.noreply.github.com>

Two bugs in the batch scheduling optimization:
1. Collecting all schedulable roots at once bypassed parallelism checks -
   scheduling a non-parallel task must block subsequent tasks from being
   scheduled in the same pass.
2. Sort comparator was non-transitive when two tasks both lacked historical
   timing data, causing non-deterministic ordering.
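
Both fixes can be illustrated together: a comparator that defines a total order even when timing data is missing, and a scheduling pass that stops once a non-parallel task is scheduled. This is a sketch under assumptions, not the code from tasks-schedule.ts; `Candidate`, `compareCandidates`, and `schedulePass` are hypothetical names, and the tie-break by id is one possible choice.

```typescript
interface Candidate {
  id: string;
  parallelism: boolean; // false: must not run alongside later tasks
  historicalDuration?: number; // missing when the task has never been timed
}

// Total order: timed tasks first (longest first), untimed tasks after,
// ties broken by id. Every pair has a consistent answer, so the order is transitive.
function compareCandidates(a: Candidate, b: Candidate): number {
  const da = a.historicalDuration;
  const db = b.historicalDuration;
  if (da !== undefined && db !== undefined && da !== db) return db - da;
  if (da !== undefined && db === undefined) return -1;
  if (da === undefined && db !== undefined) return 1;
  return a.id < b.id ? -1 : a.id > b.id ? 1 : 0;
}

// Schedule in sorted order, but stop the pass once a non-parallel task is taken.
function schedulePass(candidates: Candidate[]): Candidate[] {
  const scheduled: Candidate[] = [];
  for (const candidate of candidates.slice().sort(compareCandidates)) {
    scheduled.push(candidate);
    if (!candidate.parallelism) break; // blocks everything after it in this pass
  }
  return scheduled;
}

const candidates: Candidate[] = [
  { id: "d", parallelism: true },
  { id: "c", parallelism: true },
  { id: "b", parallelism: false, historicalDuration: 5 },
  { id: "a", parallelism: true, historicalDuration: 9 },
];
const sortedIds = candidates.slice().sort(compareCandidates).map((c) => c.id);
const scheduledIds = schedulePass(candidates).map((c) => c.id);
console.log(sortedIds.join(","), scheduledIds.join(","));
```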

The coordinator loop used a separate workerCompletedCallbacks list
while the continuous task loop used waitingForTasks. This meant when
a task completed via scheduleNextTasksAndReleaseThreads, only the
continuous loop was woken; the coordinator would stay blocked,
causing deadlocks in e2e scenarios with both discrete and continuous
tasks. Consolidate both loops to use waitingForTasks.

…ht count

Race condition: scheduleNextTasksAndReleaseThreads wakes the coordinator
before .finally() decrements inFlightWorkers. The coordinator sees
inFlightWorkers > 0, goes back to sleep, then .finally() decrements but
nobody wakes the coordinator again — deadlock.

Fix: also fire waitingForTasks from .finally() so the coordinator
re-evaluates the exit condition after the decrement.
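
The race described above can be modeled in a few lines. A minimal sketch, assuming a heavily simplified coordinator: `runCoordinator`, `wakeAll`, and the `wakeInFinally` flag are hypothetical, and only the names `waitingForTasks` and `inFlightWorkers` come from the commit message. Each job's `.then(wakeAll)` plays the role of scheduleNextTasksAndReleaseThreads waking the coordinator before `.finally()` has decremented the counter.

```typescript
type Waker = (value: null) => void;

async function runCoordinator(
  jobs: Array<() => Promise<void>>,
  wakeInFinally: boolean
): Promise<number> {
  const waitingForTasks: Waker[] = [];
  const wakeAll = (): void => {
    waitingForTasks.forEach((f) => f(null));
    waitingForTasks.length = 0;
  };

  let inFlightWorkers = 0;
  let wakeups = 0;
  const pending = jobs.slice();

  while (true) {
    while (pending.length > 0) {
      const job = pending.shift()!;
      inFlightWorkers++;
      job()
        .then(wakeAll) // early wake: the counter has not been decremented yet
        .finally(() => {
          inFlightWorkers--;
          // The fix: wake again so the exit condition is re-evaluated
          // after the decrement.
          if (wakeInFinally) wakeAll();
        });
    }
    if (inFlightWorkers === 0) break; // exit condition
    await new Promise<null>((res) => waitingForTasks.push(res));
    wakeups++;
  }
  return wakeups;
}

runCoordinator([async () => {}, async () => {}], true).then((wakeups) => {
  console.log("coordinator exited after", wakeups, "wakeups");
});
```

In this model, passing `false` for `wakeInFinally` should leave the returned promise pending: the coordinator is woken while `inFlightWorkers` is still positive, goes back to waiting, and nothing wakes it after the decrement.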

@nx-cloud nx-cloud bot left a comment

Important

At least one additional CI pipeline execution has run since the conclusion below was written and it may no longer be applicable.

Nx Cloud is proposing a fix for your failed CI:

We added a .catch() handler on the fire-and-forget applyFromCacheOrRunTask call to capture errors (e.g. remote cache 401/connection failures from cache.put) that were previously silently swallowed by .finally(). The captured error is re-thrown after the coordinator loop exits, restoring the original behavior where these errors propagated to the CLI, printed the diagnostic message, and exited with a non-zero code.

Note

We are verifying this fix by re-running e2e-nx:e2e-ci--src/cache.test.ts.

Suggested Fix changes
diff --git a/packages/nx/src/tasks-runner/task-orchestrator.ts b/packages/nx/src/tasks-runner/task-orchestrator.ts
index 0c75521158..bbf924e089 100644
--- a/packages/nx/src/tasks-runner/task-orchestrator.ts
+++ b/packages/nx/src/tasks-runner/task-orchestrator.ts
@@ -227,6 +227,7 @@ export class TaskOrchestrator {
     parallelism: number
   ) {
     let inFlightWorkers = 0;
+    let firstWorkerError: unknown = null;
 
     while (true) {
       if (this.bailed || this.stopRequested) break;
@@ -287,8 +288,18 @@ export class TaskOrchestrator {
         dispatched = true;
         inFlightWorkers++;
         const groupId = this.closeGroup();
-        this.applyFromCacheOrRunTask(doNotSkipCache, task, groupId).finally(
-          () => {
+        this.applyFromCacheOrRunTask(doNotSkipCache, task, groupId)
+          .catch((e) => {
+            // Capture the first worker error so it can be re-thrown after
+            // the coordinator loop exits. This preserves the old behavior
+            // where errors (e.g. remote cache 401/connection failures) from
+            // cache.put propagated to crash the build with a visible message.
+            if (!firstWorkerError) firstWorkerError = e;
+            this.bailed = true;
+            this.waitingForTasks.forEach((f) => f(null));
+            this.waitingForTasks.length = 0;
+          })
+          .finally(() => {
             this.openGroup(groupId);
             inFlightWorkers--;
             // Wake coordinator — the decrement above may satisfy the
@@ -296,8 +307,7 @@ export class TaskOrchestrator {
             // when scheduleNextTasksAndReleaseThreads fired earlier.
             this.waitingForTasks.forEach((f) => f(null));
             this.waitingForTasks.length = 0;
-          }
-        );
+          });
       }
       if (dispatched) continue;
 
@@ -309,6 +319,11 @@ export class TaskOrchestrator {
       // 7. Wait for a worker to finish (woken by scheduleNextTasksAndReleaseThreads)
       await new Promise((res) => this.waitingForTasks.push(res));
     }
+
+    // Re-throw any error captured from a fire-and-forget worker so it
+    // propagates to run() and ultimately the CLI (which prints the message
+    // and exits with non-zero code).
+    if (firstWorkerError) throw firstWorkerError;
   }
 
   private async executeContinuousTaskLoop(continuousTaskCount: number) {
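
The capture-and-rethrow pattern in the suggested fix can be reduced to a self-contained sketch. It is a simplification under assumptions: `runAll` and `wake` are hypothetical, there is no `bailed` flag or task lifecycle, and only the `firstWorkerError` idea is taken from the diff.

```typescript
async function runAll(jobs: Array<() => Promise<void>>): Promise<void> {
  let firstWorkerError: unknown = null;
  let inFlight = 0;
  let wake: (() => void) | null = null;

  for (const job of jobs) {
    inFlight++;
    // Fire-and-forget dispatch: the loop does not await each job.
    job()
      .catch((e: unknown) => {
        // Keep only the first failure; later ones are dropped.
        if (firstWorkerError === null) firstWorkerError = e;
      })
      .finally(() => {
        inFlight--;
        if (wake) wake(); // let the waiting loop re-check its exit condition
      });
  }

  while (inFlight > 0) {
    await new Promise<void>((res) => {
      wake = res;
    });
  }

  // Re-throw after all workers settle so the caller sees the failure.
  if (firstWorkerError !== null) throw firstWorkerError;
}

runAll([
  async () => {},
  async () => {
    throw new Error("remote cache returned 401");
  },
]).catch((e: unknown) => {
  console.log("propagated:", e instanceof Error ? e.message : e);
});
```

Without the `.catch()` step the rejection would never reach the caller of `runAll`, which is the situation the suggested fix addresses.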

To apply the changes locally, run:

npx nx-cloud apply-locally cNYl-nDXD

When applyFromCacheOrRunTask rejects (e.g. remote cache 401), the
fire-and-forget dispatch silently swallowed the error. The task would
never complete and the error message was lost.

Add .catch() that prints the error via the lifecycle and marks the
task as failed through postRunSteps, matching the behavior of the
original sequential dispatch.