From Local to Distributed: Benchmarks and Grid Operations
You've built a WitEngine plugin. It works locally, tests pass, life is good. But WitEngine's real power is distributed execution—spreading work across dozens, hundreds, or thousands of machines.
This post covers what it takes to make your plugin distribution-ready: benchmarking for intelligent load balancing, Grid.ForEach for parallel execution, and the checklist for WitCloud deployment.
The SDK vs. Production
First, let's be clear about where you are and where you're going.
WitEngine SDK (what you've been using):
| Limit | Value |
|---|---|
| Max activities per job | 50 |
| Max variables per job | 100 |
| Max execution time | 5 minutes |
| Max nodes | 1 (local only) |
| Max variable size | 100 MB |
WitCloud / OmnibusCloud (production):
| Capability | Value |
|---|---|
| Activities per job | Unlimited |
| Variables per job | Unlimited |
| Execution time | Hours to days |
| Nodes | Thousands+ |
| Variable size | Gigabytes |
The SDK is your development sandbox. It runs everything on one machine, which is perfect for building and testing. But to actually distribute work, you need WitCloud infrastructure—and your plugin needs to be ready for it.
Why Benchmarks Matter
In a distributed system, not all machines are equal. Your cluster might include:
- A brand new workstation with an RTX 4090
- Five-year-old office PCs
- A powerful server with many cores but older architecture
- Contributed laptops with variable performance
If you distribute work equally (100 tasks ÷ 4 machines = 25 each), the fast machines finish early and wait. The slow machine becomes a bottleneck. You've paid for four machines but only fully utilized one.
Benchmarks solve this. They measure actual performance on each node, enabling proportional distribution:
The IWitBenchmarkAdapter Interface
To enable intelligent distribution, your adapter implements IWitBenchmarkAdapter:
public interface IWitBenchmarkAdapter
{
/// <summary>
/// Measures how fast this node can execute the activity.
/// </summary>
Task<IWitBenchmarkResult> RunBenchmark(
IWitBenchmarkOptions? options,
CancellationToken cancellationToken);
/// <summary>
/// Estimates relative work for a specific task.
/// </summary>
double EstimateWork(
IWitActivity activity,
IWitVariablesCollection pool);
}Two methods, two purposes:
- RunBenchmark — "How fast is this node?" (runs once per node, cached)
- EstimateWork — "How big is this task?" (runs for each task)
Together, they enable optimal task allocation.
Implementing RunBenchmark
RunBenchmark measures node performance by running a representative workload:
public class WitActivityAdapterMyTransform
: WitActivityAdapterTransform<WitActivityMyTransform>,
IWitBenchmarkAdapter
{
public async Task<IWitBenchmarkResult> RunBenchmark(
IWitBenchmarkOptions? options,
CancellationToken cancellationToken)
{
options ??= WitBenchmarkOptions.Default;
// 1. Create representative test data
var testData = CreateBenchmarkData();
// 2. Warmup phase - JIT compilation, cache warming
for (int i = 0; i < options.WarmupIterations; i++)
{
cancellationToken.ThrowIfCancellationRequested();
ProcessItem(testData);
}
// 3. Measurement phase - count operations until time limit
var stopwatch = Stopwatch.StartNew();
long operations = 0;
while (stopwatch.Elapsed < options.MinDuration)
{
cancellationToken.ThrowIfCancellationRequested();
ProcessItem(testData);
operations++;
}
stopwatch.Stop();
// 4. Calculate rate (operations per second)
double rate = operations / stopwatch.Elapsed.TotalSeconds;
return new WitBenchmarkResult(rate);
}
}Benchmark Best Practices
| Practice | Why It Matters |
|---|---|
| Use representative data | Benchmark should reflect real workload characteristics |
| Warmup first | JIT compilation makes first runs slower; exclude them |
| Run for sufficient time | Short runs have high variance; aim for 500ms+ |
| Check cancellation | Allow graceful termination |
| Return operations/second | Higher number = faster node |
What Makes Good Test Data?
Your benchmark data should be:
- Representative — Similar size and complexity to real tasks
- Deterministic — Same data every time for consistent results
- Fast to create — Don't spend benchmark time on setup
private MyData CreateBenchmarkData()
{
// Create a typical-sized item
// Not the smallest, not the largest - representative
return new MyData
{
Values = Enumerable.Range(0, 1000).Select(i => i * 0.1).ToArray(),
Metadata = "benchmark-item"
};
}Implementing EstimateWork
While RunBenchmark measures node speed, EstimateWork measures task size. This handles cases where tasks have different complexity.
Constant-Time Operations
If every task takes the same time regardless of input:
public double EstimateWork(IWitActivity activity, IWitVariablesCollection pool)
{
// Temperature conversion, simple calculations, etc.
return 1.0;
}Linear Operations (O(n))
If time scales with data size:
public double EstimateWork(IWitActivity activity, IWitVariablesCollection pool)
{
var myActivity = (WitActivityProcessData)activity;
if (pool.TryGetValue(myActivity.Data, out DataCollection? data))
{
return data?.Count ?? 1.0;
}
return 1.0;
}Quadratic Operations (O(n²))
If time scales with the square of data size:
public double EstimateWork(IWitActivity activity, IWitVariablesCollection pool)
{
var myActivity = (WitActivityMatrixOp)activity;
if (pool.TryGetValue(myActivity.Matrix, out Matrix? matrix))
{
double size = matrix?.RowCount ?? 1;
return size * size;
}
return 1.0;
}Complex Estimation
For operations with multiple factors:
public double EstimateWork(IWitActivity activity, IWitVariablesCollection pool)
{
var renderTask = (WitActivityRenderFrame)activity;
if (pool.TryGetValue(renderTask.Task, out RenderTaskData? task))
{
// Resolution affects time
double pixels = (task?.Width ?? 1920) * (task?.Height ?? 1080);
// Samples affect time linearly
double samples = task?.Samples ?? 128;
// Combine factors
return pixels * samples / 1_000_000; // Normalize to reasonable range
}
return 1.0;
}Grid.ForEach: Distributed Iteration
The Grid controller provides Grid.ForEach—the primary way to distribute work:
ResultCollection:results = Grid.ForEach(item in items, opts) => Transform(item);This single line:
- Takes a collection of items
- Distributes them across available nodes
- Applies your transform to each item
- Collects and returns all results
Processing Strategies
WitEngine offers two distribution strategies:
Balanced — Pre-allocates tasks based on benchmarks:
ProcessingOptions:opts = ProcessingOptions.Create("Balanced");
~ All tasks assigned upfront based on node speed ~
~ Node A (fast): gets 60% of tasks ~
~ Node B (slow): gets 40% of tasks ~Queued — Nodes pull tasks from a central queue:
ProcessingOptions:opts = ProcessingOptions.Create("Queued");
~ Tasks go into a queue ~
~ Each node pulls next task when ready ~
~ Fast nodes naturally get more tasks ~When to Use Which
| Your workload... | Use |
|---|---|
| All tasks take similar time | Balanced |
| Task duration varies | Queued |
| Task complexity is known in advance | Balanced |
| Task complexity is unpredictable | Queued |
| Nodes are stable | Balanced |
| Nodes may disconnect | Queued |
Default recommendation: Start with Queued. It's more forgiving of variability and requires less tuning.
Transform Activities
For Grid.ForEach to work, your activity must be a Transform—an operation that takes input and produces output, designed for parallel execution.
Transform vs. Function
// FUNCTION - runs on host, not distributed
[Activity("Calculate")]
public class WitActivityCalculate : WitActivityFunction { }
// TRANSFORM - can run on any node, distributable
[Activity("ProcessBatch")]
public class WitActivityProcessBatch : WitActivityTransform { }Use WitActivityTransform base class for activities that will be used in Grid.ForEach.
Transform Adapter
public class WitActivityAdapterProcessBatch
: WitActivityAdapterTransform<WitActivityProcessBatch>, // Note: Transform base
IWitBenchmarkAdapter
{
// ProcessInner runs on worker nodes
protected override async Task<object?> ProcessInner(
WitActivityProcessBatch activity,
IWitVariablesCollection pool,
IWitActivityStatus? activityStatus,
WitProcessingStatus status)
{
// This code executes on remote nodes!
// Only serialized data is available
if (!pool.TryGetValue(activity.Input, out MyData? input))
throw new InvalidOperationException("Failed to get input");
return Process(input);
}
// Benchmark implementation
public async Task<IWitBenchmarkResult> RunBenchmark(...) { ... }
public double EstimateWork(...) { ... }
}Testing Distributed Logic Locally
The SDK runs on one node, but you can still test distributed patterns:
Test Grid.ForEach
[Test]
public async Task GridForEachProcessesAllItems()
{
var job = WitEngineSdk.Instance.Compile(@"
Job:Test()
{
IntCollection:items = [1, 2, 3, 4, 5];
ProcessingOptions:opts = ProcessingOptions.Create(""Balanced"");
IntCollection:results =
Grid.ForEach(item in items, opts) => MyMath.Square(item);
Return(results);
}
");
var status = await WitEngineSdk.Instance.ScheduleAndWaitAsync(job);
Assert.That(status.Result, Is.EqualTo(WitProcessingResult.Completed));
var results = status.ReturnedValues.First() as IEnumerable<int>;
CollectionAssert.AreEquivalent(new[] { 1, 4, 9, 16, 25 }, results);
}Test Serialization Round-Trip
Distributed execution serializes activities. Test that yours survives:
[Test]
public void ActivitySurvivesSerializationRoundTrip()
{
var original = new WitActivityMyTransform
{
InputValue = new WitParameterVariable("x")
};
// Serialize
var bytes = MemoryPackSerializer.Serialize(original);
// Deserialize
var restored = MemoryPackSerializer.Deserialize<WitActivityMyTransform>(bytes);
// Verify
Assert.That(original.Is(restored), Is.True);
}Test Stateless Behavior
Adapters must be stateless—they're recreated on each node:
// WRONG - adapter has state
public class BadAdapter : WitActivityAdapterTransform<MyActivity>
{
private int _counter = 0; // State lost on transfer!
protected override async Task<object?> ProcessInner(...)
{
_counter++; // This won't work distributed
return _counter;
}
}
// RIGHT - stateless adapter
public class GoodAdapter : WitActivityAdapterTransform<MyActivity>
{
protected override async Task<object?> ProcessInner(...)
{
// All data comes from activity and pool
// No adapter state
}
}WitCloud Deployment Checklist
Before deploying to WitCloud or OmnibusCloud, verify:
Serialization
| Check | Status |
|---|---|
Activity class has [MemoryPackable] |
☐ |
Activity class is partial |
☐ |
| All properties are serializable types | ☐ |
Interface properties have [MemoryPackAllowSerialize] |
☐ |
Custom data types have [MemoryPackable] |
☐ |
| Serialization round-trip test passes | ☐ |
Distribution
| Check | Status |
|---|---|
Transform activities use WitActivityTransform base |
☐ |
| Adapters are stateless | ☐ |
Adapter implements IWitBenchmarkAdapter |
☐ |
RunBenchmark returns meaningful rate |
☐ |
EstimateWork reflects actual complexity |
☐ |
Controller
| Check | Status |
|---|---|
Controller implements IWitControllerHost |
☐ |
Controller implements IWitControllerNode |
☐ |
All activities registered in Initialize() |
☐ |
Deployment
| Check | Status |
|---|---|
| Controller DLL built for target runtime | ☐ |
| All dependencies included | ☐ |
| No hardcoded paths (use relative or injected) | ☐ |
| Error handling includes context | ☐ |
What Happens on Real Clusters
When your plugin runs on WitCloud or OmnibusCloud:
1. Controller Distribution
Your controller DLL is distributed to all nodes:
Host Machine Worker Node 1 Worker Node 2
@Controllers/ @Controllers/ @Controllers/
├── MyPlugin.dll ├── MyPlugin.dll ├── MyPlugin.dll
└── Variables.dll └── Variables.dll └── Variables.dll2. Benchmark Execution
Each node runs your benchmark:
Node A runs RunBenchmark() → 1500 ops/sec
Node B runs RunBenchmark() → 800 ops/sec
Node C runs RunBenchmark() → 2200 ops/sec3. Task Allocation
The host uses benchmarks + work estimates to allocate:
1000 tasks total
Node A (1500 ops/sec): 333 tasks
Node B (800 ops/sec): 178 tasks
Node C (2200 ops/sec): 489 tasks4. Distributed Execution
Tasks are serialized, sent to nodes, executed, results returned:
Host Nodes
│ │
├── Serialize Task 1 ────────► Node A: Execute, return result
├── Serialize Task 2 ────────► Node B: Execute, return result
├── Serialize Task 3 ────────► Node C: Execute, return result
│ │
◄── Collect results ───────────┘5. Result Aggregation
Results stream back and are collected into the output collection.
From Smart TVs to Servers
Here's the payoff of all this work: extreme heterogeneity.
WitCloud and OmnibusCloud can include:
Device Type Benchmark Score
─────────────────────────────────────────
Dedicated server ~3000 ops/sec
Gaming PC ~1500 ops/sec
Office workstation ~800 ops/sec
Laptop ~400 ops/sec
Smart TV ~50 ops/sec
Mobile device ~30 ops/secA 100× performance spread! But with proper benchmarking:
- The server gets 100× more tasks than the smart TV
- The smart TV still contributes useful work
- All devices finish at approximately the same time
- No wasted capacity anywhere
This is why benchmarking isn't optional for distributed plugins—it's what makes heterogeneous computing possible.
Summary
To make your plugin distribution-ready:
| Component | What It Does |
|---|---|
| IWitBenchmarkAdapter | Enables intelligent load balancing |
| RunBenchmark | Measures node speed |
| EstimateWork | Measures task complexity |
| WitActivityTransform | Marks activities as distributable |
| Stateless adapters | Ensures clean execution on any node |
| MemoryPack serialization | Enables data transfer between nodes |
The SDK lets you develop and test locally. Benchmarks and proper architecture ensure your plugin works at scale—whether that's a 10-node office cluster or a global OmnibusCloud mesh with everything from servers to smart TVs.