From Local to Distributed: Benchmarks and Grid Operations

You've built a WitEngine plugin. It works locally, tests pass, life is good. But WitEngine's real power is distributed execution—spreading work across dozens, hundreds, or thousands of machines.

This post covers what it takes to make your plugin distribution-ready: benchmarking for intelligent load balancing, Grid.ForEach for parallel execution, and the checklist for WitCloud deployment.

The SDK vs. Production

First, let's be clear about where you are and where you're going.

WitEngine SDK (what you've been using):

Limit	Value
Max activities per job	50
Max variables per job	100
Max execution time	5 minutes
Max nodes	1 (local only)
Max variable size	100 MB

WitCloud / OmnibusCloud (production):

Capability	Value
Activities per job	Unlimited
Variables per job	Unlimited
Execution time	Hours to days
Nodes	Thousands+
Variable size	Gigabytes

The SDK is your development sandbox. It runs everything on one machine, which is perfect for building and testing. But to actually distribute work, you need WitCloud infrastructure—and your plugin needs to be ready for it.

Why Benchmarks Matter

In a distributed system, not all machines are equal. Your cluster might include:

A brand new workstation with an RTX 4090
Five-year-old office PCs
A powerful server with many cores but older architecture
Contributed laptops with variable performance

If you distribute work equally (100 tasks ÷ 4 machines = 25 each), the fast machines finish early and wait. The slow machine becomes a bottleneck. You've paid for four machines but only fully utilized one.

Benchmarks solve this. They measure actual performance on each node, enabling proportional distribution:

The IWitBenchmarkAdapter Interface

To enable intelligent distribution, your adapter implements IWitBenchmarkAdapter:

csharp

public interface IWitBenchmarkAdapter
{
    /// <summary>
    /// Measures how fast this node can execute the activity.
    /// </summary>
    Task<IWitBenchmarkResult> RunBenchmark(
        IWitBenchmarkOptions? options,
        CancellationToken cancellationToken);
    
    /// <summary>
    /// Estimates relative work for a specific task.
    /// </summary>
    double EstimateWork(
        IWitActivity activity,
        IWitVariablesCollection pool);
}

Two methods, two purposes:

RunBenchmark — "How fast is this node?" (runs once per node, cached)
EstimateWork — "How big is this task?" (runs for each task)

Together, they enable optimal task allocation.

Implementing RunBenchmark

RunBenchmark measures node performance by running a representative workload:

csharp

public class WitActivityAdapterMyTransform 
    : WitActivityAdapterTransform<WitActivityMyTransform>,
      IWitBenchmarkAdapter
{
    public async Task<IWitBenchmarkResult> RunBenchmark(
        IWitBenchmarkOptions? options,
        CancellationToken cancellationToken)
    {
        options ??= WitBenchmarkOptions.Default;
        
        // 1. Create representative test data
        var testData = CreateBenchmarkData();
        
        // 2. Warmup phase - JIT compilation, cache warming
        for (int i = 0; i < options.WarmupIterations; i++)
        {
            cancellationToken.ThrowIfCancellationRequested();
            ProcessItem(testData);
        }
        
        // 3. Measurement phase - count operations until time limit
        var stopwatch = Stopwatch.StartNew();
        long operations = 0;
        
        while (stopwatch.Elapsed < options.MinDuration)
        {
            cancellationToken.ThrowIfCancellationRequested();
            ProcessItem(testData);
            operations++;
        }
        
        stopwatch.Stop();
        
        // 4. Calculate rate (operations per second)
        double rate = operations / stopwatch.Elapsed.TotalSeconds;
        
        return new WitBenchmarkResult(rate);
    }
}

Benchmark Best Practices

Practice	Why It Matters
Use representative data	Benchmark should reflect real workload characteristics
Warmup first	JIT compilation makes first runs slower; exclude them
Run for sufficient time	Short runs have high variance; aim for 500ms+
Check cancellation	Allow graceful termination
Return operations/second	Higher number = faster node

What Makes Good Test Data?

Your benchmark data should be:

Representative — Similar size and complexity to real tasks
Deterministic — Same data every time for consistent results
Fast to create — Don't spend benchmark time on setup

csharp

private MyData CreateBenchmarkData()
{
    // Create a typical-sized item
    // Not the smallest, not the largest - representative
    return new MyData
    {
        Values = Enumerable.Range(0, 1000).Select(i => i * 0.1).ToArray(),
        Metadata = "benchmark-item"
    };
}

Implementing EstimateWork

While RunBenchmark measures node speed, EstimateWork measures task size. This handles cases where tasks have different complexity.

Constant-Time Operations

If every task takes the same time regardless of input:

csharp

public double EstimateWork(IWitActivity activity, IWitVariablesCollection pool)
{
    // Temperature conversion, simple calculations, etc.
    return 1.0;
}

Linear Operations (O(n))

If time scales with data size:

csharp

public double EstimateWork(IWitActivity activity, IWitVariablesCollection pool)
{
    var myActivity = (WitActivityProcessData)activity;
    
    if (pool.TryGetValue(myActivity.Data, out DataCollection? data))
    {
        return data?.Count ?? 1.0;
    }
    
    return 1.0;
}

Quadratic Operations (O(n²))

If time scales with the square of data size:

csharp

public double EstimateWork(IWitActivity activity, IWitVariablesCollection pool)
{
    var myActivity = (WitActivityMatrixOp)activity;
    
    if (pool.TryGetValue(myActivity.Matrix, out Matrix? matrix))
    {
        double size = matrix?.RowCount ?? 1;
        return size * size;
    }
    
    return 1.0;
}

Complex Estimation

For operations with multiple factors:

csharp

public double EstimateWork(IWitActivity activity, IWitVariablesCollection pool)
{
    var renderTask = (WitActivityRenderFrame)activity;
    
    if (pool.TryGetValue(renderTask.Task, out RenderTaskData? task))
    {
        // Resolution affects time
        double pixels = (task?.Width ?? 1920) * (task?.Height ?? 1080);
        
        // Samples affect time linearly
        double samples = task?.Samples ?? 128;
        
        // Combine factors
        return pixels * samples / 1_000_000; // Normalize to reasonable range
    }
    
    return 1.0;
}

Grid.ForEach: Distributed Iteration

The Grid controller provides Grid.ForEach—the primary way to distribute work:

ResultCollection:results = Grid.ForEach(item in items, opts) => Transform(item);

This single line:

Takes a collection of items
Distributes them across available nodes
Applies your transform to each item
Collects and returns all results

Processing Strategies

WitEngine offers two distribution strategies:

Balanced — Pre-allocates tasks based on benchmarks:

ProcessingOptions:opts = ProcessingOptions.Create("Balanced");

~ All tasks assigned upfront based on node speed ~
~ Node A (fast): gets 60% of tasks ~
~ Node B (slow): gets 40% of tasks ~

Queued — Nodes pull tasks from a central queue:

ProcessingOptions:opts = ProcessingOptions.Create("Queued");

~ Tasks go into a queue ~
~ Each node pulls next task when ready ~
~ Fast nodes naturally get more tasks ~

When to Use Which

Your workload...	Use
All tasks take similar time	Balanced
Task duration varies	Queued
Task complexity is known in advance	Balanced
Task complexity is unpredictable	Queued
Nodes are stable	Balanced
Nodes may disconnect	Queued

Default recommendation: Start with Queued. It's more forgiving of variability and requires less tuning.

Transform Activities

For Grid.ForEach to work, your activity must be a Transform—an operation that takes input and produces output, designed for parallel execution.

Transform vs. Function

csharp

// FUNCTION - runs on host, not distributed
[Activity("Calculate")]
public class WitActivityCalculate : WitActivityFunction { }

// TRANSFORM - can run on any node, distributable  
[Activity("ProcessBatch")]
public class WitActivityProcessBatch : WitActivityTransform { }

Use WitActivityTransform base class for activities that will be used in Grid.ForEach.

Transform Adapter

csharp

public class WitActivityAdapterProcessBatch 
    : WitActivityAdapterTransform<WitActivityProcessBatch>,  // Note: Transform base
      IWitBenchmarkAdapter
{
    // ProcessInner runs on worker nodes
    protected override async Task<object?> ProcessInner(
        WitActivityProcessBatch activity,
        IWitVariablesCollection pool,
        IWitActivityStatus? activityStatus,
        WitProcessingStatus status)
    {
        // This code executes on remote nodes!
        // Only serialized data is available
        
        if (!pool.TryGetValue(activity.Input, out MyData? input))
            throw new InvalidOperationException("Failed to get input");

        return Process(input);
    }
    
    // Benchmark implementation
    public async Task<IWitBenchmarkResult> RunBenchmark(...) { ... }
    public double EstimateWork(...) { ... }
}

Testing Distributed Logic Locally

The SDK runs on one node, but you can still test distributed patterns:

Test Grid.ForEach

csharp

[Test]
public async Task GridForEachProcessesAllItems()
{
    var job = WitEngineSdk.Instance.Compile(@"
        Job:Test()
        {
            IntCollection:items = [1, 2, 3, 4, 5];
            ProcessingOptions:opts = ProcessingOptions.Create(""Balanced"");
            
            IntCollection:results = 
                Grid.ForEach(item in items, opts) => MyMath.Square(item);
            
            Return(results);
        }
    ");

    var status = await WitEngineSdk.Instance.ScheduleAndWaitAsync(job);
    
    Assert.That(status.Result, Is.EqualTo(WitProcessingResult.Completed));
    
    var results = status.ReturnedValues.First() as IEnumerable<int>;
    CollectionAssert.AreEquivalent(new[] { 1, 4, 9, 16, 25 }, results);
}

Test Serialization Round-Trip

Distributed execution serializes activities. Test that yours survives:

csharp

[Test]
public void ActivitySurvivesSerializationRoundTrip()
{
    var original = new WitActivityMyTransform
    {
        InputValue = new WitParameterVariable("x")
    };
    
    // Serialize
    var bytes = MemoryPackSerializer.Serialize(original);
    
    // Deserialize
    var restored = MemoryPackSerializer.Deserialize<WitActivityMyTransform>(bytes);
    
    // Verify
    Assert.That(original.Is(restored), Is.True);
}

Test Stateless Behavior

Adapters must be stateless—they're recreated on each node:

csharp

// WRONG - adapter has state
public class BadAdapter : WitActivityAdapterTransform<MyActivity>
{
    private int _counter = 0;  // State lost on transfer!
    
    protected override async Task<object?> ProcessInner(...)
    {
        _counter++;  // This won't work distributed
        return _counter;
    }
}

// RIGHT - stateless adapter
public class GoodAdapter : WitActivityAdapterTransform<MyActivity>
{
    protected override async Task<object?> ProcessInner(...)
    {
        // All data comes from activity and pool
        // No adapter state
    }
}

WitCloud Deployment Checklist

Before deploying to WitCloud or OmnibusCloud, verify:

Serialization

Check	Status
Activity class has `[MemoryPackable]`	☐
Activity class is `partial`	☐
All properties are serializable types	☐
Interface properties have `[MemoryPackAllowSerialize]`	☐
Custom data types have `[MemoryPackable]`	☐
Serialization round-trip test passes	☐

Distribution

Check	Status
Transform activities use `WitActivityTransform` base	☐
Adapters are stateless	☐
Adapter implements `IWitBenchmarkAdapter`	☐
`RunBenchmark` returns meaningful rate	☐
`EstimateWork` reflects actual complexity	☐

Controller

Check	Status
Controller implements `IWitControllerHost`	☐
Controller implements `IWitControllerNode`	☐
All activities registered in `Initialize()`	☐

Deployment

Check	Status
Controller DLL built for target runtime	☐
All dependencies included	☐
No hardcoded paths (use relative or injected)	☐
Error handling includes context	☐

What Happens on Real Clusters

When your plugin runs on WitCloud or OmnibusCloud:

1. Controller Distribution

Your controller DLL is distributed to all nodes:

Host Machine              Worker Node 1           Worker Node 2
@Controllers/             @Controllers/           @Controllers/
├── MyPlugin.dll          ├── MyPlugin.dll        ├── MyPlugin.dll
└── Variables.dll         └── Variables.dll       └── Variables.dll

2. Benchmark Execution

Each node runs your benchmark:

Node A runs RunBenchmark() → 1500 ops/sec
Node B runs RunBenchmark() → 800 ops/sec
Node C runs RunBenchmark() → 2200 ops/sec

3. Task Allocation

The host uses benchmarks + work estimates to allocate:

1000 tasks total
Node A (1500 ops/sec): 333 tasks
Node B (800 ops/sec): 178 tasks  
Node C (2200 ops/sec): 489 tasks

4. Distributed Execution

Tasks are serialized, sent to nodes, executed, results returned:

Host                           Nodes
  │                              │
  ├── Serialize Task 1 ────────► Node A: Execute, return result
  ├── Serialize Task 2 ────────► Node B: Execute, return result
  ├── Serialize Task 3 ────────► Node C: Execute, return result
  │                              │
  ◄── Collect results ───────────┘

5. Result Aggregation

Results stream back and are collected into the output collection.

From Smart TVs to Servers

Here's the payoff of all this work: extreme heterogeneity.

WitCloud and OmnibusCloud can include:

Device Type               Benchmark Score
─────────────────────────────────────────
Dedicated server          ~3000 ops/sec
Gaming PC                 ~1500 ops/sec
Office workstation        ~800 ops/sec
Laptop                    ~400 ops/sec
Smart TV                  ~50 ops/sec
Mobile device             ~30 ops/sec

A 100× performance spread! But with proper benchmarking:

The server gets 100× more tasks than the smart TV
The smart TV still contributes useful work
All devices finish at approximately the same time
No wasted capacity anywhere

This is why benchmarking isn't optional for distributed plugins—it's what makes heterogeneous computing possible.

Summary

To make your plugin distribution-ready:

Component	What It Does
IWitBenchmarkAdapter	Enables intelligent load balancing
RunBenchmark	Measures node speed
EstimateWork	Measures task complexity
WitActivityTransform	Marks activities as distributable
Stateless adapters	Ensures clean execution on any node
MemoryPack serialization	Enables data transfer between nodes

The SDK lets you develop and test locally. Benchmarks and proper architecture ensure your plugin works at scale—whether that's a 10-node office cluster or a global OmnibusCloud mesh with everything from servers to smart TVs.