From Local to Distributed: Benchmarks and Grid Operations

You've built a WitEngine plugin. It works locally, tests pass, life is good. But WitEngine's real power is distributed execution—spreading work across dozens, hundreds, or thousands of machines.

This post covers what it takes to make your plugin distribution-ready: benchmarking for intelligent load balancing, Grid.ForEach for parallel execution, and the checklist for WitCloud deployment.


The SDK vs. Production

First, let's be clear about where you are and where you're going.

WitEngine SDK (what you've been using):

Limit                     Value
─────────────────────────────────────────
Max activities per job    50
Max variables per job     100
Max execution time        5 minutes
Max nodes                 1 (local only)
Max variable size         100 MB

WitCloud / OmnibusCloud (production):

Capability                Value
─────────────────────────────────────────
Activities per job        Unlimited
Variables per job         Unlimited
Execution time            Hours to days
Nodes                     Thousands+
Variable size             Gigabytes

The SDK is your development sandbox. It runs everything on one machine, which is perfect for building and testing. But to actually distribute work, you need WitCloud infrastructure—and your plugin needs to be ready for it.


Why Benchmarks Matter

In a distributed system, not all machines are equal. Your cluster might include:

  • A brand new workstation with an RTX 4090
  • Five-year-old office PCs
  • A powerful server with many cores but older architecture
  • Contributed laptops with variable performance

If you distribute work equally (100 tasks ÷ 4 machines = 25 each), the fast machines finish early and wait. The slow machine becomes a bottleneck. You've paid for four machines but only fully utilized one.

Benchmarks solve this. They measure actual performance on each node, so work can be distributed in proportion to each node's speed.
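To make the cost of an equal split concrete, here is a minimal sketch that compares the finish time of an equal split with a proportional one. It is not WitEngine code: the `SplitComparison` class and its methods are hypothetical, the rates are made-up tasks-per-second figures, and fractional task counts are allowed for simplicity.

```csharp
using System;
using System.Linq;

public static class SplitComparison
{
    // Time until the slowest node finishes: max over nodes of (tasks / rate).
    public static double Makespan(double[] tasksPerNode, double[] ratesPerSecond) =>
        tasksPerNode.Zip(ratesPerSecond, (tasks, rate) => tasks / rate).Max();

    // Equal split: every node gets the same number of tasks.
    public static double[] EqualSplit(int totalTasks, int nodeCount) =>
        Enumerable.Repeat((double)totalTasks / nodeCount, nodeCount).ToArray();

    // Proportional split: each node's share matches its share of total speed.
    public static double[] ProportionalSplit(int totalTasks, double[] ratesPerSecond)
    {
        double totalRate = ratesPerSecond.Sum();
        return ratesPerSecond.Select(r => totalTasks * r / totalRate).ToArray();
    }
}
```

With 100 tasks and assumed rates of 4, 2, 1, and 1 tasks per second, the equal split takes 25 seconds (the slow nodes each grind through 25 tasks), while the proportional split takes 12.5 seconds on every node.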


The IWitBenchmarkAdapter Interface

To enable intelligent distribution, your adapter implements IWitBenchmarkAdapter:

csharp
public interface IWitBenchmarkAdapter
{
    /// <summary>
    /// Measures how fast this node can execute the activity.
    /// </summary>
    Task<IWitBenchmarkResult> RunBenchmark(
        IWitBenchmarkOptions? options,
        CancellationToken cancellationToken);
    
    /// <summary>
    /// Estimates relative work for a specific task.
    /// </summary>
    double EstimateWork(
        IWitActivity activity,
        IWitVariablesCollection pool);
}

Two methods, two purposes:

  • RunBenchmark — "How fast is this node?" (runs once per node, cached)
  • EstimateWork — "How big is this task?" (runs for each task)

Together, they enable optimal task allocation.
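One way the two numbers can be combined is sketched below: a task's expected duration on a node is its work estimate divided by the node's rate, and each task goes to the node that would finish it earliest. This is an illustrative heuristic only, not a description of WitEngine's actual scheduler; `GreedyAllocator` and its parameters are hypothetical.

```csharp
using System;
using System.Linq;

public static class GreedyAllocator
{
    // Assigns each task to the node that would finish it earliest,
    // given the work already assigned. Returns the node index per task.
    public static int[] Assign(double[] taskWork, double[] nodeRates)
    {
        var busyUntil = new double[nodeRates.Length];
        var assignment = new int[taskWork.Length];

        // Place large tasks first (longest-processing-time heuristic).
        var order = Enumerable.Range(0, taskWork.Length)
                              .OrderByDescending(i => taskWork[i])
                              .ToList();

        foreach (int task in order)
        {
            int best = 0;
            double bestFinish = double.MaxValue;
            for (int node = 0; node < nodeRates.Length; node++)
            {
                // Expected duration = estimated work / benchmark rate.
                double finish = busyUntil[node] + taskWork[task] / nodeRates[node];
                if (finish < bestFinish) { best = node; bestFinish = finish; }
            }
            busyUntil[best] = bestFinish;
            assignment[task] = best;
        }
        return assignment;
    }
}
```

For work estimates of 4, 1, and 1 on nodes with rates 2 and 1, the large task lands on the fast node and both small tasks land on the slow one, so both nodes finish at the same time.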


Implementing RunBenchmark

RunBenchmark measures node performance by running a representative workload:

csharp
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

public class WitActivityAdapterMyTransform 
    : WitActivityAdapterTransform<WitActivityMyTransform>,
      IWitBenchmarkAdapter
{
    // Not marked async: the benchmark is synchronous, CPU-bound work,
    // so the result is wrapped in an already-completed task.
    public Task<IWitBenchmarkResult> RunBenchmark(
        IWitBenchmarkOptions? options,
        CancellationToken cancellationToken)
    {
        options ??= WitBenchmarkOptions.Default;
        
        // 1. Create representative test data (outside the timed section)
        var testData = CreateBenchmarkData();
        
        // 2. Warmup phase - JIT compilation, cache warming
        for (int i = 0; i < options.WarmupIterations; i++)
        {
            cancellationToken.ThrowIfCancellationRequested();
            ProcessItem(testData); // the adapter's own per-item work
        }
        
        // 3. Measurement phase - count operations until time limit
        var stopwatch = Stopwatch.StartNew();
        long operations = 0;
        
        while (stopwatch.Elapsed < options.MinDuration)
        {
            cancellationToken.ThrowIfCancellationRequested();
            ProcessItem(testData);
            operations++;
        }
        
        stopwatch.Stop();
        
        // 4. Calculate rate (operations per second)
        double rate = operations / stopwatch.Elapsed.TotalSeconds;
        
        return Task.FromResult<IWitBenchmarkResult>(new WitBenchmarkResult(rate));
    }
}

Benchmark Best Practices

Practice                    Why It Matters
──────────────────────────────────────────────────────────────────────────────────
Use representative data     Benchmark should reflect real workload characteristics
Warmup first                JIT compilation makes first runs slower; exclude them
Run for sufficient time     Short runs have high variance; aim for 500ms+
Check cancellation          Allow graceful termination
Return operations/second    Higher number = faster node

What Makes Good Test Data?

Your benchmark data should be:

  • Representative: similar size and complexity to real tasks
  • Deterministic: same data every time for consistent results
  • Fast to create: built once, outside the timed loop, so setup does not count as benchmark time

csharp
private MyData CreateBenchmarkData()
{
    // Create a typical-sized item
    // Not the smallest, not the largest - representative
    return new MyData
    {
        Values = Enumerable.Range(0, 1000).Select(i => i * 0.1).ToArray(),
        Metadata = "benchmark-item"
    };
}
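If the real workload looks more like noisy data than a regular ramp, a fixed seed keeps the benchmark input deterministic. The sketch below assumes a plain `double[]` payload; `BenchmarkData` and `CreateValues` are hypothetical names, not part of WitEngine.

```csharp
using System;

public static class BenchmarkData
{
    // A fixed seed makes the pseudo-random sequence repeatable, so every
    // benchmark run on a given runtime measures the same input.
    private const int Seed = 12345;

    public static double[] CreateValues(int count)
    {
        var random = new Random(Seed);
        var values = new double[count];
        for (int i = 0; i < count; i++)
            values[i] = random.NextDouble();
        return values;
    }
}
```

Note that .NET does not guarantee the same seeded sequence across different runtime versions, so nodes on different runtimes may see different (but statistically similar) data. If bit-identical input matters, a formula-based generator like the one above is the safer choice.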

Implementing EstimateWork

While RunBenchmark measures node speed, EstimateWork measures task size. This handles cases where tasks have different complexity.

Constant-Time Operations

If every task takes the same time regardless of input:

csharp
public double EstimateWork(IWitActivity activity, IWitVariablesCollection pool)
{
    // Temperature conversion, simple calculations, etc.
    return 1.0;
}

Linear Operations (O(n))

If time scales with data size:

csharp
public double EstimateWork(IWitActivity activity, IWitVariablesCollection pool)
{
    var myActivity = (WitActivityProcessData)activity;
    
    if (pool.TryGetValue(myActivity.Data, out DataCollection? data))
    {
        return data?.Count ?? 1.0;
    }
    
    return 1.0;
}

Quadratic Operations (O(n²))

If time scales with the square of data size:

csharp
public double EstimateWork(IWitActivity activity, IWitVariablesCollection pool)
{
    var myActivity = (WitActivityMatrixOp)activity;
    
    if (pool.TryGetValue(myActivity.Matrix, out Matrix? matrix))
    {
        double size = matrix?.RowCount ?? 1;
        return size * size;
    }
    
    return 1.0;
}

Complex Estimation

For operations with multiple factors:

csharp
public double EstimateWork(IWitActivity activity, IWitVariablesCollection pool)
{
    var renderTask = (WitActivityRenderFrame)activity;
    
    if (pool.TryGetValue(renderTask.Task, out RenderTaskData? task))
    {
        // Resolution affects time
        // Cast before multiplying so very large frames cannot overflow int
        double pixels = (double)(task?.Width ?? 1920) * (task?.Height ?? 1080);
        
        // Samples affect time linearly
        double samples = task?.Samples ?? 128;
        
        // Combine factors
        return pixels * samples / 1_000_000; // Normalize to reasonable range
    }
    
    return 1.0;
}
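A quick worked example of the formula above, pulled out into a standalone helper so the arithmetic is easy to check. `RenderEstimate` is a hypothetical name used only for this illustration.

```csharp
public static class RenderEstimate
{
    // Same formula as above: pixels x samples, scaled down by one million.
    public static double Estimate(int width, int height, int samples) =>
        (double)width * height * samples / 1_000_000;
}
```

A 1920×1080 frame at 128 samples yields 265.4208, and a 3840×2160 frame at the same sample count yields 1061.6832, exactly four times as much. Only these ratios matter to the scheduler's comparison between tasks, so the divisor is a readability choice rather than a tuning parameter.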

Grid.ForEach: Distributed Iteration

The Grid controller provides Grid.ForEach—the primary way to distribute work:

ResultCollection:results = Grid.ForEach(item in items, opts) => Transform(item);

This single line:

  1. Takes a collection of items
  2. Distributes them across available nodes
  3. Applies your transform to each item
  4. Collects and returns all results

Processing Strategies

WitEngine offers two distribution strategies:

Balanced — Pre-allocates tasks based on benchmarks:

ProcessingOptions:opts = ProcessingOptions.Create("Balanced");

~ All tasks assigned upfront based on node speed ~
~ Node A (fast): gets 60% of tasks ~
~ Node B (slow): gets 40% of tasks ~

Queued — Nodes pull tasks from a central queue:

ProcessingOptions:opts = ProcessingOptions.Create("Queued");

~ Tasks go into a queue ~
~ Each node pulls next task when ready ~
~ Fast nodes naturally get more tasks ~
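Why a pull-based queue favors fast nodes without any benchmark data can be seen in a small simulation. This is a simplified model, not WitEngine's queue: it assumes equal-sized tasks and no network latency, and `QueuedSimulation` is a hypothetical name.

```csharp
using System;

public static class QueuedSimulation
{
    // Simulates a pull-based queue of equal-sized tasks.
    // nodeRates: tasks per second for each node.
    // Returns the number of tasks each node ends up processing.
    public static int[] Run(double[] nodeRates, int taskCount)
    {
        var freeAt = new double[nodeRates.Length]; // time each node becomes idle
        var completed = new int[nodeRates.Length];

        for (int task = 0; task < taskCount; task++)
        {
            // The next task goes to whichever node becomes idle first
            // (ties go to the lowest index).
            int next = 0;
            for (int node = 1; node < nodeRates.Length; node++)
                if (freeAt[node] < freeAt[next]) next = node;

            freeAt[next] += 1.0 / nodeRates[next];
            completed[next]++;
        }
        return completed;
    }
}
```

With two nodes at 2 and 1 tasks per second and 9 tasks in the queue, the fast node processes 6 tasks and the slow node 3, matching their 2:1 speed ratio.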

When to Use Which

Your workload...                       Use
─────────────────────────────────────────────────
All tasks take similar time            Balanced
Task duration varies                   Queued
Task complexity is known in advance    Balanced
Task complexity is unpredictable       Queued
Nodes are stable                       Balanced
Nodes may disconnect                   Queued

Default recommendation: Start with Queued. It's more forgiving of variability and requires less tuning.


Transform Activities

For Grid.ForEach to work, your activity must be a Transform—an operation that takes input and produces output, designed for parallel execution.

Transform vs. Function

csharp
// FUNCTION - runs on host, not distributed
[Activity("Calculate")]
public class WitActivityCalculate : WitActivityFunction { }

// TRANSFORM - can run on any node, distributable  
[Activity("ProcessBatch")]
public class WitActivityProcessBatch : WitActivityTransform { }

Use WitActivityTransform base class for activities that will be used in Grid.ForEach.

Transform Adapter

csharp
public class WitActivityAdapterProcessBatch 
    : WitActivityAdapterTransform<WitActivityProcessBatch>,  // Note: Transform base
      IWitBenchmarkAdapter
{
    // ProcessInner runs on worker nodes
    protected override async Task<object?> ProcessInner(
        WitActivityProcessBatch activity,
        IWitVariablesCollection pool,
        IWitActivityStatus? activityStatus,
        WitProcessingStatus status)
    {
        // This code executes on remote nodes!
        // Only serialized data is available
        
        if (!pool.TryGetValue(activity.Input, out MyData? input))
            throw new InvalidOperationException("Failed to get input");

        return Process(input);
    }
    
    // Benchmark implementation
    public async Task<IWitBenchmarkResult> RunBenchmark(...) { ... }
    public double EstimateWork(...) { ... }
}

Testing Distributed Logic Locally

The SDK runs on one node, but you can still test distributed patterns:

Test Grid.ForEach

csharp
[Test]
public async Task GridForEachProcessesAllItems()
{
    var job = WitEngineSdk.Instance.Compile(@"
        Job:Test()
        {
            IntCollection:items = [1, 2, 3, 4, 5];
            ProcessingOptions:opts = ProcessingOptions.Create(""Balanced"");
            
            IntCollection:results = 
                Grid.ForEach(item in items, opts) => MyMath.Square(item);
            
            Return(results);
        }
    ");

    var status = await WitEngineSdk.Instance.ScheduleAndWaitAsync(job);
    
    Assert.That(status.Result, Is.EqualTo(WitProcessingResult.Completed));
    
    var results = status.ReturnedValues.First() as IEnumerable<int>;
    CollectionAssert.AreEquivalent(new[] { 1, 4, 9, 16, 25 }, results);
}

Test Serialization Round-Trip

Distributed execution serializes activities. Test that yours survives:

csharp
[Test]
public void ActivitySurvivesSerializationRoundTrip()
{
    var original = new WitActivityMyTransform
    {
        InputValue = new WitParameterVariable("x")
    };
    
    // Serialize
    var bytes = MemoryPackSerializer.Serialize(original);
    
    // Deserialize
    var restored = MemoryPackSerializer.Deserialize<WitActivityMyTransform>(bytes);
    
    // Verify
    Assert.That(original.Is(restored), Is.True);
}

Test Stateless Behavior

Adapters must be stateless—they're recreated on each node:

csharp
// WRONG - adapter has state
public class BadAdapter : WitActivityAdapterTransform<MyActivity>
{
    private int _counter = 0;  // State lost on transfer!
    
    protected override async Task<object?> ProcessInner(...)
    {
        _counter++;  // This won't work distributed
        return _counter;
    }
}

// RIGHT - stateless adapter
public class GoodAdapter : WitActivityAdapterTransform<MyActivity>
{
    protected override async Task<object?> ProcessInner(...)
    {
        // All data comes from activity and pool
        // No adapter state
    }
}
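The failure mode is easy to reproduce without any WitEngine types. In the sketch below, two instances stand in for adapters created on two different nodes; `StatefulNumberer` and `StatelessNumberer` are hypothetical names for illustration.

```csharp
using System;

// Stateful: the result depends on how many calls this instance has seen.
public sealed class StatefulNumberer
{
    private int _counter;
    public int Process(int itemIndex) => ++_counter;
}

// Stateless: the result depends only on the input.
public static class StatelessNumberer
{
    public static int Process(int itemIndex) => itemIndex + 1;
}
```

If item 0 runs on one instance and item 1 on another, the stateful version returns 1 for both, because each instance starts counting from zero. The stateless version returns 1 and 2 no matter where each item runs.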

WitCloud Deployment Checklist

Before deploying to WitCloud or OmnibusCloud, verify:

Serialization

Check                                                   Status
──────────────────────────────────────────────────────────────
Activity class has [MemoryPackable]
Activity class is partial
All properties are serializable types
Interface properties have [MemoryPackAllowSerialize]
Custom data types have [MemoryPackable]
Serialization round-trip test passes

Distribution

Check                                                   Status
──────────────────────────────────────────────────────────────
Transform activities use WitActivityTransform base
Adapters are stateless
Adapter implements IWitBenchmarkAdapter
RunBenchmark returns meaningful rate
EstimateWork reflects actual complexity

Controller

Check                                                   Status
──────────────────────────────────────────────────────────────
Controller implements IWitControllerHost
Controller implements IWitControllerNode
All activities registered in Initialize()

Deployment

Check                                                   Status
──────────────────────────────────────────────────────────────
Controller DLL built for target runtime
All dependencies included
No hardcoded paths (use relative or injected)
Error handling includes context

What Happens on Real Clusters

When your plugin runs on WitCloud or OmnibusCloud:

1. Controller Distribution

Your controller DLL is distributed to all nodes:

Host Machine              Worker Node 1           Worker Node 2
@Controllers/             @Controllers/           @Controllers/
├── MyPlugin.dll          ├── MyPlugin.dll        ├── MyPlugin.dll
└── Variables.dll         └── Variables.dll       └── Variables.dll

2. Benchmark Execution

Each node runs your benchmark:

Node A runs RunBenchmark() → 1500 ops/sec
Node B runs RunBenchmark() → 800 ops/sec
Node C runs RunBenchmark() → 2200 ops/sec

3. Task Allocation

The host uses benchmarks + work estimates to allocate:

1000 tasks total
Node A (1500 ops/sec): 333 tasks
Node B (800 ops/sec): 178 tasks  
Node C (2200 ops/sec): 489 tasks
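The figures above can be reproduced with a proportional split plus a rounding step. The sketch below assumes every task has the same work estimate and uses largest-remainder rounding so the counts add up to the total. The source does not say which rounding rule WitEngine uses, so treat that choice, and the `ProportionalAllocator` name, as assumptions.

```csharp
using System;
using System.Linq;

public static class ProportionalAllocator
{
    // Splits totalTasks across nodes in proportion to their benchmark rates,
    // using largest-remainder rounding so the counts sum to totalTasks.
    public static int[] Allocate(int totalTasks, double[] rates)
    {
        double totalRate = rates.Sum();
        double[] exact = rates.Select(r => totalTasks * r / totalRate).ToArray();
        int[] counts = exact.Select(x => (int)Math.Floor(x)).ToArray();

        // Hand the leftover tasks to the nodes with the largest fractional parts.
        int leftover = totalTasks - counts.Sum();
        var byRemainder = Enumerable.Range(0, rates.Length)
                                    .OrderByDescending(i => exact[i] - counts[i])
                                    .Take(leftover)
                                    .ToList();
        foreach (int i in byRemainder) counts[i]++;
        return counts;
    }
}
```

For 1000 tasks and rates of 1500, 800, and 2200 ops/sec, the exact shares are about 333.3, 177.8, and 488.9. Flooring gives 998 tasks, and the two leftovers go to Node C and Node B, producing 333, 178, and 489.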

4. Distributed Execution

Tasks are serialized, sent to nodes, executed, results returned:

Host                           Nodes
  │                              │
  ├── Serialize Task 1 ────────► Node A: Execute, return result
  ├── Serialize Task 2 ────────► Node B: Execute, return result
  ├── Serialize Task 3 ────────► Node C: Execute, return result
  │                              │
  ◄── Collect results ───────────┘

5. Result Aggregation

Results stream back and are collected into the output collection.


From Smart TVs to Servers

Here's the payoff of all this work: extreme heterogeneity.

WitCloud and OmnibusCloud can include:

Device Type               Benchmark Score
─────────────────────────────────────────
Dedicated server          ~3000 ops/sec
Gaming PC                 ~1500 ops/sec
Office workstation        ~800 ops/sec
Laptop                    ~400 ops/sec
Smart TV                  ~50 ops/sec
Mobile device             ~30 ops/sec

A 100× performance spread! But with proper benchmarking:

  • The server gets 60× as many tasks as the smart TV, and 100× as many as the mobile device
  • The smart TV still contributes useful work
  • All devices finish at approximately the same time
  • No wasted capacity anywhere
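
Why proportional allocation lets every device finish together follows from a short derivation. It assumes equal-sized tasks, fractional task counts, and no network or scheduling overhead; the symbols are introduced here for illustration: N is the total number of tasks, r_i is the benchmark rate of device i, n_i is the number of tasks it receives, and t_i is its finish time.

```latex
n_i = N \cdot \frac{r_i}{\sum_j r_j}
\qquad\Longrightarrow\qquad
t_i = \frac{n_i}{r_i} = \frac{N}{\sum_j r_j}
```

The finish time does not depend on i, so every device finishes at the same moment. With the rates in the table above (5780 ops/sec combined) and a hypothetical job of 57,800 tasks, each device works for about 10 seconds: the server handles 30,000 tasks, the smart TV 500, and the mobile device 300.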

This is why benchmarking isn't optional for distributed plugins—it's what makes heterogeneous computing possible.


Summary

To make your plugin distribution-ready:

Component                   What It Does
───────────────────────────────────────────────────────────────
IWitBenchmarkAdapter        Enables intelligent load balancing
RunBenchmark                Measures node speed
EstimateWork                Measures task complexity
WitActivityTransform        Marks activities as distributable
Stateless adapters          Ensures clean execution on any node
MemoryPack serialization    Enables data transfer between nodes

The SDK lets you develop and test locally. Benchmarks and proper architecture ensure your plugin works at scale—whether that's a 10-node office cluster or a global OmnibusCloud mesh with everything from servers to smart TVs.