AI/ML Batch Processing with WitEngine

Machine learning workflows demand massive computational resources. Training pipelines preprocess terabytes of data. Inference systems classify millions of samples daily. Evaluation runs span thousands of hyperparameter combinations. When a single machine takes days, distribution becomes essential.

WitEngine transforms these workflows from bottlenecks into streamlined operations. Instead of managing infrastructure, you write simple scripts that express what you want computed—WitEngine handles where and how.


Scenario 1: Batch Inference at Scale

The challenge: A production system needs to classify 10 million images using a pre-trained ResNet model. Single-threaded processing is estimated to take 5+ hours, but users expect results within minutes.

How WitEngine helps:

WitEngine's orchestration layer breaks the dataset into batches and distributes them across available nodes. The Grid.ForEach primitive handles task assignment automatically:

~ Load model configuration ~
String:modelPath = "/models/resnet50.onnx";
Int:batchSize = 32;
Double:confidenceThreshold = 0.8;

~ Prepare inference tasks ~
StringCollection:imageFiles = File.ListFiles("/data/images", "*.jpg");
ImageBatchCollection:batches = Image.CreateBatches(imageFiles, batchSize);

~ Configure distribution ~
ProcessingOptions:opts = ProcessingOptions.Create("Queued");
opts = ProcessingOptions.SetRequirement(opts, "GPU", "true");
opts = ProcessingOptions.SetRequirement(opts, "VRAM", "8GB");

~ Distribute inference ~
ClassificationResultCollection:results = 
    Grid.ForEach(batch in batches, opts) => ML.ClassifyBatch(modelPath, batch, confidenceThreshold);

Key mechanisms:

Mechanism           Benefit
------------------  ---------------------------------------------------------------------------------
Queued strategy     Nodes pull tasks as they finish, which handles variable inference times gracefully
GPU requirements    Only nodes with sufficient VRAM receive tasks
Model caching       Model loads once per node, shared across all batches
Automatic batching  Optimal batch sizes maximize GPU utilization

Result: 10 million images processed in ~15 minutes across 20 GPU nodes instead of 5+ hours on one machine.
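
The pull-based behavior of the Queued strategy can be pictured with a small sketch. It is written in Python rather than WitEngine script and is not WitEngine's implementation: `run_queued` and `make_batches` are hypothetical helpers, and threads stand in for nodes. Each worker takes the next batch as soon as it is free, so a slow batch on one worker never holds up the others.

```python
import queue
import threading


def make_batches(items, batch_size):
    """Split a list into consecutive batches of at most batch_size items."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def run_queued(batches, worker_names, process):
    """Pull-based scheduling: each worker takes the next batch when it is free."""
    pending = queue.Queue()
    for batch in batches:
        pending.put(batch)

    results = []                 # (worker name, output) pairs
    lock = threading.Lock()

    def worker(name):
        while True:
            try:
                batch = pending.get_nowait()
            except queue.Empty:
                return           # queue drained, worker exits
            output = process(batch)
            with lock:
                results.append((name, output))

    threads = [threading.Thread(target=worker, args=(n,)) for n in worker_names]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


# Stand-in workload: "processing" a batch just counts its items.
images = [f"img_{i}.jpg" for i in range(100)]
batches = make_batches(images, 32)
results = run_queued(batches, ["node-a", "node-b", "node-c"], process=len)
```

Which worker handles which batch depends on timing, but every batch is processed exactly once.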


Scenario 2: Embedding Generation

The challenge: A semantic search system requires embeddings for 50 million text documents. Each document passes through a transformer model. Processing sequentially would take weeks.

How WitEngine helps:

Embedding generation is embarrassingly parallel: each document is processed independently of the others. WitEngine exploits this by distributing chunks across heterogeneous hardware:

~ Configure embedding model ~
String:modelPath = "/models/sentence-transformer";
Int:chunkSize = 10000;
String:device = "cuda";

~ Load documents in chunks ~
DocumentChunkCollection:chunks = Document.LoadChunks("/data/corpus", chunkSize);

~ Create embedding tasks ~
EmbeddingTaskCollection:tasks = [];

ForEach(chunk in chunks)
{
    EmbeddingTask:task = EmbeddingTask.Create(chunk, modelPath);
    task = EmbeddingTask.SetDevice(task, device);
    task = EmbeddingTask.SetNormalize(task, true);
    tasks = Collection.Add(tasks, task);
}

~ Distribute across mixed hardware ~
ProcessingOptions:opts = ProcessingOptions.Create("Balanced");

~ Some nodes have A100s, others have consumer GPUs ~
EmbeddingResultCollection:results = 
    Grid.ForEach(task in tasks, opts) => ML.GenerateEmbeddings(task);

Key mechanisms:

Mechanism                   Benefit
--------------------------  --------------------------------------------------------------------------------
Heterogeneous node support  A100 nodes get more chunks than RTX 3070 nodes, proportional to benchmarked speed
Balanced strategy           Pre-assigns work based on node performance, minimizing idle time
Chunk-based processing      Memory-efficient: no single node loads the entire corpus
Normalized outputs          Consistent embedding format regardless of which node processed them

Result: 50 million documents embedded in hours instead of weeks. Faster nodes automatically receive proportionally more work.
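
One way to picture proportional pre-assignment is the sketch below. It is Python, not WitEngine script, and it only illustrates the idea of splitting work by benchmark score; the function `allocate_proportionally`, the node names, and the scores are all hypothetical. It uses the largest-remainder method so that the per-node shares are whole numbers that add up to the total.

```python
def allocate_proportionally(total_chunks, benchmark_scores):
    """Split total_chunks across nodes in proportion to their benchmark scores."""
    total_score = sum(benchmark_scores.values())
    exact = {n: total_chunks * s / total_score for n, s in benchmark_scores.items()}
    shares = {n: int(x) for n, x in exact.items()}
    leftover = total_chunks - sum(shares.values())
    # Hand the remaining chunks to the nodes with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda n: exact[n] - shares[n], reverse=True)
    for node in by_remainder[:leftover]:
        shares[node] += 1
    return shares


# Hypothetical relative speeds; real values would come from benchmarking.
scores = {"a100-1": 4.0, "a100-2": 4.0, "rtx3070-1": 1.0, "rtx3070-2": 1.0}

# 50 million documents at 10,000 per chunk gives 5,000 chunks.
shares = allocate_proportionally(5000, scores)
```

With these assumed scores, each A100 node receives four times as many chunks as each RTX 3070 node.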


Scenario 3: Synthetic Data Generation

The challenge: Training a computer vision model requires 1 million synthetic images with controlled variations—lighting, angles, backgrounds, occlusions. Rendering each image takes 2-10 seconds depending on scene complexity.

How WitEngine helps:

Synthetic data generation has highly variable task durations. WitEngine's queued strategy handles this naturally—fast tasks don't block on slow ones:

~ Define generation parameters ~
Int:totalImages = 1000000;
Int:imagesPerTask = 100;

~ Create generation tasks with varying complexity ~
SyntheticTaskCollection:tasks = [];

Loop(Int.Divide(totalImages, imagesPerTask))
{
    SyntheticTask:task = SyntheticTask.Create();
    task = SyntheticTask.SetOutputSize(task, 512, 512);
    task = SyntheticTask.SetVariations(task, {
        "lighting": ["studio", "natural", "dramatic"],
        "angle": [-30, 0, 30, 45],
        "background": "random",
        "occlusion": [0.0, 0.1, 0.2]
    });
    task = SyntheticTask.SetSeed(task, Random.Int());  ~ Per-task seed, kept with the task so the batch can be replayed ~
    tasks = Collection.Add(tasks, task);
}

~ Distribute generation ~
ProcessingOptions:opts = ProcessingOptions.Create("Queued");
opts = ProcessingOptions.SetRequirement(opts, "GPU", "true");

SyntheticResultCollection:results = 
    Grid.ForEach(task in tasks, opts) => Synthetic.GenerateBatch(task);

Key mechanisms:

Mechanism                Benefit
-----------------------  ---------------------------------------------------------
Pull-based distribution  Nodes request work when ready, so there is no coordinator bottleneck
Seeded generation        Every batch is reproducible from its task's seed value
Variation control        Systematic coverage of the parameter space
Fault tolerance          If a node fails, its incomplete tasks return to the queue

Result: 1 million images generated in under 3 hours across 30 render nodes, with full reproducibility: any batch can be regenerated by replaying its task's seed.
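
Seeded reproducibility can be sketched as follows. This is Python, not WitEngine script, and `plan_batch` is a hypothetical helper rather than anything WitEngine provides. It assumes that all randomness for a task flows from one private generator created from the task's seed, which is what makes a replay produce identical settings. The "background" variation is left out for brevity.

```python
import random

VARIATIONS = {
    "lighting": ["studio", "natural", "dramatic"],
    "angle": [-30, 0, 30, 45],
    "occlusion": [0.0, 0.1, 0.2],
}


def plan_batch(seed, images_per_task):
    """Derive the variation settings for every image in a task from its seed."""
    rng = random.Random(seed)    # private generator, no shared global state
    return [
        {name: rng.choice(options) for name, options in VARIATIONS.items()}
        for _ in range(images_per_task)
    ]


first = plan_batch(seed=12345, images_per_task=100)
replay = plan_batch(seed=12345, images_per_task=100)   # same seed, same plan
```

Because the generator is local to the task, the result does not depend on which node runs it or on what other tasks ran before it.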


What You Get

Throughput

WitEngine maximizes hardware utilization through intelligent scheduling:

Workload                      Single machine  WitEngine (20 nodes)  Speedup
----------------------------  --------------  --------------------  -------
Preprocess 1M images          8 hours         24 minutes            20×
Inference on 10M samples      5 hours         15 minutes            20×
100 hyperparameter configs    4 days          5 hours               19×
Generate 1M synthetic images  60 hours        3 hours               20×

The benchmark system measures actual node performance for your specific workloads. Fast nodes get more work. Slow nodes aren't wasted—they contribute what they can.
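
The speedup column follows directly from the two time columns, and dividing by the node count gives the parallel efficiency. The short Python sketch below reproduces that arithmetic from the table above; the helper name `speedup_and_efficiency` is only illustrative.

```python
NODES = 20

# (single-machine hours, 20-node hours) taken from the table above
workloads = {
    "Preprocess 1M images": (8.0, 24 / 60),
    "Inference on 10M samples": (5.0, 15 / 60),
    "100 hyperparameter configs": (4 * 24.0, 5.0),
    "Generate 1M synthetic images": (60.0, 3.0),
}


def speedup_and_efficiency(single_hours, cluster_hours, nodes):
    """Return (speedup, parallel efficiency) for one workload."""
    speedup = single_hours / cluster_hours
    return speedup, speedup / nodes


report = {
    name: speedup_and_efficiency(single, cluster, NODES)
    for name, (single, cluster) in workloads.items()
}
for name, (speedup, efficiency) in report.items():
    print(f"{name}: {speedup:.1f}x speedup, {efficiency:.0%} efficiency")
```

A speedup of 20× on 20 nodes corresponds to 100% efficiency; the hyperparameter row works out to 19.2×, or 96%.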

Cost Predictability

Infrastructure costs stay fixed while workload size varies:

  • Pre-partitioned work — Know task counts before execution starts
  • Resource requirements — Tasks only run on capable nodes
  • No over-provisioning — Mixed hardware contributes proportionally
  • Batch accounting — Track compute time per task for chargeback

Reproducibility

Every execution can be replayed:

  • Deterministic task assignment — Same inputs produce same distribution
  • Seeded randomness — Stochastic processes controlled by explicit seeds
  • Version-locked models — Model paths explicit in scripts
  • Immutable configurations — Parameters captured at execution time

Debugging a failed batch? Replay the exact task on a single node with full tracing enabled.

Observability

Real-time visibility into distributed execution:

~ Enable progress tracking ~
opts = ProcessingOptions.SetProgressReporting(opts, true);
opts = ProcessingOptions.SetProgressInterval(opts, 10);

~ Progress callback ~
Grid.OnProgress(progress =>
{
    Trace("Progress:", String.Format("{0:F1}%", Double.Multiply(progress.Percent, 100)));
    Trace("Tasks completed:", progress.CompletedTasks, "/", progress.TotalTasks);
    Trace("Active nodes:", progress.ActiveNodes);
    
    If(progress.StalledTasks > 0)
    {
        ~ endpoint is assumed to hold the alert destination, defined earlier in the script ~
        Alert.Send(endpoint, "StalledTasks", {"count": progress.StalledTasks});
    }
});

Track at every level:

Level   Visibility
------  -------------------------------------------------------------
Job     Overall progress, wall-clock time, success/failure counts
Task    Individual status, node assignment, processing duration
Node    Current load, completed tasks, error rate
Sample  Per-item results, confidence scores, processing metadata
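
Job-level progress figures such as percent complete and a completion estimate can be derived from a few counters. The Python sketch below shows one simple way to do it; the `Progress` class and its fields are hypothetical and do not describe WitEngine's progress object. The estimate assumes tasks keep completing at the average rate observed so far.

```python
from dataclasses import dataclass


@dataclass
class Progress:
    completed_tasks: int
    total_tasks: int
    elapsed_seconds: float

    @property
    def fraction(self):
        """Share of tasks finished, between 0.0 and 1.0."""
        return self.completed_tasks / self.total_tasks if self.total_tasks else 0.0

    @property
    def eta_seconds(self):
        """Remaining time at the average completion rate so far, or None if unknown."""
        if self.completed_tasks == 0 or self.elapsed_seconds <= 0:
            return None          # no rate information yet
        rate = self.completed_tasks / self.elapsed_seconds
        return (self.total_tasks - self.completed_tasks) / rate


snapshot = Progress(completed_tasks=2500, total_tasks=10000, elapsed_seconds=300.0)
print(f"{snapshot.fraction:.1%} done, about {snapshot.eta_seconds / 60:.0f} min remaining")
```

An average-rate estimate is crude when task durations vary widely, so a production monitor would likely smooth it over a recent window.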

Typical Workflow

1. Define your data pipeline

Identify independent units of work (batches, chunks, or individual samples) that can be processed in parallel.

2. Create task collections

Build a collection of task objects, each containing the inputs and parameters for one unit of work:

InferenceTaskCollection:tasks = [];

ForEach(batch in batches)
{
    InferenceTask:task = InferenceTask.Create(modelPath, batch);
    tasks = Collection.Add(tasks, task);
}

3. Configure distribution strategy

Choose based on your workload characteristics:

Strategy  Use when
--------  ------------------------------------------
Balanced  Task durations are predictable and similar
Queued    Task durations vary significantly
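
If it is unclear which description fits a workload, timing a small sample of tasks can help. The Python sketch below is one possible heuristic, not a WitEngine feature: `choose_strategy` is a hypothetical helper, and the 0.25 cut-off on the coefficient of variation (standard deviation divided by mean) is an arbitrary assumption to tune per workload.

```python
import statistics


def choose_strategy(sample_durations, cv_threshold=0.25):
    """Pick "Balanced" for uniform task durations, "Queued" for variable ones."""
    mean = statistics.mean(sample_durations)
    cv = statistics.pstdev(sample_durations) / mean
    return "Queued" if cv > cv_threshold else "Balanced"


# Durations in seconds from a hypothetical trial run.
uniform = choose_strategy([1.0, 1.1, 0.9, 1.0])     # fixed-size inference batches
variable = choose_strategy([2.0, 10.0, 3.0, 8.0])   # scene-dependent render times
```

Here the first sample has a coefficient of variation of about 0.07 and the second about 0.58, so they fall on opposite sides of the assumed threshold.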

4. Specify node requirements

Declare what hardware your tasks need:

ProcessingOptions:opts = ProcessingOptions.Create("Queued");
opts = ProcessingOptions.SetRequirement(opts, "GPU", "true");
opts = ProcessingOptions.SetRequirement(opts, "VRAM", "8GB");
opts = ProcessingOptions.SetRequirement(opts, "CUDA", "11.0+");
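
Matching tasks to capable nodes amounts to checking each declared requirement against what a node reports. The Python sketch below illustrates that check for the three requirements shown; it is an assumption about how such matching could work, not WitEngine's matcher, and the node dictionaries and helper names are hypothetical.

```python
def parse_gb(text):
    """Convert a string such as "8GB" to a number of gigabytes."""
    return float(text.upper().replace("GB", ""))


def parse_version(text):
    """Convert "11.0+" or "12.2" to a comparable tuple such as (11, 0)."""
    return tuple(int(part) for part in text.rstrip("+").split("."))


def node_satisfies(node, requirements):
    """Return True if the node meets every declared requirement."""
    if requirements.get("GPU") == "true" and not node.get("gpu", False):
        return False
    if "VRAM" in requirements and node.get("vram_gb", 0) < parse_gb(requirements["VRAM"]):
        return False
    if "CUDA" in requirements:
        if parse_version(node.get("cuda", "0.0")) < parse_version(requirements["CUDA"]):
            return False
    return True


requirements = {"GPU": "true", "VRAM": "8GB", "CUDA": "11.0+"}
nodes = [
    {"name": "ws-01", "gpu": True, "vram_gb": 24, "cuda": "12.2"},
    {"name": "ws-02", "gpu": True, "vram_gb": 6, "cuda": "11.8"},   # too little VRAM
    {"name": "ws-03", "gpu": False},                                 # no GPU
    {"name": "ws-04", "gpu": True, "vram_gb": 8, "cuda": "10.2"},   # CUDA too old
]
eligible = [n["name"] for n in nodes if node_satisfies(n, requirements)]
```

Only `ws-01` passes all three checks in this example.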

5. Execute and aggregate

Distribute tasks and collect results:

ResultCollection:results = 
    Grid.ForEach(task in tasks, opts) => ML.ProcessBatch(task);

6. Analyze and report

Process results, compute statistics, generate outputs:

Int:successCount = 0;
Double:totalTime = 0;

ForEach(result in results)
{
    If(result.Success == true)
    {
        successCount = Int.Add(successCount, 1);
        totalTime = Double.Add(totalTime, result.ProcessingTime);
    }
}

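Trace("Completed:", successCount, "/", Collection.Count(tasks));
Trace("Compute time (successful tasks):", totalTime);

~ totalProcessed and wallClockTime are assumed to be measured by the surrounding script ~
Trace("Throughput:", Double.Divide(totalProcessed, wallClockTime), "samples/sec");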

Scaling to the Cloud: From Workstations to Global Compute Mesh

WitEngine's architecture doesn't impose artificial limits on cluster size. The same script that runs on 5 workstations runs on 5,000 cloud instances—or 50,000. This isn't theoretical: WitEngine is the core of WitCloud, a distributed computing platform with two deployment modes designed for exactly this scale.

WitCloud: Two Paths to Scale

Local WitCloud (Enterprise) — Deploy within your organization. Your hardware, your data, your control. Connect offices across time zones into a unified compute mesh.

OmnibusCloud (Global Service) — The public instance of WitCloud. A worldwide distributed computing platform where anyone can contribute resources and anyone can run workloads. Currently in closed beta with plans for open access.

Why This Matters for AI/ML

Machine learning workloads are bursty. You don't need 10,000 GPUs every day—but when you're retraining a model on fresh data or running inference on a massive backlog, you need them now.

Scenario                                On-premise (20 nodes)  Local WitCloud (500 nodes)  OmnibusCloud (20,000+ nodes)
--------------------------------------  ---------------------  --------------------------  ----------------------------
Embed 50M documents                     8 hours                20 minutes                  30 seconds
Inference on 100M samples               12 hours               30 minutes                  45 seconds
Hyperparameter search (10,000 configs)  5 days                 5 hours                     6 minutes
Generate 10M synthetic images           30 hours               1.5 hours                   2 minutes

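The script doesn't change; only the infrastructure scales. The OmnibusCloud figures are projections that assume near-linear scaling, since ML workloads on OmnibusCloud are still planned rather than available.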
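
The scenario times above are close to what ideal linear scaling predicts: total work in node-hours stays constant, so run time shrinks in proportion to the node count. The Python sketch below shows that projection. The function `ideal_hours` is a hypothetical helper, and the model ignores scheduling, data transfer, and startup overhead, so real runs would be somewhat slower.

```python
def ideal_hours(base_hours, base_nodes, target_nodes, task_count=None):
    """Project run time on target_nodes, assuming perfectly linear scaling.

    If task_count is given, nodes beyond the number of tasks add nothing,
    so the projection never drops below the duration of a single task.
    """
    node_hours = base_hours * base_nodes     # total work in node-hours
    usable = target_nodes if task_count is None else min(target_nodes, task_count)
    return node_hours / usable


# Embed 50M documents: 8 hours on 20 nodes
on_500 = ideal_hours(8, 20, 500)        # 0.32 hours, about 19 minutes
on_20000 = ideal_hours(8, 20, 20000)    # 0.008 hours, about 29 seconds
```

The optional `task_count` argument captures one limit of the model: when tasks are indivisible, adding more nodes than there are tasks yields no further gain.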

Elastic Scaling Pattern

~ Define task collection (same as always) ~
EmbeddingTaskCollection:tasks = CreateEmbeddingTasks(documents, chunkSize);

~ Configure for cloud execution ~
ProcessingOptions:opts = ProcessingOptions.Create("Queued");
opts = ProcessingOptions.SetRequirement(opts, "GPU", "true");
opts = ProcessingOptions.SetRequirement(opts, "VRAM", "16GB");

~ Local WitCloud, OmnibusCloud, or hybrid—script is identical ~
EmbeddingResultCollection:results = 
    Grid.ForEach(task in tasks, opts) => ML.GenerateEmbeddings(task);

Local WitCloud: The "Office at Night" Advantage

Consider a typical enterprise: 500 workstations across three offices. During business hours, employees use perhaps 5-10% of CPU capacity. From 8 PM to 8 AM—12 hours every night—these machines sit completely idle.

Local WitCloud turns this waste into a free supercomputer:

Daytime: 500 machines × 5% utilization = 25 machines' worth of compute in use
Nighttime with WitCloud: 500 machines × 100% utilization = 500 machines

Result: 20× more compute capacity, zero additional hardware cost
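
The same arithmetic can be run for both ends of the 5-10% daytime range quoted above. This short Python sketch uses only the figures from the text; `capacity_ratio` is an illustrative helper name.

```python
MACHINES = 500
IDLE_HOURS_PER_NIGHT = 12        # 8 PM to 8 AM


def capacity_ratio(machines, daytime_utilization):
    """Night-time capacity relative to the compute actually used by day."""
    daytime_equivalent = machines * daytime_utilization
    return machines / daytime_equivalent


ratio_at_5_percent = capacity_ratio(MACHINES, 0.05)     # 20x, the figure above
ratio_at_10_percent = capacity_ratio(MACHINES, 0.10)    # 10x at the upper bound
nightly_machine_hours = MACHINES * IDLE_HOURS_PER_NIGHT
print(ratio_at_5_percent, ratio_at_10_percent, nightly_machine_hours)
```

The 20× figure corresponds to the 5% end of the range; at 10% daytime utilization the ratio is 10×. Either way, the fleet offers 6,000 machine-hours per night.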

For ML teams, this means:

  • Nightly model retraining on the full dataset, not samples
  • Hyperparameter sweeps that would be cost-prohibitive in the cloud
  • Batch inference backlogs cleared by morning

OmnibusCloud: Global Compute on Demand

OmnibusCloud extends this model globally—a worldwide mesh of contributed compute resources accessible via the internet.

Current status: Closed beta, initially focused on distributed rendering (Blender plugin). ML workloads planned for Phase 2.

The vision: Submit an embedding job tonight, have it processed by idle machines across time zones, results ready by morning. Pay-per-use or contribute your own idle resources.

Architecture for Massive Scale

WitEngine's design principles enable cloud-scale deployment:

Principle                   Benefit at scale
--------------------------  -------------------------------------------------------------------------------------------------
Stateless nodes             Nodes don't share state, so they can spin up and down freely
Pull-based distribution     No coordinator bottleneck; nodes request work when ready
Capability matching         Heterogeneous cloud instances (spot, on-demand, different GPU types) contribute appropriately
Fault tolerance             Node failures don't lose work; tasks return to the queue
Benchmark-aware allocation  Different instance types get proportional workloads
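
The fault-tolerance principle, where a failed node's task returns to the queue, can be illustrated with a small simulation. This is a Python sketch under simplifying assumptions, not WitEngine's recovery logic: `run_with_requeue` is hypothetical, nodes take turns pulling tasks, and a node that crashes leaves the pool for good.

```python
from collections import deque


def run_with_requeue(tasks, nodes, fails_on):
    """Simulate pull-based execution where a failed node's task is requeued.

    fails_on maps a node name to the task it crashes on.
    Returns a dict mapping each task to the node that completed it.
    """
    pending = deque(tasks)
    alive = list(nodes)
    completed = {}
    while pending:
        if not alive:
            raise RuntimeError("no nodes left to finish the queue")
        for node in list(alive):         # iterate over a copy; alive may shrink
            if not pending:
                break
            task = pending.popleft()
            if fails_on.get(node) == task:
                pending.append(task)     # incomplete work goes back on the queue
                alive.remove(node)       # the failed node stops pulling
            else:
                completed[task] = node
    return completed


done = run_with_requeue(
    tasks=["t1", "t2", "t3", "t4", "t5"],
    nodes=["spot-1", "spot-2"],
    fails_on={"spot-2": "t2"},           # spot-2 is interrupted while running t2
)
```

In this run `spot-2` disappears while holding `t2`, the task goes back to the queue, and `spot-1` eventually completes all five tasks.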

Hybrid Deployment

A common arrangement is hybrid: baseline on-premise capacity plus cloud bursts for peaks.

Typical hybrid pattern:

On-premise cluster (always running):
  - 20 GPU workstations
  - Handles daily inference load
  - Predictable cost

Cloud burst (on-demand):
  - Scale to 500-5,000 instances
  - Monthly model retraining
  - Quarterly full dataset reprocessing
  - Pay only for burst duration

The Queued strategy handles this naturally—cloud nodes join the pool, pull work, and disappear when done. No reconfiguration required.

Cost Optimization at Scale

Cloud scale introduces cost considerations that WitEngine's architecture addresses:

Challenge                   WitEngine approach
--------------------------  ----------------------------------------------------------------
Spot instance interruption  Tasks return to queue, picked up by other nodes
Mixed instance pricing      Benchmark system allocates work proportional to cost-performance
Idle time waste             Queued strategy ensures nodes work until queue empty
Over-provisioning           Real-time progress tracking shows actual completion ETA

When you're paying per-second for 10,000 GPU instances, efficiency isn't optional—it's the difference between a viable pipeline and a budget disaster.
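
Cost-performance can be made concrete by converting benchmarked throughput and hourly price into a cost per million samples. The Python sketch below shows the calculation. The instance names, throughputs, and prices are invented for illustration only, and `cost_per_million` is a hypothetical helper rather than part of WitEngine.

```python
def cost_per_million(samples_per_second, price_per_hour):
    """Dollars needed to process one million samples on one instance type."""
    hours = 1_000_000 / samples_per_second / 3600
    return hours * price_per_hour


# Hypothetical benchmark results and prices, for illustration only.
instance_types = {
    "large-gpu-on-demand": {"samples_per_second": 400.0, "price_per_hour": 4.00},
    "large-gpu-spot": {"samples_per_second": 400.0, "price_per_hour": 1.20},
    "small-gpu-spot": {"samples_per_second": 100.0, "price_per_hour": 0.40},
}

# Cheapest cost per million samples first.
ranked = sorted(
    instance_types,
    key=lambda name: cost_per_million(**instance_types[name]),
)
```

With these assumed numbers the small spot instance is slower than the large on-demand one yet cheaper per sample, which is the kind of trade-off a cost-aware allocation has to weigh.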


Summary

WitEngine transforms ML batch processing from infrastructure management into workflow definition. You describe what needs computing—batches of inference, embeddings to generate, synthetic data to create. WitEngine handles task distribution, node selection, fault recovery, and result aggregation.

The result: faster iteration cycles, predictable costs, reproducible results, and full visibility into your distributed workloads.