AI/ML Batch Processing with WitEngine

Machine learning workflows demand massive computational resources. Training pipelines preprocess terabytes of data. Inference systems classify millions of samples daily. Evaluation runs span thousands of hyperparameter combinations. When a single machine takes days, distribution becomes essential.

WitEngine transforms these workflows from bottlenecks into streamlined operations. Instead of managing infrastructure, you write simple scripts that express what you want computed—WitEngine handles where and how.

Scenario 1: Batch Inference at Scale

The challenge: A production system needs to classify 10 million images using a pre-trained ResNet model. Single-threaded processing is estimated to take 5+ hours, but users expect results within minutes.

How WitEngine helps:

WitEngine's orchestration layer breaks the dataset into batches and distributes them across available nodes. The Grid.ForEach primitive handles task assignment automatically:

~ Load model configuration ~
String:modelPath = "/models/resnet50.onnx";
Int:batchSize = 32;
Double:confidenceThreshold = 0.8;

~ Prepare inference tasks ~
StringCollection:imageFiles = File.ListFiles("/data/images", "*.jpg");
ImageBatchCollection:batches = Image.CreateBatches(imageFiles, batchSize);

~ Configure distribution ~
ProcessingOptions:opts = ProcessingOptions.Create("Queued");
opts = ProcessingOptions.SetRequirement(opts, "GPU", "true");
opts = ProcessingOptions.SetRequirement(opts, "VRAM", "8GB");

~ Distribute inference ~
ClassificationResultCollection:results = 
    Grid.ForEach(batch in batches, opts) => ML.ClassifyBatch(modelPath, batch, confidenceThreshold);

Key mechanisms:

Mechanism	Benefit
Queued strategy	Nodes pull tasks as they finish—handles variable inference times gracefully
GPU requirements	Only nodes with sufficient VRAM receive tasks
Model caching	Model loads once per node, shared across all batches
Batching	Grouping images into batches (batchSize = 32 here) keeps GPU utilization high

Result: 10 million images processed in ~15 minutes across 20 GPU nodes instead of 5+ hours on one machine.
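
The pull model behind the Queued strategy can be illustrated outside WitEngine. The following is a minimal Python sketch, not WitEngine's implementation: threads stand in for nodes, `run_queued` and `classify` are hypothetical names, and the "model" is a simple threshold check. It shows only the core idea that each worker takes the next batch as soon as it is free, so a slow batch never holds up the others.

```python
import queue
import threading


def run_queued(batches, worker_names, process):
    """Each worker pulls the next batch as soon as it is free."""
    work = queue.Queue()
    for index, batch in enumerate(batches):
        work.put((index, batch))

    results = [None] * len(batches)            # one slot per batch
    handled_by = {name: 0 for name in worker_names}

    def worker(name):
        while True:
            try:
                index, batch = work.get_nowait()
            except queue.Empty:
                return                         # queue drained: worker is done
            results[index] = process(batch)
            handled_by[name] += 1              # each worker updates only its own key

    threads = [threading.Thread(target=worker, args=(n,)) for n in worker_names]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, handled_by


def classify(batch):
    # Stand-in for model inference: label each score against a 0.8 threshold.
    return ["cat" if score >= 0.8 else "unknown" for score in batch]


batches = [[0.9, 0.1], [0.85], [0.2, 0.95, 0.5]]
results, handled_by = run_queued(batches, ["node-a", "node-b"], classify)
```

Results are written back by batch index, so the output order matches the input order regardless of which worker finished first.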

Scenario 2: Embedding Generation

The challenge: A semantic search system requires embeddings for 50 million text documents. Each document passes through a transformer model. Processing sequentially would take weeks.

How WitEngine helps:

Embedding generation is embarrassingly parallel: each document is processed independently. WitEngine exploits this by distributing chunks across heterogeneous hardware:

~ Configure embedding model ~
String:modelPath = "/models/sentence-transformer";
Int:chunkSize = 10000;
String:device = "cuda";

~ Load documents in chunks ~
DocumentChunkCollection:chunks = Document.LoadChunks("/data/corpus", chunkSize);

~ Create embedding tasks ~
EmbeddingTaskCollection:tasks = [];

ForEach(chunk in chunks)
{
    EmbeddingTask:task = EmbeddingTask.Create(chunk, modelPath);
    task = EmbeddingTask.SetDevice(task, device);
    task = EmbeddingTask.SetNormalize(task, true);
    tasks = Collection.Add(tasks, task);
}

~ Distribute across mixed hardware ~
ProcessingOptions:opts = ProcessingOptions.Create("Balanced");

~ Some nodes have A100s, others have consumer GPUs ~
EmbeddingResultCollection:results = 
    Grid.ForEach(task in tasks, opts) => ML.GenerateEmbeddings(task);

Key mechanisms:

Mechanism	Benefit
Heterogeneous node support	A100 nodes get more chunks than RTX 3070 nodes—proportional to benchmarked speed
Balanced strategy	Pre-assigns work based on node performance, minimizing idle time
Chunk-based processing	Memory-efficient—no single node loads the entire corpus
Normalized outputs	Consistent embedding format regardless of which node processed them

Result: 50 million documents embedded in hours instead of weeks. Faster nodes automatically receive proportionally more work.
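
To make "proportional to benchmarked speed" concrete, here is a small Python sketch of one way such a split could be computed. It is an illustration under assumptions, not WitEngine's scheduler: `allocate_proportional` is a hypothetical helper, the relative speeds are made-up numbers, and the largest-remainder rounding is simply one reasonable choice. The task count of 5,000 comes from the scenario (50 million documents in chunks of 10,000).

```python
def allocate_proportional(total_tasks, node_speeds):
    """Split total_tasks across nodes in proportion to benchmarked speed.

    Uses the largest-remainder method so the counts always sum to total_tasks.
    """
    total_speed = sum(node_speeds.values())
    exact = {n: total_tasks * s / total_speed for n, s in node_speeds.items()}
    counts = {n: int(share) for n, share in exact.items()}
    leftover = total_tasks - sum(counts.values())
    # Hand the remaining tasks to the nodes with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda n: exact[n] - counts[n], reverse=True)
    for name in by_remainder[:leftover]:
        counts[name] += 1
    return counts


# Hypothetical relative speeds; real values would come from a benchmark run.
speeds = {"a100-1": 4.0, "a100-2": 4.0, "rtx3070-1": 1.0, "rtx3070-2": 1.0}
plan = allocate_proportional(5000, speeds)
```

With these assumed speeds, each A100 node receives four times as many chunks as each RTX 3070 node, so all nodes would finish at roughly the same time if the benchmark is accurate.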

Scenario 3: Synthetic Data Generation

The challenge: Training a computer vision model requires 1 million synthetic images with controlled variations—lighting, angles, backgrounds, occlusions. Rendering each image takes 2-10 seconds depending on scene complexity.

How WitEngine helps:

Synthetic data generation has highly variable task durations. WitEngine's queued strategy handles this naturally—fast tasks don't block on slow ones:

~ Define generation parameters ~
Int:totalImages = 1000000;
Int:imagesPerTask = 100;

~ Create generation tasks with varying complexity ~
SyntheticTaskCollection:tasks = [];

Loop(Int.Divide(totalImages, imagesPerTask))
{
    SyntheticTask:task = SyntheticTask.Create();
    task = SyntheticTask.SetOutputSize(task, 512, 512);
    task = SyntheticTask.SetVariations(task, {
        "lighting": ["studio", "natural", "dramatic"],
        "angle": [-30, 0, 30, 45],
        "background": "random",
        "occlusion": [0.0, 0.1, 0.2]
    });
    task = SyntheticTask.SetSeed(task, Random.Int());  ~ Reproducibility ~
    tasks = Collection.Add(tasks, task);
}

~ Distribute generation ~
ProcessingOptions:opts = ProcessingOptions.Create("Queued");
opts = ProcessingOptions.SetRequirement(opts, "GPU", "true");

SyntheticResultCollection:results = 
    Grid.ForEach(task in tasks, opts) => Synthetic.GenerateBatch(task);

Key mechanisms:

Mechanism	Benefit
Pull-based distribution	Nodes request work when ready—no coordinator bottleneck
Seeded generation	Every image reproducible from its seed value
Variation control	Systematic coverage of the parameter space
Fault tolerance	If a node fails, its incomplete tasks return to the queue

Result: 1 million images generated in under 3 hours across 30 render nodes. Full reproducibility—regenerate any image by replaying its seed.
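
Seeded generation and variation control can be sketched in a few lines of Python. This is an illustration only: `parameter_grid` and `render_params` are hypothetical names, the `background_id` field is a stand-in for the "random" background, and no rendering happens. Two points are shown: the listed variations span a finite grid that can be enumerated, and a private generator seeded per task returns the same scene parameters on replay. Replay works only if the seed is recorded alongside the output.

```python
import itertools
import random

variations = {
    "lighting": ["studio", "natural", "dramatic"],
    "angle": [-30, 0, 30, 45],
    "occlusion": [0.0, 0.1, 0.2],
}


def parameter_grid(options):
    """Yield every combination of the listed variation values."""
    keys = list(options)
    for values in itertools.product(*(options[k] for k in keys)):
        yield dict(zip(keys, values))


def render_params(seed, grid):
    """Pick scene parameters from a seed; the same seed gives the same scene."""
    rng = random.Random(seed)                     # private generator, no global state
    scene = dict(rng.choice(grid))
    scene["background_id"] = rng.randrange(1000)  # stand-in for a random background
    return scene


grid = list(parameter_grid(variations))
first = render_params(12345, grid)
replay = render_params(12345, grid)
```

The grid has 3 × 4 × 3 = 36 combinations, and `first` and `replay` are identical because both start from the same seed.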

What You Get

Throughput

WitEngine maximizes hardware utilization through intelligent scheduling:

Workload	Single Machine	WitEngine (20 nodes)	Speedup
Preprocess 1M images	8 hours	24 minutes	20×
Inference on 10M samples	5 hours	15 minutes	20×
100 hyperparameter configs	4 days	5 hours	19×
Generate 1M synthetic images	60 hours	3 hours	20×

The benchmark system measures actual node performance for your specific workloads. Fast nodes get more work. Slow nodes aren't wasted—they contribute what they can.
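
The speedup column follows from simple arithmetic, sketched here in Python using the figures from the table. Parallel efficiency (speedup divided by node count) is added as an extra measure that the table does not list; a value of 1.0 means perfectly linear scaling on 20 nodes.

```python
def speedup(single_hours, distributed_hours):
    """How many times faster the distributed run is."""
    return single_hours / distributed_hours


def efficiency(single_hours, distributed_hours, nodes):
    """Speedup per node; 1.0 corresponds to perfectly linear scaling."""
    return speedup(single_hours, distributed_hours) / nodes


# (single-machine hours, distributed hours) taken from the table above.
rows = {
    "preprocess_1m_images": (8.0, 24 / 60),
    "inference_10m_samples": (5.0, 15 / 60),
    "hyperparameter_100_configs": (4 * 24.0, 5.0),
    "synthetic_1m_images": (60.0, 3.0),
}

summary = {
    name: (round(speedup(s, d), 1), round(efficiency(s, d, 20), 2))
    for name, (s, d) in rows.items()
}
```

Three rows come out at exactly 20× (efficiency 1.0) and the hyperparameter row at 19.2× (efficiency 0.96), which the table rounds to 19×.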

Cost Predictability

Infrastructure costs stay fixed while workload size varies:

Pre-partitioned work — Know task counts before execution starts
Resource requirements — Tasks only run on capable nodes
No over-provisioning — Mixed hardware contributes proportionally
Batch accounting — Track compute time per task for chargeback

Reproducibility

Every execution can be replayed:

Deterministic task assignment — Same inputs produce same distribution
Seeded randomness — Stochastic processes controlled by explicit seeds
Version-locked models — Model paths explicit in scripts
Immutable configurations — Parameters captured at execution time

Debugging a failed batch? Replay the exact task on a single node with full tracing enabled.

Observability

Real-time visibility into distributed execution:

~ Enable progress tracking ~
opts = ProcessingOptions.SetProgressReporting(opts, true);
opts = ProcessingOptions.SetProgressInterval(opts, 10);

~ Progress callback ~
Grid.OnProgress(progress =>
{
    Trace("Progress:", String.Format("{0:F1}%", Double.Multiply(progress.Percent, 100)));
    Trace("Tasks completed:", progress.CompletedTasks, "/", progress.TotalTasks);
    Trace("Active nodes:", progress.ActiveNodes);
    
    If(progress.StalledTasks > 0)
    {
        ~ endpoint is the alert destination, assumed to be defined earlier in the script ~
        Alert.Send(endpoint, "StalledTasks", {"count": progress.StalledTasks});
    }
});

Track at every level:

Level	Visibility
Job	Overall progress, wall-clock time, success/failure counts
Task	Individual status, node assignment, processing duration
Node	Current load, completed tasks, error rate
Sample	Per-item results, confidence scores, processing metadata
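
How job-level numbers such as percent complete, ETA, and stalled-task counts might be derived is sketched below in Python. This is an assumption-laden illustration rather than WitEngine's progress API: the `Progress` class, the `stalled` helper, and the heartbeat-timeout rule are all hypothetical. The ETA simply extrapolates the average completion rate so far.

```python
from dataclasses import dataclass


@dataclass
class Progress:
    total_tasks: int
    completed_tasks: int
    elapsed_seconds: float

    @property
    def fraction(self):
        """Completed share of the job, between 0.0 and 1.0."""
        return self.completed_tasks / self.total_tasks if self.total_tasks else 0.0

    def eta_seconds(self):
        """Estimate remaining time from the average completion rate so far."""
        if self.completed_tasks == 0:
            return None                      # no rate to extrapolate from yet
        remaining = self.total_tasks - self.completed_tasks
        return remaining * self.elapsed_seconds / self.completed_tasks


def stalled(last_heartbeat, now, timeout_seconds):
    """Return IDs of tasks whose last heartbeat is older than the timeout."""
    return sorted(t for t, seen in last_heartbeat.items() if now - seen > timeout_seconds)


p = Progress(total_tasks=1000, completed_tasks=250, elapsed_seconds=60.0)
late = stalled({"t1": 100.0, "t2": 40.0}, now=110.0, timeout_seconds=30.0)
```

Here a quarter of the job took 60 seconds, so the remaining three quarters are estimated at 180 seconds, and only `t2` is reported as stalled.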

Typical Workflow

1. Define your data pipeline

Identify independent units of work: batches, chunks, or individual samples that can be processed in parallel.
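
Splitting a dataset into independent units is usually a plain chunking step. A minimal Python sketch follows; `chunked` is a hypothetical helper and is not part of WitEngine.

```python
def chunked(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [items[i:i + size] for i in range(0, len(items), size)]


units = chunked(list(range(7)), 3)
```

The last chunk may be shorter than the others, so downstream code should not assume a fixed chunk length.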

2. Create task collections

Build a collection of task objects, each containing the inputs and parameters for one unit of work:

InferenceTaskCollection:tasks = [];

ForEach(batch in batches)
{
    InferenceTask:task = InferenceTask.Create(modelPath, batch);
    tasks = Collection.Add(tasks, task);
}

3. Configure distribution strategy

Choose based on your workload characteristics:

Strategy	Use When
Balanced	Task durations are predictable and similar
Queued	Task durations vary significantly
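
The trade-off between the two strategies can be seen in a toy simulation. The Python sketch below is a simplification under stated assumptions: all workers run at the same speed, "Balanced" is modeled as round-robin pre-assignment (WitEngine's Balanced strategy is described as benchmark-based, which this does not reproduce), and "Queued" is modeled as handing each task to whichever worker becomes free first. The function names are hypothetical.

```python
import heapq


def makespan_balanced(durations, workers):
    """Pre-assign tasks round-robin; each worker then runs its own share."""
    loads = [0.0] * workers
    for i, d in enumerate(durations):
        loads[i % workers] += d
    return max(loads)


def makespan_queued(durations, workers):
    """Give each task to the worker that becomes free first (pull model)."""
    free_at = [0.0] * workers
    heapq.heapify(free_at)
    for d in durations:
        start = heapq.heappop(free_at)       # earliest available worker
        heapq.heappush(free_at, start + d)
    return max(free_at)


uniform = [5.0] * 8
skewed = [10.0, 2.0, 10.0, 2.0, 10.0, 2.0, 10.0, 2.0]

uniform_result = (makespan_balanced(uniform, 2), makespan_queued(uniform, 2))
skewed_result = (makespan_balanced(skewed, 2), makespan_queued(skewed, 2))
```

With uniform durations both approaches finish at 20 time units. With the skewed durations, the pre-assigned split finishes at 40 because one worker receives every long task, while the pull model finishes at 24. The skewed input is deliberately unfavorable to round-robin; it is meant to show the failure mode, not a typical case.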

4. Specify node requirements

Declare what hardware your tasks need:

ProcessingOptions:opts = ProcessingOptions.Create("Queued");
opts = ProcessingOptions.SetRequirement(opts, "GPU", "true");
opts = ProcessingOptions.SetRequirement(opts, "VRAM", "8GB");
opts = ProcessingOptions.SetRequirement(opts, "CUDA", "11.0+");
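
Requirement matching can be pictured as a filter over the capabilities each node advertises. The Python sketch below is a guess at the semantics, not WitEngine's matcher: it assumes "8GB" means "at least 8 GB of VRAM" and "11.0+" means "CUDA 11.0 or newer", and the node names and capability dictionaries are invented for illustration.

```python
def parse_gb(text):
    """Turn a string such as '8GB' into the number 8.0."""
    text = text.strip().upper()
    if not text.endswith("GB"):
        raise ValueError(f"expected a value like '8GB', got {text!r}")
    return float(text[:-2])


def parse_version(text):
    """Turn '11.0+' or '11.8' into a comparable tuple such as (11, 0)."""
    return tuple(int(part) for part in text.rstrip("+").split("."))


def satisfies(node, requirements):
    """Check one node's advertised capabilities against the task requirements."""
    for key, wanted in requirements.items():
        have = node.get(key)
        if have is None:
            return False
        if key == "VRAM":
            if parse_gb(have) < parse_gb(wanted):        # assumed: minimum VRAM
                return False
        elif key == "CUDA":
            if parse_version(have) < parse_version(wanted):  # assumed: minimum version
                return False
        elif have != wanted:
            return False
    return True


requirements = {"GPU": "true", "VRAM": "8GB", "CUDA": "11.0+"}
nodes = {  # hypothetical nodes
    "ws-01": {"GPU": "true", "VRAM": "24GB", "CUDA": "12.1"},
    "ws-02": {"GPU": "true", "VRAM": "6GB", "CUDA": "11.8"},
    "ws-03": {"GPU": "false"},
    "ws-04": {"GPU": "true", "VRAM": "8GB", "CUDA": "10.2"},
}
eligible = [name for name, caps in nodes.items() if satisfies(caps, requirements)]
```

Only `ws-01` qualifies: `ws-02` has too little VRAM, `ws-03` has no GPU, and `ws-04` has an older CUDA version.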

5. Execute and aggregate

Distribute tasks and collect results:

ResultCollection:results = 
    Grid.ForEach(task in tasks, opts) => ML.ProcessBatch(task);

6. Analyze and report

Process results, compute statistics, generate outputs:

Int:successCount = 0;
Double:totalTime = 0;

ForEach(result in results)
{
    If(result.Success == true)
    {
        successCount = Int.Add(successCount, 1);
        totalTime = Double.Add(totalTime, result.ProcessingTime);
    }
}

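Trace("Completed:", successCount, "/", Collection.Count(tasks));
~ totalTime sums per-task processing time across nodes; it is not wall-clock time ~
Trace("Total processing time:", totalTime);
~ totalProcessed and wallClockTime are assumed to be measured elsewhere in the script ~
Trace("Throughput:", Double.Divide(totalProcessed, wallClockTime), "samples/sec");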

Scaling to the Cloud: From Workstations to Global Compute Mesh

WitEngine's architecture doesn't impose artificial limits on cluster size. The same script that runs on 5 workstations runs on 5,000 cloud instances—or 50,000. This isn't theoretical: WitEngine is the core of WitCloud, a distributed computing platform with two deployment modes designed for exactly this scale.

WitCloud: Two Paths to Scale

Local WitCloud (Enterprise) — Deploy within your organization. Your hardware, your data, your control. Connect offices across time zones into a unified compute mesh.

OmnibusCloud (Global Service) — The public instance of WitCloud. A worldwide distributed computing platform where anyone can contribute resources and anyone can run workloads. Currently in closed beta with plans for open access.

Why This Matters for AI/ML

Machine learning workloads are bursty. You don't need 10,000 GPUs every day—but when you're retraining a model on fresh data or running inference on a massive backlog, you need them now.

Scenario	On-Premise (20 nodes)	Local WitCloud (500 nodes)	OmnibusCloud (20,000+ nodes)
Embed 50M documents	8 hours	20 minutes	30 seconds
Inference on 100M samples	12 hours	30 minutes	45 seconds
Hyperparameter search (10,000 configs)	5 days	5 hours	6 minutes
Generate 10M synthetic images	30 hours	1.5 hours	2 minutes

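The script doesn't change. The infrastructure scales. The OmnibusCloud column is a projection, since ML workloads on OmnibusCloud are planned for Phase 2.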
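
The figures in the table are close to what ideal linear scaling would predict from the 20-node baseline. The Python sketch below makes that arithmetic explicit; `ideal_time` is a hypothetical helper, and real runs would be slower than these values because of data transfer, scheduling overhead, and stragglers.

```python
def ideal_time(baseline_seconds, baseline_nodes, target_nodes):
    """Run time under perfectly linear scaling from a measured baseline."""
    return baseline_seconds * baseline_nodes / target_nodes


HOUR = 3600

# Embedding row: 8 hours on 20 nodes.
embed_500 = ideal_time(8 * HOUR, 20, 500)        # seconds on 500 nodes
embed_20000 = ideal_time(8 * HOUR, 20, 20000)    # seconds on 20,000 nodes

# Hyperparameter row: 5 days on 20 nodes.
search_500 = ideal_time(5 * 24 * HOUR, 20, 500) / HOUR     # hours on 500 nodes
search_20000 = ideal_time(5 * 24 * HOUR, 20, 20000) / 60   # minutes on 20,000 nodes
```

For the embedding row this gives about 19 minutes on 500 nodes and about 29 seconds on 20,000 nodes, against 20 minutes and 30 seconds in the table.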

Elastic Scaling Pattern

~ Define task collection (same as always) ~
EmbeddingTaskCollection:tasks = CreateEmbeddingTasks(documents, chunkSize);

~ Configure for cloud execution ~
ProcessingOptions:opts = ProcessingOptions.Create("Queued");
opts = ProcessingOptions.SetRequirement(opts, "GPU", "true");
opts = ProcessingOptions.SetRequirement(opts, "VRAM", "16GB");

~ Local WitCloud, OmnibusCloud, or hybrid—script is identical ~
EmbeddingResultCollection:results = 
    Grid.ForEach(task in tasks, opts) => ML.GenerateEmbeddings(task);

Local WitCloud: The "Office at Night" Advantage

Consider a typical enterprise: 500 workstations across three offices. During business hours, employees use perhaps 5-10% of CPU capacity. From 8 PM to 8 AM—12 hours every night—these machines sit completely idle.

Local WitCloud turns this waste into a free supercomputer:

Daytime: 500 machines × 5% utilization (the low end of the range) = 25 machines' worth of compute in use
Nighttime with WitCloud: 500 machines × 100% utilization = 500 machines

Result: 20× the compute that is in use during the day, at zero additional hardware cost

For ML teams, this means:

Nightly model retraining on the full dataset, not samples
Hyperparameter sweeps that would be cost-prohibitive on cloud
Batch inference backlogs cleared by morning

OmnibusCloud: Global Compute on Demand

OmnibusCloud extends this model globally—a worldwide mesh of contributed compute resources accessible via the internet.

Current status: Closed beta, initially focused on distributed rendering (Blender plugin). ML workloads planned for Phase 2.

The vision: Submit an embedding job tonight, have it processed by idle machines across time zones, results ready by morning. Pay-per-use or contribute your own idle resources.

Architecture for Massive Scale

WitEngine's design principles enable cloud-scale deployment:

Principle	Benefit at Scale
Stateless nodes	Nodes don't share state—spin up/down freely
Pull-based distribution	No coordinator bottleneck; nodes request work when ready
Capability matching	Heterogeneous cloud instances (spot, on-demand, different GPU types) contribute appropriately
Fault tolerance	Node failures don't lose work—tasks return to queue
Benchmark-aware allocation	Different instance types get proportional workloads
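
The combination of pull-based distribution and fault tolerance can be sketched as a queue that leases tasks to nodes and takes them back when a node disappears. The Python below is a single-process illustration with hypothetical names (`LeasedQueue`, `node_failed`); it says nothing about how WitEngine detects failures or stores its queue.

```python
from collections import deque


class LeasedQueue:
    """Task queue where a task stays leased until its node reports success."""

    def __init__(self, tasks):
        self.pending = deque(tasks)
        self.leased = {}                 # task -> node currently working on it
        self.done = []

    def pull(self, node):
        """A node asks for work; returns None when nothing is pending."""
        if not self.pending:
            return None
        task = self.pending.popleft()
        self.leased[task] = node
        return task

    def complete(self, task):
        """Mark a leased task as finished."""
        del self.leased[task]
        self.done.append(task)

    def node_failed(self, node):
        """Return every task the failed node was holding to the queue."""
        lost = [t for t, n in self.leased.items() if n == node]
        for task in lost:
            del self.leased[task]
            self.pending.append(task)
        return lost


q = LeasedQueue(["t1", "t2", "t3"])
a = q.pull("spot-1")             # spot-1 takes t1
b = q.pull("ondemand-1")         # ondemand-1 takes t2
q.complete(b)
lost = q.node_failed("spot-1")   # t1 goes back to the queue

# The surviving node drains whatever is left, including the returned task.
while (task := q.pull("ondemand-1")) is not None:
    q.complete(task)
```

Because the failed node's task is re-queued rather than dropped, all three tasks end up completed even though one node was lost mid-run.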

Hybrid Deployment

Many organizations run hybrid: baseline on-premise capacity plus cloud burst for peaks.

Typical hybrid pattern:

On-premise cluster (always running):
  - 20 GPU workstations
  - Handles daily inference load
  - Predictable cost

Cloud burst (on-demand):
  - Scale to 500-5,000 instances
  - Monthly model retraining
  - Quarterly full dataset reprocessing
  - Pay only for burst duration

The Queued strategy handles this naturally—cloud nodes join the pool, pull work, and disappear when done. No reconfiguration required.

Cost Optimization at Scale

Cloud scale introduces cost considerations that WitEngine's architecture addresses:

Challenge	WitEngine Approach
Spot instance interruption	Tasks return to queue, picked up by other nodes
Mixed instance pricing	Benchmark system allocates work proportional to cost-performance
Idle time waste	Queued strategy ensures nodes work until queue empty
Over-provisioning	Real-time progress tracking shows actual completion ETA
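
Cost-performance can be reduced to a cost-per-task figure: the hourly price of an instance divided by the number of tasks it completes per hour. The Python sketch below ranks instance types that way. Every price and throughput value is hypothetical and chosen only for illustration, and `cost_per_task` is not a WitEngine function.

```python
def cost_per_task(price_per_hour, tasks_per_hour):
    """Price paid for one completed task on a given instance type."""
    return price_per_hour / tasks_per_hour


# Hypothetical (price per hour, benchmarked tasks per hour) for each type.
instance_types = {
    "spot-a100": (1.20, 400),
    "ondemand-a100": (3.60, 400),
    "spot-t4": (0.20, 50),
}

costs = {name: cost_per_task(p, t) for name, (p, t) in instance_types.items()}
ranked = sorted(costs, key=costs.get)    # cheapest per task first
```

With these assumed numbers, the slower spot instance is still cheaper per task than the on-demand A100, which is why raw speed alone is not a sufficient basis for allocation when instances are priced differently.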

When you're paying per-second for 10,000 GPU instances, efficiency isn't optional—it's the difference between a viable pipeline and a budget disaster.

Summary

WitEngine transforms ML batch processing from infrastructure management into workflow definition. You describe what needs computing—batches of inference, embeddings to generate, synthetic data to create. WitEngine handles task distribution, node selection, fault recovery, and result aggregation.

The result: faster iteration cycles, predictable costs, reproducible results, and full visibility into your distributed workloads.