AI/ML Batch Processing with WitEngine
Machine learning workflows demand massive computational resources. Training pipelines preprocess terabytes of data. Inference systems classify millions of samples daily. Evaluation runs span thousands of hyperparameter combinations. When a single machine takes days, distribution becomes essential.
WitEngine transforms these workflows from bottlenecks into streamlined operations. Instead of managing infrastructure, you write simple scripts that express what you want computed—WitEngine handles where and how.
Scenario 1: Batch Inference at Scale
The challenge: A production system needs to classify 10 million images using a pre-trained ResNet model. Sequential processing on a single machine is estimated at 5+ hours. Users expect results within minutes.
How WitEngine helps:
WitEngine's orchestration layer breaks the dataset into batches and distributes them across available nodes. The Grid.ForEach primitive handles task assignment automatically:
~ Load model configuration ~
String:modelPath = "/models/resnet50.onnx";
Int:batchSize = 32;
Double:confidenceThreshold = 0.8;
~ Prepare inference tasks ~
StringCollection:imageFiles = File.ListFiles("/data/images", "*.jpg");
ImageBatchCollection:batches = Image.CreateBatches(imageFiles, batchSize);
~ Configure distribution ~
ProcessingOptions:opts = ProcessingOptions.Create("Queued");
opts = ProcessingOptions.SetRequirement(opts, "GPU", "true");
opts = ProcessingOptions.SetRequirement(opts, "VRAM", "8GB");
~ Distribute inference ~
ClassificationResultCollection:results =
Grid.ForEach(batch in batches, opts) => ML.ClassifyBatch(modelPath, batch, confidenceThreshold);
Key mechanisms:
| Mechanism | Benefit |
|---|---|
| Queued strategy | Nodes pull tasks as they finish—handles variable inference times gracefully |
| GPU requirements | Only nodes with sufficient VRAM receive tasks |
| Model caching | Model loads once per node, shared across all batches |
| Automatic batching | Optimal batch sizes maximize GPU utilization |
Result: 10 million images processed in ~15 minutes across 20 GPU nodes instead of 5+ hours on one machine.
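The GPU and VRAM requirements above amount to a filter over the available nodes. The following is a minimal plain-Python sketch of that idea, not WitEngine's implementation: the `Node` class, `parse_gb`, and `eligible_nodes` are hypothetical names introduced only for illustration, and the node list is made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical description of a worker node's hardware."""
    name: str
    has_gpu: bool
    vram_gb: int


def parse_gb(value: str) -> int:
    """Convert a requirement string such as "8GB" into whole gigabytes."""
    text = value.strip().upper()
    if not text.endswith("GB"):
        raise ValueError(f"expected a value like '8GB', got {value!r}")
    return int(text[:-2])


def eligible_nodes(nodes, requirements):
    """Return the nodes that satisfy every requirement in the dict."""
    need_gpu = requirements.get("GPU", "false").lower() == "true"
    min_vram = parse_gb(requirements.get("VRAM", "0GB"))
    return [
        node for node in nodes
        if (node.has_gpu or not need_gpu) and node.vram_gb >= min_vram
    ]


# Made-up nodes, for illustration only.
nodes = [
    Node("workstation-a", has_gpu=True, vram_gb=24),
    Node("workstation-b", has_gpu=True, vram_gb=6),
    Node("cpu-only-c", has_gpu=False, vram_gb=0),
]
selected = eligible_nodes(nodes, {"GPU": "true", "VRAM": "8GB"})
print([node.name for node in selected])  # ['workstation-a']
```

Only the first node passes: the second has a GPU but too little VRAM, and the third has no GPU at all.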
Scenario 2: Embedding Generation
The challenge: A semantic search system requires embeddings for 50 million text documents. Each document passes through a transformer model. Processing sequentially would take weeks.
How WitEngine helps:
Embedding generation is embarrassingly parallel: each document is processed independently. WitEngine exploits this by distributing chunks across heterogeneous hardware:
~ Configure embedding model ~
String:modelPath = "/models/sentence-transformer";
Int:chunkSize = 10000;
String:device = "cuda";
~ Load documents in chunks ~
DocumentChunkCollection:chunks = Document.LoadChunks("/data/corpus", chunkSize);
~ Create embedding tasks ~
EmbeddingTaskCollection:tasks = [];
ForEach(chunk in chunks)
{
EmbeddingTask:task = EmbeddingTask.Create(chunk, modelPath);
task = EmbeddingTask.SetDevice(task, device);
task = EmbeddingTask.SetNormalize(task, true);
tasks = Collection.Add(tasks, task);
}
~ Distribute across mixed hardware ~
ProcessingOptions:opts = ProcessingOptions.Create("Balanced");
~ Some nodes have A100s, others have consumer GPUs ~
EmbeddingResultCollection:results =
Grid.ForEach(task in tasks, opts) => ML.GenerateEmbeddings(task);
Key mechanisms:
| Mechanism | Benefit |
|---|---|
| Heterogeneous node support | A100 nodes get more chunks than RTX 3070 nodes—proportional to benchmarked speed |
| Balanced strategy | Pre-assigns work based on node performance, minimizing idle time |
| Chunk-based processing | Memory-efficient—no single node loads the entire corpus |
| Normalized outputs | Consistent embedding format regardless of which node processed them |
Result: 50 million documents embedded in hours instead of weeks. Faster nodes automatically receive proportionally more work.
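To make "proportional to benchmarked speed" concrete, here is a small plain-Python sketch of one way such a split could be computed. It is an assumption for illustration, not WitEngine's actual allocator: `allocate_proportional` is a hypothetical helper, the benchmark scores are invented, and the largest-remainder rounding is simply one reasonable choice for keeping shares whole.

```python
def allocate_proportional(total_chunks, speeds):
    """Split total_chunks across nodes in proportion to benchmark speed.

    Uses the largest-remainder method so the shares are whole numbers
    that sum exactly to total_chunks.
    """
    total_speed = sum(speeds.values())
    exact = {name: total_chunks * speed / total_speed for name, speed in speeds.items()}
    shares = {name: int(value) for name, value in exact.items()}
    leftover = total_chunks - sum(shares.values())
    # Hand the remaining chunks to the nodes with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda name: exact[name] - shares[name], reverse=True)
    for name in by_remainder[:leftover]:
        shares[name] += 1
    return shares


# Hypothetical relative benchmark scores (higher is faster).
speeds = {"a100-1": 4.0, "a100-2": 4.0, "rtx3070-1": 1.0, "rtx3070-2": 1.0}

# 50 million documents at 10,000 per chunk gives 5,000 chunks.
print(allocate_proportional(5000, speeds))
```

With these invented scores, each A100 node receives 2,000 chunks and each RTX 3070 node receives 500, so all four nodes would finish at about the same time.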
Scenario 3: Synthetic Data Generation
The challenge: Training a computer vision model requires 1 million synthetic images with controlled variations—lighting, angles, backgrounds, occlusions. Rendering each image takes 2-10 seconds depending on scene complexity.
How WitEngine helps:
Synthetic data generation has highly variable task durations. WitEngine's queued strategy handles this naturally—fast tasks don't block on slow ones:
~ Define generation parameters ~
Int:totalImages = 1000000;
Int:imagesPerTask = 100;
~ Create generation tasks with varying complexity ~
SyntheticTaskCollection:tasks = [];
Loop(Int.Divide(totalImages, imagesPerTask))
{
SyntheticTask:task = SyntheticTask.Create();
task = SyntheticTask.SetOutputSize(task, 512, 512);
task = SyntheticTask.SetVariations(task, {
"lighting": ["studio", "natural", "dramatic"],
"angle": [-30, 0, 30, 45],
"background": "random",
"occlusion": [0.0, 0.1, 0.2]
});
task = SyntheticTask.SetSeed(task, Random.Int()); ~ Reproducibility ~
tasks = Collection.Add(tasks, task);
}
~ Distribute generation ~
ProcessingOptions:opts = ProcessingOptions.Create("Queued");
opts = ProcessingOptions.SetRequirement(opts, "GPU", "true");
SyntheticResultCollection:results =
Grid.ForEach(task in tasks, opts) => Synthetic.GenerateBatch(task);
Key mechanisms:
| Mechanism | Benefit |
|---|---|
| Pull-based distribution | Nodes request work when ready—no coordinator bottleneck |
| Seeded generation | Every image reproducible from its seed value |
| Variation control | Systematic coverage of the parameter space |
| Fault tolerance | If a node fails, its incomplete tasks return to the queue |
Result: 1 million images generated in under 3 hours across 30 render nodes. Full reproducibility—regenerate any image by replaying its seed.
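Replaying an image from its seed works only if every random choice is drawn from a generator initialized with that seed. The plain-Python sketch below illustrates the principle with the standard `random` module. `sample_scene` and the way the task seed is combined with an image index are assumptions made for this example, not part of the WitEngine API, and the "background" axis is left out for brevity.

```python
import random

# Variation axes taken from the script above.
VARIATIONS = {
    "lighting": ["studio", "natural", "dramatic"],
    "angle": [-30, 0, 30, 45],
    "occlusion": [0.0, 0.1, 0.2],
}


def sample_scene(seed, image_index):
    """Pick one value per variation axis for a given task seed and image index."""
    # Seeding with a string built from both values gives each image its own
    # repeatable stream of random choices.
    rng = random.Random(f"{seed}:{image_index}")
    return {axis: rng.choice(options) for axis, options in VARIATIONS.items()}


first = sample_scene(seed=12345, image_index=7)
replay = sample_scene(seed=12345, image_index=7)
print(first == replay)  # True
```

The practical consequence is that the seed has to be stored alongside each result. A seed that is generated randomly and then discarded cannot be replayed.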
What You Get
Throughput
WitEngine maximizes hardware utilization through intelligent scheduling:
| Workload | Single machine | WitEngine (20 nodes) | Speedup |
|---|---|---|---|
| Preprocess 1M images | 8 hours | 24 minutes | 20× |
| Inference on 10M samples | 5 hours | 15 minutes | 20× |
| 100 hyperparameter configs | 4 days | 5 hours | 19× |
| Generate 1M synthetic images | 60 hours | 3 hours | 20× |
The benchmark system measures actual node performance for your specific workloads. Fast nodes get more work. Slow nodes aren't wasted—they contribute what they can.
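The speedup column in the table above is the single-machine time divided by the distributed time. Dividing that ratio by the node count gives parallel efficiency. The short Python sketch below reproduces the arithmetic from the table's own figures; the function names are illustrative only.

```python
def speedup(single_hours, distributed_hours):
    """Ratio of single-machine time to distributed time."""
    return single_hours / distributed_hours


def efficiency(single_hours, distributed_hours, nodes):
    """Fraction of ideal linear speedup achieved (1.0 is perfect)."""
    return speedup(single_hours, distributed_hours) / nodes


# (single-machine hours, distributed hours) taken from the table above.
rows = {
    "Preprocess 1M images": (8.0, 24 / 60),
    "Inference on 10M samples": (5.0, 15 / 60),
    "100 hyperparameter configs": (4 * 24.0, 5.0),
    "Generate 1M synthetic images": (60.0, 3.0),
}
for name, (single, distributed) in rows.items():
    print(f"{name}: {speedup(single, distributed):.1f}x, "
          f"efficiency {efficiency(single, distributed, 20):.0%}")
```

The hyperparameter row works out to 19.2×, which the table rounds to 19×. That corresponds to 96% efficiency on 20 nodes.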
Cost Predictability
Infrastructure costs stay fixed while workloads scale:
- Pre-partitioned work — Know task counts before execution starts
- Resource requirements — Tasks only run on capable nodes
- No over-provisioning — Mixed hardware contributes proportionally
- Batch accounting — Track compute time per task for chargeback
Reproducibility
Every execution can be replayed:
- Deterministic task assignment — Same inputs produce same distribution
- Seeded randomness — Stochastic processes controlled by explicit seeds
- Version-locked models — Model paths explicit in scripts
- Immutable configurations — Parameters captured at execution time
Debugging a failed batch? Replay the exact task on a single node with full tracing enabled.
Observability
Real-time visibility into distributed execution:
~ Enable progress tracking ~
opts = ProcessingOptions.SetProgressReporting(opts, true);
opts = ProcessingOptions.SetProgressInterval(opts, 10);
~ Progress callback ~
Grid.OnProgress(progress =>
{
Trace("Progress:", String.Format("{0:F1}%", Double.Multiply(progress.Percent, 100)));
Trace("Tasks completed:", progress.CompletedTasks, "/", progress.TotalTasks);
Trace("Active nodes:", progress.ActiveNodes);
If(progress.StalledTasks > 0)
{
~ endpoint is assumed to be defined earlier in the script ~
Alert.Send(endpoint, "StalledTasks", {"count": progress.StalledTasks});
}
});
Track at every level:
| Level | Visibility |
|---|---|
| Job | Overall progress, wall-clock time, success/failure counts |
| Task | Individual status, node assignment, processing duration |
| Node | Current load, completed tasks, error rate |
| Sample | Per-item results, confidence scores, processing metadata |
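
Job-level figures such as percent complete and stalled-task counts can be derived from per-task start and finish events. The plain-Python sketch below shows one possible aggregation. `ProgressTracker` and its stall threshold are hypothetical and are not how WitEngine computes `progress.StalledTasks`. Timestamps are passed in explicitly so the example is deterministic.

```python
from dataclasses import dataclass, field


@dataclass
class ProgressTracker:
    """Hypothetical job-level progress aggregator built from task events."""
    total_tasks: int
    stall_after_seconds: float = 300.0
    completed: int = 0
    started_at: dict = field(default_factory=dict)  # task_id -> start time

    def task_started(self, task_id, now):
        self.started_at[task_id] = now

    def task_finished(self, task_id):
        self.started_at.pop(task_id, None)
        self.completed += 1

    def percent(self):
        return self.completed / self.total_tasks

    def stalled_tasks(self, now):
        """Tasks that have been running longer than the stall threshold."""
        return sorted(
            task_id for task_id, start in self.started_at.items()
            if now - start > self.stall_after_seconds
        )


tracker = ProgressTracker(total_tasks=4)
tracker.task_started("t1", now=0.0)
tracker.task_started("t2", now=10.0)
tracker.task_finished("t2")
print(f"{tracker.percent():.1%}")        # 25.0%
print(tracker.stalled_tasks(now=400.0))  # ['t1']
```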
Typical Workflow
1. Define your data pipeline
Identify independent units of work: batches, chunks, or individual samples that can be processed in parallel.
2. Create task collections
Build a collection of task objects, each containing the inputs and parameters for one unit of work:
InferenceTaskCollection:tasks = [];
ForEach(batch in batches)
{
InferenceTask:task = InferenceTask.Create(modelPath, batch);
tasks = Collection.Add(tasks, task);
}
3. Configure distribution strategy
Choose based on your workload characteristics:
| Strategy | Use When |
|---|---|
| Balanced | Task durations are predictable and similar |
| Queued | Task durations vary significantly |
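
The guidance in the table above can be checked with a small simulation. The Python sketch below is a deliberately simplified model and not WitEngine's scheduler: it assumes identical nodes, models pre-assignment as a round-robin split by task count, and models the queued approach as a shared queue from which the next free worker pulls. Both function names are hypothetical.

```python
import heapq


def static_makespan(durations, workers):
    """Pre-assign tasks round-robin, then run each worker's list in full."""
    loads = [0.0] * workers
    for index, duration in enumerate(durations):
        loads[index % workers] += duration
    return max(loads)


def queued_makespan(durations, workers):
    """Each worker pulls the next task from a shared queue once it is free."""
    free_at = [0.0] * workers  # min-heap of times at which workers go idle
    heapq.heapify(free_at)
    for duration in durations:
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + duration)
    return max(free_at)


uniform = [5.0] * 8
skewed = [10.0, 2.0, 10.0, 2.0, 10.0, 2.0, 10.0, 2.0]
for name, durations in (("uniform", uniform), ("skewed", skewed)):
    print(name, static_makespan(durations, 2), queued_makespan(durations, 2))
```

With uniform durations both approaches finish at 20.0. With the skewed list, the pre-assigned split finishes at 40.0 because one worker receives every long task, while the pull-based queue finishes at 24.0.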
4. Specify node requirements
Declare what hardware your tasks need:
ProcessingOptions:opts = ProcessingOptions.Create("Queued");
opts = ProcessingOptions.SetRequirement(opts, "GPU", "true");
opts = ProcessingOptions.SetRequirement(opts, "VRAM", "8GB");
opts = ProcessingOptions.SetRequirement(opts, "CUDA", "11.0+");
5. Execute and aggregate
Distribute tasks and collect results:
ResultCollection:results =
Grid.ForEach(task in tasks, opts) => ML.ProcessBatch(task);
6. Analyze and report
Process results, compute statistics, generate outputs:
Int:successCount = 0;
Double:totalTime = 0;
ForEach(result in results)
{
If(result.Success == true)
{
successCount = Int.Add(successCount, 1);
totalTime = Double.Add(totalTime, result.ProcessingTime);
}
}
Trace("Completed:", successCount, "/", Collection.Count(tasks));
~ totalProcessed and wallClockTime are assumed to be measured elsewhere in the script ~
Trace("Throughput:", Double.Divide(totalProcessed, wallClockTime), "samples/sec");
Scaling to the Cloud: From Workstations to Global Compute Mesh
WitEngine's architecture doesn't impose artificial limits on cluster size. The same script that runs on 5 workstations runs on 5,000 cloud instances—or 50,000. This isn't theoretical: WitEngine is the core of WitCloud, a distributed computing platform with two deployment modes designed for exactly this scale.
WitCloud: Two Paths to Scale
Local WitCloud (Enterprise) — Deploy within your organization. Your hardware, your data, your control. Connect offices across time zones into a unified compute mesh.
OmnibusCloud (Global Service) — The public instance of WitCloud. A worldwide distributed computing platform where anyone can contribute resources and anyone can run workloads. Currently in closed beta with plans for open access.
Why This Matters for AI/ML
Machine learning workloads are bursty. You don't need 10,000 GPUs every day—but when you're retraining a model on fresh data or running inference on a massive backlog, you need them now.
| Scenario | On-Premise (20 nodes) | Local WitCloud (500 nodes) | OmnibusCloud (20,000+ nodes) |
|---|---|---|---|
| Embed 50M documents | 8 hours | 20 minutes | 30 seconds |
| Inference on 100M samples | 12 hours | 30 minutes | 45 seconds |
| Hyperparameter search (10,000 configs) | 5 days | 5 hours | 6 minutes |
| Generate 10M synthetic images | 30 hours | 1.5 hours | 2 minutes |
The script doesn't change. The infrastructure scales.
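The figures in the table are consistent with roughly linear scaling from the 20-node baseline. The Python sketch below shows that projection. It assumes ideal scaling with equal nodes and no transfer or scheduling overhead, so the results are best-case estimates rather than measurements. `projected_hours` is an illustrative helper.

```python
def projected_hours(baseline_hours, baseline_nodes, target_nodes):
    """Estimate run time under ideal linear scaling (no overhead, equal nodes)."""
    return baseline_hours * baseline_nodes / target_nodes


# 20-node baseline times (hours) taken from the table above.
baseline = {"Embed 50M documents": 8.0, "Inference on 100M samples": 12.0}
for name, hours in baseline.items():
    minutes_500 = projected_hours(hours, 20, 500) * 60
    seconds_20k = projected_hours(hours, 20, 20_000) * 3600
    print(f"{name}: {minutes_500:.1f} min on 500 nodes, "
          f"{seconds_20k:.1f} s on 20,000 nodes")
```

For the embedding job this gives 19.2 minutes on 500 nodes and 28.8 seconds on 20,000 nodes, in line with the table's rounded 20 minutes and 30 seconds.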
Elastic Scaling Pattern
~ Define task collection (same as always) ~
EmbeddingTaskCollection:tasks = CreateEmbeddingTasks(documents, chunkSize);
~ Configure for cloud execution ~
ProcessingOptions:opts = ProcessingOptions.Create("Queued");
opts = ProcessingOptions.SetRequirement(opts, "GPU", "true");
opts = ProcessingOptions.SetRequirement(opts, "VRAM", "16GB");
~ Local WitCloud, OmnibusCloud, or hybrid—script is identical ~
EmbeddingResultCollection:results =
Grid.ForEach(task in tasks, opts) => ML.GenerateEmbeddings(task);
Local WitCloud: The "Office at Night" Advantage
Consider a typical enterprise: 500 workstations across three offices. During business hours, employees use perhaps 5-10% of CPU capacity. From 8 PM to 8 AM—12 hours every night—these machines sit completely idle.
Local WitCloud turns this waste into a free supercomputer:
Daytime: 500 machines × 5% utilization = 25 machines worth of compute
Nighttime with WitCloud: 500 machines × 100% utilization = 500 machines
Result: 20× more compute capacity, zero additional hardware cost
For ML teams, this means:
- Nightly model retraining on the full dataset, not samples
- Hyperparameter sweeps that would be cost-prohibitive on cloud
- Batch inference backlogs cleared by morning
OmnibusCloud: Global Compute on Demand
OmnibusCloud extends this model globally—a worldwide mesh of contributed compute resources accessible via the internet.
Current status: Closed beta, initially focused on distributed rendering (Blender plugin). ML workloads planned for Phase 2.
The vision: Submit an embedding job tonight, have it processed by idle machines across time zones, results ready by morning. Pay-per-use or contribute your own idle resources.
Architecture for Massive Scale
WitEngine's design principles enable cloud-scale deployment:
| Principle | Benefit at Scale |
|---|---|
| Stateless nodes | Nodes don't share state—spin up/down freely |
| Pull-based distribution | No coordinator bottleneck; nodes request work when ready |
| Capability matching | Heterogeneous cloud instances (spot, on-demand, different GPU types) contribute appropriately |
| Fault tolerance | Node failures don't lose work—tasks return to queue |
| Benchmark-aware allocation | Different instance types get proportional workloads |
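
The fault-tolerance row describes failed work returning to the queue. The plain-Python sketch below illustrates that requeue behaviour with a single-process queue. It is an assumption for illustration and not WitEngine's recovery mechanism: `run_with_requeue` is hypothetical, and node crashes are simulated by a per-task failure counter.

```python
from collections import deque


def run_with_requeue(task_ids, failing_attempts):
    """Process tasks from a queue, returning failed ones to the back of it.

    failing_attempts maps a task id to the number of times its node "fails"
    before the task finally succeeds (a stand-in for real node crashes).
    """
    queue = deque(task_ids)
    remaining_failures = dict(failing_attempts)
    completed, attempts = [], 0
    while queue:
        task = queue.popleft()
        attempts += 1
        if remaining_failures.get(task, 0) > 0:
            remaining_failures[task] -= 1
            queue.append(task)  # node lost: the task goes back on the queue
            continue
        completed.append(task)
    return completed, attempts


done, attempts = run_with_requeue(["t1", "t2", "t3"], {"t2": 1})
print(done, attempts)  # ['t1', 't3', 't2'] 4
```

Every task completes exactly once. The failure costs one extra attempt and changes the completion order, which is why results should be keyed by task rather than by arrival position.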
Hybrid Deployment
Many organizations run hybrid: baseline on-premise capacity plus cloud burst for peaks.
Typical hybrid pattern:
On-premise cluster (always running):
- 20 GPU workstations
- Handles daily inference load
- Predictable cost
Cloud burst (on-demand):
- Scale to 500-5,000 instances
- Monthly model retraining
- Quarterly full dataset reprocessing
- Pay only for burst duration
The Queued strategy handles this naturally: cloud nodes join the pool, pull work, and disappear when done. No reconfiguration required.
Cost Optimization at Scale
Cloud scale introduces cost considerations that WitEngine's architecture addresses:
| Challenge | WitEngine Approach |
|---|---|
| Spot instance interruption | Tasks return to queue, picked up by other nodes |
| Mixed instance pricing | Benchmark system allocates work proportional to cost-performance |
| Idle time waste | Queued strategy keeps nodes working until the queue is empty |
| Over-provisioning | Real-time progress tracking shows actual completion ETA |
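
Cost-performance for mixed instance pricing can be expressed as price divided by benchmarked throughput. The Python sketch below ranks instance types by that ratio. All prices and throughputs are hypothetical values invented for illustration, and `cost_per_1000_tasks` is not a WitEngine function.

```python
def cost_per_1000_tasks(price_per_hour, tasks_per_hour):
    """Cost of processing 1,000 tasks on one instance type."""
    return price_per_hour / tasks_per_hour * 1000


# Hypothetical prices and benchmark throughputs, for illustration only.
instance_types = {
    "spot-large": {"price_per_hour": 1.20, "tasks_per_hour": 600},
    "on-demand-large": {"price_per_hour": 3.00, "tasks_per_hour": 600},
    "spot-small": {"price_per_hour": 0.40, "tasks_per_hour": 150},
}

# Cheapest per unit of work first.
ranked = sorted(
    instance_types,
    key=lambda name: cost_per_1000_tasks(**instance_types[name]),
)
print(ranked)
```

With these made-up numbers the order is spot-large, spot-small, on-demand-large. The small spot instance has the lowest hourly price but is not the cheapest per task.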
When you're paying per-second for 10,000 GPU instances, efficiency isn't optional—it's the difference between a viable pipeline and a budget disaster.
Summary
WitEngine transforms ML batch processing from infrastructure management into workflow definition. You describe what needs computing—batches of inference, embeddings to generate, synthetic data to create. WitEngine handles task distribution, node selection, fault recovery, and result aggregation.
The result: faster iteration cycles, predictable costs, reproducible results, and full visibility into your distributed workloads.