World Mining Congress 2026

A Robotic Testbed for Autonomous Dump Pocket Cleaning Using Imitation Learning

¹Pontifical Catholic University of Peru (PUCP)    ²Antamina    ³NONHUMAN Lab

Abstract

Dump pocket blockages at primary crushers cause production downtime and require manual clearing in confined spaces, which leads to multiple engulfment fatalities each year. Current autonomous approaches in mining remain limited, and the feasibility of state-of-the-art imitation learning (IL) for excavation tasks has not been systematically evaluated. This paper introduces an experimental testbed and benchmark for evaluating end-to-end IL architectures on autonomous dump pocket cleaning under controlled laboratory conditions. We benchmark four IL architectures: Action Chunking with Transformers (ACT), Diffusion Policy, and two Vision-Language-Action (VLA) models (π₀.₅ and SmolVLA), using a low-cost SO-ARM100 platform ($250) with granular bentonite material. In a preliminary single-session evaluation (10 minutes of continuous autonomous operation per model), π₀.₅ removes 404 g (40.4 g/min, approximately 65% of the expert teleoperation estimate of 620 g), while ACT removes 169 g, SmolVLA 113 g, and Diffusion Policy 57 g. ACT removes more material than SmolVLA despite lacking pretrained representations, which suggests that the benefit of pretraining is architecture- and scale-dependent rather than universal in this domain. Diffusion Policy has the lowest removal rate, consistent with its iterative denoising inference process. This work establishes a reproducible benchmark and open dataset (162 demonstrations) to support future research on autonomous confined-space operations.

Key Results

Each model was evaluated in a single 10-minute autonomous session (N = 1), so the values below are single observed measurements, not averages. The expert baseline is estimated from demonstration data (62.04 g/min × 10 min).

| Model | Removed (g / 10 min) | Rate (g/min) | % of expert (est.) |
|---|---|---|---|
| Expert teleoperation (est.) | ≈ 620 | 62.0 | 100% |
| π₀.₅ (VLA) | 404 | 40.4 | 65% |
| ACT | 169 | 16.9 | 27% |
| SmolVLA | 113 | 11.3 | 18% |
| Diffusion Policy | 57 | 5.7 | 9% |
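
The rate and percentage columns follow directly from the grams removed in each session. The sketch below reproduces that arithmetic, assuming the 10-minute session length and the reported expert rate of 62.0 g/min; the function and variable names are illustrative and do not come from the paper's code.

```python
# Values copied from the Key Results table; one 10-minute session per model.
SESSION_MIN = 10.0
EXPERT_RATE_G_PER_MIN = 62.0  # expert estimate as reported in the table

removed_g = {
    "pi0.5": 404.0,
    "ACT": 169.0,
    "SmolVLA": 113.0,
    "Diffusion Policy": 57.0,
}


def summarize(removed, session_min=SESSION_MIN, expert_rate=EXPERT_RATE_G_PER_MIN):
    """Return {model: (rate in g/min, percent of the expert estimate)}."""
    expert_total = expert_rate * session_min  # about 620 g per session
    return {
        name: (grams / session_min, 100.0 * grams / expert_total)
        for name, grams in removed.items()
    }


summary = summarize(removed_g)
for name, (rate, pct) in summary.items():
    print(f"{name:<18}{rate:5.1f} g/min  {pct:3.0f}% of expert")
```

Because each model has a single session, these are point values; no spread or confidence interval can be computed from them.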

Success Rollouts

One representative autonomous run per model (10-minute session, muted). Filmed with the overhead and wrist cameras used during evaluation.

π₀.₅ (VLA, ~3.7B params): 404 g removed · 40.4 g/min · 65% of expert
ACT (no pretraining): 169 g removed · 16.9 g/min · 27% of expert
SmolVLA (VLA, 450M params): 113 g removed · 11.3 g/min · 18% of expert
Diffusion Policy (no pretraining): 57 g removed · 5.7 g/min · 9% of expert

Figures

System overview
Figure 1. Experimental setup for autonomous dump pocket cleaning using imitation learning. Top left: teleoperated dataset (162 demonstrations). Top right: physical testbed with dual RGB cameras (top-view, wrist-mounted), dump pocket, and SO-ARM100 platform (follower and leader arms). Bottom: imitation learning pipeline receiving observations (camera streams, task prompt*, proprioceptive state), processing through policy network, and executing actions in closed loop. *Task prompt used only by VLA models (π₀.₅, SmolVLA) for language conditioning; ACT and Diffusion operate on vision and proprioception only.
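
The closed loop in Figure 1 (observe, query the policy, execute, repeat) can be sketched as follows. This is a minimal illustration, not the authors' implementation: `Observation`, `FakeArm`, `run_closed_loop`, and `toy_policy` are hypothetical names, the six-joint state is an assumption, and the camera frames are empty placeholders. The `prompt` field is optional to mirror the caption: only VLA policies would receive it.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Observation:
    top_image: list            # placeholder for the top-view RGB frame
    wrist_image: list          # placeholder for the wrist-mounted RGB frame
    joint_state: List[float]   # proprioceptive state
    prompt: Optional[str] = None  # task prompt, set only for VLA policies


Policy = Callable[[Observation], List[float]]


class FakeArm:
    """Stand-in for the follower arm: stores joint targets, no hardware."""

    def __init__(self, n_joints: int = 6):  # joint count is an assumption
        self.joints = [0.0] * n_joints

    def observe(self, prompt: Optional[str] = None) -> Observation:
        return Observation([], [], list(self.joints), prompt)

    def apply(self, action: List[float]) -> None:
        self.joints = list(action)


def run_closed_loop(arm: FakeArm, policy: Policy, steps: int,
                    prompt: Optional[str] = None) -> List[List[float]]:
    """Observe, query the policy, act; repeat for a fixed number of steps."""
    trajectory = []
    for _ in range(steps):
        obs = arm.observe(prompt)
        action = policy(obs)
        arm.apply(action)
        trajectory.append(list(action))
    return trajectory


def toy_policy(obs: Observation) -> List[float]:
    # Hypothetical policy: nudges every joint target by a fixed increment.
    return [q + 0.01 for q in obs.joint_state]


arm = FakeArm()
traj = run_closed_loop(arm, toy_policy, steps=5, prompt=None)
```

A learned policy would replace `toy_policy`; passing `prompt=None` corresponds to the vision-and-proprioception-only setting described for ACT and Diffusion Policy.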

Benchmark results
Figure 2. Benchmark results and offline training metrics. (a) Validation loss trajectories. ACT is plotted as (val/loss)² for visual scale comparability, because ACT uses a composite L₁ + λ·KL objective while the other models use MSE or flow-matching losses. Best checkpoints (stars): ACT 0.0754 @ 9,205 steps; π₀.₅ 0.0745 @ 25,774 steps; SmolVLA 0.0254 @ 14,728 steps; Diffusion 0.0141 @ 7,252 steps. This panel is a convergence diagnostic only; cross-architecture comparison relies on the real-robot results. (b) Total material removed (g) in a single 10-minute autonomous session. The expert baseline (~620 g) is estimated from demonstration data (62.0 g/min, from 2,542 g over 40.99 min). π₀.₅ removes 404 g, followed by ACT (169 g), SmolVLA (113 g), and Diffusion Policy (57 g).
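
The expert baseline in panel (b) is an extrapolation, not a measured 10-minute session. A minimal sketch of that estimate, using the totals quoted in the caption and assuming the expert's average demonstration rate would hold over a full session (`extrapolate_expert` is an illustrative name):

```python
def extrapolate_expert(total_removed_g, total_minutes, session_minutes=10.0):
    """Return (average rate in g/min, projected grams for one session)."""
    if total_minutes <= 0:
        raise ValueError("total_minutes must be positive")
    rate = total_removed_g / total_minutes
    return rate, rate * session_minutes


# Totals quoted in the Figure 2 caption: 2,542 g over 40.99 min.
expert_rate, expert_projection = extrapolate_expert(2542.0, 40.99)
print(f"{expert_rate:.1f} g/min -> {expert_projection:.0f} g per 10 min")
# prints "62.0 g/min -> 620 g per 10 min"
```

The constant-rate assumption ignores any resets or pauses between demonstrations, so the 620 g figure is best read as an approximate reference.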

Citation

@inproceedings{meza2026autonomous,
  title={A Robotic Testbed for Autonomous Dump Pocket Cleaning Using Imitation Learning},
  author={Meza Pinedo, Brik Henrry and Pajares Correa, Brian},
  booktitle={World Mining Congress 2026},
  year={2026}
}

Acknowledgments

We thank NONHUMAN for providing laboratory facilities and infrastructure support that enabled this research. We are grateful to the open-source robotics community, particularly the developers of the SO-ARM100 platform and the open-source robot learning library, whose accessible tools and collaborative spirit made these experiments possible. We also acknowledge PUCP for institutional support.