
Humanoid Dexterity: Why Our Robot Achieved 100% Success on Fruit Picking but 0% on Block Stacking.


Some lessons from training vision-language-action (VLA) models for humanoid manipulation:

šŸ“ The Camera Dependency Problem: Models trained on fixed viewpoints fail catastrophically with 30° camera shifts. It’s not poor generalization—it’s optical illusion at the neural network level.

🤖 The Embodiment Gap: Using Apple Vision Pro for teleoperation, our operators needed 12 attempts to pick up an apple. Why? No depth perception. No force feedback. If humans struggle with these constraints, imagine what we're asking AI to do.

⚔ Inference Bottleneck: VLMs run at ~5 Hz, while smooth robot control needs at least 20 Hz. The dual-system approach (System 1 for fast control, System 2 for reasoning) helps. However, it introduces trajectory discontinuities wherever a new plan from the slow system takes over from the motion already in progress.
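A back-of-the-envelope sketch helps put the 30° figure from the camera-dependency point in perspective. It makes several illustrative assumptions that do not describe our actual rig. The camera is a simple pinhole model, 640 px wide, with a 60° horizontal field of view. The "shift" is a pure rotation about the camera's optical center, so it ignores the extra parallax a real repositioning would add.

```python
import math


def yaw_shift_pixels(image_width_px: int, hfov_deg: float, yaw_deg: float) -> float:
    """Horizontal displacement (in pixels) of the point that was at the image
    center, after rotating a pinhole camera by yaw_deg about its optical center."""
    # Focal length in pixels, derived from the horizontal field of view.
    f = (image_width_px / 2) / math.tan(math.radians(hfov_deg / 2))
    # A point on the old optical axis projects to |x| = f * tan(theta) after rotation.
    return f * math.tan(math.radians(yaw_deg))


if __name__ == "__main__":
    width, hfov = 640, 60.0  # hypothetical camera: 640 px wide, 60 degree HFOV
    for yaw in (5, 10, 20, 30):
        shift = yaw_shift_pixels(width, hfov, yaw)
        print(f"yaw {yaw:>2} deg -> {shift:6.1f} px ({shift / width:.0%} of image width)")
```

Under these assumptions, a 30° rotation moves the former image center by f·tan(30°) = 320 px, half the image width. Objects the model learned to find mid-frame now sit near the edge of the frame.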
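A toy 1-D simulation illustrates the rate mismatch from the inference-bottleneck point. A hypothetical slow planner, standing in for the ~5 Hz VLM, emits a chunk of joint targets. A 20 Hz loop executes four of those targets before the next chunk arrives. The planner, its numbers, and the linear cross-fade used to smooth the hand-off are all assumptions for illustration. The cross-fade is one simple smoothing scheme, not how GR00T or any specific stack handles it.

```python
from typing import List

CONTROL_HZ = 20  # fast control loop rate (System 1 style)
PLANNER_HZ = 5   # slow reasoning loop rate (System 2 / VLM style)
STEPS_PER_PLAN = CONTROL_HZ // PLANNER_HZ  # control ticks between plan updates (4)


def hypothetical_planner(plan_index: int) -> List[float]:
    """Toy stand-in for the slow policy: returns 2 * STEPS_PER_PLAN future
    1-D joint targets. Alternate plans are offset by 0.05 to mimic a re-plan
    that disagrees with where the previous chunk was heading."""
    offset = 0.05 * (plan_index % 2)
    start = plan_index * STEPS_PER_PLAN
    return [0.01 * (start + k) + offset for k in range(2 * STEPS_PER_PLAN)]


def run(num_plans: int, blend: bool) -> List[float]:
    """Execute the first STEPS_PER_PLAN targets of each chunk at the fast rate."""
    executed: List[float] = []
    prev_chunk: List[float] = []
    for p in range(num_plans):
        chunk = hypothetical_planner(p)
        for k in range(STEPS_PER_PLAN):
            target = chunk[k]
            if blend and prev_chunk:
                # The previous chunk's tail covers this same time step.
                old = prev_chunk[STEPS_PER_PLAN + k]
                # Linear cross-fade: weight on the new chunk rises to 1 over the window.
                w = (k + 1) / STEPS_PER_PLAN
                target = w * target + (1 - w) * old
            executed.append(target)
        prev_chunk = chunk
    return executed


def max_jump(trajectory: List[float]) -> float:
    """Largest change between consecutive commanded targets."""
    return max(abs(b - a) for a, b in zip(trajectory, trajectory[1:]))


if __name__ == "__main__":
    for blend in (False, True):
        traj = run(num_plans=6, blend=blend)
        print(f"blend={blend}: max jump per tick = {max_jump(traj):.4f}")
```

Running the script prints the largest tick-to-tick jump with and without blending. The cross-fade spreads each disagreement between consecutive chunks over four control ticks instead of applying it in one. The trade-off is that, during the blend, the robot briefly follows neither plan exactly.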

The difference between our fruit-picking success and block-stacking failure is revealing:

✅ Fruit picking: Single object, large grasp tolerance, terminal success state

❌ Block stacking: Sequential precision tasks, force-sensitive placement, cumulative error propagation
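Two textbook approximations make "cumulative error propagation" concrete. First, if each placement succeeds independently with probability p, an n-step stack succeeds with probability p^n. Second, if each placement adds an independent, zero-mean lateral error with standard deviation σ, the top block's offset grows like σ√n. The per-step numbers below are hypothetical, not measurements from our experiments.

```python
import math


def chain_success(per_step_success: float, num_steps: int) -> float:
    """Probability that all steps succeed, assuming independent steps
    with the same success probability."""
    return per_step_success ** num_steps


def stacked_offset_std(per_placement_std_mm: float, num_placements: int) -> float:
    """Std. dev. of the top block's lateral offset when every placement adds
    an independent zero-mean error relative to the block beneath (random walk)."""
    return per_placement_std_mm * math.sqrt(num_placements)


if __name__ == "__main__":
    # Hypothetical numbers: fruit picking is roughly one step, stacking is several.
    for p in (0.99, 0.95, 0.90):
        print(f"p={p:.2f}: 1 step -> {chain_success(p, 1):.2f}, "
              f"4 steps -> {chain_success(p, 4):.2f}")
    print(f"offset after 4 placements at 2 mm each: "
          f"{stacked_offset_std(2.0, 4):.1f} mm")
```

Under these assumptions, even a 90% per-placement success rate leaves roughly two-in-three odds of completing a four-block stack. In practice the steps are not independent either: an offset in one placement makes the next grasp or placement harder.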

Between “great idea” and “working robot”:

➡️ 3 weeks converting Unitree teleop data to LeRobot format

āž”ļø Custom bridges between NVIDIA’s own tools (IsaacLab doesn’t natively interface with GR00T)

āž”ļø Force Feedback Vacuum: Without tactile sensing, grasping becomes binary (success/failure) rather than continuous adjustment.

āž”ļø Depth Blindness: Current RGB-only models lack the stereo vision humans take for granted. Adding RGB-D could be transformative.

āž”ļø And least we forget - the sim2real gap: COSMOS + IsaacSim could potentially generate 20-100x training data, but sim2real transfer remains challenging.

Every viral humanoid demo represents hundreds of failed attempts and carefully controlled conditions. This isn’t fraud—it’s the difference between possibility and reliability.

The path from demo to deployment isn't just long. It's filled with fundamental challenges that won't be solved simply by throwing more compute at them. Nuanced, interconnected techniques are the name of the game. And that's exactly why it's worth doing.

Author

Ai Base Network (ABN), or ABN ASIA, was founded by people with deep roots in academia and with work experience in the US, Holland, Hungary, Japan, South Korea, Singapore, and Vietnam. ABN Asia is where academia and technology meet opportunity. With our cutting-edge solutions and competent software development services, we're helping businesses level up and take on the global scene. Our commitment: Faster. Better. More reliable. In most cases: Cheaper as well.

Feel free to reach out to us whenever you require IT services, digital consulting, off-the-shelf software solutions, or if you'd like to send us requests for proposals (RFPs). You can contact us at [email protected]. We're ready to assist you with all your technology needs.

ABNAsia.org

© ABN ASIA