In this episode I sat down with Kentaro Wada, a computer vision engineer at Mujin and the creator of LabelMe, to explore the evolution of image annotation workflows. We discuss how his need to label data for a robotics challenge led to building one of the most widely used open-source annotation tools, and how the tool has evolved alongside the shift from traditional computer vision to deep learning. Kentaro explains the impact of foundation models like the Segment Anything Model (SAM), and how annotation is rapidly moving toward a prompt-and-verify paradigm in which models do the heavy lifting and humans focus on quality control. We also dive into his recent work integrating SAM into LabelMe, the challenges of applying these models to satellite imagery, and why approaches like bounding-box prompting outperform text prompts in that domain. Finally, we cover new support for large, multi-channel geospatial data, practical deployment considerations, and what this means for scaling annotation in real-world machine learning systems. A recording of this conversation, along with a demonstration of geospatial annotation using LabelMe, is available on YouTube via the links below:
Bio: Kentaro Wada was born in Japan in 1994. He received his B.Sc. (2016) and M.Sc. (2018) from the Department of Mechanical Engineering and Computer Science at the University of Tokyo (UTokyo). In his research at UTokyo, he worked on learning-based scene understanding for robotic manipulation at the JSK Laboratory, supervised by Prof. Masayuki Inaba and Prof. Kei Okada. He received his PhD in 2022 from the Dyson Robotics Laboratory at Imperial College London, supervised by Prof. Andrew Davison. During his PhD, he worked on object-level semantic scene understanding as a general scene representation for robotic manipulation, and demonstrated several novel robot capabilities. He joined Mujin, Inc. in 2022 as a computer vision engineer, and is working on advancing robots' capabilities in real-world environments.











