# Benzun Pious Wisely Babu, Ph.D.

I am passionate about discovery and exploration. My research is in 3D computer vision, with a focus on localization (SLAM), geometric perception, sensor fusion, state estimation and spatial AI. The most direct applications of my work are in mixed reality and robotics.

Currently, I work as a computer vision engineer at Apple. Previously, I was at the Bosch Research and Technology Center in Sunnyvale. I completed my Ph.D. in Robotics at Worcester Polytechnic Institute (WPI). During my time at WPI, I was part of Team WPI-CMU in the DARPA Robotics Challenge. Our robot never fell and never needed a reset. We came 2nd in terms of points (7th overall). I also worked at the Precision Personnel Locator lab on visual localization for first responders. My Ph.D. dissertation was on motion conflict aware visual inertial localization. I was guided by Dr. R. J. Duckworth, Dr. D. Cyganski and Dr. Michael Gennert. I am grateful to all my colleagues, collaborators, advisors, friends and family who empowered me in this journey of exploration.

### Publications

#### Analytic Combined IMU Integrator (ACI2) for Visual Inertial Navigation

Y. Yang, B. P. Wisely Babu, C. Chen, G. Huang and L. Ren
Batch optimization based inertial measurement unit (IMU) and visual sensor fusion enables high rate localization for many robotic tasks. However, it remains a challenge to ensure that the batch optimization is computationally efficient while being consistent for high rate IMU measurements without marginalization. In this paper, we derive inspiration from maximum likelihood estimation with partial-fixed estimates to provide a unified approach for handling both IMU pre-integration and time-offset calibration. We present a modularized analytic combined IMU integrator (ACI^2) with elegant derivations for IMU integrations, bias Jacobians and related covariances. With the aim of simplifying our derivation, we also prove that the right Jacobians for Hamilton quaternions and SO(3) are equivalent. Finally, we present a time-offset calibrator that operates by fixing the linearization point for a given time offset. This reduces re-integration of the IMU measurements and thus improves efficiency. The proposed ACI^2 and time-offset calibration are verified by intensive Monte-Carlo simulations generated from real world datasets. A proof-of-concept real world experiment is also conducted to verify the proposed ACI^2 estimator.
                      @inproceedings{yulin_icra2020,
title   = {Analytic Combined IMU Integration ({ACI}$^2$) for Visual Inertial Navigation},
author  = {Yulin Yang and Benzun Pious Wisely Babu and Chuchu Chen and Guoquan Huang and Liu Ren},
booktitle = {2020 IEEE International Conference on Robotics and Automation (ICRA)},
month   = {May},
year    = {2020}
}



#### On Exploiting Per-Pixel Motion Conflicts to Extract Secondary Motions

B. P. Wisely Babu, Z. Yan, M. Ye and L. Ren
Ubiquitous Augmented Reality requires robust localization in complex daily environments. The combination of camera and Inertial Measurement Unit (IMU) has shown promising results for robust localization due to the complementary characteristics of the visual and inertial modalities. However, there exist many cases where the measurements from visual and inertial modalities do not provide a single consistent motion estimate, thus causing disagreement on the estimated motion. Limited literature has addressed this problem associated with sensor fusion for localization. Since the disagreement is not a result of measurement noise, existing outlier rejection techniques are not suitable to address this problem. In this paper, we propose a novel approach to handle the disagreement as motion conflict with two key components. The first one is a generalized Hidden Markov Model (HMM) that formulates the tracking and management of the primary motion and the secondary motion as a single estimation problem. The second component is an epipolar constrained Deep Neural Network that generates a per-pixel motion conflict probability map. Experimental evaluations demonstrate significant improvement to the tracking accuracy in cases of strong motion conflict compared to previous state-of-the-art algorithms for localization. Moreover, as a consequence of motion tracking on the secondary maps, our solution enables augmentation of virtual content attached to secondary motions, which brings us one step closer to Ubiquitous Augmented Reality.
                      @inproceedings{babu2018mcvio,
title       ={{On Exploiting Per-Pixel Motion Conflicts to Extract Secondary Motions}},
author      = {Wisely Babu, Benzun Pious and
Yan, Zhixin and
Ye, Mao and
Ren, Liu},
booktitle   = {2018 IEEE International Symposium on Mixed and Augmented Reality (ISMAR)},
month       = {October},
year        = {2018}
}



#### Detection and Resolution of Motion Conflict in Visual Inertial Odometry

B. P. Wisely Babu, D. Cyganski, J. Duckworth, S. Kim
In this paper, we present a novel method to detect and resolve motion conflicts in visual-inertial odometry. Recently, it has been common to integrate an IMU sensor with visual odometry in order to improve localization accuracy and robustness. However, when a disagreement between the two sensor modalities occurs, the localization accuracy drops drastically and leads to irreversible errors. In such conditions, multiple motion estimates based on the set of observations used are possible. This creates a motion conflict in determining which observations to use for accurate ego-motion estimation. Therefore, we present a method to detect motion conflicts based on per-frame positional estimate discrepancy and per-landmark reprojection errors. Additionally, we present a method to resolve motion conflicts by eliminating inconsistent IMU and landmark measurements. Finally, we implement Motion Conflict aware Visual Inertial Odometry (MC-VIO) by combining both detection and resolution of motion conflicts. We perform quantitative and qualitative evaluation of MC-VIO on visually and inertially challenging datasets. Experimental results indicate that the MC-VIO algorithm reduces the increase in absolute trajectory error by 80% and the relative pose error by 60% for scenes with motion conflict, in comparison to the state-of-the-art reference VIO algorithm.
                      @inproceedings{babu2018ICRA,
title       ={{Detection and Resolution of Motion Conflict in Visual Inertial Odometry}},
author      = {Wisely Babu, Benzun Pious and
Cyganski, David and
Duckworth, James and
Kim, Soohwan},
booktitle   ={Proceedings of the IEEE International Conference on Robotics and Automation (ICRA)},
month       = {May},
year        = {2018}
}



#### Comparing apples and oranges: Off‐road pedestrian detection on the National Robotics Engineering Center agricultural person‐detection dataset

Z. Pezzementi, T. Tabor, P. Hu, J. K. Chang, D. Ramanan, C. Wellington, B. P. Wisely Babu, H. Herman
Person detection from vehicles has made rapid progress recently with the advent of multiple high‐quality datasets of urban and highway driving, yet no large‐scale benchmark is available for the same problem in off‐road or agricultural environments. Here we present the National Robotics Engineering Center (NREC) Agricultural Person‐Detection Dataset to spur research in these environments. It consists of labeled stereo video of people in orange and apple orchards taken from two perception platforms (a tractor and a pickup truck), along with vehicle position data from Real-Time Kinematic (RTK) GPS. We define a benchmark on part of the dataset that combines a total of 76k labeled person images and 19k sampled person‐free images. The dataset highlights several key challenges of the domain, including varying environment, substantial occlusion by vegetation, people in motion and in nonstandard poses, and people seen from a variety of distances; metadata are included to allow targeted evaluation of each of these effects. Finally, we present baseline detection performance results for three leading approaches from urban pedestrian detection and our own convolutional neural network approach that benefits from the incorporation of additional image context. We show that the success of existing approaches on urban data does not transfer directly to this domain.
                      @article{zac_jfr2017,
author = {Pezzementi, Zachary and Tabor, Trenton and Hu, Peiyun and Chang, Jonathan K. and Ramanan, Deva and Wellington, Carl and Wisely Babu, Benzun P. and Herman, Herman},
title = {Comparing apples and oranges: Off-road pedestrian detection on the National Robotics Engineering Center agricultural person-detection dataset},
journal = {Journal of Field Robotics},
volume = {35},
number = {4},
pages = {545-563},
doi = {10.1002/rob.21760},
year = {2018}
}



#### Team WPI-CMU: Achieving Reliable Humanoid Behavior in the DARPA Robotics Challenge

M. DeDonato, F. Polido, K. Knoedler, B. P. Wisely Babu, N. Banerjee, C. P. Bove, X. Cui, R. Du, P. Franklin, J. P. Graff, P. He, A. Jaeger, L. Li, D. Berenson, M.A. Gennert, S. Feng, C. Liu, X. Xinjilefu, J. Kim, C.G. Atkeson, X. Long, and T. Padır
In the DARPA Robotics Challenge (DRC), participating human-robot teams were required to integrate mobility, manipulation, perception, and operator interfaces to complete a simulated disaster mission. We describe our approach using the humanoid robot Atlas Unplugged developed by Boston Dynamics. We focus on our approach, results, and lessons learned from the DRC Finals to demonstrate our strategy, including extensive operator practice, explicit monitoring for robot errors, adding additional sensing, and enabling the operator to control and monitor the robot at varying degrees of abstraction. Our safety-first strategy worked: we avoided falling, and remote operators could safely recover from difficult situations. We were the only team in the DRC Finals that attempted all tasks, scored points (14/16), did not require physical human intervention (a reset), and did not fall in the two missions during the two days of tests. We also had the most consistent pair of runs.
                      @article{dedonatojfr2017,
author = {DeDonato, Mathew and Polido, Felipe and Knoedler, Kevin and Babu, Benzun P. W. and Banerjee, Nandan and Bove, Christopher P. and Cui, Xiongyi and Du, Ruixiang and Franklin, Perry and Graff, Joshua P. and He, Peng and Jaeger, Aaron and Li, Lening and Berenson, Dmitry and Gennert, Michael A. and Feng, Siyuan and Liu, Chenggang and Xinjilefu, X and Kim, Joohyung and Atkeson, Christopher G. and Long, Xianchao and Padır, Taşkın},
title = {Team WPI-CMU: Achieving Reliable Humanoid Behavior in the DARPA Robotics Challenge},
journal = {Journal of Field Robotics},
volume = {34},
number = {2},
pages = {381-399},
doi = {10.1002/rob.21685},
year = {2017}
}



#### σ-DVO: Sensor Noise Model Meets Dense Visual Odometry

B. P. Wisely Babu, S. Kim, Z. Yan, L. Ren
In this paper, we propose a novel method called σ-DVO for dense visual odometry using a probabilistic sensor noise model. In contrast to sparse visual odometry, where camera poses are estimated based on matched visual features, we apply dense visual odometry, which makes full use of all pixel information from an RGB-D camera. Previously, a t-distribution was used to model photometric and geometric errors in order to reduce the impact of outliers in the optimization. However, this approach has the limitation that it only uses the error value to determine outliers without considering the physical process. Therefore, we propose to apply a probabilistic sensor noise model to weigh each pixel by propagating linearized uncertainty. Furthermore, we find that the geometric errors are well represented with the sensor noise model, while the photometric errors are not. Finally, we propose a hybrid approach which combines a t-distribution for photometric errors and a probabilistic sensor noise model for geometric errors. We extend the dense visual odometry and develop a visual SLAM system that incorporates keyframe generation, loop constraint detection and graph optimization. Experimental results with standard benchmark datasets show that our algorithm outperforms previous methods by about a 25% reduction in the absolute trajectory error.
                      @inproceedings{sigma_dvo,
title = {$\sigma$-{DVO}: Sensor Noise Model Meets Dense Visual Odometry},
author = {Babu, Benzun Wisely and
Kim, Soohwan and
Yan, Zhixin and
Ren, Liu},
booktitle = {Proceedings of the IEEE International Symposium on Mixed and Augmented Reality},
pages = {18--26},
year = {2016}
}



#### NO FALLS, NO RESETS: Reliable Humanoid Behavior in the DARPA Robotics Challenge

C. G. Atkeson, B. P. Wisely Babu, N. Banerjee, D. Berenson, C. P. Bove, X. Cui, M. DeDonato, R. Du, S. Feng, P. Franklin, M. Gennert, J. P. Graff, P. He, A. Jaeger, J. Kim, K. Knoedler, L. Li, C. Liu, X. Long, T. Padir, F. Polido, G. G. Tighe, X. Xinjilefu
We describe Team WPI-CMU's approach to the DARPA Robotics Challenge (DRC), focusing on our strategy to avoid failures that required physical human intervention. We implemented safety features in our controller to detect potential catastrophic failures, stop the current behavior, and allow remote intervention by a human supervisor. Our safety methods and operator interface worked: we avoided catastrophe and remote operators could safely recover from difficult situations. We were the only team in the DRC Finals that attempted all tasks, scored points (14/16), did not require physical human intervention (a reset), and did not fall in the two missions during the two days of tests. We also had the most consistent pair of runs. Much of the paper discusses lessons learned from the DRC.
                      @inproceedings{humanoid2015,
author={C. G. {Atkeson} and B. P. W. {Babu} and N. {Banerjee} and D. {Berenson} and C. P. {Bove} and X. {Cui} and M. {DeDonato} and R. {Du} and S. {Feng} and P. {Franklin} and M. {Gennert} and J. P. {Graff} and P. {He} and A. {Jaeger} and J. {Kim} and K. {Knoedler} and L. {Li} and C. {Liu} and X. {Long} and T. {Padir} and F. {Polido} and G. G. {Tighe} and X. {Xinjilefu}},
booktitle={2015 IEEE-RAS 15th International Conference on Humanoid Robots (Humanoids)},
title={No falls, no resets: Reliable humanoid behavior in the DARPA robotics challenge},
year={2015},
pages={623-630}
}



#### Gyroscope assisted scalable visual simultaneous localization and mapping

B. P. Wisely Babu, D. Cyganski, J. Duckworth
This paper describes the development and evaluation of an indoor localization algorithm using Visual Simultaneous Localization and Mapping (VSLAM) aided by gyroscope sensor information. Indoor environments pose several challenges which could cause a vision-only system to fail due to tracking errors. Investigation revealed significant feature loss in a vision-only system when traversing plain walls, windows and staircases. However, the addition of a gyroscope helps in handling such difficult conditions by providing additional rotational information. A portable system consisting of an Inertial Measurement Unit (IMU) and a stereo camera has been developed for indoor mapping. The images and gyroscope rates acquired by the system are stored and post-processed using a new Gyroscope Assisted Scalable Visual Simultaneous Localization and Mapping Algorithm (GA-ScaViSLAM). The algorithm has been evaluated on datasets collected in the Atwater Kent building, Worcester Polytechnic Institute. This algorithm was found to be more robust in comparison to the vision-only system. GA-ScaViSLAM was found to have an error (rms) of 0.6 m in the indoor environment over a total path length of 77 m.
                      @inproceedings{babu14gascavislam,
author    = {B. P. Wisely Babu and D. Cyganski and J. Duckworth},
booktitle = {2014 Ubiquitous Positioning Indoor Navigation and Location Based Service (UPINLBS)},
title    = {Gyroscope assisted scalable visual simultaneous localization and mapping},
year     = {2014},
pages    = {220-227},
doi      = {10.1109/UPINLBS.2014.7033731},
month    = {Nov},
}



#### Tight Coupling between Manipulation and Perception using SLAM

B. P. Wisely Babu, C. P. Bove, M. A. Gennert
A tight coupling between perception and manipulation is required for dynamic robots to react in a timely and appropriate manner to changes in the world. In conventional robotics, perception transforms visual information into internal models which are used by planning algorithms to generate trajectories for motion. Under this paradigm, it is possible for a plan to become stale if the robot or environment changes configuration before the robot can replan. Perception and actuation are only loosely coupled through planning; there is no rapid feedback or interplay between them. For a statically stable robot in a slowly changing environment, this is an appropriate strategy for manipulating the world. A tightly coupled system, by contrast, connects perception directly to actuation, allowing for rapid feedback. This tight coupling is important for a dynamically unstable robot which engages in active manipulation. In such robots, planning does not fall between perception and manipulation; rather, planning creates the connection between perception and manipulation. We show that Simultaneous Localization and Mapping (SLAM) can be used as a tool to perform the tight coupling for a humanoid robot with numerous proprioceptive and exteroceptive sensors. Three different approaches to generating a motion plan for grabbing a piece of debris are evaluated using the Atlas humanoid robot. Results indicate a higher success rate and accuracy for motion plans that implement tight coupling between perception and manipulation using SLAM.
                      @inproceedings{babu14iros_workshop,
author    = {B. P. Wisely Babu and C. P. Bove and M. A. Gennert},
title    = {Tight Coupling between Manipulation and Perception using SLAM},
year     = {2014}
}



#### A Tree Climbing Robot for Invasive Insect Detection

B. P. Wisely Babu, E. T. Read, J. A. Gostanian, M. A. Gennert
This paper reviews progress in the development of a scansorial robot for invasive insect detection. It discusses the motivation for our approach, provides design considerations and implementation details, and presents progress to date. One notable feature of the robot is its use of vSLAM to map the tree under study. The robot is currently under development at WPI and this paper provides a summary of its status and future plans.
                      @inbook{babuclawar2012,
author = { B. P. Wisely Babu  and  Erick T. Read  and  Justin A. Gostanian  and  M. A. Gennert },
title = {A Tree Climbing Robot for Invasive Insect Detection},