Abstract:
This paper presents a new system-level framework for Diminished Reality that, for the first time, leverages both the appearance and the 3D information provided by large photo collections on the Internet. Recent computer vision techniques make it possible to automatically reconstruct 3D structure-from-motion point clouds from large, unordered photo collections. Using these point clouds together with a GPS prior, reasonably accurate six-degree-of-freedom camera poses can be obtained, enabling localization. Once the camera (and hence the user) is correctly localized, photos depicting the scene visible from the user's viewpoint can be used to remove unwanted objects that the user indicates in the video sequence. Existing methods based on texture synthesis produce undesirable artifacts and temporal inconsistency in the video when the background is heterogeneous, and the task becomes even harder for them when the background contains complex structures; methods based on plane warping, on the other hand, fail when the background has arbitrary shape. Unlike these methods, our algorithm copes with these problems by using Internet photos, registering them in 3D space, and recovering the 3D scene structure in an offline process. We carefully design the components of the online phase to meet both the speed and quality requirements of the task. Experiments on real collected data demonstrate the superiority of our system.
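To make the localization step concrete, the sketch below shows one common way to recover a six-degree-of-freedom pose from 2D-3D correspondences between a query frame and structure-from-motion points, using a GPS prior to restrict the candidate point set. This is a minimal illustration using OpenCV's RANSAC-based PnP solver, not the authors' implementation; the function name, the filtering radius, and all inputs are hypothetical.

```python
import numpy as np
import cv2


def localize_camera(pts2d, pts3d, K, gps_xyz=None, radius=100.0):
    """Hypothetical 6-DoF localization from 2D-3D matches (sketch).

    pts2d   : (N, 2) image keypoints matched to SfM points
    pts3d   : (N, 3) SfM point coordinates, same world frame as the GPS prior
    K       : (3, 3) camera intrinsics
    gps_xyz : optional coarse position prior; matched points farther than
              `radius` (assumed meters) from it are discarded before PnP
    """
    pts2d = np.asarray(pts2d, dtype=np.float32)
    pts3d = np.asarray(pts3d, dtype=np.float32)

    # GPS prior: keep only correspondences whose 3D points lie near the
    # coarse position estimate, shrinking the matching/pose problem.
    if gps_xyz is not None:
        keep = np.linalg.norm(pts3d - np.asarray(gps_xyz), axis=1) < radius
        pts2d, pts3d = pts2d[keep], pts3d[keep]

    # Robust PnP: RANSAC rejects outlier matches and returns the camera
    # rotation (as a Rodrigues vector) and translation.
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts3d, pts2d, K, distCoeffs=None,
        reprojectionError=4.0, iterationsCount=1000)
    if not ok:
        raise RuntimeError("pose estimation failed")

    R, _ = cv2.Rodrigues(rvec)  # 3x3 rotation matrix
    return R, tvec, inliers
```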