Authors
Américo Pereira (1,2), Pedro Carvalho (1,3) and Luís Côrte-Real (1,2)
(1) Centre for Telecommunications and Multimedia, Portugal; (2) University of Porto, Portugal; (3) Polytechnic of Porto, Portugal
Abstract
We propose a unified architecture for visual scene understanding that aims to overcome the limitations of traditional, fragmented approaches in computer vision. Our work focuses on a system that interprets visual scenes accurately and coherently, with the ultimate goal of providing a 3D virtual representation, which is particularly useful for applications in virtual and augmented reality. By integrating various visual and semantic processing tasks into a single, adaptable framework, our architecture simplifies the design process and ensures a seamless and consistent scene interpretation. This is particularly important in complex systems that rely on 3D synthesis, where the need for precise and semantically coherent scene descriptions continues to grow. Our unified approach addresses these challenges, offering a flexible and efficient solution. We demonstrate the practical effectiveness of our architecture through a proof-of-concept system and explore its potential in various application domains, illustrating its value for advancing the field of computer vision.
Keywords
Visual Scene Understanding, Scene Understanding, 3D Reconstruction, Semantic Compression