Deep Learning-Based Digital Human Modeling And Applications
dc.contributor.advisor | Wang, Pu | |
dc.contributor.author | Ali, Ayman | |
dc.date.accessioned | 2024-01-07T12:54:39Z | |
dc.date.available | 2024-01-07T12:54:39Z | |
dc.date.issued | 2023-12-14 | |
dc.description.abstract | Recent advances in deep learning have driven remarkable progress across numerous computer vision tasks. In particular, recovering three-dimensional (3D) human models from monocular images has attracted growing interest in recent years, owing to the many practical applications that depend on 3D human models, including but not limited to gaming, human-computer interaction, virtual systems, and digital twins. This dissertation designs and develops a suite of deep learning-based models for fast, high-fidelity digitalization of human subjects, and in turn enables a range of downstream applications that build on digital 3D human models.

Estimating a 3D human mesh from a monocular image requires intricate deep learning models for rich feature extraction, at the cost of high computational demands. As an alternative, researchers have explored a skeleton-based modality, a lightweight abstraction of human pose, to reduce this computational burden. The abstraction, however, discards significant visual cues, particularly shape information, which cannot be fully recovered from the 3D skeleton alone. A hybrid methodology that integrates 3D human mesh and skeletal information therefore offers a promising avenue. Over the past decade, substantial strides have been made in estimating two-dimensional (2D) joint coordinates from monocular images, while Convolutional Neural Networks (CNNs) have proven highly effective at extracting intricate visual features. This progress motivates our investigation of a hybrid architecture that combines CNNs with a lightweight graph transformer to lift 2D joint poses to a full 3D representation and to recover the visual cues essential for precise estimation of pose and shape parameters.

While state-of-the-art (SOTA) results in 3D Human Pose Estimation (HPE) are important, they do not guarantee the accuracy and plausibility required for biomechanical analysis. We therefore propose a two-stage deep learning model that efficiently estimates 3D human poses and associated kinematic attributes from monocular videos, with a primary focus on mobile device deployment. The significance of this contribution lies in providing not only accurate but also biomechanically plausible 3D pose estimates, a prerequisite for sound biomechanical analyses, thereby advancing applications such as motion tracking, gesture recognition, and ergonomic assessment, and contributing to a broader understanding of human movement and its interaction with the environment.

A prominent downstream task in human movement analysis is recognizing actions from skeletal data, known as skeleton-based Human Action Recognition (HAR). This domain has garnered substantial attention in the computer vision community owing to its computational efficiency, the representational power of its features, and its robustness to variations in illumination. Our research demonstrates that, by representing 3D pose sequences as RGB images, conventional CNN architectures such as ResNet-50, when paired with astute training strategies and diverse augmentation techniques, can attain SOTA accuracy and surpass the widely adopted graph neural network models.

Radar-based sensing, rooted in the transmission and reception of radio waves, offers a non-intrusive and versatile means of monitoring human movements, gestures, and vital signs. Despite this potential, the lack of comprehensive radar datasets has hindered the broader adoption of deep learning in radar-based human sensing. Synthetic data offers a crucial advantage here: it provides an expansive, practically limitless training resource that exposes models to diverse scenarios beyond the limits of real-world data. As part of this research, a novel computational framework called "virtual radar" is introduced, which generates high-fidelity synthetic radar data by combining 3D pose-driven human models with the Physical Optics (PO) approximation for radar cross-section modeling. Virtual radar marks a path toward foundational models for understanding human behavior through privacy-preserving radar-based methodologies. | |
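To make the 2D-to-3D lifting idea in the abstract concrete, below is a minimal sketch of a graph-transformer block in PyTorch: the skeleton adjacency matrix becomes an additive attention bias so connected joints attend to each other more strongly, and a linear head regresses 3D coordinates. The layer sizes, joint count, and bias scheme are illustrative assumptions, not the dissertation's actual architecture.

```python
import torch
import torch.nn as nn

class GraphAttentionLift(nn.Module):
    """Toy graph-transformer block that lifts 2D joints to 3D.

    The skeleton adjacency matrix is converted into an additive
    attention bias so that connected joints attend more strongly.
    """
    def __init__(self, dim: int = 64):
        super().__init__()
        self.embed = nn.Linear(2, dim)   # per-joint (x, y) -> token
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.head = nn.Linear(dim, 3)    # token -> (x, y, z)

    def forward(self, joints_2d: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # joints_2d: (B, J, 2); adj: (J, J) adjacency incl. self-loops
        x = self.embed(joints_2d)
        bias = (1.0 - adj) * -1.0        # soft penalty on non-adjacent pairs
        attn_out, _ = self.attn(x, x, x, attn_mask=bias)
        x = self.norm(x + attn_out)      # residual + norm, transformer-style
        return self.head(x)

# Usage with a 17-joint skeleton (identity stands in for real bone edges).
lift = GraphAttentionLift()
adj = torch.eye(17)
pred_3d = lift(torch.randn(2, 17, 2), adj)  # -> (2, 17, 3)
```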
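The biomechanical-plausibility theme can be illustrated with the simplest kinematic attribute derivable from 3D keypoints: a joint angle. The sketch below is a generic computation, not the dissertation's pipeline; it returns the interior angle at a joint given three keypoints, e.g. hip-knee-ankle for knee flexion.

```python
import numpy as np

def joint_angle(a: np.ndarray, b: np.ndarray, c: np.ndarray) -> float:
    """Interior angle (degrees) at keypoint b, given 3D keypoints a, b, c
    (e.g. hip, knee, ankle for knee flexion)."""
    u, v = a - b, c - b
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8)
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

# Example: a nearly straight leg yields an angle close to 180 degrees.
hip = np.array([0.0, 1.0, 0.0])
knee = np.array([0.0, 0.5, 0.05])
ankle = np.array([0.0, 0.0, 0.0])
print(joint_angle(hip, knee, ankle))  # ~169 degrees
```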
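The HAR result rests on encoding a 3D pose sequence as an RGB image so that an unmodified image classifier can consume it. A minimal sketch follows, assuming a (frames, joints, xyz) array mapped to (height, width, channels) with per-channel min-max normalization; the exact encoding, normalization, and augmentations used in the dissertation may differ.

```python
import numpy as np
import torch
from torchvision.models import resnet50

def pose_sequence_to_image(poses: np.ndarray) -> np.ndarray:
    """Encode a (T, J, 3) pose sequence as a (J, T, 3) uint8 RGB image:
    rows = joints, columns = frames, channels = x/y/z coordinates,
    each channel min-max normalized to [0, 255]."""
    img = poses.transpose(1, 0, 2).astype(np.float32)   # (J, T, 3)
    lo = img.min(axis=(0, 1), keepdims=True)
    hi = img.max(axis=(0, 1), keepdims=True)
    return ((img - lo) / (hi - lo + 1e-8) * 255.0).astype(np.uint8)

# Feed the encoded clip to a stock ResNet-50 classifier.
model = resnet50(num_classes=60)                    # e.g. 60 action classes
poses = np.random.randn(64, 25, 3)                  # dummy 64-frame, 25-joint clip
img = pose_sequence_to_image(poses)                 # (25, 64, 3)
x = torch.from_numpy(img).float().permute(2, 0, 1)[None] / 255.0  # (1, 3, 25, 64)
logits = model(x)                                   # (1, 60)
```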
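Virtual radar's core computation can be sketched as a monostatic far-field sum over illuminated mesh facets. The toy function below uses a highly simplified PO-style facet model (facet area times incidence cosine times two-way phase) and omits polarization, inter-part shadowing, and antenna effects; it illustrates the principle only and is not the dissertation's implementation. Animating the mesh with estimated 3D poses and evaluating this return frame by frame yields a slow-time signal whose spectrogram approximates a micro-Doppler signature.

```python
import numpy as np

C = 3.0e8  # speed of light (m/s)

def po_radar_return(centers, normals, areas, radar_pos, freq_hz):
    """Toy monostatic return from a triangulated body mesh.

    centers: (F, 3) facet centroids; normals: (F, 3) unit normals;
    areas: (F,) facet areas; radar_pos: (3,) radar position.
    Sums area- and incidence-weighted two-way phase terms over facets
    facing the radar; returns one complex sample.
    """
    k = 2.0 * np.pi * freq_hz / C                      # wavenumber
    los = radar_pos[None, :] - centers                 # lines of sight, (F, 3)
    r = np.linalg.norm(los, axis=1)                    # facet ranges
    cos_inc = np.einsum('fd,fd->f', normals, los / r[:, None])
    lit = cos_inc > 0.0                                # crude self-shadowing test
    phase = np.exp(-1j * 2.0 * k * r[lit])             # two-way propagation phase
    return np.sum(areas[lit] * cos_inc[lit] * phase)

# One slow-time sample at 77 GHz from a dummy 500-facet "mesh".
rng = np.random.default_rng(0)
centers = rng.normal(size=(500, 3))
normals = rng.normal(size=(500, 3))
normals /= np.linalg.norm(normals, axis=1, keepdims=True)
sample = po_radar_return(centers, normals, rng.random(500),
                         np.array([0.0, -5.0, 1.0]), 77e9)
```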
dc.format.extent | 251 | |
dc.identifier.uri | https://hdl.handle.net/20.500.14154/70542 | |
dc.language.iso | en_US | |
dc.publisher | Saudi Digital Library | |
dc.subject | Computer Science | |
dc.subject | Deep learning | |
dc.subject | CNN | |
dc.subject | GCN | |
dc.subject | Transformer | |
dc.subject | Biomechanics | |
dc.subject | Action recognition | |
dc.subject | Action Detection | |
dc.subject | OpenSim | |
dc.subject | mmWave | |
dc.subject | Radar-based action recognition | |
dc.title | Deep Learning-Based Digital Human Modeling And Applications | |
dc.type | Thesis | |
sdl.degree.department | Computing and Informatics | |
sdl.degree.discipline | Computer Science | |
sdl.degree.grantor | University of North Carolina at Charlotte
sdl.degree.name | Doctor of Philosophy