Analyzing Hand Poses for Interaction Control with Deep Learning
Abstract
3D hand pose estimation has been a prominent field in Computer Vision
due to its importance in applications such as Virtual Reality, among many
other use cases. Previous work focused on estimating 3D hand poses from
depth images, with more recent emphasis on RGB images. Various techniques
have been employed in both settings, but the most successful is Deep Learning,
where Convolutional Neural Networks have demonstrated improvements in
accuracy. In this work, we employ both self-supervised and supervised techniques.
Specifically, we attempt to use Deep Learning techniques from the literature to
capture hand poses, experiment with 3D and Quaternion representations of hand
poses, analyze the data, apply various autoencoders for dimensionality reduction
and manifold visualization, and evaluate classifiers on an interaction control
task: Sign Language Recognition. Our results show how complex hand pose data is
and how difficult it can be for deep self-supervised approaches, while UMAP, an
off-the-shelf solution, shows more potential than the deep learning approach in our work.
We also found that interaction control for Sign Language Recognition can be
achieved effectively with raw 3D data, even with linear methods, with
performance superior to that obtained using 3D or Quaternion encodings.