Toward Better Insight Into Big Data: Engineering A Multilingual Cloud-Based Analytical Platform

Abstract

People, organizations, and modern technologies have all become factories, pumping out enormous amounts of data every day. This phenomenon, together with the techniques and tools for dealing with it, is collectively known as “big data.” Four characteristics of big data (volume, variety, velocity, and value, a.k.a. the 4Vs) make processing large data sets challenging. Social media is one of the largest producers of big data, so researchers in many fields have started analyzing social media data to gain valuable knowledge and insights. Collecting and analyzing big data from social media has created a need for data-intensive software systems. The role of software engineers in this context is to deliver well-designed, high-quality systems that can scale to address the challenges of the 4Vs. Many studies have contributed to this goal, and Project EPIC at CU Boulder is a prime contributor. The main goal of Project EPIC is to analyze social media data, mainly from Twitter, to understand social behaviors during disasters. Developing and enhancing the software infrastructure for Project EPIC reveals the important role of software engineering in supporting the multidisciplinary domain of crisis informatics. However, Project EPIC has always focused on collecting and analyzing tweets written in English, which raises an interesting question: how do the requirements and challenges change when we aim to collect and gain insight from tweets written in multiple languages during crisis events? My research aims to empower Project EPIC by enhancing its capabilities to collect and analyze multilingual tweets. More generally, this dissertation provides insight into tackling the challenges of big data in a multilingual context.

Copyright owned by the Saudi Digital Library (SDL) © 2025