Towards Automated Security and Privacy Policies Specification and Analysis
Date
2024-07-03
Publisher
Colorado State University
Abstract
Security and privacy policies, which are vital for information systems, are typically expressed in natural
language documents. Security policies are represented by Access Control Policies (ACPs) within security requirements; these are initially drafted in natural language and subsequently translated into enforceable policies. The unstructured and ambiguous nature of natural language documents makes this
manual translation process tedious, expensive, labor-intensive, and error-prone. Privacy policies,
on the other hand, present unique challenges because of their length and complexity. Their dense language
and extensive content can overwhelm both novice users and experts, hindering them from fully
understanding the practices related to data collection and sharing. The disclosure of these data practices to users, as mandated by privacy regulations such as the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA), is therefore of utmost importance.
To address these challenges, we turn to Natural Language Processing (NLP) to automate the
extraction of critical information from natural language documents and the analysis of security and
privacy policies. Accordingly, this dissertation addresses two primary research questions:
Question 1: How can we automate the translation of Access Control Policies (ACPs) from
natural language expressions to the formal model of Next Generation Access Control (NGAC) and
subsequently analyze the generated model?
Question 2: How can we automate the extraction and analysis of data practices from privacy
policies to ensure alignment with privacy regulations (GDPR and CCPA)?
Addressing these research questions requires a comprehensive framework comprising two key components. The first component, SR2ACM, translates natural language ACPs into the NGAC model and introduces a series of contributions to the analysis of security policies. At the core of these
contributions is an automated approach to constructing ACPs within the NGAC specification directly from natural language documents. This approach integrates machine learning with software testing, a novel methodology for
ensuring the quality of the extracted access control model. The second component, Privacy2Practice,
automates the extraction and analysis of data practices from privacy policies written in natural language. We have developed an automated method to extract the data practices mandated by privacy regulations and to analyze how these practices are disclosed within privacy
policies.
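To make the translation target more concrete, the Python sketch below models a heavily simplified NGAC-style policy graph: users and objects are assigned to attributes, and an association (user attribute, operations, object attribute) grants access. The regex-based `extract_acp` function is a hypothetical toy stand-in for the machine-learning extraction in SR2ACM and only handles sentences shaped like "A <subject> can <action> <resource>."; the names `NGACGraph`, `extract_acp`, `alice`, and `record_42` are illustrative assumptions. The access decision deliberately ignores policy classes, prohibitions, and obligations, all of which appear in full NGAC.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Hypothetical toy pattern: "<A|An|The> <subject> <can|may> <action> <resource>."
ACP_PATTERN = re.compile(
    r"^(?:an?|the)\s+(?P<subject>[\w ]+?)\s+(?:can|may)\s+"
    r"(?P<action>\w+)\s+(?P<resource>[\w ]+?)\.?$",
    re.IGNORECASE,
)


def extract_acp(sentence: str) -> tuple[str, str, str] | None:
    """Return (subject, action, resource) for a matching sentence, else None."""
    m = ACP_PATTERN.match(sentence.strip())
    if m is None:
        return None
    return m["subject"].lower(), m["action"].lower(), m["resource"].lower()


@dataclass
class NGACGraph:
    # Assignment edges: child -> parents (user -> UA, object -> OA, attribute -> attribute).
    assignments: dict[str, set[str]] = field(default_factory=dict)
    # Associations: (user attribute, allowed operations, object attribute).
    associations: list[tuple[str, frozenset[str], str]] = field(default_factory=list)

    def assign(self, child: str, parent: str) -> None:
        self.assignments.setdefault(child, set()).add(parent)

    def associate(self, ua: str, ops: set[str], oa: str) -> None:
        self.associations.append((ua, frozenset(ops), oa))

    def ancestors(self, node: str) -> set[str]:
        """All attributes reachable from node through one or more assignments."""
        seen: set[str] = set()
        stack = [node]
        while stack:
            for parent in self.assignments.get(stack.pop(), ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    def is_permitted(self, user: str, op: str, obj: str) -> bool:
        # Simplified decision: some association must cover the user, the object, and op.
        user_attrs = self.ancestors(user)
        obj_attrs = self.ancestors(obj)
        return any(
            ua in user_attrs and oa in obj_attrs and op in ops
            for ua, ops, oa in self.associations
        )


# Build a one-rule policy from a single natural language sentence.
acp = extract_acp("A nurse can read medical records.")
assert acp == ("nurse", "read", "medical records")
subject, action, resource = acp

policy = NGACGraph()
policy.associate(subject, {action}, resource)
policy.assign("alice", "nurse")
policy.assign("record_42", "medical records")

# Test cases: the generated model should grant exactly what the sentence states.
assert policy.is_permitted("alice", "read", "record_42")
assert not policy.is_permitted("alice", "write", "record_42")
```

The closing assertions act as small test cases that check the generated model against the intent of the source sentence, loosely echoing the idea of validating an extracted access control model through software testing.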
The novelty of this research lies in a comprehensive framework that identifies the
critical elements within security and privacy policies. This framework enables
automated extraction and analysis of both types of policies directly from natural language documents.
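As a rough illustration of what extracting and checking data practices involves, the sketch below labels privacy-policy sentences as describing collection or sharing using a small hand-picked cue-word list, then reports which categories in a required set are never mentioned. This keyword heuristic is an assumption made for illustration and is not the Privacy2Practice method; the cue lists, the category names, and the required set (including the hypothetical "retention" category) are placeholders, not a reading of GDPR or CCPA.

```python
from __future__ import annotations

import re

# Hypothetical cue words per data-practice category (illustrative only).
PRACTICE_CUES: dict[str, tuple[str, ...]] = {
    "collection": ("collect", "gather", "obtain"),
    "sharing": ("share", "disclose", "sell", "transfer"),
}

SAMPLE_POLICY = (
    "We collect your email address. "
    "We may share usage data with analytics providers. "
    "You can contact us at any time."
)


def split_sentences(text: str) -> list[str]:
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def label_practices(text: str) -> dict[str, list[str]]:
    """Map each category to the sentences containing one of its cue words."""
    labels: dict[str, list[str]] = {category: [] for category in PRACTICE_CUES}
    for sentence in split_sentences(text):
        lowered = sentence.lower()
        for category, cues in PRACTICE_CUES.items():
            # \b anchors the cue at a word start, so "collects" also matches "collect".
            if any(re.search(rf"\b{cue}", lowered) for cue in cues):
                labels[category].append(sentence)
    return labels


def undisclosed(labels: dict[str, list[str]], required: set[str]) -> set[str]:
    """Required categories for which no sentence was found."""
    return {category for category in required if not labels.get(category)}


labels = label_practices(SAMPLE_POLICY)
print(labels)
print(undisclosed(labels, {"collection", "sharing", "retention"}))
```

A keyword approach misses paraphrases and ignores negation (for example, "We do not sell your data." is still labeled as sharing), which illustrates why more robust NLP-based extraction is needed for real privacy policies.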
Keywords
Security Policy, Privacy Policy, AI, NLP, Formal Analysis