# Data Classification

# Overview

Data is classified according to its value or sensitivity. Classification takes place in the Create phase of the data lifecycle.

Data classification is a tool for categorizing data so that an organization can effectively answer the following questions:

  • What data types are available?
  • Where is certain data located?
  • What access levels are implemented?
  • What protection level is implemented, and does it adhere to compliance regulations?

Virtualization can affect how data classification is performed and implemented in the cloud. Data may move from raw objects to virtualized instances, to snapshotted images, back into virtualized instances, and finally back out to users as raw data. Each of these transformations may affect the organization's current classification methodology. Techniques and tools that suited the traditional IT environment might not hold up in a standard cloud environment.

| Datasets | Input Entities |
| --- | --- |
| Primary set | P&DP law |
| | Scope and purpose of the processing |
| | Categories of the personal data to be processed |
| | Categories of the processing to be performed |
| Secondary set | Data location allowed |
| | Categories of users allowed |
| | Data retention constraints |
| | Security measures to be ensured |
| | Data breach constraints |

# Classification Process

The data classification section describes how and when data should be classified, and sets out the security procedures and controls for handling data in each classification.

Ask yourself, "How much damage could it cause if this data were inadvertently exposed?" That damage is the harm. The value of data takes into account harm, the time needed to create the data, liability or compromise, and so on. Similarly, consider what the impact would be in the following scenarios:

  • If the information was widely distributed (such as SSNs or government information).
  • If an employee of the CSP accessed the data.
  • If the data was manipulated by an outsider or was unexpectedly changed.
  • If the information was unavailable for a period of time.

The following items help determine classification:

  • Sensitivity
  • Jurisdiction
  • Criticality

The following process should be followed:

  • Execute data discovery
  • Define data classification policies
  • Execute data classification process
  • Implement enforcement technologies to protect classified data
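
The four steps above can be sketched as a small pipeline. This is a minimal illustration only: the level names, the regex rules in `POLICY`, and the control names in `CONTROLS` are all hypothetical, and a real deployment would rely on dedicated discovery and enforcement tooling rather than two regular expressions.

```python
import re

# Hypothetical classification levels, ordered from least to most sensitive.
LEVELS = ["public", "internal", "confidential"]

# Hypothetical policy: if the pattern is found, the item needs at least this level.
POLICY = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "confidential"),  # SSN-like number
    (re.compile(r"(?i)\binternal use only\b"), "internal"),
]

# Hypothetical mapping from classification level to required controls.
CONTROLS = {
    "public": [],
    "internal": ["access-control"],
    "confidential": ["access-control", "encryption"],
}


def classify(text):
    """Return the highest level required by any matching policy rule."""
    level = "public"
    for pattern, required in POLICY:
        if pattern.search(text) and LEVELS.index(required) > LEVELS.index(level):
            level = required
    return level


def process(discovered):
    """Classify each discovered item and attach the controls to enforce."""
    result = {}
    for name, text in discovered.items():
        level = classify(text)
        result[name] = (level, CONTROLS[level])
    return result


if __name__ == "__main__":
    # "Discovery" is stubbed here as a dictionary of item name -> content.
    discovered = {
        "hr.txt": "Employee SSN 123-45-6789",
        "roadmap.txt": "Internal use only: Q3 roadmap",
        "press.txt": "Public press release",
    }
    for name, (level, controls) in process(discovered).items():
        print(name, level, controls)
```

Keeping the policy as data (a list of rules) separate from the classification function mirrors the order of the process: the policy is defined first, then executed.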

# Data Labeling

Labeling is a technology which can be used to group data elements together.

When the data owner creates, categorizes, and classifies data, the data also needs to be labeled. Labeling is the data owner's job, not the CSP's.

Labels might include the following types of information:

  • Data owner
  • Date of creation
  • Date of scheduled destruction/disposal
  • Confidentiality level
  • Handling directions
  • Dissemination/distribution instructions
  • Access limitations
  • Source
  • Jurisdiction
  • Applicable regulation
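
As a rough sketch, a label carrying the kinds of information listed above could be modeled as a small record. The class name `DataLabel`, its field names, and the `as_metadata` helper are hypothetical and chosen only for illustration; they do not correspond to any particular product's labeling scheme.

```python
from dataclasses import dataclass, fields
from datetime import date


@dataclass(frozen=True)
class DataLabel:
    """Hypothetical label record; one field per kind of label information."""
    owner: str
    created: date
    scheduled_disposal: date
    confidentiality: str
    handling: str = ""
    distribution: str = ""
    access_limitations: tuple = ()
    source: str = ""
    jurisdiction: str = ""
    regulations: tuple = ()

    def as_metadata(self):
        """Flatten the label into string values, e.g. for storing as tags."""
        meta = {}
        for f in fields(self):
            value = getattr(self, f.name)
            if isinstance(value, date):
                value = value.isoformat()
            elif isinstance(value, tuple):
                value = ",".join(value)
            meta[f.name] = value
        return meta


if __name__ == "__main__":
    label = DataLabel(
        owner="finance",
        created=date(2024, 1, 15),
        scheduled_disposal=date(2031, 1, 15),
        confidentiality="confidential",
        jurisdiction="EU",
        regulations=("GDPR",),
    )
    print(label.as_metadata())
```

The record is frozen so that a label cannot be silently altered after the data owner assigns it; changing a label means creating a new one.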

# Data Protection and Control

  • Data retention
  • Data deletion
  • Data archiving

# Data Retention

The retention periods section details how long data in each classification should be retained.

The data retention policy should include the following:

  • Retention periods.
  • Applicable regulations.
  • Retention/data formats. The retention formats section details the medium on which the different data classifications should be stored. It also contains any handling procedures that should be followed.
  • Data security.
  • Data classification.
  • Archiving and retrieval procedures.
  • Monitoring, maintenance, and enforcement.
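
To illustrate how retention periods tie back to classification, the sketch below computes a disposal date from a creation date and a classification level. The periods in `RETENTION_DAYS` are made-up placeholder values, not recommendations; actual periods come from the applicable regulations and the organization's policy.

```python
from datetime import date, timedelta

# Hypothetical retention periods per classification level, in days.
RETENTION_DAYS = {
    "public": 365,
    "internal": 3 * 365,
    "confidential": 7 * 365,
}


def disposal_date(created, classification):
    """Earliest date on which the data may be disposed of."""
    return created + timedelta(days=RETENTION_DAYS[classification])


def is_due_for_disposal(created, classification, today):
    """True once the retention period for this classification has elapsed."""
    return today >= disposal_date(created, classification)


if __name__ == "__main__":
    created = date(2024, 1, 1)
    print(disposal_date(created, "public"))
    print(is_due_for_disposal(created, "confidential", date(2025, 1, 1)))
```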

# Data Disposal

Disposal options in the legacy environment:

  • Physical destruction of media and hardware
  • Degaussing
  • Overwriting
  • Cryptoshredding
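
Overwriting, one of the options above, can be sketched in software as follows. This is an illustration under stated assumptions: the function names and the default of three passes are arbitrary choices, and the approach only makes sense where the caller controls the underlying media. On SSDs, copy-on-write filesystems, or cloud storage, writing over a file does not guarantee that the original blocks are actually replaced.

```python
import os
import secrets

CHUNK = 64 * 1024  # write random data in 64 KiB pieces to limit memory use


def overwrite_file(path, passes=3):
    """Overwrite the file's contents in place with random bytes."""
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        for _ in range(passes):
            f.seek(0)
            remaining = size
            while remaining > 0:
                n = min(remaining, CHUNK)
                f.write(secrets.token_bytes(n))
                remaining -= n
            # Ask the OS to push this pass to the storage device.
            f.flush()
            os.fsync(f.fileno())


def overwrite_and_delete(path, passes=3):
    """Overwrite the file, then remove its directory entry."""
    overwrite_file(path, passes)
    os.remove(path)
```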

# Data Archival

The archiving and retrieval procedures section of the data retention policy describes how data should be sent into storage so that it can be recovered later if needed.

  • Data encryption procedures
  • Data monitoring procedures
  • Ability to perform e-discovery and granular retrieval
  • Backup and DR options
  • Data format and media type
  • Data restoration procedures