data categorization
Identification of data based on its attributes and properties. For example, a 9-digit number with no dashes could be a US Social Security number, a US ZIP+4 mailing code, or neither, depending on its context. If the number appears in a list of US addresses, it is likely a ZIP+4 code. If it appears in a list of non-US addresses, it belongs to some other data class.
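As an illustration of how context changes the category, here is a minimal sketch of a context-aware categorizer for dash-free 9-digit values. The function name, the context labels, and especially the `personnel_records` context used for Social Security numbers are hypothetical assumptions for this example. A real system would infer context from neighboring fields, column names, or document type rather than receive it as a string.

```python
import re

# Exactly nine ASCII digits with no dashes, the case described above.
NINE_DIGITS = re.compile(r"[0-9]{9}")


def categorize_nine_digits(value: str, context: str) -> str:
    """Return a data category for `value` based on its (hypothetical) context label."""
    if not NINE_DIGITS.fullmatch(value):
        return "not_nine_digits"
    if context == "us_address_list":
        # Nine digits inside US addresses: most likely a ZIP+4 code (5 + 4 digits).
        return "us_zip_plus_4"
    if context == "personnel_records":
        # Hypothetical context in which a 9-digit value is treated as an SSN.
        return "us_ssn"
    # Non-US addresses or any unrecognized context: some other data class.
    return "other"


print(categorize_nine_digits("941051234", "us_address_list"))
print(categorize_nine_digits("941051234", "non_us_address_list"))
```

The same value yields `us_zip_plus_4` in the first call and `other` in the second, because only the context differs.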
Legacy data discovery and classification solutions rely on regular expressions that categorize every 9-digit number as a Social Security number, regardless of context. As a result, they cannot categorize data accurately and produce many false positives, which makes them difficult and time-consuming to use.
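The false-positive problem can be seen with a context-free pattern of the kind described above. This is a simplified, hypothetical stand-in for a legacy rule, not the pattern any particular product uses. It labels every standalone 9-digit token as a Social Security number, so a ZIP+4 code in an address is flagged too.

```python
import re

# Context-free rule: any standalone run of nine digits is treated as an SSN.
SSN_PATTERN = re.compile(r"\b[0-9]{9}\b")


def legacy_flag_ssns(text: str) -> list[str]:
    """Return every 9-digit token in `text`, all labeled as SSNs regardless of context."""
    return SSN_PATTERN.findall(text)


address = "Acme Corp, 1 Main St, San Francisco, CA 941051234"
print(legacy_flag_ssns(address))  # ['941051234'], a ZIP+4 code reported as an SSN
```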
A frictionless data discovery and classification solution uses automated reasoning to perform data categorization efficiently and accurately.