I used curated data resources and several data generators to obtain good-enough American, Canadian, and European datasets.
It’s essential to support country-specific localisation (l10n) as an integral part of your policies to reduce false positives and false negatives. The flexibility provided by internationalisation (i18n) ensures that DLP policies can be adapted to various languages and regions without engineering changes.
The datasets are identified by their ISO country code, except for generic English documents.
Country | Code |
---|---|
Canada | CA |
Federal Republic of Germany | DE |
French Republic | FR |
Hellenic Republic | GR |
Kingdom of Belgium | BE |
Kingdom of Norway | NO |
Kingdom of Sweden | SE |
Kingdom of the Netherlands | NL |
Portugal | PT |
Republic of Finland | FI |
Swiss Confederation | CH |
United States of America | US |
Secrets
Items:
- password files / shadow
- common passwords
- LDAP¹: LDIF² schema to store content and the actions to perform, such as adding, modifying, removing, and renaming objects (e.g., users and groups)
- base-64 encoded files
- ICS/SCADA³
Compliance:
- To be defined
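As an illustration of how a password-file item might be recognised, here is a minimal sketch that checks whether a line has the shape of a `shadow` entry (nine colon-separated fields, with a locked marker or a crypt-style `$id$...` hash in the second field). The function name `looks_like_shadow_entry` is hypothetical, and this is an assumption about one possible detection heuristic, not a description of how the datasets were built or validated.

```python
def looks_like_shadow_entry(line: str) -> bool:
    """Heuristic check for a shadow(5)-style line (illustrative only)."""
    fields = line.rstrip("\n").split(":")
    # shadow entries have nine fields: name, password, lastchg, min, max,
    # warn, inactive, expire, reserved
    if len(fields) != 9:
        return False
    name, password = fields[0], fields[1]
    if not name:
        return False
    # Numeric ageing fields are either empty or made of digits
    if any(f and not f.isdigit() for f in fields[2:8]):
        return False
    # Empty, locked ("*", "!", "!!"), or a crypt-style "$id$salt$hash" value
    return password in ("", "*", "!", "!!") or password.lstrip("!").startswith("$")


if __name__ == "__main__":
    print(looks_like_shadow_entry("root:$6$salt$hash:19000:0:99999:7:::"))
    print(looks_like_shadow_entry("root:x:0:0:root:/root:/bin/bash"))
```

A `passwd` line has seven fields, so the field count alone already separates the two file types in this sketch.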
Finance
Items:
- Credit card number (CCN)
Compliance:
- PCI
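A common way to reduce false positives when matching credit card numbers is to validate candidates with the Luhn checksum. The sketch below is an assumption about how a CCN policy could be tested against the dataset, not the method used here; the names `CANDIDATE`, `luhn_valid`, and `find_card_numbers` are hypothetical.

```python
import re

# 13 to 19 digits, optionally separated by single spaces or hyphens
CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if "0" <= c <= "9"]
    if not 13 <= len(digits) <= 19:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second one
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Return candidate substrings that also pass the Luhn check."""
    return [m.group() for m in CANDIDATE.finditer(text) if luhn_valid(m.group())]


if __name__ == "__main__":
    print(find_card_numbers("Card: 4111 1111 1111 1111."))
```

The number 4111 1111 1111 1111 is a widely used test value, which makes it convenient for exercising a policy without real cardholder data.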
Information Technology (IT)
Items:
- A list of free email provider domains curated by T. Brian Jones
- LDAP
- code
Compliance:
- To be defined
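One way a list of free email provider domains could be used is to flag addresses whose domain appears in it. The sketch below assumes the list has been loaded into a set; `FREE_EMAIL_DOMAINS` holds a three-entry placeholder rather than the curated list, and `is_free_email` is a hypothetical helper.

```python
# Placeholder subset for illustration; the curated list is much larger
FREE_EMAIL_DOMAINS = frozenset({"gmail.com", "yahoo.com", "outlook.com"})


def is_free_email(address: str, domains: frozenset = FREE_EMAIL_DOMAINS) -> bool:
    """Return True if the address's domain is in the free-provider set."""
    # Split on the last "@" so quoted local parts containing "@" still work
    _, sep, domain = address.rpartition("@")
    return bool(sep) and domain.strip().lower() in domains


if __name__ == "__main__":
    print(is_free_email("alice@Gmail.com"))
    print(is_free_email("bob@example.org"))
```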
International
Items:
- Contract
- NDA
The United Nations Group of Experts on Geographical Names (UNGEGN) documents the short and formal country names in each country's official national languages and in the six official UN languages (Arabic, Chinese, English, French, Russian, and Spanish).
ISO 3166 is the International Standard for country codes, codes for subdivisions and formerly used codes (codes that were once used to describe countries but are no longer in use).
Country codes can be represented as a two-letter code (alpha-2), which is recommended as the general-purpose code; a three-letter code (alpha-3), which is more closely related to the country name; or a three-digit numeric code (numeric-3), which can be useful if you need to avoid using Latin script.
Names and codes for subdivisions are usually taken from relevant official national information sources.
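To make the three representations concrete, the sketch below maps the alpha-2 codes used for the datasets to their alpha-3 and numeric-3 equivalents. The `CountryCode` type and the helper names are hypothetical; the numeric code is kept as a string so that leading zeros (e.g. `056` for Belgium) are preserved.

```python
from typing import NamedTuple


class CountryCode(NamedTuple):
    alpha2: str
    alpha3: str
    numeric: str  # string, so leading zeros are preserved


# ISO 3166-1 codes for the countries covered by the datasets
ISO_3166_1 = {
    c.alpha2: c
    for c in (
        CountryCode("CA", "CAN", "124"),
        CountryCode("DE", "DEU", "276"),
        CountryCode("FR", "FRA", "250"),
        CountryCode("GR", "GRC", "300"),
        CountryCode("BE", "BEL", "056"),
        CountryCode("NO", "NOR", "578"),
        CountryCode("SE", "SWE", "752"),
        CountryCode("NL", "NLD", "528"),
        CountryCode("PT", "PRT", "620"),
        CountryCode("FI", "FIN", "246"),
        CountryCode("CH", "CHE", "756"),
        CountryCode("US", "USA", "840"),
    )
}


def to_alpha3(alpha2: str) -> str:
    """Convert an alpha-2 code to alpha-3 (raises KeyError if unknown)."""
    return ISO_3166_1[alpha2.upper()].alpha3


def from_numeric(numeric) -> CountryCode:
    """Look up a country by numeric-3 code, given as int or str."""
    wanted = f"{int(numeric):03d}"
    for code in ISO_3166_1.values():
        if code.numeric == wanted:
            return code
    raise KeyError(numeric)


if __name__ == "__main__":
    print(to_alpha3("be"), from_numeric(56).alpha2)
```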
Compliance:
- To be defined
Personal
Items:
- PII
- PHI
Compliance:
- GDPR
Legal
Items:
- Contract
- NDA
Compliance:
- To be defined