Skip to content

Latest commit

 

History

History
190 lines (162 loc) · 6.15 KB

presidio_V2.md

File metadata and controls

190 lines (162 loc) · 6.15 KB

Presidio Revamp (aka V2)

As of March 2021, Presidio had undergo a revamp to a new version refereed to as V2.

The main changes introduced in V2 are:

  1. gRPC replaced with HTTP to allow more customizable APIs and easier debugging

  2. Focus on the Analyzer and Anonymizer services.

    1. Presidio Anonymizer is now Python based and pip installable.
    2. Presidio Analyzer does not use templates and external recognizer store.
    3. Image Redactor (formerly presidio-image-anonymizer) is in early beta and is Python based and pip installable.
    4. Other services are deprecated and potentially be migrated over time to V2 with the help of the community.
  3. Improved documentation, samples and build flows.

  4. Format Preserving Encryption replaced with Advanced Encryption Standard (AES)

V1 Availability

Version V1 (legacy) is still available for download. To continue using the previous version:

  • For docker containers, use tag=v1
  • For python packages, download version < 2 (e.g. pip install presidio-analyzer==0.95)

!!! note "Note" The legacy V1 code base will continue to be available under branch V1 but will no longer be officially supported.

API Changes

The move from gRPC to HTTP based APIs included changes to the API requests.

  1. Change in payload - moving from structures to jsons.

  2. Removing templates from the API - includes flattening the json.

  3. Using snake_case instead of camelCase .

Below is a detailed outline of all the changes done to the Analyzer and Anonymizer.

Analyzer API Changes

Legacy json request (gRPC)

{
    "text": "My phone number is 212-555-5555",
    "AnalyzeTemplateId": "1234",
    "AnalyzeTemplate": {
        "Fields": [
            {
                "Name": "PHONE_NUMBER",
                "MinScore": "0.5"
            }
        ],
        "AllFields": true,
        "Description": "template description",
        "CreateTime": "template creation time",
        "ModifiedTime": "template modification time",
        "Language": "fr",
        "ResultsScoreThreshold": 0.5
    }
}

V2 json request (HTTP)

{
    "text": "My phone number is 212-555-5555",
    "entities": ["PHONE_NUMBER"],
    "language": "en",
    "correlation_id": "213",
    "score_threshold": 0.5,
    "trace": true,
    "return_decision_process": true
}

Anonymizer API Changes

Legacy json request (gRPC)

{
  "text": "hello world, my name is Jane Doe. My number is: 034453334",
  "template": {
    "description": "DEPRECATED",
    "create_time": "DEPRECATED",
    "modified_time": "DEPRECATED",
    "default_transformation": {
      "replace_value": {...},
      "redact_value": {...},
      "hash_value": {...},
      "mask_value": {...},
      "fpe_value": {...}
    },
    "field_type_transformations": [
      {
        "fields": [
          {
            "name": "FIRST_NAME",
            "min_score": "0.2"
          }
        ],
        "transfomarion": {
          "replace_value": {...},
          "redact_value": {...},
          "hash_value": {...},
          "mask_value": {...},
          "fpe_value": {...}
        }
      }
    ],
    "analyze_results": [
      {
        "text": "Jane",
        "field": {
          "name": "FIRST_NAME",
          "min_score": "0.5"
        },
        "location": {
          "start": 24,
          "end": 32,
          "length": 6
        },
        "score": 0.8
      }
    ]
  }
}

V2 json request (HTTP)

{
    "text": "hello world, my name is Jane Doe. My number is: 034453334",
    "anonymizers": {
        "DEFAULT": {
            "type": "replace",
            "new_value": "val"
        },
        "PHONE_NUMBER": {
            "type": "mask",
            "masking_char": "*",
            "chars_to_mask": 4,
            "from_end": true
        }
    },
    "analyzer_results": [
        {
            "start": 24,
            "end": 32,
            "score": 0.8,
            "entity_type": "NAME"
        },
        {
            "start": 24,
            "end": 28,
            "score": 0.8,
            "entity_type": "FIRST_NAME"
        },
        {
            "start": 29,
            "end": 32,
            "score": 0.6,
            "entity_type": "LAST_NAME"
        },
        {
            "start": 48,
            "end": 57,
            "score": 0.95,
            "entity_type": "PHONE_NUMBER"
        }
    ]
}

Specific for each anonymization type:

Anonymization name Legacy format (V1) New json format (V2)
Replace
string newValue = 1;
{ "new_value": "VALUE" }
Redact NONE NONE
Mask
string maskingCharacter = 1;
int32 charsToMask = 2;
bool fromEnd = 3;
{
"chars_to_mask": 10,
"from_end": true,
"masking_char": "*"
}
Hash NONE
{"hash_type": "VALUE"}
FPE (now Encrypt)
string key = 3t6w9z$C&F)J@NcR;
int32 tweak = D8E7920AFA330A73
{"key": "3t6w9z$C&F)J@NcR"}

!!! note "Note" The V2 API keeps changing please follow the change log for updates.