Docflow: Supervised Multi-Method Document Anonymization Engine

Morabito G.; Lukaj V.; Ruggeri A.; Fazio M.; Astone M. A.; Villari M.
2023-01-01

Abstract

Document anonymization has recently been the subject of several studies and debates. By document anonymization, we mean the process of replacing sensitive data to preserve the confidentiality of documents without otherwise altering their content. In this work, we introduce Docflow, an open-source document anonymization engine that anonymizes documents according to specific filters chosen by the user. Given a Markdown input file, Docflow redacts information according to the user's choices while preserving the document content. We applied Docflow to a set of legal documents and analyzed its processing performance. Docflow will be integrated with NLP algorithms that generate the Markdown source file from documents already processed in different formats, always with human supervision in the loop.

Use this identifier to cite or link to this document: https://hdl.handle.net/11570/3290168