Human voice is a rich source of personal information, including identity. Advances in speech technology, such as voice conversion and speech synthesis, have made it easier to clone and manipulate voices for misuse, which poses a serious privacy risk. Voice anonymization is a crucial step in protecting speaker identity in speech data.

Traditionally, voice anonymization is evaluated for utility and privacy using only a single ASR (automatic speech recognition) system and a single ASV (automatic speaker verification) system. This biases the evaluation and does not reflect real-world scenarios, where many different ASR and ASV systems exist. To address this, we propose the AUDI and Fusion EER metrics, which aggregate results across 6 ASR and ASV systems. Please read the Metrics Explained tab to understand how the metrics are computed.

Below, we rank state-of-the-art (SOTA) voice anonymization systems on the proposed metrics. Fusion EER is reported for two scenarios: A2A (Lazy Informed Attacker) and O2A (Ignorant Attacker). Lazy informed attackers have partial knowledge of the ASV system and can therefore make certain improvements to their attacks. Ignorant attackers have little to no knowledge of the ASV system.
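As background for the privacy tables, the sketch below shows how a standard equal error rate (EER) can be computed from ASV trial scores, plus one hypothetical way of fusing scores from several ASV systems before computing it. The fusion step (z-normalize each system's scores, then average per trial) is an assumption chosen for illustration and is not necessarily the Fusion EER definition used on this leaderboard; the exact definitions, and how the A2A and O2A scenarios differ, are given in the Metrics Explained tab. The function names `compute_eer`, `zscore`, and `fused_eer` are hypothetical.

```python
from typing import Sequence


def compute_eer(target_scores: Sequence[float], nontarget_scores: Sequence[float]) -> float:
    """Approximate EER by sweeping thresholds over all observed scores.

    Higher scores mean "same speaker". At threshold t, a target trial is
    falsely rejected if its score < t, and a non-target trial is falsely
    accepted if its score >= t. The EER is taken at the threshold where the
    two error rates are closest, averaging them there.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, best_eer = float("inf"), 1.0
    for t in thresholds:
        frr = sum(s < t for s in target_scores) / len(target_scores)
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        if abs(far - frr) < best_gap:
            best_gap, best_eer = abs(far - frr), (far + frr) / 2
    return best_eer


def zscore(scores: Sequence[float]) -> list[float]:
    """Normalize one system's scores to zero mean and unit variance."""
    mean = sum(scores) / len(scores)
    std = (sum((s - mean) ** 2 for s in scores) / len(scores)) ** 0.5 or 1.0
    return [(s - mean) / std for s in scores]


def fused_eer(per_system_scores: Sequence[Sequence[float]], labels: Sequence[bool]) -> float:
    """HYPOTHETICAL fusion: average z-normalized scores across ASV systems per trial.

    per_system_scores[k][i] is system k's score for trial i;
    labels[i] is True for a same-speaker (target) trial.
    """
    normalized = [zscore(scores) for scores in per_system_scores]
    fused = [sum(trial) / len(trial) for trial in zip(*normalized)]
    targets = [s for s, is_target in zip(fused, labels) if is_target]
    nontargets = [s for s, is_target in zip(fused, labels) if not is_target]
    return compute_eer(targets, nontargets)
```

For example, two systems whose scores both separate target from non-target trials yield a fused EER of 0, while an anonymizer that fully hides identity would push the EER toward 50%. Averaging normalized scores is only one simple fusion rule; score calibration or learned fusion are common alternatives.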

⚙️ Utility Measure: AUDI (ASR Utility Distortion Index)

Table: AUDI (ASR Utility Distortion Index)

🛡️ Privacy Measures: Fusion EER (Equal Error Rate)

Table: A2A Fusion EER (Lazy Informed Attacker case)

Table: O2A Fusion EER (Ignorant Attacker case)