TY - GEN
T1 - Understanding the Tradeoffs in Client-side Privacy for Downstream Speech Tasks
AU - Wu, Peter
AU - Liang, Paul Pu
AU - Shi, Jiatong
AU - Salakhutdinov, Ruslan
AU - Watanabe, Shinji
AU - Morency, Louis Philippe
N1 - Publisher Copyright:
© 2021 APSIPA.
PY - 2021
Y1 - 2021
AB - As users increasingly rely on cloud-based computing services, it is important to ensure that uploaded speech data remains private. Existing solutions either rely on server-side methods or focus on hiding speaker identity. While these approaches reduce certain security concerns, they do not give users client-side control over whether their biometric information is sent to the server. In this paper, we formally define client-side privacy and discuss its unique technical challenges, which require 1) direct manipulation of raw data on client devices, 2) adaptability to a broad range of server-side processing models, and 3) low time and space complexity for compatibility with limited-bandwidth devices. These unique challenges require a new class of models that achieve fidelity in reconstruction, privacy preservation of sensitive personal attributes, and efficiency during training and inference. As a step towards client-side privacy for speech recognition, we investigate three techniques spanning signal processing, disentangled representation learning, and adversarial training. Through a series of gender and accent masking tasks, we observe that each method has its unique strengths, but none manages to effectively balance the trade-offs between performance, privacy, and complexity. These insights call for more research in client-side privacy to ensure a safer deployment of cloud-based speech processing services.
UR - http://www.scopus.com/inward/record.url?scp=85126700909&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85126700909&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85126700909
T3 - 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021 - Proceedings
SP - 841
EP - 848
BT - 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021
Y2 - 14 December 2021 through 17 December 2021
ER -