research-article

A Targeted Attack on Black-Box Neural Machine Translation with Parallel Data Poisoning

Authors:

Francisco Guzmán,

Benjamin I. P. Rubinstein,

Trevor CohnAuthors Info & Claims

WWW '21: Proceedings of the Web Conference 2021

Pages 3638 - 3650

https://doi.org/10.1145/3442381.3450034

Published: 03 June 2021 Publication History

Abstract

As modern neural machine translation (NMT) systems have been widely deployed, their security vulnerabilities require close scrutiny. Most recently, NMT systems have been found vulnerable to targeted attacks which cause them to produce specific, unsolicited, and even harmful translations. These attacks are usually exploited in a white-box setting, where adversarial inputs causing targeted translations are discovered for a known target system. However, this approach is less viable when the target system is black-box and unknown to the adversary (e.g., secured commercial systems). In this paper, we show that targeted attacks on black-box NMT systems are feasible, based on poisoning a small fraction of their parallel training data. We show that this attack can be realised practically via targeted corruption of web documents crawled to form the system’s training data. We then analyse the effectiveness of the targeted poisoning in two common NMT training scenarios: the from-scratch training and the pre-train & fine-tune paradigm. Our results are alarming: even on the state-of-the-art systems trained with massive parallel data (tens of millions), the attacks are still successful (over 50% success rate) under surprisingly low poisoning budgets (e.g., 0.006%). Lastly, we discuss potential defences to counter such attacks.

References

[1]

Ahmed Abdelali, Francisco Guzman, Hassan Sajjad, and Stephan Vogel. 2014. The AMARA Corpus: Building Parallel Language Resources for the Educational Domain. In LREC.

[2]

Marta Bañón, Pinzhen Chen, Barry Haddow, Kenneth Heafield, Hieu Hoang, Miquel Esplà-Gomis, Mikel L. Forcada, Amir Kamran, Faheem Kirefu, Philipp Koehn, Sergio Ortiz Rojas, Leopoldo Pla Sempere, Gema Ramírez-Sánchez, Elsa Sarrías, Marek Strelec, Brian Thompson, William Waites, Dion Wiggins, and Jaume Zaragoza. 2020. ParaCrawl: Web-Scale Acquisition of Parallel Corpora. In ACL.

[3]

Loïc Barrault, Ondřej Bojar, Marta R. Costa-jussà, Christian Federmann, Mark Fishel, Yvette Graham, Barry Haddow, Matthias Huck, Philipp Koehn, Shervin Malmasi, Christof Monz, Mathias Müller, Santanu Pal, Matt Post, and Marcos Zampieri. 2019. Findings of the 2019 Conference on Machine Translation (WMT19). In WMT.

[4]

Marco Barreno, Blaine Nelson, Russell Sears, Anthony D Joseph, and J Doug Tygar. 2006. Can machine learning be secure?. In ASIACCS.

[5]

Valerio Basile, Cristina Bosco, Elisabetta Fersini, Debora Nozza, Viviana Patti, Francisco Manuel Rangel Pardo, Paolo Rosso, and Manuela Sanguinetti. 2019. SemEval-2019 Task 5: Multilingual Detection of Hate Speech Against Immigrants and Women in Twitter. In Proceedings of the 13th International Workshop on Semantic Evaluation.

[6]

Yonatan Belinkov and Yonatan Bisk. 2018. Synthetic and natural noise both break neural machine translation. In ICLR.

[7]

Gabriel Bernier-Colborne and Chi-kiu Lo. 2019. NRC Parallel Corpus Filtering System for WMT 2019. In Proceedings of the Fourth Conference on Machine Translation.

[8]

Davide Canali, Marco Cova, Giovanni Vigna, and Christopher Kruegel. 2011. Prophiler: a fast filter for the large-scale detection of malicious web pages. In WWW.

[9]

Isaac Caswell and Bowen Liang. 2020 (accessed August 9, 2020). Recent Advances in Google Translate. https://ai.googleblog.com/2020/06/recent-advances-in-google-translate.html.

[10]

Mauro Cettolo, Niehues Jan, Stüker Sebastian, Luisa Bentivogli, Roldano Cattoni, and Marcello Federico. 2016. The IWSLT 2016 evaluation campaign. In International Workshop on Spoken Language Translation.

[11]

Vishrav Chaudhary, Yuqing Tang, Francisco Guzmán, Holger Schwenk, and Philipp Koehn. 2019. Low-Resource Corpus Filtering Using Multilingual Sentence Embeddings. In Proceedings of the Fourth Conference on Machine Translation.

[12]

Xinyun Chen, Chang Liu, Bo Li, Kimberly Lu, and Dawn Song. 2017. Targeted backdoor attacks on deep learning systems using data poisoning. arXiv preprint arXiv:1712.05526(2017).

[13]

Minhao Cheng, Jinfeng Yi, Pin-Yu Chen, Huan Zhang, and Cho-Jui Hsieh. 2020. Seq2sick: Evaluating the robustness of sequence-to-sequence models with adversarial examples. In AAAI.

[14]

Yong Cheng, Lu Jiang, and Wolfgang Macherey. 2019. Robust Neural Machine Translation with Doubly Adversarial Inputs. In ACL.

[15]

Yong Cheng, Lu Jiang, Wolfgang Macherey, and Jacob Eisenstein. 2020. AdvAug: Robust Adversarial Augmentation for Neural Machine Translation. In ACL.

[16]

Javid Ebrahimi, Daniel Lowd, and Dejing Dou. 2018. On adversarial examples for character-level neural machine translation. In COLING.

[17]

Sergey Edunov, Myle Ott, Michael Auli, and David Grangier. 2018. Understanding Back-Translation at Scale. In EMNLP.

[18]

Ahmed El-Kishky, Vishrav Chaudhary, Francisco Guzmán, and Philipp Koehn. 2020. CCAligned: A Massive Collection of Cross-Lingual Web-Document Pairs. In EMNLP.

[19]

Jonas Gehring, Michael Auli, David Grangier, Denis Yarats, and Yann N Dauphin. 2017. Convolutional sequence to sequence learning. In ICML.

[20]

Tianyu Gu, Brendan Dolan-Gavitt, and Siddharth Garg. 2017. Badnets: Identifying vulnerabilities in the machine learning model supply chain. arXiv preprint arXiv:1708.06733(2017).

[21]

Philipp Koehn, Francisco Guzmán, Vishrav Chaudhary, and Juan Pino. 2019. Findings of the WMT 2019 Shared Task on Parallel Corpus Filtering for Low-Resource Conditions. In Proceedings of the Fourth Conference on Machine Translation.

[22]

Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, Chris Dyer, Ondřej Bojar, Alexandra Constantin, and Evan Herbst. 2007. Moses: Open Source Toolkit for Statistical Machine Translation. In ACL (Demo and Poster Sessions).

[23]

Keita Kurita, Paul Michel, and Graham Neubig. 2020. Weight Poisoning Attacks on Pretrained Models. In ACL.

[24]

Pierre Lison and Jörg Tiedemann. 2016. OpenSubtitles2016: Extracting Large Parallel Corpora from Movie and TV Subtitles. In LREC.

[25]

Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, and Luke Zettlemoyer. 2020. Multilingual denoising pre-training for neural machine translation. arXiv preprint arXiv:2001.08210(2020).

[26]

Marco Lui and Timothy Baldwin. 2012. langid.py: An Off-the-shelf Language Identification Tool. In ACL (System Demonstrations).

[27]

Thang Luong, Hieu Pham, and Christopher D. Manning. 2015. Effective Approaches to Attention-based Neural Machine Translation. In EMNLP.

[28]

Justin Ma, Lawrence K Saul, Stefan Savage, and Geoffrey M Voelker. 2009. Beyond blacklists: learning to detect malicious web sites from suspicious URLs. In KDD.

[29]

Andrew Marshall, Jugal Parikh, Emre Kiciman, and Ram Shankar Siva Shankar, Kumar. 2019 (accessed October 2, 2020). Threat Modeling AI/ML Systems and Dependencies. https://docs.microsoft.com/en-us/security/engineering/threat-modeling-aiml

[30]

Paul Michel, Xian Li, Graham Neubig, and Juan Pino. 2019. On Evaluation of Adversarial Perturbations for Sequence-to-Sequence Models. In NAACL.

[31]

Alexander Moshchuk, Tanya Bragin, Damien Deville, Steven D Gribble, and Henry M Levy. 2007. SpyProxy: Execution-based Detection of Malicious Web Content. In USENIX Security Symposium.

[32]

Luis Muñoz-González, Battista Biggio, Ambra Demontis, Andrea Paudice, Vasin Wongrassamee, Emil C Lupu, and Fabio Roli. 2017. Towards poisoning of deep learning algorithms with back-gradient optimization. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security. 27–38.

Digital Library

[33]

Nathan Ng, Kyra Yee, Alexei Baevski, Myle Ott, Michael Auli, and Sergey Edunov. 2019. Facebook FAIR’s WMT19 News Translation Task Submission. In Proceedings of the Fourth Conference on Machine Translation.

[34]

Myle Ott, Sergey Edunov, Alexei Baevski, Angela Fan, Sam Gross, Nathan Ng, David Grangier, and Michael Auli. 2019. fairseq: A Fast, Extensible Toolkit for Sequence Modeling. In Proceedings of NAACL-HLT 2019: Demonstrations.

[35]

Nicolas Papernot, Patrick McDaniel, and Ian Goodfellow. 2016. Transferability in machine learning: from phenomena to black-box attacks using adversarial samples. arXiv preprint arXiv:1605.07277(2016).

[36]

Matt Post. 2018. A Call for Clarity in Reporting BLEU Scores. In Proceedings of the Third Conference on Machine Translation: Research Papers.

[37]

Benjamin I. P. Rubinstein, Blaine Nelson, Ling Huang, Anthony D. Joseph, Shing hon Lau, Satish Rao, Nina Taft, and J. D. Tygar. 2009. ANTIDOTE: Understanding and defending against poisoning of anomaly detectors. In Proceedings of the 9th ACM SIGCOMM Internet Measurement Conference(IMC).

Digital Library

[38]

Víctor M. Sánchez-Cartagena, Marta Bañón, Sergio Ortiz-Rojas, and Gema Ramírez-Sánchez. 2018. Prompsit’s submission to WMT 2018 Parallel Corpus Filtering shared task. In Proceedings of the Third Conference on Machine Translation.

[39]

Rico Sennrich, Barry Haddow, and Alexandra Birch. 2016. Neural Machine Translation of Rare Words with Subword Units. In ACL.

[40]

Ali Shafahi, W Ronny Huang, Mahyar Najibi, Octavian Suciu, Christoph Studer, Tudor Dumitras, and Tom Goldstein. 2018. Poison frogs! targeted clean-label poisoning attacks on neural networks. In NeurIPS.

[41]

Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, and Tie-Yan Liu. 2019. Mass: Masked sequence to sequence pre-training for language generation. ICML (2019).

[42]

Jacob Steinhardt, Pang Wei W Koh, and Percy S Liang. 2017. Certified defenses for data poisoning attacks. In NeurIPS.

[43]

Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and Zbigniew Wojna. 2016. Rethinking the inception architecture for computer vision. In CVPR.

[44]

Jörg Tiedemann. 2012. Parallel Data, Tools and Interfaces in OPUS. In LREC.

[45]

Dániel Varga, Péter Halácsy, András Kornai, Viktor Nagy, László Németh, and Viktor Trón. 2007. Parallel corpora for medium density languages. Amsterdam Studies In The Theory And History Of Linguistic Science Series 4 292(2007), 247.

[46]

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In NeurIPS.

[47]

Eric Wallace, Shi Feng, Nikhil Kandpal, Matt Gardner, and Sameer Singh. 2019. Universal Adversarial Triggers for Attacking and Analyzing NLP. In EMNLP–IJCNLP.

[48]

Eric Wallace, Mitchell Stern, and Dawn Song. 2020. Imitation Attacks and Defenses for Black-box Machine Translation Systems. In EMNLP.

[49]

Eric Wallace, Tony Z Zhao, Shi Feng, and Sameer Singh. 2020. Customizing Triggers with Concealed Data Poisoning. arXiv preprint arXiv:2010.12563(2020).

[50]

Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, 2016. Google’s neural machine translation system: Bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144(2016).

[51]

Hainan Xu and Philipp Koehn. 2017. Zipporah: a Fast and Scalable Data Cleaning System for Noisy Web-Crawled Parallel Corpora. In EMNLP.

[52]

Zhengli Zhao, Dheeru Dua, and Sameer Singh. 2018. Generating natural adversarial examples. In ICLR.

Cited By

Wu BChen KHe YChen GZhang WYu N(2024)CodeWMBench: An Automated Benchmark for Code Watermarking EvaluationProceedings of the ACM Turing Award Celebration Conference - China 202410.1145/3674399.3674447(120-125)Online publication date: 5-Jul-2024
https://dl.acm.org/doi/10.1145/3674399.3674447
Garcia XBansal YCherry CFoster GKrikun MJohnson MFirat OKrause ABrunskill ECho KEngelhardt BSabato SScarlett J(2023)The unreasonable effectiveness of few-shot learning for machine translationProceedings of the 40th International Conference on Machine Learning10.5555/3618408.3618846(10867-10878)Online publication date: 23-Jul-2023
https://dl.acm.org/doi/10.5555/3618408.3618846
Li JLi ZZhang HLi GJin ZHu XXia X(2023)Poison Attack and Poison Detection on Deep Source Code Processing ModelsACM Transactions on Software Engineering and Methodology10.1145/3630008Online publication date: Nov-2023
https://doi.org/10.1145/3630008
Show More Cited By

Recommendations

Defending Against Adversarial Denial-of-Service Data Poisoning Attacks
DYNAMICS '20: Proceedings of the 2020 Workshop on DYnamic and Novel Advances in Machine Learning and Intelligent Cyber Security

Data poisoning is one of the most relevant security threats against machine learning and data-driven technologies. Since many applications rely on untrusted training data, an attacker can easily craft malicious samples and inject them into the training ...
Stateful Defenses for Machine Learning Models Are Not Yet Secure Against Black-box Attacks
CCS '23: Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security

Recent work has proposed stateful defense models (SDMs) as a compelling strategy to defend against a black-box attacker who only has query access to the model, as is common for online machine learning platforms. Such stateful defenses aim to defend ...
Stronger data poisoning attacks break data sanitization defenses
Abstract
Machine learning models trained on data from the outside world can be corrupted by data poisoning attacks that inject malicious points into the models’ training sets. A common defense against these attacks is data sanitization: first filter out ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences

WWW '21: Proceedings of the Web Conference 2021

April 2021

4054 pages

ISBN:9781450383127

DOI:10.1145/3442381

Editors:
Jure Leskovec
Stanford
,
Marko Grobelnik
Jožef Stefan Institute
,
Marc Najork
Google
,
Jie Tang
Tsinghua University
,
Leila Zia
Wikimedia Foundation

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 03 June 2021

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article
Research
Refereed limited

Conference

WWW '21

Sponsor:

SIGWEB

WWW '21: The Web Conference 2021

April 19 - 23, 2021

Ljubljana, Slovenia

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

6
Total Citations
View Citations
274
Total Downloads

Downloads (Last 12 months)59
Downloads (Last 6 weeks)10

Reflects downloads up to 10 Nov 2024

Other Metrics

View Author Metrics

Citations

Cited By

Wu BChen KHe YChen GZhang WYu N(2024)CodeWMBench: An Automated Benchmark for Code Watermarking EvaluationProceedings of the ACM Turing Award Celebration Conference - China 202410.1145/3674399.3674447(120-125)Online publication date: 5-Jul-2024
https://dl.acm.org/doi/10.1145/3674399.3674447
Garcia XBansal YCherry CFoster GKrikun MJohnson MFirat OKrause ABrunskill ECho KEngelhardt BSabato SScarlett J(2023)The unreasonable effectiveness of few-shot learning for machine translationProceedings of the 40th International Conference on Machine Learning10.5555/3618408.3618846(10867-10878)Online publication date: 23-Jul-2023
https://dl.acm.org/doi/10.5555/3618408.3618846
Li JLi ZZhang HLi GJin ZHu XXia X(2023)Poison Attack and Poison Detection on Deep Source Code Processing ModelsACM Transactions on Software Engineering and Methodology10.1145/3630008Online publication date: Nov-2023
https://doi.org/10.1145/3630008
Liu HZhao PXu TBian YHuang JZhu YMu Y(2023)Curriculum Graph PoisoningProceedings of the ACM Web Conference 202310.1145/3543507.3583211(2011-2021)Online publication date: 30-Apr-2023
https://doi.org/10.1145/3543507.3583211
Sheng XHan ZLi PChang X(2022)A Survey on Backdoor Attack and Defense in Natural Language Processing2022 IEEE 22nd International Conference on Software Quality, Reliability and Security (QRS)10.1109/QRS57517.2022.00086(809-820)Online publication date: Dec-2022
https://doi.org/10.1109/QRS57517.2022.00086
Ni MWang CZhu TYu SLiu W(2022)Attacking neural machine translations via hybrid attention learningMachine Language10.1007/s10994-022-06249-x111:11(3977-4002)Online publication date: 1-Nov-2022
https://dl.acm.org/doi/10.1007/s10994-022-06249-x

View Options

Get Access

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

HTML Format

View this article in HTML Format.

Media

Figures

Other

Tables

View Table of Contents