Bernardino Romera-Paredes, Mohammadamin
Barekatain, Alexander Novikov, Matej Balog,
M Pawan Kumar, Emilien Dupont, Francisco JR
Ruiz, Jordan S Ellenberg, Pengming Wang, Omar
Fawzi, et al. 2023. Mathematical discoveries from
program search with large language models. Nature,
pages 1–3.
Subhro Roy and Dan Roth. 2015. Solving general
arithmetic word problems. In Proceedings of EMNLP,
pages 1743–1752.
Baptiste Rozière, Jonas Gehring, Fabian Gloeckle, Sten
Sootla, Itai Gat, Xiaoqing Ellen Tan, Yossi Adi,
Jingyu Liu, Tal Remez, Jérémy Rapin, Artyom
Kozhevnikov, Ivan Evtimov, Joanna Bitton, Manish
Bhatt, Cristian Canton-Ferrer, Aaron Grattafiori,
Wenhan Xiong, Alexandre Défossez, Jade Copet,
Faisal Azhar, Hugo Touvron, Louis Martin, Nicolas
Usunier, Thomas Scialom, and Gabriel Synnaeve.
2023. Code llama: Open foundation models for code.
CoRR, abs/2308.12950.
Mrinmaya Sachan, Avinava Dubey, and Eric P. Xing.
2017. From textbooks to knowledge: A case study in
harvesting axiomatic knowledge from textbooks to
solve geometry problems. In Proceedings of EMNLP,
pages 773–784.
Mrinmaya Sachan and Eric P. Xing. 2017. Learning
to solve geometry problems from natural language
demonstrations in textbooks. In Proceedings
of *SEM @ACL, pages 251–261.
Tomohiro Sawada, Daniel Paleka, Alexander Havrilla,
Pranav Tadepalli, Paula Vidas, Alexander Kranias,
John J. Nay, Kshitij Gupta, and Aran Komatsuzaki.
2023. ARB: advanced reasoning benchmark for large
language models. CoRR, abs/2307.13692.
Min Joon Seo, Hannaneh Hajishirzi, Ali Farhadi, Oren
Etzioni, and Clint Malcolm. 2015. Solving geometry
problems: Combining text and diagram interpretation.
In Proceedings of EMNLP, pages 1466–1476.
Paulo Shakarian, Abhinav Koyyalamudi, Noel Ngu, and
Lakshmivihari Mareedu. 2023. An independent evaluation
of ChatGPT on mathematical word problems
(MWP). In Proceedings of the AAAI 2023 Spring
Symposium on Challenges Requiring the Combination
of Machine Learning and Knowledge Engineering
(AAAI-MAKE 2023), Hyatt Regency, San Francisco
Airport, California, USA, March 27-29, 2023,
volume 3433 of CEUR Workshop Proceedings.
Alessandro Stolfo, Zhijing Jin, Kumar Shridhar, Bernhard
Schölkopf, and Mrinmaya Sachan. 2023. A
causal framework to quantify the robustness of mathematical
reasoning with language models. In Proceedings
of ACL, pages 545–561.
Ross Taylor, Marcin Kardas, Guillem Cucurull, Thomas
Scialom, Anthony Hartshorn, Elvis Saravia, Andrew
Poulton, Viktor Kerkez, and Robert Stojnic.
2022. Galactica: A large language model for science.
CoRR, abs/2211.09085.
Alberto Testolin. 2023. Can neural networks do
arithmetic? A survey on the elementary numerical skills
of state-of-the-art deep learning models. CoRR,
abs/2303.07735.
Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier
Martinet, Marie-Anne Lachaux, Timothée Lacroix,
Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal
Azhar, Aurélien Rodriguez, Armand Joulin, Edouard
Grave, and Guillaume Lample. 2023a. Llama: Open
and efficient foundation language models. CoRR,
abs/2302.13971.
Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert,
Amjad Almahairi, Yasmine Babaei, Nikolay
Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti
Bhosale, Dan Bikel, Lukas Blecher, Cristian Canton-Ferrer,
Moya Chen, Guillem Cucurull, David Esiobu,
Jude Fernandes, Jeremy Fu, Wenyin Fu, Brian Fuller,
Cynthia Gao, Vedanuj Goswami, Naman Goyal, Anthony
Hartshorn, Saghar Hosseini, Rui Hou, Hakan
Inan, Marcin Kardas, Viktor Kerkez, Madian Khabsa,
Isabel Kloumann, Artem Korenev, Punit Singh Koura,
Marie-Anne Lachaux, Thibaut Lavril, Jenya Lee, Diana
Liskovich, Yinghai Lu, Yuning Mao, Xavier Martinet,
Todor Mihaylov, Pushkar Mishra, Igor Molybog,
Yixin Nie, Andrew Poulton, Jeremy Reizenstein,
Rashi Rungta, Kalyan Saladi, Alan Schelten,
Ruan Silva, Eric Michael Smith, Ranjan Subramanian,
Xiaoqing Ellen Tan, Binh Tang, Ross Taylor,
Adina Williams, Jian Xiang Kuan, Puxin Xu,
Zheng Yan, Iliyan Zarov, Yuchen Zhang, Angela Fan,
Melanie Kambadur, Sharan Narang, Aurélien Rodriguez,
Robert Stojnic, Sergey Edunov, and Thomas
Scialom. 2023b. Llama 2: Open foundation and
fine-tuned chat models. CoRR, abs/2307.09288.
Trieu Trinh, Yuhuai Wu, Quoc Le, He He, and Thang