TY - GEN
T1 - Can Large Language Models Reason About Goal-Oriented Tasks?
AU - Bellos, Filippos
AU - Li, Yayuan
AU - Liu, Wuao
AU - Corso, Jason J.
N1 - Publisher Copyright:
© 2024 Association for Computational Linguistics.
PY - 2024
Y1 - 2024
AB - Most adults can complete a sequence of steps to achieve a certain goal, such as making a sandwich or repairing a bicycle tire. In completing these goal-oriented tasks, or simply tasks in this paper, one must use sequential reasoning to understand the relationship between the sequence of steps and the goal. Large language models (LLMs) have shown impressive capabilities across various natural language understanding tasks. However, prior work has mainly focused on logical reasoning tasks (e.g., arithmetic, commonsense QA); how well LLMs perform on more complex reasoning tasks like sequential reasoning is not clear. In this paper, we address this gap and conduct a comprehensive evaluation of how well LLMs are able to conduct this reasoning for tasks and how they scale with respect to multiple dimensions (e.g., adaptive prompting strategies, number of in-context examples, varying complexity of the sequential task). Our findings reveal that while Chain of Thought (CoT) prompting can significantly enhance LLMs' sequential reasoning in certain scenarios, it can also be detrimental in others, whereas Tree of Thoughts (ToT) reasoning is less effective for this type of task. Additionally, we discover that an increase in model size or in-context examples does not consistently lead to improved performance.
UR - http://www.scopus.com/inward/record.url?scp=85190265832&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85190265832&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85190265832
T3 - SCALE-LLM 2024 - 1st Edition of the Workshop on the Scaling Behavior of Large Language Models, Proceedings of the Workshop
SP - 24
EP - 34
BT - SCALE-LLM 2024 - 1st Edition of the Workshop on the Scaling Behavior of Large Language Models, Proceedings of the Workshop
A2 - Miceli-Barone, Antonio Valerio
A2 - Barez, Fazl
A2 - Cohen, Shay B.
A2 - Voita, Elena
A2 - Germann, Ulrich
A2 - Lukasik, Michal
T2 - 1st Workshop on the Scaling Behavior of Large Language Models, SCALE-LLM 2024
Y2 - 22 March 2024
ER -