Though end-to-end neural approaches have recently been dominating NLP tasks in both performance and ease of use, they lack interpretability and robustness. We propose Binder, a training-free neural-symbolic framework that maps the task input to a program, which (1) allows binding a unified API of language model (LM) functionalities to a programming language (e.g., SQL, Python) to extend its grammar coverage and thus tackle more diverse questions, (2) adopts an LM as both the program parser and the underlying model called by the API during execution, and (3) requires only a few in-context exemplar annotations. Specifically, we employ GPT-3 Codex as the LM. In the parsing stage, with only a few in-context exemplars, Codex is able to identify the part of the task input that cannot be answered by the original programming language, correctly generate API calls to prompt Codex to solve that unanswerable part, and decide where to place the API calls while remaining compatible with the original grammar. In the execution stage, Codex can perform versatile functionalities (e.g., commonsense QA, information extraction) given proper prompts in the API calls. Binder achieves state-of-the-art results on the WikiTableQuestions and TabFact datasets, with explicit output programs that aid human debugging. Note that previous best systems are all finetuned on tens of thousands of task-specific samples, while Binder uses only dozens of annotations as in-context exemplars without any training.
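To make the parsing stage concrete, the sketch below shows what a Binder program might look like for a table question that plain SQL cannot fully answer. The `QA("…"; column)` call syntax and the question are illustrative, not the paper's exact grammar; the regex shows how an executor could locate the embedded LM calls before prompting the model.

```python
import re

# Question: "Which country hosted the most recent event in North America?"
# Plain SQL cannot decide "in North America" from the table alone, so the
# parser inserts an LM API call (QA(...) is illustrative syntax here).
binder_program = (
    'SELECT country FROM events '
    'WHERE QA("is this country in North America?"; country) = \'yes\' '
    'ORDER BY year DESC LIMIT 1'
)

# The executor must locate each API call before it can prompt the LM.
API_CALL = re.compile(r'QA\("([^"]+)";\s*(\w+)\)')

def extract_api_calls(program: str):
    """Return (prompt, column) pairs for every embedded LM call."""
    return API_CALL.findall(program)

print(extract_api_calls(binder_program))
# [('is this country in North America?', 'country')]
```

The rest of the program stays valid SQL, which is what lets a standard interpreter execute it once the API calls are resolved.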
Binder first prompts Codex to parse a question input into a Binder program, in which Codex has to decide (1) which parts of the input can be converted to the target programming language (grey clause in the figure), (2) the corresponding task API calls (blue clause in the figure) to prompt Codex to resolve the other parts, and (3) where to insert the API calls in the Binder program. Next, Binder prompts Codex again to generate answers to the task API calls (given the generated task prompts), integrates the generated results back into the programming language, and executes the resulting programming language expression to derive the final answer. In this way, Binder enables flexible functionality integration into the programming language to improve its grammar coverage while requiring only a few exemplar annotations.
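The execution stage described above can be sketched end to end with a mocked LM: each API call is prompted per row, its answers are materialized as a new column, the call is rewritten into a plain column reference, and the resulting SQL is executed. The `QA(...)` syntax, the `events` table, and `mock_lm` are all illustrative stand-ins, not the paper's actual implementation.

```python
import re
import sqlite3

def mock_lm(prompt: str, value: str) -> str:
    """Stand-in for the Codex call made during execution (illustrative)."""
    north_america = {"USA", "Canada", "Mexico"}
    return "yes" if value in north_america else "no"

def execute_binder(program: str, rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (country TEXT, year INTEGER)")
    conn.executemany("INSERT INTO events VALUES (?, ?)", rows)

    # Resolve the QA(...) call: prompt the LM for each row, store the
    # answers in a new column, then rewrite the call into a plain
    # column reference so the program becomes executable SQL.
    match = re.search(r'QA\("([^"]+)";\s*(\w+)\)', program)
    prompt, col = match.groups()
    conn.execute("ALTER TABLE events ADD COLUMN qa_result TEXT")
    for (val,) in conn.execute(f"SELECT {col} FROM events").fetchall():
        conn.execute(f"UPDATE events SET qa_result = ? WHERE {col} = ?",
                     (mock_lm(prompt, val), val))
    sql = program[:match.start()] + "qa_result" + program[match.end():]
    return conn.execute(sql).fetchall()

rows = [("USA", 1996), ("Greece", 2004), ("Canada", 2010)]
program = ('SELECT country FROM events '
           'WHERE QA("is this country in North America?"; country) = \'yes\' '
           'ORDER BY year DESC LIMIT 1')
print(execute_binder(program, rows))  # [('Canada',)]
```

Because the final step is ordinary SQL execution, the intermediate program is fully inspectable, which is what makes the output debuggable by humans.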
The Binder framework achieves SOTA or comparable performance on three benchmarks with only dozens of program annotations and no training!
Further, Binder is much more robust to large or noisy inputs than end-to-end approaches.
@article{Binder,
title={Binding Language Models in Symbolic Languages},
author={Zhoujun Cheng and Tianbao Xie and Peng Shi and Chengzu Li and Rahul Nadkarni and Yushi Hu and Caiming Xiong and Dragomir Radev and Mari Ostendorf and Luke Zettlemoyer and Noah A. Smith and Tao Yu},
journal={ICLR},
year={2023}
}