By Alexander Chiejina with wire reports
Google has unveiled PaLM 2, its next generation large language model that builds on Google’s legacy of breakthrough research in machine learning and responsible AI.
It excels at advanced reasoning tasks, including code and maths, classification and question answering, translation and multilingual proficiency, and natural language generation better than our previous state-of-the-art LLMs, including PaLM.
It can accomplish these tasks because of the way it was built – bringing together compute-optimal scaling, an improved dataset mixture, and model architecture improvements.
PaLM 2 is grounded in Google’s approach to building and deploying AI responsibly. It was evaluated rigorously for its potential harms and biases, capabilities and downstream uses in research and in-product applications. It is being used in other state-of-the-art models, like Med-PaLM 2 and Sec-PaLM, and is powering generative AI features and tools at Google, like Bard and the PaLM API.
What PaLM2 can do
PaLM 2 can decompose a complex task into simpler subtasks and is better at understanding nuances of the human language than previous LLMs, like PaLM. For example, PaLM 2 excels at understanding riddles and idioms, which requires understanding ambiguous and figurative meaning of words, rather than the literal meaning. PaLM 2 was pre-trained on parallel multilingual text and on a much larger corpus of different languages than its predecessor, PaLM. This makes PaLM 2 excel at multilingual tasks.
Building PaLM 2
PaLM 2 excels at tasks like advanced reasoning, translation, and code generation because of how it was built. It improves upon its predecessor, PaLM, by unifying three distinct research advancements in large language models:
The basic idea of compute-optimal scaling is to scale the model size and the training dataset size in proportion to each other. This new technique makes PaLM 2 smaller than PaLM, but more efficient with overall better performance, including faster inference, fewer parameters to serve, and a lower serving cost.
Previous LLMs, like PaLM, used pre-training datasets that were mostly English-only text. PaLM 2 improves on its corpus with a more multilingual and diverse pre-training mixture, which includes hundreds of human and programming languages, mathematical equations, scientific papers, and web pages. PaLM 2 has an improved architecture and was trained on a variety of different tasks, all of which helps PaLM 2 learn different aspects of language.
Evaluating PaLM 2
PaLM 2 achieves state of the art results on reasoning benchmark tasks such as WinoGrande and BigBench-Hard. It is significantly more multilingual than our previous large language model, PaLM, achieving better results on benchmarks such as XSum, WikiLingua and XLSum. PaLM 2 also improves translation capability over PaLM and Google Translate in languages like Portuguese and Chinese.
PaLM 2 continues our responsible AI development and commitment to safety. PaLM 2 demonstrates improved multilingual toxicity classification capabilities, and has built-in control over toxic generation.
Potential harms and bias were evaluated across a range of potential downstream uses for PaLM 2, including dialog, classification, translation, and question answering. This includes developing new evaluations for measuring potential harms in generative question-answering settings and dialog settings related to toxic language harms and social bias related to identity terms.