
Gemma – Google’s new step

Introduction

Recently, Google introduced a new language model that promises many breakthroughs in natural language processing. Gemma is a modern, lightweight, open model family built from the research and technology used to create the Gemini models. Gemma models demonstrate strong performance across academic benchmarks for language understanding, reasoning, and safety.

Google releases two model sizes (2 billion and 7 billion parameters) and provides both pretrained and fine-tuned checkpoints. Gemma outperforms similarly sized open models on 11 of 18 text-based tasks, and Google presents comprehensive evaluations of the safety and responsibility aspects of the models, along with a detailed description of the model development process.

According to documents published by Google, Gemma models were trained on up to 6 trillion tokens of text, using architectures, data, and training recipes similar to those of the Gemini model family. Like Gemini, these models achieve strong generalist capabilities in the text domain, together with advanced understanding and reasoning skills at scale. Google provides both pre-trained and fine-tuned checkpoints, as well as an open-source code base for inference and serving.

About the model

The Gemma model architecture is based on a transformer decoder with enhancements including multi-query attention (used by the 2B model), multi-head attention (used by the 7B model), rotary positional embeddings (RoPE), GeGLU activations, and a revised normalizer location.
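For readers unfamiliar with GeGLU, a minimal sketch of a GeGLU feed-forward block is shown below. It is written with PyTorch for illustration only; the layer sizes and names are assumptions, not Gemma's actual implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class GeGLUFeedForward(nn.Module):
    """Sketch of a GeGLU feed-forward block: GELU(x W_gate) * (x W_up), then a down projection."""
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.gate_proj = nn.Linear(d_model, d_hidden, bias=False)  # gated branch
        self.up_proj = nn.Linear(d_model, d_hidden, bias=False)    # linear branch
        self.down_proj = nn.Linear(d_hidden, d_model, bias=False)  # back to model width

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # GeGLU: GELU of the gate branch, multiplied elementwise with the linear branch
        return self.down_proj(F.gelu(self.gate_proj(x)) * self.up_proj(x))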

Instruction Tuning

Google fine-tunes Gemma 2B and 7B with:

  • Supervised fine-tuning: on a mix of text-only, English-only, human-generated prompt-response pairs (a short illustrative sketch of this step follows the list).
  • Reinforcement learning from human feedback (RLHF): the reward model is trained on labeled, English-only preference data, and the policy is trained on a set of prompts.
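As an illustration of the supervised step (not Google's actual training code), the loss in this kind of fine-tuning is typically computed only on the response tokens, with the prompt tokens masked out:

import torch
import torch.nn.functional as F

def sft_loss(logits: torch.Tensor, input_ids: torch.Tensor, prompt_len: int) -> torch.Tensor:
    """Cross-entropy on one prompt-response pair: logits is (seq_len, vocab), input_ids is (seq_len,)."""
    labels = input_ids.clone()
    labels[:prompt_len] = -100                # ignore the prompt portion of the sequence
    # shift by one position so that position t predicts token t+1
    return F.cross_entropy(logits[:-1], labels[1:], ignore_index=-100)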

Prompt Format

Google refines these models with a formatter that annotates all instruction tuning examples with additional information. It serves two purposes (a small formatting sketch follows the list):

  1. indicate the roles in a conversation, such as the user role;
  2. delineate the turns in a conversation, especially in multi-turn conversations.
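Based on the control tokens and role names visible in the prompt examples below, such a formatter can be sketched roughly like this (an illustration only, not Google's internal formatter):

# Minimal sketch of a turn formatter for the Gemma instruction-tuned models.
# The <start_of_turn>/<end_of_turn> control tokens and the user/model role
# names match the prompt examples below; the function itself is illustrative.
def format_conversation(turns: list[tuple[str, str]]) -> str:
    """turns: list of (role, text) pairs, where role is 'user' or 'model'."""
    prompt = ""
    for role, text in turns:
        prompt += f"<start_of_turn>{role}\n{text}<end_of_turn>\n"
    # end with an open model turn so the model knows it should respond next
    prompt += "<start_of_turn>model\n"
    return prompt

print(format_conversation([("user", "Explain why the leaf is green")]))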

An example of a conversation between the model and the user is shown below.

To prompt Gemma 7B effectively, we must first understand how to use its prompt template. For zero-shot prompting, for example, we only need to create a prompt like this:


<start_of_turn>user
Explain why the leaf is green<end_of_turn>
<start_of_turn>model

In addition, we can add a few instructions so that the model produces results consistent with our expectations (see more at [2]):


<start_of_turn>user
Answer the following question in a concise and informative manner:
 
Explain why the leaf is green<end_of_turn>
<start_of_turn>model
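As a quick usage sketch (assuming the Hugging Face transformers library and the google/gemma-7b-it checkpoint, neither of which is covered in detail in the original post), the formatted prompt can be passed to the model directly:

# Hedged sketch: running the prompt above with Hugging Face transformers.
# The model id, dtype, and generation settings are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-7b-it"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

prompt = (
    "<start_of_turn>user\n"
    "Answer the following question in a concise and informative manner:\n\n"
    "Explain why the leaf is green<end_of_turn>\n"
    "<start_of_turn>model\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))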

Results

Gemma 7B’s language understanding and generation performance across different capabilities (question answering, reasoning, math/science, coding) shows the superiority of this model over other models of the same size. Notably, Gemma 7B leads in the latter three areas and trails LLaMA2’s 13B model by only a very small margin.

Human Preference Evaluations

Additionally, the study evaluated the final release candidates against the Mistral v0.2 7B Instruct model in human preference studies. Gemma 7B IT achieved a positive win rate of 51.7% on creative writing and coding tasks, and a 58% win rate on basic safety protocols, compared with Mistral v0.2 7B Instruct.

Automated Benchmarks

Gemma models have been evaluated across a variety of domains, including physical reasoning, social reasoning, question answering, coding, mathematics, commonsense reasoning, language modeling, and reading comprehension.

Gemma 2B and 7B models are compared with several external open-source LLMs on academic benchmarks. On MMLU, Gemma 7B outperforms all open-source alternatives of the same or smaller size, as well as several larger models, including LLaMA2 13B.

However, there is still room for improvement before reaching human-level performance. Gemma models also stand out on mathematics and coding benchmarks, surpassing the code-tuned CodeLLaMA-7B model on MBPP.

Memorization evaluations

Recent findings show that large language models are vulnerable to new adversarial attacks (see also [3]). By exploiting the memorization behavior of large models, attackers can extract unsafe and sensitive information from the datasets on which those models were trained.

Gemma does not memorize personal or sensitive information. To demonstrate this, Google uses its Cloud Data Loss Prevention tool to identify possible instances of personal data. The tool assigns three severity levels based on the type of personal data, the highest being “sensitive”. The team then measures how much of the memorized output contains any personal or sensitive data, and observed no cases of memorized sensitive data.
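For illustration, an exact-memorization check of the kind used in such evaluations can be sketched as follows; the prefix/suffix split sizes and the generate_continuation helper are hypothetical placeholders, not part of Google's evaluation pipeline:

# Illustrative sketch of an exact-memorization test: prompt the model with a
# prefix taken from a training document and check whether its continuation
# reproduces the true continuation verbatim.
def is_exactly_memorized(document_tokens: list[int],
                         generate_continuation,   # hypothetical callable wrapping the model
                         prefix_len: int = 50,
                         suffix_len: int = 50) -> bool:
    prefix = document_tokens[:prefix_len]
    true_suffix = document_tokens[prefix_len:prefix_len + suffix_len]
    generated = generate_continuation(prefix, max_new_tokens=suffix_len)
    return generated[:suffix_len] == true_suffix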

Conclusion

Gemma models improve performance across many areas, including dialogue, reasoning, mathematics, and code generation.

It can be seen that Gemma benefits from many lessons of the Gemini model program, including code, data, instruction tuning, reinforcement learning from human feedback (RLHF), and more.

Source : https://viblo.asia/p/gemma-buoc-tien-moi-cua-google-zOQJw5MgVMP
