Bits With Brains
Curated AI News for Decision-Makers
What Every Senior Decision-Maker Needs to Understand About AI and Its Impact
Open-Weight Release of Mixtral 8x7B
12/24/23
Editorial team at Bits with Brains
Mixtral 8x7B is a new model from Mistral AI. It is a sparse mixture-of-experts (MoE) model: rather than eight separate models bundled together, it is a single network in which every layer contains eight expert feed-forward blocks.
A learned router selects which experts to use: for each token, at every layer, it activates two of the eight. The model holds roughly forty-seven billion parameters in total, but only about thirteen billion are active for any given token, so it carries the inference cost of a much smaller model. As a result, it matches or outperforms Llama 2 70B, a seventy-billion-parameter dense model, and GPT-3.5 on most benchmarks, while running about four times faster than Llama 2 70B. It supports a context length of 32,000 tokens.
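For readers who want to see the mechanism, the sketch below shows what top-2 routing looks like in code. It is an illustrative PyTorch toy, not Mistral's implementation: the class name, the small default dimensions, and the shape of the expert blocks are assumptions chosen for clarity.

```python
# Illustrative top-2 mixture-of-experts layer (toy dimensions, not Mistral's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top2MoELayer(nn.Module):
    def __init__(self, hidden_size=256, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(hidden_size, num_experts, bias=False)
        # Each "expert" is a feed-forward block; in Mixtral the experts are
        # feed-forward sub-layers inside one model, not separate models.
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(hidden_size, 4 * hidden_size),
                nn.SiLU(),
                nn.Linear(4 * hidden_size, hidden_size),
            )
            for _ in range(num_experts)
        )

    def forward(self, x):                        # x: (num_tokens, hidden_size)
        logits = self.router(x)                  # score all eight experts per token
        weights, chosen = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)     # blend the two chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = chosen[:, slot] == e      # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

layer = Top2MoELayer()
tokens = torch.randn(10, 256)                    # ten tokens of hidden size 256
print(layer(tokens).shape)                       # torch.Size([10, 256])
```

The key point is visible in the forward pass: every token is scored against all eight experts, but only the two highest-scoring ones do any computation for that token.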
Its multilingual capabilities cover English, French, Italian, German, and Spanish, though it does not yet support Asian languages. It is also strong at code generation and can be fine-tuned into an instruction-following model, with scores that are competitive with or surpass GPT-3.5 on standard benchmarks.
The MoE approach contrasts with traditional dense models, which use all of their parameters for every token and are therefore slower and more expensive to run at a comparable level of capability. With MoE, the model can be made effectively larger (in total parameter count) without a proportional increase in computation during inference, because only a fraction of the parameters are active at any one time.
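A rough back-of-envelope calculation makes the saving concrete. The figures below are approximations based on the published Mixtral 8x7B configuration (32 layers, hidden size 4096, expert feed-forward width 14336, 8 experts per layer with 2 active per token); the attention and embedding terms are rounded estimates.

```python
# Rough parameter arithmetic for Mixtral 8x7B; all figures are approximate.
LAYERS, HIDDEN, FFN_WIDTH, EXPERTS, ACTIVE = 32, 4096, 14336, 8, 2

expert_params = 3 * HIDDEN * FFN_WIDTH       # one SwiGLU expert: gate, up, down projections
attention_params = LAYERS * 42e6             # rounded per-layer attention weights
embedding_params = 2 * 32000 * HIDDEN        # input and output embeddings, 32k vocabulary

total  = LAYERS * EXPERTS * expert_params + attention_params + embedding_params
active = LAYERS * ACTIVE  * expert_params + attention_params + embedding_params

print(f"parameters stored in memory: ~{total / 1e9:.0f}B")   # ~47B
print(f"parameters used per token:   ~{active / 1e9:.0f}B")  # ~13B
```

Note that the memory footprint is still that of the full roughly 47-billion-parameter model; the saving is in compute per token, not in storage.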
The model, while not strictly open-source, has been released as an open-weight model under the Apache 2.0 license: the training code and datasets are not provided, but the weights themselves are freely available for the public to use and fine-tune.
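For teams that want to try the weights directly, the snippet below shows one common way to load them, using the Hugging Face transformers library and the model identifier Mistral published. Treat it as a sketch: hardware requirements are substantial (roughly 90 GB of GPU memory in half precision unless quantized), and the prompt and generation settings are placeholders.

```python
# Sketch: loading the open Mixtral weights with Hugging Face transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"   # instruction-tuned variant

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,   # half precision; still ~90 GB of GPU memory unquantized
    device_map="auto",           # spread the layers across available GPUs
)

prompt = "Explain the difference between dense and mixture-of-experts models."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because the license is Apache 2.0, the same weights can also be fine-tuned on private data, which is the main practical upshot of an open-weight release.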
The release of Mixtral 8x7B has sparked discussions about the future of open-source AI, as it demonstrates that open-weight models can be highly competitive with closed-source offerings. This could democratize access to powerful AI tools, allowing a broader community of developers and researchers to innovate and build on the model.