Bits With Brains
Curated AI News for Decision-Makers
What Every Senior Decision-Maker Needs to Understand About AI and its Impact
Yet Another Powerful New Open Source Model: Solar 10.7B
1/7/24
Editorial team at Bits with Brains
Solar 10.7B is a large language model (LLM) that has posted strong results across a wide range of benchmarks. Despite having only 10.7 billion parameters, it outperforms models with up to 30 billion parameters, including the recent Mixtral 8x7B.
Solar 10.7B does not introduce a new architecture. Like the Goliath 120 billion parameter model, it is assembled by stitching together layers from existing models.
The Solar 10.7B model introduces a technique called depth up-scaling. It starts from a base model, in this case a 32-layer Llama 2 architecture, initialized with pre-trained weights from Mistral 7B. Two copies of this base are then made: the top eight layers are removed from the first copy and the bottom eight layers from the second. The remaining layers are concatenated into a new 48-layer model with roughly 10.7 billion parameters. This technique is said to rival approaches such as mixture-of-experts.
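For readers who want a concrete picture, the sketch below illustrates the depth up-scaling idea using the Hugging Face transformers API. It is a simplified, hypothetical reconstruction rather than Upstage's actual code: the model name and layer counts come from the description above, and a real merge would also need to re-index layers and continue pre-training to heal the seam.

```python
# Hypothetical sketch of depth up-scaling (DUS) on a Mistral/Llama-style model
# whose decoder layers live in model.model.layers. Not Upstage's actual code.
import copy
import torch.nn as nn
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")

layers = base.model.layers          # 32 decoder layers in the 7B base
n, k = len(layers), 8               # drop 8 layers from each copy

bottom = [copy.deepcopy(l) for l in layers[: n - k]]  # layers 0..23 (top 8 removed)
top = [copy.deepcopy(l) for l in layers[k:]]          # layers 8..31 (bottom 8 removed)

base.model.layers = nn.ModuleList(bottom + top)       # 48 layers, ~10.7B parameters
base.config.num_hidden_layers = len(base.model.layers)

# The merged network is then continually pre-trained to "heal" the seam
# before instruction and alignment tuning.
```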
The model has shown remarkable performance in various tests. After continued pre-training, it was fine-tuned in two stages: instruction fine-tuning followed by alignment tuning via DPO (Direct Preference Optimization). The authors also report minimal data contamination in their benchmark results. As always, though, it is best to test these models on your own applications.
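As a rough illustration of what DPO optimizes during alignment tuning, the snippet below sketches the standard DPO loss over a batch of preference pairs. The function name, beta value, and tensor inputs are illustrative placeholders, not details of Upstage's training pipeline.

```python
# Minimal sketch of the Direct Preference Optimization (DPO) objective.
# Inputs are summed log-probabilities of chosen/rejected responses under the
# policy being tuned and under a frozen reference model.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    # How much more the policy prefers the chosen response over the rejected one...
    policy_margin = policy_chosen_logps - policy_rejected_logps
    # ...relative to the frozen reference model.
    ref_margin = ref_chosen_logps - ref_rejected_logps
    # Maximize the log-odds that the policy widens this margin.
    return -F.logsigmoid(beta * (policy_margin - ref_margin)).mean()
```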
Solar 10.7B has been tested in various scenarios, including logical reasoning, creative writing, and programming. In creative writing, the model produced coherent, contextually appropriate responses. Its performance on logical reasoning was less consistent, a common shortcoming of smaller models. In programming, it generated correct code for simple tasks.
The Solar 10.7B model, with its depth up-scaling technique, presents an interesting alternative to the mixture-of-experts approach, and it currently sits among the best-performing models on the Hugging Face Open LLM Leaderboard. The future may see depth up-scaling combined with mixture-of-experts, potentially leading to even more powerful small models.
Sources:
[1] https://huggingface.co/upstage/SOLAR-10.7B-v1.0
[2] https://www.reddit.com/r/LocalLLaMA/comments/17rsmox/goliath120b_quants_and_future_plans/?rdt=58835