DeepSeek V4 - almost on the frontier, a fraction of the price
Chinese AI lab DeepSeek's last model release was V3.2 (and V3.2 Speciale) last December. They just dropped the first of their hotly anticipated V4 series in the shape of two preview models, DeepSeek-V4-Pro and DeepSeek-V4-Flash. Both are Mixture of Experts models with 1 million token context windows: Pro has 1.6T total parameters with 49B active, while Flash has 284B total with 13B active. They're using the standard MIT license. I think this makes DeepSeek-V4-Pro the new largest open weights model. It's larger than Kimi…