May 10 news: On the 2nd of this month, IBM introduced one of the smallest versions in its Granite 4.0 family of models: a preview version of Granite 4.0 Tiny.

The advantages of Granite 4.0 Tiny Preview include high computational efficiency and low memory requirements. At FP8 precision, it needs only 12GB of GPU memory to run five concurrent sessions, each with a 128K-token context window, which puts it within reach of consumer graphics cards such as the NVIDIA GeForce RTX 3060 12GB, with a suggested retail price of $329 (note: about 2,383 yuan at current exchange rates).
Granite 4.0 Tiny is planned to be trained on at least 15T tokens, while the current preview version has been trained on only 2.5T. Even so, it already delivers performance comparable to Granite 3.3 2B Instruct, which was trained on 12T tokens, while reducing memory requirements by about 72% for 16 concurrent sessions with 128K-token context windows. Its final performance is expected to be comparable to that of Granite 3.3 8B Instruct.
Granite 4.0 Tiny Preview has 7B total parameters, of which 1B are active, and is based on the hybrid Mamba-2 / Transformer architecture adopted by the entire Granite 4.0 family. The hybrid design combines the speed and accuracy of the two architectures and lowers memory consumption without a significant loss of performance.
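To put the quoted figures in proportion, the following is a rough back-of-envelope sketch in Python rather than anything published by IBM. It uses only numbers stated in this article, and it assumes that FP8 stores one byte per parameter, that "GB" means 10^9 bytes, and that runtime overhead such as activations and caches can be ignored for a first estimate. All variable names are illustrative.

```python
# Back-of-envelope arithmetic using only figures quoted in the article.
# Assumptions (not from IBM): FP8 = 1 byte per parameter, 1 GB = 10**9 bytes,
# and runtime overhead (activations, caches, framework buffers) is ignored.

TOTAL_PARAMS = 7e9        # total parameters (7B)
ACTIVE_PARAMS = 1e9       # active parameters (1B)
TOKENS_TRAINED = 2.5e12   # tokens seen by the preview version (2.5T)
TOKENS_PLANNED = 15e12    # planned minimum training tokens (15T)
GPU_MEMORY_GB = 12        # memory of the RTX 3060 12GB
BYTES_PER_PARAM_FP8 = 1   # assumption: 8 bits per weight

# Share of the model's parameters that are active.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

# Share of the planned training completed; an upper bound, since the plan
# is "at least" 15T tokens.
training_fraction = TOKENS_TRAINED / TOKENS_PLANNED

# Estimated size of the weights alone at FP8, and what remains on a 12GB card.
weights_gb = TOTAL_PARAMS * BYTES_PER_PARAM_FP8 / 1e9
headroom_gb = GPU_MEMORY_GB - weights_gb

print(f"Active parameter share: {active_fraction:.1%}")
print(f"Training completed:     {training_fraction:.1%} (at most)")
print(f"FP8 weights:            {weights_gb:.1f} GB")
print(f"Headroom on 12GB card:  {headroom_gb:.1f} GB")
```

Under these assumptions, roughly one seventh of the parameters are active, the preview has seen at most about one sixth of its planned training data, and the weights alone would occupy about 7GB of a 12GB card. The remaining headroom is only an estimate, since real memory use depends on the runtime and on how the concurrent sessions are served.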
The preview version of Granite 4.0 Tiny is now available on Hugging Face under the standard Apache 2.0 license. IBM will officially launch the Tiny, Small, and Medium versions of the Granite 4.0 family of models this summer.