Moore Threads has released a MUSA port of vLLM, the open-source high-throughput inference framework for large language models, giving developers a worked example of porting an open-source project to Moore Threads' full-featured GPUs as the company builds out the MUSA application ecosystem.

vLLM is an efficient and easy-to-use framework for large-model inference and serving, and it is widely used with a broad range of large language models. Moore Threads has ported and adapted vLLM v0.4.2 and made the port fully open source. Thanks to the MUSA architecture and the software stack's strong CUDA compatibility, CUDA code can be migrated to the MUSA platform with the MUSIFY code-conversion tool, and CUDA-dependent libraries can be quickly replaced with their MUSA-accelerated counterparts. Because the MUSA software stack mirrors the CUDA software stack's interfaces, and with the utilities and scripts Moore Threads provides, application porting becomes more efficient and development cycles are shorter.
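To illustrate the kind of source-level translation a MUSIFY-style tool performs, the sketch below rewrites common CUDA identifiers to MUSA equivalents by textual substitution. This is not the real MUSIFY implementation; the mapping table is a small assumed subset (the `mublas` rename in particular is illustrative), shown only to convey the idea of CUDA-to-MUSA source conversion.

```python
# Hypothetical subset of the CUDA -> MUSA identifier mapping that a
# MUSIFY-style translator might apply; the real tool's table is far larger.
CUDA_TO_MUSA = {
    "cuda_runtime.h": "musa_runtime.h",
    "cudaMalloc": "musaMalloc",
    "cudaMemcpy": "musaMemcpy",
    "cudaFree": "musaFree",
    "cudaStream_t": "musaStream_t",
    "cublas": "mublas",  # assumed library rename, for illustration only
}

def musify(source: str) -> str:
    """Rewrite CUDA identifiers in `source` to their MUSA counterparts."""
    # Replace longer identifiers first so a shorter key never clobbers
    # the prefix of a longer one.
    for cuda_name in sorted(CUDA_TO_MUSA, key=len, reverse=True):
        source = source.replace(cuda_name, CUDA_TO_MUSA[cuda_name])
    return source

cuda_snippet = """#include <cuda_runtime.h>
float *buf;
cudaMalloc((void **)&buf, 1024);
cudaFree(buf);
"""

print(musify(cuda_snippet))
```

After this mechanical rewrite, the remaining porting work is mostly a matter of linking against the MUSA-accelerated libraries instead of their CUDA equivalents.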
Open-source repository:
https://github.com/MooreThreads/vLLM_musa
Paper:
https://arxiv.org/pdf/2309.06180
