Petals
Petals is a decentralized platform for running inference and fine-tuning on large language models such as BLOOM-176B. Single-batch inference takes about one second per step (token), and parallel inference can reach hundreds of tokens per second. Petals offers more than a traditional language model API: you can apply your own fine-tuning and sampling methods, inspect hidden states, and execute custom paths through the model. It also provides a convenient PyTorch API and is part of the BigScience research workshop project.
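
To put the two speed figures side by side, here is a rough back-of-the-envelope sketch in plain Python. It does not call Petals at all. The function names `aggregate_throughput` and `generation_time` are hypothetical, and the calculation assumes that each step emits one token per sequence and that step latency stays at about one second as more sequences run in parallel, which is an idealization rather than a measured property of the network.

```python
def aggregate_throughput(step_seconds: float, parallel_sequences: int) -> float:
    """Estimate total tokens per second across all sequences.

    Assumes one token per sequence per step and a step latency that
    does not grow with the number of parallel sequences (idealized).
    """
    if step_seconds <= 0:
        raise ValueError("step_seconds must be positive")
    if parallel_sequences < 1:
        raise ValueError("parallel_sequences must be at least 1")
    return parallel_sequences / step_seconds


def generation_time(new_tokens: int, step_seconds: float) -> float:
    """Estimate wall-clock seconds to generate new_tokens for one sequence."""
    if new_tokens < 0:
        raise ValueError("new_tokens must be non-negative")
    return new_tokens * step_seconds


if __name__ == "__main__":
    # One sequence at roughly one second per step.
    print(aggregate_throughput(step_seconds=1.0, parallel_sequences=1))
    # Two hundred sequences processed in parallel at the same step latency.
    print(aggregate_throughput(step_seconds=1.0, parallel_sequences=200))
    # Time for a single sequence to produce a 50-token reply.
    print(generation_time(new_tokens=50, step_seconds=1.0))
```

Under these assumptions, a single interactive session sees about one token per second, while throughput in the hundreds of tokens per second comes from processing many sequences at once rather than from any one sequence running faster.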