Abstract: Large language models (LLMs) are widely used and achieve exceptional performance, but their prohibitive size and cost limit deployment on edge devices. Compound AI systems combine several specialized ...