It's the same problem as with Fireworks: the only models supporting LoRA are roughly year-old dense models that perform horribly on most tasks. If you want to do anything close to relevant, you still need to rent or own dedicated GPUs, which seems insane to me when vLLM fully supports dynamic LoRA loading.
Disclaimer: I work at Cloudflare, but not on this.