An even more memory-efficient fine-tuning technique that combines quantisation with LoRA, making it possible to train large models on consumer hardware.
QLoRA (Quantized LoRA) extends LoRA by loading the base model in 4-bit quantised form, dramatically reducing memory requirements. This enables fine-tuning of models that would otherwise require expensive enterprise GPUs.
How QLoRA works:
- The base model's weights are quantised to 4-bit and frozen; they are never updated during training.
- Small LoRA adapter matrices are added in higher precision (typically 16-bit) and are the only trainable parameters.
- During the forward pass, the 4-bit weights are dequantised on the fly for computation, and gradients flow through them into the adapters.
- After training, only the small adapters need to be saved; they are loaded alongside (or merged into) the base model.
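A minimal numerical sketch of this forward pass, using plain NumPy. It uses simple absmax 4-bit quantisation for clarity (the real method uses NF4), and the sizes, rank `r`, and scale `alpha` are illustrative, not any library's API:

```python
import numpy as np

def quantise_4bit(w):
    """Absmax quantisation to 4-bit signed integers (illustrative; QLoRA uses NF4)."""
    scale = np.abs(w).max() / 7.0                # 4-bit signed range: -8..7
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantise(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
d_in, d_out, r = 64, 64, 8                       # hypothetical sizes and LoRA rank
W = rng.normal(0, 0.02, (d_out, d_in)).astype(np.float32)  # base weight, frozen

Wq, s = quantise_4bit(W)                         # stored in 4-bit, never updated
A = rng.normal(0, 0.01, (r, d_in)).astype(np.float32)      # trainable adapter
B = np.zeros((d_out, r), dtype=np.float32)                 # trainable, init to zero
alpha = 16.0

x = rng.normal(size=(d_in,)).astype(np.float32)
# Forward pass: dequantise the base weight on the fly, add the low-rank delta.
y = dequantise(Wq, s) @ x + (alpha / r) * (B @ (A @ x))
print(y.shape)
```

Because B starts at zero, the adapter's contribution is exactly zero at initialisation, so training begins from the base model's behaviour.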
Key innovations:
- 4-bit NormalFloat (NF4): a quantisation data type whose levels are tuned for normally distributed weights.
- Double quantisation: the quantisation constants themselves are quantised, saving further memory.
- Paged optimisers: optimiser state is paged between GPU and CPU memory to survive memory spikes during training.
Memory requirements (example):
- A 70B-parameter model stored in 16-bit needs roughly 140 GB for the weights alone; in 4-bit, roughly 35 GB.
- The LoRA adapters and their optimiser state add only hundreds of megabytes to a few gigabytes, depending on rank.
- The QLoRA paper reports fine-tuning a 65B model on a single 48 GB GPU, versus more than 780 GB for regular 16-bit fine-tuning.
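The back-of-the-envelope arithmetic behind figures like these is simple bytes-per-parameter accounting; the layer count, hidden size, and rank below are illustrative values for a 70B-class model, not exact specifications:

```python
params = 70e9                      # 70B-parameter model (illustrative)

fp16_gb = params * 2 / 1e9         # 16-bit: 2 bytes per weight
nf4_gb = params * 0.5 / 1e9        # 4-bit: 0.5 bytes per weight (ignoring scale overhead)

# LoRA adapters: two small matrices (d x r and r x d) per adapted weight matrix.
d, r, layers, mats = 8192, 16, 80, 7        # hypothetical shapes: hidden size,
adapter_params = layers * mats * 2 * d * r  # rank, layers, matrices per layer
adapter_gb = adapter_params * 2 / 1e9       # adapters kept in 16-bit

print(fp16_gb, nf4_gb, adapter_gb)
```

Even with generous assumptions, the trainable adapters are a rounding error next to the base weights, which is why quantising the frozen base dominates the memory savings.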
QLoRA makes custom model training possible on a single GPU, dramatically lowering the barrier to custom AI development.
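In practice this is commonly set up with the Hugging Face transformers, peft, and bitsandbytes libraries. A hedged configuration sketch: the model id, rank, and target module names are illustrative, and exact arguments can vary across library versions:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit NF4 quantisation with double quantisation, computing in bfloat16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model_name = "meta-llama/Llama-2-7b-hf"   # illustrative model id
model = AutoModelForCausalLM.from_pretrained(
    model_name, quantization_config=bnb_config
)
model = prepare_model_for_kbit_training(model)

# Low-rank adapters on the attention projections (target modules vary by model).
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()        # typically well under 1% trainable
```

From here, the wrapped model can be passed to a standard training loop or trainer; only the adapter weights receive gradient updates.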
QLoRA enables us to fine-tune large models for Australian businesses without requiring expensive cloud GPU clusters, making custom AI accessible.
"Fine-tuning a 70B-parameter model on a single 48 GB GPU using QLoRA, producing a highly capable custom model at a fraction of the usual cost."