GitHub - Cre4T3Tiv3/unsloth-llama3-alpaca-lora: Custom model training using modern architectures. 4-bit QLoRA fine-tuning pipeline for LLaMA 3 8B with production-grade optimization. Memory-efficient training on consumer GPUs. Published adapter on HuggingFace. From training pipeline to deployed model.

4-bit QLoRA fine-tuning pipeline for LLaMA 3 8B. From training configuration to published adapter on HuggingFace. Memory-efficient instruction tuning on consumer GPUs using Unsloth.

What This Is

An end-to-end pipeline for custom model training: dataset preparation, QLoRA fine-tuning, evaluation, and deployment. The adapter is trained on the Alpaca-cleaned instruction dataset plus grounded QLoRA reasoning examples added to mitigate hallucinations. Published and runnable on HuggingFace.

Training Configuration

Parameter	Value
Base Model	`unsloth/llama-3-8b-bnb-4bit`
Adapter Format	LoRA (merged post-training)
LoRA r / alpha / dropout	16 / 16 / 0.05
Epochs	2
Training Data	~2K examples (alpaca-cleaned + grounded)
Precision	4-bit (bitsandbytes)
Training Hardware	A100 (40GB)
Framework	Unsloth + HuggingFace PEFT
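
To make the LoRA settings above concrete, the following sketch estimates how many trainable parameters a rank-16 adapter adds. It assumes the publicly documented LLaMA 3 8B layer shapes (hidden size 4096, 32 layers, 1024-wide key/value projections, 14336-wide MLP) and assumes the adapter targets all seven attention and MLP projections, which the table does not state. `TARGET_SHAPES` and `lora_params` are illustrative names, not part of this repository.

```python
# Rough LoRA parameter count for LLaMA 3 8B.
# Assumption: layer shapes follow the public LLaMA 3 8B config.
# Assumption: all seven projections are adapted (not confirmed by this README).
R, ALPHA, NUM_LAYERS = 16, 16, 32

# (in_features, out_features) of each adapted linear layer in one decoder block
TARGET_SHAPES = {
    "q_proj": (4096, 4096),
    "k_proj": (4096, 1024),
    "v_proj": (4096, 1024),
    "o_proj": (4096, 4096),
    "gate_proj": (4096, 14336),
    "up_proj": (4096, 14336),
    "down_proj": (14336, 4096),
}


def lora_params(in_features: int, out_features: int, r: int) -> int:
    """LoRA adds A (r x in) and B (out x r) beside each frozen weight."""
    return r * (in_features + out_features)


per_layer = sum(lora_params(i, o, R) for i, o in TARGET_SHAPES.values())
total = per_layer * NUM_LAYERS
scaling = ALPHA / R  # factor applied to the low-rank update B @ A

print(f"trainable LoRA parameters: {total:,}")
print(f"update scaling (alpha / r): {scaling}")
```

Under these assumptions the adapter trains roughly 42M parameters, about half a percent of the 8B base model, and with r equal to alpha the low-rank update is applied at a scale of 1.0. If the training notebook targets fewer modules, remove the corresponding entries from `TARGET_SHAPES`.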

Evaluation

The included eval_adapter.py script checks for hallucination patterns (false QLoRA definitions), computes keyword overlap per instruction against a threshold, and outputs a JSON summary. Run make eval to validate adapter behavior.
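
The checks described above can be sketched as follows. This is a minimal illustration, not the contents of eval_adapter.py: the pattern list, the threshold value, the sample records, and the function names `keyword_overlap` and `evaluate` are all hypothetical.

```python
import json
import re

# Hypothetical values for illustration; the real script may use different ones.
HALLUCINATION_PATTERNS = [r"quantum\s+lo-?ra"]  # example of a false QLoRA definition
OVERLAP_THRESHOLD = 0.5


def keyword_overlap(response: str, keywords: list[str]) -> float:
    """Fraction of expected keywords that appear in the response."""
    if not keywords:
        return 0.0
    text = response.lower()
    hits = sum(1 for kw in keywords if kw.lower() in text)
    return hits / len(keywords)


def evaluate(samples: list[dict]) -> dict:
    """Score each sample and collect a JSON-serializable summary."""
    results = []
    for sample in samples:
        score = keyword_overlap(sample["response"], sample["keywords"])
        hallucinated = any(
            re.search(pattern, sample["response"], re.IGNORECASE)
            for pattern in HALLUCINATION_PATTERNS
        )
        results.append({
            "instruction": sample["instruction"],
            "overlap": round(score, 3),
            "hallucinated": hallucinated,
            "passed": score >= OVERLAP_THRESHOLD and not hallucinated,
        })
    passed = sum(1 for r in results if r["passed"])
    return {"total": len(results), "passed": passed, "results": results}


samples = [
    {
        "instruction": "What is QLoRA?",
        "response": "QLoRA fine-tunes a 4-bit quantized model using LoRA adapters.",
        "keywords": ["4-bit", "quantized", "LoRA"],
    },
    {
        "instruction": "What is QLoRA?",
        "response": "QLoRA stands for Quantum LoRA, a physics method.",
        "keywords": ["4-bit", "quantized", "LoRA"],
    },
]

summary = evaluate(samples)
print(json.dumps(summary, indent=2))
```

Substring matching keeps the sketch short; a stricter check might tokenize the response first so that a keyword such as "LoRA" does not match inside "QLoRA".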

Usage

make install   # Create .venv and install with uv
make train     # Train LoRA adapter
make eval      # Evaluate output quality
make run       # Run inference

Local Inference

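from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# Load the pre-quantized 4-bit base model; device_map="auto" places it on the available device(s)
base_model = AutoModelForCausalLM.from_pretrained(
    "unsloth/llama-3-8b-bnb-4bit",
    device_map="auto",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)
# Attach the published LoRA adapter, then merge its weights into the base model
model = PeftModel.from_pretrained(
    base_model, "Cre4T3Tiv3/unsloth-llama3-alpaca-lora"
).merge_and_unload()
tokenizer = AutoTokenizer.from_pretrained("Cre4T3Tiv3/unsloth-llama3-alpaca-lora")

prompt = "### Instruction:\nExplain LoRA fine-tuning in simple terms.\n\n### Response:"
# Move inputs to the model's device instead of hard-coding "cuda"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
# The decoded text contains the prompt followed by the generated response
print(tokenizer.decode(outputs[0], skip_special_tokens=True))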
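
The prompt string in the snippet above follows an "### Instruction / ### Response" layout. As a small convenience, the sketch below builds that layout from an instruction and strips the echoed prompt from decoded output. `build_prompt` and `extract_response` are hypothetical helpers, and the optional "### Input:" section is an assumption borrowed from the common Alpaca convention rather than something this README documents.

```python
def build_prompt(instruction: str, context: str = "") -> str:
    """Format a request in the '### Instruction / ### Response' layout."""
    parts = [f"### Instruction:\n{instruction.strip()}"]
    if context.strip():
        # Assumption: optional context uses the Alpaca-style '### Input:' header.
        parts.append(f"### Input:\n{context.strip()}")
    parts.append("### Response:")
    return "\n\n".join(parts)


def extract_response(generated: str) -> str:
    """Return only the text after the final '### Response:' marker."""
    return generated.rsplit("### Response:", 1)[-1].strip()


prompt = build_prompt("Explain LoRA fine-tuning in simple terms.")
print(repr(prompt))
```

Because the decoded output includes the prompt, passing it through `extract_response` leaves only the model's answer.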

Limitations

Trained on ~2K samples in a single fine-tuning run. Not optimized for contexts longer than 2K tokens. 4-bit quantization may reduce fidelity. Not production-grade for factual QA or critical domains.

Links

License

MIT
