Skip to main content

New best story on Hacker News: Fine-tune your own Llama 2 to replace GPT-3.5/4

Fine-tune your own Llama 2 to replace GPT-3.5/4
478 by kcorbitt | 125 comments on Hacker News.
There has been a lot of interest on HN in fine-tuning open-source LLMs recently (eg. Anyscale's post at https://ift.tt/oGNS7Xy ). I've been playing around with fine-tuning models for a couple of years, and wanted to share some insights and practical code. I’ve condensed what I’ve learned into a small set of notebooks at https://ift.tt/HtjTo13... , covering labeling data, fine-tuning, running efficient inference, and evaluating costs/performance. The 7B model we train here matches GPT-4’s labels 95% of the time on the test set, and for the 5% of cases where they disagree it’s often because the correct answer is genuinely ambiguous. What is fine-tuning? You can think of it as a more-powerful form of prompting, where instead of writing your instructions in text you actually encode them in the weights of the model itself. You do this by training an existing model on example input/output pairs that demonstrate the task you want your fine-tuned model to learn. Fine-tuning can work with as few as 50 examples but I usually try to get 1000+ if possible. Prompting still has some big advantages over fine-tuning. It's way easier/faster to iterate on your instructions than label data and re-train a model. And operationally it's easier to deploy one big model and just adjust its behavior as necessary vs deploying many small fine-tuned models that will likely each get lower utilization. Fine-tuning has one huge advantage though: it is far more effective at guiding a model's behavior than prompting, so you can often get away with a much smaller model. That gets you faster responses and lower inference costs. A fine-tuned Llama 7B model is 50x cheaper than GPT-3.5 on a per-token basis, and for many use cases can produce results that are as good or better! For example, classifying the 2M recipes at https://ift.tt/jIpkB62 with GPT-4 would cost $23k. Even with GPT-3.5 it would cost over $1k. The model we fine-tuned performs similarly to GPT-4 and costs just $19 to run over the entire dataset. Disclaimer: My brother David and I are working on an open-source product called OpenPipe ( https://ift.tt/bLZE0QF ) to help engineers adopt fine-tuning as simply as possible. But none of the information above depends on our startup. The current post is just about sharing information that we’ve learned about fine-tuning. I hope it’s useful!

Comments

Popular posts from this blog

New best story on Hacker News: Ask HN: I’m an FCC Commissioner proposing regulation of IoT security updates

Ask HN: I’m an FCC Commissioner proposing regulation of IoT security updates 449 by SimingtonFCC | 144 comments on Hacker News. Hi everyone, I’m FCC Commissioner Nathan Simington, and I’m here to discuss security updates for IoT devices and how you can make a difference by filing comments with the FCC. As you know, serious vulnerabilities are common in IoT, and it often takes too long for these to be patched on end-user devices—if the manufacturer even bothers to release an update, and if the device was even designed to receive them. Companies may cease supporting a device well before consumers have stopped using it. The support period is often not communicated at the time of sale. And sometimes the end of support is not even announced, leaving even informed users unsure whether their devices are still safe. I’ve advocated for the FCC to require device manufacturers to support their devices with security updates for a reasonable amount of time [1]. I can't bring such a proposal