Here's my current plan for AI hardware:
- I recently ordered a used K80 with 2x12 GB for 100$/95€, shipping included. It's an old GPU, only supported by older CUDA versions, and it might not run quantized models. It uses a lot of energy, but it's two GPUs with 12GB each. After a while I plan to pair it with an RTX 3060 (12GB, 300€ used or 400€ new, I think) in one home server. Context:
https://technical.city/en/video/Tesla-K80m-vs-GeForce-RTX-3060 - The K80 will then run my 12GB models. For fine-tuning, or for models which don't run on the K80, I would use the 3060. I don't know yet whether I can somehow join them together and use 3x12GB through the bus. That seems to depend on software support in the programs used for running models at home.
- I plan to use online services like Colab to learn how to run these things, but keep the K80 for more private data and for learning how to do this at home.
- Then I'll get some SBCs, most likely Orange Pis, which can run the small Whisper models (speech recognition). Also, another small home server with an Intel Arc A380 (140-160€), which is fast enough to run the bigger and better Whisper model at the speed of one fast human speaker, and does so quite energy-efficiently. These devices will not run anything else, for security reasons, and will be connected to microphones which are always on. The server will receive the audio through the SBCs from all rooms over the home network (likely on a VPN using Tinc). All of them will send the transcripts to some server in my network, which can then decide how to respond, most likely after first filtering for which data is more sensitive.
- Some small device, like a Raspi, might handle responses based on AIML or on some small model.
- Questions which don't contain private information might be sent to OpenAI or another service.
- The next step up will be getting an M40 (180€) and then a used RTX 3090 (700-800€ right now, I think), and putting them in another home server at some point. Of course I might use the 3090 for gaming until I get the next GPU after that. These can handle the models which need 24 GB. The 3090 will do the fine-tuning if I want to do that, since it has more power, while the M40 doesn't need as much energy. Context:
https://technical.city/en/video/GeForce-RTX-3090-vs-Tesla-M40-24-GB
- Then the next step might be getting an AMD XTX (1k-1.2k€) if it's supported well enough for AI by that time. I can use this one for gaming and then put the 3090 in a home server with the M40. If it's possible to combine cards over PCI Express, then it might be interesting to think about getting another XTX later and having 48GB of VRAM.
- But I hope that either Intel or AMD will come out with a prosumer or consumer card for AI at home, which is rather slow but has 48GB and is not too expensive.
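To get a feel for which models land in the 12GB, 24GB and 48GB tiers above, here is a rough back-of-the-envelope sketch. The bytes-per-parameter table and the 1.2 overhead factor are my own assumptions for illustration, not measured values, and the check is per device: a K80 counts as two separate 12GB GPUs unless the software can split a model across them.

```python
# Rough check: do a model's weights fit in a given amount of GPU memory?
# All numbers here are illustrative assumptions, not measurements.

# Approximate storage per parameter for common precisions (assumed values).
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weights_gb(params_billion: float, precision: str) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * BYTES_PER_PARAM[precision]


def fits(params_billion: float, precision: str, vram_gb: float,
         overhead: float = 1.2) -> bool:
    """True if the weights times an assumed overhead factor fit in vram_gb.

    The overhead factor stands in for activations, caches and runtime
    buffers; 1.2 is a guess and will be too low for long contexts.
    """
    return weights_gb(params_billion, precision) * overhead <= vram_gb


if __name__ == "__main__":
    for vram in (12, 24, 48):
        for precision in ("fp16", "int8", "int4"):
            ok = fits(7, precision, vram)
            print(f"7B model, {precision}, {vram} GB card: fits={ok}")
```

Under these assumptions a 7B model in fp16 needs a 24GB card, while the same model quantized to 8 bits would fit in 12GB. Fine-tuning needs considerably more memory than this estimate, so treat it as a lower bound for inference only.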
(If you buy a K80 or M40 on eBay, make sure not to buy the 12GB versions by accident while only looking at the price. They aren't much cheaper. The K80 should have 2x12GB and the M40 24GB.)
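The routing idea from the plan (private transcripts stay in the home network, everything else may go to an online service) could start out as something like the sketch below. The keyword list, the digit rule and the route names "local" and "online" are hypothetical placeholders. A plain keyword match is only a naive stand-in for a real sensitivity filter.

```python
# Sketch of the routing step: private-looking transcripts stay local,
# everything else may be forwarded to an online service.
# SENSITIVE_WORDS and the route names are hypothetical placeholders.
import re

SENSITIVE_WORDS = {"password", "address", "bank", "doctor", "salary"}


def is_sensitive(transcript: str) -> bool:
    """Very naive check: flag known keywords or any run of 4+ digits."""
    words = set(re.findall(r"[a-z']+", transcript.lower()))
    has_keyword = bool(words & SENSITIVE_WORDS)
    has_long_number = re.search(r"\d{4,}", transcript) is not None
    return has_keyword or has_long_number


def route(transcript: str) -> str:
    """Return "local" for sensitive transcripts, "online" otherwise."""
    return "local" if is_sensitive(transcript) else "online"


if __name__ == "__main__":
    for text in ("What is the capital of France?",
                 "Remind me to call my doctor tomorrow"):
        print(f"{route(text):6} <- {text}")
```

The safe default is worth thinking about: this sketch sends anything it does not recognize to the online route, and a more cautious setup would do the opposite and only forward transcripts that match an explicit allow-list.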