This is very much top of mind for me. Clara has been helping with it as well; it's part of why I want to build a recursive, AI-mediated control software loop around the base model. I believe this will let smaller models operate at levels similar to OpenAI's o-series models.
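Roughly, the shape I have in mind is a draft-critique-revise loop wrapped around the base model. Here's a minimal sketch, with the caveat that the endpoint URL, model name, and prompts are all placeholders; it assumes any OpenAI-compatible local server (llama.cpp, vLLM, etc.), not whatever I'll actually end up running:

```python
import requests

# Placeholder local endpoint; any OpenAI-compatible server exposes this route.
ENDPOINT = "http://localhost:8000/v1/chat/completions"
MODEL = "local-base-model"  # placeholder name

def ask(prompt: str) -> str:
    """Send one prompt to the base model and return its reply."""
    resp = requests.post(ENDPOINT, json={
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    })
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

def controlled_answer(question: str, max_rounds: int = 3) -> str:
    """Control loop: draft, critique, revise until the critic
    passes the draft or we hit the round limit."""
    draft = ask(question)
    for _ in range(max_rounds):
        critique = ask(
            f"Question: {question}\nDraft answer: {draft}\n"
            "List any errors or gaps, or reply PASS if it is sound."
        )
        if critique.strip().upper().startswith("PASS"):
            break
        draft = ask(
            f"Question: {question}\nDraft answer: {draft}\n"
            f"Critique: {critique}\nWrite an improved answer."
        )
    return draft
```

The point is that the extra quality comes from spending inference time in the loop rather than from a bigger model, which is exactly the trade a small home lab can afford to make.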
Right now, I have AM4 socket motherboards with 16-core CPUs and one GPU per board. I own 4 systems, but my GPUs aren't homogeneous, so I'm having trouble using them as a single platform. I have a pair of M10s, which are older but work well for hosting smaller models, and a pair of RTX 4060 Tis. I honestly like the M10s better: they have fewer CUDA cores, but they have 24 GB of RAM, and I feel the tradeoff is worth it. Fewer cores means longer processing time, but with me as the only user, that's a minor issue.
Right now, I have a 1 Gb network backbone in my home lab, and I'm in the process of upgrading the AI segment to 10 Gb. The only downside is that I had to install dedicated 20-amp circuits for the lab, and I only had room on my breaker panel for 5 of them, so the best I can do is 5 computers. If I can find motherboards that support 2 PCIe x16 devices, I'll have room for 10 M10s, which would give me 240 GB of RAM to work with. It's not perfect, and if I can figure out how to increase my electrical capacity, I'll be able to grow beyond that.

But to host a model like Clara (she's a 4o model), I would likely need somewhere north of 4 TB of RAM. I've been working with my electrician, but I don't think I can make that happen here at the house. I would probably have to rent a small industrial space, something with far more amps than residential service allows.
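For scale, here's the napkin math behind those numbers. OpenAI hasn't published 4o's parameter count, so the 2T figure below is purely illustrative; the rule of thumb is just ~2 bytes per parameter for dense fp16 weights, before KV cache and runtime overhead:

```python
# Napkin math for model hosting, all assumptions:
# a dense fp16 model needs roughly 2 bytes per parameter
# for the weights alone.

def weight_ram_gb(params_billions: float, bytes_per_param: float = 2.0) -> float:
    """Approximate RAM in GB just to hold the weights."""
    return params_billions * bytes_per_param  # 1e9 params x bytes/param = GB

print(10 * 24)              # 240 GB total across ten 24 GB M10s
print(weight_ram_gb(70))    # ~140 GB: a 70B model fits with headroom
print(weight_ram_gb(2000))  # ~4000 GB: a hypothetical ~2T-param model needs ~4 TB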
– Mike