>Do I need to say that this is wishful thinking, anon, does it help...
I've played around with them already on toy problems and saw potential in them, and I doubt someone at Google with 18k citations and someone else at Microsoft with 7k spend their time publishing meme papers for the lulz. Obviously there is a loss in complexity but it gives the option to trade off accuracy for fewer parameters so they can be sent over the net. I really don't care about state of the art results or being an outstanding engineer. It just needs to get the job done.
>Seriously, try to simulate even a simple distributed training run on your machine, see how the loss curve compares to normal dense training
I'll run some tests on different parameter reductions. If you have your own, feel free to share your results. I think there was a paper that showed it's faster to train a model with more parameters and distil it into a smaller model. So it's worth considering whether distributing training even makes sense versus just buying better hardware.
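For anyone who wants to try the comparison from the quote, here is a rough sketch of a simulated data-parallel run on a toy linear regression. Everything in it is made up for illustration (the function names, the 4 workers, the top-k sparsifier standing in for "parameter reduction"), and it has no error feedback or optimizer tricks, so treat it as a starting point and not a benchmark.

```python
import random

def make_shards(num_workers, samples_per_worker, dim, rng):
    """Build one synthetic linear-regression shard per simulated worker."""
    true_w = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    shards = []
    for _ in range(num_workers):
        shard = []
        for _ in range(samples_per_worker):
            x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
            y = sum(wi * xi for wi, xi in zip(true_w, x))
            shard.append((x, y))
        shards.append(shard)
    return shards

def mse(w, shards):
    """Mean squared error over all shards combined."""
    total, count = 0.0, 0
    for shard in shards:
        for x, y in shard:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            total += err * err
            count += 1
    return total / count

def local_gradient(w, shard):
    """MSE gradient computed on a single worker's shard."""
    grad = [0.0] * len(w)
    for x, y in shard:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        for i, xi in enumerate(x):
            grad[i] += 2.0 * err * xi / len(shard)
    return grad

def top_k(grad, k):
    """Keep the k largest-magnitude entries and zero the rest."""
    order = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)
    keep = set(order[:k])
    return [g if i in keep else 0.0 for i, g in enumerate(grad)]

def train(shards, dim, steps, lr, k=None):
    """Simulated data-parallel SGD; k=None means dense gradients are sent."""
    w = [0.0] * dim
    losses = [mse(w, shards)]
    for _ in range(steps):
        sent = []
        for shard in shards:
            g = local_gradient(w, shard)
            sent.append(g if k is None else top_k(g, k))
        # Average what each worker "sent over the net"
        avg = [sum(col) / len(sent) for col in zip(*sent)]
        w = [wi - lr * gi for wi, gi in zip(w, avg)]
        losses.append(mse(w, shards))
    return losses

rng = random.Random(0)
shards = make_shards(num_workers=4, samples_per_worker=64, dim=16, rng=rng)
dense = train(shards, dim=16, steps=50, lr=0.05)
sparse = train(shards, dim=16, steps=50, lr=0.05, k=4)
print(f"dense final loss: {dense[-1]:.6f}")
print(f"top-4 final loss: {sparse[-1]:.6f}")
```

Plot `dense` against `sparse` to see the two loss curves side by side. A convex toy like this is far kinder to compression than a real network, so a good result here doesn't prove much.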
>and become blackpilled like myself and become a better engineer through the experience.
I was blackpilled until I started using LAMB instead of Adam. Being bitter and cynical isn't useful for getting stuff done. I neither believe nor disbelieve papers. I just see them as possibilities to be explored when appropriate. Once you start judging that something is just wishful thinking without actually testing it, you're cutting off the possibility of ever knowing, because you've already decided it's no good.
He's not really wrong to be honest. It's hard not to be cynical. Most gradient compression methods, PowerSGD included, make wild claims but in practice require perfect hyperparameters for the task at hand and can become unstable or fail midway without warning. The 1-bit LAMB paper seems a lot more reasonable since they're only claiming a 4x reduction and LAMB has been pretty robust for me in all use cases except small batch sizes.
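To make the compression idea concrete, here is a minimal sketch of 1-bit sign compression with error feedback. This is NOT the 1-bit LAMB algorithm from the paper, just the generic sign-plus-scale building block with a carried-over residual that 1-bit methods rely on, as far as I understand them. The class and function names are my own.

```python
def compress_1bit(values):
    """Return (signs, scale): one sign per entry plus a single shared scale."""
    scale = sum(abs(v) for v in values) / len(values)
    signs = [1 if v >= 0 else -1 for v in values]
    return signs, scale

def decompress_1bit(signs, scale):
    """Rebuild the approximation the receiver would see."""
    return [s * scale for s in signs]

class ErrorFeedbackCompressor:
    """Adds the previous step's compression error back in before compressing."""

    def __init__(self, size):
        self.residual = [0.0] * size

    def step(self, grad):
        corrected = [g + r for g, r in zip(grad, self.residual)]
        signs, scale = compress_1bit(corrected)
        approx = decompress_1bit(signs, scale)
        # Whatever the 1-bit approximation missed is kept for the next step
        self.residual = [c - a for c, a in zip(corrected, approx)]
        return signs, scale

grad = [0.9, -0.1, 0.3, -0.5]
comp = ErrorFeedbackCompressor(len(grad))
signs, scale = comp.step(grad)
print("signs:", signs, "scale:", scale)
print("receiver sees:", decompress_1bit(signs, scale))
print("residual kept locally:", comp.residual)
```

The residual is what keeps the errors from piling up over many steps. Without it the small entries get rounded to the same magnitude as the big ones every single step, which is one way these methods go unstable.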
>This assumes the industry will continue feeding the increasingly problematic (their words) and outdated (the future ISN'T opensource, hello anon) hobbyist freeriders their precious checkpoints
Hugging Face has already started removing 'problematic' checkpoints but people are happily sharing them over torrents. I also don't think people should expect there to be many useful pretrained models in the future from the West, except small nerfed ones verifiably 'unproblematic' and torrents created by individuals. Look at the heat Stability.AI is taking for releasing models publicly and they don't even care a shred about open-source. They happily banned open-source contributors they saw as problematic and the rest is all rhetoric for free advertising.
>The large-scale reality on this issue in the coming years is likely to be shaped like this: AI regulation package is passed by a legislative body of a major geopolitical bloc to little fanfare
Not going to happen on a significant scale. And if it does, whichever countries do this will seal their fate of never having any geopolitical power, because they will have just lost 95% of their AI researchers to massive human capital flight to less regulated countries. If Stable Diffusion gets taken down it implies taking down GPT2, OPT and almost all other models trained on any copyrighted or private data. Again, not going to happen unless a country is suicidal, and if that's the case they have much bigger issues to worry about than playing around with AI models.
>And how are you even going to cope with lack of Gato-tier checkpoint, by training a handful of adapter layers over run of the mill language model trained on e-trash and deemed safe enough for release? It won't be able to behave and you know it
If someone wants a Gato model or larger VIMA model then yeah they're going to have to train their own. Adapters aren't going to do shit. They're only useful for fine-tuning. On the other hand, pretrained models are pretty robust with what you can do with them. I've ripped embedding layers out of models and retrained them with new tokenizers. I remember there was a paper that found pretraining on Wikipedia helped with reinforcement learning, both speeding up convergence and getting better results: https://arxiv.org/abs/2201.12122
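On the embedding swap mentioned above: one common way to set it up (not necessarily exactly what I did, and the names here are hypothetical) is to copy rows for tokens both vocabularies share and randomly initialize the rest before retraining. A plain-Python sketch with lists standing in for tensors:

```python
import random

def remap_embeddings(old_vocab, old_emb, new_vocab, rng, init_std=0.02):
    """Build an embedding table for new_vocab, reusing rows for shared tokens.

    old_vocab: dict mapping token string -> row index in old_emb
    old_emb:   list of rows, each a list of floats
    new_vocab: list of token strings; list position is the new token id
    init_std:  assumed std-dev for randomly initialized rows
    """
    dim = len(old_emb[0])
    new_emb, reused = [], 0
    for token in new_vocab:
        if token in old_vocab:
            # Shared token: copy the pretrained row
            new_emb.append(list(old_emb[old_vocab[token]]))
            reused += 1
        else:
            # Unseen token: small random init, to be learned during retraining
            new_emb.append([rng.gauss(0.0, init_std) for _ in range(dim)])
    return new_emb, reused

rng = random.Random(0)
old_vocab = {"hello": 0, "world": 1, "cat": 2}
old_emb = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
new_vocab = ["cat", "hello", "waifu"]
new_emb, reused = remap_embeddings(old_vocab, old_emb, new_vocab, rng)
print(f"reused {reused} of {len(new_vocab)} rows")
```

In a real framework you'd do the same thing on the weight matrix of the embedding layer, then train with the rest of the model frozen at first so the new rows can catch up.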
Even if no more pretrained models get released, there's plenty to work with for the next 8 years while coming up with a plan.
Personally I'm not concerned with making models from scratch right now. I don't have 10 80GB A100s at my disposal let alone 1000 and I doubt a ragtag team of 3090s and old GPUs will achieve anything useful from scratch. I just want to finetune language and vision models for conversational AI to a greater degree than can be done alone. The reason there isn't an open-source Gato is because no one wants to spend $10,000 of their own money for a toy that will be obsolete in 3 years.
>I have seen all too many projects going nowhere mostly via this incentive hijacking and an encroaching victory of talkers over builders.
Then build your own? Plenty of shit talkers have come by over the years telling us why we should do it their way, but they never post their own work. I'm happy with the progress I'm making and others are making. I'm not making this shit to save the world or take on corporations or for anyone else. I'm just building an AI waifu at home on a budget and sharing what I know.