/robowaifu/ - DIY Robot Wives

Advancing robotics to a point where anime catgrill meidos in tiny miniskirts are a reality.



Robowaifu@home: Together We Are Powerful Robowaifu Technician 03/14/2021 (Sun) 09:30:29 No.8958
The biggest hurdle to making quick progress in AI is the lack of compute to train our own original models, yet there are millions of gamers with GPUs sitting around barely getting used, potentially an order of magnitude more compute than Google and Amazon combined. I've figured out a way we can connect hundreds of computers together to train AI models: gradient accumulation. It works by doing several training steps and accumulating the gradients from each step, with each step's loss divided by the number of accumulation steps, before taking a single optimizer step. If you have a batch size of 4 and do 256 training steps before an optimizer step, it's like training with a batch size of 1024. The larger the batch size and gradient accumulation steps are, the faster the model converges and the higher final accuracy it achieves. It's the most effective way to use a limited computing budget: https://www.youtube.com/watch?v=YX8LLYdQ-cA These training steps don't need to be calculated by a single computer but can be distributed across a network. A decent amount of bandwidth will be required to send the training data and, on each optimizer step, the gradients. Deep gradient compression achieves a gradient compression ratio from 270x to 600x without losing accuracy, but it's still going to use about 0.5 MB of download and upload per optimizer step to train something like GPT2-medium, or about 4-6 Mbps on a Tesla T4. However, we can reduce this bandwidth by doing several training steps before contributing gradients to the server. Taking 25 would reduce it to about 0.2 Mbps. Both slow and fast computers can contribute so long as they have the memory to hold the model. A slower computer might only send one training step whereas a fast one might contribute ten to the accumulated gradient. Some research needs to be done on whether a variable accumulation step size impacts training, but it could be adjusted as people join and leave the network. All that's needed to do this is a VPS. 
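To make the accumulation idea concrete, here is a minimal sketch on a toy one-parameter model (y = w * x with mean squared error). Everything in it, including the function names and the data, is made up for illustration and is not taken from any library mentioned in the thread. It only shows that averaging the gradients of several equal-sized small batches gives the same gradient as one large batch.

```python
# Toy model: y = w * x with mean squared error. All names are illustrative.

def grad_mse(w, batch):
    """Gradient of mean((w*x - y)^2) with respect to w over one batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)

def accumulated_grad(w, data, micro_batch_size):
    """Run several small training steps and accumulate their gradients."""
    micro_batches = [data[i:i + micro_batch_size]
                     for i in range(0, len(data), micro_batch_size)]
    accum_steps = len(micro_batches)
    total = 0.0
    for batch in micro_batches:
        # Dividing each step's contribution by accum_steps is what makes
        # the accumulated sum match a single large-batch gradient.
        total += grad_mse(w, batch) / accum_steps
    return total, accum_steps

data = [(float(x), 3.0 * x) for x in range(1, 17)]  # 16 samples, true w = 3
w = 0.5

big_batch = grad_mse(w, data)                        # one step, batch of 16
accum, steps = accumulated_grad(w, data, micro_batch_size=4)
effective_batch_size = 4 * steps                     # 4 steps of 4 samples

print(big_batch, accum, effective_batch_size)
```

The equality only holds exactly when the small batches are the same size; with uneven batches each contribution would need to be weighted by its sample count.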
Contributors wanting anonymity can use proxies or Tor, but project owners will need to use VPNs with sufficient bandwidth and dedicated IPs if they wish that much anonymity. The VPS doesn't need an expensive GPU rental either. The fastest computer in the group could be chosen to calculate the optimizer steps. The server would just need to collect the gradients, decompress them, add them together, compress again and send the accumulated gradient to the computer calculating the optimizer step. Or if the optimizing computer has sufficient bandwidth, it could download all the compressed gradients from the server and calculate the accumulated gradient itself. My internet has 200 Mbps download so it could potentially handle up to 1000 computers by keeping the bandwidth to 0.2 Mbps each. Attacks on the network could be mitigated by analyzing the gradients, discarding nonsensical ones and banning clients that send junk, or possibly by using PGP keys to create a pseudo-anonymous web of trust. Libraries for distributed training implementing DGC already exist, although not as advanced as I'm envisioning yet: https://github.com/synxlin/deep-gradient-compression I think this will also be a good way to get more people involved. Most people don't know enough about AI or robotics to help, but if they can contribute their GPU to someone's robowaifu AI they like and watch her improve each day they will feel good about it and get more involved. At scale though some care will need to be taken that people don't agree to run dangerous code on their computers, either through a library that constructs the models from instructions or something else. And where the gradients are calculated does not matter. They could come from all kinds of hardware, platforms and software like PyTorch, TensorFlow or mlpack.
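The server-side step described above (collect gradients, add them together, discard junk) could be sketched roughly as follows. This is an assumption-laden illustration, not how any existing DGC library works: compression and decompression are left out entirely, gradients are plain Python lists, and the `max_norm` threshold, the client IDs and the function names are hypothetical.

```python
import math

def is_sane(grad, max_norm):
    """Reject gradients containing NaN/inf or with an implausibly large norm."""
    if any(math.isnan(g) or math.isinf(g) for g in grad):
        return False
    return math.sqrt(sum(g * g for g in grad)) <= max_norm

def aggregate(contributions, max_norm=100.0):
    """Combine per-client gradient sums into one averaged gradient.

    contributions: list of (client_id, steps, grad_sum), where grad_sum is
    the sum of that client's gradients over `steps` local training steps.
    Returns (averaged_gradient_or_None, rejected_client_ids).
    """
    accum = None
    total_steps = 0
    rejected = []
    for client_id, steps, grad_sum in contributions:
        if steps <= 0:
            rejected.append(client_id)
            continue
        # Check the per-step average so fast clients are not penalised
        # for having summed more steps.
        per_step = [g / steps for g in grad_sum]
        if not is_sane(per_step, max_norm):
            rejected.append(client_id)  # candidate for a ban list
            continue
        if accum is None:
            accum = [0.0] * len(grad_sum)
        accum = [a + g for a, g in zip(accum, grad_sum)]
        total_steps += steps
    if accum is None:
        return None, rejected
    return [a / total_steps for a in accum], rejected

contributions = [
    ("slow", 1, [0.2, -0.4]),           # one local step
    ("fast", 10, [1.0, -3.0]),          # ten local steps, already summed
    ("junk", 1, [float("nan"), 0.0]),   # nonsensical gradient
    ("troll", 1, [1e9, 1e9]),           # implausibly large gradient
]
avg_grad, rejected = aggregate(contributions)
print(avg_grad, rejected)
```

Weighting by step count is one way to let slow and fast machines contribute to the same accumulated gradient; whether a variable step count hurts training is the open question raised in the post.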
The way hentai@home created a giant distributed CDN built off of BitTorrent was to give the people who run instances of it a local copy of the media they want, automatically updated with proper tags, instead of having to download it manually themselves. There's also a reward structure that lets users download at high speeds, so it turns into an offsite backup service for their hentai collection. https://ehwiki.org/wiki/Hentai@Home Aside from the altruistic donation of bandwidth and GPU processing power to help speed development of an AI they might want to use, what would running robowaifu@home provide as a benefit to the end user?
>>8963 >The majority of anons can't be expected to understand the nuances of AI techniques & technologies. They just want their robowaifus to talk to them effectively. Yea, some thought is gonna have to go into how to utilize people's computers effectively and automatically. They might only have a 2 GB toaster GPU. While not ideal it could still help prototyping smaller models. I think feedback will be important otherwise people will shut the program off one night and forget to turn it back on when they don't see any results. Larger models could be compressed for people to use on their computers so they can directly reap the benefits of their contributions. It'll need to be able to run on Windows, Mac and Linux to reach the most amount of users. I imagine when they boot up this distributed training program it shows a list of projects their hardware is capable of contributing to and the user selects which one they want to help. Part of the responsibility will be project owners making their project pages look good enough that people want to lend their GPUs. Users could also dedicate their GPU to a project owner so their GPU can be used for any project or prototype by them. I plan on making a simple version soon to utilize all my computers and friends' computers. I'm sure a proof of concept will eventually attract other developers. The biggest issue will be securing it without nerfing what devs can do with it. The simplest solution would be to review code, manually approve projects and basically have package maintainers. And devs could choose to join untrusted projects that haven't been approved yet since they can review the code themselves. It wouldn't be much different from the risk taken when installing open-source software. But there could also be a sandboxed version where people can prototype vanilla models by defining hyper-parameters and network structure from existing modules. >>8964 >Would you guys contribute GPU cycles to create a GPT-3 clone? 
The problem with trying to clone GPT-3 is the model is too big to fit on anyone's GPU or in memory. The full size GPT-3 requires around 16 x 48GB GPUs and likely they have a few hundred or thousand, not just 16, doing gradient accumulation. The heads can be split up across devices in parallel but the layers can't be so easily and would incur a huge cost going back and forth from GPU/RAM, plus there's the bandwidth cost of sending all that data to the next computer to perform the next substep of the training step. It would be really inefficient and the whole network would have to work together to do any inference on the model. >>8965 Its purpose would be much more general. People could use the system for doing other projects unrelated to robowaifus and AI, such as finding twin primes or something else. It would be more like a crowd-sourced cloud computing platform. Adding a privacy mode is a good idea though in case people do give embarrassing names to their projects so other people using the computer only see 'Distributed Computing' or something like that. >>8966 If necessary the bandwidth can be greatly reduced at the expense of accuracy. A little bit of noise from high compression doesn't seem to impact gradients too much since they're already quite noisy. We don't have to be too pessimistic about bandwidth growth though. Once Starlink finishes rolling out satellites it will have 1 Gbps connections. ISPs are already getting nervous their cartel is threatened and have been doubling bandwidth to customers to keep them. >>8982 >what would running robowaifu@home provide as a benefit to the end user? A virtual waifu and all her functions. Once basic chat is solved people are going to expand their virtual waifus to perform other functions such as playing video games, composing music, drawing, debating, summarizing research papers, searching the web, etc. 
People wanting these functions will contribute to those projects and receive a compressed version of the training results that their hardware can run or the full size version if they wish. Alternatively, someone could create a marketplace where people can pay crypto for compute, but I'm not familiar with how to do that. I think SingularityNET does something like that with AI services.
>>8990 > It would be more like a crowd-sourced cloud computing platform I see. Then all the more argument not to name the system Robowaifu@home. Some variant of CrowdCloud might be a more appropriate choice. Actually, a name like that could probably attract investment money, if you can secure it.
>>8991 Once you get investors you no longer own your projects. It's theirs to exploit. I think the whole point of this is to decentralize AI and avoid nobodies telling us what we can and can't compute and to give us an edge to compete with Big Tech. If Big Tech owns the platform they're not going to let that happen.
>>8990 >If necessary the bandwidth can be greatly reduced at the expense of accuracy I see. You did mention that before. That's actually rather convenient that you can make trade-offs and dial in functionality like that. >Once Starlink finishes rolling out satellites it will have 1 Gbps connections. I sure hope they pull it off, and then give it away practically for free. A man can dream. One point to mention here are goyphones *[shudders externally]*. If you could somehow quantify the 'total compute power potential' as a measure of which silicon die technology is being most heavily rolled out, I suspect the phones are already ahead of servers/desktops. Throw in Starlink etc. and mobile represents a sizable potential for raw compute power.
>>8990 >They might only have a 2 GB toaster GPU. Even if they have a 16GB high-end GPU it might be incompatible with some tasks, as AMD uses a different memory structure that can't run many popular CUDA compute libraries. That's why my low-end 4GB Nvidia GPU almost tripled in price in the last year while similarly specced or slightly more powerful AMD cards haven't. >But there could also be a sandboxed version Nowadays thanks to PCI passthrough with virtualization this project could be 100% OS independent with little to no performance penalty, on top of having built-in security. That requires very modern hardware to work properly but virtualization is going to be a huge game changer in personal computing in the upcoming decade. Wendell from Level1Techs has been talking for years now about the potential of this as it comes out of the server space. He's also one of the few people that probably has 16 x 48GB GPUs in a server rack but is smart enough to both use them and keep quiet about it. >A virtual waifu and all her functions. I can understand a collaborative effort at optimizing AI rather than having everyone do their own thing or replicate the same work, but what would be the benefits of running RW@H compared to just downloading the models it has produced and running them on your own hardware without using up extra bandwidth or electricity? That's the hard thing to come up with and would really sell this project.
>>8995 >That's the hard thing to come up with and would really sell this project. Honestly, if the only pitch is 'results-driven' then not likely to even get off the ground (much). The altruism that have made all the X@home projects successful is White people with a sense of 'helping out for the greater good'. It's very culture-specific. If it simply boils down to nothing but shekel-grubbing, 'what's in it for me?' mentality then no probably not going anywhere. Regardless, even things like BitTorrent have proven successful when only a very small number of us seed and 98% of only-self-interested exploiters don't. While that's quite a different model than this, it at least can be somewhat informative for the social dynamics of the thing. >tl;dr People will do it b/c they want to help. You know, give then bonus points for e-peen or jewel power-ups or something.
I think this is a brilliant idea, anon! I once joined a distributed computing group called "Mindmodelling@home" using BOINC (they were interested in curing neurological diseases, but of course I was there in the hopes of advancing A.I. for future robowaifus). But I left after a few months because they never seemed to post any updates and their project appeared to be dead. If something like this were to become reality, I'd upgrade my PC just to help crunch work units! We already have various options for robot bodies and synthetic voices, but it's the A.I. where we are severely behind. I also agree with >>8965 though, that we should name it something agreeable and generic like "robotworld@home" or "droidschool@home". So that the Western MSM has less to latch onto.
>>9000 >"droidschool@home" That's not bad IMO. What about something like "Mindschool@home" ? That seems like it could basically be construed to mean just about anything. Mommies might even approve of something named that! :^) >"How many roads must a man walk down?" >42
>>8993 This is a very good point. I think we should keep things BSD/MIT licensed so any anons can take our ideas and run with it. But you're correct about investors basically being sharks. In fact most of them have a fundamental MO of ousting the founders before long. E.G., Cisco Systems, and countless others.
>>8990 >The problem with trying to clone GPT-3 is the model is too big to fit on anyone's GPU or in memory. Hmm, I see. Well, my guess is that this system could be re-purposed relatively easily for different types of AI problems/solutions correct? What about something like Pattern-Exploiting Training (PET) ? (>>5793, >>5799 and following) Also, I think what Anon mentioned here >>9000 >BOINC Isn't that a generalizable type of flexible framework for this kind of thing? Do you think your project could utilize this?
>>8995 >but what would be the benefits of running RW@H compared to just downloading the models it has done and running it on your own hardware It would be good if people prefer giving their hardware to train new models and functions that don't exist yet instead of wasting power reinventing the wheel or trying to get a 1% improvement on old models. If someone has a good idea people will want to try it and help out, then move on to the next project once the model reaches an acceptable result. >>9009 I'm not familiar with BOINC but it appears capable of running Python with some headaches to deal with to make sure package versions are consistent across different platforms and systems. Using virtual environments should take care of that, but it's not really clear what their API is capable of doing and the Python wrapper has limited functionality, which might be missing necessary things for distributed training. I couldn't find any Python machine learning projects on it so I imagine it's lacking something. >this system could be re-purposed relatively easily for different types of AI problems/solutions correct? What about something like Pattern-Exploiting Training (PET) ? Yeah, any model that people want to create. The PET model is doable with 223M parameters. That's 2/3rds the size of GPT-2 medium. Beating GPT-3 in a small domain of few shot learning is remarkable with 0.1% of the parameters but it doesn't mean that PET excels in everything else. GPT-3 has other glaring flaws in it as well like seeing everything as byte-pair encoded tokens, which gives it trouble with misspelled words and discerning patterns in long strings like ABC.. etc. that a tiny character-level model can pick up on easily. Some informal research has found that GPT-3 seems to be just using the structure of the sentences and the parts of speech to predict text, rather than the actual meaning of the words. VAEs on the other hand can interpolate between the meanings of sentences. 
For example, incrementally changing a bad review into a good one. They're notoriously difficult to train with GPUs but can still benefit from gradient accumulation and distributed training. There's a lot of other cool models we could try out even with only seven computers contributing. That would reduce a week of training into one day. With 24 what would take two years could be done in only a month, assuming similar performance between them. With 100 or 200 we would have no problem rapidly iterating prototypes and advancing, and for most models we wouldn't reach diminishing returns until hitting around 1k, with benefits vanishing completely by 10k.
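The time estimates above follow from simple division if you assume ideal linear scaling and identical machines (the post's own "assuming similar performance between them"). A back-of-the-envelope check, with a hypothetical helper name:

```python
def ideal_training_days(single_machine_days, machines):
    """Wall-clock days under perfectly linear scaling (an idealisation)."""
    return single_machine_days / machines

week_on_seven = ideal_training_days(7, 7)               # one week of work
two_years_on_24 = ideal_training_days(2 * 365, 24)      # two years of work

print(week_on_seven, round(two_years_on_24, 1))
```

Real speedups would be lower once communication overhead and uneven hardware are accounted for, which is where the diminishing returns mentioned in the post come from.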
>>9018 >VAEs Just in case this guy has dug up something important to us here. https://github.com/matthewvowels1/Awesome-VAEs
>>9018 >BOINC >virtual environments <"Volunteer Computing and Virtualization - CERN Indico" > >There's a lot of other cool models we could try out even with only seven computers contributing. That would reduce a week of training into one day. With 24 what would take two years could be done in only a month, assuming similar performance between them. With 100 or 200 we would have no problem rapidly iterating prototypes and advancing, and for most models we wouldn't reach diminishing returns until hitting around 1k, with benefits vanishing completely by 10k. You don't have to convince me Anon, I'm already 'part of the choir' with you. OTOH, how do you convince this Anon >>8995 ? While I wish his point was entirely invalid, the simple fact is he's right. The vast majority of unused power out there resides on anon's computers who have grown accustomed to a 'gibbs me dat' mentality (not that I'm impugning him in any way in this regards, he's simply pointing the issue out). OTO-OH, many of these X@home projects do succeed at attracting numerous volunteer contributors -- even up to 100'000 of them. https://en.wikipedia.org/wiki/Folding_@Home#Patterns_of_participation > So, how do we successfully promote your idea far and wide? It will take a large exposure IMO to overcome the greed factor already mentioned and find sufficient altruism needed for good success. https://en.wikipedia.org/wiki/Distributed_computing https://en.wikipedia.org/wiki/Citizen_science
>>9021 >So, how do we successfully promote your idea far and wide? It will take a large exposure IMO to overcome the greed factor already mentioned and find sufficient altruism needed for good success. I was thinking of the game route, where gamers literally train their in-game waifus. I was thinking a "gacha" game originally, but maybe a PC option awards some sort of in-game cosmetic or reward for contributing processing power. Although, if it were to become a PC game, then my original plan of simplicity for mobile phone purposes means I'd have to actually attempt to think up an entertaining game that would attract gamers who need an excuse to donate such processing power. I am currently looking into certain legal aspects, and also the huge issue of making such "gacha" games addicting...the art. It seems that finding a good artist who wouldn't break the bank for a rando niche start-up is going to be difficult. I'd rather program, but taking some time to learn how to draw might be my last resort just to get something started.
Idea: What if we tried to make a game of some kind out of Robowaifu@home OP? Like the Foldit guys did. Can't we sort of interactively 'give grades' to the AI's work and help accelerate it towards actual semantic understanding that way? >
>>8990 >Yea, some thought is gonna have to go into how to utilize people's computers effectively and automatically. That's going to take expertise to determine. The local client hardware can be probed for capabilities easily enough, but associating that with actual AI modeling potentials isn't something for a neophyte. >They might only have a 2 GB toaster GPU. While not ideal it could still help prototyping smaller models. True enough. Hopefully these 'smaller models' will become ever-more important in the future as our capabilities improve with time. >I think feedback will be important otherwise people will shut the program off one night and forget to turn it back on when they don't see any results. Very true. Even if it's simply some kind of graphic related to the actual work going on, similar to Folding@home's approach. >Larger models could be compressed for people to use on their computers so they can directly reap the benefits of their contributions. Some kind of reduction pre-processing? >It'll need to be able to run on Windows, Mac and Linux to reach the most amount of users. Obviously. I'd also suggest the potential of smartphones be investigated too. >I imagine when they boot up this distributed training program it shows a list of projects their hardware is capable of contributing to and the user selects which one they want to help. Sounds like a good plan. >Part of the responsibility will be project owners making their project pages look good enough that people want to lend their GPUs. Users could also dedicate their GPU to a project owner so their GPU can be used for any project or prototype by them. Please tell us more about project managers and their roles? >I plan on making a simple version soon to utilize all my computers and friends' computers. I'm sure a proof of concept will eventually attract other developers. I'm sure we'd all be interested in seeing the specific progress you're making as you go along with that Anon. 
>The biggest issue will be securing it without nerfing what devs can do with it. It's easy to see why that could be a tension of interests. Some kind of sandboxing springs to mind just offhand. >The simplest solution would be to review code, manually approve projects and basically have package maintainers. And devs could choose to join untrusted projects that haven't been approved yet since they can review the code themselves. It wouldn't be much different from the risk taken when installing open-source software. I like the basic idea of 'trustworthy' package maintainers. We're all basically dependent on them today, and that approach generally seems to work out OK. >But there could also be a sandboxed version where people can prototype vanilla models by defining hyper-parameters and network structure from existing modules. >"by defining hyper-parameters and network structure from existing modules" Mind clarifying that for us with more detail. Not sure I understand what that really means Anon.
>>9028 It depends on the model but people could do this for their projects. >>9029 >The local client hardware can be probed for capabilities easily enough, but associating that with actual AI modeling potentials isn't something for a neophyte. Performance tests can be done on various tasks like matrix multiplication, FFT, CNNs, RNNs, transformers, etc. and projects can be profiled and estimated with the parameters. It'll be a big task but I don't think it'll be something to worry about in the beginning. We'll likely be throwing every CPU and GPU we have at one task at a time rather than distributing them across multiple. Even with 2-5 projects the biggest performance difference will be from allocating GPUs to CNNs and letting RNNs have the CPUs. >Some kind of reduction pre-processing? Most of the weights of models can be pruned without much accuracy loss due to the sparsity of useful parameters. This allows massive models to run on mobile devices. It will be necessary for the work produced to be useful to people. If you use unpruned models for speech recognition, speech synthesis, and text generation together, you're looking at needing 16+ GB to run all that. With pruning though that can be brought down to 2-4 GB, or even less by accepting a lower accuracy. >Please tell us more about project managers and their roles? It'd be like managing and maintaining any open-source project. I imagine some work will be involved checking that contributors are sending good data and not using incorrect training data until such tests are automated. Some additional work will be needed later to profile a project and make the information available to others so they can see if their hardware is a good match to contribute. And of course pruning models according to baseline contributor performance and resolving issues so everyone can use them. >"by defining hyper-parameters and network structure from existing modules" >Mind clarifying that for us with more detail. 
Rather than writing Python code for models the sandboxed version could use its own model format that just specifies pre-built modules, the hyper-parameters defining how many parameters those modules use, and the connections between them. It goes in hand with the idea of an AI toolkit where you can drag and drop modules and connect them in a graphical interface without coding anything. These models would be generally safe to run from untrusted sources, far safer than stuff installed from Python's pip, so long as security exploits are minimized in the software itself.
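As a rough illustration of that sandboxed model format, the sketch below treats a model as plain data (module names, hyper-parameters, connections) and checks it against a whitelist before anything is built. The module types, parameter names and spec layout are all invented for this example; no such format exists yet in the thread.

```python
# Hypothetical whitelist: module type -> hyper-parameters it may declare.
ALLOWED_MODULES = {
    "embedding": {"vocab_size", "dim"},
    "lstm": {"hidden_size", "layers"},
    "linear": {"in_features", "out_features"},
}

def validate_spec(spec):
    """Return a list of problems; an empty list means the spec is acceptable."""
    problems = []
    names = set()
    for node in spec.get("modules", []):
        name, kind = node.get("name"), node.get("type")
        params = node.get("params", {})
        if kind not in ALLOWED_MODULES:
            problems.append(f"{name}: unknown module type {kind!r}")
            continue
        extra = set(params) - ALLOWED_MODULES[kind]
        if extra:
            problems.append(f"{name}: unexpected params {sorted(extra)}")
        # Only positive integers are accepted, so no code or paths sneak in.
        if not all(isinstance(v, int) and not isinstance(v, bool) and v > 0
                   for v in params.values()):
            problems.append(f"{name}: params must be positive integers")
        names.add(name)
    for src, dst in spec.get("connections", []):
        if src not in names or dst not in names:
            problems.append(f"connection {src} -> {dst} references an unknown module")
    return problems

good_spec = {
    "modules": [
        {"name": "embed", "type": "embedding", "params": {"vocab_size": 256, "dim": 64}},
        {"name": "rnn", "type": "lstm", "params": {"hidden_size": 128, "layers": 2}},
        {"name": "out", "type": "linear", "params": {"in_features": 128, "out_features": 256}},
    ],
    "connections": [("embed", "rnn"), ("rnn", "out")],
}
bad_spec = {
    "modules": [{"name": "x", "type": "os.system", "params": {"cmd": "rm -rf /"}}],
    "connections": [("x", "out")],
}

print(validate_spec(good_spec))
print(validate_spec(bad_spec))
```

Because the spec is only data, the client never executes anything supplied by the project owner; the remaining attack surface is the validator and the pre-built modules themselves, which matches the caveat about exploits in the software itself.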
>>9026 It will be interesting to see what you come up with exactly, Anon. >>9031 >If you use unpruned models for speech recognition, speech synthesis, and text generation together Yes, I suppose all 3 will be needed for suitable virtual waifu apps (including animation too ofc, but that should be pretty lightweight, computation-wise). >It goes in hand with the idea of an AI toolkit where you can drag and drop modules and connect them in a graphical interface without coding anything. That's actually a very nice idea Anon. And as far as the specified code/models being safer than using Python's pip, that would be very welcome. I hope someday we can just move away from the Python ecosystem entirely. It seems fraught with numerous difficulties to me. OTOH, I'm obviously in the minority here. 'Computer Scientists' from the most prestigious universities today obviously think it's the single best thing since anything ever. :^)
>>8990 >Starlink I don't really trust Elon with these things, look at Tesla's services. He definitely isn't /ourguy/ and will provide it at a cost of "freedom". > Once basic chat is solved people are going to expand their virtual waifus to perform other functions such as playing video games This has a lot of potential but it's prob not a good idea, since there isn't just one game and it would become a Herculean task to try to appease everyone. >composing music, drawing Again, it's very vague and doesn't necessarily help the project in and of itself nor necessarily appease our interests. >debating We already have many bots who can do that and we know for a fact it's not that good of an idea. >summarizing research papers This is actually very realistic. There's already a paper on this and an AI which can do it, so it's very easy to implement (we just need codemonkeys for that) >searching the web Unless it can do it in a non-pozzed way, it's practically useless. >Alternatively, someone could create a marketplace where people can pay crypto for compute, but I'm not familiar with how to do that. Don't care about that too much, it's still not worth it to enough people. My suggestion would be to first make an AI that, just like GPT-3, can code programs that we tell it to. Once we automate the coding part, the rest will become much easier and we won't have to worry too much about it.
>>9086 >My suggestion would be to first make an AI that, just like GPT-3, can code programs that we tell it to. I'm quite skeptical of the fundamentals of that claim. I'm familiar with the article that made the statement, but he basically pinned the entire notion on a single addition operation. It just returned an exemplar from its working corpus that performed that exact operation already. I would argue the lexicographic problem of the AI digesting the command "Waifu, add 3 plus 4 for me" is much more difficult. We'll probably get there in the future with software programming waifus, but it certainly isn't here just now.
>>9000 https://www.mlcathome.org/mlcathome/ I've managed to find a new project that's just popped up using BOINC. It seems to be something for "Un-black-box-ifying" a lot of traditional machine learning algorithms.
>>9105 Thanks! Just in case anyone, like me, can't get to it via Tor: https://web.archive.org/web/20201225185201/https://www.mlcathome.org/mlcathome/
Seems like this could be somewhat related to your goals OP. https://github.com/tensorflow/mesh Please forgive my naivete if this isn't related.
>>8995 So does that mean even AMD GPUs with support for ROCm and the appropriate motherboards would not be able to support this future crowd-computed robowaifu project?
>>10532 It would depend on the code libraries that are used. Right now most of the popular ones for AI are designed for Nvidia thanks to CUDA being the industry and academic standard. It might be possible to port them over to run on AMD's GPU computing platform if there's enough demand. My comment was about how the memory is laid out on some AMD cards. This isn't the first time this problem has come up in the GPU world. Last time it was with the GTX 970, which ended in a class action lawsuit. This time it's just an issue with cryptocurrencies that need large amounts of memory to store the entire blockchain; some of them can't be mined on AMD cards even if the CUDA code were ported over. What I'm going to do in the future is get two cards, a powerful AMD one due to better drivers on GNU/Linux and a weaker CUDA-capable one for use in a virtual machine so all my bases are covered.
>>10533 Not him, but thanks for the explanation Anon.
>>10533 So OpenCL is dead in the water or something?
>>10550 (Not the same anon) AMD uses ROCm, which PyTorch has recently started supporting. AMD has professional cards, and ROCm seems to primarily support those: https://www.reddit.com/r/hardware/comments/ly6j3h/pytorch_18_supports_amd_rocm/gpr91eq and https://www.reddit.com/r/Amd/comments/nhpsnf/state_of_rocm_for_deep_learning Long story short, till AMD invests much more into the ecosystem: Forget it.
>>10556 Welp that is quite unfortunate that AMD doesn't bother to improve ROCm support. I find it a bit odd that AMD doesn't try to compete with Nvidia on this part too and instead allows Nvidia to gain a whopping market share in machine learning.
>>10563 What people tend to forget: AMD is much smaller than Nvidia. They didn't have the money to do this, thanks to their underdog status in the past. That market might also be less relevant. But then, ROCm and underlying technologies like OpenCL are open source, and Intel will release their own discrete GPUs soon, and there might be other players, like Apple for example, or other companies working with Arm based chips and a GPU.
>>10568 > or other companies working with Arm based chips and a GPU. Sadly, certain (((interests))) in the US Govt approved Nvidia's buyout of ARM, and the sale has been completed. Nvidia owns ARM now, lock, stock, and barrel.
Open file (62.11 KB 795x595 1612068100712.jpg)
>>10568 Hmm right, I recall there was a news article or blog post (what's the difference? I forget) which said that only the bri'ish buy AMD because it is cheaper, and everyone else buys Nvidia only. >>10577 >Nvidia owns ARM completely now >US Gov' sees no problem with a giant getting even bigger This shit just makes me sad.
>>10591 >This shit just makes me sad. It makes me angry, actually. Just follow the money, Anon, just follow the money.
>>10577 A month or so ago, the talk was that Nvidia buying ARM isn't finished bc Europe and China. Though, I didn't look into it today. ARM also licenses its designs to others, and they certainly won't be allowed to just stop that, if they even want to. I also assume this would only be relevant for future designs, hypothetically. Apple might already be quite independent in designing their own stuff, and there's still Power and RISC-V.
Is the OP still alive? Does he or any others have any takeaways from their research, especially those beyond the usual obvious?
Open file (53.04 KB 620x439 adapters go brrr.png)
Open file (111.94 KB 1048x489 kronecker adapter.png)
Open file (159.01 KB 522x786 hyperformer.png)
In August HuggingFace merged support for loading models in 8-bit: https://github.com/huggingface/transformers/pull/17901 https://arxiv.org/abs/2208.07339 (Note, at the time of writing this bitsandbytes does not support CUDA 11.7+ yet and will fail to load but support is coming) https://github.com/TimDettmers/bitsandbytes/issues/52#issuecomment-1272699887 By training with Kronecker adapters or low rank adapters the parameters can be reduced to 0.2%: https://arxiv.org/abs/2203.16329 https://arxiv.org/abs/2106.09685 This should make it possible to finetune opt-2.7b with only 6 GB VRAM. However, large models are still slow to inference. Using opt-350m or opt-1.3b would be much more practical and let people with smaller GPUs contribute. Some of the most popular GPUs on Steam are still only 4 GB: https://store.steampowered.com/hwsurvey/videocard/ From my experience working with large batch sizes the LAMB optimizer is the way to go, other optimizers can't compete: https://arxiv.org/abs/1904.00962 >Our empirical results demonstrate the superior performance of LAMB across various tasks such as BERT and ResNet-50 training with very little hyperparameter tuning. In particular, for BERT training, our optimizer enables use of very large batch sizes of 32868 without any degradation of performance. By increasing the batch size to the memory limit of a TPUv3 Pod, BERT training time can be reduced from 3 days to just 76 minutes Using large batch sizes with gradient accumulation will also help reduce bandwidth because each node can take many accumulated steps before needing to send a gradient update for an optimizer step. A Pytorch implementation of LAMB is available here: https://github.com/jettify/pytorch-optimizer DeepSpeed also provides training in PyTorch with 1-bit LAMB: >To train large models (like BERT and GPT-3) on hundreds of GPUs, communication has become a major bottleneck, especially on commodity systems with a limited bandwidth TCP network. 
>On one side large batch-size optimization such as LAMB algorithm was proposed to reduce the frequency of communication. On the other side, communication compression algorithms such as 1-bit Adam help to reduce the volume of each communication. However, we find that simply using one of the techniques is not sufficient to solve the communication challenge, especially under low network bandwidth. >Motivated by this we aim to combine the power of large-batch optimization and communication compression, but we find that existing compression strategies cannot be directly applied to LAMB due to its unique adaptive layerwise learning rates. To this end, we design a new communication efficient algorithm, 1-bit LAMB, which introduces a novel way to support adaptive layerwise learning rates under compression. >In addition, we introduce a new system implementation for compressed communication using the NCCL backend of PyTorch distributed, which improves both usability and performance. For BERT-Large pre-training task with batch sizes from 8K to 64K, our evaluations on up to 256 GPUs demonstrate that 1-bit LAMB with NCCL-based backend is able to achieve up to 4.6x communication volume reduction, up to 2.8x end-to-end timewise speedup, and the same sample-wise convergence speed (and same fine-tuning task accuracy) compared to uncompressed LAMB. https://arxiv.org/abs/2104.06069 https://www.deepspeed.ai/tutorials/onebit-lamb/ https://github.com/microsoft/DeepSpeed/blob/master/deepspeed/runtime/fp16/onebit/lamb.py Sending gradient updates over a modest internet connection will be a breeze with adapters and 1-bit LAMB, so I'll start working on distributed training sometime in 2023 if nothing else comes along. 
Hypernetworks could also be added to the adapters so the model can be finetuned to do multiple tasks: https://arxiv.org/abs/2106.04489 I'm sure people will have disagreements of what to train on, but if training on two different tasks is better than training on just one then everyone wins. And lastly, Hugging Face is working on safetensors which will be necessary so people joining robowaifu mining pools don't get RCE pickled: https://github.com/huggingface/safetensors
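On the gradient accumulation point in the post above: a minimal sketch, in plain Python, of why several nodes averaging their micro-batch gradients gives the same update as one large batch. The toy model, the data, and the names `grad_mse` and `accumulate` are made up for illustration; this is not the planned implementation, and it ignores LAMB and compression entirely.

```python
# Toy check: averaging per-node gradients equals one large-batch gradient.
# Model: y_hat = w * x with mean squared error. All data here is made up.

def grad_mse(w, batch):
    """Gradient of mean((w*x - y)^2) with respect to w over one batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)

def accumulate(w, micro_batches):
    """Average the gradients of equally sized micro-batches."""
    return sum(grad_mse(w, mb) for mb in micro_batches) / len(micro_batches)

data = [(float(x), 3.0 * x + 1.0) for x in range(1, 9)]           # 8 samples
micro_batches = [data[i:i + 2] for i in range(0, len(data), 2)]   # 4 "nodes", batch size 2

w = 0.5
full_batch_grad = grad_mse(w, data)
accumulated_grad = accumulate(w, micro_batches)
print(full_batch_grad, accumulated_grad)
```

The plain average only matches when every node contributes the same number of samples. If a fast node sends ten steps and a slow one sends one, the server would have to weight each contribution by its sample count to get the same result.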
>>17510 >the parameters can be reduced to 0.2% That sounds remarkable tbh. Thanks Anon.
>>17510 There's a lot of ambiguity over what gets called a "hypernet", and at least some variants end up being useless. From what I've heard, it isn't clear right now if hypernets are actually a good idea. >1-bit LAMB Very cool. I hadn't looked into distributed training algorithms before. Are 1-bit algorithms enough for Internet speeds? My understanding is that that would still require 1 bit per parameter per batch, which for a 1B model would be 125MB of parameter update data per batch. How long would a distributed reduce operation take on that much data with Internet latencies? Assuming there are 10-100 people training, I would guess it would be on the order of minutes, given that upload speeds tend be much slower than download speeds. Are there any studies on the relationship between batch size and number of batches required for convergence? If the batch sizes can be huge, then maybe those latencies are tolerable. I don't know how much efficiency gets lost with huge batch sizes though. There are so many ways to optimize this that I'm sure it can be done with a few more tricks. For example, I see no reason to stop at 1 bit per parameter. If you have massive batch sizes, you might as well go down to 1 bit per two or three parameters, with some per-batch pseudo-randomly selected coupling between parameters. >I'll start working on distributed training sometime in 2023 if nothing else comes along. I'll be working on distributed datasets, probably later this year. Once we're a bit further along, I can use your use cases as my test cases. >I'm sure people will have disagreements of what to train on, but if training on two different tasks is better than training on just one then everyone wins. Easy way to get consensus: fine-tune an established model on a specialized dataset. If that's possible with an (online learning?) 
algorithm that's less prone to catastrophic forgetting, you can let people decide for themselves what data they want to train on, so there's even less agreement required for getting good results. I can think of at least one community that would love to contribute gradients if it meant making large models work better with their data.
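The 125 MB figure above is simple arithmetic, sketched here so the numbers can be changed. The 10 Mbit/s upload speed is an assumption picked for illustration, not a measurement, and the timing covers a single node uploading once, not a full distributed reduce.

```python
def update_size_mb(num_params, bits_per_param):
    """Size of one parameter update in megabytes (1 MB = 10**6 bytes)."""
    return num_params * bits_per_param / 8 / 1e6

def upload_seconds(size_mb, upload_mbps):
    """Time to upload size_mb megabytes at upload_mbps megabits per second."""
    return size_mb * 8 / upload_mbps

full_1bit_mb = update_size_mb(1_000_000_000, 1)   # 1B parameters at 1 bit each
assumed_upload_mbps = 10                          # assumption: a modest home uplink
seconds = upload_seconds(full_1bit_mb, assumed_upload_mbps)
print(f"{full_1bit_mb} MB per update, {seconds} s to upload at {assumed_upload_mbps} Mbit/s")
```

Under these assumptions a single upload already takes on the order of minutes, before counting latency or the reduce step across nodes.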
Open file (24.22 KB 379x462 attention.png)
Open file (74.86 KB 467x482 ANML.png)
>>17530 >There's a lot of ambiguity over what gets called a "hypernet", and at least some variants end up being useless. From what I've heard, it isn't clear right now if hypernets are actually a good idea. A hypernetwork is just a network that generates the weights of another. What makes transformers so effective is their generated weights in the queries, keys and values. This is all a hypernetwork is really doing. It's learning slow weights to program the fast weights of another network. NovelAI for example put adapters before the keys and values in the cross attention layers of Stable Diffusion and it had fantastic results. There are other similar methods that are quite interesting such as gating outputs with another network which has been shown to be super effective in reducing catastrophic forgetting after training sequentially on 600 different tasks: https://arxiv.org/abs/2002.09571 >Are 1-bit algorithms enough for Internet speeds? My understanding is that that would still require 1 bit per parameter per batch, which for a 1B model would be 125MB of parameter update data per batch. It's actually more than that initially before the compression can really begin. Overall through training it needs about 4 bits per parameter. Finetuning the full parameters directly over the net won't be possible. This is why adapters are needed to reduce the training parameters. >Are there any studies on the relationship between batch size and number of batches required for convergence? If the batch sizes can be huge, then maybe those latencies are tolerable. I don't know how much efficiency gets lost with huge batch sizes though. 
Not sure if there have been more recent studies but these are some I remember: >On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima https://arxiv.org/abs/1609.04836 (explains why Adam fails to generalize on large batch sizes) >Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates https://arxiv.org/abs/1708.07120 >Don't Decay the Learning Rate, Increase the Batch Size: https://arxiv.org/abs/1711.00489 >The Limit of the Batch Size https://arxiv.org/abs/2006.08517 (further investigation confirmed LAMB is the only optimizer that can generalize well at huge batch sizes but starts underperforming baselines at 812K) When training from scratch there isn't much advantage to using a large batch size with LAMB in the beginning but for finetuning there is a huge advantage and you can use much larger learning rates. There are still diminishing returns though once you get to larger batch sizes. In the LAMB paper they tested up to 131K in the first stage of training but found 65K worked best https://arxiv.org/abs/1904.00962 >Easy way to get consensus: fine-tune an established model on a specialized dataset. If that's possible with an (online learning?) algorithm that's less prone to catastrophic forgetting One of the benefits of using adapters is you can finetune on small datasets without degrading model performance, which the Compacter adapter explored and actually outperformed full fine-tuning on SuperGLUE: https://arxiv.org/abs/2106.04647 I'm looking forward to trying out the Hyperformer to finetune a model that can switch between casual chat, question answering and story writing. I envision the task for the Hyperformer being written in natural language so users can provide more general instructions like "respond with a joke" and it generates really good jokes in a way that can't be achieved with prompting. 
Then if someone wants to train it to generate Arxiv papers they can do that under a task name like "write a research paper". It might generalize at some point and understand "write a joke paper", which would be so much better than the ridiculous prompting models need now to do tasks. I think this could be done by using the language model itself, like taking the detached hidden state of the first 16 tokens or something like that and using those to generate the weights.
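To make "a network that generates the weights of another" concrete, here is a toy sketch in plain Python. The class `TinyHypernet`, the task names, and the one-hot task embeddings are all hypothetical; a real Hyperformer generates adapter weights inside a transformer and is trained, neither of which is shown here.

```python
import random

def matvec(matrix, vec):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]

class TinyHypernet:
    """Maps a task embedding to the weights of a d_out x d_in linear layer."""

    def __init__(self, emb_dim, d_in, d_out, seed=0):
        rng = random.Random(seed)
        self.d_in, self.d_out = d_in, d_out
        # Slow weights: the only parameters that would be trained.
        self.proj = [[rng.uniform(-0.1, 0.1) for _ in range(emb_dim)]
                     for _ in range(d_in * d_out)]

    def generate(self, task_emb):
        """Return the generated (fast) weight matrix for one task."""
        flat = matvec(self.proj, task_emb)
        return [flat[i * self.d_in:(i + 1) * self.d_in] for i in range(self.d_out)]

hyper = TinyHypernet(emb_dim=3, d_in=4, d_out=2)
task_embeddings = {"chat": [1.0, 0.0, 0.0], "story": [0.0, 1.0, 0.0]}  # hypothetical tasks
x = [1.0, 2.0, 3.0, 4.0]
outputs = {name: matvec(hyper.generate(emb), x) for name, emb in task_embeddings.items()}
print(outputs)
```

The same input `x` passes through a different linear layer for each task, and only `proj` would need gradients. Replacing the one-hot embeddings with something computed from the task description is the part the post speculates about.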
>>17533 >ANML Wow. That sounds great for foundation models, where the number of fine-tuning steps is much smaller than the number of pretraining steps, and where selectively ignoring information could be considered a good thing. On the point of catastrophic forgetting, I think several papers have found that if you train with a contrastive objective, then training a single linear output layer is as good as fine-tuning the whole network. That might be useful for decentralized training, where everyone can contribute both (1) gradients for a final layer based on a task-specific loss, and (2) gradients for the rest of the network based on a contrastive loss. For people that are only working on new tasks and not on new data, that along with the update compression tricks could reduce per-batch bandwidth requirements to just several hundred bytes. At that point, latency becomes a much bigger problem than bandwidth. >Finetuning the full parameters directly over the net won't be possible. This is why adapters are needed to reduce the training parameters. Ah, I got it. The adapters reduce the rank of changes to weight matrices, not the actual weight matrices, so they can support really aggressive levels of compression in pretty much all cases. Both adapters and 1-bit algorithms are forms of compression for parameter updates. LoRA does it by creating a parameter space related to the original through a low-rank linear transformation, whereas 1-bit algorithms do it by using a codebook. >[Batch sizes, learning rates] Very cool. I can't believe I hadn't looked into it before. I don't understand why per-layer learning rates seem to be so important for large batch sizes. I guess I'll have to read the LAMB paper for this. Random thought if per-layer learning rates are so important: it might make sense to model layer-layer interactions when setting the learning rate. 
Intuitively, if learning rates define a metric on the space of gradients, then it might help to use a non-diagonal metric tensor. Maybe LAMB already does this.
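A quick way to see where a figure like 0.2% can come from: a low-rank update to a `d_out x d_in` weight matrix is stored as two thin matrices, so it has `rank * (d_in + d_out)` trainable values. The sizes below are picked for illustration and are not taken from either paper.

```python
def lora_params(d_in, d_out, rank):
    """Trainable values in a rank-`rank` update B @ A for a d_out x d_in weight."""
    return rank * (d_in + d_out)

def lora_fraction(d_in, d_out, rank):
    """Trainable values as a fraction of the full weight matrix."""
    return lora_params(d_in, d_out, rank) / (d_in * d_out)

frac = lora_fraction(1024, 1024, 1)   # illustrative sizes, not from the papers
print(f"rank 1 on a 1024x1024 matrix: {frac:.4%} of the original parameters")
```

The fraction for a whole model depends on which matrices get adapters and at what rank, so this is a per-matrix estimate only.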
>>17510 The problem with adapter-based training is the obvious degradation of complexity of what you can learn. Generally speaking, optimizing the very process of what makes Deep Learning work at all - that is, meddling with weight optimization via backpropagation - is very dangerous. Dangerous not in an interesting mad science sense, but simply because it easily degrades the learning curve, until the moment where you could reach the same loss (result) on a single machine and with a smaller model. We need to pretrain our own model, and adapters here are very likely not useful (and regarding low-rank decomposition tbh I have seen only one paper - by Cohere - where low-rank worked for pretraining at all, for a humble parameter reduction). 1-bit LAMB is more interesting and useful, but again, with serious caveats: 1. Even with 1 bit per parameter without some additional sparsity and/or compression you make training of moderately large models over the internet unrealistic due to data waiting stalls exceeding your computation time by order of magnitude and more. 2. Many 1 bit and sparse schemes incur some sort of loss curve degradation, though there are solid schemes which incur mostly none. 3. The paradigm of X-bit OptimizerName is likely misguided, fully cooperative optimization overcomplicates the engineering and requires the admin of the system to trust nodes more; Parameter-server (possibly with distributed parameter server cluster) based approaches where nodes only compute gradients seem optimal. >By training with Kronecker adapters or low rank adapters the parameters can be reduced to 0.2% Do I need to say that this is wishful thinking, anon, does it help... Task-specific (hypernetwork-like) embeddings have their place, but making it work on a general-purpose system that shuffles these automatically task-appropriately and in a learned way is hard, and recent deepmind's foray into this area ended with modest gains. 
>>17530 >If you have massive batch sizes Massive batch size is not a given - most known tasks have a critical batch size past which the scaling becomes detrimental to test loss curve performance. >>17533 Nice papers on batch size scaling mentioned. >I'm looking forward to trying out the Hyperformer to finetune a model that can switch between casual chat, question answering and story writing. Tbh I don't see how it's different from knowledge engineering which led us nowhere. Hard to beat scaling on generalist pretraining. TLDR; What has not been validated at scale almost certainly does not work. Designing and executing distributed training of meaningfully-sized models is very hard, some very smart people tried and failed at it. A conservative approach is needed, but even if it materializes I just don't see how to attract enough committed volunteers to make it happen. Also have a nice paper and repo (finally something that kind of works lol): https://www.semanticscholar.org/paper/Spartan%3A-Differentiable-Sparsity-via-Regularized-Tai-Tian/210c47fc0c16bf1cfc9beeb01faf70fcdbd3b978 https://github.com/facebookresearch/spartan
>>17543 Important note: in my models of this distributed training process the maximum allowed batch size is the de-facto limiting factor (with the other limited factor being the parameter server bandwidth - which should be alleviated with distributed parameter server aka trusted supernodes), given even modest success among volunteers. This mandates task and dataset design which make large batch sizes beneficial - an R&D-heavy open-ended problem. This general problem of distributed training requires careful experiment-driven engineering and iteration, not stacking of meme paper upon meme paper (almost certainly not even applicable to the distributed training context) in one's head, only to fail once the initial implementation gets done (does it even, lol). Seriously, try to simulate even a simple distributed training run on your machine, see how the loss curve compares to normal dense training - and become blackpilled like myself and become a better engineer through the experience. I came to see training as a very fragile process which gets ruined quickly by our brazen approximation and compression.
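For anyone who wants to try the suggested simulation, here is a minimal single-machine sketch in plain Python. It is plain SGD on a toy quadratic with sign compression plus error feedback, swapped in as a stand-in; it is not 1-bit LAMB, and the targets, step count, and learning rate are made up.

```python
def loss(w, target):
    return 0.5 * sum((a - b) ** 2 for a, b in zip(w, target))

def sign_compress(vec):
    """1-bit style compression: keep the signs plus one shared scale (mean |v|)."""
    scale = sum(abs(v) for v in vec) / len(vec)
    return [scale if v >= 0 else -scale for v in vec]

def train(targets, steps, lr, compress):
    """Simulate workers that each pull w toward their own target."""
    dim = len(targets[0])
    w = [0.0] * dim
    errors = [[0.0] * dim for _ in targets]   # per-worker error feedback buffers
    mean_target = [sum(t[i] for t in targets) / len(targets) for i in range(dim)]
    history = []
    for _ in range(steps):
        total = [0.0] * dim
        for k, t in enumerate(targets):
            g = [wi - ti for wi, ti in zip(w, t)]              # worker k's local gradient
            if compress:
                g = [gi + ei for gi, ei in zip(g, errors[k])]  # add back what was dropped earlier
                sent = sign_compress(g)
                errors[k] = [gi - si for gi, si in zip(g, sent)]
            else:
                sent = g
            total = [a + b for a, b in zip(total, sent)]
        w = [wi - lr * s / len(targets) for wi, s in zip(w, total)]
        history.append(loss(w, mean_target))
    return history

targets = [[4.0, -2.0, 1.0], [2.0, -4.0, 3.0]]   # made-up per-worker optima
dense = train(targets, steps=50, lr=0.1, compress=False)
compressed = train(targets, steps=50, lr=0.1, compress=True)
print(f"dense final loss:      {dense[-1]:.6f}")
print(f"compressed final loss: {compressed[-1]:.6f}")
```

Comparing the two `history` lists step by step is the toy version of the loss curve comparison described above. A quadratic this small says little about real models, so it is a starting point for experiments rather than evidence either way.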
>>17543 >We need to pretrain our own model I say from experience that this is very bad advice. Not only does it cost a lot to pretrain good models from scratch, but you're inevitably going to fall behind SOTA as the (much larger, much better-funded) ML community continues to publish increasingly powerful models. Getting cheap, flexible ways to adapt other people's models to custom use cases seems far more promising, even if those adaptations perform worse than normal fine-tuning. >This general problem of distributed training requires careful experiment-driven engineering and iteration, not stacking of meme paper upon meme paper (almost certainly not even applicable to the distributed training context) in one's head, only to fail once the initial implementation gets done (does it even, lol). Your attitude is completely asinine. You're making assumptions about a person that have already been demonstrated false, and you're using 'meme' language to degrade them.
Open file (98.28 KB 928x534 215.png)
Open file (63.19 KB 822x370 323.png)
Open file (542.01 KB 1049x819 745.png)
>>17549 >I say from experience that this is very bad advice. Not only does it cost a lot to pretrain good models from scratch, but you're inevitably going to fall behind SOTA as the (much larger, much better-funded) ML community continues to publish increasingly powerful models. Getting cheap, flexible ways to adapt other people's models to custom use cases seems far more promising, even if those adaptations perform worse than normal fine-tuning. This assumes the industry will continue feeding the increasingly problematic (their words) and outdated (the future ISN'T opensource, hello anon) hobbyist freeriders their precious checkpoints, which is becoming increasingly tired and borderline crazy assumption to make, as the political machinery of AI regulation and so-called "compute governance" gets ironed out and implemented in laws and industry norms. If you don't see this and think stable diffusion is a counter-example you are myopic, dear anon. Simple question: have you seen an open-source Gato replication, and if not, care to elaborate why, given small apparent cost? (And how are you even going to cope with lack of Gato-tier checkpoint, by training a handful of adapter layers over run-of-the-mill language model trained on e-trash and deemed safe enough for release? It won't be able to behave and you know it, anon - no mental gymnastics will help it gain modes of behavior and concepts it didn't get in pretraining phase - unless you throw it away and just re-train it almost from scratch). The large-scale reality on this issue in the coming years is likely to be shaped like this: AI regulation package is passed by a legislative body of a major geopolitical bloc to little fanfare. Other countries follow suit, like usual. The legislation mostly centers around a concept of "general-purpose AI system" general-purpose here being a wide and blurry enough legalese definition - which firmly encompasses Gato-like and even simpler systems. 
The legislation effectively forbids startups and individuals from experimentation on such systems, ensuring the regulatory burden is high enough (for example requiring costly bureaucratic project-level and fully transparent runtime audits - to the point of requiring giving away your servers' SSH keys to authorities), while also forbidding the more powerful entities the occasional release of such general-purpose checkpoints under fear of large fines and career damage. This is it, this kills your dream, unless you understand me and start thinking outside the cozy hugbox. The funny thing is the industry is already going in this direction by virtue of self-censorship. Again, you sure understand why we have only pussy VIMA instead of full-on Gato, and why stability.ai did a lot of various things yet wasn't brave enough to tackle this? Smart people are exceedingly good at self-censorship, you should have known if you studied how academia works. Maybe we will see opensource Gato at some point, but this long delay and silent self-censorship on behalf of all opensource AI collectives regarding this issue is already telling. It's like you don't read twitter and don't see how AI alignment meme slowly wins over generally capable yet socially primitive autistic minds of researchers. Add 2 and 2 and stop expecting freeriding in an industry which is increasingly being compared to uranium experimentation by eccentric physicists before the WWII. >much larger, much better-funded Yes this is what defeat looks like. We (who are we lol, misfits, individualist tinkerers?) are being defeated year after year. Consumers are happily consooming mediocre (yet still head and shoulders above the opensource demos due to a modicum of product development and UX polish) products and ask for more. The very creators abandon their creations when their internal motivation falters, lacking attention span to polish them to some trivial degree. It's all obvious and sad.
Open file (166.17 KB 822x693 415.png)
Open file (160.95 KB 1497x623 742.png)
>>17549 >Your attitude is completely assinine. You're making assumptions about a person that have already been demonstrated false, and you're using 'meme' language to degrade them. I don't see how to motivate people to do diligent engineering tbh. If people can get praise and updoots for DL-flavored shower thoughts on an imageboard promising unrealistic order of magnitude gains, they will do just that, mostly. It seems all remotely capable people from the first world are vacuumed by startups and industry, and what remains, in most of the best cases, is just undergrads playing with ideas from meme papers - until they crash into reality, finally git gud and find a job which will look like a salvation by that time (mind you, I don't like this outcome but this is how it works in most cases). Would be cool if someone proved my observation false - and I can find a vanishingly small minority of human counterexamples - just not here. I don't fear causing negative reactions at all, I have seen all too many projects going nowhere mostly via this incentive hijacking and an encroaching victory of talkers over builders. As you may see I have more than enough of wordcel quality as well (thankfully, not the only gift of mother Nature) - this environment selects for this property, which should be noticed and punished before it is all too late. If you are with me by this time, I can give you several valuable suggestions. If you really want to win, fulfill three requirements: 1. Design and implement a realistic deep learning system that is capable of training in self-supervised and reinforcement learning modes. The system should maximize parameter efficiency, and should be able to be run on high-powered consumer hardware. Low-powered hardware is a poor man's deadend. 2. Design and implement either a distributed training system, or a social project to gather cryptocurrency from people and knowhow to channel it into normal centralized training of your system. 3. 
Amass stable and massive popular support among individuals loyal to your case and owning enough compute or cryptocurrency to participate in the training run. Train your system on general-purpose task distribution in Gato-like fashion. Release the checkpoint via a signed magnet link. The vast majority of approaches that don't tick these checkboxes are pity trash and self-delusion. On a general note, hugboxes are order of magnitude more evil than casual negativity when taken long-term. A hugbox is a cemetery for budding talent.
Open file (60.74 KB 510x456 tay stonks.png)
>Do I need to say that this is wishful thinking, anon, does it help... I've played around with them already on toy problems and saw potential in them, and I doubt someone at Google with 18k citations and someone else at Microsoft with 7k spend their time publishing meme papers for the lulz. Obviously there is a loss in complexity but it gives the option to trade off accuracy for less parameters so they can be sent over the net. I really don't care about state of the art results or being an outstanding engineer. It just needs to get the job done. >17544 >Seriously, try to simulate even a simple distributed training run on your machine, see how the loss curve compares to normal dense training I'll run some tests on different parameter reductions. If you have your own feel free to share your results. I think there was a paper that showed it's faster to train a model with more parameters and distil it into a smaller model. It's something to consider whether it's even worth distributing training or to just buy better hardware. >and become blackpilled like myself and become a better engineer through the experience. I was blackpilled until I started using LAMB instead of Adam. Being bitter and cynical isn't useful to getting stuff done. I neither believe or disbelieve papers. I just see them as possibilities to be explored when appropriate. Once you start making judgments something is just wishful thinking without actually testing it you're cutting off the possibility of ever knowing because you've already decided it's no good. >>17549 He's not really wrong to be honest. It's hard not to be cynical. Most gradient compression methods like PowerSGD and others often make wild claims but in practice require perfect hyperparameters for the task at hand and can become unstable or fail midway without warning. 
The 1-bit LAMB paper seems a lot more reasonable since they're only claiming a 4x reduction and LAMB has been pretty robust for me in all use cases except small batch sizes. >>17551 >This assumes the industry will continue feeding the increasingly problematic (their words) and outdated (the future ISN'T opensource, hello anon) hobbyist freeriders their precious checkpoints Hugging Face has already started removing 'problematic' checkpoints but people are happily sharing them over torrents. I also don't think people should expect there to be many useful pretrained models in the future from the West, except small nerfed ones verifiably 'unproblematic' and torrents created by individuals. Look at the heat Stability.AI is taking for releasing models publicly and they don't even care a shred about open-source. They happily banned open-source contributors they saw as problematic and the rest is all rhetoric for free advertising. >The large-scale reality on this issue in the coming years is likely to be shaped like this: AI regulation package is passed by a legislative body of a major geopolitical bloc to little fanfare Not going to happen on a significant scale. And if it does whichever countries do this will seal their fates to never having any geopolitical power because they just dropped 95% of their AI researchers by causing massive human capital flight to less regulated countries. If Stable Diffusion gets taken down it implies taking down GPT2, OPT and almost all other models trained on any copyrighted or private data. Again, not going to happen unless a country is suicidal, and if that's the case they have much bigger issues to worry about than playing around with AI models. >And how are you even going to cope with lack of Gato-tier checkpoint, by training a handful of adapter layers over run of the mill language model trained on e-trash and deemed safe enough for release? 
It won't be able to behave and you know it If someone wants a Gato model or larger VIMA model then yeah they're going to have to train their own. Adapters aren't going to do shit. They're only useful for fine-tuning. On the other hand, pretrained models are pretty robust with what you can do with them. I've ripped embedding layers out of models and retrained them with new tokenizers. I remember there was a paper that found pretraining on Wikipedia helped with reinforcement learning, both speeding up convergence and getting better results: https://arxiv.org/abs/2201.12122 Even if no more pretrained models get released there's plenty to work with for the next 8 years and come up with a plan. Personally I'm not concerned with making models from scratch right now. I don't have 10 80GB A100s at my disposal let alone 1000 and I doubt a ragtag team of 3090s and old GPUs will achieve anything useful from scratch. I just want to finetune language and vision models for conversational AI to a greater degree than can be done alone. The reason there isn't an open-source Gato is because no one wants to spend $10,000 of their own money for a toy that will be obsolete in 3 years. >17552 >I have seen all too many projects going nowhere mostly via this incentive hijacking and an encroaching victory of talkers over builders. Then build your own? There have been plenty of shit talkers to come by over the years telling us why we should do it their way but never post their own work. I'm happy with the progress I'm making and others are making. I'm not making this shit to save the world or take on corporations or for anyone else. I'm just building an AI waifu at home on a budget and sharing what I know.
>>17553 >Being bitter and cynical isn't useful to getting stuff done. I would go a step farther and say it's quite counter-productive. A fundamental belief that you can succeed at a given endeavor is, all else being equal, the most fundamental pre-req for eventual actual success in the effort. History is rife with pertinent examples in the arts & sciences, politics & tech. To wit, compare: >Just give up, it's hopeless!111 >vs. >Keep.moving.forward. Which way would you prefer to live Anon, whether success or failure is the ultimate outcome? Indeed which is more likely to succeed?
This post is off topic. Feel free to delete it or move it if you don't want it here. >>17551 >Yes this is what defeat looks like. That's not a defeat scenario, that's what I expect is the inevitable scenario for all parties, even well-funded ones. Large models have gotten 10x bigger year-on-year since 2018, and I think training costs have risen faster than that. Companies are eventually going to be forced to specialize in their AI direction, at which point none of them will be "the best" at everything. At that point, if you want "the best" AI, you'll need to be able to plug into multiple models from multiple parties regardless of how well-funded you are. In the long run, no model performs better than a mixture of all the leading models. I'm not concerned about new legislation hindering open source AI. The US is far too afraid of China taking the lead on tech to introduce legislation that hinders AI development in any meaningful way. AI deployment, maybe, but AI development, no. I would guess that any Five Eyes country will be the same. The EU is going to get screwed on legislation as usual. That sucks, but as far as I know, the majority of Western open source AI enthusiasm is in the US and UK, and EU legislation is largely irrelevant in this. (Sorry if you're in the EU. If you do get screwed on legislation, maybe some of us can help proxy your work.) At least in the US, it's more likely that people will use current legislation against open source AI development, but I think even that is unlikely to succeed at scale. As far as taking advantage of open source code goes, it looks like Microsoft will be forced to take lead on the defense thanks to GitHub and Copilot, and they are very familiar with large legal battles around software. As far as making sure AI has access to copyrighted data, Google has an enormous stake in this, and they have won at least one related battle (Authors Guild, Inc. v. 
Google) in the US courts, where the Supreme Court declined to review an appeals court ruling that's definitely broad enough to cover AI use cases. As far as open source development goes, open source code and published research papers fall under the First Amendment, and this has been tested at the federal level even for something as extreme as cryptography (Bernstein v. Department of State). These cases can be overturned by the Supreme Court, but public opinion does not sway the current Supreme Court, as has been demonstrated recently with Roe v. Wade. From the defense to the precedents to the judges, everything seems to work in favor of open source software in the US.
>>17557 >This post is off topic. It certainly is, but it's a rather high-quality one. Thanks, actually. >Feel free to delete it or move it if you don't want it here. I may move the conversation to /meta or the news thread. As to your post, you're simply replying in kind to the already-offtopic poster who seems to do this frequently, but as he is an outsider and newcomer by his own admission, no surprises tbh. You can get a hall pass this time Anon. :^)
