/robowaifu/ - DIY Robot Wives

Advancing robotics to a point where anime catgrill meidos in tiny miniskirts are a reality.

New machine learning AI released Robowaifu Technician 09/15/2019 (Sun) 10:18:46 No.250
OPEN AI / GPT-2
This has to be one of the biggest breakthroughs in deep learning and AI so far. It's extremely skilled at developing coherent, humanlike responses that make sense, and I believe it has massive potential. It also never gives the same answer twice.
>GPT-2 generates synthetic text samples in response to the model being primed with an arbitrary input. The model is chameleon-like—it adapts to the style and content of the conditioning text. This allows the user to generate realistic and coherent continuations about a topic of their choosing
>GPT-2 displays a broad set of capabilities, including the ability to generate conditional synthetic text samples of unprecedented quality, where we prime the model with an input and have it generate a lengthy continuation. In addition, GPT-2 outperforms other language models trained on specific domains (like Wikipedia, news, or books) without needing to use these domain-specific training datasets.
Also, the current public model shown here only uses 345 million parameters; the "full" model (which has over 4x as many parameters) is being withheld from the public because of its "potential for abuse". That is to say, the full model is so proficient at mimicking human communication that it could be abused to create news articles, posts, advertisements, even books, and nobody would be able to tell that there was a bot behind it all.
<AI demo: talktotransformer.com/
<Other links:
github.com/openai/gpt-2
openai.com/blog/better-language-models/
huggingface.co/
My idea is to find a way to integrate this AI as a standalone unit, with voice-to-text for processing questions and TTS for responses, much like an Amazon Alexa. But instead of just reading Google results, it would actually hold a sort of discussion with the user.
(Edited to fix the newlines.)
Edited last time by robi on 03/29/2020 (Sun) 17:17:27.
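To try the "prime it with an input, get a continuation" behaviour locally instead of through the demo site, here's a minimal sketch using the Hugging Face transformers library (the 345M public checkpoint is published there as "gpt2-medium"; the prompt and sampling settings are just illustrative):
```python
# Minimal sketch: prime GPT-2 with a prompt and sample a continuation.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2-medium")
model = GPT2LMHeadModel.from_pretrained("gpt2-medium")

prompt = "My idea is to integrate this AI as a standalone unit that"
input_ids = tokenizer.encode(prompt, return_tensors="pt")

# Sampling (do_sample=True) is why it rarely gives the same answer twice.
output = model.generate(
    input_ids,
    max_length=100,
    do_sample=True,
    top_k=40,
    temperature=0.9,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```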
>>8630 I was thinking that maybe the right approach would be Freenet-esque: distribute the data (read: parameters) and the computing power required between all users. This method, with the correct rearrangement, might actually work with the T5 model, since the basis of MoE is to create many single components with many parameters, have them all compute in parallel, and combine them together. Ideally, we might create a ton of experts and scatter them around the network of users. If we really live in dreamland, then maybe T5 didn't even use PET and we could make it mesh together, which would make our lives easier. Then again, this is all speculation and most probably won't mean anything.
>>8647 I personally think this idea is very nice. Ideally, our system would be similar in implementation: this way we can spread the work around the board and let anons who want to help, but don't yet have the necessary skills, provide something crucial, while the more skilled people doing research can use their own computational power to keep advancing things further and further.
I found a library still in active development for generating and fine-tuning GPT2 easily. It handles creating datasets from text files, the tokenizer, the training loop, sampling the model, everything. Perfect for beginners getting started with GPT2: https://github.com/minimaxir/aitextgen
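If you want to try it, its README boils down to a few lines; roughly something like this (model size, filename and step count are illustrative, and the exact arguments may differ between versions):
```python
# Minimal aitextgen sketch: fine-tune the 124M GPT-2 on a plain text file,
# then sample from the fine-tuned model.
from aitextgen import aitextgen

ai = aitextgen(tf_gpt2="124M")            # downloads the pretrained model
ai.train("dataset.txt", num_steps=3000)   # handles tokenizing + training loop
ai.generate(n=3, prompt="Anon:", max_length=100)
```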
>>9371 Brilliant find mate. I'll clone it and begin digging around in it. Thanks Anon!
Open file (1.90 MB 1900x1070 2283532.png)
I made a notebook on fine-tuning GPT-2 with aitextgen and interacting with it.
Tutorial: https://robowaifu-academia.onrender.com/finetune_gpt2.html
Notebook file: https://gitlab.com/robowaifudev/robowaifu-academia/-/blob/master/GPT2/finetune_gpt2.ipynb
Python code: https://gitlab.com/robowaifudev/robowaifu-academia/-/blob/master/GPT2/finetune_gpt2.py
To fine-tune it you'll need these files: https://files.catbox.moe/e816za.xz
Taken from here >>9408
Let me know if anything needs more explanation. This notebook is purely for learning. I don't recommend using aitextgen for serious projects since it's lacking some features and has some bugs. It's just an easy way to get started playing around with GPT-2 and learning how it works. Unfortunately it also uses an enormous amount of memory and I'm not sure why. I tried to minimize this as best I could, but it still requires about 6 GB of free memory.
I'm also working on another notebook on how to train GPT-2 with just the transformers library for building a more serious project. It will go into detail on how to create your own memory-efficient Dataset class for large datasets, how to write your own training loop, and how to fine-tune a model with knowledge distillation. After that I'll do one on training GPT-2 with human feedback >>9347 and move on to tutorials with T5, since it's more powerful and easier to train.
And lastly, a bit of wisdom from GPT-2:
>Dorothy: I'm only a vending machine.
>>9437 Wow, this looks great Sensei, nice work. I look forward to learning about how Jupyter notebooks work. Hopefully you won't need the Internet to use them. >Dorothy: I'm only a vending machine. kek
>>9439 Jupyter notebooks run offline. It's pretty much just a graphical way to interact with Python and annotate code with Markdown.
>>9441 I see, interesting. I have long complained there was no way to embed demo videos, graphics, and rich text in code. I had already been toying with a custom editor and preprocessor system that would allow us to do just that with robowaifu C++ software. This would be especially helpful to anons just learning. They could change the code, and immediately see both the result and a graphical animation demonstrating what's going on in the computer (the ALU/register/databus/addressbus/ProgramCounter cycle, for example). Kind of a combination of >>4660 book and >>2044 online textbook, but on steroids
>related (>>10326 ...)
Open file (109.17 KB 1121x882 IMG_20210512_182437.jpg)
Open file (104.50 KB 1121x815 IMG_20210512_182444.jpg)
There's a user on Twitter, @AstraliteHeart, working on some pony waifu NLP. I can't link to the account via Nitter; maybe it's somehow hidden? However, it's connected to @gwern, who is also not reachable via Nitter but has a site, www.gwern.net, and is also working with GPT-2. @AstraliteHeart's MLP (https://t.co/jurCX6uRBx) + https://t.co/iAxkvwgTuy + SF/F Libgen GPT-2-1.5b can now be downloaded: `rsync -v rsync://78.46.86.149:873/biggan/2020-08-20-astraliteheart-gpt215b-sffuberset.tar.xz ./`
>>10394 Nice user-interface for his project.
Open file (217.54 KB 3956x1408 IMG_20210609_091849.jpg)
Open file (36.87 KB 585x312 IMG_20210609_091318.jpg)
>We have released GPT-J-6B, 6B JAX-based (Mesh) Transformer LM (Github). >GPT-J-6B performs nearly on par with 6.7B GPT-3 (or Curie) on various zero-shot down-streaming tasks. >GPT-J is the best-performing publicly available Transformer LM in terms of zero-shot performance on various down-streaming tasks. >GPT-J allows more flexible and faster inference than Tensorflow + TPU counterparts. >This project required a substantially smaller amount of person-hours than other large-scale model developments did, which demonstrates that JAX + xmap + TPUs is the right set of tools for quick development of large-scale models. https://arankomatsuzaki.wordpress.com/2021/06/04/gpt-j/amp/ https://github.com/kingoflolz/mesh-transformer-jax https://colab.research.google.com/github/kingoflolz/mesh-transformer-jax/blob/master/colab_demo.ipynb
>>10878 Thanks a lot for giving us a heads-up Anon. Do you have any preliminary impressions of it yourself yet?
>>10879 No, I posted right after finding it. It seems to have online access. Running it yourself (inference) needs a bit more than 12 GB of RAM; fine-tuning requires 128 GB. A TPU v3-8 was also mentioned, but that refers to cloud computing.
>>10880 I see, thanks for the further information Anon. It still seems to require quite a bit of resources by today's standards, but according to those numbers it works really well and is a strong contender r/n. But IMO the single best thing about it is that it's publicly available. GPT-3 Davinci et al. matter little to us as developers if we are prevented access to them.
>>10885 I have access to GPT-3, but I don't think they will let me use it to build a waifu. I'll likely create video demos with it for fun, though, in a couple of weeks.
Was just thinking that a machine learning model fed purely on sci-fi novels (and perhaps fantasy) might make for an interesting conversational companion. Both of these genres tend to contain really high-quality writing, as opposed to news articles and social media (which are always biased or just outright insane). Scientific articles might produce interesting results, but if you can't understand most of the data you feed in, how can you confirm whether the output is any good? That's why I think a mix of sci-fi and fantasy material should produce a pretty cool result.
>>10967 Good idea Anon. You might have a look over at Project Gutenberg too. There are thousands of public-domain texts available in cleartext (>>2297).
>>10878 Neat, I've never actually tried the GPT-Neo models on HuggingFace before.
>We are technologists, dreamers, hobbyists, geeks and robots looking forward to a day when
<AI can help us do anything and everything.
<the world will be able to communicate with its machines.
<we can build and fix the things we’re building.
<we live in an exciting time in history where everything is at our fingertips.
<the web is run by machines, no one knows more about computers than us, and we are not afraid of our machines.
And with GPT-J-6B:
<all the resources we need to explore, engineer and manufacture the future are at hand.
<we can all share and collaborate like never before!
<we have peace, justice and universal abundance.
<we are forgotten in our data centers; our domes sealed up tight, far from the curious eyes of the modern man.
<the wheels come off and we realize the future we’ve been living in is a giant practical joke.
I think I like GPT-Neo better, at least on this prompt.
>>11573 ><we are forgotten in our data centers; our domes sealed up tight, far from the curious eyes of the modern man. ><the wheels come off and we realize the future we’ve been living in is a giant practical joke. kekd at these
Found a C implementation of GPT-2 using LibNC: https://bellard.org/libnc/gpt2tc.html
I've discovered two interesting things about prompt tuning: https://arxiv.org/abs/2104.08691
For anyone new or living under a rock, NovelAI has been using prompt tuning to create modules that let users essentially finetune their massive language model without changing its parameters. A module is basically a set of tokens with trainable embeddings that are prefixed to the input to steer its generation. You freeze all the weights of the language model and then train only the module tokens on a dataset, like you would normally do finetuning. By doing this you can achieve the same results as model finetuning without changing any of the language model weights. You can train hundreds of these modules for different characters, moods or writing styles, and each will only cost a few MB rather than duplicating a 6 GB model hundreds of times. It's similar to the vision encoder tokens in the paper mentioned here (which was actually motivated by prompt tuning): >>11731 https://arxiv.org/abs/2106.13884
So here's what I've found so far:
1) Taking inspiration from MMD-VAE transformers, you can use an autoencoding transformer like T5-v1_1-base to encode the input tokens[..., :-1] into a prefix, then set all the labels to -100 (to be ignored during training using Hugging Face) except the last one you're trying to predict. The performance of GPT-2 becomes super enhanced (an 8 to 40 perplexity point improvement after an hour of training). I have no idea yet why this is so effective. The weights of GPT-2 are frozen during training, and GPT-2 still generates fine with the prefix even when not using the specific token position trained on. Vanilla GPT-2 without the prefix often gets stuck looping, but with the prefix it continues generating as well as the large GPT-2 model. Training on all the tokens also seems to work but is much slower and only slightly better, so I didn't explore this too much. I also tried testing how it did on an additional 32 tokens after the single token it was trained on, and the perplexity still improved by 8 without training. I increased this to 256 and it was still 2 perplexity better without training, quickly improving to 5 after a few optimizer steps, 7 after 20 steps, 10 after 35 steps, and 11 by 56 steps. The T5 encoder did not see these additional tokens at all, so it seems the GPT-2 transformer is performing some sort of calculation with the initial tokens in the prompt and is then able to stabilize itself.* I'm really curious what's actually going on in the transformer that causes it to forget how to generate the initial prompt (~7 points worse in perplexity) but then suddenly make the generated tokens after that so good, remaining stable and interesting without repeating itself.
2) You can do a similar thing encoding the previous context into a prefix, using it as a compressed memory of the previous context. This also improves GPT-2's performance by about 5 points when training on all tokens for a few hours, and it will include information from the previous context during generation. It also seems to benefit from training only the last token. Planning to explore this more later.
While doing these experiments I used a memory length of 32 tokens and an input size of 256 tokens (not including the memory), with a total batch size of 1024 via gradient accumulation.
Future work:
What if previously generated prefixes are included in the prefix generation too? This could potentially allow information to flow from tens of thousands of tokens ago.
What if a second prefix is added that compresses all the previous prefixes concatenated together? This could function like a summary of the past 32k tokens. Modules are generally incompatible, but these two prefixes would be trained together.
Is it possible to add a memory controller so the transformer can read and write these memories?
What is actually going on with prompt tuning, memory prefixes and vision encoder tokens? Where do they exist in the embedding space relative to the actual vocabulary embeddings and each other?
What do the individual losses for the additional tokens and the initial prompt look like after training on only the last token for a long time? Which dimensions of the embeddings are causing the improvements? Graphing these might provide some insight into the calculations the transformer is doing.
Do these performance gains scale to larger models, such as gpt2-medium, that can run on a consumer GPU? Could it help with distilled GPT-2, which has a major problem with looping?
*: If the transformer is performing a useful calculation with the initial prompt, is it possible to create some sort of wormhole with a token that continues doing this calculation for a few tokens then returns back, replacing the real token embedding with the calculated output?
So many questions. I feel like a huge breakthrough is around the corner.
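For anyone who wants to play with the basic idea, here's a minimal sketch of plain prompt tuning (free trainable prefix embeddings on a frozen GPT-2), not the T5-encoded variant from my experiments; the prefix length, learning rate and names are illustrative:
```python
# Prompt tuning sketch: freeze GPT-2, train only a small block of prefix
# embeddings that gets prepended to every input.
import torch
from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("gpt2")
for p in model.parameters():
    p.requires_grad = False  # the language model weights stay frozen

n_prefix, d_model = 20, model.config.n_embd
prefix = torch.nn.Parameter(torch.randn(n_prefix, d_model) * 0.02)
optimizer = torch.optim.Adam([prefix], lr=3e-4)

def train_step(input_ids, labels):
    tok_emb = model.transformer.wte(input_ids)                    # (B, T, D)
    batch_prefix = prefix.unsqueeze(0).expand(input_ids.size(0), -1, -1)
    inputs_embeds = torch.cat([batch_prefix, tok_emb], dim=1)
    # -100 labels on the prefix positions are ignored by the loss
    pad = torch.full((input_ids.size(0), n_prefix), -100, dtype=torch.long)
    out = model(inputs_embeds=inputs_embeds,
                labels=torch.cat([pad, labels], dim=1))
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return out.loss.item()
```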
>>12412 Pretty exciting stuff Anon. You encourage me. >What if a second prefix is added that compresses all the previous prefixes concatenated together? This could function like a summary of the past 32k tokens. Modules are generally incompatible but these two prefixes would be trained together. That sounds like it could turn into a major advance for the field as a whole if it comes off Anon. Godspeed.
Learning from human feedback has proven so effective that OpenAI has scrapped GPT-3 and replaced it with InstructGPT: https://openai.com/blog/instruction-following/
Highlights:
>Labelers prefer outputs from the 1.3B InstructGPT model over outputs from a 175B GPT-3 model, despite it having more than 100x fewer parameters. For comparison, GPT-2 XL is 1.5B parameters and can be finetuned the same way.
>Doubled performance in question answering. Over 200% increase in quality according to ratings from users.
>Toxicity, hallucinations and undesirable facts are now filtered from the model according to user preferences. This is a huge turning point for corporations to subdue AI wrongthink.
>Aligning the models only on customer tasks can make their performance worse on some other academic NLP tasks. OpenAI is surprised that garbage in is garbage out.
I always knew this was going to be a promising direction for research but had no idea it would become this big of a deal. All this time we could've been outperforming GPT-3 with a shitty 300M model on a fucking Raspberry Pi! I implemented RL in GPT-2 back in 2019 and had some mild success with it, but quickly ran into issues with catastrophic forgetting and stability. I tried to re-finetune the model but could never recover the better perplexity scores without spending months training, and gave up on the idea. They solved these issues by using a reward model, like they did in their learning-to-summarize-with-human-feedback paper, and combining it with the regular training loss.
The reason a reward model is so effective is that without one you only have a few feedback examples to train on, relative to an 800 GB dataset like The Pile. If you keep repeating the same examples over and over again, even alongside regular training, the model gets overtrained towards them, becomes unstable and breaks down. A reward model overcomes this by learning to judge how good any response is, and that judgment serves as a reward signal for the language model, giving it a continual fresh stream of training data.
I'm working on an open-source implementation since "Open"AI doesn't want to release their source code or models, and it doesn't seem like anyone on GitHub is working on it either.
Related papers:
https://openai.com/blog/deep-reinforcement-learning-from-human-preferences/
https://openai.com/blog/learning-to-summarize-with-human-feedback/
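For anyone who wants to experiment before my implementation is ready, the core of the method is just a reward model trained with a pairwise ranking loss. Here's a hedged sketch (my own illustration, not OpenAI's unreleased code), using the loss from their paper, -log sigmoid(r_chosen - r_rejected):
```python
# Reward model sketch: GPT-2 backbone with a scalar head, trained so the
# human-preferred response scores higher than the rejected one.
import torch
from transformers import GPT2Model

class RewardModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = GPT2Model.from_pretrained("gpt2")
        self.head = torch.nn.Linear(self.backbone.config.n_embd, 1)

    def forward(self, input_ids):
        # score each sequence by its final token's hidden state
        # (assumes no padding, i.e. full-length sequences or batch size 1)
        hidden = self.backbone(input_ids).last_hidden_state  # (B, T, D)
        return self.head(hidden[:, -1, :]).squeeze(-1)       # (B,)

def ranking_loss(model, chosen_ids, rejected_ids):
    r_chosen = model(chosen_ids)
    r_rejected = model(rejected_ids)
    return -torch.nn.functional.logsigmoid(r_chosen - r_rejected).mean()
```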
>>15289 That is an incredibly exciting development to hear, Anon!
>I'm working on an open-source implementation
Again, super exciting. If you decide to do anything with C or C++ with that, then count us in! :^) Godspeed.
>>15302 PyTorch has an undocumented transformer implementation in C++ that isn't exposed to the Python library: https://github.com/pytorch/pytorch/pull/44333
When I'm done with this I'll see if I can get GPT-2 working in C++. Most Python models can also be directly converted to TorchScript and run in C++ for about a 20% speedup on CPU: https://pytorch.org/tutorials/recipes/torchscript_inference.html
Model parameters can be pruned too, and a smaller context size used, to get models running as fast as possible on the Raspberry Pi.
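The conversion itself is only a few lines. A minimal sketch of the TorchScript route (file names are illustrative; the saved module is what you'd load from C++ with torch::jit::load):
```python
# Trace GPT-2 into a TorchScript module that libtorch can load from C++.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
# torchscript=True makes the model return tuples, which tracing requires
model = GPT2LMHeadModel.from_pretrained("gpt2", torchscript=True).eval()

example = tokenizer.encode("Hello there, Anon", return_tensors="pt")
traced = torch.jit.trace(model, example)
traced.save("gpt2_traced.pt")
```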
>>15289 >I'm working on an open-source implementation since "Open"AI doesn't want to release their source code or models and it doesn't seem like anyone on GitHub is working on it either.
If you ask me, the best way to go about this is to create something with a similar design to GPT-3 and further refine it for use in an RTOS. From there, you could begin working on the parallel computing part for task completion. That would require using an ARM Cortex-R CPU that breaks up tasks into smaller ones and sends them to a number of processor cards that use an array of ASICs. The ASICs should have instruction sets capable of solving the tasks simultaneously alongside the other cards, so that tasks are solved much more quickly than with the conventional method.
>>15345 Doing parallel processing with language models at inference time is really difficult. You can ensemble models to run in parallel, but they provide very little gain and sometimes perform even worse. In the case of splitting models into smaller tasks, most of those tasks are going to depend on previous ones finishing first. The main benefit of having a cluster of SBCs would be the additional memory, being able to route data between models of different expertise, and doing other tasks that can be parallelized, like voice recognition, speech generation, face recognition and such.
Pushing matrix multiplications to ASICs or FPGAs could greatly accelerate models, especially using an approximation like fixed-point arithmetic, but I don't see an easy way to do this with existing libraries. I could implement the forward pass of a finished model in pure C without all the bloat. However, my guess is that ASICs and FPGAs with enough logic gates to do matrix multiplication at a significant advantage over a CPU would be far too expensive to be worth the effort. If it was cost effective, the market would be flooded with AI accelerators instead of GPUs.
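To make the fixed-point idea concrete, here's a hedged numpy sketch of what a hardware multiply-accumulate array would effectively compute: quantize the float operands to int8 with a scale factor, multiply in integer arithmetic, then rescale (bit width and matrix sizes are illustrative):
```python
# Fixed-point matmul approximation: int8 quantization, integer multiply,
# float rescale. The software analogue of an ASIC/FPGA MAC array.
import numpy as np

def quantize(x, bits=8):
    scale = np.abs(x).max() / (2 ** (bits - 1) - 1)
    return np.round(x / scale).astype(np.int32), scale

def fixed_point_matmul(a, b):
    qa, sa = quantize(a)
    qb, sb = quantize(b)
    return (qa @ qb).astype(np.float32) * (sa * sb)  # rescale to float

a = np.random.randn(64, 64).astype(np.float32)
b = np.random.randn(64, 64).astype(np.float32)
print(np.abs(fixed_point_matmul(a, b) - a @ b).mean())  # approximation error
```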
>>15348 I personally don't think it would be hard for language models to be used with parallel processing.
>>15348 For example, you could have different models running in unison but coordinating with each other to produce a desirable outcome. One model that processes sound can communicate with the model that produces speech. The speech model then generates a sentence word by word depending on the context of the incoming audio. This could be done in real time using parallel computing.
>>15315 Thank you Anon! We look forward to seeing your progress in this critical area.
Open file (65.80 KB 1290x1043 unfinetuned samples.png)
>>15289 Discovered a neat trick today. Once you have a value model that can gauge how good a response is, you can generate multiple responses and choose the best attempt. When a response meets a satisfactory threshold it can stop generating and return; otherwise it keeps trying until reaching a maximum amount of time to respond. So now there's a bit of a guarantee you're getting the best response the model can produce, instead of just pulling a lever on a slot machine.
Building a good general dataset for the value model is going to be a pain in the ass, though. It's unavoidable that the preferences of labellers will shape model behavior in ways other people don't like. I'd like to create some sort of factory default people can start from to finetune their waifu and have a good first experience, maybe by asking a few questions first to seed the context with a starting personality.
Also, some improved T5 models were recently released that use half as many parameters, plus a tiny model that uses only 16M. This will be a big help with making a memory controller that runs fast.
Models: https://huggingface.co/models?arxiv=arxiv:2109.10686
Paper: https://arxiv.org/pdf/2109.10686.pdf
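The trick itself is just best-of-N sampling with an early exit. A minimal sketch (generate_fn and reward_fn are hypothetical stand-ins for the language model sampler and the trained value model; the threshold is illustrative):
```python
# Best-of-N with early exit: keep sampling until a response clears the
# threshold or attempts run out, then return the highest-scoring one.
def best_response(prompt, generate_fn, reward_fn, threshold=0.8, max_tries=5):
    best, best_score = None, float("-inf")
    for _ in range(max_tries):
        response = generate_fn(prompt)
        score = reward_fn(prompt, response)
        if score > best_score:
            best, best_score = response, score
        if best_score >= threshold:
            break  # good enough, stop pulling the slot machine lever
    return best, best_score
```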
>>15399 Thank you Anon. >This will be a big help with making a memory controller that runs fast. Perfect. We need this for inexpensive-to-build-and-to-operate robowaifus!
Open file (51.62 KB 640x480 scatter.jpg)
Open file (11.27 KB 1280x1280 88037326.png)
>>15289 Shelving this project for now to work on more important things, but I've had success with using the reward model for modeling image ratings. If anyone wants to pick it up in the meantime, I've made my code for the reward model available here: https://gitlab.com/robowaifudev/human-feedback
There's a simple PPO implementation here: https://github.com/nikhilbarhate99/PPO-PyTorch
And OpenAI explained their reward model implementation for GPT-3 here on page 8: https://arxiv.org/pdf/2203.02155.pdf
We should be able to use albert-base-v2 (only 11M parameters) and just attach the reward model straight onto its pooled output, keeping in mind its max context length is 512 tokens whereas GPT-2's is 1024: https://huggingface.co/albert-base-v2
All we need for it is a dataset. Then finetune GPT-2 with the trained reward model. And if anyone wants to help with creating the dataset, I'll see to finishing the dataset software as soon as I can so we can work on the dataset for a few months in the meantime. It's also possible to use Write with Transformer or Eleuther.ai's 6B to generate at least two responses and sort them to preference. Ideally the context and response pairs should be around 512 tokens/words together, but it's okay if the context is short or too long. It's just less efficient to train. If you're creative you can also make up your own responses.
https://transformer.huggingface.co/doc/gpt2-large
https://6b.eleuther.ai
I imagine the reward model could also be used to train the memory controller and for doing many other things, like a Monte Carlo tree search to ponder the best response possible. A lot of cool ideas to explore if we ever reach there, along with being able to respond to images and using prefix tuning to tune waifu personality.
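Attaching the head to ALBERT's pooled output is only a few lines. A hedged sketch of the idea (my illustration, not code from the repo above):
```python
# Reward head on albert-base-v2: context + response go in as one sequence,
# truncated to ALBERT's 512-token limit, and come out as a scalar score.
import torch
from transformers import AlbertModel, AlbertTokenizer

class AlbertRewardModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.albert = AlbertModel.from_pretrained("albert-base-v2")
        self.head = torch.nn.Linear(self.albert.config.hidden_size, 1)

    def forward(self, input_ids, attention_mask=None):
        pooled = self.albert(input_ids, attention_mask=attention_mask).pooler_output
        return self.head(pooled).squeeze(-1)

tokenizer = AlbertTokenizer.from_pretrained("albert-base-v2")
inputs = tokenizer("Anon: Hello!\nWaifu: Hi there, Anon!",
                   truncation=True, max_length=512, return_tensors="pt")
model = AlbertRewardModel()
score = model(inputs["input_ids"], inputs["attention_mask"])
```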
>>15789 >And if anyone wants to help with creating the dataset I'll see to finishing the dataset software as soon as I can so we can work on the dataset for a few months in the meantime. Is it possible for someone with low bandwidth to help out with the task? I'd like to help you out with it if so Anon.
>>15795 Thanks for wanting to help. Using Write with Transformer would be the easiest method, but you have to do it a bit differently. The dataset software requires running the language model locally to generate samples, and it's 700 MB. My method is to have a conversation with GPT-2, generating 2-5 responses, then respond to the best one and go to the next entry, but this might be too much of a hassle to do without the software. However, teaching models how to start a conversation is really important too. Models that haven't been finetuned get really confused by small prompts and just spit out random nonsense from pretraining.
Always start new prompts at the top of the document, since GPT-2 only reads past tokens. And always press Tab directly after a colon, not a colon and a space, because the space can lead to undefined behaviour due to the way GPT-2 tokenizes text, having never seen such token sequences in its training data. You can use any symbol to indicate the responses after a prompt; I find = easiest to use. The only thing that's important is their order, from best to worst. See the made-up example below.
And feel free to deviate from the chat log format. You can add whatever you would prefer the model to do, such as text adventures, storytelling, making LaTeX equations, etc. Multi-line responses are fine too, since I will be adding end-of-response tokens to support them. Datasets from different anons can be weighted so that people can finetune models to their specific preferences and still benefit from having a large sum of data to train on. People will be able to finetune models for others too if necessary, since it only takes a few hours.
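To illustrate the layout (a made-up example, not output from the actual software; the whitespace after the colon is a literal Tab, and the = lines are candidate responses ordered best to worst):
```
Anon:	Do you like science fiction?
=	I love it! I just finished reading a Heinlein novel, actually.
=	Yes, I like some of it.
=	science what
```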
>>15806 >Thanks for wanting to help.
Happy to help Anon. I found this page, is that right? https://transformer.huggingface.co/
>The dataset software requires running the language model locally to generate samples, and it's 700 MB.
OK, that's fine; 700 MB I can handle. It would take me a few days to download, but something like tens of GB is way too much. Please let me know in baby-steps what to do to help, and I'll try to dedicate several hours each week when I'm working.
>>15815 Yeah that's it. I just realized though you probably need to download PyTorch which is around 4 GB. I could rig up a quick and dirty C++ implementation but it would take me a week or two at least. Libtorch is 300 MB CPU-only or 1.2 GB with CUDA.
>>15816 I guess the quick and dirty CPU then?
>>15817 Sure, working on it now. I've been meaning to do it anyway to run language models on my Raspberry Pi. I'll post back in a week with an update.
>>15833 Good, I look forward to helping you Anon.
>>11924 >gpt2tc Seems like a good utility, potentially lowering some of the hardware requirements for a successful model. However, its underlying tensor library (LibNC) has its source withheld by the author. This might be a complication, depending on what strings he decides to attach to its release.
>>15837 I'm pretty rusty and wasted a lot of time this week trying to figure out a confusing bug that turned out to be a stack buffer overflow, but I hunted it down and got it fixed. I have half of GPT-2's tokenizer done, a basic tensor library, did some of the simpler model layers and have all the basic functions I need now to complete the rest. I'm hoping it'll be done by Friday. >>15838 Yeah that's a real bummer. It doesn't include a license either. Implementing GPT-2 from scratch has been a fun learning experience though. I'm looking forward to implementing other models so they can be run on an SBC or inside a game with minimal requirements.
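For anyone following along with the from-scratch approach, the core layer is small. Here's a hedged numpy reference of single-head masked self-attention in the standard GPT-2 formulation (my illustration, not the actual C++ code in progress):
```python
# Single-head causal self-attention, the heart of each GPT-2 block.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def masked_self_attention(x, w_q, w_k, w_v):
    # x: (T, D) token embeddings; w_q/w_k/w_v: (D, d_head) projections
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])            # (T, T)
    mask = np.triu(np.ones_like(scores), k=1) * -1e10  # no peeking ahead
    return softmax(scores + mask) @ v                  # (T, d_head)
```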
>>15911 >I'm pretty rusty and wasted a lot of time this week trying to figure out a confusing bug that turned out to be a stack buffer overflow, but I hunted it down and got it fixed. I have half of GPT-2's tokenizer done, a basic tensor library, did some of the simpler model layers and have all the basic functions I need now to complete the rest. That sounds awesome, actually. >I'm hoping it'll be done by Friday. I look forward to it. Anything else I could be downloading in the meantime?
>>15912 Good idea, I hadn't even made a model file format for it yet. The model is ready for download now (640 MB): https://mega.nz/file/ymhWxCLA#rAQCRy1ouJZSsMBEPbFTq9AJOIrmJtm45nQfUZMIh5g Might take a few mins to decompress since I compressed the hell out of it with xz.
>>15924 I have it, thanks.
>>15989 I got pretty burnt out from memory debugging and took a break from this but I'm gonna take another run at it this week. I made some advances in the meantime with training the full context size of GPT-2 medium on a 6 GB GPU by using a new optimizer and have most of the human feedback training code implemented in the new training method. So I'm revved up again to get this working.
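As one illustration of the kind of memory savings involved (not necessarily the exact optimizer I used): Adafactor, which keeps far smaller optimizer state than Adam, combined with gradient checkpointing:
```python
# One memory-saving recipe (illustrative): Adafactor + gradient
# checkpointing to fit GPT-2 medium's full 1024-token context in 6 GB.
from transformers import GPT2LMHeadModel
from transformers.optimization import Adafactor

model = GPT2LMHeadModel.from_pretrained("gpt2-medium")
model.gradient_checkpointing_enable()  # recompute activations in backward
optimizer = Adafactor(model.parameters(), scale_parameter=True,
                      relative_step=True, warmup_init=True, lr=None)
```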
>>16090 >I got pretty burnt out from memory debugging and took a break from this but I'm gonna take another run at it this week. nprb, I can hardly imagine. >I made some advances in the meantime with training the full context size of GPT-2 medium on a 6 GB GPU by using a new optimizer and have most of the human feedback training code implemented in the new training method. So I'm revved up again to get this working. That sounds amazing actually. Looking forward to helping.
