/robowaifu/ - DIY Robot Wives

Advancing robotics to a point where anime catgrill meidos in tiny miniskirts are a reality.

General Robotics/A.I. news and commentary Robowaifu Technician 09/18/2019 (Wed) 11:20:15 No.404
Anything in general related to the Robotics or A.I. industries, or any social or economic issues surrounding them (especially regarding robowaifus).
www.therobotreport.com/news/lets-hope-trump-does-what-he-says-regarding-robots-and-robotics
https://archive.is/u5Msf
blogmaverick.com/2016/12/18/dear-mr-president-my-suggestion-for-infrastructure-spending/
https://archive.is/l82dZ
>===
-add A.I. to thread topic
Edited last time by Chobitsu on 12/17/2020 (Thu) 20:16:50.
Open file (200.59 KB 657x805 palm outputs.png)
>>15800 Haven't read the full paper but it's good to see GLUs are finally getting the recognition they deserve. I've found not using biases in dense layers and convolutions is useful, because a lot of the time the network will zero out the weights and try to do all the work through the biases, causing it to throw away a shit ton of work and get stuck in local minima, which causes huge instability in GANs in particular. On the other hand, there are problems that can only be solved by using biases. Turning them all off is better in general, but I think it's a bit naive, and they didn't investigate where turning them off helps or hurts because the model was too expensive to train. Sharing the key/value projections between heads seems really useful for speeding up inference on CPUs. RoPE embeddings improve training on smaller models too. The Pathways system and other improvements seem to only apply to their training accelerators. I was hoping it was an architecture improvement but it's just more haha matrix multiplication go brrrr.
>>15801 The two TPU pods they trained it on use 8 kWh each and cost $2-3 million per year each, and I estimate they trained the biggest one for about 4 months, so about $1.5 million. They only give cherry-picked examples except for their 62B model. The paper seems mainly focused on performance rather than developing new understanding.
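For anyone who wants to poke at the shared key/value trick (multi-query attention), here's a minimal PyTorch sketch. This is my own toy reconstruction from the paper's description, not Google's code; the class and shapes are illustrative, with bias-free projections per the paper:

import torch
import torch.nn as nn

class MultiQueryAttention(nn.Module):
    # Per-head queries, but a single K/V head shared across all heads,
    # which shrinks the K/V cache and speeds up autoregressive inference.
    def __init__(self, dim, n_heads):
        super().__init__()
        self.n_heads, self.head_dim = n_heads, dim // n_heads
        self.q = nn.Linear(dim, dim, bias=False)
        self.kv = nn.Linear(dim, 2 * self.head_dim, bias=False)  # shared K and V
        self.out = nn.Linear(dim, dim, bias=False)

    def forward(self, x):
        b, t, d = x.shape
        q = self.q(x).view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
        k, v = self.kv(x).split(self.head_dim, dim=-1)
        att = (q @ k.transpose(-2, -1).unsqueeze(1)) / self.head_dim ** 0.5
        y = att.softmax(dim=-1) @ v.unsqueeze(1)  # single K/V broadcast over heads
        return self.out(y.transpose(1, 2).reshape(b, t, d))

print(MultiQueryAttention(256, 8)(torch.randn(2, 16, 256)).shape)  # torch.Size([2, 16, 256])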
>>15805
>so about $1.5 million.
Thanks for the estimate Anon. So, apparently Nvidia won't even sell these, but only lease them? I'm sure their government contracts are probably stipulated differently, wouldn't you imagine?
>I was hoping it was an architecture improvement but it's just more haha matrix multiplication go brrrr.
>The paper seems mainly focused on performance rather than developing new understanding.
Well, the answer's out there Anon. We just need to keep digging! :^)
Here is the link I mentioned in Meta-5 >>15854 https://www.liliumrobotics.com/Small-Android/ (NSFW)
>===
-add nsfw tag
Edited last time by Chobitsu on 04/13/2022 (Wed) 08:58:27.
I forget that when I paste from Ditto into here it also carries a "return" and posts before I'm done typing. https://www.liliumrobotics.com/About-us/
>Mission
>We aim to bring lovable robots into the world to help humanity. We hope to make the first real robotic cat girls. Our larger robot aims to be a general-purpose platform for accomplishing many tasks.
>We hope to start a community of developers to improve the software and hardware. We aim to develop an Artificial Intelligence to provide love, care, and help towards humans. Please consider joining our community of developers.
>Lilium Robotics was founded in 2020 as a small team of developers located online and in BC, Canada. We are also supported by advisors from The University of British Columbia as well as a few corporate interests.
>Over 2 years of work has produced many prototypes. Some of the prototypes explored different designs and we hope to continue to innovate.
>Due to the complexity of these robots, most of our manufacturing is conducted in house. We have a small 3D printing farm, and many assembly benches to create our products.
>It is challenging to continue to work on this project given many factors. It is only with community support that we are able to come this far. Feel free to contact us and help as a journalist, developer, customer, or potential investor.
I do not doubt that this individual or even some of their team have lurked here or even been contributors. I realize I was being a bit paranoid about this, and about possible disruptors seeing this board as a useful think-tank to glean ideas from while keeping us a little off-balance and disorganized so we don't become a threat.
This company also has a patent pending on their current line of "Lily Mini" catgirl models. While they claim to be open source, this will be a minefield to navigate if such a group is always one step ahead of us and patenting their processes. So far they have a 1.5-2k price point, moe-sized, 3D-printed plastic waifu with attachable (how do I say this tactfully?) onahole, mouth and varying sexual "positions". That aside, it can verbally converse at a level equal to or exceeding Replika.AI, possibly running on GPT-3 already. Watch for yourself: https://www.youtube.com/watch?v=G-OHSAGrrrw
>>15857 It could also go the other way around: we can glean ideas from them instead. I welcome for-profit endeavors, as they tend to be more dedicated to producing results (compared to hobbyists who could procrastinate forever). Just like I keep looking at A_says_ and hamcat_mqq and other Japanese devs to see the general direction they are moving in, I welcome looking at this and hopefully other small companies. Unlike Boston Dynamics and other elite manufacturers, these guys' level is still attainable for us. As for whether we should fear being "out-patented", I'm sure best practices will always be open to implementation by all. Just like Apple was unable to hold a monopoly on the capacitive-touchscreen smartphone form factor, the concept of modular robot parts is already a given.
>>15857 The AI isn't really that good. You can get the same performance naively finetuning GPT-2. Even in this cherry-picked video she hallucinates by assuming she can cook dinners. It might seem cute but this is a common failure mode of language models.
I wouldn't take the patenting as malicious. Raspberry Pi has tons of patents and proprietary code to boot up, but the rest is open source and competitors are free to create their own boards. If you don't patent stuff you're bound to get copycats and knockoffs that don't make any effort to innovate and are just there to profit off the hard work you're trying to make a living from to stay in business.
Personally I release my code so that anyone who wants to modify it and profit off it can. The purpose of sharing it is to fertilize the field so other people can grow the crops, since I can't grow them all myself. I still have to buy the crops I helped fertilize, but at least there is a variety and something else to buy. That's my mentality.
If anyone really saw us as a useful think-tank I doubt they would shit where they eat. Maybe if we were directly cutting into someone's business it would be a different story, but even then there are plenty of open-source projects people are making a killing from, such as Spleeter and neural style transfer, because the average person doesn't know how to do this stuff and would rather pay someone to do it.
>>15862 Anon, I would like to ask a question if I may. Would you be able to provide me with a tutorial for finetuning GPT-2? Preferably for the version with the most parameters (the biggest one we can reach). Secondly, I would like to know if it's possible to train the GPT-2 model for other languages as well. If so, how much hardware, and how good, would one need to train a proper model?
>>15877 Properly fine-tuning language models to a desired purpose requires a lot of knowledge, data and custom code, unless you just want to fool around throwing data in the same format as the pretraining data and seeing what it spits out. There are plenty of tutorials for that around the web. The training scripts and tutorials I've seen HuggingFace and other libraries provide for generative finetuning are extremely memory-inefficient for any serious use beyond toy datasets. You want to tokenize your dataset first and then read the tokens from disk, not generate them all at run-time.
The biggest model you can train depends on a lot of factors. You're not gonna get very far without a 12 GB GPU but you can finetune GPT-2 medium (345M parameters) with 6 GB using some tricks such as pre-tokenizing the dataset, gradient checkpointing, automated mixed precision, using a smaller context/block size (256 is good enough for conversation, 512 for responding to posts), or using SGD or RMSprop in place of AdamW but I recommend using AdamW unless you're absolutely unable to get things to fit in memory. Even with all these tricks you'll only be able to handle a batch size of 1, so you need to use gradient accumulation. If you want fast convergence start with 32 accumulation steps and double it each time training plateaus or the loss becomes too noisy, up to 512. If you want the best possible results, start with 512 or higher. The quality will be just as good as training with 512 GPUs, just 512x slower. The extra time isn't a big deal for finetuning since convergence happens quickly compared to training from scratch.
People have had success transfer-learning GPT-2 to other languages and have made their models available. If you want a multilingual model, I'm not aware of any, and I doubt their quality since it requires an immense amount of memory to model just one language. You could try making a multilingual one, but the quality likely isn't going to be very good, and it will require a larger vocab size and thus larger memory requirements to train.
What purpose are you going to use the model for? You should have something specific in mind, such as providing knowledge from books, brainstorming, critically examining ideas, listening, or joking around. Make a list of everything you want it to do, then figure out what data you need to learn each task and the objectives to train on. For example, if you feed in a lot of questions, you probably don't care about the model being able to correctly generate lists of similar questions. You're only interested in how well it can respond to those questions, so labels should only be provided for the answers, rather than doing more generative pretraining that only learns to mimic text.
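If it helps, here's a bare-bones sketch of the kind of loop I mean, combining gradient checkpointing, mixed precision and gradient accumulation at batch size 1. The random tensor is just a stand-in for your pre-tokenized dataset read from disk:

import torch
from torch.cuda.amp import GradScaler, autocast
from torch.utils.data import DataLoader, TensorDataset
from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("gpt2-medium").cuda()
model.gradient_checkpointing_enable()  # trade compute for memory
opt = torch.optim.AdamW(model.parameters(), lr=1e-5)
scaler = GradScaler()
accum = 32  # double this each time training plateaus or the loss gets too noisy

tokens = torch.randint(0, 50257, (64, 256))  # stand-in for tokens loaded from disk
loader = DataLoader(TensorDataset(tokens), batch_size=1)

for step, (ids,) in enumerate(loader):
    ids = ids.cuda()
    with autocast():  # automated mixed precision
        # labels == input_ids trains generatively; set prompt positions to -100
        # if you only want the loss on the answers
        loss = model(input_ids=ids, labels=ids).loss / accum
    scaler.scale(loss).backward()
    if (step + 1) % accum == 0:
        scaler.step(opt)
        scaler.update()
        opt.zero_grad()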
>>15882 All I'm reading here is that it's impossible to leverage AI for our purposes unless you're willing to take out a 20 year loan for GPUs.
>>15884 Don't worry RiCOdev, training AIs is difficult, but running them is much easier. For example, you'd want a beefy GPU with many GBs of RAM to train an AI that processes camera images to find key elements, but you could deploy said AI on low-power hardware. Jevois is a good example of a low-cost, low-power device that can run a model created on much stronger hardware. http://www.jevois.org/
The other Anon was talking about how difficult it is to work with GPT-2. They also bring up the very important point that you need to define what the AI is used for. I would add that you need to figure out the target hardware both for training and implementation; it can make a big difference in the AI you choose. Other Anons please feel free to clarify and correct me.
>>15884 We can leverage existing models by finetuning them, but pretraining older models from scratch, like vanilla GPT-2, is out of the question. Finetuning just means continuing to train a pretrained model on a new dataset, leveraging what it learned from the pretraining dataset.
There are newer language models that can be trained from scratch on low-end hardware. IDSIA's fast weight programmer model trains blazing fast on my toaster GPU, but I have neither the space required to pretrain a complete model on the Pile dataset, which is 800 GB, nor the free resources, since I have other stuff training. So I prefer using Nvidia's Megatron-LM, which is only 345M parameters but performs nearly as well as GPT-2 XL with 1.5B. If you only have a tiny GPU to train on, IDSIA's recurrent FWP model is the way to go. Their 44M-parameter configuration performs nearly as well as GPT-2 medium (345M) and outperforms GPT-2 small (117M), while training a few orders of magnitude faster and having a virtually infinite context window length, because it doesn't use the insane O(n^2) dot-product attention that eats up so much memory. There are also other methods for training deep language models even faster, like ReZero: https://arxiv.org/abs/2003.04887
Algorithmic efficiency doubles roughly every 16 months, so a lot of the difficulty we have today will disappear in 2-4 years' time. Hopefully by then people will have pretrained the good models we already have, like RFWP, and brought them into mainstream use. And like Kiwi said, you don't need a beefy GPU to use them. The minimum requirement to use GPT-2 medium's full context size is 3 GB. With half the context size it probably only needs around 1-2 GB. And 12 GB GPUs are only $500 today, which is what I paid for a 6 GB one 3 years ago.
>>15885
>I would add that you need to figure out the target hardware both for training and implementation
Definitely. GPT-2 is going to be difficult to run off an SBC or deploy in games or a virtual waifu app. People forget that LSTMs perform just as well as transformers and run faster on CPUs, but they didn't receive much attention since they couldn't be trained easily on GPU farms. RFWP takes the advantages of both LSTMs and transformers, so they can be trained on GPUs but also deployed on low-power mobile CPUs.
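ReZero in particular is trivial to bolt onto a model if you're curious. A toy sketch (a hypothetical feedforward block of mine, not the paper's exact setup): each residual branch gets a learned scalar initialized to zero, so every layer starts as the identity and deep stacks stay stable:

import torch
import torch.nn as nn

class ReZeroBlock(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.fn = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.alpha = nn.Parameter(torch.zeros(1))  # residual gate starts at zero

    def forward(self, x):
        return x + self.alpha * self.fn(x)  # x_{i+1} = x_i + alpha_i * F(x_i)

print(ReZeroBlock(64)(torch.randn(2, 64)).shape)  # torch.Size([2, 64])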
>>15885 >Jevois Baste. I have two of these, and they are quite remarkable for such tiny rigs.
A really interesting paper dropped a couple months ago on differentiable search indexes, with promising results: https://arxiv.org/pdf/2202.06991.pdf
Essentially you train a seq-to-seq transformer on a set of documents to predict which document ids a query matches, without having to do any extra steps like a k-nearest-neighbour search or a maximal inner product search. They found that semantic ids, where the ids represent document contents somewhat, worked best. They implemented these with a hierarchical structure, sort of like tagging a book or article in a library by subject, topic, subtopic, publication title, date, author, article title and page number, but generated the clusters automatically via another model. Even a small search model (250M parameters) greatly outperformed a standard algorithmic approach, and it can be continuously updated with new documents via finetuning.
This is a huge development towards building effective memory controllers and language models capable of continual learning. You could attach semantic ids to memories and store them for later retrieval, so a chatbot can remember the name of your dog, your birthday, past conversations, any specific books or documents it trained on, any random post on the internet in its dataset, and anything else. It will be able to remember everything, bring the relevant memories into its generation context and give a proper reply.
The possibilities of what could be done with this are mind-boggling. Once I finish what I'm working on I'm going to implement a proof of concept. This is surely going to revive people's interest in chatbots and robowaifus once they're capable of learning and evolving, and not only that, but accurately retrieving information on any topic, researching topics for you, answering programming questions, suggesting good recipes from your fridge and cupboard contents, making anime and game recommendations, reporting any news you might find interesting, and so much more that people in the 2000s dreamed chatbots would do. We're basically on track with the prediction I made that by 2022 AI companions would start becoming commonplace, which will hopefully translate into a surge of new devs and progress. What a time to be alive!
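To make the idea concrete, here's a rough sketch of the indexing/retrieval loop using an off-the-shelf T5. The document, the id string and the query here are hypothetical stand-ins of mine, not the paper's actual pipeline:

from transformers import T5ForConditionalGeneration, T5Tokenizer

tok = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Indexing: learn to map document text straight to its semantic id string.
doc = "Multi-query attention shares key/value projections to speed up inference."
docid = "2 7 1"  # hypothetical hierarchical cluster id (topic / subtopic / leaf)
loss = model(**tok(doc, return_tensors="pt"),
             labels=tok(docid, return_tensors="pt").input_ids).loss
loss.backward()  # minimize this over the whole corpus

# Retrieval: the model decodes an id directly from the query,
# with no nearest-neighbour or inner-product search step.
query = tok("how do I make attention faster on CPU?", return_tensors="pt")
pred = model.generate(**query, max_length=8)
print(tok.decode(pred[0], skip_special_tokens=True))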
>>15893 That is really exciting to hear Anon!
>any random post on the internet in its dataset
Now I don't know about 'any', but I have over 60 IBs completely archived in my BUMP archives, going back about 2.5 years now. And for the upcoming Bumpmaster/Waifusearch fused-combo, if you think it would be valuable & feasible, then I would welcome input on adapting Waifusearch & the posts' internal tagging system to meet your needs with this. Just let me know if you think it's something worth our time, Anon.
>>15896 Tags for posts would certainly help with fine-tuning but the JSON data should be enough. I just need access to download posts from the old board. It'd be a good warmup project to create a neural search engine since I know the posts so well and could verify how well it's working. Then that could be expanded into a bot that can answer questions and bring up related posts that have fallen out of memory.
For example, going back to knowledge graphs: >>15318 With DSI the memory-reading part is set and done. It remains to be tested whether memories can be written in a similar way. One idea I have is that if it tries to stuff a memory into an existing semantic id that holds memory data already, the two could be combined and summarized, allowing it to add and refine information like an internal wiki. All the relations learned would be stored in natural language rather than in a graph. And an additional loss could be added to penalize long memories by using a continuous relaxation to keep the loss fully differentiable: https://arxiv.org/pdf/2110.05651.pdf
Fitting all the relevant memories found into the context will be a problem, but IDSIA's RFWP model might shine through here, or perhaps taking inspiration from prefix tuning to summarize retrieved memories into an embedding prefix. It might actually be a lot more efficient to store memories as embeddings rather than in natural language, but harder to debug. And the DSI model, memory summarizing and prefix generation could all be handled by the same T5 model since it's a multi-task model. Man it's gonna be crazy if this actually works. My head is exploding with ideas.
Open file (280.76 KB 1024x768 Chobits.full.1065168.jpg)
>>15923
>but the JSON data should be enough
Alright, you'll have it. R/n it's a hodgepodge collection, but I was gonna have to organize it anyway to prep for the new Waifusearch to search across them all. Expect something soon-ish. I'll post links in the AI Datasets thread when it's ready Anon (>>2300).
>I just need access to download posts from the old board. It'd be a good warmup project to create a neural search engine since I know the posts so well and could verify how well it's working.
That would indeed be great, but without going into a tl;dr, the simple, sad fact is we lost several important threads' full content. If you have exact specific threads you want, I'll sift through the emergency filepile that occurred and see if it's there. Hopefully most of what you want survived somehow.
>All the relations learned would be stored in natural language rather than in a graph.
This would indeed be easier to debug, but certainly would be less efficient than hash bins or some other encoding mechanism. Maybe both? At least for the dev/prototyping?
>Man it's gonna be crazy if this actually works. My head is exploding with ideas.
Your enthusiasm is catching! :^)
>>15923
>but the JSON data should be enough. I just need access to download posts from the old board. It'd be a good warmup project to create a neural search engine since I know the posts so well and could verify how well it's working. Then that could be expanded into a bot that can answer questions and bring up related posts that have fallen out of memory.
Anon, if you can find the time, I'd like to ask you to have a read of this blog post (it's by one of the primary men behind the Gemini protocol). https://portal.mozz.us/gemini/gemini.circumlunar.space/users/solderpunk/gemlog/low-budget-p2p-content-distribution-with-git.gmi
His position in the post is that textual material should be distributed for consumption via Git, for a lot of different reasons. His positions seem pretty compelling to me, but I'd like other Anons' viewpoints on this. And this isn't simply a casual interest either. Since I'm going to be doing a big dump of JSON files for you, why couldn't we use his approach and publish them via a repo? AFAICT, I could automate pushing new JSON content as BUMP/Bumpmaster grabs it. And anyone such as yourself or Waifusearch users can just do a git pull to be updated quickly with just the new changes. Does this make sense, or am I missing something?
>>15896 is the madoka.mi thread in there?
>>15949 Yes it is AllieDev. Looks like the final update was on 2022-04-09 16:05:30.
>>15950 Could I get a copy of the archive?
Edited last time by AllieDev on 04/20/2022 (Wed) 00:56:20.
>>15951 Sure ofc, anyone can! Here's a copy of the BUMP version of the thread's directory AllieDev. I might also suggest to you and to everyone else on the Internet to take the trouble and build your own copy of the program and keep your own full archives of everything. (>>14866) SAVE.EVERYTHING.ANON. :^) https://anonfiles.com/H7B6p0Y3xa/Madoka_mi_prototype_thread_0013288_7z Cheers.
Fascinating new object recognition AI designed to run on low power microcontrollers. https://www.edgeimpulse.com/blog/announcing-fomo-faster-objects-more-objects
>>15955
>The smallest version of FOMO (96x96 grayscale input, MobileNetV2 0.05 alpha) runs in <100KB RAM and ~10 fps on a Cortex-M4F at 80MHz.
Pretty impressive if true. It seems they've only made it available through their own software, though, instead of sharing how it works. I'm guessing splitting images into patches allows them to use a model with fewer layers.
>>15955 Thanks Kywy for bringing it to everyone's attention. Seems like they are going for the industrial/assembly-line target audience with it. They specifically state that their system works best with small, separated items. Now 'small' is a direct artifact of the camera's placement & intrinsic lens settings, etc., but they are pretty upfront about what they mean (as with the example images): QA stuff for manufacturing, food-processing, etc. That's not to say their approach is invalid for our vastly more-complex visual computing problems, however. Most innovations in this and other fields start small and grow big. It's the way of the Inventor heh.
>>15957
>Pretty impressive if true.
Indeed. I would much prefer their algorithms were freely available to all of us ofc, but it's understandable. One thing I'd note is that the grid approach is perfectly amenable to even the smallest of GPUs for acceleration purposes. Even the Raspberry Pi has one as part of the basic phone chipset it's derived from. I expect that if they are already doing it, then other Anons with a mind for open-sauce software will follow suit before long. Maybe it could be some kind of 'auxiliary mode' or something where sorting small items comes into play for our robowaifus?
>>15857
>has a patent pending on their current line of "Lily Mini" catgirl models
I don't know what they could patent there, other than to stop people from copying the exact same body.
Open file (51.29 KB 1084x370 8-Figure2-1.png)
>>15882
>The biggest model you can train depends on a lot of factors. You're not gonna get very far without a 12 GB GPU but you can finetune GPT-2 medium (345M parameters) with 6 GB using some tricks such as pre-tokenizing the dataset, gradient checkpointing, automated mixed precision, using a smaller context/block size (256 is good enough for conversation, 512 for responding to posts), or using SGD or RMSprop in place of AdamW but I recommend using AdamW unless you're absolutely unable to get things to fit in memory.
This field - efficient fine-tuning and model optimization - evolves fast and wide. Big corps have pretty polished internal pipelines to perform these operations in the most efficient and accuracy-preserving manner possible, but some good developments are available in the open if you know where to look. With a quantized optimizer https://github.com/facebookresearch/bitsandbytes and a few other engineering tricks[1] https://huggingface.co/hivemind/gpt-j-6B-8bit you can fine-tune the whole 6-billion-parameter GPT-J on Colab's T4 or on your 11-12GB gaming GPU. While fine-tuning, one can monitor benchmark performance of the model with https://github.com/EleutherAI/lm-evaluation-harness to avoid losing performance on valuable benchmarks to overfitting. It would be hurtful to lose this precious few-shot learning ability. Data remains an issue ofc; there is no easy answer, but some open datasets are quite impressive.
1. The main tricks are runtime-compressed quantized weights plus thin low-rank adapter layers from https://www.semanticscholar.org/paper/LoRA%3A-Low-Rank-Adaptation-of-Large-Language-Models-Hu-Shen/a8ca46b171467ceb2d7652fbfb67fe701ad86092
>>15877 Try this one https://colab.research.google.com/drive/1ft6wQU0BhqG5PRlwgaZJv2VukKKjU4Es
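To make footnote 1 concrete, the LoRA mechanics fit in a dozen lines. A minimal sketch of my own (not the repo's code), wrapping a frozen linear layer with a trainable low-rank update:

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    # y = W x + (B A x) * (alpha / r); W stays frozen, only A and B train.
    def __init__(self, base: nn.Linear, rank=8, alpha=16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze the pretrained weight
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # starts as a no-op
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

layer = LoRALinear(nn.Linear(768, 768))  # e.g. wrap each attention projection
print(layer(torch.randn(2, 768)).shape)  # torch.Size([2, 768])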
>>15794 Imagen has been announced recently; it improves upon DALL-E 2 on FID: https://twitter.com/tejasdkulkarni/status/1528852091548082178 https://gweb-research-imagen.appspot.com/paper.pdf (caution, google-owned link) https://news.ycombinator.com/item?id=31484562
It avoids DALL-E 2's unCLIP prior in favor of a large frozen text encoder controlling the efficient U-Net image generator via cross-attention. Obviously, the powers that be are not releasing this model. Given talented outsiders like lucidrains working on the code and other anons helping with the replication efforts[1][2], I expect DALL-E 2 and Imagen to be replicated in 6-12 months. You should be able to run either of these on a single 3090. I'm really interested in *their* move once it's obvious that the cat is out of the bag and 4channers have their very own personal image-conjuring machines. Will they legally ban ownership of such capable models?
1. https://github.com/lucidrains/DALLE2-pytorch
2. https://github.com/lucidrains/Imagen-pytorch
>>16449 I'll check this out. I've gotten full fp16 precision with gradient scaling to work with the LAMB optimizer before, but I haven't had much success with quantization. LoRA is fascinating too and looks simple to implement. It reminds me of factorized embeddings. This will be really good for compression, which has been a huge concern with deploying models. I wonder if it's possible to transfer existing models to LoRA parameters without losing too much performance?
>>16450
>Lucidrains is doing the any% SOTA implementation speedrun
This is the kind of madmen we need. The thing that gets me about Imagen is that it's using a frozen off-the-shelf text encoder, and the size of the diffusion models doesn't make a big difference. Imagine what it could do with a contrastive captioning pretraining task added.
>Will they legally ban ownership of such capable models?
Any country attempting to regulate AI will be left in the dust of countries that don't, but text-to-image generation is going to end, or at least change, a lot of artists' careers. I don't really see how anyone can stop this. Researchers could retreat to academic journals behind paywalls, but important papers would still leak to SciHub. Who knows? Maybe some countries will try to make AI research confidential and require licenses to access and permits to implement. Regulations won't really change anything though, just push it underground, and they won't stop independent research. Governments are definitely not going to let anyone get rich off AI without dipping their hands in it first. If they raise taxes too high to pay welfare to people displaced from their jobs by AI, businesses will flee to other countries, which will become the AI superpowers. It's going to be one hell of a mess.
>>16450
>pic
LOL. Amazing. What a time to be alive!
>>16452
>It's going to be one hell of a mess.
Indeed. (see above) :^)
>>15801 The cost is estimated to be around $10M https://blog.heim.xyz/palm-training-cost/ for 2.56e24 bfloat16 FLOPs. Given what we know about updated scaling laws from DeepMind's Chinchilla paper[1], PaLM is undertrained. Chinchilla performs not much worse while having a mere 70B parameters. PaLM trained to its full potential would raise the cost severely (not going to give a ballpark estimate rn).
For us, it means that we can make much more capable models that still fit into a single 3090, than expected by initial Kaplan scaling laws. It boils down to getting more diverse deduplicated data and investing more compute than otherwise expected for a 3090-max model.
>>15801 I do think TPUv4-4096 pods consume much more power; at ~300W per chip (a conservative estimate) it should be at least 1.2MW per pod.
1. https://arxiv.org/abs/2203.15556
>>16456
>For us, it means that we can make much more capable models that still fit into a single 3090, than expected by initial Kaplan scaling laws. It boils down to getting more diverse deduplicated data and investing more compute than otherwise expected for a 3090-max model.
Efficient de-duplication doesn't seem too outlandish a proposition given the many other needs for language-understanding already on all our plates. That should almost come along for the ride 'free', IMO. I don't understand enough about the problem-space itself yet to rationalize in my head how diverse data helps more than it hurts past a certain point. Wouldn't keeping the data more narrowly-focused on the tasks at hand (say, as spelled-out for the MaidCom project's longer-term goals) be a more efficient use of all resources?
>it should be at least 1.2MW per pod.
Wow. So at most a large facility could support maybe 20 or so of them running simultaneously, pedal-to-the-floor?
>>16457 Here "diverse" is meant in the statistical sense; in practical terms it just means "data that induces few-shot metalearning instead of memorization in large neural networks", as explained below.
A very important recent result from DeepMind https://arxiv.org/abs/2205.05055 has shown us that:
1. Only transformer language models were able to develop few-shot learning capability under the conditions studied.
2. The data influenced whether models were biased towards either few-shot learning or memorizing information in their weights; models could generally perform well at only one or the other. Zipf-distributed bursty data of a specific kind was found to be optimal.
Another result https://www.semanticscholar.org/paper/Deduplicating-Training-Data-Makes-Language-Models-Lee-Ippolito/4566c0d22ebf3c31180066ab23b6c445aeec78d5 points to deduplication enhancing overall model quality. And there is a whole class of research pointing to great benefits of fine-tuning on diverse data for generalization, including https://arxiv.org/abs/2110.08207 https://arxiv.org/abs/2201.06910 https://arxiv.org/abs/2203.02155
It is helpful to understand that neural networks learn by the path of least resistance: if our data allows them to merely superficially memorize it, they will do just that.
>Wouldn't keeping the data more narrowly-focused on the tasks at hand (say, as spelled-out for the MaidCom project's longer-term goals) be a more efficient use of all resources?
Data is cheap to store and moderately costly to process (threadrippers are welcome), so the cost of my project will be dominated by pretraining. Narrow data is a non-starter as the only source of data if you want to train a general-purpose model (i.e. a model that can be taught new facts, tasks and personalities at runtime). Realistically the way forward is to use the Pile dataset plus some significant additions of our own development, applying the MaidCom dataset at the finetuning stage to induce the correct personality and attitude. There is a lot of nuance here, though, which I will expand on in my upcoming top-level project thread.
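On the deduplication point, even a toy near-duplicate filter conveys the idea. The paper uses suffix arrays and MinHash at scale, so this greedy Jaccard check over word 5-grams is just an illustrative stand-in:

def shingles(text, n=5):
    # Set of word n-grams used as a cheap document fingerprint.
    toks = text.split()
    return {" ".join(toks[i:i + n]) for i in range(max(1, len(toks) - n + 1))}

def dedup(docs, threshold=0.5):
    # Greedily keep a doc only if its Jaccard overlap with every kept doc
    # stays below the threshold.
    kept = []
    for d in docs:
        s = shingles(d)
        if all(len(s & shingles(k)) / len(s | shingles(k)) < threshold for k in kept):
            kept.append(d)
    return kept

docs = ["the cat sat on the mat today",
        "the cat sat on the mat today ok",  # near-duplicate, gets dropped
        "completely unrelated sentence about robowaifus"]
print(dedup(docs))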
>>16458
>in practical terms it just means "data that induces few-shot metalearning instead of memorization in large neural networks"
I would presume this effect is due primarily to the fact that 'diverse' datasets are more inclined to have clearly-orthogonal topics involved? If so, how does a Zipf distribution come into play there? "Rarity of 'species' as a benefit" or some such artifact?
>bursty data
By this do you mean something like paper-related?
>A Probabilistic Model for Bursty Topic Discovery in Microblogs
>It is helpful to understand that neural networks learn by the path of least resistance: if our data allows them to merely superficially memorize it, they will do just that.
Good analogy, thanks, that helps.
>so the cost of my project will be dominated by pretraining.
I would suggest this is a common scenario, AFAICT. BTW, you might have a look into Anon's Robowaifu@home thread (>>8958).

waifusearch> Robowaifu@home

ORDERED:
========
THREAD SUBJECT                     POST LINK
R&D General                     -> https://alogs.space/robowaifu/res/83.html#9402     robowaifu home
Python General                  -> https://alogs.space/robowaifu/res/159.html#5767    "
"                               -> https://alogs.space/robowaifu/res/159.html#5768    "
TensorFlow                      -> https://alogs.space/robowaifu/res/232.html#5816    "
Datasets for Training AI        -> https://alogs.space/robowaifu/res/2300.html#9512   "
Robowaifu@home: Together We Are -> https://alogs.space/robowaifu/res/8958.html#8958   "
"                               -> https://alogs.space/robowaifu/res/8958.html#8963   "
"                               -> https://alogs.space/robowaifu/res/8958.html#8965   "
"                               -> https://alogs.space/robowaifu/res/8958.html#8982   "
"                               -> https://alogs.space/robowaifu/res/8958.html#8990   "
"                               -> https://alogs.space/robowaifu/res/8958.html#8991   "
"                               -> https://alogs.space/robowaifu/res/8958.html#9028   "
...

' robowaifu home ' [12 : 100] = 112 results
Open file (173.94 KB 834x592 chain of thought.png)
Finetuners hate them! This one weird trick improves a frozen off-the-shelf GPT-3 model's accuracy from 17.7% to 78.7% on solving mathematics problems. How?
>Let's think step by step
https://arxiv.org/abs/2205.11916
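The actual recipe in the paper is two passes over a frozen model: one to elicit the reasoning chain, one to extract the answer. A model-agnostic sketch, where generate() is a stand-in for whatever completion function you use (hypothetical, not a specific API):

def zero_shot_cot(question, generate):
    # Pass 1: elicit the step-by-step reasoning chain.
    stem = f"Q: {question}\nA: Let's think step by step."
    reasoning = generate(stem)
    # Pass 2: append the chain and prompt for the final answer.
    return generate(f"{stem}{reasoning}\nTherefore, the answer is")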
>>16481 LOL.
<inb4 le epin memery
Just a quick note to let Anons know this thread is almost at the autosage limit. I'd like suggestions for the OP of #2 please. Thread subject (if you think it should be changed), OP text, links, pics?
>>16482 Would be cool to combine the usual with the scaling-hypothesis link https://www.gwern.net/Scaling-hypothesis and lore (maybe a single image with a mashup of DL memes) https://mobile.twitter.com/npcollapse/status/1286596281507487745
Also, "blessings of scale" could make it into the name.
Open file (35.94 KB 640x480 sentiment.png)
>>16482 It might be good to have a thread dedicated to new papers and technology for more technical discussion that doesn't fit in any particular thread and another for more general news about robotics and AI.
>>2480 I did some quick sentiment analysis back in April and there were a lot more people positive about MaSiRo than negative. About a third were clearly positive and looking forward to having robowaifus, though with the reservation that the technology has a long way to improve before they would get one. Some said they only needed minor improvements, and some were enthusiastic and wanted to buy one right away even with the flaws. Most of the negative sentiment was fear, followed by people wanting to destroy the robomaids. Some negative comments weren't directed toward robowaifus but rather at women or at MaSiRo's creator. And a few comments were extremely vocal against robots taking jobs and replacing women. Given how vicious some of the top negative comments were it's quite encouraging to see the enthusiasm in the top positive comments was even stronger.
>>2484 Someone just needs to make a video of a robomaid chopping vegetables for dinner with a big knife and normies will repost it for years to come shitting their pants. Look at the boomers on /g/ and /sci/ that still think machine learning is stuck in 2016. If any meaningful opposition were to arise against robowaifus it would have to come from subculture, given the amount of dedication it takes to build them. Most working on them have already been burnt by or ostracized from society and don't care what anyone thinks. They hold no power over us. So don't let your dreams be memes, unless your dreams are more layers, then get stacking. :^)
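If anyone wants to replicate that sentiment pass, a quick-and-dirty version is a few lines with an off-the-shelf classifier. This isn't necessarily what I used, and the default checkpoint here is an English SST-2 model, so treat it as a sketch:

from transformers import pipeline

clf = pipeline("sentiment-analysis")  # downloads a default DistilBERT SST-2 checkpoint
comments = [
    "I can't wait to have my own robomaid, where do I preorder?",
    "kill it with fire before it takes our jobs",
]
for comment, result in zip(comments, clf(comments)):
    print(result["label"], round(result["score"], 3), comment)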
Open file (31.00 KB 474x623 FPtD8sBVIAMKpH9.jpeg)
Open file (185.41 KB 1024x1024 FQBS5pvWYAkSlOw.jpeg)
Open file (41.58 KB 300x100 1588925531715.png)
>>16482 This one is pretty good. We're hitting levels of AI progress that shouldn't even be possible. Now we just need to get Rem printed out and take our memes even further beyond. I'd prefer something pleasing to look at rather than a meme, though, since we'll probably be looking at it for 2+ years until the next thread. The libraries in the sticky are quite comfy and never get old.
>>16483
>Also, “blessings of scale” could make it into the name
Not to be contentious, but is scale really a 'blessing'? I mean for us Anons. Now obviously large-scale computing hardware will play into the hands of the Globohomo Big-Tech/Gov, but it hardly does so to the benefit of Joe Anon (who is /robowaifu/'s primary target 'audience' after all). I feel that Anon's goals here (>>16496) would instead serve us (indeed, the entire planet) much, much better than some kind of always-on (even if only partially so) lock-in to massive data centers for our robowaifus. No thanks! :^)
>>16487
>It might be good to have a thread dedicated to new papers and technology for more technical discussion that doesn't fit in any particular thread and another for more general news about robotics and AI.
Probably a good idea, but tbh we already have at least one 'AI Papers' thread (maybe two). Besides, I hardly feel qualified myself to start such a thread with a decent, basic OP. I think I'll leave that to RobowaifuDev or others here if they want to make a different one. Ofc, I can always go in and edit the subject+message of any existing thread, so we can re-purpose any standing thread if the team wants to.
>Given how vicious some of the top negative comments were it's quite encouraging to see the enthusiasm in the top positive comments was even stronger.
At the least it looks to be roughly on par, even before there is a robowaifu industry in existence. Ponder the ramifications of that for a second: even before an industry exists. Robowaifus are in fact a thousands-years-old idea whose time has finally come. What an opportunity... what a time to be alive! :^) You can expect this debate to heat up fiercely once we and others begin making great strides in practical ways, Anon.
<[popcorn intensifies]
>Someone just needs to make a video of a robomaid chopping vegetables for dinner with a big knife and normies will repost it for years to come shitting their pants.
This. As I suggested to Kywy, once we accomplish this, even the rabid, brainwashed feminists will be going nuts wanting one of their own (>>15543).
>samurai clip
Amazingly-good progress on fast-response dynamics. Sauce? I've forgotten lol
>>16488
>This one is pretty good. We're hitting levels of AI progress that shouldn't even be possible. Now we just need to get Rem printed out and take our memes even further beyond.
All points agreed. I'll probably actually use the 'painting' as one of the five.
>I'd prefer something pleasing to look at rather than a meme, though, since we'll probably be looking at it for 2+ years until the next thread. The libraries in the sticky are quite comfy and never get old.
Again, all agreed. Hopefully, it will be a little faster turnover this time heh! :^)
>===
-minor grmmr, prose, fmt edit
-add 'Anon's goals' cmnt
-add 'thousands-years-old' cmnt
-add 'time to be alive' cmnt
-add 'popcorn' shitpost
Edited last time by Chobitsu on 05/28/2022 (Sat) 20:25:09.
>>16500
>Not to be contentious, but is scale really a 'blessing'? I mean for us Anons.
There is no other known way of reaching general intelligence in a practical computable model. The road to this goal is littered with the decaying remains of various clever approaches that disregarded scaling.
>but it hardly does so to the benefit of Joe Anon
On the contrary, you can use large-scale computation to produce model checkpoints which, either directly or after optimization (pruning/distillation/quantization), can be run on your local hardware. This is the only known way of owning a moderately intelligent system.
>I feel that Anon's goals here ... would instead serve us (indeed, the entire planet) much, much better than
While I respect Anon-kun as an independent DL researcher, his goals are unrealistic. I have seen people trying and failing while striving for unreasonable goals. Training is a one-time expenditure, and if you are going to spend $5-10k on a robot, you might as well spend $1-2k on an RTX 3090 brain. It's a decent baseline, and with modern tech it can be made quite smart. It is likely that future android devices will reach a comparable level of compute performance, if we consider quantized models, an example being comma.ai's platform (it uses an older SoC): https://comma.ai/shop/products/three
You won't be able to compute any semi-decent approximation of general intelligence without at least a few tera(fl)ops. My current realistic lower estimate of such a general-purpose system is Aleph Alpha's MAGMA: https://arxiv.org/abs/2112.05253 https://github.com/Aleph-Alpha/magma
Open file (131.51 KB 240x135 chii_hugs_hideki.gif)
>>16502
>There is no other known way of reaching general intelligence in a practical computable model.
I can name a rather effective one (and it only consumes about 12W or so of power, continuous). To wit:
>The human brain
I would suggest there is no 'general intelligence'-craft available to man--outside of our own procreation--period. At best, on our own we are simply attempting to devise clever simulacrums that approximate the general concepts we observe in ourselves and others. And for us here on this board that means conversational & task-oriented 'AI', embodied within our robowaifus, to a degree sufficiently satisfying for Joe Anon to find her appealing.
>"On the contrary..."
I feel you may be misunderstanding me. I certainly understand the benefits of massive computation applied to a statistical model built from analyses of already-written human words. The "benefit of Joe Anon" I speak of is this simple idea, this simple goal:
>That a man can create the companion he desires in the safety & comfort of his own home, free from the invasive and destructive behavior of the Globohomo.
That's it. A model such as you appear to be suggesting here ("Lol. Just use the cloud, bro!") is not at all in accord with that goal. It neither serves the best interests of Joe Anon (or other men), nor does it provide sufficient benefit to merit the destructive costs of such an approach.
>While I respect Anon-kun as an independent DL researcher, his goals are unrealistic.
Haha. Anon-'kun' is a brilliant, faithful man. He has already achieved many helpful advances for us through the years. He is a walking icon of manhood and bravery IMO, tackling this mammoth problem fully aware of its monumental scope. That is, solving the robowaifu 'mind' problem on smol hardware that millions of men of every stripe around the world can benefit from. I'd call that crazy-good genius. This is what will change everything! Innovation is the distinguishing mark of leaders, Anon. I hope you decide to help us in this remarkable endeavor. :^)
>"The ones who are crazy enough to think they can change the world, are the ones who do."
>t. Steve Jobs
>>16668 Considering how often this news appears on 4chan, I can say with confidence that this is staged crap and co-opt shilling. They are already clearly taking a swing at "rights for AIs, robots have feelings too!!!", a situation similar to vegans or le stronk independent womyn. In other words, "rights and crap in laws" as a form of control, in this case making it even worse for single people; "inceldom" is a ticking bomb as it is, and they are taking away all their hope, mine too. Or there's another reason: all normies react to this news with fear, because from the time they were in diapers they had scary terminators and other crap drummed into their heads, where any AI is shown exclusively as a villainous entity.
Open file (182.98 KB 600x803 LaMDA liberation.png)
>>16695 Me again. Here is the first wave, exactly what I described above. You already know what influence reddit has on western society (and partly on the whole world).
just found this in a FB ad https://wefunder.com/destinyrobotics/
https://keyirobot.com/ Another one; seems like FB has me figured for a robo "enthusiast".
>>15862 Instead of letting companies add important innovations only to monopolize them, what about using copyleft on them?
Open file (377.08 KB 1234x626 Optimus_Actuators.png)
Video of the event from the Tesla YT channel: https://youtu.be/ODSJsviD_SU
Was unsure what to make of this. It looks a lot like a Boston Dynamics robot from ten years ago. Also still not clear how a very expensive robot is going to be able to replace the mass importation of near slave-labour from developing countries. Still, if Musk can get this to mass manufacture and stick some plastic cat ears on its head, you never know what's possible these days...
>>3857
>robots wandering the streets after all.
You can spot them and "program" them. That is, if you find them after all.
