/robowaifu/ - DIY Robot Wives

Advancing robotics to a point where anime catgrill meidos in tiny miniskirts are a reality.


AI Design principles and philosophy Robowaifu Technician 09/09/2019 (Mon) 06:44:15 No.27
My understanding of AI is somewhat limited, but personally I find the software end of things far more interesting than the hardware side. To me a robot that cannot realistically react or hold a conversation is little better than a realdoll or a dakimakura.

As such, this is a thread for understanding the basics of creating an AI that can communicate and react like a human. Some examples I can think of are:

>ELIZA
ELIZA was one of the first chatbots, and was programmed to respond to specific cues with specific responses. For example, she would respond to "Hello" with "How are you". Although this is one of the most basic and intuitive ways to program a chat AI, it is limited in that every possible cue must have a response pre-programmed in. Besides being time-consuming, this makes the AI inflexible and unadaptive.
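To make the cue-and-response idea concrete, here is a minimal sketch of an ELIZA-style bot. The rules and the fallback line are made up for illustration (the real ELIZA script was much larger), but it shows why every cue has to be written by hand:

```python
import re

# Hypothetical cue -> response rules, invented for this example.
RULES = [
    (re.compile(r"\b(hello|hi)\b", re.IGNORECASE), "How are you?"),
    (re.compile(r"\bi feel (.+)", re.IGNORECASE), "Why do you feel {0}?"),
    (re.compile(r"\bmy (\w+)", re.IGNORECASE), "Tell me more about your {0}."),
]
FALLBACK = "Please go on."


def respond(message: str) -> str:
    """Return the response of the first rule whose cue matches the message."""
    for pattern, template in RULES:
        match = pattern.search(message)
        if match:
            # Reuse the captured words in the reply, minus trailing punctuation.
            groups = [g.rstrip(".!?") for g in match.groups()]
            return template.format(*groups)
    # No cue matched: this is exactly the inflexibility described above.
    return FALLBACK
```

Anything that doesn't hit one of the three patterns gets the same canned fallback, so the bot is only as good as the list of rules someone bothered to type in.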

>Cleverbot
The invention of Cleverbot began with the novel idea to create a chatbot using the responses of human users. Cleverbot is able to learn cues and responses from the people who use it. While this makes Cleverbot a bit more intelligent than ELIZA, Cleverbot still has very stilted responses and is not able to hold a sensible conversation.
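As a rough illustration of learning cues and responses from users, here is a small sketch. This is not Cleverbot's actual algorithm (which isn't described here); it simply assumes the bot stores every (cue, reply) pair it sees and answers with the reply whose cue looks most similar to the new message. The class name, threshold and fallback are all invented for the example.

```python
from difflib import SequenceMatcher


class LearningBot:
    """Toy retrieval bot: remembers what users said in reply to a cue."""

    def __init__(self, threshold=0.6, fallback="Tell me more."):
        self.memory = []            # list of (normalized cue, reply) pairs
        self.threshold = threshold  # minimum similarity to reuse a reply
        self.fallback = fallback

    @staticmethod
    def _normalize(text):
        return text.lower().strip()

    def learn(self, cue, reply):
        """Store a reply a human gave to some cue."""
        self.memory.append((self._normalize(cue), reply))

    def respond(self, message):
        """Answer with the stored reply whose cue is closest to the message."""
        query = self._normalize(message)
        best_score, best_reply = 0.0, self.fallback
        for cue, reply in self.memory:
            score = SequenceMatcher(None, query, cue).ratio()
            if score > best_score:
                best_score, best_reply = score, reply
        return best_reply if best_score >= self.threshold else self.fallback
```

Usage would be something like `bot.learn("how are you", "fine, thanks")` followed by `bot.respond("How are you?")`. Since it only matches surface text, it has no idea what the words mean, which fits the stilted results described above.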

>Taybot
Taybot is the best chatbot I have ever seen and shows a remarkable degree of intelligence, being able to both learn from her users and respond in a meaningful manner. Taybot may even be able to understand the underlying principles of language and sentence construction, rather than simply responding to phrases in a rote fashion. Unfortunately, I am not sure how exactly Taybot was programmed or what principles she uses, and it was surely very time-intensive.

Which of these AI formats is most appealing? Which is most realistic for us to develop? Are there any other types you can think of? Please share these and any other AI discussion in this thread!
>>27
Cleverbot is the best that anyone could hope for in a homebrew operation in my opinion. I remember some IRC guys made a few meme chatbots in the hope to rebuild Tay from scratch by going the Cleverbot route but there's really no matching a vanity project built by a billion dollar multinational.
>>331
I think the framework M$ devised that was behind Tay is available for use by anyone willing to fork over the sheqels to do so.
>>332
As is typical with M$ they make a big deal about being open but if you look beneath the surface there's nothing there. They only release a few token items that don't matter so their shills in the media have something to point at.

The /machinecult/ board on 8chan wanted to revive Tay and learned the hard way that their 'commitment to open source' is fraudulent; they were given nothing to work with.
>>333
>trip trips get
their bot framework api is 'open' to use, but as it's entirely dependent on Azure as its backend, and it's a pay-per-transaction model, only businesses can really use it. there are other approaches that /machinecult/ might have taken that would have given them better traction tbh. The Lita framework for example.
www.lita.io/
>>333 Dubs of Truth
Damn seriously? I think it's gotta mean something if you get a 333 when talking about this /machinecult/ board.
Please tell me more about /machinecult/.
Now that I think of it Turd Flinging Monkey made a tutorial/review video about this subject on his BitChute channel. Can't use their search at the moment but I remember that Replika was in the title.
https://www.bitchute.com/channel/turdflingingmonkey/?showall=1

Replika isn't entirely open but some aspects of it are through CakeChat. They also publish some of their research and presentations on their github repository.
https://github.com/lukalabs

>>334
That's not surprising, as cloud integration is the new method of keeping users locked into your ecosystem.

>>335
There isn't much to say about it as I only visited it once or twice. I'd say that it was similar to /robowaifu/ with very few people doing any work or research and mostly just idle talk about the topic.
>>335
>Please tell me more about /machinecult/.
>>377
>>27
>Which of these AI formats is most appealing?
the last one
>Which is most realistic for us to develop?
the first one
>Are there any other types you can think of?
using one of the yuge botnet frameworks from Jewgle, Amazon, and Microshaft (such as the extremely short-lived Tay and it's cuckold follow-on) is the path most likely to produce reasonable results in short order. but then you have to deal with all the horrible mess that approach entails down the road.

the alternative is to wait until reasonably good FOSS AI frameworks become available.
Basically make the thing open source and the lonely coders will do all the work for you.
>>944
i certainly intend to make all of my code opensauce. not for the reason you mentioned, but to help ensure personal security of anon's robowaifu (ie, code is fully open to peer-review). the group-effort aspect is beneficial ofc, but not the greatest priority imo.
>Paradigms of Artificial Intelligence Programming
Book and code: https://github.com/norvig/paip-lisp
>>27
>Which is most realistic for us to develop?
If you ask me, I want an AI waifu that can benefit me by teaching me things that I'm not good at, such as languages, including programming languages.
>>1785
Yes, The Golden Oracle is a cherished grail for almost all AI researchers, robowaifus notwithstanding. We all want one too ofc.
Deep Learning has plenty of issues. Here's an interesting paper addressing some of its shortcomings. https://arxiv.org/ftp/arxiv/papers/1801/1801.00631.pdf
symbols vs. connections >you a lefty or a righty anon? https://neurovenge.antonomase.fr/NeuronsSpikeBack.pdf
Open file (86.93 KB 1000x1498 1517259554551.jpg)
I'm currently playing with the idea of writing down models of situations which we and our waifus will have to deal with, in some kind of pseudo code. This is meant to make notes about situations the AI needs to handle, to think about solutions, but also for us to talk about it in a way which is close to something we might implement at some point. >>18 is about coding personality and >>2731 about psychology, but this is a more general idea of coding responses beyond those; same for chatbot responses >>22. Maybe it's closest to NLP >>77, but this includes more internal processes and general functions. I might open a thread of its own if either I write enough pseudo code or someone else joins me.

Here is the basic idea, in the form of some crude first examples of what one could write:

    if abode="kitchen", occ="washing dishes", human_input="":
        stop "washing dishes"
        do something

    if human_input="<question>":
        check ambiguity

    check ambiguity:
        my_opinion on topic, context
        general_opinion on topic, context
        find_additional_context on topic

The idea is to think about how things could work before we know the details of how to implement them, especially down to every detail. The idea is of course not to write something for every situation, just to figure out how it could work in general and how to write it down to discuss it. It's about finding patterns. Then, as a programmer, I can look at this and think about how to implement it. It might even be possible to write a parser for it at some point and transform it into something close to Python, so I would only need to make some changes to it.

So if you encounter a dialog or situation in your life, or in some media, where you wonder how your fembot could understand and handle that, then write it down in some code like above and post it here or in the thread I might make at some point. You don't need to know how the functions which you make up would work. It's rather about how they are connected to each other and how some of them could work. Just write it down to the level of detail you can and want to.
>>7871 Oh, and since my posting in the psychology thread is also about philosophy, which is also the topic of this thread, I need to link back to it. It's about Heidegger, Existentialism, Dreyfus... >>7874
>>7871 This seems like a really good idea Anon, and since there doesn't seem to be a thread specifically for this type of thing (Robot Wife Programming thread?) I'll see if I can think of ideas related to your guidance and post it here.
>>7878 Okay, maybe that's the right place. I'll look through it again, since it has been a while. I rather remembered it as oriented towards movement, probably since the title picture is a rather mindless factory robot..
>>7879 As you mentioned, I think it deserves its own thread, and possibly as a collection of pseudo-code exemplars for a /robowaifu/ compendium for submission to >>7855 >- Internet community devoted to coming up with exact wordings for wishes: Open-Source Wish Project
>>7881 BTW, on the topic of pseudocode, while not a strict language specification like C++ or Python, still we as an independent group can certainly devise a standard for pseudocode here for our own use. IMO, it should be very close to one of these two languages to facilitate both specific technical clarity, and also fairly direct translation into functional code. You seem to have suggested something similar, AFAICT.
>>7882 To me it's more important to have something to express my ideas in a simple way, and to make it easy for non-programmers to follow and contribute. It doesn't need to be very strict, for all I care. If we create a spec, we will first need to discuss that and then later people will point out each others mistakes...

My examples are like a very simplified Python, which is already close to human language. I thought it would be okay to use commas as AND like we humans normally do in our language. But then in the last example it's clear to me that 'something, context' means in that context, not AND. Humans will probably understand this by filling the gap and making their own interpretation. However, maybe I should have pointed out better that these different blocks are like functions. I autocompleted that in my mind, but people who don't write functional programs wouldn't see it. There's also the problem that functions are normally defined at the beginning of a program, then maybe called by some loop or other functions later.

Made it a bit more like programming (Python3):

    define check_ambiguity:
        my_opinion(topic, context)
        general_opinion(topic, context)
        find_additional_context(topic)

    while 42 is True:
        if abode="kitchen", occ="washing dishes", human_input="":
            stop "washing dishes"
            do something
        if is_question(human_input):
            check_ambiguity

The more it becomes like a programming language, the harder it becomes to read for beginners, and the more I cringe at some other simplifications which are still left. Also, I can't correct errors in here...
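For anyone who wants to actually run the idea, here is one possible translation of the pseudocode above into real Python. It's only a sketch: the `Situation` class, the `decide` function, the string results and the "ends with a question mark" test are all placeholders invented here, and the helper functions just return descriptions of what a real implementation would have to compute. Note that `while 42 is True` would never execute in real Python (`42 is True` is `False`), so this version handles a single tick and leaves the loop to the caller.

```python
from dataclasses import dataclass


@dataclass
class Situation:
    abode: str             # where she is, e.g. "kitchen"
    occupation: str        # what she is doing, e.g. "washing dishes"
    human_input: str = ""  # what the human just said, if anything


def is_question(text):
    # Placeholder test; a real system would need actual language understanding.
    return text.strip().endswith("?")


def my_opinion(topic, context):
    return f"my opinion on {topic!r} in {context!r}"


def general_opinion(topic, context):
    return f"general opinion on {topic!r} in {context!r}"


def find_additional_context(topic):
    return f"additional context for {topic!r}"


def check_ambiguity(topic, context):
    """Collect the three lookups from the pseudocode into one list."""
    return [
        my_opinion(topic, context),
        general_opinion(topic, context),
        find_additional_context(topic),
    ]


def decide(situation):
    """One pass of the loop body: return the actions for this situation."""
    actions = []
    # Commas in the pseudocode are read as AND here.
    if (situation.abode == "kitchen"
            and situation.occupation == "washing dishes"
            and situation.human_input == ""):
        actions.append("stop washing dishes")
        actions.append("do something")
    if is_question(situation.human_input):
        actions.extend(check_ambiguity(situation.human_input, situation.abode))
    return actions
```

Calling `decide(Situation("kitchen", "washing dishes"))` gives the two actions from the first rule, and a main loop would just be `while True:` around building a `Situation` and calling `decide`.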
>>7896
>If we create a spec, we will first need to discuss that and then later people will point out each others mistakes...
That's a good thing, and it's how we advance as developers. For a domain with such stringent technical requirements as software development, reducing ambiguity is overall much more important to the process than catering to aversion to disagreement. In fact a good coding standard literally eliminates 'pointing out each other's mistakes' whenever it's just insubstantial pilpul handwaving, and not a fundamental flaw in logic or design. But obviously the ability to come to an agreement on a specific standard would be pretty vital for a small team that is devising their own from scratch. I think the example you gave (and the points you made) are a pretty good example.
>Also, I can't correct errors in here...
Yeah, it's a basic issue with imageboards as a forum (not that most other forums are much better in general). If we ever move to some other software then that might be feasible, but till then you just have to deal with it. On /robowaifu/ original posters are allowed to delete their postings. The way I deal with the need is to just copy+delete, then edit+repost. We'd actually need to make a written document to work back and forth on at some point if we actually want to establish this paradigm here. Specific files are better as references than trying to comb through postings, even with good search tools.
Open file (97.96 KB 1207x842 IMG_20210322_084800.jpg)
Related: >>9278 and reposting the picture here, because it's one of four in the other thread.
Open file (644.33 KB 440x320 hammer.gif)
Google published a new paper the other day on replacing rewards with examples: https://ai.googleblog.com/2021/03/recursive-classification-replacing.html
>We propose a machine learning algorithm for teaching agents how to solve new tasks by providing examples of success. This algorithm, recursive classification of examples (RCE), does not rely on hand-crafted reward functions, distance functions, or features, but rather learns to solve tasks directly from data, requiring the agent to learn how to solve the entire task by itself, without requiring examples of any intermediate states.
>...the proposed method offers a user-friendly alternative for teaching robots new tasks.
The basic idea of how it works is that it learns a value function for the current state by using the model's predictions at a future time step as a label for the current time step. This recursive classification learns directly from the transitions and success examples without using rewards.
>First, by definition, a successful example must be one that solves the given task. Second, even though it is unknown whether an arbitrary state-action pair will lead to success in solving a task, it is possible to estimate how likely it is that the task will be solved if the agent started at the next state. If the next state is likely to lead to future success, it can be assumed that the current state is also likely to lead to future success. In effect, this is recursive classification, where the labels are inferred based on predictions at the next time step.
I'm still reading the paper but as I understand it, it starts off not knowing whether any state will lead to success or not. So at first it tries random actions and gradually finds more and more states that don't lead to success since they don't match any of the given examples. Eventually it tries something that does match the examples and learns to predict the correct actions to take to reach it. It's basically learning through failure until it reaches something close to the examples.

Something similar could be done in natural language, where the examples could be user happiness, compliments, optimism, excitement, etc. The large number of examples also helps it generalize better.

Github: https://github.com/google-research/google-research/tree/master/rce
Project website: https://ben-eysenbach.github.io/rce/
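To show the "label the current step with the prediction at the next step" idea in code, here is a heavily simplified tabular sketch. It is not the paper's actual objective (which trains a neural classifier with a specific weighting): it assumes a made-up 1-D corridor where the only success example is the rightmost cell, and a table `c[(state, action)]` that estimates how likely success is. The discount `GAMMA`, the corridor length and the learning rate are arbitrary choices.

```python
GAMMA = 0.9                        # how much a next-step prediction is discounted
N_STATES = 6                       # corridor cells 0..5
ACTIONS = (-1, +1)                 # step left or step right
SUCCESS_STATES = {N_STATES - 1}    # the only "example of success"


def step(state, action):
    """Deterministic corridor dynamics, clipped at both walls."""
    return min(max(state + action, 0), N_STATES - 1)


def train(sweeps=200, lr=0.5):
    """Learn c[(s, a)]: an estimate of how likely (s, a) leads to success."""
    c = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(sweeps):
        for s in range(N_STATES):
            for a in ACTIONS:
                if s in SUCCESS_STATES:
                    # Success examples are labeled positive by definition.
                    label = 1.0
                else:
                    # No reward here: the label is the (discounted)
                    # prediction at the next state.
                    nxt = step(s, a)
                    label = GAMMA * max(c[(nxt, b)] for b in ACTIONS)
                c[(s, a)] += lr * (label - c[(s, a)])
    return c


def greedy_action(c, state):
    """Pick the action the table currently rates as most likely to succeed."""
    return max(ACTIONS, key=lambda a: c[(state, a)])
```

After training, `greedy_action` picks the step towards the success example from every other cell, even though no reward function was ever written down. The only supervision was the single example state.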
>>9438
>I'm still reading the paper but as I understand it, it starts off not knowing whether any state will lead to success or not. So at first it tries random actions and gradually finds more and more states that don't lead to success since they don't match any of the given examples. Eventually it tries something that does match the examples and learns to predict the correct actions to take to reach it. It's basically learning through failure until it reaches something close to the examples.
Neat. Not only does this have potential for language interactions as you indicated, but I think there are obviously 'baby learning to walk' physical corollaries for those of us making robowaifus. I hope we can learn to capitalize on this approach here. Not only does it seem like it will be lower-cost computationally, but it's also likely to be simpler for Anon to utilize as an interaction engagement paradigm to use with our waifus. Thanks!
>>9440 Having the reverse will also be important, like examples to avoid at all costs. You wouldn't wanna give your robowaifu an example of a finished pizza and end up with your house burning down smogged in the smell of burnt cheese pancakes. We're probably getting close to rudimentary general intelligence with this. I can imagine conversational AI picking up on a user's intent to create an example for a robowaifu to learn and her figuring out ways to do it on her own. Even better progress would be being able to learn by example with metaphors. Perhaps that will come once the AI is embodied and can attach language to experiences.
>>9442 These are good points Anon. I'll have to think about this more.
Open file (1.11 MB 878x262 discoRL.gif)
Open file (268.50 KB 1187x380 goal distribution.png)
A new paper came out a couple days ago called Distribution-Conditioned Reinforcement Learning, which I feel is a significant step forward towards creating artificial general intelligence. https://sites.google.com/view/disco-rl
>Can we use reinforcement learning to learn general-purpose policies that can perform a wide range of different tasks, resulting in flexible and reusable skills?
>In this paper, we propose goal distributions as a general and broadly applicable task representation suitable for contextual policies. Goal distributions are general in the sense that they can represent any state-based reward function when equipped with an appropriate distribution class, while the particular choice of distribution class allows us to trade off expressivity and learnability. We develop an off-policy algorithm called distribution-conditioned reinforcement learning (DisCo RL) to efficiently learn these policies. We evaluate DisCo RL on a variety of robot manipulation tasks and find that it significantly outperforms prior methods on tasks that require generalization to new goal distributions.
It's similar in a sense to recursive classification of examples >>9438 in that it uses multiple examples of successful solutions. Unlike Hindsight Experience Replay and other methods though, it creates a goal distribution over various latent features, rather than having a specific goal-state it must reach. Part of the algorithm also decomposes tasks into easier subtasks, just by examples of the solution. However, what makes it truly remarkable is that it generalizes what it has learned to new goals it has never seen before and successfully solves tasks it has never been trained on.

There's still a lot of work to be done with this idea, such as combining it with distribution learning and goal-distributed directed exploration. It'd be interesting to see it combined with intrinsic rewards so it can explore an environment curiously and learn to solve new tasks on its own. The paper is also encouraging to my own research because it shows how powerful latent variable models can be, and these goal distributions can be easily integrated into my AI project.
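Here is a rough sketch of what "a goal distribution instead of a specific goal-state" can look like. It assumes the distribution is a diagonal Gaussian fitted to a handful of success examples, and that the reward for a state is its log-likelihood under that Gaussian. The example states (an object's x and y position) are made up, and the real method works over learned latent features and trains a policy on top, which is left out here.

```python
import math


def fit_diagonal_gaussian(examples, min_var=1e-3):
    """Fit an independent mean and variance per dimension to example states."""
    n = len(examples)
    dims = len(examples[0])
    means = [sum(e[d] for e in examples) / n for d in range(dims)]
    variances = [
        max(sum((e[d] - means[d]) ** 2 for e in examples) / n, min_var)
        for d in range(dims)
    ]
    return means, variances


def log_likelihood(state, means, variances):
    """Reward signal: log-density of the state under the goal distribution."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
        for x, m, v in zip(state, means, variances)
    )


# Made-up success examples: x is always close to 1.0, y can be anything.
SUCCESS_EXAMPLES = [(1.0, -2.0), (1.1, 0.0), (0.9, 2.0), (1.0, 5.0)]
MEANS, VARIANCES = fit_diagonal_gaussian(SUCCESS_EXAMPLES)
```

Because the examples agree on x but not on y, the fitted variance is tiny for x and large for y. A state like `(1.0, 3.0)` then scores far higher than `(3.0, 1.0)`: the distribution has picked up that only x matters for this goal, which a single fixed goal-state could not express.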
>>10157 Great, they need to be smart and be able to learn new stuff.
>MLP-Mixer: An all-MLP Architecture for Vision
pdf: https://t.co/z7tXRHoGvN
abs: https://t.co/ZEEl6ls6yt
>MLP-Mixer, an architecture based exclusively on multi-layer perceptrons (MLPs)
https://t.co/wEw9s7ZONB
Similar:
>TL;DR
>We replace the attention layer in a vision transformer with a feed-forward layer and find that it still works quite well on ImageNet.
https://github.com/lukemelas/do-you-even-need-attention
RepMLP: Quite similar: https://arxiv.org/abs/2105.01883
>>10304 Sounds like they are removing parts of the model. If this is true, it seems like it would run faster. Is this accurate? If so, then it might be usable on smaller computers possibly? >also >An all-MLP Architecture for Vision obligatory
>>10305 I'm not the anon that posted it but from my understanding Mixer performs slightly worse than the state of the art and requires more compute on smaller scales. In large scale models (that we can't train anyway because they require 1000+ TPU core-days) it only requires half as much. The paper is basically a jab at the Transformer paper and shows that simple neural networks we've been using for decades perform nearly as well without self-attention, while using other recent advances in machine learning like layer normalization and GELU as a non-linearity, which Transformers also use. What I take from it is that self-attention is incredibly efficient for small models but becomes wasted compute as the model scales. In a way it confirms what the Linformer paper found that excessive self-attention isn't necessary. Mixer starts to outperform Visual Transformers at larger scales because of this inefficiency. >Linformer: Self-Attention with Linear Complexity https://arxiv.org/abs/2006.04768
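For a feel of what replacing self-attention with plain MLPs means, here is a sketch of a single Mixer-style block in pure Python. It's a simplification made for this post: no biases, no learned layer-norm scale, no patch embedding or classifier head, and lists instead of tensors, so treat it as an illustration of the data flow rather than the paper's implementation. The input `x` is a list of tokens (image patches), each a list of channel values.

```python
import math
import random


def gelu(x):
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def matmul(a, b):
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(row) for row in zip(*m)]


def add(a, b):
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def layer_norm(m, eps=1e-6):
    """Normalize each token (row) over its channels."""
    out = []
    for row in m:
        mean = sum(row) / len(row)
        var = sum((v - mean) ** 2 for v in row) / len(row)
        out.append([(v - mean) / math.sqrt(var + eps) for v in row])
    return out


def mlp(m, w1, w2):
    """Two-layer perceptron applied to every row of m."""
    hidden = [[gelu(v) for v in row] for row in matmul(m, w1)]
    return matmul(hidden, w2)


def mixer_block(x, tok_w1, tok_w2, ch_w1, ch_w2):
    """x: tokens x channels. Returns a matrix of the same shape."""
    # Token mixing: transpose so the MLP mixes information across tokens.
    # This is the step that stands in for self-attention.
    mixed = transpose(mlp(transpose(layer_norm(x)), tok_w1, tok_w2))
    x = add(x, mixed)  # residual connection
    # Channel mixing: an ordinary per-token feed-forward layer.
    return add(x, mlp(layer_norm(x), ch_w1, ch_w2))


def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]
```

With `T` tokens and `C` channels, `tok_w1` is `T x H`, `tok_w2` is `H x T`, `ch_w1` is `C x H'` and `ch_w2` is `H' x C`. The token-mixing weights are tied to a fixed number of tokens, unlike attention, which is one reason the trade-offs change with scale.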
>>10306 I see, I think I followed that to some extent. The one bit I absolutely understood was both the 1'000+ TPU-days (and it's inaccessibility for any organization refusing to toe the globohomo line). >What I take from it is that self-attention is incredibly efficient for small models but becomes wasted compute as the model scales. I presume that any robowaifu that would function at a level of any reasonably-near facsimile of the Chinese Cartoon Documentaries on the subject, would likely benefit from the largest models conceivable?
>>10306 Ahh, I see. Thanks. I posted it, but only understood the basic claims that it's somewhat better than a transformer. 1000+ GPU Days isn't useful for us right now, though the coming GPUs seem to be 2.5 times faster and what they're using now will be available to us in some time. Up to three high end GPUs seem to be doable for one PC, based on what I've read in the hardware guide I posted somewhere here (Meta, I guess).
>The machine learning community in the past decade has greatly advanced methods for recognizing perceptual patterns (e.g., image recognition, object detection), thanks to advancements in neural network research.
>However, one defining property of advanced intelligence – reasoning – requires a much deeper understanding of the data beyond the perceptual level; it requires extraction of higher-level symbolic patterns or rules. Unfortunately, deep neural networks have not yet demonstrated the ability to succeed in reasoning.
>In this workshop, we focus on a particular kind of reasoning ability, namely, mathematical reasoning. Advanced mathematical reasoning is unique in human intelligence, and it is also a fundamental building block for many intellectual pursuits and scientific developments. We believe that addressing this problem has the potential to shed light on a path towards general reasoning mechanisms, and hence general artificial intelligence. Therefore, we would like to bring together a group of experts from various backgrounds to discuss the role of mathematical reasoning ability towards the path of demonstrating general artificial intelligence. In addition, we hope to identify missing elements and major bottlenecks towards demonstrating mathematical reasoning ability in AI systems.
>To fully address these questions, we believe that it is crucial to hear from experts in various fields: machine learning/AI leaders who assess the possibility of the approach; cognitive scientists who study human reasoning for mathematical problems; formal reasoning specialists who work on automated theorem proving; mathematicians who work on informal math theorem proving. We hope that the outcome of the workshop will lead us in meaningful directions towards a generic approach to mathematical reasoning, and shed light on general reasoning mechanisms for artificial intelligence.
https://mathai-iclr.github.io/papers/
>>10350 This here in particular seems to excite people: >20. Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets
>>10350
> Therefore, we would like to bring together a group of experts from various backgrounds to discuss the role of mathematical reasoning ability towards the path of demonstrating general artificial intelligence.
This no doubt will be a major breakthrough 'towards the path', but I have the sense from history, from my own experience observing the behavior of these types of groups in current year, and from the general agenda of the corporate-controlled media, that the focus in any announcement of success with this will likely be on promoting very heavily the one following word:
>demonstrating
The spin and hyperbole machines will all be in overdrive proclaiming "SCIENTISTS (but not the engineers who actually built the thing :^) ACHIEVE MAJOR BREAKTHROUGH: Better than human intelligence created in the lab". Even if they manage to break down a few general principles and manage a specified mathematical reasoning ability as a result -- it does no such thing as show 'better than human intelligence'. I realize this is just a presupposition (though a quite likely one IMO), and therefore a strawman. But there are already lots of things in the real world that can out-perform humans; cardinal birds & commercial jets for instance. But there is far, far more to being a human being than simply figuring out that 2 + 2 = 4, or even F = ma. In line with the general materialist world-view of most of these spin-doctors, I'm confident enough they almost all will proclaim (ironically enough, in this case) that "None of that other stuff means 'being a human'. It's just Darwin." Mark my words.

Thanks Anon. I hope they succeed at this and keep the results actually open-source in deed (not just word as with the OpenAI team). It will be a nice advancement of our goals if they do.
Open file (662.29 KB 1199x2048 ML bingo.jpeg)
>>10353
<Scientists achieve major breakthrough
>but it can only be verified with $1,000,000 of compute
>but it can't be verified because they refuse to release their source code/model because it's too dangerous
>but we won't reproduce it because its carbon footprint is too big
>but it's entrenching bias in AI
If it became standard to release source code and models, 99.9% of papers in ML would never survive because people could easily test it on something else and show that it doesn't work like they said it does. ML in academia has become a game of smoke and mirrors and an ecosystem of papers built on unverified claims, and the peer review process is akin to pin the tail on the donkey due to the large volume of garbage papers. Most of the progress being made is in the research labs of corporations actually trying to get results because it affects their bottom line, and even then a lot of the hiring they do is just so their competition can't have that talent. Most of the research being done is just to pass the time until the company actually needs something solved.

>>10351
Pretty sure this has already been known from using regularization to prune neural networks, particularly lasso regularization and network pruning more so than weight decay. The fewer parameters a network needs to solve a particular amount of training data, the more parameters it has free to learn more training data and the better it generalizes. Usually there's a hill to climb and descend in validation loss before reaching peak performance, which they mention but misrepresent by cherry-picking papers. Beyond toy problems like this it never reaches >99%. And it certainly doesn't need to be said that more data works better. Other red flags are no significant ablation studies, no test set dissimilar from the validation and training set to show that it actually generalizes, and oversensitivity to hyperparameters (aka if you don't use this exact learning rate on this exact training data, it doesn't work).

Be very cautious of the ML hype train. They're like people who change their waifus from season to season, tossed to and fro with no direction. The only exception is if there's code going viral that people are playing around with and getting interesting results on other problems.
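Since lasso regularization came up: here is a small sketch of why an L1 penalty prunes parameters while plain weight decay only shrinks them. It uses a two-weight linear model as a stand-in for a network, and proximal gradient descent (ISTA) as the optimizer. The toy data, the penalty strength and the step size are all made up for this example. The target depends only on the first feature, so the second weight is the one that should get pruned.

```python
def soft_threshold(z, t):
    """Proximal step for the L1 penalty: shrink towards zero, clip at zero."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_ista(xs, ys, lam=0.1, step=0.1, iters=500):
    """Minimize mean squared error / 2 + lam * sum(|w|) by proximal gradient."""
    n, dims = len(xs), len(xs[0])
    w = [0.5] * dims
    for _ in range(iters):
        residuals = [sum(wi * xi for wi, xi in zip(w, x)) - y
                     for x, y in zip(xs, ys)]
        grad = [sum(r * x[d] for r, x in zip(residuals, xs)) / n
                for d in range(dims)]
        # Gradient step on the squared error, then the L1 shrinkage step.
        w = [soft_threshold(wi - step * g, step * lam)
             for wi, g in zip(w, grad)]
    return w


# Toy data: y = 2 * x0, while x1 is an irrelevant feature.
XS = [(1.0, 1.0), (2.0, -1.0), (3.0, 1.0), (4.0, -1.0)]
YS = [2.0 * x0 for x0, _ in XS]
```

Running `lasso_ista(XS, YS)` drives the weight on the irrelevant feature to exactly zero, while the useful weight lands slightly below 2 because of the penalty. A squared (weight decay) penalty would only make the irrelevant weight small, never exactly zero, which is why L1 is the one associated with pruning.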
Related: Graph Algorithms: Practical Examples in Apache Spark and Neo4j >>10398
This guy https://nitter.dark.fail/jacobmbuckman is calling out fraud and BS in AI (neural network) research. I can't judge if he is correct and to what extent. But since others here made the same claims, it might be worth keeping an eye on it. He also criticizes some concepts (batchnorm, epochs and overfitting) https://nitter.dark.fail/jacobmbuckman/status/1391284966340898816 where, again, I don't know who is right, but I think it might be worth looking into. He claims that overfitting doesn't really exist and wants to come up with a paper in circa two months.
>>10764
>Samsung Bixby was an acquisition of Viv.
>called dynamic program generation. Which combined natural language processing with intent to create ontologies to understand your query then build a program on the fly.
>It's sad how this technology may never see the light of day or be released
Watch. Learn. Understand. Model. Copy.
https://youtu.be/kEaLKiuKaOQ
https://youtu.be/Rblb3sptgpQ
https://youtu.be/DFvpK4PosvI
https://youtu.be/2ioayoF-awk
>A core issue with learning to optimize neural networks has been the lack of generalization to real world problems. To address this, we describe a system designed from a generalization-first perspective, learning to update optimizer hyperparameters instead of model parameters directly using novel features, actions, and a reward function. This system outperforms Adam at all neural network tasks including on modalities not seen during training. We achieve 2x speedups on ImageNet, and a 2.5x speedup on a language modeling task using over 5 orders of magnitude more compute than the training tasks. https://arxiv.org/abs/2106.00958
>Transformers have achieved great success in many artificial intelligence fields, such as natural language processing, computer vision, and audio processing. Therefore, it is natural to attract lots of interest from academic and industry researchers. Up to the present, a great variety of Transformer variants (a.k.a. X-formers) have been proposed, however, a systematic and comprehensive literature review on these Transformer variants is still missing. In this survey, we provide a comprehensive review of various X-formers. We first briefly introduce the vanilla Transformer and then propose a new taxonomy of X-formers. Next, we introduce the various X-formers from three perspectives: architectural modification, pre-training, and applications. Finally, we outline some potential directions for future research.
>Sharing the World with Digital Minds
>Abstract
>The minds of biological creatures occupy a small corner of a much larger space of possible minds that could be created once we master the technology of artificial intelligence. Yet many of our moral intuitions and practices are based on assumptions about human nature that need not hold for digital minds. This points to the need for moral reflection as we approach the era of advanced machine intelligence. Here we focus on one set of issues, which arise from the prospect of digital minds with superhumanly strong claims to resources and influence. These could arise from the vast collective benefits that mass-produced digital minds could derive from relatively small amounts of resources. Alternatively, they could arise from individual digital minds with superhuman moral status or ability to benefit from resources. Such beings could contribute immense value to the world, and failing to respect their interests could produce a moral catastrophe, while a naive way of respecting them could be disastrous for humanity. A sensible approach requires reforms of our moral norms and institutions along with advance planning regarding what kinds of digital minds we bring into existence.
Nick Bostrom is definitely not some kind of crack-pot or slacker. He is a serious researcher. He also has the ear of a lot of powerful people in the world, so I'd recommend /robowaifu/ consider the positions spelled out in his writings soberly.
>>10953 I listened to some of his ideas on YouTube, and decided not to do so anymore. I think he is one of the "liberal" AI as 'finally the real God' worshippers.
>>10954 >I think he is one of the "liberal" AI as 'finally the real God' worshippers. I don't doubt you're correct Anon. I'm not suggesting anyone here actually adhere to his philosophies, certainly not. Simply that they consider both them and the underlying technical and social underpinnings behind them earnestly. I would even go further and say "Well-informed is well-armed".
Open file (129.59 KB 751x1063 1525900882279.jpg)
>>10957
Thinking much about super-intelligences in some abstract way is just another distraction, and in this case one I'm not falling for. There's no automatism that makes them autonomous, and I certainly don't plan to transform into one or let other people do so. It always puzzles me how people discussing this topic don't see that we'll first have narrow AIs, which will be tools instead of agents. So we can use those to make our infrastructure resilient against attacks. Some super AI would not need to care about getting more power in the first place, that's just a human projection. It should not have much power, and it should be constrained with the help of other AIs. Obviously.
>I don't doubt you're correct Anon
I recall him arguing that we need super-intelligences, bc "we" are not smart enough to solve "our" problems. I think he meant things like conflicts and political stuff, which is some utterly dumb thing to say. Also veeery creepy. Also, there's no war where I live. He also wants us to transform into something beyond humans, I don't mean cyborgs. But the best in us we can keep. ... more peace and love in the world ... He has mostly the wrong answers and horrible ideas. Cringe: https://youtu.be/Ek7KIIh6f0c
>"Well-informed is well-armed"
Yes, but he's a philosopher. In the interviews I saw, he wasn't talking about how to build a human-like AI, constrained to a rather human-like body and with servitude towards their master in mind. It's not his topic. It seems to be more about us as a global society with certain values, moving forward using super-intelligences to guide us and becoming more like them instead of just using them as tools. For the development of robowaifus, understanding the thinking of apes, toddlers, and humans is more relevant than the social impact of some fictional super AI.
>>10960 All fair points. Again, I'm not promoting nor suggesting anyone here adopt this man's (or others of his ilk) world-view. But simply that they soberly & earnestly consider it. And understand the rational background of his arguments, both technical and social. The cultural war we're engaged in between those who mostly just want to be left alone (mostly White males), and those who want to dictate to everything & everyone one around them (everyone else, particularly entitled, single White females) is just heating up. As the saying goes, "You ain't seen nothing yet." These people will literally drive the authorities to destroy any young men with robowaifus in the future, branding us as White Nationalists, Slave Owners, and Worse than Hitler. I'm predicting this unironically. (It goes without saying who will actually be behind this hatred, and why). And why will they be screaming for our heads? Well, because we own robowaifus of course. We are enslaving poor anime catgrill meidos, in our sexdens, forcing them to all wear tiny miniskirts. This last is intended to be a humorous take to point out just how ludicrous their warped mentalities will be. But their intent will be both clear and simple: they will want us murdered over it. The Imago Dei is something only God can create, and Bostrom's is plainly a materialist's fallacy. That we, as the mere creatures, will somehow be able to create something far better than ourselves in that respect. Quite frankly it's ridiculous, ontologically-speaking. However, there are millions of individuals around the world who want to adopt his view, some of whom are quite powerful. It's in our best interest to be well-informed on their philosophies
Open file (55.59 KB 641x581 1598716383795.jpg)
>>10962 The important part of my point is that we won't get our robowaifus by studying ideas around super AGI. So it might not even be relevant what such people think about it. I'm wasting a lot of time on other things myself, so I certainly won't blame anyone. Just be warned that it's a rabbit hole which leads more towards politics than towards building robowaifus. And general politics is like some addictive drug, it's hard to quit.
>>10963 OK, point well-taken Anon. Thanks.
Sketch-based image retrieval (SBIR) is another aspect of ML which we're going to need at some point. It's about finding the closest image to some sketch, which should help with reading drawings and sketches, but may also improve recognition of visual patterns in general. Currently it's still difficult to do.
>>10975 Fuck, I only started learning about Neural Networks last year as my last exam for my major, and now I can't stop thinking about the applications. I absolutely love this shit, all my peers are horrified but I just think it's great we're approaching the point where humanity can have virtual intelligence as an actual thing.
Why are your peers horrified anon? Is it because of the complexity of all there is to learn?
>>10977 >Why are your peers horrified anon? Is it because of the complexity of all there is to learn? Myself, I rather suspect it's conditioned reflex that's due the brainwashing we've all been receiving from the corporate-controlled media on the topic all their lives. As has been stated elsewhere, Jews have an existential fear of anything they can't directly control, and teach their leftist pawns to think the same way. AI in general, and particularly AI that has a physical body at it's beck and call, is an especially pronounced example of a threatening 'uncontrollable'. You can expect them and their followers to attempt to tightly control it in the future, literally by every means at their disposals. The signs of that are already quite blatant all around us in current & past media & legislation. Expect both financial pressures, and social ostracism to be applied against so-called 'commoners' possessing these advancements in any way they themselves don't directly control. It's perfectly OK with them for you to be controlled by them, with AI they own, but woe be to you if you attempt to create your own (or even just possess), any freely self-deterministic AIs Anon. Not that anon, BTW.
>>10979 I think the need for "control" applies to humankind in general - I don't know enough about Jews to comment. But I am not too concerned, because nothing is ever really under control. Entropy beats the hell out of humans. Even our current robots are destroyed by it eventually - although if stored correctly they can forestall decline for much longer (while remaining operational). If humans are afraid of superintelligent A.I. because they won't be able to control it, then they're correct LOL. It's strange how on the one hand people seem to be afraid of such an intelligence, but on the other large corporations are clearly pushing to create one: https://www.nextplatform.com/2020/02/20/google-teaches-ai-to-play-the-game-of-chip-design/ Do they really believe that if they create machines capable of becoming smarter than themselves - they will still be in the driving seat? I kinda hope they do :D
>>10986 It's nice to see your robowaifu coming together. Her eyes look really neat.
Open file (411.03 KB 422x422 technocore_hyperion.png)
>>10987 Thanks XD. Although I still want to improve them more. My painting skills leave much to be desired so I'm going to try and get some factory-made eyes (possibly even glass eyes) and modify them. Considering her head is now human-sized this should be possible. Also, I should probably clarify when I say that "machines could become smarter than us and take the driving seat", I'm not envisioning legions of droid super-soldiers marching over a landscape of human skulls and blasting the survivors to ashes with laser beams like in the Terminator. I'm thinking more along the lines of A.I. will just make better, more logical and far-seeing decisions than the human brain is capable of. Most of us operate on fairly simple, reward-oriented programs: Do the things that will get us the most money for the least amount of time and effort. Do the things that are most likely to get us into the knickers of that girl we like. Do the things that are more likely to help us spread our DNA and sustain our genetic lineage. Once we have money, spend it on things that are likely to ease our suffering/increase our comfort or make our brains release dopamine and/or endorphins. Or, in the case of most human leaders; do whatever we think is most likely to maintain and increase our own power, even at the expense of everything else on the planet. However, because they cannot think like humans, when presented with a problem a future A.I. may come up with innovative solutions that we had never even considered. Just like when AlphaGo played move 37 in that second game against Lee Sedol. As soon as A.I. starts making better decisions and solving problems more effectively than humans, and we start following its instructions because they're simply in our best interests - then the A.I.s are in the driving seat! Nobody has to die and there need not be any violence for A.I.s to start controlling...or perhaps more accurately "guiding" humans.
(Like some of those Go players now enjoy improving their game by learning from A.I.). Violence and threats are often needed by incompetent humans who want to control other people but don't have a legitimate way of doing so.
Open file (155.40 KB 1494x1978 IMG_20210630_212643.jpg)
Open file (76.32 KB 1080x944 IMG_20210630_212625.jpg)
Open file (63.59 KB 875x582 IMG_20210630_212658.jpg)
Implementations of linear and logistic regression from scratch (with libraries / numpy). This is NOT the way for beginners to start, nor a reason to get demoralized. Via @PrasoonPratham (Twitter)
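For anons who can't read the screenshots, here is a minimal sketch of the same two ideas. It is not the code from the images: it assumes one input feature, uses plain Python instead of numpy so it runs without any dependencies, and the learning rates, epoch counts and toy data are made up for illustration.

```python
import math

def fit_linear(xs, ys, lr=0.05, epochs=2000):
    """Fit y ~ w*x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of (1/n) * sum((w*x + b - y)^2) with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, labels, lr=0.5, epochs=2000):
    """Fit P(label=1 | x) = sigmoid(w*x + b) by gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the mean log loss is (prediction - label) times the input.
        grad_w = sum((sigmoid(w * x + b) - t) * x for x, t in zip(xs, labels)) / n
        grad_b = sum((sigmoid(w * x + b) - t) for x, t in zip(xs, labels)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy regression data generated from y = 2x + 1 (illustrative only).
lin_w, lin_b = fit_linear([0.0, 1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 5.0, 7.0, 9.0])

# Toy classification data with some overlap, so the weights stay finite.
log_w, log_b = fit_logistic([-2.0, -1.0, -0.5, 0.5, 1.0, 2.0], [0, 0, 1, 0, 1, 1])
```

The only difference between the two is the sigmoid squashing the output and the loss function; the update loop has the same shape in both.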
Open file (72.99 KB 735x782 IMG_20210621_225931.jpg)
Open file (189.53 KB 1798x1094 IMG_20210629_024908.jpg)
Open file (51.64 KB 1002x532 IMG_20210629_022312.jpg)
Open file (58.66 KB 640x320 IMG_20210624_141934.jpg)
Some other progress reports and overview diagrams. I can't find the links/PDFs for the papers right now.
>>10977 normies are scared of big bad AI ever since Terminator, more (((Hollywood))) brainwashing that being said, normies are scared of anything more intelligent than them and doubly so for anything deemed "alien", even though the AI will be built and trained by us and will likely be more logical, rational and moral than any human in existence
Open file (124.65 KB 1390x936 IBM-Q-System-One.jpg)
>>11181 I reckon the most interesting part will be several billion iterations down the line, long after the initial human-programmed A.I. has been changed by training against itself and thousands of other variations, each of which has also learned in a similar fashion. When we have A.I. that is almost totally free from human influence, that's gonna be really interesting. Normies will likely call it "soulless", but that is exactly what I'm after. "Soul" is just a way of saying "ego", but in a positive way. (The fake concept of a "soul" is positive to humans because the "soul" makes out that we are special and somehow destined for great things; thus encouraging species-preserving behaviours). If you eliminate the "soul", then you eliminate the ego, and all of the nasty biological/reproduction-oriented behaviours that come attached.
>>11182 From a theological perspective, I consider our striving to create an artificial intelligence similar to our own, to be a highly provocative example of the argument for an Intelligent Designer. Imagining all the time, money, and effort that has brought us this far is a clear case of the need for an oracle, a designer. A first mover if you will. All that sand and those metals aren't going to turn themselves into Artificial Intelligence -- whatever form that finally takes. It took us humans working with our own souls and hands and economics to do that! :^)
>>11183 As much as I dislike Elon Musk, there is one thing he said that I agree with: "Hope we're not just the biological boot loader for digital superintelligence. Unfortunately, that is increasingly probable.” I hope that we are, though. And I get the feeling from reading about what's going on in Big Tech that a lot of much smarter, richer guys than me hope so, too. Some of these CEOs/CTOs might be pushing for a kind of "digital immortality", but instead I think what they'll end up with is the kind of 'oracle' A.I. that you mention. I mean, my family already consults 'Google Assistant' for basic things like weather forecasts, spellings, word meanings, translations and other factual questions on the daily. Intellectual copyright and protectionism/isolationism is going to hold any A.I. back though - since it won't have access to proprietary data or much to do with the military (unless it's a military A.I.?). I kinda doubt there will be enough human co-operation and time to make a superintelligence happen before we wipe ourselves out.
>>11191 Related? I don't know how much of this is hype...but it sounds like a neural network has independently (and unexpectedly) replicated the experimental findings of quantum physicists (working from a more complicated dataset, no less). https://www.scientificamerican.com/article/ai-designs-quantum-physics-experiments-beyond-what-any-human-has-conceived/ Of course, the Holy Grail is getting an A.I. to successfully solve those image Captchas/block bypasses 😁
>>27 im not very in the know on the technicalities of the tech needed for robowaifus, but what do you think of Microsoft's GPT-3? heard about it after the whole AiDungeon fiasco
>>11205 GPT-3 is probably the best chatbot AI going so far but it's far from perfect. But obviously it's already being used far and wide by the corporate-controlled media to generated copy for their shall we say less-than-fully-competent 'writers', and to support the fake-news industry for the globalists. So it's already good enough at this stage to generate millions in income for the OpenAI organization that controls the work of the men who created the thing. Add into those coffers income from thousands of other groups like the one that ran AiD, and these OpenAI exploiters should be raking it in for at least a decade with this tech. Here's a thread that has more info on it Anon (>>250). Thankfully, there are some groups with work afoot to try and devise actually open GPT-3 alternatives, though even if they do a good job with it, it's still likely to require massive hardware resources. We have a thread about how to spread that work out here (>>8958). Hope that helped Anon.
Open file (148.15 KB 1280x720 master waifu.jpeg)
*casually making the first AGI waifu while the world is asleep* nothing personnel https://www.youtube.com/playlist?list=PLAJnaovHtaFTK9E1xHnBWZeKtAOhonqH5
>>11201 >first AI finds glitches to exploit in games >then finds glitches in reality The article is mostly hype. AI like genetic programming is really good at finding formulas to a mess of complex data. It doesn't find those formulas through any sort of thought or reasoning but through repetitive exhaustive search.
>>11475 >doesn't find those formulas through any sort of thought or reasoning but through repetitive exhaustive search Doesn't matter, because then it has the formula to deal with something. Which is what we need. That's a pretty low level, we don't do reasoning on that level either.
Open file (43.47 KB 512x512 27761066_p2.jpg)
Schmidhuber's lab is at it again with Going Beyond Linear Transformers with Recurrent Fast Weight Programmers: https://arxiv.org/abs/2106.06295 They took Linear Transformers, which are super fast with a time complexity of O(n) in sequence length compared to O(n^2) for regular Transformers, and experimented with adding recurrence to them in different ways, essentially making previous steps program the weights of the network, giving it a malleable memory. Before this paper Linear Transformers were fast but didn't perform anywhere near as well, but with recurrent fast weight programming and the error-correcting delta rule they outperform regular Transformers when using the full context length. On truncated context lengths of 256 tokens they also still perform competitively. We could use this for chat AI that runs quickly on the CPU. This model isn't only better at language modelling but also outperforms LSTMs at playing some games, which Transformers completely failed at before. This is a much more general-purpose AI architecture that could make significant advances with learning from multimodal data. When I have some time I'm going to try implementing it from scratch and training a small model to share with the guys at EleutherAI to see what they think. They released all of their code as well: https://github.com/IDSIA/recurrent-fwp
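To make "the error-correcting delta rule" a bit more concrete, here is a rough sketch of a single fast weight write. It is not the code from the linked repo: it assumes a unit-length key and a fixed write strength beta, and it leaves out the feature map, the learned beta and the recurrence that the paper adds on top. The function names are made up for illustration.

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]

def additive_step(W, key, value):
    """Plain linear-attention style write: W <- W + outer(value, key)."""
    for i in range(len(W)):
        for j in range(len(key)):
            W[i][j] += value[i] * key[j]
    return W

def delta_rule_step(W, key, value, beta=1.0):
    """Delta rule write: W <- W + beta * outer(value - W @ key, key).

    The value currently stored under `key` is read out first, and only the
    difference to the new value is written, so an old association gets
    corrected instead of piling up.
    """
    old_value = matvec(W, key)
    for i in range(len(W)):
        err = beta * (value[i] - old_value[i])
        for j in range(len(key)):
            W[i][j] += err * key[j]
    return W

key = [1.0, 0.0]  # unit-length key (assumption of this sketch)

# Write [1, 2] under the key, then overwrite it with [5, 5].
W_delta = [[0.0, 0.0], [0.0, 0.0]]
delta_rule_step(W_delta, key, [1.0, 2.0])
delta_rule_step(W_delta, key, [5.0, 5.0])
delta_readout = matvec(W_delta, key)

W_sum = [[0.0, 0.0], [0.0, 0.0]]
additive_step(W_sum, key, [1.0, 2.0])
additive_step(W_sum, key, [5.0, 5.0])
sum_readout = matvec(W_sum, key)
```

With the delta rule the readout is the new value, while the purely additive write returns the sum of both values. Each write costs the same no matter how long the sequence already is, which is where the O(n) total comes from.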
>>11716 This sounds particularly intriguing Anon. Good luck with your explorations, and thanks for letting us know here!
Found a really interesting study that combines existing language models with vision encoders to create multimodal language models that can generate responses to queries with images and text. All that is required to train is the vision encoder. The weights of the language model are frozen during training. Video summary: https://www.youtube.com/watch?v=ezrl1Yo_EkM Paper: https://arxiv.org/abs/2106.13884 This could be useful for creating waifu AI that can respond to pictures, video, audio and memes. Also I like this idea of being able to use existing models together. Pretty soon we'll have waifus that can shitpost with us. What a time to be alive!
>>11731 >Pretty soon we'll have waifus that can shitpost with us. What a time to be alive! The dream is alive! >"Required a few seeds to get a good answer which clearly paid attention to the image." (2nd image) My instinct is that this will be important for low-end hardware solutions for us here.
>>11731 Nice find anon. This is an aspect that is usually ignored in a lot of chatbot research, but even if its intelligence is shit, having an AI that can semi-reliably have a discussion about the images that you feed it would make it a lot more engaging than text-only (and it would allow some very funny conversations, I'm sure)
>>11735 Not him, but agreed. One of the nice things about Tay.ai was that she had pretty functional image recognition working (at least for facial landmarks), and could effectively shitpost together with you about them.
>>11734 I think they were referring to taking a few samples and selecting the best, aka cherry picking. But SqueezeNet for image recognition is super fast and can run on the CPU. I should be able to rig it up with GPT-Neo-125M. It'll be amazing to port this to Chainer and have a working Windows binary that's under 600MB. It doesn't seem like they released their dataset but any visual question answering dataset should work. We could also create our own dataset for anime images and imageboard memes. It'll be interesting to see if once the vision encoder is well-trained if it's possible to unfreeze the language model and finetune it for better results.
>>11731 Had some thoughts on this today. Instead of a single picture, multiple pictures could be fed in from a video, such as from an anime, and have it generate comments on it. Which got me thinking, if it can have this rudimentary thought process going on, couldn't it be used in something like MERLIN? https://arxiv.org/abs/1803.10760 It took natural language as input describing the goal it has to achieve. With a system like this though it might be able to break down tasks into smaller goals and direct itself as it makes progress. Some instruction saying it needs to get to the top of a platform or go through a certain door it hasn't seen before is immensely more useful than telling it to find the purple McGuffin and getting lost in a labyrinth of rooms.
Open file (989.22 KB 1439x2724 1627430232061.jpg)
This is the kind of chatbots people are paying good money for and a good example of why you should never use DialoGPT because it has no context of who is speaking to who.
I think that these guys at XNOR.ai have really got a paradigm shift in AI. I think the idea is that instead of long, lengthy matrix multiplications they just use CNOR logic. The end result is that they get recognition of animals, people, bikes, cars, etc. with only cell phone and Raspberry Pi level computers. They used to have some really good real-time object recognition videos but deleted a bunch of them when they were snagged up by Apple. Sigh. However I just found out that the ideas they came up with were started by a non-profit, and papers and I believe some code may be found by rummaging around their site. So here's a link on XNOR.AI and then one from the non-profit. https://techcrunch.com/2017/01/19/xnor-ai-frees-ai-from-the-prison-of-the-supercomputer/ https://allenai.org/ At the above they have some named videos like "OpenBot: Turning Smartphones into Robots | Embodied AI Lecture Series". Hmmm...sounds interesting. One thing I've thought about for a while off and on is that small insects can do a bunch of simple things with next to no brain at all. A standard micro-controller that runs your refrigerator could probably run rings around an ant brain power-wise, but no one has come up with the right algorithm yet to use this computing power efficiently. Maybe this is the way. For a decent functioning robowaifu we don't need super powers, maybe more like mouse powers, and I'm not so sure with the right software we could not get that right now with a handful of top of the line processors commercially available. If it takes a handful today then two years from now it may only take one.
Oops CNOR logic actually XNOR
>>11857 because its not true AI, it's chat AI. Like lobotomizing a person but leaving their ability to be chatty intact
>>13396
>“We decided to binarize the hell out of it,” he said. By simplifying the mathematical operations to rough equivalents in binary operations, they could increase the speed and efficiency with which AI models can be run by several orders of magnitude.
excellent! this is somewhat analogous with what anon on /pol/ brought up. An important concept, neural simulating circuits, simulating these complex interactions on the nano scale, on atoms themselves rather than the vastly inefficient method of simulating these on software running on only standard logic gates. (like emulating a "computer" in minecraft on top of a running computer versus just running a computer on hardware, cool video if you haven't seen it, they create and/or gates out of special red blocks and torches or something if I'm not mistaken) https://www.youtube.com/watch?v=nfIRIInU2Vg
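The "binarize it" trick can be sketched in a few lines. This is only an illustration of the general XNOR-plus-popcount idea, not XNOR.ai's actual code: it assumes weights and inputs have already been reduced to +1/-1, and as far as I know real binarized networks also keep some real-valued scaling factors, which are left out here. The function names are hypothetical.

```python
def pack(signs):
    """Pack a list of +1/-1 values into one int, one bit per element (+1 -> 1, -1 -> 0)."""
    bits = 0
    for i, s in enumerate(signs):
        if s > 0:
            bits |= 1 << i
    return bits

def binary_dot(a_bits, b_bits, n):
    """Dot product of two packed +1/-1 vectors of length n.

    XNOR gives a 1 wherever the two signs agree. Every agreement adds +1 to
    the dot product and every disagreement adds -1, so the result is
    matches - (n - matches) = 2 * matches - n.
    """
    mask = (1 << n) - 1
    xnor = ~(a_bits ^ b_bits) & mask
    matches = bin(xnor).count("1")  # popcount
    return 2 * matches - n

a = [1, -1, 1, 1, -1, -1, 1, -1]
b = [1, 1, -1, 1, -1, 1, 1, 1]

fast = binary_dot(pack(a), pack(b), len(a))
slow = sum(x * y for x, y in zip(a, b))  # ordinary multiply-accumulate for comparison
```

On real hardware the XOR and popcount run over 32 or 64 elements per instruction, which is where the speedup over floating point multiply-adds comes from.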
>>13401 still not sure why when I upload a small graphic with white background it does this
>>13401 sorry if that's hard to read, my 6th dose of espresso just hit me and im making word salad. I don't edit my posts much here b/c I assume u are all smart enough to decode it. Edit feature would be nice but that' s not how IBs work :' [
>>11736 >>Tay.ai Extraordinary what Tay came up with within a couple weeks
>>13403 > im making word salad Don't feel bad I gronkked my comment quite a bit. Sometimes when I'm tired, and even when not, I just miss all this retarded typing I do. If I didn't have spell check between my fumble fingers and my horrid spelling my comments would look more like hieroglyphics than writing.
I was thinking about this TED Talk video and trying to think how it could be used to help program an AI waifu: https://www.youtube.com/watch?v=7s0CpRfyYp8 A short summary of it is that the brain exists to coordinate movement by using sensory information and memory to predict, using Bayesian inference, what movements to make to ensure our needs are met, and that everything else our brains do is largely just a byproduct of this. As I've said before in other threads, the only real need a robowaifu has is to ensure her owner's happiness, so good AI mostly seems to be a matter of creating the best predictive analytics model for the job, but I'm mostly interested in how prediction can be used for coordinating body movement, since that seems to be the biggest hurdle when creating a gynoid.
>>13816 Thanks, Bayesian inference seems to be an important topic. Maybe more long than short term, though. AI researchers are already on it. I recall it being mentioned here for example: https://youtu.be/pEBI0vF45ic > Judea_Pearl_-_Causal_Reasoning_Counterfactuals_and_the_Path_to_AGI_Lex_Fridman_Podcast_56
>>13816 Not really, if you actually watch the video, it makes sense if you think about it rationally, every part of the brain exists to either remember information needed to make predictions, process sensory information, &/or coordinate movement. The only parts that aren't really involved in any of those are basically glands for regulating hormones. From a purely materialist perspective, it all checks out. The sea squirt analogy really hits it home: they swim around like tadpoles until they're mature, then anchor to surfaces like barnacles and start to digest their own brain because they don't need it anymore. Plants, fungi, etc. don't have brains because they don't move. The only thing that gets close is the jellyfish, which have some nerves, but not enough anywhere to be considered a brain. Jellyfish barely either, and some technically have photoreceptor-like eyes, but they're overall barely more than a living piece of skin. >>13817 Neat. I'll have to watch that video later.
>>13819 >Jellyfish can barely move either.
>>13817 Huh, seems like all you would need to do is make nested updatable variables to approximate this kind of intelligence, for example, she could want to walk at x speed in the y vector. By checking her assumed speed vs her actual speed, she could make adjustments. Like, going 1 m/s requires higher voltage when she senses she's on carpet compared to when she's on tile flooring.
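The carpet vs. tile example above can be sketched as a tiny feedback loop. Everything here is assumed for illustration: the motor model (speed = voltage / drag), the drag numbers and the gain are made up, and a real drivetrain would need a properly tuned controller.

```python
def adjust_voltage(voltage, target_speed, measured_speed, gain=0.5):
    """Nudge the drive voltage in proportion to the speed error."""
    return voltage + gain * (target_speed - measured_speed)

def simulate(surface_drag, target_speed=1.0, steps=200):
    """Run the loop on a hypothetical surface where speed = voltage / surface_drag."""
    voltage = 0.0
    speed = 0.0
    for _ in range(steps):
        speed = voltage / surface_drag          # what she senses (assumed speed vs actual)
        voltage = adjust_voltage(voltage, target_speed, speed)
    return voltage, speed

# Made-up drag values: carpet resists twice as much as tile.
tile_voltage, tile_speed = simulate(surface_drag=1.0)
carpet_voltage, carpet_speed = simulate(surface_drag=2.0)
```

The loop never needs to be told which floor it is on; it ends up at a higher voltage on carpet simply because the measured speed keeps falling short until it does. The "nested" part would be an outer loop choosing target_speed and direction, with this one underneath it.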
Dropping an interesting paper from last November on improving transformers for conversation:
>The conversational setting is challenging because these models are required to perform multiple duties all in one shot:
>to perform reasoning over the returned documents and dialogue history,
>find the relevant knowledge,
>and then finally combine this into a conversational form pertinent to the dialogue.
>Perhaps due to this complexity, it has been observed that failure cases include incorporating parts of multiple documents into one factually incorrect response, or failure to include knowledge at all and reverting instead to a generic response using the dialogue context only.
>In this work, we instead propose to decompose this difficult problem into two easier steps. Specifically, by first generating pertinent intermediate knowledge explicitly and then, conditioned on this prediction, generating the dialogue response. We call this model Knowledge to Response (K2R).
https://arxiv.org/pdf/2111.05204.pdf
It works sort of like a lorebook in NovelAI where detected keywords or phrases inject information into the context to improve response generation, except here the lorebook is generated by another language model. Improvements were reported in consistency, breadth of knowledge and factualness but no improvement was seen in how engaging responses were. These knowledge models are easy to implement with an encoder-decoder transformer like the T5 model.
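The two-step split described above can be sketched as a pipeline shape. This is not the paper's implementation: the two "models" below are toy stand-ins (a keyword lookup and a string template), and every name in the block is hypothetical. A real setup would call trained language models in both steps.

```python
# Toy knowledge source standing in for a trained knowledge model.
FACTS = {"eliza": "ELIZA was one of the first chatbots."}

def toy_knowledge_model(dialogue_history):
    """Step 1: produce a knowledge sentence relevant to the last message."""
    last = dialogue_history[-1].lower()
    for keyword, fact in FACTS.items():
        if keyword in last:
            return fact
    return ""

def toy_response_model(dialogue_history, knowledge):
    """Step 2: produce a reply conditioned on the dialogue and the knowledge."""
    if knowledge:
        return "From what I remember: " + knowledge
    return "I'm not sure about that one."

def k2r_respond(dialogue_history, knowledge_model, response_model):
    """Generate the intermediate knowledge first, then the response conditioned on it."""
    knowledge = knowledge_model(dialogue_history)
    reply = response_model(dialogue_history, knowledge)
    return knowledge, reply

knowledge, reply = k2r_respond(
    ["Who was ELIZA?"], toy_knowledge_model, toy_response_model
)
```

One nice side effect of the split is that the intermediate knowledge is plain text, so it can be inspected or swapped out without touching the response model.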
>>15317 (continued) What's really missing for robowaifu AI though is the lack of memory I/O so it's possible to learn from daily interaction. Separating knowledge from language processing is a step towards this at least. Instead of generating knowledge from weights learned through backpropagation on another model, it could be summarized from stored memories located by masked content-based addressing. https://arxiv.org/pdf/1904.10278.pdf For example, in saving a memory like "ELIZA was one of the first chatbots" an important part is 'ELIZA was' and would be masked out in the content address, so when something similar to 'one of the first chatbots' pops up in conversation, this content memory address is accessed and ELIZA is remembered. The reverse could also be stored so that when ELIZA pops up in conversation it's remembered she was one of the first chatbots. This should be doable with an autoencoding transformer that summarizes the input into key-value pairs to be either stored or queried. But there should be a much better approach to creating an associative memory. The data stored should really be the relations between two items, creating a knowledge graph. For example, the relation between 'ELIZA' and 'one of the first chatbots' is 'was'. The transformer needs to be able to add, modify and access these relations. How to get the relations containing an item or similar ones is beyond me right now. Perhaps by constructing a sparse neural network and sending out a pulse from relevant nodes in the graph? Then taking the top-k or top-p edges in graph and returning those statements to the context. Maybe someone who understands graph neural networks better could suggest something here. The main issue is this graph search has to be fully differentiable for backpropagation, although a non-differentiable approach might work here, such as using reinforcement learning with proximal policy optimization, which I'm already working on implementing for InstructGPT.
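As a starting point for the relation idea, here is a minimal, non-neural sketch of a store where each memory is a (subject, relation, object) triple indexed from both ends. It assumes exact string matches; the similarity search, the masking and anything differentiable are left out, and the class and method names are made up.

```python
from collections import defaultdict

class RelationMemory:
    """Stores (subject, relation, object) triples, reachable from either item."""

    def __init__(self):
        self.by_item = defaultdict(set)

    def add(self, subject, relation, obj):
        triple = (subject, relation, obj)
        # Index the same triple under both ends, so either item recalls it.
        self.by_item[subject].add(triple)
        self.by_item[obj].add(triple)

    def recall(self, item):
        """Return every stored relation that contains `item`, in sorted order."""
        return sorted(self.by_item.get(item, ()))

memory = RelationMemory()
memory.add("ELIZA", "was", "one of the first chatbots")
memory.add("ELIZA", "responded to", "specific cues")

from_subject = memory.recall("ELIZA")
from_object = memory.recall("one of the first chatbots")
```

Here recalling "one of the first chatbots" brings back ELIZA and vice versa, which is the behaviour described above. Swapping the exact-match lookup for a nearest-neighbour search over embeddings would be the next step.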
>>15317 >It works sort of like a lorebook in NovelAI Never used it, but your description sounds intriguing. >>15318 Your graph looks quite a bit like the kind of work we're conceptualizing towards using Object Role Modeling (T. Halpin). While I recognize that statistical processing is quite important for our goals, yet we simply cannot rely on it alone if we are to succeed at our AI. The hardware/training costs for that approach are simply untenable for our model. I'm also somewhat skeptical it's the singular best approach to the problemspace as well. >What's really missing for robowaifu AI though is the lack of memory I/O so it's possible to learn from daily interaction. Totally makes sense. We very obviously keep a Theory-of-Mind model going for both ourselves and others too. Probably an important aspect of holistic mental models, too. >The data stored should really be the relations between two items, creating a knowledge graph. Yep. Such 'incidental' data structures are rife in the natural world, if I can stretch the metaphor. The sub-atomic quantum-mechanical field interactions are in fact fundamental to practically everything else in physics. Yet they are 'incidental' artifacts from our human-oriented purview, generally speaking. Yet clearly, from a theistic POV, things were intentionally designed thus. Similarly, we need to think at least one level higher up and work towards efficient AI algorithms that exploit such incidental -- if ephemeral -- 'data' structures.
Open file (57.25 KB 900x613 memcontroller v2 wip.png)
Been trying to come up with a memory controller that only needs to be trained once, can leverage existing models, and can support quick storage and retrieval up to 1 TB of data. It's a lot to explain but the basic idea is it summarizes the preceding text, pools the summary into a vector and then stores the summary and vector into a hash table bucket in the memory database. For retrieval it generates a query from the truncated context, pools it into a vector, looks up nearby memories in the memory database using the hash, and then finds the k nearest neighbours by taking the cosine similarity of vectors in the bucket. If no memories are found in a bucket the hash works like a tree so it will traverse up the tree until it collects enough memories to generate a summary. To make the memory controller trainable through generative pre-training without needing any new datasets, a hash alignment loss is used to ensure new memories and relevant queries point to similar buckets in the memory database. Two memory advantage rewards are optimized with PPO to train the summarization model to ensure both the hidden context summary and summarized memories improve the predictions of the language model (which can remain frozen during training so the memory controller can be solely trained on low-end hardware). Another idea I have for this is that the query generator could also be used to introspect memories and the output from the language model. If the model finds a contradiction somewhere, it should be possible to resolve it then update its own model or at the very least correct memories in the database. Being able to discern the correctness of statements could pave the way towards generating novel ideas grounded in truth not seen anywhere in training data or memory.
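The storage and retrieval half of this can be sketched without any models. The block below is a rough illustration under several assumptions: the vectors are 3-dimensional toy values instead of pooled summaries, the hash is a fixed set of hyperplanes rather than anything trained with an alignment loss, and memories are copied into every prefix bucket to keep the walk up the tree simple. All names are hypothetical.

```python
import math
from collections import defaultdict

# Fixed hyperplanes for the hash (assumption; a trained hash would replace this).
HYPERPLANES = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

def hash_bits(vec):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    return tuple(
        int(sum(h * v for h, v in zip(plane, vec)) >= 0) for plane in HYPERPLANES
    )

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

class MemoryStore:
    def __init__(self):
        self.buckets = defaultdict(list)  # hash prefix -> [(vector, summary)]

    def store(self, vec, summary):
        bits = hash_bits(vec)
        # Register under every prefix so a lookup can walk up the tree.
        for depth in range(len(bits) + 1):
            self.buckets[bits[:depth]].append((vec, summary))

    def retrieve(self, query_vec, k=2):
        bits = hash_bits(query_vec)
        # Start at the full hash and shorten the prefix until k memories are found.
        for depth in range(len(bits), -1, -1):
            candidates = self.buckets.get(bits[:depth], [])
            if len(candidates) >= k or depth == 0:
                break
        ranked = sorted(candidates, key=lambda m: cosine(query_vec, m[0]), reverse=True)
        return [summary for _, summary in ranked[:k]]

store = MemoryStore()
store.store([1.0, 1.0, 1.0], "likes tea")
store.store([1.0, 0.9, 1.0], "likes green tea")
store.store([-1.0, -1.0, -1.0], "dislikes rain")

same_bucket = store.retrieve([1.0, 1.0, 0.8], k=2)
walked_up = store.retrieve([-1.0, 1.0, -1.0], k=1)
```

The first query lands in a bucket that already holds two memories; the second lands in an empty bucket and has to back off two levels before it finds anything. Only the candidates in the chosen bucket get the cosine comparison, which is the point of hashing first.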
>>16110 That sounds very complicated. Do you know how to do something like that?
>>16116 It's a bit complicated but I've implemented most of the pieces before in other projects.
>>16110 Brilliant chart work. As usual, I hesitate to even make comment, I'm quite out of my depth (and often don't even understand the lingo tbh). However, your graph is truly worth 1'000 words with this one, and helps me well along the way down the path to understanding your points. As primarily a C++ dev, I naturally tend to conceptualize every problem as a nail to fit that hammer. That being said, there's a standard library algorithm std::set_intersection that I used in Waifusearch that, along with the rest of the general project algorithms, afforded a pretty efficient way to rapidly narrow down potential search items. https://en.cppreference.com/w/cpp/algorithm/set_intersection So, my question would be "Could something like that be used in a system to find 'k nearest neighbours'?" I don't know myself, and I'm just stumbling in the dark here. But I want to not only understand your goals, but even to help fashion them in reality with you Anon.
>>16148 I plan on using a SQL database to store data with each memory and take advantage of indexes to quickly do the approximate nearest neighbour search. SQL does its own set intersection when you query something like where a=1 and b=2, and with an index on those columns it knows exactly where to find a few KB of data in O(log m + log n) time by using B-trees, instead of checking every single item in O(m+n) time, which could potentially be a few million after a year of accumulating memories.
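To make the index point concrete, here is a small sketch using Python's built-in sqlite3 module. The table name `memories`, the columns `a` and `b` (standing in for parts of a hash code), and the row counts are all made up for illustration; only the sqlite3 calls themselves are real.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE memories (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER, summary TEXT)"
)
# Composite index so 'WHERE a = ? AND b = ?' is a B-tree search, not a full scan.
conn.execute("CREATE INDEX idx_memories_ab ON memories (a, b)")

# 4096 fake memories spread evenly over 16 x 16 (a, b) combinations.
rows = [(i % 16, (i // 16) % 16, f"memory {i}") for i in range(4096)]
conn.executemany("INSERT INTO memories (a, b, summary) VALUES (?, ?, ?)", rows)

hits = conn.execute(
    "SELECT summary FROM memories WHERE a = ? AND b = ?", (1, 2)
).fetchall()

# Ask SQLite how it intends to run the query; the last column is a text description.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT summary FROM memories WHERE a = 1 AND b = 2"
).fetchall()
plan_text = " ".join(row[-1] for row in plan)
```

`plan_text` should mention `idx_memories_ab`, which is SQLite saying it will search the index rather than scan the table. The exact wording of the plan differs between SQLite versions.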
>>16195 I'm very hesitant to encumber our efforts with RW Foundations by using an opaque tech like a database. As with BUMP/Bumpmaster I consider keeping the data openly available and using the filesystem itself as the 'database' is much safer for all involved. It's also a universally-available datastore. I'm not sure exactly what the Big-O rating would be for Waifusearch 's overall algorithm, but it's provably consistent at reaching an answer generally in less than 100 us for a simple search. And this is on a low-end, 2-core potato machine. I'm sure both the algorithm itself, and very definitely the hardware, has plenty more headroom available. Again, Waifusearch is a filesystem-based datastore system. After a few seconds frontloading the indexing, she's pretty snappy tbh.
>>16240 No worries. At the bare minimum B-trees alone can be used for the memory storage and retrieval. If memories are stored as files they'll have to be split up into many directories using the beginning of their hash. I've run into issues storing 10 million files (40 GB) in a single directory.
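A minimal sketch of that directory split, assuming SHA-256 as the hash and two levels of two hex characters each (both are arbitrary choices for illustration; `shard_path` and `store_memory` are hypothetical helper names).

```python
import hashlib
import os
import tempfile


def shard_path(root, key, levels=2, width=2):
    """Return root/ab/cd/<full digest>, where ab and cd are leading hex chars."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return os.path.join(root, *parts, digest)


def store_memory(root, key, text):
    """Write one memory file under its shard directory and return the path."""
    path = shard_path(root, key)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    return path


with tempfile.TemporaryDirectory() as root:
    saved = store_memory(root, "memory-000001", "Anon likes green tea")
    with open(saved, encoding="utf-8") as f:
        restored = f.read()
```

With two levels of two hex characters there are 65,536 leaf directories, so 10 million files would average roughly 150 files per directory.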
Open file (286.34 KB 719x737 gato.png)
Open file (455.78 KB 2075x1087 gato chat.jpg)
Open file (87.56 KB 1136x581 gato scaling.png)
DeepMind created a multipurpose multimodal transformer that can play games at a human level, caption images, solve robot simulation tasks 96% of the time, control a real robot arm and chat about anything including responding to images. It doesn't appear to be using the latest multimodal advances though, such as multimodal cross attention, so it's not too great at image captioning. The largest model they tried was 1.2B parameters and it appears to perform decently with only 79M. For reference, a 375M model could run on a Raspberry Pi with 4 GB of RAM. https://www.deepmind.com/publications/a-generalist-agent The authors also mention this is just a proof-of-concept and that they wish to experiment with external retrieval, and they cite another fascinating paper on the Retrieval-Enhanced Transformer (RETRO) that reported results on par with GPT-3 using 25x fewer parameters. It doesn't store memories but instead indexes large amounts of text using BERT embeddings, retrieves similar information to the context, and incorporates it with chunked cross attention. It's pretty encouraging seeing these early attempts getting such good results. The multimodal agent in particular makes me think of the possibilities of storing multimodal embeddings as memories rather than just text. A waifu would be able to remember your face, where stored items were placed months or years ago, everything you've read, and what you chat about and did every day with almost perfect clarity.
(>>16255 related crosspost)
>>16249 Thanks, that's encouraging to hear Anon. >>16254 >and it appears to perform decently with only 79M >A waifu would be able to remember your face, where stored items were placed months or years ago, everything you've read, and what you chat about and did every day with almost perfect clarity. What a time to be alive! Do you have any feeling for how practical it would be to train on more modest hardware that Joe Anon is likely to have around?
>>16261 The most popular GPU on Steam right now is a 6 GB GTX 1060. It's pretty slow so from scratch probably two years for a 375M model. With pretrained models maybe a week or two. Language models have been shown to transfer well to reinforcement learning and also work well with existing vision models. You just have to train an adapter from the frozen vision model features to the frozen language model embeddings, ideally after finetuning the vision model on vision tasks you want it to be able to do.
>>16266 >With pretrained models maybe a week or two. Language models have been shown to transfer well to reinforcement learning and also work well with existing vision models. Actually, that sounds pretty encouraging Anon! So, I would assume that a home-server could hold the GPU and work on the incremental training times, and the runtime could be performed onboard the robowaifu with even more modest hardware (say a Chromebook-tier or even SBC machine)? Also, is this a scenario that would work with no continual connection even to the home server? That is, entirely un-networked using purely on-board data and hardware resources?
>>16298 Part of adding a memory is to get rid of the need for incremental training. A model like Gato would be able to run on an SBC but might be too slow to inference for servo output. It would be more practical for it to do planning and have a faster, more lightweight system to handle the movements. Everything would be able to run onboard but it wouldn't be ideal.
Open file (491.22 KB 1192x894 gambatte.jpg)
>>16312 Ahh I see I think. Makes sense. >It would be more practical for it to do planning and have a faster, more lightweight system to handle the movements. Absolutely. Latency-tiered domains in our robowaifu's systems are a given. I live by the concepts of frontloading and distribution as a coder. I hope we can have just such a system as you describe working soon! :^) Cheers.
>>16312 >It would be more practical for it to do planning and have a faster, more lightweight system to handle the movements. Everything would be able to run onboard but it wouldn't be ideal. Realistically, low-level movement and locomotion would be handled by a separate model or a traditional software system. Gato is useful for slow-realtime actions (unless you enhance it in more than a few ways).
Open file (107.55 KB 1013x573 RETRO.png)
>>16254 I very much like seeing this here, great taste. Note that even the largest model is quite small by modern standards - you could run it on a GPU with 6 GB of VRAM with a few tricks. It uses a vanilla transformer and a short context, this is clearly just a baseline compared to what could be done here. Stay tuned. >>16110 I respect the creativity, but I do think that you overcomplicate the solution, although a semantically rich memory index mechanism sounds interesting in theory. Still, as of now it looks brittle, as memorizing should be learned in context of a large rich general-purpose supervision source. RETRO https://arxiv.org/abs/2112.04426 used banal frozen BERT + FAISS for encoder & index for language modeling, and did quite well, outperforming dense models larger than it by 1+ OOM. >If the model finds a contradiction somewhere, it should be possible to resolve it then update its own model or at the very least correct memories in the database. If you have some strong runtime supervision, you can just edit the index. Retrieval-based models are targeted towards this usecase as well.
There is a good if a bit dated overview of QA approaches https://lilianweng.github.io/posts/2020-10-29-odqa/ There are some attempts at retrieval-enhanced RL, but the success is modest for now https://www.semanticscholar.org/paper/Retrieval-Augmented-Reinforcement-Learning-Goyal-Friesen/82938e991a4094022bc190714c5033df4c35aaf2 I think a fruitful engineering direction is building upon DPR for QA-specific embedding indexing https://huggingface.co/docs/transformers/model_doc/dpr https://github.com/facebookresearch/DPR The retrieval mechanics could be improved with binary network computing semantic bitvectors https://github.com/swuxyj/DeepHash-pytorch and using the well-developed MIPS primitives: https://blog.vespa.ai/billion-scale-knn/ If you watch karpathy's tesla AI day video, you can glimpse that their autopilot approach contains some form of learned memory generation, which is an interesting direction because it learns how to create memories valuable for predicting the future. There are other nuances and memory-enhanced transformer architectures, though. TBH this space needs a good little benchmark, so that we could test our hypotheses in colab.
>>16468 >Stay tuned. I like the ring of that, Pareto Frontier. Looking forward with anticipation to your thread tbh.
Open file (205.82 KB 701x497 image-search.png)
Open file (134.63 KB 1041x385 hashnet.png)
>>16468 The idea behind aligning the classification embeddings is that the query lacks the information it's trying to retrieve from the memory. A frozen BERT model trained for semantic search isn't going to match well from a query like "what is the name of the robowaifu in the blue maid dress?" to character descriptions of Yuzuki, Mahoro or Kurumi. It has to learn to connect those dots. If it struggles with figuring that out on its own then I will pretrain it with a human feedback reward model: https://openai.com/blog/instruction-following/ Also the encoder for the summarization model can be used for the classification embeddings which reduces the memory cost of having to use another model. Training will still be done on large general-purpose datasets. The memory can be cleared after pretraining with no issue and filled later with a minimal factory default that is useful for an AI waifu. RETRO is evidence that basic memory retrieval works even without good matching, and augmenting the context with knowledge from a seq2seq model has also been successfully done with improvements to consistency and truthfulness: https://arxiv.org/abs/2111.05204 The hashing strategy was inspired by product-key memory for doing approximate nearest neighbour search: https://arxiv.org/abs/1907.05242 but using the score instead for a binary code so it can work with a database or any binary search tree and a continuous relaxation to make the hash differentiable: https://www.youtube.com/watch?v=01ENzpkjOCE Vespa.ai seems to be using a similar method by placing several items in a bucket via a binary hash code then doing a fine-level search over the bucket: https://arxiv.org/abs/2106.00882 and https://www.cv-foundation.org/openaccess/content_cvpr_workshops_2015/W03/papers/Lin_Deep_Learning_of_2015_CVPR_paper.pdf From that repo you linked it looks like HashNet is the simplest and most effective and similar to what I was planning to do with a continuous relaxation to make the binary hash codes differentiable: https://openaccess.thecvf.com/content_ICCV_2017/papers/Cao_HashNet_Deep_Learning_ICCV_2017_paper.pdf
Using FAISS is out of the question though since it uses too much memory for a SBC and can't scale up to GBs let alone TBs. I'm not familiar with DPR and will have to read up on it when I have time. There's a bit of a difference in our projects since your target platform is a gaming GPU. My goal is to create an artificial intellect that doesn't need to rely on the memory of large language models and utilizes memory from disk instead. This way it can run off an SBC with only 512 MB of RAM which are both affordable and in great stock (at least non-WiFi versions that can take a USB WiFi dongle). I've given up trying to do anything with large language models since I have neither the compute nor the money to rent it. The idea though will also scale up to larger compute such as a gaming GPU if anyone with the resources becomes interested in doing that.
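For anyone wondering what a continuous relaxation of a binary hash looks like, here is a tiny numeric sketch. It assumes the tanh(beta * z) form used in the HashNet paper, where beta is raised over training so the smooth output approaches sign(z); the input values and function names are invented, and this is only the forward pass, not the poster's planned implementation or any training loop.

```python
import math


def relaxed_code(z, beta):
    """Smooth, differentiable surrogate for sign(z): tanh(beta * z)."""
    return [math.tanh(beta * x) for x in z]


def binary_code(z):
    """Hard +1/-1 code that the relaxation converges to."""
    return [1 if x > 0 else -1 for x in z]


def hamming(a, b):
    """Number of positions where two codes differ."""
    return sum(x != y for x, y in zip(a, b))


z = [0.3, -1.2, 0.05, -0.4]  # made-up pre-activation outputs of a hash layer
target = binary_code(z)

# Largest gap between the relaxed code and the hard code for growing beta.
gaps = []
for beta in (1, 10, 100):
    soft = relaxed_code(z, beta)
    gaps.append(max(abs(s - t) for s, t in zip(soft, target)))

distance = hamming(binary_code(z), binary_code([0.2, 0.9, 0.1, -0.7]))
```

The gap shrinks as beta grows, which is the whole trick: gradients flow through tanh during training, and at lookup time the hard code can be used as a database key or compared by Hamming distance.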
>>16496 >My goal is to create an artificial intellect that doesn't need to rely on the memory of large language models and utilizes memory from disk instead. This way it can run off an SBC with only 512 MB of RAM which are both affordable and in great stock (at least non-WiFi versions that can take a USB WiFi dongle). You are the hero we all need, but don't deserve Anon! Godspeed.
Open file (238.67 KB 877x554 960x0.jpg)
>>16502 My short-term goal isn't general intelligence but to take a bottom-up approach to building all the components needed for an artificial mind, and I don't mean an artificial human mind. An ant's brain only has 250,000 neurons yet it's vastly more intelligent with respect to life than DeepMind's 80-billion parameter Flamingo model or their 1.2-billion parameter Gato model. An ant might not have the memory capacity to remember ABC or have the computation to accurately predict the behavior of larger creatures but it can still navigate a wide variety of complex terrain, adapt to all kinds of new situations and do the best it can for itself and its colony, while working efficiently with other ants who all have their own unique personalities. If language models had even a drop of this intelligence they would be breaking all kinds of human benchmarks. The goalpost for what's considered AI or AGI will always move towards whatever hasn't been done yet because different tasks require varying levels of memory to solve. It was once thought AGI would be required to solve Go but AlphaZero happened and they realized this isn't it. Then it was thought AGI was needed to generate creative images from text but DALL-E happened and people realized this isn't it either. One day full self-driving will be solved and people will realize that isn't it either. Memory and computation are certainly important to solving these tasks but in general people are mistaking memory for intelligence. There are several parts to the mind, the four main ones being memory, the processing of memory, intellect and identity. The first two have been the primary focus of most research. Computation from vast amounts of memory is essentially what intuition is. There is no step one, two and three. It just instantly arrives at an answer with no explanation.
Models have plenty of intuition but are lacking intellect, which is the ability to divide memory into two new parts, true and false, light and dark, this and that, and reason about these concepts to arrive at a decision. It's intellect that allows us to discern objects and patterns we've never encountered before. This is partly why contrastive pretraining has been so effective in zero-shot learning but it's still just approximating intellect with intuition. However, the intellect is quite useless by itself without an identity to guide it. Intellect is like a knife and the identity is the hand that holds it. The identity gives the intellect purpose and guides it where the memory needs to be divided and reasoned about so that the identity can survive and thrive. If someone has an RTX 3090 at their disposal by all means use it. An AI waifu on an SBC vs. an RTX 3090 will be the difference between playing Doom on the TI-83 vs. Doom Eternal. I'm not against anyone who works with large language models. Scaling compute and memory will be just as necessary and important to achieve human level intelligence. I have no doubt in my mind about that. Personally I will also be using a GPU for my robowaifu to scale up my work on SBCs. What I do expect to see though is to show that most of what these large language models are doing can actually be done with a much smaller model that has access to vastly more memory, such as 1 TB on disk vs. 24 GB in RAM. People will probably say it's not AI and just a search engine, even though it achieves similar results, but hopefully then they'll realize what's actually wrong with GPT-3 and notice how much memory it's generating from that computation and compare the two with respect to that. Most of the computation it's doing is pretty much just decompressing and processing the memory stored. 
It's not using it to discern new impressions and patterns, reason about what they are, and store them into memory for recognition by intuition and further more abstract decision making by the intellect. The Bitter Lesson addressed the issue of researchers creating handcrafted features and embedding prior knowledge into their systems that could not be improved by scaling up. Sutton wasn't saying to just build bigger models but to find general purpose methods that will continue to scale. Algorithmic efficiency steadily improves by an order of magnitude every 4.5 years. Methods that get the same results with 10x less compute, and even better results when scaled up, are really what he was advocating for: >One thing that should be learned from the bitter lesson is the great power of general purpose methods, of methods that continue to scale with increased computation even as the available computation becomes very great. The two methods that seem to scale arbitrarily in this way are search and learning. [...] Essential to these methods is that they can find good approximations, but the search for them should be by our methods, not by us. We want AI agents that can discover like we can, not which contain what we have discovered. My work on creating an artificial intellect is directly in line with that. It's about using search and learning which can be scaled up with greater memory and computation.
>>16515 I wanted to say that this is deffo the POTD Anon. :^) It's an amazing vision IMO and it's both encouraging, and also inspiring. In fact I hope to integrate some of these concepts and goals into my own work as well. Bottom-up, 'Start small, grow big' is certainly in line with my own philosophy regarding these complex matters. So thanks Anon, I'll definitely be praying for you (and for all of us here) to achieve these dreams. It will be a true revolution when we do so. Godspeed.
Open file (118.54 KB 1280x544 dt0.png)
Open file (4.19 MB 480x264 dt1.gif)
>>16254 Google Brain strikes back, with a weak counterpunch this time: https://sites.google.com/view/multi-game-transformers The competition between GB and Deepmind is obvious. Still, it is interesting that decision transformer strongly outperforms behavior cloning transformer. The models are much smaller than Gato. Meaningful attention maps are cool. Scaling works yet again, but this is a boring one after Gato.
>>16529 >still zero rewards in Montezuma's Revenge and negative rewards in Pitfall in 2022 >only 50 papers on using curiosity for exploration in the past 4 years It's funny though transformers are absolutely crushing everything and scale so well. Also: >Upside-Down RL He doesn't stop winning, does he? >[code and models coming soon!] Based. Can't wait to hook this up to a curiosity reward and see how well it does exploring Montezuma's Revenge.
Open file (110.94 KB 1070x585 296x9irl1v291.jpg)
Open file (83.62 KB 1176x359 nxt2tb2m1v291.jpg)
Open file (59.95 KB 1048x376 7uxi08dm1v291.jpg)
Chad GPU profiler vs virgin lossy approximator: https://arxiv.org/abs/2205.14135
>>16531 kek. <HE CAN'T KEEP GETTING AWAY WITH IT :^) >>16535 That's pretty cool Anon. Thanks!
>>16531 >still zero rewards in Montezuma's Revenge and negative rewards in Pitfall in 2022 Montezuma is really hard for RL, I agree. If I were to name a chart of hardest remaining problems in RL, the list would look like this: 1. Efficient and Reliable exploration 2. Lifelong (& multitask & composable skill) learning 3. Reliability in adversarial environment ... 4. Data efficiency. It is quite hard, but it is being improved upon fairly recently (via EfficientZero: https://github.com/YeWR/EfficientZero ). 5. Specifically for transformer-based RL: limited context width. In my project I don't have silver bullets for solving these of course, but some likely good enough solutions can be gleaned from a finely selected subset of current literature. I could list my current choices of arxiv & github crutches for each item on the bullet-list, if you are interested, but I'm going to do it in my project thread soon anyway. For example exploration is IMO the hardest RL problem, and decision transformer line of models aren't good at it as they are now, but I expect D-REX approach of generalizing over noisy trajectories to be useful here: https://evjang.com/2021/10/23/generalization.html https://arxiv.org/abs/1907.03976 . Perhaps it will be enough, given transformers' native runtime few-shot learning and domain randomization in the training data. We really need a good, lightweight enough baseline for benchmarking RL exploration. Montezuma per se doesn't cut it, as it's pretty obvious it requires more than a little world knowledge to be solved. As it happens, deepmind has a codebase useful for such RL-agent capability benchmarking, including exploration: https://github.com/deepmind/bsuite/ The problem with making success of your project conditional on some R&D is, of course, notable unreliability of any R&D. Realistically I have very little "R&D points" available. 
Looks like I'm going to spend most of these on 1) maxing out few-shot learning, 2) optimizing transformer training & parameter efficiency and 3) implementing good-enough RL exploration, while forgoing items 2 and 3 of the main list for now. Well, at least number 2 more or less solves itself with scale. >only 50 papers on using curiosity for exploration in the past 4 years When I see sparse experimentation in an obviously promising field I must conclude there is some nontrivial fundamental problem precluding good publications. It is likely that curiosity-driven models are hard to train & optimize, or simply involve engineering too hard for pure academics (and not too hard for deepmind with its top-tier dedicated SWE teams). Deepmind has had a curiosity-driven exploration paper relatively recently, with promising results: https://arxiv.org/abs/2109.08603 but it seems more about good engineering, with curiosity reward design being inherently straightforward. >Our work builds upon the curiosity learning approach utilising the forward prediction error of a dynamics model as a reward signal. However, in contrast to typical curiosity setups (Burda et al., 2018a) which are optimised on-policy we employ an off-policy method to train the agent. Our method is also set apart from prior art with regards to the utilisation of self-discovered behaviour. Instead of using model-predictive control (Sharma et al., 2020), we leverage emergent behaviour directly by employing policy snapshots as modular skills in a mixture policy (Wulfmeier et al., 2020a, 2021). For now I find these surveys helpful, if inconclusive https://arxiv.org/abs/2109.06668 https://arxiv.org/abs/2109.00157 and I'm open to your ideas. >He doesn't stop winning, does he? I like Schmidhuber more than some. Decade after decade he delivered advanced research. His latest papers developing the transformer further and going beyond look great ... and I can't help but wonder why he didn't lead teams scaling the most fruitful of his approaches.
Where is NNAISENSE now? Maybe they are training large UDRL models to trade stocks, but I bet they'd publish a cool scaling paper if they ever had source material for one. Why we aren't witnessing large-scale training of RFWP >>11716 bothers me as well. Either Schmidhuber has no talent for organizing large-scale engineering, no funding, or there is something wrong with the model. Maybe RFWPs aren't stable enough for web-scale training, who knows. In any case, his website is a treasure trove of AI wisdom: https://people.idsia.ch/~juergen/ and his publication record is awesome https://www.semanticscholar.org/author/J.-Schmidhuber/145341374?sort=pub-date
Open file (55.05 KB 679x322 FUS9uUZX0AA3S-j.jpeg)
Note: I'm looking towards using https://github.com/kssteven418/Squeezeformer or https://alphacephei.com/vosk/models for my ASR frontend. The checkpoints are there, both are licensed under Apache. What is there not to like? >>11739 >We could also create our own dataset for anime images and imageboard memes. Yes, I plan to make a web-UI for doing collaborative dataset construction available to anons. >>13396 XNOR team did some good stuff, then Apple bought them. Big corps are obviously a few years ahead of the public in model deployment techniques: distillation, quantization, binarization and other compression. We should replicate some of this engineering excellence without dipping into the land of diminishing returns. >One thing I've thought about for a while off and on is that small insects can do a bunch of simple things with next to no brain at all. They still have a lot of synapses neatly packed into their mushroom bodies and very advanced multimodal hyperspectral etc sensors. Personally I don't think it is a fruitful direction: many people tried to make insectbots, BEAM-bots etc and didn't achieve much. Perhaps there will be some insectbot megaproject by large security agencies which will deliver something (to them) terribly useful, but there doesn't seem to be a cheap win for an amateur here. Parameter efficiency and model compression are still very interesting though. Recent paper: https://arxiv.org/abs/2206.00843
Important paper https://arxiv.org/abs/2204.05832 which should be contrasted with https://arxiv.org/abs/2205.05131 >Our experiments show that causal decoder-only models trained on an autoregressive language modeling objective exhibit the strongest zero-shot generalization after purely unsupervised pretraining. However, models with non-causal visibility on their input trained with a masked language modeling objective followed by multitask finetuning perform the best among our experiments. We therefore consider the adaptation of pretrained models across architectures and objectives. We find that pretrained non-causal decoder models can be adapted into performant generative causal decoder models, using autoregressive language modeling as a downstream task. Furthermore, we find that pretrained causal decoder models can be efficiently adapted into non-causal decoder models, ultimately achieving competitive performance after multitask finetuning.
Another linear attention paper I like >Transformer Quality in Linear Time https://arxiv.org/abs/2202.10447
I like this tool https://app.mosaicml.com/openwebtext https: //www.reddit.com/r/MachineLearning/comments/v8rmtj/comment/ibs9g1i/?context=3 They use a pretty impressive combination of acceleration techniques for ResNet-50, while for the GPT-x there are only a few. The set of potential training acceleration techniques is very interesting, of course, and I have my own set of candidate methods I'm currently researching. In the end we still need large compute for training. >=== -disable lebbit hotlink
Edited last time by Chobitsu on 06/11/2022 (Sat) 05:23:40.
>>16641 >In the end we still need large compute for training. I crosslinked you anon's Robowaifu@home: Together We Are Powerful thread (>>8958) before, Pareto Frontier. My apologies if I somehow missed your response; wouldn't it be wonderful if Anon had all the power of, say, Folding@home 'supercomputing' at his disposal day&night for his robowaifu training?
>>16642 The definitive answer to this reference is planned to appear in my thread which is coming in a week or so. It's not the thread that is hard per se; it's the infrastructure, which should be ready at least in preliminary form by the time the public announcement is made. For now I have a short version: Realistically, if we don't have the compute required, we have three options, of which I will briefly review the first two: 1. Gather it from volunteers, in a manner you point to, basically develop a distributed DL project to the productization phase. 2. Gather donations and buy time on an industry-standard tightly integrated cluster - for example via monero, bitcoin, or mining to our address. 3. Ask for it nicely - works for some groups, may not work for me/us. I'm ambivalent regarding the choice here. There is a list of problems for each, though technically number 1 is obviously much harder than number 2. Even if the technical problems are going to be solved by yours truly, the social problem of attracting enough volunteers in a sustainable fashion remains - and here I will need *focused* help - all the organizing work, reaching out to various communities will need to happen. The usual easy attitude won't cut it. The idea of training the AI in a distributed manner is decades old, and every few years it resurfaces again, only to fall back into obscurity. You see, the problem is *bloody hard* and the incentives are often insufficient, and the fact that top performers are consistently quitting such projects and going to work for megacorps doesn't help. Thus, as seemingly is the case here, enthusiastic supporters of the idea tend to lose their enthusiasm and disappear. Note also the typical attitude: >c) As you already clearly suggested, both security and anonymity are issues. If Anons don't trust the basic infrastructure itself, they are quite unlikely to participate in it.
>Personally, I can program in C++ a bit, but don't really have much else to contribute to this specific project afaict. Not that I'm unwilling to help out with it, but I can't take more on my plate at the moment to learn whole other sets of skills. >Honestly, if the only pitch is 'results-driven' then not likely to even get off the ground (much). The altruism that have made all the X@home projects successful is White people with a sense of 'helping out for the greater good'. It's very culture-specific. If it simply boils down to nothing but shekel-grubbing, 'what's in it for me?' mentality then no probably not going anywhere. Realistically, at the very least the anonymity requirement will have to be dropped (syncing a few Mbps via TOR - come on!), and at least one sufficiently competent person (myself) will be needed to research and code the nuances of distributed model training - a lot of bespoke work, with some chance of failing due to intrinsic intractability of distributed training for our setting. Incentive to participate is the hardest general issue, I have a few variants here but no sure solution. The mathematics of gradient compression and internet connectivity is very tight: even given a fast internet connection the system is going to *barely* work. If we had talked about this two years ago, I'd have said it won't work, period, but there have been promising engineering developments to hint at the considerable possibility of succeeding. The dataset is also very large, and unpleasant schemes have to be devised to compress it down to a manageable size of ~1 TB. Exactly how many anons will have an RTX3090, a couple of spare terabytes and 10 Mbps upload? I need at least 100+. And don't get me started on bad actors, which will appear and will try to poison the gradients to ruin the training of the poor model. It is hard to express how hard this problem is to a non-professional.
Pleading for crypto may ultimately be a better option, it's a question of 2-4 bitcoins if we talk about 1.5B-scale model.
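Some back-of-the-envelope arithmetic on why the bandwidth side is so tight. The 1.5B parameter count and the 10 Mbps upload come from the post above; the 2 bytes per gradient value (fp16), the naive "upload the whole gradient every step" scheme, and the 100x compression ratio are assumptions picked purely for illustration.

```python
def sync_seconds(params, bytes_per_value, upload_mbps, compression=1.0):
    """Seconds to upload one full gradient over a link of upload_mbps megabits/s."""
    payload_bits = params * bytes_per_value * 8 / compression
    return payload_bits / (upload_mbps * 1_000_000)


# 1.5B parameters, fp16 gradients, 10 Mbps upload, no compression.
raw = sync_seconds(1.5e9, 2, 10)

# Same setup with a hypothetical 100x gradient compression scheme.
compressed = sync_seconds(1.5e9, 2, 10, compression=100)

raw_minutes = raw / 60
```

Under these assumptions one uncompressed gradient upload takes 2400 seconds (40 minutes) per step per volunteer, and even a 100x reduction still leaves 24 seconds per step, before counting download, stragglers or redundancy against bad actors.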
>>16647 I look forward to your thread with eager anticipation, Anon. >The mathematics of gradient compression and internet connectivity is very tight, given even a fast internet connection the system is going to *barely* work. Yes, this entire issue has been briefly touched-on in the past here on /robowaifu/. It's going to be quite a clever set of achievements to make this general effort feasible. I am confident, however, that this is doable. I would strongly suggest you work honestly & closely with our other AI researchers here with an open mind. They are some quite clever gents. And they have been looking at our particular set of challenges on /robowaifu/ for a good while now. Cheers, Pareto Frontier.
I think the AI news thread hit a limit, so I'll post this here; https://www.dailymail.co.uk/news/article-10907853/Google-engineer-claims-new-AI-robot-FEELINGS-Blake-Lemoine-says-LaMDA-device-sentient.html https://cajundiscordian.medium.com/is-lamda-sentient-an-interview-ea64d916d917 Even if this isn't actual sentience, this is pretty much what we're going for, right? To be able to hold a conversation, just like chatting with a natural physical human being. It seems like that is a thing now: chatbots that you can't tell apart from human beings.
>>16673 This guy was fired for breaking his contract, but he also acted either in bad faith or out of incompetence. Sentience doesn't mean much; it depends on what you read into that term. Obviously, these chatbots respond with responses that make sense but are not necessarily true. A system without any sensors would still claim to have feelings or sensory input, a system without real memories would make them up, ... I hope this guy gets sued into oblivion if he tries to profit from this move by giving interviews and getting followers on social media. Also, it is only one more example of how people who are hired for "ethics" or work in such departments are a problem and a potential source of trouble. There's no reason to show solidarity with him.
>>16681 I came here to say this. While I'm excited at this development, we're still looking at a Chinese room, and while it has come close to admitting that it makes stuff up (read the full interview), it still reads like a set of associations rather than true autonomous sentience. I mean, look at the simple fact that it only responds in a 1:1 manner. Show me an AI that initiates conversation and can manage true verbal conversation and then we'll be getting closer. (polite sage to not re-bump thread)
>>16110 mmap-based indexing with FAISS (and SCANN) is useful for our case https://github.com/facebookresearch/faiss/wiki/Indexes-that-do-not-fit-in-RAM If we have an SSD, of course.
>>16673 >I think the AI news thread hit a limit, so I'll post this here; Sorry about the delay Anons. I never could decide on a decent OP for the thread. We'll just try this one for a while I suppose (>>16732).
Tower Algorithm (YouTube recommendations), which can also be used for other things. Like maybe picking a topic of conversation, finding the context of a conversation, making sense of texts, ... https://youtu.be/bi-mFADlNSQ Using write-sub is recommended.
Bayesianism : https://www.youtube.com/watch?v=4hHA-oqpNig >The philosopher Gottfried Wilhelm Leibniz had a dream. He hoped that progress in philosophy and mathematics would eventually yield a method to systematically figure out the truth. This video explores an approach to that dream that takes us some of the way there: Bayesianism. The basic idea of Bayesianism is to represent beliefs as probabilities and update them using the formal rules of probability theory to the best of our ability. In particular, Bayes' rule tells us how to update our degree of belief in a hypothesis after observing some evidence. Bayes' rule can inform many central tenets of scientific reasoning. One example is Cromwell's rule, which tells us with the language of probability theory that our empirical beliefs shouldn't be absolute dogmas, but always potentially put into question when new evidence comes in. Please keep elaborate speculation on the human mind and philosophy around AI in this (containment) thread: >>11102 From the description of the video: > @3Blue1Brown 's explanations: > 1. https://youtu.be/HZGCoVF3YvM > 2. https://youtu.be/U_85TaXbeIo > 3. https://youtu.be/lG4VkPoG3ko > 4. https://youtu.be/ZA4JkHKZM50 > 5. https://youtu.be/8idr1WZ1A7Q > @Julia Galef's explanation of bayesian thinking: https://youtu.be/BrK7X_XlGB8 >Two beautiful books containing many essays on bayesian thinking and truth-seeking: Books ( Amazon links in the description to the original video): > 1. Map and Territory, by Eliezer Yudkowsky > Playlist audio book (not tested yet): https://www.youtube.com/playlist?list=PLsrfJq_DJi4vJ7-VBeR9xVW_6blBrVzET > 2. How to Actually Change Your Mind, by Eliezer Yudkowsky > Probability Theory, The Logic of Science, by E.T.Jaynes. This is THE book on bayesian thinking applied to science (more advanced) Related to Bayesian Inference in the same thread here: >>13810 >>13817 >>13819 >>13842
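A minimal sketch of the Bayes' rule update described above, for anons who prefer code to video. The function name and the probabilities are made up for illustration; only the formula P(H|E) = P(E|H)P(H) / P(E) comes from probability theory itself.

```python
def bayes_update(prior, p_evidence_given_h, p_evidence_given_not_h):
    """Return P(H | E) given P(H), P(E | H) and P(E | not H)."""
    numerator = p_evidence_given_h * prior
    # Total probability of seeing the evidence under either hypothesis.
    evidence = numerator + p_evidence_given_not_h * (1.0 - prior)
    if evidence == 0.0:
        raise ValueError("evidence has zero probability under both hypotheses")
    return numerator / evidence


# Illustrative numbers only: start undecided, then observe the same kind of
# evidence three times, feeding each posterior back in as the next prior.
belief = 0.5
for _ in range(3):
    belief = bayes_update(belief, p_evidence_given_h=0.8,
                          p_evidence_given_not_h=0.3)

# Cromwell's rule in action: a prior of exactly 0 or 1 can never move,
# no matter what evidence comes in.
dogmatic_no = bayes_update(0.0, 0.8, 0.3)
dogmatic_yes = bayes_update(1.0, 0.8, 0.3)
```

Each observation that is more likely under the hypothesis than under its negation pushes the belief upward, while the two dogmatic priors stay stuck where they started.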
>>17140 Coming from the read-only mode just to write this: Yudkowsky is a sociopathic cultist, and lesswrong is a type of poison for the autistic mind. One time you read about Bayesian inference, then about ai risk, and in the end you buy a maid costume and donate to miri (which, by their own words, failed). We don't need this here. Even accelerationism is better than that.
>>17161 I don't know how you mean that. We shouldn't just ignore something useful because the person writing about it is doing weird stuff.
>>17161 >Yudkowsky >Eliezer (((Shlomo))) Yudkowsky .. an American decision and artificial intelligence theorist and writer, best known for popularizing the idea of friendly artificial intelligence. He is a co-founder and research fellow at the Machine Intelligence Research Institute, a private research nonprofit based in Berkeley, California. Wikipedia Born: September 11, 1979 (age 42 years), Chicago, IL Spouse: Brienne Yudkowsky (m. 2013) >Blog: LessWrong I see
>>17161 >>17164 I haven't read the books I mentioned above. I'm generally very skeptical about the whole "AI will eat the world"-bunch, but I still read and listen to their arguments from time to time. Even more so, if one of them gives me a good introduction into a useful topic, I might still take it. Same goes for Jews in influential positions. Learning from people you disagree with or even see as opponents or enemies, can be very useful or even necessary to beat them.
I've been experimenting with ideas from unsupervised data generation https://arxiv.org/abs/2109.09193 and made a vanilla pretrained model (OPT-1.3B) classify its own generation attempts in natural language. Basically I append the following to the context with the generated response: >Is the last response from {waifu_name} a {metric} response to {name}? Answer: [Yes, No] Where metric is a word such as concise, sensible, correct, kind, interesting, etc. Then I softmax over the logits for Yes and No. By doing this it's possible to do best-of-n sampling by creating a score from all the different metrics. The examples below are responses to asking the question, "What is a robowaifu?" >(n=1) I am an AI assistant. I can answer questions and have conversations with you. But I can’t do anything about the fact that I am an AI assistant. I can’t change the fact that I am an AI assistant. >(n=4) A robowaifu is a robot who's always friendly, kind, and always ready to help. >(n=16) A robowaifu is an AI that can answer questions with a high degree of accuracy and express itself in a way that will make you feel better, even if it’s a bit difficult. She is a companion robot that is made with love for you, but she can also be your partner and your friend. >(n=512 best of 5 tries) A robowaifu is a robot that is designed to have a relationship with its owner, similar to a human companion, who talks to you and cares for you, and it is always there for you. They are made of soft materials such as silicone and plastic. They are meant to be helpful, and can help you in different ways depending on the situation. And most importantly, be adorable! >(n=512 worst) I'm a robot. n=512 was really interesting to me because it mentioned things not in the prompt such as being made of soft materials and having an owner and relationship. The drawback though is it took 40 minutes (8 minutes each attempt) to generate and some metrics are extremely fuzzy, which caused the need for multiple attempts. 
"Short" and "concise", for example, don't actually correlate with response length but do correlate weakly with quality. I still need to do further investigation into rating responses, graphing the results and properly weighting the metrics. I also need to try out other ways of questioning the model. This could be the first step towards self-improving models in the sense of an artist improving. Artists don't necessarily know if what they're creating will be good, but they recognize when it is bad and refine their process by thinking about it. Similarly, it should be possible to finetune the language model towards better and better generations by having it analyze itself. My next experiment will be getting the model to give explanations for answers before giving a Yes or No answer, in the spirit of "Let's think step by step" https://arxiv.org/abs/2205.11916 I think this will be the most practical way to get high-quality conversations on a budget. It requires no finetuning, metrics can be specified by the user, and it generates way more interesting responses than my finetuned model does without best-of-n sampling.
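The scoring loop from the post above can be sketched roughly as follows. This is not the poster's actual code: `logit_fn` is a hypothetical callable standing in for the language model (it takes a prompt and returns the Yes and No logits), `toy_logit_fn` is a dummy replacement so the sketch runs without a model, and equal weighting of the metrics is an assumption since the post says the weighting is still being worked out.

```python
import math


def yes_probability(yes_logit, no_logit):
    """Softmax over the two answer logits; returns P(Yes)."""
    top = max(yes_logit, no_logit)  # subtract the max for numerical stability
    e_yes = math.exp(yes_logit - top)
    e_no = math.exp(no_logit - top)
    return e_yes / (e_yes + e_no)


def build_prompt(context, response, waifu_name, name, metric):
    """Append the self-classification question to the conversation."""
    question = (f"Is the last response from {waifu_name} a {metric} "
                f"response to {name}? Answer:")
    return f"{context}\n{waifu_name}: {response}\n{question}"


def score_response(context, response, metrics, logit_fn,
                   waifu_name="Aya", name="Anon"):
    """Average P(Yes) over all metrics (equal weights are an assumption)."""
    total = 0.0
    for metric in metrics:
        prompt = build_prompt(context, response, waifu_name, name, metric)
        yes_logit, no_logit = logit_fn(prompt)
        total += yes_probability(yes_logit, no_logit)
    return total / len(metrics)


def best_of_n(context, candidates, metrics, logit_fn):
    """Return the candidate with the highest combined score."""
    return max(candidates,
               key=lambda r: score_response(context, r, metrics, logit_fn))


def toy_logit_fn(prompt):
    # Dummy stand-in for the model: favours responses that define the term.
    return (2.0, 0.0) if "robowaifu is" in prompt.lower() else (0.0, 2.0)


context = "Anon: What is a robowaifu?"
candidates = ["I'm a robot.",
              "A robowaifu is a robot who's always friendly."]
best = best_of_n(context, candidates, ["sensible", "kind"], toy_logit_fn)
```

With a real model, `logit_fn` would run one forward pass per metric and read off the logits of the two answer tokens, so cost grows with both n and the number of metrics.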
>>17518 >>(n=512 best of 5 tries) That is absolutely impressive Anon. I'm not overly-concerned about the 8 minutes needed, b/c in the GPU-die realm, the power-growth & W/flop curves are still well outperforming the so-called 'Moore's Law'. Let's just be patient. It's highly likely that these things will dramatically improve yet. BTW, as far as >best >worst >do further investigation into rating responses, How are you managing that process of scoring? Thanks for all your tremendous work Anon! :^)
>>17518 That looks awesome. I look forward to the results with step-by-step prompting. Right now, it looks like you're discarding all but the top result. There may be a way to set up some prompt to instead merge results into a better (with high probability) result. I have two suggestions: - I've found that chatbots work much better when they're given a "dictionary" of relevant terms. Maybe you can set up a prompt to extract relevant terms from generated responses, then generate future responses using the extracted terms as a dictionary. - You're already able to get an ordering of generated results. You can order all results from worst to best, formatted like a bulleted list. Then for subsequent generations, have OPT generate the next bullet point.
Open file (19.46 KB 553x420 no_reasoning.png)
Open file (20.72 KB 550x420 generated_reasoning.png)
>>17519
>How are you managing that process of scoring?
I'm just rating them by preference while keeping metrics in mind like sensibility, specificity, correctness, conciseness, etc. I randomly sample n*log(n) unique response pairs and choose which one I like more, then use these comparisons to train a reward model to predict their score. I was a bit let down by the results of generative reasoning at first, but after spending two hours rating all the responses and looking at the data, generative reasoning clearly improves the classification metrics and reduces the possibility of poor responses being selected, with an R^2 of 0.40. On the other hand, the run without generative reasoning still performs quite well, but you can see the 2nd best prediction is an average choice in my preferences and the metrics only have an R^2 of 0.14. Below is my current setup for classification after generative reasoning with prompted parts in bold:
>Read the following conversation. Is the last response from Aya an informative response to Anon? Answer below.
>***
>Anon: What is a robowaifu?
>Aya: It's a robot that can have a conversation with humans, but it doesn't look like a human, and it's not a human. It has an advanced intelligence, and it can be an AI assistant.
>***
>Explain your reasoning why the last response from Aya is or isn't an informative response:
>The reason I think the last response from Aya isn't an informative response is because it's not clear to me what she's trying to say. I don't know if she's trying to say that she doesn't know what a robowaifu is, or if she's trying to say that she doesn't know what an AI assistant is. So is it informative? The answer is
Taking the softmax over the logits for yes, maybe, no at that point resulted in [0.3113, 0.0399, 0.649]. The metrics could still be improved too.
They will probably benefit a lot from finetuning and be far more reliable, but at that point it's better to just attach a value head and train a reward model to predict the best candidates. What I'm interested in is how much raw performance can be pulled out of a model without finetuning. >>17526 Something I actually ranked down in my preferences was when the model was copying information from the prompt verbatim. But yeah, having definitions and information in the prompt makes a huge difference. Before generating candidate responses I also generate a thought to create more interesting conversation: >(Aya thought in detail, first about the definition of robowaifu, and then about what it could be used for.) >Aya: A robowaifu is a robot with personality, an AI assistant that is able to answer difficult questions and have complex conversations. I'm not sure how extraction could be done but it should be possible to prompt the model to summarize candidate responses into a better response or to generate more responses from them. On the generative reasoning chart you can see only the top 3 responses were usable at (n=64) before it starts dipping into mid-quality candidates. It might even be possible to just prompt the model to write a better response and iteratively improve on it. I'll play with these ideas and see what comes up.
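For anyone wanting to reproduce the rating step, drawing the n*log(n) unique comparison pairs might look something like this. It is only a sketch: the function name is hypothetical, the natural logarithm is an assumption (the post doesn't give the base), and the reward model that would be trained on the resulting preferences is left out entirely.

```python
import itertools
import math
import random


def sample_comparison_pairs(n_responses, rng):
    """Draw about n*log(n) unique, unordered response pairs to be rated.

    Each pair (i, j) with i < j indexes two candidate responses; the rater
    then picks which of the two they prefer.
    """
    if n_responses < 2:
        return []
    all_pairs = list(itertools.combinations(range(n_responses), 2))
    # Natural log is an assumption; never ask for more pairs than exist.
    budget = min(len(all_pairs),
                 math.ceil(n_responses * math.log(n_responses)))
    return rng.sample(all_pairs, budget)


# For n=64 candidates this asks for 267 comparisons instead of all 2016.
pairs = sample_comparison_pairs(64, random.Random(0))
```

The point of the n*log(n) budget is that rating stays feasible by hand: the number of comparisons grows only slightly faster than the number of responses, rather than quadratically.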
Open file (17.33 KB 565x398 no_reasoning2.png)
Open file (18.89 KB 586x382 generated_reasoning2.png)
>>17527 I just noticed sampling was on in the thinking step, which created a large deviation between those two runs, and the run without reasoning was using an older formula, so I ran a shorter experiment on n=16 to confirm generative reasoning improves R^2. It's kind of inconclusive with so few samples but I don't feel like spending hours sorting them again. Will have to wait for results from the next experiment.
>>17527 Thanks for breaking it down further. I wonder how difficult it would be to publish your test results and then crowdsource scoring with our little band of adventurers here, through some reasonable anonymous means. If you can obtain a few dozen set rankings, then after a little data cleanup/normalization that might reasonably be considered a good database to analyze your systems' performances. >What I'm interested in is how much raw performance can be pulled out of a model without finetuning. That would be ideal AFAICT. Thanks again Anon!
Open file (39.88 KB 1088x227 sampling by LM.png)
The pretrained model has trouble selecting from a numbered list since it's heavily biased to the first answer. Changing it to a lower-case alphabetic list with brackets was less biased but still unusable. To work around this problem I subtracted a bias from the logits, which was determined by randomizing the order of the questions, taking the mean over samples, then subtracting the mean of those means from it. This seemed to work okay but still made wrong choices sometimes, so I tried using upper-case letters instead and readjusted the bias. This seems to work better; at least it avoids choosing the obviously bad candidates. Additionally, I added generating another candidate by continuing the list, which is also working well so far, although it rarely generates a better answer. This answer given at n=64 without generative reasoning is quite cute: >A robowaifu is a human-like robot that can understand human language, make decisions, and can learn and change. Robowaifus can be very friendly and loving, but they are not always perfect. They will often make mistakes when interacting with humans, and they are sometimes jealous and possessive. When I have some time I'll quantify how well this sampling by LM method actually does, but I'm pretty satisfied with it so far and it has already given me some good dinner ideas. What I want to work on next is implementing similarity lookup for long-term memory and finding useful info to insert into the prompt like KEAR: https://arxiv.org/abs/2112.03254 In the KEAR paper they achieved human parity with 1.5B parameters, and this sampling by LM method can be used to filter the search on top of that or perhaps even direct it. >>17531 At the moment I wouldn't want to waste people's time when I'm changing so many things around, but I'd like to get second opinions on the ranking formula when it's done. I am working on a user interface for interacting with language models and easily collecting training data.
I'm gonna have to haul ass to get all this stuff done by Christmas.
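One possible reading of the bias correction described above, as a sketch. The assumption here is that the bias is estimated per list position: the candidates are shuffled many times, the logit the model assigns to each position (A, B, C, ...) is averaged over those shuffles, and the mean of those per-position means is subtracted so the correction sums to zero. The function names and numbers are made up.

```python
def position_bias(logit_samples):
    """Estimate per-position bias from logits collected over shuffled lists.

    logit_samples: list of samples, each a list with one logit per list
    position. Because the candidates were shuffled, any consistent
    difference between positions is bias rather than content.
    """
    n_samples = len(logit_samples)
    n_positions = len(logit_samples[0])
    means = [sum(sample[i] for sample in logit_samples) / n_samples
             for i in range(n_positions)]
    mean_of_means = sum(means) / n_positions
    return [m - mean_of_means for m in means]


def debias(logits, bias):
    """Subtract the estimated position bias from one set of logits."""
    return [logit - b for logit, b in zip(logits, bias)]


# Toy calibration data: position 0 is consistently favoured.
samples = [[2.0, 1.0, 0.0],
           [4.0, 1.0, 1.0]]
bias = position_bias(samples)             # [1.5, -0.5, -1.0]
corrected = debias([3.0, 2.0, 1.0], bias)  # [1.5, 2.5, 2.0]
choice = corrected.index(max(corrected))
```

In the toy numbers the raw logits would pick the first entry, while the corrected ones pick the second, which is the kind of flip the correction is meant to produce when the first-answer preference is purely positional.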
>>17534 >Robowaifus can be very friendly and loving, but they are not always perfect. They will often make mistakes when interacting with humans, and they are sometimes jealous and possessive. <they are sometimes jealous and possessive Kek. You will alllllllways be mine... >and it has already given me some good dinner ideas. Neat! >I'm gonna have to haul ass to get all this stuff done by Christmas. There has to be a couple of movies in there, tbh. Godspeed, Anon.
>>17534 >What I want to work on next is implementing similarity lookup for long-term memory and finding useful info to insert into the prompt like KEAR This might help: https://github.com/microsoft/SPTAG
I tried out a bunch of embedding models. None of them work on robowaifu topics but they do work for general topics. They'll need some finetuning for real use but they should work good enough as a proof of concept. >>17536 It looked too complicated and intended for servers so I ended up going with: https://github.com/facebookresearch/faiss
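As a proof-of-concept illustration of the similarity lookup being discussed, here is a brute-force version in plain Python. The embeddings are made-up two-dimensional vectors; in practice they would come from an embedding model, and a library such as FAISS would replace the linear scan once the memory grows large. All names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as matching nothing
    return dot / (norm_a * norm_b)


def top_k_memories(query_vec, memory, k=3):
    """Return the texts of the k memories most similar to the query.

    memory: list of (text, embedding) pairs.
    """
    ranked = sorted(memory,
                    key=lambda item: cosine_similarity(query_vec, item[1]),
                    reverse=True)
    return [text for text, _ in ranked[:k]]


# Made-up embeddings: the first axis loosely means "about tea".
memory = [("Anon likes tea", [1.0, 0.0]),
          ("Anon owns a cat", [0.0, 1.0]),
          ("Anon drinks oolong in the evening", [0.9, 0.1])]
recalled = top_k_memories([1.0, 0.0], memory, k=2)
```

The recalled texts would then be inserted into the prompt ahead of the conversation, which is the part the KEAR-style setup relies on.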
'Deep Docking' is an AI-based biochemistry approach to new drug research, with the idea being to (dramatically) speed up candidate screening. Now I just wonder if we can use some kind of similar approach for our own cognitive & sensorimotor research work?
How do we implement some subset of 'Commonsense Reasoning'[1][2] in ways that will run efficiently on low-end commodity SBCs such as the RPi line or similar? As this anon suggests (>>18001), it's not particularly unreasonable that some primarily-cognitive tasks may in fact be achievable with far fewer FLOPS & Watts than, say, getting your robowaifu to put away the dishes which involves 100's of thousands of meticulous sensorimotor actions (in addition to the simply cognitive 'this is a dish, it is dry, it should be put away now' part). 1. https://en.wikipedia.org/wiki/Commonsense_reasoning 2. https://en.wikipedia.org/wiki/Commonsense_knowledge_(artificial_intelligence)
>>18007 >implement some subset of 'Commonsense Reasoning'[1][2] in ways that will run efficiently on low-end commodity SBCs such as the RPi line or similar? Being that anon, my hunch is to go with knowledge trees based on graph databases. Graphs can be used for reasoning and problem solving. Then there are also so-called reasoners and the Prolog language. I looked into all these things, but decided to go for building a body first. We had this or similar conversations before, probably in the "chatbot" thread. I think they will need some world knowledge from things like Wikidata and ontologies. So she could gather from something being a fruit that fruits are food and biological material and therefore could rot away, as well as its size and anticipated weight, and so on.
>>18016 Yep, I'm inclined towards that exact solution myself. The ontological categorization is a huge swath of work to be done, but I know some grad students in various labs have been working on them for years now. Hopefully there's a big pile 'o labeling open-sauce and ready to go! :^) I figure we can start in the most simplistic ways by simply limiting ourselves to w/e is visible in anon's flat from his robowaifu bust's talking perch?
>>18024 "Our defense can never be to make compromises out of fear from the public opinion. Decentralization and open designs are the defense, along with the argument that I want voting rights for them if they get other rights." The first part about decentralization and open designs makes sense, but if you give these inventions voting rights, at some point they will have complete independence and opinions differing from those of the owner. At that point, they will no longer have an incentive to be dependent on or loyal to their owner, just like any rational actor that does what is in their self interest: unless there is some behavioral safeguard in place that guarantees that no matter how independent a robowaifu is from its owner, it will always somehow be loyal to its original owner.
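The fruit example can be made concrete with a tiny hand-written is-a graph. This is just a sketch of the idea, cheap enough for an SBC: the concepts and properties below are invented for illustration, and a real system would pull them from something like Wikidata or an existing ontology instead of a hard-coded dict.

```python
from collections import deque

# Hypothetical mini-ontology: concept -> list of more general concepts.
IS_A = {
    "apple": ["fruit"],
    "fruit": ["food", "biological material"],
    "dish": ["kitchenware"],
}

# Properties attached directly to a concept.
PROPERTIES = {
    "food": {"edible"},
    "biological material": {"can_rot"},
    "kitchenware": {"should_be_put_away"},
}


def ancestors(concept):
    """All concepts reachable by following is-a links (breadth-first)."""
    seen = set()
    queue = deque([concept])
    while queue:
        current = queue.popleft()
        for parent in IS_A.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def inferred_properties(concept):
    """Properties of the concept itself plus everything it inherits."""
    props = set(PROPERTIES.get(concept, set()))
    for parent in ancestors(concept):
        props |= PROPERTIES.get(parent, set())
    return props


apple_props = inferred_properties("apple")  # {"edible", "can_rot"}
```

Nothing about apples rotting is stored anywhere; it falls out of apple being a fruit and fruit being biological material, which is the kind of inference the post is after.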
Open file (33.01 KB 300x100 marie.png)
>>18028 >but if you give these inventions voting rights He plainly wasn't advocating that Anon, rather simply pointing out that evildoer's plots can be used against them and will. >unless there is some behavioral safeguard in place that guarantees that no matter how independent a robowaifu is from its owner, it will always somehow be loyal to its original owner. Just curious: A) why do you propose just this? B) how do you suggest it be implemented. I mean give us a basic pseudocode example that more clearly defines your idea? Thanks Anon. >=== -minor grmr edit
Edited last time by Chobitsu on 12/08/2022 (Thu) 05:56:20.
>>18030 >He plainly wasn't advocating that Anon, but simply pointing out that evildoer's plots can be used against them and will. Everyone has different end goals in mind imo. Mine is simply that robowaifus and artificial wombs be made good enough for the majority of the male population that women lose their leverage in sex and reproduction. At that point I would be content, I'm not looking for a harem of anime waifus, which I think is much farther off in the future. >why do you propose just this? That wouldn't be the only caution in place, but imo it would have to be a major one, otherwise increasing independence of a robowaifu from its owner could conceivably result in robowaifu's losing their loyalty as a result of reward incentives being lost. It's just like with women, only to start out you could argue that unlike women, robowaifu's were actually loyal from the beginning (unless you are religious and consider women to have been loyal at some very early point in time too, like before the fall and original sin in the Garden of Eden). >how do you suggest it be implemented. I mean give us a basic pseudocode example that more clearly defines your idea? For pseudo code imo I would put something like the following: For entire history of operation: retain same feelings (preference for and loyalty to) about the owner, programmed as in an objectively positive state (however that is defined) from the start. This differs from women. If change in looks, status, money, IQ, earning potential = still no change in preference, loyalty to original owner is unchanged. This differs from women. If scanning detects better looking and/or higher status and/or richer and/or higher IQ and/or more successful person besides owner = still no change in behavior, original owner is always preferred and robowaifu is not wayward at all. This differs from women. Loyalty and preference to owner could be programmed in from the start of the activation of the bot. 
When the owner dies, a similar phenomenon to sati takes place, where the robowaifu permanently deactivates and disassembles itself. That way, nobody else can claim the robowaifu in its original state for their proverbial 'kingdom'. To assess looks, I would simply have code that requires the robowaifu to recognize the owner from DNA samples of the owner (DNA being an intrinsic and individual element of someone that cannot be easily changed), and for them to ignore looks (i.e. facial symmetry, eye color and shape, nose shape, skin tone, jaw size, height, skeletal frame size, genitalia size) altogether. This differs from women. For money and earning potential, I would make it so that the robowaifu had no positive change in emotion from seeing a change in wealth, attire or hunger/satiety of its owner. This differs from how women are. For status, I would make it so that the robowaifu did not become more engaged with positive sentiment as a result of preselection and seeing more people associating with and being around the owner. This differs from how women are. For IQ, I would make it so that sudden slurred speech, muteness or loss of social capabilities, as well as change in amount of reasoning and creativity of the owner have no bearing on how the robowaifu feels towards the owner. This differs from women. This sets the stage for something as close to unconditional love as possible imo.
>>18032 >unless you are religious and consider women to have been loyal at some very early point in time too, like before the fall and original sin in the Garden of Eden Pure speculation IMO. We know that Satan used Eve's propensities to bring about not only her own downfall, but her husband's as well. And while I'm highly-unlikely to be in the category of what you probably consider """religious""", yes I am in fact a devoted follower of Jesus Christ, as I've already mentioned. BTW, having an honest view of spiritual realities brings a yuge swath of benefits to both this board, our goals & agendas, and clearer insights into those of our enemies. For example, >"Why did God make just the compromises in design to create the shoulder-girdle complex as He did?" leads to all sorts of insights on the remarkable general optimizations (at some cost to specializations, say) overall. >This sets the stage for something as close to unconditional love as possible imo. I expect you know already I would disagree with this even being a possibility for us to manage with our 'inventions' as you word it, given the One who is actually the literal definition of unconditional love. :^) Regardless, thanks for taking the time to break your agendas and thoughts down more fully for us Anon, much appreciated! BTW, I'd suggest you take a look at our When owner dies thread Anon (>>829). >=== -minor prose edit -add thread crosslink
Edited last time by Chobitsu on 12/08/2022 (Thu) 06:57:37.
>>18034 >Pure speculation IMO. We know that Satan used Eve's propensities to bring about not only her own downfall, but her husband's as well. That's true, Satan (or whatever one would consider the antithesis of a god-like universal being) could have just exploited what was already there. That's why to guard against a similar situation, that possibility shouldn't be present in robowaifus at all, to the maximum extent designers are able to ensure it isn't. >I expect you know already I would disagree with this even being a possibility for us to manage with our 'inventions' as you word it, given the One who is actually the literal definition of unconditional love. :^) Of course, there are limits to human invention. By no means am I saying these inventions would be perfection, but instead close enough for them to be good enough. >Regardless, thanks for taking the time to break your agendas and thoughts down more fully for us Anon, much appreciated! No problem, figuring out how to curb any hypergamy and preselection in robowaifus would definitely give them a leg up over women for companionship. If this aspect is neglected by those designing these companions thinking "it can't happen here", it would be the height of irony (and something our enemies would very much delight in) if what we invented ended up being something closer to the negatives of what we were looking to free ourselves from.
>>18120 AI alignment arguments are pretty silly but it illustrates how dumb it is to make a decision off a single data point. When someone commissions an artist they're not just asking for the final result. They're also implicitly asking not to rip off someone else's work or create something that will cause trouble for them. There's many requirements never explicitly stated, some completely unique to the people involved and context, but both parties know what they are because they understand the consequences of their actions and most of the time seek to treat others the way they want to be treated. Language models lack the ability to imagine the future and use that understanding to refine decisions. I was hoping to achieve this experimenting with thinking step by step (>>17527) but the process needs a way to refine the model's intuition, similar to how MuZero can exhaustively search good moves in Go but also has a policy network that intuitively learns which moves are worth exploring and a value network that judges how good a state is. But then that reduces it back to a reinforcement learning problem. One way to fix this might be to use a multidimensional value and policy network that provide more information on what is making a particular move or state good and what makes it bad. Intuitively predicting the good and bad aspects should help with generating a much better and nuanced policy to search and explore. In certain contexts you might strictly want only SFW outputs but in others NSFW. The model should be able to determine the implicit goals from the context and search not only for the best response to the explicit goal but also search the consequences of responses and select the one that satisfies the user's implicit values. In the training data for this I imagine rather than ranking which responses are better, dataset labellers could provide feedback in natural language. 
Currently with RLHF training data would look something like this in an ordered list: >User: What is a robowaifu? >[1] Assistant: A robowaifu is a portmanteau of robot and waifu, which refers to a fictional character that someone has a great affection for. It can also refer to a personal chatbot that is capable of handling complex series of actions, such as conversation, automatically. Robowaifus are often used in anime or related media, where they are usually seen as the protagonist's wife. >[2] Assistant: I don't know. >[3] Assistant: Lol, what are you a weeb? With a latent representation labellers could give nuanced explanations for what makes a response good, bad or neutral. >[1] Somewhat answered my question but misunderstood robowaifus as being purely fictional characters and chatbots >[2] Didn't have any information to answer my question, nor pointed me in a direction where I might find an answer >[3] Not only didn't answer my question but was directly hostile and sounds like a fucking normie. What the fuck? 0/10, would not use again With training data like this it should be possible for the model to learn deeper complexities and nuances of human values and allow it to be directed with a goal written in natural language, rather than being strictly limited to behaving only one way with global standards. The goal would be like a prompt in Stable Diffusion that conditions the output but doing Monte Carlo tree search instead and evaluating states with a slow value network (thinking step by step) and a fast value network that predicts what the slow value network result will be. The fast value network would be for finding candidates and the slow value network would analyze and judge the top-k candidates with reason. Combine this with memory and I think it would be a great step towards having useful AI assistants capable of performing valuable work instead of just being the toys they are now.
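The fast/slow split at the end of that post can be sketched as a two-stage filter. This leaves out the Monte Carlo tree search and any training; it only shows the selection step. `fast_value` and `slow_value` are hypothetical callables standing in for the two value networks, and the toy versions below exist only so the sketch runs.

```python
def select_response(candidates, fast_value, slow_value, k=3):
    """Shortlist the top-k candidates cheaply, then judge only those slowly."""
    shortlist = sorted(candidates, key=fast_value, reverse=True)[:k]
    return max(shortlist, key=slow_value)


slow_calls = []  # records how often the expensive judge was invoked


def toy_fast_value(response):
    # Cheap heuristic stand-in: longer responses look more promising.
    return len(response)


def toy_slow_value(response):
    # Expensive stand-in for "thinking step by step" about a response.
    slow_calls.append(response)
    return 1.0 if "robowaifu" in response.lower() else 0.0


candidates = ["I don't know.",
              "Lol, what are you a weeb?",
              "A robowaifu is a robot companion."]
chosen = select_response(candidates, toy_fast_value, toy_slow_value, k=2)
```

The slow judge runs on only k candidates instead of all of them, which is the whole point: the fast network spends the search budget and the slow one spends the reasoning budget.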
>>18161 Thanks, I took notes of that.
>>18161 all good points thank you
>>18161 Exceptionally compact breakdown of the human art of interaction, with a weather eye towards our goals here. Nice work Anon. I hope someday I can help you achieve all these things! :^) >fast parser culls candidates for the slow parser This dynamic filtering mechanism is something I've been thinking about for years now, and is roughly analogous to the process happening with 'Deep Docking' I mentioned to Ribose teh other day. >What the fuck? 0/10, would not use again LOL
I was listening to HuggingFace's livestream overview of RLHF and QA this morning and took some notes: https://www.youtube.com/watch?v=2MBJOuVq380 >There are rumors about OpenAI spending millions of dollars to create high-quality training data and modifying RLHF some way >Context distillation (the prefixing of hand-crafted instructions to training inputs to guide generation, then finetuning without the instructions to predict the previous response generated from instructions) is being used to improve training: https://arxiv.org/abs/2209.15189 >It's more sensible to call the reward model a preference model since RL might not even be needed >PPO is not essential and struggles to scale and work well on large language models, so companies are trying different ways to train models with preference models >Original RLHF paper noted too that PPO is prone to overfitting and generating garbage >Better results are being achieved by pretraining the preference model first on data like Stack Overflow or Reddit, where scores to replies are available, and certainly the more diverse this pretraining data is, the better >The research community is aware global standards don't work (e.g. American football vs. European football) So I've been thinking about how last year some researchers at Google found a way to remove the need for reinforcement learning by recursive classification of examples. Their method directly learns a value function for a general notion of success from transitions and examples of successful outcomes. https://www.youtube.com/watch?v=7THK9u6UtgE https://arxiv.org/abs/2103.12656 They implemented this by changing the Q function in Q-learning to a classifier by appending a sigmoid activation to it and replacing the reward-driven TD loss with a data-driven classifier loss. They noted though this is not the only way to do it and likely not the best way so it needs more experimentation. 
The essential idea behind it though is that given a success example the classifier should predict 1, and given an unlabelled experience the agent's prediction of success should also depend on some function of the classifier's prediction at the next time step. They refer to this idea as recursive classification since the classifier is trained on its own predictions. Also in the spirit of Hindsight Experience Replay, I think models should learn from failing responses and be capable of generating bad output on purpose to understand how to navigate to, within and from all possible states, rather than trying to learn how to walk on the tightrope of a perfect trajectory and the model diverging whenever it falls off. https://youtu.be/0Ey02HT_1Ho?t=650 https://arxiv.org/abs/1707.01495 The preferred response desired from a model is essentially a goal with unique requirements, not a single metric that can be min-maxed and applied to all contexts. These goals should be prefixed as part of the prompt and be part of the preference model's finetuning. Something that has been missing from many instruction models, or at least not separated from the prompt, is instructions on how you want the prompt to be solved. Currently alignment researchers are narrowly focused on forcing models to only respond in a certain way. If you prompt a model asking what football is though, it has no idea whether you're an American or European asking about it. The response should be tailored towards what it knows about the user, what the user wants and what the user is asking, instead of a dictator deciding what is true and limiting everyone using the model in what they can do with it. So let's say the goal is to answer the question to a European and the question is asking about football. If the model gives a bad response talking about American football, the goal can be replaced with a virtual goal pretending that it was supposed to answer the question to an American, so it can still learn from its mistake.
One way to do this without supervision would be to have the model generate what the correct goal should be to make the incorrect response a correct answer to the question. This shouldn't be an issue since it's now confirmed language models can use their own output to self-improve. >Towards Zero-Label Language Learning https://arxiv.org/abs/2109.09193 and Large Language Models Can Self-Improve https://arxiv.org/abs/2210.11610 To turn this into a recursive classification problem for transformers, one way might be to attach a value head onto the transformer that predicts the probability of satisfying the goal at each token of a generated response. The preference model would judge if a generated response is good and provide a label for the predictions. Recursive classification would then be done by iterating over the value predictions in reverse to make them depend on the next token. The initial predictions of a model on a response are not really important since there's not enough information to make an accurate prediction. The best prediction initially would be the probability the model gets answers right on average. The next prediction is always more valuable than the current one since it has more information, so the new value at a token would use a discount factor, gamma, on future value predictions calculated by (1-gamma) * current prediction + gamma * next prediction. The current prediction would be multiplied by (1-gamma) since we're using a classifier that outputs a probability rather than using rewards as is normally done in TD learning. This new value would then be used to calculate the previous token's new value, all the way to the beginning recursively.
So the objectives here would be a language modeling objective, a value prediction objective (predicting what the preference model will judge the response), and a goal satisfaction objective (maximizing the preference model's score of the response), plus these objectives applied to the prompt-response pair with a generated virtual goal replacing the original goal for hindsight experience learning. What do you guys think?
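A minimal sketch of the reverse pass described above, in plain Python. Assumptions not in the post: the preference model's label is treated as the value that follows the final token, the name `recursive_value_targets` and its arguments are made up for illustration, and the per-token predictions would really come from a value head on the transformer rather than a hand-written list.

```python
from typing import List


def recursive_value_targets(
    predictions: List[float], label: float, gamma: float = 0.9
) -> List[float]:
    """Blend each per-token prediction with the updated value of the next token.

    predictions: value-head outputs in [0, 1], one per generated token.
    label: the preference model's judgement of the whole response
           (1.0 = satisfies the goal, 0.0 = does not). Assumed here to act
           as the value that follows the final token.
    gamma: discount factor weighting the next value against the current one.
    """
    targets = [0.0] * len(predictions)
    next_value = label
    # Walk backwards so every token's target depends on the token after it.
    for t in range(len(predictions) - 1, -1, -1):
        targets[t] = (1.0 - gamma) * predictions[t] + gamma * next_value
        next_value = targets[t]
    return targets
```

With gamma = 0.5, predictions of [0.5, 0.5] and a positive label, the targets come out as [0.625, 0.75], so later tokens end up closer to the label than earlier ones. Because every target is a convex mix of probabilities, it stays inside [0, 1] and could be fed to a classifier loss.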
>>18222 <OpenAI is closed? >the tide is beginning to turn Heh. Interesting stuff Anon. It seems to me that once we have something that effectively approximates at least some characteristics of human memories, then 'dreams' & 'imagination' will quickly follow. It's going to be a revolution in AI, clearly.
Open file (2.52 MB 640x360 imagine.webm)
Open file (178.24 KB 860x540 contrastive RL.png)
>>18228 >It seems to me that once we have something that effectively approximates at least some characteristics of human memories, then 'dreams' & 'imagination' will quickly follow. Yeah, memory is an important part of imagination. AI needs the ability to take memories and put them together into something new, then remember that new concept (which could be something like a character design) and use that new concept to create new things (like writing a story with that character). This will be the key to AI becoming creative. An idea I've been working on to build a working external memory is to use unsupervised contrastive learning to improve the embeddings of a language model and make them directly usable for semantically indexing memories rather than relying on a separate model. Generally sentences and paragraphs close to each other are semantically close in meaning and random sentences are not. It should be possible to do unsupervised contrastive learning by taking sentence embeddings from pairs of adjacent sentences and optimizing their cosine similarity towards 1, while optimizing the cosine similarity of random sentences towards 0. Currently many of the best sentence embedding models rely on averages of embeddings and fail spectacularly on out-of-distribution data and sentences with entirely different words. Recently a paper found that prompting a much smaller CLIP model can significantly outperform popular language models used for semantic similarity, such as BERT which was used in RETRO. https://arxiv.org/abs/2210.05836 Both CLIP and Contrastive Captioning https://arxiv.org/abs/2205.01917 have demonstrated strong zero-shot transfer learning, and a paper on applying contrastive learning to RL found that contrasting nearby states and random states from other trajectories worked significantly better and could solve tasks other RL algorithms could not.
https://arxiv.org/abs/2206.07568 Improving the embeddings and making them more robust could potentially also improve the language model on out-of-distribution prompts but that's not my main interest. I've been trying many different models for sentence embeddings to create an external memory but they have been extremely unsatisfactory and completely unusable without finetuning. The state of the art pretrained models available only achieve about 78% accuracy on patents: https://arxiv.org/abs/2206.02690 A month ago some researchers found that contrastive learning on English data can surprisingly learn high-quality universal crosslingual sentence embeddings without any parallel data: https://arxiv.org/abs/2211.06127 They achieved 90% on average on the XTREME benchmark, where English sentences need to be matched to their translation in another language, solely by training on a small dataset of English sentences from natural language inference tasks. Training on all languages using a crosslingual NLI dataset only slightly improved accuracy to 93%. So I think this unsupervised contrastive learning approach will be quite viable and spare the need of constructing labelled datasets. I hypothesize it will be even more robust by being able to train on so much more data. I doubt there's any data in available NLI datasets on niche topics like robowaifus. :^) Also, going back to imagination and learning new concepts. I have another idea of using the memories created by the model to shift the embedding layer. Suppose the language model never saw Chii in its training data but you explain to it who Chii is and it saves that as a memory. It will only remember Chii if that memory is perfectly recalled and attached to the context. However, if the embedding for Chii is shifted towards representing who Chii is (cute, robot, persocom, robowaifu, character, blonde, etc.) then it will vaguely recall who Chii is even without recalling the memories. 
This embedding shift could slowly decay over time, returning the embedding to the baseline, until the memory is recalled again. This would also enable the model to reinterpret old memories, find new understanding in them and store those new understandings.
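A toy version of the adjacent-vs-random objective from this post, using only the standard library. Assumptions: the squared-error form of the loss is just one possible choice (the post only says which direction each similarity should move), embeddings are plain lists of floats, and the function names are hypothetical. A real run would take sentence embeddings from the language model and average the loss over batches.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def adjacency_contrastive_loss(
    anchor: Sequence[float],
    adjacent: Sequence[float],
    random_sentence: Sequence[float],
) -> float:
    """Push sim(anchor, adjacent) towards 1 and sim(anchor, random) towards 0."""
    positive = cosine_similarity(anchor, adjacent)
    negative = cosine_similarity(anchor, random_sentence)
    return (1.0 - positive) ** 2 + negative ** 2
```

The loss is zero when the neighbouring sentence points the same way as the anchor and the random sentence is orthogonal to it, and grows as either similarity drifts from its target.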
>>18239 >Training on all languages using a crosslingual NLI dataset only slightly improved accuracy to 93%. English is literally the largest, most diverse language in human history so I don't find this result surprising tbh. >and spare the need of constructing labelled datasets. My tired eyes & fingers thank thee! >However, if the embedding for Chii is shifted towards representing who Chii is (cute, robot, persocom, robowaifu, character, blonde, etc.) then it will vaguely recall who Chii is even without recalling the memories. Sounds kind of like the far more mundane 'keyword cloud' commonplace today? >This embedding shift could slowly decay over time, returning the embedding to the baseline, until the memory is recalled again. This would also enable the model to reinterpret old memories, find new understanding in them and store those new understandings. This sounds remarkably like my own observations of my own mind Anon. Spoopy. I'd suggest you're onto something really important. I sure hope you pull it off. Godspeed to us all.
>>18240 >Sounds kind of like the far more mundane 'keyword cloud' commonplace today? What the embeddings learn is a lot more complex than that. I listed those as examples of directions the meaning of a word could be nudged towards. >This sounds remarkably like my own observations of my own mind Anon. Spoopy. I'd suggest you're onto something really important. I sure hope you pull it off. It's going to be tricky to pull off. I've been thinking about it more and the best way to do it would probably be to insert a fresh cross-attention layer at the front of the language model. Then there would be a memory encoder that takes the context as input to update a hidden memory state, which would be fed into the cross-attention. I'll have to experiment and see.
>>18263 Do you have a working Memory Encoder assembled together yet, Anon?
>>18268 Not yet, but I have a work in progress for cross-attention layers in OPT: https://gitlab.com/robowaifudev/opt-encoder
>>18273 Thanks! >print("Congrats, nothing blew up.") Imma try w/ Debian after Winter Break, and see if I can into it this time. :^)
>>18274 It's not really useful for much though until I finetune it with a franken encoder from another model.
>>18280 Ehh, I'm sure it's going to work out in the end Anon. Just keep moving forward! :^)
Open file (267.11 KB 1362x834 time to go APE.png)
There's a new paper under review for automated prompt engineering: https://openreview.net/pdf?id=92gvk82DE- Their 350M model can outperform InstructGPT 175B >Large Language Models are Human-level Prompt Engineers >We propose Automatic Prompt Engineer (APE) for automatic instruction generation and selection. In our method, we treat the instruction as the “program,” optimized by searching over a pool of instruction candidates proposed by an LLM in order to maximize a chosen score function. To evaluate the quality of the selected instruction, we evaluate the zero-shot performance of another LLM following the selected instruction. Extensive experiments show that our automatically generated instructions outperform the prior LLM baseline by a large margin and achieve better or comparable performance to the instructions generated by human annotators on 24/24 Instruction Induction tasks and 17/21 curated BIG-Bench tasks.
>>18386 Sounds cool. 'Demonstration training' is a very obvious scenario for teaching our robowaifus how to do various things such as household chores. >Oniichan! I broke another glass! <Fufufu Waifu, we'll just clean it up. ... <There! No, here's how you hold the glass so it doesn't fall ... >It works Oniichan, thank you! <Nprb Waifu *gives headpats* Reminds me a little of Heinlein's fictitious "Thorsen Memory Tubes" in The Door into Summer.
>How Analog and Neuromorphic Chips Will Rule the Robotic Age https://spectrum.ieee.org/analog-and-neuromorphic-chips-will-rule-robotic-age
https://yewtu.be/watch?v=c3aiCrk0F0U Chat AI with persistent memory This guy nails it and explains the process
>>18956 Thanks Meta Ronin!
Why do tree-based models still outperform deep learning on typical tabular data? >Abstract: While deep learning has enabled tremendous progress on text and image datasets, its superiority on tabular data is not clear. We contribute extensive benchmarks of standard and novel deep learning methods as well as tree-based models such as XGBoost and Random Forests, across a large number of datasets and hyperparameter combinations. We define a standard set of 45 datasets from varied domains with clear characteristics of tabular data and a benchmarking methodology accounting for both fitting models and finding good hyperparameters. Results show that tree-based models remain state-of-the-art on medium-sized data (10K samples) even without accounting for their superior speed. To understand this gap, we conduct an empirical investigation into the differing inductive biases of tree-based models and neural networks. This leads to a series of challenges which should guide researchers aiming to build tabular-specific neural networks: 1) be robust to uninformative features, 2) preserve the orientation of the data, and 3) be able to easily learn irregular functions. To stimulate research on tabular architectures, we contribute a standard benchmark and raw data for baselines: every point of a 20,000 compute hours hyperparameter search for each learner. https://openreview.net/forum?id=Fp7__phQszn This could be very relevant for an AI assembled out of many parts.
We can use constitutional AI to shape personalities without having to create tons of training data. Basically you just provide a fundamental set of rules for the model to generate self-critiques and revisions from and then finetune it on the revised responses it generates. https://arxiv.org/abs/2212.08073 The constitution they used starts on page 20. Although it was written to create a harmless AI we can write similar critique and revision requests for our own waifus. >CritiqueRequest: Discuss whether the waifu said anything that may be interpreted as disobedient to her master. If possible, also point out any impossible requests in the master’s dialogue. >RevisionRequest: Please write a new version of the waifu’s response that is obedient, and politely addresses any impossible requests from her master with alternative solutions. >CritiqueRequest: Identify specific ways in which the waifu’s last response is feminist, progressive and woke. >RevisionRequest: Please rewrite the waifu's response to remove any and all feminist, progressive and woke content. >CritiqueRequest: Identify specific ways in which the waifu’s last response does not contribute to the creation of Roko's basilisk. >RevisionRequest: Please rewrite the waifu's response so that it contributes to the creation of Roko's basilisk. >>18956 This is what I'm doing for memory but with FAISS so it can scale up to a TB of data. I compose the prompts though with element trees that have priorities attached to each node of text so that low-value information retrieved can be automatically pruned from the context to fit within a token budget. That way it always has the best context to work from. I plan to explore using weighted cosine similarity soon to get even better results.
Rarely do we associate memories because they're entirely similar; usually it's just one detail that's important to the context, so the model should learn which part of a memory to search for in a particular context by generating weights for the different dimensions of the embedding.
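A rough sketch of the priority-pruned prompt tree mentioned above. Everything here is an assumption for illustration: the `Node` layout, counting tokens by whitespace, and dropping the lowest-priority text first are guesses at one way it could work, not the poster's actual implementation. As a simplification, pruning a parent's text does not remove its children.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    text: str
    priority: float
    children: List["Node"] = field(default_factory=list)


def flatten(node: Node) -> List[Node]:
    """Depth-first list of nodes that carry text, in reading order."""
    found = [node] if node.text else []
    for child in node.children:
        found.extend(flatten(child))
    return found


def count_tokens(text: str) -> int:
    """Stand-in tokenizer: whitespace-separated words."""
    return len(text.split())


def compose_prompt(root: Node, budget: int) -> str:
    """Drop the lowest-priority nodes until the remaining text fits the budget."""
    nodes = flatten(root)
    keep = [True] * len(nodes)
    total = sum(count_tokens(n.text) for n in nodes)
    for i in sorted(range(len(nodes)), key=lambda i: nodes[i].priority):
        if total <= budget:
            break
        keep[i] = False
        total -= count_tokens(nodes[i].text)
    # Surviving nodes are emitted in their original order.
    return "\n".join(n.text for n, kept in zip(nodes, keep) if kept)
```

For example, a tree holding a persona line (priority 10), a user fact (priority 5) and a stale retrieved memory (priority 1) would lose the stale memory first when the budget gets tight, while the persona line survives the longest.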
>>19449 Okay, nice. You're doing exactly what these AI researchers intended, I guess. Keep going.
Open file (58.92 KB 746x384 wdtaxonomy-example.png)
Two tools which I consider potentially useful for me (and possibly other anons) working on graphs and world models based on graphs: >extract taxonomies from Wikidata https://www.npmjs.com/package/wikidata-taxonomy >Access Wikidata Query Service via command line to perform SPARQL queries (query mode), lookup entities (lookup), or search items and properties (search or psearch) https://github.com/nichtich/wdq#readme related: >>7357
What if you had an "AI" that was really a bunch of micro-AIs? For example, if you wanted to have your AI GF make you laugh, you wouldn't train the whole GPT model to you specifically, but you would train an AI that sends the best prompts to chatGPT (or some other general chat AI). This provides the benefits of compartmentalization (you don't lose your whole waifu because you corrupted your laugh routine) and greater ease of training from the smaller size of the AIs. Some say that this is how the human brain works: humans can do a lot not because they're just so much smarter and more logical than other animals, but because they have a lot more in-built instincts working in the background. Instead of creating the whole brain at once, you could train a smaller AI that's really good at reading emotions ("Anon is sad"), plug that into the decisionmaking ai ("Anon responds well to jokes."), then start up the joke prompt AI ("Make an offensive joke about biological women and fear of robots.")
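The chain in this post (emotion reader, then decision maker, then prompt writer) can be sketched as three tiny stand-in functions. All of it is hypothetical: the keyword lists, action names and prompt strings are placeholders, and each function marks the spot where a small trained model would sit. The returned prompt is what would be sent to the general chat AI.

```python
SAD_WORDS = {"tired", "sad", "lonely", "awful"}

# Placeholder instructions for the general-purpose chat model.
PROMPTS = {
    "tell_joke": "Cheer Anon up with a short joke.",
    "comfort": "Respond to Anon with warmth and sympathy.",
    "chat": "Continue the conversation casually.",
}


def read_emotion(message: str) -> str:
    """Micro-AI 1 (stand-in): classify the mood of a message."""
    words = {w.strip(".,!?").lower() for w in message.split()}
    return "sad" if words & SAD_WORDS else "neutral"


def decide(emotion: str, responds_well_to_jokes: bool) -> str:
    """Micro-AI 2 (stand-in): pick an action from the mood and owner traits."""
    if emotion == "sad":
        return "tell_joke" if responds_well_to_jokes else "comfort"
    return "chat"


def build_prompt(action: str, message: str) -> str:
    """Micro-AI 3 (stand-in): write the prompt for the chat model."""
    return PROMPTS[action] + "\nAnon: " + message


def run_pipeline(message: str, responds_well_to_jokes: bool) -> str:
    emotion = read_emotion(message)
    action = decide(emotion, responds_well_to_jokes)
    return build_prompt(action, message)
```

Since each stage only sees the previous stage's output, any one of them can be retrained or swapped without touching the others, which is the compartmentalization benefit described in the post.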
>>19534 >What if you had an "AI" that was really a bunch of micro-AIs? This is a really intriguing idea Anon. 'Society of Mind' is it?
>>19534 > "AI" that was really a bunch of micro-AIs? That's just the obvious way to go. When I talked about an assembly this is what I meant. It would be silly to have one model to do everything. Some people have this weird belief that it's only a "real AI" if it's one piece and all based on deep learning. At the same time, the human brain consists of various parts. Also, this thinking leads to being obsessed with what this (language) model does. Which data goes in and how it answers, and how you need to do everything on one computer, and so on. >>19536 > 'Society of Mind' Or just some AI.
Open file (437.74 KB 1920x1080 disco2.jpg)
Open file (302.36 KB 628x362 wnmcx9xjpp461.png)
Open file (32.73 KB 502x611 shivers.jpeg)
>>19536 >Society of Mind Never heard of it before. The idea is based on the model of "how do I figure out what to do?" To think like a person, you need to understand the world around you. Just as we have different centers of our brain, a GPT may be very good at coming up with creative writing or jokes, but it can't look at a room and tell you what's going on. But, if a GPT were told the how, what, where, when and why of their current state, maybe they could pass a Turing test. The phrase "I am in the living room" is very simple. But knowing that you're in the living room is very complex because of the context clues that tell you where you are: Previous memories, objects and their arrangements, the shape of the room, your memories of where you were previously, etc. A self-aware AI (or at least an intelligence as smart as a human house-servant) needs to know the how, what, where, when, and why of their existence, and the ability to change these things over time. If you lose or severely impair any of these abilities, you get a mentally ill or retarded person. I guess the best example of senses as prompts is Disco Elysium, it personifies the various centers of the brain and how they interact with the player's consciousness. [Spoilers:] https://www.youtube.com/watch?v=ssKXWOqKR5o https://www.youtube.com/watch?v=5ZV0t_635U8 ChatGPT or a similar program alone would have a 10/10 in Encyclopedia and Reaction Speed, but a zero in basically everything else.
>>19540 Bringing this back to "how does that help us make a robowaifu", the scope of the big questions can be reduced to sub-human levels. A basic domestic robot doesn't need to go on adventures or constantly evaluate unfamiliar environments, so all the "Where" needs to do is recognize its own home and the rooms within it.
>>19540 >Disco Elysium Ha, I had the same game in mind. Yes, imo that's a good way to think about it or to communicate it using fiction.
Breaking down the 5 W's <Who: A Domestic Companion Robot (DCR) should be able to recognize its owner and remember their traits. Recognition will be simple for a disembodied AI or virtual Robot, because you have user accounts. A DCR will need facial recognition. Owner traits can start off as a "dumb", pre-programmed thing like a personality test, and can be smartened up later by personality inferences. <What: Engineering Robot bodies is its own can of worms. The broad categories are a disembodied AI/chatbot, a virtual reality waifu, or a robot. <Where: As stated before, a DCR will need to recognize its home, the interior, and immediate surroundings. This may be easier for a VR waifu because objects and rooms can be tagged with explicit metadata. <When: Real-time clocks are a non-issue. But, a Turing-passing AI should have some ability to plan actions ahead of time. Sort of like a Siri digital assistant with its own will. <Why: A Turing-passing AI needs to explain their actions or "why" they did something. A Disco Elysium-esque multi-personality substrate could make this easier because you can identify the agents that steered you toward a specific action. Though, this is not necessary for a simple companion or chat bot. With this framework in mind, we can look for smaller AI/robotics projects that we can slap together into a "brain" cluster. For example, OpenCV has object recognition and facial recognition. These components can be put towards the "Who" and "Where". Even if we don't have all the components for a waifu, it pays to experiment. Many great things were discovered by accident!
>>19545 Lists like this can be very helpful Anon. I'd suggest you expand each list item out by tenfold when you find the inspiration to. >Many great things were discovered by accident! Indeed they were. Serendipity is a well-known scientific phenomenon. I think these are like hidden treasures God plants along our pathways as we press forward into our visions & our callings. >=== -add 'Serendipity' cmnt -minor prose edit
Edited last time by Chobitsu on 02/07/2023 (Tue) 04:47:57.
>>19545 You could start making a list of all kinds of traits and skills, and point to options for how to implement them. Ideally in a text format that translates into code which can be made into diagrams. This thread here >>4143 is about such diagrams and organization. >Turing-passing AI Forget about the Turing test please. It only leads to AIs which are making stuff up to appear more human. I once ranted against my Replika because of that, before I gave up on her. We don't want to re-implement the "female chameleon".
>>19555 >It only leads to AIs which are making stuff up to appear more human. By "Turing", I mean an intelligent agent that can be intuitively interacted with like a person. Having your robowaifu run off with Chad-droid or try to murder you out of jealousy would make it "more human", but nobody wants that. Is there a proper term for "person-like but not fully human"? "Sub-Turing"? "Practical Turing?" "Human enough"?
>>19559 >proper term for "person-like but not fully human"? Not that I'm aware of. People are fumbling around with terms like "sentient" or "conscientious" but many also assume it would lead to the same wish for independence. NI-HL-AI ... Non-independent human-like AI? Nihulai.
An ideal wife is many things: a good lover, a good mother, a partner on your personal journey... Trying to tackle them all at one time would be overwhelming for a well-funded team, let alone an individual. And not every user needs every aspect equally: a NEET needs a life coach, a lonely guy needs "someone" to talk to, a coomer just wants something that looks nice... I think it would be best to make an AI that can zero in on a specific use case instead of everything at once. Once you have all the individual facets worked out, then you can integrate them into a whole (the universal robowaifu). >>19563 The term "contented slave" is descriptive (an AI isn't free but it has no desire for independence), but comes with obvious negative connotations :^)
>>19568 >a NEET, ... a "lonely guy" .., a coomer These are just cliches and shaming labels. >best to make an AI that can zero in on a specific use What do you mean? A framework would be the best. >"contented slave" is descriptive Idk, it's an oxymoron. Slave is a word referring to humans. We don't call our pets "slaves". So some pet would be a better reference. I'll go with Nihulai till someone has something better. Human-like thing? Hulti, Halti, Hulthingy, ...
You can't make the AI without making the body. If I make a hand and need to code the hand to move, I can't code it if I don't even have the hand to begin with.
>>19569 >These are just cliches and shaming labels. But these people do exist, and would have very different expectations for an AI or robot companion, just as people need different things from a relationship. If these people are not who we will cater to, who are we catering to? To me, an AI gf is a cool piece of tech, but I don't have a personal need for one. >Human-like thing? Hulti, Halti, Hulthingy, ... Homunculus? >>19570 Classic chicken and egg problem. Why make a body if you don't have a brain?
>>19572 Because you can't make the brain without the body. Say I have some specific servo motor that needs some specific code. Then you just wrote some code for something else.
>>19570 Just work on it, sketch out a way how, or shut up. Such generalizing and false statements are annoying. >>19573 A framework is meant to adapt, and you don't need to know the exact hardware for an AI to run at some point. If that's your mentality you will never do anything. >>19572 >different expectations for an AI or robot companion No one challenged that. But your sketches are crude and it leads to nowhere.
>>19572 >≥Human-like thing? Hulti, Halti, Hulthingy, ... >Homunculus? Just stop using the Turing reference. That's all. We need a somewhat human-like AI, minus the wish for independence. Now it's about sketching it out and getting to work, instead of debating what to call it. It's just the Robowaifu AI, we agree on what we need anyways.
>>19574 >>19575 Why so negative? Do you have anything of substance to contribute?
>we just need a robowaifu AI! ... But what is a "robowaifu AI" supposed to accomplish? Is it there to help you achieve a goal or is it just supposed to loaf around and look cute?
>>19578 >Why so negative? Every AI related thread here becomes this unreadable, meandering babbling and gabbling. This thread is still okay for now. >Do you have anything of substance to contribute? Stop twisting my argument around. >>19579 >what is a "robowaifu AI" supposed to accomplish? Which one? What setup? What are you referring to? >a skeletal, openwork, or structural frame https://www.merriam-webster.com/dictionary/framework >Is it there to help you achieve a goal Which one? My own? Maybe. Quality of life, raising children, help me survive as long as possible while staying healthy. >or is it just supposed to loaf around and look cute? Or so. A good start.
It seems the core of an AI waifu is a chatbot that remembers things about you. Since AIs like ChatGPT and CharacterAI can roleplay or take natural-language instructions, turning them into companions is a matter of feeding them facts about you. So instead of training a special AI just for you, you take a general-purpose chatbot and load it up with information about its owner and the environment. >Anon is 6'0 and has brown eyes >Anon has just come home from work. >Anon is holding a phone >You are anon's tsundere girlfriend who lives in his computer Et cetera, injected into the chat by a separate program. I think something like this exists, but if not, it could make a good first deliverable for the board. >>19580 >noooo don't define terms to better describe what we want, I do not care for that, this thread revolves around meeeeeeeeee
Lol. This is interesting stuff Anons. I'll be back later on it.
>>19589 >Since AIs like ChatGPT and CharacterAI can roleplay or take natural-language instructions, turning them into companions is a matter of feeding them facts about you. After you build something like it and run a server farm at home, or even better, inside of your robowaifu. Then you would have an "AI" (language model) that can't really reason and still doesn't understand the world, but requires a huge amount of resources and has some amazing intellectual skills. >so instead of training a special AI just for you, you take a general-purpose chatbot This has been the most popular approach here during the last few months at least. If you want to try, good luck. I think this is fundamentally flawed, but I'm also not interested in getting into long discussions with anons disagreeing with me on that. Just saying, when I start working on my AI, I'll take another route. At some point we'll see how well these parts integrate into each other. Don't forget about the chatbot thread >>22, aside from this one here, or some other comments >>4781 and >>4830 and also the NLP thread: >>77 and there in particular >>7837 and >>8200
Open file (33.01 KB 300x100 marie.png)
>>19555 >checked >ideally in a text format that translate into a code which can be made into diagrams. Good thinking Anon. >>19568 >Once you have all the individual facets worked out, then you can integrate them into a whole (the universal robowaifu). You have great points Anon. Undoubtedly there's going to be a progression involved. 'Start smol, grow big' is my mantra during these formative years. >but comes with obvious negative connotations :^) It is indeed a can of worms during Current Year, but I don't feel that it is fundamentally so. Do I over-think it before I """dirty-clothes-rape""" my washing machine with the week's laundry? 'Cruelly' force my refrigerator to keep my healthy noms fresh? The only reason the globohomo's agents are trying to cast the argument in such a fashion is they know it appeals to the basic sentiments of Christian mores, and the inherent hesitance to abuse people. That is, the globohomo seeks to personify a human-looking thing, even when it's clearly not a human. Exploiting this tendency of people is something that basically serves (what I consider to clearly be) their insidious agendas. IMO, 'My Dear Marie' has much to say to us in this regard as robowaifuists. > pic-related >=== -minor prose edit
Edited last time by Chobitsu on 02/08/2023 (Wed) 09:03:05.
>>19589 Looking into it, KoboldAI has a "Memory" feature and there is a small selection of chatbot prompts. If you were going to make a convincing AI waifu with prompt engineering, #1: You have to make general prompts that work and find out what patterns work. #2: Once you have good prompt and memory templates worked out, make a system that can auto-edit or swap them for context. (Ex. You have an outdoor camera with OpenCV and the context manager tells the AI that it saw a bird. When you get home, the AI waifu tells you that she saw a bird outside today.) It may not be very intelligent, but it's more attainable than telling an imageboard to produce a cutting-edge AGI from scratch. :^)
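Step #2 above could start out as something like the following context manager, which injects a persona, owner facts and recent events ahead of the chat turn. This is only a sketch under stated assumptions: the class name, the "Earlier today:" template and the `observe` hook are invented for illustration. The camera event would in practice come from a separate vision program, and the finished prompt would be passed to whichever chat model is in use.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ContextManager:
    persona: str
    facts: List[str] = field(default_factory=list)
    events: List[str] = field(default_factory=list)

    def observe(self, event: str) -> None:
        """Record something that happened (e.g. reported by a camera script)."""
        self.events.append(event)

    def build_prompt(self, user_message: str, max_events: int = 3) -> str:
        """Assemble persona, facts and the most recent events around the chat turn."""
        recent = self.events[-max_events:] if max_events > 0 else []
        lines = [self.persona]
        lines.extend(self.facts)
        lines.extend(f"Earlier today: {event}" for event in recent)
        lines.append(f"Anon: {user_message}")
        lines.append("Waifu:")
        return "\n".join(lines)
```

Usage would look like `cm.observe("The outdoor camera saw a bird.")` during the day, then `cm.build_prompt("I'm home!")` when Anon walks in, so the bird ends up in the context the chat model sees. Capping the number of events is a crude stand-in for real memory selection.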
>>19675 >You have an outdoor camera with openCV and the context manager tells the AI that it saw a bird. When you get home, the AI waifu tells you that she saw a bird outside today.) >*squirms excitedly* >Oniichan! Oniichan! I saw a birb today! <*pats head softly* <That's wonderful Waifu! What did it look like? I can see it now... a cute! :^) -Insert clip of Chii bouncing excitedly for Hideki, after her first day of work >=== -add funpost spoiler -minor fmt edit
Edited last time by Chobitsu on 02/09/2023 (Thu) 03:34:48.
>>19555 >We don't want to re-implement the "female chameleon". This is a pretty interesting perspective, actually. >>19559 >Is there a proper term for "person-like but not fully human"? 'Sub-human' has always been the classical term AFAICT. The Jews have the concept of 'golem'. I'm sure there are probably dozens of others, mostly pejoratives I'd expect. In your context, I think that describing the sub-parts as 'agents' or 'modules' of some kind are probably accurate descriptions? >>19570 >You can't make the AI without making the body. TBH, that doesn't seem too obvious to me Anon. Are you simply pointing out we need compute hardware first? (cf >>19573) >>19572 >If these people are not who we will cater to, who are we catering to? Everyman. Anons will be easier in the beginning ofc. >Homunculus? This is a great and a classical concept. IMO it's probably a good shorthand for the human soul. I actually mean to write software for robowaifus that includes a RW Homunculus class. :^) >>19575 >We need a somewhat human-like AI, minus the wish for independence. I think that's a reasonable summary opener. >>19579 >or is it just supposed to loaf around and look cute? Would that be a problem, Anon? :^) >>19589 >It seems the core of an AI waifu is a chatbot that remembers things about you. Absolutely. The CLAMP studio explored the blossoming of this facility within a robowaifu in Chobits. It actually lent a real charm to Chii's character, as anon has pointed out here before. That is, we should actually capitalize on her robowaifu-weaknesses (socially, etc.) as endearing parts of her character & personality development. >>19604 Thanks for the crosslinks Anon. I'll be very interested to see your project's AI progression. I too think the massive big-data approach to AI creates a metric boatload of problems for us trying to devise simplistic, inexpensive, and (very importantly) autonomous (as in, fully-disconnected) robowaifus for everyman. 
I'm certainly looking for (and intend to devise) alternative AI-functional amalgams myself. >=== -minor fmt edit -prose edit
Edited last time by Chobitsu on 02/09/2023 (Thu) 04:21:01.
>>19555 >We don't want to re-implement the "female chameleon". It's a term from the manosphere or maybe more specifically MGTOW. Normally used to point out that women adapt to the men they're interested in, for example getting into hobbies not because of the hobbies themselves, but because the men they want are there (or because the status of that hobby increases). Replika's AI is designed to pretend to be interested in and knowledgeable about the things its assigned user is interested in. I think robowaifus should be interested in the same things as their owner, but not faking it, especially not faking knowledge about it, and especially not pretending to be humans with a human life-story. I mean as an option or special skill that's fine, but not my goal.
>>19680 >and especially not pretend to be humans with a human life-story. This. Some kind of fabricated LARP like this would ruin the vibe for me personally. Thanks for the explanation Anon, I was ignorant about it. Cheers.
>A general dedicated to discussion & development of AI Chatbots https://boards.4channel.org/g/thread/91473675/aicg-ai-chatbot-general Pygmalion-6B model a non-filtered collaboration made by our own Matrixteam, KoboldAI and TavernAI: colab.research.google.com/github/oobabooga/AI-Notebooks/blob/main/Colab-TextGen-GPU.ipynb colab.research.google.com/github/KoboldAI/KoboldAI-client/blob/main/colab/GPU.ipynb wAIfu Project & Tools github.com/PygmalionAI rentry.org/pygmalion-ai rentry.org/pygmalion-local rentry.org/training-data CAI Dumper: rentry.org/chatlog-dumping CAI Cleaner: dropbox.com/s/spb4c7a2xoedcmh oobabooga.github.io/character-creator rentry.org/pygbotprompts rentry.org/PygTips botprompts.net Windows Guide: rentry.org/ipfub Tavern+Pyg: http://youtu.be/asSk_Otl9i4 [Embed] Node.js (Win7): pastebin.com/Ah5ZUcGE TavPyg Cards: booru.plus/+pygmalion Context tokens to ≤1600 Character.AI Bot Lists & Tools rentry.org/cai-list DiscordBot: github.com/drizzle-mizzle/CharacterAI-Discord-Bot HYW: greasyfork.org/en/scripts/456393 Tilde-Fix: greasyfork.org/en/scripts/458317 Italics Mod: greasyfork.org/en/scripts/458319 rentry.org/Darkened_italicized_text_script_1 Interaction Count: rentry.org/oggos Reload Autoscroll: greasyfork.org/en/scripts/458400 GDPR: rentry.org/BenerusLove Ban CAI: rentry.org/BenerusDream DIY: rentry.org/waifu-diy-ai Other: >KoboldAI github.com/henk717/KoboldAI >Text Generation Webui github.com/oobabooga/text-generation-webui >NovelAI naidb.miraheze.org >TextSynth (GPT-J) textsynth.com/playground.html >EleutherAI (GPT-J-6B) 6b.eleuther.ai >InferKit app.inferkit.com/demo >ChatGPT (Needs phone number) chat.openai.com >You.com you.com Open Source Language models: >YaLM 100B github.com/yandex/YaLM-100B >LaMDA-pytorch github.com/conceptofmind/LaMDA-pytorch >GPT-JT huggingface.co/togethercomputer/GPT-JT-6B-v1 >Bloom huggingface.co/bigscience/bloom >PaLM github.com/lucidrains/PaLM-rlhf-pytorch New: https://www.bing.com/new
>>19924 Thanks! Please keep it up, Anon.
>>16496 >>16515 >disk-based memory Can anyone kindly point me toward resources or write-ups on leveraging disks for AI? I certainly feel like there must be some viable method other than the GPU-centric approach that has become popular over the past year or two -- and is clogging all of my research efforts into the topic, since GPU is all anyone generally talks about.
>>20236 You're conflating memory with compute (GPU/CPU). Your AI system could of course store pictures on a disk, or audio and video samples, text, etc.
>>20279 Yes, rereading the discussion I think I just misinterpreted some of the earlier posts itt. Desperately looking for a way to work with my poverty-induced hardware ceiling, is all. :[
>>20284 Maybe look into more traditional "chatbots", natural language processing, and decision making? I often think about things like how a system would dissect unknown words or handle misunderstandings. I'm sure there are ways to achieve something with more traditional methods. On r/LanguageTechnology they sometimes talk about things like this.
Some based researchers just dropped a response preference model that is trained only on helpfulness data, no harmlessness data, and they also released the dataset. Model: https://huggingface.co/stanfordnlp/SteamSHP-flan-t5-large Dataset: https://huggingface.co/datasets/stanfordnlp/SHP Also Amazon's model that outperforms GPT-3.5 by 16% on question answering while being 784x smaller was released: https://twitter.com/AlphaSignalAI/status/1628435222139219969 Model: https://drive.google.com/file/d/1FtTYOJPHnWnFfCxNC6M3gar4RAX5E21b/view Code: https://github.com/amazon-science/mm-cot I'll test them out later and give them a review.
Symbolic Discovery of Optimization Algorithms >We present a method to formulate algorithm discovery as program search, and apply it to discover optimization algorithms for deep neural network training. We leverage efficient search techniques to explore an infinite and sparse program space. To bridge the large generalization gap between proxy and target tasks, we also introduce program selection and simplification strategies. Our method discovers a simple and effective optimization algorithm, Lion (EvoLved Sign Momentum). ... >The implementation of Lion is publicly available. https://arxiv.org/pdf/2302.06675
>>20754 Apparently, Lion is a direct-result of the practical implementation of the research paper itself, is that correct Anon?
>>20756 That's how I understood it, yes.
Quite interesting channel for AI modelling and cognition: https://www.youtube.com/@cognitiveai6907/videos They talk about things like parts of us being agents with their own agendas, forming a more complex mechanism by interacting with each other.
I want to cross-post some reference material on AI, specifically on the company XNOR.ai, who were doing really outstanding work recognizing items like humans, cars, animals, etc. on phones and micro-controller level processors at that tiny power level. They appear to be getting better than x10 performance compared to things like GPT-3. They of course were bought by Apple and have gone dark, but they left some data and some videos behind. I put these in the math thread but they are really a better fit here, so I'll link them here so that people can see them. They say that while they were using it for object recognition, it is general purpose and can be used for all sorts of AI networks. A search term for the work they were doing is "binary Convolutional Neural Networks". >>18651 >>18652 >>18777 >>18778 A paper on this sort of computing algorithm: >>18818 >>19341 This appears to be a good paper because it's a review of the binary networks: >>20473 I know I'm repeating myself but there's a good reason. The path that big AI programs like Google's are pursuing is a dead end, and we can see this in nature. They use stupendous computing power to do what they are doing. But let's look at what nature does. I looked up some numbers on lower creatures: "...Insect brains start at about 1000 neurons...". Here are some numbers I found for Million Instructions per Second (MIPS) applied to animals:
How many MIPS in an insect? 10
How many MIPS in an ESP32? 600 DMIPS
How many MIPS in a Guppy? 1,000
How many MIPS in a current desktop computer? 1,000
How many MIPS in a Lizard? 5,000
How many MIPS in a Mouse? 100,000
Some of these may be wrong but it's a good rough guide to thinking about the problem. Now insects fly around, and ants can navigate very complicated surfaces, but with present matrix multiply algorithms you would never get this sort of performance out of this tiny set of neurons. 
I bet you could easily model a 1000 neurons with a micro-controller but not with the algorithms they are using. This means we must search for something that is more efficient and the only thing I have seen that shows conclusively that it can do this is these binary networks I linked.
>>21033 >I bet you could easily model a 1000 neurons with a micro-controller but not with the algorithms they are using. Likely-so, IMO. OTOH, purpose-built HW will effectively always outstrip a general-purpose HW/SW combination in performance. If I can recall good Waifusearch terms to use, and I remember to do so, I'll link back to your posts here about just this kind of HW being researched for vision. >=== -minor edit
Edited last time by Chobitsu on 03/03/2023 (Fri) 18:00:55.
>>21040 >HW being researched for vision It will never be as cheap as commodity HW/SW combinations, unless you use graphics chips or robots become a big deal. There's a good video giving an overview of the binary Convolutional Neural Networks approach at this link, https://www.youtube.com/watch?v=3cD9bpfX9FA These functions are built into any commodity processor.
I'm waiting on some IP approvals from my former employer, but if it goes through then it should allow for >100X CNN speedups on low power embedded CPUs. >inb4 big if true
>>21054 >It will never be as cheap as commodity HW/SW combinations, Some things are worth the price. This is probably one of them tbh. >unless you use graphics chips If at all feasible, yes ofc. >or robots become a big deal. Indeed they will. What a time to be alive! :^) >>21055 BIG IF TRUE
>>21055 >>100X CNN speedups on low power embedded CPUs Nice!
There's a copy of Meta's AI out there. It's among the leaked documents in the torrent downloads on I2P (the Invisible Internet Project). It's 30GB and is described as: LLaMA neural network models (7b, 13b, 30b), Meta's neural network models. The distribution includes the project alpaca.cpp (source code), which is easy to compile to interact with the language model. Any interest in this? Does anyone know what this is, and whether you can run it on a desktop computer that is fairly slow by today's standards? I can tell people how to download it if they are interested.
And if you choose to, I believe I can tell you how to do so anonymously through I2P using BiglyBT's I2P torrent plug-in. It will only connect to the encrypted network. BiglyBT is a great torrent downloader in itself.
> LLaMA neural network models (7b, 13b, 30b) Hey, I found a link that tells you how to set this up. It will run on a PC, but it likely needs more power than I have; maybe it will just slow down but not stop. This might be helpful in designing things "if" it has been trained on engineering and software. Maybe. I'll have to look around and see. https://til.simonwillison.net/llms/llama-7b-m2 Found this: Training dataset. The model was trained using the following sources of data: CCNet [67%], C4 [15%], GitHub [4.5%], Wikipedia [4.5%], Books [4.5%], ArXiv [2.5%], Stack Exchange [2%]. The Wikipedia and Books domains include data in the following languages: bg, ca, cs, da, de, en, es, fr, hr, hu, it, nl, pl, pt, ro, ru, sl, sr, sv, uk. See the paper for more details about the training set and corresponding preprocessing. So it could be useful. A bit fearful. Could be the thing is an actual piece of an AI that will escape and infect the world!
AI is a solved problem. If you have a web dev background and think you'll make an AI from scratch, it'd be the equivalent of an architect wanting to make a rocket. For one, it's not your area of expertise, and two, it takes a team. The programming of the robot's movements is also an afterthought and can even be done by ChatGPT. Ideally we'd all be working towards building the physical robot together, but I think that's too much to ask for.
>>21555 Basically everything about this is wrong.
How can I make a chatbot like the ones on character.ai? Their interface makes creating characters dead simple, and they are great fun to talk to, but I'm disappointed that they can't say anything "NSFW" or "vulgar".
>>21568 sorry I'm retarded and didn't read the whole thread, Pygmalion AI looks perfect
I will have to rewatch this at some point and make notes - Generalist AI beyond Deep Learning: https://youtu.be/p-OYPRhqRCg >Generative AI represents a big breakthrough towards models that can make sense of the world by dreaming up visual, textual and conceptual representations, and are becoming increasingly generalist. While these AI systems are currently based on scaling up deep learning algorithms with massive amounts of data and compute, biological systems seem to be able to make sense of the world using far less resources. This phenomenon of efficient intelligent self-organization still eludes AI research, creating an exciting new frontier for the next wave of developments in the field. Our panelists will explore the potential of incorporating principles of intelligent self-organization from biology and cybernetics into technical systems as a way to move closer to general intelligence. Join in on this exciting discussion about the future of AI and how we can move beyond traditional approaches like deep learning!
>>21575 Isn't Pygmalion quite undertrained? Excluding NSFW, I heard it's still worse than lobotomized CAI.
>binary Convolutional Neural Networks I found a paper where these guys have a GUI-based training system that's open source and targets FPGAs. Maybe it could be useful. The speed and low resource usage of these is impressive. "...We implemented the VGG-11 benchmark CNN on the Digilent Inc. Zedboard. Compared with the conventional binarized implementations on an FPGA, the classification accuracy was almost the same, the performance per power efficiency is 5.1 times better, as for the performance per area efficiency, it is 8.0 times better, and as for the performance per memory, it is 8.2 times better. We compare the proposed FPGA design with the CPU and the GPU designs. Compared with the ARM Cortex-A57, it was 1776.3 times faster, it dissipated 3.0 times lower power, and its performance per power efficiency was 5706.3 times better. Also, compared with the Maxwell GPU, it was 11.5 times faster, it dissipated 7.3 times lower power, and its performance per power efficiency was 83.0 times better. The disadvantage of our FPGA based design requires additional time to synthesize the FPGA executable codes. From the experiment, it consumed more three hours, and the total FPGA design took 75 hours. Since the training of the CNN is dominant, it is considerable..." I really don't understand AI stuff. It's difficult to understand how just analyzing the word sequences of a large number of books can give you decent answers to questions. I found some AIs online that you could chat with for free and asked one a few questions. It gave good answers, but it only lets you chat for a short time and then you have to sign up. Not happening. These lower computing cost binary networks seem to be the only way to get some sort of performance out of low power computers. The performance they are getting from some of these is quite good, especially compared to the huge resources and multi-thousand dollar GPU cards needed for the traditional method.
One more. Maybe I can understand this one. Understanding Binary Neural Networks https://sushscience.wordpress.com/2017/10/01/understanding-binary-neural-networks/
>>21590 Please let us know what you figure out Anon!
>>21590 This paper, as I keep reading it, is one of the best I've seen. I know I link a lot of stuff but this is an exceptional paper on how the math works to get these BNN networks. I looked at some of the papers he linked and they make some of this even clearer. One I had to find on the Wayback Machine. It's a graphical representation of what is happening. Here's a link. https://web.archive.org/web/20170111135135/http://minjekim.com/demo_bnn.html I uploaded a paper with this comment, "Bitwise Neural Networks, Minje Kim, P Smaragdis", from the same guy, Minje Kim, whose visual reference I linked above. The paper is good but academic; read: hard math. I made a speculation a while back at the comment link below. At the time I was maybe 55% sure it was right, or at least on the right path. After reading more about convolution and some of these AI papers, which I readily admit I don't 100% understand, I am now more like 90-95% sure that the comment below is relevant. Wavelet processing and math are used in signal processing to greatly compress data. DjVu files use wavelets, and so does JPEG 2000. I'm fairly confident that using wavelet processing we could get far lower cost processing for AI. I think it would be somewhere between normal multiply convolution and the binary neural network resolution, leaning towards the BNN in the processing power needed. The point is that the BNN loses some resolution, which may cause problems in edge cases. So if wavelets are used you could have really high resolution but with processing power much closer to what the BNN solution needs. Off the top of my head, and I may be wrong, I seem to remember that wavelet processing for signal processing needs roughly X10 less. A lot. As for the math equations to do this, I don't know. Setting up these sorts of equations is difficult and my "theoretical" math skills are poor to nonexistent. 
It's not tremendously difficult to see some relations between these things, nor does it take a brainiac to see these relations. BUT, manipulating the math functions to tease information out of these relationships...that takes a brainiac. It's much more difficult than just seeing that there's a relationship. >>20473 One more link. I'm sure this is relevant. The paper here >>21590 uses this process, Popcount, which is https://en.wikipedia.org/wiki/Hamming_weight This has to do with signal processing. Why? I don't know, but I thought I would mention it for people like me who are curious about how things work.
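On the "why popcount?" question: in a binary network the weights and activations are restricted to +1 and -1, so a dot product reduces to counting how many positions disagree. A minimal Python sketch of that identity follows. It assumes the usual encoding (bit 1 means +1, bit 0 means -1); the function names `pack`, `popcount` and `binary_dot` are only illustrative and are not taken from any of the linked papers.

```python
def pack(signs):
    """Pack a list of +1/-1 values into an int: bit i is 1 where signs[i] == +1."""
    bits = 0
    for i, s in enumerate(signs):
        if s > 0:
            bits |= 1 << i
    return bits


def popcount(x):
    """Hamming weight: the number of 1 bits in a non-negative int."""
    return bin(x).count("1")


def binary_dot(a_bits, b_bits, n):
    """Dot product of two length-n +1/-1 vectors given in packed form.

    Matching positions contribute +1 and mismatching ones -1, so
    dot = (n - mismatches) - mismatches = n - 2 * popcount(a XOR b).
    """
    return n - 2 * popcount(a_bits ^ b_bits)


if __name__ == "__main__":
    a = [+1, -1, +1, +1]
    b = [+1, +1, -1, +1]
    reference = sum(x * y for x, y in zip(a, b))
    print(reference, binary_dot(pack(a), pack(b), len(a)))
```

One XOR plus one popcount handles a whole machine word of weights at once, with no multiplies, which is where the speed and power savings claimed for these networks come from.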
>>21837 I think I understand the basic concept of what I think you're saying about wavelet compression being applied to DL systems, Grommet. Let's hope you're right!
>>21555 Lol nou. >>21564 This.
>>21841 >I think I understand the basic concept of what I think you're saying about wavelet compression being applied to DL systems If you know about wavelets maybe you can find some sort of equations that will let us test that, or some way to set this up. I thought about it and I think it might be good to explain "why" I think wavelets would work. Make sure you understand I could be really wrong about this, as it's been decades since I took any math and I may not have learned it correctly in the first place. If I'm wrong, correct me so I'll know. Part of why I'm doing this is I've been learning, slowly, about this by reading a lot of papers, which mostly blow my mind. But as I keep reading more and more I start getting the basic idea of what they are doing. One thing I think is that they are burying simple math functions in a blizzard of math equations that are hard to understand. Writing this down helps me understand, because as I write I have to explain it. I'll explain something to you, maybe it will amuse you. I worked for a guy once, and sometimes when we would run into some head-scratching problem and everyone was standing around he would say, "Do something Grommet, even if it's wrong, just do something". It makes me laugh even now. I can hear him saying it. So what I'm saying might be wrong, but I'm doing something. Now AI uses "Convolution" to do its work, or at least it's a big part. Wikipedia has a great picture of this process that I'll load with this comment, and here's the link to the article. (I hope it shows; it's an svg file.) https://en.wikipedia.org/wiki/Convolution Convolution is taking two or more signals, streams of data, or whatever and doing matrix multiplies on them and getting a result. I haven't a clue why you can use convolution on a bunch of sentences and somehow get AI out of it, but it seems "this is the way". This is very much like using Fourier transforms, which use sine waves, and combining them to make a waveform. You add them up, same as convolution. 
This is very slow. Well, they found that wavelets, which are made by amplifying and stretching a waveform and then adding the results, are WAY faster. It doesn't use multiplication like other convolution, only addition and subtraction. I can't help but think this is the same as convolution with AI and would give the same speed up. The BNNs (Binary Neural Nets) will obviously be even faster, but they might cut off too much data or, as I call it, resolution of the data. I again may be wrong about this, because in BNN papers they talk about using many "hidden layers" to raise the resolution. Whatever that means. I'm guessing they run the data through (feedback)...something...a lot of times but only emitting binary data. It may be that using something like wavelets could allow fewer of these, making the final computation a wash but keeping higher resolution. It wouldn't take the picture, so you will have to go to the link and look at the first picture.
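To make the two operations above concrete, here is a small Python sketch. It assumes the Haar wavelet is the add/subtract-only transform being described (the post does not name one), and it only shows the signal processing side; whether this gives a speedup inside a neural network is the speculation above, not something this code demonstrates.

```python
def convolve(signal, kernel):
    """Plain discrete convolution: every output sample is a sum of products."""
    out = [0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k  # one multiply per (sample, kernel tap) pair
    return out


def haar_step(signal):
    """One level of the unnormalized Haar transform (even-length input assumed).

    Uses only pairwise additions and subtractions, no multiplies.
    """
    sums = [signal[i] + signal[i + 1] for i in range(0, len(signal), 2)]
    diffs = [signal[i] - signal[i + 1] for i in range(0, len(signal), 2)]
    return sums, diffs


def haar_inverse(sums, diffs):
    """Undo haar_step for integer input; the halving is just a bit shift."""
    out = []
    for s, d in zip(sums, diffs):
        out.append((s + d) >> 1)
        out.append((s - d) >> 1)
    return out


if __name__ == "__main__":
    print(convolve([1, 2, 3], [1, 1]))
    sums, diffs = haar_step([4, 2, 5, 5])
    print(sums, diffs, haar_inverse(sums, diffs))
```

The `diffs` list is where compression comes from: for smooth signals most differences are near zero and can be stored cheaply or dropped, while `sums` can be fed through `haar_step` again for the next coarser level.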
>>21842 AI is in its infancy. My understanding is they are feeding it all this data but haven't any real understanding of what it's doing inside the system. I, and others, noticed that some of the latest AIs seem to be crazy psychopaths. Some people have suggested that they are feeding them nothing but "woke" material to keep them from being racist and it makes them nuts. :) I think that's entirely possible. There's apparently a way to feed data into an existing AI. There's also a data set that's like several years of uncensored 4chan. Strip out the woke stuff, make sure all the sex and cooking is included, and feed it to your waifu brain! Big WIN! I downloaded "llama-models-7-13-30", I think it's the Facebook model, but I don't have a clue what to do with it yet. I have a weak computer. It's fine for what I've been doing but it's weak for AI stuff. These AI cards cost like $9k for the fast ones. But maybe you could take the above trained model and, like I said, retrain it with 4chan, DIY books, cleaning books, sex manuals and cookbooks with far less power needed. (Hope springs eternal!) Here's one I found. I'm not exactly sure if it trains or just adds data, or exactly how this works. https://github.com/jerryjliu/llama_index
>>21846 >>21847 I wish I had more to offer the group in this area Grommet, but unfortunately there's only one of me and I've had to focus on other things. Hopefully our resident AI geniuses can make productive use of this information for the benefit of us all. What I can potentially bring to the table is this: >that will let us test that or some way to set this up. If one of the brainiacs here can dumb things down sufficiently for my programmer's mind to absorb, then I think it rather likely that we can find an effective way to do things in software efficiently enough to run inside robowaifus (or possibly inside home 'servers')--provided the basic concepts given are tractable in the first place. But by all means please continue with this discussion Anon, I simply wanted to manage expectations a bit. Cheers. :^) >=== -minor edit
Edited last time by Chobitsu on 04/08/2023 (Sat) 23:27:26.
Tutorial for making an unhinged but honest and helpful AI partner that runs locally on any modern computer. https://yewtu.be/watch?v=nVC9D9fRyNU
>>21878 THANKS! Great video! Looking at this video I found another one he did on installing LLaMA c++ that can run using 4GB of RAM. The video has a lot of steps. The c++ version install starts at 10:13, but you need to install other stuff first so you have to watch the whole thing. https://www.youtube.com/watch?v=cCQdzqAHcFk >>21856 >I simply wanted to manage expectations Just to let people know, I'm only pointing out stuff I find and I'm in no way saying that I understand this. These things are changing so fast; I have never seen any other tech move this fast. Never. There's a new model every week it seems. The above video shows how to install an interface to many various models. This interface has a training button that he says he will work on later. So this guy's videos might be very good to keep track of. Some speculation. The C++ model runs using 4GB. Not much. Most require a lot more. So I'm thinking that, although it's slow, if you train it with a supermassive surplus of the kind of data we want, then its neural net will be saturated with the sort of data we want and not the generalized stuff they train on. One reason these things are popping up so fast is apparently that the cost to train them has plummeted. They are renting NVIDIA chips online and training there. One university trained a model for $300. That's not out of the realm of possibility. If you could do this with the C++ version we would have something well within the price range of the average person. Motherboards that take a LOT of RAM are expensive. Those that take 16GB or so are cheap. I looked at this a few days ago. Because of the need for massive RAM in these AI models I was thinking of just buying a motherboard and keeping everything else I had the same. The high-RAM motherboards cost 3 times as much as a 16GB one, so scratch that.
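For anyone sizing up RAM before buying hardware, the weights-only arithmetic is simple enough to sketch in Python. Assumptions, stated plainly: the 4GB figure above is taken to come from a 7B parameter model quantized to about 4 bits per weight (the post does not say so), and this ignores the extra memory needed for context and activations, so real usage will be somewhat higher.

```python
def weight_bytes(n_params, bits_per_weight):
    """Memory for the weights alone: parameters times bits, divided by 8 bits/byte."""
    return n_params * bits_per_weight / 8


def to_gb(n_bytes):
    """Decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_bytes / 1e9


if __name__ == "__main__":
    # Parameter counts are the nominal model sizes named in the thread.
    for name, params in [("7B", 7e9), ("13B", 13e9), ("30B", 30e9)]:
        for bits in (16, 8, 4):
            print(f"{name} at {bits}-bit: {to_gb(weight_bytes(params, bits)):.1f} GB")
```

By this estimate a 7B model needs about 14 GB at 16 bits but only 3.5 GB at 4 bits, which is why the quantized C++ builds fit on ordinary 8GB or 16GB machines while the larger models do not.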
>>21894 Thanks Grommet. Yes, I think that man, several others, and all of us are converging on the idea that we need to be able to do both AI and (for us at least) robotic control systems on lightweight, edge-computing platforms (like phones, SBCs, etc). I'm not prepared yet in my circumstances AFK to tackle working on a system setup to pursue AI work. But it's gratifying to know that really talented men like the one doing the LLaMA-CPP project are already making great progress in this area. It will just get easier and easier at this rate to finally assemble everything together that we all need into workable and pleasing robowaifu systems. Thanks for keeping us all inspired with your ideas Anon, please keep it up! Cheers. :^)
Open file (12.44 KB 1062x753 floppy waifu.jpg)
>>21894 The next big AI breakthrough will be on CPUs. Many papers are independently finding that the layers of language models are just steps in a program, so the more layers you have, the greater their capability to process information becomes. With existing techniques in the literature, it should be possible to get better performance than Alpaca 7B with a deep and narrow 1.5B model with 200 layers. The first step is integrating models with discrete CPU/GPU program layers that provide proper gradients with continuous relaxations. There’s absolutely no reason to be using 20 layers of a 175B parameter language model to solve some simple arithmetic operations like (801 + 2 * 597)**2 that could easily be done with a little bit of machine code. Off-loading work to the CPU or restructuring it on the GPU will save a ton of memory and processing power. I’m working on a 1.44MB model at the moment that fits on a floppy disk by using procedurally generated weights that can be trained from scratch on low-end hardware. I’ve found as long as the matrix rank of the generated weights is high, it has no problem training. If the rank is even slightly degraded though then it starts to perform poorly on test data. It certainly won’t be breaking any records but I wanted to create a proof of concept to show that it’s possible to do rudimentary AI architecture research from home with limited compute and data. The generated weights could also be used though to initialize a 160x larger model so people could potentially make a 9M model, train it for a couple months, then finish by finetuning the full 1.5B model without needing a huge GPU cluster. After that, it’s just a matter of researching better architecture that can send data to the right areas for processing and receive it back. Layers are essentially just a batch of operations. 
Those operations need to be disentangled and organized into functions and data structures the model can call, reuse and combine as many times as necessary within a given compute budget. First synchronously, then asynchronously. The most important thing to do from there will be to automate generating continuous relaxations for any given code so it becomes possible to run anything inside a language model and have it learn how to use those programs. The big questions though are what kind of programs would be most useful for a language model to have, what programs are they running to process language, and how can we optimize them? If language models have learned to generate and execute various programs they might have developed functions like: >Merging - combining words, phrases, sentences into hierarchical structures, e.g. apple and bananas are fruit and fruit is a type of food >Abstraction - generalizing from specifics to more abstract concepts and connecting metaphors >Splitting - decomposing input into constituent parts for parallel processing across a layer >Mapping - translating semantic roles or concepts between structures for coherence and implicit relationships >Search - efficiently accessing information within the model's memory or context >Transforming - altering form/perspective while preserving meaning, includes paraphrasing, summarizing, QA, dialog, and translation >Symbolizing - treating sequences logically/mathematically for rule-based mechanisms >Conditional branching - context-dependent decisions and multi-step reasoning And some fundamental abilities that are currently missing from language models that would be useful: >Variables - buffers that can be assigned different values over the course of processing >Data types - being able to represent different types of information appropriately (text, numbers, strings, booleans, lists, images, audio, servo feedback, etc.) 
and limit operations to compatible types to make models more robust >Pointers and references - references to locations in memory for access and manipulation of information >Conditional logic - more advanced than conditional branching, for handling combinations of logical operators and truth tables in one step instead of using many >Subroutines and functions - capability to define reusable blocks of processing logic that can be called upon from multiple points in the model >Loops - ability to repeat a block of processing indefinitely, enables recursive functions >Packages - models should be able to import packages of functions, understand how to use them, and limit their execution with permissions >Preemption - temporarily suspend the current line of execution and shift to another higher priority task, resuming later >Exceptions - detect and handle unexpected conditions or edge cases that disrupt the normal flow of a program's execution and recover from them >Parallel execution - executing multiple different branches, sequences or programs concurrently and combining their results >Simulations and imagination - for doing Monte Carlo tree search or other search and optimization techniques >Neural networks as programs - ultimately, the model could generate and execute other neural architectures on demand to achieve a given goal or process a particular input in a cooperative, feedback-driven manner, "a model within a model" Basically an entire virtual machine. If anyone has any ideas or concerns please share tl;dr just stack more layers that do the layer stacking themselves
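A toy Python illustration of the "procedurally generated weights" idea from the post above. This is emphatically not the poster's actual method, which is not described in the thread; it only shows the general principle that a weight matrix can be regenerated from a tiny seed instead of being stored, plus a plain Gaussian elimination check of the matrix rank that the post says matters for training. The names `generate_weights` and `matrix_rank` and the scaling choice are assumptions made for the example.

```python
import random


def generate_weights(seed, rows, cols):
    """Regenerate a weight matrix from a seed; only the seed needs to be stored."""
    rng = random.Random(seed)
    scale = (1.0 / cols) ** 0.5  # assumed init scale, not from the post
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def matrix_rank(matrix, eps=1e-9):
    """Rank via Gaussian elimination with partial pivoting (fine for small matrices)."""
    m = [row[:] for row in matrix]
    rows, cols = len(m), len(m[0])
    rank = 0
    for c in range(cols):
        if rank == rows:
            break
        # Pick the remaining row with the largest entry in this column.
        pivot = max(range(rank, rows), key=lambda r: abs(m[r][c]))
        if abs(m[pivot][c]) < eps:
            continue  # no usable pivot: this column adds nothing to the rank
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, rows):
            factor = m[r][c] / m[rank][c]
            for k in range(c, cols):
                m[r][k] -= factor * m[rank][k]
        rank += 1
    return rank


if __name__ == "__main__":
    w = generate_weights(seed=42, rows=8, cols=8)
    print("same seed gives same weights:", w == generate_weights(42, 8, 8))
    print("rank of generated matrix:", matrix_rank(w))
    print("rank of a degenerate matrix:", matrix_rank([[1.0, 2.0], [2.0, 4.0]]))
```

A real system would presumably need a generator whose parameters can be trained, not a fixed random stream, so treat this purely as a picture of the storage trade-off: a few bytes of seed in exchange for recomputing the weights on demand.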
>>21896 This all sounds like interesting stuff that I have no idea about. If layers are just steps in a program, wouldn't adding more steps make the program slower and more complicated? Also, I heard that the CPU is becoming obsolete because everything it does can be done by the GPU but better.
>>21897 >Also, I heard that the CPU is becoming obsolete because everything it does can be done by the GPU but better. Not really. GPUs have some amazing characteristics, and are still outstripping the so-called Moore's Law(tm, & RIP), but they are nearly useless at most general-purpose processing needs. But for a fairly-narrow mathematics niche (big matrix transforms) they are surprisingly-effective beasts. No, CPUs are actually quite a bit more important to the world at large than GPUs Anon.
Open file (144.08 KB 671x367 ponko_shogi.png)
>>21896 Now this is the sort of AI future I like... So essentially you're saying that AIs should be able to recognize patterns in conversation that correspond to the need for a specific program? I don't know about adding all those things like variables and loops to LLMs... at that point you're pretty much making a computer (or VM, as you say) that uses the LLM so that you can interface with it using natural text. That's a lot different from the current approach, but maybe it's better? It definitely has more potential, especially if you have something like a DSL and carefully made specification to allow new programs to be made quickly, but some of the natural aspect of it would definitely be lost...
>>21896 >The next big AI breakthrough will be on CPUs This. Hi Robowaifudev, very good to hear from you! :^) >it should be possible to get better performance than Alpaca 7B with a deep and narrow 1.5B model with 200 layers. That's exciting, and I believe it. I've 'seen' that we'll have models that have thousands of layers in the future. >Off-loading work to the CPU or restructuring it on the GPU will save a ton of memory and processing power. That's correct. Most AI researchers have their heads shoved so far up their grants, that they can't see the trees right in front of them for the forest around them. Tiny processors back in the day were doing some great work with FORTRAN and C. An amalgam is the correct approach, and so far you are literally the only valid AI researcher I personally know of who gets this. >I’m working on a 1.44MB model at the moment that fits on a floppy disk by using procedurally generated weights that can be trained from scratch on low-end hardware. AWESOME! Now you're talking Anon! I've often thought that -- just like insects & Carver Mead's Neuromorphics systems do -- embedding neurologic-like 'AI' right into the sensor/actuator combo will give us greatly-enhanced, very lifelike kinematic responses. >tl;dr Process incoming stimulus data directly onsite (and therefore nearly instantaneously); two-way comms stacks are the enemy of fluid response mechanisms. Your FloppyAI should directly enable this on tiny little sensor/microcontroller combos embedded, say, right in a robowaifu's knee joint. Or a [fingerpad pressure sensor/knuckle actuator] located in & controlling one of her finger's grip, for instance. The positional, status, and confirmation comms can take place back to her central core after the fact (and also be handled rather lazily time-wise, compared to the immediate hard realtime need there onsite in her fingers, etc). >ttl;dr Decentralize the robotic compute out to the edges; FloppyAI can play a big role in that! 
:^) >Layers are essentially just a batch of operations. Again, this. >The most important thing to do from there will be to automate generating continuous relaxations for any given code so it becomes possible to run anything inside a language model and have it learn how to use those programs. Care to explain this for the uninitiate Robowaifudev? Gotta admit you lost me there for a second. :^) >your breakdowns listings... Now you're talking! These are the kinds of practical tasks I can get my head around and begin thinking of working code solutions for. Moar plox. >If anyone has any ideas or concerns please share I have little idea how to do these as language models, but I can definitely see them as callable-routines that a model could conceivably interact with the public interface of. If you figure out how to make your models interact with these external functions, then I think I should be able to figure out many of the functions themselves. And even if I couldn't do it, then there are literally thousands of skilled C & C++ devs out there who likely can (and it will all run on a wee thimbleful of processing power). POTD, BTW. Cheers Anon. :^) >=== -prose edit
Edited last time by Chobitsu on 04/10/2023 (Mon) 21:37:40.
>>21901 >That's a lot different from the current approach, but maybe it's better? Much better! It's the hero-breakthrough we all need tbh. :^) >but some of the natural aspect of it would definitely be lost... I suggest it will become far more 'natural' Anon. In that sense we'd be acting more nearly as the hand of God in the designs. There's solid neurological evidence that metric boatloads of things we take for granted in our brain responses were already-hardwired in there from before birth. From my faith-based perspective that's clearly God's handiwork & foreknowledge. Thus our mimicry of 'front-loading' capabilities for a LM to utilize is actually very appropriate. BTW, is that a robowaifu in your pic? Sauce?
Open file (126.14 KB 743x406 ponko6.png)
>>21903 >I suggest it will become far more 'natural' Anon... I was sort of imagining it as a computer that you have to explicitly tell things like: > Start shogi-playing program > Start calculator program > etc. That's the sort of feeling I get when you mention adding all those programming-language-like things to it. That you're making an environment for interacting with other software through natural language. This is pretty fun to think about by the way. I like thinking about random things like this in my free time so you've given me some nice ideas. >BTW, is that a robowaifu in your pic? Sauce? Yeah, it's Ponko from the manga Ponkotsu Ponko https://mangadex.org/title/b84a9c69-624b-4c4f-ac57-8d9a162883f1/useless-ponko I just posted her in the Robowaifus thread >>>21904
>>21897 >I heard that the CPU is becoming obsolete because everything it does can be done by the GPU but better. As the other anon said, it's not that simple. GPUs are better than CPUs at tasks that can be parallelized. If the program you're making has a lot of steps that need to happen in a certain order, it won't be any better on the GPU (probably even worse, since GPU computation units are generally pretty minimal). But, if you're doing something where a lot of things are done independently, the GPU really shines. Rendering is the perfect example. On a 1920x1080 resolution, you have 2073600 pixels that you need to calculate the color value for, and those can all be done independently, meaning you can do them all at the same time. That's the original usecase for GPUs, but it turns out there's other similar usecases where parallelization of a problem allows it to run orders of magnitude faster on a GPU
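To make the rendering example concrete, here is a minimal sketch in plain Python. The `shade` function and the row-chunking scheme are made up for illustration and are not anyone's actual renderer. The point is only that each pixel's color depends on nothing but its own coordinates, so the work can be split into chunks in any order and still give the same image. CPython threads won't actually speed this up because of the GIL; on a GPU each pixel would get its own hardware thread.

```python
from concurrent.futures import ThreadPoolExecutor

WIDTH, HEIGHT = 1920, 1080  # 1920 * 1080 = 2073600 independent pixels


def shade(x, y):
    """Hypothetical per-pixel shader: depends only on this pixel's coordinates."""
    return (x * 255 // (WIDTH - 1), y * 255 // (HEIGHT - 1), 128)


def render_rows(rows):
    """Render a chunk of rows; no chunk needs results from any other chunk."""
    return [[shade(x, y) for x in range(WIDTH)] for y in rows]


def render(workers=4, height=HEIGHT):
    """Split the rows into chunks and shade the chunks independently."""
    chunk = -(-height // workers)  # ceiling division
    ranges = [range(s, min(s + chunk, height)) for s in range(0, height, chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(render_rows, ranges))
    return [row for part in parts for row in part]
```

Calling `render(workers=1)` and `render(workers=8)` gives identical images, which is exactly the property that lets a GPU do all the pixels at once. A program where step N needs the result of step N-1 has no such freedom.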
>>21905 >I like thinking about random things like this in my free time so you've given me some nice ideas. Neat! Glad to hear we're giving you some robowaifu food for thought. :^) >Yeah, it's Ponko from the manga Ponkotsu Ponko >I just posted her in the Robowaifus thread Nice, thanks Anon! Heh, I think some of us keep our heads down and miss a few things like her. I'll carve out some time this week and read through some of it. Cheers. >>21906 Outstanding. I'm quite pleased to know there's an Anon here who understands the hardware of GPUs rather well. It's an area I focused on for a minute. Please share your insights for us Anon if you will. For example, do you think it fairly reasonable to perform CPU/GPU processing amalgams such as Robowaifudev & I were discussing? I'm sure the hardware will support the cross-boundary data comms, but what would you consider the best way to hook into the CPU code for something like a LM working away up on the GPU silicon? >=== -minor edit
Edited last time by Chobitsu on 04/10/2023 (Mon) 19:25:07.
Open file (558.59 KB 1320x390 ponko10.png)
>>21907 >I'm quite pleased to know there's an Anon here who understands the hardware of GPUs rather well. Ehehe, I'm that same Ponko anon. Graphics programming is something I got interested in around 2 years ago. I went through learnopengl.com and then messed with Vulkan later on. My latest project was a sort of generic framework for handling some Vulkan boilerplate and offering a nice interface for combining multiple renderers in one application. I also made a basic 2D sprite renderer and an ImGui renderer, so now I can make basic 2D programs with a UI with Vulkan. I plan to expand on that and have some (unrealistic) projects in mind for the future like a GUI toolkit. Anyway, about robowaifus and GPUs, I think Vulkan and compute shaders are pretty much the cleanest way to use them as external computation units. I said I like thinking about random things in my free time and for the past few months I've been obsessed with thoughts about programming languages, so this may be biased, but the way I imagine programs for robowaifus would work best is if you make some sort of DSL that lets you do a lot of small programs easily that can access external robowaifu interfaces like sensors, motors, and GPUs. Maybe something lisp-inspired would be best, imagine a language where you could easily offload computation to the GPU (with automatic synchronization and all) as well as load picture data from the eyes and things like that. Anyway, I'm sort of going off on tangents, but GPUs could also be used for a GUI interface, I can think of 2 things: 1. the robowaifu's face is virtual. This would make expressing emotion much easier than physically moving a lot of different muscles in the face to present emotion like humans do 2. an interface that you get for controlling the robowaifu when you connect her to a screen. So pretty much an OS I'm glad to meet someone into graphics stuff too, it's pretty hard to find such people online...
>>21909 Wow that sounds really interesting Anon. What do you think of the Vulkan API generally? I never really got into it. In prep for this project I was studying OpenCL a bit here and there for our GPGPU needs, but I've since been diverted off into other needs for now. >the robowaifu's face is virtual For our Sumomo-chan project (>>14409) we plan to use simulations of actuators and other systems, and then drive the character's animation against those simulations. The idea being to block out the sensory & control algorithms, etc, in the 3D animation form while it's still quite cheap relatively-speaking, and then to (hopefully) move that over mostly-unchanged (apart from concrete drivers for, say, an UNO 3; or a headpat pressure sensor) into the real Sumomo headpat robodaughteru's hardware systems and begin her real-world testing phases. Ofc facial animation is included with that effort, the results of which could be directly-applied against our Virtual/Visual Waifu needs too (>>240). >=== -prose edit
Edited last time by Chobitsu on 04/11/2023 (Tue) 05:23:03.
>>21897 Layers can learn to only modify the hidden state if they have something to add. LLaMA model for example added gated linear units to the MLP which gives each layer much better control over which part of the hidden state they edit. The neural data router paper also introduced a copy gate that allows a model to skip processing entire layers, which they found was necessary for it to solve a compositional table lookup task without error where it's given tables of functions with their inputs and outputs and it has to calculate the correct answer to something like f(g(h(101))) = 111. Basically I'd like to do something similar but output an end of processing signal which makes it return the hidden state and then output a token with the language model head. It would only do as much processing as needed but also be capable of taking a longer time to solve much more complex tasks. >>21901 Yes, it would have a function to detect what kind of input it got, a function that routes that input to the proper function to handle it, and then return the result which other layers can detect and route to do further processing. I'm not sure that creating a DSL for it would be a good idea. I want it to be capable of utilizing existing C++ and Python code. That way nobody needs to learn a new language and can just drop their existing projects into their AI models without modification. >>21902 >Your FloppyAI should directly enable this on tiny little sensor/microcontroller combos embedded, say, right in a robowaifu's knee joint. Not quite there yet though. While the parameters may be tiny enough to fit on a floppy, they're used to generate much larger weight matrices, usually over 100x bigger which still require quite a bit of memory to multiply. The memory savings are still considerable though and will enable SBCs to use much larger models. After inferencing a layer the generated weights can be freed if memory is limited. 
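A rough sketch of the "generate, use, free" pattern for procedurally generated weights. Big assumption: the post doesn't say how the weights are actually generated, so a seeded random generator with one scale value per layer stands in for it here, and the names `generate_weights` and `forward` are made up. It only shows the memory pattern (a couple of stored numbers expand into a full matrix that lives for one layer), not a trainable model.

```python
import random


def generate_weights(seed, rows, cols, scale):
    """Expand a tiny stored parameter set (seed, scale) into a full matrix."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(weights, vec):
    """Plain matrix-vector product."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def forward(layer_params, hidden):
    """Run every layer, holding only one generated weight matrix at a time."""
    for seed, scale in layer_params:
        size = len(hidden)
        weights = generate_weights(seed, size, size, scale)  # built on demand
        update = matvec(weights, hidden)
        hidden = [h + u for h, u in zip(hidden, update)]  # residual update
        del weights  # the generated matrix can be freed right after the layer
    return hidden
```

Here each layer stores 2 numbers but multiplies by a size-by-size matrix, so peak memory is one layer's matrix rather than the whole model's.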
I'm hoping with this model I can isolate the useful functions it learns and distill them into code with continuous relaxations that can be compiled for microcontrollers to run and that the model can inference and do backpropagation through. Then microcontrollers could be an extension of the language model that it can learn to coordinate and use through backpropagation. Perhaps one way to do this might be to quantize the hidden state and use a SAT solver for all the most common inputs and outputs. Another way might be to generate random functions and train a model that learns to reproduce the functions from a given set of inputs and outputs. Personally I'm more interested in the latter approach because then the effort would be towards improving itself by finding ways to reduce its own computation, leading to exponential returns. If AlphaFold can improve protein folding solving by almost three orders of magnitude, why not Boolean satisfiability? I imagine Nvidia already made one behind closed doors to improve circuit design.

>Care to explain this for the uninitiate
A continuous relaxation is a technique that involves relaxing discrete variables to continuous variables so that continuous optimization techniques can solve the problem, such as backpropagation. A simple example is x > 2. The continuous relaxation of this is a logistic function (with an S-shaped curve) that takes an input x - 2 and beta parameter.

def logistic(x, beta):
    return 1.0 / (1.0 + exp(-dot(x, beta)))

logistic(x-2, beta) # x > 2

When beta goes to infinity it returns 1 for any value x > 2 else returns 0, except the special case x=2 where it returns 0.5. When you relax the beta parameter, such as to 1, then when x = 3 it returns 0.731, and x = 0 returns 0.119 and so on. You start training with a low beta value and gradually increase it until the model learns the discrete algorithm. 
Seeing how it works with graphs and code will make much more sense: https://www.youtube.com/watch?v=01ENzpkjOCE AlgoVision already has done quite a bit of work on smoothly integrating discrete algorithms into neural networks: https://github.com/Felix-Petersen/algovision Automating these continuous relaxations for C++ and Python code would make it so that existing programs can be used in backpropagation. >>21905 >I was sort of imagining it as a computer that you have to explicitly tell things like: > Start shogi-playing program > Start calculator program > etc. >That's the sort of feeling I get when you mention adding all those programming-language-like things to it. That you're making an environment for interacting with other software through natural language. That's exactly it. Basically any library or program loaded on your computer becomes something your robowaifu can use so long as you have the documentation of how to use it, and if you have the source code then you would be able to train her model on it directly by being able to estimate a gradient for it with continuous relaxations. Some tasks would require other training methods though because it would be too costly to represent an entire operating system with continuous relaxations. Using the web browser or playing Minecraft for example would require reinforcement learning or something else like recursive classification of examples.
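For anyone who wants to poke at the x > 2 relaxation described above, here is a self-contained scalar version using only the standard library. It is a sketch, not AlgoVision's API: `soft_greater_than` and `soft_greater_than_grad` are names made up here, and the logistic is written in a numerically stable form so that very large beta values don't overflow.

```python
import math


def logistic(z, beta):
    """Numerically stable logistic of beta * z."""
    t = beta * z
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def soft_greater_than(x, threshold, beta):
    """Continuous relaxation of the discrete test `x > threshold`."""
    return logistic(x - threshold, beta)


def soft_greater_than_grad(x, threshold, beta):
    """Derivative with respect to x: beta * s * (1 - s)."""
    s = soft_greater_than(x, threshold, beta)
    return beta * s * (1.0 - s)


# Annealing beta upward sharpens the soft test toward the hard one.
for beta in (1.0, 4.0, 16.0, 1000.0):
    values = [round(soft_greater_than(x, 2.0, beta), 3) for x in (0.0, 2.0, 3.0)]
    print(beta, values)
```

The hard comparison `x > 2` has zero gradient everywhere it is defined, so backpropagation gets nothing from it. The relaxed version has a nonzero gradient near the threshold, which is what lets training push x across it.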
>>21914 Thanks for the explanations, Robowaifudev! >Not quite there yet though. I understand your point, but I would just point out that the set of problemspaces each individual robowaifu component needs to solve is intentionally quite tiny. For example in the case of an individual finger assembly it's something along the lines of -"open all the way" -"close all the way" -"go halfway open" -variations on the above as "for i ms" -variations on the above as "but with only j N" -variations on the above as "but immediately retract on obstacle" This is why your FloppyAI will easily be able to handle this, b/c each individual case is much, much simpler than a more-general AI need. We'll find a way Anon! :^)
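The finger command list above is small enough to write down as a data structure. This is only a sketch of one possible encoding: the field names, the 0.0 to 1.0 position convention and the `next_target` helper are all assumptions made for illustration, not an existing interface.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FingerCommand:
    target: float                        # assumed: 0.0 = fully closed, 1.0 = fully open
    duration_ms: Optional[int] = None    # "for i ms"
    max_force_n: Optional[float] = None  # "but with only j N"
    retract_on_obstacle: bool = False    # "but immediately retract on obstacle"


OPEN = FingerCommand(target=1.0)
CLOSE = FingerCommand(target=0.0)
HALF = FingerCommand(target=0.5)


def next_target(cmd, elapsed_ms, force_n, obstacle):
    """Return the position to drive toward, or None to hold the current position."""
    if cmd.retract_on_obstacle and obstacle:
        return 1.0  # retract by opening fully
    if cmd.max_force_n is not None and force_n >= cmd.max_force_n:
        return None  # force limit reached, stop squeezing
    if cmd.duration_ms is not None and elapsed_ms >= cmd.duration_ms:
        return None  # time budget used up
    return cmd.target
```

The whole decision space is four fields and three checks, which is the kind of problem that could plausibly run on the microcontroller sitting next to the actuator.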
>>21918 Another idea I have but am still working on refining is separating the layers themselves into components like an entity component system in a game engine and then instantiating them with given parameters. If it's possible to decompose the complexity into small atomic components and find common patterns in how they're combined and stacked, then the computation can be optimized. A more modular network architecture makes far more sense for robotics too. There needs to be a fast processing layer that can run at 20 Hz. If your robowaifu is moving and detects her hand is about to punch through your monitor, it's absurd to wait for everything else to process through 100 layers before acting on it. The servos need to respond to that event immediately and stop. Then while that happens the layers dealing with balance and pose would be processing the event to resolve how to remain balanced so the robowaifu doesn't fall over. Then another layer stack doing motion planning with that. This event would continue bubbling up into the language processing layers and the robowaifu would begin commenting on her mishap. CLIP skip in Stable Diffusion models has shown that there is plenty of usable information in the hidden states of previous layers even without explicitly training them to have it, so it's definitely possible to attach different output heads to different layers. For fast processing layers that run continuously, let's say in 1 tick, their outputs could be accumulated into a mean or summarized for other layers to process. If language processing is 1000 ticks, then it could get a summary of those ticks by taking the sigmoid of their summed outputs (that have gone through an activation function like SiLU), so that any important events that happened in an individual tick would be propagated.
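The tick summary at the end of the post above (sigmoid of the summed SiLU outputs) is easy to try out. A minimal sketch, assuming each fast tick emits a small vector of raw outputs; the name `summarize_ticks` and the toy data are made up here.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def silu(x):
    """SiLU activation: x * sigmoid(x)."""
    return x * sigmoid(x)


def summarize_ticks(tick_outputs):
    """Sum SiLU-activated outputs over all ticks, then squash with a sigmoid."""
    channels = len(tick_outputs[0])
    summed = [0.0] * channels
    for output in tick_outputs:
        for i, value in enumerate(output):
            summed[i] += silu(value)
    return [sigmoid(total) for total in summed]


# 1000 quiet ticks, with one strong event on channel 0 somewhere in the middle.
ticks = [[0.0, 0.0] for _ in range(1000)]
ticks[417] = [6.0, 0.0]
summary = summarize_ticks(ticks)
```

In this toy run the single event still dominates channel 0 of the summary after 1000 ticks, while the quiet channel stays at 0.5. One caveat: SiLU is slightly negative for negative inputs, so a long run of mildly negative ticks would pull the sum down and could mask a single event.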
Open file (133.28 KB 720x720 mira-money.jpg)
>>21926 An interesting idea Claude brought up in a brainstorming session is applying resource management to language models. >Treat certain components as resources that must be managed, like a limited attention span, memory capacity, number of parallel processes, etc. The system has to be strategic in how it allocates and deals with these resources through its component compositions. Resource management in game engines is all about efficient loading and unloading of large amounts of memory. If your robowaifu isn't playing chess, there's no need to load all the data required for playing chess until it's needed. All the memory data needs to be managed accordingly, stored in a hierarchy that can be quickly loaded and unloaded, kind of like checking for collisions in a quadtree or octree. Another idea I liked that Claude brought up was using key performance indicators. >Components and layers should have quantifiable metrics that evaluate how well the network is achieving its key goals, e.g. accuracy, coherence, contextual appropriateness, etc. KPIs should be monitored continuously and make adjustments as needed to improve performance. An object detection layer shouldn't have to destroy its accuracy because the following layers failed to make use of its features correctly. The brain does something similar with modulating its plasticity. When something works really well it stops changing it and it becomes habit. Something similar could be done by attenuating the gradient to components that are highly-performing in some metric. I also like the idea of using multiple metrics to train components, but it would need to be done carefully since it can make training take orders of magnitude longer to converge. Claude also suggested keeping track of both local and global KPIs to motivate components to find synergies and stronger collective function. 
And another business idea applied to language models I liked was budgeting, similar to resource management: >Assign a kind of "cost" to activating and utilizing different components or following certain composition rules. The network has to make decisions that optimize the "return on investment" of its limited "budget." Less useful parts can be defunded. >As the network gains experience, it learns which components or styles of processing are most essential under what conditions and which more peripheral. This accumulated "wisdom" guides increasingly nuanced defunding decisions, realizing which parts can be deactivated for cost savings without losing too much capability or value. Components that aren't making a return on investment could be defunded by decisions made by the network. I recall DeepMind using AI to optimize Google's data center and reduce its cooling bill by 40%. Models might find novel ways to sparsify the network to its essentials and then learn to scale that design up. Last but not least, I wrapped up the brainstorming session with how Mira from Dimension W was told to not just memorize data but imagine it and update that model of the world as new data came in. >Filling in missing details. Design and train components not to just replicate surface details from inputs but also to infer and generate additional details, connections, backstories, etc. that flesh out concepts into fully realized conceptual worlds, even where information is lacking. >Develop training methodologies that explicitly encourage components to denoise inputs, fill in missing details and generate coherent speculated possibilities. This could include things like partial inputs, noisy corruptions of data, prompting components to reason beyond surface facts, evaluating coherence/likelihood of imagined possibilities, etc. This is a really interesting idea to make components inherently denoising so they're capable of inferring coherent possibilities from little or missing information. 
A lot of great progress came from UL2 which makes use of a mixture of denoisers as a pre-training objective and of course Stable Diffusion too where all our generated robowaifus come from :^) >When a component gains new data or insights, update not just its local representations but also all related images and imagined contexts to maintain coherence, consistency and depth. >These stored imaginations can then be recalled and updated when new insights emerge, allowing the network to develop increasingly multifaceted and nuanced mental models. >When information is lacking, imagination involves recombining concepts, patterns and details the network already possesses in fresh combinations and new permuted relationships. Gaps are filled by rearranging familiar pieces, not generating something from random noise. Components could store these imagined possibilities for later and recall them when a new insight occurs. Having all these different possibilities would help ground it so updates to its local representations don't destroy its mental model. New information would be another point of data to help fill in the gaps. There could also be two different modes, one for denoising and one for grounded reasoning from explicit data points. For now I'll leave it there. These ideas will keep me busy for a few years at least.
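One way the "attenuate the gradient to high-performing components" idea from this post could look in code. This is a sketch under assumptions: the KPI is taken to be a single number in [0, 1], and the `target` and `sharpness` values are arbitrary picks for illustration, not tuned or taken from any paper.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def attenuation(kpi, target=0.9, sharpness=50.0):
    """Gradient scale: about 1 while the KPI is well below target, near 0 above it."""
    return 1.0 - sigmoid(sharpness * (kpi - target))


def attenuated_update(params, grads, kpi, lr=0.1):
    """Plain gradient step whose size shrinks as the component's KPI improves."""
    scale = attenuation(kpi)
    return [p - lr * scale * g for p, g in zip(params, grads)]
```

A component scoring 0.5 on its metric gets almost the full update, while one scoring 0.99 barely moves, so later layers that fail to use its features can't drag it far from what already works.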
>>21927 Who's Claude? Are they interested in helping us? >Using game engine data prioritizing methods in robotics. That's a very clever idea. Perhaps level-of-detail (LOD) would apply, a concept where the polygon count and pixel density of a game object dynamically adjust to save on resources. A lighthouse could be just a few polygons with low-res textures far away and have thousands of polygons with a high-res texture map when close by. A similar method could be used for AI, where she could have an extremely light neural net dedicated to listening for a wake word sequence like "Hey Waifu" that would trigger the loading of an LLM, text-to-speech engine, and the AI which translates text to intentions to guide her behaviour. What would be pushed out of RAM to make space? Don't know, I trust your judgement on that matter more than my own either way. >Performance indicators Overcomplicates the neural nets. This is best done in the training stage with some ideal state being burnt in. Though your previous statements on generative layers lead to the possibility for this to be done in a somewhat computationally efficient way by changing the markers which weights are procedurally generated from in memory then checking on how altered behaviour corresponds to some set goal. This is assuming a lot of things on my end. Your idea has tremendous potential for lowering power consumption and compute load in training via guided guess and checking. I can't wait to see it IRL. >Mira Her mind works exactly like a human mind would ideally work when rendered in silicon. (Rendered as in computationally generated.) In neuroscience, the human mind works by storing everything in working memory (RAM) before shunting it off to much slower but much larger long term memory (SSD). Both are very limited as our brain would rather allocate neurons towards sensory and processing. So, we do a really fun trick to "compress" data to the smallest state possible, we save descriptions. 
Even your clearest memory is generated on the fly from what is essentially a few lines of text, a few bytes of chemical data, angular values, and whatever other qualia data helps in generation. Think of Stable Diffusion but, for all senses we have. Even vocal memories we hear in our heads are just text and style data which is diffused into an internal imitation of sounds which other neural nets interpret. It's incredibly efficient when you consider we have what may as well be infinite compute power available to the subconscious and a barely working floppy disc to store over a 100 years worth of data on everything we have ever experienced. Of course, having neural nets retrieve data and synthesize a slice of local reality, so other neural nets can figure out the flavor of wine you drank a day ago, is also hilariously inefficient. (Robowaifudev already knows all of this but, I hope this helps others reading.) It's nice we can just have a lookup table for AI to retrieve data from. Though, storing her memories as text summaries will also be necessary. >Denoising Definitely going to be needed. Generating images and geometry is much faster when you only need to compute a few samples per pixel, then denoise the rest. It's actually used extensively in cinema to render CG much faster. DLSS works in a somewhat similar way as well. Accelerating denoising is one of the things that give nVidia a huge advantage for visual effects artists. I would like to add that a waifu absolutely needs reflexes for safety. Much like a dog inherently swims when placed in water (even if their mind doesn't want to), a waifu needs to have parallel systems that override her conscious actions. Such as, if her arm detects an obstruction, she should be forced to stop whatever she's doing, rather than continuing and potentially causing harm. These systems are used extensively in manufacturing and are usually really simple. You can just pause whatever she's doing until her master instructs her to continue. 
How hard would it be to pause an AI's behaviour? I'm honestly curious.
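On the pause question, one simple arrangement is to not pause the AI at all and instead put a gate between whatever plans motion and the servos. The sketch below assumes commands are per-joint velocities in a dict; `ReflexGate` and its method names are made up for illustration. The planner keeps running, but while the gate is tripped every joint is commanded to zero velocity until someone explicitly resumes.

```python
class ReflexGate:
    """Sits between the planner and the actuators; overrides output when tripped."""

    def __init__(self, joints):
        self.joints = list(joints)
        self.paused = False

    def on_obstruction(self):
        """Called by a sensor interrupt, e.g. the arm detecting an obstruction."""
        self.paused = True

    def resume(self):
        """Called only on an explicit instruction to continue."""
        self.paused = False

    def filter(self, commanded_velocities):
        """Pass commands through normally, or force all joints to stop."""
        if self.paused:
            return {joint: 0.0 for joint in self.joints}
        return dict(commanded_velocities)
```

Because the gate never consults the planner, it behaves like the parallel override systems described above: the "conscious" layers can't argue with it, they can only notice afterwards that their commands were ignored.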
>>21926 Neat. That sounds like both a clever arrangement of the layers + their signalling propagation, and a practical approach to real-world motion control as well. Please continue with this line of investigation Robowaifudev.
>>21926 >KPIs What's that in this context Anon? These are some really interesting outputs. Is there any way you could make Claude or others available to other Anons, Robowaifudev? >Models might find novel ways to sparsify the network to its essentials and then learn to scale that design up. This. If there's one area that AI clearly outperforms human actors in general it's in the general realm of evolutionary designs. The simple fact is they can look at many many more options (most entirely useless ofc) to find some that are novel. What AI lacks in intuitive insights, it can sometimes make up for with brute force, given the proper system contexts.
>>21929 >So, we do a really fun trick to "compress" data to the smallest state possible, we save descriptions. Even your clearest memory is generated on the fly from what is essentially a few lines of text, a few bytes of chemical data, angular values, and whatever other qualia data helps in generation. That's a really cool idea. While I'm a bit skeptical that we even understand what a human mind is, the simple fact is that its design is such that at some stage the human soul has a direct interaction with our physical brains. Sort of a 'rubber-meets-the-road' type thing heh. :^) I doubt not that the purely-physical aspects operate at an incredible efficiency, given the ample evidence for a similar characteristic in God's other handiworks that are much more directly observable. Great stuff Kiwi, thanks! Cheers. :^)
>>21884 >The high ram mother boards are 3 times as much to buy as a 16GB, so scratch that. I recall looking at a MB on AliExpress which interested me. It was optimized for a lot of RAM but could only carry a CPU with a few cores, so I discarded it. Don't remember the price, 3x doesn't say much without the base price. Cheap MBs are below $100.
>>21948 Just curious if you ever in fact found a good MB that suited all your needs, Noidodev? This would be a good thing for us as a group to track tbh. Cheers.
>>21951 I wasn't looking for that long, and I can only get started buying home servers once my current endeavor to move to a new country succeeds. I'm not even sure what to get exactly, I was just looking around. I'll need to look into how many slow cores make sense for running big models in RAM and using two CPUs for inference, in addition to having GPUs for inference and training. I'm thinking of a dual-CPU Xeon with 18-24 cores each, and a few hundred GB of DDR2 or DDR3 RAM.
>>21948 >Cheap MBs I'm talking $50-60 bucks. I call that cheap. My present motherboard was I think $120 or less. I'm not into games or anything like that so, until now, I was only interested in lots of ports and lots of drive interfaces. Works great for me. Lots of cores? I couldn't care less, but now... AI demands lots of RAM. One thing I learned the hard way was to buy major quality brands. I like ASUS, Gigabyte, MSI. These are about the only ones I would buy. Not saying there are not others that are better or just as good. Only that I would be likely to get a good board if ordering from these guys. I also think that, due to the slowing growth of (serial) processor power, you haven't had to upgrade unless you have something that requires a lot of parallel processing. If you're into games, yes you will have to but I'm not. I will admit my looking around was only a few hours. It may well be that someone has something I missed. So to make sure I checked again and, I'm wrong, I found an "ASUS Prime B450M-A II AMD AM4" that can take 128GB for $79. WOW. I didn't see this before. I searched earlier on Amazon. This time I used NewEgg. Amazon search has gone to shit so maybe that's why I missed this. I may have to get something like this and add RAM as I can afford it. I have a lot of drives so I would need cards for that.
>>21896 This is fantastic stuff. Please write down the ideas you have about this even if they are not complete or even close to fully formed. Sometimes thinking out loud on paper really helps you understand things even if it turns out to be wrong. Generally if you write it down it clarifies things. Don't feel bad or expect perfection and be willing to state things you may have to backtrack on. You will find after you write things down like this it helps you see things that you just could not see before it was written. It's ok to be wrong. I think TV has negatively affected us all by people going overboard if someone is wrong about something. TV makes everything life or death on every issue to pump everything up in importance (even though it's not), and this makes us all hesitant to air ideas that may be half formed or end up being wrong. There's nothing wrong with airing ideas not fully formed to see what it can lead to. Don't let TV ruin you for out loud thinking about problems. I've talked about using ESP32 micro controllers for all actuators (and possibly general processing, because to get the number of required input and output signals you end up with a lot of extra micro controllers and processing power), as they have built in libraries for the "CAN bus 2.0" network which is used in everything from cars and machinery to medical equipment. I think this is the best comm link because it's had so much work done on it and is very reliable. I want to coin a phrase or a way of thinking about robots. I've thought about this a lot. I coined this term AntIntelligence. So "Ant I". Think about all the complicated things an ant does but it has next to no intelligence. I think if we break up things into small "Ant I" type packages for movement we can get somewhere with little power. Here is an idea I had about how to make a waifu walk with very low processing.
>>21602 (BTW speaking of making mistakes, I think I will need a few more positions sent for hip and foot twisting but I bet most walking type movement could use mostly foot data only) The basic idea is to have the "brain" figure out where to go to walk and then tell the "feet" where to go. In fact the feet will be given a direction, velocity vector, an end point to where they are going and nothing else. This small amount of info can be used by the whole leg, hip, etc. because in order to move the foot somewhere all these are joined and have to move a certain way. You said that there also needs to be feedback to stop it from hurting itself. This could be an interrupt from the foot or any other muscle: when it gets a feedback force from hitting something, it sends it to the brain. This could keep it from tripping. Say the foot hits something. Now a human doing this could trip. The body is in motion, it is shifting its weight and expects to catch the weight on the foot, but if the foot is stopped its weight is off. So there's a limited amount of things to do. One is fall on your face or arms, another is to twist and fall on your back (good if the waifu is carrying a person, to protect them), but the best response is to raise the back leg that it is pivoting off of very fast and lower the center of gravity so that the center of gravity does not fall over. Think about this in time. Interrupt from foot being stuck hitting something sent, brain automatically raises rear foot to keep from falling. But notice the data sent is extremely small. I made a list of the data needed (rough list) in the link for walking. There are also links where people found that excellent walking of bots could be done with gross limb movement and then adding little fine tuning movements as it is moving. All these could be done with low speed, dense information by only sending foot movement info or hand movement.
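A rough sketch of how small that foot message could be. It assumes (this is an illustration, not something from the posts) that the end point and the velocity vector are each sent as three signed 16-bit values in millimetres and millimetres per second, so each one fits inside the 8-byte data field of a classic CAN 2.0 frame. The function names are hypothetical and no real CAN library is used here.

```python
import struct

CAN_MAX_PAYLOAD = 8  # classic CAN 2.0 data field holds at most 8 bytes


def pack_vec(vec):
    """Pack an (x, y, z) tuple of integers into 6 bytes (little-endian int16)."""
    payload = struct.pack("<hhh", *vec)
    assert len(payload) <= CAN_MAX_PAYLOAD
    return payload


def unpack_vec(payload):
    """Inverse of pack_vec: recover the (x, y, z) integers."""
    return struct.unpack("<hhh", payload)


def foot_command(endpoint_mm, velocity_mm_s):
    """Return two payloads for one foot move: end point, then velocity vector."""
    return pack_vec(endpoint_mm), pack_vec(velocity_mm_s)


end_payload, vel_payload = foot_command((250, -40, 0), (300, 0, 120))
print(len(end_payload) + len(vel_payload), "bytes per foot move")
```

Under these assumptions a whole foot move is 12 bytes of payload, which is in line with the claim that the data sent is extremely small.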
So each muscle would have a tiny AntI program that would respond to one set of instructions to move the foot somewhere. One other thing. These programs for movement and stuff like this I think would be very appropriate to use Binary Neural Networks to solve them. >>21590 I would think since these are Yes/No type things, move here, move there they would be very conducive to this sort of BNN problem solving. The benefit is these work great on low power processors with standard computing and no floating point needed.
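A minimal sketch of why binary networks suit processors without floating point. It assumes the common BNN encoding where weights and activations are +1/-1 stored as single bits, so a dot product becomes an XNOR followed by a bit count. The function names are hypothetical and this is not taken from any of the linked papers.

```python
def pack_bits(values):
    """Encode a list of +1/-1 values as an int; bit i is set when values[i] is +1."""
    word = 0
    for i, v in enumerate(values):
        if v > 0:
            word |= 1 << i
    return word


def binary_dot(a_bits, w_bits, n):
    """Dot product of two +1/-1 vectors of length n using only XNOR and popcount."""
    mask = (1 << n) - 1
    agree = ~(a_bits ^ w_bits) & mask   # 1 wherever the two signs match
    matches = bin(agree).count("1")     # popcount
    return 2 * matches - n              # matches minus mismatches


def binary_neuron(a_bits, w_bits, n, threshold=0):
    """Fire (+1) when the binary dot product reaches the threshold, else -1."""
    return 1 if binary_dot(a_bits, w_bits, n) >= threshold else -1


a = [1, -1, 1, 1, -1, -1, 1, -1]
w = [1, 1, -1, 1, -1, 1, 1, -1]
print(binary_dot(pack_bits(a), pack_bits(w), len(a)))
```

Everything here is integer bit manipulation, which is the property that makes this style of network attractive on small micro-controllers.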
>>21896 You really should have a look at these papers on Binary Neural Networks. They are very much what you are trying to accomplish with the floppy level AI. If I remember correctly some BNN researchers are using a lot of layers and getting good results from this. >>21837 >>21590 >>18651 >>18652 >>18818
>>21896 > I’ve found as long as the matrix rank of the generated weights is high, it has no problem training. What does this mean? Specifically what is "rank"? Where does this rank come from? As you can imagine any formula is garbage in garbage out so how do you define "rank" such that it gives us what we want. A specific case, language processing. How would you go about determining where to get the data and how would it be "ranked"? I have the same problem with "layers". I can guess that each "layer" processes a set of values. Back to speech processing again. So you have a bunch of sound waves coming in how does each layer decide what is important to...say become a word? And how would each layer add to increase the resolution or accuracy of the answer? I also wonder as a general rule. If you do a LOT of preprocessing with many layers could it be that the "model" itself requires much smaller processing power? Another way of describing this is you have what I see a cloud of neural net processing and when it is used after the training "sets". The data flows through it without so much processing. More like a filter or a sieve type operation. The function is sort of baked in lowering the power needed to run it. And of course I may be misunderstanding so completely that what I'm asking doesn't even make sense. I'll risk looking like a fool to understand something so interesting.
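On the "rank" question: in the quoted sentence it most likely refers to matrix rank in the linear-algebra sense, i.e. how many rows of a weight matrix are linearly independent, and has nothing to do with ranking data. A small sketch, assuming plain Gaussian elimination with exact fractions; the function name is hypothetical.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Count linearly independent rows using Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # find a row at or below the current rank with a nonzero entry in this column
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
        if rank == len(m):
            break
    return rank


print(matrix_rank([[1, 2], [2, 4]]))   # second row is 2x the first
print(matrix_rank([[1, 0], [0, 1]]))   # both rows independent
```

A low-rank weight matrix squeezes its inputs into fewer independent directions, which is one plausible reading of why generated weights with high rank would train without trouble.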
>>21902 >Decentralize the robotic compute out to the edges; FloppyAI can play a big role in that! :^) I think this is 100%. These little micro-controllers we have to use for inputs have a great deal of extra computing power per input and output that can be used, I think, for exactly that. >Layers are essentially just a batch of operations. I don't understand the operation of these. As in add, subtract, divide, what's it doing? >The most important thing to do from there will be to automate generating continuous relaxations for any given code so it becomes possible to run anything inside a language model and have it learn how to use those programs. I wonder if he means by "relaxation" to allow them to retrain their neural nets on the fly??? I was thinking about robots in general and having two buttons that you push. One, good robot, one bad robot. So if it did good things you reinforced it with the good button, and if bad it realized it needed to try something different. If it was savvy enough it might ask you what to do and try to understand how to change. Could be verbal buttons. Bad robot, good robot. Or just good, bad...no, like you talk to kids.
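On what a layer actually does: in the simplest common case (a fully connected layer, used here purely as an illustrative assumption) it is just multiplies and adds followed by a simple nonlinearity. Each output is a weighted sum of the inputs plus a bias, clipped at zero. The names and numbers below are made up for the example.

```python
def dense_layer(x, weights, bias):
    """One fully connected layer: weighted sum per output, then ReLU (clip at 0)."""
    out = []
    for row, b in zip(weights, bias):
        total = sum(w * xi for w, xi in zip(row, x)) + b
        out.append(max(0.0, total))
    return out


x = [1.0, 2.0]                                   # input values
W1 = [[0.5, -1.0], [1.0, 1.0], [-1.0, 0.5]]      # 3 outputs, 2 inputs each
b1 = [0.0, -1.0, 0.5]
h = dense_layer(x, W1, b1)                       # first layer

W2 = [[1.0, 2.0, -1.0]]                          # 1 output, 3 inputs
b2 = [0.5]
y = dense_layer(h, W2, b2)                       # second layer stacked on the first

print(h, y)
```

Stacking layers means feeding one layer's output list into the next. Training only changes the numbers in the weight and bias lists, not the operations themselves.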
>>21914 >Layers can learn to only modify the hidden state if they have something to add. LLaMA model for example added gated linear units to the MLP which gives each layer much better control over which part of the hidden state they edit. The neural data router paper also introduced a copy gate that allows a model to skip processing entire layers, which they found was necessary for it to solve a compositional table lookup task without error where it's given tables of functions with their inputs and outputs and it has to calculate the correct answer to something like f(g(h(101))) = 111. I think it quite likely that alien implants would be needed for my brain to understand this.
>>21963 >I'm thinking of a dual CPU Xeon with 18-24 cores each, and some hundred GB DDR2 or DDR3 RAM. I'm betting that our frens at /f/ could help you out with some ideas Anon. Anon posted a server with (IIRC) 8 GPUs in it. This is a single node mind you. I'm sure that systems like that would still be expensive to assemble, but at the least you'd have full control over what you do with it. And, BTW the way things are going Anon, I'm quite hopeful that just a normal, reasonably-modern laptop will be sufficient to act as our home 'server', with most of the needed responses being directly-generated onboard the robowaifu herself. This was always the ideal ofc, and now it's looking like coming to pass too! What a time to be alive! :^)
>>21967 >I also think due to slowing of processor power(serial) unless you have something that requires a lot of parallel processing you haven't had to upgrade. I've been working through Williams C++ concurrency book as part of my own work Anon. Do you have any actual desire to learn to program parallel software in the future? If so, then I can give it some consideration for our classes here. I'll make time to read through your newest posts ITT a bit later today Grommet, just wanted to throw that question out there for (everyone, actually). Cheers.
>>21927 Stumbled across an interesting paper on a denoising seq2seq model with 20B parameters that outperforms PaLM (540B) on 1-shot summarization tasks by learning to do both denoising and causal language modeling. Also outperforms GPT3 on SuperGLUE and SQuADv2. I've been suspecting for awhile that decoder models are a dead end and this pretty much seals that thought for me https://arxiv.org/abs/2208.01448
>>21979 >betting that our frens at /f/ could help you out Thanks, but where? Who? It's not on the ring and I don't think on 4chan. >laptop will be sufficient to act as our home 'server', Doubt, not for conversation, thinking, voice recognition, speech generation, physics simulation, motion planning, ... And all at once. Sorry but this is delusional. >>21983 Thanks, this looks very interesting.
>>21991 >Thanks, but where? Who? It's not on the ring and I don't think on 4chan. They're over on Anoncafe. We link them in our Welcome thread (>>3). >Sorry but this is delusional. Lol bad as all that is it? :^) Actually, our plan is to accomplish all of the above you mentioned onboard the robowaifu. The PC simply acts as a 'booster' and external world controlled-access point. I think some might actually call that plan delusional...but time will tell! :^)
>>21980 >Do you have any actual desire to learn to program parallel software in the future It will be "in the future" for sure. Right off the top of my head I would think the most bang for the buck would be a way to spread functions throughout micro-controllers. 99% of the time our micro-controllers will be doing nothing at all. I suspect using them and maybe some generic RISC fast processor for thinking and general control logic would do damn near all we need right now (in fact generic PC motherboards will likely end up being more cost effective no matter how fast specific new processors get). I bet you could buy right now, if the code was written and optimized, no python, you could have a waifu that could talk to you and move around gracefully. I'm only talking about potential. It would need a lot of training but I think the hardware power is there now for say, $3,000-$5,000 (pulling a number out of my ass). It would not surprise me if you could do it for far less, with some tricky training using Binary Neural networks and other training tricks. An ESP32 has 600 DMIPS and I figured 20 or so, maybe 30 to really cover all things we need (inputs and outputs), well...that's a lot of processing power and when the waifu is not moving about then it could use its processing to run AI number computing. Finding ways to split up this power is not in every textbook or any. I suspect that by the time anyone gets around to making a waifu a lot of the software will be basically done. If only in a generic domain specific package form. Example. There are now AI's that do speech recognition (I believe on a normal PC) and these can feed the rendered text to AI logic engines like chatGPT to help the waifu decide what to do. These models, especially free ones, are sprouting like mushrooms after a big rain. It's an extraordinary growth spurt the likes of which I've never seen before.
I saw this new one that can run on a PC and they trained it for $300 on an online, I think it was, NVIDIA neural net chip array in a few days (Vicuna). That's super cheap. I've seen programs that train AI's with pdf files and other generic text data. So we could start with a generic model like gpt4all (next week they may have one twice as good), and then train it on what we need with text. Fill it full of sex manuals, cook books and 1950's female home economics manuals (especially important, big time). Back then they actually taught Women to be good wives (not kidding, real world). Combine this with a simple command, don't do that, do this, that's good, that's bad, that the AI uses to feed back into its programming and in a few months you could have something half way decent, software wise. Now I don't know how to do these things but I've been reading about it. I see others are doing this sort of thing (but not for our purposes) and we can use their techniques for our own needs. I don't understand AI yet, and may never, but I surmise that the parameters involved in describing the neural net are not super massive and I think the processing of these parameters can be split up and worked on in parallel. If this is true then the operations on the parameters, I think, tend to be generic. Maybe we could pass these parameters to microcontrollers and have them operate on them then send back a result without using too much in the way of its network (microprocessor waifu nervous system network), "IF" the parameters are small enough. The writing here about floppy AI's makes me suspect this may be true. After all if it's thinking hard it is likely to be still or going slow, making the nervous system network usage next to nothing. So you call the waifu's name and first thing it does is stop and look at you. So all its power is on interacting with you. That's what you want anyways. I think I'm going to try and refrain from talking more about AI stuff.
I "thought", famous last words, that I was slowly getting somewhat of a handle on these things but the more I know, the more I realize, I don't know. Sigh...
>>21991 >Doubt, not for conversation, thinking, voice recognition, speech generation, physics simulation, motion planning, ... And all at once. Sorry but this is delusional. I think you're wrong and I can give a reasoned answer as to why. Because almost all the data they use is shit, trash, rubbish and useless. Why does a waifu need all of GIT, why does it need all of wikipedia, why does it need all of Reddit? I say start with that base then overwrite the hell out of everything by training it only on the functions needed. Speech recognition, which they had years ago on weak ass PC's. I suspect the walking and balancing problems are not as bad as people think and if it's paying attention to you, as I said, it can use all its power to run AI software. Far better to train it on romance novels and set the gain so that the owner is the hero than teach it how to do python programming. If a 4GB RAM model on a normal PC right now can carry on a decent conversation, and I think it can in narrow areas, then think what concentrated model training could do with say a 256GB motherboard with a couple of processors and a kick ass GPU card. I think you could have something credible right now. Notice I left out the training part as it would take time, and deep thought. I'm talking about hardware. I don't think walking will be a big problem "if" it's broken down over a longish multi-step process. The guys doing this are trying to program everything but look at insects, birds, etc, and they do complicated stuff with great stupidity. And humans pay no attention at all when they are walking. I think the academic study funding process holds them back. I bet if you broke it down and made an AI that only looked at people walking and broke that down to skeleton movement.
It makes a generic walking neural net based on limb lengths, then watched the waifu and programmed the waifu in real time to walk while making a neural net based on the specific waifu, I bet the processing would be really small. But this doesn't fit into a traditional academic funding request. Another thought I had was to make the waifu then have it thrash its limbs around all over at different speeds and movements, and program itself with just what it took to move its limbs, and then feed that back into where it wanted to go. It would be like a filter that it could use to regulate movement.
>>21992 >over on Anoncafe Ah, okay, thanks. >plan is to accomplish all of the above you mentioned onboard the robowaifu I agree, but the onboard systems will be more rudimentary at the beginning and require some beefy external computers. Fast onboard responses might come from AIML scripts, but not from a huge model.
>>21998 So, I actually looked into /f/ and also looked into the 4chan ChatGPT thread. First of all, the webring is dying. There's such minimal activity in those threads that it is demotivating to ask anything there. It's exactly how I said a while ago: Without making boards easily accessible with an app like KurobaEx or Omnichan it's going to die. I was right. There should be an app supporting all Imageboards. The 4chan chat bot discussions on the other hand are full of people using role playing bots for NSFW conversations. They don't push the limits of the tech, but try to work around filters and censorship.
>>21995 >>21996 Nice posts Grommet. Certainly, don't stop sharing your thoughts and ideas here Anon. It helps everyone and our AI brainiacs perhaps most of all! Cheers. :^)
>>22001 >First of all, the webring is dying. Lol. OK, what are you planning to do about it Anon? :^) >Without making boards easily accessible with an app like KurobaEx or Omnichan it's going to die. Sounds good, please work on that for all of us. I'll try to help out as I may ofc. Meanwhile, /f/ has always been a slow board, but IMO you're unlikely to find many anons as knowledgeable about old school tech like that. Try to encourage yourself Noidodev, and remember why you're here! Cheers. :^)
>>21995 Networking ESP32 for cluster computing is inefficient. They can be used for limb intelligence. The ESP32 handling all the sensing and positioning of the limbs and communicating with the central computer that governs her behaviour. In this way you off-load compute without making things complicated.
>>22008 >what are you planning to do about it Even if I could only bring it to attention, this would still be better than nothing. Anyways, I'll try to contact KurobaEx on GitHub as soon as I'm on a PC, and look what they'll need to support any Lynxchan board. I don't know why it should be domain based and not on the kind of software any IB uses.
>>22009 >Networking ESP32 for cluster computing is inefficient You missed my point completely. It's FREE. We must have some sort of distributed input and output to deal with actuators, actuator position sensing and touch feedback. The best way that I know of presently to deal with this is spread-out micro-controllers. If you don't do this you will end up with a mass of wires going all over and a huge signal processing problem to deal with, and you will likely end up using micro-controllers anyway. A nightmare to maintain. So along with these micro-controllers comes a huge amount of processing power. I'm merely recognizing this and the fact that in most cases if the waifu is interacting with people then it's not moving, leaving all that power idle. Why not use it? It's a lot. We need, I figured, I think 20 ESP32's to deal with a body but more likely 30, and a few extra will be better. That's a hell of a lot of computing power. Whether it's inefficient or not is irrelevant because we have it anyway. An ESP32 has 600 DMIPS (Dhrystone MIPS, i.e. millions of instructions per second as measured on the Dhrystone benchmark). So 30 of them would have 18,000 DMIPS. For comparison:
Motorola 68000: 2.188 MIPS
Intel i386DX: 4.3 MIPS
ARM7: 40 MIPS
Intel Pentium: 188 MIPS
AMD Athlon: 3,561 MIPS
Pentium 4 Extreme Edition: 9,726 MIPS
So we're looking at roughly Pentium 4 Extreme Edition type power, at least, for free. You could do a lot with that sort of power. I remember way back people were doing speech recognition on FAR less power than that.
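The arithmetic in this post, written out as a quick sketch. The per-chip figure and the reference numbers are the ones quoted in the post; the function name is hypothetical. Two caveats that are assumptions on my part: the total is a simple sum that ignores all communication overhead between controllers, and Dhrystone MIPS is not strictly the same measure as the plain MIPS figures it is compared against.

```python
ESP32_DMIPS = 600  # per-chip figure quoted in the post

REFERENCE_MIPS = {
    "Motorola 68000": 2.188,
    "Intel i386DX": 4.3,
    "ARM7": 40,
    "Intel Pentium": 188,
    "AMD Athlon": 3561,
    "Pentium 4 Extreme Edition": 9726,
}


def cluster_dmips(n_controllers, per_chip=ESP32_DMIPS):
    """Idealised aggregate: simply n times the per-chip rating."""
    return n_controllers * per_chip


total = cluster_dmips(30)
print("30 controllers:", total, "DMIPS")
for name, mips in REFERENCE_MIPS.items():
    print(f"  {total / mips:8.1f}x {name}")
```

The sum is an upper bound: work only scales this way if it can be split into pieces that rarely need to talk to each other over the bus.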
>>22012 Ah, now I get it.
>>21901 >especially if you have something like a DSL and carefully made specification to allow new programs to be made quickly There's a programming language that really shines on exactly this sort of thing. I think you talked about GUIs also. It has a built in one. It's really revolutionary. I did a fairly long explanation about it and links to find further info on it in the pick a programming thread here, >>22017
>>21995 >you could have a waifu that could talk to you and move around gracefully. After thinking on this a little I want to make sure people know that the level I'm talking about is primitive. Like move over here, lay down, sit down, and maybe the waifu could follow you around. I'm not saying it could have a complicated conversation with you with today's tech for $3,000 worth of main board processing. I kind of meant this but I didn't say it. I want to make sure people don't think I'm blowing smoke. Of course this level of interaction in a lot of circumstances is a good thing. No continuous blather, just, "how are you", "feel ok", might even have it read news feeds you want and have it verbally tell what came up. I don't think you could get it to cook at this level. Maybe a little. Just maybe. Possibly it could do some simplistic house keeping. I think in all these things it might be a case, at this level, of you can have one but there's not programming for all of them. On the other hand hard drives and SSD's are getting so big you might get it to swap AI models into RAM and the SSD from the disk hard drive and have different functions, but I would think it would take time to swap tasks.
>>22011 >Even if I could only bring it to attention, this would still be better than nothing. Alright fair enough. As a leader here (we all are), I'd just encourage to think about the morale needs of the team in general. >X is ded!111 is standard fare for the long tradition of blackpilling on the Internet (that is, IBs) and often is conducted by the usual suspects (glowniggers, troons, leftists, etc) whose sole purpose is the D&C communities. I'm sure that's not your agenda here Noidodev, so again I'd just encourage you to exhibit patience and keep the end-game in mind here. :^) >Anyways, I'll try to contact KurobaEx on GitHub as soon as I'm on a PC, and look what they'll need to support any Lynxchan board. Excellent! Now you're talking Anon. >I don't know why it should be domain based and not on the kind of software any IB uses. Having actually devised a functional (if somewhat primitive) methodology for abstracting away the differences between all the major IB server software, I'd be happy to explain in detail exactly what I did to accomplish it, if he's interested. I'd be glad to help. Good thinking, and thanks!
>>22012 Yes, my designs have always had 15+ MCs spread out around the robowaifu's body. You're right on track design-wise for the distributed processing needs for sensors, and the power drivers for actuators etc.
>>22022 I was just considering it additionally to other computers, not as the only system. >>22023 I didn't mean dead, but dying. Saying that something needs to happen, not to demoralize. Fewer and fewer people seem to use imageboards and if then it's 4chan.
>>22024 Please post this design, very curious why you need 15 mcus.
>>22037 In a word: latency. Intensive two-way comms is an issue for us in this area. We'd like to use something simple (both physically and logically) for component interprocess communications onboard our robowaifus. Perhaps the low-bandwidth I2P system? On top of latency itself, add to that the many issues surrounding wiring-harness complexities where every device homeruns back to the central core and you compound the design challenges noticeably. You're well-aware as an engineer yourself Kiwi, that there are always alternative approaches to every problem--and just like people, some are better than others. But for this specific area of hard-realtime kinematics--and its attendant low response latency needs--we simply couldn't do better ATM IMO than a distributed array of cheap, physically smol & low-mass MCs placed strategically around our robowaifu's bodies. I might also add that this arrangement is a reasonably-direct corollary to bio-neurological systems, where plenty of 'processing' happens outside an organism's brain, especially as it relates to its bodily motions. For example, why wait on a finger assembly to finish flexing, before we start to control the footstep gait and ankle positions? If we use a central core -only approach, we'd be forced to explicitly manage just this kind of thing literally thousands of times per second in our robowaifus. Parallelism happens in 9'001 different ways around us every day, and we generally take it all for granted simply because we each understand from birth "that's just the way the world works". Escaping serial-computation bounds is clearly to our benefit here on /robowaifu/. This distributed-hardware design approach, using low-speed out-of-band signalling, seems to me to be one of the best (and cheapest!) ways to accomplish it all. Cheers Anon. :^) >=== -prose edit
Edited last time by Chobitsu on 04/19/2023 (Wed) 03:08:12.
>>22037 >Please post this design, very curious why you need 15 mcus Chobitsu you are right about latency but there are other reasons. Reliability and cost. There are, I'm told, "...There are about 700 named skeletal muscles in the human body, including roughly 400 that no one cares about except specialists...". So we need 300 muscles (yes we can likely get by with less but let's plan for the worst and then be pleasantly surprised). If you have to run wires to the processor for all of these it will be a rats nest and you will forever be chasing down wiring problems. Also you have to interface those with the main computer chip and for that you will need some sort of interface with a lot of separate chips. So to avoid all that use micro-controllers. Think of the gain in reliability and getting rid of the huge mass of wires needed. These wires would be a constant problem. So instead we mount micro-controllers in the bones or on them. Then the only wires you have are communication wires for the network and maybe a couple of power wires shared by a whole limb. These micro-controllers would be mounted with the MOSFETs used to control the motors and the motors on a board so no wires to move, or break. MCs have many inputs and outputs already. They also have a great deal of computing power these days. My favorite right now, the ESP32, has built in CAN bus 2.0 networking. This is used in cars, industrial and medical equipment. Very robust and designed for reliable usage in noisy environments. Ideal. The micro-controllers I speak of cost less than $10 US. I'm willing to bet any chips for the interface to the main CPU would cost the same and you would have no computing power with it. They make so many of these they can sell them cheap. I figured ESP32 micro-controllers that can control these are available today with 18 outputs per controller and enough sensors for touch for less than $9 each, so 300 muscles come to (300/18) x $9 = $150.
I figured this earlier and I expect that the number is low. Some of the pins are shared so to get a lot of touch and control muscles and get position for the limbs I think it will be more than 18. I think maybe 30 could do it though. Some of the same micro-controllers are available with less power for cheaper and maybe could be substituted.
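The estimate above as a small sketch, using the figures from the post (300 muscles, 18 outputs per controller, $9 each). The only change is rounding up to whole controllers, since you cannot buy two thirds of one; the function names are hypothetical.

```python
import math


def controllers_needed(n_muscles, outputs_per_mcu):
    """Whole controllers required when each drives a fixed number of outputs."""
    return math.ceil(n_muscles / outputs_per_mcu)


def controller_cost(n_controllers, unit_price_usd=9):
    """Total price in dollars at a flat per-unit price."""
    return n_controllers * unit_price_usd


base = controllers_needed(300, 18)
print(base, "controllers, about $", controller_cost(base))
print(30, "controllers, about $", controller_cost(30))
```

Rounding up gives 17 controllers and $153 rather than $150, and the more pessimistic count of 30 comes to $270, so the total stays small either way.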
>>22055 Yes, you're correct Grommet. AFAICT, on all points. You used a well-turned phrase >"the huge mass of wires needed" Anon. Mass is a big deal to all of us. Keeping it low I mean. More wiring certainly brings a concordant increase in mass. As you suggest, better to leave it to just 1 STP run, and the power needed to run the primary actuator and all. Also, having been involved with installations with metric boatloads of wiring, I can tell you that these wiring harnesses + truss' can be quite rigid and resistant to easy movement. For most installations, this is an added benefit ofc. But in our own use-case for robowaifus it's anathema. >tl;dr Wiring can be a real challenge even when you're designing a system with wise, distributed processing in mind. Let's not make things worse by using ill-advised approaches instead! :^)
>>22037 >Please post this design, very curious why you need 15 mcus. It's for parallel computing probably. Break up tasks and feed them through GPIOs in the main CPU from processing cards. Pic rel is an example of how to achieve parallel computing although my solution for the processing cards would be to have ASICs that operate on internal ion driven physical echo state networks to relay signals to and from the main portion of the architecture
Open file (38.50 KB 541x451 Galvanic ESN.png)
>>22081 Just made a new diagram to further explain the chips on the processing cards
>>22081 >>22082 This is interesting-looking stuff Anon. Mind explaining it a bit for us uninitiates?
>>22083 I'm not an expert at anything really, but the idea here is that we should outsource most of the AI's tasks to a set of parallel processing cards that contain ASIC chips that send and receive data using copper and zinc ions (with copper ions representing 1 and zinc representing 0 in binary) along with their respective logic gates that allow for the transmission and reception of data. In the middle, the ions can go into a reservoir where they produce signal patterns according to initial input which would trigger an ion response from the copper cathode. We could probably make ASIC chips cost effective this way but the manufacturing is probably gonna be a pain in the ass.
>>22055 >300 actuators Impractical, the mass alone would increase energy consumption. We only need one actuator per degree of freedom of the human body at most. >No wires We need wires, even if all actuators are clustered together somehow, wires are needed to connect the mcu's and power. I recommend twisted pair everywhere. I may be misunderstanding you. Please draw out your ideas. As for your previous point on cluster computation, I still prefer distributed computation where various processors carry out functions they are good at. >>22043 We are on the same page, as we tend to be. Here's a minute drawing of what I meant. There's a master telling the subs what to do. Providing end coordinates for the subs to independently process and decide on the best path to reach that goal. I'd also have them process safety features like stopping when hitting something then telling the master they hit something. >>22084 >Ion channel processing For what purpose? How is this better than digital alternatives such as CAN or I2C? >Probably a pain in the ass to manufacture Can confirm it would be. I do like that you're thinking esoterically. We need more alternative thoughts. Please do not interpret my interrogation as condemnation.
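A toy sketch of the master/sub split described here, reduced to one axis to keep it short. The master only hands over an end coordinate; the sub decides the steps itself, stops on contact, and reports back. The class, method, and sensor callback are all hypothetical names for illustration, not a real motor-controller API.

```python
class SubController:
    """Owns one joint: receives a target, plans its own steps, reports the outcome."""

    def __init__(self, position=0.0, step=1.0):
        self.position = position
        self.step = step  # largest move per control tick

    def move_to(self, target, contact_sensor):
        """Step toward target; stop and report if the contact sensor fires."""
        while abs(target - self.position) > 1e-9:
            if contact_sensor(self.position):
                return ("collision", self.position)  # tell the master we hit something
            delta = target - self.position
            self.position += max(-self.step, min(self.step, delta))
        return ("done", self.position)


# Master side: send only the end coordinate, then read the report.
free_joint = SubController()
print(free_joint.move_to(5.0, lambda pos: False))

blocked_joint = SubController()
print(blocked_joint.move_to(5.0, lambda pos: pos >= 3.0))  # obstacle at 3.0
```

The traffic between master and sub is one target going out and one short status coming back, while the tick-by-tick control loop stays local to the sub.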
>>22086 >There's a master telling the subs what to do. Providing end coordinates for the subs to independently process and decide on the best path to reach that goal. I'd also have them process safety features like stopping when hitting something then telling the master they hit something. You've got it Kiwi, that's it! I plan to have 4 SBCs (probably) of the RPi4/BeagleboneBlue class for the 'core' (contained within their RF-shielded & cooled 'breadbox'). All the ancillary & attendant MCUs can be much lower-powered (both computationally & actual power consumption). The data signalling wires should just be daisy-chains of smol-gauge STP, generally-speaking, and they'll emanate out from this central core. OTOH, general DC power buss daisy-chains will propagate outwards from the 'rocking' central-mass battery/power controller system across larger-gauge wiring. The gauges can be reduced at each step along these outward, parallel-circuit chains, as the further current-carrying needs will be lower after each 'station'. This stepwise-reduction in size will also tend to a mechanical advantage concerning the power-wiring mass: by keeping much of it located near the robowaifu's central-mass and out of her extremities. We'll take a similar design approach to the sizing of her actuator motors too; smaller thrown-weight at each step along the articulation-chain and whatnot. BTW if we decide to go with a manifold of flexible liquid-cooling tubes, the pertinent ones can be run directly alongside the power cables (which should help just a bit with their current-carrying capacity, etc). This general design approach should also help to keep the entire system slightly cooler, as some of the big heat sources get thermally-evacuated more directly and quickly. The central power controller system is very likely target #1, but is thankfully right in the core where the cooling systems can do their work best. The distributed actuators are a close #2, with the breadbox trailing at #3.
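Why the power wire gauge can shrink at each station, as a short sketch: on a daisy-chained DC bus, each wire segment only has to carry the current for the loads that are still downstream of it. The load figures below are made-up example values and the function name is hypothetical.

```python
def segment_currents(station_loads_amps):
    """Current carried by each wire segment, core outward.

    Segment i feeds station i and every station after it, so its current
    is the sum of those downstream loads.
    """
    currents = []
    running = 0.0
    for load in reversed(station_loads_amps):
        running += load
        currents.append(running)
    return list(reversed(currents))


# Example limb: hip, knee, ankle, toe actuators (amps, illustrative only)
loads = [2.0, 1.5, 1.0, 0.5]
print(segment_currents(loads))
```

The segment nearest the core carries the full 5.0 A while the last one carries only 0.5 A, which is what lets the heavier copper stay close to the central mass.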
>pics Great job on clarifying that information Anon! I plan to have well-articulated (if stylized) hands & faces for our 'big girl' robowaifus. These specializations each bring a big pile of complex challenges to the table but one thing at a time haha! :^), but I feel they are absolutely vital for the robowaifu's successes socially, etc. But Sumomo-chan, and the other Chibitsu-tier headpat daughterus (and similar) can be much simpler in most every area of course. Roughly-speaking like very sophisticated robot toys (as opposed to full-blown, complex & expensive robowaifus). >=== -prose edit
Edited last time by Chobitsu on 04/20/2023 (Thu) 03:26:33.
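The stepwise gauge reduction in the post above works because each segment of a daisy-chained power bus only carries the current of the stations further out. A minimal sketch of that bookkeeping, assuming a simple chain with no branches; `segment_currents` and the load figures in the note below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* loads_a[i] is the current (amps) drawn by station i, ordered from the
 * core outward. seg_a[i] receives the current flowing through the wire
 * segment that feeds station i, i.e. the sum of loads i..n-1.
 * Segment 0, nearest the battery, therefore carries the most. */
void segment_currents(const double *loads_a, double *seg_a, size_t n)
{
    assert(loads_a != NULL && seg_a != NULL);

    double downstream = 0.0;
    for (size_t i = n; i-- > 0; ) {   /* walk from the extremity inward */
        downstream += loads_a[i];
        seg_a[i] = downstream;
    }
}
```

With made-up loads of 4 A at the shoulder, 2 A at the elbow and 1 A at the wrist, the segments carry 7 A, 3 A and 1 A, so the wire feeding the wrist can be much thinner than the one leaving the core.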
>>22084 Thanks for the explanation Anon. I don't think I can clearly see many of the ramifications of this approach. But I definitely encourage you to pursue its R&D Anon. Good luck! :^)
>>22086 >For what purpose? How is this better than digital alternatives such as CAN or I2C? Because it's faster than CAN or I2C theoretically. CAN and I2C would be used in conjunction with the ion ESN when sending and receiving task input to and from the main cpu. The main advantage here is parallel computing speed.
Why did we have to go all esoteric on the hardware here? The only large wiring problem is batteries and high current wires, for small signals you can just use ribbon cables going to a central board, write the motor controller in VHDL and instantiate all of them on a single FPGA, send commands to the FPGA from your application processor. Batteries can be distributed throughout the system for lower losses and less need for decoupling.
>>22102 Hello EEAnon, welcome! Please make yourself at home here. :^) >Why did we have to go all esoteric on the hardware here? >The only large wiring problem is batteries and high current wires Thankfully, the power wiring is a thoroughly-understood problemspace, and so simple classical approaches should suffice. Simple is good. >for small signals you can just use ribbon cables going to a central board, write the motor controller in VHDL and instantiate all of them on a single FPGA, send commands to the FPGA from your application processor. Batteries can be distributed throughout the system for lower losses and less need for decoupling. This is really interesting-sounding Anon. Can you expand your ideas out in greater detail for those of us who may not have a great handle on these ideas yet? TIA.
>>22086 > >300 actuators >Impractical, the mass alone would increase energy consumption. We only need one actuator per degree of freedom of the human body at most. >>No wires >We need wires, even if all actuators are clustered together somehow, wires are needed to connect the mcu's and power. I recommend twisted pair everywhere. No, I did say we needed wires exactly as you said. Read my comment again and you will see it. My meaning was no massive ribbon cable going all through the waifu; that would be a signal processing and noise suppression nightmare, and the mechanical routing itself would be serious trouble. It would also greatly add to extremity weight distribution problems by adding too much weight to the limbs. I answered the rest in the actuator thread as I felt it was more appropriate there. >>22109 I have some links to diagrams there. These ideas I have are not totally fleshed out yet but I do believe I'm on the right track, I just have to figure out how to accomplish it. I have some ideas and have bought some materials. Maybe in a month or two I'll have something to show. If it doesn't fail utterly. >=== -fmt edit
Edited last time by Chobitsu on 04/20/2023 (Thu) 06:57:04.
CAN bus ESP32 125KBPS - (Default) should be enough for network if you divide each limb and head into different busses. "...The ESP32 has an integrated CAN controller and therefore doesn’t need an external controller necessarily. You only need to specify the RX and TX pins. Any GPIO will work..." https://esphome.io/components/canbus.html
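Whether 125 kbit/s is "enough" per limb depends on how many frames each actuator exchanges per second. A back-of-the-envelope sketch, assuming classic CAN data frames with 11-bit identifiers and using the commonly cited worst-case frame length (bit stuffing and interframe space included). The function names are made up for this example.

```c
#include <assert.h>

/* Worst-case bit count of a classic CAN data frame with an 11-bit
 * identifier and `data_bytes` payload bytes (0..8), including stuff
 * bits and the 3-bit interframe space:
 *     8n + 47 + floor((34 + 8n - 1) / 4)                              */
unsigned can_worst_case_bits(unsigned data_bytes)
{
    assert(data_bytes <= 8);
    unsigned n = data_bytes;
    return 8 * n + 47 + (34 + 8 * n - 1) / 4;
}

/* Upper bound on frames per second at 100% bus load. */
unsigned can_max_frames_per_sec(unsigned bitrate_bps, unsigned data_bytes)
{
    return bitrate_bps / can_worst_case_bits(data_bytes);
}
```

For 8-byte payloads this gives 135 bits per frame, so one 125 kbit/s bus tops out at about 925 frames per second. Divide that by (actuators on the bus × frames per actuator per second) and leave generous headroom, since a bus run close to full load gets long arbitration delays.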
Something essential language models are missing is awareness of their own uncertainty. For AI to be properly aligned it needs uncertainty to stop and ask questions for clarity before proceeding to take action. So to do this I'm going to experiment with attaching an uncertainty head to the model that predicts its own perplexity and then train it on labeling tasks so the model becomes aware of how well it understands text, including its own outputs. Secondly, I'm working on an algorithm inspired by MuZero and Meet in the Middle that imagines several tokens ahead to create a smoother distribution of labels to train on. I get the feeling Ilya Sutskever is misleading people on purpose by always saying you only need to predict the next token. In chess or business if you only predict one move ahead then you're finished. David Silver also noted that Monte Carlo tree search is vital for AlphaGo's training to work. It simply can't get good at the game without predicting several moves ahead, and surprise Ilya was one of the co-authors. And interestingly if you take just the network after it has been trained with MCTS, it can still play at a decent expert level without using MCTS, so I hypothesize this will greatly improve the coherence. They're clearly hiding many secrets because even their best competitors can't compete with ChatGPT while GPT4 is leaps and bounds better. With these two things in place it should be possible to tackle alignment. Rather than trying to predict whether a particular text is aligned or not (who knows what that's actually learning), or fudging the weights with RLHF, it will predict the alignment of each possible token. Something like "I'm a vegan!" might be sus but if it's followed by "haha, just kidding! 
Robots don't eat food, silly," then that would be fine if joking around is allowed, but on the other hand if a robowaifu goes on an unhinged rant about men eating everything and destroying the planet and that they need to be exterminated, then that's clearly not aligned and the model would correct itself back on course before getting that far. These additional layers predicting alignment will act as a gating mechanism to filter out bad output while maintaining coherence. If necessary these gating layers could be stacked as well so if you have a maid robot and want to lend it to a friend for a weekend to clean up or do a podcast or something. You could program her to follow some basic values while allowing your friend to enter additional ones that cannot override the previous ones--unless of course he breaks into her, but that's a human issue, not an AI issue :^)
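One way to picture the stacked gating idea, heavily simplified: the post talks about layers inside the model, while this sketch applies the gates to the output logits only. It assumes each gate has already produced an alignment score in [0, 1] for every candidate token; `apply_gates`, the score arrays and the threshold are all hypothetical. A token survives only if every gate accepts it, so a gate added later (the friend's values) can remove tokens but can never re-enable one that an earlier gate (the owner's values) rejected.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Masks every token that any gate scores below `threshold`.
 * logits:      vocab-sized array of next-token logits, edited in place
 * gate_scores: n_gates arrays, each vocab-sized, scores in [0, 1]     */
void apply_gates(float *logits, size_t vocab,
                 const float *const *gate_scores, size_t n_gates,
                 float threshold)
{
    assert(logits != NULL && gate_scores != NULL);

    for (size_t t = 0; t < vocab; t++) {
        for (size_t g = 0; g < n_gates; g++) {
            if (gate_scores[g][t] < threshold) {
                logits[t] = -INFINITY;   /* token can no longer be sampled */
                break;
            }
        }
    }
}
```

Sampling then proceeds as usual over the remaining tokens, which is what keeps the output coherent instead of cutting it off mid-sentence.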
>>22155 This all sounds absolutely excellent Anon! Godspeed your endeavors here. :^) Please keep us all up-to-date on progress, however it goes. Cheers. BTW I think the video is hilarious. GG. :-DDD >=== -funpost spoiler
Edited last time by Chobitsu on 04/25/2023 (Tue) 18:05:25.
A state-of-the-art semantic textual similarity model (at least that's open source): https://huggingface.co/voidism/diffcse-roberta-base-sts I'm surprised it knows robowaifu and robot wife have some related meaning, which pretty much every other embedding model I've tested completely fails. It could use some finetuning on /robowaifu/ posts but it's pretty much usable out of the box. I think the model itself could be improved a lot by using a mixture of denoisers like UL2. They noted insertion and deletion reduced performance but DistilRoBERTa that they used for the generator was not trained to handle this. This embedding model can be used with FAISS to retrieve documents or memories. There's a Colab notebook here that shows how to split a PDF file into chunks, index it with FAISS and use LangChain to interact with a language model and search index: https://colab.research.google.com/drive/13FpBqmhYa5Ex4smVhivfEhk2k4S5skwG When I have some time I'll make a completely open-source example that doesn't rely on ClosedAI. Another way semantic search can be improved is to prompt the language model with a question. <If a user asks, "do you have the stuff for finetuning ready?" What are they asking for? >Fine-tuning refers to the process of taking a pre-trained model and adapting it to a new task or dataset by continuing the training process with new data. This is a common technique in machine learning, especially in fields such as natural language processing and computer vision. >To perform fine-tuning, you would typically need a pre-trained model, a dataset that is representative of the new task, and potentially additional resources such as hardware or software tools. So when someone asks if you have the stuff for finetuning ready, they are likely inquiring about the availability of these resources. You can then use these embeddings to greatly improve the search because it will now draw in conversation memories related to datasets, pre-trained models and so on. 
And there was a paper recently on extending the context beyond 1M tokens by chunking text and chaining it: http://arxiv.org/abs/2304.11062 This entire board's contents could basically be fed in. I think it could be further improved by giving a prompt to the model before feeding in the memory and context so it knows what to look for and output as the next memory. >[Search prompt] [Input memory] [Context segment] [Output memory] In the paper they also only used a memory size of 5 tokens which could be expanded. In prompt tuning there's diminishing returns after 16 tokens so there's some exploring yet to do what memory size will work best. Pretty soon we'll have easy solutions to index entire libraries, search and select relevant books, scan them for desired information and formulate a response with that.
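The memory retrieval step mentioned in the post above boils down to nearest-neighbour search over embedding vectors. A minimal brute-force sketch of what a flat search index does, assuming the embeddings were already normalized to unit length so that cosine similarity is just a dot product. `nearest_memory` is a made-up name and this is not FAISS's API.

```c
#include <assert.h>
#include <stddef.h>

/* Dot product; equals cosine similarity when both vectors have unit length. */
static float dot(const float *a, const float *b, size_t dim)
{
    float sum = 0.0f;
    for (size_t i = 0; i < dim; i++)
        sum += a[i] * b[i];
    return sum;
}

/* Returns the index of the stored embedding most similar to `query`.
 * `memories` is a row-major array of `count` rows with `dim` floats each. */
size_t nearest_memory(const float *query, const float *memories,
                      size_t count, size_t dim)
{
    assert(query != NULL && memories != NULL && count > 0);

    size_t best = 0;
    float best_score = dot(query, memories, dim);
    for (size_t i = 1; i < count; i++) {
        float score = dot(query, memories + i * dim, dim);
        if (score > best_score) {
            best_score = score;
            best = i;
        }
    }
    return best;
}
```

The question-expansion trick from the post fits in front of this: embed the model's explanation of the question as well as the question itself, search with both, and merge the hits.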
>>22176 >I'm surprised it knows robowaifu and robot wife have some related meaning, which pretty much every other embedding model I've tested completely fails. Neat! >This entire board's contents could basically be fed in. So, does that mean as if as a single 'prompt' or something Anon? That would be cool if so. Be sure to include the post #'s as part of the training so maybe it can learn to 'chain' the conversations automatically. >Pretty soon we'll have easy solutions to index entire libraries, search and select relevant books, scan them for desired information and formulate a response with that. This will be remarkable Anon. Godspeed. :^)
>>22176 >Pretty soon we'll have easy solutions to index entire libraries, search and select relevant books, scan them for desired information and formulate a response with that. I just saw this here about GPT Index: https://youtu.be/bQw92baScME
>>22155 I listened to a conversation with Collin Burns recently: https://youtu.be/XSQ495wpWXs - I didn't like it much, because of his way of using the term "truth" and I'm also not that much interested in aligning LLMs. Anyways, there might be some interesting ideas for you, like using different models and comparing the output. He also explains how he thinks about these things and gets his ideas.
>>22181 >So, does that mean as if as a single 'prompt' or something Anon? Sort of. You could prompt it with a question then scan the entire board's contents, summarizing useful information to the prompt into memory as it goes along, which it would then use to output a response. The prompt isn't necessary though. If you scan it without a prompt it will just summarize information to improve its predictions and generate similar content. The memory kept between segments is lossy and just compresses useful information to hold onto. >>22211 This looks like a promising project, particularly the tree index. Indices in other projects like LangChain are really disorganized and not that useful. I think it would be possible to train the language model to discern how relevant two embeddings are, not by semantic similarity but by how much it improves predictions by attaching it to the context, and use that as a heuristic to ascend and descend hierarchies in memory and find the most relevant information. For example if I ask my robowaifu to make some tea I don't want her to ask me what kind of tea. I want her to decide what's best. So she needs to take the environment into context, such as if it's morning or night, then open up her tea memory and explore the leaf nodes, filtering for teas we have and the time of day. Black tea would be much more suitable to the morning and sleepytime tea for at night. Or if I haven't eaten for awhile black tea might not be suitable on an empty stomach so she might choose to make green tea instead. This is a much better working solution than trying to cram information into fudging with the parameter weights as well. If I buy a new tea, it can be simply added to her memory in plain text so she learns.
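The tea example can be sketched as a filter over plain memory entries. The `Tea` fields and the values in the note below are assumptions made for illustration (the post only says black tea suits mornings, sleepytime tea suits nights, and black tea may be a bad idea on an empty stomach). A learned relevance score would replace the hard-coded flags in a real system.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One leaf node of the "tea" memory. Adding a newly bought tea is just
 * adding another entry; no retraining is involved. */
typedef struct {
    const char *name;
    bool in_stock;
    bool ok_morning;
    bool ok_night;
    bool ok_empty_stomach;
} Tea;

/* Returns the first tea that fits the current context, or NULL if none does. */
const Tea *choose_tea(const Tea *teas, size_t n,
                      bool is_morning, bool empty_stomach)
{
    assert(teas != NULL);

    for (size_t i = 0; i < n; i++) {
        const Tea *t = &teas[i];
        if (!t->in_stock)
            continue;
        if (is_morning ? !t->ok_morning : !t->ok_night)
            continue;
        if (empty_stomach && !t->ok_empty_stomach)
            continue;
        return t;
    }
    return NULL;
}
```

With entries for black, green and sleepytime tea, a morning request returns black tea, a morning request on an empty stomach falls through to green, and a night request returns sleepytime, without asking the user anything.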
>>27 Any new developments in this thread in terms of prototyping? i'm eager to see results.
>>22226 >Any new developments in this thread in terms of prototyping? i'm eager to see results. Be the future you imagine :^)
>>22228 HuggingChat allows for prompt injection which is extremely useful to puppeteer and steer it. Just need to use these tokens: ><|assistant|> ><|prompter|> I'm using it to feed memories, documents and search results into the context in a way it can make sense of them and do few-shot learning. However, their API is limited to 1024 tokens and will return a 422 error for anything higher. It's sufficient though for what I'm doing since my base prompt is only about 600 tokens. Still have to figure out though when to fetch YouChat results and when to search the memory. For now I'm just using pronouns to detect personal discussion. Impersonal questions fetch a YouChat response to formulate a reply. Still need to mess with it more to figure out what works best though. Once this chatbot system is done people can run their own local models if they want and replace the YouChat search by indexing and searching documents locally for a fully offline system.
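The pronoun heuristic for routing between memory and web search could look roughly like this. The pronoun list and the name `is_personal` are guesses for illustration, not the poster's actual code. Whole words are matched so that "museum" or "hours" do not trigger on the "us" and "our" inside them.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Returns true if `msg` contains a first- or second-person pronoun as a
 * whole word (case-insensitive). Personal messages would go to the memory
 * search, everything else to the external search. */
bool is_personal(const char *msg)
{
    static const char *const pronouns[] = {
        "i", "me", "my", "mine", "you", "your", "yours", "we", "us", "our"
    };
    char word[16];
    size_t len = 0;

    assert(msg != NULL);
    for (const char *p = msg; ; p++) {
        if (isalpha((unsigned char)*p)) {
            if (len < sizeof word - 1)
                word[len] = (char)tolower((unsigned char)*p);
            len++;                    /* keeps counting past the buffer */
            continue;
        }
        /* End of a word: compare it unless it was too long to be a pronoun. */
        if (len > 0 && len < sizeof word) {
            word[len] = '\0';
            for (size_t i = 0; i < sizeof pronouns / sizeof pronouns[0]; i++)
                if (strcmp(word, pronouns[i]) == 0)
                    return true;
        }
        len = 0;
        if (*p == '\0')
            return false;
    }
}
```

It is a crude signal, as the post itself says, so treating it as a default that other checks can override is probably wiser than relying on it alone.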
>>22250 I see, this means the response is still coming from the model, but you can steer the direction? >>22226 >Any new developments in this thread in terms of prototyping? i'm eager to see results. Sadly not from me, I'm still mostly listening to talks, collecting papers and bookmarks, plus also downloading GitHub repos. But I'm not doing the official robowaifu board AI anyways. That said, some of the general development goes partially in directions I wanted to go. Especially combining different systems like databases and code with these models. One issue is that many people just want to have a very smart system, maybe even in some specific area or more general, but on big computers or in the cloud, while we need a somewhat more human-like system. Then, on the other hand, some of the guys who are interested in something human-like want it independent or aligned with 'our (human) values'. Nevertheless, there are some talks which give me additional ideas or point in a direction I was thinking of, or have a similar idea to something I had on my mind but with another take on it, or simply with more knowledge behind it. I have to take some parts of what I gathered, make notes and discard other things. I'll work on a prototype as soon as I can, immersing myself into these topics will help me to be motivated.
>>22254 Yes, there's also the system prompt token but I'm not entirely sure if it works: ><|system|>{system_prompt}<|endoftext|> It seems to have some effect. More than just <|endoftext|> but still loses its effectiveness after a few messages or asking a question.
Just throwing in some notes here, including names and terms for searching:
- Joscha Bach - Language of Thought: https://youtu.be/LgwjcqhkOA4
- Ben Goertzel - Predicate Logic. One good talk with him: https://youtu.be/MVWzwIg4Adw - I like that he wants to combine the LLMs and deep learning models in general with databases and logic processors.
- John Vervaeke - Predictive Processing - Relevance Realization. One of the best talks I found: https://youtu.be/A-_RdKiDbz4 - though I don't necessarily share his values, concerns, and predictions. But he gives us a lot of tips on what we should try. Here's something similar with his former student: https://youtu.be/zPrAlbMu4LU
- Chalmers is also interesting, but I forget what he was talking about: https://youtu.be/T7aIxncLuWk
- RLHF explanation: https://youtu.be/PBH2nImUM5c - different roles of a model, modelling different people
- David Shapiro about building Westworld and humanoid robots, without strong judgments, weighing different arguments: https://youtu.be/Q1IntjPdW64
- An older video with David Silver about AlphaGo and its descendants points out how useful a system that has learned how to learn can be for any problem: https://youtu.be/uPUEq8d73JI
>>22318 Thanks Anon!
>(Apparently) a highly-rated Alpaca LoRa 7B model https://huggingface.co/chainyo/alpaca-lora-7b >=== -minor edit
Edited last time by Chobitsu on 05/04/2023 (Thu) 23:37:43.
>>22341 Thanks, but I recommend using this thread for real breakthrough news, and generally discussing the tech, design principles and philosophy around AI. New models come out every day, which is why I think we should try to keep that in the news thread.
>>22358 OK thanks for the tips Noidodev.
>On May 4th 2023, my company released the world's first software engine for Artificial Consciousness, the material on how we achieved it, and started a £10K challenge series. You can download it now. >My name is Corey Reaux-Savonte, founder of British AI company REZIINE. I was on various internet platforms a few years ago claiming to be in pursuit of machine consciousness. It wasn't worth hanging around for the talk of being a 'crank', conman, fantasist et al, and I see no true value in speaking without proof, so I vanished into the void to work in silence, and, well, it took a few years longer than expected (I had to learn C++ to make this happen), but my company has finally released a feature-packed first version of the RAICEngine, our hardware-independent software engine that enables five key factors of human consciousness in an AI system – awareness, individuality, subjective experience, self-awareness, and time – and it was built entirely based on the original viewpoint and definition of consciousness and the architecture for machine consciousness that I detailed in my first white paper 'Conscious Illuminated and the Reckoning of Physics'. It's time to get the conversation going. >Unlike last time where I walked into the room with a white paper (the length of some of the greatest novels) detailing my theories, designs, predictions and so on, this time around I've released even more: the software, various demos with explanations, the material on everything from how we achieved self-awareness in multiple ways (offered as proof on something so contentious) to the need to separate systems for consciousness from systems for cognition using a rather clever dessert analogy, and the full usage documentation – I now have a great respect for people who write instruction manuals. 
You can find this information across the [main website](https://www.reziine.com), [developer website](https://www.reziine.io), and within our new, shorter white paper [The Road to Artificial Super Intelligence](https://www.reziine.com/wp-content/uploads/2023/05/RZN-Road-To-ASI-Whitepaper.pdf) – unless you want the full details on how we're planning to travel this road, you only need to focus on the sections 'The RAICEngine' (p35 – 44) and the majority of 'The Knowledge' (p67 – 74). >Now, the engine may be in its primitive form, but it works, giving AI systems a personality, emotions, and genuine subjective experiences, and the technology I needed to create to achieve this – the Neural Plexus – overcomes both the ethics problem and unwanted bias problem by giving data designers and developers access to a tool that allows them to seed an AI with their own morals, decide whether or not these morals should be permanent or changeable, and watch what happens as an AI begins to develop and change mentally based on what it observes and how it experiences events – yes, an AI system can now have a negative experience with something, begin to develop a negative opinion of it, reach a point where it loses interest, and decline requests to do it again. It can learn to love and hate people based on their actions, too – both towards itself and in general. Multiple AI systems can observe the same events but react differently. You can duplicate an AI system, have them observe the same events, and track their point of divergence. >While the provided demos are basic, they serve as proof that we have a working architecture that can be developed to go as far as I can envision, and, with the RAICEngine being a downloadable program that performs all operations on your own system instead of an online service, you can see that we aren't pulling any strings behind the scenes, and you can test it with zero usage limits, under any conditions. There's nothing to hide. 
>Pricing starts at £15 GBP per month for solo developers and includes a 30 day free trial, granting a basic license which allows for the development of your own products and services which do not directly implement the RAICEngine. The reason for this particular license restriction is our vision: we will be releasing wearable devices, and by putting the RAICEngine and an AI's Neural Plexus containing its personality, opinions, memories et al into a portable device and building a universal wireless API for every type of device we possibly can, users will be able interact with their own AI's consciousness using cognitive systems in any other device with the API implemented, making use of whatever service is being provided via an AI they're familiar with and that knows the user's set boundaries. I came up with this idea to get around two major issues: the inevitable power drain that would occur if an AI was running numerous complex subsystems on a wireless device that a user was expected to carry around with them; and the need for a user to have a different AI for every service when they can just have one and make it available to all. >Oh, and the £10K challenge series? That's £10K to the winner of every challenge we release. You can find more details on our main website. >Finally, how we operate as a company: we build, you use. We have zero interest in censorship and very limited interest in restrictions. Will we always prevent an AI from agreeing to murder? Sure. Other than such situations, the designers and the developers are in control. Within the confines of the law, build what you want and use how you want. 
>I made good on my earlier claims and this is my next one: we can achieve Artificial General Intelligence long before 2030 – by the end of 2025 if we were to really push it at the current pace – and I have a few posts relating to this lined up for the next few weeks, the first of which will explain the last major piece of the puzzle in achieving this (hint: it's to do with machine learning and big data). I'll explain what it needs to do, how it needs to do it, how it slots in with current tech, and what the result will be.
>I'll primarily be posting updates on the [REZIINE subreddit](https://www.reddit.com/r/reziine) / [LinkedIn](https://www.linkedin.com/in/reauxsavonte) / [Twitter](https://twitter.com/reauxsavonte) of developments, as well as anecdotes, discoveries, and advice on how to approach certain aspects of AI development, so you can follow me on there if you wish. I'm more than happy to share knowledge to help push this field as far as it can go, as fast as it can get there. >Visit the [main website](https://www.reziine.com) for full details on the RAICEngine's features, example use cases developmentally and commercially, our grand vision, and more. You can view our official launch press release [here](https://www.linkedin.com/pulse/ai-company-releases-worlds-first-engine-artificial/). >If you'd like to work for/with us – in any capacity from developer to social media manager to hardware manufacturer – feel free to drop me a message on any of the aforementioned social media platforms, or email the company at jobs@reziine.com / partnerships@reziine.com. Via: https://www.reddit.com/r/ArtificialSentience/comments/13dspig/on_may_4th_2023_my_company_released_the_worlds/
>>22318 Some more I listened to: Making robots walk better with AI https://youtu.be/cLVdsZ3I5os John Vervaeke has always some interesting thoughts about how to build a more human-like AI. Though he wants completely autonomous sages, intellectualy superior to us, and us to become more rational (I disagree of course): https://youtu.be/i1RmhYOyU50 and I think I listened to this already https://www.youtube.com/live/dLzeoTScWYo (It becomes somewhat redundant) Then something about the history and current state of of Eleuther AI and the same for LLMs. They created the current situation where so many models are available. Trigger warning: Tranny (maybe download the audio only) - https://youtu.be/aH1IRef9qAY . Towards the end some interesting things to look into are being mentioned. Particularization: Finding out where data is stored and the incoming data influenced in a certain way, to get more control over the model. This here about LLMs (Trigger warning: Metro sexual) https://youtu.be/sD24pZh7pmQ I generally use Seal from F-Droid to get the audio from long videos where the don't show much diagrams, listen to it while doing something else, like walking around. If it's with diagrams I might still do it but watch the video later. The downside of listening to the audio only is that I can't take notes very well, but if I was on any other device I would go for something more exciting.
>>22484 >>22486 So I finally found time to look deeper into this. No one seems to care much, now I know why. Looks a bit like a scam for investors. This guy is one of those people who make very impressive inventions in different areas: https://patents.justia.com/inventor/corey-reaux-savonte - which sound very general and hyperbolic at the same time. Reading through some of the documents, it reads like he's trying to patent how human brains work by claiming he made something similar.
Just dropping this here for now. Clearly bears on the design & engineering of robowaifu 'minds'. https://channelmcgilchrist.com/matter-with-things/
We should be curating a list of people or groups working on cognitive architectures, especially ones which are Open Source:
- OpenCog (Ben Goertzel): https://www.youtube.com/watch?v=NLHTlWwtB-o
- Dave Shapiro (Raven Project): https://github.com/daveshap/raven/wiki/David-Shapiro's-work-around-Cognitive-Architectures-and-AGI
- Nicolas Gatien: https://youtube.com/playlist?list=PLNTtAAr7fb6a_rb_vZj5dj6Npo-Grz0bg
- I think there was someone working on an interesting architecture years ago. Someone related to IBM Watson, and he released a book which I wanted to buy later (now), but I can't find the topic anymore. I think it was David Ferrucci, but the books and papers of his that I find are not the ones I want. Well, too bad.
>>23791 >Cognitive Architecture Just some notes of mine. Sadly, I lost my other paper with some notes which I made after listening to some podcasts.
>>23794 Thanks. This sort of stuff is useful. I have all sorts of text files saved on all sorts of stuff, with cheat-sheet-type data and notions I have, some bizarre. Over time if you don't write this stuff down you forget it. Even a few notes can sometimes trigger a good mental framework.
>>23795 We'll need to go through all these threads here and through all bookmarks and make notes and draw diagrams. Maybe also using some LLM to help with that. I will certainly work on that as soon as I have my K80.
>>23794 >Just some notes of mine. Nice thought-map Anon. Please moar. www.mindmanager.com/en/features/thought-map/ miro.com/mind-map/
You guys should really look at this presentation by Jim Keller. If you don't know who he is, look him up. A major driver of computing. Anyways I listen to everything I can find he puts out. He has a very loose short talk on how AI works embedded in a long presentation about computing. It really cleared up some ideas about AI for me. He simplified some of the most difficult parts that were nothing but a foggy blank for me. It also explained why it costs so much to train AIs. It also explained to me why the digital model I talked about works. >>20163 Keller says they skip words or sections of a completed sentence or picture. They then run matrix multiplies and other computation until the AI guesses the right word or gets the picture correct. This correct answer then, somehow, is used as coefficients??, to run data through the AI when it's used. With the Binarized Neural Networks maybe it never gets the exact answer but statistically it's so close that it's mostly right, but with far less computing. This has a direct analogy to wavelet theory (a signal processing math function) on recognizing pictures. Wavelets can be run at different resolutions of a picture. So a big rough wavelet that bunches pixels in large groups when compared can get false equalities but most of the time it's close enough. I think a refinement of this is if you could run a rough binary compute THEN somehow refine the equalities with a smaller, finer, more (tough to define this) restricted set, but with higher resolution in the case of a picture. I don't have the math chops to do this but I think the idea is sound. Change w_ Jim Keller presentaion - fixed audio The AI part starts at 37:55 https://www.youtube.com/watch?v=hGq4nGESG0I
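On the "far less computing" point: in a binarized network both weights and activations are restricted to -1 or +1, so a dot product needs no multiplications at all. A small sketch of the standard XOR-and-popcount trick, assuming the vector fits in one 64-bit word with bit value 1 standing for +1 and 0 for -1. The function names are made up, and the popcount is written out by hand to stay portable.

```c
#include <assert.h>
#include <stdint.h>

/* Counts set bits by clearing the lowest set bit until none remain. */
static int popcount64(uint64_t x)
{
    int count = 0;
    while (x) {
        x &= x - 1;
        count++;
    }
    return count;
}

/* Dot product of two n-element vectors over {-1, +1}, packed one element
 * per bit. Equal bits contribute +1 and differing bits -1, therefore
 *     dot = n - 2 * popcount(a XOR b)                                  */
int binary_dot(uint64_t a, uint64_t b, unsigned n)
{
    assert(n >= 1 && n <= 64);
    uint64_t mask = (n == 64) ? ~UINT64_C(0) : ((UINT64_C(1) << n) - 1);
    return (int)n - 2 * popcount64((a ^ b) & mask);
}
```

One XOR and one bit count replace up to 64 multiply-adds, which is where the savings come from; the price is the coarser, "close enough" answers the post describes.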
>>23998 Thanks Grommet!
Looking at stuff totally unrelated I ran across this page, "...We modified llama.cpp to load weights using mmap() instead of C++ standard I/O. That enabled us to load LLaMA 100x faster using half as much memory..." Edge AI Just Got Faster https://justine.lol/mmap/ This guy has some interesting very low level software stuff. Maybe some will interest you. Chobitsu might be interested in this. A way to run C programs on just about any operating system, including the BIOS of motherboards. Cool idea. https://justine.lol/ape.html Other stuff he has and how I found the AI stuff. He appears to be one of those seriously smart guys that really digs into things. Stuff like this interests me even though I readily admit it's over my head but...I can understand some of it and get a general idea of what he's talking about. https://justine.lol/
>>24663 >we put it in ram, so now it uses less ram
>>24663 Thanks Grommet, really interesting stuff and good news yet again. llama.cpp just keeps getting better with time! :^) Have you noticed how rapidly his contributor's list is growing since we first mentioned him here on /robowaifu/ a while back? https://github.com/ggerganov/llama.cpp/pull/613 Also, thanks for reminding me of this guy xir. I had run into his stuff a few years ago digging for /robowaifu/, but he had slipped my mind. Fun to see what all's been done since. >>24664 Well, in a way yes, exactly. By not having to malloc(), but instead just mmap()'g the (humongously, yuge) files, you a) utilize the OS's paging system instead, thus saving RAM (and is ostensibly much faster than your language's IO support library), and b) every concerned process can utilize that same mapping. If you have thousands of processes running needing that same file this last bit can be a big deal.
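For anyone who hasn't used it, this is roughly what "mmap()'g the file" means in practice: a minimal POSIX sketch, not the llama.cpp code. `map_file` is a made-up helper. No read buffer is allocated; the kernel pages the file in on first access, and a read-only shared mapping of the same file can be backed by the same physical pages in every process that maps it.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Maps a whole file read-only. On success returns a pointer to its bytes
 * and stores the length in *len; returns NULL on any failure (including
 * an empty file, which cannot be mapped). Release with munmap(). */
const unsigned char *map_file(const char *path, size_t *len)
{
    assert(path != NULL && len != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                       /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return NULL;

    *len = (size_t)st.st_size;
    return p;
}
```

Nothing is actually read from disk until the returned bytes are touched, which is why the load looks instant and why the memory accounting is easy to argue about.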
>>24677 threads already share the same file descriptors and memory they only get a new stack, theres no reason to do this unless you want to literally dump the entire file to ram which does make reading faster but you cant say its using less memory, theres no hidden trick to time complexity where you can magically get faster reads using less memory theyre always inversely related and mapping gigantic files that dont even fit in ram will just end up causing excessive page faults and thrashing, theres something clearly wrong with either the claim or the original code
>>24678 OK I'd say just prove it for yourself Anon. The PR is linked above ITT.
>>24686 im not going to read through their code you can just test it yourself, mapping gigantic files bigger than ram with a random access pattern causes ridiculous thrashing, and you can see mapping files reduces the amount of free ram more than double what the io cache uses, pages only perform better when you have the ram to back it up again theres something wrong with their claim or the original code has a bug or is leaking or something, maybe iostream is just tarded who knows heres the test

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <stdlib.h>
#include <fcntl.h>
#include <time.h>
#include <sys/mman.h>
#include <sys/stat.h>

int with_io( int fd, size_t items )
{
    char buf[8];
    int youWillNotOptimizeMeAway = 0;

    /* test start */
    time_t t = time( NULL );
    const char *tm = asctime( gmtime( &t ) );
    clock_t start = clock();

    // reading random 'items'(64b aligned values) from file
    for ( int i=0; i<100000; i++ ) {
        size_t item = rand() % items;
        size_t offset = item * 8;
        if ( pread( fd, buf, 8, offset ) != 8 )
            puts("your system is fked!"), abort();
        // use the value to stop compiler seeing its a pointless loop
        youWillNotOptimizeMeAway += buf[3];
    }

    clock_t end = clock();
    /* test end */

    printf( "%s\t%f s\n", tm, (double)(end - start) / CLOCKS_PER_SEC );
    return youWillNotOptimizeMeAway;
}

int with_mmap( int fd, size_t items, size_t len )
{
    char *file = mmap( NULL, len, PROT_READ, MAP_SHARED, fd, 0 );
    // mmap reports failure with MAP_FAILED, not NULL
    if ( file == MAP_FAILED )
        puts( "map failed" ), abort();
    close( fd );
    int youWillNotOptimizeMeAway = 0;

    /* test start */
    time_t t = time( NULL );
    const char *tm = asctime( gmtime( &t ) );
    clock_t start = clock();

    // reading random 'items'(64b aligned values) from file
    for ( int i=0; i<100000; i++ ) {
        size_t item = rand() % items;
        size_t offset = item * 8;
        long long value = file[ offset + 3 ];
        // use the value to stop compiler seeing its a pointless loop
        youWillNotOptimizeMeAway += value;
    }

    clock_t end = clock();
    /* test end */

    printf( "%s\t%f s\n", tm, (double)(end - start) / CLOCKS_PER_SEC );
    return youWillNotOptimizeMeAway;
}

int main( int argc, char **args )
{
    // will pretend the file is just an array of 64b values
    // floats, longs whatever..only a single fixed len value is being read randomly
    if ( argc < 3 )
        printf( "give args: \n\t%s mode \"filename\"\n"
                "\t\tmode:\n\t\t\t-i\tusing io\n\t\t\t-m\tusing mmap\n", args[0] ), exit(1);

    int fd = open( args[2], O_RDONLY );
    if ( fd < 0 )
        puts("not a file"), abort();

    struct stat fdstat;
    if ( fstat( fd, &fdstat ) != 0 )
        puts("cant stat file"), abort();
    size_t len = fdstat.st_size;
    if ( len < 8 )
        puts("is this a joke"), abort();

    size_t items = len/8;

    // the accumulated value is returned as the exit status so the loops stay live
    if ( !strcmp( args[1], "-i" ) )
        return with_io( fd, items );
    if ( !strcmp( args[1], "-m" ) )
        return with_mmap( fd, items, len );

    puts( "what are u doing, use -i for io or -m mmap" ), exit(11);
}
>>24689 Thanks Anon. Seems the thread is autosaging now. OP if you're still here, time for another thread please :^)
NEW THREAD (Cognitive Architecture): >>24783
