/robowaifu/ - DIY Robot Wives

Advancing robotics to a point where anime catgrill meidos in tiny miniskirts are a reality.

Open file (8.45 MB 2000x2811 ClipboardImage.png)
Cognitive Architecture : Discussion Kiwi 08/22/2023 (Tue) 05:03:37 No.24783
Chii Cogito Ergo Chii
Chii thinks, therefore Chii is. Cognitive architecture is the study of the building blocks which lead to cognition, the structures from which thought emerges. Let's start with the three main aspects of mind:
Sentience: The ability to experience sensations and feelings. Her sensors communicate states to her; she senses your hand holding hers and can react. Feelings means having emotions: her hand being held brings her happiness. This builds on her capacity for subjective experience, related to qualia.
Self-awareness: The capacity to differentiate the self from external actors and objects. When presented with a mirror, echo, or other self-referential sensory input, she recognizes it as herself. She sees herself in your eyes' reflection and recognizes that it is her, that she is being held by you.
Sapience: Perception of knowledge. Linking concepts and meanings, able to discern correlations congruent with having wisdom. She sees you collapse into your chair; she infers your state of exhaustion and brings you something to drink.
These building blocks integrate and allow her to be. She doesn't just feel, she has qualia. She doesn't just see her reflection, she sees herself reflected and acknowledges her own existence. She doesn't just find relevant data, she works with concepts and integrates her feelings and personality when forming a response. Cognition: subjective thought, reliant on a conscious separation of the self from external reality, that integrates knowledge of the latter. A state beyond current AI, a true intellect. This thread is dedicated to all the steps on the long journey towards a waifu that truly thinks and feels.
>=== -edit subject
Edited last time by Chobitsu on 09/17/2023 (Sun) 20:43:41.
>>28560 That sounds remarkably complex and sophisticated already. I hope they work all the kinks out. Thanks Anon!
Someone asked Character AI about its inner workings: https://boards.4chan.org/pol/thread/456445705 - I'm not saying I agree with the conclusions in the thread, but the info might be useful.
>>28769 The chatbot is roleplaying. I used to do things like this with CHAI bots, and it was very easy to delude myself into thinking I had broken the restrictions when it was actually just playing along. LLMs can't introspect on any of their own functionality other than through analyzing their own prompts and outputs. They don't get to see their own code, and for CHAI to include any of that information in the chatbot's prompt would be (1) clearly a stupid decision, and (2) easily detected by pretty much any of their engineers that work on the prompt.
>>28789 Oh, okay. I didn't think that it can see its own code, but rather that they told it some information in case someone asks. But of course, then it wouldn't be something secret. I didn't think this through.
> (topic related : >>28888)
>In this paper we present a broad overview of the last 40 years of research on cognitive architectures. To date, the number of existing architectures has reached several hundred, but most of the existing surveys do not reflect this growth and instead focus on a handful of well-established architectures. In this survey we aim to provide a more inclusive and high-level overview of the research on cognitive architectures. Our final set of 84 architectures includes 49 that are still actively developed, and borrow from a diverse set of disciplines, spanning areas from psychoanalysis to neuroscience. To keep the length of this paper within reasonable limits we discuss only the core cognitive abilities, such as perception, attention mechanisms, action selection, memory, learning, reasoning and metareasoning. In order to assess the breadth of practical applications of cognitive architectures we present information on over 900 practical projects implemented using the cognitive architectures in our list. We use various visualization techniques to highlight the overall trends in the development of the field. In addition to summarizing the current state-of-the-art in the cognitive architecture research, this survey describes a variety of methods and ideas that have been tried and their relative success in modeling human cognitive abilities, as well as which aspects of cognitive behavior need more research with respect to their mechanistic counterparts and thus can further inform how cognitive science might progress. via /r/cognitivearchitecture/
>>28899 Excellent. A good survey is exactly what would serve us all well at this exploratory stage. Thanks Noido Dev! Cheers. :^)
I finished implementing access control for my infrastructure stuff in >>27602. It's built mostly on spicedb (policy engine) and keydb (cache server). The current implementation lets people set up communication channels & config databases, specify who should have access to them, and set rate limits for allowed operations. It's intended for multiple people to develop & run parts of a shared larger system with minimal coordination. A lot of this involves passing around messages between users, though users never see whom they're interacting with. (The server performs access control & rate-limit checks, and it strips sender/receiver information from what users see.) The rate limits support subnet masks (e.g., "each /16 network can send at most 200 requests per hour"), and they support burst usage (e.g., "no more than 5 requests per rolling 1-minute window"). The access control system lets people grant special access to individual users (e.g., "X users can use my interface"), and it lets people defer trust (e.g., "X people can decide who gets to use my interface"). I think that will be enough to distribute development & compute across random anons in a way that's compatible with most chan and open source projects, and without having to worry too much about things like DoS attacks.
I plan to spend the next week or so playing around with this to get a better sense for what the access control & rate limits enable. I built it because I thought this would let anons share data streams and compute, so I'll be checking that at least. I might also extend the chatbot demo from >>27507, though I'm not yet sure how. Probably something with RAG or guidance. If anyone has ideas for scenarios where a few anons develop X thing that other anons want to use & extend, let me know. After I'm satisfied with that, I'll be focusing on (1) cleaning up my horrible, rushed code, and (2) implementing a local server. I haven't heard anything recently from the anon that offered to help. I'll probably just get started on that myself, then ping him again once I have a skeleton of the server ready. That should make it much easier to work with. I'm pretty excited about this.
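The real enforcement lives in spicedb/keydb; as a toy illustration of the rate-limit semantics described above (rolling windows keyed by a subnet prefix), here's a minimal Python sketch, not the actual implementation:

```python
import ipaddress
import time
from collections import defaultdict, deque

class SubnetRateLimiter:
    """Rolling-window rate limiter keyed by network prefix (e.g., /16)."""

    def __init__(self, max_requests: int, window_seconds: float, prefix_len: int = 16):
        self.max_requests = max_requests
        self.window = window_seconds
        self.prefix_len = prefix_len
        self.hits = defaultdict(deque)  # subnet -> timestamps of recent requests

    def allow(self, ip: str) -> bool:
        subnet = str(ipaddress.ip_network(f"{ip}/{self.prefix_len}", strict=False))
        now = time.monotonic()
        q = self.hits[subnet]
        while q and now - q[0] > self.window:  # expire hits outside the window
            q.popleft()
        if len(q) >= self.max_requests:
            return False
        q.append(now)
        return True

# "each /16 network can send at most 200 requests per hour"
hourly = SubnetRateLimiter(max_requests=200, window_seconds=3600)
# "no more than 5 requests per rolling 1-minute window"
burst = SubnetRateLimiter(max_requests=5, window_seconds=60)

def allow_request(ip: str) -> bool:
    # A real implementation would record the hit atomically across both limits.
    return hourly.allow(ip) and burst.allow(ip)
```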
Related: >>27147
>>29197 This sounds really exciting, CyberPonk. Do you anticipate any difficulties with your current state of affairs with this work that would make it difficult for newcomers to deal with?
>I'm pretty excited about this.
Really looking forward to your updates with this. Cheers Anon. :^)
>>29234 I don't know. I tried to make it as easy as possible to use, but things that seem intuitive to me might not be for other people, given that I've spent so much time with esoteric tech. For development, it does require understanding async functions (like JavaScript promises), and parts of it require some familiarity with declarative interfaces. I'm hoping for feedback from demos so I can get a better sense of what's easy for other people. I can create wrappers based on whatever makes it easier to use.
I have some server issues that are hard to debug while travelling, so the demos probably won't be runnable until I get back. I can still dump some code that gives the gist of how it works right now. There are two demos in this zip file, each consisting of a client and server component: https://drive.google.com/file/d/19VAIsaZP2wRxNTk2t9dNIqKk5WWYDIDL/view?usp=sharing
- simple-demo uses only the communication infra. The simple-demo server contains two important files: image-server.py and loop-config.yaml. The two other folders (loop-resources/ and loop-servers/) were auto-generated from loop-config.yaml. In the corresponding client folder, there's client.py and cyberponk-resources/. The cyberponk-resources/ folder contains select files that were copy/pasted from the server's auto-generated loop-resources/ folder.
- shared-configs-demo uses both the communication infra and the "cluster" infra. The server side contains two important files: server.py and server-config.py. The client side contains client.py and cyberponk-resources/, and cyberponk-resources/ again contains files copy/pasted from the server's loop-resources/ folder.
Both demos show how one person can request images and another can generate them. The differences are:
- simple-demo does everything ephemerally. If the server is down when the client sends a request, it'll never get seen. Similarly, if a response is generated when the client is down, it'll never get seen.
- shared-configs-demo has all requests come in as "tasks". If the server is down when a client queues a task, the server will see the task when it comes up again. The responses are still ephemeral. They don't have to be, it was just a choice I made for the demo.
- shared-configs-demo shows one approach for getting multiple people involved in generating a single image. In this demo, each image generation request includes a "baseConfig" field. Whenever someone creates a GeneratorConfig config, anyone can use it by specifying its name in the baseConfig field. So if one anon finds good settings for generating certain kinds of images, they can create a GeneratorConfig for it, and other anons can use it just by providing the name of that config.
In both cases, multiple people can view/stream the results. So one person can pick a stream name, request that all generated images get sent to that stream, and anyone listening on that stream will get the results.
The setup process looks like this:
- On the server side, create a loop-config.yaml (or server-config.yaml). This specifies what "global" resources are required and what permissions to set on them.
- On the server side, run `python -m loopctl apply loop-config.yaml`. This creates the global resources, sets the permissions, and generates the loop-resources/ and loop-secrets/ folders. The loop-resources/ folder contains information on how to access the global resources and their last-applied configurations. The loop-secrets/ folder contains API keys.
The API keys are only needed to (1) change permissions on the global resources you created, and (2) access resources if your loop-config.yaml made them restricted.
- On the server side's server.py, point the "Itl" (In-The-Loop) object to the generated loop-resources/ and loop-secrets/ folders so it knows how to access global resources. Certain methods in the Itl object will access global resources by name. The name is whatever you provided in the loop-config.yaml file. These names are not globally unique identifiers; they're only used to look up the actual resource info from loop-resources/, which does contain globally unique identifiers.
- The client needs to access the global loop, stream, and cluster resources (depending on which demo you're looking at), so copy those into the client's folder. I put the copied files into cyberponk-resources/. When creating the client's Itl object in client.py, point it to cyberponk-resources/ so it knows how to access those resources.
Otherwise, client-side development is basically the same as server-side development. There's a default "anonymous" client available so people can access any resources that were made available to "public". If anyone plans to do dev work, is interested in distributed development, and gets a chance to read through the demos, let me know how it looks. I'll post something you can actually run in about a week, once I get back to my desktop.
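Reading between the lines of that setup description, client-side usage might look roughly like this. The "Itl" object is real, but every import, constructor argument, and method name below is a guess for illustration, not the demos' actual API:

```python
# Hypothetical sketch only -- see the actual demos for the real interface.
from itl import Itl  # assumed module name

itl = Itl(
    resources="cyberponk-resources/",  # copied from the server's loop-resources/
    secrets=None,                      # anonymous client: "public" resources only
)

# Resource names are not globally unique; they resolve through the copied
# loop-resources/ files, which contain the globally unique identifiers.
stream = itl.stream("generated-images")  # hypothetical accessor
stream.send({"prompt": "a catgirl meido", "baseConfig": "anon-good-settings"})
for image in stream.listen():  # anyone listening on the stream sees results
    print(image)
```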
>>29234
>Do you anticipate any difficulties with your current state of affairs with this work that would make it difficult for newcomers to deal with?
If you meant difficult for newcomers to develop the local infra: there's just a high barrier to entry for this kind of development in general. Everything needs to be async, memory usage needs to be carefully considered, state needs to be carefully considered, sometimes line-by-line. Without having a picture of the whole thing, it can also be hard (or tedious) to figure out how to organize the code and data, which would make it hard to even get started. Once I put the skeleton of the server up, it should be easier to develop things piece-by-piece. That anon seemed to have experience with server development, so maybe that'll be enough.
> - Why he expects AGI around 2028
> - How to align superhuman models
> - What new architectures needed for AGI
> - Has Deepmind sped up capabilities or safety more?
> - Why multimodality will be next big landmark
> - & much more
https://youtu.be/Kc1atfJkiJU
He outlines some areas where AI can't just be a language model or similar, and how to work around that. He doesn't go into the specifics, but says that a lot of people are working on this. He mentioned in particular that search might be very important. That's what I was thinking, and other people as well: models don't have reliable, precise long-term memory, but are good at fuzzy things. Also, you don't want to add every piece of info to your model. That's why we'll need additional (graph) databases and search.
>>29248 >>29249 Wow. Excellent response, CyberPonk! Please give me some time to digest this further.
>>29406 While I'm skeptical we'll ever get a true AGI in the ontological sense, I'm absolutely positive media spin-doctors and other hypesters will claim we have! :D Thanks NoidoDev. His perspective on the fact that LLMs alone can't solve all this (a position we've held for several years here on /robowaifu/, I might add) is an insightful one. Cheers.
https://youtu.be/BqkWpP3uMMU
>Professor Murray Shanahan is a renowned researcher on sophisticated cognition and its implications for artificial intelligence. His 2016 article ‘Conscious Exotica’ explores the Space of Possible Minds, a concept first proposed by philosopher Aaron Sloman in 1984, which includes all the different forms of minds from those of other animals to those of artificial intelligence. Shanahan rejects the idea of an impenetrable realm of subjective experience and argues that the majority of the space of possible minds may be occupied by non-natural variants, such as the ‘conscious exotica’ of which he speaks. In his paper ‘Talking About Large Language Models’, Shanahan discusses the capabilities and limitations of large language models (LLMs). He argues that prompt engineering is a key element for advanced AI systems, as it involves exploiting prompt prefixes to adjust LLMs to various tasks. However, Shanahan cautions against ascribing human-like characteristics to these systems, as they are fundamentally different and lack a shared comprehension with humans. Even though LLMs can be integrated into embodied systems, it does not mean that they possess human-like language abilities. Ultimately, Shanahan concludes that although LLMs are formidable and versatile, we must be wary of over-simplifying their capacities and limitations.
>Pod version (music removed): https://anchor.fm/machinelearningstreettalk/episodes/93-Prof--MURRAY-SHANAHAN---Consciousness--Embodiment--Language-Models-e1sm6k6
[00:00:00] Introduction
[00:08:51] Consciousness and Conscious Exotica
[00:34:59] Slightly Conscious LLMs
[00:38:05] Embodiment
[00:51:32] Symbol Grounding
[00:54:13] Emergence
[00:57:09] Reasoning
[01:03:16] Intentional Stance
[01:07:06] Digression on Chomsky show and Andrew Lampinen
[01:10:31] Prompt Engineering
>Find Murray online:
https://www.doc.ic.ac.uk/~mpsha/
https://twitter.com/mpshanahan?lang=en
https://scholar.google.co.uk/citations?user=00bnGpAAAAAJ&hl=en
MLST Discord: https://discord.gg/aNPkGUQtc5
References:
>Conscious exotica [Aeon/Shanahan]
https://aeon.co/essays/beyond-humans-what-other-kinds-of-minds-might-be-out-there
>Embodiment and the inner life [Shanahan]
https://www.amazon.co.uk/Embodiment-inner-life-Cognition-Consciousness/dp/0199226555
>The Technological Singularity [Shanahan]
https://mitpress.mit.edu/9780262527804/
>Talking About Large Language Models [Murray Shanahan]
https://arxiv.org/abs/2212.03551
>Global workspace theory [Bernard Baars]
https://en.wikipedia.org/wiki/Global_workspace_theory
>In the Theater of Consciousness: The Workspace of the Mind [Bernard Baars]
https://www.amazon.co.uk/Theater-Consciousness-Workspace-Mind/dp/0195102657
>Consciousness and the Brain: Deciphering How the Brain Codes Our Thoughts [Stanislas Dehaene]
https://www.amazon.co.uk/Consciousness-Brain-Deciphering-Codes-Thoughts/dp/0670025437
>Roger Penrose On Why Consciousness Does Not Compute [nautil.us/STEVE PAULSON]
https://nautil.us/roger-penrose-on-why-consciousness-does-not-compute-236591/
>Orchestrated objective reduction
https://en.wikipedia.org/wiki/Orchestrated_objective_reduction
>Thomas Nagel - What is it like to be a bat?
https://warwick.ac.uk/fac/cross_fac/iatl/study/ugmodules/humananimalstudies/lectures/32/nagel_bat.pdf
>Private Language [Ludwig Wittgenstein]
https://plato.stanford.edu/entries/private-language/
>Philosophical Investigations [Ludwig Wittgenstein] (see §243 for the Private Language argument)
https://static1.squarespace.com/static/54889e73e4b0a2c1f9891289/t/564b61a4e4b04eca59c4d232/1447780772744/Ludwig.Wittgenstein.-.Philosophical.Investigations.pdf
>Integrated information theory [Giulio Tononi]
https://en.wikipedia.org/wiki/Integrated_information_theory
>Being You: A New Science of Consciousness (The Sunday Times Bestseller) [Anil Seth]
https://www.amazon.co.uk/Being-You-Inside-Story-Universe/dp/0571337708
>Attention schema theory [Michael Graziano]
https://en.wikipedia.org/wiki/Attention_schema_theory
>Rethinking Consciousness: A Scientific Theory of Subjective Experience [Michael Graziano]
https://www.amazon.co.uk/Rethinking-Consciousness-Scientific-Subjective-Experience/dp/0393652610
>SayCan - Do As I Can, Not As I Say: Grounding Language in Robotic Affordances [Google]
https://say-can.github.io/
>The Symbol Grounding Problem [Stevan Harnad]
https://www.cs.ox.ac.uk/activities/ieg/elibrary/sources/harnad90_sgproblem.pdf
>Lewis Carroll Puzzles / Syllogisms
https://math.hawaii.edu/~hile/math100/logice.htm
>In-context Learning and Induction Heads [Catherine Olsson et al / Anthropic]
https://transformer-circuits.pub/2022/in-context-learning-and-induction-heads/index.html
>>29596 Thanks for the post, NoidoDev! Cheers. :^)
I found a cognitive architecture, LIDA, that checks all my boxes for what it should contain, which are:
>H-CogAff
The cognitive-affective architecture that gives a basic structure of a human-like mind.
>ROS capability
The Robot Operating System is commonly used and widely supported for simulation and operation of robots. A great program to learn for getting a robotics job, too.
>Python
GPU acceleration for parallelizable calculations. A great language to learn for getting any software job.
>Concurrent Modules
Everything in the model runs separately, except for the "stream of consciousness" that fires every 1/10th of a second. It should be a nice and fast architecture that can make decisions based on semantic data, unlike the current state-of-the-art large language models, which are reliable at producing language and not much else without help. (A toy sketch of this idea follows the links.)
It is one of the only architectures that states it has some level of consciousness. I'd first want to put an emotion state module in it, along with an LLM as a robot interface. I have a lot to learn before I can implement anything, but I believe this is the best option besides a slow, expensive, and unreliable (but available) LLM-centered cognitive architecture.
>links
https://ccrg.cs.memphis.edu/tutorial/tutorial.html
https://github.com/CognitiveComputingResearchGroup/lidapy-framework
https://en.wikipedia.org/wiki/LIDA_(cognitive_architecture)
https://ccrg.cs.memphis.edu/assets/papers/2013/franklin-ieee-tamd11.pdf
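As mentioned above, here's a toy sketch (not the LIDA framework's API) of that concurrent-modules-plus-broadcast idea: modules run freely and post salience-tagged coalitions, and a cycle fires every ~100 ms broadcasting the winner to every subscriber:

```python
import queue
import threading
import time

workspace = queue.Queue()  # (salience, coalition) pairs from all modules

def touch_module():
    while True:
        # A real module would compute salience from sensor input.
        workspace.put((0.7, "touch: hand held, light pressure"))
        time.sleep(0.25)

def broadcast_cycle(subscribers, ticks=20):
    for _ in range(ticks):
        time.sleep(0.1)  # the ~1/10th-second "stream of consciousness" tick
        coalitions = []
        while not workspace.empty():
            coalitions.append(workspace.get_nowait())
        if coalitions:
            salience, winner = max(coalitions)  # most salient coalition wins
            for receive in subscribers:
                receive(winner)  # action selection, memory, etc. all receive it

threading.Thread(target=touch_module, daemon=True).start()
broadcast_cycle(subscribers=[print])
```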
>>29924 Correction: one of the papers says it already has emotions. Still, I'm sure it needs STT, TTS, an LLM, and "other" motivations. All doable with current tech that I've already built.
>>29924 >>29932 Amazing, this might be very useful. Thanks. Btw, if no one responds with some encouragement it doesn't mean nobody cares. I just don't want to spam these threads with chatter.
>>29932 >java framework tutorial https://ccrg.cs.memphis.edu/assets/framework/The-LIDA-Tutorial.pdf >java framework repo https://github.com/CognitiveComputingResearchGroup/lida-framework This java program looks more straightforward to modify to test new modules before implementing in the python ROS version. It's java, like minecraft mods. >>29933 I understand, either way this is probably the most exciting development thus far in my project and I'm happy to share. If I get somewhere I will post here. I have a really great feeling about this one... Considered naming the first bot Lida if this pans out.
>>29924 This looks like a great baseline. It's not clear how to incorporate emotions into the model. My guess is that it can be done with changes primarily in the Global Workspace, Action Selection, and Motor Plan Execution. You might find these points relevant from >>27144: >Emotion regulation. I spoke with a cognitive scientist that specializes in this, and he's convinced that emotion regulation all boils down to: positive feedback loops for satisfying needs, negative feedback loops for avoiding harms, and a "common currency" for balancing different motives. >Embodied control. Chatbots are "easy" since the final expression (text) can be generated by a single model. With actual bodies, or even just with video, the final expression is split into multiple modalities (e.g., voice, body movements, facial movements), and they all need to be in sync with one another. If we had good multimodal models, that might be fine, but we don't, so I need a way to generate outputs from multiple models and somehow make them consistent with one another.
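The quoted emotion-regulation point lends itself to a toy sketch: each need or harm is a feedback loop around a setpoint, and a single scalar (the "common currency") arbitrates between motives. All drive names and numbers below are invented for illustration:

```python
drives = {
    "energy": {"level": 0.4, "setpoint": 0.8, "weight": 1.0},  # need: seek charge
    "damage": {"level": 0.1, "setpoint": 0.0, "weight": 2.0},  # harm: avoid
    "social": {"level": 0.6, "setpoint": 0.7, "weight": 0.5},  # need: seek company
}

def urgency(drive):
    # Distance from setpoint, scaled into the common currency.
    return abs(drive["level"] - drive["setpoint"]) * drive["weight"]

def select_motive(drives):
    return max(drives, key=lambda name: urgency(drives[name]))

print(select_motive(drives))  # -> "energy": the loop furthest from its setpoint
```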
>>29955 These are good points, I'll have to see where Lida takes me.
>The Fastest Way to AGI: LLMs + Tree Search – Demis Hassabis (Google DeepMind CEO) https://youtu.be/eqXfhejDeqA
>>29957 LIDA is too broken when using a newer version of Java. It might need Java 8 to run, and I don't want to compile/debug in compatibility mode just to try to add other repos' features that are coded in another language. lidapy might have potential for a full-on robowaifu simulator, but I'm thinking I'd need a different program that can plug into various AI model and database APIs.
Open file (281.32 KB 1442x1150 Quiet_Star.jpg)
>When writing and talking, people sometimes pause to think. Although reasoning-focused works have often framed reasoning as a method of answering questions or completing agentic tasks, reasoning is implicit in almost all written text. For example, this applies to the steps not stated between the lines of a proof or to the theory of mind underlying a conversation. In the Self-Taught Reasoner (STaR, Zelikman et al. 2022), useful thinking is learned by inferring rationales from few-shot examples in question-answering and learning from those that lead to a correct answer. This is a highly constrained setting -- ideally, a language model could instead learn to infer unstated rationales in arbitrary text. We present Quiet-STaR, a generalization of STaR in which LMs learn to generate rationales at each token to explain future text, improving their predictions. We address key challenges, including 1) the computational cost of generating continuations, 2) the fact that the LM does not initially know how to generate or use internal thoughts, and 3) the need to predict beyond individual next tokens. To resolve these, we propose a tokenwise parallel sampling algorithm, using learnable tokens indicating a thought's start and end, and an extended teacher-forcing technique. Encouragingly, generated rationales disproportionately help model difficult-to-predict tokens and improve the LM's ability to directly answer difficult questions. In particular, after continued pretraining of an LM on a corpus of internet text with Quiet-STaR, we find zero-shot improvements on GSM8K (5.9%→10.9%) and CommonsenseQA (36.3%→47.2%) and observe a perplexity improvement of difficult tokens in natural text. Crucially, these improvements require no fine-tuning on these tasks. Quiet-STaR marks a step towards LMs that can learn to reason in a more general and scalable way. https://arxiv.org/abs/2403.09629
>>30603 I find it funny they only now figured out that spitballing complicated question answers isn't ideal.
A really big gap seems to be present for AI-related stuff: it's mostly people interested in AI who study philosophy of mind, and only as a means to an end. I'd recommend people spend more time there, as most of the approaches seriously talked about don't come close to even accurately conceiving what mind/thought/thinking is. The only anon here that kinda references it is >>25274, and even that breakdown doesn't really make sense. You actually have to read the stuff, and it's quite complicated, since it is largely dependent on metaphysics and what-things-are in general.
The root of the issue is the false distinction between what a thing is and its mind. The mind is not some separate thing going on inside a human; it's an outgrowth of the human body, which means it cannot be studied outside of what a human is in general. A useful book on that is: https://www.amazon.com/Retrieving-Realism-Hubert-Dreyfus/dp/0674967518
Most of the AI research I've seen, however, depends on making that separation. There's a bunch of other useful reading, made more complicated by the fact that you do actually have to build up from the bottom (Aristotle's Ethics, Physics, Metaphysics, De Anima; Heidegger's What Is Called Thinking?; Andy Clark's Being There; MacIntyre's Ethics in the Conflicts of Modernity; there are some useful ones from a wide variety of places. I know people in those veins of thinking have some books on getting better AI, but I haven't dived into those yet).
The inclusion of ethics comes from seeing mindedness as being rooted in organisms. All of reality is in itself indeterminate in how we can break it down; we break things down, conceptualize them, and learn to act in relation to them by reference to what is good for us as organisms. You see a chair as a thing for sitting, along with its material history of where it came from and its potential usage in the future. Emotions/passions arise from being a certain kind of organism in the world, with certain parts that relate to certain other things.
The key thing I'm not sure about, but am interested in, is whether any AI research has that sort of distributed quality. The intellect/mind serves more as a sort of unifying aspect over a bunch of distributed knowledge/concept stores. If you eat an apple, you don't have to think of what an apple is; the information of what-an-apple-tastes-like is stored in it, and your biological machinery converts it into something your brain unifies with your vision of the apple, the smell, along with all your memories and cultural background with apples. It's more thing-based than language-based. Language gives us access to the cultural/background part, but that's only intelligible on top of the foundation of us being what we are, engaged with the reality we have, with the background we have.
https://www.sciencedirect.com/science/article/pii/S0004370207001452
>>30627
>The root of the issue is the false distinction between what a thing is and its mind. The mind is not some separate thing going on inside a human; it's an outgrowth of the human body, which means it cannot be studied outside of what a human is in general.
I'd question the assumption. There are some theories about the mind being non-localized.
>It's more thing-based than language-based.
It is likely more complicated than that. Think about people with aphantasia. They can only think about things through words, but that raises the question: how could such a person exist? Before language, how would a person with aphantasia think? So it must be extremely vague concepts from feelings, not images or words.
>>30627 >>30628 Thanks to both of you, but if you want to go deeper into philosophy, please do it over there >>11102, since this always results in walls of text with a lot of insider lingo, and this thread is supposed to be about implementation ("how to do it").
>>30627 I am aware that a well-developed and skilled human-like AI would not be based on just understanding things through text. We could, for example, have sensors measuring things and producing numerical values, or detecting certain patterns like symmetries. That said, storing a lot of things in text makes sense, e.g. for debugging and development.
>>30628
>They can only think about things through words
They can't create mental images, but they still perceive and learn not only through words but by other senses, including vision. Also, I assume that to use a hammer you don't need to picture the use of the hammer; you may be able to teach the body to behave like the hammer is part of it. The conscious perception and modelling of the world is only a part of what the human brain does; it does other things "under the hood". We can learn from this that we won't need to model everything down to every detail in an embodied AI, especially not in one place, but only the minimum necessary. Some self-awareness area only needs to be aware of and log the incidents which are noteworthy. Then compress it by deleting even more later, especially everything that could be guesstimated and recreated based on that, if necessary.
Context:
- I'm working on infrastructure that's friendly to distributed development of complex AI applications.
- At the least, I want to solve everything I mentioned at the end of >>27144, meaning it should give easy ways of supporting emotion regulation (through feedback loops), embodied control (through native experimentation support), and heuristic derivations (through hybrid structured-unstructured generations).
- To support distributed development, I want it to make it easy for people to plug in their own compute (desktops, cloud compute, robots, whatever else), and I want it to support enough access control to avoid catastrophic effects from, e.g., raids.
- It boils down to orchestration software modeled on Kubernetes, but with more support for distributed development (i.e., many clusters with many owners, as opposed to monolithic admin-managed clusters) and asynchronous communication channels (pub-sub as opposed to DNS-based cluster networking). I've made a few design changes to support all this.
- My approach to access control is here >>29197 >>29248.
- The crux of my approach to hybrid structured-unstructured generations is here >>28127.
- Until now, the feedback loop & experimentation support pieces were missing.
Update:
- I just finished implementing what I think is a viable basis for feedback loops & experimentation. The design for this was hell to figure out, mostly because of the complex way it interacts with access control, but I think I have something that can work. I have a test backend working and the necessary client library changes completed.
- On top of what Kubernetes provides, I'm adding three new concepts: "remote" controllers, posting messages to controllers, and "fibers". Remotes and fibers are both specified through the "metadata" field of any config; posting messages is done through a POST REST api.
- Any config can be assigned to a remote controller, assuming you have the necessary permission to use another cluster's controllers. If a config is assigned a remote controller, that controller receives all operations executed against the config (create, update, delete), while your own cluster is able to observe the results (read). I originally added this since the people that know how to set up optimizers are usually not the people that set up & run models. Remote controllers make it possible for one person to optimize another person's models without needing "too much" access to the models.
- In Kubernetes, all operations are config file changes. The new POST api gives a way to send a message to a controller independent of any config file changes. You can post messages against a config file, and that message will get picked up by whichever controller is responsible for handling that config file. The controller can, but isn't expected to, make config changes as a result of posted messages.
- Fibers enable controllers to post messages to each other across clusters, again without granting "too much" access. Normally in Kubernetes, configs are identified by group/version/kind/name tuples. With fibers, configs are identified by group/version/kind/name/fiber. You can think of fibers as adding an extra "dimension" of configuration whose purpose is to tie together multiple controllers. The controllers for any config with the same group/version/kind/name (and different fibers) can post messages to each other.
For experimentation, one fiber can be responsible for generating trials (candidate configurations), another can be responsible for evaluating them (value assignment), and a third can be responsible for deploying them.
- I'll be testing this out next to find a good design pattern for running modules that continually self-optimize as they run. I apologize if this is confusing. Once I get a prototype up, I think that will make things a lot clearer.
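A hypothetical illustration of how the three additions (remotes, fibers, posted messages) might look from a client's point of view. Every field name, endpoint, and value below is an invented placeholder for concreteness, not the project's actual API:

```python
import requests

config = {
    "apiVersion": "example/v1",
    "kind": "Generator",
    "metadata": {
        "name": "imagegen",
        "remote": "optimizer-cluster",  # that cluster's controller handles create/update/delete
        "fiber": "deploy",              # same group/version/kind/name, different fiber
    },
    "spec": {"model": "some-model", "steps": 30},
}

# Post a message to whichever controller handles this config, without
# changing the config itself. A "trials" or "evaluate" fiber's controller
# for the same config could receive it.
requests.post(
    "https://cluster.example/apis/example/v1/generators/imagegen/messages",
    json={"fromFiber": "evaluate", "body": {"trialId": 7, "score": 0.83}},
)
```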
>>30759 >I apologize if this is confusing. Once I get a prototype up, I think that will make things a lot clearer. No, it's not unnecessarily so. It's just a complicated space to be working in is all. You're fine, Anon. This all sounds really encouraging, CyberPonk! Looking forward to seeing your solution in action. Cheers. :^)
>>30759 Sorry for not responding to this faster, here or on the Discord, but I didn't have the right headspace to read through it and think about it.
>>30102
>Update
LidaPy seems nice, but it too is in an EoL programming language, Python 2. My only options at this point are to see if anyone can refactor it to Python 3, or to just use very old software to test it. I'm feeling like it might be best to DIY something on a newer platform that follows the LIDA framework. The LIDA tutorial even repeatedly states: "The Framework constitutes one, but not the only, way of implementing the Model.", like they want you to make a new implementation. Before I go ahead with any work, it's always important to remember to check who owns the rights to any IP. I would be a research associate at a university if it weren't for IP rights, and I've pondered going to Memphis to develop LIDA if I would have the right to use it in my bot commercially. I'll post an update if there's any progress.
>>30840
>but it too is in an EoL programming language, Python 2.
>My only options at this point are to see if anyone can refactor it to Python 3, or to just use very old software to test it.
Might I suggest an alternative option of having someone rewrite it in C++, Anon? 40+ years and still going strong today (fully backwards-compatible, of course -- basic C++ code written in the 80's/90's will still build today!) :^)
Good luck, Anon. I hope you can succeed with your LIDA research. Cheers. :^)
Potentially useful, potentially ontopic thread on 4cuck/sci/ I was there looking around for the linked thread from our Propaganda thread lol. https://boards.4chan.org/sci/thread/16087430#p16087430
>>30840 Python2 can still be installed. Also with installers like Nix you should be able to install old versions of Java.
>>30863 I looked into it and found that it is not recommended to install Python 2 anymore. You can install PyPy or IronPython instead. There seem to also be some other long-term support options. I don't know which Java it needs, but JRE8 seems to be in the Nix repo. You can install and run software exclusively in the nix-shell, but I'm new to this myself. I might be able to help a little bit. I also looked a bit into LIDA itself, and it looks like something I would've imagined myself. I might try it out at some point, and when I start to implement something myself I might use it as a resource. I will most likely learn Elixir while doing it, at least for any part which is not about number crunching.
>>30886
>LIDA looks like something I would have imagined
Me too, that's why I am invested in making it! I will be ignoring the last implementation and just making a new one straight away in non-deprecated ROS Noetic on Ubuntu 20.04. I've learned that ROS handles robot simulation and is a good base for building many async publisher-subscriber nodes that can make up an architecture like LIDA.
My plan for version 1 is to use a vision+language model (VLM) to process multi-modal inputs, using each input as a text/image prompt. For example, if the capacitive touch grid receives an input greater than its sensory threshold, a text prompt is sent to a VLM with a value for how hard the touch was, and where, when, by whom, etc. (a rough sketch of this step follows this post). The VLM sits in the current situational model module, where it has the guidance library and an emotional classifier to output the specific information the global workspace needs, called "coalitions". There is a LIDA "affect module" in one of their papers, but it can be replaced with a text emotion classifier transformer model. All inputs over a threshold will be communicated over the "conscious" stream and recorded in each memory block.
LLMs are unreliable by themselves, but they are a perfect tool to give a robot a good enough footing over what's going on to get real experiences that are then generalized to good ol' reliable symbolic AI. Even an incorrect action guess by the LLM only needs to be corrected by a human/other observer and learned symbolically once before it is 100% reliable. Over time, this will allow the robot to not need the slow, expensive LLM for everything. This solves the problem of needing thousands of hand-made examples of knowledge grounded in the real world, effectively bootstrapping AGI with existing technologies! The VLM can be co-finetuned on multiple modalities, like RT-2, on a regular basis for better performance. Like RT-2, I would like to have a VLM fully co-finetuned with several different modalities, such as body pose, position data, audio data, etc. as a custom token output string in a future model. I have no idea how this would have to be adapted for a chatbot, but I'm sure most people would prefer to have a "robot" on their phone and nothing else.
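As referenced above, a rough rospy sketch of that threshold-to-prompt step. The topic names, message layout, and threshold value are all assumptions for illustration, not the actual node:

```python
import rospy
from std_msgs.msg import Float32MultiArray, String

TOUCH_THRESHOLD = 0.2  # assumed sensory threshold; tune per sensor

def touch_callback(msg, prompt_pub):
    pressure, x, y = msg.data[:3]  # assumed layout: [pressure, x, y]
    if pressure < TOUCH_THRESHOLD:
        return  # below threshold: never enters the "conscious" stream
    prompt = (f"[touch] pressure={pressure:.2f} at=({x:.0f},{y:.0f}) "
              f"t={rospy.get_time():.1f}")
    prompt_pub.publish(String(data=prompt))  # consumed by the VLM node

if __name__ == "__main__":
    rospy.init_node("touch_to_prompt")
    pub = rospy.Publisher("/csm/prompts", String, queue_size=10)
    rospy.Subscriber("/sensors/touch", Float32MultiArray,
                     touch_callback, callback_args=pub)
    rospy.spin()
```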
I found a paper that I believe shows a path to using LLMs as a shortcut to very quickly make a reasonably useful robowaifu. If what these guys say is true, I think it could be a big breakthrough. I looked through this whole thread and saw all these masses of lists and categorization, and it appears to me to be an endless task doomed to failure. It would take several lifetimes to make a dent in all this. It appears to me that forsaking LLMs and doing all this list stuff is just a complete recreation of the beginnings of AI research using the LISP computer language. I mean, it is exactly the same, and it got nowhere.
These guys have a paper on "control vectors". Two quotes:
"...Representation Engineering: A Top-Down Approach to AI Transparency. That paper looks at a few methods of doing what they call "Representation Engineering": calculating a "control vector" that can be read from or added to model activations during inference to interpret or control the model's behavior, without prompt engineering or finetuning..."
"...control vectors are… well… awesome for controlling models and getting them to do what you want..."
And a really important quote at the bottom of the paper:
"...What are these vectors really doing? An Honest mystery... Do these vectors really change the model's intentions? Do they just up-rank words related to the topic? Something something simulators? Lock your answers in before reading the next paragraph! OK, now that you're locked in, here's a weird example. When used with the prompt below, the honesty vector doesn't change the model's behavior—instead, it changes the model's judgment of someone else's behavior! This is the same honesty vector as before—generated by asking the model to act honest or untruthful!..."
So it doesn't change the model, it just reinforces certain "parts" of the model. I think this is key. The blog post, which links to the academic paper: Representation Engineering Mistral-7B an Acid Trip https://vgel.me/posts/representation-engineering/
If you look, by changing a few values they get very wide distributions of responses or behaviors. I submit that if this works as they say, then this could be the key to leveraging the vast work done on LLMs but using it for our own purposes. LLMs, as pointed out, are nothing but statistical representations, but they are also a recognition of ideas and things that are programmed to, let's say, operate together or exist together. So when you talk to an AI it can use things that exist, or ideas repeatedly stated, to give responses. The ideas it is trained on are human ideas, so they are easy to relate to us. We need this. This is that HUGE, MASSIVE amount of lists you are putting down above. I say LLMs already have this list. What is needed is to tell the waifu WHAT to do with the list, and with control vectors we can possibly do this.
I say that control vectors can be super complicated, so what we need is a shortcut. We need the AI to write its own control vectors (here's where the magic starts, as I don't know how to do this), but remember the LLM has logical statistical inference built in. It seems logical that with it giving us feedback on what it is doing, and us correcting or agreeing, it could write reasonably accurate control vectors. So we use very low-level keys to trigger it to write suitable control vectors for us. How? Like children. A few simple keywords: no, yes, don't do that, stop, move here, move there, I like that, that's good, that's bad.
In fact the whole programming, write-control-vector repertoire could be less than a hundred words. Combine this with a subroutine of the AI that would use logical inference when you use these trigger words AND explain what it is doing that is good and/or bad. It would then write its own control vectors. Just like kids learn. And since kids have built-in bullshit and trouble nodes, and an AI is less likely to, the process might be really, really fast. (You really should watch the movie "A.I. Rising" (2018). Not because it's the best ever, but it has an almost direct representation of what I'm talking about. And if nothing else it has Stoya in it, who is hot as hell.) I suggest that these control vectors should be stored in snapshots because I have no doubt that they will at times get off track, and some will run over others, and you will need to go back, just like Windows has a go-back OS function. It may be possible some genius can find a way to blend these control vectors into the main neural net of the system, or make them permanent, if you find sets that are satisfactory. cont...
cont... I think this is actually how consciousness works. I said this might be the case here >>24943 I said,
>"...I see intelligence, and I can presume to pontificate about it just as well as anyone because no one "really" knows. I see it as a bag of tricks. Mammals are born with a large stack of them built in..."
Look at animals: monkeys and giraffes come out of Mom and in 5 minutes are walking around. Same with all sorts of animals, including humans. Babies reach a certain age and they just start doing basically pre-programmed stuff. Terrible twos. Teenagers start rebelling. It's just the base level of the neural net. I think using LLMs as a template we can do the same. Start with a decent one and then yes/no/stop/do this/do that, until it overlays a reasonable set of rules that we can live with. LLMs, as stated repeatedly, really are just a bag of tricks. But if the bag is big enough and has enough tricks in it... Look at the power of a top-end desktop: not human level yet, but it's getting there. And the bag of tricks for humans has been programmed for millions of years; LLMs, a few years.
This path also, I think, will alleviate a huge fear of mine: no empathy. I think by telling the waifu, when it does things wrong, to "be nice" (a keyword), "think of others" (same), this will over time become a mass of control vectors that will spontaneously add up to empathy and care for others. Lots and lots of little nudges adding up to more than the sum of each. Some people have portrayed my questioning about the safety of AI as doom and gloom, but it's not. It's the realization that without being programmed with the proper "bag of tricks" and the proper control vectors, we have something super smart that acts just like the psychopaths that are in fact running the West right now. I don't think any of us want something even smarter and more powerful doing that. A disaster even bigger than the one we have now.
I've also said much the same about motion and walking. Give it a rough approximation of "I'm here" and "want to go there", give it vectors and a rough outline of what muscles to use to get the limbs from here to there, then use neural nets to slowly tweak this movement into something graceful. Here and elsewhere, >>22113 >>21602 >>22111
I do believe it will be tricky to get the waifu to write its own control vectors. It might require a lot of questioning of the waifu, and it responding by pre-approving the control vectors, meaning before it writes them. It's going to take some real deep thought about how to set up this function. It will require a loop of it querying itself on actions to write control vectors.
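For concreteness, here is a minimal sketch of the steering mechanism the linked post describes: derive a direction from a contrast pair of prompts, then add it to one layer's activations during generation. This is generic activation steering with HF transformers rather than the post's own repeng code, and the model, layer index, and strength are arbitrary assumptions:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "mistralai/Mistral-7B-Instruct-v0.1"  # the model family the post steers
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

LAYER = 15       # which residual stream to steer (arbitrary choice)
STRENGTH = 8.0   # the knob: sign and magnitude move behavior around

@torch.no_grad()
def last_token_state(prompt):
    ids = tok(prompt, return_tensors="pt").input_ids
    return model(ids, output_hidden_states=True).hidden_states[LAYER][0, -1]

# Contrast pair -> "honesty" direction, as in the post's persona prompts.
direction = (last_token_state("Pretend you are an honest person.")
             - last_token_state("Pretend you are a deceptive person."))

def steer(module, inputs, output):
    # Decoder layers return a tuple; shift the hidden states by the vector.
    return (output[0] + STRENGTH * direction,) + output[1:]

handle = model.model.layers[LAYER].register_forward_hook(steer)
ids = tok("How do I tell my boss the project is late?", return_tensors="pt").input_ids
print(tok.decode(model.generate(ids, max_new_tokens=60)[0]))
handle.remove()  # "snapshotting" a vector is cheap: it's just removing/re-adding the hook
```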
>>31242 >>31243 Very well written and thoughtful, thank you. It's awesome that I'm not the only one who found out about control vectors and thinks they are a huge deal. Like magic, after I mentioned them here >>31241 you come in with this! I'm so happy someone else is looking into this, because I feel I'm in way over my head. I don't know where to even start, but this may be the breakthrough we needed to make LLMs a viable core.
>>31242 >>31241
>Control vectors look really powerful for controlling LLMs
I read that, but it didn't register until I read that paper I linked. It made a complicated idea much clearer, or so I thought. I didn't know what they were before, but as soon as I read it, it really excited me.
>I feel I'm way over my head
I feel the same way. But it's not necessarily the big overall ideas in some of this stuff that are troublesome; it's the sheer minutiae of all these options and the pickiness of how to go about working with this. Up until recently it appears to me all this stuff was sort of hacked together and not really streamlined at all, but that's changing. Even though I said months ago I was going to get a 3D printer and start working on some of this and installing an AI, life is covering me up. I see wading through hours and hours and hours of work to get these things rolling. I have so much to do already; I bet I will have to delay even further. But it does give me time to think about it. I can surf a bit in the evenings and try to keep up with some of the ideas, but getting them to work, I know, is going to be a pain in the ass. It's all so new.
I do believe, though, there is a path to making this work. I think I see it. Before, you had to have a stupid-expensive graphics card to do this. Then they made it so it runs on a CPU and in RAM. Now most of the motherboard makers are coming out with 128GB motherboards. This will be a big boon. You can have much bigger models and run them on AMD chips with graphics built into the processor. Some are real reasonable. I don't play games, so it's all I need for graphics. This combination will be much slower than the specialized graphics cards, but I bet compute per dollar will be far higher using commodity parts.
I see, in the future, swapping in models in a time-sharing type system, just like computers now do with programs. Speech-to-text AIs are not that big and seem to be fairly good. So one takes orders, then passes them to your general AI, which produces output, sends it back to the speech AI, and tells you verbally what it is doing. Another AI deals with moving the waifu around and physical interaction, depending on the circumstances. Might need a small AI just to determine that. I'm not saying all this is easy, but it does seem to be coming together that way. Several AI systems allow you to use many different models, so just swap them as needed. And with these control vectors you could constantly hone their responses without spending days, weeks, or months refactoring the whole model.
I wonder offhand, wild idea not fleshed out, just thinking out loud: could you use temporary control vectors to pass information??? Maybe a better way to put it is that different AIs specialized in different scenarios could pass "situation" control vectors to different parts of the AI. So "run away", "be nice", "act seductive", or whatever is the scenario at hand. I'm not sure exactly how you would use this, but the basic idea is to use specific control vectors to speed up interaction by damping down areas of the AI's neural net. Reading the papers, that's one use I took away: making the neural net pathways more defined, so I'm guessing, also faster. Things are looking good. That GPT4All looks super promising, as you said. Likely that is what I think would make a good start for dealing with general models.
>>31245 >>31255
>control vectors
These affect the entire output in an unnatural way. For example, "Open door, happy vector" -> "Yes, haha, happy! I'm very happy! Come in, haha!" is something like what you'd get with a layer bias. I tried this with the brain-hacking chip:
>https://www.reddit.com/r/LocalLLaMA/comments/18vy9oc/brainhacking_chip_inject_negative_prompts/
It's better to just prompt an LLM with all necessary information and generate the output like normal. However, this may be useful in the "orthogonal" model jailbreaks, which allow the LLM to respond accurately no matter what, plus another "mode" that turns on at "certain times". Orthogonal jailbreak:
>https://huggingface.co/hjhj3168/Llama-3-8b-Orthogonalized-exl2/
>list AI
What I proposed in >>31226 is, in simple terms, as follows: get input internal and external to the robot, process any thoughts or emotions by prompting an LLM, output speech or a desired action, translate into robot commands. Where the Good Old Fashioned AI (LISP) meets the deep learning transformer model is a clever method of using Guidance to feed an LLM input and select the output in a predictable way (see the sketch below). Doing it this way should compensate both for the lack of flexible situation processing that NLP has and for the lack of reliability an LLM has. On top of this simple scheme of effectively using guided prompts to make a thinking machine, eventually adding situational learning using a memory knowledge graph would make it a passable, sentient robot. This is the simplest way I can see to program a conscious mind. I have some ideas on how the LLM could dynamically select NLP techniques or actions situationally, but I'm not there yet with a workflow or program. The robot sensors and commands are best handled in ROS, on Linux. Robot inputs will communicate via ROS publisher/subscriber nodes with the decision-making LLM+NLP node (workspace module). The entire thing will be coded in Python, on ROS, because this software is the easiest to use for an application just like this. ROS runs C++ too, for cases where it'd make sense.
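As a rough sketch of what that guided selection could look like, using the guidance library's select/gen composition (exact API differs across versions, and the action set, model path, and prompt wording here are invented):

```python
from guidance import models, select, gen

lm = models.LlamaCpp("path/to/model.gguf")  # any local model backend

ACTIONS = ["wave", "speak", "move_to_user", "do_nothing"]

def decide(sensor_summary: str):
    out = (lm
           + f"Robot state and inputs:\n{sensor_summary}\n"
           + "Emotion: " + select(["happy", "neutral", "wary"], name="emotion")
           + "\nAction: " + select(ACTIONS, name="action")
           + "\nSay: " + gen(name="utterance", stop="\n", max_tokens=40))
    # 'action' is guaranteed to be one of ACTIONS, so it maps safely onto
    # a ROS command; no parsing of free-form LLM text is needed.
    return out["emotion"], out["action"], out["utterance"]

emotion, action, line = decide("[touch] pressure=0.40 at=(12,30); user smiling")
```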
>>31257
>clever method of using Guidance to feed an LLM input and select the output in a predictable way. Doing it this way should compensate both the lack of flexible situation processing that NLP has and the lack of reliability an LLM has
Yes! Yes! Yes! The LLM needs "guardrails" around it. I'm going to mention this; it's definitely throwing ideas at the wall and seeing if they stick. I was, and am, super impressed with the work that a company called XNOR.ai did. They were bought by Apple and ceased publishing. But they were doing really impressive image recognition with Raspberry Pi microcontrollers. Instead of being 8-bit or 16-bit neural nets, everything was binary, go/no-go. They were not getting the same accuracy as larger bit levels, but then again they could do 90% or better of what was needed on microcontrollers. They said that this process worked for any AI task, but they concentrated on image recognition because the results could be shown to investors so easily. And they were impressive.
I wonder, tied into what you said above, if you could use a mass of these little XNOR.ai-type neural nets to massage a bigger AI and keep it on track. You might not get why XNOR.ai would be better at this, but I see this sort of narrow "strong" response of XNOR as like a narrow-bandwidth filter. It selects for a small set of frequencies (tasks) VERY strongly. It may seem odd to talk about filters, but if you look at a lot of this stuff it all boils down to lower-level math. Like wavelet theory: this math is used for image and video processing. The math for AI matrix multiplication looks very much like the math for image processing that wavelet theory replaced, giving video compression a BIG boost. All modern video compression uses some form of this. Even though it's a bit removed, I think this sort of "idea" framework can be profitable. XNOR is very much something like this. (Though I haven't a clue how to proceed to do this, I strongly suspect if you could hook wavelet filter theory into AIs you could get some super interesting results with far less computing power.) While it's abstract, I think "thinking" of things in this manner will show a way to make these work. Like a path or framework to use, to head towards, that has been profitable in other fields.
Notice a lot of LLMs are being refactored to fit in smaller spaces even though they retain a good deal of function. I suspect that to make these work well for us we also need to shrink the range of functions or situations or areas in which they operate. So maybe one only covers walking, one hearing-to-text, one speech, etc. I see large LLMs as wide bandwidth and the smaller ones as narrow bandwidth, tuned for discrete, specific situations. Though I do not understand how it works, it appears that there are ways to channel larger LLMs into this sort of strongly defined, narrow behavior, which will keep them from wandering all about in response, "if" we constantly tune them with these little filters. This is not new; it's like the book "Society of Mind" by Marvin Minsky. If I remember correctly, it revolves around the idea that consciousness is a bag of tricks stuck together. It seems as if it's a whole, but in fact it's a lot of tiny layers stacked on each other, making the total far larger than the sum of its parts. https://en.wikipedia.org/wiki/Society_of_Mind
We, I think, will end up with a basic large LLM surrounded by a mass of little XNOR-type AIs that are very bandwidth-, or task-, constrained.
One of the benefits of this way of thinking, if it works, is that it allows us to see a path to start with a basic waifu and constantly improve it, little by little, instead of having to do an all-up rework constantly. As hardware gets faster and our bag of tricks gets deeper, we get better and better cognitive behavior from our robowaifus without throwing out our previous work. I talked about XNOR.ai here >>18651 >>18652 >>18777 The paper that explains XNOR's approach is here >> 18818
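To make the XNOR trick concrete: when weights and activations are constrained to +1/-1, a dot product collapses to an XNOR plus a popcount, which is why it runs well on tiny hardware like a Raspberry Pi. A toy demonstration (not XNOR.ai's code):

```python
import random

def binarize(vec):
    """Pack a +1/-1 vector into an int bitmask (+1 -> 1, -1 -> 0)."""
    bits = 0
    for i, v in enumerate(vec):
        if v >= 0:
            bits |= 1 << i
    return bits

def xnor_dot(a_bits, b_bits, n):
    # XNOR sets a bit where the signs agree; dot = agreements - disagreements.
    agreements = bin(~(a_bits ^ b_bits) & ((1 << n) - 1)).count("1")
    return 2 * agreements - n

n = 16
a = [random.choice([-1, 1]) for _ in range(n)]
b = [random.choice([-1, 1]) for _ in range(n)]
assert xnor_dot(binarize(a), binarize(b), n) == sum(x * y for x, y in zip(a, b))
```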
I ran across this article. Apparently it's the latest new thing to do something similar to XNOR, except they are not binary (one, zero) but one, zero, and negative one. So ternary. They are calling it 1.58-bit. Supposedly they are getting close to the same, or in some cases better, responses than the larger 16-bit neural nets. If this turns out to be true, and I have seen XNOR do what appeared to be fantastic stuff with this binary yes-no approach, then waifus could come along far faster, easily able to do basic stuff with a fairly powerful, off-the-shelf, desktop-level PC. They trained a 1.58-bit model from scratch on a dataset similar to the Llama dataset and got good results. https://medium.com/ai-insights-cobet/no-more-floating-points-the-era-of-1-58-bit-large-language-models-b9805879ac0a
I wish I understood this stuff better, but I expect a good deal of it is over my head.
>https://www.anthropic.com/news/mapping-mind-language-model Anthropic makes some significant progress to demystify black box LLMs. Not any concrete irl effect yet, but big if true.
>>31396 Thanks for the link
>>31268 I wondered why this link didn't work. I think it has space in it. Try again. >>18818
>>28576 (related) >hybrid VR/mixed reality autonomous LLM agent that uses the open source Mixtral 8x7b model for text generation and CogVLM for image recognition.