/robowaifu/ - DIY Robot Wives

Advancing robotics to a point where anime catgrill meidos in tiny miniskirts are a reality.

Open file (8.45 MB 2000x2811 ClipboardImage.png)
Cognitive Architecture: Discussion Kiwi 08/22/2023 (Tue) 05:03:37 No.24783
Chii Cogito Ergo Chii
Chii thinks, therefore Chii is.

Cognitive architecture is the study of the building blocks which lead to cognition: the structures from which thought emerges. Let's start with the three main aspects of mind:

Sentience: The ability to experience sensations and feelings. Her sensors communicate states to her. She senses your hand holding hers and can react. Feelings are having emotions: her hand being held brings her happiness. This builds on her capacity for subjective experience, related to qualia.

Self-awareness: The capacity to differentiate the self from external actors and objects. When presented with a mirror, echo, or other self-referential sensory input, she recognizes it as the self. She sees herself reflected in your eyes and recognizes that it is her, that she is being held by you.

Sapience: The perception of knowledge; linking concepts and meanings; being able to discern correlations congruent with having wisdom. She sees you collapse into your chair. She infers your state of exhaustion and brings you something to drink.

These building blocks integrate and allow her to be. She doesn't just feel, she has qualia. She doesn't just see her reflection, she sees herself reflected and acknowledges her own existence. She doesn't just find relevant data, she works with concepts and integrates her feelings and personality when forming a response. Cognition: subjective thought, reliant on a conscious separation of the self from external reality, that integrates knowledge of the latter. A state beyond current AI, a true intellect. This thread is dedicated to all the steps on the long journey towards a waifu that truly thinks and feels.
>=== -edit subject
Edited last time by Chobitsu on 09/17/2023 (Sun) 20:43:41.
>>33298 I think my chatbot library (horsona) is in a good-enough state for anyone who wants to help out with development.
Repo: https://github.com/synthbot-anon/horsona
Open tasks: https://github.com/synthbot-anon/horsona/issues
The current open tasks are for creating new LLMEngines (easy), making prompts more reliable (medium), and creating new datatypes & modules for character cards and image generation (medium/hard, probably requires some familiarity with pytorch). If you run into any issues with the development setup, let me know. If you want to add something not on the Open Issues list, feel free to post about it here.
Note: This is a library, not an application. I do intend to create chatbot applications based on this, but those will be in separate repos.
>>33298 >>33301 Excellent news! I hope you soon have many PRs, Anon. :^)
<--->
I've been studying:
- fast path-finding algos in a smol footprint;
- fast BehaviorTrees to replace FSMs in a composable, user-friendly way;
- exploring YAML as a data format for user-designed/user-readable robowaifu resources (incl. in the BTs);
- finding many neat ways to solve some of Kiwi's & my design goals from our first MaidCom brainstorming;
- working on an auto-geo-mesh, auto-rigging meta-generator system to create Blender robowaifu models (eventually for Maya as well -- same system);
- solidifying my understanding of providing a good, stable & performant Pythonic API for all this with the latest tools;
- and lastly, working through some concepts for driving a simulator-learning feedback mechanism (visualized inside Blender) from realworld hardware.
Lol, none of this is really ontopic ITT except the behavior trees, maybe. :) Looking forward to what you do.
>>33306
>fast BehaviorTrees to replace FSMs in a composable
I hadn't heard of this, and it looks useful for my stuff.
>Fast path-finding algos in a smol footprint
I think everything for finding "good" paths starts with Depth-First Search (DFS), then adds customizations and optimizations to avoid the need for full exploration. In machine learning, Monte-Carlo Tree Search is pretty standard. It gives you a way to accumulate the results of each branch. UCT (Upper Confidence bounds applied to Trees) tells you how to prioritize which branch to take. Dynamic Programming adds a cache so if you see a state twice, you can recognize it and avoid duplicate processing. AlphaZero adds a neural-network-based heuristic so you can work with some information before investigating any branches. I think MuZero uses a neural network to abstract the explicit tree search, for cases where the number of branches is large.
There are other algorithms that look like path algorithms but are better thought of as structure-finding algorithms. Topological sorts and spanning tree algorithms are two examples.
>exploring YAML as a data format
I recommend sticking to the subset of YAML where the data is compatible with JSON. That subset is battle-tested on very complex infrastructure tasks for exactly this purpose (a human-readable format for defining & configuring user-designed resources). For cases where the underlying "controllers" for handling resource configs can change, the Kubernetes object format is great. https://kubernetes.io/docs/concepts/overview/working-with-objects/ For other cases, just JSON-compatible YAML is great.
>stable & performant Pythonic API for all this with the latest tools
If you don't need to train on-device (though you probably do), I'd recommend separating the requirements for development from the requirements for execution. PyTorch is great for development, and you can export the models you create to be run by a more performant library. For example, you can create a model with pytorch, export it to ONNX, and use some C++ runtime to run the ONNX model. It looks like ONNX is going to add support for training https://onnx.ai/onnx/operators/onnx_aionnxpreviewtraining_Gradient.html so you might be able to take this approach even for cases where you do need to train on-device. OpenVINO seems to be the main choice for running ONNX models on CPUs, and TensorRT for Nvidia GPUs.
>auto-rigging meta
Anything that looks like automatically generating a configuration is going to be solved with an optimization algorithm. The main questions to ask are: how easy is it to get new datapoints (i.e., get an example configuration & test it to see how it performs), how much compute can you afford to throw at the problem, how many dimensions does the search space have, and how complex is the search space.
- Bayesian optimization: very sample-efficient (needs few samples), the good algorithms are compute-intensive, and it deals with simple search spaces.
- Neural networks: great for dealing with complex search spaces. If you can get a lot of samples, you can train these normally. If not, you'll need some tricks to train with fewer samples. The size of the network determines how compute-intensive it is.
- Monte Carlo reinforcement learning methods: require a lot of samples, have very low computation costs per sample, and can deal with medium-complexity search spaces.
Usually in ML, the solution is some mix of all of these that fits your constraints.
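To make the PyTorch-to-ONNX handoff above concrete, here's a minimal sketch assuming a throwaway nn.Sequential model; onnxruntime's CPU provider stands in for OpenVINO/TensorRT, and all names are illustrative.
```python
import numpy as np
import onnxruntime as ort
import torch
import torch.nn as nn

# A stand-in model; replace with whatever was developed in pytorch.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))
model.eval()

# Export the model to ONNX with named inputs/outputs and a dynamic batch axis.
dummy_input = torch.randn(1, 8)
torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",
    input_names=["input"],
    output_names=["output"],
    dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}},
)

# Run the exported model with a separate runtime (here, ONNX Runtime on CPU).
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
result = session.run(None, {"input": np.random.randn(1, 8).astype(np.float32)})
print(result[0].shape)  # (1, 2)
```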
>>33306 Feel free to move >>33325 to the appropriate thread. If you tell me where it goes, I can follow along there.
Maybe it's just me, but I found this article intended for expats in Japan oddly on-topic ITT: https://www.tokyodev.com/articles/become-a-great-communicator-in-japanese
>We report, you decide. :^)
>>33327 Great stuff. OK, I'll plan to do so soonish, Anon.
>>33301 I'm currently working on lorebook generation (given a story/script, extract enough information so a chatbot can work within the context of the story/script) and character card creation (describe a personality so a chatbot can emulate it). I'm using those tasks to figure out how to clean up my memory implementation. I finally got my memory implementation to a point where it can read a story, though it's a slow reader. Here's its memory state after reading the first 50 paragraphs of Friendship is Optimal: https://ponepaste.org/10323
The information it extracts is good, but there are some obvious structural issues with this kind of memory. Notably, the relationships between pieces of information aren't represented. Ideally, when one thing changes, it should affect other related memories. I have ideas on how to do this (nothing concrete yet), and I think the question-answer format is a good starting point for building more complex memory structures. I'll be cleaning up my memory implementation next so it's easier to reuse & build on, and so it integrates more cleanly with the rest of my framework.
>>33298 I like the idea of keeping track of LLM inputs to then backpropagate changes based on feedback. Most LLM RAGs/augments seem to be built as read-only systems. As for model size and prompt engineering, I think it's a good idea to start thinking about creating a dataset and doing a finetune. For example, in the case of triple extraction, a finetuned 7b will be no worse than using GPT-4. [1] The rp/erp people have been making and merging models and it makes a night-and-day difference. I think it's important we start thinking about crafting LLMs specific to the robowaifu usecase. The goal is to leverage LLMs for language and not for fact storage, so I'm hoping to go smaller than 7-8b. The Minitron 4b models (and finetunes of them) are impressive. Having smaller models is not just about using less ram, it's also about more tokens per second, and it looks like our systems are going to be token heavy. (at least mine will be lol)
This is something I do want to collaborate on. Crafting high-quality instruct datasets & then fine-tuning is not cheap or easy, hence why I think it's important to prevent duplicate effort here specifically. So we should start figuring out what tasks our systems are doing with LLMs. It'd be a good idea to have a common format between all our projects too.
>>33301 I'm not super comfortable with contributing to a python codebase, but I like what you're doing, so if you need a second pair of eyes or an opinion/code review I'm happy to help. I really like the RAG backpropagation idea and I'm excited to see what else you're going to do!
>>33421 I'm not familiar with the story, but this generated QA memory table looks good. It's a good looking result! Here is a detail that jumped out at me:
>"Who sat down at computer #12?": "David"
>"Where did David sit down?": "At computer #12"
I'm wondering what your thoughts are on deduping? Don't worry if it's nothing concrete or polished.
Links:
[1] https://medium.com/@EleventhHourEnthusiast/fine-tuning-language-models-for-triple-extraction-with-data-augmentation-834196bb3ceb
>>24816 It's been a year since my first post in this thread. It's interesting to look back. All I had were some gut feelings and a few leads, no concrete ideas on where to go or even a twinkle in my eye of what a system would look like. Today I think I have figured out good potential abstractions & building blocks for an agent. It's also fun to see what did not change: that "the high level mind is a narrative machine" sentiment has stuck with me as the guiding idea of what it is that I am trying to build.
Now that I have a more concrete idea of what I am trying to build, I am hoping to lay out my idea for the architecture and get feedback on it; I want all your thoughts & criticism. I hope to start the software writing and engineering soon.
A core assumption I have been following is that to get something useful, a time-first approach to memory needs to be taken. Vanilla RAG and Graph RAG don't have mechanisms for dealing with time. Most cognitive architectures seem to split memory into "Declarative"/"Semantic" memory and "Procedural"/"Episodic" memory. This split is detrimental; information is not split cleanly between the two.
At a high level, the memory system has several main concepts:
1. Memory is composed of nodes connected by edges (it's a graph). Each node has context(s) that it belongs to, and a context can enforce requirements for node membership. Data is stored inside key-value pairs within the nodes, so the graph nodes and edges are not classical triples; it's somewhat similar to an OpenCog atom.
2. Node values may have depth; depth is used to represent change over time. Time is represented as relative values with a scale category (second(s), minute(s), hour(s), day(s), etc.). Changes form a chain of values for a key in a node. Being able to query how facts change over time for a node is really important. This can be used to represent many things. For example, imagine a "Tea making procedure" node; its "high level actions" key would be the steps to make tea. Another example is the agent remembering a story: it could have a node for each character & then keys for different aspects it observes, like a key for their actions, feelings, etc. (A rough data-structure sketch follows at the end of this post.)
3. (Somewhat) natural language as the primary representation of data and the source of truth. I am aware of how strange this sounds; usually agents try to distill natural language into a symbolic representation for internal use, which makes this sound like an overly complicated LLM RAG. But I promise there is a reason for this. The problem the Semantic Web, CYC and other symbolic systems ran into is that a symbolic representation requires a predefined schema [1] (and for everything to be defined in it) & that is THE HARD PROBLEM to solve. Luckily, we now have access to LLMs and they are good at parsing and manipulating natural language [2]. LLMs can be leveraged to translate natural language to a constrained symbolic representation on demand. This flexibility is essential: there will likely be many domain-specific solvers in an agent that are not known ahead of time (they are learned), so we need a universal schema. (More of an implementation detail, but we do not need to call an LLM each time; the symbolic representation can be cached and only regenerated on a fact rewrite.)
4. Context plays an important role for both memory and cognition in this architecture. A context can (but is not required to) enforce rules on all nodes that inherit it. This is important for (symbolic) reasoners: it ensures a uniform schema within all relevant nodes and links in its domain. Context can also just be natural language text for LLM use.
This post is getting long, and it's mainly just talking about memory, but I think it's a good starting point for discussion.
Links:
[1] https://youtu.be/3wMKoSRbGVs?t=455 -- On predicate logic / a universal schema.
[2] https://youtu.be/3wMKoSRbGVs?t=1918 -- LLMs already hold a lot of the needed "rules of thumb" & we should not spend the man years required to do it by hand :^)
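As a rough illustration of points 1 and 2 above (nodes with contexts, key-value data, and time-chained values), here's a minimal Python sketch. The class and field names are hypothetical, not from any existing codebase.
```python
from dataclasses import dataclass, field

@dataclass
class ValueEntry:
    """One link in a key's change-over-time chain."""
    value: str                   # natural language, the source of truth
    relative_time: float = 0.0   # how long after the previous entry
    time_scale: str = "minutes"  # second(s), minute(s), hour(s), day(s), ...

@dataclass
class MemoryNode:
    label: str                                                          # e.g. "person:JonDoe"
    contexts: list[str] = field(default_factory=list)                   # contexts this node belongs to
    values: dict[str, list[ValueEntry]] = field(default_factory=dict)   # key -> chain of values

    def update(self, key: str, value: str, relative_time: float = 0.0, scale: str = "minutes"):
        """Append a new value to a key's chain instead of overwriting it."""
        self.values.setdefault(key, []).append(ValueEntry(value, relative_time, scale))

    def history(self, key: str) -> list[str]:
        """Query how a fact changed over time for this node."""
        return [entry.value for entry in self.values.get(key, [])]

# Usage: the "Tea making procedure" example from the post.
tea = MemoryNode(label="procedure:MakingTea", contexts=["kitchen", "procedures"])
tea.update("high level actions", "boil water; steep leaves; pour into cup")
print(tea.history("high level actions"))
```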
>>33490
>smol models
I've been playing around with llama 3.1 8b more and am finding it much more impressive than I initially gave it credit for. The main problem with it is that it degrades *significantly* when trying to get structured outputs from it. I found a generic way to compose more functionality into 8b inference calls, but it's not that scalable in terms of how much complexity it can support. Unfortunately there's no way around this without using structured outputs. I have high hopes though that a fine-tune could solve this problem. My systems also turn out to be very token-heavy, so it would be a huge win if I could get structured outputs to work well with an 8b or smaller.
>collaborating on datasets
I'd be up for collaborating on high-quality instruct datasets. It's hard to collaborate on fine-tuning right now. I have ideas on that based on a distributed system I've developed ("infrastructure stuff" in >>32963), though it'll take some work to add in support for fine-tuning tasks. Datasets are much easier to collaborate on, though, and I think my framework could help a lot with automated dataset creation due to its ability to backprop updates to a dataset. I can try to update my horsona library to support the creation of fine-tuning datasets. I'd do this by adding support for this kind of workflow (sketched below):
- Record inputs-outputs for LLM calls.
- Keep track of when backprop is used to update the result of an LLM call.
- Store the final dataset as <inference call, updated result> pairs.
>using fine-tuned models
I'm only using 3rd party APIs right now since it's much cheaper to use those than to buy/rent my own GPUs. With APIs:
- Groq is very unreliable, and it has no support for fine-tuning. I used to use it somewhat often, but I've stopped due to how unreliable it is.
- Fireworks seems to have good support for fine-tuning & inference, and decent support for constrained generation.
- Cerebras is by far the fastest and cheapest option, but it has no support for fine-tuning, and they don't have proper support for constrained generation. Also, the way their inference works, I'm not sure if they'll ever support fine-tuned models. That lack of fine-tuning support would be fine if they could just get a fine-tuned 3.1 8b up with proper support for constrained generation. They also have low rate limits, and I expect they'll continue to have low rate limits until they have a proper paid version available.
- I'm disregarding API providers that only offer closed-source models.
On the open source side, I think sglang (open source) would be the best small-model option, and would be a good drop-in replacement for Fireworks. So right now, I don't see a downside to developing against Fireworks, then switching to sglang once my usage gets high enough or once private inference becomes a higher priority. There's no good open source replacement for Cerebras though due to its insane speed, so I'll probably stick to using Cerebras only for data generation and testing.
>I'm wondering what your thoughts are on deduping?
Nothing concrete right now. Per my current thinking, the QA dataset would be the "ground level" for declarative memory, and I'll eventually have higher-level structures built "above" that. For example, an ontology, where all of the concrete instances of concepts are QA data. I haven't thought much about it yet, but I like the idea of grounding all declarative information in QA data since questions are easy to abstract over. For example, the "James" concept can be treated as the set of questions involving James.
>>33495
>Most cognitive architectures seem to split memory into "Declarative"/"Semantic" memory and "Procedural"/"Episodic" memory. This split is detrimental; information is not split cleanly between the two.
I agree. I'm thinking about this approach:
- The episodic memory would act like some quasi-ground truth.
- Declarative memory would have a link back to the episodic memories from which it was derived.
- Higher-level, software-friendly abstractions would be built on the declarative memory.
And a general note about any sort of memory: I've found it helpful to think about a split between data and indexes. Data is about what's retrieved, and indexes are about how things can be retrieved. Any data that's stored (let's say episodic memory) can be indexed in many ways to support many kinds of queries, and it's often beneficial to index the same data in multiple ways. The exact same data can be indexed through triplets, embeddings, and SQL columns, and different uses of the same data might require different indexes. To that end, I'd want to think about what kinds of queries can be supported by what kinds of indexes. Triplets are good for logical queries. Embeddings are good for similarity searches. SQL columns are good for identity- and property-based searches. Spatial indexes are good for topological queries. What kinds of use cases aren't supported by these, and what kind of indexes would they require?
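To illustrate the data-vs-indexes point, here's a small sketch indexing the same episodic record two ways (a SQL column and an embedding). The embed() function is a stand-in for a real embedding model, and the record itself is invented for the example.
```python
import sqlite3
import numpy as np

# One episodic record, indexed two different ways.
record = {"id": 1, "speaker": "David", "text": "David sat down at computer #12."}

# SQL index: good for identity- and property-based lookups.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE episodes (id INTEGER PRIMARY KEY, speaker TEXT, text TEXT)")
db.execute("INSERT INTO episodes VALUES (?, ?, ?)", (record["id"], record["speaker"], record["text"]))
by_speaker = db.execute("SELECT text FROM episodes WHERE speaker = ?", ("David",)).fetchall()

# Embedding index: good for similarity search. embed() is a placeholder for any embedding model.
def embed(text: str) -> np.ndarray:
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(8)

index = {record["id"]: embed(record["text"])}
query = embed("Who sat at the computer?")
best_id = max(index, key=lambda i: float(index[i] @ query))  # nearest record by dot product
print(by_speaker, best_id)
```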
>>33421 The initial "StoryReader" implementation is up:
>Example usage: https://github.com/synthbot-anon/horsona/blob/main/tests/test_reader.py
>Implementation: https://github.com/synthbot-anon/horsona/blob/main/src/horsona/stories/reader.py
It can read stories paragraph-by-paragraph to extract information. It tracks three things as it reads through a story: (1) a short term memory, which is just an array of the most recent paragraphs read, (2) a long term memory, which consists of an embedding-based question-answer database that keeps track of information extracted from the story plus a cache to keep track of the most recent retrievals, and (3) a StoryState, which is a data structure that keeps track of "live" information about what's being read (e.g., current location, current speakers).
Next I'm going to refactor the StoryReader module to support custom memory modules, support extracting custom information from stories, and support tracking custom "live" information.
I added two new issues, in case anyone wants to work on them:
>https://github.com/synthbot-anon/horsona/issues/9
This one is for using an API to generate embeddings, instead of doing it in the library.
>https://github.com/synthbot-anon/horsona/issues/10
This one is for using FAISS or another library to do database operations (create, delete, query) on embeddings, instead of doing it directly with matrix operations.
The full list of open issues is here: https://github.com/synthbot-anon/horsona/issues
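A bare-bones sketch of the three tracked pieces described above; this is not the repo's implementation, and extract_qa() is a placeholder for the LLM extraction step.
```python
from dataclasses import dataclass, field

def extract_qa(paragraph: str) -> dict[str, str]:
    """Placeholder: an LLM would turn the paragraph into question-answer pairs."""
    return {}

@dataclass
class StoryState:
    """'Live' information about what's currently being read."""
    current_location: str = ""
    current_speakers: list[str] = field(default_factory=list)

@dataclass
class StoryReaderSketch:
    short_term: list[str] = field(default_factory=list)       # most recent paragraphs
    long_term: dict[str, str] = field(default_factory=dict)   # question -> answer database
    retrieval_cache: list[str] = field(default_factory=list)  # most recent retrievals
    state: StoryState = field(default_factory=StoryState)

    def read(self, paragraph: str, window: int = 5) -> None:
        """Read one paragraph: update short-term memory and the QA database."""
        self.short_term = (self.short_term + [paragraph])[-window:]
        self.long_term.update(extract_qa(paragraph))
```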
>>33507 Excellent work, Lad. Godspeed. :^)
Open file (514.48 KB 512x768 AtriNoticed.png)
>>33495
>Connecting memories through context
Seems interesting. I foresee this method requiring complex algorithms to keep everything coordinated correctly. Some sort of semantic-relevance algorithm with a thesaurus and dictionary to guide connections should help simplify the process with a "good enough" alignment.
>Natural language base
Worth investigating; using tiny LLMs as translation layers can do wonders, so long as there are algorithmic guide rails to keep everything coherent. This has got me thinking about the role prompts themselves play in cognitive architecture. An algorithm that appends relevant additions to your request in an internal prompt has the potential to reduce computational load for a seemingly complex system. https://beginswithai.com/super-prompt-for-ai/
Open file (1.17 MB 1920x1080 ThinkingMina.png)
OpenAI o1 appears to provide a fascinating way forward towards cognitive architecture with LLMs. Essentially, it has a spiral of "thoughts" to refine an internal prompt to allow the model to provide a better result. As alluded to in the video, langchain and multi-agent frameworks can accomplish this. Adding RAG and other enhancements would bring us further towards real reasoning. Metacognition could be our backdoor into alternative cognition architectures. https://www.youtube.com/watch?v=tMWMuJF-JFo https://www.youtube.com/watch?v=zzaEBGOVKIg
>>33566 I'll be watching that project. The initial results shown in the video with Claude look promising.
>>33507 I've refactored a lot of my horsona code to make it more async-friendly (necessary since LLM operations often need to be performed async), to make it easier to develop custom backproppable functions, and to more easily support "partial" updates to the computation graph.
- In pytorch, you need to call loss.backward(), then optimizer.step(). The problem is that loss.backward() has no information on what exactly needs to be updated, so it needs to calculate gradients for everything that led to the creation of loss. In my refactor, loss.backward() needs to be passed a list of leaf nodes so excess computations can be pruned out. Code: https://github.com/synthbot-anon/horsona/blob/main/src/horsona/autodiff/basic.py#L55
- In pytorch, a single optimizer needs to update all parameters. In my refactor, the optimizer is just a step() function that can be passed a gradient context, which contains all computed gradients. loss.backward() returns a gradient context that can be passed directly to step(). This makes it easier for a module to update its own parameters as needed without needing to rely on the caller to call the optimizer. Code: https://github.com/synthbot-anon/horsona/blob/main/src/horsona/autodiff/basic.py#L188
- In pytorch, backproppable functions are defined as classes with separate forward() and backward() methods. In my refactor, both the forward and backward pass are defined by a single generator function. It calculates the forward call, yields the forward result, gets a gradient context from the yield, and performs the backward call. (A toy illustration of this pattern follows this post.) Code: https://github.com/synthbot-anon/horsona/blob/main/src/horsona/autodiff/basic.py#L128
- The gradient context passed during the backward operation contains a dictionary with all of the variables that need to be updated. This lets functions figure out which gradients actually need to be calculated by checking whether a variable is in the dictionary, which lets them cut out unnecessary calls. The functions are supposed to set the gradients of a variable by adding them to a list in the dictionary. To encourage cutting out unnecessary gradient calculations, the list of keys in the gradient context is immutable, and it only contains the variables that need a gradient calculation. Code: (same as the above horsefunction).
- Both sync and async generators are supported for backproppable function definitions. If a sync generator is used, the backward pass call is wrapped in an async function so it can be handled consistently when I eventually make the backprop & update steps run in parallel. Code: (same as the above horsefunction).
I also updated my StoryReader implementation so it can be passed custom databases, caches, and state objects, which I expect will make it a lot easier to customize it depending on what information needs to be extracted from stories. This will require a few more changes, but I'm going to test it to see if it can extract world information and per-character information while reading a story without any changes to the underlying StoryReader module. That's what I'm currently working on.
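A toy illustration (not the horsona API) of the single-generator forward/backward pattern described above: compute the forward value, yield it, receive a gradient context, then fill in gradients only for the variables the caller asked about. All names here are invented.
```python
def scale_by_two(x):
    """Forward and backward pass defined by a single generator function."""
    result = x * 2                  # forward computation
    grad_context = yield result     # hand the forward result to the caller
    if grad_context is not None and "x" in grad_context:
        # only compute the gradient if the caller asked for it
        grad_context["x"].append(2.0)   # d(result)/dx

# Driver showing how a framework might run such a function.
fn = scale_by_two(3.0)
forward_value = next(fn)     # runs the forward pass -> 6.0
grads = {"x": []}            # gradient context: only variables that need gradients
try:
    fn.send(grads)           # resume the generator with the gradient context
except StopIteration:
    pass
print(forward_value, grads["x"])   # 6.0 [2.0]
```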
>>33581 Asynchrony, concurrency, and parallelism have other benefits as well, #1 of which is performance (as in wallclock). Without taking advantage of these in modern processors, you're leaving the vast majority of your wallclock perf still sitting on the table.
>I also updated my StoryReader implementation so it can be passed custom databases, caches, and state objects, which I expect will make it a lot easier to customize it depending on what information needs to be extracted from stories. This will require a few more changes, but I'm going to test it to see if it can extract world information and per-character information while reading a story without any changes to the underlying StoryReader module. That's what I'm currently working on.
Sounds exciting, CyberPonk! Looking forward to seeing what you achieve with this. Cheers. :^)
>>33587 I actually finished implementing asynchronous backprop & parameter updates last night, so the core of the framework is now fully async.
>>33598 >>33601 >I implemented fully parallel backprop and parameter updates, not just async. Good news. Keep up the good work, Anon! Cheers. :^)
>>33581 Horsona chatbot library updates:
- I added support for "multi-LLMs" for better concurrency. These wrap multiple other LLM APIs, and they call the most expedient underlying LLM (based on recent calls, token usage, & rate limits) for every query. The rate limits can track any arbitrary number of calls & tokens over any arbitrary interval, and multiple rate limits can be set per LLM since providers can have per-minute, per-hour, and per-day limits. It handles retries and exponential backoff as well, switching providers if there's any error, and it removes LLMs from the candidates list if they fail too many times consecutively. For simple usage, it doesn't make much of a difference, but for longer-running tasks that bump up against rate limits, it should give a big performance boost. The API mimics the least common denominator of all of the LLMs it wraps. So if all of the underlying LLMs are compatible with the OpenAI interface, the multi-LLM will be too.
- ... Code (a bit ugly... I'll clean it up later): https://github.com/synthbot-anon/horsona/blob/main/src/horsona/llm/multi_engine.py
- ... Rate limit implementation (top of the file): https://github.com/synthbot-anon/horsona/blob/main/src/horsona/llm/base_engine.py
- ... Example (line 105): https://github.com/synthbot-anon/horsona/blob/main/tests/conftest.py#L105
- I think I figured out how I'm going to implement character cards and other "attachable" modules. The StoryReader returns the context of whatever it's reading as a result, which can be passed to other modules (e.g., the CharacterCard module). This is analogous to how you build pytorch modules.
- ... Code example (line 161, cleanup in progress): https://github.com/synthbot-anon/horsona/blob/main/tests/test_reader.py#L161
Unrelated: I spent some time with another PPP developer cleaning up LLM tokenizers to get better pronunciations. The main reason I looked into this was to see why a TTS model might be generalizing poorly, and I think the findings can be generalized. Tokenization makes it very easy to see which parts of data are going to be "well-trained" and "well-utilized". If a token appears frequently in a training dataset, the model will be well-trained on data associated with that token. If a token appears frequently in inferences, the corresponding data will be well-utilized. Creating subtokenizers with a reduced vocabulary is also very easy, and doing so lets you ensure that far more parts of a dataset will be both well-trained and well-utilized without any changes to the dataset or model.
Tokenizers are basically compressed, discrete representations of data based on a probability model learned from data. I'm wondering if it's worthwhile to create "semantic" tokenizers that generate optimized discrete versions of embeddings. The basic flow would be (a rough sketch follows this post):
- Use an embedding model to index some dataset.
- Use PCA to align the embedding space to the embedding's array representation.
- Discretize the embeddings.
- Tokenize the discretization.
If this is useful, it would probably be over sequences of embeddings and with some N+1-dimensional tokenization scheme. E.g., since text is represented as 1d sequences, sequences of text would result in a 2d tokenization. For sequences of 2d images, it would be a 3d tokenization.
On top of making models generalize better, tokenization lets you index and search information in a very different way. With embeddings, you're basically restricted to similarity search. With tokens, you can do things like substring search and regex search. You can build grammars and do things like guidance & constrained decoding, which are extremely powerful techniques for getting more useful outputs from transformers. At some point, I might look into this further. For now, it's just a spontaneous bit of information.
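A rough sketch of the embed / PCA / discretize steps above, using synthetic data in place of a real embedding model; the final "tokenize the discretization" step is left out, since any off-the-shelf tokenizer could be trained on the resulting integer codes.
```python
import numpy as np
from sklearn.decomposition import PCA

# Toy embeddings standing in for a real embedding model's output.
rng = np.random.default_rng(0)
embeddings = rng.standard_normal((200, 64))   # 200 texts, 64-dim embeddings

# Align the embedding space to its array representation.
pca = PCA(n_components=8)
aligned = pca.fit_transform(embeddings)

# Discretize each component into a small number of quantile bins.
n_bins = 16
edges = np.quantile(aligned, np.linspace(0, 1, n_bins + 1)[1:-1], axis=0)
discrete = np.stack(
    [np.digitize(aligned[:, d], edges[:, d]) for d in range(aligned.shape[1])],
    axis=1,
)   # shape (200, 8), integer codes in [0, n_bins)

# Each row is now a short sequence of discrete symbols that a standard
# (e.g. BPE) tokenizer could be trained on.
print(discrete[0])
```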
>>33710 This is both very encouraging, and highly enlightening CyberPonk. I particularly appreciate your perspective on the benefits of tokenization/sub-tokenization. I will soon begin the text works portions of my own RW Foundations efforts, and this is a helpful idea to me rn. Thanks! :^) I greatly appreciate both your insights & perseverance, and for sharing both here on /robowaifu/ . Cheers. :^) Drive on!
>>33566
>OpenAI o1 appears to provide a fascinating way forward towards cognitive architecture with LLMs. Essentially, it has a spiral of "thoughts" to refine an internal prompt to allow the model to provide a better result. As alluded to in the video, langchain and multi-agent frameworks can accomplish this. Adding RAG and other enhancements would bring us further towards real reasoning. Metacognition could be our backdoor into alternative cognition architectures.
It's also just a good signal to us that we are, and were, on the right track. There was no fundamental hidden/unknown reason why the big players were not doing this.
----
"Scopes of Thought"
An idea I had to try to reduce context length usage is to borrow the idea of scope, where you treat the LLM context as a stack. When doing chain of thought, or solving a sub-problem, you make a copy of the current state (the KV cache if it's a transformer), do the task & then extract the result. Then you load the old state and insert the task you did and its result. What's nice about this is that future calls and tasks get a simplified context of previous tasks and their results without the context being consumed/polluted.
----
An idea for a de-duplication strategy (a rough sketch follows this post). Every node (or RAG Q&A pair) should have a simple keyword as a "label" (a single word if possible). Label uniqueness is enforced on nodes. If a request for a new node has the same label, a disambiguation process is started. First, a choice is made whether the existing node is actually related and should be modified instead of making a new node. If not, then both nodes will have their labels changed to specify a subcategory. For example, say we are adding a node for a person named Jon Doe with a label of JonDoe, but we already have a node for the concept of a Jon Doe. The concept node becomes "concept:JonDoe", and the person becomes "person:JonDoe". Note that both are still reserving the first part, JonDoe; a third JonDoe node would still trigger this process. (There would be a top-level keyword table that is a list of all simple label names, without the subsection prefixes.)
----
There is interesting RWKV news. RWKV-7 "Goose" is being tested; what makes it unique is that it's not a linear attention model and it overcomes the TC0 limitation that attention-based transformers have. https://x.com/BlinkDL_AI/status/1833863117480280528 In general I am very bullish on RWKV LLMs and believe it's the way forward if we fine-tune them for our use-case.
----
On the topic of fine-tunes, I have successfully contacted an author of the YoLLaVA paper and got them to publish the training code [1]. (Why do academic types not do this in the first place -_- ? It did not get a lot of attention, so if I had not pestered them and asked for it, there was a non-zero chance that this code would never have been published.) If you don't know what YoLLaVA is, please check it out!!! It's super cool :D [2]. I think it's a perfect demo of why a custom fine-tune is essential. Imagine this paired with an image-similarity RAG/memory system. What's nice is that any LLM with vision can be trained to do this. This makes me question what other ""low-hanging fruit"" is there? LLMs are powerful and the corpos are not interested in or searching for use-cases outside of Q&A chatbots and surveillance.
[1]: https://github.com/WisconsinAIVision/YoLLaVA/commit/6a640ee636ceebdd8ff747ea4335b475765b9a7e
[2]: https://thaoshibe.github.io/YoLLaVA/
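A rough sketch of the label-disambiguation flow described above. is_same_thing() is a placeholder for the LLM call that decides whether to merge instead of split, and all names are illustrative.
```python
# base label -> {full label: (category, description)}
label_table: dict[str, dict[str, tuple[str, str]]] = {}

def is_same_thing(existing_desc: str, new_desc: str) -> bool:
    """Placeholder: an LLM would decide whether the existing node should be modified instead."""
    return False

def register_node(base: str, category: str, description: str) -> str:
    entries = label_table.setdefault(base, {})
    if not entries:
        entries[base] = (category, description)   # first use keeps the simple label
        return base
    for full_label, (cat, desc) in entries.items():
        if is_same_thing(desc, description):
            return full_label                      # related: modify the existing node instead
    # Collision: every label under this base gets its subcategory prefix,
    # while the plain base label stays reserved in the top-level table.
    if base in entries:
        cat, desc = entries.pop(base)
        entries[f"{cat}:{base}"] = (cat, desc)
    new_label = f"{category}:{base}"
    entries[new_label] = (category, description)
    return new_label

print(register_node("JonDoe", "concept", "the idea of a Jon Doe"))   # JonDoe
print(register_node("JonDoe", "person", "an actual person"))         # person:JonDoe (concept re-labeled too)
```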
>>33710 Thanks, Chobitsu. I appreciate seeing your thoughts, even when it's "just" encouragement. It's also nice having someone around that's been at this for as long as I have (since 2014). I'm glad the tokenization philosophizing helped. Are you posting your progress on RW Foundations anywhere?
>>33717
>When doing chain of thought, or solving a sub-problem, you make a copy of the current state (the KV cache if it's a transformer), do the task & then extract the result. Then you load the old state and insert the task you did and its result.
Guidance does this. There's an efficient implementation here: https://github.com/sgl-project/sglang It lets you store & retrieve KV caches for recent queries. Some API providers support a (much more) limited version of this as well:
https://www.anthropic.com/news/prompt-caching
https://ai.google.dev/gemini-api/docs/caching
The difficulty in applying this more broadly is that KV caches don't compose. So you can't take a KV cache for query A and a KV cache for query B, then piece them together into a KV cache for query AB. You can use it to continue or branch off from previous inferences, but there's no way to merge the caches of two inferences together. That's a limitation imposed by how the popular text positional encodings work.
>I have successfully contacted an author of the YoLLaVA paper and got them to publish the training code
Very nice.
>>33732 >>33733 (answer-related)
>>33732 Horsona chatbot library updates:
- I created a demo app: https://github.com/synthbot-anon/horsona/tree/main/samples/simple_chatbot . It's not supposed to be a great chatbot, just a "hello world" for getting something simple running as an application.
- I added support for saving & loading modules (a toy sketch of the pattern follows this post). The code is a little ugly right now. It's based on pytorch with a few exceptions:
- ... Parent class for serializable data: https://github.com/synthbot-anon/horsona/blob/main/src/horsona/autodiff/basic.py#L32
- ... Example usage: https://github.com/synthbot-anon/horsona/blob/main/tests/test_state_dict.py
- ... To support serialization of optimized data structures (e.g., HNSW indexes in C++), modules can hook into the save/restore functions to convert the data into a serializable format.
- ... The load function is a classmethod so there's no need to create an instance of a module before loading the data back in. This is necessary for cases where a module can't be reloaded without access to data from sub-modules. When loading a module, the load function calls the module's constructor and passes restored data as arguments, so serializable modules need to have a constructor that accepts field values. I don't think that's an issue since it's good practice regardless.
- ... The load function can accept additional arguments to pass to module constructors. This is necessary right now to handle fields that should not be serialized, like which LLM API to use.
- ... The installation is a bit complicated right now. The Ollama dependency requires either familiarity with Ollama or some complicated docker usage. The default LLM config file also requires 4 API keys. I'm thinking about adding support for OpenAI & Anthropic embeddings and defaulting to requiring just a single OpenAI/Anthropic API key for the default installation.
- I made LLMs configurable through a JSON config. Example config: https://github.com/synthbot-anon/horsona/blob/main/llm_config.json.example
- I removed the pytorch dependency. Embeddings are now calculated through an API (currently only ollama is supported, but I'll add others). Embeddings are indexed with ChromaDB's fork of hnswlib. I plan to add support for external indexes (e.g., ChromaDB, Postgres).
- I made the embeddings configurable in the same way that LLMs are configurable. Example config: https://github.com/synthbot-anon/horsona/blob/main/index_config.json.example
- I cleaned up & moved around a bunch of code. All of the embedding code is moved to horsona.index. Caches seem like their own thing (not an API, not backproppable) so I moved them to their own folder. There's a tentative BaseCache class, though I might refine that interface as I figure out more of what it needs to handle.
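A toy sketch (not the horsona API) of the save/load pattern described above: the save method produces plain serializable data, and the classmethod load passes restored fields into the constructor along with non-serialized extras such as which LLM API to use. All names here are hypothetical.
```python
import json

class CharacterModule:
    def __init__(self, name: str, traits: list[str], llm=None):
        self.name = name
        self.traits = traits
        self.llm = llm          # not serialized; supplied again at load time

    def state_dict(self) -> dict:
        """Convert the module's data into a serializable format."""
        return {"name": self.name, "traits": self.traits}

    @classmethod
    def load_state_dict(cls, state: dict, llm=None) -> "CharacterModule":
        """Classmethod load: pass restored fields to the constructor, plus extra args."""
        return cls(**state, llm=llm)

saved = json.dumps(CharacterModule("Chii", ["curious", "gentle"]).state_dict())
restored = CharacterModule.load_state_dict(json.loads(saved), llm="some-llm-api-handle")
print(restored.name, restored.traits)
```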
>>33996 Very exciting, CyberPonk. Especially glad to see you create a demo app for all us newcomers. Hope you had fun, good to see you back, Anon. Cheers. :^)
>>33996 Thanks. Good news indeed. I'll look into it and hope this will bring back my motivation to work on something as well. How well can this current program interact with others?
>>34009 The simple_chatbot demo uses stdin and stdout, so interoperability isn't great. I plan to add support in the library for:
- Unity integration (pending consensus from an interested group that's using Unity)
- REST integration, compatible with OpenAI's API (will certainly happen)
- Unreal Engine integration (hypothetical right now, waiting to find someone proactive that wants to use it with UE)
So it'll get better with some time.
>>34017 >The simple_chatbot demo uses stdin and stdout, so interoperability isn't great. Lolwut? That is the absolute apex of interoperability! :^) Unix Way, best way https://wiki.c2.com/?UnixWay
>>34031 Heh. It's good for interoperating with scripts as a standalone chatbot, not so good for interoperating with other software in a modular way.
Horsona updates:
- I created a sample module for contributors to use as a reference. It takes in character information plus contextual information, and it generates high level pose information that's appropriate for that character in that context. It supports "backpropagating" so that any errors discovered in the results can be used to correct underlying variables.
- ... Code: https://github.com/synthbot-anon/horsona/blob/main/src/horsona/contributions/sample/pose.py
- ... Explanation of the code: https://github.com/synthbot-anon/horsona/tree/main/src/horsona/contributions/sample
- ... Test cases: https://github.com/synthbot-anon/horsona/blob/main/tests/contributions/test_pose_module.py
- ... Explanation of the test cases: https://github.com/synthbot-anon/horsona/tree/main/tests/contributions
- ... General information for contributing: https://github.com/synthbot-anon/horsona/tree/main/src/horsona/contributions
I'm going to work on the interoperability side next. The plan right now is to support calling each module via a REST API. This should allow external callers to reuse and extend any predefined workflows. (The OpenAI API compatibility will come later.)
>>34034 I'm reading some of it, but I can't use an LLM on the old laptop which I'm currently using.
>>34034 POTD Excellent! I'll try to check this out before the holiday seasons, CyberPonk. Keep up the great work! Cheers. :^)
>>34058 Thanks!
>>34034 Horsona updates:
- I'm adding support for game engine integration. It exposes a REST API that can be wrapped by Unreal Blueprint nodes, Unity Visual Scripting nodes, ComfyUI nodes, and so on. It should support everything that can be done in Python, including backpropagation.
- ... Code: https://github.com/synthbot-anon/horsona/blob/main/src/horsona/interface/node_graph/node_graph_api.py
- ... Tests: https://github.com/synthbot-anon/horsona/blob/main/tests/interfaces/test_node_graph.py
>>34049 The default installation here doesn't require any powerful local compute: https://github.com/synthbot-anon/horsona/tree/main/samples/simple_chatbot It will be very slow though, since OpenAI is slow. For LLMs, it supports Cerebras and Fireworks too, which should be much faster. For embeddings, I think the container version of Ollama should work quickly enough even on an old laptop. I'm running on a CPU, and it's not the bottleneck for any test cases or the sample project. There are instructions on that page for using different LLM APIs and for using the containerized version of Ollama. You can reuse the same index_config.json and llm_config.json when creating custom modules or running tests.
>>34034 Horsona updates:
- The game engine integration server example is up here: https://github.com/synthbot-anon/horsona/tree/main/samples/node_graph_api
- I added support for session timeouts, which automatically cleans up resources. The timeout resets every time a session is used, and there's a new keep_alive API for manually resetting a timeout if a user is just AFK.
- ... Test cases: https://github.com/synthbot-anon/horsona/blob/main/tests/interfaces/test_node_graph.py#L156
>>34076
>The game engine integration server example is up here
Wow, that was fast, Anon. :^)
>>34085 Hopefully I can get the Unity side of the integration up quickly. The guy I'm working with is giving a lot of good feedback on how the server side is implemented. Once I update my side with those changes, we'll start working on the other half.
>>34076 Horsona updates:
- I redid how the database cache works, since it clubbed together multiple disparate pieces of functionality and its interface required special handling by any module that used it. The new version gives an embedding database an LLM interface. It can be queried like any other LLM, and it does any embedding-specific handling in there (esp. generating keyword searches from the prompt to get better embedding lookups). For whatever underlying LLM it uses, it requires two queries: one to generate the search terms, and one to respond to the query.
- ... Code: https://github.com/synthbot-anon/horsona/blob/main/src/horsona/memory/embedding_llm.py
- I implemented ReadAgent for dealing with long documents (a condensed sketch follows this post). ReadAgent generates a "gist" for each "page" of the document, which can be used to determine what information is on each page. At query time, it uses one LLM call to determine which pages to pull into the context, then a second LLM call to respond to the query. I implemented this as two modules: one to generate & keep track of gists, and one to provide the LLM interface. My version has two changes relative to the original: (1) when summarizing pages, it provides all gists-so-far as context so it can generate better summaries, and (2) when responding to a query, it provides all gists along with the selected pages rather than just the selected pages.
- ... Code for creating gists: https://github.com/synthbot-anon/horsona/blob/main/src/horsona/memory/gist_module.py
- ... Code for the ReadAgent LLM wrapper: https://github.com/synthbot-anon/horsona/blob/main/src/horsona/memory/readagent_llm.py
- I added some utility functions that are generally useful for getting "smarter" responses. One of them is for searching the web for information on a given topic. The second is for decomposing a given topic into subtopics.
- ... Code for searching the web: https://github.com/synthbot-anon/horsona/blob/main/src/horsona/smarts/search_module.py
- ... Code for decomposing a topic: https://github.com/synthbot-anon/horsona/blob/main/src/horsona/smarts/mece_module.py
I like the LLM wrapper approach for generating augmented responses. I'll likely update some other modules to use the same approach, particularly the DialogueModule for generating in-character responses. The ReaderModule is broken since I got rid of db_cache. I'll update this with a cleaner interface.
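A condensed sketch of the ReadAgent flow described above. Here `llm` stands for any prompt-in, text-out callable, and the prompts and answer-parsing are illustrative, not the actual module's behavior.
```python
from typing import Callable

def build_gists(pages: list[str], llm: Callable[[str], str]) -> list[str]:
    """Generate a gist per page, passing gists-so-far as context for better summaries."""
    gists: list[str] = []
    for page in pages:
        context = "\n".join(gists)
        gists.append(llm(f"Earlier gists:\n{context}\n\nSummarize this page:\n{page}"))
    return gists

def answer(query: str, pages: list[str], gists: list[str], llm: Callable[[str], str]) -> str:
    numbered = "\n".join(f"{i}: {g}" for i, g in enumerate(gists))
    # First call: decide which pages to pull into the context.
    picked = llm(f"Gists:\n{numbered}\n\nList the page numbers needed to answer: {query}")
    selected = [
        pages[int(tok)]
        for tok in picked.replace(",", " ").split()
        if tok.isdigit() and int(tok) < len(pages)
    ]
    # Second call: respond using all gists plus the selected full pages.
    return llm(
        f"Gists:\n{numbered}\n\nPages:\n" + "\n".join(selected) + f"\n\nQuestion: {query}"
    )
```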
>>34110 It's been a while since I posted an update since I'm working on a more complex module, and I still don't have it done.
I'm working on creating a causal reasoning module. It's based on the python DoWhy library, which can do analysis based on Judea Pearl's do-calculus for causal modeling. The basic idea is:
- You provide a causal graph for how you believe variables relate to each other.
- You give it datapoints of those variables under different circumstances.
- It fits a model of the data taking your causal graph into account.
- You can ask it causal questions.
Example causal questions:
- What's the best way to accomplish X?
- What were the underlying causes of X?
- What would have happened if I did X instead of Y?
- Is this new datapoint consistent with earlier datapoints?
- How reliable is X's effect on Y?
- If I do X, what effect will it have on variables Y, Z that I care about?
I have the main class for this implemented. I had to implement some custom things to make it more robust (a sketch of these pieces follows this post):
- The standard probability models supported by DoWhy don't handle continuous variables that well, so I had to create my own. My custom one uses Gaussian Processes since it's extremely sample-efficient and it works reasonably well with a mix of continuous and discrete variables.
- I'm using a kernel that's a slightly modified version of SciKit Learn's default to make it more robust to noisy samples. The default is ConstantKernel * RBF; my custom one is ConstantKernel * Matern + WhiteNoise.
- I'm imputing missing values in the data before building a model on it, since Gaussian Processes can't handle missing values. I'm using SciKit Learn's IterativeImputer for this.
I ran some rudimentary tests to make sure it finds causal effects even with missing & noisy data and with very small sample sizes. With clean data, it can fairly reliably identify causal effects from as few as 10 datapoints for 12 variables. (The standard recommendation is NumVariables + 2.) Adding 0.5 standard deviations of noise to all datapoints and setting 20% of values to null, it does well with 20 datapoints. With more noise and more null values, it requires more datapoints. It performs poorly when there are erroneous outliers in the data. I haven't figured out how to handle that yet.
Since this needs to be fast and since it can slow down significantly with larger datasets, I have code for identifying representative samples and retaining only those. I'm using K-Means to identify clusters. I went through a large refactor since I implemented this, and I haven't yet integrated it with the updated code. I'm considering updating this to generate stratified clusters based on treatment (i.e., just the actions that need to be analyzed) patterns. The downside is that that would make it harder to understand what datapoints get retained, and it would need additional information, so I'm leaning against it.
Once that's integrated, I'll need to think through how to wrap this functionality in an LLM interface ("LLM wrapper" a la >>34110). I suspect medium-size models (~70b) can generate reasonable causal graphs and figure out what kinds of causal questions need to be answered for a given query, but it'll require some experimentation to figure out exactly how. One challenge is figuring out how to deal with large causal graphs. Right now, I'm thinking that each causal graph will represent a single "persona", and each persona can interact with others before deciding on a final response. A single persona would be backed by a small causal graph, and more complex causal reasoning would come from interactions between multiple personas. One huge benefit here is that, since interaction with a persona is the same as interacting with any LLM (or LLM wrapper), this can automatically support hybrid reasoning that requires knowledge, associative reasoning, and causal reasoning.
I think a "persona" here is the same as an "agent" in Marvin Minsky's Society of Mind theory. I'm looking into that now to see what thoughts have been put into this approach.
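A sketch of the probability-model pieces described above, using real scikit-learn APIs (IterativeImputer, GaussianProcessRegressor, and the ConstantKernel * Matern + WhiteKernel kernel); the data here is synthetic and the wiring into DoWhy is omitted.
```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables IterativeImputer)
from sklearn.impute import IterativeImputer
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ConstantKernel, Matern, WhiteKernel

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 12))                     # 20 datapoints, 12 variables
y = X[:, 0] * 2.0 + rng.normal(scale=0.5, size=20)    # outcome driven mostly by the first variable

# Knock out ~20% of values, then impute them, since GPs can't handle NaNs.
X[rng.random(X.shape) < 0.2] = np.nan
X = IterativeImputer(random_state=0).fit_transform(X)

# ConstantKernel * Matern + WhiteKernel: more robust to noisy samples than
# scikit-learn's default ConstantKernel * RBF.
kernel = ConstantKernel() * Matern() + WhiteKernel()
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True, random_state=0)
gp.fit(X, y)
print(gp.predict(X[:3]))
```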
>>34239 >I think a "persona" here is the same as an "agent" in Marvin Minsky's Society of Mind theory. I'm looking into that now to see what thoughts have been put into this approach. I'm not seeing anything here that hasn't been incorporated into common sense. It seems like Society of Mind is just a statement that a mind is composed of interacting modules. It applies equally well to monolithic neural networks as it does to functionally distinguished uses of a collection of neural networks. I don't expect to find anything useful there.
>>34240 >It seems like Society of Mind is just a statement that a mind is composed of interacting modules <TFW you read this as 'a maid is just a collection of interacting modules' Lol :D anime catgrill meidos in tiny miniskirts are a reality when?
Open file (156.38 KB 194x194 bread.gif)
>>34272
>Society of Meidos
Soon™. It's a pain in the ass working out how to do this.
- The analysis requires the causal graph to be a DAG, but real-world causal graphs are definitely not DAGs.
- I can get around this by distinguishing input nodes and output nodes, and having output nodes represent a change in output value rather than the output value itself. This requires more state tracking, since using nodes as both input and output involves translating between the two.
- Finding the "right way" to specify & apply changes isn't straightforward.
- For practical reasons, I can only generate small graphs on each query. Piecing them together requires identifying which nodes are "the same" across multiple graphs, decomposing queries so they can be applied to each graph separately, and stitching together the results.
As-yet unsolved mysteries. Can /robowaifu/ help solve some of these, please? https://en.wikipedia.org/wiki/List_of_unsolved_problems_in_biology#Cognition_and_psychology
>>34279
>I can get around this by distinguishing input nodes and output nodes, and having output nodes represent a change in output value rather than the output value itself. This requires more state tracking, since using nodes as both input and output involves translating between the two.
As mentioned to you in the Sumomo-chan bread ( >>14409 ), there is a way in C++ to get around what would normally be a DAG of dependencies -- and in a way that doesn't restrict the representative 'nodes' (ie, C++ classes) from their original, intended purposes. I wonder if what you're dealing with isn't simply a limitation of your approach/language, Anon? Do you think it's possible to implement a simple, one-step abstraction 'layer' that frees you from this conundrum in a similar way to what was chosen for RW Foundations? I hope you solve it soon, CyberPonk. Cheers. :^)
>>34298 It's not an abstraction issue; it is a fundamental limitation of the theory available today. I can generate reasoning that involves cyclic dependencies without a problem (that's actually what happens by default), but no rigorous causal analysis engine is capable of dealing with it. As far as I can tell, it's not known how to deal with spurious correlations when the causal graph contains cycles. I could switch to doing Bayesian message passing, which is capable of dealing with cycles, but it doesn't handle spurious correlations properly, so it's not actually doing a proper causal analysis. I probably will end up adding a less restrictive module for non-causal analysis at some point, but right now I'm just focusing specifically on adding in the ability for an LLM to answer causal questions and use that to make decisions.
I've actually decided to stick with representing output and input nodes in the same way. Having two representations (values & changes) for output nodes limits how much work can be offloaded to the causal analysis engine too much. To deal with cycles, I currently plan to create a graph of DAGs. Two DAGs are connected if they have nodes in common. Causal analysis is done on each DAG individually, then the results will be propagated to downstream DAGs. It's going to be complicated, but I think it's worthwhile to be able to do more complex analysis.
>>34297 I think this one at least has solid grounding now:
>How and where does the brain evaluate reward value and effort (cost) to modulate behavior?
"Evaluation" is too vague a term, but it's essentially between the thalamus, basal ganglia, ventromedial prefrontal cortex, orbitofrontal cortex, and amygdala. See:
https://pmc.ncbi.nlm.nih.gov/articles/PMC4093837/
https://pmc.ncbi.nlm.nih.gov/articles/PMC9352198/
https://www.youtube.com/watch?v=F1L-YTCUpk4
>>34299 >As far as I can tell, it's not known how to deal with spurious correlations when the causal graph contains cycles. Yet isn't this exactly what industrial PID Theory was designed to handle well? What if you 'wrap' each of your causality graph nodes inside an individual Multiproducer/Multiconsumer PID Interface Layer to equilibrate the system overall, outside of the local maxima/minima transition epochs? >tl;dr This is primarily a temporality issue, I think. All industrial systems in the realworld tend to have feedback loops, yet these control systems provably manage it all successfully. >=== -minor disambiguation
Edited last time by Chobitsu on 11/10/2024 (Sun) 04:03:46.
>>34300 I don't think it is. PID controllers don't account for spurious correlations; they treat all correlations equally. Per my understanding, PID controllers also start to fail when there are multiple interacting loops due to feedback issues. I think the usual solutions for scaling up PID controllers all involve removing cyclic feedback between control loops that can cause instabilities (cascade control, feedforward control, decoupling systems). If there are strong correlations between interacting loops, I don't think there's any way to guarantee that PID controllers will converge. Having interacting loops work on different time scales is one solution, but I can't guarantee that it's possible to separate causal graphs by time scales in a way that removes the cycles, and that's especially true when I'm using an LLM to generate many causal graphs dynamically.
I'm realizing that even Bayesian message passing will also fail to converge in a lot of cases. Maybe the best I can do here is to let it run for some fixed number of updates and just use the result regardless of whether it converged.
>>34302
>If there are strong correlations between interacting loops, I don't think there's any way to guarantee that PID controllers will converge.
AFAICT, we're inventing theory here (ever hear of a Multiproducer/Multiconsumer PID Interface Layer before?), so no, no guarantees are to be had at this stage of the research. But if you don't make an effort to make the rubber meet the road, then you'll never know. I would predict the many-to-many matrix+temporal-sliding system that this concept approximates -- especially along with its inherent ability to damp out spikes and converge on a node-local, stable signal level -- ought to provide ample opportunities for experimental tweaking/rewiring. Wouldn't you agree, Anon?
>>34302 I do agree, but getting robust causal relationships is important for what I want. Its uses are far more limited without that. If correlations were enough, I could just train a simple sparse autoencoder. In any case, I figured out how to analyze causal chains across graphs. There's an analogy with quorum intersection algorithms in distributed computing that I'm pretty sure works here. I'll try implementing it.
>>34303
>quorum intersection algorithms
Remarkable synchronicity, CyberPonk. :^) Operational Transforms (OTs) & Conflict-Free Replicated Data Types (CRDTs) were literally going to be the topic of my next post to you in this chain, as they dovetail with the added benefits of the 'power-PID-wrapped nodes' concept to quickly solve the need for convergence with temporal-sliding going on everywhere (just like in the realworld, lol).
<--->
Also, just to clarify: my idea wasn't to attempt eliminating cycles in the system -- but rather to make it reasonably-robust+speedy in the presence of them (just like in the realworld of bio-neurology, heh.) So it sounds like you're well on your way to a solution! Cheers, Anon. :^)
>>34279 Sorry for not answering earlier. I don't think I can help you. Did you ask in AI related forums? Did you ask some AI service for advice? Could you provide an example? >>34297 I think it's better to just look at every functionality and try to replicate it. We don't need to understand the human brain exactly. More like >>25032
