/robowaifu/ - DIY Robot Wives

Advancing robotics to a point where anime catgrill meidos in tiny miniskirts are a reality.

“If you are going through hell, keep going.” -t. Winston Churchill


Open file (8.45 MB 2000x2811 ClipboardImage.png)
Cognitive Architecture: Discussion Kiwi 08/22/2023 (Tue) 05:03:37 No.24783
Chii Cogito Ergo Chii
Chii thinks, therefore Chii is. Cognitive architecture is the study of the building blocks which lead to cognition: the structures from which thought emerges. Let's start with the three main aspects of mind:
Sentience: The ability to experience sensations and feelings. Her sensors communicate states to her. She senses your hand holding hers and can react. Feelings means having emotions: her hand being held brings her happiness. This builds on her capacity for subjective experience, related to qualia.
Self-awareness: The capacity to differentiate the self from external actors and objects. When presented with a mirror, echo, or other self-referential sensory input, she recognizes it as the self. She sees herself reflected in your eyes and recognizes that it is her, that she is being held by you.
Sapience: The perception of knowledge. Linking concepts and meanings; able to discern correlations congruent with having wisdom. She sees you collapse into your chair. She infers your state of exhaustion and brings you something to drink.
These building blocks integrate and allow her to be. She doesn't just feel; she has qualia. She doesn't just see her reflection; she sees herself reflected, and she acknowledges her own existence. She doesn't just find relevant data; she works with concepts and integrates her feelings and personality when forming a response. Cognition: subjective thought reliant on a conscious separation of the self from external reality that integrates knowledge of the latter. A state beyond current AI, a true intellect. This thread is dedicated to all the steps on the long journey towards a waifu that truly thinks and feels.
>=== -edit subject
Edited last time by Chobitsu on 09/17/2023 (Sun) 20:43:41.
>>33710
This is both very encouraging and highly enlightening, CyberPonk. I particularly appreciate your perspective on the benefits of tokenization/sub-tokenization. I will soon begin the text works portions of my own RW Foundations efforts, and this is a helpful idea to me rn. Thanks! :^) I greatly appreciate both your insights & perseverance, and your sharing both here on /robowaifu/. Cheers. :^) Drive on!
>>33566
>OpenAI o1 appears to provide a fascinating way forward towards cognitive architecture with LLMs.
Essentially, it has a spiral of "thoughts" to refine an internal prompt, allowing the model to provide a better result. As alluded to in the video, langchain and multi-agent frameworks can accomplish this. Adding RAG and other enhancements would bring us even closer to real reasoning. Metacognition could be our backdoor into alternative cognition architectures. It's also a good signal that we are, and were, on the right track. There was no fundamental hidden/unknown reason why the big players were not doing this.
----
"Scopes of Thought"
An idea I had to try to reduce context length usage is to borrow the idea of scope, where you treat the LLM context as a stack. When doing chain of thought, or solving a sub-problem, you make a copy of the current state (the KV cache, if it's a transformer), do the task, and then extract the result. Then you load the old state and insert the task you did along with its result. What's nice about this is that future calls and tasks get a simplified context of previous tasks and their results, without the context being consumed/polluted. (A rough sketch of this is at the end of this post.)
----
An idea for a de-duplication strategy. Every node (or RAG Q&A pair) should have a simple keyword as a "label" (a single word if possible). Label uniqueness is enforced on nodes. If a request for a new node has the same label, a disambiguation process is started. First, a choice is made whether the existing node is actually related and should be modified instead of making a new node. If not, then both nodes have their labels changed to specify a subcategory. For example, say we are adding a node for a person named Jon Doe with a label of JonDoe, but we already have a node for a concept of a Jon Doe. The concept node becomes "concept:JonDoe", and the person becomes "person:JonDoe". Note that both are still reserving the first part, JonDoe; a third JonDoe node would still trigger this process. (There would be a top-level keyword table that is a list of all simple label names, without the subsection prefixes.)
----
There is interesting RWKV news. RWKV-7 "Goose" is being tested. What makes it unique is that it's not a linear attention model, and it overcomes the TC0 limitation that attention-based transformers have. https://x.com/BlinkDL_AI/status/1833863117480280528
In general I am very bullish on RWKV LLMs and believe they're the way forward if we fine-tune them for our use-case.
----
On the topic of fine-tunes, I have successfully contacted an author of the YoLLaVA paper and got them to publish the training code [1]. (Why do academic types not do this in the first place without being pestered? The paper did not get a lot of attention, so if I had not asked, there was a non-zero chance this code would never have been published.) If you don't know what YoLLaVA is, please check it out [2]. It's super cool :D and I think it's a perfect demo of why a custom fine-tune is essential. Imagine this paired with an image-similarity RAG/memory system. What's nice is that any LLM with vision can be trained to do this. This makes me question what other "low hanging fruit" is out there. LLMs are powerful, and the corpos are not interested in or searching for use-cases outside of Q&A chatbots and surveillance.
[1]: https://github.com/WisconsinAIVision/YoLLaVA/commit/6a640ee636ceebdd8ff747ea4335b475765b9a7e
[2]: https://thaoshibe.github.io/YoLLaVA/
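As promised above, a minimal sketch of the "Scopes of Thought" idea in Python. The names here are hypothetical, and a real implementation would snapshot the transformer's KV cache instead of a plain text string:
```python
class ContextStack:
    """Treat the LLM context like a call stack: push a snapshot before a
    sub-task, run the sub-task in a scratch scope, then pop back and keep
    only the task and its result."""

    def __init__(self, llm):
        self.llm = llm     # assumed: llm(prompt) -> completion string
        self.context = ""  # stand-in for a real KV-cache snapshot
        self.stack = []

    def run_subtask(self, task: str) -> str:
        self.stack.append(self.context)                # push current state
        result = self.llm(self.context + "\n" + task)  # work in scratch scope
        self.context = self.stack.pop()                # pop: discard scratch
        # Only a simplified record survives, so later calls see clean context.
        self.context += f"\nTask: {task}\nResult: {result}"
        return result
```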
>>33710
Thanks, Chobitsu. I appreciate seeing your thoughts, even when it's "just" encouragement. It's also nice having someone around that's been at this for as long as I have (since 2014). I'm glad the tokenization philosophizing helped. Are you posting your progress on RW Foundations anywhere?
>>33717
>When doing chain of thought, or solving a sub problem, you make a copy of the current state (KV cache if it's a transformer), do the task & then extract the result. Then you load the old state and insert the task you did and the result of it.
Guidance does this. There's an efficient implementation here: https://github.com/sgl-project/sglang
It lets you store & retrieve KV caches for recent queries. Some API providers support a (much more) limited version of this as well:
https://www.anthropic.com/news/prompt-caching
https://ai.google.dev/gemini-api/docs/caching
The difficulty in applying this more broadly is that KV caches don't compose. So you can't take a KV cache for query A and a KV cache for query B, then piece them together into a KV cache for query AB. You can use it to continue or branch off from previous inferences, but there's no way to merge the caches of two inferences together. That's a limitation imposed by how the popular text positional encodings work.
>I have successfully contacted an author of the YoLLaVA paper and got them to publish the training code
Very nice.
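A toy numpy illustration of that limitation (illustrative only, not any particular model's code): with rotary position embeddings, a cached key depends on the token's absolute position, so a cache built for query B at offset 0 isn't the cache B would need when appended after A:
```python
import numpy as np

def rope_key(k: np.ndarray, pos: int) -> np.ndarray:
    # Toy single-frequency rotary embedding: rotate a 2-d key vector by an
    # angle proportional to the token's absolute position in the sequence.
    angle = float(pos)
    rot = np.array([[np.cos(angle), -np.sin(angle)],
                    [np.sin(angle),  np.cos(angle)]])
    return rot @ k

k = np.array([1.0, 0.0])               # same token embedding in both cases
cached_b_alone = rope_key(k, pos=0)    # B cached on its own, at position 0
needed_b_after_a = rope_key(k, pos=5)  # B as it must appear after a 5-token A
print(np.allclose(cached_b_alone, needed_b_after_a))  # False: can't splice
```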
>>33732 >>33733 (answer-related)
>>33732
Horsona chatbot library updates:
- I created a demo app: https://github.com/synthbot-anon/horsona/tree/main/samples/simple_chatbot . It's not supposed to be a great chatbot, just a "hello world" for getting something simple running as an application.
- I added support for saving & loading modules. The code is a little ugly right now. It's based on pytorch with a few exceptions:
- ... Parent class for serializable data: https://github.com/synthbot-anon/horsona/blob/main/src/horsona/autodiff/basic.py#L32
- ... Example usage: https://github.com/synthbot-anon/horsona/blob/main/tests/test_state_dict.py
- ... To support serialization of optimized data structures (e.g., HNSW indexes in C++), modules can hook into the save/restore functions to convert the data into a serializable format.
- ... The load function is a classmethod, so there's no need to create an instance of a module before loading the data back in. This is necessary for cases where a module can't be reloaded without access to data from sub-modules. When loading a module, the load function calls the module's constructor and passes restored data as arguments, so serializable modules need to have a constructor that accepts field values. I don't think that's an issue since it's good practice regardless.
- ... The load function can accept additional arguments to pass to module constructors. This is necessary right now to handle fields that should not be serialized, like which LLM API to use.
- ... The installation is a bit complicated right now. The Ollama dependency requires either familiarity with Ollama or some complicated docker usage. The default LLM config file also requires 4 API keys. I'm thinking about adding support for OpenAI & Anthropic embeddings and defaulting to requiring just a single OpenAI/Anthropic API key for the default installation.
- I made LLMs configurable through a JSON config. Example config: https://github.com/synthbot-anon/horsona/blob/main/llm_config.json.example
- I removed the pytorch dependency. Embeddings are now calculated through an API (currently only Ollama is supported, but I'll add others). Embeddings are indexed with ChromaDB's fork of hnswlib. I plan to add support for external indexes (e.g., ChromaDB, Postgres).
- I made the embeddings configurable in the same way that LLMs are configurable. Example config: https://github.com/synthbot-anon/horsona/blob/main/index_config.json.example
- I cleaned up & moved around a bunch of code. All of the embedding code is moved to horsona.index. Caches seem like their own thing (not an API, not backproppable), so I moved them to their own folder. There's a tentative BaseCache class, though I might refine that interface as I figure out more of what it needs to handle.
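A rough sketch of the save/load pattern described above, with hypothetical names (the linked basic.py has the real interface):
```python
class Serializable:
    def state_dict(self) -> dict:
        # Hook point: subclasses convert optimized/non-serializable members
        # (e.g., C++ HNSW indexes) into a serializable form here.
        return dict(self.__dict__)

    @classmethod
    def load(cls, state: dict, **extra):
        # classmethod: no instance is needed before restoring. `extra` carries
        # non-serialized dependencies, like which LLM API to use.
        return cls(**state, **extra)

class EmbeddingIndex(Serializable):
    def __init__(self, vectors=None, llm=None):
        self.vectors = vectors or []
        self.llm = llm  # injected at load time, never serialized

    def state_dict(self):
        return {"vectors": self.vectors}

# restored = EmbeddingIndex.load(saved_state, llm=my_llm)
```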
>>33996 Very exciting, CyberPonk. Especially glad to see you create a demo app for all us newcomers. Hope you had fun, good to see you back, Anon. Cheers. :^)
>>33996 Thanks. Good news indeed. I'll look into it and hope this will bring back my motivation to work on something as well. How well can this current program interact with others?
>>34009
The simple_chatbot demo uses stdin and stdout, so interoperability isn't great. I plan to add support in the library for:
- Unity integration (pending consensus from an interested group that's using Unity)
- REST integration, compatible with OpenAI's API (will certainly happen)
- Unreal Engine integration (hypothetical right now, waiting to find someone proactive that wants to use it with UE)
So it'll get better with some time.
>>34017 >The simple_chatbot demo uses stdin and stdout, so interoperability isn't great. Lolwut? That is the absolute apex of interoperability! :^) Unix Way, best way https://wiki.c2.com/?UnixWay
>>34031
Heh. It's good for interoperating with scripts as a standalone chatbot, not so good for interoperating with other software in a modular way.
Horsona updates:
- I created a sample module for contributors to use as a reference. It takes in character information plus contextual information, and it generates high-level pose information that's appropriate for that character in that context. It supports "backpropagating" so that any errors discovered in the results can be used to correct underlying variables.
- ... Code: https://github.com/synthbot-anon/horsona/blob/main/src/horsona/contributions/sample/pose.py
- ... Explanation of the code: https://github.com/synthbot-anon/horsona/tree/main/src/horsona/contributions/sample
- ... Test cases: https://github.com/synthbot-anon/horsona/blob/main/tests/contributions/test_pose_module.py
- ... Explanation of the test cases: https://github.com/synthbot-anon/horsona/tree/main/tests/contributions
- ... General information for contributing: https://github.com/synthbot-anon/horsona/tree/main/src/horsona/contributions
I'm going to work on the interoperability side next. The plan right now is to support calling each module via a REST API. This should allow external callers to reuse and extend any predefined workflows. (The OpenAI API compatibility will come later.)
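For anyone curious what "backpropagating" means for a text module, here's a hypothetical sketch of the idea (the linked pose.py is the real reference):
```python
class PoseModule:
    def __init__(self, llm):
        self.llm = llm  # assumed: llm(prompt) -> str

    def forward(self, character: str, context: str) -> str:
        return self.llm(
            f"Character: {character}\nContext: {context}\n"
            "Describe a high-level pose appropriate for this character here."
        )

    def backward(self, inputs: dict, error: str) -> dict:
        # "Backprop" for text: instead of numeric gradients, ask the LLM how
        # each underlying variable should change to fix the reported error.
        corrected = {}
        for name, value in inputs.items():
            corrected[name] = self.llm(
                f"{name} was: {value}\nThe output had this problem: {error}\n"
                f"Rewrite {name} so the problem goes away."
            )
        return corrected
```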
>>34034
I'm reading some of it, but I can't run an LLM on the old laptop which I'm currently using.
>>34034 POTD Excellent! I'll try to check this out before the holiday seasons, CyberPonk. Keep up the great work! Cheers. :^)
>>34058
Thanks!
>>34034
Horsona updates:
- I'm adding support for game engine integration. It exposes a REST API that can be wrapped by Unreal Blueprint nodes, Unity Visual Scripting nodes, ComfyUI nodes, and so on. It should support everything that can be done in Python, including backpropagation.
- ... Code: https://github.com/synthbot-anon/horsona/blob/main/src/horsona/interface/node_graph/node_graph_api.py
- ... Tests: https://github.com/synthbot-anon/horsona/blob/main/tests/interfaces/test_node_graph.py
>>34049
The default installation here doesn't require any powerful local compute: https://github.com/synthbot-anon/horsona/tree/main/samples/simple_chatbot
It will be very slow though, since OpenAI is slow. For LLMs, it supports Cerebras and Fireworks too, which should be much faster. For embeddings, I think the container version of Ollama should work quickly enough even on an old laptop. I'm running on a CPU, and it's not the bottleneck for any test cases or the sample project.
There are instructions on that page for using different LLM APIs and for using the containerized version of Ollama. You can reuse the same index_config.json and llm_config.json when creating custom modules or running tests.
>>34034
Horsona updates:
- The game engine integration server example is up here: https://github.com/synthbot-anon/horsona/tree/main/samples/node_graph_api
- I added support for session timeouts, which automatically clean up resources. The timeout resets every time a session is used, and there's a new keep_alive API for manually resetting a timeout if a user is just AFK.
- ... Test cases: https://github.com/synthbot-anon/horsona/blob/main/tests/interfaces/test_node_graph.py#L156
>>34076
>The game engine integration server example is up here
Wow, that was fast Anon. :^)
>>34085
Hopefully I can get the Unity side of the integration up quickly. The guy I'm working with is giving a lot of good feedback on how the server side is implemented. Once I update my side with those changes, we'll start working on the other half.
>>34076
Horsona updates:
- I redid how the database cache works, since it clubbed together multiple disparate pieces of functionality, and its interface required special handling by any module that used it. The new version gives an embedding database an LLM interface. It can be queried like any other LLM, and it does any embedding-specific handling internally (esp. generating keyword searches from the prompt to get better embedding lookups). Whatever underlying LLM it uses, it requires two queries: one to generate the search terms, and one to respond to the query. (A sketch of this is at the end of this post.)
- ... Code: https://github.com/synthbot-anon/horsona/blob/main/src/horsona/memory/embedding_llm.py
- I implemented ReadAgent for dealing with long documents. ReadAgent generates a "gist" for each "page" of the document, which can be used to determine what information is on each page. At query time, it uses one LLM call to determine which pages to pull into the context, then a second LLM call to respond to the query. I implemented this as two modules: one to generate & keep track of gists, and one to provide the LLM interface. My version has two changes relative to the original: (1) when summarizing pages, it provides all gists-so-far as context so it can generate better summaries, and (2) when responding to a query, it provides all gists along with the selected pages rather than just the selected pages.
- ... Code for creating gists: https://github.com/synthbot-anon/horsona/blob/main/src/horsona/memory/gist_module.py
- ... Code for the ReadAgent LLM wrapper: https://github.com/synthbot-anon/horsona/blob/main/src/horsona/memory/readagent_llm.py
- I added some utility functions that are generally useful for getting "smarter" responses. One of them is for searching the web for information on a given topic. The second is for decomposing a given topic into subtopics.
- ... Code for searching the web: https://github.com/synthbot-anon/horsona/blob/main/src/horsona/smarts/search_module.py
- ... Code for decomposing a topic: https://github.com/synthbot-anon/horsona/blob/main/src/horsona/smarts/mece_module.py
I like the LLM wrapper approach for generating augmented responses. I'll likely update some other modules to use the same approach, particularly the DialogueModule for generating in-character responses. The ReaderModule is broken since I got rid of db_cache. I'll update it with a cleaner interface.
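The embedding-LLM wrapper boils down to something like this (hypothetical sketch; the linked embedding_llm.py is the real code):
```python
class EmbeddingLLM:
    def __init__(self, llm, index):
        self.llm = llm      # assumed: llm(prompt) -> str
        self.index = index  # assumed: index.search(terms) -> list[str]

    def __call__(self, prompt: str) -> str:
        # Query 1: turn the prompt into keyword searches for better lookups.
        terms = self.llm(f"List search keywords for answering:\n{prompt}")
        passages = self.index.search(terms)
        # Query 2: answer with the retrieved passages spliced into the context.
        return self.llm("Context:\n" + "\n".join(passages) +
                        f"\n\nQuery: {prompt}")
```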
>>34110
It's been a while since I posted an update. I'm working on a more complex module, and I still don't have it done: a causal reasoning module. It's based on the Python DoWhy library, which can do analysis based on Judea Pearl's do-calculus for causal modeling. The basic idea is:
- You provide a causal graph for how you believe variables relate to each other.
- You give it datapoints of those variables under different circumstances.
- It fits a model of the data, taking your causal graph into account.
- You can ask it causal questions.
Example causal questions:
- What's the best way to accomplish X?
- What were the underlying causes of X?
- What would have happened if I did X instead of Y?
- Is this new datapoint consistent with earlier datapoints?
- How reliable is X's effect on Y?
- If I do X, what effect will it have on variables Y, Z that I care about?
I have the main class for this implemented. I had to implement some custom things to make it more robust:
- The standard probability models supported by DoWhy don't handle continuous variables that well, so I had to create my own. My custom one uses Gaussian processes, since they're extremely sample efficient and work reasonably well with a mix of continuous and discrete variables.
- I'm using a kernel that's a slightly modified version of scikit-learn's default to make it more robust to noisy samples. The default is ConstantKernel * RBF; my custom one is ConstantKernel * Matern + WhiteNoise. (A sketch of this setup is at the end of this post.)
- I'm imputing missing values in the data before building a model on it, since Gaussian processes can't handle missing values. I'm using scikit-learn's IterativeImputer for this.
I ran some rudimentary tests to make sure it finds causal effects even with missing & noisy data and with very small sample sizes. With clean data, it can fairly reliably identify causal effects from as few as 10 datapoints for 12 variables. (The standard recommendation is NumVariables + 2.) Adding 0.5 standard deviations of noise to all datapoints and setting 20% of values to null, it does well with 20 datapoints. With more noise and more null values, it requires more datapoints. It performs poorly when there are erroneous outliers in the data. I haven't figured out how to handle that yet.
Since this needs to be fast, and since it can slow down significantly with larger datasets, I have code for identifying representative samples and retaining only those. I'm using K-Means to identify clusters. I went through a large refactor since I implemented this, and I haven't yet integrated it with the updated code. I'm considering updating this to generate stratified clusters based on treatment patterns (i.e., just the actions that need to be analyzed). The downside is that that would make it harder to understand which datapoints get retained, and it would need additional information, so I'm leaning against it.
Once that's integrated, I'll need to think through how to wrap this functionality in an LLM interface ("LLM wrapper" a la >>34110). I suspect medium-size models (~70b) can generate reasonable causal graphs and figure out what kinds of causal questions need to be answered for a given query, but it'll require some experimentation to figure out exactly how. One challenge is figuring out how to deal with large causal graphs. Right now, I'm thinking that each causal graph will represent a single "persona", and each persona can interact with others before deciding on a final response.
A single persona would be backed by a small causal graph, and more complex causal reasoning would come from interactions between multiple personas. One huge benefit here is that, since interaction with a persona is the same as interacting with any LLM (or LLM wrapper), this can automatically support hybrid reasoning that requires knowledge, associative reasoning, and causal reasoning. I think a "persona" here is the same as an "agent" in Marvin Minsky's Society of Mind theory. I'm looking into that now to see what thoughts have been put into this approach.
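For reference, the kernel and imputation setup described above looks roughly like this in scikit-learn (random data here purely for illustration):
```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ConstantKernel, Matern, WhiteKernel

# Modified default kernel: Matern instead of RBF, plus a white-noise term
# to make the fit more robust to noisy samples.
kernel = (ConstantKernel(1.0) * Matern(length_scale=1.0, nu=1.5)
          + WhiteKernel(noise_level=1e-2))

rng = np.random.default_rng(0)
X = rng.random((20, 12))               # 20 datapoints, 12 variables
X[rng.random(X.shape) < 0.2] = np.nan  # ~20% missing values
y = rng.random(20)

X_filled = IterativeImputer(max_iter=10).fit_transform(X)  # GPs can't take NaNs
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X_filled, y)
```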
>>34239 >I think a "persona" here is the same as an "agent" in Marvin Minsky's Society of Mind theory. I'm looking into that now to see what thoughts have been put into this approach. I'm not seeing anything here that hasn't been incorporated into common sense. It seems like Society of Mind is just a statement that a mind is composed of interacting modules. It applies equally well to monolithic neural networks as it does to functionally distinguished uses of a collection of neural networks. I don't expect to find anything useful there.
>>34240 >It seems like Society of Mind is just a statement that a mind is composed of interacting modules <TFW you read this as 'a maid is just a collection of interacting modules' Lol :D anime catgrill meidos in tiny miniskirts are a reality when?
Open file (156.38 KB 194x194 bread.gif)
>>34272
>Society of Meidos.
Soon™. It's a pain in the ass working out how to do this.
- The analysis requires the causal graph to be a DAG, but real-world causal graphs are definitely not DAGs.
- I can get around this by distinguishing input nodes and output nodes, and having output nodes represent a change in output value rather than the output value itself. This requires more state tracking, since using nodes as both input and output involves translating between the two.
- Finding the "right way" to specify & apply changes isn't straightforward.
- For practical reasons, I can only generate small graphs on each query. Piecing them together requires identifying which nodes are "the same" across multiple graphs, decomposing queries so they can be applied to each graph separately, and stitching together the results.
As-yet unsolved mysteries. Can /robowaifu/ help solve some of these, please? https://en.wikipedia.org/wiki/List_of_unsolved_problems_in_biology#Cognition_and_psychology
>>34298
>- I can get around this by distinguishing input nodes and output nodes, and having output nodes represent a change in output value rather than the output value itself. This requires more state tracking since using nodes as both input and output involves translating between the two.
As mentioned to you in the Sumomo-chan bread (>>14409), there is a way in C++ to get around what would normally be a DAG of dependencies -- and in a way that doesn't restrict the representative 'nodes' (ie, C++ classes) from their original, intended purposes. I wonder if what you're dealing with isn't simply a limitation of your approach/language, Anon? Do you think it's possible to implement a simple, one-step abstraction 'layer' that frees you from this conundrum, in a similar way to what was chosen for RW Foundations? I hope you solve it soon, CyberPonk. Cheers. :^)
>>34298
It's not an abstraction issue, and it is a fundamental limitation of the theory available today. I can generate reasoning that involves cyclic dependencies without a problem -- that's actually what happens by default -- but no rigorous causal analysis engine is capable of dealing with it. As far as I can tell, it's not known how to deal with spurious correlations when the causal graph contains cycles. I could switch to doing Bayesian message passing, which is capable of dealing with cycles, but it doesn't handle spurious correlations properly, so it's not actually doing a proper causal analysis. I probably will end up adding a less restrictive module for non-causal analysis at some point, but right now I'm just focusing specifically on adding in the ability for an LLM to answer causal questions and use that to make decisions.
I've actually decided to stick with representing output and input nodes in the same way. Having two representations (values & changes) for output nodes limits how much work can be offloaded to the causal analysis engine too much. To deal with cycles, I currently plan to create a graph of DAGs. Two DAGs are connected if they have nodes in common. Causal analysis is done on each DAG individually, then the results will be propagated to downstream DAGs. It's going to be complicated, but I think it's worthwhile to be able to do more complex analysis. (A toy illustration is at the end of this post.)
>>34297
I think this one at least has solid grounding now:
>How and where does the brain evaluate reward value and effort (cost) to modulate behavior?
"Evaluation" is too vague a term, but it's essentially between the thalamus, basal ganglia, ventromedial prefrontal cortex, orbitofrontal cortex, and amygdala. See:
https://pmc.ncbi.nlm.nih.gov/articles/PMC4093837/
https://pmc.ncbi.nlm.nih.gov/articles/PMC9352198/
https://www.youtube.com/watch?v=F1L-YTCUpk4
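A toy illustration of the graph-of-DAGs plan (my own sketch, not the actual implementation):
```python
import networkx as nx

# Each causal model is its own DAG; two DAGs are connected when they share
# a node name, and per-DAG results propagate through the shared nodes.
emotional = nx.DiGraph([("stress", "mood"), ("sleep", "mood")])
dialogue = nx.DiGraph([("mood", "tone"), ("topic", "tone")])

assert nx.is_directed_acyclic_graph(emotional)
assert nx.is_directed_acyclic_graph(dialogue)

shared = set(emotional) & set(dialogue)  # {"mood"} links the two graphs
# Analysis runs per-DAG: effect(stress -> mood) in `emotional`, then the
# result feeds effect(mood -> tone) in `dialogue`.
print(shared)
```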
>>34299 >As far as I can tell, it's not known how to deal with spurious correlations when the causal graph contains cycles. Yet isn't this exactly what industrial PID Theory was designed to handle well? What if you 'wrap' each of your causality graph nodes inside an individual Multiproducer/Multiconsumer PID Interface Layer to equilibrate the system overall, outside of the local maxima/minima transition epochs? >tl;dr This is primarily a temporality issue, I think. All industrial systems in the realworld tend to have feedback loops, yet these control systems provably manage it all successfully. >=== -minor disambiguation
Edited last time by Chobitsu on 11/10/2024 (Sun) 04:03:46.
>>34300
I don't think it is. PID controllers don't account for spurious correlations; they treat all correlations equally. Per my understanding, PID controllers also start to fail when there are multiple interacting loops, due to feedback issues. I think the usual solutions for scaling up PID controllers all involve removing cyclic feedback between control loops that can cause instabilities (cascade control, feedforward control, decoupling systems). If there are strong correlations between interacting loops, I don't think there's any way to guarantee that PID controllers will converge. Having interacting loops work on different time scales is one solution, but I can't guarantee that it's possible to separate causal graphs by time scales in a way that removes the cycles, and that's especially true when I'm using an LLM to generate many causal graphs dynamically.
I'm realizing that even Bayesian message passing will fail to converge in a lot of cases. Maybe the best I can do here is to let it run for some fixed number of updates and just use the result regardless of whether it converged.
>>34301
>If there are strong correlations between interacting loops, I don't think there's any way to guarantee that PID controllers will converge.
AFAICT, we're inventing theory here (ever hear of a Multiproducer/Multiconsumer PID Interface Layer before?), so no, no guarantees are to be had at this stage of the research. But if you don't make an effort to make the rubber meet the road, then you'll never know. I would predict the many-to-many matrix+temporal-sliding system that this concept approximates -- especially along with its inherent ability to damp out spikes and converge on a node-local, stable signal level -- ought to provide ample opportunities for experimental tweaking/rewiring. Wouldn't you agree, Anon?
>>34302 I do agree, but getting robust causal relationships is important for what I want. Its uses are far more limited without that. If correlations were enough, I could just train a simple sparse autoencoder. In any case, I figured out how to analyze causal chains across graphs. There's an analogy with quorum intersection algorithms in distributed computing that I'm pretty sure works here. I'll try implementing it.
>>34303
>quorum intersection algorithms
Remarkable synchronicity, CyberPonk. :^) Operational Transforms (OTs) & Conflict-Free Replicated Data Types (CRDTs) were literally going to be the topic of my next post to you in this chain, as per dovetailing with the added benefits of the 'power-PID-wrapped nodes' concept to quickly solve the need for convergence with temporal-sliding going on everywhere (just like in the realworld, lol).
<--->
Also, just to clarify: my idea wasn't to attempt eliminating cycles in the system -- but rather to make it reasonably-robust+speedy in the presence of them (just like in the realworld of bio-neurology, heh.) So it sounds like you're well on your way to a solution! Cheers, Anon. :^)
>>34279 Sorry for not answering earlier. I don't think I can help you. Did you ask in AI related forums? Did you ask some AI service for advice? Could you provide an example? >>34297 I think it's better to just look at every functionality and try to replicate it. We don't need to understand the human brain exactly. More like >>25032
I took a short break from >>34303 to work on some other things.
- Text-to-speech generation with GPT-SoVITS: https://github.com/synthbot-anon/horsona/tree/main/samples/gpt_sovits
- ... This interface is definitely not final. It's just an early first draft.
- I added a module for dealing with large amounts of memory. It uses a combination of RAG + ReadAgent. Short version: when given a document, it chunks the document, creates summaries for each chunk (using prior chunks & summaries for context), and creates embeddings for each summary. At retrieval time, it expands a given task into many queries, uses RAG to identify relevant summaries, uses an LLM to identify the summaries most worth "unpacking" into their original chunks, and uses both the most relevant summaries and most relevant chunks to respond. I'm calling it a Wiki Module / Wiki LLM since that's the kind of data it's most suited for. (A condensed sketch is at the end of this post.)
- ... Code for processing & indexing documents: https://github.com/synthbot-anon/horsona/blob/main/src/horsona/memory/wiki_module.py
- ... Code for responding to queries: https://github.com/synthbot-anon/horsona/blob/main/src/horsona/memory/wiki_llm.py
- I added an option to create an OpenAI-compatible endpoint for any supported LLM, including custom LLMs. This was so I could test compatibility with SillyTavern.
- ... Code for creating OAI-compatible endpoints: https://github.com/synthbot-anon/horsona/tree/main/src/horsona/interface/oai
- ... Example custom LLM that works with SillyTavern: https://github.com/synthbot-anon/horsona/tree/main/samples/llm_endpoint . This one (1) uses the Wiki module & LLM from earlier to let the LLM access significantly more context whenever it's relevant, and (2) uses a ReadAgent-like module so it can continue conversations for much longer without forgetting what was said earlier. It requires no special configuration in SillyTavern and no plugins. Just use the new endpoint, and it should add in the new functionality. One issue here is that I don't know how to get a session id from SillyTavern, so the conversation memory persists across all conversations. The only way to fix it is to restart the server. I'll add a better way to deal with that at some point, but it's not a priority for now.
- Bunch of small changes. The library now supports streaming outputs, there are utility functions for saving/restoring modules with binary data (e.g., embedding databases), I cleaned up the LLM classes so it's easier to derive new (custom) ones, I added support for Grok models, config files now support comments, I improved several prompts, embedding lookups now optionally return distances in addition to results, many bug fixes, etc.
One of the todo items on my feature list is to automatically generate lorebooks. Playing around with the SillyTavern integration made me realize how important that is. RAG-based lookups are really only good for lorebook-like information (static, "current state of the world" data like a research paper or wiki), and stories/conversations certainly don't look like that. There needs to be a conversion step from dynamic data to wiki-style data, which would essentially be a lorebook. I'll probably work on that either after I'm done with the causal stuff or when I need another break from it.
>>34305
CRDTs might play a role there too ^:).
>>34309
I asked a guy that's spent a decent amount of time working with & creating causal models. The response was "Yeah, without an explicit SEM, causal inference is hard."
Dynamic causal graphs and causal inference split across multiple graphs don't seem to be a thing people commonly do. There's some work on dealing with multiple graphs under "causal fusion", but the work is pretty sparse and scattered. Almost all of the work I've done so far was together with Claude. I do very little planning and coding now without Claude.
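A condensed sketch of the wiki-module pipeline from the post above (hypothetical names; the linked wiki_module.py / wiki_llm.py are the real code):
```python
def index_document(doc, llm, embed, chunk_size=2000):
    chunks = [doc[i:i + chunk_size] for i in range(0, len(doc), chunk_size)]
    store, summaries = [], []
    for chunk in chunks:
        # Prior summaries give the summarizer context for the current chunk.
        summary = llm("So far:\n" + "\n".join(summaries) +
                      "\nSummarize:\n" + chunk)
        summaries.append(summary)
        store.append((embed(summary), summary, chunk))
    return store

def respond(task, store, llm, embed, nearest):
    # Expand the task into several queries, then RAG over the summaries.
    # `nearest` is assumed: nearest(store, vector) -> matching store entries.
    queries = llm("Expand into search queries:\n" + task).splitlines()
    hits = {s: c for q in queries for _, s, c in nearest(store, embed(q))}
    # One LLM call picks the summaries worth unpacking into original chunks.
    picks = llm("Task: " + task + "\nWhich summaries should be unpacked?\n" +
                "\n".join(hits))
    unpacked = [c for s, c in hits.items() if s in picks]
    # Respond using both the relevant summaries and the unpacked chunks.
    return llm("Summaries:\n" + "\n".join(hits) + "\nPages:\n" +
               "\n".join(unpacked) + "\n\nTask: " + task)
```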
>>34302 >Multiproducer/Multiconsumer PID Interface Layer... I would predict the many-to-many matrix+temporal-sliding system that this concept approximates -- especially along with it's inherent ability to damp out spikes and converge on a node-local, stable signal level -- ought to provide ample opportunities for experimental tweaking/rewiring. Wouldn't you agree, Anon? You're making my head hurt :)
Open file (58.93 KB 552x378 minGRUs go brrrrrr.png)
This paper is going to become incredibly important in the coming years for creating recurrent systems that emulate processes in the brain, scale training on GPUs, and deploy efficiently on CPUs.
Paper: https://arxiv.org/abs/2410.01201
Code: https://github.com/lucidrains/minGRU-pytorch
I've been trying to develop a language model with predictive coding, but it has been infeasible to train due to recurrence requiring backpropagation through time. Some researchers found a way to reformulate the recurrence relations to allow for parallel computation via a parallel scan algorithm. The minGRU can leverage the GPU to train on long sequences in parallel and is competitive with transformers and Mamba on language modeling and reinforcement learning. Its O(n) computational complexity in sequence length and simple implementation also make it ideal to play and experiment with.
My understanding is that the most expensive computations (the matrix multiplications) rely only on the input and can be calculated in parallel. The outputs at each time step are cumulatively summed together in log-space, removing the redundant calculations of BPTT, keeping the computation graph shallow, and greatly reducing the vanishing and exploding gradient problems. Single minGRU layers are time independent but become time dependent by adding layers, requiring only 3 layers to solve a selective copying task. I assume minGRU's in-context learning ability is also limited by its hidden state size, but adding a memory interface to it should overcome this. It should be possible to augment it with a heap for allocating, referencing and deallocating memory, but I need to think more on it.
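Here's the minGRU recurrence written out in its slow sequential form, per my reading of the paper (training instead uses a log-space parallel scan over the same recurrence, which is the whole trick):
```python
import torch
import torch.nn as nn

class MinGRU(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.to_z = nn.Linear(dim, dim)  # gate depends only on the input
        self.to_h = nn.Linear(dim, dim)  # candidate state: also input-only

    def forward(self, x):  # x: (batch, seq, dim)
        h = torch.zeros_like(x[:, 0])
        out = []
        for t in range(x.size(1)):
            z = torch.sigmoid(self.to_z(x[:, t]))
            h_tilde = self.to_h(x[:, t])
            # Neither z nor h_tilde looks at h_{t-1}, so this linear
            # recurrence can be computed for all t at once with a scan.
            h = (1 - z) * h + z * h_tilde
            out.append(h)
        return torch.stack(out, dim=1)
```
Since the gates and candidate states never depend on the previous hidden state, the expensive matrix multiplications depend only on the inputs, which is what lets the whole sequence be processed in parallel during training.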
>>34499
>paper
I'm not so sure I understand this. In fact I know I don't, but is the general idea that the neural network MUST (normally) recalculate ALL the nodes or neurons in a network for every new batch of information? Is the paper saying that they can break these up and do the operation in parallel? I'm trying to figure out if this is the "big picture" in a general way, not specifically.
>>34477
- I got multi-graph causal inference working. It's a pretty dumb algorithm, but it works well. The basic idea is that it joins the graphs, finds all nodes on the relevant causal path, and iterates on the individual graphs to find causal effects based on accumulated information. Right now, the iterations continue until it has some information on all causally-relevant nodes. At some point, I'll update it so it iterates enough times that the causal information from each node can fully propagate. It treats all causal information as relevant and everything else as spurious.
- I was previously using the DoWhy library to do causal inference. I migrated the relevant code into my own framework.
- Overall, the DoWhy code is atrocious and unpleasant to work with. I did a massive cleanup and improved the interfaces so I could add custom inference engines much more easily. My version has fewer features, but in practice I don't expect anyone to actually want to use those features.
- I added an inference engine to do causal inference on arbitrary natural language data.
- I worked out how to get LLMs to produce good causal graphs, though this isn't integrated with the rest yet.
I'm pretty happy with the natural language causal inference results so far, even with a 70b model. My expectations were pretty high, and it meets expectations. Of all the things I've developed, this is probably the one I'm most proud of.
I have some code cleanup to do now. Specifically:
- Make the base causal inference class support both numerical and natural language inference. Right now, it requires code changes to switch between the two. I've already done most of the work for this, and the rest should be easy.
- Update my multi-graph inference code with the recent updates so it can deal with natural language inferences.
- Update my data manager to support natural language. The data manager's primary purpose is to reduce the number of datapoints required for inference. For numerical data, this is necessary because causal inference on a large number of datapoints is slow. For natural language data, it's necessary because I want everything to work with a small context window. My current implementation is pretty naive, since it gets representative points over all the data rather than over just the data required for each specific inference. With numerical data, that wasn't really an issue, since I could impute the missing values and still get decent results. For natural language data, imputing isn't viable, so I'll need to do it "properly".
- Write & integrate the causal graph generation code so graphs don't need to be manually specified.
- Create horsona modules for everything.
After that, I'll be doing more memory work.
>>34507
>Of all the things I've developed, this is probably the one I'm most proud of.
You've developed some pretty impressive things during the time I've known you, Anon. So that sounds very exciting to hear! :^)
Well, I've got about a month left to fulfill my plan to at least try to get your Horsona system up and running on my machine before the end of the year. I presume these:
>- Write & integrate the causal graph generation code so graphs don't need to be manually specified.
>- Create horsona modules for everything.
are very much intended to be part of your new package? If so, then I'll wait until you give the word that setting up your system for a beginner like myself is ready to go, then I'll take an honest whack at it.
<--->
Keep moving forward, CyberPonk. Cheers. :^)
>=== -minor edit
Edited last time by Chobitsu on 11/28/2024 (Thu) 13:22:53.
>>34485
>You're making my head hurt
Lol. My apologies, fren Grommet. That wasn't intentional! :D
<--->
By way of explanation, I'll try to use some rough analogies. Very rough, so don't be too critical haha... :D
So, a 4-rotor drone ostensibly has just one PID-based control system as the typical norm. However, I would argue that in fact there are five onboard:
* The 'normal' one that acts as the system controller.
* The other four are actually part of the motor drivers themselves; although they aren't exposed as such (nor even engaged with by the user), they are in fact real, and vital to the proper operation of the drone in essentially all reasonable flight modes.
So there's five PID-esque 'controllers' for a single machine. The so-called 'temporal sliding' is there as well, I'd argue, due to the inherent propagation delays and other latencies within this physical rotors system+network of electronics, all onboard this single aircraft.
<--->
Now... picture an entire flotilla of self-illuminated drones, all doing synchronized swimming flying -- ala a ginormous yuge visage of Optimus' head floating over Hollywood on, say, the night of October 9th this year. :^) Now you have a simulacrum of a yuge network of PIDs all 'coordinated' (through entirely ground-based preprogramming, AFAICT) after a fashion. But what if -- instead of externally-driven -- they all talked to each other live instead? Now you have the basis for a: 'Multi-producer, multi-consumer PID system that embodies a many-to-many (communications & control [C&C]) matrix, that has temporal-sliding going on like mad all over the place'. In fact, such a manmade technical system could embody just the behaviors we observe in nature for a flock of birbs, or a school of fishbros. Get the picture?
<--->
Now, simply encapsulate this exact concept down into a single, cognitive-oriented system where the Super-PID node(-synapse) wrapper takes in/gives out C&C signals from/to the entire (or at least a large, local subset of the) network collection -- all while keeping the actual signal inside the local node stabilized (similar to how an individual drone remains stable in flight, regardless of the flotilla action as a whole) as a sort of 'running average' of the local interconnections. It would operate this way regardless of all the signal-sliding flowing through the system as a whole.
Finally, the CRDT-esque validation process comes alongside to read the convergence out of the network of individual (now-stabilized) node-internal signal levels during each compute quanta 'tick' of the system. (The tick is simply an idea of a time barrier, so that the CRDT concept can do its magic, and all the nodes will in fact each have an accurate picture of the system as a whole, regardless of where they themselves each were on the 'timeline' of signals-generation inside this system during this specific compute quanta.)
<--->
Whew! I hope I didn't make all this even more confusing, Anon. :DD
>tl;dr
This is primarily a temporality problem at this stage, IMO. Adding a fully-interconnected meshnet of Super PID wrappers -- one for each 'synapse' -- is the way to gain convergence out of this chaotic soup, and it will be stable as such, even with yuge numbers of cycles (feedback loops) inside the system! Cheers, Anon. :^) TWAGMI
>=== -prose edit
Edited last time by Chobitsu on 11/28/2024 (Thu) 15:40:29.
>>34508
They are very much intended for the new package. The LLM interface to the causal models will probably be a little flaky initially, at least until I get proper memory working, so I'd recommend playing around with some of the other functionality before that. The underlying modules should be robust, though it'll be more complicated to use them directly.
There's some setup required (getting API keys or setting up local models) that you'll want to have ready for it, so it's worthwhile to try installing & running the tests sooner. It's a one-liner to install after you have Python 3.11 and poetry installed, then you'll need to add your API keys/endpoints, then a one-liner to run the tests. The library can be configured to use every API key & endpoint you give it, so the more keys you get and endpoints you set up, the faster it'll be. With just a single OpenAI key, it is painfully slow. Your endpoints should support at least 8k context, since that's what I test everything with and what the defaults are configured for.
>>34512
>Get the picture?
I think so.
>"...Super-PID node(-synapse) wrapper takes in/gives out C&C signals from/to the entire (or at least a large, local subset of the) network collection -- all while keeping the actual signal inside the local node stabilized..."
The Super-PID node sends gross (large-scale) position instructions while the local node (code) controls the fine tuning of the formerly sent position. Or that's what I take you to be saying. Not personally criticizing you, Chobitsu. You're just using the language of the trade, but it appears to me that much of mathematics, AI, etc. uses language that obscures what they are actually doing. And ALL professional technologists do this in their respective fields.
>>34512
Thanks for the advice, CyberPonk. I'll seek to do some of the preliminaries then, at least by the goal.
>>34513
>Not personally criticizing you, Chobitsu. You're just using the language of the trade, but it appears to me that much of mathematics, AI, etc. uses language that obscures what they are actually doing. And ALL professional technologists do this in their respective fields.
Good point, Grommet. Lingo is pretty much a part of all science & technology specialties. It makes things much easier to say. The benefits of using it are vaguely similar IMO to why we namefag here on /robowaifu/: it kinda 'cuts to the chase'. :^) If I find myself with lots of spare time soon (lol), I may try to spell my theory out in excruciating detail, then if I have even more time, I'll try to simplify that.
<--->
The eminent brainiac Blaise Pascal once wrote:
>"I would have written you a shorter letter, but I ran out of time"
I hope you understand my conundrum, Anon. :P
>>34513 >>34515
>If I find myself with lots of spare time soon (lol), I may try to spell my theory out in excruciating detail, then if I have even more time, I'll try to simplify that.
It just occurred to me I can find a use-case at the level I'm currently working at (ie, driving the low-level control systems for the robowaifu body). Namely: wrapping the actuation control nodes with Super PIDs, then driving the pose estimation/control using this more-simplified, higher-level control interface (all linked together into a mesh to keep unwanted dynamics from going 'overboard' during the process of solving the full joint-chains for kinematic goals [during each time quanta]) (+ this arrangement is also well-suited to NN training regimes).
>tl;dr
Think of it kind of like 'morph targets' for 3D CGI blendshapes, but instead driving realworld, complex mechanics/dynamics for a robowaifu using simple sliders that always keep the system as a whole within proper control limits (regardless of internal/external inputs). So who knows, Grommet? I may make the time soon-ish to flesh my theory out in a practical way for you in '25. Cheers. :^)
---
>addendum (1):
I just discovered that once this works, it should bring a very valuable additional benefit to the table for us: namely, that joint frame-local torques (aka unavoidable dynamics, individually [1][2]) -- which normally negatively affect the entire rest of the robotic skellington (via transmission down the joint-adjacent bones) -- can be damped out by reading the entire matrix of Super PID inputs at each node, with each node tweaking its output values accordingly (all doing the same, in concert, per tick).
>tl;dr
We should be able to eliminate (or at least greatly reduce) jerky/janky robo movements by using this approach (very good, b/c cheap actuators should work better within such a system). [3] Sweet!
>addendum (2):
I also just discovered that through the magic of 'semantic polymorphism'(tm)(C)(R)(patent pending)(do not steal), I've been able to derail this thread by discussing the exact same topic (ie, wrapping 'nodes' in MP/MC PIDs) in two different contexts (Neural Cognition vs. Physical Kinematics). Lol. Therefore, any further discussions along this line should be done in either the Skellingtons or Actuators threads. :^)
---
1. https://en.wikipedia.org/wiki/Newton's_laws_of_motion#Third_law
2. https://www.sciencefacts.net/newtons-third-law.html
3. I suspect that -- in large measure -- this effect will likely mimic the human sensorimotor network phenomenon for our robowaifus.
>=== -add 'kinematic', 'morph targets' cmnts -minor edit -add addenda, footnotes
Edited last time by Chobitsu on 12/04/2024 (Wed) 03:58:29.
>>34550
>joint frame-local torques (aka unavoidable dynamics, individually)
Bambu Lab does this to vastly improve the printing ability of their 3D printers. If I remember correctly, they actually shake the printer head around and fine-tune each printer. There's got to be a library or paper explaining this (assuming I could decipher it... maybe not).
Klipper firmware for 3D printers does this, so the relevant functions are somewhere in their code:
"...High precision stepper movement. Klipper utilizes an application processor (such as a low-cost Raspberry Pi) when calculating printer movements. The application processor determines when to step each stepper motor, it compresses those events, transmits them to the micro-controller, and then the micro-controller executes each event at the requested time. Each stepper event is scheduled with a precision of 25 micro-seconds or better. The software does not use kinematic estimations (such as the Bresenham algorithm) - instead it calculates precise step times based on the physics of acceleration and the physics of the machine kinematics. More precise stepper movement provides quieter and more stable printer operation...."
https://www.klipper3d.org/Features.html
> further discussions along this line should be done in either the Skellingtons or Actuators threads. oops sorry.
>>34507
Minor Horsona update & recap: The causal inference work looks like it's turning out to be a massive success. Right now, it can:
- [New] Automatically generate causal graphs from a text snippet. (A bare-bones sketch of this step is at the end of this post.)
- ... Though it's not automated, I can also "extract" knowledge on a topic from an LLM into a causal graph. This would make it possible to, e.g., make a small number of calls to a large/expensive LLM like Sonnet 3.5 or GPT o1 as a one-time cost, then use only small/cheap LLMs for all subsequent reasoning involving that causal graph.
- [New] Given a causal graph and a text snippet, extract data for each variable in the graph (which gets stored in a database for subsequent analysis whenever it's relevant).
- [Recap] Use an LLM to do causal analysis on natural language data.
- [Updated] Identify the most representative datapoints for checking any particular causal effect. It does this by (1) identifying potentially useful datapoints based on what variables are known for each of them, (2) using k-means to cluster the datapoints based on embedding, and (3) selecting the centers of each cluster as the representative points. This is necessary since it could easily get too slow & expensive to check all datapoints.
- Identify causal effects given a causal graph and some partial data. (E.g., if I change or set X to a particular value, what will the effect be on Y?)
- [Updated] Propagate effects across multiple related causal graphs (causal fusion). (E.g., if there's one graph for emotional state and one graph for conversational responses, it can check how a change to some variable X in the emotional state graph affects variable Y in conversational responses.) This can (1) handle recursive dependencies, (2) maximize parallel execution for cases where there are many graphs to check, and (3) iteratively refine the results, so it's possible to get both fast results and increasingly-accurate results. This is done by turning causal analysis constraints into a set of computation graphs of increasing depth. In some cases, the depth could be infinite, which is why the iterative approach is required. I had a dumb algorithm for doing this before, but I think I'm doing it "properly" now.
I've sanity checked everything individually, and it seems to be working reasonably well. Next pieces I need:
- Finding a way to link up similar variables across graphs. Right now, they can only be linked up by name. That causes problems in three cases: (1) variables have the same name but different semantics, (2) variables have different names but identical semantics, and (3) variables are closely related but there's no way to link them up. Once I've solved this, I should be good to go on large-scale graphs.
- Finding a way to identify which variables (out of a large set) are relevant to extract when given some text snippet.
- CRUD operations for managing a database of causal graphs and their associated datapoints.
I have ideas for how to do all of these, but it's going to be tedious. I'm hoping to get it done by the end of the year, but that might be too optimistic.
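A bare-bones sketch of the graph-extraction and datapoint-extraction steps (hypothetical prompts and formats; the real versions live in the horsona repo):
```python
import json

def extract_causal_graph(llm, snippet: str) -> list[tuple[str, str]]:
    # Ask the LLM for cause->effect edges in a machine-readable format.
    reply = llm(
        "Read the text and list cause->effect pairs as JSON, e.g.\n"
        '[["rain", "wet streets"], ["wet streets", "accidents"]]\n\n'
        f"Text:\n{snippet}"
    )
    return [tuple(edge) for edge in json.loads(reply)]

def extract_datapoint(llm, variables: list[str], snippet: str) -> dict:
    # Given a known graph, pull a value for each variable out of new text.
    reply = llm(
        f"For each of {variables}, give its value in this text as a JSON "
        f"object (null if absent):\n{snippet}"
    )
    return json.loads(reply)
```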
>>34910
POTD Brilliant.
>and (3) selecting the centers of each cluster as the representative points.
I just wonder... do you think you could save a smol container of near-neighbor 'keypoint' vectors across the multidimensional space of the cluster, to store alongside this cluster's central point? (Since you're already calculating the inverse in effect anyway.) Seems like if you ever needed to 'backtrack' later down another, closely-related branch, then this pre-calculated collection of 'breadcrumb' vectors should make that redirection a hop-and-a-skip?
Regardless, very exciting news CyberPonk! I hope you rapidly move through all the remainder of the checklist! Keep the goal in mind as you plod through the tedium -- at this stage it's nearly all gravy AFAICT. :^) Keep moving forward.
>=== -prose edit
Edited last time by Chobitsu on 12/16/2024 (Mon) 02:48:44.
>>34911
That's a good idea. I'm planning to cache intermediate results for causal inference, and caching datapoints would speed things up as well. I'll probably end up storing the cached representative points in a side database, since the clusters actually change depending on some parameters of what inference is being done. It would be a lot easier to associate that cache with the parameters than with the center points.
Once I get CRUDable causal inference, it'll all be gravy. It's still core development work right now. If I can actually get this whole thing in a working state by the end of the year, I'll be ecstatic. That would not only let me start work on the most difficult target horsona features, but it would give me a solid basis for creating characters that learn and evolve in more realistic ways, that remember things more reliably, and that can embody behaviors rather than just tendencies. There would still be a lot of downstream work, but I'd have a really solid chunk of the epistemology side completed. I do plan to spend some time after this first seeing if I can get it to work with the horsona codebase itself, which will probably take several months at least.
It would be a lot easier to work on this if I could work on it alongside my waifu ^:).
>>34912 >It's still core development work right now. >There would still be a lot of downstream work, but I'd have a really solid chunk of the epistemology side completed. Yeah that's true, I'm sure. Heh, I have a tendency to get excited about current progress, and underestimate the difficulties of the remaining journey with the wave of a hand. :^) Still, every little helps along the way!! > It would be a lot easier to work on this if I could work on it alongside my waifu ^:). The Dream is coming alive!
>>34913 >The Dream is coming alive! I like the double meaning
Open file (11.70 KB 299x168 download (23).jpeg)
Open file (72.72 KB 960x540 cowgirl_00.png)
Here's some resources regarding tensorflow: https://www.tensorflow.org/resources/learn-ml#courses
Tensorflow allows you to make custom models, and even that may or may not be overkill. I, for example, plan to use NudeNet to recognize the naked male body. The team that made NudeNet used either tensorflow or pytorch. So did the team that made Mobile ALOHA. Python is not glamorous, but when combined with the math and all the other stuff that has to be learned in order to use it, the skill required is quite high. That link recommends that not one but multiple books be read.
I'm getting older and I don't want to learn any more math at all. I'll do my best with sensors and computer vision, but I'm narrowing the scope to what I promised: 5 sex acts. I'm not promising access to the code as well. I'd only share that if there was collaboration.
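If it helps anyone, basic NudeNet usage looks roughly like this (from memory, so check the project's README for the current API):
```python
from nudenet import NudeDetector

detector = NudeDetector()                  # loads the bundled detection model
detections = detector.detect("frame.jpg")  # one image path at a time
for d in detections:                       # each hit: class label, score, box
    print(d["class"], d["score"], d["box"])
```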
