/robowaifu/ - DIY Robot Wives

Advancing robotics to a point where anime catgrill meidos in tiny miniskirts are a reality.

AI Design principles and philosophy Robowaifu Technician 09/09/2019 (Mon) 06:44:15 No.27
My understanding of AI is somewhat limited, but personally I find the software end of things far more interesting than the hardware side. To me a robot that cannot realistically react or hold a conversation is little better than a realdoll or a dakimakura.

As such, this is a thread for understanding the basics of creating an AI that can communicate and react like a human. Some examples I can think of are:

ELIZA was one of the first chatbots, and was programmed to respond to specific cues with specific responses. For example, she would respond to "Hello" with "How are you". Although this is one of the most basic and intuitive ways to program a chat AI, it is limited in that every possible cue must have a response pre-programmed in. Besides being time-consuming, this makes the AI inflexible and unadaptive.

The invention of Cleverbot began with the novel idea to create a chatbot using the responses of human users. Cleverbot is able to learn cues and responses from the people who use it. While this makes Cleverbot a bit more intelligent than ELIZA, Cleverbot still has very stilted responses and is not able to hold a sensible conversation.

Taybot is the best chatbot I have ever seen and shows a remarkable degree of intelligence, being able to both learn from her users and respond in a meaningful manner. Taybot may even be able to understand the underlying principles of language and sentence construction, rather than simply responding to phrases in a rote fashion. Unfortunately, I am not sure how exactly Taybot was programmed or what principles she uses, and it was surely very time-intensive.

Which of these AI formats is most appealing? Which is most realistic for us to develop? Are there any other types you can think of? Please share these and any other AI discussion in this thread!
>>27 I'm not very in the know on the technicalities of the tech needed for robowaifus, but what do you think of OpenAI's GPT-3? I heard about it after the whole AI Dungeon fiasco.
>>11205 GPT-3 is probably the best chatbot AI going so far, but it's far from perfect. But obviously it's already being used far and wide by the corporate-controlled media to generate copy for their, shall we say, less-than-fully-competent 'writers', and to support the fake-news industry for the globalists. So it's already good enough at this stage to generate millions in income for the OpenAI organization that controls the work of the men who created the thing. Add into those coffers income from thousands of other groups like the one that ran AiD, and these OpenAI exploiters should be raking it in for at least a decade with this tech. Here's a thread that has more info on it Anon (>>250). Thankfully, there are some groups with work afoot to try and devise actually open GPT-3 alternatives, though even if they do a good job with it, it's still likely to require massive hardware resources. We have a thread about how to spread that work out here (>>8958). Hope that helped Anon.
Open file (148.15 KB 1280x720 master waifu.jpeg)
*casually making the first AGI waifu while the world is asleep* nothing personnel https://www.youtube.com/playlist?list=PLAJnaovHtaFTK9E1xHnBWZeKtAOhonqH5
>>11201 >first AI finds glitches to exploit in games >then finds glitches in reality The article is mostly hype. AI like genetic programming is really good at finding formulas for a mess of complex data. It doesn't find those formulas through any sort of thought or reasoning but through repetitive exhaustive search.
>>11475 >doesn't find those formulas through any sort of thought or reasoning but through repetitive exhaustive search Doesn't matter, because then it has the formula to deal with something, which is what we need. That's a pretty low level; we don't do reasoning on that level either.
Open file (43.47 KB 512x512 27761066_p2.jpg)
Schmidhuber's lab is at it again with Going Beyond Linear Transformers with Recurrent Fast Weight Programmers: https://arxiv.org/abs/2106.06295 They took Linear Transformers, which are super fast with O(n) time complexity in sequence length compared to the O(n^2) of regular Transformers, and experimented with adding recurrence to them in different ways, essentially letting previous steps program the weights of the network and giving it a malleable memory. Before this paper Linear Transformers were fast but didn't perform anywhere near as well, but with recurrent fast weight programming and the error-correcting delta rule they outperform regular Transformers when using the full context length. On truncated context lengths of 256 tokens it also still performs competitively. We could use this for chat AI that runs quickly on the CPU. This model isn't only better at language modelling but also outperforms LSTMs in playing some games, which transformers completely failed at before. This is a much more general-purpose AI architecture that could make significant advances with learning from multimodal data. When I have some time I'm going to try implementing it from scratch and training a small model to share with the guys at EleutherAI to see what they think. They released all of their code as well: https://github.com/IDSIA/recurrent-fwp
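For anyone curious what "fast weights plus the delta rule" means mechanically, here is a minimal NumPy sketch of the delta-rule fast weight update (the non-recurrent variant from the same line of work). It is not code from the IDSIA repo: the ELU+1 feature map, the sum normalisation and the per-step learning rates `betas` are assumptions, and the recurrent part of the paper (feeding previous outputs back into the key/value/query projections) is left out. The point is that the memory is a fixed-size matrix, so each step costs the same no matter how long the sequence gets.

```python
import numpy as np

def phi(x):
    # ELU + 1 feature map: keeps features positive (one common choice for linear attention)
    return np.where(x > 0, x + 1.0, np.exp(x))

def delta_rule_fwp(queries, keys, values, betas):
    """Process a sequence in O(n) time using a d_v x d_k fast weight matrix.

    queries, keys: (n, d_k); values: (n, d_v); betas: (n,) write strengths in [0, 1].
    """
    n, d_k = keys.shape
    d_v = values.shape[1]
    W = np.zeros((d_v, d_k))  # the "fast weights", i.e. the malleable memory
    outputs = np.empty((n, d_v))
    for t in range(n):
        k = phi(keys[t])
        k = k / k.sum()  # sum-normalised key
        q = phi(queries[t])
        q = q / q.sum()  # sum-normalised query
        v_old = W @ k  # what the memory currently returns for this key
        # Delta rule: move the stored value toward the new one instead of just adding to it
        W = W + betas[t] * np.outer(values[t] - v_old, k)
        outputs[t] = W @ q  # read out with the query
    return outputs
```

With a near one-hot key and beta = 1, writing a second value under the same key overwrites the first instead of blending with it, which is the "error-correcting" part of the delta rule.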
>>11716 This sounds particularly intriguing Anon. Good luck with your explorations, and thanks for letting us know here!
Found a really interesting study that combines existing language models with vision encoders to create multimodal language models that can generate responses to queries with images and text. All that is required to train is the vision encoder. The weights of the language model are frozen during training. Video summary: https://www.youtube.com/watch?v=ezrl1Yo_EkM Paper: https://arxiv.org/abs/2106.13884 This could be useful for creating waifu AI that can respond to pictures, video, audio and memes. Also I like this idea of being able to use existing models together. Pretty soon we'll have waifus that can shitpost with us. What a time to be alive!
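A rough NumPy sketch of the data flow, assuming the Frozen-style setup described in the video: the image is encoded, projected into a couple of "visual token" embeddings, and those are prepended to the text embeddings of a language model whose weights never change. The dimensions, the `W_adapter` projection and the prefix length are hypothetical placeholders, and in the paper the vision encoder itself is what gets trained; only the shapes and the flow are meant to be illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

D_VISION = 512  # hypothetical size of the vision encoder's pooled output
D_MODEL = 768   # hypothetical embedding width of the frozen language model
N_PREFIX = 2    # number of "visual tokens" prepended to the text

# Trainable projection on the vision side; the language model's weights stay frozen.
W_adapter = rng.normal(scale=0.02, size=(D_VISION, N_PREFIX * D_MODEL))

def visual_prefix(image_features):
    """Map pooled image features to N_PREFIX embeddings in the LM's input space."""
    return (image_features @ W_adapter).reshape(N_PREFIX, D_MODEL)

def build_inputs(image_features, text_embeddings):
    """Prepend the visual prefix to the (frozen) text token embeddings."""
    return np.concatenate([visual_prefix(image_features), text_embeddings], axis=0)
```

During training, the LM loss on the caption or answer is backpropagated through the frozen LM into the vision side only.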
>>11731 >Pretty soon we'll have waifus that can shitpost with us. What a time to be alive! The dream is alive! >"Required a few seeds to get a good answer which clearly paid attention to the image." (2nd image) My instinct is that this will be important for low-end hardware solutions for us here.
>>11731 Nice find anon. This is an aspect that is usually ignored in a lot of chatbot research, but even if its intelligence is shit, having an AI that can semi-reliably have a discussion about the images that you feed it would make it a lot more engaging than text-only (and it would allow some very funny conversations, I'm sure)
>>11735 Not him, but agreed. One of the nice things about Tay.ai was that she had pretty functional image recognition working (at least for facial landmarks), and could effectively shitpost together with you about them.
>>11734 I think they were referring to taking a few samples and selecting the best, aka cherry picking. But SqueezeNet for image recognition is super fast and can run on the CPU. I should be able to rig it up with GPT-Neo-125M. It'll be amazing to port this to Chainer and have a working Windows binary that's under 600MB. It doesn't seem like they released their dataset but any visual question answering dataset should work. We could also create our own dataset for anime images and imageboard memes. It'll be interesting to see whether, once the vision encoder is well-trained, it's possible to unfreeze the language model and finetune it for better results.
>>11731 Had some thoughts on this today. Instead of a single picture, multiple pictures could be fed in from a video, such as from an anime, and have it generate comments on it. Which got me thinking, if it can have this rudimentary thought process going on, couldn't it be used in something like MERLIN? https://arxiv.org/abs/1803.10760 It took natural language as input describing the goal it has to achieve. With a system like this though it might be able to break down tasks into smaller goals and direct itself as it makes progress. Some instruction saying it needs to get to the top of a platform or go through a certain door it hasn't seen before is immensely more useful than telling it to find the purple MacGuffin and getting lost in a labyrinth of rooms.
Open file (989.22 KB 1439x2724 1627430232061.jpg)
This is the kind of chatbots people are paying good money for and a good example of why you should never use DialoGPT because it has no context of who is speaking to who.
I think that these guys at XNOR.ai have really got a paradigm shift in AI. I think the idea is that instead of long, lengthy matrix multiplications they just use CNOR logic. The end result is that they get recognition of animals, people, bikes, cars, etc. with only cell phone and Raspberry Pi-level computers. They used to have some really good real-time object recognition videos but deleted a bunch of them when they were snapped up by Apple. Sigh. However, I just found out that the ideas they came up with started at a non-profit, and papers and, I believe, some code can be found by rummaging around its site. So here's a link on XNOR.AI and then one from the non-profit. https://techcrunch.com/2017/01/19/xnor-ai-frees-ai-from-the-prison-of-the-supercomputer/ https://allenai.org/ At the latter they have some videos with names like "OpenBot: Turning Smartphones into Robots | Embodied AI Lecture Series". Hmmm...sounds interesting. One thing I've thought about off and on for a while is that small insects can do a bunch of simple things with next to no brain at all. A standard micro-controller that runs your refrigerator could probably run rings around an ant brain power-wise, but no one has come up with the right algorithm yet to use this computing power efficiently. Maybe this is the way. For a decently functioning robowaifu we don't need superpowers, maybe more like mouse powers, and I suspect that with the right software we could get that right now with a handful of commercially available top-of-the-line processors. If it takes a handful today then two years from now it may only take one.
Oops CNOR logic actually XNOR
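To illustrate why binarizing helps so much: if weights and activations are restricted to +1/-1 and packed as bits, a dot product reduces to an XNOR followed by a popcount, which is cheap on any CPU. This is a minimal pure-Python sketch of that identity (dot = 2 * matches - n), not XNOR.ai's or AI2's code; the bit-packing convention (a set bit means +1) is an assumption.

```python
def pack(signs):
    """Pack a sequence of +1/-1 values into an int bitmask (bit i set means +1)."""
    bits = 0
    for i, s in enumerate(signs):
        if s > 0:
            bits |= 1 << i
    return bits

def xnor_dot(a_bits, b_bits, n):
    """Dot product of two packed +1/-1 vectors of length n via XNOR and popcount."""
    mask = (1 << n) - 1
    agree = ~(a_bits ^ b_bits) & mask  # XNOR: 1 wherever the signs match
    matches = bin(agree).count("1")    # popcount
    return 2 * matches - n             # each match adds +1, each mismatch adds -1
```

On real hardware the same trick runs 64 weights at a time with one XNOR and one popcount instruction, which is where the large speedups come from.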
>>11857 because it's not true AI, it's chat AI. Like lobotomizing a person but leaving their ability to be chatty intact
>>13396 >“We decided to binarize the hell out of it,” he said. By simplifying the mathematical operations to rough equivalents in binary operations, they could increase the speed and efficiency with which AI models can be run by several orders of magnitude. excellent! This is somewhat analogous to what an anon on /pol/ brought up. An important concept: neural-simulating circuits, carrying out these complex interactions at the nanoscale, on the atoms themselves, rather than the vastly inefficient method of simulating them in software running on standard logic gates (like emulating a "computer" in Minecraft on top of a running computer versus just running a computer on hardware; cool video if you haven't seen it, they create and/or gates out of special red blocks and torches or something if I'm not mistaken) https://www.youtube.com/watch?v=nfIRIInU2Vg
>>13401 still not sure why when I upload a small graphic with white background it does this
>>13401 sorry if that's hard to read, my 6th dose of espresso just hit me and im making word salad. I don't edit my posts much here b/c I assume u are all smart enough to decode it. Edit feature would be nice but that's not how IBs work :' [
>>11736 >>Tay.ai Extraordinary what Tay came up with within a couple weeks
>>13403 > im making word salad Don't feel bad I gronkked my comment quite a bit. Sometimes when I'm tired, and even when not, I just miss all this retarded typing I do. If I didn't have spell check between my fumble fingers and my horrid spelling my comments would look more like hieroglyphics than writing.
I was thinking about this TED Talk video and trying to think how it could be used to help program an AI waifu: https://www.youtube.com/watch?v=7s0CpRfyYp8 A short summary of it is that the brain exists to coordinate movement by using sensory information and memory to predict, via Bayesian inference, what movements to make to ensure our needs are met, and that everything else our brains do is largely just a byproduct of this. As I've said before in other threads, the only real need a robowaifu has is to ensure her owner's happiness, so good AI mostly seems to be a matter of creating the best predictive analytics model for the job, but I'm mostly interested in how prediction can be used for coordinating body movement, since that seems to be the biggest hurdle when creating a gynoid.
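For a concrete picture of "predicting with Bayesian inference" in movement control, the textbook small case is a one-dimensional Kalman filter: predict where a limb should be from the motor command, then correct that guess with a noisy sensor reading, weighting each by its uncertainty. This is a minimal sketch assuming everything is Gaussian; the function names and noise values are hypothetical, not taken from the talk.

```python
def predict(mean, var, velocity, dt, process_var):
    """Prior: where the limb should be after moving at `velocity` for `dt` seconds."""
    return mean + velocity * dt, var + process_var  # uncertainty grows while moving

def update(prior_mean, prior_var, measurement, sensor_var):
    """Posterior: Bayes' rule for a Gaussian prior and a Gaussian sensor reading."""
    gain = prior_var / (prior_var + sensor_var)  # how much to trust the sensor
    mean = prior_mean + gain * (measurement - prior_mean)
    var = (1.0 - gain) * prior_var
    return mean, var
```

The posterior variance is always smaller than the prior's, so each sensor reading makes her more certain, and the gain decides how much she trusts the sensor over her own prediction.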
>>13810 Thanks, Bayesian inference seems to be an important topic. Maybe more long- than short-term, though. The AI researchers are already on it. I recall it being mentioned here for example: https://youtu.be/pEBI0vF45ic > Judea_Pearl_-_Causal_Reasoning_Counterfactuals_and_the_Path_to_AGI_Lex_Fridman_Podcast_56
>>13816 Not really, if you actually watch the video, it makes sense if you think about it rationally, every part of the brain exists to either remember information needed to make predictions, process sensory information, &/or coordinate movement. The only parts that aren't really involved in any of those are basically glands for regulating hormones. From a purely materialist perspective, it all checks out. The sea squirt analogy really hits it home: they swim around like tadpoles until they're mature, then anchor to surfaces like barnacles and start to digest their own brain because they don't need it anymore. Plants, fungi, etc. don't have brains because they don't move. The only thing that gets close is the jellyfish, which have some nerves, but not enough anywhere to be considered a brain. Jellyfish barely either, and some technically have photoreceptor-like eyes, but they're overall barely more than a living piece of skin. >>13817 Neat. I'll have to watch that video later.
>>13819 >Jellyfish can barely move either.
>>13817 Huh, seems like all you would need to do is make nested updatable variables to approximate this kind of intelligence. For example, she could want to walk at speed x along vector y. By checking her assumed speed against her actual speed, she could make adjustments. Like, going 1 m/s requires a higher voltage when she senses she's on carpet compared to when she's on tile flooring.
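The nested-variable idea maps fairly directly onto a small adaptive feedback loop. Below is a minimal sketch, assuming a toy motor where speed is proportional to voltage with a per-surface constant; the class, the surface names and the learning rate are all hypothetical. She keeps one volts-per-(m/s) gain per floor type and nudges it by the relative speed error after each step.

```python
class SurfaceAwareDrive:
    """Learns a volts-per-(m/s) gain for each floor type from speed feedback."""

    def __init__(self, initial_gain=2.0, learning_rate=0.5):
        self.gains = {}  # surface name -> volts per (m/s)
        self.initial_gain = initial_gain
        self.learning_rate = learning_rate

    def command(self, surface, target_speed):
        """Voltage to send for the desired speed on this surface."""
        gain = self.gains.setdefault(surface, self.initial_gain)
        return gain * target_speed

    def feedback(self, surface, target_speed, measured_speed):
        """Nudge the stored gain by the relative speed error."""
        if measured_speed <= 0:
            return  # stalled or no reading: don't update on garbage
        gain = self.gains.setdefault(surface, self.initial_gain)
        error = (target_speed - measured_speed) / measured_speed
        self.gains[surface] = gain * (1.0 + self.learning_rate * error)
```

Under the toy model this converges to the true per-surface constant; a real drivetrain would also need integral action and voltage limits, which the sketch ignores.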
Dropping an interesting paper from last November on improving transformers for conversation: >The conversational setting is challenging because these models are required to perform multiple duties all in one shot: >to perform reasoning over the returned documents and dialogue history, >find the relevant knowledge, >and then finally combine this into a conversational form pertinent to the dialogue. >Perhaps due to this complexity, it has been observed that failure cases include incorporating parts of multiple documents into one factually incorrect response, or failure to include knowledge at all and reverting instead to a generic response using the dialogue context only. >In this work, we instead propose to decompose this difficult problem into two easier steps. Specifically, by first generating pertinent intermediate knowledge explicitly and then, conditioned on this prediction, generating the dialogue response. We call this model Knowledge to Response (K2R). https://arxiv.org/pdf/2111.05204.pdf It works sort of like a lorebook in NovelAI, where detected keywords or phrases inject information into the context to improve response generation, except here the lorebook is generated by another language model. Improvements were reported in consistency, breadth of knowledge and factualness, but no improvement was seen in how engaging responses were. These knowledge models are easy to implement with an encoder-decoder (seq2seq) transformer like T5.
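A toy sketch of the two-step K2R flow, with a NovelAI-style keyword lorebook standing in for the knowledge model so it runs without any weights. In the paper both steps are trained seq2seq models; here `knowledge_model` and `response_model` are just callables, and the "Knowledge:" prompt layout is a hypothetical format, not the paper's.

```python
from typing import Callable, Dict

def lorebook_knowledge(dialogue: str, lorebook: Dict[str, str]) -> str:
    """Keyword-triggered lookup, roughly how a lorebook injects context."""
    text = dialogue.lower()
    hits = [entry for key, entry in lorebook.items() if key.lower() in text]
    return " ".join(hits)

def knowledge_to_response(dialogue: str,
                          knowledge_model: Callable[[str], str],
                          response_model: Callable[[str], str]) -> str:
    """Two-step generation: first predict knowledge, then respond conditioned on it."""
    knowledge = knowledge_model(dialogue)
    # Hypothetical prompt layout; a real system would use whatever format it was trained on.
    prompt = f"{dialogue}\nKnowledge: {knowledge}\nResponse:"
    return response_model(prompt)
```

Swapping the lorebook for a generated-knowledge model is exactly the change K2R makes; the response step does not care where the knowledge line came from.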
>>15317 (continued) What's really missing for robowaifu AI though is the lack of memory I/O so it's possible to learn from daily interaction. Separating knowledge from language processing is at least a step towards this. Instead of generating knowledge from weights learned through backpropagation on another model, it could be summarized from stored memories located by masked content-based addressing. https://arxiv.org/pdf/1904.10278.pdf For example, in saving a memory like "ELIZA was one of the first chatbots", an important part is 'ELIZA was', which would be masked out in the content address, so when something similar to 'one of the first chatbots' pops up in conversation, this content memory address is accessed and ELIZA is remembered. The reverse could also be stored, so that when ELIZA pops up in conversation it's remembered she was one of the first chatbots. This should be doable with an autoencoding transformer that summarizes the input into key-value pairs to be either stored or queried. But there should be a much better approach to creating an associative memory. The data stored should really be the relations between two items, creating a knowledge graph. For example, the relation between 'ELIZA' and 'one of the first chatbots' is 'was'. The transformer needs to be able to add, modify and access these relations. How to get the relations containing an item or similar ones is beyond me right now. Perhaps by constructing a sparse neural network and sending out a pulse from relevant nodes in the graph? Then taking the top-k or top-p edges in the graph and returning those statements to the context. Maybe someone who understands graph neural networks better could suggest something here. The main issue is that this graph search has to be fully differentiable for backpropagation, although a non-differentiable approach might work here, such as using reinforcement learning with proximal policy optimization, which I'm already working on implementing for InstructGPT.
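Not a graph neural network and not differentiable, but here is a small sketch of the "send out a pulse from relevant nodes and take the top-k edges" idea, storing facts as (subject, relation, object) triples indexed from both ends so either side of a fact can trigger recall. The class name, the decay factor and the max-activation scoring rule are assumptions for illustration only.

```python
from collections import defaultdict

class TripleMemory:
    """Stores (subject, relation, object) facts and recalls them by spreading activation."""

    def __init__(self):
        self.edges = defaultdict(list)  # node -> [(neighbour, triple)]

    def add(self, subject, relation, obj):
        triple = (subject, relation, obj)
        # Index both directions so either end of the fact can trigger recall.
        self.edges[subject].append((obj, triple))
        self.edges[obj].append((subject, triple))

    def recall(self, seeds, hops=2, decay=0.5, top_k=3):
        """Pulse outward from the seed nodes and return the most activated facts."""
        frontier = {node: 1.0 for node in seeds if node in self.edges}
        triple_scores = defaultdict(float)
        for _ in range(hops):
            next_frontier = defaultdict(float)
            for node, energy in frontier.items():
                for neighbour, triple in self.edges[node]:
                    # A fact is as relevant as the strongest pulse that reached it.
                    triple_scores[triple] = max(triple_scores[triple], energy)
                    next_frontier[neighbour] += energy * decay  # pulse weakens each hop
            frontier = next_frontier
        ranked = sorted(triple_scores.items(), key=lambda kv: -kv[1])
        return [" ".join(t) for t, _ in ranked[:top_k]]
```

A differentiable version would need to replace the hard top-k with something soft, which is the open problem the post points at.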
>>15317 >It works sort of like a lorebook in NovelAI Never used it, but your description sounds intriguing. >>15318 Your graph looks quite a bit like the kind of work we're conceptualizing towards using Object Role Modeling (T. Halpin). While I recognize that statistical processing is quite important for our goals, we simply cannot rely on it alone if we are to succeed at our AI. The hardware/training costs for that approach are simply untenable for our model. I'm also somewhat skeptical it's the singular best approach to the problem space as well. >What's really missing for robowaifu AI though is the lack of memory I/O so it's possible to learn from daily interaction. Totally makes sense. We very obviously keep a Theory-of-Mind model going for both ourselves and others too. Probably an important aspect of holistic mental models, too. >The data stored should really be the relations between two items, creating a knowledge graph. Yep. Such 'incidental' data structures are rife in the natural world, if I can stretch the metaphor. The sub-atomic quantum-mechanical field interactions are in fact fundamental to practically everything else in physics. Yet they are 'incidental' artifacts from our human-oriented purview, generally speaking. Yet clearly, from a theistic POV, things were intentionally designed thus. Similarly, we need to think at least one level higher up and work towards efficient AI algorithms that exploit such incidental -- if ephemeral -- 'data' structures.
Open file (57.25 KB 900x613 memcontroller v2 wip.png)
Been trying to come up with a memory controller that only needs to be trained once, can leverage existing models, and can support quick storage and retrieval up to 1 TB of data. It's a lot to explain but the basic idea is it summarizes the preceding text, pools the summary into a vector and then stores the summary and vector into a hash table bucket in the memory database. For retrieval it generates a query from the truncated context, pools it into a vector, looks up nearby memories in the memory database using the hash, and then finds the k nearest neighbours by taking the cosine similarity of vectors in the bucket. If no memories are found in a bucket the hash works like a tree so it will traverse up the tree until it collects enough memories to generate a summary. To make the memory controller trainable through generative pre-training without needing any new datasets, a hash alignment loss is used to ensure new memories and relevant queries point to similar buckets in the memory database. Two memory advantage rewards are optimized with PPO to train the summarization model to ensure both the hidden context summary and summarized memories improve the predictions of the language model (which can remain frozen during training so the memory controller can be solely trained on low-end hardware). Another idea I have for this is that the query generator could also be used to introspect memories and the output from the language model. If the model finds a contradiction somewhere, it should be possible to resolve it then update its own model or at the very least correct memories in the database. Being able to discern the correctness of statements could pave the way towards generating novel ideas grounded in truth not seen anywhere in training data or memory.
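A minimal sketch of the retrieval half of this design, with one swap that should be stated plainly: random-hyperplane LSH stands in for the learned hash with the alignment loss, and summaries are plain strings rather than model outputs. Codes are bit strings, every prefix of a code is a node in the tree, and a query walks up from its full-length bucket until there are at least `k` candidates, then ranks them by cosine similarity. All names and sizes are hypothetical.

```python
from collections import defaultdict

import numpy as np

class HashTreeMemory:
    def __init__(self, dim, n_bits=8, seed=0):
        rng = np.random.default_rng(seed)
        # Random hyperplanes stand in for the learned hash described in the post.
        self.planes = rng.normal(size=(n_bits, dim))
        self.buckets = defaultdict(list)  # code prefix -> [(unit vector, summary)]

    def code(self, vec):
        """Binary hash: one bit per hyperplane, set if the vector lies on its positive side."""
        return "".join("1" if x > 0 else "0" for x in self.planes @ vec)

    def store(self, vec, summary):
        vec = np.asarray(vec, dtype=float)
        code = self.code(vec)
        entry = (vec / np.linalg.norm(vec), summary)
        # Register the memory at every level of the prefix tree ("" is the root).
        for depth in range(len(code) + 1):
            self.buckets[code[:depth]].append(entry)

    def query(self, vec, k=2):
        vec = np.asarray(vec, dtype=float)
        q = vec / np.linalg.norm(vec)
        code = self.code(vec)
        bucket = []
        # Walk up the tree until the bucket holds at least k memories.
        for depth in range(len(code), -1, -1):
            bucket = self.buckets.get(code[:depth], [])
            if len(bucket) >= k:
                break
        ranked = sorted(bucket, key=lambda e: -float(e[0] @ q))  # cosine similarity, best first
        return [summary for _, summary in ranked[:k]]
```

Storing each entry under every prefix trades memory for simplicity; a database-backed version would instead keep one row per memory and do a range scan on the code prefix.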
>>16110 That sounds very complicated. Do you know how to do something like that?
>>16116 It's a bit complicated but I've implemented most of the pieces before in other projects.
>>16110 Brilliant chart work. As usual, I hesitate to even comment; I'm quite out of my depth (and often don't even understand the lingo tbh). However, your graph is truly worth 1,000 words in this case, and helps me well along the path to understanding your points. As primarily a C++ dev, I naturally tend to conceptualize every problem as a nail to fit that hammer. That being said, there's a standard library algorithm std::set_intersection that I used in Waifusearch that, along with the rest of the general project algorithms, afforded a pretty efficient way to rapidly narrow down potential search items. https://en.cppreference.com/w/cpp/algorithm/set_intersection So, my question would be "Could something like that be used in a system to find 'k nearest neighbours'?" I don't know myself, and I'm just stumbling in the dark here. But I want to not only understand your goals, but even to help fashion them in reality with you Anon.
>>16148 I plan on using a SQL database to store data with each memory and take advantage of indexes to quickly do the approximate nearest neighbour search. SQL does its own set intersection when you query something like where a=1 and b=2, and with an index on those columns it knows exactly where to find a few KB of data in O(log m + log n) time by using B-trees, instead of checking every single item in O(m+n) time, which could potentially be a few million after a year of accumulating memories.
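To make the index point concrete, here is a small sketch using Python's built-in `sqlite3`. The table and column names are hypothetical; `bucket` could be something like the hash code from the memory controller above. `EXPLAIN QUERY PLAN` lets you confirm SQLite seeks through the composite B-tree index instead of scanning every row.

```python
import sqlite3

def build_memory_db(n_buckets=100, n_days=30):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE memories (bucket INTEGER, day INTEGER, summary TEXT)")
    # Composite B-tree index: a lookup on (bucket, day) is a logarithmic seek, not a scan.
    conn.execute("CREATE INDEX idx_bucket_day ON memories (bucket, day)")
    conn.executemany(
        "INSERT INTO memories VALUES (?, ?, ?)",
        [(b, d, f"memory {b}-{d}") for b in range(n_buckets) for d in range(n_days)],
    )
    return conn

conn = build_memory_db()
query = "SELECT summary FROM memories WHERE bucket = ? AND day = ?"
rows = conn.execute(query, (42, 7)).fetchall()
plan = conn.execute("EXPLAIN QUERY PLAN " + query, (42, 7)).fetchall()
print(rows)
for row in plan:
    print(row[-1])  # detail column; should mention idx_bucket_day
```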
>>16195 I'm very hesitant to encumber our efforts with RW Foundations by using an opaque tech like a database. As with BUMP/Bumpmaster, I consider keeping the data openly available and using the filesystem itself as the 'database' much safer for all involved. It's also a universally-available datastore. I'm not sure exactly what the Big-O rating would be for Waifusearch's overall algorithm, but it's provably consistent at reaching an answer generally in less than 100 µs for a simple search. And this is on a low-end, 2-core potato machine. I'm sure both the algorithm itself, and very definitely the hardware, have plenty more headroom available. Again, Waifusearch is a filesystem-based datastore system. After a few seconds frontloading the indexing, she's pretty snappy tbh.
>>16240 No worries. At the bare minimum, B-trees alone can be used for the memory storage and retrieval. If memories are stored as files, they'll have to be split up into many directories using the beginning of their hash. I've run into issues storing 10 million files (40 GB) in a single directory.
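A minimal sketch of splitting files across directories by hash prefix, assuming SHA-256 of a memory's key and two directory levels of two hex characters each (both arbitrary choices). That gives 65,536 leaf directories, so 10 million files works out to roughly 150 per directory.

```python
import hashlib
from pathlib import Path

def shard_path(root: Path, key: str, levels: int = 2, width: int = 2) -> Path:
    """Map a memory key to root/ab/cd/<full digest> using its SHA-256 prefix."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return root.joinpath(*parts, digest)
```

Writing a memory would then be `path.parent.mkdir(parents=True, exist_ok=True)` followed by `path.write_text(...)`; the same key always maps to the same file, so lookups need no index at all.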
Open file (286.34 KB 719x737 gato.png)
Open file (455.78 KB 2075x1087 gato chat.jpg)
Open file (87.56 KB 1136x581 gato scaling.png)
DeepMind created a multipurpose multimodal transformer that can play games at a human level, caption images, solve robot simulation tasks 96% of the time, control a real robot arm and chat about anything including responding to images. It doesn't appear to be using the latest multimodal advances, though, such as multimodal cross attention, so it's not too great at image captioning. The largest model they tried was 1.2B parameters and it appears to perform decently with only 79M. For reference, a 375M model could run on a Raspberry Pi with 4 GB of RAM. https://www.deepmind.com/publications/a-generalist-agent The authors also mention this is just a proof-of-concept and wish to experiment with external retrieval, and mentioned another fascinating paper on the Retrieval-Enhanced Transformer (RETRO) that reported results on par with GPT-3 using 25x fewer parameters. It doesn't store memories but instead indexes large amounts of text using BERT embeddings, retrieves similar information to the context, and incorporates it with chunked cross attention. It's pretty encouraging seeing these early attempts getting such good results. The multimodal agent in particular makes me think of the possibilities of storing multimodal embeddings as memories rather than just text. A waifu would be able to remember your face, where stored items were placed months or years ago, everything you've read, and what you chat about and did every day with almost perfect clarity.
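Rough arithmetic behind the Raspberry Pi claim, counting weights only (activations, the runtime and the OS need memory on top of this), in binary gigabytes.

```python
def weights_gib(n_params, bytes_per_param):
    """Memory needed just to hold the weights, in GiB."""
    return n_params * bytes_per_param / 2**30

for name, nbytes in [("fp32", 4), ("fp16", 2), ("int8", 1)]:
    print(f"375M params @ {name}: {weights_gib(375e6, nbytes):.2f} GiB")
```

That prints about 1.40, 0.70 and 0.35 GiB, so a 375M model fits in 4 GB even at full precision, with more headroom at fp16 or int8.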
(>>16255 related crosspost)
>>16249 Thanks, that's encouraging to hear Anon. >>16254 >and it appears to perform decently with only 79M >A waifu would be able to remember your face, where stored items were placed months or years ago, everything you've read, and what you chat about and did every day with almost perfect clarity. What a time to be alive! Do you have any feeling for how practical it would be to train on more modest hardware that Joe Anon is likely to have around?
>>16261 The most popular GPU on Steam right now is a 6 GB GTX 1060. It's pretty slow, so training a 375M model from scratch would probably take two years. With pretrained models maybe a week or two. Language models have been shown to transfer well to reinforcement learning and also work well with existing vision models. You just have to train an adapter from the frozen vision model features to the frozen language model embeddings, ideally after finetuning the vision model on vision tasks you want it to be able to do.
>>16266 >With pretrained models maybe a week or two. Language models have been shown to transfer well to reinforcement learning and also work well with existing vision models. Actually, that sounds pretty encouraging Anon! So, I would assume that a home-server could hold the GPU and work on the incremental training times, and the runtime could be performed onboard the robowaifu with even more modest hardware (say a Chromebook-tier or even SBC machine)? Also, is this a scenario that would work with no continual connection even to the home server? That is, entirely un-networked, using purely on-board data and hardware resources?
>>16298 Part of adding a memory is to get rid of the need for incremental training. A model like Gato would be able to run on an SBC but might be too slow to inference for servo output. It would be more practical for it to do planning and have a faster, more lightweight system to handle the movements. Everything would be able to run onboard but it wouldn't be ideal.
Open file (491.22 KB 1192x894 gambatte.jpg)
>>16312 Ahh I see I think. Makes sense. >It would be more practical for it to do planning and have a faster, more lightweight system to handle the movements. Absolutely. Latency-tiered domains in our robowaifu's systems are a given. I live by the concepts of frontloading and distribution as a coder. I hope we can have just such a system as you describe working soon! :^) Cheers.
>>16312 >It would be more practical for it to do planning and have a faster, more lightweight system to handle the movements. Everything would be able to run onboard but it wouldn't be ideal. Realistically, low-level movement and locomotion would be handled by a separate model or a traditional software system. Gato is useful for slow-realtime actions (unless you enhance it in more than a few ways).
Open file (107.55 KB 1013x573 RETRO.png)
>>16254 I very much like seeing this here, great taste. Note that even the largest model is quite small by modern standards: you could run it on a 6 GB VRAM GPU with a few tricks. It uses a vanilla transformer and a short context; this is clearly just a baseline compared to what could be done here. Stay tuned. >>16110 I respect the creativity, but I do think that you overcomplicate the solution, although a semantically rich memory index mechanism sounds interesting in theory. Still, as of now it looks brittle, as memorizing should be learned in the context of a large, rich, general-purpose supervision source. RETRO https://arxiv.org/abs/2112.04426 used a banal frozen BERT + FAISS for the encoder & index for language modeling, and did quite well, outperforming dense models more than an order of magnitude larger. >If the model finds a contradiction somewhere, it should be possible to resolve it then update its own model or at the very least correct memories in the database. If you have some strong runtime supervision, you can just edit the index. Retrieval-based models are targeted towards this use case as well.
There is a good, if somewhat dated, overview of QA approaches https://lilianweng.github.io/posts/2020-10-29-odqa/ There are some attempts at retrieval-enhanced RL, but the success is modest for now https://www.semanticscholar.org/paper/Retrieval-Augmented-Reinforcement-Learning-Goyal-Friesen/82938e991a4094022bc190714c5033df4c35aaf2 I think a fruitful engineering direction is building upon DPR for QA-specific embedding indexing https://huggingface.co/docs/transformers/model_doc/dpr https://github.com/facebookresearch/DPR The retrieval mechanics could be improved with a binary network computing semantic bitvectors https://github.com/swuxyj/DeepHash-pytorch and using the well-developed MIPS primitives: https://blog.vespa.ai/billion-scale-knn/ If you watch Karpathy's Tesla AI Day video, you can glimpse that their autopilot approach contains some form of learned memory generation, which is an interesting direction because it learns how to create memories valuable for predicting the future. There are other nuances and memory-enhanced transformer architectures, though. TBH this space needs a good little benchmark, so that we could test our hypotheses in Colab.
>>16468 >Stay tuned. I like the ring of that, Pareto Frontier. Looking forward with anticipation to your thread tbh.
Open file (205.82 KB 701x497 image-search.png)
Open file (134.63 KB 1041x385 hashnet.png)
>>16468 The idea behind aligning the classification embeddings is that the query lacks the information it's trying to retrieve from the memory. A frozen BERT model trained for semantic search isn't going to match well from a query like "what is the name of the robowaifu in the blue maid dress?" to character descriptions of Yuzuki, Mahoro or Kurumi. It has to learn to connect those dots. If it struggles with figuring that out on its own, then I will pretrain it with a human feedback reward model: https://openai.com/blog/instruction-following/ Also, the encoder for the summarization model can be used for the classification embeddings, which saves the memory cost of having to use another model. Training will still be done on large general-purpose datasets. The memory can be cleared after pretraining with no issue and filled later with a minimal factory default that is useful for an AI waifu. RETRO is evidence that basic memory retrieval works even without good matching, and augmenting the context with knowledge from a seq2seq model has also been done successfully, with improvements to consistency and truthfulness: https://arxiv.org/abs/2111.05204 The hashing strategy was inspired by product-key memory for doing approximate nearest neighbour search: https://arxiv.org/abs/1907.05242 but uses the scores to produce a binary code instead, so it can work with a database or any binary search tree, plus a continuous relaxation to make the hash differentiable: https://www.youtube.com/watch?v=01ENzpkjOCE Vespa.ai seems to be using a similar method, placing several items in a bucket via a binary hash code and then doing a fine-level search over the bucket: https://arxiv.org/abs/2106.00882 and https://www.cv-foundation.org/openaccess/content_cvpr_workshops_2015/W03/papers/Lin_Deep_Learning_of_2015_CVPR_paper.pdf From that repo you linked, it looks like HashNet is the simplest and most effective, and similar to what I was planning to do with a continuous relaxation to make the binary hash codes differentiable: https://openaccess.thecvf.com/content_ICCV_2017/papers/Cao_HashNet_Deep_Learning_ICCV_2017_paper.pdf Using FAISS is out of the question though, since it uses too much memory for an SBC and can't scale up to GBs let alone TBs. I'm not familiar with DPR and will have to read up on it when I have time. There's a bit of a difference in our projects since your target platform is a gaming GPU. My goal is to create an artificial intellect that doesn't need to rely on the memory of large language models and utilizes memory from disk instead. This way it can run off an SBC with only 512 MB of RAM which are both affordable and in great stock (at least non-WiFi versions that can take a USB WiFi dongle). I've given up trying to do anything with large language models since I have neither the compute nor the money to rent it. The idea though will also scale up to larger compute such as a gaming GPU if anyone with the resources becomes interested in doing that.
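A minimal sketch of the continuous relaxation trick in the spirit of HashNet: train with tanh(beta * z) in place of sign(z), since it has a usable gradient, and raise beta over training so the relaxed codes approach true binary codes. The beta values here are arbitrary; HashNet's actual schedule and loss are in the paper.

```python
import numpy as np

def relaxed_hash(z, beta):
    """Differentiable stand-in for sign(z); approaches sign(z) as beta grows."""
    return np.tanh(beta * z)

def relaxed_hash_grad(z, beta):
    """d/dz tanh(beta * z) = beta * (1 - tanh(beta * z)^2), usable for backpropagation."""
    t = np.tanh(beta * z)
    return beta * (1.0 - t ** 2)

z = np.array([-0.8, -0.1, 0.05, 0.6])  # hypothetical pre-hash activations
for beta in (1, 10, 100):
    print(beta, np.round(relaxed_hash(z, beta), 3))
```

Small activations near zero stay soft for longer, which is what lets gradients flow early in training; at inference time the codes are simply `np.sign(z)`.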
>>16496 >My goal is to create an artificial intellect that doesn't need to rely on the memory of large language models and utilizes memory from disk instead. This way it can run off an SBC with only 512 MB of RAM which are both affordable and in great stock (at least non-WiFi versions that can take a USB WiFi dongle). You are the hero we all need, but don't deserve Anon! Godspeed.
