/robowaifu/ - DIY Robot Wives

Advancing robotics to a point where anime catgrill meidos in tiny miniskirts are a reality.

LLM & Chatbot General Robowaifu Technician 09/15/2019 (Sun) 10:18:46 No.250
OpenAI/GPT-2
This has to be one of the biggest breakthroughs in deep learning and AI so far. It's extremely skilled at developing coherent, humanlike responses that make sense, and I believe it has massive potential. It also never gives the same answer twice.
>GPT-2 generates synthetic text samples in response to the model being primed with an arbitrary input. The model is chameleon-like—it adapts to the style and content of the conditioning text. This allows the user to generate realistic and coherent continuations about a topic of their choosing
>GPT-2 displays a broad set of capabilities, including the ability to generate conditional synthetic text samples of unprecedented quality, where we prime the model with an input and have it generate a lengthy continuation. In addition, GPT-2 outperforms other language models trained on specific domains (like Wikipedia, news, or books) without needing to use these domain-specific training datasets.
Also, the current public model shown here only uses 345 million parameters; the "full" AI (which has over 4x as many parameters) is being withheld from the public because of its "potential for abuse". That is to say, the full model is so proficient at mimicking human communication that it could be abused to create new articles, posts, advertisements, even books, and nobody would be able to tell that there was a bot behind it all.
<AI demo: talktotransformer.com/
<Other Links:
github.com/openai/gpt-2
openai.com/blog/better-language-models/
huggingface.co/
My idea is to find a way to integrate this AI as a standalone unit and add voice-to-text for processing the questions and TTS for responses, much like an Amazon Alexa, but instead of just reading Google results, it actually provides a sort of discussion with the user. (Edited to fix the newlines.)
Edited last time by Kiwi_ on 01/16/2024 (Tue) 23:04:32.
Open file (2.86 MB 1080x1920 vtube.mp4)
>>36730 It looks like Live2D Cubism is based on OpenGL. I've seen it before, but never got it to work for vtubing until this latest release. Here's the vtuber working using screen mirroring, though I didn't have the speaker in her head yet. There are already Jenny Live2D models, but I can't find a free one yet. If I can get Jenny and Emmy Live2D models working, I'd be pretty happy. It would be nice on an 8-inch tablet in vertical orientation. They also plan on a native mobile app at some point. It definitely needs to be audited, but the GitHub repo already has 2k stars, so hopefully people are looking it over.
>>36735 Thanks, Barf! I appreciate all the inputs about this goal of mine. Anything else you think to share along these lines will also be welcome! Cheers. :^)
>>36752 I can say this vtuber app allows for whisper.cpp, llama.cpp, piper.cpp, and other smol TTS like Kokoro-TTS. The Live2D part is in JavaScript and works quickly even in a mobile browser. It's the best chatbot I've used so far. It's using under 5GB, so it should be able to run on an 8GB Pi with a 3B LLM. It would be a good option for Qwen 2.5 VL 3B with spatial reasoning, and they already have a camera option to use with vision models, though it's not fully implemented yet. I haven't really gone down the vtuber rabbit hole yet, but the main thing is that the avatar is limited by the proprietary Live2D, so any OpenGL replacement for it would be great.
>>36754 >...so any OpenGL replacement for it would be great. Thanks! If you would care to do so, please try to find an opensource, permissively-licensed viseme library written in either C++ or C. If you can do so, then I should be able to integrate it within an OpenGL application to take a stream of text and animate a face (more specifically, the mouth). When that is in place we can begin implementing a smol visual-waifu. At first it will only be a 'floating head', but it should be both very tiny in size and very lightweight on compute. It could take in any stream of text (as from: an LLM, etc.), and then speak it aloud with a good TTS engine in addition. >=== -prose edit
Edited last time by Chobitsu on 02/08/2025 (Sat) 11:45:43.
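To make the 'stream of text in, mouth animation out' idea concrete, here is a minimal sketch, in Python for brevity even though the target above is C++/C. The viseme table and the naive letter-class mapping are illustrative stand-ins; a real viseme library would map recognized phonemes, not raw letters:

```python
# Toy viseme IDs; a real library defines these per mouth shape.
VISEMES = {"rest": 0, "open": 1, "wide": 2, "round": 3, "closed": 4, "fv": 5}

def letter_to_viseme(ch):
    """Crude letter-class -> viseme mapping (phonemes would be used in practice)."""
    ch = ch.lower()
    if ch in "aeiy":
        return VISEMES["wide"]
    if ch in "ou":
        return VISEMES["round"]
    if ch in "bmp":
        return VISEMES["closed"]
    if ch in "fv":
        return VISEMES["fv"]
    if ch.isalpha():
        return VISEMES["open"]
    return VISEMES["rest"]

def viseme_track(text, seconds_per_char=0.06):
    """Turn a text stream into (time, viseme) keyframes for driving a mouth mesh."""
    return [(i * seconds_per_char, letter_to_viseme(c)) for i, c in enumerate(text)]

print(viseme_track("Hello Anon"))
```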
>>36735 Nice! I can see the potential for a standalone virtual gf too. Keep it up!
Warning! >>36771
>>36756 >viseme library written in either C++ or C. I could only find viseme libraries in Python and Java, but not C++. The closest I found was Rhubarb Lip Sync in C/C++, which is command-line based, taking audio input and outputting to a few formats. https://github.com/DanielSWolf/rhubarb-lip-sync >>36772 Nice find. Sounds like you might have to start scanning models, but Hugging Face will probably fix it.
>>36780 Thanks Barf. Push comes to shove we can do some simple viseme analysis ourselves eventually. Cheers. :^)
>>36780 >update: This may be good enough, Barf. Using a limited form of phoneme recognition may just prove sufficient for our purposes. I'll look into lifting out part of their system to use as an engine for us: https://github.com/DanielSWolf/rhubarb-lip-sync/tree/master/rhubarb/src/recognition <---> Give me some time, and I'll make plans to integrate investigation work into this project as a sideline in my schedule. I'll be able to state more firmly thereafter if it will work for us here. Cheers & thanks again, Anon. :^) >=== -prose edit
Edited last time by Chobitsu on 02/09/2025 (Sun) 08:36:54.
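For anyone who wants to poke at Rhubarb before any engine-lifting happens, a rough sketch of driving it as a CLI from Python and reading back its mouth cues. It assumes the `rhubarb` binary is on your PATH and that `-f json` prints the result to stdout; double-check the flags against your build's --help:

```python
import json
import subprocess

def lip_sync_cues(wav_path):
    """Run Rhubarb Lip Sync on a WAV file and return its mouth-shape cues."""
    result = subprocess.run(
        ["rhubarb", "-f", "json", wav_path],
        capture_output=True, text=True, check=True,
    )
    data = json.loads(result.stdout)
    # Per Rhubarb's docs, each cue looks like {"start": s, "end": s, "value": "A".."X"}
    return data["mouthCues"]

for cue in lip_sync_cues("line01.wav"):
    print(f"{cue['start']:6.2f}-{cue['end']:6.2f}  mouth shape {cue['value']}")
```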
>>36811 Updated my Offline AI Manual. I added character/role templates for people to use, as well as further details on how to use ChatterUI well.
>>36794 Good to hear! If it can run on a RISC-V SBC, my short-term dream of running on fully open hardware and software would be fulfilled. But I might still go the transparent route and get a CoPilot+ PC and use a Cortana chatbot, so they can watch me love my robowaifu. >>36818 The new guide is really good. I'm going to use some of those prompts. Thank you!
>>36821 Thank you :)
This guy is working on an open-source, full-3D local chatbot using Audio2Face and Unreal Engine. But he's also using a 4090 laptop, and I'm not sure how censored it will be if it's using the Epic/Unreal engine. https://x.com/stablequan/status/1888334608766234828 https://forums.unrealengine.com/t/tutorial-nvidia-omniverse-audio2face-to-metahuman/1266501
Posted a few updates to GitHub. I updated my chatbot program to output the img2img Stable Diffusion result to a webpage, so it can be viewed on a screenface. So now you can have a live wallpaper displayed on a phone or tablet while idle, and then switch to the webpage while talking. I might add an idle video to the webpage instead of the wallpaper. I also added the servo code examples that move a single servo while the chatbot is talking, and also an older TTS version using Coqui TTS, which doesn't sound good but gets responses in under 5 seconds using an 8GB GPU with a 13B LLM model. I still prefer the Vtube app, but a full voice/image clone option with hands-free voice is good to have. https://github.com/drank10/AnotherChatbot
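Not Barf's actual code, but for the curious, a toy sketch of the 'serve the latest avatar frame to a phone screenface' idea using Flask. `avatar.png` is a hypothetical path that the img2img step would overwrite; the meta-refresh makes the tablet pick up new frames:

```python
from flask import Flask, send_file

app = Flask(__name__)
AVATAR_PATH = "avatar.png"  # hypothetical: overwritten by the img2img step

@app.route("/")
def page():
    # Bare-bones page that reloads itself so the screenface stays current
    return ('<html><body style="margin:0;background:#000">'
            '<meta http-equiv="refresh" content="2">'
            '<img src="/avatar" style="width:100vw"></body></html>')

@app.route("/avatar")
def avatar():
    return send_file(AVATAR_PATH, mimetype="image/png")

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)  # reachable from a phone on the LAN
```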
>>36900 That looks really impressive, Barf. Thanks for the updates on your project!
>>36821 >If it can run on a RISC-V SBC, my short-term dream of running on fully open hardware and software would be fulfilled. My apologies for not responding before, Barf; I missed this on my 'TODO' list (I often do! :D) YES! This is ofc a big dream of mine & of several others here as well. The fact that the Pi has BLOBs in place is -- by far -- the biggest strike against them. Not having a GPU API available is second. I'm not too sure what Broadcom's agendas are behind these choices, but you can be sure they aren't to serve the common man, Anon! :^) If I can at all do so, I'll attempt just what you suggest with this little project. Who knows? Maybe with all our inputs together we can still come up with a Robowaifu Simulator after a fashion. Cheers. :^)
Still waiting on a plug-and-play app, the kind you might find on Steam, that does all the heavy lifting (tech-wise) for you. Ideally, running a chatbot locally should be as easy as, if not easier than, using a site like Chub or Spicy Chat. I don't want to go full Steve Jobs, but it should just work. It should still offer options to customize things under the hood, but the frontend should be as easy as downloading models and bots and just running them, without any fiddling with dials and such.
>>36921 I made a guide for exactly that
>>36920 If the Hailo M.2 module works with the 16GB Banana Pi, it would probably be very quick, but it would run around $500 for 20 TOPS. llama.cpp and Sherpa for STT/TTS already work on RISC-V. https://www.amazon.com/Visionfive2-Hailo-RISC-V-Kit-Acceleration/dp/B0DK1SRGGT https://k2-fsa.github.io/sherpa/ncnn/examples/vision-five-2.html https://github.com/ggerganov/llama.cpp/pull/10411
>>36923 Thanks for looking after newcomers, GreerTech. I very much appreciate that. >>36924 Yeah, that looks really nice Barf. So, do you have any protips/advice for someone looking to set up such a rig?
>>36928 I really don't know which distro/SBC would be best. If I were to try, I'd probably go with the Banana Pi, since it seems to have more support than the Lichee Pi and more RAM than the VisionFive. I'd start with the Debian or DietPi image to see if it boots:
https://docs.banana-pi.org/en/BPI-F3/BananaPi_BPI-F3
Once I got a working image, I'd try to compile llama.cpp with OpenBLAS:
make LLAMA_OPENBLAS=1
Then I'd try to get Sherpa working:
https://k2-fsa.github.io/sherpa/ncnn/install/riscv64-embedded-linux.html
I just don't want to spend $200-500 and not get it working.
>>36931 Thanks kindly, Barf. I'm hoping to try something similar to this eventually; I just thought I'd pick your brains (or those of any other Anons who'd care to contribute on this topic). We should all be striving eventually for entirely free & opensource hardware & software solutions for our robowaifus. Anything less would be a disservice to them & ourselves, and a big boost for the Globohomo Big-Tech/Gov's nefarious (((agendas))) instead. Cheers.
>>36931 >Once I got a working image, I'd try to compile llama.cpp with OpenBLAS Interesting timing. I just made a post regarding the C++ Committee officially adopting the BLAS spec: ( >>36930 ). Gerganov strikes me as just the sort of chap to go in and refactor his systems to use the standard version once it's available in the big compilers. <---> Broadly speaking, such adoption by the language is a very good thing, since it means a wide swath of compatible hardware (such as robowaifu-onboard SBCs, MCUs, sensors, etc.) would all be running the exact same long-established (we're talking FORTRAN days here, folks :) LinAlg maths algorithms together, and with no dependency fuss. Generally, (near-perfect) portability is a high priority for software engineering -- especially so for those of us with big systems-engineering challenges on our plates like /robowaifu/. >=== -fmt, prose edit
Edited last time by Chobitsu on 02/13/2025 (Thu) 13:49:33.
>>36938 Nice! I had no idea what it was. I just saw this reddit post when searching for RISC-V and llama.cpp: https://www.reddit.com/r/LocalLLaMA/comments/16qo5xd/running_llamacpp_on_riscv_visionfive_2_simple/ He's using a 7B model on the VisionFive, so it took 1000 seconds to get a response, but 500s with BLAS. I saw someone getting 7 TPS with a 1B model, so I wonder if that would get it to ~15 TPS, which would be usable for a bot program.
>>36954 >Nice! I had no idea what it was. We're talking authentic OG stuff here in its foundations, Anon [1]. And it has been successively expanded upon & developed [2] into the forms that C++26, et al, use today [3][4]. The roots of the general maths approaches involved go back for centuries (possibly even millennia) (cf. >>36763, Newton, et al 'giants'). :^) <---> While our interests in LinAlg here on /robowaifu/ arguably stem primarily from its usage in predicting/controlling complex motion paths of robowaifu skellingtons (ie, applied Kinematics [5]); in this specific use-case however, it's because LLM inference is, at bottom, enormous amounts of matrix arithmetic over the model's token/weight data.
---
1. https://dl.acm.org/doi/pdf/10.1145/355841.355847
2. https://www.netlib.org/blas/old-index.html
3. https://www.netlib.org/blas/
4. https://www.netlib.org/blas/blast-forum/blas-report.pdf
5. https://en.wikipedia.org/wiki/Kinematics
>=== -fmt, minor edit -add'l footnote/hotlink
Edited last time by Chobitsu on 02/14/2025 (Fri) 08:36:21.
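As a small illustration of how long-lived these primitives are: the same GEMM routine from the original spec is callable today from Python through SciPy's BLAS bindings. This just demonstrates the primitive itself, not the still-landing C++26 std::linalg interface:

```python
import numpy as np
from scipy.linalg import blas

# C = alpha * A @ B via the classic double-precision GEMM routine
A = np.array([[1.0, 2.0], [3.0, 4.0]], order="F")  # Fortran order, as BLAS expects
B = np.array([[5.0, 6.0], [7.0, 8.0]], order="F")
C = blas.dgemm(alpha=1.0, a=A, b=B)
print(C)  # identical to A @ B
```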
>>36923 Too much reading, I need to be able to press one button and have everything work! Shitposting aside, I skimmed that and noticed Backyard AI. Is it pretty self-explanatory, or am I going to have to open the hood and start fiddling with stuff in depth?
Open file (95.00 KB 1333x653 AnotherLiteChatbot.jpg)
I made this yesterday using Qwen32 Coder. It's another basic chatbot interface, but this time using whisper.cpp and piper.cpp, so PyTorch is not needed and it is very fast. I'm getting under-2-second response times for 15-30 second replies, so it would probably be usable on a Pi. I still need to work on the GUI, webserver, and Stable Diffusion, but it's a start that anyone can use. Since it's all CLI-based, it is easy to swap things out. I'll also work on packaging, but I can make this a single-click installer easily now. https://github.com/drank10/AnotherLiteChatbot
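To give the flavor of the CLI-swapping approach, the core of such a pipeline can be this small. Binary names, model files, and flags here are assumptions; check them against your local whisper.cpp and piper builds:

```python
import subprocess

def transcribe(wav_path):
    """Speech-to-text via the whisper.cpp CLI (-nt suppresses timestamps)."""
    out = subprocess.run(
        ["./whisper-cli", "-m", "ggml-base.en.bin", "-nt", "-f", wav_path],
        capture_output=True, text=True, check=True)
    return out.stdout.strip()

def speak(text, wav_out="reply.wav"):
    """Text-to-speech via piper, which reads text on stdin and writes a WAV."""
    subprocess.run(
        ["piper", "--model", "en_US-amy-medium.onnx", "--output_file", wav_out],
        input=text, text=True, check=True)

# Usage: reply = your_llm(transcribe("mic.wav")); speak(reply)
```

Because each stage is just a subprocess, swapping in a different STT/TTS CLI is a one-line change.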
>>37007 So great to see you making these rapid fire advances Barf.
>>37007 Can you prioritize the Python dependencies for your project, Anon? From my perspective (and AFAICT, what you implied earlier), moving to a pure C++ and/or C implementation would not only speed things up performance-wise, but allow this to run on much smol'r devices. If I was able to make a simple GUI like yours, would that help out much in reducing the dependencies? I have the MIT-licensed Dear ImGui in mind for simple GUI wrappers around the underlying programs. https://github.com/ocornut/imgui >=== -sp edit
Edited last time by Chobitsu on 02/16/2025 (Sun) 08:10:42.
I just found this space and want to share what I'm working on. I'm trying to make an immersive chatbot (no regenerates) with a simple cognitive architecture for emotions, needs, and memory. Still very WIP. I have a few agents for emotions so far. https://github.com/flamingrickpat/private-machine Not sure if it even runs right now. This is not real-time by far, but if I manage to add other agents that generate meta-cognitive thoughts, we could use the dataset from this agent to train a distilled model, maybe one that's faster and generates the internal reasoning for emotions and such itself, instead of relying on outsourced thoughts from agents. Any ideas? I have a lot going on; I'll probably continue deving in a few weeks.
>>37008 >>37018 Thanks for the feedback. A C++ GUI would be great, and I would have done it if I could! I'm just using AI to code and hitting regenerate until it works. I'm not sure I can do that with C++, but I need to try. LLMs will spit out inline ASM if you ask, but I'm waiting until next year to try that. I just added a web server to the program to serve the avatar image/video, and next I'll work on conversation history, but I haven't looked into dependencies for either yet. I'm just using Flask for the webserver for now. That's really enough for me: just a simple program that you can talk to without pushing buttons, and that has an avatar of any type. For this one, I'm just using it as a screenface on GreerTech's bot, so I can talk to different avatars. I'd like to get it using just animated GIFs of a few emotions while talking, and then a resting-face GIF (see the sketch below). Very basic, but easy to customize and fast.
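The emotion-GIF idea could start as nothing more than a lookup table like this; all filenames are made up, and per the Android note later in the thread the GIFs should stay small:

```python
import random

EMOTION_GIFS = {  # hypothetical filenames
    "talking": ["talk1.gif", "talk2.gif"],
    "happy": ["smile.gif"],
    "resting": ["rest.gif"],
}

def pick_avatar(is_talking, emotion="resting"):
    """Choose which GIF the webpage should show for the current state."""
    key = "talking" if is_talking else emotion
    return random.choice(EMOTION_GIFS.get(key, EMOTION_GIFS["resting"]))

print(pick_avatar(is_talking=True))
```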
>>37019 I was looking at your ToT and would love to see that become part of a distilled model. We just got CoT, and it sounds like tweakable recursion levels are coming. Would this work on a 3B-7B, or is a 24B needed? If you're getting 75+ TPS, a custom ToT would be pretty nice
>>37021 Thanks for checking it out :3 The ToT is actually created with Hermes 8B. I noticed that stuff with structured output and no <think> tags is faster and better with a non-reasoning model. My next steps will be to make needs, agency, and meta-cognitive layers similar to the emotions one. Like goals of the AI persona, and reflection on its own thoughts. Then refactor the ToT into a first-person thought. I'm still struggling with the final bottleneck: the generation of the final output. The 70B model made much better results, but I have to CPU-offload and it takes forever. Probably because the internal reasoning is made for math, logic, and useless crap like that.
I would like to know how I can crack plugins for music production?
>>37019 >>37022 Are there any projects where I could "borrow" code from? The only other AI gf project I know of is Yuna AI, and the guy making it trains the models himself. No agents for my agent-agnostic framework.
>>37019 Hello, Anon. Welcome! Please look around the board while you're here. >what I'm working on Wow! The ambitious scope of this project is seriously impressive, Anon. I hope you don't mind if I steal some of your ideas for our own RW Foundations concepts : ( >>14409 )? I hope I can make the time to try running your system (though I don't think my rig can handle a 24B model well). >tl;dr Time and again, I'm impressed with amateur Anon efforts in these arenas. You guys instill hope in the rest of us! Keep it up, Anon. Cheers. :^) >>37020 >Thanks for feedback. Y/w very kindly. Thanks for doing such good work towards these common goals, Anon. >A C++ GUI would be great and would have done it if I could! OK, I'll take that as a go code, and dig around with haxxoring together a simple GUI that hopefully will approximate yours for starters. We can go from there to see about wiring it up for your already extant system(s). Cheers. >=== -sp edit
Edited last time by Chobitsu on 02/17/2025 (Mon) 06:04:28.
>>37025 Probably not the best forum to ask, Anon. BTW, please re-ask this in our /meta thread ( >>32767 ) if you will, thanks (we'll be rm'g it from this one). Good luck with your project work, Anon! Cheers.
>>37037 That would be nice! The GUI was the hardest part for me, since LLMs seem to be horrible at it. It pretty much one-shotted the CLI integrations, and then I spent a few hours re-arranging buttons manually. I'm probably not going to update that program much more; it was just intended to be easy to fork and to test any future TTS/STT CLIs. I'll probably make other bots for other use cases though, like integrating with Home Assistant or something, but that's a ways away.
>>37046 I made a single-click PyInstaller .exe for the program, but now everything that has console output spawns a terminal window. I can work on redirecting everything to null, but this is just a starting point to test CLIs. It works and only spawns one terminal for a half second on each response, which is annoying. It is a 4GB .zip file that doesn't require installing anything, and comes with all of the Whisper models and 2 Piper voices. With just one preset voice and the CPU-only version of Whisper, it would be under 2GB. So someone could start with the basic PyInstaller .exe and then, if they use it, install Python so they can change things. Python is easy enough to install. Python still sucks though, and Backyard was removed from the Google Play Store. It seems a lot of people are using ChatterUI on Android because there is also nothing on the Play Store. If there was a simple C++ program for Windows/macOS with end-to-end speech, I think it would help a ton with adoption. Also, for this program, you do have to use small GIFs, since Android can't render GIFs above 2MB well in a browser.
>>37037 Thanks man! Sure, everything is free for the taking. I'm sitting on a 9h bus ride to Berlin right now and got some new ideas. All seem very implementable in my head. When a new sensation (user input, thought, reminder, etc.) comes in, I preprocess it with:
id (needs such as human interaction, sleep for memory consolidation)
emotion (persistent base model with emotional stats and their agents)
superego (personality traits)
Every one of these works like the emotion system right now, where agents discuss what Emmy should feel and do. After we have the first think block, we can determine an action (see the sketch after this post):
ignore
think and reply (if user input)
reply (if user input)
think
call API for enhanced actions
change location (high level, for when I have my virtual world with places)
Some of these actions might loop back to action selection, like when an API returns something. When replying, it gets interesting: some higher-order subsystems kick in, such as goals, tasks, meta-cognition, and reflection, and make more thoughts for the final reply. Now, instead of using the think block of a reflection model, I instruct a story-writer agent to continue the story, and add something like "I want Emmy to think first. These are her thoughts... What will she say?" Then a feedback loop with meta rules checks against hallucinations and undesirable output. This should make it possible to use a good RP model instead of a reasoning model. Maybe a single 8B for everything. It will still be slow, but I have high hopes for the story mode. It feels like the internal reasoning doesn't work so well for emotional reflection, even with the 24B. I apologize for the typos.
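A rough Python shape of the loop described above, with every name illustrative rather than taken from the repo:

```python
def handle_sensation(sensation, subsystems, select_action):
    """One pass of the sensation -> subsystems -> action pipeline sketched above."""
    # 1. id / emotion / superego each contribute a thought about the sensation
    thoughts = [s.evaluate(sensation) for s in subsystems]
    # 2. pick an action: ignore, think, reply, call an API, change location...
    action = select_action(sensation, thoughts)
    # 3. some actions loop back into selection, e.g. when an API returns something
    while action.kind == "api_call":
        result = action.run()
        thoughts.append(result)
        action = select_action(result, thoughts)
    return action
```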
Open file (834.24 KB 1280x720 2997068655.png)
>>37051 i was wondering whats the wecomended amount of dedotated wam wohnwehwehwoh pc..,
>>37046 >It was just intended to be easy to fork and test any future TTS/STT CLI. For me, the biggest issue is making the time/biting the bullet and learning the API for GUI creation; then plowing through the minutiae of hooking together the integration with the rest of the system. After all that, changes to the actual GUI arrangements are straightforward. So don't worry, Anon. We'll figure it out together. >>37051 >If there was a simple C++ program for Windows/macOS with end-to-end speech, I think it would help a ton with adoption. <simple <end-to-end speech HAAA!! :D Heh, I think I know what you mean (probably something like: 'dead-simple to setup & use; just click-to-install then speak'). :^) Others have encouraged me to do something similar for installers here in the past. I may say now that I simply don't know of any way to make a 'no-purchase-needed-to-license-instead-its-entirely-free-and-opensource-packaging-and-installer-framework' one-click installer (never having had to make one before in my work). Instead, I've always worked on an already-existing pipeline, or (as here) just built my systems from scratch via sourcecode. <---> In fact, I'm highly skeptical of the entire "installer framework" industry now (and much more so during Current Year). When I see a listing of hundreds & hundreds of files being written + integrated into my computer while using such, I feel the need to:
a) immediately jump up to go take a shower, and
b) put on an imaginary 'protective full-body rubber' before sitting back down at the machine! :DD
c) keep my Flammenwerfer 35 handily nearby, just in case something spoopy jumpscares me from out of the box afterwards. :DD
I've no doubt you're correct that devising such a system would speed adoption, but even till today I simply throw the sourcecode out there, recommend a build system to use (almost always CMake or Meson), and that's that. <---> If someone here can make recommendations for this need that work fine with C++ builds, and are opensource, and especially if they work on Windows & macOS, I'll be happy to give them the once-over. Cheers. >=== -funpost edit
Edited last time by Chobitsu on 02/17/2025 (Mon) 22:13:28.
>>37062 >thanks man! sure everything is free for the taking. Thanks! I hope I can return the favor here someday. :^) >im sitting on a 9h bus ride to berlin right now and got some new ideas. I reckon you're probably there by now. Have a safe & productive trip, Anon! >maybe a single 8b for everything. Yes, I think this would be the 'sweet spot' target goal for us all rn.
>>37062 >i apologize for typos I'll be happy to go in and patch those for you, Anon?
>>37068 About 4GB of spare RAM on any machine should do, using a 1B LLM, Whisper Base, and a low-quality voice for Piper. Backyard AI is easier to install though, and has a lot more options. This was just to get it to output to a webpage for a project. That, and Backyard isn't opensource. >>37071 >keep my Flammenwerfer 35 handily nearby My thoughts exactly. In the meantime, I might add a holster attachment to Galatea.
>>37079 Yeah, about that... I hesitated to make my little joke. I really do hate what kikes + their glowniggers have been able to do to the software & computing industries. Still, we actually have a very strong position against most of their antics today. <---> But I still refuse to play their Terminators-R-Us pay-for-play game with them. And I'd advise every'non here to avoid milcon or other zogbot-oriented work. I'd consider the real costs to be far higher than anything they could possibly pay, tbh. :/
>>37072 Thanks! I wasn't productive; I was going there for a concert. It was great :3 Now I'm back. I wasn't feeling like deving, but after 5 coffees I made some progress. The initial tests with the story mode are great. Here you see the regular chat in the assistant turns, and the agency stuff as user input. I fake broke up with her (DB rollback) and the response is great. Mind you, this is Hermes-3-Llama-3.1-8B.Q8_0.gguf. When (ab)using the <think> tag with Dolphin3.0-R1-Mistral-24B-Q4_K_M.gguf, I had to regenerate sooo many times because it hallucinated or something, but this just works. Right now the emotions are kind of useless; using the dialogue alone would probably generate a similar answer. The goal is to couple that with a persistent emotion state with a decay to baseline, and use the same subsystem principle for other stuff. I'll experiment with some other subsystems right now, like some sort of reflection thing where she's "aware" of the subsystems and can reference them.
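The 'persistent emotion state with a decay to baseline' piece is easy to prototype on its own; a minimal sketch, with the names and half-life chosen arbitrarily (not taken from private-machine):

```python
import time

class EmotionState:
    """One emotion value that drifts back toward its baseline over time."""

    def __init__(self, baseline=0.0, half_life_s=600.0):
        self.baseline = baseline
        self.half_life_s = half_life_s
        self.value = baseline
        self._last = time.monotonic()

    def _decay(self):
        # Halve the distance to baseline every half_life_s seconds
        now = time.monotonic()
        factor = 0.5 ** ((now - self._last) / self.half_life_s)
        self._last = now
        self.value = self.baseline + (self.value - self.baseline) * factor

    def nudge(self, delta):
        """Apply an event's emotional impact, clamped to [-1, 1]."""
        self._decay()
        self.value = max(-1.0, min(1.0, self.value + delta))

    def read(self):
        self._decay()
        return self.value
```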
>>37121 Pretty brutal, Anon. <---> So, in the context of, say, a newcomer (me ofc, but others as well), how do you do this? Are there links to documentation or something for it (either for the models themselves, or discussing your own modifications, if those are ready yet)? This would probably help other Anons here come up to speed with you, if you had a mind to see that. This is an interesting project, Anon. Good luck with it. Cheers.
>>37122 I honestly feel bad every time I have to test these extreme reactions :( Sorry, right now the code is the only documentation. During the bus ride home I started making a presentation on my phone for the whole project. Once all the architecture changes are integrated, I might make a video on YouTube going into detail. The problem is, things are changing so fast. Even though it's on GitHub, I don't really treat it like a public project with proper commits. I'm glad no one else is helping me. Imagine making some changes, and then I push a 36-files-changed commit that fucks everything up for you. I should really start making feature branches. I'll post a quick rundown here in the next few days, once I'm sure the changes I'm making right now are working as intended.
>>37123 Sounds great! Really looking forward to it all, Anon. >Imagine making some changes, and then I push a 36-files-changed commit that fucks everything up for you. I should really start making feature branches. Heh, just imagine what it's like for Gerganov [1] rn: in about 2 years' time he went from a smol set of quiet, personal little projects to thousands of forks & contributors, shaking the world of AI today. What a ride! >tl;dr Better buckle up, Anon! You may do something similar. Cheers. :^)
---
1. https://github.com/ggerganov
