>>41860
I can offer some insights.
- Having powerful hardware will improve quality immensely, but you can get a lower-fidelity form of this up and running now with smaller models.
- It will have context-limit challenges, which you'll need to overcome with smart use of tokens and how you store memories. You can have a functioning thing as you describe, but detailed memories will eat into the context heavily. Having different ranked memory tiers, clearing the lower ones, and a mix of other clever ideas can mitigate all of these issues. At this stage it's just a matter of having a "lower resolution" AI companion; you can update your methods later as the tech advances, and greater hardware means greater resolution.
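The ranked-tier idea can be sketched roughly like this. Pure toy Python: the tier names, the word-count "tokenizer", and the drop-oldest policy are all my own assumptions, not any real library's API.

```python
# Toy sketch of ranked memory tiers: when the context budget is blown,
# the lowest tiers get cleared first, oldest entries first.

TIERS = ("core", "important", "casual")  # hypothetical tier names

class TieredMemory:
    def __init__(self, token_budget):
        self.token_budget = token_budget
        self.memories = []  # list of (tier, text)

    def add(self, tier, text):
        assert tier in TIERS
        self.memories.append((tier, text))
        self._prune()

    def _tokens(self, text):
        return len(text.split())  # crude stand-in for a real tokenizer

    def total_tokens(self):
        return sum(self._tokens(text) for _, text in self.memories)

    def _prune(self):
        # clear the lowest tier first, then work upward if still over budget
        for tier in reversed(TIERS):
            while self.total_tokens() > self.token_budget:
                victims = [m for m in self.memories if m[0] == tier]
                if not victims:
                    break
                self.memories.remove(victims[0])  # drop oldest in this tier
            if self.total_tokens() <= self.token_budget:
                return

mem = TieredMemory(token_budget=10)
mem.add("core", "her name and your relationship")
mem.add("casual", "talked about the weather today")
mem.add("important", "she started learning piano")
# over budget now, so the "casual" memory gets cleared first
```

A real version would use the model's actual tokenizer for counts and have maintenance mode promote/demote memories between tiers, but the pruning order is the point.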
- I would suggest rotating through multiple models to make your AI companion smarter and keep her from getting stuck in the patterns and paths each individual model holds close to and deviates towards. To keep it simple: load one model, think for a bit, switch models, think a bit more. Especially with current limitations, I would set aside a large chunk of time where the companion computes by herself, going over her past and her token database to refine it, and having discussions and debates with herself to self-improve.
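The load-think-switch loop is just this shape. The "models" below are stub functions I made up; in practice each would be a real local LLM you load, query, then unload to free VRAM before the next one.

```python
# Sketch of rotating models during self-refinement so no single model's
# habits dominate. Each round, a different (stubbed) model reworks the
# previous round's output.

from itertools import cycle

def make_stub_model(name):
    def think(prompt):
        return f"[{name}] refined: {prompt}"
    return think

models = cycle([make_stub_model("model_a"), make_stub_model("model_b")])

def refine(text, rounds):
    for _ in range(rounds):
        model = next(models)
        text = model(text)
    return text

out = refine("memory of last week", rounds=2)
# after two rounds, both models have touched the text once
```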
- Personally, I would set these up as modes you switch between: her active-life mode where she talks to you, and her maintenance/sleep mode where she improves her database.
- I will use SillyTavern just as the idea vessel here, since I'm used to it, but I haven't done this work myself properly since I'm busy with life.
. . . use something like the memory and world lore books to store memories and events. Each can have a comprehensive recording at different "quants" of the lore books: your chat history fully recorded as non-quanted, then various levels of condensing that sort out less important info/tokens. For each task and memory, your companion will then think about which quant of each lore book is suitable, keeping hardware limitations and the current state of LLMs in mind.
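The quant idea in data terms: same entry stored at several condensing levels, and you pick the least-condensed one that fits the budget. The condense step here just truncates words as a placeholder; in the real thing, maintenance mode would have a model write the actual summaries.

```python
# Sketch of lore-book "quants": one event at several condensing levels,
# with the level chosen to fit a token budget.

def condense(text, level):
    words = text.split()
    keep = max(1, len(words) // (2 ** level))  # level 0 = full text
    return " ".join(words[:keep])

def build_quants(entry, levels=3):
    return {lvl: condense(entry, lvl) for lvl in range(levels)}

def pick_quant(quants, token_budget):
    # least-condensed version that still fits
    for lvl in sorted(quants):
        if len(quants[lvl].split()) <= token_budget:
            return quants[lvl]
    return quants[max(quants)]  # fall back to the smallest

event = "she spent the afternoon reading in the park and told you about it"
quants = build_quants(event)
full = pick_quant(quants, 100)   # plenty of budget: full text
tight = pick_quant(quants, 8)    # tight budget: a condensed quant
```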
- Basically she will have a large database which she parses and reads through similar to how you have her load models one at a time to use, then switch to another: she loads different subsets of the full database at a time to parse it.
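That subset-at-a-time pass over the database is just chunked iteration under a token budget (word counts again standing in for real tokens, my assumption):

```python
# Sketch of parsing the full database in slices: only one chunk is
# "loaded" at a time, mirroring how models are swapped one at a time.

def iter_chunks(entries, chunk_tokens):
    chunk, used = [], 0
    for entry in entries:
        cost = len(entry.split())
        if chunk and used + cost > chunk_tokens:
            yield chunk
            chunk, used = [], 0
        chunk.append(entry)
        used += cost
    if chunk:
        yield chunk

db = ["entry one here", "entry two here", "entry three here", "entry four"]
chunks = list(iter_chunks(db, chunk_tokens=6))
# the database gets split into budget-sized slices for maintenance passes
```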
- You do the usual things you would expect: writing the personalities, creating the framework of the database. Then you just have her use an auto-mode, similar to how SillyTavern runs group chats with a different personality card as the other character; this is essentially her maintenance/recursive-improvement mode. Again, swapping through multiple different reasoning/CoT models will be ideal here specifically.
- Then you just let that run during periods of free time. You also have her alone mode, where she continues her life, which will develop her memories, her character, and your relationship. To keep it coherent and immersive, I would make it follow natural time rather than have her invent the events that happened while you were gone as a task in itself. The maintenance mode will be crucial here to sort her life events and make them both logical and coherent, as well as cutting away waste tokens, building those lore books, etc. The model changing is crucial too: even with sampler settings and repetition penalties, models slowly converge into themselves, and that drift can be countered automatically with multiple models and the maintenance mode I discussed earlier.
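The group-chat maintenance mode boils down to two cards taking turns on each entry. Both "cards" below are stub functions of my own invention; in SillyTavern terms they'd be the companion card and a dedicated editor/critic card running on auto-mode.

```python
# Sketch of the automated two-card maintenance pass: the companion turn
# tidies an entry in her own voice, the critic turn cuts exact-duplicate
# sentences (the convergence waste the post mentions).

def companion_turn(entry):
    return entry.replace("  ", " ").strip()

def critic_turn(entry):
    seen, kept = set(), []
    for sentence in entry.split(". "):
        if sentence not in seen:
            seen.add(sentence)
            kept.append(sentence)
    return ". ".join(kept)

def maintenance_pass(entry, rounds=2):
    for _ in range(rounds):
        entry = companion_turn(entry)
        entry = critic_turn(entry)
    return entry

raw = "went to the market. bought bread. went to the market"
clean = maintenance_pass(raw)
```

With real models, each turn would be a different model answering in character, which is where the multi-model swapping pays off.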
Most of the knowledge I have is about how to do this abstractly; I would just vibecode and let LLMs walk you through setting it up when you get stuck.
You can do all of this with smaller models, but the "resolution" of your companion will be lower and maintenance will take longer. In active conversation mode I would pick your favorite model(s) and just use one as you speak to her; all the data she needs for the conversation will already be prepped as her "mental state" by maintenance, plus lore books that fetch relevant data as you speak.
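The fetch-as-you-speak part can be as dumb as keyword overlap between your message and the lore entries (real lorebook systems use trigger keys; the scoring here is my own simplification):

```python
# Sketch of pulling relevant lore into her "mental state" per message:
# score each entry by word overlap with what you just said, keep the top few.

def fetch_lore(message, lorebook, top_k=2):
    msg_words = set(message.lower().split())
    scored = []
    for entry in lorebook:
        overlap = len(msg_words & set(entry.lower().split()))
        if overlap:
            scored.append((overlap, entry))
    scored.sort(reverse=True)
    return [entry for _, entry in scored[:top_k]]

lorebook = [
    "she is learning piano on weekends",
    "her favorite tea is jasmine",
    "you both visited the coast last summer",
]
hits = fetch_lore("how is the piano learning going", lorebook)
# the piano entry scores highest and lands in context for her reply
```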
The specific models I am still a newfag on; no recommendations from me that are any better than what's on this forum or halfchan's /g/.
AI and LLMs run well on Linux though, best of luck.