/robowaifu/ - DIY Robot Wives

Advancing robotics to a point where anime catgrill meidos in tiny miniskirts are a reality!



Open file (2.28 MB 320x570 05_AI response.mp4)
Open file (4.77 MB 320x570 06_Gyro Test.mp4)
Open file (8.29 MB 320x570 07B_Spud functions.mp4)
Open file (1.06 MB 582x1446 Bodysuit.png)
SPUD Thread 2: Robowaifu Boogaloo Mechnomancer 11/19/2024 (Tue) 02:27:15 No.34445
This first post is to show the 5 big milestones in the development of SPUD, the Specially Programmed UwU Droid. You can see the old thread here: >>26306

The end goal of SPUD is to provide a fairly high-functioning robot platform at a relatively low cost (free code, but a few bucks for 3d print files) that can be used for a variety of purposes such as promotional, educational or companionship. All AI used is hosted on local systems: no bowing to corporations any more than necessary, thank you. Various aspects of the code are/will be modular, meaning that adding a new voice command/expression/animation will be as easy as making the file, naming it and placing it in the correct folder (no need to mess around with the base code unless you REALLY want to).

While I'm researching more about bipedal walking I'll be making a companion for SPUD to ride on, so it might be a while before I return to the thread.
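The "drop a file in the right folder" modularity described above could be sketched roughly like this (a minimal illustration in Python, not SPUD's actual code; folder layout and names are assumptions):

```python
import os

def load_modules(folder):
    """Map each file's base name to its full path. Adding a new voice
    command/expression/animation is then just dropping a file into the
    folder: the next scan picks it up with no base-code changes."""
    registry = {}
    if not os.path.isdir(folder):
        return registry
    for name in sorted(os.listdir(folder)):
        key, _ext = os.path.splitext(name)
        registry[key.lower()] = os.path.join(folder, name)
    return registry

# e.g. saying "toes" could look up registry["toes"] and play that animation file
```

The point of the lookup-by-filename design is that the dispatcher never needs to know the command list in advance.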
>>42218 Heh, cute! >that clack Can you find a way to bumper them so kids don't have the full "SPUD Scissortoes of Death!" experience? Might make them a little nervous being down close where they are and all. Cheers, Mechnomancer. GG! :^)
>>42219 >"SPUD Scissortoes of Death!" Funny enough that was never an issue. Kids saw her reacting to me saying "toes" and they ran right over to try to get her to do it :D
>>42225 Oh that's good then! :^)
>>42218 Super cute
>>42218 So you suppose harpy girls would be easier to engineer than standard human feet?
>>42218 We have robowaifu feet videos before GTA 6 >>42230 "I used the harpies to destroy the harpies"
Open file (6.96 MB 720x1200 Spudgui.mp4)
>>42234 >robowaifu feet videos Kek. A demo of SPUD's GUI. Because it runs asynchronously it slows the LLM down to a crawl, but at least the preprogrammed functions are in real-time. I'll try to integrate it into the main program and make it toggle-able. I also have a reason to redo SPUD's face again (mostly fixing the eyes): she did a bit of a faceplant during setup. Runs fine tho.
>>42234 KEK >>42239 That's actually really cute. Great work on the GUI, keep it simplistic for this environment! The timings seem a bit off? I'm sure you can solve the synchrony & compute resource-allocation issues, Mechnomancer. My own plan is to have a smol mesh of 4 SBCs onboard the robowaifu to distribute compute loads. >faceplant Ahh. I noticed her eyebrows/lids were a little off before. I hope nothing structural needs replacement. Regardless, dear SPUD is coming right along! Nice work Anon, keep it up. Cheers. :^)
Edited last time by Chobitsu on 10/13/2025 (Mon) 06:07:09.
>>42241 >The timings seem a bit off? Yeah, I threw it together in less than an hour. I'll solve the synchrony issue by only having the image window be updated after the AI does its shenanigans. I talked with a puppet maker who was taught by Tom McLaughlin (of Dark Crystal/Jim Henson fame) and listened to his panel. He does traditional cloth work (I wish I got a pic of his "Beast" for a children's production of Beauty and the Beast) and works with silicone by creating a foam shell (furniture or eva foam) then spreading/smoothing silicone (aka bathroom caulking) over it. It has given me some ideas how to do a "rubber" face that would be easy and (hopefully) not creepy. Of course I'd have to get my 3d printer back running again so I could make the servo frame for it. In the meantime I could probably do some tests to try adding certain curves to SPUD's 2d cosplay.
>>42252 Ahh, makes sense. I hope it turns out being just that simple! :^) >It has given me some ideas how to do a "rubber" face that would be easy and (hopefully) not creepy. Neat! Good luck with this effort Mechnomancer. Looking forward to seeing the results. Cheers. :^)
Edited last time by Chobitsu on 10/15/2025 (Wed) 18:10:31.
Open file (1.43 MB 1025x763 motor control.png)
I've been working on a closed-loop motor control system (pic rel). An Arduino acts as an analog reader sending data over the serial bus (with a servo breakout board acting as a breakout board for motor encoders, aka potentiometers), while an MCP23017 (red glowing jellybean) sends out digital signals via I2C which get amplified by the ULN2003 (purple guy). The Arduino isn't doing digital outputs because I discovered on other projects it can have power issues if you drive too many digital outputs, and I'm not using I2C analog readers because they didn't sample very fast (like 2 samples per second; could be user error, but why bother when I already have something that works?). So is it SPUD-related? Not really. Or at least, not immediately. This allows the raspi to control motors via the MCP23017 (sending digital outputs to a separate motor controller), read real joint position, and act accordingly. This opens the doors for controlling many other types of motors, from smol hobby motors to 3500lb winches to screw-drive linear actuators, which is good because screw-drive linear actuators only use power when moving. They tend to move slow but can easily be pushed beyond their voltage rating. So maybe SPUD will have linear-actuator-based legs. Or I might just plop her on a big reciprocating spooder base. https://www.youtube.com/watch?v=fTtXNjmahzE
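The closed-loop idea above (read the pot, drive the motor until the joint lands inside a deadband) boils down to a small control tick. A sketch, with the hardware I/O stubbed out: `read_position` and `set_motor` stand in for the Arduino serial read and the MCP23017 pin writes, which are not shown here:

```python
def drive_direction(current, target, deadband=5):
    """Return +1 (extend), -1 (retract) or 0 (hold) for one joint,
    based on the raw encoder/potentiometer reading."""
    error = target - current
    if abs(error) <= deadband:
        return 0
    return 1 if error > 0 else -1

def step_joint(read_position, set_motor, target, deadband=5):
    """One control-loop tick: read the joint, decide, drive.
    set_motor(+1/-1/0) would raise/lower direction pins on the
    MCP23017 over I2C. Returns True once the joint has settled."""
    direction = drive_direction(read_position(), target, deadband)
    set_motor(direction)
    return direction == 0
```

Calling `step_joint` in a loop per joint is enough for slow actuators like screw-drive linears; faster motors would want a proper PID instead of bang-bang control.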
>>42385 THIS LOOKS AMAZING. I'm really glad to see your research in this area, Mechnomancer. If we can devise sophisticated motor control schemas using cheap COTS parts, that will strongly solidify these efforts as every'non DIY projects. I hope you in fact create linear actuators for dear SPUD's legs. We need more research in this area tbh. Looking forward to your progress, Anon! Cheers. :^)
You should make a github
>>42424 >github Eventually. I only wanna release properly documented stuff cuz I've encountered many a repository that isn't properly documented D:< I've gotten better at doing documentation during development (makes it easier to find & fix stuff) so that's more of a possibility. I also discovered a faster way to do graphics on the raspberry pi, so I'll be re-building Pringle to be even more optimized. Along with faster render times, I plan to put dependency installation right in the code so it is just run-and-done, plus a few other features.
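"Dependency installation right in the code" is commonly done with a bootstrap like the following (one possible pattern, not necessarily what's planned for Pringle): check what's importable, pip-install the rest on first launch.

```python
import importlib.util
import subprocess
import sys

def missing(packages):
    """Return the subset of packages that aren't importable yet."""
    return [p for p in packages if importlib.util.find_spec(p) is None]

def ensure_installed(packages):
    """Install anything missing via pip, so first run is 'run and done'."""
    need = missing(packages)
    if need:
        subprocess.check_call([sys.executable, "-m", "pip", "install", *need])

# ensure_installed(["pygame", "pyserial"])  # example package list, assumed
```

The downside worth noting: the install step needs network access on first run, so a fallback message for offline users is a good idea.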
>>42455 >I also discovered a faster way to do graphics on the raspberry pi, so that means I'll be re-building Pringle to be even more optimized. Among faster render times I plan to put dependency installation right in the code so it is just run and done with a few other features. Excellent! Looking forward to this, Mechnomancer.
Open file (361.72 KB 682x384 new pringle.mp4)
Open file (30.23 KB 406x320 emotions.png)
A faster way of rendering graphics for raspberry pi is using Pygame. So I've been working on an even more basic SPUD PR model: it simply takes expressions from a sprite sheet via keywords. No lipsynching makes it easier to work with. I figure something simple and lightweight would be good to add to any project if you decide it should have a face. Could probably also use it with AI if you scan the AI response for the keywords.
Features:
- Custom sprite size (must be square tho)
- Fullscreen, and scales up to your monitor resolution
- Keywords determined via external txt file (left to right)
- Gives a warning if your spritesheet isn't perfectly proportioned
- Manual input to select faces (won't crash if you improperly reference one)
Things I need to add before release:
- dropbox.txt for expressions instead of an input prompt
- Setting custom sprite size without going into the code (another external .txt)
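The bookkeeping behind a sprite-sheet face like this is small enough to sketch in plain Python (a reconstruction of the idea, not the actual Pringle code; with pygame you'd hand `sprite_rect`'s result to `sheet.subsurface(rect)` and blit that):

```python
def load_keywords(text):
    """Keywords come from an external txt file, ordered left-to-right
    to match the sprite sheet's columns."""
    return text.split()

def sprite_rect(index, sprite_size, sheet_width):
    """(x, y, w, h) of square sprite `index` on the sheet."""
    cols = sheet_width // sprite_size
    x = (index % cols) * sprite_size
    y = (index // cols) * sprite_size
    return (x, y, sprite_size, sprite_size)

def face_for(word, keywords, default=0):
    """Pick a face index by keyword; fall back to a default face
    instead of crashing on an improper reference."""
    word = word.strip().lower()
    return keywords.index(word) if word in keywords else default
```

Scanning an AI response for faces is then just `face_for(w, keywords)` over the response's words, taking the last (or first) hit.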
>>42942 Excellent! There's a reason vidya settled around this paradigm: fast, efficient, and easily-reconfigured. Nice work, Mechnomancer. BTW, I'll presume this is primarily for dear Pringle instead of dear SPUD?
>>42945 Ah yes. It is indeed for Pringle. Forgot to mention that. I got the idea today and threw it together in like an hour. Also got in a shipment of materials to make some body covering for SPUD, which could possibly be repurposed into a minimum viable waifu. But I got some outdoor things to do before I get snowed in for real (got a blizzard and it all melted, heh) and can focus all my energies on SPUDliness. For the custom sprite size I might just whip up a configuration file protocol for python that looks for things like "sprite_size = 52" in any order in the txt file, and if a config isn't found uses the defaults in the script and gives a console message to the user. It would make modularity easier to implement in future.
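That config-file protocol could look something like this (an illustrative sketch; the default values and key names are made up, not Pringle's real settings):

```python
DEFAULTS = {"sprite_size": 64, "fullscreen": 1}   # script-side defaults (assumed values)

def load_config(text, defaults=DEFAULTS):
    """Look for lines like 'sprite_size = 52' in any order; anything
    missing or malformed falls back to the script's defaults, with a
    console message so the user knows."""
    config = dict(defaults)
    for line in text.splitlines():
        if "=" not in line or line.lstrip().startswith("#"):
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if key in config:
            try:
                config[key] = int(value)
            except ValueError:
                print(f"config: bad value for {key}, using default {config[key]}")
    return config
```

Because unknown keys are simply ignored, old config files keep working as new options get added, which is exactly the modularity benefit described.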
>>42952 Excellent. Keep the stove hot! Cheers, Mechnomancer. :^)
>>42942 Nice! I like all the different faces, nice work!
>>42960 I yoinked them off an image search lol. Using pygame seems to run much faster for basic image manipulation, so hopefully this will serve as a good proof-of-concept for replacing SPUD's indicator graphics.
Open file (1.71 MB 1005x1371 pringle deployment.png)
Pringle is shocked that a status LED is shining through the 2nd curved OLED screen I have. I'm probably gonna make her a papercraft mecha musume helmet.
>>43058 Nice!
>>43058 >Pringle is shocked a status LED is shining through the 2nd curved OLED screen I have. <"It's a feature, not a bug!"' :D >I'm probably gonna make her a papercraft mecha musume helmet. <"No one cared who I was until I put on the mask." --- Very cool. This looks pretty sweet, Anon. I like where you're going with dear Pringle's new design motif. Please keep us all up to date on her progress! Cheers. :^)
Edited last time by Chobitsu on 11/27/2025 (Thu) 01:56:41.
Open file (11.69 MB 480x658 SPUD PR v1 helmet.mp4)
>>43062 >Please keep us all up to date on her progress! Tada! Some slight alignment issues but that is a graphics problem not a code problem.
>>43068 VERY COOL!! That curved screen really gives some dimensionality to her face. Really looking forward to where you go with this, Mechnomancer. Cheers. :^)
Open file (1.38 MB 1029x1077 SPUD neko ears.png)
Hastily gave pringle some ears so she looks a bit less gundam and more Samurai Pizza cat :D
>>43086 LOL. A cute! :D
>>43086 Y'all are really out here trying to fuck mouse-cosplay BMO instead of talking to any woman ever, smh
BANNED TO THE BASEMENT DUNGEON!111
>>43090 Oh hi obvious troll. To answer your question, yes, and? @Chobitsu obvious troll
Open file (72.54 KB 960x1041 chadpepe.jpg)
>>43090 TFW the cute college chicks ogling me during the numerous exhibitions of my robits. I must science before I procreate. I might make a medabot-size walking robit body for our Samurai Pizza Cat. Erm, Samurai Pizza Mouse. Depends how bored I get and how severely I get snowed in this year.
>>43093 >I might make a medabot-size walking robit body for our Samurai Pizza Cat. Neat! That's about the size I envisioned for the initial prototype of dear Sumomo-chan (a bit smol'r, actually... ~45cm or so). >Medarot Heh, haven't watched that in years. I'll be doing so a bit during the holidays. Cheers, Mechnomancer. :^)
I have an idea: make a bunch of reaction image faces Shocked, smug, happy, etc If you male those, I (and hopefully others) will use them online.
>>43192 >reaction images Yeah I've been doing that occasionally in other online communities lol Been mucking around with SPUD's code in my free time and: A) figured out how to segment the LLM's response into roughly 7-word chunks to feed to the TTS. LLMs usually go by tokens, and sometimes a token contains a fraction of a word, so the phrase "I ate an apple!" might be the tokens "I a", "te an", " app", "le!". A little tricky, mostly counting spaces and splitting tokens up. Just couldn't be bothered to figure it out until now. B) sequential speech. Before, SPUD would speak as the response was generated by the LLM, so any backlog would be ignored. But now everything goes in a nice tidy list. C) image/object recognition AI via voice command. Apparently she sees what looks like a computer monitor in an environment that looks like a cockpit :D Theoretically, there should now be only a brief pause before SPUD goes into a nearly contiguous response; a very slight pause indicates a gap between the 7-word chunks, but it isn't noticeable unless you listen closely for it. I just can't record proof thereof because all my bluetooth speakers are out of battery ^_^; I also have an idea for the construction of a rubbery face (and perhaps even a body someday) using a combo of some techniques I learned from a puppetmaker at the last convention I exhibited at and my own ideas. But that will probably wait until the new year. And maybe a return to LCD eyes, I don't know. Have to experiment.
>>43243 Interesting! You're making good progress with conversational AI. It definitely has applications beyond SPUD
>>43243 WOOT!! This is exciting, Mechnomancer. If you don't mind, I'll look into this for the dear HoloWaifu project?
Open file (2.09 MB 498x212 real-steel-max.gif)
>>43260 Well, the optimizations consist of relatively simple variable juggling, so not too hard for someone to implement themselves in python (or in another programming language). So feel free. To go into detail: as each token is generated, check for space characters and count them. If the count is 7 and there is a space in the current token, you split it along the space, attach the first half to your completed token chunk and send that to the TTS, then you set the second half to the variable where the freshly generated tokens are added. The updated lip-synch program works by checking a text file for a list of .wavs and keeping track of where it is in the list. After each file is played (which consists of playing the .wav file audibly then going thru all the wav file frames, checking amplitude and moving the mouth accordingly) it goes down the list. The list is only reset if instead of a filename there is the keyword "fin"; otherwise it will wait until another filename is added to the list. The file_list.txt would look like "llmresponse_0.wav|llmresponse_1.wav|llmresponse_2.wav|fin". Thankfully playing a .wav doesn't take up much processing power. Interestingly, if the LLM hasn't been used in a while (or is first starting up) the first response can take a while. Probably just loading it in/out of ram, but it adds a bit of character: "oh the robit is daydreaming lol". For object/image recognition I use an Ultralytics YOLO model, I forget which. https://www.ultralytics.com/ Next thing I might do is check out how fast I can get a pose-estimation model running. Would be cute to get SPUD to mimic my arm movements. Gifrel https://youtu.be/AAkfToU3nAc?t=185
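The space-counting split described above can be sketched as follows (a reconstruction for illustration, not the actual SPUD code; function names are mine, and here the split happens at the last space rather than a particular one):

```python
def feed_token(buffer, token, chunks, max_spaces=7):
    """Accumulate streamed LLM tokens. Once the buffer holds
    `max_spaces` spaces and the current token contained a space,
    split at the last space: the head becomes a finished chunk for
    the TTS, the tail seeds the next buffer."""
    buffer += token
    if buffer.count(" ") >= max_spaces and " " in token:
        head, _, tail = buffer.rpartition(" ")
        chunks.append(head)          # hand `head` to the TTS here
        buffer = tail
    return buffer

def chunk_stream(tokens, max_spaces=7):
    """Run a whole token stream through feed_token, flushing the tail."""
    chunks, buffer = [], ""
    for tok in tokens:
        buffer = feed_token(buffer, tok, chunks, max_spaces)
    if buffer.strip():
        chunks.append(buffer)
    return chunks
```

With the apple example and a 2-space limit, `["I a", "te an", " app", "le!"]` comes out as `["I ate", "an apple!"]`: words survive token boundaries intact.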
>>43273 Thanks for the details, Mechnomancer! >token parsing This seems straightforward enough. >list of .vav files I must've missed how these are being selected for? >...checking amplitude and moving the mouth accordingly I'm presuming you have a hand-curated mapping of phonemes to visemes? >Thankfully playing a .wav doesn't take up much processing power. There's really no decoding required. .wav's are .bmp's of audio! :D >pose-estimation model running That is a very powerful capability that has uses across a wide array of robowaifu topics! Good luck, Anon. Cheers. :^)
>>43283 >[wav files] I must've missed how these are being selected for? The LLM program writes the name of the TTS-generated file to a txt file in the form of a list. The lip-synch program reads from the .txt file and keeps track of which one it is on. If there is no proper .wav file (e.g. a filename called "fin"... or maybe it is "end of line") it simply waits around until a proper file is written to the current slot. >I'm presuming you have a hand-curated mapping of phonemes to visemes? No, just a simple mouthflap: if the .wav is above a certain volume, open the mouth, else close it. But it probably wouldn't be too hard to export the string and have the lipsynch program scan for them.
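The playlist walk and the volume-gated mouthflap are both tiny pieces of logic; a sketch of each (illustrative only, with the audio playback and servo writes stubbed out; the RMS threshold is an assumed value for 16-bit samples):

```python
def next_wav(playlist, cursor):
    """Walk the '|'-separated file list. 'fin' resets the list; a slot
    that doesn't exist yet means wait for the LLM/TTS to catch up.
    Returns (filename_or_None, new_cursor)."""
    entries = playlist.split("|")
    if cursor >= len(entries):
        return None, cursor          # nothing new written yet: wait
    name = entries[cursor]
    if name == "fin":
        return None, 0               # end of response: reset
    return name, cursor + 1

def mouth_open(samples, threshold=1000):
    """Simple mouthflap: open the jaw while the frame is loud enough,
    judged by the RMS amplitude of the frame's samples."""
    if not samples:
        return False
    rms = (sum(s * s for s in samples) / len(samples)) ** 0.5
    return rms > threshold
```

In the real loop you'd read a frame's worth of samples from the `wave` module while the file plays and drive the servo from `mouth_open`'s result.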
>>43286 >The LLM program writes the name of the tts generated file to a txt file in the form of a list. Ahh, got it. I suppose you specify that behavior within the prompts themselves? >No just a simple mouthflap Oh duh! I knew that. :P I've been thinking about full-blown animations recently. >But it probably wouldn't be too hard to export the string and have the lipsynch program scan for them. Yah, I think you're right. --- Anyway, excited to watch what you do this Winter for dears SPUD & Samurai Pizza Cat Pringle! Cheers, Mechnomancer. :^)
Edited last time by Chobitsu on 12/13/2025 (Sat) 02:48:03.
Open file (31.55 KB 668x892 simple python AI.png)
>>43290 >Ahh, got it. I suppose you specify that behavior within the prompts themselves? Nope. I'm using the ollama library for python, which means the LLM outputs to a simple string. Pic rel. It's *that easy* to set up a python waifu. Sure, it's for one-shot responses, but basic coding could get you a nice text-loop easy. (https://github.com/ollama/ollama-python) My code is more complex since I'm doing things like TTS, voice recognition, a separate lip-synch process and pre-scripted responses, but that is the general idea. SPUD's streamlined tts code was 99% working but I accidentally broke it last night. Fixed it today, tho. She now has the vocal cadence of William Shatner. XD I'll record a vid at some point. She knows about gundam, the founding fathers, tony stark (but not spooderman). Unfortunately her paperface hasn't done very well since she face-planted at that convention. Good excuse to look into that new face idea after Christmas. Funny enough it may involve our old friend Mr Fishlips.
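For anons following along, the "pic rel" one-shot setup looks roughly like this with ollama-python (a sketch under assumptions: the model name is an example, the persona string is made up, and the import is guarded since the library plus a running ollama server are required for the actual call):

```python
try:
    import ollama                    # pip install ollama; needs the ollama server running
except ImportError:
    ollama = None                    # message-building below still works without it

def build_messages(persona, history, user_text):
    """Assemble the chat list the ollama library expects."""
    msgs = [{"role": "system", "content": persona}]
    msgs += history                  # prior {"role": ..., "content": ...} dicts
    msgs.append({"role": "user", "content": user_text})
    return msgs

def ask(persona, history, user_text, model="llama3.2:1b"):  # model name is an example
    """One-shot response as a plain string; wrap in a while-loop
    for a basic text waifu."""
    reply = ollama.chat(model=model, messages=build_messages(persona, history, user_text))
    return reply["message"]["content"]
```

Appending each exchange back into `history` before the next `ask` call is all it takes to turn the one-shot into a conversation.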
>>43291 Oh cool! That does sound pretty easy. >She now has the vocal cadence of William Shatner. XD LMAO >"Scotty. I've. got. to. HEAR THIS!! Engines to full!" >Unfortunately her paperface hasn't done very well since she face-planted at that convention. Good excuse to look into that new face idea after Christmas. Oh that's right! Yes, probably good timing. >Funny enough it may involve our old friend Mr Fishlips. Somehow this has slipped my mind. :P --- Thanks for keeping us all up to date with your progresses, Mechnomancer! Cheers. :^)
>>43292 SPUD with the LLM responding sluggishly/sleepily (takes a while for the LLM to load into memory?) vs a bit more awake. Her face slid down her skull a bit. Either way gonna be looking into an even better one. I think the latency is pretty good for an itty bitty raspberry pi that isn't overclocked running an LLM and somewhat decent TTS (better than Espeak anyway).
>>43360 Neat! I'm glad that she's back together, and that you have an improved face in the works. >response latency While as a technologist I certainly agree it's good performance for the hardware (and in fact remarkable that it even works at all so well)... I hope we can figure out ways to improve the performance further still. Significantly-so. Adult laymen generally have little to no comprehension of the underlying complexities involved in such a feat, and are unlikely to display much patience. Not to mention children's impatience! :D My own "target audience" is Anons (and therefore grown men), who are much more likely to understand the problem of doing this -- and may in fact be nerds themselves. But even at that, I think we'd all like smooth-flowing TTS/STT/LLM integrations. >tl;dr THIS IS A BIG CHALLENGE! :^) --- You've done well here, Mechnomancer. I'm personally planning on looking into Gerganov's Whisper when it's time for me to tackle such an effort. * It's already focused on optimized performance on smol hardware. Additionally, given the yuge number of contributors that have stepped forward, I don't doubt that wrappers for Python and several other languages likely exist for his entire swath of research work, including his spin of Whisper. --- * https://huggingface.co/ggerganov/whisper.cpp https://github.com/ggml-org/whisper.cpp https://ggerganov.com/
Edited last time by Chobitsu on 12/16/2025 (Tue) 05:14:11.
>>43376 >the bottleneck? The LLM is the main bottleneck. I have a few ideas to do some stuff like "ums" and "ahs" to fill the silence. So when SPUD is halfway thru generating a sentence she will go "um", "ah", "erm", etc. Also could do something similar while waiting for the text to initially be generated. Since the speech is sequential now I can do such things without worrying about overlapping/negating the text being generated.
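The filler-word trick above is essentially a stall timer; one way it might look (an illustrative sketch, not the actual SPUD code; the filler list and timing threshold are assumptions):

```python
import random

FILLERS = ["um", "ah", "erm", "hmm"]   # example filler set

def filler_if_stalled(seconds_since_last_chunk, spoke_filler_recently,
                      stall=1.5, rng=random):
    """While the LLM is mid-generation, break long silences with a
    filler word. The spoke_filler_recently flag suppresses repeats so
    she doesn't chain 'um um um'. Returns a word or None."""
    if seconds_since_last_chunk >= stall and not spoke_filler_recently:
        return rng.choice(FILLERS)
    return None
```

Since speech is now sequential, the returned word can just be queued like any other TTS chunk without overlapping the real response.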
>>43380 >The LLM is the main bottleneck. Yeah, kind of as-expected. I hope we can all have a general breakthrough in this area. Cheers, Anon. :^)
>>43405 I suspect further LLM optimization will come as they develop. After all, a few years ago nobody probably even imagined running an LLM on a raspberry pi! I'm also not using the smallest one I possibly can XD Meanwhile I tried to get SPUD to sing a Christmas song, but she was rather reluctant to. When I finally did get her to sing something there apparently is a good reason she doesn't like to sing: she is an absolute goblin. This showcases some of the "ums" and "ers" I put in while text is in the middle of generating. There are still some substantial pauses that could do with some filling via time-based deployments of "um" and "erm". But that would be at the mercy of the rate of token generation. Idk.
>>43409 A CUTE! >LLM pauses I'm sure we'll see improvements! Cheers, Mechnomancer.
Open file (4.85 MB 2458x2560 teslabot photodump 1.png)
While on Christmas "vacation" I got some ideas n stuff. I am tempted to return to LCD screens; however, I recently discovered that using graphics libraries (even high-performance ones like pygame) causes the LLM to slow to a crawl. So I have a big-brain solution (literally). Use 2 pis. The brain has 2 hemispheres, so why not SPUD? The "left" hemisphere (I am biased because I am right handed lol) will handle the speech detection, high-end vision recognition and LLM, while the "right" will handle the lower-end functions such as displaying emotion, moving servos, etc. In biology the gap between the hemispheres is bridged by a structure called the corpus callosum. For SPUD, this would be the UART port on the raspberry pi. It's quite easy to send strings back and forth between 2 pis via python. However, I'd have to set up ANOTHER program to run alongside everything to put the strings into various .txt files. I should also experiment to see if there are any smaller models I could run. Sure, SPUD might end up even derpier, but it would be nice for her to respond faster than a socially awkward sperg. After all, if a bimbo is pretty it doesn't really matter what she says lol. Oh, I also stopped by a Tesla store since I heard they got Teslabots on display. I saw the thing and got idearhea. That's like having so many ideas you compulsively have to write them down (idea diarrhea). I don't have it very often, but when I do it is awful cuz 99% of the time I'm away from pencil and paper. Got some up-close pics and chatted with the dudes behind the counter. Sure, they gave the spiel about how Teslabots would be available in the next year and asked if I would buy one, so I pivoted to my DIY attitude and showed them some pics of my robo-family. Their eyes shot out of their sockets and they suggested I apply to Tesla to work on the robot lmao. But I got an art degree so unless I really BS the resume...
¯\_(ツ)_/¯ A few things about Teslabutt here: A) The legs appear to only have 3 actuators installed: 2 ankles, 1 knee; there is no thigh actuator! Maybe it was hidden really deep in the thigh or they just didn't install it for the demo model. Someone did leave a zip tie on one of the wrist actuators tho :D 2) It is completely static. Like, no movement. Can't even talk to the thing. So I got one up on the Musk man :P Yellow) Pretty sure I have enough spare motors lying around to build my own teslabot-style legs. But I'm still in-progress when it comes to highspeed control of acme-screw linear actuators. Once I get that deployed I might give some teslabot legs a try. Would be a mix of them acme-screw actuators I love so much, the 80kg servos I mentioned back in August (>>40272) and the ASMB-04s. Although I'll be mighty tempted to put in some foot sensors. Poor Teslabutt seems to be lacking them (no wires or cables going around his ankle balljoint). If I could pull it off I'd love to have SPUD walk into a Tesla store and say she wants a Teslabot as a Husbando :D
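The "corpus callosum" UART link between the two hemispheres could use a simple newline-delimited framing like this (a sketch only: pyserial would carry the bytes, and the port name, baud rate, and message kinds shown are assumptions):

```python
def encode_msg(kind, payload):
    """Frame a message for the UART: e.g. b'emote|happy\n'."""
    return f"{kind}|{payload}\n".encode("utf-8")

def decode_msg(line):
    """Inverse of encode_msg; returns (kind, payload)."""
    kind, _, payload = line.decode("utf-8").rstrip("\n").partition("|")
    return kind, payload

# On each hemisphere, something like (pyserial, not run here):
#   import serial
#   port = serial.Serial("/dev/serial0", 115200, timeout=0.1)
#   port.write(encode_msg("emote", "happy"))      # left -> right: show a face
#   line = port.readline()
#   if line:
#       kind, payload = decode_msg(line)          # dispatch on kind
```

Keeping the payload a plain string means the same frames can also be appended to the various .txt files the existing programs already poll.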
ChatGPT seems to be getting better at producing images. SPUD is looking delightfully animu. Just gotta work on making the reality match the images >:V
