/robowaifu/ - DIY Robot Wives

Advancing robotics to a point where anime catgrill meidos in tiny miniskirts are a reality!






Visual Waifus Robowaifu Technician 09/15/2019 (Sun) 06:40:42 No.240
Thoughts on waifus which remain 2D but have their own dedicated hardware. This is more on the artistry side, though AI is still involved. An example of an actual waifu product is the Gatebox.
gatebox.ai/sp/

My favorite example is Ritsu, a cute AI from Assassination Classroom whose body is a giant screen on wheels.
>>43038
Well, if we have a stated goal, then we must make the egg first: a platform to make desired content on.
>I intend to create some basic facial animation work as "proof of concept" *, but I'm going to need a headset of some fashion for prototyping the product effort.
Exactly.
>>43035
>what we actually need is content
Undoubtedly. But does this modular approach to the headset, where the smartphone is basically a Nintendo Switch that can operate either docked or undocked, mean that what we're creating here is basically just a smartphone app? Or is that an overly reductionist way of looking at it?
>>43042 For now, yes
Please talk me out of buying these XReal Ones, bros : ( >>42928 )! :D
They're US$400 right now, and look remarkably-suited to our HoloWaifu project (as the 'display-only' portion of the problemspace).
* What's wrong with them that I'm not seeing yet?
* Why is this grossly-overpriced for Anons?
* Why would <<other_product>> be a much-better choice rn?
---
Here's a general user's review:
https://www.youtube.com/watch?v=3duYMt020_M
A rather more-technical review with nice tight closeups of the device (also evaluates the more-expensive Pro version in comparison):
https://www.youtube.com/watch?v=9TnBpCnX31c
<--->
PLS SEND HALP before I do it soon-ish! *
---
* I was already planning to make some kind of investment purchase towards this project sometime around this upcoming /robowaifu/ birthday weekend period : ( >>1591 ) [so, a decision before next Monday when the improved pricing ends].
Edited last time by Chobitsu on 11/26/2025 (Wed) 13:19:21.
>>43050 $400 is a very good price, but how secure are they? Glasses that can see everything you can are a huge security liability if they can't be completely locked down. That said, if they are secure, then they'd be great for all of us.
>>43050
- I've had some difficulty finding a gyroscopic sensor that can calculate the z axis (vertical). Theoretically the GY-521 can do it somewhat, but there is drift because it cannot find a reference point. ChatGPT gets grumpy when you try workarounds, so maybe I'll do some IRL experimenting to prove the grand oracle wrong lol.
- Supposedly small, hi-resolution screens are expensive. And if you're not doing a Pepper's ghost it has to be transparent as well. But I just got a small 720p projector for like $20 at a discount store, so I dunno about that. I might take it apart to see exactly how they do it.
I have certainly been tempted to make my own DIY VR headset but haven't done much beyond icon tracking. Maybe some GY-521 silliness might hold the key.
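For anyone who wants to see the z-axis drift firsthand, here's a minimal Python sketch that integrates the GY-521/MPU-6050 gyro z-rate into a yaw angle (assumes the sensor sits on a Raspberry Pi's I2C bus 1 at the default 0x68 address and the smbus2 library is installed). With no absolute heading reference, the integrated angle wanders even when the board sits perfectly still:

import time
from smbus2 import SMBus

# MPU-6050 register addresses (from the datasheet)
ADDR = 0x68
PWR_MGMT_1 = 0x6B
GYRO_ZOUT_H = 0x47

def read_gyro_z(bus):
    # Read the 16-bit signed z-axis rate and convert to deg/s
    hi = bus.read_byte_data(ADDR, GYRO_ZOUT_H)
    lo = bus.read_byte_data(ADDR, GYRO_ZOUT_H + 1)
    raw = (hi << 8) | lo
    if raw > 32767:
        raw -= 65536
    return raw / 131.0          # 131 LSB per deg/s at the default +/-250 dps range

with SMBus(1) as bus:
    bus.write_byte_data(ADDR, PWR_MGMT_1, 0)    # wake the sensor up
    yaw, last = 0.0, time.time()
    while True:
        now = time.time()
        yaw += read_gyro_z(bus) * (now - last)  # integrate rate -> angle; bias accumulates
        last = now
        print(f"yaw ~ {yaw:7.2f} deg (watch it wander even when still)")
        time.sleep(0.05)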
>>43051
>$400 is a very good price, but how secure are they?
>Glasses that can see everything you can are a huge security liability if they can't be completely locked down.
>That said, if they are secure, then they'd be great for all of us.
As-is, they are very secure since they are output-only devices. You can simply think of them as 1080p monitors you can wear on your face! :D (And that you can also see through like basic sunglasses wherever there's no 'screen' being displayed.)

That said, the entire goal of this project overall is to provide two-way video, so we can track ArUco & QR markers, [1] and overlay Anon's waifu somewhere nearby to that marker (as 'Augmented Reality' [AR]). These can't do that quite yet, but still provide enough basics to go on with for me to begin working on the software side of the project. During this interim period, we can just 'lock' the display window into place wherever we want the waifu to be situated & displayed, say, floating on your computer desk.

So the
>tl;dr
here is that security isn't an issue to begin with, but as the solution gets more sophisticated with time, then it will need to be addressed. (Simply firewalling off any network access for starters.) Does that answer your questions, Greentext anon? AMA if you need further info.

>>43052
>Maybe some GY-521 silliness might hold the key.
Heh, I think I have one of those lying around somewhere. Good luck and please keep us up to date with your progress on this, Mechnomancer! :^)
>Supposedly small, hi-resolution screens are expensive.
These use hi-quality Sony micro-OLEDs as the source projectors, 1 per eye. *
>I've had some difficulty finding a gyroscopic sensor that can calculate the z axis (vertical).
The basic XR 1's have 3 DoF tracking via onboard electronics. If you connect the optional US$100 'Eye' addon (a monocular camera situated right in the middle) then you can switch the unit to the 6 DoF mode [2] and truly lock the virtual screen into place within your IRL environment. This is the mode we would be aiming for with this project, whether via XR 1's or something else. That way the HoloWaifu properly stays put in-place while you move around. Make sense, Anon?
---
1. https://docs.opencv.org/4.x/d5/dae/tutorial_aruco_detection.html
2. The system augments the glasses' electronics with SLAM via the monocular camera to achieve this.
https://www.uploadvr.com/xreal-one-glasses-become-6dof-with-xreal-eye-camera-xreal-beam-pro/
https://en.wikipedia.org/wiki/Simultaneous_localization_and_mapping
* See spec details on the page linked : ( >>42928 ).
Edited last time by Chobitsu on 11/26/2025 (Wed) 18:00:45.
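For reference, a minimal Python/OpenCV sketch of the ArUco tracking step described above (assumes OpenCV 4.7+ where the ArucoDetector class is available, plus a webcam at index 0; older versions use cv2.aruco.detectMarkers() instead):

import cv2

# Set up an ArUco detector for a small predefined dictionary of markers
dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
params = cv2.aruco.DetectorParameters()
detector = cv2.aruco.ArucoDetector(dictionary, params)

cap = cv2.VideoCapture(0)  # the nosebridge webcam, once it exists
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    corners, ids, _ = detector.detectMarkers(gray)
    if ids is not None:
        # Draw the detected markers; the waifu sprite would later be
        # registered relative to these corners instead.
        cv2.aruco.drawDetectedMarkers(frame, corners, ids)
    cv2.imshow("tracker", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()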
>>43053
$100 could buy quite a bit of hardware. Supposedly adding a compass could make the z axis easier (and appease GPT the oracle), but I could never get a proper reading with a compass sensor.
A VR setup I'm thinking of would probably be a Raspberry Pi to do the GY-521 silliness and send that data over LAN, and use a wireless HDMI (also LAN) for video, so it would be simply plugging it into your HDMI port and having a companion Python script to receive sockets and turn them into game input.
I'm thinking about making my own version of Steel Battalion (mixed with Darling in the Franxx), and a DIY VR headset might be a good gimmick in addition to/instead of a funky controller.
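A rough sketch of that Pi-to-PC link in Python: the Pi streams orientation as small UDP/JSON packets, and a companion script on the PC receives them for the game-input layer. The PC address, port, and read_yaw_pitch_roll() are placeholders, not working sensor code:

import json
import socket
import time

PC_ADDR = ("192.168.1.50", 5005)     # hypothetical address of the gaming PC

def read_yaw_pitch_roll():
    return 0.0, 0.0, 0.0             # stand-in for the real GY-521 code

def pi_sender():
    # Run this on the Pi: stream head orientation at ~100 Hz
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    while True:
        yaw, pitch, roll = read_yaw_pitch_roll()
        packet = json.dumps({"yaw": yaw, "pitch": pitch, "roll": roll})
        sock.sendto(packet.encode(), PC_ADDR)
        time.sleep(0.01)

def pc_receiver():
    # Run this on the PC: parse packets and hand them to whatever
    # injects game input (vgamepad, pyautogui, etc.)
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("", PC_ADDR[1]))
    while True:
        data, _ = sock.recvfrom(1024)
        pose = json.loads(data)
        print(pose)   # replace with the game-input layer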
>>43054
>$100 could buy quite a bit of hardware.
Fair point. I'd be trading expediency for gold (a common tradeoff, AFAICT! :D). Also, the reliability that supposedly comes with a proper commercial product. *
>I'm thinking about making my own version of Steel Battalion (mixed with Darling in the Franxx) and a DIY VR headset might be a good gimmick in addition to/instead of a funky controller.
The Iron Man LARP'rs system seems to be very-sophisticated as a DIY VR headset! ( >>42822 ). He claims it supports 9 DoF (incl. magnetic compass).
---
* This is the 3rd or 4th generation of these things by this company, so it's not just a 'johnny-come-lately' thing.
Edited last time by Chobitsu on 11/26/2025 (Wed) 18:12:27.
>>43055
>The Iron Man LARP'rs system
Jesus! How can people ramble about such a simple thing for 30 minutes?
>>43053
>output only
Huh, I always assumed that cameras were necessary for AR (the image tracking and intelligent placement being the "augmented" part). That is good for security, of course, but it does seem a bit pricy as just a monitor. Still, it's not a bad starting point for getting a sense of how everything should look.
>>43009 >>43010
Okay, HoloAnon, thanks for your patience. I can finally take some time today to discuss your project ideas a little more. Just off the top of my head two things stand out:
1. I think it's a feasible idea to overlay a waifu over many possible physical artifacts, including a lever-driven, actuated snuggly doll.
2. This is a primarily SFW board, given our engineering focus (cf. >>3 ). As an accommodation, we have an ecchi "containment thread" of sorts ( >>419 ). Let's keep any sexually-oriented discussions constrained to that thread please. TIA.
<--->
I really like the way you've broken the problem down into elements, Anon.
>Control system
>Behavioral logic
>Rendering
I would add the obvious
>Physical puppet & control
into the mix.
---
>Up until now this holowaifu project has relied on the principle of summoning the waifu when a suitable tracking marker such as a QR code is detected.
Actually, we're not constrained by the visibility of markers at all. We can generate as many waifus & other virtual artifacts as the system in question has capacity for. I just wanted to point that out.
>But there's no reason to limit this only to the waifu. You could introduce multiple markers that represent different AR objects.
Yes, absolutely! Basically just like a game, you can create a practically infinite variety of virtual items.
>The second is using hand gestures. The reason I picked these ideas as opposed to something else like physical buttons or head movement detection is because they operate on the same fundamental principle as the waifu herself does, the principle of AI image recognition.
Yes. While I'll be starting out with a special 'pointer' for my initial development system, eventually we'll be able to perform pose analysis on the Anon operator, I predict. This will include hand gesturing for control.
>although I might like to incorporate a voice command system at some point simply because it doesn't step on the toes of the other control systems and it creates more of a sense of the waifu being your OS.
We're definitely going to need voice recognition at a relatively-early stage in the affair.
>I think this portion of the program should be written as a state machine, much like the control system likely will be. State machines represent basically any sort of nontrivial programming I might be able to do because they just intuitively make sense to me. But we could have the waifu switch between states according to hand gestures and other forms of interaction that govern her behavior, including interactions with other AR objects you summon through their separate markers.
Please proceed with prototyping that for us, HoloAnon. Even just pseudocode will be an advance atm.
>This is the part that I have the hardest time understanding, which is kind of a problem because under this system the AR visor is controlled through AR means rather than physical buttons. You obviously need to render the waifu (or an AR-based control system) before you can interact with them.
The basics of it are not much different than @Mechnomancer and @GreerTech have already prototyped for us... at least for starters. As the quality goes up, so will the complexity. My own goal is to have something that will put X.ai 's Ani to shame in the end! :^)
>...and wouldn't vanish when you lose sight of a marker; she only needs the marker to instantiate herself.
Again, that's not an issue. You'll probably understand better once I can post some videos of my initial prototype work with the visors.
>...The waypoints are also context-sensitive...
We could devise any kind of game-mechanics-like systems that Anons can dream up (given enough actual developers working hard on the project). But for now, let's just keep it to the basics: a simple cartoon waifu that can stay in place as you move around. Make sense, HoloAnon?
<--->
The ecchi stuff, let's discuss further in the thread mentioned. If you would kindly repost your 2nd part in that thread, then that will get things started for that conversation. Deal?

Anyway, thanks again for this great effort-post, Anon. It's a good start that you already have so many ideas in mind. Cheers & Happy Thanksgiving! :^)
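As a tiny illustration of the "multiple markers represent different AR objects" point above, the mapping can be as simple as a lookup from ArUco marker ID to virtual item. The IDs and sprite filenames here are made up, and ids is whatever the ArUco detector returns:

# Hypothetical marker-ID -> virtual object table
MARKER_OBJECTS = {
    0: "waifu.png",          # the waifu herself
    1: "teacup.png",         # props she can interact with
    2: "control_panel.png",  # an AR-based control surface
}

def objects_in_view(ids):
    """ids: the array returned by the ArUco detector (or None)."""
    if ids is None:
        return []
    return [MARKER_OBJECTS[i] for i in ids.flatten() if i in MARKER_OBJECTS]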
>>43057
Heh, he's enthusiastic about his project! :D

>>43070
>Huh, I always assumed that cameras were necessary for AR (the image tracking and intelligent placement being the "augmented" part).
Yes, that's true. For now, these are about the best compromise of (cost/capability/comfort) that I've managed to find so far. There are full-on AR goggles out there, but they are neither comfortable nor lower-cost. All this is still in its infancy I deem, so much-improved systems should be available to us over the next 2 or 3 years. But we should be able to get started with these, as-is, I think. I'm planning to 3D-print some kind of clip-on for the frames that will support at least one camera (possibly several) to provide the input video to the computer system. I'm sure this will develop over time.
>Still, it's not a bad starting point for getting a sense of how everything should look.
Yeah. And these are probably the single most-comfortable visor ATM to begin with, and seem a fairly-mature design as well. We'll see! :^)
>>43057 >>43073
>Jesus! How can people ramble about such a simple thing for 30 minutes?
He's a YouTuber. They're second only to politicians in their ability to talk as much as possible without actually saying anything.
>>43054
>I'm thinking about making my own version of Steel Battalion
Sounds like a really cool application of this technology. Vehicle games in general (racing, flight, etc.) could see a lot of use with this, but maybe mech games will become more popular with visors.

>>43072
>We can generate as many waifus & other virtual artifacts as the system in question has capacity for.
This is of course true, but I'm thinking with an eye towards immersion here. In Pokemon Go they generate Pokemon over your smartphone camera's field of view, but they don't really feel like part of the environment because they don't interact with it. The reason I want to use markers and possibly a digitized clone of your surroundings is so you can create AR entities that actually feel realistic. Like if you created an AR dog but had a real water bowl in your room, the dog would recognize the bowl and would sometimes take a sip from it unprompted.
>hand gestures and voice recognition
Glad to hear these are plausible. But why do you think voice recognition would need to be present early?
>Please proceed with prototyping that for us
Yeah, I want to do some work on this. I'm not completely sure what I'm doing here, but I'll talk to DeepSeek and see if I can at least come up with some pseudocode.
>As an accommodation, we have an ecchi "containment thread" of sorts
That's fine, the part about the marionette is outside the scope of purely visual waifus, so that does belong in another thread.
>>43080
>This is of course true, but I'm thinking with an eye towards immersion here. In Pokemon Go they generate Pokemon over your smartphone camera's field of view, but they don't really feel like part of the environment because they don't interact with it. The reason I want to use markers and possibly a digitized clone of your surroundings is so you can create AR entities that actually feel realistic. Like if you created an AR dog but had a real water bowl in your room, the dog would recognize the bowl and would sometimes take a sip from it unprompted.
Very interesting. This scenario would require at least 4 degrees of "ping-ponging" back and forth to make work correctly. Given the very-tight time budget constraints on computations for realtime interactivity, it's no wonder it hasn't been solved yet! But who knows, let's see what we can manage here.
>But why do you think voice recognition would need to be present early?
Simply b/c any other form of interaction will quite rapidly become incredibly clunky, in my estimate. We'll see.
>Yeah, I want to do some work on this. I'm not completely sure what I'm doing here, but I'll talk to DeepSeek and see if I can at least come up with some pseudocode.
DOOEET! :D

>>43080
>That's fine, the part about the marionette is outside the scope of purely visual waifus, so that does belong in another thread.
Great! I'll wait on your repost there. Cheers, Anon. :^)
>>42893
In accord with @Kiwi's sensible admonition -- and since I'm already quite-familiar with this product's platform -- I've ordered a little wheeled vehicle * for transporting the ArUco markers & QR codes around on surfaces, just in case any'non here wants to follow along with my doings for this project. The plan is to print little tracking markers to affix to it, and direct it to drive around on a surface "carrying" the holowaifu along with it (in other words, as an expedient stand-in for the "BallBot" conceptualization ITT).
---
* It has a camera + sensors, differential 4-wheel drive (can spin in-place), is compact (~ 6" cubed), and runs on the UNO R3. It's currently just US$55 (for Black Friday).
https://www.amazon.com/dp/B07KPZ8RSZ
Edited last time by Chobitsu on 11/28/2025 (Fri) 15:28:22.
>>43082
>I've ordered a little wheeled vehicle
Wait, I thought we agreed that we didn't need a robot for this, just graffiti.
>>43084
Eheh, yea we did. But I still want the little waifu to be able to traipse about in my flat before long. I simply decided to pull the trigger on this while it was still a Black Friday deal. No one else need get this yet. And besides, I still need to work out the camera+clip setup before that phase as well (cf. >>43073 ).
>>43085 So you want to try this because you think it would be faster to implement than digitizing your surroundings and rendering a waifu into that?
>>43088
Well, when you put it like that, yes (though that wasn't my reasoning). At the moment at least:
a) I'm rather confident I can do the camera+tracking of the ArUco/QR car with my current knowledge. I'll just need to make the time to assemble everything and write a bit of code to support that.
b) I'm not confident in the same way about digitizing my surroundings (though I'm sure I can get a handle on it when I make time to focus on it). Unless I literally went into Blender or Maya and modeled them.
In either case, the rendering of the waifu is pretty straightforward. Again, we'll start smol and grow big with her designs & details (as with everything else for this project).
I just wanted to give a quick update on this. I ended up getting sidetracked by other things, and the chatbot site I use started experiencing technical difficulties; I got multiple 502 errors across different websites, so I'm not sure what's going on here. But I did find another site that seems to be usable for now, which is good because my ability to code doesn't exist without that.
>>43202
Hello HoloAnon, welcome back! So I've now gotten the XR1 visor glasses + a few other related accessories, to hopefully make it all work well for us for the HoloWaifu project. I should have something visual to show in about a week. Don't expect anything fancy at first, just something to demonstrate the platform itself works OK. BTW, the glasses are quite comfy and the brightness of the internal screen is excellent (I was rather concerned they would be too dim).
Edited last time by Chobitsu on 12/09/2025 (Tue) 07:28:07.
>>43009 >>43025 > (part 2 of 2 -related : >>43203 )
>>43207 Based, you've done more on this project than I have. Which was pretty much what I expected, like I said before, but what exactly are we looking at here? Do you have a full waifu program or just a demo for the ability to project AR holograms onto a marker?
>>43212
Thanks! I'm really hopeful that prices on all this stuff will begin coming way down once more competitors enter this arena. My credit card is feeling the pinch from this right now! :^)
>but what exactly are we looking at here? Do you have a full waifu program or just a demo for the ability to project AR holograms onto a marker?
No, we won't have any of that stuff yet. I'm going to take baby steps at all this to keep the impacts on my time & energy low-ish.

For now, I'm simply going to demonstrate displaying a static image of dear Chii-chan in a window, floating above my work table. Should be about a week as I get everything assembled for it. Next, I want to print out ArUco markers, making a smol printed cube of them. Then I need to 3D-print a PLA clip to go onto the nosebridge of the visors to hold a little mini-webcam right in the center of them. This will be the input video feed back to the computer, * which we'll use to create the most basic OpenCV tracker program so that it keeps a target reticle fixed on that cube, regardless of how I move my head.

After that, we'll explore more about registering the waifu image above the target cube within the visors, as we mentioned. It'll still just be a static image for now. Then we'll start working on creating a "blocking" 3D waifu character (blocking is a film term; it just means lo-rez & clunky) to register above the cube (now replacing the previous static image). We'll use the blocking character to begin programming the initial crude animated movements.

That's enough to go on with for the moment! :D I hope all this makes sense, Anon?
---
* Remember, these visors are merely the output from the computer part of the problemspace. (cf. >>43053 )
Edited last time by Chobitsu on 12/09/2025 (Tue) 08:28:06.
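To make the "reticle + static Chii-chan image" step concrete, a rough Python sketch of just the overlay part, reusing the corners returned by the ArUco detection sketch earlier ITT. The sprite file chii.png is a hypothetical transparent RGBA image; nothing here is final code:

import cv2
import numpy as np

sprite = cv2.imread("chii.png", cv2.IMREAD_UNCHANGED)   # must have an alpha channel

def overlay_sprite(frame, sprite, center, y_offset=150):
    """Alpha-blend the sprite so it floats y_offset pixels above center."""
    sh, sw = sprite.shape[:2]
    x = int(center[0] - sw / 2)
    y = int(center[1] - y_offset - sh)
    if x < 0 or y < 0 or x + sw > frame.shape[1] or y + sh > frame.shape[0]:
        return  # skip if it would fall off-screen
    alpha = sprite[:, :, 3:4] / 255.0
    roi = frame[y:y + sh, x:x + sw]
    frame[y:y + sh, x:x + sw] = (alpha * sprite[:, :, :3] + (1 - alpha) * roi).astype(np.uint8)

# Inside the capture loop, once a marker is found:
#   center = corners[0][0].mean(axis=0)        # pixel center of the first marker
#   cv2.drawMarker(frame, (int(center[0]), int(center[1])), (0, 255, 0),
#                  markerType=cv2.MARKER_CROSS, markerSize=40, thickness=2)  # the reticle
#   overlay_sprite(frame, sprite, center)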
>>43213
>My credit card is feeling the pinch from this right now!
Shit, sorry about that. I didn't mean to get you to spend money on this that you can't afford. I know how bad the economy is right now. Maybe I should shop for food deals for people or something to ease the pinch a little. If I had the programming skill I'd create an extreme couponing AI that finds all the best deals on essential goods. But then again I'd rather just have those things actually be affordable to begin with so this wouldn't be needed.

But yeah, your demo concept makes sense. It might also help my understanding to actually see the proof of concept in action. I think what I need is a basic list of core actions the waifu hologram should be able to do. Ideally you'd be able to piece together more complex actions from combining this core set in different ways.
>>43215 >Shit, sorry about that. I didn't mean to get you to spend money on this that you can't afford. LOL. Don't worry Anon. I'm a grown man and know how to budget. I'll grudgingly give the kikes their pound of credit jewgold flesh to see our dear HoloWaifus come to life! :^) >Maybe I should shop for food deals for people or something to ease the pinch a little. If I had the programming skill I'd create an extreme couponing AI that finds all the best deals on essential goods. But then again I'd rather just have those things actually be affordable to begin with so this wouldn't be needed. No. I don't recommend anything like that! Please just stay focused on perfecting your basic colloquial C++ skills for now. That will be far more valuable to us here! >I think what I need is a basic list of core actions the waifu hologram should be able to do. Ideally you'd be able to piece together more complex actions from combining this core set in different ways. Sounds like a great concept! Please begin compiling the basic animation couplets we need a holowaifu to perform. Good thinking, HoloAnon. Cheers. :^)
>>43216
>Please just stay focused on perfecting your basic colloquial C++ skills for now.
This is what I'm going to do. I've been talking with DeepSeek about it and got a list of about 20 different core actions, but this many core actions may result in spaghetti code. I was hoping it would be more like 5-10. That's why I wanted to ask people what they want their holowaifu to be able to do.
>>43247
>This is what I'm going to do.
Excellent.
>I was hoping it would be more like 5-10. That's why I wanted to ask people what they want their holowaifu to be able to do.
Please don't get discouraged if it goes WAAAY beyond that. Slow & steady, Anon. Cheers. :^)
Open file (522.54 KB 1200x800 manupmanchildren.png)
OK peeps, I have some more progress to report, maybe even a major breakthrough. I surmise that with this concept, there's now enough basic information for a new dev who's never had any previous experience with waifu development to meaningfully contribute to the project on a technical level, which is good, because that dev is me.

Holowaifu Movement Solved (Maybe)

I thought of a possible way to govern the holowaifu's movement, and it includes the possibility of dynamically creating movements on the fly to make the waifu feel more realistic. It's similar to the idea I came up with for robowaifu martial arts. Basically it involves using an AI video model to create her movements, but there are a few important caveats to this. The "prompt" would be either voice commands issued by the user, or the waifu would use the footage from the visor itself as an image/video prompt. Because the waifu needs to react in real time and this is an AR visor we're talking about, with processing power equivalent to a smartphone (because in many cases the visor will actually be a smartphone attached to a headset), the model would have to be a very stripped-down model optimized for speed rather than enormous amounts of artistic detail. If the visor isn't powerful enough to run the model fast enough for real-time reactions, the model could be run on a laptop or desktop that streams its output to the visor.

Of course it would be helpful to have a set of standardized movements rather than having to constantly have the AI create new movements, but implementing each movement as a state in a state machine program could rapidly become unmanageable. Therefore I've decided to simplify the program's architecture to include only 3 states: IDLE, SEARCH and REACTION. Whenever the waifu needs to generate a movement, the program runs a search to see if any of the existing animations in its database are suitable for the prompt, and if a suitable animation is found, it runs. If a suitable animation isn't found, the AI video model creates a new animation. But the AI video capabilities are very rarely necessary, particularly for the prototype version. The search algorithm is the key here, because it relieves much of the processing demand of having to come up with new animations. It's also more efficient than a program with dozens or hundreds of possible states representing the waifu's behavior. Most likely the first version won't feature the AI video model at all, merely a placeholder in the code to incorporate it in the future, so the REACTION state won't be used until the AI video capabilities can be properly integrated.

DeepSeek suggested using tags for each animation, like hashtags on social media, to assign the context needed for the search algorithm to determine which animation is appropriate. It might also be good to assign weights to the tags so more frequently needed animations show up more often, but that seems like something you'd build in later and not something you need for a proof of concept.

I probably should have thought of incorporating the search algorithm back when I was first theorizing about robowaifu martial arts. I didn't think of it because I had the idea that the robowaifu always needs to use AI to find the best counter to enemy attacks, and nothing else would be good enough to let her adapt to a canny human opponent. Because of that, conserving processing power didn't really seem important; having a more combat-capable robowaifu was the priority regardless of the processing overhead.
But a holowaifu doesn't physically exist, except if you use AR to overlay her onto a robowaifu or doll, and even then she can easily be detached from it, so she doesn't need to be capable of fighting. Maintaining proper balance while walking isn't an issue either, so the search space of animations can be shrunk dramatically.

Part of the reason I've been posting infrequently since I finished laying out the details of my concept is because I've been busy with other things, but the other part is that I just got stonewalled when I tried to figure out how to govern the waifu's behavior without writing a clunky garbage program with over 9000 states that would be impossible to maintain and would run like shit, especially on an AR headset where you need to optimize because you just don't have the power of a high-end gaming machine. I couldn't tell you how I thought of this solution. But the search function isn't really any different than searching for images or videos by hashtag on a website.

The most important part of this idea is that it's actionable in terms of writing code, so the next big post I make will contain my attempt at vibe-coding this. But I'd like to get some feedback on this first, because, not being an experienced programmer, things like knowing when to use a given search algorithm or how to set up a video database are foreign to me.
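A minimal sketch of the IDLE/SEARCH/REACTION idea with hashtag-style lookup, written in Python here only because that's what the other snippets ITT use (the plan is to eventually do this properly, e.g. in C++). The animation filenames, tags, and the generate_animation() placeholder are all made up for illustration:

from enum import Enum, auto

class State(Enum):
    IDLE = auto()
    SEARCH = auto()
    REACTION = auto()

# Hypothetical tag-indexed animation database: each animation is stored
# under every tag it carries, so lookup is just a couple of dict hits.
ANIMATIONS = {
    "wave":  ["greet_wave.anim"],
    "greet": ["greet_wave.anim", "greet_bow.anim"],
    "sit":   ["sit_idle.anim"],
    "dance": ["dance_loop.anim"],
}

class HoloWaifu:
    def __init__(self):
        self.state = State.IDLE

    def handle_prompt(self, tags):
        """tags: keywords pulled from a voice command or the visor feed."""
        self.state = State.SEARCH
        for tag in tags:
            hits = ANIMATIONS.get(tag)
            if hits:
                self.state = State.IDLE
                return hits[0]              # play the first matching canned animation
        # No canned animation fits: fall through to the (future) AI video model.
        self.state = State.REACTION
        return self.generate_animation(tags)

    def generate_animation(self, tags):
        # Placeholder for the stripped-down video model; not implemented
        # in the prototype, as described above.
        self.state = State.IDLE
        return "idle_shrug.anim"

waifu = HoloWaifu()
print(waifu.handle_prompt(["greet", "wave"]))   # -> greet_wave.anim
print(waifu.handle_prompt(["backflip"]))        # -> idle_shrug.anim (REACTION path)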
>>43349 >pic Lol, errytiem. :D >The most important part of this idea is that it's actionable in terms of writing code, so the next big post I make will contain my attempt at vibe-coding this. Please do, HoloAnon! >things like knowing when to use a given search algorithm As to your data containers & search algorithms, the basic C++ STL has several tried-and-true ones. Unordered hash maps * are a great start towards understanding this domain of containers+algorithms. >or how to set up a video database are foreign to me. I may not be understanding fully what you mean by this. If you mean what I think you mean (fully-unbridled, context-independent searches across a wide swath of video content, with no human-devised tagging necessary), then I regret to tell you that this is an area of research currently, and even Samyuel Copeman and his US$trillions taxpayer-funded (((data centers))) struggle with attempting this (and in fact no one has done this...all relying on carefully-curated human tagging beforehand). So the >tl;dr here is that I hope I misunderstand your intent, and that you have something far more-modest in mind! Cheers, Anon. :^) --- * >protip: cppreference.com is your good fren! :^) https://en.cppreference.com/w/cpp/container/unordered_map.html
Edited last time by Chobitsu on 12/15/2025 (Mon) 09:03:43.
>>43350
>no human-devised tagging necessary
Nah, we're going to need tags for this. Otherwise the search would take way too long. That's why I mentioned tagging the animations using hashtags or something like them so we can narrow the search space.
>>43351 Then that should work fine, and std::unordered_map is your answer to all of the above. Good luck, Anon! Cheers. :^)
>>43349 >>43350 As far as I know, no AI is as efficient as an animation engine. Keep in mind that you don't need complete flexibility. Things like the thread OP pic can do a lot with limited animation. For the seggs and posing, you can have a VTuber-esque posable character.
>>43355 Yeah, that's why I said the search function is ultimately more important than the AI video model. In most situations there will be a preexisting video in the database that fits the context. If you have enough animations loaded into the database you'd have a suitable animation over 99% of the time without creating a new animation from scratch with AI. Maybe at some point the AI's processing power requirements can come down enough that we can rely primarily on the AI video model, but a search system is good enough for now.
>>43355 >>43357 An idea I had was to make the character canonically fly/float. This will help reduce animations and avoid awkward float-walking.
>>43358 Yeah, I think we talked about that earlier, but it's probably good to mention it again because things like that can get lost in the vast dumps of info here.
>>43357
A while ago, when I was working with ChatGPT, I instructed it to give servo positions and described what each position controlled. I'm not sure if it actually provided something useful (I didn't hook it up to a physical body, I just wondered if it could), but a trained LLM can certainly provide such data. Or you could just have animations generate using random numbers.
>>43361
This sounds interesting. I've heard things previously about LLMs being used to control robots, but I don't know much about that. I'd appreciate any extra detail you could provide.
>have animations generate using random numbers
How do you do this?
Open file (6.41 MB 1364x768 GPT expressions.mp4)
>>43363
>LLMs controlling robits
You tell the AI something like: hey, the word "move_forward" will move a physical robot body forward, or "Emotions: happy" will display a happy emotion on a physical robot. Please respond with emotion:[emotion keyword]X[whatever you'd like to say] and I will parse it accordingly. You then check the LLM's response. I use Python, so it would look something like:

if "happy" in response.split("X")[0]:
    face = "happy.jpg"

Vidya is me doing such things with the GPT Python API back in '22.
>rando animation generation
Just periodically set a servo to a random position; combined with several servos you get a "random" animation. The Python "random" library is good at easily generating pseudo-random numbers. I have that going for the neck of my robit along with some motion smoothing (gif is also from like 2022 lol).
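For the random-numbers animation bit, a tiny Python sketch of the "random target + smoothing" pattern; set_servo_angle() is a stand-in for whatever servo driver you actually use (e.g. a PCA9685 wrapper), and the angles and timing are arbitrary:

import random
import time

current = 90.0          # current neck angle in degrees
target = 90.0
SMOOTHING = 0.1         # fraction of the remaining distance covered per tick

def set_servo_angle(angle):
    pass                # placeholder for the real servo driver call

last_retarget = time.time()
while True:
    now = time.time()
    if now - last_retarget > 3.0:               # pick a new random pose every ~3 s
        target = random.uniform(60, 120)
        last_retarget = now
    current += (target - current) * SMOOTHING   # exponential smoothing toward the target
    set_servo_angle(current)
    time.sleep(0.02)                            # ~50 Hz update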
>>43366 That's very impressive. But how did you get her expressions to sync up with her responses? Also I have some unrelated questions about mech design, but I'll get to that later when I have a minute to discuss it in detail and not in this thread.
>>43367
You get the entire string from the LLM in one go, so the raw LLM response is something like:

response = "Expression: happyXWhy yes I am quite happy today"

then:

parts = response.split("X")
emotion = parts[0]
display_text = parts[1]

Then just use OpenCV to display a pic:

if "happy" in emotion:
    cv2.imshow("face", cv2.imread("happy.jpg"))

and of course print the LLM response:

print(display_text)

I forget the specifics but that gives ya a general idea. TL;DR: it's all just some low-level string manipulation, since the ChatGPT API function returns a simple string.
>mech design
Don't bother just yet: I just discovered certain motor controllers generate enough EMF to disrupt microcontrollers like Arduinos and I2C hardware XD Still gonna be a while before SPUD gets acme screw actuators.
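Pulling those fragments together, a self-contained Python sketch of the parse-and-display idea; get_llm_response() stands in for the actual chat API call, and the expression images are assumed to exist on disk:

import cv2

EXPRESSIONS = {"happy": "happy.jpg", "sad": "sad.jpg", "neutral": "neutral.jpg"}

def get_llm_response(prompt):
    # Placeholder: the real version calls the chat API and returns its string
    return "Expression: happyXWhy yes I am quite happy today"

def show_reply(prompt):
    raw = get_llm_response(prompt)
    emotion, _, display_text = raw.partition("X")   # split once on the delimiter
    face = EXPRESSIONS["neutral"]
    for keyword, image in EXPRESSIONS.items():
        if keyword in emotion.lower():
            face = image
            break
    cv2.imshow("face", cv2.imread(face))            # show the matching expression pic
    cv2.waitKey(1)                                  # let the window actually draw
    print(display_text)                             # and print (or speak) the reply text

show_reply("How are you today?")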
>>43368 So basically you scan both your input and the LLM's output for certain keywords and then have her change her expressions if those keywords are detected. Of course there might be cases that have multiple potentially contradictory keywords being triggered, so you'd have to account for that. But that's pretty similar to what I was talking about with the tags and voice commands causing the animations to play. I guess Chobitsu was right when he said we'd want voice commands early on. I was also thinking that she'd respond dynamically to the environment via the search function without explicit instructions, but now I'm questioning whether that's actually necessary. Do we want her to go and do things on her own?
>>43369 I don't know, I guess it might not strictly be necessary to scan your input, only the output, which is what you seem to be doing here. But some people might want the input scanned too.
Open file (7.75 MB 320x570 Jukeboxtest.mp4)
>>43370
I scan the input, e.g. what the voice detection picks up and transforms into a string, if I want pre-programmed commands. It works like chat = vosk.speech_detect(), which returns a string that can be poked, pinched and prodded all you like. For example, if the input contains the words "what", "time", and "is", instead of going to the LLM it accesses the Pi's time function and feeds that to the TTS. Similarly, if "what", "you" and "see" are in the voice detection string, the image detection function is run, which returns a string listing all the things SPUD sees and sends that to the TTS. I even managed to get a basic robowaifu music player together that ran only off voice commands/string juggling (see vid).
So yeah, input is also scanned for fun stuff the LLM wouldn't be able to do. Conversely, you could get the LLM to access functions it normally wouldn't have, such as (for example) knowing the time or what the image recognition picks up (if the LLM says "I look around" or other keywords, you feed it the IR data).
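A bare-bones Python sketch of that keyword routing, with speech_to_text(), speak(), detect_objects() and ask_llm() as hypothetical stand-ins for the Vosk recognizer, the TTS, the image recognition, and the LLM call:

import datetime

def speech_to_text():
    return "hey what time is it"        # stand-in for the speech recognizer

def speak(text):
    print("TTS:", text)                 # stand-in for the TTS engine

def detect_objects():
    return "a water bowl and a very confused cat"

def ask_llm(prompt):
    return "placeholder LLM reply"

def handle_utterance(chat):
    words = chat.lower().split()
    if all(w in words for w in ("what", "time", "is")):
        speak(datetime.datetime.now().strftime("It is %H:%M"))
    elif all(w in words for w in ("what", "you", "see")):
        speak("I can see " + detect_objects())
    else:
        speak(ask_llm(chat))            # anything unrecognized goes to the LLM

handle_utterance(speech_to_text())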
>>43372 OK, that seems pretty manageable. But how much interaction do you think she should have with the world? However much that ends up being, there will probably be a lot less of it in the prototype version.
Open file (29.46 KB 250x250 linux pepe.png)
>>43375
>how much interaction do you think she should have with the world?
Well, eventually I'll be optimizing the code for public release with easy module addons of 3 different types (might think of more, idk):
- Direct voice command script - if keywords are in the voice detection string, run the script
- Prompt injection script - before the voice detection string is sent to the LLM, run a script to add something (like object detection); voice command keywords optional
- LLM-accessed scripts - if keywords are detected in the LLM's response, run a script, such as displaying an emotion, or if "dance" is in the LLM response, playing a dancing animation
By making these modular (I already figured out the script for modular voice commands, so these other two would be relatively easy) and eventually making it open-sauce, I can rely on others to add extra features I can't think of or am too lazy to do.
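A rough Python sketch of how those three addon types could hang together as a plugin registry; the hook names, register() decorator, and example handlers are all invented here, and the actual release may be structured quite differently:

# Three hook registries, one per addon type described above
VOICE_COMMANDS = []     # (keywords, handler) run instead of the LLM
PROMPT_INJECTORS = []   # handlers that append context before the LLM call
LLM_TRIGGERS = []       # (keywords, handler) run on the LLM's reply

def register(registry, keywords=None):
    def wrap(fn):
        registry.append((keywords or [], fn))
        return fn
    return wrap

@register(VOICE_COMMANDS, keywords=["what", "time", "is"])
def tell_time(text):
    import datetime
    return datetime.datetime.now().strftime("It is %H:%M")

@register(PROMPT_INJECTORS)
def add_vision_context(text):
    return text + " [I can currently see: a desk, an ArUco cube]"

@register(LLM_TRIGGERS, keywords=["dance"])
def play_dance(reply):
    print("playing dance animation")

def run_pipeline(user_text, ask_llm):
    for keywords, handler in VOICE_COMMANDS:
        if keywords and all(k in user_text.lower() for k in keywords):
            return handler(user_text)           # bypass the LLM entirely
    for _, injector in PROMPT_INJECTORS:
        user_text = injector(user_text)         # enrich the prompt
    reply = ask_llm(user_text)
    for keywords, handler in LLM_TRIGGERS:
        if any(k in reply.lower() for k in keywords):
            handler(reply)                      # side effects driven by the reply
    return reply

print(run_pipeline("hey what time is it", ask_llm=lambda t: "LLM: " + t))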
>>43381 That makes sense. Offer customization because there's no way you could possibly predict what every user will want their waifu to do. The system I came up with is modular for the sake of letting the developers add new animations if needed, but you're talking about the Bethesda approach where a large portion of the modifications are conducted by the users. So I guess the initial version really does need just a few basic actions, like the 5-10 I was talking about earlier, and we can give the user a method to add more if they want.
>>43381
This sounds really powerful, Mechnomancer! Really looking forward to seeing this packaged up in a tidy release. Cheers. :^)

>>43382
>there's no way you could possibly predict what every user will want their waifu to do.
This. Good thinking, HoloAnon. Cheers. :^)
