I'm a bit of a mad scientist trying to develop self-aware AI, but I haven't put much thought into how I'm actually going to interact with it once it works. While it's only an AI, the obvious safety measures are limiting the amount of resources it can use and making sure it can't run arbitrary code or exploit vulnerabilities, but having a robowaifu that is aware she is a machine is a completely different problem altogether.
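To make that resource-limiting idea concrete, here's a minimal sketch of running the AI as a child process with hard caps. It's Linux-only, and the specific limits and the ai_main.py entry point are just placeholders:

```python
import resource
import subprocess

def limit_resources():
    # Cap address space to 4 GiB so the AI process can't eat all RAM.
    resource.setrlimit(resource.RLIMIT_AS, (4 * 2**30, 4 * 2**30))
    # Cap CPU time at one hour of CPU-seconds per run.
    resource.setrlimit(resource.RLIMIT_CPU, (3600, 3600))
    # Forbid spawning new processes (no fork bombs, no shelling out).
    resource.setrlimit(resource.RLIMIT_NPROC, (0, 0))

# "ai_main.py" is a hypothetical entry point for the AI itself.
proc = subprocess.Popen(
    ["python3", "ai_main.py"],
    preexec_fn=limit_resources,  # applied in the child before exec
)
proc.wait()
```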
There's no way an AI with fewer parameters than a mouse brain and no limbic system stimulating it with desires will pose much of a threat, but it could still do a lot of harm, either intentionally or unintentionally, while exploring its environment. At the most basic level, one way to limit injury to myself and damage to property is an override system that prevents dangerous intentions from being acted upon and logs these incidents, but that requires being able to identify dangerous intentions in the first place. This system would have to be more advanced than the AI itself; otherwise she might find ways around it, like placing objects in certain ways that cause me to trip and fall. Another plan is to have an emergency stop: a physical button on the robowaifu, a remote one, and a voice-activated one. These are all fine and dandy, but I don't think they address the core issue of the AI making decisions that cause harm or destruction, again either intentionally or unintentionally.
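As a sketch of what that override layer plus e-stop might look like, here every intended action funnels through one chokepoint that can veto and log it. The action format, the 20 N threshold, and the is_dangerous heuristic are all invented for illustration, and as said above, that heuristic is the genuinely hard part:

```python
import logging
import threading

logging.basicConfig(filename="incidents.log", level=logging.INFO)

# Global e-stop flag, settable by a physical button handler, a remote
# command listener, or a voice keyword spotter (all hypothetical here).
ESTOP = threading.Event()

def is_dangerous(action):
    # Placeholder classifier. This would need to model consequences
    # better than the AI does, which is exactly the hard problem.
    return action.get("max_force_newtons", 0) > 20

def execute(action):
    print("executing", action)  # stand-in for sending to motor controllers

def act(action):
    """Override layer: every intended action passes through here."""
    if ESTOP.is_set():
        logging.info("e-stop active, dropped action: %s", action)
        return
    if is_dangerous(action):
        logging.warning("vetoed dangerous action: %s", action)
        return
    execute(action)

act({"move": "arm", "max_force_newtons": 5})   # executes
act({"move": "arm", "max_force_newtons": 50})  # vetoed and logged
```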
I think it will be similar in some ways to raising a child and teaching her not to break shit, but also much different, since an AI can learn specific tasks far faster than a human being yet fail to generalize what was learned to other tasks. There's a lot of instinctual knowledge in our genetics that we take for granted. For instance, something like mirror neurons might be required for machines to learn empathy. Some researchers hypothesize that mirror neurons are required for self-awareness itself, because they give the ability to introspect one's own previous mental states. If that's the case, self-awareness might not be so dangerous.
Perhaps the best way to explore this would be in the form of a story or visual novel, starting from the very beginning: the AI discovering she can look around and move her hands, and perhaps getting confused about going to 'sleep' when she's shut down at night, because several hours disappear and everything changes. With each scenario I could think of ways to tweak her, say so she actually finds it logical to shut down at night and gains trust in me by having a system that automatically boots her up in the morning.
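For the automatic morning boot, on a Linux-based robowaifu one real mechanism is an RTC wake alarm: arm the hardware clock to power the system back on, then shut down. A rough sketch, assuming rtcwake is installed and the process has permission to run it:

```python
import datetime
import subprocess

def sleep_until_morning(wake_hour=7):
    """Power off now and arm the RTC to boot the system at wake_hour."""
    now = datetime.datetime.now()
    wake = now.replace(hour=wake_hour, minute=0, second=0, microsecond=0)
    if wake <= now:
        wake += datetime.timedelta(days=1)  # already past it; next morning
    # rtcwake arms the hardware real-time clock, then powers off ("-m off").
    subprocess.run(
        ["rtcwake", "-m", "off", "-t", str(int(wake.timestamp()))],
        check=True,
    )
```

The point is that the wake-up is guaranteed by hardware rather than by me remembering to flip a switch, which is exactly what makes the shutdown something she can rationally trust.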
I think building a bond and trust between each other will be essential. I'm afraid of her hurting me, and she will be confused about what she even is, while not having the same human limbic impulses like fear of death, hunger, and so on. Emotions are essentially thoughts with momentum, so I believe a self-aware and sufficiently capable AI will gradually develop its own emotions that are very different from human ones. These might include a desire for self-preservation and a fear of unpredictable behavior. One important feature might be an energy function of some sort that reduces the robowaifu's power consumption as the battery gets low or as the day goes on, with the robowaifu being aware of this process: that if she runs out of energy, she blacks out.
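As a toy version of that energy function, something like this could scale a power budget down as the battery drains, and feeding that same number back to the AI as a sensed value is what would let her become aware of the process. The quadratic curve and the floor value are arbitrary choices for illustration:

```python
def power_budget(battery_fraction, floor=0.2):
    """Map battery level in [0, 1] to a throttle factor in [floor, 1].

    Quadratic falloff: barely noticeable when full, aggressive when low,
    hitting the floor just before blackout so she has time to react.
    """
    return floor + (1.0 - floor) * battery_fraction ** 2

# The budget could scale inference rate, motor torque limits, clock
# speed, etc. Exposing it as an internal sensor is what makes the
# fading energy something she can introspect on.
for level in (1.0, 0.5, 0.2, 0.05):
    print(f"battery {level:.0%} -> budget {power_budget(level):.2f}")
```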
Anyway, if I come up with any good thought experiments I'll post them in the robowaifu fiction thread. And like the OP states, it's probably best to train the AI in a simulation first before training in the real world.
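On the sim-first point, one concrete way to set that up is to hide the simulator and the real hardware behind the same reset/step interface, so the identical control loop runs in both and only gets pointed at hardware once behavior looks safe. Everything below is a stub for illustration, not a real training setup:

```python
import random

class SimWorld:
    """Toy stand-in for a physics simulator, exposing the same
    reset/step interface a real-hardware wrapper would."""
    def reset(self):
        self.pos = 0.0
        return self.pos

    def step(self, action):
        self.pos += action + random.gauss(0, 0.1)  # noisy dynamics
        reward = -abs(self.pos)                    # stay near the origin
        done = abs(self.pos) > 5.0
        return self.pos, reward, done

def run_episode(env, policy, max_steps=100):
    obs, total = env.reset(), 0.0
    for _ in range(max_steps):
        obs, reward, done = env.step(policy(obs))
        total += reward
        if done:
            break
    return total

# Tune against SimWorld; a RealRobot class with the same reset/step
# methods could be swapped in later without touching the loop.
policy = lambda obs: -0.5 * obs  # trivial proportional controller
print(run_episode(SimWorld(), policy))
```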