/robowaifu/ - DIY Robot Wives

Advancing robotics to a point where anime catgrill meidos in tiny miniskirts are a reality.




“I am not judged by the number of times I fail, but by the number of times I succeed: and the number of times I succeed is in direct proportion to the number of times I fail and keep trying.” -t. Tom Hopkins


Robot Vision General Robowaifu Technician 09/11/2019 (Wed) 01:13:09 No.97
Cameras, Lenses, Actuators, Control Systems

Unless you want to deck out your waifubot in dark glasses and a white cane, learning about vision systems is a good idea. Please post resources here.

opencv.org/
https://archive.is/7dFuu

github.com/opencv/opencv
https://archive.is/PEFzq

www.robotshop.com/en/cameras-vision-sensors.html
https://archive.is/7ESmt
Edited last time by Chobitsu on 09/11/2019 (Wed) 01:14:45.
>Unless you want to deck out your waifubot in dark glasses and a white cane
But, OP, what if the model for the waifubot is supposed to be blind? #triggered
Open file (9.23 KB 480x360 0.jpg)
>>97
The inmoov project managed to use the playstation move(?) to give the robot vision, the script is freely available for it. They also use purchasable cameras for the eyes.
inmoov.fr/eye-mechanism.

https://www.invidio.us/watch?v=H4Z09edx52E
>>1107
KEK

>>1108
That's cool anon thanks. I had planned on using the open sauce version of kinect (Willow Garage) and a couple of hi quality 1080p webcams.
Open file (22.08 KB 480x360 0(1).jpg)
I just purchased a couple of the JeVois cameras. I plan to try using them on a little moebot. What's neat about them is that they have a dedicated 4-core CPU running Linux and any other vision software like OpenCV right in the camera. This pretty much entirely offloads the vision processing computation from the robowaifu's other onboard processors. Thanks to the anon in the other thread for first posting the camera for us.

https://www.invidio.us/watch?v=7cMtD-ef83E
>>1107
Blindfolded girls a cute.

On topic: a new Google vision kit
www.google.com/amp/s/www.theverge.com/platform/amp/2017/11/30/16720322/google-aiy-vision-kit-raspberry-pi-announce-release
>>1111
Thanks anon, I'll check it out come New Year. If I feel it doesn't feed their botnet, I'll recommend it. Either way, I'll post back here and let you know what I think about it in roughly a month or so.

>ed. surely anon will deliver...
facial recognition convo:
>>1129
Anon linked a nice OpenCV training source:
>>1154

OpenCV tutorial site
>>1161

JeVois camera set up:
>>1163
I just found this;
hackaday.com/2018/02/15/student-3d-prints-eyes/
I'm thinking of using an acrylic light guide / light splitter that reads a single image from a TFT display and duplicates it for two small eyeballs. The eyeballs can be physical, so this will save some logic and wonkiness and I can just focus on making a nice retina pattern; the light guide can be a bunch of optical fibers. Pic is supposed to be a Lego piece, wish there was a better illustration.
>>1117
Sounds interesting FluffyDev, can't say I really 'get it' specifically just yet, but I am familiar with optical fibers.
>>1107
>>1111
>>1118
Upon VERY SERIOUS consideration, one-eyed girls may work for our purpose, either temporarily one-eyed or even permanently, since the space behind the other eye can be used for additional electronics and actuators (much needed space especially for a smaller 80cm bot such as mine). A tube guide consisting of either solid plastic tubing or bundled optical fibers and a mirror system will make it possible to have both a real camera vision system as well as fancy pupil/retina graphics. Plus with a physically moving eye, the camera will be able to pan also. (Yes it will be a huge eye mechanism, hence the SERIOUS CONSIDERATION of making the waifu one-eyed on purpose). But she will actually have vision which you can display on the debug computer!
>>1119
>not just having a leela robowaifu
plebeian pls
>>1119
Eye patch is beauty
I'm not really a fan of this eye mechanism since it doesn't support a camera but it could be redesigned to support one. The eyes look pretty good and the process could be adapted to creating anime doll eyes too.

>How to Make Realistic Eyes Using 3D Printing for Animatronic Eye Mechanisms
https://www.youtube.com/watch?v=RqZRKUbA_p0

>How to Build a Simple 3D Printed Arduino Animatronic Eye Mechanism
https://www.youtube.com/watch?v=Ftt9e8xnKE4
>>1657
Yes, I think I remembered that one from before. I like the face the parts are mostly 3D-printable with a cheap printer. Can probably use a resin-based UV printer for the more precise parts like the eyeballs/lids themselves. I don't see any reason small cameras couldn't be fitted inside the interior globes of the eyeballs anon. Even the JeVois cameras (that have actual tiny fans on them b/c Linux+OpenCV coprocessor right on board) should work with the half-open design of the eyeball.
>>1664
>I like the fact*
Open file (2.07 MB 320x228 790.gif)
>>1657
reminds me of the Robot from Lexx
Here's one with two cameras and a low amount of space: https://youtu.be/DY4as7Lc9KY My own thinking is that ideally we should be able to take eyeballs out from the front. The sockets should be expandable for maintenance. If taken out, the servos should be replaceable as well. But then, I'd like my waifu to be waterproof (shower) and she should have tears. So the servos need to be separated with a layer of silicone somehow, maybe they move a magnet and the eyeballs have some metal inside?
>>4335 >that ideally we should be able to take eyeballs out from the front. The sockets should be expandable for maintenance. That's a great idea Anon. It will allow for easy upgrading of the cameras, and as the control mechanisms will be moving thousands of times a day (even if only for a small distance), it affords maintenance access. I propose that the entire assembly be able to slide forward on a tray to afford easy access from most angles.
>>4337 That sounds good, but it would add more moving parts. Might make sense in some designs, but I'd prefer to avoid it. Maybe you're thinking of a moe bot with small eyeballs? If the eyeballs in a bigger one are quite huge and you can get the eyeball out first, then you would need to cut through some silicone, then getting the servos out should work as well. My point is the balls should be held only by the eye socket (circle) as part of the skull, but if needed it could expand and form a bigger circle.
>>4335 After thinking about this a bit more, it came to mind that using a magnet connected to the servo and a metal part on the eyeball, while the eyeball is also connected to a cable, might not work. I'm concerned that the position of the eyeball couldn't be controlled very well. However, I tested it with a hollow flexible ball and a weak magnet (without servo). Now I can imagine it better and I'm more confident. However, it might need an automatic readjustment mechanism. In case the servo loses connection to the eyeball it might move without the eyeball and then not know where it is.
>>4341 >That sounds good, but it would add more moving parts. This is a good example of the basic eye-control mechanism I have in mind: >>1657 (1st pic) The access mechanism would simply add a slide-out frame for the entire assembly to rest on. >then you would need to cut through some silicone, I would propose a design that had firm plastic 'frames' around the eye sockets, with a clearly-delineated seam that wouldn't require any special treatment to slide the eyes out for access.
Open file (432.91 KB 620x800 Nebula.png)
Open file (129.78 KB 691x1037 IMG_20200702_020947.jpg)
Here's a lot of discussion towards moving eyes, following eyes and blinking eyes. https://dollforum.com/forum/viewtopic.php?f=6&t=103291 They're having a lot of trouble with space. Putting all kinds of stuff into a skull won't be easy. I thought about how to get the whole eye movement contraption smaller, though the eyeballs would need to be bigger than human ones for anime eyes. One thing is, using servos as small as possible, not these bulky ones. If we can do it with normal DC motors then those would be even better. Of course, they are not so precise... I also thought I had a good idea, by putting one motor in the quite big eyeball. Problem is, I'd like to have the noise dampened. I also had the idea of using one motor for both eyes before, but human eyes can move independently. However, this does not apply to up/down movement, so we are down to three small motors then. The one in the middle would need to have an axis in both directions. Even better would be if the motors could be a bit deeper in the skull and also be used for something else (just a crude idea yet).
Open file (79.41 KB 1283x1064 IMG_20200701_210057.jpg)
I just listened to that interview here just by chance, while doing some chores: https://youtu.be/bnsgsPjILyQ - Very fascinating. How image recognition needs to work, so the system can think about it. It's one model that looks for different things in an image. It's inspired by neuroscience. Idea is that perception and cognition can't be disconnected from each other. Natural signals, segmentation and top down controllability are the keywords, the latter means for example when we're zooming into a picture in our minds.
>>4541 >Putting all kinds of stuff into a skull won't be easy. Yeah, I think that's mostly a misguided idea. Trying to bio-mimic absolutely everything that's been so elegantly designed into humans is, well, humanly impossible. :^) Better to keep most of the componentry safely protected inside the torso, etc, IMO. >>4777 >perception and cognition can't be disconnected from each other. Yeah, that's probably correct. Certainly it seems to ring true with some of the positions Carver Mead suggests for the field of Neuromorphic Computing he basically originated. Much of what we think of as 'cognition' is in fact neurological at a basic level instead of a higher level, and the perceptions are pushed as far out towards the extremity of sensory perception as possible in most of the higher life forms' biological systems.
>>4786 IMHO at least one relevant computer should be in the head, to imitate humans. Also, stuff we have to put into the head isn't only that, but we'll need a lot of mechanisms in general there, so space matters. Think of facial expressions, microphones, speakers (mb in the throat), heating for the skin, tongue moving around while still leaving some space... Cleaning mechanisms... Okay, this is going OT towards >>9 (face/head general). Further discussion on what to put into the head maybe better there?
>>4792 Yup, all good points Anon.
Open file (43.59 KB 681x555 OAK-specs.jpeg)
Boards with cameras attached came up in the thread on SBCs, here: >>5705 OAK from OpenCV and a cam from JeVois where the computer is part of the camera. Fascinating, but might be a problem if one wants to put it in eyeballs and also make those waterproof. OAK seems to be a bit big and the cams from JeVois have air coolers... On the other hand for development my concerns might be irrelevant, since one can build something with them and replace them later with something smaller and cooler. The JeVois camera has a shutter sensor with inertial measurement unit and digital motion unit, gyroscope and all kinds of sensors, wow: https://youtu.be/MFGpN_Vp7mg
Here's a video on eye movement. https://youtu.be/FaC2RXBss2c The human eye has six muscles, it can even roll sideways a bit. However, what always bothered me, is that so many fembot eyes can move independently up and down. I still think this isn't necessary. I'll look for a motor with two axes, for up and down movement.
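To put numbers on the three-motor idea (one pan motor per eye, one tilt motor shared by both), here is a rough sketch of the geometry in plain Python. Everything in it is an assumption for illustration: the function name `eye_angles`, the 60 mm eye separation, and the coordinate frame (x toward the robot's right, y up, z straight ahead, origin midway between the eyes). It only computes target angles, it does not talk to any servo hardware.

```python
import math

def eye_angles(target, eye_separation_mm=60.0):
    """Return (pan_left, pan_right, tilt) in degrees for a target point.

    target is (x, y, z) in millimetres, measured from the point midway
    between the eyes. Positive pan turns an eye toward the robot's right,
    positive tilt turns both eyes upward.
    """
    x, y, z = target
    half = eye_separation_mm / 2.0
    # Each eye gets its own pan angle, so the eyes converge on near targets.
    pan_left = math.degrees(math.atan2(x + half, z))   # left eye sits at x = -half
    pan_right = math.degrees(math.atan2(x - half, z))  # right eye sits at x = +half
    # One shared tilt angle, computed from the midpoint between the eyes.
    tilt = math.degrees(math.atan2(y, math.hypot(x, z)))
    return pan_left, pan_right, tilt

# A target 30 cm straight ahead: the eyes turn slightly inward, no tilt.
pan_l, pan_r, tilt = eye_angles((0.0, 0.0, 300.0))
```

The shared tilt is the whole point of the design choice: both eyes always get the same up/down command, so a single motor with a through-axis can drive them.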
>related xpost >>8659
Someone with no maths/science background looking to get into openCV or just computer vision in general. My problem would be that I don't know how to approach the topic. I haven't looked into any code yet, because I want to learn how openCV works first. Though now that I'm typing it out, it does sound stupid to learn how openCV works without looking at the code. I thought to start by looking into the various algorithms but I hardly found any on DDG. Found this one site: https://www.upgrad.com/blog/computer-vision-algorithms/ but the more I look into the unknown terms, the more I went down a rabbit hole of unknown math terms. How do you guys recommend I get started with openCV? Should I only tackle algorithms when I see/need them in youtube tutorials I watch? What are some great resources to help get into openCV? I don't want to turn into a code monkey who only copies and pastes code to get their CV project working...
>>9296 I wanted to let you know I saw your post, and I'll attempt an answer suited to it. But that will be a while till I can. In the meantime, mind giving us a little better idea of both your experiences & level in technical work & programming? Do you know C++ already for example? Also, do you already have some small cameras or mobile platforms available, say like an RC car or something? It's OK if you don't have any of this stuff. You can get by without any at the beginning. But letting us know your situation will assist us with giving you a better answer.
>>9296 OK, the obvious first step is simply to get your news straight from the horse's mouth Anon. https://web.archive.org/web/20210308213203/https://docs.opencv.org/master/ There are tutorials there that will get you up and running quickly. The library's software is written in the C++ programming language. So, if you really want to dig into their system as an engineer, then I'd recommend you take that path through both the tutorials and the codebase. This will help you with gaining a deeper understanding of the library itself if you do. https://github.com/opencv/opencv Additionally, if you choose this path, then I can also be of more assistance to you here since I have some experience both with C++, and with using OpenCV in the context of C++ engineering aimed at some basic image processing tasks. OTOH, if your intent is more as a hobbyist simply exploring the ideas for CV that are out there, then there are both Python & JS tutorial pathways as well. I'm pretty sure Java is supported as well. The APIs of the system are quite similar for all the languages. The basic, fundamental idea to get your head around to start with is that OpenCV treats images as a big matrix (as in, the linear algebra matrix) of data. All image-processing operations orbit around this fundamental paradigm. Get your head around that notion from the beginning, and the rest of the library can quickly come into focus for you. Given your post, it sounds like perhaps you are just trying to get your head around the general field itself for the time being. In that case, you can just keep scouring the Internet for general articles, various forum posts, all the other imageboards like /robowaifu/ (that's a joke BTW :^), blogs, and various YT feeds. There are also many scientific papers on the subject. Eventually you'll get the hang of it Anon. So it kind of depends on both your experience level and where you want to go with this as to what I'd suggest to you here Anon.
Others here may have alternate perspectives they might choose to share with you. >=== -various prose edits
Edited last time by Chobitsu on 03/30/2021 (Tue) 02:05:18.
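To make the "an image is just a big matrix" idea concrete, here is a minimal sketch in plain Python. The nested list only stands in for the NumPy array that OpenCV's Python bindings hand back when you load a grayscale image; the helper names `shape`, `threshold` and `invert` are made up for this illustration and are not OpenCV functions (OpenCV has its own `cv.threshold` that works on real arrays).

```python
# A tiny 4x4 grayscale "image": each row is a list of pixel intensities (0-255).
image = [
    [10, 20, 200, 210],
    [15, 25, 220, 230],
    [12, 22, 205, 215],
    [11, 21, 201, 211],
]

def shape(img):
    """Return (rows, columns), the same way you would describe a matrix."""
    return len(img), len(img[0])

def threshold(img, cutoff):
    """Set every pixel brighter than cutoff to 255 and everything else to 0."""
    return [[255 if px > cutoff else 0 for px in row] for row in img]

def invert(img):
    """Produce the negative of the image, pixel by pixel."""
    return [[255 - px for px in row] for row in img]

# Image processing is then just arithmetic over the matrix entries.
binary = threshold(image, 127)
negative = invert(image)
```

Once this clicks, most of the tutorials read as "do some matrix operation, look at the resulting matrix as a picture".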
>>9297 >experiences & level in technical work & programming? Not much I'd say. Been using python for little over a year learning/doing web scraping and GUI building (with pyqt5). >Do you know C++ already? I forgot to mention but I was referring to openCV in python, would I still need to know C++ for that? I did go from C++ to python so its not completely unknown to me. >do you already have some small cameras or mobile platforms available? The only thing I have is a USB webcam. >>9298 I'd say my intent is more as a hobbyist. I just want to create programs for myself right now. >trying to get your head around the general field itself for now. Yeah. I guess as with anything, patience is the key. Thanks, anon.
>>9312 OK, that's all fine. Python and a webcam is just fine. I'd suggest you simply work through getting the system installed now, and working through all the Python tutorials. You can use some of them with your camera as well. https://web.archive.org/web/20201102214056/https://docs.opencv.org/master/d6/d00/tutorial_py_root.html Once you've worked through that as far as you can, then I would advise you to take a break from it all for a bit and then reconsider what you've learned and where you want to go next. I wasn't exactly clear yet Anon; do you think you're wanting to make your own robowaifu eventually?
>>9316 Thanks for the advice, anon. >do you think you're wanting to make your own robowaifu eventually? Yes.
>>9317 YW, and glad to hear it. Good luck with your robowaifu's design Anon!
Trying to do some computer vision experiments/projects on python with my xbox 360 kinect. Installed/built freenect module via https://github.com/OpenKinect/libfreenect and https://github.com/amiller/libfreenect-goodies. However, I can't seem to find any documentation on what functions it has or what one can do with it. All I have are the example scripts in those github repos. Is there any documentation or guides on this?
>>9793 >Is there any documentation or guides on this? I haven't looked into this yet Anon, but Willow Garage (of OpenCV, etc. fame) had the original open-sauce libraries surrounding 360 Kinect et al cameras for stereoscopy. You might research these venues for more information, if not documentation on your particular setup.
>>9795 I see. Thanks Anon, I'll look into Willow Garage.
>>9296 A wall has been hit! Doesn't look like I'll get past contours https://web.archive.org/web/20201102214056/https://docs.opencv.org/master/d6/d00/tutorial_py_root.html in the image processing sub-topic or any other sub-topic after without knowledge in mathematics. What do I need to get a deep understanding of to move forward with computer vision (and machine learning)? Apart from the obvious linear algebra, of course. Who would've thought, making fun of math classes back in school was not the way to go.
>>9970 Correction: I can move forward without math knowledge but it will limit my understanding to just knowing what the code does. I won't have any idea how the code does what it does.
>>9970 I don't know what exactly you need, but there are YouTube lessons, Coursera (also as Torrents) and the Brilliant app.
>>9971 >I won't have any idea how the code does what it does. I hesitate to encourage you in this one way or the other beyond what I've already done ITT: >>9298 >So, if you really want to dig into their system as an engineer, then I'd recommend you take that path through both the tutorials and the codebase. This will help you with gaining a deeper understanding of the library itself if you do. However, I'm well aware that can be a literally years-long process, with no guarantees of success. The journey itself has plenty of rewards IMO. One of the basic lessons to learn early in life -- particularly in an engineering/design life -- is: Keep moving forward. That's it. Just don't quit Anon. Very likely if you can imagine something, then it can actually be done. No guarantees how long or short the journey to success may turn out to be however. This depends in large part on yourself, and the groundwork of effort you've already laid for/in yourself. Keep looking for another way past that mountain Anon. Over, under, around, or through ... you can make it! And as far as creating your own robowaifus goes, you could do worse than making your way here to our board. You're already a bit ahead of the game in that regard, Anon. Now, with that out of the way :^), can you clarify a bit more? >contours and the link you provided have a bit of a disconnect. What about 'contours' is your current problem, can you explain specifically?
>>9976 >and the link you provided have a bit of a disconnect. What about 'contours' is your current problem, can you explain specifically? >UD: I presume this is the link of interest here, Anon? https://web.archive.org/web/20201028001740/https://docs.opencv.org/master/d3/d05/tutorial_py_table_of_contents_contours.html
I just want to summarize my observations so far regarding this topic. I've been looking at various self-driving robot/RC car projects by students on YouTube. They usually fall into the following:
1. Raspberry Pi + OpenCV. The video from the camera has to be segmented... i.e. turned into grayscale, edge-detected, and lane markers drawn. You then just need to draw a line right through the center of the road; that will be the steering vector, which you then command to the steering servo. I haven't touched OpenCV yet though, and I might just bypass the Raspberry Pi entirely since the trend is to use Nvidia Jetson boards now, which have their own preinstalled things.
2. Nvidia Jetson (Nano or other boards). There's the Jetracer: https://github.com/NVIDIA-AI-IOT/jetracer which comes with its own precompiled software as a Jetcard image. (I was able to secure a TT02 chassis for only 90 bucks, it's going to be delivered any day now so I can't wait to start on this. I find it easier to customize hobby kits as you're building them rather than risk destroying a ready-to-run with irreversible changes.)
Now does this apply to robowaifus? Well if she is on wheels at least, the steering vector would be fed into a differential H-bridge DC motor driver. Now that's just one side of robot vision -- the ability to navigate. How about FPV? Though I don't have a drone (I don't want to start a hobby where EVERYTHING comes from China, unlike RC cars at least), I've been looking at various FPV setups. They usually go like this:
1. 1000TVL type of analog FPV camera + OSD (onscreen display board, usually based on MAX7456) + VTX (video transmitter, usually an Eachine or whatever is compatible with the 5.8 GHz headset). So yeah, apparently analog is still applicable since it has low latency. If you want to record video, best practice is to just attach a second HD lens and make sure it is pointing in the same direction as the analog camera; they usually call these hybrid cameras.
2. Wifi FPV -- I actually tried this with my RC cars, it's blegh... you would have to drive really, really slow for it to be usable, but for slow H-bridge DC motor robots it would be perfect. The advantage is that if you are viewing through a high spec machine such as a laptop you can have a fancy overlay for your onscreen display (cue the awesome video game style HUDs) and just record your stream directly to your disk.
So first I have to actually make one of those self driving robots, then I can prove if it's practical to just use the camera input (usually a Raspberry Pi class of camera), Y-split that into an FPV feed and a navigation feed. The navigation feed will be turned into grayscale or color segmented blocks and lines, while the FPV feed will be wi-fied to the client machine which will handle most of the OSD processing.
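The "line through the center of the road becomes the steering vector" step, plus feeding that into a differential drive, can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the input is one row of an already edge-detected image given as 0/1 values, the outermost edge pixels are taken to be the lane markers, and the names `lane_center_offset` and `differential_mix` are invented here rather than taken from any of the projects mentioned.

```python
def lane_center_offset(edge_row):
    """Return how far the lane center sits from the image center.

    edge_row is one image row after edge detection (1 = edge pixel).
    The result is in the range -1..1 (negative = lane is to the left),
    or None when fewer than two edge pixels are visible.
    """
    hits = [i for i, value in enumerate(edge_row) if value]
    if len(hits) < 2:
        return None
    lane_center = (hits[0] + hits[-1]) / 2.0
    image_center = (len(edge_row) - 1) / 2.0
    return (lane_center - image_center) / image_center

def differential_mix(throttle, steering):
    """Turn throttle and steering (-1..1 each) into (left, right) motor commands.

    Positive steering turns right by speeding up the left wheel and
    slowing down the right one. Outputs are clamped to -1..1.
    """
    def clamp(value):
        return max(-1.0, min(1.0, value))
    return clamp(throttle + steering), clamp(throttle - steering)

# Lane markers at columns 4 and 10 of an 11-pixel row: the lane is to the right.
row = [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1]
offset = lane_center_offset(row)
left_cmd, right_cmd = differential_mix(0.5, offset)
```

On a real robot you would average over several rows and smooth the offset over time before using it, since a single noisy row makes the motors twitch.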
Open file (206.67 KB 1360x772 esp32cam programming.jpg)
Oh by the way, a few weeks back I was looking at the cheapest way to get into camera robots. Apparently an ESP32 has just barely enough capability to capture and stream images; previously they paired discrete ESP32s with OV-series basic cameras, until they decided to directly sell combos such as this ESP32-cam. It is too slow to be used for RC-speed vehicles, but for slow indoor robots such as this: https://github.com/gitnabeshin/ESP32CamRobot it is fine. I'm posting this since it's what got my feet wet with non-AI computer vision and control, so now I have enough confidence to actually try AI self-driving as the next step.
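For anyone wanting to pull frames off such a board on the PC side, here is a hedged sketch of the usual trick: scan the incoming bytes for the JPEG start and end markers. It assumes the camera firmware sends a stream of JPEG images one after another (MJPEG), which is an assumption about your firmware and not something checked against the linked project. The function name `extract_jpeg_frames` is made up, and the byte strings in the usage lines are fake stand-ins, not real image data.

```python
SOI = b"\xff\xd8"  # JPEG start-of-image marker
EOI = b"\xff\xd9"  # JPEG end-of-image marker

def extract_jpeg_frames(buffer):
    """Split complete JPEG frames out of a byte buffer.

    Returns (frames, remainder). remainder holds any trailing partial
    frame so it can be prepended to the next chunk read from the network.
    """
    frames = []
    while True:
        start = buffer.find(SOI)
        if start == -1:
            # No frame start yet; keep the last byte in case a marker is split
            # across two chunks.
            return frames, buffer[-1:]
        end = buffer.find(EOI, start + 2)
        if end == -1:
            # Frame has started but not finished; wait for more data.
            return frames, buffer[start:]
        frames.append(buffer[start:end + 2])
        buffer = buffer[end + 2:]

# Simulate two network chunks, with the second frame split between them.
chunk1 = b"header" + SOI + b"abc" + EOI + b"--boundary" + SOI + b"de"
frames1, rest = extract_jpeg_frames(chunk1)
frames2, rest2 = extract_jpeg_frames(rest + b"f" + EOI)
```

This is a naive marker scan. A more careful client would read the multipart headers and use the declared content length, because a JPEG with an embedded thumbnail contains extra markers that would confuse this approach.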
>>9973 Didn't know brilliant was free. Well there are YouTube lessons and such but I don't know what I need to look up the YouTube lessons for. I've been following 3blue1brown and MyWhyU for lessons in linear algebra. Guess I'll just have to cover the math topics when they appear in the way to learning OpenCV. >>9979 Yeah, that's the link. More specifically, this is where my confusions started: https://web.archive.org/web/20201031223524/https://docs.opencv.org/master/dd/d49/tutorial_py_contour_features.html >>9976 Everything past the contours section in the "Image processing with OpenCV" sub-topic in https://web.archive.org/web/20201102214056/https://docs.opencv.org/master/d6/d00/tutorial_py_root.html and all the other sub-topics past "Image processing with OpenCV" (i.e. Feature Detection and Description, and onwards) seem to explain their workings in mathematical terms with formulae I have no hopes of understanding right now. >What about 'contours' is your current problem, can you explain specifically? I don't know what "moments" (which seems to be a math derived term) of an image are. And nor am I understanding what cv.Moments() returns. That's an inspirational image!
>>10018 >>10019 Quality information, thank you Anon. >>10021 >I don't know what "moments" (which seems to be a math derived term) of an image are. It's just a re-use of the idea of 'moments' from physics and statistics (cf. >>9887, et al). In that context a 'moment' is a weighted sum: you add up some quantity (mass, say) multiplied by a power of its distance from a reference point. The zeroth moment gives you the total mass, the first moment gives you the center of mass, and the second gives you the moment of inertia. The OpenCV devs re-use the term by treating pixel intensity as the 'mass'. So for a contour or a binary blob, the zeroth moment is its area, the first moments divided by the area give you its centroid, and the higher ones describe how the shape is spread out and oriented. Make sense? https://www.quora.com/What-exactly-are-moments-in-OpenCV?share=1 >And nor am I understanding what cv.Moments() returns. Well remember how I said all image data (and practically every other form of data) in OpenCV is a big matrix? The moments are an exception. What gets returned is just a small collection of numbers describing that contour or image (in the Python bindings, a dict keyed by names like m00, m10 and m01). https://web.archive.org/web/20170815042506/http://docs.opencv.org/master/d8/d23/classcv_1_1Moments.html
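A small worked example may help pin down what those numbers are. The sketch below computes the raw moments of a tiny binary blob by hand in plain Python, using the textbook definition (sum of x^p * y^q * intensity over all pixels). The helper `raw_moment` is written for this post and is not an OpenCV function; OpenCV's Python `cv.moments` hands back values under keys such as `m00`, `m10` and `m01`, which correspond to the quantities computed here.

```python
def raw_moment(img, p, q):
    """Sum of (x**p) * (y**q) * intensity over every pixel.

    x is the column index and y is the row index.
    """
    return sum((x ** p) * (y ** q) * px
               for y, row in enumerate(img)
               for x, px in enumerate(row))

# A 3-wide, 2-tall rectangle of "on" pixels inside a 5x4 image.
blob = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]

m00 = raw_moment(blob, 0, 0)  # number of "on" pixels, i.e. the area: 6
m10 = raw_moment(blob, 1, 0)  # sum of x coordinates: 12
m01 = raw_moment(blob, 0, 1)  # sum of y coordinates: 9

# The centroid is the first moments divided by the area.
cx = m10 / m00  # 2.0
cy = m01 / m00  # 1.5
```

That centroid calculation is the same `m10 / m00`, `m01 / m00` recipe the contour features tutorial uses, just without the library doing the summing for you.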
>>10018 >>10019 Yes, this is good to know. Thanks. >>10021 Using the Brilliant app isn't free, or wasn't last time I checked. Only the introduction. To access everything it did cost 60-80€ per year. In general, when I get stuck at something, I'll look into other sources of information. Reddit isn't popular on image boards, but having an account there, completely focused on learning stuff and asking around on tech questions might be a good idea. I'm also sure there are good videos on YouTube on OpenCV and tutorials on the net which are even related to using it with Raspberry Pi. >>10023 Not the anon asking, but this was interesting to me as well. Thanks.
Since we don't have some kind of general Robowaifu Sensornets or something like it yet, I'm just going to drop this here since it's probably the closest thing we have atm. So, a newly-reported spying mechanism invented to keep all the population under close, real-time surveillance via their goyphones uses bat-like echolocation to synthesize an environment's image data. Including the people and pets in it. While they themselves mean it purely for evil purposes (George Orwell's Telescreens spying system, but mobile & no need for a fixed camera), perhaps we can in fact use the same technology approach for good? I expect this can certainly enhance a robowaifu's realtime situational awareness capabilities? https://web.archive.org/web/20210501015028/https://www.dailymail.co.uk/sciencetech/article-9525529/Scientists-equip-smartphones-bat-sense-technology-generate-images-sound.html
>>10217 Paper is titled, "3D imaging from multipath temporal echoes" for anyone interested. PDF available: https://arxiv.org/abs/2011.09284
>>10630 Briefly skimming the paper, they seem to use a 3D camera to collect training data (like a Kinect, but more expensive), whilst the actual sensor for the acoustic system consists of a stock-standard Logitech PC speaker, and a microphone. Echoes are recorded, passed into a network which reconstructs a depth map. All-in-all, it's a super simple system to replicate, and one that could easily be made for almost no cost, and very compact. Creating training data might be a pain, but given that I already have a Kinect, I'd be happy to volunteer to rig up a Kinect+Pi+mic+speaker and just walk around my local area collecting huge amounts of data. Let me know if anyone's interested. I'll probably do it myself anyway, but if there's any specific areas you'd like data for, write it down and I'll see what I can do.
>>10630 >>10631 Wow, great work Anon. Much appreciated. >Let me know if anyone's interested. I'm certainly interested to know about your results, though I don't have any personal requests ATM. >Kinect Hmm. Hasn't that product been discontinued now? I wonder what good alternatives exist for us today if any? >PDF available Always a good idea to archive a copy here on the board for access in case anything is ever pulled in the future.
>>10632 I suggested the Kinect because it's the most common consumer level depth camera, and I already have one. Any 3D camera can be used, as long as you can get a depth map image from it. Regarding alternatives, they use an Intel Realsense D435, but that's about $300 new. Even at that price, it's still considered one of the cheaper depth cameras. Note that the 3D camera is only needed for training data collection. Once you've got your data, the system only requires a directional speaker (i.e a standard speaker), and a microphone. In theory, you could probably use the voice speaker + ears of the robowaifu, but you'd want to tweak your training data if you did.
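For intuition about the speaker + microphone side, here is the simplest possible version of echo ranging in plain Python: find where the emitted pulse shows up in the recording, then convert that delay to a distance. This is not the paper's method (that one feeds multipath echoes into a trained network to get a full depth map); it is just the single-echo time-of-flight arithmetic underneath. The function names are made up, the recording is synthetic, and it assumes the recording starts at the instant the pulse is emitted.

```python
SPEED_OF_SOUND_M_S = 343.0  # dry air at roughly 20 degrees C

def best_lag(pulse, recording):
    """Return the sample offset where pulse best matches recording.

    Slides the pulse along the recording and keeps the offset with the
    largest dot product (a brute-force cross-correlation peak).
    """
    best, best_score = 0, float("-inf")
    for lag in range(len(recording) - len(pulse) + 1):
        score = sum(p * recording[lag + i] for i, p in enumerate(pulse))
        if score > best_score:
            best, best_score = lag, score
    return best

def echo_distance_m(lag_samples, sample_rate_hz):
    """Convert a round-trip delay in samples to a one-way distance in metres."""
    round_trip_s = lag_samples / sample_rate_hz
    return SPEED_OF_SOUND_M_S * round_trip_s / 2.0

# Synthetic test: a half-amplitude copy of the pulse returns after 100 samples.
pulse = [1.0, -1.0, 1.0]
recording = [0.0] * 100 + [0.5, -0.5, 0.5] + [0.0] * 10
lag = best_lag(pulse, recording)
distance = echo_distance_m(lag, 44100)
```

The division by two is there because the sound travels out to the object and back again.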
>>10633 There are some other options: JeVois: https://youtu.be/MFGpN_Vp7mg eYs3D: https://youtu.be/NXWHYH0v638 e-con: https://youtu.be/vzXzz7VmWzo SceneScan from Nerian: https://youtu.be/mJ5UlXNguvg SP1 from Nerian: https://youtu.be/vVVjFqUkG4E Cadence: https://youtu.be/wPxi4ZYSJC0 Stereolabs ZED: https://youtu.be/7_8XLI99dno This here is software, trying to do depth perception on any camera: KudanSLAM: https://youtu.be/Pgami8jglmE
>>10646 Thanks for the work Anon.
>NIKKOR Lens Simulator https://imaging.nikon.com/lineup/lens/simulator/ An interesting utility the fine Anons on /p/ mentioned.
>related crosslink (>>10645, ...)
M-LSD: Towards Light-weight and Real-time Line Segment Detection: https://github.com/navervision/mlsd
You Only Look at One Sequence https://github.com/hustvl/YOLOS
>TL;DR: We study the transferability of the vanilla ViT pre-trained on mid-sized ImageNet-1k to the more challenging COCO object detection benchmark.
>Directly inherited from ViT (DeiT), YOLOS is not designed to be yet another high-performance object detector, but to unveil the versatility and transferability of Transformer from image recognition to object detection.
CLIP (Contrastive Language-Image Pre-Training) https://github.com/openai/CLIP
>CLIP is a neural network trained on a variety of (image, text) pairs. It can be instructed in natural language to predict the most relevant text snippet, given an image, without directly optimizing for the task, similarly to the zero-shot capabilities of GPT-2 and 3. We found CLIP matches the performance of the original ResNet50 on ImageNet “zero-shot” without using any of the original 1.28M labeled examples, overcoming several major challenges in computer vision.
Dynamic Vision Transformer (DVT) https://github.com/blackfeather-wang/Dynamic-Vision-Transformer
>We develop a Dynamic Vision Transformer (DVT) to automatically configure a proper number of tokens for each individual image, leading to a significant improvement in computational efficiency, both theoretically and empirically.
Open file (3.04 MB 444x250 very_nice_anon.gif)
>>10759 > Towards Light-weight and Real-time Line Segment Detection < Light-weight and Real-time Well I certainly find that combination of terms very encouraging Anon. Simply b/c that's the only combination that's going to be successful for devising reasonably inexpensive, mobile, autonomous gynoid companions robots. AKA robowaifus. It's gratifying seeing a small cadre of researchers seeming to be breaking off from the standard-issue Globohomo toadies, and tackling the real-world concerns of the billions of regular people, regarding AI advancement at the personal level. >tl;dr We'll never achieve an AI/Robowaifu Renaissance if we every one have to be beholden to the Globohomo cloud, hat in hand, begging "Please sir, may I have some more?"
>>10761 test
Found this reference image that is used by 'Ocularists' who paint glass eyes/ocular prosthesis. Might help some anon who is trying to paint their robowaifu a pair of pretty peepers. A note about eye lighting/refractive highlights: Despite what anime has taught us, I think it may be best not to paint on any lighting effects/refractive highlights (at least on non-cartoon irises). The highlights are best formed by a coating of clear epoxy resin. I say this because such refractive highlights move throughout the lens of the eye and across the surface of the sclera. So if they are painted on, they will be static...and this just looks wrong. Especially if the eye is also coated in clear resin. Because then you have the natural light reflected off the resin along with the static painted highlights. I think this is why Will Cogley leaves a concave depression inside his eye mold so that clear epoxy resin can set inside and form a lens shape, which reflects and refracts the light just like a real eye. Some measurements: (from the Wikipedia article on the human eyeball, and a scientific paper on the human iris). Human eyeball average height = 23.7mm (0.93 in) Human eyeball average width = 24.2mm (0.95 in) the eyeball is slightly wider than it is tall - which is why I didn't just give one diameter, but I know for the sake of simplicity that a 25.4mm (1 inch) diameter eyeball looks correct inside a life-sized human head. Human eyeball average anterior to posterior diameter = 24mm (0.94 in) This may be less important for a robowaifu since often they only use the front half of the eyeball, with the back/interior being fastened to a servohorn or pushrod in some way. Human eyeball average volume = 6 cubic centimeters (0.37 cu in) Iris size range = 10.2 to 13.0 mm in diameter with an average size of 12 mm in diameter, and a circumference of 37 mm. 
(From this paper: https://www.di.ubi.pt/~lfbaa/pubs/iscit2013.pdf 'Iris Surface Deformation and Normalization' by Somying Thainimit, Luis A. Alexandre and Vasco M.N. de Almeida.)
Fully dilated pupil = 4 to 8 mm in diameter
Fully constricted pupil = 2 to 4 mm in diameter
(larger doe-eyed pupils are usually better since small ones can make the eyes look angry/scary/psycho - unless you are going for that, of course LOL).
Also, please note that if you are mixing and pouring clear epoxy resin into those half-sphere (cabochon) silicone molds, remember to:
A.) Wear an apron or some cheap clothes since that stuff sticks to everything worse than shit to a blanket.
B.) Make sure that your mixing cups and stirrer are both disposable and as clean as possible (wear latex or nitrile gloves to avoid getting the molds greasy and wash everything with warm water, soap and even an isopropyl alcohol rinse if you have that).
C.) Pour/mix the resin and hardener SLOWLY. There is plenty of time. Even with a thin layer of epoxy you've got about half an hour before it really starts to set, and I know from experience that 1 inch lenses made with the stuff take over 24 hours to set through completely. If you rush and try to beat epoxy resin like eggs in an omelette then you will just end up with thousands of tiny bubbles and a pair of cloudy looking lenses.
(Also, don't do what I did and poke the back of your resin lens at about 20 hours in just to see if it has set yet. Because you end up with a dented, ruined lens. Best to wait for about 30 hours before popping them out just to be safe.)
>>10991 I should also mention that even if you get the best epoxy resin on the market, it will eventually yellow with time. UV light excites the chemical bonds and turns the resin yellow from the outside in. If your robowaifu has eyelids that move, this is a good reason to close her eyes while she is inactive. It also means that the eyeballs themselves should be made with a degree of replaceability, since you'll want to swap them out after a couple of years when they start to yellow noticeably. So maybe spend an hour or two painting in the irises, but it's probably best not to try crafting a masterwork prosthesis ;D
>>10991 >>10992 Excellent information on this topic. The effort here is much appreciated Anon.
Open file (139.44 KB 1125x1219 IMG_20210409_013031.jpg)
>>10995 Thank you kindly Anon.
Open file (1.05 MB 1758x504 modulo_teaser.png)
I already mentioned there's open source software that can see heartbeats and micro facial movements called Eulerian Video Magnification in another thread, but I figure I might as well mention it here too: https://www.youtube.com/watch?v=ONZcjs1Pjmk As for hardware, I was thinking simple solid black camera eyes if I couldn't get 3-axis movement (vertical, horizontal, convergence) working with cameras without taking up too much space. I haven't thought too much about eyes other than that, except that I was thinking she should see in IR, to help in poor lighting without blinding me with LEDs on her face. I remember reading about something called a Modulo Camera that supposedly never over-exposes or something, so maybe it could just use a bigger camera for better night vision? There's also something called a "Light field camera" that keeps everything in focus, but I'm not sure how useful that is for robot vision, I just think it's neat.
>>13163 That's an interesting concept Anon, thanks. Yes, I think cameras and image analysis have very long legs yet, and we still have several orders of magnitude improvements yet to come in the future. It would be nice if our robowaifus (and not just our enemies) can take advantage of this for us. We need to really be thinking ahead in this area tbh.
It seems like CMOS is the default sensor for most CV applications due to cost. But seeing all these beautiful eye designs makes me consider carefully how those photons get processed into signal for the robowaifus. Cost aside, CCD as a technology seems better because the entire image is processed monolithically, as one crisp frame, instead of by a huge array of individual pixel sensors, which I think causes noise that has to be dealt with in image post-processing. CCD looks like it's still the go-to for scientific instruments today. In astrophotography everyone drools over cameras with CCD; CMOS is -ok- and fits most amateur needs, but the pros use CCD.
Astrophotography / scientific:
www.atik-cameras(dot)com/news/difference-between-ccd-cmos-sensors/
This article breaks it down pretty well from a strictly CV standpoint:
www.adimec(dot)com/ccd-vs-cmos-image-sensors-in-machine-vision-cameras/
>>14751 That looks very cool Anon. I think you're right about CCDs being very good sensor tech. Certainly I think that if we can find ones that suit our specific mobile robowaifu design needs, then that would certainly be a great choice. Thanks for the post!
iLab Neuromorphic Vision C++ Toolkit The USC iLab is headed up by the PhD behind the Jevois cameras and systems. http://ilab.usc.edu/toolkit/
>(>>15997, ... loosely related)
>"Follow Me" eyes (crosslink): >>19037 - I somehow forgot that we had a dedicated thread for eyes.
> conversation-related (>>23398, ...)
Related:
>>23405 >Once thing I would like to do with a board that allows for more than one camera, would be to have a way to use this for creating a somewhat 3D model of the world. Especially be able to know the distance of an object it recognizes. This will be absolutely crucial to understand the world.
>>23410 >Stereo Depth Cameras ... using triangulation
>>23431 > auto-mesh generation
Is this about generating meshes from 2D pictures? I just wrote somewhere that I wonder how video to 3D model would work. It's possible to use AI generated videos to feed a game engine and render an even better video. I guess the background is done using this "auto-mesh generation" then (and pose estimation to a bone model for characters).
>>23431 >Motion isn't req'd. when you already know the dimensions of the object, you can only use a single image when you already know the actual height then its just a matter of measuring the difference between the real height vs the image height, its how snipers have to figure out distances in their scope when they dont have a rangefinder, it would need to keep a database of dimensions for known objects otherwise it has to go into pajeetmode to emulate stereoscopic vision, its doable but it seems like too much hassle when you can just use two cameras
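The known-height trick above is basically the pinhole camera model: distance = focal length (in pixels) x real height / height in the image (in pixels). A minimal sketch, assuming a calibrated camera and an object seen roughly upright and square-on. The function name and the example numbers are made up for illustration.

```python
def distance_from_known_height(focal_length_px, real_height_m, image_height_px):
    """Pinhole-model range estimate: Z = f * H / h.

    focal_length_px : focal length in pixels (from camera calibration)
    real_height_m   : true height of the object in metres (from a lookup table)
    image_height_px : height of the object as it appears in the image, in pixels
    """
    if image_height_px <= 0:
        raise ValueError("image height must be positive")
    return focal_length_px * real_height_m / image_height_px


# Hypothetical numbers: 800 px focal length, a 1.8 m tall person
# who appears 360 px tall in the frame.
print(distance_from_known_height(800.0, 1.8, 360.0))
```

If the object is tilted or rotated, the apparent height shrinks and the estimate comes out too far, which is the weakness pointed out further down the thread.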
>>23436 >it would need to keep a database of dimensions for known objects otherwise it has to go into pajeetmode to emulate stereoscopic vision Agreed, and that's an aspect of the 'well-calibrated' camera(s) part of the equation. For instance, when a robowaifu can remain in the relative safety of her master's home, then she can have the luxury of perfectly pre-learning basically everything in his space. This is a big win for all of her on-the-fly, object recognition/distance/volume/kinematic/mass/force/pose -estimate calculations. Including him, of course. :^) >"Master!? Have you been putting on weight again?" >=== -prose edit, fmt -add funpost spoilers
Edited last time by Chobitsu on 06/24/2023 (Sat) 23:34:03.
>>23435 >Is this about generating meshes from 2D pictures. Yes. It works far better using a combination of stereo depth cameras, and the ability to proactively transform the camera(s) around the object(s) in question. Much like a robowaifu (or a human photographer) would be able to do. The primary point being to highly-accurately model the world around her, including her own master and other humans. (For example: their own children romping about. :^)
>>23437 lol, i forgot it would already need a database anyway for those things, still calculating based on parallax is way simpler than comparing the image to known dimensions, especially if the object is rotated then you need to know the angle to get a real height to compare to
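For comparison, the parallax calculation for two cameras is a one-liner once the images are rectified: depth = focal length x baseline / disparity. A rough sketch, assuming an already rectified stereo pair and a disparity already measured in pixels (finding the matching pixel is the hard part and is not shown). Names and numbers are illustrative only.

```python
def depth_from_disparity(focal_length_px, baseline_m, disparity_px):
    """Stereo range estimate for a rectified pair: Z = f * B / d.

    focal_length_px : focal length in pixels
    baseline_m      : distance between the two camera centres in metres
    disparity_px    : horizontal shift of the same point between left and right image
    """
    if disparity_px <= 0:
        # No measurable parallax: the point is effectively at infinity.
        return float("inf")
    return focal_length_px * baseline_m / disparity_px


# Hypothetical rig: 700 px focal length, cameras 6 cm apart, 21 px disparity.
print(depth_from_disparity(700.0, 0.06, 21.0))
```

No object database is needed here, only the fixed baseline and the calibration, which is why it holds up when objects are rotated.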
>>23440 >parallax is way simpler This comment touches on a technical aspect of computation, and the up-front costs involved with setup. But fair enough Anon. I'm sure it's more reliable, in general, than simplistic dimension analysis, particularly in tricky lighting conditions.
>>23436 >need to keep a database of dimensions for known objects Thanks for pointing this out, but that's something I want to do anyways. Robowaifus should have a rough estimate on the traits of identifiable objects, e.g. weight and size. This can be taken from some public databases or LLMs. Then on top of that, if they see something close to an unknown object, which they can identify, they should be able to draw a conclusion about the size of the unknown one based on that.
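The "conclude the size of the unknown object from a known one next to it" idea could look something like this. A minimal sketch, assuming both objects are at about the same distance from the camera so their pixel sizes scale the same way. The function name and example values are hypothetical.

```python
def estimate_size_from_neighbor(known_size_m, known_size_px, unknown_size_px):
    """Scale the unknown object's pixel size by the metres-per-pixel of a known neighbour.

    Only valid when both objects sit at roughly the same distance from the camera.
    """
    if known_size_px <= 0:
        raise ValueError("known object must have a positive pixel size")
    metres_per_pixel = known_size_m / known_size_px
    return unknown_size_px * metres_per_pixel


# Hypothetical example: a mug known to be 0.10 m tall appears 50 px tall,
# and the unknown object right next to it appears 150 px tall.
print(estimate_size_from_neighbor(0.10, 50.0, 150.0))
```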
Open file (97.63 KB 1202x743 yolo_nas_frontier.png)
>Deci is thrilled to announce the release of a new object detection model, YOLO-NAS - a game-changer in the world of object detection, providing superior real-time object detection capabilities and production-ready performance. Deci's mission is to provide AI teams with tools to remove development barriers and attain efficient inference performance more quickly. https://github.com/Deci-AI/super-gradients/blob/master/YOLONAS.md
>>23776 what is mAP?
>>23777 The diagram indicates it's a form of accuracy to compare such models. >Mean Average Precision (mAP) https://blog.paperspace.com/mean-average-precision/ Found via: https://duckduckgo.com/?q=map+machine+learning+accuracy
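To make the metric a bit more concrete: average precision (AP) is the area under the precision/recall curve for one class, and mAP is the mean of the APs over all classes. A simplified sketch, assuming each detection has already been marked as a true or false positive (normally done by matching boxes against ground truth at some IoU threshold, which is not shown here). Benchmarks like COCO add further averaging on top of this.

```python
def average_precision(detections, num_ground_truth):
    """AP for one class.

    detections       : list of (confidence, is_true_positive) tuples
    num_ground_truth : number of real objects of this class in the dataset
    """
    if num_ground_truth == 0:
        return 0.0
    # Rank detections from most to least confident.
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    true_positives = 0
    precisions, recalls = [], []
    for rank, (_, is_tp) in enumerate(ranked, start=1):
        if is_tp:
            true_positives += 1
        precisions.append(true_positives / rank)
        recalls.append(true_positives / num_ground_truth)
    # Interpolate: precision at each point becomes the best precision to its right.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum the rectangles under the interpolated curve.
    ap, previous_recall = 0.0, 0.0
    for precision, recall in zip(precisions, recalls):
        ap += (recall - previous_recall) * precision
        previous_recall = recall
    return ap


def mean_average_precision(per_class):
    """per_class: dict mapping class name -> (detections, num_ground_truth)."""
    aps = [average_precision(dets, n) for dets, n in per_class.values()]
    return sum(aps) / len(aps) if aps else 0.0
```

Usage would be something like `mean_average_precision({"cup": (cup_dets, 12), "cat": (cat_dets, 3)})`, with the detection lists being whatever your evaluation script produced.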
>>23776 Very low-latency in detection is vital, insofar as her autonomous safety is concerned. The ideal is human-level speed at object recognition (or even faster). We're probably getting pretty close on smol devices already, so I predict we'll reach this goal generally by the time the first real-world robowaifus begin rolling out. Thanks Anon.
>>24909
- The computers connected to the eyes (cameras) should have different ways of sharing data with other computers, e.g. just sharing body movement analysis and recognition info as a text stream, and likewise for the person being detected or for emotional indicators. Sending photos and videos should be very limited and only done as encrypted files, and the system should mostly not store this data. A home server might store and process some data for fine-tuning, but it needs to receive this data encrypted. The decision about what to share should be made based on the overall context coming from the general cognitive architecture >>24783
- Fast and efficient segmentation of images (FPGAs?)
- Different variants of the same image, created very fast, maybe using an FPGA, for further processing. E.g. only processing a low-res partial image of an object to keep track of. The creation of that low-res partial image should be done by a specialized system close to the cameras.
- Using object detection models chosen based on context informed by the general cognitive architecture >>24783, or just based on awareness of what room she's in and maybe even what she's looking at. That way they can be smaller, faster and more specialized, including some models which are trained on data specific to the household (photos and videos of the home environment).
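The "share recognition info as a text stream instead of images" part could be as simple as one JSON line per frame. A minimal sketch using only the standard library. The `Detection` fields, the message layout and the confidence cutoff are assumptions for illustration, and encryption of the channel is left out entirely.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Detection:
    label: str         # e.g. "cup", "person"
    confidence: float  # 0.0 .. 1.0
    bbox: tuple        # (x, y, width, height) in pixels


def to_text_message(frame_id, detections, min_confidence=0.5):
    """Serialize detections as one JSON line; no pixel data leaves the camera node."""
    kept = [asdict(d) for d in detections if d.confidence >= min_confidence]
    return json.dumps({"frame": frame_id, "detections": kept})


# Hypothetical frame with one confident and one weak detection.
message = to_text_message(7, [
    Detection("cup", 0.9, (10, 20, 30, 40)),
    Detection("cat", 0.2, (0, 0, 5, 5)),
])
print(message)
```

The receiving computer only ever sees labels and boxes, so the photos themselves can stay on the camera-side board.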
Open file (215.89 KB 869x350 Screenshot_114.png)
Open file (326.25 KB 879x492 Screenshot_113.png)
Open file (162.74 KB 878x396 Screenshot_112.png)
>LERF optimizes a dense, multi-scale language 3D field by volume rendering CLIP embeddings along training rays, supervising these embeddings with multi-scale CLIP features across multi-view training images. After optimization, LERF can extract 3D relevancy maps for language queries interactively in real-time. LERF enables pixel-aligned queries of the distilled 3D CLIP embeddings without relying on region proposals, masks, or fine-tuning, supporting long-tail open-vocabulary queries hierarchically across the volume. >With multi-view supervision, 3D CLIP embeddings are more robust to occlusion and viewpoint changes than 2D CLIP embeddings. 3D CLIP embeddings also conform better to the 3D scene structure, giving them a crisper appearance. https://www.lerf.io https://github.com/kerrj/lerf https://drive.google.com/drive/folders/1vh0mSl7v29yaGsxleadcj-LCZOE_WEWB?usp=sharing https://arxiv.org/abs/2303.09553
> Face recognition
Not tested, just looking at what's available:
https://github.com/cmusatyalab/openface

The following quotes are from Reddit, not from me...
https://github.com/ageitgey/face_recognition
> I have tried this out. It's easy to code and accurately recognizes faces. The problem is it can't even detect faces 1 feet away from the camera.
https://github.com/timesler/facenet-pytorch (FaceNet & MTCNN)
> This can detect and recognize faces at a distance, but the problem is it can't recognize unknown faces correctly. I mean for unknown faces it always tries to label it as one of the faces from the model/ database encodings.
https://github.com/serengil/deepface
> I have tried VGG, ArcFace, Facenet512. The latter two gave me good results. But, the problem is I couldn't figure out how to change the detection from every 5 seconds to real-time. Also, I couldn't change the camera source. (If anyone can help me with these please do). Also, it had fps drops frequently.
https://github.com/deepinsight/insightface
> Couldn't test this yet. But in the demo YT video it shows the model incorrectly detecting a random object as a face. If someone knows how well this performs please let me know.
https://www.reddit.com/r/computervision/comments/15ycwom/face_recognition_whats_the_state_of_the_art/

This one seems to be the best: https://github.com/ZoneMinder/zoneminder
The Reddit link above has a thread and a patch for detecting faces at a distance, I think.
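About the "unknown faces always get labeled as someone from the database" complaint: a common workaround is to reject any match whose embedding distance is above a cutoff. A generic sketch, not tied to any of the libraries above. It assumes you already have face embeddings as plain lists of floats, and the 0.4 cutoff is only a placeholder that would need tuning per model.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; 0 means same direction, 2 means opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        return 1.0  # treat degenerate embeddings as "not similar"
    return 1.0 - dot / norm


def identify(embedding, known_faces, max_distance=0.4):
    """Return the closest known name, or "unknown" if nothing is close enough.

    known_faces  : dict mapping name -> reference embedding
    max_distance : placeholder cutoff, must be tuned for the embedding model used
    """
    best_name, best_distance = "unknown", max_distance
    for name, reference in known_faces.items():
        distance = cosine_distance(embedding, reference)
        if distance < best_distance:
            best_name, best_distance = name, distance
    return best_name


# Toy 2D "embeddings" just to show the behaviour.
known = {"master": [1.0, 0.0], "guest": [0.0, 1.0]}
print(identify([0.9, 0.1], known))    # close to "master"
print(identify([-1.0, -1.0], known))  # far from everyone
```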
Open file (537.86 KB 877x878 LLaVA.png)
LLaVA: Large Language and Vision Assistant (https://llava-vl.github.io/) A project to integrate vision into large language models. Though very new and young as a concept, adding visual context to language models has tremendous potential. Notably, a waifu which can understand correlations between what she perceives in her environment with what she is told can lead to much more naturally feeling interactions. Fingers crossed for a fork that implements YOLO (https://pjreddie.com/darknet/yolo/) rather than CLIP (https://openai.com/research/clip) for better compute and memory efficiency. Getting this to run at sub 10 watts should be a goal.
Edited last time by Kiwi_ on 10/11/2023 (Wed) 18:33:59.
Open file (82.60 KB 386x290 Screenshot_158.png)
I was working on this here >>26112 using OpenCL to make video processing faster. So I got this here recommended by YouTube: https://www.youtu.be/0Kgm_aLunAo Github: https://github.com/jjmlovesgit/pipcounter This is using OpenCV to count pips on dominos, and does it much faster and better than GPT4-Vision. I wonder if it would be possible to have a LLM adjust the code dependent on the use case, and maybe having a library of common patterns to look out for. Ideally one would show it something new, it would detect the outer border like the stones here and then adjust till it can catch the details on all of these objects which are of interest. It could look out for patterns dependent on some context, like e.g. a desk.
>>26132 >and does it much faster and better than GPT4-Vision. Doesn't really surprise me. OpenCV is roughly the SoA in hand-written C++ code for computer vision. You have some great posts ITT Anon thanks... keep up the good work! :^)
There are several libraries and approaches that attempt to achieve generalized object detection within a context, although creating a completely automatic, context-based object detection system without predefining objects can be a complex task due to the variability of real-world scenarios. However, libraries and methodologies that have been utilized for more general object detection include:
1. YOLO (You Only Look Once): YOLO is a popular object detection system that uses a single neural network to identify objects within an image and can detect multiple objects in real-time. However, it only detects the object categories it was trained on.
2. OpenCV with Haar Cascades and HOG (Histogram of Oriented Gradients): OpenCV provides Haar cascades and HOG-based object detection methods. While not entirely context-based, they allow for object detection using predefined patterns and features. These methods can be more general but might not adapt well to various contexts without specific training or feature engineering.
3. TensorFlow Object Detection API: TensorFlow offers an object detection API that provides pre-trained models for various objects. While not entirely context-based, these models are designed to detect general objects and can be customized or fine-tuned for specific contexts.
4. Custom Object Detection Models with Transfer Learning: You could create a custom object detection model using transfer learning from a pre-trained model like Faster R-CNN, SSD, or Mask R-CNN. By fine-tuning on your own dataset, the model could adapt to specific contexts.
5. Generalized Shape Detection Algorithms: Libraries like scikit-image (imported as skimage in Python) provide various tools for general image processing and shape analysis, including contour detection, edge detection, and morphological operations. While not object-specific, they offer tools for identifying shapes within images.
Each of these methods has its advantages and limitations when it comes to general object detection. If you're looking for a more context-aware system that learns and adapts to various contexts, combining traditional computer vision methods with machine learning models trained on diverse images may be a step towards achieving a more generalized object detection system. However, creating a fully context-aware, automatic object detection system that adapts to any arbitrary context without any predefined objects is still a challenging area of research.
-----------------
In terms of computational requirements, here's a general ranking of the mentioned object detection methods based on the computational power and RAM they might typically require:
1. OpenCV with Haar Cascades and HOG:
- Computational Power Needed: Low to Moderate
- RAM Requirements: Low
- These methods are computationally less intensive compared to deep learning-based models. They can run on systems with lower computational power and memory.
2. Generalized Shape Detection Algorithms (scikit-image):
- Computational Power Needed: Low to Moderate
- RAM Requirements: Low to Moderate
- While these libraries might need slightly more computational power and RAM than Haar Cascades and HOG, they are still less demanding compared to deep learning-based models.
3. TensorFlow Object Detection API:
- Computational Power Needed: Moderate to High
- RAM Requirements: Moderate to High
- Running pre-trained models from the TensorFlow Object Detection API might require more computational power and memory compared to traditional computer vision methods due to the complexity of the deep learning models.
4. Custom Object Detection Models with Transfer Learning:
- Computational Power Needed: Moderate to High
- RAM Requirements: Moderate to High
- Training custom object detection models with transfer learning typically requires moderate to high computational power and memory, especially during the training phase.
5. YOLO (You Only Look Once):
- Computational Power Needed: High
- RAM Requirements: High
- YOLO models are relatively demanding in terms of computational power and memory. They require more powerful machines due to their deep neural network architecture and real-time processing capabilities.
The exact computational requirements and memory usage can vary based on the specific hardware, image sizes, complexity of the models, and the scale of the operations being performed. Deep learning models, in general, tend to demand more computational resources compared to traditional computer vision methods. If you're working with large datasets or real-time processing, more powerful hardware configurations would likely be necessary to achieve optimal performance.
--------
https://github.com/opencv/opencv/tree/master/data/haarcascades
--------
If your goal is to detect shapes without knowing the specific objects at first, OpenCV's contour detection methods combined with image processing techniques could be more appropriate than scikit-learn. Once shapes are identified, further analysis or categorization can be performed using traditional machine learning algorithms from scikit-learn or other methods.
>>26146 Understood. If the goal is to identify various objects within a specific context (like a desk) without predefining the objects, and the lighting conditions might vary, a more flexible approach using general computer vision techniques can be applied. This could involve methods such as contour detection, edge detection, and basic image processing techniques to identify objects within the context of a desk. You might use a more generalized version of object detection that isn’t specific to particular objects but rather identifies any distinguishable shape within the context. Here’s an example:

```python
import cv2

# Read the image
image = cv2.imread('path_to_your_image.jpg')
if image is None:
    raise FileNotFoundError('Could not read path_to_your_image.jpg')

# Convert to grayscale
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Apply thresholding or other preprocessing techniques to enhance object edges.
# Placeholder choice: a fixed binary threshold at 127; adjust or replace as needed.
_, processed_image = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)

# Find contours (OpenCV 4.x returns contours and hierarchy)
contours, _ = cv2.findContours(processed_image, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

# Placeholder value in pixels; tune it for your image resolution
some_minimum_area_threshold = 500

detected_objects = []
for contour in contours:
    # Apply some conditions to filter objects based on size, shape, etc.
    # For instance, you might filter by area or aspect ratio
    area = cv2.contourArea(contour)
    if area > some_minimum_area_threshold:
        detected_objects.append(contour)

# Count and display the number of detected objects
print(f"Number of objects detected: {len(detected_objects)}")
```

This code applies general techniques such as contour detection to identify distinguishable shapes within the context of the desk. The process of identifying objects relies on the uniqueness of their shapes and their contrast against the background. The challenge in this approach lies in how the algorithm distinguishes objects based on their shapes and sizes. It might not identify specific objects but rather any shape that meets certain criteria (like area, aspect ratio, etc.) within the provided context (in this case, the desk). This method might detect a variety of objects but could also identify false positives or miss some objects.
Fine-tuning the conditions for object identification (like area thresholds or other characteristics) can improve the accuracy of detection within the context of the desk, considering the variability in lighting and object characteristics.
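One possible way to cope with the varying lighting is to let the image pick its own threshold instead of hard-coding a value. The sketch below is a plain-Python version of Otsu's method, which chooses the gray level that best separates dark and bright pixels. It is a swapped-in technique for illustration, not something from the post above. In OpenCV the same idea is available by passing the `cv2.THRESH_OTSU` flag to `cv2.threshold`.

```python
def otsu_threshold(pixels):
    """Return the gray level t (0..255) that maximizes between-class variance.

    pixels : iterable of integer gray levels in 0..255
    Pixels <= t count as one class (background), pixels > t as the other.
    """
    histogram = [0] * 256
    for value in pixels:
        histogram[value] += 1
    total = sum(histogram)
    sum_all = sum(level * count for level, count in enumerate(histogram))

    best_t, best_variance = 0, -1.0
    weight_bg, sum_bg = 0, 0
    for t in range(256):
        weight_bg += histogram[t]
        if weight_bg == 0:
            continue          # nothing in the background class yet
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break             # everything is background from here on
        sum_bg += t * histogram[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        variance_between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if variance_between > best_variance:
            best_variance, best_t = variance_between, t
    return best_t


# Toy "image": half dark pixels, half bright pixels.
print(otsu_threshold([10] * 50 + [200] * 50))
```

The same desk photographed in dim or bright light would then get a different threshold each time, rather than one fixed number that only works for one lighting setup.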
Open file (346.77 KB 696x783 1698709850406174.png)
Open file (199.93 KB 767x728 1698710469395618.png)
I suppose this is a good thread to use for discussing this concept: a swarm of small drones available for a robowaifu's use for enhanced perimeter/area surveillance, etc.
>1.6B parameter model built using SigLIP, Phi-1.5 and the LLaVA training dataset. Weights are licensed under CC-BY-SA due to using the LLaVA dataset. Try it out on Hugging Face Spaces! https://github.com/vikhyat/moondream https://huggingface.co/spaces/vikhyatk/moondream1 https://youtu.be/oDGQrOlmC1s >The model is release for research purposes only, commercial use is not allowed. >circa 6GB or 4GB quantized
>>29286 Thanks. Do you have any views on its usefulness r/n, Anon?
Open file (84.01 KB 960x720 yuina.png)
For people looking for a Kinect, I've had success finding them at electronics recycle centers. RE:PC in Seattle had a big bin. Also, I just checked and they're going for under ten dollars on eBay lol. I had also heard that the Kinect's depth camera isn't all too necessary at this point due to how good neural networks have gotten recently. Is there any merit to that?
>>29911 Unless you're using the kinect to do some sort of 3d mapping you can get stuff like pose landmark detection using AI stuff and a standard webcam, like Gulag's open-source library Mediapipe. https://mediapipe-studio.webapps.google.com/home I use some of their models for object recognition :D
>>29911 >>29915 Thanks for both the great tips, Anons! Cheers. :^)
>>29367 I think we would need workarounds if such models are not fast enough, but wow it needs less than a second to identify common objects in a photo of a room from a home. I guess on a smaller computer it would be slower, but still. This is good enough for now, and it's just a stepping stone. Keep in mind, we don't need it as fast and general as AI in cars. The waifus will mostly look at the same home with the same objects all the time. >>29911 My issue is rather that I don't want to use a device which I can only get from recycling centers. Also, I want two cams which can move on their own and I decide on which distance they are. I guess something like Kudan will be the way to go: https://www.youtube.com/@KudanLimited
A bit odd no one mentioned LiDAR. This would allow for a better sense of depth and objects behind themselves out of ordinary vision to avoid walking backwards into someone or elbowing them.
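As a rough idea of how a 360-degree scan could be used for the "don't back into someone" case: keep only the readings that fall in the rear half-plane and are closer than some safety distance. A minimal sketch, assuming the scanner reports (angle in degrees, distance in metres) pairs with 0 degrees pointing straight ahead. The data format and the 0.5 m default are made up for illustration.

```python
import math


def obstacles_behind(scan, max_range_m=0.5):
    """Return the (angle_deg, distance_m) readings that are behind the robot and too close.

    scan        : list of (angle_deg, distance_m), 0 deg = straight ahead
    max_range_m : readings farther away than this are ignored
    """
    hits = []
    for angle_deg, distance_m in scan:
        # Forward component of the reading; negative means it lies behind the robot.
        forward = distance_m * math.cos(math.radians(angle_deg))
        if forward < 0 and distance_m <= max_range_m:
            hits.append((angle_deg, distance_m))
    return hits


# Hypothetical scan: something 0.4 m directly behind, the rest is clear or in front.
scan = [(0, 0.3), (180, 0.4), (170, 2.0), (90, 0.2)]
print(obstacles_behind(scan))
```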
>>30138 but the cyberninjas wear black
>>30139 Black clothes aren't that black, especially as the dye fades over time. If you want to be picky: since this is just a secondary source of sight, you could trade away resolution and use radar instead, just for general awareness, so she knows to carefully turn and look at what is at a location.
>>30138 To add to my earlier point. I found a diy LiDAR that is supposed to cost $40 to make. https://www.instructables.com/Project-Lighthouse-360-Mini-Arduino-LiDAR/
>>30174 > I found a diy LiDAR that is supposed to cost $40 to make. I'd think that's a game-changer for the mapping need, if it's legit and reliable. Thanks, Anon! Cheers. :^)
>>30180 Considering the usual cost of LiDAR, I am thinking this is a bit less accurate and shorter range, but it's likely still useful for this kind of application. I'm not sure why the developer privated his videos. They might still be viewable through Archive.
>>30189 >Considering usual cost of LiDAR I am thinking this is a bit less accurate and shorter range but it's still useful for this kind of application likely. Yeah makes sense. >Im not sure why the developer privated his videos. In my experience, that's one of the first signs that an opensource system is going closed source. They block the assets from the public b/c """reasons""". >They might be still viewable through Archive. Not sure what that means.
>>30190
>signs that an opensource system is going closed source
He left up the files for making it though. Apparently his whole YouTube channel is gone.
>Not sure what that means.
I found an archived URL for at least one of the videos. The follow-up update video wasn't archived, unfortunately.
https://web.archive.org/web/20210202100801/https://www.youtube.com/watch?v=uYU534Wn4lA
I managed to find a similarly priced one, though it costs a little more, that used to be available as a kit. It appears to be a different design, and the website seems to no longer exist.
https://web.archive.org/web/20211129020703/https://curiolighthouse.wixsite.com/lighthouse
Found that one from a video of some guy assembling it:
https://www.youtube.com/watch?v=_aRcoI25HqE
>>30190
Going down that rabbit hole, YouTube's recommended vids led me to two others.
$44, but this one is single-point instead of 360º:
https://www.dfrobot.com/product-1702.html
This one is $99:
https://www.dfrobot.com/product-1125.html
>>30193 Wait never mind about the curiolighthouse. It seems my browser was just not redirecting to the page properly. That site is still up.
Just found out that 3D cameras for sensing depth are called a "depth camera", "3D depth sensor" or "stereoscopic depth sensor"; sometimes terms like "binocular depth camera" appear. They capture color (some IR too) and depth in a single system, like our vision works. Though if you used one of these premade units, it would mean having only head turning, not eye turning.
>>29915 Started on the kinect lite guide because I don't want giant XBOX 360 bars on my robot's face. And just now after saying it I regret hacking it apart. It's still huge after making it half the size, the length of a smartphone. https://medium.com/robotics-weekends/how-to-turn-old-kinect-into-a-compact-usb-powered-rgbd-sensor-f23d58e10eb0
>>30877 I know this is a stupid question, but can you strip those components right out of the support frame and have them simply connected by the wires?
>>30879 Zoom in to the hole in the centre. Looks like there is a circuit board under there. If one were to take it out of the frame, it would require adding wires and attaching them back to the circuit board, I imagine.
>>30879 >>30880 I expect the physical positioning of the 3 camera components is tightly registered. Could be recalibrated I'm sure, but it would need to be done.
>>30879 >Depth Perception From what I know these systems work so that it knows the distance between the two cameras and this is part of the hardware. If you want to do this yourself then your system would need to know the distance. I think Kudan Slam is a software doing that: >>29937 and >>10646 >Kudan Visual SLAM >This tutorial tells you how to run a Kudan Visual SLAM (KdVisual) system using ROS 2 bags as the input containing data of a robot exploring an area https://amrdocs.intel.com/docs/2023.1.0/dev_guide/files/kudan-slam.html >The Camera Basics for Visual SLAM >“Simultaneous Localization and Mapping usually refer to a robot or a moving rigid body, equipped with a specific sensor, that estimates its motion and builds a model of the surrounding environment, without a priori information [2]. If the sensor referred to here is mainly a camera, it is called Visual SLAM.” https://www.kudan.io/blog/camera-basics-visual-slam/ >.... ideal frame rate ... 15 fps: for applications with robots that move at a speed of 1~2m/s >The broader the camera’s field of view, the more robust and accurate SLAM performance you can expect up to some point. >...the larger the dynamic range is, the better the SLAM performance. >... global shutter cameras are highly recommended for handheld, wearables, robotics, and vehicles applications. >Baseline is the distance between the two lenses of the stereo cameras. This specification is essential for use-cases involving Stereo SLAM using stereo cameras. >We defined Visual SLAM to use the camera as the sensor, but it can additionally fuse other sensors. >Based on our experience, frame skip/drop, noise in images, and IR projection are typical pitfalls to watch out. >Color image: Greyscale images suffice for most SLAM applications >Resolution: It may not be as important as you think >Visual SLAM: The Basics - https://www.kudan.io/archives/433 Edit: Added the tutorial and articles about "Camera Basics" and "Visual SLAM Basics".
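The frame rate rule of thumb quoted above makes more sense when you work out how far the camera travels between two frames, since that is the gap the SLAM tracker has to bridge. A small arithmetic sketch, using the 15 fps and 1~2 m/s figures from the quote. The function name is made up.

```python
def travel_per_frame_m(speed_m_s, fps):
    """Distance the camera moves between two consecutive frames, in metres."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return speed_m_s / fps


# Figures from the Kudan quote: 15 fps, robot moving at 1 to 2 m/s.
for speed in (1.0, 2.0):
    print(f"{speed} m/s at 15 fps -> {travel_per_frame_m(speed, 15) * 100:.1f} cm per frame")
```

So at those speeds the scene shifts by roughly 7 to 13 cm between frames, and a slow indoor robowaifu would probably get away with less.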
Open file (225.52 KB 1252x902 kinectxie.jpg)
>>30877 The kinect was cheap at 12$ and I scaled it to the full sized robot head in gimp. I can use the main camera in the middle of aperture and the two projector/IR camera lenses as the eye shines. It won't look like this in the final robot head, but it will be positioned in this manner.
Will Cogley came out with a snap fit eye mechanism (no screws needed). > By removing ALL fasteners and using a 100% snap-fit assembly, assembly time is cut down 6 fold! Hopefully this design will also be more accessible if you struggle to get the right parts for my projects. If you don’t want to use my new PCB design (which admittedly is a work in progress) refer to [my previous design](https://www.notion.so/Simple-Eye-Mechanism-983e6cad7059410d9cb958e8c1c5b700?pvs=21) for electronics/wiring instructions. > If you do want to use the PCB, note that its still a work-in-progress. The design works although there is an issue with some holes being undersized. In theory the attached file is fixed but I’ve yet to test it myself to be 100% sure! https://youtu.be/uzPisRAmo2s https://nilheim-mechatronics.notion.site/Snap-fit-Eye-Mechanism-b88ae87ceae24d1ca942adf34750bf87
