/robowaifu/ - DIY Robot Wives

Advancing robotics to a point where anime catgrill meidos in tiny miniskirts are a reality.





Robot Eyes/Vision General Robowaifu Technician 09/11/2019 (Wed) 01:13:09 No.97
Cameras, Lenses, Actuators, Control Systems
Unless you want to deck out your waifubot in dark glasses and a white cane, learning about vision systems is a good idea. Please post resources here.
opencv.org/
https://archive.is/7dFuu
github.com/opencv/opencv
https://archive.is/PEFzq
www.robotshop.com/en/cameras-vision-sensors.html
https://archive.is/7ESmt
>=== -patch subj
Edited last time by Chobitsu on 12/27/2024 (Fri) 17:31:13.
>>23776 what is mAP?
>>23777
The diagram indicates it's an accuracy metric used to compare such models.
>Mean Average Precision (mAP)
https://blog.paperspace.com/mean-average-precision/
Found via: https://duckduckgo.com/?q=map+machine+learning+accuracy
>>23776
Very low latency in detection is vital insofar as her autonomous safety is concerned. The ideal is human-level speed at object recognition (or even faster). We're probably getting pretty close on smol devices already, so I predict we'll reach this goal generally by the time the first real-world robowaifus begin rolling out. Thanks Anon.
>>24909
- The computers connected to the eyes (cameras) should have different ways of sharing data with other computers, e.g. just sharing body movement analysis and recognition info as a text stream, and the same for which person was detected or some emotional indicators. Sending photos and videos should be very limited: only send encrypted files, and the system should mostly not store this data. Some home server might store and process some data for fine-tuning, but it needs to receive this data encrypted. The decision about what to share should be made based on overall context coming from the general cognitive architecture >>24783
- Fast and efficient segmentation of images (FPGAs?)
- Different variants of the same image, created very fast, maybe using an FPGA, for further processing, e.g. only processing a low-res partial image of an object to keep track of it. The creation of that low-res partial image should be done by a specialized system close to the cameras.
- Using object detection models informed by context from the general cognitive architecture >>24783, or just based on awareness of what room she's in and maybe even what she's looking at. That way they can be smaller, faster and more specialized, including some models trained on data specific to the household (photos and videos of the home environment).
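A minimal sketch of the "share a text stream instead of video" idea above, assuming some made-up field names; the point is just that only a line of JSON per detection ever leaves the vision computer, never pixels:

import json
import time

def detection_event(label, confidence, box, room="unknown"):
    # Build a compact, image-free event the cognitive architecture can consume.
    return {
        "timestamp": time.time(),
        "room": room,                  # context hint: which room she is in
        "label": label,                # e.g. "person", "cup"
        "confidence": round(float(confidence), 3),
        "box": [int(v) for v in box],  # x, y, w, h in pixels; no pixel data itself
    }

if __name__ == "__main__":
    event = detection_event("person", 0.91, (120, 80, 200, 400), room="kitchen")
    print(json.dumps(event))           # this one line of text is all that leaves the vision computer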
Open file (215.89 KB 869x350 Screenshot_114.png)
Open file (326.25 KB 879x492 Screenshot_113.png)
Open file (162.74 KB 878x396 Screenshot_112.png)
>LERF optimizes a dense, multi-scale language 3D field by volume rendering CLIP embeddings along training rays, supervising these embeddings with multi-scale CLIP features across multi-view training images. After optimization, LERF can extract 3D relevancy maps for language queries interactively in real-time. LERF enables pixel-aligned queries of the distilled 3D CLIP embeddings without relying on region proposals, masks, or fine-tuning, supporting long-tail open-vocabulary queries hierarchically across the volume. >With multi-view supervision, 3D CLIP embeddings are more robust to occlusion and viewpoint changes than 2D CLIP embeddings. 3D CLIP embeddings also conform better to the 3D scene structure, giving them a crisper appearance. https://www.lerf.io https://github.com/kerrj/lerf https://drive.google.com/drive/folders/1vh0mSl7v29yaGsxleadcj-LCZOE_WEWB?usp=sharing https://arxiv.org/abs/2303.09553
> Face recognition
Not tested, just looking at what's available:
https://github.com/cmusatyalab/openface
The following quotes are from Reddit, not from me...
https://github.com/ageitgey/face_recognition
> I have tried this out. It's easy to code and accurately recognizes faces. The problem is it can't even detect faces 1 feet away from the camera.
https://github.com/timesler/facenet-pytorch (FaceNet & MTCNN)
> This can detect and recognize faces at a distance, but the problem is it can't recognize unknown faces correctly. I mean for unknown faces it always tries to label it as one of the faces from the model/database encodings.
https://github.com/serengil/deepface
> I have tried VGG, ArcFace, Facenet512. The latter two gave me good results. But, the problem is I couldn't figure out how to change the detection from every 5 seconds to real-time. Also, I couldn't change the camera source. (If anyone can help me with these please do). Also, it had fps drops frequently.
https://github.com/deepinsight/insightface
> Couldn't test this yet. But in the demo YT video it shows the model incorrectly detecting a random object as a face. If someone knows how well this performs please let me know.
https://www.reddit.com/r/computervision/comments/15ycwom/face_recognition_whats_the_state_of_the_art/
This here seems to be the best: https://github.com/ZoneMinder/zoneminder
The Reddit link above has a thread and a patch for detecting faces at a distance, I think.
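For anyone who wants to try the first library above, here's a minimal sketch with the ageitgey face_recognition package; the image file names are placeholders and the 0.6 tolerance is just the library's default, not a tuned value:

import face_recognition

# Encode one known face from a reference photo (placeholder file name)
known_image = face_recognition.load_image_file("known_anon.jpg")
known_encoding = face_recognition.face_encodings(known_image)[0]

# Detect and encode all faces in a camera frame saved to disk (placeholder file name)
frame = face_recognition.load_image_file("camera_frame.jpg")
locations = face_recognition.face_locations(frame)
encodings = face_recognition.face_encodings(frame, locations)

for (top, right, bottom, left), encoding in zip(locations, encodings):
    match = face_recognition.compare_faces([known_encoding], encoding, tolerance=0.6)[0]
    print(("known" if match else "unknown"), "face at", (left, top, right, bottom))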
Open file (537.86 KB 877x878 LLaVA.png)
LLaVA: Large Language and Vision Assistant (https://llava-vl.github.io/)
A project to integrate vision into large language models. Though still very new and young as a concept, adding visual context to language models has tremendous potential. Notably, a waifu which can understand correlations between what she perceives in her environment and what she is told can lead to much more natural-feeling interactions. Fingers crossed for a fork that implements YOLO (https://pjreddie.com/darknet/yolo/) rather than CLIP (https://openai.com/research/clip) for better compute and memory efficiency. Getting this to run at sub-10 watts should be a goal.
Edited last time by Kiwi_ on 10/11/2023 (Wed) 18:33:59.
Open file (82.60 KB 386x290 Screenshot_158.png)
I was working on this here >>26112, using OpenCL to make video processing faster. Then I got this here recommended by YouTube: https://www.youtu.be/0Kgm_aLunAo
Github: https://github.com/jjmlovesgit/pipcounter
This is using OpenCV to count pips on dominoes, and does it much faster and better than GPT4-Vision. I wonder if it would be possible to have an LLM adjust the code depending on the use case, maybe with a library of common patterns to look out for. Ideally one would show it something new, it would detect the outer border (like the stones here), and then adjust until it can catch the details on all of the objects which are of interest. It could look out for patterns depending on some context, like e.g. a desk.
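For reference, a rough sketch of how pip counting with plain OpenCV could look; this is not the pipcounter repo's actual code, and the Hough circle parameters are guesses that would need tuning per camera:

import cv2

img = cv2.imread("dominoes.jpg")                 # placeholder image path
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
gray = cv2.medianBlur(gray, 5)                   # smooth speckle before circle detection

# Detect small circular blobs (pips); all parameters here are starting guesses
circles = cv2.HoughCircles(gray, cv2.HOUGH_GRADIENT, dp=1.2, minDist=15,
                           param1=100, param2=30, minRadius=4, maxRadius=20)

count = 0 if circles is None else circles.shape[1]
print(f"Pips detected: {count}")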
>>26132 >and does it much faster and better than GPT4-Vision. Doesn't really surprise me. OpenCV is roughly the SoA in hand-written C++ code for computer vision. You have some great posts ITT Anon thanks... keep up the good work! :^)
There are several libraries and approaches that attempt to achieve generalized object detection within a context, although creating a completely automatic, context-based object detection system without predefining objects can be a complex task due to the variability of real-world scenarios. However, libraries and methodologies that have been utilized for more general object detection include:

1. YOLO (You Only Look Once): YOLO is a popular object detection system that doesn't require predefining objects in the training phase. It uses a single neural network to identify objects within an image and can detect multiple objects in real-time. However, it typically requires training on specific object categories.

2. OpenCV with Haar Cascades and HOG (Histogram of Oriented Gradients): OpenCV provides Haar cascades and HOG-based object detection methods. While not entirely context-based, they allow for object detection using predefined patterns and features. These methods can be more general but might not adapt well to various contexts without specific training or feature engineering.

3. TensorFlow Object Detection API: TensorFlow offers an object detection API that provides pre-trained models for various objects. While not entirely context-based, these models are designed to detect general objects and can be customized or fine-tuned for specific contexts.

4. Custom Object Detection Models with Transfer Learning: You could create a custom object detection model using transfer learning from a pre-trained model like Faster R-CNN, SSD, or Mask R-CNN. By fine-tuning on your own dataset, the model could adapt to specific contexts.

5. Generalized Shape Detection Algorithms: Libraries like scikit-image in Python provide various tools for general image processing and shape analysis, including contour detection, edge detection, and morphological operations. While not object-specific, they offer tools for identifying shapes within images.

Each of these methods has its advantages and limitations when it comes to general object detection. If you're looking for a more context-aware system that learns and adapts to various contexts, combining traditional computer vision methods with machine learning models trained on diverse images may be a step towards achieving a more generalized object detection system. However, creating a fully context-aware, automatic object detection system that adapts to any arbitrary context without any predefined objects is still a challenging area of research.

-----------------

In terms of computational requirements, here's a general ranking of the mentioned object detection methods based on the computational power and RAM they might typically require:

1. OpenCV with Haar Cascades and HOG:
- Computational Power Needed: Low to Moderate
- RAM Requirements: Low
- These methods are computationally less intensive compared to deep learning-based models. They can run on systems with lower computational power and memory.

2. Generalized Shape Detection Algorithms (scikit-image):
- Computational Power Needed: Low to Moderate
- RAM Requirements: Low to Moderate
- While these libraries might need slightly more computational power and RAM than Haar Cascades and HOG, they are still less demanding compared to deep learning-based models.

3. TensorFlow Object Detection API:
- Computational Power Needed: Moderate to High
- RAM Requirements: Moderate to High
- Running pre-trained models from the TensorFlow Object Detection API might require more computational power and memory compared to traditional computer vision methods due to the complexity of the deep learning models.

4. Custom Object Detection Models with Transfer Learning:
- Computational Power Needed: Moderate to High
- RAM Requirements: Moderate to High
- Training custom object detection models with transfer learning typically requires moderate to high computational power and memory, especially during the training phase.

5. YOLO (You Only Look Once):
- Computational Power Needed: High
- RAM Requirements: High
- YOLO models are relatively demanding in terms of computational power and memory. They require more powerful machines due to their deep neural network architecture and real-time processing capabilities.

The exact computational requirements and memory usage can vary based on the specific hardware, image sizes, complexity of the models, and the scale of the operations being performed. Deep learning models, in general, tend to demand more computational resources compared to traditional computer vision methods. If you're working with large datasets or real-time processing, more powerful hardware configurations would likely be necessary to achieve optimal performance.

--------

https://github.com/opencv/opencv/tree/master/data/haarcascades

--------

If your goal is to detect shapes without knowing the specific objects at first, OpenCV's contour detection methods combined with image processing techniques could be more appropriate than scikit-learn. Once shapes are identified, further analysis or categorization can be performed using traditional machine learning algorithms from scikit-learn or other methods.
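A minimal sketch of route 2 (Haar cascades), using the cascade files OpenCV ships with from the folder linked above; the image path is a placeholder and the scaleFactor/minNeighbors values are common starting points, not tuned results:

import cv2

# OpenCV bundles the haarcascades folder; this resolves to its install location
cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
detector = cv2.CascadeClassifier(cascade_path)

img = cv2.imread("room.jpg")                     # placeholder image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5, minSize=(40, 40))
print(f"Faces found: {len(faces)}")
for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)   # draw a box per detection
cv2.imwrite("room_annotated.jpg", img)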
>>26146 Understood. If the goal is to identify various objects within a specific context (like a desk) without predefining the objects, and the lighting conditions might vary, a more flexible approach using general computer vision techniques can be applied. This could involve methods such as contour detection, edge detection, and basic image processing techniques to identify objects within the context of a desk. You might use a more generalized version of object detection that isn't specific to particular objects but rather identifies any distinguishable shape within the context. Here's an example:

import cv2

# Read the image
image = cv2.imread('path_to_your_image.jpg')

# Convert to grayscale
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Apply thresholding (or other preprocessing) to enhance object edges;
# Otsu picks a threshold automatically, which helps with varying lighting
_, processed_image = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

# Find contours
contours, _ = cv2.findContours(processed_image, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

some_minimum_area_threshold = 500  # in pixels; tune for your image size

detected_objects = []
for contour in contours:
    # Apply some conditions to filter objects based on size, shape, etc.
    # For instance, you might filter by area or aspect ratio
    area = cv2.contourArea(contour)
    if area > some_minimum_area_threshold:
        detected_objects.append(contour)

# Count and display the number of detected objects
print(f"Number of objects detected: {len(detected_objects)}")

This code applies general techniques such as contour detection to identify distinguishable shapes within the context of the desk. The process of identifying objects relies on the uniqueness of their shapes and their contrast against the background. The challenge in this approach lies in how the algorithm distinguishes objects based on their shapes and sizes. It might not identify specific objects but rather any shape that meets certain criteria (like area, aspect ratio, etc.) within the provided context (in this case, the desk). This method might detect a variety of objects but could also identify false positives or miss some objects. Fine-tuning the conditions for object identification (like area thresholds or other characteristics) can improve the accuracy of detection within the context of the desk, considering the variability in lighting and object characteristics.
Open file (346.77 KB 696x783 1698709850406174.png)
Open file (199.93 KB 767x728 1698710469395618.png)
I suppose this is a good thread to use for discussing this concept: a swarm of small drones available for a robowaifu's use for enhanced perimeter/area surveillance, etc.
>1.6B parameter model built using SigLIP, Phi-1.5 and the LLaVA training dataset. Weights are licensed under CC-BY-SA due to using the LLaVA dataset. Try it out on Hugging Face Spaces!
https://github.com/vikhyat/moondream
https://huggingface.co/spaces/vikhyatk/moondream1
https://youtu.be/oDGQrOlmC1s
>The model is released for research purposes only, commercial use is not allowed.
>circa 6GB, or 4GB quantized
>>29286 Thanks. Do you have any views on its usefulness r/n, Anon?
Open file (84.01 KB 960x720 yuina.png)
For people looking for a Kinect, I've had success finding them at electronics recycle centers. RE:PC in Seattle had a big bin. Also, I just checked and they're going for under ten dollars on eBay lol. I had also heard that the Kinect's depth camera isn't all too necessary at this point due to how good neural networks have gotten recently. Is there any merit to that?
>>29911 Unless you're using the Kinect to do some sort of 3D mapping, you can get things like pose landmark detection using AI and a standard webcam, e.g. Gulag's open-source library MediaPipe. https://mediapipe-studio.webapps.google.com/home I use some of their models for object recognition :D
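In case anyone wants a starting point, here's a minimal sketch with MediaPipe's older "solutions" Pose API and a standard webcam; the newer MediaPipe Tasks API looks a bit different, so treat this as a rough example:

import cv2
import mediapipe as mp

mp_pose = mp.solutions.pose
cap = cv2.VideoCapture(0)                      # standard webcam

with mp_pose.Pose(min_detection_confidence=0.5, min_tracking_confidence=0.5) as pose:
    for _ in range(100):                       # just grab ~100 frames for the demo
        ok, frame = cap.read()
        if not ok:
            break
        results = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))  # MediaPipe wants RGB
        if results.pose_landmarks:
            nose = results.pose_landmarks.landmark[mp_pose.PoseLandmark.NOSE]
            print(f"nose at x={nose.x:.2f} y={nose.y:.2f}")             # coords normalized to 0..1
cap.release()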
>>29911 >>29915 Thanks for both the great tips, Anons! Cheers. :^)
>>29367 I think we would need workarounds if such models are not fast enough, but wow, it needs less than a second to identify common objects in a photo of a room in a home. I guess on a smaller computer it would be slower, but still. This is good enough for now, and it's just a stepping stone. Keep in mind, we don't need it as fast and general as the AI in cars. The waifus will mostly look at the same home with the same objects all the time.
>>29911 My issue is rather that I don't want to use a device which I can only get from recycling centers. Also, I want two cams which can move on their own, where I decide how far apart they are. I guess something like Kudan will be the way to go: https://www.youtube.com/@KudanLimited
A bit odd no one mentioned LiDAR. It would allow for a better sense of depth, and of objects behind her, outside ordinary vision, to avoid walking backwards into someone or elbowing them.
>>30138 but the cyberninjas wear black
>>30139 Black clothes aren't that black, especially as the dye fades over time. If you want to be picky: since it's just a secondary source of sight, you could compromise on resolution and use radar instead, just for general awareness, so she knows to carefully turn and see what is at that location.
>>30138 To add to my earlier point. I found a diy LiDAR that is supposed to cost $40 to make. https://www.instructables.com/Project-Lighthouse-360-Mini-Arduino-LiDAR/
>>30174 > I found a diy LiDAR that is supposed to cost $40 to make. I'd think that's a game-changer for the mapping need, if it's legit and reliable. Thanks, Anon! Cheers. :^)
>>30180 Considering the usual cost of LiDAR, I'm thinking this is a bit less accurate and shorter range, but it's likely still useful for this kind of application. I'm not sure why the developer made his videos private. They might still be viewable through the Archive.
>>30189
>Considering the usual cost of LiDAR, I'm thinking this is a bit less accurate and shorter range, but it's likely still useful for this kind of application.
Yeah, makes sense.
>I'm not sure why the developer made his videos private.
In my experience, that's one of the first signs that an opensource system is going closed source. They block the assets from the public b/c """reasons""".
>They might still be viewable through the Archive.
Not sure what that means.
>>30190
>signs that an opensource system is going closed source
He left up the files for making it, though. Apparently his whole YouTube channel is gone.
>Not sure what that means.
I found the URL for one video at least that was archived. The follow-up update video wasn't archived, unfortunately.
https://web.archive.org/web/20210202100801/https://www.youtube.com/watch?v=uYU534Wn4lA
I managed to find a similarly priced one (though a little more costly) that used to be available as a kit, but it appears to be a different design. The website seems to no longer exist.
https://web.archive.org/web/20211129020703/https://curiolighthouse.wixsite.com/lighthouse
Found that one from a video of some guy assembling it: https://www.youtube.com/watch?v=_aRcoI25HqE
Going down that rabbit hole from YouTube recommended vids led me to two others. This one is $44, but it's a single point instead of 360º: https://www.dfrobot.com/product-1702.html
This one is $99: https://www.dfrobot.com/product-1125.html
>>30193 Wait never mind about the curiolighthouse. It seems my browser was just not redirecting to the page properly. That site is still up.
Just found out that 3D cameras for sensing depth are called a "depth camera", "3D depth sensor" or "stereoscopic depth sensor"; sometimes terms like "binocular depth camera" appear too. They capture color (some IR too) and depth in a single system, like our vision works. Though if you used one of these premade units it would mean having only head turning, not eye turning.
>>29915 Started on the kinect lite guide because I don't want giant XBOX 360 bars on my robot's face. And just now after saying it I regret hacking it apart. It's still huge after making it half the size, the length of a smartphone. https://medium.com/robotics-weekends/how-to-turn-old-kinect-into-a-compact-usb-powered-rgbd-sensor-f23d58e10eb0
>>30877 I know this is a stupid question, but can you strip those components right out of the support frame and have them simply connected to the wires?
>>30879 Zoom in to the hole in the centre. It looks like there is a circuit board under there. If one were to take it out of the frame, it would require adding wires and attaching them back to the circuit board, I imagine.
>>30879 >>30880 I expect the physical positioning of the 3 camera components is tightly registered. Could be recalibrated I'm sure, but it would need to be done.
>>30879
>Depth Perception
From what I know, these systems work by knowing the distance between the two cameras, which is fixed in the hardware. If you want to do this yourself, then your system would need to know that distance. I think Kudan SLAM is a software doing that: >>29937 and >>10646
>Kudan Visual SLAM
>This tutorial tells you how to run a Kudan Visual SLAM (KdVisual) system using ROS 2 bags as the input containing data of a robot exploring an area
https://amrdocs.intel.com/docs/2023.1.0/dev_guide/files/kudan-slam.html
>The Camera Basics for Visual SLAM
>"Simultaneous Localization and Mapping usually refer to a robot or a moving rigid body, equipped with a specific sensor, that estimates its motion and builds a model of the surrounding environment, without a priori information [2]. If the sensor referred to here is mainly a camera, it is called Visual SLAM."
https://www.kudan.io/blog/camera-basics-visual-slam/
>... ideal frame rate ... 15 fps: for applications with robots that move at a speed of 1~2m/s
>The broader the camera's field of view, the more robust and accurate SLAM performance you can expect up to some point.
>...the larger the dynamic range is, the better the SLAM performance.
>... global shutter cameras are highly recommended for handheld, wearables, robotics, and vehicles applications.
>Baseline is the distance between the two lenses of the stereo cameras. This specification is essential for use-cases involving Stereo SLAM using stereo cameras.
>We defined Visual SLAM to use the camera as the sensor, but it can additionally fuse other sensors.
>Based on our experience, frame skip/drop, noise in images, and IR projection are typical pitfalls to watch out for.
>Color image: Greyscale images suffice for most SLAM applications
>Resolution: It may not be as important as you think
>Visual SLAM: The Basics - https://www.kudan.io/archives/433
Edit: Added the tutorial and articles about "Camera Basics" and "Visual SLAM Basics".
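A rough sketch of turning a self-built stereo pair into depth with OpenCV's block matcher; it assumes the two images are already rectified, and the baseline/focal values are placeholder calibration numbers you'd get from cv2.stereoCalibrate:

import cv2
import numpy as np

left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)    # placeholder rectified left image
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)  # placeholder rectified right image

stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
disparity = stereo.compute(left, right).astype(np.float32) / 16.0  # StereoBM returns fixed-point x16

baseline_m = 0.06    # distance between the two lenses (the "baseline" above); from calibration
focal_px = 700.0     # focal length in pixels; also from calibration

valid = disparity > 0
depth_m = np.zeros_like(disparity)
depth_m[valid] = (focal_px * baseline_m) / disparity[valid]   # Z = f * B / d
if valid.any():
    print("median depth of valid pixels:", float(np.median(depth_m[valid])), "m")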
Open file (225.52 KB 1252x902 kinectxie.jpg)
>>30877 The Kinect was cheap at $12, and I scaled it to the full-sized robot head in GIMP. I can use the main camera in the middle of the aperture, and the two projector/IR camera lenses as the eye shines. It won't look like this in the final robot head, but it will be positioned in this manner.
Will Cogley came out with a snap fit eye mechanism (no screws needed). > By removing ALL fasteners and using a 100% snap-fit assembly, assembly time is cut down 6 fold! Hopefully this design will also be more accessible if you struggle to get the right parts for my projects. If you don’t want to use my new PCB design (which admittedly is a work in progress) refer to [my previous design](https://www.notion.so/Simple-Eye-Mechanism-983e6cad7059410d9cb958e8c1c5b700?pvs=21) for electronics/wiring instructions. > If you do want to use the PCB, note that its still a work-in-progress. The design works although there is an issue with some holes being undersized. In theory the attached file is fixed but I’ve yet to test it myself to be 100% sure! https://youtu.be/uzPisRAmo2s https://nilheim-mechatronics.notion.site/Snap-fit-Eye-Mechanism-b88ae87ceae24d1ca942adf34750bf87
> (eye-assembly -related : >>35165 )
> (eye-design -related >>35318, >>35338 )
>>1666 >>8817 >>26306
There seems to be some interest in display "eyes" that don't actually help the robot to see, but probably not enough for its own thread, so for now I'll just park this here.
From this thread on the dollforum: NSFW https://dollforum.com/forum/viewtopic.php?t=189110
Links in thread reproduced here, just in case:
An example of a sexdoll on reddit (NSFW): https://www.reddit.com/r/SexDolls/comments/1gvulh4/video_custom_eyes/
Same doll with a different image for emotion (NSFW): https://www.reddit.com/r/SexDolls/comments/1gxums5/kawaii/
Same doll, different display with moving tongue (NSFW): https://www.reddit.com/r/SexDolls/comments/1gxvwme/omg_thats_good/
A display entry on Amazon. Search "round tft display" as offerings change over time: https://www.amazon.com/gp/product/B0B7TFRNN1/ref=ppx_yo_dt_b_asin_title_o00_s00?ie=UTF8&psc=1
An Instructables article on the software: https://www.instructables.com/TFT-Animated-Eyes/
A tutorial video on YouTube: Master the Round TFT Display on ESP32 and GC9A01 driver with the TFT_eSPI library https://www.youtube.com/watch?v=pmCc7z_Mi8I
OP's results video: https://youtu.be/S-ktv1snsiQ
Uncanny eyes Halloween skull: https://www.instructables.com/Uncanny-Eyes-Halloween-Skull-Animatronic/
GitHub link for large eyes (used in the Halloween skull): https://github.com/dalori/ESP32-uncanny-eyes-halloween-skull
Large eyes tutorial on YouTube: https://youtu.be/G2RZFX-qwnI
>>35511 This is definitely the correct thread, Robophiliac. >pic Care to >tl;dr what we're looking at here a bit more? The one on the right certainly looks pretty suited as a static eye. Can it 'move'? What about the left one? TIA.
>>35511 >>35518 > what we're looking at here a bit more? Sorry, it's a size comparison with a semispherical doll eye; to show it's pretty much a drop-in replacement fit. If I wanted to go that route, they would fit nicely in the heads I'm getting to modify.
>>35519 Ah, got it thanks Robophiliac. Once you have an assembly together, would you mind posting clip(s) of these eyes 'in action' please? It might help all of us to understand your approach better. Cheers. :^)
>>35318 >>35338
>Most animatronic eyes use a central pivot point in the eyeballs, greatly reducing the available area for a camera.
They were designed as props, not robots. Only some mods to the InMoov design and a few others are intentionally "camera friendly", and only some also have eyelids. There is an InMoov mod for the EZ-Robot hardware I have, but that system uses a single camera, and the mod doesn't include eyelids.
Among the security cams the main concerns are size, range of focus, the ability to continuously view the signal live (preferably via wire) and the presence of microphones for possible use as "ears". Any suggestions appreciated.
>effective, highly capable (and 'sovlful') stereoscopic eye designs; including the accessory 'tissues' (lids, brows, &tc.)
As it so often seems, the biggest hurdle to finding something online is figuring out what that thing is called, so you know what to look for.
https://www.ebay.com/sch/i.html?_nkw=Mini-CCTV-Camera-Security-Micro-Audio-Wired
If you have other ideas for search terms, go ahead and add them. As you can see from the search results, many of the cameras on offer will easily fit in the area in front of the central pivot of many animatronic eye designs. Most also have RCA-type output connectors, but there are RCA-to-USB adapters available, so you could use them with OpenCV or any other system that takes a USB feed.
Now that there are cameras to choose from, beyond considerations of actual size, the presence of a microphone, power consumption and any other features the camera may have, one area hardly ever discussed is resolution vs. computing power. How high a resolution can your robot process? If you are using an SBC, will it be able to process the video signal(s) and perform other tasks at the same time? Can it walk and chew gum?
If you are using a tethered system sending video and control signals back and forth wirelessly to a more powerful computer, it may be necessary to use a low-resolution camera system (or more than one data channel) to avoid "buffering" of the data flow. We don't want the robot to walk into a wall that it saw, but didn't get the message to turn away from in time. Yes, we could install collision sensors and an automatic stop function, but then we could be getting "pauses" every time the buffering situation occurred, during various tasks. This would be very non-human-like, and annoying. So, the problem becomes: how much resolution do we want/need vs. computing power and its $ price?
One question immediately occurs: would it be possible to change the resolution on the fly by using a software "switch" to tell the processing computer to drop every other bit (pixel?), or to process only every 3rd or 4th bit? Or to go to black-and-white for most operations?
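On that last question: in software it could be as simple as a flag in the capture loop. A minimal sketch with OpenCV, where low_power_mode is a made-up switch your scheduler would flip:

import cv2

cap = cv2.VideoCapture(0)    # an RCA-to-USB adapter usually shows up as a normal capture device
low_power_mode = False       # the "software switch"; flip it from a load monitor or scheduler

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    if low_power_mode:
        frame = cv2.resize(frame, None, fx=0.25, fy=0.25)   # keep roughly every 4th pixel
        frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)     # drop colour
    # ...hand `frame` to the vision pipeline here...
    cv2.imshow("eye", frame)
    if cv2.waitKey(1) & 0xFF == 27:    # Esc to quit
        break
cap.release()
cv2.destroyAllWindows()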
Open file (57.12 KB 1083x584 termvision.jpeg)
>>35649 >One question immediately occurs; would it be possible to change the resolution on the fly by using a software "switch" to tell the processing computer to drop every other bit(pixel?), or to process only every 3rd or 4th bit? Or to go to black-and-white for most operations? Makes me think of Terminator Vision
Here's a good, albeit outdated tutorial on computer vision. https://www.societyofrobots.com/programming_computer_vision_tutorial.shtml
Just started going through this thread. Lots of options, and it depends on the scope of the project I guess. Sounds like you can do it the old way with things like depth sensors using ultrasonic or LiDAR, but then you have to program all the spatial reasoning yourself. Spatial reasoning models look like they are just taking off, though. For now, from what I've seen (like the link below), most are clipping frames, downsizing using ffmpeg, and then passing them to a vision model for image details. You could do that with a Qwen2-VL 2B and pass the result to a larger model, or fine-tune one, depending on scope again. But that doesn't give you spatial reasoning. https://www.youtube.com/watch?v=QHBr8hekCzg Hopefully over the next year open-weight models will be released, and at some point a full multi-modal model for text, audio and video reasoning will be within Nvidia Jetson range. Am I off here, or is that basically the current state?
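For the "clip frames, downsize, pass to a vision model" step, here's a rough sketch doing it with OpenCV instead of ffmpeg; the one-frame-per-second rate and 448px width are just assumptions that roughly match common VLM input sizes:

import cv2

cap = cv2.VideoCapture("clip.mp4")          # placeholder video file
fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
step = max(1, int(round(fps)))              # sample roughly 1 frame per second

i = saved = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if i % step == 0:
        h, w = frame.shape[:2]
        small = cv2.resize(frame, (448, max(1, int(448 * h / w))))   # downsize before the VLM
        cv2.imwrite(f"frame_{saved:04d}.jpg", small)                 # hand these to Qwen2-VL / LLaVA etc.
        saved += 1
    i += 1
cap.release()
print(f"saved {saved} frames")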
>>35649
Interesting ideas, Robophiliac. Thanks!
>variable-resolution encoding
I think some wizardry with ffmpeg or other codec systems might provide you with that 'on-the-fly' variability, Anon? Maybe have two SBCs dedicated to the vision tasks onboard? Good luck, Anon! :^)
>>35650
>pic
I lel'd a little. I've wondered at this oft-repeated trope over the years (this film was made in the '80s sometime, I think). Why would they think a robowaifu (or terminator, in this case) would want to see a text overlay on its visual field like it was playing some kind of vidya? :D
>>35649
>Mini-CCTV-Camera-Security-Micro-Audio-Wired
Thanks. Good find. In the past I also looked at such small cameras, but those were for model airplanes. I think they were analog, and it would've been a bit tricky to get the signal encoded into digital.
>would it be possible to change the resolution on the fly by using a software "switch" to tell the processing computer to drop every other bit (pixel?), or to process only every 3rd or 4th bit? Or to go to black-and-white for most operations?
This would be great. I had similar ideas, but rather for the computer next to the camera. Maybe some FPGA that can switch between different modes, idk? My vague idea was that the computer, or several small ones, would convert the picture very fast into various formats and cuts: at least several resolutions down to very low ones, maybe removing the color, also only the center or certain parts of the picture. Maybe there's also a technique to change the colors in a certain way, so that an object in a color you are looking for sticks out more. Focus, e.g. cutting out faces or objects, would require a fast adaptive system, but the other operations should be done by something very fast and energy efficient. Maybe an ASIC, I guess. Then the system downstream would not look at video data the whole time, but only analyze the lowest amount of data needed to figure out what's going on.
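A quick sketch of the "several variants of the same frame" idea on a CPU (an FPGA/ASIC would do the same transforms in hardware); the variant sizes here are arbitrary examples:

import cv2

frame = cv2.imread("frame.jpg")     # placeholder frame from the eye camera

variants = {"full": frame}
variants["half"] = cv2.pyrDown(frame)                                       # 1/2 resolution
variants["quarter"] = cv2.pyrDown(variants["half"])                         # 1/4 resolution
variants["grey"] = cv2.cvtColor(variants["quarter"], cv2.COLOR_BGR2GRAY)    # colour removed

h, w = frame.shape[:2]
variants["centre"] = frame[h // 4: 3 * h // 4, w // 4: 3 * w // 4]          # middle crop only

for name, img in variants.items():
    print(name, img.shape)          # downstream systems pick the smallest variant that suffices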
>>35660 My headcanon is that since the neural net CPUs of the Terminators were like human brains, it helped them in some way to visually see that information.
>>35651
Thanks, GreerTech!
>>35652
>Am I off here, or is that the current state basically?
I think it's a good idea to experiment with a current NVIDIA Jetson board if you can do so, Anon. As to the camera, I'd say just pick the smolest one that gets the job done & is compatible with your processing board. This is an area that is under heavy R&D, so I wouldn't worry too much about waiting until "just the perfect choice" comes out. Good luck, Barf.
>>35667
>pic
Cute. :^)
>>35680
Hehe, makes sense.
Haven't gone through all the threads yet, but here's a good repo of code and prints. It shows the frame processing for a visual LLM at about a frame per second, and shows it doing reasoning a bit. https://www.youtube.com/watch?v=0O8RHxpkcGc/&t=14m09s https://openroboticplatform.com/library https://github.com/NikodemBartnik/Machine-Learning-Robot I only have an ESP8266 and my main PC to start, but if I ever get that far, I might get a Jetson.
