/robowaifu/ - Robot Eyes/Vision General


Robowaifu Technician 10/05/2019 (Sat) 02:40:00 No.1107

>Unless you want to deck out you're waifubot in dark glasses and a white cane
But, OP, what if the model for the waifubot is supposed to be blind? #triggered

Robowaifu Technician 10/05/2019 (Sat) 02:42:35 No.1108

>>97
The inmoov project managed to use the playstation move(?) to give the robot vision, the script is freely available for it. They also use purchasable cameras for the eyes.
inmoov.fr/eye-mechanism.

https://www.invidio.us/watch?v=H4Z09edx52E

Robowaifu Technician 10/05/2019 (Sat) 02:43:34 No.1109

>>1107
KEK

>>1108
That's cool anon thanks. I had planned on using the open sauce version of kinect (Willow Garage) and a couple of hi quality 1080p webcams.

Robowaifu Technician 10/05/2019 (Sat) 02:44:53 No.1110

I just purchased a couple of the JeVois cameras. I plan to try using them on a little moebot. What's neat about them is that they have a dedicated 4-core CPU running Linux and vision software like OpenCV right in the camera. This pretty much entirely offloads the vision processing computation from the robowaifu's other onboard processors. Thanks to the anon in the other thread for first posting the camera for us.

https://www.invidio.us/watch?v=7cMtD-ef83E

Robowaifu Technician 10/05/2019 (Sat) 02:46:32 No.1111

>>1107
Blindfolded girls a cute.

On topic: a new Google vision kit
www.google.com/amp/s/www.theverge.com/platform/amp/2017/11/30/16720322/google-aiy-vision-kit-raspberry-pi-announce-release

Robowaifu Technician 10/05/2019 (Sat) 02:47:54 No.1112

>>1111
Thanks anon, I'll check it out come New Year. If I feel it doesn't feed their botnet, I'll recommend it. Either way, I'll post back here and let you know what I think about it in roughly a month or so.

>ed. surely anon will deliver...

Robowaifu Technician 10/05/2019 (Sat) 02:48:25 No.1113

facial recognition convo:
[[1129

Robowaifu Technician 10/05/2019 (Sat) 02:49:19 No.1114

Anon linked a nice OpenCV training source:
[[1154

OpenCV tutorial site
[[1161

JeVois camera set up:
[[1163

Robowaifu Technician 10/05/2019 (Sat) 02:50:13 No.1115

https://www.invidio.us/watch?v=BaWostkMClA

Robowaifu Technician 10/05/2019 (Sat) 02:52:23 No.1116

I just found this;
hackaday.com/2018/02/15/student-3d-prints-eyes/

Robowaifu Technician 10/05/2019 (Sat) 02:54:57 No.1117

I'm thinking of using an acrylic light guide / light splitter that reads a single image from a TFT display and duplicates it for two small eyeballs. The eyeballs can be physical, so this will save some logic and wonkiness and I can just focus on making a nice retina pattern; the light guide can be a bunch of optical fibers. Pic is supposed to be a Lego piece, wish there was a better illustration.

Robowaifu Technician 10/05/2019 (Sat) 02:56:21 No.1118

>>1117
Sounds interesting FluffyDev, can't say I really 'get it' specifically just yet, but I am familiar with optical fibers.

Robowaifu Technician 10/05/2019 (Sat) 02:58:08 No.1119

>>1107
>>1111
>>1118
Upon VERY SERIOUS consideration, one-eyed girls may work for our purpose, either temporarily one-eyed or even permanently, since the space behind the other eye can be used for additional electronics and actuators (much needed space especially for a smaller 80cm bot such as mine). A tube guide consisting of either solid plastic tubing or bundled optical fibers and a mirror system will make it possible to have both a real camera vision system as well as fancy pupil/retina graphics. Plus with a physically moving eye, the camera will be able to pan also. (Yes it will be a huge eye mechanism, hence the SERIOUS CONSIDERATION of making the waifu one-eyed on purpose). But she will actually have vision which you can display on the debug computer!

Robowaifu Technician 10/05/2019 (Sat) 02:59:00 No.1120

>>1119
>not just having a leela robowaifu
plebeian pls

Robowaifu Technician 10/05/2019 (Sat) 02:59:31 No.1121

>>1119
Eye patch is beauty

##+9XbDi 10/05/2019 (Sat) 03:02:59 No.1123

>>1122

Robowaifu Technician 11/30/2019 (Sat) 21:42:47 No.1657

I'm not really a fan of this eye mechanism since it doesn't support a camera but it could be redesigned to support one. The eyes look pretty good and the process could be adapted to creating anime doll eyes too.

>How to Make Realistic Eyes Using 3D Printing for Animatronic Eye Mechanisms
https://www.youtube.com/watch?v=RqZRKUbA_p0

>How to Build a Simple 3D Printed Arduino Animatronic Eye Mechanism
https://www.youtube.com/watch?v=Ftt9e8xnKE4

Robowaifu Technician 12/01/2019 (Sun) 06:43:54 No.1664

>>1657
Yes, I think I remembered that one from before. I like the face the parts are mostly 3D-printable with a cheap printer. Can probably use a resin-based UV printer for the higher-precision parts like the eyeballs/lids themselves. I don't see any reason small cameras couldn't be fitted inside the interior globes of the eyeballs anon. Even the JeVois cameras (that have actual tiny fans on them b/c Linux+OpenCV coprocessor right on board) should work with the half-open design of the eyeball.

Robowaifu Technician 12/01/2019 (Sun) 06:44:37 No.1665

>>1664
>I like the fact*

Robowaifu Technician 12/01/2019 (Sun) 08:26:36 No.1666

>>1657
reminds me of the Robot from Lexx

Robowaifu Technician 07/22/2020 (Wed) 04:55:06 No.4335

Here's one with two cameras and very little space: https://youtu.be/DY4as7Lc9KY My own thinking is that ideally we should be able to take eyeballs out from the front. The sockets should be expandable for maintenance. If taken out, the servos should be replaceable as well. But then, I'd like my waifu to be waterproof (shower) and she should have tears. So the servos need to be separated with a layer of silicone somehow, maybe they move a magnet and the eyeballs have some metal inside?

Robowaifu Technician 07/22/2020 (Wed) 13:36:03 No.4337

>>4335 >that ideally we should be able to take eyeballs out from the front. The sockets should be expandable for maintenance. That's a great idea Anon. It will allow for easy upgrading of the cameras, and since the control mechanisms will be moving thousands of times a day (even if only over small distances), it also gives us maintenance access to them. I propose that the entire assembly be able to slide forward on a tray to afford easy access from most angles.

Robowaifu Technician 07/22/2020 (Wed) 17:28:39 No.4341

>>4337 That sounds good, but it would add more moving parts. Might make sense in some designs, but I'd prefer to avoid it. Maybe you're thinking of a moe bot with small eyeballs? If the eyeballs in a bigger one are quite huge and you can get the eyeball out first, then you would need to cut through some silicone, then getting the servos out should work as well. My point is the balls should be held only by the eye socket (circle) as part of the skull, but if needed it could expand and form a bigger circle.

Robowaifu Technician 07/22/2020 (Wed) 18:43:57 No.4346

>>4335 After thinking about this a bit more, it came to mind that using a magnet connected to the servo and a metal part on the eyeball, while the eyeball is also connected to a cable, might not work. I'm concerned that the position of the eyeball couldn't be controlled very well. However, I tested it with a hollow flexible ball and a weak magnet (without servo). Now I can imagine it better and I'm more confident. However, it might need an automatic readjustment mechanism. In case the servo loses connection to the eyeball, it might move without the eyeball and then not know where it is.

Robowaifu Technician 07/22/2020 (Wed) 19:27:32 No.4351

>>4341 >That sounds good, but it would add more moving parts. This is a good example of the basic eye-control mechanism I have in mind: >>1657 (1st pic) The access mechanism would simply add a slide-out frame for the entire assembly to rest on. >then you would need to cut through some silicone, I would propose a design that had firm plastic 'frames' around the eye sockets, with a clearly-delineated seam that wouldn't require any special treatment to slide the eyes out for access.

Robowaifu Technician 07/22/2020 (Wed) 20:00:58 No.4355

>>4351 Okay, that's different from mine, bc it will probably not be a seamless skin or plastic face on the outside. Which is okay, we need different approaches for every taste and use case. In such a case you might want to embrace the seams and color the face in different colors, a bit like Marvel's Nebula for example.

Robowaifu Technician 07/22/2020 (Wed) 20:44:49 No.4362

>>4355 Yeah, maybe a good idea. I'm not an art designer per se, but I'll dabble around with different styles once my efforts have matured a bit.

Robowaifu Technician 07/26/2020 (Sun) 06:06:30 No.4541

Here's a lot of discussion about moving eyes, following eyes and blinking eyes. https://dollforum.com/forum/viewtopic.php?f=6&t=103291 They're having a lot of trouble with space. Putting all kinds of stuff into a skull won't be easy. I thought about how to get the whole eye movement contraption smaller, though the eyeballs would need to be bigger than human ones for anime eyes. One thing is using servos as small as possible, not these bulky ones. If we can do it with normal DC motors then those would be even better. Of course, they are not so precise... I also thought I had a good idea, by putting one motor in the quite big eyeball. Problem is, I'd like to have the noise dampened. I also had the idea of using one motor for both eyes before, but human eyes can move independently. However, this does not apply to up/down movement, so we are down to three small motors then. The one in the middle would need to have an axis in both directions. Even better would be if the motors could be a bit deeper in the skull and also be used for something else (just a crude idea yet).
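
The three-motor layout described above (one pan motor per eye, one tilt motor shared by both) can be sketched as a small gaze calculation. This is only an illustration under assumptions that are not from the post: the function name `gaze_angles`, the coordinate frame, and the 6 cm default eye spacing are all hypothetical, and a real mechanism would need its own calibration.

```python
import math

def gaze_angles(target_x, target_y, target_z, eye_spacing=0.06):
    """Return (left_pan, right_pan, shared_tilt) in degrees.

    Coordinates are in metres, measured from the midpoint between the
    eyes: x to the robot's right, y up, z straight ahead. eye_spacing is
    the distance between the eyeball centres (hypothetical default).
    """
    if target_z <= 0:
        raise ValueError("target must be in front of the eyes")
    half = eye_spacing / 2.0
    # Each eye pans on its own, so both can converge on a nearby target.
    # The left eye sits at x = -half, the right eye at x = +half.
    left_pan = math.degrees(math.atan2(target_x + half, target_z))
    right_pan = math.degrees(math.atan2(target_x - half, target_z))
    # One tilt motor drives both eyes, so there is a single up/down angle,
    # computed from the midpoint between the eyes (an approximation).
    horizontal = math.hypot(target_x, target_z)
    shared_tilt = math.degrees(math.atan2(target_y, horizontal))
    return left_pan, right_pan, shared_tilt

# Looking at a point 30 cm straight ahead: the eyes turn slightly inward.
print(gaze_angles(0.0, 0.0, 0.3))
```

Positive pan means "toward the robot's right", so for a target dead ahead the left eye gets a small positive angle and the right eye the same angle with the opposite sign.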

Robowaifu Technician 08/18/2020 (Tue) 15:16:42 No.4777

I just listened to this interview by chance, while doing some chores: https://youtu.be/bnsgsPjILyQ - Very fascinating. It's about how image recognition needs to work so the system can think about it. It's one model that looks for different things in an image. It's inspired by neuroscience. The idea is that perception and cognition can't be disconnected from each other. Natural signals, segmentation and top-down controllability are the keywords; the latter means, for example, when we're zooming into a picture in our minds.

Robowaifu Technician 08/18/2020 (Tue) 22:43:13 No.4786

>>4541 >Putting all kinds of stuff into a skull won't be easy. Yeah, I think that's mostly a misguided idea. Trying to bio-mimic absolutely everything that's been so elegantly designed into humans is, well, humanly impossible. :^) Better to keep most of the componentry safely protected inside the torso, etc, IMO. >>4777 >perception and cognition can't be disconnected from each other. Yeah, that's probably correct. Certainly it seems to ring true with some of the positions Carver Mead suggests for the field of Neuromorphic Computing he basically originated. Much of what we think of as 'cognition' is in fact neurological at a basic level instead of a higher level, and the perceptions are pushed as far out towards the extremity of sensory perception as possible in most of the higher life forms' biological systems.

Robowaifu Technician 08/19/2020 (Wed) 05:05:35 No.4792

>>4786 IMHO at least one relevant computer should be in the head, to imitate humans. Also, stuff we have to put into the head isn't only that, but we'll need a lot of mechanisms in general there, so space matters. Think of facial expressions, microphones, speakers (mb in the throat), heating for the skin, tongue moving around while still leaving some space... Cleaning mechanisms... Okay, this is going OT towards >>9 (face/head general). Further discussion on what to put into the head maybe better there?

Robowaifu Technician 08/19/2020 (Wed) 21:19:42 No.4801

>>4792 Yup, all good points Anon.

Robowaifu Technician 10/15/2020 (Thu) 12:37:32 No.5714

Boards with cameras attached came up in the thread on SBCs, here: >>5705 OAK from OpenCV and a cam from JeVois where the computer is part of the camera. Fascinating, but might be a problem if one wants to put it in eyeballs and also make those waterproof. OAK seems to be a bit big and the cams from JeVois have air coolers... On the other hand, for development my concerns might be irrelevant, since one can build something with them and replace them later with something smaller and cooler. The JeVois camera has a shutter sensor with an inertial measurement unit and digital motion unit, gyroscope and all kinds of sensors, wow: https://youtu.be/MFGpN_Vp7mg

Robowaifu Technician 01/27/2021 (Wed) 12:47:34 No.8309

Here's a video on eye movement. https://youtu.be/FaC2RXBss2c The human eye has six muscles, it can even roll sideways a bit. However, what always bothered me, is that so many fembot eyes can move independently up and down. I still think this isn't necessary. I'll look for a motor with two axes, for up and down movement.

Robowaifu Technician 02/20/2021 (Sat) 15:47:38 No.8681

>related xpost >>8659

Robowaifu Technician 03/29/2021 (Mon) 13:45:02 No.9296

Someone with no maths/science background looking to get into openCV or just computer vision in general. My problem would be that I don't know how to approach the topic. I haven't looked into any code yet, because I want to learn how openCV works first. Though now that I'm typing it out, it does sound stupid to learn how openCV works without looking at the code. I thought to start by looking into the various algorithms but I hardly found any on DDG. Found this one site: https://www.upgrad.com/blog/computer-vision-algorithms/ but the more I look into the unknown terms, the more I went down a rabbit hole of unknown math terms. How do you guys recommend I get started with openCV? Should I only tackle algorithms when I see/need them in youtube tutorials I watch? What are some great resources to help get into openCV? I don't want to turn into a code monkey who only copies and pastes code to get their CV project working...

Robowaifu Technician 03/29/2021 (Mon) 17:14:20 No.9297

>>9296 I wanted to let you know I saw your post, and I'll attempt an answer suited to it. But that will be a while till I can. In the meantime, mind giving us a little better idea of both your experiences & level in technical work & programming? Do you know C++ already for example? Also, do you already have some small cameras or mobile platforms available, say like an RC car or something? It's OK if you don't have any of this stuff. You can get by without any at the beginning. But letting us know your situation will assist us with giving you a better answer.

Robowaifu Technician 03/29/2021 (Mon) 18:56:05 No.9298

>>9296 OK, the obvious first step is simply to get your news straight from the horse's mouth Anon. https://web.archive.org/web/20210308213203/https://docs.opencv.org/master/ There are tutorials there that will get you up and running quickly. The library's software is written in the C++ programming language. So, if you really want to dig into their system as an engineer, then I'd recommend you take that path through both the tutorials and the codebase. This will help you with gaining a deeper understanding of the library itself if you do. https://github.com/opencv/opencv Additionally, if you choose this path, then I can also be of more assistance to you here since I have some experience both with C++, and with using OpenCV in the context of C++ engineering aimed at some basic image processing tasks. OTOH, if your intent is more as a hobbyist simply exploring the ideas for CV that are out there, then there are both Python & JS tutorial pathways as well. I'm pretty sure Java is supported as well. The APIs of the system are quite similar for all the languages. The basic, fundamental idea to get your head around to start with is that OpenCV treats images as a big matrix (as in, the linear algebra matrix) of data. All image-processing operations orbit around this fundamental paradigm. Get your head around that notion from the beginning, and the rest of the library can quickly come into focus for you. Given your post, it sounds like perhaps you are just trying to get your head around the general field itself for the time being. In that case, you can just keep scouring the Internet for general articles, various forum posts, all the other imageboards like /robowaifu/ (that's a joke BTW :^), blogs, and various YT feeds. There are also many scientific papers on the subject. Eventually you'll get the hang of it Anon. So it kind of depends on both your experience level and where you want to go with this as to what I'd suggest to you here Anon. 
Others here may have alternate perspectives they might choose to share with you. >=== -various prose edits

Edited last time by Chobitsu on 03/30/2021 (Tue) 02:05:18.
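
To make the "an image is just a big matrix" idea concrete, here is a minimal sketch in plain Python with no libraries. It is an illustration, not OpenCV code: the helper names `to_gray` and `threshold` are made up for this example. In the real Python bindings an image loaded with `cv.imread()` is a NumPy array indexed as rows x columns x channels, with channels in BGR order by default, and the grayscale weights below are the standard luma weights.

```python
# A tiny 2x3 "image": rows of pixels, each pixel a (blue, green, red) triple.
image = [
    [(255, 0, 0), (0, 255, 0), (0, 0, 255)],
    [(0, 0, 0), (128, 128, 128), (255, 255, 255)],
]

def to_gray(img):
    """Collapse each BGR pixel to one brightness value (standard luma weights)."""
    return [[round(0.114 * b + 0.587 * g + 0.299 * r) for (b, g, r) in row]
            for row in img]

def threshold(gray, cutoff):
    """Binary threshold: 255 where the pixel is brighter than cutoff, else 0."""
    return [[255 if v > cutoff else 0 for v in row] for row in gray]

gray = to_gray(image)          # still a matrix, now one number per pixel
mask = threshold(gray, 100)    # still a matrix, now only 0 or 255
rows, cols = len(gray), len(gray[0])

print(rows, cols)
print(gray)
print(mask)
```

Every step takes a matrix in and hands a matrix back, which is the pattern most of the tutorials follow.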

Robowaifu Technician 03/30/2021 (Tue) 02:14:04 No.9312

>>9297 >experiences & level in technical work & programming? Not much I'd say. Been using python for a little over a year learning/doing web scraping and GUI building (with pyqt5). >Do you know C++ already? I forgot to mention but I was referring to openCV in python, would I still need to know C++ for that? I did go from C++ to python so it's not completely unknown to me. >do you already have some small cameras or mobile platforms available? The only thing I have is a USB webcam. >>9298 I'd say my intent is more as a hobbyist. I just want to create programs for myself right now. >trying to get your head around the general field itself for now. Yeah. I guess as with anything, patience is the key. Thanks, anon.

Robowaifu Technician 03/30/2021 (Tue) 04:16:59 No.9316

>>9312 OK, that's all fine. Python and a webcam is just fine. I'd suggest you simply work through getting the system installed now, and working through all the Python tutorials. You can use some of them with your camera as well. https://web.archive.org/web/20201102214056/https://docs.opencv.org/master/d6/d00/tutorial_py_root.html Once you've worked through that as far as you can, then I would advise you to take a break from it all for a bit and then reconsider what you've learned and where you want to go next. I wasn't exactly clear yet Anon; do you think you're wanting to make your own robowaifu eventually?

Robowaifu Technician 03/30/2021 (Tue) 04:42:55 No.9317

>>9316 Thanks for the advice, anon. >do you think you're wanting to make your own robowaifu eventually? Yes.

Robowaifu Technician 03/30/2021 (Tue) 05:33:29 No.9321

>>9317 YW, and glad to hear it. Good luck with your robowaifu's design Anon!

Robowaifu Technician 04/12/2021 (Mon) 10:36:19 No.9793

Trying to do some computer vision experiments/projects on python with my xbox 360 kinect. Installed/built freenect module via https://github.com/OpenKinect/libfreenect and https://github.com/amiller/libfreenect-goodies. However, I can't seem to find any documentation on what functions it has or what one can do with it. All I have are the example scripts in those github repos. Is there any documentation or guides on this?

Robowaifu Technician 04/12/2021 (Mon) 11:31:37 No.9795

>>9793 >Is there any documentation or guides on this? I haven't looked into this yet Anon, but Willow Garage (of OpenCV, etc. fame) had the original open-sauce libraries surrounding 360 Kinect et al cameras for stereoscopy. You might research these venues for more information, if not documentation on your particular setup.

Robowaifu Technician 04/13/2021 (Tue) 03:18:54 No.9828

>>9795 I see. Thanks Anon, I'll look into Willow Garage.

Robowaifu Technician 04/20/2021 (Tue) 15:14:12 No.9970

>>9296 A wall has been hit! Doesn't look like I'll get past contours https://web.archive.org/web/20201102214056/https://docs.opencv.org/master/d6/d00/tutorial_py_root.html in the image processing sub-topic or any other sub-topic after without knowledge in mathematics. What do I need to get a deep understanding of to move forward with computer vision (and machine learning)? Apart from the obvious linear algebra, of course. Who would've thought, making fun of math classes back in school was not the way to go.

Robowaifu Technician 04/20/2021 (Tue) 15:16:29 No.9971

>>9970 Correction: I can move forward without math knowledge but it will limit my understanding to just knowing what the code does. I won't have any idea how the code does what it does.

Robowaifu Technician 04/20/2021 (Tue) 15:35:38 No.9973

>>9970 I don't know what exactly you need, but there are YouTube lessons, Coursera (also as Torrents) and the Brilliant app.

Robowaifu Technician 04/20/2021 (Tue) 15:53:22 No.9976

>>9971 >I wont have any idea how the code does what it does. I hesitate to encourage you in this one way or other beyond what I've already done ITT: >>9298 >So, if you really want to dig into their system as an engineer, then I'd recommend you take that path through both the tutorials and the codebase. This will help you with gaining a deeper understanding of the library itself if you do. However, I'm well aware that can be a literally years-long process, with no guarantees of success. The journey itself has plenty of rewards IMO > One of the basic lessons to learn early in life -- particularly in an engineering/design life -- is: Keep moving forward. That's it. Just don't quit Anon. Very likely if you can imagine something, then it can actually be done. No guarantees how long or short the journey to success may turn out to be however. This depends in large part on yourself, and the groundwork of effort you've already laid for/in yourself. Keep looking for another way past that mountain Anon. Over, under, around, or through ... you can make it! And as far as creating your own robowaifus goes, you could do worse than making your way here to our board. You're already a bit ahead of the game in that regard, Anon. Now, with that out of the way :^), can you clarify a bit more? >contours and the link you provided have a bit of a disconnect. What about 'contours' is your current problem, can you explain specifically?

Robowaifu Technician 04/20/2021 (Tue) 16:16:25 No.9979

>>9976 >and the link you provided have a bit of a disconnect. What about 'contours' is your current problem, can you explain specifically? >UD: I presume this is the link of interest here, Anon? https://web.archive.org/web/20201028001740/https://docs.opencv.org/master/d3/d05/tutorial_py_table_of_contents_contours.html

Robowaifu Technician 04/21/2021 (Wed) 03:28:22 No.10018

I just want to summarize my observations so far regarding this topic. I've been looking at various self-driving robot/RC car projects by students on YouTube. They usually fall into the following:
1. Raspberry Pi + OpenCV. The video from the camera has to be segmented, i.e. turned into grayscale, edge-detected, and lane markers drawn. You then just need to draw a line right through the center of the road; that will be the steering vector which you then send to the steering servo. I haven't touched OpenCV yet though, and I might just bypass the Raspberry Pi entirely since the trend is to use Nvidia Jetson boards now, which have their own preinstalled things.
2. Nvidia Jetson (Nano or other boards). There's the Jetracer: https://github.com/NVIDIA-AI-IOT/jetracer which comes with its own precompiled software as a Jetcard image. (I was able to secure a TT02 chassis for only 90 bucks, it's going to be delivered any day now so I can't wait to start on this. I find it easier to customize hobby kits as you're building them rather than risk destroying a ready-to-run with irreversible changes.)
Now does this apply to robowaifus? Well if she is on wheels at least, the steering vector would be fed into a differential H-bridge DC motor driver. Now that's just one side of robot vision -- the ability to navigate. How about FPV? Though I don't have a drone (I don't want to start a hobby where EVERYTHING comes from China, unlike RC cars at least), I've been looking at various FPV setups. They usually go like this:
1. 1000TVL type of analog FPV camera + OSD (onscreen display board, usually based on MAX7456) + VTX (video transmitter, usually an Eachine or whatever is compatible with the 5.8GHz headset). So yeah, apparently analog is still applicable since it has low latency. If you want to record video, best practice is to just attach a second HD lens and make sure it is pointing in the same direction as the analog camera; they usually call these hybrid cameras.
2. Wifi FPV -- I actually tried this with my RC cars, it's blegh... you would have to drive really, really slow for it to be usable, but for slow H-bridge DC motor robots it would be perfect. The advantage is that if you are viewing through a high spec machine such as a laptop you can have a fancy overlay for your onscreen display (cue the awesome video game style HUDs) and just record your stream directly to your disk.
So first I have to actually make one of those self-driving robots, then I can prove if it's practical to just use the camera input (usually a Raspberry Pi class of camera), Y-split that into an FPV feed and a navigation feed. The navigation feed will be turned into grayscale or color segmented blocks and lines, while the FPV feed will be sent over wifi to the client machine which will handle most of the OSD processing.
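
The "line through the center of the road becomes a steering vector, which feeds a differential H-bridge driver" idea can be sketched roughly like this. It is a simplified illustration and not taken from any of the projects mentioned: the function names are hypothetical, it looks at a single scanline of an already edge-detected image, and it assumes the outermost bright pixels are the two lane markers.

```python
def lane_center_offset(row, threshold=128):
    """Offset of the lane centre from the image centre, in the range -1..1.

    row is one horizontal scanline of an edge/threshold image (a list of
    0-255 values). The leftmost and rightmost bright pixels are treated as
    the two lane markers. Returns None if fewer than two are found.
    """
    bright = [i for i, v in enumerate(row) if v >= threshold]
    if len(bright) < 2:
        return None
    lane_center = (bright[0] + bright[-1]) / 2.0
    image_center = (len(row) - 1) / 2.0
    return (lane_center - image_center) / image_center

def differential_mix(throttle, steering):
    """Turn throttle and steering (-1..1 each) into left/right motor commands."""
    left = throttle + steering
    right = throttle - steering
    # Scale both sides down together if either would exceed full power.
    biggest = max(1.0, abs(left), abs(right))
    return left / biggest, right / biggest

# Lane markers at columns 4 and 10 of an 11-pixel scanline: lane is to the right.
scanline = [0, 0, 0, 0, 255, 0, 0, 0, 0, 0, 255]
offset = lane_center_offset(scanline)
print(offset, differential_mix(0.5, offset))
```

A positive offset speeds up the left wheels and slows the right ones, so the robot turns toward the lane centre. A real setup would smooth the offset over several scanlines and frames before driving the motors.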

Robowaifu Technician 04/21/2021 (Wed) 03:57:22 No.10019

Oh by the way, a few weeks back I was looking at the cheapest way to get into camera robots. Apparently an ESP32 has just barely enough capability to capture and stream images. Previously they paired discrete ESP32s with OV-series basic cameras, until they decided to directly sell combos such as this ESP32-cam. It is too slow to be used for RC-speed vehicles, but for slow indoor robots such as this: https://github.com/gitnabeshin/ESP32CamRobot it is fine. I'm posting this since it's what got my feet wet with non-AI computer vision and control, so now I have enough confidence to actually try AI self-driving as the next step.

Robowaifu Technician 04/21/2021 (Wed) 07:12:11 No.10021

>>9973 Didn't know brilliant was free. Well there are YouTube lessons and such but I don't know what I need to look up the YouTube lessons for. I've been following 3blue1brown and MyWhyU for lessons in linear algebra. Guess I'll just have to cover the math topics when they appear in the way to learning OpenCV. >>9979 Yeah, that's the link. More specifically, this is where my confusions started: https://web.archive.org/web/20201031223524/https://docs.opencv.org/master/dd/d49/tutorial_py_contour_features.html >>9976 Everything past the contours section in the "Image processing with OpenCV" sub-topic in https://web.archive.org/web/20201102214056/https://docs.opencv.org/master/d6/d00/tutorial_py_root.html and all the other sub-topics past "Image processing with OpenCV" (i.e. Feature Detection and Description, and onwards) seem to explain their workings in mathematical terms with formulae I have no hopes of understanding right now. >What about 'contours' is your current problem, can you explain specifically? I don't know what "moments" (which seems to be a math derived term) of an image are. And nor am I understanding what cv.Moments() returns. That's an inspirational image!

Robowaifu Technician 04/21/2021 (Wed) 11:47:26 No.10023

>>10018 >>10019 Quality information, thank you Anon. >>10021 >I don't know what "moments" (which seems to be a math derived term) of an image are. It's just a re-use of the idea of 'moments' from physics and statistics (cf. >>9887, et al). In that context a 'moment' is a weighted sum: you take every bit of mass, weight it by its position raised to some power, and add it all up. The zeroth moment gives you the total mass, the first moments give you the center of mass, and the second moments describe how the mass is spread out around that center (as in the moment of inertia). The re-use of this term by the OpenCV devs is simply that pixel intensities play the role of the mass. For a contour or a binary blob, the zeroth moment is its area, and the first moments divided by the zeroth give you its centroid. Make sense? https://www.quora.com/What-exactly-are-moments-in-OpenCV?share=1 >And nor am I understanding what cv.Moments() returns. Well remember how I said all image data (and practically every other form of data) in OpenCV is a big matrix? This one is an exception: what gets returned is a small set of named numbers (m00, m10, m01, and so on) describing the shape you passed in. In Python, cv.moments() hands them back as a dictionary. https://web.archive.org/web/20170815042506/http://docs.opencv.org/master/d8/d23/classcv_1_1Moments.html
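
To make image moments concrete, here is a minimal sketch in plain Python that computes the three lowest raw moments of a tiny binary blob by hand. The helper `raw_moments` is written for this example and is not an OpenCV function. In the Python bindings, `cv.moments()` returns a dictionary with keys such as `'m00'`, `'m10'` and `'m01'`, and the contour tutorial gets the centroid from them in the same way as the last two lines below.

```python
def raw_moments(img):
    """Return m00, m10 and m01 for a 2D list of pixel values.

    m_pq is the sum over all pixels of x**p * y**q * value, where x is the
    column index and y is the row index.
    """
    m00 = m10 = m01 = 0
    for y, row in enumerate(img):
        for x, value in enumerate(row):
            m00 += value          # total "mass" (the area of a 0/1 blob)
            m10 += x * value      # mass weighted by column position
            m01 += y * value      # mass weighted by row position
    return {"m00": m00, "m10": m10, "m01": m01}

# A 3x2 rectangle of ones inside a 5x4 image.
blob = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
m = raw_moments(blob)
cx = m["m10"] / m["m00"]   # centroid column
cy = m["m01"] / m["m00"]   # centroid row
print(m, cx, cy)
```

The higher-order entries (m20, m11, mu20 and so on) follow the same pattern with larger exponents, which is where the more intimidating formulas in the tutorial come from.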

Robowaifu Technician 04/21/2021 (Wed) 12:19:40 No.10025

>>10018 >>10019 Yes, this is good to know. Thanks. >>10021 Using the Brilliant app isn't free, or wasn't last time I checked. Only the introduction is. To access everything it cost 60-80€ per year. In general, when I get stuck at something, I'll look into other sources of information. Reddit isn't popular on image boards, but having an account there, completely focused on learning stuff and asking around on tech questions, might be a good idea. I'm also sure there are good videos on YouTube on OpenCV and tutorials on the net which are even related to using it with Raspberry Pi. >>10023 Not the anon asking, but this was interesting to me as well. Thanks.

Robowaifu Technician 05/01/2021 (Sat) 02:00:15 No.10217

Since we don't have some kind of general Robowaifu Sensornets or something like it yet, I'm just going to drop this here since it's probably the closest thing we have atm. So, a newly-reported spying mechanism invented to keep all the population under close, real-time surveillance via their goyphones uses bat-like echolocation to synthesize an environment's image data. Including the people and pets in it. While they themselves mean it purely for evil purposes (George Orwell's Telescreens spying system, but mobile & no need for a fixed camera), perhaps we can in fact use the same technology approach for good? I expect this can certainly enhance a robowaifu's realtime situational awareness capabilities? https://web.archive.org/web/20210501015028/https://www.dailymail.co.uk/sciencetech/article-9525529/Scientists-equip-smartphones-bat-sense-technology-generate-images-sound.html

Robowaifu Technician 05/25/2021 (Tue) 15:26:23 No.10630

>>10217 Paper is titled, "3D imaging from multipath temporal echoes" for anyone interested. PDF available: https://arxiv.org/abs/2011.09284

Robowaifu Technician 05/25/2021 (Tue) 15:37:39 No.10631

>>10630 Briefly skimming the paper, they seem to use a 3D camera to collect training data (like a Kinect, but more expensive), whilst the actual sensor for the acoustic system consists of a stock-standard Logitech PC speaker, and a microphone. Echoes are recorded and passed into a network which reconstructs a depth map. All-in-all, it's a super simple system to replicate, and one that could easily be made for almost no cost, and very compact. Creating training data might be a pain, but given that I already have a Kinect, I'd be happy to volunteer to rig up a Kinect+Pi+mic+speaker and just walk around my local area collecting huge amounts of data. Let me know if anyone's interested. I'll probably do it myself anyway, but if there are any specific areas you'd like data for, write it down and I'll see what I can do.
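
For a rough feel of the numbers involved in echo-based sensing, here is a small sketch of the basic time-of-flight arithmetic. It is not the paper's method (which feeds multipath echoes into a trained network); it only shows the underlying physics and why an ordinary sound card has enough time resolution. The function names and the 44.1 kHz default sample rate are assumptions made for this example.

```python
SPEED_OF_SOUND_M_S = 343.0  # dry air at roughly 20 C

def echo_distance(delay_s, speed=SPEED_OF_SOUND_M_S):
    """Distance to a reflector, given the round-trip delay of its echo."""
    if delay_s < 0:
        raise ValueError("delay cannot be negative")
    # The sound travels out and back, so halve the total path length.
    return speed * delay_s / 2.0

def delay_in_samples(distance_m, sample_rate_hz=44100, speed=SPEED_OF_SOUND_M_S):
    """How many audio samples pass before an echo from distance_m returns."""
    return round(2.0 * distance_m / speed * sample_rate_hz)

print(echo_distance(0.01))      # a 10 ms echo
print(delay_in_samples(1.0))    # a wall one metre away
```

At 44.1 kHz one sample of delay corresponds to a few millimetres of range, so the hard part is separating overlapping echoes, not timing them.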

Robowaifu Technician 05/25/2021 (Tue) 15:46:31 No.10632

>>10630 >>10631 Wow, great work Anon. Much appreciated. >Let me know if anyone's interested. I'm certainly interested to know about your results, though I don't have any personal requests ATM. >Kinect Hmm. Hasn't that product been discontinued now? I wonder what good alternatives exist for us today if any? >PDF available Always a good idea to archive a copy here on the board for access in case anything is ever pulled in the future.

Robowaifu Technician 05/25/2021 (Tue) 16:05:29 No.10633

>>10632 I suggested the Kinect because it's the most common consumer-level depth camera, and I already have one. Any 3D camera can be used, as long as you can get a depth map image from it. Regarding alternatives, they use an Intel Realsense D435, but that's about $300 new. Even at that price, it's still considered one of the cheaper depth cameras. Note that the 3D camera is only needed for training data collection. Once you've got your data, the system only requires a directional speaker (i.e. a standard speaker) and a microphone. In theory, you could probably use the voice speaker + ears of the robowaifu, but you'd want to tweak your training data if you did.

Robowaifu Technician 05/26/2021 (Wed) 17:52:39 No.10646

>>10633 There are some other options: JeVois: https://youtu.be/MFGpN_Vp7mg eYs3D: https://youtu.be/NXWHYH0v638 e-con: https://youtu.be/vzXzz7VmWzo SceneScan from Nerian: https://youtu.be/mJ5UlXNguvg SP1 from Nerian: https://youtu.be/vVVjFqUkG4E Cadence: https://youtu.be/wPxi4ZYSJC0 Stereolabs ZED: https://youtu.be/7_8XLI99dno This here is software, trying to do depth perception on any camera: KudanSLAM: https://youtu.be/Pgami8jglmE

Robowaifu Technician 05/26/2021 (Wed) 20:59:38 No.10648

>>10646 Thanks for the work Anon.

Robowaifu Technician 05/26/2021 (Wed) 21:40:48 No.10651

>NIKKOR Lens Simulator https://imaging.nikon.com/lineup/lens/simulator/ An interesting utility the fine Anons on /p/ mentioned.

Robowaifu Technician 05/27/2021 (Thu) 06:02:45 No.10654

>related crosslink (>>10645, ...)

Robowaifu Technician 06/03/2021 (Thu) 03:21:29 No.10759

M-LSD: Towards Light-weight and Real-time Line Segment Detection: https://github.com/navervision/mlsd You Only Look at One Sequence https://github.com/hustvl/YOLOS >TL;DR: We study the transferability of the vanilla ViT pre-trained on mid-sized ImageNet-1k to the more challenging COCO object detection benchmark. >Directly inherited from ViT (DeiT), YOLOS is not designed to be yet another high-performance object detector, but to unveil the versatility and transferability of Transformer from image recognition to object detection. CLIP (Contrastive Language-Image Pre-Training) https://github.com/openai/CLIP > CLIP is a neural network trained on a variety of (image, text) pairs. It can be instructed in natural language to predict the most relevant text snippet, given an image, without directly optimizing for the task, similarly to the zero-shot capabilities of GPT-2 and 3. We found CLIP matches the performance of the original ResNet50 on ImageNet “zero-shot” without using any of the original 1.28M labeled examples, overcoming several major challenges in computer vision. Dynamic Vision Transformer (DVT) >We develop a Dynamic Vision Transformer (DVT) to automatically configure a proper number of tokens for each individual image, leading to a significant improvement in computational efficiency, both theoretically and empirically. https://github.com/blackfeather-wang/Dynamic-Vision-Transformer

Robowaifu Technician 06/03/2021 (Thu) 05:16:07 No.10761

>>10759 > Towards Light-weight and Real-time Line Segment Detection < Light-weight and Real-time Well I certainly find that combination of terms very encouraging Anon. Simply b/c that's the only combination that's going to be successful for devising reasonably inexpensive, mobile, autonomous gynoid companion robots. AKA robowaifus. It's gratifying seeing a small cadre of researchers seeming to be breaking off from the standard-issue Globohomo toadies, and tackling the real-world concerns of the billions of regular people, regarding AI advancement at the personal level. >tl;dr We'll never achieve an AI/Robowaifu Renaissance if every one of us has to be beholden to the Globohomo cloud, hat in hand, begging "Please sir, may I have some more?"

Robowaifu Technician 06/03/2021 (Thu) 10:27:38 No.10762

>>10761 test

Making nice placeholder eyes SophieDev 06/21/2021 (Mon) 12:06:09 No.10991

Found this reference image that is used by 'Ocularists' who paint glass eyes/ocular prosthesis. Might help some anon who is trying to paint their robowaifu a pair of pretty peepers. A note about eye lighting/refractive highlights: Despite what anime has taught us, I think it may be best not to paint on any lighting effects/refractive highlights (at least on non-cartoon irises). The highlights are best formed by a coating of clear epoxy resin. I say this because such refractive highlights move throughout the lens of the eye and across the surface of the sclera. So if they are painted on, they will be static...and this just looks wrong. Especially if the eye is also coated in clear resin. Because then you have the natural light reflected off the resin along with the static painted highlights. I think this is why Will Cogley leaves a concave depression inside his eye mold so that clear epoxy resin can set inside and form a lens shape, which reflects and refracts the light just like a real eye. Some measurements: (from the Wikipedia article on the human eyeball, and a scientific paper on the human iris). Human eyeball average height = 23.7mm (0.93 in) Human eyeball average width = 24.2mm (0.95 in) the eyeball is slightly wider than it is tall - which is why I didn't just give one diameter, but I know for the sake of simplicity that a 25.4mm (1 inch) diameter eyeball looks correct inside a life-sized human head. Human eyeball average anterior to posterior diameter = 24mm (0.94 in) This may be less important for a robowaifu since often they only use the front half of the eyeball, with the back/interior being fastened to a servohorn or pushrod in some way. Human eyeball average volume = 6 cubic centimeters (0.37 cu in) Iris size range = 10.2 to 13.0 mm in diameter with an average size of 12 mm in diameter, and a circumference of 37 mm. 
(From this paper: https://www.di.ubi.pt/~lfbaa/pubs/iscit2013.pdf 'Iris Surface Deformation and Normalization' by Somying Thainimit, Luis A. Alexandre and Vasco M.N. de Almeida.) Fully dilated pupil = 4 to 8mm in diameter Fully constricted pupil = 2 to 4 mm in diameter (larger doe-eyed pupils are usually better since small ones can make the eyes look angry/scary/psycho - unless you are going for that, of course LOL). Also, please note that if you are mixing and pouring clear epoxy resin into those half-sphere (cabochon) silicone molds, remember to: A.) Wear an apron or some cheap clothes since that stuff sticks to everything worse than shit to a blanket. B.) Make sure that your mixing cups and stirrer are both disposable and as clean as possible (wear latex or nitrile gloves to avoid getting the molds greasy and wash everything with warm water, soap and even an isopropyl alcohol rinse if you have that). C.) Pour/mix the resin and hardener SLOWLY. There is plenty of time. Even with a thin layer of epoxy you've got about half an hour before it really starts to set, and I know from experience that 1 inch lenses made with the stuff take over 24 hours to set through completely. If you rush and try to beat epoxy resin like eggs in an omelette then you will just end up with thousands of tiny bubbles and a pair of cloudy looking lenses. (Also, don't do what I did and poke the back of your resin lens at about 20 hours in just to see if it has set yet. Because you end up with a dented, ruined lens. Best to wait for about 30 hours before popping them out just to be safe.)

Yellowing of Epoxy Resin Lenses SophieDev 06/21/2021 (Mon) 14:32:21 No.10992

>>10991 I should also mention that even if you get the best epoxy resin on the market, it will eventually yellow with time. UV light excites the chemical bonds and turns the resin yellow from the outside in. If your robowaifu has eyelids that move, this is a good reason to close her eyes while she is inactive. It also means that the eyeballs themselves should be made with a degree of replaceability, since you'll want to swap them out after a couple of years when they start to yellow noticeably. So maybe spend an hour or two painting in the irises but it's probably best not to try crafting a masterwork prosthesis ;D

Chobitsu 06/21/2021 (Mon) 17:55:59 No.10993

>>10991 >>10992 Excellent information on this topic. The effort here is much appreciated Anon.

Robowaifu Technician 06/22/2021 (Tue) 00:16:16 No.10995

>>10993 Yes, good info. I'll drop in some videos which I watched a while ago. https://youtu.be/_KAlB_toNDg https://youtu.be/RO88zp5Pblg https://youtu.be/VMlZfpO1vPA https://youtu.be/KrLGnCC0C8E

Chobitsu 06/22/2021 (Tue) 08:27:47 No.10998

>>10995 Thank you kindly Anon.

Robowaifu Technician 09/15/2021 (Wed) 09:51:26 No.13163

I already mentioned there's open source software that can see heartbeats and micro facial movements called Eulerian Video Magnification in another thread, but I figure I might as well mention it here too: https://www.youtube.com/watch?v=ONZcjs1Pjmk As for hardware, I was thinking simple solid black camera eyes if I couldn't get 3-axis movement (vertical, horizontal, convergence) working with cameras without taking up too much space. I haven't thought too much about eyes other than that, except that I was thinking she should see in IR, to help in poor lighting without blinding me with LEDs on her face. I remember reading about something called a Modulo Camera that supposedly never over-exposes or something, so maybe it could just use a bigger camera for better night vision? There's also something called a "Light field camera" that keeps everything in focus, but I'm not sure how useful that is for robot vision, I just think it's neat.

Robowaifu Technician 09/16/2021 (Thu) 00:08:02 No.13206

>>13163 That's an interesting concept Anon, thanks. Yes, I think cameras and image analysis have very long legs yet, and we still have several orders of magnitude improvements yet to come in the future. It would be nice if our robowaifus (and not just our enemies) can take advantage of this for us. We need to really be thinking ahead in this area tbh.

CMOS v CCD sensors Robowaifu Technician 12/22/2021 (Wed) 13:48:09 No.14751

It seems like CMOS is the default sensor for most CV applications due to cost. But seeing all these beautiful eye designs makes me consider carefully how those photons get processed into signal for the robowaifus. Cost aside, CCD as a technology seems better because the charge from every pixel is shifted out through one shared output amplifier, so the whole image comes out as one uniform, crisp frame. A CMOS sensor instead gives each pixel its own amplifier, and I think the pixel-to-pixel variation causes fixed-pattern noise which has to be dealt with in post image processing. CCD looks like it's still the go-to for scientific instruments today. In astrophotography everyone drools over cameras with CCD; while CMOS is -ok- and fits most amateur needs, the pros use CCD. Astrophotography / scientific www.atik-cameras(dot)com/news/difference-between-ccd-cmos-sensors/ This article breaks it down pretty well from a strictly CV standpoint. www.adimec(dot)com/ccd-vs-cmos-image-sensors-in-machine-vision-cameras/

Chobitsu Board owner 01/19/2022 (Wed) 21:33:34 No.15032

>>14751 That looks very cool Anon. I think you're right about CCDs being very good sensor tech. Certainly I think that if we can find ones that suit our specific mobile robowaifu design needs, then that would certainly be a great choice. Thanks for the post!

Robowaifu Technician 04/16/2022 (Sat) 06:46:32 No.15888

iLab Neuromorphic Vision C++ Toolkit The USC iLab is headed up by the PhD behind the Jevois cameras and systems. http://ilab.usc.edu/toolkit/

Robowaifu Technician 04/25/2022 (Mon) 08:37:04 No.16002

>(>>15997, ... loosely related)

Robowaifu Technician 01/26/2023 (Thu) 17:01:46 No.19060

>"Follow Me" eyes (crosslink): >>19037 - I somehow forgot that we had a dedicated thread for eyes.

Robowaifu Technician 06/24/2023 (Sat) 18:52:08 No.23411

> conversation-related (>>23398, ...)

NoidoDev ##eCt7e4 06/24/2023 (Sat) 21:46:52 No.23435

Related: >>23405 >One thing I would like to do with a board that allows for more than one camera, would be to have a way to use this for creating a somewhat 3D model of the world. Especially be able to know the distance of an object it recognizes. This will be absolutely crucial to understand the world. >>23410 >Stereo Depth Cameras ... using triangulation >>23431 > auto-mesh generation Is this about generating meshes from 2D pictures? I just wrote somewhere that I wonder how video to 3D model would work. It's possible to use AI generated videos to feed a game engine and render an even better video. I guess the background is done using this "auto-mesh generation" then (pose estimation to bone model for characters).

Robowaifu Technician 06/24/2023 (Sat) 22:09:50 No.23436

>>23431 >Motion isn't req'd. when you already know the dimensions of the object, you can only use a single image when you already know the actual height, then its just a matter of comparing the real height against the image height (that ratio, times the focal length, gives the distance), its how snipers have to figure out distances in their scope when they dont have a rangefinder, it would need to keep a database of dimensions for known objects otherwise it has to go into pajeetmode to emulate stereoscopic vision, its doable but it seems like too much hassle when you can just use two cameras
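The known-height trick above can be sketched with the standard pinhole camera relation, distance = focal length (in pixels) x real height / height in the image (in pixels). This is only a sketch under assumptions: the function and parameter names are made up for illustration, the focal length would come from camera calibration, and the object is assumed upright and not tilted toward or away from the camera.

```python
def distance_from_known_height(real_height_m, image_height_px, focal_length_px):
    """Pinhole-model range estimate for an object of known real-world height.

    real_height_m   : true height of the object in metres (e.g. from a database)
    image_height_px : height of the object in the image, in pixels
    focal_length_px : camera focal length in pixels (from calibration)
    """
    if image_height_px <= 0 or focal_length_px <= 0:
        raise ValueError("pixel height and focal length must be positive")
    # Similar triangles: image_height / focal_length = real_height / distance
    return focal_length_px * real_height_m / image_height_px
```

As an example with made-up numbers, a 1.8 m tall person who appears 300 px tall through a lens with a 1000 px focal length comes out at about 6 m away.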

Chobitsu 06/24/2023 (Sat) 22:57:53 No.23437

>>23436 >it would need to keep a database of dimensions for known objects otherwise it has to go into pajeetmode to emulate stereoscopic vision Agreed, and that's an aspect of the 'well-calibrated' camera(s) part of the equation. For instance, when a robowaifu can remain in the relative safety of her master's home, then she can have the luxury of perfectly pre-learning basically everything in his space. This is a big win for all of her on-the-fly, object recognition/distance/volume/kinematic/mass/force/pose -estimate calculations. Including him, of course. :^) >"Master!? Have you been putting on weight again?" >=== -prose edit, fmt -add funpost spoilers

Edited last time by Chobitsu on 06/24/2023 (Sat) 23:34:03.

Chobitsu 06/24/2023 (Sat) 23:18:24 No.23438

>>23435 >Is this about generating meshes from 2D pictures. Yes. It works far better using a combination of stereo depth cameras, and the ability to proactively transform the camera(s) around the object(s) in question. Much like a robowaifu (or a human photographer) would be able to do. The primary point being to highly-accurately model the world around her, including her own master and other humans. (For example: their own children romping about. :^)

Robowaifu Technician 06/24/2023 (Sat) 23:26:03 No.23440

>>23437 lol, i forgot it would already need a database anyway for those things, still calculating based on parallax is way simpler than comparing the image to known dimensions, especially if the object is rotated then you need to know the angle to get a real height to compare to
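For comparison, the parallax route with two cameras can be sketched with the usual rectified-stereo relation, depth = focal length (in pixels) x baseline / disparity. A sketch only: names are hypothetical, the two cameras are assumed calibrated and rectified, and the hard part in practice (finding which pixel in the left image matches which pixel in the right) is assumed already done.

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Depth of a point seen by a rectified stereo pair.

    disparity_px    : horizontal shift of the point between left and right image
    focal_length_px : focal length in pixels (same for both cameras)
    baseline_m      : distance between the two camera centres in metres
    """
    if disparity_px <= 0:
        # Zero disparity means the point is effectively at infinity.
        raise ValueError("disparity must be positive")
    return focal_length_px * baseline_m / disparity_px
```

With eyes about 6 cm apart (baseline 0.06 m), a 700 px focal length and a 21 px disparity, the point is about 2 m away. Note there is no object database and no rotation angle in the formula, which is the simplicity being argued for here.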

Chobitsu 06/24/2023 (Sat) 23:44:25 No.23441

>>23440 >parallax is way simpler This comment touches on a technical aspect of computation, and the up-front costs involved with setup. But fair enough Anon. I'm sure it's more reliable, in general, than simplistic dimension analysis, particularly in tricky lighting conditions.

NoidoDev ##eCt7e4 06/25/2023 (Sun) 08:07:15 No.23456

>>23436 >need to keep a database of dimensions for known objects Thanks for pointing this out, but that's something I want to do anyways. Robowaifus should have a rough estimate of the traits of identifiable objects, e.g. weight and size. This can be taken from some public databases or LLMs. Then on top of that, if they see something they can identify close to an unknown object, they should be able to draw a conclusion about the size of the unknown one based on that.
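A rough sketch of that last idea: if a known object and an unknown one sit at about the same distance from the camera, their real sizes scale the same way as their pixel sizes. The function name and the example numbers are assumptions for illustration, and the estimate gets worse the more the two objects differ in depth.

```python
def estimate_size_from_reference(ref_real_size_m, ref_size_px, unknown_size_px):
    """Estimate an unknown object's size from a known object next to it.

    Assumes both objects are at roughly the same distance from the camera,
    so metres-per-pixel is the same for both.
    """
    if ref_size_px <= 0:
        raise ValueError("reference pixel size must be positive")
    metres_per_pixel = ref_real_size_m / ref_size_px
    return unknown_size_px * metres_per_pixel
```

For example, a mug known to be 0.10 m tall that spans 80 px gives 1.25 mm per pixel, so an unknown object next to it spanning 200 px would be roughly 0.25 m tall.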

NoidoDev ##eCt7e4 07/02/2023 (Sun) 22:25:17 No.23776

>Deci is thrilled to announce the release of a new object detection model, YOLO-NAS - a game-changer in the world of object detection, providing superior real-time object detection capabilities and production-ready performance. Deci's mission is to provide AI teams with tools to remove development barriers and attain efficient inference performance more quickly. https://github.com/Deci-AI/super-gradients/blob/master/YOLONAS.md

Robowaifu Technician 07/02/2023 (Sun) 23:17:23 No.23777

>>23776 what is mAP?

NoidoDev ##eCt7e4 07/02/2023 (Sun) 23:30:25 No.23779

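>>23777 The diagram indicates it's an accuracy metric for comparing such models: for each object class you take the average precision (roughly the area under the precision-recall curve), then you average that over all classes. Higher is better. >Mean Average Precision (mAP) https://blog.paperspace.com/mean-average-precision/ Found via: https://duckduckgo.com/?q=map+machine+learning+accuracy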
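To make the metric less abstract, here is a simplified sketch of how it can be computed. Assumptions: detections are already sorted by descending confidence and already marked as hit or miss against the ground-truth boxes (the IoU matching step is not shown), and this is the plain non-interpolated average precision. Benchmarks such as COCO use interpolated variants averaged over several IoU thresholds, so their numbers will not match this exactly. Function names are my own.

```python
def average_precision(ranked_hits, num_ground_truth):
    """Non-interpolated average precision for one object class.

    ranked_hits      : list of bools, detections sorted by descending confidence,
                       True where the detection matches a ground-truth object
    num_ground_truth : number of ground-truth objects of this class
    """
    if num_ground_truth <= 0:
        raise ValueError("need at least one ground-truth object")
    true_positives = 0
    precision_sum = 0.0
    for rank, is_hit in enumerate(ranked_hits, start=1):
        if is_hit:
            true_positives += 1
            # Precision at this rank = hits so far / detections so far.
            precision_sum += true_positives / rank
    # Missed ground-truth objects contribute zero, which lowers the score.
    return precision_sum / num_ground_truth


def mean_average_precision(per_class):
    """per_class: dict mapping class name -> (ranked_hits, num_ground_truth)."""
    if not per_class:
        raise ValueError("no classes given")
    scores = [average_precision(hits, n) for hits, n in per_class.values()]
    return sum(scores) / len(scores)
```

A detector that ranks its correct detections ahead of its false alarms, and misses few objects, scores close to 1.0; one that is confidently wrong scores low even if it eventually finds everything.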

Chobitsu 07/03/2023 (Mon) 13:54:44 No.23788

>>23776 Very low-latency in detection is vital, insofar as her autonomous safety is concerned. The ideal is human-level speed at object recognition (or even faster). We're probably getting pretty close on smol devices already, so I predict we'll reach this goal generally by the time the first real-world robowaifus begin rolling out. Thanks Anon.

Optimization of visual recognition NoidoDev ##eCt7e4 08/26/2023 (Sat) 15:51:51 No.24911

>>24909
- the computers connected to the eyes (cameras) should have different ways of sharing data with other computers, e.g. just sharing body movement analysis and recognition info as a text stream, same for the person being detected, or some emotional indicators. Sending photos and videos should be very limited, only sending encrypted files, and the system should mostly not store this data. Some home server might store and process some data for fine-tuning, but needs to receive this data encrypted. The decision about what to share should be made based on overall context coming from the general cognitive architecture >>24783
- fast and efficient segmentation of images (FPGAs?)
- different variants of the same image, created very fast, maybe using FPGA. For further processing, e.g. only processing a low res partial image of an object to keep track of. The creation of that low res partial image should be done by a specialized system close to the cameras.
- using object detection models based on context informed by the general cognitive architecture >>24783 or just based on awareness of what room she's in and maybe even what she's looking at. So they can be smaller, faster and more specialized, including some models which are trained on the specific training data related to the household (photos and videos of the home environment).
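The "recognition info as a text stream" part of the first point could look something like the sketch below: the vision computer emits one small JSON object per detection, one per line, and never sends the frame itself. The field names and functions are hypothetical, and encryption of the stream is deliberately left out here.

```python
import json
import time


def detection_event(label, confidence, bbox, timestamp=None):
    """Build one small text-only record for a detection (field names are made up)."""
    return {
        "t": time.time() if timestamp is None else timestamp,
        "label": label,
        "conf": round(float(confidence), 3),
        "bbox": [int(v) for v in bbox],  # x, y, width, height in pixels
    }


def to_text_stream(events):
    """Serialize events as JSON Lines: one compact JSON object per line."""
    return "\n".join(
        json.dumps(event, sort_keys=True, separators=(",", ":")) for event in events
    )


def from_text_stream(text):
    """Parse the stream back into a list of dicts on the receiving computer."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]
```

A line of this is a few dozen bytes instead of a whole image, so it is cheap to send over a slow internal bus, and nothing visual leaves the camera module unless some other part of the system explicitly asks for it.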

NoidoDev ##pTGTWW 10/01/2023 (Sun) 11:48:12 No.25665

>LERF optimizes a dense, multi-scale language 3D field by volume rendering CLIP embeddings along training rays, supervising these embeddings with multi-scale CLIP features across multi-view training images. After optimization, LERF can extract 3D relevancy maps for language queries interactively in real-time. LERF enables pixel-aligned queries of the distilled 3D CLIP embeddings without relying on region proposals, masks, or fine-tuning, supporting long-tail open-vocabulary queries hierarchically across the volume. >With multi-view supervision, 3D CLIP embeddings are more robust to occlusion and viewpoint changes than 2D CLIP embeddings. 3D CLIP embeddings also conform better to the 3D scene structure, giving them a crisper appearance. https://www.lerf.io https://github.com/kerrj/lerf https://drive.google.com/drive/folders/1vh0mSl7v29yaGsxleadcj-LCZOE_WEWB?usp=sharing https://arxiv.org/abs/2303.09553

NoidoDev ##pTGTWW 10/02/2023 (Mon) 19:57:37 No.25732

> Face recognition
Not tested, just looking what's available: https://github.com/cmusatyalab/openface
Following quotes are from Reddit, not from me...
https://github.com/ageitgey/face_recognition
> I have tried this out. It's easy to code and accurately recognizes faces. The problem is it can't even detect faces 1 feet away from the camera.
https://github.com/timesler/facenet-pytorch (FaceNet & MTCNN)
> This can detect and recognize faces at a distance, but the problem is it can't recognize unknown faces correctly. I mean for unknown faces it always tries to label it as one of the faces from the model/ database encodings.
https://github.com/serengil/deepface
> I have tried VGG, ArcFace, Facenet512. The latter two gave me good results. But, the problem is I couldn't figure out how to change the detection from every 5 seconds to real-time. Also, I couldn't change the camera source. (If anyone can help me with these please do). Also, it had fps drops frequently.
https://github.com/deepinsight/insightface
> Couldn't test this yet. But in the demo YT video it shows the model incorrectly detecting a random object as a face. If someone knows how well this performs please let me know.
https://www.reddit.com/r/computervision/comments/15ycwom/face_recognition_whats_the_state_of_the_art/
This here seems to be the best: https://github.com/ZoneMinder/zoneminder
The Reddit link above also has a thread and a patch for detecting faces at a distance, I think.
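On the complaint that unknown faces always get labeled as one of the known ones: a common workaround is to compare face embeddings by distance and refuse to name anyone when even the closest match is too far away. The sketch below is generic and does not use the API of any of the libraries listed; the toy 3-number embeddings, the function names and the 0.6 cutoff are placeholders, and the right cutoff depends on the embedding model.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def identify(embedding, known_faces, max_distance=0.6):
    """Return (name, distance) of the closest known face, or ("unknown", distance).

    known_faces  : dict mapping a name to its reference embedding
    max_distance : placeholder cutoff; tune it for the embedding model in use
    """
    best_name, best_distance = None, float("inf")
    for name, reference in known_faces.items():
        distance = euclidean(embedding, reference)
        if distance < best_distance:
            best_name, best_distance = name, distance
    if best_name is not None and best_distance <= max_distance:
        return best_name, best_distance
    # Nearest match is still too far away: treat the face as a stranger.
    return "unknown", best_distance
```

Without the cutoff, a nearest-neighbour lookup always returns somebody, which matches the behaviour described in the Reddit quote.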

Kiwi 10/11/2023 (Wed) 18:33:22 No.25927

LLaVA: Large Language and Vision Assistant (https://llava-vl.github.io/) A project to integrate vision into large language models. Though very new and young as a concept, adding visual context to language models has tremendous potential. Notably, a waifu which can understand correlations between what she perceives in her environment with what she is told can lead to much more naturally feeling interactions. Fingers crossed for a fork that implements YOLO (https://pjreddie.com/darknet/yolo/) rather than CLIP (https://openai.com/research/clip) for better compute and memory efficiency. Getting this to run at sub 10 watts should be a goal.

Edited last time by Kiwi_ on 10/11/2023 (Wed) 18:33:59.

NoidoDev ##pTGTWW 10/28/2023 (Sat) 23:53:42 No.26132

I was working on this here >>26112 using OpenCL to make video processing faster. So I got this here recommended by YouTube: https://www.youtu.be/0Kgm_aLunAo Github: https://github.com/jjmlovesgit/pipcounter This is using OpenCV to count pips on dominoes, and does it much faster and better than GPT4-Vision. I wonder if it would be possible to have an LLM adjust the code depending on the use case, and maybe have a library of common patterns to look out for. Ideally one would show it something new, it would detect the outer border like the stones here and then adjust till it can catch the details on all of these objects which are of interest. It could look out for patterns depending on some context, e.g. a desk.

Chobitsu 10/30/2023 (Mon) 00:54:58 No.26142

>>26132 >and does it much faster and better than GPT4-Vision. Doesn't really surprise me. OpenCV is roughly the SoA in hand-written C++ code for computer vision. You have some great posts ITT Anon thanks... keep up the good work! :^)

Faster and more efficient methods than neural networks [email protected] 10/30/2023 (Mon) 05:09:21 No.26146

There are several libraries and approaches that attempt generalized object detection within a context, although a completely automatic, context-based object detection system that needs no predefined objects is a hard problem because real-world scenes vary so much. Libraries and methods that have been used for more general object detection include:
1. YOLO (You Only Look Once): YOLO is a popular object detection system that runs a single neural network over the whole image in one pass and can detect multiple objects in real time. It can only recognize the object categories it was trained on.
2. OpenCV with Haar Cascades and HOG (Histogram of Oriented Gradients): OpenCV provides Haar cascades and HOG-based object detection methods. While not entirely context-based, they allow for object detection using predefined patterns and features. These methods can be more general but might not adapt well to various contexts without specific training or feature engineering.
3. TensorFlow Object Detection API: TensorFlow offers an object detection API that provides pre-trained models for various objects. While not entirely context-based, these models are designed to detect general objects and can be customized or fine-tuned for specific contexts.
4. Custom Object Detection Models with Transfer Learning: You could create a custom object detection model using transfer learning from a pre-trained model like Faster R-CNN, SSD, or Mask R-CNN. By fine-tuning on your own dataset, the model could adapt to specific contexts.
5. Generalized Shape Detection Algorithms: Libraries like scikit-image (imported as skimage) in Python provide various tools for general image processing and shape analysis, including contour detection, edge detection, and morphological operations. While not object-specific, they offer tools for identifying shapes within images.
Each of these methods has its advantages and limitations when it comes to general object detection. If you're looking for a more context-aware system that learns and adapts to various contexts, combining traditional computer vision methods with machine learning models trained on diverse images may be a step towards achieving a more generalized object detection system. However, creating a fully context-aware, automatic object detection system that adapts to any arbitrary context without any predefined objects is still a challenging area of research.
-----------------
In terms of computational requirements, here's a general ranking of the mentioned object detection methods, from least to most demanding, based on the computational power and RAM they might typically require:
1. OpenCV with Haar Cascades and HOG:
- Computational Power Needed: Low to Moderate
- RAM Requirements: Low
- These methods are computationally less intensive compared to deep learning-based models. They can run on systems with lower computational power and memory.
2. Generalized Shape Detection Algorithms (scikit-image):
- Computational Power Needed: Low to Moderate
- RAM Requirements: Low to Moderate
- While these libraries might need slightly more computational power and RAM than Haar Cascades and HOG, they are still less demanding compared to deep learning-based models.
3. TensorFlow Object Detection API:
- Computational Power Needed: Moderate to High
- RAM Requirements: Moderate to High
- Running pre-trained models from the TensorFlow Object Detection API might require more computational power and memory compared to traditional computer vision methods due to the complexity of the deep learning models.
4. Custom Object Detection Models with Transfer Learning:
- Computational Power Needed: Moderate to High
- RAM Requirements: Moderate to High
- Training custom object detection models with transfer learning typically requires moderate to high computational power and memory, especially during the training phase.
5. YOLO (You Only Look Once):
- Computational Power Needed: High
- RAM Requirements: High
- Full-size YOLO models are relatively demanding in terms of computational power and memory. They require more powerful machines due to their deep neural network architecture and real-time processing goals; the smaller "tiny" variants trade accuracy for lower requirements.
The exact computational requirements and memory usage can vary based on the specific hardware, image sizes, complexity of the models, and the scale of the operations being performed. Deep learning models, in general, tend to demand more computational resources compared to traditional computer vision methods. If you're working with large datasets or real-time processing, more powerful hardware configurations would likely be necessary to achieve optimal performance.
--------
https://github.com/opencv/opencv/tree/master/data/haarcascades
--------
If your goal is to detect shapes without knowing the specific objects at first, OpenCV's contour detection methods combined with image processing techniques could be more appropriate than scikit-learn. Once shapes are identified, further analysis or categorization can be performed using traditional machine learning algorithms from scikit-learn or other methods.

[email protected] 10/30/2023 (Mon) 05:10:20 No.26147

>>26146 Understood. If the goal is to identify various objects within a specific context (like a desk) without predefining the objects, and the lighting conditions might vary, a more flexible approach using general computer vision techniques can be applied. This could involve methods such as contour detection, edge detection, and basic image processing techniques to identify objects within the context of a desk. You might use a more generalized version of object detection that isn’t specific to particular objects but rather identifies any distinguishable shape within the context. Here’s an example in Python:


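import cv2

# Read the image (cv2.imread returns None if the file cannot be loaded)
image = cv2.imread('path_to_your_image.jpg')
if image is None:
    raise FileNotFoundError('Could not read path_to_your_image.jpg')

# Convert to grayscale
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Blur, then binarize with Otsu's method so object outlines stand out.
# This is one example preprocessing choice; under uneven lighting,
# cv2.adaptiveThreshold or cv2.Canny may work better.
blurred = cv2.GaussianBlur(gray, (5, 5), 0)
_, processed_image = cv2.threshold(blurred, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Find contours ([-2:] keeps this working on both OpenCV 3.x and 4.x)
contours, _ = cv2.findContours(processed_image, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2:]

# Minimum contour area in pixels; an example value to tune per camera and scene
min_area = 500

detected_objects = []
for contour in contours:
    # Apply some conditions to filter objects based on size, shape, etc.
    # Here we filter by area only; aspect ratio is another common criterion.
    area = cv2.contourArea(contour)
    if area > min_area:
        detected_objects.append(contour)

# Count and display the number of detected objects
print(f"Number of objects detected: {len(detected_objects)}")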

This code applies general techniques such as contour detection to identify distinguishable shapes within the context of the desk. The process of identifying objects relies on the uniqueness of their shapes and their contrast against the background. The challenge in this approach lies in how the algorithm distinguishes objects based on their shapes and sizes. It might not identify specific objects but rather any shape that meets certain criteria (like area, aspect ratio, etc.) within the provided context (in this case, the desk). This method might detect a variety of objects but could also identify false positives or miss some objects. Fine-tuning the conditions for object identification (like area thresholds or other characteristics) can improve the accuracy of detection within the context of the desk, considering the variability in lighting and object characteristics.
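As a sketch of the "area, aspect ratio, etc." fine-tuning mentioned above, the filter below works on bounding boxes given as (x, y, w, h) tuples, which is the shape cv2.boundingRect returns for a contour. It is written in plain Python so it can be tried without a camera; the function name and all threshold values are assumptions to be tuned for the actual desk and camera. Note that it uses bounding-box area, which is larger than the contour area for non-rectangular shapes.

```python
def filter_boxes(boxes, min_area=500, min_aspect=0.2, max_aspect=5.0):
    """Keep boxes that are big enough and not extremely thin.

    boxes      : iterable of (x, y, w, h) tuples in pixels
    min_area   : smallest bounding-box area to keep (example value)
    min_aspect : smallest allowed width/height ratio (example value)
    max_aspect : largest allowed width/height ratio (example value)
    """
    kept = []
    for (x, y, w, h) in boxes:
        if w <= 0 or h <= 0:
            continue  # degenerate box, nothing to measure
        area = w * h
        aspect = w / h
        if area >= min_area and min_aspect <= aspect <= max_aspect:
            kept.append((x, y, w, h))
    return kept
```

The aspect-ratio bounds are there to throw out long thin slivers such as the desk edge, cables or shadow lines, which often pass an area test on their own.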

Robowaifu Technician 10/31/2023 (Tue) 01:13:49 No.26151

I suppose this is a good thread to use for discussing this concept: a swarm of small drones available for a robowaifu's use for enhanced perimeter/area surveillance, etc.

moondream1 - Tiny AI Vision Language Model NoidoDev ##pTGTWW 02/10/2024 (Sat) 12:42:19 No.29286

>1.6B parameter model built using SigLIP, Phi-1.5 and the LLaVA training dataset. Weights are licensed under CC-BY-SA due to using the LLaVA dataset. Try it out on Hugging Face Spaces! https://github.com/vikhyat/moondream https://huggingface.co/spaces/vikhyatk/moondream1 https://youtu.be/oDGQrOlmC1s >The model is released for research purposes only, commercial use is not allowed. >circa 6GB or 4GB quantized

Chobitsu 02/12/2024 (Mon) 06:09:46 No.29367

>>29286 Thanks. Do you have any views on its usefulness r/n, Anon?

Robowaifu Technician 02/24/2024 (Sat) 20:34:17 No.29911

For people looking for a Kinect, I've had success finding them at electronics recycle centers. RE:PC in Seattle had a big bin. Also, I just checked and they're going for under ten dollars on eBay lol. I had also heard that the Kinect's depth camera isn't all too necessary at this point due to how good neural networks have gotten recently. Is there any merit to that?

Mechnomancer 02/24/2024 (Sat) 23:30:39 No.29915

>>29911 Unless you're using the kinect to do some sort of 3d mapping you can get stuff like pose landmark detection using AI stuff and a standard webcam, like Gulag's open-source library Mediapipe. https://mediapipe-studio.webapps.google.com/home I use some of their models for object recognition :D

Chobitsu 02/25/2024 (Sun) 01:25:49 No.29918

>>29911 >>29915 Thanks for both the great tips, Anons! Cheers. :^)

NoidoDev ##pTGTWW 02/25/2024 (Sun) 15:40:01 No.29937

>>29367 I think we would need workarounds if such models are not fast enough, but wow it needs less than a second to identify common objects in a photo of a room from a home. I guess on a smaller computer it would be slower, but still. This is good enough for now, and it's just a stepping stone. Keep in mind, we don't need it as fast and general as AI in cars. The waifus will mostly look at the same home with the same objects all the time. >>29911 My issue is rather that I don't want to use a device which I can only get from recycling centers. Also, I want two cams which can move on their own and I decide on which distance they are. I guess something like Kudan will be the way to go: https://www.youtube.com/@KudanLimited

Robowaifu Technician 03/05/2024 (Tue) 22:05:07 No.30138

A bit odd no one mentioned LiDAR. This would allow for a better sense of depth, and of objects behind the robot that are outside ordinary vision, to avoid walking backwards into someone or elbowing them.
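A rough sketch of the "don't back into someone" check, assuming a 360º scanner that reports (angle, distance) pairs with 0º meaning straight ahead. The `rear_obstacles` name, the scan format and the 0.6 m / 60º limits are all made up for illustration; a real LiDAR driver will have its own output format.

```python
import math


def rear_obstacles(scan, max_range_m=0.6, rear_half_angle_deg=60.0):
    """Return (x, y) points that are close by and inside the rear sector.

    scan: iterable of (angle_deg, distance_m); 0 deg is straight ahead and
    180 deg is directly behind. This layout is an assumption for the sketch.
    """
    hits = []
    for angle_deg, dist_m in scan:
        if dist_m <= 0:
            continue  # treat zero or negative readings as invalid
        # Angular distance from "directly behind", in the range 0..180.
        off_rear = abs((angle_deg % 360.0) - 180.0)
        if off_rear <= rear_half_angle_deg and dist_m <= max_range_m:
            # Convert the polar reading to Cartesian coordinates (x forward).
            x = dist_m * math.cos(math.radians(angle_deg))
            y = dist_m * math.sin(math.radians(angle_deg))
            hits.append((x, y))
    return hits


scan = [
    (0.0, 2.0),    # far ahead: ignored
    (90.0, 0.4),   # close, but to the side: outside the rear sector
    (175.0, 0.5),  # close and almost directly behind: reported
    (200.0, 1.5),  # behind, but far enough away
    (185.0, 0.0),  # invalid reading
]
hits = rear_obstacles(scan)
```

If `hits` is non-empty, the motion planner could refuse to step backwards or turn the head to look first.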

Robowaifu Technician 03/05/2024 (Tue) 22:30:04 No.30139

>>30138 but the cyberninjas wear black

Robowaifu Technician 03/06/2024 (Wed) 02:07:29 No.30148

>>30139 Black clothes aren't that black, especially as the dye fades over time. If you want to be picky: since this is just a secondary source of sight, you could instead use radar, at a compromise in resolution, just for general awareness, to know when to carefully turn and see what is at a location.

Robowaifu Technician 03/07/2024 (Thu) 13:52:14 No.30174

>>30138 To add to my earlier point. I found a diy LiDAR that is supposed to cost $40 to make. https://www.instructables.com/Project-Lighthouse-360-Mini-Arduino-LiDAR/

Chobitsu 03/07/2024 (Thu) 18:10:40 No.30180

>>30174 > I found a diy LiDAR that is supposed to cost $40 to make. I'd think that's a game-changer for the mapping need, if it's legit and reliable. Thanks, Anon! Cheers. :^)

Robowaifu Technician 03/07/2024 (Thu) 23:13:48 No.30189

>>30180 Considering usual cost of LiDAR I am thinking this is a bit less accurate and shorter range but it's still useful for this kind of application likely. I'm not sure why the developer privated his videos. They might be still viewable through Archive.

Chobitsu 03/07/2024 (Thu) 23:37:59 No.30190

>>30189 >Considering usual cost of LiDAR I am thinking this is a bit less accurate and shorter range but it's still useful for this kind of application likely. Yeah makes sense. >I'm not sure why the developer privated his videos. In my experience, that's one of the first signs that an opensource system is going closed source. They block the assets from the public b/c """reasons""". >They might be still viewable through Archive. Not sure what that means.

Robowaifu Technician 03/08/2024 (Fri) 00:32:46 No.30193

>>30190 >signs that an opensource system is going closed source He left up the files for making it though. Apparently his whole YouTube channel is gone. >Not sure what that means. I found the URL for one at least that was archived. The follow up update video wasn't archived unfortunately. https://web.archive.org/web/20210202100801/https://www.youtube.com/watch?v=uYU534Wn4lA I managed to find a similarly priced one, though it costs a little more, that used to be available as a kit, but it appears to be a different design. The website seems to no longer exist. https://web.archive.org/web/20211129020703/https://curiolighthouse.wixsite.com/lighthouse Found that one from a video of some guy assembling it https://www.youtube.com/watch?v=_aRcoI25HqE >>30190 Going down that rabbit hole from YouTube recommended vids led me to two others. This one is $44, but it is single-point instead of 360º: https://www.dfrobot.com/product-1702.html This one is $99: https://www.dfrobot.com/product-1125.html

Robowaifu Technician 03/08/2024 (Fri) 00:39:51 No.30194

>>30193 Wait never mind about the curiolighthouse. It seems my browser was just not redirecting to the page properly. That site is still up.

Robowaifu Technician 03/29/2024 (Fri) 19:35:38 No.30620

Just found out 3D cameras for sensing depth are called a "depth camera", "3D depth sensor", or "stereoscopic depth sensor"; sometimes terms like "binocular depth camera" appear. They capture color (some IR too) and depth in a single system, like our vision works. Though if you used one of these premade units it would mean having only head turning, not eye turning.

Robowaifu Technician 04/12/2024 (Fri) 02:46:27 No.30877

>>29915 Started on the kinect lite guide because I don't want giant XBOX 360 bars on my robot's face. And just now after saying it I regret hacking it apart. It's still huge after making it half the size, the length of a smartphone. https://medium.com/robotics-weekends/how-to-turn-old-kinect-into-a-compact-usb-powered-rgbd-sensor-f23d58e10eb0

Robowaifu Technician 04/12/2024 (Fri) 05:10:55 No.30879

>>30877 I know this is a stupid question but can you strip those components right out of the support frame and have them simply connected to the wires?

Robowaifu Technician 04/12/2024 (Fri) 05:51:53 No.30880

>>30879 Zoom in to the hole in the centre. Looks like there is a circuit board under there. If one were to take it out of the frame it would require adding wires and attaching back to the circuit board I imagine.

Robowaifu Technician 04/12/2024 (Fri) 07:36:27 No.30881

>>30879 >>30880 I expect the physical positioning of the 3 camera components is tightly registered. Could be recalibrated I'm sure, but it would need to be done.

NoidoDev ##pTGTWW 04/12/2024 (Fri) 08:40:51 No.30884

>>30879 >Depth Perception From what I know these systems work so that it knows the distance between the two cameras and this is part of the hardware. If you want to do this yourself then your system would need to know the distance. I think Kudan Slam is a software doing that: >>29937 and >>10646 >Kudan Visual SLAM >This tutorial tells you how to run a Kudan Visual SLAM (KdVisual) system using ROS 2 bags as the input containing data of a robot exploring an area https://amrdocs.intel.com/docs/2023.1.0/dev_guide/files/kudan-slam.html >The Camera Basics for Visual SLAM >“Simultaneous Localization and Mapping usually refer to a robot or a moving rigid body, equipped with a specific sensor, that estimates its motion and builds a model of the surrounding environment, without a priori information [2]. If the sensor referred to here is mainly a camera, it is called Visual SLAM.” https://www.kudan.io/blog/camera-basics-visual-slam/ >.... ideal frame rate ... 15 fps: for applications with robots that move at a speed of 1~2m/s >The broader the camera’s field of view, the more robust and accurate SLAM performance you can expect up to some point. >...the larger the dynamic range is, the better the SLAM performance. >... global shutter cameras are highly recommended for handheld, wearables, robotics, and vehicles applications. >Baseline is the distance between the two lenses of the stereo cameras. This specification is essential for use-cases involving Stereo SLAM using stereo cameras. >We defined Visual SLAM to use the camera as the sensor, but it can additionally fuse other sensors. >Based on our experience, frame skip/drop, noise in images, and IR projection are typical pitfalls to watch out. >Color image: Greyscale images suffice for most SLAM applications >Resolution: It may not be as important as you think >Visual SLAM: The Basics - https://www.kudan.io/archives/433 Edit: Added the tutorial and articles about "Camera Basics" and "Visual SLAM Basics".
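For anyone wondering why the baseline matters so much: with an idealised, rectified pinhole stereo pair, depth is focal length times baseline divided by disparity. The sketch below only shows that arithmetic plus the "how far does the robot move between frames" figure behind the 15 fps remark. It is not Kudan's method; the function names, the 700 px focal length and the 6 cm baseline are made-up example values.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth in metres for an ideal rectified stereo pair: Z = f * B / d.

    focal_px:     focal length in pixels (assumed identical for both cameras)
    baseline_m:   distance between the two lenses, in metres
    disparity_px: horizontal pixel shift of the same point between the images
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive (point at infinity or mismatch)")
    return focal_px * baseline_m / disparity_px


def travel_per_frame(speed_m_s, fps):
    """Distance the robot covers between two consecutive frames, in metres."""
    return speed_m_s / fps


# Hypothetical numbers: 700 px focal length, 6 cm baseline, 21 px disparity.
z = depth_from_disparity(700.0, 0.06, 21.0)

# Same disparity with twice the baseline means the point is twice as far away,
# which is why the software must know the real baseline of the rig.
z_wide = depth_from_disparity(700.0, 0.12, 21.0)

# At 2 m/s and 15 fps the camera moves about 13 cm between frames.
step = travel_per_frame(2.0, 15.0)
```

So if you move the two cams apart yourself, the baseline value has to be re-measured (or recalibrated) or every depth estimate scales wrongly.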

Robowaifu Technician 04/22/2024 (Mon) 03:04:11 No.30994

>>30877 The kinect was cheap at 12$ and I scaled it to the full sized robot head in gimp. I can use the main camera in the middle of aperture and the two projector/IR camera lenses as the eye shines. It won't look like this in the final robot head, but it will be positioned in this manner.

NoidoDev ##pTGTWW 06/07/2024 (Fri) 18:58:00 No.31472

Will Cogley came out with a snap fit eye mechanism (no screws needed). > By removing ALL fasteners and using a 100% snap-fit assembly, assembly time is cut down 6 fold! Hopefully this design will also be more accessible if you struggle to get the right parts for my projects. If you don’t want to use my new PCB design (which admittedly is a work in progress) refer to [my previous design](https://www.notion.so/Simple-Eye-Mechanism-983e6cad7059410d9cb958e8c1c5b700?pvs=21) for electronics/wiring instructions. > If you do want to use the PCB, note that its still a work-in-progress. The design works although there is an issue with some holes being undersized. In theory the attached file is fixed but I’ve yet to test it myself to be 100% sure! https://youtu.be/uzPisRAmo2s https://nilheim-mechatronics.notion.site/Snap-fit-Eye-Mechanism-b88ae87ceae24d1ca942adf34750bf87

Robowaifu Technician 12/27/2024 (Fri) 17:29:51 No.35167

> (eye-assembly -related : >>35165 )

Robowaifu Technician 01/04/2025 (Sat) 23:02:57 No.35339

> (eye-design -related >>35318, >>35338 )

TFT Display Eyes Robophiliac 01/09/2025 (Thu) 21:28:47 No.35511

>>1666 >>8817 >>26306 There seems to be some interest in display "eyes" that don't actually help the robot to see, but probably not enough for its own thread, so for now I'll just park this here. From this thread on the dollforum: NSFW https://dollforum.com/forum/viewtopic.php?t=189110 Links in thread reproduced here, just in case: An example of a sexdoll on reddit (NSFW): https://www.reddit.com/r/SexDolls/comments/1gvulh4/video_custom_eyes/ Same doll with different image for emotion(NSFW) https://www.reddit.com/r/SexDolls/comments/1gxums5/kawaii/ Same doll, different display with moving tongue(NSFW) https://www.reddit.com/r/SexDolls/comments/1gxvwme/omg_thats_good/ A display entry on amazon. Search "round tft display" as offerings change over time. https://www.amazon.com/gp/product/B0B7TFRNN1/ref=ppx_yo_dt_b_asin_title_o00_s00?ie=UTF8&psc=1 An Instructible article on the software: https://www.instructables.com/TFT-Animated-Eyes/ A tutorial video on youtube: Master the Round TFT Display on ESP32 and GC9A01 driver with the TFT_eSPI library https://www.youtube.com/watch?v=pmCc7z_Mi8I OP's results video: https://youtu.be/S-ktv1snsiQ Uncanny eyes Halloween skull https://www.instructables.com/Uncanny-Eyes-Halloween-Skull-Animatronic/ github link for large eyes (used in halloween skull) https://github.com/dalori/ESP32-uncanny-eyes-halloween-skull Large eyes tutorial on youtube: https://youtu.be/G2RZFX-qwnI

Chobitsu 01/10/2025 (Fri) 02:16:09 No.35518

>>35511 This is definitely the correct thread, Robophiliac. >pic Care to >tl;dr what we're looking at here a bit more? The one on the right certainly looks pretty suited as a static eye. Can it 'move'? What about the left one? TIA.

Robophiliac 01/10/2025 (Fri) 02:25:22 No.35519

>>35511 >>35518 > what we're looking at here a bit more? Sorry, it's a size comparison with a semispherical doll eye; to show it's pretty much a drop-in replacement fit. If I wanted to go that route, they would fit nicely in the heads I'm getting to modify.

Chobitsu 01/10/2025 (Fri) 05:14:32 No.35527

>>35519 Ah, got it thanks Robophiliac. Once you have an assembly together, would you mind posting clip(s) of these eyes 'in action' please? It might help all of us to understand your approach better. Cheers. :^)

Robophiliac 01/15/2025 (Wed) 04:14:15 No.35649

>>35318 >>35338 >Most animatronic eyes use a central pivot point in the eyeballs, greatly reducing the available area for a camera. They were designed as props, not robots. Only some mods to the inmoov design and a few others are intentionally "camera friendly", and only some also have eyelids. There is an inmoov mod for the ezrobot hardware I have, but that system uses a single camera, and the mod doesn't include eyelids. Among the security cams the main concerns are size, range of focus, the ability to continuously view the signal live (preferably via wire) and the presence of microphones for possible use as "ears". Any suggestions appreciated. >effective, highly capable (and 'sovlful') stereoscopic eye designs; including the accessory 'tissues' (lids, brows, &tc.) As it so often seems, the biggest hurdle to finding something online is figuring out what that thing is called, so you know what to look for. https://www.ebay.com/sch/i.html?_nkw=Mini-CCTV-Camera-Security-Micro-Audio-Wired If you have other ideas for search terms, go ahead and add them. As you can see from the search results, many of the cameras on offer will easily fit in the area in front of the central pivot of many animatronic eye designs. Most also have RCA type output connectors, but there are RCA to USB adapters available, so you could use them with OpenCV, or any other system that uses a USB feed. Now that cameras are available to choose from, among considerations of actual size, the presence of a microphone, power consumption and any other features the camera may have, one area hardly ever discussed is resolution vs computing power: how high of a resolution can your robot process? If you are using an SBC, will it be able to process the video signal(s) and be able to perform other tasks at the same time? Can it walk and chew gum?
If you are using a tethered system sending video and control signals back and forth wirelessly to a more powerful computer, it may be necessary to use a low-resolution camera system (or more than one data channel) to avoid "buffering" of the data flow. We don't want the robot to walk into a wall that it saw, but didn't get the message to turn in time to avoid. Yes, we could install collision sensors and an automatic stop function, but we could then be getting "pauses" every time the buffering situation occurred, during various tasks. This would be very non-human-like, and annoying. So, the problem becomes how much resolution do we want/need vs computing power and its $ price? One question immediately occurs; would it be possible to change the resolution on the fly by using a software "switch" to tell the processing computer to drop every other bit(pixel?), or to process only every 3rd or 4th bit? Or to go to black-and-white for most operations?
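On the "software switch" question: in principle yes, dropping pixels and colour is cheap to do in software. Below is a toy sketch in plain Python, assuming a frame is just a list of rows of (r, g, b) tuples; real code would use a camera library and array operations instead. `reduce_frame`, `to_grey` and `raw_bytes_per_second` are names invented for this example, and the bandwidth figure is for uncompressed video only.

```python
def to_grey(pixel):
    """Integer luma approximation using the common 0.299/0.587/0.114 weights."""
    r, g, b = pixel
    return (299 * r + 587 * g + 114 * b) // 1000


def reduce_frame(frame, step=2, grey=False):
    """Keep every `step`-th pixel in both directions; optionally drop colour."""
    out = []
    for row in frame[::step]:
        picked = row[::step]
        out.append([to_grey(p) for p in picked] if grey else list(picked))
    return out


def raw_bytes_per_second(width, height, channels, fps):
    """Uncompressed data rate, assuming one byte per channel."""
    return width * height * channels * fps


# A tiny 4x4 test frame where the red and green values encode x and y.
frame = [[(x * 10, y * 10, 0) for x in range(4)] for y in range(4)]

small = reduce_frame(frame, step=2)                  # 2x2, still colour
grey_small = reduce_frame(frame, step=2, grey=True)  # 2x2, one value per pixel

full_rate = raw_bytes_per_second(640, 480, 3, 30)  # colour VGA at 30 fps
low_rate = raw_bytes_per_second(320, 240, 1, 30)   # half size, greyscale
```

Halving the resolution and going greyscale cuts the raw data by a factor of 12 in this example, so the "switch" could simply be the `step` and `grey` arguments changing with the current task.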

GreerTech 01/15/2025 (Wed) 04:17:50 No.35650

>>35649 >One question immediately occurs; would it be possible to change the resolution on the fly by using a software "switch" to tell the processing computer to drop every other bit(pixel?), or to process only every 3rd or 4th bit? Or to go to black-and-white for most operations? Makes me think of Terminator Vision

GreerTech 01/15/2025 (Wed) 04:21:27 No.35651

Here's a good, albeit outdated tutorial on computer vision. https://www.societyofrobots.com/programming_computer_vision_tutorial.shtml

Barf 01/15/2025 (Wed) 05:04:41 No.35652

Just started going through this thread. Lots of options and depends on scope of project I guess. Sounds like you can do old way with things like depth sensors using ultrasonic or lidar but then you have to program all the spatial reasoning yourself. Spatial reasoning models look like they are just taking off though. For now, from what I've seen like link below, most are clipping frames, downsizing using ffmpeg and then passing to a vision model for image details. You could do that with a Qwen 2B-VL and pass to larger model or fine tune one depending on scope again. But that doesn't give you spatial reasoning. https://www.youtube.com/watch?v=QHBr8hekCzg Hopefully over next year open weight models will be released and at some point a full multi-modal model for text, audio and video reasoning will be within Nvidia Jetson range. Am I off here, or is that the current state basically?
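The "clip frames, downsize with ffmpeg" step can be sketched roughly as below. This only builds the command line and does not run it or call any vision model. The `fps` and `scale` video filters are standard ffmpeg filters; the file names, the one-frame-per-second rate, the 320 px width and the `build_ffmpeg_cmd` helper itself are assumptions picked for illustration, not what the linked video uses.

```python
def build_ffmpeg_cmd(video_path, out_pattern="frame_%04d.jpg", fps=1, width=320):
    """Build an ffmpeg argument list that samples and shrinks frames.

    fps:   frames to keep per second of video
    width: output width in pixels; height -1 lets ffmpeg keep the aspect ratio
    """
    video_filter = f"fps={fps},scale={width}:-1"
    return [
        "ffmpeg",
        "-hide_banner",
        "-i", video_path,
        "-vf", video_filter,
        out_pattern,
    ]


cmd = build_ffmpeg_cmd("room.mp4")  # "room.mp4" is a placeholder file name

# To actually run it (requires ffmpeg to be installed):
#   import subprocess
#   subprocess.run(cmd, check=True)
# Each resulting JPEG would then be handed to whatever vision model you use.
```

Passing the arguments as a list rather than one shell string avoids quoting problems with odd file names.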

Chobitsu 01/15/2025 (Wed) 21:25:45 No.35660

>>35649 Interesting ideas, Robophiliac. Thanks! >variable-resolution encoding I think some wizardry with ffmpeg or other codec systems might provide you with that 'on-the-fly' variability, Anon? Maybe have two SBCs dedicated to the vision tasks onboard? Good luck, Anon! :^) >>35650 >pic I lel'd a little. I've wondered at this oft-repeated trope over the years (this film was made in the '80s sometime, I think). Why would they think a robowaifu (or terminator, in this case) would want to see a text overlay on its visual field like it was playing some kind of vidya? :D

NoidoDev ##pTGTWW 01/15/2025 (Wed) 23:48:21 No.35667

>>35649 >Mini-CCTV-Camera-Security-Micro-Audio-Wired Thanks. Good find. In the past I also looked at such small cameras, but these were for model airplanes. I think these were analog and it would've been a bit tricky to get the signal encoded into digital. >would it be possible to change the resolution on the fly by using a software "switch" to tell the processing computer to drop every other bit(pixel?), or to process only every 3rd or 4th bit? Or to go to black-and-white for most operations? This would be great. I had similar ideas, but rather for the computer next to the camera. Maybe some FPGA that can switch between different modes, idk? My vague idea was that the computer, or several small ones, would change the picture very fast into various formats and cuts. At least several resolutions down to very low ones, maybe removing the color, also only the center or certain parts of the picture. Maybe there's also a technique to change the color in a certain way, so the object in a color you are looking for sticks out more. Focus e.g. cutting out faces or objects would require a fast adaptive system, but the other operations should be done by something very fast and energy efficient. Maybe an ASIC, I guess. Then the system downstream would not look at video data the whole time, but only analyze the lowest amount of data to figure out what's going on.
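A software-only toy version of that "look at the cheapest version first" idea, before worrying about FPGAs or ASICs. It assumes greyscale frames stored as lists of rows of ints; `shrink`, `centre_crop` and `changed` are hypothetical helper names and the change threshold is an arbitrary guess. The point is only the flow: shrink hard, compare the tiny thumbnails, and only touch the larger crop when something changed.

```python
def shrink(frame, factor):
    """Downscale a greyscale frame by averaging factor x factor blocks."""
    h = len(frame) // factor
    w = len(frame[0]) // factor
    out = []
    for by in range(h):
        row = []
        for bx in range(w):
            total = 0
            for y in range(by * factor, (by + 1) * factor):
                for x in range(bx * factor, (bx + 1) * factor):
                    total += frame[y][x]
            row.append(total // (factor * factor))
        out.append(row)
    return out


def centre_crop(frame, fraction=0.5):
    """Cut out the central part of the frame (the 'only the center' mode)."""
    h, w = len(frame), len(frame[0])
    ch, cw = max(1, int(h * fraction)), max(1, int(w * fraction))
    top, left = (h - ch) // 2, (w - cw) // 2
    return [row[left:left + cw] for row in frame[top:top + ch]]


def changed(prev_small, cur_small, threshold=10):
    """True if any thumbnail cell differs by more than `threshold` grey levels."""
    return any(
        abs(a - b) > threshold
        for prev_row, cur_row in zip(prev_small, cur_small)
        for a, b in zip(prev_row, cur_row)
    )


# Two 8x8 test frames: a flat grey scene, then a bright patch appears top-left.
frame_a = [[50] * 8 for _ in range(8)]
frame_b = [[200 if (x < 4 and y < 4) else 50 for x in range(8)] for y in range(8)]

thumb_a = shrink(frame_a, 4)  # 2x2 thumbnail
thumb_b = shrink(frame_b, 4)

something_happened = changed(thumb_a, thumb_b)
detail = centre_crop(frame_b) if something_happened else None
```

The downstream system then only spends compute on `detail` when the thumbnails disagree, which is the cheap first pass described above.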

GreerTech 01/16/2025 (Thu) 04:19:37 No.35680

>>35660 My headcanon is that since the neural net CPUs of the Terminators were like human brains, it helped them in some way to visually see that information.

Chobitsu 01/16/2025 (Thu) 08:11:24 No.35693

>>35651 Thanks, GreerTech! >>35652 >Am I off here, or is that the current state basically? I think it's a good idea to experiment with a current NVIDIA Jetson board if you can do so, Anon. As to the camera, I'd say to just pick the smolest one that gets the job done & is compatible with your processing board. This is an area that is under heavy R&D, so I wouldn't worry too much about waiting until "just the perfect choice" comes out. Good luck, Barf. >>35667 >pic Cute. :^) >>35680 Hehe, makes sense.

Barf 01/16/2025 (Thu) 15:15:53 No.35700

Haven't gone through all the threads yet, but here's a good repo of code and prints. It shows the frame processing for a visual LLM at about a frame per second, and shows it doing reasoning a bit. https://www.youtube.com/watch?v=0O8RHxpkcGc/&t=14m09s https://openroboticplatform.com/library https://github.com/NikodemBartnik/Machine-Learning-Robot I only have an ESP8266 and my main PC to start, but if I ever get that far, I might get a Jetson.

GreerTech 01/24/2025 (Fri) 12:17:50 No.36020

https://venturebeat.com/ai/hugging-face-shrinks-ai-vision-models-to-phone-friendly-size-slashing-computing-costs/ Visual AIs are becoming very compact now.

Barf 01/24/2025 (Fri) 14:39:12 No.36024

>>36020 Here's a link to the WebGPU version - https://huggingface.co/spaces/HuggingFaceTB/SmolVLM-256M-Instruct-WebGPU Really good image description for a 256M-parameter model. The description was about as good as 2.5\7B models I've used. Now if I could just figure out how to fine tune the data - https://huggingface.co/datasets/HuggingFaceM4/the_cauldron

Barf 01/24/2025 (Fri) 16:41:27 No.36025

These with the convex lens that you can split apart might be nice. No cameras, but could probably be added and has everything else on a custom PCB already. https://www.adafruit.com/product/4343

Chobitsu 01/24/2025 (Fri) 17:37:47 No.36029

This is really exciting stuff lately ITT, Anons. Thanks for linking to resources for us all! Cheers. :^)

GreerTech 01/29/2025 (Wed) 09:22:47 No.36237

Researchers were able to tweak machine vision into being usable in low-light conditions https://techxplore.com/news/2025-01-neural-networks-machine-vision-conditions.html

Chobitsu 01/29/2025 (Wed) 09:28:28 No.36239

>>36237 Thanks GreerTech! I'm actually interested in devising a smol flotilla of surveillance drones (the tiny, racing ones) for a robowaifu's use for situational-awareness on grounds. Having 'night vision' is very useful for this ofc -- especially if no special hardware would be required! Cheers. :^)

GreerTech 01/29/2025 (Wed) 10:34:51 No.36242

>>25927 Unfortunately, it looks like the project may be dead (your link was broken and the last update was last January), but I wonder if it could be retooled with newer and more efficient LLMs and vision models. It definitely caught my eye; it solved the elephant in the room I was thinking about: how do we tie a vision model to an LLM? https://github.com/haotian-liu/LLaVA

GreerTech 04/29/2025 (Tue) 05:27:04 No.37990

I mentioned Tesla FSD cameras in another thread, and I thought of this video I saw https://youtu.be/fKXztwtXaGo?si=pyoeQ2b-0vqfaKcW

Robowaifu Technician 04/29/2025 (Tue) 05:48:40 No.37997

>>37990 > (vision -related convo : >>37987, ...)

GreerTech 04/29/2025 (Tue) 05:53:13 No.38001

Depth perception clues, both monocular and binocular >>37995

GreerTech 04/29/2025 (Tue) 06:05:42 No.38004

>>38001

Robowaifu Technician 04/30/2025 (Wed) 05:46:58 No.38041

> (robo-vision -related : >>23145, >>37985, et al )

GreerTech 07/29/2025 (Tue) 03:09:15 No.40043

Vision based on a Python library that uses "abstractions" of objects https://www.hackster.io/news/explore-robotics-and-computer-vision-with-zumo-jetson-5d158fede708

Chobitsu 07/31/2025 (Thu) 02:15:00 No.40087

>>40043 Thanks, GreerTech! Cheers. :^)