The philosophy of computational object recognition, scene understanding, machine learning, and musings on the future of computer vision.
Second white-shirted kid from the front. Remember, tombone: some may be fooled, but pretension is not sophistication. I'm happy you enjoyed learning about the retinal system; it is very interesting. However, I doubt that a Martian could tell what we humans see just by looking at our retinas. If you think a Martian could, can you tell me what we are tuned to see? For one, how would the Martian isolate, or even identify, the protein that absorbs the photons? That would probably be the first necessary step. Next would come determining the many different cell types within the retina and their functions, and only then could you move on to the higher-level task of inferring what can be seen from the organization of the light-sensing cells. Secondly, there is another dimension to the workings of the visual system that you seem to overlook, and that is the chemical structure. You cannot remove it from the hard wiring. Maybe it's your physics background, but without understanding chemical signaling and its control, you don't really understand much about how the retinal system works. At best you understand how photons are absorbed by rhodopsin, which triggers light sensation (and then progressive desensitization), and how the light-sensing cells (rods and cones) are arranged within the retina. You do not talk about how these signals are integrated, amplified, and ordered to give human "sight." I also consider my own knowledge basic, as most of biology considers its own knowledge (hence the large amount of research still going on within the visual system). Don't separate the retinal system from the visual cortex either; try to understand how that is wired, and you'll realize that maybe it's not so much what we see, but rather what we make ourselves see.
When I was talking about the Martian experiment, I was referring to the geometry and photometry of things that a Martian could infer about the human visual experience on Earth, as opposed to a detailed account of the biological, chemical, and physical processes involved. Independent of the chemical structure of the human visual system, one can think of this system as a black box: a system that converts the incoming visual image into some representation, which gets sent through the optic nerve. When I refer to the retina, I actually want to talk about the processing (manipulation of the visual signal) that happens before this information is sent to the brain. I know that the signal is never truly interpreted without the brain, but there is enough low-level processing happening in the retina (or what I believe is the retina) that one can analyze retinal cells and determine that they respond to certain visual structures. For example, sharp gradients in intensity (edges) are perceived by humans because there are certain structures within the retinal system that respond to such patterns of local intensity. I think the Martian could build a probability density function of the edges seen in the human visual field; it's not as if he would see the 'image' of your mother burned into your retina. Experiments were done on the retinal system of field mice, and scientists found that field mice show a strong response to something that looks like a hawk in the sky. One should not be surprised, since hawks like to eat field mice. I would like to state that I'm not directly interested in the human visual system. I don't even want to talk about photons in my work on vision. Would a story that sounds like "photon X hits protein P, releases chemical C, causes reaction R, then a human 'sees'" satisfy you?
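To make the "probability density function of edges" idea a little more concrete, here is a minimal pure-Python sketch under stated assumptions: an "edge" is taken to be any pixel whose intensity gradient (estimated by central differences) exceeds a threshold, and the "pdf" is simply a normalized histogram of gradient orientations over those pixels. The function name `edge_orientation_pdf` and its `threshold` and `n_bins` parameters are hypothetical illustrations, not a claim about what the retina actually computes.

```python
import math

def edge_orientation_pdf(image, threshold=0.1, n_bins=4):
    """Hypothetical sketch: empirical distribution of edge orientations.

    image: 2D list of grayscale intensities (rows of equal length).
    An 'edge' (an assumption of this sketch) is any interior pixel whose
    central-difference gradient magnitude exceeds `threshold`.
    Returns n_bins probabilities over gradient orientations in [0, pi).
    """
    h, w = len(image), len(image[0])
    counts = [0] * n_bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Central-difference estimates of the intensity gradient.
            gx = (image[y][x + 1] - image[y][x - 1]) / 2.0
            gy = (image[y + 1][x] - image[y - 1][x]) / 2.0
            if math.hypot(gx, gy) <= threshold:
                continue  # not sharp enough to count as an edge
            # Gradient direction (perpendicular to the edge itself),
            # folded into [0, pi) so opposite polarities share a bin.
            theta = math.atan2(gy, gx) % math.pi
            b = min(int(theta / math.pi * n_bins), n_bins - 1)
            counts[b] += 1
    total = sum(counts)
    # Normalize counts into probabilities; a flat image has no edges.
    return [c / total for c in counts] if total else [0.0] * n_bins
```

For example, a vertical step edge (dark left half, bright right half) has a purely horizontal gradient, so all of its probability mass lands in the first bin; a horizontal step puts it all in bin 2. Accumulating such histograms over many images would give a crude statistical picture of what a visual system "responds to," without any reference to photons or proteins.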
Not only do I think that science will never provide us with a detailed enough chemical account of vision, but such an account would also be more like a manual for operating the human visual system than a true theory of vision. I am looking for a theory of vision that transcends implementation details. I'll gladly talk about parts-decomposition of objects, object discovery, memory, spatial reasoning, obstacle avoidance, and even low-level image-processing tasks such as edge detection, gestalt segmentation, building PDFs in high-dimensional feature spaces, blurring, etc. However, I will not talk about low-level system implementation details such as processor speed, machine instruction set, or programming language of choice, and definitely not about protein structure, photoreceptor location, and other chemical system properties. In other words, I'm looking for a theory that not only explains human vision, but explains it in an implementation-independent way that translates to modern software/hardware implementations and perhaps to other biological implementations in the future. I agree with you when you say "it's not so much what we see, but rather what we make ourselves see," and I would rather talk about a conceptual framework for representing objects (high-level algorithms) than about a chemical system (machine code instructions) that implements such a theory of vision. From an academic point of view, I'm highly interested in the problem of how past visual experience interacts with the visual experience of the 'now.' I believe that understanding this intimate connection between the past and the present is crucial if we want to advance computer vision.
Yo tom... my bad, my last post was a bit unnecessary. I just love neurochemistry, and I think it could hold some valuable insight into forms of algorithms that you may one day be able to apply; however, understandably, the science has to progress before such insight can be used (I believe). Anyway, when it comes to the higher processes of vision (taking a step back from cellular mechanisms), one thing that really interests me is the fact that just about everything I "see" is associated with a spoken word. Do you feel that this has any significance for the field of computer vision? As much as you would want to remove the programmer from the scenario, we all at one point had a teacher. It seems to me that our images, within the literate or rather spoken world, are associated with some sort of word (symbol). This may have implications for your work, as I believe you (computer vision scientists) are not trying to have the "computer-beholder" recognize abstract things that seem impossible to associate with any spoken word (psychedelic experiences and dreams come to mind), but rather things that are easily communicated with symbols (words, e.g., door and table). I don't know, I just thought it is an interesting frame to ponder within; plus, I feel bad for posting that last comment, which seemed to lack any substance whatsoever. Take it easy, bone!
First of all, I find your points of view rather illuminating, especially because you are getting a PhD in a field that is distant from robotics. What might appear to be a dispute is just each of us adhering to the paradigms taught by our research programs. Although we (the vision community) would like to build vision systems that can interact with the world on their own, we would also like to be able to pose queries using natural language. If we want to interact with these machines and ask them questions about what they see, then they must be able to express concepts in a natural language. Clearly, we would need the machines to understand 'things which are easily communicated with symbol.' However, I believe that the internal representation of the visual world that the machines use doesn't have to correspond to human symbols.
I understand that the "machine's internal representation" doesn't have to resemble "human symbols," but don't you think it would simplify the solution if these representations were expressed in terms of human words (a direct association)?