Do you have to look at a new object 1,000 times to recognize it? And if you look at it another 100,000 times, will you recognize it any better? I think not, yet this is exactly how image recognition systems based on Haar cascade classifiers or neural networks are trained.
SaraVision is an image recognition project that is conceptually completely different from the approach the whole world is pinning its hopes on: ever better neural networks that learn from ever larger and more readily available datasets. We assume that sometimes one glance at an object, or just a few, is enough to remember it and recognize it later. It all started from the need to add the sense of sight and some intelligence to one of our sub-projects, SaraCam, which raises the current Google and Alexa assistants to the 2.0 level.
At first, to test some basic assumptions, we wrote a simple program that recognizes the MNIST handwritten digit set, which I describe on our blog in a slightly provocative article, "About the nonsense of deep learning, neural networks in image recognition (using the MNIST kit)". Even there we managed to create a fairly general program that recognizes characters regardless of their size, slant or font type, but it was only a programming "sandbox".
The next stage was to create something more universal that could recognize any object and, for a start, quickly detect basic geometric figures. We also wanted to check the theory that our brain sees very well what our eyes cannot, literally drawing the missing elements in our imagination (see: visual perception and gestaltism, reification theory), and to find out whether our system could work in a similar way.
It worked, and it works in a similar way: as you can see in the amateur video below, the system does not have to see the whole square to detect that a square is there.
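To make the idea of "drawing in" a missing element concrete, here is a minimal Python sketch. It is not SaraVision's method, which is not described in this article; it is only a hand-written geometric illustration. The function name `complete_square` and the tolerance `tol` are hypothetical. Given three visible corners, it checks whether they can belong to a square and, if so, computes where the hidden fourth corner must be.

```python
from itertools import permutations


def complete_square(p, q, r, tol=1e-6):
    """Return the missing fourth corner if p, q, r can be three corners
    of a square, otherwise None. (Hypothetical helper, illustration only.)"""
    for a, b, c in permutations((p, q, r)):
        # Treat b as the corner where the two visible sides b->a and b->c meet.
        ux, uy = a[0] - b[0], a[1] - b[1]
        vx, vy = c[0] - b[0], c[1] - b[1]
        len_u = ux * ux + uy * uy  # squared length of side b->a
        len_v = vx * vx + vy * vy  # squared length of side b->c
        if len_u == 0:
            continue  # degenerate input: two identical points
        perpendicular = abs(ux * vx + uy * vy) <= tol * len_u
        equal_sides = abs(len_u - len_v) <= tol * len_u
        if perpendicular and equal_sides:
            # The fourth corner is the one opposite b.
            return (a[0] + c[0] - b[0], a[1] + c[1] - b[1])
    return None


print(complete_square((0, 0), (2, 0), (2, 2)))  # a square with one corner hidden
print(complete_square((0, 0), (3, 0), (3, 1)))  # not part of a square
```

Note that this sketch only handles squares, which is precisely the kind of shape-specific rule a general system has to avoid.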
It may seem that detecting simple figures is easy and that any programmer can do it. You can use "ready-made" programs, or you can hand-code an algorithm for each case. But we do not want to write a separate algorithm for every shape; we want one algorithm for all shapes and, most importantly, we do not want to train the system on thousands of images.
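For contrast, the following Python sketch shows what the hand-coded, one-rule-per-shape approach tends to look like. It is an illustration of the conventional style we want to avoid, not part of SaraVision; the names `count_corners`, `SHAPE_RULES` and `classify` are hypothetical, and the input is assumed to be an already extracted, ordered list of outline points.

```python
def count_corners(polygon, tol=1e-9):
    """Count the points of a closed outline where the direction changes.
    Points lying on a straight segment are not counted as corners."""
    n = len(polygon)
    corners = 0
    for i in range(n):
        ax, ay = polygon[i - 1]          # previous point (wraps around)
        bx, by = polygon[i]              # current point
        cx, cy = polygon[(i + 1) % n]    # next point (wraps around)
        # Cross product of the incoming and outgoing edge; zero means collinear.
        cross = (bx - ax) * (cy - by) - (by - ay) * (cx - bx)
        if abs(cross) > tol:
            corners += 1
    return corners


# One hand-written rule per shape: every new shape needs a new entry.
SHAPE_RULES = {3: "triangle", 4: "quadrilateral", 5: "pentagon"}


def classify(polygon):
    return SHAPE_RULES.get(count_corners(polygon), "unknown")


print(classify([(0, 0), (4, 0), (0, 3)]))
print(classify([(0, 0), (1, 0), (2, 0), (2, 2), (0, 2)]))
print(classify([(2, 0), (4, 1), (4, 3), (2, 4), (0, 3), (0, 1)]))
```

The last call falls through to "unknown" because nobody wrote a rule for hexagons, which is the maintenance problem described above.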
The next step was to test whether the system could handle face detection in camera video. Importantly, the system is designed to detect a face very quickly, even when the face is tilted left or right, turned slightly or fully sideways, poorly lit, or captured in color or in infrared. It worked. Although the system is still under construction, it ran almost 20 times faster than standard face detection systems and, most importantly, it coped where other systems could not cope at all (for example, with a face lit from one side by the sun and the head tilted slightly to the side).