TTIC Professor Greg Shakhnarovich works on computer vision research projects with students and faculty colleagues. “Computer vision, broadly understood, is about making computers understand, process, and create visual artifacts, like images and videos,” said Prof. Shakhnarovich. “It’s taking the physical world and representing it to a machine through some kind of collecting or sensing. Images are the most common way of doing this.”
One of the projects he has been working on with TTIC student Nick Kolkin involves a process called style transfer. The idea is to take two images, each of which could be a photograph, drawing, or painting, and recreate the content of the first image in the style of the second. Using a style transfer algorithm, you can take a photograph and render it as an oil painting or a pencil sketch. While the same effect could be achieved by hand in a program like Adobe Photoshop, the goal is to automate the process and make it more intuitive and general. The current work on this project is a collaboration with colleagues at Adobe Research.
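To give a concrete sense of what a style transfer algorithm takes in and produces, here is a minimal, generic sketch of optimization-based neural style transfer in PyTorch, using the widely known Gram-matrix formulation. It is not the algorithm developed at TTIC; the file names content.jpg and style.jpg are placeholders, and ImageNet normalization is omitted for brevity.

```python
# A generic sketch of optimization-based neural style transfer, NOT the TTIC/Adobe
# algorithm described in this article; file names are placeholders.
import torch
import torch.nn.functional as F
from torchvision import models, transforms
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"

def load_image(path, size=256):
    tf = transforms.Compose([transforms.Resize((size, size)), transforms.ToTensor()])
    return tf(Image.open(path).convert("RGB")).unsqueeze(0).to(device)

def gram(feat):
    # The Gram matrix of feature maps is a common, simple proxy for "style".
    _, c, h, w = feat.shape
    f = feat.view(c, h * w)
    return f @ f.t() / (c * h * w)

content_img = load_image("content.jpg")  # placeholder paths
style_img = load_image("style.jpg")

# Frozen pretrained VGG-16 as a feature extractor (ImageNet normalization omitted for brevity).
vgg = models.vgg16(weights=models.VGG16_Weights.DEFAULT).features.to(device).eval()
for p in vgg.parameters():
    p.requires_grad_(False)

style_layers, content_layer = {3, 8, 15, 22}, 15  # ReLU layer indices in vgg.features

def extract(x):
    styles, content = [], None
    for i, layer in enumerate(vgg):
        x = layer(x)
        if i in style_layers:
            styles.append(x)
        if i == content_layer:
            content = x
        if i >= max(style_layers):
            break
    return styles, content

with torch.no_grad():
    target_styles = [gram(s) for s in extract(style_img)[0]]
    target_content = extract(content_img)[1]

# Start from the content image and optimize its pixels directly.
result = content_img.clone().requires_grad_(True)
opt = torch.optim.Adam([result], lr=0.02)

for step in range(300):
    opt.zero_grad()
    styles, content = extract(result)
    style_loss = sum(F.mse_loss(gram(s), t) for s, t in zip(styles, target_styles))
    content_loss = F.mse_loss(content, target_content)
    (1e4 * style_loss + content_loss).backward()
    opt.step()

# result.clamp(0, 1) now holds the stylized image as a (1, 3, H, W) tensor.
```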
“We have what we think is the best style algorithm out there, and we’re continuing to work on this. It’s very cool, as it produces nice artifacts, and it also relates to a much more fundamental question in computer vision, which is how people perceive information through images,” said Prof. Shakhnarovich. “Computers are very bad at understanding art; they’re much better at understanding photographs. If you show them an impressionist painting, it’s very hard for them to find the same objects.”
He and his collaborators believe that figuring out how to create these stylized images while retaining the original content could be a gateway to improving the ability of computer vision systems to perceive the world around us.
“Basically, in this investigation into stylization, part of the goal is to make computer vision more robust, and part of it is just fun because it produces nice images. And I can finally explain to my mother what I’m doing,” said Prof. Shakhnarovich.
Prof. Shakhnarovich has also been working on a joint project with Professor Karen Livescu that has been underway for over ten years now, in collaboration with their colleagues in the Linguistics department at the University of Chicago. Together they have been conducting research on sign language recognition, focusing primarily on American Sign Language (ASL). Their goal is to eventually be able to take a video of a person signing and translate it into spoken English.
“This is an interesting combination of natural language processing problems, because it’s machine translation from one language to another, and it’s not a typical acoustically spoken language. It shares many difficulties with speech, like people having different accents, or speaking different dialects. Nobody signs things exactly like in the dictionary,” said Prof. Shakhnarovich. “It’s also a computer vision problem because you have visual speech instead of acoustic speech. It’s a very challenging problem that remains largely unsolved. You can use Google Translate to translate German to English, and it works, but there’s nothing like that for sign language.”
Another direction, spearheaded by his student Ruotian (RT) Luo, looks at the interaction between language and vision. “Language is, in some sense, the best way we have for describing the world depicted in images,” he said. “One aspect of this is attempting to take an image and map its contents to a natural language description, for example, this is a pixel of a chair, and this is a pixel of a cat sitting on the chair.”
An interesting question explored in this work, a collaboration with colleagues at the University of Chicago and Bar-Ilan University in Israel, is how to properly describe an image. This is closely tied to the goal of making information more accessible to people with disabilities, such as blind users who rely on tools that describe images for them.
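As a loose illustration of the “this pixel is a chair, this pixel is a cat” idea, the sketch below runs an off-the-shelf semantic segmentation model from torchvision and names the predicted label at each pixel. It is not the group’s own model, and photo.jpg is a placeholder image path.

```python
# A loose illustration using an off-the-shelf torchvision segmentation model,
# not the group's own work; "photo.jpg" is a placeholder image path.
import torch
from torchvision.models.segmentation import (
    deeplabv3_resnet50,
    DeepLabV3_ResNet50_Weights,
)
from PIL import Image

weights = DeepLabV3_ResNet50_Weights.DEFAULT
model = deeplabv3_resnet50(weights=weights).eval()
classes = weights.meta["categories"]  # Pascal VOC labels, which include "cat" and "chair"
preprocess = weights.transforms()

img = Image.open("photo.jpg").convert("RGB")
batch = preprocess(img).unsqueeze(0)

with torch.no_grad():
    out = model(batch)["out"]      # (1, num_classes, H, W) per-pixel class scores
label_map = out.argmax(dim=1)[0]   # (H, W) predicted class index for every pixel

# Which categories appear anywhere in the image?
present = sorted({classes[i] for i in label_map.unique().tolist()})
print("Per-pixel labels found:", present)

# "This pixel is a ...": name the prediction at the image center.
h, w = label_map.shape
print("Center pixel:", classes[label_map[h // 2, w // 2].item()])
```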
Prof. Shakhnarovich’s interest in computer vision stemmed from an interest in Artificial Intelligence (AI). “I came to think of being able to understand the visual world as very fundamental to intelligence. Visual intelligence is something that is very basic because we share it with animals. Many of us would be very happy if we had a computer with the intelligence level of a squirrel, and a squirrel doesn’t seem to have language, but it certainly possesses the ability to understand an image.” He believes that cracking visual perception could be key to understanding higher levels of how intelligence works.
“And also, it’s just a lot of fun. I enjoy the fact that it’s very visual (no pun intended), that both your input and a lot of the things you do are very easy to look at, and it has a very intuitive visual interpretation.”
Next quarter (Spring 2021), Prof. Shakhnarovich will be teaching Introduction to Computer Vision, TTIC’s first formal class on the subject.