Alan Bovik holds the Cockrell Family Endowed Regents Chair in Engineering at The University of Texas at Austin. For the past three decades, he has also served as Director of the Laboratory for Image and Video Engineering (LIVE). His numerous awards and accolades include the 2019 IEEE Fourier Award for Signal Processing and a Primetime Emmy for Outstanding Achievement in Engineering Development.
JOLTTX caught up with Professor Bovik recently to learn more about his research, its applications, and its possibilities for future work.
This interview has been edited for length and clarity.
- How would you describe your field of research?
- I work at the nexus of digital video and neuroscience.
- What interested you in that field?
- I was trained as an image processing engineer back in graduate school, but I soon became interested in visual neuroscience. I started collaborating with people working in visual neuroscience, and it occurred to me that it would be very interesting to bring visual neuroscience into areas like video communications, where it didn’t really exist. And that has been borne out very successfully.
- Could you give a brief overview of how image and video quality algorithms work?
- Our world is governed by physics, and that physics can often be expressed statistically. This means that pictures of our world are also governed by statistical laws, and by pictures I also mean videos. When you capture those pictures digitally, they still obey certain laws. However, if the pictures are distorted, through processes like compression, noise, or blur, then it’s expected that those statistics are altered.
-
- And that’s important because our visual brain has been evolving and adapting with respect to these statistics for eons. The visual brain, meaning the human brain, is very sensitive to distortions. If you see a blurred picture, you immediately sense that. You don’t have to sit there and think: Is that blurry? It’s what’s called pre-attentive or pre-cognitive. And so, the algorithms that I like to build are based on models of how our brain works, and specifically how our brain represents distorted pictures at the level of neurons. With those mathematical models, we create algorithms that predict how people will react to distorted videos.
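To make the statistical idea concrete, here is a minimal sketch, in Python with NumPy and SciPy, of the kind of measurement natural-scene-statistics models start from: mean-subtracted, contrast-normalized (MSCN) coefficients, whose histogram is close to Gaussian for pristine natural images and changes shape when an image is blurred, noisy, or heavily compressed. The function names and constants below are illustrative assumptions, not LIVE's actual algorithms.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def mscn_coefficients(gray_image, sigma=7/6, eps=1e-3):
    """Mean-subtracted, contrast-normalized (MSCN) coefficients.

    For pristine natural images these coefficients are roughly unit
    Gaussian; blur, noise, and compression reshape the histogram,
    which is what scene-statistics quality models measure.
    """
    img = gray_image.astype(np.float64)
    mu = gaussian_filter(img, sigma)                   # local mean
    var = gaussian_filter(img * img, sigma) - mu * mu  # local variance
    sigma_map = np.sqrt(np.maximum(var, 0.0))
    return (img - mu) / (sigma_map + eps)

def distortion_signature(gray_image):
    """Summarize departure from Gaussianity with the sample kurtosis
    of the MSCN coefficients (an illustrative feature only)."""
    c = mscn_coefficients(gray_image).ravel()
    return np.mean((c - c.mean()) ** 4) / (c.var() ** 2 + 1e-12)
```

A pristine photograph gives a kurtosis near 3; a blurred or compressed one drifts away from that value, and that drift is the kind of signal a no-reference quality model can feed on.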
- Are these algorithms trained by machine learning?
- Some of them are. Surprisingly, our most popular algorithms are not what you would call deep learning. Some don’t use any machine learning at all because they’re models of real neurons. Others are trained with simple machine learning, like support vector machines, which are not deep learning. We are conducting very extensive research on using deep networks for these kinds of problems as well, but they’re just not ready for prime time yet.
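As a rough illustration of the "simple machine learning" route, the sketch below maps hand-crafted scene-statistics features to human opinion scores with a support vector regressor, in the spirit of models like BRISQUE. The feature matrix and scores here are random placeholders standing in for real features and subjective-study data.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Placeholder data: one row of scene-statistics features per image,
# and a mean opinion score (MOS) per image from human viewers.
rng = np.random.default_rng(0)
features = rng.random((200, 36))
mos = rng.uniform(0, 100, size=200)

# Support vector regression -- "simple" machine learning, not a deep
# network -- learns the mapping from statistical features to quality.
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=1.0, epsilon=0.1))
model.fit(features, mos)

predicted_quality = model.predict(features[:5])  # predicted scores for 5 images
```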
- How do these quality algorithms relate to the problem of a streaming video platform needing to adjust to different bandwidth conditions?
- A good example would be Netflix. I think a lot of people watch Netflix all over the world. So, suppose you’re a Netflix user, and you say, “I want to watch Stranger Things.” Your local geographic data cloud contains data for “Stranger Things” generated on a per-scene basis. It can have 15 or 20 versions of a particular scene at different quality levels. When you click “Stranger Things,” your local geographic cloud immediately looks at those 15 to 20 different versions, each compressed to a different degree and in different ways. Netflix is also able to measure the bandwidth conditions. They can even reach down into your set-top box to see what the conditions are in your buffer (how much video you have available to watch). Then they decide which of those 15 to 20 versions to send you, and they make that decision by balancing your bandwidth conditions against the predicted video quality. The predicted video quality that goes into the decision comes from one of these video quality algorithms that we’ve developed. And today, every bit that is streamed by Netflix is controlled using one of our algorithms.
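A toy sketch of that per-scene decision, assuming each pre-encoded version carries a bitrate and a predicted quality score from such an algorithm: pick the highest-quality version the measured bandwidth and buffer can sustain. The names and thresholds are invented for illustration; this is not Netflix's actual control logic.

```python
from dataclasses import dataclass

@dataclass
class Rendition:
    bitrate_kbps: float        # bits per second this encode requires
    predicted_quality: float   # score from a video quality model (e.g., 0-100)

def choose_rendition(renditions, available_kbps, buffer_seconds, min_buffer=10.0):
    """Pick the highest predicted-quality encode the network can sustain,
    becoming more conservative as the client's buffer runs low."""
    headroom = min(1.0, buffer_seconds / min_buffer)
    budget = available_kbps * headroom
    feasible = [r for r in renditions if r.bitrate_kbps <= budget]
    if not feasible:  # nothing fits; fall back to the cheapest encode
        return min(renditions, key=lambda r: r.bitrate_kbps)
    return max(feasible, key=lambda r: r.predicted_quality)

# Example: three encodes of one scene, a 4 Mbps link, 6 seconds of buffered video.
scene = [Rendition(1200, 62.0), Rendition(3500, 81.0), Rendition(6000, 93.0)]
best = choose_rendition(scene, available_kbps=4000, buffer_seconds=6.0)
```

With only 6 seconds buffered, the budget shrinks to 2,400 kbps and the 1,200 kbps encode wins; with a full buffer, the 3,500 kbps encode would be chosen instead.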
- As a follow up question, are you familiar with the net neutrality debate?
- I don’t follow it that closely, but I have a general sense of what it’s about. I certainly don’t think that big companies should have an advantage over small companies in terms of owning internet bandwidth.
- Do you think that your algorithms would help a streaming video platform adapt if their service were subject to limited bandwidth unless they were willing to pay additional fees? How far would your algorithms be able to compensate for that?
- Because of our algorithms, Netflix is able to save about 25% of the bits, so they’re already doing it. So, if that were to happen, they could try to optimize a little bit further if somehow they were constrained in terms of how much bandwidth they would be allowed to have. But, as a cost-saving measure, they already try to conserve every bit they can. So if their bandwidth were constrained further, they would still perceptually optimize using these kinds of algorithms, but it might mean an overall decrease in the video quality delivered. There’s a limit to everything.
- Do you see 3D Quality algorithms as a future area of research?
- Sure, we’ve developed quite a few of those.
- What would you say are some of the challenges in this area?
- Well, in 3D there are broader issues. First is the question of whether it’s distorted or not. We’re using similar kinds of statistical and perceptual models. However, when you’re viewing in 3D, there are additional issues of physical discomfort: nausea, headaches, eye strain, all kinds of things. So, we have also created algorithms that will predict how much discomfort you will feel, because we modeled the mechanisms that cause discomfort when you are watching 3D content.
- Will that modeling increase the widespread use of 3D environments?
- It could. For example, in the controlled environment of a VR helmet there are lots of things you can do. There is interest in using these kinds of models to reduce physical discomfort. It’s still in the future because you need to know where the person is looking in order to be able to maximize the effectiveness of those algorithms. In fact, they are starting to put eye trackers in helmets. The other area of interest is using these algorithms and models for content creation for VR helmets.
- What do you see as the future areas of research on 2D quality algorithms?
- In video, everything is getting bigger, right? Bigger screens, and also deeper: HDR support with greater bit depths, deeper darks, brighter lights, wider color gamuts, higher frame rates, and so on. So, what we’re looking at right now is higher frame rates, because there is significantly increased interest in things like live sports. And in live sports, there are issues with lower frame rates. You get different kinds of distortions, like motion blurring. Those types of distortions can be fixed with higher frame rates. However, you need special algorithms that can deal with the higher frame rates.
-
- Further out, I would say video in VR helmets. When you think of VR, you usually think of gaming. But the industry that does VR wants you to put on the helmet and be in the movie Avatar at high resolution without feeling nausea or significant physical discomfort. Right now, there are significant constraints with respect to bandwidth and so on. We’re looking at algorithms where you can predict where a person is looking and create what’s called foveated compression, meaning that the video is high quality where your eyes are pointed but lower quality where you won’t notice. So that’s a future direction we’ve been looking at. We’re working on that with Facebook right now.
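A rough sketch of the foveation idea, under assumed (made-up) falloff constants: a weight map that is near 1 at the gaze point and decays with eccentricity, which an encoder could use to shift bits toward where the eyes are pointed.

```python
import numpy as np

def foveation_weights(height, width, gaze_xy, half_res_deg=2.3, pixels_per_degree=40.0):
    """Per-pixel quality weights for foveated compression.

    Weights are ~1 at the gaze point and fall off with eccentricity, so an
    encoder can quantize the periphery more coarsely without the viewer
    noticing. The constants here are illustrative, not tuned values.
    """
    ys, xs = np.mgrid[0:height, 0:width]
    gx, gy = gaze_xy
    eccentricity_deg = np.hypot(xs - gx, ys - gy) / pixels_per_degree
    return half_res_deg / (half_res_deg + eccentricity_deg)

# Example: a 1080p frame with the viewer's gaze reported slightly left of center.
weights = foveation_weights(1080, 1920, gaze_xy=(800, 540))
# An encoder could scale its quantization step by roughly 1 / weights,
# spending bits at the fovea and saving them in the periphery.
```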
- Facebook is probably also interested in your image quality algorithms, correct?
- Extremely, yes. We work with them very much on that topic.
- Do you have any other fields of research that you are very excited about?
- My great passion has always been to find new directions in visual neuroscience that apply to engineering problems in vision. So, a direction that we’ve recently been pushing is quality for audio and video simultaneously. For example, if you are watching a movie on a mobile device, how much does the audio quality affect your overall quality of experience as compared to the video quality? And how do video and audio quality coexist and combine to create an overall sense of enjoyment?
-
- Another area I’d love to break into more deeply is medical images. Medical image quality is special because it’s all task-based. Meaning, how does the quality of a radiograph affect the ability of the radiologist to make an accurate judgment? That’s a pretty wide-open field right now. So, if we can take our algorithms and apply them to task-based quality like that, there could be life-saving consequences as well.
- I noticed you had some research listed on infrared images. What is the end use for quality algorithms in infrared?
- We’ve done some work on infrared images. In terms of end use, think about a fireman running into a burning house. He walks into the room and it looks fine. But if he has an IR camera, he can point at the walls and see the hot spots. He can see hot wires, a big warm region on the wall, etc. To be able to understand what’s going on in that building has lifesaving consequences. Naturally, the quality of those IR images is very important. So, that’s the kind of IR work we’ve been doing.