True, but look at the face recognition portion: he's finding the circumscribing rectangle, which allows for a simple depth calculation. He's not really showing us the initial face lock. Following a linearly moving object once you've found it cuts the computation by orders of magnitude compared with finding it in the first place. And lastly, he's moving that pad slowly, so there's not much acceleration to deal with.
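To show how simple that depth calculation can be, here's a rough sketch in Python. It assumes a pinhole camera model, where apparent width shrinks in proportion to distance. I don't know that this is what the demo actually does. The function name `estimate_depth`, the focal length, and the average face width are all made-up illustrative values, not anything taken from his code.

```python
def estimate_depth(focal_length_px, real_width_m, box_width_px):
    """Pinhole-camera depth estimate: depth = f * W / w.

    focal_length_px: camera focal length in pixels (assumed known from calibration)
    real_width_m:    assumed physical width of the face, in meters
    box_width_px:    width of the circumscribing rectangle, in pixels
    """
    if box_width_px <= 0:
        raise ValueError("bounding box width must be positive")
    return focal_length_px * real_width_m / box_width_px


# Illustrative assumptions only, not measured values.
FOCAL_LENGTH_PX = 600.0
FACE_WIDTH_M = 0.15

# A narrower rectangle means the face is farther away.
for box_width in (200, 100, 50):
    depth = estimate_depth(FOCAL_LENGTH_PX, FACE_WIDTH_M, box_width)
    print(f"box width {box_width:3d} px -> roughly {depth:.2f} m")
```

One multiply and one divide per frame, which is the point: once you have the rectangle, depth is nearly free. The expensive part is getting the rectangle in the first place.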
In a previous life I was involved with real-time multi-processor, multi-sensor array DSP for tracking bad guys and the things they throw at you.