Wednesday, March 4, 2015

Scalable Crowd-Sourcing of Video from Mobile Devices

Motivation:

This paper argues that crowd-sourced video from mobile devices can be a valuable public resource. It enumerates several use cases that a large-scale, searchable video repository could enable. Google Glass is a good example of a source of first-person video. The authors predict that integrating video capture with correlated sensor information such as gaze tracking, audio, geolocation, acceleration, and biodata (e.g., heart rate) is only a matter of time.

Main Points:

  • An Author-Publisher Model is designed to encourage users to share their captured scenes of everyday life. This approach creates a business relationship with the service provider that invests in the video capture infrastructure.
  • The authors propose a hybrid cloud architecture that is effectively a CDN in reverse. This architecture uses decentralized cloud computing infrastructure in the form of VM-based cloudlets.
  • Image processing for denaturing is offloaded from mobile devices to cloudlets. Each mobile user currently associated with a cloudlet gets a “personal” VM that performs the customized denaturing for that user. Other cloudlet VMs encapsulate image processing code to perform background indexing of recently captured video segments.
  • The major performance bottleneck in video denaturing is the computer vision algorithms used for face detection. Because denaturing is so computationally demanding, the penalty that virtualization imposes on total throughput is comparatively small.
  • Reducing the resolution is a clear path to increasing the throughput of the personal VM, but it may reduce the detection accuracy of the denaturing process. Their results suggest that denaturing should be performed at resolutions of at least 720p; otherwise, users may lose their trust in the denaturing process.
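
To make the resolution trade-off in the last point concrete, here is a rough back-of-the-envelope sketch. It assumes that per-frame face detection cost scales linearly with pixel count, which is my simplification and not a result from the paper. The resolution table and the function names are illustrative.

```python
# Common frame sizes (width, height). Illustrative values only; this is
# not a list of the resolutions the paper actually measured.
RESOLUTIONS = {
    "1080p": (1920, 1080),
    "720p": (1280, 720),
    "480p": (854, 480),
    "360p": (640, 360),
}


def pixels(name):
    """Number of pixels in one frame at the named resolution."""
    width, height = RESOLUTIONS[name]
    return width * height


def relative_cost(name, baseline="1080p"):
    """Per-frame cost relative to the baseline.

    ASSUMPTION: detection cost is proportional to pixel count.
    """
    return pixels(name) / pixels(baseline)


def estimated_fps(name, baseline_fps, baseline="1080p"):
    """Scale a throughput measured at the baseline to another resolution."""
    return baseline_fps / relative_cost(name, baseline)


if __name__ == "__main__":
    for name in RESOLUTIONS:
        print(f"{name}: {pixels(name):>9,d} pixels, "
              f"{relative_cost(name):.2f}x the 1080p cost")
```

Under this assumption, dropping from 1080p to 720p cuts the per-frame work to a bit under half. Going lower saves even more, which is exactly where the accuracy concern above starts to matter.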


Trade-offs/Comments:

  • The paper mentions only Google Glass as a good source of first-person video. I think another source of continuously captured scenes of everyday life could be the dash camera (driving recorder), which often provides video evidence in the event of an accident and may record parameters such as date/time, speed, G-forces, and location.
  • The user defines a default privacy policy for publishing their video segments. The object-based filters are currently limited to the faces present in the training set of the authors' face recognition algorithms.
  • The computer vision algorithms, which are the bottleneck of both denaturing and content-based indexing, are extremely resource intensive, so the overhead of virtualization is small by comparison. The authors believe that throughput might improve drastically once GPU virtualization matures and advanced, highly optimized GPU routines become available inside the VM.
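
As a way to picture the per-user privacy policy mentioned above, here is a minimal sketch of how a personal VM might decide what to do with one segment. Everything in it is hypothetical: the `Segment` and `Policy` classes, the time and location rules, and the `decide` function are my own illustration, not the paper's implementation.

```python
from dataclasses import dataclass, field
from datetime import time


@dataclass
class Segment:
    start: time   # local capture time of the segment
    lat: float
    lon: float
    # Names returned by a hypothetical face recognizer for this segment.
    faces: list = field(default_factory=list)


@dataclass
class Policy:
    # (begin, end) windows during which nothing is published.
    blocked_hours: list = field(default_factory=list)
    # (min_lat, min_lon, max_lat, max_lon) boxes where nothing is published.
    blocked_boxes: list = field(default_factory=list)
    # Recognized names whose faces must be blurred before publishing.
    blur_faces: set = field(default_factory=set)


def decide(segment, policy):
    """Return (action, names) where action is 'drop', 'blur' or 'publish'."""
    for begin, end in policy.blocked_hours:
        if begin <= segment.start < end:
            return ("drop", [])
    for min_lat, min_lon, max_lat, max_lon in policy.blocked_boxes:
        if min_lat <= segment.lat <= max_lat and min_lon <= segment.lon <= max_lon:
            return ("drop", [])
    to_blur = [name for name in segment.faces if name in policy.blur_faces]
    if to_blur:
        return ("blur", to_blur)
    return ("publish", [])


if __name__ == "__main__":
    policy = Policy(blocked_hours=[(time(22, 0), time(23, 59))],
                    blur_faces={"alice"})
    print(decide(Segment(time(12, 0), 10.0, 10.0, ["alice", "bob"]), policy))
```

Note that a face the recognizer cannot name never matches `blur_faces` and so passes through untouched. That is one way to read the training-set limitation noted in the second point.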


1 comment:

  1. Very nice points. Perhaps you can offer us a better option for #2 on the midterm :-)

    Are you skeptical of point #3? I think I might be.
