Thursday, March 5, 2015

Scalable Crowd-Sourcing of video from mobile devices:


Motivation:

This paper proposes a scalable Internet system that continuously collects crowd-sourced videos from head-up displays (HUDs) such as Google Glass. It describes the potential benefits of crowd-sourced first-person video collection, offers an incentive-based model for such a system, and then deals with technical issues such as context-based searching and user privacy. It uses a prototype implementation of the system to expose the scalability bottlenecks. The paper argues that the value of sharing first-person videos is that they may be of interest to someone else in an entirely different context.


Main points:
  • One plausible incentive model the paper discusses is the Author-Publisher model. It is suggested that crowd-sourced video be treated as authored content. Analogous to a book publisher, the service provider will publish the authored content and share the revenue with the author. This is the incentive for sharing videos. 
  • The paper describes a process called denaturing, which edits or cuts out sensitive and private information that the user may not want to share. Denaturing has to be user-specific and should cover meta-data modification as well as content editing. Denaturing is important to encourage users to share their videos. Currently, the system only uses face detection techniques and blurs faces in the denaturing step.
  • Another issue discussed is that a central cloud could be overwhelmed by the high data rate of incoming videos from multiple users. To avoid this problem, the system uses a hybrid cloud architecture: a decentralized structure in the form of VM-based cloudlets. These cloudlets are not just temporary storage areas for the denatured videos; they are in fact the true home of the denatured videos. Mostly, only meta-data about these videos is stored in the cloud, although sometimes some video segments may be copied to the cloud.
  • A video content indexer analyzes each segment of the denatured videos using computer vision algorithms and obtains tags for each frame. More tags may come from the denaturing process itself, since it detects the faces it blurs and the video content indexer can no longer pick those up afterwards.
  • Searching happens in two steps. In the first step, conventional SQL queries are run using time and location data and the tags extracted from each frame. The second step is the actual content-based search, which is computationally expensive but can be run in parallel on the cloudlets.
  • For video uploading, it is generally more efficient to upload larger video segments. 
  • In denaturing, there is a trade-off between video resolution, accuracy and throughput. Higher-resolution videos result in greater denaturing accuracy, but throughput takes a hit. Low-resolution videos, on the other hand, allow higher throughput but will lower users' trust because denaturing accuracy drops. The results of the evaluation suggest that denaturing should be done at 720p or higher for optimal results.
  • Similarly, there is a trade-off between resolution and indexing accuracy. The paper suggests that indexing can be done at a resolution of 360p without compromising too much on accuracy.
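
To make the two-step search described above more concrete, here is a minimal Python sketch. It is not the paper's implementation: the table layout, the column names, and the functions `build_index`, `metadata_filter` and `content_search` are all assumptions made up for illustration, and the expensive content-based step is stood in for by a caller-supplied `matcher` function. The first step is a cheap SQL filter on time, location and tags; the second step runs the costly check only on the surviving segments, grouped by the cloudlet that stores them.

```python
import sqlite3


def build_index(rows):
    """Build a toy in-memory metadata index (hypothetical schema).

    Each row is (segment_id, cloudlet, start_ts, lat, lon, tag).
    """
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE segment_tags ("
        "segment_id TEXT, cloudlet TEXT, start_ts INTEGER, "
        "lat REAL, lon REAL, tag TEXT)"
    )
    db.executemany("INSERT INTO segment_tags VALUES (?, ?, ?, ?, ?, ?)", rows)
    return db


def metadata_filter(db, tag, t0, t1, lat_range, lon_range):
    """Step 1: cheap SQL query on time, location and tags."""
    cur = db.execute(
        "SELECT DISTINCT segment_id, cloudlet FROM segment_tags "
        "WHERE tag = ? AND start_ts BETWEEN ? AND ? "
        "AND lat BETWEEN ? AND ? AND lon BETWEEN ? AND ? "
        "ORDER BY segment_id",
        (tag, t0, t1, *lat_range, *lon_range),
    )
    return cur.fetchall()


def content_search(candidates, matcher):
    """Step 2: expensive content-based check on the candidates only.

    Candidates are grouped by cloudlet; in a real deployment each group
    could be processed in parallel on its own cloudlet. Here the groups
    are simply processed one after another.
    """
    by_cloudlet = {}
    for segment_id, cloudlet in candidates:
        by_cloudlet.setdefault(cloudlet, []).append(segment_id)

    hits = []
    for segment_ids in by_cloudlet.values():
        hits.extend(s for s in segment_ids if matcher(s))
    return sorted(hits)
```

A caller would first narrow the candidates with `metadata_filter` and then pass them to `content_search` along with whatever vision routine decides whether a segment really matches. The point of the split is that the second step never touches segments that the metadata already rules out.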


Comments:
  • Denaturing may be useful in encouraging users to share their videos, but it can hurt some searches. Suppose a search on crowd-sourced videos is needed to locate a missing child or to identify a thief: denaturing may affect such queries because it blurs faces and modifies location, time, etc.
  • The complexity of the search query poses another challenge. While a search query with lots of exact details makes it easy to narrow down the search, such carefully worded searches may not always be possible and vague queries may pose problems. 
  • Since multiple users will be uploading video segments simultaneously, storage of these segments is a challenge. Using computer vision algorithms, similar crowd-sourced videos could be grouped together. For example, if two users are in the same area while capturing videos, the visual content would be similar and may not need to be stored as two separate video segments.
  • The quality of these crowd-sourced videos could be another potential challenge, in my opinion. If the captured videos have problems with lighting, occlusion, or other quality parameters, the video content indexer and the searching algorithms may not work well. Lighting variations and partial obstruction of faces may also result in poor denaturing.




 

1 comment:

  1. Terrific points raised! It seems there are many interesting research challenges required to make this approach really work in practice.
