Learning Multimodal Temporal Representation for Dubbing Detection in Broadcast Media. Conference proceeding (October 2016).