Learning spoken concepts from unlabeled audio-visual data
Presenter
February 18, 2019
Keywords:
- audio-visual training
- cross-modal training
- image captioning
- speech search