Unsupervised grounding of textual descriptions of object features and actions in video
Alomari, Muhannad, Chinellato, Eris ORCID: https://orcid.org/0000-0003-1920-2238, Gatsoulis, Yiannis, Hogg, David C. and Cohn, Anthony G.
(2016)
Unsupervised grounding of textual descriptions of object features and actions in video.
Proceedings, Fifteenth International Conference on Principles of Knowledge Representation and Reasoning (KR-16).
In: 15th International Conference on Principles of Knowledge Representation and Reasoning (KR 2016), 25-29 April 2016, Cape Town, South Africa.
ISBN 9781577357551.
[Conference or Workshop Item]
PDF - Final accepted version (with author's formatting)
Abstract
We propose a novel method for learning visual concepts and their correspondence to the words of a natural language. The concepts and correspondences are jointly inferred from video clips depicting simple actions involving multiple objects, together with corresponding natural language commands that would elicit these actions. Individual objects are first detected, together with quantitative measurements of their colour, shape, location and motion. Visual concepts emerge from the co-occurrence of regions within a measurement space and words of the language. The method is evaluated on a set of videos generated automatically using computer graphics from a database of initial and goal configurations of objects. Each video is annotated with multiple commands in natural language obtained from human annotators using crowd sourcing.
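The co-occurrence idea in the abstract can be illustrated with a toy sketch. This is not the authors' algorithm: it assumes the measurement space has already been discretised into named regions (the labels `hue_red`, `shape_cube` and so on are hypothetical), treats each clip as a set of command words paired with the set of regions observed in it, and scores each word-region pair by the fraction of the word's clips in which the region also appears.

```python
from collections import Counter, defaultdict


def cooccurrence_scores(clips):
    """Estimate P(region | word) from (words, regions) pairs, one per clip.

    Both `words` and `regions` are sets of strings; the region labels are
    hypothetical stand-ins for regions of a visual measurement space.
    """
    word_counts = Counter()
    pair_counts = defaultdict(Counter)
    for words, regions in clips:
        for w in set(words):
            word_counts[w] += 1          # clips in which the word occurs
            for r in set(regions):
                pair_counts[w][r] += 1   # clips in which word and region co-occur
    return {
        w: {r: c / word_counts[w] for r, c in pair_counts[w].items()}
        for w in word_counts
    }


def best_region(scores, word):
    """Return the region that most often co-occurs with `word`."""
    candidates = scores[word]
    return max(candidates, key=candidates.get)


# Toy data: three clips, each with a command and the regions observed in it.
clips = [
    ({"pick", "red", "block"}, {"hue_red", "shape_cube", "motion_up"}),
    ({"move", "red", "ball"}, {"hue_red", "shape_sphere", "motion_side"}),
    ({"pick", "green", "block"}, {"hue_green", "shape_cube", "motion_up"}),
]
scores = cooccurrence_scores(clips)
print(best_region(scores, "red"))
```

With this toy data, "red" is paired with `hue_red` because that region appears in both clips whose command mentions "red". Words such as "pick" and "block" remain ambiguous here, since `shape_cube` and `motion_up` co-occur with both equally often; raw counts on so little data cannot separate them, and more varied clips would be needed.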
Item Type: | Conference or Workshop Item (Paper) |
---|---|
Research Areas: | A. > School of Science and Technology > Design Engineering and Mathematics |
Item ID: | 19684 |
Notes on copyright: | This is the author's accepted manuscript, included in this repository with permission granted by the publisher, AAAI, on 16/02/17. The final published paper appears as: Alomari, Muhannad, Chinellato, Eris, Gatsoulis, Yiannis, Hogg, David, and Cohn, Anthony. "Unsupervised Grounding of Textual Descriptions of Object Features and Actions in Video." Knowledge Representation and Reasoning Conference 2016. Published by the Association for the Advancement of Artificial Intelligence (AAAI), available at: http://www.aaai.org/ocs/index.php/KR/KR16/paper/view/12827 |
Depositing User: | Eris Chinellato |
Date Deposited: | 05 May 2016 11:02 |
Last Modified: | 29 Nov 2022 21:59 |
URI: | https://eprints.mdx.ac.uk/id/eprint/19684 |