Research at the University of Copenhagen - University of Copenhagen


Compositional Generalization in Image Captioning

Publication: Contribution to book/anthology/report › Conference contribution in proceedings › Research › peer-reviewed

Standard

Compositional Generalization in Image Captioning. / Nikolaus, Mitja; Abdou, Mostafa; Lamm, Matthew; Aralikatte, Rahul; Elliott, Desmond.

Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL). Association for Computational Linguistics, 2019. pp. 87-98.


Harvard

Nikolaus, M, Abdou, M, Lamm, M, Aralikatte, R & Elliott, D 2019, Compositional Generalization in Image Captioning. in Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL). Association for Computational Linguistics, pp. 87-98, 23rd Conference on Computational Natural Language Learning, Hong Kong, China, 03/11/2019. https://doi.org/10.18653/v1/K19-1009

APA

Nikolaus, M., Abdou, M., Lamm, M., Aralikatte, R., & Elliott, D. (2019). Compositional Generalization in Image Captioning. In Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL) (pp. 87-98). Association for Computational Linguistics. https://doi.org/10.18653/v1/K19-1009

Vancouver

Nikolaus M, Abdou M, Lamm M, Aralikatte R, Elliott D. Compositional Generalization in Image Captioning. In Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL). Association for Computational Linguistics. 2019. p. 87-98 https://doi.org/10.18653/v1/K19-1009

Author

Nikolaus, Mitja ; Abdou, Mostafa ; Lamm, Matthew ; Aralikatte, Rahul ; Elliott, Desmond. / Compositional Generalization in Image Captioning. Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL). Association for Computational Linguistics, 2019. pp. 87-98

Bibtex

@inproceedings{2f18fd294bcb4b66b1fa9decac2c8b6c,
title = "Compositional Generalization in Image Captioning",
abstract = "Image captioning models are usually evaluated on their ability to describe a held-out set of images, not on their ability to generalize to unseen concepts. We study the problem of compositional generalization, which measures how well a model composes unseen combinations of concepts when describing images. State-of-the-art image captioning models show poor generalization performance on this task. We propose a multi-task model to address this, which combines caption generation and image--sentence ranking, and uses a decoding mechanism that re-ranks the captions according to their similarity to the image. This model is substantially better at generalizing to unseen combinations of concepts compared to state-of-the-art captioning models.",
author = "Mitja Nikolaus and Mostafa Abdou and Matthew Lamm and Rahul Aralikatte and Desmond Elliott",
year = "2019",
month = "11",
day = "1",
doi = "10.18653/v1/K19-1009",
language = "English",
pages = "87--98",
booktitle = "Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)",
publisher = "Association for Computational Linguistics",
}

RIS

TY - GEN

T1 - Compositional Generalization in Image Captioning

AU - Nikolaus, Mitja

AU - Abdou, Mostafa

AU - Lamm, Matthew

AU - Aralikatte, Rahul

AU - Elliott, Desmond

PY - 2019/11/1

Y1 - 2019/11/1

N2 - Image captioning models are usually evaluated on their ability to describe a held-out set of images, not on their ability to generalize to unseen concepts. We study the problem of compositional generalization, which measures how well a model composes unseen combinations of concepts when describing images. State-of-the-art image captioning models show poor generalization performance on this task. We propose a multi-task model to address this, which combines caption generation and image--sentence ranking, and uses a decoding mechanism that re-ranks the captions according to their similarity to the image. This model is substantially better at generalizing to unseen combinations of concepts compared to state-of-the-art captioning models.

AB - Image captioning models are usually evaluated on their ability to describe a held-out set of images, not on their ability to generalize to unseen concepts. We study the problem of compositional generalization, which measures how well a model composes unseen combinations of concepts when describing images. State-of-the-art image captioning models show poor generalization performance on this task. We propose a multi-task model to address this, which combines caption generation and image--sentence ranking, and uses a decoding mechanism that re-ranks the captions according to their similarity to the image. This model is substantially better at generalizing to unseen combinations of concepts compared to state-of-the-art captioning models.

U2 - 10.18653/v1/K19-1009

DO - 10.18653/v1/K19-1009

M3 - Article in proceedings

SP - 87

EP - 98

BT - Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)

PB - Association for Computational Linguistics

ER -

ID: 230849989