A Review of Transformer-Based Approaches for Image Captioning

Ondeng, Oscar ; Ouma, Heywood ; et al.

In: Applied Sciences, Vol. 13, Iss. 19 (2023), p. 11103

Online academicJournal

Visual understanding is a research area that bridges the gap between computer vision and natural language processing. Image captioning is a visual understanding task in which natural language descriptions of images are automatically generated using vision-language models. The transformer architecture was initially developed in the context of natural language processing and quickly found application in the domain of computer vision. Its recent application to the task of image captioning has resulted in markedly improved performance. In this paper, we briefly look at the transformer architecture and its genesis in attention mechanisms. We more extensively review a number of transformer-based image captioning models, including those employing vision-language pre-training, which has resulted in several state-of-the-art models. We give a brief presentation of the commonly used datasets for image captioning and also carry out an analysis and comparison of the transformer-based captioning models. We conclude by giving some insights into challenges as well as future directions for research in this area.

Title:	A Review of Transformer-Based Approaches for Image Captioning
Author / Contributor:	Ondeng, Oscar ; Ouma, Heywood ; Akuon, Peter
Link:	Full text: https://www.mdpi.com/2076-3417/13/19/11103 https://doaj.org/toc/2076-3417 https://doaj.org/article/b61a176d2c5e4234a6aa46a3f999baf6 https://doi.org/10.3390/app131911103
Journal:	Applied Sciences, Vol. 13, Iss. 19 (2023), p. 11103
Publication:	MDPI AG, 2023
Media type:	academicJournal
ISSN:	2076-3417 (print)
DOI:	10.3390/app131911103
Keywords:	computer vision; convolutional neural networks; image captioning; MS COCO; CIDEr; natural language processing; Technology; Engineering (General). Civil engineering (General): TA1-2040; Biology (General): QH301-705.5; Physics: QC1-999; Chemistry: QD1-999
Other:	Indexed in: BASE; Languages: English; Collection: Directory of Open Access Journals: DOAJ Articles; Document Type: article in journal/newspaper; Language: English
