Artificial Intelligence Programming Lab (AIPLab) Forum


Author Topic: IVC-Special Issue on Cross-Media Learning for Visual Question Answering-20200831  (Read 274 times)


Posted by: Administrator
Submission Deadline: August 31, 2020

First Review: October 31, 2020

Revisions Due: December 31, 2020

Final Decision: February 28, 2021

Visual Question Answering (VQA) is a recent hot topic spanning multimedia analysis, computer vision (CV), natural language processing (NLP), and artificial intelligence more broadly, and it has attracted great interest from the deep learning, CV, and NLP communities. The task is defined as follows: a VQA system takes as input an image and a free-form, open-ended natural-language question about that image, and produces a natural-language answer as output. Answering therefore requires integrating information from both modalities: for a machine to answer a specific question about an image in natural language, it must have some understanding of the image content, of the meaning and intent of the question, and of any relevant background knowledge.

VQA draws on AI techniques in multiple areas: fine-grained recognition, object recognition, action recognition, and understanding of the question text (NLP). Because VQA is closely tied to content in both CV and NLP, a natural solution is to combine a convolutional neural network (CNN) with a recurrent neural network (RNN), the models used successfully in CV and NLP respectively, into a composite model. In short, VQA is a learning task that bridges CV and NLP.
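The CNN+RNN composite model mentioned above can be sketched in miniature. The snippet below is a minimal illustration, not an implementation from the special issue: it assumes a precomputed CNN image-feature vector and question token embeddings (both faked with random numbers here), encodes the question with a tiny hand-rolled RNN, fuses the two modalities by elementwise product, and scores a fixed answer vocabulary with a softmax. All dimensions and parameter names are hypothetical placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (placeholders, not from the call for papers):
# CNN feature size, token embedding size, RNN hidden size, answer-vocabulary size.
D_IMG, D_EMB, D_HID, N_ANS = 512, 64, 64, 10

def simple_rnn(token_embs, W_x, W_h):
    """Encode a question (sequence of token embeddings) into a single vector."""
    h = np.zeros(W_h.shape[0])
    for x in token_embs:
        h = np.tanh(W_x @ x + W_h @ h)
    return h

def vqa_forward(img_feat, token_embs, params):
    """Fuse a CNN image feature with an RNN question encoding, then classify."""
    W_x, W_h, W_img, W_out = params
    q = simple_rnn(token_embs, W_x, W_h)   # question vector
    v = np.tanh(W_img @ img_feat)          # project image feature to hidden size
    fused = q * v                          # elementwise multimodal fusion
    logits = W_out @ fused                 # score each candidate answer
    e = np.exp(logits - logits.max())
    return e / e.sum()                     # softmax over the answer vocabulary

# Toy inputs: a fake CNN feature and a 5-token question.
img_feat = rng.normal(size=D_IMG)
question = rng.normal(size=(5, D_EMB))
params = (rng.normal(size=(D_HID, D_EMB)) * 0.1,
          rng.normal(size=(D_HID, D_HID)) * 0.1,
          rng.normal(size=(D_HID, D_IMG)) * 0.1,
          rng.normal(size=(N_ANS, D_HID)) * 0.1)

probs = vqa_forward(img_feat, question, params)  # distribution over N_ANS answers
```

Real VQA systems replace the fake inputs with features from a pretrained CNN and learned word embeddings, and train all weights end to end; the elementwise-product fusion shown here is only one of several common fusion choices.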
