Artificial Intelligence Programming Lab (AIPLab) Forum


Author Topic: IVC-Special Issue on Cross-Media Learning for Visual Question Answering-20200831  (Read 274 times)


Posted by: Administrator
Submission Deadline: August 31, 2020

First Review: October 31, 2020

Revisions Due: December 31, 2020

Final Decision: February 28, 2021

Visual Question Answering (VQA) is a recent hot topic spanning multimedia analysis, computer vision (CV), natural language processing (NLP), and artificial intelligence more broadly, and it has attracted great interest from the deep learning, CV, and NLP communities. The task is defined as follows: a VQA system takes as input an image and a free-form, open-ended natural-language question about that image, and produces a natural-language answer as output. Answering therefore requires integrating information from both modalities: for a machine to answer a specific question about an image in natural language, it must have some understanding of the image content, of the meaning and intent of the question, and of any relevant background knowledge.

VQA draws on AI techniques in multiple areas: fine-grained recognition, object recognition, action recognition, and understanding of the question text (NLP). Because VQA is closely tied to content in both CV and NLP, a natural solution is to combine a convolutional neural network (CNN) with a recurrent neural network (RNN), the models used successfully in CV and NLP respectively, into a composite model. In short, VQA is a learning task that bridges CV and NLP.
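The CNN+RNN composite model mentioned above can be sketched in miniature. The snippet below is a minimal illustration, not an implementation from the special issue: it assumes a precomputed CNN image-feature vector and question token embeddings (both faked with random numbers here), encodes the question with a tiny hand-rolled RNN, fuses the two modalities by elementwise product, and scores a fixed answer vocabulary with a softmax. All dimensions and parameter names are hypothetical placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (placeholders, not from the call for papers):
# CNN feature size, token embedding size, RNN hidden size, answer-vocabulary size.
D_IMG, D_EMB, D_HID, N_ANS = 512, 64, 64, 10

def simple_rnn(token_embs, W_x, W_h):
    """Encode a question (sequence of token embeddings) into a single vector."""
    h = np.zeros(W_h.shape[0])
    for x in token_embs:
        h = np.tanh(W_x @ x + W_h @ h)
    return h

def vqa_forward(img_feat, token_embs, params):
    """Fuse a CNN image feature with an RNN question encoding, then classify."""
    W_x, W_h, W_img, W_out = params
    q = simple_rnn(token_embs, W_x, W_h)   # question vector
    v = np.tanh(W_img @ img_feat)          # project image feature to hidden size
    fused = q * v                          # elementwise multimodal fusion
    logits = W_out @ fused                 # score each candidate answer
    e = np.exp(logits - logits.max())
    return e / e.sum()                     # softmax over the answer vocabulary

# Toy inputs: a fake CNN feature and a 5-token question.
img_feat = rng.normal(size=D_IMG)
question = rng.normal(size=(5, D_EMB))
params = (rng.normal(size=(D_HID, D_EMB)) * 0.1,
          rng.normal(size=(D_HID, D_HID)) * 0.1,
          rng.normal(size=(D_HID, D_IMG)) * 0.1,
          rng.normal(size=(N_ANS, D_HID)) * 0.1)

probs = vqa_forward(img_feat, question, params)  # distribution over N_ANS answers
```

Real VQA systems replace the fake inputs with features from a pretrained CNN and learned word embeddings, and train all weights end to end; the elementwise-product fusion shown here is only one of several common fusion choices.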
