Visual Question Answering with Frozen Large Language Models
Talking with LLMs about images, without training LLMs on images.
18 min read · Oct 9, 2023
In this article, we’ll use a Q-Former, a technique for bridging computer vision and natural language models, to create a visual question answering system. We’ll go over the necessary theory, following the BLIP-2 paper, then implement a system that can be…