A Weekend AI Project: Making a Visual Assistant for People with Vision Impairments
Running a multimodal LLaVA model, camera, and speech synthesis
Published in
8 min readFeb 17, 2024
Modern large multimodal models (LMMs) can process not only text but also different types of data. Indeed, “a picture is worth a thousand words,” and this functionality can be crucial during the interaction with the real…