A Weekend AI Project: Making a Visual Assistant for People with Vision Impairments

Running a multimodal LLaVA model, camera, and speech synthesis

Dmitrii Eliuseev
Towards Data Science
8 min readFeb 17, 2024

--

Image by Enoc Valenzuela, Unsplash

Modern large multimodal models (LMMs) can process not only text but also different types of data. Indeed, “a picture is worth a thousand words,” and this functionality can be crucial during the interaction with the real…

--

--

Python/IoT developer and data engineer, data science and electronics enthusiast