A Weekend AI Project: Making a Visual Assistant for People with Vision Impairments

Running a multimodal LLaVA model, camera, and speech synthesis

Published in Towards Data Science

8 min read · Feb 17, 2024

Modern large multimodal models (LMMs) can process not only text but also other types of data. Indeed, “a picture is worth a thousand words,” and this functionality can be crucial during the interaction with the real…

Written by Dmitrii Eliuseev