Researchers from the University of Wisconsin–Madison and Microsoft Research have introduced LLaVA (Large Language and Vision Assistant), an open-source alternative to OpenAI's GPT-4 Vision. LLaVA excels at multimodal chat and visual instruction-following tasks, and it is trained on instruction-following data generated with GPT-4. Initial evaluations show that LLaVA exhibits chat abilities approaching those of multimodal GPT-4 and, in combination with GPT-4, achieves state-of-the-art accuracy on the Science QA multimodal reasoning benchmark. LLaVA represents a promising development in multimodal language models and a contribution to the open-source foundation model movement.
Source: The First Open Source GPT-4V Alternative – Towards AI