Abstract: We present Florence-VL, a new family of multimodal large language models (MLLMs) with enriched visual representations produced by Florence-2 [45], a generative vision foundation model.