Vision-Language Models (VLMs) demonstrate remarkable zero-shot generalization to unseen tasks, but fall short of the performance of supervised methods in generalizing to downstream tasks with limited ...