CLIP vision model.