Phi-3.5-vision-instruct ollama support?
Is there any way to get this model running locally on ollama? It's not among the Phi-3.5 models listed in the ollama model library, and it seems to fit my requirements pretty well.
https://huggingface.co/microsoft/Phi-3.5-vision-instruct
Also, what models do people really like for image comparison (two or more images compared for differences)? I take it "multimodal" is what I need. So far GPT-4o gives me good results. Llama-3.2-11B-Vision was a bit dicey and slow, probably because I had to stitch the left and right images together and ask about each side, instead of passing in two separate images. I'm relatively new to this topic, so I don't know everything that's out there.
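(Side note, in case it helps anyone doing the same stitching trick: if you have ImageMagick installed, a single command puts two images side by side so you can feed the result to a single-image model. The filenames here are just placeholders:

magick left.png right.png +append combined.png

On older ImageMagick versions the binary is called convert instead of magick; +append joins left-to-right, -append stacks top-to-bottom.)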
UPDATE:
Little update here: I did some digging, and there is an easy path from Hugging Face safetensors (the format Phi-3.5-vision-instruct is distributed in) to an ollama model. I ultimately got stuck for hardware reasons, but I'm writing up the process in case it helps someone else.
Basically:
git clone the model repository (call it the_repo), which gives you the safetensors files, config.json, etc. (concrete commands below)
cd the_repo
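Concretely, for this model that looks something like the following (assuming git-lfs is installed so the large safetensors files actually download):

git lfs install
git clone https://huggingface.co/microsoft/Phi-3.5-vision-instruct
cd Phi-3.5-vision-instruct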
Now create a file called 'Modelfile' with contents like this:
FROM .
TEMPLATE """{{ if .System }}System: {{ .System }}{{ end }}
User: {{ .Prompt }}
Assistant:"""
SYSTEM """A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful answers to the
user's questions."""
PARAMETER stop "User:"
PARAMETER stop "Assistant:"
PARAMETER stop "System:"
Then you can convert this into an ollama model like this:
ollama create phi-3.5-vision-instruct -f Modelfile
I got stuck here, because the Phi-3.5 family calls for A100, A6000, or H100 class hardware to run, and I have an old 1080 Ti. So I guess it's not self-hostable for me. But I think it would have run if my GPU architecture had been compatible.
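For anyone with suitable hardware: once the create step succeeds, you'd test it the usual way, and for multimodal models the ollama CLI should pick up a local image path included in the prompt. The prompt and filename below are just examples:

ollama run phi-3.5-vision-instruct "What differences do you see between the left and right halves of ./combined.png?"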