{"id":73356,"date":"2025-03-17T21:00:37","date_gmt":"2025-03-17T13:00:37","guid":{"rendered":"https:\/\/www.hongkiat.com\/blog\/?p=73356"},"modified":"2025-03-11T18:40:32","modified_gmt":"2025-03-11T10:40:32","slug":"vision-enabled-models-ollama-guide","status":"publish","type":"post","link":"https:\/\/www.hongkiat.com\/blog\/vision-enabled-models-ollama-guide\/","title":{"rendered":"3 Powerful Things You Can Do with Vision-Enabled Models in Ollama"},"content":{"rendered":"<p>Artificial intelligence keeps getting smarter, and vision-enabled language models are becoming essential tools for developers. These clever models can analyze images and describe them in plain English. By combining language understanding with computer vision, they can spot objects, details, or potential issues in visual content.<\/p>\n<p>In this article, we\u2019ll look into three practical ways you can use vision-enabled models in Ollama:<\/p>\n<ol>\n<li><a href=\"#image-alt-text-generation\">Image-to-Text Generation<\/a><\/li>\n<li><a href=\"#ocr-data-extraction\">Visual Data Extraction<\/a><\/li>\n<li><a href=\"#web-accessibility-testing\">Visual and Accessibility Testing<\/a><\/li>\n<\/ol>\n<hr>\n<h2 id=\"programming-language-selection\">Selecting a Programming Language<\/h2>\n<p>Before we dive into these specific applications, let\u2019s discuss our choice of programming language.<\/p>\n<p><strong>We\u2019ll be using PHP<\/strong><\/p>\n<h3>Why?<\/h3>\n<p>I understand PHP might not be people\u2019s first choice when working with AI. Many would opt to use <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/www.python.org\/\">Python<\/a>.<\/p>\n<p>However, I think PHP actually works great with LLMs. PHP can run faster than Python <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/benchmarksgame-team.pages.debian.net\/benchmarksgame\/performance\/binarytrees.html\">in many cases<\/a>, making it a solid fit for building AI applications. 
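To make that concrete, here is a minimal sketch of what calling Ollama's /api/chat endpoint from plain PHP can look like. This is illustrative only: the helper names are my own, and it assumes Ollama is running locally on its default port (11434).

```php
<?php
// Sketch only: helper names are illustrative, not from the article's repo.

// Build the JSON payload Ollama's /api/chat endpoint expects. Images are
// passed as base64-encoded strings alongside the text prompt.
function buildVisionPayload(string $model, string $prompt, string $imagePath): string {
    return json_encode([
        'model'    => $model,
        'stream'   => false,
        'messages' => [[
            'role'    => 'user',
            'content' => $prompt,
            'images'  => [base64_encode(file_get_contents($imagePath))],
        ]],
    ]);
}

// POST the payload to a local Ollama server and return the reply text.
function queryOllama(string $payload): string {
    $ch = curl_init('http://localhost:11434/api/chat');
    curl_setopt_array($ch, [
        CURLOPT_POST           => true,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_HTTPHEADER     => ['Content-Type: application/json'],
        CURLOPT_POSTFIELDS     => $payload,
    ]);
    $response = curl_exec($ch);
    curl_close($ch);
    return json_decode((string) $response, true)['message']['content'] ?? '';
}
```

In the applications below, essentially only the prompt passed to these two steps changes.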
With built-in features for handling HTTP requests and JSON, it\u2019s also easy to work with Ollama\u2019s API.<\/p>\n<hr>\n<h2 id=\"model-selection-guide\">Selecting a Model<\/h2>\n<p>Next, let\u2019s select the model we\u2019ll use.<\/p>\n<p>There are several vision-enabled models available in Ollama, such as <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/ollama.com\/library\/llava\">LLaVA<\/a> and <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/ollama.com\/library\/llama3.2-vision\">llama3.2-vision<\/a>.<\/p>\n<p>For this article, <strong>we\u2019ll be using the <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/ollama.com\/library\/llama3.2-vision\">llama3.2-vision<\/a> model<\/strong>. It\u2019s twice the size of the <strong>llava<\/strong> model, but it\u2019s also more powerful and accurate.<\/p>\n<hr>\n<h2 id=\"ollama-setup-requirements\">Prerequisites<\/h2>\n<p>To build the applications in this article, you will need the following installed and set up on your computer:<\/p>\n<ul>\n<li><strong><a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/ollama.com\/\">Ollama<\/a><\/strong>: We\u2019ll use Ollama to download the model and run it locally. You can follow our article, <a href=\"https:\/\/www.hongkiat.com\/blog\/ollama-ai-setup-guide\/\">Getting Started with Ollama<\/a>, to learn how to install and set up Ollama on your computer.<\/li>\n<li><strong><a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/www.php.net\/\">PHP<\/a><\/strong>: The programming language we\u2019ll use to build our applications. 
Check out our article, <a href=\"https:\/\/www.hongkiat.com\/blog\/manage-multiple-php-versions\/\">5 Ways to Manage Multiple Versions of PHP<\/a>, to manage PHP installations on your computer.<\/li>\n<\/ul>\n<p>Once you have Ollama running, download llama3.2-vision with the following command.<\/p>\n<pre>\r\nollama pull llama3.2-vision\r\n<\/pre>\n<p>Then, we can start building and running our applications.<\/p>\n<hr>\n<h2 id=\"image-alt-text-generation\">1. Image-to-Text Generation<\/h2>\n<p>One of the most useful features of vision-enabled models is their ability to describe images. These models can create captions, descriptions, and alt text that help make images accessible and understandable to everyone. Let\u2019s take a look at how we can implement this feature.<\/p>\n<p>I\u2019ve created a simple class called <code>AltText<\/code> that handles the conversion:<\/p>\n<pre>\r\nclass AltText implements Prompt {\r\n\r\n    use WithOllama;\r\n\r\n    public function fromImage(string $image): string {\r\n        \/\/ Parse the image, send prompt to Ollama, and return the response.\r\n    }\r\n}\r\n<\/pre>\n<p>The <code>fromImage<\/code> method takes an image path as input. It then encodes the image and sends it to Ollama for processing.<\/p>\n<p>Rather than diving into the PHP implementation details, which you can find in <strong>our <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/github.com\/hongkiat\/ollama-vision-enabled-llms\/blob\/main\/src\/AltText.php\">ollama-vision-enabled-llms<\/a> repository<\/strong>, let\u2019s focus on the key part: the prompt we send to Ollama. Here\u2019s what we use to generate the alt text:<\/p>\n<pre>\r\nGenerate concise, descriptive alt text for this image that:\r\n\r\n1. Describes key visual elements and their relationships\r\n2. Provides context and purpose\r\n3. Avoids redundant phrases like \"image of\" or \"picture of\"\r\n4. Includes any relevant text visible in the image\r\n5. 
Follows WCAG guidelines (130 characters max)\r\n\r\nFormat as a single, clear sentence.\r\n<\/pre>\n<p>Let\u2019s try this with an example image:<\/p>\n<figure><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/assets.hongkiat.com\/uploads\/vision-enabled-models-ollama-guide\/pink-car.jpg\" alt=\"Pink vintage car parked on street with store sign\" width=\"1000\" height=\"640\"><\/figure>\n<div class=\"sue-icon-text su-image-caption\" data-url=\"\" data-target=\"self\" style=\"min-height:34px;padding-left:36px;color:#333333\">\n<div class=\"sue-icon-text-icon\" style=\"color:#333333;font-size:24px;width:24px;height:24px\"><i class=\"sui sui-photo\" style=\"font-size:24px;color:#333333\"><\/i><\/div>\n<div class=\"sue-icon-text-content su-u-trim\" style=\"color:#333333\"><a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/www.flickr.com\/photos\/24759202@N05\/4398321372\">Pink Car by Reid Per Fiskerstrand<\/a><\/div>\n<div style=\"clear:both;height:0\"><\/div>\n<\/div>\n<p>To generate alt text for this image, we can call our class like below:<\/p>\n<pre>\r\necho (new AltText())->fromImage('.\/img\/image-1.jpg');\r\n<\/pre>\n<p>When we run this code, the model generates a pretty accurate description of the image, as shown below:<\/p>\n<figure><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/assets.hongkiat.com\/uploads\/vision-enabled-models-ollama-guide\/pink-car-alt-text.jpg\" alt=\"AI-generated alt text example for pink car image\" width=\"1000\" height=\"320\"><\/figure>\n<hr>\n<h2 id=\"ocr-data-extraction\">2. 
Visual Data Extraction<\/h2>\n<p>Another useful capability of vision-enabled models is their ability to recognize and extract text from images, also known as Optical Character Recognition (OCR).<\/p>\n<p>These models can understand content structures such as tables, which is particularly useful when you\u2019re working with screenshots of data tables, financial reports, or any tabular information trapped in image format.<\/p>\n<p>Let\u2019s create a simple tool that extracts tables from images and formats them as Markdown. This tool uses a class implementing the <code>Prompt<\/code> interface, as shown below, following a similar structure to our earlier application.<\/p>\n<pre>\r\nclass TableExtractor implements Prompt {\r\n\r\n    use WithOllama;\r\n\r\n    public function fromImage(string $image): string {\r\n        \/\/ Parse the image, send prompt to Ollama, and return the response.\r\n    }\r\n}\r\n<\/pre>\n<p>The difference lies in the prompt. In this example, it focuses on extracting the table from the image:<\/p>\n<pre>\r\nExtract the table from this image and format it as a Markdown table\r\nwith the following requirements:\r\n\r\n1. Identify and include all column headers\r\n2. Preserve all data in each cell\r\n3. Maintain the alignment and relationships between columns\r\n4. 
Format output using Markdown table syntax:\r\n    - Use | to separate columns\r\n    - Use - for the header separator row\r\n    - Align numbers to the right\r\n    - Align text to the left\r\n\r\nResponse should only contain the Markdown formatted table, without\r\nany additional or explanatory text or list before or after the table.\r\n<\/pre>\n<p>Let\u2019s try this with the image below.<\/p>\n<figure><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/assets.hongkiat.com\/uploads\/vision-enabled-models-ollama-guide\/invoice.jpg\" alt=\"Example invoice document for OCR processing\" width=\"1000\" height=\"640\"><\/figure>\n<p>Using our class, we can extract the table from the image as Markdown and render it to HTML with the <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/packagist.org\/packages\/erusev\/parsedown\">Parsedown<\/a> library, like below:<\/p>\n<pre>\r\necho (new Parsedown())->text(\r\n    (new TableExtractor())->fromImage('.\/img\/image-2.jpg')\r\n);\r\n<\/pre>\n<p>The result, as expected, is a Markdown-formatted table extracted from the image quite accurately. In this case, though, the model <strong>somehow<\/strong> also repeats the table headers as a list before the table, as we can see below. We\u2019d need to recalibrate the prompt to prevent this, but for now I\u2019m pretty happy with the result.<\/p>\n<figure><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/assets.hongkiat.com\/uploads\/vision-enabled-models-ollama-guide\/invoice-table-response.jpg\" alt=\"Markdown table extracted from invoice image\" width=\"1000\" height=\"640\"><\/figure>\n<p>You can see the full source of the code implementation <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/github.com\/hongkiat\/ollama-vision-enabled-llms\/blob\/main\/src\/TableExtractor.php\">in the repository<\/a>.<\/p>\n<hr>\n<h2 id=\"web-accessibility-testing\">3. Visual and Accessibility Testing<\/h2>\n<p>Not everyone experiences websites the same way. 
Some people find it hard to read certain colors or need bigger text to read comfortably. Others might struggle with small buttons or low-contrast text.<\/p>\n<p>This is another area where vision-enabled models can help. We\u2019ll use them to automatically check our websites for these accessibility issues, so we can make our websites more accessible and create a better experience for as many users as possible.<\/p>\n<p>Let\u2019s create a simple tool that checks a website for accessibility issues. It also uses a class implementing the <code>Prompt<\/code> interface, as shown below:<\/p>\n<pre>\r\nclass VisualTesting implements Prompt {\r\n\r\n    use WithOllama;\r\n\r\n    public function fromUrl(string $url): string {\r\n        \/\/ Parse the URL, send prompt to Ollama, and return the response.\r\n    }\r\n}\r\n<\/pre>\n<p>Our prompt will check for a few relevant accessibility issues that can be observed from an image alone:<\/p>\n<pre>\r\nAnalyze this UI screenshot for color contrast issues:\r\n- Check text vs background contrast ratios for all content.\r\n- Identify any text below WCAG 2.1 AA standards.\r\n- Flag low-contrast text.\r\n<\/pre>\n<p>We\u2019ll try this prompt with the image below.<\/p>\n<figure><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/assets.hongkiat.com\/uploads\/vision-enabled-models-ollama-guide\/pricing-table.jpg\" alt=\"Website pricing table for accessibility testing\" width=\"1000\" height=\"640\"><\/figure>\n<p>Using our class, we can check the image for accessibility issues, like below:<\/p>\n<pre>\r\necho (new VisualTesting())->fromUrl('.\/img\/image-3.jpg');\r\n<\/pre>\n<p>The model can accurately recognize what the image is about and provide an assessment of the accessibility issues. 
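Since contrast ratios are deterministic, you can also cross-check the model's findings in plain PHP. Here is a minimal sketch of the WCAG 2.1 relative luminance and contrast ratio formulas; the function names are my own, not part of the article's repository.

```php
<?php
// Sketch only: a deterministic WCAG 2.1 contrast check you can use to
// verify contrast issues that the model reports.

// Relative luminance of an sRGB color given as [R, G, B] in 0-255.
function relativeLuminance(array $rgb): float {
    [$r, $g, $b] = array_map(function (int $c): float {
        $s = $c / 255;
        // Linearize each sRGB channel per the WCAG 2.1 definition.
        return $s <= 0.03928 ? $s / 12.92 : (($s + 0.055) / 1.055) ** 2.4;
    }, $rgb);
    return 0.2126 * $r + 0.7152 * $g + 0.0722 * $b;
}

// Contrast ratio between two colors, from 1 (identical) to 21 (max).
function contrastRatio(array $foreground, array $background): float {
    $l1 = relativeLuminance($foreground);
    $l2 = relativeLuminance($background);
    return (max($l1, $l2) + 0.05) / (min($l1, $l2) + 0.05);
}
```

WCAG 2.1 AA expects at least 4.5:1 for normal text and 3:1 for large text; pure black on white scores the maximum of 21:1.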
The response is shown below:<\/p>\n<figure><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/assets.hongkiat.com\/uploads\/vision-enabled-models-ollama-guide\/a11y-response.jpg\" alt=\"AI accessibility testing results for pricing table\" width=\"1000\" height=\"640\"><\/figure>\n<p>However, I found the results can sometimes be hit or miss, and processing can run quite slowly depending on the level of detail in the image. We might need to fine-tune the model configuration, use a model with more parameters, or run it on better hardware.<\/p>\n<hr>\n<h2>Wrapping Up<\/h2>\n<p>Vision-enabled models open up smart and efficient ways to work with images. They simplify tasks like generating image descriptions, extracting data, and enhancing accessibility \u2013 all with just a few lines of code. While the examples we\u2019ve explored are just the beginning, there\u2019s room for improvement, such as fine-tuning models or crafting better prompts for more accurate results.<\/p>\n<p>As AI continues to evolve, adding vision capabilities to your workflow can help you create more powerful and user-friendly applications. It\u2019s something that I think you should definitely consider exploring.<\/p>","protected":false},"excerpt":{"rendered":"<p>Artificial intelligence keeps getting smarter, and vision-enabled language models are becoming essential tools for developers. These clever models can analyze images and describe them in plain English. By combining language understanding with computer vision, they can spot objects, details, or potential issues in visual content. 
In this article, we\u2019ll look into three practical ways you&hellip;<\/p>\n","protected":false},"author":113,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[3398],"tags":[3545],"topic":[],"class_list":["entry-content","is-maxi"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v22.8 (Yoast SEO v27.6) - https:\/\/yoast.com\/product\/yoast-seo-premium-wordpress\/ -->\n<title>3 Powerful Things You Can Do with Vision-Enabled Models in Ollama - Hongkiat<\/title>\n<meta name=\"description\" content=\"Artificial intelligence keeps getting smarter, and vision-enabled language models are becoming essential tools for developers. These clever models can\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.hongkiat.com\/blog\/vision-enabled-models-ollama-guide\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"3 Powerful Things You Can Do with Vision-Enabled Models in Ollama\" \/>\n<meta property=\"og:description\" content=\"Artificial intelligence keeps getting smarter, and vision-enabled language models are becoming essential tools for developers. 
These clever models can\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.hongkiat.com\/blog\/vision-enabled-models-ollama-guide\/\" \/>\n<meta property=\"og:site_name\" content=\"Hongkiat\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/hongkiatcom\" \/>\n<meta property=\"article:published_time\" content=\"2025-03-17T13:00:37+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/assets.hongkiat.com\/uploads\/vision-enabled-models-ollama-guide\/pink-car.jpg\" \/>\n<meta name=\"author\" content=\"Thoriq Firdaus\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@tfirdaus\" \/>\n<meta name=\"twitter:site\" content=\"@hongkiat\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Thoriq Firdaus\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"6 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/www.hongkiat.com\\\/blog\\\/vision-enabled-models-ollama-guide\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.hongkiat.com\\\/blog\\\/vision-enabled-models-ollama-guide\\\/\"},\"author\":{\"name\":\"Thoriq Firdaus\",\"@id\":\"https:\\\/\\\/www.hongkiat.com\\\/blog\\\/#\\\/schema\\\/person\\\/e7948c7a175d211496331e4b6ce55807\"},\"headline\":\"3 Powerful Things You Can Do with Vision-Enabled Models in 
Ollama\",\"datePublished\":\"2025-03-17T13:00:37+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/www.hongkiat.com\\\/blog\\\/vision-enabled-models-ollama-guide\\\/\"},\"wordCount\":1036,\"publisher\":{\"@id\":\"https:\\\/\\\/www.hongkiat.com\\\/blog\\\/#organization\"},\"image\":{\"@id\":\"https:\\\/\\\/www.hongkiat.com\\\/blog\\\/vision-enabled-models-ollama-guide\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/assets.hongkiat.com\\\/uploads\\\/vision-enabled-models-ollama-guide\\\/pink-car.jpg\",\"keywords\":[\"Artificial Intelligence\"],\"articleSection\":[\"Internet\"],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/www.hongkiat.com\\\/blog\\\/vision-enabled-models-ollama-guide\\\/\",\"url\":\"https:\\\/\\\/www.hongkiat.com\\\/blog\\\/vision-enabled-models-ollama-guide\\\/\",\"name\":\"3 Powerful Things You Can Do with Vision-Enabled Models in Ollama - Hongkiat\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.hongkiat.com\\\/blog\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/www.hongkiat.com\\\/blog\\\/vision-enabled-models-ollama-guide\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/www.hongkiat.com\\\/blog\\\/vision-enabled-models-ollama-guide\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/assets.hongkiat.com\\\/uploads\\\/vision-enabled-models-ollama-guide\\\/pink-car.jpg\",\"datePublished\":\"2025-03-17T13:00:37+00:00\",\"description\":\"Artificial intelligence keeps getting smarter, and vision-enabled language models are becoming essential tools for developers. 
These clever models can\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/www.hongkiat.com\\\/blog\\\/vision-enabled-models-ollama-guide\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/www.hongkiat.com\\\/blog\\\/vision-enabled-models-ollama-guide\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/www.hongkiat.com\\\/blog\\\/vision-enabled-models-ollama-guide\\\/#primaryimage\",\"url\":\"https:\\\/\\\/assets.hongkiat.com\\\/uploads\\\/vision-enabled-models-ollama-guide\\\/pink-car.jpg\",\"contentUrl\":\"https:\\\/\\\/assets.hongkiat.com\\\/uploads\\\/vision-enabled-models-ollama-guide\\\/pink-car.jpg\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/www.hongkiat.com\\\/blog\\\/vision-enabled-models-ollama-guide\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/www.hongkiat.com\\\/blog\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"3 Powerful Things You Can Do with Vision-Enabled Models in Ollama\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/www.hongkiat.com\\\/blog\\\/#website\",\"url\":\"https:\\\/\\\/www.hongkiat.com\\\/blog\\\/\",\"name\":\"Hongkiat\",\"description\":\"Tech and Design 
Tips\",\"publisher\":{\"@id\":\"https:\\\/\\\/www.hongkiat.com\\\/blog\\\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/www.hongkiat.com\\\/blog\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/www.hongkiat.com\\\/blog\\\/#organization\",\"name\":\"Hongkiat.com\",\"url\":\"https:\\\/\\\/www.hongkiat.com\\\/blog\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/www.hongkiat.com\\\/blog\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/www.hongkiat.com\\\/blog\\\/wp-content\\\/uploads\\\/hkdc-logo-rect-yoast.jpg\",\"contentUrl\":\"https:\\\/\\\/www.hongkiat.com\\\/blog\\\/wp-content\\\/uploads\\\/hkdc-logo-rect-yoast.jpg\",\"width\":1200,\"height\":799,\"caption\":\"Hongkiat.com\"},\"image\":{\"@id\":\"https:\\\/\\\/www.hongkiat.com\\\/blog\\\/#\\\/schema\\\/logo\\\/image\\\/\"},\"sameAs\":[\"https:\\\/\\\/www.facebook.com\\\/hongkiatcom\",\"https:\\\/\\\/x.com\\\/hongkiat\",\"https:\\\/\\\/www.pinterest.com\\\/hongkiat\\\/\"]},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/www.hongkiat.com\\\/blog\\\/#\\\/schema\\\/person\\\/e7948c7a175d211496331e4b6ce55807\",\"name\":\"Thoriq Firdaus\",\"description\":\"Thoriq is a writer for Hongkiat.com with a passion for web design and development. He is the author of Responsive Web Design by Examples, where he covered his best approaches in developing responsive websites quickly with a framework.\",\"sameAs\":[\"https:\\\/\\\/thoriq.com\",\"https:\\\/\\\/x.com\\\/tfirdaus\"],\"jobTitle\":\"Web Developer\",\"url\":\"https:\\\/\\\/www.hongkiat.com\\\/blog\\\/author\\\/thoriq\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. 
-->","jetpack_featured_media_url":"https:\/\/","jetpack_shortlink":"https:\/\/wp.me\/p4uxU-j5a","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/www.hongkiat.com\/blog\/wp-json\/wp\/v2\/posts\/73356","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.hongkiat.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.hongkiat.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.hongkiat.com\/blog\/wp-json\/wp\/v2\/users\/113"}],"replies":[{"embeddable":true,"href":"https:\/\/www.hongkiat.com\/blog\/wp-json\/wp\/v2\/comments?post=73356"}],"version-history":[{"count":2,"href":"https:\/\/www.hongkiat.com\/blog\/wp-json\/wp\/v2\/posts\/73356\/revisions"}],"predecessor-version":[{"id":73358,"href":"https:\/\/www.hongkiat.com\/blog\/wp-json\/wp\/v2\/posts\/73356\/revisions\/73358"}],"wp:attachment":[{"href":"https:\/\/www.hongkiat.com\/blog\/wp-json\/wp\/v2\/media?parent=73356"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.hongkiat.com\/blog\/wp-json\/wp\/v2\/categories?post=73356"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.hongkiat.com\/blog\/wp-json\/wp\/v2\/tags?post=73356"},{"taxonomy":"topic","embeddable":true,"href":"https:\/\/www.hongkiat.com\/blog\/wp-json\/wp\/v2\/topic?post=73356"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}