Continuing 3Cloud’s series on the Microsoft Cognitive Services APIs, let’s look into Vision. The pre-built artificial intelligence in the Vision APIs gives your apps and other solutions rich image and video analysis capabilities.


All About Vision

The Vision capabilities use AI to automatically provide detailed explanations of what your images and videos contain. The following Vision services cover many of the common tasks that you might use to enrich your applications. 

Computer Vision API: Analyze image content, obtain tags and categories, identify text using optical character recognition, flag racy or adult content, crop photos as thumbnails, and more.
Face API: Locate and group faces in images, and suggest gender, age, and emotions (previously a separate Emotion API).
Custom Vision Service: Train and use your own custom image classifiers.
Video Indexer: Analyze video content to track faces, gauge sentiment, transcribe audio, recognize different speakers, and more.
Content Moderator: Detect and potentially filter offensive content from images and video, and flag content for human review.

To try any of these Vision Cognitive Services, get free trial API keys here. 

Custom Vision Service

Let’s peer into some of the functionality for one of Microsoft’s Vision offerings: the Custom Vision Service. Among the Vision APIs, the Custom Vision Service is unique because it enables developers to easily train their own image classification model. In addition to the abundant data provided by pre-trained AI from the Computer Vision and Face APIs, the Custom Vision Service lets you go a step further and tag your own images. Once you have trained a model and it’s ready for production, create a personalized API endpoint. Finally, send additional images to the API to classify them and predict how closely they align with your custom labels.

Why would you consider using the Custom Vision Service and not the regular Computer Vision API? Custom Vision is beneficial in cases where you need precise or proprietary labels for your data. For example, a manufacturer may want to label product lines using their own enterprise terminology, which the Computer Vision API is not equipped to do. In fact, consider using both APIs together. Here is a look at some of the output from the Computer Vision API and how you can enhance that output using the Custom Vision Service.

Use the Computer Vision API for General Results


Use the Custom Vision Service for Personalized Tagging


Through its website, the Custom Vision Service lets you build an image classifier using a process similar to tagging images on social media. Even if you are not a data scientist, you could take advantage of Microsoft’s AI without the time and resources needed to code and productionalize a custom image model from scratch.


With a language such as Python, you can use code to perform the same tasks as the Custom Vision website. Code gives you the advantage of greater scalability, faster changes, and the ability to reference image URLs instead of manually uploading images.
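As a rough illustration of the code-based route, the sketch below builds (but does not send) a prediction request that references an image by URL, using only the Python standard library. The `Prediction-Key` header and the `{"Url": ...}` JSON body follow the pattern of the Custom Vision prediction REST API as we understand it. The endpoint URL, key, and image URL are placeholders, and the helper name `build_prediction_request` is our own, so copy the real values from your project's settings.

```python
import json
import urllib.request


def build_prediction_request(prediction_url, prediction_key, image_url):
    """Build a POST request asking a trained model to classify an image URL."""
    body = json.dumps({"Url": image_url}).encode("utf-8")
    return urllib.request.Request(
        prediction_url,
        data=body,
        headers={
            "Prediction-Key": prediction_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values (assumptions for illustration only).
request = build_prediction_request(
    prediction_url="https://example.invalid/customvision/prediction",
    prediction_key="YOUR_PREDICTION_KEY",
    image_url="https://example.invalid/images/product.jpg",
)

# To actually send it, pass the request to urllib.request.urlopen()
# and parse the JSON reply.
print(request.get_method(), request.full_url)
```

Because the image is referenced by URL, the same function can be called in a loop over thousands of image links, which is where code starts to pay off over manual uploads.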


Getting Started with the Custom Vision Service

To get started, go to the Custom Vision website and sign in with a Microsoft account associated with an Azure subscription. The first time you sign in, you need to agree to Microsoft’s Terms of Service.

When adding a new project, choose between Classification and Object Detection. Classification predicts labels for an entire image while Object Detection details where tagged content appears in an image. With a generic classification project, you may optionally select targeted domains for scenarios like Food or Retail.
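To make the difference between the two project types concrete, here is a minimal sketch of how their results might be represented. The `Prediction` class, the tag name, and the sample values are hypothetical. The bounding box is assumed to be expressed as fractions of the image size (left, top, width, height), so check the actual response format of your project before relying on it.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Prediction:
    tag: str
    probability: float
    # Only object detection fills this in: (left, top, width, height),
    # each assumed to be a fraction of the image size between 0 and 1.
    box: Optional[Tuple[float, float, float, float]] = None


def box_to_pixels(prediction, image_width, image_height):
    """Convert a normalized box to pixels; return None for classification results."""
    if prediction.box is None:
        return None
    left, top, width, height = prediction.box
    return (
        round(left * image_width),
        round(top * image_height),
        round(width * image_width),
        round(height * image_height),
    )


# Classification: one label for the whole image, no location.
classified = Prediction(tag="forklift", probability=0.93)
# Object detection: the same label plus where it appears.
detected = Prediction(tag="forklift", probability=0.88, box=(0.25, 0.5, 0.5, 0.25))

print(box_to_pixels(classified, 800, 600))  # None
print(box_to_pixels(detected, 800, 600))    # (200, 300, 400, 150)
```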


To train a model, you currently need at least 15 images for every tag, but precision should increase with additional images. Microsoft recommends at least 50 images per tag. With a standard project, you can include up to 250 tags and up to 50,000 training images. It also helps to include images that contain your objects of interest in a variety of settings and backgrounds.
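Before uploading, it can help to check a training set against these numbers. The sketch below is one way to do that, using the limits quoted above. It assumes one tag per image for simplicity, and the function name and sample tags are hypothetical.

```python
from collections import Counter

MIN_IMAGES_PER_TAG = 15          # minimum cited above
RECOMMENDED_IMAGES_PER_TAG = 50  # Microsoft's recommendation cited above
MAX_TAGS = 250                   # standard project limit cited above
MAX_IMAGES = 50_000              # standard project limit cited above


def check_training_set(labels):
    """Return readable issues for a list holding one tag per training image."""
    counts = Counter(labels)
    issues = []
    if len(counts) > MAX_TAGS:
        issues.append(f"too many tags: {len(counts)} > {MAX_TAGS}")
    if len(labels) > MAX_IMAGES:
        issues.append(f"too many images: {len(labels)} > {MAX_IMAGES}")
    for tag, n in sorted(counts.items()):
        if n < MIN_IMAGES_PER_TAG:
            issues.append(f"{tag}: {n} images, below the minimum of {MIN_IMAGES_PER_TAG}")
        elif n < RECOMMENDED_IMAGES_PER_TAG:
            issues.append(f"{tag}: {n} images, below the recommended {RECOMMENDED_IMAGES_PER_TAG}")
    return issues


labels = ["pallet"] * 60 + ["forklift"] * 20 + ["conveyor"] * 5
for issue in check_training_set(labels):
    print(issue)
# conveyor: 5 images, below the minimum of 15
# forklift: 20 images, below the recommended 50
```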

The Custom Vision Service also provides precision and recall by tag as basic measures of model performance. To use your model in production, you need a subscription key. As with the other Cognitive Services, incorporate your Custom Vision model into a solution with a basic API call.
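For readers new to these two measures, the following sketch computes them by tag from a small set of hand-made labels. Precision is the share of images predicted as a tag that really carry it, and recall is the share of images carrying a tag that the model found. This is a simplified illustration with one predicted label per image, not the service's own calculation, and the tags and data are made up.

```python
def precision_recall_by_tag(actual, predicted):
    """Compute (precision, recall) per tag from parallel lists of labels."""
    pairs = list(zip(actual, predicted))
    results = {}
    for tag in sorted(set(actual) | set(predicted)):
        true_pos = sum(1 for a, p in pairs if a == tag and p == tag)
        false_pos = sum(1 for a, p in pairs if a != tag and p == tag)
        false_neg = sum(1 for a, p in pairs if a == tag and p != tag)
        precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
        recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
        results[tag] = (precision, recall)
    return results


actual = ["pallet", "pallet", "pallet", "pallet", "forklift", "forklift"]
predicted = ["pallet", "pallet", "pallet", "forklift", "forklift", "forklift"]

results = precision_recall_by_tag(actual, predicted)
for tag, (precision, recall) in results.items():
    print(f"{tag}: precision={precision:.2f}, recall={recall:.2f}")
# forklift: precision=0.67, recall=1.00
# pallet: precision=1.00, recall=0.75
```

A tag with high precision but low recall suggests the model is cautious and misses examples, which is often a sign that the tag needs more varied training images.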


As you can see, the Vision APIs are convenient for a variety of tasks. Whether you are a developer or data scientist, these APIs bring you advanced AI capabilities without the cost and time of training your own image models. You can always decide if you need more at a future point, but the Vision APIs provide a jump start into an area where it is time- and resource-intensive to build these types of models on your own. Also, consider looking further into all of the Cognitive Services here, and combine Vision with other sets of APIs like Language or Speech to make your apps even more intelligent.

More to Come

So far, 3Cloud has surveyed the Search and Vision APIs. Next, we will explore the remaining Cognitive Services categories: Speech, Knowledge, and Language. Subscribe to our blog so that you don’t miss out, and contact us if you would like to learn more about incorporating Cognitive Services into your own solutions.