How To Describe a Dataset For A Computer Vision Classification Problem

As a data scientist, I have worked on several machine learning and deep learning projects in the computer vision field. In each project, I asked myself how to choose the best dataset, and I realized that an accurate, well-organized description would give me the answer. In this article, I would like to share the following table (Table 1), which I developed to describe an image dataset for classification projects in machine learning.

  • General information: Dataset name, link, and size.
  • Image dimensions: The range of widths and heights gives you a better idea of the images and of the transformations you may need to apply. The average values give you an intuition for the dimensions of most images.
  • Number of images: The acceptable number depends on the problem you want to solve. The more complex the problem, the more images you need to cover all the possible cases.
  • Number of classes: The number of classes helps you choose and set up an ML/DL algorithm.
  • Number of images per class: It is very important to know whether the dataset is balanced or imbalanced, because this affects the whole process of training and validating the ML/DL model.
  • Number of images per extension: Sometimes we are interested in a specific image format. This information tells you the share of images per file extension.
  • Image file size: Gives you an intuition for the distribution of image file sizes.
  • Notes: Useful for adding any further information about the dataset, such as permissions or ethical considerations.
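
Most of the fields above can be computed automatically. The following is a minimal sketch, not the method used to build the tables in this article. It assumes the dataset is stored as one sub-folder per class (root/<class_name>/<image files>), that Pillow is installed for reading image dimensions, and that the extension list is adequate for your data. The names describe_dataset, summarize, read_size_with_pillow and imbalance_ratio are hypothetical and chosen only for illustration.

```python
from collections import Counter
from pathlib import Path
from statistics import mean

# Assumption: these are the extensions that count as images in your dataset.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".gif", ".tif", ".tiff"}


def read_size_with_pillow(path):
    """Return (width, height) using Pillow, imported lazily so it stays optional."""
    from PIL import Image  # assumption: Pillow is installed

    with Image.open(path) as img:
        return img.size


def summarize(values):
    """Return the min, max and mean of a list, or None if the list is empty."""
    if not values:
        return None
    return {"min": min(values), "max": max(values), "mean": mean(values)}


def describe_dataset(root, read_size=read_size_with_pillow):
    """Describe an image dataset laid out as root/<class_name>/<image files>."""
    root = Path(root)
    per_class, per_extension = Counter(), Counter()
    widths, heights, file_sizes = [], [], []

    for path in sorted(root.glob("*/*")):
        ext = path.suffix.lower()
        if not path.is_file() or ext not in IMAGE_EXTENSIONS:
            continue  # skip sub-folders and non-image files
        per_class[path.parent.name] += 1
        per_extension[ext] += 1
        file_sizes.append(path.stat().st_size)  # size on disk, in bytes
        width, height = read_size(path)
        widths.append(width)
        heights.append(height)

    counts = list(per_class.values())
    return {
        "number_of_images": sum(counts),
        "number_of_classes": len(per_class),
        "images_per_class": dict(per_class),
        # Largest class divided by smallest class; 1.0 means perfectly balanced.
        "imbalance_ratio": max(counts) / min(counts) if counts else None,
        "images_per_extension": dict(per_extension),
        "width": summarize(widths),
        "height": summarize(heights),
        "file_size_bytes": summarize(file_sizes),
    }
```

Calling describe_dataset("path/to/dataset") returns a dictionary whose keys mirror the rows of the table. General information and notes still have to be filled in by hand, since they cannot be derived from the files themselves.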

To make the idea clearer, here is a quick demo. The following table (Table 2) describes a COVID-19 dataset from Kaggle.

That is all for this article. I hope you find it useful, and I would be glad to hear your ideas about the topic.


How To Describe a Dataset For A Computer Vision Classification Problem was originally published in Becoming Human: Artificial Intelligence Magazine on Medium.

Via https://becominghuman.ai/how-to-describe-a-dataset-for-a-computer-vision-classification-problem-7a93b43903d5?source=rss—-5e5bef33608a—4

source https://365datascience.weebly.com/the-best-data-science-blog-2020/how-to-describe-a-dataset-for-a-computer-vision-classification-problem
