Originally from KDnuggets https://ift.tt/3ona6LA
source https://365datascience.weebly.com/the-best-data-science-blog-2020/software-20-takes-shape
365 Data Science is an online educational career website that offers the incredible opportunity to find your way into the data science world no matter your previous knowledge and experience.
Originally from KDnuggets https://ift.tt/3dTk82c

What is data visualization and why is it important?
Data visualization is the art of providing insights with the aid of some type of visual representation, such as charts, graphs, or more complex forms like dashboards. The process usually involves dedicated software: tools such as Tableau and Power BI, or Python and R on the programming end.
Investing time in learning data visualization techniques is worthwhile, as data visualization is becoming one of the most sought-after fields in data science. Moreover, excellent data visualization skills are in high demand across a myriad of businesses and industries and open the door to many rewarding career opportunities.
With that in mind, we dedicate this post to some of the classic data visualizations combined with inspirational data visualization project ideas. Data is beautiful and invaluable when presented the right way and we believe the examples we listed below will come in handy in your own practice.
In this Top 10, you will find the staples in data visualization and ideas on how to use them in different projects. You can use the table of contents to jump directly to the ones that interest you most or just scroll down to absorb all dataviz ideas from first to last.
Table of Contents
1. Bar Chart Data Visualization Project Ideas
2. Time Series Data Visualization Project Ideas
3. Box Plot Data Visualization Project Ideas
4. Word Cloud Data Visualization Project Ideas
5. Map Data Visualization Project Ideas
6. Graph Network Data Visualization Project Ideas
7. Race Chart Data Visualization Project Ideas
8. Correlogram Data Visualization Project Ideas
9. Dendrogram Data Visualization Project Ideas
10. Heatmap Data Visualization Project Ideas

Any data visualization journey starts with the bar chart.
So, to answer the question we posed at the start “What is data visualization?”: in the majority of cases, the answer is the bar chart. It’s one of the most popular data visualization examples you’ll ever come across because it is truly versatile, intuitive, and clear as a visualization.
There is no shortage of available options here. However, our suggestion is plotting flight delay values, as suggested in this Kaggle tutorial.

Time series data is one of the staples in data visualization. So, chances are, no matter what field you’re working in, at one point or another you’ll face a project where you’ll have to display data with time series elements.
For this type of data, it is crucial to make sure the date features in your data are converted to a proper date type. Whatever your go-to data visualization tool is (Tableau, Python, R, or Excel), this conversion step ensures your data is plotted correctly.
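To make the conversion step concrete, here is a minimal sketch using only Python's standard library. The rows and the day/month/year format are assumptions made up for illustration; your file may store dates differently.

```python
from datetime import datetime

# Hypothetical raw rows: dates arrive as text, in no particular order.
raw_rows = [
    ("02/03/2020", 101.5),
    ("15/01/2020", 98.2),
    ("10/02/2020", 99.9),
]

# Convert each date string into a real date object before plotting.
# The "%d/%m/%Y" format is an assumption about how the file stores dates.
parsed = [(datetime.strptime(text, "%d/%m/%Y").date(), value)
          for text, value in raw_rows]

# With date objects, sorting is chronological. Sorting the raw strings
# would instead put "02/03/2020" (March) ahead of "15/01/2020" (January).
parsed.sort(key=lambda row: row[0])

dates = [d for d, _ in parsed]
values = [v for _, v in parsed]
print(dates, values)
```

Once the dates are real date objects, a plotting tool can space the points along the time axis properly instead of treating them as unordered labels.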
That said, here’s a great project idea to explore: stock index returns. You can visualize and compare the returns of various stock market indices at different points in time. You can easily download up-to-date stock market information from the Yahoo Finance website.

A box plot might seem a bit intimidating or foreign if you’re seeing it for the first time. But nothing is too complicated once you get to know it better. The box represents numerical data via quartiles: its edges mark the first and third quartiles, and the line inside marks the median. The whiskers that you often see extending from the box show the variability of the data outside the quartiles. In such cases, we call it a box-and-whisker plot.
Project-wise, we continue with the stock market theme because opening and closing prices are one of the prime use cases of this visualization. And, of course, you can check out Yahoo Finance for the most current data.
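If you want to see the numbers behind a box plot before drawing one, the sketch below computes them with the standard library. The prices are made up, and the 1.5 x IQR whisker rule is an assumed (though common) convention; plotting tools may use a different one.

```python
import statistics

def box_plot_summary(values):
    """Return the numbers a box plot draws for one series."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    # Assumed convention: whiskers reach the most extreme points that lie
    # within 1.5 * IQR of the box; anything beyond counts as an outlier.
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }

# Made-up closing prices with one unusually high day.
closing_prices = [10, 11, 12, 13, 14, 15, 16, 17, 18, 40]
summary = box_plot_summary(closing_prices)
print(summary)
```

The single extreme value ends up in `outliers` rather than stretching the upper whisker, which is exactly what makes the chart useful for spotting unusual trading days.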

When it comes to data visualization examples, word clouds are often neglected, when in fact, they can be quite useful. Recently, they’ve found a place aiding text data analysis. Turns out, when performing sentiment analysis, word clouds can be tremendously helpful to find common topics within a cluster. Therefore, any time you’re looking at the most common items within a topic, word clouds can be a helpful way of visualizing your data.
Project idea? Any type of top 10 list, or most popular word search. Why not do a word cloud on the subject of top data visualization projects? Or head over to the Large Movie Review Dataset and try data visualizations based on its data.
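A word cloud is, at its core, a picture of word frequencies. The sketch below shows that counting step with the standard library; the two reviews and the tiny stop-word list are made up for illustration, and a real project would use a fuller list.

```python
import re
from collections import Counter

# A tiny, hand-picked stop-word list (an assumption for this sketch).
STOP_WORDS = {"the", "a", "an", "and", "was", "is", "it", "of", "but", "this"}

def word_frequencies(texts, top_n=10):
    """Count how often each non-stop word appears across all texts."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(top_n)

# Made-up reviews standing in for a real text data set.
reviews = [
    "The plot was great and the acting was great",
    "Great soundtrack, but the plot was thin",
]
top_words = word_frequencies(reviews, top_n=2)
print(top_words)
```

A word cloud library would then scale each word's font size by its count.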

Being able to chart and interpret geographical data is one of the most important skills for a data viz expert. Depending on what software you use, this can vary in difficulty. The free data visualization software best equipped to handle geographical data is probably Tableau (in its free Tableau Public edition), and I recommend using it if there are no specific software requirements. Or you could try R’s highcharter or Python’s plotly module (alternatively cartopy, which is based on matplotlib) if you’d prefer statistical analysis tools for visual communication.
An interactive map of Australia’s bioluminescent organisms is one of the best visualization projects in general. Why not try to recreate the result yourself?

This type of visualization usually reflects complex systems where the importance is placed on the interaction between the elements. Despite being intricate, networks are one of the most inspiring topics in terms of dataviz, as they show that information is beautiful when translated in the correct form. Think infrastructure, social networks or biological pathways such as genetic pathways or integrated systems – all of them can be displayed with the help of a network.
If you’re looking for graph network data viz project ideas, you can head over to the network repository and explore numerous data sets on a variety of topics. The great news is that you can directly visualize each data set on the same site using their interactive tool. And maybe it’s only me but it’s great fun exploring all the different networks.
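Before drawing a network, it helps to see how one is usually stored. The following is a minimal sketch, assuming an undirected network given as a list of edges; the names are hypothetical.

```python
from collections import defaultdict

# Hypothetical friendship network: each pair is one connection.
edges = [("Ana", "Ben"), ("Ana", "Cleo"), ("Ben", "Cleo"), ("Cleo", "Dev")]

# Build an adjacency list: every node maps to the set of its neighbours.
adjacency = defaultdict(set)
for a, b in edges:
    adjacency[a].add(b)
    adjacency[b].add(a)

# The degree (number of connections) is a simple measure of how central
# a node is, and is often mapped to node size in a network chart.
degree = {node: len(neighbours) for node, neighbours in adjacency.items()}
most_connected = max(degree, key=degree.get)
print(degree, most_connected)
```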

The bar chart race is an animated bar chart showing the development of a set of entities (usually the top 10) over time. It was recently made popular by the Data is Beautiful YouTube channel. There are numerous interesting races available, for instance, the most popular sci-fi movies from 1968 until 2019 (that is my personal favourite). But hey, if you’re stuck for data visualization project ideas, here is our proposal.
Go over to Kaggle and see how to implement the bar chart race of the most populous Turkish provinces from 2007 until 2018.
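Independently of that notebook, the data preparation behind any bar chart race looks roughly the same: for each time step, rank the entities and keep the top few. Here is a minimal sketch with made-up names and values (not real population figures).

```python
# Made-up values per year; "A" to "D" are hypothetical entities.
values_by_year = {
    2007: {"A": 12.5, "B": 4.4, "C": 3.7, "D": 2.0},
    2018: {"A": 15.0, "B": 5.5, "C": 4.3, "D": 4.4},
}

def race_frames(data, top_n=3):
    """Return one (period, ranked top_n entries) pair per animation frame."""
    frames = []
    for period in sorted(data):
        ranked = sorted(data[period].items(), key=lambda kv: kv[1], reverse=True)
        frames.append((period, ranked[:top_n]))
    return frames

frames = race_frames(values_by_year)
for period, ranked in frames:
    print(period, ranked)
```

An animation library then draws one bar chart per frame and interpolates between them, which is what produces the overtaking effect.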

Data visualization examples run through various parts of the data science process. Correlograms belong to the exploratory data analysis phase and can reveal information on various relationships within our data. A correlogram displays n variables on a grid of subplots, typically n x n when the diagonal is included, or (n-1) x (n-1) when only the pairwise panels on one side of the diagonal are drawn. On these subplots, you can display scatter plots, density plots, or histograms, each revealing different insights about your data.
For a correlogram data visualization project, you could try out a classic, like the Iris data set. In fact, any data where you have numerical features will do the trick. However, we recommend a data set you’d most likely be familiar with. This way, you can practice and delve into the different options presented with this form of visual.

Continuing with data visualization examples from data science, we delve straight into machine learning with a technique used in unsupervised learning: the dendrogram. A dendrogram is a tree diagram used for the hierarchical representation of points and is the main data visualization used for hierarchical clustering solutions. In fairness, results in machine learning tend to be hard to visualize. That is one of the reasons why the field is considered hard to understand: without any visual, it’s hard to develop an intuition for the matter. That’s why we couldn’t skip the chance to include this data visualization example.
Any type of clustering data set will do for such a project. You can visit the UCI Machine Learning Repository and check out their clustering data sets. Just a small tip. If you’re using hierarchical clustering, a large data set might require extra computing time. So keep that in mind.
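To show what a dendrogram actually encodes, here is a deliberately naive sketch of agglomerative clustering on one-dimensional points. Single linkage is just one possible choice of merge rule, assumed here for simplicity, and the points are made up. The nested loops also hint at why large data sets take extra computing time.

```python
def single_linkage(points):
    """Naive agglomerative clustering on 1-D points.

    Returns the merge history (left cluster, right cluster, distance),
    which is the information a dendrogram draws from bottom to top.
    """
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        # Compare every pair of clusters; this is what gets slow on big data.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest members.
                dist = min(abs(points[a] - points[b])
                           for a in clusters[i] for b in clusters[j])
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        dist, i, j = best
        merges.append((sorted(clusters[i]), sorted(clusters[j]), dist))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return merges

points = [1.0, 1.5, 5.0, 5.25, 9.0]  # made-up values
merges = single_linkage(points)
for left, right, dist in merges:
    print(left, right, dist)
```

Each merge becomes one horizontal bar in the dendrogram, drawn at a height equal to the merge distance.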

Heatmap visualization is surely one of the most effective ways to intuitively show relationships between variables. What makes a heatmap stand apart is the excellent use of colors that contribute to the intuitive understanding of the plot. With a heatmap, you can observe the correlation between variables within your data and find dependencies.
The heatmap is yet another crucial element of data analysis (or the beginning stages of machine learning tasks).
So, to wrap up our list on a high note, here is an idea for a data visualization project with widespread application in data science. In fact, it is the same suggestion we started with: flight delays. There is hardly a better example of how data visualizations are interconnected.
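The grid of numbers that a correlation heatmap colors can be computed by hand. The sketch below uses the Pearson correlation coefficient; the column names and values are made up and do not come from the actual flight delays data set.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def correlation_matrix(columns):
    """Return a nested dict: matrix[row][col] is the correlation."""
    names = list(columns)
    return {r: {c: pearson(columns[r], columns[c]) for c in names}
            for r in names}

# Hypothetical columns with made-up values.
flights = {
    "departure_delay": [5, 10, 15, 20, 25],
    "arrival_delay": [7, 12, 17, 22, 27],
    "distance": [900, 700, 500, 300, 100],
}
matrix = correlation_matrix(flights)
for name, row in matrix.items():
    print(name, row)
```

A heatmap then maps each cell to a color, so strongly positive and strongly negative relationships stand out at a glance.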
If you’re eager for more ideas, here is another of my favorite data visualization examples, which features microbial life represented as a heatmap.
Looking for data visualization training that will teach you how to turn any bad data visualization into a great one? Check out our data visualization course where you’ll learn how to create stunning data visualizations with free data visualization tools: Python, R, Tableau, and Excel.
The post Top 10 Data Visualization Project Ideas 2020 appeared first on 365 Data Science.
from 365 Data Science https://ift.tt/34n0xof
Originally from KDnuggets https://ift.tt/31yb91Q
Originally from KDnuggets https://ift.tt/2IWLDg8
Example of A/B Testing and Multiple Comparisons and Hypothesis Testing

Machine learning projects are on everyone’s lips, but from customer projects we know that the implementation of AI projects is a mystery to many. That’s why we will show you what the life cycle of our machine learning projects looks like in a series of blog posts. Our target audience for this series is project managers, engineers, decision makers, and everyone else planning an AI project.
In this first part of our series we’re going to briefly touch upon the individual project phases. We’re also going to discuss special challenges we face in the field of computer vision. In later blog posts we will take a closer look at each project phase.

Machine learning projects start like any other technology project. There is a problem or a need, and we begin to explore the task and discuss possible approaches to solve it. However, the execution of AI projects is fundamentally different from traditional technology ventures, because they are more iterative and exploratory in nature. This is why every machine learning project is carried out in a life cycle process.
Machine learning projects always involve a high degree of uncertainty in terms of workload and result quality. To minimize the risk and investment for our customers, we strictly follow the “start small, fail fast” philosophy.
This means we build a feature-complete system with the minimum possible workload to get fast feedback on whether the model and the available data play well together. Then we improve the data and model in iterations (one iteration is a complete life cycle run) to raise the result quality to the needed level.
Let’s look at the life cycle phases.
Machine learning models should solve a given problem on the basis of data. Therefore everything starts with collecting enough samples with proper metadata.
Quality, quantity, and balance of the data are the decisive points in data collection. The more data we have, and the better its quality and balance, the better the model will learn and the more accurately it will predict.
The quality of the samples is important because wrong or misleading samples or metadata (called noisy data) will confuse the model and dramatically lower the prediction quality. We can improve the quality with data cleaning (phase 2 of the life cycle).
Having balanced data means having roughly the same amount of training data for each class. Unbalanced training data can lead to biased models, as classes are not represented equally.
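A quick way to check balance is to count the samples per class. The following is a minimal sketch with made-up labels; what counts as an acceptable ratio depends on the project and is not something this snippet decides.

```python
from collections import Counter

def class_balance(labels):
    """Return per-class counts and the ratio of largest to smallest class."""
    counts = Counter(labels)
    ratio = max(counts.values()) / min(counts.values())
    return counts, ratio

# Made-up label list: 900 images of one class, 100 of the other.
labels = ["cat"] * 900 + ["dog"] * 100
counts, ratio = class_balance(labels)
print(counts, ratio)
```

A ratio close to 1 means the classes are roughly balanced; here the larger class has nine times as many samples as the smaller one.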
In computer vision projects we often face a lack of training data (correctly labeled images). To increase quantity and improve balance of the data we might be able to use data synthesis to create training data programmatically ourselves. This process can be very complex and there are various methods to do this.
Another common method to create more data is called data augmentation. We create additional data by modifying existing samples, e.g. through random cropping, adding noise, changing colors or brightness.
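The augmentation steps named above can be sketched in a few lines. This example represents a grayscale image as a nested list of pixel values to stay dependency-free; real projects typically rely on an image library instead, and all function names here are hypothetical.

```python
import random

def random_crop(image, size, rng):
    """Cut a random size x size window out of the image."""
    top = rng.randint(0, len(image) - size)
    left = rng.randint(0, len(image[0]) - size)
    return [row[left:left + size] for row in image[top:top + size]]

def change_brightness(image, delta):
    """Shift every pixel by delta, clamped to the valid 0..255 range."""
    return [[min(255, max(0, p + delta)) for p in row] for row in image]

def add_noise(image, amount, rng):
    """Add a random offset in [-amount, amount] to every pixel."""
    return [[min(255, max(0, p + rng.randint(-amount, amount))) for p in row]
            for row in image]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
# Made-up 4x4 grayscale image with values from 0 to 240.
image = [[(r * 4 + c) * 16 for c in range(4)] for r in range(4)]

augmented = add_noise(change_brightness(random_crop(image, 3, rng), 20), 5, rng)
print(augmented)
```

Each call produces a slightly different variant of the same sample, while the original image stays untouched.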
Let’s say we have collected enough data, then we need to create a structure we can feed the model with.
We clean the data by identifying noisy, false, or misleading samples and correcting or removing them from the training set. Additionally, we preprocess the data to normalize it. In our case this mostly means scaling or cropping images, converting them into a suitable format, and creating a folder structure we can use for training.
Collecting, cleaning, and preprocessing data are our biggest and most time-consuming challenges. It is not unusual to spend a major portion of the project time on these tasks.
During model evaluation we take a closer look at different models and model architectures in order to find out which architectures work well with certain data and certain problems.
There are models that work well with text, e.g. translation, term classification. Other models work well with images, e.g. classification models, detection models, or localization models. Our experience, best practice orientation, and scientific research lead us to the appropriate model for our current project.
Before we start training the model, we split the data set into actual training data (the majority of the data, let’s say 75%), validation data (10%), and test data (15%). The actual distribution can vary depending on the amount of data available. Training data and validation data are used during model training. The test data is used after training to check the model performance on unseen data.
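The split described above can be sketched as follows. The 75/10/15 proportions come from the text; the function name, the fixed seed, and the integer sample IDs are assumptions for illustration.

```python
import random

def split_dataset(samples, train=0.75, val=0.10, seed=42):
    """Shuffle the samples and split them into train, validation, and test."""
    shuffled = samples[:]  # copy, so the caller's list is not modified
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * val)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],  # whatever remains is the test set
    )

samples = list(range(200))  # stand-in for 200 labeled images
train_set, val_set, test_set = split_dataset(samples)
print(len(train_set), len(val_set), len(test_set))
```

Shuffling before splitting matters: if the samples are stored class by class, an unshuffled split could leave whole classes out of the training data.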

Training a model in the field of computer vision is more complex and time-consuming than text-based machine learning tasks. This is because we use deep and complex models, and the data needed for these models tends to be very large, up to terabytes. Computation is therefore very time-consuming.
After finishing the training as described above, we assess the quality of the model. We work with the model to understand its behavior: which aspects are already solved very well, and which are not. By inspecting the visual data, we work out which changes to the training set are necessary to optimize result quality. One adjustment could be, for example, to collect or synthesize more data from a specific category.
Sometimes we even have to change the model architecture, especially if we find that the model either cannot grasp the task or just memorizes the training set (under- and overfitting).
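One simple way to spot the two failure modes is to compare accuracy on the training data with accuracy on the validation data. The sketch below is a rough heuristic only: the thresholds are arbitrary assumptions chosen for illustration, not values from our projects.

```python
def diagnose_fit(train_acc, val_acc, min_acc=0.8, max_gap=0.1):
    """Rough heuristic; min_acc and max_gap are illustrative assumptions."""
    if train_acc < min_acc:
        # The model does poorly even on data it has seen: it has not
        # grasped the task.
        return "underfitting"
    if train_acc - val_acc > max_gap:
        # Much better on seen than on unseen data: it has likely
        # memorized the training set.
        return "overfitting"
    return "ok"

print(diagnose_fit(0.60, 0.58))
print(diagnose_fit(0.99, 0.70))
print(diagnose_fit(0.92, 0.90))
```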
In this step it is time to share the progress we’ve made so far with our customer. We present our findings on the quality and condition of the model, and we show what worked and what did not. Good teamwork with our customer is essential here. Together, we discuss possible improvements to the model, for example gathering more data and where to get this data from. In close cooperation, we plan the next iteration of model training.
The deployment of our current model version acts as the quality baseline for the following training iteration. If the model already adds value for the customer, it can be integrated into their prototype or even into production. Meanwhile, we begin the next iteration of training, and the life cycle starts again.
Stay tuned for our next article of this series. We’re going to talk about all things regarding data: how we collect it, how we clean it, and how we preprocess it. If you need an experienced helping hand with your AI project, just get in touch with us.



Applied Machine Learning Life Cycle for Computer Vision Tasks was originally published in Becoming Human: Artificial Intelligence Magazine on Medium, where people are continuing the conversation by highlighting and responding to this story.
Originally from KDnuggets https://ift.tt/2Tjc7dv
Originally from KDnuggets https://ift.tt/3m9NkFg