365 Data Science

How to evaluate the Machine Learning models?Part 3

How to evaluate the Machine Learning models? — Part 3

Congratulations everyone- who is reading this metric series, with this article we have came half way in the metric series. This is third part of the metric series where, we will discuss about predicting power score which act as substitute of the correlation and confusion metrics and derived from the confusion metrics which is very important when we evaluate the model using accuracy. The metric derived from the confusion metric act as a justification along with the confusion metric to justify our model. Below is the images of the metric which we are going to discuss in this article.

Predicting Power score

Predictive power score is a normalized metric which has range [0,1], it shows up to what extend IV can be used to predict DV. It is alternative to correlation. It is very effective in following ways:

It can detect and summaries linear and non-linear relationship also.
It is asymmetric, i.e. IV to DV not vice versa.

It is slower to correlation in calculation but it is useful in many ways and can give accurate results. This article is focused on the confusion metric therefore I have not described in detail. For other correlation metrics article.

Jupyter Notebook: Link

2. Confusion Metric

Confusion metric is like mother/father of classification metric from which many metric can be derived such as Accuracy, Precision and many other which we are going to discuss. It is applicable to machine learning as well as deep learning models.

If confusion metric is a metric of size m *m ( m is no. of classes) , if we traverse row wise i.e. left to right it represent Actual Value and if we traverse column wise only it represent Predicted Value and the combination of the row can column give the count of the TP, TN, FP, and FN.

FN (False Negative) = Predicted value is Falsely predicted and the actual value is positive but the model predicted negative value . It is also known as Type 2 error.

FP (False Positive) = Predicted value is Falsely predicted and the actual value is negative but the model predicted positive value. It is also known as Type 1 error.

TN (True Negative) = Predicted value matches the actual value. The model predicted the negative value and the actual value is also negative.

TP (True Positive)= Predicted value matches the actual value. The model predicted the positive value and the actual value is also positive.

Why we need the Confusion metric?

Suppose we have dataset which has 90 dog images and 10 cat images and a model is trained to predict dog, where a trained model predicted 80 correct sample and 20 incorrect sample.

Jupyter Notebook: Link

2.1. Accuracy

Accuracy is defined as the no of correctly predicted sample( TN+ TP) to the total no. of samples (TN+TP+FN+FP).

**Fig.3.** Explanation Accuracy Metric with Confusion Metric

Trending AI Articles:

1. How to automatically deskew (straighten) a text image using OpenCV

2. Explanation of YOLO V4 a one stage detector

3. 5 Best Artificial Intelligence Online Courses for Beginners in 2020

4. A Non Mathematical guide to the mathematics behind Machine Learning

In above image, we can see accuracy is giving wrong data about the result i.e. model is saying it will predict dog 80% of the time, actually it is doing opposite. We saw that, the accuracy of the model is very good 80% but dataset is imbalance i.e. one class has more no of sample than other which makes model biased towards one class and hence in real world application it will not be able to work according to our expectation. So we need other metrics also which support our model more.

Jupyter Notebook: Link

2.2. TPR

TPR is known as True Positive rate, which is ration of TP: ( TP+FN). Value of TPR is directly proportional to goodness of model i.e. TPR increases model becomes better. Alternative to accuracy metric.

Jupyter Notebook: Link

2.3. FNR

FNR is known as False Negative rate, which is ration of FN: ( TP+FN). Value of FNR is indirectly proportional to goodness of model i.e. FNR decreases model becomes better. Alternative to accuracy metric.

Jupyter Notebook: Link

2.4. TNR

TNR is known as True Negative rate, which is ration of TN: ( FP+TN). Value of TNR is directly proportional to goodness of model i.e. FPR increases model becomes better. Alternative to accuracy metric.

Jupyter Notebook: Link

2.5. FPR

FPR is known as False Positive rate, which is ration of FP: ( FP+TN). Value of FNR is indirectly proportional to goodness of model i.e. FPR decreases model becomes better. Alternative to accuracy metric.

Jupyter Notebook: Link

2.6. Precision

Precision tell us how many of the samples were predicted correctly to the all positive predicted samples i.e. TP:(TP+FP). We cannot afford to have more FP FP because it will degrade our model. It is very useful when FP is more important than the FN especially in terms of healthcare.

Jupyter Notebook: Link

2.7. Recall

Recall tell us how many of the samples were predicted correctly to the all the actual positive predicted samples i.e. TP:(TP+FN). We cannot afford to have more FN because it will degrade our model. It is very useful when FN is more important than the FP especially in terms of security of places.

High precision we will have low recall and vice versa. It is trade of between these two metric similar to Bias-Variance Tradeoff, so it depends upon the use case which metric we can select.

Jupyter Notebook: Link

2.8. F1-score

As we have discussed above we precision and recall are inversely related and in some cases where we are unable to decide which metric to use so we procced with F1-score which is combination of precision and recall. F1-score is harmonic mean of precision and recall has high value when precision and recall are equal. It is always used with other evaluation metric because it lacks interpretation.

Jupyter Notebook: Link

2.9. F-Beta Sore.

We have discussed earlier that F-score uses geometric mean instead of the arithmetic mean because Harmonic mean punishes penalize bigger value more.

If P=0 recall=1 then arithmetic mean=0.5, it is clear it is output of dumb model- ignore the input feature and predict one of the classes only. Now if we take Harmonic mean of Precision and recall, we will get 0. Some researchers can give more importance/ weight to precision or recall, so they can attach the adjustable parameter known as beta-square which will give beta times importance to precision or recall respectively.

Jupyter Notebook: Link

2.10. Thresholding

Thresholding is a technique in which is applied to the probability of the prediction of the models. Suppose you have used “softmax” as the activation layer in the output layer — gives probability of the prediction. In thresholding we can select a particular value so that if the predicted value is higher to the particular value we can classify into one class/label and if the predicted value is lower that the particular value we can classify into another class/label. This particular value is known as threshold and the process is called thresholding.

The threshold is determined by considering TP, TN, FP, and FN as well as the use case or business scenario.

Jupyter Notebook: Link

2.11. AUC — ROC

AUC is referred as Area Under the Curve i.e. the the amount of area which are under the line( linear or non linear). ROC means Receiver Operating Characteristic, initially it was used for differentiating noise from not noise but in recent years it is used very frequently in binary classification.

ROC gives trade-off between TP and FP where x- axis represent FPR and y-axis represent TPR. Total area of ROC is 1 unit because TPR and FPR value has range[0,1]. It can be clearly analyzed that more the area under the curve better the model as it can clearly differentiate between positive and negative class but AUC in ROC should be greater than 0.5 unit.

AUC = 1, then the binary classification model is able to perfectly differentiate between all the Positive and the Negative class points correctly.

AUC =0, then the classifier would be predicting all Negatives as Positives, and all Positives as Negatives.

0.5<AUC<1, there is a high chance that the binary classification model will be able to differentiate between the positive class values from the negative class values. This is so because the classifier is able to detect more numbers of True positives and True negatives than False negatives and False positives.

AUC=0.5, then the classifier is not able to distinguish between Positive and Negative class points. Meaning either the classifier is predicting random class for all the dataset.

Few more terms:

TPR is also known as Sensivity.

Specificity = 1- FPR

So, don’t get confused when you will see the combination of these words in any other blog/articles.

Steps to build AUC-ROC:

Arrange the dataset on the basis on the predictions of probability in decreasing order.
Set the threshold as the ith row of the prediction (for 1st take 1 prediction for second take 2nd as threshold).
Calculate the value of TPR and FPR.
Iterate through all the data point in dataset and repeat the step 1, 2 and 3.
Plot the value of TPR (Sensivity) and FPR (Specificity = 1- FPR).
Select the model with maximum AUC.

Jupyter Notebook: Link

2.11. AUC — PRC

AUC is referred as Area Under the Curve i.e. the the amount of area which are under the line( linear or non linear). PRC means Precision Recall Curve, it is used very frequently in binary classification.

PRC gives trade-off between Precision and Recall where x- axis represent Precision and y-axis represent Recall. Total area of PRC is 1 unit because Precision and Recall value has range[0,1]. It can be clearly analyzed that more the area under the curve better the model as it can clearly differentiate between positive and negative class but AUC in PRC should be greater than 0.5 unit because of imbalance dataset as small change in prediction can show drastic change in AUC-PRC curve.

A binary classification model with correct predictions is depicted as a point at a coordinate of (1,1). A skillful model is represented by a curve that bows towards a coordinate of (1,1). A dumb binary classification model will be a horizontal line on the plot with a precision that is proportional to the number of positive examples in the dataset. For a balanced dataset this will be 0.5.The focus of the PR curve on the minority class makes it an effective diagnostic for imbalanced binary classification models. Precision and recall make it possible to assess the performance of a classifier on the minority class. Precision-recall curves (PR curves) are recommended for highly skewed domains where ROC curves may provide an excessively optimistic view of the performance

Steps to build AUC-PRC:

Arrange the dataset on the basis on the predictions of probability in decreasing order.
Set the threshold as the ith row of the prediction (for 1st take 1 prediction for second take 2nd as threshold).
Calculate the value of Precision and Recall .
Iterate through all the data point in dataset and repeat the step 1, 2 and 3.
Plot the value of Precision and Recall.
Select the model with maximum AUC.

Jupyter Notebook: Link

Micro average is the precision or recall of f1-score of classes.

Macro average is the average of precision or recall of f1-score of classes.

Weighted average is just the weighted average of precision or recall of f1-score of classes.

Special Thanks:

As we say “Car is useless if it doesn’t have a good engine” similarly student is useless without proper guidance and motivation. I will like to thank my Guru as well as my Idol “Dr. P. Supraja”- guided me throughout the journey, from bottom of my heart. As a Guru, she has lighted the best available path for me, motivated me whenever I encountered failure or roadblock- without her support and motivation this was an impossible task for me.

Contact me:

If you have any query feel free to contact me on any of the below-mentioned options:

Website: www.rstiwari.com

Medium: https://tiwari11-rst.medium.com

Google Form: https://forms.gle/mhDYQKQJKtAKP78V7

Notebook for Reference:

Jovian Notebook: https://jovian.ai/tiwari12-rst/correlation

Jovian Notebook: https://jovian.ai/tiwari12-rst/metric-part3

References:

Environment Set-up: https://medium.com/@tiwari11.rst

Jovian: https://jovian.ml/docs/user-guide/install.html

YouTube : https://www.youtube.com/channel/UCFG5x-VHtutn3zQzWBkXyFQ

Don’t forget to give us your ? !

How to evaluate the Machine Learning models? — Part 3 was originally published in Becoming Human: Artificial Intelligence Magazine on Medium, where people are continuing the conversation by highlighting and responding to this story.

Via https://becominghuman.ai/how-to-evaluate-the-machine-learning-models-part-3-ff0dd3b76f9?source=rss—-5e5bef33608a—4

source https://365datascience.weebly.com/the-best-data-science-blog-2020/how-to-evaluate-the-machine-learning-modelspart-3

Pandas on Steroids: End to End Data Science in Python with Dask

End to end parallelized data science from reading big data to data manipulation to visualisation to machine learning.

Originally from KDnuggets https://ift.tt/3k4hg4h

source https://365datascience.weebly.com/the-best-data-science-blog-2020/pandas-on-steroids-end-to-end-data-science-in-python-with-dask

Cognitive Automation helps where RPAs fall short

And it does so in other places too. In my opinion (#POV), Cognitive Automation is the “how” to the “what” being defined as automation or generally speaking digital transformation (aka digitization).

So what is the difference between these two fellows? While browsing the Internet you might find a number of different descriptions or opinions. I have even seen something called Cognitive RPA... Anyway, let me share some definitions we use at byteLAKE.

RPA, Robotic Process Automation, is a (collection of… or a framework for…) software robot(s). It relies on basic technologies, a rule-based approach to automate easy, simple, yet repetitive and time-consuming tasks. Typical examples are macros for automated calculations, files transfers from scanners’ folders to teams’ network locations or even basic files processing.

Cognitive Automation, on the other hand, relies on knowledge and intends to mimic human behaviors and actions. In other words, it leverages Artificial Intelligence to assist humans in complex tasks execution, helps analyze all sorts of data and performs non-routine tasks. Industries gradually embrace the power of Cognitive Automation and example applications vary from office tasks automation like extracting information from documents, through Big Data analytics like finding trends, answers hidden in the data or predictive maintenance, to AI-assisted production lines monitoring in manufacturing for example.

Both help companies effectively reduce costs, increase productivity, offload humans from monotonous tasks and in the case of cognitive automation, augment humans capabilities. Thus Cognitive Automation can address much more complex activities and provide value in the form of an expert system, guiding in decision-making processes, provide insights and the like. Unlike RPA or simple macros or bots, Cognitive Automation can read documents or IoT sensors and perform analytics on gathered data, leverage cameras to trigger alarms or assist in quality inspection procedures and many more.

Cognitive Automation relies on knowledge and intends to mimic human behaviors and actions.

Don’t forget to give us your ? !

Cognitive Automation helps where RPAs fall short was originally published in Becoming Human: Artificial Intelligence Magazine on Medium, where people are continuing the conversation by highlighting and responding to this story.

Via https://becominghuman.ai/cognitive-automation-helps-where-rpas-fall-short-a1c5a01a66f8?source=rss—-5e5bef33608a—4

source https://365datascience.weebly.com/the-best-data-science-blog-2020/cognitive-automation-helps-where-rpas-fall-short

Youll never get confused about Type-I and Type-II errors again!

You’ll never get confused about Type-I and Type-II errors again !

A lot of us get confused about Type-I and Type-II errors. I genuinely hope that this blog will end the confusion for once and for all.

Before diving into the details, let’s define what is ground truth. As the name suggests, Ground Truth is basically the actual value of some information that we may or may not know. For example, let’s think about the volume of water that exist in our planet. The information is unknown to us , but if proper investigation and research was done, we might be able to figure out the exact amount of water present. That is the ground truth.

Let’s take an example of a lie detector machine. In an ideal world, when someone lies, the lie detector goes buzzing and if they speak the truth then there’s no reaction. But there can be two other cases, the lie detector don’t buzz when someone lies and it starts buzzing even when someone speaks the truth.

visualized through a quadrant source : www.analyticsvidhya.com

Now we can dive into Type-I and Type-II errors, often mentioned as false positives and false negatives. Let’s take the previous example. The two other cases I mentioned earlier are nothing but Type-I and Type-II errors. When the lie detector buzz when someone speaks the truth, that’s a false positive or Type-I error. And when it doesn’t buzz even when someone lies, that’s a false negative or Type-II error.

Don’t forget to give us your ? !

You’ll never get confused about Type-I and Type-II errors again ! was originally published in Becoming Human: Artificial Intelligence Magazine on Medium, where people are continuing the conversation by highlighting and responding to this story.

Via https://becominghuman.ai/youll-never-get-confused-about-type-i-and-type-ii-errors-again-f07464ab6a0f?source=rss—-5e5bef33608a—4

source https://365datascience.weebly.com/the-best-data-science-blog-2020/youll-never-get-confused-about-type-i-and-type-ii-errors-again

Six Ethical Quandaries of Predictive Policing

When predictive machine learning models are applied to real-life scenarios, especially those that directly impact humans, such as cancer detection and other medical-related applications, the risks involved with incorrect predictions carry very high stakes. These risks are also prominent in how machine learning is applied in law enforcement, and serious ethical questions must be considered.

Originally from KDnuggets https://ift.tt/3mZCVws

source https://365datascience.weebly.com/the-best-data-science-blog-2020/six-ethical-quandaries-of-predictive-policing

Essential data science skills that no one talks about

Old fashioned engineering skills are what you need to boost your data science career.

Originally from KDnuggets https://ift.tt/2I1Kuny

source https://365datascience.weebly.com/the-best-data-science-blog-2020/essential-data-science-skills-that-no-one-talks-about

Top KDnuggets tweets Oct 28 Nov 03: 11 Branches of #MachineLearning and 63 Important Machine Learning Algorithms; Top #Python Libraries for #DataScience #DataVisualization #MachineLearning

Building Neural Networks with PyTorch in Google Colab; The Roadmap of Mathematics for Deep Learning; Top #Python Libraries for Data Science, Data Visualization, Machine Learning; 11 Branches of #MachineLearning and 63 Important Machine Learning Algorithms.

Originally from KDnuggets https://ift.tt/32yGcer

source https://365datascience.weebly.com/the-best-data-science-blog-2020/top-kdnuggets-tweets-oct-28-nov-03-11-branches-of-machinelearning-and-63-important-machine-learning-algorithms-top-python-libraries-for-datascience-datavisualization-machinelearning

2 Coding-free Ways to Extract Content From Websites to Boost Web Traffic

There are 2 main coding-free solutions for extracting content from websites to build your content base: use web scraping tools and use content aggregation tools. We review top choices.

Originally from KDnuggets https://ift.tt/3esoApf

source https://365datascience.weebly.com/the-best-data-science-blog-2020/2-coding-free-ways-to-extract-content-from-websites-to-boost-web-traffic