365 Data Science

ML05: Neural network on Iris

ML05: Neural Network on Iris with NumPy

Discover NN elements by a perceptron from scratch

Read time: 10~12 min

Beginners in neural networks (NN) are often intimidated at first sight by the tricky math and complex models, so I'd like to share a fairly simple toy example from a Japanese book [1]: an NN on the Iris dataset built with NumPy alone, without any DL framework such as PyTorch or TensorFlow. That means we have to write the loss function, the activation function, and the weight updates on our own.

Complete Python code: 
https://drive.google.com/drive/folders/1Haknut4yGujlWP-QKpJnFWwRJE1xtf9Y

A neuron is the smallest unit of a neural network, and a perceptron is a single-layer neural network. Let’s use a perceptron to do a binary classification of Iris.

Outline
(1) Dataset
(2) Neural Network Review
(3) Input
(4) Data Splitting
(5) Define Functions
(6) Training
(7) Testing
(8) Summary
(9) Reference

(1) Dataset

The well-known Iris dataset from the UCI Machine Learning Repository:
https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data

(2) Neural Network Review

Figure 1: Visualization of a perceptron [2]

Figure 2: Visualization of a neural network [2]

Figure 3: Low-level operations and DL algorithm [3]

(3) Input

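import os

import numpy as np
import pandas as pd

# Choose your own working directory (the folder that contains iris.data)
os.chdir("D:\\Python\\Numpy_JP\\ch04-3")

# iris.data has no header row, so the columns are numbered 0-4
df = pd.read_csv('iris.data', header=None)
print(df)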

(4) Data Splitting

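# Assumption: the post does not show how x and y are built. Here x holds the four
# measurements of the first 100 rows, and y is 0 for Iris-setosa and 1 otherwise.
x = df.iloc[:100, :4].to_numpy(dtype=float)
y = (df.iloc[:100, 4] != 'Iris-setosa').to_numpy(dtype=float)

x_train = np.empty((80, 4))
x_test = np.empty((20, 4))
y_train = np.empty(80)
y_test = np.empty(20)

# First 40 rows of each class go to training, the last 10 of each to testing
x_train[:40], x_train[40:] = x[:40], x[50:90]
x_test[:10], x_test[10:] = x[40:50], x[90:100]
y_train[:40], y_train[40:] = y[:40], y[50:90]
y_test[:10], y_test[10:] = y[40:50], y[90:100]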

Rows 1~50 are “Iris-setosa” and rows 51~100 are “Iris-versicolor” (“Iris-virginica” occupies rows 101~150 and is not used here). So we take rows 1~40 and rows 51~90 as the training set, and the remaining rows 41~50 and 91~100 as the testing set.

(5) Define Functions

def sigmoid(x):
    return 1/(1+np.exp(-x))

def activation(x, w, b):
    return sigmoid(np.dot(x, w)+b)

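def update(x, y_train, w, b, eta):
    # Forward pass: predictions of the perceptron (sigmoid activator)
    y_pred = activation(x, w, b)
    # Per-sample derivative of the squared-error loss through the sigmoid
    a = (y_pred - y_train) * y_pred * (1 - y_pred)
    # Average over the samples passed in, then take one gradient-descent step
    n = float(len(y_train))
    for i in range(4):
        w[i] -= eta / n * np.sum(a * x[:, i])
    b -= eta / n * np.sum(a)
    return w, b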

Let’s probe into the math behind the preceding code:

Figure 5: NN function & loss function [4]

Figure 6: Partial derivatives & sigmoid activator

Activator: sigmoid function
Loss function: MSE (mean square error)
Optimizer: gradient descent
Weight updates: derived by hand with the chain rule (tiresome math work)
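Since the figures are not reproduced here, the update rule used by `update` can be written out as below. This is only a sketch: it assumes the loss carries a factor of 1/2 (so that the 2 from differentiating the square cancels), which is what the gradient in the code corresponds to, and n stands for the number of samples being averaged over.

```latex
\hat{y}_j = \sigma(\mathbf{w}\cdot\mathbf{x}_j + b), \qquad
\sigma(z) = \frac{1}{1+e^{-z}}, \qquad
\sigma'(z) = \sigma(z)\,\bigl(1-\sigma(z)\bigr)

L(\mathbf{w}, b) = \frac{1}{2n}\sum_{j=1}^{n} (\hat{y}_j - y_j)^2

\frac{\partial L}{\partial w_i} = \frac{1}{n}\sum_{j=1}^{n} (\hat{y}_j - y_j)\,\hat{y}_j\,(1-\hat{y}_j)\,x_{ji}, \qquad
\frac{\partial L}{\partial b} = \frac{1}{n}\sum_{j=1}^{n} (\hat{y}_j - y_j)\,\hat{y}_j\,(1-\hat{y}_j)

w_i \leftarrow w_i - \eta\,\frac{\partial L}{\partial w_i}, \qquad
b \leftarrow b - \eta\,\frac{\partial L}{\partial b}
```

The term (ŷ - y) ŷ (1 - ŷ) is the quantity stored in the variable `a` in the code.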

(6) Training

weights = np.ones(4) / 10
bias = np.ones(1) / 10
eta = 0.1
for _ in range(15):  # Run both epoch=15 & epoch=100
    weights, bias = update(x_train, y_train, weights, bias, eta=eta)

Initial weights: let every wi and b be 0.1
Learning rate: set eta = 0.1
Epochs: run both epochs = 15 and epochs = 100

(7) Testing

print("Epochs = 15")  # Run both epoch=15 & epoch=100
print('weights = ', weights, 'bias = ', bias)
print("y_test = {}".format(y_test))
# Predicted probabilities for the 20 test rows
print(activation(x_test, weights, bias))

Figure 10: Testing result of epochs = 100

If we set the decision boundary at 0.5, we get 100% accuracy with both epochs = 15 and epochs = 100.

With epochs = 15, the first 10 predictions lie between 0.46 and 0.49; with epochs = 100, they lie between 0.23 and 0.30. The last 10 predictions lie between 0.57 and 0.63 with epochs = 15 and between 0.64 and 0.81 with epochs = 100. As the number of epochs rises, the predicted values of the two flower groups move further apart.
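To make the 0.5 decision boundary concrete, here is a minimal sketch of how the predicted probabilities can be turned into labels and an accuracy figure. The probabilities below are hypothetical placeholders, not the actual outputs of the run above.

```python
import numpy as np

# Hypothetical sigmoid outputs for 20 test rows (not the post's real numbers):
# the first 10 rows belong to class 0, the last 10 to class 1.
y_prob = np.array([0.25] * 10 + [0.75] * 10)
y_true = np.array([0.0] * 10 + [1.0] * 10)

# Apply the 0.5 decision boundary, then compare with the true labels.
y_label = (y_prob >= 0.5).astype(float)
accuracy = np.mean(y_label == y_true)
print(accuracy)
```

In the real script, `y_prob` would be `activation(x_test, weights, bias)` and `y_true` would be `y_test`.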

(8) Summary

Without any NN framework, we built a single-layer neural network! Along the way we met concepts like the activator (sigmoid), loss function (MSE), optimizer (gradient descent), weight updates (tiresome math work), initial weights (wi = b = 0.1), learning rate (eta = 0.1), and epochs (tried epochs = 15 & 100).

(9) Reference

[1] Yoshida, T. & Ohata, S. (2018). Genba de Tsukaera! NumPy Data Shori Nyumon kikaigakushu| datascience de yakudatsu kosoku shorishuho. Japan, JP: SHOEISHA.

[2] Bre, F. et al. (2020). An efficient metamodel-based method to carry out multi-objective building performance optimizations. Energy and Buildings, 206, (unknown).

[3] Subramanian, V. (2018). Deep Learning with PyTorch. Birmingham, UK: Packt Publishing.

[4] Roughgarden, T. & Valiant, G. (2015). CS168: The Modern Algorithmic Toolbox Lecture #15: Gradient Descent Basics. Retrieved from
http://theory.stanford.edu/~tim/s15/l/l15.pdf


ML05: Neural network on Iris was originally published in Becoming Human: Artificial Intelligence Magazine on Medium, where people are continuing the conversation by highlighting and responding to this story.

Via https://becominghuman.ai/ml05-8771620a2023?source=rss—-5e5bef33608a—4

source https://365datascience.weebly.com/the-best-data-science-blog-2020/ml05-neural-network-on-iris

Top And Easy to use Open-Source Image Labelling Tools for Machine Learning Projects

Image labelling is the process of manually or automatically defining regions in an image and creating a textual description of those…


Via https://becominghuman.ai/top-and-easy-to-use-open-source-image-labelling-tools-for-machine-learning-projects-ffd9d5af4a20?source=rss—-5e5bef33608a—4

source https://365datascience.weebly.com/the-best-data-science-blog-2020/top-and-easy-to-use-open-source-image-labelling-tools-for-machine-learning-projects

Data Science Volunteering: Ways to Help

No matter the field in which you hold some expertise, sharing your skills to benefit the lives of others or supporting non-profit organizations that try to make the world a better place is a noble and time-worthy personal pursuit. Many opportunities exist in data science to contribute to meaningful projects and crucial needs from your local community to a global scale.

Originally from KDnuggets https://ift.tt/2LtFxVN

source https://365datascience.weebly.com/the-best-data-science-blog-2020/data-science-volunteering-ways-to-help

A Rising Library Beating Pandas in Performance

This article compares the performance of the well-known pandas library with pypolars, a rising DataFrame library written in Rust. See how they compare.

Originally from KDnuggets https://ift.tt/34kvmtp

source https://365datascience.weebly.com/the-best-data-science-blog-2020/a-rising-library-beating-pandas-in-performance

10 Python Skills They Don’t Teach in Bootcamp

Ascend to new heights in Data Science and Machine Learning with this thrilling list of coding tips.

Originally from KDnuggets https://ift.tt/3naPBRB

source https://365datascience.weebly.com/the-best-data-science-blog-2020/10-python-skills-they-dont-teach-in-bootcamp

Building AI Models for High-Frequency Streaming Data

Many data scientists have implemented machine or deep learning algorithms on static data or in batch, but what considerations must you make when building models for a streaming environment? In this post, we will discuss these considerations.

Originally from KDnuggets https://ift.tt/3n7m4rU

source https://365datascience.weebly.com/the-best-data-science-blog-2020/building-ai-models-for-high-frequency-streaming-data4746306

Implementing the AdaBoost Algorithm From Scratch

AdaBoost typically uses decision trees of depth one (decision stumps) as its weak learners, so the ensemble is a forest of stumps rather than of full trees. It works by putting more weight on instances that are difficult to classify and less on those already handled well. The algorithm can be applied to both classification and regression problems. Learn to build the algorithm from scratch here.

Originally from KDnuggets https://ift.tt/373vhvG

source https://365datascience.weebly.com/the-best-data-science-blog-2020/implementing-the-adaboost-algorithm-from-scratch

ML04: From ML to DL to NLP

A concise concept map

Read time: 20 min

This article is a concise concept map from ML to ANN to NLP. I won’t dwell on the complicated math behind ML, DL and NLP; instead, I just run through the concepts and leave the details to the reader.

This article is part of my mid-term report for the course PyTorch and Machine Learning at NCCU. The original report:
https://drive.google.com/drive/folders/1Haknut4yGujlWP-QKpJnFWwRJE1xtf9Y

Contents
(1) Machine Learning Basics
1–1 Supervised learning
1–2 Unsupervised learning
1–3 Reinforcement learning
1–4 Model Evaluation
1–5 Data splitting & cross-validation
1–6 Data preprocessing and feature engineering
1–7 Overfitting and underfitting
1–8 Workflow of a ML project
(2) Neural Network Basics
2-1 Visualizations of NN
2–2 Activation functions
2–3 Loss functions
2–4 Optimizers
2–5 Batch learning
2–6 Batch normalization
2–7 Dropout
2–8 Hyper-parameter
2–9 Data splits & cross-validation
(3) Neural Network Models
3–1 Perceptron
3–2 FNN
3–3 MLP
3–4 CNN
(4) Neural Network in NLP
4-1 Data Pre-processing
4–2 BOW approaches
4–3 CNN
4–4 RNN
4–5 LSTM
(5) Reference

(1) Machine Learning Basics

1–1 Supervised learning

— Regression problems
— Classification problems
— Image segmentation
— Speech recognition
— Language translation

1–2 Unsupervised learning

— Clustering
— Dimensionality reduction (e.g. SVD, PCA)

1–3 Reinforcement learning

1–4 Model evaluation

For numeric targets, we have:
— MSE
— RMSE
— MAPE
For categorical targets, we have:
— Accuracy
— Precision
— Recall
— F1-score
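The metrics listed above can be sketched in a few lines of NumPy. The function names and sample arrays are my own illustrations; the classification part assumes binary 0/1 labels with at least one predicted and one actual positive, and MAPE assumes no zero targets.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """MSE, RMSE and MAPE (in percent) for numeric targets."""
    err = y_true - y_pred
    mse = np.mean(err ** 2)
    rmse = np.sqrt(mse)
    mape = np.mean(np.abs(err / y_true)) * 100  # assumes no zero targets
    return mse, rmse, mape

def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary 0/1 labels."""
    tp = np.sum((y_pred == 1) & (y_true == 1))  # true positives
    fp = np.sum((y_pred == 1) & (y_true == 0))  # false positives
    fn = np.sum((y_pred == 0) & (y_true == 1))  # false negatives
    accuracy = np.mean(y_pred == y_true)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

mse, rmse, mape = regression_metrics(np.array([2.0, 4.0]), np.array([1.0, 5.0]))
acc, prec, rec, f1 = classification_metrics(np.array([1, 1, 0, 0]), np.array([1, 0, 1, 0]))
print(mse, rmse, mape, acc, prec, rec, f1)
```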

1–5 Data splitting & cross-validation

— Three-way data splits: Splitting the dataset into three parts: training, validation and test sets. This is stricter than a two-way split (training and test sets only), because hyper-parameters are tuned on the validation set while the test set is held back for the final evaluation, which gives a more honest estimate of performance. The approach has no single unified name; it is often simply called a train/validation/test split.

— K-fold cross-validation: Helps to detect overfitting and gives more stable performance estimates, since every sample is used for validation exactly once.

Figure 2: 4-fold cross validation & three-way data splits [2]
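Both splitting schemes come down to shuffling row indices and cutting them up. A minimal sketch, where the helper names, the 60/20/20 proportions and the seed are my own assumptions for illustration:

```python
import numpy as np

def three_way_split(n, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle indices 0..n-1 and cut them into train/validation/test parts."""
    idx = np.random.default_rng(seed).permutation(n)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    return idx[n_test + n_val:], idx[n_test:n_test + n_val], idx[:n_test]

def k_fold(n, k=4, seed=0):
    """Yield (train_idx, val_idx) pairs; each fold serves as validation set once."""
    folds = np.array_split(np.random.default_rng(seed).permutation(n), k)
    for i in range(k):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train_idx, folds[i]

train_idx, val_idx, test_idx = three_way_split(100)
fold_sizes = [(len(tr), len(va)) for tr, va in k_fold(100, k=4)]
print(len(train_idx), len(val_idx), len(test_idx), fold_sizes)
```

In practice the two are combined: the test indices are set aside first, and k-fold is run on the remaining rows.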

1–6 Data preprocessing and feature engineering

— Vectorization: A must-do process for data of formats like text, sound, image and video.
— Handling missing values: Deleting or imputing them. I wrote a very detailed article on missing value imputation a month ago on my Medium blog [1], concluding that:

1. In general, the complex ways of missing value imputation (random forest, Bayesian linear regression and so on) do not perform worse than simple ways such as imputing the mean or median, contrary to what some popular ML books claim.
2. In theory, random forest is faster than kNN at similar accuracy, again contrary to some popular ML books.
3. Bayesian linear regression (BayesianRidge in Python) and the random forest model (ExtraTreesRegressor in Python) probably achieve the best accuracy among the models compared.

1–7 Overfitting and underfitting

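Common remedies for overfitting:
— Getting more data
— Reducing the size of the network (i.e. reducing the complexity of the ML model)
— Applying weight regularization
— Dropout (only for ANN models; not applicable to SVM, RF and so forth)

Underfitting is the opposite problem: the model is too simple to capture the patterns in the training data.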

1–8 Workflow of a ML project

— Problem definition and dataset creation
— Measures of success
— Evaluation protocol
— Data preparation
— Baseline model
— A model large enough to overfit
— Apply regularization
— Learning rate picking strategies

(2) Neural Network Basics

2–1 Visualizations of NN

Figure 3: Visualization of a perceptron [3]

Figure 4: Visualization of a neural network [3]

Figure 5: Low-level operations and DL algorithm [2]

As we can see, the main concepts of an NN are weights, the activation function (in a perceptron), the loss function, the optimizer, and weight updates. So, let’s probe into these concepts.

2–2 Activation functions

— Sigmoid
— Tanh
— ReLU
— PReLU (leaky ReLU is a special case of PReLU with a fixed slope): Mitigates the “dying ReLU” problem of ReLU.
— Softmax: Useful for classification.

Figure 6: Relation between PReLU & leaky ReLU [4]

Figure 7: Plots of common activation functions [5]

Figure 8: Saturated & non-saturated activator [6]
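The activation functions listed above are short enough to write directly in NumPy. This is a sketch; the default `alpha=0.01` is my own choice for illustration (in PReLU the slope is learned, while a fixed small slope gives leaky ReLU).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    return np.tanh(x)

def relu(x):
    return np.maximum(0.0, x)

def prelu(x, alpha=0.01):
    # alpha is a learned parameter in PReLU; a fixed small alpha is leaky ReLU.
    return np.where(x > 0, x, alpha * x)

def softmax(x):
    # Subtract the maximum first for numerical stability.
    e = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return e / np.sum(e, axis=-1, keepdims=True)

z = np.array([-2.0, 0.0, 3.0])
print(sigmoid(z), tanh(z), relu(z), prelu(z), softmax(z))
```

Note that ReLU outputs exactly zero for every negative input, whereas PReLU keeps a small negative slope there, which is why it helps against “dying ReLU”.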

2–3 Loss functions

— L1 loss
— MSE loss
— Cross-entropy loss: for classification
— NLL loss (negative log-likelihood)
— NLL loss2d (NLL loss applied per pixel, e.g. for image segmentation)
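A minimal NumPy sketch of the first four losses follows. The function names are my own; the cross-entropy here is built as log-softmax followed by NLL, which I assume matches how these two losses relate in frameworks such as PyTorch.

```python
import numpy as np

def l1_loss(y_pred, y_true):
    return np.mean(np.abs(y_pred - y_true))

def mse_loss(y_pred, y_true):
    return np.mean((y_pred - y_true) ** 2)

def nll_loss(log_probs, targets):
    # log_probs: (n, classes) log-probabilities; targets: (n,) class indices.
    return -np.mean(log_probs[np.arange(len(targets)), targets])

def cross_entropy_loss(logits, targets):
    # Cross-entropy = log-softmax followed by NLL.
    shifted = logits - np.max(logits, axis=1, keepdims=True)
    log_probs = shifted - np.log(np.sum(np.exp(shifted), axis=1, keepdims=True))
    return nll_loss(log_probs, targets)

logits = np.array([[2.0, 0.0], [0.0, 2.0]])
targets = np.array([0, 1])
ce = cross_entropy_loss(logits, targets)
print(l1_loss(np.array([1.0, 3.0]), np.array([2.0, 1.0])), ce)
```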

2–4 Optimizers

— SGD: Stochastic gradient descent
— Momentum
— AdaGrad
— RMSprop (AdaGrad with an exponentially decaying average of squared gradients)
— Adam (roughly RMSprop + Momentum)

Figure 9: Optimizers comparison: SGD, Momentum, AdaGrad, Adam [7]

Figure 10: Optimizers comparison on MNIST: SGD, Momentum, AdaGrad, Adam [7]

As a rule of thumb, Adam > AdaGrad > Momentum > SGD (> means “better than”), but in the preceding MNIST case AdaGrad > Adam > Momentum > SGD, so the ranking depends on the problem. For most use cases, Adam or RMSprop works well.
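To see how the update rules differ, here is a sketch that runs each optimizer on a toy objective f(w) = sum(w²). The objective, the starting point, the learning rate and the decay constants (0.9, 0.999) are my own illustrative assumptions, and the gradient is exact rather than stochastic.

```python
import numpy as np

def grad(w):
    return 2.0 * w  # gradient of the toy objective f(w) = sum(w**2)

def run(optimizer, steps=200, lr=0.1):
    w = np.array([3.0, -2.0])
    m = np.zeros_like(w)  # velocity / first-moment state
    v = np.zeros_like(w)  # accumulated squared gradients / second-moment state
    for t in range(1, steps + 1):
        g = grad(w)
        if optimizer == "sgd":
            w = w - lr * g
        elif optimizer == "momentum":
            m = 0.9 * m - lr * g
            w = w + m
        elif optimizer == "adagrad":
            v = v + g ** 2                      # sum of all past squared gradients
            w = w - lr * g / (np.sqrt(v) + 1e-8)
        elif optimizer == "rmsprop":
            v = 0.9 * v + 0.1 * g ** 2          # decaying average instead of a sum
            w = w - lr * g / (np.sqrt(v) + 1e-8)
        elif optimizer == "adam":
            m = 0.9 * m + 0.1 * g
            v = 0.999 * v + 0.001 * g ** 2
            m_hat = m / (1 - 0.9 ** t)          # bias correction
            v_hat = v / (1 - 0.999 ** t)
            w = w - lr * m_hat / (np.sqrt(v_hat) + 1e-8)
    return w

results = {name: run(name) for name in ["sgd", "momentum", "adagrad", "rmsprop", "adam"]}
for name, w in results.items():
    print(name, w)
```

On such a simple bowl-shaped objective plain SGD already does well; the adaptive methods earn their keep on harder, badly scaled problems.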

2–5 Batch learning

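— Mini-batch: Each weight update uses a small random subset of the training data rather than the whole set. The random sampling loosely resembles the bootstrap, although mini-batches are usually drawn without replacement.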
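A minimal sketch of mini-batch iteration; the helper name `iterate_minibatches` and the toy arrays are hypothetical. It shuffles once per epoch and then slices, so every row appears exactly once.

```python
import numpy as np

def iterate_minibatches(x, y, batch_size, seed=0):
    """Shuffle once, then yield consecutive slices (sampling without replacement)."""
    idx = np.random.default_rng(seed).permutation(len(x))
    for start in range(0, len(x), batch_size):
        batch = idx[start:start + batch_size]
        yield x[batch], y[batch]

x = np.arange(20, dtype=float).reshape(10, 2)
y = np.arange(10)
batch_shapes = [xb.shape for xb, yb in iterate_minibatches(x, y, batch_size=4)]
print(batch_shapes)
```

The last batch is smaller when the number of rows is not a multiple of the batch size.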

2–6 Batch normalization

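Batch normalization rescales the inputs of a layer to zero mean and unit variance over each mini-batch, which usually makes training faster and more stable. Normalization is an essential procedure for NN.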
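A sketch of the training-time forward pass of batch normalization; `gamma` and `beta` are the learnable scale and shift, and the running statistics used at inference time are left out for brevity.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Training-mode batch normalization over the batch axis (axis 0)."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)  # zero mean, unit variance per feature
    return gamma * x_hat + beta              # learnable scale and shift

x = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
out = batch_norm(x, gamma=np.ones(2), beta=np.zeros(2))
print(out)
```

Both columns end up on the same scale even though the second one started ten times larger.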

2–7 Dropout

Dropout randomly switches off a fraction of the neurons during training, which is an effective way to reduce overfitting.
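A sketch of the common “inverted dropout” variant, which is an assumption on my part about which form to show: the surviving units are scaled up during training so that nothing has to change at test time.

```python
import numpy as np

def dropout(x, p=0.5, training=True, rng=None):
    """Inverted dropout: zero each unit with probability p, rescale the rest."""
    if not training or p == 0.0:
        return x
    rng = np.random.default_rng() if rng is None else rng
    mask = rng.random(x.shape) >= p  # True for the units that are kept
    return x * mask / (1.0 - p)

x = np.ones((2, 5))
out_train = dropout(x, p=0.5, rng=np.random.default_rng(0))
out_eval = dropout(x, p=0.5, training=False)
print(out_train)
print(out_eval)
```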

2–8 Hyper-parameter

Tuning hyper-parameters like:
— Number of neurons in each layer
— Batch size
— Learning rate
— Weight decay

2–9 Data splitting & cross-validation

Better to adopt three-way data splits & k-fold cross-validation.

(3) Neural Network Models

3–1 Perceptron

A neuron is the smallest unit of a neural network. A perceptron is a single-layer neural network.

3–2 FNN

A feedforward neural network (FNN) is an artificial neural network in which the connections between the nodes do not form a cycle.

3–3 MLP

A multilayer perceptron (MLP) is a class of feedforward ANN.

3–4 CNN

A CNN (convolutional neural network) is one kind of FNN. A fully connected layer (or linear layer) needs a very large number of parameters for image input and loses all spatial information, whereas a CNN avoids both issues and uses convolution layers and pooling layers to yield outstanding real-world results in computer vision.

CNN has two major merits in computer vision: [8]
— Translation invariant
— Spatial hierarchies of patterns

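Popular CNN network architectures: [7][9]
— LeNet
— AlexNet
— ResNet
— GoogLeNet
— VGGNet
— ImageNet (strictly speaking, the benchmark dataset on which these architectures are evaluated rather than an architecture)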

Moreover, other major concepts of CNN: [6]
— Conv2d (Conv2D)
— Pooling (MaxPooling2D)
— Nonlinear activator — ReLU
— Transfer learning
— Pre-convoluted features

Figure 12: A simplified version of CNN [2]
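The convolution, ReLU and pooling steps can be sketched in plain NumPy for a single-channel image. This is an illustration under simplifying assumptions (stride 1, no padding, one kernel), not how a framework's Conv2d is implemented.

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2-D cross-correlation with stride 1 (what DL libraries call convolution)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # The same small kernel is applied at every position of the image.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool2d(x, size=2):
    """Non-overlapping max pooling; rows/columns that do not fit are dropped."""
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

image = np.arange(25, dtype=float).reshape(5, 5)
feature_map = np.maximum(0.0, conv2d(image, np.ones((2, 2))))  # conv + ReLU
pooled = max_pool2d(feature_map)
print(feature_map.shape, pooled)
```

Because the kernel is shared across positions, a pattern is detected wherever it appears, which is the translation invariance mentioned above.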

For more elaboration on CNN, check this:

Medical Image Analysis with Deep Learning — II

(4) Neural Network in NLP

NLP (natural language processing) developed before ANNs (artificial neural networks) became feasible, but it did not prosper until ANNs were brought into it. The classic NLP book “Natural Language Processing with Python” [10], published in 2009, only covers statistical language modeling and does not mention any ANN methods.

4–1 Data Pre-processing

Converting text into a matrix before feeding it into an NN:
— Expanding contractions with a contraction dictionary
— Tokenization
— Removing stopwords
— Stemming
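The four steps above can be sketched with the standard library alone. The contraction dictionary, the stopword set and the suffix-stripping `naive_stem` are tiny hypothetical stand-ins; a real project would use fuller resources and a proper stemmer.

```python
import re

# Tiny illustrative resources; real projects would use much fuller lists.
CONTRACTIONS = {"don't": "do not", "it's": "it is", "can't": "cannot"}
STOPWORDS = {"the", "a", "is", "it", "do", "not", "and", "of"}

def naive_stem(word):
    """Strip a few common suffixes; a crude stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[:-len(suffix)]
    return word

def preprocess(text):
    text = text.lower()
    for short, full in CONTRACTIONS.items():
        text = text.replace(short, full)                  # 1. expand contractions
    tokens = re.findall(r"[a-z]+", text)                  # 2. tokenize
    tokens = [t for t in tokens if t not in STOPWORDS]    # 3. remove stopwords
    return [naive_stem(t) for t in tokens]                # 4. stem

tokens = preprocess("It's raining and the dogs don't like walking")
print(tokens)
```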

4–2 BOW approaches

Then, we can treat the text as a bag of words (BOW) and vectorize it, with either one-hot encoding or word embedding.

One-hot encoding: A traditional NLP approach, usually used with TF-IDF and often combined with n-gram models. The resulting data is very sparse and suffers from the curse of dimensionality, so it is rarely used with deep learning.
Word embedding: Converting the data into a dense matrix. Word2vec is a popular method.

However, the BOW approaches lose the sequential nature of text, so we then turn to RNNs to make good use of it. [2]
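A small sketch of the difference between the sparse and the dense representation. The two toy documents are made up, and the embedding matrix is filled with random numbers purely for illustration; in practice its rows are learned, for example by Word2vec.

```python
import numpy as np

docs = [["rain", "dog", "like", "walk"], ["dog", "like", "dog"]]

# Build a vocabulary: one index per distinct token.
vocab = {tok: i for i, tok in enumerate(sorted({t for d in docs for t in d}))}

def one_hot(token):
    vec = np.zeros(len(vocab))
    vec[vocab[token]] = 1.0
    return vec

def bag_of_words(doc):
    # Summing one-hot vectors gives word counts and discards word order.
    return np.sum([one_hot(t) for t in doc], axis=0)

bow = np.array([bag_of_words(d) for d in docs])

# A word embedding replaces each sparse one-hot vector with a dense row.
# Random values here; in practice the rows are learned.
embedding = np.random.default_rng(0).normal(size=(len(vocab), 3))
dense = embedding[[vocab[t] for t in docs[1]]]
print(bow)
print(dense.shape)
```

The count vector for the second document is the same for any ordering of its words, which is exactly the loss of sequence information described above.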

4–3 CNN

CNNs solve problems in computer vision by learning features from images. On images, CNNs work by convolving across height and width. In the same way, time can be treated as a convolutional dimension, so a 1-D CNN can slide over a sequence of words. 1-D CNNs sometimes perform better than RNNs and are computationally cheaper. A typical use of CNNs in NLP is text classification. [2]

4–4 RNN

A recurrent neural network (RNN), which is not an FNN, is designed for sequential data. RNNs can solve problems such as natural language understanding, document classification and sentiment classification. An RNN is trained with backpropagation through time (BPTT) instead of plain backpropagation (BP). [11]

In practice, the simple version of RNN finds it difficult to remember context from the earlier parts of a sequence. LSTMs and other RNN variants solve this problem by adding small neural networks (gates) inside the cell, which decide how much, and which, information to remember. [2]
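The recurrence of a simple RNN fits in a few lines. This sketch shows the forward pass only, with random untrained weights and made-up sizes; the names `w_xh`, `w_hh` and `b_h` are my own.

```python
import numpy as np

def rnn_forward(inputs, w_xh, w_hh, b_h):
    """Run a simple (Elman) RNN over a sequence and return every hidden state."""
    h = np.zeros(w_hh.shape[0])
    states = []
    for x_t in inputs:
        # The same weights are reused at every time step; h carries the context.
        h = np.tanh(x_t @ w_xh + h @ w_hh + b_h)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
seq = rng.normal(size=(5, 3))  # 5 time steps, 3 features each
states = rnn_forward(seq, rng.normal(size=(3, 4)), rng.normal(size=(4, 4)), np.zeros(4))
print(states.shape)
```

Because the hidden state is repeatedly squashed and multiplied by the same matrix, information from early steps fades, which is the memory problem the gated variants address.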

4–5 LSTM

A long short-term memory network (LSTM) is a kind of RNN capable of learning long-term dependencies. The simple RNN suffers from vanishing and exploding gradients when processing long sequences. LSTMs are designed to avoid this long-term dependency problem: remembering information for a long period of time is built into their design. [2]

An LSTM has 5 parts: cell state, hidden state, input gate, forget gate and output gate. [12]
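The five parts can be seen together in one time step of an LSTM cell. This is a forward-pass sketch with random untrained weights; packing all four weight blocks into a single matrix `w` is a layout I chose for brevity.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, w, b):
    """One LSTM time step. w: (input + hidden, 4 * hidden); b: (4 * hidden,)."""
    hidden = h_prev.shape[0]
    z = np.concatenate([x_t, h_prev]) @ w + b
    i = sigmoid(z[:hidden])                # input gate: how much new info to write
    f = sigmoid(z[hidden:2 * hidden])      # forget gate: how much old state to keep
    o = sigmoid(z[2 * hidden:3 * hidden])  # output gate: how much state to expose
    g = np.tanh(z[3 * hidden:])            # candidate cell values
    c = f * c_prev + i * g                 # cell state
    h = o * np.tanh(c)                     # hidden state
    return h, c

rng = np.random.default_rng(0)
n_in, n_hidden = 3, 4
w = rng.normal(size=(n_in + n_hidden, 4 * n_hidden))
h, c = np.zeros(n_hidden), np.zeros(n_hidden)
for x_t in rng.normal(size=(5, n_in)):
    h, c = lstm_step(x_t, h, c, w, np.zeros(4 * n_hidden))
print(h, c)
```

The cell state is updated additively (`f * c_prev + i * g`), which is what lets information survive over many time steps.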

(5) Reference

[1] Kuo, M. (2020). ML02: 初探遺失值(missing value)處理 [ML02: A first look at handling missing values]. Retrieved from https://merscliche.medium.com/ml02-na-f2072615158e
[2] Subramanian, V. (2018). Deep Learning with PyTorch. Birmingham, UK: Packt Publishing.
[3] Bre, F. et al. (2020). An efficient metamodel-based method to carry out multi-objective building performance optimizations. Energy and Buildings, 206, (unknown).
[4] Guo, H. (2017). How do I implement the PReLU on Tensorflow?. Retrieved from https://www.quora.com/How-do-I-implement-the-PReLU-on-Tensorflow
[5] Endicott, S. (2017). Game Applications of Deep Neural Networks. Retrieved from https://bit.ly/2G8nUIQ
[6] Taposh Dutta-Roy (2017). Medical Image Analysis with Deep Learning — II. Retrieved from
https://medium.com/@taposhdr/medical-image-analysis-with-deep-learning-ii-166532e964e6
[7] Saito, K. (斎藤康毅) (2016). ゼロから作るDeep Learning ―Pythonで学ぶディープラーニングの理論と実装 [Deep Learning from Scratch: Theory and implementation of deep learning with Python] (Chinese translation: Deep Learning：用Python進行深度學習的基礎理論實作). Japan, JP: O’Reilly Japan.
[8] Chollet, F. (2018). Deep learning with Python. New York, NY: Manning Publications.
[9] Xing, M. (邢夢來) et al. (2018). PyTorch 深度學習與自然語言處理 [PyTorch Deep Learning and Natural Language Processing]. New Taipei City, Taiwan: 博碩文化.
[10] Bird, S. et al. (2009). Natural Language Processing with Python. California, CA: O’Reilly Media.
[11] Rao, D., & McMahan, B. (2019). Natural Language Processing with PyTorch. California, CA: O’Reilly Media.
[12] Ganegedara, T. (2018). Natural Language Processing with TensorFlow. Birmingham, UK: Packt Publishing.


ML04: From ML to DL to NLP was originally published in Becoming Human: Artificial Intelligence Magazine on Medium, where people are continuing the conversation by highlighting and responding to this story.

Via https://becominghuman.ai/ml04-ce0b172deb2b?source=rss—-5e5bef33608a—4

source https://365datascience.weebly.com/the-best-data-science-blog-2020/ml04-from-ml-to-dl-to-nlp

How Is AI Transforming Enterprise Software Applications

A recent survey by Gartner predicts, “By 2021, 40% of new enterprise applications implemented by service providers will include AI technologies.”

The world of business is undergoing a massive change owing to the rapid emergence of artificial intelligence (AI) for enterprise applications. Indeed, artificial intelligence has the power to solve several organizational problems as it offers functionalities that humans cannot practically perform at the same rate and accuracy.

AI has quickly changed status from a “technology to experiment with” to a “technology to deploy.” By 2025, most enterprises will be using AI-enabled apps to gain a competitive edge from streamlined operations, greater product innovation, and improved customer satisfaction.

Drivers of this shift include the unprecedented growth of enterprise data, advances in machine learning (ML) and natural language processing (NLP) capabilities, and the need to accelerate the digital transformation journey.

Below, we present you with the latest insights into how AI is transforming enterprise software apps.

A Glimpse At AI For Enterprise Applications

1. Embracing conversational AI to simplify data analytics consumption

While data analysis is critical, it is extremely time-consuming to sift through multiple business dashboards and reports and find relevant data. To overcome such limitations, AI-enabled virtual assistants are integrated with business intelligence apps.

AI-enabled virtual assistants leverage NLP technology to converse with users in natural language. By merely initiating a chat on the enterprise messaging app and sending simple messages like “What were the sales of product A in 2017?”, employees and business leaders can obtain in-depth insights at the most granular level of data, without switching between multiple tools and dashboards. Users need not manually filter data to analyze information and arrive at crucial decisions.

Thus, AI is transforming the consumption of business intelligence and analytics, especially for on-field employees (for example, sales agents) or CxOs, who need quick access to information without having to dig through heaps of data.

2. Securing Every Aspect of Enterprise IT through AI

With the rise of remote working across the globe and as IT decision-making becomes more democratic, enterprises cannot ignore the increased threat of cyber attacks.

To combat the threat and secure every aspect of the IT infrastructure, organizations are scrambling to deploy applications that integrate machine learning to detect possible threats and vulnerabilities in real-time.

These tools use ML techniques to spot anomalies in network traffic, emails and user activities. Hence, they can quickly identify a potential attack and take steps to mitigate it, even if the threat is unlike anything the organization has witnessed before.

3. Transforming IT through AIOps

AIOps, an emerging variation of DevOps, applies machine learning (ML) algorithms to IT operations data to derive insights that optimize and improve operations.

While DevOps automates and simplifies IT operations, AIOps goes a step further by extracting information that is useful in overseeing IT activities.


How Is AI Transforming Enterprise Software Applications was originally published in Becoming Human: Artificial Intelligence Magazine on Medium, where people are continuing the conversation by highlighting and responding to this story.

Via https://becominghuman.ai/how-is-ai-transforming-enterprise-software-applications-1afcca884c69?source=rss—-5e5bef33608a—4

source https://365datascience.weebly.com/the-best-data-science-blog-2020/how-is-ai-transforming-enterprise-software-applications

Data Compression via Dimensionality Reduction: 3 Main Methods

Lift the curse of dimensionality by mastering the application of three important techniques that will help you reduce the dimensionality of your data, even if it is not linearly separable.

Originally from KDnuggets https://ift.tt/37Xz9hg

source https://365datascience.weebly.com/the-best-data-science-blog-2020/data-compression-via-dimensionality-reduction-3-main-methods
