365 Data Science

A Guide to Preparing OpenCV for Android

This tutorial guides Android developers through preparing the popular OpenCV library for use. Step by step, the library is imported into Android Studio, after which it can be used for any of the operations it supports, such as object detection, segmentation, tracking, and more.

Originally from KDnuggets https://ift.tt/3jBNLr6

source https://365datascience.weebly.com/the-best-data-science-blog-2020/a-guide-to-preparing-opencv-for-android

New U. of Chicago Machine Learning for Cybersecurity Certificate Gives Professionals Tools to Detect and Prevent Attacks

Machine learning has become an essential tool for IT security professionals seeking to detect and prevent attacks and vulnerabilities. The Center for Data and Computing (CDAC) convened a trio of University of Chicago computer science faculty to produce an innovative new remote Machine Learning for Cybersecurity certificate that will be offered for the first time this autumn.

Originally from KDnuggets https://ift.tt/34oBDn2

source https://365datascience.weebly.com/the-best-data-science-blog-2020/new-u-of-chicago-machine-learning-for-cybersecurity-certificate-gives-professionals-tools-to-detect-and-prevent-attacks

Your Guide to Linear Regression Models

This article explains linear regression and how to program linear regression models in Python.
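
As a small illustration of what programming a linear regression in Python can look like (this is not the linked article's code), the sketch below fits a one-feature model with the closed-form least-squares formulas. The function name and the toy data are assumptions made for the example.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (slope, intercept) minimising squared error for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both left unnormalised).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical toy data that lies exactly on y = 2x + 1.
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
slope, intercept = fit_simple_linear_regression(xs, ys)
print(slope, intercept)  # prints 2.0 1.0
```

For more than one feature, a library such as scikit-learn would normally be used instead of hand-written formulas.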

Originally from KDnuggets https://ift.tt/30wmgaQ

source https://365datascience.weebly.com/the-best-data-science-blog-2020/your-guide-to-linear-regression-models

The Boring Work of Implementing ML Models

Published from my website

When I write blog posts about the potential for AI to help industries, or when some hype article in the media talks about how AI can revolutionise the world or your favourite industry, we forget the day-to-day work needed to turn that future into reality.

What should you do with the data?

Yes, having AI look at medical images is great, and it can give more accurate predictions than doctors. But how do you get that data? Medical data is hard to obtain, for good reason. Patients' medical histories should not be passed around willy-nilly, as this is very sensitive information. This is not just data about your music preferences. It’s about their lives. After you get the data, how should you train the model? A simple 2D medical image may require a convolutional neural network, the default for computer vision. To train it, you will need labels, so the computer knows what it’s looking at. A person with expertise in the field (a doctor) will need to help label the images. Depending on the goal of the model, the doctor will need to point out items in an image (for object detection) or just give the general category of the image (for an image classifier). Now you have trained the model. If the model is good, meaning it is correct more often than doctors, then you can think about how to move the model to production.

How do you make sure the model is safe for use?

Hospitals are known for some terrible bureaucracy and paperwork, depending on your country. So how would you get this AI into the hands that need it the most? For example, should the doctor access the model via a web app and upload an image to the website? Or should it be a mobile app, where the doctor can point a phone camera at the hospital computer and the app gives a result? Only talking to your prospective users will give you the answer. If the diagnosis is wrong, who comes under fire: the model or the doctor? So there are even ethical questions when it comes to some areas of machine learning.

When Google added machine learning to its data centres to help with energy usage, they added fail-safes just in case the AI did something funky. Many times, humans had to give the final approval. When the AI changed something, humans were always in the loop. So, depending on the scale and the activity, safety features may have to be built in, rather than just focusing on your model's accuracy. But in tons of areas where machine learning will be used, such safety features won’t be needed. There, it will likely just speed up extracting useful information from the company’s data, like spreadsheets or text documents. The main issue is privacy, because if the data contains customers' personal data, then ethics and regulations like GDPR get involved. But these are problems you will face even before touching the model.

What should you do with missing or inadequate data?

If the user decides to opt out of giving data in certain areas, how should the company give model recommendations to that user? Maybe it will need to guess by using other data from the user. Or maybe it bases guesses on other users similar to the original user. Or maybe it just tells the user that they can’t use the service unless they opt in to giving certain data. I don’t know. I guess it will depend on the company and the service they are providing.
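
The "guess from similar users" option can be sketched as a nearest-neighbour fill-in. This is only an illustration under stated assumptions: the field names, the toy user records, and the function `impute_from_similar_users` are all hypothetical, and a real service would also need to scale the features and respect what the user actually consented to.

```python
def impute_from_similar_users(target, others, missing_key, k=2):
    """Estimate target[missing_key] as the mean over the k most similar users."""
    # Compare users only on the fields the target user did provide.
    # Assumes every record in `others` has a value for those fields.
    shared = [key for key, value in target.items()
              if key != missing_key and value is not None]

    def distance(other):
        return sum((target[key] - other[key]) ** 2 for key in shared) ** 0.5

    # Only users who supplied the missing field can contribute a guess.
    candidates = [user for user in others if user.get(missing_key) is not None]
    nearest = sorted(candidates, key=distance)[:k]
    return sum(user[missing_key] for user in nearest) / len(nearest)


# Hypothetical records: the target user opted out of sharing monthly_spend.
target = {"age": 30, "sessions_per_week": 4, "monthly_spend": None}
others = [
    {"age": 29, "sessions_per_week": 5, "monthly_spend": 20.0},
    {"age": 31, "sessions_per_week": 4, "monthly_spend": 30.0},
    {"age": 60, "sessions_per_week": 1, "monthly_spend": 90.0},
]
print(impute_from_similar_users(target, others, "monthly_spend"))  # prints 25.0
```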

Let’s say you want to use satellite data and/or remote sensing for your project. One major question you need to ask before starting: is the spatial resolution enough for your images? If not, you start to notice halfway through collecting your data that it’s not good enough, as you can’t zoom in enough to get the features you want from the image. This affected me in one of my projects, so I was later forced to use screenshots from Google Earth. If the project has commercial value, then it may make sense to buy higher-resolution images from places like Planet Labs, which operates satellites that provide high-resolution images with daily or close to daily updates. These things don’t get mentioned in media articles talking about “HoW ReMoTe SeNsInG CaN HeLp YoUr BuSiNeSs.” To get cool things to work, you will need to do boring things.
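
A quick back-of-the-envelope check can catch the resolution problem before any data is collected: divide the size of the thing you want to see by the ground distance each pixel covers. The sketch below assumes illustrative numbers only, and the `min_pixels` threshold is a made-up rule of thumb rather than a standard.

```python
def pixels_across(object_size_m, ground_sample_distance_m):
    """Number of pixels that span an object of the given size."""
    return object_size_m / ground_sample_distance_m


def resolution_is_enough(object_size_m, ground_sample_distance_m, min_pixels=10):
    # min_pixels is an assumed threshold; the right value depends on the task.
    return pixels_across(object_size_m, ground_sample_distance_m) >= min_pixels


# Illustrative case: a 20 m wide building at 10 m per pixel vs 0.5 m per pixel.
print(pixels_across(20, 10))          # prints 2.0
print(resolution_is_enough(20, 10))   # prints False
print(resolution_is_enough(20, 0.5))  # prints True
```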

Sending the model to production

I didn’t even get to the step-by-step problems of releasing your model to the public. If you are going to do so, you need to quickly learn MLOps and basic software engineering, an area I’m hoping to learn soon. Like I said in the medical example, do you want to create a web app or a mobile app? Maybe you want to create an API, as your users are going to be developers. But this side of machine learning is not talked about that much. After you release it, how will you update the model and even the app? Should users give feedback to the model? Or are you going to do it personally by looking at user and error logs? Which cloud provider would you use to release your model? Would you go for the new serverless services or traditional server space? To improve the model, would you collect user data separately and then occasionally retrain the model on the new data? Or train as you go along? (Note: I don’t know if you can do this.)
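
On the "train as you go along" question: updating a model one example at a time is generally called online (or incremental) learning, and it is possible for models that are trained with gradient steps. Below is a minimal sketch, assuming a plain linear model and a made-up stream of examples. The class name, the learning rate, and the data are hypothetical and do not come from any particular library.

```python
class OnlineLinearModel:
    """Linear model updated one example at a time with a gradient step."""

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, features):
        return self.bias + sum(w * x for w, x in zip(self.weights, features))

    def update(self, features, target):
        """Nudge the parameters to reduce squared error on this one example."""
        error = self.predict(features) - target
        self.weights = [w - self.learning_rate * error * x
                        for w, x in zip(self.weights, features)]
        self.bias -= self.learning_rate * error
        return error


model = OnlineLinearModel(n_features=1)

# Simulated stream of user data that follows y = 3x + 1 (an assumption).
for step in range(5000):
    x = (step % 10) / 10
    model.update([x], 3 * x + 1)

print(round(model.weights[0], 3), round(model.bias, 3))
```

The alternative described above, collecting data and retraining occasionally, trades freshness for the chance to validate each new model before it replaces the old one.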

As I get more experience, I should be writing in more detail on how to solve these issues, because I think this is an area where resources are lacking. Also, for my own selfish reasons, I want to share my work. Some apps allow you to interact with the model, and this is something I want to do. I could also learn from the users of the app. So something like that should be on the horizon. And if we want machine learning to be useful, then it’s obvious that models should be released in some way: an internal app, or a public web app. Models are not useful when stuck in a notebook. They are useful when released to the wider world, where the model is tested against reality.


The Boring Work of Implementing ML Models was originally published in Becoming Human: Artificial Intelligence Magazine on Medium, where people are continuing the conversation by highlighting and responding to this story.

Via https://becominghuman.ai/the-boring-work-of-implementing-ml-models-e71d72f89e04?source=rss—-5e5bef33608a—4

source https://365datascience.weebly.com/the-best-data-science-blog-2020/the-boring-work-of-implementing-ml-models

Is OpenAI still open?

OpenAI announced a partnership with Microsoft that grants Microsoft exclusive source-code and model access to GPT-3 without using the API…

Continue reading on Becoming Human: Artificial Intelligence Magazine »

Via https://becominghuman.ai/is-openai-still-open-81f6839f756b?source=rss—-5e5bef33608a—4

source https://365datascience.weebly.com/the-best-data-science-blog-2020/is-openai-still-open

Microsoft’s HummingBird-ML: A library for accelerating traditional machine learning models

Microsoft’s HummingBird-ML: A library for accelerating inference with traditional machine learning models

Hello Guys. Today’s topic of our discussion is HUMMINGBIRD-ML.

We have all used traditional ML models like Linear Regression, Logistic Regression, Decision Tree, etc. in the early stages of our journey in the field of machine learning.

Unfortunately, traditional ML libraries and toolkits (such as Scikit-Learn, ML.NET, and H2O) are usually developed to run in CPU environments, which limits a model's performance.

They do not use abstractions such as tensors to represent their computation. The lack of this abstraction means that for these frameworks to make use of hardware acceleration, one would need many implementations ((for each operator) x (for each hardware backend)), which does not scale well. This means that traditional ML often misses out on the hardware acceleration that deep learning and neural networks enjoy.

So, how does Hummingbird help in this situation?

Frameworks like TensorFlow, PyTorch, and ONNX Runtime are built around tensors as their basic computational unit. These frameworks can run efficiently on hardware accelerators like GPUs, and their prediction performance can be further optimized with compiler frameworks such as TVM (an open source deep learning compiler stack for CPUs, GPUs, and specialized accelerators).

Hummingbird compiles traditional ML pipelines into tensor computations to take advantage of the optimizations that are being implemented for neural network systems. This allows users to seamlessly leverage hardware acceleration.
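
To give a feel for what "compiling a tree into tensor computations" can mean, here is a toy sketch of a matrix-multiplication formulation of a decision tree, in the spirit of the GEMM strategy described by the Hummingbird authors. It is not Hummingbird's code: plain Python lists stand in for tensors, and the tree, the matrix names A to E, and the helper functions are assumptions made for illustration.

```python
def matmul(left, right):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*right)]
            for row in left]


# Hypothetical toy tree:
#   node 0: x[0] < 0.5 ? leaf 0 (class 0) : node 1
#   node 1: x[1] < 2.0 ? leaf 1 (class 1) : leaf 2 (class 0)
A = [[1, 0],        # feature 0 is tested by node 0
     [0, 1]]        # feature 1 is tested by node 1
B = [0.5, 2.0]      # threshold of each internal node
C = [[1, -1, -1],   # node 0: leaf 0 is on its left, leaves 1 and 2 on its right
     [0, 1, -1]]    # node 1: leaf 1 is on its left, leaf 2 on its right
D = [1, 1, 0]       # number of left turns on the path to each leaf
E = [[1, 0],        # leaf 0 -> class 0
     [0, 1],        # leaf 1 -> class 1
     [1, 0]]        # leaf 2 -> class 0


def predict(X):
    tested = matmul(X, A)  # value each node compares against its threshold
    went_left = [[int(v < t) for v, t in zip(row, B)] for row in tested]
    path = matmul(went_left, C)  # per-leaf tally of the decisions taken
    at_leaf = [[int(v == d) for v, d in zip(row, D)] for row in path]
    class_scores = matmul(at_leaf, E)  # one-hot class for each sample
    return [row.index(max(row)) for row in class_scores]


X = [[0.2, 5.0], [0.9, 1.0], [0.9, 3.0]]
print(predict(X))  # prints [0, 1, 0]
```

Every sample goes through the same three matrix products and two comparisons, with no per-sample branching, which is the kind of workload that tensor runtimes and GPUs handle well.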

Hummingbird converts models from scikit-learn, ONNX.ML, and other libraries into PyTorch so that they can run on GPUs and attain great performance.

This first open-source release of Hummingbird currently supports converting the following operators to PyTorch:

Scikit-learn:

Tree-based operators

DecisionTreeClassifier
DecisionTreeRegressor
ExtraTreesClassifier
ExtraTreesRegressor
GradientBoostingClassifier
GradientBoostingRegressor
HistGradientBoostingClassifier
HistGradientBoostingRegressor
IsolationForest
RandomForestClassifier
RandomForestRegressor

Linear methods

LinearRegression
LinearSVC
LogisticRegression
LogisticRegressionCV
SGDClassifier

SVM

NuSVC
SVC

Classifiers: Other

BernoulliNB
GaussianNB
MLPClassifier
MLPRegressor
MultinomialNB

Preprocessing

Binarizer
Normalizer
OneHotEncoder
RobustScaler
MaxAbsScaler
MinMaxScaler
StandardScaler

Matrix Decomposition

PCA
KernelPCA
TruncatedSVD
FastICA

Feature Selectors

SelectPercentile
SelectKBest
VarianceThreshold

Feature Pre-processing: One-to-One

SimpleImputer
MissingIndicator

Feature Pre-processing: Other

PolynomialFeatures

LightGBM:

LGBMClassifier
LGBMRanker
LGBMRegressor

XGBoost:

XGBClassifier
XGBRanker
XGBRegressor

ONNX.ML:

ArrayFeatureExtractor
LinearClassifier
LinearRegressor
Normalizer
Scaler
TreeEnsembleClassifier
TreeEnsembleRegressor

[Figure: performance and efficiency comparison between the scikit-learn model, PyTorch on a CPU, and PyTorch with a GPU as hardware accelerator.]

That’s all about Hummingbird. In the upcoming months, Microsoft is planning to extend what Hummingbird supports.

So Stay Tuned, Thank You!


Microsoft’s HummingBird-ML: A library for accelerating traditional machine learning models was originally published in Becoming Human: Artificial Intelligence Magazine on Medium, where people are continuing the conversation by highlighting and responding to this story.

Via https://becominghuman.ai/microsofts-hummingbird-ml-a-library-for-accelerating-traditional-machine-learning-models-8c12385f13fb?source=rss—-5e5bef33608a—4

source https://365datascience.weebly.com/the-best-data-science-blog-2020/microsofts-hummingbird-ml-a-library-for-accelerating-traditional-machine-learning-models

Key Machine Learning Technique: Nested Cross-Validation Why and How with Python code

Selecting the best performing machine learning model with optimal hyperparameters can sometimes still end up with a poorer performance once in production. This phenomenon might be the result of tuning the model and evaluating its performance on the same sets of train and test data. So, validating your model more rigorously can be key to a successful outcome.
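
To make the idea concrete, here is a minimal sketch of nested cross-validation in plain Python. It is not the code from the linked article: the toy data, the 1-D k-nearest-neighbours classifier, and every function name are assumptions chosen for illustration. The inner loop picks the hyperparameter using only the outer training folds, and the outer loop scores that choice on data the tuning never saw.

```python
import random


def k_fold_splits(n_samples, n_folds):
    """Yield (train_idx, test_idx) pairs whose test parts partition the data."""
    indices = list(range(n_samples))
    for fold in range(n_folds):
        test_idx = indices[fold::n_folds]  # every n_folds-th sample
        held_out = set(test_idx)
        train_idx = [i for i in indices if i not in held_out]
        yield train_idx, test_idx


def knn_predict(train, x, k):
    """Majority label among the k training points nearest to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > len(nearest) else 0


def accuracy(train, test, k):
    return sum(knn_predict(train, x, k) == y for x, y in test) / len(test)


def mean_cv_accuracy(data, k, n_folds):
    scores = [accuracy([data[i] for i in train_idx],
                       [data[i] for i in test_idx], k)
              for train_idx, test_idx in k_fold_splits(len(data), n_folds)]
    return sum(scores) / len(scores)


def nested_cv(data, candidate_ks, outer_folds=5, inner_folds=3):
    outer_scores = []
    for train_idx, test_idx in k_fold_splits(len(data), outer_folds):
        outer_train = [data[i] for i in train_idx]
        outer_test = [data[i] for i in test_idx]
        # Inner loop: tune k using only the outer training portion.
        best_k = max(candidate_ks,
                     key=lambda k: mean_cv_accuracy(outer_train, k, inner_folds))
        # Outer loop: score the tuned model on data the tuning never saw.
        outer_scores.append(accuracy(outer_train, outer_test, best_k))
    return outer_scores


# Hypothetical toy data: label is 1 when x > 0.5, with about 10% of labels flipped.
rng = random.Random(0)
data = []
for _ in range(60):
    x = rng.random()
    label = int(x > 0.5)
    if rng.random() < 0.1:
        label = 1 - label
    data.append((x, label))

scores = nested_cv(data, candidate_ks=[1, 3, 5, 7])
print(sum(scores) / len(scores))
```

The mean of the outer scores is the performance estimate to report. With scikit-learn, the same structure is usually written by passing a GridSearchCV estimator to cross_val_score.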

Originally from KDnuggets https://ift.tt/30xpawc

source https://365datascience.weebly.com/the-best-data-science-blog-2020/key-machine-learning-technique-nested-cross-validation-why-and-how-with-python-code

Getting Started in AI Research

A guide on how to contribute to confirming the reproducibility of some of the most recent papers and join open-source research.

Originally from KDnuggets https://ift.tt/3nlZMD9

source https://365datascience.weebly.com/the-best-data-science-blog-2020/getting-started-in-ai-research

The GPT-3 Algorithm Wrote Articles For Two Weeks, And 26,000 People Read It

The GPT-3 algorithm wrote articles for two weeks — they were read by 26 thousand people and only one asked the author if it was a robot…

Continue reading on Becoming Human: Artificial Intelligence Magazine »

Via https://becominghuman.ai/the-gpt-3-algorithm-wrote-articles-for-two-weeks-and-26-000-people-read-it-210127f79a16?source=rss—-5e5bef33608a—4

source https://365datascience.weebly.com/the-best-data-science-blog-2020/the-gpt-3-algorithm-wrote-articles-for-two-weeksand-26000-people-read-it
