Google’s Professional AI Certification & What I’ve Learned Since

Last week, I took Google’s TensorFlow Certification exam, a grueling 5-hour endeavor in which the test taker is required to build highly accurate models for image classification, text classification, and time-series prediction.

I ended up passing the exam, but it was definitely a beast. It required knowledge accumulated incrementally over several years, along with a very structured study schedule and hands-on practice.

I wanted to quickly share what I’ve learned since then about TensorFlow and about neural networks in general.

Artificial Intelligence Is Probabilities

Artificial Intelligence (AI) is all about probabilities at the most basic level, and we can sum it up in one sentence:

Artificial Intelligence is about arriving at the probability distribution of the truth.

You aren’t seeing this at the surface level because APIs like Keras abstract the higher-level math away, but ultimately it’s good to understand exactly what machine learning is doing under the hood so that we know when to put faith in what the model is telling us, and when not to.

So how are neural networks using probability distributions? Let’s take an example of using an AI model to predict heart disease.

Let’s say I have a training dataset of 1,000 patients: 980 are labeled as not having heart disease and 20 are labeled as having heart disease. As far as the model is concerned, this dataset represents the ground truth: the maximum amount of information available for a machine learning model to learn about the real world.

What a neural network is doing internally is that it’s shifting its weights around to arrive at the expected distribution of the original training dataset. In other words, when you feed data into a neural network, what you’re saying to the network is this:

“Okay, neural network, I’m going to feed you some data where 980 of the inputs have class ‘0’ (‘do not have heart disease’) and 20 have class ‘1’ (‘have heart disease’). Please shift your network weights around when evaluating the input data so that your predictions will typically come out to around 20 of class ‘1’ for every 980 of class ‘0’.”

“Okay, neural network, given all these X’s, give me the function for a red line that best fits the data distribution so I can predict future Y’s.”

That’s all that’s happening under the hood. The neural network is converging on a probability distribution for its predictions that most closely matches the probability distribution of the original ground truth you fed into it.
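To make this concrete, here is a minimal sketch in plain Python. It assumes the simplest possible “network”: a single bias weight passed through a sigmoid, with no input features at all. That is my simplification for illustration, not what Keras builds for you, and the function name `fit_bias_only` is hypothetical. Trained with gradient descent on cross-entropy loss against the 980/20 dataset, the predicted probability of class “1” should settle near 20/1000.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_bias_only(labels, lr=0.5, steps=5000):
    """Gradient descent on mean cross-entropy for a model with one bias weight."""
    bias = 0.0
    positive_rate = sum(labels) / len(labels)
    for _ in range(steps):
        predicted = sigmoid(bias)
        # Gradient of mean cross-entropy with respect to the bias:
        # predicted probability minus the observed positive rate.
        gradient = predicted - positive_rate
        bias -= lr * gradient
    return bias

# The dataset from the example: 980 healthy (0), 20 with heart disease (1).
labels = [0] * 980 + [1] * 20
bias = fit_bias_only(labels)
predicted_rate = sigmoid(bias)
print(f"observed rate: {sum(labels) / len(labels):.4f}")
print(f"learned rate:  {predicted_rate:.4f}")
```

A real network also uses the input features to move individual predictions up or down, but the pull toward the class balance of the training data is the same.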

This is an important thing to understand because if you’re feeding datasets into a machine learning model where the distribution of classes does not approximate the true distribution of those classes in nature, then your machine learning model will be pretty “untruthful” (useless) when you try to predict on new data.

The ultimate challenge of data science and artificial intelligence is that it’s not just about building efficient models; it’s actually more about what you’re feeding into them.

This is analogous to diet and exercise for people: if you’re training hard in the gym, but eating fattening and unhealthy foods, it doesn’t matter how great your training program is, you’re ultimately not going to get the results you’re looking for.

The hardest part about machine learning is understanding whether the distribution of your data matches that which occurs naturally in the physical world.

If I went out and sampled 1000 people for heart disease using their age, BMI, blood pressure, and other feature inputs to predict whether they have heart disease, how many of them would truly have heart disease?

But here’s the twist:

  • What if I sampled those 1,000 people standing outside my local gym? How might that affect the number of people in the dataset I record as truly having heart disease?
  • Conversely, what if I sampled the entire dataset standing outside of a McDonald’s?

How I collect the dataset is going to affect the ground truth distribution of “doesn’t have heart disease” versus “has heart disease” in the training data. I may encounter far fewer people who truly have heart disease standing outside my local gym than among regular diners at McDonald’s.

If I fed either of these datasets into a machine learning model, the model is going to converge on the same probability distribution as the dataset that I fed into it. The model cannot learn any more information than what I originally gave it. It won’t tell me whether I am biasing it or not.
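Here is a rough simulation of the two sampling sites, as a sketch only. The prevalence figures for each site are numbers I made up for illustration, not real statistics, and `sample_labels` is a hypothetical helper. The point is that the base rate a model would learn is simply whatever rate shows up in the sample it was handed.

```python
import random

def sample_labels(prevalence, n, rng):
    """Draw n binary labels, each equal to 1 with probability `prevalence`."""
    return [1 if rng.random() < prevalence else 0 for _ in range(n)]

# Made-up prevalence values, purely for illustration.
ASSUMED_PREVALENCE = {"gym": 0.01, "fast_food": 0.08}

rng = random.Random(0)  # fixed seed so the run is repeatable
learned_rate = {}
for site, prevalence in ASSUMED_PREVALENCE.items():
    labels = sample_labels(prevalence, 1000, rng)
    # A model trained on this sample is pulled toward this rate,
    # whatever the true rate in the wider population is.
    learned_rate[site] = sum(labels) / len(labels)
    print(f"{site}: {sum(labels)} of {len(labels)} labeled as having heart disease")
```

Nothing in either sample tells the model which site it came from or how far that site is from the general population.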

Neural networks cannot learn any more information than what you give them. They cannot tell you whether your dataset represents actual reality or not.

If my dataset is biased towards not having heart disease, the model will move its internal probability distribution to put a heavier emphasis on predicting someone to not have heart disease, which could yield a lot of false negatives if I started using that model in a clinical setting for cardiology patients.
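A small worked example shows how large that effect can be. This is a sketch under a strong assumption that I am adding: only the share of sick versus healthy people differs between the training data and the clinic, while the features look the same within each class. Under that assumption, Bayes’ rule lets us rescale a predicted probability from one base rate to another. The function name and all of the numbers are hypothetical.

```python
def adjust_for_prevalence(p_model, train_prevalence, deploy_prevalence):
    """Rescale a predicted probability from the training base rate to a new one.

    Assumes only the class balance differs between the two populations.
    """
    model_odds = p_model / (1.0 - p_model)
    train_odds = train_prevalence / (1.0 - train_prevalence)
    deploy_odds = deploy_prevalence / (1.0 - deploy_prevalence)
    # Swap the training prior odds for the deployment prior odds.
    adjusted_odds = model_odds * deploy_odds / train_odds
    return adjusted_odds / (1.0 + adjusted_odds)

# Hypothetical numbers for illustration only.
p_model = 0.30            # the model's score for one patient
train_prevalence = 0.02   # 20 of 1,000 in the training set
clinic_prevalence = 0.30  # assumed base rate among cardiology patients

p_adjusted = adjust_for_prevalence(p_model, train_prevalence, clinic_prevalence)
print(f"model says {p_model:.2f}, clinic-adjusted estimate {p_adjusted:.2f}")
```

With a 0.5 decision threshold, the unadjusted score would send this patient home as a negative, even though the same evidence points strongly the other way in a population where heart disease is common.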

This is fundamentally how artificial intelligence works: arrive at the probability distribution that would yield all these “y’s” for all these “x’s”. It’s an important concept to understand not just from a results perspective in machine learning, but also from an ethical standpoint.

Ultimately, artificial intelligence models will give back to us only what we feed into them. They are merely a mirror for our own cognitive biases. It’s important that we realize that as we progress into the future.

Google’s Professional AI Certification & What I’ve Learned Since was originally published in Becoming Human: Artificial Intelligence Magazine on Medium, where people are continuing the conversation by highlighting and responding to this story.

Via https://becominghuman.ai/googles-professional-ai-certification-what-i-ve-learned-since-40489b96c6d9?source=rss—-5e5bef33608a—4

source https://365datascience.weebly.com/the-best-data-science-blog-2020/googles-professional-ai-certification-what-ive-learned-since

Published by 365Data Science
