Originally from KDnuggets https://ift.tt/31MOQW6
source https://365datascience.weebly.com/the-best-data-science-blog-2020/can-ai-learn-human-values
365 Data Science is an online educational career website that offers the opportunity to find your way into the data science world, regardless of your previous knowledge and experience.
Machine Learning vs Data Science: What’s The Difference? A Short Guide
Via https://becominghuman.ai/machine-learning-vs-data-science-31f942696334?source=rss—-5e5bef33608a—4
source https://365datascience.weebly.com/the-best-data-science-blog-2020/machine-learning-vs-data-science

In my life, I’m very creative, but the cost of creativity, for me, has been focus. I’m curious to a fault, and while that curiosity has been the drive behind taking too many classes in school, it has also hindered my ability to be productive. That’s not to say I haven’t had major finds as a result; I have. I don’t think creativity and focus have to be all or nothing. I’ve been working on maintaining a balance, not only to maintain a job but also to find the best, most impactful ideas.
In high school, I definitely had this problem in class. The material was always interesting to me, but my inability to remain focused got in the way of some good essay writing. This was especially true in my history class, where we had to write one essay a week, and while I loved the material, I was scatterbrained when pen went to paper. Strange to think that now I read a few history books a year.

In college, it turned out I was very capable at math, which meant I didn’t need to study much. The downside is that I didn’t go into as much depth on some material as I would have liked. For my senior design project, I really let loose in designing the computer vision system for our autonomous vehicle. I mostly wrote the code on my own, and I got very creative in using uint16 instead of float to shave valuable seconds off the processing time. However, I didn’t quite have the focus to comment my code appropriately, even though it was highly efficient. The end result is that my code was not used by anyone else, which I regret.
At Notre Dame, I quickly learned my motivation was to graduate on time. This meant I had to focus. I determined when I wanted to finish, and I worked backward through the time requirements of each module of my work. The result was that I had a rough idea of when things had to get done, and this allowed me to stay more or less on track.
At DSC, I had a mixed bag. When I started, we were only using the entire face (as opposed to regions) for face recognition, when the literature at the time suggested any method would be improved by fusing multiple regions together. My manager had other things for me to work on, but I kept getting drawn to this low-hanging fruit. To even do the testing, I revamped the code and sped it up by a factor of 10. I was using the oldest computer at the company, so my code needed to run just a bit faster if I wanted to do the experiments properly. I threw together a few rough regions, and I was able to quickly show how much improvement one got from multiple face regions. After that, my previous tasks went out the window, and while I was happy, I’m sure it was frustrating for my manager.
On the other hand, when I came up with a method for improving 3D depth quality by upsampling, I spent two unnecessary months implementing the method in C++. I had been trying to implement the more complex upsampling methods (based on the griddata function in Matlab) when, in fact, I had found major gains using the least complex one. I was stubborn in part because of my curiosity about how to efficiently implement these higher-order methods. Finally, I turned my work full force toward getting the basic method implemented, and I was able to achieve a certain level of focus.
At Apple, on the Watch, my creativity led to a breakthrough in background heart rate. Two months later, I received feedback about the previous year (before this breakthrough) that I was not focused enough. I couldn’t deny the claim. I had been working on Wrist Detection, and certainly I had times where I went off on a tangent. However, my counterargument had always been the saving grace that my creativity provided. I would still argue that my creativity was worth the lack of focus, and I would go so far as to say I could not do both.
On Face ID, that changed, in part because of the volume of tasks and task switching. I was compelled to find a better way to track progress on different tasks and also not drop any of them. This started with a Notepad document that turned into a Word document, and that morphed into a Numbers sheet. I was able to add my curiosity-driven ideas to the sheet so I would not forget them, but I could also see the other work I had to do and prioritize accordingly. Suddenly, I was able to stay focused and switch focus more quickly just by knowing where I was on a project. This didn’t seem to dampen my ability to be highly creative in trying to solve problems either.
I’m still working on the balance, and I believe this is key to hitting a maximum potential. I have ideas all the time, but that doesn’t mean they are all good. It just means I have so much to throw against the wall, one of them is bound to stick.
If you like, follow me on Twitter and YouTube, where I post videos of espresso shots on different machines and other espresso-related content. You can also find me on LinkedIn.
Abandon Ship: How a startup went under
A Day in the Life of a Data Scientist
Design of Experiment: Data Collection



Balancing Creativity and Focus was originally published in Becoming Human: Artificial Intelligence Magazine on Medium, where people are continuing the conversation by highlighting and responding to this story.
Via https://becominghuman.ai/balancing-creativity-and-focus-578db1332dfb?source=rss—-5e5bef33608a—4
source https://365datascience.weebly.com/the-best-data-science-blog-2020/balancing-creativity-and-focus

Ten AI-driven, women-led startups are set to get a big tech boost next weekend as part of WaiDATATHON, the first datathon ever hosted in VR! The two-day virtual event is designed to connect women entrepreneurs with global data, AI and software engineers who will build prototypes for each startup.
WaiDATATHON for Sustainable Future is orchestrated by two Women in AI members and Machine Learning/Computer Vision engineers from Autonomous Driving R&D at TomTom, Sindi Shkodrani and Vedika Agarwal.

Sindi Shkodrani, who is leading #WaiDATATHON, says that making conscious and sustainable tech requires a mindset of building things fast. “We want to nurture our startups and future tech with these values. If you believe that the future of data and AI is sustainable, join us to build it together.”

Browse through the challenges and register today to help. Then select and apply to one of the challenge ideas. Teams are expected to build a working proof of concept. All teams will present their ideas to a specialized jury and the winners will be announced.
Eve Logunova is the Women in AI Ambassador in the Netherlands who manages the accelerator where the women entrepreneurs have been engaging for the past nine months. “These challenges are generating great interest and we are very happy to enable our entrepreneurs in their product development journey!”

If you love food and sharing stories, join challenge #1 and help us build a traditional food storyteller.
For this challenge we aim to design a conversational AI that shares traditional food stories and recipes in real-time. Imagine your international friends want to surprise you with an original and traditional meal that reflects your heritage. This conversational AI will help users explore cultures through food, and understand the meaningful traditions behind the recipes. We aim to provide a culturally-specific, narrative-rich, globally-appealing, and interactive user experience around food and traditions.
To work on this challenge you’ll have at hand a dataset to help the team build a conversational agent, as well as datasets of foods and ingredients from different cultures. Putting the pieces together into a working system that interacts with the users will define the success of this challenge. More data scraping is encouraged if necessary.
The goal is to provide the users with a storytelling experience and a conversation about national dishes and regional specialties from around the world, with cultural information and stories for each dish. Let’s help keep traditions alive by integrating cultural information into datasets.
“Join us for our WaiDATATHON and help women entrepreneurs solve some of our everyday greatest problems. Let’s do our bit in making the world a better place, one snippet at a time!” — Vedika Agarwal.
You can register for WaiDATATHON here.





Data Scientists to Help Women-led Startups Soar in WaiDATATHON for Sustainable Future was originally published in Becoming Human: Artificial Intelligence Magazine on Medium, where people are continuing the conversation by highlighting and responding to this story.

One of the biggest data challenges on DrivenData, with more than 9,000 participants, is the DengAI challenge. The objective of this challenge is to predict the number of dengue fever cases in two different cities.
This blogpost series covers our journey of tackling this problem, starting from the initial data analysis, imputation and stationarity problems, up to the different forecasting attempts. This first post covers the imputation and stationarity checks for both cities in the challenge, before moving on to trying different forecasting methodologies.
Throughout this post, code snippets are shown in order to give an understanding of how the concepts discussed are implemented in code. The entire Github repository for the imputation and stationarity adjustment can be found here.
Furthermore, in order to ensure readability, we decided to show graphs only for the city of San Juan instead of showing them for both cities.

Imputation describes the process of filling missing values within a dataset. Given the wide range of possibilities for imputation and the substantial amount of missing data within this project, it is worthwhile to go over some of the methods and empirically check which one to use.
Overall, we divide all imputation methods into two categories: basic and advanced. By basic methods we mean quick, off-the-shelf imputation methods, which are oftentimes already built into Pandas. Advanced imputation methods are model-based approaches in which the missing values are predicted using the remaining columns.
Given that the model-based imputation methods normally result in superior performance, the question might arise why we do not simply use the advanced methods for all columns. The reason is that our dataset has several observations where all features are missing. The presence of these observations makes multivariate imputation impossible for them.
We therefore divide the features into two categories. All features, or columns, which have fewer than 1 percent missing observations are imputed using more basic methods, whereas model-based approaches are used for features which exhibit more missing observations than this threshold.
The code snippet below counts the percentage of missing observations, divides all features into one of the two aforementioned categories, and creates a graph to visualize the results.
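A minimal sketch of this step could look as follows; the DataFrame name features_df and the plotting details are assumptions, not the exact code from the repository.

import matplotlib.pyplot as plt

THRESHOLD = 0.01  # the one percent cut-off discussed above

# Assumption: the raw training features live in a DataFrame called features_df
missing_share = features_df.isna().mean().sort_values(ascending=False)

# Split the features into the two imputation categories
advanced_cols = missing_share[missing_share > THRESHOLD].index.tolist()
basic_cols = missing_share[missing_share <= THRESHOLD].index.tolist()

# Visualize the share of missing observations per feature
ax = missing_share.plot(kind="bar", figsize=(12, 4))
ax.axhline(THRESHOLD, color="red", linestyle="--", label="1% threshold")
ax.set_ylabel("share of missing observations")
ax.legend()
plt.tight_layout()
plt.show()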
The resulting graph below shows four features which have more than 1 percent of their observations missing. Especially the feature ndvi_ne, which describes satellite vegetation in the north-east of the city, has a severe amount of missing data, with around 20% of all observations missing.

All imputation methods applied are compared using the normalized root mean squared error (NRMSE). We use this quality measure because it makes variables with different scales comparable. Given that the NRMSE is not directly implemented in Python, we use the following snippet to implement it.
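Since the original snippet is not reproduced here, this is a sketch of one common convention, normalizing the RMSE by the range of the true values (other normalizations, e.g. by the mean, are also possible).

import numpy as np

def nrmse(y_true, y_pred):
    """Root mean squared error, normalized by the range of the true values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return rmse / (y_true.max() - y_true.min())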
Python, and in particular the library Pandas, has multiple off-the-shelf imputation methods available. Arguably the most basic ones are forward fill (ffill) and backward fill (bfill), where we simply set the missing value equal to the preceding value (ffill) or to the following value (bfill).
Other methods include linear or cubic interpolation around a missing observation (the Scipy package also offers higher-order interpolation if wanted).
Lastly, we can use the average of the k nearest neighbours of a missing observation. For this problem we took the four preceding and four following observations of a missing observation and imputed it with the average of these eight values. This is not a built-in method and is therefore defined by us in the following way:
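A sketch of how such a helper could look (the function name knn_mean is a placeholder):

import numpy as np
import pandas as pd

def knn_mean(series, k=4):
    """Impute each missing value with the mean of the k preceding and
    k following observations (up to 2*k neighbours in total)."""
    values = series.to_numpy(dtype=float)
    filled = values.copy()
    for i in np.where(np.isnan(values))[0]:
        window = np.concatenate([values[max(i - k, 0):i], values[i + 1:i + 1 + k]])
        window = window[~np.isnan(window)]
        if window.size > 0:
            filled[i] = window.mean()
    return pd.Series(filled, index=series.index)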
Now it is time to apply and compare these methods. We do that by randomly dropping 50 observations from each column, which are afterwards imputed by all of the aforementioned methods. We then assess each method’s performance through its NRMSE score. All of that, as well as the graphing of the results, is done through the following code snippet.
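The sketch below reuses the nrmse and knn_mean helpers and the basic_cols list from the earlier snippets, and it assumes a simple integer index on features_df; the exact experiment in the repository may differ.

import numpy as np
import pandas as pd

rng = np.random.default_rng(28)

def compare_basic_methods(series, n_drop=50):
    """Hide n_drop known values, impute them with every basic method and
    return the NRMSE of each method on the hidden values."""
    hidden_idx = rng.choice(series.dropna().index, size=n_drop, replace=False)
    corrupted = series.copy()
    corrupted.loc[hidden_idx] = np.nan
    candidates = {
        "ffill": corrupted.ffill(),
        "bfill": corrupted.bfill(),
        "linear": corrupted.interpolate(method="linear"),
        "cubic": corrupted.interpolate(method="cubic"),
        "knn": knn_mean(corrupted, k=4),
    }
    return {name: nrmse(series.loc[hidden_idx], imputed.loc[hidden_idx])
            for name, imputed in candidates.items()}

scores = pd.DataFrame({col: compare_basic_methods(features_df[col])
                       for col in basic_cols}).T
ax = scores.plot(kind="bar", figsize=(14, 4))
ax.set_ylabel("NRMSE")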
The resulting graph below clearly shows which method is to be favored, namely the k nearest neighbours approach. The linear method also performs well, even though not as well as the knn method. The more naive methods like ffill and bfill do not perform as strongly.

We then impute all features which had fewer missing observations than our threshold of one percent, i.e. all features except the first four. The code below selects the best method for each column and then imputes all of the actually missing values.
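Continuing the sketch above (again with placeholder names), this could look like:

# Pick the winning method per column and impute the genuinely missing values
best_method = scores.idxmin(axis=1)

def impute_with(series, method):
    if method == "ffill":
        return series.ffill()
    if method == "bfill":
        return series.bfill()
    if method == "knn":
        return knn_mean(series, k=4)
    return series.interpolate(method=method)

for col in basic_cols:
    features_df[col] = impute_with(features_df[col], best_method[col])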
Unfortunately, the superior performance of the knn method comes at a price. For some features, we do not have only one observation missing at a time, but multiple consecutively missing observations.
If for example we have 12 consecutive missing observations, the knn method cannot calculate any average out of the preceding and proceeding four observations, given that they are missing as well.
The image below, which was created with the beautiful missingno package, shows us that all four columns which were classified as being above our one percent threshold have, at some point, 15 consecutive missing observations. This makes it impossible to use the knn method for these columns and is the reason why we cannot use this imputation method for the heavily sparse columns.
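Such a plot can be produced with a single call, for example:

import missingno as msno

# Long white bands indicate runs of consecutive missing observations
msno.matrix(features_df[advanced_cols], figsize=(12, 4))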

The model-based imputation methods use, as already described earlier, the column with the missing observations as the target and all other available columns as the features. After imputing all columns with fewer than one percent missing observations, we can now use all of them as features.
The model we are using is a RandomForestRegressor because of its good handling of noisy data. The code snippet below shows which hyperparameters were grid-searched.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Model and hyperparameter grid used for the model-based imputation
imputation_model = {
    "model": RandomForestRegressor(random_state=28),
    "param": {
        "n_estimators": [100],
        "max_depth": [int(x) for x in np.linspace(10, 110, num=10)],
        "min_samples_split": [2, 5, 10, 15, 100],
        "min_samples_leaf": [1, 2, 5, 10],
    },
}
We now run all four columns through the model-based approach and compare their performance to all aforementioned basic imputation methods. The following code snippet takes care of exactly that.
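A sketch of this step, reusing the grid above together with the rng, nrmse, basic_cols and advanced_cols placeholders from the earlier snippets:

from sklearn.model_selection import GridSearchCV

def model_based_score(df, target_col, n_drop=50):
    """Hide n_drop known values of target_col, predict them from the already
    imputed columns with a grid-searched random forest and return the NRMSE."""
    known = df[df[target_col].notna()]
    hidden_idx = rng.choice(known.index, size=n_drop, replace=False)
    train = known.drop(index=hidden_idx)
    test = known.loc[hidden_idx]
    feature_cols = [c for c in basic_cols if c != target_col]

    search = GridSearchCV(imputation_model["model"], imputation_model["param"],
                          cv=3, n_jobs=-1)
    search.fit(train[feature_cols], train[target_col])
    return nrmse(test[target_col], search.predict(test[feature_cols]))

model_scores = {col: model_based_score(features_df, col) for col in advanced_cols}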
Below we can see that our work was worthwhile. For three out of four columns we find a superior performance of the model-based approach compared to the basic imputation methods. We are now left with a fully imputed dataset with which we can proceed.

In contrast to cross-sectional data, time series data comes with a whole bunch of different problems. Undoubtedly one of the biggest issues is the problem of stationarity. Stationarity describes a measure of regularity, and it is exactly this regularity that we exploit when building meaningful and powerful forecasting models. The absence of regularity makes it difficult, at best, to construct such a model.
There are two types of stationarity, namely strict and covariance stationarity. In order for a time series to fulfil strict stationarity, the series needs to be time independent. That would imply that the relationship between two observations of a series is driven only by the time gap between them, and not by time itself. This assumption is difficult, if not impossible, for most time series to meet, and therefore more focus is put on covariance stationarity.
For a time series to be covariance stationary, it is required that the unconditional first two moments, i.e. the mean and the variance, are finite and do not change with time. It is important to note that the time series is very much allowed to have a varying conditional mean. Additionally, it is required that the auto-covariance of the time series depends only on the lag, and not on time itself. All these requirements are also stated below:

E[y_t] = μ < ∞ for all t
Var(y_t) = σ² < ∞ for all t
Cov(y_t, y_{t−j}) = γ_j for all t and every lag j
There are many potential reasons for a time series to be non-stationary, including seasonalities, unit roots, deterministic trends and structural breaks. In the following sections we will check and adjust our exogenous variables for each of these criteria to ensure stationarity and better forecasting behavior.
Seasonality is technically a form of non-stationarity because the mean of the time series depends on a time factor. An example would be the spiking sales of a gift shop around Christmas. Here the mean of the time series is explicitly dependent on time.
In order to adjust for seasonality within our exogenous variables, we first have to find out which variables actually exhibit that kind of behavior. This is done by applying a Fourier transform. A Fourier transform disentangles a signal into its different frequencies and assesses the power of each individual frequency. The resulting plot, which shows power as a function of frequency, is called a power spectrum. The frequency with the strongest power is then potentially the driving seasonality in our time series. More information about the Fourier transform and signal processing in general can be found in an earlier blogpost of ours here.
The following code allows us to take a look at the power plots of our 20 exogenous variables.
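A sketch of such a scan using scipy’s periodogram; the grid layout and the 10-times-mean cut-off used to flag a dominant frequency are illustrative assumptions, not the exact criterion from the repository.

import matplotlib.pyplot as plt
from scipy.signal import periodogram

fig, axes = plt.subplots(5, 4, figsize=(16, 12))
for ax, col in zip(axes.ravel(), features_df.columns):
    # Power spectrum of the (imputed) weekly series; the mean is removed by default
    freqs, power = periodogram(features_df[col].to_numpy())
    ax.plot(freqs, power)
    peak = power[1:].argmax() + 1          # skip the zero frequency
    if power[peak] > 10 * power.mean():    # illustrative "dominant frequency" check
        ax.plot(freqs[peak], power[peak], "ro")
        ax.set_title(f"{col}: period ≈ {1 / freqs[peak]:.0f} weeks")
    else:
        ax.set_title(col)
plt.tight_layout()
plt.show()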
The plot below shows the power spectra of the 20 exogenous variables. Whether a predominant and significant frequency was found for a variable is indicated by a red dot on top of a spike. If a red dot is visible, the time series has a significantly dominant frequency and therefore a strong seasonality component.

One possibility to cross-check the results of the Fourier transforms is to plot the autocorrelation function (ACF). If a series has a seasonality of order X, we would expect a significant autocorrelation at lag X. The following snippet of code plots the autocorrelation function for all features and highlights those features which were found to have a seasonal effect according to the Fourier transform.
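A minimal version using statsmodels (without the highlighting of the seasonal features) could look like this:

import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf

fig, axes = plt.subplots(5, 4, figsize=(16, 12))
for ax, col in zip(axes.ravel(), features_df.columns):
    # 60 lags comfortably cover a yearly cycle in weekly data
    plot_acf(features_df[col].dropna(), lags=60, ax=ax, title=col)
plt.tight_layout()
plt.show()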
From the ACF plots below, we can extract a lot of useful information. First of all, we can clearly see that for all columns where the Fourier transform finds a significant seasonality, we also find a confirming picture: there is a peaking and significant autocorrelation at the lag which was found in the power plot.
Additionally, we find some variables (e.g. ndvi_nw) which exhibit a constantly significant positive autocorrelation. This is a sign of non-stationarity, which will be addressed in the next section, dealing with stochastic and deterministic trends.

In order to get rid of the seasonal component, we decompose each seasonality-affected feature into its trend, seasonal and remainder components. This is done with the STL decomposition, which was developed by Cleveland, Cleveland, McRae & Terpenning (1990). STL is an acronym for “Seasonal and Trend decomposition using Loess”, where Loess is a method for estimating non-linear relationships.
The following code snippet decomposes the relevant time series, and subtracts (given that we face additive seasonalities) the seasonality and the trend from the time series.
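A sketch with statsmodels’ STL implementation; the yearly period of 52 weeks and the example column name are assumptions:

from statsmodels.tsa.seasonal import STL

def remove_seasonality_and_trend(series, period=52):
    """STL-decompose the series and keep only the remainder, i.e. subtract
    the (additive) seasonal and trend components."""
    result = STL(series, period=period, robust=True).fit()
    return series - result.seasonal - result.trend

# Example for one seasonality-affected feature (hypothetical column name):
# features_df["reanalysis_air_temp_k"] = remove_seasonality_and_trend(
#     features_df["reanalysis_air_temp_k"])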
One more obvious way to breach the assumptions of covariance stationarity is if the series has a deterministic trend. It is important to stress the difference between a deterministic and a stochastic trend (unit root). Whereas it is possible to model and remove a deterministic trend, this is not possible with a stochastic trend, given its unpredictable and random behavior.
A deterministic trend is the simplest form of a non-stationary process, and a time series which exhibits such a trend can be decomposed into three components:

y_t = μ + β·t + ε_t

that is, a constant level μ, a deterministic (here linear) trend β·t, and a stationary error component ε_t.
The most common type of trend is a linear trend. It is relatively straightforward to test for such a trend and to remove it if one is found. We apply the original Mann-Kendall test, which does not consider seasonal effects; those were already removed in the part above. If a trend is found, it is simply subtracted from the time series. These steps are completed in the method shown below.
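One way to sketch this step is to use Kendall’s tau of the values against time (which is the core of the original Mann-Kendall test) and to subtract a fitted linear trend when the test is significant; the dedicated pymannkendall package would be an alternative.

import numpy as np
from scipy import stats

def remove_linear_trend(series, alpha=0.05):
    """Test for a monotonic trend and, if it is significant,
    subtract a least-squares linear trend from the series."""
    t = np.arange(len(series))
    tau, p_value = stats.kendalltau(t, series.to_numpy())
    if p_value < alpha:
        slope, intercept = np.polyfit(t, series.to_numpy(), deg=1)
        return series - (intercept + slope * t), True
    return series, False

# for col in features_df.columns:
#     features_df[col], had_trend = remove_linear_trend(features_df[col])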
The result can be viewed here. As we can see, most time series exhibited a linear trend, which was then removed.
Even though we removed a deterministic trend, this does not ensure that our time series are actually stationary now. That is because what works for a deterministic trend does not work for a stochastic trend, meaning that the trend removal we just did does not ensure stationarity in the presence of unit roots.
We therefore have to explicitly test for a unit-root in every time series.
A unit root process is the generalization of the classic random walk, which is defined as a succession of random steps. Given this definition, the problem of estimating such a time series is obvious. Furthermore, a unit root process violates the covariance stationarity assumption of not being dependent on time.
To see why that is the case, we assume an autoregressive model where today’s value depends only on yesterday’s value and an error term:

y_t = a_1 · y_{t−1} + ε_t
If the parameter a_1 were now equal to one, the process would simplify to

y_t = y_{t−1} + ε_t
By repeated substitution we could also write this expression as:

y_t = y_0 + Σ_{i=1..t} ε_i
When we now calculate the variance of y_t, we obtain a variance which is positively and linearly dependent on time, which violates the second covariance stationarity requirement:

Var(y_t) = t · σ²_ε
This would not have been the case if a_1 were smaller than one, and that is basically what is tested in a unit-root test. Arguably the most well-known test for a unit root is the Augmented Dickey-Fuller (ADF) test. This test has the null hypothesis that a unit root is present in the autoregressive model. The alternative is usually that the series is stationary or trend-stationary. Given that we already removed a (linear) trend, we take the alternative to be a stationary series.
To be technically correct, it should be said that the ADF test does not directly test whether a_1 is equal to one, but rather looks at a reparameterized (differenced) equation. The equation below illustrates what is meant by that:

Δy_t = δ · y_{t−1} + ε_t, with δ = a_1 − 1
We can see that the difference to the equation before is that we do not look at the level of y_t, but rather at the difference of y_t; capital delta represents the difference operator here. The ADF test now checks whether the small delta coefficient is equal to zero. If that were not the case, then the difference between yesterday’s and today’s value would depend on yesterday’s value. That would mean that if today’s value is high, the difference between today’s and tomorrow’s value would also be large, a self-reinforcing and explosive process which clearly depends on time and therefore breaks the assumptions of covariance stationarity.
If the ADF test cannot reject the null hypothesis of a unit root (meaning a p-value above 5%), we difference the time series as often as necessary until we find a stationary series. All of that is done through the following two methods.
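A compact sketch of the test-and-difference loop using statsmodels (the function name and the cap on the number of differences are assumptions):

from statsmodels.tsa.stattools import adfuller

def difference_until_stationary(series, alpha=0.05, max_diff=3):
    """Run the ADF test; while the unit-root null cannot be rejected
    (p-value above alpha), difference the series again."""
    n_diff = 0
    current = series.dropna()
    p_value = adfuller(current)[1]
    while p_value > alpha and n_diff < max_diff:
        current = current.diff().dropna()
        n_diff += 1
        p_value = adfuller(current)[1]
    return current, n_diff, p_value

# stationary, n_diff, p = difference_until_stationary(features_df["ndvi_nw"])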
The following table shows that every ADF test rejects the null hypothesis of a unit root, meaning that no differencing was needed and that no series exhibits a unit root.
Last but not least, we take a look at our processed time series. It is nicely visible that none of the time series are trending anymore, nor do they exhibit significant seasonality.

Additionally, we take a look at the distributions of all of the series. It is important to note that there are no distributional assumptions on the feature variables when it comes to forecasting. That means that even if we find highly skewed variables, it is not necessary to apply any transformation.

After sufficiently transforming all exogenous variables, it is now time to shift our attention to the forecasting procedure for both cities.



DengAI: Predicting Disease Spread — Imputation and Stationary Problems was originally published in Becoming Human: Artificial Intelligence Magazine on Medium, where people are continuing the conversation by highlighting and responding to this story.