7 Signs you are data literate

Understanding data is key to being a Data Scientist. But, how can you know if you might be a good fit for the field when you haven’t worked with much data? These telltale signs will suggest you are competent to work with data, and that you might have a talent for being data literate.

Originally from KDnuggets https://ift.tt/3iZDYLl

source https://365datascience.weebly.com/the-best-data-science-blog-2020/7-signs-you-are-data-literate

My Week in AI: Part 3

Photo by Clark Street Mercantile on Unsplash

Welcome to My Week in AI! Each week this blog will have the following parts:

  • What I have done this week in AI
  • An overview of an exciting and emerging piece of AI research

Progress Report

Building skills in big data and discovering dashboarding

This week I learned about big data engineering by starting the ‘Advance Your Skills as an Apache Spark Specialist’ learning pathway on LinkedIn Learning. I am somewhat familiar with Spark and Hadoop, but I wanted to learn more, and this seemed like a good way to gain those skills.

I also read a lot this week about data visualization in Python. I came across a powerful dashboarding library called Altair, which allows you to build very aesthetic dashboards and visualizations. It only requires Python knowledge and is based on visual grammar, so it is fairly intuitive. This is definitely a library I am going to add to my toolbox and explore further.

Applying for Etsy Fellowship

I spent another large part of my week brainstorming, researching, and writing a proposal for Etsy’s Summer of Vision Fellowship program. The assignment was to develop a machine learning project based on the question, “How might we use visual cues to improve buyers’ shopping experience on Etsy?” This prompted me to read a lot about recommendation systems, computer vision in e-commerce, and the use of computer vision and NLP in concert for such applications.

Trending AI Articles:

1. Natural Language Generation:
The Commercial State of the Art in 2020

2. This Entire Article Was Written by Open AI’s GPT2

3. Learning To Classify Images Without Labels

4. Becoming a Data Scientist, Data Analyst, Financial Analyst and Research Analyst

Emerging Research

Visiolinguistic Attention Learning

As part of my research for the Etsy Fellowship application, I came across a paper by Chen, Gong and Bazzani called ‘Image Search with Text Feedback by Visiolinguistic Attention Learning’ that will be presented at the upcoming CVPR 2020 conference¹. The paper discusses a new framework that these researchers developed: given a reference image and text feedback such as ‘same but in red’ or ‘without buckle,’ images may be retrieved that resemble the reference image but with the desired modification as described by the text. A potential application for this would be as a search feature on an e-commerce site.

Example usage of VAL¹

This task involves learning an amalgamated representation that captures visual and textual information from the inputs. In order to do this, the authors presented the Visiolinguistic Attention Learning (VAL) framework, which is made up of three parts: an image encoder, a text encoder, and multiple composite transformers that modify visual feature maps based on language information. The image encoder was made up of a typical CNN with feature maps being extracted from several different layers, and the text encoder was made up of an LSTM followed by max pooling and linear projection layers.

To me, the most fascinating part of this research is in the visiolinguistic representation using the composite transformers. The visual and language features are fused and then passed through a two-stream module that learns attentional transformation and preservation. First, the self-attention stream learns non-local correlations in the fused features and generates an attention mask that highlights spatial long-range interdependencies. In parallel, a joint-attention stream works to retain the visual features of the reference image. The outputs of these two streams are combined to create a set of composite features.

Overview of VAL framework¹

In terms of training, hierarchical matching is utilized. The primary objective function relates to visual-visual matching so as to ensure that the composite features are very similar to the target features. The secondary objective function relates to visual-semantic matching, which is useful when images have accompanying text such as descriptions or tags.

Big Data Jobs

In my opinion, the exciting applications of this research are for online shopping. If you see a pair of shoes that you like but would prefer them in a different color, you’d just have to type “I like these, but in blue” and the website will attempt to find for you a blue version of the original pair of shoes. This combination of semantic and visual features is not something I have come across much (probably because it is a very difficult task!).

Join me next week for more updates on my progress and a look at some cutting-edge research in the field of time series forecasting. Thanks for reading and I appreciate any comments/feedback/questions.

An update on my blog post from last week which you can find here; IBM announced on June 8th that they will cease all work on facial recognition due to fears that their products could be used for racial-profiling and perpetuating biases.

Update June 11th: Amazon will not allow usage of it’s facial recognition software by police for at least the next year.

References

[1] Y. Chen, S. Gong, and L. Bazzani. “Image Search with Text Feedback by Visiolinguistic Attention Learning,” IEEE/CVF Conference on Computer Vision and Pattern Recognition 2020.

Don’t forget to give us your ? !


My Week in AI: Part 3 was originally published in Becoming Human: Artificial Intelligence Magazine on Medium, where people are continuing the conversation by highlighting and responding to this story.

Via https://becominghuman.ai/my-week-in-ai-part-3-e37866479d56?source=rss—-5e5bef33608a—4

source https://365datascience.weebly.com/the-best-data-science-blog-2020/my-week-in-ai-part-3

Will AI rescue the world from the impending doom of cyber-attacks or be the cause

source https://365datascience.weebly.com/the-best-data-science-blog-2020/will-ai-rescue-the-world-from-the-impending-doom-of-cyber-attacks-or-be-the-cause

Top June Stories: How Much Math do you need in Data Science? Easy Speech-to-Text with Python

Also: Deep Neural Networks and the Jennifer Aniston Neuron; Don’t Democratize Data Science.

Originally from KDnuggets https://ift.tt/2ZhU37G

source https://365datascience.weebly.com/the-best-data-science-blog-2020/top-june-stories-how-much-math-do-you-need-in-data-science-easy-speech-to-text-with-python

Deep Learning in Finance: Is This The Future of the Financial Industry?

Get a handle on how deep learning is affecting the finance industry, and identify resources to further this understanding and increase your knowledge of the various aspects.

Originally from KDnuggets https://ift.tt/3fh56Ub

source https://365datascience.weebly.com/the-best-data-science-blog-2020/deep-learning-in-finance-is-this-the-future-of-the-financial-industry

A Technical Introduction to Gesture Recognition for Data Scientists

This is part of a project at NordAxon, the company where I work at as a Data Scientist. The purpose of this article is to share my insights and knowledge, meaning to instill collaboration within the Data Science community and also to help you kickstart your own project in gesture recognition.

Background

Gesture recognition is an important part of Human Computer Interaction (HCI). It can be used to improve user interfaces, e.g. virtual reality in the gaming industry or remote control of robot arms using gestures. It can even make life easier for people with disabilities as an assistive technology such as sign language translation.

Example problem formulation: Classify the type of walk a person is doing

As of today, with Deep Learning and only 2D RGB images as inputs, we can build fairly robust models for static gesture recognition. Though, the more promising application for continuous gestures recognition still faces many challenges and is an open problem.

Prerequisite

This article is intended for Data Scientists who have at least basic understanding of machine learning and deep learning in computer vision.

Challenges to overcome

Most problems in gesture recognition are associated with accuracy and performance. For instance, different people might perform the same gesture differently in both speed and movement range. This greatly increases the complexity of the problem if we have many types of gestures. In this article, we will restrict our discussion only to hand gestures as they are the most important features for many gesture recognition tasks.

  • Occlusions (i.e. hands not in full sight) occur frequently and might also be a challenge to model. A robust model must be able to take into account the global- and local features in order to handle occlusions. Imagine a person that is holding a banana, then the part of the hand behind the banana is not visible to the camera/model.
Illustration of keypoint detection of hands.
  • Co-articulation occurs when a gesture is performed differently depending on what gesture comes before and after. Co-articulation usually happens between two gestures as an overlap. This occurs frequently in fluent sign language.
  • A limiting aspect for creating an end-to-end software is the trade-off between real-time performance and accuracy. This may be a requirement for creating e.g. reliable gesture user interfaces for controlling robot arms.
  • Another problem with continuous gesture recognition is that it is expensive to create datasets with frame-level annotations, which may be necessary for sign language recognition or other fast-paced gesticulations. Imagine any sign language with over 1000 words (understatement) and the amount of time that must be spent annotating this dataset.

Trending AI Articles:

1. Natural Language Generation:
The Commercial State of the Art in 2020

2. This Entire Article Was Written by Open AI’s GPT2

3. Learning To Classify Images Without Labels

4. Becoming a Data Scientist, Data Analyst, Financial Analyst and Research Analyst

Creating your own dataset

Let’s address the problem of acquiring a frame-level dataset for continuous hand gesture recognition. This problem was addressed by a paper from 2016 where the authors train a CNN on 1 000 000 images using weakly supervised learning. Their algorithm takes advantage of the Expectation-Maximization (EM)-algorithm and “inaccurate” labels in order to train a fairly robust CNN model on different hand shapes. In other words, through clustering the model learns different hand shapes. This is a time-saving approach if you want to create your own frame-level dataset for handshapes.

Illustration of weak annotations/labels of the frames in a video. Each frame is an image.

Deep Learning architectures

The gestures are characterized by spatial movements of the hand through the time, therefore our model must be able to capture spatio-temporal features. In other words, the model will handle 2D images (spatial features) through time (temporal features). Here I list some approaches that can be explored and tested:

  • 3D Convolutional Neural Networks (3D CNN), the author of this paper argues that this type of model is more suitable for modelling spatio-temporal features compared to 2D CNNs. These models perform well when doing action recognition on videos and their usage may be extended to continuous gesture recognition. For a demonstration of these type of networks, skip to 18:00 in the following presentation
  • Object localization models which predicts bounding boxes such as YOLOv3, Single Shot Detectors (SSD) or Faster R-CNN can be used for static hand gestures recognition (small demonstration of YOLOv3 in the figure below). You can choose between fast real-time inference models e.g. YOLOv3 and SSD. Else if you value accuracy over speed you can use R-CNNs.
A demonstration of YOLOv3 trained on static hand gestures.
  • Keypoint detection models can estimate joint positions which in turn model human poses. These systems are usually built on top of an object localization model and their output could be used as features for lightweight classification models. For instance, with the help of Hidden Markov Models (HMM) or Reccurent Neural Networks (RNN). Alternatively, there exists complete end-to-end systems such as MediaPipe, OpenPose and wrnchAI. Check them out! You might be able to integrate these solutions into your gesture recognition system if you don’t want to develop one yourself.

Summary

  • In this article, I have introduced you to some of the challenges in continuous hand gesture recognition. For instance, occlusions and co-articulations.
  • You can use weakly supervised learning to train your model and therefore save time.
  • An end-to-end system could be a combination of different systems such as OpenPose + RNN (example) for continuous gesture recognition.

Next steps

The next steps are to decide the following

  • What will your dataset look like (which hand gestures) and how will you gather it?
  • Compare all the different approaches/models.

Don’t forget to give us your ? !


A Technical Introduction to Gesture Recognition for Data Scientists was originally published in Becoming Human: Artificial Intelligence Magazine on Medium, where people are continuing the conversation by highlighting and responding to this story.

Via https://becominghuman.ai/a-technical-introduction-to-gesture-recognition-for-data-scientists-9ea4f1b76ed4?source=rss—-5e5bef33608a—4

source https://365datascience.weebly.com/the-best-data-science-blog-2020/a-technical-introduction-to-gesture-recognition-for-data-scientists

My Week in AI: Part 2

Photo by Christopher Gower on Unsplash

Featuring more data engineering with SQL, Wheat Detection Kaggle Competition and research exploring demographic biases in facial recognition technology.

Welcome to My Week in AI! Each week this blog will have the following parts:

  • What I have done this week in AI
  • An overview of an exciting and emerging piece of AI research

Progress Report

Exploring New Types of Databases

This week’s focus has been more data engineering. I have finished the ‘Master SQL for Data Science’ learning pathway on LinkedIn Learning, and I found it very useful and interesting. I am planning to learn more about document databases (specifically MongoDB) and graph databases (specifically Neo4j) and am currently setting up my own projects on both of these platforms to dive deeper into their intricacies. Up next for my big data and data engineering learning is Hadoop and Kafka and more advanced Spark.

Experimenting with Object Detection

I’ve also been working on a project: the Kaggle Wheat Detection competition, essentially an object detection computer vision task. The aim is to identify individual wheat plant heads in images of agricultural fields and there are two main challenges associated with this: plants may be very dense and so identifying individual heads is difficult, and images may be blurred due to wind. My first step for this project was loading and cleaning data, and now I am trying some fast and rough transfer learning approaches to understand the baseline accuracy I can achieve. Further steps for improving accuracy might be augmenting the training data, training my own classifier, and using transfer learning to generate image embeddings as an input to my own classifier.

Trending AI Articles:

1. Natural Language Generation:
The Commercial State of the Art in 2020

2. This Entire Article Was Written by Open AI’s GPT2

3. Learning To Classify Images Without Labels

4. Becoming a Data Scientist, Data Analyst, Financial Analyst and Research Analyst

Emerging Research

Biases in Facial Recognition Algorithms

This week’s research paper is in a field I touched on in my post from last week, ethics and fairness in AI. Some concern exists amongst local and federal government agencies about inconsistencies between different demographic groups when identifying faces using facial recognition technology. This has led to several US cities and other governing bodies banning the use of such technology in the public sphere.

The research I’m highlighting this week is by Garcia et al. and titled ‘The Harms of Demographic Bias in Deep Face Recognition Research’.¹ The authors showed that the accuracy of three state-of-the-art facial recognition algorithms dropped significantly for non-Caucasians compared to Caucasians, and for women compared to men. The algorithms they tested were VGGFace, DLib and FaceNet. The authors presented the results as the mean euclidean distances between the face embeddings of different subjects within each of the demographic categories tested. For example, they found that the mean euclidean distance between the embeddings for Caucasian Male samples was ~1.343, but only ~1.02 for Asian Females, when using the FaceNet algorithm. This means that Asian Females had more similar embeddings on average, and so were more likely to be misidentified. This difference of ~0.3 between the two demographic groups is significant relative to typical euclidean distance values for such embeddings.

After evaluating the facial recognition algorithms on data of different demographic groups, the researchers demonstrated the ease of performing a morphing attack on these algorithms in an automated border control (ABC) scenario. ABCs use facial recognition software to compare a person to the image on their travel document, and open immigration gates if the two are deemed a match. A morphing attack involves an accomplice and an imposter who work together to create a fake document with a combination of their two faces. Garcia et al. developed morphed images from a set of ‘accomplice’ images and tested these on the FaceNet algorithm. They found that attacks based on Asian faces worked 4.32% the time compared to 0.27% of the time for Caucasian faces. This is a practical illustration of the consequences that demographic bias in facial recognition technology can have.

Example of a morphing attack on an automated border control¹

Stay tuned next week for a look into big data technologies and some more emerging research. Thanks for reading and I appreciate any comments/feedback/questions.

References

[1] R. V. Garcia, L. Wandzik, L. Grabner and J. Krueger, “The Harms of Demographic Bias in Deep Face Recognition Research,” 2019 International Conference on Biometrics (ICB), Crete, Greece, 2019, pp. 1–6, doi: 10.1109/ICB45273.2019.8987334.

Don’t forget to give us your ? !


My Week in AI: Part 2 was originally published in Becoming Human: Artificial Intelligence Magazine on Medium, where people are continuing the conversation by highlighting and responding to this story.

Via https://becominghuman.ai/my-week-in-ai-part-2-4bc3d61958e5?source=rss—-5e5bef33608a—4

source https://365datascience.weebly.com/the-best-data-science-blog-2020/my-week-in-ai-part-2

5 BENEFITS OF ARTIFICIAL INTELLIGENCE DURING THE LOCKDOWN AND GLOBAL PANDEMIC

Artificial Intelligence, also known as (AI) is a branch of computer science that focuses more on the development of intelligent machines…

Via https://becominghuman.ai/5-benefits-of-artificial-intelligence-during-the-lockdown-and-global-pandemic-b7b996202162?source=rss—-5e5bef33608a—4

source https://365datascience.weebly.com/the-best-data-science-blog-2020/5-benefits-of-artificial-intelligence-during-the-lockdown-and-global-pandemic

What every Data Scientist needs to learn from Business Leaders

You’ve learned so much to become a Data Scientist. Now, it’s time to kick it up to the next level with advanced soft skills — because these are important to the business for which you empower to make better decisions. Learning from the business leaders you support will help you develop a broader set of enhanced skills that will boost your Data Science quality and output.

Originally from KDnuggets https://ift.tt/2BWHQME

source https://365datascience.weebly.com/the-best-data-science-blog-2020/what-every-data-scientist-needs-to-learn-from-business-leaders

Histogram in R: How to Make a GGPlot2 Histogram?

ggplot2 histogram tutorial

Creating and understanding a histogram is an integral part of any data analysis process. In fact, if your work or education is in any way related to a quantitative discipline, you’ll most likely be required to make a histogram of your own or examine results featuring one. Not to mention that in today’s data-driven world, a strong data visualization skillset is one of the gateways to a successful career in data science.

That’s why in this tutorial, we’ll show you how to create a histogram in R.

More specifically, you will learn how to make a GGplot2 histogram. You’re about to find out how to use one of the most popular visualization libraries in R. And, what’s more, you will be able to add the ggplot histogram to your own analysis.

So, let’s get started, shall we?

What Is a Histogram?

A histogram is one of the most useful tools to understand numerical data.

What Is a Histogram Used for?

The first thing you need to remember is that a histogram requires precisely one numerical feature.

A Histogram shows the distribution of a numeric variable. The variable’s range of values is split into intervals, represented by different bins. The height of the bins shows the number of observations within an interval.

What Is the Difference Between a Bar Graph and a Histogram?

At this point, it’s worth mentioning another key aspect of a histogram.

You may have noticed that it looks similar to a bar chart. However, histograms bins show neighbouring intervals. Hence, there is no space between the bins of the histogram, unlike between bars in a bar chart.

Example of a bar graph

Example of a histogram

Now that you know what is a histogram and what is its purpose, let’s start work on our actual ggplot2 histogram.

How to Create a Histogram in GGplot2 in R?

When it comes to data analysis and statistics, R is one of the most popular choices among data scientists.

And when it comes to visualizing data in R, there is one clear stand out choice – ggplot2. ggplot2 is one of the most popular data visualization libraries in the R language. So popular in fact, that there is now a ggplot2 library in Python, based on the R version. So, it supports more than one single programming language.

But no matter which environment you’re programming in, to obtain a histogram, first, you need some data.

How to Load the Data Set for the GGplot2 Histogram?

For our histogram, we’ll be using data on the California real estate market.

In a new variable called ‘real estate’, we load the file with the ‘read CSV’ function. We also specify ‘header’ as true to include the column names and have a ‘comma’ as a separator.

GGPlot2 histogram example: loading the data set

Here, if your data file isn’t in your main r folder, you must also include the path’s location to your file, as well.

After loading the data we’re able to explore it in more detail with the aid of the environment pane. By clicking on the real estate variable, we observe that our real estate data frame contains a little over 250 observations and a total of 9 features.

GGPlot2 histogram example: the dataset

However, we rely on a single feature for our histogram, namely ‘Price’. As we’ve discussed, a histogram requires precisely one measure.

What are the GGplot2 Histogram Mandatory Layers?

With that in mind, let’s proceed with creating our Histogram with the help of the ‘GG plot’.

1. Data Layer

We start with the data layer, which is our ‘real estate’ data frame.

2. Aesthetics Layer

We move on to the aesthetics and as discussed, we’re creating a histogram of ‘Price’. Hence, we need only specify the ‘Price’ column here.

3. Geometry Layer

Lastly, the third layer is geometry. To create our histogram, we must use ‘geom histogram’.

GGPlot2 histogram example: geometry layer

After executing the code, we obtain our gg histogram.

Example of a gg histogram

How to Choose the Number of Histogram Bins in a GGplot2 Histogram?

Now, we can examine our newly obtained histogram. It shows 30 different bins, which is the default number in a ‘GG histogram’. However,  based, on our data, a smaller number would be more appropriate.

Choosing an appropriate number of bins is the most crucial aspect of creating a histogram. Through varying bin sizes, a histogram can reveal vastly different insights. This is a broad topic and examining it in more detail would require a tutorial on its own!

But here, we stay on the practical side of things and see how to alter a histograms bin size in a ‘GG plot’.

We can achieve this through the bins parameter. In the geometry layer, we add another parameter, which is bins. For this histogram we make it equal to 8.

Example of GGPlo2 histogram: the bins parameter

Also, in this layer, we’re able to control additional aspects of our histogram. For instance, we can specify the ‘bin width’, ‘boundaries’, even ‘geometries’ of our histogram. Feel free to explore these options when you’re creating your own histogram.

How to Change the Color in a GGplot2 Histogram?

We‘re moving on to some styling options (but we encourage you to explore additional options for a ‘GG histogram’ on your own, as well).

One of the most crucial aspects of every visualization is the colors we choose to display it. And while remaining with the default is always an option, taking that extra step and choosing a custom color is what sets your visualization apart.

For our histogram, it will be a blue color – close to our hearts. It’s the 365 Data science blue, which has the code ‘#108A99’. Altering the color is achieved with the ‘fill’ parameter.

Now, in a GG histogram, unlike a bar chart, there is no space between two neighboring bins. All the bins seem as if they’ve been glued together which, sadly, makes the bins less distinguishable. But we can avoid that by adding a white border for each bin. That way we’re creating separation among the blue bins. We can control the border color through the ‘color’ argument, so we set it to white.

Example of GGPlot2 Histogram: the color argument

GGPlot2 histogram in progress

This is already an excellent result! However, there are a few additional elements, aside from color, which could really set your chart apart.

How to Style the GGplot2 Histogram?

You can style a chart by customizing its theme. The default in a ggplot has a grey background. But this isn’t fitting, especially with our brand new color. So instead, we’ll rely on a ‘classic theme’. A classic theme has a clean design and a white background.

And of course, we cannot leave our histogram without a title. We include a title with the help of a ‘GG title’. It reads as ‘Distribution of Real Estate Prices’. Here you could do with increasing the title size. This can be achieved by adding a theme layer with a ‘plot title’ element. We need a ‘text element’ and in the brackets let’s choose a ‘size of 16’ and the ‘face’ to be bold.

While we’re at it, some axis labels wouldn’t go amiss. With ‘xlab’ we set the x-axis label to ‘Price in thousands of dollars’. For ‘ylab’ we have ‘Number of Properties’.

Example of GGPlot2 Histogram: styling the histogram by customizing the theme, adding a title, and axis labels

 

GGPlot2 Histogram

And that’s all folks! With just a few, carefully curated steps, we’ve achieved a professional and well-styled histogram. We relied on ggplot2’s capabilities in R and then used our knowledge and aesthetics to further transform the histogram. This way we ensured that our chart is the best it can be.

GGplot2 Histogram: Next Steps

The topic of how to create a histogram, and how to create one the right way is a broad one.  And this tutorial’s goal was to provide you with all the necessary steps to create a ggplot histogram in R. However, you shouldn’t limit yourself to one environment only. So, if you’d like to develop your data visualization skillset in technologies like Python, R, Tableau, and Excel, check out our Complete Data Visualization Course.

Try Data Visualization with Python, R, Tableau, and Excel Course for free!

Next tutorial: How To Make a GGPlot2 Scatter Plot in R?

 

The post Histogram in R: How to Make a GGPlot2 Histogram? appeared first on 365 Data Science.

from 365 Data Science https://ift.tt/2OaOMrQ

Design a site like this with WordPress.com
Get started