Building and Testing an AI Platform

by Davar Ardalan and Servesh Tiwari

This Saturday, I was honored to present on the promise of Cultural AI at #TribalQonf, hosted by The Test Tribe. The bottom line is that cultural relevancy needs to be a two-way street for AI products and solutions to speak to the global public. Testers can play a major role in designing this future to make cultural intelligence easily available to both human and machine audiences.

IVOW stands for Intelligent Voices of Wisdom. We are an early-stage startup that uses machine learning to identify and segment consumer audiences using public data around holidays, festivals, food, music, arts, and sports. IVOW also sources new data through crowdsourced competitions.

The problem we’re addressing is that in 2020, AI systems, such as conversational interfaces and chatbots, struggle to be responsive to the values, goals, and principles of diverse communities. Too many AI systems reflect the biases and perspectives of their developers.

That’s because AI algorithms learn patterns from training datasets that are currently limited in their coverage of global cultural contexts. This lack of cultural diversity in datasets will limit the effectiveness of governments and businesses in providing solutions and expanding into new markets.

Our chatbot, Sina, and our smart tool, CultureGraph (currently in Phase 1 Alpha), will help enterprises better understand consumer audiences; customize consumer messaging for diverse audiences; and unlock first-party data, new revenue streams, and areas of growth.

Together with forward-looking enterprises, we aim to bring cultural and historic awareness into the world of artificial intelligence systems, data science, and machine learning. Imagine future chatbots, smart speakers, and other conversational interfaces that are able to truly understand your culture and traditions, and can tell you interesting stories about your community and heritage.

“Someday my great-great-granddaughter will ask ‘Google, why do Indians wear a red dot on their foreheads?’ I want the answer to be truly reflective of her ancestry and include the emotions that I would feel in answering that question, rather than the one-size-fits-all answer that ‘it’s common practice to do so.’” says Aprajita Mathur, Manager Bioinformatics Software Test at Guardant Health and Senior Advisor at IVOW AI.

Think about machine learning models helping to better evaluate customer needs, and improve experiences based on the culture of your community and people.

Testing an AI platform like IVOW is a complex task, and it follows many of the same steps used in functional testing. We have summarized that approach here:

Data Source and Conditioning Testing

a. Verify the quality of data from various systems: data correctness, completeness, and appropriateness, along with format checks, data lineage checks, and pattern analysis (a minimal sketch of such checks follows this list).

b. Verify transformation rules and logic applied on the raw data to get the desired output format (tables, flat files or big data).

c. Verify that the output queries or programs provide the intended data output.

d. Test for positive and negative scenarios.
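
As a rough illustration of the checks in (a) through (d), here is a minimal pandas sketch of data-quality validation. The file name and column names (event_date, country_code, festival_category) are hypothetical stand-ins for a cultural-events dataset, not IVOW’s actual schema.

```python
import pandas as pd

# Hypothetical cultural-events dataset; file and column names are illustrative only.
df = pd.read_csv("cultural_events.csv")

# Completeness: flag columns with missing values.
missing = df.isna().mean().sort_values(ascending=False)
print("Share of missing values per column:\n", missing[missing > 0])

# Correctness / format checks: dates must parse, country codes must match a known pattern.
dates_ok = pd.to_datetime(df["event_date"], errors="coerce").notna()
country_ok = df["country_code"].str.fullmatch(r"[A-Z]{2}", na=False)
print(f"{(~dates_ok).sum()} rows with unparseable dates")
print(f"{(~country_ok).sum()} rows with invalid country codes")

# Pattern analysis: unexpectedly rare categories often indicate ingestion problems.
category_counts = df["festival_category"].value_counts()
print("Suspiciously rare categories:\n", category_counts[category_counts < 5])
```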

Algorithm Testing (Development Team)

a. Split the input data into sets used for learning (training) and for evaluating the algorithms.

b. If the algorithm uses ambiguous datasets (i.e., the output for a single input is not known), the application should be tested by feeding in a set of inputs and checking whether the outputs are related. Such relationships must be soundly established to ensure that algorithms do not have defects.

c. Check the cumulative accuracy of hits (true positives and true negatives) over misses (false positives and false negatives), as illustrated in the sketch below.
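
For step (c), a minimal sketch of that cumulative-accuracy check, using scikit-learn’s confusion matrix on made-up binary labels:

```python
from sklearn.metrics import confusion_matrix, accuracy_score

# Hypothetical ground-truth labels and model predictions (1 = relevant, 0 = not relevant).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
hits, misses = tp + tn, fp + fn

# Cumulative accuracy: hits over all predictions.
print(f"TP={tp} TN={tn} FP={fp} FN={fn}")
print(f"accuracy = (TP + TN) / total = {hits / (hits + misses):.2f}")
print(f"sklearn accuracy_score agrees: {accuracy_score(y_true, y_pred):.2f}")
```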

API Integration Testing

a. Verify input request and response from each API.

b. Test the communication between the components (input response returned and response format and correctness).

c. Verify the output when APIs are integrated (chained) with one another, as in the sketch below.
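
A minimal sketch of such an API integration check, written as a pytest-style test using the requests library. The base URL, endpoints, and response fields are hypothetical; IVOW’s real API is not public, so this only illustrates the request/response and chaining checks described in (a) through (c).

```python
import requests

BASE = "https://api.example.com"  # hypothetical base URL, not IVOW's real endpoint

def test_search_then_story_chain():
    """Integration check: the output of one API feeds the next, and both respond correctly."""
    # Step 1: call a hypothetical keyword-search endpoint and validate the response contract.
    search = requests.get(f"{BASE}/search", params={"keyword": "Nowruz"}, timeout=10)
    assert search.status_code == 200
    assert search.headers["Content-Type"].startswith("application/json")
    results = search.json()["results"]
    assert results, "search should return at least one result"

    # Step 2: feed the first result into a hypothetical story endpoint (API-to-API integration).
    story = requests.get(f"{BASE}/story/{results[0]['id']}", timeout=10)
    assert story.status_code == 200
    assert "text" in story.json()
```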

System/Regression Testing

a. Conduct end-to-end implementation testing for specific use cases to ensure the quality of the product.

b. Perform system security testing.

c. Conduct user interface and regression testing of the system.

Use Case 1:

Performance testing is done to provide stakeholders with information about their application’s speed, stability, and scalability, and to determine whether the software meets those requirements under expected workloads.

At IVOW AI, we test various scenarios (which real users might perform) under a concurrent user load, monitoring and reporting the response times of the different APIs. Based on the results, the APIs are optimized. This was conducted in Phase 1 and will be continued in Phase 2 as well.
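
A minimal sketch of that kind of concurrent load test, using only the Python standard library plus requests. The endpoint, user count, and request count are made-up values; a production setup would more likely use a dedicated tool such as JMeter or Locust.

```python
import time
import statistics
from concurrent.futures import ThreadPoolExecutor
import requests

URL = "https://api.example.com/search?keyword=Diwali"  # hypothetical endpoint
CONCURRENT_USERS = 20
REQUESTS_PER_USER = 5

def one_user():
    """Simulate one user issuing several requests and record each latency in seconds."""
    latencies = []
    for _ in range(REQUESTS_PER_USER):
        start = time.perf_counter()
        requests.get(URL, timeout=30)
        latencies.append(time.perf_counter() - start)
    return latencies

with ThreadPoolExecutor(max_workers=CONCURRENT_USERS) as pool:
    all_latencies = [t for user in pool.map(lambda _: one_user(), range(CONCURRENT_USERS)) for t in user]

print(f"median latency: {statistics.median(all_latencies):.3f}s")
print(f"95th percentile: {statistics.quantiles(all_latencies, n=20)[-1]:.3f}s")
```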

Use Case 2:

Impact analysis is very important from a testing perspective. It analyzes the impact of changes to the deployed application or product and helps identify functionality that is unintentionally affected by a change.

AI-led testing approach: Strong focus on AI algorithms for test optimization, defect analytics, scenario traceability, requirements traceability, and rapid impact analysis.

For our CultureGraph, we check the accuracy of the user input and the dataset. Different input and output combinations are fed to the machine, from which it learns and defines its functions. Natural language processing (NLP) is a very important feature of IVOW and should be taken into consideration while testing CultureGraph.

The following is, briefly, the ideal approach to evaluating the IVOW system (which is a kind of information retrieval system). Currently, only manual evaluation is performed:

  • Evaluate recall: a measure of how many of the relevant documents are returned in the results (in IVOW’s case, whether all the returned words point to the URLs corresponding to a given search_keyword rather than mixing in the URLs of other search_keywords).
  • Evaluate precision: a measure of how accurate the results are (i.e., in IVOW’s case, whether the word corresponding to each search_keyword actually relates to it, pointing to one of its URLs, and whether the URL’s page also contains that word as content). A minimal sketch of both metrics follows below.
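
A minimal sketch of computing both metrics for a single query, assuming we know which URLs are truly relevant to a search_keyword (the ground truth); all values are made up.

```python
# Documents that should be returned for a given search_keyword (hypothetical ground truth).
relevant_urls = {"url_a", "url_b", "url_c", "url_d"}
# Documents the system actually returned.
retrieved_urls = ["url_a", "url_b", "url_x", "url_c"]

hits = [u for u in retrieved_urls if u in relevant_urls]

precision = len(hits) / len(retrieved_urls)   # how many returned results are actually relevant
recall = len(hits) / len(relevant_urls)       # how many relevant documents were returned

print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```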

If you’re in the Testing and QA community and interested in supporting us with open-source research in this area, please get in touch at Davar@ivow.ai.

IVOW AI is an early-stage startup focusing on cultural intelligence in AI. We address a much-needed market: the convergence of using artificial intelligence to preserve culture with marketers’ need to better understand culture. We are part of WAIAccelerate, the Women in AI accelerator program; a KiwiTech portfolio company; and incubating at WeWork Labs in DC as we build our MVP.


Building and Testing an AI Platform was originally published in Becoming Human: Artificial Intelligence Magazine on Medium, where people are continuing the conversation by highlighting and responding to this story.

Via https://becominghuman.ai/building-and-testing-an-ai-platform-ec95fbd5d3a7?source=rss—-5e5bef33608a—4

source https://365datascience.weebly.com/the-best-data-science-blog-2020/building-and-testing-an-ai-platform

Discovering Equations describing the universe using Graph Neural Networks

This paper uses graph neural networks to fit physical systems and then uses symbolic methods to recover real-world equations. It was even able…

Via https://becominghuman.ai/discovering-equations-describing-the-universe-using-graph-neural-networks-c5969575eedd?source=rss—-5e5bef33608a—4

source https://365datascience.weebly.com/the-best-data-science-blog-2020/discovering-equations-describing-the-universe-using-graph-neural-networks

What are Big data analytics tools and what are the advantages of these?

In order to leverage big data, businesses need to have robust strategies in place for handling massive volumes of data.

By now, it has been fully established that big data is much more than just a buzzword, as a lot of people once thought it was. Instead, it’s probably the biggest asset businesses may ever have. In order to leverage big data, businesses need to have robust strategies in place for handling massive volumes of data. And this is exactly where big data analytics tools come into the picture. They help businesses identify trends, point out patterns, and derive valuable insights that decision-makers can use to make informed business decisions.

It’s important to understand that big data is of no use without analysis of the captured information. Making sense of this data falls under the domain of big data analytics tools, which offer different capabilities for businesses to obtain competitive value.

Big data analytics is a collection of different processes involving business stakeholders, data scientists, production teams, and business management, among others.

Several big data analytics tools are being utilized for big data analytics. We’ve created this post to give you an overview of some of the most popular ones, how they work, and why they have gained popularity.

Before delving deeper, let’s have a quick look at some features and characteristics that any big data analytics tool must contain.

1- Fundamental features of big data analytics tools

  • Analytic capabilities: Different big data analytics tools come with different types of analytic capabilities, such as decision trees, predictive mining, neural networks, and time series analysis.
  • Integration: Businesses sometimes require additional programming languages and statistical tools to conduct custom analyses, so big data analytics tools should integrate well with them.
  • Scalability: Data will not always stay the same and will grow as a business grows. With a scalable big data analytics tool, it is effortless to scale up as soon as the business captures new data.
  • Version control: Most big data analytics tools involve adjusting the parameters of data analytics models. A version control feature improves the ability to track those changes.
  • Identity management: Identity management is a required feature for all effective big data analytics tools. The tool should govern access to systems and to the related information associated with software, hardware, or individual computers.
  • Security features: Data security should be paramount for any successful business, and the big data analytics tools used should come with safety and security features to safeguard the collected data. In addition, data encryption is an imperative feature for big data analytics tools to offer.
  • Visualization: This feature enables professionals to display data in a graphical format, making it more usable.
  • Collaboration: Though analysis can sometimes be a solitary exercise, it frequently involves collaboration, so this feature is required.

You can always go out and purchase big data analytics tools in order to cater to the needs of your business. But all big data analytics tools aren’t created equal and some may not be efficient in dealing with the task for which you’re buying it. In addition, buying additional tools beyond your business’s existing analytics and business intelligence applications may not be necessary based on the particular business goals of a project.

In this post, we’re going to take a closer look at some of the most popular big data analytics tools to help you make an informed purchase decision. Just ensure that the tool you select comes with all of the features mentioned above together with other ones that may be required to support your business results and organizational decision-making teams as well.

2- Popular big data analytics tools

Here are some of the widely used big data analytics tools, together with their key advantages.

2.1- Apache Hadoop

It’s a software framework employed for handling big data on a clustered file system. This open-source framework offers cross-platform support and is used by some of the giant tech companies, including Microsoft, IBM, Facebook, and Intel.

Advantages:

  • Highly scalable
  • Offers quick access to data
  • Presence of Hadoop Distributed File System (HDFS) that comes with the ability to hold every type of data
  • Highly effective for R&D purposes

2.2- Tableau Public

This intuitive and simple tool offers valuable insights through data visualization. A hypothesis can be investigated with the help of Tableau Public. You can embed visualizations published to this tool into blogs and share web pages through social media or email.

Advantages:

  • Enables free publishing of visualizations to the web
  • No programming skills required

2.3- Google Fusion Tables

When it comes to big data analytics tools, Google Fusion Tables is a cooler version of Google Spreadsheets. You can use this excellent tool for data analysis, large dataset visualization etc. In addition, you can add Google Fusion Tables to your business analysis tools list.

Advantages:

  • Lets you visualize larger table data online
  • Lets you summarize and filter across a huge number of rows
  • Enables you to create a map in minutes

2.4- Storm

It’s an open-source and free big data computation system: a distributed, fault-tolerant, real-time stream processing system with real-time computation capabilities.

Advantages:

  • Guarantees the processing of data
  • Reliable at scale
  • Very fast and fault-tolerant

2.5- RapidMiner

It’s a cross-platform tool that provides an integrated environment for predictive analytics, data science, and machine learning. It is available under different licenses, and the free version allows up to 10,000 data rows and 1 logical processor.

Advantages:

  • The effectiveness of front-line data science algorithms and tools
  • Integrates well with the cloud and APIs
  • The convenience of code-optional GUI

2.6- Qubole

This all-inclusive, independent big data platform manages, learns, and optimizes on its own from usage. It enables the data team to focus on business outcomes rather than on managing the platform.

Advantages:

  • Increased flexibility and scale
  • Faster time to value
  • Optimized spending
  • Easy to use

2.7- NodeXL

It’s one of the best big data analytics tools available in the market. This open-source software offers exact calculations and comes with advanced network metrics.

Advantages:

  • Graph visualization
  • Graph analysis
  • Data import

2.8- Apache SAMOA

SAMOA, or Scalable Advanced Massive Online Analysis, is an open-source platform for machine learning and big data stream mining. With it, you can create distributed streaming ML algorithms and run them on multiple distributed stream processing engines (DSPEs).

Advantages:

  • True real-time streaming
  • Fast and scalable
  • Simple to use

2.9- Lumify

This free and open-source tool lets you perform big data fusion/integration, visualization, and analytics. Some of its primary features are 2D and 3D graph visualizations, full-text search, integration with mapping systems, automatic layouts, among others.

Advantages:

  • Scalable and secure
  • Supported by a dedicated and full-time development team
  • Supports the cloud-based environment

2.10- MongoDB

It’s a NoSQL database written in JavaScript, C, and C++. It comes with features like aggregation, indexing, replication, MMS (MongoDB Management Service), file storage, and load balancing, among others (a minimal usage sketch follows the advantages below).

Advantages:

  • Reliable and low cost
  • Easy to learn
  • Offers support for multiple platforms and technologies
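
For illustration, a minimal pymongo sketch of the indexing and aggregation features mentioned above. The connection string, database, and collection names are hypothetical.

```python
from pymongo import MongoClient, ASCENDING

# Hypothetical local MongoDB instance with a "shop.orders" collection.
client = MongoClient("mongodb://localhost:27017")
orders = client["shop"]["orders"]

# Indexing: speed up lookups on a frequently queried field.
orders.create_index([("customer_id", ASCENDING)])

# Aggregation: total spend per customer, highest first.
pipeline = [
    {"$group": {"_id": "$customer_id", "total_spend": {"$sum": "$amount"}}},
    {"$sort": {"total_spend": -1}},
]
for row in orders.aggregate(pipeline):
    print(row["_id"], row["total_spend"])
```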

2.11- Datawrapper

It’s one of the big data analytics tools that are used by newsrooms throughout the world. This open-source platform enables its users to quickly generate precise, simple, and embeddable charts.

Advantages:

  • Works very well on every type of device
  • Fast and interactive
  • Fully responsive
  • No coding is required

Closing Thoughts

Big data analytics tools have become imperative for large-scale industries and enterprises because of the massive volume of data they need to manage on a regular basis. These tools help businesses save a significant amount of resources and obtain valuable insights to make informed business decisions. Since big data analytics refers to the complete process of capturing, organizing, and analyzing massive sets of data, the process requires very high-performance analytics. To analyze such massive volumes of data, specialized software like big data analytics tools is a must.

In the present situation, the volume of data is steadily increasing along with the technology growth and world population growth. This is a clear indication of the immense necessity of having big data analytics tools for businesses to leverage the power of that data. These tools are being heavily used in some of the most widespread sectors including travel and hospitality, retail, healthcare, government, among others.

With huge investments and interest in big data technologies, professionals with big data analytics skills are in high demand. For those looking to step into this field, this is probably the best time to get some certifications to showcase their skills and talent. It’s important to note that the domains of the big data landscape are quite different, and so are their requirements. Since data analytics is emerging in every field, the need for trained professionals with adequate knowledge is naturally huge as well.


What are Big data analytics tools and what are the advantages of these? was originally published in Becoming Human: Artificial Intelligence Magazine on Medium, where people are continuing the conversation by highlighting and responding to this story.

Via https://becominghuman.ai/what-are-big-data-analytics-tools-and-what-are-the-advantages-of-these-175816885bf6?source=rss—-5e5bef33608a—4

source https://365datascience.weebly.com/the-best-data-science-blog-2020/what-are-big-data-analytics-tools-and-what-are-the-advantages-of-these

How to Build Your Data Science Competency for Post-COVID Future

Data science is helping healthcare organizations and businesses navigate the current crisis more effectively. Find out how you can learn this in-demand qualification and help them with addressing complex challenges.

Originally from KDnuggets https://ift.tt/2NNcPx6

source https://365datascience.weebly.com/the-best-data-science-blog-2020/how-to-build-your-data-science-competency-for-post-covid-future

Data Cleaning: The secret ingredient to the success of any Data Science Project

With an uncleaned dataset, no matter what type of algorithm you try, you will never get accurate results. That is why data scientists spend a considerable amount of time on data cleaning.

Originally from KDnuggets https://ift.tt/3dQJ3SG

source https://365datascience.weebly.com/the-best-data-science-blog-2020/data-cleaning-the-secret-ingredient-to-the-success-of-any-data-science-project

Credit-Cards customer approval and categorization

Credit cards play a major role for most people who make money transactions. A credit card gives the holder the ability to make transactions even when there is no balance left in their account. By offering these services, banks earn an income by charging the account holder. Typically, a bank charges about 3% of the amount of each transaction made by an account holder. In addition, interest is charged when an account holder is late in paying their dues.

Usually, a bank targets profit and the average cash flow of its business. Account holders who have good cash flow give a higher income to the bank. That income further increases the longer they take to repay their dues, since the bank charges interest for the delay. In contrast, account holders who do not make many transactions, and account holders who delay paying their dues for months, have a negative effect on the bank’s cash flow. Account holders who do not make many transactions but clear their dues regularly do not harm the business, but they are also not very important, since they generate relatively little income for the bank.

Identifying the nature of the customer is a great advantage for the bank when making business decisions. A bank cannot fully distinguish whether a customer is profitable or not at first glance; it has to study the financial background and the history of past payments throughout the lifetime of the account. This study becomes infeasible for a human data analyst as the number of parameters under consideration grows, but machine learning can help here. By resolving a customer’s pattern, it is possible to predict whether the customer’s next monthly payment will be a default payment or not. This gives the bank the opportunity to take business action regarding its customers.

The flowchart shows how the data was processed, from raw data through to prediction-model selection.

As shown in the flowchart, the data was first visualized and pre-processed while new features were created. Thereafter, attributes were selected based on statistical and domain understanding, different classification models were trained, and finally the best model was selected based on the prioritized features.

Feature Engineering

The problem consists of 24 parameters that describe the background of the account holder and their due and payment records across 6 months (July to December). The correlation matrix gives an idea of how the data is correlated.

When carefully analysing the correlation matrix, a strong correlation can be seen between the due amounts of July through December. But when considering the required prediction, only the payment amounts were meaningfully correlated with it; all other parameters have a correlation of roughly 0.07 or less with the target. Since correlation values rely only on a statistical representation, this shows the need for a machine learning approach to resolve the underlying complex pattern.

Feeding non-engineered features may mislead machine learning algorithms and reduce accuracy. But we can model a structure that matches the problem to be solved and generate more meaningful parameters.

All the new features stand out with a higher correlation to NEXT_MONTH_DEFAULT compared to the features that were used to define them. Since the given dataset contains temporal features, new features that reflect the temporality were considered. However, new features with exclusive correlations could not be found. Also, because the test data dimensions were identical to the training data, it can be assumed that temporal features may not improve the models.

Moreover, using principal component analysis, a new set of features was created; these did not have a strong correlation with the class, but they represented the whole dataset. A minimal sketch of this step is shown below.
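
A minimal scikit-learn sketch of that PCA step; the file name and target column are hypothetical stand-ins for the credit-card dataset described in this post.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Hypothetical training file with the target column NEXT_MONTH_DEFAULT.
df = pd.read_csv("credit_card_train.csv")
X = df.drop(columns=["NEXT_MONTH_DEFAULT"])
y = df["NEXT_MONTH_DEFAULT"]

# PCA is scale-sensitive, so standardize first.
X_scaled = StandardScaler().fit_transform(X)

# Keep enough components to explain ~95% of the variance.
pca = PCA(n_components=0.95)
X_pca = pca.fit_transform(X_scaled)

print(f"{X.shape[1]} original features reduced to {X_pca.shape[1]} components")
print("explained variance per component:", pca.explained_variance_ratio_.round(3))
```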

Preprocessing

All the preprocessing steps described below were applied to both the training and testing datasets. First, the datasets were read into Python pandas data frames. The data was then checked for NaNs, and none were found. Columns with multiple units were converted to hold values in the same unit. For example, Balance_Limit_V1 had multiple units (M and K) and its values were stored as strings; they were converted to int64 format to express the meaning of Balance_Limit_V1 (see the sketch below). A check for outliers was then carried out, but since discarding potential outliers did not improve the models’ performance, this step was undone.
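
A minimal pandas sketch of that unit normalization; the example values are made up.

```python
import pandas as pd

# Hypothetical values mixing "K" (thousands) and "M" (millions) units, stored as strings.
df = pd.DataFrame({"Balance_Limit_V1": ["500K", "1M", "750K", "2M"]})

multipliers = {"K": 1_000, "M": 1_000_000}

def to_int(value):
    """Convert a string like '500K' or '1M' into an integer number of currency units."""
    unit = value[-1].upper()
    return int(float(value[:-1]) * multipliers[unit])

df["Balance_Limit_V1"] = df["Balance_Limit_V1"].apply(to_int).astype("int64")
print(df)
```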

Final Model and how it was reached

Initially, 11 different classification models were trained: Logistic Regression, KNN, SVC with a linear kernel, SVC with an RBF kernel, Gaussian Naive Bayes, a Decision Tree classifier, a Random Forest classifier, an XGBoost classifier, an Extra Trees classifier, an AdaBoost classifier, and a classical neural network. The dataset was randomly divided into three parts: 70% training data, 15% cross-validation data, and 15% test data.
Thereafter, by checking the correlation matrix and using domain understanding, the top 7 and 6 features were selected from the two feature sets (classical feature creation and PCA). The models were then optimized on the training set, and initial accuracy and error tests were done on the cross-validation set and the test set. By repeating the same process with different cross-validation and training splits, the models were trained 5 times while fine-tuning the hyperparameters. Of the 11 algorithms, 4 outperformed the rest in training, testing, and cross-validation accuracy: Logistic Regression, SVC with an RBF kernel, the XGBoost classifier, and the AdaBoost classifier (a minimal sketch of this comparison follows).
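
A minimal sketch of the 70/15/15 split and a comparison of the four best-performing model families on the validation set. The file name and target column are hypothetical, and XGBoost is assumed to be installed as a separate package.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import classification_report
from xgboost import XGBClassifier

# Hypothetical training file with the target column NEXT_MONTH_DEFAULT.
df = pd.read_csv("credit_card_train.csv")
X, y = df.drop(columns=["NEXT_MONTH_DEFAULT"]), df["NEXT_MONTH_DEFAULT"]

# 70% train, then split the remaining 30% in half: 15% cross-validation, 15% test.
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.30, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.50, random_state=42)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "svc_rbf": SVC(kernel="rbf"),
    "xgboost": XGBClassifier(eval_metric="logloss"),
    "adaboost": AdaBoostClassifier(),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"--- {name}: validation precision / recall / F1 ---")
    print(classification_report(y_val, model.predict(X_val)))
```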

Finally, the algorithm was selected based on the feature distribution and the recall, precision, and F1 scores.

Specifications and recommendations (Business Insights)

Account holders can be categorized by their importance to the bank. The most important are the people who bring the most income and cash flow to the bank. These kinds of account holders can be identified as follows:

1) Account holders with a high average of expenditures within a month who also pay their dues regularly and promptly. Since these people make transactions with high amounts, the bank gains a considerable amount of profit.
2) Account holders with a relatively low average of expenditures do not generate much income for the bank. These people are not very important to the bank, but they also do no harm to the bank’s cash flow or profit.
3) Another type of account holder spends a lot and takes a long time to pay their dues, but eventually manages to pay them. They provide a considerable amount of interest income because they take a considerable time to pay, but this kind of income comes with risk.
4) Account holders who take a very long time to pay their dues have a negative effect on the bank’s cash flow.

To improve the bank’s income, the above four categories should be handled effectively. The first type of account holder should be treated well to keep them spending. Since they provide a high amount of income to the bank, giving them special offers that motivate them to spend more could boost their expenditures, increase the bank’s income, and also satisfy the customer.
The second type of account holder tends to spend less through their credit card account, but there is a chance we could lead them to spend more. One way to achieve this is to give them the ability to make installment payments through their credit card account. These account holders tend to use such payment methods since they cost relatively little per month, and this ensures that the account holder stays with the bank for a known duration. The total value of the installment payments, however, depends on the financial stability of the bank.
The third type of account holder should be handled carefully. The bank should try to retain them by offering deals suited to their budget limits. They should be monitored frequently to notice whether they are heading toward bankruptcy; if any account holder appears to be going bankrupt, the risk management team of the company should be informed.
The fourth type of account holder mostly tends to go bankrupt. They must therefore be monitored, and their balance limit should be controlled to prevent them from making risky payments.

Finally, by following the above steps, a bank can maximize its profit, customer base, and customer retention. Moreover, this will increase the company’s cash flow and increase wealth in the long run.


Credit-Cards customer approval and categorization was originally published in Becoming Human: Artificial Intelligence Magazine on Medium, where people are continuing the conversation by highlighting and responding to this story.

Via https://becominghuman.ai/credit-cards-customer-approval-and-categorization-441ea4d5a24f?source=rss—-5e5bef33608a—4

source https://365datascience.weebly.com/the-best-data-science-blog-2020/credit-cards-customer-approval-and-categorization

Systemic Biasing 2.0

Image source: https://www.miltonmarketing.com/wp-content/uploads/2019/08/ai-isnt-biased-we-are.jpg

Right now, there is a huge awareness and discussion (and rightly so!) going on around systemic racial biases in the US against the black community. I am not an expert in sociology; hence I would avoid sharing an opinion on how to resolve that. But what I’ll talk about is the next wave of systemic biases — algorithmic biases — which is not being discussed as much but will be more widespread and detrimental to the society if we don’t put an effort to address them. It should not be news that decision-making is moving towards machines as our machines are becoming smarter. But these machines depend on algorithms that are written by human beings and trained on the dataset representing the same underlying systemic biases (sometimes).

Machine Learning (or Artificial Intelligence, as the media prefers to call it to match the image people have in their minds from the Iron Man or Terminator movies) ultimately relies on how you train the algorithm: the features and dataset you choose. Hence, these algorithms are no better than humans if the features and dataset used to train them are biased in the first place. In layman’s terms, features are the different characteristics that help machines differentiate between data points, similar to how, in the physical world, some people correlate skin color, race, gender, education, or neighborhood with certain things or activities.

Here is a real-world consequence of these (intentional or unintentional) biases. Optum designed an algorithm for hospitals (used in over 50 organizations and probably thousands of hospitals serving millions of patients in the US) that tells doctors which of their patients require additional attention based on their current health. It turned out the algorithm was biased towards healthier white patients and gave them priority over sicker Black patients, because the algorithm writer* used a feature of “cost” to rank patients. Historically, healthcare spending on Black patients has been lower than on white patients, so this feature should not have been selected in the first place. Ideally, patients should have been ranked based on their chronic diseases only, and not on how much they pay for the additional care they receive from physicians. We don’t know how much damage this biased algorithm caused before the fix, but we probably won’t see any protests against these biased algorithms or the companies owning them. These AI systems are doing what and how they were trained to do (sounds familiar?!). This is just one example, but you can find hundreds of similar studies showing the same in various industries, including job/resume selection based on name and gender, and men getting higher credit limits than women with the same credit history when applying for a new credit card. (Disclaimer: I am not blaming any of these companies for doing this intentionally.)

Hence, as we move towards fully automated or hybrid (the machine filters first and human beings decide based on machine-provided options) decision-making systems, it is very critical that we create the right culture, environment, processes, policies, and training to address these unintended biases. I believe that the next wave of policies (I am calling them Policies 2.0) will have serious consequences for how our societies operate, and these will be driven not by the policymakers on Capitol Hill but by these tech companies and their algorithm writers, predominantly male and in their 20s and 30s, sitting in front of their wide-screen monitors and whiteboards. But that also means there is a big opportunity: we can actually fix these “systemic biases” if they are addressed right (not just with the best intentions but also with actions, being open to accepting mistakes and correcting course).

Here are a few ideas that may be the first baby steps in addressing these issues:

  1. Diversity: It is not news that there is a racial as well as gender imbalance in the workforce in the tech industry. We need to fix this imbalance so that there are different opinions in the room and on whiteboards. We need to fix this 9:1 Men: Women ratio in our tech industry. Saying that there are not enough candidates should not be an excuse anymore. In recent years, big tech companies have acknowledged and taken encouraging actions. Still, progress has been really slow, and now it is high time that companies take measurable and actionable goals on these. Things are even worse when it comes to the Black and Latino communities working in the tech industry. If we can come up with driverless cars, then I am sure we can fix this too if large tech companies make this a real priority and not just for a good gesture or marketing.
  2. Feature engineering: This is one of the most critical, unacknowledged, and the hardest to capture sources of biases in algorithms. The way machine learning algorithms (over-simplified explanation) are built in most of the big tech companies, algorithm writers get or generate the dataset with the labeled data and split it into training and test data (e.g., 3:1) and build their algorithms on the training set and strive to match the output with their labeled data. To achieve this, Algorithm writers identify features that are characteristics of the data points and can be biased if the dataset is biased or if she/he lacks the domain knowledge to select unbiased features. For example, if an Ivy League University tomorrow decides to automate their undergrad admission process, they will design an algorithm that will look at ACT/SAT scores as one feature, high school GPA as the second feature, (maybe) parents jobs and education as the third feature to predict the probability of the success of applicants. Algorithm writers are obsessed with matching the output of their algorithms with the labels (aka ground truth), and they will give weightage to individual features to “fit” their output. With the re-emergence of Deep Learning in the past few years, even feature engineering and weightage are becoming automated. Hence the quality of decision making by an AI system will essentially be dependent on the quality of the data itself. Tech companies need to put a review process in place (maybe by a third-party and some regulatory body) for any AI system that does any decision making (fully or hybrid) in any public (serving human beings) institution like healthcare, education, law enforcement, and transportation. Facebook has recently created an oversight board to do something like that to police the content on its platform — a great first step but yet to be seen how Mark Zuckerberg will react when they overrule his decision.
    Similarly, algorithm writers should be conscious of these biases and should raise flags when they see any underlying biases in datasets. Our job as algorithm writers should not be just to increase accuracy as measured by precision and recall, but also to make sure decision-making is better, more effective, and not undermined by who uses these systems and how they are used in the future (a tough one, right?!). Tech companies can also put training in place to teach their engineers, using case studies, how to identify and flag these biases.
  3. Unbiased Datasets: I have already explained above how biases in datasets will creep into AI systems if the job of the algorithm writer is to fit their model to the labeled data, which is what most current systems do. It is unfair to expect an unbiased AI system if the dataset is biased in the first place. Hence, our machine learning strategy should include an additional step of vetting the dataset for biases (e.g., through crowdsourcing). We should verify that the dataset represents the diversity of the population it is going to make predictions about, and that the labeled data is fair and impartial. Various works on this exist in theory, but we very rarely see them practiced at a large scale, since practicing any of them increases the time to deliver the project (and consequently increases cost and lowers profits). Hence, leadership and culture play a critical role here in managing these priorities. A minimal sketch of such a dataset check follows this list.
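
A minimal pandas sketch of such a dataset representation check; the file name, the demographic_group and label columns, and the 10-percentage-point threshold are all hypothetical choices for illustration.

```python
import pandas as pd

# Hypothetical labeled training data with a sensitive attribute column.
df = pd.read_csv("training_data.csv")

# 1. Does each group appear in the data roughly in proportion to the population it will serve?
group_share = df["demographic_group"].value_counts(normalize=True)
print("Share of each group in the training data:\n", group_share)

# 2. Is the positive label distributed very differently across groups? Large gaps deserve review.
label_rate_by_group = df.groupby("demographic_group")["label"].mean()
print("Positive-label rate per group:\n", label_rate_by_group)

# Flag groups whose positive-label rate deviates strongly from the overall rate.
overall_rate = df["label"].mean()
flagged = label_rate_by_group[(label_rate_by_group - overall_rate).abs() > 0.10]
print("Groups deviating by more than 10 percentage points:\n", flagged)
```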

Again, this is a hard problem to solve, and by no means is the purpose of this write-up to provide solutions; this will be the next big evolution of our society and of human beings. The 21st century will be a defining moment in designing these policies and practices; maybe we need a new UN-like global body to which everyone is answerable. I have personally become more conscious of my unconscious biases after recent discussions and incidents, which will hopefully take us to a better place than before. But right now, what we need is (1) an acknowledgment, or at least an acceptance of the possibility, of these biases in AI systems, and (2) to initiate debate and processes about how we can address these biases. Acknowledgment itself is the first stepping stone of the long journey ahead of us.

P.S. Any thoughts or ideas shared in this blog are entirely personal and have nothing to do with my current or any of my previous employer(s). Also, I am not blaming any particular tech company or person for any intentional discrimination against any race/gender/sexual orientation or anything. All information shared here is based on the publicly available articles, and no confidential information was used.

*I used the word Algorithm writer instead of programmer or engineer as there are a lot of people involved when AI systems are designed including managers of these programmers, middle and senior leadership that sets goals and outcome metrics, and culture of the company which ultimately is driven by the executive leadership. All these people are algorithm writers and play a role in shaping the outcome.


Systemic Biasing 2.0 was originally published in Becoming Human: Artificial Intelligence Magazine on Medium, where people are continuing the conversation by highlighting and responding to this story.

Via https://becominghuman.ai/systemic-biasing-2-0-c7cb8786b42a?source=rss—-5e5bef33608a—4

source https://365datascience.weebly.com/the-best-data-science-blog-2020/systemic-biasing-20

Deep Learning: The Dawn of Automation

“What’s behind the driverless cars? Artificial Intelligence, or more specifically Deep Learning?” – Dave Waters

Deep Learning is a machine learning technique that teaches computers and machines to imitate humans and the way they think or react to a particular set of problems. Deep Learning is concerned with algorithms inspired by the structure and function of the brain, called Artificial Neural Networks (ANNs). Until recently it was difficult to perform such tasks due to limited computing power, but advancements in big data analytics have now permitted larger, more sophisticated neural networks, allowing computers to observe, learn, and react to complex situations faster than humans. Deep learning has aided image classification, language translation, and speech recognition.

Neural Networks

Figure: Neural network and deep neural network [1]

Evolution of deep learning:

It is believed that deep learning was invented at the dawn of the 21st century, but believe it or not, it originated in the 1940s.

The reason most of us are unaware of the deep learning advancements developed in the 20th century is that the approaches used then were relatively unpopular due to their various shortcomings, and the field has had a couple of revitalizations since then.

There were THREE Waves:

Cybernetics — During 1940–1960
Connectionism — During 1980–1990
Deep Learning — Since 2006

The first two waves became unpopular due to criticism of their shortcomings; however, there is no doubt that they helped advance the field to where it is today, and some of the algorithms developed in those periods are still widely used in various machine learning and deep learning models.

After two dips, the third wave emerged in 2006 with a breakthrough. The advancements by Geoffrey Hinton were used by other researchers to train different types of deep networks. This enabled researchers around the world to train deeper and deeper neural networks and led to the popularisation of the term deep learning.

Figure: Evolution of Deep Learning

Why deep learning over traditional machine learning?

Figure: Difference between machine learning and deep learning (image by author) [2]

Problem Solving in Deep Learning:

Deep learning permits machines to unravel complex problems even when using a dataset that is extremely diverse, unstructured, and interconnected. The more deep learning algorithms learn, the better they perform. The problem-solving process in deep learning does not need to be broken down into small steps; it solves problems on an end-to-end basis.

Applications of Deep Learning:

Some of the major applications of Deep Learning are:

  1. Self-Driving Cars: Deep Learning is the force that is bringing autonomous driving to life. A million sets of data are fed to a system to build a model, to train the machines to learn, and then test the results in a safe environment. A regular cycle of testing and implementation typical to deep learning algorithms is ensuring safe driving with more and more exposure to millions of scenarios. Data from cameras, sensors, geo-mapping is helping create succinct and sophisticated models to navigate through traffic, identify paths, signage, pedestrian-only routes, and real-time elements like traffic volume and road blockages.
  2. HealthCare: One of the chief DL applications in healthcare is the identification and diagnosis of diseases and ailments which are otherwise considered hard-to-diagnose. This can include anything from cancers that are tough to catch during the initial stages, to other genetic diseases. Deep learning and Machine Learning are both responsible for the breakthrough technology called Computer Vision. One of the most sought-after applications of machine learning in healthcare is in the field of Radiology which enables medical image analysis at any particular time.
  3. Voice Assistants: The most popular application of deep learning is virtual assistants ranging from Alexa to Siri to Google Assistant. Each interaction with these assistants provides them with an opportunity to learn more about your voice and accent, thereby providing you a secondary human interaction experience. They learn to understand your commands by evaluating natural human language to execute them. Another capability virtual assistants are endowed with is to translate your speech to text, make notes for you, and book appointments.
  4. Fraud Detection: Another domain benefiting from Deep Learning is the banking and financial sector, which is plagued with the task of fraud detection as money transactions go digital. Autoencoders in Keras and TensorFlow are being developed to detect credit card fraud, saving billions of dollars in recovery and insurance costs for financial institutions. Fraud prevention and detection are done by identifying patterns in customer transactions and credit scores, and by identifying anomalous behaviour and outliers. Classification and regression machine learning techniques and neural networks are used for fraud detection. While machine learning is mostly used for highlighting cases of fraud that require human deliberation, deep learning is trying to minimize these efforts by scaling the efforts of the machines. Machine learning allows for creating algorithms that process large datasets with many variables and help find hidden correlations between user behaviour and the likelihood of fraudulent actions. Another strength of a machine learning system compared to rule-based ones is faster data processing and less manual work. For example, smart algorithms fit well with behaviour analytics for helping reduce the number of verification steps. A minimal autoencoder sketch follows this list.
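
A minimal Keras sketch of the autoencoder approach to fraud detection described in point 4. The data is random stand-in data and the threshold is hypothetical; a real setup would train on normalized legitimate transactions and tune the threshold on a validation set.

```python
import numpy as np
from tensorflow import keras

# Stand-in "normal" transaction features; a real setup would use scaled transaction data.
n_features = 30
normal_transactions = np.random.rand(1000, n_features).astype("float32")

# A small undercomplete autoencoder: compress to a bottleneck, then reconstruct the input.
autoencoder = keras.Sequential([
    keras.layers.Input(shape=(n_features,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(8, activation="relu"),     # bottleneck
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(n_features, activation="sigmoid"),
])
autoencoder.compile(optimizer="adam", loss="mse")

# Train only on (mostly) legitimate transactions so the model learns what "normal" looks like.
autoencoder.fit(normal_transactions, normal_transactions, epochs=10, batch_size=64, verbose=0)

# Score new transactions by reconstruction error; unusually high error suggests possible fraud.
new_transactions = np.random.rand(5, n_features).astype("float32")
reconstruction = autoencoder.predict(new_transactions, verbose=0)
errors = np.mean((new_transactions - reconstruction) ** 2, axis=1)
threshold = 0.05  # hypothetical cutoff, tuned on a validation set in practice
print("reconstruction errors:", errors.round(4), "flagged:", errors > threshold)
```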

Authors:

1. Rahul Mandviya
2. Abhinav Gadgil
are part of the CORE TEAM of the HEXABERRY DATA SCIENCE COMMUNITY.

Authors can be reached at:
1. Hexaberry.datasciencecommunity@gmail.com
2. info@hexaberrytechnologies.com
3. https://www.instagram.com/hdsc.official
4. https://www.facebook.com/Hexaberry-Data-Science-Community-HDSC-105656644481089

References:

  1. https://www.acfun.cn/a/ac3800512, accessed on 22/05/20
  2. https://morioh.com/p/7ff324bc021e, accessed on 21/05/20
  3. https://www.mygreatlearning.com/blog/deep-learning-applications/#virtual, accessed on 21/05/20
  4. https://machinelearningmastery.com/what-is-deep-learning/, accessed on 21/05/20
  5. https://towardsdatascience.com/the-deep-history-of-deep-learning-3bebeb810fb2, accessed on 21/05/20


Deep Learning: The Dawn of Automation was originally published in Becoming Human: Artificial Intelligence Magazine on Medium, where people are continuing the conversation by highlighting and responding to this story.

Via https://becominghuman.ai/deep-learning-the-dawn-of-automation-63f54dffb805?source=rss—-5e5bef33608a—4

source https://365datascience.weebly.com/the-best-data-science-blog-2020/deep-learning-the-dawn-of-automation
