Avoid overfitting using cross-validation

Folding Validation sets using Cross-Validation!

This article is divided into 3 main parts:

1 — Overfitting in Transfer learning

2 — Avoiding overfitting using k-fold cross-validation

3 — Coding part

Transfer learning is a term that has swept through the field of deep learning lately and is now used widely.

A quick recap of transfer learning: it means using a pre-trained model as the starting point for your own model when you don’t have enough data for the new task.

For a detailed explanation about transfer learning, read the following article about Transfer Learning.

A Kaggle competition caught my attention a few weeks ago, and I felt intrigued enough to give it a try. It was centered on image classification, and each candidate could choose one of two topics: heart disease or grocery items.

So, I chose the grocery items. There were 19 classes with very little data per class!

For this reason, using transfer learning is a must. But a problem anyone will encounter here is overfitting, which can be put simply as follows:

The model memorizes the training dataset instead of learning its underlying features, so it fails to predict new, unseen data.

At first, the loss the model produced was very high and the accuracy didn’t go above 0.1! Analyzing the training graphs made it clear that overfitting was the major problem. Hence, k-fold cross-validation was the best choice.

K-Fold Cross-Validation

Simply speaking, it is a procedure that divides the training dataset into k parts (folds). In each round, (k-1) folds are used as training data and the remaining fold is held out for validation predictions. That remaining part is called the “holdout fold”, and it changes in every round, so over k rounds each fold serves as the holdout set exactly once.
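To make the rotation of the holdout fold concrete, here is a minimal sketch using scikit-learn’s KFold (the ten toy samples and k = 5 are illustrative assumptions, not the competition data):

import numpy as np
from sklearn.model_selection import KFold

X = np.arange(10)       # ten toy samples, just to show how the folds rotate
kf = KFold(n_splits=5)  # k = 5 folds

for fold, (train_index, holdout_index) in enumerate(kf.split(X), start=1):
    # in each round, 4 folds are used for training and the remaining fold is held out
    print(f"Round {fold}: train on {X[train_index]}, hold out {X[holdout_index]}")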

How will this affect the model and alleviate the overfitting?

Actually, the problem with overfitting is that the model gets ‘over-familiar’ with the training data. To avoid such a scenario, we will use cross-validation.

Coding part || Fun part

The built-in KFold class is found in the sklearn library (sklearn.model_selection).

# KFold lives in sklearn.model_selection
from sklearn.model_selection import KFold

'''X is the training data and Y the labels.
train_index is the index array for the training folds;
test_index is the index array for the holdout fold.
'''
n_split = 5  # number of folds (k)
for train_index, test_index in KFold(n_splits=n_split).split(X):
    x_train, x_test = X[train_index], X[test_index]
    y_train, y_test = Y[train_index], Y[test_index]
    model = create_model()

The code above does the following:

  • iterates over all train/test index splits produced by the k folds.
  • builds a new training set and a new holdout set for each fold.
  • calls the create_model function to create the model and produce the output (a combined training-loop sketch is shown after the layer descriptions below).

BUT, where is the create_model() function? WE WILL CREATE IT NOW 🙂

The create_model function (a description of each part follows the code):

# assumed imports (tensorflow.keras shown; standalone keras works the same way)
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Flatten, Dense, BatchNormalization
from tensorflow.keras.models import Model
from tensorflow.keras import optimizers

def create_model():
    IMAGE_SIZE = [100, 100]  # fixed image size
    # load the ImageNet weights used by VGG16;
    # include_top=False keeps all VGG16 layers except its task-specific classifier head
    vgg = VGG16(input_shape=IMAGE_SIZE + [3], weights='imagenet', include_top=False)
    for layer in vgg.layers:
        layer.trainable = False  # don't train the VGG16 layers, we need their weights fixed
    y1 = Flatten()(vgg.output)
    bn2 = BatchNormalization()(y1)
    y4 = Dense(37, activation='relu')(bn2)
    bn3 = BatchNormalization()(y4)  # normalize the hidden Dense layer's output
    prediction = Dense(3, activation='softmax')(bn3)
    model = Model(inputs=vgg.input, outputs=prediction)
    model.compile(loss='categorical_crossentropy',
                  optimizer=optimizers.Adam(),
                  metrics=['accuracy'])
    model.summary()
    return model
  • Since we are dealing with a transfer-learning image classification model, we chose VGG16 for its ability to capture and analyze small features in images.
  • The image size is set to (100, 100) as a convenient compromise across most of the images.
  • ‘include_top=False’ is a key part of transfer learning: it lets us reuse all of the pre-trained model’s convolutional weights in the new (current) model while dropping its original classifier head.
  • After that, we don’t want VGG16 to train again from scratch, because its weights must stay frozen. So we loop over all of its layers and set trainable to False.
  • Now that the base model is ready, let’s add the top layers that replace the excluded classifier head:

1 — Flatten(): flattens the convolutional output into a single vector.

2 — BatchNormalization(): normalizes the layer’s outputs onto a consistent scale, which helps the model learn (values on a small, comparable scale are much easier to work with than values ranging from 0 to 1000!).

3 — Dense(37, activation=’relu’): why 37 units? Finding the best number of units for a hidden layer is a tough process that demands a lot of trial and error. At first I started with 22 and got the worst results ever. Then I tried 97 and got similarly poor results, just from the opposite direction. So I kept stepping down by 10 units, then 5, then 1, until I landed on this!

(Note that each model needs its own number of hidden units, so make sure you go through the same trial and error.)

4 — One more BatchNormalization, applied to the hidden layer’s output.

5 — The final Dense layer uses the ‘softmax’ activation to output a probability for each class; it has 3 units because I was training the model on only 3 classes to observe the results.
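Putting the pieces together, here is a hedged sketch of how the k-fold loop and create_model() can be combined into a full training run (the epoch count, batch size, and the averaging of fold scores are my own assumptions, not values reported in this article):

import numpy as np
from sklearn.model_selection import KFold

k = 5
fold_scores = []

for train_index, test_index in KFold(n_splits=k).split(X):
    x_train, x_test = X[train_index], X[test_index]
    y_train, y_test = Y[train_index], Y[test_index]

    model = create_model()
    # epochs and batch_size below are illustrative, not the article's settings
    model.fit(x_train, y_train, epochs=10, batch_size=32, verbose=0)
    loss, acc = model.evaluate(x_test, y_test, verbose=0)
    fold_scores.append(acc)

print("Mean holdout accuracy over", k, "folds:", np.mean(fold_scores))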

Finally, let’s check out the results:

Training without K-fold cross-validation:

Test Loss: 2.23648738861084

Test accuracy: 0.36900368332862854

Training with K-fold cross-validation (k = 5):

Test Loss: 0.9668004512786865

Test accuracy: 0.6000000238418579

In conclusion, the cross-validation approach proved to be the best fit for avoiding overfitting!

Avoid overfitting using cross-validation was originally published in Becoming Human: Artificial Intelligence Magazine on Medium, where people are continuing the conversation by highlighting and responding to this story.

Via https://becominghuman.ai/avoid-overfitting-using-cross-validation-51241aa9bf8c?source=rss—-5e5bef33608a—4

source https://365datascience.weebly.com/the-best-data-science-blog-2020/avoid-overfitting-using-cross-validation

The AI behind getting the first-ever picture of a black hole

AI AND UNIVERSE

A black hole is a massively condensed object, typically found at the center of a galaxy, whose gravitational pull does not let even light escape from it.

Source: NASA

This year’s Nobel prize in physics has been awarded to Sir Roger Penrose (1/2), Reinhard Genzel (1/4), and Andrea Ghez (1/4) for their research on black holes. Last year’s prize was also in astronomy and cosmology. These are exciting times for astronomy, since the last such prize before that was in 2006.

There is a common trait between astronomy and AI: much of the foundational work started in the 20th century and could not be proved then due to the limitations of technology. Now that the technology has matured, we are able to provide the evidence.

The Nobel prize was awarded last week, which is why I thought it worth dedicating some time to the winners’ work. They built on the general theory of relativity, which describes how gravity behaves in our universe and how space and time are bent and warped.

The bending of time in space
Source: The Royal Swedish Academy of Sciences

Sir Roger Penrose used it to explain and prove the existence of black holes. The gravity at the center of a black hole is so strong that everything, including light, condenses and forms a massive object, a concept called the singularity. Penrose proved its existence mathematically back in 1969. But the Nobel prize committee prefers a theory to be observationally or experimentally confirmed before awarding the prize. This is similar to how Einstein predicted gravitational waves back in 1916, yet the award came only in 2017, for their detection. The incredible first-ever image of a supermassive black hole at the center of the M87 galaxy is, I think, what triggered the Nobel committee to consider the prize at the end of 2019.

Combines pictures collected over time from multiple light collectors around the world. Source: Nature.com

This image was created through the combined effort of multiple research centers and a huge array of radio telescopes to see something so far away. But why was the prize not awarded then, and why to only three people? By tradition, a Nobel prize can be shared by at most three people, which may not sound right in today’s world, where discoveries are so collaborative. Moreover, it was not just the image that proved the existence of black holes; the work of the other two Nobel laureates also helped prove it.

Where does AI come into the picture here?

The image we got after so much collaboration can only tell us so much, due to its blurriness and size. It was created from a combination of 8 telescopes across the globe. As the Earth rotated, they helped fill in parts of the image and build it up. You can imagine it as something like the center panel of the illustration below.

Multiple telescopes are used to fill in the picture over time. Source: Event horizon telescope and Veritasium

The computer vision algorithm was fed lots of galactic and other images to make it learn how things look in the universe. This data, combined with the data collected by the 8 telescopes, helped us build the first-ever image of a black hole.

Now you might ask: how can we train the algorithm if we have never seen a black hole image? Won’t the algorithm just give us images of what we expect to see? That’s right, and that is why the team at MIT, along with Katie Bouman, took a different approach. They trained the algorithm on three different sets of data: first, simulated images of black holes of the expected kinds and sizes; second, other galactic images; and third, general images of cats, dogs, trees, selfies, houses, buildings, and people. To their surprise, once the data from the telescopes was fed to all three trained algorithms, they produced almost identical images in all three cases, as you can see in the picture below.

Different types of images create the same image of the black hole. Source: TED talk by Katie Bouman

This reaffirmed their assumptions and gave us the first-ever image of a black hole. The day is not far when a direct image of a black hole will be taken from somewhere in space where there is less galactic dust and less chance of light diffraction as the light travels millions of light-years to reach the collectors (telescopes). I am pretty optimistic that we will soon be able to take such a picture and determine a black hole’s size, composition, and the physics around it.

The AI behind getting the first-ever picture of a ‘black hole’ was originally published in Becoming Human: Artificial Intelligence Magazine on Medium, where people are continuing the conversation by highlighting and responding to this story.

Via https://becominghuman.ai/the-ai-behind-getting-the-first-ever-picture-of-a-black-hole-c483e8eb6a21?source=rss—-5e5bef33608a—4

source https://365datascience.weebly.com/the-best-data-science-blog-2020/the-ai-behind-getting-the-first-ever-picture-of-a-black-hole

How to Explain Key Machine Learning Algorithms at an Interview

While preparing for interviews in Data Science, it is essential to clearly understand a range of machine learning models — with a concise explanation for each at the ready. Here, we summarize various machine learning models by highlighting the main points to help you communicate complex models.

Originally from KDnuggets https://ift.tt/357Z6Jo

source https://365datascience.weebly.com/the-best-data-science-blog-2020/how-to-explain-key-machine-learning-algorithms-at-an-interview

365 Data Use Cases: Data Science and Medical Imaging with Giles McMullen-Klein

Hi! My name is Giles. I’m an Oxford-trained medical physicist and data scientist turned Python instructor and the author of the 365 Python Programmer Bootcamp course.

I’m happy to join the 365 Data Use Cases series and, in this post, I’ll tell you a bit more about my favorite data use case: advanced medical imaging.

We’ve also made a video on the topic that you can watch below or just scroll down if you prefer reading.

Data Science and Advanced Medical Imaging: DMP & MRI

Until recently, I worked as a research scientist, and a medical physicist. That’s how I was introduced to DMP (Dextran‐magnetite particles) – a specialist type of contrast agent used in Magnetic Resonance Imaging (MRI). Without going into detail about how it works, it would be enough to tell you that DMP provides such a sensitive type of advanced medical imaging, that it enables you to image metabolism in a living body.

The fact that you can create a medical imaging system that can do that is quite amazing in itself. But what’s really useful about it is that most diseases affect the body’s metabolism in some way, especially cancer and heart disease. And those were precisely the areas I was focused on. So, in my line of work, we would collect huge amounts of data by doing advanced medical imaging. And that enabled us to:

  • follow and watch metabolic pathways in areas of interest within a body
  • see how they behaved in a healthy body
  • and then compare that to how they behaved in a diseased body

The Role of Diagnostic Medical Imaging in Analysis

Overall, the aim was to see whether we could find any way of using the data that we gathered in the medical imaging analysis as a diagnostic for different types of cancers, for example. In the case of heart disease, we were looking to see whether we could somehow determine the amount of damage that had been done, for instance, following a heart attack. Of course, there were many challenges that we faced doing that MRI analysis but that is a topic for a whole new article.

But the data analysis toolset that you’re learning does not restrict you to a specific industry. In fact, it can be applied to any field that you’re interested in. Whether that’s a business application or a scientific application, you use many of the same methods. That said, data science is a great field to be getting into and there’s almost no limit to what you can achieve with it. So, if you’re eager to learn more about data science and Python in particular, check out my YouTube channel. And, if you’re looking to add some indispensable Python programming skills under your belt, you can sign up for my Python course for free.

The post 365 Data Use Cases: Data Science and Medical Imaging with Giles McMullen-Klein appeared first on 365 Data Science.

from 365 Data Science https://ift.tt/3o3k3O9

Best and Top 10 Data Science Online Resource Available For Every Aspiring Data Scientist

source https://365datascience.weebly.com/the-best-data-science-blog-2020/best-and-top-10-data-science-online-resource-available-for-every-aspiring-data-scientist

The chatbot system for knowledge management and information management

For a modern workforce, knowledge and information management are becoming increasingly important. Locating, recording, and knowing things must be intuitive, quick, and seamless in an environment where data is distributed, workers are constantly on the go, and roles change rapidly.

The basic principle of knowledge and information management is how effectively a company’s product and service information is managed, which since the early 2000s has mostly meant pairing these activities with an optimized search tool. Yet that is no longer enough. At this point in the age of AI and chatbots, if you’re not putting a strong and precise bot to work, you will be left behind.

We have the processing power of computers, but we want the system to be smart enough to understand a layperson’s queries from their keyword phrases, rather than forcing users through training steps just to be precise, or vice versa. To get there, we need to separate these responsibilities between knowledge management and information management.

Let me share the difference between knowledge management vs information management in a firm.

Knowledge Management: Knowledge management is the process of generating, exchanging, using, and maintaining an organization’s resources and insights. It refers to an integrative strategy that makes the best use of expertise to achieve organizational goals effectively.

Information Management: Information management refers to the organizational operations around data: the collection, preservation, and transmission of work data from one or more sources to those who need it, and its eventual disposal through archiving or destruction.

Now is the time to bring AI-powered chatbots into your strategic plan. There are various methods to train a bot, or the individuals handling it, on the information acquired through these processes and on the available knowledge.

1. Bots need to organize their information better.

Information needs to be structured, because structure affects how it is found and used by individuals and by bots, and bots provide another channel to reach your customers. Bots can be leveraged to increase customer engagement with timely tips and offers. Real-time communication through chatbots helps customers find what they are looking for and evaluate different suggestions, weighing their requirements against current trends.

Your files can be structured well using a solid folder or metadata structure in a standardized site and repository hierarchy. The performance of the structure, of course, depends on the points below:

a) The technique used from the outset and from the beginning to organize the information/data material.

b) How well the structure and content have been maintained by the owner of the hierarchy over time (including eliminating ROT, that is, redundant, obsolete, or trivial content, if needed).

For knowledge management, a well-organized hierarchy that is intuitive, standardized, and kept up to date will work well. Content is not organized by search alone; the ideas and suggestions that accompany queries carry extra information about the customer’s knowledge of the product or service. Organically, the search engine can return several results based on keyword matches, metadata refiners, and, of course, previous file popularity. When the user has no idea where the expected details live (or doesn’t care to waste time working through a file structure), this may work well.

With bots, the available information is absolutely determined by the owner(s) of the bot, certain individuals who organize the bot’s data, and how it guides users to the source data they are searching for. For each department or division in an organization, a good bot includes answers to the most popular questions, answers the question being asked (rather than only providing a source for the answer), and links directly to the origin as a guide for more information.

The response is important because, well, it’s what the user was searching for. The reference is also powerful since, should they need it, it automatically leads the information seeker to the source. Bots optimize what’s available in terms of the existing knowledge and give a high return on the effort invested in curating responses.

2. Bots make high-value data predictable to find.

Site structures, search, and bots rate quite differently when it comes to finding what you’re anticipating. For example, a user is expected to navigate the human resources site layout to find data about their compensation. Even though this can be predicted, it is not assured: the site layout may be more difficult than expected, or, frankly, the user may get a bit lazy and give up.

But even if they understand how to get to the information, clicking through folders, views, and plugins is still a chore and can deter anyone from pursuing a document they need. They eventually either accept not getting the data (possibly affecting the quality of their work) or ask someone else for help (which adds little overall value, since it spends another person’s precious time on the task). In general, this way of seeking information is okay, but not fantastic.

With search, which simply combs through everything you have access to, you are stuck with whatever it returns. The user must also contend with results that are just not relevant unless search is combined with a specialized setup (e.g. promoted results, custom refiners). From ambiguous keywords (e.g. “office” for facility details versus “Office” for the software) to outdated information, a significant amount of what search returns has to be judged against the context of its original use.

The response is important because, well, it’s what the consumer was searching for. The reference is also powerful, as it leads the knowledge seeker to the source immediately if they need it. Bots optimize what is available when it comes to the existing data and provide better returns on the effort invested in responses.

3. Bots make it possible to predict high-value knowledge to find

Site hierarchies, search, and bots rate very differently when it comes to finding what you are anticipating. With folder structures, the material is usually well organized, and finding what you want is predictable: if a file was there last week, it is probably still there this week, in the same site, library, or folder. Hierarchies are trustworthy companions that can be used again and again to find data once we know our way around. They may not be simple for the owner to set up and manage, but users find a well-organized information system easy to use and pleasant. With a bot, you strike a good balance: you prescribe the answers to the most wanted queries and provide a solid way back (via links) to the actual sources.

Deciding what to include can be daunting at first. A simple way to start is to combine the top, say, 50 most popular search queries from your intranet’s search analytics with a known list of FAQs per department or category in your organization. You will see most of the bot’s value once you have covered around three-quarters of those queries. To decide what else people want to know, collect any questions the bot could not answer. A bot strikes a balance between knowledge management and information management.

4. Bots force you to always personalize the high-value information

The curation of information is crucial. Your intranet home page may include dynamic content, but inevitably someone with a plan has decided how that content will be presented and what to show or not show. The same goes for knowledge management overall.

Bots offer you a happy middle ground where only the content that is important can be curated. Yeah, keeping records of things that happened seven years ago is valuable, but it’s doubtful you’ll have to see it often. On your site, that kind of file is curated. Search offers organic answers and can provide insights into what’s common through its analytics. But searching will only provide you with the source of the data.

If you want to know about the holiday policy, search will probably return the employee handbook; but to locate the section on time off, you’ll have to sift through that document. By comparison, a curated bot will answer the time-off question directly and link to the employee handbook. The curated reply, rather than the source, is the answer the user was searching for. A curated bot skips the irritating step of having to read, digest, or search further once you have found the source you needed.

5. Bots require minimal user training

Unfortunately, excellent information management has traditionally required a well-trained user base, people who use the knowledge they have acquired to customize information and structure it better. But everybody has their daily job, so no one likes having to take a class to learn something as fundamental as directory structures and searching, required or not. (Although, indeed, they are expected to.)

On the user’s side, it takes a while to discover and understand even a good site and library layout. If you doubt that, ask the newest member of your team how long it took them to grasp how your team’s knowledge is organized. Even the best structures take time to learn, and that time is taken away from other work that could be accomplished.

Searching, one might say, has no learning curve: given an empty box with a magnifying glass beside it, everybody knows what to do. But it is not that simple. Of course, you can use out-of-the-box search that way if you like, but getting the most out of search requires a smart setup with configured refiners, pinned results, and more.

Conclusion

The knowledge management and information management fields are being shaken up by actual, functional AI solutions that can get us past the limitations of site structures and search and link users directly to the data they want: point A to point Z without any stops along the road. A smart implementation process can buy big, immediate wins. Keep these steps in mind when you begin playing with bots in your organization:

· A bot is required. You can spend thousands of dollars and months of development time building one yourself, or you can get up and running with an off-the-shelf bot in a matter of hours. There is also a 30-day free trial to see if it suits your needs.

· Record the x most popular search queries from your search analytics (I settled on 50) and revisit this list every month or so.

· Collect and record the list of common questions from each team, along with any knowledge base they maintain or cheat sheets they use to find relevant details.

· To answer these questions, customize your bots using a platform like Question and Answer Creator. Link to the sources in your replies.

· Collect messages from users that did not get a successful response. Apply a review framework to assess user needs.

· Review your replies on a regular basis to ensure they are specific.

· Delete redundant answers, update records, and observe best practices.

The chatbot system for knowledge management and information management was originally published in Becoming Human: Artificial Intelligence Magazine on Medium, where people are continuing the conversation by highlighting and responding to this story.

Via https://becominghuman.ai/the-chatbot-system-for-knowledge-management-and-information-management-ebedae0cecfd?source=rss—-5e5bef33608a—4

source https://365datascience.weebly.com/the-best-data-science-blog-2020/the-chatbot-system-for-knowledge-management-and-information-management

DOE SMART Visualization Platform 1.5M Prize Challenge

The U.S. Department of Energy’s (DOE) Office of Fossil Energy (FE) will award up to $1.5 million to winning innovators in a prize challenge to support FE’s SMART initiative. Registration deadline to participate in the challenge is 11:59 p.m. EDT Friday, Jan 22, 2021.

Originally from KDnuggets https://ift.tt/3iZ34cg

source https://365datascience.weebly.com/the-best-data-science-blog-2020/doe-smart-visualization-platform-15m-prize-challenge

Optimizing the Levenshtein Distance for Measuring Text Similarity

To speed up the calculation of the Levenshtein distance, this tutorial computes it using a vector rather than a full matrix, which saves a lot of time. We’ll be coding in Java for this implementation.
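The linked tutorial implements this in Java; as a rough illustration of the same idea, here is a minimal Python sketch of the two-row (vector) version of the Levenshtein distance:

def levenshtein(a: str, b: str) -> int:
    # keep only the previous row and the current row instead of the full matrix
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

print(levenshtein("kitten", "sitting"))  # 3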

Originally from KDnuggets https://ift.tt/37bw5iu

source https://365datascience.weebly.com/the-best-data-science-blog-2020/optimizing-the-levenshtein-distance-for-measuring-text-similarity
