Applying Darwinian Evolution to feature selection with Kydavra GeneticAlgorithmSelector

Machine learning involves a lot of maths. But during the feature selection phase, maths sometimes can’t give an exact answer (because of the structure of the data, its source, and many other causes). That is when programming tricks come into play, mostly brute-force methods :).

Genetic algorithms are a family of algorithms inspired by biological evolution. They repeat a cycle of crossing, mutating, and testing candidates, gradually developing the best combination of states according to a scoring metric. So, let’s get to the code.
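To make the cross, mutate, try cycle concrete, here is a minimal sketch in plain Python. It is not Kydavra’s implementation: the function evolve, the scoring function toy_score and the set USEFUL are all invented for illustration, and a real selector would score each feature mask by training a model.

```python
import random

def evolve(n_features, score, nb_children=4, nb_generation=50, seed=0):
    """Toy genetic search over feature masks (lists of booleans)."""
    rng = random.Random(seed)
    # Start from a random population of feature masks.
    population = [[rng.random() < 0.5 for _ in range(n_features)]
                  for _ in range(nb_children)]
    for _ in range(nb_generation):
        offspring = []
        for _ in range(nb_children * 2):
            mother, father = rng.sample(population, 2)
            # Cross: take each gene from one of the two parents.
            child = [rng.choice(pair) for pair in zip(mother, father)]
            # Mutate: flip one randomly chosen gene.
            i = rng.randrange(n_features)
            child[i] = not child[i]
            offspring.append(child)
        # Try: score parents and offspring, keep the best nb_children.
        ranked = sorted(population + offspring, key=score, reverse=True)
        population = ranked[:nb_children]
    return population[0]

# Hypothetical scoring function: features 0, 2 and 5 are "useful",
# every other selected feature costs a small penalty.
USEFUL = {0, 2, 5}

def toy_score(mask):
    chosen = {i for i, keep in enumerate(mask) if keep}
    return len(chosen & USEFUL) - 0.1 * len(chosen - USEFUL)

best = evolve(n_features=8, score=toy_score)
print([i for i, keep in enumerate(best) if keep], toy_score(best))
```

Because the parents compete with their offspring in every generation, the best score in this sketch never gets worse from one generation to the next.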

Using GeneticAlgorithmSelector from the Kydavra library.

To install Kydavra, just run the following command in the terminal:

pip install kydavra

Now you can import the selector and apply it to your dataset as follows:

from kydavra import GeneticAlgorithmSelector
selector = GeneticAlgorithmSelector()
new_columns = selector.select(model, df, 'target')

As with every Kydavra selector, that’s all. Now let’s try it on the Heart Disease dataset.

import pandas as pd
df = pd.read_csv('cleaned.csv')

I highly recommend shuffling your dataset before applying the selector, because it evaluates feature combinations with a scoring metric (and right now cross_val_score isn’t implemented in this selector).

df = df.sample(frac=1).reset_index(drop=True)
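Why shuffling matters can be shown with a small sketch in plain Python. The article does not say how the selector splits the data, so the tail_holdout function below is only an assumed evaluation scheme, and the labels are invented. If rows are stored sorted by target and the last rows become the test set, training never sees one of the classes.

```python
import random

# Invented labels stored in sorted order: 80 healthy (0), then 20 sick (1).
labels = [0] * 80 + [1] * 20

def tail_holdout(rows, n_test=20):
    """Assumed evaluation scheme: the last n_test rows become the test set."""
    return rows[:-n_test], rows[-n_test:]

# Without shuffling, the training part contains only class 0.
train_sorted, test_sorted = tail_holdout(labels)
print(set(train_sorted), set(test_sorted))

# After shuffling, the classes are mixed across both parts.
shuffled = labels[:]
random.Random(0).shuffle(shuffled)
train_shuffled, test_shuffled = tail_holdout(shuffled)
print(set(train_shuffled), set(test_shuffled))
```

On a data frame, df.sample(frac=1) plays the same role as the shuffle call here.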

Now we can apply our selector. Note that it has some parameters:

  • nb_children (int, default = 4): the number of best children that the algorithm will choose for the next generation.
  • nb_generation (int, default = 200): the number of generations that will be created; technically speaking, the number of iterations.
  • scoring_metric (sklearn scoring metric, default = accuracy_score): the metric used to select the best feature combination.
  • max (boolean, default = True): if set to True, the algorithm selects the combinations with the highest scores; if False, the combinations with the lowest scores are chosen.
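The roles of nb_children and max can be sketched as a ranking step. The keep_best helper below is hypothetical, not Kydavra’s code, and the candidate scores are invented; its maximize argument stands in for the selector’s max parameter.

```python
def keep_best(scored, nb_children=4, maximize=True):
    """Return the nb_children feature sets with the best scores.

    scored: list of (feature_list, score) pairs.
    maximize: True when a higher score is better (e.g. accuracy),
              False when a lower score is better (e.g. an error).
    """
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=maximize)
    return [features for features, _ in ranked[:nb_children]]

# Invented candidates and scores, for illustration only.
candidates = [
    (["age", "sex"], 0.71),
    (["age", "cp"], 0.80),
    (["cp", "thal"], 0.83),
    (["sex", "fbs"], 0.65),
    (["age", "thal"], 0.78),
]

print(keep_best(candidates, nb_children=2, maximize=True))
# [['cp', 'thal'], ['age', 'cp']]
print(keep_best(candidates, nb_children=2, maximize=False))
# [['sex', 'fbs'], ['age', 'sex']]
```

Setting max to False would make sense for a metric where lower is better, such as an error measure.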

For now, we will keep the default settings except for scoring_metric: this is a disease diagnosis problem, so it is better to use precision instead of accuracy.
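To see how the two metrics can disagree, here is a small hand-rolled sketch. The labels are invented, and the two functions only mirror the usual definitions for binary labels: accuracy is the share of correct predictions, precision is TP / (TP + FP).

```python
def accuracy(y_true, y_pred):
    """Share of all predictions that are correct."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def precision(y_true, y_pred):
    """Share of predicted positives that are truly positive: TP / (TP + FP).

    Assumes at least one positive prediction, otherwise this divides by zero.
    """
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    return tp / (tp + fp)

# Invented labels for ten patients: 1 = disease, 0 = healthy.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 1, 0, 0, 0, 0, 0]

print(accuracy(y_true, y_pred))   # 0.7
print(precision(y_true, y_pred))  # 0.5
```

Here 7 of 10 predictions are correct, yet only half of the patients flagged as sick actually are, which accuracy alone would hide.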

from kydavra import GeneticAlgorithmSelector
from sklearn.metrics import precision_score
from sklearn.ensemble import RandomForestClassifier
selector = GeneticAlgorithmSelector(scoring_metric=precision_score)
model = RandomForestClassifier()

So now let’s find the best features. GAS (short for GeneticAlgorithmSelector) needs an sklearn model to train while choosing features, the data frame itself and, of course, the name of the target column:

selected_cols = selector.select(model, df, 'target')

Now let’s evaluate the result. Before feature selection, the precision score of the Random Forest was 0.805. GAS chose the following features:

['age', 'sex', 'cp', 'fbs', 'restecg', 'exang', 'slope', 'thal']

These features gave a precision score of 0.823, which is a good result, given that in the majority of cases it is very hard to raise the scoring metric.

If you want to find out more about genetic algorithms, there are some useful links at the bottom of the article. If you tried Kydavra and have any issues or feedback, please contact me on Medium or fill in this form.

Made with ❤ by Sigmoid

Useful links:

Applying Darwinian Evolution to feature selection with Kydavra GeneticAlgorithmSelector was originally published in Becoming Human: Artificial Intelligence Magazine on Medium, where people are continuing the conversation by highlighting and responding to this story.

Via https://becominghuman.ai/applying-darwinian-evolution-to-feature-selection-with-kydavra-geneticalgorithmselector-f94c885a9ea7?source=rss----5e5bef33608a---4

source https://365datascience.weebly.com/the-best-data-science-blog-2020/applying-darwinian-evolution-to-feature-selection-with-kydavra-geneticalgorithmselector

Published by 365Data Science
