Creating a Classification Model for Breast Cancer Screening
The UCI Machine Learning Repository contains many complete data sets that can be used to train and test machine learning models. One example is the Breast Cancer Wisconsin (Diagnostic) Data Set, which relates whether a breast tumor is benign or malignant to 10 specific aspects of the tumor. Based on this dataset, we can develop a model that estimates the likelihood that a tumor is benign or malignant.
The process of using machine learning to analyze data is made easy with Knowi Adaptive Intelligence. Given a training dataset, Knowi can apply either classification or regression algorithms to build valuable insights from the data.
Here is a step-by-step guide to turning that data into a machine learning model using Knowi:
1. Create the Workspace and Upload Data
To start the machine learning process, go to www.knowi.com. If you are not already a Knowi user, sign up for a free trial to complete this tutorial. Once logged in, open the machine learning section on the left-hand side of the screen. From there, start a new workspace and choose whether to build a classification or a regression model. The breast cancer example calls for a classification workspace, because the variable being predicted always falls into one of two categories. Next, upload the Breast Cancer Wisconsin (Diagnostic) Data Set.
2. Choose Response Variable and View Full Dataset
After uploading the file, and manipulating it if needed, choose the Attribute to Predict from the drop-down list. For the breast cancer data, the attribute being predicted is the class of the tumor. Once the prediction variable is chosen, start the initial analysis with the Analyze Data button. This displays the data on the screen so that you can scroll through it and look for patterns.
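Outside of Knowi, the idea of choosing an attribute to predict can be sketched in a few lines of Python. This is a minimal illustration, not Knowi's implementation: the rows are made up, and the column names (id, radius, texture, diagnosis) and the helper split_features_and_target are assumptions chosen for the example.

```python
import csv
import io

# Made-up rows for illustration only; these are not records from the UCI file.
SAMPLE_CSV = """id,radius,texture,diagnosis
1001,17.9,10.4,M
1002,12.3,14.1,B
1003,20.1,22.7,M
1004,11.4,18.2,B
"""

def split_features_and_target(csv_text, target_column, id_column="id"):
    """Return (feature_rows, labels) with the target and id columns removed from the features."""
    features, labels = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        labels.append(row.pop(target_column))
        row.pop(id_column, None)  # an identifier carries no predictive signal
        features.append({name: float(value) for name, value in row.items()})
    return features, labels

features, labels = split_features_and_target(SAMPLE_CSV, "diagnosis")
```

Here labels holds the attribute to predict, and features holds the remaining numeric columns that a model would learn from.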
3. Prepare the Data
After analyzing, data preparation begins. Data preparation is an optional, wizard-driven process. Step by step, the program confirms the datatypes of the training set, identifies outliers and allows them to be removed, reports missing data with the option to remove or impute values, allows the data to be rescaled, groups values into discrete bins and, finally, provides the option to create dummy variables. All decisions can be changed at any time by moving backward and forward through the steps.
For the Breast Cancer data, a small amount of rescaling and grouping was necessary to increase accuracy.
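To make the rescaling, binning, and dummy-variable steps concrete, here is a rough sketch of what such transformations commonly look like. It assumes min-max rescaling, equal-width bins, and one-hot encoding; the source does not say which methods Knowi applies, and the function names and numbers are invented for illustration.

```python
def min_max_rescale(values):
    """Rescale numbers linearly to the range [0, 1]."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]  # a constant column has no spread to rescale
    return [(v - low) / (high - low) for v in values]

def equal_width_bins(values, n_bins):
    """Assign each value a bin index from 0 to n_bins - 1 using equal-width intervals."""
    low, high = min(values), max(values)
    if high == low:
        return [0 for _ in values]
    width = (high - low) / n_bins
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - low) / width), n_bins - 1) for v in values]

def dummy_variables(categories):
    """One-hot encode category labels; columns follow sorted label order."""
    levels = sorted(set(categories))
    return [[1 if c == level else 0 for level in levels] for c in categories]

radius = [10.0, 12.5, 15.0, 20.0]  # made-up measurements
scaled = min_max_rescale(radius)
binned = equal_width_bins(radius, 2)
encoded = dummy_variables(["B", "M", "B"])
```

Rescaling keeps features with large numeric ranges from dominating distance-based algorithms, while binning and dummy variables turn continuous or categorical values into forms that some algorithms handle more easily.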
4. Feature Selection
Whether you came in with prepared data or have just finished preparing it, the next step is to select which variables to use in the model. To make this decision, it is essential to look back at the data for patterns and correlations.
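One simple way to look for correlations is to rank each candidate variable by its Pearson correlation with the class label. The sketch below assumes labels encoded as 0 (benign) and 1 (malignant) and uses made-up values; it is one common heuristic, not a description of how Knowi selects features.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column cannot correlate with anything
    return cov / math.sqrt(var_x * var_y)

def rank_features(columns, labels):
    """Sort feature names by absolute correlation with the 0/1 labels, strongest first."""
    scores = {name: pearson(values, labels) for name, values in columns.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)

# Made-up feature columns; labels use 0 for benign and 1 for malignant.
columns = {
    "radius": [10.0, 11.0, 18.0, 20.0],
    "texture": [15.0, 14.0, 15.0, 14.0],
}
labels = [0, 0, 1, 1]
ranking = rank_features(columns, labels)
```

In this toy data, radius tracks the label closely while texture does not, so radius would be the stronger candidate for the model.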
5. Create and Compare the Models
At this point, all that remains is to choose among the available algorithms (Decision Tree, Logistic Regression, K-Nearest Neighbor, or Naive Bayes). Knowi makes it easy to select all of them and compare the results using measures such as accuracy or absolute deviation. Pressing the eye icon next to a model in the results section shows a preview of the input data along with the model's predictions. Next to the eye icon is a plus sign that, when pressed, displays the details of that specific model. It is beneficial to produce many models, tweaking the settings each time, to find the best one for the situation. All past models are saved in the history and can be viewed, compared, and even published.
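The comparison itself boils down to scoring each candidate on data it has not seen. As a rough sketch of that idea, the code below compares a small k-nearest-neighbor classifier against a majority-class baseline by accuracy. The data points are made up, the baseline is not one of the algorithms listed above, and none of this reflects Knowi's internal implementation.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, point, k=3):
    """Predict the majority label among the k training points closest to `point`."""
    by_distance = sorted(zip(train_x, train_y), key=lambda pair: math.dist(pair[0], point))
    nearest = [label for _, label in by_distance[:k]]
    return Counter(nearest).most_common(1)[0][0]

def majority_predict(train_y):
    """Baseline that always predicts the most frequent training label."""
    return Counter(train_y).most_common(1)[0][0]

def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

# Made-up, already rescaled two-feature points for illustration.
train_x = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (3.0, 3.2), (3.1, 2.9)]
train_y = ["benign", "benign", "benign", "malignant", "malignant"]
test_x = [(1.1, 1.0), (3.0, 3.0), (2.9, 3.1), (3.2, 3.0)]
test_y = ["benign", "malignant", "malignant", "malignant"]

results = {
    "k-nearest neighbor": accuracy([knn_predict(train_x, train_y, p) for p in test_x], test_y),
    "majority baseline": accuracy([majority_predict(train_y)] * len(test_x), test_y),
}
best = max(results, key=results.get)
```

Keeping a trivial baseline in the comparison is a useful habit: a model is only worth publishing if it clearly beats always guessing the most common class.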
6. Publish
The last step is publication. To publish a model, press the button next to the plus sign. A prompt to name the model will then be displayed. It is possible to publish as many models as needed from the same data. All published models can be viewed and compared directly in the ‘Published Models’ tab within Machine Learning.
How to Apply a Model to a Query
You have now created a machine learning model that can be applied to any query. To integrate it into a dataset, press ‘Apply Model’ while building a query. This adds a field in which all of the published machine learning models can be selected and used. Pressing the preview button will show the data along with the predictions made by the model.
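Conceptually, applying a model to a query means adding a prediction column to every row the query returns. The sketch below illustrates that idea only; apply_model and threshold_model are hypothetical names, and the threshold rule is an arbitrary stand-in for a published model rather than anything Knowi produces.

```python
def apply_model(rows, predict, output_field="prediction"):
    """Return copies of the query rows with the model's prediction added as a new field."""
    return [{**row, output_field: predict(row)} for row in rows]

def threshold_model(row):
    # Hypothetical stand-in for a published model; the cutoff value is arbitrary.
    return "malignant" if row["radius"] > 15.0 else "benign"

# Made-up query results for illustration.
query_rows = [{"id": 1, "radius": 12.0}, {"id": 2, "radius": 19.5}]
scored = apply_model(query_rows, threshold_model)
```

The original rows are left unchanged, so the same query results can be scored by several models and compared side by side.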
Actions from Insight Made Easy
With those six steps, you have a machine learning model that can be integrated into any workflow to create new visualizations and insights that drive downstream actions. The model can be applied in many ways and tailored to individual needs. Once a model is in place, its results can be used to trigger actions through trigger notifications. A trigger notification is a notification that fires when a certain condition is met. For the breast cancer model, an alert can be set to email a doctor the patient’s information whenever the model finds a tumor to be malignant. This enables more than just insight; it generates action.
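The logic behind such a trigger is a condition checked against each prediction. As a minimal sketch, assuming predictions arrive as dictionaries with patient_id and prediction fields (both hypothetical), the code below builds an email message for every record flagged as malignant. It only constructs the messages and does not send them; build_alerts and the example address are illustrative, not part of Knowi.

```python
from email.message import EmailMessage

def build_alerts(predictions, recipient, trigger_label="malignant"):
    """Return one unsent EmailMessage per record whose predicted label matches the trigger."""
    alerts = []
    for record in predictions:
        if record["prediction"] != trigger_label:
            continue  # condition not met, so no notification for this record
        message = EmailMessage()
        message["To"] = recipient
        message["Subject"] = f"Model flagged patient {record['patient_id']} as {trigger_label}"
        message.set_content(
            f"Patient {record['patient_id']} was classified as {trigger_label}. Please review."
        )
        alerts.append(message)
    return alerts

# Made-up prediction records for illustration.
predictions = [
    {"patient_id": "P-001", "prediction": "benign"},
    {"patient_id": "P-002", "prediction": "malignant"},
]
alerts = build_alerts(predictions, "doctor@example.com")
```

Separating the condition from the delivery step makes it easy to swap email for another channel without changing when the alert fires.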
Summary
Creating a model within Knowi is straightforward, and it starts by simply uploading a dataset. Data can be uploaded from a file, from SQL and NoSQL sources, or through REST APIs. Once the data is uploaded, Knowi offers built-in algorithms, or the option to create your own, along with a dedicated page for reviewing multiple factors and evaluating the best algorithm for your situation. Using this method, the Breast Cancer training data was loaded from the UCI Machine Learning Repository into a Knowi workspace, then analyzed with the built-in data preparation tools. The resulting model was ready to be integrated into any workflow and to perform actions automatically based on the results, such as sending an alert to a doctor depending on the outcome of the test.
References
Dheeru, D., & Karra Taniskidou, E. (2017). UCI Machine Learning Repository. Retrieved from University of California, Irvine, School of Information and Computer Sciences: http://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+%28Diagnostic%29
Knowi. (2017). Adaptive Intelligence for Modern Data. Retrieved from Knowi Website: www.knowi.com