Mining Activities

A mining activity guides you step by step through building or applying a model.

Kinds of Activities

There are three kinds of mining activities: build, test, and apply.

You can create one or more mining activities at any time during an Oracle Data Miner session.

Create a mining activity using the Activity menu.

Build Activity

To create a build activity, select Activity | Build. The wizard collects the following information: the mining activity type, the data to be used for building the model, and the name of the activity. The data used for model building can be either a single table or view or a table or view created by joining tables or views to a base table or view.

If you select either classification or clustering, there is a choice of algorithms; for example, if you select the classification Function Type, you can select one of the following algorithms: Decision Tree (the default), Naive Bayes, Logistic Regression (GLM), or Support Vector Machine. Once you select an algorithm for an activity, you cannot change it. You can also specify which attributes of the case table to include in the model. If the algorithm that you selected requires a target, you must specify a target and target value.

When you click Finish, the wizard creates a new mining activity and displays the activity and its steps in the right pane. For example, a Naive Bayes mining activity has the following steps: Sample, Discretize, Split, Build, and Test Metrics. Not all steps are required; the Sample step is always optional. The defaults in each step are the appropriate ones for building a Naive Bayes model.

Test Activity

A model is tested using a data set for which the target is known. The model is applied to the data, the predicted values are compared to the known values, and various statistics are calculated to describe the accuracy of the model.

You can create test activities for classification and regression models only.

The wizard collects the following information: the name of the activity, the table or view containing the test data, and the model to be tested. The model is identified by either the build activity used to create it or by name, if it was not created through a build activity.

Before you can test a model, the data used for testing must be prepared in the same way as the data used to build the model. For example, if the data used to build the model was discretized (binned), the new data must be discretized in the same way. If you are testing a model that was created using a build activity, the test activity automatically performs the data preparation of the new data based on the data preparation of the build activity. If you are testing a model that was not created through a build activity, you must ensure that the new data is correctly prepared. Data preparation can include indexing text columns, discretization, and normalization. The test activity applies the model to the prepared data, compares the predicted values with the actual values, and summarizes the results. For example, a Naive Bayes mining test activity for a model that was created using a build activity has two steps, Discretize and Test Metrics. The discretization is automatically done in the same way that it was done to build the model.
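
The requirement that test data be binned in the same way as the build data can be pictured with a small Python sketch. It assumes equal-width binning, which is only one possible discretization scheme; fit_bins and discretize are hypothetical helper names, not Oracle Data Miner functions. The point is that the bin boundaries are derived once from the build data and then reused unchanged on the test data.

```python
from bisect import bisect_right

def fit_bins(values, n_bins):
    """Derive equal-width bin boundaries from the build data."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    # Interior boundaries only; values outside the build range fall
    # into the first or the last bin.
    return [lo + width * i for i in range(1, n_bins)]

def discretize(values, boundaries):
    """Assign each value a bin number using previously derived boundaries."""
    return [bisect_right(boundaries, v) for v in values]

build_ages = [20, 30, 40, 50, 60]          # hypothetical build data
boundaries = fit_bins(build_ages, 4)       # [30.0, 40.0, 50.0]

test_ages = [18, 35, 50, 75]               # hypothetical test data
test_bins = discretize(test_ages, boundaries)
print(test_bins)                           # [0, 1, 3, 3]
```

Recomputing the boundaries from the test data instead would give the bin numbers a different meaning from the one the model was built with.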

Apply Activity

A model is applied to new data to predict the behavior of that data; that is, the model is used to score the new data. Note that not all models can be applied.

You can create an apply activity for an anomaly detection, classification, clustering, feature extraction, or regression model.

To create an apply activity, select Activity | Apply.

The wizard collects the following information: the name of the activity, the data to score, and the model to apply. The data to score can be either a single table or view or a table or view created by joining tables or views to a base table or view. The model is identified by either the build activity used to create it or by name if it is a model that was not created through a build activity.

Before you can apply a model to new data, the new data must be prepared in the same way as the data used to build the model. For example, if the data used to build the model was normalized, the new data must be normalized in the same way. If you are applying a model that was created using a build activity, the apply activity automatically performs the data preparation of the new data based on the data preparation of the build activity. If you are applying a model that was not created through a build activity, you must ensure that the new data is correctly prepared. Data preparation can include indexing text columns, discretization, and normalization. The apply activity applies the model to the prepared data. For example, a Naive Bayes Mining apply activity for a model that was created using a build activity has two steps, Discretize and Apply. The discretization is done automatically in the same way that it was done to build the model.
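
The normalization case can be sketched in the same spirit. The Python example below assumes min-max normalization, which may not be the scheme used by a given build activity; fit_min_max and normalize are hypothetical names. The minimum and maximum are taken from the build data and reused when the new data is scored.

```python
def fit_min_max(build_values):
    """Record the range of the build data; these are the normalization parameters."""
    return min(build_values), max(build_values)

def normalize(values, lo, hi):
    """Scale values with the build-time range, not the range of the new data."""
    return [(v - lo) / (hi - lo) for v in values]

build_income = [10, 20, 40, 50]            # hypothetical build data
lo, hi = fit_min_max(build_income)

new_income = [10, 30, 50, 70]              # hypothetical data to score
scored_input = normalize(new_income, lo, hi)
print(scored_input)                        # [0.0, 0.5, 1.0, 1.5]
```

The last value lies outside the build range and therefore maps to a number greater than 1; how such values are handled in practice is not covered by this sketch.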

Steps in an Activity

A mining activity is a collection of steps. The steps are displayed in an activity display. You can start an activity, interrupt it at any time between the steps, and finish it at a later time. Steps must be performed in order, but steps can be skipped. Activity progress is maintained by keeping track of steps that have been completed or skipped.

An activity has two kinds of steps: optional steps and required steps. An optional step can be skipped; for example, Sample is an optional step by default. If you execute an activity without performing an optional step, that step is grayed out. A required step has a check mark in the checkbox next to its name. You can change a step from required to optional by clicking the checkbox next to the name of the step.
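
One way to picture the bookkeeping described above is the following Python sketch. It is a simplified model, not the Oracle Data Miner implementation: the Step class, the status strings, and run_activity are invented for illustration. Steps are visited from top to bottom, unchecked steps are marked as skipped, and steps that already have a status are left alone.

```python
from dataclasses import dataclass

@dataclass
class Step:
    name: str
    checked: bool = True       # checked = required, unchecked = optional
    status: str = "Pending"    # "Pending", "Completed", or "Skipped"

def run_activity(steps):
    """Walk the steps in order, skipping the unchecked ones."""
    for step in steps:
        if step.status != "Pending":
            continue           # progress is kept; finished steps are not rerun
        step.status = "Completed" if step.checked else "Skipped"
    return [(step.name, step.status) for step in steps]

steps = [
    Step("Sample", checked=False),   # optional by default
    Step("Discretize"),
    Step("Split"),
    Step("Build"),
    Step("Test Metrics"),
]
result = run_activity(steps)
print(result[0])    # ('Sample', 'Skipped')
print(result[1])    # ('Discretize', 'Completed')
```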

Omitting steps such as discretization or normalization may have a significant impact on the model.

The steps in an activity have carefully selected defaults. These defaults were chosen to return good results in most cases. An activity also restricts choices to avoid errors. You can override the defaults: to see the options for a step, click Options; you can change these options for a step that has not run.

Each step invokes a wizard to do the required work. To start the wizard, click Run in the step. For example, if you click Run in the Split step, the Split Transformation wizard starts. Exactly which wizard is launched depends on the activity and the options for the activity. For example, the Sample step invokes either the Sample wizard or the Stratified Sample wizard, depending on the advanced options for the activity. To complete the step, go through the wizard steps. Running steps individually gives you the maximum amount of control over the options in the step.

Some steps, such as Build, spawn a task. The spawned task is displayed on the Server tab in the Activity Tasks list. To monitor the executing task, right-click the task name (which is the activity name) in the tasks list and select View Task. The task can also be stopped.

After a step is completed, the output of the step (new table or view, model, test metrics, etc.) is displayed. Completed steps are marked with a check mark and the phrase Completed; skipped steps are marked with a "Skipped" icon and the phrase Skipped.

If a step fails, the step is marked Failed. Click Failed to see the error message. You can reset a failed step, make changes to options, and execute it again.

A completed step can be reset to its uncompleted state by clicking Reset. See Reset Steps for details.

All created mining activities are displayed in the Oracle Data Mining navigator tree; to see a list of defined activities, expand the Mining Activity node. The activities are listed according to model type (Attribute Importance, Association, Classification, Regression, Feature Extraction, or Clustering). To view a particular activity, click its name. To delete an activity, right-click the name of the activity, and select Delete from the context menu. If you right-click a build activity, you can create an apply activity for the model by selecting Apply Activity from the context menu; you can create a test activity for the model by selecting Test Activity from the context menu. If a model cannot be applied or tested, these selections do not appear on the context menu.

Reset Steps

You can reset steps that have already been executed. If you click Reset for a particular step, all succeeding steps are also reset.

For example, suppose that you decide that the binning in a classification model build was not correct. Click Reset for the Discretize step. All later steps are automatically reset. Make changes to binning by changing the options or run the step individually. Then click Run Activity to execute all reset steps.
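
The cascading behavior of Reset can be sketched in Python as follows. This is an illustrative model only; reset_step and the status strings are hypothetical and do not correspond to an Oracle Data Miner API. Resetting a step returns that step and every step after it to the pending state, while earlier steps keep their status.

```python
def reset_step(statuses, step_name):
    """Return a copy of statuses with step_name and all later steps reset."""
    names = list(statuses)               # dicts preserve insertion order
    first = names.index(step_name)
    return {
        name: ("Pending" if i >= first else status)
        for i, (name, status) in enumerate(statuses.items())
    }

statuses = {
    "Sample": "Skipped",
    "Discretize": "Completed",
    "Split": "Completed",
    "Build": "Completed",
    "Test Metrics": "Completed",
}
after_reset = reset_step(statuses, "Discretize")
print(after_reset["Sample"])    # Skipped (earlier step is untouched)
print(after_reset["Build"])     # Pending (later step is reset too)
```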

Execute an Activity

The steps of an activity must be executed in order; the first step is the one closest to the top of the window. Steps can be skipped.

There are several ways to execute an activity:

You can change options for a step that has not been executed. Just click Options and make the changes. If a step has executed, then you can view the options. If you need to change options for a step that has executed, you must reset the step.

To execute an unchecked step that was skipped, such as Sample, proceed as follows:

  1. Reset any steps that come after the step by clicking Reset in the first completed step after the unchecked step. This resets the step and all subsequent steps.
  2. Click the checkbox for the step, change options if necessary, and execute either the step or the entire activity.

Stop an Activity

When an activity is running, the Run Activity button turns into a Stop button. To stop the activity, click Stop. The activity terminates at the next opportunity, that is, it completes the currently running step. You can restart the activity by clicking Run Activity again. Execution picks up from where you left off.
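
The stop-and-resume behavior can be modeled with a flag that is checked only between steps. The Python sketch below is an assumption about how such behavior could be structured, not a description of Oracle Data Miner internals; the Activity class and the stop_during_split callback are hypothetical.

```python
class Activity:
    def __init__(self, step_names):
        self.pending = list(step_names)
        self.completed = []
        self.stop_requested = False

    def stop(self):
        """Ask the activity to stop at the next opportunity."""
        self.stop_requested = True

    def run(self, work=None):
        """Run pending steps in order until done or until a stop is requested."""
        self.stop_requested = False
        while self.pending and not self.stop_requested:
            name = self.pending[0]
            if work is not None:
                work(self, name)     # the step's work; a stop may arrive here
            # The running step always finishes before the flag is checked.
            self.completed.append(self.pending.pop(0))

def stop_during_split(activity, step_name):
    """Simulate the user clicking Stop while the Split step is running."""
    if step_name == "Split":
        activity.stop()

activity = Activity(["Discretize", "Split", "Build", "Test Metrics"])
activity.run(work=stop_during_split)
first_pass = list(activity.completed)
print(first_pass)            # ['Discretize', 'Split']

activity.run()               # picks up where the first run left off
print(activity.completed)    # ['Discretize', 'Split', 'Build', 'Test Metrics']
```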

Output of an Activity

Some steps generate output data or results when they complete. For example, the Split step generates two tables or views, one for build and one for test; the Build step generates output data and results (the model). To view output or results, click the link.
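
As an illustration of a step that produces two outputs, the following Python sketch splits a set of rows into a build set and a test set. The source does not say how the Split step selects rows, so the random shuffle, the 60/40 proportion, and the name split_rows are all assumptions made for this example.

```python
import random

def split_rows(rows, build_percent=60, seed=0):
    """Randomly divide rows into a build set and a test set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)    # seeded so the split is repeatable
    n_build = len(shuffled) * build_percent // 100
    return shuffled[:n_build], shuffled[n_build:]

rows = list(range(1, 11))                    # hypothetical case IDs 1..10
build_rows, test_rows = split_rows(rows)
print(len(build_rows), len(test_rows))       # 6 4
```

Every row ends up in exactly one of the two sets, so the model is never tested on a case that was used to build it.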

Results Viewers and Transparency

Oracle Data Miner includes results viewers that let you view models, test results, and apply output.

The results viewers provide transparency for the results, that is, they display results in terms of the input data. Models are built using transformed data, not the original data. For example, if the data is normalized before the model is built, raw model results are in terms of the normalized data. The results viewer "un-normalizes" the data, so that model results are expressed in terms of the original data.
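
The idea of "un-normalizing" can be shown with a one-line inverse transformation. The Python sketch below assumes min-max normalization and uses made-up numbers; unnormalize is a hypothetical helper, not part of the results viewers.

```python
def unnormalize(value, lo, hi):
    """Map a min-max normalized value back to the original scale."""
    return value * (hi - lo) + lo

# Hypothetical range of an INCOME attribute in the original build data.
income_lo, income_hi = 20000, 200000

# A raw model result expressed in normalized terms, for example a split
# point or a centroid coordinate.
normalized_result = 0.25

original_units = unnormalize(normalized_result, income_lo, income_hi)
print(original_units)    # 65000.0
```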

Special Classification and Regression Functionality

Certain kinds of models support additional functionality, as follows:

Generate PL/SQL Code for an Activity

You can generate PL/SQL code for a successfully completed activity. For more information, see Oracle Data Miner PL/SQL Code Generator.