Text Mining Using Two Tables: Simple Join

This example illustrates mining data where the build data is in two tables that must be joined before building the model.

For an overview of text mining, see Text Mining.

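This example uses two tables for input: MINING_DATA_BUILD_V, which serves as the case table, and SH.SUPPLEMENTARY_DEMOGRAPHICS, which supplies the customer comments in its COMMENTS column.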

The SH schema is installed, and access to it is set up, when you install the Data Mining Sample programs, as described in the Oracle Data Mining Administrator's Guide.

This example joins additional data to the case table. For more information about joining additional data, see "Simple Additional Data" in Chapter 3 - Overview of Mining Activity Guides in the Oracle Data Mining Tutorial; the tutorial includes screenshots that illustrate the process.

We want to predict customers for whom the value of AFFINITY_CARD is 1.

Follow these steps to build a classification model:

  1. Select Activity | Build to launch the Model Build Wizard.
  2. In Step 1 (Model Type) of the wizard, select Classification as the Function Type (this is the default), and select Support Vector Machine as the Algorithm.
  3. In Step 2 (Data), set Schema to the schema where the sample data resides, and set Table/View to MINING_DATA_BUILD_V. Check Join additional data with case table. Select CUST_ID as the Unique Identifier. For all other choices, use the defaults. Click Next.

    Note: Oracle Data Mining requires that the data for text mining be in a table, not a view. If you provide a view as input to a Mining Activity, the view is automatically converted to a table.

  4. In Step 3 (Join Additional Data), select SH.SUPPLEMENTARY_DEMOGRAPHICS in the Available Tables list and move it to the Selected Tables list. Click Edit to define the relationship.
  5. The Edit Relationship Window is displayed. In the Key Column Mapping grid, select CUST_ID for both Case Table Column and Related Table Column.
  6. Still in the Edit Relationship Window, select One to One for Relationship Type. In Selected Table Columns deselect all attributes except for COMMENTS; the only data that is needed from SH.SUPPLEMENTARY_DEMOGRAPHICS is user comments. Click OK to close the Edit Relationship Window. Then click Next to finish Step 3 of the Wizard.
  7. In Step 4 (Data Usage), select AFFINITY_CARD as the Target. Change the mining type of SH.SUPPLEMENTARY_DEMOGRAPHICS.COMMENTS to text: Select the COMMENTS row, click in the Mining Type column for COMMENTS and select text from the dropdown menu.

    Note: You must change the mining type from categorical to text for text mining to take place. If the dropdown menu does not appear, you have selected an algorithm that does not support text mining.

    Click OK.
  8. In Step 5 (Select Preferred Target Value), accept the default (1).
  9. In Step 6 (Activity Name), name the activity DEMO_TEXT_JOIN.
  10. Click Finish to create the activity.
  11. The activity DEMO_TEXT_JOIN is displayed. Note that the activity has all of the steps of an activity that builds a Support Vector Machine model plus the steps Text and Test(Text). In these additional steps, Oracle Data Miner does all of the processing required to prepare the text column for mining. The Options for these steps support advanced text mining features such as customized stoplists.
  12. Run the activity. After it completes, you can examine the model and the test results. Click Results in the Test Metrics step; Predictive Confidence is approximately 71%. To see whether the user comments add predictive value, compare this result with that of a Support Vector Machine classification model built from input that does not contain the comments. An SVM model built on MINING_DATA_BUILD_V alone, which does not contain the user comments, has a Predictive Confidence of approximately 60%.
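
The relationship defined in steps 4 through 6 (a one-to-one join on CUST_ID that keeps only COMMENTS from the related table) can be pictured as an ordinary SQL join. The following is a minimal sketch, not what Oracle Data Miner actually runs: it uses Python's built-in sqlite3 module with an in-memory database instead of Oracle, the rows and the EXTRA_COL column are invented for illustration, the SH schema prefix is dropped, and an inner join is assumed.

```python
import sqlite3


def make_toy_db():
    """Create an in-memory database with two small stand-in tables.

    All rows, and the EXTRA_COL column, are made up for illustration;
    they are not the real sample data.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE MINING_DATA_BUILD_V (
            CUST_ID INTEGER PRIMARY KEY,
            AFFINITY_CARD INTEGER
        );
        CREATE TABLE SUPPLEMENTARY_DEMOGRAPHICS (
            CUST_ID INTEGER PRIMARY KEY,
            EXTRA_COL TEXT,
            COMMENTS TEXT
        );
        INSERT INTO MINING_DATA_BUILD_V VALUES (101, 1), (102, 0), (103, 1);
        INSERT INTO SUPPLEMENTARY_DEMOGRAPHICS VALUES
            (101, 'x', 'Happy with the card'),
            (102, 'y', 'No comment'),
            (103, 'z', 'Would like more offers');
        """
    )
    return conn


def join_comments(conn):
    """Join the case table to the related table one-to-one on CUST_ID.

    Only COMMENTS is taken from the related table, mirroring the
    deselection of all other attributes in the Edit Relationship Window.
    An inner join is an assumption of this sketch.
    """
    query = """
        SELECT b.CUST_ID, b.AFFINITY_CARD, s.COMMENTS
        FROM MINING_DATA_BUILD_V AS b
        JOIN SUPPLEMENTARY_DEMOGRAPHICS AS s
          ON s.CUST_ID = b.CUST_ID
        ORDER BY b.CUST_ID
    """
    return conn.execute(query).fetchall()


rows = join_comments(make_toy_db())
for row in rows:
    print(row)
```

Each result row holds the case table columns plus the single COMMENTS column, which is the column whose mining type is changed to text in step 7.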