Text Mining Using Two Tables: Complex Join

This example illustrates mining data where the build data is in two tables that must be joined before building the model.

For an overview of text mining, see Text Mining.

This example uses two tables for input:

Note: The table CUSTOMER_COMMENTS is not available; you cannot create an activity that can be run.

This example adds complex additional data to a table. For more information about adding complex additional data, see "Complex Additional Data" in Chapter 3 - Overview of Mining Activity Guides in the Oracle Data Mining Tutorial; the tutorial includes screen shots that illustrate the process.

The table CUSTOMER_COMMENTS has the following structure:

COMMENTS are customer comments. There will be several different comments for each customer.

We want to predict customers for whom the value of AFFINITY_CARD is 1.

These are the steps to build a classification model:

  1. Select the table MINING_DATA_BUILD_V. This is a view used by the Oracle Data Mining sample programs.
  2. Select Activity | Build to launch the Model Build Wizard.
  3. In Step 1 (Model Type) of the wizard, select Classification as the Function Type (this is the default), and select Support Vector Machine as the Algorithm.
  4. In Step 2 (Data), verify that Schema is the schema where the sample data resides, and Table/View is MINING_DATA_BUILD_V. Check Join additional data with case table. Select CUST_ID as the Unique Identifier. For all other choices, use the defaults. Click Next.
  5. In Step 3 (Join Additional Data), select CUSTOMER_COMMENTS in the Available Tables list and move it to the Selected Tables list. Click Edit to define the relationship.
  6. The Edit Relationship Window is displayed. In the Key Column Mapping grid, select CUST_ID for both Case Table Column and Related Table Column.
  7. Still in the Edit Relationship Window, select One to Many for Relationship Type; you select One to Many because there are several comments for each customer. In Transactional Column Mappings, click New to define a new mapping.
  8. The New Mapping Window opens. Select COMMENTS as the Value Column. Change the Mining Type to text. Verify that Data is Sparse is checked. Accept all other defaults.

    Note:  You must change the mining type from categorical to text for text mining to take place. If the dropdown menu does not appear, you have selected an algorithm that does not support text mining.

  9. Click OK to close the New Mapping Window. Click OK to close the Edit Relationship Window. Finally, click Next to finish Step 3 of the Wizard.
  10. In Step 4 (Data Usage), select AFFINITY_CARD as the Target. Note that the Mining Type of COMMENTS is text. Click OK.
  11. In Step 5 (Select Preferred Target Value), accept the default (1).
  12. In Step 6 (Activity Name), name the activity DEMO_TEXT_COMPLEX_JOIN.
  13. Click Finish to create the activity.
  14. The activity DEMO_TEXT_COMPLEX_JOIN is displayed. Note that the activity has all of the steps of an activity that builds a Support Vector Machine model plus the steps Text and Test(Text). In these additional steps, Oracle Data Miner does all of the processing required to prepare the text column for mining. The Options for these steps support advanced text mining features such as customized stoplists.