Basic Text Mining

This example illustrates mining data that contains one column of text data. For an overview of text mining, see Text Mining.

To view the data, go to the schema for the data mining user account. Locate the table MINING_BUILD_TEXT in Data Sources and double-click it. Note that MINING_BUILD_TEXT is listed among the tables.

MINING_BUILD_TEXT is essentially MINING_DATA_BUILD_V converted to a table, with a new column, COMMENTS, that contains customer comments.

The goal is to predict which customers have an AFFINITY_CARD value of 1.

Follow these steps to build a classification model:

  1. Select Activity | Build to launch the Model Build Wizard.
  2. In Step 1 (Model Type) of the wizard, select Classification as the Function Type (this is the default), and select Support Vector Machine as the Algorithm.
  3. In Step 2 (Data), select the schema where MINING_BUILD_TEXT resides as Schema, MINING_BUILD_TEXT as the Table/View, and CUST_ID as the Unique Identifier. For all other choices, use the defaults.
  4. In Step 3 (Data Usage), select AFFINITY_CARD as the Target. Then change the Mining Type of COMMENTS to text: select the COMMENTS row, click in its Mining Type column, and select text from the dropdown menu.

    Note:  You must change the mining type from categorical to text for text mining to take place. If the dropdown menu does not appear, you have selected an algorithm that does not support text mining.

  5. In Step 4 (Select Preferred Target Value), accept the default (1).
  6. In Step 5 (Activity Name), name the activity DEMO_TEXT.
  7. Click Finish to create the activity.
  8. The activity DEMO_TEXT is displayed. Note that the activity has all of the steps of an activity that builds a Support Vector Machine model plus the steps Text and Test(Text). In these additional steps, Oracle Data Miner does all of the processing required to prepare the text column for mining. The Options for these steps support advanced text mining features such as customized stoplists.
  9. Run the activity. After it completes, you can examine the model and the test results. Click Results in the Test Metrics step; Predictive Confidence is approximately 71%. To see whether the customer comments add information, compare these test results with those of a Support Vector Machine classification model built on input that does not contain the comments. If you build an SVM model using MINING_DATA_BUILD_V, which does not contain a COMMENTS column, the Predictive Confidence is approximately 60%.
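
Step 8 mentions that the Text steps prepare the text column for mining and that their Options support customized stoplists. As a rough illustration of what that kind of preparation involves in general, the following Python sketch lowercases each comment, splits it into word tokens, drops the terms in a stoplist, and counts the remaining terms. It is not Oracle Data Miner's implementation: the function names, the stoplist contents, and the sample comments are all hypothetical and chosen only for this example.

```python
import re
from collections import Counter

# Hypothetical stoplist for illustration only; not Oracle's default stoplist.
STOPLIST = {"the", "a", "an", "and", "is", "it", "to", "of", "i", "my", "for"}


def tokenize(comment, stoplist=STOPLIST):
    """Lowercase a comment, split it into word tokens, and drop stoplist terms."""
    tokens = re.findall(r"[a-z0-9']+", comment.lower())
    return [t for t in tokens if t not in stoplist]


def term_counts(comments, stoplist=STOPLIST):
    """Return one Counter of term frequencies per comment."""
    return [Counter(tokenize(c, stoplist)) for c in comments]


# Made-up comments standing in for values of the COMMENTS column.
comments = [
    "The affinity card is great and I use it a lot",
    "I do not shop here often",
]

for counts in term_counts(comments):
    print(dict(counts))
```

Passing a different set as the stoplist argument changes which terms survive, which is the general idea behind customizing a stoplist: terms that carry no predictive signal are removed before the model sees them.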