Oracle Data Mining lets you combine text columns with non-text (traditional categorical and numerical) columns of data to build classification, regression, anomaly detection, feature extraction, and k-Means clustering models.
The following topics are discussed:
Text data can be either in a text column in a table or in a transactional table.
A text column is a column with one of the following datatypes: VARCHAR2, CHAR, BFILE, XMLTYPE, URITYPE, BLOB, CLOB, RAW, or LONG RAW. For example, in a medical application, the input table might consist of measurements (numeric values representing temperature, blood pressure, or other measurements) and a text column consisting of physician's comments. You could build models to see if measurements combined with physician's comments predict outcomes more accurately than predictions based on measurements only.
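As a rough illustration of the datatype rule above, the following Python sketch checks whether a declared datatype is one of the datatypes listed for text columns. The function name is_text_datatype and the sample column names are hypothetical and are not part of Oracle Data Miner.

```python
# Datatypes that this section lists as valid for a text column.
TEXT_DATATYPES = frozenset({
    "VARCHAR2", "CHAR", "BFILE", "XMLTYPE", "URITYPE",
    "BLOB", "CLOB", "RAW", "LONG RAW",
})


def is_text_datatype(datatype: str) -> bool:
    """Return True if the datatype can hold a text column.

    Length qualifiers such as VARCHAR2(4000) are ignored.
    """
    base = datatype.split("(", 1)[0]
    # Collapse runs of whitespace so "long  raw" matches "LONG RAW".
    normalized = " ".join(base.upper().split())
    return normalized in TEXT_DATATYPES


# Hypothetical medical input table: measurements plus physician's comments.
columns = {
    "TEMPERATURE": "NUMBER",
    "BLOOD_PRESSURE": "NUMBER",
    "PHYSICIAN_COMMENTS": "CLOB",
}
text_candidates = [name for name, dt in columns.items() if is_text_datatype(dt)]
```

A datatype that passes this check is only a candidate: a VARCHAR2 or CHAR column may hold ordinary categorical values instead of text.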
Text data can also be stored in a transactional table. In this case, the transactional table must be joined to a case table when the activity is defined.
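Conceptually, joining a transactional table to a case table attaches the transactional rows for each case to that case's single row. The sketch below shows the idea in plain Python; the table contents, the CASE_ID and TERM column names, and the helper join_case_with_transactions are all assumptions made for illustration and do not describe how Oracle Data Miner performs the join.

```python
from collections import defaultdict

# Hypothetical case table: one row per case, keyed by CASE_ID.
case_table = [
    {"CASE_ID": 1, "TEMPERATURE": 38.5},
    {"CASE_ID": 2, "TEMPERATURE": 36.9},
]

# Hypothetical transactional table: several rows per case, each carrying a term.
transactions = [
    {"CASE_ID": 1, "TERM": "fever"},
    {"CASE_ID": 1, "TERM": "cough"},
    {"CASE_ID": 2, "TERM": "healthy"},
]


def join_case_with_transactions(cases, txns, key="CASE_ID"):
    """Attach each case's transactional terms to its case row."""
    terms_by_case = defaultdict(list)
    for row in txns:
        terms_by_case[row[key]].append(row["TERM"])
    # Cases without transactional rows get an empty list of terms.
    return [{**case, "TERMS": terms_by_case.get(case[key], [])} for case in cases]


joined = join_case_with_transactions(case_table, transactions)
```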
Oracle Data Miner sets the mining type of columns to text when possible. For example, if the datatype of a column is CLOB, the mining type is text. In some cases, such as when the datatype is VARCHAR2 or CHAR, you must change the mining type to text if the column is a text column.
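The defaulting behavior described here can be sketched as a small lookup. This is a minimal sketch that covers only the datatypes whose default this document states (CLOB defaults to text; VARCHAR2 and CHAR default to categorical and must be changed by hand). The function names are hypothetical, and the sketch returns None for every other datatype rather than guessing.

```python
from typing import Optional


def default_mining_type(datatype: str) -> Optional[str]:
    """Return the default mining type for a datatype, or None if not stated."""
    base = datatype.split("(", 1)[0].strip().upper()
    if base == "CLOB":
        return "text"
    if base in {"VARCHAR2", "CHAR"}:
        # These columns are treated as categorical until the user changes them.
        return "categorical"
    return None  # Default not stated in this section.


def needs_manual_change(datatype: str, holds_text: bool) -> bool:
    """True when a text column would not get the text mining type by default."""
    return holds_text and default_mining_type(datatype) != "text"
```

For example, needs_manual_change("VARCHAR2(2000)", holds_text=True) is true, while a CLOB column needs no change.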
The following Oracle Data Mining algorithms permit text columns:
All other algorithms (Decision Tree, Naive Bayes, and Attribute Importance) do not support text.
In some situations, Oracle Data Miner does not support text mining even though Oracle Data Mining 11g does; these situations are described in Oracle Data Miner Restrictions on Text Mining.
Note that the algorithms that support text mining are the algorithms that support sparse data.
For a discussion of text mining in Oracle Data Mining, see Oracle Data Mining Concepts and Oracle Data Mining Application Developer's Guide. Oracle Data Mining text mining uses the concepts and facilities of Oracle Text. Oracle Text is documented in two manuals: Oracle Text Application Developer's Guide and Oracle Text Reference. See Where to Find More Information to locate these manuals.
Oracle Data Miner allows one column of the input table for a mining activity to be a text column. You can mine tables with two or more text columns using the Oracle Data Mining programmatic interfaces. If you have text columns only, you can use Oracle Data Mining or Oracle Text.
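The one-text-column limit can be expressed as a simple pre-check on the mining types chosen for an activity. The following is a sketch under assumed names (check_text_column_limit and the sample columns are hypothetical); Oracle Data Miner enforces the limit itself.

```python
def check_text_column_limit(mining_types: dict) -> list:
    """Return the text columns, raising if there is more than one.

    mining_types maps a column name to its mining type, for example
    {"AGE": "numerical", "COMMENTS": "text"}.
    """
    text_columns = [name for name, mtype in mining_types.items() if mtype == "text"]
    if len(text_columns) > 1:
        raise ValueError(
            "Only one text column is allowed per mining activity; found: "
            + ", ".join(text_columns)
        )
    return text_columns


# Hypothetical input: one text column passes the check.
ok = check_text_column_limit({"AGE": "numerical", "COMMENTS": "text"})
```

A table that fails this check can still be mined through the Oracle Data Mining programmatic interfaces, as noted above.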
Oracle Data Miner does not support all of the text mining functionality provided by Oracle Data Mining.
The following restrictions apply to text mining using Oracle Data Miner:
text; if the datatype of the column is VARCHAR2 or CHAR, you must change the mining type from categorical to text. If you are building a model that uses any algorithm that does not support text, you cannot change any mining type to text.
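The restriction on changing a mining type to text can be sketched as a guard around the change. The set of algorithms without text support is taken from this document (Decision Tree, Naive Bayes, and Attribute Importance); the function names and the dictionary representation of mining types are assumptions made for illustration.

```python
# Algorithms that this document names as not supporting text.
NO_TEXT_ALGORITHMS = frozenset({
    "decision tree", "naive bayes", "attribute importance",
})


def can_use_text_mining_type(algorithm: str) -> bool:
    """Return True if the algorithm permits a column with mining type text."""
    return " ".join(algorithm.lower().split()) not in NO_TEXT_ALGORITHMS


def set_mining_type_to_text(mining_types: dict, column: str, algorithm: str) -> dict:
    """Return a copy of mining_types with the given column changed to text."""
    if not can_use_text_mining_type(algorithm):
        raise ValueError(f"{algorithm} does not support text columns")
    updated = dict(mining_types)  # Leave the caller's dictionary unchanged.
    updated[column] = "text"
    return updated


# Hypothetical check before changing a VARCHAR2 column from categorical to text.
allowed = can_use_text_mining_type("Naive Bayes")
```

Here allowed is false, so the mining type of the column would have to stay categorical for a Naive Bayes model.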
All text columns must be properly prepared:
Data | Transform | Text lets you prepare a text column for use with the Oracle Data Mining PL/SQL interface. The same transform lets you perform directly the processing that the mining activity does internally.
Copyright © 2006, 2008, Oracle. All rights reserved.