9.4 Rule-Based Classification
Rule-based classification is the basic solution for creating an Oracle Text classification application.
The basic steps for rule-based classification are as follows. Specific steps are explored in greater detail in the example.
-
Create a table for the documents to be classified, and then populate it.
-
Create a rule table (also known as a category table). The rule table consists of categories that you name, such as "medicine" or "finance," and the rules that sort documents into those categories.
These rules are actually queries. For example, you define the "medicine" category as documents that include the words "hospital," "doctor," or "disease." Therefore, you would set up a rule in the form of "hospital OR doctor OR disease."
-
Create a CTXRULE index on the rule table.
-
Classify the documents.
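The four steps above can be sketched in SQL as follows. This is a minimal sketch under stated assumptions, not the example from this guide: the table names docs and doc_rules, the index name doc_rules_idx, the "finance" rule, and the sample rows are all hypothetical.

```sql
-- Step 1: create and populate a document table (hypothetical name and columns).
CREATE TABLE docs (
  doc_id NUMBER PRIMARY KEY,
  text   VARCHAR2(4000)
);

INSERT INTO docs VALUES (1, 'The doctor admitted the patient to the hospital.');

-- Step 2: create the rule (category) table.
-- Each row pairs a category name with the query that defines it.
CREATE TABLE doc_rules (
  rule_id  NUMBER PRIMARY KEY,
  category VARCHAR2(100),
  query    VARCHAR2(2000)
);

INSERT INTO doc_rules VALUES (1, 'medicine', 'hospital OR doctor OR disease');
INSERT INTO doc_rules VALUES (2, 'finance',  'bank OR stock OR loan');
COMMIT;

-- Step 3: create a CTXRULE index on the column that holds the rule queries.
CREATE INDEX doc_rules_idx ON doc_rules (query)
  INDEXTYPE IS CTXSYS.CTXRULE;

-- Step 4: classify one document by passing its text to MATCHES.
-- The query returns every category whose rule the text satisfies.
SELECT category
FROM   doc_rules
WHERE  MATCHES(query, 'The doctor admitted the patient to the hospital.') > 0;
```

Note that the index is built on the rule table, not on the document table: the rules are indexed once, and each document is then passed in as the string to match.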
See Also:
"CTXRULE Parameters and Limitations" for information on which operators are allowed for queries
9.4.1 Rule-Based Classification Example
In this example, you gather news articles about different subjects and then classify them. After you create the rules, you index them and then use the MATCHES operator to classify documents.
To classify documents:
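A minimal sketch of such a classification run is shown here. It assumes hypothetical tables news_articles, news_categories, and news_article_categories, hypothetical sample rows, and a hypothetical index name; adapt the names and rules to your own schema.

```sql
-- Hypothetical tables: articles, category rules, and classification results.
CREATE TABLE news_articles (
  article_id NUMBER PRIMARY KEY,
  title      VARCHAR2(1000),
  text       CLOB
);

CREATE TABLE news_categories (
  category_id NUMBER PRIMARY KEY,
  category    VARCHAR2(100),
  query       VARCHAR2(2000)
);

CREATE TABLE news_article_categories (
  article_id  NUMBER,
  category_id NUMBER
);

-- Sample articles and rules (illustrative data only).
INSERT INTO news_articles VALUES
  (1, 'Hospital opens new wing', 'The hospital hired a doctor for each ward.');
INSERT INTO news_articles VALUES
  (2, 'Markets rally', 'The bank raised its stock forecast.');

INSERT INTO news_categories VALUES (1, 'medicine', 'hospital OR doctor OR disease');
INSERT INTO news_categories VALUES (2, 'finance',  'bank OR stock OR loan');
COMMIT;

-- Index the rules.
CREATE INDEX news_categories_idx ON news_categories (query)
  INDEXTYPE IS CTXSYS.CTXRULE;

-- Classify: for each article, find every rule it matches and record the pair.
DECLARE
  v_document CLOB;
BEGIN
  FOR art IN (SELECT article_id, text FROM news_articles) LOOP
    v_document := art.text;
    FOR cat IN (SELECT category_id
                FROM   news_categories
                WHERE  MATCHES(query, v_document) > 0) LOOP
      INSERT INTO news_article_categories
      VALUES (art.article_id, cat.category_id);
    END LOOP;
  END LOOP;
  COMMIT;
END;
/
```

An article can match more than one rule, so the results table stores one row per article and category pair rather than a single category per article.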
9.4.2 CTXRULE Parameters and Limitations
The following considerations apply when you create a CTXRULE index:
-
If you use the SVM_CLASSIFIER classifier, then you may use the BASIC_LEXER, CHINESE_LEXER, JAPANESE_LEXER, or KOREAN_MORPH_LEXER lexers. If you do not use SVM_CLASSIFIER, then you can use only the BASIC_LEXER lexer type to index your query set.
-
Filter, memory, datastore, and [no]populate parameters are not applicable to the CTXRULE index type.
-
The CREATE INDEX storage clause is supported for creating the index on the queries.
-
Wordlists are supported for stemming operations on your query set.
-
Queries for CTXRULE are similar to CONTAINS queries. Basic phrasing ("dog house") is supported, as are the following CONTAINS operators: ABOUT, AND, NEAR, NOT, OR, STEM, WITHIN, and THESAURUS. Section groups are supported for using the MATCHES operator to classify documents. Field sections are also supported; however, CTXRULE does not directly support field queries, so you must use a query rewrite on a CONTEXT query.
-
You must drop the CTXRULE index before exporting or downgrading the database.
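Several of these considerations can be illustrated with a short sketch. It assumes a user with execute privilege on CTX_DDL; the table name pet_rules, the preference names rule_lexer and rule_wordlist, the index name pet_rules_idx, and the sample rule are hypothetical.

```sql
-- Hypothetical rule table whose queries use basic phrasing and the STEM ($) operator.
CREATE TABLE pet_rules (
  rule_id  NUMBER PRIMARY KEY,
  category VARCHAR2(100),
  query    VARCHAR2(2000)
);

INSERT INTO pet_rules VALUES (1, 'pets', 'dog house OR $kennel');
COMMIT;

-- Without SVM_CLASSIFIER, only BASIC_LEXER can be used.
-- A wordlist preference supplies the stemmer for the query set.
BEGIN
  CTX_DDL.CREATE_PREFERENCE('rule_lexer', 'BASIC_LEXER');
  CTX_DDL.CREATE_PREFERENCE('rule_wordlist', 'BASIC_WORDLIST');
  CTX_DDL.SET_ATTRIBUTE('rule_wordlist', 'STEMMER', 'ENGLISH');
END;
/

-- Pass the preferences in the PARAMETERS clause.
-- Filter, memory, datastore, and [no]populate are not specified here
-- because they do not apply to CTXRULE.
CREATE INDEX pet_rules_idx ON pet_rules (query)
  INDEXTYPE IS CTXSYS.CTXRULE
  PARAMETERS ('LEXER rule_lexer WORDLIST rule_wordlist');

-- Before exporting or downgrading the database, drop the index:
-- DROP INDEX pet_rules_idx;
```

The DROP INDEX statement is left commented out so that the sketch does not remove the index it has just created.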
See Also:
-
Oracle Text Reference for more information on lexer and classifier preferences