DBMS_SEARCH Package

16 DBMS_SEARCH Package

The DBMS_SEARCH PL/SQL package provides procedures and functions to create, manage, and query a ubiquitous search index.

Instead of creating various individual indexes or manually defining the USER_DATASTORE or MULTI_COLUMN_DATASTORE procedures along with materialized views, you can create a ubiquitous search index to automatically synthesize multiple tables or views into a single index. This index lets you perform full-text, vector-based semantic search, hybrid text-vector search and range-based searches across multiple objects within an entire schema.

Note:

In addition to the examples provided for each of these APIs, you can run various end-to-end example scenarios, as demonstrated in Oracle Text Application Developer's Guide.

Name	Description
CREATE_INDEX	Creates a ubiquitous search index.
ADD_SOURCE	Adds a table, view, or Duality view to the index as data source.
REMOVE_SOURCE	Removes a data source and all its associated data from the index.
DROP_INDEX	Removes the index and all its associated data from the database.
GET_DOCUMENT	Returns a virtual indexed JSON document for the specified source metadata.
FIND	Retrieves a hitlist, and facets an aggregations of JSON documents based on the specified filter conditions.

16.1 CREATE_INDEX

The DBMS_SEARCH.CREATE_INDEX procedure creates a ubiquitous search index (or DBMS_SEARCH index) to perform full-text and range-based searches across multiple schema objects.

Syntax

The index type is a JSON search index enabled with a predefined set of preferences and settings to perform full text searches on tables, views, and Duality views.

DBMS_SEARCH.CREATE_INDEX(
    index_name       VARCHAR2,
    tablespace       VARCHAR2 DEFAULT NULL,
    datatype         VARCHAR2 DEFAULT NULL,
    lexer            VARCHAR2 DEFAULT NULL,
    stoplist         VARCHAR2 DEFAULT NULL,
    wordlist         VARCHAR2 DEFAULT NULL,
    vectorizer       VARCHAR2 DEFAULT NULL
);

Note:

The DBMS_SEARCH.CREATE_INDEX procedure has been upgraded in version 23.9 to support lexer, stoplist, wordlist, and vectorizer parameters. These parameters are not available for use in earlier release versions. Additionally, starting from version 23.9 DBMS_SEARCH index will not get created with default indexing preference, WILDCARD_INDEX.

index_name

Specify the name of the DBMS_SEARCH index to create. You can also specify the schema owner name along with the index name as:

[schema].index_name

tablespace

Specify the name of the tablespace to contain the index or index partitions.

datatype

Specify the datatype of the DATA column on which to create the index. The allowed values are JSON and OSON.

The default value is set to JSON, so you need to specify the datatype argument only if you want to override this default.

lexer

Specify the name of your lexer or multilexer preference. Use the lexer preference to identify the language of your text and how text is tokenized for indexing. See Lexer Types

wordlist

Specify the name of your wordlist preference. Use the wordlist preference to enable features such as fuzzy, stemming, and prefix indexing for better wildcard searching. See Wordlist Type

stoplist

Specify the name of your stoplist. Use stoplist to identify words that are not to be indexed. See Stoplists

vectorizer

Specify the name of your vectorizer preference. Use the vectorizer preference to customize vector search parameters of a hybrid vector indexing pipeline. The goal of a vectorizer preference is to provide you with a straightforward way to configure how to chunk and embed your documents and create a vector index, without requiring a deep understanding of various chunking or embedding strategies.

A vectorizer preference is a JSON object that collectively holds all indexing parameters related to chunking (UTL_TO_CHUNKS or VECTOR_CHUNKS), embedding (UTL_TO_EMBEDDING, UTL_TO_EMBEDDINGS, or VECTOR_EMBEDDING), and vector index (distance, accuracy, or vector_idxtype). You use the DBMS_VECTOR_CHAIN.CREATE_PREFERENCE PL/SQL function to create a vectorizer preference. To create a vectorizer preference, see DBMS_VECTOR_CHAIN.CREATE_PREFERENCE. After creating a vectorizer preference, you can use the vectorizer parameter to pass the preference name.

Notes

You can define which tables or views should be indexed by adding them as data sources into your index. All the columns of those tables or views are indexed. Use the DBMS_SEARCH.ADD_SOURCE and DBMS_SEARCH.REMOVE_SOURCE procedures to manage data sources.

The DBMS_SEARCH index is created with the following default indexing preferences:

Preference	Description
`WILDCARD_INDEX`	Enables wildcard indexing for a fast wildcard search.
`BASIC_WORDLIST`	Enables stemming and fuzzy matching.
`SEARCH_ON`	Allows both full-text and range-search queries for a specific data type. The supported data types are `NUMBER` (for indexing numeric values) and `TIMESTAMP` (for indexing date-time values).
`SYNC` and `OPTIMIZE`	Creates background jobs at predefined intervals to automatically synchronize the DML changes and optimize the index using the `AUTO_DAILY` mode on all data sources. You do not need to explicitly run any `SYNC_INDEX` and `OPTIMIZE_INDEX` operations on this index.

You can query this index using the CONTAINS(), JSON_TEXTCONTAINS(), and JSON_EXISTS operators on the INDEX_NAME table.
You can use the following DBMS_SEARCH dictionary views to examine these indexes:
- USER_DBMS_SEARCH_INDEXES: To query information about the DBMS_SEARCH indexes that are created in a user's schema.
- ALL_DBMS_SEARCH_INDEXES: To query information about all existing DBMS_SEARCH indexes, corresponding to each index owner.
- USER_DBMS_SEARCH_INDEX_SOURCES: To query information about the data sources that are added to the DBMS_SEARCH indexes, created in a user's schema.
- ALL_DBMS_SEARCH_INDEX_SOURCES: To query information about all existing data sources added to the DBMS_SEARCH indexes, corresponding to each index owner.

Example

This example specifies the index_name, tablespace, and datatype arguments. Here, the schema owner name is specified along with the index name as SCOTT.MYINDEX.

CREATE TABLESPACE tbs_02 DATAFILE 'dt.dbf' size 100MB segment space management auto;

exec DBMS_SEARCH.CREATE_INDEX('SCOTT.MYINDEX','tbs_02','JSON');

Related Topics

16.2 ADD_SOURCE

The DBMS_SEARCH.ADD_SOURCE procedure adds one or more data sources (tables or views) from different schemas to the DBMS_SEARCH index.

Syntax

DBMS_SEARCH.ADD_SOURCE (
index_name      VARCHAR2,
source_name     VARCHAR2,
memory          VARCHAR2 DEFAULT NULL,
parallel_degree NUMBER DEFAULT NULL);

Note:

The DBMS_SEARCH.ADD_SOURCE procedure has been upgraded in version 23.9 to support memory and parallel_degree parameters. These parameters are not available for use in earlier release versions.

index_name: Specify name of the index to which you want to add the table or view. You can also specify [schema].index_name.
source_name: Specify name of the table, view, or Duality view to add to the index. You can also specify [schema].table_or_view_name.
memory: Specify the amount of memory allocated for indexing of the added data source.

Note:
Total memory usage is determined by multiplying the memory parameter by the parallel_degree.
parallel_degree: Specify the degree of parallelism used for indexing the added data source.

Notes

To add a data source, the index owner must have SELECT and DML access to the source. The user also must have the CTXAPP privilege.
All the data sources (such as table, view, or each table in the view definition) that are added to the DBMS_SEARCH index must include at least one Primary Key column. Each table that is part of a view source having a foreign key must also have the Foreign Key constraint, referencing the relevant primary keys defined on the table.

If the source table does not have a primary key, then a ROWID is used instead. However, Oracle strongly recommends defining a primary key.
The DBMS_SEARCH index stores all supported SQL data types (including Object Type columns) in JSON objects, except for the XMLTYPE and LONG data types. This means that you cannot add a table or view as a data source to the index if it has a column with the XMLTYPE or LONG data type. The maximum allowed length of a JSON data type is 32 megabytes.
You can use the USER_DBMS_SEARCH_INDEX_SOURCES and ALL_DBMS_SEARCH_INDEX_SOURCES dictionary views to query information about the data sources that are added to your DBMS_SEARCH indexes.

Examples

exec DBMS_SEARCH.ADD_SOURCE('MYINDEX','MYTABLE');

exec DBMS_SEARCH.ADD_SOURCE('MYINDEX','MYVIEW');

exec DBMS_SEARCH.ADD_SOURCE('DOCUSER.MYINDEX','DOCUSER.MYTABLE');

Related Topics

16.3 REMOVE_SOURCE

The DBMS_SEARCH.REMOVE_SOURCE procedure removes one or more data sources (tables or views) from the DBMS_SEARCH index.

When run, this procedure deletes all indexed data and stops further indexing or maintenance operations on the associated data sources (tables or views).

Syntax

DBMS_SEARCH.REMOVE_SOURCE(
  index_name VARCHAR2, 
  source_name VARCHAR2);

index_name: Specify name of the index from which you want to remove the table or view.
source_name: Specify name of the table or view that you want to remove.

Example

exec DBMS_SEARCH.REMOVE_SOURCE('MYINDEX','MYTABLE');

Related Topics

Oracle Text Application Developer's Guide

16.4 DROP_INDEX

The DBMS_SEARCH.DROP_INDEX procedure removes a DBMS_SEARCH index and all its associated data from the database.

Syntax

DBMS_SEARCH.DROP_INDEX(
  INDEX_NAME VARCHAR2);

index_name: Specify name of the index that you want to drop.

Example

exec DBMS_SEARCH.DROP_INDEX('MYINDEX');

Related Topics

Oracle Text Application Developer's Guide

16.5 GET_DOCUMENT

The DBMS_SEARCH.GET_DOCUMENT procedure returns a virtual indexed JSON document as is indexed in the JSON search index for a particular row of an indexed data source (table or view).

Syntax

The DBMS_SEARCH index references associated data source tables to dynamically create a virtual indexed document. This document contains a JSON representation for each indexed row of a table or view that is added as a data source to this index. In this way, you can view all the contents extracted from the original base tables.

DBMS_SEARCH.GET_DOCUMENT(
  index_name VARCHAR2,
  metadata JSON
);

index_name: Specify name of the index for which you want to retrieve the data.
metadata: Specify the JSON metadata values, such as OWNER, SOURCE, or KEY. You must specify the metadata format based on the METADATA column of the INDEX_NAME table.

Example

SELECT DBMS_SEARCH.GET_DOCUMENT('MYINDEX',METADATA) from MYINDEX;

Related Topics

Oracle Text Application Developer's Guide

16.6 FIND

The DBMS_SEARCH.FIND procedure retrieves a hitlist, and facets an aggregations of JSON documents based on the specified query-by-example (QBE) filter conditions.

You can compute aggregations on different fields of the JSON data. The query lists search results in the JSON Results Set Interface, which supports faceted navigation and aggregations.

Syntax

DBMS_SEARCH.FIND(
  index_name VARCHAR2, 
  search_QBE JSON);

index_name

Specify name of the index on which you want to perform the query.

search_QBE

Specify the result_set_descriptor parameter value in JSON. It describes what the result set should contain.

The JSON format input result set descriptor consists of the $query, $search, and $facet parts:

{
  "$query":text query and filter conditions,
  "$search":search result specifications,
  "$facet":faceted result specifications	
}

For details on each of these JSON objects, see The JSON Format Input Result Set Descriptor.

The JSON format output result set descriptor consists of the following parts:

"$count":number
"$hit":[ hit_object_1, ..., hit_object_i , ... ]
"$facet":[ facet_object_1, ..., facet_object_i, ...]

For details on each of these JSON objects, see The JSON Format Result Set Output.

Example

Create a table and populate it with values:

SET FEEDBACK 1
SET NUMWIDTH 10
SET LINESIZE 80
SET TRIMSPOOL ON
SET TAB OFF
SET PAGESIZE 100
connect sys/knl_example as sysdba;
Connected.

grant connect,resource, unlimited tablespace, ctxapp
  to u1 identified by u1;
Grant succeeded.

connect u1/u1;
Connected.

create table tbl(id number primary key, jsn_col clob check(jsn_col is json));
Table created.

INSERT INTO tbl
  VALUES (1,'{ "zebra" : { "price" : [2000,1000],
                         "name" : "Marty",
                         "stripes" : ["Dark","Light"],
                         "handler" : "Bob", "sold" : true }}');
1 row created.

INSERT INTO tbl
  VALUES (2,'{ "zebra" : { "rating": 5, "price" : 1000,
                         "name" : "Zigby",
                         "stripes" : ["Light","Grey"],
                         "handler" : "Handy Marty", "sold" : "true" }}');
1 row created.
 
 INSERT INTO tbl
   VALUES (3,'{ "zebra" : { "rating": 4.5, "price" : 3000,
                          "name" : "Zigs",
                          "stripes" : ["Grey","Dark"],
                          "handler" : "Handy Marty", "sold" : false }}');
1 row created.
 
 INSERT INTO tbl
   VALUES (4,'{ "zebra" : { "rating": "4.5", "price" : "3000",
                          "name" : "Zigs",
                          "stripes" : ["Grey","Dark"],
                          "handler" : "Handy Marty", "sold" : null }}');
1 row created.
 
 commit;
Commit complete.

Create a DBMS_SEARCH index using the DBMS_SEARCH.CREATE_INDEX procedure, and add a source table to the index:

SQL> exec DBMS_SEARCH.CREATE_INDEX('JIDX');
PL/SQL procedure successfully completed.
SQL> exec DBMS_SEARCH.ADD_SOURCE('JIDX','TBL');
PL/SQL procedure successfully completed.
SQL>

Run the DBMS_SEARCH.FIND procedure:

Query: All zebras having name starting with Zig or having name Marty and
having a price greater than equal to 2000
Facets: For all zebras that satisfy the query, do the following
--   1. Get the count of zebras per zebra handler
--   2. Get the minimum zebra rating
--   3. Get the count of zebras for each unique stripe color
select DBMS_SEARCH.FIND('JIDX',JSON('
  {
    "$query": { "$and" : [
                    { "U1.TBL.JSN_COL.zebra.name" : { "$contains" : "Zig% or Marty" } },
                    { "U1.TBL.JSN_COL.zebra.price" : { "$gte" : 2000 } }
                  ]
                },
      "$facet" : [
        { "$uniqueCount" : "U1.TBL.JSN_COL.zebra.handler" },
        { "$min" : "U1.TBL.JSN_COL.zebra.rating" },
        { "$uniqueCount" : "U1.TBL.JSN_COL.zebra.stripes" }
      ]
    }'));

The output is as follows:


FIND_RESULT
--------------------------------------------------------------------------------
{
  "$count" : 3,
  "$facet" :
  [
    {
      "U1.TBL.JSN_COL.zebra.handler" :
      [
        {
          "value" : "Handy Marty",
          "$uniqueCount" : 2
        },
        {
          "value" : "Bob",
          "$uniqueCount" : 1
        }
      ]
    },
    {
      "U1.TBL.JSN_COL.zebra.rating" :
      {
        "$min" : 4.5
      }
    },
    {
      "U1.TBL.JSN_COL.zebra.stripes" :
      [
        {
          "value" : "Dark",
          "$uniqueCount" : 3
        },
        {
          "value" : "Grey",
          "$uniqueCount" : 2
        },
        {
          "value" : "Light",
          "$uniqueCount" : 1
        }
      ]
    }
  ]
}
1 row selected.
connect sys/knl_example as sysdba;
Connected.
drop user u1 cascade;
User dropped.

Related Topics