LATEST VERSION: 3.2.0 - RELEASE NOTES
Pivotal® Greenplum® Text v2.0.0

GPText Function Reference

The following functions are available in Pivotal GPText.

Indexing

gptext.create_index() – creates an empty index.

gptext.index() – populates an index.

gptext.commit_index() – finalizes index operations.

gptext.enable_terms() – enables term vectors and positions to allow extracting terms and their positions from text fields.

Modifying or Deleting an Index

gptext.add_field() – adds a field to an index.

gptext.delete() – deletes documents matching a search query.

gptext.drop_field() – deletes a field from an index.

gptext.drop_index()– deletes an index.

gptext.search() – searches an index.

gptext.search_count() – returns number of documents that match search.

gptext.gptext_retrieve_field – extracts a single field from the rf search result column as text.

gptext.gptext_retrieve_field_int – extracts a single field from the rf search result column and converts to an integer.

gptext.gptext_retrieve_field_float – extracts a single field from the rf search result column and converts to a float.

gptext.highlight() – returns search result with search term highlighted.

gptext.faceted_field_search() – search, faceted by fields.

gptext.faceted_query_search() – search, faceted by queries.

gptext.faceted_range_search() – search, faceted by defined ranges.

Working With Terms

gptext.enable_terms() – enables term vectors and positions.

gptext.terms() – gets the term vectors for the indexed documents in a Solr index for the specified field.

Configuration and Monitoring

gptext.index_status() – shows status for an index or all indexes.

gptext.reload_index() – reloads Solr configuration files.

gptext.version() – returns version of GPText installation.

High Availability

gptext.add_replica() – Adds a replica of an index shard.

gptext.delete_replica() – Deletes a replica of an index shard.

General Purpose Functions

gptext.count_t() – counts number of rows in a table.

Privileges

Your privileges to execute the GPText functions depend on your Greenplum Database privileges for the table from which the index is generated. For example, if you have SELECT privileges for a table in the Greenplum database, you have SELECT privileges for an index generated from that table.

Executing index functions requires one of OWNER, SELECT, INSERT, UPDATE, or DELETE privileges, depending on the function. The OWNER is the person who created the table and has all privileges. See the Security section of the GPText User’s Guide for information about setting privileges.

The Privileges required section for each of the GPText functions specifies the privileges required to execute that function.

Usage

The gptext functions in this section must be executed as SQL queries in the form:

SELECT * FROM gptext.function();

You must run these queries from the Greenplum Database master.

The examples in this section use one of the following:

  • A Greenplum database named wikipedia set up as follows:

    • A public schema.
    • A table named articles.

      The articles table has the following columns:

      id, date_time, title, content, refs.

      The default search column is content.

      The name of the index on the articles table is wikipedia.public.articles.

Indexing

Indexing functions create, set up, populate, and finalize (commit) Solr indexes.

gptext.create_index()

Creates an empty Solr index.

Syntax

gptext.create_index(schema_name, table_name, id_col_name, 
  def_search_col_name [, if_check_id_uniqueness])

or

gptext.create_index(schema_name, table_name, p_columns, p_types, 
  id_col_name, def_search_col_name [, if_check_id_uniqueness])

Parameters

schema_name
The name of the schema in the Greenplum database.
table_name
The name of the table in the Greenplum database.
p_columns
A text array containing the names of the table columns to index. If p_columns and p_types are omitted, all table columns are indexed.

The columns must be valid columns in the table. The columns identified by the id_col_name and def_search_col_name must be included in the array.

If the p_columns parameter is supplied, the p_types parameter must also be supplied. The sizes of the p_columns and p_types arrays must be the same.

p_types
A text array containing the Solr data types of the columns in the p_columns array.

Text types can be mapped to the name of an analyzer chain, for example text_intl, text_sm, or any type defined in the managed_schema. See Mapping Greenplum Database Data Types to Solr Data Types for equivalent Solr data types for other Greenplum types.

The p_types parameter must be supplied if the p_columns parameter is supplied.

id_col_name
The name of the id column in table_name. The column must be of type bigint or int8.
def_search_col_name
The name of the default column to search in table_name, if no other column is named in a query.
if_check_id_uniqueness
Optional. A Boolean value. The default is true. Set to false to index a table with a non-unique ID field.

Return type

boolean

Privileges required

Only the OWNER can execute this function.

Remarks

A GPText index is a Solr collection.

The contents of the id_col_name column should, in most cases, be a unique ID for each row. It must be of type bigint or int8. If the if_check_id_uniqueness argument is true, the default, a document with an ID matching an existing ID cannot be added to the index.

If the if_check_id_uniquess argument is false, documents with duplicate IDs are allowed to be added to the index. The content of other fields may or may not be the same as existing documents with the same ID. When a query returns multiple documents with the same ID, it is the user’s responsibility to anticipate and handle the multiple documents. For example, a table could have a revision column that is incremented when a new version of a document is added to the index, allowing queries that omit all but the most recent version from search results.

The name of the index created has the format:

<database_name>.<schema_name>.<table_name>

Populate the new index with gptext.index().

The number of replicas for each shard is determined when the index is created. It is the value of the gptext.replication_factor server configuration parameter, 3 by default.

If the gptext.failover_factor server configuration parameter is set, gptext.create_index() fails if the ratio of the number of GPText nodes that are up to the total number of GPText nodes is less than the gptext.failover_factor value (from 0.0 to 1.0). Index shards can only be created on active GPText nodes, so the gptext.failover_factor parameter prevents overloading the active GPText nodes when too many nodes are down.

Example

  • Create an index, wikipedia.public.articles, with content as the default search field. sql => SELECT * FROM gptext.create_index('public', 'articles', 'id', 'content');

  • Create an index, wikipedia.public.articles, with content as the default search field. Index the id, content, and title fields. sql => SELECT * FROM gptext.create_index('public', 'articles', '{"id", "content", "title"}', '{"long", "text", "text"}', 'id', 'content')

gptext.enable_terms()

Enables term vectors and positions to allow extracting terms and their positions from fields of data type text.

Syntax

gptext.enable_terms(index_name, field_name) 

Parameters

index_name
The name of the index for which you want to enable terms.
field_name
The name of the field for which you want to enable terms.

Return type

boolean

Privileges required

Only the OWNER can execute this function.

Remarks

Solr can mark terms and their positions in documents when indexing. This capability is disabled by default. Use gptext.enable_terms() to enable the capability.

Call gptext.enable_terms() for each field where you want to enable terms.

After calling this function, you must index or re-index with gptext.index().

Example

=# SELECT * FROM gptext.enable_terms('wikipedia.public.articles', 'content');
WARNING:  Enable terms for field: content of index: wikipedia.public.articles successfully. Reindex data needed.
 enable_terms 
--------------
 t
(1 row)

SELECT * FROM gptext.index(TABLE(SELECT * FROM articles), 'wikipedia.public.articles');
SELECT * FROM gptext.commit_index('wikipedia.public.articles');

gptext.index()

Populates an index by indexing data in a table.

Syntax

gptext.index(TABLE(SELECT * FROM table_name), index_name) 

Parameters

TABLE(SELECT * FROM table_name)
The table to be indexed, with data type anytable.
index_name
Name of the index that was created with gptext.create_index() and is to be populated.

Return type

SETOF dbid INT, num_docs BIGINT

where dbid is the dbid of the segment that the documents were sent to, and num_docs is the number of documents that were indexed.

Privileges required

You must have the INSERT or UPDATE privilege to execute this function.

Remarks

index_name must have been created with gptext.create_index().

The arguments to the gptext.index() function must be expressions. For example, TABLE(SELECT * FROM articles) creates a “table-valued expression” from the articles table, using the table function TABLE.

You can selectively index/update by changing the inner select list in the query.

After successfully indexing, you must commit the index with gptext.commit_index(index_name).

The output includes a two-column table with dbid (the Greenplum segment ID) and num_docs (the number of documents added to the index for that segment) as the columns.

Note:
Be careful about distribution policies:

The first parameter of gptext.index() is TABLE(SELECT * FROM messages). The query in this parameter should have the same distribution policy as the table you are indexing. There are two cases where the query will not have the same distribution policy:

  1. Your query is a join of two tables

  2. You are indexing an intermediate table (staging table) that is distributed differently than the final table.

When the distribution policies differ, you must specify "SCATTER BY" for the query like this:

TABLE(SELECT * FROM messages SCATTER BY distrib_id),

where distrib_idis the distribution id used when you created your primary/final table.

Example

=# SELECT * FROM gptext.index(TABLE(select * FROM articles), 'wikipedia.public.articles');
 dbid | num_docs 
------+----------
    3 |        6
    2 |        5
(2 rows)

gptext.commit_index()

Finishes an index operation. The results of an indexing operation are not available until this function is called for the index.

Syntax

gptext.commit_index(index_name)

Parameters

index_name
The name of the index to commit.

Return type

boolean

Privileges required

You must have the INSERT, UPDATE, or DELETE privilege to execute this function.

Remarks

Must be called after gptext.index() and gptext.delete().

Example

=# SELECT * FROM gptext.commit_index('wikipedia.public.articles');
 commit_index 
--------------
 t
(1 row)

Modifying or Deleting an Index

You can change an index by adding or dropping fields, reverting an index to its previous state, or deleting the index.

gptext.add_field()

Adds a field to your schema if the field was added to the database after the index was created.

Syntax

gptext.add_field (index_name, field_name[, is_default_search_col [, if_enable_terms]])

Parameters

index_name
The name of the index to which you want to add the field.
field_name
The name of the field to be indexed.
is_default_search_col
Optional. Boolean value. Is this to become the default search column (field)?
if_enable_terms
Optional. Boolean value. Enable terms support on this field when added to the GPText index?

Return type

SETOF boolean

Privileges required

Only the OWNER can execute this function.

Remarks

Call this function for each field you add.

Before and after you add one or more fields, reload the Solr configuration files with gptext.reload_index(). The initial reload_index() call is required because of Solr 4.0 behavior and may not be required in subsequent versions.

After you add one or more fields, you must also create and populate a new index with gptext.create_index()andgptext.index(), then commit with index with gptext.commit_index().

Example

Adds the field external_links to the index, then recreates, repopulates, and commits the index.

=# SELECT * FROM gptext.reload_index('wikipedia.public.articles');
=# SELECT * FROM gptext.add_field('wikipedia.public.articles', 'external_links', false, false);
INFO:  Add field: external_links for index: wikipedia.public.articles
 add_field 
-----------
 t
(1 row)

=# SELECT * FROM gptext.reload_index('wikipedia.public.articles');
 reload_index 
--------------
 t
(1 row)

SELECT * FROM gptext.commit_index('wikipedia.public.articles');

gptext.delete()

Deletes all documents that match the search query.

Syntax

gptext.delete(index_name, query)

Parameters

index_name
The name of the index.
query
Documents matching this query will be deleted. To delete all documents use the query'*'or'*:*'.

Return type

boolean

Privileges required

You must have the DELETE privilege to execute this function.

Remarks

After a successful delete, commit the index as follows.

gptext.commit_index(index_name)

Examples

  • Delete all documents containing the word “unverified” in the default search field:

    =# SELECT * FROM gptext.delete('wikipedia.public.articles', 'unverified');
     delete 
    --------
     t
    (1 row)=# SELECT * FROM gptext.commit_index('wikipedia.public.articles');
     commit_index 
    --------------
     t
    (1 row)
    
  • Delete all documents from the index:

    wikipedia=# SELECT * FROM gptext.delete('wikipedia.public.articles', '*:*');
     delete 
    --------
     t
    (1 row)
    
    wikipedia=# SELECT * FROM gptext.commit_index('wikipedia.public.articles');
     commit_index 
    --------------
     t
    (1 row)
    

gptext.drop_field()

Removes a field from your schema.

Syntax

gptext.drop_field(index_name, field_name)

Parameters

index_name
The name of the index from which to drop the field.
field_name
The name of the field to drop.

Return type

boolean

Privileges required

Only the OWNER can execute this function.

Remarks

Call this function for each field you drop.

Before and after dropping one or more fields, you must reload the Solr configuration files with gptext.reload_index(), then commit the index with gptext.commit_index().

The initial reload_index() is required by Solr 4.0 behavior and may not be necessary in subsequent versions.

Example

Drops the field external_links from the index.

=# SELECT * FROM gptext.reload_index('wikipedia.public.articles');
 reload_index 
--------------
 t
(1 row)

=# SELECT * FROM gptext.drop_field('wikipedia.public.articles', 'external_links');
INFO:  Drop field: external_links for index: wikipedia.public.articles
 drop_field 
------------
 t
(1 row)

=# SELECT * FROM gptext.reload_index('wikipedia.public.articles');
 reload_index 
--------------
 t
(1 row)

=# SELECT * FROM gptext.commit_index('wikipedia.public.articles');
 commit_index 
--------------
 t
(1 row)

gptext.drop_index()

Removes an index.

Syntax

gptext.drop_index(index_name)

Parameters

index_name
The name of the index to drop.

Return type

boolean

Privileges required

Only the OWNER can execute this function.

Remarks

A dropped index cannot be recovered.

Example

=# SELECT * FROM gptext.drop_index('wikipedia.public.articles');
 drop_index 
------------
 t
(1 row)

Search functions enable querying an index.

Changing the query parser at query time

When using the search functions, you can change the query parser used by Solr at query time. A different query parser may be required, depending on the nature of the query. See the Using Advanced Querying Options for a list of the query parsers GPText supports.

To change the query parser at query time, use the defType Solr option with the gptext.search() function.

To change the query parser for any search function at query time, use the Solr localParams syntax, replacing the <query> term with '{!type=edismax}<query>'.

With the GPText Universal Query Parser, you can use features of any of the supported query parsers in one query. To use the Universal Query Parser, replace the <query> term with '{!gptextqp}<query>'. See Using the Universal Query Parser for information and examples.

gptext.search()

Searches an index.

Syntax

gptext.search(src_tbl, index_name, search_query, filter_queries)

or:

gptext.search(src_tbl, index_name, search_query, filter_queries[, options])

Parameters

src_table
Specifies a SELECT statement on an existing, indexed table on which to perform the search. The src_table parameter is an anytable data type, specified in the following format

TABLE(SELECT * FROM <src_table>)

index_name
The name of the index to search.
search_query
Text value containing a Solr text search query.
filter_queries
A text array of filter queries, if any. If none, set this parameter to null.
options
An optional ampersand-delimited list of Solr query parameters. See Solr options.

Return type

SETOF gptext.search_scored_result

This is a composite type where:

Column Type
id bigint
score double precision
offsets (conditional) gptext hstore
rf (conditional) text

If the options parameter is included in gptext.search(), the result includes the offsets column. This column contains key-value pairs, where the key is the column name and the value is a comma-separated list of offsets to locations where the search term occurs. This data is used by the gptext.highlight() function to add highlighting tags to the column data. If highlighting is not enabled with the 'hl=true' option, the offsets column is NULL.

If the fl option is included in the options parameter to specify additional fields to add to the result, the rf column contains the additional fields in a formatted text value. The gptext.gptext_retrieve_field() function can be used to extract a single field value from the rf column. There are variants of the gptext.gptext_retrieve_field() function to retrieve integer and float values from the rf column value.

Privileges required

You must have the SELECT privilege to execute this function.

Solr options

Solr queries allow the following optional refinements, specified as an ampersand-delimited list in the options parameter.

defType
The name of the query parser to use for this query.

Example: defType=edismax

rows
The maximum number of rows to return per segment. If omitted, all rows are returned.

Example: rows=100 returns 100 rows per segment or all rows if there are fewer than 100.

sort
Sorts on a field or score in ascending or descending order.

Examples:

  • score desc (default if no sort defined)
  • date_time asc
  • date_time asc score desc sorts on date_time ascending, then on score descending
start
The number of the first record to return.

Examples:

  • start=0 default: returned records starts with record 0
  • start=25 returned records starts with record 25
hl
Enable highlighting.

Example: hl=true

hl.fl
Comma-separated list of field names to consider when highlighting.

Examples:

  • hl.fl=field1
  • hl.fl=field1,field2
fl
Comma-separated list of fields to include in search results. The fields must be set to stored=true in the managed-schema for the index.

Example: fl=version,author

Remarks

  • The output includes a table with columns id (the ID named in gptext.create_index()) and score (the tf-idf score). A column named offsets is included if highlighting is specified in the options parameter. An column named rf is included when a list of additional fields to include is specified in the options parameter.
  • To change the query parser at query time, specify the defType option in the options parameter list. For example, setting the options parameter to ’rows=100&defType=edismax’ limits the output to 100 rows per segment and will change the query parser to edismax.
  • The TABLE query is planned and affects the estimate for gptext.search(), but does not execute. For example, if your query includes

    gptext.search(TABLE(SELECT * FROM t), ...)
    

    the query planner estimates the number of results as the number of rows in t. This can cause the query planner to ignore the use of an index scan. Use a query like TABLE(SELECT 1 SCATTER BY 1) to avoid this issue.

  • If you do not specify options, gptext.search() returns all rows.

  • The options separator has changed from comma to ampersand (&) in order to support highlighting. If you do not use highlighting, you can revert to using the comma separator by setting the gptext.search_param_separator to ','.

  • The Solr option rows specifies the maximum number of rows to return per segment. For example, if you have four segments,‘rows=100' returns at most a total of 400 rows. To limit the number of rows returned for an entire query, set a LIMIT in the SQL query. For example, the following query returns at most a total of 20 rows for the query: select t.id, q.score, t.message\_text from twitter.message t, gptext.search(...) q where t.id=q.id LIMIT '20';

  • The gptext hstore type is a limited form of the Postgres hstore type. It only has the hstore input and output functions implemented, as gptext_hstore_in and gptext_hstore_out.

Examples

  • Runs a GPText query that looks for Wikipedia articles that contain the term “optimization”, and joins the results to the original Greenplum Database articles table:

    =# SELECT a.id, a.title, q.score 
    FROM articles a, 
    gptext.search(TABLE(SELECT * FROM articles), 'wikipedia.public.articles', 
    'optimization', null) q 
    WHERE a.id = q.id ORDER BY score DESC;
        id    |     title      |    score    
    ----------+----------------+-------------
     43835553 | Cubesort       | 0.054204933
     29050607 | Cycle sort     | 0.044258144
      6508027 | Tree sort      | 0.036136623
        29352 | Selection sort | 0.034337204
        20039 | Merge sort     | 0.033537593
      3268249 | Quicksort      | 0.030907018
     25977485 | Bubble sort    | 0.028614337
     23954341 | Timsort        | 0.023363508
        77355 | Shellsort      | 0.023128692
        15205 | Insertion sort |  0.02289147
        13995 | Heapsort       | 0.011179198
    (11 rows)
    
  • Returns 3 results from each shard beginning at record 0:

    =# SELECT a.id, a.title, q.score 
    FROM articles a,
    gptext.search(TABLE(SELECT * FROM articles), 'wikipedia.public.articles', 
    'optimization', null, 'rows=3&start=0') q
    WHERE a.id = q.id;
        id    |     title      |    score    
    ----------+----------------+-------------
        13995 | Heapsort       | 0.011179198
        20039 | Merge sort     | 0.033537593
      6508027 | Tree sort      | 0.036136623
        15205 | Insertion sort |  0.02289147
        77355 | Shellsort      | 0.023128692
     23954341 | Timsort        | 0.023363508
    (6 rows)
    
  • Returns 10 rows with the text “iphone” highlighted in the message column.

    SELECT t.id,
    gptext.highlight(t.message, 'message', s.hs)
    FROM twitter.message t,
    gptext.search(TABLE(SELECT 1 SCATTER BY 1),
    'demo.twitter.message','{!gptextqp}iphone', null,
    'rows=10&hl=true&hl.fl=message'
    ) s
    WHERE t.id = s.id;
    

gptext.search_count()

Returns the number of documents that match the search query.

Syntax

gptext.search_count(index_name, search_query, filter_queries)

Parameters

index_name
The name of the index. dd>
search_query
The search query.
filter_queries
A comma-delimited array of filter queries, if any. If none, set this parameter to null.

Return type

bigint

Privileges required

You must have the SELECT privilege to execute this function.

Remarks

None.

Example

=# SELECT * FROM gptext.search_count('wikipedia.public.articles', 'bubble', null);                                                                           count 
-------
     4
(1 row)

gptext.gptext_retrieve_field()

Retrieves a single field from the rf search result column as a text value.

Syntax

gptext.gptext_retrieve_field(rf, field_name)

Parameters

rf
The name of the rf column.
field_name
The name of the field to retrieve.

Remarks

The fl=<field_list> Solr search option is added to the options parameter of the gptext.search() function to request additional stored fields. The additional fields are returned in the results in a column named rf. The rf column value has a format like the following:

column_value { name: "_version" value: "1544714234398507008" } column_value { name: "revision" value: "9.70" } column_value { name: "author" value: "jdough" }

The gptext.gptext_retrieve_field() function extracts the value for a single specified field and returns it as a text value. If there is no field with the specified name in the rf column, it returns NULL.

Storing additional fields in an index requires editing managed-schema to specify the fields that should be stored. See Storing Additional Fields in an Index for instructions.

gptext.gptext_retrieve_field_int()

Retrieves a single field from the rf search result column as an integer value.

Syntax

gptext.gptext_retrieve_field_int(rf, field_name)

Parameters

rf
The name of the rf column.
field_name
The name of the integer field to retrieve.

Remarks

The gptext.gptext_retrieve_field_int() function is the same as the gptext.gptext_retrieve_field function, except that the extracted field value is converted to an integer value.

gptext.gptext_retrieve_field_float()

Retrieves a single field from the rf search result column as a float value.

Syntax

gptext.gptext_retrieve_field_float(rf, field_name)

Parameters

rf
The name of the rf column.
field_name
The name of the float field to retrieve.

Remarks

The gptext.gptext_retrieve_field_float() function is the same as the gptext.gptext_retrieve_field function, except that the extracted field value is converted to a float value.

gptext.highlight()

Highlights terms by inserting markup tags into data.

Syntax

gptext.highlight(column_data, column_name, offsets)

Parameters

column_data
The text data from the table which is to be tagged with highlighting tags.
column_name
The name of the corresponding column from the table.
offsets
A gptext hstore value that contains key-value pairs that indicate the locations of the text to highlight within the column data. See Remarks for information about the gptext hstore data type.

Prequisite

  • To use highlighting, term vectors must be enabled before creating the index. To enable term vectors, call gptext.enable_terms() for each field where you want to enable terms, then index or re-index with gptext.index().

Remarks

  • The offsets parameter is a gptext hstore, where each key is a column name and the value is a comma-separated list of offsets into the column data. This hstore is constructed by gptext.search() with highlighting enabled in the offsets parameter.

    Following is an example of the offsets hstore content:

    "field1"=>"0:5,9:14","field2"=>"13:20"
    

    gptext.highlight() will insert two sets of tags into the field1 data and one set into the field2 data at the indicated offsets.

  • The gptext hstore type is a limited form of the Postgres hstore type. It has only the hstore input and output functions implemented, as gptext_hstore_in and gptext_hstore_out.

  • The highlight tags are defined by the gptext.hl_pre_tag and gptext.hl_post_tag server configuration parameters. Their default values are <em> and </em>, respectively.

Example

  • Returns 10 rows with the text “iphone” highlighted in the message column.

    SELECT t.id,
    gptext.highlight(t.message 'message', s.hs)
    FROM twitter.message t,
    gptext.search(TABLE(SELECT 1 SCATTER BY 1),
    'demo.twitter.message','{!gptextqp}iphone', null,
    'rows=10&hl=true&hl.fl=message'
    ) s
    WHERE t.id = s.id;
    

Faceting breaks up a search result into multiple categories, showing counts for each.

gptext.faceted_field_search()

The faceted_field_search() function breaks search results into field name categories.

Syntax

gptext.faceted_field_search (index_name, query, filter_queries, facet_fields, facet_limit, minimum)

Parameters

index_name
The name of the index.
query
Query statement. Use *:* to query for all results.
filter_queries
A text array of filter queries, if any. If none, set this parameter to null.
facet_fields
An array of field names to facet. Use PostgreSQL array notation.
facet_limit
Maximum number of results to be returned for each aggregation (facet).
minimum
Minimum number of results required before an aggregation (facet) will be returned. Enter 0 to return all facets.

Return type

SETOF gptext.facet_field_result

Privileges required

You must have the SELECT privilege to execute this function.

Remarks

None.

Examples

  • Facet on spam and truncated fields, no limit, no minimum, all tweets:

    SELECT * FROM 
    gptext.faceted_field_search('twitter.public.message', '*:*', null, '{spam, truncated}', 0, 0);
    
  • Facet on author_id, no limit, with a minimum of five tweets, all tweets:

    SELECT * FROM
    gptext.faceted_field_search('twitter.public.message', '*:*', null, '{author_id}', 0, 5);
    

gptext.faceted_query_search()

The faceted_query_search() function breaks search results into categories defined by queries that you provide.

Syntax

gptext.faceted_query_search(index_name, query, filter_queries, facet_queries)

Parameters

index_name
The name of the index.
query
Query statement. Use *:* to query for all results.
filter_queries
A text array of filter queries, if any. If none, set this parameter to null.
facet_queries
Type: text[]. Required. An array of facet queries.

Return type

SETOF gptext.facet_query_result

Privileges required

You must have the SELECT privilege to execute this function.

Remarks

None.

Example

Facet on product price ranges (0-100, 101-200, 201-300, 300+) of cameras:

SELECT * FROM 
  gptext.faceted_query_search('store.public.catalog', 
  'product_type:camera', null, 
  '{price:[* TO 100], price:[101 TO 200], 
  price:[201 TO 300], price:[301 TO *]}');         

gptext.faceted_range_search()

The faceted_range_search() function breaks search results into range categories, with ranges defined by the range_start, range_end, and range_gap parameters.

Syntax

gptext.faceted_range_search(index_name, query, filter_queries, 
  field_name, range_start, range_end, range_gap)        

Parameters

index_name
The name of the index.
query
Query statement. Use *:* to query for all results.
filter_queries
An text array of filter queries, if any. If none, set this parameter to null.
field_name
The name of the field on which to facet.
range_start
Beginning of the range.
range_end
End of the range.
range_gap
Size of range increment, a text value.

Return type

SETOF gptext.facet_range_result

Privileges required

You must have the SELECT privilege to execute this function.

Remarks

None.

Example

Facet on date range, starting 1 year ago, ending now, every day:

SELECT * FROM 
  gptext.faceted_range_search('twitter.public.message', '*:*', 
  null, 'created_at', 'NOW/YEAR-1YEAR', 'NOW/YEAR', '+1DAY');          

Working with Terms

gptext.terms()

Gets the term vectors for the indexed documents in a Solr index for the specified field. You can use gptext.terms() to create tables.

Syntax

gptext.terms(src_table, index_name, field_name, 
            search_query, filter_queries[, options])          

Parameters

src_table
An anytable value that specifies a SELECT statement on an existing, indexed table on which to perform the search. Specify in the format:

TABLE(SELECT * FROM <src_table>;)

index_name
The name of the index to query for terms.
field_name
The name of the field to query for term.
search_query
A query that the field must match.
filter_queries
A comma-delimited array of filter queries, if any. If none, set this parameter to null.
options
An optional, comma-delimited list of Solr query parameters.

Return type

SETOF gptext.term_info

This is a composite type where:

Column Type
id bigint
term text
positions integer[]

Privileges required

You must have the SELECT privilege to execute this function.

Remarks

  • To enable using gptext.terms(), execute the GPText function gptext.enable_terms(), then reindex with gptext.index().
  • The TABLE query is planned and affects the estimate of of gptext.terms(), but does not execute. For example, if your query includes:

    gptext.terms(TABLE(SELECT * FROM t), ...)
    

    The query planner estimates the number of results as the number of rows in t. This can cause the query planner to ignore the use of an index scan. Use a query like TABLE(SELECT 1 SCATTER BY 1) to avoid this issue.

Examples - Create a Terms Table

CREATE TABLE twitter.terms AS 
  SELECT * FROM gptext.terms(TABLE(SELECT * FROM 
  twitter.message), 'demo.twitter.message', 'message_text', 
  'iphone', null);          

Configuration and Monitoring

Index configuration and monitoring functions enable managing indexes, tracking index statistics, checking status of index segments, and ensuring that index contents are current.

gptext.index_status()

Shows status for a specified index or for all indexes.

Syntax

gptext.index_status([index_name])

Parameters

index_name
The name of the index. Optional. Returns status for all indexes if no index is specified.

Return Type

SETOF gptext.index_status_result

gptext.reload_index()

Reloads Solr configuration files if they have been modified.

Syntax

gptext.reload_index([index_name])

Parameters

index_name
Optional. The name of the index for which to reload the configuration files.

Return type

boolean

Privileges required

Only the OWNER can execute this function.

Remarks

None.

Example

=# SELECT * FROM gptext.reload_index('wikipedia.public.articles');
 reload_index 
--------------
 t
(1 row)

gptext.version()

Returns the version of your GPText installation, including build number.

Syntax

SELECT * FROM gptext.version()

Parameters

None.

Return type

text

Privileges required

You do not need any privileges to execute this function.

Remarks

None.

Example

=# SELECT * FROM gptext.version();
                 version                 
-----------------------------------------
 Greenplum Text Analytics 2.0.0 (main)
(1 row)

High Availability

gptext.add_replica()

Adds a replica of an index shard.

Syntax

gptext.add_replica(index_name, shard_name[, node_name])

Parameters

index_name
Name of the index.
shard_name
Name of the shard to replicate.
node_name
Name of the node where the replica is to be added. Optional. If omitted, SolrCloud chooses the node.

Return type

boolean

Remarks

This function is used by the GPText management utility gptext-replica add.

The value of the gptext.replication_factor configuration parameter when an index is created determines how many replicas are created for each shard. In a Greenplum system, there are the same number of shards as there are Greenplum segments. The number of replicas created for a new index is the number of segments times the value of the gptext.replication_factor configuration parameter, 3 by default. The replicas are distributed evenly among the live GPText nodes.

Replicas consume space on the host where they are created, so they are usually only created to replace a replica that has failed or become unavailable or to relocate a replica to another GPText instance. When adding replicas, you should maintain equal distribution of replicas among the GPText nodes and avoid placing multiple replicas for the same shard on the same host.

The total number of replicas for an index that can be placed on each GPText node is set when the index is created. (In Solr, this is the MaxShardsPerNode parameter.) GPText sets this limit by calculating the number of replicas to create per node and adding an additional factor, specified in the gptext.extension_factor server configuration parameter. This parameter can be set between 0 and 10; the default value is 2. Since the limit is set when the index is created, it is recommended to set the gptext.extension_factor parameter to a higher number to allow new replicas to be created when necessary.

Example

=# SELECT * FROM gptext.add_replica('wikipedia.public.articles', 'shard1');
 success |                 core_name                 
---------+-------------------------------------------
 t       | wikipedia.public.articles_shard1_replica4
(1 row)

gptext.delete_replica()

Deletes a named replica from the specified index and shard.

Syntax

gptext.delete_replica(index_name, shard_name, replica_name[, only_if_down])

Parameters

index_name
Name of the index.
shard_name
Name of the shard that contains the replica to delete.
replica_name
Name of the replica to remove.
only_if_down
Optional. When true, no action is taken if the replica is active. Default is false.

Return type

boolean

Remarks

Use the gptext.index_status() function to find the name of the replica to drop. Names are in the format core_nodeX, where X is a number.

This function is called from the gptext-replica drop management utility.

Examples

  1. Delete the core_node5 replica if it is down.

    =# SELECT * FROM gptext.delete_replica('wikipedia.public.articles', 'shard1', 'core_node5', true);
    ERROR:  Delete replica failed: Attempted to remove replica : wikipedia.public.articles/shard1/core_node5 with onlyIfDown='true', but state is 'active'.
    CONTEXT:  SQL function "delete_replica" statement 1
    
  2. Delete the core_node5 replica even if it is active.

    =# SELECT * FROM gptext.delete_replica('wikipedia.public.articles', 'shard1', 'core_node5');
     success 
    ---------
     t
    (1 row)
    

General Purpose Functions

gptext.count_t()

Counts the number of records in a table.

Syntax

gptext.count_t(table_name)

Parameters

table_name
Name of the table for which to count records.

Return type

integer

Privileges required

You need SELECT privileges on table_name to execute this function.

Example

=# SELECT * FROM gptext.count_t('wikipedia.public.articles');
 count_t 
---------
      11
(1 row)