Pivotal® Greenplum® Text v2.0.0

Administering GPText

GPText administration includes security considerations, monitoring Solr index statistics, and troubleshooting.

Changing GPText Server Configuration Parameters

GPText configuration parameters are built into GPText with default values. You can change the values for these parameters by setting the new values in a Greenplum Database session. The new values are stored in ZooKeeper. GPText indexes use the values of configuration parameters when they are created. Changing configuration parameters affects new indexes, but does not affect existing indexes.

See GPText Configuration Parameters for a complete list of configuration parameters.

A one-time Greenplum Database configuration change is needed for Greenplum Database to allow setting and displaying GPText configuration variables. As the gpadmin user, enter the following commands in a shell:

$ gpconfig -c custom_variable_classes -v 'gptext'
$ gpstop -u

Then connect to a database that contains the GPText schema and execute the gptext.version() function to expose the GPText configuration variables:

twitter=# select * from gptext.version();

Change the values of GPText configuration variables using the SET command in a session with a database that contains the GPText schema. The following example sets values for three configuration parameters in a psql session:

twitter=# set gptext.idx_buffer_size=10485760;
SET
twitter=# set gptext.idx_delim='|';
SET
twitter=# set gptext.extension_factor=5;
SET

You can view the current value of a configuration parameter that you have set using the SHOW command:

twitter=# show gptext.idx_delim;
 gptext.idx_delim 
------------------
 |
(1 row)
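
When several parameters need to be set together, it can be convenient to generate the statements from a script and send them to psql in one session. The following is a minimal sketch, not part of GPText: the helper name gptext_session_sql is hypothetical, and the usage line assumes that psql can connect to the example twitter database.

```shell
# Hypothetical helper (not a GPText utility): print a SET statement for each
# name=value argument, followed by a SHOW statement for each parameter.
# Values are written as quoted strings; values that themselves contain a
# single quote are not escaped by this sketch.
gptext_session_sql() {
    for pair in "$@"; do
        # Split "name=value" at the first "=" sign.
        printf "SET gptext.%s = '%s';\n" "${pair%%=*}" "${pair#*=}"
    done
    for pair in "$@"; do
        printf 'SHOW gptext.%s;\n' "${pair%%=*}"
    done
}

# Example (assumes psql can reach the "twitter" database):
#   gptext_session_sql idx_buffer_size=10485760 'idx_delim=|' extension_factor=5 | psql -d twitter
```

Piping all statements into a single psql invocation keeps the SET and SHOW commands in the same session.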

Security and GPText Indexes

GPText security is based on Greenplum Database security. Your privileges to execute GPText functions depend on your privileges for the database table that is the source for the index. For example, if you have SELECT privileges for a table in the Greenplum database, then you have SELECT privileges for an index generated from that table.

Executing GPText functions requires one of OWNER, SELECT, INSERT, UPDATE, or DELETE privileges, depending on the function. The OWNER is the person who created the table and has all privileges. See the Greenplum Database Administrator Guide for information about setting privileges.

Checking ZooKeeper Status

Use the zkManager utility from the command line to check the ZooKeeper cluster status. If the ZooKeeper cluster is bound to GPText, you can start and stop the cluster using zkManager.

To check the ZooKeeper cluster status, run the following command:

zkManager state

The utility lists the hosts, ports, latency, and follower/leader mode for each ZooKeeper instance. If a node is down, its mode is listed as Down.

If the ZooKeeper cluster was installed by the GPText installer, the zkManager utility can be used to start or stop the ZooKeeper cluster. To start the cluster, run the following command:

zkManager start

To stop ZooKeeper, run this command:

zkManager stop

Checking SolrCloud Status

You can check the status of the SolrCloud cluster and indexes by running the gptext-state utility from the command line.

To check the state of the GPText nodes and indexes, run the gptext-state utility with no options:

gptext-state

This command reports the status of the GPText nodes and status of each GPText index.

Run gptext-state list to view just the indexes.

The gptext-state healthcheck command checks the GPText configuration files, the index status, required disk space, user privileges, and index and database consistency. By default, the required disk space check passes if there is at least 20% disk free. You can set a different disk free threshold using the --disk_free option. For example:

[gpadmin@gpdb-sandbox ~]$ gptext-state healthcheck --disk_free=25
20160629:15:45:24:669652 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-Execute healthcheck on GPText cluster!
20160629:15:45:24:669652 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-Check GPText config files ...
20160629:15:45:24:669652 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-GOOD
20160629:15:45:24:669652 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-Check GPText index status ...
20160629:15:45:25:669652 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-GOOD
20160629:15:45:25:669652 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-Checking for required disk space...
20160629:15:45:25:669652 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-GOOD
20160629:15:45:25:669652 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-Checking for required user privileges...
20160629:15:45:25:669652 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-GOOD
20160629:15:45:25:669652 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-Checking for indexes and database consistency...
20160629:15:45:27:669652 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-GOOD
20160629:15:45:27:669652 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-Done.
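
For scripted monitoring, the result lines in this output can be counted. The sketch below assumes the log format shown above, where each passing check is followed by a line ending in -GOOD. What a failing check prints is not shown here, so the helpers (hypothetical names, not part of GPText) only count passing checks and compare the count with the five checks in the sample.

```shell
# Hypothetical helper: count healthcheck log lines that report GOOD.
# Reads the log text on standard input.
count_good() {
    grep -c -F -- '-[INFO]:-GOOD'
}

# Succeeds only if at least the expected number of checks reported GOOD.
# The default of 5 matches the sample output above and is an assumption.
all_checks_good() {
    expected=${1:-5}
    # grep -c exits non-zero when it counts 0 lines; keep the printed count.
    good=$(count_good) || true
    [ "$good" -ge "$expected" ]
}

# Example (assumes gptext-state writes its log to standard output):
#   gptext-state healthcheck | all_checks_good || echo "healthcheck needs attention"
```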

See the gptext-state utility reference for additional options.

Recovering GPText Nodes

Use the gptext-recover utility to recover down GPText nodes, for example after a failed Greenplum segment host is recovered.

With no arguments, the gptext-recover utility discovers down GPText nodes and restarts them.

With the -f (or --force) option, if a GPText node cannot be restarted and no shards are down, the node is deleted and created again on the same host. Missing replicas are added and the failed node and failed replicas are removed.

The -H option allows recreating down GPText nodes on a new host that replaces a failed host. It forces down nodes to be deleted and recreated on the new host. If shards are down, it advises reindexing. If only some replicas are down, it recreates the replicas on the new host and updates gptext.conf.

Gathering Solr Index Statistics

You can gather Solr index statistics by running the gptext-state utility from the command line.

To list all GPText indexes, enter the following command at the command line:

gptext-state list

To retrieve all statistics for an index, specify the index name with the --index option:

gptext-state --index wikipedia.public.articles

To retrieve only the number of documents in an index, add the --stats-columns option:

gptext-state --index wikipedia.public.articles --stats-columns num_docs

To retrieve both num_docs and the index size, list the columns separated by a comma:

gptext-state --index wikipedia.public.articles --stats-columns num_docs,size
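
The statistics commands above differ only in the column list, so a small wrapper can assemble the command line. This is a sketch under assumptions: index_stats and the GPTEXT_STATE override variable are hypothetical names introduced here for illustration, and only the --index and --stats-columns options shown above are used.

```shell
# Hypothetical wrapper: index_stats <index> [column ...]
# Runs gptext-state for one index, joining any column names with commas.
# Set GPTEXT_STATE (an assumption of this sketch) to substitute another
# command, for example "echo" to preview the arguments.
index_stats() {
    index=$1
    shift
    if [ "$#" -eq 0 ]; then
        ${GPTEXT_STATE:-gptext-state} --index "$index"
    else
        # Join the remaining arguments with commas inside a subshell.
        columns=$(IFS=,; printf '%s' "$*")
        ${GPTEXT_STATE:-gptext-state} --index "$index" --stats-columns "$columns"
    fi
}

# Example:
#   index_stats wikipedia.public.articles num_docs size
```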

Backing Up and Restoring GPText Indexes

With the gptext-backup management utility, you can back up a GPText index to a shared directory so that, if needed, you can quickly recover from a failure. The backup can be restored to the same GPText system or to another system with the same number of Greenplum segments.

The gptext-backup management utility backs up an index and its configuration files to a shared file system, which must be accessible and writable by each host in the Greenplum cluster. The --path command-line option specifies the location of a directory on the mounted file system. The --name option provides a name for the backup.

The gptext-backup utility first checks that:

  • the GPText cluster is up
  • the shared file system is valid
  • the directory specified with the --name option does not already exist at the location specified by the --path option

The utility creates the new directory and saves one copy of each index shard to that directory, along with the index’s configuration files.
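
Two of these pre-checks can be approximated from the shell before starting a long backup. The sketch below is illustrative only: backup_target_ok is a hypothetical helper, and it tests the shared directory from the local host only, whereas the shared file system must be writable from every host in the cluster.

```shell
# Hypothetical helper: check that the shared directory ($1) exists and is
# writable, and that the backup name ($2) is not already used inside it.
backup_target_ok() {
    path=$1
    name=$2
    if [ ! -d "$path" ] || [ ! -w "$path" ]; then
        echo "not a writable directory: $path" >&2
        return 1
    fi
    if [ -e "$path/$name" ]; then
        echo "backup already exists: $path/$name" >&2
        return 1
    fi
    return 0
}

# Example (directory and backup name are placeholders):
#   backup_target_ok /mnt/gptext_backups articles_backup && echo "ready"
```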

To restore an index, use the gptext-restore management utility. The GPText system you restore to must be on a Greenplum cluster with the same number of segments. The database, schema, and base table for the index must be present.

The --index option specifies the name of the GPText index that will be restored. If the index exists, you must first drop it with the gptext.delete() user-defined function.

The --path option specifies the location of the directory containing the backup files—the directory that gptext-backup created on the shared file system.

See gptext-backup for syntax and details for running gptext-backup. See gptext-restore for syntax and details for running gptext-restore.

Expanding the GPText Cluster

The gptext-expand management utility adds GPText nodes to the cluster. There are two ways to add nodes:

  • Add GPText nodes to existing hosts in the cluster. This option increases the number of GPText nodes on each host.
  • Add GPText nodes to new hosts added when using the Greenplum gpexpand management utility to expand the Greenplum Database system.

Adding GPText Nodes to Existing Segment Hosts

To add nodes to existing segment hosts, run the gptext-expand utility with a command like the following:

gptext-expand -e -p /data1/nodes,/data2/nodes

This example adds two GPText nodes to each host.

The -e (--existing) option specifies that nodes are to be added to existing hosts.

The -p (--expand_paths) option provides a list of directories where the new nodes’ data directories are to be created. These should be the same directories that contain the Greenplum segment data directories and existing GPText data directories. The number of directories in the list is the number of new nodes that are added.

A directory can be repeated in the directory list multiple times to increase the number of new GPText nodes to create. For example, if there is currently one GPText node per host in the /data1/nodes directory, you could add three nodes with a command like the following:

gptext-expand -e -p /data1/nodes,/data2/nodes,/data2/nodes

This adds one node to the /data1/nodes directory and two nodes to the /data2/nodes directory so there are two GPText nodes in each directory.
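
Because each entry in the list creates one node, the effect of a directory list can be tallied before running the utility. The following is a minimal sketch using standard tools; nodes_per_dir is a hypothetical helper, not part of GPText.

```shell
# Hypothetical helper: given a comma-separated directory list ($1), print
# each directory with the number of new nodes it would receive.
nodes_per_dir() {
    printf '%s\n' "$1" | tr ',' '\n' | awk '
        {
            gsub(/^ +| +$/, "")          # trim spaces around each entry
            if ($0 != "") count[$0]++    # one new node per list entry
        }
        END { for (dir in count) print dir, count[dir] }' | sort
}

# Example:
#   nodes_per_dir '/data1/nodes,/data2/nodes,/data2/nodes'
```

For the list in the example above, the tally is one new node for /data1/nodes and two for /data2/nodes.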

Adding GPText nodes affects new indexes, but not existing indexes. Replicas for new indexes will be distributed across all of the nodes, including both old nodes and the newly created nodes. Replicas for indexes that existed before running gptext-expand are not automatically moved. Rebalancing existing replicas requires reindexing.

Adding GPText Nodes to New Hosts

To add GPText nodes to new hosts after expanding the Greenplum cluster with the gpexpand management utility, run gptext-expand with the name of the database that contains the gpexpand schema. For example, if the gpexpand schema was created in the postgres database:

gptext-expand -d postgres

The gptext-expand utility installs GPText binaries on new hosts and then creates new GPText nodes on the new hosts.

Expanding a Greenplum cluster increases the number of segments, so the number of GPText index shards for existing indexes must be increased to equal the new number of segments. This requires reindexing all documents in each existing index. Newly created indexes will automatically be distributed among the new shards.

Troubleshooting

GPText errors are of the following types:

  • Solr errors
  • gptext errors

Most of the Solr errors are self-explanatory.

gptext errors are caused by misuse of a function or utility. They provide a message that tells you when you have used an incorrect function or argument.

Monitoring Logs

You can examine the Greenplum Database and Solr logs for more information if errors occur. Greenplum Database logs reside in:

segment-directory/pg_log

Solr logs reside in:

<GPDB path>/solr/logs
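
When an error has just occurred, the most recently written log files are usually the place to start. The sketch below lists them for a given log directory. recent_logs is a hypothetical helper, and the directory argument must be replaced with the actual location on your system.

```shell
# Hypothetical helper: list files under a log directory ($1) that were
# modified within the last N minutes ($2, default 60).
recent_logs() {
    find "$1" -type f -mmin "-${2:-60}" | sort
}

# Example (replace the placeholder with a real log directory):
#   recent_logs /path/to/solr/logs 30
```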

Determining Segment Status with gptext-state

Use the gptext-state utility to determine if any primary or mirror segments are down. See gptext-state in the GPText Function Reference.