GPText administration includes security considerations, monitoring Solr index statistics, and troubleshooting.
GPText configuration parameters are built into GPText with default values. You can change a parameter by setting a new value in a Greenplum Database session. The new values are stored in ZooKeeper. GPText indexes use the values of configuration parameters that are in effect when the indexes are created, so changing a configuration parameter affects new indexes but not existing indexes.
See GPText Configuration Parameters for a complete list of configuration parameters.
A one-time Greenplum Database configuration change is needed for Greenplum Database to allow setting and displaying GPText configuration variables. As the gpadmin user, enter the following commands in a shell:
$ gpconfig -c custom_variable_classes -v 'gptext'
$ gpstop -u
Then connect to a database that contains the GPText schema and execute the gptext.version() function to expose the GPText configuration variables:
twitter=# select * from gptext.version();
Change the values of GPText configuration variables using the SET command in a session with a database that contains the GPText schema. The following example sets values for three configuration parameters in a psql session:
twitter=# set gptext.idx_buffer_size=10485760;
SET
twitter=# set gptext.idx_delim='|';
SET
twitter=# set gptext.extension_factor=5;
SET
You can view the current value of a configuration parameter that you have set using the SHOW command:
twitter=# show gptext.idx_delim;
 gptext.idx_delim
------------------
 |
(1 row)
GPText security is based on Greenplum Database security. Your privileges to execute GPText functions depend on your privileges for the database table that is the source for the index. For example, if you have SELECT privileges for a table in the Greenplum database, then you have SELECT privileges for an index generated from that table.
Executing GPText functions requires one of OWNER, SELECT, INSERT, UPDATE, or DELETE privileges, depending on the function. The OWNER is the person who created the table and has all privileges. See the Greenplum Database Administrator Guide for information about setting privileges.
Use the zkManager utility from the command line to check the ZooKeeper cluster status. If the ZooKeeper cluster is bound to GPText, you can also start and stop the cluster using zkManager.
To check the ZooKeeper cluster status, run the following command:
The utility lists the hosts, ports, latency, and follower/leader mode for each ZooKeeper instance. If a node is down, its mode is listed as Down.
If the ZooKeeper cluster was installed by the GPText installer, the zkManager utility can be used to start or stop the ZooKeeper cluster. To start the cluster, run the following command:
To stop ZooKeeper, run this command:
You can check the status of the SolrCloud cluster and indexes by running the gptext-state utility from the command line.
To check the state of the GPText nodes and indexes, run the gptext-state utility with no options:
This command reports the status of the GPText nodes and status of each GPText index.
Run gptext-state list to view just the indexes.
The gptext-state healthcheck command checks the GPText configuration files, the index status, required disk space, user privileges, and index and database consistency. By default, the required disk space check passes if at least 20% of the disk is free. You can set a different threshold using the --disk_free option. For example:
[gpadmin@gpdb-sandbox ~]$ gptext-state healthcheck --disk_free=25
20160629:15:45:24:669652 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-Execute healthcheck on GPText cluster!
20160629:15:45:24:669652 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-Check GPText config files ...
20160629:15:45:24:669652 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-GOOD
20160629:15:45:24:669652 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-Check GPText index status ...
20160629:15:45:25:669652 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-GOOD
20160629:15:45:25:669652 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-Checking for required disk space...
20160629:15:45:25:669652 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-GOOD
20160629:15:45:25:669652 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-Checking for required user privileges...
20160629:15:45:25:669652 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-GOOD
20160629:15:45:25:669652 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-Checking for indexes and database consistency...
20160629:15:45:27:669652 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-GOOD
20160629:15:45:27:669652 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-Done.
See the gptext-state utility reference for additional options.
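When the health check runs from a scheduled job, it can help to reduce the log to a pass/fail summary. The following Python sketch assumes the log format shown in the example above, where each check line ending in "..." is followed by a line that reports its result. The function name summarize_healthcheck is hypothetical, and the sample text is an abbreviated copy of the example output, not new output from the utility.

```python
import re

# Captures the message part of a gptext-state log line: the text after "-[LEVEL]:-".
LOG_LINE = re.compile(r"-\[(?P<level>[A-Z]+)\]:-(?P<message>.*)$")


def summarize_healthcheck(log_text):
    """Map each check description to the result reported on the next log line."""
    results = {}
    pending = None  # description of the check still waiting for its result
    for line in log_text.splitlines():
        match = LOG_LINE.search(line)
        if match is None:
            continue  # not a log line (for example, the shell prompt line)
        message = match.group("message").strip()
        if message.endswith("..."):
            # A check announcement; drop the trailing dots and spaces.
            pending = message.rstrip(". ")
        elif pending is not None:
            results[pending] = message
            pending = None
    return results


# Abbreviated copy of the example output above.
sample = """\
20160629:15:45:24:669652 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-Execute healthcheck on GPText cluster!
20160629:15:45:24:669652 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-Check GPText config files ...
20160629:15:45:24:669652 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-GOOD
20160629:15:45:25:669652 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-Checking for required disk space...
20160629:15:45:25:669652 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-GOOD
20160629:15:45:27:669652 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-Done.
"""

summary = summarize_healthcheck(sample)
all_good = all(result == "GOOD" for result in summary.values())
print(summary, all_good)
```

A wrapper could exit with a non-zero status when all_good is false. How the utility reports a failed check is not shown in this section, so the sketch treats any result other than GOOD as a failure.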
Use the gptext-recover utility to recover down GPText nodes, for example after a failed Greenplum segment host is recovered.
With no arguments, the gptext-recover utility discovers down GPText nodes and restarts them.
With the --force option, if a GPText node cannot be restarted and no shards are down, the node is deleted and created again on the same host. Missing replicas are added and the failed node and failed replicas are removed.
The -H option allows recreating down GPText nodes on a new host that replaces a failed host. It forces down nodes to be deleted and recreated on the new host. If shards are down, it advises reindexing. If only some replicas are down, it recreates the replicas on the new host and updates
You can gather Solr index statistics by running the gptext-state utility from the command line.
To list all GPText indexes, enter the following command at the command line:
A command line sample that retrieves all statistics for an index:
gptext-state --index wikipedia.public.articles
A command line sample that retrieves the number of documents in an index:
gptext-state --index wikipedia.public.articles --stats-columns num_docs
A command line sample that retrieves num_docs and the index size:
gptext-state --index wikipedia.public.articles --stats-columns num_docs,size
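If statistics are collected from a script, the command line can be assembled from an index name and a list of columns. The following Python sketch only builds the argument list and uses just the --index and --stats-columns options shown in the samples above; the helper name stats_command is hypothetical.

```python
import shlex


def stats_command(index_name, columns=()):
    """Build the gptext-state argument list for an index's statistics."""
    args = ["gptext-state", "--index", index_name]
    if columns:
        # Columns are passed as one comma-separated value, as in the samples above.
        args += ["--stats-columns", ",".join(columns)]
    return args


args = stats_command("wikipedia.public.articles", ["num_docs", "size"])
print(shlex.join(args))
# On a host where GPText is installed, the list could be passed to
# subprocess.run(args, check=True) to execute the command.
```

Keeping the arguments as a list avoids shell quoting problems if an index name ever contains unusual characters.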
With the gptext-backup management utility, you can back up a GPText index to a shared directory so that, if needed, you can quickly recover from a failure. The backup can be restored to the same GPText system or to another system with the same number of Greenplum segments.
The gptext-backup management utility backs up an index and its configuration files to a shared file system, which must be accessible and writable by each host in the Greenplum cluster. The --path command-line option specifies the location of a directory on the mounted file system. The --name option provides a name for the backup.
The gptext-backup utility first checks that:
- the GPText cluster is up
- the shared file system is valid
- the directory specified with the --name option does not already exist at the location specified by the --path option
The utility creates the new directory and saves one copy of each index shard to that directory, along with the index’s configuration files.
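A wrapper script can perform a similar check before calling the utility so that a scheduled backup fails early with a clear message. The following Python sketch is an illustration only and is not how gptext-backup implements its checks; backup_target_is_free is a hypothetical helper, and the temporary directory stands in for the shared file system.

```python
import os
import tempfile


def backup_target_is_free(path, name):
    """Return True if the shared directory exists and has no entry called name."""
    if not os.path.isdir(path):
        return False  # the shared location itself is missing
    return not os.path.exists(os.path.join(path, name))


# Demonstration against a temporary directory standing in for the shared file system.
with tempfile.TemporaryDirectory() as shared:
    free_before = backup_target_is_free(shared, "articles_backup")
    os.mkdir(os.path.join(shared, "articles_backup"))
    free_after = backup_target_is_free(shared, "articles_backup")

print(free_before, free_after)
```

The check runs only on the host that executes the script; it does not verify that every host in the Greenplum cluster can write to the shared file system.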
To restore an index, use the gptext-restore management utility. The GPText system you restore to must be on a Greenplum cluster with the same number of segments. The database, schema, and base table for the index must be present.
The --index option specifies the name of the GPText index to restore. If the index exists, you must first drop it with the gptext.delete() user-defined function.
The --path option specifies the location of the directory containing the backup files, that is, the directory that gptext-backup created on the shared file system.
The gptext-expand management utility adds GPText nodes to the cluster. There are two ways to add nodes:
- Add GPText nodes to existing hosts in the cluster. This option increases the number of GPText nodes on each host.
- Add GPText nodes to new hosts that were added when the Greenplum gpexpand management utility was used to expand the Greenplum Database system.
Adding GPText Nodes to Existing Segment Hosts
To add nodes to existing segment hosts, run the gptext-expand utility with a command like the following:
gptext-expand -e -p /data1/nodes,/data2/nodes
This example adds two GPText nodes to each host.
The -e (--existing) option specifies that nodes are to be added to existing hosts.
The -p (--expand_paths) option provides a list of directories where the new nodes' data directories are to be created. These should be the same directories that contain the Greenplum segment data directories and existing GPText data directories. The number of directories in the list is the number of new nodes that are added.
A directory can be repeated in the directory list multiple times to increase the number of new GPText nodes to create. For example, if there is currently one GPText node per host in the /data1/nodes directory, you could add three nodes with a command like the following:
gptext-expand -e -p /data1/nodes,/data2/nodes,/data2/nodes
This adds one node to the /data1/nodes directory and two nodes to the /data2/nodes directory, so there are two GPText nodes in each directory.
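The arithmetic in this example can be checked before running the utility. The following Python sketch counts the nodes per directory on each host, assuming that every occurrence of a directory in the list adds exactly one node there, as described above; the function name nodes_after_expand is hypothetical.

```python
from collections import Counter


def nodes_after_expand(existing, expand_paths):
    """Return the GPText node count per directory on each host after expansion."""
    totals = Counter(existing)
    # Each occurrence of a directory in the list adds one node in that directory.
    totals.update(expand_paths)
    return dict(totals)


existing = {"/data1/nodes": 1}  # one GPText node per host before expansion
expand_paths = ["/data1/nodes", "/data2/nodes", "/data2/nodes"]
print(nodes_after_expand(existing, expand_paths))
```

For the directory list in the example, the result is two nodes in /data1/nodes and two in /data2/nodes, matching the description above.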
Adding GPText nodes affects new indexes, but not existing indexes. Replicas for new indexes will be distributed across all of the nodes, including both old nodes and the newly created nodes. Replicas for indexes that existed before running gptext-expand are not automatically moved. Rebalancing existing replicas requires reindexing.
Adding GPText Nodes to New Hosts
To add GPText nodes to new hosts after expanding the Greenplum cluster with the gpexpand management utility, call gptext-expand with the name of the database containing the gpexpand schema. For example, if the gpexpand schema was created in the postgres database:
gptext-expand -d postgres
The gptext-expand utility installs GPText binaries on the new hosts and then creates new GPText nodes on them.
Expanding a Greenplum cluster increases the number of segments, so the number of GPText index shards for existing indexes must be increased to equal the new number of segments. This requires reindexing the documents for all existing indexes. Newly created indexes will automatically be distributed among the new shards.
GPText errors are of the following types:
- Solr errors
- gptext errors
Most of the Solr errors are self-explanatory.
gptext errors are caused by misuse of a function or utility. They provide a message that tells you when you have used an incorrect function or argument.
You can examine the Greenplum Database and Solr logs for more information if errors occur. Greenplum Database logs reside in:
Solr logs reside in:
Use the gptext-state utility to determine if any primary or mirror segments are down. See gptext-state in the GPText Function Reference.