Pivotal® Greenplum® Text v2.0.0

GPText Management Utilities

The GPText management utilities are command-line tools for managing the GPText cluster. The utilities must be run on the Greenplum master as the gpadmin user.

To ensure the utilities are in your path, source the GPText environment script at install_dir/greenplum-text_path.sh. For example:

source /usr/local/gptext-version/greenplum-text_path.sh
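
After sourcing the script, a quick check can confirm that the utilities resolve on your PATH. The following is a minimal sketch; the helper name missing_utils is hypothetical and is not part of GPText.

```shell
# Print each named command that cannot be found on PATH.
# missing_utils is a hypothetical helper, not a GPText utility.
missing_utils() {
  for util in "$@"; do
    command -v "$util" >/dev/null 2>&1 || printf '%s\n' "$util"
  done
  return 0
}
```

For example, running missing_utils gptext-start gptext-stop gptext-state zkManager should print nothing when all four utilities are found.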

Help

To get help for a utility, specify the flag -h or --help. A short help message displays with a list of parameters.

Debugging

To get verbose debugging output from any utility except gptext-state, specify the -v or --verbose flag.

GPText Utilities

  • gptext-start – starts or restarts the GPText cluster.
  • gptext-stop – shuts down the GPText cluster.
  • gptext-state – displays the state of the GPText cluster.
  • gptext-recover – restarts GPText nodes that are down.
  • gptext-installsql – installs or removes the gptext schema and user-defined functions in Greenplum databases.
  • gptext-replica – adds or drops a replica of an index shard.
  • gptext-config – performs GPText configuration tasks.
  • gptext-backup – backs up a GPText index to a shared file system.
  • gptext-restore – restores a GPText index from a backup on a shared file system.
  • gptext-expand – adds new GPText nodes to existing hosts in the cluster.
  • gptext-uninstall – uninstalls GPText, including data and installed files, and ZooKeeper nodes if they were installed with the GPText installer.
  • zkManager – checks the ZooKeeper cluster state. If ZooKeeper was installed with GPText, zkManager can start or stop the ZooKeeper cluster.

gptext-start

Starts or restarts the GPText cluster.

Syntax

gptext-start -h 

gptext-start [-r] [-v]

Parameters

Parameter Description
-h

--help

Displays a usage message and exits.
-r

--restart

Restarts the GPText cluster.
-v

--verbose

Displays debug output.

Notes

The gptext-start -r command calls the solr restart command to stop and restart all of the Solr instances in the cluster. The GPText utility determines if the processes are running before it completes, but it cannot verify that all of the Solr processes were stopped. If it is important to be certain that Solr processes were stopped, for example if you have changed the JVM options, use gptext-stop followed by gptext-start instead of gptext-start -r.
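
The stop-then-start sequence described above can be wrapped so that the start runs only when the stop succeeded. This is a sketch under stated assumptions: the function name full_restart and the GPTEXT_STOP and GPTEXT_START override variables are hypothetical, added only so the sequence can be dry-run without a cluster.

```shell
# Full stop followed by start, instead of gptext-start -r.
# GPTEXT_STOP / GPTEXT_START are hypothetical overrides for dry runs;
# by default the real gptext-stop and gptext-start utilities are used.
full_restart() {
  stop_cmd=${GPTEXT_STOP:-gptext-stop}
  start_cmd=${GPTEXT_START:-gptext-start}
  # Unquoted on purpose so an override may carry arguments.
  if ! $stop_cmd; then
    echo "stop failed; cluster not started" >&2
    return 1
  fi
  $start_cmd
}
```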

Examples

  1. Start the GPText cluster.

    gptext-start
    
  2. Restart the GPText cluster.

    gptext-start -r
    

gptext-stop

Stops the GPText cluster nodes.

Syntax

gptext-stop -h

gptext-stop [-v] [-f]

Parameters

Parameter Description
-h

--help

Displays a usage message and exits.
-v

--verbose

Displays debug output.
-f

--force

Forcefully stops all Solr processes.

Examples

  1. Stop the GPText cluster.

    gptext-stop
    
  2. Force stop the GPText cluster.

    gptext-stop -f
    

gptext-state

Displays the state of the cluster.

Syntax

gptext-state -h

gptext-state [-d db-name] [-i index-name] [-c stats-list]

gptext-state list [-d db-name]

gptext-state healthcheck [--stats] [--index=<index_name>]
          [--healthcheck [--disk_free=<percent>]]
          [--stats-columns=<col1,...>] [--database=<database_name>] 

Parameters

Parameter Description
-h

--help

Displays a usage message and exits.
-d db-name

--database=db-name

The name of a database containing the GPText schema.

gptext-state searches all databases for the functions it needs to run. If the user does not have access permission to the database it begins with, it fails. In this case, use the --database= parameter to specify an accessible database to search.

-i index-name

--index=index-name

The name of an index. This option cannot be used with the list or healthcheck subcommands.
-c stats-list

--stats_columns=stats-list

Used with the -i or --index option, specifies a comma-separated list of statistics to display. The list may contain replication_factor, max_shards_per_node, num_docs, and size_in_bytes. If no -c or --stats_columns option is supplied, all four statistics are displayed.
-f diskfree

--disk_free=diskfree

Used with the healthcheck command, specifies the percentage disk free required per host to report a healthy GPText cluster. The default is 10.

Notes

All parameters are optional, except that --index is required when you specify --stats_columns.
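
Before passing a statistics list to -c, a script can check it against the four statistic names listed in the parameter description. A minimal sketch; valid_stats_list is a hypothetical helper, and gptext-state may perform its own validation.

```shell
# Succeed only if $1 is a non-empty comma-separated list made up of
# the four statistic names documented for -c / --stats_columns.
valid_stats_list() {
  old_ifs=$IFS
  IFS=,
  set -- $1            # split the list on commas
  IFS=$old_ifs
  [ "$#" -gt 0 ] || return 1
  for col in "$@"; do
    case "$col" in
      replication_factor|max_shards_per_node|num_docs|size_in_bytes) ;;
      *) return 1 ;;
    esac
  done
  return 0
}
```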

When executed with no arguments, gptext-state displays a list of GPText indexes with the columns database, index_name, and state. The state column displays the status of the index as Green, Yellow, or Red.

  • A Green state means that all shards and replicas are healthy.
  • A Yellow state means that all shards are available, but one or more replicas are down.
  • A Red state means that one or more shards are down.
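
For monitoring, the state column can be filtered to show only indexes that are not Green. The sketch below assumes the output layout matches the examples in this section, with the index name and state as the last two whitespace-separated fields of each line; non_green_indexes is a hypothetical helper.

```shell
# Read gptext-state output on stdin and print "index_name state"
# for every index whose state is Yellow or Red.
# Assumes index_name and state are the last two fields of a line.
non_green_indexes() {
  awk '$NF == "Yellow" || $NF == "Red" { print $(NF-1), $NF }'
}
```

It could be used as gptext-state -d wikipedia | non_green_indexes, which prints nothing when every index is Green.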

Examples

  1. Show the GPText cluster state, specifying wikipedia as a database containing the GPText schema.

    $ gptext-state -d wikipedia
    20160603:13:54:27:302662 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-Check GPText cluster status...
    20160603:13:54:27:302662 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-Current GPText Version: 2.0.0
    20160603:13:54:28:302662 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-All nodes are up and running.
    20160603:13:54:28:302662 gptext-state:gpdb-sandbox:gpadmin-[INFO]:------------------------------------------------
    20160603:13:54:28:302662 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-Index state.
    20160603:13:54:28:302662 gptext-state:gpdb-sandbox:gpadmin-[INFO]:------------------------------------------------
    20160603:13:54:28:302662 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-   database    index_name                  state
    20160603:13:54:28:302662 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-   wikipedia   wikipedia.public.articles   Green
    20160603:13:54:28:302662 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-   gptextdoc   gptextdoc.public.docs       Green
    
  2. Show replication_factor and num_docs statistics for the GPText index wikipedia.public.articles. Specify wikipedia as the database with the GPText schema.

    $ gptext-state -i wikipedia.public.articles -c replication_factor,num_docs -d wikipedia
    20160603:13:57:16:303262 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-Check GPText cluster statistics...
    20160603:13:57:18:303262 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-   Replicas Up:   6
    20160603:13:57:18:303262 gptext-state:gpdb-sandbox:gpadmin-[INFO]:------------------------------------------------
    20160603:13:57:18:303262 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-Index wikipedia.public.articles statistics.
    20160603:13:57:18:303262 gptext-state:gpdb-sandbox:gpadmin-[INFO]:------------------------------------------------
    20160603:13:57:18:303262 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-   replication_factor   num_docs
    20160603:13:57:18:303262 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-   3                    11
    
  3. List all indexes, specifying wikipedia as a database containing the GPText schema.

    $ gptext-state list -d wikipedia
    20160603:13:58:10:303550 gptext-state:gpdb-sandbox:gpadmin-[INFO]:----------------------------------------------------------
    20160603:13:58:10:303550 gptext-state:gpdb-sandbox:gpadmin-[INFO]:- Index list
    20160603:13:58:10:303550 gptext-state:gpdb-sandbox:gpadmin-[INFO]:----------------------------------------------------------
    20160603:13:58:10:303550 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-  gptextdoc.public.docs
    20160603:13:58:10:303550 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-  wikipedia.public.articles
    
  4. Perform a health check with a 20% free disk requirement.

    $ gptext-state healthcheck -f 20 -d wikipedia
    20160603:13:58:56:303655 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-Execute healthcheck on GPText cluster!
    20160603:13:58:56:303655 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-Check GPText config files ...
    20160603:13:58:56:303655 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-GOOD
    20160603:13:58:56:303655 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-Check GPText index status ...
    20160603:13:58:56:303655 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-GOOD
    20160603:13:58:56:303655 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-Checking for required disk space...
    20160603:13:58:57:303655 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-GOOD
    20160603:13:58:57:303655 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-Checking for required user privileges...
    20160603:13:58:57:303655 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-GOOD
    20160603:13:58:57:303655 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-Checking for indexes and database consistency...
    20160603:13:58:58:303655 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-GOOD
    20160603:13:58:58:303655 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-Done.
    

gptext-recover

Recovers GPText nodes.

Syntax

gptext-recover -h

gptext-recover [-f] [-v]

gptext-recover [-H] <new-host1>,<new-host2>,... [-v]

Parameters

Parameter Description
-h

--help

Displays a usage message and exits.
-f

--force

Forces recovery for any GPText nodes that are down. If the node is unrecoverable, deletes the node, creates a new node, and recreates replicas.
-H

--new-hosts

Recovers down nodes on new hosts. Specify the hosts as a comma-separated list, for example host1,host2.
-v

--verbose

Displays debug output.

Notes

The -f and -H options cannot be used at the same time.

If shards are down, gptext-recover advises you to reindex.

If no shards are down, gptext-recover restores any replicas that are down.
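
Because -f and -H are mutually exclusive, a wrapper script can reject the combination before calling the utility. The following sketch only prints the command line it would run; recover_command and its two positional arguments are hypothetical.

```shell
# Print the gptext-recover command line for the requested mode.
# $1: non-empty to force recovery; $2: comma-separated new hosts or "".
recover_command() {
  force=$1
  hosts=$2
  if [ -n "$force" ] && [ -n "$hosts" ]; then
    echo "-f and -H cannot be used at the same time" >&2
    return 1
  fi
  if [ -n "$hosts" ]; then
    printf 'gptext-recover -H %s\n' "$hosts"
  elif [ -n "$force" ]; then
    printf 'gptext-recover -f\n'
  else
    printf 'gptext-recover\n'
  fi
}
```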

gptext-installsql

Installs or removes the gptext schema and user-defined functions in databases.

Syntax

gptext-installsql -h 

gptext-installsql [-c] [-v] db_name [db2_name...]

Parameters

Parameter Description
-c

--clean

Removes the gptext schema and UDFs from the specified databases.
-h

--help

Displays a usage message and exits.
-v

--verbose

Displays debug output.

Notes

None.

Examples

  1. Install GPText UDFs in databases wikipedia and twitter.

    $ gptext-installsql wikipedia twitter
    20160603:14:03:53:305130 gptext-installsql:gpdb-sandbox:gpadmin-[INFO]:-Creating 'gptext' schema and UDFs in database wikipedia...
    20160603:14:03:53:305130 gptext-installsql:gpdb-sandbox:gpadmin-[INFO]:-Creating 'gptext' schema and UDFs in database twitter...
    20160603:14:03:54:305130 gptext-installsql:gpdb-sandbox:gpadmin-[INFO]:-Validating gptext installation
    20160603:14:03:59:305130 gptext-installsql:gpdb-sandbox:gpadmin-[INFO]:-Done.
    
  2. Delete GPText UDFs in database wikipedia.

    $ gptext-installsql -c wikipedia
    20160603:14:03:04:304847 gptext-installsql:gpdb-sandbox:gpadmin-[INFO]:-Connecting to database wikipedia
    20160603:14:03:05:304847 gptext-installsql:gpdb-sandbox:gpadmin-[INFO]:-Dropping 'gptext' schema and UDFs...
    20160603:14:03:05:304847 gptext-installsql:gpdb-sandbox:gpadmin-[INFO]:-Validating clean operation
    20160603:14:03:09:304847 gptext-installsql:gpdb-sandbox:gpadmin-[INFO]:-Done.
    

gptext-replica

Add or delete a replica for an index shard.

Syntax

gptext-replica -h

gptext-replica add -i index-name -s shard [-n node]

gptext-replica drop -i index-name -s shard -r replica [-o]

Parameters

Parameter Description
-h

--help

Displays a usage message and exits.
-i index

--index=index

Required. The name of the index.
-s shard

--shard=shard

Required. The name of the shard to add a replica to or drop a replica from.
-n node

--node=node

Optional. The node where the replica is to be added.
-r replica

--replica=replica

Required for the drop command only. The name of the replica to drop.
-o

--onlyifdown

Optional. Used only with the drop command. Drops the replica only if it is down.

Notes

To find the name of a replica to drop, check gptext.index_status(). The name is core_nodeX where X is a number.
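
A script that drops replicas can first check that a name has the core_nodeX form described above. A minimal sketch; is_replica_name is a hypothetical helper and only checks the shape of the name, not whether the replica exists.

```shell
# Succeed if $1 looks like core_nodeX, where X is one or more digits.
is_replica_name() {
  suffix=${1#core_node}
  # If nothing was stripped, the core_node prefix is missing.
  [ "$suffix" != "$1" ] || return 1
  case "$suffix" in
    ''|*[!0-9]*) return 1 ;;   # empty or contains a non-digit
  esac
  return 0
}
```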

Examples

  1. Add a replica for index wikipedia.public.articles in shard shard0, on node node1.

    gptext-replica add -i wikipedia.public.articles -s shard0 -n node1
    
  2. Drop the replica named core_node3 for index wikipedia.public.articles in shard shard0 if the replica is down.

    gptext-replica drop -i wikipedia.public.articles -s shard0 -r core_node3 -o
    

gptext-config

Performs GPText configuration tasks:

  • Edit, add, or upload configuration files in ZooKeeper
  • Revert configured files in ZooKeeper
  • Edit JVM configuration options
  • Upload jar files to the GPText home directory

Syntax

gptext-config -h | --help

gptext-config -f file_name -i index_name [-r] [-e] [-b]

gptext-config -a append_file -f file_name -i index_name

gptext-config -u path/local_file_name -f path/zookeeper_file_name -i index_name

gptext-config -j path/jar_file

gptext-config -o jvm_options

gptext-config -f file_name -i index_name -a file_to_append

gptext-config -k solr_key -s <solr_value>

Parameters

Parameter Description
-i index-name

--index=index-name

Name of the index.
-f filename

--file=filename

The name of a file to edit, append, or upload. The -i option must be included to specify the index. The following files are supported:
  • solrconfig.xml – Contains most of the parameters for configuring Solr itself (see https://cwiki.apache.org/confluence/display/solr/Configuring+solrconfig.xml).
  • schema.xml – Defines the analysis chains that Solr uses for different types of search fields (see [Setting up Text Analysis Chains](indexes.html#topic5)).
  • stopwords.txt – Lists words you want to eliminate from the final index. You can also edit language specific stopwords by specifying a filename in the format stopwords_language_code.txt, where language_code is a two-character code such as en, fr, or es.
  • protwords.txt – Lists protected words that you do not want to be modified by the analysis chain. For example, iPhone.
  • synonyms.txt – Lists words that you want replaced by synonyms in the analysis chain.
  • emoticons.txt – Defines emoticons for the text_sm social media analysis chain. See [gptext-start](#topic13).
  • currency.txt – Defines exchange rates between one currency and another (see https://cwiki.apache.org/confluence/display/solr/Working+with+Currencies+and+Exchange+Rates).
  • jar_file – the name of a jar file to upload to GPText_Install_Directory/lib/.
-e command

--editor=command

The editor to use. Any editor that accepts a filename as a command-line argument works, for example vi, vim, emacs, or nano. The default is vi.
-a filename

--append=filename

Appends a local file to a configuration file and distributes the resulting files. Requires the -f and -i parameters: -f names the configuration file to append to, and -a names the local file (including its path) to append.
-r

--revert

Reverts the file named with the -f option to its previous version.
-b n

--batch_size=n

How many Solr instances to configure concurrently. The default (64) is generally more than enough. A larger number may increase speed.
-u local_file_path

--upload local_file_path

Uploads a configuration file at local_file_path to ZooKeeper. Specify the destination ZooKeeper file name with the -f option and the index name with the -i option.
-j jarfile

--jar=jarfile

Uploads a jar file to GPText_Install_Directory/lib/.
-o "JVM_Options"

Modifies JVM options. To ensure that the JVMs are restarted after changing JVM options, restart the GPText cluster using the gptext-stop and gptext-start utilities.
-k

--solr_prop=

-s

--solr_val=

Sets a custom property in the solr.xml configuration file. The -k option specifies the name of the property. The -s option specifies the value. Currently, the only custom GPText property is the trackCommit property, which enables or disables commit tracking.

Notes

Use the gptext-config utility to edit the configuration files for a specified index.

Warning: Never edit the template configuration files. If you do, every index you create after editing the templates will be created with your modified versions. Use the gptext-config utility to ensure that you are editing the configuration files for your index, rather than the template configuration files.

gptext-config automatically reindexes after editing files if the configuration changes made require it.

If you use the -f (--file) parameter to edit one of the index configuration files, GPText automatically places the edited file in its proper directory.

To move an index configuration file from the local file system to the index configuration directory in all of the segments, specify the local file with the -u (--upload) option and the destination file with the -f (--file) option.

The -k (--solr_prop) and -s (--solr_val) parameters must always be used together. They set a custom GPText property in the solr.xml configuration file. Currently, they are only used to enable or disable commit tracking. Changes to solr.xml do not take effect until the GPText instance is restarted.
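
The pairing rule for the property options can be checked in a wrapper before gptext-config is called. This is a rough sketch: solr_prop_args_ok is a hypothetical helper that scans an argument list and succeeds only when the property-name and property-value options are both present or both absent. It does not distinguish option values that happen to look like options.

```shell
# Succeed if -k/--solr_prop= and -s/--solr_val= appear together
# or not at all in the given argument list.
solr_prop_args_ok() {
  have_k=0
  have_s=0
  for arg in "$@"; do
    case "$arg" in
      -k|--solr_prop=*) have_k=1 ;;
      -s|--solr_val=*)  have_s=1 ;;
    esac
  done
  [ "$have_k" -eq "$have_s" ]
}
```

In the checks below, the value given for trackCommit is a placeholder; this section does not document the accepted values.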

Examples

  1. Edit the managed-schema file in index wikipedia.public.articles, using the vi editor:

    gptext-config -f managed-schema -i wikipedia.public.articles -e vi
    
  2. Append the file stopwords.add to stopwords.txt in index wikipedia.public.articles:

    gptext-config -a stopwords.add -f stopwords.txt -i wikipedia.public.articles
    
  3. Revert file managed-schema in index wikipedia.public.articles after editing it.

    gptext-config -f managed-schema -i wikipedia.public.articles -r
    
  4. Upload the local file custom.txt to the ZooKeeper file custom.conf in index wikipedia.public.articles:

    gptext-config -u custom.txt -f custom.conf -i wikipedia.public.articles
    
  5. Upload jar file text.jar to the lib directory in the GPText home directory:

    gptext-config -j text.jar
    
  6. Set JVM options:

    gptext-config -o "-Xms256M -Xmx400M"
    

gptext-backup

Backs up a GPText index to a shared file system.

Syntax

gptext-backup -h

gptext-backup -p <path> -i <index> -n <name> [-v]

Parameters

Parameter Description
-h

--help

Displays a usage message and exits.
-p path

--path path

The path where the shared file system is mounted on each host. The file system must be accessible from all hosts in the cluster and readable and writable by the gpadmin user.
-i index_name

--index index_name

The name of the GPText index to back up.
-n backup_name

--name backup_name

A name for the backup.

Example

gptext-backup -i myindex -p /mnt/backupfs/gptext-backups -n mybackup

Notes

You can back up an index so that you can restore it to a different GPText system or avoid having to reindex if the existing index becomes corrupted.

The shared file system must be mounted on all hosts with GPText nodes and must be writable by the gpadmin user. The file system could be, for example, an NFS mount or an SSH server with sshfs support. Before you run the gptext-backup utility, the file system must be configured, accessible, and able to accept connections from each host in the cluster.

The gptext-backup utility creates a new subdirectory at the specified path with the backup name specified. The command fails if the directory already exists.

When the backup is complete, the backup directory contains the following:

backup.info
A text file containing three comma-separated strings: the database name, schema name, and index name for the index that was backed up.

backup.properties
A text file with properties that describe the backup, such as the date and time the backup started, the name of the backup, and the names of the Solr collection and collection configuration.

zk_backup
A directory containing the following:

  • collection_state.json – a JSON file describing the status of the Solr collection.

  • configs/<collection-name>/ – a directory containing copies of the Solr configuration files stored in ZooKeeper for the index, for example managed-schema, solrconfig.xml, protwords.txt, stopwords.txt.

snapshot.shard0 … snapshot.shardN
A directory for each shard, with the files containing content of the shard.
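
As an illustration of the backup.info layout described above, the three comma-separated values can be read with the shell's read builtin. This is a sketch only: read_backup_info is a hypothetical helper, and it assumes the three values sit on a single line in the order given above.

```shell
# Print the database, schema, and index names stored in
# <backup-dir>/backup.info. $1 is the backup directory.
read_backup_info() {
  db= schema= index=
  [ -r "$1/backup.info" ] || return 1
  # read returns non-zero when the last line lacks a newline,
  # so also accept the case where a value was still read.
  IFS=, read -r db schema index < "$1/backup.info" || [ -n "$db" ] || return 1
  printf 'database=%s schema=%s index=%s\n' "$db" "$schema" "$index"
}
```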

If the backup fails, for example because there is insufficient disk space, an error message is displayed, but the backup directory is not removed. Be sure to remove the backup directory before restarting the backup.
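
Because gptext-backup fails when the target directory already exists, including one left behind by a failed run, a pre-flight check on the local host can catch the problem early. A minimal sketch; backup_preflight is a hypothetical helper and checks only the host it runs on, not every host in the cluster.

```shell
# Check that the shared path is a writable directory and that the
# backup subdirectory does not exist yet.
# $1: shared file system path (-p); $2: backup name (-n)
backup_preflight() {
  path=$1
  name=$2
  [ -d "$path" ] || { echo "not a directory: $path" >&2; return 1; }
  [ -w "$path" ] || { echo "not writable: $path" >&2; return 1; }
  if [ -e "$path/$name" ]; then
    echo "backup directory already exists: $path/$name" >&2
    return 1
  fi
  return 0
}
```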

gptext-restore

Restore a GPText index from a backup created on a shared file system with the gptext-backup utility.

Syntax

gptext-restore -h

gptext-restore -i <index_name> -p <path> [-v]

Parameters

Parameter Description
-h

--help

Displays a usage message and exits.
-p path

--path path

The path to the backup directory on each host.
-i index_name

--index index_name

The name of the GPText index to restore. The index must not already exist in the target GPText system.

Example

gptext-restore -i myindex -p /mnt/backupfs/gptext-backups/mybackup

Notes

Use the gptext-restore utility to restore a GPText index backup from a shared file system. You can restore the backup to a new GPText system or restore a backup to recover a corrupted GPText index.

The index you are restoring must not exist. If you are restoring an index to recover a corrupted index, you must first delete the existing index with the gptext.delete() UDF. The gptext-restore utility creates a new index and will output an error and quit if the index you are restoring exists.
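
Before running gptext-restore, a script can confirm that the path points at a directory with the contents gptext-backup is described as producing. The sketch below checks for backup.info, backup.properties, and the zk_backup directory; looks_like_backup is a hypothetical helper and does not validate the files themselves.

```shell
# Succeed if $1 contains the top-level entries of a GPText backup.
looks_like_backup() {
  dir=$1
  [ -f "$dir/backup.info" ] &&
  [ -f "$dir/backup.properties" ] &&
  [ -d "$dir/zk_backup" ]
}
```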

gptext-expand

Expands a GPText cluster by adding new GPText nodes to existing hosts in a GPText cluster or to hosts added by the Greenplum Database gpexpand management utility. Replicas for indexes created after the new GPText nodes are added will be distributed across the new and existing nodes. Documents must be reindexed to rebalance replicas on existing hosts or, after expanding the Greenplum cluster, to redistribute the index to new shards.

Syntax

gptext-expand -h 

gptext-expand -e -p <paths> [-v] 

gptext-expand -d <database> [-v]

Parameters

Parameter Description
-h

--help

Displays a usage message and exits.
-e

--existing

Adds GPText nodes to existing hosts in the GPText cluster. Either the -e or the -d option must be specified.
-p

--expand_paths

Specifies paths to directories where the new GPText nodes’ data directories are to be created. These directories should be parallel to the Greenplum Database segment data directories. If there is more than one directory, place them in a comma-delimited list, for example -p /data1/nodes, /data1/nodes, /data2/nodes. Required when expanding on existing hosts.
-d

--database

Specifies the name of the database containing the gpexpand schema used to expand the Greenplum Database cluster. Either the -e or the -d option must be specified.
-v

--verbose

Displays debug output.

Notes

  • The -p and -d options cannot be used together.

  • Existing replicas are not automatically redistributed. To rebalance replicas among the expanded GPText cluster, you must reindex.

  • When expanding to new hosts, you must reindex to redistribute the index among existing and new shards.
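
When the directories for new nodes are held in a script, the comma-delimited value for -p can be built with a small helper. This sketch assumes the list is passed as a single shell argument with no spaces after the commas; join_paths is a hypothetical helper.

```shell
# Join the given directories into one comma-delimited string.
join_paths() {
  out=
  for p in "$@"; do
    out=${out:+$out,}$p   # add a comma only between entries
  done
  printf '%s\n' "$out"
}
```

The result could then be passed as gptext-expand -e -p "$(join_paths /data1/nodes /data2/nodes)".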

gptext-uninstall

Uninstalls GPText, including data and installed files. Uninstalls ZooKeeper nodes if they were installed with the GPText installer.

  • Stops any running GPText instances.
  • Deletes all Solr directories in segment directories.
  • Deletes the installation directory.
  • Removes all GPText schemas and indexes from all databases.
  • Uninstalls ZooKeeper if it was installed with the GPText installer.

Syntax

gptext-uninstall -h | --help

gptext-uninstall [-v | --verbose]

Parameters

Parameter Description
-h

--help

Displays a usage message and exits.
-r

--restart

Restarts the GPText cluster.
-v

--verbose

Displays debug output.

Notes

  • To use gptext-uninstall, you must have superuser permissions on all databases with GPText schemas.
  • gptext-uninstall runs only if there is at least one database with a GPText schema.

Examples

  1. Uninstall GPText.

    gptext-uninstall
    

zkManager

Checks the ZooKeeper cluster state. If ZooKeeper was installed with GPText, zkManager can start or stop the ZooKeeper cluster.

Syntax

zkManager [-h | --help] 

zkManager state [-v | --verbose]

zkManager start [-v | --verbose]

zkManager stop [-v | --verbose] [-f | --force]

Parameters

Parameter Description
-h

--help

Displays a usage message and exits.
-f

--force

When used with the stop command, performs a forced stop.
-v

--verbose

Displays debug output when executing the command.

Notes

  • The zkManager start and zkManager stop commands are only available if the ZooKeeper cluster was installed by the GPText installer.
  • By default, all gptext-* utilities check the ZooKeeper cluster state. If the cluster is not healthy, the ZooKeeper state information is displayed to warn the user.
  • The nc (netcat) command must be installed on the master host. Run nc in a terminal to ensure the command is installed.

Examples

  1. Start the ZooKeeper cluster, if ZooKeeper was installed by the GPText installer:

    zkManager start
    
  2. Stop the ZooKeeper cluster, if ZooKeeper was installed by the GPText installer:

    zkManager stop
    
  3. Force stop the ZooKeeper cluster, if ZooKeeper was installed by the GPText installer:

    zkManager stop -f
    
  4. Check the state of the ZooKeeper cluster:

    $ zkManager state
    20160603:14:17:01:307386 zkManager:gpdb-sandbox:gpadmin-[INFO]:-Execute zookeeper state process.
    20160603:14:17:01:307386 zkManager:gpdb-sandbox:gpadmin-[INFO]:-   Host                       port   Latency min/avg/max   Mode
    20160603:14:17:01:307386 zkManager:gpdb-sandbox:gpadmin-[INFO]:-   gpdb-sandbox.localdomain   2188   0/0/17                follower
    20160603:14:17:01:307386 zkManager:gpdb-sandbox:gpadmin-[INFO]:-   gpdb-sandbox.localdomain   2189   0/0/17                leader
    20160603:14:17:01:307386 zkManager:gpdb-sandbox:gpadmin-[INFO]:-   gpdb-sandbox.localdomain   2190   0/0/70                follower
    20160603:14:17:06:307386 zkManager:gpdb-sandbox:gpadmin-[INFO]:-Done.
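
The Mode column of the zkManager state output can be summarized in a script, for example to confirm that a multi-node ZooKeeper cluster reports a single leader. The sketch below assumes the output layout shown in the example above, with the mode as the last field of each line; zk_leader_count is a hypothetical helper.

```shell
# Read zkManager state output on stdin and print the number of
# lines whose last field (the Mode column) is "leader".
zk_leader_count() {
  awk '$NF == "leader" { n++ } END { print n + 0 }'
}
```

It could be used as zkManager state | zk_leader_count.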