Pivotal Greenplum GPText v2.2.1

Installing GPText


Installing GPText also installs Apache SolrCloud and, optionally, Apache ZooKeeper.

If you are installing a new GPText release into an existing GPText system, follow the instructions in Upgrading GPText instead.

The GPText installation has the following prerequisites:

  • Install and configure your Greenplum Database system, version 4.3.6 or higher. See the Pivotal Greenplum Database Installation Guide at
  • GPText runs on Red Hat Enterprise Linux or CentOS 5.x, 6.x, or 7.x.
  • GPText cannot be installed onto a shared NFS mount.
  • Install a JRE 1.8.x on all hosts in the cluster.
  • Ensure that nc (netcat) is installed on all Greenplum cluster hosts (sudo yum install nc).
  • Installing lsof on all cluster hosts is recommended (sudo yum install lsof).
  • GPText nodes can be installed on the Greenplum Database cluster hosts alongside the Greenplum segments, or on additional non-database hosts that are accessible on the Greenplum cluster network. All hosts participating in the GPText system must have the same operating system and configuration, and the gpadmin user must be able to reach them over passwordless SSH. See the Pivotal Greenplum Database Installation Guide for instructions to configure hosts.
  • If you plan to place GPText nodes on the Greenplum Database segment hosts, reserve memory for GPText when you configure Greenplum Database. To determine how much memory to set aside, multiply the number of GPText nodes you will create on each Greenplum segment host by the maximum JVM heap size (the -Xmx value in the JAVA_OPTS installation parameter). Subtract this amount from the physical RAM when you calculate the value of the Greenplum Database gp_vmem_protect_limit server configuration parameter. See gp_vmem_protect_limit in the Greenplum Database Reference Guide for recommended memory calculation formulas, or visit the GPDB Virtual Memory Calculator web site.
  • Apache Solr requires a ZooKeeper cluster with at least three nodes. You can install a “binding” ZooKeeper cluster with GPText on the Greenplum cluster hosts, or you can use an existing ZooKeeper cluster. When ZooKeeper is deployed alongside Greenplum Database segments, its performance can suffer under heavy database load. For best performance, install a ZooKeeper cluster with at least three nodes (five nodes recommended) on separate hosts that have network connectivity to the Greenplum network.
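The reservation arithmetic in the memory prerequisite above can be sketched in Bash as follows. This is an illustration under stated assumptions, not a GPText utility: xmx_mb, gptext_reserved_mb, and ram_for_gpdb_mb are hypothetical helper names, and JAVA_OPTS is assumed to express the heap limit as -Xmx<N>M or -Xmx<N>G, as in the Table 1 example. The printed value is the RAM figure to carry into the gp_vmem_protect_limit formulas in the Greenplum Database Reference Guide.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of GPText) for sizing the memory to set
# aside for GPText on a Greenplum segment host.
# Assumption: JAVA_OPTS carries the heap limit as -Xmx<N>M or -Xmx<N>G.

# Print the -Xmx value from a JAVA_OPTS string, in megabytes.
xmx_mb() {
  local tok val
  for tok in $1; do            # unquoted on purpose: split options on spaces
    case "$tok" in
      -Xmx*[Mm]) val="${tok#-Xmx}"; echo "${val%?}"; return 0 ;;
      -Xmx*[Gg]) val="${tok#-Xmx}"; echo $(( ${val%?} * 1024 )); return 0 ;;
    esac
  done
  return 1                     # no -Xmx option found
}

# Memory to reserve (MB) = GPText nodes per host x maximum JVM heap size.
gptext_reserved_mb() {
  local nodes_per_host="$1" heap
  heap=$(xmx_mb "$2") || return 1
  echo $(( nodes_per_host * heap ))
}

# Physical RAM (MB) minus the GPText reservation: the RAM figure to use
# when calculating gp_vmem_protect_limit.
ram_for_gpdb_mb() {
  local physical_mb="$1" reserved
  reserved=$(gptext_reserved_mb "$2" "$3") || return 1
  echo $(( physical_mb - reserved ))
}
```

For example, ram_for_gpdb_mb 65536 2 "-Xms1024M -Xmx2048M" prints 61440: two GPText nodes with a 2048 MB maximum heap reserve 4096 MB on a host with 64 GB of RAM.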

Install the GPText Binary Distribution

  1. On the Greenplum master host, extract the GPText distribution file, a compressed tar archive. For example:

    cd /home/gpadmin
    tar xvfz greenplum-text-release-rhel5_x86_64.tar.gz

    The release directory contains an installation configuration file, gptext_install_config, and the GPText installation binary, which has a name similar to greenplum-text-<version>-<platform>.bin, for example, greenplum-text-2.2.0-rhel6_x86_64.bin.

  2. If necessary, grant execute permission to the GPText binary. For example:

    chmod +x /home/gpadmin/greenplum-text-<version>-<platform>.bin
  3. If you are installing GPText in a directory that is accessible only to root, for example /usr/local, perform these steps:

    1. Create the installation directory as root and change its ownership to the gpadmin user.
    2. To install to a directory where gpadmin may or may not have write permission:

      • Use gpssh to create a directory with the same file path on all hosts (mdw, smdw, and the segment hosts sdw1, sdw2, and so on). For example:

      • As root, set the file permissions and owner. For example:

        # chmod 775 /usr/local/<gptext-version>
        # chown gpadmin:gpadmin /usr/local/<gptext-version>
  4. Edit the gptext_install_config file to set parameters for the installation. See Set Installation Parameters for details.

  5. Run the GPText installation binary as gpadmin on the master server:

    ./greenplum-text-<version>-<platform>.bin -c <gptext_install_config>
  6. Accept the Pivotal license agreement.
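Steps 1, 2, and 5 lend themselves to a small script. The Bash sketch below is hypothetical (find_gptext_bin and prepare_gptext_bin are not part of GPText); it relies on the greenplum-text-<version>-<platform>.bin naming described in step 1 and assumes gptext_install_config sits in the same release directory. It prints the installation command rather than running it, since the installer asks you to accept the license agreement.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of GPText) for steps 1, 2, and 5.

# Print the path of the first greenplum-text-*.bin installer in a directory.
find_gptext_bin() {
  local f
  for f in "$1"/greenplum-text-*.bin; do
    [ -e "$f" ] || break        # pattern matched nothing
    echo "$f"
    return 0
  done
  return 1
}

# Grant execute permission to the installer (step 2) and print the
# command to run as gpadmin (step 5) without running it.
prepare_gptext_bin() {
  local dir="$1" bin
  bin=$(find_gptext_bin "$dir") || { echo "no GPText installer in $dir" >&2; return 1; }
  chmod +x "$bin"
  echo "$bin -c $dir/gptext_install_config"
}
```

Run prepare_gptext_bin with the release directory as its argument, check the printed command, and then run it as gpadmin. If the directory holds more than one installer, the helper picks the first match in glob order, so keep a single release per directory.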

Optional Two-Part GPText Installation

You can run the GPText installation in two parts by following these steps.

  1. Prepare the GPText installation directories as described in steps 1 through 3 in Install the GPText Binary Distribution.

  2. Run the GPText installation binary as gpadmin on the master server:

    ./greenplum-text-<version>-<platform>.bin -b

    Note that the -c <gptext_install_config> option is omitted.

  3. Source the GPText environment script in the GPText installation directory:

    source <gptext-install-dir>/
  4. Edit the gptext_install_config file to set parameters for the GPText installation. See Set Installation Parameters for details.

  5. Deploy the GPText cluster with the following command:

    gptext-deploy -c <gptext_install_config>

Set Installation Parameters

A GPText configuration file named gptext_install_config contains parameters to configure the GPText installation. Edit the file and set the parameters as described in the following table.

The GPTEXT_HOSTS and DATA_DIRECTORY installation parameters determine the number of GPText nodes that are deployed.

  • The number of directories included in the DATA_DIRECTORY array is the number of GPText nodes that are created per host.
  • The GPTEXT_HOSTS parameter determines the number of hosts. If it is set to the constant "ALLSEGHOSTS", the number of GPText node hosts equals the number of Greenplum segment hosts. If GPTEXT_HOSTS is set to an array of host names, the length of the array is the number of GPText node hosts.

The maximum number of GPText nodes is the number of Greenplum Database primary segments. The best practice is to deploy fewer GPText nodes with more memory each, rather than divide the memory available to GPText among the maximum number of GPText nodes allowed. For example, if there are eight primary segments per host in the Greenplum Database cluster, the maximum number of GPText nodes per host is eight, but you should test with two or four GPText nodes per host, adjusting the JAVA_OPTS installation parameter to divide the memory reserved for GPText among them.
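The sizing rules above reduce to a multiplication and a division. In this Bash sketch, gptext_node_count, check_node_limit, and per_node_heap_mb are hypothetical helpers; the host count and the number of primary segments are supplied by hand (with "ALLSEGHOSTS" the installer works out the host list itself), and the heap split only illustrates dividing a GPText memory budget among nodes.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of GPText) for sizing a GPText deployment.

# Total GPText nodes = number of GPText hosts x DATA_DIRECTORY entries.
gptext_node_count() {
  echo $(( $1 * $2 ))
}

# Fail if the total exceeds the number of Greenplum primary segments.
# Usage: check_node_limit <hosts> <dirs_per_host> <primary_segments>
check_node_limit() {
  local total
  total=$(gptext_node_count "$1" "$2")
  if (( total > $3 )); then
    echo "$total GPText nodes exceed $3 primary segments" >&2
    return 1
  fi
  echo "$total"
}

# Split the memory reserved for GPText on a host among its nodes (MB each),
# for example to choose the -Xmx value in JAVA_OPTS.
per_node_heap_mb() {
  echo $(( $1 / $2 ))
}

# Example values from Table 1.
declare -a GPTEXT_HOSTS=(gptext_host1 gptext_host2 gptext_host3)
declare -a DATA_DIRECTORY=(/data/primary /data/primary)
```

With these arrays (three hosts, two data directories) and eight primary segments on each of the three hosts, check_node_limit "${#GPTEXT_HOSTS[@]}" "${#DATA_DIRECTORY[@]}" 24 prints 6. Dividing an 8192 MB per-host budget between the two nodes on each host with per_node_heap_mb 8192 2 gives 4096, that is, an -Xmx4096M setting in JAVA_OPTS.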

Table 1. GPText installation parameters

GPTEXT_HOSTS: An array of host names on which to install GPText, or the constant "ALLSEGHOSTS" to install GPText on all Greenplum Database segment hosts. GPText hosts must be accessible over passwordless SSH by the gpadmin user from all other hosts in the Greenplum cluster.
    Example: declare -a GPTEXT_HOSTS=(gptext_host1 gptext_host2 gptext_host3)

DATA_DIRECTORY: An array of directory paths where GPText data directories are to be created. The number of directories in the array determines the number of GPText nodes created on each physical host. If GPTEXT_HOSTS lists multiple interfaces per host, the GPText nodes are spread evenly across the interface addresses.
    Example: declare -a DATA_DIRECTORY=(/data/primary /data/primary)

JAVA_OPTS: Sets the minimum and maximum memory each SolrCloud JVM can use.
    Example: JAVA_OPTS="-Xms1024M -Xmx2048M"

Sets a range of port numbers available to GPText nodes. GPText finds unused ports in the specified range.

ZOO_CLUSTER: Whether to deploy a GPText binding ZooKeeper cluster or use an existing ZooKeeper cluster. If set to "BINDING", the installation deploys a ZooKeeper cluster. To use an existing ZooKeeper cluster, set this parameter to a list of ZooKeeper nodes in the format "host1:port,host2:port,host3:port".

ZOO_HOSTS: If ZOO_CLUSTER is set to "BINDING", an array of the hosts where the ZooKeeper nodes are to be installed. The array must contain 3, 5, or 7 host names, for example ZOO_HOSTS=(sdw1 sdw2 sdw3 sdw4 sdw5). If you are using a single host for ZooKeeper, specify it multiple times, for example ZOO_HOSTS=(sdw1 sdw1 sdw1).
    Example: declare -a ZOO_HOSTS=(localhost localhost localhost localhost localhost)

ZOO_DATA_DIR: The ZooKeeper data directory, required when ZOO_CLUSTER is set to "BINDING".

ZOO_GPTXTNODE: The node path in ZooKeeper for GPText. This parameter is required whether ZOO_CLUSTER is set to "BINDING" or to a list of hosts.

A range of port numbers to use for the ZooKeeper cluster. Unused ports are allocated from within this range. The range must contain at least 4000 port numbers.

GPTEXT_JAVA_HOME: The home directory of the Java installation to use for ZooKeeper and Solr processes. If not set, the JRE found through the PATH and JAVA_HOME environment variables is used.
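The ZooKeeper rules in Table 1 can be checked before you run the installer. The Bash function below is a hypothetical pre-flight check, not part of GPText. It assumes you have sourced gptext_install_config into the current shell so that ZOO_CLUSTER, ZOO_HOSTS, ZOO_DATA_DIR, and ZOO_GPTXTNODE are defined, and it tests only the rules stated in the table (it does not contact ZooKeeper or check the port ranges).

```shell
#!/usr/bin/env bash
# Hypothetical pre-install check (not part of GPText) for the ZooKeeper
# settings in gptext_install_config. Source the config file first so the
# ZOO_* variables are defined in the current shell.

check_zoo_config() {
  local entry
  local -a entries
  if [ -z "${ZOO_GPTXTNODE:-}" ]; then
    echo "ZOO_GPTXTNODE must be set" >&2; return 1
  fi
  if [ "${ZOO_CLUSTER:-}" = "BINDING" ]; then
    # A binding cluster needs 3, 5, or 7 ZOO_HOSTS entries and a data directory.
    case "${#ZOO_HOSTS[@]}" in
      3|5|7) ;;
      *) echo "ZOO_HOSTS must list 3, 5, or 7 hosts, not ${#ZOO_HOSTS[@]}" >&2; return 1 ;;
    esac
    if [ -z "${ZOO_DATA_DIR:-}" ]; then
      echo "ZOO_DATA_DIR is required when ZOO_CLUSTER is BINDING" >&2; return 1
    fi
  else
    # An existing cluster is given as "host1:port,host2:port,host3:port".
    if [ -z "${ZOO_CLUSTER:-}" ]; then
      echo "ZOO_CLUSTER must be BINDING or a host:port list" >&2; return 1
    fi
    IFS=',' read -r -a entries <<< "$ZOO_CLUSTER"
    for entry in "${entries[@]}"; do
      if ! [[ "$entry" =~ ^[^:]+:[0-9]+$ ]]; then
        echo "not a host:port entry: $entry" >&2; return 1
      fi
    done
  fi
  return 0
}
```

For example: source ./gptext_install_config && check_zoo_config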

Starting GPText

First, make sure the GPText command-line utilities are in your path by sourcing the Greenplum Database and GPText environment scripts. It is important to source the GPText environment script each time you source the Greenplum Database script. For example:

source /usr/local/greenplum-db-<version>/
source /usr/local/greenplum-text-<version>/

To use GPText in a database, you must first use the gptext-installsql management utility to install the GPText user-defined functions and other objects in the database:

gptext-installsql database [database2 ... ]

The GPText objects are created in the gptext schema.

The ZooKeeper cluster must be running before you start GPText. If you installed a binding ZooKeeper cluster, start it with the zkManager command-line utility.

$ zkManager start

Start GPText with the gptext-start utility.

$ gptext-start
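The start-up order above, ZooKeeper first and then GPText, can be wrapped in a small Bash function. This is a hypothetical sketch that assumes a binding ZooKeeper cluster and that both environment scripts have already been sourced; require_cmds and start_gptext_binding are illustrative names. The shell builtin command -v confirms that the utilities are on the PATH before anything is started.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of GPText) for starting a GPText system
# that uses a binding ZooKeeper cluster.

# Return non-zero, naming each missing command, if any are not on the PATH.
require_cmds() {
  local c missing=0
  for c in "$@"; do
    if ! command -v "$c" >/dev/null 2>&1; then
      echo "not on PATH (source the environment scripts?): $c" >&2
      missing=1
    fi
  done
  return "$missing"
}

start_gptext_binding() {
  require_cmds zkManager gptext-start || return 1
  zkManager start || return 1   # ZooKeeper must be running first
  gptext-start
}
```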

Configure Greenplum Database

GPText configuration parameters are saved in ZooKeeper. You can, however, view and set GPText configuration parameters in a Greenplum Database session using the SHOW and SET commands. This requires adding the GPText custom variable class to the Greenplum Database custom_variable_classes configuration parameter.

The custom_variable_classes configuration parameter is a comma-separated list of class names. It is unset by default. To see if any custom variable classes have already been configured, run this gpconfig command at the command line.

gpconfig -s custom_variable_classes

If no custom variable classes have been set, set the parameter with the following command.

gpconfig -c custom_variable_classes -v 'gptext'

For example:

[gpadmin@gpsne ~]$ gpconfig -c custom_variable_classes -v 'gptext'
20171029:12:29:11:028199 gpconfig:gpsne:gpadmin-[INFO]:-completed successfully

If other classes have been configured, add gptext to the existing list, separated by a comma.

Run gpstop -u to have Greenplum Database reload the configuration file.
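Appending gptext to an existing class list is easy to get wrong by hand (a missing comma, or a duplicate entry). The Bash helper below is a hypothetical sketch, not a Greenplum or GPText utility: it takes the current value as reported by gpconfig -s custom_variable_classes (extracting the value from that command's output is not shown) and prints the list to pass to gpconfig -c.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Greenplum or GPText): compute the new
# custom_variable_classes value by appending gptext to the current list,
# unless it is already there.
add_gptext_class() {
  local current="$1" cls
  local -a classes
  IFS=',' read -r -a classes <<< "$current"
  for cls in "${classes[@]}"; do
    if [ "${cls// /}" = "gptext" ]; then   # ignore spaces around names
      echo "$current"
      return 0
    fi
  done
  if [ -z "$current" ]; then
    echo "gptext"
  else
    echo "$current,gptext"
  fi
}
```

For example, if the current value is myclass (a placeholder for an existing class), run gpconfig -c custom_variable_classes -v "$(add_gptext_class 'myclass')" and then gpstop -u.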

When you want to view or set GPText configuration parameters, first execute the gptext.version() function to load the GPText configuration parameters into the session.

=#  SELECT gptext.version();
 Greenplum Text Analytics 2.1.2
(1 row)

=# SHOW gptext.idx_delim;
(1 row)

See Setting GPText Configuration Parameters for more about GPText configuration parameters.

Uninstalling GPText

To uninstall GPText, run the gptext-uninstall utility. You must have superuser permissions on all databases with GPText schemas to run gptext-uninstall.

gptext-uninstall runs only if there is at least one database with a GPText schema.