Pivotal® Greenplum® Text v2.0.0

Installing GPText

Prerequisites

The GPText installation includes the installation of Apache Solr Cloud.

Before you install GPText:

  • Install and configure your Greenplum Database system, version 4.3.5 or higher. See the Greenplum Database Installation Guide at http://gpdb.docs.pivotal.io.
  • When you configure Greenplum Database, first reserve memory on each Greenplum segment host for GPText use. To determine the memory to set aside for GPText, multiply the number of GPText nodes to create on each Greenplum segment host by the JVM maximum size. Subtract this memory from the physical RAM when calculating the value for the Greenplum Database gp_vmem_protect_limit server configuration parameter. See the Greenplum Database server configuration parameter gp_vmem_protect_limit in the Greenplum Database Reference Guide for recommended memory calculation formulas or visit the GPDB Virtual Memory Calculator web site.
  • GPText requires Red Hat Enterprise Linux 5.x or 6.x.
  • Install Oracle JRE 1.8.x and place it in PATH on the master and all segment servers. Set the JAVA_HOME environment variable to the JRE installation directory and ensure that any other Java environment variables (JAVA_BIN, CLASSPATH) are set properly for the JRE 1.8 installation.
  • GPText cannot be installed onto a shared NFS mount.
  • Ensure that nc (netcat) is installed on all Greenplum cluster hosts (sudo yum install nc).
  • Installing lsof on the Greenplum master and all hosts is recommended (sudo yum install lsof).
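
The memory reservation described in the list above is simple arithmetic. The following is a minimal sketch, assuming two GPText nodes per host, a 2048 MB JVM maximum (the -Xmx value in the default JAVA_OPTS), and a hypothetical segment host with 64 GB of RAM. The function name gptext_reserved_mb is illustrative and is not part of GPText.

```shell
#!/usr/bin/env bash
# Sketch: memory to set aside for GPText on one segment host.
# All values below are illustrative assumptions, not recommendations.

# gptext_reserved_mb NODES_PER_HOST JVM_MAX_MB
# Prints the number of GPText nodes per host multiplied by the JVM maximum (MB).
gptext_reserved_mb() {
    local nodes_per_host=$1 jvm_max_mb=$2
    echo $(( nodes_per_host * jvm_max_mb ))
}

physical_ram_mb=65536   # hypothetical host with 64 GB of RAM
nodes_per_host=2        # assumed: two entries in DATA_DIRECTORY
jvm_max_mb=2048         # assumed: -Xmx2048M in JAVA_OPTS

reserved_mb=$(gptext_reserved_mb "$nodes_per_host" "$jvm_max_mb")
remaining_mb=$(( physical_ram_mb - reserved_mb ))

echo "Reserve for GPText: ${reserved_mb} MB"
echo "RAM to use in the gp_vmem_protect_limit calculation: ${remaining_mb} MB"
```

The remaining figure is only the starting point for the gp_vmem_protect_limit formulas in the Greenplum Database Reference Guide; it is not the parameter value itself.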

Note:
GPText can use an existing Apache ZooKeeper cluster or you can install a “binding” ZooKeeper cluster on the Greenplum cluster during GPText installation. A separate ZooKeeper cluster with at least five nodes is recommended for best performance with heavy database loads. To use a separate ZooKeeper cluster, the cluster must be up and have network connectivity with the Greenplum cluster hosts before you begin installing GPText.
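
One way to confirm that an existing ZooKeeper cluster is reachable before installing is to probe each node from a Greenplum host. The sketch below is not a GPText utility. It assumes the node list uses the "host1:port,host2:port" format, that the installed nc supports the -z and -w options, and it uses hypothetical host names in the usage comment.

```shell
#!/usr/bin/env bash
# Sketch: check connectivity to an existing ZooKeeper cluster.
# Function names are hypothetical helpers, not part of GPText.

# zk_endpoints "host1:port,host2:port"
# Prints one "host port" pair per line.
zk_endpoints() {
    printf '%s\n' "$1" | tr ',' '\n' | while IFS=: read -r host port; do
        if [ -n "$host" ]; then
            echo "$host $port"
        fi
    done
}

# zk_check "host1:port,host2:port"
# Tries a TCP connection to each node and prints OK or FAIL.
# Assumes an nc that supports -z (scan only) and -w (timeout in seconds).
zk_check() {
    zk_endpoints "$1" | while read -r host port; do
        if nc -z -w 5 "$host" "$port" </dev/null >/dev/null 2>&1; then
            echo "OK    $host:$port"
        else
            echo "FAIL  $host:$port"
        fi
    done
}

# Example (hypothetical hosts; 2181 is the default ZooKeeper client port):
#   zk_check "zk1:2181,zk2:2181,zk3:2181"
```

Running the check from the master and from at least one segment host gives a quick indication that the cluster hosts have network connectivity to every ZooKeeper node.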


Install the GPText Binaries

  1. On the Greenplum master host, extract the GPText distribution file, a compressed tar archive. For example:

    cd /home/gpadmin
    tar xvfz greenplum-text-release-rhel5_x86_64.tar.gz
    

    The release directory contains an installation configuration file, gptext_install_config, and the GPText installation binary, which has a name similar to greenplum-text-version-OS.bin, for example, greenplum-text-2.0.0-rhel5_x86_64.bin.

  2. If necessary, grant execute permission to the GPText binary. For example:

    chmod +x /home/gpadmin/greenplum-text-2.0.0-rhel5_x86_64.bin
    
  3. If you are installing GPText in a directory that is accessible only to root, for example /usr/local, perform these steps:

    1. As root, create the installation directory and change its owner to gpadmin, the user who runs the GPText installer.
    2. To install to a directory where the user may or may not have write permissions:

      • Use gpssh to create a directory with the same file path on all hosts (mdw, smdw, and the segment hosts sdw1, sdw2, and so on). For example:

        /usr/local/greenplum-text-<version>
        
      • As root, set the file permissions and owner. For example:

        # chmod 775 /usr/local/greenplum-text-<version>
        # chown gpadmin:gpadmin /usr/local/greenplum-text-<version>
        
  4. Edit the gptext_install_config file to set parameters for the installation. See Set Installation Parameters for details.

  5. Run the GPText installation binary as gpadmin on the master server:

    ./greenplum-text-<version>.bin -c gptext_install_config
    
  6. Accept the Pivotal license agreement.
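
The command-line portion of steps 1, 2, and 5 can be collected into a small script. This is a sketch only: the file names are the examples used in the steps above, the run and install_gptext helpers are hypothetical, and the script defaults to a dry run that prints each command instead of executing it. Step 3 (root-only directories) and step 4 (editing gptext_install_config) are left as manual steps.

```shell
#!/usr/bin/env bash
# Sketch of installation steps 1, 2, and 5.
# Defaults to a dry run; set DRY_RUN=0 to execute the commands.
DRY_RUN=${DRY_RUN:-1}

# Names taken from the examples in the steps; adjust to your release.
WORK_DIR=/home/gpadmin
ARCHIVE=greenplum-text-release-rhel5_x86_64.tar.gz
INSTALLER=greenplum-text-2.0.0-rhel5_x86_64.bin

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

install_gptext() {
    run cd "$WORK_DIR" || return 1
    run tar xvfz "$ARCHIVE" || return 1
    run chmod +x "$WORK_DIR/$INSTALLER" || return 1
    # Edit gptext_install_config (step 4) before this point.
    run "./$INSTALLER" -c gptext_install_config
}
```

Calling install_gptext as gpadmin on the master host prints the four commands in order. Review the output, then rerun with DRY_RUN=0 once the paths match your environment.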

Set Installation Parameters

A GPText configuration file named gptext_install_config contains parameters to configure the GPText installation. Edit the file and set the parameters as described in the following table.

Table 1. GPText installation parameters
Parameter | Description | Default value
DATA_DIRECTORY | An array of directory paths where GPText data directories are to be created. The number of directories in the array determines the number of GPText nodes that will be created on each physical host. If GPTEXT_HOSTS lists multiple interfaces per host, the GPText nodes are spread evenly across the interface addresses. | declare -a DATA_DIRECTORY=(/data/primary /data/primary)
JAVA_OPTS | Sets the minimum and maximum memory each SolrCloud JVM can use. | JAVA_OPTS="-Xms1024M -Xmx2048M"
GPTEXT_PORT_BASE, GP_MAX_PORT_LIMIT | Set a range of port numbers available to GPText nodes. GPText finds unused ports in the specified range. | 18983 - 28983
ZOO_CLUSTER | Whether to deploy a GPText binding ZooKeeper cluster or use an existing ZooKeeper cluster. If set to "BINDING", the installation deploys a ZooKeeper cluster. To use an existing ZooKeeper cluster, set this parameter to a list of the ZooKeeper nodes in the format "host1:port,host2:port,host3:port". See the note under Prerequisites concerning ZooKeeper performance. | "BINDING"
ZOO_HOSTS | If ZOO_CLUSTER is "BINDING", an array of the hosts on which the ZooKeeper nodes are deployed, for example ZOO_HOSTS=(sdw1 sdw1 sdw1 sdw1 sdw1). | declare -a ZOO_HOSTS=(localhost localhost localhost localhost localhost)
ZOO_DATA_DIR | The ZooKeeper data directory. Required when ZOO_CLUSTER is "BINDING". | /data/master
ZOO_GPTXTNODE | The node path in ZooKeeper for GPText. Required whether ZOO_CLUSTER is "BINDING" or a list of hosts. | "gptext"
ZOO_PORT_BASE, ZOO_MAX_PORT_LIMIT | A range of port numbers to use for the ZooKeeper cluster. Unused ports are allocated from within this range. The range must contain at least 4000 port numbers, that is, ZOO_MAX_PORT_LIMIT - ZOO_PORT_BASE >= 4000. | 2188 - 12188
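
For reference, the default values in Table 1 correspond to settings along the following lines. This is a sketch in bash syntax, assuming each port range is written as a base and a limit variable; the layout of the shipped gptext_install_config may differ. The zoo_port_range_ok helper is hypothetical and only illustrates the 4000-port rule for the ZooKeeper range.

```shell
#!/usr/bin/env bash
# Sketch of gptext_install_config settings using the defaults from Table 1.
# Illustrative only; the shipped file may be laid out differently.
declare -a DATA_DIRECTORY=(/data/primary /data/primary)   # two GPText nodes per host
JAVA_OPTS="-Xms1024M -Xmx2048M"
GPTEXT_PORT_BASE=18983
GP_MAX_PORT_LIMIT=28983
ZOO_CLUSTER="BINDING"
declare -a ZOO_HOSTS=(localhost localhost localhost localhost localhost)
ZOO_DATA_DIR="/data/master"
ZOO_GPTXTNODE="gptext"
ZOO_PORT_BASE=2188
ZOO_MAX_PORT_LIMIT=12188

# Hypothetical helper (not part of GPText):
# succeeds when LIMIT - BASE >= 4000.
zoo_port_range_ok() {
    [ $(( $2 - $1 )) -ge 4000 ]
}

if zoo_port_range_ok "$ZOO_PORT_BASE" "$ZOO_MAX_PORT_LIMIT"; then
    echo "ZooKeeper port range OK (span $(( ZOO_MAX_PORT_LIMIT - ZOO_PORT_BASE )))"
else
    echo "ZooKeeper port range is smaller than 4000" >&2
fi
echo "GPText nodes per host: ${#DATA_DIRECTORY[@]}"
```

With the defaults, the ZooKeeper range spans 10000 port numbers and the two DATA_DIRECTORY entries yield two GPText nodes on each host.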

Starting GPText

First, make sure the GPText command-line utilities are in your path by sourcing the following file located in the GPText installation directory:

source install_dir/greenplum-text_path.sh

To use GPText in a database, you must first use the gptext-installsql management utility to install the GPText user-defined functions and other objects in the database:

gptext-installsql database [database2 ... ]

The GPText objects are created in the gptext schema.

Start GPText by running the gptext-start management utility at the command line:

gptext-start
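
The three commands in this section can be run in sequence for one or more databases. The sketch below wraps them in a hypothetical enable_gptext function that takes the GPText installation directory followed by the database names. It defaults to a dry run that only prints the commands, and the directory and database names in the usage comment are placeholders.

```shell
#!/usr/bin/env bash
# Sketch: set up the GPText path, install GPText objects, and start GPText.
# Defaults to a dry run; set DRY_RUN=0 to execute the commands.
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# enable_gptext INSTALL_DIR database [database2 ...]
# Hypothetical wrapper, not part of GPText.
enable_gptext() {
    if [ "$#" -lt 2 ]; then
        echo "usage: enable_gptext install_dir database [database2 ...]" >&2
        return 1
    fi
    local install_dir=$1
    shift
    run . "$install_dir/greenplum-text_path.sh" || return 1
    run gptext-installsql "$@" || return 1
    run gptext-start
}

# Example (placeholder directory and database names):
#   enable_gptext /usr/local/greenplum-text-<version> demo sales
```

Run the function as gpadmin on the master host, and check the dry-run output before setting DRY_RUN=0.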

Uninstalling GPText

To uninstall GPText, run the gptext-uninstall utility. You must have superuser permissions on all databases with GPText schemas to run gptext-uninstall.

gptext-uninstall runs only if there is at least one database with a GPText schema.

Execute:

gptext-uninstall