Pivotal Greenplum GPText v2.0.0

Pivotal® GPText 2.0 Release Notes

This document contains release information for Pivotal GPText 2.0.

About Pivotal GPText 2.0

Pivotal GPText joins the Greenplum Database massively parallel processing (MPP) database server with Apache SolrCloud enterprise search and the Apache MADlib (incubating) Analytics Library to provide large-scale analytics processing and business decision support. GPText supports both free-text search and text analysis.

GPText includes the following features:

  • The GPText database schema provides in-database access to Apache Solr indexing and searching
  • Custom tokenizers for international text and social media text
  • A Universal Query Processor that accepts queries with mixed syntax from supported Solr query processors
  • Faceted search results
  • Term highlighting in results
  • Greater emphasis on high availability

The GPText management utility suite includes command-line utilities to perform the following tasks:

  • Start, stop, and monitor ZooKeeper and GPText nodes
  • Configure GPText nodes and indexes
  • Add and delete replicas for index shards
  • Back up and restore GPText indexes
  • Recover a GPText node
  • Expand the GPText cluster by adding GPText nodes

Prerequisites

Installing GPText also installs Apache SolrCloud.

Before you install GPText:

  • Install and configure your Greenplum Database system, version 4.3.5 or higher. See the Greenplum Database Installation Guide at http://gpdb.docs.pivotal.io.
  • When you configure Greenplum Database, first reserve memory on each Greenplum segment host for GPText. To determine how much memory to set aside, multiply the number of GPText nodes you will create on each segment host by the maximum JVM size. Subtract this amount from the physical RAM when calculating the value of the Greenplum Database gp_vmem_protect_limit server configuration parameter. For recommended memory calculation formulas, see gp_vmem_protect_limit in the Greenplum Database Reference Guide, or use the GPDB Virtual Memory Calculator web site.
  • GPText runs on Red Hat Enterprise Linux 5.x or 6.x.
  • Install Oracle JRE 1.8.x and add it to the PATH on the master and all segment hosts.
  • GPText cannot be installed onto a shared NFS mount.
  • Ensure that nc (netcat) is installed on all Greenplum cluster hosts (sudo yum install nc).
  • We recommend installing lsof on all Greenplum cluster hosts, including the master (sudo yum install lsof).
  • Apache Solr requires a ZooKeeper cluster with at least three nodes. You can install a “binding” ZooKeeper cluster with GPText on the Greenplum cluster hosts, or you can use an existing ZooKeeper cluster. When ZooKeeper runs on the same hosts as Greenplum Database segments, heavy database load can degrade ZooKeeper performance. For best performance, install a ZooKeeper cluster of at least three nodes (five recommended) on separate hosts that can reach the Greenplum cluster over the network.
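The memory and ZooKeeper sizing guidance in the prerequisites can be sketched as simple arithmetic. This is a minimal planning sketch, not part of GPText: the function names are hypothetical, memory is in GB, and the example values (256 GB RAM, 4 GPText nodes, 8 GB maximum JVM size) are illustrative only. It computes just the RAM left for Greenplum after the GPText reservation; the gp_vmem_protect_limit formula itself should come from the Greenplum Database Reference Guide. The quorum helper applies ZooKeeper's majority rule: an ensemble of n nodes stays available while n // 2 + 1 nodes are up.

```python
"""Planning sketch for GPText prerequisites (hypothetical helpers, not part of GPText)."""


def gptext_reserved_gb(nodes_per_host: int, jvm_max_gb: float) -> float:
    """Memory to set aside for GPText on one segment host:
    GPText nodes on the host multiplied by the maximum JVM size."""
    if nodes_per_host < 0 or jvm_max_gb < 0:
        raise ValueError("inputs must be non-negative")
    return nodes_per_host * jvm_max_gb


def ram_for_greenplum_gb(physical_ram_gb: float, nodes_per_host: int, jvm_max_gb: float) -> float:
    """Physical RAM left to use when calculating gp_vmem_protect_limit."""
    remaining = physical_ram_gb - gptext_reserved_gb(nodes_per_host, jvm_max_gb)
    if remaining <= 0:
        raise ValueError("GPText reservation exceeds physical RAM")
    return remaining


def zookeeper_failures_tolerated(ensemble_size: int) -> int:
    """Number of ZooKeeper nodes that can fail while a majority (quorum) survives."""
    if ensemble_size < 1:
        raise ValueError("ensemble must have at least one node")
    quorum = ensemble_size // 2 + 1
    return ensemble_size - quorum


if __name__ == "__main__":
    # Illustrative host: 256 GB RAM, 4 GPText nodes, 8 GB maximum JVM size each.
    print(ram_for_greenplum_gb(256, 4, 8))  # 224
    for n in (3, 5):
        print(n, "nodes tolerate", zookeeper_failures_tolerated(n), "failure(s)")
```

With three ZooKeeper nodes the ensemble survives one node failure; with five it survives two, which is consistent with the recommendation of five nodes on separate hosts.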

Known Issues

  • Solr does not return all fields when the fl Solr search option contains a wildcard that matches field names. For example, given a table with columns contenta and contentb, specifying fl=contenta,contentb,sum(1,1) correctly returns three fields. Specifying fl=cont*,sum(1,1) correctly returns contenta and contentb, but omits the pseudo-field sum(1,1). Specifying a wildcard to match all fields (fl=*,sum(1,1)) also omits the pseudo-field.

  • If Solr fails to load an index because of a configuration file error, and then the index is dropped without first correcting the configuration file error, the index cannot be recreated until GPText is restarted. This can happen if you edit managed-schema or solrconfig.xml and introduce an XML syntax error or a typo in configuration values. Workaround:

    1. When an index fails to load, check the Solr log to find the cause.
    2. If the cause is a configuration file error, such as invalid XML, use the gptext-config utility to edit the file and fix the error. Dropping the index without first correcting the error is not recommended.
    3. If you have dropped an index that failed to load without first correcting the cause of the failure, you must restart GPText before you can recreate the index. Run gptext-start -r to restart GPText.
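Both known issues can be reduced with small client-side checks in your own scripts. The following minimal Python sketch is an assumption-based illustration, not part of GPText or Solr: build_fl (hypothetical) assembles an fl value from explicit field names, which the first issue shows are returned correctly together with pseudo-fields, and xml_syntax_error (hypothetical) uses the standard-library XML parser to check a local copy of managed-schema or solrconfig.xml for the kind of XML syntax error described in step 2. How you pass the fl string into a GPText search is not shown.

```python
"""Sketches for the two known issues (hypothetical helpers, not part of GPText or Solr)."""
import sys
import xml.etree.ElementTree as ET
from typing import Iterable, Optional, Union


def build_fl(fields: Iterable[str], pseudo_fields: Iterable[str] = ()) -> str:
    """Join explicit field names and pseudo-fields into a comma-separated fl value.

    Wildcards are rejected because a wildcard in fl causes pseudo-fields
    such as sum(1,1) to be omitted from the results.
    """
    parts = list(fields) + list(pseudo_fields)
    for part in parts:
        if "*" in part:
            raise ValueError(f"wildcard not allowed in fl: {part!r}")
    return ",".join(parts)


def xml_syntax_error(data: Union[str, bytes]) -> Optional[str]:
    """Return the parser error message if data is not well-formed XML, else None.

    Only XML syntax is checked: a well-formed file can still contain
    invalid configuration values.
    """
    try:
        ET.fromstring(data)
    except ET.ParseError as err:
        return str(err)
    return None


if __name__ == "__main__":
    # Explicit field list: the pseudo-field sum(1,1) is kept.
    print(build_fl(["contenta", "contentb"], ["sum(1,1)"]))

    # Usage: python check_gptext.py solrconfig.xml managed-schema
    failed = False
    for path in sys.argv[1:]:
        with open(path, "rb") as fh:
            error = xml_syntax_error(fh.read())
        print(f"{path}: {error or 'well-formed'}")
        failed = failed or error is not None
    if failed:
        sys.exit(1)
```

The XML check catches malformed markup only; typos in configuration values still surface only when Solr loads the index, so the Solr log remains the place to look for those.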