LATEST VERSION: 3.2.0 - RELEASE NOTES
Pivotal® Greenplum® Text v2.0.0

Queries

To retrieve data, you submit a query that performs a search based on criteria that you specify. Simple queries return straight-forward results. You can use the default query parser, or specify a different query parser at query time.

Creating a Simple Search Query

After a Solr index is committed, you can create simple queries with the gptext.search() function:

gptext.search(src_table, index_name, search_query, filter_queries[, options])

Example - Top 10 results, no filter query

SELECT w.id, w.date_time, w.title, q.score
FROM articles w,
gptext.search(TABLE(SELECT 1 SCATTER BY 1), 
'wikipedia.public.articles' , 'Libya AND  (Gadaffi OR Kadafi OR 
Qad*I)', null, 'rows=10') q
WHERE q.id = w.id;

Example - Top 100 results, no filter query

SELECT w.title, q.score FROM wikipedia w, 
gptext.search(TABLE(SELECT 1 SCATTER BY 1), 
'wikipedia.public.articles', 'solr search query', null, 'rows=100') q 
WHERE w.id = q.id;

Example - All results, no filter query

SELECT w.title, q.score FROM wikipedia w, 
gptext.search(TABLE(SELECT 1 SCATTER BY 1), 
'wikipedia.public.articles', 'solr search query', null) q 
WHERE w.id = q.id;

Example - Top 100 results, with filter query

SELECT w.title, q.score FROM wikipedia w, 
gptext.search(TABLE(SELECT 1 SCATTER BY 1), 
'wikipedia.public.articles', 'solr search query', 
'solr filter query', 'rows=100') q 
WHERE w.id = q.id;

Creating a Faceted Search Query

Faceting breaks query results into multiple categories and shows counts for each category.

With the faceted_field_search() function, the categories are field names. Here is the syntax for the faceted_field_search() function:

faceted_field_search(index_name, query, filter_queries, facet_fields, facet_limit, minimum)

In this example, the query runs on a set of social media feeds and eliminates spam and truncated fields:

SELECT * FROM 
gptext.faceted_field_search('twitter.public.message', '*:*', null, '{spam, truncated}', 0, 0);

In this example, the query searches all tweets in the data set by users (author_ids) who have created at least five tweets:

SELECT * FROM 
gptext.faceted_field_search('twitter.public.message', '*:*', null, '{author_id}', -1, 5);

Highlighting Search Terms in Query Results

The gptext.highlight() function highlights search terms by inserting markup tags before and after each occurrence of the term in results returned from a gptext.search() query. For example, if the search term is “iphone”, each occurrence of “iphone” in the data is marked up:

<em>iphone</em>

You can change the default markup strings from <em> and </em> by setting the gptext.hl_pre_tag and gptext.hl_post_tag server configuration parameters.

The index must have been created with terms enabled. Use gptext.enable_terms() to enable term vectors and the reindex the data if it was already indexed. See gptext.enable_terms() in the Pivotal GPText User’s Guide.

Important: Highlighting adds overhead to the query, including index space, indexing time, and search time.

The gptext.highlight() syntax is:

gptext.highlight(column_data, column_name, offsets)

The column_data parameter contains the text data that will be marked up with highlighting tags.

The column_name parameter is the name of the corresponding table column.

The offsets parameter is a gptext hstore type that contains key-value pairs that specify the locations of the search term in the text data. This value is constructed by the gptext.search() function when highlighting is enabled. The key contains the column name and the value is a comma-separated list of offsets where the data appears.

To enable highlighing in a gptext.search() query, add the hl and hl.fl options:

hl=true&hl.fl=field1,field2

Setting the hl=true option enables highlighting for the search. The hl.fl option specifies a list of the field names to highlight.

Example - Top 10 results, with search terms highlighted

SELECT t.id, gptext.highlight(t.message, 'message', s.hs) 
FROM twitter.message t, 
     gptext.search(TABLE(SELECT 1 SCATTER BY 1), 
            'demo.twitter.message',
            '{!gptextqp}iphone', null, 
            'rows=10&hl=true&hl.fl=message' ) s 
WHERE t.id = s.id;

Retrieving Additional Stored Fields

A GPText index can be configured to store additional fields when documents are indexed with the gptext.index() function. The additional stored fields can be returned with search results so that it is unnecessary to join GPText search results with the original table in the Greenplum Database.

See Storing Additional Fields in an Index for instructions to configure the index to store the additional fields.

Retrieve the additional field values in a GPText search by specifying the list of fields in the gptext.search() options argument. For example:

SELECT * 
FROM gptext.search(TABLE(SELECT 1 SCATTER by 1), 
    'wikipedia.public.articles', 
    '*:*', NULL, 'fl=revision,author');

This query returns all results and the revision and author fields for each result.

To retrieve all stored fields, use the * wildcard for the field list: 'fl=*'.

In the search results, the requested fields are packed into an extra field named rf with the following format:

column_value { name: "_version" value: "1544714234398507008" } column_value { name: "revision" value: "9.70" } column_value { name: "author" value: "jdough" }

Use the gptext.gptext_retrieve_field(rf,<column_name>) function to retrieve a single field from this structure as a text value. For example:

gptext.gptext_retrieve_field(rf, 'author')

If the specified field name does not exist in the rf structure, the function returns NULL.

Use gptext.gptext_retrieve_field_float() to retrieve the column value converted to a float value or gptext.gptext_retrieve_field_int() to retrieve the column value converted to an integer.

Using Advanced Querying Options

When you submit a query, Solr processes the query using a query parser. There are several Solr query parsers with different capabilities. For example, the ComplexPhraseQueryParser can parse wildcards, and the SurroundQueryParser supports span queries: finding words in the vicinity of a search word in a document.

You can use the most appropriate parser for your query (see Changing the Query Parser at Query Time).

GPText supports these query parsers:

  1. QParserPlugin, the default GPText query parser. QParserPlugin is a superset of the LuceneQParserPlugin, Solr’s native Lucene query parser. QParserPlugin is a general purpose query parser with broad capabilities. QParserPlugin does not support span queries and handles operator precedence in an unintuitive manner. The support for field selection is also rather weak. See http://wiki.apache.org/solr/SolrQuerySyntax.
  2. The ComplexPhraseQueryParser, which supports wildcards, ORs, ranges, and fuzzies inside phrase queries. See https://issues.apache.org/jira/browse/SOLR-1604.
  3. The SurroundQueryParser, which supports the family of span queries. See http://wiki.apache.org/solr/SurroundQueryParser.
  4. The DisMax (or eDisMax) Query Parser, which handles operator precedence in an intuitive manner and is best suited for user queries. See http://wiki.apache.org/solr/DisMaxQParserPlugin.
  5. The Unified Query Parser can use all the query parsers in combination. See Using the Universal Query Parser for more information.

A good general reference for query parsers can be found at: http://www.lucidimagination.com/blog/2009/02/22/exploring-query-parsers.

Changing the Query Parser at Query Time

You can change the query parser at query time with the defType Solr option in the gptext.search() function that supports Solr options. For example, this query uses the dismax query parser to return the top 100 results for “olympics”, with no filter query, and using the default search field:

Top 100 results, no filter query, default search field used:

SELECT w.title, q.score FROM wikipedia w, 
gptext.search(TABLE(SELECT 1 SCATTER BY 1), 
'wikipedia.public.articles', 'olympics', null, 
'rows=100&defType=dismax') q
WHERE w.id = q.id;

The options parameter includes ‘defType=dismax’.

Invoke the Universal Query Parser by inserting {!gptextqp} <search query> in the query parameter of the gptext.search() function.

Using the Solr localParams syntax

You can use the Solr localParams syntax with all GPText search functions to change the query parser at query time by replacing the <query> term with '{!type=dismax}<query>'.

Note: The default query parser is specified in the requestHandler definitions in solrconfig.xml. You can edit solrconfig.xmlwith the management utility gptext-config.

Using the Universal Query Parser

With the Universal Query Parser, you can perform searches using features of any of the query parsers in one search string. Invoke the Universal Query Parser with the gptext.search() function’s search_query parameter in this format:

'{!gptextqp} search_query'

The following search illustrates using the Universal Query Parser:

SELECT w.title, q.score 
FROM wikipedia w, 
  gptext.search(TABLE(SELECT 1 SCATTER BY 1),'wikipedia.public.articles', 
      '{!gptextqp} mobil* 2W (Obama OR Clinton)', null) q
WHERE w.id = q.id;

The search query in this example includes syntax for three query parsers:

  • mobil* – Complex Query with wildcard
  • 2W – Proximity Query, requesting a maximum of two words proximity between the phrases
  • Obama OR Clinton – Lucene Query