Troubleshooting Hadoop Connection Problems
This section describes problems that you may encounter when GPText connects to Hadoop, and potential solutions.
DataNode Access Errors
You may experience Hadoop access errors with GPText if any DataNodes in the Hadoop cluster reside in a multi-homed network. GPText uses an external IP address to access the HDFS NameNode. GPText encounters an error when the NameNode provides an internal IP address for a DataNode. In this situation, you must configure GPText to perform its own DNS resolution of DataNode host names.
Perform the following procedure to explicitly configure DNS resolution of DataNode host names:
Locate a local copy of the Hadoop authentication configuration directory that you previously uploaded to ZooKeeper. For example, if the directory is located at /home/gpadmin/auths/hdfs_conf:
$ cd /home/gpadmin/auths/hdfs_conf
$ ls
core-site.xml hdfs-site.xml user.txt
Open hdfs-site.xml in the editor of your choice. For example:
$ vi hdfs-site.xml
Add the following property block to the file, and then save the file and exit:
<property>
    <name>dfs.client.use.datanode.hostname</name>
    <value>true</value>
</property>
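This property directs the HDFS client to connect to DataNodes by hostname rather than by the IP address that the NameNode reports, which allows GPText hosts to perform their own DNS resolution of HDFS DataNode hostnames.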
Re-upload the modified configuration to ZooKeeper. For example, if the hdfs_conf directory includes the authentication configuration files for a Hadoop cluster with <config_name> hdfs_bill_auth:
$ cd ..
$ gptext-external upload -t hdfs -c hdfs_bill_auth -p hdfs_conf
Determine the hostname-to-IP address mapping for all DataNodes, and add the associated entries to the /etc/hosts file on all GPText client hosts.
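As a sketch of this last step, the following shell function appends a hostname-to-IP entry to a hosts file only when the hostname is not already listed. The function name add_host_entry, the DataNode name, and the address (taken from the 192.0.2.0/24 documentation range) are hypothetical; substitute the values from your own Hadoop cluster, and run it with sufficient privileges when the target is /etc/hosts.

```shell
# Hypothetical helper: append "IP<TAB>HOSTNAME" to a hosts file unless the
# hostname already appears as a name field on a non-comment line.
add_host_entry() {
    ip="$1"; host="$2"; file="${3:-/etc/hosts}"
    if awk -v h="$host" '
        /^[ \t]*#/ { next }                                  # skip comment lines
        { for (i = 2; i <= NF; i++) if ($i == h) found = 1 } # field 1 is the IP
        END { exit !found }' "$file"; then
        return 0    # entry already present; leave the file unchanged
    fi
    printf '%s\t%s\n' "$ip" "$host" >> "$file"
}

# Example usage (assumed name and address; repeat for every DataNode):
#   add_host_entry 192.0.2.11 datanode1.example.com
```

Because the function skips hostnames that are already present, it can be re-run safely on each GPText client host.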
Kerberos-Related Errors
The following problems are specific to Hadoop clusters secured with Kerberos.
Clock Skew
A login attempt to a Hadoop cluster secured with Kerberos will fail if clock skew between GPText client hosts and the Kerberos KDC host is too great. In this situation, you may see the following error in the Solr log:
java.io.IOException caused by a KrbException noting “Clock skew too great”
To resolve this situation, ensure that the clocks on the Kerberos KDC host and GPText client hosts are synchronized.
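To gauge whether skew is the likely cause, you can compare epoch timestamps taken on a GPText client host and on the KDC host. The sketch below assumes the common Kerberos default tolerance of 300 seconds; the limit configured in your environment may differ. The function name skew_within_limit and the host name kdc.example.com are hypothetical.

```shell
# Hypothetical helper: succeed if two epoch timestamps differ by no more
# than the allowed skew in seconds (default 300, an assumed tolerance).
skew_within_limit() {
    local_ts="$1"; remote_ts="$2"; max="${3:-300}"
    diff=$(( local_ts - remote_ts ))
    [ "$diff" -lt 0 ] && diff=$(( -diff ))   # absolute value
    [ "$diff" -le "$max" ]
}

# Example usage (assumes SSH access to a KDC host named kdc.example.com):
#   skew_within_limit "$(date +%s)" "$(ssh kdc.example.com date +%s)" \
#       && echo "clock skew OK" || echo "clock skew too great"
```

If the check fails, synchronize the hosts against a common time source before retrying the login.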
Timeout Errors
A login attempt to a Hadoop cluster secured with Kerberos may fail with timeout errors when the kdc and admin_server settings in the krb5.conf file are specified with a hostname, and the GPText client hosts cannot resolve the hostname. In this situation, you may see one of the following errors in the Solr log:
org.apache.solr.common.SolrException: Failed to login HDFS message caused by a java.io.IOException specifying javax.security.auth.login.LoginException: Receive timed out
java.nio.channels.UnresolvedAddressException with SocketIOWithTimeout referenced in the stack trace
In this situation, you may choose either of the following:
Update the Kerberos krb5.conf file to specify the kdc and admin_server settings using IP addresses.
Or
Update all GPText hosts to perform their own DNS resolution of the Kerberos KDC server.
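Before choosing either option, it may help to confirm which server names a GPText client host fails to resolve. The following sketch extracts the kdc and admin_server values from a krb5.conf file and tests each one with getent. It assumes a Linux host, one setting per line, and no IPv6 literals; the function names krb5_server_hosts and check_krb5_hosts are hypothetical.

```shell
# Hypothetical helper: print the host part of every kdc and admin_server
# setting in a krb5.conf file, one per line, with any :port suffix removed.
krb5_server_hosts() {
    awk -F= '
        $1 ~ /^[ \t]*(kdc|admin_server)[ \t]*$/ {
            host = $2
            gsub(/[ \t\r]/, "", host)   # trim whitespace
            sub(/:.*/, "", host)        # drop an optional :port
            print host
        }' "$1" | sort -u
}

# Report each server name that this host cannot resolve.
check_krb5_hosts() {
    for h in $(krb5_server_hosts "$1"); do
        getent ahosts "$h" > /dev/null || echo "cannot resolve: $h"
    done
}

# Example usage, run on a GPText client host:
#   check_krb5_hosts /home/gpadmin/auths/hdfs_kerb_conf/krb5.conf
```

No output from check_krb5_hosts suggests that name resolution is not the cause of the timeout.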
If you choose to update the krb5.conf file:
Locate a local copy of the Hadoop Kerberos authentication configuration directory that you previously uploaded to ZooKeeper. For example, if the directory is located at /home/gpadmin/auths/hdfs_kerb_conf:
$ cd /home/gpadmin/auths/hdfs_kerb_conf
$ ls
core-site.xml hdfs-site.xml keytab krb5.conf user.txt
Open krb5.conf in the editor of your choice. For example:
$ vi krb5.conf
Replace the hostname values of the KERBEROS block attributes with their equivalent IP addresses, and then save the file and exit. For example:
[realms]
    KERBEROS = {
        kdc = <kdc_ipaddress>
        admin_server = <admin_server_ipaddress>
    }
Re-upload the modified configuration to ZooKeeper. For example, if the directory named hdfs_kerb_conf includes the authentication configuration files for a Hadoop cluster defined with the <config_name> hdfs_kerb_auth:
$ cd ..
$ gptext-external upload -t hdfs -c hdfs_kerb_auth -p hdfs_kerb_conf
Alternatively, if you choose to configure the GPText hosts to perform their own DNS resolution of the Kerberos KDC server, add an entry for the KDC hostname-to-IP address mapping to the /etc/hosts file on all GPText client hosts.