Protecting Sensitive Information in a Hadoop Cluster

hadoop Jan 19, 2021

Today, security is a primary concern for everyone. When your product is deployed on premises, the application has to be given a few sensitive values; a very basic example is a database password. Organizations are no longer willing to keep such values in a configuration file in cleartext, so everyone is looking for an encrypted store, commonly referred to as a vault.

Here I’ve prepared a working vault using the Hadoop Credential Provider API.

Setting Up Hadoop Credential Provider API (Hadoop Vault)


Passwordless

This command generates the hdfs.jceks file on HDFS. Because the keystore lives on HDFS, there is no need to localise (distribute) it to the nodes:

HDFS: Create alias and save password

hadoop credential create db.password -value db_123 -provider jceks://hdfs/credentials/hdfs.jceks

This command generates the file.jceks file on the local FS:

FS: Create alias and save password

hadoop credential create db.password -value db_123 -provider jceks://file/credentials/file.jceks
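To confirm that an alias was stored, the `list` subcommand of the same CLI can be pointed at the provider. The snippet below is a minimal sketch, assuming the `hadoop` client is on the `PATH`; the `PROVIDER` variable name is only illustrative, and the local provider path can be substituted for the HDFS one.

```shell
#!/usr/bin/env bash
# Provider path taken from the HDFS create command above.
PROVIDER="jceks://hdfs/credentials/hdfs.jceks"

if command -v hadoop >/dev/null 2>&1; then
  # List the aliases stored in the keystore; only names are shown, not values.
  hadoop credential list -provider "$PROVIDER"
else
  echo "hadoop CLI not found on PATH; skipping alias listing" >&2
fi
```

If `-value` is left out of `hadoop credential create`, the command prompts for the secret instead, which keeps it out of the shell history.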

Accessing the password through the Java API (example in Scala):

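import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.security.alias.CredentialProviderFactory

object HC {

  def main(args: Array[String]): Unit = {

    // Local-FS alternative:
    // val path = "jceks://file/home/ec2-user/example/file.jceks"
    val path = "jceks://hdfs/credentials/hdfs.jceks"

    // Point Hadoop at the keystore that holds the aliases.
    val conf = new Configuration()
    conf.set(CredentialProviderFactory.CREDENTIAL_PROVIDER_PATH, path)

    // Only one provider path is configured, so take the first provider.
    val credentialProvider = CredentialProviderFactory.getProviders(conf).get(0)
    println(credentialProvider.getAliases)

    // getCredentialEntry returns null when the alias does not exist.
    val entry = credentialProvider.getCredentialEntry("db.password")
    require(entry != null, "alias db.password not found in " + path)
    val password = entry.getCredential.mkString

    // Printed for demonstration only; avoid logging real secrets.
    println(password)
  }

}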

Output:

[db.password, aws.secret.key.password]
db_123

With Password

A. Setting the keystore password using an environment variable
Set Password

export HADOOP_CREDSTORE_PASSWORD=TEST-password@12
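The keystore password has to be present in the environment of every process that opens the store, not only the one that created it. Below is a minimal sketch of a full round trip, assuming the `hadoop` client is on the `PATH` and reusing the alias and HDFS provider path from the passwordless section; the `PROVIDER` variable name is illustrative.

```shell
#!/usr/bin/env bash
PROVIDER="jceks://hdfs/credentials/hdfs.jceks"

# Password that protects the keystore itself (example value from this post).
export HADOOP_CREDSTORE_PASSWORD='TEST-password@12'

if command -v hadoop >/dev/null 2>&1; then
  # Create the alias; the keystore is protected with the password above.
  hadoop credential create db.password -value db_123 -provider "$PROVIDER"
  # Reading the keystore needs the same variable to be set.
  hadoop credential list -provider "$PROVIDER"
fi

# Drop the password from the current shell once it is no longer needed.
unset HADOOP_CREDSTORE_PASSWORD
```

A keystore that was already created without a custom password is expected to reject a different one, so it is safest to start from a fresh keystore file when switching to this mode.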

B. Alternatively, the password can be put in a file that is made available on the Hadoop classpath.

  1. The name of the file is specified by the `hadoop.security.credstore.java-keystore-provider.password-file` property. Hadoop searches for this file name on the classpath and reads the password from the file.

  2. HDFS: Create alias and save password

hadoop credential -Dhadoop.security.credstore.java-keystore-provider.password-file=hdfs.jceks.password create db.password -value db_123 -provider jceks://hdfs/credentials/hdfs.jceks

  hdfs.jceks.password is the password file name.

  3. The `hadoop.security.credstore.java-keystore-provider.password-file` property can also be set in core-site.xml, but then it is visible to all jobs running on the same cluster.
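The password-file approach can be sketched end to end as follows. This is an illustration under assumptions: the directory comes from `mktemp` (any directory readable only by the job user would do), and `HADOOP_CLASSPATH` is used to put that directory on the classpath of the `hadoop` command; your deployment may manage the classpath differently. The `PW_DIR` and `PW_FILE` names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical location for the password file; any private directory works.
PW_DIR="$(mktemp -d)"
PW_FILE="$PW_DIR/hdfs.jceks.password"

# Write the keystore password and restrict access to the owner.
printf '%s' 'TEST-password@12' > "$PW_FILE"
chmod 600 "$PW_FILE"

# Put the directory on the Hadoop classpath so the bare file name resolves.
export HADOOP_CLASSPATH="$PW_DIR${HADOOP_CLASSPATH:+:$HADOOP_CLASSPATH}"

if command -v hadoop >/dev/null 2>&1; then
  hadoop credential \
    -Dhadoop.security.credstore.java-keystore-provider.password-file=hdfs.jceks.password \
    create db.password -value db_123 \
    -provider jceks://hdfs/credentials/hdfs.jceks
fi
```

Passing the property with `-D` on the command line keeps it scoped to this invocation, in contrast to the cluster-wide visibility of a core-site.xml entry.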


Kshitij

Lead Engineer at Tookitaki