Connect from Spark to AWS S3 via Assume Role credential
This is a non-trivial one and caused me a day of headache. I'm using
hadoop-aws 3.2.0, and the short answer is:
    val hc = spark.sparkContext.hadoopConfiguration
    hc.set("fs.s3a.access.key", config.awsKey)
    hc.set("fs.s3a.secret.key", config.awsSecret)
    hc.set("fs.s3a.aws.credentials.provider", "org.apache.hadoop.fs.s3a.auth.AssumedRoleCredentialProvider")
    hc.set("fs.s3a.assumed.role.arn", "arn:aws:iam::...:role/...")
    hc.set("fs.s3a.assumed.role.sts.endpoint.region", config.awsRegion)
    System.setProperty("aws.region", config.awsRegion) // a hack
Assume role is only available since hadoop-aws v3 (Spark 3 is using it already, but if you're running Spark standalone, make sure you are too). You enable it by setting the fs.s3a.assumed.role.arn property and explicitly selecting org.apache.hadoop.fs.s3a.auth.AssumedRoleCredentialProvider as your credential provider. The access key and secret set above are the credentials used to call STS and assume the role. This provider does not exist in hadoop-aws 2.x, but if you really want to, you can implement a workaround. Another thing to mention is that you will sometimes get the following exception when doing it the official way:
    Exception in thread "main" org.apache.hadoop.fs.s3a.AWSClientIOException: Instantiate org.apache.hadoop.fs.s3a.auth.AssumedRoleCredentialProvider on : com.amazonaws.SdkClientException: Unable to find a region via the region provider chain. Must provide an explicit region in the builder or setup environment to supply a region.: Unable to find a region via the region provider chain. Must provide an explicit region in the builder or setup environment to supply a region.
        at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:189)
        at org.apache.hadoop.fs.s3a.S3AUtils.createAWSCredentialProvider(S3AUtils.java:713)
        at org.apache.hadoop.fs.s3a.S3AUtils.createAWSCredentialProviderSet(S3AUtils.java:605)
no matter whether fs.s3a.assumed.role.sts.endpoint.region is set or not. Digging into the AWS SDK, I found that the assume role provider actually reads the region through the standard region provider chain, which Spark does not populate properly. Luckily, one of the sources in that chain that is easy to set is a JVM system property, hence the call to System.setProperty. I hope this will be fixed some time in the future.
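To make the workaround a bit easier to reuse, here is a minimal sketch that collects the same settings in one place and sets the aws.region system property only when nothing else has set it. The names AssumeRoleSettings, hadoopSettings, asSparkConf and ensureRegionProperty are my own and purely illustrative; the property names and the provider class are the ones from the snippet above. The spark.hadoop. prefix relies on Spark copying such settings into the Hadoop configuration.

```scala
// Sketch only: the object and method names here are hypothetical helpers,
// not part of Spark, hadoop-aws or the AWS SDK.
object AssumeRoleSettings {

  // The S3A properties used in the post, gathered into a plain Map.
  def hadoopSettings(accessKey: String,
                     secretKey: String,
                     roleArn: String,
                     region: String): Map[String, String] =
    Map(
      "fs.s3a.access.key" -> accessKey,
      "fs.s3a.secret.key" -> secretKey,
      "fs.s3a.aws.credentials.provider" ->
        "org.apache.hadoop.fs.s3a.auth.AssumedRoleCredentialProvider",
      "fs.s3a.assumed.role.arn" -> roleArn,
      "fs.s3a.assumed.role.sts.endpoint.region" -> region
    )

  // Same settings with the "spark.hadoop." prefix, suitable for
  // SparkSession.builder.config(key, value) or spark-submit --conf.
  def asSparkConf(settings: Map[String, String]): Map[String, String] =
    settings.map { case (key, value) => s"spark.hadoop.$key" -> value }

  // The hack from the post: set aws.region for the AWS SDK region chain,
  // but leave an existing value alone. Returns true if the property was set.
  def ensureRegionProperty(region: String): Boolean =
    if (sys.props.contains("aws.region")) false
    else {
      System.setProperty("aws.region", region)
      true
    }
}
```

With an existing session you would apply it as AssumeRoleSettings.hadoopSettings(...).foreach { case (k, v) => hc.set(k, v) } followed by AssumeRoleSettings.ensureRegionProperty(config.awsRegion). Note that the system property is only set in the JVM that runs this code; whether other JVMs in your cluster need it too depends on your setup, and I have not covered that here.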