How do I force all Spark users to use the AuthZ plugin when they run jobs on my YARN cluster? #6934
Unanswered
orthoxerox
asked this question in
Q&A
Replies: 0 comments
Here's the situation: I have a Hadoop cluster with HDFS, YARN, Kyuubi, Hive (including Hive Metastore) and Ranger.
There's a table `some_data` in Hive that has a column `is_hidden` with values `true` and `false`. The user must never see the hidden values.
I have a policy in the Ranger Hadoop SQL plugin that restricts the user to rows where `is_hidden = false`.
I have a policy in Ranger HDFS plugin that restricts the user from accessing the location of the table.
If the user accesses the table via HDFS, the namenode checks the Ranger HDFS policy and blocks the user.
If the user accesses the table via Hive, Hive checks the Ranger Hadoop SQL policy, accesses HDFS as `hive` and applies the policy before returning the dataset to the user, filtering out the hidden rows.
If the user accesses the table via Spark, Spark doesn't check the Ranger Hadoop SQL policy, accesses HDFS as `spark` and returns the whole dataset, unless the Spark job is run with the AuthZ plugin enabled.
The user has full control over his configuration files and is free to remove any references to Kyuubi from them. How can I ensure that no matter how the user runs his Spark application (local mode, client mode, cluster mode), the operations `spark.read.table("some_data")` or `spark.read.parquet("/path/to/some_data")` will always fail unless the Kyuubi Ranger plugin is enabled and working correctly?
Spark running in local mode will try to access HDFS as the user himself and will fail.
Spark running in client mode or cluster mode will try to access HDFS as `spark`. I thought I could prevent this by blocking network access to the YARN ResourceManager and forcing users to access it through Kyuubi, but I was wrong: Kyuubi doesn't provide a YARN-compatible client API.
The docs say that "Kyuubi provides a mechanism to ban security configurations to enhance the security of production environments", but how can I do that?
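For concreteness, here is a minimal sketch of what "run with the AuthZ plugin enabled" amounts to on the Spark side. It assumes the extension class name from the Kyuubi AuthZ documentation, and that the plugin jar and a `ranger-spark-security.xml` are already on the Spark classpath. The helper names (`authz_conf`, `is_authz_enabled`, `build_session`) are hypothetical and only for illustration.

```python
# Sketch only: the session setting that turns on the Kyuubi Spark AuthZ plugin.
# Assumption: class name as given in the Kyuubi AuthZ docs; helpers are hypothetical.
AUTHZ_EXTENSION = "org.apache.kyuubi.plugin.spark.authz.ranger.RangerSparkExtension"


def authz_conf(existing_extensions=""):
    """Return Spark settings that add the AuthZ extension to spark.sql.extensions."""
    extensions = [e.strip() for e in existing_extensions.split(",") if e.strip()]
    if AUTHZ_EXTENSION not in extensions:
        extensions.append(AUTHZ_EXTENSION)
    return {"spark.sql.extensions": ",".join(extensions)}


def is_authz_enabled(conf):
    """Return True if a Spark conf (plain dict) lists the AuthZ extension."""
    value = conf.get("spark.sql.extensions", "")
    return AUTHZ_EXTENSION in [e.strip() for e in value.split(",")]


def build_session(app_name="authz-demo"):
    """Create a SparkSession with the AuthZ extension (needs pyspark and the plugin jar)."""
    from pyspark.sql import SparkSession  # lazy import; the helpers above do not need it

    builder = SparkSession.builder.appName(app_name)
    for key, value in authz_conf().items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

The point of the sketch is that the plugin is enabled by an ordinary session setting, so a user who controls his own configuration can simply leave it out. That is exactly the gap I'm asking about.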