[Improvement] Using [session user] as [proxy user] to execute statements at the server share level #5687

Open
3 of 4 tasks
theoryxu opened this issue Nov 14, 2023 · 8 comments · May be fixed by #5772


@theoryxu

theoryxu commented Nov 14, 2023

Code of Conduct

Search before asking

  • I have searched in the issues and found no similar issues.

What would you like to be improved?

At the SERVER share level, if I don't use the Ranger AuthZ plugin, all session users access HDFS with the credential of the system user who submitted the Spark application, rather than with their own credentials.

As a result, every action on the HDFS side is executed as that system user, and the HDFS Ranger plugin can no longer enforce per-user policies.

I want to find a way for every session user to access HDFS with their own identity, so that at the SERVER share level each session user is held to least-privilege access control.

How should we improve?

I read Kyuubi's code and found the function that sets the appropriate configs for each session user: 👇

https://github.com/apache/kyuubi/blob/master/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/operation/SparkOperation.scala#L130

Maybe we can set [session user] as [proxy user] to execute statements, like this: 👇

[image: proposed code change, not preserved]
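To make the proposal concrete, here is a minimal Python model of "execute each statement as the session user". It is only a sketch under stated assumptions: the names `APP_USER`, `effective_user`, `run_as`, `hdfs_access`, and `execute_statement` are hypothetical and are not Kyuubi APIs. In the real JVM engine the equivalent would most likely be Hadoop's `UserGroupInformation.createProxyUser(sessionUser, UserGroupInformation.getCurrentUser())` followed by `doAs(...)` around the statement; the context variable below only mimics that scoping.

```python
import contextvars
from contextlib import contextmanager

# Hypothetical: the system user that launched the shared Spark application.
APP_USER = "kyuubi"

# Identity used for (simulated) driver-side HDFS calls; defaults to the app user.
effective_user = contextvars.ContextVar("effective_user", default=APP_USER)


@contextmanager
def run_as(user):
    """Temporarily switch the effective identity, loosely mimicking UGI.doAs."""
    token = effective_user.set(user)
    try:
        yield
    finally:
        # Restore the previous identity even if the statement fails.
        effective_user.reset(token)


def hdfs_access(path):
    """Stand-in for an HDFS call: returns the identity the NameNode would see."""
    return (effective_user.get(), path)


def execute_statement(session_user, path, proxy_enabled):
    """Run one statement, impersonating the session user only when enabled."""
    if proxy_enabled:
        with run_as(session_user):
            return hdfs_access(path)
    return hdfs_access(path)


print(execute_statement("alice", "/warehouse/a", proxy_enabled=False))  # ('kyuubi', '/warehouse/a')
print(execute_statement("alice", "/warehouse/a", proxy_enabled=True))   # ('alice', '/warehouse/a')
print(effective_user.get())  # kyuubi
```

On a real Hadoop cluster, impersonation also requires the engine's system user to be allowed as a proxy user on the NameNode via `hadoop.proxyuser.<user>.hosts` and `hadoop.proxyuser.<user>.groups`, which may be one of the system-level concerns worth checking.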

Are there any lurking issues or concerns at the system level?

I look forward to your opinions!

Are you willing to submit a PR?

  • Yes. I would be willing to submit a PR with guidance from the Kyuubi community to improve.
  • No. I cannot submit a PR at this time.

Hello @theoryxu,
Thanks for finding the time to report the issue!
We really appreciate the community's efforts to improve Apache Kyuubi.

@pan3793
Member

pan3793 commented Nov 16, 2023

@theoryxu we can do that, configurably, for the GROUP and SERVER share levels.

Actually, this only covers execution on the driver side, such as DDL commands. When a query triggers a job, the tasks executed on the executor side still run as the application user.
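The driver-only limitation can be illustrated with a toy Python model. It assumes, as a simplification, that an executor is a separate process whose identity is fixed when it is launched, so an identity switched on the driver thread never reaches it. A fresh, empty `contextvars.Context()` stands in for that executor process; `whoami`, `run_task_on_executor`, `run_query`, and the user names are hypothetical and do not correspond to Spark or Kyuubi APIs.

```python
import contextvars

APP_USER = "kyuubi"  # hypothetical application (engine) user
effective_user = contextvars.ContextVar("effective_user", default=APP_USER)


def whoami():
    """Identity a filesystem call would run under in the current context."""
    return effective_user.get()


def run_task_on_executor(task):
    # A brand-new, empty context models a separate executor JVM: nothing set
    # on the driver side (such as the proxy identity) is visible here, so the
    # task falls back to the application user.
    return contextvars.Context().run(task)


def run_query(session_user):
    token = effective_user.set(session_user)  # driver-side impersonation
    try:
        driver_side = whoami()                        # e.g. DDL / metadata calls
        executor_side = run_task_on_executor(whoami)  # e.g. scan tasks of a job
        return driver_side, executor_side
    finally:
        effective_user.reset(token)


print(run_query("alice"))  # ('alice', 'kyuubi')
```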

@pan3793
Member

pan3793 commented Nov 16, 2023

I know that STS (Spark Thrift Server) supports that; I suppose it has the same issue I mentioned above. Also cc @wangyum

@theoryxu
Author

@pan3793
Is anyone working on this? If not, I can submit a PR for it soon and then we can discuss it in more detail.

@theoryxu theoryxu reopened this Nov 20, 2023
@pan3793
Member

pan3793 commented Nov 20, 2023

@theoryxu please go ahead, also cc @yaooqinn and @cxzl25 (we have discussed this topic offline before)

@yaooqinn
Member

OK, that sounds reasonable.

theoryxu pushed a commit to theoryxu/kyuubi that referenced this issue Nov 24, 2023
…ements at the server and group share level
@theoryxu
Author

@pan3793 Hi bro, I have submitted a PR. Could you review it? Thanks.

@pan3793
Member

pan3793 commented Nov 27, 2023

@theoryxu ack, thanks for your contribution. I will take a look soon.

theoryxu pushed a commit to theoryxu/kyuubi that referenced this issue Dec 1, 2023
…ements at the server and group share level; Kerberos Compatible
theoryxu pushed a commit to theoryxu/kyuubi that referenced this issue Dec 1, 2023
3 participants