How to enable debug log in spark-history-server #21

Open
morgantankim opened this issue Aug 14, 2024 · 1 comment

morgantankim commented Aug 14, 2024

The spark-history-server pod keeps restarting, but I couldn't find any error logs in it.
I want to change the logging level to DEBUG to identify the root cause.
I tried passing the following JVM option, but it had no effect:

-Dlog4j.rootCategory=DEBUG,console

morgantankim (Author) commented

I resolved this issue by creating a custom image based on varabonthu/spark-web-ui:1.0.6.
I added a custom log4j.properties file to /spark/opt/conf, which allowed me to identify the root cause of the problem.
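
For reference, here is a minimal sketch of what such a log4j.properties file might look like. It assumes the image ships a Spark version that still uses log4j 1.x (Spark 3.3 and later use Log4j 2, which has a different file format). The appender name and pattern follow Spark's stock log4j.properties.template; the exact file used here is not shown. In log4j 1.x, system properties are only substituted into values inside the config file, which likely explains why passing -Dlog4j.rootCategory on its own had no effect.

```properties
# Assumed log4j 1.x syntax; adjust if the image uses Log4j 2.
# Root logger at DEBUG, writing to the console appender defined below.
log4j.rootCategory=DEBUG, console

# Console appender, modeled on Spark's log4j.properties.template.
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
```

DEBUG on the root logger is very verbose, so it is probably worth reverting to INFO once the root cause is found.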

The root cause is that our environment does not permit access to S3 over the public network, but the history server tries to reach S3 through the global *.s3.amazonaws.com domain, which cannot be resolved:

Caused by: java.net.UnknownHostException: BUCKET_NAME.s3.amazonaws.com: Name or service not known
	at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
	at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929)
	at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1324)
	at java.net.InetAddress.getAllByName0(InetAddress.java:1277)
	at java.net.InetAddress.getAllByName(InetAddress.java:1193)
	at java.net.InetAddress.getAllByName(InetAddress.java:1127)
	at com.amazonaws.SystemDefaultDnsResolver.resolve(SystemDefaultDnsResolver.java:27)
	at com.amazonaws.http.DelegatingDnsResolver.resolve(DelegatingDnsResolver.java:38)
	at com.amazonaws.thirdparty.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:112)
	at com.amazonaws.thirdparty.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:376)

To access S3 via a VPC endpoint, I added the fs.s3a.endpoint and fs.s3a.endpoint.region options to sparkHistoryOpts:

sparkHistoryOpts: "-Dspark.history.fs.logDirectory=s3a://LOG_PATH/ -Dspark.hadoop.fs.s3a.endpoint=s3.ap-northeast-2.amazonaws.com -Dspark.hadoop.fs.s3a.endpoint.region=ap-northeast-2"
