Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ZEPPELIN-5861] Correct shading Prefix #4545

Merged
merged 8 commits into from
Jan 23, 2023
Merged

Conversation

Reamer
Copy link
Contributor

@Reamer Reamer commented Jan 6, 2023

What is this PR for?

This pull request includes several things:

  • Changing the shading prefix so we don't have recursion. See MSHADE-252
  • Remove unnecessary Maven plugin executions.
  • Using grpc-netty-shaded as described in the official documentation.

What type of PR is it?

  • Refactoring

How should this be tested?

  • via CI

Questions:

  • Does the license files need to update? No
  • Is there breaking changes for older versions? No
  • Does this needs documentation? No

@Reamer
Copy link
Contributor Author

Reamer commented Jan 6, 2023

Found the following error inside log. Therefore I created ZEPPELIN-5861

2023-01-06T14:50:16.2093997Z [INFO] Interpreter launch command: /opt/hostedtoolcache/Java_Temurin-Hotspot_jdk/8.0.352-8/x64/bin/java -Dfile.encoding=UTF-8 -Dlog4j.configuration=file:///home/runner/work/zeppelin/zeppelin/conf/log4j.properties -Dlog4j.configurationFile=file:///home/runner/work/zeppelin/zeppelin/conf/log4j2.properties -Dzeppelin.log.file=/home/runner/work/zeppelin/zeppelin/logs/zeppelin-interpreter-test-shared_process--fv-az335-610.log -Xmx1024m -cp :/home/runner/work/zeppelin/zeppelin/zeppelin-zengine/../local-repo/test/*:/home/runner/work/zeppelin/zeppelin/zeppelin-zengine/../interpreter_RemoteAngularObjectTest/test/*:/home/runner/work/zeppelin/zeppelin/zeppelin-interpreter-shaded/target/*:/home/runner/work/zeppelin/zeppelin/zeppelin-zengine/target/test-classes/*:::/home/runner/work/zeppelin/zeppelin/interpreter/zeppelin-interpreter-shaded-0.11.0-SNAPSHOT.jar:/home/runner/work/zeppelin/zeppelin/zeppelin-interpreter/target/classes:/home/runner/work/zeppelin/zeppelin/zeppelin-zengine/target/test-classes org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer 127.0.0.1 42277 test-shared_process :
2023-01-06T14:50:16.2095863Z 
2023-01-06T14:50:16.2096005Z -------------------
2023-01-06T14:50:16.2096480Z 	at org.apache.zeppelin.interpreter.remote.RemoteAngularObjectTest.setUp(RemoteAngularObjectTest.java:72)
2023-01-06T14:50:16.2096958Z Caused by: java.io.IOException: 
2023-01-06T14:50:16.2097229Z Fail to launch interpreter process:
2023-01-06T14:50:16.2097501Z -------------------
2023-01-06T14:50:16.2100576Z Interpreter download command: /opt/hostedtoolcache/Java_Temurin-Hotspot_jdk/8.0.352-8/x64/bin/java -Dfile.encoding=UTF-8 -Dlog4j.configuration=file:///home/runner/work/zeppelin/zeppelin/conf/log4j.properties -Dlog4j.configurationFile=file:///home/runner/work/zeppelin/zeppelin/conf/log4j2.properties -Dzeppelin.log.file=/home/runner/work/zeppelin/zeppelin/logs/zeppelin-interpreter-test-shared_process--fv-az335-610.log -cp :/home/runner/work/zeppelin/zeppelin/zeppelin-zengine/../interpreter_RemoteAngularObjectTest/test/*:/home/runner/work/zeppelin/zeppelin/zeppelin-interpreter-shaded/target/*:/home/runner/work/zeppelin/zeppelin/zeppelin-zengine/target/test-classes/*:::/home/runner/work/zeppelin/zeppelin/interpreter/zeppelin-interpreter-shaded-0.11.0-SNAPSHOT.jar:/home/runner/work/zeppelin/zeppelin/zeppelin-interpreter/target/classes:/home/runner/work/zeppelin/zeppelin/zeppelin-zengine/target/test-classes org.apache.zeppelin.interpreter.remote.RemoteInterpreterDownloader 127.0.0.1 42277 test /home/runner/work/zeppelin/zeppelin/zeppelin-zengine/../local-repo/test
2023-01-06T14:50:16.2113231Z ##[error]Exception in thread "main" java.lang.IllegalArgumentException: Unable to create EvictionPolicy instance of type org/apache/zeppelin/shaded/org.apache.zeppelin.shaded.org.apache.commons.pool2.impl.DefaultEvictionPolicy
2023-01-06T14:50:16.2114631Z 	at org.apache.zeppelin.shaded.org.apache.commons.pool2.impl.BaseGenericObjectPool.setEvictionPolicyClassName(BaseGenericObjectPool.java:607)
2023-01-06T14:50:16.2115551Z 	at org.apache.zeppelin.shaded.org.apache.commons.pool2.impl.GenericObjectPool.setConfig(GenericObjectPool.java:321)
2023-01-06T14:50:16.2116321Z 	at org.apache.zeppelin.shaded.org.apache.commons.pool2.impl.GenericObjectPool.<init>(GenericObjectPool.java:117)
2023-01-06T14:50:16.2116986Z 	at org.apache.zeppelin.interpreter.remote.PooledRemoteClient.<init>(PooledRemoteClient.java:47)
2023-01-06T14:50:16.2117647Z 	at org.apache.zeppelin.interpreter.remote.RemoteInterpreterEventClient.<init>(RemoteInterpreterEventClient.java:67)
2023-01-06T14:50:16.2118381Z 	at org.apache.zeppelin.interpreter.remote.RemoteInterpreterDownloader.main(RemoteInterpreterDownloader.java:58)
2023-01-06T14:50:16.2119172Z Caused by: java.lang.ClassNotFoundException: org/apache/zeppelin/shaded/org.apache.zeppelin.shaded.org.apache.commons.pool2.impl.DefaultEvictionPolicy
2023-01-06T14:50:16.2119735Z 	at java.lang.Class.forName0(Native Method)
2023-01-06T14:50:16.2120042Z 	at java.lang.Class.forName(Class.java:348)
2023-01-06T14:50:16.2120715Z 	at org.apache.zeppelin.shaded.org.apache.commons.pool2.impl.BaseGenericObjectPool.setEvictionPolicyClassName(BaseGenericObjectPool.java:598)
2023-01-06T14:50:16.2121304Z 	... 5 more

@Reamer
Copy link
Contributor Author

Reamer commented Jan 6, 2023

Caused by: java.lang.ClassNotFoundException: org/apache/zeppelin/shaded/org.apache.zeppelin.shaded.org.apache.commons.pool2.impl.DefaultEvictionPolicy

looks like a shading issue. I was doing a little reading about shading and came across the following comment. Therefore the shading prefix change.

Please avoid shading a package into a subpackage of its own like com.fake to com.fake.shaded. You could potentially be getting some kind of recursive problem and depending on the algorithm used for replacing package and path names you could be lucky and it works or not. In your case it works for binaries because Maven Shade uses an ASM utility class for relocating binaries, which seems to do a good job.

https://issues.apache.org/jira/browse/MSHADE-252?focusedCommentId=17314377&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17314377

@Reamer
Copy link
Contributor Author

Reamer commented Jan 6, 2023

Next error

Error:  Failed to execute goal org.apache.maven.plugins:maven-shade-plugin:3.2.2:shade (default) on project zeppelin-jupyter-interpreter-shaded: Error creating shaded jar: duplicate entry: META-INF/services/org.apache.zeppelin.jupyter.io.grpc.NameResolverProvider -> [Help 1]
Error:  
Error:  To see the full stack trace of the errors, re-run Maven with the -e switch.
Error:  Re-run Maven using the -X switch to enable full debug logging.
Error:  
Error:  For more information about the errors and possible solutions, please read the following articles:
Error:  [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
Error:  
Error:  After correcting the problems, you can resume the build with the command
Error:    mvn <args> -rf :zeppelin-jupyter-interpreter-shaded
Error: Process completed with exit code 1.

Looks like the following issue MSHADE-425

@jongyoul
Copy link
Member

jongyoul commented Jan 7, 2023

Oh.. Interesting.

@Reamer
Copy link
Contributor Author

Reamer commented Jan 9, 2023

The next error, which occurs frequently, is as follows.

2023-01-06T17:43:32.9579892Z [ERROR] testIPythonProcessKilled  Time elapsed: 3.305 s  <<< ERROR!
2023-01-06T17:43:32.9588899Z org.apache.zeppelin.interpreter.InterpreterException: java.lang.NoClassDefFoundError: io/grpc/stub/StreamObserver
2023-01-06T17:43:32.9597447Z 	at org.apache.zeppelin.spark.IPySparkInterpreterTest.startInterpreter(IPySparkInterpreterTest.java:98)
2023-01-06T17:43:32.9604208Z Caused by: java.lang.NoClassDefFoundError: io/grpc/stub/StreamObserver
2023-01-06T17:43:32.9612809Z 	at org.apache.zeppelin.spark.IPySparkInterpreterTest.startInterpreter(IPySparkInterpreterTest.java:98)
2023-01-06T17:43:32.9619547Z Caused by: java.lang.ClassNotFoundException: io.grpc.stub.StreamObserver
2023-01-06T17:43:32.9627967Z 	at org.apache.zeppelin.spark.IPySparkInterpreterTest.startInterpreter(IPySparkInterpreterTest.java:98)

We find the class io/grpc/stub/StreamObserver as a dependency in zeppelin-jupyter-interpreter.

<dependency>
<groupId>io.grpc</groupId>
<artifactId>grpc-stub</artifactId>
<version>${grpc.version}</version>
</dependency>

zeppelin-jupyter-interpreter is a compile dependency of the spark-interpreter and should be included in the shaded artifact of spark-interpreter.

<dependency>
<groupId>org.apache.zeppelin</groupId>
<artifactId>zeppelin-jupyter-interpreter</artifactId>
<version>${project.version}</version>
<exclusions>
<exclusion>
<groupId>net.sf.py4j</groupId>
<artifactId>py4j</artifactId>
</exclusion>
<exclusion>
<groupId>io.netty</groupId>
<artifactId>*</artifactId>
</exclusion>
</exclusions>
</dependency>

At the moment I don't know where the problem is.

@Reamer Reamer force-pushed the ci-broken branch 2 times, most recently from d1445d5 to b9121dc Compare January 11, 2023 14:37
@Reamer
Copy link
Contributor Author

Reamer commented Jan 11, 2023

The following error seems to be a maven reactor issue. Running the tests with -am -Dtest=org/apache/zeppelin/spark/* seems to solve the issue (at least on my local machine)

2023-01-06T17:43:32.9579892Z [ERROR] testIPythonProcessKilled  Time elapsed: 3.305 s  <<< ERROR!
2023-01-06T17:43:32.9588899Z org.apache.zeppelin.interpreter.InterpreterException: java.lang.NoClassDefFoundError: io/grpc/stub/StreamObserver
2023-01-06T17:43:32.9597447Z 	at org.apache.zeppelin.spark.IPySparkInterpreterTest.startInterpreter(IPySparkInterpreterTest.java:98)
2023-01-06T17:43:32.9604208Z Caused by: java.lang.NoClassDefFoundError: io/grpc/stub/StreamObserver
2023-01-06T17:43:32.9612809Z 	at org.apache.zeppelin.spark.IPySparkInterpreterTest.startInterpreter(IPySparkInterpreterTest.java:98)
2023-01-06T17:43:32.9619547Z Caused by: java.lang.ClassNotFoundException: io.grpc.stub.StreamObserver
2023-01-06T17:43:32.9627967Z 	at org.apache.zeppelin.spark.IPySparkInterpreterTest.startInterpreter(IPySparkInterpreterTest.java:98)

@Reamer
Copy link
Contributor Author

Reamer commented Jan 12, 2023

Spark CI-Jobs seems to be running without errors.
https://github.com/apache/zeppelin/actions/runs/3894046270/jobs/6660597445

@jongyoul
Copy link
Member

jongyoul commented Jan 13, 2023

BTW, I see some different outputs for CI as well. Do you have any idea?

image

The current CI failure is because of this error.

@jongyoul
Copy link
Member

@jongyoul
Copy link
Member

I made #4550

@Reamer
Copy link
Contributor Author

Reamer commented Jan 13, 2023

@zjffdu
Do you have any idea how to fix the Flink compilation? I find the setup with the symlink of flink-scala-parent very unattractive, this does not seem clean.

@Reamer
Copy link
Contributor Author

Reamer commented Jan 17, 2023

The tests are running through. I will clean up the PR now.

@jongyoul
Copy link
Member

I think it's good but I just have one question. Is there any critical reason that we changed test to verify? It looks similar and verify is running after test and package, and sometimes it uses more time. So I just wonder why verify is better than test in our cases.

@Reamer
Copy link
Contributor Author

Reamer commented Jan 18, 2023

I think it's good but I just have one question. Is there any critical reason that we changed test to verify? It looks similar and verify is running after test and package, and sometimes it uses more time. So I just wonder why verify is better than test in our cases.

You are correct that verify comes far after test in the Maven lifecycle. This is required to enrich the Maven Reactor with the necessary artefacts (phase package) from required modules (flag -am).
My goal is to remove the install step, which usually comes before any test. This would then save time again.

@@ -434,9 +434,6 @@ public void run() {
InterpreterResult result = interpreter.interpret("import time\ntime.sleep(1000)",
getInterpreterContext());
waiter.assertEquals(InterpreterResult.Code.ERROR, result.code());
waiter.assertEquals(
"IPython kernel is abnormally exited, please check your code and log.",
result.message().get(0).getData());
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

During my tests result.message() was always null. We can investigate this later, as this has nothing to do with the issue.
In the summary of https://github.com/apache/zeppelin/actions/runs/3939354095 you can also see several errors which have errors because of this null.

The 0 errors indicate the null error:
grafik

After clicking on the message, the exception is displayed.
grafik

If you then look in the affected line, you will see a similar line.

Thread thread = new Thread() {
@Override
public void run() {
try {
InterpreterResult result = interpreter.interpret("ldf <- dapplyCollect(\n" +
" df,\n" +
" function(x) {\n" +
" Sys.sleep(3)\n" +
" x <- cbind(x, \"waiting_secs\" = x$waiting * 60)\n" +
" })\n" +
"head(ldf, 3)", context2);
assertTrue(result.message().get(0).getData().contains("cancelled"));
} catch (InterpreterException e) {
fail("Should not throw InterpreterException");
}
}
};
thread.setName("Cancel-Thread");
thread.start();

You should not make assertions within additional threads. See also: https://rules.sonarsource.com/java/RSPEC-2186

@@ -304,10 +304,6 @@
</executions>
</plugin>

<plugin>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It took me a long time to figure out that this execution breaks the Maven build.
Until a few hours ago, I thought that this execution was just putting the dependencies under a location. What I didn't know is that this step also adds the dependency to the Maven reactor. This has led to various duplicate libraries within the Maven build, up to and including serious errors. It was hard to debug because the IDE ignores these dependencies.

The funny or sad thing, the build step is completely unnecessary.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this kind of step is too hard to debug by IDE, but, unfortunately, Zeppelin has a bit lot of these types of modification. Hopefully, I would like to reduce/remove them.

@Reamer Reamer changed the title CI broken [ZEPPELIN-5861] Correct shading Prefix Jan 19, 2023
@Reamer Reamer requested a review from jongyoul January 19, 2023 17:11
@Reamer
Copy link
Contributor Author

Reamer commented Jan 23, 2023

Can someone do a review?

@jongyoul jongyoul merged commit 45b15fe into apache:master Jan 23, 2023
@Reamer Reamer deleted the ci-broken branch January 23, 2023 12:38
proceane pushed a commit to proceane/zeppelin that referenced this pull request Feb 16, 2023
* Add Process ErrorMessage

* change shade prefix

* Run with pre modules to fill the maven reactor

* spark-interpreter has no scala test so we can remove the maven plugin

* Fix flaky python test

* Use shaded version of io.grpc:grpc-netty (io.grrpc:grpc-netty-shaded)

* Adjust interpreter tests

* Remove maven-dependency-plugin, because downloaded artefacts are attached to classpath
@bigjar bigjar mentioned this pull request Mar 2, 2023
1 task
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants