Faunus 0.10 is not compatible with Cloudera CDH4 #100
Comments
Subclassing the Mapper.Context class does not seem to work for both versions, since the constructor differs between them and the subclass has to call the super constructor. It's probably possible to solve this somehow via reflection, but that would not be an elegant solution. The MapSequence seems to run multiple Mappers during a single Mapper invocation from the framework; apparently its purpose is to run multiple transformations on the input data in a pipeline without running multiple MapReduce jobs. Would it be possible to make these transformations without implementing the Mapper interface, and therefore eliminate the need to subclass Mapper.Context?
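For illustration, here is a minimal sketch of the kind of design this question is asking about: each pipeline stage is a plain function over records, and one real Mapper applies the whole chain in memory. The names below (Stage, Pipeline) are hypothetical, not part of the Faunus API.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only -- Stage and Pipeline are hypothetical names,
// not Faunus API. Each stage is a plain record transformation; a single
// real Mapper runs the whole chain in memory, so no Mapper.Context
// subclass (and no version-specific super constructor) is ever needed.
interface Stage<T> {
    // One input record may produce zero, one, or many output records.
    List<T> apply(T record);
}

final class Pipeline<T> {
    private final List<Stage<T>> stages = new ArrayList<Stage<T>>();

    Pipeline<T> then(Stage<T> stage) {
        stages.add(stage);
        return this;
    }

    // Feed one record through every stage in order, fanning out as needed.
    List<T> run(T record) {
        List<T> current = new ArrayList<T>();
        current.add(record);
        for (Stage<T> stage : stages) {
            List<T> next = new ArrayList<T>();
            for (T r : current) {
                next.addAll(stage.apply(r));
            }
            current = next;
        }
        return current;
    }
}
```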
The purpose of the MapSequence is to do in-memory mapping when you have a chain of mappers in a row. I would have to think about how to remove the need to subclass Mapper.Context. If you have an idea and can provide a pull request, that would be most appreciated.
OK, so we could have something like this: Mapper1 | Mapper2 | Mapper3 | Reducer. What do you think about using ChainMapper to set up the Mappers? As far as I can see it is available in both versions, per the ChainMapper JavaDoc.
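Roughly the setup being proposed, sketched with the old org.apache.hadoop.mapred API; Mapper1, Mapper2, Mapper3, and MyReducer are hypothetical stand-ins for the pipeline stages, not Faunus code. As the next comment points out, ChainMapper only exists for the mapred API in Hadoop 1, which is exactly why this doesn't help Faunus's mapreduce-based code:

```java
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.ChainMapper;
import org.apache.hadoop.mapred.lib.ChainReducer;

// Hypothetical pipeline: Mapper1 | Mapper2 | Mapper3 | Reducer, all
// chained inside a single map task via the mapred ChainMapper API.
public class ChainedJob {
    public static JobConf configure(JobConf job) {
        ChainMapper.addMapper(job, Mapper1.class,
                Text.class, Text.class, Text.class, Text.class,
                true, new JobConf(false));
        ChainMapper.addMapper(job, Mapper2.class,
                Text.class, Text.class, Text.class, Text.class,
                true, new JobConf(false));
        ChainMapper.addMapper(job, Mapper3.class,
                Text.class, Text.class, Text.class, Text.class,
                true, new JobConf(false));
        ChainReducer.setReducer(job, MyReducer.class,
                Text.class, Text.class, Text.class, Text.class,
                true, new JobConf(false));
        return job;
    }
}
```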
ChainMapper doesn't work for the mapreduce library -- only mapred. Hence the reason I created MapSequence :(.
Hi all, I managed to get Faunus working with CDH 4.2. There were roughly two sets of changes I had to make:
The simplest way to get around these changes is to reimplement the affected classes. The right answer for Hadoop 2 / CDH compatibility is probably to create another build profile; MRUnit (https://github.com/apache/mrunit/blob/trunk/pom.xml) does this particularly effectively. If there is interest in going this route, or other suggestions on how to create a build that works with both versions, I would be happy to volunteer my time to implement it. For now, I've forked Faunus and implemented the fixes. The fork can be found at https://github.com/karkumar/faunus and the fix is in the cdh4-port branch. Thanks again!
Hey guys, I just did the update to the 0.4.0 snapshot for Apache Hadoop 2 compatibility. The only change I had to make was to change instances of TaskAttemptContext to TaskAttemptContextImpl. Again, the fork can be found at https://github.com/karkumar/faunus and the fix is in the cdh4-port branch.
The problem with that (correct me if I'm wrong) is that TaskAttemptContextImpl does NOT work with Hadoop 1.y.z. Hadoop 2 has not seen a stable release yet, and until Apache Hadoop goes 2.0-stable we are going to stick with the 1.y.z API. If you can figure out how to make it 2.0 AND 1.y.z compatible, I would definitely make that change immediately.
Yup, that's correct. That said, there are only two classes that are really preventing Faunus from being Hadoop 2 compatible: MemoryMapper.MemoryMapContext and TaskAttemptContext. Really, the only change between Hadoop 1 and Hadoop 2 is that these classes became abstract and their implementations were moved to impl classes in alternate packages. So you could just package your own versions of MapContext and TaskAttemptContext in Faunus -- literally cut and paste them out of the Hadoop 1.y.z code base into Faunus -- and then you should be able to run against either 1.y.z or 2.0. The solution isn't elegant, but it will probably work. You would probably also want to add a Maven build profile that changes the Hadoop and MRUnit artifacts to the 2.0 artifacts. If you'd like, I can experiment with this change in my fork. If not, the overhead of keeping my fork up to date is fairly minor, so I can keep doing that and updating this ticket. Thanks for taking the time to think about my request, it's much appreciated.
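Another common workaround for exactly this split (a hedged sketch, not code from either fork; ContextFactory is a hypothetical name) is to pick the concrete context class at runtime via reflection, since the Hadoop 1 class and the Hadoop 2 impl class both expose a (Configuration, TaskAttemptID) constructor:

```java
import java.lang.reflect.Constructor;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.TaskAttemptID;

// Version-agnostic factory sketch: Hadoop 2 moved the concrete
// implementation to o.a.h.mapreduce.task.TaskAttemptContextImpl, while in
// Hadoop 1 TaskAttemptContext itself is the concrete class. Looking the
// class up at runtime avoids referencing either one at compile time.
public final class ContextFactory {
    public static TaskAttemptContext newTaskAttemptContext(
            Configuration conf, TaskAttemptID id) throws Exception {
        Class<?> cls;
        try {
            // Hadoop 2 (and CDH4)
            cls = Class.forName(
                "org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl");
        } catch (ClassNotFoundException e) {
            // Hadoop 1.y.z
            cls = Class.forName(
                "org.apache.hadoop.mapreduce.TaskAttemptContext");
        }
        Constructor<?> ctor =
            cls.getConstructor(Configuration.class, TaskAttemptID.class);
        return (TaskAttemptContext) ctor.newInstance(conf, id);
    }
}
```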
Please.
Hi, I think I have found a reasonable solution to this incompatibility, which allows us to generate Hadoop 1-compatible and Hadoop 2-compatible binaries from the same code base. There are three different parts to the fix:
If we stop with steps 1 and 2, we create Faunus jars that should be compatible with Hadoop 1 and Hadoop 2. However, the distribution that is created is only compatible with Hadoop 1, because it includes the wrong Hadoop jars in the lib directory. So I created a build profile:
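The actual profile from the fork isn't reproduced here; the following is only an illustrative sketch of the shape such a Maven profile takes, with profile id, property names, and versions all being assumptions:

```xml
<!-- Illustrative pom.xml fragment only; the profile id, property names,
     and versions are assumptions, not the actual profile from the fork. -->
<profiles>
  <profile>
    <id>hadoop2</id>
    <properties>
      <hadoop.version>2.0.0-cdh4.2.0</hadoop.version>
    </properties>
    <dependencies>
      <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>${hadoop.version}</version>
      </dependency>
    </dependencies>
  </profile>
</profiles>
```

Selecting such a profile at build time would then be a matter of `mvn -Phadoop2 clean package`.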
These changes can be found in the proxying-port branch of my fork: https://github.com/karkumar/faunus/tree/proxying-port This solution isn't ideal because it imposes a small cost on every context.write. However, if we clean up the proxy object a bit, that cost should be relatively minor. Let me know if this is a viable option; I would be happy to spend some more time cleaning this up and testing. Right now, you should be able to build and run all the tests against either build profile. All tests should pass. I've also tested the code against my Hadoop 2 cluster and it seems to work. I haven't tested against a Hadoop 1 cluster. Again, thanks for taking the time to think about my request.
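To make the per-write cost concrete: a dynamic proxy forwards every context method reflectively, so each context.write pays one reflective dispatch. A generic sketch of that pattern (not the actual code in the proxying-port branch; ForwardingContexts is a hypothetical name) might look like:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

// Generic forwarding proxy: every call on the context interface is routed
// through invoke(), which is where the small per-call (and hence
// per-write) reflection cost comes from. The interface to proxy is passed
// in because the concrete context types differ between Hadoop 1 and 2.
public final class ForwardingContexts {
    @SuppressWarnings("unchecked")
    public static <T> T forwarding(final Class<T> contextInterface,
                                   final Object realContext) {
        return (T) Proxy.newProxyInstance(
            contextInterface.getClassLoader(),
            new Class<?>[] { contextInterface },
            new InvocationHandler() {
                public Object invoke(Object proxy, Method method,
                                     Object[] args) throws Throwable {
                    // A chained mapper could rewrite key/value pairs here
                    // before delegating write(key, value) to the framework.
                    return method.invoke(realContext, args);
                }
            });
    }
}
```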
I just wanted to revisit this issue in light of the recent release of 2.1.0. In 2.1.0 they claim that there is now source compatibility between jobs that use the Hadoop 1.x MapReduce APIs and Hadoop 2.0 (http://hadoop.apache.org/docs/r2.1.0-beta/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduce_Compatibility_Hadoop1_Hadoop2.html). Has anyone had a chance to try this out with Faunus? Is there any interest in creating multiple build profiles for Faunus, one that builds against Hadoop 1.0 jars and one that builds against Hadoop 2.1 jars? Thanks.
Thanks for your work. I've used your MemoryMapper code and have created a Hadoop 2 branch of Faunus: https://github.com/thinkaurelius/faunus/tree/hadoop2
We have a Cloudera CDH4 cluster and ran into a compatibility issue with Faunus: CDH4 is based on Hadoop 2.x instead of Hadoop 1.x.
The Mapper.Context constructor signature changed and causes a NoSuchMethodError when called from Faunus.
Here is the stack trace:
java.lang.NoSuchMethodError: org.apache.hadoop.mapreduce.Mapper$Context.<init>(Lorg/apache/hadoop/mapreduce/Mapper;Lorg/apache/hadoop/conf/Configuration;Lorg/apache/hadoop/mapreduce/TaskAttemptID;Lorg/apache/hadoop/mapreduce/RecordReader;Lorg/apache/hadoop/mapreduce/RecordWriter;Lorg/apache/hadoop/mapreduce/OutputCommitter;Lorg/apache/hadoop/mapreduce/StatusReporter;Lorg/apache/hadoop/mapreduce/InputSplit;)V
at com.thinkaurelius.faunus.mapreduce.MemoryMapper$MemoryMapContext.<init>(MemoryMapper.java:32)
at com.thinkaurelius.faunus.mapreduce.MapSequence$Map.setup(MapSequence.java:30)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:138)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:645)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:263)
We are getting this error even though we are running MRv1.
The error above is the first one we got; there might be more compatibility issues.
The issue was first reported in the aureliusgraphs google group:
https://groups.google.com/forum/#!topic/aureliusgraphs/B3gvUWOQ2cA