adithyapathipaka/ExcelRecordReaderMapReduce

ExcelRecordReaderMapReduce

A MapReduce InputFormat for Hadoop that reads Microsoft Excel spreadsheets.

License

Apache licensed.

Usage

1. Download the source and run ant to build it.
2. Include ExcelRecordReaderMapReduce-0.0.1-SNAPSHOT.jar in your environment.
3. Use the ExcelInputFormat class as the Mapper's input format.

See src/test/resource/test.xls for a demo file. The key returned is the file offset, which starts at zero (in the sample output below it increases by one per row), and the value holds all column values of a single row. Zip files are not supported.

Execute the job as follows, with in as the input path and out as the output directory:

hadoop jar ExcelRecordReaderMapReduce-0.0.1-SNAPSHOT-jar-with-dependencies.jar in out

After the job has completed you can examine the contents of the output directory in HDFS.

hadoop fs -cat out/part*

0 Buffet Jimmy Somewhere on the Beach Key West FL
1 Bush George 1600 Pennsylvania Ave Washington DC
2 Cartman Eric 84 Bigboned Way South Park CO
3 Crockett Davey The Alamo San Antonio TX
4 Doe Jane 821 Zimbabwe Ave DC
5 Gates Bill 1 Microsoft Way Redmond WA
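
The key/value shape described above can be illustrated with a small plain-Java sketch that does not depend on Hadoop or on this project's classes. It is only an approximation under stated assumptions: the class name RecordShapeSketch, the toRecords helper, and the tab delimiter are hypothetical, and the column boundaries in the sample rows are guessed from the output listing. The actual ExcelInputFormat may join columns differently.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of ExcelRecordReaderMapReduce.
class RecordShapeSketch {

    // Assumption: column values of one row are joined with a tab.
    static final String DELIMITER = "\t";

    // Turns rows of cells into (key, value) records:
    // key = zero-based position of the row, value = all columns of that row.
    static List<Map.Entry<Long, String>> toRecords(List<List<String>> rows) {
        List<Map.Entry<Long, String>> records = new ArrayList<>();
        long key = 0;
        for (List<String> row : rows) {
            String value = String.join(DELIMITER, row);
            records.add(new AbstractMap.SimpleImmutableEntry<>(key, value));
            key++;
        }
        return records;
    }

    public static void main(String[] args) {
        // Column boundaries below are guessed from the sample output.
        List<List<String>> sheet = Arrays.asList(
            Arrays.asList("Buffet", "Jimmy", "Somewhere on the Beach", "Key West", "FL"),
            Arrays.asList("Bush", "George", "1600 Pennsylvania Ave", "Washington", "DC"));

        // Print each record as key, delimiter, value.
        for (Map.Entry<Long, String> record : toRecords(sheet)) {
            System.out.println(record.getKey() + DELIMITER + record.getValue());
        }
    }
}
```

A mapper consuming such records would typically split the value on the same delimiter to recover the individual columns, which is why the delimiter assumption matters.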
