v1.8.0
Milestone
Changelog
New Feature and Notable Changes
- ORC-450 Support selecting list indices without materializing list items
- ORC-824 Add column statistics for List and Map
- ORC-1004 Java ORC writer supports the selection vector
- ORC-1075 Support reading ORC files with no column statistics
- ORC-1125 Support decoding decimals in RLE
- ORC-1136 Optimize reads by combining multiple reads without significant separation into a single read
- ORC-1138 Seek vs Read Optimization
- ORC-1172 Add row count limit config for one stripe
- ORC-1212 Upgrade protobuf-java to 3.17.3
- ORC-1220 Set min.hadoop.version to 2.7.3
- ORC-1248 Redefine Hadoop dependency for Apache ORC 1.8.0
- ORC-1256 Publish test-jar to maven central
- ORC-1260 Publish shaded-protobuf classifier artifacts
Improvement
- ORC-825 Use Empty Array For Collections toArray
- ORC-826 Do Not Use Collection Contains/Get
- ORC-828 Improve Fetch Data Set Process
- ORC-829 Optimize Serialization percentileBits
- ORC-831 Do Not Copy String When Flushing Dictionary
- ORC-833 RunLengthIntegerReaderV2 Calculate Batch Size Once
- ORC-834 Do Not Convert to String in DecimalFromTimestampTreeReader
- ORC-835 Cache TRUE/FALSE Bytes in StringGroupFromBooleanTreeReader
- ORC-836 StringGroupFromDoubleTreeReader Use Double toString
- ORC-837 Reuse HiveDecimalWritable in ConvertTreeReaderFactory
- ORC-838 Simplify compareTo/equals/putBuffer of ByteBufferAllocatorPool
- ORC-840 Remove Superfluous Array Fill in RecordReaderImpl
- ORC-841 Remove Superfluous Array Fill in StringHashTableDictionary
- ORC-842 Remove newKey from StringHashTableDictionary
- ORC-844 Improve hashCode Methods
- ORC-847 Do Not Create Empty Array in StringGroupFromBinaryTreeReader
- ORC-852 Allow DynamicByteArray to Return a ByteBuffer
- ORC-853 Optimize writeDouble Implementation
- ORC-855 Remove Unused isRepeating from RunLengthIntegerReaderV2
- ORC-865 Bump opencsv from 3.9 to 5.5.1
- ORC-883 Dependency Audit and QA
- ORC-897 Optimize loop termination condition in readerIsCompatible method
- ORC-935 Bump commons-csv from 1.8 to 1.9.0
- ORC-937 Replace deprecated method
- ORC-958 Convert command support overwrite option
- ORC-969 Evaluate SearchArguments using file and stripe level stats
- ORC-975 Avoid double counting closestFixedBits in percentileBits method
- ORC-982 Extract checkstyle to a single file, help newcomers check code style
- ORC-988 Bump opencsv from 5.5.1 to 5.5.2
- ORC-992 Reached max repeat length, we can directly decide to use DELTA encoding
- ORC-1005 Make the Java and C++ implementations of determineEncoding in RunLengthIntegerWriterV2 consistent
- ORC-1007 Fix a warning from the shade plugin
- ORC-1013 Renaming a parameter in constructors of TreeWriter's derived classes
- ORC-1014 Add details when we get IOExceptions from file system
- ORC-1020 Improve orc::RleDecoderV2::nextDirect
- ORC-1027 Filter processing to allow filter injections that cannot be represented via SArgs
- ORC-1047 Handle quoted field names during string schema parsing
- ORC-1077 Remove commons-codec dependency and use java.util.Base64
- ORC-1099 Extend ReadIntent to support MAP and UNION type
- ORC-1101 Improve malformed STRUCT handling
- ORC-1122 Add buffer to decode the whole run in RleDecoderV2
- ORC-1137 Improve float/double conversion in DoubleColumnReader::next()
- ORC-1149 Bump slf4j.version to 1.7.36
- ORC-1150 Improve RowReaderImpl::computeBatchSize()
- ORC-1152 Support encoding short decimals in RLEv2
- ORC-1156 Update opencsv to 5.6
- ORC-1163 Bump zookeeper from 3.7.0 to 3.8.0
- ORC-1169 Use Hadoop 3.3.2 on Java 17+
- ORC-1178 Use hadoop 3.3.3 on Java 17+
Bug
- ORC-845 Fix NPE in DynamicIntArray toString
- ORC-929 Fix NaN at orc-tools 'meta' command
- ORC-1129 The build of tool-test should depend on cpp tools
- ORC-1159 Crash when the last stripe is skipped
- ORC-1242 Bump threeten-extra to 1.7.1
Test
- ORC-860 Add dependabot
- ORC-864 Bump jackson.version from 2.12.2 to 2.12.4
- ORC-877 Bump junit-vintage-engine from 5.7.0 to 5.7.2
- ORC-888 Bump objenesis from 3.1 to 3.2
- ORC-905 Add an integration test for example
- ORC-917 Bump mockito-core from 3.7.0 to 3.11.2
- ORC-919 Spark bench objenesis should be the same as Spark.
- ORC-920 Use junit.version and mockito.version property and bump junit to 5.7.2
- ORC-925 Simplify assertions
- ORC-928 Bump checkstyle from 8.44 to 8.45.1
- ORC-932 Bump byte-buddy from 1.10.19 to 1.11.12
- ORC-934 Add integration tests for Java bench
- ORC-940 Use Hadoop 3.3.1 in bench module
- ORC-955 Add Javadoc generation GitHub Action job
- ORC-963 Build benchmark module always for integration testing
- ORC-966 Bump byte-buddy from 1.11.12 to 1.11.13
- ORC-967 Bump mockito.version from 3.11.2 to 3.12.1
- ORC-986 Bump mockito.version from 3.12.1 to 3.12.4
- ORC-987 Bump jackson.version from 2.12.4 to 2.12.5
- ORC-1001 Bump maven-enforcer-plugin to 3.0.0
- ORC-1019 Remove redundant jackson dependencies
- ORC-1022 Bump byte-buddy from 1.11.13 to 1.11.19
- ORC-1038 Bump mockito.version from 3.12.4 to 4.0.0
- ORC-1074 Bump byte-buddy from 1.11.19 to 1.12.6
- ORC-1079 Add Linux clang GitHub Action job
- ORC-1085 Bump auto-service from 1.0 to 1.0.1
- ORC-1089 Add test cases verifying writers with selected vector
- ORC-1104 Use Spark 3.2.1 in benchmark
- ORC-1107 Fix NPE at benchmark data schema loading
- ORC-1110 Bump mockito.version from 4.0.0 to 4.3.1
- ORC-1126 Bump byte-buddy from 1.12.6 to 1.12.8
- ORC-1139 Benchmark for Seek vs Read
- ORC-1141 Bump mockito.version from 4.3.1 to 4.4.0
- ORC-1145 Add Java 18 to GitHub Action CI.
- ORC-1153 Bump byte-buddy from 1.12.8 to 1.12.9
- ORC-1157 Update guava to 31.1-jre
- ORC-1168 Update byte-buddy to 1.12.10
- ORC-1177 Upgrade mockito.version to 4.5.1
- ORC-1179 Upgrade checkstyle to 10.2 on Java 11+
- ORC-1187 Use main instead of master in merge_orc_pr.py
- ORC-1194 Bump mockito.version to 4.6.0
- ORC-1195 Bump checkstyle to 10.3
- ORC-1196 Add spark benchmark integration tests to GHA
- ORC-1197 Bump mockito.version from 4.6.0 to 4.6.1
- ORC-1201 Remove Debian 9 from Docker Tests
- ORC-1203 Bump maven-enforcer-plugin to 3.1.0
- ORC-1206 Bump netty-all to 4.1.78.Final
- ORC-1207 Upgrade Spark to 3.3.0
- ORC-1208 Bump byte-buddy to 1.12.12
- ORC-1209 Bump checkstyle to 10.3.1
- ORC-1234 Upgrade objenesis to 3.2 in Spark benchmark
- ORC-1236 Bump checkstyle to 10.3.2
- ORC-1243 Bump byte-buddy to 1.12.13
- ORC-1253 Add Fedora 37 docker test
- ORC-1254 Add spotbugs check
Task
- ORC-868 Pin gson to 2.2.4
- ORC-869 Pin jmh 1.20
- ORC-872 Bump kryo-shaded from 3.0.3 to 4.0.2
- ORC-874 Bump zookeeper from 3.6.2 to 3.7.0
- ORC-884 Bump jettison from 1.1 to 1.4.1
- ORC-887 Remove ORC Twitter link from news page
- ORC-890 Pin minimum support Hadoop version to 2.2.0
- ORC-892 Pin scala-library to 2.12.10
- ORC-898 Bump threeten-extra from 1.5.0 to 1.7.0
- ORC-899 Archive Apache ORC 1.4.x in releases page
- ORC-900 Update doap_orc.rdf for Apache Projects page
- ORC-908 Use https instead of http for website links in pom.xml
- ORC-914 Pin maven-dependency-plugin to 3.1.2
- ORC-916 Bump annotations from 17.0.0 to 21.0.1
- ORC-918 Pin protobuf-java to 2.5.0
- ORC-923 Bump apache from 23 to 24
- ORC-946 Unified json library
- ORC-949 Add CustomImportOrder rule
- ORC-956 Bump annotations from 21.0.1 to 22.0.0
- ORC-977 Update webpages and TestVectorOrcFile.java to be more neutral
- ORC-1045 Bump commons-cli to 1.5
- ORC-1056 Bump annotations from 22.0.0 to 23.0.0
- ORC-1103 Use Maven 3.8.4
- ORC-1140 Documentation for Seek vs Read
- ORC-1158 Add notification settings to .asf.yaml
- ORC-1162 Fix Apache Project Website Checks Warning
- ORC-1165 Enable GitHub Action in branch-1.8
- ORC-1166 Enable snapshot publishing in branch-1.8
- ORC-1171 Skip build and test on docker and site updates
- ORC-1173 Pin jodd-core to 3.5.2
- ORC-1176 Upgrade maven-jar-plugin to 3.2.2
- ORC-1185 Add merge_orc_pr.py
- ORC-1210 Upgrade maven to 3.8.6
- ORC-1216 Pin org.jetbrains.annotations dependency to 17.0.0
- ORC-1211 Upgrade maven-assembly-plugin to 3.4.0
- ORC-1214 Bump maven-assembly-plugin to 3.4.1
- ORC-1217 Downgrade org.jetbrains.annotations to 17.0.0
- ORC-1223 Move DirectDecompressWrapper to org.apache.orc.impl
- ORC-1224 Move getDecompressor to HadoopShimsCurrent
- ORC-1226 Add a deprecation warning for Hadoop 2.7.2 and below
- ORC-1229 Move KeyProviderImpl to org.apache.orc.impl
- ORC-1230 Move encryption utility functions to HadoopShimsCurrent
- ORC-1246 Revamp ORC Website
- ORC-1247 Improve Apache ORC website and docs
- ORC-1249 Move site/_docs/releases.md to site/releases/index.md
- ORC-1255 Fix ORC website navbar highlight
- ORC-1257 Publish multi-architecture ORC-dev docker images
- ORC-1261 Rename shaded pattern com.google.protobuf25 to org.apache.orc.protobuf
- ORC-1263 Add decimal type to ORC Website
- ORC-1221 Move NullKeyProvider to org.apache.orc.impl