Seek lazily; this can make a query tens of times faster.
While debugging a Presto query performance issue, I observed that
seeking in Sahara-extra is expensive and sometimes unnecessary.
The best way to avoid the overhead of unnecessary seek calls
is to seek only when the client actually needs the data.
After this change, the same Presto query ran 30 times faster.
The Presto and S3 clients have made similar changes.
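
A minimal sketch of the lazy-seek idea described above, not the actual
Sahara-extra code. RangeOpener and LazySeekStream are hypothetical names
introduced only for illustration; the assumption is that every real seek
costs one backend request (for example, reopening the object at an offset).

```java
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical backend: opens a stream that starts at the given byte offset. */
interface RangeOpener {
    InputStream open(long offset) throws IOException;
}

/** Input stream whose seek() only records the target; the backend is touched on read(). */
class LazySeekStream extends InputStream {
    private final RangeOpener opener;
    private InputStream in;      // current backend stream, null until the first read
    private long streamPos = 0;  // offset of the next byte the backend stream would return
    private long targetPos = 0;  // offset the caller last asked for

    LazySeekStream(RangeOpener opener) {
        this.opener = opener;
    }

    /** Cheap: remembers the position, performs no I/O. */
    public void seek(long pos) {
        if (pos < 0) {
            throw new IllegalArgumentException("negative seek position");
        }
        targetPos = pos;
    }

    /** Reopens the backend stream only if the requested position differs from the current one. */
    private void seekIfNeeded() throws IOException {
        if (in != null && targetPos == streamPos) {
            return; // already positioned correctly, nothing to do
        }
        if (in != null) {
            in.close();
        }
        in = opener.open(targetPos);
        streamPos = targetPos;
    }

    @Override
    public int read() throws IOException {
        seekIfNeeded();
        int b = in.read();
        if (b >= 0) {
            streamPos++;
            targetPos++;
        }
        return b;
    }

    @Override
    public void close() throws IOException {
        if (in != null) {
            in.close();
        }
    }
}
```

With this shape, any number of seek() calls followed by a single read()
results in one backend open, and a seek() to the current position results
in none.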
Change-Id: I8586af0d481fd08d48620e699467280f7b93150a
Sources were obtained from https://issues.apache.org/jira/secure/attachment/12583703/HADOOP-8545-033.patch
by running the "patch" command. All files related to Hadoop-common were skipped during patching.
Changes made after patching:
* pom.xml was updated to use the hadoop-core 1.1.2 dependency
* removed the dependency on Hadoop 2.x in code (dropped @Override, isDirectory() -> isDir())
* removed Hadoop 2.x tests
There are no unit tests, only integration tests.
Change-Id: I8d7c2f544d14f79597fcdefe27ecae0d43b6df9e