Currently, when the loader reads a parquet file, it reads out every column, without considering the case where field_mapping only uses some of them.
For example, if the source table has 100 columns but only 10 actually need to be loaded into the graph, reading all 100 and then picking out the needed 10 throws away the advantage of parquet's columnar storage: reading is slow, and the row map passed around wastes a lot of unnecessary memory.
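A minimal sketch of the idea, with hypothetical names (this is not the loader's actual API): derive the set of columns that field_mapping and the selected fields actually reference, and hand only that set to the reader as a projection, instead of materializing all columns.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

// Hypothetical helper: compute the minimal projection from the mapping
// config. With parquet-mr, this column list could then be turned into a
// requested read schema (e.g. via AvroReadSupport.setRequestedProjection)
// so that unneeded column chunks are never decoded at all.
public class ProjectionPushdown {

    /**
     * @param fieldMapping   source column name -> graph property name
     * @param selectedFields extra source columns the mapping also needs
     *                       (e.g. id fields); may be empty
     * @return the only columns the reader has to load
     */
    public static List<String> requiredColumns(Map<String, String> fieldMapping,
                                               Collection<String> selectedFields) {
        // LinkedHashSet keeps a stable order and de-duplicates columns
        // that appear in both the mapping and the selected fields.
        LinkedHashSet<String> columns = new LinkedHashSet<>(fieldMapping.keySet());
        columns.addAll(selectedFields);
        return new ArrayList<>(columns);
    }
}
```

With a 100-column table and a 10-column mapping, the reader would then touch only the 10 relevant column chunks.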
Thanks for the feedback. We'll confirm the problem and schedule the optimization. If you have a simple optimization plan, feel free to contribute a PR, and we'll give suggestions accordingly 😃
I looked at the code. The main problem is that the HDFSFileReader class only gets the source information and has no access to the fieldMapping, so it cannot read just the required fields. The vertex/edge mapping information would have to be passed all the way in at InputReader.create. The same actually applies to JDBC: it only needs to SELECT the required fields.
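For the JDBC case the same pruning is just a projection in the query. A hedged sketch (buildSelect is a made-up name, not the loader's actual API), assuming the mapped column list is available where the query is built:

```java
import java.util.List;

// Hypothetical sketch: select only the mapped columns instead of
// "SELECT *", so the database never ships the unused columns at all.
public class JdbcProjection {

    public static String buildSelect(String table, List<String> mappedColumns) {
        // Without mapping information, fall back to reading everything,
        // which is effectively what the loader does today.
        if (mappedColumns == null || mappedColumns.isEmpty()) {
            return "SELECT * FROM " + table;
        }
        return "SELECT " + String.join(", ", mappedColumns) + " FROM " + table;
    }
}
```

A real implementation would also need to quote identifiers per the target database's dialect; that detail is omitted here.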