New features for seqminer (8.0) #12
Comments
Those are all very helpful suggestions. Thanks, I will implement them.
On Jul 28, 2020, at 3:56 PM, Wenjian Bi wrote:
Hi Xiaowei,
I am using seqminer (v8.0) and it works well on multiple operating systems. Could you add some features to the current functions?
We usually do not need all subjects in an analysis. For readBGENToMatrixByRange() and readVCFToMatrixByRange(), could you add an argument such as 'subjIDs' or 'subjIndex' to specify which subjects to load? That can save a lot of memory.
Could you add a function that splits all markers into multiple ranges, each containing a similar number of markers? In a genome-wide analysis, the genotypes of all markers cannot fit in memory at once, so such a function would help a great deal. If possible, I suggest a signature like splitRange(fileName, memoryChunk = 4GB, subjIDs, ...). The output could be a data.frame in which each row describes one range.
Sometimes the PLINK bed/bim/fam files or the BGEN bgen/bgi files have different prefix names. Could you let users specify a separate name for each file? That would also be helpful.
Thanks,
Wenjian
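Until something like splitRange() exists, the range-splitting request above can be approximated outside the package. The following is only a minimal sketch in base R, not seqminer code: `splitMarkersIntoRanges` and `markersPerChunk` are hypothetical names, and the marker table (columns `chrom` and `pos`) is assumed to come from the user's own data, for example columns 1 and 4 of a PLINK .bim file.

```r
# Hypothetical helper, not part of seqminer: split markers into ranges
# holding at most `markersPerChunk` markers each.
# `markers` is a data.frame with columns `chrom` and `pos`.
splitMarkersIntoRanges <- function(markers, markersPerChunk) {
  stopifnot(markersPerChunk >= 1,
            all(c("chrom", "pos") %in% names(markers)))
  # Sort so that each chunk covers a contiguous interval of one chromosome.
  markers <- markers[order(markers$chrom, markers$pos), , drop = FALSE]
  perChrom <- lapply(split(markers, as.character(markers$chrom)), function(m) {
    # Chunk label for every marker: 1, 1, ..., 2, 2, ...
    chunk <- ceiling(seq_len(nrow(m)) / markersPerChunk)
    data.frame(
      chrom    = as.character(m$chrom[1]),
      start    = as.vector(tapply(m$pos, chunk, min)),
      end      = as.vector(tapply(m$pos, chunk, max)),
      nMarkers = as.vector(table(chunk)),
      stringsAsFactors = FALSE
    )
  })
  ranges <- do.call(rbind, perChrom)
  rownames(ranges) <- NULL
  # Range strings in "chrom:start-end" form, one per row.
  ranges$range <- sprintf("%s:%d-%d", ranges$chrom, ranges$start, ranges$end)
  ranges
}

# Example: seven markers on two chromosomes, at most two markers per range.
markers <- data.frame(chrom = c(1, 1, 1, 1, 1, 2, 2),
                      pos   = c(300, 100, 200, 500, 400, 60, 50))
print(splitMarkersIntoRanges(markers, markersPerChunk = 2))
```

A memory budget can be turned into `markersPerChunk` under the assumption that genotypes are held as a double-precision matrix, roughly floor(memoryBytes / (8 * nSubjects)). Markers that share a position can end up in two neighbouring ranges, so duplicates may need to be dropped after loading.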
Thank you for the swift reply. BGEN files are becoming more and more popular, and I think your package can be a very important tool for R users.
I think it would be great to have an option to load a matrix from readBGENToMatrixByRange indexed by rsid instead of position.
Thanks for the suggestion, but managing rsids is quite challenging because they can change over time (rs IDs can be merged or become invalid across releases).
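One possible workaround, given that rsids are not stable, is to resolve them to positions against a marker table that matches the file being read, and then query by position as usual. This is only a sketch under that assumption: `rsidToRange` is a hypothetical helper, not a seqminer function, and the table with columns `chrom`, `pos`, and `rsid` must be supplied by the user.

```r
# Hypothetical helper, not part of seqminer: turn rsIDs into
# "chrom:pos-pos" range strings using a user-supplied marker table.
# `markers` is a data.frame with columns `chrom`, `pos`, and `rsid`.
rsidToRange <- function(markers, rsids) {
  stopifnot(all(c("chrom", "pos", "rsid") %in% names(markers)))
  idx <- match(rsids, markers$rsid)
  if (anyNA(idx)) {
    # rsIDs that were merged or retired will simply not be found.
    warning("rsIDs not found in marker table: ",
            paste(rsids[is.na(idx)], collapse = ", "))
  }
  found <- idx[!is.na(idx)]
  ranges <- sprintf("%s:%d-%d",
                    as.character(markers$chrom[found]),
                    markers$pos[found], markers$pos[found])
  names(ranges) <- as.character(markers$rsid[found])
  ranges
}

# Example with a two-marker table.
markers <- data.frame(chrom = c("1", "2"), pos = c(100L, 250L),
                      rsid = c("rs1", "rs2"), stringsAsFactors = FALSE)
print(rsidToRange(markers, "rs2"))
```

Each returned string can then be passed as the range to a position-based reader. Keeping the lookup outside the package leaves the choice of rsID release with the user.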
On Feb 10, 2021, at 5:44 PM, Peiyuan Zhu wrote:
I think it would be great to have an option to load a matrix from readBGENToMatrixByRange indexed by rsid instead of position.
Missing data imputation would be an important feature to have. I wonder how missing genotypes are handled in the current version.
If a genotype is missing in BGEN, you will get NA as the genotype.
Best,
Xiaowei
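Since missing genotypes come back as NA, a simple imputation step can be applied after loading. The sketch below uses per-marker mean imputation in base R; `imputeMeanByMarker` is a hypothetical helper, not part of seqminer, and it assumes a numeric matrix with one row per marker and one column per subject (transpose first if the matrix is laid out the other way).

```r
# Hypothetical helper, not part of seqminer: replace NA genotypes with the
# mean of the observed genotypes for the same marker.
# Assumes `geno` is a numeric matrix, rows = markers, columns = subjects.
imputeMeanByMarker <- function(geno) {
  stopifnot(is.matrix(geno), is.numeric(geno))
  markerMeans <- rowMeans(geno, na.rm = TRUE)
  # Row/column coordinates of every missing entry.
  missing <- which(is.na(geno), arr.ind = TRUE)
  geno[missing] <- markerMeans[missing[, "row"]]
  geno
}

# Example: two markers, three subjects, one missing genotype per marker.
geno <- matrix(c(0, NA, 2,
                 1, 1, NA), nrow = 2, byrow = TRUE)
print(imputeMeanByMarker(geno))
```

A marker with no observed genotypes at all has no mean to fall back on and stays missing (NaN), so such markers are better filtered out beforehand.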
On Thu, Feb 11, 2021 at 8:07 PM Peiyuan Zhu wrote:
Missing data imputation would be an important feature to have. I wonder how missing genotypes are handled in the current version.