New features for seqminer (8.0) #12

WenjianBI · 2020-07-28T20:55:59Z

Hi Xiaowei,

I am using seqminer (v8.0), and it works well across multiple operating systems. Could you add a few features to the current functions?

1. We usually do not need all subjects in an analysis. For readBGENToMatrixByRange() and readVCFToMatrixByRange(), could you add an argument such as 'subjIDs' or 'subjIndex' to specify which subjects to include? That can save a lot of memory.
2. Could you add a function that splits all markers into multiple ranges, each containing a similar number of markers? In a genome-wide analysis, we cannot hold the genotypes of all markers in memory, so such a function would help a lot. If possible, I suggest something like splitRange(fileName, memoryChunk = 4GB, subjIDs, ...). The output could be a data.frame in which each row describes one range.
3. Sometimes the PLINK bed/bim/fam files or the BGEN bgen/bgi files have different prefix names. Could you let users specify a separate name for each file? That would also be helpful.
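
To make the second request concrete, here is a rough sketch of the splitting logic in Python (not seqminer code). Everything in it is hypothetical: the names `split_range` and `markers_per_chunk`, the assumption that each genotype takes 8 bytes (as in an R numeric matrix), and the assumption that the marker list has already been read from an index such as a .bim or .bgi file. Ranges never cross chromosomes, and positions are assumed to be unique within a chromosome.

```python
from collections import defaultdict

# Assumption: each genotype is held as an 8-byte double, as in an R numeric matrix.
BYTES_PER_GENOTYPE = 8


def markers_per_chunk(memory_bytes, n_subjects):
    """Number of markers whose genotypes fit in the given memory budget."""
    return max(1, memory_bytes // (n_subjects * BYTES_PER_GENOTYPE))


def split_range(markers, memory_bytes, n_subjects):
    """Split (chrom, pos) markers into per-chromosome ranges of similar size.

    Returns one row per range: (chrom, start, end, n_markers).
    Assumes positions are unique within a chromosome.
    """
    limit = markers_per_chunk(memory_bytes, n_subjects)

    # Group positions by chromosome so that no range spans two chromosomes.
    by_chrom = defaultdict(list)
    for chrom, pos in markers:
        by_chrom[chrom].append(pos)

    rows = []
    for chrom, positions in by_chrom.items():
        positions.sort()
        n = len(positions)
        n_chunks = -(-n // limit)          # ceiling division
        base, extra = divmod(n, n_chunks)  # spread the remainder over the first chunks
        start = 0
        for i in range(n_chunks):
            size = base + (1 if i < extra else 0)
            chunk = positions[start:start + size]
            rows.append((chrom, chunk[0], chunk[-1], size))
            start += size
    return rows
```

Under these assumptions, a 4 GiB budget with 500,000 subjects gives `markers_per_chunk(4 * 1024**3, 500_000) == 1073` markers per range. Each returned row could then be formatted as a `chrom:start-end` string and read one range at a time.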

Thanks,
Wenjian

zhanxw · 2020-07-28T23:33:21Z

Those are all very helpful suggestions. Thanks, I will implement them.

WenjianBI · 2020-07-28T23:41:02Z

Thank you for the swift reply. BGEN files are becoming more and more popular, and I think your package can be a very important tool for R users.

garyzhubc · 2021-02-10T23:44:41Z

I think it'd be great if there were an option to load a matrix from readBGENToMatrixByRange indexed by rsid instead of position.

zhanxw · 2021-02-11T03:29:24Z

Thanks for the suggestion, but managing rsids is quite challenging because they can change over time (rs IDs can be merged or become invalid across releases).

garyzhubc · 2021-02-12T02:07:33Z

Missing data imputation could be an important feature to have. How are missing genotypes handled in the current version?

zhanxw · 2021-02-12T02:52:32Z

If a genotype is missing in BGEN, you will get NA as the genotype. Best, Xiaowei
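
Since missing genotypes come back as NA, any imputation has to happen after loading. One simple option is per-marker mean imputation, sketched below in Python. This is an illustration only, not something seqminer provides: the function name `impute_column_means` is hypothetical, missing values are represented as `None` or NaN, and the matrix layout (one row per subject, one column per marker) is an assumption.

```python
import math


def impute_column_means(genotypes):
    """Replace missing entries with the mean of the observed values for that marker.

    `genotypes` is a list of rows (one per subject), each a list of dosages
    (one per marker). Missing entries are None or NaN. Returns a new matrix.
    """
    def is_missing(x):
        return x is None or (isinstance(x, float) and math.isnan(x))

    n_markers = len(genotypes[0]) if genotypes else 0

    # Per-marker mean over observed values; NaN if a marker has no observed value.
    means = []
    for j in range(n_markers):
        observed = [row[j] for row in genotypes if not is_missing(row[j])]
        means.append(sum(observed) / len(observed) if observed else float("nan"))

    return [
        [means[j] if is_missing(x) else x for j, x in enumerate(row)]
        for row in genotypes
    ]
```

Mean imputation is only one choice, and a marker with no observed genotype at all stays missing.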
