Programming practices chapter (now on a branch, rather than a fork) #14

Merged
cansavvy merged 15 commits into main from jjc2718/programming-practices
Nov 30, 2023

Conversation

Collaborator

@jjc2718 jjc2718 commented Oct 19, 2023

Revised version of #9 - I think the automated steps should run now!

Purpose/implementation Section

What changes are being implemented in this Pull Request?

Adding a chapter that introduces good programming practices and how they relate to scientific software development, and a short intro to how automation fits in.

Tell potential reviewers what kind of feedback you are soliciting.

Any feedback would be great! Structure, text, concepts I'm missing, etc.

New Content Checklist

@github-actions
Contributor

github-actions bot commented Oct 19, 2023

⚠️ broken urls ⚠️
There are broken urls that need to be addressed. Read this guide for more info.
Download the errors here.
Comment updated at 2023-11-30 with changes from d2bd74a

@github-actions
Contributor

github-actions bot commented Oct 19, 2023

No spelling errors! 🎉
Comment updated at 2023-11-30 with changes from d2bd74a

@jjc2718 jjc2718 requested a review from cansavvy October 19, 2023 15:54
@github-actions
Contributor

github-actions bot commented Oct 19, 2023

Re-rendered previews from the latest commit:

Updated at 2023-11-30 with changes from d2bd74a

Contributor

@cansavvy cansavvy left a comment

@jjc2718 This is great! Awesome to see this coming together! I have some ideas and comments. Happy to chat more about this tomorrow as well. Really shaping up nicely!

02-programming-practices.Rmd (outdated, resolved)
02-programming-practices.Rmd (outdated, resolved)
02-programming-practices.Rmd (outdated, resolved)

In both R and Python, a _t_-test is a very well-defined, specific function that takes two lists of numbers and returns the _t_-statistic and _p_-value.
Since this is a part of a standard, widely used library in each language, it is already tested as part of those libraries.
In your own software, you might need to do some verification of your input (for instance, what happens if you pass an empty list of numbers?) but probably not too much, since you can be fairly confident that the _t_-test function does what it is documented to do in the programming language you choose to use.
Contributor

I'm a little confused by this sentence. Can you explain what you mean here?

Collaborator Author

I think what I'm trying to say is that a t-test (or the function to perform one) has a specific input and output, and is very commonly used in a wide variety of applications. So it's probably okay to assume that it's doing what its documentation says it will do, without verifying it yourself or writing your own tests for it, unless you're doing something really non-standard with it, or making a mistake in your own code (like the empty list example).
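
To make the empty-list case concrete, here is a minimal Python sketch. The names `validate_samples` and `checked_t_test` are hypothetical, and the minimum sample size of 2 is an assumption (a sample variance needs at least two values). The t-test itself is passed in as a callable, for example `scipy.stats.ttest_ind`, so the sketch only checks our own inputs and leaves the statistics to the library.

```python
from typing import Callable, Sequence


def validate_samples(a: Sequence[float], b: Sequence[float], min_size: int = 2) -> None:
    """Raise ValueError if either sample is too small for a two-sample t-test.

    min_size=2 is an assumption made for this sketch.
    """
    for name, sample in (("a", a), ("b", b)):
        if len(sample) < min_size:
            raise ValueError(
                f"sample {name!r} has {len(sample)} value(s); need at least {min_size}"
            )


def checked_t_test(a: Sequence[float], b: Sequence[float], t_test: Callable):
    """Check our own inputs, then hand off to the library's t-test unchanged."""
    validate_samples(a, b)
    return t_test(a, b)


# Possible usage, assuming SciPy is installed:
#   from scipy.stats import ttest_ind
#   result = checked_t_test([1.0, 2.0, 3.0], [2.0, 3.0, 4.0], ttest_ind)
```

The only code we would test ourselves here is the input check; the library function is trusted to do what its documentation says.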

This would be in contrast to the sequencing pipeline example, where it's (as far as I know) not practical to have a single well-tested and widely used function that goes from raw reads to a volcano plot - there are a lot of very subjective decisions and data transformations that have to happen to get from point A to point B, and they can't be completely encapsulated or abstracted away, which makes it more important to make sure that each step is doing what you expect it to do.
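
As a rough illustration of checking each step, here is a toy Python sketch. It is not a real sequencing pipeline: the step names (`filter_low_counts`, `log_transform`), the `min_total` threshold, and the dictionary-of-counts data layout are all hypothetical, chosen only to show where checks between steps might go.

```python
import math


def filter_low_counts(counts, min_total=10):
    """Keep genes whose summed count across samples is at least min_total."""
    return {gene: values for gene, values in counts.items() if sum(values) >= min_total}


def log_transform(counts):
    """Apply log2(x + 1) to every count."""
    return {gene: [math.log2(v + 1) for v in values] for gene, values in counts.items()}


def run_pipeline(counts, min_total=10):
    """Run both steps, checking each step's output before moving on."""
    filtered = filter_low_counts(counts, min_total)
    # A subjective threshold can silently remove everything, so check for that.
    assert filtered, "filtering removed every gene; check min_total"
    assert all(sum(values) >= min_total for values in filtered.values())

    transformed = log_transform(filtered)
    # The transform should keep the same genes and never produce negatives.
    assert transformed.keys() == filtered.keys()
    assert all(x >= 0 for values in transformed.values() for x in values)
    return transformed
```

In a real project these checks would more likely live in a separate test suite, which is where automation starts to pay off.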

I guess another way to think about this would be the difference between using a programming language as a calculator or a set of steps (which is more or less where I started, and where I think many people probably start), and using it to build more complex software with more extensive computing/testing requirements. I think as someone gravitates toward the latter, the argument for automation starts to make a bit more sense.

Does that make any sense to you? I guess I'll have to think about how to get all that across more concisely - let me know if you have ideas.

Contributor

Yes. I think this is a good point. Let's just figure out how we can get this point across more concisely.

It sounds to me like you are saying that as the complexity of an analysis increases, so do the decisions and parameters surrounding it. This means it's even more critical to carefully read and consider the documentation of the software you are using to figure out what best fits the goals for your data.

Contributor

Could probably make a toy graph illustrating this idea: the x axis is the complexity of the analysis, and the y axis is the number of decisions and parameters associated with the analysis.

(This idea would need some polishing).

Collaborator Author

Here's a graph I added to the slides (I'll link to it in the part of the text you referenced):

[Image: Screen Shot 2023-11-26 at 4 08 39 PM]

I also toyed with labeling the parts of the graph off the "y = x" line, but I'm not sure if this is helpful or counter-productive: maybe we shouldn't even mention overengineering since it's such an uncommon case in academic software (in my experience at least!)

[Image: Screen Shot 2023-11-26 at 4 08 55 PM]

Let me know what you think.

Contributor

Made some minor edits to these slides but overall I love the concept!!! Great visual.

02-programming-practices.Rmd (outdated, resolved)
02-programming-practices.Rmd (outdated, resolved)
02-programming-practices.Rmd (outdated, resolved)
02-programming-practices.Rmd (outdated, resolved)
02-programming-practices.Rmd (resolved)
02-programming-practices.Rmd (resolved)
@cansavvy
Contributor

I think this covers what we need! I'm going to merge!

@cansavvy
Contributor

Note that the URL error warning is a bug and something I have to address elsewhere.

@cansavvy cansavvy merged commit e021a3e into main Nov 30, 2023
5 of 6 checks passed
@cansavvy cansavvy deleted the jjc2718/programming-practices branch November 30, 2023 18:43