Programming practices chapter (now on a branch, rather than a fork) #14
No spelling errors! 🎉
Re-rendered previews from the latest commit:
Updated at 2023-11-30 with changes from d2bd74a
@jjc2718 This is great! Awesome to see this coming together! I have some ideas and comments. Happy to chat more about this tomorrow as well. Really shaping up nicely!
In both R and Python, a _t_-test is a very well-defined, specific function that takes two lists of numbers and returns the _t_-statistic and _p_-value.
Since this is a part of a standard, widely used library in each language, it is already tested as part of those libraries.
In your own software, you might need to do some verification of your input (for instance, what happens if you pass an empty list of numbers?) but probably not too much, since you can be fairly confident that the _t_-test function does what it is documented to do in the programming language you choose to use.
I'm a little confused by this sentence. Can you explain what you mean here?
I think what I'm trying to say is that a t-test (or the function to perform one) has a specific input and output, and is very commonly used in a wide variety of applications. So it's probably okay to assume that it's doing what its documentation says it will do, without verifying it yourself or writing your own tests for it, unless you're doing something really non-standard with it, or making a mistake in your own code (like the empty list example).
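To make the empty-list example concrete, here is a minimal sketch in Python. It assumes SciPy's `scipy.stats.ttest_ind` as the library t-test; the helper names `validate_samples` and `compare_groups` are hypothetical and not part of the chapter. The idea is that we only check our own inputs, and leave the statistics to the library that is already tested.

```python
import math


def validate_samples(a, b):
    """Check inputs in our own code before handing them to the library t-test."""
    for name, sample in (("a", a), ("b", b)):
        # An empty or single-value sample cannot give a meaningful t-statistic.
        if len(sample) < 2:
            raise ValueError(
                f"sample {name} needs at least 2 values, got {len(sample)}"
            )
        if any(math.isnan(x) or math.isinf(x) for x in sample):
            raise ValueError(f"sample {name} contains NaN or infinite values")


def compare_groups(a, b):
    """Validate the inputs, then defer to SciPy's already-tested t-test."""
    validate_samples(a, b)
    # Imported here so the validation helper can be used without SciPy installed.
    from scipy import stats

    # Two-sample t-test with SciPy's default settings (equal variances assumed).
    result = stats.ttest_ind(a, b)
    return result.statistic, result.pvalue
```

Calling `compare_groups([], [1.0, 2.0])` raises a `ValueError` from our own check before the library is ever involved, which is about as much verification as the t-test case probably needs.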
This would be in contrast to the sequencing pipeline example, where it's (as far as I know) not practical to have a single well-tested and widely used function that goes from raw reads to a volcano plot - there are a lot of very subjective decisions and data transformations that have to happen to get from point A to point B, and they can't be completely encapsulated or abstracted away, which makes it more important to make sure that each step is doing what you expect it to do.
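As a rough illustration of checking one pipeline step on its own, here is a hedged sketch in Python. The step (`counts_per_million`) and its test are hypothetical stand-ins chosen for brevity, not something taken from the chapter or from any real pipeline.

```python
def counts_per_million(counts):
    """Hypothetical pipeline step: scale raw read counts to counts per million."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("library has no reads; cannot normalize")
    return [c * 1_000_000 / total for c in counts]


def test_counts_per_million():
    """Check properties this step should always satisfy."""
    cpm = counts_per_million([10, 30, 60])
    # Normalized values should add up to one million (within rounding error).
    assert abs(sum(cpm) - 1_000_000) < 1e-6
    # Normalization should not change the ordering of the genes.
    assert cpm[0] < cpm[1] < cpm[2]
```

A test like this can be run automatically on every change, which is where the argument for automation starts to come in.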
I guess another way to think about this would be the difference between using a programming language as a calculator or a set of steps (which is more or less where I started, and where I think many people probably start), and using it to build more complex software with more extensive computing/testing requirements. I think as someone gravitates toward the latter, the argument for automation starts to make a bit more sense.
Does that make any sense to you? I guess I'll have to think about how to get all that across more concisely - let me know if you have ideas.
Yes. I think this is a good point. Let's just figure out how we can get this point across more concisely.
It sounds to me like you are saying that as the complexity of an analysis increases, so do the decisions and parameters surrounding it. This means it's even more critical to carefully read and consider the documentation of the software you are using to figure out what best fits the goals for your data.
Could probably make a toy graph illustrating this idea: the x-axis is the complexity of the analysis, and the y-axis is the number of decisions and parameters associated with the analysis.
(This idea would need some polishing).
Here's a graph I added to the slides (I'll link to it in the part of the text you referenced):
![Screen Shot 2023-11-26 at 4 08 39 PM](https://private-user-images.githubusercontent.com/2345877/285681917-7eb6bf3f-8f3e-4dbb-89fe-d1c9fad40a91.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkwMTAwNTcsIm5iZiI6MTczOTAwOTc1NywicGF0aCI6Ii8yMzQ1ODc3LzI4NTY4MTkxNy03ZWI2YmYzZi04ZjNlLTRkYmItODlmZS1kMWM5ZmFkNDBhOTEucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI1MDIwOCUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNTAyMDhUMTAxNTU3WiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9OGE2ZDA3NGU4YWM0MjQzZTIwMDY5NDZkYzY2ZTkyNDA2ZTgwOWU3ZTU1NTE5Y2Y4NjIwNGExMjlmM2ExZjk2OSZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QifQ._rPWqZlfodU9ePSFLq-5fy5mq1XdSFjgrMgHJVycUjk)
I also toyed with labeling the parts of the graph off the "y = x" line, but I'm not sure if this is helpful or counter-productive: maybe we shouldn't even mention overengineering since it's such an uncommon case in academic software (in my experience at least!)
![Screen Shot 2023-11-26 at 4 08 55 PM](https://private-user-images.githubusercontent.com/2345877/285681927-409afc87-c46a-4225-a74b-73af2ccdf22e.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkwMTAwNTcsIm5iZiI6MTczOTAwOTc1NywicGF0aCI6Ii8yMzQ1ODc3LzI4NTY4MTkyNy00MDlhZmM4Ny1jNDZhLTQyMjUtYTc0Yi03M2FmMmNjZGYyMmUucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI1MDIwOCUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNTAyMDhUMTAxNTU3WiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9NWQyYWJiMzU4MWE3NmFiMTA3ZDkwZTVlMDJiZDUxZWJhNGY1NDFkYWJhMjAxZWExMGVkOTY2YzczNWJhNTY1MiZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QifQ.z9vH3L29a4GkuW8A2iKG2Zjr1nA-FPzEEBGMX-qRKLo)
Let me know what you think.
Made some minor edits to these slides but overall I love the concept!!! Great visual.
…ub_Automation_for_Scientists into jjc2718/programming-practices
I think this covers what we need! I'm going to merge!
Note that the URL error warning is a bug and something I have to address elsewhere.
Revised version of #9 - I think the automated steps should run now!
Purpose/implementation Section
What changes are being implemented in this Pull Request?
Adding a chapter that introduces good programming practices and how they relate to scientific software development, and a short intro to how automation fits in.
Tell potential reviewers what kind of feedback you are soliciting.
Any feedback would be great! Structure, text, concepts I'm missing, etc.
New Content Checklist
New content/chapter is in an Rmd file with this kind of format and headers.
New content/chapter contains learning objectives.
Bookdown successfully re-renders and any new content files have been added to the _bookdown.yml.
Spell check runs successfully.
Any newly necessary packages that are needed have been added to the Dockerfile and image.
Images are in the correct format for rendering.
Every new image has alt text and is in a Google Slide.
Each slide is described in the notes of the slide so learners relying on a screen reader can access the content. See https://lastcallmedia.com/blog/accessible-comics for more guidance on this.
The color palette choices of the slide are contrasted in a way that is friendly to those with color vision deficiencies.
You can check this using Color Oracle.