diff --git a/02a-Avoiding_Harm-intro.Rmd b/02a-Avoiding_Harm-intro.Rmd
index e4ebc4cb..87a43af3 100644
--- a/02a-Avoiding_Harm-intro.Rmd
+++ b/02a-Avoiding_Harm-intro.Rmd
@@ -25,7 +25,7 @@ This course is intended for leaders who might make decisions about AI at nonprof
## Curriculum
-This course provides a brief introduction about ethical concepts to be aware of when making decisions about AI, as well as **real-world examples** of situations that involved ethical challenges.
+This course provides a brief introduction to ethical concepts to be aware of when making decisions about AI, as well as **real-world examples** of situations that involved ethical challenges. The course is largely focused on **generative AI considerations**, although some of the content will also be applicable to other types of AI applications.
The course will cover:
diff --git a/02b-Avoiding_Harm-concepts.Rmd b/02b-Avoiding_Harm-concepts.Rmd
index 9440181c..609f922f 100644
--- a/02b-Avoiding_Harm-concepts.Rmd
+++ b/02b-Avoiding_Harm-concepts.Rmd
@@ -72,18 +72,23 @@ ottrpal::include_slide("hhttps://docs.google.com/presentation/d/1L6-8DWn028c1o0p
### Tips for avoiding inadvertent harm
-For decision makers about AI users:
+
+**For decision makers about AI use:**
* Consider how the content or decisions generated by an AI tool might be used by others.
* Continually audit how AI tools that you are using are performing.
* Do not implement changes to systems or make important decisions using AI tools without human oversight.
-For decision makers about AI developers:
+
+
+
+**For decision makers about AI development:**
* Consider how newly developed AI tools might be used by others.
* Continually audit AI tools to look for unexpected and potentially harmful or biased behavior.
* Be transparent with users about the limitations of the tool and the data used to train the tool.
* Caution potential users about any potential negative consequences of use.
+
## Replacing Humans
@@ -125,17 +130,26 @@ Computer science is a field that has historically lacked diversity. It is also c
### Tips for supporting human contributions
-For decision makers about AI users:
+
+**For decision makers about AI use:**
* Avoid thinking that content created by AI tools must be better than that created by humans, as this is not true (@sinz_engineering_2019).
* Recall that humans wrote the code to create these AI tools and that the data used to train these AI tools also came from humans. Many of the large commercial AI tools were trained on websites and other content from the internet.
* Be transparent where possible about **when you do or do not use AI tools**, give credit to the humans involved as much as possible.
* Make decisions about using AI tools based on ethical [frameworks](https://journals.sagepub.com/doi/full/10.1177/09637214221091823) in terms of considering the impact on human workers.
-For decision makers about AI developers:
+
+
+
+**For decision makers about AI development:**
* Be transparent about the data used to generate tools as much as possible and provide information about what humans may have been involved in the creation of the data.
* Make decisions about creating AI tools based on ethical [frameworks](https://journals.sagepub.com/doi/full/10.1177/09637214221091823) in terms of considering the impact on human workers.
+
+
+
A new term in the medical field called [AI paternalism](https://www.technologyreview.com/2023/04/21/1071921/ai-is-infiltrating-health-care-we-shouldnt-let-it-make-decisions/) describes the concept that doctors (and others) may trust AI over their own judgment or the experiences of the patients they treat. This has already been shown to be a problem with earlier AI systems intended to help distinguish patient groups. Not all humans will necessarily fit the expectations of the AI model if it is not very good at predicting edge cases [@AI_paternalism]. Therefore, in all fields it is important for us to not forget our value as humans in our understanding of the world.
@@ -177,7 +191,8 @@ Read more about this in this [article](https://www.technologyreview.com/2022/12/
### Tips for avoiding inappropriate uses and lack of oversight
-For decision makers about AI users:
+
+**For decision makers about AI use:**
* Stay up-to-date on current laws, practices, and standards for your field, especially for high-risk uses.
* Stay up-to-date on the news for how others have experienced their use of AI.
@@ -187,8 +202,10 @@ For decision makers about AI users:
* Seek outside expert opinion whenever you are unsure about your AI use plans.
* Consider AI alternatives if something doesn't feel right.
+
-For decision makers about AI developers:
+
+**For decision makers about AI development:**
* Be transparent with users about the potential risks that usage may cause.
* Stay up-to-date on current laws, practices, and standards for your field, especially for high-risk uses.
@@ -200,6 +217,8 @@ For decision makers about AI developers:
* Design tools with safeguards to stop users from requesting harmful or irresponsible uses.
* Design tools with responses that may ask users to be more considerate in the usage of the tool.
+
+
## Bias Perpetuation and Disparities
One of the biggest concerns is the potential for AI to further perpetuate bias. AI systems are trained on data created by humans. If this data used to train the system is biased (and this includes existing code that may be written in a biased manner), the resulting content from the AI tools could also be biased. This could lead to discrimination, abuse, or neglect for certain groups of people, such as those with certain ethnic or cultural backgrounds, genders, ages, sexuality, capabilities, religions or other group affiliations.
@@ -214,14 +233,20 @@ In the flip side, AI has the potential if used wisely, to reduce health inequiti
### Tips for avoiding bias
-For decision makers about AI users:
+
+**For decision makers about AI use:**
* Be aware of the biases in the data that is used to train AI systems.
* Check what data was used to train the AI tools that you use where possible. Tools that are more transparent are likely more ethically developed.
* Check if the developers of the AI tools you are using were/are considerate of bias issues in their development where possible. Tools that are more transparent are likely more ethically developed.
* Consider the possible outcomes of the use of content created by AI tools. Consider if the content could possibly be used in a manner that will result in discrimination.
-For decision makers about AI developers:
+
+
+
+
+
+**For decision makers about AI development:**
* Check for possible biases within data used to train new AI tools.
- Are there harmful data values? Examples could include discriminatory and false associations.
@@ -231,6 +256,7 @@ For decision makers about AI developers:
* Continually audit the code for potentially biased responses. Potentially seek expert help.
* Be transparent with users about potential bias risks.
* Consider the possible outcomes of the use of content created by newly developed AI tools. Consider if the content could possibly be used in a manner that will result in discrimination.
+
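The kind of audit described above can be sketched in a few lines. This is an illustrative example only, with hypothetical group names and records; in practice an audit would use the real training data and appropriate fairness metrics:

```python
# Hypothetical audit sketch: compare positive-outcome rates across groups
# in a training dataset to flag possible bias (illustrative only).
from collections import defaultdict

# Toy records; a real audit would load the actual training data.
records = [
    {"group": "A", "outcome": 1},
    {"group": "A", "outcome": 1},
    {"group": "A", "outcome": 0},
    {"group": "B", "outcome": 0},
    {"group": "B", "outcome": 0},
    {"group": "B", "outcome": 1},
]

totals = defaultdict(int)
positives = defaultdict(int)
for r in records:
    totals[r["group"]] += 1
    positives[r["group"]] += r["outcome"]

# Per-group rate of the positive outcome.
rates = {g: positives[g] / totals[g] for g in totals}
print(rates)  # large gaps between groups warrant closer review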
See @belenguer_ai_2022 for more guidance. We also encourage you to check out the following video for a classic example of bias in AI:
@@ -281,14 +307,24 @@ It is important to follow legal and ethical guidance around the collection of da
### Tips for reducing security and privacy issues
-For decision makers about AI users:
+
+**For decision makers about AI use:**
* Check that no sensitive data, such as Personally Identifiable Information (PII) or proprietary information, becomes public through prompts to consumer AI systems or systems not designed or set up with the right legal agreements in place for sensitive data.
* Consider purchasing a license for a private AI system if needed or create your own if you wish to work with sensitive data (seek expert guidance to determine if the AI systems are secure enough).
* Ask AI tools for help with security when using consumer tools, but do not rely on them alone. In some cases, consumer AI tools provide little guidance about who developed the tool, what data it was trained on, or what happens to your prompts and whether they are collected and maintained in a secure way.
* Promote regulation of AI tools by voting for standards where possible.
-For decision makers about AI developers:
+
+**Possible Generative AI Prompt:**
+Are there any methods that could be implemented to make this code more secure?
+
+
+
+
+
+
+**For decision makers about AI development:**
* Consult with an expert about data security if you want to design or use an AI tool that will regularly use private or proprietary data.
* Be clear with users about the limitations and security risks associated with tools that you develop.
@@ -296,9 +332,10 @@ For decision makers about AI developers:
+**Possible Generative AI Prompt:**
Are there any possible data security or privacy issues associated with the plan you proposed?
-
+
## Climate Impact
@@ -323,12 +360,17 @@ ottrpal::include_slide("https://docs.google.com/presentation/d/1L6-8DWn028c1o0p9
## Tips for reducing climate impact
-For decision makers about AI users:
+
+**For decision makers about AI use:**
- Where possible, use tools that are transparent about resource usage and that identify how they have attempted to improve efficiency.
+
+
+
-For decision makers about AI developers:
+
+**For decision makers about AI development:**
- Modify existing models as opposed to unnecessarily creating new models from scratch where possible.
- Avoid using models with datasets that are unnecessarily large (@bender_dangers_2021)
@@ -337,6 +379,8 @@ For decision makers about AI developers:
- Be transparent about resources used to train models (@castano_fernandez_greenability_2023).
- Utilize data storage and computing options that are designed to be more environmentally conscious options, such as solar or wind power generated electricity.
+
+
## Transparency
In the United States Blueprint for the AI Bill of Rights, it states:
@@ -354,16 +398,21 @@ ottrpal::include_slide("https://docs.google.com/presentation/d/1L6-8DWn028c1o0p9
### Tips for being transparent
-For decision makers about AI users:
+
+**For decision makers about AI use:**
- Where possible, include the AI tool and version that you used and why, so people can trace where decisions or content came from.
- Use tools that are transparent about what training data they used, where possible.
+
+
-For decision makers about AI developers:
+
+**For decision makers about AI development:**
- Provide information about what training data or methods were used to develop new AI models; this can help people better understand why a model behaves in a particular way.
+
## Summary
diff --git a/02ba-Effective-use-training-testing.Rmd b/02ba-Effective-use-training-testing.Rmd
deleted file mode 100644
index 3ef4ee7a..00000000
--- a/02ba-Effective-use-training-testing.Rmd
+++ /dev/null
@@ -1,88 +0,0 @@
-
-```{r, include = FALSE}
-ottrpal::set_knitr_image_path()
-```
-
-# Effective use of Training and Testing data
-
-In the previous chapter, we started to think about the ethics of using representative data for building our AI model. In this chapter we will see that even if our data is inclusive and represents our population of interest, issues can still happen if the data is mishandled during the AI model building process. Let's take a look at how that can happen.
-
-## Population and sample
-
-The data we collect to train our model is typically a limited representation of what we want to study, and as we explored in the previous chapter, bias can arise through our choice of selection. Let us define two terms commonly used in artificial intelligence and statistics: the **population** is the entire group of entities we want to get information from, study, and describe. If we were building an artificial intelligence system to classify dog photographs based on their breeds, then the population is every dog photograph in the world. That’s prohibitively expensive and not easy data to acquire, so we use a **sample**, which is a subset of the population, to represent our desired population.
-
-Even if we are sure that the sample is representative of the population, a different type of bias, in this case [statistical bias](https://en.wikipedia.org/wiki/Bias_(statistics)) can arise. It has to do with how we use the sample data for training and evaluating the model. If we do this poorly, it can result in a model that gives skewed or inaccurate results at times, and/or we may overestimate the performance of the model. This statistical bias can also result in the other type of bias we have already described, in which a model unfairly impacts different people, often called unfairness.
-
-There are many other sources of unfairness in model development - see @baker_algorithmic_2022.
-
-```{r, fig.align='center', out.width="100%", echo = FALSE, fig.alt= "An image showing a larger circle indicating the population of all photos of all dogs in the world and a smaller circle within that circle depicting a sample of 1000 dog photos'."}
-ottrpal::include_slide("https://docs.google.com/presentation/d/11oUc4KvmSiQBCj8rzj9v_e5te62qBIyriFHG_hQvFZA/edit#slide=id.g262d5f57190_0_0")
-```
-
-
-## Training data
-
-```{r, fig.align='center', out.width="100%", echo = FALSE, fig.alt= "An image of possible training data of photos of different dog breeds'."}
-ottrpal::include_slide("https://docs.google.com/presentation/d/11oUc4KvmSiQBCj8rzj9v_e5te62qBIyriFHG_hQvFZA/edit#slide=id.g262d5f57190_0_22")
-```
-
-
-The above image depicts some of our samples for building an artificial intelligence model to classify dog photographs based on their breeds. Each dog photograph has a corresponding label that gives the correct dog breed, and the goal of the model training process is to have the artificial intelligence model learn the association between photograph and dog breed label. For now, we will use *all of our samples for training the model*. The data we use for model training is called the **training data**. Then, once the model is trained and has learned the association between photograph and dog breed, the model will be able make new predictions: given a new dog image without its breed label, the model will make a prediction of what its breed label is.
-
-## Testing data
-
-```{r, fig.align='center', out.width="100%", echo = FALSE, fig.alt= "An image of possible testing data of photos of different dog breeds, including 3 of the exact images shown in the training data'."}
-ottrpal::include_slide("https://docs.google.com/presentation/d/11oUc4KvmSiQBCj8rzj9v_e5te62qBIyriFHG_hQvFZA/edit#slide=id.g262d5f57190_0_28")
-```
-
-
-To evaluate how well this model is good as predicting dog breeds from dog images, we need to use some of our samples to evaluate our model. The samples being used for model evaluation is called the **testing data**. For instance, suppose we used these four images to score our model. We give it to our trained model without the true breed label, and then the model makes a prediction on the breed. Then we compare the predicted breed label with the true label to compute the model accuracy.
-
-
-## Evaluation
-
-
-```{r, fig.align='center', out.width="100%", echo = FALSE, fig.alt= "An image of possible testing data of photos of different dog breeds, including 4 of the images show in the training data and the accuracy value of 74%."}
-ottrpal::include_slide("https://docs.google.com/presentation/d/11oUc4KvmSiQBCj8rzj9v_e5te62qBIyriFHG_hQvFZA/edit#slide=id.g262d5f57190_0_38")
-```
-
-Suppose we get 3 out of 4 breed predictions correct. That would be an accuracy of 75 percent.
-
-## Proper separation of Training and Testing data
-
-```{r, fig.align='center', out.width="100%", echo = FALSE, fig.alt= "An of the dog photos showing that the testing and training data had the same images."}
-ottrpal::include_slide("https://docs.google.com/presentation/d/11oUc4KvmSiQBCj8rzj9v_e5te62qBIyriFHG_hQvFZA/edit#slide=id.g262d5f57190_0_57")
-```
-
-
-However, we have inflated our model evaluation accuracy. The samples we used for model evaluation were also used for model training! Our training and testing data are not independent of each other. Why is this a problem? When we train a model, the model will naturally perform well on the training data, because the model has seen it before. This is called **Overfitting**. In real life, when the dog breed image labeling system is deployed, the model would not be seeing any dog images it has seen in the training data. Our model evaluation accuracy is likely too high because it is evaluated on data it was trained on.
-
-
-```{r, fig.align='center', out.width="100%", echo = FALSE, fig.alt= "An image showing that the testing and training dataset should be separate from one another, so 4 images used for testing are now not included in the training set."}
-ottrpal::include_slide("https://docs.google.com/presentation/d/11oUc4KvmSiQBCj8rzj9v_e5te62qBIyriFHG_hQvFZA/edit#slide=id.g262d5f57190_0_65")
-```
-
-Let’s fix this. Given a sample, we split it into two independent groups for training and testing. We use the training data for training the model, and we use the testing data for evaluating the model. They don’t overlap.
-
-
-```{r, fig.align='center', out.width="100%", echo = FALSE, fig.alt= "An image that the accuracy with this independent test set is now 50%."}
-ottrpal::include_slide("https://docs.google.com/presentation/d/11oUc4KvmSiQBCj8rzj9v_e5te62qBIyriFHG_hQvFZA/edit#slide=id.g262d5f57190_0_75")
-```
-
-When we evaluate our model under this division of training and testing data, our accuracy will look a bit lower compared to our first scenario, but that is more realistic of what happens in the real world. Our model evaluation needs to be a simulation of what happens when we deploy our model into the real world!
-
-
-## Validation
-
-Note that there should actually be an intermediate phase called validation, where we fine tune the model to be better at performing, in other words to improve the accuracy of predicting dog breeds, this should also ideally use a dataset that is independent from the training and testing set. You may also hear people use these two terms in a different order, where testing refers to the improvement phase and validation refers to the evaluation of the general performance of the model in other contexts.Sometimes the validation set for fine tuning is also called the development set. There are clever ways of taking advantage of more of the data for validation data, such as a method called "K-Fold cross validation", in which many training and validation data subsets are trained and evaluated and for more validation and to determine if performance is consistent across more of the data. This is especially beneficial of there is diversity within the dataset, to better ensure that the data performs well on some of the rarer data points (for example, a more rare dog breed) (@wikipedia_training_2023).
-
-
-## Conclusions
-
-This seemingly small tweak in how data is partitioned during model training and evaluation can have a large impact on how artificial intelligence systems are evaluated. We always should have independence between training and testing data so that our model accuracy is not inflated.
-
-If we don't have this independence of training and testing data, many real-life promotions of artificial intelligence efficacy may be exaggerated. Imagine that someone claimed that their cancer diagnostic model from a blood draw is 90%. But their testing data is a subset of their training data. That would over-inflate their model accuracy, and it will less accurate than advertised when used on new patient data. Doctors would make clinical decisions on inaccurate information and potentially create harm.
-
-
-
-
diff --git a/02c-Avoiding_Harm-algorithms.Rmd b/02c-Avoiding_Harm-algorithms.Rmd
index 7935242e..14a4be99 100644
--- a/02c-Avoiding_Harm-algorithms.Rmd
+++ b/02c-Avoiding_Harm-algorithms.Rmd
@@ -32,7 +32,8 @@ ottrpal::include_slide("https://docs.google.com/presentation/d/1L6-8DWn028c1o0p9
### Tips for avoiding the creation of harmful content
-For decision makers about AI users:
+
+**For decision makers about AI use:**
* Be careful about what commercial tools you employ; they should be transparent about what they do to avoid harm.
* Be careful about the context in which you might have people use AI - will they know how to use it responsibly?
@@ -41,19 +42,26 @@ For decision makers about AI users:
* Ask the AI tools to help you, but do not rely on them alone.
+**Possible Generative AI Prompt:**
What are the possible downstream uses of this content?
+**Possible Generative AI Prompt:**
What are some possible negative consequences of using this content?
-For decision makers about AI developers:
+
+
+
+
+
+**For decision makers about AI development:**
* If designing a system, ensure that best practices are employed to avoid harmful responses. This should be done during the design process, and the system should also be regularly evaluated. Some development systems such as [Amazon Bedrock](https://aws.amazon.com/blogs/aws/evaluate-compare-and-select-the-best-foundation-models-for-your-use-case-in-amazon-bedrock-preview/) have tools for evaluating [toxicity](https://towardsdatascience.com/toxicity-in-ai-text-generation-9e9d9646e68f) to test for harmful responses. Although such systems can be helpful for automatic testing, evaluation should also be done directly by humans.
* Consider how the content from AI tools that you design might be used by others for unintended purposes.
* Monitor your tools for unusual and harmful responses.
-
+
## Lack of Interpretability
@@ -68,23 +76,27 @@ This could result in negative consequences, such as for example reliance on a sy
### Tips for avoiding a lack of interpretability
-For decision makers about AI users:
+
+**For decision makers about AI use:**
* Content should be reviewed by those experienced in the given field.
* Ask AI tools to help you understand how it got to the response that it did, but get expert assistance where needed.
* Always consider how an AI system derived a decision if the decision is being used for something that impacts humans
-
+**Possible Generative AI Prompt:**
Can you explain how you generated this response?
-For decision makers about AI developers:
+
+
+
+**For decision makers about AI development:**
* New AI tools should be designed with interpretability in mind, simpler models may make it easier to interpret results.
* Responses from new tools should be reviewed by those experienced in the given field.
* Provide transparency to users about how new AI tools generally create responses.
-
+
## Misinformation and Faulty Responses
@@ -110,7 +122,8 @@ ottrpal::include_slide("https://docs.google.com/presentation/d/1L6-8DWn028c1o0p9
### Tips for reducing misinformation & faulty responses
-For decision makers about AI users:
+
+**For decision makers about AI use:**
* Be aware that some AI tools currently generate false information, either due to algorithmic artifacts known as hallucinations or due to false information in the training data.
* Do not assume that the content generated by AI is real or correct.
@@ -119,19 +132,29 @@ For decision makers about AI users:
* Ask the AI tools for extra information about if there are any potential limitations or weaknesses in the responses, but keep in mind that the tool may not be aware of issues and therefore human review is required. The information provided by the tool can however be a helpful starting point.
+**Possible Generative AI Prompt:**
Are there any limitations associated with this response?
+**Possible Generative AI Prompt:**
What assumptions were made in creating this content?
+
+
+
-For decision makers about AI developers:
+
+
+**For decision makers about AI development:**
* Monitor newly developed tools for accuracy.
* Be transparent with users about the limitations of the tool.
* Consider training generative AI tools to have responses that are transparent about the limitations of the tool.
+
+
+
diff --git a/02d-Avoiding_Harm-adherence.Rmd b/02d-Avoiding_Harm-adherence.Rmd
index 3f499cc4..87c629e6 100644
--- a/02d-Avoiding_Harm-adherence.Rmd
+++ b/02d-Avoiding_Harm-adherence.Rmd
@@ -95,23 +95,30 @@ AI poses questions about how we define art and if AI will reduce the opportuniti
### Tips for checking for allowed use
-
-For decision makers about AI users:
+
+**For decision makers about AI use:**
* Be transparent about what AI tools you use to create content.
* Ask the AI tools if the content it helped generate used any content that you can cite.
-For decision makers about AI developers:
+
+**Possible Generative AI Prompt:**
+Did this content use any content from others that I can cite?
+
+
+
+
+
+
+
+**For decision makers about AI development:**
* Obtain permission from the copyright holders of any content that you use to train an AI system. Only use content that has been licensed for use.
* Cite all content that you can.
-
-Did this content use any content from others that I can cite?
-
-
+
## Use Multiple AI Tools
@@ -124,19 +131,25 @@ ottrpal::include_slide("https://docs.google.com/presentation/d/1L6-8DWn028c1o0p9
### Tips for using multiple AI tools
-For decision makers about AI users:
+
+**For decision makers about AI use:**
- Check that each tool you are using meets the privacy and security restrictions that you need.
- Utilize platforms that make it easier to use multiple AI tools, such as https://poe.com/, which has access to many tools, or [Amazon Bedrock](https://aws.amazon.com/about-aws/whats-new/2023/11/evaluate-compare-select-fms-use-case-amazon-bedrock/), which has a feature to send the same prompt to multiple tools automatically, including for more advanced usage in the development of models based on modifying existing foundation models.
- Evaluate the results of the same prompt multiple times with the same tool to see how consistent it is over time.
- Use slightly different prompts to see how the response may change with the same tool.
- Consider if using tools that work with different types of data may be helpful for answering the same question.
+
+
-For decision makers about AI developers:
+
+**For decision makers about AI development:**
- Consider if using different types of data may be helpful for answering the same question.
- Consider promoting your tool on platforms that allow users to work with multiple AI tools.
+
+
## Educate Yourself and Others
There are many studies indicating that individuals typically want to comply with ethical standards, but it becomes difficult when they do not know how (@giorgini_researcher_2015). Furthermore, individuals who receive training are much more likely to adhere to standards (@kowaleski_can_2019).
@@ -168,7 +181,8 @@ As a result, the Italian Data Protection Authority has banned ChatGPT, while Ger
### Tips to educate yourself and others
-For decision makers about AI users:
+
+**For decision makers about AI use:**
* Emphasize the importance of training and education.
* Recognize that general AI literacy, which builds a better understanding of how AI works, can help individuals use AI more responsibly.
@@ -177,7 +191,11 @@ For decision makers about AI users:
* Make your best practices easily findable and help point people to the right individuals to ask for guidance.
* Recognize that best practices for AI will likely change frequently in the near future as the technology evolves, education content should be updated accordingly.
-For decision makers about AI developers:
+
+
+
+
+**For decision makers about AI development:**
* Emphasize the importance of training and education.
* Recognize that greater AI literacy around security, privacy, bias, climate impact, and more can help individuals develop AI more responsibly.
@@ -186,6 +204,93 @@ For decision makers about AI developers:
* Make your best practices easily findable and help point people to the right individuals to ask for guidance.
* Recognize that best practices for AI will likely change frequently in the near future as the technology evolves, education content should be updated accordingly.
+
+
+We have also included an optional section for new developers about considerations for testing and training data to ensure accurate assessment of performance.
+
+
+**Effective use of Training and Testing data**
+
+
+In the previous chapters, we started to think about the ethics of using representative data for building our AI model. In this chapter we will see that even if our data is inclusive and represents our population of interest, issues can still happen if the data is mishandled during the AI model building process. Let's take a look at how that can happen.
+
+**Population and sample**
+
+The data we collect to train our model is typically a limited representation of what we want to study, and as we explored in the previous chapter, bias can arise through our choice of selection. Let us define two terms commonly used in artificial intelligence and statistics: the **population** is the entire group of entities we want to get information from, study, and describe. If we were building an artificial intelligence system to classify dog photographs based on their breeds, then the population is every dog photograph in the world. That’s prohibitively expensive and not easy data to acquire, so we use a **sample**, which is a subset of the population, to represent our desired population.
+
+Even if we are sure that the sample is representative of the population, a different type of bias, in this case [statistical bias](https://en.wikipedia.org/wiki/Bias_(statistics)) can arise. It has to do with how we use the sample data for training and evaluating the model. If we do this poorly, it can result in a model that gives skewed or inaccurate results at times, and/or we may overestimate the performance of the model. This statistical bias can also result in the other type of bias we have already described, in which a model unfairly impacts different people, often called unfairness.
+
+There are many other sources of unfairness in model development - see @baker_algorithmic_2022.
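The population/sample distinction above can be sketched in a few lines. This is a toy illustration under stated assumptions: the "population" here is a small stand-in list, since the real population of all dog photographs is never available in full:

```python
# Illustrative only: draw a random sample from a toy "population".
import random

# Stand-in for the population of all dog photographs in the world.
population = [f"dog_photo_{i}" for i in range(100_000)]

random.seed(0)  # for reproducibility
sample = random.sample(population, k=1000)  # subset representing the population

print(len(sample))       # 1000
print(len(set(sample)))  # sampling without replacement: all items distinct
```

Everything we train and evaluate on downstream comes from this sample, which is why how we handle it matters so much.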
+
+```{r, fig.align='center', out.width="100%", echo = FALSE, fig.alt= "An image showing a larger circle indicating the population of all photos of all dogs in the world and a smaller circle within that circle depicting a sample of 1000 dog photos."}
+ottrpal::include_slide("https://docs.google.com/presentation/d/11oUc4KvmSiQBCj8rzj9v_e5te62qBIyriFHG_hQvFZA/edit#slide=id.g262d5f57190_0_0")
+```
+
+
+**Training data**
+
+```{r, fig.align='center', out.width="100%", echo = FALSE, fig.alt= "An image of possible training data of photos of different dog breeds."}
+ottrpal::include_slide("https://docs.google.com/presentation/d/11oUc4KvmSiQBCj8rzj9v_e5te62qBIyriFHG_hQvFZA/edit#slide=id.g262d5f57190_0_22")
+```
+
+
+The above image depicts some of our samples for building an artificial intelligence model to classify dog photographs based on their breeds. Each dog photograph has a corresponding label that gives the correct dog breed, and the goal of the model training process is to have the artificial intelligence model learn the association between photograph and dog breed label. For now, we will use *all of our samples for training the model*. The data we use for model training is called the **training data**. Then, once the model is trained and has learned the association between photograph and dog breed, the model will be able to make new predictions: given a new dog image without its breed label, the model will make a prediction of what its breed label is.
+
+**Testing data**
+
+```{r, fig.align='center', out.width="100%", echo = FALSE, fig.alt= "An image of possible testing data of photos of different dog breeds, including 3 of the exact images shown in the training data."}
+ottrpal::include_slide("https://docs.google.com/presentation/d/11oUc4KvmSiQBCj8rzj9v_e5te62qBIyriFHG_hQvFZA/edit#slide=id.g262d5f57190_0_28")
+```
+
+
+To evaluate how well this model predicts dog breeds from dog images, we need to use some of our samples to evaluate it. The samples used for model evaluation are called the **testing data**. For instance, suppose we used these four images to score our model. We give them to our trained model without the true breed labels, and the model predicts a breed for each. Then we compare the predicted breed labels with the true labels to compute the model's accuracy.
+
+
+**Evaluation**
+
+
+```{r, fig.align='center', out.width="100%", echo = FALSE, fig.alt= "An image of possible testing data of photos of different dog breeds, including 4 of the images shown in the training data and the accuracy value of 75%."}
+ottrpal::include_slide("https://docs.google.com/presentation/d/11oUc4KvmSiQBCj8rzj9v_e5te62qBIyriFHG_hQvFZA/edit#slide=id.g262d5f57190_0_38")
+```
+
+Suppose we get 3 out of 4 breed predictions correct. That would be an accuracy of 75 percent.
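This calculation can be sketched in a few lines of Python (the course itself shows slides rather than code, and the breed names and predictions here are hypothetical): accuracy is simply the fraction of predicted labels that match the true labels.

```python
# Hypothetical true labels and model predictions for the 4 test photos.
true_labels = ["beagle", "poodle", "corgi", "husky"]
predicted = ["beagle", "poodle", "corgi", "corgi"]  # one mistake

# Count how many predictions match the true label.
correct = sum(t == p for t, p in zip(true_labels, predicted))
accuracy = correct / len(true_labels)
print(accuracy)  # 3 out of 4 correct -> 0.75
```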
+
+**Proper separation of training and testing data**
+
+```{r, fig.align='center', out.width="100%", echo = FALSE, fig.alt= "An image of the dog photos showing that the testing and training data had the same images."}
+ottrpal::include_slide("https://docs.google.com/presentation/d/11oUc4KvmSiQBCj8rzj9v_e5te62qBIyriFHG_hQvFZA/edit#slide=id.g262d5f57190_0_57")
+```
+
+
+However, we have inflated our model evaluation accuracy: the samples we used for model evaluation were also used for model training! Our training and testing data are not independent of each other. Why is this a problem? A model will naturally perform well on its training data, because it has seen that data before. This is called **overfitting**. In real life, when the dog breed labeling system is deployed, it will encounter dog images it never saw during training. Our model evaluation accuracy is likely too high because the model was evaluated on data it was trained on.
+
+
+```{r, fig.align='center', out.width="100%", echo = FALSE, fig.alt= "An image showing that the testing and training dataset should be separate from one another, so 4 images used for testing are now not included in the training set."}
+ottrpal::include_slide("https://docs.google.com/presentation/d/11oUc4KvmSiQBCj8rzj9v_e5te62qBIyriFHG_hQvFZA/edit#slide=id.g262d5f57190_0_65")
+```
+
+Let’s fix this. Given a sample, we split it into two independent groups for training and testing. We use the training data to train the model and the testing data to evaluate it. The two groups do not overlap.
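A minimal Python sketch of such a split (not from the course; the sample names are placeholders for labeled dog photos): shuffle the samples, then cut them into two non-overlapping sets.

```python
import random

# Hypothetical pool of 1000 labeled dog photos.
samples = [f"dog_photo_{i}" for i in range(1000)]

random.seed(42)          # fixed seed so the split is reproducible
random.shuffle(samples)  # shuffle before splitting to avoid ordering effects

split = int(0.8 * len(samples))  # e.g. 80% for training, 20% for testing
train, test = samples[:split], samples[split:]

# The two sets must not share any samples.
assert set(train).isdisjoint(test)
print(len(train), len(test))  # 800 200
```

The exact split ratio is a design choice; 80/20 is a common default, but the key requirement is only that the two sets do not overlap.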
+
+
+```{r, fig.align='center', out.width="100%", echo = FALSE, fig.alt= "An image showing that the accuracy with this independent test set is now 50%."}
+ottrpal::include_slide("https://docs.google.com/presentation/d/11oUc4KvmSiQBCj8rzj9v_e5te62qBIyriFHG_hQvFZA/edit#slide=id.g262d5f57190_0_75")
+```
+
+When we evaluate our model under this division of training and testing data, the accuracy looks lower than in our first scenario, but it is a more realistic estimate of what happens in the real world. Model evaluation should simulate what happens when we deploy the model into the real world!
+
+
+**Validation**
+
+Note that there should actually be an intermediate phase called validation, in which we fine-tune the model to perform better, in other words, to improve the accuracy of predicting dog breeds. This phase should also ideally use a dataset that is independent of the training and testing sets. You may hear people use these two terms in the opposite order, where testing refers to the improvement phase and validation refers to evaluating the general performance of the model in other contexts. Sometimes the validation set used for fine-tuning is also called the development set. There are clever ways of taking advantage of more of the data for validation, such as a method called "K-fold cross validation", in which the model is trained and evaluated on many different training and validation subsets to determine whether performance is consistent across more of the data. This is especially beneficial when there is diversity within the dataset, to better ensure that the model performs well on rarer data points (for example, a rarer dog breed) (@wikipedia_training_2023).
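The K-fold idea can be sketched with plain Python (a simplified illustration, not the course's own code): the data is cut into K folds, and each fold takes one turn as the validation set while the rest is used for training, so every sample is validated exactly once.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, val_indices) for each of k folds."""
    indices = list(range(n_samples))
    fold_size = n_samples // k
    for fold in range(k):
        start = fold * fold_size
        # The last fold absorbs any remainder when n_samples % k != 0.
        end = n_samples if fold == k - 1 else start + fold_size
        val = indices[start:end]
        train = indices[:start] + indices[end:]
        yield train, val

# With 20 samples and 5 folds, each fold validates 4 samples.
all_val = []
for train, val in k_fold_indices(20, 5):
    assert set(train).isdisjoint(val)  # train and validation never overlap
    all_val.extend(val)

assert sorted(all_val) == list(range(20))  # every sample validated once
```

In practice the data would usually be shuffled before folding, and each fold's model would be trained and scored to see whether performance is consistent across folds.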
+
+
+**Conclusions**
+
+This seemingly small tweak in how data is partitioned during model training and evaluation can have a large impact on how artificial intelligence systems are evaluated. We should always have independence between training and testing data so that our model accuracy is not inflated.
+
+If we don't have this independence of training and testing data, many real-life claims of artificial intelligence efficacy may be exaggerated. Imagine that someone claimed that their cancer diagnostic model, based on a blood draw, is 90% accurate, but their testing data was a subset of their training data. That would over-inflate the reported accuracy, and the model would be less accurate than advertised when used on new patient data. Doctors could then make clinical decisions based on inaccurate information and potentially cause harm.
+
+
+
## Summary
diff --git a/02f-Avoiding_Harm-idare_and_ai.Rmd b/02f-Avoiding_Harm-idare_and_ai.Rmd
index b3227035..4854db36 100644
--- a/02f-Avoiding_Harm-idare_and_ai.Rmd
+++ b/02f-Avoiding_Harm-idare_and_ai.Rmd
@@ -65,7 +65,8 @@ A magazine article describing this work stated:
AI tools with training data that lacks data about certain ethnic or gender groups or disabled individuals could result in responses that do not adequately consider these groups, ignores them all together, or makes false associations.
-For decision makers about AI users:
+
+**For decision makers about AI use:**
- Where possible, use tools that are transparent about what training data was used and limitations of this data and actively evaluate the data for bias including:
- if the dataset includes any harmful data, such as discriminatory and false associations
@@ -76,11 +77,15 @@ For decision makers about AI users:
:::{.query}
+**Possible Generative AI Prompt:**
Why did you assume that the individual was male?
:::
+
+
-For decision makers about AI developers:
+
+**For decision makers about AI development:**
- Be careful to use datasets that do not contain harmful data, such as discriminatory and false associations.
- Use datasets that adequately inclusive for the given needs.
@@ -91,6 +96,7 @@ For decision makers about AI developers:
- Seek expert evaluation of your tools for bias.
- Be transparent about possible bias or dataset limitations to users.
+
## Be extremely careful using AI for decisions
There is a common misconception that AI tools might make better decisions for humans because they are believed to not be biased like humans (@pethig_biased_2023). However since they are built by humans and trained on human data, they are also biased. It is possible that AI systems specifically trained to avoid bias, to be inclusive, to be anti-racist, and for specific contexts may be helpful to enable a more neutral party, but that is generally not currently possible.
diff --git a/_bookdown.yml b/_bookdown.yml
index 65e59c32..52f7ac3b 100644
--- a/_bookdown.yml
+++ b/_bookdown.yml
@@ -12,7 +12,6 @@ rmd_files: ["index.Rmd",
"01g-AI_Possibilities-case_studies.Rmd",
"02a-Avoiding_Harm-intro.Rmd",
"02b-Avoiding_Harm-concepts.Rmd",
- "02ba-Effective-use-training-testing.Rmd",
"02c-Avoiding_Harm-algorithms.Rmd",
"02d-Avoiding_Harm-adherence.Rmd",
"02e-Avoiding_Harm-consent_and_ai.Rmd",
diff --git a/assets/style_custom.css b/assets/style_custom.css
index 3545fe1e..f3f75a0b 100644
--- a/assets/style_custom.css
+++ b/assets/style_custom.css
@@ -42,11 +42,31 @@ div.ai_response {
div.disclaimer{
- content:'The thoughts and ideas presented in this course are not to be substituted for legal or ethical advice and are only meant to give you a starting point for gathering information about AI policy and regulations to consider.';
padding: 1em;
margin: 1em 0;
min-height: 120px;
border: 4px #000000;
border-style: solid;
outline: 5px solid #ffb808;
+}
+
+div.foruse{
+ padding: 1em;
+ margin: 1em 0;
+ background-size: 70px;
+ background-position: 15px center;
+ border: 4px #0b7899;
+ border-style: solid;
+ outline: 15px solid #FFFFFF;
+}
+
+
+div.fordev{
+ padding: 1em;
+ margin: 1em 0;
+ background-size: 70px;
+ background-position: 15px center;
+ border: 4px #ffb808;
+ border-style: solid;
+ outline: 15px solid #FFFFFF;
}
\ No newline at end of file