From cfb672b556b50f810ae1474a8356eea4835eb42c Mon Sep 17 00:00:00 2001 From: avahoffman Date: Thu, 14 Dec 2023 17:16:10 -0500 Subject: [PATCH 01/10] Skin example --- 01c-AI_Possibilities-how_ai_works.Rmd | 56 ++++++++++++++++----------- book.bib | 8 ++++ 2 files changed, 42 insertions(+), 22 deletions(-) diff --git a/01c-AI_Possibilities-how_ai_works.Rmd b/01c-AI_Possibilities-how_ai_works.Rmd index 83a74671..02045aa8 100644 --- a/01c-AI_Possibilities-how_ai_works.Rmd +++ b/01c-AI_Possibilities-how_ai_works.Rmd @@ -7,30 +7,57 @@ ottrpal::set_knitr_image_path() # How AI Works -Let's briefly revisit our definition of AI: it must have data, training via an algorithm, and an interface. How do each of these work? We'll explore below. +Let's briefly revisit our definition of AI: it must have data, training via an algorithm, and an interface. Let's dive into each of these in more detail below. -## The Data Explosion +## Early Warning for Skin Cancer -Let's say we're driving a car or taking public transportation in a city. We might notice a pattern between the amount of traffic on roads, and the time of day. If you commute once at a specific time of day and observe the traffic around you, you have one data point. You can do this a bunch of times and collect more data. +Each year in the United States, 6.1 million adults are treated for skin cancer (basal cell and squamous cell carcinomas), totaling nearly $10 billion in costs [@CDC2023]. It is one of the most common forms of cancer in the United States, and mortality from skin cancer is a real concern. Fortunately, early detection through regular screening can increase survival rates to over 95% [@Melarkode2023]. Cost and accessibility of screening providers, however, mean that many people aren't getting the preventative care they need. -Historically, this is the way data has been collected, and you could manage that data in an Excel Spreadsheet. 
However, as computer storage has become cheaper and data collection methods have become more sophisticated, our ability to access data has exploded in scale. It's not hard to imagine that using traffic cameras, dashcams, and car sensors could collect a lot more information than any one person. +Increasingly, AI is being used to flag potential skin cancer. AI focused on skin cancer detection could be used by would-be patients to motivate them to seek a professional opinion, or by clinicians to validate their findings or help with continuous learning. -Think about how much text information is freely available on the internet! Treating that as input data, AI systems can look for patterns of words that typically go together. For example, you're much more likely to see the phrase "cancer is a disease" than "cancer is a computer program". +1. **Data**: Images of skin + +1. **Algorithm**: Detection of possible skin cancer + +1. **Interface**: Web portal or app where you can submit a new picture + +## Collecting Datapoints + +Let's say a clinician, *Dr. J*, is learning how to screen for skin cancer. Dr. J diagnoses their first instance of skin cancer, meaning they now have one internal reference point. Dr. J could make future diagnoses based on this one data point, but it might not be very accurate. Over time, as Dr. J does more screenings of skin with and without cancer, they will get a better and better idea of what skin cancer looks like. Many of us refer to our on-the-job learning as "training". + +AI works in a similar way. As more data is provided, AI will typically get closer to finding the patterns in which we are interested. In order to train an AI algorithm to detect possible skin cancer, we'll first want to gather as many pictures of normal and cancerous skin as we can. This is our **raw data** [@Leek2017]. 
```{r, echo=FALSE, fig.alt='CAPTION HERE', out.width = '100%', fig.align = 'center'} -ottrpal::include_slide("https://docs.google.com/presentation/d/1OydUyEv1uEzn8ONPznxH1mGd4VHC9n88_aUGqkHJX4I/edit#slide=id.g2a3877ab699_0_79") +ottrpal::include_slide("https://docs.google.com/presentation/d/1OydUyEv1uEzn8ONPznxH1mGd4VHC9n88_aUGqkHJX4I/edit#slide=id.g263e06ef889_36_153") ``` ### What Is Data -Data comes in many shapes and forms. Data can be **structured**, such as a spreadsheet of times and traffic volume or counts of viral particles in different patients. Data can also be **unstructured**, such as might be found in social media text or genome sequence data. +In our skin cancer screening example, our data is all of the information stored in an image. However, data comes in many shapes and forms. Data can be **structured**, such as a spreadsheet of the time of day plus traffic volume or counts of viral particles in different patients. Data can also be **unstructured**, such as might be found in social media text or genome sequence data. Other kinds of data can be collected and used to train algorithms. These might include survey data collected directly from consumers, medical data collected in a healthcare setting, purchase or transaction tracking, and online tracking of your time on certain web pages [@Cote2022]. +Quantity *and* quality of data are very important. More data makes it easier to detect and account for minor differences among observations. However, that shouldn't come at the cost of quality. It is sometimes better to have fewer, high resolution or high quality images in our dataset than many images that are blurry, discolored, or in other ways questionable. + +Large Language Models (LLMs), which we will cover later, are great examples of high quantity and quality of data. Think about how much text information is freely available on the internet! Treating that as input data, AI systems can look for patterns of words that typically go together. 
For example, you're much more likely to see the phrase "cancer is a disease" than "cancer is a computer program". Many LLMs are trained on sources like [Wikipedia](https://www.wikipedia.org/), which are typically grammatically sound and informative, leading to higher quality output. + +```{r, echo=FALSE, fig.alt='CAPTION HERE', out.width = '100%', fig.align = 'center'} +ottrpal::include_slide("https://docs.google.com/presentation/d/1OydUyEv1uEzn8ONPznxH1mGd4VHC9n88_aUGqkHJX4I/edit#slide=id.g2a3877ab699_0_79") +``` +
It is **essential** that you and your team think critically about data sources. Many companies releasing generative AI systems have come under fire for training these systems on data that doesn't belong to them [@Walsh2023]. Individual people also have a right to data privacy. No personal data should be used without permission, even if that data could be interesting or useful.
+### Preparing the Data + +```{r, echo=FALSE, fig.alt='CAPTION HERE', out.width = '100%', fig.align = 'center'} +ottrpal::include_slide("https://docs.google.com/presentation/d/1OydUyEv1uEzn8ONPznxH1mGd4VHC9n88_aUGqkHJX4I/edit#slide=id.g263e06ef889_36_318") +``` + + + + ## Machines Can Learn Like Us Human beings are powerhouses when it comes to pattern recognition and processing [@Mattson2014]. We are constantly observing the world around us, collecting data to learn and make decisions. For example, we might notice a pattern between the amount of traffic on roads in a city, and the time of day. @@ -52,18 +79,3 @@ ottrpal::include_slide("https://docs.google.com/presentation/d/1OydUyEv1uEzn8ONP ``` The rise of machine learning has been propelled by our ability to collect vast amounts of data and sophisticated types of AI and computing power. - - - - - - - - - - - - - - - diff --git a/book.bib b/book.bib index 92f9b868..859457ab 100644 --- a/book.bib +++ b/book.bib @@ -328,3 +328,11 @@ @article{Melarkode2023 year={2023}, publisher={MDPI} } + +@misc{Leek2017, + title = {Demystifying Artificial Intelligence}, + url = {https://leanpub.com/demystifyai}, + author = {Leek, Jeffrey T and Narayanan, Divya}, + language = {en}, + year= {2017} +} From 4afeb1265add61fba45561883111265dd2bc67bb Mon Sep 17 00:00:00 2001 From: avahoffman Date: Thu, 14 Dec 2023 23:34:17 -0500 Subject: [PATCH 02/10] More new skin (a la S.C.I.E.N.C.E.) --- 01c-AI_Possibilities-how_ai_works.Rmd | 46 +++++++++++++++++++-------- book.bib | 2 +- 2 files changed, 33 insertions(+), 15 deletions(-) diff --git a/01c-AI_Possibilities-how_ai_works.Rmd b/01c-AI_Possibilities-how_ai_works.Rmd index 02045aa8..1da2a1e1 100644 --- a/01c-AI_Possibilities-how_ai_works.Rmd +++ b/01c-AI_Possibilities-how_ai_works.Rmd @@ -23,9 +23,9 @@ Increasingly, AI is being used to flag potential skin cancer. AI focused on skin ## Collecting Datapoints -Let's say a clinician, *Dr. J*, is learning how to screen for skin cancer. Dr. 
J diagnoses their first instance of skin cancer, meaning they now have one internal reference point. Dr. J could make future diagnoses based on this one data point, but it might not be very accurate. Over time, as Dr. J does more screenings of skin with and without cancer, they will get a better and better idea of what skin cancer looks like. Many of us refer to our on-the-job learning as "training". +Let's say a clinician, *Dr. Derma*, is learning how to screen for skin cancer. When Dr. D sees their first instance of skin cancer, they now have one data point. Dr. D could make future diagnoses based on this one data point, but it might not be very accurate. Over time, as Dr. D does more screenings of skin with and without cancer, they will get a better and better idea of what skin cancer looks like. This is part of what we do best. Human beings are powerhouses when it comes to pattern recognition and processing [@Mattson2014]. -AI works in a similar way. As more data is provided, AI will typically get closer to finding the patterns in which we are interested. In order to train an AI algorithm to detect possible skin cancer, we'll first want to gather as many pictures of normal and cancerous skin as we can. This is our **raw data** [@Leek2017]. +Like Dr. D, AI will get better at finding the right patterns with more data. In order to train an AI algorithm to detect possible skin cancer, we'll first want to gather as many pictures of normal and cancerous skin as we can. This is the **raw data** [@Leek2017]. ```{r, echo=FALSE, fig.alt='CAPTION HERE', out.width = '100%', fig.align = 'center'} ottrpal::include_slide("https://docs.google.com/presentation/d/1OydUyEv1uEzn8ONPznxH1mGd4VHC9n88_aUGqkHJX4I/edit#slide=id.g263e06ef889_36_153") @@ -39,6 +39,10 @@ Other kinds of data can be collected and used to train algorithms. These might i Quantity *and* quality of data are very important. More data makes it easier to detect and account for minor differences among observations. 
However, that shouldn't come at the cost of quality. It is sometimes better to have fewer, high resolution or high quality images in our dataset than many images that are blurry, discolored, or in other ways questionable. +
+Diversity in datasets is often critical for AI. For example, if our skin cancer screening AI never sees skin cancer on darker skin, it might fail to alert patients that have darker skin. Lack of representation in the tech industry is partially responsible for these kinds of failures being discovered after harm has already happened. +
+ Large Language Models (LLMs), which we will cover later, are great examples of high quantity and quality of data. Think about how much text information is freely available on the internet! Treating that as input data, AI systems can look for patterns of words that typically go together. For example, you're much more likely to see the phrase "cancer is a disease" than "cancer is a computer program". Many LLMs are trained on sources like [Wikipedia](https://www.wikipedia.org/), which are typically grammatically sound and informative, leading to higher quality output. ```{r, echo=FALSE, fig.alt='CAPTION HERE', out.width = '100%', fig.align = 'center'} @@ -51,31 +55,45 @@ It is **essential** that you and your team think critically about data sources. ### Preparing the Data +It's important to remember that AI systems need specific instructions to start detecting patterns. We'll need to take our raw data and indicate which pictures are positive for skin cancer and which aren't. This process is called **labeling** and has to be done by humans. + ```{r, echo=FALSE, fig.alt='CAPTION HERE', out.width = '100%', fig.align = 'center'} ottrpal::include_slide("https://docs.google.com/presentation/d/1OydUyEv1uEzn8ONPznxH1mGd4VHC9n88_aUGqkHJX4I/edit#slide=id.g263e06ef889_36_318") ``` +Once data is labeled, either "cancer" or "not cancer", we can use it to train the algorithm in the next step. This data is aptly called **training data**. +## Understanding the Algorithm +Our goal is "detection of possible skin cancer", but how does a computer do that? -## Machines Can Learn Like Us +First, we'll need to break down the image into attributes called **features**. This could be the presence of certain color pixels, percentage of certain shades, spot perimeter regularity, or other features. Features can be determined by computers or by data scientists who know what kind of features are important. It's not uncommon for an AI looking at image data to have thousands of features. 
-Human beings are powerhouses when it comes to pattern recognition and processing [@Mattson2014]. We are constantly observing the world around us, collecting data to learn and make decisions. For example, we might notice a pattern between the amount of traffic on roads in a city, and the time of day. +Because we've supplied a bunch of images with labels, AI can look for patterns that are present in cancerous images that are not present in others. -```{r, echo=FALSE, fig.alt='CAPTION HERE', out.width = '100%', fig.align = 'center'} -ottrpal::include_slide("https://docs.google.com/presentation/d/1OydUyEv1uEzn8ONPznxH1mGd4VHC9n88_aUGqkHJX4I/edit#slide=id.gcf1264c749_0_140") -``` +As an example, here is a very simple algorithm with one feature (spot perimeter): -Much like the human brain, machine learning detects patterns within data. **Machine learning** is at the heart of artificial intelligence, allowing computers to learn and make predictions. In more complex machine learning, computers make millions of calculations, mastering the mapping of inputs (observations) to outputs (predictions). This process mirrors how humans learn through experience. +1. Calculate the perimeter of a darker spot in the image. -
-**Machine Learning**: Machine learning is a way for computers to learn from examples and improve their performance over time, resembling how humans learn from experience. -
+1. If the perimeter of the spot is exactly circular, label the image "not cancer". -A machine learning system refines its understanding by continuously updating its parameters based on the feedback received from the provided data. For example, our system might be guessing traffic by time of day, but also judging its accuracy while accounting for other factors, such as whether or not it was a work day, if some workers are on holiday, or how many people live in the city. +1. If the perimeter of the spot is not circular, label the image "cancer". + +### Testing the Algorithm + +After setting up and quantifying the features, we want to make sure the AI is actually doing a good job. We'll take some images the AI hasn't seen before, called **test data**. We know the correct answers, but the AI does not. The AI will measure the features within each of the images to provide an educated guess of the proper label. Every time AI gets a label wrong, it will reassess parts of the algorithm. For example, it might make the tweak below: + +1. Calculate the perimeter of a darker spot in the image. + +1. If the perimeter of the spot is close to circular, label the image "not cancer". + +1. If the perimeter of the spot is not close to circular, label the image "cancer". ```{r, echo=FALSE, fig.alt='CAPTION HERE', out.width = '100%', fig.align = 'center'} -ottrpal::include_slide("https://docs.google.com/presentation/d/1OydUyEv1uEzn8ONPznxH1mGd4VHC9n88_aUGqkHJX4I/edit#slide=id.g1965a5f7f0a_0_44") +ottrpal::include_slide("https://docs.google.com/presentation/d/1OydUyEv1uEzn8ONPznxH1mGd4VHC9n88_aUGqkHJX4I/edit#slide=id.g263e06ef889_36_360") ``` -The rise of machine learning has been propelled by our ability to collect vast amounts of data and sophisticated types of AI and computing power. +Humans play a big part in what kind of scores are acceptable when producing outputs. With cancer screening, we might be very worried about missing a real instance of cancer. 
Therefore, we might tell the AI to score false negatives more harshly than false positives. + +## Interfacing with AI + diff --git a/book.bib b/book.bib index 859457ab..908389d5 100644 --- a/book.bib +++ b/book.bib @@ -312,7 +312,7 @@ @misc{pearce_beware_2021 @misc{CDC2023, title = {Melanoma of the Skin Statistics}, url = {https://www.cdc.gov/cancer/skin/statistics/index.htm}, - author = {US Centeres for Disease Control and Prevention}, + author = {CDC}, language = {en}, urldate = {2023-12-14}, year= {2023} From 385f75ca1889f860598be53d9f0dfeccac667de2 Mon Sep 17 00:00:00 2001 From: avahoffman Date: Fri, 15 Dec 2023 00:40:42 -0500 Subject: [PATCH 03/10] Add info about interface --- 01c-AI_Possibilities-how_ai_works.Rmd | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/01c-AI_Possibilities-how_ai_works.Rmd b/01c-AI_Possibilities-how_ai_works.Rmd index 1da2a1e1..88b3b731 100644 --- a/01c-AI_Possibilities-how_ai_works.Rmd +++ b/01c-AI_Possibilities-how_ai_works.Rmd @@ -97,3 +97,12 @@ Humans play a big part in what kind of scores are acceptable when producing outp ## Interfacing with AI +Finally, AI would not work without an interface. This is where we can get creative. In our skin cancer screening, we might create a website where providers or patients could upload a picture of an area that needs screening. + +- Because skin images could be considered medical data, we would need to think critically about what happens to images after they are uploaded. Are images deleted after a screening prognosis is made? Will images be used to update the training data? + +- Telling people they might have cancer could be very upsetting for them. Our interface should provide supporting resources and clear disclaimers about its abilities. 
+ +```{r, echo=FALSE, fig.alt='CAPTION HERE', out.width = '100%', fig.align = 'center'} +ottrpal::include_slide("https://docs.google.com/presentation/d/1OydUyEv1uEzn8ONPznxH1mGd4VHC9n88_aUGqkHJX4I/edit#slide=id.g263e06ef889_36_397") +``` From 75a7c3641e2fed3507314c5ca106e50f4ae07a5f Mon Sep 17 00:00:00 2001 From: avahoffman Date: Fri, 15 Dec 2023 00:48:58 -0500 Subject: [PATCH 04/10] summary and slides --- 01b-AI_Possibilities-what_is_ai.Rmd | 4 ++++ 01c-AI_Possibilities-how_ai_works.Rmd | 2 ++ 2 files changed, 6 insertions(+) diff --git a/01b-AI_Possibilities-what_is_ai.Rmd b/01b-AI_Possibilities-what_is_ai.Rmd index dad4ea40..b57247aa 100644 --- a/01b-AI_Possibilities-what_is_ai.Rmd +++ b/01b-AI_Possibilities-what_is_ai.Rmd @@ -129,3 +129,7 @@ While the core functionality of speed cameras relies on sensor technology and pr This is considered AI. Social media algorithms, like Instagram's, make recommendations based on user behavior. For example, if you spend a lot of time viewing a page that was recommended, the system interprets that as positive feedback and will make similar recommendations. Typically, these recommendations get better over time as the user generates more user-specific data. You supply data through your behaviors, the algorithm gets trained, and you interact with the suggestions via the app. + +## Summary + +The definition of artificial intelligence (AI) has shifted over time. We use the three part framework of data, algorithms, and interfaces to describe AI applications. You will need to consider specific technologies and whether they meet the criteria for being classified as AI using this framework. Adaptability and training with new data are key factors to keep in mind as we move further in the course. 
diff --git a/01c-AI_Possibilities-how_ai_works.Rmd b/01c-AI_Possibilities-how_ai_works.Rmd index 88b3b731..328e56a4 100644 --- a/01c-AI_Possibilities-how_ai_works.Rmd +++ b/01c-AI_Possibilities-how_ai_works.Rmd @@ -5,6 +5,8 @@ ottrpal::set_knitr_image_path() # VIDEO How AI Works +TODO: Slides here: https://docs.google.com/presentation/d/1OydUyEv1uEzn8ONPznxH1mGd4VHC9n88_aUGqkHJX4I/edit#slide=id.g263e06ef889_36_397 + # How AI Works Let's briefly revisit our definition of AI: it must have data, training via an algorithm, and an interface. Let's dive into each of these in more detail below. From d9839f29333d769054604e025d606fc0846ada2e Mon Sep 17 00:00:00 2001 From: avahoffman Date: Fri, 15 Dec 2023 00:52:22 -0500 Subject: [PATCH 05/10] Cleanup phrasing --- 01c-AI_Possibilities-how_ai_works.Rmd | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/01c-AI_Possibilities-how_ai_works.Rmd b/01c-AI_Possibilities-how_ai_works.Rmd index 328e56a4..24cdec8c 100644 --- a/01c-AI_Possibilities-how_ai_works.Rmd +++ b/01c-AI_Possibilities-how_ai_works.Rmd @@ -42,7 +42,9 @@ Other kinds of data can be collected and used to train algorithms. These might i Quantity *and* quality of data are very important. More data makes it easier to detect and account for minor differences among observations. However, that shouldn't come at the cost of quality. It is sometimes better to have fewer, high resolution or high quality images in our dataset than many images that are blurry, discolored, or in other ways questionable.
-Diversity in datasets is often critical for AI. For example, if our skin cancer screening AI never sees skin cancer on darker skin, it might fail to alert patients that have darker skin. Lack of representation in the tech industry is partially responsible for these kinds of failures being discovered after harm has already happened. +Representative diversity of datasets is crucial for the effectiveness of AI. For instance, if an AI used for skin cancer screening only encounters instances of skin cancer on lighter skin tones, it might fail to alert individuals with darker skin tones. + +The tech industry's lack of diversity contributes to these issues, often leading to the discovery of failures only after harm has occurred.
Large Language Models (LLMs), which we will cover later, are great examples of high quantity and quality of data. Think about how much text information is freely available on the internet! Treating that as input data, AI systems can look for patterns of words that typically go together. For example, you're much more likely to see the phrase "cancer is a disease" than "cancer is a computer program". Many LLMs are trained on sources like [Wikipedia](https://www.wikipedia.org/), which are typically grammatically sound and informative, leading to higher quality output. From e5236e3da5c90c5701f1cd666a936fede449a382 Mon Sep 17 00:00:00 2001 From: avahoffman Date: Fri, 15 Dec 2023 01:38:16 -0500 Subject: [PATCH 06/10] Starting types --- 01c-AI_Possibilities-how_ai_works.Rmd | 4 ++-- 01d-AI_Possibilities-ai_types.Rmd | 15 +++++++++------ 2 files changed, 11 insertions(+), 8 deletions(-) diff --git a/01c-AI_Possibilities-how_ai_works.Rmd b/01c-AI_Possibilities-how_ai_works.Rmd index 24cdec8c..974b2860 100644 --- a/01c-AI_Possibilities-how_ai_works.Rmd +++ b/01c-AI_Possibilities-how_ai_works.Rmd @@ -9,7 +9,7 @@ TODO: Slides here: https://docs.google.com/presentation/d/1OydUyEv1uEzn8ONPznxH1 # How AI Works -Let's briefly revisit our definition of AI: it must have data, training via an algorithm, and an interface. Let's dive into each of these in more detail below. +Let's briefly revisit our definition of AI: it must have data, algorithm(s), and an interface. Let's dive into each of these in more detail below. ## Early Warning for Skin Cancer @@ -47,7 +47,7 @@ Representative diversity of datasets is crucial for the effectiveness of AI. For The tech industry's lack of diversity contributes to these issues, often leading to the discovery of failures only after harm has occurred. -Large Language Models (LLMs), which we will cover later, are great examples of high quantity and quality of data. Think about how much text information is freely available on the internet! 
Treating that as input data, AI systems can look for patterns of words that typically go together. For example, you're much more likely to see the phrase "cancer is a disease" than "cancer is a computer program". Many LLMs are trained on sources like [Wikipedia](https://www.wikipedia.org/), which are typically grammatically sound and informative, leading to higher quality output. +Large Language Models (LLMs), which we will cover later, are great examples of high quantity and quality of data. Think about how much text information is freely available on the internet! Throughout the internet, we're much more likely to see the phrase "cancer is a disease" than "cancer is a computer program". Many LLMs are trained on sources like [Wikipedia](https://www.wikipedia.org/), which are typically grammatically sound and informative, leading to higher quality output. ```{r, echo=FALSE, fig.alt='CAPTION HERE', out.width = '100%', fig.align = 'center'} ottrpal::include_slide("https://docs.google.com/presentation/d/1OydUyEv1uEzn8ONPznxH1mGd4VHC9n88_aUGqkHJX4I/edit#slide=id.g2a3877ab699_0_79") diff --git a/01d-AI_Possibilities-ai_types.Rmd b/01d-AI_Possibilities-ai_types.Rmd index c7da38c5..2ee24781 100644 --- a/01d-AI_Possibilities-ai_types.Rmd +++ b/01d-AI_Possibilities-ai_types.Rmd @@ -7,15 +7,18 @@ ottrpal::set_knitr_image_path() # Types of AI -How they work.. +We've learned a bit about how AI works. However, there are many different types of AI with different combinations of data, algorithms, and interfaces. There are also general terms that are important to know. Let's explore some of these below. - +## Machine Learning +**Machine learning** is a broad concept describing how computers learn from data. It includes traditional methods like decision trees and linear regression, as well as more modern approaches such as deep learning. It involves training models on data (often labeled) to make predictions or to uncover patterns or groupings in the data. 
Machine learning is often the "algorithm" part of our data - algorithm - interface framework. - +## Neural Networks - +**Neural networks** are a specific class of algorithms within the broader field of machine learning. They organize data into layers, including an input layer for data input and an output layer for results, with intermediate layers in between. These layers help neural networks understand hierarchical patterns in data. - +The connections between nodes have weights that the network learns during training. The network can then adjust these weights to minimize errors in predictions. Neural networks often require large amounts of labeled data for training, and their performance may continue to improve with more data. - +```{r, echo=FALSE, fig.alt='CAPTION HERE', out.width = '100%', fig.align = 'center'} +ottrpal::include_slide("https://docs.google.com/presentation/d/1UiYOR_4a68524XsCv-f950n_CfbyNJVez2KdAjq2ltU/edit#slide=id.g2a694e3cce9_0_0") +``` From fc06718acdc805d24a85ec20ecb831b13abd1f59 Mon Sep 17 00:00:00 2001 From: avahoffman Date: Fri, 15 Dec 2023 01:59:10 -0500 Subject: [PATCH 07/10] Add brackets --- 01b-AI_Possibilities-what_is_ai.Rmd | 19 +++++-------------- 1 file changed, 5 insertions(+), 14 deletions(-) diff --git a/01b-AI_Possibilities-what_is_ai.Rmd b/01b-AI_Possibilities-what_is_ai.Rmd index 66584826..adcd5479 100644 --- a/01b-AI_Possibilities-what_is_ai.Rmd +++ b/01b-AI_Possibilities-what_is_ai.Rmd @@ -65,7 +65,7 @@ In this case study, we will look at how artificial intelligence has been utilize There are many uses of AI for improving financial institutions, each with potential benefits and risks. Most financial institutions weigh the benefits and risks carefully before implementation. -For instance, if a financial institution takes a high-risk prediction seriously, such as predicting a financial crisis or a large recession, then it would have huge impact on a bank’s policy and allows the bank to act early. 
However, many financial institutions are hesitant to take action based on artificial intelligence predictions because the prediction is for a high-risk situation. If the prediction is not accurate then there can be severe consequences. Additionally, data on rare events such as financial crises are not abundant, so researchers worry that there is not enough data to train accurate models @nelson2023. +For instance, if a financial institution takes a high-risk prediction seriously, such as predicting a financial crisis or a large recession, then it would have a huge impact on the bank’s policy and allow the bank to act early. However, many financial institutions are hesitant to take action based on artificial intelligence predictions because the prediction is for a high-risk situation. If the prediction is not accurate, then there can be severe consequences. Additionally, data on rare events such as financial crises are not abundant, so researchers worry that there is not enough data to train accurate models [@nelson2023]. Many banks prefer to pilot AI for low-risk, repeated predictions, in which the events are common and there is a lot of data to train the model on. @@ -77,7 +77,7 @@ Let’s look at a few examples that illustrate the potential benefits and risks ottrpal::include_slide("https://docs.google.com/presentation/d/1b8ivojtu3UA0HcACLqcghS300Ia4Wu7iXmgp6KacEJw/edit#slide=id.g2639341f200_0_58") ``` -An important task in analysis of economic data is to classify business by institutional sector. For instance, given 10 million legal entities in the European Union, they need to be classified by financial sector to conduct downstream analysis. In the past, classifying legal entities was curated by expert knowledge @moufakkir2023. +An important task in the analysis of economic data is to classify businesses by institutional sector. For instance, given 10 million legal entities in the European Union, they need to be classified by financial sector to conduct downstream analysis. 
In the past, classifying legal entities was curated by expert knowledge [@moufakkir2023]. Text-based analysis and machine learning classifiers, which are all considered AI models, help reduce this manual curation time. An AI model would extract important keywords and classify into an appropriate financial sector, such as “non-profits”, “small business”, or “government”. This would be a low-risk use of AI, as one could easily validate the result to the true financial sector. @@ -87,8 +87,7 @@ Text-based analysis and machine learning classifiers, which are all considered A ottrpal::include_slide("https://docs.google.com/presentation/d/1b8ivojtu3UA0HcACLqcghS300Ia4Wu7iXmgp6KacEJw/edit#slide=id.g2639341f200_0_70") ``` - -Banks are considering expanding upon existing traditional economic models to bring in a wider data sources, such as pulling in social media feeds as an indicator of public sentiment. The National bank of France has started to use social media information to estimate the public perception of inflation. The Malaysian national bank has started to incorporate new articles into its financial model of gross domestic product estimation. However, the use of these new data sources may may raise questions about government oversight of social media and public domain information @omfif2023. +Banks are considering expanding upon existing traditional economic models to bring in a wider range of data sources, such as pulling in social media feeds as an indicator of public sentiment. The National bank of France has started to use social media information to estimate the public perception of inflation. The Malaysian national bank has started to incorporate new articles into its financial model of gross domestic product estimation. However, the use of these new data sources may raise questions about government oversight of social media and public domain information [@omfif2023]. 
#### Using Large Language Models to predict inflation @@ -96,17 +95,9 @@ Banks are considering expanding upon existing traditional economic models to bri ottrpal::include_slide("https://docs.google.com/presentation/d/1b8ivojtu3UA0HcACLqcghS300Ia4Wu7iXmgp6KacEJw/edit#slide=id.g2639341f200_0_14") ``` -The US Federal Reserve has researched the idea of using pre-trained large language models from Google to make inflation predictions. Usually, inflation is predicted from the Survey of Professional Forecasters, which pools forecasts from a range of financial forecasts and experts. When compared to the true inflation rate, the researchers found that the large language models performed slightly better than the Survey of Professional Forecasters @stlouisfed2023. - -A concern of using pre-trained large language models is that the data sources used for model training are not known, so the financial institution may be using data that is not in line with its policy. Also, a potential risk of using large language models that perform similarly is the convergence of predictions. If large language models make very similar predictions, banks would act similarly and make similar policies, which may lead to financial instability @omfif2023. - - - - - - - +The US Federal Reserve has researched the idea of using pre-trained large language models from Google to make inflation predictions. Usually, inflation is predicted from the Survey of Professional Forecasters, which pools forecasts from a range of financial forecasters and experts. When compared to the true inflation rate, the researchers found that the large language models performed slightly better than the Survey of Professional Forecasters [@stlouisfed2023]. +A concern of using pre-trained large language models is that the data sources used for model training are not known, so the financial institution may be using data that is not in line with its policy.
Also, a potential risk of using large language models that perform similarly is the convergence of predictions. If large language models make very similar predictions, banks would act similarly and make similar policies, which may lead to financial instability [@omfif2023]. ## What Is and Is Not AI From 23bedbeb88d55f17185af876ac40b041793e1f47 Mon Sep 17 00:00:00 2001 From: avahoffman Date: Fri, 15 Dec 2023 01:59:19 -0500 Subject: [PATCH 08/10] Lil reorg --- 01c-AI_Possibilities-how_ai_works.Rmd | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/01c-AI_Possibilities-how_ai_works.Rmd b/01c-AI_Possibilities-how_ai_works.Rmd index 974b2860..8c90a0f0 100644 --- a/01c-AI_Possibilities-how_ai_works.Rmd +++ b/01c-AI_Possibilities-how_ai_works.Rmd @@ -61,12 +61,12 @@ It is **essential** that you and your team think critically about data sources. It's important to remember that AI systems need specific instructions to start detecting patterns. We'll need to take our raw data and indicate which pictures are positive for skin cancer and which aren't. This process is called **labeling** and has to be done by humans. +Once data is labeled, either "cancer" or "not cancer", we can use it to train the algorithm in the next step. This data is aptly called **training data**. + ```{r, echo=FALSE, fig.alt='CAPTION HERE', out.width = '100%', fig.align = 'center'} ottrpal::include_slide("https://docs.google.com/presentation/d/1OydUyEv1uEzn8ONPznxH1mGd4VHC9n88_aUGqkHJX4I/edit#slide=id.g263e06ef889_36_318") ``` -Once data is labeled, either "cancer" or "not cancer", we can use it to train the algorithm in the next step. This data is aptly called **training data**. - ## Understanding the Algorithm Our goal is "detection of possible skin cancer", but how does a computer do that? @@ -93,12 +93,12 @@ After setting up and quantifying the features, we want to make sure the AI is ac 1. If the perimeter of the spot is not close to circular, label the image "cancer". 
+Humans play a big part in what kind of scores are acceptable when producing outputs. With cancer screening, we might be very worried about missing a real instance of cancer. Therefore, we might tell the AI to score false negatives more harshly than false positives. + ```{r, echo=FALSE, fig.alt='CAPTION HERE', out.width = '100%', fig.align = 'center'} ottrpal::include_slide("https://docs.google.com/presentation/d/1OydUyEv1uEzn8ONPznxH1mGd4VHC9n88_aUGqkHJX4I/edit#slide=id.g263e06ef889_36_360") ``` -Humans play a big part in what kind of scores are acceptable when producing outputs. With cancer screening, we might be very worried about missing a real instance of cancer. Therefore, we might tell the AI to score false negatives more harshly than false positives. - ## Interfacing with AI Finally, AI would not work without an interface. This is where we can get creative. In our skin cancer screening, we might create a website where providers or patients could upload a picture of an area that needs screening. From 311cc56fbf624fe1cd0de828d87cbed091d3da0e Mon Sep 17 00:00:00 2001 From: avahoffman Date: Fri, 15 Dec 2023 01:59:26 -0500 Subject: [PATCH 09/10] Better analogy --- 01d-AI_Possibilities-ai_types.Rmd | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/01d-AI_Possibilities-ai_types.Rmd b/01d-AI_Possibilities-ai_types.Rmd index 2ee24781..5b122558 100644 --- a/01d-AI_Possibilities-ai_types.Rmd +++ b/01d-AI_Possibilities-ai_types.Rmd @@ -5,7 +5,7 @@ ottrpal::set_knitr_image_path() # VIDEO Different Types of AI -# Types of AI +# Demystifying Types of AI We've learned a bit about how AI works. However there are many different types of AI with different combinations of data, algorithms, and interfaces. There are also general terms that are important to know. Let's explore some of these below. @@ -15,7 +15,9 @@ We've learned a bit about how AI works. 
However there are many different types o ## Neural Networks -**Neural networks** are a specific class of algorithms within the broader field of machine learning. They organize data into layers, including an input layer for data input and an output layer for results, with intermediate layers in between. These layers help neural networks understand hierarchical patterns in data. +**Neural networks** are a specific class of algorithms within the broader field of machine learning. They organize data into layers, including an input layer for data input and an output layer for results, with intermediate "hidden" layers in between. + +You can think of layers like different teams in an organization. The input layer is in charge of scoping and strategy, the output layer is in charge of finalizing deliverables, while the intermediate layers are responsible for piecing together existing and creating new project materials. These layers help neural networks understand hierarchical patterns in data. The connections between nodes have weights that the network learns during training. The network can then adjust these weights to minimize errors in predictions. Neural networks often require large amounts of labeled data for training, and their performance may continue to improve with more data. From f47cc89dff6c0a66687481cba811aaebfbc2acc2 Mon Sep 17 00:00:00 2001 From: avahoffman Date: Fri, 15 Dec 2023 02:09:34 -0500 Subject: [PATCH 10/10] Add to dictionary --- resources/dictionary.txt | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/resources/dictionary.txt b/resources/dictionary.txt index d3ab8858..a6a65507 100644 --- a/resources/dictionary.txt +++ b/resources/dictionary.txt @@ -49,6 +49,7 @@ ChatGPT's CIO Coursera css +curation cyberattacks cybersecurity DALL @@ -57,7 +58,9 @@ DaSL DaSL's Datatrail DataTrail +deliverables deepfakes +Derma Dockerfile Dockerhub dropdown @@ -74,6 +77,7 @@ GPT HIPAA IDARE impactful +IRB IRBs ITCR itcrtraining