diff --git a/01b-AI_Possibilities-what_is_ai.Rmd b/01b-AI_Possibilities-what_is_ai.Rmd index 9e53509d..adcd5479 100644 --- a/01b-AI_Possibilities-what_is_ai.Rmd +++ b/01b-AI_Possibilities-what_is_ai.Rmd @@ -65,7 +65,7 @@ In this case study, we will look at how artificial intelligence has been utilize There are many uses of AI for improving financial institutions, each with potential benefits and risks. Most financial institutions weigh the benefits and risks carefully before implementation. -For instance, if a financial institution takes a high-risk prediction seriously, such as predicting a financial crisis or a large recession, then it would have huge impact on a bank’s policy and allows the bank to act early. However, many financial institutions are hesitant to take action based on artificial intelligence predictions because the prediction is for a high-risk situation. If the prediction is not accurate then there can be severe consequences. Additionally, data on rare events such as financial crises are not abundant, so researchers worry that there is not enough data to train accurate models @nelson2023. +For instance, if a financial institution takes a high-risk prediction seriously, such as a prediction of a financial crisis or a large recession, then the prediction would have a huge impact on the bank’s policy and allow the bank to act early. However, many financial institutions are hesitant to take action based on artificial intelligence predictions because the prediction is for a high-risk situation. If the prediction is not accurate, then there can be severe consequences. Additionally, data on rare events such as financial crises are not abundant, so researchers worry that there is not enough data to train accurate models [@nelson2023]. Many banks prefer to pilot AI for low-risk, repeated predictions, in which the events are common and there is a lot of data to train the model on. 
@@ -77,7 +77,7 @@ Let’s look at a few examples that illustrate the potential benefits and risks ottrpal::include_slide("https://docs.google.com/presentation/d/1b8ivojtu3UA0HcACLqcghS300Ia4Wu7iXmgp6KacEJw/edit#slide=id.g2639341f200_0_58") ``` -An important task in analysis of economic data is to classify business by institutional sector. For instance, given 10 million legal entities in the European Union, they need to be classified by financial sector to conduct downstream analysis. In the past, classifying legal entities was curated by expert knowledge @moufakkir2023. +An important task in the analysis of economic data is to classify businesses by institutional sector. For instance, the 10 million legal entities in the European Union need to be classified by financial sector before downstream analysis can be conducted. In the past, legal entities were classified manually by experts [@moufakkir2023]. Text-based analysis and machine learning classifiers, which are all considered AI models, help reduce this manual curation time. An AI model would extract important keywords and classify into an appropriate financial sector, such as “non-profits”, “small business”, or “government”. This would be a low-risk use of AI, as one could easily validate the result to the true financial sector. @@ -87,8 +87,7 @@ Text-based analysis and machine learning classifiers, which are all considered A ottrpal::include_slide("https://docs.google.com/presentation/d/1b8ivojtu3UA0HcACLqcghS300Ia4Wu7iXmgp6KacEJw/edit#slide=id.g2639341f200_0_70") ``` - -Banks are considering expanding upon existing traditional economic models to bring in a wider data sources, such as pulling in social media feeds as an indicator of public sentiment. The National bank of France has started to use social media information to estimate the public perception of inflation. The Malaysian national bank has started to incorporate new articles into its financial model of gross domestic product estimation. 
However, the use of these new data sources may may raise questions about government oversight of social media and public domain information @omfif2023. +Banks are considering expanding upon existing traditional economic models to bring in wider data sources, such as pulling in social media feeds as an indicator of public sentiment. The National bank of France has started to use social media information to estimate the public perception of inflation. The Malaysian national bank has started to incorporate new articles into its financial model of gross domestic product estimation. However, the use of these new data sources may raise questions about government oversight of social media and public domain information [@omfif2023]. #### Using Large Language Models to predict inflation @@ -96,17 +95,9 @@ Banks are considering expanding upon existing traditional economic models to bri ottrpal::include_slide("https://docs.google.com/presentation/d/1b8ivojtu3UA0HcACLqcghS300Ia4Wu7iXmgp6KacEJw/edit#slide=id.g2639341f200_0_14") ``` -The US Federal Reserve has researched the idea of using pre-trained large language models from Google to make inflation predictions. Usually, inflation is predicted from the Survey of Professional Forecasters, which pools forecasts from a range of financial forecasts and experts. When compared to the true inflation rate, the researchers found that the large language models performed slightly better than the Survey of Professional Forecasters @stlouisfed2023. - -A concern of using pre-trained large language models is that the data sources used for model training are not known, so the financial institution may be using data that is not in line with its policy. Also, a potential risk of using large language models that perform similarly is the convergence of predictions. If large language models make very similar predictions, banks would act similarly and make similar policies, which may lead to financial instability @omfif2023. 
- - - - - - - +The US Federal Reserve has researched the idea of using pre-trained large language models from Google to make inflation predictions. Usually, inflation is predicted from the Survey of Professional Forecasters, which pools forecasts from a range of financial forecasters and experts. When the researchers compared both sets of predictions to the true inflation rate, they found that the large language models performed slightly better than the Survey of Professional Forecasters [@stlouisfed2023]. +A concern with using pre-trained large language models is that the data sources used for model training are not known, so the financial institution may be using data that is not in line with its policy. Also, a potential risk of using large language models that perform similarly is the convergence of predictions. If large language models make very similar predictions, banks would act similarly and make similar policies, which may lead to financial instability [@omfif2023]. ## What Is and Is Not AI @@ -174,3 +165,7 @@ While the core functionality of speed cameras relies on sensor technology and pr This is considered AI. Social media algorithms, like Instagram's, make recommendations based on user behavior. For example, if you spend a lot of time viewing a page that was recommended, the system interprets that as positive feedback and will make similar recommendations. Typically, these recommendations get better over time as the user generates more user-specific data. You supply data through your behaviors, the algorithm gets trained, and you interact with the suggestions via the app. + +## Summary + +The definition of artificial intelligence (AI) has shifted over time. We use the three-part framework of data, algorithms, and interfaces to describe AI applications. You will need to consider specific technologies and whether they meet the criteria for being classified as AI using this framework. Adaptability and training with new data are key factors to keep in mind as we move further in the course. 
diff --git a/01c-AI_Possibilities-how_ai_works.Rmd b/01c-AI_Possibilities-how_ai_works.Rmd index 83a74671..8c90a0f0 100644 --- a/01c-AI_Possibilities-how_ai_works.Rmd +++ b/01c-AI_Possibilities-how_ai_works.Rmd @@ -5,65 +5,108 @@ ottrpal::set_knitr_image_path() # VIDEO How AI Works +TODO: Slides here: https://docs.google.com/presentation/d/1OydUyEv1uEzn8ONPznxH1mGd4VHC9n88_aUGqkHJX4I/edit#slide=id.g263e06ef889_36_397 + # How AI Works -Let's briefly revisit our definition of AI: it must have data, training via an algorithm, and an interface. How do each of these work? We'll explore below. +Let's briefly revisit our definition of AI: it must have data, algorithm(s), and an interface. Let's dive into each of these in more detail below. + +## Early Warning for Skin Cancer + +Each year in the United States, 6.1 million adults are treated for skin cancer (basal cell and squamous cell carcinomas), at a total cost of nearly $10 billion [@CDC2023]. It is one of the most common forms of cancer in the United States, and mortality from skin cancer is a real concern. Fortunately, early detection through regular screening can increase survival rates to over 95% [@Melarkode2023]. The cost and limited accessibility of screening providers, however, mean that many people aren't getting the preventative care they need. + +Increasingly, AI is being used to flag potential skin cancer. AI focused on skin cancer detection could be used by would-be patients to motivate them to seek a professional opinion, or by clinicians to validate their findings or help with continuous learning. -## The Data Explosion +1. **Data**: Images of skin -Let's say we're driving a car or taking public transportation in a city. We might notice a pattern between the amount of traffic on roads, and the time of day. If you commute once at a specific time of day and observe the traffic around you, you have one data point. You can do this a bunch of times and collect more data. +1. 
**Algorithm**: Detection of possible skin cancer -Historically, this is the way data has been collected, and you could manage that data in an Excel Spreadsheet. However, as computer storage has become cheaper and data collection methods have become more sophisticated, our ability to access data has exploded in scale. It's not hard to imagine that using traffic cameras, dashcams, and car sensors could collect a lot more information than any one person. +1. **Interface**: Web portal or app where you can submit a new picture -Think about how much text information is freely available on the internet! Treating that as input data, AI systems can look for patterns of words that typically go together. For example, you're much more likely to see the phrase "cancer is a disease" than "cancer is a computer program". +## Collecting Data Points + +Let's say a clinician, *Dr. Derma*, is learning how to screen for skin cancer. When Dr. D sees their first instance of skin cancer, they now have one data point. Dr. D could make future diagnoses based on this one data point, but those diagnoses might not be very accurate. Over time, as Dr. D does more screenings of skin with and without cancer, they will get a better and better idea of what skin cancer looks like. This is part of what we do best. Human beings are powerhouses when it comes to pattern recognition and processing [@Mattson2014]. + +Like Dr. D, AI will get better at finding the right patterns with more data. In order to train an AI algorithm to detect possible skin cancer, we'll first want to gather as many pictures of normal and cancerous skin as we can. This is the **raw data** [@Leek2017]. 
```{r, echo=FALSE, fig.alt='CAPTION HERE', out.width = '100%', fig.align = 'center'} -ottrpal::include_slide("https://docs.google.com/presentation/d/1OydUyEv1uEzn8ONPznxH1mGd4VHC9n88_aUGqkHJX4I/edit#slide=id.g2a3877ab699_0_79") +ottrpal::include_slide("https://docs.google.com/presentation/d/1OydUyEv1uEzn8ONPznxH1mGd4VHC9n88_aUGqkHJX4I/edit#slide=id.g263e06ef889_36_153") ``` ### What Is Data -Data comes in many shapes and forms. Data can be **structured**, such as a spreadsheet of times and traffic volume or counts of viral particles in different patients. Data can also be **unstructured**, such as might be found in social media text or genome sequence data. +In our skin cancer screening example, our data is all of the information stored in an image. However, data comes in many shapes and forms. Data can be **structured**, such as a spreadsheet of traffic volume by time of day or counts of viral particles in different patients. Data can also be **unstructured**, such as social media text or genome sequence data. Other kinds of data can be collected and used to train algorithms. These might include survey data collected directly from consumers, medical data collected in a healthcare setting, purchase or transaction tracking, and online tracking of your time on certain web pages [@Cote2022]. +Quantity *and* quality of data are very important. More data makes it easier to detect and account for minor differences among observations. However, that shouldn't come at the cost of quality. It is sometimes better to have fewer high-resolution, high-quality images in our dataset than many images that are blurry, discolored, or in other ways questionable. +