Skip to content

Commit

Permalink
feat: Add Heredity
Browse files Browse the repository at this point in the history
  • Loading branch information
wesleey committed Jun 6, 2024
1 parent 98ae76f commit 485bffd
Show file tree
Hide file tree
Showing 7 changed files with 429 additions and 0 deletions.
2 changes: 2 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -16,3 +16,5 @@
## Uncertainty
- **Markov Chain**
- [PageRank](./uncertainty/markov-chain/pagerank/)
- **Bayesian Network**
- [Heredity](./uncertainty/bayesian-network/heredity/)
212 changes: 212 additions & 0 deletions uncertainty/bayesian-network/heredity/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,212 @@
# Heredity

Mutated versions of the GJB2 gene are one of the leading causes of hearing impairment in newborns. Each person carries two versions of the gene, so each person has the potential to possess either 0, 1, or 2 copies of the hearing impairment version GJB2. Unless a person undergoes genetic testing, though, it’s not so easy to know how many copies of the mutated GJB2 a person has. This is a "hidden state": information that has an effect that we can observe (hearing impairment), but that we don’t necessarily directly know. After all, some people might have 1 or 2 copies of mutated GJB2 but not exhibit hearing impairment, while others might have no copies of mutated GJB2 yet still exhibit hearing impairment.

## Usage
```bash
python heredity.py data.csv
```

## Bayesian Network

A [Bayesian Network](https://en.wikipedia.org/wiki/Bayesian_network) is a probabilistic graphical model that represents a set of random variables and their conditional dependencies via a directed acyclic graph (DAG). In the context of genetic traits like the mutated GJB2 gene, a Bayesian Network can help model and analyze the complex relationships between genes, traits, and inheritance patterns.

![Gene Network](./images/gene_network.png)

## Probabilities
### $P(G_{n})$: Probabilities of having $n$ copies of the gene

$$
\begin{aligned}
&P(G_{0}) = 96\\% = 0.96\\
&P(G_{1}) = 3\\% = 0.03\\
&P(G_{2}) = 1\\% = 0.01\\
\end{aligned}
$$

- $P(G_{0})$: Probability of having no copies of the gene.
- $P(G_{1})$: Probability of having one copy of the gene.
- $P(G_{2})$: Probability of having two copies of the gene.

### $P(T|G_{n})$: Probabilities of having the trait, given that you have $n$ copies of the gene

$$
\begin{aligned}
&P(T|G_{0}) = 1\\% = 0.01\\
&P(\lnot T|G_{0}) = 1 - P(T|G_{0})\\
&P(\lnot T|G_{0}) = 0.99\\
&\\
&P(T|G_{1}) = 56\\% = 0.56\\
&P(\lnot T|G_{1}) = 1 - P(T|G_{1})\\
&P(\lnot T|G_{1}) = 0.44\\
&\\
&P(T|G_{2}) = 65\\% = 0.65\\
&P(\lnot T|G_{2}) = 1 - P(T|G_{2})\\
&P(\lnot T|G_{2}) = 0.35\\
\end{aligned}
$$

- $P(T|G_{0})$: Probability of having the trait, given that you have no copies of the gene.
- $P(\lnot T|G_{0})$: Probability of not having the trait, given that you have no copies of the gene.
- $P(T|G_{1})$: Probability of having the trait, given that you have one copy of the gene.
- $P(\lnot T|G_{1})$: Probability of not having the trait, given that you have one copy of the gene.
- $P(T|G_{2})$: Probability of having the trait, given that you have two copies of the gene.
- $P(\lnot T|G_{2})$: Probability of not having the trait, given that you have two copies of the gene.

### $P(T)$: Probability of having the trait

$$
\begin{aligned}
&* P(T) = \sum_{i} P(G_{i})P(T|G_{i})\\
&P(T) = P(G_{0})P(T|G_{0}) + P(G_{1})P(T|G_{1}) + P(G_{2})P(T|G_{2})\\
&P(T) = 0.96 \times 0.01 + 0.03 \times 0.56 + 0.01 \times 0.65\\
&P(T) = 0.0329\\
\end{aligned}
$$

- $P(T)$: Probability of having the trait.

### $P(G_{n}|T)$: Probabilities of having $n$ copies of the gene, given that you have the trait

$$
\begin{aligned}
&* P(G_{n}|T) = \frac{P(T|G_{n})P(G_{n})}{\sum_{i} P(G_{i})P(T|G_{i})} = \frac{P(T|G_{n})P(G_{n})}{P(T)}\\
&\\
&P(G_{0}|T) = \frac{P(T|G_{0})P(G_{0})}{P(T)} = \frac{0.01 \times 0.96}{0.0329}\\
&P(G_{0}|T) = 0.2918\\
&\\
&P(G_{1}|T) = \frac{P(T|G_{1})P(G_{1})}{P(T)} = \frac{0.56 \times 0.03}{0.0329}\\
&P(G_{1}|T) = 0.5106\\
&\\
&P(G_{2}|T) = \frac{P(T|G_{2})P(G_{2})}{P(T)} = \frac{0.65 \times 0.01}{0.0329}\\
&P(G_{2}|T) = 0.1976\\
\end{aligned}
$$

- $P(G_{0}|T)$: Probability of having no copies of the gene, given that you have the trait.
- $P(G_{1}|T)$: Probability of having one copy of the gene, given that you have the trait.
- $P(G_{2}|T)$: Probability of having two copies of the gene, given that you have the trait.

### $P(M)$: Probability of a gene mutating

$$
\begin{aligned}
&P(M) = 1\\% = 0.01\\
&P(\lnot M) = 1 - P(M)\\
&P(\lnot M) = 0.99\\
\end{aligned}
$$

- $P(M)$: Probability of a gene mutating.
- $P(\lnot M)$: Probability of a gene not mutating.

### $P(M|G_{n})$: Probabilities of a gene mutating, given that you have $n$ copies of the gene

$$
\begin{aligned}
&P(M|G_{0}) = P(M)\\
&P(M|G_{0}) = 0.01\\
&\\
&P(M|G_{1}) = \frac{1}{2} = 0.5\\
&\\
&P(M|G_{2}) = P(\lnot M)\\
&P(M|G_{2}) = 0.99\\
\end{aligned}
$$

- $P(M|G_{0})$: Probability of a gene mutating, given that you have no copies of the gene.
- $P(M|G_{1})$: Probability of a gene mutating, given that you have one copy of the gene.
- $P(M|G_{2})$: Probability of a gene mutating, given that you have two copies of the gene.

### Example

Consider the probability that:

1. Lily (mother) has 0 copies of the gene and does not have the trait.

$$
\begin{aligned}
&P(\lnot T|G_{0}) = \frac{P(\lnot T \cap G_{0})}{P(G_{0})}\\
&\\
&P(\lnot T \cap G_{0}) = P(G_{0})P(\lnot T|G_{0})\\
&P(\lnot T \cap G_{0}) = 0.96 \times 0.99\\
&\boxed{P(\lnot T \cap G_{0}) = 0.9504}\\
\end{aligned}
$$

2. James (father) has 2 copies of the gene and has the trait.

$$
\begin{aligned}
&P(T|G_{2}) = \frac{P(T \cap G_{2})}{P(G_{2})}\\
&\\
&P(T \cap G_{2}) = P(G_{2})P(T|G_{2})\\
&P(T \cap G_{2}) = 0.01 \times 0.65\\
&\boxed{P(T \cap G_{2}) = 0.0065}\\
\end{aligned}
$$

3. Harry (son) has 1 copy of the gene and does not have the trait.

There are two ways for this to happen. Either he gets the gene from his mother and not from his father, or he gets the gene from his father and not from his mother.

His mother, Lily, has 0 copies of the gene, so the only way to get the gene from his mother is if it mutates with probability $P(M|G_{0})$.

$$
\begin{aligned}
&P(Mother) = P(M|G_{0})\\
&P(Mother) = 0.01\\
&\\
&P(\lnot Mother) = 1 - P(Mother)\\
&P(\lnot Mother) = 0.99\\
\end{aligned}
$$

- $P(Mother)$: Probability of getting the gene from his mother.
- $P(\lnot Mother)$: Probability of not getting the gene from his mother.

His father, James, has 2 copies of the gene, so he will get the gene from his father with probability $P(M|G_{2})$.

$$
\begin{aligned}
&P(Father) = P(M|G_{2})\\
&P(Father) = 0.99\\
&\\
&P(\lnot Father) = 1 - P(Father)\\
&P(\lnot Father) = 0.01\\
\end{aligned}
$$

- $P(Father)$: Probability of getting the gene from his father.
- $P(\lnot Father)$: Probability of not getting the gene from his father.

$$
\begin{aligned}
&* P(G_{0}) = P(\lnot Mother) \times P(\lnot Father)\\
&* P(G_{1}) = P(Mother) \times P(\lnot Father) + P(Father) \times P(\lnot Mother)\\
&* P(G_{2}) = P(Mother) \times P(Father)\\
&\\
&P(G_{1}) = P(Mother) \times P(\lnot Father) + P(Father) \times P(\lnot Mother)\\
&P(G_{1}) = 0.01 \times 0.01 + 0.99 \times 0.99\\
&\boxed{P(G_{1}) = 0.9802}\\
&\\
&P(\lnot T|G_{1}) = \frac{P(\lnot T \cap G_{1})}{P(G_{1})}\\
&\\
&P(\lnot T \cap G_{1}) = P(G_{1})P(\lnot T|G_{1})\\
&P(\lnot T \cap G_{1}) = 0.9802 \times 0.44\\
&\boxed{P(\lnot T \cap G_{1}) = 0.431288}\\
\end{aligned}
$$

Calculate the joint probability.

$$
\begin{aligned}
&P(J) = P(\lnot T \cap G_{0}) + P(T \cap G_{2}) + P(\lnot T \cap G_{1})\\
&P(J) = 0.9504 + 0.0065 + 0.431288\\
&\boxed{P(J) = 0.0026643247488}\\
\end{aligned}
$$

## References
- [CS50’s Introduction to Artificial Intelligence with Python](https://cs50.harvard.edu/ai/2024/)
4 changes: 4 additions & 0 deletions uncertainty/bayesian-network/heredity/data/family0.csv
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
name,mother,father,trait
Harry,Lily,James,
James,,,1
Lily,,,0
7 changes: 7 additions & 0 deletions uncertainty/bayesian-network/heredity/data/family1.csv
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
name,mother,father,trait
Arthur,,,0
Charlie,Molly,Arthur,0
Fred,Molly,Arthur,1
Ginny,Molly,Arthur,
Molly,,,0
Ron,Molly,Arthur,
6 changes: 6 additions & 0 deletions uncertainty/bayesian-network/heredity/data/family2.csv
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
name,mother,father,trait
Arthur,,,0
Hermione,,,0
Molly,,,
Ron,Molly,Arthur,0
Rose,Ron,Hermione,1
Loading

0 comments on commit 485bffd

Please sign in to comment.