
Default config file for decennial census with missing data noise function (#13)

Default configuration for initial release

Updates the default configuration for the initial release: the decennial census form is configured with the missing data noise function.
- *Category*: Feature
- *JIRA issue*: [MIC-3925](https://jira.ihme.washington.edu/browse/MIC-3925)

- Updates default

Testing
Ran generate_decennial_census successfully; the columns required to experience noise have the configured percentage of noise applied to them.
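The check described under Testing can be illustrated with a small sketch. It assumes that missing-data noise blanks each value in a configured column independently with probability `row_noise_level` (0.01 in this configuration) and that a missing value is represented as an empty string. `apply_missing_data` and `missing_fraction` are hypothetical helpers written for illustration; they are not pseudopeople's implementation behind `generate_decennial_census`.

```python
import random
from typing import Dict, List


def apply_missing_data(
    rows: List[Dict[str, str]], noise_levels: Dict[str, float], seed: int = 0
) -> List[Dict[str, str]]:
    """Return a noised copy of `rows`; the input list is left unchanged.

    `noise_levels` maps a column name to its row_noise_level, assumed here to be
    the probability that a given row's value in that column is blanked.
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    noised = []
    for row in rows:
        new_row = dict(row)  # shallow copy so the caller's rows are not mutated
        for column, level in noise_levels.items():
            if rng.random() < level:  # independent Bernoulli draw per row and column
                new_row[column] = ""  # assumption: missing is an empty string
        noised.append(new_row)
    return noised


def missing_fraction(rows: List[Dict[str, str]], column: str) -> float:
    """Share of rows whose value in `column` is empty."""
    return sum(1 for row in rows if row[column] == "") / len(rows)


if __name__ == "__main__":
    # Hypothetical data: 50,000 identical records, two columns configured at 0.01.
    rows = [{"first_name": "Ana", "last_name": "Lee", "age": "34"} for _ in range(50_000)]
    noised = apply_missing_data(rows, {"first_name": 0.01, "last_name": 0.01})
    for column in ("first_name", "last_name", "age"):
        print(column, round(missing_fraction(noised, column), 4))
```

Because the draws are independent, the observed share only approximates the configured level (roughly 1% for the configured columns, 0 for the unconfigured `age` column), with sampling error shrinking as the row count grows. An implementation that needs an exact count would sample row indices instead, for example with `random.sample`.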
albrja authored Mar 23, 2023
1 parent 61c25bd commit efdc3ef
Showing 1 changed file with 15 additions and 136 deletions.
src/pseudopeople/default_configuration.yaml: 151 changes (15 additions, 136 deletions)
@@ -7,169 +7,48 @@ decennial_census:
omission: 0.0145
duplication: 0.05
first_name:
nickname:
row_noise_level: 0.01
fake_names:
row_noise_level: 0.01
missing_data:
row_noise_level: 0.01
phonetic:
row_noise_level: 0.01
token_noise_level: 0.1
ocr:
middle_initial:
missing_data:
row_noise_level: 0.01
token_noise_level: 0.1
typographic:
last_name:
missing_data:
row_noise_level: 0.01
token_noise_level: 0.1
age:
missing_data:
row_noise_level: 0.01
ocr:
row_noise_level: 0.01
token_noise_level: 0.1
typographic:
row_noise_level: 0.01
token_noise_level: 0.1
age_miswriting:
row_noise_level: 0.01
age_miswriting: [1, -1]
zipcode:
date_of_birth:
missing_data:
row_noise_level: 0.01
typographic:
row_noise_level: 0.01
token_noise_level: 0.1
zipcode_miswriting:
row_noise_level: 0.01
zipcode_miswriting: [0.04, 0.04, 0.2, 0.36, 0.36]

american_communities_survey:
omission: 0.0145
duplication: 0.05
first_name:
nickname:
row_noise_level: 0.01
fake_names:
row_noise_level: 0.01
street_number:
missing_data:
row_noise_level: 0.01
phonetic:
row_noise_level: 0.01
token_noise_level: 0.1
ocr:
row_noise_level: 0.01
token_noise_level: 0.1
typographic:
row_noise_level: 0.01
token_noise_level: 0.1
age:
street_name:
missing_data:
row_noise_level: 0.01
ocr:
row_noise_level: 0.01
token_noise_level: 0.1
typographic:
row_noise_level: 0.01
token_noise_level: 0.1
age_miswriting:
row_noise_level: 0.01
age_miswriting: [1, -1]
zipcode:
unit_number:
missing_data:
row_noise_level: 0.01
typographic:
row_noise_level: 0.01
token_noise_level: 0.1
zipcode_miswriting:
row_noise_level: 0.01
zipcode_miswriting: [0.04, 0.04, 0.2, 0.36, 0.36]

current_population_survey:
omission: 0.2905
duplication: 0.05
first_name:
nickname:
row_noise_level: 0.01
fake_names:
row_noise_level: 0.01
city:
missing_data:
row_noise_level: 0.01
phonetic:
row_noise_level: 0.01
token_noise_level: 0.1
ocr:
row_noise_level: 0.01
token_noise_level: 0.1
typographic:
row_noise_level: 0.01
token_noise_level: 0.1
age:
state:
missing_data:
row_noise_level: 0.01
ocr:
row_noise_level: 0.01
token_noise_level: 0.1
typographic:
row_noise_level: 0.01
token_noise_level: 0.1
age_miswriting:
row_noise_level: 0.01
age_miswriting: [1, -1]
zipcode:
missing_data:
row_noise_level: 0.01
typographic:
row_noise_level: 0.01
token_noise_level: 0.1
zipcode_miswriting:
row_noise_level: 0.01
zipcode_miswriting: [0.04, 0.04, 0.2, 0.36, 0.36]

women_infants_and_children:
omission: 0.0
duplication: 0.05
first_name:
nickname:
row_noise_level: 0.01
fake_names:
row_noise_level: 0.01
relation_to_household_head:
missing_data:
row_noise_level: 0.01
phonetic:
row_noise_level: 0.01
token_noise_level: 0.1
ocr:
row_noise_level: 0.01
token_noise_level: 0.1
typographic:
row_noise_level: 0.01
token_noise_level: 0.1
age:
sex:
missing_data:
row_noise_level: 0.01
ocr:
row_noise_level: 0.01
token_noise_level: 0.1
typographic:
row_noise_level: 0.01
token_noise_level: 0.1
age_miswriting:
row_noise_level: 0.01
age_miswriting: [1, -1]
zipcode:
race_ethnicity:
missing_data:
row_noise_level: 0.01
typographic:
row_noise_level: 0.01
token_noise_level: 0.1
zipcode_miswriting:
housing_type:
missing_data:
row_noise_level: 0.01
zipcode_miswriting: [0.04, 0.04, 0.2, 0.36, 0.36]

# TODO: add the rest of observers/forms with RT input
#social_security:
#
#taxes_w2_and_1099:
#
#taxes_1040:
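The configuration above groups noise settings by form: form-level rates such as `omission` and `duplication` sit directly under the form name, and each column maps noise types (`missing_data`, `typographic`, `zipcode_miswriting`, ...) to their parameters. A minimal Python sketch of walking that structure, assuming the nesting form → column → noise type → parameters and assuming the YAML has been loaded into nested dicts (the shape PyYAML's `yaml.safe_load` returns). `CONFIG`, `form_level_noise`, and `column_noise` are hypothetical names, not pseudopeople functions, and `CONFIG` is an illustrative excerpt built from values in the diff, not the full file.

```python
from typing import Any, Dict, Iterator, Tuple

# Illustrative excerpt only: keys and values come from the decennial_census lines
# in the diff; the nesting (form -> column -> noise type -> params) is assumed.
CONFIG: Dict[str, Any] = {
    "decennial_census": {
        "omission": 0.0145,
        "duplication": 0.05,
        "first_name": {"missing_data": {"row_noise_level": 0.01}},
        "zipcode": {
            "typographic": {"row_noise_level": 0.01, "token_noise_level": 0.1},
            "zipcode_miswriting": {
                "row_noise_level": 0.01,
                "zipcode_miswriting": [0.04, 0.04, 0.2, 0.36, 0.36],
            },
        },
    },
}


def form_level_noise(config: Dict[str, Any], form: str) -> Dict[str, Any]:
    """Scalar entries directly under a form, e.g. omission and duplication."""
    return {key: value for key, value in config[form].items() if not isinstance(value, dict)}


def column_noise(
    config: Dict[str, Any], form: str
) -> Iterator[Tuple[str, str, Dict[str, Any]]]:
    """Yield (column, noise_type, params) for each noise configured on a column."""
    for column, noises in config[form].items():
        if not isinstance(noises, dict):
            continue  # form-level scalars are handled by form_level_noise
        for noise_type, params in noises.items():
            yield column, noise_type, params


for column, noise_type, params in column_noise(CONFIG, "decennial_census"):
    print(f"{column}.{noise_type}: {params}")
```

Treating any non-mapping value under a form as a form-level setting keeps the walker independent of which column names a form defines, so the same code would also handle `american_communities_survey` or the commented-out forms once they are filled in.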
