From 418b4b49bad540ea484a9136f49d11b8cf28e047 Mon Sep 17 00:00:00 2001
From: crccheck
Date: Fri, 18 Jul 2014 16:25:01 -0500
Subject: [PATCH 01/11] add intro to intro to ipeds grafs

---
 datasets/ipeds.md | 36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)
 create mode 100644 datasets/ipeds.md

diff --git a/datasets/ipeds.md b/datasets/ipeds.md
new file mode 100644
index 0000000..a4ce0fa
--- /dev/null
+++ b/datasets/ipeds.md
@@ -0,0 +1,36 @@
+IPEDS Data Center
+=================
+
+The Integrated Postsecondary Education Data System (IPEDS) is a Federal
+clearinghouse for higher education data run by the [National Center for
+Education Statistics] which is run by the [Institute of Education Sciences]
+which is run by the [Department of Education] which is run by the Illuminati.
+
+If you're looking for higher education data, and not sure where to look, IPEDS
+is a good place to start.
+
+  [National Center for Education Statistics]: http://nces.ed.gov/
+  [Institute of Education Sciences]: http://ies.ed.gov/
+  [Department of Education]: http://www.ed.gov/
+
+
+### What institutions are in IPEDS?
+
+Short answer is... any institution that wants federal dollars is in IPEDS.
+Schools with multiple campuses usually have a separate entry for every campus,
+but sometimes they'll be lumped together into one IPEDS id. In addition to the
+IPEDS id, you may want to find the FICE code (this was used historically and
+still the primary ID reports reference) and the newer OPE id (only Title IV
+schools have this).
+
+
+### The Glossary
+
+Navigating the system can be overwhelming. To help understand all the jargon,
+there is a central [glossary]. When you generate reports, terms are also linked
+to this glossary. For example, to calculate the full-time equivalent (FTE)
+student headcount, one part-time undergrad student at a 4-year institution only
+counts as 0.403543 person if they go to a public school, but 0.392857 parts of
+a person if they go to a private school.
+
+  [glossary]: http://nces.ed.gov/ipeds/glossary/

From d48a5b01f1c678dd437cdb645ef3e1433664bb92 Mon Sep 17 00:00:00 2001
From: crccheck
Date: Fri, 18 Jul 2014 17:03:12 -0500
Subject: [PATCH 02/11] picking institutions

---
 datasets/ipeds.md | 28 ++++++++++++++++++++++++++--
 1 file changed, 26 insertions(+), 2 deletions(-)

diff --git a/datasets/ipeds.md b/datasets/ipeds.md
index a4ce0fa..06315f0 100644
--- a/datasets/ipeds.md
+++ b/datasets/ipeds.md
@@ -20,8 +20,8 @@ Short answer is... any institution that wants federal dollars is in IPEDS.
 Schools with multiple campuses usually have a separate entry for every campus,
 but sometimes they'll be lumped together into one IPEDS id. In addition to the
 IPEDS id, you may want to find the FICE code (this was used historically and
-still the primary ID reports reference) and the newer OPE id (only Title IV
-schools have this).
+still the primary ID many reports reference) and the newer OPE id (only Title
+IV schools have this).
 
 
 ### The Glossary
@@ -34,3 +34,27 @@ counts as 0.403543 person if they go to a public school, but 0.392857 parts of
 a person if they go to a private school.
 
   [glossary]: http://nces.ed.gov/ipeds/glossary/
+
+
+### Creating an IPEDS Data Center Account
+
+While having an account isn't required, it is helpful. Signing up is free. Some
+features are behind a login wall, and so is early access to provisional release
+data.
+
+
+Compare Individual Institutions
+-------------------------------
+
+If you're doing research, you're probably going to want to pull a lot of
+variables about a lot of institutions.
+
+### Picking Institutions
+
+The easiest way to pick institutions is to have files with the IPEDS ids
+separated by commas handy. IPEDS also has a tool called "Create/Download an
+institution group" *(login wall)* that lets you download the output of the
+institution selection process. Luckily, the files IPEDS hands out are in plain
+text and easy to manipulate in your favorite text editor/excel. The *.uid files
+are a pipe (`|`) separated file format with the columns: ID, Institution Name,
+City, and State. But you only need the ID column.

From a0819eed5facd7c109cec5030cd39f9c977de2d9 Mon Sep 17 00:00:00 2001
From: crccheck
Date: Fri, 18 Jul 2014 17:31:57 -0500
Subject: [PATCH 03/11] add notes about some data gotchas

---
 datasets/ipeds.md | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/datasets/ipeds.md b/datasets/ipeds.md
index 06315f0..2591024 100644
--- a/datasets/ipeds.md
+++ b/datasets/ipeds.md
@@ -9,11 +9,32 @@ which is run by the [Department of Education] which is run by the Illuminati.
 If you're looking for higher education data, and not sure where to look, IPEDS
 is a good place to start.
 
+For a better intro to IPEDS, you should read the [wikipedia page]. This
+document is more of a crash course written by an amateur.
+
   [National Center for Education Statistics]: http://nces.ed.gov/
   [Institute of Education Sciences]: http://ies.ed.gov/
   [Department of Education]: http://www.ed.gov/
 
 
+Data Gotchas
+------------
+
+- **Quality**: It is important to understand that IPEDS collects and
+  distributes data, but does not vet it. For example, I noticed that for one
+  institution, SAT scores were dramatically different one year, and for another
+  institution, their graduation rate was 0% for one year.
+- **Shifting Definitions**: You'll quickly find that getting data over several
+  years is difficult. Many variables change definition over the years. A good
+  example is the race/ethnicity variables, which changed in 2008 and again in
+  2010.
+- **Different Definitions**: When comparing numbers from IPEDS data to a report
+  from another source, the numbers may be different. For example, the full-time
+  equivalent enrollment is often cited, but it's a number derived by a formula,
+  and not everyone uses the same formula. It's important to note when you're
+  using a derived variable.
+
+
 ### What institutions are in IPEDS?
 
 Short answer is... any institution that wants federal dollars is in IPEDS.

From cd9b3145ba2f3cadfb7a784717ea590aa2dd78fd Mon Sep 17 00:00:00 2001
From: crccheck
Date: Fri, 18 Jul 2014 18:24:12 -0500
Subject: [PATCH 04/11] add intro to ipeds variables

---
 datasets/ipeds.md | 49 ++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 48 insertions(+), 1 deletion(-)

diff --git a/datasets/ipeds.md b/datasets/ipeds.md
index 2591024..99fbef0 100644
--- a/datasets/ipeds.md
+++ b/datasets/ipeds.md
@@ -68,7 +68,10 @@ Compare Individual Institutions
 -------------------------------
 
 If you're doing research, you're probably going to want to pull a lot of
-variables about a lot of institutions.
+variables about a lot of institutions. You should keep datafiles (`*.uid` for
+institutions, and `*.mvl` for variables) offline so you can repeatedly generate
+the same reports quickly. Don't be tempted to pick some institutions, pick some
+variables, and then run away.
 
 ### Picking Institutions
 
@@ -79,3 +82,47 @@ institution selection process. Luckily, the files IPEDS hands out are in plain
 text and easy to manipulate in your favorite text editor/excel. The *.uid files
 are a pipe (`|`) separated file format with the columns: ID, Institution Name,
 City, and State. But you only need the ID column.
+
+### Picking Variables
+
+This is similar to picking institutions. Use the "Create/Download a list of
+variables" interface to pick a small number of related variables and save them
+to a MVL file.
+
+#### The Structure of the MVL File
+
+The MVL file is a pipe (`|`) separated text file. There's a code, short name,
+category, long name, and then a bunch of other fields. So if you saw these
+lines:
+
+    DRVEF2009_RV|DVEF01|Fall enrollment/retention rates|Adult age (25-64) enrollment, all students|||||||||||||09.23401|.|Cont|TN|N||745|||||
+    DRVEF2010|DVEF01|Fall enrollment/retention rates|Adult age (25-64) enrollment, all students|||||||||||||10.23401|.|Cont|TN|N||745|||||
+
+You could interpret it as:
+
+* `DRVEF2009_RV` - `DRV` prefix means this is a **derived variable**. You'll
+  notice it references a specific year. The `_RV` suffix means this variable
+  was revised. This is the unique name for this variable for this year.
+* `DVEF01` - This is a short name, like a slug, that describes this variable.
+  See how both lines share the same short name, category, and long
+  name?
+* `Fall enrollment/retention rates` - This is the category you'll find this
+  variable in.
+* `Adult age (25-64) enrollment, all students` - This is a long name for this
+  variable. You can often find a long description in the [glossary].
+  For example, here's the [glossary entry for Fall Enrollment
+  (EF)](http://nces.ed.gov/ipeds/glossary/index.asp?id=802) and [Retention
+  rate](http://nces.ed.gov/ipeds/glossary/?charindex=R).
+* the rest... I don't know.
+
+### Generating Reports
+
+_TODO_ what settings do you pick?
+
+### Reading the Reports
+
+_TODO_ expect tons of rows, need to build a parser just for the reports
+
+### How to Update Your Old Reports
+
+_TODO_

From a99be58bafdc04cfd458f1ac61cabac8e91d5c52 Mon Sep 17 00:00:00 2001
From: crccheck
Date: Sat, 19 Jul 2014 12:53:38 -0500
Subject: [PATCH 05/11] document how variables are unique for every year

---
 datasets/ipeds.md | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/datasets/ipeds.md b/datasets/ipeds.md
index 99fbef0..2f8a3e3 100644
--- a/datasets/ipeds.md
+++ b/datasets/ipeds.md
@@ -89,6 +89,19 @@ This is similar to picking institutions. Use the "Create/Download a list of
 variables" interface to pick a small number of related variables and save them
 to a MVL file.
 
+Variables are unique for every year. There's not one variable for "retention
+rates". There's one variable for "retention rates 2012" and another for
+"retention rates 2011" and another for 2010 and so on. When you're planning
+what to get, it's helpful to go in two passes. Once to get a survey of the
+general variables you want to use, and again to get the specific variable name
+for every year.
+
+There are so many variables that you should have your own local copy so you can
+quickly generate MVL files on demand without having to go through the slower
+IPEDS web interface. There's even a [Django
+app](https://github.com/texastribune/ipeds_reporter) just for working with
+variables.
+
 #### The Structure of the MVL File
 
 The MVL file is a pipe (`|`) separated text file. There's a code, short name,
@@ -101,8 +114,9 @@ lines:
 You could interpret it as:
 
 * `DRVEF2009_RV` - `DRV` prefix means this is a **derived variable**. You'll
-  notice it references a specific year. The `_RV` suffix means this variable
-  was revised. This is the unique name for this variable for this year.
+  notice it references a specific year. `EF` represents the general category.
+  `2009` is the academic year. The `_RV` suffix means this variable was
+  revised. This is the unique name for this variable for this year.
 * `DVEF01` - This is a short name, like a slug, that describes this variable.
   See how both lines share the same short name, category, and long
   name?
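
The MVL layout and the variable-code breakdown documented in the two patches above can be sketched in Python roughly as follows. This is a minimal illustration, not an official IPEDS grammar: only the first four fields are described in the notes, so the rest are ignored, and the regular expression simply encodes the `DRV` prefix / category / year / `_RV` suffix reading given above. The names `Variable`, `CodeParts`, `parse_mvl_line`, and `parse_code` are hypothetical helpers invented for this example.

```python
import re
from typing import NamedTuple, Optional


class Variable(NamedTuple):
    code: str        # unique per-year name, e.g. DRVEF2009_RV
    short_name: str  # slug shared across years, e.g. DVEF01
    category: str
    long_name: str


class CodeParts(NamedTuple):
    derived: bool    # True when the code starts with DRV
    category: str    # e.g. EF
    year: int        # e.g. 2009
    revised: bool    # True when the code ends with _RV


# Assumed pattern: optional DRV prefix, letters for the category,
# a four-digit year, optional _RV suffix. Codes that do not fit are skipped.
CODE_RE = re.compile(r"^(DRV)?([A-Z]+?)(\d{4})(_RV)?$")


def parse_mvl_line(line: str) -> Variable:
    """Split one pipe-separated MVL line and keep the four documented fields."""
    fields = line.rstrip("\n").split("|")
    return Variable(*fields[:4])


def parse_code(code: str) -> Optional[CodeParts]:
    """Break a variable code into its parts, or return None if it does not fit."""
    match = CODE_RE.match(code)
    if match is None:
        return None
    prefix, category, year, suffix = match.groups()
    return CodeParts(prefix is not None, category, int(year), suffix is not None)
```

Keeping the short name separate from the per-year code is what makes the "two passes" approach workable: the short name identifies the metric, and the parsed year tells you which column of a multi-year report it belongs to.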

From d50a29ba49de4c858f020c6a76e762a23e522ea4 Mon Sep 17 00:00:00 2001
From: crccheck
Date: Sat, 19 Jul 2014 23:48:56 -0500
Subject: [PATCH 06/11] add basics to generating reports

---
 datasets/ipeds.md | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/datasets/ipeds.md b/datasets/ipeds.md
index 2f8a3e3..ddf6b33 100644
--- a/datasets/ipeds.md
+++ b/datasets/ipeds.md
@@ -131,7 +131,20 @@ You could interpret it as:
 
 ### Generating Reports
 
-_TODO_ what settings do you pick?
+The reason why you'll want to quickly make your own MVL files is that when you
+use "Compare Individual Institutions", you can only use 250 variables at a
+time. If you're just trying to get standardized test scores over a few years,
+you won't have room for any more variables. Your life will be a lot easier if
+you do at most one metric per report.
+
+1. Select Institutions - Use the `UID` file or comma separated IDs you saved
+   earlier
+2. Select Variables - Use the `MVL` file you saved earlier or generated
+   yourself
+3. Output - To get the format that's easiest for a program to parse, choose
+   "Both Institution name and UnitID" and "Download in comma separated format".
+   You can go back and choose different settings later easily because you have
+   your `UID` and `MVL` files saved.
 
 ### Reading the Reports
 

From 817db95cb6091dddaf8aa78d619f0de04c56b804 Mon Sep 17 00:00:00 2001
From: crccheck
Date: Tue, 22 Jul 2014 10:36:38 -0500
Subject: [PATCH 07/11] you should use the 'short variable name' in reports

---
 datasets/ipeds.md | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/datasets/ipeds.md b/datasets/ipeds.md
index ddf6b33..41a7a6c 100644
--- a/datasets/ipeds.md
+++ b/datasets/ipeds.md
@@ -117,14 +117,14 @@ You could interpret it as:
   notice it references a specific year. `EF` represents the general category.
   `2009` is the academic year. The `_RV` suffix means this variable was
   revised. This is the unique name for this variable for this year.
-* `DVEF01` - This is a short name, like a slug, that describes this variable.
-  See how both lines share the same short name, category, and long
-  name?
+* `DVEF01` - This is the short variable name, like a slug, that describes this
+  variable. See how both lines share the same short name, category,
+  and long name?
 * `Fall enrollment/retention rates` - This is the category you'll find this
   variable in.
-* `Adult age (25-64) enrollment, all students` - This is a long name for this
-  variable. You can often find a long description in the [glossary].
-  For example, here's the [glossary entry for Fall Enrollment
+* `Adult age (25-64) enrollment, all students` - This is the long variable name
+  for this variable. You can often find a long description in the
+  [glossary]. For example, here's the [glossary entry for Fall Enrollment
   (EF)](http://nces.ed.gov/ipeds/glossary/index.asp?id=802) and [Retention
   rate](http://nces.ed.gov/ipeds/glossary/?charindex=R).
 * the rest... I don't know.
@@ -144,7 +144,9 @@ you do at most one metric per report.
 3. Output - To get the format that's easiest for a program to parse, choose
    "Both Institution name and UnitID" and "Download in comma separated format".
    You can go back and choose different settings later easily because you have
-   your `UID` and `MVL` files saved.
+   your `UID` and `MVL` files saved. Using the "Short variable name" will make
+   the CSV file easier to process. You can get the long variable name by cross-
+   referencing from a MVL file.
 
 ### Reading the Reports
 

From 934aa6a122f3ce74b334106ac3ec4a3d8eba7892 Mon Sep 17 00:00:00 2001
From: crccheck
Date: Tue, 22 Jul 2014 11:07:50 -0500
Subject: [PATCH 08/11] add tip for making reports over years

---
 datasets/ipeds.md | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/datasets/ipeds.md b/datasets/ipeds.md
index 41a7a6c..844f801 100644
--- a/datasets/ipeds.md
+++ b/datasets/ipeds.md
@@ -150,7 +150,13 @@ you do at most one metric per report.
 
 ### Reading the Reports
 
-_TODO_ expect tons of rows, need to build a parser just for the reports
+The CSVs generated are one row per institution, and one column per variable.
+
+#### Reports over time
+
+The easiest workflow is to limit reports to one metric; that way you just have
+to extract the year. If you put multiple metrics in one report, you have to
+group the columns by variable name first, and then extract the year.
 
 ### How to Update Your Old Reports
 

From ef7858de43250687bb99ec74129696522cc84e0e Mon Sep 17 00:00:00 2001
From: crccheck
Date: Tue, 22 Jul 2014 11:15:43 -0500
Subject: [PATCH 09/11] howto update reports

---
 datasets/ipeds.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/datasets/ipeds.md b/datasets/ipeds.md
index 844f801..e9f26a9 100644
--- a/datasets/ipeds.md
+++ b/datasets/ipeds.md
@@ -160,4 +160,5 @@ group the columns by variable name first, and then extract the year.
 
 ### How to Update Your Old Reports
 
-_TODO_
+The easiest way to update reports is to modify the `UID` and `MVL` files and
+generate a new report.
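
The "Reports over time" advice in the patches above (group columns by variable name, then extract the year) might look something like this in Python. Treat it as a sketch under stated assumptions: the notes do not show what the CSV column headers actually look like, so the header shape `SHORTNAME (CODEyyyy...)` used here is hypothetical, as are the `UnitID` column name, the sample institution, and the `reshape` helper. Adjust `HEADER_RE` to whatever your downloaded report really contains.

```python
import csv
import io
import re
from collections import defaultdict

# Assumed (hypothetical) header shape: "<short name> (<code with 4-digit year>)",
# for example "DVEF01 (DRVEF2009_RV)". Check a real report before relying on it.
HEADER_RE = re.compile(r"^(?P<metric>\w+) \(\D*(?P<year>\d{4})\w*\)$")


def reshape(report_file, id_column="UnitID"):
    """Turn a wide report (one row per institution, one column per variable)
    into {unit_id: {metric: {year: value}}}."""
    result = defaultdict(lambda: defaultdict(dict))
    for row in csv.DictReader(report_file):
        unit_id = row[id_column]
        for header, value in row.items():
            match = HEADER_RE.match(header or "")
            if match is None:
                continue  # identifier columns or headers we do not recognise
            result[unit_id][match["metric"]][int(match["year"])] = value
    return result


# Made-up sample data, only to show the shape of the input and output.
sample = io.StringIO(
    "UnitID,Institution Name,DVEF01 (DRVEF2009_RV),DVEF01 (DRVEF2010)\n"
    "100001,Example College,12,15\n"
)
data = reshape(sample)
```

Grouping by metric first and year second means a report with several metrics reshapes the same way as a single-metric one, though keeping one metric per report still makes the result easier to eyeball.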

From a6391d52f487b2790c7639bae9644cc3339b3586 Mon Sep 17 00:00:00 2001
From: crccheck
Date: Thu, 26 Feb 2015 12:00:10 -0600
Subject: [PATCH 10/11] rename file to slightly reduce jargoness

---
 datasets/{ipeds.md => ipeds_data_center.md} | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename datasets/{ipeds.md => ipeds_data_center.md} (100%)

diff --git a/datasets/ipeds.md b/datasets/ipeds_data_center.md
similarity index 100%
rename from datasets/ipeds.md
rename to datasets/ipeds_data_center.md

From 1f9a88f69a286d6594dba93e99b8cf012e0e7881 Mon Sep 17 00:00:00 2001
From: crccheck
Date: Thu, 26 Feb 2015 12:02:51 -0600
Subject: [PATCH 11/11] add an author note

---
 datasets/ipeds_data_center.md | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/datasets/ipeds_data_center.md b/datasets/ipeds_data_center.md
index e9f26a9..c0b60a8 100644
--- a/datasets/ipeds_data_center.md
+++ b/datasets/ipeds_data_center.md
@@ -162,3 +162,10 @@ the columns by variable name first, and then extract the year.
 
 The easiest way to update reports is to modify the `UID` and `MVL` files and
 generate a new report.
+
+
+About the author
+----------------
+
+Chris Chang is a developer at The Texas Tribune where he used custom data
+reports from IPEDS to help compile their Texas Higher Ed Explorer.