Merge pull request #845 from Teradata/recipe_ordinalencoding
new recipe notebook for ordinalencoding fit and transform function
Showing 1 changed file with 371 additions and 0 deletions.
Recipes/ClearScape_Functions/OrdinalEncodingFitandTransform.ipynb
@@ -0,0 +1,371 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"id": "7d0ed18d-aedb-4aef-a6ac-fa6e268c2a82", | ||
"metadata": {}, | ||
"source": [ | ||
"<header>\n", | ||
" <p style='font-size:36px;font-family:Arial; color:#F0F0F0; background-color: #00233c; padding-left: 20pt; padding-top: 20pt;padding-bottom: 10pt; padding-right: 20pt;'>\n", | ||
" OrdinalEncodingFit and OrdinalEncodingTransform functions in Vantage\n", | ||
" <br>\n", | ||
" <img id=\"teradata-logo\" src=\"https://storage.googleapis.com/clearscape_analytics_demo_data/DEMO_Logo/teradata.svg\" alt=\"Teradata\" style=\"width: 125px; height: auto; margin-top: 20pt;\">\n", | ||
" </p>\n", | ||
"</header>" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "e59084b2-30cc-4309-8127-6529bb6ce531", | ||
"metadata": {}, | ||
"source": [ | ||
"<p style = 'font-size:20px;font-family:Arial'><b>Introduction</b></p>\n", | ||
"<p style = 'font-size:16px;font-family:Arial'>The OrdinalEncodingFit function identifies the distinct categorical\n", | ||
" values in the input data (or takes them from a user-defined list) and outputs\n", | ||
" each category along with its ordinal value.\n", | ||
" The OrdinalEncodingTransform function then maps each categorical value\n", | ||
" to its ordinal value using the OrdinalEncodingFit\n", | ||
" function output.\n", | ||
"<br> In this notebook we will see how to use the OrdinalEncodingFit and OrdinalEncodingTransform functions available in Vantage. </p>" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "3cd2d5ea-cae4-4260-884e-d88255e143ff", | ||
"metadata": {}, | ||
"source": [ | ||
"<hr style=\"height:2px;border:none;\">\n", | ||
"<b style = 'font-size:20px;font-family:Arial'>1. Initiate a connection to Vantage</b>" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "66f24e7d-a0b5-4925-9c30-eb30ff0ec162", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"from teradataml import *\n", | ||
"\n", | ||
"# Modify the following to match the specific client environment settings\n", | ||
"display.max_rows = 5" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "e513d1ed-67f7-4ff9-ad47-c8edb6898f70", | ||
"metadata": {}, | ||
"source": [ | ||
"<hr style=\"height:1px;border:none;\">\n", | ||
"<p style = 'font-size:18px;font-family:Arial'><b>1.1 Connect to Vantage</b></p>\n", | ||
"<p style = 'font-size:16px;font-family:Arial'>You will be prompted to provide the password. Enter your password, press the Enter key, and then use the down arrow to go to the next cell.</p>" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "7de87f4a-b757-4525-aa6b-8c9df1fff6e9", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"%run -i ../../UseCases/startup.ipynb\n", | ||
"eng = create_context(host = 'host.docker.internal', username='demo_user', password = password)\n", | ||
"print(eng)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "fbb3b095-081a-4dfb-8f8b-e270d7f8c6c3", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"%%capture\n", | ||
"execute_sql('''SET query_band='DEMO=PP_OrdinalEncodingFitandTransform_Python.ipynb;' UPDATE FOR SESSION; ''')" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "20621a33-a250-4c9d-a704-8620e11a7bcf", | ||
"metadata": {}, | ||
"source": [ | ||
"<p style = 'font-size:16px;font-family:Arial'>Begin running steps with Shift + Enter keys. </p>" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "f79f096b-3f0e-46dd-ad97-4ef2796c4ff6", | ||
"metadata": {}, | ||
"source": [ | ||
"<hr style='height:1px;border:none;'>\n", | ||
"\n", | ||
"<p style = 'font-size:18px;font-family:Arial'><b>1.2 Getting Data for This Demo</b></p>\n", | ||
"\n", | ||
"<p style = 'font-size:16px;font-family:Arial'>We have provided data for this demo on cloud storage. You can either run the demo using foreign tables to access the data without using any storage on your environment, or download the data to local storage, which may yield faster execution but requires enough available storage. The following cell contains two statements, one of which is commented out. To switch modes, move the comment character (#) to the other statement.</p>" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "b10b168a-d86d-4a1b-a0cb-7a22698ddfaa", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"%run -i ../../UseCases/run_procedure.py \"call get_data('DEMO_BankChurn_cloud');\" # Takes 30 seconds\n", | ||
"#%run -i ../../UseCases/run_procedure.py \"call get_data('DEMO_BankChurn_local');\" " | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "a1b9b401-03f3-4586-9c8f-2059f36c1564", | ||
"metadata": {}, | ||
"source": [ | ||
"<p style = 'font-size:16px;font-family:Arial'>Next is an optional step – if you want to see the status of databases/tables created and space used.</p>" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "e9193ac2-6f44-4447-bbcb-138333d23839", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"%run -i ../../UseCases/run_procedure.py \"call space_report();\" # Takes 10 seconds" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "80bc0f06-65d1-4ed6-bb91-fc0024f5dc87", | ||
"metadata": {}, | ||
"source": [ | ||
"<hr style=\"height:2px;border:none;\">\n", | ||
"<b style = 'font-size:20px;font-family:Arial'>2. Data Exploration</b>\n", | ||
"<p style = 'font-size:16px;font-family:Arial'>Create a \"Virtual DataFrame\" that points to the data set in Vantage. Check the shape of the DataFrame as well as the data types of all its columns.</p>" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "f0f7ddb0-b56e-4fb0-a148-56f8e0c8c247", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"tdf = DataFrame(in_schema(\"DEMO_BankChurn\", \"customer_churn\"))\n", | ||
"print(\"Shape of the data: \", tdf.shape)\n", | ||
"tdf" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "1bb4af1e-5ac6-4d15-b15e-2bf3c27ca77c", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"tdf.tdtypes" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "52aaeddc-19b6-4816-8eb6-991c25fd479f", | ||
"metadata": {}, | ||
"source": [ | ||
"<p style = 'font-size:16px;font-family:Arial'>A bank aims to analyze whether customer geography influences churn. Since \"Geography\" is a categorical variable (Germany, France, Spain), we need to transform it into a numerical representation using ordinal encoding.\n", | ||
"</p>" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "1a391d86-56a7-4d8c-bbdc-2002f14692d7", | ||
"metadata": {}, | ||
"source": [ | ||
"<p style = 'font-size:16px;font-family:Arial'>Fit the ordinal encoding for the 'Geography' column.\n", | ||
"</p>" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "413436cd-aa9b-44b1-974c-561d3c3326f9", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"help(OrdinalEncodingFit)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "2c4d0acf-3d9a-4473-9fa2-b36234c49d10", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"fit_result = OrdinalEncodingFit(\n", | ||
" target_column=\"Geography\",\n", | ||
" data=tdf\n", | ||
")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "4d78eb1c-2909-4705-83e6-84f7e16d493b", | ||
"metadata": {}, | ||
"source": [ | ||
"<p style = 'font-size:16px;font-family:Arial'>Apply the ordinal encoding transformation to the dataset.\n", | ||
"</p>" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "bcd5fb18-d780-4615-8198-9f305b7ed398", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"help(OrdinalEncodingTransform)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "0ccdd5b7-188d-4150-8b12-2207562df73d", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# Applying OrdinalEncodingTransform\n", | ||
"transformed_data = OrdinalEncodingTransform(\n", | ||
" data=tdf,\n", | ||
" object=fit_result,\n", | ||
" accumulate=['CustomerId', 'CreditScore', 'Age', 'Balance', 'Exited'] \n", | ||
")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "b3c47a8b-2e9a-4d7b-99b7-0ec4780a50a3", | ||
"metadata": {}, | ||
"source": [ | ||
"<p style = 'font-size:16px;font-family:Arial'>The categorical values of the Geography column are now replaced by integers, for example Germany → 0, France → 1, Spain → 2. The exact mapping is the one recorded in the OrdinalEncodingFit output.\n", | ||
"</p>" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "eab0e2da-f881-4cb7-a2b9-7295b3797373", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# Print the transformed DataFrame\n", | ||
"transformed_data.result" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "09a26038-4747-420a-86b6-6772d0d9ddc6", | ||
"metadata": {}, | ||
"source": [ | ||
"<hr style=\"height:2px;border:none;\">\n", | ||
"<b style = 'font-size:20px;font-family:Arial'>3. Cleanup</b>" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "c1effadb-7103-4424-91d4-016e2dc0bd82", | ||
"metadata": {}, | ||
"source": [ | ||
"<hr style=\"height:1px;border:none;\">\n", | ||
"<p style = 'font-size:18px;font-family:Arial'> <b>Databases and Tables </b></p>\n", | ||
"<p style = 'font-size:16px;font-family:Arial'>The following code will clean up tables and databases created above.</p>" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "15f98d8d-d519-4d9f-a1d8-1b5e9e213fdd", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"%run -i ../../UseCases/run_procedure.py \"call remove_data('DEMO_BankChurn');\" # Takes 10 seconds" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "50fc554c-282c-4591-ad5a-3d9c2b590f4f", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"remove_context()" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "011e3f1b-8421-4178-b4e4-d865812b0ad5", | ||
"metadata": {}, | ||
"source": [ | ||
"<hr style=\"height:1px;border:none;\">\n", | ||
"<b style = 'font-size:18px;font-family:Arial'>Dataset:</b>\n", | ||
"\n", | ||
"- `RowNumber`: Row index\n", | ||
"- `CustomerId`: Unique customer ID\n", | ||
"- `Surname`: Customer's surname\n", | ||
"- `CreditScore`: Credit score of the customer\n", | ||
"- `Geography`: Country (Germany / France / Spain)\n", | ||
"- `Gender`: Gender (Male / Female)\n", | ||
"- `Age`: Age of the customer\n", | ||
"- `Tenure`: Number of years the customer has been associated with the bank\n", | ||
"- `Balance`: Account balance\n", | ||
"- `NumOfProducts`: Number of bank products used\n", | ||
"- `HasCrCard`: Credit card status (0 = No, 1 = Yes)\n", | ||
"- `IsActiveMember`: Active membership status (0 = No, 1 = Yes)\n", | ||
"- `EstimatedSalary`: Estimated salary of the customer\n", | ||
"- `Exited`: Customer churn status (0 = No, 1 = Yes)\n", | ||
"\n", | ||
"<p style = 'font-size:16px;font-family:Arial'><b>Links:</b></p>\n", | ||
"<ul style = 'font-size:16px;font-family:Arial'>\n", | ||
" <li>Teradataml Python reference: <a href = 'https://docs.teradata.com/search/all?query=Python+Package+User+Guide&content-lang=en-US'>here</a></li>\n", | ||
" <li>OrdinalEncodingFit function reference: <a href = 'https://docs.teradata.com/search/all?query=OrdinalEncodingFit&value-filters=prodname~%2522Teradata+Package+for+Python%2522*vrm_release~%252220.00.00.03%2522&content-lang=en-US&_gl=1*19tr550*_gcl_aw*R0NMLjE3MzMyMDc4MjguRUFJYUlRb2JDaE1JeVpYM3BQNktpZ01WSWpLREF4MmluUmowRUFBWUFTQUFFZ0tSRVBEX0J3RQ..*_gcl_au*MTM2MDk0NzQ4OS4xNzM3NTI3NTA5*_ga*NTU2MTUwNDQ1LjE2OTM4MDU3NjE.*_ga_7PE2TMW3FE*MTczODY1MTA2OS4xNDUuMS4xNzM4NjUxMzgwLjYwLjAuMA..'>here</a></li>\n", | ||
" <li>OrdinalEncodingTransform function reference: <a href = 'https://docs.teradata.com/search/all?query=OrdinalEncodingTransform&value-filters=prodname~%2522Teradata+Package+for+Python%2522*vrm_release~%252220.00.00.03%2522&content-lang=en-US&_gl=1*19tr550*_gcl_aw*R0NMLjE3MzMyMDc4MjguRUFJYUlRb2JDaE1JeVpYM3BQNktpZ01WSWpLREF4MmluUmowRUFBWUFTQUFFZ0tSRVBEX0J3RQ..*_gcl_au*MTM2MDk0NzQ4OS4xNzM3NTI3NTA5*_ga*NTU2MTUwNDQ1LjE2OTM4MDU3NjE.*_ga_7PE2TMW3FE*MTczODY1MTA2OS4xNDUuMS4xNzM4NjUxMzgwLjYwLjAuMA..'>here</a></li>\n", | ||
"</ul>" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "2e4ead86-9089-43a9-98d7-c1c877046df9", | ||
"metadata": {}, | ||
"source": [ | ||
"<footer style=\"padding-bottom:35px; border-bottom:3px solid #91A0Ab\">\n", | ||
" <div style=\"float:left;margin-top:14px\">ClearScape Analytics™</div>\n", | ||
" <div style=\"float:right;\">\n", | ||
" <div style=\"float:left; margin-top:14px\">\n", | ||
" Copyright © Teradata Corporation - 2025. All Rights Reserved\n", | ||
" </div>\n", | ||
" </div>\n", | ||
"</footer>" | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "Python 3 (ipykernel)", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.9.10" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 5 | ||
} |
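The fit/transform split that the notebook uses can be illustrated locally. The sketch below is a minimal plain-Python approximation, not the Vantage implementation. The function names `ordinal_encoding_fit` and `ordinal_encoding_transform` are hypothetical. Two behaviors are assumptions made only for illustration: ordinals are assigned to the distinct non-null categories in sorted order starting at a configurable value (0 by default), and categories not seen at fit time map to a caller-supplied default. Consult `help(OrdinalEncodingFit)` and the fit output in Vantage for the actual rules.

```python
from typing import Dict, Iterable, List, Optional


def ordinal_encoding_fit(values: Iterable[Optional[str]], start_value: int = 0) -> Dict[str, int]:
    """Hypothetical local analogue of the fit step.

    Collects the distinct non-null categories and assigns consecutive
    integers starting at start_value. Sorted order is an assumption
    made for this sketch, not documented Vantage behavior.
    """
    categories = sorted({v for v in values if v is not None})
    return {category: start_value + i for i, category in enumerate(categories)}


def ordinal_encoding_transform(
    values: Iterable[Optional[str]],
    mapping: Dict[str, int],
    default: Optional[int] = None,
) -> List[Optional[int]]:
    """Hypothetical local analogue of the transform step.

    Replaces each category with its ordinal from the fitted mapping;
    None and categories not seen during fitting become `default`.
    """
    return [mapping.get(v, default) if v is not None else default for v in values]


if __name__ == "__main__":
    # Toy values standing in for the Geography column (not the real dataset).
    train = ["France", "Spain", "Germany", "France"]
    mapping = ordinal_encoding_fit(train)
    print(mapping)  # {'France': 0, 'Germany': 1, 'Spain': 2}
    print(ordinal_encoding_transform(["Spain", "Italy", "France"], mapping, default=-1))  # [2, -1, 0]
```

Because the fit step only produces a lookup table, the same mapping can be applied to any later data, which mirrors how the notebook passes `fit_result` as the `object` argument of OrdinalEncodingTransform.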