{
"cells": [
{
"cell_type": "markdown",
"id": "594c8d15-68ec-466b-bec1-4d74120f8c0e",
"metadata": {},
"source": [
"# Predicting Housing Prices in Ames, Iowa\n",
"\n",
"To better understand the housing market in Ames, we will build a regression model that predicts the\n",
"sale price of a house from a selection of its features. By comparing the predicted sale prices with the\n",
"known sale prices of houses the model was not fitted on, we can test the model and score its results\n",
"(for example with RMSE or R²). The model can then be used to judge whether the price of a house for\n",
"sale is in line with what it should be, or to suggest a price for a new listing."
]
},
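{
"cell_type": "markdown",
"id": "sketch-score-model-md",
"metadata": {},
"source": [
"A minimal sketch of the scoring step described above. It assumes a cleaned DataFrame with a `saleprice` column.\n",
"`score_linear_model` is a hypothetical helper that is not defined elsewhere in this notebook. It holds out\n",
"25% of the rows with `train_test_split`, fits a `LinearRegression` on the rest, and reports RMSE and R² on the\n",
"held-out rows. The feature list in the usage line is only an illustration."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-score-model",
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical helper: score a linear model on a held-out split\n",
"from sklearn.linear_model import LinearRegression\n",
"from sklearn.metrics import mean_squared_error, r2_score\n",
"from sklearn.model_selection import train_test_split\n",
"\n",
"\n",
"def score_linear_model(df, features, target='saleprice', random_state=42):\n",
"    # Fit on a random 75% of the rows, then return (RMSE, R^2) on the other 25%\n",
"    X_train, X_test, y_train, y_test = train_test_split(\n",
"        df[features], df[target], random_state=random_state)\n",
"    model = LinearRegression().fit(X_train, y_train)\n",
"    preds = model.predict(X_test)\n",
"    rmse = mean_squared_error(y_test, preds) ** 0.5\n",
"    return rmse, r2_score(y_test, preds)\n",
"\n",
"\n",
"# Usage once `train` is loaded and cleaned below (hypothetical feature choice):\n",
"# score_linear_model(train, ['gr_liv_area', 'age'])"
]
},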
{
"cell_type": "markdown",
"id": "4018d860-3f73-4cd1-baa2-56406ddbf05f",
"metadata": {},
"source": [
"### Imports"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "2438f7a8-f9b7-4fba-9783-9c644b44362d",
"metadata": {
"execution": {
"iopub.execute_input": "2022-05-30T20:48:02.024387Z",
"iopub.status.busy": "2022-05-30T20:48:02.023868Z",
"iopub.status.idle": "2022-05-30T20:48:03.639974Z",
"shell.execute_reply": "2022-05-30T20:48:03.639485Z",
"shell.execute_reply.started": "2022-05-30T20:48:02.024310Z"
},
"tags": []
},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"from matplotlib.gridspec import GridSpec\n",
"from sklearn.preprocessing import StandardScaler, QuantileTransformer, PolynomialFeatures\n",
"from sklearn.linear_model import LinearRegression, LogisticRegression, LassoCV, Lasso, Ridge, RidgeCV\n",
"from sklearn.neighbors import KNeighborsRegressor\n",
"from sklearn.model_selection import GridSearchCV, train_test_split\n",
"from sklearn.pipeline import Pipeline\n",
"from sklearn.metrics import mean_squared_error, r2_score\n",
"import seaborn as sns\n",
"from wand.image import Image\n",
"from wand.color import Color\n"
]
},
{
"cell_type": "markdown",
"id": "3f3887bf-5bf7-4802-9c45-913d67063613",
"metadata": {},
"source": [
"### Load and Clean Data"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "262991a8-6f4f-4af5-9654-4c58763fb931",
"metadata": {
"execution": {
"iopub.execute_input": "2022-05-30T20:48:03.641159Z",
"iopub.status.busy": "2022-05-30T20:48:03.640832Z",
"iopub.status.idle": "2022-05-30T20:48:03.672779Z",
"shell.execute_reply": "2022-05-30T20:48:03.672211Z",
"shell.execute_reply.started": "2022-05-30T20:48:03.641141Z"
},
"tags": []
},
"outputs": [],
"source": [
"# Load in data from CSV\n",
"kaggle = pd.read_csv('data/test.csv')\n",
"train = pd.read_csv('data/train.csv')"
]
},
{
"cell_type": "markdown",
"id": "c78eaa1a-340a-4c00-999b-1ab5456c1021",
"metadata": {},
"source": [
"Make all column names lowercase and replace spaces with underscores so they are easier to type (and can be used as attributes, e.g. `train.gr_liv_area`)."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "452a5cf0-878d-4ae5-b799-201c2c082d9f",
"metadata": {
"execution": {
"iopub.execute_input": "2022-05-30T20:48:03.673849Z",
"iopub.status.busy": "2022-05-30T20:48:03.673619Z",
"iopub.status.idle": "2022-05-30T20:48:03.677612Z",
"shell.execute_reply": "2022-05-30T20:48:03.676907Z",
"shell.execute_reply.started": "2022-05-30T20:48:03.673832Z"
},
"tags": []
},
"outputs": [],
"source": [
"for df in [train, kaggle]:\n",
"    df.columns = [col.lower().replace(' ', '_') for col in df.columns]"
]
},
{
"cell_type": "markdown",
"id": "23670302-d9dd-4c2e-a391-0be65ddc3832",
"metadata": {},
"source": [
"### Discovery\n",
"Let's figure out which features to feed into the model. [The data dictionary can be found here](data/dictionary.txt).<br>\n",
"First, let's write a helper function to make plotting easier."
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "d152cb9e-3b90-42dd-87fe-9265158fce6f",
"metadata": {
"execution": {
"iopub.execute_input": "2022-05-30T20:50:35.358569Z",
"iopub.status.busy": "2022-05-30T20:50:35.357933Z",
"iopub.status.idle": "2022-05-30T20:50:35.363845Z",
"shell.execute_reply": "2022-05-30T20:50:35.362921Z",
"shell.execute_reply.started": "2022-05-30T20:50:35.358538Z"
},
"tags": []
},
"outputs": [],
"source": [
"sns.set_context('talk')\n",
"sns.set_style('darkgrid')\n",
"\n",
"\n",
"def discover_plot(x, y):\n",
"    # Scatter y against x with a fitted regression line and marginal distributions,\n",
"    # then save the figure as an SVG named after the first value of each series.\n",
"    # .iloc[0] is positional, so it still works after rows (e.g. label 0) are dropped.\n",
"    figname = str(x.iloc[0]) + str(y.iloc[0]) + '.svg'\n",
"    grid = sns.jointplot(x=x, y=y, kind='reg', truncate=False,\n",
"                         color='crimson', height=7)\n",
"    grid.savefig(figname, backend='Cairo')\n",
"    return grid"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "0217c6ad-a2d3-48de-a116-723ca1d0d41b",
"metadata": {
"execution": {
"iopub.execute_input": "2022-05-30T20:48:17.945279Z",
"iopub.status.busy": "2022-05-30T20:48:17.944876Z",
"iopub.status.idle": "2022-05-30T20:48:17.949875Z",
"shell.execute_reply": "2022-05-30T20:48:17.949042Z",
"shell.execute_reply.started": "2022-05-30T20:48:17.945247Z"
},
"tags": []
},
"outputs": [],
"source": [
"gr_liv = str(train.gr_liv_area.iloc[0]) + str(train.saleprice.iloc[0]) + '.png'"
]
},
{
"cell_type": "code",
"execution_count": 102,
"id": "ca94d4a7-be9c-46e7-9d70-0bc3eeb4f33f",
"metadata": {
"execution": {
"iopub.execute_input": "2022-05-29T08:04:32.972946Z",
"iopub.status.busy": "2022-05-29T08:04:32.972334Z",
"iopub.status.idle": "2022-05-29T08:04:32.977528Z",
"shell.execute_reply": "2022-05-29T08:04:32.976647Z",
"shell.execute_reply.started": "2022-05-29T08:04:32.972916Z"
},
"tags": []
},
"outputs": [],
"source": [
"# Open the saved plot with Wand and write it back out under a new file name\n",
"with Image(filename=gr_liv) as img:\n",
"    img.save(filename='edge-pl1.png')"
]
},
{
"cell_type": "code",
"execution_count": 103,
"id": "bcce29db-9682-4264-8fe5-616660ca82d0",
"metadata": {
"execution": {
"iopub.execute_input": "2022-05-29T08:04:33.168532Z",
"iopub.status.busy": "2022-05-29T08:04:33.168059Z",
"iopub.status.idle": "2022-05-29T08:04:33.177108Z",
"shell.execute_reply": "2022-05-29T08:04:33.176550Z",
"shell.execute_reply.started": "2022-05-29T08:04:33.168502Z"
},
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<class 'pandas.core.series.Series'>\n",
"Int64Index: 2047 entries, 0 to 2050\n",
"Series name: gr_liv_area\n",
"Non-Null Count Dtype\n",
"-------------- -----\n",
"2047 non-null int64\n",
"dtypes: int64(1)\n",
"memory usage: 96.5 KB\n"
]
}
],
"source": [
"train.gr_liv_area.info()"
]
},
{
"cell_type": "markdown",
"id": "ea5fcbc3-3180-4fd6-9990-02fb020c7e6f",
"metadata": {},
"source": [
"Above-grade living area sounds like a feature with a lot of influence on price, and it has no nulls, so it is low-hanging fruit."
]
},
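{
"cell_type": "markdown",
"id": "sketch-null-counts-md",
"metadata": {},
"source": [
"To find other features with no missing values, we can count nulls per column. This is a minimal sketch that assumes\n",
"`train` has been loaded and renamed as above. It only inspects the data and does not modify it."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-null-counts",
"metadata": {},
"outputs": [],
"source": [
"# Count missing values per column; columns with zero nulls are the easiest to use as-is\n",
"null_counts = train.isna().sum()\n",
"complete_cols = null_counts[null_counts == 0].index.tolist()\n",
"print(f'{len(complete_cols)} of {train.shape[1]} columns have no nulls')\n",
"null_counts.sort_values(ascending=False).head(10)"
]
},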
{
"cell_type": "code",
"execution_count": 104,
"id": "eed7245f-6dea-416d-8a8e-2542f65dd241",
"metadata": {
"execution": {
"iopub.execute_input": "2022-05-29T08:04:33.585261Z",
"iopub.status.busy": "2022-05-29T08:04:33.584773Z",
"iopub.status.idle": "2022-05-29T08:04:34.508102Z",
"shell.execute_reply": "2022-05-29T08:04:34.507183Z",
"shell.execute_reply.started": "2022-05-29T08:04:33.585232Z"
},
"tags": []
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAgQAAAHfCAYAAAAvE8DnAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8qNh9FAAAACXBIWXMAAAsTAAALEwEAmpwYAADe4klEQVR4nOydeZwcVbm/n1NVvcz09GyZyZ5AyAKEHVEIRBBC2BcJyBpFFsUrqKig96dX9Hq5XpQIXEEWLwIisu8BDVsQhYSoIBAMmJAIhJBMMvt0T29VdX5/nOptpmemZzIzPct5Ph+YTHfVqdPVPX3e877f932FlFKi0Wg0Go1mXGOUegIajUaj0WhKjzYINBqNRqPRaINAo9FoNBqNNgg0Go1Go9GgDQKNRqPRaDRog0Cj0Wg0Gg1glXoCmr7ZsaOj1FMY0VRVlQHQ1hYr8UzGHvreDi36/g4t9fXhUk9hVKE9BBqNRqPRaLRBoNFoNBqNRocMNJqiqAyaGL0U9XSFoD3u5D0mkyncLQ3IxhZkYzOyLZJ90jIR9bUYk+sxJtchqsIDuoZGo9EMFtog0GiKwJCSbT+/u8fnJ3/7C7ibt2K/8jrO6+tw1v8Ld9NmsItbwK2ZU3A/tQ+x5naMmVMR5cGC19BoNJqhQhsEGs0AkakUcksD7odb+fjpP2J/8HGPx4rqSsyaMHZLByDBcaEzpn4C9odbiXy4FQBHCMQuUzH32wMxdSJCiOF4ORqNZpyjDQKNpkiklMr9/+FW3M1bkdsawVULuusdI6orMQ/eF3P+HIzdZ2HO2QVRX4vwWVQHjDwvg5QSYglkWwdy2w78ZQFiL78OyRTy/S3Y729BTJ2IufATGHU1JXjFGo1mPKENAo2mB2SkE+e9D3DefJcdr/+D1J9fg3gi/yAhEJPrqDz7eOxDDsSYPxthmkWNL4SA8qAKD0ypZ9IVX2DLNXfivv8R7pv/RDY0Ij/ejv3QCowD5iOTKaC4sTUajaa/aINAMy7IFezJlI2zowVn2w6cxlbcplbsljbiDS24zW3IplbcDz9Gbt1ReLBwCGPGFIyZUxDTJiECfmq+cR6u03sncaMIz7+wTMw5u2DO2QX3w604r7yObGnDff0fNJz9bepuvgprcl3Bc7XoUKPR7AzaINCMaaTj4Lz1TyJvvUPbQ88itzdBpLNfY4gp9ZQdvA+J1gjG1IlQFe4e13d7Fx0CTL2if6JAZXAch/PXtbivryP55no+Xnwx1omfwaiv7Xa8Fh1qNJqdQRsEmjGJu6WB5L1PkVq+EtnYQo8mgM+CsiD+2dNxq6sRE6oQtdUYk+sw5u6KOWcmoircLf4/XAjTxDpkf9wpE3H+9FdkRxT7iReUUTClftjno9Foxi7aINCMKdytO0j88nekHn8+I/gDsGZNw/X5EJPqEDWViFA5VJQhfD4Apn7n/F5d/sW4+4cSY5epTHzo53y85JvQGcN+6kWs0xZrsaFGoxk0tEGgGRNIxyFx9+Mk/vduiMUBEJMm4D/rBHwnHUnt7Km97/D7cPn3190/FPjn7YrvtMWkHn8Oosoo8C05BlFZUeqpaTSaMYA2CDSjHqehieZLryax+g0AjLoaqr6xlIrPHYvwqY94qXf4g4WoqsA6+Ujsx56Dzjip5S/iW7IYUda9kJFGo9H0B20QaEY1zlv/ZNvX/wt3ezMAxvzZmAsOILq1iegv7s0cNxJ2+IOFUVuNdcIR2E++CG0d2L9/CevUo0s9LY1GM8rRzY00o5bUi2uInv9d3O3NiHAI67hPY33mYETAX+qpDTnGlIlYiw8FQDY04bz8txLPSKPRjHa0QaAZ8VQGTaoDRt5//pf/Quzy/4ZEEmuXKUx59AaM3WaUeqrDirHbDMyD9wPAXbeRyIMrSjwjjUYzmtEhA82Ip2tjIff9j7BXvAyui6itInj8p/HPHl/GQBrjwPm425uQ//qI5h/+ktBuu2LuM6/U09JoNKMQ7S
HQjCrcDz/OMwasUxZhlJeVelolQwiBddQhUB2GZIrOb/w3bmt7qael0WhGIdog0Iwa3OZW7GfyjYFCbYLHGyLgxzru04jyIHLbDuLfXYbMqcGg0Wg0xaANAs2oQMbi2L//E6RsCJVhnXykNgZyMGqrqf2vrwFg//lvJG9/qMQz0mg0ow1tEGhGPDKZUp6B9ghYJtbxR6hKg5o8Qp9dhO9zxwOQ+MVvsf+6tsQz0mg0owltEGhGPC3/83/Ij7cDYB51CMbE7o19NIrg9y7B2GM3cF1i374Gd0dzqaek0WhGCdog0IxoUi+uIXL3kwAYn9gLc84uJZ7RyEYE/JRf/z2oKEc2thD7zs+Qjm6JrNFo+kYbBJoRi7u9ifj3rwdATJ2I+cl9Sjyj0YGxy1TKrr4cAGfNW6q/g0aj0fSBNgg0IxLpusT+38+Rre2IygqsRQsQhv64FovvmIX4P38qAMnbHyK1/MUSz0ij0Yx09DespuQUqkRo/u4JHK9ZUd3/XI4Ih0o7yVFI4IqLMA/ZH4DYD27AfvPd0k5Io9GMaHSlQk3J6VqJULa2k3rg9+q5PWcTOn4hbf/YVKrpjRoMU1AdyLHxA36qbv4+25Zcjv3+FuJf/y8mPn4jnTValKnRaLqjDQLNiEJKif3iGnBcqCjHPOzAUk9p9ODmG1ZpxIL94ePtuDtaaPri9wjc+VOM2qrhn59GoxnR6JCBZkTh/mMDcusOAKzPfArh95V4RqMfUVOFdeynwTBIbfiQzou+j2ztKPW0NBrNCEMbBJoRg+yIZnQDxu6zMGZOLe2ExhDGjMlYxy4Ey8T95yail/wA2REt9bQ0Gs0IQhsEmhGBlBL7pb+o0sRlQR0qGAKMWdOpu/67YBi4a9cT/eK/4zY0lnpaGo1mhKANAs2IwF3/PvLDrQBYhx+ECAZKPKOxSfkJhxP8ybfANHDf2Uj07G/irNtY6mlpNJoRgDYINCXHaWzBefk1AMRuMzBmzyzxjMY2/lOOovyW/4RQGbKhiejnryT13CulnpZGoykx2iDQlJyW/7wFEkkI+LAOP6jU0xkXWAs/QejenyOm1EMsTuwb/03sqv9FRmOlnppGoykR2iDQlJTU86vo/P2fADAP+wSivKzEMxo/mHN3JXT/9ZniRamHnyF6+tew/76utBPTaDQlQRsEmpIhWzuI//iXAIgZkzF2n1XiGY0/jPpaym+/msB3vwQ+C/fDj+k87wpiP7gBt6Wt1NPTaDTDiDYINCUj/pNbkY0tiFAZ1hGfQghR6imNS4RhEDj/NEIP/S++vecAkHrkWTpP/DLROx/F6IxlSkpXBs0Sz1aj0QwV2iDQlITUC6tJPaUa7lT/+8WIyooSz2h8kC5vXOi/CfvMZupj/4v56YPA78Nt7aD5P25k88Hn8PFlP2Hrst9gSFnql6DRaIYIXbpYM+y4re3Ef3QTAOaC/ak45wSi1/22xLMaJ/RQ3jjN1Cu+gLnPPIzZM3BefRP33U24zW24f/gTYko9iaMPhj3nDeOENRrNcKE9BJphRUpJ/OpbkE0tECqj7Mff0KGCEYgoL8M66hBCnz8Fa9Y0AOTWHTScfjmd3/wJzvtbSjxDjUYz2GiDQDOspB59Fvv3LwEQ/M6XMKZNKvGMNL1h1tdQftrRWKcchaivAcB+5mWiJ19C7HvXacNAoxlDaINAM2w46/9F/OpbALAWH4rvjGNLPCNNsRjTJ2OdcRwTrv8uYtokcFxSjz9P9KRLiH33Wpx/fVTqKWo0mp1EGwSaYUFGO4l9838gkURMn0zZf12uQwWjDCEE4dOOYvrzt1P7k29gTp8Erktq+YtET/4Kqcuvxv/G20gtPNRoRiVaVKgZcqSUxH50E+6/PgLLovy6f9dZBaMVV9Jw430AGCceCev/hfPaP6A9QuzZVcSeXYUxb1d8px+L7/jDMepqSjxhjUZTLNog0Aw5ydsewH76jwAEv3Mx5t5apT4WEKaBuedsjHmzcDd9iP
vWemRDI+7690n8z20kfvZ/mAsOwHf0oVgLP4ExdWKpp6zRaHpBGwSaISX55EoSv1Bpbr7PHo3vvJNLPCPNYCNMA3P
"text/plain": [
"<Figure size 504x504 with 3 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"discover_plot(train.gr_liv_area,train.saleprice);"
]
},
{
"cell_type": "markdown",
"id": "694fbdfa-0e56-484f-b860-c18dedb3e6a7",
"metadata": {},
"source": [
"It looks promising: the relationship appears linear and the distributions have a reasonable shape. However, the two outliers with very large living areas will probably be an issue."
]
},
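{
"cell_type": "markdown",
"id": "sketch-outlier-corr-md",
"metadata": {},
"source": [
"To put a number on this impression, we can compare the Pearson correlation between `gr_liv_area` and `saleprice`\n",
"with and without rows above 4500, the threshold applied a few cells below. This is a minimal sketch that assumes\n",
"`train` is in scope and has not yet had those rows dropped. It does not modify `train`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-outlier-corr",
"metadata": {},
"outputs": [],
"source": [
"# Pearson correlation with and without the large-living-area outliers (train is not modified)\n",
"keep = train['gr_liv_area'] <= 4500\n",
"r_all = train['gr_liv_area'].corr(train['saleprice'])\n",
"r_kept = train.loc[keep, 'gr_liv_area'].corr(train.loc[keep, 'saleprice'])\n",
"print(f'r with outliers: {r_all:.3f}, without: {r_kept:.3f}')"
]
},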
{
"cell_type": "markdown",
"id": "d54c95b6-6220-4cc3-8fc1-eddc6e2d58a1",
"metadata": {},
"source": [
"We can write a function that drops, in place, every row whose value for a given feature exceeds a threshold."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "0a538ee8-5660-4551-a8a8-b410233a65dd",
"metadata": {
"execution": {
"iopub.execute_input": "2022-05-30T20:48:53.709770Z",
"iopub.status.busy": "2022-05-30T20:48:53.709081Z",
"iopub.status.idle": "2022-05-30T20:48:53.713811Z",
"shell.execute_reply": "2022-05-30T20:48:53.713075Z",
"shell.execute_reply.started": "2022-05-30T20:48:53.709739Z"
},
"tags": []
},
"outputs": [],
"source": [
"def drop_outliers(dflist, feature, threshold):\n",
"    # Drop, in place, every row whose `feature` value is greater than `threshold`\n",
"    for df in dflist:\n",
"        df.drop(df.index[df[feature] > threshold], inplace=True)"
]
},
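{
"cell_type": "markdown",
"id": "sketch-without-outliers-md",
"metadata": {},
"source": [
"Because `drop_outliers` modifies the DataFrames in place, re-running a cell does not bring the dropped rows back;\n",
"the CSV has to be reloaded. This is a minimal sketch of `without_outliers`, a hypothetical non-mutating alternative\n",
"that returns a filtered copy instead. Like `drop_outliers`, it keeps rows whose value is missing. The small\n",
"DataFrame is toy input for illustration only."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-without-outliers",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"\n",
"def without_outliers(df, feature, threshold):\n",
"    # Hypothetical helper: return a copy of df without rows where feature > threshold.\n",
"    # NaN > threshold is False, so rows with missing values are kept, as in drop_outliers.\n",
"    return df.loc[~(df[feature] > threshold)].copy()\n",
"\n",
"\n",
"toy = pd.DataFrame({'gr_liv_area': [1500, 5000, np.nan, 2000]})\n",
"without_outliers(toy, 'gr_liv_area', 4500)  # keeps the 1500, NaN and 2000 rows"
]
},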
{
"cell_type": "code",
"execution_count": 9,
"id": "c430bccc-ed7c-4a4f-98b0-3fe523947e6c",
"metadata": {
"execution": {
"iopub.execute_input": "2022-05-30T20:50:39.651753Z",
"iopub.status.busy": "2022-05-30T20:50:39.651214Z",
"iopub.status.idle": "2022-05-30T20:50:40.476122Z",
"shell.execute_reply": "2022-05-30T20:50:40.475580Z",
"shell.execute_reply.started": "2022-05-30T20:50:39.651732Z"
},
"tags": []
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAgQAAAHfCAYAAAAvE8DnAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8qNh9FAAAACXBIWXMAAAsTAAALEwEAmpwYAADdmklEQVR4nOydd3xc1Zm/n3PvNGlm1OWGC8WmmE4gYCA4YEwvgRCqE0IJZBeSkMQkm82GzWbZ/EhwgA2EkqWH0LuBmGYCARsngZhmwMYOYIwtq42kGWnKvff8/jh3mjSSRrKkUTnP5wOWZu4998yd0Zz3vOX7CimlRKPRaDQazYTGKPUENBqNRqPRlB5tEGg0Go1Go9EGgUaj0Wg0Gm0QaDQajUajQRsEGo1Go9Fo0AaBRqPRaDQawFPqCWj6p7Gxo9RTGNVUVpYB0NbWVeKZjD/0vR1e9P0dXurrw6WewphCewg0Go1Go9Fog0Cj0Wg0Go0OGWg0RVERMDH6EPV0hKA9buc9JpMpnE0NyKZWZFMLsi2afdJjIuprMCbXYUytR1SGB3UNjUajGSq0QaDRFIEhJVt+c3evz0/54TdwNm7Geu1N7DfXYK/9J86GjWAVt4B7Zk7FPmB34i0dGLOmIcrLCl5Do9FohgttEGg0g0SmUshNDTifbubzp/+M9cnnvR4rqiowq8NYrR2ABNuBzi71L2B9uhnr080A2EIgZkzF3GdXxHaTEUKMxMvRaDQTHG0QaDRFIqVU7v9PN+Ns3Izc0gSOWtAd9xhRXYF54N6Yc2dj7Lw95uxZiPoahNdDld/I8zJIKSGeQLZ1IDc34Qv66XrlDUimkJ9+jvXp54gp9ZiH7IcxubYEr1ij0UwktEGg0fSCjHZif/QJ9lsf0Pjme6T+8gbEE/kHGQIxuY6Ks47DOmhfjLmzEUZxubpCCCgLIMoCMKWeyYu/waZf3YH8eBP22x8iNzcitzRiPfIsxt67Ii89E/0nq9Fohgv97aKZEOQm7MmUhd3Yir2lEbspgtMcwWptI97QitPShmyO4Hz6OXJzYy+DBTFmTMWYMRUxfQrC56X6u2fj2H13EjeK8PwL00TsNBNjp5k4mxqwX31DzeetD2g47fvU3fQzPNOnFDxXJx1qNJptQRsEmnGNtG3stz8k+vb7tD30HHJrM0Q7BzSGmFpP2YF7kohEMaZNgspwz7i+03fSIcC0xQNLCjS2m4w47RicN9/D/vu7JNes5/OjL8Zz3HyMKXU9jtdJhxqNZlvQBoFmXOJsaiB571Okli5HNrXSqwng9UBZAN9O03GqqhC1lYiaKowpdRhztsecPRNRGe4R/x8phGlgHrAnYmo9zl/+jtPajvXkcjzHHYbRi6dAo9FoBoM2CDTjCmdzI4nf/ZHU4y9kEv4APDtsh+P1IibXIaorEMFyCJUhvF4Apv3o3D5d/sW4+4cTY/oUJj9yHZtO/g50xLCeeRnPSQsKego0Go1mMGiDQDMukLZN4u7HSfzv3dAVB0BMrsV3xnF4Tzicmp2m9b3D78flP1B3/3Dg3X4a3lMWKmOnPYr19J/xnrIQUVNZ6qlpNJpxgDYINGMeu6GZlkuuJLFyNQBGXTWV31tE6GtHI7zqI17qHf5QIULleE88nNSjz0NXnNRTL+E99ShEqLzUU9NoNGMcbRBoxjT22x+y5bv/jbO1BQBj7k6Y8/YltrmZ2G/vzRw3Gnb4Q4WoDOM54ctYj78A0U6sp17C89WjSj0tjUYzxtHNjTRjltRLq4id+2OcrS2IcBDPMV/C8+UDEX5fqac27Bj1NXiOPQwMA9nShr18lRI60mg0mkGiDQLNqKciYFLlN/L+8736V7ou+x9IJPFsP41pj/0vxo4zSj3VEcWYPgXz0P0AcNZ/Ssetj5R4RhqNZiyjQwaaUU/3xkLOx59hLXsVHAdRU0ngmC/h3XF6CWdYOozd5yAbmnE+/CeRX99O+Zwd8czbp9TT0m
g0YxDtIdCMKZxPP88zBjwnLcAoD5R6WiVDCIE5/wBEfTU4Dl0/vArn862lnpZGoxmDaINAM2ZwWiJYz+YbA2ICGwNphMeD55jDMKrCyEg7XT/4f8hkqtTT0mg0YwxtEGjGBLIrjvXMK5CyIFiG58TDtTGQgwgHqb3mxyAE9tsfEv/1raWekkajGWNog0Az6pHJlPIMtEfBY+I5dr5SGtTkUTZ/f3z/chYAqXuXknrqpRLPSKPRjCW0QaAZ9bT+v/9DunFx84iDMCbVlHhGoxf/v5yFeegXAOj6z99ir/ukxDPSaDRjBW0QaEY1qZdWEb37SQCM/ffAnD2rxDMa3QjTpOxXixFT66ErQddl/4OMDay7o0ajmZhog0AzanG2NhP/6bUAiGmTMPffo8QzGhsY1ZWUX/vv4PHg/PMzuv7jOi1apNFo+kUbBJpRiXQcun7yG2SkHaMyhGfBPIShP67FYu61C4F/vxgA69lXSf7hiRLPSKPRjHb0N6ym5BRSIjT/+AS226yo9peXIcLB0k5yDOI94zi8Jx4OQOLqW7FefaPEM9JoNKMZrVSoKTndlQhlpJ3UA8+o5+buRPDYQ2l7b0OppjdmMExBlT/fxnd++T0aPvqE1Psb6Lrsf6i/fwnx2TuWaIYajWY0ow0CzahCSon10iqwHQiVYx6yX6mnNHZw8g2rDF/cCzZuQUY7aTr/Z5Tf+xuM6VNGfn4ajWZUo0MGmlGF89465OZGANW50Ost8YzGPiJYjveEw8HvxWlqpfOin+E0tpR6WhqNZpShDQLNqEF2xDJ5A8auO2LMnFraCY0jRE0lnmPng8+L8/EmYosux9nUUOppaTSaUYQ2CDSjAikl1st/VdLEZQHMg/ct9ZTGHca0SdTf8p8Q8CM3bib29cux//lZqael0WhGCdog0IwKnLUfIz/dDIDnsP0RAX+JZzQ+KTtsf8p//98QLENuaaLz65dj/f2dUk9Lo9GMArRBoCk5dlMrtlsSJ3acgbHTzBLPaHzj2X8Pgnf8P0RVBbKljc7zfkLijke1eJFGM8HRBoGm5LT+102QSILfi+ew/Us9nQmBucfOBO+7BmOXHcB2SFx9K13f/yUy0lHqqWk0mhKhDQJNSUm9sILOZ14BwDzkC4jyshLPaOJgzJpG8N7f4D15AQDWc68RPfHbpJ57tcQz02g0pUAbBJqSISMdxH/xOwDEjClqt6oZUURZgMAvf0DgF9+F8jJkcytdl/2Szu9dieOWf2o0momBNgg0JSP+y5uRTa2IYBme+V9ECFHqKU1IhBD4TjuG0JM3EZh/AADW8yuIHX8R7b+6FRFpz0hKVwTMEs9Wo9EMF9og0JSE1IsrST31EgBV/3YhoiJU4hlNDNLyxoX+q9lhCpNv/wXmgnlQHkDGE7T99o9s/OKZbPrWz9m85C4MnXio0YxbtHSxZsRxIu3Ef34DAOa8fQiddRyxa/5Q4llNEHqTN3aZtvgbmLvsgLHjdOw31+Cs/gAZ68Je/jrOOx/Stf9c5EH7aW+ORjMO0R4CzYgipSR+5U3I5lYIllH2i+/pxWUUIrxePAfuTei8r+DZZXsAZGMrjRf8jM5Fi7FWvVXaCWo0miFHGwSaESX16HNYz7wMQOBH38LYbnKJZ6TpC6MiRPnx8/F89SiE2xDJ/sf7dJ73E2Ln/QRr9fslnqFGoxkqtEGgGTHstf8kfuVNAHgWHoz3tKNLPCNNsRiT6/CedAST7v015n67A2CveovOs39I7KKfYa16SwsbaTRjHG0QaEYEGeuk6/v/DxJJxPQplP33ZTpUMAYpP3hvpj24hPo7rsS35xwA7FffUGqHZ16G57lXkMlUiWep0WgGg04q1Aw7Ukq6fn4Dzj8/A4+H8mv+TVcVjFUcSYObACoP3R/PjKnYb76P3NJI8p11tFx2FaKqAu/JR+A99SjMOduXdr4ajaZotEGgGXaStzyA9fSfAQj86ELMPXYu7YQ0Q4IQArH9dIztp+
NsacRe/QHyk03ISDvJux4nedfjGDvOwHPkwXgXzMOYuxPC1DoGGs1oRRsEmmEl+eRyEr9VZW7erxyJ95wTSzwjzXB
"text/plain": [
"<Figure size 504x504 with 3 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"drop_outliers([train],'gr_liv_area',4500)\n",
"discover_plot(train.gr_liv_area,train.saleprice);"
]
},
{
"cell_type": "markdown",
"id": "faa68621-e52d-4247-ac20-dc9976b03814",
"metadata": {},
"source": [
"As living area increases, so does sale price."
]
},
{
"cell_type": "markdown",
"id": "8310d45d-e9c2-4636-aa3d-1d6f1a47c491",
"metadata": {},
"source": [
"By subtracting the year a house was built from the year it was sold, we get the age of the house at the time of sale, which we can use as a feature in our model."
]
},
{
"cell_type": "code",
"execution_count": 107,
"id": "0e3b4f2d-539f-488b-be4b-351735cd02cc",
"metadata": {
"execution": {
"iopub.execute_input": "2022-05-29T08:04:35.388421Z",
"iopub.status.busy": "2022-05-29T08:04:35.388144Z",
"iopub.status.idle": "2022-05-29T08:04:37.347017Z",
"shell.execute_reply": "2022-05-29T08:04:37.346334Z",
"shell.execute_reply.started": "2022-05-29T08:04:35.388402Z"
},
"tags": []
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAgQAAAHfCAYAAAAvE8DnAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8qNh9FAAAACXBIWXMAAAsTAAALEwEAmpwYAADiuUlEQVR4nOy9eZwcVbn//z5VvcxMT8+WmexhTQKyBESBBBCEJIisgsomyhJF/cJV9IZ7vdfv5d7r19+9KBFREHBhUZBdCIYlbEEQsqBAWAyYsEmAZJJZe3rvqjq/P05VLzM9Mz2ZpWd6zvv1CqG7qk+dqpn0ec6zfB4hpZRoNBqNRqOZ1BjlnoBGo9FoNJryow0CjUaj0Wg02iDQaDQajUajDQKNRqPRaDRog0Cj0Wg0Gg3aINBoNBqNRgP4yj0BzeDs3NmzS5+rr68GoLs7MZLTGTfo+5vY6Pub2EyE+2tpCZd7ChMK7SHQaDQajUajPQSVjGkaCCGylvyuYFkOsVhqBGel0Wg0mvGINggqGCEEpNM429p26fOB6c3gM0d4VhqNRqMZj2iDoMKxWjvYeesfd+mzLRecijFr2gjPSKPRaDTjEZ1DoNFoNBqNRhsEGo1Go9FotEGg0Wg0Go0GbRBoNBqNRqNBGwQajUaj0WjQBoFGo9FoNBq0QaDRaDQajQZtEGg0Go1Go0EbBBqNRqPRaNAGgUaj0Wg0GrRBoNFoNBqNBm0QaDQajUajQRsEGo1Go9Fo0AaBRqPRaDQatEGg0Wg0Go0GbRBoNBqNRqNBGwQajUaj0WjQBoFGo9FoNBq0QaDRaDQajQZtEGg0Go1Go0EbBBqNRqPRaNAGgUaj0Wg0GrRBoNFoNBqNBm0QaDQajUajAXzlnoBm7JGZDPIf23De/wjZ1onsiUE6Az4TqqsQTfUYM1rIfOrjBGdNK/d0NRqNRjMGaINgEiFjCeyXN+G88TZkrL4nZCzIRJGRKPZ7H7Jj3Ub8+8/FPPNE/KctRgT8Yz9pjUaj0YwJ2iCYBEjbxnn5DeyXNoHlGgKGgZg1DWNGC6KhDgJ+sG1kNIZs68LZug2icTJ/e4vMf/6c1I13EvzamfjPOF4bBhqNRlOBaIOgwkm/9T7WfY8h27vUG9VBzIM+hrHf3oiqYL+fk1LS8Mn9iK1+jvj9TyC37ST5g1+QuuUPVH3/m/iPPnRsbkCj0Wg0Y4JOKqxgon94gh3n/7syBoTAOGhf/OeegnnIfgMaAwBCCAIHzmPKNd+j9uFf4//88eAzkVu3k/jGfxK/7P/D2bZzbG5Eo9FoNKOONggqFKe9i53f+h9kKg3hEL7Tl+A78hBEMDDksYzdZlD9/y4jtPJ6zMMXAGA9/jzRU79B+u5HkFKO9PQ1Go1GM8bokEGFIhrC1Hz2UwBkWqYM6hEoBXOvOdTc/L9YD/+J5I9+g2zvJPnf15FZ/Weq/9+3MWZPH/Y1NBqNRlMetIegQhGmybTf/IAp//l/RsQYyI4rBP6Tj6V21Y34Tz0OAHvDK0RP+z+kf78K6Tgjdi2NRqPRjB3aQ6DpF19TPYbPpL6+uu/B+mq44T9InLGEzu9djb29jeT/dwPyyedpXHE5/r1mA2BZDrFYaoxnPj4IhYL4fLtuc/t8pg7HaDSaMUMbBJp+MQJ+SKdxtrX1e05wv72Y+vsf0X3NbcRX/YnUhldpXbKM8DfOoumy8yAw9JyFSsHnM/DZNunt/T+/AZnRjJjEz0+j0Ywt2iDQDIjV2sHOW/84+IlzZuI7+dNYf3oBGY0T+dltJNesp+m6/wstLaM/0SIMd4cOxT0c0raRHd1K5TESzfsTg2QSmbEgY+EgkV0REpveUR80DTDcP6ahEjyrgiqkEwwgQtVQW4Mw1JynXfQ5fHN0XoZGoxkbtEGgGTGM3WbiP/sk7LUv42x6i/RrW9h+/FcJXnIegQvOQPjMMZ3PcHboTjSOSCbJvP0BcvM/sN75APujHdjb27
B3tIM9eK5EelcmLQSEQ4i6Wjpad1J16AHYe+yGsc+eu1QhotFoNKWiDQLNiCICfnyfPgxn7m7Iv76O/dEOUlffQubx56j+/76DOW+PMZ1PenvboB4Omc4gd3YgW9txdrQjd7RDNF7yNURtDUY4hFEXQlQFEX4f+H2YdbVI2yb1j23qOo6jDAnHAcuGVBqZTEMqpV4DSAmuxyH2wXZif3hCve8z8c/fg8DH9yW46GCqjjwEc2rToHObzDkcGo1maGiDQDMqGLOnM+W7X6Hndw8SvfkBnNe3EPv8P+H//GcIfvMcjKlTyjY3adnIbTtxPtiO/GA7cmdHv+caTfXIqiCiPowIhyBUjQjVQG01oqYaAv6siz87vvv3tO9dhLWjg9abVw4+p4wFsXgu/NAdpWpqI6m/vYX90U6wbDKb3iaz6W1iv38YAN+eswl+cn+CCxcQPPQAjF7VJIHpzaphlUaj0ZSANgg0o4ZRU0Xj//sWzqcXkfyPa3D+8RGZux8h8+BTBM49mcCXT8OY1jzq85BSqnj/1u3KCNi2E2y774mNdRhTpyCmTkE0NzLzv7+JTFslLejDRfh90FCn+kq4TPu3i7B3dLDt2ruUB2NnO862NuS2HWDZWO9+gPXuB8TufQx8JmL2dIw9ZmHsPgsRqqblglMxdLdKjUZTItog0Iw6vk8eQOjBG8jc8yipX96FbO8iffMfSP9uJb6lRxLw5JSFGJHrSSlx/vEh0VffIPnEOjLPvwzJvm5z0dKkFtHZ0xDTmvs0bTIb6rB29O89GCtETRVi95mw+0xM3KTGHe3ID1qVgbO9DSwb+d6H2O99iA2IqU1E7Aw1ZxyPnD1zxJ6tRqOpXLRBoBkTRMBP4LxT8Z++lPRtD5L+/R+R7V1Yjz6L9eiziOnN+I5bhP+4hZgH7aPc8iUi0xmct9/Hfn0L1guvYv/lNeSOdmK9TwyHMOZMx5g9AzFrKqK6akTvcawQpomYMRVmTMU89EBkMoXz/kfI9z7EeX8bpDPIHR30/PJeen55r3q2nzoU3zGH4lt4MKJmYt63RqMZXbRBoBlTRKia4DfOJnDR58msfpb07X/EeX0LcnsbmTtWkbljlWrEtOdsjI/tjTFtCqKpAVEfdpPxLGQihWxtI9PegfWPbWQ2vwcZq8+1jMY6Ah/fl4wtMWZPV2NUIKIqiDl/T5i/p/IebNuJ896HiLZO7I92qGd776Nk7n0UAn7MwxbgP/pQzCM+jrHnbO090Gg0gDYINGVCBPwETl1M4NTF2O9sxXpyLZkn1+K8vgWkxHlnK847W4c2ZlM95sf3wzxsAb7DF9D4iX2R23aWpqNQIQjTzSWYPZ3m80/BiSfpfOjPWM/+Bfulv0E6g/3ci9jPvajOn9KIeegB+A49EPOTB2DsNQdh6kREjWYyog0CTdkx95qDefFZBC8+C9ndg73pbey/bcF5+32c9i5kexeyuwdMUyXfBfwYU6cQ3H0GvllTSe82C/Nje6tkwLzdrjAMJrPwrxAC//w9CC6bRnDZF5DdPVhrX8J65i9Yz72oxJXaO7FW/xlr9Z/Vh6qDmPvuhbnfPIz952LO3wNj95lDCuFoNJqJiTYINKPGgL0Q+qO+GnabCicsAgauo/fG7e5ODHuukwFRH8b/2WPwf/YYlXj59lbsv7yq8i7++jqyvQsSKeyX38B++Y2Cz5rTm/HtNQffnrMwZ07FN6MFc0Yz5vRmzBktGOFQ0WuWUwdhtJQqNZpKRRsEmlGjlF4IA6Hr6IfHoAbZJ/ZRf77xRaSU2O9vI/3qZtKvbSb92hYyr23G6YwAKIXG7W2k1r5cdChRHcSoD2OEQ4i6EEY4hG/aFERDHSlhZD07+P2qmsPvI1rtB8smHU0ibRssR5WDpjPIVBq/Y0MmowScUmlkSr0vk97rwj+k0kjbQQT8iGCAeFUg+/9GfRijIYxRX4vRWIc5rRnfjGbM6S0YU+qL5lHo3z/NZEMbBJpRpeReCEXQdf
TDY6gGmeEzqTrkY1Qd8jEAAnOmk3r9LXZcfxeyqwfZFUF2R5WAUjQB8YRSVgRkIoWdSGEPQSY6OcjxXZJ+hqGHiUw
"text/plain": [
"<Figure size 504x504 with 3 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"for df in [train, kaggle]:\n",
"    # Age of the house (in years) when it was sold, computed for the whole column at once\n",
"    df['age'] = df['yr_sold'] - df['year_built']\n",
"\n",
"discover_plot(train.age, train.saleprice);"
]
},
{
"cell_type": "markdown",
"id": "0f9d613b-3aa6-4e89-a5fd-fa6627c06e62",
"metadata": {},
"source": [
"As a house ages, its sale price tends to go down."
]
},
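{
"cell_type": "markdown",
"id": "sketch-age-slope-md",
"metadata": {},
"source": [
"The regression line in the age plot can be summarized by its slope. This is a minimal sketch that assumes `train`,\n",
"with the `age` column created above, is in scope. `np.polyfit` with `deg=1` fits an ordinary least-squares line\n",
"and returns the slope first."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-age-slope",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Least-squares line through (age, saleprice); polyfit returns [slope, intercept]\n",
"slope, intercept = np.polyfit(train['age'], train['saleprice'], deg=1)\n",
"print(f'Fitted change in sale price per extra year of age: {slope:,.0f}')"
]
},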
{
"cell_type": "code",
"execution_count": 108,
"id": "51e87153-ae5f-406b-95ed-d085f255abbe",
"metadata": {
"execution": {
"iopub.execute_input": "2022-05-29T08:04:37.348305Z",
"iopub.status.busy": "2022-05-29T08:04:37.348040Z",
"iopub.status.idle": "2022-05-29T08:04:38.355622Z",
"shell.execute_reply": "2022-05-29T08:04:38.354979Z",
"shell.execute_reply.started": "2022-05-29T08:04:37.348282Z"
},
"tags": []
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAgQAAAHfCAYAAAAvE8DnAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8qNh9FAAAACXBIWXMAAAsTAAALEwEAmpwYAAC/uUlEQVR4nOy9d3wc1bn//z4zs7uSVt2WC9hU2XRsSAg2EBywzaW30GvABJILSQjX3Hu/l29yk9z87jcJDs0OIQ0IOPRiML2YUNwggGkGbEwAUyzbklar3dWWmTm/P85WaVVWbVfSeb9eJNLO7Jmz49WczznneT6PkFJKNBqNRqPRjGmMYndAo9FoNBpN8dGCQKPRaDQajRYEGo1Go9FotCDQaDQajUaDFgQajUaj0WjQgkCj0Wg0Gg1gFbsDmt7Ztq29x+M1NeUAtLV1DEd3xhz6/g4t+v4OHWP93jY0VBW7CyMKvUKg0Wg0Go1GCwLN8GIYAsMQxe6GRqPRaDqhBYFm2DAMgbN8Bc7yFVoUaDQaTYmhYwg0w4odUPEQZpH7odFoNJpctCDQDAnZKwCuq8tlaDQaTamjBYFm0EltDdiBdqzaKswTjtSiQKPRaEocLQg0Q4IdaMdpDYIQeJKrBTpuQKPRaEoXLQg0Q4pZ7cd+9HniTS34pk5ECIGuuK3RaDSlhxYEmiHFXr+J2PIXkM0BoqaJNWMPrMO+VuxuaTQajaYTWhBohgznrQ+Ir3oDUgsCjoP9xnqcTZspO/1ojIZ6QAcdajQaTSmgBYFmSHA2bcZZ+QYAxo4TML6+H4aUxJ98CdnWTvC0H+E97wQ8k8broEONRqMpAbQxkWbQcQPtxB97AQBjlx3xnX8Sxo4T8R52IL7TjgbTwG3aTvypV9K+BBqNRqMpLloQaAaNlC1x7Ma/QigCHouyM49BWJmFKHP3qXiP/iYA7nsbcT7fUqzuajQajSYLLQg0g0LKe6Bj8Z3E7n0SAPMb+2OMq+1yruewryEmjQcg8eTLSNcdzq5qNBqNJg9aEGgGDTvQTuL5NeC6iCo/xj7T8p4nTAPvsXMAkFubSTyzcji7qdFoNJo8aEGgGTTcYAj3w38CYB32NYTVfcUCc4cJiF2nABBdfCfScYaljxqNRqPJjxYEmgGTih1w1r0PUiIqK7Bm7tnr+8yD9gPA3bQZ+6mXh7qbGo1Go+kBLQg0AyJd0vjZlTjrPgDAOmg/hNl7PUNjfB3GnrsBEPvzA9rBUKPRaIqIFgSaAWMH2om99i6yPQyA5+D9+vxez+yZALgffpz2LdBoNBrN8KMFgWZQSKx5CwAxdRJGfW2f32fsOBFr1kwA4n95YAh6ptFoNJq+oAWBZsC4be04yWBCc+/8mQXdYVb78e7XCIC99i3c9zYMev80Go1G0ztaEGgGjLPuA5ASKisQu+xY8PvlhPGI8XWAiiXQaDQazfCjBYFmQEjbwV73PgDWjD0RZuFfKSEExgF7AZB4ZiXOJ18Mah81Go1G0ztaEGgGhP3iq8qmWIA1c69+t2PsvhOitgqkJH7bQ4PYQ41Go9H0BS0INAMidl/SpnjaLhh11f1uRxgG1sEzAEg88hzutpZB6Z9Go9Fo+oYWBJp+437RhP3SPwDwzJox4PbMGXsg6qohniB+5yMDbk+j0Wg0fUcLAk2/iT/4dDqY0NxrtwG3JzwefOedqNq+9wlkKDLgNjUajUbTN7Qg0PQLmbBJPPgMkAom7N2ZsFeEoOy8E6HcB+1h4vc+MfA2NRqNRtMntCDQ9Av7xVeR21pACMwD+h9MmI1Z7cd9+TXMfacDEL9jGTKeGJS2NRqNRtMzWhBo+kU8GUxoffNrGDVVg9auHQip+gaGgdzWgv3YC4PWtkaj0Wi6RwsCTZ9JVTWUXzal6w54zzhm0K8jqvzpaomxG+9AoIseaTQazVCjBYGmT6SqGsbufIToz5eoMscTx+H51sFDcj3Ptw4CwN3WQuL51U
NyDY1Go9Fk0IJA02fsQDt2UzOJtW8D4D3nBIQ1CMGEeTAnNWA07gxA7M/369LIGo1GM8RoQaApCPf9TRBPQLkP7+mDv12QjeeQmQA4b32I/cLaIb2WRqPRjHW0IND0GWk7OG99AID35HnKangIMXfaAWO3qQDEfnsrMmEP6fU0Go1mLKMFgabPOG99oOoWGAa+i789LNf0zJ0FhoH7z89J3P/ksFxTo9FoxiJaEGj6hIzHSSQzC8z9p2NOnTws1zUmjMN76nwAojfdidscGJbrajQazVhDCwJNn4jdvgzaw2AIrEMPHNZrl131HaiuhGCI2KK/DOu1NRqNZqygBYGmV9wvtxL9/V0AGPtNx6jtf1XD/mDU11J21UUAJB55Hju5UqHRaDSawUMLAk2PSCmJ/vJm6IipIkYH7T+8HRDKDMl3xtGYX9sHgI6f3IBsDw9vPzQajWaUowWBpkfiSx/B/vurAHjmHYLweob1+ma1H/vR54n/bTll82ZDmQ+5ZTvR/7lZexNoNBrNIKIFgaZb7FffJnbtrQB4TpqLtU9jcfoRCOG0BnE9Hir+fQEAicdeIPHQs0Xpj0aj0YxGtCDQ5MV+4z0i3/8Z2DbGblOp+Onlxe4SZrUfs6oCY49dAYj+f7/HeW9jkXul0Wg0owMtCDQ5SCmJP/g0kYv/CzqiiB0nUvGnXyL85ZmTkvv6qf+GE6ctjHnY1xC11RCNEbniF7CtZVj7oNFoNKMRLQjGONkDu7PxEyKX/oToT26EeAJj5x2ovP3/Ye04IWfgT+3rx+58BHfFaoQYXlEgfF7KF5wKZV5kUzOhs34M7aFh7YNGo9GMNqxid0BTPAxDqIC9N9bjvLEeZ8Mn6WPW/EPw/+9VuH9fS+zvr+KbOjFn4E/t69s1lUXoORgTx+M7ZT6xe5/A/Wob4Ut/QsUf/gdRXZz+aDQazUhHrxCMIbKX+GXCJv7I84Sv/Qvxe55IiwFRX0P5RadQteQnmDWV2IF2NfAHSy/Nz9x9J8x5h4AQOG99SPg7/4m7vbXY3dJoNJoRiV4hGOWkBIBhCBKPPI90XZyOONE/3ov8cmv6PLNxJ8wD90GOq8XcdUec5SswKiuGfTugUMzGnTHKfSQe+zvuBx8TPvNKKhb/FHPv3YvdNY1GoxlRaEEwijEMgbN8BXagHd/UiSQ+/ITYYy8gt2xPnYC5526IfadR9vV9cINh7JY2QG0JGO7IyPO39p2O9+jDCV/1/5BfbSN87r/hu+oivOeegDD0IphGo9H0Bf20HOXYgXbsljYij71I9NYH02LAc8p8qp76E95T5mGMrytyLweIEPjmzqLqnusxpk6CWJzY//sDkXMXYr/+XrF7p9FoNCMCLQhGOTKRwHluFfHH/g6ui6ivoere6/D/v6vw7LJjsbs3KKSyHsyvmvBfegbm/nsAqlxz5PyriVzxC5wP/1nkXmo0Gk1po7cMRjHulm3E7ngUuWUbAOZ+0yk782jE5q+Ivb2hS+bASCa1xeHGEpiHfQ1zxp44697HeWcD9oo12CvWYB60H95zTsCaOxthmcXuskaj0ZQUWhCMUmRHlNA5C5UYEALvcXMwZ+yJ8HqLnjI4HHj3m4bYfxqx1W9hv/w67rYWnNfeoeO1dxDj6/AeNwfruG9h7DNt1IgijUajGQhaEIxWbAcZaIcyL9a8Q/Ee/jXcEkwdHEqctjBMaqD83y5CbvqU6Auv4W76FLm9ldhflxH76zKMXafgPeEIzCNmYUzfRYsDjUYzZtGCYJQiqvxUP38b8YefxQ11FLs7RUUYAjmxAeuoQ7FqjiKx5m3s9zYit7bg/vNzojfdCTfdiZg4Diu53WDutTvGtF16re6YSut0R0hGhkaj0XSHFgSjGKOuBuHxAGNbEGRj1NXgOfRAvEcfhrPxU+KvvYv85+fIYAjZ1EziwWdIPPiMOtk0EA31RCePx5wwDru8HCrKEX71n1HlR274J/i8mN86GFlehvBXqOOVFV
BRhjB1rIJGoxkZaEGgGZMIITAmNWAdeiCec46DbS3E3vwA+UUTbnMrclsrOC5yy3biKd+Gnrjjkfyvl/vSIiEjJio
"text/plain": [
"<Figure size 504x504 with 3 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"discover_plot(train.lot_area, train.saleprice);"
]
},
{
"cell_type": "markdown",
"id": "a0733ef0-7608-4e14-9ce9-8b041a39594e",
"metadata": {},
"source": [
"Sale price increases with lot area as well, but we have the same problem with outliers."
]
},
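{
"cell_type": "markdown",
"id": "sketch-count-outliers-md",
"metadata": {},
"source": [
"Before dropping rows, it can help to check how many a threshold would remove. This is a minimal sketch that assumes\n",
"`train` is in scope. The value 80000 mirrors the threshold used in the next cell, and nothing is dropped here."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-count-outliers",
"metadata": {},
"outputs": [],
"source": [
"# Count (without dropping) the rows a lot_area threshold would remove\n",
"lot_threshold = 80000\n",
"n_removed = int((train['lot_area'] > lot_threshold).sum())\n",
"print(f'{n_removed} of {len(train)} rows have lot_area > {lot_threshold}')"
]
},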
{
"cell_type": "code",
"execution_count": 109,
"id": "16e00fab-bda6-4d0d-bfff-9613514b7940",
"metadata": {
"execution": {
"iopub.execute_input": "2022-05-29T08:04:38.356770Z",
"iopub.status.busy": "2022-05-29T08:04:38.356567Z",
"iopub.status.idle": "2022-05-29T08:04:38.362415Z",
"shell.execute_reply": "2022-05-29T08:04:38.361692Z",
"shell.execute_reply.started": "2022-05-29T08:04:38.356748Z"
},
"tags": []
},
"outputs": [],
"source": [
"drop_outliers([train],'lot_area',80000)"
]
},
{
"cell_type": "code",
"execution_count": 110,
"id": "0fc710a4-102f-4f1b-a989-91c7c38f51b0",
"metadata": {
"execution": {
"iopub.execute_input": "2022-05-29T08:04:38.363723Z",
"iopub.status.busy": "2022-05-29T08:04:38.363540Z",
"iopub.status.idle": "2022-05-29T08:04:39.413779Z",
"shell.execute_reply": "2022-05-29T08:04:39.413306Z",
"shell.execute_reply.started": "2022-05-29T08:04:38.363707Z"
},
"tags": []
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAgQAAAHfCAYAAAAvE8DnAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8qNh9FAAAACXBIWXMAAAsTAAALEwEAmpwYAADCNklEQVR4nOy9d3wc1bn//z4zs0VaNcuWbcB0YTo2BIINBAdsc+kt9BowgeRCEsI1997v5ZvcJDe/+02CQ7NDSAMCDi0Ug+nFhGIbE4ppBmxMAFMsW2W12pW2zMz5/XG2SquyarsrnffrRSLtzM6cHa9mPuc5z/N5hJRSotFoNBqNZlxjFHsAGo1Go9Foio8WBBqNRqPRaLQg0Gg0Go1GowWBRqPRaDQatCDQaDQajUaDFgQajUaj0WgAq9gD0PTP1q0dfW6vra0AoL29azSGM+7Q13dk0dd35Bjv17ahobrYQygrdIRAo9FoNBqNFgSa0cUwBIYhij0MjUaj0XRDCwLNqGEYAmf5CpzlK7Qo0Gg0mhJD5xBoRhU7qPIhzCKPQ6PRaDS5aEGgGRGyIwCuq9tlaDQaTamjBYFm2EktDdjBDqy6aswTjtSiQKPRaEocLQg0I4Id7MBpC4EQeJLRAp03oNFoNKWLFgSaEcWsCWA/8hzxplZ8209BCIHuuK3RaDSlhxYEmhHFXreR2PLnkS1BoqaJNWN3rMO+VuxhaTQajaYbWhBoRgznrQ+Ir3oDUgEBx8F+Yx3Oxk34Tz8ao6Ee0EmHGo1GUwpoQaAZEZyNm3BWvgGAsd1kjAP3xZCS+BMvIts7CJ32Q7znnYBn6iSddKjRaDQlgDYm0gw7brCD+KPPA2DstB2+80/C2G4K3sMOwHfa0WAauE3NxJ98Oe1LoNFoNJriogWBZthI2RLHbvwLhDvBY+E/8xiElQlEmbtuj/fobwDgvrcB5/PNxRquRqPRaLLQgkAzLKS8B7oW30ns3icAML++H8bEuh77eg77GmLqJAAST7yEdN3RHKpGo9Fo8qAFgWbYsIMdJJ57BVwXUR3A2Hu3vPsJ08B77BwA5JYWEk+vHM1hajQajSYPWhBohg03FMb98J8AWId9DWH13rHA3HYyYudpAEQX34l0nFEZo0aj0WjyowWBZsikcgecte+DlIiqSqyZe/T7PvOgfQFwN27CfvKlkR6mRqPRaPpACwLNkEi3NH5mJc7aDwCwDtoXYfbfz9CYNAFjj10AiP3pfu1gqNFoNEVECwLNkLGDHcT+8S6yIwKA5+B9B/xez+yZALgffpz2LdBoNBrN6KMFgWZYSLzyFgBi+6kY9XUDfp+x3RSsWTMBiP/5/hEYmUaj0WgGghYEmiHjtnfgJJMJzb3yVxb0hlkTwLtvIwD2mrdw31s/7OPTaDQaTf9oQaAZMs7aD0BKqKpE7LRdwe+XkychJk0AVC6BRqPRaEYfLQg0Q0LaDvba9wGwZuyBMAv/SgkhMPbfE4DE0ytxPvliWMeo0Wg0mv7RgkAzJOwXXlU2xQKsmXsO+jjGrjsg6qpBSuK3PTiMI9RoNBrNQNCCQDMkYvclbYp32wljQs2gjyMMA+vgGQAkHn4Wd2vrsIxPo9FoNANDCwLNoHG/aMJ+8TUAPLNmDPl45ozdERNqIJ4gfufDQz6eRqPRaAaOFgSaQRN/4Kl0MqG55y5DPp7wePCdd6I69r2PI8OdQz6mRqPRaAaGFgSaQSETNokHngZSyYT9OxP2ixD4zzsRKnzQESF+7+NDP6ZGo9FoBoQWBJpBYb/wKnJrKwiBuf/gkwmzMWsCuC/9A3Of6QDE71iGjCeG5dgajUaj6RstCDSDIp5MJrS+8TWM2uphO64dDKv+BoaB3NqK/ejzw3ZsjUaj0fSOFgSaAZPqaii/bEr3HfCeccywn0dUB9LdEmM33oFANz3SaDSakUYLAs2ASHU1jN35MNGfLVFtjqdMxPPNg0fkfJ5vHgSAu7WVxH
OrR+QcGo1Go8mgBYFmwNjBDuymFhJr3gbAe84JCGsYkgnzYE5twGjcEYDYn/6mWyNrNBrNCKMFgaYg3Pc3QjwBFT68pw//ckE2nkNmAuC89SH282tG9FwajUYz3tGCQDNgpO3gvPUBAN6T5ymr4RHE3GFbjF22ByD2m1uRCXtEz6fRaDTjGS0INAPGeesD1bfAMPBd/K1ROadn7iwwDNx/fk7ib0+Myjk1Go1mPKIFgWZAyHicRLKywNxvOub224zKeY3JE/GeOh+A6E134rYER+W8Go1GM97QgkAzIGK3L4OOCBgC69ADRvXc/qu+DTVVEAoTW/TnUT23RqPRjBe0IND0i/vlFqK/uwsAY9/pGHWD72o4GIz6OvxXXQRA4uHnsJORCo1Go9EMH1oQaPpESkn0FzdDV0w1MTpov9EdgFBmSL4zjsb82t4AdP34BmRHZHTHodFoNGMcLQg0fRJf+jD2318FwDPvEITXM6rnN2sC2I88R/yvy/HPmw1+H3JzM9H/uVl7E2g0Gs0wogWBplfsV98mdu2tAHhOmou1d2NxxhEM47SFcD0eKv99AQCJR58n8eAzRRmPRqPRjEW0INDkxX7jPTq/91OwbYxdtqfyJ5cXe0iYNQHM6kqM3XcGIPr//Q7nvQ1FHpVGo9GMDbQg0OQgpST+wFN0Xvxf0BVFbDeFyj/+AhGoyOyUXNdP/TeaOO0RzMO+hqirgWiMzit+DltbR3UMGo1GMxbRgmCck/1gdzZ8QuelPyb64xshnsDYcVuqbv9/WNtNznnwp9b1Y3c+jLtiNUKMrigQPi8VC04FvxfZ1EL4rB9BR3hUx6DRaDRjDavYA9AUD8MQKmHvjXU4b6zDWf9Jeps1/xAC/3sV7t/XEPv7q/i2n5Lz4E+t69u1VUUYORhTJuE7ZT6xex/H/WorkUt/TOXv/wdRU5zxaDQaTbmjIwTjiOwQv0zYxB9+jsi1fyZ+z+NpMSDqa6m46BSql/wYs7YKO9ihHvyh0ivzM3fdAXPeISAEzlsfEvn2f+I2txV7WBqNRlOW6AjBGCclAAxDkHj4OaTr4nTFif7hXuSXW9L7mY07YB6wN3JiHebO2+EsX4FRVTnqywGFYjbuiFHhI/Ho33E/+JjImVdSufgnmHvtWuyhaTQaTVmhBcEYxjAEzvIV2MEOfNtPIfHhJ8QefR65uTm1A+YeuyD22Q3/gXvjhiLYre2AWhIw3PKo87f2mY736MOJXPX/kF9tJXLuv+G76iK8556AMHQQTKPRaAaCvluOcexgB3ZrO52PvkD01gfSYsBzynyqn/wj3lPmYUyaUORRDhEh8M2dRfU912NsPxVicWL/7/d0nrsQ+/X3ij06jUajKQu0IBjjyEQC59lVxB/9O7guor6W6nuvI/D/rsKz03bFHt6wkKp6ML9qInDpGZj77Q6ods2d519N5xU/x/nwn0UepUaj0ZQ2eslgDONu3krsjkeQm7cCYO47Hf+ZRyM2fUXs7fU9KgfKmdQShxtLYB72NcwZe+CsfR/nnfXYK17BXvEK5kH74j3nBKy5sxGWWewhazQaTUmhBcEYRXZFCZ+zUIkBIfAeNwdzxh4Ir7foJYOjgXff3RD77UZs9VvYL72Ou7UV5x/v0PWPdxCTJuA9bg7Wcd/E2Hu3MSOKNBqNZihoQTBWsR1ksAP8Xqx5h+I9/Gu4JVg6OJI47RGY2kDFv12E3Pgp0ef/gbvxU2RzG7G/LCP2l2UYO0/De8IRmEfMwpi+kxYHGo1m3KIFwRhFVAeoee424g89gxvuKvZwioowBHJKA9ZRh2LVHkXilbex39uA3NKK+8/Pid50J9x0J2LKRKzkcoO5564Yu+3Ub3fHVFmnWyYVGRqNRtMbWhCMYYwJtQiPBxjfgiAbY0ItnkMPwHv0YTgbPiX+j3eR//wcGQojm1pIPPA0iQeeVjubBqKhnug2kzAnT8SuqIDKCkRA/WdUB5Dr/wk+L+Y3D0ZW+BGBSr
W9qhIq/QhT5ypoNJryQAsCzbhECIExtQHr0APwnHMcbG0l9uYHyC+acFvakFvbwHGRm5uJp3wb+uKOh/O/XuFLi4S
"text/plain": [
"<Figure size 504x504 with 3 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"discover_plot(train.lot_area, train.saleprice);"
]
},
{
"cell_type": "markdown",
"id": "9c2790c8-eb24-434e-a333-ca752faaf803",
"metadata": {},
"source": [
"This is a clearer picture. There is still some noise, but the scaler (a `QuantileTransformer` in the model pipeline) should be able to handle most of the rest."
]
},
{
"cell_type": "markdown",
"id": "2363762a-fd4c-4e1e-bdf0-c67f101980f7",
"metadata": {},
"source": [
"### Create a DataFrame for the model\n",
"Now split off the sale price (our target) as `y` and list the features to include in the feature set (`X`) for the model.\n",
"Also build `X_kaggle` from the same features so we can make a Kaggle submission later on."
]
},
{
"cell_type": "code",
"execution_count": 151,
"id": "8207024f-87b2-4c50-b6cc-f1b0ed7f75ce",
"metadata": {
"execution": {
"iopub.execute_input": "2022-05-29T09:14:09.528495Z",
"iopub.status.busy": "2022-05-29T09:14:09.528007Z",
"iopub.status.idle": "2022-05-29T09:14:09.534060Z",
"shell.execute_reply": "2022-05-29T09:14:09.533301Z",
"shell.execute_reply.started": "2022-05-29T09:14:09.528468Z"
},
"tags": []
},
"outputs": [],
"source": [
"y = train.saleprice\n",
"\n",
"include = [\n",
" 'gr_liv_area',\n",
" 'age',\n",
" 'lot_area',\n",
" 'neighborhood',\n",
" 'central_air',\n",
" 'overall_qual',\n",
" 'fireplaces',\n",
" 'full_bath',\n",
" 'half_bath',\n",
" 'ms_zoning',\n",
" 'street',\n",
" # 'low_qual_fin_sf',\n",
" 'land_contour',\n",
" 'lot_config',\n",
" # 'overall_cond', # points the wrong way\n",
" ] \n",
"X = train[include]\n",
"X_kaggle = kaggle[include]"
]
},
{
"cell_type": "markdown",
"id": "969ea58d-6494-4009-b40b-e53bcf9786da",
"metadata": {},
"source": [
"Among the selected features, every column with the `object` dtype (categorical text) is converted to dummy columns with `pd.get_dummies`, and the original column is dropped. `drop_first=True` leaves out one level per feature so the dummy columns are not perfectly collinear."
]
},
{
"cell_type": "code",
"execution_count": 152,
"id": "a51b40dc-0d8e-458f-bec9-268a6956f885",
"metadata": {
"execution": {
"iopub.execute_input": "2022-05-29T09:14:09.862460Z",
"iopub.status.busy": "2022-05-29T09:14:09.862132Z",
"iopub.status.idle": "2022-05-29T09:14:09.883430Z",
"shell.execute_reply": "2022-05-29T09:14:09.882816Z",
"shell.execute_reply.started": "2022-05-29T09:14:09.862433Z"
},
"tags": []
},
"outputs": [],
"source": [
"# convert all object (string) type columns into dummy columns\n",
"def make_dummies(df, drop_first=True):\n",
"    for col_name in df.columns:\n",
"        if df[col_name].dtype == 'O':\n",
"            dums = pd.get_dummies(df[col_name], prefix=col_name, dtype=int, drop_first=drop_first)\n",
"            df = df.drop(labels=[col_name], axis=1)\n",
"            df = df.join(dums)\n",
"    return df\n",
"\n",
"X = make_dummies(X)\n",
"# encode the Kaggle set without dropping a level, then keep exactly the training columns;\n",
"# levels the training set never saw become all-zero rows (the same as the dropped base level)\n",
"X_kaggle = make_dummies(X_kaggle, drop_first=False).reindex(columns=X.columns, fill_value=0)"
]
},
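{
"cell_type": "markdown",
"id": "b7e2c4d1-6a3f-4c8e-9f15-2d7a0e5b8c31",
"metadata": {},
"source": [
"Why encode the Kaggle set differently? Here is a minimal sketch on hypothetical toy data. The `toy_train` and `toy_test` frames are made up and are not from the Ames set. If `pd.get_dummies(..., drop_first=True)` runs on its own on a frame that lacks the training set's first category, it drops a different level. Reindexing that result then silently turns the dropped level into the base category. If you instead encode the second frame with every level kept and then reindex to the training columns, each row stays correct."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c4a91e37-2f6b-4d0a-8e52-7b1f3c9d6a04",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# hypothetical toy data: the test frame never contains level 'A'\n",
"toy_train = pd.DataFrame({'zone': ['A', 'B', 'C', 'B']})\n",
"toy_test = pd.DataFrame({'zone': ['B', 'C']})\n",
"\n",
"train_d = pd.get_dummies(toy_train['zone'], prefix='zone', dtype=int, drop_first=True)\n",
"\n",
"# naive: drop_first on the test frame drops 'B' (its own first level) instead of 'A'\n",
"naive = pd.get_dummies(toy_test['zone'], prefix='zone', dtype=int, drop_first=True)\n",
"naive = naive.reindex(columns=train_d.columns, fill_value=0)\n",
"\n",
"# aligned: keep every level, then select exactly the training columns\n",
"aligned = pd.get_dummies(toy_test['zone'], prefix='zone', dtype=int)\n",
"aligned = aligned.reindex(columns=train_d.columns, fill_value=0)\n",
"\n",
"print(list(train_d.columns))                   # ['zone_B', 'zone_C']\n",
"print(naive.to_numpy(dtype=int).tolist())      # [[0, 0], [0, 1]]: the 'B' row now looks like 'A'\n",
"print(aligned.to_numpy(dtype=int).tolist())    # [[1, 0], [0, 1]]"
]
},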
{
"cell_type": "markdown",
"id": "ab6c21b8-74b2-42bc-80ab-5866066c4aad",
"metadata": {},
"source": [
"Uncomment the block below to plot every feature used in the model against sale price. The plots give some insight into how linear each relationship is and how each feature affects the predicted sale price."
]
},
{
"cell_type": "code",
"execution_count": 153,
"id": "13ab6821-c284-48d0-ad3d-20410420bb9c",
"metadata": {
"execution": {
"iopub.execute_input": "2022-05-29T09:14:09.947477Z",
"iopub.status.busy": "2022-05-29T09:14:09.946842Z",
"iopub.status.idle": "2022-05-29T09:14:09.950534Z",
"shell.execute_reply": "2022-05-29T09:14:09.949804Z",
"shell.execute_reply.started": "2022-05-29T09:14:09.947448Z"
},
"tags": []
},
"outputs": [],
"source": [
"# for col in X.columns:\n",
"# discover_plot(X[col],y)\n",
"# plt.show()"
]
},
{
"cell_type": "markdown",
"id": "fb1beba2-a5cc-47d8-9c22-76645579c9b5",
"metadata": {},
"source": [
"### Noteworthy Discoveries"
]
},
{
"cell_type": "code",
"execution_count": 154,
"id": "c8d6079a-236a-407a-90be-bea8d98610dd",
"metadata": {
"execution": {
"iopub.execute_input": "2022-05-29T09:14:10.013814Z",
"iopub.status.busy": "2022-05-29T09:14:10.013516Z",
"iopub.status.idle": "2022-05-29T09:14:10.941033Z",
"shell.execute_reply": "2022-05-29T09:14:10.940199Z",
"shell.execute_reply.started": "2022-05-29T09:14:10.013781Z"
},
"tags": []
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAgQAAAHfCAYAAAAvE8DnAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8qNh9FAAAACXBIWXMAAAsTAAALEwEAmpwYAACWMklEQVR4nOzdeXxU1f34/9e5d2ayL+yogFICVEREW2UR4SOLta4VrWjdilhtK63WamtrtcvHXz+2ri2UUttaq1SttYrihgt+cWFxRVHUAG64EJYsk2SSzMy95/fHnZnMZL0JmbmT5P18PGKSO8c7J0My933PeZ/3UVprjRBCCCH6NcPrDgghhBDCexIQCCGEEEICAiGEEEJIQCCEEEIIJCAQQgghBBIQCCGEEALwed0B0bndu2u97kKPKinJA6CmpsHjnvQO8np1jbxeXdOXX68hQ4q87kKvIiMEQgghhJCAQAghhBAyZSCE6KJ8HYWGptYP5OUQUvKWIkRvJX+9QoiuaWii8o6HWh0eeOFpkC9vKUL0VjJlIIQQQggJCIQQQgghAYEQQgghkIBACCGEEEhAIIQQQggkIBBCCCEEEhAIIYQQAgkIhBBCCIEEBEIIIYRAAgIhhBBCIAGBEEIIIZCAQAghhBBIQCCEEEIIJCAQQgghBBIQCCGEEAIJCIQQQgiBBARCCCGEQAICIYQQQiABgRBCCCGQgEAIIYQQSEAghBBCCCQgEEIIIQQSEAghhBACCQiEEEIIgQQEQgghhEACAiGEEEIgAYEQQgghkIBACCGEEEhAIIQQQggkIBBCCCEEEhAIIYQQAgkIhBBCCIEEBEIIIYRAAgIhhBBCIAGBEEIIIZCAQAghhBCAz+sOCCHEvsjXUWhoaj4QaXCO+/yElLzFCeGW/LUIIXq3hiYq73go8W1urh+A/G+dBPnyFieEW/LXIoRwTWtN06vvEH3pdfTuStAalZ+HOmAYdrAO8gu87qIQopskIBBCuBJ98z2abrgd6833Uo5rgO2fsPuUSwlceDqBC89ABfye9FEI0X0SEAghOqQti/Dy+2hafi9YNgBqyEDUAcNQfh+6qgb748/RdSGa/ng3kWfXk3fjTzEPOsDjngshukICAiFEu3SokYaf3kj02fUAGOMOovTH36buzfLUdo1N+JuaCN3zGPY72wh968fk/flX+A77shfdzkqtkh/j8nIk+VFkBfktFEK0SdfUEvruLxNTBP5vnUzuVYsIWBFoERCo3ByKv38WHHcMDT/6LbqyhtDCn5F389X4j53iRfezT4vkx7iBF54myY8iK0gdAiFEK3ZlDfULf+YEA0qR+4vvkfeL76FyAh3+f74jD6XgXzejRg6HxiYafvC/hP+7OkO9FkLsCwkIhBAp7F17CZ3/E+z3PgDTIO/GnxD41smu/3/jwP0p+NfNGIeUgW3TeO0faPpn6ztjIUR2kYBACJFgffQZ9edehf3BDvD7yLvtGvwnzOryeYzBAyi48wbMKYcB0PS7v9K07B601j3dZSFED5GAQAgBgPXOVkLnXon+dCfk5ZC/7Ff450zr9vlUQT75y3+NL5ZD0LR0BU033I62rJ7qshCiB0lAIIQguu516i+4Gl1ZgyopIv+O/8N39BH7fF6VEyDvtmvwxUYZwnc/TMMPr0fX1u/zuYUQPUtSW4Xox7RtE77jvzTd9k+wbdTwIeT/9X8xx4zqsedQfh95v7uSpiEDCf/zIaLPbaTujB+Q9/uf7POyRK010Q8/xdqyDV0VRNfVY0WioDWNG95CDx2EcdABmIeMxTikDKO0uId+KiH6HgkIhOinrI8+pfG6P2K9+jYAxsFjyP/TLzGGD+7x51KmSe5Pv4MxZiSN1/8ZvWMnoW/9GP+ZXyfne2djDB3k6jxaa+yPP8Pa+BbRVzZjvfwWtXuqUn+u+OcdO1v9/8ZBB2BOm4xv2uH4phyGKkpfqWXd0Ii99WPsnbvR1bXYu/c6r7
XfB34fKi8XVVKEjkTS1oeOJOoixDeDijhFp6QuQv+ltGT5ZL3du2u97kKPKinJA6CmpsHjnvQOPf162Z98QdOdDxJ54EmIOpdP/zePJ/fn3+10WSFAfqi+3fX0IRd7GVjvf0jD1Tdhv/+hc8Dnw3fc0fjnTMOcNB61/1CUUmit0Xursbd/gr3tE6Kb3sV6ZTN6197WJ83LQQ0egCoqxFeYB0rhKxtFuLoW+/2PnCRJ2079f0wD89DxmNMOxzf9cOe5/d27EOpgHdZ7H2Bt2Yb97nasLduxP/y09XO2xTQwRu2PMWEM5iFjnY+Dx6AK87vVF7fi/47xzaAaG53AxO2/Y28wZEiR113oVSQg6AUkIOjfSv2gQw1EIi0uLi7v5HRjE3b5R0Rff4fo/3sZ6+W3Eo+p/YeS98vF+I75quv+7GtAAKCjFuF7VhH+6/3ovdWpD5oG5OdBUxjCbd89qwHFmEdOwjdlEoWHllHz3CsopYDU3Q7j/dENjVjvbsfa+CbR9ZuwNr2bCIaaf7A8fEcdinnEIRgj98MYORw1ZKBzN+/3QcTCrqpB761Gf7Eba9vH2Ns/wSr/yEnEbE9xIaqkCF9xAdE91RCNoiNRCDW07kPiB1QYBx2AMaEM85CyHgkStGVBfQO6LoSuqydnz16CDz2LX9voqEXEdkZyir4+g6biIlRBPhQVoIoLUUUF3Q6WvCQBQddIQNALdDcg0KFG7M93JR1I/qdu8c+e/Jhu53hH7Voe0O18DRQU5ABQX9fU6rF2fx077EeLx2wNlgWWhY5aTv19y3LefC3LeWOMHQ9Ew+iGcHP7WFutFBFLg2mgDANME3wmGAb4TJRpOhcuM/mY0Xws/rhS0BhGNzVBQxO6Kex8bmx0Pjc0okMNEGpyPjc0okON6IZGCDWgG5ogEnFeWqVAxT6bJkZRPjonx9lIyO+HgM/52meiQ40QakQH65xdCVtQw4cQWHgagQUndnkjop4ICOJ0YxORx9YSWf2CE6i0FQAohRoxDPPLX8J31GGYRx2KMWaU8+/SRn/aCghaPW99A9FXN2Ote4Po+k3Y2z7uUr/bpBTGgftjHDwGc8IYzIPHYBw8BmNASZv91FpDXYjCKYcS+ngn1pat2O9sw/7os/afo7gQY9gg1NBBqIJ8VH4u5ASc399ILNCIRJzfm7oQurYeXVePrgtB/T4G4Hk5qKJYcFBcgCoqjAUMBUnHnc/k5Tq/qy1pEn9rzX+PduLr5r/Z2EcggP/kY7ud+yEBQddIQNAL2HY3/om0RteHWl8sRf+lDJQvFtj4un+3p9DoptYXbpXjR7d5FXBLO8Gc1s2/t4ZyAq4OztuqP7GmKtCF/mgduyhFnQtUch/afFLlBCTxj3gw2JV+xo+3fN20dqYa4sGpbbubeui2WKCZ8uNmz/uGCvghJ6db/69h7MvvY/8jAYEQQgghpA6BEEIIISQgEEIIIQQSEAghhBACCQiEEEIIgQQEQgghhEACAiGEEEIgAYEQQgghkIBACCGEEEhAIIQQQggkIBBCCCEEEhAIIYQQAuh9+1n2Q7at2bu3zutu9BjZ/rhr5PXqGnm9uqYvv15d2e2wr73Ptqej10RGCIQQQgghAYEQQgghJCAQQgghBBIQCCGEEAIJCIQQQgiBBARCCCGEQAICIYQQQiABgRBCCCGQgEAIIYQQSEAghBBCCKR0sRCiG3RdiOiGTejqIKq0GN/UyajCfK+7JYTYBxIQCCFc01oTvu8xIiseQVs2RKPgM1Gmif/cUwicdSJKKa+7KYToBgkIhBCuhe97jPA/HoSCPJTPTBzXUcs5DuScfZJX3RNC7APJIRBCuKLrQkRWPNIqGACc7wvynJGD+pBHPRRiH4TD2Lsrve6FpyQgEEK4Et2wCW1ZrYKBOOUz0ZZFdP2mzHZMiB6gm8KE737Y6254yvOAYOPGjVx44YV89atf5bDDDuOEE07g3//+d0qbl156iTPPPJNJkyYxbdo0rr
vuOoLBYKtz1dfXc/311zNjxgwmTZrE/PnzefbZZ9t8Xi/PKURvpKuDELU6bhS1nHZC9EK+unqvu+ApTwOChx56iIU
"text/plain": [
"<Figure size 504x504 with 3 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"discover_plot(train.overall_cond,y);"
]
},
{
"cell_type": "markdown",
"id": "edd6a5ff-ba1c-49f7-84b3-335451557248",
"metadata": {},
"source": [
"This feature behaves the opposite of what you might expect. As the overall condition rating goes up, the sale price tends to go down. I suspect the condition ratings are not very reliable, so `overall_cond` is left out of the model."
]
},
{
"cell_type": "code",
"execution_count": 155,
"id": "64e52733-3ddf-4157-9d72-caea7b509646",
"metadata": {
"execution": {
"iopub.execute_input": "2022-05-29T09:14:10.942849Z",
"iopub.status.busy": "2022-05-29T09:14:10.942506Z",
"iopub.status.idle": "2022-05-29T09:14:11.798102Z",
"shell.execute_reply": "2022-05-29T09:14:11.797337Z",
"shell.execute_reply.started": "2022-05-29T09:14:10.942825Z"
},
"tags": []
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAgQAAAHfCAYAAAAvE8DnAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8qNh9FAAAACXBIWXMAAAsTAAALEwEAmpwYAAB8cUlEQVR4nO3deXxU1d348c+5s2RfWMKigFICCCJgXQChUFmsxYWCe12qYtHnkVZrwWqt1vaxv8cF0RZK0fZRa617kYooyFKxsqkoiqKGpSqoBMi+z3LP74+bmWTIJJkJmdxZvu/XK5rce3LnnEnI/d5zvuccpbXWCCGEECKlGXZXQAghhBD2k4BACCGEEBIQCCGEEEICAiGEEEIgAYEQQgghkIBACCGEEIDT7gqI9h06VNWh78vLywCgoqKuM6sTV6SNySMV2ilt7FoFBTl2VyGhSA9BktJaU/XES9SueMPuqgghhEgA0kOQpPSBw5Tf8QdwGGT/628YPbvZXSUhhBBxTHoIkpTqkY9Kd4PfxP/uR3ZXRwghRJyTgCBJKbcL9yknAuB750ObayOEECLeSUCQxNLGjQbA/470EAghhGibBARJLG3cKADM3V9glpTbWxkhhBBxTQKCJJY2epiVRwCSRyCEEKJNEhAkMZXuxn3ycAD823faXBshhBDxTAKCJOcsHACA+c0hm2sihBAinklAkOQcvXsAoA+W2lwTIYQQ8UwCgiTn6GMFBObBEptrIoQQIp5JQJDkHL0aewgOlaK1trk2Qggh4pUEBEnO0bun9YnXh67o2CZJQgghkp8EBEkuMGQAoGXYQAghRCskIEhyRvc8cDoASSwUQgjROgkIkpwyDFTP7oAkFgohhGidBAQpQBVYWx/rQ2U210QIIUS8koAgBRiNMw3MQ9JDIIQQIjwJCFKA6mUNGUgOgRBCiNZIQJACjILAaoXSQyCEECI8CQhSQKCHwJQeAiGEEK2QgCAFGAWNQwaHS9GmaXNthBBCxCMJCFKAakwqxOdHl8tqhUIIIVqSgCAFBIYMQPIIhBBChCcBQQpQ+bngdAKyOJEQQojwJCBIAUqppqmHhySxUAghREsSEKSIYGKh9BAIIYQIQwKCFBGceijLFwshhAhDAoIUoWRxIiGEEG2QgCBFGMEeAskhEEII0ZIEBCkisBaB9BAIIYQIRwKCFBHoIdCHy9B+v821EUIIEW8kIEgRqnGWAX4TXVZpb2WEEELEHQkIUkRgx0OQbZCFEEK0JAFBqsjLBrcLAPOQ5BEIIYQIJQFBirBWKwwkFkoPgRBCiFASEKSQYGKhzDQQQghxBAkIUojq2bgWgfQQCCGEOIIEBCnE6NsTAPObgzbXRAghRLxx2l0BETsOh4FSiry8DACMIcfhAdT+b4LHOoPPZ1JT09Bp1xNCCNH1JCBIYkop8HgwvzkMgCM7EwDfvmL8X3yNcjqO+jXcfXpCJ1xHCCGEvSQgSHK+4lIOPfEyALqiyjro93Poj8+h8rKP+voFV5+PcWzvo76OEEIIe0kOQSrJzgKlgGbBgRBCCIEEBClFOQzIyQJAV1bbXBshhBDxRAKCFBMYJpAeAiGEEM1JQJBiVG4OID0EQgghQklAkGKCiYTSQyCEEKIZCQhSjMpr6iHQWttcGyGEEPFCAoJUk9vYQ+DzQ02dvXURQggRNyQgSDEqLwcM68cumxwJIYQIkIAgxSinA9W3AABz/wGbayOEECJeSECQgox+fQAw90lAIIQQwiIBQQpS/RqXGq6oQlfV2FsZIYQQcUECghSkCrqD2wXIsIEQQgiLBAQpSBkGqnFDInPXFzL9UAghhAQEqcoxdCAAev8BzM/+Y3NthBBC2E22P05RamA/1Lf6o/fuw//vd6Gqxuo1SHejUO1+v8bqVfDu2YeqqsVfVRc40coLHvm1av2kUk2HlGoqq5rKqsZj3px0AMyq+nbrnKhSoY2QGu2UNkYpMx2joPvRX0dERG
npL457ptmxH1Hwntva92uNWVsH8isghIhTKj0NXK4Ofa9htP9wI5pIQCCEEEIIySEQQgghhAQEQgghhEACAiGEEEIgAYEQQgghkIBACCGEEEhAIIQQQggkIBBCCCEEEhAIIYQQAgkIhBBCCIEEBEIIIYRAAgIhhBBCILsdJgTT1JSUVEf9fXl5GQBUVNR1dpXihrQxeaRCO6WNXaugICfish39O5to2npPpIdACCGEEBIQCCGEEEICAiGEEEIgAYEQQgghkIBACCGEEEhAIIQQQggkIBBCCCEEEhAIIYQQAgkIhBBCCIEEBEIIIYRAli5OWrq6ltpN72KWVeJNS8c5djQqO9PuagkhhIhTEhAkGa01nmdX4n3qZWq1Bp8P0zBQDgeuK87Hfek5KKXsrqYQQog4IwFBkvE8uxLP48sgKwNHmgsA7TfRPr91HEi77Fw7qyiEECIOSQ5BEtHVtXifehmyMlBOR8g55XRAVgbep15G19TaVEMhhIhTHg/moVK7a2ErCQiSiG/LdrTf3yIYCFBOB9rvx7d5e9dWTAgh4pxu8OD52z/troatbA8Itm7dyrXXXsupp57KqFGjmD59Os8991xImY0bN3LxxRczcuRIxo0bx1133UVlZWWLa9XU1HDPPfcwYcIERo4cyaxZs1i3bl3Y17XzmrGiyyvB52+7kM9vlRNCCBHCWV1jdxVsZWtA8NJLL3HNNdfQv39/Fi5cyNKlS7n88svxer3BMlu3bmXOnDn06dOHpUuX8otf/IL169czZ84cTNMMud7cuXNZsWIFN910E4888giFhYXMnTuXDRs2hJSz+5qxovJzoZXegSCnwyonhBAilM9ndw1sZVtS4TfffMPdd9/Nz372M3784x8Hj48bNy6k3AMPPMDgwYN5+OGHMQwrfikoKODaa69l1apVTJ8+HYANGzawadMmFi9ezLRp0wAYO3Ys+/bt495772XSpElxcc1Yco4djXI40L7wwwba50M5HDjHjY55XYQQIuF00cNbvLKth+DFF18E4Morr2y1THFxMTt27GDGjBnBmyzA+PHj6d27N6tXrw4eW7NmDTk5OUyZMiV4TCnFzJkz2bt3L7t3746La8aSys7EdcX5UFOHPmLoQPv8UFOP64rzUVmyHoEQQrRgartrYCvbAoJ33nmHQYMG8frrr/O9732PYcOGMXHiRBYsWIDH4wGgqKgIgMGDB7f4/iFDhrBr167g17t27aKwsDDkhgwwdOjQkGvZfc1Yc196Du5rZqE8XszKGszSCnRVDcrjxX3NLNyXntNldRFCiISS4j0Etg0ZHDx4kIMHD3LPPfdw0003UVhYyJYtW3j00Uf55ptvePDBBykvLwcgLy+vxffn5eWxc+fO4Nfl5eUcf/zxYcsFzjf/v13X7AilIC8vI/JvuOEifDO+S+0Ty/EXl0BBN3JmX4CzT8+jqkc8cjYOjUT1/iSYVGgjpEY7pY1xTuvErHcnsS0g0FpTU1PDwoULOecc66l1zJgx1NfX89hjj/HTn/40WLa1lfWOPN7WCnyRlu2Ka8aS1prqv/6T6seWgWmivT5wOqlfvZHsa2eR/aMZslKhEEKEkep/GW0LCPLz8wGYMGFCyPGJEyfy2GOP8fHHHwfLBJ7Am6uoqAh5Is/Pz2+1HDQ9vdt9zY7QGioq6iIq2/DMK8GVCp1pLhTg95uYPj8Vf3qOunpvUq1UGIjmI31/ElEqtBFSo53Sxq5VUJATVXnT54+LesdSW++JbTkEQ4YMafO8YRjBMflwY/BFRUUhY/aFhYXs2bOnxRS/wPh+4PXsvmYsyUqFQghxFFI8h8C2gCAwje/I+fwbNmxAKcVJJ51Enz59GDFiBCtWrAi5KW/evJni4mLOOuuskOtVVlayfv36kOstX76cgQMHUlhYCGD7NWMpZKVCv4lZUW0lFVbWgN+UlQqFEKINOsUDAtuGDCZOnMjEiRP57W9/S1lZGYMHD2bLli08+eSTXH
rppRx77LEAzJs3j9mzZ3PLLbdwySWXUFxczIIFCxg1ahRnn3128HqTJk1izJgx3HHHHZSXl9OvXz+WL1/Otm3bWLJ
"text/plain": [
"<Figure size 504x504 with 3 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"discover_plot(train.low_qual_fin_sf,y);"
]
},
{
"cell_type": "markdown",
"id": "7d00dcde-3e6a-40d3-a198-fde1fbd76948",
"metadata": {},
"source": [
"Here's an example of a feature I think is simply not useful. Most houses have no low-quality finished square footage at all, so there isn't enough meaningful data for `low_qual_fin_sf` to be included."
]
},
{
"cell_type": "markdown",
"id": "80436214-6f1e-44d2-bd8a-cf7f296a593d",
"metadata": {},
"source": [
"The data set isn't huge, so we'll use a 60/40 train/test split to keep a reasonably large test set. No `random_state` is passed, so every run of the split shuffles the rows differently and the scores change slightly from run to run. The split is created once, separately from the models, so every model below is trained and scored on the same split. Re-running the split and the models several times shows how consistent the results are. In my testing this model is quite consistent."
]
},
{
"cell_type": "code",
"execution_count": 156,
"id": "780b860f-641e-45b8-abe7-a7aff0716381",
"metadata": {
"execution": {
"iopub.execute_input": "2022-05-29T09:14:11.799352Z",
"iopub.status.busy": "2022-05-29T09:14:11.798934Z",
"iopub.status.idle": "2022-05-29T09:14:11.805253Z",
"shell.execute_reply": "2022-05-29T09:14:11.804579Z",
"shell.execute_reply.started": "2022-05-29T09:14:11.799326Z"
},
"tags": []
},
"outputs": [],
"source": [
"X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.4)"
]
},
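{
"cell_type": "markdown",
"id": "d2f6a8b3-9c14-4e7a-b05d-3a8e1c6f2b97",
"metadata": {},
"source": [
"Here is a small sketch of the seeding behaviour described above. It uses synthetic data from `sklearn.datasets.make_regression` rather than the Ames data. Passing the same integer `random_state` reproduces the same split (42 here is an arbitrary choice). Leaving it out, as this notebook does, gives a new shuffle on every call. Fixing the seed is handy when results must be reproducible exactly. Leaving it unset is a quick way to see how much the scores vary between splits."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e9b3c7d5-1a2f-4b6e-8c40-5d9f2a7e1b63",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn.datasets import make_regression\n",
"from sklearn.model_selection import train_test_split\n",
"\n",
"# synthetic stand-in data (not the Ames set)\n",
"X_demo, y_demo = make_regression(n_samples=100, n_features=3, noise=10.0, random_state=0)\n",
"\n",
"# the same integer seed gives an identical split every time\n",
"a_train, a_test, _, _ = train_test_split(X_demo, y_demo, test_size=.4, random_state=42)\n",
"b_train, b_test, _, _ = train_test_split(X_demo, y_demo, test_size=.4, random_state=42)\n",
"print(np.array_equal(a_test, b_test))  # True\n",
"\n",
"# without a seed each call shuffles differently; only the sizes are fixed\n",
"c_train, c_test, _, _ = train_test_split(X_demo, y_demo, test_size=.4)\n",
"print(len(c_train), len(c_test))  # 60 40"
]
},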
{
"cell_type": "markdown",
"id": "3abdcea3-eea9-464b-8114-f718f9d54aa6",
"metadata": {},
"source": [
"Then fit the first model: a basic `LinearRegression`, with a `QuantileTransformer` scaling the features in the same pipeline."
]
},
{
"cell_type": "code",
"execution_count": 157,
"id": "f44d32c1-dac1-4164-9a69-9cb0750b378f",
"metadata": {
"execution": {
"iopub.execute_input": "2022-05-29T09:14:11.806782Z",
"iopub.status.busy": "2022-05-29T09:14:11.806456Z",
"iopub.status.idle": "2022-05-29T09:14:11.892078Z",
"shell.execute_reply": "2022-05-29T09:14:11.891386Z",
"shell.execute_reply.started": "2022-05-29T09:14:11.806761Z"
},
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"--Test--\n",
"RMSE: 37720.42\n",
"R2 Score: 0.79\n",
"\n",
"--Train--\n",
"RMSE: 32402.64\n",
"R2 Score: 0.82\n"
]
}
],
"source": [
"pipe = Pipeline([\n",
" ('scale', QuantileTransformer()),\n",
" ('model', LinearRegression())\n",
"])\n",
"model = pipe.fit(X_train,y_train)\n",
"test_predict = model.predict(X_test)\n",
"train_predict = model.predict(X_train)\n",
"print(f'\\\n",
"--Test--\\n\\\n",
"RMSE: {mean_squared_error(y_test, test_predict ,squared=False):.2f}\\n\\\n",
"R2 Score: {r2_score(y_test, test_predict):.2f}\\n\\n\\\n",
"--Train--\\n\\\n",
"RMSE: {mean_squared_error(y_train, train_predict ,squared=False):.2f}\\n\\\n",
"R2 Score: {r2_score(y_train, train_predict):.2f}\\\n",
"')"
]
},
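{
"cell_type": "markdown",
"id": "720d24fd-a752-4eed-b5df-3be0e62b10c6",
"metadata": {},
"source": [
"The Root Mean Squared Error (RMSE) shows roughly how far off our predictions are, in USD. Each error is squared before averaging, so large misses count more than small ones and RMSE runs somewhat higher than a plain average error. Here the test RMSE is roughly 37,700 USD and the training RMSE roughly 32,400 USD.<br>\n",
"The R2 score is the share of the variance in sale price that the model explains (1.0 would be a perfect fit). The train and test scores are close (0.82 vs. 0.79), which suggests the model is not badly overfitting. Overfitting would show up as a training score clearly higher than the test score. As expected, the model does somewhat worse on the unseen test data.<br>\n",
"Next, try a Lasso."
]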
},
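{
"cell_type": "markdown",
"id": "f1a4d6c8-3b5e-4f90-a27c-6e8b0d2f4a15",
"metadata": {},
"source": [
"To make the two metrics concrete, here is a minimal sketch on hypothetical numbers. The `y_true` and `y_pred` arrays are made up and are not output from the models here. The sketch computes RMSE and R2 straight from their definitions and checks them against scikit-learn. Comparing RMSE with the plain mean absolute error shows how squaring lets a single large miss pull RMSE up. Recent scikit-learn releases deprecate the `squared=False` argument used in these cells in favour of `root_mean_squared_error`. Taking `np.sqrt` of `mean_squared_error` works either way."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a8c2e5f7-4d1b-4a3c-9e68-1f7b3d5c9e20",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn.metrics import mean_squared_error, r2_score\n",
"\n",
"# hypothetical sale prices and predictions in USD (not model output)\n",
"y_true = np.array([200000., 150000., 310000., 125000.])\n",
"y_pred = np.array([210000., 140000., 280000., 135000.])\n",
"\n",
"errors = y_true - y_pred\n",
"rmse = np.sqrt(np.mean(errors ** 2))   # root of the mean squared error\n",
"mae = np.mean(np.abs(errors))          # plain average absolute error, for comparison\n",
"r2 = 1 - np.sum(errors ** 2) / np.sum((y_true - y_true.mean()) ** 2)\n",
"\n",
"print(round(float(rmse), 2), float(mae))  # 17320.51 15000.0\n",
"print(np.isclose(rmse, np.sqrt(mean_squared_error(y_true, y_pred))))  # True\n",
"print(np.isclose(r2, r2_score(y_true, y_pred)))  # True"
]
},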
{
"cell_type": "code",
"execution_count": 158,
"id": "9d11d0b6-3b2e-4054-8102-43c3f7041e52",
"metadata": {
"execution": {
"iopub.execute_input": "2022-05-29T09:14:11.893071Z",
"iopub.status.busy": "2022-05-29T09:14:11.892836Z",
"iopub.status.idle": "2022-05-29T09:14:12.034623Z",
"shell.execute_reply": "2022-05-29T09:14:12.033989Z",
"shell.execute_reply.started": "2022-05-29T09:14:11.893051Z"
},
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"--Test--\n",
"RMSE: 37768.26\n",
"R2 Score: 0.79\n",
"\n",
"--Train--\n",
"RMSE: 32467.83\n",
"R2 Score: 0.82\n"
]
}
],
"source": [
"pipe = Pipeline([\n",
" ('scale', QuantileTransformer()),\n",
" ('model', LassoCV())\n",
"])\n",
"model = pipe.fit(X_train,y_train)\n",
"test_predict = model.predict(X_test)\n",
"train_predict = model.predict(X_train)\n",
"print(f'\\\n",
"--Test--\\n\\\n",
"RMSE: {mean_squared_error(y_test, test_predict ,squared=False):.2f}\\n\\\n",
"R2 Score: {r2_score(y_test, test_predict):.2f}\\n\\n\\\n",
"--Train--\\n\\\n",
"RMSE: {mean_squared_error(y_train, train_predict ,squared=False):.2f}\\n\\\n",
"R2 Score: {r2_score(y_train, train_predict):.2f}\\\n",
"')"
]
},
{
"cell_type": "markdown",
"id": "ea2e7079-3dfc-419f-a3c4-09077aabe272",
"metadata": {},
"source": [
"And now a Ridge"
]
},
{
"cell_type": "code",
"execution_count": 159,
"id": "891b339f-6ca5-452c-b225-3b659a5a749b",
"metadata": {
"execution": {
"iopub.execute_input": "2022-05-29T09:14:12.035545Z",
"iopub.status.busy": "2022-05-29T09:14:12.035290Z",
"iopub.status.idle": "2022-05-29T09:14:12.109181Z",
"shell.execute_reply": "2022-05-29T09:14:12.108432Z",
"shell.execute_reply.started": "2022-05-29T09:14:12.035527Z"
},
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"--Test--\n",
"RMSE: 37818.28\n",
"R2 Score: 0.79\n",
"\n",
"--Train--\n",
"RMSE: 32482.25\n",
"R2 Score: 0.82\n"
]
}
],
"source": [
"pipe = Pipeline([\n",
" ('scale', QuantileTransformer()),\n",
" ('model', RidgeCV())\n",
"])\n",
"model = pipe.fit(X_train,y_train)\n",
"test_predict = model.predict(X_test)\n",
"train_predict = model.predict(X_train)\n",
"print(f'\\\n",
"--Test--\\n\\\n",
"RMSE: {mean_squared_error(y_test, test_predict ,squared=False):.2f}\\n\\\n",
"R2 Score: {r2_score(y_test, test_predict):.2f}\\n\\n\\\n",
"--Train--\\n\\\n",
"RMSE: {mean_squared_error(y_train, train_predict ,squared=False):.2f}\\n\\\n",
"R2 Score: {r2_score(y_train, train_predict):.2f}\\\n",
"')"
]
},
{
"cell_type": "markdown",
"id": "a0f0de54-b8c0-4999-bf5f-f66ebfc7cfde",
"metadata": {},
"source": [
"There is hardly any difference in output between the three models. I believe this is due to the relatively clean input and the type of scaling applied, which leave the Lasso and Ridge penalties little to correct. The scatter plots below show the predicted prices across pairs of transformed features. With the `rocket_r` colormap, darker dots are higher price predictions."
]
},
{
"cell_type": "code",
"execution_count": 160,
"id": "e9f73528-d63e-4932-b961-9f92a1b53b59",
"metadata": {
"execution": {
"iopub.execute_input": "2022-05-29T09:14:12.110114Z",
"iopub.status.busy": "2022-05-29T09:14:12.109871Z",
"iopub.status.idle": "2022-05-29T09:14:12.114997Z",
"shell.execute_reply": "2022-05-29T09:14:12.114440Z",
"shell.execute_reply.started": "2022-05-29T09:14:12.110094Z"
},
"tags": []
},
"outputs": [],
"source": [
"def model_scatter(x, y):\n",
"    # x and y are column positions in X; X_trans and new_predict are not defined\n",
"    # in this cell, so they must already exist as globals before calling\n",
"    figname = str(x) + '-' + str(y) + '.png'\n",
"    f, ax = plt.subplots(figsize=(8, 7))\n",
"    # 'rocket_r' is a seaborn colormap, available once seaborn has been imported\n",
"    plt.scatter(X_trans[:, x], X_trans[:, y], c=new_predict, cmap='rocket_r', alpha=0.5)\n",
"    plt.xlabel(X.columns[x])\n",
"    plt.ylabel(X.columns[y])\n",
"    plt.title('Predicted Sale Price')\n",
"    f.savefig(figname, backend='Cairo')\n",
"    plt.show()"
]
},
{
"cell_type": "code",
"execution_count": 161,
"id": "d008af5d-50d3-4981-b9b8-b234fce79405",
"metadata": {
"execution": {
"iopub.execute_input": "2022-05-29T09:14:12.115799Z",
"iopub.status.busy": "2022-05-29T09:14:12.115615Z",
"iopub.status.idle": "2022-05-29T09:14:14.157868Z",
"shell.execute_reply": "2022-05-29T09:14:14.157349Z",
"shell.execute_reply.started": "2022-05-29T09:14:12.115783Z"
},
"tags": []
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAgcAAAHQCAYAAAA1cskxAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8qNh9FAAAACXBIWXMAAAsTAAALEwEAmpwYAAEAAElEQVR4nOy9d5AkaXnn/3nfzCzf1d7M9HjT43bWsAusBRaQWA4nFklIJ4wkJB2EQifD6UCnIO4XpwvZkDmhk5BDikOBBEgggZAwy4Iwy7LA+t3xvqe9LV+Vme/z++Ot7pme6Z7psT3m/UQQ7FRlZb5vVna93/exSkQEh8PhcDgcjiZ6pQfgcDgcDofj2sKJA4fD4XA4HAtw4sDhcDgcDscCnDhwOBwOh8OxACcOHA6Hw+FwLMCJA4fD4XA4HAtw4sDhuI4YHBxk27ZtfPjDHz7na9cSH/zgB9m2bduKjuHTn/4027Zt4zvf+c6KjmMprvXxOW4+/JUegMNxrfOd73yHd73rXQtey2QybNy4kbe85S284x3vwPO8FRrdpTE4OMhnPvMZXvva17Jjx46VHg71ep1PfvKT/PM//zODg4PUajU6OjrYtGkTL3/5y/m5n/u5lR7iAj796U/za7/2a/P/VkqRzWYZGBjg7W9/Oz/0Qz+0coNzOC4BJw4cjmXyxje+kVe84hWICGNjY3zmM5/hN3/zNzl48CC/8Ru/sWLj6u/v59lnn70ogXLy5En+5E/+hP7+/hUXB1EU8e53v5unnnqKV77ylbzpTW8ik8kwODjI9773PT7ykY9cc+Jgjne+853s3r0bEWFwcJBPfepTfOADH2BkZIT3vve95/38W97yFt7whjcQBMFVGK3DcX6cOHA4lsnOnTt5y1veMv/v//yf/zOvf/3r+dSnPsUv/uIv0tXVtejnSqUSuVzuio1LKUUymbxi579afOUrX+Gpp57i3e9+N//jf/yPs94fGRlZgVEtj7vuuouHHnpo/t9ve9vbeOihh/jLv/xLfuZnfgbfX/yndu7Z8DzvurU+OW5MXMyBw3GR5HI57rjjDkSEEydOAPDqV7+ad77znbz44ou85z3v4c477+TNb37z/GeOHj3Kr/7qr3L//fdzyy238OpXv5rf+Z3foVKpnHX+733ve/zYj/0Yt956K/feey//63/9r0WPO1fMwRe/+EXe+c53ctddd3Hbbbfxute9jv/9v/83jUaDT3/60/Pukl/7tV9j27ZtbNu2jXe+853znxcRPv7xj/Pwww9z2223cccdd/DOd76Txx9//Kxr1et1fud3fof777+fW2+9lR/+4R/mm9/85rLv57FjxwC45557Fn2/r69vwb8PHTrE//f//X+84Q1v4I477uC2227j4Ycf5pOf/OSyr9loNPjIRz7CG97wBnbv3s1dd93Fe9/7Xl588cVln2MxVq1axebNmymVSkxNTQGwbds2PvjBD/Ltb3+bH//xH+eOO+7gfe97H7B0zEGj0eAv//Ivectb3sJtt93GnXfeycMPP8zf/d3fLTiuWCzye7/3e/zAD/wAt9xyC3fffTe/8iu/Mv9cOhwXirMcOBwXiYjML2jt7e3zrw8NDfHud7+bhx56iB/8wR+cX9Cff/553v3ud5PP53n7299Ob28ve/fu5WMf+xhPPfUUH/vYx+bNys888ww/9VM/RTab5Wd/9mdpaWnh3/7t3/jABz6w7PH94R/+IR/5yEfYsmULP/mTP0l3dzfHjx/nS1/6Ev/1v/5XXvrSl/Le976Xj3zkI7z97W/nzjvvBFhgAfnVX/1VPv/5z/O6172Ohx9+mEajwec+9zl++qd/mg9/+MO85jWvmT/2V37lV3jkkUd48MEHeeCBBzh+/Di/8Au/wJo1a5Y13rVr1wLw2c9+lnvuuYdUKnXO45944gm+973v8apXvYo1a9ZQrVb5whe+wIc+9CGmp6f5L//lv5zz82EY8p73vIennn
qKt7zlLfzET/wEpVKJT37yk/z4j/84f/d3f8fu3buXNfYzaTQaDA8P4/s++Xx+/vXnn3+eL37xi/zoj/4ob33rW897jve85z088cQT3H///bz5zW8mmUyyf/9+vvSlL/GOd7wDsMLgx37sxxgaGuJtb3sbW7duZXx8nI9//OP8yI/8CP/0T/9Ef3//Rc3DcRMjDofjnDz++OMyMDAgH/7wh2VyclImJydlz5498uu//usyMDAgP/qjPzp/7IMPPigDAwPyyU9+8qzzvOlNb5LXve51UiwWF7z+pS99SQYGBuSf/umf5l97+9vfLrt27ZLDhw/Pv1av1+Vtb3ubDAwMyB//8R/Pv37ixImzXnvmmWdkYGBA3vnOd0qtVltwPWOMGGMWzO30a585rn/4h39Y8HoYhvLWt75VHnzwwfnzfOMb35CBgQH5wAc+sODYL3/5yzIwMCADAwNnnf9M6vW6vPWtb5WBgQG588475ed+7ufkT/7kT+Rb3/qWNBqNs44vl8tnvRbHsbzjHe+Ql7zkJQs+80//9E8yMDAgjz/++Pxrf/M3fyMDAwPy9a9/fcE5isWivPKVr5R3vOMd5x3z3Hn/8R//USYnJ2ViYkKeeeYZed/73icDAwPyy7/8y/PHzt2Hb33rW0ue5/Tx/cVf/IUMDAzI7//+7y86zzl+4zd+Q3bv3i179uxZcMzg4KDccccdZ30nDsdycJYDh2OZfPjDH15gutda8+pXv/qsYMS2tjYefvjhBa/t27ePffv28Qu/8As0Go15UzPAnXfeSSaT4Vvf+hYPP/wwk5OTPPXUU7zuda9j48aN88clEgl+8id/kve///3nHetnP/tZAN7//vefFY+glFrWfD/72c+SzWZ57Wtfu2C8YN0nH/7whzl69CgbN27kkUceAeA973nPguNe+9rXsnHjRo4cOXLe6yUSCT72sY/x//7f/+Pf//3f+Y//+A++9rWvAdDZ2ckHP/jBBS6aTCYz/9/1ep1KpYKIcN999/HEE09w+PDhc6ZQfvazn2XTpk3s2rXrrPnde++9/PM//zO1Wu28FgzgrBiJIAh461vfyoc+9KEFr2/fvp177733vOcD+NznPkdrays///M/f9Z7WluPsIjwuc99jpe+9KX09PQsmEc6neb222+/INeOwzGHEwcOxzJ5+9vfzkMPPYRSinQ6zYYNG2hrazvruLVr154VXHbo0CHgbIFxOhMTEwDzfuJNmzaddcyWLVuWNdZjx46hlGL79u3LOn4xDh06RLlcPudiNjk5ycaNGzlx4gRaazZs2HDWMZs3b16WOADIZrO8733v433vex+lUolnn32WRx55hE9+8pN84AMfoL+/f979US6X+ZM/+RP+/d//neHh4bPOVSgUzju/Wq22ZIwDwPT0NKtWrTrvuH/+53+eu+66az6VcdOmTYsGoS52f5bi2LFj7Nix45zBplNTU8zMzPDNb35zyXnMCQmH40Jw4sDhWCbr169f1q4vnU4v+d5P//RP88ADDyz63pxvWkSAxXf4c++dDxFZtoXgXOfo6Ojg93//95c8ZuvWrcs6z8WQy+W49957uffee9m+fTsf+tCH+PSnPz0vDt7//vfzta99jR/90R/lpS99Ka2trfi+z3/8x3/wt3/7txhjzjuugYGBBXUKzqSjo2NZYx0YGLjkZ+NimLu39957Lz/7sz97Wc/tuLlx4sDhuAqsX78esLu48y0i69atA05ZG05nsdcWY+PGjXzjG99g37593HrrrUsedy4BsX79eo4ePcptt91GNps95/XWrl2LMYajR4+eJRgOHz68rDGfi9tuuw2A0dFRwFoFvva1r/GWt7yF//W//teCYx977LFlnXP9+vVMT09z9913X5O76w0bNnD48GEajQaJRGLRYzo6Osjn85RKpWW7KxyO5XDt/UU4HDcgO3fuZGBggH/4h39YNL0siiJmZmYA61+//fbbefTRRxeY4xuNBn/7t3+7rO
u96U1vAuAP/uAPaDQaZ70/t+Oc89vPzs6edcwP/dAPYYzhD/7gDxa9xpwbBJjPWvjrv/7rBcc88sgjy3Yp7Nmzh7G
"text/plain": [
"<Figure size 576x504 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAgcAAAHQCAYAAAA1cskxAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8qNh9FAAAACXBIWXMAAAsTAAALEwEAmpwYAAEAAElEQVR4nOzdeXhdV3no/+/ae5950DzLkizZ8hTbcSbHzkQGIBBCSICEAEkYCiXloS2lPNByube37aW3pQHapL0MLVDgl4YwBBJCBpKQkMRxRju241myZM2zdOZp7/X749iyFcm27FjW4PfzPH4Sn72197tk6ez3rPWutZTWWiOEEEIIcYgx2wEIIYQQYm6R5EAIIYQQE0hyIIQQQogJJDkQQgghxASSHAghhBBiAkkOhBBCCDGBJAdCzCOdnZ0sW7aMu++++7ivzSVf/vKXWbZs2azG8Mtf/pJly5bx4osvzmocxzLX4xNnH2u2AxBirnvxxRe5/fbbJ7zm9/tZvHgxN9xwAx/96EcxTXOWontrOjs7eeCBB7jmmmtYsWLFbIdDOp3m/vvv51e/+hWdnZ2kUimKi4tpbGxk/fr1fPrTn57tECf45S9/yV/91V+N/10pRSAQoLm5mVtuuYX3ve99sxecEG+BJAdCTNN73vMeLr/8crTW9Pf388ADD/C1r32N/fv383d/93ezFldNTQ3btm07pQSlq6uLe+65h5qamllPDnK5HHfccQdbtmzhiiuu4Prrr8fv99PZ2ckrr7zCt7/97TmXHBx22223sXr1arTWdHZ28rOf/YwvfelL9Pb28pnPfOaEX3/DDTdw3XXX4XK5zkC0QpyYJAdCTNPKlSu54YYbxv/+4Q9/mHe961387Gc/48/+7M8oLS2d8utisRjBYHDG4lJK4fF4Zuz6Z8qTTz7Jli1buOOOO/jrv/7rScd7e3tnIarpueCCC7j22mvH//7+97+fa6+9lu9973v80R/9EZY19Vvt4Z8N0zTnbe+TWJik5kCIUxQMBlm3bh1aazo6OgC46qqruO2229i5cyef/OQnOf/883nve987/jVtbW188Ytf5NJLL+Wcc87hqquu4h//8R9JJBKTrv/KK6/woQ99iDVr1rBx40b+9m//dsrzjldz8Nhjj3HbbbdxwQUXsHbtWt75znfy93//92QyGX75y1+OD5f81V/9FcuWLWPZsmXcdttt41+vtebee+/lpptuYu3ataxbt47bbruNzZs3T7pXOp3mH//xH7n00ktZs2YNH/jAB3juueem/f1sb28HYMOGDVMer6ysnPD3lpYW/uZv/obrrruOdevWsXbtWm666Sbuv//+ad8zk8nw7W9/m+uuu47Vq1dzwQUX8JnPfIadO3dO+xpTqaqqoqmpiVgsxvDwMADLli3jy1/+Mi+88AK33nor69at48477wSOXXOQyWT43ve+xw033MDatWs5//zzuemmm/jJT34y4bxoNMrXv/513v72t3POOedw8cUX8xd/8RfjP5dCnCzpORDiFGmtxx9oRUVF4693d3dzxx13cO211/KOd7xj/IG+Y8cO7rjjDsLhMLfccgsVFRXs3r2bH//4x2zZsoUf//jH493Kr7/+Oh//+McJBAJ86lOfIhQK8dvf/pYvfelL047vm9/8Jt/+9rdZsmQJH/vYxygrK+PgwYM8/vjj/Omf/ikXXnghn/nMZ/j2t7/NLbfcwvnnnw8woQfki1/8Ig8//DDvfOc7uemmm8hkMjz00EN84hOf4O677+bqq68eP/cv/uIveOKJJ7jyyiu57LLLOHjwIJ/73Oeora2dVryLFi0C4MEHH2TDhg14vd7jnv/SSy/xyiuv8La3vY3a2lqSySSPPvooX/3qVxkZGeGP//iPj/v12WyWT37yk2zZsoUbbriBj3zkI8RiMe6//35uvfVWfvKTn7B69eppxf5mmUyGnp4eLMsiHA6Pv75jxw4ee+wxbr75Zm
688cYTXuOTn/wkL730Epdeeinvfe978Xg87N27l8cff5yPfvSjQD4x+NCHPkR3dzfvf//7Wbp0KQMDA9x777188IMf5Be/+AU1NTWn1A5xFtNCiOPavHmzbm5u1nfffbceGhrSQ0NDeteuXforX/mKbm5u1jfffPP4uVdeeaVubm7W999//6TrXH/99fqd73ynjkajE15//PHHdXNzs/7FL34x/tott9yiV61apVtbW8dfS6fT+v3vf79ubm7W//qv/zr+ekdHx6TXXn/9dd3c3Kxvu+02nUqlJtzPcRztOM6Eth197zfHdd999014PZvN6htvvFFfeeWV49d59tlndXNzs/7Sl7404dzf/e53urm5WTc3N0+6/pul02l944036ubmZn3++efrT3/60/qee+7Rzz//vM5kMpPOj8fjk16zbVt/9KMf1eedd96Er/nFL36hm5ub9ebNm8df+8EPfqCbm5v1H/7whwnXiEaj+oorrtAf/ehHTxjz4ev+/Oc/10NDQ3pwcFC//vrr+s4779TNzc3685///Pi5h78Pzz///DGvc3R83/3ud3Vzc7O+6667pmznYX/3d3+nV69erXft2jXhnM7OTr1u3bpJ/yZCTIf0HAgxTXffffeErnvDMLjqqqsmFSMWFhZy0003TXhtz5497Nmzh8997nNkMpnxrmaA888/H7/fz/PPP89NN93E0NAQW7Zs4Z3vfCeLFy8eP8/tdvOxj32ML3zhCyeM9cEHHwTgC1/4wqR6BKXUtNr74IMPEggEuOaaaybEC/nhk7vvvpu2tjYWL17ME088AcAnP/nJCeddc801LF68mAMHDpzwfm63mx//+Mf86Ec/4pFHHuGZZ57h6aefBqCkpIQvf/nLE4Zo/H7/+P+n02kSiQRaay655BJeeuklWltbjzuF8sEHH6SxsZFVq1ZNat/GjRv51a9+RSqVOmEPBjCpRsLlcnHjjTfy1a9+dcLry5cvZ+PGjSe8HsBDDz1EQUEBn/3sZycdM4z8iLDWmoceeogLL7yQ8vLyCe3w+Xyce+65JzW0I8RhkhwIMU233HIL1157LUopfD4fDQ0NFBYWTjpv0aJFk4rLWlpagMkJxtEGBwcBxseJGxsbJ52zZMmSacXa3t6OUorly5dP6/yptLS0EI/Hj/swGxoaYvHixXR0dGAYBg0NDZPOaWpqmlZyABAIBLjzzju58847icVibNu2jSeeeIL777+fL33pS9TU1IwPf8Tjce655x4eeeQRenp6Jl0rEomcsH2pVOqYNQ4AIyMjVFVVnTDuz372s1xwwQXjUxkbGxunLEKd6vtzLO3t7axYseK4xabDw8OMjo7y3HPPHbMdhxMJIU6GJAdCTFN9ff20PvX5fL5jHvvEJz7BZZddNuWxw2PTWmtg6k/4h4+diNZ62j0Ex7tGcXExd9111zHPWbp06bSucyqCwSAbN25k48aNLF++nK9+9av88pe/HE8OvvCFL/D0009z8803c+GFF1JQUIBlWTzzzDP88Ic/xHGcE8bV3Nw8YZ2CNysuLp5WrM3NzW/5Z+NUHP7ebty4kU996lOn9dri7CbJgRBnQH19PZD/FHeih0hdXR1wpLfhaFO9NpXFixfz7LPPsmfPHtasWXPM846XQNTX19PW1sbatWsJBALHvd+iRYtwHIe2trZJCUNra+u0Yj6etWvXAtDX1wfkewWefvppbrjhBv72b/92wrmbNm2a1jXr6+sZGRnh4osvnpOfrhsaGmhtbSWTyeB2u6c8p7i4mHA4TCwWm/ZwhRDTMfd+I4RYgFauXElzczP33XfflNPLcrkco6OjQH58/dxzz+Wpp56a0B2fyWT44Q9/OK37XX/99QB84xvfIJPJTDp++BPn4XH7sbGxSee8733vw3EcvvGNb0x5j8PDIMD4rIX//M//nHDOE088Me0hhV27dtHf3z/lscM1DYeHVY4ecz9af38/P/vZz6Z1v/e9730MDAzwgx
/8YMrjR7dvNlx//fWMjY3x7//+75OOHW63YRhcf/31bNu2jUcffXTK6wwNDc1onGJhkp4DIc4ApRT/9E//xB133MF
"text/plain": [
"<Figure size 576x504 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAgkAAAHQCAYAAAAru/mCAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8qNh9FAAAACXBIWXMAAAsTAAALEwEAmpwYAAEAAElEQVR4nOz9eZRdV3ngf3/3PsOdb82lkkqz5JLnmcETxgyxCYPBJCF0mAKBhpWVzkCnIZ3F+mV1emVcBDrQCQkJydtkEYZmCIQOEDAYjDGOwROeZM1VqpJqvnXnM+zn/WNXlSSrSpZlWaqS9mctY6vuqXP3OSXqPHfvZz+PEhHBcRzHcRznafTZHoDjOI7jOCuTCxIcx3Ecx1mSCxIcx3Ecx1mSCxIcx3Ecx1mSCxIcx3Ecx1mSCxIcx3Ecx1mSCxIcZxUZGRlhx44dfOxjHzvh11aSD37wg+zYseOsjuFLX/oSO3bs4Mc//vFZHcdyVvr4nPOXf7YH4Dgr3Y9//GPe9ra3HfO1fD7Pli1buP3223nLW96C53lnaXTPzcjICF/+8pd5xStewUUXXXS2h0O73ebzn/88X/nKVxgZGaHVatHd3c3WrVt50YtexHve856zPcRjfOlLX+L3fu/3Fv+slKJQKDA0NMSb3vQmXv/615+9wTnOaeCCBMc5Sa95zWt4yUtegogwPj7Ol7/8Zf7oj/6IXbt28Yd/+IdnbVyDg4M8/PDDpxSoHDx4kI9//OMMDg6e9SAhSRLe/va388ADD3DzzTfz2te+lnw+z8jICPfffz+f+MQnVlyQsOCtb30rl112GSLCyMgIX/jCF/jABz7AoUOHeO973/uM33/77bfz6le/miAIzsBoHefkuSDBcU7SxRdfzO2337745//0n/4Tr3rVq/jCF77Ab/7mb9Lb27vk99VqNYrF4vM2LqUUmUzmeTv/mfKd73yHBx54gLe//e389//+3497/dChQ2dhVCfn2muv5bbbblv88xvf+EZuu+02PvnJT/Jrv/Zr+P7Sv2oX/m54nrdqZ6Occ5vLSXCcU1QsFrnqqqsQEYaHhwF42ctexlvf+lYee+wx3vWud3HNNdfwute9bvF79u3bx+/+7u9y4403cumll/Kyl72MP/3TP6XRaBx3/vvvv59f/uVf5vLLL+f666/nf/yP/7HkcSfKSfjmN7/JW9/6Vq699lquuOIKbr31Vv7n//yfRFHEl770pcVllN/7vd9jx44d7Nixg7e+9a2L3y8ifOYzn+GOO+7giiuu4KqrruKtb30r995773Hv1W63+dM//VNuvPFGLr/8cn7hF36Bu++++6Tv5/79+wG47rrrlnx9YGDgmD/v3r2bP/iDP+DVr341V111FVdccQV33HEHn//850/6PaMo4hOf+ASvfvWrueyyy7j22mt573vfy2OPPXbS51jK2rVr2bZtG7VajenpaQB27NjBBz/4QX70ox/x5je/mauuuor3ve99wPI5CVEU8clPfpLbb7+dK664gmuuuYY77riDf/qnfzrmuGq1yp//+Z/zyle+kksvvZQXv/jF/M7v/M7i30vHOVVuJsFxTpGILD7Yurq6Fr8+OjrK29/+dm677TZ+7ud+bvHB/rOf/Yy3v/3tlMtl3vSmN7FmzRqeeOIJPv3pT/PAAw/w6U9/enG6+aGHHuJXf/VXKRQKvPvd76ZUKvH//t//4wMf+MBJj+8jH/kIn/jEJ9i+fTvveMc76Ovr48CBA3zrW9/iv/yX/8ILXvAC3vve9/KJT3yCN73pTVxzzTUAx8yI/O7v/i5f//rXufXWW7njjjuIooivfe1rvPOd7+RjH/sYL3/5yxeP/Z3f+R2+/e1vc8stt3DTTTdx4MABfuM3foP169ef1Hg3bNgAwFe/+lWuu+46stnsCY+/7777uP/++3npS1/K+vXraTabfOMb3+BDH/oQMzMz/Of//J9P+P1xHPOud72LBx54gNtvv5
1f+ZVfoVar8fnPf543v/nN/NM//ROXXXbZSY396aIoYmxsDN/3KZfLi1//2c9+xje/+U1+6Zd+iTe84Q3PeI53vetd3Hfffdx444287nWvI5PJsHPnTr71rW/xlre8BbABwi//8i8zOjrKG9/4Ri644AImJib4zGc+wy/+4i/yxS9+kcHBwVO6DsdBHMc5oXvvvVeGhobkYx/7mExNTcnU1JQ8/vjj8vu///syNDQkv/RLv7R47C233CJDQ0Py+c9//rjzvPa1r5Vbb71VqtXqMV//1re+JUNDQ/LFL35x8WtvetOb5JJLLpE9e/Ysfq3dbssb3/hGGRoakr/8y79c/Prw8PBxX3vooYdkaGhI3vrWt0qr1Trm/YwxYow55tqOfu+nj+uzn/3sMV+P41je8IY3yC233LJ4nh/84AcyNDQkH/jAB4459t///d9laGhIhoaGjjv/07XbbXnDG94gQ0NDcs0118h73vMe+fjHPy4//OEPJYqi446v1+vHfS1NU3nLW94iV1999THf88UvflGGhobk3nvvXfzaP/zDP8jQ0JB8//vfP+Yc1WpVbr75ZnnLW97yjGNeOO///b//V6ampmRyclIeeughed/73idDQ0Py27/924vHLtyHH/7wh8ue5+jx/e3f/q0MDQ3Jhz/84SWvc8Ef/uEfymWXXSaPP/74MceMjIzIVVddddzPxHGeDTeT4Dgn6WMf+9gxU/paa172spcdl7TY2dnJHXfccczXnnzySZ588kl+4zd+gyiKFqegAa655hry+Tw//OEPueOOO5iamuKBBx7g1ltvZcuWLYvHhWHIO97xDt7//vc/41i/+tWvAvD+97//uHwFpdRJXe9Xv/pVCoUCr3jFK44ZL9hllY997GPs27ePLVu28O1vfxuAd73rXccc94pXvIItW7awd+/eZ3y/MAz59Kc/zf/5P/+Hf/u3f+Ouu+7ie9/7HgA9PT188IMfPGbpJp/PL/53u92m0WggItxwww3cd9997Nmz54RbL7/61a+ydetWLrnkkuOu7/rrr+crX/kKrVbrGWc0gONyKIIg4A1veAMf+tCHjvn6hRdeyPXXX/+M5wP42te+RkdHB7/+679+3Gta25ViEeFrX/saL3jBC+jv7z/mOnK5HFdeeeWzWvJxnKdzQYLjnKQ3velN3HbbbSilyOVybN68mc7OzuOO27Bhw3FJaLt37waODzSONjk5CbC4jrx169bjjtm+fftJjXX//v0opbjwwgtP6vil7N69m3q9fsKH2tTUFFu2bGF4eBitNZs3bz7umG3btp1UkABQKBR43/vex/ve9z5qtRoPP/ww3/72t/n85z/PBz7wAQYHBxeXRer1Oh//+Mf5t3/7N8bGxo4719zc3DNeX6vVWjYHAmBmZoa1a9c+47h//dd/nWuvvXZxC+TWrVuXTFZd6v4sZ//+/Vx00UUnTEqdnp5mdnaWu+++e9nrWAgoHOdUuCDBcU7Spk2bTupTYC6XW/a1d77zndx0001Lvrawdi0iwNKf+BdeeyYictIzBic6R3d3Nx/+8IeXPeaCCy44qfOcimKxyPXXX8/111/PhRdeyIc+9CG+9KUvLQYJ73//+/ne977HL/3SL/GCF7yAjo4OfN/nrrvu4h//8R8xxjzjuIaGho6pc/B03d3dJzXWoaGh5/x341Qs3Nvrr7+ed7/73af13I4DLkhwnDNi06ZNgP1U90wPk40bNwJHZh+OttTXlrJlyxZ+8IMf8OSTT3L55Zcve9yJAolNmzaxb98+rrjiCgqFwgnfb8OGDRhj2Ldv33GBw549e05qzCdyxRVXAHD48GHAzhJ873vf4/bbb+d//I//ccyx99xzz0mdc9OmTczMzPDiF794RX7a3rx5M3v27CGKIsIwXPKY7u5uyuUytVrtpJcxHOfZWHn/z3Ccc9DFF1/M0NAQn/3sZ5fclpYkCbOzs4Bdf7/yyiu58847j5mmj6KIf/zHfzyp93
vta18LwF/8xV8QRdFxry98Al1Y169UKscd8/rXvx5jDH/xF3+x5HssLI8Ai7sc/v7v//6YY7797W+f9FLD448/zvj
"text/plain": [
"<Figure size 576x504 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"new_predict = model.predict(X)\n",
"# Standardize each feature to zero mean and unit variance before plotting\n",
"X_trans = StandardScaler().fit_transform(X)\n",
"\n",
"model_scatter(0,1)\n",
"model_scatter(0,2)\n",
"model_scatter(2,1)"
]
},
{
"cell_type": "markdown",
"id": "548cb174-4c66-4fdc-ba4b-ddeea123a1e5",
"metadata": {},
"source": [
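"If you stare at these plots long enough you can make some sense of them, but it's not a clear picture. StandardScaler only shifts and rescales each feature, so a few extreme values still squeeze most houses into a narrow band and you don't get the full story.<br>\n",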
"But if we change the scaler..."
]
},
{
"cell_type": "code",
"execution_count": 162,
"id": "60a2f70e-b778-4251-99b0-43d4ba2cf1e0",
"metadata": {
"execution": {
"iopub.execute_input": "2022-05-29T09:14:14.158763Z",
"iopub.status.busy": "2022-05-29T09:14:14.158528Z",
"iopub.status.idle": "2022-05-29T09:14:14.163957Z",
"shell.execute_reply": "2022-05-29T09:14:14.162995Z",
"shell.execute_reply.started": "2022-05-29T09:14:14.158746Z"
},
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"array([[-0.03310339, -0.06938842, 0.6904143 , ..., -0.17377076,\n",
" -0.0664537 , -1.66010942],\n",
" [ 1.2909327 , -0.76451772, 0.30382328, ..., -0.17377076,\n",
" -0.0664537 , -1.66010942],\n",
" [-0.90206642, 0.69194368, -0.37772237, ..., -0.17377076,\n",
" -0.0664537 , 0.60236993],\n",
" ...,\n",
" [ 0.86056949, 1.48637717, -0.4472133 , ..., -0.17377076,\n",
" -0.0664537 , 0.60236993],\n",
" [-0.60760739, 0.5595381 , 0.09535049, ..., -0.17377076,\n",
" -0.0664537 , -1.66010942],\n",
" [ 0.63612169, -0.83072051, -0.45828603, ..., -0.17377076,\n",
" -0.0664537 , 0.60236993]])"
]
},
"execution_count": 162,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X_trans"
]
},
{
"cell_type": "code",
"execution_count": 163,
"id": "fb712bcb-5d39-47bd-9ea3-fae66621552b",
"metadata": {
"execution": {
"iopub.execute_input": "2022-05-29T09:14:14.166982Z",
"iopub.status.busy": "2022-05-29T09:14:14.166410Z",
"iopub.status.idle": "2022-05-29T09:14:16.306778Z",
"shell.execute_reply": "2022-05-29T09:14:16.306017Z",
"shell.execute_reply.started": "2022-05-29T09:14:14.166950Z"
},
"tags": []
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAgkAAAHQCAYAAAAru/mCAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8qNh9FAAAACXBIWXMAAAsTAAALEwEAmpwYAAEAAElEQVR4nOz9ebBlWVnmj3/W2nufebrzvXlzHm7WnDVCURRgAUrxowEpVEQpUWltCMO21bbB7jCiv3aHLRqoIUaDsyFIKzRDgzaDRTEXVQXUPGRm5XjzztOZpz2s9/fH2nfKvDlUVZYF9nkiGPKec/Zee1zPet/nfV4lIkIPPfTQQw899NDDWdAv9gB66KGHHnrooYfvT/RIQg899NBDDz30sCV6JKGHHnrooYceetgSPZLQQw899NBDDz1siR5J6KGHHnrooYcetkSPJPTQQw899NBDD1uiRxJ66OEHEFNTUxw8eJAPfvCDF/zb9xPe9773cfDgwRd1DJ/61Kc4ePAgDzzwwIs6jvPh+318Pfy/B/fFHkAPPfyg4IEHHuBnfuZnNv0tk8mwZ88e3vzmN/OOd7wDx3FepNE9P0xNTfHpT3+a1772tVx55ZUv9nDodrt8/OMf5zOf+QxTU1N0Oh36+/vZu3cvL33pS/nFX/zFF3uIm/CpT32K3/zN31z7t1KKbDbLxMQEb3vb2/jRH/3RF29wPfTwPNAjCT308Czxb/7Nv+GVr3wlIsLCwgKf/vSn+Z3f+R2OHTvGf/tv/+1FG9f4+DiPPfbYcyIq09PT/Mmf/Anj4+MvOkkIw5B3vvOdPPzww7zqVa/ijW98I5lMhqmpKb773e/y4Q9/+PuOJKzi7rvv5tprr0VEmJqa4hOf+ATvfe97mZub493vfvdFf//mN7+ZN7zhDXie9y8w2h56uDh6JKGHHp4lrrrqKt785jev/funfuqneP3rX88nPvEJfuVXfoXBwcEtf9doNMjlci/YuJRSJJPJF2z7/1L48pe/zMMPP8w73/lO/vN//s/nfD43N/cijOrScPPNN3PnnXeu/futb30rd955J3/+53/Ov/23/xbX3fqVu3pvOI7zAxuN6uFfJ3qahB56eJ7I5XLccMMNiAhnzpwB4NWvfjV33303Tz31FO9617u46aabeNOb3rT2m1OnTvEbv/Eb3H777VxzzTW8+tWv5v3vfz+tVuuc7X/3u9/lJ3/yJ7nuuuu47bbb+O3f/u0tv3chTcIXv/hF7r77bm6++WYOHTrE6173Ov77f//v+L7Ppz71qbU0ym/+5m9y8OBBDh48yN133732exHhYx/7GHfddReHDh3ihhtu4O677+b+++8/Z1/dbpf3v//93H777Vx33XX82I/9GN/85jcv+XyePn0agJe97GVbfj46Orrp38ePH+e//tf/yhve8AZuuOEGDh06xF133cXHP/7xS96n7/t8+MMf5g1veAPXXnstN998M+9+97t56qmnLnkbW2FsbIx9+/bRaDRYWVkB4ODBg7zvfe/j29/+Nm9/+9u54YYbeM973gOcX5Pg+z5//ud/zpvf/GYOHTrETTfdxF133cVHP/rRTd+r1+v8/u//Pj/8wz/MNddcw6233sqv/dqvrd2XPfTwbNGLJPTQw/OEiKxNbH19fWt/n5mZ4Z3vfCd33nknP/IjP7I2sT/xxBO8853vpFAo8La3vY2RkREOHz7MRz7yER5++GE+8pGPrIWbH330UX7u536ObDbLL/zCL5DP5/m///f/8t73vveSx/eHf/iHfPjDH2b//v387M/+LENDQ0xOTvKlL32Jf//v/z233HIL7373u/nwhz/M2972Nm666SaATRGR3/iN3+Cf/umfeN3rXsddd92F7/t87nOf4+d//uf54Ac/yGte85q17/7ar/0a99xzD3fccQeveMUrmJyc5Jd/+ZfZvn37JY13x44dAHz2s5/lZS97GalU6oLff/DBB/nud7/LD/
3QD7F9+3ba7TZf+MIX+K3f+i3K5TL/7t/9uwv+PggC3vWud/Hwww/z5je/mZ/+6Z+m0Wjw8Y9/nLe//e189KMf5dprr72ksZ8N3/eZnZ3FdV0KhcLa35944gm++MUv8hM/8RO85S1vueg23vWud/Hggw9y++2386Y3vYlkMsnRo0f50pe+xDve8Q7AEoSf/MmfZGZmhre+9a0cOHCAxcVFPvaxj/HjP/7jfPKTn2R8fPw5HUcP/w9Deuihh0vC/fffLxMTE/LBD35QlpeXZXl5WZ5++mn5L//lv8jExIT8xE/8xNp377jjDpmYmJCPf/zj52znjW98o7zuda+Ter2+6e9f+tKXZGJiQj75yU+u/e1tb3ubXH311XLixIm1v3W7XXnrW98qExMT8sd//Mdrfz9z5sw5f3v00UdlYmJC7r77bul0Opv2Z4wRY8ymY9u477PH9fd///eb/h4EgbzlLW+RO+64Y2073/jGN2RiYkLe+973bvruP//zP8vExIRMTEycs/2z0e125S1veYtMTEzITTfdJL/4i78of/InfyLf+ta3xPf9c77fbDbP+VsURfKOd7xDbrzxxk2/+eQnPykTExNy//33r/3tr//6r2ViYkK+/vWvb9pGvV6XV73qVfKOd7zjomNe3e7//t//W5aXl2VpaUkeffRRec973iMTExPyq7/6q2vfXT0P3/rWt867nY3j+7M/+zOZmJiQD3zgA1se5yr+23/7b3LttdfK008/vek7U1NTcsMNN5xzTXro4VLQiyT00MOzxAc/+MFNIX2tNa9+9avPES2WSiXuuuuuTX87cuQIR44c4Zd/+ZfxfX8tBA1w0003kclk+Na3vsVdd93F8vIyDz/8MK973evYs2fP2vcSiQQ/+7M/y6//+q9fdKyf/exnAfj1X//1c/QKSqlLOt7PfvazZLNZXvva124aL9i0ygc/+EFOnTrFnj17uOeeewB417vetel7r33ta9mzZw8nT5686P4SiQQf+chH+Nu//Vs+//nP87WvfY2vfvWrAAwMDPC+971vU+omk8ms/f9ut0ur1UJEePnLX86DDz7IiRMnLlh6+dnPfpa9e/dy9dVXn3N8t912G5/5zGfodDoXjWgA52goPM/jLW95C7/1W7+16e9XXHEFt91220W3B/C5z32OYrHIL/3SL53zmdY2YywifO5zn+OWW25heHh403Gk02muv/76Z5Xy6aGHVfRIQg89PEu87W1v484770QpRTqdZvfu3ZRKpXO+t2PHjnNEaMePHwfOJRobsbS0BLCWR967d+8539m/f/8ljfX06dMopbjiiisu6ftb4fjx4zSbzQtOasvLy+zZs4czZ86gtWb37t3nfGffvn2XRBIAstks73nPe3jPe95Do9Hgscce45577uHjH/84733vexkfH19LizSbTf7kT/6Ez3/+88zOzp6zrVqtdtHj63Q659VAAJTLZcbGxi467l/6pV/i5ptvXiuB3Lt375Zi1a3Oz/lw+vRprrzyyguKUldWVqhUKnzzm98873GsEooeeng26JGEHnp4lti1a9clrQLT6fR5P/v5n/95XvGKV2z52WruWkSArVf8q59dDCJyyRGDC22jv7+fD3zgA+f9zoEDBy5pO88FuVyO2267jdtuu40rrriC3/qt3+JTn/rUGkn49V//db761a/yEz/xE9xyyy0Ui0Vc1+VrX/saf/M3f4Mx5qLjmpiY2ORzcDb6+/svaawTExPP+954Llg9t7fddhu/8Au/cFm33cP/2+iRhB56+BfErl27ALuqu9hksnPnTmA9+rARW/1tK+zZs4dvfOMbHDlyhOuuu+6837sQkdi1axenTp3i0KFDZLPZC+5vx44dGGM4derUOcThxIkTlzTmC+HQoUMAzM/PAzZK8NWvfpU3v/nN/PZv//am7953332XtM1du3ZRLpe59dZbvy9X27t37+bEiRP4vk8ikdjyO/39/RQKBRqNxiWnMX
ro4VLw/fdE9NDDv2JcddVVTExM8Pd///dblqWFYUilUgFs/v3666/n3nvv3RSm932fv/mbv7mk/b3xjW8E4A/+4A/
"text/plain": [
"<Figure size 576x504 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAgkAAAHQCAYAAAAru/mCAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8qNh9FAAAACXBIWXMAAAsTAAALEwEAmpwYAAEAAElEQVR4nOy9d5BlZ32n/7zvSTeHvp17cugJmtFoFEAoIESSMAaBsMHYyNhm8cK6vF7b68XeLVftb+3yLnZhu4xrwfbauAxmsViCSSYISUgoh5E0o8nTM53jzfGk9/39cXpa05oeaZQQ2PehVEzfe+457znn3vN+3m8UWmtNly5dunTp0qXLs5Cv9gC6dOnSpUuXLj+edEVCly5dunTp0mVNuiKhS5cuXbp06bImXZHQpUuXLl26dFmTrkjo0qVLly5duqxJVyR06dKlS5cuXdakKxK6dPkJZGpqih07dvDJT37yOV/7ceJ3f/d32bFjx6s6hi9/+cvs2LGDhx566FUdx4X4cR9fl397mK/2ALp0+UnhoYce4hd/8RdXvZZIJNi8eTO33HILH/jABzAM41Ua3UtjamqKr3zlK7z5zW9m165dr/ZwcF2X22+/na9+9atMTU3R6XTo6elhy5YtvPa1r+VXf/VXX+0hruLLX/4yv/d7v7fytxCCZDLJ6Ogo73vf+3jXu9716g2uS5eXQFckdOnyAvnpn/5pXv/616O1ZmFhga985Sv80R/9ESdPnuQP/uAPXrVxjYyM8NRTT70ooTI9Pc1f/uVfMjIy8qqLhCAI+OAHP8iBAwe44YYbeMc73kEikWBqaopHH32UT3/60z92IuEst912G3v37kVrzdTUFF/84hf52Mc+xtzcHB/5yEee9/O33HILb3/727Es60cw2i5dnp+uSOjS5QWye/dubrnllpW/f/7nf563ve1tfPGLX+Q3fuM36O3tXfNzjUaDVCr1io1LCIHjOK/Y/n9UfP/73+fAgQN88IMf5L/+1/963vtzc3OvwqgujiuvvJKbb7555e/3vOc93HzzzfzN3/wN/+7f/TtMc+1H7tnvhmEYP7HWqC7/OunGJHTp8hJJpVLs378frTWTk5MAvPGNb+S2227j8OHDfOhDH+KKK67gne9858pnzpw5w+/8zu9w3XXXsWfPHt74xjfy8Y9/nFardd7+H330UX7u536OSy+9lGuuuYb/8T/+x5rbPVdMwne+8x1uu+02rrzySvbt28dNN93EH/7hH+J5Hl/+8pdX3Ci/93u/x44dO9ixYwe33Xbbyue11nz+85/n1ltvZd++fezfv5/bbruNBx988Lxjua7Lxz/+ca677jouvfRSfuZnfoYf/vCHF309x8fHAXjd61635vuDg4Or/j516hT//b//d97+9rezf/9+9u3bx6233srtt99+0cf0PI9Pf/rTvP3tb2fv3r1ceeWVfOQjH+Hw4cMXvY+1GBoaYuvWrTQaDUqlEgA7duzgd3/3d3nggQd4//vfz/79+/noRz8KXDgmwfM8/uZv/oZbbrmFffv2ccUVV3Drrbfyuc99btV29XqdP/mTP+Etb3kLe/bs4eqrr+a3fuu3Vr6XXbq8ULqWhC5dXiJa65WJLZ/Pr7w+MzPDBz/4QW6++Wbe+ta3rkzshw4d4oMf/CCZTIb3ve99DAwMcPToUT772c9y4MABPvvZz66Ym5988kl++Zd/mWQyyYc//GHS6TTf+ta3+NjHPnbR4/uzP/szPv3pT7Nt2zZ+6Zd+ib6+PiYmJvjud7/Lf/yP/5GrrrqKj3zkI3z605/mfe97H1dccQXAKovI7/zO7/DNb36Tm266iVtvvRXP8/j617/Or/zKr/DJT36SN73pTSvb/tZv/RZ33HEHN954I9dffz0TExP8+q//OuvWrbuo8a5fvx6Ar33ta7zuda8jFos95/YPP/wwjz76KG94wxtYt24d7Xabb3/72/z+7/8+5XKZf//v//
1zft73fT70oQ9x4MABbrnlFn7hF36BRqPB7bffzvvf/34+97nPsXfv3osa+7PxPI/Z2VlM0ySTyay8fujQIb7zne/w3ve+l3e/+93Pu48PfehDPPzww1x33XW8853vxHEcjh8/zne/+10+8IEPAJFA+Lmf+zlmZmZ4z3vew/bt21lcXOTzn/88P/uzP8uXvvQlRkZGXtR5dPk3jO7SpctF8eCDD+rR0VH9yU9+UheLRV0sFvWRI0f0f/tv/02Pjo7q9773vSvb3njjjXp0dFTffvvt5+3nHe94h77pppt0vV5f9fp3v/tdPTo6qr/0pS+tvPa+971PX3LJJXpsbGzlNdd19Xve8x49Ojqq/+Iv/mLl9cnJyfNee/LJJ/Xo6Ki+7bbbdKfTWXU8pZRWSq06t3OP/exxfeELX1j1uu/7+t3vfre+8cYbV/Zz77336tHRUf2xj31s1bbf+9739OjoqB4dHT1v/8/GdV397ne/W4+OjuorrrhC/+qv/qr+y7/8S33fffdpz/PO277ZbJ73WhiG+gMf+IC+/PLLV33mS1/6kh4dHdUPPvjgymuf+cxn9OjoqL7nnntW7aNer+sbbrhBf+ADH3jeMZ/d7//7f/9PF4tFvbS0pJ988kn90Y9+VI+Ojurf/M3fXNn27HW47777Lrifc8f313/913p0dFR/4hOfWPM8z/IHf/AHeu/evfrIkSOrtpmamtL79+8/75506XIxdC0JXbq8QD75yU+uMulLKXnjG994XtBiLpfj1ltvXfXasWPHOHbsGL/+67+O53krJmiAK664gkQiwX333cett95KsVjkwIED3HTTTWzevHllO9u2+aVf+iV++7d/+3nH+rWvfQ2A3/7t3z4vXkEIcVHn+7WvfY1kMsmb3/zmVeOFyK3yyU9+kjNnzrB582buuOMOAD70oQ+t2u7Nb34zmzdv5vTp0897PNu2+exnP8s//MM/8C//8i/84Ac/4O677wagUCjwu7/7u6tcN4lEYuXfruvSarXQWnPttdfy8MMPMzY29pypl1/72tfYsmULl1xyyXnnd8011/DVr36VTqfzvBYN4LwYCsuyePe7383v//7vr3p9586dXHPNNc+7P4Cvf/3rZLNZfu3Xfu2896SMPMZaa77+9a9z1VVX0d/fv+o84vE4l1122Qty+XTpcpauSOjS5QXyvve9j5tvvhkhBPF4nE2bNpHL5c7bbv369ecFoZ06dQo4X2icy9LSEsCKH3nLli3nbbNt27aLGuv4+DhCCHbu3HlR26/FqVOnaDabzzmpFYtFNm/ezOTkJFJKNm3adN42W7duvSiRAJBMJvnoRz/KRz/6URqNBk899RR33HEHt99+Ox/72McYGRlZcYs0m03+8i//kn/5l39hdnb2vH3VarXnPb9Op3PBGAiAcrnM0NDQ8477137t17jyyitXUiC3bNmyZrDqWtfnQoyPj7Nr167nDEotlUpUKhV++MMfXvA8zgqKLl1eCF2R0KXLC2Tjxo0XtQqMx+MXfO9XfuVXuP7669d876zvWmsNrL3iP/ve86G1vmiLwXPto6enh0984hMX3Gb79u0XtZ8XQyqV4pprruGaa65h586d/P7v/z5f/vKXV0TCb//2b3P33Xfz3ve+l6uuuopsNotpmvzgBz/g7//+71FKPe+4RkdHV9U5eDY9PT0XNdbR0dGX/N14MZy9ttdccw0f/vCHX9Z9d/m3TVckdOnyI2Tjxo1AtKp7vslkw4YNwDPWh3NZ67W12Lx5M/feey/Hjh3j0ksvveB2zyUkNm7cyJkzZ9i3bx/JZPI5j7d+/XqUUpw5c+Y84TA2NnZRY34u9u3bB8D8/DwQWQnuvvtubrnlFv7H//gfq7a9//77L2qfGzdupFwuc/XVV/9YrrY3bdrE2NgYnudh2/aa2/T09JDJZGg0GhftxujS5WL48ftFdOnyr5jdu3czOjrKF77whTXT0oIgoFKpAJH//bLLLuPOO+
9cZab3PI+///u/v6jjveMd7wDgT//0T/E877z3z65Az/r1q9Xqedu8613vQinFn/7pn655jLPuEWAly+Fv//ZvV21
"text/plain": [
"<Figure size 576x504 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAgkAAAHQCAYAAAAru/mCAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8qNh9FAAAACXBIWXMAAAsTAAALEwEAmpwYAAEAAElEQVR4nOz9eZBkV3nmj3/OuUvulbUvXb0v1a3W0lqREBIgwCAPBoEYG2MjY5uxB8Lh8dgeD3gmHDFfz4TH2IHtMI4B7/4ZzNgwgA3GbEJskhBC+9b7Ut21b5mVe97lvL8/TtbWVb0ItdTCzicC1JV5895zzz33nPe87/M+rxIRoY022mijjTbaaOMs6MvdgDbaaKONNtpo4+WJtpHQRhtttNFGG21siLaR0EYbbbTRRhttbIi2kdBGG2200UYbbWyItpHQRhtttNFGG21siLaR0EYbbbTRRhttbIi2kdBGGz+EGBsbY+/evXzkIx8572cvJ3zwgx9k7969l7UNn/3sZ9m7dy/f+973Lms7zoWXe/va+LcH93I3oI02fljwve99j5/5mZ9Z81k6nWbHjh3cddddvPvd78ZxnMvUuheGsbExPve5z/GGN7yBK6644nI3h2azyac+9Sn+8R//kbGxMRqNBt3d3ezcuZObb76ZX/zFX7zcTVyDz372s/zmb/7m8t9KKTKZDCMjI7zzne/kbW972+VrXBttvAC0jYQ22nie+LEf+zFe/epXIyLMzMzwuc99jt/5nd/h2LFj/M//+T8vW7uGh4d56qmnfiBDZXx8nD/5kz9heHj4shsJURTxnve8h8cff5zXvOY1vOUtbyGdTjM2NsYjjzzCxz72sZedkbCEe+65h6uvvhoRYWxsjE9/+tN84AMfYGpqive9730X/P1dd93Fm9/8ZjzPewla20YbF0bbSGijjeeJ/fv3c9dddy3//VM/9VP86I/+KJ/+9Kf5lV/5FXp7ezf8XaVSIZvNvmjtUkqRSCRetPO/VPj617/O448/znve8x7+23/7b+u+n5qaugytujjceOON3Hnnnct/v+Md7+DOO+/kz//8z/kP/+E/4LobT7lLY8NxnB9ab1Qb/zrR5iS00cYLRDab5brrrkNEOHPmDACve93ruOeee3juued473vfyw033MBb3/rW5d+cOnWK3/iN3+C2227jqquu4nWvex0f+tCHqNVq687/yCOP8JM/+ZNcc8013Hrrrfz2b//2hsedj5Pwla98hXvuuYcbb7yRAwcO8KY3vYn/9b/+F0EQ8NnPfnY5jPKbv/mb7N27l71793LPPfcs/15E+OQnP8ndd9/NgQMHuO6667jnnnt46KGH1l2r2WzyoQ99iNtuu41rrrmGf//v/z3333//Rffn6OgoAK985Ss3/H5wcHDN38ePH+d//I//wZvf/Gauu+46Dhw4wN13382nPvWpi75mEAR87GMf481vfjNXX301N954I+973/t47rnnLvocG2FoaIhdu3ZRqVRYWFgAYO/evXzwgx/ku9/9Lu9617u47rrreP/73w+cm5MQBAF//ud/zl133cWBAwe44YYbuPvuu/nEJz6x5rhyuczv//7v8yM/8iNcddVV3HLLLfzar/3a8rhso43ni7YnoY02XiBEZHlh6+rqWv58YmKC97znPdx555288Y1vXF7Yn3nmGd7znvfQ0dHBO9/5TgYGBjh06BAf//jHefzxx/n4xz++7G5+8skn+bmf+zkymQy/8Au/QC6X41/+5V/4wAc+cNHt+8M//EM+9rGPsXv3bn72Z3+Wvr4+Tp8+zVe/+lX+03/6T9x00028733v42Mf+xjvfOc7ueGGGwDWeER+4zd+gy9+8Yu86U1v4u677yYIAr7whS/w8z//83zkIx/h9a9//fKxv/Zrv8a9997LHXfcwe23387p06f55V/+ZTZv3nxR7d2yZQsAn//853nlK19JMpk87/EPP/wwjzzyCK997W
vZvHkz9XqdL3/5y/zWb/0WhUKB//gf/+N5fx+GIe9973t5/PHHueuuu/jpn/5pKpUKn/rUp3jXu97FJz7xCa6++uqLavvZCIKAyclJXNelo6Nj+fNnnnmGr3zlK/zET/wEb3/72y94jve+9708/PDD3Hbbbbz1rW8lkUhw5MgRvvrVr/Lud78bsAbCT/7kTzIxMcE73vEO9uzZw+zsLJ/85Cf58R//cT7zmc8wPDz8A91HG/+GIW200cZF4aGHHpKRkRH5yEc+IvPz8zI/Py8HDx6U//7f/7uMjIzIT/zETywfe8cdd8jIyIh86lOfWneet7zlLfKmN71JyuXyms+/+tWvysjIiHzmM59Z/uyd73ynXHnllXLixInlz5rNprzjHe+QkZER+eM//uPlz8+cObPusyeffFJGRkbknnvukUajseZ6xhgxxqy5t9XXPrtdf//3f7/m8zAM5e1vf7vccccdy+f5zne+IyMjI/KBD3xgzbFf+9rXZGRkREZGRtad/2w0m015+9vfLiMjI3LDDTfIL/7iL8qf/MmfyAMPPCBBEKw7vlqtrvssjmN597vfLddff/2a33zmM5+RkZEReeihh5Y/++u//msZGRmRb3/722vOUS6X5TWveY28+93vvmCbl877//7f/5P5+XmZm5uTJ598Ut7//vfLyMiI/Oqv/urysUv98MADD5zzPKvb92d/9mcyMjIiH/7whze8zyX8z//5P+Xqq6+WgwcPrjlmbGxMrrvuunXPpI02LgZtT0IbbTxPfOQjH1nj0tda87rXvW4dabGzs5O77757zWeHDx/m8OHD/PIv/zJBECy7oAFuuOEG0uk0DzzwAHfffTfz8/M8/vjjvOlNb2LHjh3Lx/m+z8/+7M/y67/+6xds6+c//3kAfv3Xf30dX0EpdVH3+/nPf55MJsMb3vCGNe0FG1b5yEc+wqlTp9ixYwf33nsvAO9973vXHPeGN7yBHTt2cPLkyQtez/d9Pv7xj/O3f/u3fOlLX+Jb3/oW3/zmNwHo6enhgx/84JrQTTqdXv53s9mkVqshIrzqVa/i4Ycf5sSJE+dNvfz85z/Pzp07ufLKK9fd36233so//uM/0mg0LujRANZxKDzP4+1vfzu/9Vu/tebzffv2ceutt17wfABf+MIXyOfz/NIv/dK677S2EWMR4Qtf+AI33XQT/f39a+4jlUpx7bXXPq+QTxttLKFtJLTRxvPEO9/5Tu68806UUqRSKbZv305nZ+e647Zs2bKOhHb8+HFgvaGxGnNzcwDLceSdO3euO2b37t0X1dbR0VGUUuzbt++ijt8Ix48fp1qtnndRm5+fZ8eOHZw5cwatNdu3b193zK5duy7KSADIZDK8//3v5/3vfz+VSoWnnnqKe++9l0996lN84AMfYHh4eDksUq1W+ZM/+RO+9KUvMTk5ue5cpVLpgvfXaDTOyYEAKBQKDA0NXbDdv/RLv8SNN964nAK5c+fODcmqG/XPuTA6OsoVV1xxXlLqwsICxWKR+++//5z3sWRQtNHG80HbSGijjeeJbdu2XdQuMJVKnfO7n//5n+f222/f8Lul2LWIABvv+Je+uxBE5KI9Buc7R3d3Nx/+8IfPecyePXsu6jw/CLLZLLfeeiu33nor+/bt47d+67f47Gc/u2wk/Pqv/zrf/OY3+Ymf+Aluuukm8vk8ruvyrW99i7/5m7/BGHPBdo2MjKzROTgb3d3dF9XWkZGRFzw2fhAs9e2tt97KL/zCL1zSc7fxbxttI6GNNl5CbNu2DbC7ugstJlu3bgVWvA+rsdFnG2HHjh185zvf4fDhw1xzzTXnPO58hsS2bds4deoUBw4cIJPJnPd6W7ZswRjDqVOn1hkOJ06cuKg2nw8HDhwAYHp6GrBegm9+85vcdddd/PZv//aaYx988MGLOue2bdsoFArccsstL8vd9vbt2zlx4gRBEOD7/obHdHd309HRQaVSuegwRhttXAxefm9EG238K8
b+/fsZGRnh7//+7zdMS4uiiGKxCNj4+7XXXst99923xk0fBAF/8zd/c1HXe8tb3gLAH/zBHxAEwbrvl3agS3H9xcX
"text/plain": [
"<Figure size 576x504 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Map each feature onto [0, 1] through its empirical quantiles (uniform output by default)\n",
"X_trans = QuantileTransformer().fit_transform(X)\n",
"\n",
"model_scatter(0,1)\n",
"model_scatter(0,2)\n",
"model_scatter(2,1)"
]
},
{
"cell_type": "markdown",
"id": "436079ee-9e80-45b9-906c-efcd3e157317",
"metadata": {},
"source": [
"The quantile transformer rescales each feature according to its quantiles (by default spreading the values evenly between 0 and 1). This reduces the effect of outliers and gives a much clearer picture of the data. In the first plot, for instance, you can clearly see\n",
"that the predicted sale price is lowest for an old shed and very high for a new mansion. The model can pick up on this more clearly as well."
]
},
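{
"cell_type": "markdown",
"id": "scaler-outlier-sketch-md",
"metadata": {},
"source": [
"As a rough, self-contained illustration of why the two scalers produce such different plots, the sketch below uses made-up numbers rather than the Ames data: a single feature with 99 typical values and one extreme outlier. `bulk_share` is a hypothetical helper defined only for this example. It measures what fraction of a column's total range the 99 typical values occupy. StandardScaler is a linear transformation, so it keeps the raw proportions and the typical values remain squeezed into a small slice of the range. QuantileTransformer, by contrast, works from ranks and spreads those values over almost the whole interval from 0 to 1."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "scaler-outlier-sketch",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn.preprocessing import StandardScaler, QuantileTransformer\n",
"\n",
"# Made-up feature: 99 typical values plus one extreme outlier in the last row\n",
"rng = np.random.default_rng(0)\n",
"values = np.append(rng.normal(loc=1500, scale=300, size=99), 20000).reshape(-1, 1)\n",
"\n",
"standard = StandardScaler().fit_transform(values)\n",
"# n_quantiles cannot exceed the number of samples (100 here)\n",
"quantile = QuantileTransformer(n_quantiles=100).fit_transform(values)\n",
"\n",
"\n",
"def bulk_share(col):\n",
"    # Hypothetical helper: share of the column's total range covered by the 99 typical rows\n",
"    col = np.ravel(col)\n",
"    return np.ptp(col[:-1]) / np.ptp(col)\n",
"\n",
"\n",
"print('raw data share:           ', round(bulk_share(values), 3))\n",
"print('StandardScaler share:     ', round(bulk_share(standard), 3))\n",
"print('QuantileTransformer share:', round(bulk_share(quantile), 3))"
]
},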
{
"cell_type": "code",
"execution_count": 164,
"id": "5dfafaa7-499b-4779-8c62-e222b9783940",
"metadata": {
"execution": {
"iopub.execute_input": "2022-05-29T09:14:16.308442Z",
"iopub.status.busy": "2022-05-29T09:14:16.307969Z",
"iopub.status.idle": "2022-05-29T09:14:16.615516Z",
"shell.execute_reply": "2022-05-29T09:14:16.614871Z",
"shell.execute_reply.started": "2022-05-29T09:14:16.308412Z"
},
"tags": []
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAArYAAAIGCAYAAAChycyZAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8qNh9FAAAACXBIWXMAAAsTAAALEwEAmpwYAAEAAElEQVR4nOzdeZzcVZXw/8/91tJVva/Zl07SSSdkY98FIZBhERlwEBUEBY04Rh8GYUaGAR3xN/AMCDggoiiugw+LLMIIAUFh2CMQCCTpbJ10J530Xr3W+r3398ftrnSl96R6See8Xy9fTldVV9261RlPne+55yhjjEEIIYQQQohDnDPWCxBCCCGEECIdJLAVQgghhBATggS2QgghhBBiQpDAVgghhBBCTAgS2AohhBBCiAlBAlshhBBCCDEhSGArhBhTZ555Jl/84hfH7PXvvfdeysvL2bVr16CPLS8v5zvf+c4orGp8G86e9eWJJ56gvLyct99+O80rE0Ic7iSwFUKkXTgc5le/+hVf+MIXOP7441m8eDEnn3wyX/3qV3niiSdIJBJjvcRDVndQWF5ezi9+8Ys+H7Nx48bkYyZKIF5dXc33vvc9Vq5cybJlyzj22GO59NJL+dWvfkUsFjuo525tbeXee+8d9UB7165d3HvvvWzcuHFUX1eIicw71gsQQkwsO3fuZNWqVezYsYOTTz6ZVatWUVBQQGNjI2+++SY33ngjW7du5Z//+Z/HeqnD9uGHH+I44yMfkJGRwRNPPMHVV1/d677HH3+cjIwMotHoGKws/f785z/z7W9/G8dxuOiii1i4cCHhcJhXX32V2267jaeffpoHH3yQ4uLiA3r+1tZW7rvvPlavXs0JJ5yQ5tX3b/fu3dx3331Mnz6dRYsWjdrrCjGRSWArhEibSCTC1772tWQmauXKlSn3r1q1ig8//JD169eP0QoPTkZGxlgvIenss8/m2Wef5cMPP2TZsmXJ22OxGM8++2zy/kPd5s2bue666ygsLOQ3v/kNs2bNSt535ZVX8uijj3LzzTdz7bXX8tvf/hal1BiuVggx1sZH6kEIMSE89thjVFZW8uUvf7lXUNtt2bJlXHbZZb1u37ZtG6tWreKoo47imGOO4Vvf+hb19fW9HtfW1sYdd9zB2WefzZIlSzjxxBO57rrrqK6u7vXYWCzGgw8+yIUXXsjy5cs55phjuPjii/nd73434PtwXZdbbrmFhQsX8uCDDyZv7+vSfvdt77//PpdffjlHHnkkJ5xwAjfddBMdHR29nvudd97h0ksvZdmyZZxyyin84Ac/YMuWLZSXl3PvvfcOuK6ezjjjDAoLC/nDH/6Qcvuf//xnQqEQn/nMZ/r93ccee4yLLrqIZcuWccwxx3DVVVfxt7/9rdfjtNb89Kc/5cwzz2Tp0qVccMEF/PGPf+z3eevq6vjud7/LJz/5SZYsWcKpp57KzTffTGNj45Df1/7uvfdeotEo//7v/54S1Hb77Gc/yznnnMPatWv561//mrz9O9/5DuXl5X0+Z8/P8e2332bFihUA3HfffckSjjPPPBOw5QLdn82zzz7LBRdcwNKlS/nkJz/Jvffe26us5otf/GLyd3vq+TxgS0quuOIKAG688cbk645lvbkQE4FkbIUQabNmzRoALr300mH9Xm1tLVdccQVnnXUW//zP/8ymTZt45JFHaG9v56GHHko+rq2tjc997nPU1NTwmc98hvnz51NfX8/DDz/MJZdcwh/+8AemT58O2KD26quv5p133uHUU0/l05/+NBkZGWzevJkXXniByy+/vM+1RCIRrrvuOl599VX+7//9v1x44YWDrn/jxo1cc801XHzxxXzqU5/inXfe4fHHH8dxHG699dbk4/72t79x1VVXkZeXx6pVq8jJyeG5557jvffeG9Z+AXi9Xi644AKeeOIJbrzxRgKBAAB/+MMfOOKII1i4cGGfv3fHHXfw85//nGXLlnHddd
fR3t7Oo48+ypVXXsn999/P6aefnnzsbbfdxm9+8xuOO+44vvSlL9HY2Mj3v/99Zs6c2et5a2pquPTSS4nH4/zDP/wDs2bNYufOnfz+97/n7bff5g9/+AM5OTnDeo/RaJS//vWvTJkyhdNOO63fx332s5/l+eef54UXXuCMM84Y1mvMmzePG2+8kdtuu42zzz6bs88+G4CsrKyUx/3lL3/h17/+NZdddhnFxcW8/PLL3HfffdTU1HDbbbcN6zUBjjvuOK655hoeeOABLr30Uo455hiAAy6nEEJYEtgKIdJmy5YtZGVl9Rn4DGTnzp3cfffdnHfeecnbHMfh4YcfZtu2bcybNw+AH/3oR1RXV/Poo4+mBG4XXXQRF1xwAffeey+33347AL/+9a955513+NrXvsZ1112X8npa6z7XEQqFuOaaa6ioqOCBBx7g1FNPHdL6Kyoq+H//7/9x5JFHAvC5z32O9vZ2nnjiCb7zne8kg6Tbb78dpRT/7//9v+QefeELXzjgLN1nPvMZfv3rX/Piiy9ywQUXsHfvXt544w1uuummPh+/fft2fvGLX3D00Ufz61//Gr/fD8All1zC+eefz7//+7/z4osv4vF42L59O7/97W858cQTeeihh/B4PACsXLmyz2zwrbfeSiKR4KmnnmLKlCnJ288555zkIa9vfvObw3p/O3bsIBaLsWjRogFLDI444gjAfg7DVVxczFlnncVtt91GeXl5v19kNm7cyOOPP87ixYsBuPzyy1m9ejVPPPEEl156afKzH6qZM2dy8skn88ADD3DkkUcO6QuUEGJwUooghEib9vZ2srOzh/17kyZNSglqAU488UQAqqqqADDG8Mwzz3DccccxadIkmpqakv8JBoMceeSRvPbaa8nff+aZZ8jLy+Mb3/hGr9fr6wBYTU0Nn//856muruZ3v/vdkINagCOPPLJXYHPiiSeSSCTYvXs3AA0NDaxfv54VK1akBP4+ny95SXq4ysvLWbJkCU888QQATz75JF6vl0996lN9Pv6ll17CGMNXvvKVZFALMHnyZC666CJ2797Nhg0bUh775S9/ORnUAixevJhTTjkl5Xnb2tr461//yplnnonf70/5bKZPn86sWbN4/fXXh/3+2tvbAQbN9Hbf3/34kXDyyScng1oApRRf+cpXAHjxxRdH7HWFEMMjGVshRNpkZ2f3WVc6mL4yvPn5+YDNogI0NTURCoV47bXXOOmkk/p8np4B686dO1m0aNGQD3xdc801uK7LH//4R2bPnp329Xf3fJ0zZ06vx86dO3dYr9fTxRdfzK233sru3bt58sknWbFiBfn5+TQ1NfV6bPca5s+f3+u+BQsWALat1tKlS5M1y32tbd68eSlfIiorK9Fa8/jjj/P444/3uc7hZvGB5Jektra2AR/XHdAeyJeqoeq+atBTWVkZQJ/13UKIsSGBrRAibebPn8/atWuprq4eViDTMyO4P2NMyn9398NNt0996lM88sgj3H///dx2223Daus1nPWn2wUXXMD//b//l5tvvpmdO3dy8803D7qW4eirBGD/5+n++dOf/jQXXXRRn89zIB0lSktL8fv9g/Z57c4y9zws1l/pwoH2UD7Ybguu6x7U7wshhkYCWyFE2qxcuZK1a9fy2GOP9aprPViFhYXk5ubS3t7OySefPOjjS0tL2b59O7FYLOWye39WrVrF7Nmz+c///E8SiQT/+Z//OWDAOlzdgX5lZWWv+7Zv337Az5ubm5ts7TV16tReZQI9dXcV2LJlS68OA1u3bk1ZZ/d/b9u2rdeXlP3XO2vWLJRSxOPxIX02Q5WRkcFpp53Gn//8Z1599dV+D5A99thjAMmDXwB5eXmAzZh3Z8+h7+zqUILW7v3p67ae+5Ofn8/HH3/c67EH+rpCiOGRGlshRNpccsklzJkzh4ceeog///nPfT7mo48+4r//+7+H/dyO43DBBRfw4Ycf8vzzz/f5mJ5tpS644AJaWl
q4//77ez2uv8zl1Vdfzb/+67/y7LPP8u1vfzutE9KKi4tZsmQJL730UkqQE4/H+c1vfnNQz/3Vr36V1atXc/PNNw+
"text/plain": [
"<Figure size 792x576 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"\n",
"f, ax = plt.subplots(figsize=(11, 8))\n",
"plt.scatter(new_predict/1000,y/1000, alpha=0.5, color='crimson')\n",
"plt.xlabel('Predicted Price (x1000)')\n",
"plt.ylabel('Actual Price (x1000)')\n",
"plt.title('Checking Model Output')\n",
"f.savefig('mod_out.png', backend='Cairo')\n",
"plt.show()\n"
]
},
{
"cell_type": "markdown",
"id": "88e661d4-47dd-4758-8c4c-6dce06e20b86",
"metadata": {},
"source": [
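"The model is a little fuzzy with higher prices: the points spread out more as the price goes up. Still, this is about the shape we're looking for. A \"perfect\" model would put every point on a single straight line where the predicted price equals the actual price."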
]
},
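{
"cell_type": "markdown",
"id": "error-by-price-band-md",
"metadata": {},
"source": [
"One way to put a number on how fuzzy the model gets at higher prices is to compare the error across price bands. The sketch below is a minimal example on synthetic data, where the prediction noise is assumed to grow in proportion to the price. `error_by_price_band` is a hypothetical helper, not something defined elsewhere in this notebook. It sorts houses by actual price, splits them into equal-sized bands and reports the RMSE of each band, next to the overall R² and RMSE from `sklearn.metrics`. Applied to this notebook, it could be called as `error_by_price_band(y, new_predict)`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "error-by-price-band",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn.metrics import mean_squared_error, r2_score\n",
"\n",
"\n",
"def error_by_price_band(actual, predicted, n_bands=4):\n",
"    # Hypothetical helper: sort houses by actual price, split them into\n",
"    # equal-sized bands and return (lowest price, highest price, RMSE) per band\n",
"    actual = np.asarray(actual, dtype=float)\n",
"    predicted = np.asarray(predicted, dtype=float)\n",
"    order = np.argsort(actual)\n",
"    bands = []\n",
"    for idx in np.array_split(order, n_bands):\n",
"        rmse = np.sqrt(mean_squared_error(actual[idx], predicted[idx]))\n",
"        bands.append((actual[idx].min(), actual[idx].max(), rmse))\n",
"    return bands\n",
"\n",
"\n",
"# Synthetic data: prediction noise is assumed to grow in proportion to price\n",
"rng = np.random.default_rng(1)\n",
"actual = rng.uniform(50_000, 500_000, size=400)\n",
"predicted = actual + rng.normal(0, 0.08 * actual)\n",
"\n",
"print('Overall R^2: ', round(r2_score(actual, predicted), 3))\n",
"print('Overall RMSE:', round(np.sqrt(mean_squared_error(actual, predicted))))\n",
"for low, high, rmse in error_by_price_band(actual, predicted):\n",
"    print(f'{low:>9,.0f} to {high:>9,.0f}: RMSE {rmse:>8,.0f}')"
]
},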
{
"cell_type": "markdown",
"id": "f99b8a37-f856-423e-ba19-b79a5526ec8f",
"metadata": {},
"source": [
"Finally, write the predictions for the Kaggle test set to a CSV file for submission."
]
},
{
"cell_type": "code",
"execution_count": 165,
"id": "59c5e4cc-470c-4503-9c17-0e80aba31f1a",
"metadata": {
"execution": {
"iopub.execute_input": "2022-05-29T09:14:16.616517Z",
"iopub.status.busy": "2022-05-29T09:14:16.616255Z",
"iopub.status.idle": "2022-05-29T09:14:16.639735Z",
"shell.execute_reply": "2022-05-29T09:14:16.638937Z",
"shell.execute_reply.started": "2022-05-29T09:14:16.616491Z"
},
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<class 'pandas.core.frame.DataFrame'>\n",
"RangeIndex: 878 entries, 0 to 877\n",
"Data columns (total 2 columns):\n",
" # Column Non-Null Count Dtype \n",
"--- ------ -------------- ----- \n",
" 0 Id 878 non-null int64 \n",
" 1 SalePrice 878 non-null float64\n",
"dtypes: float64(1), int64(1)\n",
"memory usage: 13.8 KB\n"
]
}
],
"source": [
"results = pd.DataFrame(kaggle.pid)\n",
"kpredict = model.predict(X_kaggle)\n",
"results['SalePrice'] = kpredict  # assign by position, so predictions line up with the IDs whatever the index\n",
"results.columns = ['Id','SalePrice']\n",
"results.to_csv('data/results.csv',index=False)\n",
"results.info()"
]
},
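{
"cell_type": "markdown",
"id": "series-vs-array-assignment-md",
"metadata": {},
"source": [
"A note on building the submission frame: in pandas, assigning a `pd.Series` to a column aligns on index labels, while assigning a plain NumPy array fills the rows by position. Wrapping the predictions in `pd.Series(...)` therefore only lines up with the IDs when the test set has a default 0 to n-1 index, as it happens to here (`RangeIndex: 878 entries, 0 to 877`). The small sketch below uses made-up IDs and prices with a non-default index to show the difference."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "series-vs-array-assignment",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Made-up IDs whose index is not 0..n-1 (e.g. after rows were filtered out)\n",
"ids = pd.Series([101, 205, 309], index=[10, 11, 12], name='pid')\n",
"preds = np.array([150000.0, 210000.0, 98000.0])\n",
"\n",
"# Series assignment aligns on index labels: 0..2 do not match 10..12, so every value is NaN\n",
"misaligned = pd.DataFrame(ids)\n",
"misaligned['SalePrice'] = pd.Series(preds)\n",
"\n",
"# Array assignment is positional, so each ID gets its own prediction\n",
"positional = pd.DataFrame(ids)\n",
"positional['SalePrice'] = preds\n",
"\n",
"print(misaligned)\n",
"print(positional)"
]
},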
{
"cell_type": "markdown",
"id": "45fb58be-3c77-455e-8b17-59af514e81ce",
"metadata": {},
"source": [
"### Conclusion"
]
},
{
"cell_type": "markdown",
"id": "f95059af-53da-43be-ae20-fd1633c46b19",
"metadata": {},
"source": [
"This model was designed with a focus on quality and consistency. With some more refinement, the margin of error should shrink to a reasonable level, at which point reliable, accurate predictions could be made for applications that need to assess the value of a property.<br>\n",
"I think a big limiting factor here is the size of the data set compared to the quality of the features provided. More features from this data set could be added,\n",
"but I think the biggest gains would come from simply feeding in more data for the chosen features; any newly added features would also benefit from more data.\n",
"As you stray from the \"low-hanging fruit\" features, the quality of the data starts to go down: "
]
},
{
"cell_type": "code",
"execution_count": 166,
"id": "a082eb49-6b3f-4d1a-8132-110998919ec9",
"metadata": {
"execution": {
"iopub.execute_input": "2022-05-29T09:14:16.640736Z",
"iopub.status.busy": "2022-05-29T09:14:16.640476Z",
"iopub.status.idle": "2022-05-29T09:14:17.552167Z",
"shell.execute_reply": "2022-05-29T09:14:17.551321Z",
"shell.execute_reply.started": "2022-05-29T09:14:16.640718Z"
},
"tags": []
},
"outputs": [
{
"data": {
"image/png": "<base64 PNG data for the discover_plot(train.overall_cond, y) figure, truncated in source>",
"text/plain": [
"<Figure size 504x504 with 3 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"discover_plot(train.overall_cond,y);"
]
},
{
"cell_type": "markdown",
"id": "ab113dc6-42df-466a-a869-dde532b80be2",
"metadata": {},
"source": [
"For the overall condition of the property (`overall_cond`), every seller wants to claim their house is in the highest condition, yet most houses sell for less than the mean sale price. This mismatch will drive the score down and skew predictions."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.5"
}
},
"nbformat": 4,
"nbformat_minor": 5
}