mpg/eda.ipynb
2022-08-01 09:32:07 -04:00

{
"cells": [
{
"cell_type": "markdown",
"id": "807a5642-e420-4208-9ccb-bcc442617ad0",
"metadata": {},
"source": [
"[Cleaning](clean.ipynb)"
]
},
{
"cell_type": "markdown",
"id": "04ed2aa6-7b64-4c2d-b007-594e31ecdae8",
"metadata": {},
"source": [
"# EDA"
]
},
{
"cell_type": "markdown",
"id": "682a6d42-8ce0-42fb-9892-b6a46beb0b9b",
"metadata": {},
"source": [
"Import libraries and define some helper functions"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "5ffa8b01-0b17-4ad8-8e85-f2656da50c9e",
"metadata": {
"execution": {
"iopub.execute_input": "2022-08-01T04:20:10.964571Z",
"iopub.status.busy": "2022-08-01T04:20:10.964037Z",
"iopub.status.idle": "2022-08-01T04:20:11.911635Z",
"shell.execute_reply": "2022-08-01T04:20:11.911048Z",
"shell.execute_reply.started": "2022-08-01T04:20:10.964486Z"
},
"tags": []
},
"outputs": [],
"source": [
"import os\n",
"import numpy as np\n",
"import pandas as pd\n",
"import seaborn as sns\n",
"import matplotlib.pyplot as plt\n",
"from IPython.display import display, Markdown\n",
"\n",
"sns.set_theme(style='darkgrid')\n",
"\n",
"df = pd.read_csv('data/clean.csv')\n",
"y = df.mpg\n",
"\n",
"def show_plots(filenames):\n",
"    # Render the saved images two per row as Markdown\n",
"    for j in range(0, len(filenames), 2):\n",
"        if len(filenames) - j > 1:\n",
"            display(Markdown(f'![]({filenames[j]})![]({filenames[j+1]})'))\n",
"        else:\n",
"            display(Markdown(f'![]({filenames[j]})'))\n",
"\n",
"def make_plots(df, y):\n",
"    # Save one regression jointplot per column (skipping images that already exist), then display them\n",
"    os.makedirs('img', exist_ok=True)\n",
"    filenames = []\n",
"\n",
"    for col in df.columns:\n",
"        filename = f'img/{col}_joint.png'\n",
"        filenames.append(filename)\n",
"        if not os.path.exists(filename):\n",
"            sns.jointplot(x=df[col], y=y, kind='reg',\n",
"                          joint_kws={'scatter_kws': dict(alpha=0.3)})\n",
"            plt.suptitle(f'{col} vs mpg')\n",
"            plt.subplots_adjust(top=.93)\n",
"            plt.savefig(filename, facecolor='white', transparent=False)\n",
"            plt.close()\n",
"\n",
"    show_plots(filenames)"
]
},
{
"cell_type": "markdown",
"id": "416a9d7e-e2ad-41f0-a674-d13c01f41896",
"metadata": {},
"source": [
"## A bit on engines:\n",
"\n",
"* At its most basic, an engine is an air pump\n",
"* Horsepower = (Torque * RPM) / 5252, with torque in lb-ft\n",
"* The torque peak is where the engine moves air most efficiently: applied science in action (fluid dynamics, resonance)\n",
"* Operating above or below the torque peak reduces efficiency, and efficiency == fuel economy\n",
"* Torque normally peaks below 5252 rpm and horsepower peaks above it, provided the engine can actually rev that high. On a dyno sheet (torque and horsepower plotted against rpm on the same scale) the torque and horsepower curves cross at 5252 rpm\n",
"* As an engine spins faster, power output rises until combustion becomes so inefficient, and torque falls off so far, that spinning faster produces no more power (assuming the engine holds together that long)\n",
"\n",
"Basically, an engine that makes lots of power at high rpm but relatively little low-end torque (e.g. the Mazda rotary) is going to have poor fuel economy, because it spends most of its time outside its efficiency range. In contrast, diesel engines typically turn lower rpm and produce lots of torque down low. So not only do they start off making more torque, they are also less likely to stray far from their torque peak. This is also why diesel horsepower numbers look low: diesels can't rev as high. There's more to it, but this should be enough to provide context."
]
},
{
"cell_type": "markdown",
"id": "7af7dcdd-9618-4e81-88c8-d2c2cde0fdc2",
"metadata": {},
"source": [
"To start, I'm only interested in a few features:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "3e633f5f-8a7f-4776-a855-f22fcb87e88d",
"metadata": {
"execution": {
"iopub.execute_input": "2022-08-01T04:20:11.913047Z",
"iopub.status.busy": "2022-08-01T04:20:11.912844Z",
"iopub.status.idle": "2022-08-01T04:20:11.923117Z",
"shell.execute_reply": "2022-08-01T04:20:11.922526Z",
"shell.execute_reply.started": "2022-08-01T04:20:11.913032Z"
},
"tags": []
},
"outputs": [
{
"data": {
"text/markdown": [
"![](img/cylinders_joint.png)![](img/displacement_joint.png)"
],
"text/plain": [
"<IPython.core.display.Markdown object>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/markdown": [
"![](img/horsepower_joint.png)![](img/weight_joint.png)"
],
"text/plain": [
"<IPython.core.display.Markdown object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"make_plots(df[['cylinders','displacement',\n",
" 'horsepower','weight',]],y)"
]
},
{
"cell_type": "markdown",
"id": "b0f65dd4-16b6-4222-8758-71e2ecac473e",
"metadata": {},
"source": [
"As cylinder count, displacement, horsepower, or weight increases, MPG goes down. There are some outliers; we'll get to those in a minute."
]
},
{
"cell_type": "markdown",
"id": "61b1b79e-46c2-4e7b-b565-84d1e2045777",
"metadata": {},
"source": [
"There are some other things I'd like to see, so I'll engineer a few features:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "7342da99-d04a-4f4f-ad3c-06840144ec48",
"metadata": {
"execution": {
"iopub.execute_input": "2022-08-01T04:20:11.924056Z",
"iopub.status.busy": "2022-08-01T04:20:11.923876Z",
"iopub.status.idle": "2022-08-01T04:20:11.935974Z",
"shell.execute_reply": "2022-08-01T04:20:11.935279Z",
"shell.execute_reply.started": "2022-08-01T04:20:11.924034Z"
},
"tags": []
},
"outputs": [],
"source": [
"new_features = pd.DataFrame()\n",
"new_features['efficiency'] = df.horsepower / df.displacement\n",
"new_features['load'] = df.displacement / df.weight\n",
"new_features['bore_size'] = df.displacement / df.cylinders\n",
"new_features['grunt'] = new_features.bore_size * new_features.efficiency * df.horsepower\n",
"# new_features['grunt'] = (df.horsepower / new_features.bore_size) * new_features.efficiency"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "9fa0bf3e-d45b-4698-afac-e549db0de148",
"metadata": {
"execution": {
"iopub.execute_input": "2022-08-01T04:20:11.936853Z",
"iopub.status.busy": "2022-08-01T04:20:11.936679Z",
"iopub.status.idle": "2022-08-01T04:20:12.329795Z",
"shell.execute_reply": "2022-08-01T04:20:12.329065Z",
"shell.execute_reply.started": "2022-08-01T04:20:11.936838Z"
},
"tags": []
},
"outputs": [
{
"data": {
"text/markdown": [
"![](img/efficiency_joint.png)![](img/load_joint.png)"
],
"text/plain": [
"<IPython.core.display.Markdown object>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/markdown": [
"![](img/bore_size_joint.png)![](img/grunt_joint.png)"
],
"text/plain": [
"<IPython.core.display.Markdown object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"make_plots(new_features,y)"
]
},
{
"cell_type": "markdown",
"id": "5cbe16d7-24ef-4ceb-acd1-0dcecfdc96c2",
"metadata": {},
"source": [
"* Efficiency (horsepower per cubic inch of displacement) is a rough measure of engine technology/efficiency; as it increases, so does MPG\n",
"* Load (displacement / weight) relates engine size to the weight it has to move; a low value means a small engine pulling a lot of car. Engines that work hard use more fuel, and a small engine working really hard can use more fuel than a big engine that's not doing much\n",
"* Bore_size (displacement per cylinder) is an attempt to describe cylinder bore diameter, which gives insight into the torque curve\n",
"* Grunt (bore_size * efficiency * horsepower) is an attempt to describe the power curve of an engine, or more specifically the presence or absence of low-rpm torque output"
]
},
{
"cell_type": "markdown",
"id": "dd05abcd-9ac9-4821-b575-ffbf8544db3c",
"metadata": {},
"source": [
"Merge the new features into the original data"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "89cea145-4b6e-457b-9970-578144c1c364",
"metadata": {
"execution": {
"iopub.execute_input": "2022-08-01T04:20:12.331981Z",
"iopub.status.busy": "2022-08-01T04:20:12.331135Z",
"iopub.status.idle": "2022-08-01T04:20:12.338040Z",
"shell.execute_reply": "2022-08-01T04:20:12.337017Z",
"shell.execute_reply.started": "2022-08-01T04:20:12.331935Z"
},
"tags": []
},
"outputs": [],
"source": [
"merged = df.join(new_features)\n",
"del new_features\n",
"del df"
]
},
{
"cell_type": "markdown",
"id": "d39b59e4-e596-4fc9-b886-1e6d314f597e",
"metadata": {},
"source": [
"# What's all that on the edges?\n",
"<hr>"
]
},
{
"cell_type": "markdown",
"id": "fe7ee071-8aa4-4a8d-9e8e-480f3b9da9da",
"metadata": {},
"source": [
"## Rotaries"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "dbbfdab6-1cca-4329-a2ae-9258678ab0b1",
"metadata": {
"execution": {
"iopub.execute_input": "2022-08-01T04:20:12.339782Z",
"iopub.status.busy": "2022-08-01T04:20:12.339317Z",
"iopub.status.idle": "2022-08-01T04:20:12.367167Z",
"shell.execute_reply": "2022-08-01T04:20:12.365967Z",
"shell.execute_reply.started": "2022-08-01T04:20:12.339751Z"
},
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>mpg</th>\n",
" <th>cylinders</th>\n",
" <th>displacement</th>\n",
" <th>horsepower</th>\n",
" <th>weight</th>\n",
" <th>acceleration</th>\n",
" <th>model_year</th>\n",
" <th>origin</th>\n",
" <th>car_name</th>\n",
" <th>efficiency</th>\n",
" <th>load</th>\n",
" <th>bore_size</th>\n",
" <th>grunt</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>71</th>\n",
" <td>19.0</td>\n",
" <td>3</td>\n",
" <td>70.0</td>\n",
" <td>97.0</td>\n",
" <td>2330.0</td>\n",
" <td>13.5</td>\n",
" <td>72</td>\n",
" <td>3</td>\n",
" <td>mazda rx2 coupe</td>\n",
" <td>1.385714</td>\n",
" <td>0.030043</td>\n",
" <td>23.333333</td>\n",
" <td>3136.333333</td>\n",
" </tr>\n",
" <tr>\n",
" <th>111</th>\n",
" <td>18.0</td>\n",
" <td>3</td>\n",
" <td>70.0</td>\n",
" <td>90.0</td>\n",
" <td>2124.0</td>\n",
" <td>13.5</td>\n",
" <td>73</td>\n",
" <td>3</td>\n",
" <td>maxda rx3</td>\n",
" <td>1.285714</td>\n",
" <td>0.032957</td>\n",
" <td>23.333333</td>\n",
" <td>2700.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>243</th>\n",
" <td>21.5</td>\n",
" <td>3</td>\n",
" <td>80.0</td>\n",
" <td>110.0</td>\n",
" <td>2720.0</td>\n",
" <td>13.5</td>\n",
" <td>77</td>\n",
" <td>3</td>\n",
" <td>mazda rx-4</td>\n",
" <td>1.375000</td>\n",
" <td>0.029412</td>\n",
" <td>26.666667</td>\n",
" <td>4033.333333</td>\n",
" </tr>\n",
" <tr>\n",
" <th>334</th>\n",
" <td>23.7</td>\n",
" <td>3</td>\n",
" <td>70.0</td>\n",
" <td>100.0</td>\n",
" <td>2420.0</td>\n",
" <td>12.5</td>\n",
" <td>80</td>\n",
" <td>3</td>\n",
" <td>mazda rx-7 gs</td>\n",
" <td>1.428571</td>\n",
" <td>0.028926</td>\n",
" <td>23.333333</td>\n",
" <td>3333.333333</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" mpg cylinders displacement horsepower weight acceleration \\\n",
"71 19.0 3 70.0 97.0 2330.0 13.5 \n",
"111 18.0 3 70.0 90.0 2124.0 13.5 \n",
"243 21.5 3 80.0 110.0 2720.0 13.5 \n",
"334 23.7 3 70.0 100.0 2420.0 12.5 \n",
"\n",
" model_year origin car_name efficiency load bore_size \\\n",
"71 72 3 mazda rx2 coupe 1.385714 0.030043 23.333333 \n",
"111 73 3 maxda rx3 1.285714 0.032957 23.333333 \n",
"243 77 3 mazda rx-4 1.375000 0.029412 26.666667 \n",
"334 80 3 mazda rx-7 gs 1.428571 0.028926 23.333333 \n",
"\n",
" grunt \n",
"71 3136.333333 \n",
"111 2700.000000 \n",
"243 4033.333333 \n",
"334 3333.333333 "
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"wankels = merged[merged.efficiency>1]\n",
"wankels"
]
},
{
"cell_type": "markdown",
"id": "d1f1bf61-6c9b-498e-a5de-6fbe6bb719d3",
"metadata": {},
"source": [
"These are the Mazda rotaries, otherwise known as [Wankel engines](https://en.wikipedia.org/wiki/Wankel_engine).\n",
"\n",
"They make efficient power for their size because they can rev to 7000 rpm or so, which is where they make peak power."
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "5eda8f40-bff6-4715-ba54-b083c74b039d",
"metadata": {
"execution": {
"iopub.execute_input": "2022-08-01T04:20:12.371148Z",
"iopub.status.busy": "2022-08-01T04:20:12.370520Z",
"iopub.status.idle": "2022-08-01T04:20:12.488456Z",
"shell.execute_reply": "2022-08-01T04:20:12.488016Z",
"shell.execute_reply.started": "2022-08-01T04:20:12.371118Z"
},
"tags": []
},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"wankels.efficiency.plot(kind='bar')\n",
"plt.xticks(np.arange(len(wankels)), wankels.car_name)\n",
"plt.axhline(merged['efficiency'].mean(), color='red')\n",
"plt.title('Mazda Rotary Efficiency (red line is average)');"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "1476617f-8097-4294-8c42-fb86ff96c1d0",
"metadata": {
"execution": {
"iopub.execute_input": "2022-08-01T04:20:12.489303Z",
"iopub.status.busy": "2022-08-01T04:20:12.489077Z",
"iopub.status.idle": "2022-08-01T04:20:12.574526Z",
"shell.execute_reply": "2022-08-01T04:20:12.574107Z",
"shell.execute_reply.started": "2022-08-01T04:20:12.489288Z"
},
"tags": []
},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"wankels.mpg.plot(kind='bar')\n",
"plt.xticks(np.arange(len(wankels)), wankels.car_name)\n",
"plt.axhline(merged['mpg'].mean(), color='red')\n",
"plt.title('Mazda Rotary MPG (red line is average)');"
]
},
{
"cell_type": "markdown",
"id": "4f793604-5c51-44b7-9cb6-301151304400",
"metadata": {},
"source": [
"## Diesels"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "c7aece22-1e78-4a48-b969-f1207ba09aad",
"metadata": {
"execution": {
"iopub.execute_input": "2022-08-01T04:20:12.575478Z",
"iopub.status.busy": "2022-08-01T04:20:12.575129Z",
"iopub.status.idle": "2022-08-01T04:20:12.588438Z",
"shell.execute_reply": "2022-08-01T04:20:12.587822Z",
"shell.execute_reply.started": "2022-08-01T04:20:12.575463Z"
},
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>mpg</th>\n",
" <th>cylinders</th>\n",
" <th>displacement</th>\n",
" <th>horsepower</th>\n",
" <th>weight</th>\n",
" <th>acceleration</th>\n",
" <th>model_year</th>\n",
" <th>origin</th>\n",
" <th>car_name</th>\n",
" <th>efficiency</th>\n",
" <th>load</th>\n",
" <th>bore_size</th>\n",
" <th>grunt</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>244</th>\n",
" <td>43.1</td>\n",
" <td>4</td>\n",
" <td>90.0</td>\n",
" <td>48.0</td>\n",
" <td>1985.0</td>\n",
" <td>21.5</td>\n",
" <td>78</td>\n",
" <td>2</td>\n",
" <td>volkswagen rabbit custom diesel</td>\n",
" <td>0.533333</td>\n",
" <td>0.045340</td>\n",
" <td>22.500000</td>\n",
" <td>576.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>325</th>\n",
" <td>44.3</td>\n",
" <td>4</td>\n",
" <td>90.0</td>\n",
" <td>48.0</td>\n",
" <td>2085.0</td>\n",
" <td>21.7</td>\n",
" <td>80</td>\n",
" <td>2</td>\n",
" <td>vw rabbit c (diesel)</td>\n",
" <td>0.533333</td>\n",
" <td>0.043165</td>\n",
" <td>22.500000</td>\n",
" <td>576.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>326</th>\n",
" <td>43.4</td>\n",
" <td>4</td>\n",
" <td>90.0</td>\n",
" <td>48.0</td>\n",
" <td>2335.0</td>\n",
" <td>23.7</td>\n",
" <td>80</td>\n",
" <td>2</td>\n",
" <td>vw dasher (diesel)</td>\n",
" <td>0.533333</td>\n",
" <td>0.038544</td>\n",
" <td>22.500000</td>\n",
" <td>576.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>327</th>\n",
" <td>36.4</td>\n",
" <td>5</td>\n",
" <td>121.0</td>\n",
" <td>67.0</td>\n",
" <td>2950.0</td>\n",
" <td>19.9</td>\n",
" <td>80</td>\n",
" <td>2</td>\n",
" <td>audi 5000s (diesel)</td>\n",
" <td>0.553719</td>\n",
" <td>0.041017</td>\n",
" <td>24.200000</td>\n",
" <td>897.800000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>358</th>\n",
" <td>28.1</td>\n",
" <td>4</td>\n",
" <td>141.0</td>\n",
" <td>80.0</td>\n",
" <td>3230.0</td>\n",
" <td>20.4</td>\n",
" <td>81</td>\n",
" <td>2</td>\n",
" <td>peugeot 505s turbo diesel</td>\n",
" <td>0.567376</td>\n",
" <td>0.043653</td>\n",
" <td>35.250000</td>\n",
" <td>1600.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>359</th>\n",
" <td>30.7</td>\n",
" <td>6</td>\n",
" <td>145.0</td>\n",
" <td>76.0</td>\n",
" <td>3160.0</td>\n",
" <td>19.6</td>\n",
" <td>81</td>\n",
" <td>2</td>\n",
" <td>volvo diesel</td>\n",
" <td>0.524138</td>\n",
" <td>0.045886</td>\n",
" <td>24.166667</td>\n",
" <td>962.666667</td>\n",
" </tr>\n",
" <tr>\n",
" <th>386</th>\n",
" <td>38.0</td>\n",
" <td>6</td>\n",
" <td>262.0</td>\n",
" <td>85.0</td>\n",
" <td>3015.0</td>\n",
" <td>17.0</td>\n",
" <td>82</td>\n",
" <td>1</td>\n",
" <td>oldsmobile cutlass ciera (diesel)</td>\n",
" <td>0.324427</td>\n",
" <td>0.086899</td>\n",
" <td>43.666667</td>\n",
" <td>1204.166667</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" mpg cylinders displacement horsepower weight acceleration \\\n",
"244 43.1 4 90.0 48.0 1985.0 21.5 \n",
"325 44.3 4 90.0 48.0 2085.0 21.7 \n",
"326 43.4 4 90.0 48.0 2335.0 23.7 \n",
"327 36.4 5 121.0 67.0 2950.0 19.9 \n",
"358 28.1 4 141.0 80.0 3230.0 20.4 \n",
"359 30.7 6 145.0 76.0 3160.0 19.6 \n",
"386 38.0 6 262.0 85.0 3015.0 17.0 \n",
"\n",
" model_year origin car_name efficiency \\\n",
"244 78 2 volkswagen rabbit custom diesel 0.533333 \n",
"325 80 2 vw rabbit c (diesel) 0.533333 \n",
"326 80 2 vw dasher (diesel) 0.533333 \n",
"327 80 2 audi 5000s (diesel) 0.553719 \n",
"358 81 2 peugeot 505s turbo diesel 0.567376 \n",
"359 81 2 volvo diesel 0.524138 \n",
"386 82 1 oldsmobile cutlass ciera (diesel) 0.324427 \n",
"\n",
" load bore_size grunt \n",
"244 0.045340 22.500000 576.000000 \n",
"325 0.043165 22.500000 576.000000 \n",
"326 0.038544 22.500000 576.000000 \n",
"327 0.041017 24.200000 897.800000 \n",
"358 0.043653 35.250000 1600.000000 \n",
"359 0.045886 24.166667 962.666667 \n",
"386 0.086899 43.666667 1204.166667 "
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"diesels = merged[merged.car_name.str.contains('diesel')]\n",
"diesels"
]
},
{
"cell_type": "markdown",
"id": "79979a1e-de58-4610-8878-6f374f500d1c",
"metadata": {},
"source": [
"All of the diesels get higher-than-average MPG."
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "92f0bf1a-af7b-4a26-b422-8ca01fdfde1b",
"metadata": {
"execution": {
"iopub.execute_input": "2022-08-01T04:20:12.590154Z",
"iopub.status.busy": "2022-08-01T04:20:12.589539Z",
"iopub.status.idle": "2022-08-01T04:20:12.714518Z",
"shell.execute_reply": "2022-08-01T04:20:12.713877Z",
"shell.execute_reply.started": "2022-08-01T04:20:12.590124Z"
},
"tags": []
},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"diesels.mpg.plot(kind='barh')\n",
"plt.yticks(np.arange(len(diesels)),diesels.car_name)\n",
"plt.axvline(merged.mpg.mean(),color='red')\n",
"plt.title('Diesel MPG (red line is average)');"
]
},
{
"cell_type": "markdown",
"id": "df9d8d17-46ec-4ce8-950e-aa1a24d98d7f",
"metadata": {},
"source": [
"# Interesting outliers"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "0fb1ed64-bba6-463c-9a0f-84af360515b5",
"metadata": {
"execution": {
"iopub.execute_input": "2022-08-01T04:20:12.715450Z",
"iopub.status.busy": "2022-08-01T04:20:12.715284Z",
"iopub.status.idle": "2022-08-01T04:20:12.726019Z",
"shell.execute_reply": "2022-08-01T04:20:12.725441Z",
"shell.execute_reply.started": "2022-08-01T04:20:12.715435Z"
},
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>mpg</th>\n",
" <th>cylinders</th>\n",
" <th>displacement</th>\n",
" <th>horsepower</th>\n",
" <th>weight</th>\n",
" <th>acceleration</th>\n",
" <th>model_year</th>\n",
" <th>origin</th>\n",
" <th>car_name</th>\n",
" <th>efficiency</th>\n",
" <th>load</th>\n",
" <th>bore_size</th>\n",
" <th>grunt</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>386</th>\n",
" <td>38.0</td>\n",
" <td>6</td>\n",
" <td>262.0</td>\n",
" <td>85.0</td>\n",
" <td>3015.0</td>\n",
" <td>17.0</td>\n",
" <td>82</td>\n",
" <td>1</td>\n",
" <td>oldsmobile cutlass ciera (diesel)</td>\n",
" <td>0.324427</td>\n",
" <td>0.086899</td>\n",
" <td>43.666667</td>\n",
" <td>1204.166667</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" mpg cylinders displacement horsepower weight acceleration \\\n",
"386 38.0 6 262.0 85.0 3015.0 17.0 \n",
"\n",
" model_year origin car_name efficiency \\\n",
"386 82 1 oldsmobile cutlass ciera (diesel) 0.324427 \n",
"\n",
" load bore_size grunt \n",
"386 0.086899 43.666667 1204.166667 "
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"merged[(merged.mpg > 35) & (merged.displacement > 250)]"
]
},
{
"cell_type": "markdown",
"id": "1e1ec508-df30-42c6-a63f-aea36c12d2e8",
"metadata": {},
"source": [
"This is an interesting engine. In fact, [these cars are rumored to be the reason diesel cars are so unpopular in North America](https://www.autotrader.com/car-news/when-diesel-was-dreadful-oldsmobile-diesels-259997). [Here is a more technical write-up](https://www.dieselworldmag.com/diesel-engines/oldsmobile-350-v8)."
]
},
{
"cell_type": "markdown",
"id": "b9858dee-1de0-46ab-b46d-baa4cafc0efc",
"metadata": {},
"source": [
"<hr>"
]
},
{
"cell_type": "markdown",
"id": "d8625227-6fca-4e92-ba0c-271bbea53c23",
"metadata": {},
"source": [
"Big lazy engines in big heavy cars don't have to have poor MPG!"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "c0c4f183-ef44-42ee-b64c-a75c63450d7b",
"metadata": {
"execution": {
"iopub.execute_input": "2022-08-01T04:20:12.727961Z",
"iopub.status.busy": "2022-08-01T04:20:12.727510Z",
"iopub.status.idle": "2022-08-01T04:20:12.755277Z",
"shell.execute_reply": "2022-08-01T04:20:12.754339Z",
"shell.execute_reply.started": "2022-08-01T04:20:12.727919Z"
},
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>mpg</th>\n",
" <th>cylinders</th>\n",
" <th>displacement</th>\n",
" <th>horsepower</th>\n",
" <th>weight</th>\n",
" <th>acceleration</th>\n",
" <th>model_year</th>\n",
" <th>origin</th>\n",
" <th>car_name</th>\n",
" <th>efficiency</th>\n",
" <th>load</th>\n",
" <th>bore_size</th>\n",
" <th>grunt</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>298</th>\n",
" <td>23.0</td>\n",
" <td>8</td>\n",
" <td>350.0</td>\n",
" <td>125.0</td>\n",
" <td>3900.0</td>\n",
" <td>17.4</td>\n",
" <td>79</td>\n",
" <td>1</td>\n",
" <td>cadillac eldorado</td>\n",
" <td>0.357143</td>\n",
" <td>0.089744</td>\n",
" <td>43.75</td>\n",
" <td>1953.125</td>\n",
" </tr>\n",
" <tr>\n",
" <th>363</th>\n",
" <td>26.6</td>\n",
" <td>8</td>\n",
" <td>350.0</td>\n",
" <td>105.0</td>\n",
" <td>3725.0</td>\n",
" <td>19.0</td>\n",
" <td>81</td>\n",
" <td>1</td>\n",
" <td>oldsmobile cutlass ls</td>\n",
" <td>0.300000</td>\n",
" <td>0.093960</td>\n",
" <td>43.75</td>\n",
" <td>1378.125</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" mpg cylinders displacement horsepower weight acceleration \\\n",
"298 23.0 8 350.0 125.0 3900.0 17.4 \n",
"363 26.6 8 350.0 105.0 3725.0 19.0 \n",
"\n",
" model_year origin car_name efficiency load \\\n",
"298 79 1 cadillac eldorado 0.357143 0.089744 \n",
"363 81 1 oldsmobile cutlass ls 0.300000 0.093960 \n",
"\n",
" bore_size grunt \n",
"298 43.75 1953.125 \n",
"363 43.75 1378.125 "
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"merged[(merged.mpg > 20) & (merged.displacement > 340)]"
]
},
{
"cell_type": "markdown",
"id": "2ccec1cb-db88-430c-a118-351da41a23c1",
"metadata": {},
"source": [
"But some still do"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "8f51f87e-fb76-4c8a-b4bc-05f147fc8efa",
"metadata": {
"execution": {
"iopub.execute_input": "2022-08-01T04:20:12.756201Z",
"iopub.status.busy": "2022-08-01T04:20:12.756022Z",
"iopub.status.idle": "2022-08-01T04:20:12.770660Z",
"shell.execute_reply": "2022-08-01T04:20:12.769833Z",
"shell.execute_reply.started": "2022-08-01T04:20:12.756186Z"
},
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>mpg</th>\n",
" <th>cylinders</th>\n",
" <th>displacement</th>\n",
" <th>horsepower</th>\n",
" <th>weight</th>\n",
" <th>acceleration</th>\n",
" <th>model_year</th>\n",
" <th>origin</th>\n",
" <th>car_name</th>\n",
" <th>efficiency</th>\n",
" <th>load</th>\n",
" <th>bore_size</th>\n",
" <th>grunt</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>13</th>\n",
" <td>14.0</td>\n",
" <td>8</td>\n",
" <td>455.0</td>\n",
" <td>225.0</td>\n",
" <td>3086.0</td>\n",
" <td>10.0</td>\n",
" <td>70</td>\n",
" <td>1</td>\n",
" <td>buick estate wagon (sw)</td>\n",
" <td>0.494505</td>\n",
" <td>0.14744</td>\n",
" <td>56.875</td>\n",
" <td>6328.125</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" mpg cylinders displacement horsepower weight acceleration \\\n",
"13 14.0 8 455.0 225.0 3086.0 10.0 \n",
"\n",
" model_year origin car_name efficiency load \\\n",
"13 70 1 buick estate wagon (sw) 0.494505 0.14744 \n",
"\n",
" bore_size grunt \n",
"13 56.875 6328.125 "
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"merged[merged.load>0.14]"
]
},
{
"cell_type": "markdown",
"id": "4415be3a-f8fb-47f1-b39d-2c60a3495a1d",
"metadata": {},
"source": [
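"Big car, big engine, terrible MPG. That weight is way off, though: 3086 lb is implausibly light for a full-size wagon with a 455 cubic-inch V8, so I'll set it to roughly 5000 lb and recompute load."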
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "7d556866-da6d-48dd-b37a-e59c3155085d",
"metadata": {
"execution": {
"iopub.execute_input": "2022-08-01T04:20:12.772414Z",
"iopub.status.busy": "2022-08-01T04:20:12.771774Z",
"iopub.status.idle": "2022-08-01T04:20:12.776947Z",
"shell.execute_reply": "2022-08-01T04:20:12.776094Z",
"shell.execute_reply.started": "2022-08-01T04:20:12.772385Z"
},
"tags": []
},
"outputs": [],
"source": [
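"# Hand-correct the implausible weight for the buick estate wagon (row 13), then recompute load\n",
"merged.at[13, 'weight'] = 5000\n",
"merged['load'] = merged.displacement / merged.weight"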
]
},
{
"cell_type": "markdown",
"id": "146d6761-455a-407f-b627-24c13586a88f",
"metadata": {},
"source": [
"## Which vehicles have the highest MPG?"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "558d450a-2649-4005-bbe8-5f8cc509f965",
"metadata": {
"execution": {
"iopub.execute_input": "2022-08-01T04:20:12.778488Z",
"iopub.status.busy": "2022-08-01T04:20:12.778076Z",
"iopub.status.idle": "2022-08-01T04:20:12.930503Z",
"shell.execute_reply": "2022-08-01T04:20:12.929878Z",
"shell.execute_reply.started": "2022-08-01T04:20:12.778459Z"
},
"tags": []
},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x360 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"top_mpg = merged.sort_values('mpg').tail(10)\n",
"\n",
"fig, ax = plt.subplots(figsize=(6, 5))\n",
"ax.barh(top_mpg.car_name, top_mpg.mpg)\n",
"# Label each bar with its MPG value\n",
"for i in ax.patches:\n",
"    plt.text(i.get_width() + 0.2, i.get_y() + 0.5,\n",
"             str(round(i.get_width(), 2)),\n",
"             fontsize=10, fontweight='bold',\n",
"             color='grey')\n",
"ax.set_title('Top 10 MPG (red line is average)')\n",
"plt.axvline(merged.mpg.mean(), color='red')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"id": "260484cb-5145-4c0f-8952-8f6ba652c8a5",
"metadata": {},
"source": [
"In more detail:"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "38935e91-3877-47a3-96d6-cd54e2704bdb",
"metadata": {
"execution": {
"iopub.execute_input": "2022-08-01T04:20:12.931405Z",
"iopub.status.busy": "2022-08-01T04:20:12.931238Z",
"iopub.status.idle": "2022-08-01T04:20:12.947214Z",
"shell.execute_reply": "2022-08-01T04:20:12.946618Z",
"shell.execute_reply.started": "2022-08-01T04:20:12.931389Z"
},
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>mpg</th>\n",
" <th>cylinders</th>\n",
" <th>displacement</th>\n",
" <th>horsepower</th>\n",
" <th>weight</th>\n",
" <th>acceleration</th>\n",
" <th>model_year</th>\n",
" <th>origin</th>\n",
" <th>car_name</th>\n",
" <th>efficiency</th>\n",
" <th>load</th>\n",
" <th>bore_size</th>\n",
" <th>grunt</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>322</th>\n",
" <td>46.6</td>\n",
" <td>4</td>\n",
" <td>86.0</td>\n",
" <td>65.0</td>\n",
" <td>2110.0</td>\n",
" <td>17.9</td>\n",
" <td>80</td>\n",
" <td>3</td>\n",
" <td>mazda glc</td>\n",
" <td>0.755814</td>\n",
" <td>0.040758</td>\n",
" <td>21.50</td>\n",
" <td>1056.2500</td>\n",
" </tr>\n",
" <tr>\n",
" <th>329</th>\n",
" <td>44.6</td>\n",
" <td>4</td>\n",
" <td>91.0</td>\n",
" <td>67.0</td>\n",
" <td>1850.0</td>\n",
" <td>13.8</td>\n",
" <td>80</td>\n",
" <td>3</td>\n",
" <td>honda civic 1500 gl</td>\n",
" <td>0.736264</td>\n",
" <td>0.049189</td>\n",
" <td>22.75</td>\n",
" <td>1122.2500</td>\n",
" </tr>\n",
" <tr>\n",
" <th>325</th>\n",
" <td>44.3</td>\n",
" <td>4</td>\n",
" <td>90.0</td>\n",
" <td>48.0</td>\n",
" <td>2085.0</td>\n",
" <td>21.7</td>\n",
" <td>80</td>\n",
" <td>2</td>\n",
" <td>vw rabbit c (diesel)</td>\n",
" <td>0.533333</td>\n",
" <td>0.043165</td>\n",
" <td>22.50</td>\n",
" <td>576.0000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>393</th>\n",
" <td>44.0</td>\n",
" <td>4</td>\n",
" <td>97.0</td>\n",
" <td>52.0</td>\n",
" <td>2130.0</td>\n",
" <td>24.6</td>\n",
" <td>82</td>\n",
" <td>2</td>\n",
" <td>vw pickup</td>\n",
" <td>0.536082</td>\n",
" <td>0.045540</td>\n",
" <td>24.25</td>\n",
" <td>676.0000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>326</th>\n",
" <td>43.4</td>\n",
" <td>4</td>\n",
" <td>90.0</td>\n",
" <td>48.0</td>\n",
" <td>2335.0</td>\n",
" <td>23.7</td>\n",
" <td>80</td>\n",
" <td>2</td>\n",
" <td>vw dasher (diesel)</td>\n",
" <td>0.533333</td>\n",
" <td>0.038544</td>\n",
" <td>22.50</td>\n",
" <td>576.0000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>244</th>\n",
" <td>43.1</td>\n",
" <td>4</td>\n",
" <td>90.0</td>\n",
" <td>48.0</td>\n",
" <td>1985.0</td>\n",
" <td>21.5</td>\n",
" <td>78</td>\n",
" <td>2</td>\n",
" <td>volkswagen rabbit custom diesel</td>\n",
" <td>0.533333</td>\n",
" <td>0.045340</td>\n",
" <td>22.50</td>\n",
" <td>576.0000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>309</th>\n",
" <td>41.5</td>\n",
" <td>4</td>\n",
" <td>98.0</td>\n",
" <td>76.0</td>\n",
" <td>2144.0</td>\n",
" <td>14.7</td>\n",
" <td>80</td>\n",
" <td>2</td>\n",
" <td>vw rabbit</td>\n",
" <td>0.775510</td>\n",
" <td>0.045709</td>\n",
" <td>24.50</td>\n",
" <td>1444.0000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>330</th>\n",
" <td>40.9</td>\n",
" <td>4</td>\n",
" <td>85.0</td>\n",
" <td>53.5</td>\n",
" <td>1835.0</td>\n",
" <td>17.3</td>\n",
" <td>80</td>\n",
" <td>2</td>\n",
" <td>renault lecar deluxe</td>\n",
" <td>0.629412</td>\n",
" <td>0.046322</td>\n",
" <td>21.25</td>\n",
" <td>715.5625</td>\n",
" </tr>\n",
" <tr>\n",
" <th>324</th>\n",
" <td>40.8</td>\n",
" <td>4</td>\n",
" <td>85.0</td>\n",
" <td>65.0</td>\n",
" <td>2110.0</td>\n",
" <td>19.2</td>\n",
" <td>80</td>\n",
" <td>3</td>\n",
" <td>datsun 210</td>\n",
" <td>0.764706</td>\n",
" <td>0.040284</td>\n",
" <td>21.25</td>\n",
" <td>1056.2500</td>\n",
" </tr>\n",
" <tr>\n",
" <th>247</th>\n",
" <td>39.4</td>\n",
" <td>4</td>\n",
" <td>85.0</td>\n",
" <td>70.0</td>\n",
" <td>2070.0</td>\n",
" <td>18.6</td>\n",
" <td>78</td>\n",
" <td>3</td>\n",
" <td>datsun b210 gx</td>\n",
" <td>0.823529</td>\n",
" <td>0.041063</td>\n",
" <td>21.25</td>\n",
" <td>1225.0000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" mpg cylinders displacement horsepower weight acceleration \\\n",
"322 46.6 4 86.0 65.0 2110.0 17.9 \n",
"329 44.6 4 91.0 67.0 1850.0 13.8 \n",
"325 44.3 4 90.0 48.0 2085.0 21.7 \n",
"393 44.0 4 97.0 52.0 2130.0 24.6 \n",
"326 43.4 4 90.0 48.0 2335.0 23.7 \n",
"244 43.1 4 90.0 48.0 1985.0 21.5 \n",
"309 41.5 4 98.0 76.0 2144.0 14.7 \n",
"330 40.9 4 85.0 53.5 1835.0 17.3 \n",
"324 40.8 4 85.0 65.0 2110.0 19.2 \n",
"247 39.4 4 85.0 70.0 2070.0 18.6 \n",
"\n",
" model_year origin car_name efficiency \\\n",
"322 80 3 mazda glc 0.755814 \n",
"329 80 3 honda civic 1500 gl 0.736264 \n",
"325 80 2 vw rabbit c (diesel) 0.533333 \n",
"393 82 2 vw pickup 0.536082 \n",
"326 80 2 vw dasher (diesel) 0.533333 \n",
"244 78 2 volkswagen rabbit custom diesel 0.533333 \n",
"309 80 2 vw rabbit 0.775510 \n",
"330 80 2 renault lecar deluxe 0.629412 \n",
"324 80 3 datsun 210 0.764706 \n",
"247 78 3 datsun b210 gx 0.823529 \n",
"\n",
" load bore_size grunt \n",
"322 0.040758 21.50 1056.2500 \n",
"329 0.049189 22.75 1122.2500 \n",
"325 0.043165 22.50 576.0000 \n",
"393 0.045540 24.25 676.0000 \n",
"326 0.038544 22.50 576.0000 \n",
"244 0.045340 22.50 576.0000 \n",
"309 0.045709 24.50 1444.0000 \n",
"330 0.046322 21.25 715.5625 \n",
"324 0.040284 21.25 1056.2500 \n",
"247 0.041063 21.25 1225.0000 "
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"merged.sort_values('mpg',ascending=False).head(10)"
]
},
{
"cell_type": "markdown",
"id": "15d5a2c5-cb01-4a54-8ce4-375018ebc79a",
"metadata": {},
"source": [
"## What vehicles have the lowest MPG?"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "65588fe3-762f-42b0-9427-feb64275b792",
"metadata": {
"execution": {
"iopub.execute_input": "2022-08-01T04:20:12.948105Z",
"iopub.status.busy": "2022-08-01T04:20:12.947948Z",
"iopub.status.idle": "2022-08-01T04:20:13.102377Z",
"shell.execute_reply": "2022-08-01T04:20:13.101717Z",
"shell.execute_reply.started": "2022-08-01T04:20:12.948090Z"
},
"tags": []
},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x360 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"low_mpg = merged.sort_values('mpg', ascending=False).tail(10)\n",
"\n",
"fig, ax = plt.subplots(figsize=(6, 5))\n",
"ax.barh(low_mpg.car_name, low_mpg.mpg)\n",
"# Label each bar with its MPG, vertically centred on the bar\n",
"for bar in ax.patches:\n",
"    ax.text(bar.get_width() + 0.2, bar.get_y() + bar.get_height() / 2,\n",
"            str(round(bar.get_width(), 2)),\n",
"            va='center', fontsize=10, fontweight='bold',\n",
"            color='grey')\n",
"ax.set_title('Bottom 10 MPG (red line is average)')\n",
"ax.axvline(merged.mpg.mean(), color='red')\n",
"plt.show()"
]
},
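{
"cell_type": "markdown",
"id": "c3e7a1d4-9b2f-4c6e-8a15-7f0d2b9e4a61",
"metadata": {},
"source": [
"A note on ties: at least three cars in the bottom 10 sit at exactly 12.0 MPG. If other cars in the full data also score 12.0, which ones make the cut depends on how the sort breaks ties. The minimal sketch below uses made-up toy data (not the real rows) to show that `DataFrame.nsmallest` with `keep='all'` keeps every tied row instead of dropping some arbitrarily:\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Toy data for illustration only (not the auto-mpg rows)\n",
"cars = pd.DataFrame({\n",
"    'car_name': ['a', 'b', 'c', 'd', 'e'],\n",
"    'mpg': [9.0, 10.0, 12.0, 12.0, 20.0],\n",
"})\n",
"\n",
"# Default keep='first': exactly n rows, ties broken by position\n",
"bottom3 = cars.nsmallest(3, 'mpg')\n",
"\n",
"# keep='all': every row tied with the n-th smallest value is kept\n",
"bottom3_ties = cars.nsmallest(3, 'mpg', keep='all')\n",
"\n",
"print(bottom3.car_name.tolist())       # ['a', 'b', 'c']\n",
"print(bottom3_ties.car_name.tolist())  # ['a', 'b', 'c', 'd']\n",
"```\n",
"\n",
"On the real frame, the equivalent call would be `merged.nsmallest(10, 'mpg', keep='all')`. It may return more than 10 rows."
]
},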
{
"cell_type": "markdown",
"id": "3d1c5be8-b63f-496e-a2cd-475a47e7a542",
"metadata": {},
"source": [
"In more detail:"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "1497a48c-42a3-447e-b1fb-e3a5b78902da",
"metadata": {
"execution": {
"iopub.execute_input": "2022-08-01T04:20:13.103850Z",
"iopub.status.busy": "2022-08-01T04:20:13.103446Z",
"iopub.status.idle": "2022-08-01T04:20:13.131408Z",
"shell.execute_reply": "2022-08-01T04:20:13.130488Z",
"shell.execute_reply.started": "2022-08-01T04:20:13.103820Z"
},
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>mpg</th>\n",
" <th>cylinders</th>\n",
" <th>displacement</th>\n",
" <th>horsepower</th>\n",
" <th>weight</th>\n",
" <th>acceleration</th>\n",
" <th>model_year</th>\n",
" <th>origin</th>\n",
" <th>car_name</th>\n",
" <th>efficiency</th>\n",
" <th>load</th>\n",
" <th>bore_size</th>\n",
" <th>grunt</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>28</th>\n",
" <td>9.0</td>\n",
" <td>8</td>\n",
" <td>304.0</td>\n",
" <td>193.0</td>\n",
" <td>4732.0</td>\n",
" <td>18.5</td>\n",
" <td>70</td>\n",
" <td>1</td>\n",
" <td>hi 1200d</td>\n",
" <td>0.634868</td>\n",
" <td>0.064243</td>\n",
" <td>38.000</td>\n",
" <td>4656.125</td>\n",
" </tr>\n",
" <tr>\n",
" <th>26</th>\n",
" <td>10.0</td>\n",
" <td>8</td>\n",
" <td>307.0</td>\n",
" <td>200.0</td>\n",
" <td>4376.0</td>\n",
" <td>15.0</td>\n",
" <td>70</td>\n",
" <td>1</td>\n",
" <td>chevy c20</td>\n",
" <td>0.651466</td>\n",
" <td>0.070155</td>\n",
" <td>38.375</td>\n",
" <td>5000.000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25</th>\n",
" <td>10.0</td>\n",
" <td>8</td>\n",
" <td>360.0</td>\n",
" <td>215.0</td>\n",
" <td>4615.0</td>\n",
" <td>14.0</td>\n",
" <td>70</td>\n",
" <td>1</td>\n",
" <td>ford f250</td>\n",
" <td>0.597222</td>\n",
" <td>0.078007</td>\n",
" <td>45.000</td>\n",
" <td>5778.125</td>\n",
" </tr>\n",
" <tr>\n",
" <th>27</th>\n",
" <td>11.0</td>\n",
" <td>8</td>\n",
" <td>318.0</td>\n",
" <td>210.0</td>\n",
" <td>4382.0</td>\n",
" <td>13.5</td>\n",
" <td>70</td>\n",
" <td>1</td>\n",
" <td>dodge d200</td>\n",
" <td>0.660377</td>\n",
" <td>0.072570</td>\n",
" <td>39.750</td>\n",
" <td>5512.500</td>\n",
" </tr>\n",
" <tr>\n",
" <th>103</th>\n",
" <td>11.0</td>\n",
" <td>8</td>\n",
" <td>400.0</td>\n",
" <td>150.0</td>\n",
" <td>4997.0</td>\n",
" <td>14.0</td>\n",
" <td>73</td>\n",
" <td>1</td>\n",
" <td>chevrolet impala</td>\n",
" <td>0.375000</td>\n",
" <td>0.080048</td>\n",
" <td>50.000</td>\n",
" <td>2812.500</td>\n",
" </tr>\n",
" <tr>\n",
" <th>67</th>\n",
" <td>11.0</td>\n",
" <td>8</td>\n",
" <td>429.0</td>\n",
" <td>208.0</td>\n",
" <td>4633.0</td>\n",
" <td>11.0</td>\n",
" <td>72</td>\n",
" <td>1</td>\n",
" <td>mercury marquis</td>\n",
" <td>0.484848</td>\n",
" <td>0.092597</td>\n",
" <td>53.625</td>\n",
" <td>5408.000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>124</th>\n",
" <td>11.0</td>\n",
" <td>8</td>\n",
" <td>350.0</td>\n",
" <td>180.0</td>\n",
" <td>3664.0</td>\n",
" <td>11.0</td>\n",
" <td>73</td>\n",
" <td>1</td>\n",
" <td>oldsmobile omega</td>\n",
" <td>0.514286</td>\n",
" <td>0.095524</td>\n",
" <td>43.750</td>\n",
" <td>4050.000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>42</th>\n",
" <td>12.0</td>\n",
" <td>8</td>\n",
" <td>383.0</td>\n",
" <td>180.0</td>\n",
" <td>4955.0</td>\n",
" <td>11.5</td>\n",
" <td>71</td>\n",
" <td>1</td>\n",
" <td>dodge monaco (sw)</td>\n",
" <td>0.469974</td>\n",
" <td>0.077296</td>\n",
" <td>47.875</td>\n",
" <td>4050.000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>95</th>\n",
" <td>12.0</td>\n",
" <td>8</td>\n",
" <td>455.0</td>\n",
" <td>225.0</td>\n",
" <td>4951.0</td>\n",
" <td>11.0</td>\n",
" <td>73</td>\n",
" <td>1</td>\n",
" <td>buick electra 225 custom</td>\n",
" <td>0.494505</td>\n",
" <td>0.091901</td>\n",
" <td>56.875</td>\n",
" <td>6328.125</td>\n",
" </tr>\n",
" <tr>\n",
" <th>90</th>\n",
" <td>12.0</td>\n",
" <td>8</td>\n",
" <td>429.0</td>\n",
" <td>198.0</td>\n",
" <td>4952.0</td>\n",
" <td>11.5</td>\n",
" <td>73</td>\n",
" <td>1</td>\n",
" <td>mercury marquis brougham</td>\n",
" <td>0.461538</td>\n",
" <td>0.086632</td>\n",
" <td>53.625</td>\n",
" <td>4900.500</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" mpg cylinders displacement horsepower weight acceleration \\\n",
"28 9.0 8 304.0 193.0 4732.0 18.5 \n",
"26 10.0 8 307.0 200.0 4376.0 15.0 \n",
"25 10.0 8 360.0 215.0 4615.0 14.0 \n",
"27 11.0 8 318.0 210.0 4382.0 13.5 \n",
"103 11.0 8 400.0 150.0 4997.0 14.0 \n",
"67 11.0 8 429.0 208.0 4633.0 11.0 \n",
"124 11.0 8 350.0 180.0 3664.0 11.0 \n",
"42 12.0 8 383.0 180.0 4955.0 11.5 \n",
"95 12.0 8 455.0 225.0 4951.0 11.0 \n",
"90 12.0 8 429.0 198.0 4952.0 11.5 \n",
"\n",
" model_year origin car_name efficiency load \\\n",
"28 70 1 hi 1200d 0.634868 0.064243 \n",
"26 70 1 chevy c20 0.651466 0.070155 \n",
"25 70 1 ford f250 0.597222 0.078007 \n",
"27 70 1 dodge d200 0.660377 0.072570 \n",
"103 73 1 chevrolet impala 0.375000 0.080048 \n",
"67 72 1 mercury marquis 0.484848 0.092597 \n",
"124 73 1 oldsmobile omega 0.514286 0.095524 \n",
"42 71 1 dodge monaco (sw) 0.469974 0.077296 \n",
"95 73 1 buick electra 225 custom 0.494505 0.091901 \n",
"90 73 1 mercury marquis brougham 0.461538 0.086632 \n",
"\n",
" bore_size grunt \n",
"28 38.000 4656.125 \n",
"26 38.375 5000.000 \n",
"25 45.000 5778.125 \n",
"27 39.750 5512.500 \n",
"103 50.000 2812.500 \n",
"67 53.625 5408.000 \n",
"124 43.750 4050.000 \n",
"42 47.875 4050.000 \n",
"95 56.875 6328.125 \n",
"90 53.625 4900.500 "
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"merged.sort_values('mpg').head(10)"
]
},
{
"cell_type": "markdown",
"id": "15d0d27b-5f92-4648-ad5c-35cc811430b3",
"metadata": {},
"source": [
"## Some stats"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "8710cba8-6b7e-4219-98b9-b7d5a1b4f4b9",
"metadata": {
"execution": {
"iopub.execute_input": "2022-08-01T04:20:13.132876Z",
"iopub.status.busy": "2022-08-01T04:20:13.132574Z",
"iopub.status.idle": "2022-08-01T04:20:13.142096Z",
"shell.execute_reply": "2022-08-01T04:20:13.141321Z",
"shell.execute_reply.started": "2022-08-01T04:20:13.132851Z"
},
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Mean MPG: 23.51\n",
"Mean Weight: 2975.41\n",
"Mean Horsepower: 104.12\n",
"efficiency mean: 0.61\n",
"load mean: 0.06\n",
"bore_size mean: 33.36\n",
"grunt mean: 2060.50\n"
]
}
],
"source": [
"print(f'''Mean MPG: {y.mean():.2f}\n",
"Mean Weight: {merged.weight.mean():.2f}\n",
"Mean Horsepower: {merged.horsepower.mean():.2f}''')\n",
"\n",
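"# Means of the engineered features (named explicitly rather than by column position)\n",
"for col in ['efficiency', 'load', 'bore_size', 'grunt']:\n",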
" print(f'{col} mean: {merged[col].mean():.2f}')"
]
},
{
"cell_type": "markdown",
"id": "0213061d-29c8-4f47-9128-705253bc6320",
"metadata": {},
"source": [
"Check each feature's correlation with MPG"
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "7205bdab-a7df-41b4-9ec0-c1c9e2fe1c03",
"metadata": {
"execution": {
"iopub.execute_input": "2022-08-01T04:20:13.143792Z",
"iopub.status.busy": "2022-08-01T04:20:13.143335Z",
"iopub.status.idle": "2022-08-01T04:20:13.153208Z",
"shell.execute_reply": "2022-08-01T04:20:13.152547Z",
"shell.execute_reply.started": "2022-08-01T04:20:13.143758Z"
},
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"weight -0.832707\n",
"displacement -0.804456\n",
"horsepower -0.777897\n",
"cylinders -0.776090\n",
"bore_size -0.773403\n",
"load -0.724271\n",
"grunt -0.644081\n",
"acceleration 0.420414\n",
"efficiency 0.509309\n",
"origin 0.563833\n",
"model_year 0.580091\n",
"mpg 1.000000\n",
"dtype: float64"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# car_name is text, so correlate only the numeric columns\n",
"merged.select_dtypes('number').corrwith(y).sort_values()"
]
},
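{
"cell_type": "markdown",
"id": "e8b2d6f1-4a3c-4d7e-b9a0-5c1f8e2d7b43",
"metadata": {},
"source": [
"`corrwith` defaults to Pearson's r (`method='pearson'`). This is the covariance of the two columns divided by the product of their standard deviations, so a value near -1 means a strong straight-line decrease. The small sketch below uses made-up numbers (not the real columns) and a hypothetical helper, `pearson_r`, to check the hand-rolled formula against pandas' `Series.corr`:\n",
"\n",
"```python\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"def pearson_r(x, y):\n",
"    # Centre both series on their means\n",
"    xc = x - x.mean()\n",
"    yc = y - y.mean()\n",
"    # Covariance term divided by the product of the spreads\n",
"    return (xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum())\n",
"\n",
"# Toy values for illustration only: heavier cars, lower mpg\n",
"weight = pd.Series([1800.0, 2200.0, 3000.0, 3600.0, 4400.0])\n",
"mpg = pd.Series([40.0, 33.0, 24.0, 18.0, 13.0])\n",
"\n",
"r_manual = pearson_r(weight, mpg)\n",
"r_pandas = weight.corr(mpg)  # Series.corr is Pearson by default\n",
"print(f'manual: {r_manual:.4f}, pandas: {r_pandas:.4f}')\n",
"```\n",
"\n",
"Passing `method='spearman'` to `corrwith` would use rank correlation instead. It is less sensitive to relationships that are monotonic but curved."
]
},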
{
"cell_type": "markdown",
"id": "d8889b56-a87c-4901-b654-aaf5a4b9fb14",
"metadata": {},
"source": [
"<hr>\n",
"Math says to use weight, displacement, horsepower and cylinders: they have the strongest raw correlations with MPG, between -0.78 and -0.83.\n",
"\n",
"While I agree that these are the most important features, there's more to it than just these numbers, in the same way that a stew is more than the sum of its ingredients."
]
},
{
"cell_type": "markdown",
"id": "27e89d6b-7603-403c-8235-e9bad49040b3",
"metadata": {},
"source": [
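"I'll test both: a straight set of those four raw columns, and an engineered set that keeps horsepower but swaps weight, displacement and cylinders for bore_size, grunt and load."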
]
},
{
"cell_type": "code",
"execution_count": 21,
"id": "52d0ffbf-55aa-49b9-b99f-8160bf09cc79",
"metadata": {
"execution": {
"iopub.execute_input": "2022-08-01T04:20:13.154483Z",
"iopub.status.busy": "2022-08-01T04:20:13.154106Z",
"iopub.status.idle": "2022-08-01T04:20:13.159628Z",
"shell.execute_reply": "2022-08-01T04:20:13.158886Z",
"shell.execute_reply.started": "2022-08-01T04:20:13.154458Z"
},
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"Index(['mpg', 'cylinders', 'displacement', 'horsepower', 'weight',\n",
" 'acceleration', 'model_year', 'origin', 'car_name', 'efficiency',\n",
" 'load', 'bore_size', 'grunt'],\n",
" dtype='object')"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"merged.columns"
]
},
{
"cell_type": "code",
"execution_count": 22,
"id": "6a4a9e48-57a1-48b6-b289-58bc43584112",
"metadata": {
"execution": {
"iopub.execute_input": "2022-08-01T04:20:13.163437Z",
"iopub.status.busy": "2022-08-01T04:20:13.163108Z",
"iopub.status.idle": "2022-08-01T04:20:13.175359Z",
"shell.execute_reply": "2022-08-01T04:20:13.174579Z",
"shell.execute_reply.started": "2022-08-01T04:20:13.163422Z"
},
"tags": []
},
"outputs": [],
"source": [
"y.to_csv('data/y.csv', index=False)\n",
"\n",
"# Engineered set: horsepower plus the derived engine features\n",
"engineered_cols = ['horsepower', 'bore_size', 'grunt', 'load']\n",
"merged[engineered_cols].to_csv('data/X_engineered.csv', index=False)\n",
"\n",
"# Straight set: the four raw columns most correlated with mpg\n",
"straight_cols = ['horsepower', 'weight', 'displacement', 'cylinders']\n",
"merged[straight_cols].to_csv('data/X_straight.csv', index=False)"
]
},
{
"cell_type": "markdown",
"id": "4802d1fd-079c-4053-88f2-b5dca7cf8dae",
"metadata": {},
"source": [
"[Modeling](model.ipynb)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.5"
}
},
"nbformat": 4,
"nbformat_minor": 5
}