{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Machine learning\n", "\n", "### Objectives\n", "\n", " - Practice clustering algorithms" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Credit analysis of credit card clients\n", "\n", "Perform an exploratory analysis of the client database in order to identify how many distinct client profiles it contains." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Import libraries" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "\n", "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "from sklearn.cluster import KMeans\n", "from sklearn.preprocessing import StandardScaler" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Import the dataset" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ " ID LIMIT_BAL SEX EDUCATION MARRIAGE AGE PAY_0 PAY_2 PAY_3 PAY_4 \\\n", "0 1 20000 2 2 1 24 2 2 -1 -1 \n", "1 2 120000 2 2 2 26 -1 2 0 0 \n", "2 3 90000 2 2 2 34 0 0 0 0 \n", "3 4 50000 2 2 1 37 0 0 0 0 \n", "4 5 50000 1 2 1 57 -1 0 -1 0 \n", "\n", " ... BILL_AMT4 BILL_AMT5 BILL_AMT6 PAY_AMT1 PAY_AMT2 PAY_AMT3 \\\n", "0 ... 0 0 0 0 689 0 \n", "1 ... 3272 3455 3261 0 1000 1000 \n", "2 ... 14331 14948 15549 1518 1500 1000 \n", "3 ... 28314 28959 29547 2000 2019 1200 \n", "4 ... 20940 19146 19131 2000 36681 10000 \n", "\n", " PAY_AMT4 PAY_AMT5 PAY_AMT6 default payment next month \n", "0 0 0 0 1 \n", "1 1000 0 2000 1 \n", "2 1000 1000 5000 0 \n", "3 1100 1069 1000 0 \n", "4 9000 689 679 0 \n", "\n", "[5 rows x 25 columns]" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "url_db = 'default of credit card clients.xls'\n", "df = pd.read_excel(url_db, header=1)  # header=1: the column names are on the sheet's second row\n", "\n", "df.head()" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "## Check whether the dataset has missing values; if it does, drop those rows\n", "\n", "## Your code here...\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Dataset column descriptions:\n", "\n", " 'ID', \n", " 'LIMIT_BAL', --> credit limit\n", " 'SEX', \n", " 'EDUCATION', \n", " 'MARRIAGE', \n", " 'AGE', \n", " 'PAY_0', --> repayment status\n", " 'PAY_2', --> repayment status\n", " 'PAY_3', --> repayment status\n", " 'PAY_4', --> repayment status\n", " 'PAY_5', --> repayment status\n", " 'PAY_6', --> repayment status\n", " 'BILL_AMT1', --> debt (bill statement amount)\n", " 'BILL_AMT2', --> debt (bill statement amount)\n", " 'BILL_AMT3', --> debt (bill statement amount)\n", " 'BILL_AMT4', --> debt (bill statement amount)\n", " 'BILL_AMT5', --> debt (bill statement amount)\n", " 'BILL_AMT6', --> debt (bill statement amount)\n", " 'PAY_AMT1', --> amount paid\n", " 'PAY_AMT2', --> amount paid\n", " 'PAY_AMT3', --> amount paid\n", " 'PAY_AMT4', --> amount paid\n", " 'PAY_AMT5', --> amount paid\n", " 'PAY_AMT6', --> amount paid\n", " 'default payment next month' --> default flag (1 = yes, 0 = no)\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Creating a subset" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, 
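"outputs": [], "source": [ "# Hedged sketch, not the official solution: one possible way to build the subset.\n", "# Assumes `df` was loaded in the 'Import the dataset' cell. The names df_clean,\n", "# TOTAL_BILL_AMT, data and data_treino are illustrative choices; data_treino\n", "# matches the variable used in the elbow-method example.\n", "from sklearn.preprocessing import StandardScaler\n", "\n", "# 1. Report and drop rows with missing values (a no-op if there are none)\n", "print(df.isna().sum().sum(), 'missing values found')\n", "df_clean = df.dropna()\n", "\n", "# 2. Total debt = sum of the six bill amounts BILL_AMT1 ... BILL_AMT6\n", "bill_cols = [f'BILL_AMT{i}' for i in range(1, 7)]\n", "df_clean = df_clean.assign(TOTAL_BILL_AMT=df_clean[bill_cols].sum(axis=1))\n", "\n", "# 3. Keep only the credit limit and the total debt, as a NumPy array\n", "data = df_clean[['LIMIT_BAL', 'TOTAL_BILL_AMT']].values\n", "\n", "# 4. Standardize each column to mean 0 and standard deviation 1 so that\n", "#    both features weigh equally in Euclidean distances\n", "data_treino = StandardScaler().fit_transform(data)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, 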
"outputs": [], "source": [ "## Create a new column containing the sum of all the debt columns\n", "\n", "## Your code here...\n", "\n", "\n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "## Select the credit limit and total debt columns\n", "\n", "#data = df.iloc[:,[,]].values\n", "#data;" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [], "source": [ "## Since we are working with distance-based algorithms, it is important to normalize (scale) the data\n", "## Your normalization code...\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Number of clusters (k)\n", "\n", "Choose one of the techniques covered in class to select the number of clusters before running KMeans: the `Elbow` method or a `dendrogram`.\n", "\n", "---\n", "WCSS = within-cluster sum of squares\n", "\n", "```python\n", "# data_treino: the normalized data from the previous step\n", "wcss = []\n", "K = range(1, 12)\n", "\n", "for k in K:\n", "    km = KMeans(n_clusters=k, n_init=10, random_state=42)  # fixed seed for reproducibility\n", "    km = km.fit(data_treino)\n", "    wcss.append(km.inertia_)  # inertia_ is the WCSS of the fitted model\n", "\n", "plt.plot(K, wcss, \"x-\", color=\"grey\")\n", "plt.xlabel(\"k\")\n", "plt.ylabel(\"WCSS\")\n", "plt.title(\"Elbow method for choosing k\");\n", "```\n" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "## Your code here...\n", "\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Clustering the data\n", "\n", "Cluster the data using both `KMeans` and `AGNES` (agglomerative hierarchical clustering) and compare the results." 
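, "\n", "\n", "A hedged sketch of one way to run the comparison, assuming `data_treino` is a NumPy array holding the standardized features from the previous steps. `k = 3` is a placeholder, not a result; replace it with the k chosen via the elbow method or the dendrogram. Here AGNES is implemented with scikit-learn's `AgglomerativeClustering` (Ward linkage). Both models are fit on a random subsample of up to 5000 rows (an assumed size), because standard agglomerative clustering works on all pairwise distances and its memory therefore grows quadratically with the number of rows.\n", "\n", "```python\n", "import numpy as np\n", "from sklearn.cluster import AgglomerativeClustering, KMeans\n", "from sklearn.metrics import adjusted_rand_score, silhouette_score\n", "\n", "k = 3  # placeholder: use the k chosen in the previous step\n", "\n", "# Random subsample so that AGNES fits in memory\n", "rng = np.random.default_rng(42)\n", "idx = rng.choice(len(data_treino), size=min(5000, len(data_treino)), replace=False)\n", "X = data_treino[idx]\n", "\n", "km_labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)\n", "agnes_labels = AgglomerativeClustering(n_clusters=k, linkage='ward').fit_predict(X)\n", "\n", "# Higher silhouette = more compact, better-separated clusters\n", "print('Silhouette KMeans:', silhouette_score(X, km_labels))\n", "print('Silhouette AGNES :', silhouette_score(X, agnes_labels))\n", "# Adjusted Rand index: 1 = identical partitions, about 0 = chance-level agreement\n", "print('Agreement (ARI)  :', adjusted_rand_score(km_labels, agnes_labels))\n", "```"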
] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "## Your answer here...\n", "\n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [], "source": [ "## Briefly describe each cluster you obtained: what does each one indicate?\n", "\n", "\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.5" } }, "nbformat": 4, "nbformat_minor": 4 }