Advanced Practice Problems by Chapter
In [1]:
!pip install seaborn==0.13.0
Successfully uninstalled seaborn-0.12.0
Successfully installed seaborn-0.13.0
In [2]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import glob
# Set the floating-point display format (thousands separator, 6 decimal places)
pd.options.display.float_format = '{:,.6f}'.format
Q3
Scope
- (Includes the previous scope)
- Filling missing values
In [3]:
import seaborn as sns
titanic = sns.load_dataset('titanic')
titanic.head()
Out[3]:
 | survived | pclass | sex | age | sibsp | parch | fare | embarked | class | who | adult_male | deck | embark_town | alive | alone |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 3 | male | 22.000000 | 1 | 0 | 7.250000 | S | Third | man | True | NaN | Southampton | no | False |
1 | 1 | 1 | female | 38.000000 | 1 | 0 | 71.283300 | C | First | woman | False | C | Cherbourg | yes | False |
2 | 1 | 3 | female | 26.000000 | 0 | 0 | 7.925000 | S | Third | woman | False | NaN | Southampton | yes | True |
3 | 1 | 1 | female | 35.000000 | 1 | 0 | 53.100000 | S | First | woman | False | C | Southampton | yes | False |
4 | 0 | 3 | male | 35.000000 | 0 | 0 | 8.050000 | S | Third | man | True | NaN | Southampton | no | True |
Print the number of missing values in each column.
In [5]:
# Enter your code here
titanic.isnull().sum()
Out[5]:
survived         0
pclass           0
sex              0
age            177
sibsp            0
parch            0
fare             0
embarked         2
class            0
who              0
adult_male       0
deck           688
embark_town      2
alive            0
alone            0
dtype: int64
[Expected output]
survived         0
pclass           0
sex              0
age            177
sibsp            0
parch            0
fare             0
embarked         2
class            0
who              0
adult_male       0
deck           688
embark_town      2
alive            0
alone            0
dtype: int64
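To see what isnull().sum() is doing, here is a small sketch on a hypothetical toy DataFrame (not the Titanic data): isnull() returns a Boolean frame, and sum() counts the True cells per column. Taking mean() of the same Boolean frame gives the fraction of missing cells instead, which can be easier to compare across columns.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for a few Titanic columns
toy = pd.DataFrame({
    "age": [22.0, np.nan, 26.0, np.nan],
    "deck": [np.nan, "C", np.nan, np.nan],
    "fare": [7.25, 71.28, 7.93, 8.05],
})

# isnull() marks each missing cell as True; sum() counts the Trues per column
missing_counts = toy.isnull().sum()
# mean() of the Boolean frame gives the share of missing cells per column
missing_ratio = toy.isnull().mean()

print(missing_counts)
print(missing_ratio)
```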
Fill the missing values in the age column according to the following conditions:
- For rows where who is 'man' and age is missing, fill in the median age of men.
- For rows where who is 'woman' and age is missing, fill in the 25% quantile of women's ages.
- For rows where who is 'child' and age is missing, fill in the mean age of children.
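The three rules use three different statistics. A minimal sketch on a hypothetical list of ages shows what each one returns; all three skip NaN by default. The 25% quantile in pandas uses linear interpolation unless told otherwise, so with four values it lands between the first and second value rather than on one of them.

```python
import pandas as pd

# Hypothetical ages; None becomes NaN and is skipped by all three statistics
ages = pd.Series([10.0, 20.0, 30.0, 40.0, None])

median_age = ages.median()        # middle of 10, 20, 30, 40 -> 25.0
q25_age = ages.quantile(0.25)     # position 0.75 -> 10 + 0.75 * (20 - 10) = 17.5
mean_age = ages.mean()            # (10 + 20 + 30 + 40) / 4 -> 25.0

# A different interpolation rule gives a different "25% quantile"
q25_lower = ages.quantile(0.25, interpolation="lower")  # -> 10.0

print(median_age, q25_age, mean_age, q25_lower)
```

If an exercise grader expects a specific value, it is worth confirming it assumes the default linear interpolation.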
In [6]:
# Enter your code here
df = titanic
# Compute the statistic for each group
man_median_age = df[df['who'] == 'man']['age'].median()
woman_quantile_age = df[df['who'] == 'woman']['age'].quantile(0.25)
child_mean_age = df[df['who'] == 'child']['age'].mean()
# Fill the missing values
df.loc[(df['who'] == 'man') & (df['age'].isna()), 'age'] = man_median_age
df.loc[(df['who'] == 'woman') & (df['age'].isna()), 'age'] = woman_quantile_age
df.loc[(df['who'] == 'child') & (df['age'].isna()), 'age'] = child_mean_age
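In the cell above, df = titanic does not copy the data: both names point to the same DataFrame, so the .loc assignments also change titanic. That is why the verification cell can check titanic directly. A minimal sketch on hypothetical toy frames contrasts this aliasing with DataFrame.copy():

```python
import numpy as np
import pandas as pd

# Plain assignment: both names refer to the same DataFrame object
original = pd.DataFrame({"age": [1.0, np.nan]})
alias = original
alias.loc[alias["age"].isna(), "age"] = 0.0
print(original["age"].isna().sum())   # 0: the change is visible through both names

# copy() (deep by default): changes to the copy leave the source untouched
original2 = pd.DataFrame({"age": [1.0, np.nan]})
independent = original2.copy()
independent.loc[independent["age"].isna(), "age"] = 0.0
print(original2["age"].isna().sum())  # 1: the source still has its missing value
```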
In [7]:
# Verification code
print(f"Missing values: {titanic['age'].isnull().sum()}")
print(f"age mean: {titanic['age'].mean():.4f}")
Missing values: 0
age mean: 29.3425
[Expected output]
Missing values: 0
age mean: 29.3425
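A zero missing count and a matching mean are good signs, but they do not prove each row got the right value. A stricter check, sketched here as a hypothetical helper (check_fill is not part of the exercise), compares a before/after pair: ages that existed must be unchanged, and each previously missing age must equal its group's statistic. The toy data below is also hypothetical.

```python
import numpy as np
import pandas as pd

def check_fill(before: pd.DataFrame, after: pd.DataFrame, stats: dict) -> None:
    """Hypothetical helper: assert the group-wise fill touched only missing ages."""
    was_missing = before["age"].isna()
    # Ages that were present must be unchanged
    assert after.loc[~was_missing, "age"].equals(before.loc[~was_missing, "age"])
    # Ages that were missing must now hold their group's statistic
    expected = before.loc[was_missing, "who"].map(stats)
    assert np.allclose(after.loc[was_missing, "age"].to_numpy(), expected.to_numpy())
    # No missing ages remain
    assert after["age"].notna().all()

# Hypothetical toy data
before = pd.DataFrame({
    "who": ["man", "man", "man", "child", "child", "child"],
    "age": [20.0, 40.0, np.nan, 2.0, 4.0, np.nan],
})
stats = {
    "man": before.loc[before["who"] == "man", "age"].median(),    # 30.0
    "child": before.loc[before["who"] == "child", "age"].mean(),  # 3.0
}
after = before.copy()
after["age"] = after["age"].fillna(after["who"].map(stats))

check_fill(before, after, stats)
print("fill verified")
```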
Submission
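For submission, fill the missing values in the age column of a freshly loaded Titanic dataset according to the conditions below, and store the result in result_df.
- For rows where who is 'man' and age is missing, fill in the median age of men.
- For rows where who is 'woman' and age is missing, fill in the 25% quantile of women's ages.
- For rows where who is 'child' and age is missing, fill in the mean age of children.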
In [10]:
titanic = sns.load_dataset('titanic')
df = titanic
# Compute the statistic for each group
man_median_age = df[df['who'] == 'man']['age'].median()
woman_quantile_age = df[df['who'] == 'woman']['age'].quantile(0.25)
child_mean_age = df[df['who'] == 'child']['age'].mean()
# Fill the missing values
df.loc[(df['who'] == 'man') & (df['age'].isna()), 'age'] = man_median_age
df.loc[(df['who'] == 'woman') & (df['age'].isna()), 'age'] = woman_quantile_age
df.loc[(df['who'] == 'child') & (df['age'].isna()), 'age'] = child_mean_age
result_df = df
df.head()
Out[10]:
 | survived | pclass | sex | age | sibsp | parch | fare | embarked | class | who | adult_male | deck | embark_town | alive | alone |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 3 | male | 22.000000 | 1 | 0 | 7.250000 | S | Third | man | True | NaN | Southampton | no | False |
1 | 1 | 1 | female | 38.000000 | 1 | 0 | 71.283300 | C | First | woman | False | C | Cherbourg | yes | False |
2 | 1 | 3 | female | 26.000000 | 0 | 0 | 7.925000 | S | Third | woman | False | NaN | Southampton | yes | True |
3 | 1 | 1 | female | 35.000000 | 1 | 0 | 53.100000 | S | First | woman | False | C | Southampton | yes | False |
4 | 0 | 3 | male | 35.000000 | 0 | 0 | 8.050000 | S | Third | man | True | NaN | Southampton | no | True |