Unit-by-Unit Advanced Practice Problems (Difficulty: Medium)
In [1]:
!pip install seaborn==0.13.0
Defaulting to user installation because normal site-packages is not writeable
Collecting seaborn==0.13.0
Downloading seaborn-0.13.0-py3-none-any.whl (294 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 294.6/294.6 kB 2.1 MB/s eta 0:00:00
Requirement already satisfied: pandas>=1.2 in ./.local/lib/python3.9/site-packages (from seaborn==0.13.0) (1.4.2)
Requirement already satisfied: matplotlib!=3.6.1,>=3.3 in ./.local/lib/python3.9/site-packages (from seaborn==0.13.0) (3.6.0)
Requirement already satisfied: numpy!=1.24.0,>=1.20 in ./.local/lib/python3.9/site-packages (from seaborn==0.13.0) (1.23.3)
Requirement already satisfied: cycler>=0.10 in ./.local/lib/python3.9/site-packages (from matplotlib!=3.6.1,>=3.3->seaborn==0.13.0) (0.11.0)
Requirement already satisfied: pyparsing>=2.2.1 in ./.local/lib/python3.9/site-packages (from matplotlib!=3.6.1,>=3.3->seaborn==0.13.0) (3.0.9)
Requirement already satisfied: contourpy>=1.0.1 in ./.local/lib/python3.9/site-packages (from matplotlib!=3.6.1,>=3.3->seaborn==0.13.0) (1.0.6)
Requirement already satisfied: kiwisolver>=1.0.1 in ./.local/lib/python3.9/site-packages (from matplotlib!=3.6.1,>=3.3->seaborn==0.13.0) (1.4.4)
Requirement already satisfied: packaging>=20.0 in ./.local/lib/python3.9/site-packages (from matplotlib!=3.6.1,>=3.3->seaborn==0.13.0) (21.3)
Requirement already satisfied: pillow>=6.2.0 in ./.local/lib/python3.9/site-packages (from matplotlib!=3.6.1,>=3.3->seaborn==0.13.0) (9.3.0)
Requirement already satisfied: fonttools>=4.22.0 in ./.local/lib/python3.9/site-packages (from matplotlib!=3.6.1,>=3.3->seaborn==0.13.0) (4.38.0)
Requirement already satisfied: python-dateutil>=2.7 in ./.local/lib/python3.9/site-packages (from matplotlib!=3.6.1,>=3.3->seaborn==0.13.0) (2.8.2)
Requirement already satisfied: pytz>=2020.1 in ./.local/lib/python3.9/site-packages (from pandas>=1.2->seaborn==0.13.0) (2022.5)
Requirement already satisfied: six>=1.5 in ./.local/lib/python3.9/site-packages (from python-dateutil>=2.7->matplotlib!=3.6.1,>=3.3->seaborn==0.13.0) (1.16.0)
Installing collected packages: seaborn
Attempting uninstall: seaborn
Found existing installation: seaborn 0.12.0
Uninstalling seaborn-0.12.0:
Successfully uninstalled seaborn-0.12.0
Successfully installed seaborn-0.13.0
[notice] A new release of pip available: 22.2.2 -> 24.1.1
[notice] To update, run: pip install --upgrade pip
In [2]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import glob
# set floating point formatting
pd.options.display.float_format = '{:,.1f}'.format
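The `float_format` setting above explains why every float in the outputs of this post appears with a thousands separator and one decimal place. As a minimal sketch of what the format string does, the snippet below applies it to two made-up values (hypothetical, not read from the dataset) and uses `pd.option_context` to limit the setting to a single block instead of changing it globally.

```python
import pandas as pd

# Same format string as in the setup cell: thousands separator, one decimal
fmt = '{:,.1f}'.format

print(fmt(16884.924))   # 16,884.9
print(fmt(1725.5523))   # 1,725.6

# Hypothetical values used only for illustration
s = pd.Series([16884.924, 1725.5523])

# option_context applies the display option only inside the with-block
with pd.option_context('display.float_format', fmt):
    print(s)
```

The option changes only how values are displayed; the underlying floats keep their full precision.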
Read the file `/mnt/elice/dataset/insurance.csv` and display it as a DataFrame. (Print the first 5 rows.)
In [37]:
# Enter your code here
df = pd.read_csv('/mnt/elice/dataset/insurance.csv')
df.head()
Out[37]:
| | age | sex | bmi | children | smoker | region | charges |
---|---|---|---|---|---|---|---|
0 | 19 | female | 27.9 | 0 | yes | southwest | 16,884.9 |
1 | 18 | male | 33.8 | 1 | no | southeast | 1,725.6 |
2 | 28 | male | 33.0 | 3 | no | southeast | 4,449.5 |
3 | 33 | male | 22.7 | 0 | no | northwest | 21,984.5 |
4 | 32 | male | 28.9 | 0 | no | northwest | 3,866.9 |
[Expected output]
| | age | sex | bmi | children | smoker | region | charges |
---|---|---|---|---|---|---|---|
0 | 19 | female | 27.9 | 0 | yes | southwest | 16,884.9 |
1 | 18 | male | 33.8 | 1 | no | southeast | 1,725.6 |
2 | 28 | male | 33.0 | 3 | no | southeast | 4,449.5 |
3 | 33 | male | 22.7 | 0 | no | northwest | 21,984.5 |
4 | 32 | male | 28.9 | 0 | no | northwest | 3,866.9 |
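Output a DataFrame that satisfies the following conditions.
- Keep only rows where the `sex` column is `male`.
- Keep only rows where `children` is 2 or more.
- Keep only rows where `region` is `northeast` or `northwest`.
- Sort by `bmi` and `age` in descending order, then select and output only rows 3 through 5 by position (zero-based) in the sorted result.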
In [27]:
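# Enter your code here
# Filter step by step: male, 2 or more children, northern regions
df1 = df[df['sex'] == 'male']
df1 = df1[df1['children'] >= 2]
df1 = df1[df1['region'].isin(['northeast', 'northwest'])]
# Sort by bmi, then age, both descending
result_df = df1.sort_values(by=['bmi', 'age'], ascending=[False, False])
# Positional slice: rows 3, 4, and 5 of the sorted frame
result_df.iloc[3:6]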
Out[27]:
| | age | sex | bmi | children | smoker | region | charges |
---|---|---|---|---|---|---|---|
569 | 48 | male | 40.6 | 2 | yes | northwest | 45,702.0 |
1201 | 46 | male | 40.4 | 2 | no | northwest | 8,733.2 |
1318 | 35 | male | 39.7 | 4 | no | northeast | 19,496.7 |
[Expected output]
| | age | sex | bmi | children | smoker | region | charges |
---|---|---|---|---|---|---|---|
569 | 48 | male | 40.6 | 2 | yes | northwest | 45,702.0 |
1201 | 46 | male | 40.4 | 2 | no | northwest | 8,733.2 |
1318 | 35 | male | 39.7 | 4 | no | northeast | 19,496.7 |
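The three filters in this problem can also be written as one combined boolean mask, and the index labels in the output (569, 1201, 1318) show that the final selection is positional rather than label-based. The sketch below illustrates both points on a small made-up frame; `toy` and its values are hypothetical and are not rows from insurance.csv.

```python
import pandas as pd

# Hypothetical toy data, not taken from insurance.csv
toy = pd.DataFrame({
    'age':      [30, 45, 52, 41, 38, 60, 25],
    'sex':      ['male', 'male', 'female', 'male', 'male', 'male', 'male'],
    'bmi':      [31.0, 28.5, 33.2, 35.1, 28.5, 40.2, 22.0],
    'children': [2, 3, 2, 4, 2, 2, 0],
    'region':   ['northeast', 'northwest', 'northeast', 'southwest',
                 'northwest', 'northeast', 'northwest'],
})

# Combine the conditions with &; each comparison needs its own parentheses
mask = (
    (toy['sex'] == 'male')
    & (toy['children'] >= 2)
    & (toy['region'].isin(['northeast', 'northwest']))
)

# Sort by bmi, then age (the tie-breaker), both descending
sorted_toy = toy[mask].sort_values(by=['bmi', 'age'], ascending=False)

# iloc slices by position in the sorted order, not by index label
picked = sorted_toy.iloc[1:3]
print(picked)
```

Sorting keeps the original index labels attached to each row, which is why a positional slice of a sorted frame shows labels that look unrelated to the slice bounds.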
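Output a DataFrame that satisfies the following conditions.
- Keep only rows where the `sex` column is `female`.
- Keep only smokers (the `smoker` column is `yes`).
- Keep only people in their 40s (ages 40 to 49).
- Sort by `charges` in descending order and output only the top 5 rows.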
In [31]:
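# Enter your code here
# Filter step by step: female, smoker, ages 40 to 49 inclusive
df2 = df[df['sex'] == 'female']
df2 = df2[df2['smoker'] == 'yes']
df2 = df2[(df2['age'] >= 40) & (df2['age'] <= 49)]
# Sort by charges descending; head() returns the first 5 rows by default
result_df = df2.sort_values(by='charges', ascending=False).head()
result_df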
Out[31]:
| | age | sex | bmi | children | smoker | region | charges |
---|---|---|---|---|---|---|---|
488 | 44 | female | 38.1 | 0 | yes | southeast | 48,885.1 |
674 | 44 | female | 43.9 | 2 | yes | southeast | 46,201.0 |
549 | 43 | female | 46.2 | 0 | yes | southeast | 45,863.2 |
1323 | 42 | female | 40.4 | 2 | yes | southeast | 43,896.4 |
629 | 44 | female | 39.0 | 0 | yes | northwest | 42,983.5 |
[Expected output]
| | age | sex | bmi | children | smoker | region | charges |
---|---|---|---|---|---|---|---|
488 | 44 | female | 38.1 | 0 | yes | southeast | 48,885.1 |
674 | 44 | female | 43.9 | 2 | yes | southeast | 46,201.0 |
549 | 43 | female | 46.2 | 0 | yes | southeast | 45,863.2 |
1323 | 42 | female | 40.4 | 2 | yes | southeast | 43,896.4 |
629 | 44 | female | 39.0 | 0 | yes | northwest | 42,983.5 |
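An age range such as "40 to 49" can also be expressed with `Series.between`, which is inclusive on both ends by default, and the sort-then-head step can be replaced by `DataFrame.nlargest`. The following is only a sketch of that alternative on made-up data (`toy` is hypothetical), not the approach used in the cell above.

```python
import pandas as pd

# Hypothetical toy data, not taken from insurance.csv
toy = pd.DataFrame({
    'age':     [39, 40, 44, 49, 50, 47],
    'sex':     ['female'] * 6,
    'smoker':  ['yes', 'yes', 'no', 'yes', 'yes', 'yes'],
    'charges': [100.0, 300.0, 900.0, 200.0, 800.0, 500.0],
})

# between(40, 49) keeps 40 and 49 themselves (both ends inclusive by default)
in_forties = toy['age'].between(40, 49)
subset = toy[(toy['sex'] == 'female') & (toy['smoker'] == 'yes') & in_forties]

# nlargest returns the rows with the highest charges, already sorted descending
top = subset.nlargest(2, 'charges')
print(top)
```

Here ages 39 and 50 fall outside the range and the non-smoker is dropped, so only three rows remain before the top two are taken.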
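Output a Series that satisfies the following conditions.
- Keep only rows where `bmi` is at least 30 and less than 40.
- Keep only rows where `age` is in the 30s.
- Check the distribution of the `region` column in the filtered data.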
In [45]:
# Enter your code here
df3 = df[(df['bmi'] >= 30) & (df['bmi'] < 40)]
df3 = df3[(df3['age'] >= 30) & (df3['age'] < 40)]
df3['region'].value_counts()
Out[45]:
southwest    32
southeast    32
northeast    20
northwest    20
Name: region, dtype: int64
[Expected output]
southwest    32
southeast    32
northeast    20
northwest    20
Name: region, dtype: int64
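`value_counts` returns raw counts per category; passing `normalize=True` returns proportions instead, which can make a distribution easier to compare across groups of different sizes. A minimal sketch on a made-up Series (the values are hypothetical, not the filtered insurance data):

```python
import pandas as pd

# Hypothetical region labels used only for illustration
regions = pd.Series(
    ['southwest', 'southeast', 'southwest', 'northeast'],
    name='region',
)

# Raw counts per category, sorted from most to least frequent
counts = regions.value_counts()

# Same categories expressed as fractions that sum to 1
shares = regions.value_counts(normalize=True)

print(counts)
print(shares)
```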
Submission
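For submission, reload `/mnt/elice/dataset/insurance.csv` and store the result that satisfies the conditions below in `result_df`.
- Keep only rows where the `sex` column is `female`.
- Keep only smokers (the `smoker` column is `yes`).
- Keep only people in their 40s (ages 40 to 49).
- Sort by `charges` in descending order and take only the top 5 rows.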
In [47]:
df = pd.read_csv('/mnt/elice/dataset/insurance.csv')
df4 = df[df['sex']=='female']
df4 = df4[df4['smoker']=='yes']
df4 = df4[(df4['age'] >= 40) & (df4['age'] < 50)]
result_df = df4.sort_values(by='charges', ascending=False).head()
result_df
Out[47]:
| | age | sex | bmi | children | smoker | region | charges |
---|---|---|---|---|---|---|---|
488 | 44 | female | 38.1 | 0 | yes | southeast | 48,885.1 |
674 | 44 | female | 43.9 | 2 | yes | southeast | 46,201.0 |
549 | 43 | female | 46.2 | 0 | yes | southeast | 45,863.2 |
1323 | 42 | female | 40.4 | 2 | yes | southeast | 43,896.4 |
629 | 44 | female | 39.0 | 0 | yes | northwest | 42,983.5 |
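Before submitting, it can help to sanity-check that a result frame actually meets the stated conditions. The helper below, `check_result`, is a hypothetical function written for this post, not part of pandas or the grading system. It only verifies the properties of the rows it is given and does not confirm that they are the top rows of the full dataset.

```python
import pandas as pd

def check_result(result: pd.DataFrame, n: int = 5) -> bool:
    """Hypothetical helper: True if `result` meets the submission conditions."""
    return bool(
        len(result) <= n
        and (result['sex'] == 'female').all()
        and (result['smoker'] == 'yes').all()
        and result['age'].between(40, 49).all()
        # charges must never increase from one row to the next
        and result['charges'].is_monotonic_decreasing
    )

# Made-up frames for illustration, not rows from insurance.csv
good = pd.DataFrame({
    'age': [44, 42],
    'sex': ['female', 'female'],
    'smoker': ['yes', 'yes'],
    'charges': [500.0, 400.0],
})
bad = good.assign(age=[44, 50])  # second row is outside the 40s

print(check_result(good))  # True
print(check_result(bad))   # False
```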