Advanced Practice Problems by Unit
In [1]:
!pip install seaborn==0.13.0
Defaulting to user installation because normal site-packages is not writeable
Collecting seaborn==0.13.0
  Downloading seaborn-0.13.0-py3-none-any.whl (294 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 294.6/294.6 kB 2.1 MB/s eta 0:00:00
Requirement already satisfied: numpy!=1.24.0,>=1.20 in ./.local/lib/python3.9/site-packages (from seaborn==0.13.0) (1.23.3)
Requirement already satisfied: matplotlib!=3.6.1,>=3.3 in ./.local/lib/python3.9/site-packages (from seaborn==0.13.0) (3.6.0)
Requirement already satisfied: pandas>=1.2 in ./.local/lib/python3.9/site-packages (from seaborn==0.13.0) (1.4.2)
Requirement already satisfied: kiwisolver>=1.0.1 in ./.local/lib/python3.9/site-packages (from matplotlib!=3.6.1,>=3.3->seaborn==0.13.0) (1.4.4)
Requirement already satisfied: python-dateutil>=2.7 in ./.local/lib/python3.9/site-packages (from matplotlib!=3.6.1,>=3.3->seaborn==0.13.0) (2.8.2)
Requirement already satisfied: pillow>=6.2.0 in ./.local/lib/python3.9/site-packages (from matplotlib!=3.6.1,>=3.3->seaborn==0.13.0) (9.3.0)
Requirement already satisfied: fonttools>=4.22.0 in ./.local/lib/python3.9/site-packages (from matplotlib!=3.6.1,>=3.3->seaborn==0.13.0) (4.38.0)
Requirement already satisfied: cycler>=0.10 in ./.local/lib/python3.9/site-packages (from matplotlib!=3.6.1,>=3.3->seaborn==0.13.0) (0.11.0)
Requirement already satisfied: packaging>=20.0 in ./.local/lib/python3.9/site-packages (from matplotlib!=3.6.1,>=3.3->seaborn==0.13.0) (21.3)
Requirement already satisfied: pyparsing>=2.2.1 in ./.local/lib/python3.9/site-packages (from matplotlib!=3.6.1,>=3.3->seaborn==0.13.0) (3.0.9)
Requirement already satisfied: contourpy>=1.0.1 in ./.local/lib/python3.9/site-packages (from matplotlib!=3.6.1,>=3.3->seaborn==0.13.0) (1.0.6)
Requirement already satisfied: pytz>=2020.1 in ./.local/lib/python3.9/site-packages (from pandas>=1.2->seaborn==0.13.0) (2022.5)
Requirement already satisfied: six>=1.5 in ./.local/lib/python3.9/site-packages (from python-dateutil>=2.7->matplotlib!=3.6.1,>=3.3->seaborn==0.13.0) (1.16.0)
Installing collected packages: seaborn
  Attempting uninstall: seaborn
    Found existing installation: seaborn 0.12.0
    Uninstalling seaborn-0.12.0:
      Successfully uninstalled seaborn-0.12.0
Successfully installed seaborn-0.13.0
[notice] A new release of pip available: 22.2.2 -> 24.1.1
[notice] To update, run: pip install --upgrade pip
In [2]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import glob
# set floating point formatting
pd.options.display.float_format = '{:,.1f}'.format
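The `float_format` option above affects only how floats are displayed, which is why the tables in this post show values such as 3.2 while the exercise text quotes 3.215971. A minimal sketch of that behavior, using a made-up two-element Series (the variable names `s` and `rendered` are illustrative assumptions):

```python
import pandas as pd

# Display-only formatting: one decimal place, with thousands separators
pd.options.display.float_format = '{:,.1f}'.format

s = pd.Series([3.215971, 1234.5678])

# to_string() renders the Series the way the notebook would print it
rendered = s.to_string()
print(rendered)

# The stored values keep full precision; only the rendering is rounded
print(s.iloc[0] == 3.215971)
```

Comparisons and filters therefore operate on the full-precision values, not on the rounded numbers shown in the output cells.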
In [6]:
# Load the dataset
taxis = sns.load_dataset('taxis')
taxis.head()
Out[6]:
| | pickup | dropoff | passengers | distance | fare | tip | tolls | total | color | payment | pickup_zone | dropoff_zone | pickup_borough | dropoff_borough |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2019-03-23 20:21:09 | 2019-03-23 20:27:24 | 1 | 1.6 | 7.0 | 2.1 | 0.0 | 12.9 | yellow | credit card | Lenox Hill West | UN/Turtle Bay South | Manhattan | Manhattan |
| 1 | 2019-03-04 16:11:55 | 2019-03-04 16:19:00 | 1 | 0.8 | 5.0 | 0.0 | 0.0 | 9.3 | yellow | cash | Upper West Side South | Upper West Side South | Manhattan | Manhattan |
| 2 | 2019-03-27 17:53:01 | 2019-03-27 18:00:25 | 1 | 1.4 | 7.5 | 2.4 | 0.0 | 14.2 | yellow | credit card | Alphabet City | West Village | Manhattan | Manhattan |
| 3 | 2019-03-10 01:23:59 | 2019-03-10 01:49:51 | 1 | 7.7 | 27.0 | 6.2 | 0.0 | 37.0 | yellow | credit card | Hudson Sq | Yorkville West | Manhattan | Manhattan |
| 4 | 2019-03-30 13:27:42 | 2019-03-30 13:37:14 | 3 | 2.2 | 9.0 | 1.1 | 0.0 | 13.4 | yellow | credit card | Midtown East | Yorkville West | Manhattan | Manhattan |
Using the `taxis` dataset, compute the mean `distance` for each day of the week.
(Output the result as a DataFrame.)
- [Note] 0: Monday through 6: Sunday
In [8]:
# Enter your code here
taxis['pickup'] = pd.to_datetime(taxis['pickup'])
taxis['dayofweek'] = taxis['pickup'].dt.dayofweek
taxis.groupby('dayofweek')['distance'].mean().reset_index()
Out[8]:
| | dayofweek | distance |
|---|---|---|
| 0 | 0 | 3.2 |
| 1 | 1 | 3.0 |
| 2 | 2 | 3.1 |
| 3 | 3 | 3.0 |
| 4 | 4 | 2.9 |
| 5 | 5 | 2.9 |
| 6 | 6 | 3.0 |
[Expected output]
| | dayofweek | distance |
|---|---|---|
| 0 | 0 | 3.2 |
| 1 | 1 | 3.0 |
| 2 | 2 | 3.1 |
| 3 | 3 | 3.0 |
| 4 | 4 | 2.9 |
| 5 | 5 | 2.9 |
| 6 | 6 | 3.0 |
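Because `dt.dayofweek` returns integers, the result can be hard to read at a glance. The following is a minimal sketch, not part of the exercise answer, that carries the weekday name alongside the number using `dt.day_name()`. The three-row `trips` frame is hypothetical sample data standing in for `taxis`.

```python
import pandas as pd

# Hypothetical sample rows standing in for the taxis data
trips = pd.DataFrame({
    'pickup': pd.to_datetime([
        '2019-03-04 16:11:55',  # a Monday
        '2019-03-10 01:23:59',  # a Sunday
        '2019-03-11 09:00:00',  # a Monday
    ]),
    'distance': [1.0, 7.0, 3.0],
})

trips['dayofweek'] = trips['pickup'].dt.dayofweek   # 0 = Monday ... 6 = Sunday
trips['day_name'] = trips['pickup'].dt.day_name()   # English weekday names

# Group by both keys so the label travels with the numeric code
by_day = (trips.groupby(['dayofweek', 'day_name'])['distance']
               .mean()
               .reset_index())
print(by_day)
```

Grouping on the numeric code first keeps the rows in Monday-to-Sunday order, which sorting by name alone would not.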
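Using the statistics computed above, derive the following.
- You have computed the mean `distance` for each day of the week. Extract only the rows whose `distance` is strictly less than the mean for that row's day of the week.
- (Example) If the mean `distance` for 0 (Monday) is 3.215971, keep only the Monday rows whose `distance` is less than 3.215971. (Apply the same rule to every day, Monday through Sunday.)
- After filtering, compute the median and standard deviation (std) of `distance`, `fare`, and `tip`.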
In [10]:
# Enter your code here
# Mean distance for each day of the week
average_distance_by_day = taxis.groupby('dayofweek')['distance'].mean()
# Keep only rows whose distance is below the mean for that row's day of the week
filtered_taxis = taxis[taxis.apply(lambda row: row['distance'] < average_distance_by_day[row['dayofweek']], axis=1)]
# Median and standard deviation (std) of distance, fare, and tip
statistics = filtered_taxis[['distance', 'fare', 'tip']].agg(['median', 'std'])
statistics
Out[10]:
| | distance | fare | tip |
|---|---|---|---|
| median | 1.3 | 7.5 | 1.6 |
| std | 0.7 | 4.8 | 1.5 |
[Expected output]
| | distance | fare | tip |
|---|---|---|---|
| median | 1.3 | 7.5 | 1.6 |
| std | 0.7 | 4.8 | 1.5 |
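The row-wise `apply` used for the filter calls a Python function once per row, which can be slow on large frames. One possible vectorized alternative is `groupby(...).transform('mean')`, which returns each row's group mean aligned to the original index. The sketch below uses a small hypothetical `trips` frame rather than the real `taxis` data, and the names `day_mean` and `below_mean` are illustrative.

```python
import pandas as pd

# Hypothetical sample: three Monday trips (0) and two Sunday trips (6)
trips = pd.DataFrame({
    'dayofweek': [0, 0, 0, 6, 6],
    'distance':  [1.0, 2.0, 6.0, 4.0, 8.0],
})

# transform('mean') broadcasts each day's mean back onto its rows,
# so day_mean has the same length and index as trips
day_mean = trips.groupby('dayofweek')['distance'].transform('mean')

# Boolean mask: strictly below the mean of the row's own day
below_mean = trips[trips['distance'] < day_mean]
print(below_mean)
```

Here the Monday mean is 3.0 and the Sunday mean is 6.0, so the rows with distances 1.0, 2.0, and 4.0 are kept.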
Complete the following pivot table.
- Compute the median and standard deviation (std) of `fare` by day of week (`dayofweek`) and `payment`.
In [11]:
# Enter your code here
# Pivot table of the median and standard deviation (std) of fare by dayofweek and payment
pivot_table = taxis.pivot_table(
    values='fare',
    index='dayofweek',
    columns='payment',
    aggfunc=['median', 'std']
)
pivot_table
Out[11]:
| dayofweek | median (cash) | median (credit card) | std (cash) | std (credit card) |
|---|---|---|---|---|
| 0 | 8.5 | 9.5 | 12.8 | 12.4 |
| 1 | 8.0 | 10.0 | 11.9 | 12.9 |
| 2 | 8.5 | 9.5 | 13.2 | 11.2 |
| 3 | 8.5 | 10.0 | 9.5 | 11.1 |
| 4 | 8.5 | 9.5 | 10.3 | 12.1 |
| 5 | 9.0 | 9.2 | 8.6 | 10.9 |
| 6 | 8.0 | 9.5 | 12.9 | 10.6 |
[Expected output]
| dayofweek | median (cash) | median (credit card) | std (cash) | std (credit card) |
|---|---|---|---|---|
| 0 | 8.5 | 9.5 | 12.8 | 12.4 |
| 1 | 8.0 | 10.0 | 11.9 | 12.9 |
| 2 | 8.5 | 9.5 | 13.2 | 11.2 |
| 3 | 8.5 | 10.0 | 9.5 | 11.1 |
| 4 | 8.5 | 9.5 | 10.3 | 12.1 |
| 5 | 9.0 | 9.2 | 8.6 | 10.9 |
| 6 | 8.0 | 9.5 | 12.9 | 10.6 |
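A pivot table with a list of aggregation functions can also be expressed as a `groupby` followed by `agg` and `unstack`, which may help when checking what each cell means. The sketch below compares the two forms on a small hypothetical `trips` frame (not the real `taxis` data). Note that pandas computes `std` as the sample standard deviation (`ddof=1`) by default.

```python
import pandas as pd

# Hypothetical sample: two days, two payment types, two fares each
trips = pd.DataFrame({
    'dayofweek': [0, 0, 0, 0, 1, 1, 1, 1],
    'payment':   ['cash', 'cash', 'credit card', 'credit card',
                  'cash', 'cash', 'credit card', 'credit card'],
    'fare':      [5.0, 7.0, 10.0, 14.0, 6.0, 6.0, 9.0, 12.0],
})

# Form 1: pivot_table with two aggregation functions
pivot = trips.pivot_table(values='fare', index='dayofweek',
                          columns='payment', aggfunc=['median', 'std'])

# Form 2: group by both keys, aggregate, then move payment into the columns
grouped = (trips.groupby(['dayofweek', 'payment'])['fare']
                .agg(['median', 'std'])
                .unstack('payment'))

print(pivot)
print(grouped)
```

Both results have a two-level column index, with the statistic on the outer level and `payment` on the inner level, so a single cell can be read with a tuple such as `pivot.loc[0, ('median', 'cash')]`.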
Submission
For submission, reload the taxi dataset, compute the mean `distance` for each day of the week, and store the result as a DataFrame in `result_df`.
In [12]:
taxis = sns.load_dataset('taxis')
# Convert the pickup column to datetime
taxis['pickup'] = pd.to_datetime(taxis['pickup'])
# Derive the day of week from pickup (0: Monday through 6: Sunday)
taxis['dayofweek'] = taxis['pickup'].dt.dayofweek
# Mean distance per day of week; reset_index() already returns a DataFrame
average_distance_by_day = taxis.groupby('dayofweek')['distance'].mean().reset_index()
# Store the result for submission
result_df = average_distance_by_day
result_df.head()
Out[12]:
| | dayofweek | distance |
|---|---|---|
| 0 | 0 | 3.2 |
| 1 | 1 | 3.0 |
| 2 | 2 | 3.1 |
| 3 | 3 | 3.0 |
| 4 | 4 | 2.9 |