Module imports

In [1]:

!pip install seaborn==0.13.0

Defaulting to user installation because normal site-packages is not writeable
Collecting seaborn==0.13.0
  Downloading seaborn-0.13.0-py3-none-any.whl (294 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 294.6/294.6 kB 6.1 MB/s eta 0:00:00
Requirement already satisfied: matplotlib!=3.6.1,>=3.3 in ./.local/lib/python3.9/site-packages (from seaborn==0.13.0) (3.6.0)
Requirement already satisfied: pandas>=1.2 in ./.local/lib/python3.9/site-packages (from seaborn==0.13.0) (1.4.2)
Requirement already satisfied: numpy!=1.24.0,>=1.20 in ./.local/lib/python3.9/site-packages (from seaborn==0.13.0) (1.23.3)
Requirement already satisfied: python-dateutil>=2.7 in ./.local/lib/python3.9/site-packages (from matplotlib!=3.6.1,>=3.3->seaborn==0.13.0) (2.8.2)
Requirement already satisfied: fonttools>=4.22.0 in ./.local/lib/python3.9/site-packages (from matplotlib!=3.6.1,>=3.3->seaborn==0.13.0) (4.38.0)
Requirement already satisfied: pillow>=6.2.0 in ./.local/lib/python3.9/site-packages (from matplotlib!=3.6.1,>=3.3->seaborn==0.13.0) (9.3.0)
Requirement already satisfied: kiwisolver>=1.0.1 in ./.local/lib/python3.9/site-packages (from matplotlib!=3.6.1,>=3.3->seaborn==0.13.0) (1.4.4)
Requirement already satisfied: packaging>=20.0 in ./.local/lib/python3.9/site-packages (from matplotlib!=3.6.1,>=3.3->seaborn==0.13.0) (21.3)
Requirement already satisfied: cycler>=0.10 in ./.local/lib/python3.9/site-packages (from matplotlib!=3.6.1,>=3.3->seaborn==0.13.0) (0.11.0)
Requirement already satisfied: contourpy>=1.0.1 in ./.local/lib/python3.9/site-packages (from matplotlib!=3.6.1,>=3.3->seaborn==0.13.0) (1.0.6)
Requirement already satisfied: pyparsing>=2.2.1 in ./.local/lib/python3.9/site-packages (from matplotlib!=3.6.1,>=3.3->seaborn==0.13.0) (3.0.9)
Requirement already satisfied: pytz>=2020.1 in ./.local/lib/python3.9/site-packages (from pandas>=1.2->seaborn==0.13.0) (2022.5)
Requirement already satisfied: six>=1.5 in ./.local/lib/python3.9/site-packages (from python-dateutil>=2.7->matplotlib!=3.6.1,>=3.3->seaborn==0.13.0) (1.16.0)
Installing collected packages: seaborn
  Attempting uninstall: seaborn
    Found existing installation: seaborn 0.12.0
    Uninstalling seaborn-0.12.0:
      Successfully uninstalled seaborn-0.12.0
Successfully installed seaborn-0.13.0


In [2]:

import pandas as pd

data = {'survived': [1, 0, 1], 'age': [22, 28, 38]}
df = pd.DataFrame(data)

type_1 = type(df['survived'])
print(type_1)

type_2 = type(df[['survived']])
print(type_2)

<class 'pandas.core.series.Series'>
<class 'pandas.core.frame.DataFrame'>
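This distinction comes back later, when columns are selected after groupby(). Comparing shapes makes it visible. The sketch below is a copy of the toy data above; df_demo is a name introduced only for this example.

```python
import pandas as pd

df_demo = pd.DataFrame({'survived': [1, 0, 1], 'age': [22, 28, 38]})

# Single brackets: one column label -> 1-D Series
col_series = df_demo['survived']
# Double brackets: a list of labels -> 2-D DataFrame (even with one column)
col_frame = df_demo[['survived']]

print(col_series.shape)  # (3,)
print(col_frame.shape)   # (3, 1)
```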

In [3]:

from IPython.display import Image
import numpy as np
import pandas as pd
import seaborn as sns
import warnings

# Ignore warnings
warnings.filterwarnings('ignore')

Loading the dataset

In [4]:

df = sns.load_dataset('titanic')
df.head()

Out[4]:

	survived	pclass	sex	age	sibsp	parch	fare	embarked	class	who	adult_male	deck	embark_town	alive	alone
0	0	3	male	22.0	1	0	7.2500	S	Third	man	True	NaN	Southampton	no	False
1	1	1	female	38.0	1	0	71.2833	C	First	woman	False	C	Cherbourg	yes	False
2	1	3	female	26.0	0	0	7.9250	S	Third	woman	False	NaN	Southampton	yes	True
3	1	1	female	35.0	1	0	53.1000	S	First	woman	False	C	Southampton	yes	False
4	0	3	male	35.0	0	0	8.0500	S	Third	man	True	NaN	Southampton	no	True

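Column descriptions

survived: survival (1: survived, 0: died)
pclass: passenger class (1st, 2nd, 3rd)
sex: sex
age: age
sibsp: number of siblings + spouses aboard
parch: number of parents + children aboard
fare: ticket fare
embarked: port of embarkation (S, C, Q)
class: same as pclass, as text (First, Second, Third)
who: man, woman, or child
adult_male: whether the passenger is an adult male
deck: deck letter (A to G; NaN if unknown)
embark_town: name of the port of embarkation
alive: survival status (yes, no)
alone: whether the passenger travelled alone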

apply() - applying a function

apply() is used very often in data preprocessing.

Use it when you want to apply more complex logic to a column or to an entire DataFrame.

In [5]:

df.head()

Out[5]:

	survived	pclass	sex	age	sibsp	parch	fare	embarked	class	who	adult_male	deck	embark_town	alive	alone
0	0	3	male	22.0	1	0	7.2500	S	Third	man	True	NaN	Southampton	no	False
1	1	1	female	38.0	1	0	71.2833	C	First	woman	False	C	Cherbourg	yes	False
2	1	3	female	26.0	0	0	7.9250	S	Third	woman	False	NaN	Southampton	yes	True
3	1	1	female	35.0	1	0	53.1000	S	First	woman	False	C	Southampton	yes	False
4	0	3	male	35.0	0	0	8.0500	S	Third	man	True	NaN	Southampton	no	True

You can use apply() to convert the values of the who column: man becomes 남자 (male), woman becomes 여자 (female), and child becomes 아이 (child).
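For a plain value-to-value lookup like this one, Series.map() with a dictionary is a common alternative. The sketch below uses a small hypothetical Series. Note one difference: map() returns NaN for any value missing from the dictionary, while the transform_who function defined below sends every unmatched value to '아이'.

```python
import pandas as pd

who = pd.Series(['man', 'woman', 'child', 'man'])

# Each value is looked up in the dict and replaced by its label
labels = {'man': '남자', 'woman': '여자', 'child': '아이'}
mapped = who.map(labels)

print(mapped.tolist())  # ['남자', '여자', '아이', '남자']
```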

In [6]:

df['who'].value_counts()

Out[6]:

man      537
woman    271
child     83
Name: who, dtype: int64

Defining a function

In [7]:

def transform_who(x):
    if x == 'man':
        return '남자'
    elif x == 'woman':
        return '여자'
    else:
        return '아이'

In [8]:

df['who'].apply(transform_who)

Out[8]:

0      남자
1      여자
2      여자
3      여자
4      남자
       ..
886    남자
887    여자
888    여자
889    남자
890    남자
Name: who, Length: 891, dtype: object

The distribution of the converted values is as follows.

In [9]:

df['who'].apply(transform_who).value_counts()

Out[9]:

남자    537
여자    271
아이     83
Name: who, dtype: int64

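apply() can also be called on a whole DataFrame. With axis=1, the function receives one row at a time (as a Series), so it can combine several columns. The function in the next cell divides fare by age for each passenger; rows with a missing age produce NaN.

In [10]:

def fare_per_age(row):
    # row holds one passenger's values; divide that passenger's fare by their age
    return row['fare'] / row['age']

In [11]:

df.apply(fare_per_age, axis=1)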

Out[11]:

0      0.329545
1      1.875876
2      0.304808
3      1.517143
4      0.230000
         ...   
886    0.481481
887    1.578947
888         NaN
889    1.153846
890    0.242188
Length: 891, dtype: float64
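Row-wise apply(axis=1) calls the Python function once per row, which gets slow on large frames. When the logic is plain arithmetic, dividing the columns directly gives the same values. The sketch below uses hypothetical toy values, not the full dataset. The missing age becomes NaN, just like row 888 above.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({'fare': [7.25, 71.2833, 23.45], 'age': [22.0, 38.0, np.nan]})

# Row-wise: the lambda receives one row (a Series) at a time
row_wise = toy.apply(lambda row: row['fare'] / row['age'], axis=1)

# Vectorized: divide the two columns element by element
vectorized = toy['fare'] / toy['age']

# Same values, including NaN where age is missing (equals() treats NaN == NaN)
print(row_wise.equals(vectorized))  # True
```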

apply() - lambda functions

For simple logic you don't need to define a separate function; a lambda function does the job more concisely.
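A lambda is simply an anonymous one-expression function, so a named function and a lambda are interchangeable inside apply(). The sketch below runs on a small hypothetical Series; label_survival is an illustrative name.

```python
import pandas as pd

survived = pd.Series([0, 1, 1, 0])

# Hypothetical named function
def label_survival(x):
    return '생존' if x == 1 else '사망'

with_def = survived.apply(label_survival)
# The same logic written inline as a lambda
with_lambda = survived.apply(lambda x: '생존' if x == 1 else '사망')

print(with_lambda.tolist())  # ['사망', '생존', '생존', '사망']
```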

In [12]:

df['survived'].value_counts()

Out[12]:

0    549
1    342
Name: survived, dtype: int64

Let's convert 0 to 사망 (died) and 1 to 생존 (survived).

In [13]:

df.head()

Out[13]:

	survived	pclass	sex	age	sibsp	parch	fare	embarked	class	who	adult_male	deck	embark_town	alive	alone
0	0	3	male	22.0	1	0	7.2500	S	Third	man	True	NaN	Southampton	no	False
1	1	1	female	38.0	1	0	71.2833	C	First	woman	False	C	Cherbourg	yes	False
2	1	3	female	26.0	0	0	7.9250	S	Third	woman	False	NaN	Southampton	yes	True
3	1	1	female	35.0	1	0	53.1000	S	First	woman	False	C	Southampton	yes	False
4	0	3	male	35.0	0	0	8.0500	S	Third	man	True	NaN	Southampton	no	True

In [14]:

df['survived'].apply(lambda x: '생존' if x == 1 else '사망')

Out[14]:

0      사망
1      생존
2      생존
3      생존
4      사망
       ..
886    사망
887    생존
888    사망
889    생존
890    사망
Name: survived, Length: 891, dtype: object

In [15]:

df['survived'].apply(lambda x: '생존' if x == 1 else '사망').value_counts()

Out[15]:

사망    549
생존    342
Name: survived, dtype: int64
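Because this is a two-way condition, numpy.where() can produce the same labels in one vectorized call, without calling a function for each element. A minimal sketch on a hypothetical Series:

```python
import numpy as np
import pandas as pd

survived = pd.Series([0, 1, 1, 0])

# np.where evaluates the condition for all elements at once
labels = pd.Series(np.where(survived == 1, '생존', '사망'), index=survived.index)

print(labels.tolist())  # ['사망', '생존', '생존', '사망']
```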

Exercise

In [16]:

sample = df.copy()
sample.head()

Out[16]:

	survived	pclass	sex	age	sibsp	parch	fare	embarked	class	who	adult_male	deck	embark_town	alive	alone
0	0	3	male	22.0	1	0	7.2500	S	Third	man	True	NaN	Southampton	no	False
1	1	1	female	38.0	1	0	71.2833	C	First	woman	False	C	Cherbourg	yes	False
2	1	3	female	26.0	0	0	7.9250	S	Third	woman	False	NaN	Southampton	yes	True
3	1	1	female	35.0	1	0	53.1000	S	First	woman	False	C	Southampton	yes	False
4	0	3	male	35.0	0	0	8.0500	S	Third	man	True	NaN	Southampton	no	True

In [17]:

sample['class'].value_counts()

Out[17]:

Third     491
First     216
Second    184
Name: class, dtype: int64

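Use apply() on the sample DataFrame to change the values of the class column (Third → 삼등석, Second → 이등석, First → 일등석). Then print the distribution and check that it matches the distribution before the change.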

In [24]:

# Enter your code here
def map_class(value):
    mapping = {
        'Third': '삼등석',
        'Second': '이등석',
        'First': '일등석'
    }
    return mapping.get(value, value)

# Replace the values of the 'class' column using apply with map_class
sample['class'] = sample['class'].apply(map_class)
sample['class'].value_counts()

Out[24]:

삼등석    491
일등석    216
이등석    184
Name: class, dtype: int64

[Expected output]

삼등석    491
일등석    216
이등석    184
Name: class, dtype: int64
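The exercise also asks you to confirm that the distribution did not change. One way to check this, sketched here on a small hypothetical frame: rename the original counts through the same mapping, then compare them with the new counts.

```python
import pandas as pd

sample_demo = pd.DataFrame({'class': ['Third', 'First', 'Third', 'Second', 'Third']})
mapping = {'Third': '삼등석', 'Second': '이등석', 'First': '일등석'}

before = sample_demo['class'].value_counts()
after = sample_demo['class'].apply(lambda v: mapping.get(v, v)).value_counts()

# Translate the old labels with the same mapping, then compare label -> count pairs
same_distribution = before.rename(index=mapping).to_dict() == after.to_dict()
print(same_distribution)  # True
```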

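groupby() - grouping

groupby() groups data by one or more keys so that statistics can be computed for each group. It works much like a pivot table in Excel.

Reference link

판다스(Pandas) .groupby()로 할 수 있는 거의 모든 것! ("Almost everything you can do with Pandas .groupby()!")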
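A tiny hypothetical frame shows the split-apply-combine idea: the rows are split by the key, each group is aggregated, and the results are combined into one value per key.

```python
import pandas as pd

toy = pd.DataFrame({
    'sex': ['male', 'female', 'female', 'male'],
    'survived': [0, 1, 1, 1],
})

# Split by sex, average survived within each group, combine into one Series
rate = toy.groupby('sex')['survived'].mean()
print(rate)  # female -> 1.0, male -> 0.5
```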

In [25]:

df.head()

Out[25]:

	survived	pclass	sex	age	sibsp	parch	fare	embarked	class	who	adult_male	deck	embark_town	alive	alone
0	0	3	male	22.0	1	0	7.2500	S	Third	man	True	NaN	Southampton	no	False
1	1	1	female	38.0	1	0	71.2833	C	First	woman	False	C	Cherbourg	yes	False
2	1	3	female	26.0	0	0	7.9250	S	Third	woman	False	NaN	Southampton	yes	True
3	1	1	female	35.0	1	0	53.1000	S	First	woman	False	C	Southampton	yes	False
4	0	3	male	35.0	0	0	8.0500	S	Third	man	True	NaN	Southampton	no	True

Let's group the Titanic passengers by sex and look at the mean of each numeric column. The mean of survived is the survival rate of each group.

In [26]:

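# numeric_only=True averages only numeric and bool columns (required in pandas 2.0+ when text columns exist)
df.groupby('sex').mean(numeric_only=True)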

Out[26]:

	survived	pclass	age	sibsp	parch	fare	adult_male	alone
sex
female	0.742038	2.159236	27.915709	0.694268	0.649682	44.479818	0.000000	0.401274
male	0.188908	2.389948	30.726645	0.429809	0.235702	25.523893	0.930676	0.712305

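groupby() on its own only defines the groups. In practice it is almost always followed by an aggregation function such as mean(), sum(), or count(), which reduces each group to a single value.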
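In the small hypothetical frame below, grouping alone returns only a GroupBy object; the table appears once an aggregation runs. numeric_only=True restricts mean() to numeric columns. In pandas 2.0 and later, leaving it out raises a TypeError when text columns are present, while older versions dropped such columns silently.

```python
import pandas as pd

toy = pd.DataFrame({
    'sex': ['male', 'female', 'male'],
    'embarked': ['S', 'C', 'S'],  # text column that cannot be averaged
    'age': [22.0, 38.0, 35.0],
})

grouped = toy.groupby('sex')
# Without an aggregation we only have a GroupBy object, not a result table
print(type(grouped).__name__)  # DataFrameGroupBy

# numeric_only=True averages only numeric columns and skips 'embarked'
means = grouped.mean(numeric_only=True)
print(means)
```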

Grouping by two or more columns

To group by two or more columns, pass them to groupby() as a list.
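Grouping by a list of keys produces a MultiIndex. The sketch below uses hypothetical data and shows how to pick out one combination, or all rows for one outer key, with .loc.

```python
import pandas as pd

toy = pd.DataFrame({
    'sex': ['female', 'female', 'male', 'male'],
    'pclass': [1, 3, 1, 3],
    'survived': [1, 0, 1, 0],
})

# A list of keys produces a two-level MultiIndex (sex, pclass)
stats = toy.groupby(['sex', 'pclass'])['survived'].mean()

print(stats.loc[('female', 1)])  # 1.0
print(stats.loc['male'])         # all pclass values for male
```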

In [27]:

# Statistics by sex and passenger class
df.groupby(['sex', 'pclass']).mean(numeric_only=True)

Out[27]:

		survived	age	sibsp	parch	fare	adult_male	alone
sex	pclass
female	1	0.968085	34.611765	0.553191	0.457447	106.125798	0.000000	0.361702
	2	0.921053	28.722973	0.486842	0.605263	21.970121	0.000000	0.421053
	3	0.500000	21.750000	0.895833	0.798611	16.118810	0.000000	0.416667
male	1	0.368852	41.281386	0.311475	0.278689	67.226127	0.975410	0.614754
	2	0.157407	30.740707	0.342593	0.222222	19.741782	0.916667	0.666667
	3	0.135447	26.507589	0.498559	0.224784	12.661633	0.919308	0.760807

Getting results for one specific column

Our main interest is the survived column. To get results for survived only, select that column right after groupby(), before the aggregation function.

In [28]:

# Statistics by sex and passenger class
df.groupby(['sex', 'pclass'])['survived'].mean()

Out[28]:

sex     pclass
female  1         0.968085
        2         0.921053
        3         0.500000
male    1         0.368852
        2         0.157407
        3         0.135447
Name: survived, dtype: float64

To display the result as a table, either wrap it in pd.DataFrame() or select survived with double brackets ([['survived']]) so that the result stays a DataFrame.
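Series.to_frame() is another way to get the same one-column DataFrame. A sketch on hypothetical data:

```python
import pandas as pd

toy = pd.DataFrame({'sex': ['female', 'male', 'male'], 'survived': [1, 0, 1]})

as_series = toy.groupby('sex')['survived'].mean()
# to_frame() wraps the Series in a DataFrame whose single column keeps the Series name
as_frame = as_series.to_frame()
print(as_frame.columns.tolist())  # ['survived']
```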

In [30]:

# Output as a DataFrame
pd.DataFrame(df.groupby(['sex', 'pclass'])['survived'].mean())

Out[30]:

		survived
sex	pclass
female	1	0.968085
	2	0.921053
	3	0.500000
male	1	0.368852
	2	0.157407
	3	0.135447

In [31]:

# Output as a DataFrame
df.groupby(['sex', 'pclass'])[['survived']].mean()

Out[31]:

		survived
sex	pclass
female	1	0.968085
	2	0.921053
	3	0.500000
male	1	0.368852
	2	0.157407
	3	0.135447

reset_index(): resetting the index

reset_index() moves the group keys stored in the index back into regular columns. It returns a new DataFrame with a default 0-based integer index.
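Passing as_index=False to groupby() keeps the keys as ordinary columns from the start. This yields the same table as calling reset_index() afterwards. A sketch on hypothetical data:

```python
import pandas as pd

toy = pd.DataFrame({
    'sex': ['female', 'male', 'male'],
    'pclass': [1, 3, 3],
    'survived': [1, 0, 1],
})

# Aggregate first, then move the index levels back into columns
via_reset = toy.groupby(['sex', 'pclass'])['survived'].mean().reset_index()
# as_index=False keeps the group keys as ordinary columns from the start
via_flag = toy.groupby(['sex', 'pclass'], as_index=False)['survived'].mean()

print(via_flag)
```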

In [32]:

# Reset the index
df.groupby(['sex', 'pclass'])['survived'].mean().reset_index()

Out[32]:

	sex	pclass	survived
0	female	1	0.968085
1	female	2	0.921053
2	female	3	0.500000
3	male	1	0.368852
4	male	2	0.157407
5	male	3	0.135447

Getting results for multiple columns

Instead of a single column, select a list of columns after groupby().

In [33]:

# Statistics by sex and passenger class
df.groupby(['sex', 'pclass'])[['survived', 'age']].mean()

Out[33]:

		survived	age
sex	pclass
female	1	0.968085	34.611765
	2	0.921053	28.722973
	3	0.500000	21.750000
male	1	0.368852	41.281386
	2	0.157407	30.740707
	3	0.135447	26.507589

Applying multiple statistics

To compute several statistics at once, pass a list of function names to agg().
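agg() also accepts two other forms. A dict applies different functions to different columns. Named aggregation (output_name=(column, function)) gives flat, readable column names; the later exercise solution uses this form. A sketch on hypothetical data:

```python
import pandas as pd

toy = pd.DataFrame({
    'sex': ['female', 'female', 'male'],
    'survived': [1, 0, 1],
    'age': [20.0, 30.0, 40.0],
})

# Dict form: column -> function(s); a list of functions yields two-level columns
per_column = toy.groupby('sex').agg({'survived': ['mean', 'sum'], 'age': 'max'})

# Named aggregation: new_name=(column, function) gives flat column names
named = toy.groupby('sex').agg(
    survival_rate=('survived', 'mean'),
    oldest=('age', 'max'),
)
print(named)
```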

In [34]:

# Statistics by sex and passenger class
df.groupby(['sex', 'pclass'])[['survived', 'age']].agg(['mean', 'sum'])

Out[34]:

		survived		age
		mean	sum	mean	sum
sex	pclass
female	1	0.968085	91	34.611765	2942.00
	2	0.921053	70	28.722973	2125.50
	3	0.500000	72	21.750000	2218.50
male	1	0.368852	45	41.281386	4169.42
	2	0.157407	17	30.740707	3043.33
	3	0.135447	47	26.507589	6706.42

Exercise

In [35]:

sample = df.copy()
sample

Out[35]:

	survived	pclass	sex	age	sibsp	parch	fare	embarked	class	who	adult_male	deck	embark_town	alive	alone
0	0	3	male	22.0	1	0	7.2500	S	Third	man	True	NaN	Southampton	no	False
1	1	1	female	38.0	1	0	71.2833	C	First	woman	False	C	Cherbourg	yes	False
2	1	3	female	26.0	0	0	7.9250	S	Third	woman	False	NaN	Southampton	yes	True
3	1	1	female	35.0	1	0	53.1000	S	First	woman	False	C	Southampton	yes	False
4	0	3	male	35.0	0	0	8.0500	S	Third	man	True	NaN	Southampton	no	True
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
886	0	2	male	27.0	0	0	13.0000	S	Second	man	True	NaN	Southampton	no	True
887	1	1	female	19.0	0	0	30.0000	S	First	woman	False	B	Southampton	yes	True
888	0	3	female	NaN	1	2	23.4500	S	Third	woman	False	NaN	Southampton	no	False
889	1	1	male	26.0	0	0	30.0000	C	First	man	True	C	Cherbourg	yes	True
890	0	3	male	32.0	0	0	7.7500	Q	Third	man	True	NaN	Queenstown	no	True

891 rows × 15 columns

Use groupby() to output the following.

Survival rate by pclass

In [36]:

# Enter your code here
sample.groupby('pclass')['survived'].mean()

Out[36]:

pclass
1    0.629630
2    0.472826
3    0.242363
Name: survived, dtype: float64

[Expected output]

pclass
1    0.629630
2    0.472826
3    0.242363
Name: survived, dtype: float64

Combined survival statistics (mean and variance) by embarked

In [37]:

# Enter your code here
sample.groupby('embarked')['survived'].agg(['mean', 'var'])

Out[37]:

	mean	var
embarked
C	0.553571	0.248610
Q	0.389610	0.240943
S	0.336957	0.223764

[Expected output]

	mean	var
embarked
C	0.553571	0.248610
Q	0.389610	0.240943
S	0.336957	0.223764

Survival rate and number of survivors by who and pclass

In [42]:

# Enter your code here
survival_stats = sample.groupby(['who', 'pclass']).agg(
    survival_rate=('survived', 'mean'),
    survived_count=('survived', 'sum')
)
survival_stats

Out[42]:

		survival_rate	survived_count
who	pclass
child	1	0.833333	5
	2	1.000000	19
	3	0.431034	25
man	1	0.352941	42
	2	0.080808	8
	3	0.119122	38
woman	1	0.978022	89
	2	0.909091	60
	3	0.491228	56

[Expected output]

		mean	sum
who	pclass
child	1	0.833333	5
	2	1.000000	19
	3	0.431034	25
man	1	0.352941	42
	2	0.080808	8
	3	0.119122	38
woman	1	0.978022	89
	2	0.909091	60
	3	0.491228	56

Fill the missing ages of men with the mean age of men.
Fill the missing ages of women with the mean age of women.

In [44]:

# Check missing values
print(sample['age'].isnull().sum())
print(f"age 평균: {sample['age'].mean():.2f}")

177
age 평균: 29.70

In [47]:

# Enter your code here
# Mean age per sex (the task asks for the male mean for men and the female mean for women)
mean_ages = sample.groupby('sex')['age'].mean()

# Return the row's age, or the mean age of its sex when the age is missing
def fill_missing_age(row):
    if pd.isnull(row['age']):
        return mean_ages[row['sex']]
    return row['age']

# Fill the missing values row by row
sample['age'] = sample.apply(fill_missing_age, axis=1)
#sample.head()

In [48]:

# Verification code
print(sample['age'].isnull().sum())
print(f"age 평균: {sample['age'].mean():.2f}")

0
age 평균: 29.74

[Expected output]

0
age 평균: 29.74
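A common vectorized alternative to the row-wise apply() above is groupby().transform('mean') combined with fillna(). transform returns each row's group mean, aligned to the original index. A sketch on hypothetical data:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    'sex': ['male', 'male', 'female', 'female', 'male'],
    'age': [20.0, np.nan, 50.0, np.nan, 40.0],
})

# transform('mean') returns each row's group mean, aligned to the original index
group_mean = toy.groupby('sex')['age'].transform('mean')
# fillna only replaces the missing entries with that aligned group mean
toy['age'] = toy['age'].fillna(group_mean)

print(toy['age'].tolist())  # [20.0, 30.0, 50.0, 50.0, 40.0]
```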

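pivot_table()

pivot_table() works much like a pivot table in Excel and behaves similarly to groupby().

The basic idea is to specify index (the row groups), columns (the column groups), and values (the column to aggregate). The values are averaged unless aggfunc says otherwise.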
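In the small hypothetical frame below, the three arguments map directly onto the table layout. The result matches groupby() followed by unstack().

```python
import pandas as pd

toy = pd.DataFrame({
    'who': ['man', 'man', 'woman', 'man', 'woman', 'woman'],
    'pclass': [1, 1, 1, 3, 3, 3],
    'survived': [0, 1, 1, 0, 1, 0],
})

# index -> row labels, columns -> column labels, values -> column to aggregate (mean by default)
pivot = toy.pivot_table(index='who', columns='pclass', values='survived')

# The same table via groupby() on both keys, then unstack() the inner level into columns
via_groupby = toy.groupby(['who', 'pclass'])['survived'].mean().unstack()

print(pivot)
```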

Single-column result for one grouping key

In [49]:

# Show the groups on the index
df.pivot_table(index='who', values='survived')

Out[49]:

	survived
who
child	0.590361
man	0.163873
woman	0.756458

In [50]:

# Show the groups as columns
df.pivot_table(columns='who', values='survived')

Out[50]:

who	child	man	woman
survived	0.590361	0.163873	0.756458

Single-column result for multiple grouping keys

In [51]:

df.pivot_table(index=['who', 'pclass'], values='survived')

Out[51]:

		survived
who	pclass
child	1	0.833333
	2	1.000000
	3	0.431034
man	1	0.352941
	2	0.080808
	3	0.119122
woman	1	0.978022
	2	0.909091
	3	0.491228

Spreading the groups over rows and columns instead of nesting them in the index

In [52]:

df.pivot_table(index='who', columns='pclass', values='survived')

Out[52]:

pclass	1	2	3
who
child	0.833333	1.000000	0.431034
man	0.352941	0.080808	0.119122
woman	0.978022	0.909091	0.491228

Applying multiple aggregation functions

In [53]:

df.pivot_table(index='who', columns='pclass', values='survived', aggfunc=['sum', 'mean'])

Out[53]:

	sum			mean
pclass	1	2	3	1	2	3
who
child	5	19	25	0.833333	1.000000	0.431034
man	42	8	38	0.352941	0.080808	0.119122
woman	89	60	56	0.978022	0.909091	0.491228

Exercise

In [54]:

tips = sns.load_dataset('tips')
tips.head()

Out[54]:

	total_bill	tip	sex	smoker	day	time	size
0	16.99	1.01	Female	No	Sun	Dinner	2
1	10.34	1.66	Male	No	Sun	Dinner	3
2	21.01	3.50	Male	No	Sun	Dinner	3
3	23.68	3.31	Male	No	Sun	Dinner	2
4	24.59	3.61	Female	No	Sun	Dinner	4

Using the tips dataset, output the following.

Create the following pivot table.

The values are the mean of tip.

In [68]:

# Enter your code here
tips.pivot_table(values='tip', aggfunc='mean', columns='day', index=['smoker'])

Out[68]:

day	Thur	Fri	Sat	Sun
smoker
Yes	3.030000	2.7140	2.875476	3.516842
No	2.673778	2.8125	3.102889	3.167895

[Expected output]

day	Thur	Fri	Sat	Sun
smoker
Yes	3.030000	2.7140	2.875476	3.516842
No	2.673778	2.8125	3.102889	3.167895

Create the following pivot table.

The values are the mean and the sum of total_bill.

In [69]:

# Enter your code here
tips.pivot_table(values='total_bill', aggfunc=['mean', 'sum'], columns='time', index=['day'])

Out[69]:

	mean		sum
time	Lunch	Dinner	Lunch	Dinner
day
Thur	17.664754	18.780000	1077.55	18.78
Fri	12.845714	19.663333	89.92	235.96
Sat	NaN	20.441379	0.00	1778.40
Sun	NaN	21.410000	0.00	1627.16

[Expected output]

	mean		sum
time	Lunch	Dinner	Lunch	Dinner
day
Thur	17.664754	18.780000	1077.55	18.78
Fri	12.845714	19.663333	89.92	235.96
Sat	NaN	20.441379	0.00	1778.40
Sun	NaN	21.410000	0.00	1627.16

Submission

For submission, load the Titanic dataset again and group it by the sex and pclass columns. Take the mean of the survived column, reset the index, and store the result in result_df.

In [70]:

df = sns.load_dataset('titanic')
grouped = df.groupby(['sex', 'pclass'])['survived'].mean().reset_index()

# Store the result in result_df
result_df = grouped
grouped.head()

Out[70]:

	sex	pclass	survived
0	female	1	0.968085
1	female	2	0.921053
2	female	3	0.500000
3	male	1	0.368852
4	male	2	0.157407

[Python] Pandas Advanced Preprocessing and Pivot Tables
