This post is part 9 of the 16-part visualization series.

  • Part 1 - 01: Libraries
  • Part 2 - 02: Getting to know matplotlib
  • Part 3 - 03: Scatter plot
  • Part 4 - 04: Bubble plot with circling
  • Part 5 - 05: Scatter plot with a linear regression line
  • Part 6 - 06: Strip plot
  • Part 7 - 07: Counts plot
  • Part 8 - 08: Marginal Histogram
  • Part 9 - This Post
  • Part 10 - 09: Marginal Boxplot
  • Part 11 - 10: Pair plot
  • Part 12 - 11: Diverging Bars
  • Part 13 - 12: Diverging lines with text
  • Part 14 - 13: Diverging Lollipop Chart with Markers
  • Part 15 - 14: Area chart
  • Part 16 - 15: Ordered Bar Chart

This post looks at the correlation plot, which shows the correlations between variables at a glance.
Practice Kaggle notebook
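
Before plotting, it may help to see what a correlation matrix actually contains. The following is a minimal sketch on made-up data; the column names `x`, `y`, `z` and their values are assumptions for illustration and are not part of the mtcars dataset. `DataFrame.corr()` computes the pairwise Pearson correlation by default, so the result is symmetric and has ones on the diagonal.

```python
import numpy as np
import pandas as pd

# Made-up data for illustration only (not the mtcars dataset)
toy = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0, 5.0],
    "y": [2.0, 4.0, 6.0, 8.0, 10.0],  # y = 2x, perfectly correlated with x
    "z": [5.0, 4.0, 3.0, 2.0, 1.0],   # z = 6 - x, perfectly anti-correlated with x
})

# Pairwise Pearson correlation between all columns
toy_corr = toy.corr()
print(toy_corr)

# The matrix is symmetric and its diagonal is all ones
is_symmetric = np.allclose(toy_corr.values, toy_corr.values.T)
diagonal_is_one = np.allclose(np.diag(toy_corr.values), 1.0)
```

Because the matrix is symmetric, the cells above the diagonal repeat the cells below it, which is why a heatmap can hide one half without losing information.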

# Useful for:
# The correlation plot helps us compare how strongly 2 variables are correlated with each other

# More info:
# https://en.wikipedia.org/wiki/Covariance_matrix#Correlation_matrix

# ----------------------------------------------------------------------------------------------------
# import the libraries and get the data
import pandas as pd
import matplotlib.pyplot as plt

PATH = '/kaggle/input/the-50-plot-challenge/mtcars.csv'
df = pd.read_csv(PATH)

# ----------------------------------------------------------------------------------------------------
# instantiate the figure
fig = plt.figure(figsize = (10, 5))
ax = fig.add_subplot()

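# plot using matplotlib
# https://matplotlib.org/3.2.1/api/_as_gen/matplotlib.axes.Axes.imshow.html
# use numeric columns only (recent pandas versions raise an error on non-numeric columns)
ax.imshow(df.select_dtypes(include = 'number').corr(), cmap = 'viridis', interpolation = 'nearest')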
# set the title for the figure
ax.set_title("Heatmap using matplotlib");

[Figure: "Heatmap using matplotlib"]

A plot like this is hard to read on its own, so let's label the x and y axes with the variable names.

# Useful for:
# The correlation plot helps us compare how strongly 2 variables are correlated with each other

# More info:
# https://en.wikipedia.org/wiki/Covariance_matrix#Correlation_matrix

# ----------------------------------------------------------------------------------------------------
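# import the libraries and get the data
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

PATH = '/kaggle/input/the-50-plot-challenge/mtcars.csv'
df = pd.read_csv(PATH)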

# ----------------------------------------------------------------------------------------------------
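# prepare the data for plotting
# calculate the correlation between all numeric variables
# (recent pandas versions raise an error on non-numeric columns)
corr = df.select_dtypes(include = 'number').corr()
# create a mask to pass to seaborn so that only half of the cells are shown,
# because the correlation between x and y is the same as between y and x
# it's only for aesthetic reasons
mask = np.zeros_like(corr, dtype = bool) # start with an all-False matrix
mask[np.triu_indices_from(mask)] = True # set the upper triangle (including the diagonal) to True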

# ----------------------------------------------------------------------------------------------------
# instantiate the figure
fig = plt.figure(figsize = (10, 5))

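# plot the data using seaborn
# cells where mask is True are hidden
# note: vmax = 0.3 caps the color scale, so all correlations above 0.3 share one color
ax = sns.heatmap(corr,
                 mask = mask,
                 vmax = 0.3,
                 square = True,
                 cmap = "viridis")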
# set the title for the figure
ax.set_title("Heatmap using seaborn");

[Figure: "Heatmap using seaborn"]
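
The labeling and masking steps above can also be sketched with matplotlib alone. This is a hedged illustration, not the notebook's method: the data is randomly generated, and the names `demo`, `demo_corr`, `demo_mask`, and the columns `a` to `d` are assumptions made up for the example. It builds the same kind of upper-triangle mask, blanks those cells with a NumPy masked array, and writes the variable names on both axes.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Made-up data for illustration only (not the mtcars dataset)
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(50, 4)), columns=["a", "b", "c", "d"])
demo_corr = demo.corr()

# Boolean mask: True on and above the diagonal (the cells to hide)
demo_mask = np.triu(np.ones(demo_corr.shape, dtype=bool))

# imshow leaves masked cells blank
masked_values = np.ma.masked_where(demo_mask, demo_corr.to_numpy())

fig, ax = plt.subplots(figsize=(6, 5))
# Fix the color scale to the full correlation range [-1, 1]
im = ax.imshow(masked_values, cmap="viridis", vmin=-1, vmax=1)

# Put the variable names on both axes
ax.set_xticks(range(len(demo_corr.columns)))
ax.set_xticklabels(list(demo_corr.columns))
ax.set_yticks(range(len(demo_corr.index)))
ax.set_yticklabels(list(demo_corr.index))

fig.colorbar(im, ax=ax)
ax.set_title("Labeled lower-triangle heatmap using matplotlib")
```

seaborn's `heatmap` takes the labels from the DataFrame automatically, which is why the seaborn version needs no explicit tick calls.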

Reference

Plotting with Python: learn 80 plots STEP by STEP