This post is part 5 of a 16-part series on data visualization.

  • Part 1 - 01: Libraries
  • Part 2 - 02: Getting to know matplotlib
  • Part 3 - 03: Scatter plot
  • Part 4 - 04: Bubble plot with circling
  • Part 5 - This Post
  • Part 6 - 06: Strip plot
  • Part 7 - 07: Counts plot
  • Part 8 - 08: Marginal Histogram
  • Part 9 - 09: Correlation plot
  • Part 10 - 09: Marginal Boxplot
  • Part 11 - 10: Pair plot
  • Part 12 - 11: Diverging Bars
  • Part 13 - 12: Diverging lines with text
  • Part 14 - 13: Diverging Lollipop Chart with Markers
  • Part 15 - 14: Area chart
  • Part 16 - 15: Ordered Bar Chart

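Let's draw a scatter plot that includes a linear regression line.
Practice Kaggle notebook

Adding a simple regression line to a scatter plot makes the relationship between x and y easier to see.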

# Useful for:
# This is a normal scatter plot but we also plot a simple regression line to see the correlation between the x and the y variables.

# More info:
# https://visual.ly/m/scatter-plots-regression-lines/

# ----------------------------------------------------------------------------------------------------
# imports
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# get the data
PATH = '/kaggle/input/the-50-plot-challenge/mpg_ggplot2.csv'
df = pd.read_csv(PATH)

# ----------------------------------------------------------------------------------------------------
# prepare the data for plotting
# filter only 2 classes to separate them more easily on the plot
# the regression is fitted for two groups only: cyl == 4 and cyl == 8
df = df[df["cyl"].isin([4, 8])]

# ----------------------------------------------------------------------------------------------------
# plot the data using seaborn
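# hue is the variable that splits the data into groups
# x, y and data are passed as keywords; recent seaborn releases reject positional arguments here
sns.lmplot(x="displ", y="hwy", data=df, hue="cyl")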

# ----------------------------------------------------------------------------------------------------
# prettify the plot

# since we are using seaborn and this library uses matplotlib behind the scenes
# you can call plt.gca (get current axes) and use all the familiar matplotlib commands
# the code above is already enough to draw the plot, but to change how it looks
# you can reach the underlying matplotlib axes and modify them
ax = plt.gca()

# change the upper limit of the plot to make it more pleasant
ax.set_xlim(0, 10)
ax.set_ylim(0, 50)

# set title
ax.set_title("Scatter plot with regression");
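
To make concrete what each regression line represents, here is a minimal sketch of a straight-line least-squares fit computed separately per group, which is roughly what `lmplot` draws for each `hue` level by default (its confidence band is not reproduced here). The rows are made-up illustrative values, not the contents of mpg_ggplot2.csv, and `fit_line` is a hypothetical helper introduced only for this example.

```python
from collections import defaultdict
from math import sqrt

# Made-up (cyl, displ, hwy) rows for illustration only; NOT the real dataset.
rows = [
    (4, 1.8, 29), (4, 2.0, 31), (4, 2.4, 27), (4, 2.7, 24),
    (8, 4.6, 20), (8, 5.3, 19), (8, 5.7, 17), (8, 6.2, 16),
]

def fit_line(xs, ys):
    """Return (slope, intercept, pearson_r) for the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r

# Group the rows by cylinder count, mirroring hue="cyl"
groups = defaultdict(list)
for cyl, displ, hwy in rows:
    groups[cyl].append((displ, hwy))

# Fit one line per group and report it
fits = {}
for cyl, points in sorted(groups.items()):
    xs, ys = zip(*points)
    fits[cyl] = fit_line(xs, ys)
    slope, intercept, r = fits[cyl]
    print(f"cyl={cyl}: hwy = {slope:.2f} * displ + {intercept:.2f} (r = {r:.2f})")
```

A negative slope with r close to -1 would indicate that highway mileage drops almost linearly as displacement grows within that group.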


Plotting on separate axes


# Useful for:
# This is a normal scatter plot but we also plot a simple regression line to see the correlation between the x and the y variables.
# This plot is similar to the previous one but plots each data on separate axes

# More info:
# https://visual.ly/m/scatter-plots-regression-lines/

# set the seaborn theme
sns.set(color_codes=True)
# ----------------------------------------------------------------------------------------------------
# get the data
PATH = '/kaggle/input/the-50-plot-challenge/mpg_ggplot2.csv'
df = pd.read_csv(PATH)

# ----------------------------------------------------------------------------------------------------
# prepare the data for plotting
# filter only 2 classes to separate them more easily on the plot
df = df[df["cyl"].isin([4,8])]


# ----------------------------------------------------------------------------------------------------
# plot the data using seaborn
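# note: lmplot returns a seaborn FacetGrid (not a single matplotlib Axes)
# x, y and data are passed as keywords; recent seaborn releases reject positional arguments here
axes = sns.lmplot(x = "displ",
                  y = "hwy",
                  data = df,
                  hue = "cyl",
                  col = "cyl" # by specifying col, seaborn creates a separate axes for each group
                 )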

# ----------------------------------------------------------------------------------------------------
# prettify the plot

# change the upper limit of the plot to make it more pleasant
axes.set( xlim = (0.5, 7.5), ylim = (0, 50))

# set title for all axes using plt
plt.suptitle("Scatter plot with regression lines on different axes", fontsize = 10);
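
As a rough sketch of what `col` achieves, a similar layout can be built by hand with plain matplotlib: one subplot per group, each with its own scatter and fitted line. This is not how seaborn implements faceting internally. The data values are made up for illustration, and `np.polyfit` stands in for seaborn's regression fit.

```python
import matplotlib.pyplot as plt
import numpy as np

# Made-up values for illustration only; NOT the real mpg_ggplot2.csv contents.
data = {
    4: (np.array([1.8, 2.0, 2.4, 2.7]), np.array([29, 31, 27, 24])),
    8: (np.array([4.6, 5.3, 5.7, 6.2]), np.array([20, 19, 17, 16])),
}

# One column of axes per group, sharing both axis ranges
fig, axes = plt.subplots(1, len(data), figsize=(8, 4), sharex=True, sharey=True)

for ax, (cyl, (displ, hwy)) in zip(axes, data.items()):
    ax.scatter(displ, hwy)
    slope, intercept = np.polyfit(displ, hwy, deg=1)  # straight-line least-squares fit
    x_line = np.linspace(0.5, 7.5, 50)
    ax.plot(x_line, slope * x_line + intercept)
    ax.set_title(f"cyl = {cyl}")
    ax.set_xlabel("displ")

# Because the axes are shared, setting limits on one panel applies to both
axes[0].set_ylabel("hwy")
axes[0].set_xlim(0.5, 7.5)
axes[0].set_ylim(0, 50)
fig.suptitle("Scatter plot with regression lines on different axes", fontsize=10)
```

Compared with this manual version, `lmplot` with `col` handles the grouping, subplot creation, and shared limits in a single call.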


Reference

Plotting with Python: learn 80 plots STEP by STEP