
[Metric] Prometheus + Grafana

by 서아랑 2024. 12. 23.

 

Introduction

Whenever Kubernetes comes up as the way to manage containers, Prometheus and Grafana are never far behind. The two technologies play completely different roles, but they pair well as a monitoring stack. Today we'll look at both of them and then walk through a simple setup.

 

What are Prometheus Metrics?

Prometheus is an open-source system for monitoring and alerting that collects and stores time-series data. A "metric" here is a data point that Prometheus monitors: a specific measurement that changes over time.

Key characteristics of Prometheus metrics

  1. Time-series data: a metric is defined by a value at a given timestamp, together with the set of labels describing where that value came from.
    • Example: cpu_usage{host="server1", core="0"} 0.85 @ 1672601983
  2. Labels: key-value pairs that attach extra information to a metric. Labels make it possible to tell apart data points that share the same metric name.
    • Example: http_requests_total{method="POST", status="500"} 10
  3. Multiple types: Prometheus metrics fall into four basic types (Counter, Gauge, Histogram, and Summary).
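To make the "name + labels + value" model concrete, here is a minimal Python sketch that parses sample lines like the ones above and keys each series by its name and full label set. The helper names (parse_sample, series_key) are made up for illustration, and the regular expressions are a simplification that does not handle escaped quotes or braces inside label values; it is not how Prometheus itself parses input.

```python
import re

# Matches `name{label="value", ...} value`; the label block is optional.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?'
    r'\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_sample(line):
    """Split one sample line into (metric name, labels dict, float value)."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a sample line: {line!r}")
    labels = dict(LABEL_RE.findall(match.group("labels") or ""))
    return match.group("name"), labels, float(match.group("value"))


def series_key(name, labels):
    """A series is identified by its name plus its complete label set."""
    return (name, tuple(sorted(labels.items())))


lines = [
    'http_requests_total{method="POST", status="500"} 10',
    'http_requests_total{method="GET", status="200"} 1234',
    'process_open_fds 16.0',
]

series = {}
for line in lines:
    name, labels, value = parse_sample(line)
    series[series_key(name, labels)] = value

for key, value in series.items():
    print(key, value)
```

The two http_requests_total lines end up as separate entries because their label sets differ, which is exactly the distinction labels are meant to provide.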

 

Advantages of Prometheus metrics

  • Highly customizable labels: enable complex queries and analysis.
  • Efficient storage: time-series data is stored and processed efficiently.
  • Alerting integration: used together with AlertManager, it can send notifications when something goes wrong.

 

In other words, Prometheus is the service in charge of collecting and storing the data, not of presenting it.

 

 

 

What is Grafana?

Grafana is an open-source platform for data visualization and monitoring. It pulls data from a variety of data sources, builds dashboards, and visualizes time-series data so that it can be analyzed in real time. It is most often used with Prometheus, but it integrates well with other data sources too.


Key features of Grafana

  1. Data visualization
    • Visualizes time-series data as charts, graphs, and tables.
    • Makes the data easy to understand at a glance.
  2. Support for many data sources
    • Works with Prometheus, InfluxDB, Elasticsearch, OpenSearch, MySQL, PostgreSQL, and more.
    • Several data sources can be combined and analyzed in a single dashboard.
  3. Dashboard creation and customization
    • Dashboards are built by drag and drop.
    • Provides a range of widgets (graphs, gauges, heatmaps, and so on).
  4. Alerting
    • Define conditions to create alerts and send them via email, Slack, PagerDuty, and others.
    • Example: send an alert when a server's CPU usage exceeds 90%.
  5. User management and sharing
    • Share dashboards with a team or set access permissions per user.
    • Provide dashboards tailored to specific users or teams.
  6. Plugin extensibility
    • Hundreds of plugins can be installed to add new data sources, chart types, and panel features.

 

Main use cases for Grafana

  1. Server and system monitoring
    • Track CPU, memory, and network usage.
    • Monitor server state in real time by connecting to Prometheus.
  2. Application performance management (APM)
    • Analyze application response time, error rate, and traffic.
    • Use data sources such as New Relic and Jaeger.
  3. Business data visualization
    • Analyze revenue, user behavior, and transaction data.
    • Build dashboards from data pulled out of MySQL, PostgreSQL, and similar databases.
  4. Cloud infrastructure monitoring
    • Analyze service status and cost on AWS, Google Cloud, and Azure.
    • Integrate with cloud data sources such as CloudWatch and BigQuery.

 

 

 

Setting up Prometheus + Grafana

We'll use Python FastAPI to collect some simple data and make it viewable in Grafana. First, write a FastAPI app that counts log events.

from fastapi import FastAPI
from prometheus_client import Counter, generate_latest

app = FastAPI()

# Prometheus metrics (the Python client exposes this counter as log_count_total)
LOG_COUNTER = Counter("log_count", "Count of specific logs")

@app.get("/")
def read_root():
    return {"message": "Hello World"}

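@app.get("/metrics")
def metrics():
    # Prometheus metrics endpoint.
    # NOTE: FastAPI serializes this returned tuple as JSON, which Prometheus cannot parse;
    # the Response-based version later in this post fixes that.
    return generate_latest(), {"Content-Type": "text/plain; version=0.0.4"}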

@app.get("/log")
def log_event():
    LOG_COUNTER.inc()  # Increment the counter
    return {"message": "Log incremented"}

Install the Prometheus client along with the packages needed to run FastAPI.

pip install prometheus-client fastapi uvicorn

 

Run the FastAPI application (the command assumes the code above is saved as app.py).

uvicorn app:app --host 0.0.0.0 --port 8000

Open http://localhost:8000/metrics to check the metrics that Prometheus will scrape. In my own test environment I was collecting order book counts for major crypto assets, so your output should look similar to the following, with your own metrics in place of mine.

process_cpu_seconds_total 4.460000000000001
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 16.0
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 65535.0
# HELP btc_open_order BTC open order count
# TYPE btc_open_order gauge
btc_open_order 328.0
# HELP eth_open_order ETH open order count
# TYPE eth_open_order gauge
eth_open_order 220.0
# HELP xrp_open_order XRP open order count
# TYPE xrp_open_order gauge
xrp_open_order 227.0
# HELP doge_open_order DOGE open order count
# TYPE doge_open_order gauge
doge_open_order 231.0
# HELP sol_open_order SOL open order count
# TYPE sol_open_order gauge
sol_open_order 234.0

The /metrics payload is plain text, not JSON. If the payload starts with a square bracket, it is being served as JSON and Prometheus may fail while trying to parse it, so change the return value of the FastAPI /metrics route to a Response as shown below.

from fastapi import FastAPI
from prometheus_client import generate_latest, CONTENT_TYPE_LATEST
from starlette.responses import Response

app = FastAPI()

@app.get("/metrics")
def metrics():
    # Return Prometheus metrics with explicit Content-Type
    return Response(content=generate_latest(), media_type=CONTENT_TYPE_LATEST)

 

Next, get Prometheus and Grafana ready to run in Docker containers. Write a docker-compose.yml.

services:
  prometheus:
    image: prom/prometheus
    container_name: prometheus
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"
    networks:
      - monitoring

  grafana:
    image: grafana/grafana
    container_name: grafana
    ports:
      - "3000:3000"
    networks:
      - monitoring

networks:
  monitoring:
    driver: bridge

 

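Next, write prometheus.yml. The target host.docker.internal points at the FastAPI app running on the host machine. On Linux this name may not resolve by default; in that case, adding extra_hosts with "host.docker.internal:host-gateway" to the prometheus service is a common workaround.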

global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'fastapi'
    metrics_path: /metrics
    static_configs:
      - targets: ['host.docker.internal:8000']

 

Finally, start Docker Compose.

docker-compose up -d

Use docker-compose ps to check that the containers are running properly.

If a container has a problem, look for ERROR lines in the output of docker-compose logs or docker logs prometheus.

Prometheus is exposed on port 9090 (http://localhost:9090) and Grafana on port 3000 (http://localhost:3000). Query http://localhost:9090/api/v1/targets and check that the health value is "up"; only then is Prometheus actually scraping data. If it is "down", use the lastError field to work out what to fix.
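The health check described above can also be scripted. The following is a small sketch using only the Python standard library; the function names are hypothetical, and the sample payload is hand-written for illustration (including the made-up lastError text) rather than captured from a real server. It relies only on the activeTargets, health, lastError, scrapeUrl, and labels fields of the targets API response.

```python
import json
from urllib.request import urlopen


def fetch_targets(base_url="http://localhost:9090"):
    """Fetch the targets API response from a running Prometheus server."""
    with urlopen(f"{base_url}/api/v1/targets", timeout=5) as resp:
        return json.load(resp)


def unhealthy_targets(payload):
    """Return (job, scrapeUrl, lastError) for every active target that is not 'up'."""
    problems = []
    for target in payload.get("data", {}).get("activeTargets", []):
        if target.get("health") != "up":
            job = target.get("labels", {}).get("job", "?")
            problems.append((job, target.get("scrapeUrl", "?"), target.get("lastError", "")))
    return problems


# Hand-written sample shaped like the API response (values are illustrative only).
sample = {
    "status": "success",
    "data": {
        "activeTargets": [
            {
                "labels": {"job": "fastapi"},
                "scrapeUrl": "http://host.docker.internal:8000/metrics",
                "health": "down",
                "lastError": "connection refused",
            }
        ]
    },
}

for job, url, error in unhealthy_targets(sample):
    print(f"{job}: {url} is not up ({error})")
```

Against a live setup, unhealthy_targets(fetch_targets()) would return an empty list once the FastAPI target is being scraped successfully.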

If you run into a permission denied error, run sudo chown -R 472:472 ./grafana-data to hand the directory over to the default user and group used by the Grafana Docker container.

After changing any of the yml files, restart docker-compose for the changes to take effect.

docker-compose down
docker-compose up -d

 

Open http://localhost:3000 and confirm that Grafana is running. The initial login is admin/admin, and you are asked to change the password on first login. If you would rather skip that step, modify docker-compose.yml as follows.

  grafana:
    image: grafana/grafana
    container_name: grafana
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=my_grafana
    volumes:
      - ./grafana-data:/var/lib/grafana
    networks:
      - monitoring

 

There are two things to configure in Grafana: the data source and the dashboard. Go to Data sources under Connections and set the default connection to Prometheus. Because both containers share the monitoring network, the server URL is typically http://prometheus:9090 rather than localhost. If you browse the list of new connections, you will see that there are many data sources besides Prometheus.

 

Next, go to Dashboards, create a new dashboard, and in the new panel settings select log_count in the Metrics browser to get a graph. Note that the Python client exposes a Counter with a _total suffix, so it is listed as log_count_total.

As mentioned earlier, my own example displays counts for crypto assets instead.

If the log_count metric does not show up in the Metrics browser, Prometheus is not scraping the data properly, so check the logs and http://localhost:9090/api/v1/targets again. Check the data source settings as well.

You can also adjust the interval under Query options, and the panel options offer a range of visualization settings such as Legend, Tooltip, Axis, and Style. You can even change the chart type altogether; in my case I switched the panel from Time series to Bar chart.

 

 

That covers a simple monitoring setup with visualization using Prometheus and Grafana.

 

 

Wrapping up

Before closing, let's look at Prometheus in a little more detail. I struggled with collecting data through Prometheus myself when I was building my monitoring dashboards.

First, Prometheus has several metric types. You need to pick the type that fits the situation to end up with a useful visualization. The types are described below.


1. Counter

Description

  • A monotonically increasing metric: the value is 0 or greater and can only go up.
  • Mainly used to count events.
  • May reset to zero when the process restarts.

Example

# HELP http_requests_total Total number of HTTP requests
# TYPE http_requests_total counter
http_requests_total{method="GET", status="200"} 1234
http_requests_total{method="POST", status="500"} 42
  • ์˜๋ฏธ: ์ด HTTP ์š”์ฒญ ์ˆ˜๋ฅผ ์ƒํƒœ ์ฝ”๋“œ๋ณ„๋กœ ๊ตฌ๋ถ„.
  • PromQL ์˜ˆ์ œ:
    rate(http_requests_total[5m])  # ์ง€๋‚œ 5๋ถ„ ๋™์•ˆ ์ดˆ๋‹น ์š”์ฒญ ์ˆ˜์˜ ๋น„์œจ ๊ณ„์‚ฐ
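To show why counters are queried with rate() rather than read directly, here is a simplified Python sketch of a per-second rate that tolerates a counter reset. It is an illustration only: the function name simple_rate and the sample values are made up, and the real PromQL rate() additionally extrapolates to the edges of the time window, which this sketch does not.

```python
def simple_rate(samples):
    """Per-second increase of a counter over a list of samples.

    samples: list of (timestamp_seconds, counter_value), oldest first.
    A drop in value is treated as a counter reset (for example, a restart).
    """
    if len(samples) < 2:
        return None
    increase = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        if curr >= prev:
            increase += curr - prev
        else:
            # Counter restarted from zero: everything up to `curr` is new.
            increase += curr
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else None


# Scraped every 15 seconds; the process restarts between t=30 and t=45.
samples = [(0, 100), (15, 130), (30, 160), (45, 10), (60, 40)]

print(simple_rate(samples))                                   # reset-aware rate
print((samples[-1][1] - samples[0][1]) / 60)                  # naive difference goes negative
```

The naive "last minus first" calculation produces a negative rate across the restart, while the reset-aware version keeps counting the increase.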
    

2. Gauge

Description

  • A metric whose value can go up or down.
  • Mainly used to measure a current state (CPU usage, memory usage, and so on).

Example

# HELP memory_usage_bytes Current memory usage in bytes
# TYPE memory_usage_bytes gauge
memory_usage_bytes 204857600
  • ์˜๋ฏธ: ํ˜„์žฌ ๋ฉ”๋ชจ๋ฆฌ ์‚ฌ์šฉ๋Ÿ‰(๋ฐ”์ดํŠธ ๋‹จ์œ„).
  • PromQL ์˜ˆ์ œ:
    memory_usage_bytes  # ํ˜„์žฌ ๋ฉ”๋ชจ๋ฆฌ ์‚ฌ์šฉ๋Ÿ‰ ์กฐํšŒ
    

3. Histogram

Description

  • Stores observations distributed into buckets (bucket counts are cumulative).
  • Measures how often values fall within particular ranges.
  • Exposes summary series: _sum, _count, and per-bucket data.

Example

# HELP http_request_duration_seconds HTTP request duration in seconds
# TYPE http_request_duration_seconds histogram
http_request_duration_seconds_bucket{le="0.1"} 2400
http_request_duration_seconds_bucket{le="0.5"} 3000
http_request_duration_seconds_bucket{le="1.0"} 3500
http_request_duration_seconds_bucket{le="+Inf"} 4000
http_request_duration_seconds_sum 700
http_request_duration_seconds_count 4000
  • ์˜๋ฏธ:
    • le="0.1": 0.1์ดˆ ์ดํ•˜๋กœ ์ฒ˜๋ฆฌ๋œ ์š”์ฒญ ์ˆ˜๋Š” 2400.
    • http_request_duration_seconds_sum: ๋ชจ๋“  ์š”์ฒญ์˜ ์ด ์‹œ๊ฐ„ ํ•ฉ๊ณ„(์ดˆ ๋‹จ์œ„).
    • http_request_duration_seconds_count: ์š”์ฒญ ์ด ๊ฐœ์ˆ˜.
  • PromQL ์˜ˆ์ œ:
    • ์˜๋ฏธ: ์ง€๋‚œ 5๋ถ„ ๋™์•ˆ ์š”์ฒญ์˜ ํ‰๊ท  ์ฒ˜๋ฆฌ ์‹œ๊ฐ„.
  • rate(http_request_duration_seconds_sum[5m]) / rate(http_request_duration_seconds_count[5m])

4. Summary

Description

  • Similar to a histogram, but it calculates quantile data.
  • Instead of buckets, it exposes the statistics directly (for example, the 0.5 quantile and the 0.9 quantile).
  • Quantiles are calculated directly by the client, which gives accurate values but can use more memory and CPU.

Example

# HELP http_request_duration_seconds HTTP request duration in seconds
# TYPE http_request_duration_seconds summary
http_request_duration_seconds{quantile="0.5"} 0.05
http_request_duration_seconds{quantile="0.9"} 0.1
http_request_duration_seconds{quantile="0.99"} 0.2
http_request_duration_seconds_sum 500
http_request_duration_seconds_count 10000
  • ์˜๋ฏธ:
    • quantile="0.5": ์š”์ฒญ ์ฒ˜๋ฆฌ ์‹œ๊ฐ„์˜ 50%๊ฐ€ 0.05์ดˆ ์ดํ•˜.
    • quantile="0.99": ์š”์ฒญ ์ฒ˜๋ฆฌ ์‹œ๊ฐ„์˜ 99%๊ฐ€ 0.2์ดˆ ์ดํ•˜.
  • PromQL ์˜ˆ์ œ:
    • ์˜๋ฏธ: 99-๋ฐฑ๋ถ„์œ„์ˆ˜ ์š”์ฒญ ์ฒ˜๋ฆฌ ์‹œ๊ฐ„.
  • http_request_duration_seconds{quantile="0.99"}

Metric type comparison and selection guide

Type      | Key characteristics                              | Typical uses
Counter   | Monotonically increasing; value can reset        | Request counts, error counts, number of completed jobs
Gauge     | Can go up or down; measures the current state    | CPU/memory usage, current temperature, active connections
Histogram | Bucketed distribution; exposes _sum and _count   | Request duration distribution, file size distribution
Summary   | Quantile-based summary; accurate calculation     | 50th, 90th, and 99th percentile of response time

 

 

With Prometheus and Grafana you get a powerful and convenient visualization toolset. The initial setup is a little fiddly, but start with a simple example! After that, add all the data you want to see from your own projects and enjoy watching it change as it accumulates in real time.

 

 
