kcg-ai-monitoring/prediction/models_core/registry.py
htlee 2ceeb966d8 feat(prediction): Phase 1-2 detection model registry + snapshot 관찰 보강
- models_core 패키지 신설 — BaseDetectionModel / ModelContext / ModelResult
  + Registry (ACTIVE 버전 인스턴스화, DAG 순환 검출, topo 플랜)
  + DAGExecutor (PRIMARY→ctx.shared 주입, SHADOW persist-only 오염 차단)
  + params_loader (5분 TTL 캐시), feature_flag (PREDICTION_USE_MODEL_REGISTRY)
- V034 스키마 정합성 사전 검증 + silent error 3건 선제 방어
  · model_id VARCHAR(64) 초과 시 __init__ 에서 즉시 ValueError
  · metric_key VARCHAR(64) 초과는 경고 후 drop (다른 metric 는 저장)
  · persist 가 ctx.conn 재사용 (pool maxconn=5 고갈 방지)
- scheduler.py — 10단계 feature flag 분기 (기본 0, 구 경로 보존)
- partition_manager — detection_model_run_outputs 월별 파티션 자동 생성/DROP
- 유닛테스트 15 케이스 전체 통과 (DAG 순환, SHADOW 오염 차단, 길이 검증)
- snapshot 스크립트 (hourly/diagnostic) 개선
  · spoofing gt0/gt03/gt05/gt07 세분화 — 'silent fault' vs 'no signal' 구분
  · V030 gear_identity_collisions 원시 섹션 (CRITICAL 51건 OPEN 포착)
  · V034 detection_model_* 모니터링 섹션 (Phase 2 대비)
  · stage timing 집계 + stats_hourly vs events category drift 감시

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 08:07:29 +09:00

283 lines
11 KiB
Python

"""ModelRegistry — ACTIVE 버전 전체 인스턴스화 + DAG 검증 + 실행 플랜 생성.
역할:
1. `prediction/models_core/registered/` 를 스캔하여 BaseDetectionModel 서브클래스를 모음
2. DB 에서 ACTIVE 버전 목록(PRIMARY + SHADOW/CHALLENGER) 을 읽어 **버전별 인스턴스**를 생성
3. detection_model_dependencies + 클래스 `depends_on` 을 합쳐 DAG 를 구성하고 순환 검출
4. Executor 가 쓸 topological 실행 플랜(ExecutionPlan) 을 반환
주의:
- 클래스에 `model_id` 가 정의돼 있어도 DB 에 해당 레코드가 없으면 인스턴스화하지 않음
(즉 DB 가 Single Source of Truth, 코드는 "구현 있음" 선언 역할)
- DB 에 model_id 가 있고 코드에 클래스가 없으면 경고 로그 후 **스킵** (부분 배포 허용)
"""
from __future__ import annotations
import importlib
import logging
import pkgutil
from collections import defaultdict, deque
from dataclasses import dataclass, field
from typing import Iterable, Optional, Type
from .base import (
ALLOWED_ROLES,
ROLE_CHALLENGER,
ROLE_PRIMARY,
ROLE_SHADOW,
BaseDetectionModel,
)
from .params_loader import VersionRow, load_active_versions, load_dependencies
logger = logging.getLogger(__name__)
@dataclass
class ExecutionPlan:
"""Executor 가 따를 실행 순서.
Attributes:
topo_order: PRIMARY 기준 topological order (model_id 문자열 리스트).
SHADOW/CHALLENGER 는 자기 model_id 의 PRIMARY 와 같은 슬롯에서 돈다.
primaries: model_id -> BaseDetectionModel 인스턴스 (PRIMARY)
shadows: model_id -> list[BaseDetectionModel] (SHADOW + CHALLENGER)
edges: DAG 디버깅용 (model_id -> set(depends_on))
"""
topo_order: list[str] = field(default_factory=list)
primaries: dict[str, BaseDetectionModel] = field(default_factory=dict)
shadows: dict[str, list[BaseDetectionModel]] = field(default_factory=lambda: defaultdict(list))
edges: dict[str, set[str]] = field(default_factory=lambda: defaultdict(set))
class DAGCycleError(RuntimeError):
"""모델 의존성 그래프에 순환이 있을 때."""
class ModelRegistry:
"""ACTIVE 버전 인스턴스 저장소 + Plan 제공자."""
_DEFAULT_REGISTERED_PKG = 'models_core.registered'
def __init__(self, registered_pkg: str = _DEFAULT_REGISTERED_PKG) -> None:
self._registered_pkg = registered_pkg
self._classes: dict[str, Type[BaseDetectionModel]] = {}
# ------------------------------------------------------------------
# 클래스 discovery
# ------------------------------------------------------------------
def discover_classes(self) -> dict[str, Type[BaseDetectionModel]]:
"""`registered/` 하위 모듈 auto-import + BaseDetectionModel 서브클래스 수집.
동일 model_id 가 여러 클래스에서 중복 선언되면 ValueError.
"""
self._classes = {}
try:
pkg = importlib.import_module(self._registered_pkg)
except ImportError:
logger.warning('registered package %s not importable', self._registered_pkg)
return {}
for mod_info in pkgutil.iter_modules(pkg.__path__, prefix=f'{self._registered_pkg}.'):
try:
module = importlib.import_module(mod_info.name)
except Exception:
logger.exception('failed to import %s', mod_info.name)
continue
for attr_name in dir(module):
obj = getattr(module, attr_name)
if not isinstance(obj, type):
continue
if obj is BaseDetectionModel:
continue
if not issubclass(obj, BaseDetectionModel):
continue
mid = getattr(obj, 'model_id', '')
if not mid:
continue
if mid in self._classes and self._classes[mid] is not obj:
raise ValueError(
f'duplicate model_id {mid!r}: '
f'{self._classes[mid].__name__} vs {obj.__name__}'
)
self._classes[mid] = obj
logger.info('discovered %d detection model classes: %s',
len(self._classes), sorted(self._classes.keys()))
return dict(self._classes)
def register_class(self, cls: Type[BaseDetectionModel]) -> None:
"""테스트·수동 등록용."""
mid = getattr(cls, 'model_id', '')
if not mid:
raise ValueError(f'{cls.__name__}.model_id is empty')
self._classes[mid] = cls
# ------------------------------------------------------------------
# Plan 생성
# ------------------------------------------------------------------
def build_plan(self, conn, *, force_reload: bool = False) -> ExecutionPlan:
"""DB ACTIVE 버전 + 클래스 + DAG 를 합쳐 ExecutionPlan 생성."""
versions = load_active_versions(conn, force_reload=force_reload)
edges = self._collect_edges(conn, versions)
plan = self._instantiate(versions, edges)
plan.topo_order = self._topo_sort(plan)
return plan
def build_plan_from_rows(
self,
versions: Iterable[VersionRow],
dependencies: Iterable[tuple[str, str, str]] = (),
) -> ExecutionPlan:
"""테스트용 — DB 없이 in-memory rows 만으로 Plan 생성."""
edges: dict[str, set[str]] = defaultdict(set)
active_ids = {v['model_id'] for v in versions}
for model_id, depends_on, _key in dependencies:
if model_id in active_ids and depends_on in active_ids:
edges[model_id].add(depends_on)
# 클래스 선언 depends_on 도 합류
for v in versions:
cls = self._classes.get(v['model_id'])
if cls is None:
continue
for dep in getattr(cls, 'depends_on', []) or []:
if dep in active_ids:
edges[v['model_id']].add(dep)
plan = self._instantiate(versions, edges)
plan.topo_order = self._topo_sort(plan)
return plan
# ------------------------------------------------------------------
# 내부
# ------------------------------------------------------------------
def _collect_edges(
self,
conn,
versions: list[VersionRow],
) -> dict[str, set[str]]:
"""DB dependencies + 클래스 선언 depends_on 합산."""
edges: dict[str, set[str]] = defaultdict(set)
active_ids = {v['model_id'] for v in versions}
try:
for model_id, depends_on, _key in load_dependencies(conn):
if model_id in active_ids and depends_on in active_ids:
edges[model_id].add(depends_on)
except Exception:
logger.exception('load_dependencies failed — proceeding with class-level depends_on only')
for v in versions:
cls = self._classes.get(v['model_id'])
if cls is None:
continue
for dep in getattr(cls, 'depends_on', []) or []:
if dep in active_ids:
edges[v['model_id']].add(dep)
return edges
def _instantiate(
self,
versions: Iterable[VersionRow],
edges: dict[str, set[str]],
) -> ExecutionPlan:
plan = ExecutionPlan()
plan.edges = defaultdict(set, {k: set(v) for k, v in edges.items()})
for v in versions:
mid = v['model_id']
role = v['role']
if role not in ALLOWED_ROLES:
logger.warning(
'skip version id=%s role=%r not in %s',
v['id'], role, ALLOWED_ROLES,
)
continue
cls = self._classes.get(mid)
if cls is None:
logger.warning(
'model_id=%s has ACTIVE version %s(role=%s) but no registered class — skipping',
mid, v['version'], role,
)
continue
try:
inst = cls(
version_id=v['id'],
version_str=v['version'],
role=role,
params=v['params'],
)
except Exception:
logger.exception(
'failed to instantiate %s version_id=%s — skipping',
cls.__name__, v['id'],
)
continue
if role == ROLE_PRIMARY:
if mid in plan.primaries:
# DB UNIQUE INDEX 가 보장하지만 방어적으로
logger.error(
'duplicate PRIMARY for %s (existing id=%s, new id=%s) — keeping existing',
mid, plan.primaries[mid].version_id, v['id'],
)
continue
plan.primaries[mid] = inst
else: # SHADOW / CHALLENGER
plan.shadows[mid].append(inst)
return plan
@staticmethod
def _topo_sort(plan: ExecutionPlan) -> list[str]:
"""PRIMARY 노드 기준 topological order. 순환 시 DAGCycleError."""
nodes = set(plan.primaries.keys()) | set(plan.shadows.keys())
# SHADOW-only 모델도 노드로 취급 (PRIMARY 미등록이면 Executor 가 skip)
in_degree: dict[str, int] = {n: 0 for n in nodes}
adj: dict[str, set[str]] = defaultdict(set)
for node, deps in plan.edges.items():
if node not in nodes:
continue
for dep in deps:
if dep not in nodes:
continue
adj[dep].add(node)
in_degree[node] = in_degree.get(node, 0) + 1
order: list[str] = []
queue = deque(sorted([n for n, d in in_degree.items() if d == 0]))
while queue:
n = queue.popleft()
order.append(n)
for nxt in sorted(adj[n]):
in_degree[nxt] -= 1
if in_degree[nxt] == 0:
queue.append(nxt)
if len(order) != len(nodes):
remaining = [n for n in nodes if n not in order]
raise DAGCycleError(
f'DAG cycle detected among detection models: {sorted(remaining)}'
)
return order
# 편의: 싱글톤 패턴 (운영 환경에서 주로 한 인스턴스만 씀)
_registry_singleton: Optional[ModelRegistry] = None
def get_registry() -> ModelRegistry:
global _registry_singleton
if _registry_singleton is None:
_registry_singleton = ModelRegistry()
_registry_singleton.discover_classes()
return _registry_singleton
__all__ = [
'ModelRegistry',
'ExecutionPlan',
'DAGCycleError',
'get_registry',
'ROLE_PRIMARY',
'ROLE_SHADOW',
'ROLE_CHALLENGER',
]