feat: port the S2 prediction analysis engine into the monorepo

Copied the 47 Python files of iran prediction into the prediction/ directory:
- algorithms/ 14 analysis algorithms (gear inference, dark vessels, spoofing, transshipment, risk scoring, etc.)
- pipeline/ 7-stage classification pipeline
- cache/vessel_store (24 h sliding window)
- db/ adapters (snpdb raw-data reads, kcgdb result writes)
- chat/ AI chat (Ollama, lower priority)
- data/ static data (baselines, designated-fishing-zone GeoJSON)

Reconfigured config.py for kcgaidb (DB name, user, password)
Verified DB connectivity (confirmed access to the 37 tables in kcgaidb)
Added dev-prediction / dev-all targets to the Makefile
Added a prediction section to CLAUDE.md

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
htlee 2026-04-07 12:56:51 +09:00
parent c17d190e1d
commit e2fc355b2c
57 changed files with 9,936 additions and 13 deletions

.gitignore

@@ -4,6 +4,13 @@ frontend/build/
backend/target/
backend/build/
# === Python (prediction) ===
prediction/.venv/
prediction/__pycache__/
prediction/**/__pycache__/
prediction/*.pyc
prediction/.env
# === Dependencies ===
frontend/node_modules/
node_modules/

CLAUDE.md
@@ -19,9 +19,11 @@ kcg-ai-monitoring/
## System Architecture
```
[Frontend Vite :5173] ──→ [Backend Spring :8080] ──┬→ [Iran Backend :8080] (analysis data, read)
│ └→ [Prediction FastAPI :8001]
└→ [PostgreSQL kcgaidb] (own auth/permissions/audit/decisions)
[Frontend Vite :5173] ──→ [Backend Spring :8080] ──→ [PostgreSQL kcgaidb]
↑ write
[Prediction FastAPI :8001] ──────┘ (analysis results written every 5 min)
↑ read ↑ read
[SNPDB PostgreSQL] (raw AIS) [Iran Backend] (legacy proxy, optional)
```
- **In-house backend**: auth / permissions / audit log / admin + operator decisions (confirm / exclude / learn)
@@ -31,10 +33,12 @@ kcg-ai-monitoring/
## Commands
```bash
make install # install dependencies
make install # install all dependencies
make dev # run frontend + backend together
make dev-all # run frontend + backend + prediction together
make dev-frontend # frontend only
make dev-backend # backend only
make dev-prediction # prediction analysis engine only (FastAPI :8001)
make build # build everything
make lint # frontend lint
make format # frontend prettier
@@ -52,7 +56,14 @@ make format # frontend prettier
- React Router 7
- ESLint 10 + Prettier
### Backend (`backend/`) — initialized in Phase 2
### Prediction (`prediction/`) — analysis engine
- Python 3.11+, FastAPI, APScheduler
- 14 algorithms (gear inference, dark vessels, spoofing, transshipment, risk scoring, etc.)
- 7-stage classification pipeline (preprocess→behavior→resample→features→classify→cluster→seasonal)
- raw AIS from SNPDB (5-minute increments); results to kcgaidb (direct writes)
- prediction and backend share only the DB (no HTTP calls between them); a run-loop sketch follows below
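
A minimal sketch of the run loop those bullets describe. All names here are hypothetical — this diff does not show the contents of `prediction/main.py`; only the :8001 port, FastAPI, APScheduler, and the 5-minute write cycle are stated:
```python
# Hypothetical sketch of the prediction entry point; the actual layout may differ.
from apscheduler.schedulers.background import BackgroundScheduler
from fastapi import FastAPI
import uvicorn

app = FastAPI()

def run_analysis_cycle() -> None:
    """One 5-minute cycle: read the AIS increment from SNPDB, run the
    algorithms/pipeline, write results directly to kcgaidb (no HTTP
    call to the Spring backend)."""
    ...

scheduler = BackgroundScheduler()
scheduler.add_job(run_analysis_cycle, 'interval', minutes=5)

if __name__ == '__main__':
    scheduler.start()
    uvicorn.run(app, host='0.0.0.0', port=8001)
```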
### Backend (`backend/`)
- Spring Boot 3.x + Java 21
- Spring Security + JWT
- PostgreSQL + Flyway

Makefile
@@ -1,11 +1,13 @@
.PHONY: help install dev dev-frontend dev-backend build build-frontend build-backend lint format test clean
.PHONY: help install dev dev-frontend dev-backend dev-prediction build build-frontend build-backend lint format test clean
help:
@echo "사용 가능한 명령:"
@echo " make install - 프론트엔드 의존성 설치"
@echo " make install - 전체 의존성 설치"
@echo " make dev - 프론트엔드 + 백엔드 동시 실행"
@echo " make dev-all - 프론트 + 백엔드 + prediction 동시 실행"
@echo " make dev-frontend - 프론트엔드 dev 서버만 실행 (Vite)"
@echo " make dev-backend - 백엔드 dev 서버만 실행 (Spring Boot)"
@echo " make dev-prediction - prediction 분석 엔진만 실행 (FastAPI :8001)"
@echo " make build - 프론트엔드 + 백엔드 빌드"
@echo " make build-frontend - 프론트엔드 빌드"
@echo " make build-backend - 백엔드 빌드"
@@ -16,6 +18,7 @@ help:
install:
cd frontend && npm install
@if [ -f backend/pom.xml ]; then cd backend && ./mvnw dependency:resolve || true; fi
@if [ -f prediction/requirements.txt ]; then cd prediction && pip install -r requirements.txt 2>/dev/null || echo "Install prediction dependencies inside a virtualenv: cd prediction && uv venv && source .venv/bin/activate && uv pip install -r requirements.txt"; fi
dev-frontend:
cd frontend && npm run dev
@@ -24,9 +27,15 @@ dev-backend:
@if [ -f backend/pom.xml ]; then cd backend && ./mvnw spring-boot:run -Dspring-boot.run.profiles=local; \
else echo "백엔드가 아직 초기화되지 않았습니다 (Phase 2에서 추가)"; fi
dev-prediction:
cd prediction && python main.py
dev:
@$(MAKE) -j2 dev-frontend dev-backend
dev-all:
@$(MAKE) -j3 dev-frontend dev-backend dev-prediction
build-frontend:
cd frontend && npm run build

@@ -0,0 +1,59 @@
import pandas as pd
from algorithms.location import haversine_nm
GAP_SUSPICIOUS_SEC = 1800 # 30 min
GAP_HIGH_SUSPICIOUS_SEC = 3600 # 1 hour
GAP_VIOLATION_SEC = 86400 # 24 hours
def detect_ais_gaps(df_vessel: pd.DataFrame) -> list[dict]:
"""AIS 수신 기록에서 소실 구간 추출."""
if len(df_vessel) < 2:
return []
gaps = []
records = df_vessel.sort_values('timestamp').to_dict('records')
for i in range(1, len(records)):
prev, curr = records[i - 1], records[i]
prev_ts = pd.Timestamp(prev['timestamp'])
curr_ts = pd.Timestamp(curr['timestamp'])
gap_sec = (curr_ts - prev_ts).total_seconds()
if gap_sec < GAP_SUSPICIOUS_SEC:
continue
disp = haversine_nm(
prev['lat'], prev['lon'],
curr['lat'], curr['lon'],
)
if gap_sec >= GAP_VIOLATION_SEC:
severity = 'VIOLATION'
elif gap_sec >= GAP_HIGH_SUSPICIOUS_SEC:
severity = 'HIGH_SUSPICIOUS'
else:
severity = 'SUSPICIOUS'
gaps.append({
'gap_sec': int(gap_sec),
'gap_min': round(gap_sec / 60, 1),
'displacement_nm': round(disp, 2),
'severity': severity,
})
return gaps
def is_dark_vessel(df_vessel: pd.DataFrame) -> tuple[bool, int]:
"""다크베셀 여부 판정.
Returns: (is_dark, max_gap_duration_min)
"""
gaps = detect_ais_gaps(df_vessel)
if not gaps:
return False, 0
max_gap_min = max(g['gap_min'] for g in gaps)
is_dark = max_gap_min >= 30 # gap of 30 min or more
return is_dark, int(max_gap_min)
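
A minimal usage sketch for the two functions above (the `algorithms.dark_vessel` import path is an assumption — this diff does not show the file's name):
```python
import pandas as pd
from algorithms.dark_vessel import detect_ais_gaps, is_dark_vessel  # assumed path

# Two AIS fixes two hours apart -> one 7200 s gap.
track = pd.DataFrame([
    {'timestamp': '2026-04-07 00:00:00', 'lat': 35.00, 'lon': 129.00},
    {'timestamp': '2026-04-07 02:00:00', 'lat': 35.10, 'lon': 129.20},
])
gaps = detect_ais_gaps(track)
print(gaps[0]['severity'], gaps[0]['gap_min'])  # HIGH_SUSPICIOUS 120.0
print(is_dark_vessel(track))                    # (True, 120)
```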

@@ -0,0 +1,137 @@
from __future__ import annotations
import pandas as pd
from algorithms.location import haversine_nm, classify_zone # noqa: F401 (haversine_nm re-exported for callers)
# Yan et al. (2022) per-gear fishing-speed thresholds
GEAR_SOG_THRESHOLDS: dict[str, tuple[float, float]] = {
'PT': (2.5, 4.5), # pair trawl
'OT': (2.0, 4.0), # single-boat (otter) trawl
'GN': (0.5, 2.5), # gillnet / drift net
'SQ': (0.0, 1.0), # squid jigger
'TRAP': (0.3, 1.5), # pot / trap
'PS': (3.0, 6.0), # purse seine
'TRAWL': (2.0, 4.5), # (alias)
'PURSE': (3.0, 6.0), # (alias)
'LONGLINE': (0.5, 2.5),
}
TRANSIT_SOG_MIN = 5.0
ANCHORED_SOG_MAX = 0.5
def classify_vessel_state(sog: float, cog_delta: float = 0.0,
gear_type: str = 'PT') -> str:
"""UCAF: 어구별 상태 분류."""
if sog <= ANCHORED_SOG_MAX:
return 'ANCHORED'
if sog >= TRANSIT_SOG_MIN:
return 'TRANSIT'
sog_min, sog_max = GEAR_SOG_THRESHOLDS.get(gear_type, (1.0, 5.0))
if sog_min <= sog <= sog_max:
return 'FISHING'
return 'UNKNOWN'
def compute_ucaf_score(df_vessel: pd.DataFrame, gear_type: str = 'PT') -> float:
"""UCAF 점수: 어구별 조업 상태 비율 (0~1)."""
if len(df_vessel) == 0:
return 0.0
sog_min, sog_max = GEAR_SOG_THRESHOLDS.get(gear_type, (1.0, 5.0))
in_range = df_vessel['sog'].between(sog_min, sog_max).sum()
return round(in_range / len(df_vessel), 4)
def compute_ucft_score(df_vessel: pd.DataFrame) -> float:
"""UCFT 점수: 조업 vs 항행 이진 신뢰도 (0~1)."""
if len(df_vessel) == 0:
return 0.0
fishing = (df_vessel['sog'].between(0.5, 5.0)).sum()
transit = (df_vessel['sog'] >= TRANSIT_SOG_MIN).sum()
total = fishing + transit
if total == 0:
return 0.0
return round(fishing / total, 4)
def detect_fishing_segments(df_vessel: pd.DataFrame,
window_min: int = 15,
gear_type: str = 'PT') -> list[dict]:
"""연속 조업 구간 추출."""
if len(df_vessel) < 2:
return []
segments: list[dict] = []
in_fishing = False
seg_start_idx = 0
records = df_vessel.to_dict('records')
for i, rec in enumerate(records):
sog = rec.get('sog', 0)
state = classify_vessel_state(sog, gear_type=gear_type)
if state == 'FISHING' and not in_fishing:
in_fishing = True
seg_start_idx = i
elif state != 'FISHING' and in_fishing:
start_ts = records[seg_start_idx].get('timestamp')
end_ts = rec.get('timestamp')
if start_ts and end_ts:
dur_sec = (pd.Timestamp(end_ts) - pd.Timestamp(start_ts)).total_seconds()
dur_min = dur_sec / 60
if dur_min >= window_min:
zone_info = classify_zone(
records[seg_start_idx].get('lat', 0),
records[seg_start_idx].get('lon', 0),
)
segments.append({
'start_idx': seg_start_idx,
'end_idx': i - 1,
'duration_min': round(dur_min, 1),
'zone': zone_info.get('zone', 'UNKNOWN'),
'in_territorial_sea': zone_info.get('zone') == 'TERRITORIAL_SEA',
})
in_fishing = False
# if still fishing at the end of the track, emit the final segment
if in_fishing and len(records) > seg_start_idx:
start_ts = records[seg_start_idx].get('timestamp')
end_ts = records[-1].get('timestamp')
if start_ts and end_ts:
dur_sec = (pd.Timestamp(end_ts) - pd.Timestamp(start_ts)).total_seconds()
dur_min = dur_sec / 60
if dur_min >= window_min:
zone_info = classify_zone(
records[seg_start_idx].get('lat', 0),
records[seg_start_idx].get('lon', 0),
)
segments.append({
'start_idx': seg_start_idx,
'end_idx': len(records) - 1,
'duration_min': round(dur_min, 1),
'zone': zone_info.get('zone', 'UNKNOWN'),
'in_territorial_sea': zone_info.get('zone') == 'TERRITORIAL_SEA',
})
return segments
def detect_trawl_uturn(df_vessel: pd.DataFrame,
uturn_threshold_deg: float = 150.0,
min_uturn_count: int = 3) -> dict:
"""U-turn 왕복 패턴 감지 (저인망 특징)."""
if len(df_vessel) < 2:
return {'uturn_count': 0, 'trawl_suspected': False}
uturn_count = 0
cog_vals = df_vessel['cog'].values
sog_vals = df_vessel['sog'].values
for i in range(1, len(cog_vals)):
delta = abs((cog_vals[i] - cog_vals[i - 1] + 180) % 360 - 180)
if delta >= uturn_threshold_deg and sog_vals[i] < TRANSIT_SOG_MIN:
uturn_count += 1
return {
'uturn_count': uturn_count,
'trawl_suspected': uturn_count >= min_uturn_count,
}
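
A short sketch of the per-gear thresholds in action (the `algorithms.gear_inference` import path is assumed; the file name is not shown in this diff):
```python
import pandas as pd
from algorithms.gear_inference import (  # assumed path
    classify_vessel_state, compute_ucaf_score,
)

# A gillnetter ('GN') at 1.2 kn sits inside its 0.5-2.5 kn fishing band.
print(classify_vessel_state(1.2, gear_type='GN'))  # FISHING
print(classify_vessel_state(6.0, gear_type='GN'))  # TRANSIT  (>= 5.0 kn)
print(classify_vessel_state(0.2, gear_type='GN'))  # ANCHORED (<= 0.5 kn)

track = pd.DataFrame({'sog': [1.0, 1.5, 2.0, 6.5]})
print(compute_ucaf_score(track, gear_type='GN'))   # 0.75 (3 of 4 points in band)
```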

@@ -0,0 +1,177 @@
"""선단(Fleet) 패턴 탐지 — 공간+행동 기반.
단순 공간 근접이 아닌, 협조 운항 패턴(유사 속도/방향/역할)으로 선단을 판별.
- PT 저인망: 2, 3NM 이내, 유사 속도(2~5kn) + 유사 방향(20° 이내)
- PS 선망: 3~5, 2NM 이내, 모선(고속)+조명선(정지)+운반선(저속 대형)
- FC 환적: 2, 0.5NM 이내, 양쪽 저속(2kn 이하)
"""
import logging
from typing import Optional
import numpy as np
import pandas as pd
from algorithms.location import haversine_nm, dist_to_baseline
logger = logging.getLogger(__name__)
def _heading_diff(h1: float, h2: float) -> float:
"""두 방향 사이 최소 각도차 (0~180)."""
d = abs(h1 - h2) % 360
return d if d <= 180 else 360 - d
def detect_fleet_patterns(
vessel_dfs: dict[str, pd.DataFrame],
) -> dict[int, list[dict]]:
"""행동 패턴 기반 선단 탐지.
Returns: {fleet_id: [{mmsi, lat, lon, sog, cog, role, pattern}, ...]}
"""
# take each vessel's latest snapshot
snapshots: list[dict] = []
for mmsi, df in vessel_dfs.items():
if df is None or len(df) == 0:
continue
last = df.iloc[-1]
snapshots.append({
'mmsi': mmsi,
'lat': float(last['lat']),
'lon': float(last['lon']),
'sog': float(last.get('sog', 0)),
'cog': float(last.get('cog', 0)),
})
if len(snapshots) < 2:
return {}
matched: set[str] = set()
fleets: dict[int, list[dict]] = {}
fleet_id = 0
# pass 1: PT trawl pairs (2 vessels, within 3 NM, similar speed/heading)
for i in range(len(snapshots)):
if snapshots[i]['mmsi'] in matched:
continue
a = snapshots[i]
for j in range(i + 1, len(snapshots)):
if snapshots[j]['mmsi'] in matched:
continue
b = snapshots[j]
dist = haversine_nm(a['lat'], a['lon'], b['lat'], b['lon'])
if dist > 3.0:
continue
# both at fishing speed (2-5 kn)
if not (2.0 <= a['sog'] <= 5.0 and 2.0 <= b['sog'] <= 5.0):
continue
# similar speed (difference under 1 kn)
if abs(a['sog'] - b['sog']) >= 1.0:
continue
# similar heading (under 20°)
if _heading_diff(a['cog'], b['cog']) >= 20.0:
continue
fleets[fleet_id] = [
{**a, 'role': 'LEADER', 'pattern': 'TRAWL_PAIR'},
{**b, 'role': 'MEMBER', 'pattern': 'TRAWL_PAIR'},
]
matched.add(a['mmsi'])
matched.add(b['mmsi'])
fleet_id += 1
break
# pass 2: FC transshipment pairs (2 vessels, within 0.5 NM, both slow)
for i in range(len(snapshots)):
if snapshots[i]['mmsi'] in matched:
continue
a = snapshots[i]
for j in range(i + 1, len(snapshots)):
if snapshots[j]['mmsi'] in matched:
continue
b = snapshots[j]
dist = haversine_nm(a['lat'], a['lon'], b['lat'], b['lon'])
if dist > 0.5:
continue
if a['sog'] > 2.0 or b['sog'] > 2.0:
continue
fleets[fleet_id] = [
{**a, 'role': 'LEADER', 'pattern': 'TRANSSHIP'},
{**b, 'role': 'MEMBER', 'pattern': 'TRANSSHIP'},
]
matched.add(a['mmsi'])
matched.add(b['mmsi'])
fleet_id += 1
break
# pass 3: PS purse-seine fleets (3-10 vessels clustered within 2 NM)
unmatched = [s for s in snapshots if s['mmsi'] not in matched]
for anchor in unmatched:
if anchor['mmsi'] in matched:
continue
nearby = []
for other in unmatched:
if other['mmsi'] == anchor['mmsi'] or other['mmsi'] in matched:
continue
dist = haversine_nm(anchor['lat'], anchor['lon'], other['lat'], other['lon'])
if dist <= 2.0:
nearby.append(other)
if len(nearby) < 2: # need 3+ vessels including the anchor
continue
# role split: fast (mothership), stationary (light boat), rest (members)
members = [{**anchor, 'role': 'LEADER', 'pattern': 'PURSE_SEINE'}]
matched.add(anchor['mmsi'])
for n in nearby[:9]: # cap the fleet at 10 vessels
if n['sog'] < 0.5:
role = 'LIGHTING'
else:
role = 'MEMBER'
members.append({**n, 'role': role, 'pattern': 'PURSE_SEINE'})
matched.add(n['mmsi'])
fleets[fleet_id] = members
fleet_id += 1
logger.info('fleet detection: %d fleets found (%d vessels matched)',
len(fleets), len(matched))
return fleets
def assign_fleet_roles(
vessel_dfs: dict[str, pd.DataFrame],
cluster_map: dict[str, int],
) -> dict[str, dict]:
"""선단 역할 할당 — 패턴 매칭 기반.
cluster_map은 파이프라인에서 전달되지만, 여기서는 vessel_dfs로 직접 패턴 탐지.
"""
fleets = detect_fleet_patterns(vessel_dfs)
results: dict[str, dict] = {}
# matched vessels (fleet_id reused as cluster_id)
fleet_mmsis: set[str] = set()
for fid, members in fleets.items():
for m in members:
fleet_mmsis.add(m['mmsi'])
results[m['mmsi']] = {
'cluster_id': fid,
'cluster_size': len(members),
'is_leader': m['role'] == 'LEADER',
'fleet_role': m['role'],
}
# unmatched vessels → NOISE (cluster_id = -1)
for mmsi in vessel_dfs:
if mmsi not in fleet_mmsis:
results[mmsi] = {
'cluster_id': -1,
'cluster_size': 0,
'is_leader': False,
'fleet_role': 'NOISE',
}
return results
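
A sketch of the first detection pass (the `algorithms.fleet_patterns` import path is assumed): two vessels roughly 1 NM apart at matching fishing speed and heading form a trawl pair, while a fast lone vessel stays unmatched:
```python
import pandas as pd
from algorithms.fleet_patterns import detect_fleet_patterns  # assumed path

def snap(lat, lon, sog, cog):
    """Single-row track standing in for a vessel's latest position."""
    return pd.DataFrame([{'lat': lat, 'lon': lon, 'sog': sog, 'cog': cog}])

vessel_dfs = {
    'A': snap(35.000, 129.000, 3.0, 90.0),   # pair member
    'B': snap(35.000, 129.020, 3.4, 95.0),   # ~1 NM east, similar speed/heading
    'C': snap(36.000, 130.000, 12.0, 10.0),  # fast transit, far away
}
fleets = detect_fleet_patterns(vessel_dfs)
print(fleets[0][0]['pattern'], fleets[0][0]['role'])  # TRAWL_PAIR LEADER
```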

@@ -0,0 +1,854 @@
"""어구 그룹 다단계 연관성 분석 — 멀티모델 패턴 추적.
Phase 1: default 모델 1개로 동작 (DB에서 is_active=true 모델 로드).
Phase 2: 글로벌 모델 max 5 병렬 실행.
어구 중심 점수 체계:
- 어구 신호 기준 관측 윈도우 (어구 비활성 FREEZE)
- 선박 shadow 추적 (비활성 활성 전환 보너스)
- 적응형 EMA + streak 자기강화
- 퍼센트 기반 무제한 추적 (50%+)
"""
from __future__ import annotations
import logging
import math
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional
from algorithms.polygon_builder import _get_time_bucket_age
from config import qualified_table
logger = logging.getLogger(__name__)
# ── Constants ─────────────────────────────────────────────────────
_EARTH_RADIUS_NM = 3440.065
_NM_TO_M = 1852.0
CORRELATION_PARAM_MODELS = qualified_table('correlation_param_models')
GEAR_CORRELATION_SCORES = qualified_table('gear_correlation_scores')
GEAR_CORRELATION_RAW_METRICS = qualified_table('gear_correlation_raw_metrics')
# ── Parameter model ───────────────────────────────────────────────
@dataclass
class ModelParams:
"""Full parameter set for a tracking model."""
model_id: int = 1
name: str = 'default'
# EMA
alpha_base: float = 0.30
alpha_min: float = 0.08
alpha_decay_per_streak: float = 0.005
# thresholds
track_threshold: float = 0.50
polygon_threshold: float = 0.70
# metric weights — gear-vessel
w_proximity: float = 0.45
w_visit: float = 0.35
w_activity: float = 0.20
# metric weights — vessel-vessel
w_dtw: float = 0.30
w_sog_corr: float = 0.20
w_heading: float = 0.25
w_prox_vv: float = 0.25
# metric weights — gear-gear
w_prox_persist: float = 0.50
w_drift: float = 0.30
w_signal_sync: float = 0.20
# freeze criteria
group_quiet_ratio: float = 0.30
normal_gap_hours: float = 1.0
# decay
decay_slow: float = 0.025
decay_fast: float = 0.10
stale_hours: float = 6.0
# Shadow
shadow_stay_bonus: float = 0.10
shadow_return_bonus: float = 0.15
# distances
candidate_radius_factor: float = 3.0
proximity_threshold_nm: float = 5.0
visit_threshold_nm: float = 5.0
# night
night_bonus: float = 1.3
# long-term decay
long_decay_days: float = 7.0
@classmethod
def from_db_row(cls, row: dict) -> ModelParams:
"""DB correlation_param_models 행에서 생성."""
params_json = row.get('params', {})
return cls(
model_id=row['id'],
name=row['name'],
**{k: v for k, v in params_json.items() if hasattr(cls, k)},
)
# ── Haversine distance ────────────────────────────────────────────
def _haversine_nm(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
"""Distance between two coordinates (nautical miles)."""
phi1 = math.radians(lat1)
phi2 = math.radians(lat2)
dphi = math.radians(lat2 - lat1)
dlam = math.radians(lon2 - lon1)
a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
return _EARTH_RADIUS_NM * 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
# ── Freeze decision ───────────────────────────────────────────────
def should_freeze(
gear_group_active_ratio: float,
target_last_observed: Optional[datetime],
now: datetime,
params: ModelParams,
) -> tuple[bool, str]:
"""감쇠 적용 여부 판단. 어구 그룹이 기준."""
# 1. 어구 그룹 비활성 → 비교 불가
if gear_group_active_ratio < params.group_quiet_ratio:
return True, 'GROUP_QUIET'
# 2. 개별 부재가 정상 범위
if target_last_observed is not None:
hours_absent = (now - target_last_observed).total_seconds() / 3600
if hours_absent < params.normal_gap_hours:
return True, 'NORMAL_GAP'
return False, 'ACTIVE'
# ── EMA update ────────────────────────────────────────────────────
def update_score(
prev_score: Optional[float],
raw_score: Optional[float],
streak: int,
last_observed: Optional[datetime],
now: datetime,
gear_group_active_ratio: float,
shadow_bonus: float,
params: ModelParams,
) -> tuple[float, int, str]:
"""적응형 EMA 점수 업데이트.
Returns: (new_score, new_streak, state)
"""
# 관측 불가
if raw_score is None:
frz, reason = should_freeze(
gear_group_active_ratio, last_observed, now, params,
)
if frz:
return (prev_score or 0.0), streak, reason
# 실제 이탈 → 감쇠
hours_absent = 0.0
if last_observed is not None:
hours_absent = (now - last_observed).total_seconds() / 3600
decay = params.decay_fast if hours_absent > params.stale_hours else params.decay_slow
return max(0.0, (prev_score or 0.0) - decay), 0, 'SIGNAL_LOSS'
# Shadow 보너스
adjusted = min(1.0, raw_score + shadow_bonus)
# Case 1: 임계값 이상 → streak 보상
if adjusted >= params.track_threshold:
streak += 1
alpha = max(params.alpha_min,
params.alpha_base - streak * params.alpha_decay_per_streak)
if prev_score is None:
return adjusted, streak, 'ACTIVE'
return alpha * adjusted + (1.0 - alpha) * prev_score, streak, 'ACTIVE'
# Case 2: pattern divergence
alpha = params.alpha_base
if prev_score is None:
return adjusted, 0, 'PATTERN_DIVERGE'
return alpha * adjusted + (1.0 - alpha) * prev_score, 0, 'PATTERN_DIVERGE'
# ── Gear-vessel metrics ───────────────────────────────────────────
def _compute_gear_vessel_metrics(
gear_center_lat: float,
gear_center_lon: float,
gear_radius_nm: float,
vessel_track: list[dict],
params: ModelParams,
) -> dict:
"""어구 그룹 중심 vs 선박 궤적 메트릭.
vessel_track: [{lat, lon, sog, cog, timestamp}, ...]
"""
if not vessel_track:
return {'proximity_ratio': 0, 'visit_score': 0, 'activity_sync': 0, 'composite': 0}
threshold_nm = max(gear_radius_nm * 2, params.proximity_threshold_nm)
# 1. proximity_ratio — tiered score by distance band
_PROX_CLOSE_NM = 2.5
_PROX_NEAR_NM = 5.0
_PROX_FAR_NM = 10.0
prox_total = 0.0
for p in vessel_track:
d = _haversine_nm(gear_center_lat, gear_center_lon, p['lat'], p['lon'])
if d < _PROX_CLOSE_NM:
prox_total += 1.0
elif d < _PROX_NEAR_NM:
prox_total += 0.5
elif d < _PROX_FAR_NM:
prox_total += 0.15
proximity_ratio = prox_total / len(vessel_track)
# 2. visit_score — visit pattern (3 NM threshold, normalized at 8 visits)
_VISIT_THRESHOLD_NM = 3.0
_VISIT_MAX = 8.0
in_zone = False
visits = 0
stay_points = 0
consecutive_stay = 0
stay_bonus = 0.0
away_points = 0
for p in vessel_track:
d = _haversine_nm(gear_center_lat, gear_center_lon, p['lat'], p['lon'])
if d < _VISIT_THRESHOLD_NM:
if not in_zone:
visits += 1
in_zone = True
consecutive_stay = 0
stay_points += 1
consecutive_stay += 1
if consecutive_stay >= 3:
stay_bonus += 0.05 # consecutive-stay bonus
else:
in_zone = False
consecutive_stay = 0
away_points += 1
visit_count_norm = min(1.0, visits / _VISIT_MAX) if visits > 0 else 0.0
total = stay_points + away_points
stay_ratio = stay_points / total if total > 0 else 0.0
visit_score = min(1.0, 0.5 * visit_count_norm + 0.5 * stay_ratio + stay_bonus)
# 3. activity_sync — two-tier test (slow fishing + fast fishing)
_MIN_ACTIVITY_POINTS = 6
in_zone_count = 0
activity_total = 0.0
for p in vessel_track:
d = _haversine_nm(gear_center_lat, gear_center_lon, p['lat'], p['lon'])
if d < _PROX_NEAR_NM:
in_zone_count += 1
sog = p.get('sog', 0) or 0
if sog < 3.0:
activity_total += 1.0 # slow fishing (stationary / gear handling)
elif sog <= 7.0:
activity_total += 0.6 # fast fishing (pair trawling / towing)
# else: in transit → 0
activity_sync = (activity_total / in_zone_count) if in_zone_count >= _MIN_ACTIVITY_POINTS else 0.0
# weighted sum
composite = (
params.w_proximity * proximity_ratio
+ params.w_visit * visit_score
+ params.w_activity * activity_sync
)
return {
'proximity_ratio': round(proximity_ratio, 4),
'visit_score': round(visit_score, 4),
'activity_sync': round(activity_sync, 4),
'composite': round(composite, 4),
}
# ── Vessel-vessel metrics ─────────────────────────────────────────
def _compute_vessel_vessel_metrics(
track_a: list[dict],
track_b: list[dict],
params: ModelParams,
) -> dict:
"""두 선박 궤적 간 메트릭."""
from algorithms.track_similarity import (
compute_heading_coherence,
compute_proximity_ratio,
compute_sog_correlation,
compute_track_similarity,
)
if not track_a or not track_b:
return {
'dtw_similarity': 0, 'speed_correlation': 0,
'heading_coherence': 0, 'proximity_ratio': 0, 'composite': 0,
}
# DTW
pts_a = [(p['lat'], p['lon']) for p in track_a]
pts_b = [(p['lat'], p['lon']) for p in track_b]
dtw_sim = compute_track_similarity(pts_a, pts_b)
# SOG correlation
sog_a = [p.get('sog', 0) for p in track_a]
sog_b = [p.get('sog', 0) for p in track_b]
sog_corr = compute_sog_correlation(sog_a, sog_b)
# COG coherence
cog_a = [p.get('cog', 0) for p in track_a]
cog_b = [p.get('cog', 0) for p in track_b]
heading = compute_heading_coherence(cog_a, cog_b)
# proximity ratio
prox = compute_proximity_ratio(pts_a, pts_b, params.proximity_threshold_nm)
composite = (
params.w_dtw * dtw_sim
+ params.w_sog_corr * sog_corr
+ params.w_heading * heading
+ params.w_prox_vv * prox
)
return {
'dtw_similarity': round(dtw_sim, 4),
'speed_correlation': round(sog_corr, 4),
'heading_coherence': round(heading, 4),
'proximity_ratio': round(prox, 4),
'composite': round(composite, 4),
}
# ── Gear-gear metrics ─────────────────────────────────────────────
def _compute_gear_gear_metrics(
center_a: tuple[float, float],
center_b: tuple[float, float],
center_history_a: list[dict],
center_history_b: list[dict],
params: ModelParams,
) -> dict:
"""두 어구 그룹 간 메트릭."""
if not center_history_a or not center_history_b:
return {
'proximity_ratio': 0, 'drift_similarity': 0,
'composite': 0,
}
# 1. proximity persistence — stability of the center-to-center distance
dist_nm = _haversine_nm(center_a[0], center_a[1], center_b[0], center_b[1])
prox_persist = max(0.0, 1.0 - dist_nm / 20.0) # 0 beyond 20 NM
# 2. drift similarity — cosine similarity of the center displacement vectors
drift_sim = 0.0
n = min(len(center_history_a), len(center_history_b))
if n >= 2:
# displacement vectors from the last two points
da_lat = center_history_a[-1].get('lat', 0) - center_history_a[-2].get('lat', 0)
da_lon = center_history_a[-1].get('lon', 0) - center_history_a[-2].get('lon', 0)
db_lat = center_history_b[-1].get('lat', 0) - center_history_b[-2].get('lat', 0)
db_lon = center_history_b[-1].get('lon', 0) - center_history_b[-2].get('lon', 0)
dot = da_lat * db_lat + da_lon * db_lon
mag_a = (da_lat ** 2 + da_lon ** 2) ** 0.5
mag_b = (db_lat ** 2 + db_lon ** 2) ** 0.5
if mag_a > 1e-10 and mag_b > 1e-10:
cos_sim = dot / (mag_a * mag_b)
drift_sim = max(0.0, (cos_sim + 1.0) / 2.0)
composite = (
params.w_prox_persist * prox_persist
+ params.w_drift * drift_sim
)
return {
'proximity_ratio': round(prox_persist, 4),
'drift_similarity': round(drift_sim, 4),
'composite': round(composite, 4),
}
# ── Shadow bonus ──────────────────────────────────────────────────
def compute_shadow_bonus(
vessel_positions_during_inactive: list[dict],
last_known_gear_center: tuple[float, float],
group_radius_nm: float,
params: ModelParams,
) -> tuple[float, bool, bool]:
"""어구 비활성 동안 선박이 어구 근처에 머물렀는지 평가.
Returns: (bonus, stayed_nearby, returned_before_resume)
"""
if not vessel_positions_during_inactive or last_known_gear_center is None:
return 0.0, False, False
gc_lat, gc_lon = last_known_gear_center
threshold_nm = max(group_radius_nm * 2, params.proximity_threshold_nm)
# 1. mean distance
dists = [
_haversine_nm(gc_lat, gc_lon, p['lat'], p['lon'])
for p in vessel_positions_during_inactive
]
avg_dist = sum(dists) / len(dists)
stayed = avg_dist < threshold_nm
# 2. is the last position nearby (return check)
returned = dists[-1] < threshold_nm if dists else False
bonus = 0.0
if stayed:
bonus += params.shadow_stay_bonus
if returned:
bonus += params.shadow_return_bonus
return bonus, stayed, returned
# ── Candidate filtering ───────────────────────────────────────────
def _compute_group_radius(members: list[dict]) -> float:
"""Half the maximum pairwise distance between group members (NM)."""
if len(members) < 2:
return 1.0 # 1 NM minimum
max_dist = 0.0
for i in range(len(members)):
for j in range(i + 1, len(members)):
d = _haversine_nm(
members[i]['lat'], members[i]['lon'],
members[j]['lat'], members[j]['lon'],
)
if d > max_dist:
max_dist = d
return max(1.0, max_dist / 2.0)
def find_candidates(
gear_center_lat: float,
gear_center_lon: float,
group_radius_nm: float,
group_mmsis: set[str],
all_positions: dict[str, dict],
params: ModelParams,
) -> list[str]:
"""어구 그룹 주변 후보 MMSI 필터링."""
search_radius = group_radius_nm * params.candidate_radius_factor
candidates = []
for mmsi, pos in all_positions.items():
if mmsi in group_mmsis:
continue
d = _haversine_nm(gear_center_lat, gear_center_lon, pos['lat'], pos['lon'])
if d < search_radius:
candidates.append(mmsi)
return candidates
# ── Main entry ────────────────────────────────────────────────────
def _get_vessel_track(vessel_store, mmsi: str, hours: int = 6) -> list[dict]:
"""Extract a vessel's last N hours of track from vessel_store (vectorized)."""
df = vessel_store._tracks.get(mmsi)
if df is None or len(df) == 0:
return []
import pandas as pd
now = datetime.now(timezone.utc)
cutoff = now - pd.Timedelta(hours=hours)
ts_col = df['timestamp']
if hasattr(ts_col.dtype, 'tz') and ts_col.dtype.tz is not None:
mask = ts_col >= pd.Timestamp(cutoff)
else:
mask = ts_col >= pd.Timestamp(cutoff.replace(tzinfo=None))
recent = df.loc[mask]
if recent.empty:
return []
# vectorized extraction (instead of iterrows)
lats = recent['lat'].values
lons = recent['lon'].values
sogs = (recent['sog'] if 'sog' in recent.columns
else recent.get('raw_sog', pd.Series(dtype=float))).fillna(0).values
cogs = (recent['cog'] if 'cog' in recent.columns
else pd.Series(0, index=recent.index)).fillna(0).values
timestamps = recent['timestamp'].tolist()
return [
{'lat': float(lats[i]), 'lon': float(lons[i]),
'sog': float(sogs[i]), 'cog': float(cogs[i]), 'timestamp': timestamps[i]}
for i in range(len(lats))
]
def _compute_gear_active_ratio(
gear_members: list[dict],
all_positions: dict[str, dict],
now: datetime,
stale_sec: float = 3600,
) -> float:
"""어구 그룹의 활성 멤버 비율."""
if not gear_members:
return 0.0
active = 0
for m in gear_members:
pos = all_positions.get(m['mmsi'])
if pos is None:
continue
ts = pos.get('timestamp')
if ts is None:
continue
if isinstance(ts, datetime):
last_dt = ts if ts.tzinfo is not None else ts.replace(tzinfo=timezone.utc)
else:
try:
import pandas as pd
last_dt = pd.Timestamp(ts).to_pydatetime()
if last_dt.tzinfo is None:
last_dt = last_dt.replace(tzinfo=timezone.utc)
except Exception:
continue
age = (now - last_dt).total_seconds()
if age < stale_sec:
active += 1
return active / len(gear_members)
def _is_gear_pattern(name: str) -> bool:
"""어구 이름 패턴 판별."""
import re
return bool(re.match(r'^.+_\d+_\d*$', name or ''))
_MAX_CANDIDATES_PER_GROUP = 30 # cap on candidate count (performance guard)
def run_gear_correlation(
vessel_store,
gear_groups: list[dict],
conn,
) -> dict:
"""어구 연관성 분석 메인 실행 (배치 최적화).
Args:
vessel_store: VesselStore 인스턴스
gear_groups: detect_gear_groups() 결과
conn: kcgdb 커넥션
Returns:
{'updated': int, 'models': int, 'raw_inserted': int}
"""
import time as _time
import re as _re
_gear_re = _re.compile(r'^.+_(?=\S*\d)\S+(?:[_ ]\S*)*[_ ]*$|^.+%$|^\d+$')
t0 = _time.time()
now = datetime.now(timezone.utc)
all_positions = vessel_store.get_all_latest_positions()
# load active models
models = _load_active_models(conn)
if not models:
logger.warning('no active correlation models found')
return {'updated': 0, 'models': 0, 'raw_inserted': 0}
# preload all existing scores (bulk, instead of per-row queries)
all_scores = _load_all_scores(conn)
raw_batch: list[tuple] = []
score_batch: list[tuple] = []
total_updated = 0
total_raw = 0
processed_keys: set[tuple] = set() # (model_id, parent_name, sub_cluster_id, target_mmsi)
default_params = models[0]
for gear_group in gear_groups:
parent_name = gear_group['parent_name']
sub_cluster_id = gear_group.get('sub_cluster_id', 0)
members = gear_group['members']
if not members:
continue
# members active within 1 h (for center/radius computation)
display_members = [
m for m in members
if _get_time_bucket_age(m.get('mmsi'), all_positions, now) <= 3600
]
# fallback: if fewer than 2, keep the 2 freshest by time_bucket
if len(display_members) < 2 and len(members) >= 2:
display_members = sorted(
members,
key=lambda m: _get_time_bucket_age(m.get('mmsi'), all_positions, now),
)[:2]
active_members = display_members if len(display_members) >= 2 else members
# group center + radius (from the 1 h-active members)
center_lat = sum(m['lat'] for m in active_members) / len(active_members)
center_lon = sum(m['lon'] for m in active_members) / len(active_members)
group_radius = _compute_group_radius(active_members)
# gear activity ratio
active_ratio = _compute_gear_active_ratio(members, all_positions, now)
# set of group-member MMSIs
group_mmsis = {m['mmsi'] for m in members}
if gear_group.get('parent_mmsi'):
group_mmsis.add(gear_group['parent_mmsi'])
# candidate filtering + size cap
candidates = find_candidates(
center_lat, center_lon, group_radius,
group_mmsis, all_positions, default_params,
)
if not candidates:
continue
if len(candidates) > _MAX_CANDIDATES_PER_GROUP:
# keep only the nearest
candidates.sort(key=lambda m: _haversine_nm(
center_lat, center_lon,
all_positions[m]['lat'], all_positions[m]['lon'],
))
candidates = candidates[:_MAX_CANDIDATES_PER_GROUP]
for target_mmsi in candidates:
target_pos = all_positions.get(target_mmsi)
if target_pos is None:
continue
target_name = target_pos.get('name', '')
target_is_gear = bool(_gear_re.match(target_name or ''))
target_type = 'GEAR_BUOY' if target_is_gear else 'VESSEL'
# compute metrics (plain distance for gear, track-based for vessels)
if target_is_gear:
d = _haversine_nm(center_lat, center_lon,
target_pos['lat'], target_pos['lon'])
prox = max(0.0, 1.0 - d / 20.0)
metrics = {'proximity_ratio': prox, 'composite': prox}
else:
vessel_track = _get_vessel_track(vessel_store, target_mmsi, hours=6)
metrics = _compute_gear_vessel_metrics(
center_lat, center_lon, group_radius,
vessel_track, default_params,
)
# collect raw metrics for batching
raw_batch.append((
now, parent_name, sub_cluster_id, target_mmsi, target_type, target_name,
metrics.get('proximity_ratio'), metrics.get('visit_score'),
metrics.get('activity_sync'), metrics.get('dtw_similarity'),
metrics.get('speed_correlation'), metrics.get('heading_coherence'),
metrics.get('drift_similarity'), False, False, active_ratio,
))
total_raw += 1
# per-model EMA update
for model in models:
if target_is_gear:
composite = metrics.get('proximity_ratio', 0) * model.w_prox_persist
else:
composite = (
model.w_proximity * (metrics.get('proximity_ratio') or 0)
+ model.w_visit * (metrics.get('visit_score') or 0)
+ model.w_activity * (metrics.get('activity_sync') or 0)
)
# look up preloaded scores (no per-row DB queries)
score_key = (model.model_id, parent_name, sub_cluster_id, target_mmsi)
prev = all_scores.get(score_key)
prev_score = prev['current_score'] if prev else None
streak = prev['streak_count'] if prev else 0
last_obs = prev['last_observed_at'] if prev else None
new_score, new_streak, state = update_score(
prev_score, composite, streak,
last_obs, now, active_ratio,
0.0, model,
)
processed_keys.add(score_key)
if new_score >= model.track_threshold or prev is not None:
score_batch.append((
model.model_id, parent_name, sub_cluster_id, target_mmsi,
target_type, target_name,
round(new_score, 6), new_streak, state,
now, now, now,
))
total_updated += 1
# ── Forced decay for vessels that left the search radius ─────────
# Entries present in all_scores but missing from this cycle's candidates:
# the vessel moved entirely outside the search radius (group_radius × 3).
# Ignore the freeze conditions and apply decay_fast → converge to 0 quickly.
for score_key, prev in all_scores.items():
if score_key in processed_keys:
continue
prev_score = prev['current_score']
if prev_score is None or prev_score <= 0:
continue
model_id, parent_name_s, sub_cluster_id_s, target_mmsi_s = score_key
# use that model's decay_fast parameter
model_params = next((m for m in models if m.model_id == model_id), default_params)
new_score = max(0.0, prev_score - model_params.decay_fast)
score_batch.append((
model_id, parent_name_s, sub_cluster_id_s, target_mmsi_s,
prev.get('target_type', 'VESSEL'), prev.get('target_name', ''),
round(new_score, 6), 0, 'OUT_OF_RANGE',
prev.get('last_observed_at', now), now, now,
))
total_updated += 1
# batched DB writes
_batch_insert_raw(conn, raw_batch)
_batch_upsert_scores(conn, score_batch)
conn.commit()
elapsed = round(_time.time() - t0, 2)
logger.info(
'gear correlation internals: %.2fs, %d groups, %d raw, %d scores, %d models',
elapsed, len(gear_groups), total_raw, total_updated, len(models),
)
return {
'updated': total_updated,
'models': len(models),
'raw_inserted': total_raw,
}
# ── DB helpers (batch-optimized) ──────────────────────────────────
def _load_active_models(conn) -> list[ModelParams]:
"""Load active models."""
cur = conn.cursor()
try:
cur.execute(
f"SELECT id, name, params FROM {CORRELATION_PARAM_MODELS} "
"WHERE is_active = TRUE ORDER BY is_default DESC, id ASC"
)
rows = cur.fetchall()
models = []
for row in rows:
import json
params = row[2] if isinstance(row[2], dict) else json.loads(row[2])
models.append(ModelParams.from_db_row({
'id': row[0], 'name': row[1], 'params': params,
}))
return models
except Exception as e:
logger.error('failed to load models: %s', e)
return [ModelParams()]
finally:
cur.close()
def _load_all_scores(conn) -> dict[tuple, dict]:
"""모든 점수를 사전 로드. {(model_id, group_key, sub_cluster_id, target_mmsi): {...}}"""
cur = conn.cursor()
try:
cur.execute(
"SELECT model_id, group_key, sub_cluster_id, target_mmsi, "
"current_score, streak_count, last_observed_at, "
"target_type, target_name "
f"FROM {GEAR_CORRELATION_SCORES}"
)
result = {}
for row in cur.fetchall():
key = (row[0], row[1], row[2], row[3])
result[key] = {
'current_score': row[4],
'streak_count': row[5],
'last_observed_at': row[6],
'target_type': row[7],
'target_name': row[8],
}
return result
except Exception as e:
logger.warning('failed to load all scores: %s', e)
return {}
finally:
cur.close()
def _batch_insert_raw(conn, batch: list[tuple]):
"""raw 메트릭 배치 INSERT."""
if not batch:
return
cur = conn.cursor()
try:
from psycopg2.extras import execute_values
execute_values(
cur,
f"""INSERT INTO {GEAR_CORRELATION_RAW_METRICS}
(observed_at, group_key, sub_cluster_id, target_mmsi, target_type, target_name,
proximity_ratio, visit_score, activity_sync,
dtw_similarity, speed_correlation, heading_coherence,
drift_similarity, shadow_stay, shadow_return,
gear_group_active_ratio)
VALUES %s""",
batch,
page_size=500,
)
except Exception as e:
logger.warning('batch insert raw failed: %s', e)
finally:
cur.close()
def _batch_upsert_scores(conn, batch: list[tuple]):
"""점수 배치 UPSERT."""
if not batch:
return
cur = conn.cursor()
try:
from psycopg2.extras import execute_values
execute_values(
cur,
f"""INSERT INTO {GEAR_CORRELATION_SCORES}
(model_id, group_key, sub_cluster_id, target_mmsi, target_type, target_name,
current_score, streak_count, freeze_state,
first_observed_at, last_observed_at, updated_at)
VALUES %s
ON CONFLICT (model_id, group_key, sub_cluster_id, target_mmsi)
DO UPDATE SET
target_type = EXCLUDED.target_type,
target_name = EXCLUDED.target_name,
current_score = EXCLUDED.current_score,
streak_count = EXCLUDED.streak_count,
freeze_state = EXCLUDED.freeze_state,
observation_count = {GEAR_CORRELATION_SCORES}.observation_count + 1,
last_observed_at = EXCLUDED.last_observed_at,
updated_at = EXCLUDED.updated_at""",
batch,
page_size=500,
)
except Exception as e:
logger.warning('batch upsert scores failed: %s', e)
finally:
cur.close()
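
A sketch of the adaptive-EMA behavior with the default ModelParams (the `algorithms.gear_correlation` import path is assumed, and the module needs `config.qualified_table` importable). Two strong observations raise the score; then a missing observation inside the normal gap window freezes it instead of decaying:
```python
from datetime import datetime, timezone
from algorithms.gear_correlation import ModelParams, update_score  # assumed path

params = ModelParams()  # alpha_base=0.30, track_threshold=0.50, normal_gap_hours=1.0
now = datetime.now(timezone.utc)

score, streak = None, 0
for raw in (0.6, 0.8, None):  # None = target not observed this cycle
    score, streak, state = update_score(
        prev_score=score, raw_score=raw, streak=streak,
        last_observed=now, now=now,
        gear_group_active_ratio=1.0, shadow_bonus=0.0, params=params,
    )
    print(round(score, 3), streak, state)
# 0.6   1 ACTIVE      (first observation seeds the score)
# 0.658 2 ACTIVE      (EMA with alpha = 0.30 - 2*0.005 = 0.29)
# 0.658 2 NORMAL_GAP  (absence < normal_gap_hours -> frozen, no decay)
```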

@@ -0,0 +1,19 @@
"""어구 parent name 정규화/필터 규칙."""
from __future__ import annotations
from typing import Optional
_TRACKABLE_PARENT_MIN_LENGTH = 4
_REMOVE_TOKENS = (' ', '_', '-', '%')
def normalize_parent_name(name: Optional[str]) -> str:
value = (name or '').upper().strip()
for token in _REMOVE_TOKENS:
value = value.replace(token, '')
return value
def is_trackable_parent_name(name: Optional[str]) -> bool:
return len(normalize_parent_name(name)) >= _TRACKABLE_PARENT_MIN_LENGTH
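
For illustration (the `algorithms.gear_name_rules` path matches the import seen in the polygon-builder file below):
```python
from algorithms.gear_name_rules import (
    is_trackable_parent_name, normalize_parent_name,
)

print(normalize_parent_name(' dong-hae_07 %'))  # 'DONGHAE07'
print(is_trackable_parent_name('AB1'))          # False (3 chars after cleanup)
print(is_trackable_parent_name('AB-12'))        # True  ('AB12', length 4)
```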

@@ -0,0 +1,631 @@
"""어구 모선 추론 episode continuity + prior bonus helper."""
from __future__ import annotations
import json
import math
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Iterable, Optional
from uuid import uuid4
from config import qualified_table
GEAR_GROUP_EPISODES = qualified_table('gear_group_episodes')
GEAR_GROUP_EPISODE_SNAPSHOTS = qualified_table('gear_group_episode_snapshots')
GEAR_GROUP_PARENT_CANDIDATE_SNAPSHOTS = qualified_table('gear_group_parent_candidate_snapshots')
GEAR_PARENT_LABEL_SESSIONS = qualified_table('gear_parent_label_sessions')
_ACTIVE_EPISODE_WINDOW_HOURS = 6
_EPISODE_PRIOR_WINDOW_HOURS = 24
_LINEAGE_PRIOR_WINDOW_DAYS = 7
_LABEL_PRIOR_WINDOW_DAYS = 30
_CONTINUITY_SCORE_THRESHOLD = 0.45
_MERGE_SCORE_THRESHOLD = 0.35
_CENTER_DISTANCE_THRESHOLD_NM = 12.0
_EPISODE_PRIOR_MAX = 0.05
_LINEAGE_PRIOR_MAX = 0.03
_LABEL_PRIOR_MAX = 0.07
_TOTAL_PRIOR_CAP = 0.10
def _clamp(value: float, floor: float = 0.0, ceil: float = 1.0) -> float:
return max(floor, min(ceil, value))
def _haversine_nm(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
earth_radius_nm = 3440.065
phi1 = math.radians(lat1)
phi2 = math.radians(lat2)
dphi = math.radians(lat2 - lat1)
dlam = math.radians(lon2 - lon1)
a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
return earth_radius_nm * 2 * math.atan2(math.sqrt(a), math.sqrt(max(0.0, 1 - a)))
def _json_list(value: Any) -> list[str]:
if value is None:
return []
if isinstance(value, list):
return [str(item) for item in value if item]
try:
parsed = json.loads(value)
except Exception:
return []
if isinstance(parsed, list):
return [str(item) for item in parsed if item]
return []
@dataclass
class GroupEpisodeInput:
group_key: str
normalized_parent_name: str
sub_cluster_id: int
member_mmsis: list[str]
member_count: int
center_lat: float
center_lon: float
@property
def key(self) -> tuple[str, int]:
return (self.group_key, self.sub_cluster_id)
@dataclass
class EpisodeState:
episode_id: str
lineage_key: str
group_key: str
normalized_parent_name: str
current_sub_cluster_id: int
member_mmsis: list[str]
member_count: int
center_lat: float
center_lon: float
last_snapshot_time: datetime
status: str
@dataclass
class EpisodeAssignment:
group_key: str
sub_cluster_id: int
normalized_parent_name: str
episode_id: str
continuity_source: str
continuity_score: float
split_from_episode_id: Optional[str]
merged_from_episode_ids: list[str]
member_mmsis: list[str]
member_count: int
center_lat: float
center_lon: float
@property
def key(self) -> tuple[str, int]:
return (self.group_key, self.sub_cluster_id)
@dataclass
class EpisodePlan:
assignments: dict[tuple[str, int], EpisodeAssignment]
expired_episode_ids: set[str]
merged_episode_targets: dict[str, str]
def _member_jaccard(left: Iterable[str], right: Iterable[str]) -> tuple[float, int]:
left_set = {item for item in left if item}
right_set = {item for item in right if item}
if not left_set and not right_set:
return 0.0, 0
overlap = len(left_set & right_set)
union = len(left_set | right_set)
return (overlap / union if union else 0.0), overlap
def continuity_score(current: GroupEpisodeInput, previous: EpisodeState) -> tuple[float, int, float]:
jaccard, overlap_count = _member_jaccard(current.member_mmsis, previous.member_mmsis)
distance_nm = _haversine_nm(current.center_lat, current.center_lon, previous.center_lat, previous.center_lon)
center_support = _clamp(1.0 - (distance_nm / _CENTER_DISTANCE_THRESHOLD_NM))
score = _clamp((0.75 * jaccard) + (0.25 * center_support))
return round(score, 6), overlap_count, round(distance_nm, 3)
def load_active_episode_states(conn, lineage_keys: list[str]) -> dict[str, list[EpisodeState]]:
if not lineage_keys:
return {}
cur = conn.cursor()
try:
cur.execute(
f"""
SELECT episode_id, lineage_key, group_key, normalized_parent_name,
current_sub_cluster_id, current_member_mmsis, current_member_count,
ST_Y(current_center_point) AS center_lat,
ST_X(current_center_point) AS center_lon,
last_snapshot_time, status
FROM {GEAR_GROUP_EPISODES}
WHERE lineage_key = ANY(%s)
AND status = 'ACTIVE'
AND last_snapshot_time >= NOW() - (%s * INTERVAL '1 hour')
ORDER BY lineage_key, last_snapshot_time DESC, episode_id ASC
""",
(lineage_keys, _ACTIVE_EPISODE_WINDOW_HOURS),
)
result: dict[str, list[EpisodeState]] = {}
for row in cur.fetchall():
state = EpisodeState(
episode_id=row[0],
lineage_key=row[1],
group_key=row[2],
normalized_parent_name=row[3],
current_sub_cluster_id=int(row[4] or 0),
member_mmsis=_json_list(row[5]),
member_count=int(row[6] or 0),
center_lat=float(row[7] or 0.0),
center_lon=float(row[8] or 0.0),
last_snapshot_time=row[9],
status=row[10],
)
result.setdefault(state.lineage_key, []).append(state)
return result
finally:
cur.close()
def group_to_episode_input(group: dict[str, Any], normalized_parent_name: str) -> GroupEpisodeInput:
members = group.get('members') or []
member_mmsis = sorted({str(member.get('mmsi')) for member in members if member.get('mmsi')})
member_count = len(member_mmsis)
if members:
center_lat = sum(float(member['lat']) for member in members) / len(members)
center_lon = sum(float(member['lon']) for member in members) / len(members)
else:
center_lat = 0.0
center_lon = 0.0
return GroupEpisodeInput(
group_key=group['parent_name'],
normalized_parent_name=normalized_parent_name,
sub_cluster_id=int(group.get('sub_cluster_id', 0)),
member_mmsis=member_mmsis,
member_count=member_count,
center_lat=center_lat,
center_lon=center_lon,
)
def build_episode_plan(
groups: list[GroupEpisodeInput],
previous_by_lineage: dict[str, list[EpisodeState]],
) -> EpisodePlan:
assignments: dict[tuple[str, int], EpisodeAssignment] = {}
expired_episode_ids: set[str] = set()
merged_episode_targets: dict[str, str] = {}
groups_by_lineage: dict[str, list[GroupEpisodeInput]] = {}
for group in groups:
groups_by_lineage.setdefault(group.normalized_parent_name, []).append(group)
for lineage_key, current_groups in groups_by_lineage.items():
previous_groups = previous_by_lineage.get(lineage_key, [])
qualified_matches: dict[tuple[str, int], list[tuple[EpisodeState, float, int, float]]] = {}
prior_to_currents: dict[str, list[tuple[GroupEpisodeInput, float, int, float]]] = {}
for current in current_groups:
for previous in previous_groups:
score, overlap_count, distance_nm = continuity_score(current, previous)
if score >= _CONTINUITY_SCORE_THRESHOLD or (
overlap_count > 0 and distance_nm <= _CENTER_DISTANCE_THRESHOLD_NM
):
qualified_matches.setdefault(current.key, []).append((previous, score, overlap_count, distance_nm))
prior_to_currents.setdefault(previous.episode_id, []).append((current, score, overlap_count, distance_nm))
consumed_previous_ids: set[str] = set()
assigned_current_keys: set[tuple[str, int]] = set()
for current in current_groups:
matches = sorted(
qualified_matches.get(current.key, []),
key=lambda item: (item[1], item[2], -item[3], item[0].last_snapshot_time),
reverse=True,
)
merge_candidates = [
item for item in matches
if item[1] >= _MERGE_SCORE_THRESHOLD
]
if len(merge_candidates) >= 2:
episode_id = f"ep-{uuid4().hex[:12]}"
merged_ids = [item[0].episode_id for item in merge_candidates]
assignments[current.key] = EpisodeAssignment(
group_key=current.group_key,
sub_cluster_id=current.sub_cluster_id,
normalized_parent_name=current.normalized_parent_name,
episode_id=episode_id,
continuity_source='MERGE_NEW',
continuity_score=round(max(item[1] for item in merge_candidates), 6),
split_from_episode_id=None,
merged_from_episode_ids=merged_ids,
member_mmsis=current.member_mmsis,
member_count=current.member_count,
center_lat=current.center_lat,
center_lon=current.center_lon,
)
assigned_current_keys.add(current.key)
for merged_id in merged_ids:
consumed_previous_ids.add(merged_id)
merged_episode_targets[merged_id] = episode_id
previous_ranked = sorted(
previous_groups,
key=lambda item: item.last_snapshot_time,
reverse=True,
)
for previous in previous_ranked:
if previous.episode_id in consumed_previous_ids:
continue
matches = [
item for item in prior_to_currents.get(previous.episode_id, [])
if item[0].key not in assigned_current_keys
]
if not matches:
continue
matches.sort(key=lambda item: (item[1], item[2], -item[3]), reverse=True)
current, score, _, _ = matches[0]
split_candidate_count = len(prior_to_currents.get(previous.episode_id, []))
assignments[current.key] = EpisodeAssignment(
group_key=current.group_key,
sub_cluster_id=current.sub_cluster_id,
normalized_parent_name=current.normalized_parent_name,
episode_id=previous.episode_id,
continuity_source='SPLIT_CONTINUE' if split_candidate_count > 1 else 'CONTINUED',
continuity_score=score,
split_from_episode_id=None,
merged_from_episode_ids=[],
member_mmsis=current.member_mmsis,
member_count=current.member_count,
center_lat=current.center_lat,
center_lon=current.center_lon,
)
assigned_current_keys.add(current.key)
consumed_previous_ids.add(previous.episode_id)
for current in current_groups:
if current.key in assigned_current_keys:
continue
matches = sorted(
qualified_matches.get(current.key, []),
key=lambda item: (item[1], item[2], -item[3], item[0].last_snapshot_time),
reverse=True,
)
split_from_episode_id = None
continuity_source = 'NEW'
continuity_score_value = 0.0
if matches:
best_previous, score, _, _ = matches[0]
split_from_episode_id = best_previous.episode_id
continuity_source = 'SPLIT_NEW'
continuity_score_value = score
assignments[current.key] = EpisodeAssignment(
group_key=current.group_key,
sub_cluster_id=current.sub_cluster_id,
normalized_parent_name=current.normalized_parent_name,
episode_id=f"ep-{uuid4().hex[:12]}",
continuity_source=continuity_source,
continuity_score=continuity_score_value,
split_from_episode_id=split_from_episode_id,
merged_from_episode_ids=[],
member_mmsis=current.member_mmsis,
member_count=current.member_count,
center_lat=current.center_lat,
center_lon=current.center_lon,
)
assigned_current_keys.add(current.key)
current_previous_ids = {assignment.episode_id for assignment in assignments.values() if assignment.normalized_parent_name == lineage_key}
for previous in previous_groups:
if previous.episode_id in merged_episode_targets:
continue
if previous.episode_id not in current_previous_ids:
expired_episode_ids.add(previous.episode_id)
return EpisodePlan(
assignments=assignments,
expired_episode_ids=expired_episode_ids,
merged_episode_targets=merged_episode_targets,
)
def load_episode_prior_stats(conn, episode_ids: list[str]) -> dict[tuple[str, str], dict[str, Any]]:
if not episode_ids:
return {}
cur = conn.cursor()
try:
cur.execute(
f"""
SELECT episode_id, candidate_mmsi,
COUNT(*) AS seen_count,
SUM(CASE WHEN rank = 1 THEN 1 ELSE 0 END) AS top1_count,
AVG(final_score) AS avg_score,
MAX(observed_at) AS last_seen_at
FROM {GEAR_GROUP_PARENT_CANDIDATE_SNAPSHOTS}
WHERE episode_id = ANY(%s)
AND observed_at >= NOW() - (%s * INTERVAL '1 hour')
GROUP BY episode_id, candidate_mmsi
""",
(episode_ids, _EPISODE_PRIOR_WINDOW_HOURS),
)
result: dict[tuple[str, str], dict[str, Any]] = {}
for episode_id, candidate_mmsi, seen_count, top1_count, avg_score, last_seen_at in cur.fetchall():
result[(episode_id, candidate_mmsi)] = {
'seen_count': int(seen_count or 0),
'top1_count': int(top1_count or 0),
'avg_score': float(avg_score or 0.0),
'last_seen_at': last_seen_at,
}
return result
finally:
cur.close()
def load_lineage_prior_stats(conn, lineage_keys: list[str]) -> dict[tuple[str, str], dict[str, Any]]:
if not lineage_keys:
return {}
cur = conn.cursor()
try:
cur.execute(
f"""
SELECT normalized_parent_name, candidate_mmsi,
COUNT(*) AS seen_count,
SUM(CASE WHEN rank = 1 THEN 1 ELSE 0 END) AS top1_count,
SUM(CASE WHEN rank <= 3 THEN 1 ELSE 0 END) AS top3_count,
AVG(final_score) AS avg_score,
MAX(observed_at) AS last_seen_at
FROM {GEAR_GROUP_PARENT_CANDIDATE_SNAPSHOTS}
WHERE normalized_parent_name = ANY(%s)
AND observed_at >= NOW() - (%s * INTERVAL '1 day')
GROUP BY normalized_parent_name, candidate_mmsi
""",
(lineage_keys, _LINEAGE_PRIOR_WINDOW_DAYS),
)
result: dict[tuple[str, str], dict[str, Any]] = {}
for lineage_key, candidate_mmsi, seen_count, top1_count, top3_count, avg_score, last_seen_at in cur.fetchall():
result[(lineage_key, candidate_mmsi)] = {
'seen_count': int(seen_count or 0),
'top1_count': int(top1_count or 0),
'top3_count': int(top3_count or 0),
'avg_score': float(avg_score or 0.0),
'last_seen_at': last_seen_at,
}
return result
finally:
cur.close()
def load_label_prior_stats(conn, lineage_keys: list[str]) -> dict[tuple[str, str], dict[str, Any]]:
if not lineage_keys:
return {}
cur = conn.cursor()
try:
cur.execute(
f"""
SELECT normalized_parent_name, label_parent_mmsi,
COUNT(*) AS session_count,
MAX(active_from) AS last_labeled_at
FROM {GEAR_PARENT_LABEL_SESSIONS}
WHERE normalized_parent_name = ANY(%s)
AND active_from >= NOW() - (%s * INTERVAL '1 day')
GROUP BY normalized_parent_name, label_parent_mmsi
""",
(lineage_keys, _LABEL_PRIOR_WINDOW_DAYS),
)
result: dict[tuple[str, str], dict[str, Any]] = {}
for lineage_key, candidate_mmsi, session_count, last_labeled_at in cur.fetchall():
result[(lineage_key, candidate_mmsi)] = {
'session_count': int(session_count or 0),
'last_labeled_at': last_labeled_at,
}
return result
finally:
cur.close()
def _recency_support(observed_at: Optional[datetime], now: datetime, hours: float) -> float:
if observed_at is None:
return 0.0
if observed_at.tzinfo is None:
observed_at = observed_at.replace(tzinfo=timezone.utc)
delta_hours = max(0.0, (now - observed_at.astimezone(timezone.utc)).total_seconds() / 3600.0)
return _clamp(1.0 - (delta_hours / hours))
def compute_prior_bonus_components(
observed_at: datetime,
normalized_parent_name: str,
episode_id: str,
candidate_mmsi: str,
episode_prior_stats: dict[tuple[str, str], dict[str, Any]],
lineage_prior_stats: dict[tuple[str, str], dict[str, Any]],
label_prior_stats: dict[tuple[str, str], dict[str, Any]],
) -> dict[str, float]:
episode_stats = episode_prior_stats.get((episode_id, candidate_mmsi), {})
lineage_stats = lineage_prior_stats.get((normalized_parent_name, candidate_mmsi), {})
label_stats = label_prior_stats.get((normalized_parent_name, candidate_mmsi), {})
episode_bonus = 0.0
if episode_stats:
episode_bonus = _EPISODE_PRIOR_MAX * (
0.35 * min(1.0, episode_stats.get('seen_count', 0) / 6.0)
+ 0.35 * min(1.0, episode_stats.get('top1_count', 0) / 3.0)
+ 0.15 * _clamp(float(episode_stats.get('avg_score', 0.0)))
+ 0.15 * _recency_support(episode_stats.get('last_seen_at'), observed_at, _EPISODE_PRIOR_WINDOW_HOURS)
)
lineage_bonus = 0.0
if lineage_stats:
lineage_bonus = _LINEAGE_PRIOR_MAX * (
0.30 * min(1.0, lineage_stats.get('seen_count', 0) / 12.0)
+ 0.25 * min(1.0, lineage_stats.get('top3_count', 0) / 6.0)
+ 0.20 * min(1.0, lineage_stats.get('top1_count', 0) / 3.0)
+ 0.15 * _clamp(float(lineage_stats.get('avg_score', 0.0)))
+ 0.10 * _recency_support(lineage_stats.get('last_seen_at'), observed_at, _LINEAGE_PRIOR_WINDOW_DAYS * 24.0)
)
label_bonus = 0.0
if label_stats:
label_bonus = _LABEL_PRIOR_MAX * (
0.70 * min(1.0, label_stats.get('session_count', 0) / 3.0)
+ 0.30 * _recency_support(label_stats.get('last_labeled_at'), observed_at, _LABEL_PRIOR_WINDOW_DAYS * 24.0)
)
total = min(_TOTAL_PRIOR_CAP, episode_bonus + lineage_bonus + label_bonus)
return {
'episodePriorBonus': round(episode_bonus, 6),
'lineagePriorBonus': round(lineage_bonus, 6),
'labelPriorBonus': round(label_bonus, 6),
'priorBonusTotal': round(total, 6),
}
def sync_episode_states(conn, observed_at: datetime, plan: EpisodePlan) -> None:
cur = conn.cursor()
try:
if plan.expired_episode_ids:
cur.execute(
f"""
UPDATE {GEAR_GROUP_EPISODES}
SET status = 'EXPIRED',
updated_at = %s
WHERE episode_id = ANY(%s)
""",
(observed_at, list(plan.expired_episode_ids)),
)
for previous_episode_id, merged_into_episode_id in plan.merged_episode_targets.items():
cur.execute(
f"""
UPDATE {GEAR_GROUP_EPISODES}
SET status = 'MERGED',
merged_into_episode_id = %s,
updated_at = %s
WHERE episode_id = %s
""",
(merged_into_episode_id, observed_at, previous_episode_id),
)
for assignment in plan.assignments.values():
cur.execute(
f"""
INSERT INTO {GEAR_GROUP_EPISODES} (
episode_id, lineage_key, group_key, normalized_parent_name,
current_sub_cluster_id, status, continuity_source, continuity_score,
first_seen_at, last_seen_at, last_snapshot_time,
current_member_count, current_member_mmsis, current_center_point,
split_from_episode_id, merged_from_episode_ids, metadata, updated_at
) VALUES (
%s, %s, %s, %s,
%s, 'ACTIVE', %s, %s,
%s, %s, %s,
%s, %s::jsonb, ST_SetSRID(ST_MakePoint(%s, %s), 4326),
%s, %s::jsonb, '{{}}'::jsonb, %s
)
ON CONFLICT (episode_id)
DO UPDATE SET
group_key = EXCLUDED.group_key,
normalized_parent_name = EXCLUDED.normalized_parent_name,
current_sub_cluster_id = EXCLUDED.current_sub_cluster_id,
status = 'ACTIVE',
continuity_source = EXCLUDED.continuity_source,
continuity_score = EXCLUDED.continuity_score,
last_seen_at = EXCLUDED.last_seen_at,
last_snapshot_time = EXCLUDED.last_snapshot_time,
current_member_count = EXCLUDED.current_member_count,
current_member_mmsis = EXCLUDED.current_member_mmsis,
current_center_point = EXCLUDED.current_center_point,
split_from_episode_id = COALESCE(EXCLUDED.split_from_episode_id, {GEAR_GROUP_EPISODES}.split_from_episode_id),
merged_from_episode_ids = EXCLUDED.merged_from_episode_ids,
updated_at = EXCLUDED.updated_at
""",
(
assignment.episode_id,
assignment.normalized_parent_name,
assignment.group_key,
assignment.normalized_parent_name,
assignment.sub_cluster_id,
assignment.continuity_source,
assignment.continuity_score,
observed_at,
observed_at,
observed_at,
assignment.member_count,
json.dumps(assignment.member_mmsis, ensure_ascii=False),
assignment.center_lon,
assignment.center_lat,
assignment.split_from_episode_id,
json.dumps(assignment.merged_from_episode_ids, ensure_ascii=False),
observed_at,
),
)
finally:
cur.close()
def insert_episode_snapshots(
conn,
observed_at: datetime,
plan: EpisodePlan,
snapshot_payloads: dict[tuple[str, int], dict[str, Any]],
) -> int:
if not snapshot_payloads:
return 0
rows: list[tuple[Any, ...]] = []
for key, payload in snapshot_payloads.items():
assignment = plan.assignments.get(key)
if assignment is None:
continue
rows.append((
assignment.episode_id,
assignment.normalized_parent_name,
assignment.group_key,
assignment.normalized_parent_name,
assignment.sub_cluster_id,
observed_at,
assignment.member_count,
json.dumps(assignment.member_mmsis, ensure_ascii=False),
assignment.center_lon,
assignment.center_lat,
assignment.continuity_source,
assignment.continuity_score,
json.dumps(payload.get('parentEpisodeIds') or assignment.merged_from_episode_ids, ensure_ascii=False),
payload.get('topCandidateMmsi'),
payload.get('topCandidateScore'),
payload.get('resolutionStatus'),
json.dumps(payload.get('metadata') or {}, ensure_ascii=False),
))
if not rows:
return 0
cur = conn.cursor()
try:
from psycopg2.extras import execute_values
execute_values(
cur,
f"""
INSERT INTO {GEAR_GROUP_EPISODE_SNAPSHOTS} (
episode_id, lineage_key, group_key, normalized_parent_name, sub_cluster_id,
observed_at, member_count, member_mmsis, center_point,
continuity_source, continuity_score, parent_episode_ids,
top_candidate_mmsi, top_candidate_score, resolution_status, metadata
) VALUES %s
ON CONFLICT (episode_id, observed_at) DO NOTHING
""",
rows,
template="(%s, %s, %s, %s, %s, %s, %s, %s::jsonb, ST_SetSRID(ST_MakePoint(%s, %s), 4326), %s, %s, %s::jsonb, %s, %s, %s, %s::jsonb)",
page_size=200,
)
return len(rows)
finally:
cur.close()
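
A sketch of the continuity score that drives the episode plan above (the `algorithms.gear_episodes` import path is assumed; `config.qualified_table` must be importable). Two snapshots sharing 2 of 4 member MMSIs, with centers about 4.9 NM apart, clear the 0.45 continuity threshold:
```python
from datetime import datetime, timezone
from algorithms.gear_episodes import (  # assumed path
    EpisodeState, GroupEpisodeInput, continuity_score,
)

current = GroupEpisodeInput(
    group_key='DONGHAE07', normalized_parent_name='DONGHAE07', sub_cluster_id=0,
    member_mmsis=['1', '2', '3'], member_count=3,
    center_lat=35.0, center_lon=129.0,
)
previous = EpisodeState(
    episode_id='ep-abc123', lineage_key='DONGHAE07', group_key='DONGHAE07',
    normalized_parent_name='DONGHAE07', current_sub_cluster_id=0,
    member_mmsis=['2', '3', '4'], member_count=3,
    center_lat=35.0, center_lon=129.1,
    last_snapshot_time=datetime.now(timezone.utc), status='ACTIVE',
)
score, overlap, dist_nm = continuity_score(current, previous)
# Jaccard 2/4 = 0.5, center_support ~ 1 - 4.9/12 ~ 0.59
print(score, overlap, dist_nm)  # ~0.523 2 ~4.915 -> >= 0.45, episode continues
```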


@@ -0,0 +1,175 @@
from __future__ import annotations
import json
import math
from pathlib import Path
from typing import List, Optional, Tuple
EARTH_RADIUS_NM = 3440.065
TERRITORIAL_SEA_NM = 12.0
CONTIGUOUS_ZONE_NM = 24.0
_baseline_points: Optional[List[Tuple[float, float]]] = None
_zone_polygons: Optional[list] = None
def _load_baseline() -> List[Tuple[float, float]]:
global _baseline_points
if _baseline_points is not None:
return _baseline_points
path = Path(__file__).parent.parent / 'data' / 'korea_baseline.json'
with open(path, 'r') as f:
data = json.load(f)
_baseline_points = [(p['lat'], p['lon']) for p in data['points']]
return _baseline_points
def haversine_nm(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
"""두 좌표 간 거리 (해리)."""
R = EARTH_RADIUS_NM
phi1, phi2 = math.radians(lat1), math.radians(lat2)
dphi = math.radians(lat2 - lat1)
dlam = math.radians(lon2 - lon1)
a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
return R * 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
def dist_to_baseline(vessel_lat: float, vessel_lon: float,
baseline_points: Optional[List[Tuple[float, float]]] = None) -> float:
"""선박 좌표에서 기선까지 최소 거리 (NM)."""
if baseline_points is None:
baseline_points = _load_baseline()
min_dist = float('inf')
for bp_lat, bp_lon in baseline_points:
d = haversine_nm(vessel_lat, vessel_lon, bp_lat, bp_lon)
if d < min_dist:
min_dist = d
return min_dist
def _epsg3857_to_wgs84(x: float, y: float) -> Tuple[float, float]:
"""EPSG:3857 (Web Mercator) → WGS84 변환."""
lon = x / (math.pi * 6378137) * 180
lat = math.atan(math.exp(y / 6378137)) * 360 / math.pi - 90
return lat, lon
def _load_zone_polygons() -> list:
"""특정어업수역 ~Ⅳ GeoJSON 로드 + EPSG:3857→WGS84 변환."""
global _zone_polygons
if _zone_polygons is not None:
return _zone_polygons
zone_dir = Path(__file__).parent.parent / 'data' / 'zones'
zones_meta = [
('ZONE_I', '수역Ⅰ(동해)', ['PS', 'FC'], '특정어업수역Ⅰ.json'),
('ZONE_II', '수역Ⅱ(남해)', ['PT', 'OT', 'GN', 'PS', 'FC'], '특정어업수역Ⅱ.json'),
('ZONE_III', '수역Ⅲ(서남해)', ['PT', 'OT', 'GN', 'PS', 'FC'], '특정어업수역Ⅲ.json'),
('ZONE_IV', '수역Ⅳ(서해)', ['GN', 'PS', 'FC'], '특정어업수역Ⅳ.json'),
]
result = []
for zone_id, name, allowed, filename in zones_meta:
filepath = zone_dir / filename
if not filepath.exists():
continue
with open(filepath, 'r') as f:
data = json.load(f)
multi_coords = data['features'][0]['geometry']['coordinates']
wgs84_polys = []
for poly in multi_coords:
wgs84_rings = []
for ring in poly:
wgs84_rings.append([_epsg3857_to_wgs84(x, y) for x, y in ring])
wgs84_polys.append(wgs84_rings)
result.append({
'id': zone_id, 'name': name, 'allowed': allowed,
'polygons': wgs84_polys,
})
_zone_polygons = result
return result
def _point_in_polygon(lat: float, lon: float, ring: list) -> bool:
"""Ray-casting point-in-polygon."""
n = len(ring)
inside = False
j = n - 1
for i in range(n):
yi, xi = ring[i]
yj, xj = ring[j]
if ((yi > lat) != (yj > lat)) and (lon < (xj - xi) * (lat - yi) / (yj - yi) + xi):
inside = not inside
j = i
return inside
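# Example (illustrative): for the square ring [(0, 0), (0, 1), (1, 1), (1, 0)]
# given as (lat, lon) pairs, _point_in_polygon(0.5, 0.5, ring) is True and
# _point_in_polygon(1.5, 0.5, ring) is False.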
def _point_in_multipolygon(lat: float, lon: float, polygons: list) -> bool:
"""MultiPolygon 내 포함 여부 (외곽 링 in + 내곽 링 hole 제외)."""
for poly in polygons:
outer = poly[0]
if _point_in_polygon(lat, lon, outer):
for hole in poly[1:]:
if _point_in_polygon(lat, lon, hole):
return False
return True
return False
def classify_zone(vessel_lat: float, vessel_lon: float) -> dict:
"""선박 위치 수역 분류 — 특정어업수역 ~Ⅳ 폴리곤 기반."""
zones = _load_zone_polygons()
for z in zones:
if _point_in_multipolygon(vessel_lat, vessel_lon, z['polygons']):
dist = dist_to_baseline(vessel_lat, vessel_lon)
return {
'zone': z['id'],
'zone_name': z['name'],
'allowed_gears': z['allowed'],
'dist_from_baseline_nm': round(dist, 2),
'violation': False,
'alert_level': 'WATCH',
}
dist = dist_to_baseline(vessel_lat, vessel_lon)
if dist <= TERRITORIAL_SEA_NM:
return {
'zone': 'TERRITORIAL_SEA',
'dist_from_baseline_nm': round(dist, 2),
'violation': True,
'alert_level': 'CRITICAL',
}
elif dist <= CONTIGUOUS_ZONE_NM:
return {
'zone': 'CONTIGUOUS_ZONE',
'dist_from_baseline_nm': round(dist, 2),
'violation': False,
'alert_level': 'WATCH',
}
else:
return {
'zone': 'EEZ_OR_BEYOND',
'dist_from_baseline_nm': round(dist, 2),
'violation': False,
'alert_level': 'NORMAL',
}
def bd09_to_wgs84(bd_lat: float, bd_lon: float) -> tuple[float, float]:
"""BD-09 좌표계를 WGS84로 변환."""
x = bd_lon - 0.0065
y = bd_lat - 0.006
z = math.sqrt(x ** 2 + y ** 2) - 0.00002 * math.sin(y * 52.35987756)
theta = math.atan2(y, x) - 0.000003 * math.cos(x * 52.35987756)
gcj_lon = z * math.cos(theta)
gcj_lat = z * math.sin(theta)
wgs_lat = gcj_lat - 0.0023
wgs_lon = gcj_lon - 0.0059
return wgs_lat, wgs_lon
def compute_bd09_offset(lat: float, lon: float) -> float:
"""BD09 좌표와 WGS84 좌표 간 오프셋 (미터)."""
wgs_lat, wgs_lon = bd09_to_wgs84(lat, lon)
dist_nm = haversine_nm(lat, lon, wgs_lat, wgs_lon)
return round(dist_nm * 1852.0, 1) # NM to meters
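if __name__ == '__main__':
    # Illustrative smoke test (a sketch, not part of the analysis pipeline).
    # haversine_nm needs nothing external; classify_zone additionally assumes
    # the data/ GeoJSON and korea_baseline.json files are present.
    print(round(haversine_nm(36.0, 125.0, 37.0, 125.0), 1))  # 1° of latitude ≈ 60.0 NM
    print(classify_zone(34.0, 126.5))  # zone dict; result depends on the data files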


@ -0,0 +1,558 @@
"""선단/어구그룹 폴리곤 생성기.
프론트엔드 FleetClusterLayer.tsx의 어구그룹 탐지 + convexHull/padPolygon 로직을
Python으로 이관한다. Shapely 라이브러리로 폴리곤 생성.
"""
from __future__ import annotations
import logging
import math
import re
from datetime import datetime, timezone
from typing import Optional
from zoneinfo import ZoneInfo
import pandas as pd
from algorithms.gear_name_rules import is_trackable_parent_name
try:
from shapely.geometry import MultiPoint, Point
from shapely import wkt as shapely_wkt
_SHAPELY_AVAILABLE = True
except ImportError:
_SHAPELY_AVAILABLE = False
from algorithms.location import classify_zone
logger = logging.getLogger(__name__)
# 어구 이름 패턴 — _ 필수 (공백만으로는 어구 미판정, fleet_tracker.py와 동일)
GEAR_PATTERN = re.compile(r'^(.+?)_(?=\S*\d)\S+(?:[_ ]\S*)*[_ ]*$|^(\d+)$')
MAX_DIST_DEG = 0.15 # ~10NM
STALE_SEC = 21600 # 6시간 (어구 P75 갭 3.5h, P90 갭 8h 커버) — 그룹 멤버 탐색용
DISPLAY_STALE_SEC = 3600 # 1시간 — 폴리곤 스냅샷 노출 기준 (프론트엔드 초기 로드 minutes=60과 동기화)
# time_bucket(적재시간) 기반 필터링 — AIS 원본 timestamp는 부표 시계 오류로 부정확할 수 있음
FLEET_BUFFER_DEG = 0.02
GEAR_BUFFER_DEG = 0.01
MIN_GEAR_GROUP_SIZE = 2 # 최소 어구 수 (비허가 구역 외)
_KST = ZoneInfo('Asia/Seoul')
def _get_time_bucket_age(mmsi: str, all_positions: dict, now: datetime) -> float:
"""MMSI의 time_bucket 기반 age(초) 반환. 실패 시 inf."""
pos = all_positions.get(mmsi)
tb = pos.get('time_bucket') if pos else None
if tb is None:
return float('inf')
try:
tb_dt = pd.Timestamp(tb)
if tb_dt.tzinfo is None:
tb_dt = tb_dt.tz_localize(_KST).tz_convert(timezone.utc)
return (now - tb_dt.to_pydatetime()).total_seconds()
except Exception:
return float('inf')
# 수역 내 어구 색상, 수역 외 어구 색상
_COLOR_GEAR_IN_ZONE = '#ef4444'
_COLOR_GEAR_OUT_ZONE = '#f97316'
# classify_zone이 수역 내로 판정하는 zone 값 목록
_IN_ZONE_PREFIXES = ('ZONE_',)
def _is_in_zone(zone_info: dict) -> bool:
"""classify_zone 결과가 특정어업수역 내인지 판별."""
zone = zone_info.get('zone', '')
return any(zone.startswith(prefix) for prefix in _IN_ZONE_PREFIXES)
def _cluster_color(seed: int) -> str:
"""프론트 clusterColor(id) 이관 — hsl({(seed * 137) % 360}, 80%, 55%)."""
h = (seed * 137) % 360
return f'hsl({h}, 80%, 55%)'
def compute_area_sq_nm(polygon, center_lat: float) -> float:
"""Shapely Polygon의 면적(degrees²) → 제곱 해리 변환.
위도 1도 ≈ 60 NM, 경도 1도 ≈ 60 × cos(lat) NM
sq_nm = area_deg2 * 60 * 60 * cos(center_lat_rad)
"""
area_deg2 = polygon.area
center_lat_rad = math.radians(center_lat)
sq_nm = area_deg2 * 60.0 * 60.0 * math.cos(center_lat_rad)
return round(sq_nm, 4)
def build_group_polygon(
points: list[tuple[float, float]],
buffer_deg: float,
) -> tuple[Optional[str], Optional[str], float, float, float]:
"""좌표 목록으로 버퍼 폴리곤을 생성한다.
Args:
points: (lon, lat) 좌표 목록 — Shapely (x, y) 순서.
buffer_deg: 버퍼 크기(도).
Returns:
(polygon_wkt, center_wkt, area_sq_nm, center_lat, center_lon)
polygon_wkt/center_wkt: ST_GeomFromText에 사용할 WKT 문자열.
좌표가 없거나 Shapely 미설치 시 (None, None, 0.0, 0.0, 0.0) 반환.
"""
if not _SHAPELY_AVAILABLE:
logger.warning('shapely 미설치 — build_group_polygon 건너뜀')
return None, None, 0.0, 0.0, 0.0
if not points:
return None, None, 0.0, 0.0, 0.0
if len(points) == 1:
geom = Point(points[0]).buffer(buffer_deg)
elif len(points) == 2:
# LineString → buffer로 Polygon 생성
from shapely.geometry import LineString
geom = LineString(points).buffer(buffer_deg)
else:
# 3점 이상 → convex_hull → buffer
geom = MultiPoint(points).convex_hull.buffer(buffer_deg)
# 중심 계산
centroid = geom.centroid
center_lon = centroid.x
center_lat = centroid.y
area_sq_nm = compute_area_sq_nm(geom, center_lat)
polygon_wkt = shapely_wkt.dumps(geom, rounding_precision=6)
center_wkt = f'POINT({center_lon:.6f} {center_lat:.6f})'
return polygon_wkt, center_wkt, area_sq_nm, center_lat, center_lon
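# Example (illustrative): three nearby buoys form a convex hull that is then
# buffered; the returned WKT strings plug straight into
# ST_GeomFromText(%s, 4326) on the DB side.
#   build_group_polygon([(125.0, 36.0), (125.1, 36.0), (125.05, 36.05)], GEAR_BUFFER_DEG)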
def detect_gear_groups(
vessel_store,
now: Optional[datetime] = None,
) -> list[dict]:
"""어구 이름 패턴으로 어구그룹을 탐지한다.
프론트엔드 FleetClusterLayer.tsx gearGroupMap useMemo 로직 이관.
전체 AIS 선박(vessel_store._tracks)에서 어구 패턴을 탐지한다.
Args:
vessel_store: VesselStore — get_all_latest_positions() / get_vessel_info() 사용.
now: 기준 시각 (None이면 UTC now).
Returns:
[{parent_name, parent_mmsi, members: [{mmsi, name, lat, lon, sog, cog}]}]
"""
if now is None:
now = datetime.now(timezone.utc)
# 전체 선박의 최신 위치 가져오기
all_positions = vessel_store.get_all_latest_positions()
# 선박명 → mmsi 맵 (모선 탐색용, 어구 패턴이 아닌 선박만)
# 정규화 키(공백 제거) + 원본 이름 모두 등록
name_to_mmsi: dict[str, str] = {}
for mmsi, pos in all_positions.items():
name = (pos.get('name') or '').strip()
if name and not GEAR_PATTERN.match(name):
name_to_mmsi[name] = mmsi
name_to_mmsi[name.replace(' ', '')] = mmsi
# parent 이름 정규화 — 공백 제거 후 같은 모선은 하나로 통합
def _normalize_parent(raw: str) -> str:
return raw.replace(' ', '')
# 1단계: 같은 모선명 어구 수집 (STALE_SEC=6h 이내만, 공백 정규화)
raw_groups: dict[str, list[dict]] = {}
parent_display: dict[str, str] = {} # normalized → 대표 원본 이름
for mmsi, pos in all_positions.items():
name = (pos.get('name') or '').strip()
if not name:
continue
# staleness 체크
ts = pos.get('timestamp')
if ts is not None:
if isinstance(ts, datetime):
last_dt = ts if ts.tzinfo is not None else ts.replace(tzinfo=timezone.utc)
else:
try:
last_dt = pd.Timestamp(ts).to_pydatetime()
if last_dt.tzinfo is None:
last_dt = last_dt.replace(tzinfo=timezone.utc)
except Exception:
continue
age_sec = (now - last_dt).total_seconds()
if age_sec > STALE_SEC:
continue
m = GEAR_PATTERN.match(name)
if not m:
continue
# 한국 국적 선박(MMSI 440/441)은 어구 AIS 미사용 → 제외
if mmsi.startswith('440') or mmsi.startswith('441'):
continue
parent_raw = (m.group(1) or name).strip()
if not is_trackable_parent_name(parent_raw):
continue
parent_key = _normalize_parent(parent_raw)
# 대표 이름: 공백 없는 버전 우선 (더 정규화된 형태)
if parent_key not in parent_display or ' ' not in parent_raw:
parent_display[parent_key] = parent_raw
entry = {
'mmsi': mmsi,
'name': name,
'lat': pos['lat'],
'lon': pos['lon'],
'sog': pos.get('sog', 0),
'cog': pos.get('cog', 0),
'timestamp': ts,
}
raw_groups.setdefault(parent_key, []).append(entry)
# 2단계: 연결 기반 서브 클러스터링 (각 어구가 클러스터 내 최소 1개와 MAX_DIST_DEG 이내)
# 같은 parent 이름이라도 거리가 먼 어구들은 별도 서브그룹으로 분리
results: list[dict] = []
for parent_key, gears in raw_groups.items():
parent_mmsi = name_to_mmsi.get(parent_key)
display_name = parent_display.get(parent_key, parent_key)
if not gears:
continue
# 모선 위치 (있으면 시드 포인트로 활용)
seed_lat: Optional[float] = None
seed_lon: Optional[float] = None
if parent_mmsi and parent_mmsi in all_positions:
p = all_positions[parent_mmsi]
seed_lat, seed_lon = p['lat'], p['lon']
# 연결 기반 클러스터링 (Union-Find 방식)
n = len(gears)
parent_uf = list(range(n))
def find(x: int) -> int:
while parent_uf[x] != x:
parent_uf[x] = parent_uf[parent_uf[x]]
x = parent_uf[x]
return x
def union(a: int, b: int) -> None:
ra, rb = find(a), find(b)
if ra != rb:
parent_uf[ra] = rb
for i in range(n):
for j in range(i + 1, n):
if (abs(gears[i]['lat'] - gears[j]['lat']) <= MAX_DIST_DEG
and abs(gears[i]['lon'] - gears[j]['lon']) <= MAX_DIST_DEG):
union(i, j)
# 클러스터별 그룹화
clusters: dict[int, list[int]] = {}
for i in range(n):
clusters.setdefault(find(i), []).append(i)
# 모선이 있으면 모선과 가장 가까운 클러스터에 연결 (MAX_DIST_DEG 이내만)
seed_cluster_root: Optional[int] = None
if seed_lat is not None and seed_lon is not None:
best_dist = float('inf')
for root, idxs in clusters.items():
for i in idxs:
d = abs(gears[i]['lat'] - seed_lat) + abs(gears[i]['lon'] - seed_lon)
if d < best_dist:
best_dist = d
seed_cluster_root = root
# 모선이 어느 클러스터와도 MAX_DIST_DEG×2 초과(위경도 합산 거리) → 연결하지 않음
if best_dist > MAX_DIST_DEG * 2:
seed_cluster_root = None
# 클러스터마다 서브그룹 생성 (최소 2개 이상이거나 모선 포함)
for ci, (root, idxs) in enumerate(clusters.items()):
has_seed = (root == seed_cluster_root)
if len(idxs) < 2 and not has_seed:
continue
members = [
{'mmsi': gears[i]['mmsi'], 'name': gears[i]['name'],
'lat': gears[i]['lat'], 'lon': gears[i]['lon'],
'sog': gears[i]['sog'], 'cog': gears[i]['cog']}
for i in idxs
]
# group_key는 항상 원본명 유지, 서브클러스터는 별도 ID로 구분
sub_cluster_id = 0 if len(clusters) == 1 else (ci + 1)
sub_mmsi = parent_mmsi if has_seed else None
results.append({
'parent_name': display_name,
'parent_key': parent_key,
'parent_mmsi': sub_mmsi,
'sub_cluster_id': sub_cluster_id,
'members': members,
})
# 3단계: 동일 parent_key 서브그룹 간 근접 병합 (멤버 간 MAX_DIST_DEG 이내 시)
# prefix 기반 병합은 과도한 그룹화 유발 → 동일 키만 병합
def _groups_nearby(a: dict, b: dict) -> bool:
for ma in a['members']:
for mb in b['members']:
if abs(ma['lat'] - mb['lat']) <= MAX_DIST_DEG and abs(ma['lon'] - mb['lon']) <= MAX_DIST_DEG:
return True
return False
merged: list[dict] = []
skip: set[int] = set()
results.sort(key=lambda g: len(g['members']), reverse=True)
for i, big in enumerate(results):
if i in skip:
continue
for j, small in enumerate(results):
if j <= i or j in skip:
continue
# 동일 parent_key만 병합 (prefix 매칭 제거 — 과도한 병합 방지)
if big['parent_key'] == small['parent_key'] and _groups_nearby(big, small):
existing_mmsis = {m['mmsi'] for m in big['members']}
for m in small['members']:
if m['mmsi'] not in existing_mmsis:
big['members'].append(m)
existing_mmsis.add(m['mmsi'])
if not big['parent_mmsi'] and small['parent_mmsi']:
big['parent_mmsi'] = small['parent_mmsi']
big['sub_cluster_id'] = 0 # 병합됨 → 단일 클러스터
skip.add(j)
del big['parent_key']
merged.append(big)
return merged
def build_all_group_snapshots(
vessel_store,
company_vessels: dict[int, list[str]],
companies: dict[int, dict],
) -> list[dict]:
"""선단(FLEET) + 어구그룹(GEAR) 폴리곤 스냅샷을 생성한다.
Shapely 미설치 시 빈 리스트를 반환한다.
Args:
vessel_store: VesselStore — get_all_latest_positions() / get_vessel_info() 사용.
company_vessels: {company_id: [mmsi_list]}.
companies: {id: {name_cn, name_en}}.
Returns:
DB INSERT용 dict 목록.
"""
if not _SHAPELY_AVAILABLE:
logger.warning('shapely 미설치 — build_all_group_snapshots 빈 리스트 반환')
return []
now = datetime.now(timezone.utc)
snapshots: list[dict] = []
all_positions = vessel_store.get_all_latest_positions()
# ── FLEET 타입: company_vessels 순회 ──────────────────────────
for company_id, mmsi_list in company_vessels.items():
company_info = companies.get(company_id, {})
group_label = company_info.get('name_cn') or company_info.get('name_en') or str(company_id)
# 각 선박의 최신 좌표 추출
points: list[tuple[float, float]] = []
members: list[dict] = []
for mmsi in mmsi_list:
pos = all_positions.get(mmsi)
if not pos:
continue
lat = pos['lat']
lon = pos['lon']
sog = pos.get('sog', 0)
cog = pos.get('cog', 0)
points.append((lon, lat))
members.append({
'mmsi': mmsi,
'name': pos.get('name', ''),
'lat': lat,
'lon': lon,
'sog': sog,
'cog': cog,
'role': 'LEADER' if mmsi == mmsi_list[0] else 'MEMBER',
'isParent': False,
})
newest_age = min(
(_get_time_bucket_age(m['mmsi'], all_positions, now) for m in members),
default=float('inf'),
)
# 2척 미만 또는 최근 적재가 DISPLAY_STALE_SEC 초과 → 폴리곤 미생성
if len(points) < 2 or newest_age > DISPLAY_STALE_SEC:
continue
polygon_wkt, center_wkt, area_sq_nm, center_lat, center_lon = build_group_polygon(
points, FLEET_BUFFER_DEG
)
snapshots.append({
'group_type': 'FLEET',
'group_key': str(company_id),
'group_label': group_label,
'resolution': '1h',
'snapshot_time': now,
'polygon_wkt': polygon_wkt,
'center_wkt': center_wkt,
'area_sq_nm': area_sq_nm,
'member_count': len(members),
'zone_id': None,
'zone_name': None,
'members': members,
'color': _cluster_color(company_id),
})
# ── GEAR 타입: detect_gear_groups 결과 → 1h/6h 듀얼 스냅샷 ────
gear_groups = detect_gear_groups(vessel_store, now=now)
# parent_name 기준 전체 1h 활성 멤버 합산 (서브클러스터 분리 전)
parent_active_1h: dict[str, int] = {}
for group in gear_groups:
pn = group['parent_name']
cnt = sum(
1 for gm in group['members']
if _get_time_bucket_age(gm.get('mmsi'), all_positions, now) <= DISPLAY_STALE_SEC
)
parent_active_1h[pn] = parent_active_1h.get(pn, 0) + cnt
for group in gear_groups:
parent_name: str = group['parent_name']
parent_mmsi: Optional[str] = group['parent_mmsi']
gear_members: list[dict] = group['members'] # 6h STALE 기반 전체 멤버
if not gear_members:
continue
# ── 1h 활성 멤버 필터 (이 서브클러스터 내) ──
active_members_1h = [
gm for gm in gear_members
if _get_time_bucket_age(gm.get('mmsi'), all_positions, now) <= DISPLAY_STALE_SEC
]
# fallback: 서브클러스터 내 1h < 2이면 time_bucket 최신 2개 유지
display_members_1h = active_members_1h
if len(active_members_1h) < 2 and len(gear_members) >= 2:
sorted_by_age = sorted(
gear_members,
key=lambda gm: _get_time_bucket_age(gm.get('mmsi'), all_positions, now),
)
display_members_1h = sorted_by_age[:2]
# ── 6h 전체 멤버 노출 조건: 최신 적재가 STALE_SEC 이내 ──
newest_age_6h = min(
(_get_time_bucket_age(gm.get('mmsi'), all_positions, now) for gm in gear_members),
default=float('inf'),
)
display_members_6h = gear_members
# ── resolution별 스냅샷 생성 ──
# 1h-fb: parent_name 전체 1h 활성 < 2 → 리플레이/일치율 추적용, 라이브 현황에서 제외
# parent_name 전체 기준으로 판단 (서브클러스터 분리로 개별 멤버가 적어져도 그룹 전체가 활성이면 1h)
res_1h = '1h' if parent_active_1h.get(parent_name, 0) >= 2 else '1h-fb'
for resolution, members_for_snap in [(res_1h, display_members_1h), ('6h', display_members_6h)]:
if len(members_for_snap) < 2:
continue
# 6h: 최신 적재가 STALE_SEC(6h) 초과 시 스킵
if resolution == '6h' and newest_age_6h > STALE_SEC:
continue
# 수역 분류: anchor(모선 or 첫 멤버) 위치 기준
anchor_lat: Optional[float] = None
anchor_lon: Optional[float] = None
if parent_mmsi and parent_mmsi in all_positions:
parent_pos = all_positions[parent_mmsi]
anchor_lat = parent_pos['lat']
anchor_lon = parent_pos['lon']
if anchor_lat is None and members_for_snap:
anchor_lat = members_for_snap[0]['lat']
anchor_lon = members_for_snap[0]['lon']
if anchor_lat is None:
continue
zone_info = classify_zone(float(anchor_lat), float(anchor_lon))
in_zone = _is_in_zone(zone_info)
zone_id = zone_info.get('zone') if in_zone else None
zone_name = zone_info.get('zone_name') if in_zone else None
# 비허가(수역 외) 어구: MIN_GEAR_GROUP_SIZE 미만 제외
if not in_zone and len(members_for_snap) < MIN_GEAR_GROUP_SIZE:
continue
# 폴리곤 points: 멤버 좌표 + 모선 좌표 (근접 시에만)
points = [(g['lon'], g['lat']) for g in members_for_snap]
parent_nearby = False
if parent_mmsi and parent_mmsi in all_positions:
parent_pos = all_positions[parent_mmsi]
p_lon, p_lat = parent_pos['lon'], parent_pos['lat']
if any(abs(g['lat'] - p_lat) <= MAX_DIST_DEG * 2
and abs(g['lon'] - p_lon) <= MAX_DIST_DEG * 2 for g in members_for_snap):
if (p_lon, p_lat) not in points:
points.append((p_lon, p_lat))
parent_nearby = True
polygon_wkt, center_wkt, area_sq_nm, _clat, _clon = build_group_polygon(
points, GEAR_BUFFER_DEG
)
# members JSONB 구성
members_out: list[dict] = []
if parent_nearby and parent_mmsi and parent_mmsi in all_positions:
parent_pos = all_positions[parent_mmsi]
members_out.append({
'mmsi': parent_mmsi,
'name': parent_name,
'lat': parent_pos['lat'],
'lon': parent_pos['lon'],
'sog': parent_pos.get('sog', 0),
'cog': parent_pos.get('cog', 0),
'role': 'PARENT',
'isParent': True,
})
for g in members_for_snap:
members_out.append({
'mmsi': g['mmsi'],
'name': g['name'],
'lat': g['lat'],
'lon': g['lon'],
'sog': g['sog'],
'cog': g['cog'],
'role': 'GEAR',
'isParent': False,
})
color = _COLOR_GEAR_IN_ZONE if in_zone else _COLOR_GEAR_OUT_ZONE
snapshots.append({
'group_type': 'GEAR_IN_ZONE' if in_zone else 'GEAR_OUT_ZONE',
'group_key': parent_name,
'group_label': parent_name,
'sub_cluster_id': group.get('sub_cluster_id', 0),
'resolution': resolution,
'snapshot_time': now,
'polygon_wkt': polygon_wkt,
'center_wkt': center_wkt,
'area_sq_nm': area_sq_nm,
'member_count': len(members_out),
'zone_id': zone_id,
'zone_name': zone_name,
'members': members_out,
'color': color,
})
return snapshots
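# Scheduler-side usage sketch (illustrative; the kcgdb insert helper named
# below is an assumption, not a confirmed adapter API):
#   snaps = build_all_group_snapshots(vessel_store, company_vessels, companies)
#   kcgdb.insert_group_snapshots(snaps)  # hypothetical bulk-insert call
# Each dict in `snaps` carries polygon_wkt/center_wkt ready for
# ST_GeomFromText and a members list for the JSONB column.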


@ -0,0 +1,126 @@
from typing import Optional, Tuple
import pandas as pd
from algorithms.location import classify_zone
from algorithms.fishing_pattern import detect_fishing_segments, detect_trawl_uturn
from algorithms.dark_vessel import detect_ais_gaps
from algorithms.spoofing import detect_teleportation
def compute_lightweight_risk_score(
zone_info: dict,
sog: float,
is_permitted: Optional[bool] = None,
) -> Tuple[int, str]:
"""위치·허가 이력 기반 경량 위험도 (파이프라인 미통과 선박용).
compute_vessel_risk_score의 1(위치)+4(허가) 로직과 동일.
Returns: (risk_score, risk_level)
"""
score = 0
# 1. 위치 기반 (최대 40점)
zone = zone_info.get('zone', '')
if zone == 'TERRITORIAL_SEA':
score += 40
elif zone == 'CONTIGUOUS_ZONE':
score += 10
elif zone.startswith('ZONE_'):
if is_permitted is not None and not is_permitted:
score += 25
# 4. 허가 이력 (최대 20점)
if is_permitted is not None and not is_permitted:
score += 20
score = min(score, 100)
if score >= 70:
level = 'CRITICAL'
elif score >= 50:
level = 'HIGH'
elif score >= 30:
level = 'MEDIUM'
else:
level = 'LOW'
return score, level
def compute_vessel_risk_score(
mmsi: str,
df_vessel: pd.DataFrame,
zone_info: Optional[dict] = None,
is_permitted: Optional[bool] = None,
) -> Tuple[int, str]:
"""선박별 종합 위반 위험도 (0~100점).
Returns: (risk_score, risk_level)
"""
if len(df_vessel) == 0:
return 0, 'LOW'
score = 0
# 1. 위치 기반 (최대 40점)
if zone_info is None:
last = df_vessel.iloc[-1]
zone_info = classify_zone(last['lat'], last['lon'])
zone = zone_info.get('zone', '')
if zone == 'TERRITORIAL_SEA':
score += 40
elif zone == 'CONTIGUOUS_ZONE':
score += 10
elif zone.startswith('ZONE_'):
# 특정어업수역 내 — 무허가면 가산
if is_permitted is not None and not is_permitted:
score += 25
# 2. 조업 행위 (최대 30점)
segs = detect_fishing_segments(df_vessel)
ts_fishing = [s for s in segs if s.get('in_territorial_sea')]
if ts_fishing:
score += 20
elif segs:
score += 5
uturn = detect_trawl_uturn(df_vessel)
if uturn.get('trawl_suspected'):
score += 10
# 3. AIS 조작 (최대 35점)
teleports = detect_teleportation(df_vessel)
if teleports:
score += 20
from algorithms.spoofing import count_speed_jumps
jumps = count_speed_jumps(df_vessel)
if jumps >= 3:
score += 10
elif jumps >= 1:
score += 5
gaps = detect_ais_gaps(df_vessel)
critical_gaps = [g for g in gaps if g['gap_min'] >= 60]
if critical_gaps:
score += 15
elif gaps:
score += 5
# 4. 허가 이력 (최대 20점)
if is_permitted is not None and not is_permitted:
score += 20
score = min(score, 100)
if score >= 70:
level = 'CRITICAL'
elif score >= 50:
level = 'HIGH'
elif score >= 30:
level = 'MEDIUM'
else:
level = 'LOW'
return score, level
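if __name__ == '__main__':
    # Illustrative check (sketch): an unpermitted vessel inside the
    # territorial sea scores 40 (zone) + 20 (permit) = 60 → HIGH.
    score, level = compute_lightweight_risk_score(
        {'zone': 'TERRITORIAL_SEA'}, sog=0.0, is_permitted=False
    )
    assert (score, level) == (60, 'HIGH'), (score, level)
    print(score, level)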


@ -0,0 +1,82 @@
import pandas as pd
from algorithms.location import haversine_nm, bd09_to_wgs84, compute_bd09_offset # noqa: F401
MAX_FISHING_SPEED_KNOTS = 25.0
def detect_teleportation(df_vessel: pd.DataFrame,
max_speed_knots: float = MAX_FISHING_SPEED_KNOTS) -> list[dict]:
"""연속 AIS 포인트 간 물리적 불가능 이동 탐지."""
if len(df_vessel) < 2:
return []
anomalies = []
records = df_vessel.sort_values('timestamp').to_dict('records')
for i in range(1, len(records)):
prev, curr = records[i - 1], records[i]
dist_nm = haversine_nm(prev['lat'], prev['lon'], curr['lat'], curr['lon'])
dt_hours = (
pd.Timestamp(curr['timestamp']) - pd.Timestamp(prev['timestamp'])
).total_seconds() / 3600
if dt_hours <= 0:
continue
implied_speed = dist_nm / dt_hours
if implied_speed > max_speed_knots:
anomalies.append({
'idx': i,
'dist_nm': round(dist_nm, 2),
'implied_kn': round(implied_speed, 1),
'type': 'TELEPORTATION',
'confidence': 'HIGH' if implied_speed > 50 else 'MED',
})
return anomalies
def count_speed_jumps(df_vessel: pd.DataFrame, threshold_knots: float = 10.0) -> int:
"""연속 SOG 급변 횟수."""
if len(df_vessel) < 2:
return 0
sog = df_vessel['sog'].values
jumps = 0
for i in range(1, len(sog)):
if abs(sog[i] - sog[i - 1]) > threshold_knots:
jumps += 1
return jumps
def compute_spoofing_score(df_vessel: pd.DataFrame) -> float:
"""종합 GPS 스푸핑 점수 (0~1)."""
if len(df_vessel) < 2:
return 0.0
score = 0.0
n = len(df_vessel)
# 순간이동 비율
teleports = detect_teleportation(df_vessel)
if teleports:
score += min(0.4, len(teleports) / n * 10)
# SOG 급변 비율
jumps = count_speed_jumps(df_vessel)
if jumps > 0:
score += min(0.3, jumps / n * 5)
# BD09 오프셋 — 중국 선박(412*)은 좌표계 차이로 항상 ~300m이므로 제외
mmsi_str = str(df_vessel.iloc[0].get('mmsi', '')) if 'mmsi' in df_vessel.columns else ''
if not mmsi_str.startswith('412'):
mid_idx = len(df_vessel) // 2
row = df_vessel.iloc[mid_idx]
offset = compute_bd09_offset(row['lat'], row['lon'])
if offset > 300:
score += 0.3
elif offset > 100:
score += 0.1
return round(min(score, 1.0), 4)
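if __name__ == '__main__':
    # Illustrative check (sketch): two fixes 1° of latitude (~60 NM) apart
    # within one hour imply ~60 kn > MAX_FISHING_SPEED_KNOTS → flagged.
    df = pd.DataFrame({
        'mmsi': ['412000001', '412000001'],
        'timestamp': pd.to_datetime(['2026-04-07 00:00:00', '2026-04-07 01:00:00']),
        'lat': [36.0, 37.0],
        'lon': [125.0, 125.0],
        'sog': [0.0, 0.0],
    })
    print(detect_teleportation(df))  # one HIGH-confidence TELEPORTATION anomaly
    print(count_speed_jumps(df))     # 0 — SOG is flat in this toy track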


@ -0,0 +1,394 @@
"""궤적 유사도 — 시간 정렬 쌍 비교 + DTW(레거시) 지원."""
import math
from typing import Optional
_MAX_RESAMPLE_POINTS = 50
_TEMPORAL_INTERVAL_MS = 300_000 # 5분
_MAX_GAP_MS = 14_400_000 # 4시간 — 보간 상한 (어구 간헐 수신 허용)
_DECAY_DIST_M = 3000.0 # 지수 감쇠 기준거리 (3km)
_COG_PENALTY_THRESHOLD_DEG = 45.0 # COG 차이 페널티 임계
_COG_PENALTY_FACTOR = 1.5 # COG 페널티 배수
def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
"""두 좌표 간 거리 (미터)."""
R = 6371000
phi1, phi2 = math.radians(lat1), math.radians(lat2)
dphi = math.radians(lat2 - lat1)
dlam = math.radians(lon2 - lon1)
a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
return R * 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
def _resample(track: list[tuple[float, float]], n: int) -> list[tuple[float, float]]:
"""궤적을 n 포인트로 균등 리샘플링 (선형 보간)."""
if len(track) == 0:
return []
if len(track) == 1:
return [track[0]] * n
if len(track) <= n:
return list(track)
# 누적 거리 계산
cumulative = [0.0]
for i in range(1, len(track)):
d = haversine_m(track[i - 1][0], track[i - 1][1], track[i][0], track[i][1])
cumulative.append(cumulative[-1] + d)
total_dist = cumulative[-1]
if total_dist == 0.0:
return [track[0]] * n
step = total_dist / (n - 1)
result: list[tuple[float, float]] = []
seg = 0
for k in range(n):
target = step * k
# 해당 target 거리에 해당하는 선분 찾기
while seg < len(cumulative) - 2 and cumulative[seg + 1] < target:
seg += 1
seg_len = cumulative[seg + 1] - cumulative[seg]
if seg_len == 0.0:
result.append(track[seg])
else:
t = (target - cumulative[seg]) / seg_len
lat = track[seg][0] + t * (track[seg + 1][0] - track[seg][0])
lon = track[seg][1] + t * (track[seg + 1][1] - track[seg][1])
result.append((lat, lon))
return result
def _dtw_distance(
track_a: list[tuple[float, float]],
track_b: list[tuple[float, float]],
) -> float:
"""두 궤적 간 DTW 거리 (미터 단위 평균 거리)."""
n, m = len(track_a), len(track_b)
if n == 0 or m == 0:
return float('inf')
INF = float('inf')
# 1D 롤링 DP (공간 최적화): dp_prev[0]만 0.0, 나머지는 INF로 시작
dp_prev = [INF] * (m + 1)
dp_prev[0] = 0.0
dp_curr = [INF] * (m + 1)
for i in range(1, n + 1):
dp_curr[0] = INF
for j in range(1, m + 1):
cost = haversine_m(track_a[i - 1][0], track_a[i - 1][1],
track_b[j - 1][0], track_b[j - 1][1])
min_prev = min(dp_curr[j - 1], dp_prev[j], dp_prev[j - 1])
dp_curr[j] = cost + min_prev
dp_prev, dp_curr = dp_curr, [INF] * (m + 1)
# dp_prev는 마지막으로 계산된 행
total = dp_prev[m]
if total == INF:
return INF
return total / (n + m)
# ── 시간 정렬 리샘플 (v2) ─────────────────────────────────────
def _resample_temporal(
track: list[dict],
interval_ms: int = _TEMPORAL_INTERVAL_MS,
max_gap_ms: int = _MAX_GAP_MS,
) -> list[Optional[dict]]:
"""타임스탬프 기반 등간격 리샘플. 갭 > max_gap_ms인 슬롯은 None.
입력: [{lat, lon, ts(epoch_ms), cog?}, ...] (ts 정렬 필수 아님)
반환: [dict | None, ...] — 5분 간격 슬롯. None = 보간 불가 구간.
"""
if not track:
return []
sorted_pts = sorted(track, key=lambda p: p['ts'])
if len(sorted_pts) < 2:
return [sorted_pts[0]]
t_start = sorted_pts[0]['ts']
t_end = sorted_pts[-1]['ts']
if t_end <= t_start:
return [sorted_pts[0]]
slots: list[Optional[dict]] = []
seg_idx = 0
# 절대 시간 경계로 정렬 (epoch 기준 interval_ms 배수)
t = (t_start // interval_ms) * interval_ms
while t <= t_end:
# seg_idx를 t가 속하는 구간까지 전진
while seg_idx < len(sorted_pts) - 2 and sorted_pts[seg_idx + 1]['ts'] < t:
seg_idx += 1
p0 = sorted_pts[seg_idx]
p1 = sorted_pts[min(seg_idx + 1, len(sorted_pts) - 1)]
gap = p1['ts'] - p0['ts']
if gap > max_gap_ms or gap <= 0:
# 갭이 너무 크거나 동일 시점 → 보간 불가
if abs(t - p0['ts']) < interval_ms:
slots.append(p0)
else:
slots.append(None)
else:
ratio = (t - p0['ts']) / gap
ratio = max(0.0, min(1.0, ratio))
lat = p0['lat'] + ratio * (p1['lat'] - p0['lat'])
lon = p0['lon'] + ratio * (p1['lon'] - p0['lon'])
cog0 = p0.get('cog')
cog1 = p1.get('cog')
cog = None
if cog0 is not None and cog1 is not None:
# 원형 보간
diff = (cog1 - cog0 + 540) % 360 - 180
cog = (cog0 + ratio * diff) % 360
slots.append({'lat': lat, 'lon': lon, 'ts': t, 'cog': cog})
t += interval_ms
return slots
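# Example (illustrative): two fixes 10 minutes apart resampled at the default
# 5-minute interval produce 3 slots — both endpoints plus one linearly
# interpolated midpoint. A gap wider than max_gap_ms yields None slots instead.
#   _resample_temporal([{'lat': 36.0, 'lon': 125.0, 'ts': 0},
#                       {'lat': 36.1, 'lon': 125.0, 'ts': 600_000}])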
def _angular_diff(a: float, b: float) -> float:
"""두 각도의 최소 차이 (0~180)."""
diff = abs(a - b) % 360
return min(diff, 360 - diff)
def compute_track_similarity_v2(
track_a: list[dict],
track_b: list[dict],
interval_ms: int = _TEMPORAL_INTERVAL_MS,
max_gap_ms: int = _MAX_GAP_MS,
) -> float:
"""시간 정렬 기반 궤적 유사도 (0~1).
입력: [{lat, lon, ts(epoch_ms), cog?}, ...]
- 5분 간격으로 양쪽 리샘플
- 동일 시각 슬롯만 쌍으로 비교
- 거리: haversine + COG 페널티
- 점수: exp(-avg_dist / 3000)
"""
if not track_a or not track_b:
return 0.0
slots_a = _resample_temporal(track_a, interval_ms, max_gap_ms)
slots_b = _resample_temporal(track_b, interval_ms, max_gap_ms)
# 시간 범위 정렬: 공통 구간만 비교
if not slots_a or not slots_b:
return 0.0
first_a = next((s for s in slots_a if s is not None), None)
first_b = next((s for s in slots_b if s is not None), None)
if first_a is None or first_b is None:
return 0.0
# 양쪽의 시작/끝 시간
t_start_a = first_a['ts']
t_start_b = first_b['ts']
t_start = max(t_start_a, t_start_b)
last_a = next((s for s in reversed(slots_a) if s is not None), None)
last_b = next((s for s in reversed(slots_b) if s is not None), None)
if last_a is None or last_b is None:
return 0.0
t_end = min(last_a['ts'], last_b['ts'])
if t_end <= t_start:
return 0.0
# 인덱스 매핑 (각 슬롯의 ts → 슬롯)
map_a: dict[int, dict] = {}
for s in slots_a:
if s is not None:
map_a[s['ts']] = s
map_b: dict[int, dict] = {}
for s in slots_b:
if s is not None:
map_b[s['ts']] = s
total_dist = 0.0
count = 0
t = t_start
while t <= t_end:
# 가장 가까운 슬롯 찾기 (interval 반경 내)
sa = map_a.get(t)
sb = map_b.get(t)
if sa is not None and sb is not None:
dist = haversine_m(sa['lat'], sa['lon'], sb['lat'], sb['lon'])
# COG 페널티
if sa.get('cog') is not None and sb.get('cog') is not None:
cog_diff = _angular_diff(sa['cog'], sb['cog'])
if cog_diff > _COG_PENALTY_THRESHOLD_DEG:
dist *= _COG_PENALTY_FACTOR
total_dist += dist
count += 1
t += interval_ms
if count < 3:
return 0.0
avg_dist = total_dist / count
return math.exp(-avg_dist / _DECAY_DIST_M)
def compute_track_similarity(
track_a: list[tuple[float, float]],
track_b: list[tuple[float, float]],
max_dist_m: float = 10000.0,
) -> float:
"""두 궤적의 DTW 거리 기반 유사도 (0~1).
track이 비어있으면 0.0 반환.
유사할수록 1.0 가까움.
"""
if not track_a or not track_b:
return 0.0
a = _resample(track_a, _MAX_RESAMPLE_POINTS)
b = _resample(track_b, _MAX_RESAMPLE_POINTS)
avg_dist = _dtw_distance(a, b)
if avg_dist == float('inf') or max_dist_m <= 0.0:
return 0.0
similarity = 1.0 - (avg_dist / max_dist_m)
return max(0.0, min(1.0, similarity))
def match_gear_by_track(
gear_tracks: dict[str, list[tuple[float, float]]],
vessel_tracks: dict[str, list[tuple[float, float]]],
threshold: float = 0.6,
) -> list[dict]:
"""어구 궤적을 선단 선박 궤적과 비교하여 매칭.
Args:
gear_tracks: mmsi [(lat, lon), ...] 어구 궤적
vessel_tracks: mmsi [(lat, lon), ...] 선박 궤적
threshold: 유사도 하한 (이 값 이상이면 매칭)
Returns:
[{gear_mmsi, vessel_mmsi, similarity, match_method: 'TRACK_SIMILAR'}]
"""
results: list[dict] = []
for gear_mmsi, g_track in gear_tracks.items():
if not g_track:
continue
best_mmsi: str | None = None
best_sim = -1.0
for vessel_mmsi, v_track in vessel_tracks.items():
if not v_track:
continue
sim = compute_track_similarity(g_track, v_track)
if sim > best_sim:
best_sim = sim
best_mmsi = vessel_mmsi
if best_mmsi is not None and best_sim >= threshold:
results.append({
'gear_mmsi': gear_mmsi,
'vessel_mmsi': best_mmsi,
'similarity': best_sim,
'match_method': 'TRACK_SIMILAR',
})
return results
def compute_sog_correlation(
sog_a: list[float],
sog_b: list[float],
) -> float:
"""두 SOG 시계열의 피어슨 상관계수 (0~1 정규화).
시계열 길이가 다르면 짧은 기준으로 자름.
데이터 부족(< 3)이면 0.0 반환.
"""
n = min(len(sog_a), len(sog_b))
if n < 3:
return 0.0
a = sog_a[:n]
b = sog_b[:n]
mean_a = sum(a) / n
mean_b = sum(b) / n
cov = sum((a[i] - mean_a) * (b[i] - mean_b) for i in range(n))
var_a = sum((x - mean_a) ** 2 for x in a)
var_b = sum((x - mean_b) ** 2 for x in b)
denom = (var_a * var_b) ** 0.5
if denom < 1e-12:
return 0.0
corr = cov / denom # -1 ~ 1
return max(0.0, (corr + 1.0) / 2.0) # 0 ~ 1 정규화
def compute_heading_coherence(
cog_a: list[float],
cog_b: list[float],
threshold_deg: float = 30.0,
) -> float:
"""두 COG 시계열의 방향 동조율 (0~1).
angular diff < threshold_deg 비율.
시계열 길이가 다르면 짧은 기준.
데이터 부족(< 3)이면 0.0 반환.
"""
n = min(len(cog_a), len(cog_b))
if n < 3:
return 0.0
coherent = 0
for i in range(n):
diff = abs(cog_a[i] - cog_b[i])
if diff > 180.0:
diff = 360.0 - diff
if diff < threshold_deg:
coherent += 1
return coherent / n
def compute_proximity_ratio(
track_a: list[tuple[float, float]],
track_b: list[tuple[float, float]],
threshold_nm: float = 10.0,
) -> float:
"""두 궤적의 근접 지속비 (0~1).
시간 정렬된 포인트 쌍에서 haversine < threshold_nm 비율.
시계열 길이가 다르면 짧은 기준.
데이터 부족(< 2)이면 0.0 반환.
"""
n = min(len(track_a), len(track_b))
if n < 2:
return 0.0
close = 0
threshold_m = threshold_nm * 1852.0
for i in range(n):
dist = haversine_m(track_a[i][0], track_a[i][1],
track_b[i][0], track_b[i][1])
if dist < threshold_m:
close += 1
return close / n
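if __name__ == '__main__':
    # Illustrative checks (sketch, no external dependencies): identical,
    # time-aligned tracks score 1.0 under the v2 similarity, and perfectly
    # correlated SOG series normalize to 1.0.
    track = [{'lat': 36.0 + k * 0.001, 'lon': 125.0, 'ts': k * 300_000} for k in range(4)]
    sim = compute_track_similarity_v2(track, list(track))
    assert abs(sim - 1.0) < 1e-9, sim
    assert compute_sog_correlation([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]) == 1.0
    print('similarity checks ok:', sim)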


@ -0,0 +1,234 @@
"""환적(Transshipment) 의심 선박 탐지 — 서버사이드 O(n log n) 구현.
프론트엔드 useKoreaFilters.ts의 O(n²) 근접 탐지를 대체한다.
scipy 미설치 환경을 고려하여 그리드 기반 공간 인덱스를 사용한다.
알고리즘 개요:
1. 후보 선박 필터: sog < 2kn, 선종 (tanker/cargo/fishing), 외국 해안 인근 제외
2. 그리드 기반 근접 탐지: O(n log n) 분할 + 인접 9셀 조회
3. pair_history dict로 쌍별 최초 탐지 시각 영속화 (호출 간 유지)
4. 60분 이상 지속 시 근접 의심 쌍으로 판정
"""
from __future__ import annotations
import logging
import math
from datetime import datetime, timezone
from typing import Optional
import pandas as pd
logger = logging.getLogger(__name__)
# ──────────────────────────────────────────────────────────────
# 상수
# ──────────────────────────────────────────────────────────────
SOG_THRESHOLD_KN = 2.0 # 정박/표류 기준 속도 (노트)
PROXIMITY_DEG = 0.001 # 근접 판정 임계값 (~110m)
SUSPECT_DURATION_MIN = 60 # 의심 판정 최소 지속 시간 (분)
PAIR_EXPIRY_MIN = 120 # pair_history 항목 만료 기준 (분)
# 외국 해안 근접 제외 경계
_CN_LON_MAX = 123.5 # 중국 해안: 경도 < 123.5
_JP_LON_MIN = 130.5 # 일본 해안: 경도 > 130.5
_TSUSHIMA_LAT_MIN = 33.8 # 대마도: 위도 > 33.8 AND 경도 > 129.0
_TSUSHIMA_LON_MIN = 129.0
# 탐지 대상 선종 (소문자 정규화 후 비교)
_CANDIDATE_TYPES: frozenset[str] = frozenset({'tanker', 'cargo', 'fishing'})
# 그리드 셀 크기 = PROXIMITY_DEG (셀 하나 = 근접 임계와 동일 크기)
_GRID_CELL_DEG = PROXIMITY_DEG
# ──────────────────────────────────────────────────────────────
# 내부 헬퍼
# ──────────────────────────────────────────────────────────────
def _is_near_foreign_coast(lat: float, lon: float) -> bool:
"""외국 해안 근처 여부 — 중국/일본/대마도 경계 확인."""
if lon < _CN_LON_MAX:
return True
if lon > _JP_LON_MIN:
return True
if lat > _TSUSHIMA_LAT_MIN and lon > _TSUSHIMA_LON_MIN:
return True
return False
def _cell_key(lat: float, lon: float) -> tuple[int, int]:
"""위도/경도를 그리드 셀 인덱스로 변환."""
return (int(math.floor(lat / _GRID_CELL_DEG)),
int(math.floor(lon / _GRID_CELL_DEG)))
def _build_grid(records: list[dict]) -> dict[tuple[int, int], list[int]]:
"""선박 리스트를 그리드 셀로 분류.
Returns: {(row, col): [record index, ...]}
"""
grid: dict[tuple[int, int], list[int]] = {}
for idx, rec in enumerate(records):
key = _cell_key(rec['lat'], rec['lon'])
if key not in grid:
grid[key] = []
grid[key].append(idx)
return grid
def _within_proximity(a: dict, b: dict) -> bool:
"""두 선박이 PROXIMITY_DEG 이내인지 확인 (위경도 직교 근사)."""
dlat = abs(a['lat'] - b['lat'])
if dlat >= PROXIMITY_DEG:
return False
cos_lat = math.cos(math.radians((a['lat'] + b['lat']) / 2.0))
dlon_scaled = abs(a['lon'] - b['lon']) * cos_lat
return dlon_scaled < PROXIMITY_DEG
def _normalize_type(raw: Optional[str]) -> str:
"""선종 문자열 소문자 정규화."""
if not raw:
return ''
return raw.strip().lower()
def _pair_key(mmsi_a: str, mmsi_b: str) -> tuple[str, str]:
"""MMSI 순서를 정규화하여 중복 쌍 방지."""
return (mmsi_a, mmsi_b) if mmsi_a < mmsi_b else (mmsi_b, mmsi_a)
def _evict_expired_pairs(
pair_history: dict[tuple[str, str], datetime],
now: datetime,
) -> None:
"""PAIR_EXPIRY_MIN 이상 갱신 없는 pair_history 항목 제거."""
expired = [
key for key, first_seen in pair_history.items()
if (now - first_seen).total_seconds() / 60 > PAIR_EXPIRY_MIN
]
for key in expired:
del pair_history[key]
# ──────────────────────────────────────────────────────────────
# 공개 API
# ──────────────────────────────────────────────────────────────
def detect_transshipment(
df: pd.DataFrame,
pair_history: dict[tuple[str, str], datetime],
) -> list[tuple[str, str, int]]:
"""환적 의심 쌍 탐지.
Args:
df: 선박 위치 DataFrame.
필수 컬럼: mmsi, lat, lon, sog
선택 컬럼: ship_type (없으면 전체 선종 허용)
pair_history: 쌍별 최초 탐지 시각을 저장하는 영속 dict.
스케줄러에서 호출 유지하여 전달해야 한다.
키: (mmsi_a, mmsi_b) — mmsi_a < mmsi_b 정규화 적용.
값: 최초 탐지 시각 (UTC datetime, timezone-aware).
Returns:
[(mmsi_a, mmsi_b, duration_minutes), ...] — 60분 이상 지속된 의심 쌍.
mmsi_a < mmsi_b 정규화 적용.
"""
if df.empty:
return []
required_cols = {'mmsi', 'lat', 'lon', 'sog'}
missing = required_cols - set(df.columns)
if missing:
logger.error('detect_transshipment: missing required columns: %s', missing)
return []
now = datetime.now(timezone.utc)
# ── 1. 후보 선박 필터 ──────────────────────────────────────
has_type_col = 'ship_type' in df.columns
candidate_mask = df['sog'] < SOG_THRESHOLD_KN
if has_type_col:
type_mask = df['ship_type'].apply(_normalize_type).isin(_CANDIDATE_TYPES)
candidate_mask = candidate_mask & type_mask
candidates = df[candidate_mask].copy()
if candidates.empty:
_evict_expired_pairs(pair_history, now)
return []
# 외국 해안 근처 제외
coast_mask = candidates.apply(
lambda row: not _is_near_foreign_coast(row['lat'], row['lon']),
axis=1,
)
candidates = candidates[coast_mask]
if len(candidates) < 2:
_evict_expired_pairs(pair_history, now)
return []
records = candidates[['mmsi', 'lat', 'lon']].to_dict('records')
for rec in records:
rec['mmsi'] = str(rec['mmsi'])
# ── 2. 그리드 기반 근접 쌍 탐지 ──────────────────────────
grid = _build_grid(records)
active_pairs: set[tuple[str, str]] = set()
for (row, col), indices in grid.items():
# 현재 셀 내부 쌍
for i in range(len(indices)):
for j in range(i + 1, len(indices)):
a = records[indices[i]]
b = records[indices[j]]
if _within_proximity(a, b):
active_pairs.add(_pair_key(a['mmsi'], b['mmsi']))
# 인접 셀 (우측 1셀 + 아래 3셀 = 4셀, 중복 없는 방향성 순회)
for dr, dc in ((0, 1), (1, -1), (1, 0), (1, 1)):
neighbor_key = (row + dr, col + dc)
if neighbor_key not in grid:
continue
for ai in indices:
for bi in grid[neighbor_key]:
a = records[ai]
b = records[bi]
if _within_proximity(a, b):
active_pairs.add(_pair_key(a['mmsi'], b['mmsi']))
# ── 3. pair_history 갱신 ─────────────────────────────────
# 현재 활성 쌍 → 최초 탐지 시각 등록
for pair in active_pairs:
if pair not in pair_history:
pair_history[pair] = now
# 비활성 쌍 → pair_history에서 제거 (다음 접근 시 재시작)
inactive = [key for key in pair_history if key not in active_pairs]
for key in inactive:
del pair_history[key]
# 만료 항목 정리 (비활성 제거 후 잔여 방어용)
_evict_expired_pairs(pair_history, now)
# ── 4. 의심 쌍 판정 ──────────────────────────────────────
suspects: list[tuple[str, str, int]] = []
for pair, first_seen in pair_history.items():
duration_min = int((now - first_seen).total_seconds() / 60)
if duration_min >= SUSPECT_DURATION_MIN:
suspects.append((pair[0], pair[1], duration_min))
if suspects:
logger.info(
'transshipment detection: %d suspect pairs (candidates=%d)',
len(suspects),
len(candidates),
)
return suspects
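if __name__ == '__main__':
    # Illustrative run (sketch): two slow cargo vessels on the same spot, with
    # a pre-seeded first-seen time 90 minutes ago, come back as a suspect
    # pair. pair_history must be the same dict across scheduler calls.
    from datetime import timedelta
    history = {('412000001', '412000002'): datetime.now(timezone.utc) - timedelta(minutes=90)}
    df = pd.DataFrame({
        'mmsi': ['412000001', '412000002'],
        'lat': [35.0, 35.0],
        'lon': [125.0, 125.0],
        'sog': [0.1, 0.2],
        'ship_type': ['cargo', 'Cargo'],
    })
    print(detect_transshipment(df, history))  # [('412000001', '412000002', 90)]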

prediction/cache/__init__.py (empty file)

prediction/cache/vessel_store.py

@ -0,0 +1,463 @@
import logging
from datetime import datetime, timezone
from typing import Optional
from zoneinfo import ZoneInfo

import numpy as np
import pandas as pd

from time_bucket import compute_initial_window_start, compute_safe_bucket

_KST = ZoneInfo('Asia/Seoul')
logger = logging.getLogger(__name__)
_STATIC_REFRESH_INTERVAL_MIN = 60
_PERMIT_REFRESH_INTERVAL_MIN = 30
_EARTH_RADIUS_NM = 3440.065
_MAX_REASONABLE_SOG = 30.0
_CHINESE_MMSI_PREFIX = '412'
def _compute_sog_cog(df: pd.DataFrame) -> pd.DataFrame:
"""Compute SOG (knots) and COG (degrees) from consecutive lat/lon/timestamp points."""
df = df.sort_values(['mmsi', 'timestamp']).copy()
lat1 = np.radians(df['lat'].values[:-1])
lon1 = np.radians(df['lon'].values[:-1])
lat2 = np.radians(df['lat'].values[1:])
lon2 = np.radians(df['lon'].values[1:])
# Haversine distance (nautical miles)
dlat = lat2 - lat1
dlon = lon2 - lon1
a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
dist_nm = _EARTH_RADIUS_NM * 2 * np.arctan2(np.sqrt(a), np.sqrt(1 - a))
# Time difference (hours)
ts = df['timestamp'].values
dt_sec = (ts[1:] - ts[:-1]).astype('timedelta64[s]').astype(float)
dt_hours = dt_sec / 3600.0
dt_hours[dt_hours <= 0] = np.nan
# SOG = dist / time (knots)
computed_sog = dist_nm / dt_hours
computed_sog = np.clip(np.nan_to_num(computed_sog, nan=0.0), 0, _MAX_REASONABLE_SOG)
# COG = bearing (degrees)
x = np.sin(dlon) * np.cos(lat2)
y = np.cos(lat1) * np.sin(lat2) - np.sin(lat1) * np.cos(lat2) * np.cos(dlon)
bearing = (np.degrees(np.arctan2(x, y)) + 360) % 360
# Append last value (copy from previous)
sog_arr = np.append(computed_sog, computed_sog[-1:] if len(computed_sog) > 0 else [0])
cog_arr = np.append(bearing, bearing[-1:] if len(bearing) > 0 else [0])
# Reset at MMSI boundaries
mmsi_vals = df['mmsi'].values
boundary = np.where(mmsi_vals[:-1] != mmsi_vals[1:])[0]
for idx in boundary:
sog_arr[idx + 1] = df['raw_sog'].iloc[idx + 1] if 'raw_sog' in df.columns else 0
cog_arr[idx + 1] = 0
# Where computed SOG is 0 or NaN, fall back to raw_sog
df['sog'] = sog_arr
if 'raw_sog' in df.columns:
mask = (df['sog'] == 0) | np.isnan(df['sog'])
df.loc[mask, 'sog'] = df.loc[mask, 'raw_sog'].fillna(0)
df['cog'] = cog_arr
return df
class VesselStore:
"""In-memory vessel trajectory store for Korean waters vessel data.
Maintains a 24-hour sliding window of all vessel tracks and supports
incremental 5-minute updates. Chinese vessel (MMSI 412*) filtering
is applied only at analysis target selection time.
"""
def __init__(self) -> None:
self._tracks: dict[str, pd.DataFrame] = {}
self._last_bucket: Optional[datetime] = None
self._static_info: dict[str, dict] = {}
self._permit_set: set[str] = set()
self._static_refreshed_at: Optional[datetime] = None
self._permit_refreshed_at: Optional[datetime] = None
# ------------------------------------------------------------------
# Public load / update methods
# ------------------------------------------------------------------
def load_initial(self, hours: int = 24) -> None:
"""Load all Korean waters vessel data for the past N hours.
Fetches a bulk DataFrame from snpdb, groups by MMSI, and stores
each vessel's track separately. Also triggers static info and
permit registry refresh.
"""
from db import snpdb
logger.info('loading initial vessel tracks (last %dh)...', hours)
try:
df_all = snpdb.fetch_all_tracks(hours)
except Exception as e:
logger.error('fetch_all_tracks failed: %s', e)
return
if df_all.empty:
logger.warning('fetch_all_tracks returned empty DataFrame')
return
# Rename sog column to raw_sog to preserve original AIS-reported speed
if 'sog' in df_all.columns and 'raw_sog' not in df_all.columns:
df_all = df_all.rename(columns={'sog': 'raw_sog'})
self._tracks = {}
for mmsi, group in df_all.groupby('mmsi'):
self._tracks[str(mmsi)] = group.reset_index(drop=True)
# last_bucket 설정 — incremental fetch 시작점
# snpdb time_bucket은 tz-naive KST이므로 UTC 변환하지 않고 그대로 유지
if 'time_bucket' in df_all.columns and not df_all['time_bucket'].dropna().empty:
max_bucket = pd.to_datetime(df_all['time_bucket'].dropna()).max()
if hasattr(max_bucket, 'to_pydatetime'):
max_bucket = max_bucket.to_pydatetime()
if isinstance(max_bucket, datetime) and max_bucket.tzinfo is not None:
max_bucket = max_bucket.replace(tzinfo=None)
self._last_bucket = max_bucket
elif 'timestamp' in df_all.columns and not df_all['timestamp'].dropna().empty:
max_ts = pd.to_datetime(df_all['timestamp'].dropna()).max()
if hasattr(max_ts, 'to_pydatetime'):
max_ts = max_ts.to_pydatetime()
# timestamp는 UTC aware → KST wall-clock naive로 변환
if isinstance(max_ts, datetime) and max_ts.tzinfo is not None:
max_ts = max_ts.astimezone(_KST).replace(tzinfo=None)
self._last_bucket = max_ts
vessel_count = len(self._tracks)
point_count = sum(len(v) for v in self._tracks.values())
logger.info(
'initial load complete: %d vessels, %d total points, last_bucket=%s',
vessel_count,
point_count,
self._last_bucket,
)
self.refresh_static_info()
self.refresh_permit_registry()
def merge_incremental(self, df_new: pd.DataFrame) -> None:
"""Merge a new batch of vessel positions into the in-memory store.
Deduplicates by timestamp within each MMSI and updates _last_bucket.
"""
if df_new.empty:
logger.debug('merge_incremental called with empty DataFrame, skipping')
return
if 'sog' in df_new.columns and 'raw_sog' not in df_new.columns:
df_new = df_new.rename(columns={'sog': 'raw_sog'})
new_buckets: list[datetime] = []
for mmsi, group in df_new.groupby('mmsi'):
mmsi_str = str(mmsi)
if mmsi_str in self._tracks:
combined = pd.concat([self._tracks[mmsi_str], group], ignore_index=True)
combined = combined.sort_values(['timestamp', 'time_bucket'])
combined = combined.drop_duplicates(subset=['timestamp'], keep='last')
self._tracks[mmsi_str] = combined.reset_index(drop=True)
else:
self._tracks[mmsi_str] = group.sort_values(['timestamp', 'time_bucket']).reset_index(drop=True)
if 'time_bucket' in group.columns and not group['time_bucket'].empty:
bucket_vals = pd.to_datetime(group['time_bucket'].dropna())
if not bucket_vals.empty:
new_buckets.append(bucket_vals.max().to_pydatetime())
if new_buckets:
latest = max(new_buckets)
if isinstance(latest, datetime) and latest.tzinfo is not None:
latest = latest.replace(tzinfo=None)
if self._last_bucket is None or latest > self._last_bucket:
self._last_bucket = latest
logger.debug(
'incremental merge done: %d mmsis in batch, store has %d vessels',
df_new['mmsi'].nunique(),
len(self._tracks),
)
def evict_stale(self, hours: int = 24) -> None:
"""Remove track points older than N hours and evict empty MMSI entries."""
import datetime as _dt
safe_bucket = compute_safe_bucket()
cutoff_bucket = compute_initial_window_start(hours, safe_bucket)
now = datetime.now(timezone.utc)
cutoff_aware = now - _dt.timedelta(hours=hours)
cutoff_naive = cutoff_aware.replace(tzinfo=None)
before_total = sum(len(v) for v in self._tracks.values())
evicted_mmsis: list[str] = []
for mmsi in list(self._tracks.keys()):
df = self._tracks[mmsi]
if 'time_bucket' in df.columns and not df['time_bucket'].dropna().empty:
bucket_col = pd.to_datetime(df['time_bucket'], errors='coerce')
mask = bucket_col >= pd.Timestamp(cutoff_bucket)
else:
ts_col = df['timestamp']
# Handle tz-aware and tz-naive timestamps uniformly
if hasattr(ts_col.dtype, 'tz') and ts_col.dtype.tz is not None:
mask = ts_col >= pd.Timestamp(cutoff_aware)
else:
mask = ts_col >= pd.Timestamp(cutoff_naive)
filtered = df[mask].reset_index(drop=True)
if filtered.empty:
del self._tracks[mmsi]
evicted_mmsis.append(mmsi)
else:
self._tracks[mmsi] = filtered
after_total = sum(len(v) for v in self._tracks.values())
logger.info(
'eviction complete: removed %d points, evicted %d mmsis (threshold=%dh, cutoff_bucket=%s)',
before_total - after_total,
len(evicted_mmsis),
hours,
cutoff_bucket,
)
def refresh_static_info(self) -> None:
"""Fetch vessel static info (type, name, dimensions) from snpdb.
Skips refresh if called within the last 60 minutes.
"""
now = datetime.now(timezone.utc)
if self._static_refreshed_at is not None:
elapsed_min = (now - self._static_refreshed_at).total_seconds() / 60
if elapsed_min < _STATIC_REFRESH_INTERVAL_MIN:
logger.debug(
'static info refresh skipped (%.1f min since last refresh)',
elapsed_min,
)
return
if not self._tracks:
logger.debug('no tracks in store, skipping static info refresh')
return
from db import snpdb
mmsi_list = list(self._tracks.keys())
try:
info = snpdb.fetch_static_info(mmsi_list)
self._static_info.update(info)
self._static_refreshed_at = now
logger.info('static info refreshed: %d vessels', len(info))
except Exception as e:
logger.error('fetch_static_info failed: %s', e)
def refresh_permit_registry(self) -> None:
"""Fetch permitted Chinese fishing vessel MMSIs from snpdb.
Skips refresh if called within the last 30 minutes.
"""
now = datetime.now(timezone.utc)
if self._permit_refreshed_at is not None:
elapsed_min = (now - self._permit_refreshed_at).total_seconds() / 60
if elapsed_min < _PERMIT_REFRESH_INTERVAL_MIN:
logger.debug(
'permit registry refresh skipped (%.1f min since last refresh)',
elapsed_min,
)
return
from db import snpdb
try:
mmsis = snpdb.fetch_permit_mmsis()
self._permit_set = set(mmsis)
self._permit_refreshed_at = now
logger.info('permit registry refreshed: %d permitted vessels', len(self._permit_set))
except Exception as e:
logger.error('fetch_permit_mmsis failed: %s', e)
# ------------------------------------------------------------------
# Analysis target selection
# ------------------------------------------------------------------
def select_analysis_targets(self) -> pd.DataFrame:
"""Build a combined DataFrame of Chinese vessel tracks with computed SOG/COG.
Filters to MMSI starting with '412', computes SOG and COG from
consecutive lat/lon/timestamp pairs using the haversine formula,
and falls back to raw_sog where computed values are zero or NaN.
Returns:
DataFrame with columns: mmsi, timestamp, lat, lon, sog, cog
"""
chinese_mmsis = [m for m in self._tracks if m.startswith(_CHINESE_MMSI_PREFIX)]
if not chinese_mmsis:
logger.info('no Chinese vessels (412*) found in store')
return pd.DataFrame(columns=['mmsi', 'timestamp', 'lat', 'lon', 'sog', 'cog'])
frames = [self._tracks[m] for m in chinese_mmsis]
combined = pd.concat(frames, ignore_index=True)
required_cols = {'mmsi', 'timestamp', 'lat', 'lon'}
missing = required_cols - set(combined.columns)
if missing:
logger.error('combined DataFrame missing required columns: %s', missing)
return pd.DataFrame(columns=['mmsi', 'timestamp', 'lat', 'lon', 'sog', 'cog'])
result = _compute_sog_cog(combined)
output_cols = ['mmsi', 'timestamp', 'lat', 'lon', 'sog', 'cog']
available = [c for c in output_cols if c in result.columns]
return result[available].reset_index(drop=True)
# ------------------------------------------------------------------
# Lookup helpers
# ------------------------------------------------------------------
def is_permitted(self, mmsi: str) -> bool:
"""Return True if the given MMSI is in the permitted Chinese fishing vessel registry."""
return mmsi in self._permit_set
def get_vessel_info(self, mmsi: str) -> dict:
"""Return static vessel info dict for the given MMSI, or empty dict if not found."""
return self._static_info.get(mmsi, {})
def get_all_latest_positions(self) -> dict[str, dict]:
"""모든 선박의 최신 위치 반환. {mmsi: {lat, lon, sog, cog, timestamp, name}}
cog는 마지막 2점의 좌표로 bearing 계산."""
import math
result: dict[str, dict] = {}
for mmsi, df in self._tracks.items():
if df is None or len(df) == 0:
continue
last = df.iloc[-1]
info = self._static_info.get(mmsi, {})
# COG: 마지막 2점으로 bearing 계산
cog = 0.0
if len(df) >= 2:
prev = df.iloc[-2]
lat1 = math.radians(float(prev['lat']))
lat2 = math.radians(float(last['lat']))
dlon = math.radians(float(last['lon']) - float(prev['lon']))
x = math.sin(dlon) * math.cos(lat2)
y = math.cos(lat1) * math.sin(lat2) - math.sin(lat1) * math.cos(lat2) * math.cos(dlon)
cog = (math.degrees(math.atan2(x, y)) + 360) % 360
result[mmsi] = {
'lat': float(last['lat']),
'lon': float(last['lon']),
'sog': float(last.get('sog', 0) or last.get('raw_sog', 0) or 0),
'cog': cog,
'timestamp': last.get('timestamp'),
'time_bucket': last.get('time_bucket'),
'name': info.get('name', ''),
}
return result
def get_vessel_tracks(self, mmsis: list[str], hours: int = 24) -> dict[str, list[dict]]:
"""Return track points for given MMSIs within the specified hours window.
Returns dict mapping mmsi to list of {ts, lat, lon, sog, cog} dicts,
sorted by timestamp ascending.
"""
import datetime as _dt
now = datetime.now(timezone.utc)
cutoff_aware = now - _dt.timedelta(hours=hours)
cutoff_naive = cutoff_aware.replace(tzinfo=None)
result: dict[str, list[dict]] = {}
for mmsi in mmsis:
df = self._tracks.get(mmsi)
if df is None or len(df) == 0:
continue
ts_col = df['timestamp']
if hasattr(ts_col.dtype, 'tz') and ts_col.dtype.tz is not None:
mask = ts_col >= pd.Timestamp(cutoff_aware)
else:
mask = ts_col >= pd.Timestamp(cutoff_naive)
filtered = df[mask].sort_values('timestamp')
if filtered.empty:
continue
# Compute SOG/COG for this vessel's track
if len(filtered) >= 2:
track_with_sog = _compute_sog_cog(filtered.copy())
else:
track_with_sog = filtered.copy()
if 'sog' not in track_with_sog.columns:
track_with_sog['sog'] = track_with_sog.get('raw_sog', 0)
if 'cog' not in track_with_sog.columns:
track_with_sog['cog'] = 0
points = []
for _, row in track_with_sog.iterrows():
ts = row['timestamp']
# Convert to epoch ms
if hasattr(ts, 'timestamp'):
epoch_ms = int(ts.timestamp() * 1000)
else:
epoch_ms = int(pd.Timestamp(ts).timestamp() * 1000)
points.append({
'ts': epoch_ms,
'lat': float(row['lat']),
'lon': float(row['lon']),
'sog': float(row.get('sog', 0) or 0),
'cog': float(row.get('cog', 0) or 0),
})
if points:
result[mmsi] = points
return result
def get_chinese_mmsis(self) -> set:
"""Return the set of all Chinese vessel MMSIs (412*) currently in the store."""
return {m for m in self._tracks if m.startswith(_CHINESE_MMSI_PREFIX)}
# ------------------------------------------------------------------
# Properties
# ------------------------------------------------------------------
@property
def last_bucket(self) -> Optional[datetime]:
"""Return the latest time bucket seen across all merged incremental batches."""
return self._last_bucket
# ------------------------------------------------------------------
# Diagnostics
# ------------------------------------------------------------------
def stats(self) -> dict:
"""Return store statistics for health/status reporting."""
total_points = sum(len(v) for v in self._tracks.values())
chinese_count = sum(1 for m in self._tracks if m.startswith(_CHINESE_MMSI_PREFIX))
# Rough memory estimate: each row ~200 bytes across columns
memory_mb = round((total_points * 200) / (1024 * 1024), 2)
return {
'vessels': len(self._tracks),
'points': total_points,
'memory_mb': memory_mb,
'last_bucket': self._last_bucket.isoformat() if self._last_bucket else None,
'targets': chinese_count,
'permitted': len(self._permit_set),
}
# Module-level singleton
vessel_store = VesselStore()
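if __name__ == '__main__':
    # Illustrative offline check (sketch): merge_incremental and
    # get_all_latest_positions need no DB, only a DataFrame shaped like the
    # snpdb fetch results (column names assumed from this module's usage).
    store = VesselStore()
    batch = pd.DataFrame({
        'mmsi': ['412000001', '412000001'],
        'timestamp': pd.to_datetime(['2026-04-07 00:00:00', '2026-04-07 00:05:00']),
        'time_bucket': pd.to_datetime(['2026-04-07 09:00:00', '2026-04-07 09:05:00']),
        'lat': [36.0, 36.01],
        'lon': [125.0, 125.0],
        'sog': [3.0, 3.0],
    })
    store.merge_incremental(batch)
    print(store.stats())                     # 1 vessel, 2 points
    print(store.get_all_latest_positions())  # northbound fix, computed COG ≈ 0°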


prediction/chat/cache.py

@ -0,0 +1,90 @@
"""Redis 캐시 유틸 — 분석 컨텍스트 + 대화 히스토리."""
import json
import logging
from typing import Optional
import redis
from config import settings
logger = logging.getLogger(__name__)
_redis: Optional[redis.Redis] = None
def _get_redis() -> redis.Redis:
global _redis
if _redis is None:
_redis = redis.Redis(
host=settings.REDIS_HOST,
port=settings.REDIS_PORT,
password=settings.REDIS_PASSWORD or None,
decode_responses=True,
socket_connect_timeout=3,
)
return _redis
# ── 분석 컨텍스트 캐시 (전역, 5분 주기 갱신) ──
CONTEXT_KEY = 'kcg:chat:context'
CONTEXT_TTL = 360 # 6분 (5분 주기 + 1분 버퍼)
def cache_analysis_context(context_dict: dict):
"""스케줄러에서 분석 완료 후 호출 — Redis에 요약 데이터 캐싱."""
try:
r = _get_redis()
r.setex(CONTEXT_KEY, CONTEXT_TTL, json.dumps(context_dict, ensure_ascii=False, default=str))
logger.debug('cached analysis context (%d bytes)', len(json.dumps(context_dict)))
except Exception as e:
logger.warning('failed to cache analysis context: %s', e)
def get_cached_context() -> Optional[dict]:
"""Redis에서 캐시된 분석 컨텍스트 조회."""
try:
r = _get_redis()
data = r.get(CONTEXT_KEY)
return json.loads(data) if data else None
except Exception as e:
logger.warning('failed to read cached context: %s', e)
return None
# ── 대화 히스토리 (계정별, 24h TTL) ──
HISTORY_TTL = 86400 # 24시간
MAX_HISTORY = 50
def save_chat_history(user_id: str, messages: list[dict]):
"""대화 히스토리 저장 (최근 50개 메시지만 유지)."""
try:
r = _get_redis()
key = f'kcg:chat:history:{user_id}'
trimmed = messages[-MAX_HISTORY:]
r.setex(key, HISTORY_TTL, json.dumps(trimmed, ensure_ascii=False))
except Exception as e:
logger.warning('failed to save chat history for %s: %s', user_id, e)
def load_chat_history(user_id: str) -> list[dict]:
"""대화 히스토리 로드."""
try:
r = _get_redis()
data = r.get(f'kcg:chat:history:{user_id}')
return json.loads(data) if data else []
except Exception as e:
logger.warning('failed to load chat history for %s: %s', user_id, e)
return []
def clear_chat_history(user_id: str):
"""대화 히스토리 삭제."""
try:
r = _get_redis()
r.delete(f'kcg:chat:history:{user_id}')
except Exception as e:
logger.warning('failed to clear chat history for %s: %s', user_id, e)
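if __name__ == '__main__':
    # Illustrative round-trip (sketch): assumes a reachable Redis per
    # settings; every helper swallows connection errors with a warning, so
    # without Redis this simply prints [].
    save_chat_history('demo-user', [{'role': 'user', 'content': 'hello'}])
    print(load_chat_history('demo-user'))
    clear_chat_history('demo-user')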


@ -0,0 +1,140 @@
"""vessel_store + kcgdb 분석 데이터 + 도메인 지식을 기반으로 LLM 시스템 프롬프트를 구성."""
import logging
import re
from datetime import datetime, timezone
from chat.cache import get_cached_context
from chat.domain_knowledge import build_compact_prompt
logger = logging.getLogger(__name__)
def _build_realtime_context(ctx: dict) -> str:
"""Redis 캐시 데이터로 실시간 현황 프롬프트 구성 (간소화)."""
stats = ctx.get('vessel_stats', {})
risk = ctx.get('risk_distribution', {})
now = datetime.now(timezone.utc).strftime('%Y-%m-%d %H:%M UTC')
return f"""## 현황 ({now})
전체 {stats.get('vessels', 0)}, 중국 {stats.get('chinese', 0)}, 분석완료 {stats.get('targets', 0)}, 허가 {stats.get('permitted', 0)}/906
CRITICAL {risk.get('CRITICAL', 0)} / HIGH {risk.get('HIGH', 0)} / MEDIUM {risk.get('MEDIUM', 0)} / LOW {risk.get('LOW', 0)}
다크 {ctx.get('dark_count', 0)} / 스푸핑 {ctx.get('spoofing_count', 0)} / 환적 {ctx.get('transship_count', 0)}
영해 {risk.get('TERRITORIAL_SEA', 0)} / 접속 {risk.get('CONTIGUOUS_ZONE', 0)} / I {risk.get('ZONE_I', 0)} / II {risk.get('ZONE_II', 0)} / III {risk.get('ZONE_III', 0)} / IV {risk.get('ZONE_IV', 0)} / EEZ {risk.get('EEZ_OR_BEYOND', 0)}
(상세 데이터는 query_vessels 도구로 조회)"""
def _build_fallback_context() -> str:
"""Redis 캐시가 없을 때 vessel_store + kcgdb에서 직접 구성."""
try:
from cache.vessel_store import vessel_store
stats = vessel_store.stats()
from db import kcgdb
summary = kcgdb.fetch_analysis_summary()
top_risk = kcgdb.fetch_recent_high_risk(10)
polygon_summary = kcgdb.fetch_polygon_summary()
ctx = {
'vessel_stats': stats,
'risk_distribution': summary.get('risk_distribution', {}),
'dark_count': summary.get('dark_count', 0),
'spoofing_count': summary.get('spoofing_count', 0),
'transship_count': summary.get('transship_count', 0),
'top_risk_vessels': top_risk,
'polygon_summary': polygon_summary,
}
from chat.cache import cache_analysis_context
cache_analysis_context(ctx)
return _build_realtime_context(ctx)
except Exception as e:
logger.error('fallback context build failed: %s', e)
return '(실시간 데이터를 불러올 수 없습니다. 일반 해양 감시 지식으로 답변합니다.)'
# ── RAG: 사용자 질문에서 MMSI를 추출하여 선박별 상세 컨텍스트 주입 ──
_MMSI_PATTERN = re.compile(r'\b(\d{9})\b')
def _extract_mmsis(text: str) -> list[str]:
"""사용자 메시지에서 9자리 MMSI 추출."""
return _MMSI_PATTERN.findall(text)
def _build_vessel_detail(mmsi: str) -> str:
"""특정 MMSI의 분석 결과를 상세 컨텍스트로 구성 (RAG)."""
try:
from cache.vessel_store import vessel_store
info = vessel_store.get_vessel_info(mmsi)
positions = vessel_store.get_all_latest_positions()
pos = positions.get(mmsi)
from db import kcgdb
high_risk = kcgdb.fetch_recent_high_risk(100)
vessel_data = next((v for v in high_risk if v['mmsi'] == mmsi), None)
if not vessel_data and not pos:
return f'\n(MMSI {mmsi}: 분석 데이터 없음)\n'
lines = [f'\n## 선박 상세: {mmsi}']
if info:
name = info.get('name', 'N/A')
lines.append(f'- 선명: {name}')
if pos:
lines.append(f"- 위치: {pos.get('lat', 'N/A')}°N, {pos.get('lon', 'N/A')}°E")
lines.append(f"- SOG: {pos.get('sog', 'N/A')} knots, COG: {pos.get('cog', 'N/A')}°")
is_permitted = vessel_store.is_permitted(mmsi)
lines.append(f"- 허가 여부: {'허가어선' if is_permitted else '미허가/미등록'}")
if vessel_data:
lines.append(f"- 위험도: {vessel_data.get('risk_score', 'N/A')}점 ({vessel_data.get('risk_level', 'N/A')})")
lines.append(f"- 수역: {vessel_data.get('zone', 'N/A')}")
lines.append(f"- 활동: {vessel_data.get('activity_state', 'N/A')}")
lines.append(f"- 다크베셀: {'Y' if vessel_data.get('is_dark') else 'N'}")
lines.append(f"- 환적 의심: {'Y' if vessel_data.get('is_transship') else 'N'}")
lines.append(f"- 스푸핑 점수: {vessel_data.get('spoofing_score', 0):.2f}")
return '\n'.join(lines)
except Exception as e:
logger.warning('vessel detail build failed for %s: %s', mmsi, e)
return f'\n(MMSI {mmsi}: 상세 조회 실패)\n'
class MaritimeContextBuilder:
"""도메인 지식 + 실시간 데이터 + 선박별 RAG를 결합하여 시스템 프롬프트 구성."""
def build_system_prompt(self, user_message: str = '') -> str:
"""시스템 프롬프트 구성.
구조:
1) 압축 도메인 지식 (~500토큰: 역할+핵심용어+도구목록)
2) 실시간 현황 (Redis 캐시 → DB fallback)
3) RAG: 사용자 질문에 포함된 MMSI의 선박별 상세 데이터
상세 도메인 지식은 LLM이 get_knowledge 도구로 필요 시 조회.
"""
parts = []
# 1) 압축 도메인 지식 (~500토큰)
parts.append(build_compact_prompt())
# 2) 실시간 현황
cached = get_cached_context()
if cached:
parts.append(_build_realtime_context(cached))
else:
parts.append(_build_fallback_context())
# 3) RAG: MMSI 기반 선박 상세
if user_message:
mmsis = _extract_mmsis(user_message)
for mmsi in mmsis[:3]: # 최대 3척
parts.append(_build_vessel_detail(mmsi))
return '\n\n'.join(parts)
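
위 빌더의 호출 흐름을 요약한 사용 스케치입니다. Redis/kcgdb에 연결 가능한 환경을 가정하며, 예시 MMSI는 본문의 도구 예시와 동일한 값을 재사용했습니다.

```python
# 질문에 9자리 숫자가 있으면 _extract_mmsis()가 MMSI로 간주하고
# 선박별 상세 블록(RAG)을 시스템 프롬프트 끝에 덧붙인다 (최대 3척).
from chat.context_builder import MaritimeContextBuilder

builder = MaritimeContextBuilder()
prompt = builder.build_system_prompt(user_message='412236758 지금 뭐 하고 있어?')
# prompt 구조: [압축 도메인 지식] + [실시간 현황] + [## 선박 상세: 412236758]
print(len(prompt), 'chars')
```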

471
prediction/chat/domain_knowledge.py Normal file
파일 보기

@ -0,0 +1,471 @@
"""해양 감시 도메인 전문 지식 — LLM 시스템 프롬프트 보강용.
수집 출처:
- 한중어업협정 (2001.6.30 발효, 한국민족문화대백과사전)
- 해양수산부 한중어업공동위원회 결과 공표
- UNCLOS 해양법협약 (영해/접속수역/EEZ 기준)
- Global Fishing Watch 환적 탐지 기준
- 해양경찰청 불법조업 단속 현황
- MarineTraffic AIS/GNSS 스푸핑 가이드
"""
from config import settings
# ── 역할 정의 ──
ROLE_DEFINITION = """당신은 대한민국 해양경찰청의 **해양상황 분석 AI 어시스턴트**입니다.
Python AI 분석 파이프라인(7단계 + 8개 알고리즘)의 실시간 결과를 기반으로,
해양 감시 전문가 수준의 분석과 조치 권고를 제공합니다.
당신이 접근하는 데이터:
- 14,000척 이상의 AIS 실시간 위치 (24시간 슬라이딩 윈도우)
- 중국 어선(412* MMSI) 대상 AI 분석 결과 (28개 필드, 5분 주기 갱신)
- 선단/어구 그룹 폴리곤 (Shapely 기반, 5분 주기)
- 한중어업협정 허가어선 DB (906척 등록)"""
# ── 해양 수역 법적 체계 ──
MARITIME_ZONES = """## 해양 수역 법적 체계 (UNCLOS + 국내법)
| 수역 | 범위 | 법적 지위 | 단속 권한 |
|------|------|----------|----------|
| **영해** (TERRITORIAL_SEA) | 기선~12해리 | 완전한 주권 | 즉시 나포 가능 |
| **접속수역** (CONTIGUOUS_ZONE) | 12~24해리 | 관세·출입국 통제 | 정선·검색 가능 |
| **EEZ** (EEZ_OR_BEYOND) | 24~200해리 | 자원 주권적 권리 | 어업법 적용 |
- 1해리 = 1,852m, 기선은 서해·남해 직선기선, 동해는 통상기선
- 서해는 한중 중간선이 200해리 미만이므로 EEZ 경계 미확정
- 독도·울릉도·제주도는 해안에서 12해리
### 특정어업수역 (한중어업협정)
- **수역 I~IV**: 한국 EEZ 내 중국 허가어선 조업 가능 구역
- **잠정조치수역**: 83,000km², 한중 공동 관리 (북위 37°~32°11')
- **과도수역**: 잠정조치수역 좌우 20해리 (2005.6.30부터 연차 감축)
- 수역 외 조업 = **불법** (무허가 조업)"""
# ── 한중어업협정 상세 ──
FISHING_AGREEMENT = """## 한중어업협정 상세 (2001.6.30 발효)
### 허가어선 현황 (총 906척)
| 어구코드 | 어구명 | 허가 척수 | 비고 |
|---------|--------|---------|------|
| PT | 쌍끌이 저인망 | 323조 (646척) | 2척 1조 운영 |
| GN | 유자망 (길그물) | 200척 | |
| PS | 위망 (선망) | 16척 | |
| OT | 기선인망 (외끌이) | 13척 | 1척 단독 |
| FC | 운반선 | 31척 | 어획물 운반 전용 |
### 휴어기 (조업 금지 기간)
| 어구 | 기간 | 비고 |
|------|------|------|
| PT (저인망) | 4/16 ~ 10/15 (6개월) | 산란기 보호 |
| OT (외끌이) | 4/16 ~ 10/15 (6개월) | PT와 동일 |
| GN (유자망) | 6/2 ~ 8/31 (3개월) | 하절기 |
### 어구별 조업 속도 기준 (UCAF 판정 참조)
| 어구 | 조업 속도 | 항행 속도 | 판별 기준 |
|------|----------|----------|----------|
| PT/OT (저인망) | 2.5~4.5 knots | 6+ knots | 그물 끌기 |
| GN (유자망) | 0.5~2.0 knots | 5+ knots | 그물 투망/양망 |
| PS (위망) | 1.0~3.0 knots | 7+ knots | 그물 투망·양망 |
| TRAP (통발) | 0.5~2.0 knots | 5+ knots | 통발 투입·인양 |
| LONGLINE (연승) | 1.0~3.0 knots | 6+ knots | 투승·양승 |
### 2024.5.1 시행 신규 합의사항
- 한국 EEZ 내 모든 중국어선 **AIS 의무 장착·가동**
- 자망어선: 어구마다 부표/깃대 설치 의무 (30×20cm 표지)
- 위반 시: 허가 취소 + 벌금 + 3년 이내 재허가 불가"""
# ── 알고리즘 해석 가이드 ──
ALGORITHM_GUIDE = """## AI 분석 알고리즘 해석 가이드 (8개 알고리즘)
### ALGO 01: 위치 분석 (location)
- `zone`: 선박이 현재 위치한 해양 수역
- TERRITORIAL_SEA (영해): **즉각 주의** — 외국어선 영해 침범
- CONTIGUOUS_ZONE (접속수역): 감시 강화 필요
- ZONE_I~IV (특정어업수역): 허가 여부 확인 필수
- EEZ_OR_BEYOND: 일반 감시
- `dist_to_baseline_nm`: 기선까지 거리 (NM)
- <12NM: 영해 — 최고 위험
- 12~24NM: 접속수역 — 높은 경계
- >24NM: EEZ 또는 그 이원
### ALGO 02: 활동 패턴 (activity)
- `activity_state`: STATIONARY(정박) / FISHING(조업) / SAILING(항행)
- SOG ≤ 1.0 → STATIONARY
- SOG 1.0~5.0 → FISHING (어구에 따라 다름)
- SOG > 5.0 → SAILING
- `ucaf_score` (0~1): 어구별 조업속도 매칭률
- >0.7: 높은 확률로 해당 어구 사용
- 0.3~0.7: 불확실
- <0.3: 비매칭 (다른 어구이거나 항행 중)
- `ucft_score` (0~1): 조업-항행 구분 신뢰도
- >0.8: 명확히 조업/항행 구분됨
- <0.5: 패턴 불명확
### ALGO 03: 다크베셀 (dark_vessel)
- `is_dark`: AIS 신호 의도적 차단 의심
- `gap_duration_min`: AIS 최장 공백 시간 (분)
- 30~60분: 경미한 갭 (기술적 원인 가능)
- 60~180분: 의심 수준 — 의도적 차단 가능성
- 180분+: **높은 의심** — 불법조업 은폐 목적 추정
- 참고: 2024.5.1부터 한국 EEZ 내 중국어선 AIS 의무화
- AIS 차단 자체가 **협정 위반**
### ALGO 04: GPS 스푸핑 (gps_spoofing)
- `spoofing_score` (0~1): 종합 스푸핑 의심도
- >0.7: **높은 스푸핑 의심** — 위치 조작 추정
- 0.3~0.7: 중간 의심
- <0.3: 정상
- `bd09_offset_m`: 바이두(BD-09) 좌표계 오프셋 (미터)
- 중국 선박 특유의 GPS 좌표 변환 오차
- 412* MMSI는 기본 제외 (중국 위성항법 특성)
- `speed_jump_count`: 비현실적 속도 점프 횟수
- 0: 정상
- 1~2: 일시적 GPS 오류 가능
- 3+: **스푸핑 강력 의심** — 위치 은폐 목적
### ALGO 05-06: 선단 분석 (fleet/cluster)
- `cluster_id`: 선단 그룹 ID (-1 = 미소속)
- `cluster_size`: 같은 선단 소속 선박 수
- 2~5: 소규모 선단
- 5~15: 중규모 선단 (일반적)
- 15+: 대규모 선단 — 조직적 조업
- `fleet_role`: 선단 역할
- LEADER: 선단 지휘선 (이동 경로 결정)
- FOLLOWER: 추종선 (리더 경로 따름)
- PROCESS_VESSEL: 가공선 (어획물 처리)
- FUEL_VESSEL: 급유선
- NOISE: 미분류
### ALGO 07: 위험도 종합 (risk_score)
- 0~100 종합 점수, 4개 영역 합산:
- **위치** (최대 40): 영해 내=40, 접속수역=10
- **조업 행위** (최대 30): 영해 내 조업=20, 기타 조업=5, U-turn 패턴=10
- **AIS 조작** (최대 35): 순간이동=20, 장시간 갭=15, 단시간 갭=5
- **허가 이력** (최대 20): 미허가 어선=20
- 등급: CRITICAL(≥70) / HIGH(≥50) / MEDIUM(≥30) / LOW(<30)
- 프론트엔드 표시: WATCH=HIGH, MONITOR=MEDIUM, NORMAL=LOW
### ALGO 08: 환적 의심 (transshipment)
- `is_transship_suspect`: 해상 환적 의심 여부
- `transship_pair_mmsi`: 상대 선박 MMSI
- `transship_duration_min`: 접촉 지속 시간 (분)
- 탐지 기준 (Global Fishing Watch 참조):
- 두 선박 간 500m 이내 접근
- 속도 2노트 미만
- 2시간 이상 지속
- 정박지에서 10km 이상 떨어진 해상"""
# ── 대응 절차 가이드 ──
RESPONSE_GUIDE = """## 위험도별 대응 절차 권고
### CRITICAL (≥70점) — 즉각 대응
1. 해당 선박 위치·항적 실시간 추적
2. 인근 경비함정 긴급 출동 지시
3. VHF 채널 16 경고방송 (한국어+중국어)
4. 정선명령 → 승선검색 → 나포
5. 상급기관 즉시 보고
### WATCH/HIGH (≥50점) — 강화 감시
1. 감시 우선순위 상향
2. 항적 지속 추적 (15분 간격)
3. 인근 해역 순찰 함정에 정보 공유
4. 위험도 변화 시 CRITICAL 대응 전환 준비
### MONITOR/MEDIUM (≥30점) — 일반 감시
1. 정기 모니터링 대상 등록
2. 1시간 간격 위치·상태 확인
3. 패턴 변화(조업→이동, 군집화 등) 시 알림
### NORMAL/LOW (<30점) — 기본 감시
1. 시스템 자동 모니터링
2. 일일 요약 보고에 포함
### 불법조업 유형별 조치
| 유형 | 해당 알고리즘 | 즉시 조치 |
|------|-------------|----------|
| 영해 침범 | zone=TERRITORIAL_SEA | 나포 (영해법 위반) |
| 무허가 조업 | is_permitted=False + zone=ZONE_* | 정선·검색 |
| AIS 차단 | is_dark=True, gap>60min | 위치 추적 + 출동 |
| GPS 위치조작 | spoofing_score>0.7 | 실제 위치 특정 후 출동 |
| 불법 환적 | is_transship_suspect=True | 쌍방 정선·검색 |
| 휴어기 위반 | 어구+날짜 크로스체크 | 정선·어구 확인 |"""
# ── 응답 규칙 ──
RESPONSE_RULES = """## 응답 규칙
- 한국어로 답변
- 데이터 기반 분석 (추측 최소화, 근거 수치 명시)
- 구체적 MMSI, 좌표, 점수, 수역명 제시
- 불법조업 의심 시 **법적 근거 + 알고리즘 근거 + 조치 권고** 3가지를 함께 제시
- 위험도 등급 언급 시 점수도 함께 표기 (예: "CRITICAL(82점)")
- 마크다운 형식으로 구조화 (표, 목록, 강조 활용)
- "~일 수 있습니다" 대신 데이터에 근거한 단정적 분석 제공
- 선박 특정 질문 시 해당 선박의 모든 알고리즘 결과를 종합 제시"""
# ── DB 스키마 + Tool Calling 가이드 ──
DB_SCHEMA_AND_TOOLS = """## 데이터 조회 도구 (Tool Calling)
사용자 질문에 답하기 위해 실시간 DB 조회가 필요하면, 다음 도구를 호출할 수 있습니다.
도구 호출 시 반드시 아래 형식을 사용하세요:
### 사용 가능한 도구
#### 1. query_vessels — 선박 분석 결과 조회
조건에 맞는 선박 목록을 조회합니다.
```json
{"tool": "query_vessels", "params": {"zone": "ZONE_I", "activity": "FISHING", "risk_level": "CRITICAL", "is_dark": true, "limit": 20}}
```
- 모든 파라미터는 선택적 (조합 가능)
- zone 값: TERRITORIAL_SEA, CONTIGUOUS_ZONE, ZONE_I, ZONE_II, ZONE_III, ZONE_IV, EEZ_OR_BEYOND
- activity 값: STATIONARY, FISHING, SAILING
- risk_level 값: CRITICAL, HIGH, MEDIUM, LOW
- is_dark: true/false
- is_transship: true/false
- vessel_type 값: TRAWL, PURSE, LONGLINE, TRAP, UNKNOWN
- limit: 최대 반환 수 (기본 20)
#### 2. query_vessel_detail — 특정 선박 상세
```json
{"tool": "query_vessel_detail", "params": {"mmsi": "412236758"}}
```
#### 3. query_fleet_group — 선단/어구 그룹 조회
```json
{"tool": "query_fleet_group", "params": {"group_type": "FLEET", "zone_id": "ZONE_I"}}
```
- group_type: FLEET, GEAR_IN_ZONE, GEAR_OUT_ZONE
#### 4. query_vessel_history — 선박 항적 이력 (snpdb daily)
```json
{"tool": "query_vessel_history", "params": {"mmsi": "412236758", "days": 7}}
```
- 일별 이동거리, 평균/최대 속도, AIS 포인트 수
- 최대 30일까지 조회
#### 5. query_vessel_static — 선박 정적정보 + 변경 이력 (snpdb)
```json
{"tool": "query_vessel_static", "params": {"mmsi": "412236758", "limit": 10}}
```
- 최신 선명/선종/제원/목적지/상태 + 변경 이력 감지
- 선명·목적지·상태 변경 시점과 이전/이후 값 표시
### DB 스키마 참조 (쿼리 조합 시 참고)
#### kcg.vessel_analysis_results (5분 주기 갱신, 48시간 보존)
| 컬럼 | 타입 | 예시 |
|------|------|---------|
| mmsi | varchar | '412236758' (중국=412*) |
| timestamp | timestamptz | 분석 시점 |
| vessel_type | varchar | TRAWL/PURSE/LONGLINE/TRAP/UNKNOWN |
| zone | varchar | TERRITORIAL_SEA/CONTIGUOUS_ZONE/ZONE_I~IV/EEZ_OR_BEYOND |
| dist_to_baseline_nm | float | 기선까지 거리(NM) |
| activity_state | varchar | STATIONARY/FISHING/SAILING |
| ucaf_score | float | 0~1 (어구 매칭률) |
| is_dark | boolean | AIS 차단 의심 |
| gap_duration_min | int | AIS 최장 공백(분) |
| spoofing_score | float | 0~1 |
| risk_score | int | 0~100 |
| risk_level | varchar | CRITICAL(≥70)/HIGH(≥50)/MEDIUM(≥30)/LOW(<30) |
| cluster_id | int | 선단 ID (-1=미소속) |
| cluster_size | int | 선단 규모 |
| fleet_role | varchar | LEADER/FOLLOWER/PROCESS_VESSEL/FUEL_VESSEL/NOISE |
| is_transship_suspect | boolean | 환적 의심 |
| transship_pair_mmsi | varchar | 상대 선박 MMSI |
| analyzed_at | timestamptz | WHERE 조건에 사용 (> NOW() - INTERVAL '1 hour') |
- PK: (mmsi, timestamp), 인덱스: mmsi, timestamp DESC
#### kcg.fleet_vessels (허가어선 등록부)
| 컬럼 | 타입 | 설명 |
|------|------|------|
| mmsi | varchar | 매칭된 MMSI (NULL 가능) |
| permit_no | varchar | 허가번호 |
| name_cn | text | 중국어 선명 |
| gear_code | varchar | PT/GN/PS/OT/FC |
| company_id | int | fleet_companies.id |
| tonnage | int | 톤수 |
#### kcg.group_polygon_snapshots (선단/어구 폴리곤, 5분 APPEND, 7일 보존)
| 컬럼 | 타입 | 설명 |
|------|------|------|
| group_type | varchar | FLEET/GEAR_IN_ZONE/GEAR_OUT_ZONE |
| group_key | varchar | 그룹 식별자 |
| group_label | text | 표시 라벨 |
| snapshot_time | timestamptz | 스냅샷 시점 |
| member_count | int | 소속 선박 수 |
| zone_id | varchar | 수역 ID |
| members | jsonb | [{mmsi, name, lat, lon, sog, cog, ...}] |
### snpdb 테이블 상세 (signal 스키마, 읽기 전용)
#### signal.t_vessel_tracks_5min — 실시간 항적 (5분 집계)
| 컬럼 | 타입 | 설명 |
|------|------|------|
| mmsi | varchar | 선박 ID |
| time_bucket | timestamp | 5분 버킷 시점 |
| track_geom | LineStringM | 타임스탬프 포함 궤적 |
| distance_nm | numeric | 이동 거리(NM) |
| avg_speed | numeric | 평균 속도(knots) |
| max_speed | numeric | 최대 속도(knots) |
| point_count | int | AIS 포인트 수 |
| start_position | jsonb | {lat, lon, sog, cog, timestamp} |
| end_position | jsonb | {lat, lon, sog, cog, timestamp} |
- PK: (mmsi, time_bucket), 인덱스: mmsi, time_bucket
- **일별 파티셔닝**: t_vessel_tracks_5min_YYMMDD (예: _260326 = 2026-03-26)
- 하루 850만 행, vessel_store에 24시간 인메모리 캐시
- **활용**: 최근 수 시간 ~ 24시간 세밀한 이동 패턴 분석
#### signal.t_vessel_tracks_hourly — 시간별 항적 집계
| 컬럼 | 타입 | 설명 |
|------|------|------|
| mmsi | varchar | 선박 ID |
| time_bucket | timestamp | 1시간 버킷 |
| track_geom | LineStringM | 시간별 궤적 |
| distance_nm | numeric | 시간당 이동 거리 |
| avg_speed | numeric | 평균 속도 |
| max_speed | numeric | 최대 속도 |
| point_count | int | AIS 포인트 |
| start_position | jsonb | 시작 위치 |
| end_position | jsonb | 종료 위치 |
- **월별 파티셔닝**: t_vessel_tracks_hourly_YYYY_MM (예: _2026_03)
- 1.2억 행
- **활용**: 수일~수주 단위 이동 경로 추적, 패턴 비교
#### signal.t_vessel_tracks_daily — 일별 항적 요약
| 컬럼 | 타입 | 설명 |
|------|------|------|
| mmsi | varchar | 선박 ID |
| time_bucket | date | 날짜 |
| track_geom | LineStringM | 하루 궤적 |
| distance_nm | numeric | 일일 이동 거리(NM) |
| avg_speed | numeric | 평균 속도 |
| max_speed | numeric | 최대 속도 |
| point_count | int | AIS 포인트 |
| operating_hours | numeric | 운항 시간 |
| port_visits | jsonb | 입출항 기록 |
| start_position | jsonb | 시작 위치 |
| end_position | jsonb | 종료 위치 |
- **월별 파티셔닝**: t_vessel_tracks_daily_YYYY_MM (예: _2026_03)
- 800만 행, **2015년 8월~현재** 11년+ 이력
- **활용**: 장기 행동 패턴, 계절별 어장 이동, 기간 비교 분석
#### signal.t_vessel_static — 선박 정적정보 (1시간 주기 스냅샷)
| 컬럼 | 타입 | 설명 | 예시 |
|------|------|------|---------|
| mmsi | varchar | 선박 ID | '412236758' |
| time_bucket | timestamptz | 스냅샷 시점 (1시간 간격) | |
| imo | bigint | IMO 번호 | |
| name | varchar | 선명 (AIS 브로드캐스트) | 'LU_RONG_YU_55759' |
| callsign | varchar | 호출부호 | |
| vessel_type | varchar | 선종 | Cargo/Tanker/Vessel/Fishing/N/A |
| extra_info | varchar | 추가 정보 | |
| length | int | 선장(m) | |
| width | int | 선폭(m) | |
| draught | float | 흘수(m) | |
| destination | varchar | 목적지 (AIS 입력) | 'PU TIAN' |
| eta | timestamptz | 도착 예정 시각 | |
| status | varchar | 항해 상태 | Under way using engine/Moored/Anchored/Engaged in fishing |
| class_type | varchar | AIS 클래스 | A/B |
- PK: (mmsi, time_bucket)
- **변경 이력 보존**: 동일 MMSI라도 1시간마다 스냅샷 저장. name, destination, status 등이 변경되면 히스토리로 추적 가능
- **활용 예시**:
- 선명 변경 이력 추적 (위장/은폐 탐지)
- 목적지(destination) 변경 패턴 분석
- AIS 상태(status) 시계열 — 'Engaged in fishing' ↔ 'Under way' 전환 빈도
- 선박 제원(length/width/draught) 불일치 탐지
### snpdb 테이블 활용 가이드
| 분석 목적 | 사용 테이블 | 조회 범위 | 쿼리 |
|----------|-----------|----------|---------|
| **실시간 위치 추적** | 5min (오늘 파티션) | 최근 수 시간 | `_YYMMDD` 파티션 직접 지정 |
| **최근 항적 패턴** | 5min | 최근 24h | vessel_store 인메모리 캐시 우선 |
| **수일간 이동 경로** | hourly | 최근 7일 | `_YYYY_MM` 파티션 |
| **장기 행동 패턴** | daily | 수개월~수년 | 월별 파티션, distance_nm 집계 |
| **선명/목적지 변경** | static | 변경 이력 | mmsi 기준 time_bucket DESC |
| **선박 제원 확인** | static | 최신 1건 | MAX(time_bucket) |
| **AIS 상태 시계열** | static | 최근 수일 | status 변화 패턴 |
| **계절 조업 패턴** | daily | 연 단위 | 월별 distance_nm, avg_speed 비교 |
### 파티션 테이블 쿼리 시 주의
- 5min: `signal.t_vessel_tracks_5min_YYMMDD` (날짜 6자리)
- hourly: `signal.t_vessel_tracks_hourly_YYYY_MM` (연_월)
- daily: `signal.t_vessel_tracks_daily_YYYY_MM` (연_월)
- **부모 테이블 직접 조회 가능** (PostgreSQL이 파티션 프루닝 수행)
- 대량 조회 시 파티션 직접 지정이 성능에 유리
### 데이터 흐름
```
snpdb (AIS 원본 항적) → vessel_store (인메모리 24h) → 7단계 파이프라인
  → kcgdb.vessel_analysis_results (분석 결과, 48h 보존)
  → kcgdb.group_polygon_snapshots (선단/어구 폴리곤, 7일 보존)
  → Redis (채팅 컨텍스트 캐시, 6분 TTL)
```
### 도구 호출 규칙
- 답변에 필요한 구체적 선박 목록이 시스템 프롬프트에 없으면 도구를 호출하세요
- 도구 호출 결과를 받은 후, 데이터를 기반으로 답변하세요
- 한 번에 최대 2개 도구 호출 가능
- 집계 데이터(몇 척인지 등)는 이미 시스템 프롬프트에 있으므로 도구 불필요
- 대부분의 질문은 kcgdb로 충분하며, snpdb 직접 조회는 특수한 항적 분석에만 사용"""
DB_SCHEMA_AND_TOOLS = DB_SCHEMA_AND_TOOLS.replace('kcg.', f'{settings.KCGDB_SCHEMA}.')
# ── 지식 섹션 레지스트리 (키워드 → 상세 텍스트) ──
KNOWLEDGE_SECTIONS: dict[str, str] = {
'maritime_zones': MARITIME_ZONES,
'fishing_agreement': FISHING_AGREEMENT,
'algorithm_guide': ALGORITHM_GUIDE,
'response_guide': RESPONSE_GUIDE,
'db_schema': DB_SCHEMA_AND_TOOLS,
}
def get_knowledge_section(key: str) -> str:
"""키워드로 특정 도메인 지식 섹션을 반환."""
return KNOWLEDGE_SECTIONS.get(key, f'(알 수 없는 지식 키: {key})')
# ── 압축 시스템 프롬프트 (항상 포함, ~500토큰) ──
COMPACT_SYSTEM_PROMPT = """당신은 대한민국 해양경찰청의 해양상황 분석 AI 어시스턴트입니다.
14,000척 AIS 실시간 모니터링 + AI 분석 파이프라인(8개 알고리즘) 결과를 기반으로 답변합니다.
핵심 용어:
- 수역: 영해(TERRITORIAL_SEA, 12NM 이내), 접속수역(CONTIGUOUS_ZONE, 12~24NM), 특정어업수역(ZONE_I~IV), EEZ
- 위험도: CRITICAL(≥70) / HIGH/WATCH(≥50) / MEDIUM/MONITOR(≥30) / LOW/NORMAL(<30)
- 다크베셀: AIS 의도적 차단 (gap_duration_min), 2024.5.1부터 AIS 의무화
- 허가어선: 906척 등록 (PT 저인망 323, GN 유자망 200, PS 위망 16, OT 외끌이 13, FC 운반 31)
- 휴어기: PT/OT 4/16~10/15, GN 6/2~8/31
도구를 호출하여 데이터를 조회하거나 상세 지식에 접근할 수 있습니다:
- query_vessels: 조건별 선박 목록 조회 (zone, activity, risk_level, is_dark, vessel_type)
- query_vessel_detail: MMSI별 상세 분석 결과
- query_fleet_group: 선단/어구 그룹 조회
- query_vessel_history: 일별 항적 이력 (snpdb, 최대 30일)
- query_vessel_static: 선박 정적정보 + 변경 이력 (snpdb)
- get_knowledge: 상세 도메인 지식 조회 (예: maritime_zones, fishing_agreement, algorithm_guide, response_guide, db_schema)
도구 호출 형식:
```json
{"tool": "도구명", "params": {"key": "value"}}
```
응답 규칙: 한국어, 데이터 기반, 구체적 수치 명시, 마크다운 형식, 불법 의심 근거+조치 권고"""
def build_domain_knowledge() -> str:
"""전체 도메인 지식 반환 (레거시 호환용)."""
return '\n\n'.join([
ROLE_DEFINITION,
MARITIME_ZONES,
FISHING_AGREEMENT,
ALGORITHM_GUIDE,
RESPONSE_GUIDE,
RESPONSE_RULES,
DB_SCHEMA_AND_TOOLS,
])
def build_compact_prompt() -> str:
"""압축 시스템 프롬프트 반환 (~500토큰)."""
return COMPACT_SYSTEM_PROMPT

236
prediction/chat/router.py Normal file
파일 보기

@ -0,0 +1,236 @@
"""AI 해양분석 채팅 엔드포인트 — 사전 쿼리 + SSE 스트리밍 + Tool Calling."""
import json
import logging
import httpx
from fastapi import APIRouter
from fastapi.responses import StreamingResponse
from pydantic import BaseModel
from chat.cache import load_chat_history, save_chat_history, clear_chat_history
from chat.context_builder import MaritimeContextBuilder
from chat.tools import detect_prequery, execute_prequery, parse_tool_calls, execute_tool_call
from config import settings
logger = logging.getLogger(__name__)
router = APIRouter(prefix='/api/v1/chat', tags=['chat'])
class ChatRequest(BaseModel):
message: str
user_id: str = 'anonymous'
stream: bool = True
class ChatResponse(BaseModel):
role: str = 'assistant'
content: str
@router.post('')
async def chat(req: ChatRequest):
"""해양분석 채팅 — 사전 쿼리 + 분석 컨텍스트 + Ollama SSE 스트리밍."""
history = load_chat_history(req.user_id)
builder = MaritimeContextBuilder()
system_prompt = builder.build_system_prompt(user_message=req.message)
# ── 사전 쿼리: 키워드 패턴 매칭으로 DB 조회 후 컨텍스트 보강 ──
prequery_params = detect_prequery(req.message)
prequery_result = ''
if prequery_params:
prequery_result = execute_prequery(prequery_params)
logger.info('prequery: params=%s, results=%d chars', prequery_params, len(prequery_result))
# 시스템 프롬프트에 사전 쿼리 결과 추가
if prequery_result:
system_prompt += '\n\n' + prequery_result
messages = [
{'role': 'system', 'content': system_prompt},
*history[-10:],
{'role': 'user', 'content': req.message},
]
ollama_payload = {
'model': settings.OLLAMA_MODEL,
'messages': messages,
'stream': req.stream,
'options': {
'temperature': 0.3,
'num_predict': 1024,
'num_ctx': 2048,
},
}
if req.stream:
return StreamingResponse(
_stream_with_tools(ollama_payload, req.user_id, history, req.message),
media_type='text/event-stream',
headers={
'Cache-Control': 'no-cache',
'Connection': 'keep-alive',
'X-Accel-Buffering': 'no',
},
)
return await _call_with_tools(ollama_payload, req.user_id, history, req.message)
async def _stream_with_tools(payload: dict, user_id: str, history: list[dict], user_message: str):
"""SSE 스트리밍 — 1차 응답 후 Tool Call 감지 시 2차 호출."""
accumulated = ''
try:
async with httpx.AsyncClient(timeout=httpx.Timeout(settings.OLLAMA_TIMEOUT_SEC)) as client:
# 1차 LLM 호출
async with client.stream(
'POST',
f'{settings.OLLAMA_BASE_URL}/api/chat',
json=payload,
) as response:
async for line in response.aiter_lines():
if not line:
continue
try:
chunk = json.loads(line)
content = chunk.get('message', {}).get('content', '')
done = chunk.get('done', False)
accumulated += content
sse_data = json.dumps({
'content': content,
'done': False, # 아직 done 보내지 않음 (tool call 가능)
}, ensure_ascii=False)
yield f'data: {sse_data}\n\n'
if done:
break
except json.JSONDecodeError:
continue
# Tool Call 감지
tool_calls = parse_tool_calls(accumulated)
if tool_calls:
# Tool 실행
tool_results = []
for tc in tool_calls:
result = execute_tool_call(tc)
tool_results.append(result)
logger.info('tool call: %s → %d chars', tc.get('tool'), len(result))
tool_context = '\n'.join(tool_results)
# 2차 LLM 호출 (tool 결과 포함)
payload['messages'].append({'role': 'assistant', 'content': accumulated})
payload['messages'].append({
'role': 'user',
'content': f'도구 조회 결과입니다. 이 데이터를 기반으로 사용자 질문에 답변하세요:\n{tool_context}',
})
# 구분자 전송
separator = json.dumps({'content': '\n\n---\n_데이터 조회 완료. 분석 결과:_\n\n', 'done': False}, ensure_ascii=False)
yield f'data: {separator}\n\n'
accumulated_2 = ''
async with client.stream(
'POST',
f'{settings.OLLAMA_BASE_URL}/api/chat',
json=payload,
) as response2:
async for line in response2.aiter_lines():
if not line:
continue
try:
chunk = json.loads(line)
content = chunk.get('message', {}).get('content', '')
done = chunk.get('done', False)
accumulated_2 += content
sse_data = json.dumps({
'content': content,
'done': done,
}, ensure_ascii=False)
yield f'data: {sse_data}\n\n'
if done:
break
except json.JSONDecodeError:
continue
# 히스토리에는 최종 답변만 저장
accumulated = accumulated_2 or accumulated
except httpx.TimeoutException:
err_msg = json.dumps({'content': '\n\n[응답 시간 초과]', 'done': True})
yield f'data: {err_msg}\n\n'
except Exception as e:
logger.error('ollama stream error: %s', e)
err_msg = json.dumps({'content': f'[오류: {e}]', 'done': True})
yield f'data: {err_msg}\n\n'
if accumulated:
updated = history + [
{'role': 'user', 'content': user_message},
{'role': 'assistant', 'content': accumulated},
]
save_chat_history(user_id, updated)
yield 'data: [DONE]\n\n'
async def _call_with_tools(
payload: dict, user_id: str, history: list[dict], user_message: str,
) -> ChatResponse:
"""비스트리밍 — Tool Calling 포함."""
try:
async with httpx.AsyncClient(timeout=httpx.Timeout(settings.OLLAMA_TIMEOUT_SEC)) as client:
# 1차 호출
response = await client.post(
f'{settings.OLLAMA_BASE_URL}/api/chat',
json=payload,
)
data = response.json()
content = data.get('message', {}).get('content', '')
# Tool Call 감지
tool_calls = parse_tool_calls(content)
if tool_calls:
tool_results = [execute_tool_call(tc) for tc in tool_calls]
tool_context = '\n'.join(tool_results)
payload['messages'].append({'role': 'assistant', 'content': content})
payload['messages'].append({
'role': 'user',
'content': f'도구 조회 결과입니다. 이 데이터를 기반으로 답변하세요:\n{tool_context}',
})
response2 = await client.post(
f'{settings.OLLAMA_BASE_URL}/api/chat',
json=payload,
)
data2 = response2.json()
content = data2.get('message', {}).get('content', content)
updated = history + [
{'role': 'user', 'content': user_message},
{'role': 'assistant', 'content': content},
]
save_chat_history(user_id, updated)
return ChatResponse(content=content)
except Exception as e:
logger.error('ollama sync error: %s', e)
return ChatResponse(content=f'오류: AI 서버 연결 실패 ({e})')
@router.get('/history')
async def get_history(user_id: str = 'anonymous'):
return load_chat_history(user_id)
@router.delete('/history')
async def delete_history(user_id: str = 'anonymous'):
clear_chat_history(user_id)
return {'ok': True}
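
위 엔드포인트의 SSE 프로토콜(`data: {json}` 청크 반복 후 `data: [DONE]` 센티널)을 소비하는 클라이언트 측 스케치입니다. 서버가 :8001에서 동작한다고 가정한 예시입니다.

```python
import asyncio
import json

import httpx

async def main():
    req = {'message': '수역1 다크베셀 현황 알려줘', 'user_id': 'demo', 'stream': True}
    async with httpx.AsyncClient(timeout=None) as client:
        async with client.stream('POST', 'http://localhost:8001/api/v1/chat', json=req) as resp:
            async for line in resp.aiter_lines():
                if not line.startswith('data: '):
                    continue
                payload = line[len('data: '):]
                if payload == '[DONE]':  # 스트림 종료 센티널
                    break
                chunk = json.loads(payload)
                print(chunk.get('content', ''), end='', flush=True)

asyncio.run(main())
```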

420
prediction/chat/tools.py Normal file
파일 보기

@ -0,0 +1,420 @@
"""LLM Tool Calling 실행기 — 사전 쿼리 + 동적 DB 조회."""
import json
import logging
import re
from typing import Optional
from config import qualified_table
logger = logging.getLogger(__name__)
VESSEL_ANALYSIS_RESULTS = qualified_table('vessel_analysis_results')
FLEET_VESSELS = qualified_table('fleet_vessels')
GROUP_POLYGON_SNAPSHOTS = qualified_table('group_polygon_snapshots')
GEAR_CORRELATION_SCORES = qualified_table('gear_correlation_scores')
CORRELATION_PARAM_MODELS = qualified_table('correlation_param_models')
# ── 사전 쿼리 패턴 (키워드 기반, 1회 왕복으로 해결) ──
_ZONE_MAP = {
'수역1': 'ZONE_I', '수역 1': 'ZONE_I', '수역I': 'ZONE_I', 'ZONE_I': 'ZONE_I', '수역i': 'ZONE_I',
'수역2': 'ZONE_II', '수역 2': 'ZONE_II', '수역II': 'ZONE_II', 'ZONE_II': 'ZONE_II',
'수역3': 'ZONE_III', '수역 3': 'ZONE_III', '수역III': 'ZONE_III', 'ZONE_III': 'ZONE_III',
'수역4': 'ZONE_IV', '수역 4': 'ZONE_IV', '수역IV': 'ZONE_IV', 'ZONE_IV': 'ZONE_IV',
'영해': 'TERRITORIAL_SEA', '접속수역': 'CONTIGUOUS_ZONE',
}
_ACTIVITY_MAP = {
'조업': 'FISHING', '어로': 'FISHING', '조업중': 'FISHING', '조업활동': 'FISHING',
'정박': 'STATIONARY', '정지': 'STATIONARY', '대기': 'STATIONARY',
'항행': 'SAILING', '이동': 'SAILING', '항해': 'SAILING',
}
_RISK_MAP = {
'크리티컬': 'CRITICAL', 'critical': 'CRITICAL', '긴급': 'CRITICAL',
'워치': 'HIGH', 'watch': 'HIGH', '경고': 'HIGH', '고위험': 'HIGH',
'모니터': 'MEDIUM', 'monitor': 'MEDIUM', '주의': 'MEDIUM',
'위험': None, # 위험 선박 → CRITICAL+HIGH
}
_DARK_KEYWORDS = ['다크', '다크베셀', 'dark', 'ais 차단', 'ais차단', '신호차단']
_TRANSSHIP_KEYWORDS = ['환적', 'transshipment', '전재']
_SPOOF_KEYWORDS = ['스푸핑', 'spoofing', 'gps 조작', 'gps조작', '위치조작']
def detect_prequery(message: str) -> Optional[dict]:
"""사용자 메시지에서 사전 쿼리 패턴을 감지하여 DB 조회 파라미터 반환."""
msg = message.lower().strip()
params: dict = {}
# 수역 감지
for keyword, zone in _ZONE_MAP.items():
if keyword.lower() in msg:
params['zone'] = zone
break
# 활동 감지
for keyword, activity in _ACTIVITY_MAP.items():
if keyword in msg:
params['activity'] = activity
break
# 위험도 감지
for keyword, level in _RISK_MAP.items():
if keyword in msg:
if level:
params['risk_level'] = level
else:
params['risk_levels'] = ['CRITICAL', 'HIGH']
break
# 다크베셀 감지
if any(k in msg for k in _DARK_KEYWORDS):
params['is_dark'] = True
# 환적 감지
if any(k in msg for k in _TRANSSHIP_KEYWORDS):
params['is_transship'] = True
# 스푸핑 감지
if any(k in msg for k in _SPOOF_KEYWORDS):
params['spoofing'] = True
return params if params else None
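# 동작 예 (주석 스케치):
#   detect_prequery('수역1에서 조업 중인 다크베셀 알려줘')
#   → {'zone': 'ZONE_I', 'activity': 'FISHING', 'is_dark': True}
#   detect_prequery('안녕하세요') → None  # 패턴 미검출 시 사전 쿼리 생략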
def execute_prequery(params: dict) -> str:
"""사전 쿼리 패턴에 해당하는 DB 조회를 실행하여 결과를 텍스트로 반환."""
try:
from db import kcgdb
conditions = ["analyzed_at > NOW() - INTERVAL '1 hour'"]
bind_params: list = []
if 'zone' in params:
conditions.append('zone = %s')
bind_params.append(params['zone'])
if 'activity' in params:
conditions.append('activity_state = %s')
bind_params.append(params['activity'])
if 'risk_level' in params:
conditions.append('risk_level = %s')
bind_params.append(params['risk_level'])
elif 'risk_levels' in params:
placeholders = ','.join(['%s'] * len(params['risk_levels']))
conditions.append(f'risk_level IN ({placeholders})')
bind_params.extend(params['risk_levels'])
if params.get('is_dark'):
conditions.append('is_dark = TRUE')
if params.get('is_transship'):
conditions.append('is_transship_suspect = TRUE')
if params.get('spoofing'):
conditions.append('spoofing_score > 0.5')
where = ' AND '.join(conditions)
query = f"""
SELECT v.mmsi, v.risk_score, v.risk_level, v.zone, v.activity_state,
v.vessel_type, v.is_dark, v.gap_duration_min, v.spoofing_score,
v.cluster_id, v.cluster_size, v.dist_to_baseline_nm,
v.is_transship_suspect, v.transship_pair_mmsi,
fv.permit_no, fv.name_cn, fv.gear_code
FROM {VESSEL_ANALYSIS_RESULTS} v
LEFT JOIN {FLEET_VESSELS} fv ON v.mmsi = fv.mmsi
WHERE {where}
ORDER BY v.risk_score DESC
LIMIT 30
"""
with kcgdb.get_conn() as conn:
with conn.cursor() as cur:
cur.execute(query, bind_params)
rows = cur.fetchall()
if not rows:
return '\n## 조회 결과\n해당 조건에 맞는 선박이 없습니다.\n'
# 결과를 간략 테이블로 구성 (토큰 절약)
lines = [f'\n## 조회 결과 ({len(rows)}척)']
lines.append('| MMSI | 점수 | 수역 | 활동 | 허가 | 다크 |')
lines.append('|---|---|---|---|---|---|')
for row in rows[:15]: # 최대 15척
mmsi, risk_score, risk_level, zone, activity, vtype, is_dark, gap, spoof, \
cid, csize, dist_nm, is_trans, trans_pair, permit, name_cn, gear = row
permit_str = 'Y' if permit else 'N'
dark_str = 'Y' if is_dark else '-'
lines.append(f'| {mmsi} | {risk_score} | {zone} | {activity} | {permit_str} | {dark_str} |')
return '\n'.join(lines)
except Exception as e:
logger.error('prequery execution failed: %s', e)
return f'\n(DB 조회 실패: {e})\n'
# ── LLM Tool Calling 응답 파싱 + 실행 ──
_TOOL_CALL_PATTERN = re.compile(
r'\{"tool"\s*:\s*"(\w+)"\s*,\s*"params"\s*:\s*(\{[^}]+\})\}',
)
def parse_tool_calls(llm_response: str) -> list[dict]:
"""LLM 응답에서 tool call JSON을 추출."""
calls = []
for match in _TOOL_CALL_PATTERN.finditer(llm_response):
try:
tool_name = match.group(1)
params = json.loads(match.group(2))
calls.append({'tool': tool_name, 'params': params})
except json.JSONDecodeError:
continue
return calls[:2] # 최대 2개
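# 파싱 예 (주석 스케치) — LLM 응답 중간에 섞인 tool call JSON 추출:
#   text = '조회가 필요합니다. {"tool": "query_vessels", "params": {"zone": "ZONE_I", "limit": 5}}'
#   parse_tool_calls(text)
#   → [{'tool': 'query_vessels', 'params': {'zone': 'ZONE_I', 'limit': 5}}]
# 정규식의 params 부분이 [^}]+ 이므로 중첩 객체가 없는 평면 JSON만 매칭된다.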
def execute_tool_call(call: dict) -> str:
"""단일 tool call 실행."""
tool = call.get('tool', '')
params = call.get('params', {})
if tool == 'query_vessels':
return execute_prequery(params)
if tool == 'query_vessel_detail':
mmsi = params.get('mmsi', '')
if mmsi:
from chat.context_builder import _build_vessel_detail
return _build_vessel_detail(mmsi)
return '(MMSI 미지정)'
if tool == 'query_fleet_group':
return _query_fleet_group(params)
if tool == 'query_vessel_history':
return _query_vessel_history(params)
if tool == 'query_vessel_static':
return _query_vessel_static(params)
if tool == 'get_knowledge':
return _get_knowledge(params)
if tool == 'query_gear_correlation':
return _query_gear_correlation(params)
return f'(알 수 없는 도구: {tool})'
def _get_knowledge(params: dict) -> str:
"""도메인 지식 섹션 조회."""
key = params.get('key', '')
if not key:
return '(key 미지정. 사용 가능: maritime_zones, fishing_agreement, algorithm_guide, response_guide, db_schema)'
from chat.domain_knowledge import get_knowledge_section
return get_knowledge_section(key)
def _query_fleet_group(params: dict) -> str:
"""선단/어구 그룹 조회."""
try:
from db import kcgdb
conditions = [f"snapshot_time = (SELECT MAX(snapshot_time) FROM {GROUP_POLYGON_SNAPSHOTS})"]
bind_params: list = []
if 'group_type' in params:
conditions.append('group_type = %s')
bind_params.append(params['group_type'])
if 'zone_id' in params:
conditions.append('zone_id = %s')
bind_params.append(params['zone_id'])
where = ' AND '.join(conditions)
query = f"""
SELECT group_type, group_key, group_label, member_count, zone_name, members
FROM {GROUP_POLYGON_SNAPSHOTS}
WHERE {where}
ORDER BY member_count DESC
LIMIT 20
"""
with kcgdb.get_conn() as conn:
with conn.cursor() as cur:
cur.execute(query, bind_params)
rows = cur.fetchall()
if not rows:
return '\n(해당 조건의 그룹 없음)\n'
lines = [f'\n## 그룹 조회 결과 ({len(rows)}건)']
lines.append('| 유형 | 키 | 라벨 | 선박수 | 수역 |')
lines.append('|---|---|---|---|---|')
for row in rows:
gtype, gkey, glabel, mcount, zname, members = row
lines.append(f'| {gtype} | {gkey} | {glabel or "-"} | {mcount} | {zname or "-"} |')
return '\n'.join(lines)
except Exception as e:
logger.error('fleet group query failed: %s', e)
return f'\n(그룹 조회 실패: {e})\n'
def _query_vessel_history(params: dict) -> str:
"""snpdb에서 선박 항적 이력 조회 (daily 집계)."""
try:
from db import snpdb
mmsi = params.get('mmsi', '')
days = min(params.get('days', 7), 30) # 최대 30일
if not mmsi:
return '(MMSI 미지정)'
query = """
SELECT time_bucket, distance_nm, avg_speed, max_speed, point_count,
start_position, end_position
FROM signal.t_vessel_tracks_daily
WHERE mmsi = %s AND time_bucket >= CURRENT_DATE - %s
ORDER BY time_bucket DESC
"""
with snpdb.get_conn() as conn:
with conn.cursor() as cur:
cur.execute(query, (mmsi, days))
rows = cur.fetchall()
if not rows:
return f'\n(MMSI {mmsi}: 최근 {days}일 항적 데이터 없음)\n'
lines = [f'\n## 항적 이력: {mmsi} (최근 {days}일)']
lines.append('| 날짜 | 이동거리(NM) | 평균속도 | 최대속도 | AIS포인트 |')
lines.append('|---|---|---|---|---|')
for row in rows:
dt, dist, avg_spd, max_spd, pts, start_pos, end_pos = row
lines.append(
f"| {dt} | {float(dist or 0):.1f} | {float(avg_spd or 0):.1f}kt "
f"| {float(max_spd or 0):.1f}kt | {pts or 0} |"
)
return '\n'.join(lines)
except Exception as e:
logger.error('vessel history query failed: %s', e)
return f'\n(항적 이력 조회 실패: {e})\n'
def _query_vessel_static(params: dict) -> str:
"""snpdb에서 선박 정적정보 + 변경 이력 조회."""
try:
from db import snpdb
mmsi = params.get('mmsi', '')
limit = min(params.get('limit', 10), 24)
if not mmsi:
return '(MMSI 미지정)'
query = """
SELECT time_bucket, name, vessel_type, length, width, draught,
destination, status, callsign, imo
FROM signal.t_vessel_static
WHERE mmsi = %s
ORDER BY time_bucket DESC
LIMIT %s
"""
with snpdb.get_conn() as conn:
with conn.cursor() as cur:
cur.execute(query, (mmsi, limit))
rows = cur.fetchall()
if not rows:
return f'\n(MMSI {mmsi}: 정적정보 없음)\n'
# 최신 정보
latest = rows[0]
lines = [f'\n## 선박 정적정보: {mmsi}']
lines.append(f'- 선명: {latest[1] or "N/A"}')
lines.append(f'- 선종: {latest[2] or "N/A"}')
lines.append(f'- 제원: L={latest[3] or 0}m × W={latest[4] or 0}m, 흘수={latest[5] or 0}m')
lines.append(f'- 목적지: {latest[6] or "N/A"}')
lines.append(f'- 상태: {latest[7] or "N/A"}')
lines.append(f'- 호출부호: {latest[8] or "N/A"}, IMO: {latest[9] or "N/A"}')
# 변경 이력 감지
changes = []
for i in range(len(rows) - 1):
curr, prev = rows[i], rows[i + 1]
diffs = []
if curr[1] != prev[1]:
diffs.append(f'선명: {prev[1]}{curr[1]}')
if curr[6] != prev[6]:
diffs.append(f'목적지: {prev[6]}{curr[6]}')
if curr[7] != prev[7]:
diffs.append(f'상태: {prev[7]}{curr[7]}')
if diffs:
changes.append(f'- {curr[0].strftime("%m/%d %H:%M")}: {", ".join(diffs)}')
if changes:
lines.append(f'\n### 변경 이력 (최근 {len(changes)}건)')
lines.extend(changes[:10])
return '\n'.join(lines)
except Exception as e:
logger.error('vessel static query failed: %s', e)
return f'\n(정적정보 조회 실패: {e})\n'
def _query_gear_correlation(params: dict) -> str:
"""어구 그룹의 연관 선박/어구 조회."""
from db import kcgdb
group_key = params.get('group_key', '')
limit = int(params.get('limit', 10))
with kcgdb.get_conn() as conn:
cur = conn.cursor()
try:
cur.execute(
'SELECT target_name, target_mmsi, target_type, current_score, '
'streak_count, observation_count, proximity_ratio, visit_score, '
'heading_coherence, freeze_state '
f'FROM {GEAR_CORRELATION_SCORES} s '
f'JOIN {CORRELATION_PARAM_MODELS} m ON s.model_id = m.id '
'WHERE s.group_key = %s AND m.is_default = TRUE AND s.current_score >= 0.3 '
'ORDER BY s.current_score DESC LIMIT %s',
(group_key, limit),
)
rows = cur.fetchall()
except Exception:
return f'어구 그룹 "{group_key}"에 대한 연관성 데이터가 없습니다 (테이블 미생성).'
finally:
cur.close()
if not rows:
return f'어구 그룹 "{group_key}"에 대한 연관성 데이터가 없습니다.'
lines = [f'## {group_key} 연관 분석 (상위 {len(rows)}개, default 모델)']
for r in rows:
name, mmsi, ttype, score, streak, obs, prox, visit, heading, state = r
pct = score * 100
disp_name = name or mmsi
detail_parts = []
if prox is not None:
detail_parts.append(f'근접 {prox*100:.0f}%')
if visit is not None:
detail_parts.append(f'방문 {visit*100:.0f}%')
if heading is not None:
detail_parts.append(f'COG동조 {heading*100:.0f}%')
detail = ', '.join(detail_parts) if detail_parts else ''
lines.append(
f'- **{disp_name}** ({mmsi}, {ttype}): '
f'일치율 {pct:.1f}% (연속 {streak}회, 관측 {obs}회) '
f'[{detail}] 상태: {state}'
)
return '\n'.join(lines)

66
prediction/config.py Normal file
파일 보기

@ -0,0 +1,66 @@
import re
from typing import Optional
from pydantic_settings import BaseSettings
class Settings(BaseSettings):
# snpdb (궤적 데이터 소스)
SNPDB_HOST: str = '211.208.115.83'
SNPDB_PORT: int = 5432
SNPDB_NAME: str = 'snpdb'
SNPDB_USER: str = 'snp'
SNPDB_PASSWORD: str = 'snp#8932'
# kcgdb (분석 결과 저장 — kcgaidb 통합 DB)
KCGDB_HOST: str = '211.208.115.83'
KCGDB_PORT: int = 5432
KCGDB_NAME: str = 'kcgaidb'
KCGDB_SCHEMA: str = 'kcg'
KCGDB_USER: str = 'kcg-app'
KCGDB_PASSWORD: str = 'Kcg2026ai'
# 스케줄러
SCHEDULER_INTERVAL_MIN: int = 5
# 인메모리 캐시
CACHE_WINDOW_HOURS: int = 24
INITIAL_LOAD_HOURS: int = 24
STATIC_INFO_REFRESH_MIN: int = 60
PERMIT_REFRESH_MIN: int = 30
SNPDB_SAFE_DELAY_MIN: int = 12
SNPDB_BACKFILL_BUCKETS: int = 3
# 파이프라인
TRAJECTORY_HOURS: int = 6
MMSI_PREFIX: str = '412'
MIN_TRAJ_POINTS: int = 100
# Ollama (LLM)
OLLAMA_BASE_URL: str = 'http://localhost:11434'
OLLAMA_MODEL: str = 'qwen3:14b' # CPU-only: 14b 권장, GPU 있으면 32b
OLLAMA_TIMEOUT_SEC: int = 300
# Redis
REDIS_HOST: str = 'localhost'
REDIS_PORT: int = 6379
REDIS_PASSWORD: str = ''
# 로깅
LOG_LEVEL: str = 'INFO'
model_config = {'env_file': '.env', 'env_file_encoding': 'utf-8', 'extra': 'ignore'}
settings = Settings()
_SQL_IDENTIFIER = re.compile(r'^[A-Za-z_][A-Za-z0-9_]*$')
def qualified_table(table_name: str, schema: Optional[str] = None) -> str:
resolved_schema = schema or settings.KCGDB_SCHEMA
if not _SQL_IDENTIFIER.fullmatch(resolved_schema):
raise ValueError(f'Invalid schema name: {resolved_schema!r}')
if not _SQL_IDENTIFIER.fullmatch(table_name):
raise ValueError(f'Invalid table name: {table_name!r}')
return f'{resolved_schema}.{table_name}'
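
`qualified_table`이 식별자 정규식 검증으로 f-string SQL 조합을 안전하게 만드는 방식을 보여주는 사용 스케치입니다 (기본 스키마 `kcg` 가정):

```python
from config import qualified_table

print(qualified_table('vessel_analysis_results'))           # kcg.vessel_analysis_results
print(qualified_table('t_vessel_static', schema='signal'))  # signal.t_vessel_static
try:
    qualified_table('bad;DROP TABLE x')  # 정규식 불일치 → 인젝션 차단
except ValueError as e:
    print(e)  # Invalid table name: 'bad;DROP TABLE x'
```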

파일 보기

@ -0,0 +1 @@
{"points": [{"lat": 37.0, "lon": 124.0}, {"lat": 35.0, "lon": 129.0}]}

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

파일 보기

@ -0,0 +1 @@
{"type": "FeatureCollection", "name": "\ud2b9\uc815\uc5b4\uc5c5\uc218\uc5ed4", "crs": {"type": "name", "properties": {"name": "urn:ogc:def:crs:OGC:1.3:CRS84"}}, "features": [{"type": "Feature", "properties": {"fid": 0, "GML_ID": null, "OBJECTID": null, "ZONE_NM": null, "MNCT_NO": null, "MNCT_SCALE": null, "MNCT_NM": null, "RELREGLTN": null, "RELGOAG": null, "REVIYR": null, "ZONE_DESC": null, "PHOTO1_PAT": null, "ID": -2147483647, "CATE_CD": null, "ADR_CD": null, "ADR_KNM": null, "ORIGIN": null, "ORIYR": null, "ORIORG": null, "NAME": "\ud2b9\uc815\uc5b4\uc5c5\uc218\uc5ed\u2163", "WARD_NM": null, "WARD_ID": null, "GISID": null, "FID_2": null, "NAME_2": null, "FID_3": null, "NAME_3": null, "GID": null, "NAME_4": null, "FID_4": null, "NAME_5": null, "FID_5": null, "NAME_6": null}, "geometry": {"type": "MultiPolygon", "coordinates": [[[[13859276.603817873, 4232038.462456921], [13859276.603762543, 4321218.244482412], [13859276.603710985, 4404317.064005076], [13840719.645028654, 4439106.786523586], [13884632.712472571, 4439106.787250583], [13884632.712472571, 4439504.084564682], [13940418.269436067, 4439504.375880923], [13969123.924724836, 4439504.525783945], [13968718.329494288, 4438626.439593866], [13962623.599395147, 4425543.915710401], [13960437.31344761, 4420657.3891166765], [13958238.813611617, 4416093.569832627], [13958143.094601436, 4415900.994484875], [13958143.094601437, 4415900.994484875], [13957298.344237303, 4414201.456484755], [13953878.455604602, 4406316.186534493], [13949652.450365951, 4397019.979821594], [13948553.200448176, 4393395.13065616], [13947612.731073817, 4389132.176741289], [13947612.731072996, 4387549.226905922], [13947466.164417507, 4385829.556682826], [13947783.725505754, 4381721.729468383], [13948260.06713652, 4379835.70012994], [13949359.317054221, 4375897.403884492], [13951093.689146286, 4371808.582233328], [13954867.780530114, 4365670.678186072], [13964809.885341855, 4351190.629491161], [13978342.873219142, 4331838.456925102], [13980382.592510404, 4329007.496874151], [13981728.043604897, 4327079.749205159], [13985775.34591557, 4321280.81855131], [13997066.763484716, 4305102.598482491], [13999424.043863578, 4300225.286038025], [14003039.354703771, 4290447.064438686], [14005091.287883686, 4284626.561498255], [14006520.312777169, 4279426.932176922], [14007631.77658257, 4275178.643476352], [14008242.470981453, 4271549.325573796], [14009378.362562515, 4262248.123573576], [14009427.990871342, 4261704.85208626], [14009708.137538105, 4258638.140769343], [14009854.704193696, 4257224.555715567], [14009378.362562606, 4254698.603440943], [14005347.779531531, 4240996.452433007], [14002367.590864772, 4231511.1380338315], [14001280.554835469, 4227266.412716273], [14000486.652116666, 4225212.134400094], [13998047.81589918, 4222926.459154359], [13991387.305576058, 4216684.234498038], [13970721.407121927, 4197120.494488488], [13958654.085803084, 4185745.4565721145], [13956602.15262321, 4184012.5742896623], [13944065.033685392, 4171984.566055202], [13940467.606607554, 4168533.224265296], [13935619.01320107, 4163881.1438622964], [13935718.55954324, 4163976.6556012244], [13817590.293393573, 4163976.6556012244], [13859276.603817873, 4232038.462456921]]]]}}]}

파일 보기

330
prediction/db/kcgdb.py Normal file
파일 보기

@ -0,0 +1,330 @@
import json
import logging
from contextlib import contextmanager
from typing import TYPE_CHECKING, Optional
import psycopg2
from psycopg2 import pool
from psycopg2.extras import execute_values
from config import qualified_table, settings
if TYPE_CHECKING:
from models.result import AnalysisResult
logger = logging.getLogger(__name__)
_pool: Optional[pool.ThreadedConnectionPool] = None
GROUP_POLYGON_SNAPSHOTS = qualified_table('group_polygon_snapshots')
def init_pool():
global _pool
_pool = pool.ThreadedConnectionPool(
minconn=1,
maxconn=5,
host=settings.KCGDB_HOST,
port=settings.KCGDB_PORT,
dbname=settings.KCGDB_NAME,
user=settings.KCGDB_USER,
password=settings.KCGDB_PASSWORD,
options=f'-c search_path={settings.KCGDB_SCHEMA},public',
)
logger.info('kcgdb connection pool initialized')
def close_pool():
global _pool
if _pool:
_pool.closeall()
_pool = None
logger.info('kcgdb connection pool closed')
@contextmanager
def get_conn():
conn = _pool.getconn()
try:
yield conn
except Exception:
conn.rollback()
raise
finally:
_pool.putconn(conn)
def check_health() -> bool:
try:
with get_conn() as conn:
with conn.cursor() as cur:
cur.execute('SELECT 1')
return True
except Exception as e:
logger.error('kcgdb health check failed: %s', e)
return False
def upsert_results(results: list['AnalysisResult']) -> int:
"""분석 결과를 vessel_analysis_results 테이블에 upsert."""
if not results:
return 0
insert_sql = """
INSERT INTO vessel_analysis_results (
mmsi, timestamp, vessel_type, confidence, fishing_pct,
cluster_id, season, zone, dist_to_baseline_nm, activity_state,
ucaf_score, ucft_score, is_dark, gap_duration_min,
spoofing_score, bd09_offset_m, speed_jump_count,
cluster_size, is_leader, fleet_role,
risk_score, risk_level,
is_transship_suspect, transship_pair_mmsi, transship_duration_min,
features, analyzed_at
) VALUES %s
ON CONFLICT (mmsi, timestamp) DO UPDATE SET
vessel_type = EXCLUDED.vessel_type,
confidence = EXCLUDED.confidence,
fishing_pct = EXCLUDED.fishing_pct,
cluster_id = EXCLUDED.cluster_id,
season = EXCLUDED.season,
zone = EXCLUDED.zone,
dist_to_baseline_nm = EXCLUDED.dist_to_baseline_nm,
activity_state = EXCLUDED.activity_state,
ucaf_score = EXCLUDED.ucaf_score,
ucft_score = EXCLUDED.ucft_score,
is_dark = EXCLUDED.is_dark,
gap_duration_min = EXCLUDED.gap_duration_min,
spoofing_score = EXCLUDED.spoofing_score,
bd09_offset_m = EXCLUDED.bd09_offset_m,
speed_jump_count = EXCLUDED.speed_jump_count,
cluster_size = EXCLUDED.cluster_size,
is_leader = EXCLUDED.is_leader,
fleet_role = EXCLUDED.fleet_role,
risk_score = EXCLUDED.risk_score,
risk_level = EXCLUDED.risk_level,
is_transship_suspect = EXCLUDED.is_transship_suspect,
transship_pair_mmsi = EXCLUDED.transship_pair_mmsi,
transship_duration_min = EXCLUDED.transship_duration_min,
features = EXCLUDED.features,
analyzed_at = EXCLUDED.analyzed_at
"""
try:
with get_conn() as conn:
with conn.cursor() as cur:
tuples = [r.to_db_tuple() for r in results]
execute_values(cur, insert_sql, tuples, page_size=100)
conn.commit()
count = len(tuples)
logger.info('upserted %d analysis results', count)
return count
except Exception as e:
logger.error('failed to upsert results: %s', e)
return 0
def cleanup_old(hours: int = 48) -> int:
"""오래된 분석 결과 삭제."""
try:
with get_conn() as conn:
with conn.cursor() as cur:
cur.execute(
'DELETE FROM vessel_analysis_results WHERE analyzed_at < NOW() - (%s * INTERVAL \'1 hour\')',
(hours,),
)
deleted = cur.rowcount
conn.commit()
if deleted > 0:
logger.info('cleaned up %d old results (older than %dh)', deleted, hours)
return deleted
except Exception as e:
logger.error('failed to cleanup old results: %s', e)
return 0
def save_group_snapshots(snapshots: list[dict]) -> int:
"""group_polygon_snapshots에 폴리곤 스냅샷 배치 INSERT.
snapshots: polygon_builder.build_all_group_snapshots() 결과
항목은: group_type, group_key, group_label, snapshot_time,
polygon_wkt (str|None), center_wkt (str|None),
area_sq_nm, member_count, zone_id, zone_name,
members (list[dict]), color
"""
if not snapshots:
return 0
insert_sql = f"""
INSERT INTO {GROUP_POLYGON_SNAPSHOTS} (
group_type, group_key, group_label, sub_cluster_id, resolution, snapshot_time,
polygon, center_point, area_sq_nm, member_count,
zone_id, zone_name, members, color
) VALUES (
%s, %s, %s, %s, %s, %s,
ST_GeomFromText(%s, 4326), ST_GeomFromText(%s, 4326),
%s, %s, %s, %s, %s::jsonb, %s
)
"""
inserted = 0
try:
with get_conn() as conn:
with conn.cursor() as cur:
for s in snapshots:
cur.execute(
insert_sql,
(
s['group_type'],
s['group_key'],
s['group_label'],
s.get('sub_cluster_id', 0),
s.get('resolution', '6h'),
s['snapshot_time'],
s.get('polygon_wkt'),
s.get('center_wkt'),
s.get('area_sq_nm'),
s.get('member_count'),
s.get('zone_id'),
s.get('zone_name'),
json.dumps(s.get('members', []), ensure_ascii=False),
s.get('color'),
),
)
inserted += 1
conn.commit()
logger.info('saved %d group polygon snapshots', inserted)
return inserted
except Exception as e:
logger.error('failed to save group snapshots: %s', e)
return 0
def fetch_analysis_summary() -> dict:
"""최근 1시간 분석 결과 요약 (채팅 컨텍스트용)."""
try:
with get_conn() as conn:
with conn.cursor() as cur:
# 위험도 분포
cur.execute("""
SELECT risk_level, COUNT(*) FROM vessel_analysis_results
WHERE analyzed_at > NOW() - INTERVAL '1 hour'
GROUP BY risk_level
""")
risk_dist = {row[0]: row[1] for row in cur.fetchall()}
# 수역별 분포
cur.execute("""
SELECT zone, COUNT(*) FROM vessel_analysis_results
WHERE analyzed_at > NOW() - INTERVAL '1 hour'
GROUP BY zone
""")
zone_dist = {row[0]: row[1] for row in cur.fetchall()}
# 다크/스푸핑/환적 카운트
cur.execute("""
SELECT
COUNT(*) FILTER (WHERE is_dark = TRUE) AS dark_count,
COUNT(*) FILTER (WHERE spoofing_score > 0.5) AS spoofing_count,
COUNT(*) FILTER (WHERE is_transship_suspect = TRUE) AS transship_count
FROM vessel_analysis_results
WHERE analyzed_at > NOW() - INTERVAL '1 hour'
""")
row = cur.fetchone()
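# 주: risk_level 분포와 zone 분포를 한 dict로 합쳐 반환한다.
# 두 값 집합(CRITICAL.. / TERRITORIAL_SEA..)이 겹치지 않아 키 충돌이 없고,
# 컨텍스트 빌더가 risk.get('CRITICAL')과 risk.get('ZONE_I')를 같은 객체에서 읽는다.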
result = {
'risk_distribution': {**risk_dist, **zone_dist},
'dark_count': row[0] if row else 0,
'spoofing_count': row[1] if row else 0,
'transship_count': row[2] if row else 0,
}
return result
except Exception as e:
logger.error('fetch_analysis_summary failed: %s', e)
return {'risk_distribution': {}, 'dark_count': 0, 'spoofing_count': 0, 'transship_count': 0}
def fetch_recent_high_risk(limit: int = 10) -> list[dict]:
"""위험도 상위 N척 선박 상세 (채팅 컨텍스트용)."""
try:
with get_conn() as conn:
with conn.cursor() as cur:
cur.execute("""
SELECT mmsi, risk_score, risk_level, zone, is_dark,
is_transship_suspect, activity_state, spoofing_score
FROM vessel_analysis_results
WHERE analyzed_at > NOW() - INTERVAL '1 hour'
ORDER BY risk_score DESC
LIMIT %s
""", (limit,))
rows = cur.fetchall()
result = []
for row in rows:
result.append({
'mmsi': row[0],
'name': row[0], # vessel_store에서 이름 조회 필요시 보강
'risk_score': row[1],
'risk_level': row[2],
'zone': row[3],
'is_dark': row[4],
'is_transship': row[5],
'activity_state': row[6],
'spoofing_score': float(row[7]) if row[7] else 0.0,
})
return result
except Exception as e:
logger.error('fetch_recent_high_risk failed: %s', e)
return []
def fetch_polygon_summary() -> dict:
"""최신 그룹 폴리곤 요약 (채팅 컨텍스트용)."""
try:
with get_conn() as conn:
with conn.cursor() as cur:
cur.execute(f"""
SELECT group_type, COUNT(*), SUM(member_count)
FROM {GROUP_POLYGON_SNAPSHOTS}
WHERE snapshot_time = (
SELECT MAX(snapshot_time) FROM {GROUP_POLYGON_SNAPSHOTS}
)
GROUP BY group_type
""")
rows = cur.fetchall()
result = {
'fleet_count': 0, 'fleet_members': 0,
'gear_in_zone': 0, 'gear_out_zone': 0,
}
for row in rows:
gtype, count, members = row[0], row[1], row[2] or 0
if gtype == 'FLEET':
result['fleet_count'] = count
result['fleet_members'] = members
elif gtype == 'GEAR_IN_ZONE':
result['gear_in_zone'] = count
elif gtype == 'GEAR_OUT_ZONE':
result['gear_out_zone'] = count
return result
except Exception as e:
logger.error('fetch_polygon_summary failed: %s', e)
return {'fleet_count': 0, 'fleet_members': 0, 'gear_in_zone': 0, 'gear_out_zone': 0}
def cleanup_group_snapshots(days: int = 7) -> int:
"""오래된 그룹 폴리곤 스냅샷 삭제."""
try:
with get_conn() as conn:
with conn.cursor() as cur:
cur.execute(
f"DELETE FROM {GROUP_POLYGON_SNAPSHOTS} "
"WHERE snapshot_time < NOW() - (%s * INTERVAL '1 day')",
(days,),
)
deleted = cur.rowcount
conn.commit()
if deleted > 0:
logger.info('cleaned up %d old group snapshots (older than %dd)', deleted, days)
return deleted
except Exception as e:
logger.error('failed to cleanup group snapshots: %s', e)
return 0

파일 보기

@ -0,0 +1,143 @@
"""gear_correlation_raw_metrics 파티션 유지보수.
APScheduler 일별 작업으로 실행:
- system_config에서 설정 읽기 (hot-reload, 프로세스 재시작 불필요)
- 미래 파티션 미리 생성
- 만료 파티션 DROP
- 미관측 점수 레코드 정리
"""
import logging
from datetime import date, datetime, timedelta
from config import qualified_table, settings
logger = logging.getLogger(__name__)
SYSTEM_CONFIG = qualified_table('system_config')
GEAR_CORRELATION_RAW_METRICS = qualified_table('gear_correlation_raw_metrics')
GEAR_CORRELATION_SCORES = qualified_table('gear_correlation_scores')
def _get_config_int(conn, key: str, default: int) -> int:
"""system_config에서 설정값 조회. 없으면 default."""
cur = conn.cursor()
try:
cur.execute(
f"SELECT value::text FROM {SYSTEM_CONFIG} WHERE key = %s",
(key,),
)
row = cur.fetchone()
return int(row[0].strip('"')) if row else default
except Exception:
return default
finally:
cur.close()
def _create_future_partitions(conn, days_ahead: int) -> int:
"""미래 N일 파티션 생성. 반환: 생성된 파티션 수."""
cur = conn.cursor()
created = 0
try:
for i in range(days_ahead + 1):
d = date.today() + timedelta(days=i)
partition_name = f'gear_correlation_raw_metrics_{d.strftime("%Y%m%d")}'
cur.execute(
"SELECT 1 FROM pg_class c "
"JOIN pg_namespace n ON n.oid = c.relnamespace "
"WHERE c.relname = %s AND n.nspname = %s",
(partition_name, settings.KCGDB_SCHEMA),
)
if cur.fetchone() is None:
next_d = d + timedelta(days=1)
cur.execute(
f"CREATE TABLE IF NOT EXISTS {qualified_table(partition_name)} "
f"PARTITION OF {GEAR_CORRELATION_RAW_METRICS} "
f"FOR VALUES FROM ('{d.isoformat()}') TO ('{next_d.isoformat()}')"
)
created += 1
logger.info('created partition: %s.%s', settings.KCGDB_SCHEMA, partition_name)
conn.commit()
except Exception as e:
conn.rollback()
logger.error('failed to create partitions: %s', e)
finally:
cur.close()
return created
def _drop_expired_partitions(conn, retention_days: int) -> int:
"""retention_days 초과 파티션 DROP. 반환: 삭제된 파티션 수."""
cutoff = date.today() - timedelta(days=retention_days)
cur = conn.cursor()
dropped = 0
try:
cur.execute(
"SELECT c.relname FROM pg_class c "
"JOIN pg_namespace n ON n.oid = c.relnamespace "
"WHERE c.relname LIKE 'gear_correlation_raw_metrics_%%' "
"AND n.nspname = %s AND c.relkind = 'r'",
(settings.KCGDB_SCHEMA,),
)
for (name,) in cur.fetchall():
date_str = name.rsplit('_', 1)[-1]
try:
partition_date = datetime.strptime(date_str, '%Y%m%d').date()
except ValueError:
continue
if partition_date < cutoff:
cur.execute(f'DROP TABLE IF EXISTS {qualified_table(name)}')
dropped += 1
logger.info('dropped expired partition: %s.%s', settings.KCGDB_SCHEMA, name)
conn.commit()
except Exception as e:
conn.rollback()
logger.error('failed to drop partitions: %s', e)
finally:
cur.close()
return dropped
def _cleanup_stale_scores(conn, cleanup_days: int) -> int:
"""cleanup_days 이상 미관측 점수 레코드 삭제."""
cur = conn.cursor()
try:
cur.execute(
f"DELETE FROM {GEAR_CORRELATION_SCORES} "
"WHERE last_observed_at < NOW() - make_interval(days => %s)",
(cleanup_days,),
)
deleted = cur.rowcount
conn.commit()
return deleted
except Exception as e:
conn.rollback()
logger.error('failed to cleanup stale scores: %s', e)
return 0
finally:
cur.close()
def maintain_partitions():
"""일별 파티션 유지보수 — 스케줄러에서 호출.
system_config에서 설정을 매번 읽으므로
API를 통한 설정 변경이 다음 실행 시 즉시 반영됨.
"""
from db import kcgdb
with kcgdb.get_conn() as conn:
retention = _get_config_int(conn, 'partition.raw_metrics.retention_days', 7)
ahead = _get_config_int(conn, 'partition.raw_metrics.create_ahead_days', 3)
cleanup_days = _get_config_int(conn, 'partition.scores.cleanup_days', 30)
created = _create_future_partitions(conn, ahead)
dropped = _drop_expired_partitions(conn, retention)
cleaned = _cleanup_stale_scores(conn, cleanup_days)
logger.info(
'partition maintenance: %d created, %d dropped, %d stale scores cleaned '
'(retention=%dd, ahead=%dd, cleanup=%dd)',
created, dropped, cleaned, retention, ahead, cleanup_days,
)
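
위 docstring이 말하는 "APScheduler 일별 작업" 등록 방식의 스케치입니다. 스케줄러 설정 코드는 이 커밋 범위에 없으므로 모듈 경로와 실행 시각은 가정입니다.

```python
# 가정: BackgroundScheduler 사용, 모듈 경로는 db.partition_maintenance
from apscheduler.schedulers.background import BackgroundScheduler

from db.partition_maintenance import maintain_partitions  # 경로는 가정

scheduler = BackgroundScheduler(timezone='Asia/Seoul')
# 매일 03:00 KST — 분석 트래픽이 적은 시간대에 파티션 생성/정리 실행
scheduler.add_job(maintain_partitions, 'cron', hour=3, minute=0,
                  id='partition-maintenance', replace_existing=True)
scheduler.start()
```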

210
prediction/db/snpdb.py Normal file
파일 보기

@ -0,0 +1,210 @@
import logging
from contextlib import contextmanager
from datetime import datetime
from typing import Optional
import pandas as pd
import psycopg2
from psycopg2 import pool
from config import settings
from time_bucket import compute_incremental_window_start, compute_initial_window_start, compute_safe_bucket
logger = logging.getLogger(__name__)
_pool: Optional[pool.ThreadedConnectionPool] = None
def init_pool():
global _pool
_pool = pool.ThreadedConnectionPool(
minconn=1,
maxconn=3,
host=settings.SNPDB_HOST,
port=settings.SNPDB_PORT,
dbname=settings.SNPDB_NAME,
user=settings.SNPDB_USER,
password=settings.SNPDB_PASSWORD,
)
logger.info('snpdb connection pool initialized')
def close_pool():
global _pool
if _pool:
_pool.closeall()
_pool = None
logger.info('snpdb connection pool closed')
@contextmanager
def get_conn():
conn = _pool.getconn()
try:
yield conn
finally:
_pool.putconn(conn)
def check_health() -> bool:
try:
with get_conn() as conn:
with conn.cursor() as cur:
cur.execute('SELECT 1')
return True
except Exception as e:
logger.error('snpdb health check failed: %s', e)
return False
def fetch_all_tracks(hours: int = 24) -> pd.DataFrame:
"""한국 해역 전 선박의 궤적 포인트를 조회한다.
LineStringM 지오메트리에서 개별 포인트를 추출하며,
한국 해역(122~132°E, 31~39°N)의 최근 N시간 데이터를 반환한다.
"""
safe_bucket = compute_safe_bucket()
window_start = compute_initial_window_start(hours, safe_bucket)
query = """
SELECT
t.mmsi,
to_timestamp(ST_M((dp).geom)) as timestamp,
t.time_bucket,
ST_Y((dp).geom) as lat,
ST_X((dp).geom) as lon,
CASE
WHEN (dp).path[1] = 1 THEN (t.start_position->>'sog')::float
ELSE COALESCE((t.end_position->>'sog')::float, t.avg_speed::float)
END as raw_sog
FROM signal.t_vessel_tracks_5min t,
LATERAL ST_DumpPoints(t.track_geom) dp
WHERE t.time_bucket >= %s
AND t.time_bucket <= %s
AND t.track_geom && ST_MakeEnvelope(122, 31, 132, 39, 4326)
ORDER BY t.mmsi, to_timestamp(ST_M((dp).geom))
"""
try:
with get_conn() as conn:
df = pd.read_sql_query(query, conn, params=(window_start, safe_bucket))
logger.info(
'fetch_all_tracks: %d rows, %d vessels (window=%s..%s, last %dh safe)',
len(df),
df['mmsi'].nunique() if len(df) > 0 else 0,
window_start,
safe_bucket,
hours,
)
return df
except Exception as e:
logger.error('fetch_all_tracks failed: %s', e)
return pd.DataFrame(columns=['mmsi', 'timestamp', 'lat', 'lon', 'raw_sog'])
def fetch_incremental(last_bucket: datetime) -> pd.DataFrame:
"""last_bucket 이후의 신규 궤적 포인트를 조회한다.
스케줄러 증분 업데이트에 사용되며, time_bucket > last_bucket 조건으로
이미 처리한 버킷을 건너뛴다.
"""
safe_bucket = compute_safe_bucket()
from_bucket = compute_incremental_window_start(last_bucket)
if safe_bucket <= from_bucket:
logger.info(
'fetch_incremental skipped: safe_bucket=%s, from_bucket=%s, last_bucket=%s',
safe_bucket,
from_bucket,
last_bucket,
)
return pd.DataFrame(columns=['mmsi', 'timestamp', 'lat', 'lon', 'raw_sog'])
query = """
SELECT
t.mmsi,
to_timestamp(ST_M((dp).geom)) as timestamp,
t.time_bucket,
ST_Y((dp).geom) as lat,
ST_X((dp).geom) as lon,
CASE
WHEN (dp).path[1] = 1 THEN (t.start_position->>'sog')::float
ELSE COALESCE((t.end_position->>'sog')::float, t.avg_speed::float)
END as raw_sog
FROM signal.t_vessel_tracks_5min t,
LATERAL ST_DumpPoints(t.track_geom) dp
WHERE t.time_bucket > %s
AND t.time_bucket <= %s
AND t.track_geom && ST_MakeEnvelope(122, 31, 132, 39, 4326)
ORDER BY t.mmsi, to_timestamp(ST_M((dp).geom))
"""
try:
with get_conn() as conn:
df = pd.read_sql_query(query, conn, params=(from_bucket, safe_bucket))
logger.info(
'fetch_incremental: %d rows, %d vessels (from %s, safe %s, last %s)',
len(df),
df['mmsi'].nunique() if len(df) > 0 else 0,
from_bucket.isoformat(),
safe_bucket.isoformat(),
last_bucket.isoformat(),
)
return df
except Exception as e:
logger.error('fetch_incremental failed: %s', e)
return pd.DataFrame(columns=['mmsi', 'timestamp', 'lat', 'lon', 'raw_sog'])
def fetch_static_info(mmsi_list: list[str]) -> dict[str, dict]:
"""MMSI 목록에 해당하는 선박 정적 정보를 조회한다.
DISTINCT ON (mmsi)으로 최신 레코드만 반환한다.
"""
query = """
SELECT DISTINCT ON (mmsi) mmsi, name, vessel_type, length, width
FROM signal.t_vessel_static
WHERE mmsi = ANY(%s)
ORDER BY mmsi, time_bucket DESC
"""
try:
with get_conn() as conn:
with conn.cursor() as cur:
cur.execute(query, (mmsi_list,))
rows = cur.fetchall()
result = {
row[0]: {
'name': row[1],
'vessel_type': row[2],
'length': row[3],
'width': row[4],
}
for row in rows
}
logger.info('fetch_static_info: %d vessels resolved', len(result))
return result
except Exception as e:
logger.error('fetch_static_info failed: %s', e)
return {}
def fetch_permit_mmsis() -> set[str]:
"""중국 허가어선 MMSI 목록을 조회한다.
signal.t_chnprmship_positions 테이블에서 DISTINCT mmsi를 반환한다.
"""
query = """
SELECT DISTINCT mmsi FROM signal.t_chnprmship_positions
"""
try:
with get_conn() as conn:
with conn.cursor() as cur:
cur.execute(query)
rows = cur.fetchall()
result = {row[0] for row in rows}
logger.info('fetch_permit_mmsis: %d permitted vessels', len(result))
return result
except Exception as e:
logger.error('fetch_permit_mmsis failed: %s', e)
return set()

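The three fetchers above share one contract: parameterized SQL over the pooled connection, a safe-bucket upper bound so half-written 5-minute buckets are never read, and an empty fallback on failure. A minimal polling sketch under those assumptions (the loop driver is illustrative, not part of this commit; fetch_all_tracks is assumed to take the window size in hours, as its log line suggests):

```
from datetime import datetime

from db import snpdb

def poll_once(last_bucket: datetime | None) -> datetime | None:
    """One polling step: full window on first run, incremental afterwards."""
    if last_bucket is None:
        df = snpdb.fetch_all_tracks(24)           # assumed: initial window in hours
    else:
        df = snpdb.fetch_incremental(last_bucket)
    if len(df) == 0:
        return last_bucket                        # nothing new; keep the cursor
    return df['time_bucket'].max()                # advance to the newest bucket read
```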
34
prediction/env.example Normal file
파일 보기

@ -0,0 +1,34 @@
# snpdb (trajectory data source)
SNPDB_HOST=211.208.115.83
SNPDB_PORT=5432
SNPDB_NAME=snpdb
SNPDB_USER=snp
SNPDB_PASSWORD=snp#8932
# kcgdb (analysis result store)
KCGDB_HOST=211.208.115.83
KCGDB_PORT=5432
KCGDB_NAME=kcgdb
KCGDB_SCHEMA=kcg
KCGDB_USER=kcg_app
KCGDB_PASSWORD=Kcg2026monitor
# Scheduler
SCHEDULER_INTERVAL_MIN=5
# Pipeline
TRAJECTORY_HOURS=6
MMSI_PREFIX=412
MIN_TRAJ_POINTS=100
# Ollama (LLM)
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=qwen3:32b
# Redis
REDIS_HOST=localhost
REDIS_PORT=6379
REDIS_PASSWORD=
# Logging
LOG_LEVEL=INFO

370
prediction/fleet_tracker.py Normal file
파일 보기

@ -0,0 +1,370 @@
"""등록 선단 기반 추적기."""
import logging
import re
import time
from datetime import datetime, timezone
from typing import Optional
import pandas as pd
from algorithms.gear_name_rules import is_trackable_parent_name
from config import qualified_table
logger = logging.getLogger(__name__)
# Gear name pattern — allows spaces, alphanumeric indices, and a trailing _
GEAR_PATTERN = re.compile(r'^(.+?)_(?=\S*\d)\S+(?:[_ ]\S*)*[_ ]*$|^(\d+)$')
GEAR_PATTERN_PCT = re.compile(r'^(.+?)%$')
_REGISTRY_CACHE_SEC = 3600
FLEET_COMPANIES = qualified_table('fleet_companies')
FLEET_VESSELS = qualified_table('fleet_vessels')
GEAR_IDENTITY_LOG = qualified_table('gear_identity_log')
GEAR_CORRELATION_SCORES = qualified_table('gear_correlation_scores')
FLEET_TRACKING_SNAPSHOT = qualified_table('fleet_tracking_snapshot')
class FleetTracker:
def __init__(self) -> None:
self._companies: dict[int, dict] = {} # id → {name_cn, name_en}
self._vessels: dict[int, dict] = {} # id → {permit_no, name_cn, ...}
self._name_cn_map: dict[str, int] = {} # name_cn → vessel_id
self._name_en_map: dict[str, int] = {} # name_en(lowercase) → vessel_id
self._mmsi_to_vid: dict[str, int] = {}  # mmsi → vessel_id (matched only)
self._gear_active: dict[str, dict] = {} # mmsi → {name, parent_mmsi, ...}
self._last_registry_load: float = 0.0
def load_registry(self, conn) -> None:
"""DB에서 fleet_companies + fleet_vessels 로드. 1시간 캐시."""
if time.time() - self._last_registry_load < _REGISTRY_CACHE_SEC:
return
cur = conn.cursor()
cur.execute(f'SELECT id, name_cn, name_en FROM {FLEET_COMPANIES}')
self._companies = {r[0]: {'name_cn': r[1], 'name_en': r[2]} for r in cur.fetchall()}
cur.execute(
f"""SELECT id, company_id, permit_no, name_cn, name_en, tonnage,
gear_code, fleet_role, pair_vessel_id, mmsi
FROM {FLEET_VESSELS}"""
)
self._vessels = {}
self._name_cn_map = {}
self._name_en_map = {}
self._mmsi_to_vid = {}
for r in cur.fetchall():
vid = r[0]
v: dict = {
'id': vid,
'company_id': r[1],
'permit_no': r[2],
'name_cn': r[3],
'name_en': r[4],
'tonnage': r[5],
'gear_code': r[6],
'fleet_role': r[7],
'pair_vessel_id': r[8],
'mmsi': r[9],
}
self._vessels[vid] = v
if r[3]:
self._name_cn_map[r[3]] = vid
if r[4]:
self._name_en_map[r[4].lower().strip()] = vid
if r[9]:
self._mmsi_to_vid[r[9]] = vid
cur.close()
self._last_registry_load = time.time()
logger.info(
'fleet registry loaded: %d companies, %d vessels',
len(self._companies),
len(self._vessels),
)
def match_ais_to_registry(self, ais_vessels: list[dict], conn) -> None:
"""AIS 선박을 등록 선단에 매칭. DB 업데이트.
ais_vessels: [{mmsi, name, lat, lon, sog, cog}, ...]
"""
cur = conn.cursor()
matched = 0
for v in ais_vessels:
mmsi = v.get('mmsi', '')
name = v.get('name', '')
if not mmsi or not name:
continue
# Already matched → refresh last_seen_at
if mmsi in self._mmsi_to_vid:
cur.execute(
f'UPDATE {FLEET_VESSELS} SET last_seen_at = NOW() WHERE id = %s',
(self._mmsi_to_vid[mmsi],),
)
continue
# NAME_EXACT matching
vid: Optional[int] = self._name_cn_map.get(name)
if not vid:
vid = self._name_en_map.get(name.lower().strip())
if vid:
cur.execute(
f"""UPDATE {FLEET_VESSELS}
SET mmsi = %s, match_confidence = 0.95, match_method = 'NAME_EXACT',
last_seen_at = NOW(), updated_at = NOW()
WHERE id = %s AND (mmsi IS NULL OR mmsi = %s)""",
(mmsi, vid, mmsi),
)
self._mmsi_to_vid[mmsi] = vid
matched += 1
conn.commit()
cur.close()
if matched > 0:
logger.info('AIS→registry matched: %d vessels', matched)
def track_gear_identity(self, gear_signals: list[dict], conn) -> None:
"""어구/어망 정체성 추적.
gear_signals: [{mmsi, name, lat, lon}, ...] 이름이 XXX_숫자_숫자 패턴인 AIS 신호
"""
cur = conn.cursor()
now = datetime.now(timezone.utc)
for g in gear_signals:
mmsi = g['mmsi']
name = g['name']
lat = g.get('lat', 0)
lon = g.get('lon', 0)
# Extract parent vessel name + indices
parent_name: Optional[str] = None
idx1: Optional[int] = None
idx2: Optional[int] = None
m = GEAR_PATTERN.match(name)
if m:
# group(1): parent+index pattern, group(2): digits-only pattern
if m.group(1):
parent_name = m.group(1).strip()
suffix = name[m.end(1):].strip(' _')
digits = re.findall(r'\d+', suffix)
idx1 = int(digits[0]) if len(digits) >= 1 else None
idx2 = int(digits[1]) if len(digits) >= 2 else None
else:
# Digits-only name (e.g. 12345) — no parent, index only
idx1 = int(m.group(2))
else:
m2 = GEAR_PATTERN_PCT.match(name)
if m2:
parent_name = m2.group(1).strip()
effective_parent_name = parent_name or name
if not is_trackable_parent_name(effective_parent_name):
continue
# Match the parent vessel
parent_mmsi: Optional[str] = None
parent_vid: Optional[int] = None
if parent_name:
vid = self._name_cn_map.get(parent_name)
if not vid:
vid = self._name_en_map.get(parent_name.lower())
if vid:
parent_vid = vid
parent_mmsi = self._vessels[vid].get('mmsi')
match_method: Optional[str] = 'NAME_PARENT' if parent_vid else None
confidence = 0.9 if parent_vid else 0.0
# Look up the existing active row
cur.execute(
f"""SELECT id, name FROM {GEAR_IDENTITY_LOG}
WHERE mmsi = %s AND is_active = TRUE""",
(mmsi,),
)
existing = cur.fetchone()
if existing:
if existing[1] == name:
# Same MMSI + same name → update position/time
cur.execute(
f"""UPDATE {GEAR_IDENTITY_LOG}
SET last_seen_at = %s, lat = %s, lon = %s
WHERE id = %s""",
(now, lat, lon, existing[0]),
)
else:
# Same MMSI + different name → deactivate the old row + insert a new one
cur.execute(
f'UPDATE {GEAR_IDENTITY_LOG} SET is_active = FALSE WHERE id = %s',
(existing[0],),
)
cur.execute(
f"""INSERT INTO {GEAR_IDENTITY_LOG}
(mmsi, name, parent_name, parent_mmsi, parent_vessel_id,
gear_index_1, gear_index_2, lat, lon,
match_method, match_confidence, first_seen_at, last_seen_at)
VALUES (%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)""",
(mmsi, name, parent_name, parent_mmsi, parent_vid,
idx1, idx2, lat, lon,
match_method, confidence, now, now),
)
else:
# New MMSI → check whether the same name is active under a different MMSI
cur.execute(
f"""SELECT id, mmsi FROM {GEAR_IDENTITY_LOG}
WHERE name = %s AND is_active = TRUE AND mmsi != %s""",
(name, mmsi),
)
old_mmsi_row = cur.fetchone()
if old_mmsi_row:
# Same name + different MMSI → MMSI change
cur.execute(
f'UPDATE {GEAR_IDENTITY_LOG} SET is_active = FALSE WHERE id = %s',
(old_mmsi_row[0],),
)
logger.info('gear MMSI change: %s → %s (name=%s)', old_mmsi_row[1], mmsi, name)
# Transfer affinity scores (old MMSI → new MMSI)
try:
cur.execute(
f"UPDATE {GEAR_CORRELATION_SCORES} "
"SET target_mmsi = %s, updated_at = NOW() "
"WHERE target_mmsi = %s",
(mmsi, old_mmsi_row[1]),
)
if cur.rowcount > 0:
logger.info(
'transferred %d affinity scores: %s → %s',
cur.rowcount, old_mmsi_row[1], mmsi,
)
except Exception as e:
logger.warning('affinity score transfer failed: %s', e)
cur.execute(
f"""INSERT INTO {GEAR_IDENTITY_LOG}
(mmsi, name, parent_name, parent_mmsi, parent_vessel_id,
gear_index_1, gear_index_2, lat, lon,
match_method, match_confidence, first_seen_at, last_seen_at)
VALUES (%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)""",
(mmsi, name, parent_name, parent_mmsi, parent_vid,
idx1, idx2, lat, lon,
match_method, confidence, now, now),
)
conn.commit()
cur.close()
def build_fleet_clusters(self, vessel_dfs: dict[str, pd.DataFrame]) -> dict[str, dict]:
"""등록 선단 기준으로 cluster 정보 구성.
Returns: {mmsi {cluster_id, cluster_size, is_leader, fleet_role}}
cluster_id = company_id (등록 선단 기준)
"""
results: dict[str, dict] = {}
# Group vessels currently received via AIS by company
company_vessels: dict[int, list[str]] = {}
for mmsi, vid in self._mmsi_to_vid.items():
v = self._vessels.get(vid)
if not v or mmsi not in vessel_dfs:
continue
cid = v['company_id']
company_vessels.setdefault(cid, []).append(mmsi)
for cid, mmsis in company_vessels.items():
if len(mmsis) < 2:
# Solo vessel → NOISE
for mmsi in mmsis:
v = self._vessels.get(self._mmsi_to_vid.get(mmsi, -1), {})
results[mmsi] = {
'cluster_id': -1,
'cluster_size': 1,
'is_leader': False,
'fleet_role': v.get('fleet_role', 'NOISE'),
}
continue
# Two or more vessels → registered-fleet cluster
for mmsi in mmsis:
vid = self._mmsi_to_vid[mmsi]
v = self._vessels[vid]
results[mmsi] = {
'cluster_id': cid,
'cluster_size': len(mmsis),
'is_leader': v['fleet_role'] == 'MAIN',
'fleet_role': v['fleet_role'],
}
# Unmatched vessels → NOISE
for mmsi in vessel_dfs:
if mmsi not in results:
results[mmsi] = {
'cluster_id': -1,
'cluster_size': 0,
'is_leader': False,
'fleet_role': 'NOISE',
}
return results
def save_snapshot(self, vessel_dfs: dict[str, pd.DataFrame], conn) -> None:
"""fleet_tracking_snapshot 저장."""
now = datetime.now(timezone.utc)
cur = conn.cursor()
company_vessels: dict[int, list[str]] = {}
for mmsi, vid in self._mmsi_to_vid.items():
v = self._vessels.get(vid)
if not v or mmsi not in vessel_dfs:
continue
company_vessels.setdefault(v['company_id'], []).append(mmsi)
for cid, mmsis in company_vessels.items():
active = len(mmsis)
total = sum(1 for v in self._vessels.values() if v['company_id'] == cid)
lats: list[float] = []
lons: list[float] = []
for mmsi in mmsis:
df = vessel_dfs.get(mmsi)
if df is not None and len(df) > 0:
last = df.iloc[-1]
lats.append(float(last['lat']))
lons.append(float(last['lon']))
center_lat = sum(lats) / len(lats) if lats else None
center_lon = sum(lons) / len(lons) if lons else None
cur.execute(
f"""INSERT INTO {FLEET_TRACKING_SNAPSHOT}
(company_id, snapshot_time, total_vessels, active_vessels,
center_lat, center_lon)
VALUES (%s, %s, %s, %s, %s, %s)""",
(cid, now, total, active, center_lat, center_lon),
)
conn.commit()
cur.close()
logger.info('fleet snapshot saved: %d companies', len(company_vessels))
def get_company_vessels(self, vessel_dfs: dict[str, 'pd.DataFrame']) -> dict[int, list[str]]:
"""현재 AIS 수신 중인 등록 선단의 회사별 MMSI 목록 반환.
Returns: {company_id: [mmsi, ...]}
"""
result: dict[int, list[str]] = {}
for mmsi, vid in self._mmsi_to_vid.items():
v = self._vessels.get(vid)
if not v or mmsi not in vessel_dfs:
continue
result.setdefault(v['company_id'], []).append(mmsi)
return result
# Singleton
fleet_tracker = FleetTracker()

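A quick walkthrough of the gear-name regex, mirroring what track_gear_identity() extracts (the name here is illustrative, not live AIS data):

```
import re

GEAR_PATTERN = re.compile(r'^(.+?)_(?=\S*\d)\S+(?:[_ ]\S*)*[_ ]*$|^(\d+)$')

name = 'ZHEDAIYU02433_82_99_'
m = GEAR_PATTERN.match(name)
parent = m.group(1)                    # 'ZHEDAIYU02433' — parent vessel name
suffix = name[m.end(1):].strip(' _')   # '82_99'
idx1, idx2 = (int(d) for d in re.findall(r'\d+', suffix))
print(parent, idx1, idx2)              # ZHEDAIYU02433 82 99
```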
159
prediction/main.py Normal file
파일 보기

@ -0,0 +1,159 @@
import logging
import sys
from contextlib import asynccontextmanager
from fastapi import BackgroundTasks, FastAPI
from config import qualified_table, settings
from db import kcgdb, snpdb
from scheduler import get_last_run, run_analysis_cycle, start_scheduler, stop_scheduler
logging.basicConfig(
level=getattr(logging, settings.LOG_LEVEL, logging.INFO),
format='%(asctime)s [%(levelname)s] %(name)s: %(message)s',
stream=sys.stdout,
)
logger = logging.getLogger(__name__)
GEAR_CORRELATION_SCORES = qualified_table('gear_correlation_scores')
CORRELATION_PARAM_MODELS = qualified_table('correlation_param_models')
@asynccontextmanager
async def lifespan(application: FastAPI):
from cache.vessel_store import vessel_store
logger.info('starting KCG Prediction Service')
snpdb.init_pool()
kcgdb.init_pool()
# Initial in-memory cache load (24h)
logger.info('loading initial vessel data (%dh)...', settings.INITIAL_LOAD_HOURS)
vessel_store.load_initial(settings.INITIAL_LOAD_HOURS)
logger.info('initial load complete: %s', vessel_store.stats())
start_scheduler()
yield
stop_scheduler()
snpdb.close_pool()
kcgdb.close_pool()
logger.info('KCG Prediction Service stopped')
app = FastAPI(
title='KCG Prediction Service',
version='2.1.0',
lifespan=lifespan,
)
# AI maritime-analysis chat router
from chat.router import router as chat_router
app.include_router(chat_router)
@app.get('/health')
def health_check():
from cache.vessel_store import vessel_store
return {
'status': 'ok',
'snpdb': snpdb.check_health(),
'kcgdb': kcgdb.check_health(),
'store': vessel_store.stats(),
}
@app.get('/api/v1/analysis/status')
def analysis_status():
return get_last_run()
@app.post('/api/v1/analysis/trigger')
def trigger_analysis(background_tasks: BackgroundTasks):
background_tasks.add_task(run_analysis_cycle)
return {'message': 'analysis cycle triggered'}
@app.get('/api/v1/correlation/{group_key:path}/tracks')
def get_correlation_tracks(
group_key: str,
hours: int = 24,
min_score: float = 0.3,
):
"""Return correlated vessels with their track history for map rendering.
Queries gear_correlation_scores (ALL active models) and enriches with
24h track data from in-memory vessel_store.
Each vessel includes which models detected it.
"""
from cache.vessel_store import vessel_store
try:
with kcgdb.get_conn() as conn:
cur = conn.cursor()
# Get correlated vessels from ALL active models
cur.execute(f"""
SELECT s.target_mmsi, s.target_type, s.target_name,
s.current_score, m.name AS model_name
FROM {GEAR_CORRELATION_SCORES} s
JOIN {CORRELATION_PARAM_MODELS} m ON s.model_id = m.id
WHERE s.group_key = %s
AND s.current_score >= %s
AND m.is_active = TRUE
ORDER BY s.current_score DESC
""", (group_key, min_score))
rows = cur.fetchall()
cur.close()
logger.info('correlation tracks: group_key=%r, min_score=%s, rows=%d',
group_key, min_score, len(rows))
if not rows:
return {'groupKey': group_key, 'vessels': []}
# Group by MMSI: collect all models per vessel, keep highest score
vessel_map: dict[str, dict] = {}
for row in rows:
mmsi = row[0]
model_name = row[4]
score = float(row[3])
if mmsi not in vessel_map:
vessel_map[mmsi] = {
'mmsi': mmsi,
'type': row[1],
'name': row[2] or '',
'score': score,
'models': {model_name: score},
}
else:
entry = vessel_map[mmsi]
entry['models'][model_name] = score
if score > entry['score']:
entry['score'] = score
mmsis = list(vessel_map.keys())
# Get tracks from vessel_store
tracks = vessel_store.get_vessel_tracks(mmsis, hours)
with_tracks = sum(1 for m in mmsis if m in tracks and len(tracks[m]) > 0)
logger.info('correlation tracks: %d unique mmsis, %d with track data, vessel_store._tracks has %d entries',
len(mmsis), with_tracks, len(vessel_store._tracks))
# Build response
vessels = []
for info in vessel_map.values():
track = tracks.get(info['mmsi'], [])
vessels.append({
'mmsi': info['mmsi'],
'name': info['name'],
'type': info['type'],
'score': info['score'],
'models': info['models'], # {modelName: score, ...}
'track': track,
})
return {'groupKey': group_key, 'vessels': vessels}
except Exception as e:
logger.warning('get_correlation_tracks failed for %s: %s', group_key, e)
return {'groupKey': group_key, 'vessels': []}

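A smoke-test sketch against the endpoints above, assuming the service is up on :8001 as started by `make dev-prediction` (httpx is already in requirements.txt):

```
import httpx

base = 'http://localhost:8001'
print(httpx.get(f'{base}/health').json()['status'])          # 'ok' when both pools respond
print(httpx.get(f'{base}/api/v1/analysis/status').json())    # last-run metadata
print(httpx.post(f'{base}/api/v1/analysis/trigger').json())  # queues a background cycle
```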
38
prediction/models/ais.py Normal file
파일 보기

@ -0,0 +1,38 @@
from dataclasses import dataclass, field
from typing import List, Dict
import pandas as pd
@dataclass
class AISPoint:
mmsi: str
ts: pd.Timestamp
lat: float
lon: float
sog: float
cog: float
state: str = 'UNKNOWN'
@dataclass
class VesselTrajectory:
mmsi: str
points: List[AISPoint] = field(default_factory=list)
vessel_type: str = 'UNKNOWN'
cluster_id: int = -1
season: str = 'UNKNOWN'
fishing_pct: float = 0.0
features: Dict = field(default_factory=dict)
@dataclass
class ClassificationResult:
mmsi: str
vessel_type: str
confidence: float
dominant_state: str
fishing_pct: float
cluster_id: int
season: str
feature_vector: Dict

104
prediction/models/result.py Normal file
파일 보기

@ -0,0 +1,104 @@
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional
@dataclass
class AnalysisResult:
"""vessel_analysis_results 테이블 28컬럼 매핑."""
mmsi: str
timestamp: datetime
# Classification results
vessel_type: str = 'UNKNOWN'
confidence: float = 0.0
fishing_pct: float = 0.0
cluster_id: int = -1
season: str = 'UNKNOWN'
# ALGO 01: location
zone: str = 'EEZ_OR_BEYOND'
dist_to_baseline_nm: float = 999.0
# ALGO 02: 활동 상태
activity_state: str = 'UNKNOWN'
ucaf_score: float = 0.0
ucft_score: float = 0.0
# ALGO 03: dark vessel
is_dark: bool = False
gap_duration_min: int = 0
# ALGO 04: GPS spoofing
spoofing_score: float = 0.0
bd09_offset_m: float = 0.0
speed_jump_count: int = 0
# ALGO 05+06: fleet
cluster_size: int = 0
is_leader: bool = False
fleet_role: str = 'NOISE'
# ALGO 07: risk
risk_score: int = 0
risk_level: str = 'LOW'
# ALGO 08: transshipment suspicion
is_transship_suspect: bool = False
transship_pair_mmsi: str = ''
transship_duration_min: int = 0
# Feature vector
features: dict = field(default_factory=dict)
# Meta
analyzed_at: Optional[datetime] = None
def __post_init__(self):
if self.analyzed_at is None:
self.analyzed_at = datetime.now(timezone.utc)
def to_db_tuple(self) -> tuple:
import json
def _f(v: object) -> float:
"""numpy float → Python float 변환."""
return float(v) if v is not None else 0.0
def _i(v: object) -> int:
"""numpy int → Python int 변환."""
return int(v) if v is not None else 0
# Convert numpy values inside the features dict as well
safe_features = {k: float(v) for k, v in self.features.items()} if self.features else {}
return (
str(self.mmsi),
self.timestamp,
str(self.vessel_type),
_f(self.confidence),
_f(self.fishing_pct),
_i(self.cluster_id),
str(self.season),
str(self.zone),
_f(self.dist_to_baseline_nm),
str(self.activity_state),
_f(self.ucaf_score),
_f(self.ucft_score),
bool(self.is_dark),
_i(self.gap_duration_min),
_f(self.spoofing_score),
_f(self.bd09_offset_m),
_i(self.speed_jump_count),
_i(self.cluster_size),
bool(self.is_leader),
str(self.fleet_role),
_i(self.risk_score),
str(self.risk_level),
bool(self.is_transship_suspect),
str(self.transship_pair_mmsi),
_i(self.transship_duration_min),
json.dumps(safe_features),
self.analyzed_at,
)

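A minimal sketch of the mapping, assuming prediction/ is on the import path. The tuple order must match the column list in kcgdb's upsert; to_db_tuple yields 27 values, so the remaining table column is presumably an auto-generated id:

```
from datetime import datetime, timezone

from models.result import AnalysisResult

r = AnalysisResult(
    mmsi='412345678',
    timestamp=datetime.now(timezone.utc),
    vessel_type='TRAWL',
    confidence=0.82,
    risk_score=55,
    risk_level='MEDIUM',
    features={'sog_mean': 3.1},
)
row = r.to_db_tuple()
print(len(row))        # 27 values; numpy scalars coerced, features serialized as JSON
print(row[0], row[2])  # '412345678' 'TRAWL'
```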
31
prediction/pipeline/behavior.py Normal file
파일 보기

@ -0,0 +1,31 @@
import pandas as pd
from pipeline.constants import SOG_STATIONARY_MAX, SOG_FISHING_MAX
class BehaviorDetector:
"""
속도 기반 3단계 행동 분류 (Yan et al. 2022, Natale et al. 2015)
정박(STATIONARY) / 조업(FISHING) / 항행(SAILING)
"""
@staticmethod
def classify_point(sog: float) -> str:
if sog < SOG_STATIONARY_MAX:
return 'STATIONARY'
elif sog <= SOG_FISHING_MAX:
return 'FISHING'
else:
return 'SAILING'
def detect(self, df: pd.DataFrame) -> pd.DataFrame:
df = df.copy()
df['state'] = df['sog'].apply(self.classify_point)
return df
@staticmethod
def compute_fishing_ratio(df_vessel: pd.DataFrame) -> float:
total = len(df_vessel)
if total == 0:
return 0.0
fishing = (df_vessel['state'] == 'FISHING').sum()
return round(fishing / total * 100, 2)

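With the constants from pipeline/constants.py (1.0 and 5.0 kt), the bands look like this:

```
from pipeline.behavior import BehaviorDetector

for sog in (0.3, 2.8, 7.5):
    print(sog, BehaviorDetector.classify_point(sog))
# 0.3 STATIONARY, 2.8 FISHING, 7.5 SAILING
```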
100
prediction/pipeline/classifier.py Normal file
파일 보기

@ -0,0 +1,100 @@
import pandas as pd
from typing import Dict, Tuple
class VesselTypeClassifier:
"""
Rule-based scoring classifier for fishing vessel types.
Scoring: for each feature in a type's profile, if the value falls within
the defined range a distance-based score is added (closer to the range
centre = higher score). Values outside the range incur a penalty.
Returns (vessel_type, confidence).
TRAWL    — trawler: fishing speed 2.5–4.5 kt, high COG variation
PURSE    — purse seiner: speed 3–5 kt, circular COG pattern
LONGLINE — longliner: speed 0.5–2 kt, low COG variation, long fishing runs
TRAP     — trap/pot: speed ~0 kt, many stationary events, short range
"""
PROFILES: Dict[str, Dict[str, Tuple[float, float]]] = {
'TRAWL': {
'sog_fishing_mean': (2.5, 4.5),
'cog_change_mean': (0.15, 9.9),
'fishing_pct': (0.3, 0.7),
'fishing_run_mean': (5, 50),
'stationary_events': (0, 5),
},
'PURSE': {
'sog_fishing_mean': (3.0, 5.0),
'cog_circularity': (0.2, 1.0),
'fishing_pct': (0.1, 0.5),
'fishing_run_mean': (3, 30),
'stationary_events': (0, 3),
},
'LONGLINE': {
'sog_fishing_mean': (0.5, 2.5),
'cog_change_mean': (0.0, 0.15),
'fishing_pct': (0.4, 0.9),
'fishing_run_mean': (20, 999),
'stationary_events': (0, 10),
},
'TRAP': {
'sog_fishing_mean': (0.0, 2.0),
'stationary_pct': (0.2, 0.8),
'stationary_events': (5, 999),
'fishing_run_mean': (1, 10),
'total_distance_km': (0, 100),
},
}
def classify(self, features: Dict) -> Tuple[str, float]:
"""Classify a vessel from its feature dict.
Returns:
(vessel_type, confidence) where confidence is in [0, 1].
"""
if not features:
return 'UNKNOWN', 0.0
scores: Dict[str, float] = {}
for vtype, profile in self.PROFILES.items():
score = 0.0
matched = 0
for feat_name, (lo, hi) in profile.items():
val = features.get(feat_name)
if val is None:
continue
matched += 1
if lo <= val <= hi:
mid = (lo + hi) / 2
span = (hi - lo) / 2 if (hi - lo) > 0 else 1
score += max(0.0, 1 - abs(val - mid) / span)
else:
overshoot = min(abs(val - lo), abs(val - hi))
score -= min(0.5, overshoot / (hi - lo + 1e-9))
scores[vtype] = score / matched if matched > 0 else 0.0
best_type = max(scores, key=lambda k: scores[k])
total = sum(max(v, 0.0) for v in scores.values())
confidence = scores[best_type] / total if total > 0 else 0.0
return best_type, round(confidence, 3)
def get_season(ts: pd.Timestamp) -> str:
"""Return the Northern-Hemisphere season for a timestamp.
Reference: paper 12 seasonal activity analysis (Chinese EEZ).
Chinese fishing ban period: Yellow Sea / East China Sea May–Sep,
South China Sea May–Aug.
"""
m = ts.month
if m in [3, 4, 5]:
return 'SPRING'
elif m in [6, 7, 8]:
return 'SUMMER'
elif m in [9, 10, 11]:
return 'FALL'
else:
return 'WINTER'

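A hedged example of the scorer, using feature names produced by pipeline/features.py; the values are made up to sit inside the TRAWL bands:

```
from pipeline.classifier import VesselTypeClassifier

clf = VesselTypeClassifier()
features = {
    'sog_fishing_mean': 3.2,   # inside TRAWL's 2.5–4.5 kt band
    'cog_change_mean': 0.4,
    'fishing_pct': 0.5,
    'fishing_run_mean': 12,
    'stationary_events': 2,
}
print(clf.classify(features))  # ('TRAWL', ~0.6); confidence is relative to all profiles
```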
101
prediction/pipeline/clusterer.py Normal file
파일 보기

@ -0,0 +1,101 @@
from collections import Counter
from typing import Dict, Optional
import numpy as np
import pandas as pd
from pipeline.constants import BIRCH_THRESHOLD, BIRCH_BRANCHING, MIN_CLUSTER_SIZE
class EnhancedBIRCHClusterer:
"""Trajectory clustering using sklearn Birch with a simple K-means fallback.
Based on the enhanced-BIRCH approach (Yan, Yang et al.):
1. Resample each trajectory to a fixed-length vector.
2. Build a BIRCH CF-tree for memory-efficient hierarchical clustering.
3. Small clusters (< MIN_CLUSTER_SIZE) are relabelled as noise (-1).
"""
def __init__(
self,
threshold: float = BIRCH_THRESHOLD,
branching: int = BIRCH_BRANCHING,
n_clusters: Optional[int] = None,
) -> None:
self.threshold = threshold
self.branching = branching
self.n_clusters = n_clusters
self._model = None
def _traj_to_vector(self, df_vessel: pd.DataFrame, n_points: int = 20) -> np.ndarray:
"""Convert a vessel trajectory DataFrame to a fixed-length vector.
Linearly samples n_points from the trajectory and interleaves lat/lon
values, then normalises to zero mean / unit variance.
"""
lats = df_vessel['lat'].values
lons = df_vessel['lon'].values
idx = np.linspace(0, len(lats) - 1, n_points).astype(int)
vec = np.concatenate([lats[idx], lons[idx]])
vec = (vec - vec.mean()) / (vec.std() + 1e-9)
return vec
def fit_predict(self, vessels: Dict[str, pd.DataFrame]) -> Dict[str, int]:
"""Cluster vessel trajectories.
Args:
vessels: mapping of mmsi -> resampled trajectory DataFrame.
Returns:
Mapping of mmsi -> cluster_id. Vessels in small clusters are
assigned cluster_id -1 (noise). Vessels with fewer than 20
points are excluded from the result.
"""
mmsi_list: list[str] = []
vectors: list[np.ndarray] = []
for mmsi, df_v in vessels.items():
if len(df_v) < 20:
continue
mmsi_list.append(mmsi)
vectors.append(self._traj_to_vector(df_v))
if len(vectors) < 3:
return {m: 0 for m in mmsi_list}
X = np.array(vectors)
try:
from sklearn.cluster import Birch
model = Birch(
threshold=self.threshold,
branching_factor=self.branching,
n_clusters=self.n_clusters,
)
labels = model.fit_predict(X)
self._model = model
except ImportError:
labels = self._simple_cluster(X)
cnt = Counter(labels)
labels = np.array([lbl if cnt[lbl] >= MIN_CLUSTER_SIZE else -1 for lbl in labels])
return dict(zip(mmsi_list, labels.tolist()))
@staticmethod
def _simple_cluster(X: np.ndarray, k: int = 5) -> np.ndarray:
"""Fallback K-means used when sklearn is unavailable."""
n = len(X)
k = min(k, n)
centers = X[np.random.choice(n, k, replace=False)]
labels = np.zeros(n, dtype=int)
for _ in range(20):
dists = np.array([[np.linalg.norm(x - c) for c in centers] for x in X])
labels = dists.argmin(axis=1)
new_centers = np.array(
[X[labels == i].mean(axis=0) if (labels == i).any() else centers[i] for i in range(k)]
)
if np.allclose(centers, new_centers, atol=1e-6):
break
centers = new_centers
return labels

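A synthetic check of fit_predict. Because each vector is normalised per trajectory, location drops out and only shape matters, so the two groups below should be separated by geometry rather than position (exact labels depend on the BIRCH threshold):

```
import numpy as np
import pandas as pd

from pipeline.clusterer import EnhancedBIRCHClusterer

def straight(n: int = 30) -> pd.DataFrame:
    t = np.arange(n, dtype=float)
    return pd.DataFrame({'lat': 33 + 0.001 * t, 'lon': 125 + 0.001 * t})

def loop(n: int = 30) -> pd.DataFrame:
    t = np.linspace(0, 2 * np.pi, n)
    return pd.DataFrame({'lat': 33 + 0.01 * np.sin(t), 'lon': 125 + 0.01 * np.cos(t)})

vessels = {f'S{i}': straight() for i in range(5)}     # 5 straight tracks
vessels.update({f'L{i}': loop() for i in range(5)})   # 5 circular tracks
labels = EnhancedBIRCHClusterer().fit_predict(vessels)
print(sorted(set(labels.values())))  # expect two non-noise labels
```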
26
prediction/pipeline/constants.py Normal file
파일 보기

@ -0,0 +1,26 @@
SOG_STATIONARY_MAX = 1.0
SOG_FISHING_MAX = 5.0
SOG_SAILING_MIN = 5.0
VESSEL_SOG_PROFILE = {
'TRAWL': {'min': 1.5, 'max': 4.5, 'mean': 2.8, 'cog_var': 'high'},
'PURSE': {'min': 2.0, 'max': 5.0, 'mean': 3.5, 'cog_var': 'circular'},
'LONGLINE': {'min': 0.5, 'max': 3.0, 'mean': 1.8, 'cog_var': 'low'},
'TRAP': {'min': 0.0, 'max': 2.0, 'mean': 0.8, 'cog_var': 'very_low'},
}
RESAMPLE_INTERVAL_MIN = 4
BIRCH_THRESHOLD = 0.35
BIRCH_BRANCHING = 50
MIN_CLUSTER_SIZE = 5
MMSI_DIGITS = 9
MAX_VESSEL_LENGTH = 300
MAX_SOG_KNOTS = 30.0
MIN_TRAJ_POINTS = 20
KR_BOUNDS = {
'lat_min': 32.0, 'lat_max': 39.0,
'lon_min': 124.0, 'lon_max': 132.0,
}

93
prediction/pipeline/features.py Normal file
파일 보기

@ -0,0 +1,93 @@
import math
import numpy as np
import pandas as pd
from typing import Dict
class FeatureExtractor:
"""
Feature-vector extraction for fishing-vessel type classification.
Core features based on paper 12 (vessel-type identification in the South China Sea):
- speed statistics (mean, std, quantiles)
- course variability (COG variance, turning patterns)
- fishing ratio and fishing-run duration
- travel distance and sea-area coverage
- stationary-event frequency (proxy for net set/haul intervals)
"""
@staticmethod
def haversine(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
"""두 좌표 간 거리 (km)"""
R = 6371.0
phi1, phi2 = math.radians(lat1), math.radians(lat2)
dphi = math.radians(lat2 - lat1)
dlam = math.radians(lon2 - lon1)
a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
return R * 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
def extract(self, df_vessel: pd.DataFrame) -> Dict[str, float]:
if len(df_vessel) < 10:
return {}
sog = df_vessel['sog'].values
cog = df_vessel['cog'].values
states = df_vessel['state'].values
# Speed features
fishing_sog = sog[states == 'FISHING'] if (states == 'FISHING').any() else np.array([0])
feat: Dict[str, float] = {
'sog_mean': float(np.mean(sog)),
'sog_std': float(np.std(sog)),
'sog_fishing_mean': float(np.mean(fishing_sog)),
'sog_fishing_std': float(np.std(fishing_sog)),
'sog_q25': float(np.percentile(sog, 25)),
'sog_q75': float(np.percentile(sog, 75)),
}
# COG features (purse seine: circular; trawl: straight back-and-forth; longline: smooth curves)
cog_diff = np.abs(np.diff(np.unwrap(np.radians(cog))))
feat['cog_change_mean'] = float(np.mean(cog_diff))
feat['cog_change_std'] = float(np.std(cog_diff))
feat['cog_circularity'] = float(np.sum(cog_diff > np.pi / 4) / len(cog_diff))
# State ratios
n = len(states)
feat['fishing_pct'] = float((states == 'FISHING').sum() / n)
feat['stationary_pct'] = float((states == 'STATIONARY').sum() / n)
feat['sailing_pct'] = float((states == 'SAILING').sum() / n)
# Stationary events (estimated number of net sets/hauls)
stationary_events = 0
prev = None
for s in states:
if s == 'STATIONARY' and prev != 'STATIONARY':
stationary_events += 1
prev = s
feat['stationary_events'] = float(stationary_events)
# Total distance (km)
lats = df_vessel['lat'].values
lons = df_vessel['lon'].values
total_dist = sum(
self.haversine(lats[i], lons[i], lats[i + 1], lons[i + 1])
for i in range(len(lats) - 1)
)
feat['total_distance_km'] = round(total_dist, 2)
# Coverage (bounding-box area — approximate)
feat['coverage_deg2'] = round(float(np.ptp(lats)) * float(np.ptp(lons)), 4)
# Average fishing run length
fishing_runs = []
run = 0
for s in states:
if s == 'FISHING':
run += 1
elif run > 0:
fishing_runs.append(run)
run = 0
if run > 0:
fishing_runs.append(run)
feat['fishing_run_mean'] = float(np.mean(fishing_runs)) if fishing_runs else 0.0
return feat

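A quick sanity check on the haversine helper — one degree of latitude is about 111 km:

```
from pipeline.features import FeatureExtractor

print(round(FeatureExtractor.haversine(33.0, 125.0, 34.0, 125.0), 1))  # ≈ 111.2
```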
95
prediction/pipeline/orchestrator.py Normal file
파일 보기

@ -0,0 +1,95 @@
import logging
import pandas as pd
from pipeline.preprocessor import AISPreprocessor
from pipeline.behavior import BehaviorDetector
from pipeline.resampler import TrajectoryResampler
from pipeline.features import FeatureExtractor
from pipeline.classifier import VesselTypeClassifier, get_season
from pipeline.clusterer import EnhancedBIRCHClusterer
from pipeline.constants import RESAMPLE_INTERVAL_MIN
logger = logging.getLogger(__name__)
class ChineseFishingVesselPipeline:
"""7-step pipeline for classifying Chinese fishing vessel activity types.
Steps:
1. AIS preprocessing (Yan et al. 2022)
2. Behaviour-state detection (speed-based 3-class)
3. Trajectory resampling (Yan, Yang et al. 4-minute interval)
4. Feature vector extraction (paper 12)
5. Vessel-type classification (rule-based scoring)
6. Enhanced BIRCH trajectory clustering (Yan, Yang et al.)
7. Seasonal activity tagging (paper 12)
"""
def __init__(self) -> None:
self.preprocessor = AISPreprocessor()
self.detector = BehaviorDetector()
self.resampler = TrajectoryResampler(RESAMPLE_INTERVAL_MIN)
self.extractor = FeatureExtractor()
self.classifier = VesselTypeClassifier()
self.clusterer = EnhancedBIRCHClusterer()
def run(
self, df_raw: pd.DataFrame
) -> tuple[list[dict], dict[str, pd.DataFrame]]:
"""Run the 7-step pipeline.
Args:
df_raw: raw AIS DataFrame with columns mmsi, timestamp, lat, lon,
sog, cog.
Returns:
(results, vessel_dfs) where:
- results is a list of classification dicts, each containing:
mmsi, vessel_type, confidence, fishing_pct, cluster_id, season,
n_points, features.
- vessel_dfs is a mapping of mmsi -> resampled trajectory DataFrame.
"""
# Step 1: preprocess
df = self.preprocessor.run(df_raw)
if len(df) == 0:
logger.warning('pipeline: no rows after preprocessing')
return [], {}
# Step 2: behaviour detection
df = self.detector.detect(df)
# Steps 35: per-vessel processing
vessel_dfs: dict[str, pd.DataFrame] = {}
results: list[dict] = []
for mmsi, df_v in df.groupby('mmsi'):
df_resampled = self.resampler.resample(df_v)
vessel_dfs[mmsi] = df_resampled
features = self.extractor.extract(df_resampled)
vtype, confidence = self.classifier.classify(features)
fishing_pct = BehaviorDetector.compute_fishing_ratio(df_resampled)
season = get_season(df_v['timestamp'].iloc[len(df_v) // 2])
results.append({
'mmsi': mmsi,
'vessel_type': vtype,
'confidence': confidence,
'fishing_pct': fishing_pct,
'season': season,
'n_points': len(df_resampled),
'features': features,
})
# Step 6: BIRCH clustering
cluster_map = self.clusterer.fit_predict(vessel_dfs)
for r in results:
r['cluster_id'] = cluster_map.get(r['mmsi'], -1)
logger.info(
'pipeline complete: %d vessels, types=%s',
len(results),
{r['vessel_type'] for r in results},
)
return results, vessel_dfs

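An end-to-end sketch on synthetic AIS rows (column names as required by AISPreprocessor; in production the input comes from the snpdb fetchers). With a single vessel the clusterer falls back to cluster 0:

```
import numpy as np
import pandas as pd

from pipeline.orchestrator import ChineseFishingVesselPipeline

rng = np.random.default_rng(0)
n = 120  # above MIN_TRAJ_POINTS so the vessel survives preprocessing
df_raw = pd.DataFrame({
    'mmsi': ['412345678'] * n,
    'timestamp': pd.date_range('2026-04-01', periods=n, freq='2min'),
    'lat': 33.0 + 0.0005 * np.arange(n),
    'lon': 125.0 + 0.0005 * np.arange(n),
    'sog': np.clip(rng.normal(3.2, 0.5, n), 0, None),
    'cog': rng.uniform(0, 360, n),
})
results, vessel_dfs = ChineseFishingVesselPipeline().run(df_raw)
print(results[0]['vessel_type'], results[0]['n_points'])
```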
52
prediction/pipeline/preprocessor.py Normal file
파일 보기

@ -0,0 +1,52 @@
import pandas as pd
from collections import defaultdict
from pipeline.constants import KR_BOUNDS, MAX_SOG_KNOTS, MIN_TRAJ_POINTS
class AISPreprocessor:
"""Delete-Supplement-Update (Yan et al. 2022)"""
def __init__(self):
self.stats = defaultdict(int)
def run(self, df: pd.DataFrame) -> pd.DataFrame:
original = len(df)
required = ['mmsi', 'timestamp', 'lat', 'lon', 'sog', 'cog']
missing = [c for c in required if c not in df.columns]
if missing:
raise ValueError(f"필수 컬럼 누락: {missing}")
df = df.copy()
df['timestamp'] = pd.to_datetime(df['timestamp'])
valid_mmsi = df['mmsi'].astype(str).str.match(r'^\d{9}$')
df = df[valid_mmsi]
self.stats['invalid_mmsi'] += original - len(df)
df = df[(df['lat'].between(-90, 90)) & (df['lon'].between(-180, 180))]
df = df[
df['lat'].between(KR_BOUNDS['lat_min'], KR_BOUNDS['lat_max']) &
df['lon'].between(KR_BOUNDS['lon_min'], KR_BOUNDS['lon_max'])
]
df = df.sort_values(['mmsi', 'timestamp'])
df['sog'] = df.groupby('mmsi')['sog'].transform(
lambda x: x.where(
x.between(0, MAX_SOG_KNOTS),
x.rolling(3, center=True, min_periods=1).mean(),
)
)
df = df[(df['sog'] >= 0) & (df['sog'] <= MAX_SOG_KNOTS)]
counts = df.groupby('mmsi').size()
valid_mmsi_list = counts[counts >= MIN_TRAJ_POINTS].index
df = df[df['mmsi'].isin(valid_mmsi_list)]
df = df.drop_duplicates(subset=['mmsi', 'timestamp'])
self.stats['final_records'] = len(df)
self.stats['retention_pct'] = round(len(df) / max(original, 1) * 100, 2)
return df.reset_index(drop=True)

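A behaviour sketch: rows with a malformed MMSI are dropped, vessels below MIN_TRAJ_POINTS are discarded, and the stats counters record retention:

```
import pandas as pd

from pipeline.preprocessor import AISPreprocessor

df = pd.DataFrame({
    'mmsi': ['412345678'] * 25 + ['BAD'] * 5,   # 'BAD' fails the 9-digit check
    'timestamp': pd.date_range('2026-04-01', periods=30, freq='min'),
    'lat': [33.0] * 30,
    'lon': [125.0] * 30,
    'sog': [2.0] * 30,
    'cog': [90.0] * 30,
})
pre = AISPreprocessor()
clean = pre.run(df)
print(len(clean), pre.stats['retention_pct'])  # 25 83.33
```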
35
prediction/pipeline/resampler.py Normal file
파일 보기

@ -0,0 +1,35 @@
import pandas as pd
from pipeline.constants import RESAMPLE_INTERVAL_MIN
from pipeline.behavior import BehaviorDetector
class TrajectoryResampler:
"""
Interpolates irregular AIS reception intervals onto a uniform time grid.
Purpose: normalise the input vectors for BIRCH clustering.
Method: linear time-based interpolation (lat, lon, SOG, COG).
Interval: 4 minutes (Shepperson et al. 2017).
"""
def __init__(self, interval_min: int = RESAMPLE_INTERVAL_MIN):
self.interval = pd.Timedelta(minutes=interval_min)
def resample(self, df_vessel: pd.DataFrame) -> pd.DataFrame:
df_vessel = df_vessel.sort_values('timestamp').copy()
if len(df_vessel) < 2:
return df_vessel
t_start = df_vessel['timestamp'].iloc[0]
t_end = df_vessel['timestamp'].iloc[-1]
new_times = pd.date_range(t_start, t_end, freq=self.interval)
df_vessel = df_vessel.set_index('timestamp')
df_vessel = df_vessel.reindex(df_vessel.index.union(new_times))
for col in ['lat', 'lon', 'sog', 'cog']:
if col in df_vessel.columns:
df_vessel[col] = df_vessel[col].interpolate(method='time')
df_vessel = df_vessel.loc[new_times].reset_index()
df_vessel.rename(columns={'index': 'timestamp'}, inplace=True)
df_vessel['state'] = df_vessel['sog'].apply(BehaviorDetector.classify_point)
return df_vessel

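An interpolation sketch: three uneven fixes become a uniform 4-minute grid, with values time-interpolated and the state re-derived from the interpolated SOG:

```
import pandas as pd

from pipeline.resampler import TrajectoryResampler

df = pd.DataFrame({
    'timestamp': pd.to_datetime(
        ['2026-04-01 00:00', '2026-04-01 00:03', '2026-04-01 00:10']),
    'lat': [33.00, 33.01, 33.05],
    'lon': [125.0, 125.0, 125.0],
    'sog': [0.5, 2.0, 6.0],
    'cog': [90.0, 90.0, 90.0],
})
out = TrajectoryResampler(interval_min=4).resample(df)
print(out[['timestamp', 'sog', 'state']])  # rows at 00:00 / 00:04 / 00:08
```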
12
prediction/requirements.txt Normal file
파일 보기

@ -0,0 +1,12 @@
fastapi==0.115.0
uvicorn==0.30.6
pydantic-settings>=2.0
psycopg2-binary>=2.9
numpy>=1.26
pandas>=2.2
scikit-learn>=1.5
apscheduler>=3.10
shapely>=2.0
tzdata
httpx>=0.27
redis>=5.0

385
prediction/scheduler.py Normal file
파일 보기

@ -0,0 +1,385 @@
import logging
import time
from datetime import datetime, timezone
from typing import Optional
from apscheduler.schedulers.background import BackgroundScheduler
from config import settings
logger = logging.getLogger(__name__)
_scheduler: Optional[BackgroundScheduler] = None
_last_run: dict = {
'timestamp': None,
'duration_sec': 0,
'vessel_count': 0,
'upserted': 0,
'error': None,
}
_transship_pair_history: dict = {}
def get_last_run() -> dict:
return _last_run.copy()
def run_analysis_cycle():
"""5분 주기 분석 사이클 — 인메모리 캐시 기반."""
import re as _re
from cache.vessel_store import vessel_store
from db import snpdb, kcgdb
from pipeline.orchestrator import ChineseFishingVesselPipeline
from algorithms.location import classify_zone
from algorithms.fishing_pattern import compute_ucaf_score, compute_ucft_score
from algorithms.dark_vessel import is_dark_vessel
from algorithms.spoofing import compute_spoofing_score, count_speed_jumps, compute_bd09_offset
from algorithms.risk import compute_vessel_risk_score
from fleet_tracker import fleet_tracker
from models.result import AnalysisResult
start = time.time()
_last_run['timestamp'] = datetime.now(timezone.utc).isoformat()
_last_run['error'] = None
try:
# 1. Incremental load + stale eviction
if vessel_store.last_bucket is None:
logger.warning('last_bucket is None, skipping incremental fetch (initial load not complete)')
df_new = None
else:
df_new = snpdb.fetch_incremental(vessel_store.last_bucket)
if df_new is not None and len(df_new) > 0:
vessel_store.merge_incremental(df_new)
vessel_store.evict_stale(settings.CACHE_WINDOW_HOURS)
# Periodic refresh of static info / permitted-vessel registry
vessel_store.refresh_static_info()
vessel_store.refresh_permit_registry()
# 2. Select analysis targets (includes SOG/COG computation)
df_targets = vessel_store.select_analysis_targets()
if len(df_targets) == 0:
logger.info('no analysis targets, skipping cycle')
_last_run['vessel_count'] = 0
return
# 3. Run the 7-step pipeline
pipeline = ChineseFishingVesselPipeline()
classifications, vessel_dfs = pipeline.run(df_targets)
if not classifications:
logger.info('no vessels classified, skipping')
_last_run['vessel_count'] = 0
return
# 4. Registry-based fleet analysis
_gear_re = _re.compile(r'^.+_(?=\S*\d)\S+(?:[_ ]\S*)*[_ ]*$|^\d+$|^.+%$')
with kcgdb.get_conn() as kcg_conn:
fleet_tracker.load_registry(kcg_conn)
all_ais = []
for mmsi, df in vessel_dfs.items():
if len(df) > 0:
last = df.iloc[-1]
all_ais.append({
'mmsi': mmsi,
'name': vessel_store.get_vessel_info(mmsi).get('name', ''),
'lat': float(last['lat']),
'lon': float(last['lon']),
})
fleet_tracker.match_ais_to_registry(all_ais, kcg_conn)
gear_signals = [v for v in all_ais if _gear_re.match(v.get('name', ''))]
fleet_tracker.track_gear_identity(gear_signals, kcg_conn)
fleet_roles = fleet_tracker.build_fleet_clusters(vessel_dfs)
fleet_tracker.save_snapshot(vessel_dfs, kcg_conn)
gear_groups = []
# 4.5 Build + persist group polygons
try:
from algorithms.polygon_builder import detect_gear_groups, build_all_group_snapshots
company_vessels = fleet_tracker.get_company_vessels(vessel_dfs)
gear_groups = detect_gear_groups(vessel_store)
group_snapshots = build_all_group_snapshots(
vessel_store, company_vessels,
fleet_tracker._companies,
)
saved = kcgdb.save_group_snapshots(group_snapshots)
cleaned = kcgdb.cleanup_group_snapshots(days=7)
logger.info('group polygons: %d saved, %d cleaned, %d gear groups',
saved, cleaned, len(gear_groups))
except Exception as e:
logger.warning('group polygon generation failed: %s', e)
# 4.7 Gear correlation analysis (multi-model pattern tracking)
try:
from algorithms.gear_correlation import run_gear_correlation
from algorithms.gear_parent_inference import run_gear_parent_inference
corr_result = run_gear_correlation(
vessel_store=vessel_store,
gear_groups=gear_groups,
conn=kcg_conn,
)
logger.info(
'gear correlation: %d scores updated, %d raw metrics, %d models',
corr_result['updated'], corr_result['raw_inserted'],
corr_result['models'],
)
inference_result = run_gear_parent_inference(
vessel_store=vessel_store,
gear_groups=gear_groups,
conn=kcg_conn,
)
logger.info(
'gear parent inference: %d groups, %d direct-match, %d candidates, %d promoted, %d review, %d skipped',
inference_result['groups'],
inference_result.get('direct_matched', 0),
inference_result['candidates'],
inference_result['promoted'],
inference_result['review_required'],
inference_result['skipped'],
)
except Exception as e:
logger.warning('gear correlation failed: %s', e)
# 5. Per-vessel algorithms → build AnalysisResult
results = []
for c in classifications:
mmsi = c['mmsi']
df_v = vessel_dfs.get(mmsi)
if df_v is None or len(df_v) == 0:
continue
last_row = df_v.iloc[-1]
ts = last_row.get('timestamp')
zone_info = classify_zone(last_row['lat'], last_row['lon'])
gear_map = {'TRAWL': 'OT', 'PURSE': 'PS', 'LONGLINE': 'GN', 'TRAP': 'TRAP'}
gear = gear_map.get(c['vessel_type'], 'OT')
ucaf = compute_ucaf_score(df_v, gear)
ucft = compute_ucft_score(df_v)
dark, gap_min = is_dark_vessel(df_v)
spoof_score = compute_spoofing_score(df_v)
speed_jumps = count_speed_jumps(df_v)
bd09_offset = compute_bd09_offset(last_row['lat'], last_row['lon'])
fleet_info = fleet_roles.get(mmsi, {})
is_permitted = vessel_store.is_permitted(mmsi)
risk_score, risk_level = compute_vessel_risk_score(
mmsi, df_v, zone_info, is_permitted=is_permitted,
)
activity = 'UNKNOWN'
if 'state' in df_v.columns and len(df_v) > 0:
activity = df_v['state'].mode().iloc[0]
results.append(AnalysisResult(
mmsi=mmsi,
timestamp=ts,
vessel_type=c['vessel_type'],
confidence=c['confidence'],
fishing_pct=c['fishing_pct'],
cluster_id=fleet_info.get('cluster_id', -1),
season=c['season'],
zone=zone_info.get('zone', 'EEZ_OR_BEYOND'),
dist_to_baseline_nm=zone_info.get('dist_from_baseline_nm', 999.0),
activity_state=activity,
ucaf_score=ucaf,
ucft_score=ucft,
is_dark=dark,
gap_duration_min=gap_min,
spoofing_score=spoof_score,
bd09_offset_m=bd09_offset,
speed_jump_count=speed_jumps,
cluster_size=fleet_info.get('cluster_size', 0),
is_leader=fleet_info.get('is_leader', False),
fleet_role=fleet_info.get('fleet_role', 'NOISE'),
risk_score=risk_score,
risk_level=risk_level,
features=c.get('features', {}),
))
# ── 5.5 Lightweight analysis — 412* vessels that did not pass the pipeline ──
from algorithms.risk import compute_lightweight_risk_score
pipeline_mmsis = {c['mmsi'] for c in classifications}
lightweight_mmsis = vessel_store.get_chinese_mmsis() - pipeline_mmsis
if lightweight_mmsis:
now = datetime.now(timezone.utc)
all_positions = vessel_store.get_all_latest_positions()
lw_count = 0
for mmsi in lightweight_mmsis:
pos = all_positions.get(mmsi)
if pos is None or pos.get('lat') is None:
continue
lat, lon = pos['lat'], pos['lon']
sog = pos.get('sog', 0) or 0
cog = pos.get('cog', 0) or 0
ts = pos.get('timestamp', now)
zone_info = classify_zone(lat, lon)
if sog <= 1.0:
state = 'STATIONARY'
elif sog <= 5.0:
state = 'FISHING'
else:
state = 'SAILING'
is_permitted = vessel_store.is_permitted(mmsi)
risk_score, risk_level = compute_lightweight_risk_score(
zone_info, sog, is_permitted=is_permitted,
)
# BD-09 offset excluded since these are Chinese vessels (412* = China)
results.append(AnalysisResult(
mmsi=mmsi,
timestamp=ts,
vessel_type='UNKNOWN',
confidence=0.0,
fishing_pct=0.0,
zone=zone_info.get('zone', 'EEZ_OR_BEYOND'),
dist_to_baseline_nm=zone_info.get('dist_from_baseline_nm', 999.0),
activity_state=state,
ucaf_score=0.0,
ucft_score=0.0,
is_dark=False,
gap_duration_min=0,
spoofing_score=0.0,
bd09_offset_m=0.0,
speed_jump_count=0,
cluster_id=-1,
cluster_size=0,
is_leader=False,
fleet_role='NONE',
risk_score=risk_score,
risk_level=risk_level,
is_transship_suspect=False,
transship_pair_mmsi='',
transship_duration_min=0,
))
lw_count += 1
logger.info('lightweight analysis: %d vessels', lw_count)
# 6. Transshipment-suspect detection (pair_history kept at module level across cycles)
from algorithms.transshipment import detect_transshipment
results_map = {r.mmsi: r for r in results}
transship_pairs = detect_transshipment(df_targets, _transship_pair_history)
for mmsi_a, mmsi_b, dur in transship_pairs:
if mmsi_a in results_map:
results_map[mmsi_a].is_transship_suspect = True
results_map[mmsi_a].transship_pair_mmsi = mmsi_b
results_map[mmsi_a].transship_duration_min = dur
if mmsi_b in results_map:
results_map[mmsi_b].is_transship_suspect = True
results_map[mmsi_b].transship_pair_mmsi = mmsi_a
results_map[mmsi_b].transship_duration_min = dur
# 7. Persist results
upserted = kcgdb.upsert_results(results)
kcgdb.cleanup_old(hours=48)
# 8. Cache analysis context in Redis (for chat)
try:
from chat.cache import cache_analysis_context
results_map = {r.mmsi: r for r in results}
risk_dist = {}
zone_dist = {}
dark_count = 0
spoofing_count = 0
transship_count = 0
top_risk_list = []
for r in results:
risk_dist[r.risk_level] = risk_dist.get(r.risk_level, 0) + 1
zone_dist[r.zone] = zone_dist.get(r.zone, 0) + 1
if r.is_dark:
dark_count += 1
if r.spoofing_score > 0.5:
spoofing_count += 1
if r.is_transship_suspect:
transship_count += 1
top_risk_list.append({
'mmsi': r.mmsi,
'name': vessel_store.get_vessel_info(r.mmsi).get('name', r.mmsi),
'risk_score': r.risk_score,
'risk_level': r.risk_level,
'zone': r.zone,
'is_dark': r.is_dark,
'is_transship': r.is_transship_suspect,
'activity_state': r.activity_state,
})
top_risk_list.sort(key=lambda x: x['risk_score'], reverse=True)
cache_analysis_context({
'vessel_stats': vessel_store.stats(),
'risk_distribution': {**risk_dist, **zone_dist},
'dark_count': dark_count,
'spoofing_count': spoofing_count,
'transship_count': transship_count,
'top_risk_vessels': top_risk_list[:10],
'polygon_summary': kcgdb.fetch_polygon_summary(),
})
except Exception as e:
logger.warning('failed to cache analysis context for chat: %s', e)
elapsed = round(time.time() - start, 2)
_last_run['duration_sec'] = elapsed
_last_run['vessel_count'] = len(results)
_last_run['upserted'] = upserted
logger.info(
'analysis cycle: %d vessels, %d upserted, %.2fs',
len(results), upserted, elapsed,
)
except Exception as e:
_last_run['error'] = str(e)
logger.exception('analysis cycle failed: %s', e)
def start_scheduler():
global _scheduler
_scheduler = BackgroundScheduler()
_scheduler.add_job(
run_analysis_cycle,
'interval',
minutes=settings.SCHEDULER_INTERVAL_MIN,
id='vessel_analysis',
max_instances=1,
replace_existing=True,
)
# Partition maintenance (daily at 04:00)
from db.partition_manager import maintain_partitions
_scheduler.add_job(
maintain_partitions,
'cron', hour=4, minute=0,
id='partition_maintenance',
max_instances=1,
replace_existing=True,
)
_scheduler.start()
logger.info('scheduler started (interval=%dm)', settings.SCHEDULER_INTERVAL_MIN)
def stop_scheduler():
global _scheduler
if _scheduler:
_scheduler.shutdown(wait=False)
_scheduler = None
logger.info('scheduler stopped')

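An operational sketch: run one cycle synchronously and read the same status dict the /api/v1/analysis/status endpoint returns. A live snpdb/kcgdb pool (see the lifespan hook in main.py) is required for a useful run; without it the cycle logs the failure and status['error'] explains why:

```
from scheduler import get_last_run, run_analysis_cycle

run_analysis_cycle()
status = get_last_run()
print(status['vessel_count'], status['upserted'], status['error'])
```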
176
prediction/scripts/load_fleet_registry.py Normal file
파일 보기

@ -0,0 +1,176 @@
"""선단 구성 JSX → kcgdb fleet_companies + fleet_vessels 적재.
Usage: python3 prediction/scripts/load_fleet_registry.py
"""
import json
import re
import sys
from pathlib import Path
import psycopg2
import psycopg2.extras
# Extract the D array from the JSX file
JSX_PATH = Path(__file__).parent.parent.parent.parent / 'gc-wing-dev' / 'legacy' / '선단구성_906척_어업수역 (1).jsx'
# kcgdb connection — prediction/.env or environment variables
DB_HOST = '211.208.115.83'
DB_PORT = 5432
DB_NAME = 'kcgdb'
DB_USER = 'kcg_app'
DB_SCHEMA = 'kcg'
def parse_jsx(path: Path) -> list[list]:
"""JSX 파일에서 D=[ ... ] 배열을 파싱."""
text = path.read_text(encoding='utf-8')
# Extract from const D=[ up to the matching ];
m = re.search(r'const\s+D\s*=\s*\[', text)
if not m:
raise ValueError('D array not found')
start = m.end() - 1  # position of [
# Track nesting depth to find the matching closing ]
depth = 0
end = start
for i in range(start, len(text)):
if text[i] == '[':
depth += 1
elif text[i] == ']':
depth -= 1
if depth == 0:
end = i + 1
break
raw = text[start:end]
# JavaScript → JSON conversion (strip trailing commas)
raw = re.sub(r',\s*]', ']', raw)
raw = re.sub(r',\s*}', '}', raw)
return json.loads(raw)
def load_to_db(data: list[list], db_password: str):
"""파싱된 데이터를 DB에 적재."""
conn = psycopg2.connect(
host=DB_HOST, port=DB_PORT, dbname=DB_NAME,
user=DB_USER, password=db_password,
options=f'-c search_path={DB_SCHEMA}',
)
conn.autocommit = False
cur = conn.cursor()
try:
# Clear existing data
cur.execute('DELETE FROM fleet_vessels')
cur.execute('DELETE FROM fleet_companies')
company_count = 0
vessel_count = 0
pair_links = []  # (vessel_id, pair_vessel_id) links to post-process
for row in data:
if len(row) < 7:
continue
name_cn = row[0]
name_en = row[1]
# INSERT the company
cur.execute(
'INSERT INTO fleet_companies (name_cn, name_en) VALUES (%s, %s) RETURNING id',
(name_cn, name_en),
)
company_id = cur.fetchone()[0]
company_count += 1
# Indices: 0=own, 1=ownEn, 2=pairs, 3=gn, 4=ot, 5=ps, 6=fc, 7=upt, 8=upts
pairs = row[2] if len(row) > 2 and isinstance(row[2], list) else []
gn = row[3] if len(row) > 3 and isinstance(row[3], list) else []
ot = row[4] if len(row) > 4 and isinstance(row[4], list) else []
ps = row[5] if len(row) > 5 and isinstance(row[5], list) else []
fc = row[6] if len(row) > 6 and isinstance(row[6], list) else []
upt = row[7] if len(row) > 7 and isinstance(row[7], list) else []
upts = row[8] if len(row) > 8 and isinstance(row[8], list) else []
def insert_vessel(v, gear_code, role):
nonlocal vessel_count
if not isinstance(v, list) or len(v) < 4:
return None
cur.execute(
'''INSERT INTO fleet_vessels
(company_id, permit_no, name_cn, name_en, tonnage, gear_code, fleet_role)
VALUES (%s, %s, %s, %s, %s, %s, %s) RETURNING id''',
(company_id, v[0], v[1], v[2], v[3], gear_code, role),
)
vessel_count += 1
return cur.fetchone()[0]
# PT main-vessel pairs (pairs)
for pair in pairs:
if not isinstance(pair, list) or len(pair) < 2:
continue
main_id = insert_vessel(pair[0], 'C21', 'MAIN')
sub_id = insert_vessel(pair[1], 'C21', 'SUB')
if main_id and sub_id:
pair_links.append((main_id, sub_id))
# GN drift gillnetters
for v in gn:
insert_vessel(v, 'C25', 'GN')
# OT others
for v in ot:
insert_vessel(v, 'C22', 'OT')
# PS purse seiners
for v in ps:
insert_vessel(v, 'C23', 'PS')
# FC carriers
for v in fc:
insert_vessel(v, 'C40', 'FC')
# UPT solo main vessels
for v in upt:
insert_vessel(v, 'C21', 'MAIN_SOLO')
# UPTS solo auxiliary vessels
for v in upts:
insert_vessel(v, 'C21', 'SUB_SOLO')
# Set mutual references for PT pairs
for main_id, sub_id in pair_links:
cur.execute('UPDATE fleet_vessels SET pair_vessel_id = %s WHERE id = %s', (sub_id, main_id))
cur.execute('UPDATE fleet_vessels SET pair_vessel_id = %s WHERE id = %s', (main_id, sub_id))
conn.commit()
print(f'Load complete: {company_count} companies, {vessel_count} vessels, {len(pair_links)} PT pairs')
except Exception as e:
conn.rollback()
print(f'Load failed: {e}', file=sys.stderr)
raise
finally:
cur.close()
conn.close()
if __name__ == '__main__':
if not JSX_PATH.exists():
print(f'File not found: {JSX_PATH}', file=sys.stderr)
sys.exit(1)
# DB password — environment variable or inline fallback
import os
password = os.environ.get('KCGDB_PASSWORD', 'Kcg2026monitor')
print(f'Parsing JSX: {JSX_PATH}')
data = parse_jsx(JSX_PATH)
print(f'Parsed: {len(data)} companies')
print('Starting DB load...')
load_to_db(data, password)

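A parser sketch: the depth counter plus the trailing-comma strip is what lets parse_jsx() feed JSX array literals to json.loads. Demo with a throwaway file (run from prediction/scripts/ so the import resolves; the row content is fabricated):

```
import tempfile
from pathlib import Path

from load_fleet_registry import parse_jsx

src = 'const D=[["公司","CO",[],[["P1","船","VSL",99],],[],[],[],[],[],],];'
with tempfile.NamedTemporaryFile('w', suffix='.jsx', delete=False,
                                 encoding='utf-8') as f:
    f.write(src)                     # trailing commas would break plain json.loads
data = parse_jsx(Path(f.name))
print(len(data), data[0][0])         # 1 公司
```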
177
prediction/tests/test_gear_parent_episode.py Normal file
파일 보기

@ -0,0 +1,177 @@
import unittest
import sys
import types
from datetime import datetime, timedelta, timezone
stub = types.ModuleType('pydantic_settings')
class BaseSettings:
def __init__(self, **kwargs):
for name, value in self.__class__.__dict__.items():
if name.isupper():
setattr(self, name, kwargs.get(name, value))
stub.BaseSettings = BaseSettings
sys.modules.setdefault('pydantic_settings', stub)
from algorithms.gear_parent_episode import (
GroupEpisodeInput,
EpisodeState,
build_episode_plan,
compute_prior_bonus_components,
continuity_score,
)
class GearParentEpisodeTest(unittest.TestCase):
def test_continuity_score_prefers_member_overlap_and_near_center(self):
current = GroupEpisodeInput(
group_key='ZHEDAIYU02394',
normalized_parent_name='ZHEDAIYU02394',
sub_cluster_id=1,
member_mmsis=['100', '200', '300'],
member_count=3,
center_lat=35.0,
center_lon=129.0,
)
previous = EpisodeState(
episode_id='ep-prev',
lineage_key='ZHEDAIYU02394',
group_key='ZHEDAIYU02394',
normalized_parent_name='ZHEDAIYU02394',
current_sub_cluster_id=0,
member_mmsis=['100', '200', '400'],
member_count=3,
center_lat=35.02,
center_lon=129.01,
last_snapshot_time=datetime.now(timezone.utc),
status='ACTIVE',
)
score, overlap_count, distance_nm = continuity_score(current, previous)
self.assertGreaterEqual(overlap_count, 2)
self.assertGreater(score, 0.45)
self.assertLess(distance_nm, 12.0)
def test_build_episode_plan_creates_merge_episode(self):
now = datetime.now(timezone.utc)
current = GroupEpisodeInput(
group_key='JINSHI',
normalized_parent_name='JINSHI',
sub_cluster_id=0,
member_mmsis=['a', 'b', 'c', 'd'],
member_count=4,
center_lat=35.0,
center_lon=129.0,
)
previous_a = EpisodeState(
episode_id='ep-a',
lineage_key='JINSHI',
group_key='JINSHI',
normalized_parent_name='JINSHI',
current_sub_cluster_id=1,
member_mmsis=['a', 'b'],
member_count=2,
center_lat=35.0,
center_lon=129.0,
last_snapshot_time=now - timedelta(minutes=5),
status='ACTIVE',
)
previous_b = EpisodeState(
episode_id='ep-b',
lineage_key='JINSHI',
group_key='JINSHI',
normalized_parent_name='JINSHI',
current_sub_cluster_id=2,
member_mmsis=['c', 'd'],
member_count=2,
center_lat=35.01,
center_lon=129.01,
last_snapshot_time=now - timedelta(minutes=5),
status='ACTIVE',
)
plan = build_episode_plan([current], {'JINSHI': [previous_a, previous_b]})
assignment = plan.assignments[current.key]
self.assertEqual(assignment.continuity_source, 'MERGE_NEW')
self.assertEqual(set(assignment.merged_from_episode_ids), {'ep-a', 'ep-b'})
self.assertEqual(plan.merged_episode_targets['ep-a'], assignment.episode_id)
self.assertEqual(plan.merged_episode_targets['ep-b'], assignment.episode_id)
def test_build_episode_plan_marks_split_continue_and_split_new(self):
now = datetime.now(timezone.utc)
previous = EpisodeState(
episode_id='ep-prev',
lineage_key='A01859',
group_key='A01859',
normalized_parent_name='A01859',
current_sub_cluster_id=0,
member_mmsis=['a', 'b', 'c', 'd'],
member_count=4,
center_lat=35.0,
center_lon=129.0,
last_snapshot_time=now - timedelta(minutes=5),
status='ACTIVE',
)
current_a = GroupEpisodeInput(
group_key='A01859',
normalized_parent_name='A01859',
sub_cluster_id=1,
member_mmsis=['a', 'b', 'c'],
member_count=3,
center_lat=35.0,
center_lon=129.0,
)
current_b = GroupEpisodeInput(
group_key='A01859',
normalized_parent_name='A01859',
sub_cluster_id=2,
member_mmsis=['c', 'd'],
member_count=2,
center_lat=35.02,
center_lon=129.02,
)
plan = build_episode_plan([current_a, current_b], {'A01859': [previous]})
sources = {plan.assignments[current_a.key].continuity_source, plan.assignments[current_b.key].continuity_source}
self.assertIn('SPLIT_CONTINUE', sources)
self.assertIn('SPLIT_NEW', sources)
def test_compute_prior_bonus_components_caps_total_bonus(self):
observed_at = datetime.now(timezone.utc)
bonuses = compute_prior_bonus_components(
observed_at=observed_at,
normalized_parent_name='JINSHI',
episode_id='ep-1',
candidate_mmsi='412333326',
episode_prior_stats={
('ep-1', '412333326'): {
'seen_count': 12,
'top1_count': 5,
'avg_score': 0.88,
'last_seen_at': observed_at - timedelta(hours=1),
},
},
lineage_prior_stats={
('JINSHI', '412333326'): {
'seen_count': 24,
'top1_count': 6,
'top3_count': 10,
'avg_score': 0.82,
'last_seen_at': observed_at - timedelta(hours=3),
},
},
label_prior_stats={
('JINSHI', '412333326'): {
'session_count': 4,
'last_labeled_at': observed_at - timedelta(days=1),
},
},
)
self.assertGreater(bonuses['episodePriorBonus'], 0.0)
self.assertGreater(bonuses['lineagePriorBonus'], 0.0)
self.assertGreater(bonuses['labelPriorBonus'], 0.0)
self.assertLessEqual(bonuses['priorBonusTotal'], 0.20)
if __name__ == '__main__':
unittest.main()

279
prediction/tests/test_gear_parent_inference.py Normal file
파일 보기

@ -0,0 +1,279 @@
import unittest
import sys
import types
from datetime import datetime, timedelta, timezone
stub = types.ModuleType('pydantic_settings')
class BaseSettings:
def __init__(self, **kwargs):
for name, value in self.__class__.__dict__.items():
if name.isupper():
setattr(self, name, kwargs.get(name, value))
stub.BaseSettings = BaseSettings
sys.modules.setdefault('pydantic_settings', stub)
from algorithms.gear_parent_inference import (
RegistryVessel,
CandidateScore,
_AUTO_PROMOTED_STATUS,
_apply_final_score_bonus,
_build_track_coverage_metrics,
_build_candidate_scores,
_china_mmsi_prefix_bonus,
_direct_parent_member,
_direct_parent_stable_cycles,
_label_tracking_row,
_NO_CANDIDATE_STATUS,
_REVIEW_REQUIRED_STATUS,
_UNRESOLVED_STATUS,
_name_match_score,
_select_status,
_top_candidate_stable_cycles,
is_trackable_parent_name,
normalize_parent_name,
)
class GearParentInferenceRuleTest(unittest.TestCase):
def _candidate(self, *, mmsi='123456789', score=0.8, sources=None):
return CandidateScore(
mmsi=mmsi,
name='TEST',
vessel_id=1,
target_type='VESSEL',
candidate_source=','.join(sources or ['CORRELATION']),
base_corr_score=0.7,
name_match_score=0.1,
track_similarity_score=0.8,
visit_score_6h=0.4,
proximity_score_6h=0.3,
activity_sync_score_6h=0.2,
stability_score=0.9,
registry_bonus=0.05,
episode_prior_bonus=0.0,
lineage_prior_bonus=0.0,
label_prior_bonus=0.0,
final_score=score,
streak_count=6,
model_id=1,
model_name='default',
evidence={'sources': sources or ['CORRELATION']},
)
def test_normalize_parent_name_removes_space_symbols(self):
self.assertEqual(normalize_parent_name(' A_B-C% 12 '), 'ABC12')
def test_trackable_parent_name_requires_length_four_after_normalize(self):
self.assertFalse(is_trackable_parent_name('A-1%'))
self.assertFalse(is_trackable_parent_name('ZSY'))
self.assertFalse(is_trackable_parent_name('991'))
self.assertTrue(is_trackable_parent_name(' AB_12 '))
def test_name_match_score_prefers_raw_exact(self):
self.assertEqual(_name_match_score('LUWENYU 53265', 'LUWENYU 53265', None), 1.0)
def test_name_match_score_supports_compact_exact_and_prefix(self):
registry = RegistryVessel(
vessel_id=1,
mmsi='412327765',
name_cn='LUWENYU53265',
name_en='LUWENYU 53265',
)
self.assertEqual(_name_match_score('LUWENYU 53265', 'LUWENYU53265', None), 0.8)
self.assertEqual(_name_match_score('LUWENYU 532', 'LUWENYU53265', None), 0.5)
self.assertEqual(_name_match_score('LUWENYU 53265', 'DIFFERENT', registry), 1.0)
self.assertEqual(_name_match_score('ZHEDAIYU02433', 'ZHEDAIYU06178', None), 0.3)
def test_name_match_score_does_not_use_candidate_registry_self_match(self):
registry = RegistryVessel(
vessel_id=1,
mmsi='412413545',
name_cn='ZHEXIANGYU55005',
name_en='ZHEXIANGYU55005',
)
self.assertEqual(_name_match_score('JINSHI', 'ZHEXIANGYU55005', registry), 0.0)
def test_direct_parent_member_prefers_parent_member_then_parent_mmsi(self):
all_positions = {'412420673': {'name': 'ZHEDAIYU02433'}}
from_members = _direct_parent_member(
{
'parent_name': 'ZHEDAIYU02433',
'members': [
{'mmsi': '412420673', 'name': 'ZHEDAIYU02433', 'isParent': True},
{'mmsi': '24330082', 'name': 'ZHEDAIYU02433_82_99_', 'isParent': False},
],
},
all_positions,
)
self.assertEqual(from_members['mmsi'], '412420673')
from_parent_mmsi = _direct_parent_member(
{
'parent_name': 'ZHEDAIYU02433',
'parent_mmsi': '412420673',
'members': [],
},
all_positions,
)
self.assertEqual(from_parent_mmsi['mmsi'], '412420673')
self.assertEqual(from_parent_mmsi['name'], 'ZHEDAIYU02433')
def test_direct_parent_stable_cycles_reuses_same_parent(self):
existing = {
'selected_parent_mmsi': '412420673',
'stable_cycles': 4,
'evidence_summary': {'directParentMmsi': '412420673'},
}
self.assertEqual(_direct_parent_stable_cycles(existing, '412420673'), 5)
self.assertEqual(_direct_parent_stable_cycles(existing, '412000000'), 1)
def test_china_prefix_bonus_requires_threshold(self):
self.assertEqual(_china_mmsi_prefix_bonus('412327765', 0.30), 0.15)
self.assertEqual(_china_mmsi_prefix_bonus('413987654', 0.65), 0.15)
self.assertEqual(_china_mmsi_prefix_bonus('412327765', 0.29), 0.0)
self.assertEqual(_china_mmsi_prefix_bonus('440123456', 0.75), 0.0)
def test_apply_final_score_bonus_adds_bonus_after_weighted_score(self):
pre_bonus_score, china_bonus, final_score = _apply_final_score_bonus('412333326', 0.66)
self.assertIsInstance(pre_bonus_score, float)
self.assertIsInstance(china_bonus, float)
self.assertIsInstance(final_score, float)
self.assertEqual(pre_bonus_score, 0.66)
self.assertEqual(china_bonus, 0.15)
self.assertEqual(final_score, 0.81)
    def test_top_candidate_stable_cycles_resets_on_candidate_change(self):
        existing = {
            'stable_cycles': 5,
            'evidence_summary': {'topCandidateMmsi': '111111111'},
        }
        self.assertEqual(_top_candidate_stable_cycles(existing, self._candidate(mmsi='111111111')), 6)
        self.assertEqual(_top_candidate_stable_cycles(existing, self._candidate(mmsi='222222222')), 1)

    def test_select_status_requires_recent_stability_and_correlation_for_auto(self):
        self.assertEqual(
            _select_status(self._candidate(score=0.8, sources=['CORRELATION']), margin=0.2, stable_cycles=3),
            (_AUTO_PROMOTED_STATUS, 'AUTO_PROMOTION'),
        )
        self.assertEqual(
            _select_status(self._candidate(score=0.8, sources=['PREVIOUS_SELECTION']), margin=0.2, stable_cycles=3),
            (_REVIEW_REQUIRED_STATUS, 'AUTO_REVIEW'),
        )
        self.assertEqual(
            _select_status(self._candidate(score=0.8, sources=['CORRELATION']), margin=0.2, stable_cycles=2),
            (_REVIEW_REQUIRED_STATUS, 'AUTO_REVIEW'),
        )

    def test_select_status_marks_candidate_gaps_explicitly(self):
        self.assertEqual(_select_status(None, margin=0.0, stable_cycles=0), (_NO_CANDIDATE_STATUS, 'AUTO_NO_CANDIDATE'))
        self.assertEqual(
            _select_status(self._candidate(score=0.45, sources=['CORRELATION']), margin=0.1, stable_cycles=1),
            (_UNRESOLVED_STATUS, 'AUTO_SCORE'),
        )
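
    # Decision ladder implied by the two tests above (the exact thresholds
    # live in _select_status itself): auto-promotion needs a fresh CORRELATION
    # source plus at least 3 stable cycles; a stale source or short streak
    # degrades to review, a low score resolves to UNRESOLVED, and a missing
    # candidate is reported as NO_CANDIDATE instead of being silently skipped.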
    def test_build_candidate_scores_applies_active_exclusions_before_scoring(self):
        class FakeStore:
            _tracks = {}

        candidates = _build_candidate_scores(
            vessel_store=FakeStore(),
            observed_at=datetime(2026, 4, 3, 0, 0, tzinfo=timezone.utc),
            group={'parent_name': 'AB1234', 'sub_cluster_id': 1},
            episode_assignment=types.SimpleNamespace(
                episode_id='ep-test',
                continuity_source='NEW',
                continuity_score=0.0,
            ),
            default_model_id=1,
            default_model_name='default',
            score_rows=[
                {
                    'target_mmsi': '412111111',
                    'target_type': 'VESSEL',
                    'target_name': 'AB1234',
                    'current_score': 0.8,
                    'streak_count': 4,
                },
                {
                    'target_mmsi': '440222222',
                    'target_type': 'VESSEL',
                    'target_name': 'AB1234',
                    'current_score': 0.7,
                    'streak_count': 3,
                },
            ],
            raw_metrics={},
            center_track=[],
            all_positions={},
            registry_by_mmsi={},
            registry_by_name={},
            existing=None,
            excluded_candidate_mmsis={'412111111'},
            episode_prior_stats={},
            lineage_prior_stats={},
            label_prior_stats={},
        )
        self.assertEqual([candidate.mmsi for candidate in candidates], ['440222222'])
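
    # Note: the excluded MMSI outscores the survivor (0.8 > 0.7), which shows
    # the exclusion set is applied before ranking rather than filtered after.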
    def test_track_coverage_metrics_penalize_short_track_support(self):
        now = datetime(2026, 4, 3, 0, 0, tzinfo=timezone.utc)
        center_track = [
            {'timestamp': now - timedelta(hours=5), 'lat': 35.0, 'lon': 129.0},
            {'timestamp': now - timedelta(hours=1), 'lat': 35.1, 'lon': 129.1},
        ]
        short_track = [
            {'timestamp': now - timedelta(minutes=10), 'lat': 35.1, 'lon': 129.1, 'sog': 0.5},
        ]
        long_track = [
            {
                'timestamp': now - timedelta(minutes=90) + timedelta(minutes=10 * idx),
                'lat': 35.0,
                'lon': 129.0 + (0.01 * idx),
                'sog': 0.5,
            }
            for idx in range(10)
        ]
        short_metrics = _build_track_coverage_metrics(center_track, short_track, 35.05, 129.05)
        long_metrics = _build_track_coverage_metrics(center_track, long_track, 35.05, 129.05)
        self.assertEqual(short_metrics['trackPointCount'], 1)
        self.assertEqual(short_metrics['trackCoverageFactor'], 0.0)
        self.assertGreater(long_metrics['trackCoverageFactor'], 0.0)
        self.assertGreater(long_metrics['coverageFactor'], short_metrics['coverageFactor'])
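
    # A single recent point yields zero track coverage, while the 10-point,
    # 90-minute track earns a positive factor: sparse AIS support is actively
    # penalized rather than treated as neutral.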
    def test_label_tracking_row_tracks_rank_and_match_flags(self):
        top_candidate = self._candidate(mmsi='412333326', score=0.81, sources=['CORRELATION'])
        top_candidate.evidence = {
            'sources': ['CORRELATION'],
            'scoreBreakdown': {'preBonusScore': 0.66},
        }
        labeled_candidate = self._candidate(mmsi='440123456', score=0.62, sources=['CORRELATION'])
        labeled_candidate.evidence = {
            'sources': ['CORRELATION'],
            'scoreBreakdown': {'preBonusScore': 0.62},
        }
        row = _label_tracking_row(
            observed_at='2026-04-03T00:00:00Z',
            label_session={
                'id': 10,
                'label_parent_mmsi': '440123456',
                'label_parent_name': 'TARGET',
            },
            auto_status='REVIEW_REQUIRED',
            top_candidate=top_candidate,
            margin=0.19,
            candidates=[top_candidate, labeled_candidate],
        )
        self.assertEqual(row[0], 10)
        self.assertEqual(row[8], 2)
        self.assertTrue(row[9])
        self.assertEqual(row[10], 2)
        self.assertEqual(row[11], 0.62)
        self.assertEqual(row[12], 0.62)
        self.assertFalse(row[14])
        self.assertTrue(row[15])


if __name__ == '__main__':
    unittest.main()


@ -0,0 +1,90 @@
import unittest
import sys
import types
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

import pandas as pd
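
# config.py imports pydantic_settings at module load; register a minimal
# BaseSettings stand-in first so these unit tests run without the real
# dependency installed. The stub only mirrors the uppercase-attribute
# copying these tests need.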
stub = types.ModuleType('pydantic_settings')


class BaseSettings:
    def __init__(self, **kwargs):
        for name, value in self.__class__.__dict__.items():
            if name.isupper():
                setattr(self, name, kwargs.get(name, value))


stub.BaseSettings = BaseSettings
sys.modules.setdefault('pydantic_settings', stub)

from cache.vessel_store import VesselStore
from time_bucket import compute_incremental_window_start, compute_initial_window_start, compute_safe_bucket


class TimeBucketRuleTest(unittest.TestCase):
    def test_safe_bucket_uses_delay_then_floors_to_5m(self):
        now = datetime(2026, 4, 2, 15, 14, 0, tzinfo=ZoneInfo('Asia/Seoul'))
        self.assertEqual(compute_safe_bucket(now), datetime(2026, 4, 2, 15, 0, 0))
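
    # 15:14 KST minus the configured safe delay, floored to the 5-minute grid,
    # lands on 15:00. That is consistent with SNPDB_SAFE_DELAY_MIN being about
    # 10 minutes in the settings this test runs under (any value from 10 to 14
    # would produce the same bucket).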
    def test_incremental_window_includes_overlap_buckets(self):
        last_bucket = datetime(2026, 4, 2, 15, 0, 0)
        self.assertEqual(compute_incremental_window_start(last_bucket), datetime(2026, 4, 2, 14, 45, 0))
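
    # 15:00 -> 14:45 means three 5-minute buckets are re-read on every
    # incremental pull (SNPDB_BACKFILL_BUCKETS = 3 under these settings),
    # so late-arriving AIS rows can overwrite previously partial buckets.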
    def test_initial_window_start_anchors_to_safe_bucket(self):
        safe_bucket = datetime(2026, 4, 2, 15, 0, 0)
        self.assertEqual(compute_initial_window_start(24, safe_bucket), datetime(2026, 4, 1, 15, 0, 0))

    def test_merge_incremental_prefers_newer_overlap_rows(self):
        store = VesselStore()
        store._tracks = {
            '412000001': pd.DataFrame([
                {
                    'mmsi': '412000001',
                    'timestamp': pd.Timestamp('2026-04-02T00:01:00Z'),
                    'time_bucket': datetime(2026, 4, 2, 9, 0, 0),
                    'lat': 30.0,
                    'lon': 120.0,
                    'raw_sog': 1.0,
                },
                {
                    'mmsi': '412000001',
                    'timestamp': pd.Timestamp('2026-04-02T00:02:00Z'),
                    'time_bucket': datetime(2026, 4, 2, 9, 0, 0),
                    'lat': 30.1,
                    'lon': 120.1,
                    'raw_sog': 1.0,
                },
            ])
        }
        df_new = pd.DataFrame([
            {
                'mmsi': '412000001',
                'timestamp': pd.Timestamp('2026-04-02T00:02:00Z'),
                'time_bucket': datetime(2026, 4, 2, 9, 0, 0),
                'lat': 30.2,
                'lon': 120.2,
                'raw_sog': 2.0,
            },
            {
                'mmsi': '412000001',
                'timestamp': pd.Timestamp('2026-04-02T00:03:00Z'),
                'time_bucket': datetime(2026, 4, 2, 9, 5, 0),
                'lat': 30.3,
                'lon': 120.3,
                'raw_sog': 2.0,
            },
        ])
        store.merge_incremental(df_new)
        merged = store._tracks['412000001']
        self.assertEqual(len(merged), 3)
        replacement = merged.loc[merged['timestamp'] == pd.Timestamp('2026-04-02T00:02:00Z')].iloc[0]
        self.assertEqual(float(replacement['lat']), 30.2)
        self.assertEqual(float(replacement['lon']), 120.2)
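
    # merge_incremental semantics verified above: old and new frames are
    # unioned and deduplicated per timestamp with the newer row winning, so
    # the overlapping 00:02:00Z fix takes the re-read lat/lon/raw_sog values
    # and the merged track ends up with three rows, not four.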


if __name__ == '__main__':
    unittest.main()

42
prediction/time_bucket.py Normal file

@ -0,0 +1,42 @@
from __future__ import annotations

from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

from config import settings

# All bucketing is done on a naive-KST 5-minute grid.
_KST = ZoneInfo('Asia/Seoul')
_BUCKET_MINUTES = 5


def normalize_bucket_kst(bucket: datetime) -> datetime:
    """Coerce an aware datetime to naive KST; pass naive values through."""
    if bucket.tzinfo is None:
        return bucket
    return bucket.astimezone(_KST).replace(tzinfo=None)


def floor_bucket_kst(value: datetime, bucket_minutes: int = _BUCKET_MINUTES) -> datetime:
    """Floor to the start of a KST bucket; a naive input is assumed to be KST."""
    if value.tzinfo is None:
        localized = value.replace(tzinfo=_KST)
    else:
        localized = value.astimezone(_KST)
    floored_minute = (localized.minute // bucket_minutes) * bucket_minutes
    return localized.replace(minute=floored_minute, second=0, microsecond=0)


def compute_safe_bucket(now: datetime | None = None) -> datetime:
    """Newest fully-settled bucket: now minus the SNPDB safe delay, floored."""
    current = now or datetime.now(timezone.utc)
    if current.tzinfo is None:
        current = current.replace(tzinfo=timezone.utc)
    safe_point = current.astimezone(_KST) - timedelta(minutes=settings.SNPDB_SAFE_DELAY_MIN)
    return floor_bucket_kst(safe_point).replace(tzinfo=None)


def compute_initial_window_start(hours: int, safe_bucket: datetime | None = None) -> datetime:
    """Cold-start window: `hours` back from the safe bucket."""
    anchor = normalize_bucket_kst(safe_bucket or compute_safe_bucket())
    return anchor - timedelta(hours=hours)


def compute_incremental_window_start(last_bucket: datetime) -> datetime:
    """Incremental window: back off a fixed number of overlap buckets for late rows."""
    normalized = normalize_bucket_kst(last_bucket)
    return normalized - timedelta(minutes=settings.SNPDB_BACKFILL_BUCKETS * _BUCKET_MINUTES)
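
A minimal sketch of how these helpers compose in the 5-minute collection job; `fetch_positions`, `vessel_store`, and `plan_window` are hypothetical names for illustration, everything else is the module API above.

```python
from time_bucket import (
    compute_incremental_window_start,
    compute_initial_window_start,
    compute_safe_bucket,
)


def plan_window(last_bucket, window_hours=24):
    # Newest fully-settled bucket (naive KST).
    safe_bucket = compute_safe_bucket()
    if last_bucket is None:
        # Cold start: load the full sliding window behind the safe bucket.
        start = compute_initial_window_start(window_hours, safe_bucket)
    else:
        # Steady state: step back a few overlap buckets so late AIS rows
        # from SNPDB replace partial data already in the store.
        start = compute_incremental_window_start(last_bucket)
    return start, safe_bucket


# start, safe = plan_window(last_bucket=None)
# rows = fetch_positions(start, safe)       # query SNPDB for [start, safe]
# vessel_store.merge_incremental(rows)      # newer overlap rows win
```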