
PSPNet

Pyramid Scene Parsing Network

Introduction

Official Repo

Code Snippet

Abstract

Scene parsing is challenging for unrestricted open vocabulary and diverse scenes. In this paper, we exploit the capability of global context information by different-region-based context aggregation through our pyramid pooling module together with the proposed pyramid scene parsing network (PSPNet). Our global prior representation is effective to produce good quality results on the scene parsing task, while PSPNet provides a superior framework for pixel-level prediction tasks. The proposed approach achieves state-of-the-art performance on various datasets. It came first in ImageNet scene parsing challenge 2016, PASCAL VOC 2012 benchmark and Cityscapes benchmark. A single PSPNet yields new record of mIoU accuracy 85.4% on PASCAL VOC 2012 and accuracy 80.2% on Cityscapes.
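
As a rough illustration of the different-region-based context aggregation mentioned in the abstract, the sketch below average-pools a single-channel feature map into grids of several bin sizes. It is not the MMSegmentation implementation: the function names are hypothetical, the bin sizes (1, 2, 3, 6) come from the PSPNet paper rather than from this README, and the per-level 1x1 convolution, upsampling, and concatenation that follow in the paper's module are omitted.

```python
def adaptive_avg_pool2d(feature, out_size):
    """Average-pool a 2D list (H x W) into an out_size x out_size grid.

    Bin edges use the usual adaptive-pooling rule:
    start = floor(i * n / out), end = ceil((i + 1) * n / out).
    """
    height, width = len(feature), len(feature[0])
    pooled = []
    for i in range(out_size):
        r0 = (i * height) // out_size
        r1 = ((i + 1) * height + out_size - 1) // out_size  # integer ceil
        row = []
        for j in range(out_size):
            c0 = (j * width) // out_size
            c1 = ((j + 1) * width + out_size - 1) // out_size
            cells = [feature[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(cells) / len(cells))
        pooled.append(row)
    return pooled


def pyramid_pool(feature, bin_sizes=(1, 2, 3, 6)):
    """Return one pooled grid per pyramid level, keyed by bin size."""
    return {b: adaptive_avg_pool2d(feature, b) for b in bin_sizes}


# Toy 6x6 feature map whose value at (r, c) is r * 6 + c.
feature = [[float(r * 6 + c) for c in range(6)] for r in range(6)]
pyramid = pyramid_pool(feature)
for bins, grid in pyramid.items():
    print(bins, len(grid), "x", len(grid[0]))
```

The 1x1 level is a global average over the whole map, while finer levels keep progressively more spatial detail; combining them is what gives each pixel access to context at several region sizes.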

Citation

@inproceedings{zhao2017pspnet,
  title={Pyramid Scene Parsing Network},
  author={Zhao, Hengshuang and Shi, Jianping and Qi, Xiaojuan and Wang, Xiaogang and Jia, Jiaya},
  booktitle={CVPR},
  year={2017}
}
@article{wightman2021resnet,
  title={Resnet strikes back: An improved training procedure in timm},
  author={Wightman, Ross and Touvron, Hugo and J{\'e}gou, Herv{\'e}},
  journal={arXiv preprint arXiv:2110.00476},
  year={2021}
}

Results and models

Cityscapes

| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | config | download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PSPNet | R-50-D8 | 512x1024 | 40000 | 6.1 | 4.07 | 77.85 | 79.18 | config | model \| log |
| PSPNet | R-101-D8 | 512x1024 | 40000 | 9.6 | 2.68 | 78.34 | 79.74 | config | model \| log |
| PSPNet | R-50-D8 | 769x769 | 40000 | 6.9 | 1.76 | 78.26 | 79.88 | config | model \| log |
| PSPNet | R-101-D8 | 769x769 | 40000 | 10.9 | 1.15 | 79.08 | 80.28 | config | model \| log |
| PSPNet | R-18-D8 | 512x1024 | 80000 | 1.7 | 15.71 | 74.87 | 76.04 | config | model \| log |
| PSPNet | R-50-D8 | 512x1024 | 80000 | - | - | 78.55 | 79.79 | config | model \| log |
| PSPNet | R-50b-D8 rsb | 512x1024 | 80000 | 6.2 | 3.82 | 78.47 | 79.45 | config | model \| log |
| PSPNet | R-101-D8 | 512x1024 | 80000 | - | - | 79.76 | 81.01 | config | model \| log |
| PSPNet (FP16) | R-101-D8 | 512x1024 | 80000 | 5.34 | 8.77 | 79.46 | - | config | model \| log |
| PSPNet | R-18-D8 | 769x769 | 80000 | 1.9 | 6.20 | 75.90 | 77.86 | config | model \| log |
| PSPNet | R-50-D8 | 769x769 | 80000 | - | - | 79.59 | 80.69 | config | model \| log |
| PSPNet | R-101-D8 | 769x769 | 80000 | - | - | 79.77 | 81.06 | config | model \| log |
| PSPNet | R-18b-D8 | 512x1024 | 80000 | 1.5 | 16.28 | 74.23 | 75.79 | config | model \| log |
| PSPNet | R-50b-D8 | 512x1024 | 80000 | 6.0 | 4.30 | 78.22 | 79.46 | config | model \| log |
| PSPNet | R-101b-D8 | 512x1024 | 80000 | 9.5 | 2.76 | 79.69 | 80.79 | config | model \| log |
| PSPNet | R-18b-D8 | 769x769 | 80000 | 1.7 | 6.41 | 74.92 | 76.90 | config | model \| log |
| PSPNet | R-50b-D8 | 769x769 | 80000 | 6.8 | 1.88 | 78.50 | 79.96 | config | model \| log |
| PSPNet | R-101b-D8 | 769x769 | 80000 | 10.8 | 1.17 | 78.87 | 80.04 | config | model \| log |
| PSPNet | R-50-D32 | 512x1024 | 80000 | 3.0 | 15.21 | 73.88 | 76.85 | config | model \| log |
| PSPNet | R-50b-D32 rsb | 512x1024 | 80000 | 3.1 | 16.08 | 74.09 | 77.18 | config | model \| log |
| PSPNet | R-50b-D32 | 512x1024 | 80000 | 2.9 | 15.41 | 72.61 | 75.51 | config | model \| log |

ADE20K

| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | config | download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PSPNet | R-50-D8 | 512x512 | 80000 | 8.5 | 23.53 | 41.13 | 41.94 | config | model \| log |
| PSPNet | R-101-D8 | 512x512 | 80000 | 12 | 15.30 | 43.57 | 44.35 | config | model \| log |
| PSPNet | R-50-D8 | 512x512 | 160000 | - | - | 42.48 | 43.44 | config | model \| log |
| PSPNet | R-101-D8 | 512x512 | 160000 | - | - | 44.39 | 45.35 | config | model \| log |

Pascal VOC 2012 + Aug

| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | config | download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PSPNet | R-50-D8 | 512x512 | 20000 | 6.1 | 23.59 | 76.78 | 77.61 | config | model \| log |
| PSPNet | R-101-D8 | 512x512 | 20000 | 9.6 | 15.02 | 78.47 | 79.25 | config | model \| log |
| PSPNet | R-50-D8 | 512x512 | 40000 | - | - | 77.29 | 78.48 | config | model \| log |
| PSPNet | R-101-D8 | 512x512 | 40000 | - | - | 78.52 | 79.57 | config | model \| log |

Pascal Context

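| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | config | download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PSPNet | R-101-D8 | 480x480 | 40000 | 8.8 | 9.68 | 46.60 | 47.78 | config | model \| log |
| PSPNet | R-101-D8 | 480x480 | 80000 | - | - | 46.03 | 47.15 | config | model \| log |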

Pascal Context 59

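| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | config | download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PSPNet | R-101-D8 | 480x480 | 40000 | - | - | 52.02 | 53.54 | config | model \| log |
| PSPNet | R-101-D8 | 480x480 | 80000 | - | - | 52.47 | 53.99 | config | model \| log |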

Dark Zurich and Nighttime Driving

We report evaluation results on these two datasets using the models above, which were trained on the Cityscapes training set.

| Method | Backbone | Training Dataset | Test Dataset | mIoU | config | evaluation checkpoint |
| --- | --- | --- | --- | --- | --- | --- |
| PSPNet | R-50-D8 | Cityscapes Training set | Dark Zurich | 10.91 | config | model \| log |
| PSPNet | R-50-D8 | Cityscapes Training set | Nighttime Driving | 23.02 | config | model \| log |
| PSPNet | R-50-D8 | Cityscapes Training set | Cityscapes Validation set | 77.85 | config | model \| log |
| PSPNet | R-101-D8 | Cityscapes Training set | Dark Zurich | 10.16 | config | model \| log |
| PSPNet | R-101-D8 | Cityscapes Training set | Nighttime Driving | 20.25 | config | model \| log |
| PSPNet | R-101-D8 | Cityscapes Training set | Cityscapes Validation set | 78.34 | config | model \| log |
| PSPNet | R-101b-D8 | Cityscapes Training set | Dark Zurich | 15.54 | config | model \| log |
| PSPNet | R-101b-D8 | Cityscapes Training set | Nighttime Driving | 22.25 | config | model \| log |
| PSPNet | R-101b-D8 | Cityscapes Training set | Cityscapes Validation set | 79.69 | config | model \| log |

COCO-Stuff 10k

| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | config | download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PSPNet | R-50-D8 | 512x512 | 20000 | 9.6 | 20.5 | 35.69 | 36.62 | config | model \| log |
| PSPNet | R-101-D8 | 512x512 | 20000 | 13.2 | 11.1 | 37.26 | 38.52 | config | model \| log |
| PSPNet | R-50-D8 | 512x512 | 40000 | - | - | 36.33 | 37.24 | config | model \| log |
| PSPNet | R-101-D8 | 512x512 | 40000 | - | - | 37.76 | 38.86 | config | model \| log |

COCO-Stuff 164k

| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | config | download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PSPNet | R-50-D8 | 512x512 | 80000 | 9.6 | 20.5 | 38.80 | 39.19 | config | model \| log |
| PSPNet | R-101-D8 | 512x512 | 80000 | 13.2 | 11.1 | 40.34 | 40.79 | config | model \| log |
| PSPNet | R-50-D8 | 512x512 | 160000 | - | - | 39.64 | 39.97 | config | model \| log |
| PSPNet | R-101-D8 | 512x512 | 160000 | - | - | 41.28 | 41.66 | config | model \| log |
| PSPNet | R-50-D8 | 512x512 | 320000 | - | - | 40.53 | 40.75 | config | model \| log |
| PSPNet | R-101-D8 | 512x512 | 320000 | - | - | 41.95 | 42.42 | config | model \| log |

LoveDA

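| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | config | download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PSPNet | R-18-D8 | 512x512 | 80000 | 1.45 | 26.87 | 48.62 | 47.57 | config | model \| log |
| PSPNet | R-50-D8 | 512x512 | 80000 | 6.14 | 6.60 | 50.46 | 50.19 | config | model \| log |
| PSPNet | R-101-D8 | 512x512 | 80000 | 9.61 | 4.58 | 51.86 | 51.34 | config | model \| log |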

Potsdam

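| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | config | download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PSPNet | R-18-D8 | 512x512 | 80000 | 1.50 | 85.12 | 77.09 | 78.30 | config | model \| log |
| PSPNet | R-50-D8 | 512x512 | 80000 | 6.14 | 30.21 | 78.12 | 78.98 | config | model \| log |
| PSPNet | R-101-D8 | 512x512 | 80000 | 9.61 | 19.40 | 78.62 | 79.47 | config | model \| log |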

Vaihingen

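| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | config | download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PSPNet | R-18-D8 | 512x512 | 80000 | 1.45 | 85.06 | 71.46 | 73.36 | config | model \| log |
| PSPNet | R-50-D8 | 512x512 | 80000 | 6.14 | 30.29 | 72.36 | 73.75 | config | model \| log |
| PSPNet | R-101-D8 | 512x512 | 80000 | 9.61 | 19.97 | 72.61 | 74.18 | config | model \| log |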

iSAID

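| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | config | download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PSPNet | R-18-D8 | 896x896 | 80000 | 4.52 | 26.91 | 60.22 | 61.25 | config | model \| log |
| PSPNet | R-50-D8 | 896x896 | 80000 | 16.58 | 8.88 | 65.36 | 66.48 | config | model \| log |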

Note:

  • FP16 means that mixed-precision (FP16) training is used.
  • 896x896 is the crop size used for the iSAID dataset, following the implementation of PointFlow: Flowing Semantics Through Points for Aerial Image Segmentation.
  • rsb is short for 'Resnet strikes back'.
  • The b in R-50b means ResNetV1b, which is a standard ResNet backbone. In MMSegmentation, the default backbone is ResNetV1c, which usually performs better on semantic segmentation tasks.
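
The mIoU columns in the tables above report mean intersection-over-union as a percentage. The following is a minimal sketch of that metric on flat label lists. It assumes an ignore label of 255 (a common convention, not stated in this README) and uses a hypothetical helper name, `mean_iou`. A real evaluation would accumulate the per-class counts over the whole validation set before averaging.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Return mean IoU (in percent) over classes present in pred or target."""
    intersect = [0] * num_classes  # true positives per class
    union = [0] * num_classes      # TP + FP + FN per class
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # unlabeled pixels do not count
        if p == t:
            intersect[t] += 1
            union[t] += 1
        else:
            union[p] += 1  # false positive for the predicted class
            union[t] += 1  # false negative for the true class
    # Classes that never appear are skipped rather than counted as zero.
    ious = [i / u for i, u in zip(intersect, union) if u > 0]
    return 100.0 * sum(ious) / len(ious) if ious else float("nan")


pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 255]
score = mean_iou(pred, target, num_classes=3)
print(f"mIoU: {score:.2f}")
```

In this toy case the per-class IoUs are 1/2, 2/3, and 1/1, so the mean is about 72.22.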