jeonghyo.k 3946ff6a25 feat(prediction): Docker packaging for the image analysis server + DB code removal		2026-03-10 18:37:36 +09:00
- Set up a Docker environment for the prediction/image/ FastAPI server
- Dockerfile: GPU image based on PyTorch 2.1 + CUDA 12.1
- docker-compose.yml: GPU allocation + data volume mounts
- requirements.txt: server dependency list
- .env.example: environment variable template
- DOCKER_USAGE.md: build/run/API usage documentation
- Dockerfile: added mkdir -p for folders excluded by .dockerignore
- .gitignore: added exclusions for prediction/image outputs and model weights (.pth)
- Deleted dbInsert_csv.py and dbInsert_shp.py (unused DB logic)
- api.py: removed the dbInsert import and the commented-out DB call code
- aerialRouter.ts: fixed a req.params type error
README.md	feat(prediction): Docker packaging for the image analysis server + DB code removal	2026-03-10 18:37:36 +09:00
setr_mla_512x512_160k_b8_ade20k.py	feat(prediction): Docker packaging for the image analysis server + DB code removal	2026-03-10 18:37:36 +09:00
setr_mla_512x512_160k_b16_ade20k.py	feat(prediction): Docker packaging for the image analysis server + DB code removal	2026-03-10 18:37:36 +09:00
setr_naive_512x512_160k_b16_ade20k.py	feat(prediction): Docker packaging for the image analysis server + DB code removal	2026-03-10 18:37:36 +09:00
setr_pup_512x512_160k_b16_ade20k.py	feat(prediction): Docker packaging for the image analysis server + DB code removal	2026-03-10 18:37:36 +09:00
setr_vit-large_mla_8x1_768x768_80k_cityscapes.py	feat(prediction): Docker packaging for the image analysis server + DB code removal	2026-03-10 18:37:36 +09:00
setr_vit-large_naive_8x1_768x768_80k_cityscapes.py	feat(prediction): Docker packaging for the image analysis server + DB code removal	2026-03-10 18:37:36 +09:00
setr_vit-large_pup_8x1_768x768_80k_cityscapes.py	feat(prediction): Docker packaging for the image analysis server + DB code removal	2026-03-10 18:37:36 +09:00
setr.yml	feat(prediction): Docker packaging for the image analysis server + DB code removal	2026-03-10 18:37:36 +09:00

README.md

SETR

Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers

Introduction

Official Repo

Code Snippet

Abstract

Most recent semantic segmentation methods adopt a fully-convolutional network (FCN) with an encoder-decoder architecture. The encoder progressively reduces the spatial resolution and learns more abstract/semantic visual concepts with larger receptive fields. Since context modeling is critical for segmentation, the latest efforts have been focused on increasing the receptive field, through either dilated/atrous convolutions or inserting attention modules. However, the encoder-decoder based FCN architecture remains unchanged. In this paper, we aim to provide an alternative perspective by treating semantic segmentation as a sequence-to-sequence prediction task. Specifically, we deploy a pure transformer (i.e., without convolution and resolution reduction) to encode an image as a sequence of patches. With the global context modeled in every layer of the transformer, this encoder can be combined with a simple decoder to provide a powerful segmentation model, termed SEgmentation TRansformer (SETR). Extensive experiments show that SETR achieves a new state of the art on ADE20K (50.28% mIoU) and Pascal Context (55.83% mIoU), and competitive results on Cityscapes. In particular, we achieve first position on the highly competitive ADE20K test server leaderboard on the day of submission.
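
The "image as a sequence of patches" idea can be made concrete with a little arithmetic. The sketch below counts how many patch tokens the transformer would see for the two crop sizes used in the results tables. The patch size of 16 is an assumption, taken from the "p16" in the pretrained checkpoint name; the function name is hypothetical.

```python
def patch_sequence_length(height: int, width: int, patch_size: int = 16) -> int:
    """Number of tokens when an image is cut into non-overlapping square patches.

    patch_size=16 is an assumption based on the "p16" checkpoint name.
    """
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be divisible by the patch size")
    return (height // patch_size) * (width // patch_size)


# Crop sizes taken from the ADE20K (512x512) and Cityscapes (768x768) tables.
for crop in (512, 768):
    print(f"{crop}x{crop} crop -> {patch_sequence_length(crop, crop)} patch tokens")
```

Under that assumption, a 512x512 crop becomes a sequence of 1024 tokens and a 768x768 crop becomes 2304 tokens. Because the encoder does not reduce resolution, this sequence length stays the same in every layer.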

This head has two versions.

Citation

@article{zheng2020rethinking,
  title={Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers},
  author={Zheng, Sixiao and Lu, Jiachen and Zhao, Hengshuang and Zhu, Xiatian and Luo, Zekun and Wang, Yabiao and Fu, Yanwei and Feng, Jianfeng and Xiang, Tao and Torr, Philip HS and others},
  journal={arXiv preprint arXiv:2012.15840},
  year={2020}
}

Usage

Download the pretrained weights from here, then convert their keys with the script vit2mmseg.py in the tools/model_converters directory.

python tools/model_converters/vit2mmseg.py ${PRETRAIN_PATH} ${STORE_PATH}

E.g.

python tools/model_converters/vit2mmseg.py \
jx_vit_large_p16_384-b3be5167.pth pretrain/vit_large_p16.pth

This script converts the model at PRETRAIN_PATH and stores the converted model at STORE_PATH.
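
To give a rough idea of what "converting keys" means, here is a minimal sketch that renames entries of a checkpoint's state dict. It is not the code of vit2mmseg.py: the rename rules, the dropped prefix, and the function name `convert_keys` are illustrative assumptions, and the real script's mapping may differ.

```python
from collections import OrderedDict

# Illustrative rename rules (assumed, NOT copied from vit2mmseg.py).
RENAME_RULES = (
    ("blocks.", "layers."),
    ("patch_embed.proj", "patch_embed.projection"),
)
# Assumed: the classification head is not needed for segmentation.
DROP_PREFIXES = ("head.",)


def convert_keys(state_dict):
    """Return a new state dict with renamed keys; values are passed through as-is."""
    converted = OrderedDict()
    for key, value in state_dict.items():
        if key.startswith(DROP_PREFIXES):
            continue  # skip weights the target model does not use
        new_key = key
        for old, new in RENAME_RULES:
            new_key = new_key.replace(old, new)
        converted[new_key] = value
    return converted


# Toy stand-in for a checkpoint; strings replace the real tensors.
example = OrderedDict([
    ("patch_embed.proj.weight", "w0"),
    ("blocks.0.attn.qkv.weight", "w1"),
    ("head.weight", "w2"),
])
print(list(convert_keys(example)))
```

With a real checkpoint, the input dict would typically come from `torch.load(PRETRAIN_PATH, map_location="cpu")` and the result would be written with `torch.save(converted, STORE_PATH)`.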

Results and models

ADE20K

Method	Backbone	Crop Size	Batch Size	LR schedule (iterations)	Memory (GB)	Inference speed (fps)	mIoU	mIoU (ms+flip)	Config	Download
SETR Naive	ViT-L	512x512	16	160000	18.40	4.72	48.28	49.56	config	model | log
SETR PUP	ViT-L	512x512	16	160000	19.54	4.50	48.24	49.99	config	model | log
SETR MLA	ViT-L	512x512	8	160000	10.96	-	47.34	49.05	config	model | log
SETR MLA	ViT-L	512x512	16	160000	17.30	5.25	47.54	49.37	config	model | log

Cityscapes

Method	Backbone	Crop Size	Batch Size	LR schedule (iterations)	Memory (GB)	Inference speed (fps)	mIoU	mIoU (ms+flip)	Config	Download
SETR Naive	ViT-L	768x768	8	80000	24.06	0.39	78.10	80.22	config	model | log
SETR PUP	ViT-L	768x768	8	80000	27.96	0.37	79.21	81.02	config	model | log
SETR MLA	ViT-L	768x768	8	80000	24.10	0.41	77.00	79.59	config	model | log
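
The mIoU columns in the tables above report mean intersection-over-union as a percentage. As a reminder of how that metric is defined, here is a minimal sketch that computes it from a confusion matrix. The function name and the toy matrix are hypothetical, and how this repository's evaluation code aggregates pixels is not shown here.

```python
def mean_iou(confusion):
    """Mean IoU from a square confusion matrix (rows: ground truth, columns: prediction)."""
    n = len(confusion)
    ious = []
    for c in range(n):
        tp = confusion[c][c]                               # correctly labelled pixels
        fn = sum(confusion[c]) - tp                        # class-c pixels that were missed
        fp = sum(confusion[r][c] for r in range(n)) - tp   # pixels wrongly labelled c
        denom = tp + fp + fn
        if denom:  # ignore classes absent from both ground truth and prediction
            ious.append(tp / denom)
    if not ious:
        raise ValueError("confusion matrix contains no pixels")
    return sum(ious) / len(ious)


# Toy 2-class example: each class has 3 correct pixels, 1 missed, 1 false alarm.
toy = [[3, 1],
       [1, 3]]
print(f"mIoU = {100 * mean_iou(toy):.2f}%")
```

In the toy example each class has IoU 3 / (3 + 1 + 1) = 0.6, so the mean is 60%.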