docs(admin-ops) spec write[#55] by baekyutae · Pull Request #66 · f-lab-edu/Biblio

baekyutae · 2026-04-13T12:57:16Z

summary

Spec documents written:

Feedback_Ingestion_Pipeline_Spec.md
ML_Pipeline_Execution_Spec.md

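I have written spec documents for these two components.
Because the feedback loop covers a broad scope of responsibility, I split it into
(1) feedback data preprocessing -> model training and evaluation: ML_Pipeline_Execution_Spec.md
(2) reindexing and deployment with the trained model (model replacement): Model_Release_and_Reindex_Spec.md
but I plan to implement them as a single component.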

I will commit the admin features and observation parts to this PR as they are written, and leave a comment when I do.

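Git Bash (CLI) login kept failing, so I had to open the PR with GitHub Desktop, and as a result commits for documents other than the specs are mixed in. (GitHub Desktop does not let you create a PR unless all commits are pushed.)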

So, when reviewing, please look only at this commit:

docs(admin-ops): write per-component specs for admin features

Related issue

Closes #55

1. Feedback_Ingestion_Pipeline_Spec.md
2. ML_Pipeline_Execution_Spec.md

Specified the reindex and rollback contracts and constraints that apply after training completes.

Renamed "HTTP / RPC / Consumer interface" -> "external entry interface". Reason: in some cases an entry point does not fall into any of the three categories HTTP / RPC / Consumer.

baekyutae · 2026-04-16T07:08:09Z

0416

Wrote model_release_and_reindex_spec.

Together with the previously written
Feedback_Ingestion_Pipeline_Spec.md
ML_Pipeline_Execution_Spec.md
the specs now cover everything from feedback collection through model training, deployment, and rollback.

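- Added the Admin Control Plane spec, defining the contracts for admin queries and control actions
- Made the ML Pipeline Execution spec concrete about the conditions for handing off to the release stage once model evaluation completes
- Revised the Model Release and Reindex spec so that it starts by re-reading the shared stored state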

baekyutae · 2026-04-17T11:20:46Z

0417

Added a new Admin Control Plane spec
Clarified the release handoff conditions in ML Pipeline Execution
Revised Model Release/Reindex to start by reading the shared SOT (MLPipelineRun.status is recorded as READY_FOR_RELEASE)
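The handoff above (release starts only by reading `MLPipelineRun.status == READY_FOR_RELEASE` from the shared SOT) can be sketched as follows. This is a minimal illustration, assuming an in-memory store in place of the real Metadata DB; `READY_FOR_RELEASE` comes from the spec, while `RUNNING`, `RELEASING`, `InMemorySot` and `claim_ready_run` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable, Optional


class RunStatus(Enum):
    # READY_FOR_RELEASE is from the spec; RUNNING and RELEASING are hypothetical.
    RUNNING = "RUNNING"
    READY_FOR_RELEASE = "READY_FOR_RELEASE"
    RELEASING = "RELEASING"


@dataclass
class MLPipelineRun:
    run_id: str
    status: RunStatus


class InMemorySot:
    """Hypothetical in-memory stand-in for the shared source of truth."""

    def __init__(self, runs: Iterable[MLPipelineRun]):
        self._runs: Dict[str, MLPipelineRun] = {r.run_id: r for r in runs}

    def claim_ready_run(self) -> Optional[MLPipelineRun]:
        # Re-read shared state and claim the first run marked READY_FOR_RELEASE.
        # A real DB would do this atomically, e.g. a conditional UPDATE on status.
        for run in self._runs.values():
            if run.status is RunStatus.READY_FOR_RELEASE:
                run.status = RunStatus.RELEASING
                return run
        return None


sot = InMemorySot([
    MLPipelineRun("run-1", RunStatus.RUNNING),
    MLPipelineRun("run-2", RunStatus.READY_FOR_RELEASE),
])
claimed = sot.claim_ready_run()
print(claimed.run_id, claimed.status.value)  # run-2 RELEASING
print(sot.claim_ready_run())  # None: nothing else is ready
```

Because the release side only reacts to a recorded status, it does not need a direct call from ML Pipeline Execution, and a restarted release worker can resume by reading the same state again.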

f-lab-jesse · 2026-04-17T21:17:24Z

+## 1. Purpose and Scope
+
+### 1.1 One-line summary
+- The Feedback Ingestion Pipeline (FIP) is a component that asynchronously receives search-response-level feedback events already validated by the Core API Server and stores them, with minimal loss, as immutable raw logs in Object Storage.


Question: Right now the Core API Server receives feedback. What are the pros and cons of splitting it out into a separate API versus having the Core API Server receive it?

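The advantage of having the Core API receive it is that feedback validity can be checked consistently by reusing the authentication/authorization flow and user context the Core API already has.

The downside is that a spike in feedback traffic can spread load to the Core API's other functions, and the Core API's scope widens: on top of upload/delete/reprocess, it takes on a public feedback endpoint and the responsibility for validating it.

A separate Feedback API would isolate feedback traffic and deployments, but it would have to reimplement authentication/authorization and `req_id` validation or integrate with the Core API/Metadata DB, which adds duplication and operational complexity.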
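To make the duplication cost concrete: whichever side receives feedback needs something like the following check against the stored search-response snapshot. This is a minimal sketch, assuming a snapshot keyed by `req_id` with a user id, a response timestamp and an invalidation flag; `SearchSnapshot`, `validate_feedback`, the rejection reasons and the 24-hour `FEEDBACK_WINDOW` are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

FEEDBACK_WINDOW = timedelta(hours=24)  # hypothetical window length


@dataclass
class SearchSnapshot:
    # Hypothetical shape of the stored search-response snapshot keyed by req_id.
    req_id: str
    user_id: str
    responded_at: datetime
    invalidated: bool


def validate_feedback(snapshot: Optional[SearchSnapshot], req_id: str,
                      user_id: str, now: datetime) -> Optional[str]:
    """Return None if the feedback may be published, else a rejection reason."""
    if snapshot is None or snapshot.req_id != req_id:
        return "unknown_req_id"
    if snapshot.user_id != user_id:
        return "user_mismatch"
    if now - snapshot.responded_at > FEEDBACK_WINDOW:
        return "outside_time_window"
    if snapshot.invalidated:
        return "invalidated"
    return None


now = datetime(2026, 4, 17, 12, 0, tzinfo=timezone.utc)
snap = SearchSnapshot("req-1", "user-a", now - timedelta(hours=1), False)
print(validate_feedback(snap, "req-1", "user-a", now))  # None
print(validate_feedback(snap, "req-1", "user-b", now))  # user_mismatch
```

If this logic lives in the Core API, it reads the snapshot locally; a separate Feedback API would need the same function plus remote access to the snapshot data.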

f-lab-jesse · 2026-04-17T21:26:14Z

+- Transport: The Core API publishes to a broker, and FIP consumes asynchronously.
+- Routing surface: A dedicated message path is used for feedback. Physical queue/topic/exchange names are not fixed in this SPEC.
+- Producer / consumer responsibility:
+  - Producer (Core API): verifies the `req_id` snapshot, checks user/time window/invalidation, and publishes to the broker under the feedback-specific contract


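Question: How should we handle a very large volume of user feedback?

Scale instances out without limit?

Accept only as much as we can handle?

Anything else?

How does something like vector.dev, which ingests these kinds of logs, behave?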
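One common shape for "accept only what you can handle" is a bounded buffer with an explicit overflow policy, which is also how Vector exposes buffering: a buffer has a size limit (for example `max_events` for memory buffers) and a `when_full` setting of `block` or `drop_newest`. The class below is a plain-Python illustration of those two policies, not Vector's implementation; `BoundedBuffer`, `offer`, `drain` and the returned strings are hypothetical.

```python
from collections import deque


class BoundedBuffer:
    """Sketch of a bounded ingestion buffer with an overflow policy.

    Policy names mirror Vector's `when_full` values ("block", "drop_newest").
    """

    def __init__(self, max_events: int, when_full: str = "block"):
        if when_full not in ("block", "drop_newest"):
            raise ValueError(f"unknown policy: {when_full}")
        self.max_events = max_events
        self.when_full = when_full
        self.events = deque()
        self.dropped = 0  # counted so that any loss stays observable

    def offer(self, event) -> str:
        if len(self.events) < self.max_events:
            self.events.append(event)
            return "accepted"
        if self.when_full == "drop_newest":
            self.dropped += 1
            return "dropped"  # producer keeps going, this event is lost
        # "block": refuse so the upstream slows down and the event
        # stays in the broker's durable backlog instead of being lost.
        return "backpressure"

    def drain(self, n: int) -> list:
        """Downstream sink takes up to n events (e.g. one batch upload)."""
        batch = []
        while self.events and len(batch) < n:
            batch.append(self.events.popleft())
        return batch


buf = BoundedBuffer(max_events=2, when_full="block")
results = [buf.offer(i) for i in range(3)]
print(results)  # ['accepted', 'accepted', 'backpressure']
print(buf.drain(1))  # [0]
print(buf.offer(3))  # accepted
```

With `block`, a burst turns into broker backlog that is worked off later (scaling consumers or batching larger uploads shortens that backlog), which fits a loss-minimizing ingestion path; `drop_newest` bounds latency at the cost of losing events.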

f-lab-jesse · 2026-04-17T21:28:10Z

+
+
+#### External service contract (if applicable)
+| Dependency | Used for | Required behavior / assumption | Failure impact |


Question: But I think I suggested using an external service like vector.dev here..?

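The "external service contract" here is not about whether something is external infrastructure; it is the section for recording the contracts with other components that this component integrates with at runtime.

That said, "external service" can be misread as external infrastructure or SaaS, so I will rename the section to "External integration component contract".

Runtime technologies that make up FIP itself, such as Vector, are listed under 1.3 Technology stack.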

f-lab-jesse · 2026-04-17T21:29:51Z

+
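+### 2.4 Limitations and operational constraints
+- Performance / latency target:
+  - Because this is an asynchronous ingestion path rather than a synchronous user response, minimizing loss and being able to recover from backlog take priority over low per-event latency.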


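Question: Why prioritize "minimizing loss" and "backlog recoverability" for user feedback?

As long as ingestion does not fall so far behind that it disrupts the batch processing cycle, I judged that preserving validated feedback events without loss matters more than low per-event latency.

That is because lost feedback can degrade the quality of the training dataset and make it harder to trace model performance back to its data later.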
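The trade-off described above (spend latency, not events) can be illustrated with a retry loop that backs off exponentially and, when retries run out, preserves the event on a fallback path instead of dropping it. This is a conceptual sketch only: in the Vector-based design, sink retries and broker redelivery play this role. `persist_with_retry`, `TransientWriteError`, the `write`/`fallback` callables and the default delays are hypothetical.

```python
import time
from typing import Callable


class TransientWriteError(Exception):
    """Hypothetical error a sink raises for retryable failures."""


def persist_with_retry(event: dict,
                       write: Callable[[dict], None],
                       fallback: Callable[[dict], None],
                       max_attempts: int = 5,
                       base_delay: float = 0.5,
                       sleep: Callable[[float], None] = time.sleep) -> str:
    """Persist `event`, never silently dropping it. Returns where it ended up."""
    for attempt in range(max_attempts):
        try:
            write(event)
            return "raw"
        except TransientWriteError:
            if attempt < max_attempts - 1:
                # Exponential backoff: accept extra latency to avoid loss.
                sleep(base_delay * (2 ** attempt))
    fallback(event)  # retries exhausted: preserve rather than drop
    return "fallback"


calls, delays, preserved = [], [], []


def flaky_write(event: dict) -> None:
    calls.append(event)
    if len(calls) < 3:  # fail twice, then succeed
        raise TransientWriteError("sink unavailable")


result = persist_with_retry({"event_id": "e1"}, flaky_write,
                            preserved.append, sleep=delays.append)
print(result, delays)  # raw [0.5, 1.0]
```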

f-lab-jesse · 2026-04-17T21:33:33Z

+
+### Simple Flow
+1. Read new events from the raw feedback log and create a training dataset version.
+2. If there is no active run, start one immediately; if there is, keep only a single next run based on the latest dataset.


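Question: If validation takes a long time, wouldn't it be better to build training dataset versions ahead of time while validation is running? Is there a need to "strictly" stick to one at a time?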
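One way to read the flow above that is compatible with this question: dataset versions can be built at any time, and only run *execution* is single-flight, with at most one pending run that always points at the newest dataset. A minimal sketch under that assumption; `RunScheduler`, `submit`, `finish_active` and the `ds-*` version labels are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RunScheduler:
    """One active run, at most one pending run (always the latest dataset)."""
    active: Optional[str] = None   # dataset version currently running
    pending: Optional[str] = None  # newest dataset version waiting to run
    superseded: List[str] = field(default_factory=list)

    def submit(self, dataset_version: str) -> None:
        # Called whenever a new dataset version has been built,
        # even while a long validation is still running.
        if self.active is None:
            self.active = dataset_version  # start immediately
            return
        if self.pending is not None:
            self.superseded.append(self.pending)  # older pending is skipped
        self.pending = dataset_version  # keep only the newest

    def finish_active(self) -> None:
        # When the active run ends, promote the pending one (if any).
        self.active, self.pending = self.pending, None


s = RunScheduler()
for v in ["ds-1", "ds-2", "ds-3"]:
    s.submit(v)
print(s.active, s.pending, s.superseded)  # ds-1 ds-3 ['ds-2']
s.finish_active()
print(s.active, s.pending)  # ds-3 None
```

Under this reading, preparing versions early costs nothing extra: superseded versions stay available as artifacts, but only the latest one is trained next.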

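1. External service contract -> External integration component contract: renamed the section
2. Reorganized around a Vector configuration-based collection/retry contract
   - Describe FIP as a Vector-based ingestion pipeline rather than a custom consumer
   - Delegate broker reads, the Object Storage sink, and buffering/retry to Vector source/sink configuration and broker redelivery
   - Route unsupported schemas and events missing required fields to the error_logs/ sink
   - Tone down wording such as "consume/ack/nack" that implies a hand-written consumer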
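In the Vector-based design this routing would live in configuration (for example a `route` transform), but the decision itself can be illustrated in plain Python. A minimal sketch, assuming each decoded event carries a `schema_version` field; the supported-version set, the required field list and the `raw/` prefix are hypothetical, while `error_logs/`, `unsupported_schema_version` and `malformed_feedback_event` come from the spec changes.

```python
from typing import Optional, Tuple

SUPPORTED_SCHEMA_VERSIONS = {"1"}  # hypothetical
REQUIRED_FIELDS = ("event_id", "req_id", "user_id", "feedback_type")  # hypothetical


def route(event: dict) -> Tuple[str, Optional[str]]:
    """Return (destination prefix, error reason) for a decoded feedback event."""
    if event.get("schema_version") not in SUPPORTED_SCHEMA_VERSIONS:
        return "error_logs/", "unsupported_schema_version"
    # Missing or empty required fields both count as malformed here.
    if any(not event.get(f) for f in REQUIRED_FIELDS):
        return "error_logs/", "malformed_feedback_event"
    return "raw/", None


ok = {"schema_version": "1", "event_id": "e1", "req_id": "r1",
      "user_id": "u1", "feedback_type": "click"}
print(route(ok))  # ('raw/', None)
print(route({"schema_version": "2"}))  # ('error_logs/', 'unsupported_schema_version')
print(route({"schema_version": "1", "event_id": "e1"}))  # ('error_logs/', 'malformed_feedback_event')
```

Checking the schema version first keeps the two reasons distinct: an event from a newer producer is reported as a version problem, not as malformed data.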

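- Specify the PostgreSQL MQ queue contract dedicated to feedback events.
- Clarify `event_id`-based idempotency and where curation dataset dedupe is done.
- Specify the completion criterion for raw log persistence, the preservation path when retries are exhausted, and whether the Vector configuration is under version control.
- Split the broad `invalid message` state into `unsupported_schema_version` and `malformed_feedback_event`.
- Add service level contract criteria for exposing raw log ingestion delay, failure rate, and backlog/ingestion lag.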
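With at-least-once delivery (broker redelivery plus sink retries), the raw log can contain the same `event_id` more than once, so the curation step has to deduplicate. A minimal sketch, assuming events are dicts with an `event_id` key and that the first occurrence wins; `dedupe_by_event_id` is a hypothetical helper, not the spec's defined interface.

```python
from typing import Iterable, List


def dedupe_by_event_id(events: Iterable[dict]) -> List[dict]:
    """Keep the first occurrence of each event_id, preserving order."""
    seen = set()
    unique = []
    for event in events:
        if event["event_id"] in seen:
            continue  # redelivered copy of an event already kept
        seen.add(event["event_id"])
        unique.append(event)
    return unique


raw = [{"event_id": "e1"}, {"event_id": "e2"}, {"event_id": "e1"}]
print([e["event_id"] for e in dedupe_by_event_id(raw)])  # ['e1', 'e2']
```

Keeping the raw log append-only and deduplicating downstream lets ingestion stay simple and loss-averse, while the training dataset still counts each feedback event once.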

sonarqubecloud · 2026-04-19T11:19:01Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

baekyutae added 9 commits April 12, 2026 12:40

docs(admin-ops): update documents to the latest version

dd038eb

docs: reorganize the search service spec/plan document folders

09f45d2

docs(admin-ops): write spec.md

9b065da


docs(admin-ops): write per-component specs for admin features

2394b89

docs: update other documents

a06f661

docs: apply changes from the system design and diagram branches

227b0c1

docs: move folder locations

c25da2a

docs: revise spec and plan templates

a66e880

docs(admin-ops): write per-component specs for admin features

542236d

baekyutae added this to the Admin Service - Project Design - Phase 1 milestone Apr 13, 2026

baekyutae requested a review from f-lab-jesse April 13, 2026 12:57

baekyutae self-assigned this Apr 13, 2026

baekyutae changed the title ~~Docs/55 admin ops spec~~ docs(admin-ops) spec write[#55] Apr 13, 2026

baekyutae added 4 commits April 15, 2026 11:48

docs(diagram): revise rollback policy

59284fb

docs(system-design): revise rollback policy

0b1d150

docs(spec): write Model_Release_and_Reindex_Spec

9da8827


docs(spec): fix wording

a1c8f16


f-lab-jesse reviewed Apr 17, 2026


baekyutae added 3 commits April 19, 2026 16:31

docs(spec): specify responsibility for MLPipelineRun state transitions

fdfe006

baekyutae wants to merge 17 commits into docs/admin-ops from docs/55-admin-ops-spec
