[BUG] FEMU crashed on FDP mode in victim_ru_get_pri(a=0x0)

**Describe the bug**
Hello! I'm testing YCSB on RocksDB on an emulated FDP SSD. FEMU crashed without any error message during the KV pair loading phase. The workload is not too heavy: it had filled only about half of the capacity after roughly 4 minutes. The FDP-Trace log repeatedly printed `GC_BACK_RESERT triggered but delay GC (ru 21 ipc 7168 threshold 8192 full 65536)`, and then FEMU crashed.

**Environment**
- Host OS: Debian 13
- Kernel version: Linux 6.12.57+deb13-amd64
- FEMU version/commit: eb01bb73b85deb4069dcbdd8490c3cab3c977fe9
- FEMU mode: FDP-mode
- GuestOS Image: Ubuntu 25.10 qcow2 image
- Guest Kernel: Linux 6.17.0-29-generic
- FDP configuration:
    - `fdp=on`
    - `fdp.nruh=8`
    - `fdp.nrg=1`
    - `fdp.nru=141`
    - `fdp.ru.size=256MB`
    - `device.size=32G`

**To Reproduce**
Steps to reproduce the behavior:
1. Compile FEMU from commit eb01bb73b85deb4069dcbdd8490c3cab3c977fe9
2. Add the following configuration to run-blackbox-fdp-sh:
```
secsz=512         # sector size in bytes
secs_per_pg=8     # number of sectors in a flash page
pgs_per_blk=1024   # number of pages per flash block
blks_per_pl=141   # number of blocks per plane
pls_per_lun=1     # keep it at one, no multiplanes support
luns_per_ch=8     # number of chips per channel
nchs=8            # number of channels
ssd_size=32768    # in megabytes, consider 25% overprovisioning

# Latency in nanoseconds
pg_rd_lat=40000   # page read latency
pg_wr_lat=200000  # page write latency
blk_er_lat=2000000 # block erase latency
ch_xfer_lat=0     # channel transfer time

# GC Threshold (1-100)
gc_thres_pcent=75
gc_thres_pcent_high=95

# FDP Configuration
fdp_nruh=8        # number of reclaim unit handles
fdp_nrg=1         # number of reclaim groups
fdp_nru=$blks_per_pl  # total number of reclaim units (= blks_per_pl)
```
3. Compile YCSB-CPP from https://github.com/ls4154/YCSB-cpp and TorFS from https://github.com/SamsungDS/TorFS with RocksDB 9.3.1
4. Create an XFS file system on /dev/nvme0n1 and mount it on /mnt/fdp
5. Run YCSB workloada with 15M records and 30M operations, with the DB path pointing to the FDP SSD mount point. FEMU crashes during the KV loading phase.

**Expected behavior**
The benchmark should finish gracefully.

**Error logs**
First, FEMU allocates RUs for RUH0 without any problem:
```
[FEMU] FDP-Trace: RU_ROTATE ruhid=0(curr_ru 128) old_ru=128 new_ru=136 reason=full_victim victim_ru_cnt 1
[FEMU] FDP-Trace: RU_ROTATE ruhid=0(curr_ru 136) old_ru=136 new_ru=137 reason=full_victim victim_ru_cnt 2
[FEMU] FDP-Trace: RU_ROTATE ruhid=0(curr_ru 137) old_ru=137 new_ru=138 reason=full_victim victim_ru_cnt 3
[FEMU] FDP-Trace: RU_ROTATE ruhid=0(curr_ru 138) old_ru=138 new_ru=139 reason=full_victim victim_ru_cnt 4
[FEMU] FDP-Trace: RU_ROTATE ruhid=0(curr_ru 139) old_ru=139 new_ru=140 reason=full_victim victim_ru_cnt 5
[FEMU] FDP-Trace: RU_ROTATE ruhid=0(curr_ru 140) old_ru=140 new_ru=0 reason=full_victim victim_ru_cnt 6
[FEMU] FDP-Trace: RU_ROTATE ruhid=0(curr_ru 0) old_ru=0 new_ru=1 reason=full_victim victim_ru_cnt 7
[FEMU] FDP-Trace: RU_ROTATE ruhid=0(curr_ru 1) old_ru=1 new_ru=2 reason=full_victim victim_ru_cnt 8
[FEMU] FDP-Trace: RU_ROTATE ruhid=0(curr_ru 2) old_ru=2 new_ru=3 reason=full_victim victim_ru_cnt 9
[FEMU] FDP-Trace: RU_ROTATE ruhid=0(curr_ru 3) old_ru=3 new_ru=4 reason=full_victim victim_ru_cnt 10
[FEMU] FDP-Trace: RU_ROTATE ruhid=0(curr_ru 4) old_ru=4 new_ru=5 reason=full_victim victim_ru_cnt 11
[FEMU] FDP-Trace: RU_ROTATE ruhid=0(curr_ru 5) old_ru=5 new_ru=6 reason=full_victim victim_ru_cnt 12
[FEMU] FDP-Trace: RU_ROTATE ruhid=0(curr_ru 6) old_ru=6 new_ru=7 reason=full_victim victim_ru_cnt 13
[FEMU] FDP-Trace: RU_ROTATE ruhid=0(curr_ru 7) old_ru=7 new_ru=8 reason=full_victim victim_ru_cnt 14
[FEMU] FDP-Trace: RU_ROTATE ruhid=0(curr_ru 8) old_ru=8 new_ru=9 reason=full_victim victim_ru_cnt 15

```

Then FEMU starts GC like this:

```
[FEMU] FDP-Trace: GC_START rgid=0 ruhid=0 victim_ru=136 victim_vpc=3 isolation=PI gc_type=BACK
[FEMU] FDP-Trace: GC_DONE victim_ru=136 pages_migrated=3 blocks_erased=64 mbmw_delta=12288 mbe_delta=268435456
[FEMU] FDP-Trace: GC_START rgid=0 ruhid=0 victim_ru=12 victim_vpc=3 isolation=PI gc_type=BACK
[FEMU] FDP-Trace: GC_DONE victim_ru=12 pages_migrated=3 blocks_erased=64 mbmw_delta=12288 mbe_delta=268435456
[FEMU] FDP-Trace: RU_ROTATE ruhid=0(curr_ru 92) old_ru=92 new_ru=94 reason=full_victim victim_ru_cnt 97
[FEMU] FDP-Trace: GC_START rgid=0 ruhid=0 victim_ru=7 victim_vpc=4 isolation=PI gc_type=BACK
[FEMU] FDP-Trace: GC_DONE victim_ru=7 pages_migrated=4 blocks_erased=64 mbmw_delta=16384 mbe_delta=268435456
[FEMU] FDP-Trace: RU_ROTATE ruhid=0(curr_ru 94) old_ru=94 new_ru=95 reason=full_victim victim_ru_cnt 97
[FEMU] FDP-Trace: GC_START rgid=0 ruhid=0 victim_ru=24 victim_vpc=4 isolation=PI gc_type=BACK
[FEMU] FDP-Trace: GC_DONE victim_ru=24 pages_migrated=4 blocks_erased=64 mbmw_delta=16384 mbe_delta=268435456
[FEMU] FDP-Trace: RU_ROTATE ruhid=0(curr_ru 95) old_ru=95 new_ru=96 reason=full_victim victim_ru_cnt 97
[FEMU] FDP-Trace: GC_START rgid=0 ruhid=0 victim_ru=6 victim_vpc=5 isolation=PI gc_type=BACK
[FEMU] FDP-Trace: GC_DONE victim_ru=6 pages_migrated=5 blocks_erased=64 mbmw_delta=20480 mbe_delta=268435456
[FEMU] FDP-Trace: RU_ROTATE ruhid=0(curr_ru 96) old_ru=96 new_ru=97 reason=full_victim victim_ru_cnt 97
[FEMU] FDP-Trace: GC_START rgid=0 ruhid=0 victim_ru=22 victim_vpc=5 isolation=PI gc_type=BACK
[FEMU] FDP-Trace: GC_DONE victim_ru=22 pages_migrated=5 blocks_erased=64 mbmw_delta=20480 mbe_delta=268435456
[FEMU] FDP-Trace: RU_ROTATE ruhid=0(curr_ru 97) old_ru=97 new_ru=98 reason=full_victim victim_ru_cnt 97
[FEMU] FDP-Trace: GC_START rgid=0 ruhid=0 victim_ru=9 victim_vpc=6 isolation=PI gc_type=BACK
[FEMU] FDP-Trace: GC_DONE victim_ru=9 pages_migrated=6 blocks_erased=64 mbmw_delta=24576 mbe_delta=268435456
[FEMU] FDP-Trace: RU_ROTATE ruhid=0(curr_ru 98) old_ru=98 new_ru=99 reason=full_victim victim_ru_cnt 97

```
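The counters in the `GC_DONE` lines above can be cross-checked against the geometry from step 2. The following is only a sketch: it assumes that one reclaim unit holds one block from every LUN (which would explain `blocks_erased=64`), and all helper names are hypothetical rather than taken from FEMU.

```c
#include <assert.h>
#include <stdint.h>

/* Geometry values copied from the configuration in step 2. */
enum {
    SECSZ       = 512,   /* sector size in bytes */
    SECS_PER_PG = 8,     /* sectors per flash page */
    PGS_PER_BLK = 1024,  /* pages per flash block */
    LUNS_PER_CH = 8,     /* chips per channel */
    NCHS        = 8      /* channels */
};

/* Bytes in one flash page. */
static uint64_t page_bytes(void) { return (uint64_t)SECSZ * SECS_PER_PG; }

/* Bytes in one flash block. */
static uint64_t block_bytes(void) { return page_bytes() * PGS_PER_BLK; }

/* ASSUMPTION: one reclaim unit holds one block from every LUN. */
static uint64_t blocks_per_ru(void) { return (uint64_t)LUNS_PER_CH * NCHS; }

/* Bytes and pages in one reclaim unit under that assumption. */
static uint64_t ru_bytes(void) { return blocks_per_ru() * block_bytes(); }
static uint64_t ru_pages(void) { return blocks_per_ru() * PGS_PER_BLK; }

/* Bytes rewritten when GC migrates `pages` valid pages. */
static uint64_t migrated_bytes(uint64_t pages) { return pages * page_bytes(); }

/* Compare the derived values with the numbers printed in the trace. */
static void check_gc_done_counters(void)
{
    assert(blocks_per_ru() == 64);          /* blocks_erased=64 */
    assert(ru_bytes() == 268435456ULL);     /* mbe_delta, also fdp.ru.size=256MB */
    assert(ru_pages() == 65536);            /* "full 65536" in the delay lines */
    assert(migrated_bytes(3) == 12288);     /* pages_migrated=3 */
    assert(migrated_bytes(6) == 24576);     /* pages_migrated=6 */
}
```

Calling `check_gc_done_counters()` passes silently, so the logged counters look consistent with the configured geometry and the accounting itself does not appear to be off.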
After a few minutes, GC of some victim RUs is delayed because the invalid page count (`ipc`) in those RUs has not reached the threshold:

```
[FEMU] FDP-Trace: GC_BACK_RESERT triggered but delay GC (ru 87 ipc 6823 threshold 8192 full 65536)
[FEMU] FDP-Trace: GC_BACK_RESERT triggered but delay GC (ru 50 ipc 6616 threshold 8192 full 65536)
[FEMU] FDP-Trace: GC_BACK_RESERT triggered but delay GC (ru 47 ipc 6240 threshold 8192 full 65536)
[FEMU] FDP-Trace: GC_BACK_RESERT triggered but delay GC (ru 75 ipc 5955 threshold 8192 full 65536)
[FEMU] FDP-Trace: GC_BACK_RESERT triggered but delay GC (ru 47 ipc 6240 threshold 8192 full 65536)
[FEMU] FDP-Trace: GC_BACK_RESERT triggered but delay GC (ru 75 ipc 5955 threshold 8192 full 65536)
[FEMU] FDP-Trace: GC_BACK_RESERT triggered but delay GC (ru 47 ipc 6240 threshold 8192 full 65536)
[FEMU] FDP-Trace: GC_BACK_RESERT triggered but delay GC (ru 75 ipc 5955 threshold 8192 full 65536)
[FEMU] FDP-Trace: GC_BACK_RESERT triggered but delay GC (ru 47 ipc 6240 threshold 8192 full 65536)
[FEMU] FDP-Trace: GC_BACK_RESERT triggered but delay GC (ru 75 ipc 5955 threshold 8192 full 65536)
[FEMU] FDP-Trace: GC_BACK_RESERT triggered but delay GC (ru 47 ipc 6240 threshold 8192 full 65536)
[FEMU] FDP-Trace: GC_BACK_RESERT triggered but delay GC (ru 75 ipc 5955 threshold 8192 full 65536)
[FEMU] FDP-Trace: GC_BACK_RESERT triggered but delay GC (ru 47 ipc 6240 threshold 8192 full 65536)
[FEMU] FDP-Trace: GC_BACK_RESERT triggered but delay GC (ru 75 ipc 5955 threshold 8192 full 65536)
[FEMU] FDP-Trace: GC_BACK_RESERT triggered but delay GC (ru 47 ipc 6240 threshold 8192 full 65536)
[FEMU] FDP-Trace: GC_BACK_RESERT triggered but delay GC (ru 75 ipc 5955 threshold 8192 full 65536)
[FEMU] FDP-Trace: GC_BACK_RESERT triggered but delay GC (ru 47 ipc 6240 threshold 8192 full 65536)
[FEMU] FDP-Trace: GC_BACK_RESERT triggered but delay GC (ru 75 ipc 5955 threshold 8192 full 65536)
```
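To make the delay condition concrete, here is a small sketch that replays the `ipc` values from the lines above. It assumes GC is postponed while `ipc` is strictly below `threshold`; whether FEMU uses `<` or `<=` is not visible in the log, and the names below are hypothetical. Note that the logged threshold 8192 is exactly 1/8 of the logged `full` value 65536.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One "delay GC" trace line: RU id and its invalid page count (ipc). */
typedef struct {
    uint32_t ru;
    uint32_t ipc;
} DelaySample;

/* Values copied from the trace lines above. */
static const DelaySample samples[] = {
    {87, 6823}, {50, 6616}, {47, 6240}, {75, 5955},
};

/* ASSUMPTION: GC is postponed while ipc is strictly below the threshold. */
static bool gc_is_delayed(uint32_t ipc, uint32_t threshold, uint32_t full)
{
    assert(threshold <= full);  /* sanity check on the inputs */
    return ipc < threshold;
}

/* Count how many of the sampled RUs would be postponed. */
static unsigned count_delayed(uint32_t threshold, uint32_t full)
{
    unsigned n = 0;
    for (size_t i = 0; i < sizeof(samples) / sizeof(samples[0]); i++) {
        if (gc_is_delayed(samples[i].ipc, threshold, full)) {
            n++;
        }
    }
    return n;
}
```

With `threshold = 8192` every sampled RU is postponed, which matches the trace: none of the candidates qualifies, so the same RUs (47 and 75) are reported again and again.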

Then FEMU crashed without any error message.

GDB points to `victim_ru_get_pri` and reports that `qemu-system-x86` received `SIGSEGV`:
```
Thread 37 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.

0x0000555555ac3a58 in victim_ru_get_pri (a=0x0)
    at ../hw/femu/bbssd/ftl.c:152

152         return ((FemuReclaimUnit *)a)->vpc;
```
I also captured a backtrace with GDB:

```
#0  0x0000555555ac3a58 in victim_ru_get_pri (a=0x0) at ../hw/femu/bbssd/ftl.c:152
#1  0x0000555555acdc2b in percolate_down (q=0x55557bf60a90, i=103) at ../hw/femu/lib/pqueue.c:109
        child_node = 0
        moving_node = 0x0
        moving_pri = 16109300872511081216
#2  0x0000555555acded3 in pqueue_change_priority (q=0x55557bf60a90, new_pri=65534, d=0x55557bf609a0) at ../hw/femu/lib/pqueue.c:157
        posn = 103
        old_pri = 65534
#3  0x0000555555ac6b33 in mark_page_invalid_fdp (ssd=0x555558aef920, ppa=0x7ff34b2fb6f8) at ../hw/femu/bbssd/ftl.c:1378
        spp = 0x555558aef928
        blk = 0x55556790e050
        pg = 0x5555681be9e0
        line = 0x55557bf5c740
        ru = 0x55557bf609a0
        rm = 0x555558450ac0
        was_full_ru = false
#4  0x0000555555ac8a99 in ssd_stream_write (n=0x555558a50840, ssd=0x555558aef920, req=0x7ffff4080ef0) at ../hw/femu/bbssd/ftl.c:2025
        ret = 0x55557bf60380
        swr = {
          type = 0,
          cmd = 1,
          stime = 12375027858264394
        }
        ns = 0x555558a59ab0
        spp = 0x555558aef928
        rg = 0x55557bf5cc80
        ruh = 0x55557bf64040
        ru = 0x55557bf60380
        lba = 42968992
        len = 8192
        start_lpn = 5371124
        end_lpn = 5372147
        ppa = {
          {
            g = {
              blk = 139,
              pg = 9,
              sec = 0,
              pl = 0,
              lun = 3,
              ch = 3,
              rsv = 0
            },
            ppa = 217017207044505739
          }
        }
        lpn = 5371921
        curlat = 43661807
        maxlat = 43661807
        r = 0
        pid = 0
        dtype = 0 '\000'
        ph = 0
        rgid = 0
        ruhid = 0
#5  0x0000555555ac8ebd in nvme_do_write_fdp (n=0x555558a50840, req=0x7ffff4080ef0, slba=42968992, nlb=8192) at ../hw/femu/bbssd/ftl.c:2097
        ns = 0x555558a59ab0
        ssd = 0x555558aef920
        spp = 0x555558aef928
        data_bytes = 4194304
        pid = 0
        dtype = 0 '\000'
        ph = 0
        rg = 0
        ruhid = 0
#6  0x0000555555aca924 in ftl_thread (arg=0x555558a50840) at ../hw/femu/bbssd/ftl.c:2467
        n = 0x555558a50840
        ssd = 0x555558aef920
        req = 0x7ffff4080ef0
        lat = 0
        rc = 1
        i = 1
#7  0x000055555616996e in qemu_thread_start (args=0x55557bf691f0) at ../util/qemu-thread-posix.c:393
        __cancel_buf = {
          __cancel_jmp_buf = {{
              __cancel_jmp_buf = {140682915204316, -5503248113342606125, 32, 0, 140737488343216, 140682906812416, -5503248113319537453, -1806873028232342317},
              __mask_was_saved = 0
            }},
          __pad = {0x7ff34b2fb970, 0x0, 0x0, 0x0}
        }
        __cancel_routine = 0x5555561697f2 <qemu_thread_atexit_notify>
        __cancel_arg = 0x0
        __not_first_call = 0
        qemu_thread_args = 0x55557bf691f0
        start_routine = 0x555555aca77e <ftl_thread>
        arg = 0x555558a50840
        r = 0x0
#8  0x00007ffff7702b7b in ??? () at /lib/x86_64-linux-gnu/libc.so.6
#9  0x00007ffff77807b8 in ??? () at /lib/x86_64-linux-gnu/libc.so.6
```
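One possible reading of the backtrace: frame #2 looks up position 103 for the RU at `0x55557bf609a0`, but frame #1 finds `moving_node = 0x0` in slot 103, so the position stored in the RU may be stale (for example, the RU is no longer in the victim queue). The sketch below illustrates a defensive check for that situation. It is not FEMU code: `ToyUnit`, `ToyQueue`, and both functions are hypothetical stand-ins, and the 1-based slot layout is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; NOT FEMU's FemuReclaimUnit or pqueue_t. */
typedef struct {
    unsigned vpc;  /* valid page count, used as the heap priority */
    size_t pos;    /* slot the unit believes it occupies */
} ToyUnit;

typedef struct {
    ToyUnit **d;   /* 1-based slot array; unused slots are NULL */
    size_t size;   /* one past the last used slot */
} ToyQueue;

/* True only when `u` really sits in the slot recorded in u->pos. */
static bool toy_queue_holds(const ToyQueue *q, const ToyUnit *u)
{
    return q != NULL && u != NULL &&
           u->pos >= 1 && u->pos < q->size &&
           q->d[u->pos] == u;
}

/*
 * Update the priority only if the unit is verifiably queued.
 * Returns false (and changes nothing) when the stored position is stale.
 */
static bool toy_update_vpc(ToyQueue *q, ToyUnit *u, unsigned new_vpc)
{
    assert(q != NULL && u != NULL);
    if (!toy_queue_holds(q, u)) {
        return false;  /* stale position: sifting would read a NULL slot */
    }
    u->vpc = new_vpc;
    /* A real heap would re-sift slot u->pos here. */
    return true;
}
```

A unit whose `pos` is 103 while the queue has only one used slot is rejected by `toy_update_vpc` instead of being dereferenced. A guard like this would only turn the crash into a detectable error; the underlying question of why the position and the queue contents disagree would still need to be answered.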
**Additional context**
I can complete this experiment on the EA version of WARP with the same FEMU configuration and an identical image.
However, even in the EA version of WARP, if the RU usage exceeds `gc_thres_pcent_high`, some GC attempts report `[FEMU] FTL-Err:  unable to find victim RU, gc skip`, and then WARP also crashes silently.

The full FEMU log and the GDB backtrace are attached:

[femu-segv-victim-ru.log](https://github.com/user-attachments/files/28598250/femu-segv-victim-ru.log)

[femu-fdp-2026-06-04-083835.log](https://github.com/user-attachments/files/28598268/femu-fdp-2026-06-04-083835.log)
