diff --git a/docs/en/14-reference/03-taos-sql/22-function.md b/docs/en/14-reference/03-taos-sql/22-function.md
index 335b3b2f6873..0cd239d03ade 100644
--- a/docs/en/14-reference/03-taos-sql/22-function.md
+++ b/docs/en/14-reference/03-taos-sql/22-function.md
@@ -865,6 +865,49 @@ LTRIM(expr)
 
 **Applicable to**: Tables and supertables.
 
+#### REGEXP_EXTRACT
+
+```sql
+REGEXP_EXTRACT(expr, pattern [, group_idx])
+```
+
+**Function Description**: Applies the POSIX extended regular expression `pattern` to `expr` and returns the substring captured by group `group_idx` of the first (leftmost) match. Returns NULL when there is no match or when `expr` or `pattern` is NULL.
+
+**Return Type**: Same as `expr` (VARCHAR or NCHAR).
+
+**Applicable Data Types**: `expr`: VARCHAR, NCHAR. `pattern`: VARCHAR, NCHAR. `group_idx`: integer.
+
+**Nested Subquery Support**: Applicable to both inner and outer queries.
+
+**Applicable to**: Tables and supertables.
+
+**Usage**:
+
+- If omitted, `group_idx` defaults to `1`.
+- If provided as a non-`NULL` value, `group_idx` must be a non-negative integer constant. `0` returns the entire match; `1` returns the first capture group, `2` the second, and so on. The maximum value is 512; a value outside the range 0 to 512 is rejected with an error.
+- If `group_idx` is SQL `NULL`, the function returns `NULL`.
+- Returns NULL if `group_idx` exceeds the number of capture groups in `pattern`, or if the addressed group did not participate in the match.
+- `pattern` must be provided as a constant literal or parameter placeholder; it cannot reference a column or be computed from other expressions.
+
+**Example**:
+
+```sql
+taos> SELECT REGEXP_EXTRACT('2026-04-22', '([0-9]{4})-([0-9]{2})-([0-9]{2})', 1);
+ regexp_extract('2026-04-22', '([0-9]{4})-([0-9]{2})-([0-9]{2})', 1) |
+=======================================================================
+ 2026                                                                  |
+
+taos> SELECT REGEXP_EXTRACT('2026-04-22', '([0-9]{4})-([0-9]{2})-([0-9]{2})', 0);
+ regexp_extract('2026-04-22', '([0-9]{4})-([0-9]{2})-([0-9]{2})', 0) |
+=======================================================================
+ 2026-04-22                                                            |
+
+taos> SELECT REGEXP_EXTRACT('no-digits-here', '[0-9]+', 1);
+ regexp_extract('no-digits-here', '[0-9]+', 1) |
+===============================================
+ NULL                                          |
+```
+
 #### REGEXP_IN_SET
 
 ```sql
diff --git a/docs/zh/14-reference/03-taos-sql/22-function.md b/docs/zh/14-reference/03-taos-sql/22-function.md
index 83872b2af85a..1fd1ccb7dc5d 100644
--- a/docs/zh/14-reference/03-taos-sql/22-function.md
+++ b/docs/zh/14-reference/03-taos-sql/22-function.md
@@ -1044,6 +1044,47 @@ taos> select position('d' in 'cba');
                       0 |
 ```
 
+#### REGEXP_EXTRACT
+
+```sql
+REGEXP_EXTRACT(expr, pattern [, group_idx])
+```
+
+**功能说明**：对 `expr` 应用 POSIX 扩展正则表达式 `pattern`，返回第 `group_idx` 个捕获组匹配的子串。无匹配、`expr` 或 `pattern` 为 NULL 时返回 NULL。
+
+**返回结果类型**：与 `expr` 相同（VARCHAR 或 NCHAR）。
+
+**适用数据类型**：`expr`：VARCHAR、NCHAR；`pattern`：VARCHAR、NCHAR。
+
+**嵌套子查询支持**：适用于内层查询和外层查询。
+
+**适用于**：表和超级表。
+
+**使用说明**：
+
+- `group_idx` 通常为非负整数常量，默认为 `1`。`0` 返回整个匹配串，`1` 返回第一个捕获组，`2` 返回第二个，以此类推，最大值为 512。若 `group_idx` 为 SQL `NULL`，则返回 `NULL`。
+- 若 `group_idx` 超过 `pattern` 中的捕获组数量，或对应捕获组未参与匹配，返回 NULL。
+- `pattern` 必须为常量（字面量或预处理占位符），不可引用列；不支持 `concat('a','b')` 这类常量表达式。
+
+**举例**：
+
+```sql
+taos> SELECT REGEXP_EXTRACT('2026-04-22', '([0-9]{4})-([0-9]{2})-([0-9]{2})', 1);
+ regexp_extract('2026-04-22', '([0-9]{4})-([0-9]{2})-([0-9]{2})', 1) |
+=======================================================================
+ 2026                                                                  |
+
+taos> SELECT REGEXP_EXTRACT('2026-04-22', '([0-9]{4})-([0-9]{2})-([0-9]{2})', 0);
+ regexp_extract('2026-04-22', '([0-9]{4})-([0-9]{2})-([0-9]{2})', 0) |
+=======================================================================
+ 2026-04-22                                                            |
+
+taos> SELECT REGEXP_EXTRACT('no-digits-here', '[0-9]+', 1);
+ regexp_extract('no-digits-here', '[0-9]+', 1) |
+===============================================
+ NULL                                          |
+```
+
 #### REGEXP_IN_SET
 
 ```sql
@@ -2369,11 +2410,11 @@ LAG(expr, offset[, default_val])
 **使用说明**：
 
 - `offset` 必须为大于 0 的整数。
-- `default_val` 可选；当目标行不存在时返回该值，未指定时返回 `NULL`。
-- `default_val` 需要与 `expr` 类型兼容。
-- `LAG` 按输入结果集的行序计算；可以结合 `ORDER BY` 改变计算顺序。
-- 支持与 `_rowts`、`tbname`、标签列等一起查询，也支持在子查询和 `PARTITION BY` 场景中使用。
-- 与窗口一起使用时，`LAG` 仅在当前窗口内部按窗口内结果顺序计算，不会跨窗口继承上一窗口的状态。
+- `default_val` 可选；当目标行不存在时返回该值，未指定时返回 `NULL`。
+- `default_val` 需要与 `expr` 类型兼容。
+- `LAG` 按输入结果集的行序计算；可以结合 `ORDER BY` 改变计算顺序。
+- 支持与 `_rowts`、`tbname`、标签列等一起查询，也支持在子查询和 `PARTITION BY` 场景中使用。
+- 与窗口一起使用时，`LAG` 仅在当前窗口内部按窗口内结果顺序计算，不会跨窗口继承上一窗口的状态。
 
 #### LEAD
 
@@ -2392,11 +2433,11 @@ LEAD(expr, offset[, default_val])
 **使用说明**：
 
 - `offset` 必须为大于 0 的整数。
-- `default_val` 可选；当目标行不存在时返回该值，未指定时返回 `NULL`。
-- `default_val` 需要与 `expr` 类型兼容。
-- `LEAD` 按输入结果集的行序计算；可以结合 `ORDER BY` 改变计算顺序。
-- 支持与 `_rowts`、`tbname`、标签列等一起查询，也支持在子查询和 `PARTITION BY` 场景中使用。
-- 与窗口一起使用时，`LEAD` 仅在当前窗口内部按窗口内结果顺序计算，不会跨窗口读取下一窗口的数据。
+- `default_val` 可选；当目标行不存在时返回该值，未指定时返回 `NULL`。
+- `default_val` 需要与 `expr` 类型兼容。
+- `LEAD` 按输入结果集的行序计算；可以结合 `ORDER BY` 改变计算顺序。
+- 支持与 `_rowts`、`tbname`、标签列等一起查询，也支持在子查询和 `PARTITION BY` 场景中使用。
+- 与窗口一起使用时，`LEAD` 仅在当前窗口内部按窗口内结果顺序计算，不会跨窗口读取下一窗口的数据。
 
 #### MAX
 
@@ -3155,11 +3196,11 @@ MAVG(expr, k)
 
 **适用于**：表和超级表。
 
-**使用说明**：
-
-- 不支持 +、-、*、/ 运算，如 mavg(col1, k1) + mavg(col2, k1);
-- 只能与普通列，选择（Selection）、投影（Projection）函数一起使用，不能与聚合（Aggregation）函数一起使用；
-- 与窗口一起使用时，`MAVG` 仅在当前窗口内部按样本顺序计算，不会跨窗口延续上一窗口的样本状态。
+**使用说明**：
+
+- 不支持 +、-、*、/ 运算，如 mavg(col1, k1) + mavg(col2, k1);
+- 只能与普通列，选择（Selection）、投影（Projection）函数一起使用，不能与聚合（Aggregation）函数一起使用；
+- 与窗口一起使用时，`MAVG` 仅在当前窗口内部按样本顺序计算，不会跨窗口延续上一窗口的样本状态。
 
 #### STATECOUNT
 
@@ -3182,9 +3223,9 @@ STATECOUNT(expr, oper, val)
 
 **适用于**：表和超级表。
 
-**使用说明**：
-
-- 与窗口一起使用时，`STATECOUNT` 仅统计当前窗口内部的连续记录，不会跨窗口累计。
+**使用说明**：
+
+- 与窗口一起使用时，`STATECOUNT` 仅统计当前窗口内部的连续记录，不会跨窗口累计。
 
 #### STATEDURATION
 
@@ -3208,9 +3249,9 @@ STATEDURATION(expr, oper, val, unit)
 
 **适用于**：表和超级表。
 
-**使用说明**：
-
-- 与窗口一起使用时，`STATEDURATION` 仅统计当前窗口内部满足条件的连续时长，不会跨窗口累计。
+**使用说明**：
+
+- 与窗口一起使用时，`STATEDURATION` 仅统计当前窗口内部满足条件的连续时长，不会跨窗口累计。
 
 ### 时间加权统计
 
diff --git a/include/libs/function/functionMgt.h b/include/libs/function/functionMgt.h
index 2b569a9e6ba6..7e4ae7b609dc 100644
--- a/include/libs/function/functionMgt.h
+++ b/include/libs/function/functionMgt.h
@@ -141,6 +141,7 @@ typedef enum EFunctionType {
   FUNCTION_TYPE_AES_DECRYPT,
   FUNCTION_TYPE_SM4_ENCRYPT,
   FUNCTION_TYPE_SM4_DECRYPT,
+  FUNCTION_TYPE_REGEXP_EXTRACT,
 
   // conversion function
   FUNCTION_TYPE_CAST = 2000,
diff --git a/include/libs/scalar/scalar.h b/include/libs/scalar/scalar.h
index 518b13da7b32..807351cc7d07 100644
--- a/include/libs/scalar/scalar.h
+++ b/include/libs/scalar/scalar.h
@@ -132,6 +132,11 @@ int32_t crc32Function(SScalarParam *pInput, int32_t inputNum, SScalarParam *pOut
 int32_t findInSetFunction(SScalarParam *pInput, int32_t inputNum, SScalarParam *pOutput);
 int32_t likeInSetFunction(SScalarParam *pInput, int32_t inputNum, SScalarParam *pOutput);
 int32_t regexpInSetFunction(SScalarParam *pInput, int32_t inputNum, SScalarParam *pOutput);
+int32_t regexpExtractFunction(SScalarParam *pInput, int32_t inputNum, SScalarParam *pOutput);
+
+// Maximum capture-group index accepted by regexp_extract() — shared between
+// translate-time validation (builtins.c) and runtime validation (sclfunc.c).
+#define REGEXP_EXTRACT_MAX_GROUP_IDX 512
 int32_t generateTotpSecretFunction(SScalarParam *pInput, int32_t inputNum, SScalarParam *pOutput);
 int32_t generateTotpCodeFunction(SScalarParam *pInput, int32_t inputNum, SScalarParam *pOutput);
 
diff --git a/source/libs/executor/src/externalwindowoperator.c b/source/libs/executor/src/externalwindowoperator.c
index 758e08303d9a..79b433a80d05 100644
--- a/source/libs/executor/src/externalwindowoperator.c
+++ b/source/libs/executor/src/externalwindowoperator.c
@@ -2493,8 +2493,8 @@ static int32_t extWinApplyAggPostProjection(SOperatorInfo* pOperator, SExternalW
 
   SSDataBlock* pSlice = pExtW->pProjTmpBlock;
   TAOS_CHECK_EXIT(projectApplyFunctions(pExtW->projSupp.pExprInfo, pSlice, pSlice, pExtW->projSupp.pCtx,
-                                        pExtW->projSupp.numOfExprs, NULL, GET_STM_RTINFO(pOperator->pTaskInfo),
-                                        pOperator->pTaskInfo));
+                                        pExtW->projSupp.numOfExprs, NULL,
+                                        GET_STM_RTINFO(pOperator->pTaskInfo), pOperator->pTaskInfo));
 
   int32_t numOfCols = taosArrayGetSize(pBlock->pDataBlock);
   // TODO(perf): only copy back the slots actually written by projSupp, not all columns.
diff --git a/source/libs/function/src/builtins.c b/source/libs/function/src/builtins.c
index 67096a51b5a6..5e8e72de2060 100644
--- a/source/libs/function/src/builtins.c
+++ b/source/libs/function/src/builtins.c
@@ -1105,6 +1105,117 @@ static int32_t translateRand(SFunctionNode* pFunc, char* pErrBuf, int32_t len) {
 static int32_t translateSleep(SFunctionNode* pFunc, char* pErrBuf, int32_t len) {
   FUNC_ERR_RET(validateParam(pFunc, pErrBuf, len));
   pFunc->node.resType = (SDataType){.bytes = tDataTypes[TSDB_DATA_TYPE_INT].bytes, .type = TSDB_DATA_TYPE_INT};
+
+  return TSDB_CODE_SUCCESS;
+}
+
+static int32_t translateRegexpExtract(SFunctionNode* pFunc, char* pErrBuf, int32_t len) {
+  FUNC_ERR_RET(validateParam(pFunc, pErrBuf, len));
+  int32_t numOfParams = LIST_LENGTH(pFunc->pParameterList);
+
+  // param[1]: pattern must be a literal/parameter constant VALUE node.
+  // Constant expressions are not accepted here because regexp_extract
+  // currently validates only VALUE nodes.
+  SNode* pPatNode = nodesListGetNode(pFunc->pParameterList, 1);
+  if (QUERY_NODE_VALUE != nodeType(pPatNode)) {
+    return invaildFuncParaTypeErrMsg(
+        pErrBuf, len, "regexp_extract: pattern must be a literal or parameter constant");
+  }
+
+  // Validate the regex pattern compiles as POSIX ERE.
+  // For prepared-statement placeholders, literal may contain the placeholder
+  // token (for example "?") instead of the bound pattern. Prefer the
+  // materialized datum when available, and otherwise defer validation to
+  // runtime for placeholders. For NCHAR patterns datum.p holds UCS-4 vardata;
+  // convert it to UTF-8 to match the runtime path in regexpExtractFunction.
+  SValueNode* pPatVal = (SValueNode*)pPatNode;
+  {
+    const char* regPattern = NULL;
+    char*       utf8Pat = NULL;
+    bool        freeUtf8Pat = false;
+    bool        deferValidation = (pPatVal->placeholderNo != 0 && pPatVal->datum.p == NULL);
+
+    if (!deferValidation) {
+      if (pPatVal->node.resType.type == TSDB_DATA_TYPE_NCHAR && pPatVal->datum.p != NULL) {
+        int32_t ncharBytes = varDataLen(pPatVal->datum.p);
+        utf8Pat = taosMemoryCalloc(ncharBytes + 1, 1);
+        if (utf8Pat == NULL) return terrno;
+        int32_t utf8Len = taosUcs4ToMbs((TdUcs4*)varDataVal(pPatVal->datum.p), ncharBytes,
+                                        utf8Pat, pPatVal->charsetCxt);
+        if (utf8Len < 0) {
+          taosMemoryFree(utf8Pat);
+          return buildFuncErrMsg(pErrBuf, len, TSDB_CODE_PAR_REGULAR_EXPRESSION_ERROR,
+                                 "regexp_extract: failed to convert NCHAR pattern to UTF-8");
+        }
+        utf8Pat[utf8Len] = '\0';
+        regPattern = utf8Pat;
+        freeUtf8Pat = true;
+      } else if (pPatVal->datum.p != NULL) {
+        // datum.p is a length-prefixed vardata buffer — not NUL-terminated.
+        // Build a NUL-terminated copy for regcomp().
+        int32_t patBytes = varDataLen(pPatVal->datum.p);
+        utf8Pat = taosMemoryMalloc(patBytes + 1);
+        if (utf8Pat == NULL) return terrno;
+        (void)memcpy(utf8Pat, varDataVal(pPatVal->datum.p), patBytes);
+        utf8Pat[patBytes] = '\0';
+        regPattern  = utf8Pat;
+        freeUtf8Pat = true;
+      } else {
+        regPattern = pPatVal->literal;
+      }
+    }
+
+    if (regPattern != NULL) {
+      regex_t re;
+      int     ret = regcomp(&re, regPattern, REG_EXTENDED);
+      if (ret != 0) {
+        char msgbuf[256] = {0};
+        (void)regerror(ret, NULL, msgbuf, sizeof(msgbuf));
+        // regcomp failed, so the contents of re are undefined (POSIX); do not call regfree
+        if (freeUtf8Pat) taosMemoryFree(utf8Pat);
+        return buildFuncErrMsg(pErrBuf, len, TSDB_CODE_PAR_REGULAR_EXPRESSION_ERROR,
+                               "Invalid regex pattern for regexp_extract: %s", msgbuf);
+      }
+      regfree(&re);  // only reached when regcomp succeeded
+    }
+    if (freeUtf8Pat) taosMemoryFree(utf8Pat);
+  }
+
+  // param[2]: group_idx (optional) must be a non-negative integer constant.
+  // NULL is also allowed by the builtin signature and should propagate like
+  // other scalar functions, so accept NULL-typed value nodes here and rely
+  // on runtime to return a NULL result.
+  if (numOfParams == 3) {
+    SNode* pIdxNode = nodesListGetNode(pFunc->pParameterList, 2);
+    if (QUERY_NODE_VALUE != nodeType(pIdxNode)) {
+      return invaildFuncParaTypeErrMsg(pErrBuf, len, "regexp_extract: group_idx must be a constant integer");
+    }
+
+    SValueNode* pIdxVal = (SValueNode*)pIdxNode;
+    int32_t idxType = pIdxVal->node.resType.type;
+
+    if (TSDB_DATA_TYPE_NULL != idxType) {
+      if (!IS_INTEGER_TYPE(idxType)) {
+        return invaildFuncParaTypeErrMsg(pErrBuf, len, "regexp_extract: group_idx must be an integer");
+      }
+      // Skip range validation for prepared-statement placeholders — the bound value
+      // is not yet known; the runtime check in regexpExtractFunction applies instead.
+      if (pIdxVal->placeholderNo == 0) {
+        int64_t groupIdx = pIdxVal->datum.i;
+        if (groupIdx < 0 || groupIdx > REGEXP_EXTRACT_MAX_GROUP_IDX) {
+          char errmsg[64];
+          (void)snprintf(errmsg, sizeof(errmsg),
+                         "regexp_extract: group_idx must be between 0 and %d",
+                         REGEXP_EXTRACT_MAX_GROUP_IDX);
+          return invaildFuncParaValueErrMsg(pErrBuf, len, errmsg);
+        }
+      }
+    }
+  }
+
+  // Return type matches str (param[0]): same VARCHAR/NCHAR type and byte width
+  pFunc->node.resType = *getSDataTypeFromNode(nodesListGetNode(pFunc->pParameterList, 0));
+
   return TSDB_CODE_SUCCESS;
 }
 
@@ -7441,6 +7552,41 @@ const SBuiltinFuncDefinition funcMgtBuiltins[] = {
     .sprocessFunc = sleepFunction,
     .finalizeFunc = NULL
   },
+  {
+    .name = "regexp_extract",
+    .type = FUNCTION_TYPE_REGEXP_EXTRACT,
+    .classification = FUNC_MGT_SCALAR_FUNC | FUNC_MGT_STRING_FUNC,
+    .parameters = {.minParamNum = 2,
+                   .maxParamNum = 3,
+                   .paramInfoPattern = 1,
+                   .inputParaInfo[0][0] = {.isLastParam = false,
+                                           .startParam = 1,
+                                           .endParam = 1,
+                                           .validDataType = FUNC_PARAM_SUPPORT_VARCHAR_TYPE | FUNC_PARAM_SUPPORT_NCHAR_TYPE | FUNC_PARAM_SUPPORT_NULL_TYPE,
+                                           .validNodeType = FUNC_PARAM_SUPPORT_EXPR_NODE,
+                                           .paramAttribute = FUNC_PARAM_NO_SPECIFIC_ATTRIBUTE,
+                                           .valueRangeFlag = FUNC_PARAM_NO_SPECIFIC_VALUE,},
+                   .inputParaInfo[0][1] = {.isLastParam = false,
+                                           .startParam = 2,
+                                           .endParam = 2,
+                                           .validDataType = FUNC_PARAM_SUPPORT_VARCHAR_TYPE | FUNC_PARAM_SUPPORT_NCHAR_TYPE | FUNC_PARAM_SUPPORT_NULL_TYPE,
+                                           .validNodeType = FUNC_PARAM_SUPPORT_VALUE_NODE,
+                                           .paramAttribute = FUNC_PARAM_NO_SPECIFIC_ATTRIBUTE,
+                                           .valueRangeFlag = FUNC_PARAM_NO_SPECIFIC_VALUE,},
+                   .inputParaInfo[0][2] = {.isLastParam = true,
+                                           .startParam = 3,
+                                           .endParam = 3,
+                                           .validDataType = FUNC_PARAM_SUPPORT_INTEGER_TYPE | FUNC_PARAM_SUPPORT_NULL_TYPE,
+                                           .validNodeType = FUNC_PARAM_SUPPORT_VALUE_NODE,
+                                           .paramAttribute = FUNC_PARAM_NO_SPECIFIC_ATTRIBUTE,
+                                           .valueRangeFlag = FUNC_PARAM_NO_SPECIFIC_VALUE,},
+                   .outputParaInfo = {.validDataType = FUNC_PARAM_SUPPORT_VARCHAR_TYPE | FUNC_PARAM_SUPPORT_NCHAR_TYPE}},
+    .translateFunc = translateRegexpExtract,
+    .getEnvFunc   = NULL,
+    .initFunc     = NULL,
+    .sprocessFunc = regexpExtractFunction,
+    .finalizeFunc = NULL,
+  },
 };
 // clang-format on
 
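Reviewer's sketch (not part of the patch): the translate-time checks in `translateRegexpExtract` can be modelled in a few lines of Python. The helper name `check_regexp_extract_args`, its `is_placeholder` flag, and the use of Python's `re` module in place of POSIX `regcomp()` are assumptions made for illustration. `re` accepts a different dialect than POSIX ERE, so this mirrors only the control flow, not which patterns compile.

```python
import re

MAX_GROUP_IDX = 512  # mirrors REGEXP_EXTRACT_MAX_GROUP_IDX in scalar.h


def check_regexp_extract_args(pattern, group_idx=1, is_placeholder=False):
    """Hypothetical model of the translate-time checks; raises ValueError on rejection.

    pattern:        str, or None when a placeholder has no bound value yet.
    group_idx:      int, or None for SQL NULL.
    is_placeholder: True when group_idx comes from a prepared-statement '?'.
    """
    # An unbound pattern placeholder defers validation to runtime.
    if pattern is not None:
        try:
            re.compile(pattern)  # stand-in for regcomp(..., REG_EXTENDED)
        except re.error as exc:
            raise ValueError(f"Invalid regex pattern for regexp_extract: {exc}")

    # SQL NULL is accepted here and becomes a NULL result at runtime.
    if group_idx is None:
        return
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(group_idx, bool) or not isinstance(group_idx, int):
        raise ValueError("regexp_extract: group_idx must be an integer")
    # The range check is skipped for placeholders and repeated at runtime.
    if not is_placeholder and not 0 <= group_idx <= MAX_GROUP_IDX:
        raise ValueError(
            f"regexp_extract: group_idx must be between 0 and {MAX_GROUP_IDX}")
```

Under these assumptions, `check_regexp_extract_args('x', 513)` raises, while `check_regexp_extract_args('x', 513, is_placeholder=True)` passes and leaves the rejection to the runtime check.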
diff --git a/source/libs/scalar/src/sclfunc.c b/source/libs/scalar/src/sclfunc.c
index d0d0f787f69b..ea14b3c06b65 100644
--- a/source/libs/scalar/src/sclfunc.c
+++ b/source/libs/scalar/src/sclfunc.c
@@ -1817,6 +1817,228 @@ static int32_t base32Encode(const uint8_t *in, int32_t inLen, char *out) {
   return outLen;
 }
 
+int32_t regexpExtractFunction(SScalarParam *pInput, int32_t inputNum, SScalarParam *pOutput) {
+  int32_t code = TSDB_CODE_SUCCESS;
+
+  int32_t          numOfRows  = pInput[0].numOfRows;
+  SColumnInfoData *pStrData   = pInput[0].columnData;
+  SColumnInfoData *pPatData   = pInput[1].columnData;
+  SColumnInfoData *pOutputData = pOutput->columnData;
+
+  if (numOfRows == 0) {
+    pOutput->numOfRows = 0;
+    return TSDB_CODE_SUCCESS;
+  }
+
+  if (IS_NULL_TYPE(GET_PARAM_TYPE(&pInput[0])) || IS_NULL_TYPE(GET_PARAM_TYPE(&pInput[1]))) {
+    colDataSetNNULL(pOutputData, 0, numOfRows);
+    pOutput->numOfRows = numOfRows;
+    return TSDB_CODE_SUCCESS;
+  }
+
+  if (colDataIsNull_s(pPatData, 0)) {
+    colDataSetNNULL(pOutputData, 0, numOfRows);
+    pOutput->numOfRows = numOfRows;
+    return TSDB_CODE_SUCCESS;
+  }
+
+  // Get group_idx (default 1; param[2] is an optional integer constant).
+  // Read into int64_t first to avoid silent truncation/wrap for BIGINT/UBIGINT
+  // placeholder values before the range check, then cast after validation.
+  // An explicit SQL NULL group_idx propagates NULL to all output rows.
+  int64_t groupIdxRaw = 1;
+  if (inputNum == 3) {
+    if (IS_NULL_TYPE(GET_PARAM_TYPE(&pInput[2])) || colDataIsNull_s(pInput[2].columnData, 0)) {
+      colDataSetNNULL(pOutputData, 0, numOfRows);
+      pOutput->numOfRows = numOfRows;
+      return TSDB_CODE_SUCCESS;
+    }
+    GET_TYPED_DATA(groupIdxRaw, int64_t, GET_PARAM_TYPE(&pInput[2]),
+                   colDataGetData(pInput[2].columnData, 0),
+                   typeGetTypeModFromColInfo(&pInput[2].columnData->info));
+  }
+  if (groupIdxRaw < 0 || groupIdxRaw > REGEXP_EXTRACT_MAX_GROUP_IDX) {
+    pOutput->numOfRows = numOfRows;
+    SCL_ERR_RET(TSDB_CODE_FUNC_FUNTION_PARA_VALUE);
+  }
+  int32_t groupIdx = (int32_t)groupIdxRaw;
+
+  // Build null-terminated UTF-8 pattern string (pattern is a constant, always 1 row)
+  char    patBuf[512];
+  char   *patStr     = patBuf;
+  int32_t patLen     = 0;
+  bool    needFreePat = false;
+  {
+    char   *rawPat    = varDataVal(colDataGetData(pPatData, 0));
+    int32_t rawPatLen = varDataLen(colDataGetData(pPatData, 0));
+    if (GET_PARAM_TYPE(&pInput[1]) == TSDB_DATA_TYPE_NCHAR) {
+      if (rawPatLen == 0) {
+        patLen = 0;
+        patStr = patBuf;
+        patStr[0] = '\0';
+      } else {
+        patStr = NULL;  // ensure convNcharToVarchar always mallocs a fresh heap buffer
+        code = convNcharToVarchar(rawPat, &patStr, rawPatLen, &patLen, pInput[1].charsetCxt);
+        if (code != TSDB_CODE_SUCCESS) goto _exit;
+        needFreePat = true;
+        // convNcharToVarchar allocates rawPatLen bytes (no +1 for NUL); when the
+        // UTF-8 output fills the buffer entirely there is no room for a terminator.
+        // threadGetRegComp requires a NUL-terminated string — grow by one byte.
+        char *tmp = taosMemoryRealloc(patStr, patLen + 1);
+        if (tmp == NULL) {
+          taosMemoryFree(patStr);
+          needFreePat = false;
+          code = terrno;
+          goto _exit;
+        }
+        patStr = tmp;
+        patStr[patLen] = '\0';
+      }
+    } else {
+      patLen = rawPatLen;
+      if (patLen >= (int32_t)sizeof(patBuf)) {
+        patStr = taosMemoryMalloc(patLen + 1);
+        if (patStr == NULL) {
+          code = terrno;
+          goto _exit;
+        }
+        needFreePat = true;
+      }
+      (void)memcpy(patStr, rawPat, patLen);
+      patStr[patLen] = '\0';
+    }
+  }
+
+  // Compile (or retrieve the cached) regex once per call; the pattern is constant across rows
+  regex_t *regex = NULL;
+  code = threadGetRegComp(&regex, patStr);
+  if (code != 0) {
+    terrno = code;
+    goto _exit;
+  }
+
+  // regmatch_t array: index 0 = whole match, 1..groupIdx = capture groups.
+  // Initialize all entries to -1 so any submatch slots not written by regexec
+  // (for example when groupIdx exceeds regex->re_nsub) remain deterministic.
+  int32_t     nmatch  = groupIdx + 1;
+  regmatch_t *pmatch  = taosMemoryMalloc(nmatch * sizeof(regmatch_t));
+  if (pmatch == NULL) {
+    code = terrno;
+    goto _exit;
+  }
+  (void)memset(pmatch, 0xFF, nmatch * sizeof(regmatch_t));
+
+  // Each output cell is a VarData value, and for var-length types info.bytes
+  // already includes the VARSTR_HEADER_SIZE length prefix plus payload space.
+  int32_t outBufLen = pStrData->info.bytes;
+  char   *outBuf    = taosMemoryMalloc(outBufLen);
+  if (outBuf == NULL) {
+    taosMemoryFree(pmatch);
+    code = terrno;
+    goto _exit;
+  }
+
+  int32_t strType = GET_PARAM_TYPE(&pInput[0]);
+  bool    isNchar = (strType == TSDB_DATA_TYPE_NCHAR);
+
+  // Null-termination buffer shared across rows — grown via realloc only when needed
+  char   *strNt    = NULL;
+  int32_t strNtCap = 0;
+
+  for (int32_t i = 0; i < numOfRows; i++) {
+    if (colDataIsNull_s(pStrData, i)) {
+      colDataSetNULL(pOutputData, i);
+      continue;
+    }
+
+    char   *strRaw = colDataGetData(pStrData, i);
+    char   *strVal = varDataVal(strRaw);
+    int32_t strLen = varDataLen(strRaw);
+
+    // Grow the null-termination buffer only when the current row needs more space.
+    // For NCHAR: UTF-8 output is at most strLen bytes (UCS-4 byte count >= UTF-8 byte count),
+    // so strLen + 1 is a safe upper bound for both NCHAR and VARCHAR paths.
+    if (strLen + 1 > strNtCap) {
+      char *tmp = taosMemoryRealloc(strNt, strLen + 1);
+      if (tmp == NULL) {
+        code = terrno;
+        break;
+      }
+      strNt    = tmp;
+      strNtCap = strLen + 1;
+    }
+
+    // Convert input into the NUL-terminated UTF-8 scratch buffer.
+    // For NCHAR: convert UCS-4 directly into strNt — avoids per-row malloc/free.
+    // For VARCHAR: data is already UTF-8, just copy it.
+    int32_t strUtf8Len;
+    if (isNchar) {
+      strUtf8Len = taosUcs4ToMbs((TdUcs4 *)strVal, strLen, strNt, pInput[0].charsetCxt);
+      if (strUtf8Len < 0) {
+        code = TSDB_CODE_SCALAR_CONVERT_ERROR;
+        terrno = code;
+        break;
+      }
+    } else {
+      (void)memcpy(strNt, strVal, strLen);
+      strUtf8Len = strLen;
+    }
+    strNt[strUtf8Len] = '\0';
+
+    int ret = regexec(regex, strNt, nmatch, pmatch, 0);
+    if (ret == REG_NOMATCH || (ret == 0 && pmatch[groupIdx].rm_so == -1)) {
+      // no match, or the requested capture group did not participate
+      colDataSetNULL(pOutputData, i);
+    } else if (ret != 0) {
+      // real regex execution error — capture the reason for production debugging
+      char msgbuf[256] = {0};
+      (void)regerror(ret, regex, msgbuf, sizeof(msgbuf));
+      qDebug("REGEXP_EXTRACT: regexec failed for pattern '%s', reason: %s", patStr, msgbuf);
+      code = TSDB_CODE_PAR_REGULAR_EXPRESSION_ERROR;
+      terrno = code;
+      break;
+    } else {
+      int32_t matchStart = pmatch[groupIdx].rm_so;
+      int32_t matchLen   = pmatch[groupIdx].rm_eo - pmatch[groupIdx].rm_so;
+
+      if (isNchar) {
+        // Convert matched UTF-8 bytes back to NCHAR (UCS-4) directly into outBuf
+        // to avoid a per-row malloc/free cycle. The match is a substring of the
+        // input, so its UCS-4 form needs at most strLen bytes, and outBuf was sized
+        // from the input column width (outBufLen - VARSTR_HEADER_SIZE >= strLen).
+        int32_t matchedNcharLen = 0;
+        bool    ok = taosMbsToUcs4(strNt + matchStart, matchLen,
+                                   (TdUcs4 *)(outBuf + VARSTR_HEADER_SIZE),
+                                   outBufLen - VARSTR_HEADER_SIZE,
+                                   &matchedNcharLen, pInput[0].charsetCxt);
+        if (!ok) {
+          code = TSDB_CODE_SCALAR_CONVERT_ERROR;
+          terrno = code;
+          break;
+        }
+        *(VarDataLenT *)outBuf = matchedNcharLen;
+        code = colDataSetVal(pOutputData, i, outBuf, false);
+        if (code != TSDB_CODE_SUCCESS) terrno = code;
+      } else {
+        *(VarDataLenT *)outBuf = matchLen;
+        (void)memcpy(outBuf + VARSTR_HEADER_SIZE, strNt + matchStart, matchLen);
+        code = colDataSetVal(pOutputData, i, outBuf, false);
+        if (code != TSDB_CODE_SUCCESS) terrno = code;
+      }
+    }
+
+    if (code != TSDB_CODE_SUCCESS) break;
+  }
+
+  taosMemoryFree(strNt);
+  taosMemoryFree(outBuf);
+  taosMemoryFree(pmatch);
+_exit:
+  if (needFreePat) taosMemoryFree(patStr);
+  pOutput->numOfRows = numOfRows;
+  return code;
+}
+
 int32_t generateTotpSecretFunction(SScalarParam *pInput, int32_t inputNum, SScalarParam *pOutput) {
   SColumnInfoData *pInputData = pInput->columnData;
   SColumnInfoData *pOutputData = pOutput->columnData;
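Reviewer's sketch (not part of the patch): a per-row Python model of what `regexpExtractFunction` above appears to compute, useful for cross-checking expected values in the test file. The function name `regexp_extract_model` is hypothetical, `None` stands for SQL NULL, and Python's `re` is used instead of POSIX `regexec()`. The two engines can disagree (for example, POSIX picks the longest alternative while `re` picks the first that matches), so treat this as an approximation for simple patterns only.

```python
import re

MAX_GROUP_IDX = 512  # mirrors REGEXP_EXTRACT_MAX_GROUP_IDX in scalar.h


def regexp_extract_model(expr, pattern, group_idx=1):
    """Hypothetical per-row model; returns a str, or None for SQL NULL."""
    # NULL in any argument propagates to the result.
    if expr is None or pattern is None or group_idx is None:
        return None
    # Runtime range check, needed for values bound to placeholders.
    if not 0 <= group_idx <= MAX_GROUP_IDX:
        raise ValueError("group_idx out of range")

    # UCS-4 uses 4 bytes per code point and UTF-8 at most 4, so a scratch
    # buffer sized from the UCS-4 byte length always holds the UTF-8 form.
    assert len(expr.encode("utf-8")) <= len(expr.encode("utf-32-le"))

    regex = re.compile(pattern)
    match = regex.search(expr)  # first (leftmost) match only
    if match is None:
        return None
    # An index beyond the pattern's group count yields NULL, not an error.
    if group_idx > regex.groups:
        return None
    # group() returns None when the group did not participate in the match.
    return match.group(group_idx)
```

With this model, `regexp_extract_model('abc', 'b')` returns `None` because the default index `1` exceeds the pattern's zero capture groups, while `regexp_extract_model('abc', 'b', 0)` returns `'b'`, matching cases RXE-BASIC-003 and RXE-GRP0-001 in the test file.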
diff --git a/test/cases/11-Functions/01-Scalar/test_fun_sca_regexp_extract.py b/test/cases/11-Functions/01-Scalar/test_fun_sca_regexp_extract.py
new file mode 100644
index 000000000000..5eed7e1e4f7c
--- /dev/null
+++ b/test/cases/11-Functions/01-Scalar/test_fun_sca_regexp_extract.py
@@ -0,0 +1,345 @@
+from new_test_framework.utils import tdLog, tdSql
+import datetime
+
+
+class TestFunRegexpExtract:
+
+    def setup_class(cls):
+        cls.replicaVar = 1
+        tdLog.debug(f"start to execute {__file__}")
+
+    # ------------------------------------------------------------------
+    # Helpers
+    # ------------------------------------------------------------------
+
+    def _create_tb(self, dbname="db"):
+        tdSql.execute(f"""CREATE STABLE {dbname}.st (
+            ts TIMESTAMP, vc VARCHAR(128), nc NCHAR(64), iv INT
+        ) TAGS (t INT)""")
+        tdSql.execute(f"CREATE TABLE {dbname}.ct1 USING {dbname}.st TAGS(1)")
+        tdSql.execute(f"CREATE TABLE {dbname}.ct2 USING {dbname}.st TAGS(2)")
+        tdSql.execute(f"""CREATE TABLE {dbname}.nt (
+            ts TIMESTAMP, vc VARCHAR(128), nc NCHAR(64), iv INT
+        )""")
+
+    def _insert_data(self, dbname="db"):
+        now = int(datetime.datetime.now().timestamp() * 1000)
+        # ct1: log-style rows + one NULL row
+        ct1_rows = [
+            (now - 4000, "'code=42,type=DISK_FULL'", "'code=42,type=DISK_FULL'", 42),
+            (now - 3000, "'code=7,type=LOW_MEM'",   "'code=7,type=LOW_MEM'",     7),
+            (now - 2000, "'code=0,type=OK'",         "'code=0,type=OK'",          0),
+            (now - 1000, "NULL",                     "NULL",                      "NULL"),
+        ]
+        for ts, vc, nc, iv in ct1_rows:
+            tdSql.execute(f"INSERT INTO {dbname}.ct1 VALUES({ts}, {vc}, {nc}, {iv})")
+        # ct2: URL-style rows
+        ct2_rows = [
+            (now - 3000, "'https://example.com'", "'https://example.com'", 1),
+            (now - 2000, "'http://api.example.org'", "'http://api.example.org'", 2),
+            (now - 1000, "'ftp://files.example.net'", "'ftp://files.example.net'", 3),
+        ]
+        for ts, vc, nc, iv in ct2_rows:
+            tdSql.execute(f"INSERT INTO {dbname}.ct2 VALUES({ts}, {vc}, {nc}, {iv})")
+        # nt: same as ct1
+        for ts, vc, nc, iv in ct1_rows:
+            tdSql.execute(f"INSERT INTO {dbname}.nt VALUES({ts}, {vc}, {nc}, {iv})")
+
+    def _check_basic(self, dbname="db"):
+        # -----------------------------------------------------------------
+        # §1  Default group_idx=1 — no-table queries
+        # -----------------------------------------------------------------
+        # RXE-BASIC-001: single capture group → group 1
+        tdSql.query("SELECT REGEXP_EXTRACT('abc', '(b)')")
+        tdSql.checkRows(1)
+        tdSql.checkData(0, 0, 'b')
+
+        # RXE-BASIC-002: multiple capture groups, default → group 1 only
+        tdSql.query("SELECT REGEXP_EXTRACT('abc', '(b)(c)')")
+        tdSql.checkRows(1)
+        tdSql.checkData(0, 0, 'b')
+
+        # RXE-BASIC-003: no capture group, default group_idx=1 → NULL
+        tdSql.query("SELECT REGEXP_EXTRACT('abc', 'b')")
+        tdSql.checkRows(1)
+        tdSql.checkData(0, 0, None)
+
+        # -----------------------------------------------------------------
+        # §2  group_idx=0 whole match
+        # -----------------------------------------------------------------
+        # RXE-GRP0-001: no capture group, group_idx=0 → whole match
+        tdSql.query("SELECT REGEXP_EXTRACT('abc', 'b', 0)")
+        tdSql.checkRows(1)
+        tdSql.checkData(0, 0, 'b')
+
+        # RXE-GRP0-002: with capture group, group_idx=0 → whole match ≠ group 1
+        tdSql.query("SELECT REGEXP_EXTRACT('abc', '(b)c', 0)")
+        tdSql.checkRows(1)
+        tdSql.checkData(0, 0, 'bc')
+
+        # RXE-GRP0-003: no match, group_idx=0 → NULL
+        tdSql.query("SELECT REGEXP_EXTRACT('abc', 'x+', 0)")
+        tdSql.checkRows(1)
+        tdSql.checkData(0, 0, None)
+
+        # -----------------------------------------------------------------
+        # §3  Multiple capture groups by explicit index
+        # -----------------------------------------------------------------
+        # RXE-GRP-001: explicit group_idx=1 → group 1
+        tdSql.query("SELECT REGEXP_EXTRACT('abc', '(b)(c)', 1)")
+        tdSql.checkRows(1)
+        tdSql.checkData(0, 0, 'b')
+
+        # RXE-GRP-002: explicit group_idx=2 → group 2
+        tdSql.query("SELECT REGEXP_EXTRACT('abc', '(b)(c)', 2)")
+        tdSql.checkRows(1)
+        tdSql.checkData(0, 0, 'c')
+
+        # RXE-GRP-003: group_idx out of range → NULL, no error
+        tdSql.query("SELECT REGEXP_EXTRACT('abc', '(b)(c)', 3)")
+        tdSql.checkRows(1)
+        tdSql.checkData(0, 0, None)
+
+        # -----------------------------------------------------------------
+        # §4  NULL and no-match
+        # -----------------------------------------------------------------
+        # RXE-NULL-001: str=NULL → NULL
+        tdSql.query("SELECT REGEXP_EXTRACT(NULL, '(a+)')")
+        tdSql.checkRows(1)
+        tdSql.checkData(0, 0, None)
+
+        # RXE-NULL-002: no match → NULL
+        tdSql.query("SELECT REGEXP_EXTRACT('abc', '(x+)')")
+        tdSql.checkRows(1)
+        tdSql.checkData(0, 0, None)
+
+        # RXE-NULL-003: multiple matches, only first (leftmost) returned
+        tdSql.query("SELECT REGEXP_EXTRACT('a1b2', '([0-9])')")
+        tdSql.checkRows(1)
+        tdSql.checkData(0, 0, '1')
+
+        # RXE-NULL-004: str=NULL with group_idx=0 → NULL
+        tdSql.query("SELECT REGEXP_EXTRACT(NULL, 'a+', 0)")
+        tdSql.checkRows(1)
+        tdSql.checkData(0, 0, None)
+
+        # RXE-NULL-005: explicit NULL group_idx → NULL
+        tdSql.query("SELECT REGEXP_EXTRACT('abc', '(b)', NULL)")
+        tdSql.checkRows(1)
+        tdSql.checkData(0, 0, None)
+
+        # RXE-NULL-006: non-participating group in alternation → NULL
+        # pattern '(a)|(b)' matches 'b' via group 2; group 1 did not participate
+        tdSql.query("SELECT REGEXP_EXTRACT('b', '(a)|(b)', 1)")
+        tdSql.checkRows(1)
+        tdSql.checkData(0, 0, None)
+
+        # RXE-NULL-007: participating group 2 returns matched content
+        tdSql.query("SELECT REGEXP_EXTRACT('b', '(a)|(b)', 2)")
+        tdSql.checkRows(1)
+        tdSql.checkData(0, 0, 'b')
+
+        # RXE-NULL-008: pattern=NULL → NULL
+        tdSql.query("SELECT REGEXP_EXTRACT('abc', NULL)")
+        tdSql.checkRows(1)
+        tdSql.checkData(0, 0, None)
+
+        # -----------------------------------------------------------------
+        # §5  Empty string scenarios
+        # -----------------------------------------------------------------
+        # RXE-EMPTY-001: capture group matches empty string → '' (not NULL)
+        tdSql.query("SELECT REGEXP_EXTRACT('ac', '(b?)')")
+        tdSql.checkRows(1)
+        tdSql.checkData(0, 0, '')
+
+        # RXE-EMPTY-002: empty input str with zero-length match → ''
+        tdSql.query("SELECT REGEXP_EXTRACT('', '(a*)')")
+        tdSql.checkRows(1)
+        tdSql.checkData(0, 0, '')
+
+        # -----------------------------------------------------------------
+        # §6  Table queries — per-row scalar behavior
+        # -----------------------------------------------------------------
+        # RXE-TBL-001: extract numeric code — multiple rows match 'code=([0-9]+)';
+        # verify row-by-row extraction for 42, 7, 0, and NULL propagation
+        tdSql.query(f"SELECT REGEXP_EXTRACT(vc, 'code=([0-9]+)') FROM {dbname}.ct1 ORDER BY ts")
+        tdSql.checkRows(4)
+        tdSql.checkData(0, 0, '42')
+        tdSql.checkData(1, 0, '7')
+        tdSql.checkData(2, 0, '0')
+        tdSql.checkData(3, 0, None)   # NULL row → NULL
+
+        # RXE-TBL-002: NULL column row → NULL; non-NULL rows → extracted value
+        tdSql.query(f"SELECT REGEXP_EXTRACT(vc, 'type=([A-Z_]+)') FROM {dbname}.ct1 ORDER BY ts")
+        tdSql.checkRows(4)
+        tdSql.checkData(0, 0, 'DISK_FULL')
+        tdSql.checkData(1, 0, 'LOW_MEM')
+        tdSql.checkData(2, 0, 'OK')
+        tdSql.checkData(3, 0, None)
+
+        # RXE-TBL-003: empty table → 0 rows, no error
+        tdSql.execute(f"CREATE TABLE IF NOT EXISTS {dbname}.empty_t (ts TIMESTAMP, vc VARCHAR(64))")
+        tdSql.query(f"SELECT REGEXP_EXTRACT(vc, '([0-9]+)') FROM {dbname}.empty_t")
+        tdSql.checkRows(0)
+
+        # -----------------------------------------------------------------
+        # §7  WHERE clause
+        # -----------------------------------------------------------------
+        # RXE-WHERE-001: IS NOT NULL filters to rows with a match
+        tdSql.query(f"SELECT vc FROM {dbname}.ct1 "
+                    "WHERE REGEXP_EXTRACT(vc, 'code=([4-9][0-9]+)') IS NOT NULL "
+                    "ORDER BY ts")
+        tdSql.checkRows(1)
+        tdSql.checkData(0, 0, 'code=42,type=DISK_FULL')
+
+        # RXE-WHERE-002: equality on extracted scheme value
+        tdSql.query(f"SELECT vc FROM {dbname}.ct2 "
+                    "WHERE REGEXP_EXTRACT(vc, '(https?)://') = 'https' "
+                    "ORDER BY ts")
+        tdSql.checkRows(1)
+        tdSql.checkData(0, 0, 'https://example.com')
+
+        # -----------------------------------------------------------------
+        # §8  NCHAR column: extraction result equals VARCHAR equivalent
+        # -----------------------------------------------------------------
+        # RXE-NCHAR-001: NCHAR input yields same extracted value as VARCHAR
+        tdSql.query(f"SELECT REGEXP_EXTRACT(nc, 'code=([0-9]+)') FROM {dbname}.ct1 ORDER BY ts")
+        tdSql.checkRows(4)
+        tdSql.checkData(0, 0, '42')
+        tdSql.checkData(1, 0, '7')
+        tdSql.checkData(2, 0, '0')
+        tdSql.checkData(3, 0, None)
+
+        # -----------------------------------------------------------------
+        # §9  Subquery with GROUP BY
+        # -----------------------------------------------------------------
+        # RXE-SUB-001: group by extracted URL scheme
+        tdSql.query(f"""SELECT scheme, COUNT(*) AS cnt
+            FROM (SELECT REGEXP_EXTRACT(vc, '(https?)://') AS scheme FROM {dbname}.ct2) t
+            WHERE scheme IS NOT NULL
+            GROUP BY scheme
+            ORDER BY scheme""")
+        tdSql.checkRows(2)
+        tdSql.checkData(0, 0, 'http')
+        tdSql.checkData(0, 1, 1)
+        tdSql.checkData(1, 0, 'https')
+        tdSql.checkData(1, 1, 1)
+
+        # -----------------------------------------------------------------
+        # §10  ERE features (character class, anchors, case sensitivity)
+        # -----------------------------------------------------------------
+        # RXE-RE-001: character class extracts decimal number
+        tdSql.query("SELECT REGEXP_EXTRACT('v=3.14', '([0-9]+\\.[0-9]+)')")
+        tdSql.checkRows(1)
+        tdSql.checkData(0, 0, '3.14')
+
+        # RXE-RE-002a: anchor ^ matches at start → 'a'
+        tdSql.query("SELECT REGEXP_EXTRACT('abc', '^(a)')")
+        tdSql.checkRows(1)
+        tdSql.checkData(0, 0, 'a')
+
+        # RXE-RE-002b: anchor ^ requires position 0; 'x' blocks match → NULL
+        tdSql.query("SELECT REGEXP_EXTRACT('xabc', '^(a)')")
+        tdSql.checkRows(1)
+        tdSql.checkData(0, 0, None)
+
+        # RXE-RE-003a: case-sensitive by default → NULL
+        tdSql.query("SELECT REGEXP_EXTRACT('ABC', '(abc)')")
+        tdSql.checkRows(1)
+        tdSql.checkData(0, 0, None)
+
+        # RXE-RE-003b: lowercasing the input with LOWER() works around case sensitivity → 'abc'
+        tdSql.query("SELECT REGEXP_EXTRACT(LOWER('ABC'), '(abc)')")
+        tdSql.checkRows(1)
+        tdSql.checkData(0, 0, 'abc')
+
+    def _check_error(self, dbname="db"):
+        # -----------------------------------------------------------------
+        # §11  Error cases
+        # -----------------------------------------------------------------
+        # RXE-ERR-001: too few arguments (1)
+        tdSql.error("SELECT REGEXP_EXTRACT('abc')")
+
+        # RXE-ERR-002: too many arguments (4)
+        tdSql.error("SELECT REGEXP_EXTRACT('abc', '(b)', 1, 0)")
+
+        # RXE-ERR-003: str is non-string type (INT column)
+        tdSql.error(f"SELECT REGEXP_EXTRACT(iv, '([0-9]+)') FROM {dbname}.ct1")
+
+        # RXE-ERR-004: pattern is a column reference (not a constant)
+        tdSql.error(f"SELECT REGEXP_EXTRACT(vc, vc) FROM {dbname}.ct1")
+
+        # RXE-ERR-005: negative group_idx → translation-phase error
+        tdSql.error("SELECT REGEXP_EXTRACT('abc', '(b)', -1)")
+
+        # RXE-ERR-006: invalid regex (unmatched parenthesis)
+        tdSql.error("SELECT REGEXP_EXTRACT('abc', '(b', 1)")
+
+        # RXE-ERR-007: group_idx exceeds maximum (512)
+        tdSql.error("SELECT REGEXP_EXTRACT('abc', '(b)', 513)")
+
+    def _check_doc_examples(self):
+        # -----------------------------------------------------------------
+        # §12  Doc examples — verify the three queries from the user manual
+        # -----------------------------------------------------------------
+        # RXE-DOC-001: date string, group 1 → year
+        tdSql.query("SELECT REGEXP_EXTRACT('2026-04-22', '([0-9]{4})-([0-9]{2})-([0-9]{2})', 1)")
+        tdSql.checkRows(1)
+        tdSql.checkData(0, 0, '2026')
+
+        # RXE-DOC-002: date string, group 0 → whole match
+        tdSql.query("SELECT REGEXP_EXTRACT('2026-04-22', '([0-9]{4})-([0-9]{2})-([0-9]{2})', 0)")
+        tdSql.checkRows(1)
+        tdSql.checkData(0, 0, '2026-04-22')
+
+        # RXE-DOC-003: no match → NULL
+        tdSql.query("SELECT REGEXP_EXTRACT('no-digits-here', '[0-9]+', 1)")
+        tdSql.checkRows(1)
+        tdSql.checkData(0, 0, None)
+
+    def all_test(self, dbname="db"):
+        self._check_basic(dbname)
+        self._check_error(dbname)
+        self._check_doc_examples()
+
+    def test_fun_sca_regexp_extract(self):
+        """Fun: regexp_extract()
+
+        1. regexp_extract default group_idx=1 returns first capture group
+        2. regexp_extract group_idx=0 returns whole match substring
+        3. regexp_extract with explicit group index (1, 2, out-of-range)
+        4. regexp_extract NULL input (str, pattern, group_idx) and no-match return NULL
+        5. regexp_extract capture group matching empty string returns ''
+        6. regexp_extract on table columns with per-row scalar semantics
+        7. regexp_extract in WHERE clause for row filtering
+        8. regexp_extract on NCHAR column (return type NCHAR)
+        9. regexp_extract in subquery with GROUP BY
+        10. regexp_extract POSIX ERE features: character class, anchors, case sensitivity
+        11. regexp_extract invalid parameter error cases (including group_idx > 512)
+        12. regexp_extract user-manual doc examples
+
+        Since: v3.4.2.0
+
+        Labels: common,ci
+
+        Jira: None
+
+        History:
+            - 2026-04-20 Stephen Created
+        """
+        dbname = "db"
+        tdSql.prepare()
+
+        tdLog.printNoPrefix("==========step1:create table")
+        self._create_tb(dbname)
+
+        tdLog.printNoPrefix("==========step2:insert data")
+        self._insert_data(dbname)
+
+        tdLog.printNoPrefix("==========step3:all check")
+        self.all_test(dbname)
+
+        tdSql.execute(f"flush database {dbname}")
+
+        tdLog.printNoPrefix("==========step4:after flush, all check again")
+        self.all_test(dbname)
diff --git a/test/ci/cases.task b/test/ci/cases.task
index 7b415ae1e8a3..a2217d5e0c01 100644
--- a/test/ci/cases.task
+++ b/test/ci/cases.task
@@ -455,6 +455,7 @@
 ,,y,.,./ci/pytest.sh pytest cases/11-Functions/01-Scalar/test_fun_sca_to_iso8601.py
 ,,y,.,./ci/pytest.sh pytest cases/11-Functions/01-Scalar/test_fun_sca_to_timestamp.py
 ,,y,.,./ci/pytest.sh pytest cases/11-Functions/01-Scalar/test_fun_sca_to_unixtimestamp.py
+,,y,.,./ci/pytest.sh pytest cases/11-Functions/01-Scalar/test_fun_sca_regexp_extract.py
 ,,y,.,./ci/pytest.sh pytest cases/11-Functions/01-Scalar/test_fun_sca_today.py
 ,,y,.,./ci/pytest.sh pytest cases/11-Functions/01-Scalar/test_fun_sca_upper.py
 ,,y,.,./ci/pytest.sh pytest cases/11-Functions/01-Scalar/test_fun_sca_cast_blob.py