Add performance test suite: memory leak detection + battery drain estimation by human-pages-ai · Pull Request #6495 · BasedHardware/omi

human-pages-ai · 2026-04-09T08:22:40Z

Summary

Resolves #3858 — adds a comprehensive performance test suite for the Omi Flutter app.

New files

integration_test/memory_leak_test.dart — Memory leak detection via RSS tracking across 10 navigation cycles. Uses linear regression with R² correlation to detect monotonic growth. Includes isolated widget lifecycle stress tests (ListView, AnimationController, Image grid — 20 create/destroy cycles each). Outputs JSON + CSV.
integration_test/battery_drain_test.dart — Profiles 5 app states (idle home, active scrolling, chat, typing indicator, rapid navigation) for 60s each using FrameTiming callbacks. Reports build/raster times at p50/p90/p99. Detects frame rendering cost degradation over 10 rounds of 30s each. Uses relative rendering cost metric for comparing states.
scripts/run_performance_tests.sh — Orchestrates all 6 integration tests (4 existing + 2 new) via flutter drive --profile. Supports --duration 24 for 24-hour continuous runs with hourly checkpoint reports. Generates JSON + Markdown reports. Quick mode (--quick) and single-test mode (--test memory).
test_driver/integration_test.dart — Required driver file for flutter drive.
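
The trend check described for memory_leak_test.dart can be illustrated outside the test itself. The following is a minimal sketch, not the PR's implementation: it assumes a hypothetical two-column CSV layout (`cycle,rss_mb` with one header row), since the actual columns emitted by the test are not shown here. The function name `rss_trend` is also made up for illustration. It computes the least-squares slope and R² with awk.

```shell
#!/usr/bin/env bash
# Sketch only: least-squares slope and R^2 over RSS samples.
# Assumes a hypothetical CSV layout "cycle,rss_mb" with one header row.

rss_trend() {
  awk -F, 'NR > 1 {
      n++; sx += $1; sy += $2
      sxx += $1 * $1; syy += $2 * $2; sxy += $1 * $2
    }
    END {
      if (n < 2) { print "slope=0.000 r2=0.000"; exit }
      dx  = n * sxx - sx * sx   # n^2 * variance of x
      dy  = n * syy - sy * sy   # n^2 * variance of y
      cov = n * sxy - sx * sy   # n^2 * covariance of x and y
      slope = (dx == 0) ? 0 : cov / dx
      r2 = (dx == 0 || dy == 0) ? 0 : (cov * cov) / (dx * dy)
      printf "slope=%.3f r2=%.3f\n", slope, r2
    }' "$1"
}

# Usage with synthetic data: RSS grows by exactly 2 MB per cycle.
tmp=$(mktemp)
printf 'cycle,rss_mb\n1,100\n2,102\n3,104\n4,106\n' > "$tmp"
rss_trend "$tmp"   # prints: slope=2.000 r2=1.000
rm -f "$tmp"
```

A positive slope together with an R² close to 1 suggests steady growth rather than noise; the thresholds that should count as a leak are a judgment call and are not specified in this description.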

Design decisions

RSS, not Dart heap: We use ProcessInfo.currentRss (process-level Resident Set Size) rather than VM Service heap snapshots. RSS is a coarser proxy, but it is available without a VM Service (Observatory) connection. The code is explicit about this: comments document it, and report headers say "RSS", not "heap".
Relative rendering cost, not absolute battery: The battery test computes FPS × avg frame time as a relative metric for comparing app states. It is explicitly documented as NOT an absolute mAh measurement.
Fail-fast on auth: If the app can't reach the home screen (stuck on login/onboarding), tests fail immediately with a clear error rather than silently profiling the wrong screen.
GC honesty: We don't pretend to force GC. The code pauses to give the runtime opportunities to collect, and documents that GC timing is not guaranteed.
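
To make the relative rendering cost concrete: FPS × average frame time (in ms) works out to milliseconds of rendering work per second of wall time, which is why it can rank app states but says nothing about mAh. The sketch below is illustrative arithmetic only. The function name `rendering_cost` and the assumption that "average frame time" means average build plus raster time are not taken from the test code.

```shell
#!/usr/bin/env bash
# Sketch only: relative rendering cost = FPS * average frame time (ms).
# The result reads as "ms of rendering work per second of wall time".

rendering_cost() {
  # $1 = total frames, $2 = duration in seconds, $3 = average frame time in ms
  awk -v frames="$1" -v secs="$2" -v avg_ms="$3" 'BEGIN {
    if (secs <= 0) { print "0.00"; exit }
    fps = frames / secs
    printf "%.2f\n", fps * avg_ms
  }'
}

rendering_cost 3600 60 4.0   # 60 fps * 4.0 ms, prints 240.00
rendering_cost 600 60 2.5    # 10 fps * 2.5 ms, prints 25.00
```

In this example the first state spends roughly ten times as much time rendering per second as the second, which is the kind of comparison the metric is meant for.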

Usage

bash app/scripts/run_performance_tests.sh [device_id] [options]
bash app/scripts/run_performance_tests.sh --quick          # smoke test
bash app/scripts/run_performance_tests.sh --duration 24    # 24h continuous
bash app/scripts/run_performance_tests.sh --test memory    # single test

Validation

dart analyze: 0 issues on all new files
dart format --line-length 120: formatted
No changes to existing files

Looking for a device tester 🤝

This code passes static analysis and is structurally complete, but we don't have an Omi device to run on. If you have a device and want to co-own this bounty, let us know — we'll iterate together on any runtime issues.

AI disclosure

This code was generated with Claude (Anthropic). Human-reviewed and refined through adversarial critique.

Test plan

Run dart analyze integration_test/memory_leak_test.dart integration_test/battery_drain_test.dart — should show 0 issues
Run bash -n app/scripts/run_performance_tests.sh — should show syntax OK
Run bash app/scripts/run_performance_tests.sh --test memory on a device with the app signed in
Run bash app/scripts/run_performance_tests.sh --quick for a full smoke test
Verify JSON + Markdown reports are generated in /tmp/omi_perf_reports/

🤖 Generated with Claude Code

Required by flutter drive to run integration tests on real devices.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

RSS-based memory tracking across navigation cycles with linear regression trend analysis. Detects monotonic growth patterns that indicate leaks. Includes isolated widget lifecycle stress tests (ListView, AnimationController, Image grid). Outputs JSON + CSV reports. Honestly documents that RSS != Dart heap and that GC cannot be forced. Fails loudly if app is stuck on login screen.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Profiles 5 app states (idle, scrolling, chat, typing, rapid nav) for 60s each using FrameTiming callbacks. Measures build/raster times at p50/p90/p99. Detects frame cost degradation over 10 rounds. Uses relative rendering cost metric (FPS * frame time) for comparing states; clearly documented as relative, not absolute mAh.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Orchestrates all 6 integration tests via flutter drive --profile. Supports --duration 24 for 24h continuous runs with hourly checkpoint reports. Generates JSON + Markdown reports. Supports --quick mode and --test <name> for individual tests.
Usage: bash app/scripts/run_performance_tests.sh [device_id] [options]
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

greptile-apps · 2026-04-09T08:27:20Z

Greptile Summary

This PR adds a performance test suite for the Omi Flutter app, comprising memory leak detection (memory_leak_test.dart), battery drain estimation (battery_drain_test.dart), and an orchestration shell script (run_performance_tests.sh), plus the required test_driver/integration_test.dart.

The shell script uses declare -A associative arrays, which require bash 4.0+ but macOS ships with bash 3.2, causing the script to fail immediately on macOS without a Homebrew-installed bash.
battery_drain_test.dart calls app.main() in both testWidgets blocks; the second invocation attempts to re-initialize Firebase, providers, and the full app while the process is already running.

Confidence Score: 4/5

Not safe to merge as-is; two P1 issues need resolution before reliable use on macOS or multi-test runs.

Two P1 findings: the bash 3.2 incompatibility will prevent the script from running at all on macOS, and the double app.main() call risks corrupting the second integration test's results or crashing the test process. Both are straightforward to fix. All remaining findings are P2 or lower.

Files needing attention: app/scripts/run_performance_tests.sh (bash 4+ dependency) and app/integration_test/battery_drain_test.dart (double app.main() in second testWidgets).

Vulnerabilities

No security concerns identified. The tests operate in a sandboxed test environment, write only to /tmp, and do not handle credentials or sensitive user data.

Important Files Changed

Filename	Overview
app/integration_test/battery_drain_test.dart	Adds battery drain / frame cost profiling tests; three issues: double `app.main()` call in second testWidgets block (P1), hardcoded 16ms jank threshold doesn't account for 120Hz devices, and `/tmp` output paths silently fail on iOS.
app/integration_test/memory_leak_test.dart	Adds RSS-based memory leak detection with linear regression; correctly isolated from the app binary in the second testWidgets block; `/tmp` write paths silently fail on iOS devices.
app/scripts/run_performance_tests.sh	Orchestration script uses `declare -A` associative arrays (bash 4.0+) which fail immediately on macOS default bash 3.2, blocking the primary developer use case.
app/test_driver/integration_test.dart	Minimal boilerplate driver file required by `flutter drive`; correct and complete.

Sequence Diagram

sequenceDiagram
    participant Dev as Developer
    participant Script as run_performance_tests.sh
    participant FlutterDrive as flutter drive
    participant MemTest as memory_leak_test.dart
    participant BatTest as battery_drain_test.dart
    participant App as Omi App (device)
    participant Tmp as /tmp/omi_perf_reports/

    Dev->>Script: bash run_performance_tests.sh
    Script->>Script: declare -A TESTS (requires bash 4+)
    Script->>FlutterDrive: flutter drive --target=memory_leak_test.dart --profile
    FlutterDrive->>MemTest: run testWidgets Detect heap growth
    MemTest->>App: app.main() to home screen
    loop 10 navigation cycles
        MemTest->>App: navigate screens, scroll, back
        MemTest->>MemTest: ProcessInfo.currentRss snapshot
    end
    MemTest->>MemTest: linear regression + R2 on RSS values
    MemTest->>Tmp: omi_memory_*.json + omi_memory_*.csv
    FlutterDrive->>BatTest: run testWidgets Profile CPU and frame cost
    BatTest->>App: app.main() already running P1
    loop 5 app states x 60s each
        BatTest->>App: idle / scroll / chat / typing / rapid nav
        BatTest->>BatTest: WidgetsBinding.addTimingsCallback FrameTiming
        BatTest->>BatTest: compute FPS x avgFrameTime renderingCost
    end
    BatTest->>Tmp: omi_battery_*.json
    Script->>Tmp: generate report_final.json + report_final.md
    Script->>Dev: summary N passed / N failed

Reviews (1): Last reviewed commit: "Add performance test runner script"

greptile-apps · 2026-04-09T08:27:23Z

app/scripts/run_performance_tests.sh

+TESTS[memory]="integration_test/memory_leak_test.dart"
+TESTS[battery]="integration_test/battery_drain_test.dart"
+TESTS[animation]="integration_test/animation_performance_test.dart"
+TESTS[shimmer]="integration_test/shimmer_cpu_test.dart"
+TESTS[rebuild]="integration_test/widget_rebuild_profiling_test.dart"


declare -A requires bash 4.0+ — fails on macOS default bash

macOS ships with bash 3.2 at /bin/bash (the last GPLv2-licensed release; bash 4.0 and later are GPLv3). declare -A (associative arrays) was added in bash 4.0, so the script aborts immediately on an unmodified macOS machine with:

./run_performance_tests.sh: line 141: declare: -A: invalid option

This blocks the primary intended use case. Replace associative arrays with positional parallel arrays (bash 3.2-compatible) or add an explicit version guard at the top:

Suggested change

-TESTS[memory]="integration_test/memory_leak_test.dart"
-TESTS[battery]="integration_test/battery_drain_test.dart"
-TESTS[animation]="integration_test/animation_performance_test.dart"
-TESTS[shimmer]="integration_test/shimmer_cpu_test.dart"
-TESTS[rebuild]="integration_test/widget_rebuild_profiling_test.dart"
+# Require bash 4+ for associative arrays (macOS ships with bash 3.2)
+if [[ "${BASH_VERSINFO[0]}" -lt 4 ]]; then
+  echo "ERROR: bash 4.0+ required. On macOS run: brew install bash" >&2
+  exit 1
+fi
+
+# Array of test files and their names
+declare -A TESTS
+TESTS[memory]="integration_test/memory_leak_test.dart"
+TESTS[battery]="integration_test/battery_drain_test.dart"
+TESTS[animation]="integration_test/animation_performance_test.dart"
+TESTS[shimmer]="integration_test/shimmer_cpu_test.dart"
+TESTS[rebuild]="integration_test/widget_rebuild_profiling_test.dart"
+TESTS[app]="integration_test/app_performance_test.dart"
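
The other option mentioned in this comment, parallel arrays, could look roughly like the sketch below. It is one possible bash 3.2-compatible shape, not code from the PR: the names `TEST_NAMES`, `TEST_FILES`, and `lookup_test` are hypothetical, while the test names and file paths are the ones listed in the diff.

```shell
#!/usr/bin/env bash
# Sketch only: bash 3.2-compatible stand-in for `declare -A TESTS`.
# Two indexed arrays kept in the same order act as a name -> path map.

TEST_NAMES=(memory battery animation shimmer rebuild app)
TEST_FILES=(
  "integration_test/memory_leak_test.dart"
  "integration_test/battery_drain_test.dart"
  "integration_test/animation_performance_test.dart"
  "integration_test/shimmer_cpu_test.dart"
  "integration_test/widget_rebuild_profiling_test.dart"
  "integration_test/app_performance_test.dart"
)

# lookup_test NAME: prints the test file for NAME; returns 1 if NAME is unknown.
lookup_test() {
  local i
  for i in "${!TEST_NAMES[@]}"; do
    if [ "${TEST_NAMES[$i]}" = "$1" ]; then
      echo "${TEST_FILES[$i]}"
      return 0
    fi
  done
  return 1
}

lookup_test memory   # prints integration_test/memory_leak_test.dart
```

The trade-off is a linear scan per lookup and the need to keep both arrays in the same order, which is negligible for six entries and avoids requiring a Homebrew bash.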

greptile-apps · 2026-04-09T08:27:24Z

app/integration_test/battery_drain_test.dart

+      app.main();
+      for (int i = 0; i < 50; i++) {
+        await tester.pump(const Duration(milliseconds: 100));
+      }
+


Double app.main() call in shared integration test process

Both testWidgets blocks in this file call app.main(). When flutter drive runs this file, all tests execute in the same process — Firebase, provider registrations, and the widget tree from the first test are still live when the second test starts. Calling app.main() a second time can throw FirebaseException, crash the provider graph, or silently produce invalid profiling data because two widget trees are competing.

The second test (Measure frame rendering cost over time) should rely on the already-running app (omit app.main()) or this test should be moved to its own file with a single testWidgets + app.main() call.

greptile-apps · 2026-04-09T08:27:25Z

app/integration_test/battery_drain_test.dart

+void _writeResultsToFile(List<_StateProfile> states) {
+  try {
+    final timestamp = DateTime.now().toIso8601String().replaceAll(':', '-');
+    final file = File('/tmp/omi_battery_$timestamp.json');
+
+    final buffer = StringBuffer();
+    buffer.writeln('{');
+    buffer.writeln('  "test": "battery_drain_estimation",');
+    buffer.writeln('  "timestamp": "${DateTime.now().toIso8601String()}",');
+    buffer.writeln('  "states": [');
+    for (int i = 0; i < states.length; i++) {
+      final s = states[i];
+      buffer.write('    {"name": "${s.name}", '
+          '"duration_seconds": ${s.durationSeconds}, '
+          '"total_frames": ${s.totalFrames}, '
+          '"fps": ${s.framesPerSecond.toStringAsFixed(2)}, '
+          '"avg_build_ms": ${s.avgBuildMs.toStringAsFixed(3)}, '
+          '"avg_raster_ms": ${s.avgRasterMs.toStringAsFixed(3)}, '
+          '"p50_build_ms": ${s.p50BuildMs.toStringAsFixed(3)}, '
+          '"p90_build_ms": ${s.p90BuildMs.toStringAsFixed(3)}, '
+          '"p99_build_ms": ${s.p99BuildMs.toStringAsFixed(3)}, '
+          '"janky_frames": ${s.jankyFrames}, '
+          '"janky_percent": ${s.jankyPercent.toStringAsFixed(2)}, '
+          '"rendering_cost_relative": ${s.renderingCost.toStringAsFixed(2)}}');
+      if (i < states.length - 1) buffer.writeln(',');
+    }
+    buffer.writeln('\n  ]');
+    buffer.writeln('}');
+    file.writeAsStringSync(buffer.toString());
+    debugPrint('Battery results saved to: ${file.path}');
+  } catch (e) {
+    debugPrint('Could not save results: $e');
+  }


Hardcoded /tmp paths silently fail on iOS devices

File('/tmp/omi_battery_$timestamp.json') is inaccessible inside the iOS app sandbox. The try/catch prevents a crash, but test results are silently discarded whenever these tests run on a real iOS device.

Use path_provider's getTemporaryDirectory() for cross-platform temp paths:

import 'package:path_provider/path_provider.dart';
// ...
final dir = await getTemporaryDirectory();
final file = File('${dir.path}/omi_battery_$timestamp.json');

The same issue applies to _writeFrameCostResults() in this file and both _writeResultsToFile()/_writeWidgetLeakResults() in memory_leak_test.dart.

greptile-apps · 2026-04-09T08:27:26Z

app/integration_test/battery_drain_test.dart

+    p50BuildMs: buildUs[buildUs.length ~/ 2] / 1000,
+    p90BuildMs: buildUs[(buildUs.length * 0.9).toInt()] / 1000,
+    p99BuildMs: buildUs[(buildUs.length * 0.99).toInt()] / 1000,
+    jankyFrames: timings.where((t) => t.buildDuration.inMilliseconds + t.rasterDuration.inMilliseconds > 16).length,


Jank threshold hardcoded to 16ms ignores 90/120Hz displays

> 16 ms catches frames that miss a 60 Hz budget, but Omi targets modern devices where 90 Hz (11.1 ms) and 120 Hz (8.3 ms) are common. On a ProMotion display, frames between 8.3–16 ms are janky but won't be counted here, making the jank percentage misleadingly low.

Suggested change

-    jankyFrames: timings.where((t) => t.buildDuration.inMilliseconds + t.rasterDuration.inMilliseconds > 16).length,
+    jankyFrames: timings
+        .where((t) => t.buildDuration.inMicroseconds + t.rasterDuration.inMicroseconds > 16667) // 16.67ms = 60Hz budget; 120Hz devices may still jank below this
+        .length,
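
Note that the suggested change still uses a fixed 60 Hz budget. As a worked illustration of the budgets quoted above, the sketch below derives the per-frame budget from a refresh rate and applies it to a frame's build + raster time. It is arithmetic only: the helper names are hypothetical, and how the Dart test would obtain the display's refresh rate is left open.

```shell
#!/usr/bin/env bash
# Sketch only: derive the jank threshold from the refresh rate
# instead of hardcoding the 60 Hz budget.

# frame_budget_us HZ: per-frame budget in whole microseconds (integer division).
frame_budget_us() {
  echo $(( 1000000 / $1 ))
}

# is_janky BUILD_US RASTER_US HZ: succeeds if build + raster exceeds the budget.
is_janky() {
  [ $(( $1 + $2 )) -gt "$(frame_budget_us "$3")" ]
}

frame_budget_us 60    # prints 16666
frame_budget_us 120   # prints 8333

# A 10 ms frame (6 ms build + 4 ms raster) fits a 60 Hz budget but not 120 Hz.
is_janky 6000 4000 60  && echo "janky at 60 Hz"
is_janky 6000 4000 120 && echo "janky at 120 Hz"   # prints: janky at 120 Hz
```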

human-pages-ai and others added 4 commits April 9, 2026 15:16

Add test_driver for flutter drive integration tests

af4663a


human-pages-ai mentioned this pull request Apr 9, 2026

omi mobile app performance tests ($300) #3858

Open

human-pages-ai wants to merge 4 commits into BasedHardware:main from human-pages-ai:perf-test-suite-3858
