Add performance test suite: memory leak detection + battery drain estimation #6495

Open
human-pages-ai wants to merge 4 commits into BasedHardware:main from human-pages-ai:perf-test-suite-3858

Conversation

@human-pages-ai

Summary

Resolves #3858 — adds a comprehensive performance test suite for the Omi Flutter app.

New files

  • integration_test/memory_leak_test.dart — Memory leak detection via RSS tracking across 10 navigation cycles. Uses linear regression with R² correlation to detect monotonic growth. Includes isolated widget lifecycle stress tests (ListView, AnimationController, Image grid — 20 create/destroy cycles each). Outputs JSON + CSV.

  • integration_test/battery_drain_test.dart — Profiles 5 app states (idle home, active scrolling, chat, typing indicator, rapid navigation) for 60s each using FrameTiming callbacks. Reports build/raster times at p50/p90/p99. Detects frame rendering cost degradation over 10 rounds of 30s each. Uses relative rendering cost metric for comparing states.

  • scripts/run_performance_tests.sh — Orchestrates all 6 integration tests (4 existing + 2 new) via flutter drive --profile. Supports --duration 24 for 24-hour continuous runs with hourly checkpoint reports. Generates JSON + Markdown reports. Quick mode (--quick) and single-test mode (--test memory).

  • test_driver/integration_test.dart — Required driver file for flutter drive.
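The trend check described for memory_leak_test.dart (a linear regression with R² over RSS samples) can be sketched in plain Dart. This is a minimal illustration under stated assumptions, not the PR's implementation: `Trend`, `fitTrend`, and the sample readings are hypothetical names and made-up values. Only `ProcessInfo.currentRss` is taken from the description above.

```dart
import 'dart:io' show ProcessInfo;

/// Result of a least-squares line fit (hypothetical helper type).
class Trend {
  final double slope; // growth per sample, in the unit of the samples
  final double rSquared; // 0..1, how well a straight line explains the data
  const Trend(this.slope, this.rSquared);
}

/// Fits y = a + b * x, where x is the sample index 0, 1, 2, ...
Trend fitTrend(List<num> samples) {
  final n = samples.length;
  if (n < 2) return const Trend(0.0, 0.0);
  final meanX = (n - 1) / 2;
  final meanY = samples.reduce((a, b) => a + b) / n;
  var sxy = 0.0, sxx = 0.0, syy = 0.0;
  for (var i = 0; i < n; i++) {
    final dx = i - meanX;
    final dy = samples[i].toDouble() - meanY;
    sxy += dx * dy;
    sxx += dx * dx;
    syy += dy * dy;
  }
  final slope = sxy / sxx;
  // For a one-variable fit, R² equals the squared correlation coefficient.
  final r2 = syy == 0 ? 0.0 : (sxy * sxy) / (sxx * syy);
  return Trend(slope, r2);
}

void main() {
  // One real reading, to show the sampling call itself.
  print('current RSS: ${ProcessInfo.currentRss} bytes');

  // Made-up RSS readings in MB, one per navigation cycle.
  final steady = fitTrend([100, 101, 99, 100, 101, 100]);
  final growing = fitTrend([100, 102, 104, 106, 108, 110]);
  print('steady:  slope=${steady.slope} r2=${steady.rSquared}');
  print('growing: slope=${growing.slope} r2=${growing.rSquared}');
}
```

A leak verdict would presumably require both a positive slope and a high R²; the thresholds the PR uses are not stated here, so none are assumed.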

Design decisions

  • RSS, not Dart heap: We use ProcessInfo.currentRss (process-level Resident Set Size) rather than VM Service heap snapshots. RSS is a coarser proxy, but it is available without an Observatory (VM Service) connection. The code documents this in comments, and report headers say "RSS", not "heap."
  • Relative rendering cost, not absolute battery: The battery test computes FPS × avg frame time as a relative metric for comparing app states. It is explicitly documented as NOT an absolute mAh measurement.
  • Fail-fast on auth: If the app can't reach the home screen (stuck on login/onboarding), tests fail immediately with a clear error rather than silently profiling the wrong screen.
  • GC honesty: We don't pretend to force GC. The code pauses to give the runtime opportunities to collect, and documents that GC timing is not guaranteed.
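To make the "FPS × avg frame time" metric from the second bullet concrete, here is a small sketch. The function name `renderingCost`, its parameters, and the choice to treat "frame time" as build plus raster time are assumptions for illustration; the PR's actual formula may differ in detail.

```dart
/// Relative rendering cost: frames per second times average frame time.
/// With frame time in milliseconds, the result reads as "milliseconds of
/// rendering work per second of wall-clock time". It is only meaningful
/// for comparing app states against each other, not as mAh.
double renderingCost({
  required int totalFrames,
  required int durationSeconds,
  required double avgBuildMs,
  required double avgRasterMs,
}) {
  if (durationSeconds <= 0) return 0.0;
  final fps = totalFrames / durationSeconds;
  // Assumption: "frame time" means build time plus raster time.
  return fps * (avgBuildMs + avgRasterMs);
}

void main() {
  // Made-up numbers: an idle screen that rarely repaints vs. a 60 fps scroll.
  final idle = renderingCost(
      totalFrames: 120, durationSeconds: 60, avgBuildMs: 1.0, avgRasterMs: 1.0);
  final scrolling = renderingCost(
      totalFrames: 3600, durationSeconds: 60, avgBuildMs: 4.0, avgRasterMs: 4.0);
  print('idle=$idle scrolling=$scrolling ratio=${scrolling / idle}');
}
```

Because the metric multiplies frame count by per-frame cost, an idle screen that stops scheduling frames scores near zero even if each of its rare frames is expensive.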

Usage

bash app/scripts/run_performance_tests.sh [device_id] [options]
bash app/scripts/run_performance_tests.sh --quick          # smoke test
bash app/scripts/run_performance_tests.sh --duration 24    # 24h continuous
bash app/scripts/run_performance_tests.sh --test memory    # single test

Validation

  • dart analyze: 0 issues on all new files
  • dart format --line-length 120: formatted
  • No changes to existing files

Looking for a device tester 🤝

This code passes static analysis and is structurally complete, but we don't have an Omi device to run on. If you have a device and want to co-own this bounty, let us know — we'll iterate together on any runtime issues.

AI disclosure

This code was generated with Claude (Anthropic). Human-reviewed and refined through adversarial critique.

Test plan

  • Run dart analyze integration_test/memory_leak_test.dart integration_test/battery_drain_test.dart — should show 0 issues
  • Run bash -n app/scripts/run_performance_tests.sh — should show syntax OK
  • Run bash app/scripts/run_performance_tests.sh --test memory on a device with the app signed in
  • Run bash app/scripts/run_performance_tests.sh --quick for a full smoke test
  • Verify JSON + Markdown reports are generated in /tmp/omi_perf_reports/

🤖 Generated with Claude Code

human-pages-ai and others added 4 commits April 9, 2026 15:16
Required by flutter drive to run integration tests on real devices.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

RSS-based memory tracking across navigation cycles with linear regression
trend analysis. Detects monotonic growth patterns that indicate leaks.
Includes isolated widget lifecycle stress tests (ListView,
AnimationController, Image grid). Outputs JSON + CSV reports.

Honestly documents that RSS != Dart heap and that GC cannot be forced.
Fails loudly if app is stuck on login screen.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Profiles 5 app states (idle, scrolling, chat, typing, rapid nav) for 60s
each using FrameTiming callbacks. Measures build/raster times at
p50/p90/p99. Detects frame cost degradation over 10 rounds.

Uses relative rendering cost metric (FPS * frame time) for comparing
states — clearly documented as relative, not absolute mAh.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Orchestrates all 6 integration tests via flutter drive --profile.
Supports --duration 24 for 24h continuous runs with hourly checkpoint
reports. Generates JSON + Markdown reports. Supports --quick mode and
--test <name> for individual tests.

Usage: bash app/scripts/run_performance_tests.sh [device_id] [options]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@greptile-apps
Contributor

greptile-apps bot commented Apr 9, 2026

Greptile Summary

This PR adds a performance test suite for the Omi Flutter app, comprising memory leak detection (memory_leak_test.dart), battery drain estimation (battery_drain_test.dart), and an orchestration shell script (run_performance_tests.sh), plus the required test_driver/integration_test.dart.

  • The shell script uses declare -A associative arrays, which require bash 4.0+ but macOS ships with bash 3.2, causing the script to fail immediately on macOS without a Homebrew-installed bash.
  • battery_drain_test.dart calls app.main() in both testWidgets blocks; the second invocation attempts to re-initialize Firebase, providers, and the full app while the process is already running.

Confidence Score: 4/5

Not safe to merge as-is; two P1 issues need resolution before reliable use on macOS or multi-test runs.

Two P1 findings: the bash 3.2 incompatibility will prevent the script from running at all on macOS, and the double app.main() call risks corrupting the second integration test's results or crashing the test process. Both are straightforward to fix. All remaining findings are P2 or lower.

app/scripts/run_performance_tests.sh (bash 4+ dependency) and app/integration_test/battery_drain_test.dart (double app.main() in second testWidgets)

Vulnerabilities

No security concerns identified. The tests operate in a sandboxed test environment, write only to /tmp, and do not handle credentials or sensitive user data.

Important Files Changed

| Filename | Overview |
| --- | --- |
| app/integration_test/battery_drain_test.dart | Adds battery drain / frame cost profiling tests; three issues: a double app.main() call in the second testWidgets block (P1), a hardcoded 16 ms jank threshold that does not account for 90/120 Hz devices, and /tmp output paths that silently fail on iOS. |
| app/integration_test/memory_leak_test.dart | Adds RSS-based memory leak detection with linear regression; correctly isolated from the app binary in the second testWidgets block; /tmp write paths silently fail on iOS devices. |
| app/scripts/run_performance_tests.sh | Orchestration script uses declare -A associative arrays (bash 4.0+), which fail immediately on the macOS default bash 3.2, blocking the primary developer use case. |
| app/test_driver/integration_test.dart | Minimal boilerplate driver file required by flutter drive; correct and complete. |

Sequence Diagram

sequenceDiagram
    participant Dev as Developer
    participant Script as run_performance_tests.sh
    participant FlutterDrive as flutter drive
    participant MemTest as memory_leak_test.dart
    participant BatTest as battery_drain_test.dart
    participant App as Omi App (device)
    participant Tmp as /tmp/omi_perf_reports/

    Dev->>Script: bash run_performance_tests.sh
    Script->>Script: declare -A TESTS (requires bash 4+)
    Script->>FlutterDrive: flutter drive --target=memory_leak_test.dart --profile
    FlutterDrive->>MemTest: run testWidgets Detect heap growth
    MemTest->>App: app.main() to home screen
    loop 10 navigation cycles
        MemTest->>App: navigate screens, scroll, back
        MemTest->>MemTest: ProcessInfo.currentRss snapshot
    end
    MemTest->>MemTest: linear regression + R2 on RSS values
    MemTest->>Tmp: omi_memory_*.json + omi_memory_*.csv
    FlutterDrive->>BatTest: run testWidgets Profile CPU and frame cost
    BatTest->>App: app.main() already running P1
    loop 5 app states x 60s each
        BatTest->>App: idle / scroll / chat / typing / rapid nav
        BatTest->>BatTest: WidgetsBinding.addTimingsCallback FrameTiming
        BatTest->>BatTest: compute FPS x avgFrameTime renderingCost
    end
    BatTest->>Tmp: omi_battery_*.json
    Script->>Tmp: generate report_final.json + report_final.md
    Script->>Dev: summary N passed / N failed

Reviews (1): Last reviewed commit: "Add performance test runner script"

Comment on lines +141 to +145
TESTS[memory]="integration_test/memory_leak_test.dart"
TESTS[battery]="integration_test/battery_drain_test.dart"
TESTS[animation]="integration_test/animation_performance_test.dart"
TESTS[shimmer]="integration_test/shimmer_cpu_test.dart"
TESTS[rebuild]="integration_test/widget_rebuild_profiling_test.dart"

P1 declare -A requires bash 4.0+ — fails on macOS default bash

macOS ships with bash 3.2 at /bin/bash (the last release licensed under GPLv2). declare -A (associative arrays) was added in bash 4.0, so the script aborts immediately on an unmodified macOS machine with:

./run_performance_tests.sh: line 141: declare: -A: invalid option

This blocks the primary intended use case. Replace associative arrays with positional parallel arrays (bash 3.2-compatible) or add an explicit version guard at the top:

Suggested change
- TESTS[memory]="integration_test/memory_leak_test.dart"
- TESTS[battery]="integration_test/battery_drain_test.dart"
- TESTS[animation]="integration_test/animation_performance_test.dart"
- TESTS[shimmer]="integration_test/shimmer_cpu_test.dart"
- TESTS[rebuild]="integration_test/widget_rebuild_profiling_test.dart"
+ # Require bash 4+ for associative arrays (macOS ships with bash 3.2)
+ if [[ "${BASH_VERSINFO[0]}" -lt 4 ]]; then
+   echo "ERROR: bash 4.0+ required. On macOS run: brew install bash" >&2
+   exit 1
+ fi
+ # Array of test files and their names
+ declare -A TESTS
+ TESTS[memory]="integration_test/memory_leak_test.dart"
+ TESTS[battery]="integration_test/battery_drain_test.dart"
+ TESTS[animation]="integration_test/animation_performance_test.dart"
+ TESTS[shimmer]="integration_test/shimmer_cpu_test.dart"
+ TESTS[rebuild]="integration_test/widget_rebuild_profiling_test.dart"
+ TESTS[app]="integration_test/app_performance_test.dart"

Comment on lines +171 to +175
app.main();
for (int i = 0; i < 50; i++) {
  await tester.pump(const Duration(milliseconds: 100));
}

P1 Double app.main() call in shared integration test process

Both testWidgets blocks in this file call app.main(). When flutter drive runs this file, all tests execute in the same process — Firebase, provider registrations, and the widget tree from the first test are still live when the second test starts. Calling app.main() a second time can throw FirebaseException, crash the provider graph, or silently produce invalid profiling data because two widget trees are competing.

The second test (Measure frame rendering cost over time) should rely on the already-running app (omit app.main()) or this test should be moved to its own file with a single testWidgets + app.main() call.
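One way to express "initialise once per process" is a small guard around the start-up call. The sketch below is plain Dart and purely illustrative: `ensureAppStarted` and `fakeAppMain` are hypothetical names, and `fakeAppMain` stands in for `app.main()`. Whether the widget tree from the first `testWidgets` block is still usable in the second depends on the test binding, which this sketch does not address; it only shows how repeated initialisation can be avoided.

```dart
bool _appStarted = false;

/// Runs [start] only the first time it is called in this process.
/// Returns true if this call actually invoked [start].
bool ensureAppStarted(void Function() start) {
  if (_appStarted) return false;
  _appStarted = true;
  start();
  return true;
}

void main() {
  var initCount = 0;
  // Stand-in for app.main(); the real call would set up Firebase, providers, etc.
  void fakeAppMain() {
    initCount++;
  }

  ensureAppStarted(fakeAppMain); // first testWidgets block
  ensureAppStarted(fakeAppMain); // second testWidgets block: skipped
  print('app initialised $initCount time(s)');
}
```

Moving the second test into its own file, as the comment above also suggests, avoids shared state entirely and may be the simpler option.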

Comment on lines +487 to +519
void _writeResultsToFile(List<_StateProfile> states) {
  try {
    final timestamp = DateTime.now().toIso8601String().replaceAll(':', '-');
    final file = File('/tmp/omi_battery_$timestamp.json');

    final buffer = StringBuffer();
    buffer.writeln('{');
    buffer.writeln(' "test": "battery_drain_estimation",');
    buffer.writeln(' "timestamp": "${DateTime.now().toIso8601String()}",');
    buffer.writeln(' "states": [');
    for (int i = 0; i < states.length; i++) {
      final s = states[i];
      buffer.write(' {"name": "${s.name}", '
          '"duration_seconds": ${s.durationSeconds}, '
          '"total_frames": ${s.totalFrames}, '
          '"fps": ${s.framesPerSecond.toStringAsFixed(2)}, '
          '"avg_build_ms": ${s.avgBuildMs.toStringAsFixed(3)}, '
          '"avg_raster_ms": ${s.avgRasterMs.toStringAsFixed(3)}, '
          '"p50_build_ms": ${s.p50BuildMs.toStringAsFixed(3)}, '
          '"p90_build_ms": ${s.p90BuildMs.toStringAsFixed(3)}, '
          '"p99_build_ms": ${s.p99BuildMs.toStringAsFixed(3)}, '
          '"janky_frames": ${s.jankyFrames}, '
          '"janky_percent": ${s.jankyPercent.toStringAsFixed(2)}, '
          '"rendering_cost_relative": ${s.renderingCost.toStringAsFixed(2)}}');
      if (i < states.length - 1) buffer.writeln(',');
    }
    buffer.writeln('\n ]');
    buffer.writeln('}');
    file.writeAsStringSync(buffer.toString());
    debugPrint('Battery results saved to: ${file.path}');
  } catch (e) {
    debugPrint('Could not save results: $e');
  }

P2 Hardcoded /tmp paths silently fail on iOS devices

File('/tmp/omi_battery_$timestamp.json') is inaccessible inside the iOS app sandbox. The try/catch prevents a crash, but test results are silently discarded whenever these tests run on a real iOS device.

Use path_provider's getTemporaryDirectory() for cross-platform temp paths:

import 'package:path_provider/path_provider.dart';
// ...
final dir = await getTemporaryDirectory();
final file = File('${dir.path}/omi_battery_$timestamp.json');

The same issue applies to _writeFrameCostResults() in this file and both _writeResultsToFile()/_writeWidgetLeakResults() in memory_leak_test.dart.
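As a possible alternative to the plugin call, `dart:io` exposes `Directory.systemTemp`, which is synchronous and so would not require turning the existing `void` writer into an `async` function (the `await getTemporaryDirectory()` form does). The sketch below combines that with `dart:convert` so that quoting and escaping are handled by the encoder rather than by hand-built strings. `writeReport` and the sample payload are hypothetical, and whether `Directory.systemTemp` resolves to a writable sandbox directory on a physical iOS device is an assumption that would need checking on hardware.

```dart
import 'dart:convert';
import 'dart:io';

/// Writes [report] as indented JSON into the platform temp directory and
/// returns the file that was written.
File writeReport(String prefix, Map<String, Object?> report) {
  final timestamp = DateTime.now().toIso8601String().replaceAll(':', '-');
  final file = File('${Directory.systemTemp.path}/${prefix}_$timestamp.json');
  // JsonEncoder escapes quotes and control characters in string values.
  file.writeAsStringSync(const JsonEncoder.withIndent('  ').convert(report));
  return file;
}

void main() {
  final file = writeReport('omi_battery', {
    'test': 'battery_drain_estimation',
    'states': [
      // The embedded quotes would break naive string concatenation.
      {'name': 'idle "home"', 'fps': 2.0, 'janky_frames': 0},
    ],
  });
  final decoded = jsonDecode(file.readAsStringSync()) as Map<String, dynamic>;
  print('wrote ${file.path} with ${(decoded['states'] as List).length} state(s)');
  file.deleteSync(); // clean up the demo file
}
```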

p50BuildMs: buildUs[buildUs.length ~/ 2] / 1000,
p90BuildMs: buildUs[(buildUs.length * 0.9).toInt()] / 1000,
p99BuildMs: buildUs[(buildUs.length * 0.99).toInt()] / 1000,
jankyFrames: timings.where((t) => t.buildDuration.inMilliseconds + t.rasterDuration.inMilliseconds > 16).length,

P2 Jank threshold hardcoded to 16ms ignores 90/120Hz displays

The > 16 ms check catches frames that miss a 60 Hz budget, but 90 Hz (11.1 ms) and 120 Hz (8.3 ms) displays are common on the modern devices Omi targets. On a ProMotion display, frames that take between 8.3 and 16 ms are janky but are not counted here, which makes the jank percentage misleadingly low.

Suggested change
- jankyFrames: timings.where((t) => t.buildDuration.inMilliseconds + t.rasterDuration.inMilliseconds > 16).length,
+ jankyFrames: timings
+     .where((t) => t.buildDuration.inMicroseconds + t.rasterDuration.inMicroseconds > 16667) // 16.67ms = 60Hz budget; 120Hz devices may still jank below this
+     .length,
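The suggested change still assumes 60 Hz. A refresh-rate-aware variant could derive the budget from the display rate instead. The sketch below is a hedged illustration in plain Dart: `frameBudgetMicros` and `countJanky` are hypothetical helpers, the frame totals are made up, and the refresh rate is passed in as a parameter (in a Flutter test it might come from `Display.refreshRate` in `dart:ui`, which is not exercised here). Working in microseconds also avoids a side effect of the original line: `Duration.inMilliseconds` truncates, so summing two truncated values can undercount a frame by almost 2 ms.

```dart
/// Frame budget in microseconds for a display running at [refreshRateHz].
int frameBudgetMicros(double refreshRateHz) => (1e6 / refreshRateHz).round();

/// Counts frames whose total (build + raster) time exceeds the budget.
int countJanky(List<int> frameTotalsMicros, double refreshRateHz) {
  final budget = frameBudgetMicros(refreshRateHz);
  return frameTotalsMicros.where((total) => total > budget).length;
}

void main() {
  // Made-up build + raster totals in microseconds.
  const totals = [5000, 9000, 12000, 17000, 20000];
  for (final hz in [60.0, 90.0, 120.0]) {
    print('$hz Hz: budget ${frameBudgetMicros(hz)} us, '
        '${countJanky(totals, hz)} janky of ${totals.length}');
  }
}
```

With the same frame data, the jank count rises as the refresh rate rises, which is the effect the review comment describes.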

Development

Successfully merging this pull request may close these issues.

omi mobile app performance tests ($300)