Add performance test suite: memory leak detection + battery drain estimation #6495
human-pages-ai wants to merge 4 commits into BasedHardware:main
Required by flutter drive to run integration tests on real devices. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
RSS-based memory tracking across navigation cycles with linear regression trend analysis. Detects monotonic growth patterns that indicate leaks. Includes isolated widget lifecycle stress tests (ListView, AnimationController, Image grid). Outputs JSON + CSV reports. Honestly documents that RSS != Dart heap and that GC cannot be forced. Fails loudly if app is stuck on login screen. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Profiles 5 app states (idle, scrolling, chat, typing, rapid nav) for 60s each using FrameTiming callbacks. Measures build/raster times at p50/p90/p99. Detects frame cost degradation over 10 rounds. Uses relative rendering cost metric (FPS * frame time) for comparing states — clearly documented as relative, not absolute mAh. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Orchestrates all 6 integration tests via flutter drive --profile. Supports --duration 24 for 24h continuous runs with hourly checkpoint reports. Generates JSON + Markdown reports. Supports --quick mode and --test <name> for individual tests. Usage: bash app/scripts/run_performance_tests.sh [device_id] [options] Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Greptile Summary
This PR adds a performance test suite for the Omi Flutter app, comprising memory leak detection (
Confidence Score: 4/5
Not safe to merge as-is; two P1 issues need resolution before reliable use on macOS or multi-test runs.
Two P1 findings: the bash 3.2 incompatibility will prevent the script from running at all on macOS, and the double app.main() call risks corrupting the second integration test's results or crashing the test process. Both are straightforward to fix. All remaining findings are P2 or lower.
app/scripts/run_performance_tests.sh (bash 4+ dependency) and app/integration_test/battery_drain_test.dart (double app.main() in second testWidgets)
| Filename | Overview |
|---|---|
| app/integration_test/battery_drain_test.dart | Adds battery drain / frame cost profiling tests; three issues: a double app.main() call in the second testWidgets block (P1), a hardcoded 16ms jank threshold that doesn't account for 120Hz devices, and /tmp output paths that silently fail on iOS. |
| app/integration_test/memory_leak_test.dart | Adds RSS-based memory leak detection with linear regression; correctly isolated from the app binary in the second testWidgets block; /tmp write paths silently fail on iOS devices. |
| app/scripts/run_performance_tests.sh | Orchestration script uses declare -A associative arrays (bash 4.0+) which fail immediately on macOS default bash 3.2, blocking the primary developer use case. |
| app/test_driver/integration_test.dart | Minimal boilerplate driver file required by flutter drive; correct and complete. |
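The trend check that the table attributes to memory_leak_test.dart (linear regression plus R² over RSS samples) can be illustrated offline with a small shell helper. This is only a sketch under stated assumptions: the function name rss_trend and the two-column `cycle,rss_bytes` input layout are hypothetical and are not taken from the PR's actual CSV schema or Dart implementation.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the PR): ordinary least-squares fit of
# RSS against cycle index, reporting the slope and R^2.
# Assumed input on stdin: lines of "cycle,rss_bytes"; non-numeric lines
# (for example a header row) are skipped.
rss_trend() {
  awk -F',' '
    $1 ~ /^[0-9.]+$/ && $2 ~ /^[0-9.]+$/ {
      n++; sx += $1; sy += $2
      sxx += $1 * $1; syy += $2 * $2; sxy += $1 * $2
    }
    END {
      if (n < 2) { print "need at least 2 samples" > "/dev/stderr"; exit 1 }
      dx = n * sxx - sx * sx   # n^2 times the variance of x
      dy = n * syy - sy * sy   # n^2 times the variance of y
      c  = n * sxy - sx * sy   # n^2 times the covariance of x and y
      if (dx == 0) { print "cycle values are constant" > "/dev/stderr"; exit 1 }
      slope = c / dx
      r2 = (dy == 0) ? 0 : (c * c) / (dx * dy)
      printf "slope=%.2f r2=%.2f\n", slope, r2
    }'
}
```

For example, `printf '1,100\n2,110\n3,120\n' | rss_trend` prints `slope=10.00 r2=1.00`: a steady 10-byte-per-cycle rise that fits a line perfectly. A positive slope combined with an R² close to 1 is the kind of monotonic growth pattern the test treats as a leak signal; a positive slope with a low R² is more likely noise.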
Sequence Diagram
sequenceDiagram
participant Dev as Developer
participant Script as run_performance_tests.sh
participant FlutterDrive as flutter drive
participant MemTest as memory_leak_test.dart
participant BatTest as battery_drain_test.dart
participant App as Omi App (device)
participant Tmp as /tmp/omi_perf_reports/
Dev->>Script: bash run_performance_tests.sh
Script->>Script: declare -A TESTS (requires bash 4+)
Script->>FlutterDrive: flutter drive --target=memory_leak_test.dart --profile
FlutterDrive->>MemTest: run testWidgets Detect heap growth
MemTest->>App: app.main() to home screen
loop 10 navigation cycles
MemTest->>App: navigate screens, scroll, back
MemTest->>MemTest: ProcessInfo.currentRss snapshot
end
MemTest->>MemTest: linear regression + R2 on RSS values
MemTest->>Tmp: omi_memory_*.json + omi_memory_*.csv
FlutterDrive->>BatTest: run testWidgets Profile CPU and frame cost
BatTest->>App: app.main() already running P1
loop 5 app states x 60s each
BatTest->>App: idle / scroll / chat / typing / rapid nav
BatTest->>BatTest: WidgetsBinding.addTimingsCallback FrameTiming
BatTest->>BatTest: compute FPS x avgFrameTime renderingCost
end
BatTest->>Tmp: omi_battery_*.json
Script->>Tmp: generate report_final.json + report_final.md
Script->>Dev: summary N passed / N failed
Reviews (1): Last reviewed commit: "Add performance test runner script"
TESTS[memory]="integration_test/memory_leak_test.dart"
TESTS[battery]="integration_test/battery_drain_test.dart"
TESTS[animation]="integration_test/animation_performance_test.dart"
TESTS[shimmer]="integration_test/shimmer_cpu_test.dart"
TESTS[rebuild]="integration_test/widget_rebuild_profiling_test.dart"
declare -A requires bash 4.0+ — fails on macOS default bash
macOS ships bash 3.2 at /bin/bash (the last GPLv2-licensed release). declare -A (associative arrays) was added in bash 4.0, so the script aborts immediately on an unmodified macOS machine with:
./run_performance_tests.sh: line 141: declare: -A: invalid option
This blocks the primary intended use case. Replace associative arrays with positional parallel arrays (bash 3.2-compatible) or add an explicit version guard at the top:
- TESTS[memory]="integration_test/memory_leak_test.dart"
- TESTS[battery]="integration_test/battery_drain_test.dart"
- TESTS[animation]="integration_test/animation_performance_test.dart"
- TESTS[shimmer]="integration_test/shimmer_cpu_test.dart"
- TESTS[rebuild]="integration_test/widget_rebuild_profiling_test.dart"
+ # Require bash 4+ for associative arrays (macOS ships with bash 3.2)
+ if [[ "${BASH_VERSINFO[0]}" -lt 4 ]]; then
+   echo "ERROR: bash 4.0+ required. On macOS run: brew install bash" >&2
+   exit 1
+ fi
+ # Array of test files and their names
+ declare -A TESTS
+ TESTS[memory]="integration_test/memory_leak_test.dart"
+ TESTS[battery]="integration_test/battery_drain_test.dart"
+ TESTS[animation]="integration_test/animation_performance_test.dart"
+ TESTS[shimmer]="integration_test/shimmer_cpu_test.dart"
+ TESTS[rebuild]="integration_test/widget_rebuild_profiling_test.dart"
+ TESTS[app]="integration_test/app_performance_test.dart"
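The review also names parallel arrays as the alternative to a version guard. A minimal sketch of that option follows, assuming the six test names and paths listed in this comment; the array names TEST_NAMES and TEST_FILES and the helper test_file_for are hypothetical and do not appear in the PR.

```shell
#!/usr/bin/env bash
# Sketch: bash 3.2-compatible stand-in for `declare -A TESTS`.
# Two indexed arrays are kept in the same order, so index i pairs a test
# name with its file. Names here are illustrative, not from the PR.
TEST_NAMES=(memory battery animation shimmer rebuild app)
TEST_FILES=(
  "integration_test/memory_leak_test.dart"
  "integration_test/battery_drain_test.dart"
  "integration_test/animation_performance_test.dart"
  "integration_test/shimmer_cpu_test.dart"
  "integration_test/widget_rebuild_profiling_test.dart"
  "integration_test/app_performance_test.dart"
)

# test_file_for NAME: print the file registered for NAME.
# Returns 1 (and prints nothing) when NAME is unknown.
test_file_for() {
  local i
  for ((i = 0; i < ${#TEST_NAMES[@]}; i++)); do
    if [[ "${TEST_NAMES[$i]}" == "$1" ]]; then
      printf '%s\n' "${TEST_FILES[$i]}"
      return 0
    fi
  done
  return 1
}
```

With this in place, `test_file_for memory` prints `integration_test/memory_leak_test.dart`, and a lookup such as `test_file_for nope` returns a non-zero status that the script can use to reject an unknown `--test` argument. The trade-off is a linear scan per lookup, which is negligible for six entries and avoids asking macOS users to install a newer bash.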
app.main();
for (int i = 0; i < 50; i++) {
  await tester.pump(const Duration(milliseconds: 100));
}
Double app.main() call in shared integration test process
Both testWidgets blocks in this file call app.main(). When flutter drive runs this file, all tests execute in the same process — Firebase, provider registrations, and the widget tree from the first test are still live when the second test starts. Calling app.main() a second time can throw FirebaseException, crash the provider graph, or silently produce invalid profiling data because two widget trees are competing.
The second test (Measure frame rendering cost over time) should rely on the already-running app (omit app.main()) or this test should be moved to its own file with a single testWidgets + app.main() call.
void _writeResultsToFile(List<_StateProfile> states) {
  try {
    final timestamp = DateTime.now().toIso8601String().replaceAll(':', '-');
    final file = File('/tmp/omi_battery_$timestamp.json');

    final buffer = StringBuffer();
    buffer.writeln('{');
    buffer.writeln(' "test": "battery_drain_estimation",');
    buffer.writeln(' "timestamp": "${DateTime.now().toIso8601String()}",');
    buffer.writeln(' "states": [');
    for (int i = 0; i < states.length; i++) {
      final s = states[i];
      buffer.write(' {"name": "${s.name}", '
          '"duration_seconds": ${s.durationSeconds}, '
          '"total_frames": ${s.totalFrames}, '
          '"fps": ${s.framesPerSecond.toStringAsFixed(2)}, '
          '"avg_build_ms": ${s.avgBuildMs.toStringAsFixed(3)}, '
          '"avg_raster_ms": ${s.avgRasterMs.toStringAsFixed(3)}, '
          '"p50_build_ms": ${s.p50BuildMs.toStringAsFixed(3)}, '
          '"p90_build_ms": ${s.p90BuildMs.toStringAsFixed(3)}, '
          '"p99_build_ms": ${s.p99BuildMs.toStringAsFixed(3)}, '
          '"janky_frames": ${s.jankyFrames}, '
          '"janky_percent": ${s.jankyPercent.toStringAsFixed(2)}, '
          '"rendering_cost_relative": ${s.renderingCost.toStringAsFixed(2)}}');
      if (i < states.length - 1) buffer.writeln(',');
    }
    buffer.writeln('\n ]');
    buffer.writeln('}');
    file.writeAsStringSync(buffer.toString());
    debugPrint('Battery results saved to: ${file.path}');
  } catch (e) {
    debugPrint('Could not save results: $e');
  }
Hardcoded /tmp paths silently fail on iOS devices
File('/tmp/omi_battery_$timestamp.json') is inaccessible inside the iOS app sandbox. The try/catch prevents a crash, but test results are silently discarded whenever these tests run on a real iOS device.
Use path_provider's getTemporaryDirectory() for cross-platform temp paths:
import 'package:path_provider/path_provider.dart';
// ...
final dir = await getTemporaryDirectory();
final file = File('${dir.path}/omi_battery_$timestamp.json');
The same issue applies to _writeFrameCostResults() in this file and to both _writeResultsToFile() and _writeWidgetLeakResults() in memory_leak_test.dart.
p50BuildMs: buildUs[buildUs.length ~/ 2] / 1000,
p90BuildMs: buildUs[(buildUs.length * 0.9).toInt()] / 1000,
p99BuildMs: buildUs[(buildUs.length * 0.99).toInt()] / 1000,
jankyFrames: timings.where((t) => t.buildDuration.inMilliseconds + t.rasterDuration.inMilliseconds > 16).length,
Jank threshold hardcoded to 16ms ignores 90/120Hz displays
> 16 ms catches frames that miss a 60 Hz budget, but Omi targets modern devices where 90 Hz (11.1 ms) and 120 Hz (8.3 ms) are common. On a ProMotion display, frames between 8.3–16 ms are janky but won't be counted here, making the jank percentage misleadingly low.
- jankyFrames: timings.where((t) => t.buildDuration.inMilliseconds + t.rasterDuration.inMilliseconds > 16).length,
+ jankyFrames: timings
+     .where((t) => t.buildDuration.inMicroseconds + t.rasterDuration.inMicroseconds > 16667) // 16.67ms = 60Hz budget; 120Hz devices may still jank below this
+     .length,
Summary
Resolves #3858 — adds a comprehensive performance test suite for the Omi Flutter app.
New files
- integration_test/memory_leak_test.dart: Memory leak detection via RSS tracking across 10 navigation cycles. Uses linear regression with R² (coefficient of determination) to detect monotonic growth. Includes isolated widget lifecycle stress tests (ListView, AnimationController, Image grid; 20 create/destroy cycles each). Outputs JSON + CSV.
- integration_test/battery_drain_test.dart: Profiles 5 app states (idle home, active scrolling, chat, typing indicator, rapid navigation) for 60s each using FrameTiming callbacks. Reports build/raster times at p50/p90/p99. Detects frame rendering cost degradation over 10 rounds of 30s each. Uses a relative rendering cost metric for comparing states.
- scripts/run_performance_tests.sh: Orchestrates all 6 integration tests (4 existing + 2 new) via flutter drive --profile. Supports --duration 24 for 24-hour continuous runs with hourly checkpoint reports. Generates JSON + Markdown reports. Quick mode (--quick) and single-test mode (--test memory).
- test_driver/integration_test.dart: Required driver file for flutter drive.
Design decisions
- Memory tracking uses ProcessInfo.currentRss (process-level Resident Set Size) rather than VM Service heap snapshots. RSS is a coarser proxy but is available without an Observatory connection. The code is honest about this: it is documented in comments, and report headers say "RSS", not "heap."
- Rendering cost uses FPS × avg frame time as a relative metric for comparing app states. It is explicitly documented as NOT an absolute mAh measurement.
Usage
bash app/scripts/run_performance_tests.sh [device_id] [options]
Validation
- dart analyze: 0 issues on all new files
- dart format --line-length 120: formatted
Looking for a device tester 🤝
This code passes static analysis and is structurally complete, but we don't have an Omi device to run on. If you have a device and want to co-own this bounty, let us know — we'll iterate together on any runtime issues.
AI disclosure
This code was generated with Claude (Anthropic). Human-reviewed and refined through adversarial critique.
Test plan
- dart analyze integration_test/memory_leak_test.dart integration_test/battery_drain_test.dart (should show 0 issues)
- bash -n app/scripts/run_performance_tests.sh (should show syntax OK)
- bash app/scripts/run_performance_tests.sh --test memory on a device with the app signed in
- bash app/scripts/run_performance_tests.sh --quick for a full smoke test
- /tmp/omi_perf_reports/
🤖 Generated with Claude Code