We’re back with another update on Patrol: Patrol 2.0! During the last 2 months, we focused on shipping a new, truly 10x feature: test bundling. In short, test bundling is a new, advanced compatibility layer between the flutter_test package and native testing frameworks.
Test bundling fixes many long-standing problems with end-to-end UI testing in Flutter and unlocks use cases that weren't possible before. Read on to learn more about the issue and how we approached it.
💡Note
Our journey today starts with the integration_test plugin. Paraphrasing its tagline:
This package enables self-driving testing of Flutter apps on devices and emulators. It adapts flutter_test results into a format that is compatible with native Android and iOS instrumentation testing.
This sounds serious, but it’s actually a thin layer on top of the flutter_test package.
Let’s consider a widget test:
import 'package:flutter_test/flutter_test.dart';

void main() {
  testWidgets('sign up flow', (WidgetTester tester) async {
    // test code omitted
  });
}
Placing this widget test in the integration_test directory (instead of the test directory) makes it no longer a widget test – it’s now an integration test. The code stays the same as if you were writing a widget test, and all APIs from flutter_test can still be used.
To run that test, the flutter test integration_test command must be used (instead of flutter test). Running that command generates a temporary file containing, among lots of glue code, a call to IntegrationTestWidgetsFlutterBinding.ensureInitialized(). That’s why you don’t have to call that method at the beginning of integration tests.
💡Bindings
Actually, IntegrationTestWidgetsFlutterBinding is the only public Dart API exposed by integration_test. But since it’s a plugin, there’s some native code in it. What does it do?
In short, the native part of the integration_test plugin makes it possible to integrate with native tooling and to get Dart test results in a native format. You may rightfully ask: why would I want that?
I’m certain that an answer to this question deserves its own paragraph.
iOS and Android have existed for the past 15 years, and the huge developer communities gathered around them have created tons of useful software. That also includes many great testing-related tools and platforms. Cloud device farms (Firebase Test Lab, AWS Device Farm, emulator.wtf, BrowserStack), open-source test runners (Flank, Marathon), test frameworks (way too many!), report generators to every format imaginable – these are just a few examples of the amazing tooling that developers of native mobile apps have at their disposal.
This extensive and well-established ecosystem offers huge advantages. It’s also important for enterprise clients, who often have large infrastructure built on top of it, with custom in-house tooling.
But there’s a problem: all this amazing tooling works only with native test frameworks. For example, for UI tests to run on Firebase Test Lab, they must be written using JUnit (on Android) or XCTest (on iOS).
This means that Flutter developers cannot easily tap into that mature testing ecosystem because they don’t write tests in either of those native frameworks. Instead, they use the official flutter_test package. You can address this problem in two ways:
As you’ve probably already realized, the first solution is infeasible, so let’s go with the bridge. Thanks to it, Flutter apps could use the existing native test tooling without that tooling needing to support Flutter explicitly.
I think this is exactly what integration_test would be in a perfect world. But we don’t live in a perfect world.
The integration_test plugin has many problems, big and small. What’s also worrisome is that it hasn’t received any significant improvements since its release in Flutter 2.0 in March 2021.
So, what’s wrong with integration_test?
Let's consider an app with 3 integration tests:
integration_test
├── sign_up_test.dart
├── location_test.dart
└── sign_out_test.dart
Running flutter test integration_test builds the application 3 times, once for every integration test file:
> sign_up_test.dart
  > Build app
  > Install app
  > Run app and execute tests
  > Kill app
  > Uninstall app
> location_test.dart
  > Build app
  > Install app
  > Run app and execute tests
  > Kill app
  > Uninstall app
> sign_out_test.dart
  > Build app
  > Install app
  > Run app and execute tests
  > Kill app
  > Uninstall app
This is unnecessarily slow because the full app build is performed for each of the 3 Dart test files, even though the only difference in inputs to these builds is a single integration test file. As the number of tests grows, the time it takes to execute them also increases, and there’s no way around that. But requiring a full app rebuild for every test file tips the scale, making the tests’ total build and execution time unbearably long.
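To put rough numbers on this, here is a small back-of-the-envelope sketch. The timings (120 s build, 10 s install, 30 s run per file) are made up purely for illustration, and the kill and uninstall steps are ignored; the point is only that one cost is multiplied by the number of test files while the other is paid once.

```java
class BuildCostSketch {
    /** One full build, install, and run per test file. */
    static long perFileSeconds(int testFiles, long buildSeconds, long installSeconds, long runSeconds) {
        return testFiles * (buildSeconds + installSeconds + runSeconds);
    }

    /** One build and install shared by all test files; only the run time scales. */
    static long singleBuildSeconds(int testFiles, long buildSeconds, long installSeconds, long runSeconds) {
        return buildSeconds + installSeconds + testFiles * runSeconds;
    }

    public static void main(String[] args) {
        // Hypothetical timings, chosen only for illustration.
        long build = 120, install = 10, run = 30;
        System.out.println("per-file builds: " + perFileSeconds(3, build, install, run) + " s");     // 480 s
        System.out.println("single build:    " + singleBuildSeconds(3, build, install, run) + " s"); // 220 s
    }
}
```

With these assumed numbers, three test files cost 480 seconds when every file triggers its own build and 220 seconds when the build happens once. The gap widens with every test file added.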
Our first idea to solve this problem of unnecessary builds was to bundle tests together. We generated the integration_test/bundled_test.dart file and filled it with references to other tests in the integration_test directory, which we obtained by walking that directory:
import 'package:test/test.dart';

import 'notifications_test.dart' as notifications_test;
import 'permission_location_test.dart' as permission_location_test;

void main() {
  group('notifications_test.dart', notifications_test.main);
  group('permission_location_test.dart', permission_location_test.main);
}
This worked and fixed the problem of unnecessary builds. Now we could run flutter test integration_test/bundled_test.dart once, and all tests were built into a single app binary.
But after a while, we found out that this “primitive” test bundling approach had a major flaw. Take a look at this snippet:
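import 'dart:io';

import 'package:patrol/patrol.dart';
// ExampleApp comes from the app under test; its import is omitted.

void main() {
  patrolTest(
    'some test',
    ($) async {
      await $.pumpWidgetAndSettle(ExampleApp());
      await $('some button').tap();
      // ... omitted more test code
      exit(1); // kills the app
    },
  );
}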
which leads us to the next problem…
If something really bad happens to the app under test and it crashes (simulated by the exit() in the snippet above), subsequent tests don’t execute, and no test report will be available. The call to exit() in the snippet above might look off (after all, you never use it in Flutter apps), but it’s here just for demo purposes. If you want a more realistic example, imagine a native crash occurring in the app, resulting in a dreaded App Not Responding dialog.
A crash like this has fatal consequences for the tests.
In Flutter, the tests are built into and run inside the app; since the app died, the tests have also died! No subsequent tests will be executed, and there will be no report since it’s generated at the end of the test run, but the test run crashed.
But a crash is not the only danger of the primitive test bundling approach. Remember, all tests run in the same process. You are in charge of ensuring no state is shared, for example by resetting global variables and ensuring plugins are not initialized more than once.
Sharding means splitting the test suite across many workers (shards), which execute tests in parallel, reducing the total time it takes for a test suite to finish running. It also helps reveal implicit dependencies between tests because usually, tests are split into shards randomly, so there are no guarantees about the order in which they'll be executed.
But because of how integration_test is implemented, sharding is broken.
The integration_test plugin creates native tests only after all Dart tests execute, so there's no way to shard them - they just don't exist at the time when sharding happens!
This problem is not as bad as the previous one, but it's still *bad*. It makes running even medium-sized test suites infeasible because of how long it takes to execute them.
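A minimal sketch of why this breaks, in plain Java. It assumes a runner that assigns each test to a shard by hashing its name (real runners may use a different rule), and the class and method names are hypothetical. The important part is the input: a shard can only pick from the tests that already exist when the split happens.

```java
import java.util.ArrayList;
import java.util.List;

class ShardingSketch {
    /** Returns the tests that the given shard should run. */
    static List<String> testsForShard(List<String> allTests, int shardIndex, int numShards) {
        List<String> selected = new ArrayList<>();
        for (String test : allTests) {
            // Assumption: tests are assigned to shards by hashing their name.
            if (Math.floorMod(test.hashCode(), numShards) == shardIndex) {
                selected.add(test);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<String> knownUpFront = List.of("sign_up_test", "location_test", "sign_out_test");
        // What a runner sees when native tests are created only after the Dart tests ran.
        List<String> notYetCreated = List.of();

        for (int shard = 0; shard < 2; shard++) {
            System.out.println("shard " + shard + ", tests known: " + testsForShard(knownUpFront, shard, 2));
            System.out.println("shard " + shard + ", tests unknown: " + testsForShard(notYetCreated, shard, 2));
        }
    }
}
```

When the test list is known up front, every test lands in exactly one shard. When the list is empty at sharding time, every shard ends up with nothing to run, no matter how the split is computed.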
Compared to the previous problems, this one is merely annoying.
Consider a simple Dart test file:
import 'package:flutter_test/flutter_test.dart';
// MyApp comes from the app under test; its import is omitted.

void main() {
  testWidgets('alpha test', (WidgetTester tester) async {
    await tester.pumpWidget(const MyApp());
    await Future.delayed(const Duration(seconds: 10));
  });

  testWidgets('bravo test', (WidgetTester tester) async {
    await tester.pumpWidget(const MyApp());
    await Future.delayed(const Duration(seconds: 10));
  });
}
Each of these tests takes about 10 seconds to execute. Unfortunately, that’s not how their run times are reported. The first test’s duration is reported to be a few hundred milliseconds, and the subsequent ones finish instantly:
Why's that?
The cause is the same as before – native tests are created only after Dart tests finish running. When the native part of integration_test receives the results of Dart tests, it creates native tests out of them. But these test cases are simply stubs – their execution is finished immediately after creation. That's why the run times are reported incorrectly.
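The stub effect can be imitated in a few lines of plain Java. This is only an illustration with hypothetical names, not the plugin's code: a "native test" that merely replays a stored outcome finishes almost immediately, so whatever measures it never sees the 10 seconds the Dart test really took.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class StubDurationSketch {
    /** Outcome of a Dart test that has already finished (hypothetical shape). */
    static final class DartResult {
        final boolean passed;
        final long realDurationMillis;

        DartResult(boolean passed, long realDurationMillis) {
            this.passed = passed;
            this.realDurationMillis = realDurationMillis;
        }
    }

    /** Wraps a finished result in a "native test" that only replays the stored outcome. */
    static Runnable toNativeStub(String name, DartResult result) {
        return () -> {
            if (!result.passed) {
                throw new AssertionError(name + " failed in Dart");
            }
        };
    }

    /** Measures a native test from the start to the end of the call. */
    static long measureMillis(Runnable nativeTest) {
        long start = System.nanoTime();
        try {
            nativeTest.run();
        } catch (AssertionError e) {
            // A real framework would record the failure here.
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        Map<String, DartResult> finished = new LinkedHashMap<>();
        finished.put("alpha test", new DartResult(true, 10_000));
        finished.put("bravo test", new DartResult(true, 10_000));

        for (Map.Entry<String, DartResult> entry : finished.entrySet()) {
            long reported = measureMillis(toNativeStub(entry.getKey(), entry.getValue()));
            System.out.println(entry.getKey() + ": real " + entry.getValue().realDurationMillis
                    + " ms, reported " + reported + " ms");
        }
    }
}
```

The sketch does not reproduce the few hundred milliseconds reported for the first test; it only shows why a stub's measured duration has nothing to do with the real one.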
💡 See it yourself
Now that we know what the problem is, we can sketch out the acceptance criteria for a solution: tests must be completely isolated from one another to prevent flakiness and remove implicit dependencies between them.
That’s why we named this approach primitive test bundling. The idea was spot on, but the flaws in implementation disqualified it.
After lots of thinking and workshops, we realized that it was impossible to fix the flaws of primitive test bundling in pure Flutter and Dart. We had to drop down to the native level. That’s how advanced test bundling was born.
Wait, what? This is an article about UI testing, and now we’re talking about accessibility?
Yes – because how accessibility works in Flutter is similar to how we implemented advanced test bundling.
Let’s think about how Android and iOS can display accessibility information over Flutter widgets which, well, are Flutter widgets: they exist only in the Flutter part of the app. Android and iOS don’t have the slightest idea what a “Flutter widget” is.
In other words, how is it that when you run this simple code with TalkBack/VoiceOver enabled and tap on the blue rectangle, you’ll hear “Late nights in the middle of June”?
import 'package:flutter/material.dart';

class MyApp extends StatelessWidget {
  const MyApp({super.key});

  @override
  Widget build(BuildContext context) {
    return MaterialApp(
      home: Center(
        child: Semantics(
          label: 'Late nights in the middle of June',
          child: Container(
            width: 100,
            height: 100,
            color: Colors.blueAccent,
          ),
        ),
      ),
    );
  }
}
This is made possible by a component called accessibility bridge. It’s part of the Flutter Engine, and there’s a separate implementation for every operating system supported by Flutter (because all platforms have different accessibility APIs). The accessibility bridge receives semantics information from the Flutter framework and translates it into a format the operating system can understand. Then the accessibility information is laid on top of Flutter widgets.
This sounds simple in principle, but since Flutter supports many operating systems, and each differs slightly, there are many edge cases. This is done by some hardcore hacking of native accessibility frameworks, but it works reliably, thanks to the incredible engineering done by Google.
Here’s a drawing I sketched out to visualize this process:
💡 Accessibility bridge
At some point, I realized that what we need to fix the problems above is a component similar to the accessibility bridge, but for tests. After all, the situation is fairly similar: a similar concept (a test) exists in both Flutter and native, but there’s no link between them.
Fortunately, since we’re only focusing on Android and iOS (though that’ll probably change in the future), our case is much simpler than Flutter’s accessibility bridge.
On both mobile platforms, end-to-end UI tests work similarly. There are always 2 apps involved:
The instrumentation app runs first and starts executing tests one by one. The first thing each test does is start the app under test. Then, the actual test begins – tapping, entering text, assertions, and so on.
Test suite state lives in the instrumentation process, safe from any fatal crashes that may occur in the app process.
💡Differences
Now that we know what tools are available, here’s how I imagined a new test bridge would work:
What’s important is that there are no native tests at compile time - they are only created at runtime in step 2. We refer to this process as the “dynamic creation of tests”.
💡Test is a blurry term
I came to call this approach advanced test bundling. The name test bundling stayed because it still accurately describes what’s going on (the Dart file bundling all other Dart tests is still generated), but the internals are completely different.
Another drawing, this time visualizing how an improved version of integration_test should work:
💡Nothing fancy, after all
You already understand the why and what of test bundling. Now we’re getting to the coolest part - implementation.
Let’s look under the hood and see what parts our new test bridge consists of and how they play together.
I started the implementation of advanced test bundling with Dart, and right off the bat, I faced a problem. The gist is that package:test doesn’t allow for retrieving the test suite structure before it starts executing, but this is exactly what we need. If you’d like to learn more about this issue, I reported it to the dart-lang/test repository.
We worked around this problem by using non-public APIs of the package:test_api, which is a dependency of package:test. It’s not a perfect solution, but it’s pretty simple, and it works. Also, the APIs we depend on are fairly stable, even though they’re not public. The workaround is to create a special patrol_test_explorer test case which runs first and retrieves the test suite structure. It gets this information from the global Invoker object, which is an internal API from the test_api package.
The code for this lives in the integration_test/bundled_test.dart file. Remember that Patrol CLI automatically generates this file during patrol test or patrol build, and the whole process is transparent to the developer.
// This whole file is generated automatically by Patrol CLI at build time.

import 'dart:async';

import 'package:test/test.dart';

// Internal API imports. Not nice, booo.
import 'package:test_api/src/backend/invoker.dart';
import 'package:test_api/src/backend/group.dart';

// Imports of other tests in the integration_test directory.
import 'notifications_test.dart' as notifications_test;
import 'permission_location_test.dart' as permission_location_test;

Future<void> main() async {
  final testExplorationCompleter = Completer<Group>();

  // This test case runs first and captures the structure of the whole suite.
  test('patrol_test_explorer', () {
    final topLevelGroup = Invoker.current!.liveTest.groups.first;
    testExplorationCompleter.complete(topLevelGroup);
  });

  group('notifications_test.dart', notifications_test.main);
  group('permission_location_test.dart', permission_location_test.main);

  final Group topGroup = await testExplorationCompleter.future;
  // At this point, we have test suite structure!
  // Later, we serve it over gRPC so the native part can query it.
}
Once this problem was fixed, work on the native part could begin.
💡 Omission alert!
This article is already much longer than I planned, so I decided to leave it out. Solutions to these problems also depend on accessing some internal test suite state through Invoker.
Here’s a typical “sign-in” UI test written using first-party Android tools. It’s defined at compile time, the testing framework is JUnit4 coupled with AndroidJUnitRunner instrumentation runner, and UI interactions and assertions are done using the very popular Espresso library:
// Imports of JUnit4, AndroidX Test, and Espresso are omitted for brevity.
@RunWith(AndroidJUnit4.class)
public class ExampleTest {
    @Before
    public void launchActivity() {
        ActivityScenario.launch(MainActivity.class);
    }

    @Test
    public void signIn() {
        onView(withId(R.id.editTextUsername)).perform(typeText("charlie_root"));
        onView(withId(R.id.editTextPassword)).perform(typeText("ny4ncat"));
        onView(withId(R.id.buttonSignIn)).perform(click());
        onView(withId(R.id.textViewWelcomeMessage)).check(matches(isDisplayed()));
    }
}
This is not easily adaptable to what we need in our advanced test bundling approach. Here, tests are defined at compile time, but we have to generate them dynamically (i.e., at runtime) from Dart tests. Adding methods to a class at runtime is hard to do in JVM/ART, so that’s a no-go.
Fortunately, the JUnit4 library has the Parameterized runner. It lets us define a test case once and run it multiple times, feeding it new data each time. Here’s a simple example showing how the + (addition) operation could be tested in a calculator application with the help of the Parameterized runner:
// Imports of JUnit4, java.util.Arrays, and Espresso are omitted for brevity.
@RunWith(Parameterized.class)
public class CalculatorTest {
    @Parameters
    public static Iterable<Object[]> testCases() {
        return Arrays.asList(new Object[][]{
            {0, 0, 0},
            {2, 2, 4},
            {3, 2, 5},
            {3, 3, 6}
        });
    }

    private final int firstNum;
    private final int secondNum;
    private final int sum;

    public CalculatorTest(int firstNum, int secondNum, int sum) {
        this.firstNum = firstNum;
        this.secondNum = secondNum;
        this.sum = sum;
    }

    @Test
    public void addTwoNumbers() {
        // omitted code that starts the activity
        onView(withId(R.id.firstNumEditText)).perform(typeText(Integer.toString(firstNum)));
        onView(withId(R.id.secondNumEditText)).perform(typeText(Integer.toString(secondNum)));
        onView(withId(R.id.sumTextView)).check(matches(withText(Integer.toString(sum))));
    }
}
There are 4 tests, and a new app process is started for each of them to achieve isolation. During each test, an instance of CalculatorTest is created, and then the addTwoNumbers() method is called. It’s all taken care of by the Parameterized runner.
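To make "one instance per row" less magical, here is a simplified imitation of that loop in plain Java, using reflection and no JUnit at all. It is not how JUnit is actually implemented, and AdditionCase and runAll() are hypothetical names; the sketch only shows the pattern of building a fresh object from each row of data and then calling the same test method on it.

```java
import java.lang.reflect.Constructor;
import java.util.Arrays;
import java.util.List;

class ParameterizedSketch {
    /** A plain class standing in for a parameterized test class. */
    static class AdditionCase {
        private final int firstNum;
        private final int secondNum;
        private final int sum;

        public AdditionCase(int firstNum, int secondNum, int sum) {
            this.firstNum = firstNum;
            this.secondNum = secondNum;
            this.sum = sum;
        }

        /** Rows of data; each row becomes the constructor arguments of one test instance. */
        static List<Object[]> testCases() {
            return Arrays.asList(new Object[][]{{0, 0, 0}, {2, 2, 4}, {3, 2, 5}, {3, 3, 6}});
        }

        void addTwoNumbers() {
            if (firstNum + secondNum != sum) {
                throw new AssertionError(firstNum + " + " + secondNum + " != " + sum);
            }
        }
    }

    /** Creates one fresh instance per row, calls the test method, and returns how many rows passed. */
    static int runAll() {
        int passed = 0;
        try {
            Constructor<AdditionCase> constructor =
                    AdditionCase.class.getDeclaredConstructor(int.class, int.class, int.class);
            for (Object[] row : AdditionCase.testCases()) {
                AdditionCase instance = constructor.newInstance(row);
                try {
                    instance.addTwoNumbers();
                    passed++;
                } catch (AssertionError e) {
                    System.out.println("FAILED " + Arrays.toString(row) + ": " + e.getMessage());
                }
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("could not create a test instance", e);
        }
        return passed;
    }

    public static void main(String[] args) {
        System.out.println(runAll() + " of " + AdditionCase.testCases().size() + " cases passed");
    }
}
```

The data decides how many tests exist, not the number of methods in the class. That property is what makes this runner a good fit for tests that are only known at runtime.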
Watch the video below
The list of tests to execute is returned by the testCases() method. Inside of it, we make a gRPC call to get Dart tests defined in the app under test:
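@RunWith(Parameterized.class)
public class ExampleTest {
    @Parameters
    public static Iterable<Object[]> testCases() {
        List<String> dartTests = new ArrayList<>();
        // Omitted: make a gRPC call to the Flutter part of the app and fill dartTests.

        // The Parameterized runner expects one Object[] per test case.
        List<Object[]> parameters = new ArrayList<>();
        for (String dartTest : dartTests) {
            parameters.add(new Object[]{dartTest});
        }
        return parameters;
    }
    // ...
}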
It's worth mentioning that the CalculatorTest constructor and the testCases() method are actually called 5 times. That's because Android Test Orchestrator (a component responsible for executing tests) makes an initial run to collect the tests, so it can execute them later.
There was a little problem, though: by default, AndroidJUnitRunner doesn't start the app during the initial run. It only starts the instrumentation and inspects what tests are defined in it. This makes sense for a typical Android app, where the tests are known at compile time, but it breaks in the case of a Flutter app. That's because, in Flutter, tests are defined inside the app, so the app must be run to collect these tests. AndroidJUnitRunner also has a few unwanted behaviors. The most annoying one is the lifecycle, which isn't easily configurable and results in the app being killed immediately after the initial test run, before Dart tests are retrieved.
We fixed these problems by creating our own PatrolJUnitRunner. It extends from AndroidJUnitRunner, but changes the unwanted behaviors and adds new ones, such as starting the app during the initial test run.
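The overall flow can be sketched in plain Java with an in-memory stand-in for the app. Everything here is hypothetical (FakeFlutterApp, discover(), execute()); the real thing talks to the app over gRPC and lets the test runner do the looping. The sketch only shows the order of events: the app has to be started once to list its Dart tests, and then once more for every test that was found.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DynamicTestsSketch {
    /** Hypothetical in-memory stand-in for the Flutter app under test. */
    static class FakeFlutterApp {
        private final Map<String, Boolean> dartTests = new LinkedHashMap<>();
        int launches = 0;

        FakeFlutterApp() {
            dartTests.put("notifications_test", true);
            dartTests.put("permission_location_test", false); // pretend this one fails
        }

        void launch() {
            launches++;
        }

        List<String> listDartTests() {
            return new ArrayList<>(dartTests.keySet());
        }

        boolean runDartTest(String name) {
            return dartTests.get(name);
        }
    }

    /** Initial run: the app must be started before the tests it contains can be listed. */
    static List<Object[]> discover(FakeFlutterApp app) {
        app.launch();
        List<Object[]> rows = new ArrayList<>();
        for (String name : app.listDartTests()) {
            rows.add(new Object[]{name});
        }
        return rows;
    }

    /** Execution: one fresh launch per discovered test, each with its own result. */
    static Map<String, Boolean> execute(FakeFlutterApp app, List<Object[]> rows) {
        Map<String, Boolean> report = new LinkedHashMap<>();
        for (Object[] row : rows) {
            String name = (String) row[0];
            app.launch();
            report.put(name, app.runDartTest(name));
        }
        return report;
    }

    public static void main(String[] args) {
        FakeFlutterApp app = new FakeFlutterApp();
        Map<String, Boolean> report = execute(app, discover(app));
        System.out.println("report: " + report + ", app launches: " + app.launches);
    }
}
```

Because every test gets its own launch and its own entry in the report, a failure in one test does not prevent the others from running or being reported.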
This is all quite complicated, and if you’re still reading, you might feel (rightfully!) overwhelmed. But here’s the good news - you don’t have to understand any of this to use Patrol – we’ve managed to mask the complexity. The setup process is simple, and we have a guide explaining in detail what to do.
---------------------
Result? Deep and seamless integration with Android’s existing testing stack. Here’s a video of running the test suite of Patrol’s example app. It has over a dozen tests, and a few of them are currently failing (Our bad – but actually, this is nice for demo purposes!).
Watch the video below
Getting advanced test bundling to work with XCTest on iOS was relatively straightforward, much more so than on Android. We had a good starting point – existing code in integration_test – that had to be adapted instead of rewritten from scratch.
How end-to-end UI tests execute on iOS is arguably simpler than on Android. Again, let’s start with a vanilla “sign in” test written in Swift using the XCTest framework provided by Apple.
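import XCTest

class ExampleTest: XCTestCase {
    func testLogin() {
        let app = XCUIApplication()
        app.launch()

        // tap() returns nothing, so focusing and typing are separate calls.
        let usernameField = app.textFields["Username"]
        usernameField.tap()
        usernameField.typeText("charlie_root")

        let passwordField = app.secureTextFields["Password"]
        passwordField.tap()
        passwordField.typeText("ny4ncat")

        app.buttons["Sign in"].tap()

        let welcomeLabel = app.staticTexts["WelcomeLabel"]
        XCTAssertEqual(welcomeLabel.label, "Welcome, charlie_root!")
    }
}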
But as you already know, we can’t have tests defined at compile time. Good news: the XCTest framework supports dynamic creation of test cases by overriding the testInvocations() method of XCTestCase. Not necessarily good news: it’s only available for use in Objective-C.
That’s because testInvocations(), while similar to JUnit’s Parameterized runner, works very differently, and doesn’t play well with statically compiled Swift code. Here’s the signature of testInvocations():
@property(class, readonly, copy) NSArray<NSInvocation *> *testInvocations;
What matters to us the most is its return type. It’s a list (yes, NSArray is a list) of pointers to methods to execute, roughly speaking. But testInvocations() doesn’t provide any obvious way to create these methods. It leaves us with what Objective-C Runtime can do.
Fortunately, Objective-C Runtime can do literally everything. It’s an absolutely crazy powerful tool. It allows for all kinds of unusual things impossible in other programming languages – creating classes, interfaces, and methods, changing superclasses, replacing implementations of built-in classes and methods, and using Apple’s private frameworks. All. At. Runtime.
Let’s see how we used a fraction of its power to implement a bridge between XCTest and Flutter’s built-in test framework.
@import XCTest;

// In Objective-C, every class has an interface. This one is a subclass of XCTestCase.
@interface ParametrizedTests : XCTestCase
@end

@implementation ParametrizedTests

// Plus sign in method's signature means it's attached to a class rather than the instance.
+ (NSArray<NSInvocation *> *)testInvocations {
  // Create a list for storing Dart tests that will be retrieved later from the Flutter app.
  NSMutableArray<NSString *> *dartTestFiles = [[NSMutableArray alloc] init];

  // Start the Flutter app. This is the initial run during which Dart tests are retrieved.
  [[[XCUIApplication alloc] init] launch];

  // Omitted: Create a gRPC client.
  // Omitted: Query the gRPC server running in the Flutter app for Patrol tests defined in Dart.

  // Create a list of references to methods – "invocations".
  NSMutableArray<NSInvocation *> *invocations = [[NSMutableArray alloc] init];

  for (NSUInteger i = 0; i < dartTestFiles.count; i++) {
    NSString *testName = dartTestFiles[i];
    // Omitted: wizardry with Objective-C Runtime:
    // 1. Dynamic creation of test case method signatures and bodies. The name is testName.
    // 2. Adding the newly created method to the current class.
    // 3. Creating an "invocation" for the method and appending it to the "invocations" array.
  }

  return invocations;
}

@end
Once testInvocations() returns with the list of test case methods, our involvement ends – now it’s up to XCTest to execute them.
💡 Objective-C no more!
---------------------
Results?
Watch the video below
And since the dynamically generated test cases are just that – native XCTest cases, all goodies that Xcode has are available – including detailed test reports:
Watch the video below
Unfortunately, there’s also a darker side to iOS’s tooling being simpler than Android’s – it’s less advanced. Some nice features available out-of-the-box in Android Test Orchestrator don’t exist on iOS.
A great example is the Android Test Orchestrator’s clearPackageData flag. Enabling it causes all of the app’s data to be cleared in between individual test case runs. Even if some data is saved to the app’s local storage, it’ll be automatically cleared at the end of the test. Subsequent tests will start from a blank slate, helping to achieve full isolation between test runs. There’s no equivalent functionality on iOS, so you have to clear the app’s NSUserDefaults and other data by adding code to the test’s start-up or tear-down logic.
More problems come from severe restrictions, or rather a ban, on JIT compilation on actual iOS devices, present since iOS 14. It doesn’t apply to iOS Simulators. If you’ve been a Flutter developer for some time now, you’ve surely seen this screen after opening a debug Flutter app from iPhone’s home screen:
It’s painful for Flutter apps, which JIT-compile code all the time in debug mode. Apart from jailbreak, there’s exactly one escape hatch allowing apps to execute JIT-compiled code. This escape hatch is running the app with a debugger attached. That’s what happens during flutter run - the lldb debugger is attached to the running Flutter app. But when the debugger’s detached – say, flutter run process exits or the app is killed by the user through App Switcher – it can’t be opened again by tapping its icon on the home screen.
But how does this affect Patrol? Surely we can do the same thing that flutter run does – attach lldb and call it a day? Unfortunately, it’s not that simple.
[[[XCUIApplication alloc] init] launch];
Remember this line from the snippet above? It’s the same as tapping the app’s icon on the home screen. Meaning, it doesn’t work for Flutter apps built in debug mode. We could try attaching and detaching the debugger at the start of every dynamically created test case, but it would only work with the patrol test command, while one of Patrol’s coolest features is the patrol build command. It produces artifacts that can be uploaded to cloud device farms (think Firebase Test Lab). The problem is there’s no debugger there. Game over.
The workaround is to run tests on apps built in release mode, but it has its problems.
Building an app binary in release mode takes much more time than in debug mode. It also removes all asserts from code and removes all debugging facilities present in debug mode, such as Dart DevTools and Dart Observatory. This impedes debugging of integration tests; for example, flutter logs doesn’t work because the code that makes it work is optimized out during a release build.
Still, even with all these problems, Patrol’s advanced test bundling approach on physical iOS devices is worth the hassle. First and foremost, much less time is spent in the build phase – UI tests are inherently slow to execute, but because of integration_test, they were also unnecessarily slow to build. Together with full-blown test reports, correct test run durations, and process-level isolation between them, it makes for a nice upgrade over integration_test.
Patrol 2.0, with its new test bundling feature, fixes the long-standing problems that made end-to-end UI testing in Flutter a huge pain. It wasn't an easy thing – it required a few weeks of diving deep into Dart, Android, and iOS testing frameworks, rewiring internals, and sometimes hacking around problems using private APIs – but it was fun!
There’s still room for improvement, of course. No software is ever finished (maybe except for Apollo 11 computer source code), and Patrol is no different. We gather and track all problems with the test bundling feature in this issue.
And that’s it for this article. I hope you liked it and have learned something along the way. Thanks for reading, and if you're as excited about Patrol as we are and want to support us:
And as always - if you encounter bugs or have interesting feature requests, don’t hesitate to create an issue, a discussion, or even better – a pull request!
Also, now you don’t have any more excuses not to write a few tests for the app you’re working on. So what are you waiting for? Start using Patrol now!