Device drivers are a substantial part of the Linux kernel, accounting for about
66% of the project’s lines of code. Testing these components is fundamental to
building confidence in the operation of GNU/Linux systems under diverse
workloads. Testing is so essential that it is considered part of the software
development cycle. However, testing device drivers can be hard because of
obstacles such as the lack of a user-space interface, architecture dependence,
the need for custom configuration symbols, and so on. This fact prompts the
question: how are device drivers being tested?
Answering this question is the main goal of my master’s research project at the
Institute of Mathematics and Statistics of the University of São Paulo
(IME-USP). To provide a comprehensive answer, I resorted to a systematic mapping
study and a grey literature review. With the former, I gathered information from
peer-reviewed articles published in academic venues, while the latter provided
data from online publications sponsored by reputable organizations and
magazines.
There is a variety of ways to test software, though. One may feed a program
with all possible inputs (random testing / fuzzing), run only a few parts of the
program (unit testing, integration testing), run a program under extreme
conditions (stress testing), analyze the code semantics (semantic checks, static
analysis), or apply any of many other software testing techniques. This breadth
follows from what software testing is: “the process of determining if a program
behaves as expected” [1], or “an activity in which a system or component is
executed under specified conditions, the results are observed or recorded, and
an evaluation is made of some aspect of the system or component” [2].
It is arguable that sometimes we don’t even need to run a program to estimate
whether it’s going to work as desired. If we relax our concept of software
testing a little, we may consider code review a sort of testing as well: “Code
walkthrough, also known as peer code review, is used to review code and may be
considered as a static testing technique.”
[1] However, assessing every
such testing practice would be a cumbersome task, one that would not fit in a
master’s program. So, to limit the scope of this investigative work, we (my
advisors and I) decided to look only at a subset of the tools used for kernel
testing: precisely, the tools that enable device driver testing.
This article presents a discussion about some of the tools employed to test
Linux device drivers. The tools addressed here are the ones most cited by
publications assessed in our mapping study and grey literature review.
Nevertheless, these tools do not portray all solutions available for kernel
testing, nor do they encompass all the possible approaches for driver testing.
The details of our research methods are available
here.
Our answer to the question of how Linux drivers are being tested is going to
stem from the assessment of the tools that are said to enable driver testing. In
fact, we derived a couple of subquestions to guide us throughout this
investigation: What testing tools are being used by the Linux community to test
the kernel? What are the main features of these tools? As we need to get to know
the testing apparatus to evaluate their role in assessing the functioning of
device drivers, we decided to make a catalog to synthesize the information about
these testing tools. Thus, one of our goals is to catalog the available Linux
kernel testing tools, their characteristics, how they work, and in what contexts
they are used.
Besides, I’d like to give something back to the kernel community. For that, I
intend to use the testing tool catalog to provide advice to fellow kernel
developers interested in enhancing their workflow with the use of testing tools.
Also, there is a Kernel Testing Guide [3]
documentation page that could benefit from our work. Moreover, a mailing list
thread [4] indicates that there
is a desire for a more complete testing guide.
A few excerpts from that thread illustrate the point:
- “Thank you for writing this much needed document.”
- “Thanks, Shuah: I hope I haven’t misrepresented kselftest too much. :-)”
- “Looks great. How about adding a section for Static analysis tools? A mention of coccicheck scripts and mention of smatch?”
- “Good idea. I agree it’d be great to have such a section, though I doubt I’m the most qualified person to write it. If no one else picks it up, though, I can try to put a basic follow-up patch together when I’ve got some time.”
To build the proposed test tool catalog, we carried out an evaluation process to
assess the usage of each testing tool selected throughout our study. We searched
each project’s repository, looked through its documentation for instructions on
how to install and use the tool, installed it, and, finally, made basic use of
it. Moreover, we reached out by email to the authors whenever we faced setbacks
in using those testing tools. We also reviewed the commit history of some
projects and their corresponding mailing lists between 2022-01-12 and
2022-01-24.
Let us finally talk about those testing tools.
What are the options?
To some extent, several tools can facilitate device driver testing. It’s nearly
impossible to analyze them all. Yet, this post covers the twenty Linux kernel
testing tools selected by our study for being either focused on driver testing
or most cited by online publications. These tools make up a heterogeneous group
of test solutions comprising diverse features and testing techniques. From unit
testing to end-to-end testing, dynamic or static analysis, many ways of putting
Linux to the test have been conceived. The following table outlines the test
types and the tools associated with them.
The checks match tests and tools according to what we found expressly reported
in the literature, with a few complementary marks added by me. Also, some
testing types overlap with each other. For instance, unit testing may be
considered a sort of regression testing, fuzzing is also a form of end-to-end
testing, and fault injection tools often instrument the source code to trigger
error paths. Thus, it’s quite possible that some tools are not marked with all
the types of tests they can provide. If you think that’s the case, feel free to
leave a comment at the end of the page. Improvement suggestions are greatly
appreciated.
| Test Type | Tools |
|---|---|
| Unit testing (2) | Kselftest, KUnit |
| Regression testing (1) | Kselftest |
| Stress testing (2) | Kselftest, LTP |
| Fuzz testing (3) | Trinity, Syzkaller, ktest |
| Reliability testing (1) | LTP |
| Robustness testing (1) | LTP |
| Stability testing (1) | LTP |
| End-to-end testing (3) | Kselftest, ktest, jstest |
| Build testing (2) | ktest, TuxMake |
| Static analysis (3) | Smatch, Coccinelle, Sparse |
| Symbolic execution (1) | SymDrive |
| Fault injection (1) | FAUmachine, ADFI, EH-Test |
| Code Instrumentation (2) | ADFI, COD |
| Concolic Execution (1) | COD |
| Local analysis and grouping (1) | Troll |
To give an overview of how easy (or not) it is to use these tools, I made a
chart of setup versus usage effort. First, I defined a set of tasks that
comprise the work of setting up a test tool. Then, I counted how many of these
tasks I carried out while setting up each of the considered tools. A test tool
got a point if I had to:
- A) install system packages (+1), because it was needed to install packages
other than the ones required to build the Linux kernel.
- B) download and build a source code repository (+1), because it was needed to
download the source code and build it locally.
- C) create a VM (+1), because the tests were potentially destructive and could
cause problems to the running system.
- D) configure a VM (+1), because it was needed to do additional VM
configuration such as installing packages, enabling ssh, messing with GRUB,
etc.
- E) write/edit configuration files (+1), because it was needed to create or
modify configuration files for the tests to run.
Finally, I gave five points to the tools I could not set up after trying all the
above. The effort ratings were set to None for tools with no effort points, Low
for tools with one or two points, Moderate for tools with three or four points,
and High for tools with five points.
The classification of usage effort took into account if:
- R) the tests can run as an ordinary application program (+1).
- S) the tests have to run inside a VM (+1), due to risk of compromising the running system.
- T) the tests require a high amount of CPU or memory to run (+1).
In some cases, I couldn’t get test results because I could not run the tests
myself or the test results were unavailable to me. Those tools got three effort
points. Lastly, the usage effort ratings were set to None, Low, Moderate, and
High for tools with zero, one, two, and three points, respectively.
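As a minimal sketch of how those points map to the ratings used below (the thresholds are exactly the ones just described; the function names are mine, and the snippet is only illustrative):

```c
#include <stdio.h>

/* Map setup-effort points (0-5) to the ratings used in the chart. */
static const char *setup_rating(int points)
{
	if (points == 0)
		return "None";
	if (points <= 2)
		return "Low";
	if (points <= 4)
		return "Moderate";
	return "High"; /* five points, including tools I could not set up */
}

/* Usage effort: zero, one, two, or three points. */
static const char *usage_rating(int points)
{
	static const char *names[] = { "None", "Low", "Moderate", "High" };

	return names[points < 0 ? 0 : (points > 3 ? 3 : points)];
}

int main(void)
{
	/* Example: a tool with setup tasks A, B, E and usage traits R, S. */
	printf("setup: %s, usage: %s\n", setup_rating(3), usage_rating(2));
	return 0;
}
```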
Figure 1. Setup and usage effort of Linux kernel testing tools.
The next table shows the exact traits that led me to classify the testing
tools the way they appear in the above picture. The cases where I was unable to
set up or run a tool are indicated with a U. Also, there are two slight
adjustments I’d like to note. First, I gave an additional D to ktest because it
took me almost three days to set up everything needed to run it. Second, I added
two extra points to FAUmachine since its documentation was completely outdated
and unusable.
| Tool | Setup Effort Points | Usage Effort Points |
|---|---|---|
| Kselftest | A | R |
| Trinity | B, C | R, S |
| Syzkaller | A, B, E | R, S |
| LTP | B, C, D | R, S |
| ktest | A, C, D, E, extra D | R, S |
| Smatch | A, B | R |
| Coccinelle | A | R |
| jstest | A | R |
| TuxMake | A | R |
| Sparse | A | R |
| KUnit | - | R |
| SymDrive | U | U |
| FAUmachine | A, B, E, +2 | R, S |
| ADFI | U | U |
| EH-Test | U | U |
| COD | U | U |
| Troll | A, B, E | R, T |
One may, of course, disagree with these criteria in many ways. The next section
presents some evidence to support my considerations about these tools. Anyhow,
if you’re unhappy with the information given here or feel like a tool was
misrepresented, don’t hesitate to write a comment at the end of the post.
This section briefly describes each evaluated test tool and recounts the
experience we had when using them to test the Linux kernel.
Kselftest
Kernel selftests (kselftest) is a unit and regression test suite distributed
with the Linux kernel tree under the tools/testing/selftests/ directory
[5]
[6]
[7].
Kselftest contains tests for various kernel features and sub-systems, such as
breakpoints, cpu-hotplug, efivarfs, ipc, kcmp, memory-hotplug, mqueue, net,
powerpc, ptrace, rcutorture, timers, and vm [10]. These tests are intended to be
small, developer-focused tests that target individual code paths: short-running
units expected to complete within 20 minutes. Kselftest consists of shell
scripts and user-space programs that test kernel APIs and features. Test cases
may span kernel and user space, with user-space programs working in conjunction
with a kernel module under test [6]. Even though kselftest’s main purpose is
to provide kernel developers and end-users with a quick method of running tests
against the Linux kernel, the test suite is run every day on several Linux
kernel integration test rings such as the 0-Day robot and the Linaro Test Farm
[7]. It is stated that someday Kselftest will be a comprehensive test suite for
the Linux kernel [20] [25].
We had a smooth experience using kselftest. There are recent patches and
ongoing discussions on the project mailing list
as well as recent commits in the subsystem tree. The documentation presents all
the instructions necessary to compile and run the tests, and there are also
sections exemplifying how to run only subsets of the tests. Some kselftest tests
require additional libraries, which can be listed with the kselftest_deps.sh
script. Unfortunately, as of the date we evaluated the tool, the documentation
did not mention the build dependencies script. Nevertheless, the kselftest
documentation had enough information for us to run some tests without any
problem.
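To make the shape of these tests concrete, here is a minimal sketch of what a selftest program can look like. It assumes the file lives one directory below tools/testing/selftests/ (so the relative include resolves) and that a Makefile including the in-tree lib.mk builds it; the ksft_* helpers come from the kselftest.h header and print TAP-formatted results.

```c
// SPDX-License-Identifier: GPL-2.0
/* Hypothetical tools/testing/selftests/example/example_test.c */
#include "../kselftest.h"

int main(void)
{
	ksft_print_header();
	ksft_set_plan(1);	/* we are going to report exactly one result */

	/* A real selftest would exercise a kernel interface here. */
	if (1 + 1 == 2)
		ksft_test_result_pass("basic arithmetic\n");
	else
		ksft_test_result_fail("basic arithmetic\n");

	return ksft_exit_pass();
}
```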
0-day test robot
The 0-day test robot is a test framework and infrastructure that runs several
tests over the Linux kernel, covering core components such as virtual memory
management, I/O subsystem, process scheduler, file system, network, device
drivers, and more [31]. Static analysis
tools such as sparse, smatch, and coccicheck are run by 0-day as well
[24]. These tests are provided by Intel as a
service that picks up patches from the mailing lists and tests them, often
before they are accepted for inclusion [25].
0-day also tests key developers’ trees before patches move forward in the
development process. The robot is credited with finding 223 bugs during a
development period of about 14 months, from Linux release 4.8 to Linux 4.13
(which came out on September 3, 2017). With that, the 0-day robot achieved the
rank of top bug reporter for that period [25].
Analyzing the Linux 5.4 development cycle, though,
Corbet [34] reports that there have
been worries that Intel’s 0-day testing service is not proving as useful as it
once was.
KernelCI
KernelCI is an effort to test upstream Linux kernels in a continuous integration
(CI) fashion. The project’s main goal is to improve the quality, stability, and
long-term maintenance of the Linux kernel. It is a community-led test system
that follows an open philosophy, aiming to enable the same collaboration around
testing that open source enables around the code itself
[19]
[27]. KernelCI generates various
configurations for different kernel trees, submits boot jobs to several labs
around the world, and collects and stores the test results in a database. The
test database kept by KernelCI includes tests run natively by KernelCI, but also
results from Red Hat’s CKI, Google’s syzbot, and many others
[23][27].
LKFT
Linaro’s LKFT (Linux Kernel Functional Testing) is an automated test
infrastructure that builds and tests Linux release candidates on the arm and
arm64 hardware architectures [24]. The
mission of LKFT is to improve the quality of Linux by performing functional
testing on real and emulated hardware targets. Every week, LKFT runs tests over
350 release-architecture-target combinations on every git-branch push made to
the latest six Linux long-term-stable releases. In addition, Linaro claims that
their test system can consistently report results from nearly 40 of these test
setup combinations in under 48 hours [28]
[35]. LKFT incorporates and runs tests from
several test suites such as LTP, kselftest, libhugetlbfs, perf, v4l2-compliance
tests, KVM-unit-tests, SI/O Benchmark Suite, and KUnit
[36].
Trinity
Trinity is a random tester (fuzzer) that specializes in testing the system call
interfaces that the Linux kernel presents to user space
[33]. Trinity employs some techniques to
pass semi-intelligent arguments to the syscalls being called. For instance, it
accepts a directory argument from which it will open files and pass the
corresponding file descriptors to system calls under test. This can be useful
for discovering failures in filesystems. Thus, Trinity can find bugs in parts of
the kernel other than the system call interface. Some areas where people used
Trinity to find bugs include the networking stack, virtual memory code, and
drivers [32]
[33].
Trinity is accessible through a
repository on GitHub. The latest
commit to that repository was made about 1.5 months before our inspection date.
From the recent commit history, we estimate that the project’s change rate is
roughly one commit per month and that the tool has been maintained by three core
developers. Trinity’s documentation is scarce and has not been updated for four
years. Although there are some usage examples, the documentation does not
contain a tool installation guide.
In our experience with Trinity, we let the fuzzer run for a few minutes. It
looks like Trinity is still working the same way Konovalov
[13] described: “Trinity is a kernel
fuzzer that keeps making system calls in an infinite loop.” There is no precise
number of tests to run and no time limit for their completion. After being
interrupted, the program shows the number of executed system calls, how many
ended successfully, and how many terminated with failures.
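The core idea is easy to picture. The sketch below is not Trinity’s code, just a deliberately naive illustration of the “system calls in an infinite loop” approach; Trinity adds the semi-intelligent argument generation (valid file descriptors, plausible flags, and so on) that this toy version lacks. Like Trinity itself, it should only ever be run inside a disposable VM.

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/syscall.h>

int main(void)
{
	srand(getpid());

	for (;;) {
		long nr = rand() % 450;			/* arbitrary syscall-number range */
		long a = rand(), b = rand(), c = rand();
		long ret = syscall(nr, a, b, c);	/* most calls simply fail... */

		if (ret != -1)				/* ...but the ones that do not are interesting */
			printf("syscall %ld(%ld, %ld, %ld) = %ld\n", nr, a, b, c, ret);
	}
}
```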
Syzkaller
Syzkaller is said to be a state-of-the-art Linux kernel fuzzer
[13]. The syzbot system is a robot
developed as part of the syzkaller project that continuously fuzzes main Linux
kernel branches and automatically reports found bugs to kernel mailing lists.
Syzbot can test patches against bug reproducers. This can be useful for testing
bug fix patches, debugging, or checking if the bug still happens. While syzbot
can test patches that fix bugs, it does not support applying custom patches
during fuzzing. It always tests vanilla unmodified git trees. Nonetheless, one
can always run syzkaller locally on any kernel for better testing a particular
subsystem or patch [18]. Syzbot is
receiving increasing attention from kernel developers. For instance, Sasha Levin
said that he hoped that failure reproducers from syzbot fuzz testing could be
added as part of testing for the stable tree at some point
[11].
Syzkaller is accessible from a GitHub
repository. The project received various
contributions in the period close to our evaluation window, the majority of them
committed by five core developers. Also, the Syzkaller
project mailing list had
several recent messages at the time of our evaluation.
The Syzkaller documentation is fairly complete. It contains detailed
instructions on how to install and use Syzkaller as well as several
troubleshooting sections with tips against possible setup problems. The
documentation also includes pages describing how the fuzzer works, how to report
bugs found in the Linux kernel, and how to contribute to the tool.
When run, Syzkaller prints execution environment information to the terminal and
activates an HTTP server. The server pages display detailed test information
such as code coverage, the number of syscall sequences executed, number of
crashes, execution logs, etc.
LTP
The Linux Test Project (LTP) is a test suite that contains a collection of
automated and semi-automated tests to validate the reliability, robustness, and
stability of Linux and related features [10]
[14]. By default, the LTP run script includes tests
for filesystems, disk I/O, memory management, inter-process communication (IPC),
the process scheduler, and the system call interface. Moreover, the test suite
can be customized by adding new tests, and the LTP project welcomes
contributions [10].
Some Linux testing projects are built on top of LTP or incorporate it in some way.
For example, LTP was chosen as a starting point for Lachesis, whereas the LAVA
framework provides commands to run LTP tests from within
it [10]. Another test suite that runs LTP
is LKFT [24]
[36].
LTP is available at a GitHub
repository to which many developers
committed in the weeks preceding our evaluation period. There were also a few
discussions in progress on the project’s mailing list.
In addition, the LTP documentation contains a tool installation and usage guide
as well as other information we found helpful.
We ran a few LTP syscall tests separately and had a pleasing first impression of
the test suite. The completion time of each test was short, and their results
(pass or fail) were very clear. It took about 30 minutes to run the entire
collection of system call tests. LTP also has a set of device driver tests, but
many of them are outdated and do not work anymore.
ktest
ktest provides an automated test suite that can build, install, and boot-test
Linux on a target machine. It can also run post-boot scripts on the target
system to perform further testing [10]
[16]. ktest has been included in the Linux
kernel repository under the directory tools/testing/ktest. The tool consists
of a Perl script (ktest.pl) and a set of configuration files containing test
setup properties. In addition to build and boot tests, ktest also supports
git bisect, config bisect, randconfig, and patch check as additional types of
tests. If a cross-compiler is installed, ktest can also run cross-compile tests
[17][10].
ktest is available from within the Linux kernel repository under the
tools/testing/ktest/ directory. Despite belonging to the kernel project, the
last contribution to ktest dates from five months before our inspection date.
Its documentation is sparse and has only a description of the configuration
options along with a brief description of the existing example configuration
files. There is no installation guide, nor any list of test dependencies. To set
up ktest, we followed the guidelines provided by Jordan
[16] and adapted several runtime
configurations. We only ran an elementary build and boot test over a couple of
patches. Nevertheless, we think ktest could be useful in automating many test
activities mentioned in the literature, such as patch checking, git bisecting,
and config bisecting.
Smatch
Smatch (the source matcher) is a static analyzer developed to detect programming
logic errors. For instance, smatch can detect errors such as attempts to unlock
an already unlocked spinlock. It is written in C and uses Sparse as its C
parser. Also, smatch is run on Linux kernel trees by autotest bots such as
0-day and Hulk Robot [10]
[15][24].
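As an illustration of that bug class (not a sample taken from Smatch’s own test suite), consider an error path that releases a lock twice; a checker that tracks lock state across branches can flag it:

```c
#include <linux/spinlock.h>
#include <linux/errno.h>

static DEFINE_SPINLOCK(dev_lock);

int update_device_state(int new_state)
{
	spin_lock(&dev_lock);

	if (new_state < 0) {
		spin_unlock(&dev_lock);
		goto err;		/* the lock is already released here */
	}

	spin_unlock(&dev_lock);
	return 0;

err:
	spin_unlock(&dev_lock);		/* double unlock: the kind of logic error Smatch targets */
	return -EINVAL;
}
```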
We got Smatch by cloning this repository. The
project’s commit history showed contributions recent to the time we evaluated
the tool, although a single developer had authored the majority of those
changes. The mailing list archives
we found had registered no messages for years. Smatch also has a mailing list
at vger.kernel.org, but we did not find mail archives for it. The Smatch
documentation is brief; nevertheless, it contains instructions on how to install
and use the source matcher. Within a few minutes, we had set up Smatch and run
some static tests against Linux drivers.
Coccinelle / coccicheck
Coccinelle is a static analyzer engine that provides a language for specifying
matches and transformations in C code. Coccinelle is used to aid collateral
evolution of source code and to help catch certain bugs that can be expressed
semantically. Collateral evolution is needed when client code has to be updated
due to changes in a library API: renaming a function, adding function
parameters, and reorganizing data structures are examples of changes that may
lead to collateral evolution. Bug hunting and fixing are also aided by
coccicheck, a collection of semantic patches that uses the Coccinelle engine to
interpret and run a set of tests. coccicheck is available in the Linux kernel
under a make target with the same name
[37]
[29]
[30].
Moreover, coccicheck is run on Linux kernel trees by automated test robots such
as 0-day and Hulk Robot [24]
[29].
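A made-up example may help picture collateral evolution (the helper and its signatures below are hypothetical, not real kernel API): when a function gains a parameter, every call site must change in the same mechanical way, and that caller-side rewrite is exactly what a Coccinelle semantic patch can describe once and apply tree-wide.

```c
#include <linux/slab.h>

/* Old API: allocation flags were implicit. */
static void *alloc_device_buffer_old(size_t len)
{
	return kmalloc(len, GFP_KERNEL);
}

/* New API: callers now pass the gfp flags explicitly. */
static void *alloc_device_buffer(size_t len, gfp_t flags)
{
	return kmalloc(len, flags);
}

/*
 * Collateral evolution: every caller written as
 *	buf = alloc_device_buffer(len);
 * must become
 *	buf = alloc_device_buffer(len, GFP_KERNEL);
 * across the whole tree.
 */
```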
Coccinelle can be obtained through the package manager of many GNU/Linux
distributions, as a compressed tar.gz file from the project’s web page, or
through a GitHub repository. As of the day we evaluated the static analyzer,
Coccinelle’s repository had some recent commits, most of them by a single
developer. Also, the project’s mailing list
had ongoing conversations and patches under review.
The Linux kernel documentation has a page with installation and usage
instructions for both Coccinelle and coccicheck. Moreover, the kernel has a
Makefile target named “coccicheck” for running coccicheck semantic patches. In a
few minutes, we installed Coccinelle and ran some checks on Linux drivers.
jstest
jstest is a userspace utility program that displays joystick information such as
device status and incoming events. It can be used to test the features of the
Linux joystick API as well as the functionality of a joystick
driver [9]
[22]
[38].
jstest is part of the Linux Console Project and can be obtained from
SourceForge or through the
package manager of some GNU/Linux distributions. However, as of the date we
evaluated jstest, the project’s repository had gone about a year without
updates, and the associated mailing lists
had been without discussions or patches for even longer. Despite that, the
jstest documentation was helpful, as it listed the dependency packages and the
installation steps for the tool. Also, the manual page is brief yet informative,
containing what you need to use the tool. To use jstest, one must provide the
path to a joystick or gamepad device. jstest then displays the inputs obtained
from joysticks and gamepads and thus can be used to test the functioning of
drivers for these devices in a black-box fashion.
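In spirit, jstest is little more than a reader of the joystick device node. The sketch below (not jstest’s actual source) shows the same black-box idea using the struct js_event interface from linux/joystick.h; the device path is just an example.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <linux/joystick.h>

int main(void)
{
	struct js_event e;
	int fd = open("/dev/input/js0", O_RDONLY);	/* example device node */

	if (fd < 0) {
		perror("open /dev/input/js0");
		return 1;
	}

	/* Print every event the joystick driver delivers. */
	while (read(fd, &e, sizeof(e)) == sizeof(e)) {
		if (e.type & JS_EVENT_BUTTON)
			printf("button %d -> %d\n", e.number, e.value);
		else if (e.type & JS_EVENT_AXIS)
			printf("axis   %d -> %d\n", e.number, e.value);
	}

	close(fd);
	return 0;
}
```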
TuxMake
TuxMake, by Linaro, is a command line tool and Python library designed to make
building the Linux kernel easier. It seeks to simplify Linux kernel building by
providing a consistent command line interface to build the kernel across a
variety of architectures, toolchains, kernel configurations, and make targets.
By removing the friction of dealing with different build requirements, TuxMake
assists developers, especially newcomers, in build-testing the kernel for
uncommon toolchain/architecture combinations. Moreover, TuxMake comes with a set of
curated portable build environments distributed as container images. These
versioned and hermetic filesystem images make it easier to describe and
reproduce builds and build problems. Although it does not support every Linux
make target, the TuxMake team plans to add support for additional targets such
as kselftest, cpupower, perf, and documentation. TuxMake is part of TuxSuite,
which in turn is part of Linaro’s main Linux testing effort
[21]
[26]
[28].
TuxMake is available from its GitLab
repository and can also be downloaded as a
package for many GNU/Linux distros. The contributions to the project’s
repository were recent as of the date we evaluated the tool. In addition, the
TuxMake documentation contains installation instructions and examples of how to
use the tool. We note, however, that TuxMake focuses on build testing and thus
only builds the artifacts bound to make targets, without triggering the
execution of further test cases even when those targets would do so by default.
Sparse
Sparse (semantic parser) is a semantic checker for C programs originally written
by Linus Torvalds to support his work on the Linux kernel. Sparse does semantic
parsing of source code files in a few phases summarized as full-file
tokenization, pre-processing, semantic parsing, lazy type evaluation, inline
function expansion, and syntax tree simplification
[50]. The semantic parser can help test C
programs by performing type-checking, lock checking, value range checking, as
well as reporting various errors and warnings while examining the code
[49][50].
In fact, Sparse also comprises a compiler frontend capable of parsing most ANSI
C programs as well as a collection of compiler backends, one of which is a
static analyzer that takes the same name [51].
The kernel build system has a couple of make options that support checking code
with static analyzers, and it uses Sparse as the default checker
[10].
Also, autotest robots such as 0-day and Hulk Robot run Sparse on kernel trees
they test [24].
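A small example of the annotation-driven checking Sparse performs (illustrative code, not taken from a real driver): the __user annotation marks a pointer as living in the user address space, and dereferencing it directly, instead of going through copy_from_user(), is the kind of mistake that a Sparse-enabled kernel build (make C=1 or C=2) will warn about.

```c
#include <linux/uaccess.h>
#include <linux/errno.h>

long get_request_size(const unsigned long __user *arg)
{
	unsigned long size;

	/* Wrong: size = *arg; directly dereferences a __user pointer (Sparse warns). */

	/* Right: copy the value across the address-space boundary. */
	if (copy_from_user(&size, arg, sizeof(size)))
		return -EFAULT;

	return size;
}
```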
SymDrive
Renzelmann et al. [39] focused on testing
Linux kernel device drivers using symbolic execution. This technique consists of
replacing a program’s input with symbolic values. Rather than using the actual
data for a given function, symbolic execution comes up with input values
throughout the range of possible values for each parameter. SymDrive intercepts
all calls into and out of a driver with stubs that call a test framework and
checkers. Stubs may invoke checkers passing the set of parameters for the
function under analysis, the function’s return, and a flag indicating whether
the checker is running before or after the function under test. Driver state can
be accessed by calling a supporting library. Thus, checkers can evaluate the
behavior of a function under test from the execution conditions and the obtained
results.
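The following is only a conceptual sketch of that stub-and-checker arrangement, written as an ordinary user-space program; it does not use SymDrive’s real API. A stub wraps a driver entry point and calls a checker before and after the wrapped function, handing it the arguments, the return value, and a pre/post flag.

```c
#include <stdio.h>
#include <stdbool.h>

struct probe_args { int device_id; };

/* Stand-in for the driver function under test. */
static int driver_probe(struct probe_args *args)
{
	return args->device_id > 0 ? 0 : -1;
}

/* Checker: invoked once before and once after the driver function. */
static void probe_checker(const struct probe_args *args, int retval, bool post)
{
	if (!post) {
		printf("pre-check : probing device %d\n", args->device_id);
		return;
	}
	if (retval != 0)
		printf("post-check: probe failed (%d), inspect the error path\n", retval);
}

/* Stub interposed on calls into the driver by the test framework. */
static int driver_probe_stub(struct probe_args *args)
{
	int ret;

	probe_checker(args, 0, false);	/* pre */
	ret = driver_probe(args);
	probe_checker(args, ret, true);	/* post */
	return ret;
}

int main(void)
{
	struct probe_args args = { .device_id = -5 };

	return driver_probe_stub(&args) ? 1 : 0;
}
```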
SymDrive stood out among related works as a testing tool for drivers in the
kernel through symbolic code execution. To set up SymDrive, we followed the
installation steps listed on the developers’ page
[40]. One of the first steps of the setup
consists of compiling and installing S2E, a software platform that provides
functionality for symbolic execution on virtual machines. The S2E
documentation mentions the use of Ubuntu as a prerequisite for setting up the
platform [41][42]
[43]; yet, we found indications of
compatibility with other OSes after inspecting the compilation and installation
scripts. Also, we encountered installation error messages notifying us that S2E
is compatible only with a restricted set of processors. However, even though we
had configured the VM with a compatible processor, our 10 GB of system memory
was not enough to prevent the installation scripts from failing due to lack of
RAM.
We found out that the S2E mailing list was semi-open, meaning that only
subscribed addresses may send emails to it, and a moderator must first approve
each subscription request. It took a month for an S2E moderator to accept our
subscription request to their mailing list. By the time they granted us access,
we were already assessing other testing tools and did not want to come back to
this one. Finally, our email to the authors of the SymDrive paper went
unanswered. So, after a series of setbacks related to installation and lack of
access to support, we gave up on installing S2E and evaluating the SymDrive
tool.
FAUmachine
Buchacker and Sieh [44] developed a
framework for testing the fault tolerance of GNU/Linux systems by injecting
faults into an entirely simulated running system. FAUmachine runs a User Mode
Linux (UML) port of the Linux kernel, which maps every UML process onto a single
process in the host system. Thus, a complete virtualized machine runs on top of
a real-world Linux machine as a single process. For injecting faults into the
virtualized system, the framework launches a second process in the host system.
Every time a UML process makes a system call, return from a system call, or
receives a signal, it is stopped by the auxiliary host process. The host process
then decides whether the halted process will continue with or without the signal
received, if errors should be returned from system calls instead of the actual
value, and so on. This technique of virtualization combined with the
interception of processes has the benefits of maintaining binary compatibility
of programs, allowing fault injection in core kernel functionalities, peripheral
faults, external faults, real-time clock faults, and interrupt/exception faults.
To test with FAUmachine, we first asked the authors of
[44] to point out the tool’s repository.
Next, we downloaded the associated repositories and installed the packages
needed for building and installation. The project documentation is outdated and
is no longer maintained by the developers. For instance, two packages indicated
in the documentation as necessary for the build are deprecated and no longer
needed. In reply to one of our messages, the project maintainer said that
questions could be answered by email: “Just forget *any* documentation you find
regarding FAUmachine. None is correct any more. Sorry for that. We just don’t
have time to update these documents. I think you must ask your questions using
e-mail.”
After compiling and installing FAUmachine, we tried to run some tests by
setting up an example from the FAU source files. The experiment consisted of
starting a virtual machine and installing a Debian image on its disk. However,
the experiment run script failed during image installation, and our follow-up
email to the maintainer asking for help with the experiment went unanswered.
Still, within the menus and options of the virtual machine management window, it
was possible to see items referencing system fault injection. The evaluation of
these tests, however, could not be completed.
ADFI
Cong et al. [45] introduced a tool that
generates fault scenarios for testing device drivers based on previously
collected runtime traces. ADFI hooks internal kernel APIs so that function calls
and return values are intercepted and recorded in trace files. A fault scenario
generator takes trace files as input and iteratively produces fault scenarios
where an intercepted return to a driver is replaced by a fault. Each fault
scenario is then run, and the resulting stack traces are collected to feed
further iterations of the fault scenario generator. ADFI employs this test
method aiming to assess driver error handling code paths that, otherwise, would
rarely be followed.
According to [45], the efforts to run ADFI
include (1) preparing a configuration file for driver testing; (2) crash
analysis; and (3) (optionally) compilation flag modification to support test
coverage. ADFI automatically runs each generated fault scenario, one after
another, so test execution is automated.
Nevertheless, there is no link or web page address for the ADFI project
repository in the article we found about the tool. Moreover, the authors did not
respond to our email asking how to get ADFI. Thus, it was not possible to
evaluate ADFI as we could not even get the tool.
EH-Test
Bai et al. [46] focused on device driver testing
through a similar approach. They developed a kernel module to monitor and record
driver runtime information. Further, a pattern-based fault extractor takes
runtime data plus driver source code and kernel interface functions as input and
extracts target functions from them. EH-Test considers target functions taking
into account driver-specific knowledge such as function return types and whether
the values returned by functions are checked inside an if statement. The C
programming language has no built-in error handling mechanism (such as
try-catch), so developers often use an if statement to decide whether error
handling code should be triggered in device drivers. Then, a fault injector
module generates test cases in which target function returns are replaced by
faulty values. Finally, a probe inserter generates a separate loadable driver
for each test case. These loadable driver modules have target function calls
replaced by an error function in their code.
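The error-handling pattern in question is the everyday if check found throughout drivers. The snippet below is only an illustration of that pattern and of what a target function looks like; it is not code from EH-Test.

```c
#include <linux/slab.h>
#include <linux/errno.h>

struct my_dev {
	void *rx_buf;
};

int my_dev_setup(struct my_dev *dev)
{
	dev->rx_buf = kmalloc(4096, GFP_KERNEL);
	if (!dev->rx_buf)	/* checked return value: kmalloc() is a target function */
		return -ENOMEM;	/* error-handling path that fault injection forces to run */

	return 0;
}
```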
As for evaluating EH-Test, we downloaded the tool’s source code and, with some
adjustments, we managed to build some of the test modules. However, some EH-Test
components do not build with current GCC and LLVM versions. We mailed the main
author asking for some installation and usage guidance, but we had no feedback.
COD
B. Chen et al. [47] presented a test approach
based on hybrid symbolic-concrete (concolic) execution. Their work focuses on
testing LKMs (Linux Kernel Modules) using two main techniques: (1) automated
test case generation from LKM interfaces with concolic execution; and (2)
automated test case replay that repeatedly reproduces detected bugs.
During test case generation, the COD Agent component sequentially executes
commands from an initial test case to trigger functionalities of target LKMs
through the base kernel. Two custom kernel modules intercept interactions
between base Linux kernel and LKMs under test and add new tainted values to a
taint analysis engine. When all commands in the test harness are finished, COD
captures the runtime execution trace into a file and sends it to a symbolic
engine. A trace replayer performs symbolic analysis over the captured trace
file, then sends the generated test cases back to the execution environment.
These steps then repeat to produce more test cases until some criteria (such as
elapsed time) are met.
In test case replay mode, COD Test Case Replayer picks a test case and executes
the commands in the test harness to trigger functionalities of target LKMs.
Three custom kernel modules intercept the interactions between kernel and LKMs
under test, modify these interactions when needed, and capture kernel API usage
information. After all commands in the test harness are finished, COD retrieves
the kernel API usage information from the custom kernel modules and checks for
potential bugs. This process repeats for each test case given as input.
Yet, for reasons analogous to ADFI’s, COD could not be tested either. There is
no repository link in [47] nor instructions on how to get COD. We sent an email
to the paper’s authors, but it went unanswered.
Troll
Rothberg et al. [48] developed a tool to
generate representative kernel compilation configurations for testing. Troll
parses files locally for configuration options (#ifdef) and creates a
partial kernel compilation configuration. This initial step is called sampling.
Each partial configuration is then abstracted as a node in a configuration
compatibility graph (CCG). In this graph, mutually compatible configurations are
linked by an edge. In the next step (merging), Troll looks in the CCG for the
largest clique (a set of nodes that are all linked together) and merges all the
partial configurations that belong to that clique. The compilation configuration
obtained from the biggest clique covers most of the #ifdef blocks and generates
several warnings when Sparse analyzes the code built with such an arrangement.
With a valid configuration providing good coverage of the different
configurations (#ifdef), further automated testing is more likely to find bugs.
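The snippet below illustrates the kind of configuration-dependent code the sampling step collects #ifdef blocks from (CONFIG_FOO_DEBUG and CONFIG_FOO_DMA are made-up option names): no single configuration compiles every block, which is why Troll merges compatible partial configurations into a few representative ones before build testing and static analysis.

```c
#include <linux/printk.h>

int foo_start(void)
{
	int use_dma;

#ifdef CONFIG_FOO_DEBUG		/* compiled only when this option is enabled */
	pr_info("foo: starting in debug mode\n");
#endif

#ifdef CONFIG_FOO_DMA		/* this block needs CONFIG_FOO_DMA=y ... */
	use_dma = 1;
#else				/* ... and this one needs the option disabled */
	use_dma = 0;
#endif

	return use_dma;
}
```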
Since Troll was designed to generate Linux kernel build configurations, it does
not fit into the kernel tests category. Despite that, we decided to give Troll a
try. Nevertheless, on our first attempt, we found that some new Kconfig features
were not supported by Undertaker, a tool whose output is needed to feed
Troll. Also, the Undertaker mailing list was semi-open. Since our adjustments to
the kernel symbols were insufficient to make Undertaker generate partial kernel
configurations, our last resort was to reach Troll’s developers. Surprisingly,
the authors were very responsive and helped us to set up the latest Undertaker
version. After that, we ran an example from the Troll documentation that
generates Linux kernel compilation settings. The uses of Troll presented in
[48] are analogous to the documentation
example we ran, except that they require more than 10 GB of system memory to
complete. Due to that, we could not reproduce those use cases.
History
- V1: Release
References
[1] Aditya P. Mathur. “Foundations of Software Testing”. (2013) Pearson India. URL: https://www.oreilly.com/library/view/foundations-of-software/9788131794760/.
[2] “IEEE Standard Glossary of Software Engineering Terminology”. In IEEE Std 610.12-1990. (1990) Pages 1-84. URL: https://ieeexplore.ieee.org/document/159342.
[3] “Kernel Testing Guide”. URL: https://www.kernel.org/doc/html/latest/dev-tools/testing-overview.html.
[4] “Re: [PATCH v3] Documentation: dev-tools: Add Testing Overview”. URL: https://lore.kernel.org/linux-doc/CABVgOS=2iYtqTVdxwH=mcFpcSuLP4cpJ4s6PKP4Gc-SH6jidgQ@mail.gmail.com/.
[5] “Linux Kernel Selftests”. (2021) URL: https://www.kernel.org/doc/html/latest/dev-tools/kselftest.html.
[6] Shuah Khan. “Kernel Validation With Kselftest”. (2021) URL: https://linuxfoundation.org/webinars/kernel-validation-with-kselftest/.
[7] “Kernel self-test”. (2019) URL: https://kselftest.wiki.kernel.org/.
[8] “A Tour Through RCU’s Requirements”. (2021) URL: https://www.kernel.org/doc/html/latest/RCU/Design/Requirements/Requirements.html.
[9] “xpad - Linux USB driver for Xbox compatible controllers”. (2021) URL: https://www.kernel.org/doc/html/latest/input/devices/xpad.html.
[10] Shuah Khan. “Linux Kernel Testing and Debugging”. (2014) URL: https://www.linuxjournal.com/content/linux-kernel-testing-and-debugging.
[11] Jake Edge. “Maintaining stable stability”. (2020) URL: https://lwn.net/Articles/825536/.
[12] Jonathan Corbet. “Some 5.12 development statistics”. (2021) URL: https://lwn.net/Articles/853039/.
[13] Andrey Konovalov. “Fuzzing Linux Kernel”. (2021) URL: https://linuxfoundation.org/webinars/fuzzing-linux-kernel/.
[14] Manoj Iyer. “LTP HowTo”. (2012) URL: http://ltp.sourceforge.net/documentation/how-to/ltp.php.
[15] “Smatch The Source Matcher”. (2021) URL: http://smatch.sourceforge.net/.
[16] Daniel Jordan. “So, you are a Linux kernel programmer and you want to do some automated testing…”. (2021) URL: https://blogs.oracle.com/linux/ktest.
[17] “Ktest”. (2017) URL: https://elinux.org/Ktest.
[18] Dmitry Vyukov, Andrey Konovalov, and Marco Elver. “syzbot”. (2021) URL: https://github.com/google/syzkaller/blob/master/docs/syzbot.md.
[19] “Distributed Linux Testing Platform KernelCI Secures Funding and Long-Term Sustainability as New Linux Foundation Project”. (2019) URL: https://www.prnewswire.com/news-releases/distributed-linux-testing-platform-kernelci-secures-funding-and-long-term-sustainability-as-new-linux-foundation-project-300945978.html.
[20] “Linux Kernel Developer: Shuah Khan”. (2017) URL: https://linuxfoundation.org/blog/linux-kernel-developer-shuah-khan/.
[21] Dan Rue. “Portable and reproducible kernel builds with TuxMake”. (2021) URL: https://lwn.net/Articles/841624/.
[22] “Linux Joystick support - Introduction”. (2021) URL: https://www.kernel.org/doc/html/latest/input/joydev/joystick.html.
[23] Mark Filion. “How Continuous Integration Can Help You Keep Pace With the Linux Kernel”. (2016) URL: https://www.linux.com/audience/enterprise/how-continuous-integration-can-help-you-keep-pace-linux-kernel/.
[24] “2020 Linux Kernel History Report”. (2020) URL: https://linuxfoundation.org/wp-content/uploads/2020_kernel_history_report_082720.pdf.
[25] Jonathan Corbet and Greg Kroah-Hartman. “2017 Linux Kernel Development Report”. (2017) URL: https://www.linuxfoundation.org/wp-content/uploads/linux-kernel-report-2017.pdf.
[26] Dan Rue and Antonio Terceiro. “Linaro/tuxmake - README.md”. (2021) URL: https://gitlab.com/Linaro/tuxmake.
[27] “Welcome to KernelCI”. (2021) URL: https://kernelci.org/.
[28] “Rapid Operating System Build and Test”. (2021) URL: https://www.linaro.org/os-build-and-test/.
[29] Luis R. Rodriguez and Nicolas Palix. “coccicheck [Wiki]”. (2016) URL: https://bottest.wiki.kernel.org/coccicheck.
[30] Luis R. Rodriguez, Tyler Baker, and Valentin Rothberg. “linux-kernel-bot-tests - start [Wiki]”. (2016) URL: https://bottest.wiki.kernel.org/.
[31] “Linux Kernel Performance”. (2021) URL: https://01.org/lkp.
[32] Dave Jones. “Linux system call fuzzer - README”. (2017) URL: https://github.com/kernelslacker/trinity.
[33] Michael Kerrisk. “LCA: The Trinity fuzz tester”. (2013) URL: https://lwn.net/Articles/536173/.
[34] Jonathan Corbet. “Statistics from the 5.4 development cycle”. (2019) URL: https://lwn.net/Articles/804119/.
[35] “Linaro’s Linux Kernel Functional Test framework”. (2021) URL: https://lkft.linaro.org/.
[36] “Tests in LKFT”. (2021) URL: https://lkft.linaro.org/tests/.
[37] “Coccinelle: A Program Matching and Transformation Tool for Systems Code”. (2022) URL: https://coccinelle.gitlabpages.inria.fr/website/.
[38] Stephen Kitt. “jstest - joystick test program”. (2009) URL: https://sourceforge.net/p/linuxconsole/code/ci/master/tree/docs/jstest.1.
[39] Matthew J. Renzelmann, Asim Kadav, and Michael M. Swift. “SymDrive: Testing Drivers without Devices”. (2012) Pages 279-292.
[40] Matthew J. Renzelmann, Asim Kadav, and Michael M. Swift. “SymDrive Download and Setup”. (2012) URL: https://research.cs.wisc.edu/sonar/projects/symdrive/downloads.shtml.
[41] Cyberhaven. “Creating analysis projects with s2e-env - S2E 2.0 documentation”. (2020) URL: http://s2e.systems/docs/s2e-env.html.
[42] Adrian Herrera. “s2e-env/README.md at master - S2E/s2e-env”. (2020) URL: https://github.com/S2E/s2e-env/blob/master/README.md.
[43] Cyberhaven. “Building the S2E platform manually - S2E 2.0 documentation”. (2020) URL: http://s2e.systems/docs/BuildingS2E.html.
[44] K. Buchacker and V. Sieh. “Framework for testing the fault-tolerance of systems including OS and network aspects”. (2001) Pages 95-105.
[45] Kai Cong, Li Lei, Zhenkun Yang, and Fei Xie. “Automatic Fault Injection for Driver Robustness Testing”. (2015) Pages 361-372. URL: https://doi.org/10.1145/2771783.2771811.
[46] Jia-Ju Bai, Yu-Ping Wang, Jie Yin, and Shi-Min Hu. “Testing Error Handling Code in Device Drivers Using Characteristic Fault Injection”. (2016) Pages 635-647.
[47] Bo Chen, Zhenkun Yang, Li Lei, Kai Cong, and Fei Xie. “Automated Bug Detection and Replay for COTS Linux Kernel Modules with Concolic Execution”. (2020) Pages 172-183.
[48] Valentin Rothberg, Christian Dietrich, Andreas Ziegler, and Daniel Lohmann. “Towards Scalable Configuration Testing in Variable Software”. (2016) Pages 156-167. URL: https://doi.org/10.1145/3093335.2993252.
[49] “Sparse”. (2022) URL: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/dev-tools/sparse.rst?h=v5.17-rc7.
[50] Neil Brown. “Sparse: a look under the hood”. (2016) URL: https://lwn.net/Articles/689907/.
[51] “Welcome to sparse’s documentation”. (2022) URL: https://sparse.docs.kernel.org/en/latest/.