Runbook Reference
What is a runbook
In simple terms, the **runbook** contains all the configuration for a LISA run. It spares you from typing lengthy command-line arguments and makes the configuration easy to adjust.
See Runbook for more details.
How-to
Configure Azure deployment
The section below configures test runs on the Azure platform. It specifies:
admin_private_key_file: the private key file to access the Azure VM. (Optional)
subscription_id: Azure VM is created under this subscription.
azcopy_path: the installation path of the AzCopy tool on the machine where LISA is installed. It speeds up copying VHDs between Azure storage accounts. (Optional)
resource_group_tags: tags to apply to created resource groups as key-value pairs. (Optional)
platform:
  - type: azure
    admin_private_key_file: $(admin_private_key_file)
    azure:
      subscription_id: $(subscription_id)
      azcopy_path: $(azcopy_path)
      resource_group_tags:
        Environment: Testing
        Project: LISA
Select and set test cases
The section below selects the P0 and P1 test cases, excluding the case named hello.
testcase:
  - criteria:
      priority: [0, 1]
  - criteria:
      name: hello
    select_action: exclude
The section below demonstrates how to configure test cases with retry, repetition, and timeout settings. The first rule automatically retries a failed test case up to 2 times, redeploying the environment for each retry attempt. The second rule demonstrates stress testing: the test case runs 3 times unconditionally (regardless of pass or fail) with a custom timeout of 1 hour.
testcase:
  - criteria:
      priority: 0
    retry: 2
  - criteria:
      name: verify_reboot_in_platform
    times: 3
    timeout: 3600
Use variables and secrets
The section below defines a variable in name/value format. The variable can then be used in other fields in the format $(location).
variable:
  - name: location
    value: westus3
A variable value passed on the command line overrides the value in the runbook YAML file.
lisa -r sample.yml -v "location:westus3"
The section below specifies the path of a YAML file that stores secret values.
variable:
  - file: secret.yml
Content of secret.yml:
subscription_id:
  value: replace_your_subscription_id_here
  is_secret: true
  mask: guid
Use partial runbook
The three YAML files below are loaded in the following sequence.
loading runbook sample.yml
|-- loading include tier.yml
| |-- loading include t0.yml
The variable values in the included yaml file(s) will be overridden by the including yaml file(s). The relative path is always relative to the including yaml file.
Part of sample.yml.
include:
  - path: ./tier.yml
Part of tier.yml.
include:
  - path: ./t$(tier).yml
variable:
  - name: tier
    value: 0
Part of t0.yml.
testcase:
  - criteria:
      priority: 0
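The include chain above can be sketched in Python. This is only an illustration of the two rules stated in this section (placeholders are substituted, and relative paths are resolved against the including file); it is not LISA's loader. The function name resolve_include and the /runbooks directory are hypothetical.

```python
import re
from pathlib import PurePosixPath


def resolve_include(including_file, include_path, variables):
    """Substitute $(name) placeholders, then resolve relative to the including file."""
    substituted = re.sub(
        r"\$\((\w+)\)", lambda m: str(variables[m.group(1)]), include_path
    )
    path = PurePosixPath(substituted)
    if path.is_absolute():
        return str(path)
    # A relative include is anchored at the directory of the including file.
    return str(PurePosixPath(including_file).parent / path)


# sample.yml includes ./tier.yml, which includes ./t$(tier).yml with tier = 0.
tier_file = resolve_include("/runbooks/sample.yml", "./tier.yml", {})
t0_file = resolve_include(tier_file, "./t$(tier).yml", {"tier": 0})
print(tier_file)  # prints /runbooks/tier.yml
print(t0_file)    # prints /runbooks/t0.yml
```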
Use extensions
The section below specifies the paths of extensions. Extensions are modules that provide test cases or extended features.
extension:
  - name: extended_features
    path: ../../extensions
  - ../../lisa/microsoft/testsuites/core
Conditionally enable/disable environments or nodes
You can use the enabled field to conditionally enable or disable entire
environments or individual nodes within an environment. This is particularly
useful when combined with variables for dynamic configuration.
The example below shows how to enable or disable environments based on a variable:
variable:
  - name: use_prod
    value: true
  - name: use_dev
    value: false
environment:
  environments:
    - name: production_env
      enabled: $(use_prod) # Controlled by variable
      nodes:
        - type: local
    - name: dev_env
      enabled: $(use_dev) # This environment will be skipped
      nodes:
        - type: local
The example below shows how to selectively disable specific nodes within an environment:
environment:
  environments:
    - name: multi_node_env
      nodes:
        - name: primary_node
          type: local
          enabled: true # Always enabled
        - name: secondary_node
          type: local
          enabled: false # Temporarily disabled
        - name: optional_node
          type: remote
          address: 192.168.1.100
          enabled: $(include_remote_node) # Variable-controlled
This allows you to:
Temporarily disable environments or nodes without deleting their configuration
Use variables to control which environments/nodes are active
Maintain multiple environment configurations and switch between them dynamically
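As a minimal sketch of the filtering described in this section, the Python below drops environments and nodes whose enabled field resolves to false. It assumes the variables have already been parsed into Python booleans. The helper names is_enabled and active_environments are hypothetical and do not come from LISA.

```python
def is_enabled(value, variables):
    """Resolve an `enabled` value that is either a bool or a $(name) placeholder."""
    if isinstance(value, str) and value.startswith("$(") and value.endswith(")"):
        value = variables[value[2:-1]]
    return bool(value)


def active_environments(environments, variables):
    """Drop disabled environments, then drop disabled nodes in the remaining ones."""
    active = []
    for env in environments:
        # A missing `enabled` field counts as enabled (the documented default).
        if not is_enabled(env.get("enabled", True), variables):
            continue
        nodes = [
            node for node in env.get("nodes", [])
            if is_enabled(node.get("enabled", True), variables)
        ]
        active.append({**env, "nodes": nodes})
    return active


environments = [
    {"name": "production_env", "enabled": "$(use_prod)", "nodes": [{"type": "local"}]},
    {"name": "dev_env", "enabled": "$(use_dev)", "nodes": [{"type": "local"}]},
]
variables = {"use_prod": True, "use_dev": False}
print([env["name"] for env in active_environments(environments, variables)])
# prints ['production_env']
```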
Use transformers
Transformers are executed one by one, in an order decided by their dependencies. If there are no dependencies, their order in the runbook determines the execution order.
The transformer below shows how to deploy a VM in Azure and export it to a VHD. Other transformers, such as one that installs a kernel, can be added before the export.
transformer:
  - type: azure_deploy
    requirement:
      azure:
        marketplace: redhat rhel 7_9 7.9.2021051701
  - type: azure_vhd
    resource_group_name: $(azure_deploy_resource_group_name)
    rename:
      azure_vhd_url: vhd
  - type: azure_delete
    resource_group_name: $(azure_deploy_resource_group_name)
The transformer below builds a kernel from source code and patches.
transformer:
  - type: azure_deploy
    requirement:
      azure:
        marketplace: $(marketplace_image)
      core_count: 16
    enabled: true
  - type: kernel_installer
    connection:
      address: $(azure_deploy_address)
      private_key_file: $(admin_private_key_file)
    installer:
      type: source
      location:
        type: repo
        path: /mnt/code
        ref: tags/v4.9.184
      modifier:
        - type: patch
          repo: https://github.com/microsoft/azure-linux-kernel.git
          file_pattern: Patches_Following_Mainline_History/4.9.184/*.patch
Reference
name
type: str, optional, default is “not_named”
Part of the test run name. This name is used to group results and appears in the title of the HTML report. The names of created resources also contain this string.
name: Azure Default
test_project
type: str, optional, default is empty
The project name of this test run. This name is used to group test results in the HTML report, and it also appears in notifier messages.
test_project: Azure Image Weekly Testing
test_pass
type: str, optional, default is empty
The test pass name of this test run. Combined with the test project name, it is used to group test results in the HTML report, and it also appears in notifier messages.
test_pass: bvt testing
concurrency
type: int, optional, default is 1.
The number of concurrent running environments.
exit_on_first_failure
type: bool, optional, default is False.
When set to True, LISA will terminate test execution immediately after the first test case failure. All remaining queued test cases will be marked as skipped with the message “Test execution stops early.” This is particularly useful for debugging and reproducing specific test failures quickly.
exit_on_first_failure: true
Note
This setting only affects test case execution order. Test cases that are already running in parallel when a failure occurs will continue to completion.
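The behavior described above can be sketched for the simple sequential case. This is an illustration only, not LISA's scheduler: the function run_sequentially and the result strings other than the quoted skip message are hypothetical.

```python
def run_sequentially(test_cases, exit_on_first_failure=False):
    """Run (name, callable) pairs in order; after a failure, skip the rest if requested."""
    results = {}
    stopped = False
    for name, run in test_cases:
        if stopped:
            # Remaining queued cases are not executed at all.
            results[name] = "SKIPPED: Test execution stops early."
            continue
        passed = run()
        results[name] = "PASSED" if passed else "FAILED"
        if not passed and exit_on_first_failure:
            stopped = True
    return results


cases = [("case_a", lambda: True), ("case_b", lambda: False), ("case_c", lambda: True)]
for name, outcome in run_sequentially(cases, exit_on_first_failure=True).items():
    print(name, outcome)
```

With exit_on_first_failure left at its default, case_c in this sketch would still run after case_b fails.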
import_builtin_tests
type: bool, optional, default is False.
When set to True, LISA will import and make available built-in Microsoft test cases located in the lisa/microsoft directory. These are test cases provided by Microsoft Linux System Group for comprehensive system validation.
import_builtin_tests: true
include
type: list of path, optional, default is empty
Shares runbook parts between similar runs. A runbook pulls in the shared content by listing the shared files under include.
path
An absolute path, or a path relative to the current runbook.
extension
type: list of path str or name/path pairs, optional, default: empty
The path and the name of each extension module. An entry can also be just the extension path.
extension:
  - name: ms
    path: ../../extensions
name
type: str, optional, default is empty
Each extension can be given a name. With a name, one extension can reference another. Using the example above, code can reference the extension as ms.submodule.
path
type: str, optional, default is empty
Path of the extension. It can be absolute or relative to the current runbook file.
variable
type: list of path str or name/value pairs, optional, default: empty
Used to support variables in other fields.
Values passed on the command line have the highest priority. In the example below, every occurrence of $(subscription_id) is replaced with the value subscription id A, because the command-line value overrides the value subscription id B defined in the runbook.
lisa -r ./microsoft/runbook/azure.yml -v "subscription_id:<subscription id A>"
variable:
  - name: subscription_id
    value: subscription id B
Variable values in the runbook have higher priority than the same variables defined in any included runbook file. Thus, $(location) is replaced with the value northeurope in the following example.
include:
  - path: tier.yml
variable:
  - name: location
    value: northeurope
tier.yml
variable:
  - name: location
    value: westus3
Variable values defined later in a runbook have higher priority than earlier definitions of the same variable. Here, $(location) is replaced with the value northeurope.
variable:
  - name: location
    value: westus3
  - name: location
    value: northeurope
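The precedence rules in this section (included file, then runbook, then later definitions, then command line) can be sketched as a last-writer-wins merge. This is an illustration under that reading, not LISA's implementation. The names merge_variables and substitute are hypothetical, and eastus is an arbitrary illustrative value.

```python
import re


def merge_variables(included, runbook, command_line):
    """Merge (name, value) pairs; sources and entries listed later win."""
    merged = {}
    for source in (included, runbook, command_line):
        for name, value in source:
            merged[name] = value
    return merged


def substitute(text, variables):
    """Replace every $(name) placeholder with the variable's value."""
    return re.sub(r"\$\((\w+)\)", lambda m: str(variables[m.group(1)]), text)


included = [("location", "westus3")]
runbook = [("location", "westus3"), ("location", "northeurope")]

print(substitute("$(location)", merge_variables(included, runbook, [])))
# prints northeurope
print(substitute("$(location)", merge_variables(included, runbook, [("location", "eastus")])))
# prints eastus
```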
is_case_visible
type: bool, optional, default is False.
When set to True, the value of this variable will be passed to the testcases,
such as perf_nested_kvm_storage_singledisk which requires information
about nested image.
is_secret
type: bool, optional, default is False.
When set to True, the value of this variable will be masked in log and other output information.
Using a secret file or an environment variable is recommended. Specifying a secret value directly in the runbook is not recommended.
file
type: list of str, optional, default: empty
Specify path of other yml files which define variables.
name
type: str, optional, default is empty.
Variable name.
value
type: str, optional, default is empty
Value of the paired variable.
transformer
type: list of Transformer, default is empty
type
type: str, required, the type of transformer. See transformers for all transformers.
name
type: str, optional, default is the type.
Unique name of the transformer, which other transformers use to refer to it. If it is not specified, the type field is used. If two transformers have the same type, at least one of them must be given a name.
prefix
type: str, optional, default is the name.
The prefix of generated variables from this transformer. If it’s not
specified, it will use the name field.
depends_on
type: list of str, optional, default is None.
The transformers that this one depends on. They run before this one.
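The ordering rule (dependencies first, otherwise runbook order) can be sketched as a stable topological sort. This is only one way to satisfy the rule, not necessarily how LISA orders transformers. The function order_transformers is hypothetical.

```python
def order_transformers(transformers):
    """Return transformer names so that dependencies come first, else runbook order."""
    # A transformer's name defaults to its type, as described above.
    names = [t.get("name", t["type"]) for t in transformers]
    deps = {n: set(t.get("depends_on") or []) for n, t in zip(names, transformers)}
    ordered = []
    while len(ordered) < len(names):
        for name in names:
            # Pick the earliest transformer whose dependencies have all run.
            if name not in ordered and deps[name] <= set(ordered):
                ordered.append(name)
                break
        else:
            raise ValueError("circular or missing dependency")
    return ordered


runbook = [
    {"type": "azure_vhd", "depends_on": ["azure_deploy"]},
    {"type": "azure_deploy"},
    {"type": "azure_delete", "depends_on": ["azure_vhd"]},
]
print(order_transformers(runbook))
# prints ['azure_deploy', 'azure_vhd', 'azure_delete']
```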
rename
type: Dict[str, str], optional, default is None.
The variables to rename. If the target variable already exists, its value is overwritten by the transformer. For example, the entry to_list_image: image renames the variable to_list_image to image. The original variable name must exist in the output variables of the transformer.
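A rough sketch of how prefix and rename might interact is shown below. It assumes that each output variable is exported as prefix, an underscore, and the output key, which is consistent with names such as azure_deploy_resource_group_name used on this page but is not confirmed here. The function export_variables, the url key, and the example URL are hypothetical.

```python
def export_variables(outputs, prefix, rename=None):
    """Prefix each output variable, then apply the rename mapping."""
    rename = rename or {}
    prefixed = {f"{prefix}_{key}": value for key, value in outputs.items()}
    # The original name in a rename entry must be one of the output variables.
    missing = set(rename) - set(prefixed)
    if missing:
        raise KeyError(f"unknown output variables: {sorted(missing)}")
    return {rename.get(name, name): value for name, value in prefixed.items()}


exported = export_variables(
    {"url": "https://example.invalid/disk.vhd"},  # placeholder output value
    prefix="azure_vhd",
    rename={"azure_vhd_url": "vhd"},
)
print(exported)
# prints {'vhd': 'https://example.invalid/disk.vhd'}
```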
combinator
type: str, required.
The type of combinator, for example, grid or batch.
grid combinator
items
type: List[Variable], required.
The variables which are in the matrix. Each variable must be a list.
For example,
- type: grid
  items:
    - name: image
      value:
        - Ubuntu
        - CentOS
    - name: vm_size
      value:
        - Standard_DS2_v2
        - Standard_DS3_v2
        - Standard_DS4_v2
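A grid over list-valued variables is a Cartesian product, so the example above yields 2 × 3 = 6 combinations. The Python below sketches that expansion. The function expand_grid is hypothetical and only illustrates the matrix, not how LISA schedules the runs.

```python
from itertools import product


def expand_grid(items):
    """Return one dict per combination of the list-valued variables."""
    names = list(items)
    return [
        dict(zip(names, combo))
        for combo in product(*(items[name] for name in names))
    ]


grid = {
    "image": ["Ubuntu", "CentOS"],
    "vm_size": ["Standard_DS2_v2", "Standard_DS3_v2", "Standard_DS4_v2"],
}
combinations = expand_grid(grid)
print(len(combinations))  # prints 6
print(combinations[0])    # prints {'image': 'Ubuntu', 'vm_size': 'Standard_DS2_v2'}
```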
batch combinator
items
type: List[Dict[str, Any]], required.
Specify batches of variables. Each batch will run once.
For example,
- type: batch
  items:
    - image: Ubuntu
      vm_size: Standard_DS2_v2
    - image: Ubuntu
      vm_size: Standard_DS3_v2
    - image: CentOS
      vm_size: Standard_DS3_v2
bisect combinator
Specify a git repo URL, the good commit, and the bad commit. The combinator performs bisect operations on the VM specified under ‘connection’.
The runbook is iterated until the bisect operation completes.
For example,
combinator:
  type: git_bisect
  repo: $(repo_url)
  bad_commit: $(bad_commit)
  good_commit: $(good_commit)
  connection:
    address: $(bisect_vm_address)
    private_key_file: $(admin_private_key_file)
Refer to the sample runbook.
notifier
Receive messages during the test run and output them somewhere.
console
One of the notifier types. It outputs messages to the console and the log file, and demonstrates how to implement notification procedures.
Example of console notifier:
notifier:
  - type: console
    log_level: INFO
log_level
type: str, optional, default: DEBUG, values: DEBUG, INFO, WARNING…
Set log level of notification messages.
html
Output test results in html format. It can be used for local development or as the body of an email.
path
type: str, optional, default: lisa.html
Specify the output file name and path.
auto_open
type: bool, optional, default: False
When set to True, the html will be opened in the browser after completion. Useful in local run.
Example of html notifier:
notifier:
  - type: html
    path: ./lisa.html
    auto_open: true
junit
Output test results in JUnit XML format. The generated XML file can be used for integration with CI/CD systems, dashboards, and other tools that consume JUnit test results.
path
type: str, optional, default: lisa.junit.xml
Specify the output file name and path for the JUnit XML report.
include_subtest
type: bool, optional, default: True
When set to True, subtests will be included as separate test cases in the JUnit XML output. When set to False, only main test cases are included.
append_message_id
type: bool, optional, default: True
When set to True, the message ID will be appended to test case names in the format “test_name (message_id)”. This is useful when using combinators to distinguish multiple test runs of the same test case. When set to False, only the base test case name is used.
Example of junit notifier:
notifier:
  - type: junit
    path: ./results.xml
    include_subtest: true
    append_message_id: false
log_agent
AI-powered log analysis notifier for automated test failure investigation. This notifier leverages Azure OpenAI to automatically analyze failed test cases, providing intelligent insights into potential root causes by examining test execution logs and code context from the LISA framework.
The log_agent notifier uses a multi-agent AI system that combines:
LogSearchAgent: Specialized in searching and analyzing log files for error patterns
CodeSearchAgent: Examines source code files and analyzes implementations related to errors
Magentic Orchestration: Coordinates the agents to provide comprehensive analysis
The analysis results are attached to test result messages and made available to downstream notifiers and reporting systems.
Prerequisites:
Azure OpenAI access with the following deployments:
- GPT-4.1 or GPT-4o for general analysis
- GPT-4.1 for software-specific analysis (optional)
- Text-embedding-3-large for similarity calculations (optional)
Required Python packages (automatically included with LISA):
- openai
- agent-framework-core
- agent-framework-azure-ai
- retry
azure_openai_endpoint
type: str, required
Azure OpenAI service endpoint URL for the AI analysis service.
Example: https://your-resource.openai.azure.com
azure_openai_api_key
type: str, optional, default: “”
Azure OpenAI API key for authentication. If not set, the notifier will use default authentication methods available in the environment.
Note: This value is automatically marked as secret and will be masked in logs.
general_deployment_name
type: str, optional, default: “gpt-4o”
Primary GPT model deployment name for general analysis tasks. This model is used by the orchestration manager to coordinate the analysis and synthesize findings.
software_deployment_name
type: str, optional, default: “gpt-4.1”
Specialized GPT model deployment name for software-specific analysis tasks. This model is used by the CodeSearchAgent for examining source code.
embedding_endpoint
type: str, optional, default: “”
Optional embedding service endpoint for similarity calculations and analysis quality measurement.
selected_flow
type: str, optional, default: “default”
Analysis workflow type to execute. Currently supported flows:
default: Standard multi-agent analysis workflow
gpt-5: Advanced analysis workflow (future enhancement)
skip_duplicate_errors
type: bool, optional, default: True
When set to True, the notifier will skip analysis for errors that have already been analyzed in the current test run, improving performance and avoiding redundant processing.
Example of log_agent notifier:
notifier:
  - type: log_agent
    azure_openai_endpoint: https://your-resource.openai.azure.com
    azure_openai_api_key: $(azure_openai_api_key)
    general_deployment_name: gpt-4o
    software_deployment_name: gpt-4.1
    selected_flow: default
    skip_duplicate_errors: true
How it works:
Failure Detection: Automatically triggered when test cases fail
Log Analysis: Searches through test execution logs for error patterns
Code Review: Examines related source code if call traces are available
Hypothesis Generation: Generates possible reasons for the failure
Evidence Gathering: Searches for supporting evidence in logs
Root Cause Analysis: Provides comprehensive analysis with actionable insights
The AI analysis results are stored in the test result message’s analysis["AI"]
field and can be consumed by other notifiers like HTML or custom reporting systems.
perfevaluation
Evaluates performance test results against predefined criteria and optionally fails tests when targets are not met.
Basic Usage:
notifier:
  - type: perfevaluation
    criteria_file: "perf_criteria.yml"
    output_file: "results.json"
    fail_test_on_performance_failure: true
Parameters:
criteria_file
type: str, optional, default: “*_criteria.yml”
Path or glob pattern to YAML files containing performance criteria.
criteria
type: dict, optional, default: None
Direct criteria definition in runbook. Takes priority over criteria_file.
Example:
notifier:
  - type: perfevaluation
    criteria:
      statistics_times: 1
      error_threshold: 0.1
      statistics_type: average
      groups:
        - name: "NVMe Performance"
          conditions:
            - name: "test_case"
              type: "metadata"
              value: "perf_nvme"
            - name: "vm_size"
              type: "information"
              value: "Standard_L64*"
          metrics:
            - name: "qdepth_32_iodepth_1_numjob_32_setup_raw_bs_4k_cores_32_disks_8_read_iops"
              min_value: 800000.0
              target_value: 979000.0
              error_threshold: 0.25
output_file
type: str, optional, default: None
Output path for detailed evaluation results in JSON format.
statistics_times
type: int, optional, default: None
Number of test runs to use for statistical calculations. If specified, overrides the global setting in criteria YAML.
fail_test_on_performance_failure
type: bool, optional, default: False
Mark tests as failed when performance criteria are not met.
YAML Criteria Format:
Hierarchical format with groups and conditions:
# Global settings
statistics_times: 1
error_threshold: 0.1
statistics_type: average
groups:
  - name: "NVMe Performance - L64 Series"
    description: "Performance criteria for Standard_L64as_v3 and Standard_L64s_v2 VMs"
    error_threshold: 0.20
    statistics_type: average
    statistics_times: 1
    conditions:
      - name: "test_case"
        type: "metadata"
        value: "perf_nvme"
      - name: "vm_size"
        type: "information"
        value: "Standard_L64*"
    metrics:
      - name: "qdepth_32_iodepth_1_numjob_32_setup_raw_bs_4k_cores_32_disks_8_read_iops"
        min_value: 800000.0
        target_value: 979000.0
        error_threshold: 0.25
Global Configuration:
statistics_times: Number of test runs for statistical calculations (default: 1)
error_threshold: Global tolerance for performance deviation (default: 0.1 = 10%)
statistics_type: Statistical method to use: average (default), median, min, or max
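The four statistics_type values correspond to familiar aggregations over the collected samples. The sketch below shows that mapping only; how LISA gathers the samples and applies statistics_times is not covered here. The function aggregate and the sample numbers are hypothetical.

```python
from statistics import mean, median

# One aggregation function per documented statistics_type value.
AGGREGATORS = {"average": mean, "median": median, "min": min, "max": max}


def aggregate(values, statistics_type="average"):
    """Reduce a list of metric samples to one number using the chosen statistic."""
    return AGGREGATORS[statistics_type](values)


samples = [10.0, 12.0, 20.0]  # illustrative throughput samples
print(aggregate(samples))            # prints 14.0
print(aggregate(samples, "median"))  # prints 12.0
```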
Group Configuration:
Each group can override global settings and contains:
name: Group identifier
description: Human-readable description
error_threshold: Group-level tolerance
statistics_type: Statistical method for this group
statistics_times: Number of runs for this group
conditions: Matching rules for test results
metrics: Performance metrics to evaluate
Metric Properties:
min_value: Minimum acceptable value (inclusive)
max_value: Maximum acceptable value (inclusive)
target_value: Expected target value
error_threshold: Acceptable deviation from target (as decimal, e.g., 0.25 = 25%)
Pattern Matching:
Uses fnmatch-style wildcards:
Standard_L64*: Matches Standard_L64as_v3, Standard_L64s_v2, etc.
*nvme*: Test cases containing “nvme”
Standard_D*ads_v5: D-series with specific pattern
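These wildcard patterns can be tried out with Python's standard fnmatch module. The snippet uses fnmatchcase so that matching is case-sensitive on every platform; whether LISA matches case-sensitively is an assumption here. The test names perf_nvme_basic and Standard_D2s_v5 are illustrative.

```python
from fnmatch import fnmatchcase

# (value, pattern) pairs based on the patterns listed above.
checks = [
    ("Standard_L64as_v3", "Standard_L64*"),
    ("Standard_L64s_v2", "Standard_L64*"),
    ("perf_nvme_basic", "*nvme*"),
    ("Standard_D2ads_v5", "Standard_D*ads_v5"),
    ("Standard_D2s_v5", "Standard_D*ads_v5"),  # lacks the "ads" part
]
for value, pattern in checks:
    print(f"{value!r} vs {pattern!r}: {fnmatchcase(value, pattern)}")
```

Only the last pair fails to match, because Standard_D2s_v5 does not end in ads_v5.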
Condition Structure:
Each condition must specify three fields:
name: The field name to match (e.g., test_case, vm_size)
type: The condition type, either metadata or information
value: The pattern to match (supports wildcards)
Condition Types:
metadata: Matches test case metadata fields (e.g., test_case name)
information: Matches runtime information fields (e.g., vm_size)
All conditions within a group must match (AND logic)
Example condition:
conditions:
  - name: "test_case"
    type: "metadata"
    value: "perf_nvme*"
  - name: "vm_size"
    type: "information"
    value: "Standard_L*"
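The AND logic over conditions can be sketched as follows. A group applies to a result only if every condition's pattern matches the corresponding metadata or information field. The function group_matches and the sample field values are hypothetical, and a missing field is treated as an empty string, which is an assumption.

```python
from fnmatch import fnmatchcase


def group_matches(conditions, metadata, information):
    """Return True only if every condition matches its source field (AND logic)."""
    sources = {"metadata": metadata, "information": information}
    return all(
        fnmatchcase(str(sources[c["type"]].get(c["name"], "")), c["value"])
        for c in conditions
    )


conditions = [
    {"name": "test_case", "type": "metadata", "value": "perf_nvme*"},
    {"name": "vm_size", "type": "information", "value": "Standard_L*"},
]
print(group_matches(conditions, {"test_case": "perf_nvme"}, {"vm_size": "Standard_L64s_v2"}))
# prints True
print(group_matches(conditions, {"test_case": "perf_nvme"}, {"vm_size": "Standard_D2ads_v5"}))
# prints False
```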
Example - Network Performance:
groups:
  - name: "TCP NTTTCP SRIOV Performance"
    conditions:
      - name: "test_case"
        type: "metadata"
        value: "perf_tcp_ntttcp_sriov"
      - name: "vm_size"
        type: "information"
        value: "Standard_D2ads_v5"
    metrics:
      - name: "throughput_in_gbps_conn_1"
        min_value: 10.0
        target_value: 11.89
        error_threshold: 0.30
environment
List of environments. For more information, refer to Node and Environment.
retry
Number of retry attempts for failed deployments, default value is 0.
environments
List of test run environment.
name
type: str, optional, default is empty
The name of the environment.
enabled
type: bool, optional, default is true
Controls whether the environment is loaded and used during test execution. When
set to false, the environment will be skipped during initialization. This is
useful for defining multiple similar environments in the same runbook.
Example:
environment:
  environments:
    - name: prod_env
      enabled: true # This environment will be loaded
      nodes:
        - type: local
    - name: dev_env
      enabled: $(use_dev_env) # Variable-controlled
      nodes:
        - type: local
topology
type: str, optional, default is “subnet”
The topology of the environment. Currently the only supported value is “subnet”.
nodes
List of nodes. A node can be a virtual machine on Azure or Hyper-V, a bare-metal machine, or another type. For more information, refer to Node and Environment.
Each node supports an enabled field:
enabled (bool, optional, default is true): Controls whether the node is
loaded during environment initialization. When set to false, the node will
be skipped. This is useful for selecting specific nodes from the same
environment configuration.
Example:
environment:
  environments:
    - name: test_env
      nodes:
        - name: node1
          type: local
          enabled: true # This node will be loaded
        - name: node2
          type: local
          enabled: false # This node will be skipped
        - name: node3
          type: remote
          address: 192.168.1.100
          enabled: $(enable_node3) # Variable-controlled
nodes_requirement
List of node requirements for the test environment. The defaults are node_count 1, core_count 1, memory_mb 512 (MB), data_disk_count 0, nic_count 1, and gpu_count 0. A node can be created once its requirement is met.
type
type: str, optional, default value is “requirement”, supported values are “requirement”, “remote”, “local”.
platform
List of platforms. The default value is “ready”; currently supported values are “ready” and “azure”.
testcase
type: list of str, optional, default: lisa
Criteria to select cases.
criteria
type: list of dictionary, optional, default is empty
Select test cases by area, category, name, priority or tags combined with select action.
select_action can be “none”, “include”, “exclude”, “forceInclude” and “forceExclude”, default value is “none”.
testcase:
  - criteria:
      priority: 0
    select_action: include
  - criteria:
      priority: 1
    select_action: exclude
times
type: int, optional, default is 1
Run this group of test cases the specified number of times. This is useful for stress testing or ensuring test reliability.
testcase:
  - criteria:
      priority: 0
    times: 3
retry
type: int, optional, default is 0
Number of retry attempts if a test case fails. When a test case fails, LISA will automatically retry it up to the specified number of times. The test environment is deleted and recreated for each retry attempt to ensure a clean state.
This is particularly useful for:
Tests that may experience transient failures
Flaky tests that need multiple attempts to pass
Tests that interact with external services
testcase:
  - criteria:
      priority: 0
    retry: 2
Note
The retry count is independent of the times count. If both are set, the test will run times × (1 + retry attempts) in the worst case where all attempts fail.
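The worst-case count in the note above is simple arithmetic, shown here as a small Python helper. The function name worst_case_executions is hypothetical.

```python
def worst_case_executions(times=1, retry=0):
    """Upper bound on executions: each of `times` runs may fail and be retried."""
    return times * (1 + retry)


print(worst_case_executions(times=3, retry=2))  # prints 9
print(worst_case_executions())                  # prints 1
```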
timeout
type: int, optional, default is 0
Timeout in seconds for each test case. When a test case runs, LISA uses the maximum value between the timeout specified in the runbook and the test case’s own metadata timeout. If this field is set to 0 (default) or not specified, only the test case’s metadata timeout is used (which defaults to 3600 seconds / 1 hour if not explicitly set in the test case). This allows you to extend timeouts for specific test runs without modifying the test case code.
Note that this timeout applies to the overall test case execution. Any additional command-level timeouts set within the test case code itself will not be affected by this setting.
testcase:
  - criteria:
      name: verify_deployment_provision_ultra_datadisk
    timeout: 3600
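The timeout rule described above (the larger of the runbook value and the test case's metadata value) can be written as a one-line function. The name effective_timeout is hypothetical. The 3600-second default mirrors the metadata default stated in the text.

```python
def effective_timeout(runbook_timeout=0, metadata_timeout=3600):
    """Return the timeout in seconds that applies to a test case."""
    # A runbook value of 0 means "not set", so the metadata timeout wins.
    return max(runbook_timeout, metadata_timeout)


print(effective_timeout(7200, 3600))  # prints 7200 (runbook extends the timeout)
print(effective_timeout(1800, 3600))  # prints 3600 (runbook cannot shorten it)
print(effective_timeout(0, 3600))     # prints 3600 (runbook timeout not set)
```

One consequence of taking the maximum is that the runbook can extend a test case's timeout but cannot shorten it.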
use_new_environment
type: bool, optional, default is False
When set to True, each test case with this rule will be run in a newly created environment. This ensures complete isolation between test cases but increases the overall test execution time.
testcase:
  - criteria:
      name: verify_stop_start_in_platform
    use_new_environment: true
ignore_failure
type: bool, optional, default is False
When set to True, failed test results will be rewritten as success. This is intended as a temporary workaround for known issues and should not be overused.
testcase:
  - criteria:
      name: known_flaky_test
    ignore_failure: true
Warning
This setting masks test failures and should only be used as a temporary measure. Do not use it to hide real issues.