Runbook Reference

What is a runbook
How-to
Reference
- name
- test_project
- test_pass
- tags
- concurrency
- exit_on_first_failure
- import_internal_tests
- include
  - path
- extension
  - name
  - path
- variable
  - is_case_visible
  - is_secret
  - file
  - name
  - value
- transformer
  - type
  - name
  - prefix
  - depends_on
  - rename
- combinator
- notifier
- environment
  - retry
  - environments
    - name
    - enabled
    - topology
    - nodes
    - nodes_requirement
      - type
- platform
- testcase

What is a runbook

In simple terms:: The **runbook** contains all the configurations of LISA operation. It keeps you from lengthy command-line commands and makes it easy to adjust configurations.

See Runbook for further knowledge.

How-to

Configure Azure deployment

Below section is for running cases on Azure platform, it specifies:

admin_private_key_file: the private key file to access the Azure VM. (Optional)
subscription_id: Azure VM is created under this subscription.
azcopy_path: the installation path of the AzCopy tool on the machine where LISA is installed. It speeds up copying VHDs between Azure storage accounts. (Optional)
resource_group_tags: tags to apply to created resource groups as key-value pairs. (Optional)

platform:
  - type: azure
    admin_private_key_file: $(admin_private_key_file)
    azure:
      subscription_id: $(subscription_id)
      azcopy_path: $(azcopy_path)
      resource_group_tags:
        Environment: Testing
        Project: LISA

Select and set test cases

Below section is to specify P0 and P1 test cases excluding case with name hello.

testcase:
  - criteria:
      priority: [0, 1]
  - criteria:
      name: hello
    select_action: exclude

Below section demonstrates how to configure test cases with retry, repetition, and timeout settings. The first test case will automatically retry up to 2 times if it fails, redeploying the environment for each retry attempt. The second test case demonstrates stress testing by running 3 times unconditionally (regardless of pass/fail) with a custom timeout of 1 hour.

testcase:
  - criteria:
      priority: 0
    retry: 2
  - criteria:
      name: verify_reboot_in_platform
    times: 3
    timeout: 3600

Use variable and secrets

Below section is to specify the variable in name/value format. We can use this variable in other field in this format $(location).

variable:
  - name: location
    value: westus3

The value of variable passed from command line will override the value in runbook yaml file.

lisa -r sample.yml -v "location:westus3"

Below section is to specify the path of yaml file which stores the secret values.

variable:
  - file: secret.yml

Content of secret.yml.

subscription_id:
  value: replace_your_subscription_id_here
  is_secret: true
  mask: guid

Use partial runbook

Below three yaml files will be loaded in this sequence.

loading runbook sample.yml
|-- loading include tier.yml
|   |-- loading include t0.yml

The variable values in the included yaml file(s) will be overridden by the including yaml file(s). The relative path is always relative to the including yaml file.

Part of sample.yml

include:
  - path: ./tier.yml

Part of tier.yml.

include:
  - path: ./t$(tier).yml
variable:
  - name: tier
    value: 0

Part of t0.yml.

testcase:
  - criteria:
      priority: 0

Use extensions

Below section is to specify path of extensions, the extensions are modules for test cases or extended features.

extension:
  - name: extended_features
    path: ../../extensions
  - ../../lisa/microsoft/testsuites/core

Conditionally enable/disable environments or nodes

You can use the enabled field to conditionally enable or disable entire environments or individual nodes within an environment. This is particularly useful when combined with variables for dynamic configuration.

Below example shows how to enable/disable environments based on a variable:

variable:
  - name: use_prod
    value: true
  - name: use_dev
    value: false

environment:
  environments:
    - name: production_env
      enabled: $(use_prod)  # Controlled by variable
      nodes:
        - type: local
    - name: dev_env
      enabled: $(use_dev)  # This environment will be skipped
      nodes:
        - type: local

Below example shows how to selectively disable specific nodes within an environment:

environment:
  environments:
    - name: multi_node_env
      nodes:
        - name: primary_node
          type: local
          enabled: true  # Always enabled
        - name: secondary_node
          type: local
          enabled: false  # Temporarily disabled
        - name: optional_node
          type: remote
          address: 192.168.1.100
          enabled: $(include_remote_node)  # Variable-controlled

This allows you to:

Temporarily disable environments or nodes without deleting their configuration
Use variables to control which environments/nodes are active
Maintain multiple environment configurations and switch between them dynamically

Use transformers

Transformers are executed one by one. The order is decided by their dependencies. If there is no dependencies, their order in runbook affects the execution order.

Below transformer shows how to deploy a VM in Azure, and export it to a VHD. Before the exporting, other transformers can be added, like install kernel.

transformer:
- type: azure_deploy
  requirement:
    azure:
      marketplace: redhat rhel 7_9 7.9.2021051701
- type: azure_vhd
  resource_group_name: $(azure_deploy_resource_group_name)
  rename:
    azure_vhd_url: vhd
- type: azure_delete
  resource_group_name: $(azure_deploy_resource_group_name)

Below is the transformer to build kernel from source code and patches.

transformer:
- type: azure_deploy
  requirement:
    azure:
      marketplace: $(marketplace_image)
    core_count: 16
  enabled: true
- type: kernel_installer
  connection:
    address: $(azure_deploy_address)
    private_key_file: $(admin_private_key_file)
  installer:
    type: source
    location:
      type: repo
      path: /mnt/code
      ref: tags/v4.9.184
    modifier:
      - type: patch
        repo: https://github.com/microsoft/azure-linux-kernel.git
        file_pattern: Patches_Following_Mainline_History/4.9.184/*.patch

Reference

name

type: str, optional, default is “not_named”

Part of the test run name. This name will be used to group results and put it in title of the html report, also the created resources’ name contains this specified str.

name: Azure Default

test_project

type: str, optional, default is empty

The project name of this test run. This name will be used to group test results in html, it also shows up in notifier message.

test_project: Azure Image Weekly Testing

test_pass

type: str, optional, default is empty

The test pass name of this test run. This name combined with test project name will be used to group test results in html report, it also shows up in notifier message.

test_pass: bvt testing

tags

type: list of str, optional, default is empty

The tags of the test run. This name combined with test project name and test pass name will be used to group test results in html report, it also shows up in notifier message.

tags:
  - test
  - bvt

concurrency

type: int, optional, default is 1.

The number of concurrent running environments.

exit_on_first_failure

type: bool, optional, default is False.

When set to True, LISA will terminate test execution immediately after the first test case failure. All remaining queued test cases will be marked as skipped with the message “Test execution stops early.” This is particularly useful for debugging and reproducing specific test failures quickly.

exit_on_first_failure: true

Note

This setting only affects test case execution order. Test cases that are already running in parallel when a failure occurs will continue to completion.

import_builtin_tests

type: bool, optional, default is False.

When set to True, LISA will import and make available built-in Microsoft test cases located in the lisa/microsoft directory. These are test cases provided by Microsoft Linux System Group for comprehensive system validation.

import_builtin_tests: true

include

type: list of path, optional, default is empty

Share runbook parts for similar runs, including the shared content via that yaml primitive.

path

It can be absolute or relative path of current runbook.

extension

type: list of path str or name/path pairs, optional, default: empty

The path and the name of the modules, we can also just specify the extension path directly.

extension:
  - name: ms
    path: ../../extensions

name

type: str, optional, default is empty

Each extension can be specified a name. With the name, one extension can reference another one, using above example extension, in code we can reference it like this way ms.submodule.

path

type: str, optional, default is empty

Path of extension, it can be absolute or relative path of current runbook file.

variable

type: list of path str or name/value pairs, optional, default: empty

Used to support variables in other fields.

The values pass from command line has the highest priority, with below example, any places use ${subscription_id} will be replaced with value subscription id B.

lisa -r ./microsoft/runbook/azure.yml -v "subscription_id:<subscription id A>"

variable:
  - name: subscription_id
    value: subscription id B

The variable values in the runbook have higher priority than the same variables defined in any included runbook file. Thus, ${location} will be replaced with value northeurope in the following example.

include:
  - path: tier.yml
variable:
  - name: location
    value: northeurope

tier.yml

variable:
  - name: location
    value: westus3

The later defined variables values in runbook have higher priority than the same variables previous defined. ${location} will be replaced with value northeurope.

variable:
  - name: location
    value: westus3
  - name: location
    value: northeurope

is_case_visible

type: bool, optional, default is False.

When set to True, the value of this variable will be passed to the testcases, such as perf_nested_kvm_storage_singledisk which requires information about nested image.

is_secret

type: bool, optional, default is False.

When set to True, the value of this variable will be masked in log and other output information.

Recommend to use secret file or env variable. It’s not recommended to specify secret value in runbook directly.

file

type: list of str, optional, default: empty

Specify path of other yml files which define variables.

name

type: str, optional, default is empty.

Variable name.

value

type: str, optional, default is empty

Value of the paired variable.

transformer

type: list of Transformer, default is empty

type

type: str, required, the type of transformer. See transformers for all transformers.

See documentation for transformers.

name

type: str, optional, default is the type.

Unique name of the transformer. It’s depended by other transformers. If it’s not specified, it will use the type field. But if there are two transformers with the same type, one of them should have name at least.

prefix

type: str, optional, default is the name.

The prefix of generated variables from this transformer. If it’s not specified, it will use the name field.

depends_on

type: list of str, optional, default is None.

The depended transformers. The depended transformers will run before this one.

rename

type: Dict[str, str], optional, default is None.

The variables, which need to be renamed. If the variable exists already, its value will be overwritten by the transformer. For example, ["to_list_image", "image"] means change the variable name to_list_image to image. The original variable name must exist in the output variables of the transformer.

combinator

type: str, required.

The type of combinator, for example, grid or batch.

grid combinator

items

type: List[Variable], required.

The variables which are in the matrix. Each variable must be a list.

For example,

- type: grid
  items:
  - name: image
    value:
      - Ubuntu
      - CentOs
  - name: vm_size
    value:
      - Standard_DS2_v2
      - Standard_DS3_v2
      - Standard_DS4_v2

batch combinator

items

type: List[Dict[str, Any]], required.

Specify batches of variables. Each batch will run once.

For example,

- type: batch
  items:
  - image: Ubuntu
    vm_size: Standard_DS2_v2
  - image: Ubuntu
    vm_size: Standard_DS3_v2
  - image: CentOS
    vm_size: Standard_DS3_v2

bisect combinator

Specify a git repo url, the good commit and bad commit. The combinator performs bisect operations on VM specified under ‘connection’.

The runbook will be iterated until the bisect operations completes.

For example,

combinator:
  type: git_bisect
  repo: $(repo_url)
  bad_commit: $(bad_commit)
  good_commit: $(good_commit)
  connection:
    address: $(bisect_vm_address)
    private_key_file: $(admin_private_key_file)

Refer Sample runbook

notifier

Receive messages during the test run and output them somewhere.

console

One of notifier type. It outputs messages to the console and file log and demonstrates how to implement notification procedures.

Example of console notifier:

notifier:
  - type: console
    log_level: INFO

log_level

type: str, optional, default: DEBUG, values: DEBUG, INFO, WARNING…

Set log level of notification messages.

html

Output test results in html format. It can be used for local development or as the body of an email.

path

type: str, optional, default: lisa.html

Specify the output file name and path.

auto_open

type: bool, optional, default: False

When set to True, the html will be opened in the browser after completion. Useful in local run.

Example of html notifier:

notifier:
  - type: html
    path: ./lisa.html
    auto_open: true

junit

Output test results in JUnit XML format. The generated XML file can be used for integration with CI/CD systems, dashboards, and other tools that consume JUnit test results.

path

type: str, optional, default: lisa.junit.xml

Specify the output file name and path for the JUnit XML report.

include_subtest

type: bool, optional, default: True

When set to True, subtests will be included as separate test cases in the JUnit XML output. When set to False, only main test cases are included.

append_message_id

type: bool, optional, default: True

When set to True, the message ID will be appended to test case names in the format “test_name (message_id)”. This is useful when using combinators to distinguish multiple test runs of the same test case. When set to False, only the base test case name is used.

Example of junit notifier:

notifier:
  - type: junit
    path: ./results.xml
    include_subtest: true
    append_message_id: false

log_agent

AI-powered log analysis notifier for automated test failure investigation. This notifier leverages Azure OpenAI to automatically analyze failed test cases, providing intelligent insights into potential root causes by examining test execution logs and code context from the LISA framework.

The log_agent notifier uses a multi-agent AI system that combines:

LogSearchAgent: Specialized in searching and analyzing log files for error patterns
CodeSearchAgent: Examines source code files and analyzes implementations related to errors
Magentic Orchestration: Coordinates the agents to provide comprehensive analysis

The analysis results are attached to test result messages and made available to downstream notifiers and reporting systems.

Prerequisites:

Azure OpenAI Access with the following deployments: - GPT-4.1 or GPT-4o for general analysis - GPT-4.1 for software-specific analysis (optional) - Text-embedding-3-large for similarity calculations (optional)
Required Python packages (automatically included with LISA): - openai - agent-framework-core - agent-framework-azure-ai - retry

azure_openai_endpoint

type: str, required

Azure OpenAI service endpoint URL for the AI analysis service.

Example: https://your-resource.openai.azure.com

azure_openai_api_key

type: str, optional, default: “”

Azure OpenAI API key for authentication. If not set, the notifier will use default authentication methods available in the environment.

Note: This value is automatically marked as secret and will be masked in logs.

general_deployment_name

type: str, optional, default: “gpt-4o”

Primary GPT model deployment name for general analysis tasks. This model is used by the orchestration manager to coordinate the analysis and synthesize findings.

software_deployment_name

type: str, optional, default: “gpt-4.1”

Specialized GPT model deployment name for software-specific analysis tasks. This model is used by the CodeSearchAgent for examining source code.

embedding_endpoint

type: str, optional, default: “”

Optional embedding service endpoint for similarity calculations and analysis quality measurement.

selected_flow

type: str, optional, default: “default”

Analysis workflow type to execute. Currently supported flows:

default: Standard multi-agent analysis workflow
gpt-5: Advanced analysis workflow (future enhancement)

skip_duplicate_errors

type: bool, optional, default: True

When set to True, the notifier will skip analysis for errors that have already been analyzed in the current test run, improving performance and avoiding redundant processing.

Example of log_agent notifier:

notifier:
  - type: log_agent
    azure_openai_endpoint: https://your-resource.openai.azure.com
    azure_openai_api_key: $(azure_openai_api_key)
    general_deployment_name: gpt-4o
    software_deployment_name: gpt-4.1
    selected_flow: default
    skip_duplicate_errors: true

How it works:

Failure Detection: Automatically triggered when test cases fail
Log Analysis: Searches through test execution logs for error patterns
Code Review: Examines related source code if call traces are available
Hypothesis Generation: Generates possible reasons for the failure
Evidence Gathering: Searches for supporting evidence in logs
Root Cause Analysis: Provides comprehensive analysis with actionable insights

The AI analysis results are stored in the test result message’s analysis["AI"] field and can be consumed by other notifiers like HTML or custom reporting systems.

perfevaluation

Evaluates performance test results against predefined criteria and optionally fails tests when targets are not met.

Basic Usage:

notifier:
  - type: perfevaluation
    criteria_file: "perf_criteria.yml"
    output_file: "results.json"
    fail_test_on_performance_failure: true

Parameters:

criteria_file

type: str, optional, default: “*_criteria.yml”

Path or glob pattern to YAML files containing performance criteria.

criteria

type: dict, optional, default: None

Direct criteria definition in runbook. Takes priority over criteria_file.

Example:

notifier:
  - type: perfevaluation
    criteria:
      statistics_times: 1
      error_threshold: 0.1
      statistics_type: average
      groups:
        - name: "NVMe Performance"
          conditions:
            - name: "test_case"
              type: "metadata"
              value: "perf_nvme"
            - name: "vm_size"
              type: "information"
              value: "Standard_L64*"
          metrics:
            - name: "qdepth_32_iodepth_1_numjob_32_setup_raw_bs_4k_cores_32_disks_8_read_iops"
              min_value: 800000.0
              target_value: 979000.0
              error_threshold: 0.25

output_file

type: str, optional, default: None

Output path for detailed evaluation results in JSON format.

statistics_times

type: int, optional, default: None

Number of test runs to use for statistical calculations. If specified, overrides the global setting in criteria YAML.

fail_test_on_performance_failure

type: bool, optional, default: False

Mark tests as failed when performance criteria are not met.

YAML Criteria Format:

Hierarchical format with groups and conditions:

# Global settings
statistics_times: 1
error_threshold: 0.1
statistics_type: average

groups:
  - name: "NVMe Performance - L64 Series"
    description: "Performance criteria for Standard_L64as_v3 and Standard_L64s_v2 VMs"
    error_threshold: 0.20
    statistics_type: average
    statistics_times: 1

    conditions:
      - name: "test_case"
        type: "metadata"
        value: "perf_nvme"
      - name: "vm_size"
        type: "information"
        value: "Standard_L64*"

    metrics:
      - name: "qdepth_32_iodepth_1_numjob_32_setup_raw_bs_4k_cores_32_disks_8_read_iops"
        min_value: 800000.0
        target_value: 979000.0
        error_threshold: 0.25

Global Configuration:

statistics_times: Number of test runs for statistical calculations (default: 1)
error_threshold: Global tolerance for performance deviation (default: 0.1 = 10%)
statistics_type: Statistical method to use - average (default), median, min, or max

Group Configuration:

Each group can override global settings and contains:

name: Group identifier
description: Human-readable description
error_threshold: Group-level tolerance
statistics_type: Statistical method for this group
statistics_times: Number of runs for this group
conditions: Matching rules for test results
metrics: Performance metrics to evaluate

Metric Properties:

min_value: Minimum acceptable value (inclusive)
max_value: Maximum acceptable value (inclusive)
target_value: Expected target value
error_threshold: Acceptable deviation from target (as decimal, e.g., 0.25 = 25%)

Pattern Matching:

Uses fnmatch-style wildcards:

Standard_L64*: Matches Standard_L64as_v3, Standard_L64s_v2, etc.
*nvme*: Test cases containing “nvme”
Standard_D*ads_v5: D-series with specific pattern

Condition Structure:

Each condition must specify three fields:

name: The field name to match (e.g., test_case, vm_size)
type: The condition type - either metadata or information
value: The pattern to match (supports wildcards)

Condition Types:

metadata: Matches test case metadata fields (e.g., test_case name)
information: Matches runtime information fields (e.g., vm_size)
All conditions within a group must match (AND logic)

Example condition:

conditions:
  - name: "test_case"
    type: "metadata"
    value: "perf_nvme*"
  - name: "vm_size"
    type: "information"
    value: "Standard_L*"

Example - Network Performance:

groups:
  - name: "TCP NTTTCP SRIOV Performance"
    conditions:
      - name: "test_case"
        type: "metadata"
        value: "perf_tcp_ntttcp_sriov"
      - name: "vm_size"
        type: "information"
        value: "Standard_D2ads_v5"

    metrics:
      - name: "throughput_in_gbps_conn_1"
        min_value: 10.0
        target_value: 11.89
        error_threshold: 0.30

environment

List of environments. For more information, refer to Node and Environment.

retry

Number of retry attempts for failed deployments, default value is 0.

environments

List of test run environment.

name

type: str, optional, default is empty

The name of the environment.

enabled

type: bool, optional, default is true

Controls whether the environment is loaded and used during test execution. When set to false, the environment will be skipped during initialization. This is useful for definining multiple similar environments in the same runbook.

Example:

environment:
  environments:
    - name: prod_env
      enabled: true  # This environment will be loaded
      nodes:
        - type: local
    - name: dev_env
      enabled: $(use_dev_env)  # Variable-controlled
      nodes:
        - type: local

topology

type: str, optional, default is “subnet”

The topology of the environment, current only support value “subnet”.

nodes

List of node, it can be a virtual machine on Azure or Hyper-V, bare metal or others. For more information, refer to Node and Environment.

Each node supports an enabled field:

enabled (bool, optional, default is true): Controls whether the node is loaded during environment initialization. When set to false, the node will be skipped. This is useful for selecting specific nodes from the same environment configuration.

Example:

environment:
  environments:
    - name: test_env
      nodes:
        - name: node1
          type: local
          enabled: true  # This node will be loaded
        - name: node2
          type: local
          enabled: false  # This node will be skipped
        - name: node3
          type: remote
          address: 192.168.1.100
          enabled: $(enable_node3)  # Variable-controlled

nodes_requirement

List of testing required environments, by default node_count (default is 1), core_count (default is 1), memory_mb (default is 512 MB), data_disk_count (default is 0), nic_count (default is 1), gpu_count (default is 0). The node can be created once the node requirement is met.

type

type: str, optional, default value is “requirement”, supported values are “requirement”, “remote”, “local”.

platform

List of platform, default value is “ready”, current support values are “ready”, “azure”.

testcase

type: list of str, optional, default: lisa

Criteria to select cases.

criteria

type: list of dictionary, optional, default is empty

Select test cases by area, category, name, priority or tags combined with select action.

select_action can be “none”, “include”, “exclude”, “forceInclude” and “forceExclude”, default value is “none”.

testcase:
  - criteria:
      priority: 0
    select_action: include
  - criteria:
      priority: 1
    select_action: exclude

times

type: int, optional, default is 1

Run this group of test cases the specified number of times. This is useful for stress testing or ensuring test reliability.

testcase:
  - criteria:
      priority: 0
    times: 3

retry

type: int, optional, default is 0

Number of retry attempts if a test case fails. When a test case fails, LISA will automatically retry it up to the specified number of times. The test environment is deleted and recreated for each retry attempt to ensure a clean state.

This is particularly useful for:

Tests that may experience transient failures
Flaky tests that need multiple attempts to pass
Tests that interact with external services

testcase:
  - criteria:
      priority: 0
    retry: 2

Note

The retry count is independent of the times count. If both are set, the test will run times × (1 + retry attempts) in the worst case where all attempts fail.

timeout

type: int, optional, default is 0

Timeout in seconds for each test case. When a test case runs, LISA uses the maximum value between the timeout specified in the runbook and the test case’s own metadata timeout. If this field is set to 0 (default) or not specified, only the test case’s metadata timeout is used (which defaults to 3600 seconds / 1 hour if not explicitly set in the test case). This allows you to extend timeouts for specific test runs without modifying the test case code.

Note that this timeout applies to the overall test case execution. Any additional command-level timeouts set within the test case code itself will not be affected by this setting.

testcase:
  - criteria:
      name: verify_deployment_provision_ultra_datadisk
    timeout: 3600

use_new_environment

type: bool, optional, default is False

When set to True, each test case with this rule will be run in a newly created environment. This ensures complete isolation between test cases but increases the overall test execution time.

testcase:
  - criteria:
      name: verify_stop_start_in_platform
    use_new_environment: true

ignore_failure

type: bool, optional, default is False

When set to True, failed test results will be rewritten as success. This is intended as a temporary workaround for known issues and should not be overused.

testcase:
  - criteria:
      name: known_flaky_test
    ignore_failure: true

Warning

This setting masks test failures and should only be used as a temporary measure. Do not use it to hide real issues.