How to run experiments measuring accessibility changes with representative sampling of assistive technology users
This evergreen guide outlines rigorous experimental design and sampling strategies to measure accessibility shifts, ensuring inclusive participation from assistive technology users and yielding actionable, reliable insights for designers and researchers alike.
July 23, 2025
Accessibility research thrives on systematic experimentation that centers user experience while controlling for confounding factors. Begin by framing a clear hypothesis about how a given change might influence usability, readability, navigation, or performance for assistive technology users. Establish measurable outcomes that align with real-world tasks, such as completing a form, locating information, or performing a sequence of actions within an app. Develop a stable baseline by testing current interfaces with a representative sample across assistive technologies. Document context, tasks, metrics, and environmental conditions so replication is straightforward. Ensure consent, privacy, and accessibility of study materials are integrated from the outset to support ethical research practices.
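To keep that documentation consistent across sessions, a minimal record sketch can help. The Python structure below is a hypothetical schema (the field names and example values are illustrative, not a prescribed standard) for capturing context, task, metrics, and environment for each baseline session so later runs can be compared against it.

```python
from dataclasses import dataclass, field, asdict
from typing import Optional
import json

@dataclass
class BaselineSession:
    """One baseline observation: who, with what assistive technology, doing which task."""
    participant_id: str          # pseudonymous ID, never personal data
    assistive_tech: str          # e.g. "screen reader", "voice control", "magnification"
    tech_version: str            # exact version so replications match conditions
    task_id: str                 # e.g. "complete_signup_form"
    completed: bool              # did the participant finish the task?
    completion_time_s: Optional[float]  # None if the task was abandoned
    error_count: int             # observed interaction errors
    environment: dict = field(default_factory=dict)  # device, OS, network, etc.
    notes: str = ""              # context needed for replication

# Example: serialize a session so the baseline is documented and shareable.
session = BaselineSession(
    participant_id="P-017",
    assistive_tech="screen reader",
    tech_version="NVDA 2024.1",
    task_id="complete_signup_form",
    completed=True,
    completion_time_s=142.5,
    error_count=2,
    environment={"device": "laptop", "os": "Windows 11", "network": "wifi"},
)
print(json.dumps(asdict(session), indent=2))
```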
Once you have a baseline, plan your sampling to reflect diverse accessibility needs and device configurations. Identify variables such as screen readers, magnification levels, keyboard navigation proficiency, voice control, and cognitive load. Include participants with varying disability types to avoid skewing results toward one user profile. Determine sample size with a formal calculation that balances statistical power against practical constraints like recruitment time and budget. Use stratified sampling to guarantee representation of key subgroups, and consider quota-based approaches if certain assistive technologies are less common in your user population. Predefine inclusion criteria, compensation policies, and the accessibility accommodations you will provide during participation.
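As one way to perform that formal calculation, the sketch below uses the statsmodels library to size a two-group comparison of task completion rates and then splits each group across assistive technology strata. The hypothesized completion rates, power target, and stratum shares are placeholders to replace with your own figures.

```python
# A minimal power calculation sketch using statsmodels (illustrative numbers only).
import math
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

# Hypothesized completion rates: 70% on the current interface, 85% after the change.
effect_size = proportion_effectsize(0.70, 0.85)

# Per-group sample size for 80% power at a 5% two-sided significance level.
n_per_group = NormalIndPower().solve_power(effect_size=effect_size,
                                           alpha=0.05, power=0.80,
                                           alternative="two-sided")
n_per_group = math.ceil(n_per_group)
print(f"Participants needed per group: {n_per_group}")

# Stratified allocation: split each group across assistive technology strata
# in proportion to their prevalence in your user population (placeholder shares).
strata = {"screen reader": 0.45, "magnification": 0.25,
          "voice control": 0.20, "switch access": 0.10}
allocation = {name: math.ceil(n_per_group * share) for name, share in strata.items()}
print("Per-group allocation by stratum:", allocation)
```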
Ensuring robust sampling and rigorous measurement methods
The measurement strategy should mirror how users interact in ordinary contexts, not just laboratory tasks. Combine objective metrics, such as error rates, task completion times, and interaction counts, with subjective feedback captured through accessible surveys and interviews. Ensure that tasks align with standard workflows in the product domain, from onboarding to routine maintenance. Use counterbalanced task orders to minimize learning effects, and implement randomization where appropriate to reduce systematic biases. Record environmental variables like device type, operating system, network conditions, and screen reader versions. Analyze data with methods that accommodate non-normal distributions and missing values, using imputation strategies that preserve essential relationships.
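A minimal sketch of both ideas follows, assuming Python with SciPy: rotated task orders give each task every serial position equally often, and a rank-based test compares completion times without assuming normality. The task names, participant IDs, and timings are illustrative.

```python
# A sketch of cyclic counterbalancing plus a distribution-free comparison.
import random
from scipy.stats import mannwhitneyu

tasks = ["fill_form", "find_info", "change_settings", "checkout"]

def cyclic_orders(items):
    """Yield rotated task orders so each task appears in each serial position equally often."""
    for shift in range(len(items)):
        yield items[shift:] + items[:shift]

orders = list(cyclic_orders(tasks))

# Assign each participant one order, cycling through them; also randomize condition.
random.seed(7)  # fixed seed so the assignment is reproducible
participants = [f"P-{i:03d}" for i in range(1, 13)]
assignments = {p: {"order": orders[i % len(orders)],
                   "condition": random.choice(["current", "revised"])}
               for i, p in enumerate(participants)}

# Compare completion times with a rank-based test that tolerates non-normal data.
current_times = [141, 180, 95, 210, 167, 155]    # seconds, illustrative
revised_times = [120, 150, 88, 175, 140, 133]
stat, p_value = mannwhitneyu(current_times, revised_times, alternative="two-sided")
print(f"Mann-Whitney U = {stat:.1f}, p = {p_value:.3f}")
```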
Transparency is essential for credible findings. Pre-register your study design, hypotheses, and analysis plan to deter selective reporting. Publish a detailed protocol describing recruitment methods, materials, and ethics approvals. During analysis, report confidence intervals, effect sizes, and practical significance alongside p-values, helping stakeholders assess real-world impact. Include sensitivity analyses to demonstrate robustness under alternative assumptions. When sharing results, provide accessible summaries for non-technical audiences and supply data dictionaries that clarify variable definitions. Encourage independent replication by sharing anonymized datasets and analysis scripts in a repository with clear licensing.
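The snippet below sketches that style of reporting, assuming NumPy and SciPy: a p-value, a standardized effect size (Cohen's d with a pooled standard deviation), and a bootstrap confidence interval for the mean difference, all computed on illustrative completion times rather than real study data.

```python
# Reporting effect size and a bootstrap confidence interval alongside the p-value.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(42)
current = np.array([141, 180, 95, 210, 167, 155, 190, 130], dtype=float)
revised = np.array([120, 150, 88, 175, 140, 133, 160, 118], dtype=float)

# p-value from a two-sample Welch t-test (report it, but not in isolation).
t_stat, p_value = ttest_ind(current, revised, equal_var=False)

# Cohen's d using the pooled standard deviation, as a standardized effect size.
pooled_sd = np.sqrt((current.var(ddof=1) + revised.var(ddof=1)) / 2)
cohens_d = (current.mean() - revised.mean()) / pooled_sd

# Bootstrap 95% CI for the mean difference in completion time (seconds).
boot_diffs = [rng.choice(current, current.size, replace=True).mean()
              - rng.choice(revised, revised.size, replace=True).mean()
              for _ in range(10_000)]
ci_low, ci_high = np.percentile(boot_diffs, [2.5, 97.5])

print(f"p = {p_value:.3f}, Cohen's d = {cohens_d:.2f}, "
      f"mean difference 95% CI = [{ci_low:.1f}, {ci_high:.1f}] seconds")
```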
Handling variability in assistive technology ecosystems and user capabilities
Recruitment should target a broad audience of assistive technology users to avoid biased conclusions. Leverage partnerships with disability organizations, accessibility consultants, and community groups to reach potential participants who reflect varied ages, languages, and cultural backgrounds. Offer multiple participation modalities, including remote, in-person, and asynchronous tasks, to reduce barriers. Provide interpreters or captions as needed to support comprehension during consent and instructions. Maintain flexible schedules and accessible facilities, and verify assistive technology compatibility before sessions begin. Track response rates and reasons for dropout to identify and address points of friction in the process, adjusting outreach strategies accordingly. Document demographic and usage characteristics for stratified analyses.
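A small sketch of that tracking, assuming pandas and hypothetical outreach channels and counts, might look like this:

```python
# Tracking response and dropout by outreach channel (channel names and counts are hypothetical).
import pandas as pd

outreach = pd.DataFrame([
    {"channel": "disability org partner",    "invited": 80,  "responded": 46, "completed": 38},
    {"channel": "accessibility consultant",  "invited": 35,  "responded": 21, "completed": 19},
    {"channel": "community forum",           "invited": 120, "responded": 41, "completed": 28},
])

outreach["response_rate"] = outreach["responded"] / outreach["invited"]
outreach["dropout_rate"] = 1 - outreach["completed"] / outreach["responded"]
print(outreach[["channel", "response_rate", "dropout_rate"]].round(2))
```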
Data quality hinges on precise task scripting and instrumentation. Create standardized prompts and avoid ambiguous language that could confuse participants across diverse assistive technologies. Instrument devices to capture consistent metrics, ensuring timestamps, event logs, and interaction traces are synchronized. Calibrate tools to account for differences in verbosity, speech recognition accuracy, and keyboard layouts. Establish adjudication rules for ambiguous outcomes and implement double coding for qualitative responses. Use pilot studies to refine materials and confirm that all accessibility features function as intended. Maintain rigorous version control so researchers can reproduce the exact experimental conditions.
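For the double-coding step, inter-coder agreement can be quantified before adjudication; the sketch below uses Cohen's kappa from scikit-learn on hypothetical category codes assigned by two coders.

```python
# Checking agreement between two coders of qualitative responses with Cohen's kappa.
from sklearn.metrics import cohen_kappa_score

coder_a = ["confusing", "clear", "clear", "broken", "confusing", "clear", "broken", "clear"]
coder_b = ["confusing", "clear", "confusing", "broken", "confusing", "clear", "clear", "clear"]

kappa = cohen_kappa_score(coder_a, coder_b)
print(f"Cohen's kappa = {kappa:.2f}")
# Low agreement should trigger the adjudication rules defined before coding began.
```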
Translating results into design decisions and policy implications
Variability in devices, software, and user proficiency is inevitable, but it can be managed. Implement a factorial design when feasible to explore the influence of multiple factors such as device type, assistive technology version, and user expertise. Use blocking to group similar sessions, reducing variance due to extraneous conditions. Record explicit details about each participant’s device, software, and customization settings, as these may influence outcomes. Incorporate adaptive difficulty in tasks to prevent ceiling or floor effects that obscure true differences. Analyze interactions between factors to identify combinations that yield the most meaningful accessibility improvements or unintended regressions.
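One way to analyze such a design is a linear model with an interaction term; the sketch below, assuming statsmodels and a small synthetic dataset, tests whether device type and user expertise interact in their effect on completion time.

```python
# Analyzing a two-factor design with an interaction term (synthetic data stands in for session logs).
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

data = pd.DataFrame({
    "device":    ["desktop"] * 6 + ["mobile"] * 6,
    "expertise": (["novice"] * 3 + ["expert"] * 3) * 2,
    "completion_time": [180, 172, 195, 120, 115, 130,   # desktop: novice, then expert
                        230, 210, 245, 140, 150, 135],  # mobile:  novice, then expert
})

# Fit a model with main effects of device and expertise plus their interaction.
model = smf.ols("completion_time ~ C(device) * C(expertise)", data=data).fit()
print(anova_lm(model, typ=2))
```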
When changes yield mixed results, interpret findings with nuance and care. Distinguish statistical significance from practical relevance, particularly in accessibility where small gains can translate into meaningful everyday benefits. Explore subgroup effects to determine whether particular combinations of assistive technology and interface adjustments help specific user groups more than others. Present confidence intervals that reflect uncertainty and acknowledge limitations due to sample size or measurement noise. Offer actionable recommendations that consider maintenance costs, scalability, and compatibility with existing accessibility guidelines to support informed decision-making.
Best practices for ongoing, representative accessibility experimentation
The ultimate goal of rigorous testing is to guide design decisions that improve accessibility without compromising other usability goals. Translate findings into concrete design changes, such as simplifying navigation patterns, enhancing focus management, or adjusting color contrast targets. Prioritize changes that deliver the greatest benefit across the widest spectrum of assistive technologies while preserving performance for all users. Align recommendations with recognized accessibility standards and industry best practices, but tailor them to the product’s context and constraints. Document expected trade-offs and estimated long-term impact to help leaders allocate resources effectively and justify investments in accessibility.
Stakeholder engagement is key to turning data into action. Present findings in accessible formats for product teams, executives, and end users, incorporating visualizations, narratives, and concrete examples. Facilitate workshops where designers, researchers, and engineers review results and brainstorm iterative improvements. Build a roadmap that sequences enhancements by impact, feasibility, and risk, including short-term wins and long-term commitments. Establish metrics for ongoing monitoring that extend beyond release cycles, enabling continuous refinement. Encourage cross-functional accountability by assigning owners for each recommended change and defining milestones for validation studies.
As accessibility evolves, so should your experimentation framework. Regularly refresh representative samples to reflect changing technologies, user needs, and product ecosystems. Schedule periodic re-testing of core tasks after major updates and whenever new assistive technology features are released. Maintain a living protocol that incorporates lessons learned, updates to measurement definitions, and improved recruitment strategies. Foster a culture of curiosity where teams seek to understand unintended consequences and pursue incremental improvement. Ensure that ethical considerations remain central, including voluntary participation, fair compensation, and clear communication about how data will be used to advance accessibility.
In continuously evolving digital environments, the right method is as important as the right outcome. Use rigorous experimental controls combined with empathetic user engagement to build confidence among stakeholders. Emphasize transparency, reproducibility, and inclusivity in every phase—from planning and recruitment to analysis and dissemination. Prioritize accessibility in reporting so that stakeholders internalize the value of inclusion and invest in durable, scalable solutions. By grounding decisions in representative sampling and robust analytics, organizations can deliver interfaces that serve everyone more effectively, while advancing professional standards for accessibility research and product development.