User-Centred Design and Evaluation of Infrastructural Software
This is a review of two publications by Edwards, Bellotti, Dey, and Newman on user-centred design (UCD) and evaluation of infrastructural software (also known as middleware): the 2002 technical report “Stuck in the Middle: Bridging the Gap Between Design, Evaluation, and Middleware” and the follow-on CHI 2003 paper “Stuck in the Middle: The Challenges of User-Centered Design and Evaluation for Infrastructure”.
The fundamental question asked is: “How can designers of infrastructural software design and evaluate features without knowing about client applications and their users?”
In these publications, Edwards et al. address the problem of UCD and evaluation of ‘infrastructural software’ or ‘middleware’ (cf. both publications for their use of the terms). Infrastructural software is a class of software designed to support the development and operation of other software, i.e. client software such as end-user applications or other middleware. Infrastructural software provides client applications with technical capabilities, such as supporting styles of interaction or determining application features. It can take many forms, such as software libraries, online services, toolkits for software development and other platforms, and the authors provide two contrasting examples: graphical toolkits, which enable rapid creation of the application presentation layer, and document management infrastructures, which handle documents and data objects. Of course, examples extend to many other application domains, such as libraries for security and cryptography, computer vision, web APIs for financial services, etc.
From a UCD perspective, which features go into an application and how well those features address the needs of users determine the usefulness of the software and, ultimately, its worth. However, apart from conventional “technical” software evaluation metrics (performance, scalability, security, robustness, etc.), the authors found no user-centred criteria (i.e. usability and usefulness) for designing and evaluating infrastructure features.
Designing and evaluating user-centred infrastructural software is a task of inherent difficulty, since the technical features of the underlying infrastructural code are not directly visible, but rather expressed through the features of the client applications. This “level of indirection” makes defining clear criteria for designing and evaluating infrastructural software features very challenging. The authors state this problem in the first paper as the middleware design and evaluation “gaps” and use it as the point of departure for their dual-focus investigation.
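To make this level of indirection concrete, here is a minimal sketch (my own illustration, not code from the papers) that pairs a toy infrastructure layer with a client application built on top of it; the names TagStore and NotesApp are hypothetical. Users only ever interact with the client, so any judgement about the infrastructure’s tagging and query feature can only be made indirectly, through the client’s behaviour.

```python
class TagStore:
    """Toy infrastructure layer: stores items and lets clients attach free-form tags."""
    def __init__(self):
        self._items = {}  # item_id -> set of tags

    def add(self, item_id, tags=()):
        self._items[item_id] = set(tags)

    def query(self, tag):
        # Infrastructure feature: dynamic lookup by tag.
        return [i for i, tags in self._items.items() if tag in tags]


class NotesApp:
    """Client application: the only layer a user ever sees or evaluates."""
    def __init__(self, store):
        self._store = store

    def create_note(self, note_id, *labels):
        self._store.add(note_id, labels)

    def find(self, label):
        # The usability of this search is what users experience; the store's
        # query feature is only ever evaluated through it, indirectly.
        return self._store.query(label)


app = NotesApp(TagStore())
app.create_note("todo-1", "work", "urgent")
app.create_note("recipe-7", "home")
print(app.find("work"))  # ['todo-1']
```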
For the study of the design gap of user-centred infrastructural software, Edwards et al. look at the process of determining which features are designed into the infrastructure and exposed. They attempt to go beyond the more conventional designer-centric approach (i.e. based on the designer’s experience, intuition, sensibility, …) of abstracting feature requirements from potential client applications into infrastructural software features. Rather, they focus on:
- the relationship between designing client code features and infrastructural software features: how can their design be coupled more tightly, and how can this direct connection be designed for when client code does not yet exist?
- the impact of users’ contexts and the contexts of use of client applications on the design of middleware features
- achieving a feature set that strikes a balance between usefulness and complexity — avoiding feature creep and a bloated, overly complex system, and at the same time developing a system that does not require constant updates.
Edwards et al. concede that infrastructural software can manifest itself in any number of possible applications and take a clear position on the middleware evaluation gap, in which only client applications and their users should be used for assessment purposes: “the aspects of [a toolkit] that are designed to support a particular user experience can only be evaluated in the context of users, not programmers, and thus must be evaluated indirectly — through applications built using the toolkit.” This led their research to focus on:
- how to choose applications to evaluate the middleware while considering their users and context of use.
- the usefulness of this kind of indirect evaluation—what “does the manifestation of the technology in a particular application say about the capabilities (or even desirability) of the middleware itself?”
- the adequacy of techniques for evaluation — are the “techniques for evaluating applications acceptable when our goal is to evaluate the middleware upon which those applications are based?”
Edwards et al. analyse the major challenges and lessons learned over the course of a set of case studies around middleware and infrastructural code motivated by human concerns. They propose a set of guidelines to bridge the middleware design and evaluation gaps, which I quote and paraphrase below in order to highlight the main takeaways:
- “Prioritise core infrastructure features” — use a minimalistic infrastructure to test and assess the value of features that support core design ideas. The first step should be the identification of core design ideas and the features that implement them. Only after their validation should feedback be gathered to inform new features, while striving to avoid feature creep. Edwards et al. illustrate this with the Placeless Documents system, where a scenario-based approach was used to identify a considerable number of features. Although most of them were motivated by improving the user experience, their implementation led to technical bloat.
- “First, build prototypes with high fidelity that express the core objectives of the infrastructure: Initial prototypes should leverage the fullest extent of the power of the infrastructure, since the assumption that this power makes a difference is the critical thing to test first” — identify the features of highest value and validate them for the class of applications that the infrastructure should enable. In Placeless, the key features that crucially enabled the novel fluid organisation were the flexible model for document metadata coupled with a dynamic query mechanism (see the sketch after this list). Instead, too many resources were diverted into building robustness and support features for real-world, long-term-use applications.
- “Any test-application built to demonstrate the infrastructure must also satisfy the usual criteria of usability and usefulness” — core test applications that satisfy usability and usefulness demand “requirements gathering, prototyping, evaluation, and iteration”; the more ambitious the application, the more time is required to satisfy these requirements. This demands thoughtful management, as it competes for the resources available for building and testing core infrastructure features. Edwards et al. point out how the shortcomings of the two real-world test applications in meeting usability and usefulness criteria, owing to their complexity and to the tradeoffs involved in evaluating the infrastructure, undermined the evaluation strategy.
- “Initial proof-of-concept applications should be lightweight” — early testing of core infrastructure features should focus on getting the basics of the infrastructure right and assessing its intrinsic value, rather than on long-term use feedback of a well-rounded, real-world-style application.
- “Be clear about what your test-application prototypes will tell you about your infrastructure”—the purpose of a core-test application is to understand the pros and cons of the infrastructure; the demands of building test applications divert from this purpose and should be minimised.
- “Do not confuse the design and testing of experimental infrastructure with the provision of an infrastructure for other experimental application developers” — core test applications should contribute to the progressive evolution and incremental stability of infrastructural code, rather than open it up to new feature contributions, unmanaged change, and all the inherent problems these bring (“…propagation of redundant, missing or changing features and consequential chaos and breakage.”).
- “Define a limited scope for test-applications and permissible uses of the infrastructure” — however numerous the possibilities the infrastructure offers for exploitation and for leveraging its complexity, there should be a common interpretation of its purpose, of which features to use, and of how best to use them, in order to facilitate the assessment of its strengths and weaknesses.
- “There is no point in faking components and data if you intend to test for user experience benefits” — this item is self-explanatory and mostly derived from a specific decision in the Context Toolkit project to use simulation support to compensate for the lack of specific sensors in real use scenarios.
- “Understand that the scenarios you use for evaluation may not reflect how the technology will ultimately be used […] the process of determining the most appropriate embodiment of the technology comes from further scenarios, market analysis, and studies of the intended setting of the technology.” — it should be well acknowledged that evaluation scenarios provide limited coverage. Future use and market adoption, for instance, are outside the scope of infrastructure evaluation, as they have different requirements.
- “Anticipate the consequences of the tradeoff between building useful/usable applications versus applications that test the core features of the middleware.” — although core test applications will hardly fulfil the usefulness/usability criteria in the same way full-fledged applications do, they provide the benefits listed above in driving the discovery of useful infrastructural extensions. Further, they can pave the way for more “serious” client applications.
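As a concrete illustration of the Placeless-style pairing of flexible metadata with dynamic queries mentioned above, here is a minimal, hypothetical sketch; the names Document, Collection and the predicate-based query are my own assumptions, not the actual Placeless API. Membership in a collection is computed from a query over document properties rather than from a fixed folder location, which is what gives the “fluid” organisation its flexibility.

```python
class Document:
    """A document with arbitrary, user-extensible metadata properties."""
    def __init__(self, name, **properties):
        self.name = name
        self.properties = dict(properties)


class Collection:
    """A 'fluid' collection: membership is defined by a query over properties,
    so a document can appear in any number of collections at once."""
    def __init__(self, predicate):
        self.predicate = predicate

    def members(self, documents):
        return [d for d in documents if self.predicate(d.properties)]


docs = [
    Document("budget.xls", project="alpha", shared_with="vicki"),
    Document("notes.txt", project="alpha", draft=True),
    Document("paper.pdf", project="beta"),
]

alpha = Collection(lambda p: p.get("project") == "alpha")
shared = Collection(lambda p: "shared_with" in p)

print([d.name for d in alpha.members(docs)])   # ['budget.xls', 'notes.txt']
print([d.name for d in shared.members(docs)])  # ['budget.xls']
```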
In summary, and appropriating Bill Buxton’s famous title, UCD and evaluation of infrastructural software are about getting the right infrastructural features and designing them right, just as much as with other UCD artefacts, such as full-fledged GUI applications. For Edwards et al., the main point is that UCD and evaluation of infrastructural software come with a specific level of indirection that makes them more challenging. The specific challenges of UCD for infrastructure include using multiple scenarios to drive design, providing a minimalistic set of features that avoids bloat while still being useful, and designing for usability. The specific challenges for user-centred evaluation are applying it in real use contexts and via lightweight technology demonstrations, assessing value early, and focusing on usefulness and usability as the fundamental evaluation criteria.
The value that infrastructural features provide is obviously tied to how they are expressed through client applications and how these are extended with new technical capabilities or more usable interaction styles. A set of core test applications can support the validation and assessment of infrastructure features if it represents the class of applications that the infrastructure is intended to support. The evaluation should be directed at informing, improving, validating and understanding the pros and cons of the highest-value features of the infrastructural code. The design of these core test applications should follow user-centred design guidelines and therefore strive for usability and usefulness; this entails that core test applications have their own independent cycle of “requirements gathering, prototyping, evaluation, and iteration”. The caveat is that this is a resource-intensive process which competes with developing the infrastructural code itself, so any collateral work should be deferred.
References:
- Edwards, W. K., Bellotti, V., Dey, A. K., & Newman, M. W. (2002). Stuck in the Middle: Bridging the Gap Between Design, Evaluation, and Middleware. Technical report, Intel Research Berkeley, California. Retrieved from http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.3.133&rep=rep1&type=pdf
- Edwards, W. K., Bellotti, V., Dey, A. K., & Newman, M. W. (2003). Stuck in the Middle: The Challenges of User-Centered Design and Evaluation for Infrastructure. CHI 2003, 297–304. http://doi.org/10.1145/642611.642664