paper review

Simard et al. “Machine Teaching: A New Paradigm for Building Machine Learning Systems” review

Francisco Bernardo

29 Jul 2017 • 5 min read

This is another review in my Human-Centred Machine Learning review series. “Machine Teaching: A New Paradigm for Building Machine Learning Systems”, authored by Simard et al. from Microsoft Research, is a position paper and manifesto for Machine Teaching.

[Illustration IDEO — https://www.fastcodesign.com/90147010/exclusive-ideos-plan-to-stage-an-ai-revolution]

Simard et al. propose Machine Teaching as a new discipline that is closely related to machine learning, but fundamentally distinct:

Machine Learning (ML) research - “aims at making the learner better by improving ML algorithms”
Machine Teaching (MT) research - “aims at making the teacher more productive at building machine learning models.”

These operational definitions make the distinction stand out by highlighting the human-centred nature of MT and the shift of focus from optimising ML performance to optimising ML productivity. MT is motivated by the frailties of the traditional lifecycle of an ML model, which Simard et al. describe as including:

long iterations for model building (typically weeks, incl. data collection, labelling, training, evaluation, optimisation, etc.)
reasonable but temporary stability (typically months, until it breaks for many reasons, e.g., covariate shifts, variations in the feature space, label noise, concept evolution, software bugs and updates, etc.)
problems in re-iterating the process (documentation, lack of available expertise, moving staff, lack of modularity, high maintenance costs, etc.)

Simard et al. propose MT as a holistic approach to reduce human costs (i.e., required expertise and maintenance time) in the process of teaching a machine learner. Under the MT paradigm, the teacher is shielded from the complexities of the algorithmic runtime and optimisation procedures by a solution that uses well-defined and standardised interfaces, together with ML algorithms that support those interfaces. These interfaces should 1) describe inputs (feature values) and outputs (labels, predictions) of learning algorithms, 2) enable “examples to be distinguished in meaningful ways”, 3) enable the addition and removal of features to address feature blindness and approximation error, and 4) enable the addition of labelled examples to reduce the estimation error.
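To make the four interface requirements more tangible, here is a minimal sketch in Python. Every name in it (`Learner`, `NearestNeighbourLearner`, `TeachingSession` and their methods) is hypothetical and chosen for illustration only; the paper does not specify an API, and the 1-nearest-neighbour learner is just a stand-in for whatever algorithm sits behind the interface.

```python
from typing import Callable, Dict, List, Protocol, Tuple

Example = str                          # assumption: examples are plain strings
Feature = Callable[[Example], float]   # a feature maps an example to a scalar


class Learner(Protocol):
    """Anything that can be trained on feature vectors and predict a label."""

    def fit(self, X: List[List[float]], y: List[int]) -> None: ...
    def predict(self, x: List[float]) -> int: ...


class NearestNeighbourLearner:
    """Stand-in learner: returns the label of the closest training vector."""

    def __init__(self) -> None:
        self.X: List[List[float]] = []
        self.y: List[int] = []

    def fit(self, X: List[List[float]], y: List[int]) -> None:
        self.X, self.y = list(X), list(y)

    def predict(self, x: List[float]) -> int:
        dists = [sum((a - b) ** 2 for a, b in zip(x, row)) for row in self.X]
        return self.y[dists.index(min(dists))]


class TeachingSession:
    """Teacher-facing interface; the learner behind it can be swapped."""

    def __init__(self, learner: Learner) -> None:
        self.learner = learner
        self.features: Dict[str, Feature] = {}
        self.labels: List[Tuple[Example, int]] = []

    def add_feature(self, name: str, fn: Feature) -> None:
        self.features[name] = fn

    def remove_feature(self, name: str) -> None:
        self.features.pop(name, None)

    def add_label(self, example: Example, value: int) -> None:
        self.labels.append((example, value))

    def featurize(self, example: Example) -> List[float]:
        return [fn(example) for fn in self.features.values()]

    def train(self) -> None:
        X = [self.featurize(e) for e, _ in self.labels]
        y = [v for _, v in self.labels]
        self.learner.fit(X, y)

    def predict(self, example: Example) -> int:
        return self.learner.predict(self.featurize(example))


session = TeachingSession(NearestNeighbourLearner())
session.add_feature("has_free", lambda t: 1.0 if "free" in t.lower() else 0.0)
session.add_label("Free money now", 1)
session.add_label("Meeting at noon", 0)
session.train()
```

In this sketch the teacher only ever touches features and labelled examples; replacing `NearestNeighbourLearner` with another object that has `fit` and `predict` would not change any teacher-facing call.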

Simard et al. took inspiration from many aspects of software engineering (SE) in thinking about and designing the Machine Teaching approach. For instance, modularisation and decomposition are important principles used in SE when solving complex problems. In addition, supporting collaboration through the adoption of standardised tools (e.g., programming languages, APIs, documentation, design patterns, componentisation, version control, etc.) allows the solution of a complex problem to scale to multiple contributors. Simard et al. extrapolate from these principles, and from the historical evolution of programming (which first focused on compute performance, then expanded to domain applications, and then grew explosively with personal computing, high-level programming languages and web programming), to raise the expectations of MT to the level of SE.

The role of machine teachers

Simard et al. propose that “the role of the teacher is to transfer knowledge to the learning machine so that it can generate a useful model that can approximate a concept”. They provide the following set of operational definitions to clarify what they mean:

“A concept is a mapping from any example to a label value.”
“A feature is a concept that assigns each example a scalar value.”
“A teacher is the person who transfers concept knowledge to a learning machine.”
“Selection is the process by which teachers gain access to an example that exemplifies useful aspects of a concept.”
“A label is a (example, concept value) pair created by a teacher in relation to a concept.”
“A schema is a relationship graph between concepts.”
“A generic feature is a set of related feature functions.”
“Decomposition is the act of using simpler concepts to express more complex ones.”
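These definitions map quite naturally onto types. The following Python sketch is my own illustration, not code from the paper: the feature functions, the `is_sales_pitch` concept and the example texts are all hypothetical, and the schema is reduced to a plain adjacency map.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Example = str
Concept = Callable[[Example], object]   # maps any example to a label value
Feature = Callable[[Example], float]    # a concept whose value is a scalar


@dataclass(frozen=True)
class Label:
    """An (example, concept value) pair created by a teacher."""
    example: Example
    value: object


# Two simple features (hypothetical, for illustration).
def mentions_price(text: Example) -> float:
    return 1.0 if "$" in text else 0.0


def mentions_urgency(text: Example) -> float:
    return 1.0 if "now" in text.lower().split() else 0.0


# Decomposition: a more complex concept expressed through simpler ones.
def is_sales_pitch(text: Example) -> bool:
    return mentions_price(text) + mentions_urgency(text) >= 2.0


# Schema: a relationship graph between concepts, here as an adjacency map.
schema: Dict[str, List[str]] = {
    "is_sales_pitch": ["mentions_price", "mentions_urgency"],
}

# Labels come from the teacher, not from the model.
labels = [
    Label("Buy now for $5", True),
    Label("See you at lunch", False),
]

# Check that the decomposed concept agrees with the teacher's labels.
agrees = all(is_sales_pitch(lab.example) == lab.value for lab in labels)
```

Here the hand-written `is_sales_pitch` stands in for a learned model; under MT, the same structure would hold with trained models playing the role of the sub-concepts.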

Simard et al. also synthesised a set of principles for MT:

Universal Teaching Language - in order to support and enable different teachers, Simard et al. propose the standardisation of a language as one simple and easy-to-learn interface that is agnostic of ML algorithms, but that provides access to their power by allowing them to be exchanged according to the best match for the concept to learn.
Feature completeness - all desired target concepts should be “ ‘realisable’ through a composition of models and features”. The assumption is that it is the system’s responsibility to provide feature completeness, so that the teacher can focus on exploring, adding and discriminating features and examples to augment the capacity of the system to model the concept.
Rich and diverse sampling set - the data set should enable the teacher to explore it “to express knowledge through selection”. Simard et al. propose the need for new ways of collecting data that retain as much of the semantic value of data as possible. They imply that storing data indiscriminately could be a solution (“effort of storing data is negligible compared to the cost of teaching”).
Distribution robustness - the teacher should be able to explore and label freely without concerns. A critical assumption made by Simard et al. is that a teacher will be able to reach a correct teaching outcome (i.e., a robust model that is correct for any example) given a rich and diverse sampling set, feature completeness and ML algorithms that are robust to covariate shift. Covariate shift refers to changes in the distribution of new examples that cause a deployed model, running in the wild, to lose efficacy. It is one of the reasons models break and require a new model-building iteration. I would say that this is a very difficult conjunction of factors to achieve, and that it should act as a constraint on the application of the MT process.
Modular development - MT should support decomposition in concept modelling through modular development (i.e. decomposing concepts into sub-concepts, using models as features of other models). Simard et al. postulate that it can be achieved by standardising interfaces for models and features, in analogy to elements of integrated development environments, such as solutions, projects and project dependencies.
Version Control - all of the teacher’s actions are relevant and contribute to building a concept “program”. Hence they should be stored, analogously to code versioning and commits, and used to facilitate collaboration between different teachers and to integrate their contributions.
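To make the covariate shift mentioned under distribution robustness concrete, here is a small, self-contained Python sketch of my own (it is not from the paper). The concept, the one-parameter threshold model and the two input ranges are all assumptions chosen so that the effect is easy to see: the true concept never changes, only the distribution of inputs does.

```python
from typing import List


def true_concept(x: float) -> int:
    """Ground truth, identical at training time and at deployment time."""
    return 1 if abs(x) > 1.0 else 0


def fit_threshold(xs: List[float], ys: List[int]) -> float:
    """Pick the threshold t that best fits the rule 'predict 1 when x > t'."""
    best_t, best_acc = xs[0], -1.0
    for t in sorted(xs):
        hits = sum((1 if x > t else 0) == y for x, y in zip(xs, ys))
        acc = hits / len(xs)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t


def accuracy(t: float, xs: List[float]) -> float:
    """Fraction of inputs on which the threshold rule matches the truth."""
    hits = sum((1 if x > t else 0) == true_concept(x) for x in xs)
    return hits / len(xs)


# Training inputs are all non-negative; deployment inputs are all non-positive.
train_xs = [i / 10 for i in range(0, 31)]     # 0.0, 0.1, ..., 3.0
deploy_xs = [-i / 10 for i in range(0, 31)]   # 0.0, -0.1, ..., -3.0

t = fit_threshold(train_xs, [true_concept(x) for x in train_xs])
train_acc = accuracy(t, train_xs)
deploy_acc = accuracy(t, deploy_xs)
```

On the training range the threshold rule is perfect (`train_acc` is 1.0), but on the shifted range it is right for only 11 of the 31 inputs (about 0.35), because the model never saw negative values and its single threshold cannot express the concept there. This is the kind of breakage that, in the traditional lifecycle, forces a new model-building iteration.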

In essence, the paper proposes a set of principles for an unconventional and disintermediated approach to teaching ML systems, which grants the machine teacher full ownership of the process and cuts ML experts and system engineers out of the loop. Here, domain knowledge and teaching knowledge are the joint conditions that qualify teachers for this role. As domain experts, machine teachers should understand the concepts better, and the paper claims that, with the right infrastructural support and under the stated assumptions about how this support is provided, they will be able to express concepts and teach them effectively.

Reference

Patrice Simard, Saleema Amershi, Max Chickering, Alicia Edelman Pelton, Soroush Ghorashi, Chris Meek, Gonzalo Ramos, Jina Suh, Johan Verwey, Mo Wang, John Wernsing (2017) Machine Teaching: A New Paradigm for Building Machine Learning Systems
