Programming By Example

on July 7, 2020

An introduction

Programming By Example (PBE) aims to build programs by synthesizing them from a series of examples. The user first performs or supplies a sequence of actions; the system uses these as the starting point for combining functions into a program that carries out a specific task.

With this technique, users create programs by interacting with the interfaces they already know. The resulting programs generalize the demonstrated solution, so they are independent of the data they were generated from.

A more technical definition states that PBE is a synthesis technique in which programs are generated automatically and iteratively from tuples of inputs and outputs, called examples.

In case the generated program does not operate correctly, a new tuple has to be introduced to adjust the programmatic output. Thus, users provide input/output combinations (examples) of the task they want to perform, and the computer infers a program that is capable of addressing the problem.
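The loop described above can be sketched in a few lines of Python. This is only an illustration under strong assumptions: the candidate set is a hypothetical, hand-picked list of string functions, and "synthesis" is reduced to returning the first candidate that agrees with every example. Real PBE systems search a much larger space.

```python
from typing import Callable, Optional

# An example is an (input, output) tuple.
Example = tuple[str, str]

# Hypothetical candidate programs, chosen only for this illustration.
CANDIDATES: dict[str, Callable[[str], str]] = {
    "upper": str.upper,
    "capitalize": str.capitalize,
    "title": str.title,
    "strip": str.strip,
}


def synthesize(examples: list[Example]) -> Optional[str]:
    """Return the name of the first candidate consistent with all examples."""
    for name, fn in CANDIDATES.items():
        if all(fn(inp) == out for inp, out in examples):
            return name
    return None


# One example is enough to get a program, but not necessarily the intended one.
examples = [("ada", "Ada")]
print(synthesize(examples))  # capitalize

# The user sees a wrong result on new data and adds a correcting tuple.
examples.append(("ada lovelace", "Ada Lovelace"))
print(synthesize(examples))  # title
```

The second call shows the refinement step: the extra tuple rules out `capitalize`, which only fixes the first letter of the whole string.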

Although PBE is targeted at non-expert users and is meant to lighten the workload of programming, it also adds value for advanced users because it takes over tedious, repetitive tasks. Moreover, the generated code can be reviewed by a human: the output is legible, although how easy it is to understand depends on the developer. Usually a large part of the program is correct, and only some parts are too closely tied to the examples it was generated from. This means the program does not have to be accepted or discarded as a whole, as is the case with black-box ML models. In PBE, a program can be modified by hand if the number of examples was not sufficient.

Nonetheless, this methodology has a number of limitations. For example, the generalization is not broad enough to handle every plausible data type, and the program cannot cope with variations in its output.

The definition of a generic Domain-Specific Language (DSL) is key to a PBE system. A DSL is a grammar of production rules whose aim is to narrow down the search space; it marks the limits of what a PBE system can produce. If a program can be expressed in the DSL, then a solution may be found. Otherwise, synthesis is impossible, no matter how many examples are provided.
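To make the role of the DSL concrete, here is a minimal sketch, assuming a toy DSL in which a program is simply a sequence of named string operations. The operation names and the `max_len` bound are hypothetical choices for this illustration, not part of any particular PBE tool.

```python
from itertools import product
from typing import Optional

# Toy DSL (an assumption for this sketch): program := op | op ; program
OPS = {
    "strip": str.strip,
    "lower": str.lower,
    "upper": str.upper,
    "first_word": lambda s: s.split()[0] if s.split() else "",
}


def run(program: tuple[str, ...], text: str) -> str:
    """Apply the operations of a program from left to right."""
    for op in program:
        text = OPS[op](text)
    return text


def search(examples: list[tuple[str, str]], max_len: int = 3) -> Optional[tuple[str, ...]]:
    """Enumerate programs by increasing length; return the first consistent one."""
    for length in range(1, max_len + 1):
        for program in product(OPS, repeat=length):
            if all(run(program, inp) == out for inp, out in examples):
                return program
    return None


# Expressible in the DSL: a two-step program is found.
print(search([("  Hello World ", "hello")]))  # ('lower', 'first_word')

# Not expressible: the DSL has no way to reverse a string.
print(search([("hello", "olleh")]))  # None
```

The grammar keeps the search small (at most 4 + 16 + 64 programs here), but it also sets a hard limit: adding more examples to the second call would never produce a solution.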

PBE is also known as inductive synthesis: a synthesis process that is based on examples. Deductive synthesis, in turn, is based on logical specifications defined by the user. PBE breaks with a common assumption in programming: its users are not merely consumers of software, because they can, to some degree, build their own code. This means small scripts are generated automatically for minor everyday tasks.

For advanced users, PBE can be a helping hand too. It is especially useful for data scientists, who usually have to manage large amounts of data before they can apply AI algorithms. This data is typically obtained from diverse sources with different degrees of structuring. These sources give users a high degree of flexibility, but they make the data hard to exploit, combine, and query.

Unfortunately, a major problem associated with inductive synthesis is ambiguity, which results from specifying the behaviour of the program through examples rather than stating its exact requirements.

PBE Application and generated output

The main application fields of PBE are robotics, code refactoring, data parsing, query building and, most prominently, data wrangling.

Data wrangling consists of pre-processing the data that is to be fed to other tasks. The process can be divided into three parts: extraction, transformation and formatting. Extraction generates structured data from semi-structured sources, such as web pages or JSON files; a program is built for each field to be extracted. Transformation covers type casting and combining fields, e.g. composing a name from several related fields. Finally, formatting applies a specific format repeatedly, or creates a structured output from the previously generated data.
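The three stages can be illustrated with a small hand-written script. This is a sketch of the kind of code a PBE tool would be expected to infer from examples, not the output of a real tool; the sample record and its field names (`first`, `last`, `born`) are made up for the illustration.

```python
import json

# Made-up semi-structured input for this sketch.
RAW = '{"first": "ada", "last": "lovelace", "born": "1815", "notes": "n/a"}'


def extract(raw: str) -> dict:
    """Extraction: pull the fields of interest out of the JSON source."""
    record = json.loads(raw)
    return {key: record[key] for key in ("first", "last", "born")}


def transform(fields: dict) -> dict:
    """Transformation: combine related fields and cast types."""
    return {
        "name": f'{fields["first"].title()} {fields["last"].title()}',
        "born": int(fields["born"]),
    }


def format_row(row: dict) -> str:
    """Formatting: apply one output format to every row."""
    return f'{row["name"]} ({row["born"]})'


print(format_row(transform(extract(RAW))))  # Ada Lovelace (1815)
```

In a PBE setting the user would not write these functions; they would supply pairs such as the raw record and the string `Ada Lovelace (1815)`, and leave the system to find an equivalent pipeline.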

Applied to code refactoring, PBE saves users time on common maintenance tasks, improving their time management and performance.

Code generation in PBE tools is a complex process whose results are not always satisfactory. Traditional program synthesis creates scripts that satisfy a set of logical conditions, whereas in PBE scripts are synthesized from a number of input/output states. This model is successful because it lets users define the desired behaviour and then tune the result manually. On the other hand, it is difficult to generate code that is consistent with all the examples provided by the user.

A lack of examples is a common problem during program composition. It is tackled with techniques such as Machine Learning (ML), which make it possible to rank intermediate functions or to extract feedback on the generated programs.
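The following sketch shows why ranking matters when examples are scarce. The scores are hand-written placeholders standing in for what a learned ranker might output; the candidate programs and their names are likewise hypothetical.

```python
from typing import Optional

# (description, program, score). Scores are placeholders, NOT learned values.
CANDIDATES = [
    ("constant '2020'", lambda s: "2020", 0.05),
    ("first 4 characters", lambda s: s[:4], 0.35),
    ("text before first '-'", lambda s: s.split("-")[0], 0.60),
]


def consistent(examples: list[tuple[str, str]]) -> list[tuple]:
    """Keep every candidate that reproduces all the examples."""
    return [
        cand for cand in CANDIDATES
        if all(cand[1](inp) == out for inp, out in examples)
    ]


def best(examples: list[tuple[str, str]]) -> Optional[str]:
    """Return the description of the highest-scoring consistent candidate."""
    matches = consistent(examples)
    if not matches:
        return None
    return max(matches, key=lambda cand: cand[2])[0]


one_example = [("2020-07-07", "2020")]
print(len(consistent(one_example)))  # 3: a single example is ambiguous
print(best(one_example))             # text before first '-'

# A second example removes the constant program, but two candidates remain.
two_examples = one_example + [("1999-12-31", "1999")]
print(len(consistent(two_examples)))  # 2
```

Even with two examples the remaining programs cannot be told apart by consistency alone, so the choice between them rests entirely on the ranking.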

In addition, more complex programs can be created by chaining predefined functions in sequence to perform specific tasks. AI approaches use this to enhance the results of a PBE process, obtaining programs that solve high-level tasks from simpler, atomic functions.

ML vs PBE

The relationship between ML and PBE is complementary, although, in some cases, they may be used separately to deal with similar problems. While both use example data to produce specific-purpose code, the main difference lies in PBE’s suitability for small repetitive tasks:

PBE programs may be edited and adapted after they have been generated, to optimize them and make sure they are fit for their final purpose. By contrast, ML models can only be applied to data.
PBE requires fewer examples to infer a generalization and thus generate a proper output.
ML can be used to enhance the PBE process of program generation, making the search for an ideal function faster. Conversely, PBE makes it easier to tackle the tasks that must be performed before AI algorithms are applied. Note that ML is still needed for complex data tasks.
Some researchers have made efforts to apply Neural Networks to PBE code generation, by means of the aforementioned process of using sequences of atomic, special-purpose functions to achieve a complex result.