Skip to main content

Writing extensions

Introduction

This is a very brief overview of the structure of OpenRefine extensions. For more detailed documentation and step-by-step guides please see the following external documentation/tutorials:

OpenRefine makes use of a modified version of the Butterfly framework to provide an extension architecture. OpenRefine extensions are Butterfly modules. You don't really need to know about Butterfly itself, but you might encounter "butterfly" here and there in the code base.

Extensions that come with the code base are located under the extensions sub-directory, but when you develop your own extension, you can put its code anywhere as long as you point Butterfly to it. That is done by any one of the following methods

  • refer to your extension's directory in the butterfly.properties file through a butterfly.modules.path setting.
  • specify the butterfly.modules.path property on the command line when you run OpenRefine. This overrides the values in the property file, so you need to include the default values first e.g. -Dbutterfly.modules.path=modules,../../extensions,/path/to/your/extension

Please note that you should bundle any dependencies yourself, so you are insulated from OpenRefine packaging changes over time.

Directory Layout

A OpenRefine extension sits in a file directory that contains the following files and sub-directories:

pom.xml
src/
com/foo/bar/... *.java source files
module/
*.html, *.vt files
scripts/... *.js files
styles/... *.css files
images/... image files
MOD-INF/
lib/*.jar files
classes/... java class files
module.properties
controller.js

The file named module.properties (see example) contains the extension's metadata. Of importance is the name field, which gives the extension a name that's used in many other places to refer to it. This can be different from the extension's directory name.

name = my-extension-name

Your extension's client-side resources (.html, .js, .css files) stored in the module/ sub-directory will be accessible from http://127.0.0.1:3333/extension/my-extension-name/ when OpenRefine is running.

Also of importance is the dependency

requires = core

which makes sure that the core module of OpenRefine is loaded before the extension attempts to hook into it.

The file named controller.js is responsible for registering the extension's hooks into OpenRefine. Look at the sample-extension extension's controller.js file for an example. It should have a function called init() that does the hook registrations.

The pom.xml file is an Apache Maven build file. You can make a copy of the sample extension's pom.xml file to get started. The important point here is that the Java classes should be built into the module/MOD-INF/classes sub-directory.

Note that your extension's Java code would need to reference some libraries used in OpenRefine and OpenRefine's Java classes themselves. These dependencies are reflected in the Maven configuration for the extension.

Sample extension

The sample extension is included in the code base so that you can copy it and get started on writing your own extension. After you copy it, make sure you change its name inside its module/MOD-INF/controller.js file.

Basic Structure

The sample extension's code is in refine/extensions/sample/. In that directory, Java source code is contained under the src sub-directory, and webapp code is under the module sub-directory. Here is the full directory layout:

refine/extensions/sample/
build.xml (ant build script)
src/
com/google/refine/sampleExtension/
... Java source code ...
module/
MOD-INF/
module.properties (module settings)
controller.js (module init and routing logic in Javascript)
classes/
... compiled Java classes ...
lib/
... Java jars ...
... velocity templates (.vt) ...
... .css files ...
... client-side files (.html, .css, .js, image files) ...

The sub-directory MOD-INF contains the Butterfly module's metadata and is what Butterfly looks for when it scans directories for modules. MOD-INF serves similar functions as WEB-INF in other web frameworks.

Java code is built into the sub-directory classes inside MOD-INF, and supporting external Java jars are in the lib sub-directory. Those will be automatically loaded by Butterfly. (The build.xml script is wired to compile into the classes sub-directory.)

Client-side code is in the inner module sub-directory. They can be plain old .html, .css, .js, and image files. There are also Velocity .vt files, but they need to be routed inside MOD-INF/controller.js.

MOD-INF/controller.js lets you configure the extension's initialization and URL routing in Javascript rather than in Java. For example, when the requested URL path is either / or an empty string, we process and return MOD-INF/index.vt ( see http://127.0.0.1:3333/extension/sample/ if OpenRefine is running).

The init() function in controller.js allows the extension to register various client-side handlers for augmenting pages served by Refine's core. These handlers are feature-specific. For example, this is where the jython extension adds its parser. As for the sample extension, it adds its script project-injection.js and style project-injection.css into the /project page. If you view the source of the /project page, you will see references to those two files.

Wiring Up the Extension

The Extensions are loaded by the Butterfly framework. Butterfly refers to these as 'modules'. The location of modules is set in the main/webapp/butterfly.properties file. Butterfly simply descends into each of those paths and looks for any MOD-INF directories.

For more information, see Extension Points.

Extension points

Client-side: Javascript and CSS

The UI in OpenRefine for working with a project is coded in the /main/webapp/modules/core/project.vt file. The file is quite small, and that's because almost all of its content is to be expanded dynamically through the Velocity variables $scriptInjection and $styleInjection. So that your own Javascript and CSS files get loaded, you need to register them with the ClientSideResourceManager, which is done in the /module/MOD-INF/controller.js file. See the controller.js file in this sample extension code for an example.

In the registration call, the variable module is already available to your code by default, and it refers to your own extension.

ClientSideResourceManager.addPaths(
"project/scripts",
module,
[
"scripts/foo.js",
"scripts/subdir/bar.js"
]
);

You can specify one or more files for registration, and their paths are relative to the module sub-directory of your extension. They are included in the order listed.

Javascript Bundling: Note that project.vt belongs to the core module and is thus under the control of the core module's controller.js file. The Javascript files to be included in project.vt are by default bundled together for performance. When debugging, you can prevent this bundling behavior by setting bundle to false near the top of that controller.js file. (If you have commit access to this code base, be sure not to check that change in.)

Client-side: Images

We recommend that you always refer to images through your CSS files rather than in your Javascript code. URLs to images will thus be relative to your CSS files, e.g.,

.foo {
background: url(../images/x.png);
}

If you really really absolutely need to refer to your images in your Javascript code, then look up your extension's URL path in the global Javascript variable ModuleWirings:

ModuleWirings["my-extension"] + "images/x.png"

Client-side: HTML Templates

Beside Javascript, CSS, and images, your extension might also include HTML templates that get loaded on the fly by your Javascript code and injected into the page's DOM. For example, here is the Cluster edit dialog template, which gets loaded by code in the equivalent javascript file 'clustering-dialog.js':

var dialog = $(DOM.loadHTML("core", "scripts/dialogs/clustering-dialog.html"));

DOM.loadHTML returns the content of the file as a string, and $(...) turns it into a DOM fragment. Where "core" is, you would want your extension's name. The path of the HTML file is relative to your extension's module sub-directory.

Client-side: Project UI Extension Points

Getting your extension's Javascript code included in project.vt doesn't accomplish much by itself unless your code also registers hooks into the UI. For example, you can surely implement an exporter in Javascript, but unless you add a corresponding menu command in the UI, your user can't use your exporter.

The main menu can be extended by calling any one of the methods MenuBar.appendTo, MenuBar.insertBefore, and MenuBar.insertAfter. Each method takes 2 arguments: an array of strings that identify a particular existing menu item or submenu, and one new single menu item or submenu or an array of menu items and submenus. For example, to insert 2 menu items and a menu separator before the menu item Project > Export Filtered Rows > Templating..., write this Javascript code wherever that would execute when your Javascript files get loaded:

MenuBar.insertBefore(
["core/project", "core/export", "core/export-templating"],
[
{
"label":"Menu item 1",
"click": function() { ... }
},
{
"label":"Menu item 2",
"click": function() { ... }
},
{} // separator
]
);

The array ["core/project", "core/export", "core/export-templating"] pinpoints the reference menu item.

See the beginning of /main/webapp/modules/core/scripts/project/menu-bar.js for IDs of menu items and submenus.

Column Header Menu

The drop-down menu of each column can also be extended, but the mechanism is slightly different compared to the main menu. Because the drop-down menu for a particular column is constructed on the fly when the user actually clicks the drop-down menu button, extending the column header menu can't really be done once at start-up time, but must be done every time a column header menu gets created. So, registration in this case involves providing a function that gets called each such time:

DataTableColumnHeaderUI.extendMenu(function(column, columnHeaderUI, menu) { ... do stuff to menu ... });

That function takes in the column object (which contains the column's name), the column header UI object (generally not so useful), and the menu to extend. In the previous code line where it says "do stuff to menu", you can write something like this:

MenuSystem.appendTo(menu, ["core/facet"], [
{
id: "core/text-facet",
label: "My Facet on " + column.name,
click: function() {
... use column.name and do something ...
}
},
]);

In addition to MenuSystem.appendTo, you can also call MenuSystem.insertBefore and MenuSystem.insertAfter which the same 3 arguments. To see what IDs you can use, see the function DataTableColumnHeaderUI.prototype._createMenuForColumnHeader in /main/webapp/modules/core/scripts/views/data-table/column-header-ui.js.

Cell renderers

From OpenRefine 3.7 onwards, extensions can also customize the way cells are rendered. This is done by registering a renderer, which is responsible for transforming the JSON representation of a cell into DOM elements rendering it:

class MyCellRenderer {

constructor() {
super();
// some initialization code can be added here
}

render(
rowIndex, // the 0-based row index
cellIndex, // the position of the column to which the cell belongs
cell, // the deserialized JSON representation of the cell
cellUI // the parent CellUI object which called this renderer
) {
// this renderer has the opportunity to return a DOM element represeting the cell, as follows:
return $('<span>rendered cell</span>');
// or it may not return anything, in which case the next cell renderer will be executed
}
}

OpenRefine holds an ordered list of cell renderers in its CellRendererRegistry. To render a cell, OpenRefine will execute each cell renderer in order, until the first renderer which returns a DOM element. The following renderers are not executed and that DOM element is used as the cell representation.

The registration of the renderer is done with

CellRendererRegistry.addRenderer(
'my-renderer-identifier', // a string identifying our new renderer
new MyCellRenderer(), // the renderer itself
'recon' // the existing renderer it should be inserted before
);

The default renderers available in OpenRefine itself can be found in main/webapp/modules/core/scripts/views/data-table/cell-renderers/registry.js.

Server-side: Ajax Commands

The client-side of OpenRefine gets things done by calling AJAX commands on the server-side. These commands must be registered with the OpenRefine servlet, so that the servlet knows how to route AJAX calls from the client-side. This can be done inside the init function in your extension's controller.js file, e.g.,

function init() {
var RefineServlet = Packages.com.google.refine.RefineServlet;
RefineServlet.registerCommand(module, "my-command", new Packages.com.foo.bar.MyCommand());
}

Your command will then be accessible at http://127.0.0.1:3333/command/my-extension/my-command.

Server-side: Operations

Most commands change the project's data. Most of them do so by creating abstract operations. See the Changes, History, Processes, and Operations section of the Server Side Architecture document.

You can register an operation class in the init function as follows:

Packages.com.google.refine.operations.OperationRegistry.registerOperation(
module,
"operation-name",
Packages.com.foo.bar.MyOperation
);

Do not call new to construct an operation instance. You must register the class itself. The class should have a static function for reconstructing an operation instance from a JSON blob:

static public AbstractOperation reconstruct(Project project, JSONObject obj) throws Exception {
...
}

Server-side: GREL

GREL can be extended with new functions. This is also done in the init function in controller.js, e.g.,

Packages.com.google.refine.grel.ControlFunctionRegistry.registerFunction(
"functionName", new Packages.com.foo.bar.TheFunctionClass());

You might also want to provide new variables (beyond just value, cells, row, etc.) available to expressions. This is done by registering a binder that implements the interface com.google.refine.expr.Binder:

Packages.com.google.refine.expr.ExpressionUtils.registerBinder(
new Packages.com.foo.bar.MyBinder());

Server-side: Importers

You can register an importer as follows:

Packages.com.google.refine.importers.ImporterRegistry.registerImporter(
"importer-name", new Packages.com.foo.bar.MyImporter());

The string "importer-name" isn't important at all. It's not really related to file extension or mime-type. Just use something unique. Your importer will be explicitly called to test if it can import something.

Server-side: Exporters

You can register an exporter as follows:

Packages.com.google.refine.exporters.ExporterRegistry.registerExporter(
"exporter-name", new Packages.com.foo.bar.MyExporter());

The string "exporter-name" isn't important at all. It's only used by the client-side to tell the server-side which exporter to use. Just use something unique and, of course, relevant.

Server-side: Overlay Models

Overlay models are objects attached onto a core Project object to store and manage additional data for that project. For example, the schema alignment skeleton is managed by the Protograph overlay model. An overlay model implements the interface com.google.refine.model.OverlayModel and can be registered like so:

Packages.com.google.refine.model.Project.registerOverlayModel(
"model-name",
Packages.com.foo.bar.MyOverlayModel);

Note that you register the class , not an instance. The class should implement the following static method for reconstructing an overlay model instance from a JSON blob:

static public OverlayModel reconstruct(JSONObject o) throws JSONException {
...
}

When the project gets saved, the overlay model instance's write method will be called:

public void write(JSONWriter writer, Properties options) throws JSONException {
...
}

Server-side: Scripting Languages

A scripting language (such as Jython) can be registered as follows:

Packages.com.google.refine.expr.MetaParser.registerLanguageParser(
"jython",
"Jython",
Packages.com.google.refine.jython.JythonEvaluable.createParser(),
"return value"
);

The first string is the prefix that gets prepended to each expression so that we know which language the expression is in. This should be short, unique, and identifying. The second string is a user-friendly name of the language. The third is an object that implements the interface com.google.refine.expr.LanguageSpecificParser. The final string is the default expression in that language that would return the cell's value.

In 2018 we are making important changes to OpenRefine to modernize it, for the benefit of users and contributors. This page describes the changes that impact developers of extensions or forks and is intended to minimize the effort required on their end to follow the transition. The instructions are written specifically with extension maintainers in mind, but fork maintainers should also find it useful.

This document describes the migrations in the order they are committed to the master branch. This means that it should be possible to perform each migration in turn, with the ability to run the software between each stage by checking out the appropriate git commit.