This is a very brief overview of the structure of OpenRefine extensions. For more detailed documentation and step-by-step guides please see the following external documentation/tutorials:
- Giuliano Tortoreto has written documentation detailing how to build extensions for OpenRefine.
- Owen Stephens has written a guide to developing an extension which adds new GREL functions to OpenRefine.
OpenRefine makes use of a modified version of the Butterfly framework to provide an extension architecture. OpenRefine extensions are Butterfly modules. You don't really need to know about Butterfly itself, but you might encounter "butterfly" here and there in the code base.
Extensions that come with the code base are located under the extensions subdirectory, but when you develop your own extension, you can put its code anywhere as long as you point Butterfly to it. You can do that in either of the following ways:
- refer to your extension's directory in the butterfly.properties file through a
- specify the butterfly.modules.path property on the command line when you run OpenRefine. This overrides the values in the property file, so you need to include the default values first e.g.
Please note that you should bundle any dependencies yourself, so you are insulated from OpenRefine packaging changes over time.
An OpenRefine extension sits in a file directory that contains the following files and sub-directories:
The file named module.properties (see example) contains the extension's metadata. Of importance is the name field, which gives the extension a name that's used in many other places to refer to it. This can be different from the extension's directory name.
Your extension's client-side resources (.html, .js, .css files) stored in the module/ subdirectory will be accessible from http://127.0.0.1:3333/extension/my-extension-name/ when OpenRefine is running.
Also of importance is the dependency
which makes sure that the core module of OpenRefine is loaded before the extension attempts to hook into it.
The file named controller.js is responsible for registering the extension's hooks into OpenRefine. Look at the sample-extension extension's controller.js file for an example. It should have a function called init() that does the hook registrations.
The pom.xml file is an Apache Maven build file. You can make a copy of the sample extension's pom.xml file to get started. The important point here is that the Java classes should be built into the
Note that your extension's Java code would need to reference some libraries used in OpenRefine and OpenRefine's Java classes themselves. These dependencies are reflected in the Maven configuration for the extension.
The sample extension is included in the code base so that you can copy it and get started on writing your own extension. After you copy it, make sure you change its name inside its
The sample extension's code is in refine/extensions/sample/. In that directory, Java source code is contained under the src sub-directory, and webapp code is under the module sub-directory. Here is the full directory layout:
MOD-INF contains the Butterfly module's metadata and is what Butterfly looks for when it scans directories for modules. MOD-INF serves a similar function to WEB-INF in other web frameworks.
Java code is built into the sub-directory
MOD-INF, and supporting external Java jars are in the
lib sub-directory. Those will be automatically loaded by Butterfly. (The build.xml script is wired to compile into the
Client-side code is in the inner module sub-directory. The files can be plain old .html, .css, .js, and image files, or they can be LESS files that get processed into CSS. There are also Velocity .vt files, but they need to be routed inside
/ or an empty string, we process and return
MOD-INF/index.vt (see http://127.0.0.1:3333/extension/sample/ if OpenRefine is running).
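The routing described above can be sketched as follows. This is a minimal illustration, not the sample extension's actual code: it assumes the routing lives in a function named process(path, request, response) in controller.js and that Butterfly supplies a send(request, response, template, context) helper. Here send is replaced by a recording stand-in so the snippet runs outside OpenRefine, and the context value is hypothetical.

```javascript
// Stand-in for the send() helper, which is assumed to be supplied by Butterfly
// in a real controller.js. It only records what would have been rendered.
var sent = [];
function send(request, response, template, context) {
  sent.push({ template: template, context: context });
}

// Assumed entry point: invoked with the requested path for this extension.
function process(path, request, response) {
  // Serve the Velocity template only for "/" or an empty path.
  if (path === "/" || path === "") {
    var context = { someString: "foo" }; // hypothetical value passed to the template
    send(request, response, "index.vt", context);
  }
}

// Simulated requests: the first two are routed, the third is ignored.
process("/", {}, {});
process("", {}, {});
process("/other", {}, {});
```

Any other path falls through without a response in this sketch; a real extension would decide for itself how to handle those.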
The init() function in controller.js allows the extension to register various client-side handlers for augmenting pages served by Refine's core. These handlers are feature-specific. For example, this is where the jython extension adds its parser. As for the sample extension, it adds its script project-injection.js and style project-injection.less into the /project page. If you view the source of the /project page, you will see references to those two files.
Wiring Up the Extension
Extensions are loaded by the Butterfly framework, which refers to them as 'modules'. The location of modules is set in the main/webapp/butterfly.properties file. Butterfly simply descends into each of those paths and looks for any
For more information, see Extension Points.
In the registration call, the variable module is already available to your code by default, and it refers to your own extension. You can specify one or more files for registration, and their paths are relative to the module subdirectory of your extension. They are included in the order listed.
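As an illustration of such a registration call, here is a minimal sketch modeled on how the sample extension is understood to inject its script and style into the /project page. The ClientSideResourceManager.addPaths name and the "project/scripts" and "project/styles" bundle names are assumptions that are not confirmed by this page. Both ClientSideResourceManager and the module variable (called thisModule here) are replaced by stand-ins so the snippet runs outside OpenRefine.

```javascript
// Recording stand-ins so the sketch runs outside OpenRefine.
var registered = [];
var ClientSideResourceManager = {
  addPaths: function (bundleName, mod, paths) {
    registered.push({ bundle: bundleName, module: mod, paths: paths });
  }
};
// In a real controller.js, the predefined variable `module` plays this role.
var thisModule = { name: "sample" };

function init() {
  // Paths are relative to the extension's module subdirectory
  // and are included in the order listed.
  ClientSideResourceManager.addPaths("project/scripts", thisModule, [
    "scripts/project-injection.js"
  ]);
  ClientSideResourceManager.addPaths("project/styles", thisModule, [
    "styles/project-injection.less"
  ]);
}

init();
```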
project.vt are by default bundled together for performance. When debugging, you can prevent this bundling behavior by setting
false near the top of that
controller.js file. (If you have commit access to this code base, be sure not to check that change in.)
Client-side: HTML Templates
DOM.loadHTML returns the content of the file as a string, and $(...) turns it into a DOM fragment. Where "core" appears, use your extension's name instead. The path of the HTML file is relative to your extension's
Client-side: Project UI Extension Points
The main menu can be extended by calling any one of the methods
["core/project", "core/export", "core/export-templating"] pinpoints the reference menu item.
See the beginning of /main/webapp/modules/core/scripts/project/menu-bar.js for IDs of menu items and submenus.
Column Header Menu
The drop-down menu of each column can also be extended, but the mechanism is slightly different compared to the main menu. Because the drop-down menu for a particular column is constructed on the fly when the user actually clicks the drop-down menu button, extending the column header menu can't really be done once at start-up time, but must be done every time a column header menu gets created. So, registration in this case involves providing a function that gets called each such time:
That function takes in the column object (which contains the column's name), the column header UI object (generally not so useful), and the menu to extend. In the previous code line where it says "do stuff to menu", you can write something like this:
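The two steps above (registering a callback, then extending the menu inside it) can be sketched together as follows. This is an illustration under assumptions: the registration entry point is taken to be DataTableColumnHeaderUI.extendMenu, "core/facet" is taken to be an existing submenu ID, and the item ID and label are hypothetical. DataTableColumnHeaderUI and MenuSystem are recording stand-ins here so the snippet runs outside the /project page.

```javascript
// Recording stand-ins so the sketch runs outside OpenRefine's /project page.
var menuExtenders = [];
var DataTableColumnHeaderUI = {
  extendMenu: function (extender) { menuExtenders.push(extender); }
};
var appended = [];
var MenuSystem = {
  appendTo: function (menu, path, items) {
    appended.push({ menu: menu, path: path, items: items });
  }
};

// Registration happens once; the callback runs each time a column menu is built.
DataTableColumnHeaderUI.extendMenu(function (column, columnHeaderUI, menu) {
  MenuSystem.appendTo(menu, ["core/facet"], [
    {
      id: "my-extension/some-facet",          // hypothetical item ID
      label: "Some facet on " + column.name,  // hypothetical label
      click: function () { /* act on column.name here */ }
    }
  ]);
});

// Simulate the user opening the drop-down menu of a column named "city".
var fakeMenu = [];
menuExtenders.forEach(function (extend) {
  extend({ name: "city" }, {}, fakeMenu);
});
```

Because the callback receives the column object, each menu item can close over the column's name and act on that specific column when clicked.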
In addition to MenuSystem.appendTo, you can also call MenuSystem.insertAfter with the same 3 arguments. To see what IDs you can use, see the function DataTableColumnHeaderUI.prototype._createMenuForColumnHeader in /main/webapp/modules/core/scripts/views/data-table/column-header-ui.js.
Server-side: Ajax Commands
The client side of OpenRefine gets things done by calling AJAX commands on the server side. These commands must be registered with the OpenRefine servlet, so that the servlet knows how to route AJAX calls from the client side. This can be done inside the init function in your extension's controller.js file, e.g.,
Your command will then be accessible at http://127.0.0.1:3333/command/my-extension/my-command.
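The relationship between a registration and the resulting URL can be sketched as below. This is not OpenRefine's implementation: RefineServlet is a stand-in that only records the route, thisModule stands in for the predefined module variable, and myCommand is a hypothetical placeholder for what would be an instance of a Java command class in a real extension (where the servlet is assumed to be reached through Packages.com.google.refine.RefineServlet).

```javascript
// Recording stand-in for the servlet so the sketch runs outside OpenRefine.
var routes = {};
var RefineServlet = {
  registerCommand: function (mod, commandName, command) {
    // Commands are addressed as /command/<extension name>/<command name>.
    routes["/command/" + mod.name + "/" + commandName] = command;
  }
};
// In a real controller.js, the predefined variable `module` plays this role.
var thisModule = { name: "my-extension" };

// Hypothetical placeholder; a real command would be a Java class instance.
var myCommand = {
  doPost: function (request, response) { return "handled"; }
};

function init() {
  RefineServlet.registerCommand(thisModule, "my-command", myCommand);
}

init();
```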
Most commands change the project's data. Most of them do so by creating abstract operations. See the Changes, History, Processes, and Operations section of the Server Side Architecture document.
You can register an operation class in the init function as follows:
Do not call new to construct an operation instance. You must register the class itself. The class should have a static function for reconstructing an operation instance from a JSON blob:
GREL can be extended with new functions. This is also done in the
init function in
You might also want to provide new variables (beyond just
row, etc.) available to expressions. This is done by registering a binder that implements the interface
You can register an importer as follows:
"importer-name" isn't important at all. It's not really related to file extension or mime-type. Just use something unique. Your importer will be explicitly called to test if it can import something.
You can register an exporter as follows:
"exporter-name" isn't important at all. It's only used by the client-side to tell the server-side which exporter to use. Just use something unique and, of course, relevant.
Server-side: Overlay Models
Overlay models are objects attached onto a core Project object to store and manage additional data for that project. For example, the schema alignment skeleton is managed by the Protograph overlay model. An overlay model implements the interface com.google.refine.model.OverlayModel and can be registered like so:
Note that you register the class, not an instance. The class should implement the following static method for reconstructing an overlay model instance from a JSON blob:
When the project gets saved, the overlay model instance's write method will be called:
Server-side: Scripting Languages
A scripting language (such as Jython) can be registered as follows:
The first string is the prefix that gets prepended to each expression so that we know which language the expression is in. This should be short, unique, and identifying. The second string is a user-friendly name of the language. The third is an object that implements the interface com.google.refine.expr.LanguageSpecificParser. The final string is the default expression in that language that would return the cell's value.
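The four arguments described above can be illustrated with the following sketch. It rests on assumptions: the registration entry point is taken to be a registerLanguageParser method on com.google.refine.expr.MetaParser, which is replaced here by a recording stand-in, and the "toy" language, its label, its parser object and its default expression are all hypothetical.

```javascript
// Recording stand-in for the assumed MetaParser registration entry point.
var languages = {};
var MetaParser = {
  registerLanguageParser: function (prefix, label, parser, defaultExpression) {
    languages[prefix] = {
      label: label,
      parser: parser,
      defaultExpression: defaultExpression
    };
  }
};

// Hypothetical parser standing in for an object that implements
// com.google.refine.expr.LanguageSpecificParser. It ignores the source
// text and always yields an evaluable that returns the cell's value.
var toyParser = {
  parse: function (source) {
    return {
      evaluate: function (bindings) { return bindings.value; }
    };
  }
};

MetaParser.registerLanguageParser(
  "toy",           // short, unique prefix identifying the language
  "Toy language",  // user-friendly name
  toyParser,       // parser object
  "value"          // default expression returning the cell's value
);

// Evaluate the default expression against a sample cell value.
var entry = languages["toy"];
var result = entry.parser.parse(entry.defaultExpression).evaluate({ value: "Paris" });
```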
In 2018 we are making important changes to OpenRefine to modernize it, for the benefit of users and contributors. This page describes the changes that impact developers of extensions or forks and is intended to minimize the effort required on their end to follow the transition. The instructions are written specifically with extension maintainers in mind, but fork maintainers should also find them useful.
This document describes the migrations in the order they are committed to the master branch. This means that it should be possible to perform each migration in turn, with the ability to run the software between each stage by checking out the appropriate git commit.