Pattrn data packages

In the Getting started tutorial we have seen how to package a dataset directly with the source code of the Pattrn app and how to publish them together on the web.

As the Pattrn code itself will mostly stay the same while the dataset for a Pattrn instance may be updated frequently, in most cases it is advisable to keep Pattrn code and data (along with metadata and settings) separate, so that Editors only need to focus on curating the dataset.

In order to do so, Pattrn supports the use of Pattrn data packages: these are NPM packages that contain data, metadata and settings for a Pattrn instance.

A consistent way to package Pattrn data also allows Pattrn editors to make interesting datasets available for others to reuse and analyse in their own Pattrn instances, thereby facilitating collaboration around datasets.

A future aim for the Pattrn project is to switch to using Frictionless Data Packages, as these provide a standard way to package data for reuse. If you are a data scientist or developer interested in adding support for Frictionless Data Packages to Pattrn, please get in touch (see Joining the Pattrn developers discussion room).

In this section of the Pattrn manual, you will learn how to install an existing Pattrn data package and how to develop your own.

Prerequisites

In order to install Pattrn data packages you will need to use Git and Node.js (version 6.10 or later) on your computer. You will therefore need to have at least basic proficiency with using the command line, although no specific experience with Git, Node.js or JavaScript in general is required.

Additionally, in order to create Pattrn data packages, you will need to be able to edit JSON files, to handle basic Git operations (committing, branching, pushing to remotes) and to use a code collaboration platform (we recommend hosting Pattrn data packages on GitLab).
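Before going further, it can help to confirm that the required tools are available. The snippet below is a minimal sketch of such a check; the helper name node_version_ok is hypothetical, and the check assumes that node --version prints a string of the form v6.10.0.

```shell
# Succeed if a Node.js version string such as "v8.11.3" is 6.10 or later.
# (node_version_ok is a hypothetical helper name, not part of Pattrn.)
node_version_ok() {
  v=${1#v}            # strip the leading "v"
  major=${v%%.*}      # text before the first dot
  rest=${v#*.}        # text after the first dot
  minor=${rest%%.*}   # text before the next dot
  [ "$major" -gt 6 ] || { [ "$major" -eq 6 ] && [ "$minor" -ge 10 ]; }
}

# Report any of the required tools that cannot be found on the PATH.
for tool in git node npm; do
  command -v "$tool" >/dev/null 2>&1 || echo "missing: $tool"
done

# If Node.js is installed, compare its version with the 6.10 minimum.
if command -v node >/dev/null 2>&1; then
  if node_version_ok "$(node --version)"; then
    echo "Node.js version is recent enough"
  else
    echo "Node.js is older than 6.10"
  fi
fi
```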

Installing a Pattrn data package

In order to use Pattrn with a Pattrn data package, we will be building the Pattrn app from its source code (rather than directly using the pre-built app as you may have done if you followed the Getting started tutorial in this manual).

Firstly, clone the current version of Pattrn from the master GitHub repository:

git clone https://github.com/pattrn-project/pattrn.git

Then enter the pattrn source folder:

cd pattrn

To configure Pattrn to use and bundle a Pattrn data package, create a file named source-data-packages.json within the pattrn folder. Its content should follow the example below: in the source setting, replace the URI of the sample data package with the URI of your own data package, and set package to the name of your own package.

{
    "source_data_packages": [
        {
        "package": "pattrn-data-where-the-drones-strike",
        "source": "https://gitlab.com/pattrn-data/pattrn-data-where-the-drones-strike.git#pattrn-data"
        }
    ]
}

The source_data_packages configuration object is an array, although the current version of Pattrn only supports a single data package for each Pattrn instance.

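The package setting needs to match the NPM package name as defined within the NPM package’s own package.json file. The source setting follows the syntax of NPM’s dependencies configuration section: for a Git URI such as the one above, the optional part after the # character selects the branch, tag or commit to install.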

Once the source-data-packages.json file has been created with suitable content, building the Pattrn app will automatically retrieve the configured Pattrn data package and bundle it with the Pattrn app:

npm install && npm run gulp build

As part of the Pattrn build process, the Gulp build scripts will install the Pattrn data package configured in the source-data-packages.json file and copy all the content of its pattrn-data folder to the dist folder where the Pattrn app gets built: effectively, the pattrn-data folder of the Pattrn data package gets merged with the content of the Pattrn app.

If the app is built correctly, you will find your Pattrn app bundled with your dataset inside of the dist folder. You can now publish this folder (for example on Netlify, as illustrated in the Getting started tutorial), or run the app locally in order to check that everything is working as expected:

npm start

The URI at which to access the app is displayed as part of NPM’s output for the command above.
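To confirm that the data package was merged into the built app, a quick check of the dist folder can help. The sketch below assumes that the data package contributes a config.json file and a data folder, as in the layout described later in this section; check_pattrn_dist is a hypothetical helper name.

```shell
# Report whether a built Pattrn app folder contains the items that the data
# package is expected to contribute. The names checked (config.json, data)
# are assumptions based on the package layout described in this manual.
check_pattrn_dist() {
  dist_dir=${1:-dist}
  status=0
  if [ -f "$dist_dir/config.json" ]; then
    echo "found:   config.json"
  else
    echo "missing: config.json"
    status=1
  fi
  if [ -d "$dist_dir/data" ]; then
    echo "found:   data/"
  else
    echo "missing: data/"
    status=1
  fi
  return "$status"
}

# Usage, from the root of the pattrn folder after a build:
#   check_pattrn_dist dist
```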

When developing a new Pattrn data package (see the section below), it may be useful to reference a local repository in the source-data-packages.json file rather than a web URI, so that any changes made to the data package during development are reflected almost immediately in the local Pattrn app: just run npm run gulp build && npm start to merge the latest local copy of the Pattrn data package into the development copy of Pattrn.
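For local development, the configuration file might look like the sketch below. It assumes that the build hands the source value to NPM unchanged, so that NPM's local path syntax (file: followed by a path) is accepted; this has not been verified against the Pattrn build scripts. The relative path is a hypothetical location for your working copy of the data package.

```shell
# Run from the root of the pattrn folder.
# Assumption: the data package working copy sits next to the pattrn folder.
cat > source-data-packages.json <<'EOF'
{
    "source_data_packages": [
        {
        "package": "pattrn-data-where-the-drones-strike",
        "source": "file:../pattrn-data-where-the-drones-strike"
        }
    ]
}
EOF
```

If the build does not accept this form, pointing the source setting at the Git URI of a development branch is an alternative that stays within the documented syntax.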

Developing a Pattrn data package

If you have a dataset in GeoJSON format, creating a Pattrn data package for your own Pattrn instances or to share with other Pattrn users is easy.

The process involves:

  • creating a project folder with a simple subfolder structure
  • placing your GeoJSON data file alongside Pattrn’s metadata, settings and core config files at specific locations of the project’s folder structure
  • creating a package.json file to turn the folder into an NPM package

Let’s now go over the process in detail.

Before starting to package your data, you will want to make sure that the GeoJSON dataset you wish to package is ready to be used with Pattrn (see the section of this manual about managing GeoJSON datasets for all the relevant details).

To create your Pattrn data package, first create a project folder for it. The name of the folder is not significant, but you will normally want it to mirror your NPM package name; to let users easily recognise an NPM package as a Pattrn data package, we recommend using the pattrn-data- prefix. For example, for the sample dataset used in the Getting started tutorial in this manual, we would create and enter the project folder as follows:

mkdir pattrn-data-where-the-drones-strike && cd "$_"

Initializing the NPM package

Within the project folder, run the npm init command: this will ask a few questions (providing sensible defaults) and then create a package.json file that turns your project into an NPM package.

You may customise any of the settings while running the npm init command or by editing the generated package.json file afterwards. We recommend that you:

  • set the name according to the suggestion in the previous section
  • use semantic versioning to manage package versions
  • set a brief description of the dataset
  • include pattrn-data amongst any keywords for the NPM package
  • set the private field to true, at least initially, to avoid accidentally publishing the package to NPM if it is not intended for public use or until it has been fully checked.

You will need to choose a license for your package. If you are simply packaging a dataset provided by a third party, make sure to check their terms of use and licenses and to comply with these. Your own packaging work and any scripts to clean up data should be distributed under a license you choose (we recommend the GNU GPL v3 or later). When you do so, you must indicate this clearly by creating a text file with details about the licenses of each component of the Pattrn data package (e.g. LICENSE.txt) and by setting the license field of the package.json file to SEE LICENSE IN LICENSE.txt (or the name of your file with license details), as recommended in the NPM documentation for the license field.
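Taken together, the recommendations above could result in a package.json file along the lines of the sketch below. The version number and the description are placeholders to replace with your own values; the remaining values follow the suggestions in this section.

```shell
# Run from the root of the data package project folder.
# The version and description values are illustrative placeholders.
cat > package.json <<'EOF'
{
  "name": "pattrn-data-where-the-drones-strike",
  "version": "0.1.0",
  "description": "Brief description of the dataset goes here",
  "keywords": ["pattrn-data"],
  "private": true,
  "license": "SEE LICENSE IN LICENSE.txt"
}
EOF
```

Running npm init and answering its questions produces a similar file; writing it by hand, as here, is only a shortcut.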

Adding data and metadata to the package

You can put any content inside the project folder (for example, the raw source data and any R or Python scripts to clean it up and export it to GeoJSON format): when installing a Pattrn data package (see previous section), the build script will always look for a folder named pattrn-data, ignoring all the rest of the project folder’s contents.

Advanced users will likely want to avoid putting content directly into the pattrn-data folder, opting instead to create its content via a build pipeline (e.g. a Makefile running some R scripts); for this tutorial we will simply create the pattrn-data folder and its contents directly. For an example of a Pattrn data package that uses a build pipeline, see the full source code of the sample data package used in the Getting started tutorial.

Within the project folder, create the following subfolders and (empty, for the moment) plain text files:

/pattrn-data-project/
  /config.json
  /data/
     /metadata.json
     /settings.json
     /data.geojson

If you have been working through the Getting started tutorial, you will recognise the folder structure and the files listed above:

  • config.json is the core Pattrn config file, which instructs the Pattrn code on where to find the data, settings and metadata files (when using a GeoJSON data source)
  • metadata.json (or metadata.yaml, if you prefer to use the YAML syntax) is the file that describes the variables in the instance’s dataset
  • settings.json is the file with the instance’s settings
  • data.geojson is your data file
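The folders and empty files listed above can be created from the command line, as in the sketch below. It assumes that these files belong inside the pattrn-data folder that the Pattrn build script looks for; adjust the paths if your package is laid out differently.

```shell
# Run from the root of the data package project folder.
# Assumption: the files live inside the pattrn-data folder that the Pattrn
# build script looks for.
mkdir -p pattrn-data/data

# Create the (empty, for the moment) files.
touch pattrn-data/config.json \
      pattrn-data/data/metadata.json \
      pattrn-data/data/settings.json \
      pattrn-data/data/data.geojson

# List what was created.
find pattrn-data -type f | sort
```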

Once all the files are in place with their full content (see the Getting started tutorial and the GeoJSON datasets part of the Managing data for Pattrn section of this manual for details), you can test the Pattrn data package by configuring it to be used in a Pattrn instance (see the previous section).

Once you are happy with the content of your Pattrn data package, you can publish it by committing it to Git and pushing it to a code collaboration platform such as GitLab.

To commit your work to Git, run the following from the root of your project folder:

git init && git add . && git commit

You can then create a new project on GitLab (or any other code collaboration platform you wish to use), and configure it as a remote for your Git repository:

git remote add origin <URI-of-the-remote-git-repository>
git push origin master

You (and other users, if your repository is publicly accessible) will now be able to configure the remote URI of the repository in the source-data-packages.json file of a Pattrn instance and let the Pattrn build script retrieve and bundle it with a new Pattrn instance.

You may also want to publish the package to NPM: refer to the relevant documentation on docs.npmjs.com for this.