Pattrn data packages
The Getting started tutorial showed how to package a dataset directly with the source code of the Pattrn app and how to publish the two together on the web.
As the Pattrn code itself will mostly stay the same while the dataset for a Pattrn instance may be updated frequently, in most cases it is advisable to keep Pattrn code and data (along with metadata and settings) separate, so that Editors only need to focus on curating the dataset.
In order to do so, Pattrn supports the use of Pattrn data packages: these are NPM packages that contain data, metadata and settings for a Pattrn instance.
A consistent way to package Pattrn data also allows Pattrn editors to make interesting datasets available for others to reuse and analyse in their own Pattrn instances, therefore facilitating collaboration around datasets.
A future aim for the Pattrn project is to switch to using Frictionless Data Packages, as these provide a standard way to package data for reuse. If you are a data scientist or developer interested in adding support for Frictionless Data Packages to Pattrn, please get in touch (see Joining the Pattrn developers discussion room).
In this section of the Pattrn manual, you will learn:
- how to install a Pattrn data package and use it for a Pattrn instance
- how to prepare and publish a Pattrn data package
Prerequisites
In order to install Pattrn data packages you will need to use Git and Node.js (version 6.10 or later) on your computer. You will therefore need to have at least basic proficiency with using the command line, although no specific experience with Git, Node.js or JavaScript in general is required.
Additionally, in order to create Pattrn data packages, you will need to be able to edit JSON files, to handle basic Git operations (committing, branching, pushing to remotes) and to use a code collaboration platform (we recommend hosting Pattrn data packages on GitLab).
Installing a Pattrn data package
In order to use Pattrn with a Pattrn data package, we will be building the Pattrn app from its source code (rather than directly using the pre-built app as you may have done if you followed the Getting started tutorial in this manual).
In order to build Pattrn from source, you will need Node.js (the current Node.js LTS release is recommended as Pattrn is mainly developed and tested with this version) and Yarn.
Firstly, clone the current version of Pattrn from the master GitHub repository:
git clone https://github.com/pattrn-project/pattrn.git
Then enter the pattrn source folder:
cd pattrn
To configure Pattrn to use and bundle a Pattrn data package, create a file named source-data-packages.json within the pattrn folder. Its content should follow the example below (just replace the URI of the sample data package with the URI of your own data package in the source setting):
{
  "source_data_packages": [
    {
      "package": "pattrn-data-where-the-drones-strike",
      "source": "https://gitlab.com/pattrn-data/pattrn-data-where-the-drones-strike.git#pattrn-data"
    }
  ]
}
The source_data_packages configuration object is an array, although the current version of Pattrn only supports a single data package for each Pattrn instance.
The package setting needs to match the NPM package name as defined within the NPM package’s own package.json file. The source setting matches the syntax of NPM’s dependencies configuration section.
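The configuration file can also be written from the command line. The following is a minimal sketch, assuming a POSIX shell run from within the pattrn folder; the variable names PACKAGE_NAME and PACKAGE_SOURCE are illustrative only, and their values are those of the sample data package shown above.

```shell
#!/bin/sh
# Sketch: write source-data-packages.json for a single data package.
# PACKAGE_NAME and PACKAGE_SOURCE are illustrative variable names (not part
# of Pattrn); the values are those of the sample data package in this manual.
set -eu

PACKAGE_NAME="pattrn-data-where-the-drones-strike"
PACKAGE_SOURCE="https://gitlab.com/pattrn-data/pattrn-data-where-the-drones-strike.git#pattrn-data"

# Unquoted heredoc delimiter, so the two variables are expanded.
cat > source-data-packages.json <<EOF
{
  "source_data_packages": [
    {
      "package": "${PACKAGE_NAME}",
      "source": "${PACKAGE_SOURCE}"
    }
  ]
}
EOF

# Print the result so it can be checked before building.
cat source-data-packages.json
```

In NPM's Git URL syntax, the fragment after the # character selects a branch, tag or commit of the repository; in the sample URI this is presumably a branch or tag named pattrn-data.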
Once the source-data-packages.json file has been created with suitable content, building the Pattrn app will automatically retrieve the configured Pattrn data package and bundle it with the Pattrn app:
yarn install && yarn run gulp build
If you run into a build error related to the node-sass package, running yarn install --force && yarn run gulp build should fix the issue (see https://github.com/sass/node-sass/issues/1579#issuecomment-227663782 for details).
As part of the Pattrn build process, the Gulp build scripts will install the Pattrn data package configured in the source-data-packages.json file and copy all the content of its pattrn-data folder to the dist folder where the Pattrn app gets built: effectively, the pattrn-data folder of the Pattrn data package gets merged with the content of the Pattrn app.
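To make the effect of this merge concrete, the sketch below reproduces it with plain cp on a throwaway directory tree. It is an illustration only, not the actual Gulp build script: the demo folder and the files placed in it are made up for the example.

```shell
#!/bin/sh
# Illustration only: mimics the merge described above with plain cp.
# This is NOT the actual Gulp build script; everything under demo/ is a
# made-up stand-in for the built app and for an installed data package.
set -eu

mkdir -p demo/dist demo/package/pattrn-data/data
echo '<html></html>' > demo/dist/index.html
echo '{}' > demo/package/pattrn-data/config.json
echo '{}' > demo/package/pattrn-data/data/data.geojson
echo 'not copied' > demo/package/README.md

# Copy the *contents* of pattrn-data (note the trailing "/.") into dist;
# files outside pattrn-data, such as README.md, are left behind.
cp -R demo/package/pattrn-data/. demo/dist/

# List what ended up alongside the app.
find demo/dist -type f | sort
```

After running it, the dist stand-in contains the app's own index.html together with the package's config.json and data folder, while the README.md that sits outside pattrn-data is not copied.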
If the app is built correctly, you will find your Pattrn app bundled with your dataset inside of the dist folder. You can now publish this folder (for example on Netlify, as illustrated in the Getting started tutorial), or run the app locally in order to check that everything is working as expected:
yarn start
(The URI at which to access the app is displayed as part of Yarn’s output for the command above.)
When developing a new Pattrn data package (see section below), it may be useful to reference a local repository in the source-data-packages.json file, rather than a web URI, so that any changes made to the data package during the development process can be reflected almost immediately in the local Pattrn app (just run yarn build && yarn start to merge the latest local copy of the Pattrn data package into the development copy of Pattrn).
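As a rough sketch of such a local setup, the script below writes a source-data-packages.json whose source setting is a local path. It rests on two assumptions that are not confirmed by this manual: that the source value is handed to NPM/Yarn unchanged, so that NPM's file: local-path form is accepted, and that the data package lives in a folder next to the pattrn checkout. LOCAL_PACKAGE_DIR is a hypothetical name and location.

```shell
#!/bin/sh
# Sketch: point source-data-packages.json at a local working copy.
# ASSUMPTION: the "source" value is passed to NPM/Yarn as-is, so NPM's
# "file:" local-path dependency form works here.
# LOCAL_PACKAGE_DIR is hypothetical: a sibling folder of the pattrn checkout.
set -eu

LOCAL_PACKAGE_DIR="$(pwd)/../pattrn-data-where-the-drones-strike"

cat > source-data-packages.json <<EOF
{
  "source_data_packages": [
    {
      "package": "pattrn-data-where-the-drones-strike",
      "source": "file:${LOCAL_PACKAGE_DIR}"
    }
  ]
}
EOF

cat source-data-packages.json
```

If the build rejects this form, a Git URL pointing at the local repository may be needed instead; check which forms your version of NPM or Yarn accepts.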
Developing a Pattrn data package
If you have a dataset in GeoJSON format, creating a Pattrn data package for your own Pattrn instances or to share with other Pattrn users is easy.
The process involves:
- creating a project folder with a simple subfolder structure
- placing your GeoJSON data file alongside Pattrn’s metadata, settings and core config files at specific locations of the project’s folder structure
- creating a package.json file to turn the folder into an NPM package
Let’s now go over the process in detail.
Before starting to package your data, you will want to make sure that the GeoJSON dataset you wish to package is ready to be used with Pattrn (see the section of this manual about managing GeoJSON datasets for all the relevant details).
To create your Pattrn data package, first create a project folder for it. The name of the folder is not significant, but you will normally want to use a name that mirrors your NPM package name; to allow users to easily distinguish an NPM package that is a Pattrn data package, we recommend using the pattrn-data- prefix. For example, for the sample dataset used in the Getting started tutorial in this manual, we would create and enter the project folder as:
mkdir pattrn-data-where-the-drones-strike && cd "$_"
Initializing the NPM package
Within the project folder, run the npm init command: this will ask a few questions (providing sensible defaults) and then create a package.json file that turns your project into an NPM package.
You may customise any of the settings while running the npm init command or by editing the generated package.json file afterwards.
We recommend that you:
- set the name according to the suggestion in the previous section
- use semantic versioning to manage package versions
- set a brief description of the dataset
- include pattrn-data amongst any keywords for the NPM package
- set the private field to true, at least initially, to avoid accidentally publishing the package to NPM if it is not intended for public use or until it has been fully checked.
You will need to choose a license for your package. If you are simply packaging a dataset provided by a third party, make sure to check their terms of use and licenses and to comply with these. Your own packaging work and any scripts to clean up data should be distributed under a license you choose (we recommend using the GNU GPL v3 or later). When the package combines components under different licenses, you must indicate this clearly by creating a text file with details about the license of each component of the Pattrn data package (e.g. LICENSE.txt) and setting the license field of the package.json file to SEE LICENSE IN LICENSE.txt (or the name of your file with license details), as recommended in the NPM documentation for the license field.
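Taken together, these recommendations lead to a package.json along the lines of the sketch below, written here with a shell heredoc as an alternative to answering the npm init prompts. The version number, the description text and the wording of the LICENSE.txt stub are placeholder values chosen for the example.

```shell
#!/bin/sh
# Sketch: a package.json following the recommendations above.
# The version, description and LICENSE.txt wording are placeholders;
# adjust them, or run `npm init` interactively and edit the result instead.
set -eu

# Quoted heredoc delimiter: the JSON is written out literally.
cat > package.json <<'EOF'
{
  "name": "pattrn-data-where-the-drones-strike",
  "version": "0.1.0",
  "description": "Sample dataset packaged for use with Pattrn",
  "keywords": ["pattrn-data"],
  "private": true,
  "license": "SEE LICENSE IN LICENSE.txt"
}
EOF

# Stub license file: replace each placeholder with the actual license terms.
printf '%s\n' \
  'Dataset: <license or terms of use of the original data>' \
  'Packaging and scripts: <license you chose for your own work>' \
  > LICENSE.txt

cat package.json
```

Keeping private set to true means that npm publish will refuse to publish the package until the field is removed or set to false.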
Adding data and metadata to the package
You can put any content inside the project folder (for example, the raw source data and any R or Python scripts to clean it up and export it to GeoJSON format): when installing a Pattrn data package (see previous section), the build script will always look for a folder named pattrn-data, ignoring all the rest of the project folder’s contents.
Advanced users will likely want to avoid putting content directly into the pattrn-data folder, opting instead to create its content via a build pipeline (e.g. a Makefile running some R scripts); for this tutorial we will simply create the pattrn-data folder and its contents directly. For an example of a Pattrn data package that uses a build pipeline, see the full source code of the sample data package used in the Getting started tutorial.
Within the project folder, create the following subfolders and (empty, for the moment) plain text files:
/pattrn-data-project/
  /config.json
  /data/
    /metadata.json
    /settings.json
    /data.geojson
If you have been working through the Getting started tutorial, you will recognise the folder structure and the files listed above:
- config.json is the core Pattrn config file, which instructs the Pattrn code on where to find the data, settings and metadata files (when using a GeoJSON data source)
- metadata.json (or metadata.yaml, if you prefer to use the YAML syntax) is the file that describes the variables in the instance’s dataset
- settings.json is the file with the instance’s settings
- data.geojson is your data file
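The empty skeleton can be created in one go from the root of the project folder. The sketch below makes two assumptions about the layout that you should check against your own setup: that these files live inside the pattrn-data folder that the build script looks for, and that metadata.json, settings.json and data.geojson sit under its data subfolder.

```shell
#!/bin/sh
# Sketch: create the empty skeleton of a Pattrn data package.
# ASSUMPTIONS: the files belong inside the pattrn-data folder that the build
# script looks for, and metadata.json, settings.json and data.geojson sit
# under its data/ subfolder. Adjust the paths if your layout differs.
set -eu

mkdir -p pattrn-data/data

# touch creates empty files (and leaves existing ones untouched).
touch pattrn-data/config.json \
      pattrn-data/data/metadata.json \
      pattrn-data/data/settings.json \
      pattrn-data/data/data.geojson

# List the resulting tree.
find pattrn-data | sort
```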
Once all the files are in place with their full content (see the Getting started tutorial and the GeoJSON datasets part of the Managing data for Pattrn section of this manual for details), you can test the Pattrn data package by configuring it to be used in a Pattrn instance (see the previous section).
Once you are happy with the content of your Pattrn data package, you can publish it by committing it into Git and pushing it to a code collaboration platform such as GitLab.
To commit your work into Git, from the root of your project folder:
git init && git add . && git commit
You can then create a new project on GitLab (or any other code collaboration platform you wish to use), and configure it as a remote for your Git repository:
git remote add origin <URI-of-the-remote-git-repository>
git push origin master
You (and other users, if your repository is publicly accessible) will now be able to configure the remote URI of the repository in the source-data-packages.json file of a Pattrn instance and let the Pattrn build script retrieve and bundle it with a new Pattrn instance.
You may also want to publish the package to NPM: refer to the relevant documentation on docs.npmjs.com for this.