====================
Pattrn data packages
====================

In the :ref:`getting-started` tutorial we have seen how to package a dataset
directly with the source code of the Pattrn app and how to publish them
together on the web.

As the Pattrn code itself will mostly stay the same while the dataset for a
Pattrn instance may be updated frequently, in most cases it is advisable to
keep Pattrn code and data (along with metadata and settings) separate, so that
Editors only need to focus on curating the dataset.

In order to do so, Pattrn supports the use of *Pattrn data packages*: these are
`NPM `_ packages that contain data, metadata and settings for a Pattrn
instance.

A consistent way to package Pattrn data also allows Pattrn editors to make
interesting datasets available for others to reuse and analyse in their own
Pattrn instances, thereby facilitating collaboration around datasets.

*A future aim for the Pattrn project is to switch to using*
`Frictionless Data Packages `_, *as these provide a standard way to package
data for reuse: if you are a data scientist or developer interested in adding
support for Frictionless Data Packages to Pattrn, please get in touch (see*
:ref:`pattrn-gitter-room` *).*

In this section of the Pattrn manual, you will learn:

* how to :ref:`install <installing-a-pattern-data-package>` a Pattrn data
  package and use it for a Pattrn instance
* how to :ref:`prepare and publish <developing-a-pattern-data-package>` a
  Pattrn data package

Prerequisites
-------------

In order to *install* Pattrn data packages you will need to use `Git `_ and
`Node.js `_ (version 6.10 or later) on your computer. You will therefore need
to have at least basic proficiency with using the command line, although no
specific experience with Git, Node.js or JavaScript in general is required.
Additionally, in order to *create* Pattrn data packages, you will need to be
able to edit JSON files, to handle basic Git operations (commit, branch,
pushing to remotes) and to use a code collaboration platform (we recommend
hosting Pattrn data packages on `GitLab `_).

.. _installing-a-pattern-data-package:

Installing a Pattrn data package
--------------------------------

In order to use Pattrn with a Pattrn data package, we will be *building* the
Pattrn app from its source code (rather than directly using the pre-built app
as you may have done if you followed the :ref:`getting-started` tutorial in
this manual).

In order to build Pattrn from source, you will need Node.js (the current
`Node.js LTS release `_ is recommended as Pattrn is mainly developed and
tested with this version) and `Yarn `_.

Firstly, *clone* the current version of Pattrn from the master GitHub
repository::

  git clone https://github.com/pattrn-project/pattrn.git

Then enter the ``pattrn`` source folder::

  cd pattrn

To configure Pattrn to use and bundle a Pattrn data package, create a file
named ``source-data-packages.json`` within the ``pattrn`` folder. Its content
should be as in the example below (just replace the URI of the sample data
package with the URI of your own data package in the ``source`` setting)::

  {
    "source_data_packages": [
      {
        "package": "pattrn-data-where-the-drones-strike",
        "source": "https://gitlab.com/pattrn-data/pattrn-data-where-the-drones-strike.git#pattrn-data"
      }
    ]
  }

The ``source_data_packages`` setting is an array, although the current
version of Pattrn only supports a single data package for each Pattrn
instance. The ``package`` setting needs to match the NPM package name as
defined within the NPM package's own ``package.json`` file. The ``source``
setting follows the syntax of NPM's ``dependencies``
`configuration section `_.

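
Because ``source`` follows NPM's dependency syntax, it should also accept forms
other than a Git URI. As a minimal sketch, the commands below write a
``source-data-packages.json`` that points at a local folder using NPM's
``file:`` prefix. The package name ``pattrn-data-my-dataset`` and the path
``../pattrn-data-my-dataset`` are hypothetical, and whether the Pattrn build
scripts pass this form through to NPM or Yarn unchanged is an assumption that
has not been checked against the Pattrn source:

```shell
# Hypothetical example: point Pattrn at a data package in a sibling folder.
# Both the package name and the path are placeholders; replace them with your own.
cat > source-data-packages.json <<'EOF'
{
  "source_data_packages": [
    {
      "package": "pattrn-data-my-dataset",
      "source": "file:../pattrn-data-my-dataset"
    }
  ]
}
EOF

# Print the file to confirm what was written.
cat source-data-packages.json
```

Run these commands from within the ``pattrn`` folder, so that the file ends up
next to the Pattrn source code.
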

Once the ``source-data-packages.json`` file has been created with suitable
content, building the Pattrn app will automatically retrieve the configured
Pattrn data package and bundle it with the Pattrn app::

  yarn install && yarn run gulp build

If you run into a build error related to the ``node-sass`` package, running
``yarn install --force && yarn run gulp build`` should fix the issue (see
https://github.com/sass/node-sass/issues/1579#issuecomment-227663782 for
details).

As part of the Pattrn build process, the Gulp build scripts will install the
Pattrn data package configured in the ``source-data-packages.json`` file **and
copy all the content of its** ``pattrn-data`` **folder to the** ``dist``
**folder where the Pattrn app gets built**: effectively, the ``pattrn-data``
folder of the Pattrn data package gets *merged* with the content of the Pattrn
app.

If the app is built correctly, you will find your Pattrn app bundled with your
dataset inside the ``dist`` folder. You can now publish this folder (for
example on Netlify, as illustrated in the :ref:`getting-started` tutorial), or
run the app locally in order to check that everything is working as expected::

  yarn start

(The URI at which to access the app will be displayed as part of Yarn's output
for the command above.)

When developing a new Pattrn data package (see section below), it may be
useful to reference a *local* repository in the ``source-data-packages.json``
file, rather than a web URI, so that any changes made to the data package
during the development process can be reflected almost immediately in the
local Pattrn app (just run ``yarn build && yarn start`` to merge the latest
local copy of the Pattrn data package into the development copy of Pattrn).

.. _developing-a-pattern-data-package:

Developing a Pattrn data package
--------------------------------

If you have a dataset in GeoJSON format, creating a Pattrn data package for
your own Pattrn instances or to share with other Pattrn users is easy.
The process involves:

* creating a project folder with a simple subfolder structure
* placing your GeoJSON data file alongside Pattrn's metadata, settings and
  core config files at specific locations of the project's folder structure
* creating a ``package.json`` file to turn the folder into an NPM package

Let's now go over the process in detail.

Before starting to package your data, you will want to make sure that the
GeoJSON dataset you wish to package is ready to be used with Pattrn (see the
:ref:`managing-data-geojson` section of this manual for all the relevant
details).

To create your Pattrn data package, first create a project folder for it. The
name of the folder is not relevant, but you will normally want to use a name
that mirrors your NPM package name; to allow users to easily distinguish an
NPM package that is a Pattrn data package, we recommend using the
``pattrn-data-`` prefix. For example, for the sample dataset used in the
:ref:`getting-started` tutorial in this manual, we would create and enter the
project folder as::

  mkdir pattrn-data-where-the-drones-strike && cd "$_"

Initializing the NPM package
............................

Within the project folder, run the ``npm init`` command: this will ask a few
questions (providing sensible defaults) and then create a ``package.json``
file that turns your project into an NPM package.

You may customise any of the settings while running the ``npm init`` command
or by editing the generated ``package.json`` file afterwards. We recommend
that you:

* set the ``name`` according to the suggestion in the previous section
* use `semantic versioning `_ to manage package versions
* set a brief ``description`` of the dataset
* include ``pattrn-data`` amongst any ``keywords`` for the NPM package
* set the ``private`` field to ``true``, at least initially, to avoid
  accidentally publishing the package to NPM if it is not intended for public
  use, or until it has been fully checked

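
Taken together, the recommendations above might yield a ``package.json`` along
these lines. This is a minimal sketch written by hand rather than produced by
``npm init``; the version number and the description are illustrative
placeholders, not values taken from the actual sample data package:

```shell
# Write a minimal package.json that follows the recommendations above.
# The version and description values are illustrative placeholders.
cat > package.json <<'EOF'
{
  "name": "pattrn-data-where-the-drones-strike",
  "version": "0.1.0",
  "description": "Sample dataset packaged for use with Pattrn",
  "keywords": ["pattrn-data"],
  "private": true
}
EOF

# Confirm that the package is marked private before going any further.
grep '"private": true' package.json
```

Keeping ``private`` set to ``true`` only blocks publishing to the NPM registry;
it does not affect committing the package to a Git repository.
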

You will need to choose a license for your package. If you are simply
packaging a dataset provided by a third party, make sure to check their terms
of use and licenses and to comply with these. Your own packaging work and any
scripts to clean up data should be distributed under a license you choose (we
recommend the GNU GPL v3 or later), but when doing so you **must** indicate
this clearly by creating a text file with details about the licenses of each
component of the Pattrn data package (e.g. ``LICENSE.txt``) and setting the
``license`` field of the ``package.json`` file to
``SEE LICENSE IN LICENSE.txt`` (or the name of your file with license
details), as recommended in the `NPM documentation `_ for the ``license``
field.

Adding data and metadata to the package
.......................................

You can put any content inside the project folder (for example, the raw source
data and any R or Python scripts to clean it up and export it to GeoJSON
format): when installing a Pattrn data package (see previous section), the
build script will always look for a folder named ``pattrn-data``, ignoring all
the rest of the project folder's contents.

*Advanced users will likely want to avoid putting content directly into the*
``pattrn-data`` *folder, opting instead to create its content via a build
pipeline (e.g. a Makefile running some R scripts); for this tutorial we will
simply create the* ``pattrn-data`` *folder and its contents directly. For an
example of a Pattrn data package that uses a build pipeline, see the*
`full source code of the sample data package `_ *used in the*
:ref:`getting-started` *tutorial.*

Within the project folder, create the following subfolders and (empty, for the
moment) plain text files::

  /pattrn-data-project/
    /config.json
    /data/
      /metadata.json
      /settings.json
      /data.geojson

If you have been working through the :ref:`getting-started` tutorial, you will
recognise the folder structure and the files listed above:

* ``config.json`` is the core Pattrn config file, which instructs the Pattrn
  code on where to find the data, settings and metadata files (when using a
  GeoJSON data source)
* ``metadata.json`` (or ``metadata.yaml``, if you prefer to use the YAML
  syntax) is the file that describes the variables in the instance's dataset
* ``settings.json`` is the file with the instance's settings
* ``data.geojson`` is your data file

Once all the files are in place with their full content (see the
:ref:`getting-started` tutorial and the :ref:`managing-data-geojson` part of
the :ref:`managing-data` section of this manual for details), you can test the
Pattrn data package by configuring it to be used in a Pattrn instance (see the
:ref:`previous section <installing-a-pattern-data-package>`).

Once you are happy with the content of your Pattrn data package, you can
publish it by committing it into Git and pushing it to a code collaboration
platform such as `GitLab `_.

To commit your work into Git, from the root of your project folder::

  git init && git add . && git commit

You can then create a new project on GitLab (or any other code collaboration
platform you wish to use), and configure it as a remote for your Git
repository::

  git remote add origin <URI of your remote repository>
  git push origin master

You (and other users, if your repository is publicly accessible) will now be
able to configure the remote URI of the repository in the
``source-data-packages.json`` file of a Pattrn instance and let the Pattrn
build script retrieve and bundle it with a new Pattrn instance.

You may also want to publish the package to NPM: refer to the
`relevant documentation on docs.npmjs.com `_ for this.