Managing data for Pattrn

Principles of Pattrn Data structure

Pattrn works with datasets of events.

An event is defined as a row of data that contains date/time information as well as a set of geographical coordinates. The details of each event are expressed as a series of attributes in the row of data.

The current version of Pattrn (2.0) can only visualise events with point coordinates; support for other kinds of GeoJSON geometry objects (e.g. linestrings or polygons) may be added in future versions.

GeoJSON datasets

The preferred data format for Pattrn datasets from Pattrn version 2.0 onwards is GeoJSON, as this provides much better flexibility than the Google Sheets data source type used in Pattrn version 1.0 (and still available as an option in Pattrn v2.0).

For example, GeoJSON files can easily be produced by scripts that gather data from a database, clean and filter it, and finally export it to a GeoJSON file. Doing so with Google Sheets is possible, but it requires more effort and relies on brittle workflows. Additionally, some Pattrn users will likely want to avoid Google’s proprietary and opaque cloud services when collecting sensitive data from vulnerable contributors.
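As a rough illustration of such a script, the Python sketch below reads events from an SQLite database, filters out incomplete rows and builds a GeoJSON FeatureCollection. The `events` table, its columns and the `summary` property are hypothetical placeholders; only the `pattrn_date_time` property name comes from this manual (see below). Note that GeoJSON orders Point coordinates as longitude first, then latitude.

```python
import json
import sqlite3


def rows_to_feature_collection(conn: sqlite3.Connection) -> dict:
    """Build a GeoJSON FeatureCollection from a hypothetical `events` table."""
    # Filter in SQL: skip rows without coordinates or a date/time.
    cursor = conn.execute(
        "SELECT latitude, longitude, occurred_at, summary FROM events "
        "WHERE latitude IS NOT NULL AND longitude IS NOT NULL "
        "AND occurred_at IS NOT NULL"
    )
    features = []
    for lat, lon, occurred_at, summary in cursor:
        features.append({
            "type": "Feature",
            # GeoJSON (RFC 7946) orders coordinates as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {
                "pattrn_date_time": occurred_at,  # core Pattrn property
                "summary": summary,  # hypothetical dataset-specific property
            },
        })
    return {"type": "FeatureCollection", "features": features}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (latitude REAL, longitude REAL, occurred_at TEXT, summary TEXT)"
    )
    conn.executemany(
        "INSERT INTO events VALUES (?, ?, ?, ?)",
        [
            (51.5074, -0.1278, "2016-03-01T12:00", "Example event"),
            (None, 10.0, "2016-03-02T09:30", "Dropped: no latitude"),
        ],
    )
    print(json.dumps(rows_to_feature_collection(conn), indent=2))
```

In a real workflow the printed JSON would be written to a .geojson file; the same structure can be produced from any database driver.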

Cleaning up data and preparing it for Pattrn

When using GeoJSON as the data format for Pattrn datasets, you will want to make sure that:

  • each object’s geometry is of type Point (objects with any other type of geometry will be ignored by Pattrn)

Additionally, a few core properties, whenever they are set (not all of them are compulsory), must have exactly the names listed below:

pattrn_date_time (compulsory field)
A date and time must be provided for each object, using the ISO 8601 format (e.g. 2016-03-01T12:00). Year, month, day of the month, hour and minute must be provided for each event (times are not used in charts, so for events with no known time you may want to set an arbitrary time of day such as 12:00). Timezone offsets are currently not supported. Events without a pattrn_date_time property, or whose pattrn_date_time property cannot be parsed as an ISO 8601 date/time, will automatically be discarded when the dataset is first loaded.
A name of the location where the event described by the object happened (this is displayed in each event’s detail view).
A short description of the event (this is displayed in each event’s detail view).
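Before publishing a dataset, it can help to find in advance which objects Pattrn would ignore or discard. The Python sketch below applies a strict reading of the rules above: Point geometry only, and a pattrn_date_time of the form YYYY-MM-DDTHH:MM with optional seconds and no timezone offset. This is an assumption for illustration, not Pattrn’s own parser, which may accept other ISO 8601 variants. The location and description properties are not checked.

```python
import re
from datetime import datetime

# Assumed strict format: date, "T", hours and minutes, optional seconds,
# no fractional seconds and no timezone offset.
_LOCAL_ISO = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}(:\d{2})?")


def is_valid_pattrn_date_time(value) -> bool:
    if not isinstance(value, str) or not _LOCAL_ISO.fullmatch(value):
        return False
    try:
        # fromisoformat also rejects impossible dates such as 2016-02-30.
        datetime.fromisoformat(value)
    except ValueError:
        return False
    return True


def filter_features(feature_collection: dict) -> tuple:
    """Split features into (kept, discarded) following the rules above."""
    kept, discarded = [], []
    for feature in feature_collection.get("features", []):
        geometry = feature.get("geometry") or {}
        properties = feature.get("properties") or {}
        if geometry.get("type") == "Point" and is_valid_pattrn_date_time(
            properties.get("pattrn_date_time")
        ):
            kept.append(feature)
        else:
            discarded.append(feature)
    return kept, discarded
```

Reviewing the discarded list, rather than silently dropping it, makes it easier to fix the source data.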

All the other properties/variables can have arbitrary names, but it is advisable to:

  • only use alphanumeric characters and the underscore (_) character (avoiding any other characters, especially spaces)
  • keep variable names short: this reduces the size of the GeoJSON file, and the short names will not be displayed to the app’s users as long as a descriptive label for each field is specified in the instance’s metadata file
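The naming advice above can be checked mechanically. This Python sketch collects every property name in a GeoJSON FeatureCollection and reports names that contain characters other than ASCII letters, digits and underscores, or that exceed a length limit. The limit of 20 characters is a hypothetical threshold chosen for illustration; the manual only asks for names to be short.

```python
import re

_SAFE_NAME = re.compile(r"[A-Za-z0-9_]+")
MAX_NAME_LENGTH = 20  # hypothetical threshold, not defined by Pattrn


def check_property_names(feature_collection: dict) -> dict:
    """Return {name: [problems]} for property names that break the advice above."""
    names = set()
    for feature in feature_collection.get("features", []):
        names.update((feature.get("properties") or {}).keys())

    problems = {}
    for name in sorted(names):
        issues = []
        if not _SAFE_NAME.fullmatch(name):
            issues.append("contains characters other than letters, digits and _")
        if len(name) > MAX_NAME_LENGTH:
            issues.append(f"longer than {MAX_NAME_LENGTH} characters")
        if issues:
            problems[name] = issues
    return problems
```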

Preparing metadata for a dataset

When loading a dataset, Pattrn will use the property names listed above for its core variables and interpret them according to the meaning defined; all other (i.e. dataset-specific) variables need to be configured in a metadata file so that Pattrn can use them correctly, for example by plotting integer variables on line charts, tag and boolean variables on bar charts, and so on. Additionally, a metadata file can (and definitely should) be used to associate a brief, meaningful description with each variable: the Pattrn code will use the short variable name contained in the GeoJSON file, while showing users the configured description.
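When drafting a metadata file for a large dataset, it may help to get a first guess of each variable’s type. The Python sketch below is a heuristic of our own, not part of Pattrn: properties whose values are all JSON booleans are suggested as boolean, all whole numbers as integer, and everything else as tag. Free-text properties (such as the location name or description) and tree variables will be misclassified, so the suggestions must be reviewed by hand; pass the names of any properties to skip via `core`.

```python
from collections import defaultdict


def suggest_variable_types(feature_collection: dict, core=("pattrn_date_time",)) -> dict:
    """Suggest 'boolean', 'integer' or 'tag' for each dataset-specific property."""
    values = defaultdict(list)
    for feature in feature_collection.get("features", []):
        for name, value in (feature.get("properties") or {}).items():
            if name not in core and value is not None:
                values[name].append(value)

    suggestions = {}
    for name, vals in values.items():
        # bool is a subclass of int in Python, so test for booleans first.
        if all(isinstance(v, bool) for v in vals):
            suggestions[name] = "boolean"
        elif all(isinstance(v, int) and not isinstance(v, bool) for v in vals):
            suggestions[name] = "integer"
        else:
            suggestions[name] = "tag"  # fallback; review by hand
    return suggestions
```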

The format of the metadata file is documented in the “Pattrn metadata: configuring variables” section of the Getting started tutorial; please refer to it for help on preparing your metadata files.

Although the tutorial uses the YAML file format for the sake of simplicity, Pattrn can also read metadata files in JSON format: if the metadata file has the .json extension instead of .yaml, Pattrn will process it accordingly.

Google Sheets datasets

The Google Sheets data source was the only available option in the first version of Pattrn. Using Google Sheets (whether by editing data directly in the sheet or through the Pattrn Editor) can be easier than handling a GeoJSON file for first-time users, although the way this kind of data source is used in Pattrn has several limitations:

  • only variables of type integer, tag and boolean are supported (no variables of type tree, which were introduced with Pattrn 2.0)
  • at most five variables of each variable type are supported (e.g. up to five integer variables); this may be enough for simple datasets, but more advanced users will likely need more flexibility: Pattrn 2.0 supports an arbitrary number of variables of each variable type.
  • it is not possible to visually differentiate between events that originate from distinct source datasets: with GeoJSON datasets, Pattrn 2.0 can aggregate events reported by different organizations or sources into a single GeoJSON dataset while still letting users visually differentiate between the sources (by using map markers of different colours)
  • additionally, when using distinct source datasets as described in the previous point, Pattrn 2.0 can expose variables that only apply to individual source datasets, whereas the Google Sheets source for Pattrn 1.0 requires all variables to be shared across all events of the dataset
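Bundling several sources into one GeoJSON dataset, as described in the list above, can be sketched in Python as follows. Each feature is copied and tagged with the label of the source it came from. The property name `source_dataset` is a hypothetical placeholder: use whatever property your Pattrn instance is configured to read for distinguishing sources.

```python
import copy


def merge_sources(sources: dict, source_property: str = "source_dataset") -> dict:
    """Merge FeatureCollections into one, tagging each feature with its source.

    `sources` maps a source label (e.g. an organisation name) to a
    FeatureCollection. `source_property` is a hypothetical property name.
    """
    merged = []
    for label, collection in sources.items():
        for feature in collection.get("features", []):
            feature = copy.deepcopy(feature)  # leave the caller's data untouched
            properties = dict(feature.get("properties") or {})
            properties[source_property] = label
            feature["properties"] = properties
            merged.append(feature)
    return {"type": "FeatureCollection", "features": merged}
```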

For users who are analysing and visualising basic datasets and who find the spreadsheet interface convenient to edit and manage data, the following documentation applies.

Where data is stored

Once a Google Sheets data source has been set up (see the relevant section of this manual), data visualised through Pattrn is stored in the PATTRN_Master spreadsheet. This spreadsheet is pre-formatted with the data template used by Pattrn.

Two ways of editing data

It is recommended to use the Pattrn Editor to enter and edit data in the PATTRN_Master spreadsheet, as it is designed to output data in the correct format and thereby avoid display issues or glitches in the Pattrn app.

However, you can also enter and edit data directly in the PATTRN_Master spreadsheet using Google Sheets. For example, if you want to visualise an existing dataset of events, you can copy and paste the available data into the PATTRN_Master spreadsheet. Make sure to format it correctly, always referring to the “Data Formatting Reference” sheet in the PATTRN_Master spreadsheet.

Customising your data structure

For every event in the dataset used with Pattrn, the following fields are fixed; their headers must not be modified in the PATTRN_Master spreadsheet:

  • unique_event_ID
  • location_name
  • latitude
  • longitude
  • geo-accuracy
  • date_time
  • event_summary
  • source_name

In addition, the data structure of the dataset to be used with Pattrn can be customised by adding up to 5 fields of numeric data, up to 5 fields of tag data, and up to 5 fields of boolean data (Yes/No).

You can create these custom columns either in the Pattrn Editor, or by manually renaming the header of one of the columns from “I” to “W” in the PATTRN_Master spreadsheet.

Automatically populated fields

When using the Pattrn Editor, certain fields in the PATTRN_Master spreadsheet will be automatically populated. Those fields are:

  • unique_event_ID: assigns a unique Event ID to all new Events.
  • geo-accuracy: provides an indication of the accuracy of the geolocation result, based on the textual location information entered in the Location field.
  • media_available: populated with a series of tags corresponding to the types of media that have been attached to each event.

Required fields

In order for the Pattrn app to successfully load and display data from the PATTRN_Master spreadsheet, all rows that are not completely empty need to have correctly formatted data in at least three key fields. Those fields are:

  • latitude
  • longitude
  • date_time

Warning: if even a single row in a large dataset hosted on Google Sheets lacks data in one of these three fields, or contains wrongly formatted data in one of them, the Pattrn app won’t load any data. For this reason, it is recommended to use the Pattrn Editor. When it is necessary to edit the PATTRN_Master spreadsheet directly, editors need to be especially careful with the formatting of data in these three key fields. Data validation custom formulas have been integrated into the PATTRN_Master spreadsheet to help identify any wrongly formatted rows (see Troubleshooting).
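Besides the built-in validation formulas, a CSV export of the PATTRN_Master sheet can be checked before publishing. The Python sketch below reports, by spreadsheet row number, every non-empty row whose latitude or longitude is missing, non-numeric or out of range, or whose date_time is empty. The exact date_time format Pattrn expects is shown in the Data Formatting Reference sheet and is not assumed here: pass it as a strptime `date_format` (any value you use is your own assumption) to check the format too. Row numbering assumes the export contains no completely blank lines.

```python
import csv
import io
from datetime import datetime

REQUIRED = ("latitude", "longitude", "date_time")


def _blank(value) -> bool:
    return value is None or (isinstance(value, str) and not value.strip())


def find_bad_rows(csv_text, date_format=None):
    """Return (row_number, problem) pairs; the header is spreadsheet row 1."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [name for name in REQUIRED if name not in (reader.fieldnames or [])]
    if missing:
        return [(1, "missing column(s): " + ", ".join(missing))]

    problems = []
    for row_number, row in enumerate(reader, start=2):
        # Completely empty rows are allowed and skipped.
        if all(_blank(v) for k, v in row.items() if k is not None):
            continue
        for field, low, high in (("latitude", -90.0, 90.0), ("longitude", -180.0, 180.0)):
            try:
                value = float(row[field])
            except (TypeError, ValueError):
                problems.append((row_number, f"{field} is missing or not a number"))
                continue
            if not low <= value <= high:  # also catches nan and inf
                problems.append((row_number, f"{field} out of range"))
        date_value = row["date_time"]
        if _blank(date_value):
            problems.append((row_number, "date_time is missing"))
        elif date_format is not None:
            try:
                datetime.strptime(date_value.strip(), date_format)
            except ValueError:
                problems.append((row_number, "date_time does not match " + date_format))
    return problems
```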

Data Formatting Reference

The PATTRN_Master spreadsheet contains a sheet named “Data Formatting Reference” (the third tab at the bottom of the spreadsheet).

This sheet features a row corresponding to a dummy event, every field of which holds data that is correctly formatted for use with Pattrn.

Always consult this sheet when entering data directly into the PATTRN_Master spreadsheet.

Data limits

Indicative maximum number of events in a Dataset

The Pattrn app and Pattrn Editor have been tested with a dataset of 2,000 events, each containing data in all 15 custom data fields as well as photos, videos and links.

Above this volume of data in the dataset, it is possible that the Pattrn app and/or the Pattrn Editor will be less responsive, or even stop working, especially when using older browsers on less powerful computers.

Nevertheless, the Pattrn app and the Pattrn Editor have also passed performance tests with datasets containing more events (15,000+), but less data per event.

Note: in order to deliver a dynamic, interactive experience, the Pattrn app needs to load all the data contained in the PATTRN_Master spreadsheet at once when the app is accessed online. For this reason, if you’re using Pattrn with a large dataset, the app may take up to 30 seconds to load all the data the first time, especially on slower mobile connections. Please be patient.

Concerning tags

For each column of tag data entered in the dataset, a Bar Chart will be automatically generated by the Pattrn app.

For these Bar Charts to display correctly, it is recommended to use:

  • no more than 12 different tags per column of tag data.
  • tags no longer than 24 characters each

Note: this applies to the source_name column as well.
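The two recommendations above can be checked for each tag column (including source_name) with a short Python sketch. It counts distinct tags and lists those longer than 24 characters. Whether a single cell may hold several tags is an assumption here: by default a cell is split on commas; pass `separator=None` to treat each cell as a single tag.

```python
MAX_TAGS = 12
MAX_TAG_LENGTH = 24


def check_tag_column(values, separator=","):
    """Check one column of tag data against the recommended limits above."""
    tags = set()
    for cell in values:
        if cell is None:
            continue
        parts = [str(cell)] if separator is None else str(cell).split(separator)
        for part in parts:
            tag = part.strip()
            if tag:
                tags.add(tag)
    return {
        "distinct_tags": len(tags),
        "too_many_tags": len(tags) > MAX_TAGS,
        "too_long": sorted(t for t in tags if len(t) > MAX_TAG_LENGTH),
    }
```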