Managing data for Pattrn¶
Principles of Pattrn Data structure¶
Pattrn works with datasets of events.
An event is defined as a row of data that contains date/time information, as well as a set of geographical coordinates. The details about each event are expressed as a series of attributes in the row of data.
The current version of Pattrn (2.0) can only visualise events with point coordinates; support for other kinds of GeoJSON geometry objects (e.g. linestrings or polygons) may be added in future versions.
The preferred data format for Pattrn datasets from Pattrn version 2.0 onwards is GeoJSON, as this provides much better flexibility than the Google Sheets data source type used in Pattrn version 1.0 (and still available as an option in Pattrn v2.0).
For example, GeoJSON files can easily be produced through scripts that gather data from a database, clean and filter it and finally export it to a GeoJSON file. Doing so with Google Sheets is possible, but it requires greater effort and relies on brittle workflows. Additionally, some Pattrn users will likely want to avoid using Google’s proprietary and opaque cloud services when collecting sensitive data from vulnerable contributors.
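As an illustration of the kind of script described above, the following is a minimal Python sketch that turns rows of data into a GeoJSON FeatureCollection of Point features. It is not part of Pattrn: the `longitude`, `latitude` and `casualties` column names and the sample rows are hypothetical, and only `pattrn_date_time` is a property name taken from this manual.

```python
import json


def rows_to_geojson(rows):
    """Convert an iterable of dict rows into a GeoJSON FeatureCollection."""
    features = []
    for row in rows:
        # Simple cleaning step: skip rows without usable coordinates.
        try:
            lon = float(row["longitude"])  # hypothetical column name
            lat = float(row["latitude"])   # hypothetical column name
        except (KeyError, TypeError, ValueError):
            continue
        # Every column except the coordinates becomes a feature property.
        properties = {
            key: value
            for key, value in row.items()
            if key not in ("longitude", "latitude")
        }
        features.append({
            "type": "Feature",
            # GeoJSON positions are ordered [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": properties,
        })
    return {"type": "FeatureCollection", "features": features}


# Hypothetical sample rows, as they might come out of a database query.
rows = [
    {"longitude": "35.2", "latitude": "31.8",
     "pattrn_date_time": "2016-03-01T12:00", "casualties": 3},
    {"longitude": "", "latitude": None,
     "pattrn_date_time": "2016-03-02T12:00", "casualties": 0},
]

collection = rows_to_geojson(rows)
print(json.dumps(collection, indent=2))
```

The second sample row has no coordinates, so it is dropped during conversion; a real script would probably log such rows rather than skip them silently.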
Cleaning up data and preparing it for Pattrn¶
When using GeoJSON as the data format for Pattrn datasets, you will want to make sure that:
- each object’s geometry is of type Point (objects with any other type of geometry will be ignored by Pattrn)
Additionally, a few core properties, when set (if not compulsory), must have exactly the names listed below:
- A date and time must be provided for each object, using the ISO 8601 format. Year, month, day of the month, hour and minute must be provided for each event (times are not used in charts, so for events with no known time you may want to set an arbitrary time of day, such as 12:00). Timezone offsets are currently not supported.
Events without a pattrn_date_time property defined, or whose pattrn_date_time property cannot be parsed as an ISO 8601 date/time, will automatically be discarded when the dataset is first loaded.
- A name of the location where the event described by the object happened (this is displayed in each event’s detail view).
- A short description of the event (this is displayed in each event’s detail view).
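The checks above can be run on a GeoJSON file before handing it to Pattrn. The following Python sketch is one possible pre-flight filter, not Pattrn’s own loading code: it keeps only Point features whose `pattrn_date_time` parses without a timezone offset. The two accepted date/time layouts are an assumption standing in for “ISO 8601 with at least year, month, day, hour and minute”; the exact set of layouts Pattrn accepts should be checked against your Pattrn version.

```python
from datetime import datetime

# Assumption: these layouts approximate the date/time strings Pattrn can parse.
ACCEPTED_LAYOUTS = ("%Y-%m-%dT%H:%M", "%Y-%m-%dT%H:%M:%S")


def has_valid_date_time(value):
    """Return True if value is a date/time string in one of the accepted layouts."""
    if not isinstance(value, str):
        return False
    for layout in ACCEPTED_LAYOUTS:
        try:
            datetime.strptime(value, layout)
            return True
        except ValueError:
            continue
    return False


def usable_features(collection):
    """Return the features that pass the geometry and date/time checks."""
    kept = []
    for feature in collection.get("features", []):
        geometry = feature.get("geometry") or {}
        properties = feature.get("properties") or {}
        if geometry.get("type") != "Point":
            continue  # non-Point geometries would be ignored anyway
        if not has_valid_date_time(properties.get("pattrn_date_time")):
            continue  # missing or unparseable dates would be discarded anyway
        kept.append(feature)
    return kept
```

Comparing `len(usable_features(collection))` with `len(collection["features"])` gives a quick indication of how many events would be dropped when the dataset is loaded.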
All the other properties/variables can have arbitrary names, but it is advisable to:
- only use alphanumeric characters and the underscore (_) character (avoiding any other characters, especially spaces)
- keep variable names short; this reduces the size of the GeoJSON file and these short names will not be displayed to the App’s users if a descriptive label for each field is specified in the instance’s metadata file
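The naming advice above is easy to check automatically. The following Python sketch tests whether a variable name uses only the recommended characters and derives a safe name from a free-form column title. The helper names are hypothetical, and “alphanumeric” is interpreted here as ASCII letters and digits, which is an assumption.

```python
import re


def is_safe_name(name):
    """Return True if name contains only ASCII letters, digits and underscores."""
    return re.fullmatch(r"[A-Za-z0-9_]+", name) is not None


def to_safe_name(name):
    """Replace each run of other characters with one underscore, then trim underscores."""
    return re.sub(r"[^A-Za-z0-9_]+", "_", name.strip()).strip("_")


print(is_safe_name("casualties_2016"))
print(is_safe_name("number of casualties"))
print(to_safe_name("Number of casualties (est.)"))
```

The descriptive title can then be kept as the label for the variable in the metadata file, while the GeoJSON file carries only the short, safe name.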
Preparing metadata for a dataset¶
When loading a dataset, Pattrn uses the variable names listed above for its core variables and interprets them according to the meaning defined; all other (i.e. dataset-specific) variables need to be configured in a metadata file so that Pattrn can use them correctly, for example by plotting integer variables on line charts, tag and boolean variables on bar charts, and so on. Additionally, a metadata file can (and definitely should) be used to associate a brief, meaningful description with each variable: the Pattrn code will use the short variable name contained in the GeoJSON file, while showing users the configured description.
Although the YAML file format is used in the tutorial for the sake of simplicity, Pattrn can also read metadata files in JSON format: if the metadata file uses the .json file extension instead of .yaml, Pattrn will process it accordingly.
Google Sheets datasets¶
The Google Sheets data source was the only available option in the first version of Pattrn. Using Google Sheets (whether by editing data directly in the sheet or through the Pattrn Editor) can be easier than handling a GeoJSON file for first-time users, although the way this kind of data source is used in Pattrn has several limitations:
- only variables of type integer, tag and boolean are supported (no variables of type tree, which were introduced with Pattrn 2.0)
- at most five variables of each variable type are supported (e.g. only up to five integer variables); this may be enough for simple datasets, but more advanced users will likely need more flexibility: Pattrn 2.0 supports an arbitrary number of variables of each variable type.
- it is not possible to visually differentiate between events that originate from distinct source datasets: Pattrn 2.0 using GeoJSON datasets can be used to aggregate events reported by different organizations or by different sources and bundle them into a single GeoJSON dataset, while still allowing users to visually differentiate between the different sources (by using map markers of different colours)
- additionally, when using distinct source datasets as described in the previous point, Pattrn 2.0 allows exposing variables that only apply to individual source datasets, whereas the Google Sheets source for Pattrn 1.0 requires all variables to be shared across all events of the dataset
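To decide whether a dataset fits within the first two limitations above, its variables can be checked against the supported types and the per-type maximum. The following Python sketch assumes a plain mapping of variable names to type names; that mapping, the function name and the sample variables are hypothetical and do not reflect Pattrn’s metadata format.

```python
from collections import Counter

SHEETS_TYPES = {"integer", "tag", "boolean"}  # types supported with Google Sheets
MAX_PER_TYPE = 5                              # at most five variables per type


def sheets_limit_problems(variables):
    """Return a list of reasons why the variables do not fit a Google Sheets source.

    variables maps each variable name to its type, e.g. {"casualties": "integer"}.
    """
    problems = []
    for name, var_type in variables.items():
        if var_type not in SHEETS_TYPES:
            problems.append(f"{name}: type '{var_type}' is not supported")
    counts = Counter(t for t in variables.values() if t in SHEETS_TYPES)
    for var_type, count in sorted(counts.items()):
        if count > MAX_PER_TYPE:
            problems.append(
                f"{count} variables of type '{var_type}' (maximum {MAX_PER_TYPE})"
            )
    return problems


# Hypothetical dataset: six integer variables and one tree variable.
variables = {f"count_{i}": "integer" for i in range(6)}
variables["weapon_type"] = "tree"
for problem in sheets_limit_problems(variables):
    print(problem)
```

An empty result means the variables fit these two limits; any entry in the list suggests using a GeoJSON dataset instead.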
For users who are analysing and visualising basic datasets and who find the spreadsheet interface convenient to edit and manage data, the following documentation applies.
Where data is stored¶
Once a Google Sheets data source has been set up (see relevant section
of this manual), data visualised through Pattrn is stored in the
PATTRN_Master spreadsheet. This is pre-formatted with the data template
in use with Pattrn.
Two ways of editing data¶
It is recommended to use the Pattrn Editor to enter and edit data in the
PATTRN_Master spreadsheet, as it is designed to output data in the
correct format and thereby to avoid display issues or glitches in the
However, you can also enter and edit data directly in the Master
spreadsheet, using Google Sheets. For example, if you want to
visualise an existing dataset of events, you can copy and paste the
available data into the PATTRN_Master spreadsheet. Make sure to format it
correctly, by always referring to the “Data Formatting Reference” sheet.
Customising your data structure¶
For any event in the dataset in use with Pattrn, the following are the
fixed fields – the headers of which must not be modified in the
In addition, the data structure of the dataset to be used with Pattrn can be customised by adding up to 5 fields of numeric data, up to 5 fields of data tags, and up to 5 fields of boolean data (Yes/No).
You can create these custom columns of data either inside the Pattrn
Editor, or by manually renaming the header of one of the columns
between “I” and “W” in the PATTRN_Master spreadsheet.
Automatically populated fields¶
When using the Pattrn Editor, certain fields in the PATTRN_Master
spreadsheet will be automatically populated. Those fields are:
- unique_event_ID: assigns a unique Event ID to all new Events.
- geo-accuracy: provides an indication of the accuracy of the geolocation result, on the basis of the textual locational information entered in the Location field.
- media_available: populated with a series of tags corresponding to the types of media that have been attached to each event.
In order for the Pattrn app to successfully load and display data from the
PATTRN_Master spreadsheet, all rows that are not completely
empty need to have correctly formatted data in at least three key
fields. Those fields are:
Warning: if a single row in a large dataset hosted on Google Sheets lacks data, or contains wrongly formatted data, in one of these three fields, the Pattrn app won’t load any data. For this reason, it is recommended to use the Pattrn Editor. When it is necessary to edit the PATTRN_Master spreadsheet directly, Editors need to be very careful with the formatting of data in these three key fields in particular. Custom data validation formulas have been integrated in the PATTRN_Master spreadsheet to help identify any wrongly formatted row (see Troubleshooting).
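Before pasting a large batch of rows into the spreadsheet, it can help to look for rows that are not completely empty yet lack a value in a key field. The following Python sketch shows one way to do this on rows held as dictionaries. It only checks that a value is present, not that it is correctly formatted, and the key field names are passed in as a parameter: `field_a` and `field_b` in the example are placeholders, not actual Pattrn column headers.

```python
def _text(value):
    """Normalise a cell value to stripped text; None becomes an empty string."""
    return "" if value is None else str(value).strip()


def rows_missing_key_fields(rows, key_fields):
    """Return (row_number, missing_fields) for each non-empty row lacking a key field."""
    flagged = []
    for number, row in enumerate(rows, start=1):
        if not any(_text(value) for value in row.values()):
            continue  # completely empty rows are acceptable
        missing = [field for field in key_fields if not _text(row.get(field))]
        if missing:
            flagged.append((number, missing))
    return flagged


# Placeholder field names and sample rows, for illustration only.
rows = [
    {"field_a": "x", "field_b": "y", "notes": ""},
    {"field_a": "", "field_b": "", "notes": ""},
    {"field_a": "x", "field_b": " ", "notes": "needs review"},
]
print(rows_missing_key_fields(rows, ["field_a", "field_b"]))
```

Row numbers start at 1 for the first data row, so they need to be offset by the number of header rows when looking the row up in the spreadsheet.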
Data Formatting Reference¶
The PATTRN_Master spreadsheet contains a sheet named “Data
Formatting Reference” (third tab at the bottom of the spreadsheet).
This sheet features a row corresponding to a dummy event, every field of which holds data that is correctly formatted for use with Pattrn.
Always consult this sheet when entering data directly into the PATTRN_Master spreadsheet.
Indicative maximum number of events in a Dataset¶
The Pattrn app and Pattrn Editor have been tested to work with a dataset of 2,000 events, with every event containing data in the 15 different custom data fields as well as photos, videos and links.
Above this volume of data in the dataset, it is possible that the Pattrn app and/or the Pattrn Editor will be less responsive, or even stop working, especially when using older browsers on less powerful computers.
Nevertheless, the Pattrn app and the Pattrn Editor have also passed the performance test with datasets containing more events (15,000+), but with smaller amounts of data per event.
Note: In order to deliver a dynamic interactive experience, the Pattrn
app needs to load all the data contained in the PATTRN_Master
spreadsheet at once when the app is accessed online. For this
reason, if you’re using Pattrn with a large dataset, it may take up to
30 seconds for the Pattrn app to load all the data in the first
instance, especially when accessing the app on slower mobile connections.
Please be patient.