Managing data for Pattrn¶
Principles of Pattrn Data structure¶
Pattrn works with datasets of events.
An event is defined as a row of data that contains date/time information as well as a set of geographical coordinates. The details of each event are expressed as a series of attributes in the row of data.
The current version of Pattrn (2.0) can only visualise events with point coordinates; support for other kinds of GeoJSON geometry objects (e.g. linestrings or polygons) may be added in future versions.
GeoJSON datasets¶
The preferred data format for Pattrn datasets from Pattrn version 2.0 onwards is GeoJSON, as this provides much better flexibility than the Google Sheets data source type used in Pattrn version 1.0 (and still available as an option in Pattrn v2.0).
For example, GeoJSON files can easily be produced through scripts that gather data from a database, clean and filter it, and finally export it to a GeoJSON file. Doing so with Google Sheets is possible, but it requires greater effort and relies on brittle workflows. Additionally, some Pattrn users will likely want to avoid using Google’s proprietary and opaque cloud services when collecting sensitive data from vulnerable contributors.
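A minimal sketch of such an export script, in Python, might look like the following. The input rows and their keys (lat, lon, when, place, summary) are hypothetical stand-ins for the result of a database query; the property names written to the output are the core Pattrn property names described later on this page.

```python
import json


def rows_to_geojson(rows):
    """Build a GeoJSON FeatureCollection of Point features from dict rows."""
    features = []
    for row in rows:
        # Filter step: skip rows that lack coordinates or a date/time.
        if row.get("lat") is None or row.get("lon") is None or not row.get("when"):
            continue
        features.append({
            "type": "Feature",
            "geometry": {
                "type": "Point",
                # GeoJSON stores coordinates as [longitude, latitude].
                "coordinates": [float(row["lon"]), float(row["lat"])],
            },
            "properties": {
                "pattrn_date_time": row["when"],
                "location_name": row.get("place", ""),
                "event_summary": row.get("summary", ""),
            },
        })
    return {"type": "FeatureCollection", "features": features}


# Hypothetical rows, standing in for data gathered from a database.
rows = [
    {"lat": 51.5, "lon": -0.12, "when": "2016-03-01T12:00",
     "place": "London", "summary": "Example event"},
    {"lat": None, "lon": None, "when": "2016-03-02T12:00",
     "place": "Unknown", "summary": "No coordinates"},
]

collection = rows_to_geojson(rows)
geojson_text = json.dumps(collection, indent=2)
```

The resulting geojson_text can then be written to a file and used as a Pattrn dataset.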
Cleaning up data and preparing it for Pattrn¶
When using GeoJSON as the data format for Pattrn datasets, you will want to make sure that:
- each object’s geometry is of type Point (objects with any other type of geometry will be ignored by Pattrn)
Additionally, a few core properties must have exactly the names listed below whenever they are set (compulsory properties must always be set):
pattrn_date_time (compulsory field)
- A date and time must be provided for each object, using the ISO 8601 format. Year, month, day of the month, hour and minute must be provided for each event (although times are not used in charts, so if dealing with events with no time you may want to set an arbitrary time of the day such as 12:00). Timezone offsets are currently not supported. Events without a pattrn_date_time property defined, or whose pattrn_date_time property cannot be parsed as an ISO 8601 date/time, will automatically be discarded when the dataset is first loaded.
location_name
- A name of the location where the event described by the object happened (this is displayed in each event’s detail view).
event_summary
- A short description of the event (this is displayed in each event’s detail view).
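As a rough pre-flight check, the rules above can be approximated in Python before a dataset is handed to Pattrn. This is a sketch under stated assumptions, not Pattrn’s own parser: it assumes that only the layouts YYYY-MM-DDTHH:MM and YYYY-MM-DDTHH:MM:SS (with no timezone offset) are wanted, which may be stricter than what Pattrn actually accepts. The function names are illustrative.

```python
from datetime import datetime

# Assumed accepted layouts: date and time, without a timezone offset.
ACCEPTED_FORMATS = ("%Y-%m-%dT%H:%M", "%Y-%m-%dT%H:%M:%S")


def parse_event_time(value):
    """Return a datetime if value matches an accepted layout, else None."""
    if not isinstance(value, str):
        return None
    for fmt in ACCEPTED_FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    return None


def is_usable_event(feature):
    """True if the feature has Point geometry and a parseable pattrn_date_time."""
    geometry = feature.get("geometry") or {}
    properties = feature.get("properties") or {}
    if geometry.get("type") != "Point":
        return False
    return parse_event_time(properties.get("pattrn_date_time")) is not None


# Hypothetical features for illustration.
good = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [-0.12, 51.5]},
    "properties": {"pattrn_date_time": "2016-03-01T12:00"},
}
date_only = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [-0.12, 51.5]},
    "properties": {"pattrn_date_time": "2016-03-01"},
}
line = {
    "type": "Feature",
    "geometry": {"type": "LineString", "coordinates": [[0, 0], [1, 1]]},
    "properties": {"pattrn_date_time": "2016-03-01T12:00"},
}

usable = [is_usable_event(f) for f in (good, date_only, line)]
```

Running a check like this over every feature before publishing gives an early indication of which events would be ignored or discarded at load time.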
All the other properties/variables can have arbitrary names, but it is advisable to:
- only use alphanumeric characters and the underscore (_) character (avoiding any other characters, above all spaces)
- keep variable names short; this reduces the size of the GeoJSON file, and these short names will not be displayed to the app’s users if a descriptive label for each field is specified in the instance’s metadata file
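The naming advice above can be checked mechanically. The following Python sketch assumes that "alphanumeric" means ASCII letters and digits; both helper names are illustrative, and the suggested replacement is only one possible convention.

```python
import re


def is_safe_name(name):
    """True if name uses only ASCII letters, digits and underscores."""
    return re.fullmatch(r"[A-Za-z0-9_]+", name) is not None


def suggest_name(name):
    """Replace each run of other characters with a single underscore."""
    return re.sub(r"[^A-Za-z0-9_]+", "_", name.strip()).strip("_")


# Hypothetical variable names for illustration.
names = ["casualties", "Number of casualties", "geo-accuracy"]
report = {name: (is_safe_name(name), suggest_name(name)) for name in names}
```

Here report maps each original name to a pair: whether it already follows the advice, and a suggested replacement if it does not.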
Preparing metadata for a dataset¶
When loading a dataset, Pattrn will use the variable names listed above for its core variables and interpret them according to the meaning defined; all other (i.e. dataset-specific) variables need to be configured in a metadata file so that Pattrn can use them correctly, for example by plotting integer variables on line charts, tag and boolean variables on bar charts, and so on. Additionally, a metadata file can (and definitely should) be used to associate a brief, meaningful description with each variable: the Pattrn code will use the short variable name contained in the GeoJSON file, while showing users the brief description configured.
The format of the metadata file is documented in the Pattrn metadata: configuring variables section of the Getting started tutorial: please refer to it for help on how to prepare your metadata files.
Whereas the YAML file format is used in the tutorial for the sake of simplicity, Pattrn can also read metadata files in JSON format: if a metadata file uses the .json file extension instead of .yaml, Pattrn will process it accordingly.
Google Sheets datasets¶
The Google Sheets data source was the only available option in the first version of Pattrn. Using Google Sheets (whether by editing data directly in the sheet or through the Pattrn Editor) can be easier than handling a GeoJSON file for first-time users, although the way this kind of data source is used in Pattrn has several limitations:
- only variables of type integer, tag and boolean are supported (no variables of type tree, which were introduced with Pattrn 2.0)
- at most five variables of each variable type are supported (e.g. only up to five integer variables); this may be enough for simple datasets, but more advanced users will likely need more flexibility: Pattrn 2.0 supports an arbitrary number of variables of each variable type
- it is not possible to visually differentiate between events that originate from distinct source datasets: Pattrn 2.0 using GeoJSON datasets can be used to aggregate events reported by different organizations or by different sources and bundle them into a single GeoJSON dataset, while still allowing the different sources to be told apart visually (by using map markers of different colours)
- additionally, when using distinct source datasets as described in the previous point, Pattrn 2.0 makes it possible to expose variables that only apply to individual source datasets, whereas the Google Sheets source for Pattrn 1.0 requires all variables to be shared across all events of the dataset
For users who are analysing and visualising basic datasets and who find the spreadsheet interface convenient to edit and manage data, the following documentation applies.
Where data is stored¶
Once a Google Sheets data source has been set up (see the relevant section of this manual), data visualised through Pattrn is stored in the PATTRN_Master spreadsheet. This is pre-formatted with the data template in use with Pattrn.
Two ways of editing data¶
It is recommended to use the Pattrn Editor to enter and edit data on the PATTRN_Master spreadsheet, as it is designed to output data in the correct format and thereby to avoid display issues or glitches in the Pattrn app.
However, you can also enter and edit data directly on the PATTRN_Master spreadsheet, using Google Sheets. For example, if you want to visualise an existing dataset of events, you can copy and paste the available data into the PATTRN_Master spreadsheet. Make sure to format it correctly, by always referring to the “Data Formatting Reference” sheet in the PATTRN_Master spreadsheet.
Customising your data structure¶
For any event in the dataset in use with Pattrn, the following fields are fixed; their headers must not be modified in the PATTRN_Master spreadsheet:
unique_event_ID
location_name
latitude
longitude
geo-accuracy
date_time
event_summary
source_name
In addition, the data structure of the dataset to be used with Pattrn can be customised by adding up to 5 fields of numeric data, up to 5 fields of data tags, and up to 5 fields of boolean data (Yes/No).
You can create these custom columns of data either inside the Pattrn Editor, or by manually renaming the header of one of the columns between “I” and “W” in the PATTRN_Master spreadsheet.
Photos, Videos, Web Links¶
Using the Pattrn Editor, you can link Photos, Videos, and Web Links to each event.
All the Photos uploaded through the Pattrn Editor will be stored in the PATTRN_Photos folder you created in your Google Drive when setting up your Google Sheets data source and the associated Pattrn Editor.
Videos related to an event can be embedded in the Pattrn app. The Pattrn Editor integrates an interface to embed videos from YouTube. Note that Pattrn does not support the upload of actual video files: videos must first be uploaded to YouTube in order to be embedded in a Pattrn app.
The Pattrn Editor also integrates an interface to attach Web Links to an event. Those links can point to web pages, or to PDF files online (such as a full version of a report by an NGO, for example).
When an Editor uses the Pattrn Editor to attach a Photo, Video, or Link to an event, the Pattrn Editor populates the corresponding fields in the PATTRN_Master spreadsheet with a JSON object (contained within the { and } characters), which contains all the information needed for the Pattrn app to display the content correctly, together with its related information.
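If such a cell has been edited by hand, a quick way to confirm that it still holds a JSON object is to try parsing it. The Python sketch below checks only that the text parses and that the result is an object; it makes no assumption about which keys the Pattrn Editor writes, and the sample values are purely hypothetical.

```python
import json


def is_json_object(cell):
    """True if the cell text parses as JSON and the result is an object."""
    try:
        return isinstance(json.loads(cell), dict)
    except (TypeError, ValueError):
        # TypeError: cell is not text; ValueError: cell is not valid JSON.
        return False


# Hypothetical cell contents, not the actual structure used by Pattrn.
samples = ['{"example_key": "example value"}', '{"example_key": ', "[1, 2]", ""]
results = [is_json_object(sample) for sample in samples]
```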
Automatically populated fields¶
When using the Pattrn Editor, certain fields in the PATTRN_Master spreadsheet will be automatically populated. Those fields are:
- unique_event_ID: assigns a unique Event ID to all new Events.
- geo-accuracy: provides an indication of the accuracy of the geolocation result, on the basis of the textual locational information entered in the Location field.
- media_available: populated with a series of tags corresponding to the types of media that have been attached to each event.
Required fields¶
In order for the Pattrn app to successfully load and display data from the PATTRN_Master spreadsheet, all rows that are not completely empty need to have correctly formatted data in at least three key fields. Those fields are:
latitude
longitude
date_time
Warning: if a single row in a large dataset hosted on Google Sheets lacks data, or contains wrongly formatted data, in one of these three fields, the Pattrn app won’t load any data. For this reason, it is recommended to use the Pattrn Editor. When it is necessary to edit the PATTRN_Master spreadsheet directly, Editors need to be very careful with the formatting of data in these three key fields in particular. Custom data validation formulas have been integrated in the PATTRN_Master spreadsheet to help identify any wrongly formatted row (see Troubleshooting).
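Outside the spreadsheet, a similar check can be sketched in Python. The example below assumes that the sheet has been exported as CSV text with its header row intact. It verifies only that latitude and longitude are numbers within the usual ranges and that date_time is not empty; it does not validate the date/time format itself, which is defined by the “Data Formatting Reference” sheet.

```python
import csv
import io


def _in_range(text, low, high):
    """True if text is a number between low and high (inclusive)."""
    try:
        return low <= float(text) <= high
    except (TypeError, ValueError):
        return False


def find_problem_rows(csv_text):
    """Return (row_number, field) pairs for non-empty rows with bad key fields."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    # The header is row 1, so data rows are numbered from 2.
    for number, row in enumerate(reader, start=2):
        if not any(isinstance(v, str) and v.strip() for v in row.values()):
            continue  # completely empty rows are allowed
        if not _in_range(row.get("latitude"), -90, 90):
            problems.append((number, "latitude"))
        if not _in_range(row.get("longitude"), -180, 180):
            problems.append((number, "longitude"))
        if not (row.get("date_time") or "").strip():
            problems.append((number, "date_time"))
    return problems


# Hypothetical export with one valid row, one faulty row and one empty row.
sample = (
    "location_name,latitude,longitude,date_time\n"
    "London,51.5,-0.12,2016-03-01 12:00\n"
    "Nowhere,,200,\n"
    ",,,\n"
)
problems = find_problem_rows(sample)
```

An empty result means that no row was flagged by these basic checks; it does not guarantee that every value is formatted as Pattrn expects.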
Data Formatting Reference¶
The PATTRN_Master spreadsheet contains a sheet named “Data Formatting Reference” (third tab at the bottom of the spreadsheet).
This sheet features a row corresponding to a dummy event, every field of which holds data that is correctly formatted for use with Pattrn.
Always consult this sheet when entering data directly into the PATTRN_Master spreadsheet.
Data limits¶
Indicative maximum number of events in a Dataset¶
The Pattrn app and Pattrn Editor have been tested to work with a dataset of 2,000 events, with every event containing data in the 15 different custom data fields as well as photos, videos and links.
Above this volume of data in the dataset, it is possible that the Pattrn app and/or the Pattrn Editor will be less responsive, or even stop working, especially when using older browsers on less powerful computers.
Nevertheless, the Pattrn app and the Pattrn Editor have also passed performance tests with datasets containing more events (15,000+) but less data per event.
Note: In order to deliver a dynamic interactive experience, the Pattrn app needs to load all the data contained in the PATTRN_Master spreadsheet at once when the app is accessed online. For this reason, if you’re using Pattrn with a large dataset, it may take up to 30 seconds for the Pattrn app to load all the data in the first instance, especially when accessing the app on slower mobile connections. Please be patient.
Concerning tags¶
For each column of tag data entered in the dataset, a Bar Chart will be automatically generated by the Pattrn app.
For these Bar Charts to display correctly, it is recommended to use:
- no more than 12 different tags per column of tag data
- tags no longer than 24 characters each
Note: this applies to the source_name column as well.
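These two recommendations can be checked with a short Python sketch. It assumes that the values of one tag column are available as a list of cell strings and that multiple tags within a cell are separated by commas; the separator is an assumption and may need adjusting to match the format shown in the “Data Formatting Reference” sheet.

```python
MAX_TAGS_PER_COLUMN = 12
MAX_TAG_LENGTH = 24


def check_tag_column(cells, separator=","):
    """Summarise how a column of tag cells compares with the recommended limits."""
    distinct = set()
    for cell in cells:
        for tag in cell.split(separator):
            tag = tag.strip()
            if tag:
                distinct.add(tag)
    return {
        "distinct_count": len(distinct),
        "too_many": len(distinct) > MAX_TAGS_PER_COLUMN,
        "too_long": sorted(t for t in distinct if len(t) > MAX_TAG_LENGTH),
    }


# Hypothetical tag cells for illustration.
cells = ["protest, strike", "protest", "a very long tag name exceeding the limit"]
summary = check_tag_column(cells)
```

The same function can be applied to the source_name column.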