.. _managing-data:

========================
Managing data for Pattrn
========================

Principles of Pattrn Data structure
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Pattrn works with datasets of **events**. An event is defined as a row of data that contains **date/time** information, as well as a set of **geographical coordinates**. The details about each event are expressed as a series of **attributes** in the row of data.

The current version of Pattrn (2.0) can only visualise events with **point** coordinates; support for other kinds of GeoJSON geometry objects (e.g. linestrings or polygons) may be added in future versions.

.. _managing-data-geojson:

GeoJSON datasets
----------------

The preferred data format for Pattrn datasets from Pattrn version 2.0 onwards is GeoJSON, as this provides much greater flexibility than the Google Sheets data source type used in Pattrn version 1.0 (and still available as an option in Pattrn v2.0).

For example, GeoJSON files can easily be produced through scripts that gather data from a database, clean and filter it and finally export it to a GeoJSON file. Doing so with Google Sheets is possible, but it requires more effort and relies on brittle workflows.

Additionally, some Pattrn users will likely want to avoid using Google's proprietary and opaque cloud services when collecting sensitive data from vulnerable contributors.

Cleaning up data and preparing it for Pattrn
............................................

When using GeoJSON as the data format for Pattrn datasets, you will want to make sure that:

* each object's geometry is of type ``Point`` (objects with any other type of geometry will be ignored by Pattrn)

Additionally, a few core properties, when set (not all of them are compulsory), **must** have exactly the names listed below:

``pattrn_date_time`` (compulsory field)
  A date and time **must** be provided for each object, using the ISO 8601 format.
  Year, month, day of the month, hour and minute **must** be provided for each event (although times are not used in charts, so if dealing with events with no time you may want to set an arbitrary time of the day such as 12:00). Timezone offsets are currently not supported.

  Events without a ``pattrn_date_time`` property defined, or whose ``pattrn_date_time`` property cannot be parsed as an ISO 8601 date/time, will automatically be discarded when the dataset is first loaded.

``location_name``
  A name of the location where the event described by the object happened (this is displayed in each event's detail view).

``event_summary``
  A short description of the event (this is displayed in each event's detail view).

All the other properties/variables can have arbitrary names, but it is advisable to:

* only use alphanumeric characters and the underscore (``_``) character (avoiding any other characters, especially spaces)
* keep variable names short; this reduces the size of the GeoJSON file, and these short names will not be displayed to the app's users if a descriptive label for each field is specified in the instance's metadata file

Preparing metadata for a dataset
................................

When loading a dataset, Pattrn will use the variable names listed above for its core variables and interpret them according to the meaning defined; all other (i.e. dataset-specific) variables need to be configured in a metadata file so that Pattrn can use them correctly, for example by plotting *integer* variables on line charts, *tag* and *boolean* variables on bar charts, and so on.

Additionally, a metadata file can (and definitely *should*) be used to associate a brief, meaningful description with each variable: the Pattrn code will use the short variable name contained in the GeoJSON file, while showing users the brief description configured.
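The GeoJSON requirements described in the previous section (``Point`` geometry, a parseable ``pattrn_date_time``, conservative property names) can be checked before a dataset is handed to Pattrn. The following is only a minimal sketch, not Pattrn's own loader: the function names are hypothetical, and the two accepted date/time layouts are an assumption chosen to match "year, month, day, hour and minute, no timezone offset"; Pattrn itself may accept other ISO 8601 variants.

```python
import re
from datetime import datetime

# Assumption: property names limited to letters, digits and underscore.
NAME_RE = re.compile(r"[A-Za-z0-9_]+")

# Assumption: these two layouts stand in for "ISO 8601 with hour and
# minute and no timezone offset"; they are not taken from Pattrn's code.
DATE_FORMATS = ("%Y-%m-%dT%H:%M", "%Y-%m-%dT%H:%M:%S")


def parse_pattrn_date_time(value):
    """Return a datetime for an accepted date/time string, else None."""
    if not isinstance(value, str):
        return None
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    return None


def check_feature(feature):
    """Return a list of problems found in one GeoJSON feature."""
    problems = []
    geometry = feature.get("geometry") or {}
    if geometry.get("type") != "Point":
        problems.append("geometry is not a Point")
    properties = feature.get("properties") or {}
    if parse_pattrn_date_time(properties.get("pattrn_date_time")) is None:
        problems.append("pattrn_date_time is missing or cannot be parsed")
    for name in properties:
        if not NAME_RE.fullmatch(name):
            problems.append(f"property name {name!r} is not alphanumeric/underscore")
    return problems


# Hypothetical example event; "fatalities" is an illustrative
# dataset-specific variable, not a Pattrn core property.
example = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [13.4, 52.5]},
    "properties": {
        "pattrn_date_time": "2016-03-01T12:00",
        "location_name": "Example town",
        "event_summary": "Example event",
        "fatalities": 0,
    },
}
print(check_feature(example))
```

Running such a check in the export script, before the GeoJSON file is written, makes it easier to spot events that would otherwise be silently discarded at load time.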
The format of the metadata file is documented in the :ref:`getting-started-metadata` section of the :ref:`getting-started` tutorial: please refer to it for help on how to prepare your metadata files.

Although the tutorial uses the YAML file format for the sake of simplicity, Pattrn can also read metadata files in JSON format: if the file uses the ``.json`` extension instead of ``.yaml``, Pattrn will process the metadata file accordingly.

.. _managing-data-gapps:

Google Sheets datasets
----------------------

The Google Sheets data source was the only available option in the first version of Pattrn.

Using Google Sheets (whether by editing data directly in the sheet or through the Pattrn Editor) can be easier than handling a GeoJSON file for first-time users, although the way this kind of data source is used in Pattrn has several limitations:

* only variables of type *integer*, *tag* and *boolean* are supported (no variables of type *tree*, which were introduced with Pattrn 2.0)
* at most *five* variables of each variable type are supported (e.g. only up to five *integer* variables); this may be enough for simple datasets, but more advanced users will likely need more flexibility: Pattrn 2.0 supports an arbitrary number of variables of each variable type.
* it is not possible to visually differentiate between events that originate from distinct source datasets: Pattrn 2.0 using GeoJSON datasets can be used to aggregate events reported by different organizations or by different sources and bundle them into a single GeoJSON dataset *while still allowing users to visually differentiate* between the different sources (by using map markers of different colours)
* additionally, when using distinct source datasets as described in the previous point, Pattrn 2.0 allows exposing variables that only apply to individual source datasets, whereas the Google Sheets source for Pattrn 1.0 requires all variables to be shared across all events of the dataset

For users who are analysing and visualising basic datasets and who find the spreadsheet interface convenient to edit and manage data, the following documentation applies.

Where data is stored
....................

.. link to correct section

Once a Google Sheets data source has been set up (see relevant section of this manual), data visualised through Pattrn is stored in the ``PATTRN_Master`` spreadsheet. This is pre-formatted with the data template in use with Pattrn.

Two ways of editing data
........................

It is recommended to use the **Pattrn Editor** to enter and edit data on the ``PATTRN_Master`` spreadsheet, as it is designed to output data in the correct format and thereby to avoid display issues or glitches in the Pattrn app.

However, you can also **enter and edit data directly on the Master spreadsheet**, using Google Sheets. For example, if you want to visualise an existing dataset of events, you can copy and paste the available data into the ``PATTRN_Master`` spreadsheet. Make sure to format it correctly, by always referring to the "Data Formatting Reference" sheet in the ``PATTRN_Master`` spreadsheet.

Customising your data structure
...............................
For any event in the dataset in use with Pattrn, the following are the **fixed fields**, whose headers must not be modified in the ``PATTRN_Master`` spreadsheet:

* ``unique_event_ID``
* ``location_name``
* ``latitude``
* ``longitude``
* ``geo-accuracy``
* ``date_time``
* ``event_summary``
* ``source_name``

In addition, the data structure of the dataset to be used with Pattrn can be customised by adding up to **5 fields of numeric data**, up to **5 fields of data tags**, and up to **5 fields of boolean data (Yes/No)**.

You can create these custom columns of data either inside the Pattrn Editor, or by manually renaming the header of one of the columns between "I" and "W" in the ``PATTRN_Master`` spreadsheet.

Photos, Videos, Web Links
.........................

Using the Pattrn Editor, you can link **Photos**, **Videos**, and **Web Links** to each event.

All the **Photos** uploaded through the Pattrn Editor will be stored in the ``PATTRN_Photos`` folder you created in your Google Drive when setting up your Google Sheets data source and the associated Pattrn Editor.

**Videos** related to an event can be embedded in the Pattrn app. The Pattrn Editor integrates an interface to embed videos from YouTube. Note that Pattrn does not support the upload of actual video files: videos must first be uploaded to YouTube in order to be embedded in a Pattrn app.

The Pattrn Editor also integrates an interface to attach **Web Links** to an event. Those links can point to web pages, or to PDF files online (such as a full version of a report by an NGO, for example).

When an Editor uses the Pattrn Editor to attach a Photo, Video, or Link to an event, the Pattrn Editor populates the corresponding fields in the Master spreadsheet with a JSON object (contained within the ``{`` and ``}`` characters), which contains all the information needed for the Pattrn app to display the content correctly, together with its related information.
Automatically populated fields
..............................

When using the Pattrn Editor, certain fields in the ``PATTRN_Master`` spreadsheet will be automatically populated. Those fields are:

* ``unique_event_ID``: assigns a unique Event ID to all new Events.
* ``geo-accuracy``: provides an indication of the accuracy of the geolocation result, on the basis of the textual locational information entered in the Location field.
* ``media_available``: populated with a series of tags corresponding to the types of media that have been attached to each event.

Required fields
...............

In order for the Pattrn app to successfully load and display data from the ``PATTRN_Master`` spreadsheet, all rows that are not completely empty need to have correctly formatted data in at least three key fields. Those fields are:

* ``latitude``
* ``longitude``
* ``date_time``

**Warning**: if a **single row** in a large dataset hosted on Google Sheets lacks data, or contains data that is wrongly formatted, in one of these three fields, **the Pattrn app won't load any data**.

For this reason, it is recommended to use the Pattrn Editor. When it is necessary to edit the ``PATTRN_Master`` spreadsheet directly, Editors need to be very careful with the formatting of data in these three key fields in particular. Custom data validation formulas have been integrated in the ``PATTRN_Master`` spreadsheet to help identify any wrongly formatted row (see :ref:`troubleshooting`).

Data Formatting Reference
.........................

The ``PATTRN_Master`` spreadsheet contains a sheet named "**Data Formatting Reference**" (third tab at the bottom of the spreadsheet). This sheet features a row corresponding to a dummy event, every field of which holds data that is correctly formatted for use with Pattrn. Always consult this sheet when entering data directly into the ``PATTRN_Master`` spreadsheet.

Data limits
...........
Indicative maximum number of events in a Dataset
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The Pattrn app and Pattrn Editor have been tested to work with a dataset of 2,000 events, with every event containing data in the 15 different custom data fields as well as photos, videos and links. Above this volume of data in the dataset, it is possible that the Pattrn app and/or the Pattrn Editor will be less responsive, or even stop working, especially when using older browsers on less powerful computers.

Nevertheless, the Pattrn app and the Pattrn Editor have also passed performance tests with datasets that hold more events (15,000+) but less data per event.

*Note: In order to deliver a dynamic interactive experience, the Pattrn app needs to load all the data contained in the* ``PATTRN_Master`` *spreadsheet, at once, when the app is accessed online. For this reason, if you're using Pattrn with a large dataset,* **it may take up to 30 seconds for the Pattrn app to load all the data** *in the first instance, especially when accessing the app on slower mobile connections. Please be patient.*

Concerning tags
~~~~~~~~~~~~~~~

For each column of tag data entered in the dataset, a Bar Chart will be automatically generated by the Pattrn app. For these Bar Charts to display correctly, it is recommended to use:

* **no more than 12 different tags** per column of tag data.
* **tags no longer than 24 characters each**

*Note: this applies to the* ``source_name`` *column as well.*
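
The two tag recommendations above can be checked with a short script before data is published. This is only an illustrative sketch: the function name ``check_tag_limits`` and the column name ``weapon_type`` are hypothetical, and it assumes each row has already been read into a dictionary in which every tag column holds a list of tag strings (how tags are separated inside a spreadsheet cell is not covered here).

```python
MAX_TAGS_PER_COLUMN = 12  # recommended maximum of distinct tags per column
MAX_TAG_LENGTH = 24       # recommended maximum length of a single tag


def check_tag_limits(rows, tag_columns):
    """Return warnings for tag columns that exceed the recommended limits.

    Assumption: each row is a dict mapping a tag column name to a list
    of tag strings; this layout is illustrative, not Pattrn's own.
    """
    warnings = []
    for column in tag_columns:
        # Collect every distinct tag used in this column.
        distinct = set()
        for row in rows:
            distinct.update(row.get(column, []))
        if len(distinct) > MAX_TAGS_PER_COLUMN:
            warnings.append(
                f"{column}: {len(distinct)} distinct tags "
                f"(recommended maximum is {MAX_TAGS_PER_COLUMN})"
            )
        for tag in sorted(distinct):
            if len(tag) > MAX_TAG_LENGTH:
                warnings.append(
                    f"{column}: tag {tag!r} is {len(tag)} characters long "
                    f"(recommended maximum is {MAX_TAG_LENGTH})"
                )
    return warnings


# Hypothetical rows; source_name is wrapped in a list so that it can be
# checked in the same way as the tag columns.
rows = [
    {"weapon_type": ["small arms"], "source_name": ["Example NGO"]},
    {"weapon_type": ["artillery", "small arms"], "source_name": ["Example NGO"]},
]
for warning in check_tag_limits(rows, ["weapon_type", "source_name"]):
    print(warning)
```

Because these limits are recommendations for chart legibility rather than hard constraints, the sketch reports warnings instead of rejecting the data.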