High-Level Project Plan
Please note that this is a living document that will be updated regularly over the course of this project.
Last Updated: 28th October, 2017
The following list outlines the broad range of activities that will be carried out by KLL during the first phase:
1. Data Sense-making
As outlined in our previous notes (Housing Reconstruction Data: Notes on overall data structure/format), the data that KLL will be using comprises 8 structured tables containing a wealth of information, from buildings to individuals, spread across close to 250 variables. The CBS provided this data after performing some internal processing and cleaning of the raw data that was collected directly through the app. The purpose of the sense-making exercise is for KLL to thoroughly examine this data in order to:
Understand the meaning of the data contained within each variable,
Look for inconsistencies, such as, but not limited to:
- Missing entries.
- Outliers and unexpected values.
- Variables that hold information other than what they are supposed to hold.
Furthermore, this exercise is also intended to help us identify how the data is going to be restructured for use in the open data portal. More specifically, the data sense-making exercise will help us:
- Identify naming conventions for variables across all 8 tables,
- Identify new variables that are not currently included, but can be extracted from the database.
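The inconsistency checks listed above (missing entries, outliers, and unexpected values) can be illustrated with a minimal sketch. It is written in Python purely for illustration and uses only the standard library. The rows, column names, allowed code list, and outlier threshold are all hypothetical and do not reflect the actual CBS tables.

```python
from statistics import median

# Hypothetical rows standing in for one of the structured tables.
# Column names and values are illustrative, not the real variables.
rows = [
    {"building_id": 1, "age_building": 12, "condition": "damaged"},
    {"building_id": 2, "age_building": None, "condition": "damaged"},
    {"building_id": 3, "age_building": 15, "condition": "not damaged"},
    {"building_id": 4, "age_building": 999, "condition": "12"},
    {"building_id": 5, "age_building": 20, "condition": "damaged"},
]

ALLOWED_CONDITIONS = {"damaged", "not damaged"}  # assumed code list


def missing_entries(rows, column):
    """Return the ids of rows where `column` is empty."""
    return [r["building_id"] for r in rows if r[column] in (None, "")]


def outliers(rows, column, threshold=5.0):
    """Return the ids of rows whose value lies far from the median.

    Distance is measured in units of the median absolute deviation (MAD);
    the threshold of 5 is an arbitrary choice for this sketch.
    """
    values = [r[column] for r in rows if r[column] is not None]
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []
    return [
        r["building_id"]
        for r in rows
        if r[column] is not None and abs(r[column] - med) / mad > threshold
    ]


def unexpected_values(rows, column, allowed):
    """Return the ids of rows whose value is outside the allowed set."""
    return [r["building_id"] for r in rows if r[column] not in allowed]


print(missing_entries(rows, "age_building"))
print(outliers(rows, "age_building"))
print(unexpected_values(rows, "condition", ALLOWED_CONDITIONS))
```

In practice, each check would be run per variable and the flagged identifiers reviewed by hand, since a flagged value is only a candidate inconsistency and not necessarily an error.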
2. Data Restructuring
Once the sense-making exercise is over, we will write data transformation and manipulation scripts (in R and PostgreSQL) to implement the changes identified in the previous stage.
3. Preparation of Analytical Data Set (ADS)
This stage involves the creation of a single large table at an appropriate level (building, household, or individual), which will also contain household- and building-level attributes for each individual. This table, known as the analytical data set, will serve as the base table for all of the statistics and future tables that will be generated for the Open Data portal.
Having an ADS improves efficiency and accuracy because it reduces the errors that can arise from table joins. Furthermore, once the table has been checked for accuracy and reliability, it can serve as the single source of truth for all statistics generated by multiple users, which greatly aids quality control.
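As a rough sketch of the idea, the example below builds an individual-level ADS by joining three toy tables once, then derives a statistic from the joined table alone. It uses Python's built-in `sqlite3` module only so that the example is self-contained; the project's own scripts are planned in R and PostgreSQL. All table names, column names, and values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Toy source tables; names and columns are illustrative assumptions.
CREATE TABLE building   (building_id INTEGER PRIMARY KEY, district TEXT);
CREATE TABLE household  (household_id INTEGER PRIMARY KEY,
                         building_id INTEGER, size INTEGER);
CREATE TABLE individual (individual_id INTEGER PRIMARY KEY,
                         household_id INTEGER, age INTEGER);

INSERT INTO building   VALUES (1, 'A'), (2, 'B');
INSERT INTO household  VALUES (10, 1, 2), (11, 2, 1);
INSERT INTO individual VALUES (100, 10, 34), (101, 10, 8), (102, 11, 61);

-- One row per individual, carrying household- and building-level attributes.
CREATE TABLE ads AS
SELECT i.individual_id, i.age,
       h.household_id, h.size AS household_size,
       b.building_id, b.district
FROM individual AS i
LEFT JOIN household AS h ON h.household_id = i.household_id
LEFT JOIN building  AS b ON b.building_id  = h.building_id;
""")

# Sanity checks: the join must neither drop nor duplicate individuals,
# and every individual should resolve to a building.
n_individuals = conn.execute("SELECT COUNT(*) FROM individual").fetchone()[0]
n_ads = conn.execute("SELECT COUNT(*) FROM ads").fetchone()[0]
orphans = conn.execute(
    "SELECT COUNT(*) FROM ads WHERE building_id IS NULL"
).fetchone()[0]

# Statistics are then computed from the ADS alone, with no further joins.
per_district = conn.execute(
    "SELECT district, COUNT(*) FROM ads GROUP BY district ORDER BY district"
).fetchall()

print(n_individuals, n_ads, orphans, per_district)
```

Because the joins happen once and the row counts are verified at that point, later queries against `ads` cannot reintroduce join errors such as dropped or duplicated rows.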
4. Checkpoint: Regenerating statistics published by the CBS
This is an internal quality control stage where the ADS will be used to regenerate the statistics that were published in the reports generated by the CBS. Please refer to the links below:
5. Portal conceptualization and design
The portal conceptualization stage, which will run in parallel with the previous stage, will involve team members brainstorming and designing the features and specifications of the actual open data portal, and identifying its requirements. This phase may be accompanied by the development of wireframes, the identification of use cases, and system architecture design.
6. Portal development
Based on specifications identified in stage 5, this stage involves the actual development of the portal application for viewing on the web. The end of this stage is marked by making available a web address where the portal will be located.