Building table¶
Note: This document captures information that may have been updated. Please refer to the Summary and findings section instead.
Summary¶
Codes vs. Labels: The current dataset has codes assigned for each value of a categorical variable. For instance, the variable gd_floor which indicates the type of ground floor has the following codes associated with it:
- 1, which denotes “Mud”
- 2, which denotes “Brick/Stone”
- 3, which denotes “Timber”
- 4, which denotes “RC”
- 5, which denotes “Other”
An alternate way of storing this information would be in the form of labels, as opposed to numbers. That is,
- mud, which denotes “Mud”
- brick_stone, which denotes “Brick/Stone”
- timber, which denotes “Timber”
- rc, which denotes “RC”
- other, which denotes “Other”
Storing them as numbers require less space, but it comes at the cost of the user needing a reference for understanding what the number means.
What do we want to do?
Variable Types: A brief examination of the survey questionnaire suggests that questions that were asked fall in either one of the following two categories:
- Single select questions: Where the surveyor selects only one out of many options provided.
- Multi select questions: Where the surveyor can select one or more answers out of the many options provided.
For every single select question in the survey, there is only one corresponding column in the data that capture the answer for a particular building. Data cleaning effort for these variables is minimal, depending on the route we take as per point (1)
Multi select questions: Some of the questions, because of their multiselect nature has more than one column associated with them):
- Superstructure type has 11 columns ranging from sup_str1 to sup_str11
- Type of geotechnical risk has 7 columns ranging from gersk_ls1 to gersk_ls3
- Type of secondary use has 10 columns ranging from secuse_ls1 to secuse_ls10
For these questions, additional data cleaning work would be required to make information more usable.
Missing Definitions: Not all of the columns had their corresponding definitions in the data dictionary provided by CBS. Whilst is was clear what information the column contained for a few of the columns (dist, vcode, vdcmun, ward, EA, howner_sn, house_sn, legl_own) based on their names, further information is needed for the following columns:
- delam1,
- delam2,
- delam3,
- fam_cn,
- hgt_pre,
- hgt_pos,
- pl_area (I assume this refers to the plinth area of the building, but how is it captured (numeric vs. categorical)? )
- age (I assume that this variable refers to the age of the building, but how is it captured (numeric vs. categorical))
- floor_pos,
- floor_pre,
After examination of the survey qyuestionnaire, it appears most of the above information comes from a single table in the questionnaire:
![]()
Damage Assessment Variables: Information for damage assessment is spread across groups of variables. For example, for users to get complete information on building foundation damage, they will have to go through three variables viz. dm_fndtn1, dm_fndtn2, dm_fndtn3. Other variables that have a similar nature include:
- dm_roof1, dm_roof2, dm_roof3
- corn_sep1, corn_sep2, corn_sep3
- diag_cr1, diag_cr2, diag_cr3
- pl_fail1, pl_fail2, pl_fail3
- op_fail1, op_fail2, op_fail3
- op_fl_nl1, op_fl_nl2, op_fl_nl3
- dm_gabl1, dm_gabl2, dm_gabl3
- delam1, delam2, delam3
- col_fail1, col_fail2, col_fail3
- beam_fl1, beam_fl2, beam_fl3
- str_case1, str_case2, str_case3
- parapet1, parapet2, parapet3
- clad_glz1, clad_glz2, clad_glz3
- clad_glz1, clad_glz2, clad_glz3
Furthermore, information for “No damage” is contained as a categorical value within the first out of three variable, as illustrated by the picure below.
![]()
Suggestions for improvement:
- Rename column headers to include severity ie. dm_fndtn_severe, dm_fndtn_moderate, dm_fndtn_insignfcant
- Seperate information about no damage to a separate flag variable, dm_fndtn_flag
Methodology¶
This section contains code used for the analysis, please jump to the next section for results
A small subset of 5000 rows were taken as an input for analysis. All variables in the table were converted to factors, and a summary was yielded from each table.
# Exploration of building table from pgSql database
library(RPostgreSQL)
library(dplyr)
root.dir <- "~/hrrp/openhrrp-src"
pg <- dbDriver("PostgreSQL")
con <- dbConnect(pg, user="postgres", password="postgres",
host="localhost", dbname="openhrrp")
# dbExistsTable(con, "building")
df_building <- dbGetQuery(con, "SELECT * from building limit 5000; ")
df_building <- as.data.frame(sapply(df_building, as.numeric))
write.csv(as.data.frame(colnames(df_building)), file=paste0(root.dir, "/outputs/buildingColNames.csv"))
df_building_af <- as.data.frame(sapply(df_building, as.factor))
summarizeColumn <- function(column) {
return (summary(column))
}
summarizeColumn(df_building_af$fam_cn)
Results¶
The table below outlines current status of the same:
| Column Name | Present in building data dictionary | Name | Type | Range | Is multiselect? | Comments |
| house_cd | Yes | House Condition | Categorical | 1-8 | ||
| floor_pre | No | ? (Not provided) | ? (Not provided) | ? (Not provided) | ||
| floor_pos | No | ? (Not provided) | ? (Not provided) | ? (Not provided) | ||
| age | No | ? (Not provided) | ? (Not provided) | ? (Not provided) | ||
| pl_area | No | ? (Not provided) | ? (Not provided) | ? (Not provided) | ||
| hgt_pre | No | ? (Not provided) | ? (Not provided) | ? (Not provided) | ||
| hgt_pos | No | ? (Not provided) | ? (Not provided) | ? (Not provided) | ||
| surf_con | Yes | Surface condition of house/land | Categorical | 1-3 | ||
| foundatn | Yes | Foundation Type of house | Categorical | 1-5 | ||
| roof | Yes | Roof Type | Categorical | 1-3 | ||
| gd_floor | Yes | Ground Floor Type | Categorical | 1-5 | ||
| up_floor | Yes | Floor Type(except ground floor and roof) | 1-4 | |||
| sup_str1 | Yes | Super Structure of the house | Categorical | 1-11 | Yes | |
| sup_str2 | Yes | Super Structure of the house | Categorical | 1-11 | Yes | |
| sup_str3 | Yes | Super Structure of the house | Categorical | 1-11 | Yes | |
| sup_str4 | Yes | Super Structure of the house | Categorical | 1-11 | Yes | |
| sup_str5 | Yes | Super Structure of the house | Categorical | 1-11 | Yes | |
| sup_str6 | Yes | Super Structure of the house | Categorical | 1-11 | Yes | |
| sup_str7 | Yes | Super Structure of the house | Categorical | 1-11 | Yes | |
| sup_str8 | Yes | Super Structure of the house | Categorical | 1-11 | Yes | |
| sup_str9 | Yes | Super Structure of the house | Categorical | 1-11 | Yes | |
| sup_str10 | Yes | Super Structure of the house | Categorical | 1-11 | Yes | |
| sup_str11 | Yes | Super Structure of the house | Categorical | 1-11 | Yes | |
| positn | Yes | Position of the House | Categorical | 1-4 | ||
| pln_conf | Yes | Plan Configuration of the house | 1-10 | |||
| collapse | Yes | Collapse of Partial Collapse | Categorical | 1-4 | Data has NA’s need to explore why that is the case | |
| leaning | Yes` | Buliding or storey Leaning | Categorical | 1-4 | Data has NA’s need to explore why that is the case | |
| adjbld_rsk | Yes | Adjacent Building Hazard | Categorical | 1-4 | Data has NA’s need to explore why that is the case | |
| dm_fndtn1 | Yes | Foundation | Categorical | 1-3, 8, 9 | Data has NA’s, even though 9=NA, why is that? | |
| dm_fndtn2 | Yes | Foundation | Categorical | 1-3 | Data has NA(s) | |
| dm_fndtn3 | Yes | Foundation | Categorical | 1-3 | Data has NA(s) | |
| dm_roof1 | Yes | Roof/Floor Collapse/PartialCollapse | Categorical | 1-3, 8, 9 | Data has NA’s, even though 9=NA, why is that? | |
| dm_roof2 | Yes | Roof/Floor Collapse/PartialCollapse | Categorical | 1-3 | Data has NA(s) | |
| dm_roof3 | Yes | Roof/Floor Collapse/PartialCollapse | Categorical | 1-3 | Data has NA(s) | |
| corn_sep1 | Yes | Corner Separation | Categorical | 1-3, 8, 9 | Data has NA’s, even though 9=NA, why is that? | |
| corn_sep2 | Yes | Corner Separation | Categorical | 1-3 | Data has NA(s) | |
| corn_sep3 | Yes | Corner Separation | Categorical | 1-3 | Data has NA(s) | |
| diag_cr1 | Yes | Diagonal Cracking | Categorical | 1-3, 8, 9 | Data has NA’s, even though 9=NA, why is that? | |
| diag_cr2 | Yes | Diagonal Cracking | Categorical | 1-3 | Data has NA(s) | |
| diag_cr3 | Yes | Diagonal Cracking | Categorical | 1-3 | Data has NA(s) | |
| pl_fail1 | Yes | In Plane failure of walls | Categorical | 1-3, 8, 9 | Data has NA’s, even though 9=NA, why is that? | |
| pl_fail2 | Yes | In Plane failure of walls | Categorical | 1-3 | Data has NA(s) | |
| pl_fail3 | Yes | In Plane failure of walls | Categorical | 1-3 | Data has NA(s) | |
| op_fail1 | Yes | Out of plane failure of walls carrying floor/roof | Categorical | 1-3, 8, 9 | Data has NA’s, even though 9=NA, why is that? | |
| op_fail2 | Yes | Out of plane failure of walls carrying floor/roof | Categorical | 1-3 | Data has NA(s) | |
| op_fail3 | Yes | Out of plane failure of walls carrying floor/roof | Categorical | 1-3 | Data has NA(s) | |
| op_fl_nl1 | Yes | Out of plane failure of walls not carrying floor/roof | Categorical | 1-3, 8, 9 | Data has NA’s, even though 9=NA, why is that? | |
| op_fl_nl2 | Yes | Out of plane failure of walls not carrying floor/roof | Categorical | 1-3 | Data has NA(s) | |
| op_fl_nl3 | Yes | Out of plane failure of walls not carrying floor/roof | Categorical | 1-3 | Data has NA(s) | |
| dm_gabl1 | Yes | Gable Wall Collapse | Categorical | 1-3, 8, 9 | Data has NA’s, even though 9=NA, why is that? | |
| dm_gabl2 | Yes | Gable Wall Collapse | Categorical | 1-3 | Data has NA(s) | |
| dm_gabl3 | Yes | Gable Wall Collapse | Categorical | 1-3 | Data has NA(s) | |
| delam1 | Yes | ? (Not provided) | Categorical | 1-3, 8, 9 | Data has NA’s, even though 9=NA, why is that? | |
| delam2 | Yes | ? (Not provided) | Categorical | 1-3 | Data has NA(s) | |
| delam3 | Yes | ? (Not provided) | Categorical | 1-3 | Data has NA(s) | |
| col_fail1 | Yes | Column Failure | Categorical | 1-3, 8, 9 | Data only has 1/Nas | |
| col_fail2 | Yes | Column Failure | Categorical | 1-3 | Data only has NA(s) | |
| col_fail3 | Yes | Column Failure | Categorical | 1-3 | Data has NA(s) | |
| beam_fl1 | Yes | Beam Failure | Categorical | 1-3, 8, 9 | Data only has 1/Nas | |
| beam_fl2 | Yes | Beam Failure | Categorical | 1-3 | Data only has NA(s) | |
| beam_fl3 | Yes | Beam Failure | Categorical | 1-3 | Data has NA(s) | |
| part_col1 | Yes | Infill/Partition Wall Damage | Categorical | 1-3, 8, 9 | Data only has 2/Nas | |
| part_col2 | Yes | Infill/Partition Wall Damage | Categorical | 1-3 | Data only has NA(s) | |
| part_col3 | Yes | Infill/Partition Wall Damage | Categorical | 1-3 | Data has NA(s) | |
| str_case1 | Yes | Staircase | Categorical | 1-3, 8, 9 | Data has NA’s, even though 9=NA, why is that? | |
| str_case2 | Yes | Staircase | Categorical | 1-3 | Data has NA(s) | |
| str_case3 | Yes | Staircase | Categorical | 1-3 | Data has NA(s) | |
| parapet1 | Yes | Parapet | Categorical | 1-3, 8, 9 | Data has NA’s, even though 9=NA, why is that? | |
| parapet2 | Yes | Parapet | Categorical | 1-3 | Data has NA(s) | |
| parapet3 | Yes | Parapet | Categorical | 1-3 | Data has NA(s) | |
| clad_glz1 | Yes | Cladding/Glazing | Categorical | 1-3, 8, 9 | Data has NA’s, even though 9=NA, why is that? | |
| clad_glz2 | Yes | Cladding/Glazing | Categorical | 1-3 | Data has NA(s) | |
| clad_glz3 | Yes | Cladding/Glazing | Categorical | 1-3 | Data has NA(s) | |
| gersk | Yes | Geotechnical Risk | Categorical | 1,2 | ||
| gersk_ls1 | Yes | Type of geotechnical Risk | Categorical | 1-7 | Yes | |
| gersk_ls2 | Yes | Type of geotechnical Risk | Categorical | 1-7 | Yes | |
| gersk_ls3 | Yes | Type of geotechnical Risk | Categorical | 1-7 | Yes | |
| gersk_ls4 | Yes | Type of geotechnical Risk | Categorical | 1-7 | Yes | |
| gersk_ls5 | Yes | Type of geotechnical Risk | Categorical | 1-7 | Yes | |
| gersk_ls6 | Yes | Type of geotechnical Risk | Categorical | 1-7 | Yes | |
| gersk_ls7 | Yes | Type of geotechnical Risk | Categorical | 1-7 | Yes | |
| assd_ar | Yes | Area Assessed | Categorical | 1-5 | ||
| dm_grade | Yes | Damage Grade | Categorical | 1-5 | ||
| tech_sol | Yes | Technical Solution | Categorical | 1-4 | ||
| repair | Yes | Have you started reconstruction? | Categorical | 1-2 | ||
| sec_use | Yes | Secondary use of the house? | Categorical | 1-2 | ||
| secuse_ls1 | Yes | Type of secondary use | Categorical | 1-10 | Yes | Data has NA(s) |
| secuse_ls2 | Yes | Type of secondary use | Categorical | 1-10 | Yes | Data has NA(s) |
| secuse_ls3 | Yes | Type of secondary use | Categorical | 1-10 | Yes | Data has NA(s) |
| secuse_ls4 | Yes | Type of secondary use | Categorical | 1-10 | Yes | Data has NA(s) |
| secuse_ls5 | Yes | Type of secondary use | Categorical | 1-10 | Yes | Data has NA(s) |
| secuse_ls6 | Yes | Type of secondary use | Categorical | 1-10 | Yes | Data has NA(s) |
| secuse_ls7 | Yes | Type of secondary use | Categorical | 1-10 | Yes | Data has NA(s) |
| secuse_ls8 | Yes | Type of secondary use | Categorical | 1-10 | Yes | Data has NA(s) |
| secuse_ls9 | Yes | Type of secondary use | Categorical | 1-10 | Yes | Data has NA(s) |
| secuse_ls10 | Yes | Type of secondary use | Categorical | 1-10 | Yes | Data has NA(s) |
| fam_cn | No | ? (Not provided) | ? (Not provided) | ? (Not provided) | ||
| dist | No | District | ||||
| vcode | No | |||||
| vdcmun | No | |||||
| ward | No | |||||
| EA | No | |||||
| howner_sn | No | |||||
| house_sn | No | |||||
| legl_own | No |