Building table

Note: This document captures information that may have been updated. Please refer to the Summary and findings section instead.

Summary

  1. Codes vs. Labels: The current dataset has codes assigned for each value of a categorical variable. For instance, the variable gd_floor which indicates the type of ground floor has the following codes associated with it:

    1. 1, which denotes “Mud”
    2. 2, which denotes “Brick/Stone”
    3. 3, which denotes “Timber”
    4. 4, which denotes “RC”
    5. 5, which denotes “Other”

    An alternate way of storing this information would be in the form of labels, as opposed to numbers. That is,

    1. mud, which denotes “Mud”
    2. brick_stone, which denotes “Brick/Stone”
    3. timber, which denotes “Timber”
    4. rc, which denotes “RC”
    5. other, which denotes “Other”

    Storing them as numbers require less space, but it comes at the cost of the user needing a reference for understanding what the number means.

    What do we want to do?

  2. Variable Types: A brief examination of the survey questionnaire suggests that questions that were asked fall in either one of the following two categories:

    1. Single select questions: Where the surveyor selects only one out of many options provided.
    2. Multi select questions: Where the surveyor can select one or more answers out of the many options provided.

    For every single select question in the survey, there is only one corresponding column in the data that capture the answer for a particular building. Data cleaning effort for these variables is minimal, depending on the route we take as per point (1)

  3. Multi select questions: Some of the questions, because of their multiselect nature has more than one column associated with them):

    1. Superstructure type has 11 columns ranging from sup_str1 to sup_str11
    2. Type of geotechnical risk has 7 columns ranging from gersk_ls1 to gersk_ls3
    3. Type of secondary use has 10 columns ranging from secuse_ls1 to secuse_ls10

    For these questions, additional data cleaning work would be required to make information more usable.

  4. Missing Definitions: Not all of the columns had their corresponding definitions in the data dictionary provided by CBS. Whilst is was clear what information the column contained for a few of the columns (dist, vcode, vdcmun, ward, EA, howner_sn, house_sn, legl_own) based on their names, further information is needed for the following columns:

    1. delam1,
    2. delam2,
    3. delam3,
    4. fam_cn,
    5. hgt_pre,
    6. hgt_pos,
    7. pl_area (I assume this refers to the plinth area of the building, but how is it captured (numeric vs. categorical)? )
    8. age (I assume that this variable refers to the age of the building, but how is it captured (numeric vs. categorical))
    9. floor_pos,
    10. floor_pre,

    After examination of the survey qyuestionnaire, it appears most of the above information comes from a single table in the questionnaire:

    _images/missing_defs.jpg
  5. Damage Assessment Variables: Information for damage assessment is spread across groups of variables. For example, for users to get complete information on building foundation damage, they will have to go through three variables viz. dm_fndtn1, dm_fndtn2, dm_fndtn3. Other variables that have a similar nature include:

    1. dm_roof1, dm_roof2, dm_roof3
    2. corn_sep1, corn_sep2, corn_sep3
    3. diag_cr1, diag_cr2, diag_cr3
    4. pl_fail1, pl_fail2, pl_fail3
    5. op_fail1, op_fail2, op_fail3
    6. op_fl_nl1, op_fl_nl2, op_fl_nl3
    7. dm_gabl1, dm_gabl2, dm_gabl3
    8. delam1, delam2, delam3
    9. col_fail1, col_fail2, col_fail3
    10. beam_fl1, beam_fl2, beam_fl3
    11. str_case1, str_case2, str_case3
    12. parapet1, parapet2, parapet3
    13. clad_glz1, clad_glz2, clad_glz3
    14. clad_glz1, clad_glz2, clad_glz3

    Furthermore, information for “No damage” is contained as a categorical value within the first out of three variable, as illustrated by the picure below.

    _images/dm_fndtn1.png

    Suggestions for improvement:

    1. Rename column headers to include severity ie. dm_fndtn_severe, dm_fndtn_moderate, dm_fndtn_insignfcant
    2. Seperate information about no damage to a separate flag variable, dm_fndtn_flag

Methodology

This section contains code used for the analysis, please jump to the next section for results

A small subset of 5000 rows were taken as an input for analysis. All variables in the table were converted to factors, and a summary was yielded from each table.

# Exploration of building table from pgSql database
library(RPostgreSQL)
library(dplyr)

root.dir <- "~/hrrp/openhrrp-src"

pg <- dbDriver("PostgreSQL")
con <- dbConnect(pg, user="postgres", password="postgres",
                 host="localhost", dbname="openhrrp")

# dbExistsTable(con, "building")

df_building <- dbGetQuery(con, "SELECT * from building limit 5000; ")
df_building <- as.data.frame(sapply(df_building, as.numeric))

write.csv(as.data.frame(colnames(df_building)), file=paste0(root.dir, "/outputs/buildingColNames.csv"))

df_building_af <- as.data.frame(sapply(df_building, as.factor))


summarizeColumn <- function(column) {
  return (summary(column))
}

summarizeColumn(df_building_af$fam_cn)

Results

The table below outlines current status of the same:

Column Name Present in building data dictionary Name Type Range Is multiselect? Comments
house_cd Yes House Condition Categorical 1-8    
floor_pre No ? (Not provided) ? (Not provided) ? (Not provided)    
floor_pos No ? (Not provided) ? (Not provided) ? (Not provided)    
age No ? (Not provided) ? (Not provided) ? (Not provided)    
pl_area No ? (Not provided) ? (Not provided) ? (Not provided)    
hgt_pre No ? (Not provided) ? (Not provided) ? (Not provided)    
hgt_pos No ? (Not provided) ? (Not provided) ? (Not provided)    
surf_con Yes Surface condition of house/land Categorical 1-3    
foundatn Yes Foundation Type of house Categorical 1-5    
roof Yes Roof Type Categorical 1-3    
gd_floor Yes Ground Floor Type Categorical 1-5    
up_floor Yes Floor Type(except ground floor and roof)   1-4    
sup_str1 Yes Super Structure of the house Categorical 1-11 Yes  
sup_str2 Yes Super Structure of the house Categorical 1-11 Yes  
sup_str3 Yes Super Structure of the house Categorical 1-11 Yes  
sup_str4 Yes Super Structure of the house Categorical 1-11 Yes  
sup_str5 Yes Super Structure of the house Categorical 1-11 Yes  
sup_str6 Yes Super Structure of the house Categorical 1-11 Yes  
sup_str7 Yes Super Structure of the house Categorical 1-11 Yes  
sup_str8 Yes Super Structure of the house Categorical 1-11 Yes  
sup_str9 Yes Super Structure of the house Categorical 1-11 Yes  
sup_str10 Yes Super Structure of the house Categorical 1-11 Yes  
sup_str11 Yes Super Structure of the house Categorical 1-11 Yes  
positn Yes Position of the House Categorical 1-4    
pln_conf Yes Plan Configuration of the house   1-10    
collapse Yes Collapse of Partial Collapse Categorical 1-4   Data has NA’s need to explore why that is the case
leaning Yes` Buliding or storey Leaning Categorical 1-4   Data has NA’s need to explore why that is the case
adjbld_rsk Yes Adjacent Building Hazard Categorical 1-4   Data has NA’s need to explore why that is the case
dm_fndtn1 Yes Foundation Categorical 1-3, 8, 9   Data has NA’s, even though 9=NA, why is that?
dm_fndtn2 Yes Foundation Categorical 1-3   Data has NA(s)
dm_fndtn3 Yes Foundation Categorical 1-3   Data has NA(s)
dm_roof1 Yes Roof/Floor Collapse/PartialCollapse Categorical 1-3, 8, 9   Data has NA’s, even though 9=NA, why is that?
dm_roof2 Yes Roof/Floor Collapse/PartialCollapse Categorical 1-3   Data has NA(s)
dm_roof3 Yes Roof/Floor Collapse/PartialCollapse Categorical 1-3   Data has NA(s)
corn_sep1 Yes Corner Separation Categorical 1-3, 8, 9   Data has NA’s, even though 9=NA, why is that?
corn_sep2 Yes Corner Separation Categorical 1-3   Data has NA(s)
corn_sep3 Yes Corner Separation Categorical 1-3   Data has NA(s)
diag_cr1 Yes Diagonal Cracking Categorical 1-3, 8, 9   Data has NA’s, even though 9=NA, why is that?
diag_cr2 Yes Diagonal Cracking Categorical 1-3   Data has NA(s)
diag_cr3 Yes Diagonal Cracking Categorical 1-3   Data has NA(s)
pl_fail1 Yes In Plane failure of walls Categorical 1-3, 8, 9   Data has NA’s, even though 9=NA, why is that?
pl_fail2 Yes In Plane failure of walls Categorical 1-3   Data has NA(s)
pl_fail3 Yes In Plane failure of walls Categorical 1-3   Data has NA(s)
op_fail1 Yes Out of plane failure of walls carrying floor/roof Categorical 1-3, 8, 9   Data has NA’s, even though 9=NA, why is that?
op_fail2 Yes Out of plane failure of walls carrying floor/roof Categorical 1-3   Data has NA(s)
op_fail3 Yes Out of plane failure of walls carrying floor/roof Categorical 1-3   Data has NA(s)
op_fl_nl1 Yes Out of plane failure of walls not carrying floor/roof Categorical 1-3, 8, 9   Data has NA’s, even though 9=NA, why is that?
op_fl_nl2 Yes Out of plane failure of walls not carrying floor/roof Categorical 1-3   Data has NA(s)
op_fl_nl3 Yes Out of plane failure of walls not carrying floor/roof Categorical 1-3   Data has NA(s)
dm_gabl1 Yes Gable Wall Collapse Categorical 1-3, 8, 9   Data has NA’s, even though 9=NA, why is that?
dm_gabl2 Yes Gable Wall Collapse Categorical 1-3   Data has NA(s)
dm_gabl3 Yes Gable Wall Collapse Categorical 1-3   Data has NA(s)
delam1 Yes ? (Not provided) Categorical 1-3, 8, 9   Data has NA’s, even though 9=NA, why is that?
delam2 Yes ? (Not provided) Categorical 1-3   Data has NA(s)
delam3 Yes ? (Not provided) Categorical 1-3   Data has NA(s)
col_fail1 Yes Column Failure Categorical 1-3, 8, 9   Data only has 1/Nas
col_fail2 Yes Column Failure Categorical 1-3   Data only has NA(s)
col_fail3 Yes Column Failure Categorical 1-3   Data has NA(s)
beam_fl1 Yes Beam Failure Categorical 1-3, 8, 9   Data only has 1/Nas
beam_fl2 Yes Beam Failure Categorical 1-3   Data only has NA(s)
beam_fl3 Yes Beam Failure Categorical 1-3   Data has NA(s)
part_col1 Yes Infill/Partition Wall Damage Categorical 1-3, 8, 9   Data only has 2/Nas
part_col2 Yes Infill/Partition Wall Damage Categorical 1-3   Data only has NA(s)
part_col3 Yes Infill/Partition Wall Damage Categorical 1-3   Data has NA(s)
str_case1 Yes Staircase Categorical 1-3, 8, 9   Data has NA’s, even though 9=NA, why is that?
str_case2 Yes Staircase Categorical 1-3   Data has NA(s)
str_case3 Yes Staircase Categorical 1-3   Data has NA(s)
parapet1 Yes Parapet Categorical 1-3, 8, 9   Data has NA’s, even though 9=NA, why is that?
parapet2 Yes Parapet Categorical 1-3   Data has NA(s)
parapet3 Yes Parapet Categorical 1-3   Data has NA(s)
clad_glz1 Yes Cladding/Glazing Categorical 1-3, 8, 9   Data has NA’s, even though 9=NA, why is that?
clad_glz2 Yes Cladding/Glazing Categorical 1-3   Data has NA(s)
clad_glz3 Yes Cladding/Glazing Categorical 1-3   Data has NA(s)
gersk Yes Geotechnical Risk Categorical 1,2    
gersk_ls1 Yes Type of geotechnical Risk Categorical 1-7 Yes  
gersk_ls2 Yes Type of geotechnical Risk Categorical 1-7 Yes  
gersk_ls3 Yes Type of geotechnical Risk Categorical 1-7 Yes  
gersk_ls4 Yes Type of geotechnical Risk Categorical 1-7 Yes  
gersk_ls5 Yes Type of geotechnical Risk Categorical 1-7 Yes  
gersk_ls6 Yes Type of geotechnical Risk Categorical 1-7 Yes  
gersk_ls7 Yes Type of geotechnical Risk Categorical 1-7 Yes  
assd_ar Yes Area Assessed Categorical 1-5    
dm_grade Yes Damage Grade Categorical 1-5    
tech_sol Yes Technical Solution Categorical 1-4    
repair Yes Have you started reconstruction? Categorical 1-2    
sec_use Yes Secondary use of the house? Categorical 1-2    
secuse_ls1 Yes Type of secondary use Categorical 1-10 Yes Data has NA(s)
secuse_ls2 Yes Type of secondary use Categorical 1-10 Yes Data has NA(s)
secuse_ls3 Yes Type of secondary use Categorical 1-10 Yes Data has NA(s)
secuse_ls4 Yes Type of secondary use Categorical 1-10 Yes Data has NA(s)
secuse_ls5 Yes Type of secondary use Categorical 1-10 Yes Data has NA(s)
secuse_ls6 Yes Type of secondary use Categorical 1-10 Yes Data has NA(s)
secuse_ls7 Yes Type of secondary use Categorical 1-10 Yes Data has NA(s)
secuse_ls8 Yes Type of secondary use Categorical 1-10 Yes Data has NA(s)
secuse_ls9 Yes Type of secondary use Categorical 1-10 Yes Data has NA(s)
secuse_ls10 Yes Type of secondary use Categorical 1-10 Yes Data has NA(s)
fam_cn No ? (Not provided) ? (Not provided) ? (Not provided)    
dist No District        
vcode No          
vdcmun No          
ward No          
EA No          
howner_sn No          
house_sn No          
legl_own No