Sampling Procedure
The four sites (Korogocho, Viwandani, Mathare, and Kibera) were purposively selected because of their large sizes and our research knowledge of two of the sites, where APHRC runs the Nairobi Urban Health and Demographic Surveillance System (NUHDSS). To allow for representation of the household sample across the four sites, we will adopt a multistage sampling procedure. According to the Kenya National Bureau of Statistics (KNBS, 2019), the total number of households in Nairobi City County in 2019 was 1,506,888, which represents 12.4 percent of the total number of households in Kenya.
In determining the sample for our study, we used a formula from the United Nations Statistics Division handbook of practical guidelines on designing household survey samples (UNSTATS, 2008), chosen for the clarity with which it sets out the sample estimation specifications (Ahsan et al., 2016; Miller et al., 2020). We estimated the proportion of households with primary and/or secondary school-going children enrolled in private schools within our study area to be about 50 percent, at a 95 percent confidence level (the level of certainty that the population parameter falls within the estimated range). We set the margin of error at 5 percent and anticipated a combined rate of non-response and attrition of 30 percent, reflecting outmigration that may result from COVID-19, government restrictions on movement, and economic hardships such as loss of employment. At the same time, we considered that households that have withstood economic hardship for over a year may be more resilient than those that left shortly after the initial government precautionary measures on COVID-19. The 30 percent rate is based on previous experience in studies conducted in the slums, and it enters the computation as the multiplier k. We estimated an average household size of 2.9 based on the 2019 census for Nairobi City County and assumed a design effect of 2.0. Based on these parameters, our household sample was 883. The study was conducted in the four informal settlements within Nairobi (Korogocho, Viwandani, Mathare, and Kibera), with the sample distributed according to the number of villages in each slum. A listing exercise will be conducted in the four slums, as described in a later section.
Nairobi's urban informal settlements share similar characteristics with other informal settlements in the country, for instance high population density, overcrowded structures, inadequate water and sanitation services, and a proliferation of low-cost private schools, owing to rural-urban migration. Sample allocation for each slum site will follow the sample distribution table below, in which the sample was allocated proportionally to the number of villages in each slum. In the allocation, we assumed that the number of villages is commensurate with the number of households present in a slum.
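The proportional allocation described above can be sketched as follows. This is only an illustration: the village counts are hypothetical placeholders (the actual counts belong to the sample distribution table), and the largest-remainder rounding rule is our assumption, used so that the site allocations sum exactly to the total of 883 households.

```python
def allocate_proportionally(total, weights):
    """Split `total` across sites in proportion to `weights`.

    Uses largest-remainder rounding (an assumption, not stated in the
    study) so that the integer allocations add up exactly to `total`.
    """
    weight_sum = sum(weights.values())
    exact = {site: total * w / weight_sum for site, w in weights.items()}
    alloc = {site: int(share) for site, share in exact.items()}  # round down first
    shortfall = total - sum(alloc.values())
    # Hand the leftover units to the sites with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda s: exact[s] - alloc[s], reverse=True)
    for site in by_remainder[:shortfall]:
        alloc[site] += 1
    return alloc


# HYPOTHETICAL village counts, for illustration only (not study data).
villages = {"Korogocho": 8, "Viwandani": 12, "Mathare": 10, "Kibera": 14}

allocation = allocate_proportionally(883, villages)
for site, n in allocation.items():
    print(site, n)
```

Under the stated assumption that village counts track household numbers, a site with more villages receives a proportionally larger share of the 883 households.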
We used the sample estimation formula adapted from a UN guideline (UNSTATS, 2008):
nh = (z² × r × (1 − r) × f × k) / (p × ñ × e²)
where:
- nh is the parameter to be calculated and is the sample size in terms of the number of households to be selected;
- z is the statistic that defines the level of confidence desired, in our case 1.96 for the 95 percent level;
- r is an estimate of a key indicator to be measured by the survey; in our case the key indicator is the prevalence of enrolment in a low-cost private school (LCPS), which we estimated at 50 percent, a proportion that maximizes the sample (our data show 47 percent in 2013);
- f is the sample design effect, assumed to be 2.0;
- k is a multiplier to account for the anticipated rate of non-response and attrition given that we will collect data in subsequent rounds;
- p is the product of 0.03 and the number of years in the age range of the target population of interest in a household, where 0.03 is considered a reasonable rule of thumb (UNSTATS, 2008); in our case the range is 6-18 years, hence 13 years and p = 0.39;
- ñ is the average household size (total number of persons in a household) - in our case 2.9 according to 2019 Kenya Census data;
- e is the margin of error to be attained, in our case 5 percent.
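As a check on the arithmetic, the parameters listed above can be plugged into the formula. The sketch below assumes the UNSTATS (2008) formula takes the form nh = z² r (1 − r) f k / (p ñ e²), and that the 30 percent non-response and attrition rate enters as the multiplier k = 1.3; the latter is our reading, since the text does not state the value of k explicitly. The function and variable names are illustrative.

```python
def household_sample_size(z, r, f, k, p, n_bar, e):
    """Household sample size: z^2 * r * (1 - r) * f * k / (p * n_bar * e^2)."""
    return (z ** 2 * r * (1 - r) * f * k) / (p * n_bar * e ** 2)


params = dict(
    z=1.96,       # statistic for the 95 percent confidence level
    r=0.50,       # estimated prevalence of LCPS enrolment
    f=2.0,        # assumed design effect
    k=1.30,       # ASSUMPTION: 30 percent non-response/attrition as a 1.3 multiplier
    p=0.03 * 13,  # rule of thumb times the 13 years in the 6-18 age range
    n_bar=2.9,    # average household size, 2019 census
    e=0.05,       # margin of error
)

n_h = household_sample_size(**params)
print(round(n_h))  # 883
```

With these inputs the formula gives about 883.1, which agrees with the household sample of 883 reported in the text.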
The following steps were used to identify an appropriate sample in the four slum areas:
a) All villages within each of the sites were listed. Typical slums in Kenya (like our four sites) are characterized by an overcrowded and continuous mass of dwelling structures, narrow and jammed service roads (commonly used by riders and bicycle taxis, also referred to as boda boda), and very narrow footpaths leading off the service road to the dwelling areas. We therefore worked with local guides and consulted administrative leaders such as chiefs to select an equal number of households from each listed village. In this study, existing villages were used because their boundaries were known to the local community leaders, which was crucial for our design, and because each village had some unique characteristics.
b) In each enumeration area (EA), a landmark next to a service road was identified. Such a landmark could be a chief's camp, a church, a school, or a big market, among others.
c) From the landmark, and accompanied by a local guide (for direction, boundary identification, and security), the enumerator started listing households while moving in a defined direction from the landmark along the service road. This allowed the listing of households that were deep inside an EA. Households along the service roads were not listed because most structures there were used as small informal business premises.
d) If the identified landmark was at an EA boundary, the enumerator moved towards the interior of the EA. If it was not near an EA boundary, the enumerator started in one direction along the service road and, after reaching the boundary in that direction, came back to continue in the other direction. The listing took place deep inside a village and was guided by the footpaths to allow for the selection of households across the village.
e) To balance costs and sample efficiency (in terms of household representation of the villages), the enumerator identified every 10th household from the point where the footpath connected with the service road. Slums such as Kibera are very large, and because of cost implications not every household was listed. We acknowledge that this might introduce bias; to mitigate its effect, we adopted a systematic sampling technique whereby every 10th household that met the set criteria was listed to develop the sampling frame. This was a limitation of this exploratory study. A household qualified to be listed if it had at least one school-age child who was enrolled in school before the school closures due to COVID-19. If it qualified, its characteristics were enumerated, so that the list of households with primary and/or secondary school-age children contained the following information: slum, village, RoomID (marked on the door with a marker pen), GPS location, phone contact (if known), number of primary school-age children (6-13 years), number of secondary school-age children (14-17 years), gender of the household head, and age of the household head. If a household did not qualify, the enumerator moved to the next immediate household until s/he found a qualifying household. Thereafter the enumerator moved to the 10th household from the last one to qualify and repeated the enumeration process until the end of the footpath and/or the EA boundary, taking into consideration dwelling structures that may be along 'mini' footpaths. It was critical to collect the GPS position of each household structure and to allocate IDs: each slum was allocated a slumID and each village a villageID before the listing exercise began, whereas for householdID and RoomID each field interviewer (FI) was allocated a block of numbers within a village, say 001-010 for one FI, 011-020 for another, and so on.
The phone number(s) of the household head were recorded to help locate the household during actual data collection. During the listing and data collection process, enumerators verified phone numbers and ensured that the research team followed up with the recruited households (while adhering to research ethics) to reduce the risk of high attrition in the subsequent survey rounds.
f) After all eligible households in the four slums had been enumerated using the procedures described above, random numbers were assigned using Stata, and the required number of households in each village and/or site was then randomly selected.
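Steps e) and f) can be illustrated with a short sketch. It is a Python analogue written for illustration, not the Stata procedure used in the study. The household records, the `eligible` field, the seed, and the function names are all hypothetical, and a footpath is simplified to an ordered list of households.

```python
import random


def list_every_kth(households, qualifies, k=10):
    """Walk a footpath in order and build the sampling frame.

    Start at the k-th household. If it does not qualify, move to the next
    immediate household until one qualifies, list it, then jump to the
    k-th household after the one just listed.
    """
    listed = []
    i = k - 1  # zero-based index of the k-th household on the footpath
    while i < len(households):
        while i < len(households) and not qualifies(households[i]):
            i += 1  # next immediate household
        if i < len(households):
            listed.append(households[i])
            i += k  # k-th household from the last one to qualify
    return listed


def select_households(frame, n, seed):
    """Assign each listed household a random number and keep the n smallest."""
    rng = random.Random(seed)  # fixed seed makes the draw reproducible
    keyed = [(rng.random(), hh) for hh in frame]
    keyed.sort(key=lambda pair: pair[0])
    return [hh for _, hh in keyed[:n]]


# HYPOTHETICAL footpath of 100 households; every third one is ineligible.
footpath = [{"id": i, "eligible": i % 3 != 0} for i in range(1, 101)]

frame = list_every_kth(footpath, lambda hh: hh["eligible"], k=10)
selected = select_households(frame, n=4, seed=2021)
print([hh["id"] for hh in frame])
print([hh["id"] for hh in selected])
```

In this toy footpath the 30th household is ineligible, so the enumerator lists the 31st instead and counts the next interval of ten from there. The interval therefore restarts from the last qualifying household rather than from fixed positions along the footpath.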