Methods In this study a modification of

    2018-10-29


    Methods
    In this study, a modification of the original Random Forest algorithm, namely random forests constructed from conditional inference trees (Strobl et al., 2009b), is used to classify households as either LPG or non-LPG users. This section gives a short description of the Random Forest algorithm, then explains how the algorithm is applied in this study, and ends with a discussion of the variables included in the analysis.
    Bagging and random forest application in this study
    All included variables were ranked according to their importance, as described in Section 3.1. To evaluate the robustness of the results, multiple forests were constructed using subsamples of the data. A hundred data sets were assembled by subsampling the original data, a random forest model was fitted to each of them, and the importance values from each forest were recorded. The importance values for the hundred forests that use all variables are presented as boxplots, where each box reflects the dispersion of the importance values of one variable across the forests. Besides ranking the variables by importance, a variable selection method was also used: the Diaz-Uriarte method, a backward selection algorithm developed for Random Forests (Díaz-Uriarte and De Andres, 2006; Diaz-Uriarte, 2007). This algorithm uses the importance ranking and removes the least important variables until the out-of-bag (OOB) error increases. The same two bootstrap strategies used for the importance ranking were also used for the variable selection procedure. In addition to predicting current LPG use in 2008 from the variables listed in Table 1, data from 2005 were used to predict which households would start to use LPG by 2008. In this case, households that were already using LPG in 2005 were removed from the sample. The R software with the package “party” was used for the modelling; see Strobl et al. (2009b) for further details.
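The two procedures described above (recording importance values over a hundred subsamples, and dropping the least important variables until the error increases) can be sketched as follows. This is a minimal illustration in plain Python, not the R "party" code used in the study. The function names, the subsample fraction of 0.632, the drop fraction of 0.2 and the toy data are all assumptions made for the example. The importance and error functions are simple stand-ins (a class-mean gap and an in-sample nearest-centroid error) where the study uses forest variable importance and OOB error. The stopping rule follows the wording of the text, and the published Diaz-Uriarte procedure may differ in detail.

```python
import random
import statistics

def importance_dispersion(rows, variables, importance_fn,
                          n_sets=100, fraction=0.632, seed=0):
    """Record one importance value per variable for each of n_sets subsamples."""
    rng = random.Random(seed)
    size = max(1, int(len(rows) * fraction))
    scores = {v: [] for v in variables}
    for _ in range(n_sets):
        sample = rng.sample(rows, size)  # subsampling: drawn without replacement
        imp = importance_fn(sample, variables)
        for v in variables:
            scores[v].append(imp[v])
    return scores

def box_summary(values):
    """Quartiles of the recorded importances, i.e. what one box in a boxplot shows."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {"q1": q1, "median": median, "q3": q3}

def backward_select(rows, variables, importance_fn, error_fn, drop_fraction=0.2):
    """Drop the least important variables until the error estimate increases."""
    current = list(variables)
    best_error = error_fn(rows, current)
    while len(current) > 1:
        imp = importance_fn(rows, current)
        ranked = sorted(current, key=lambda v: imp[v], reverse=True)
        n_drop = min(len(ranked) - 1, max(1, int(len(ranked) * drop_fraction)))
        candidate = ranked[:len(ranked) - n_drop]
        error = error_fn(rows, candidate)
        if error > best_error:
            break  # removing more variables made the classification worse
        current, best_error = candidate, error
    return current

def class_means(rows, v):
    """Mean of variable v among non-LPG (0) and LPG (1) households."""
    means = {}
    for label in (0, 1):
        vals = [r[v] for r in rows if r["lpg"] == label]
        means[label] = statistics.fmean(vals) if vals else 0.0
    return means

def toy_importance(rows, variables):
    # Stand-in only: gap between class means, NOT a random forest importance.
    out = {}
    for v in variables:
        m = class_means(rows, v)
        out[v] = abs(m[1] - m[0])
    return out

def toy_error(rows, variables):
    # Stand-in only: in-sample nearest-centroid error, NOT an OOB error.
    means = {v: class_means(rows, v) for v in variables}
    wrong = 0
    for r in rows:
        dist = {c: sum((r[v] - means[v][c]) ** 2 for v in variables) for c in (0, 1)}
        wrong += min(dist, key=dist.get) != r["lpg"]
    return wrong / len(rows)

# Hypothetical toy data: "income" carries signal, "noise" does not.
data_rng = random.Random(1)
rows = []
for i in range(200):
    lpg = i % 2
    rows.append({"lpg": lpg,
                 "income": 3 * lpg + data_rng.gauss(0, 1),
                 "noise": data_rng.gauss(0, 1)})
variables = ["income", "noise"]

scores = importance_dispersion(rows, variables, toy_importance)
for v in variables:
    print(v, box_summary(scores[v]))
print(backward_select(rows, variables, toy_importance, toy_error))
```

Passing the importance and error estimators in as functions keeps the resampling and elimination logic separate from the model, so a real forest implementation could be substituted without changing the loops.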
    Results and analysis
    The importance ranking shows that several variables associated with wealth and income are judged to be influential for the correct classification of households, see Fig. 1. It is interesting to note that total income, income in 2005 and income in 2002 are all deemed important for current LPG use. It is worth noting that the correlations between the income levels for the different years are quite low (<0.5), a sign of relatively large fluctuations in income between years. Ownership of various appliances, such as a rice cooker or a refrigerator, is also a useful predictor of LPG usage. Further important variables are the distance to the nearest town, village average land, collection rate and relative fuel price, all of which can be associated with aspects of the degree of rurality, see Section 3.2. It is also of interest that several variables previously used to model fuel switching, such as education, the different occupations and whether the household is electrified or not, are judged to be of low importance, i.e. they are suppressed by the other variables. Results similar to Fig. 1, but obtained with the second bootstrap strategy (randomly sampling both villages and households within villages), can be found in the appendix, see Fig. A1. It may also be of interest to compare the results in Fig. 1 with the univariate linear correlations between the predictor variables and LPG usage, see Fig. A2. Many wealth-related variables, such as income, previous incomes, type of house and certain appliances, are both highly correlated with LPG usage and receive high importance. Other variables, such as household business, shop, years electrified and the various education measurements, are highly correlated with LPG usage but are ranked lower by importance in the bagging procedure. Some variables receive relatively higher importance than their ranking by correlation would suggest; these are to a large extent the area descriptors: village average land, distance to town and relative fuel price.
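The comparison between the importance ranking and the correlation ranking can be made concrete by computing, for each variable, the difference between its two rank positions. The sketch below is only an illustration: the function names and the variable names and values in the usage example are invented and are not results from the study.

```python
import math

def pearson(x, y):
    """Univariate linear (Pearson) correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(scores):
    """Map each variable to its rank, where rank 1 is the largest score."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: position + 1 for position, name in enumerate(ordered)}

def rank_shift(importance, correlation):
    """Correlation rank minus importance rank, per variable.

    A positive value means the variable is ranked higher by importance
    than by absolute correlation.
    """
    importance_rank = ranks(importance)
    correlation_rank = ranks({k: abs(v) for k, v in correlation.items()})
    return {k: correlation_rank[k] - importance_rank[k] for k in importance}

# Hypothetical values for three placeholder variables.
importance = {"var_a": 0.9, "var_b": 0.5, "var_c": 0.1}
correlation = {"var_a": 0.2, "var_b": -0.8, "var_c": 0.5}
print(rank_shift(importance, correlation))
```

Ranking by absolute correlation treats strongly negative and strongly positive associations alike, which matches how an importance measure ignores the direction of an effect.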