  • The results from the variable selection are presented in

    2018-11-09

    The results from the variable selection are presented in Table 2. These results compare well with the importance ranking of variables in the random forest and bagging procedures. As in the previous results, current income, income in 2005 and 2002, and other wealth measurements such as number of appliances, refrigerator, rice cooker and type of house are almost always present in the reduced models. Other variables that often appear are the distance to nearest town, village average land, collection rate and relative fuel price. The distance to nearest town is chosen in almost all reduced models, regardless of subsampling strategy. The prediction performance of the algorithm before (full model) and after variable selection (reduced models) is presented in Fig. 2. The values here are not based on the OOB error but on the actual out-of-subsample observations for each forest. Note that the prediction capabilities increased after variable selection. The reduced models are forests constructed after variable selection. The true positive rate is only around 50% for the reduced model, and slightly lower for the full model (Fig. 2). This section now continues with a presentation of partial dependencies of some selected variables with high importance. The partial dependencies are from the reduced model, i.e. after variable selection (the included variables are the 12 rightmost variables in Fig. 1), and use the complete data set. The variables presented here are income, the most commonly used variable for household fuel use, and the rurality variables village average land and distance to town. The other variables used in the reduced model all have partial dependencies similar to those of either income or the rurality variables. The partial dependencies are displayed in Figs. 3 and 4. The partial dependence of income at different levels of income in 2005 and of the number of appliances is shown in Fig. 3.
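A partial dependence curve of the kind referred to above can be sketched in a few lines. This is a minimal illustration only, not the procedure used for Figs. 3 and 4: the function `partial_dependence`, the stand-in model `toy_predict`, the feature names and the four households are all hypothetical and made up for the example. The idea is to force one variable to each value on a grid, optionally pin a second variable at a chosen level (for instance income at a low or a high value), and average the model's predictions over the data.

```python
from statistics import mean


def partial_dependence(predict, rows, feature, grid, fixed=None):
    """Average prediction over `rows` with `feature` forced to each grid value.

    `fixed` optionally pins other features (e.g. income at a chosen level)
    in every row before predicting.
    """
    fixed = fixed or {}
    curve = []
    for value in grid:
        # Overwrite the pinned features and the analyzed feature in every row.
        preds = [predict({**row, **fixed, feature: value}) for row in rows]
        curve.append(mean(preds))
    return curve


def toy_predict(row):
    """Hypothetical stand-in for a fitted forest: probability of using LPG."""
    p = 0.6 if row["income"] > 50 else 0.3
    if row["distance_to_town"] > 10:
        # Assumed for illustration: remoteness matters more at low income.
        p -= 0.25 if row["income"] <= 50 else 0.05
    return p


# Made-up households, for illustration only.
rows = [
    {"income": 20, "distance_to_town": 5},
    {"income": 40, "distance_to_town": 15},
    {"income": 80, "distance_to_town": 8},
    {"income": 120, "distance_to_town": 20},
]
grid = [0, 5, 10, 15, 20]

# Partial dependence on distance to town with income pinned low and high.
low = partial_dependence(toy_predict, rows, "distance_to_town", grid, fixed={"income": 30})
high = partial_dependence(toy_predict, rows, "distance_to_town", grid, fixed={"income": 100})
```

Plotting `low` and `high` against `grid` gives two conditional curves for the toy model; with a real fitted forest, `predict` would be replaced by the forest's predicted probability for one household.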
The differences between the partial dependencies, conditional on changes in income in 2005 and number of appliances, are most pronounced at low levels of current income. Note that the income distribution is concentrated towards the left, i.e. the range where past income makes the greatest difference is also the range in which the majority of household incomes are observed. The partial dependencies of all income and wealth-related variables are similar to the partial dependency for income, i.e. a sharp increase at a certain level, followed by a more moderate increase. Furthermore, conditioning on different levels of other wealth-related variables gives a shift in the partial dependence curve similar to that observed in Fig. 3, in the sense that the difference is most pronounced at lower levels of the analyzed variable. The propensity to use LPG for cooking declines with increasing village average land and distance to the nearest town (Fig. 4). A striking difference between the shapes of the partial dependencies is visible, for both distance to town and village average land, depending on whether income is set to the first or the third quartile. In the former case, a sharp decline is visible at certain values, while in the high-income case the decline is much more gradual. The partial dependence curve for collection rate has a shape similar to those constructed for village average land and distance to town. The results from the random forest when set up to predict which households would have started to use LPG between 2005 and 2008 are presented in Fig. 5. Current income (2005), together with the wealth-related variables rice cooker, number of appliances and previous income (2002), was important for determining whether a household would have started using LPG by 2008. The area description variables, village average land and distance to town, are still ranked high.
In these respects the results compare well with those for current LPG usage (Fig. 1); however, there are some discrepancies as well. A variable that was deemed only moderately important for current LPG usage is now the fourth most important: Farm, which indicates whether a household is mainly occupied in farm or agriculture-based activities. Furthermore, the income of the spouse also received relatively higher importance. A farm-based household is less likely to start using LPG, while a high income of the spouse increases the likelihood. When considering all current LPG users, the variables listed in Table 2 are chosen first and appear to encompass Farm and Income of Spouse, i.e. these variables do not add to the prediction; for the subset of households that started to use LPG recently, however, they are chosen more often.
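The importance rankings discussed in this section come from the random forest, but the text here does not state how importance is measured. One common choice is permutation importance, sketched below as an assumption rather than as the measure actually used: a variable's values are shuffled across households and the resulting drop in prediction accuracy is recorded. The helper names, the stand-in classifier `toy_classifier`, the feature names and the eight households are hypothetical and made up for the example.

```python
import random


def accuracy(predict, rows, labels):
    """Share of rows for which the prediction equals the label."""
    hits = sum(1 for row, y in zip(rows, labels) if predict(row) == y)
    return hits / len(rows)


def permutation_importance(predict, rows, labels, feature, n_repeats=20, seed=0):
    """Mean drop in accuracy after shuffling `feature` across rows."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    baseline = accuracy(predict, rows, labels)
    drops = []
    for _ in range(n_repeats):
        shuffled = [row[feature] for row in rows]
        rng.shuffle(shuffled)
        permuted = [{**row, feature: v} for row, v in zip(rows, shuffled)]
        drops.append(baseline - accuracy(predict, permuted, labels))
    return sum(drops) / n_repeats


def toy_classifier(row):
    """Hypothetical stand-in for a fitted forest: 1 = household starts using LPG."""
    return 1 if row["income"] > 50 and not row["farm"] else 0


# Made-up households, for illustration only.
incomes = [20, 30, 40, 45, 60, 80, 100, 120]
farm = [1, 0, 1, 0, 0, 0, 1, 0]
sizes = [3, 5, 4, 6, 2, 4, 5, 3]
rows = [
    {"income": i, "farm": f, "household_size": s}
    for i, f, s in zip(incomes, farm, sizes)
]
labels = [toy_classifier(row) for row in rows]

imp_income = permutation_importance(toy_classifier, rows, labels, "income")
imp_size = permutation_importance(toy_classifier, rows, labels, "household_size")
```

Because the toy classifier never looks at `household_size`, shuffling that variable leaves accuracy unchanged, whereas shuffling `income` lowers it. In the same spirit, a variable such as Farm can rank low when other variables already carry its information and rank higher in a subset where they do not.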