Machine learning holds considerable promise for assessing building energy performance quickly and accurately at the urban scale. However, because data for minority building types are scarce, unfavorable results are sometimes produced. This study therefore proposes a concise approach that generates enough data to train machine learning models while avoiding overfitting. The proposed approach yields superior results. The importance of variables is also analyzed using urban open data sets, and the findings are valuable to data collectors and publishers in decision-making.
Keywords: urban building energy data, building energy performance, machine learning, data generation