Urban energy modelling (UEM) involves simulating energy use at the urban scale, from the neighbourhood to the city level. For example, one type of UEM is urban building energy models (UBEMs), which represent buildings at the district scale [
1] and are used to simulate heating and air-conditioning loads to inform the design of urban power and energy networks, such as district heating schemes, as well as to investigate energy efficiency scenarios, such as the aggregate impact of retrofitting insulation [
2]. Other UEMs include domains such as local climate, buildings, transportation, and energy networks and resources. UEMs in general have applications including urban planning [
3], policy development [
4], infrastructure development [
5], and digital twin monitoring and forecasting [
6]. In each of these applications, UEM can inform the improvement of energy efficiency, resilience, and sustainability by simulating various scenarios, such as integrating solar PV at the district scale [
7].
Modelling approaches can be categorised as bottom-up or top-down [
8]. Bottom-up, or ‘physics-based,’ models capture the physical characteristics of urban elements. This modelling approach can simulate changes in the urban environment and thus provide more insight than top-down, or ‘data-driven’, approaches [
9], as changes to the physical system can be tested by modifying values, such as increasing the share of rooftop area allocated to solar PV. However, both bottom-up and top-down models require detailed data inputs for accurate modelling.
Given the wide range of UEM domains and applications and the detailed data requirements of both bottom-up and top-down models, it is important for urban energy modellers to understand the available data formats, sources, bridging methods, acquisition methods, and limitations, such as granularity and accessibility.
Several reviews of UEM data have been conducted. Herrera et al. [
10] review the methods of weather data creation for building simulation, including a critical analysis of each methodology and a discussion of challenges for weather data, such as climate change and the urban heat island (UHI) effect. Software tools for the generation of future weather files are reviewed and critically analysed by Moazami et al. [
11], who conclude care should be taken when selecting and using tools for the generation of future weather data, as weather predictions can vary depending on the prediction method. The effects of green and blue infrastructure (GBI) on temperature moderation in urban environments are reviewed by Bartesaghi-Koc et al. [
12], and the modelling techniques and data requirements for simulating the effects of urban GBI are systematically reviewed by Liu et al. [
13], who highlight the complexity and diversity of input data for modelling GBI and suggest improvements in data availability and accessibility. Advances in light detection and ranging (LiDAR) systems and the use of LiDAR data for generation of digital elevation models are reviewed by Liu [
14], with a specific focus on data filters, interpolation methods, resolution, and data reduction.
Gunduz et al. [
15] summarise research in indoor modelling and mapping, a key component of UEMs requiring accurate thermal models or tracking of people and objects within buildings. The results of in-situ insulation tests are reviewed by O’Hegarty et al. [
16], including a discussion of over- or under-performing insulation relative to standards and reported values. Modelling approaches for building occupant behaviour are reviewed and categorised by Happle et al. [
17], and the application of these modelling approaches to UEMs is discussed. Fuentes et al. [
18] review research in domestic hot water (DHW) consumption and the impacts of factors such as climate and seasonality on DHW use, focusing on tools for generating DHW consumption profiles for UEM applications.
Understanding distributed energy resources (DERs) is important for energy researchers and modellers, as discussed by Rathore et al. [
19] in their review of several types of solar photovoltaic (PV) cells and their applications. Small-scale wind turbines, an increasingly common type of DER, are reviewed by Tummala et al. [
20], who compare different kinds of turbines and highlight the effects of factors such as positioning and aero-acoustic performance. An analytical review of wind turbines and wind resource evaluations in urban environments is conducted by Tasneem et al. [
21], which discusses the importance of precise wind mapping and accurate wind data. Energy storage is another common type of DER, as discussed by Rahman et al. [
22] in their review of energy storage technologies, which provides a summary of costs and emissions data of available storage technologies.
Additionally, a number of reviews of UEM in general have been conducted. Oraiopoulos and Howard [
23] present the results of a systematic analysis of UEMs whose results were validated against measured data, highlighting the importance of accurate, relevant input data for model accuracy. Dahlström et al. [
24] review advancements in and challenges of urban energy modelling. Goy et al. [
25] review the impacts of different types of input data in UBEMs and present the results of a case study to rank the impact of input parameters on space heating demand. A comprehensive review of approaches, methods, and tools for UEM is conducted by Ali et al. [
26], which identifies challenges and promising techniques. Johari et al. [
27] provide a critical review of the field of UEM, including a discussion of possibilities, challenges, and potential future improvements. While they highlight the importance of increasing the accessibility and availability of data for UEMs, their review does not specifically review data requirements and challenges.
To date, data reviews focus on a narrow set of modelling data, often limited to a single domain. Research interests in specific domains of UEM often drive this specificity. For example, researchers quantifying the UHI effect are most concerned with local weather and external building geometries [
28]. In contrast, researchers interested in accurately modelling building energy consumption and Indoor Environmental Quality (IEQ) are most concerned with interior building layouts and occupant behaviour [
29].
However, practical UEM applications are naturally multi-domain, given the functional requirements for urban planning and infrastructure development are multi-domain. For example, power network planners are interested in the impact of weather pattern changes on heating and cooling loads and the impact of distributed energy resources (DERs) like solar PV. This practical need for multi-domain UEM is reflected in the growing body of recently developed multi-domain tools for UEM [
30]. Additionally, linking domains in UEMs offers the potential for greater accuracy, given that the physical systems they represent are themselves linked. For example, building heating and cooling causes heat flux into and out of the surrounding local climate and, at the same time, the local climate affects building heating and cooling loads. Thus, coupling building energy models and local climate models can increase the accuracy of both.
Overall, given the practical needs and opportunities for increased accuracy with multi-domain UEMs, a comprehensive multi-domain review of data for UEMs is required. However, to date, no such review exists.
This work provides a comprehensive review of multi-domain UEM data, with domains categorised as Climate, Geographic, Building, Transportation, Demographics, Energy Networks and Consumption, and Distributed Energy Resources, as shown in the figure below. Data formats, sources, acquisition methods, bridging methods, and challenges for each domain are identified. Additionally, key overall challenges for UEM data are explored, key considerations and practical implications are summarised, and recommendations are provided for multi-domain UEM data.
Figure. Overview and categorisation of data requirements for urban energy models.
The literature review for this work was conducted using several academic search engines, including Google Scholar, Scopus, and Consensus, an AI-powered academic search platform. These tools were chosen to ensure comprehensive coverage of relevant studies across multiple domains. Some references were selected according to the authors’ direct research experience, providing further depth and context to the data sources used to develop multi-domain urban energy models.
This work is structured as follows. Section 2 prefaces the work with a review of general data challenges across all domains of UEM. Section 3 provides a data review of each UEM domain: Climate, Geographic, Building, Transportation, Demographic, Energy Network, and Distributed Energy Resources. Section 4 identifies key considerations, practical implications, and recommendations for UEM data and data research. Section 5 concludes the review.
3.1. Climate Data
Weather drives building energy consumption and renewable energy generation. Weather datasets include external air temperature; humidity; wind speed and direction; pressure; precipitation; surface albedo; solar radiation and its components, including direct and diffuse radiation; and sky illuminance and its components [
45]. Location and time are recorded with latitude, longitude, elevation, and local time stamps. Data are compiled at sub-hourly to hourly time-steps, and statistical information, such as maximums, minimums, averages, and frequency distributions, is also often reported [
10]. Weather data are collected at weather stations, which are typically in remote locations away from the influence of micro-climatic effects [
46].
Weather datasets may consist of the historical weather conditions for a specific year, as with actual meteorological year (AMY) data, or typical data, as with typical meteorological year (TMY) data [
10]. TMY weather conditions are annual weather datasets representative of historical years, derived from historical data, and constructed by statistical methods [
47].
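As an illustration of what "constructed by statistical methods" can mean in practice, the sketch below picks, for one calendar month, the historical year whose daily values are distributed most like the long-term record. It uses a Finkelstein-Schafer-style distance between empirical cumulative distributions, which is one commonly described approach and not necessarily the one used by any cited dataset. The function names, the single-variable simplification, and the sample values are assumptions made for illustration; real TMY procedures weight several weather variables.

```python
from bisect import bisect_right


def empirical_cdf(sorted_sample, x):
    """Fraction of values in sorted_sample that are <= x."""
    return bisect_right(sorted_sample, x) / len(sorted_sample)


def fs_statistic(candidate, long_term):
    """Mean absolute difference between the candidate-month CDF and the
    long-term CDF, evaluated at each candidate value."""
    cand = sorted(candidate)
    ref = sorted(long_term)
    total = sum(abs(empirical_cdf(cand, x) - empirical_cdf(ref, x)) for x in cand)
    return total / len(cand)


def pick_typical_year(values_by_year):
    """Return the year whose values for this calendar month are closest,
    in the FS sense, to the pooled record of all years."""
    long_term = [v for values in values_by_year.values() for v in values]
    return min(values_by_year, key=lambda y: fs_statistic(values_by_year[y], long_term))


# Hypothetical daily mean temperatures (deg C) for the same month in three years.
january = {2001: [0.0, 1.0, 2.0], 2002: [10.0, 11.0, 12.0], 2003: [5.0, 6.0, 7.0]}
typical_january_year = pick_typical_year(january)
```

Repeating the selection for each of the twelve calendar months and concatenating the chosen months would give a single synthetic "typical" year under these assumptions.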
Recorded weather data’s temporal and spatial resolution limits their applicability and accuracy for UEMs. Typically, data from the closest weather station are used, which may not accurately represent the weather at the location of interest. However, weather generators producing “synthetic” weather data have been developed to overcome these limitations [
10]. “Meteonorm” [
48] generates stochastic TMY at time intervals as low as one minute and has been used for building simulation [
49]. “LARS-WG” [
50] is a stochastic weather generator that can downscale the spatial resolution of weather data [
51] and generate data for locations between weather stations. In some cases where only low-resolution data are available, the highest-resolution dataset is used [
52].
The urban heat island (UHI) effect is a phenomenon where towns and cities experience higher temperatures than surrounding areas due to differences in vegetation, surface albedo, increased heat retention, reduced airflow due to built-up intensity, and/or anthropogenic heat emissions [
53]. Additionally, interactions between buildings in the urban environment include shading between adjacent buildings, longwave radiant heat exchange, and solar reflection [
54]. Consequently, the urban form has an impact on building conditioning loads. Tools have been developed to capture the urban microclimate due to the UHI and the interactions between buildings, including ENVI-met [
13,
55] and Urban Weather Generator (UWG) [
56,
57].
Anthropogenic climate change means weather changes can occur within the lifetimes of current buildings and infrastructure, so climate change must be considered by urban energy modellers. Climate models are used to predict future weather patterns, which in turn are used to simulate buildings and predict future energy performance [
58]. The Intergovernmental Panel on Climate Change (IPCC) produces climate models for multiple scenarios. The models include Global Circulation Models (GCM) and finer-resolution Regional Climate Models (RCM) [
59]. However, their finest resolution is a 25 km by 25 km surface grid [
10]. Tools are available to generate representative localised weather data from these climate models [
11], including “CCWorldWeatherGen” [
60] and “WeatherShift” [
61].
3.2. Geographic Data
The urban environment contains natural and human-made features that can impact the energy performance of buildings and the effectiveness of renewable resources. Natural features include terrain elevation, parks, trees, and bodies of water. Human-made features include buildings and transportation infrastructure, such as roads. Many natural and human-made elements participate in microclimatic effects, affect wind flow, and produce shading, thus affecting energy performance. Failure to consider the effects of geographic elements, such as terrain, can limit model accuracy. For example, some UBEMs assume land surfaces are flat rather than modelling the effects of changing elevation, leading to inaccurate results [
62,
63]. Thus, the effects of terrain and other geographic data are important elements for consideration in UEMs.
3.2.1. Terrain and Elevation
Two types of Digital Elevation Models (DEM) exist: Digital Surface Models (DSM), which include natural and human-made features; and Digital Terrain Models (DTM), which exclude human-made features and provide the elevation of bare land. Elevation models are created from contour lines, topographic maps, global positioning system measurements, photogrammetry techniques, radar interferometry, stereo satellite images, and laser scanning, with vertical measurement accuracies varying according to the method used [
64]. Satellite techniques can produce DEMs spanning the globe, with vertical accuracies of 1.5-6 metre root-mean-square error (RMSE). In contrast, airborne laser scanning techniques, also known as light detection and ranging (LiDAR), require localised measurement but achieve higher vertical accuracies of approximately 0.15 metre RMSE [
14,
64].
A number of satellite-derived DEMs are freely available, which have been used to map urban structures [
62,
65,
66], including the Shuttle Radar Topography Mission (SRTM) elevation data, captured in the year 2000 with a 30-metre surface mesh [
67]; Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER), which has a 30-metre surface mesh and was released in 2009 [
68]; Advanced Land Observing Satellite (ALOS) World 3D surface model, which has a 30-metre surface mesh and is based on observations between 2006 and 2011 [
69]; and the Global Multi-resolution Terrain Elevation Dataset of 2010 (GMTED2010) [
70].
In addition to lower-accuracy satellite-derived DEM, higher-resolution LiDAR-derived DEM have also been produced. However, the availability of these higher-resolution data is limited. Furthermore, given its higher resolution of measurement, LiDAR has been used to determine building ages [
71], facades [
72], and building geometries [
73]. “OpenTopography” is an initiative to support data sharing and access to high-resolution topography [
74]. Highly accurate LiDAR datasets have been shared on this platform, including for large tracts of New Zealand, collected between 2010 and 2022 [
75].
Dataset age presents a barrier to the applicability of elevation datasets, particularly for extracting building features. Where building features are extracted, the building must have been built before the dataset was collected, which limits the use of such techniques for recently constructed buildings. Additionally, dataset accuracy presents another barrier. Although satellite-derived datasets are freely available and provide global coverage, they have low accuracy for building purposes and can produce significant errors when used to extract building dimensions [
62].
3.2.2. Natural & Human-Made Features
Built infrastructure (roading and buildings) and Green and Blue Infrastructure (GBI), which describes vegetation and bodies of water, affect the local microclimate through heat retention, changes to albedo, evapotranspiration, solar shading, and wind flow modification. GBI has been used to modulate local climate to increase comfort and decrease energy consumption [
13] and has been incorporated into UEM simulations as an energy efficiency intervention [
76,
77].
Local climate zone (LCZ) is a standard for characterising urban areas for local climate analysis, which is divided into 10 urban zones and 7 natural zones. Zones include categories for compact and open high-rise, mid-rise, and low-rise buildings, paved areas, bush cover, and water [
78]. Subclassifications, or “levels”, quantify further details, such as tree morphology and soil type [
78]. These zones have specific attributes relating to their impact on the local climate.
Various means have been used to measure climate-relevant land qualities, including remotely sensed spectral data; airborne LiDAR and spectral airborne-based LiDAR; low- and high-resolution satellite imagery; terrestrial laser scanning; aerial and terrestrial photography; and in-situ inspections [
12]. The World Urban Database and Access Portal Tools (WUDAPT) project is a global initiative to collate and disseminate relevant climate data [
79], including LCZ maps [
80], which can be generated automatically from satellite data [
81]. LCZ classifications are often used as a model input for urban climate modelling packages, such as ENVI-met [
82].
3.3. Building Data
Building energy modelling (BEM) involves detailed physics-based modelling of buildings to understand energy consumption and evaluate the effects of energy interventions [
83]. With advances in computing power, the related field of urban building energy modelling (UBEM) has emerged, involving the simulation of numerous buildings at an urban scale. While the two fields overlap, UBEM typically represents many buildings at a lower level of detail (LoD), while BEM represents fewer buildings (or one building) at a higher LoD [
84]. Thus, while the two fields require similar data types, the impact and importance of different data can vary between BEM and UBEM.
Both BEM and UBEM can be implemented in various software environments, including open-source applications. For example, EnergyPlus is an open-source building energy simulation package developed by the United States Department of Energy, which can model a range of building energy functions [
85].
3.3.1. Data Formats
Several data formats can be used to store UBEM data. Common formats include IFC and gbXML, which are standard for BEM; and GeoJSON, Shapefiles, and CityGML, which are more common for UBEM [
63]. The Industry Foundation Classes (IFC) format [
86] is used to describe building and construction data, and “Green Building XML” (gbXML) [
87] is designed for sharing data between building design and simulation tools. GeoJSON [
88] is a JSON-based file format for geospatial data, which is commonly used in mapping applications and UBEM platforms [
63] and is easily implementable in a range of applications. The OpenStudio City Modelling Framework extends the use of the JSON format for UrbanOpt [
89]. Shapefiles are a type of geospatial vector data format used to store and display geographic information, consisting of a set of files representing geographic features such as points, lines, and polygons [
90]. Shapefiles are commonly used in GIS systems [
63] and are thus well-suited to urban modelling applications, as GIS systems are specifically designed to handle and analyse spatial and geographic data and have well-developed resources for those tasks.
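To make the GeoJSON description concrete, the following minimal sketch builds a one-building FeatureCollection using only the Python standard library. The `type`, `geometry`, `coordinates`, and `properties` keys follow the GeoJSON structure; the property names (`height_m`, `storeys`, `use_type`), the helper name `building_feature`, and the coordinates are hypothetical and are not taken from any specific UBEM platform.

```python
import json


def building_feature(footprint, height_m, storeys, use_type):
    """Return a GeoJSON Feature for a single building footprint.

    footprint: list of (longitude, latitude) vertices in counterclockwise
    order, without repeating the first vertex at the end.
    The property names below are illustrative assumptions only.
    """
    ring = [list(vertex) for vertex in footprint]
    ring.append(ring[0])  # a GeoJSON linear ring must end where it starts
    return {
        "type": "Feature",
        "geometry": {"type": "Polygon", "coordinates": [ring]},
        "properties": {
            "height_m": height_m,
            "storeys": storeys,
            "use_type": use_type,
        },
    }


collection = {
    "type": "FeatureCollection",
    "features": [
        building_feature(
            [(174.7600, -36.8502), (174.7602, -36.8502),
             (174.7602, -36.8500), (174.7600, -36.8500)],
            height_m=9.0,
            storeys=3,
            use_type="office",
        )
    ],
}

# Serialise to text, as would be written to a .geojson file.
geojson_text = json.dumps(collection, indent=2)
```

Because the result is ordinary JSON, extra per-building attributes can be added under `properties` without changing the geometry, which is one reason the format is easy to adopt across tools.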
CityGML (City Geographic Markup Language) is an open and standardised data format specifically for cities and landscapes, which was developed by the Open Geospatial Consortium [
91]. CityGML has several features [
92] suiting the requirements of UEMs: (i) The format has native support for city elements such as city furniture, buildings, transportation, vegetation, and water bodies; (ii) The format supports multi-scale modelling. For example, geometries may be represented in four increasing levels of detail (LOD0-3); in LOD0, buildings are represented by surface footprints, while in LOD3, buildings have highly detailed representations described by sets of surfaces, internal and external geometries, apertures, and shading devices; and (iii) the format supports the development of application domain extensions (ADE), sets of additional classes, attributes, and relations.
EnergyADE is an ADE developed to allow for detailed single-building energy simulations and city-wide bottom-up energy assessments, including data for building physics, occupant behaviour, materials, construction, and building energy systems [
93]. EnergyADE is not intended for use with centralised energy infrastructures, such as district heating networks, as these are the focus of other extensions, such as the Utility Network ADE. However, the Utility Network ADE project appears to be stagnant, as the project is still in draft status and was last updated in 2012 [
94].
3.3.2. Building Geometry
Building geometry data requirements increase with increasing LoD [
95]. UBEM often uses models with lower complexity than those in single-building BEM, reflected by the lower LoD typically used for models [
63], which typically consists of external and assumed geometries, and single thermal zoning instead of multi-zoning [
44]. For example, UBEMs often include approximations such as the use of single, centrally located windows on building faces, which are sized to meet assumed or known window-to-wall ratios (WWRs) [
96]. Thus, UBEM can have lower per-building data requirements than high-LoD BEM.
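The single, centrally located window approximation can be sketched as follows. The sketch assumes a rectangular wall and, as a simplifying choice not stated in the source, a window with the same aspect ratio as the wall, so each dimension scales by the square root of the WWR. The `Window` class and `central_window` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Window:
    width: float     # metres
    height: float    # metres
    x_offset: float  # horizontal distance from the wall's edge, metres
    z_offset: float  # vertical distance from the wall's base, metres


def central_window(wall_width, wall_height, wwr):
    """Size one window, centred on a rectangular wall, to match a target
    window-to-wall ratio (WWR).

    Assumption: the window keeps the wall's aspect ratio, so both
    dimensions are scaled by sqrt(wwr).
    """
    if not 0.0 < wwr < 1.0:
        raise ValueError("wwr must be between 0 and 1 (exclusive)")
    scale = wwr ** 0.5
    width = wall_width * scale
    height = wall_height * scale
    return Window(
        width=width,
        height=height,
        x_offset=(wall_width - width) / 2.0,
        z_offset=(wall_height - height) / 2.0,
    )


# Example: a 10 m x 3 m wall with a 25 % window-to-wall ratio.
window = central_window(10.0, 3.0, 0.25)
```

Other placement rules, such as a fixed sill height or a continuous ribbon window, would give the same WWR with different geometry; the choice affects daylighting and shading results more than the aggregate glazed area.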
Diverse types of building geometries exist. Building external geometry data include building footprints, height, orientation, floor number and height, and roof shape. Façade geometry data include the size and positioning of doors, windows, and shading devices. Data may be simplified, such as simplifying the size and distribution of windows to a single WWR. Internal geometry data include an internal layout of partitions and the zone type. For example, zones can be classified as occupied/unoccupied, or according to use, such as office, bedroom, living area, and circulation area.
While building plans and building information models (BIM) often contain all the required building information for UBEM, access to these data can be limited and is typically restricted to municipalities. Additionally, as extracting the relevant data can be cumbersome and time-intensive even when they are available, it is typically infeasible to utilise BIM at the urban scale. Furthermore, in the case of aged building stock, BIM may not be available, and building plans exist in a variety of formats, such as scanned drawings. Overall, while building data are often available, extracting the required information for urban-scale analysis is typically impractical. Automated feature extraction from BIM or drawings therefore presents an opportunity to enhance the availability of building geometric data for UEM.
Building footprints provide building orientation and overall shape and can be vertically extruded to produce a 3D building shell [
27], a common workflow for UBEM production. Footprints can be generated from direct surveys, satellite imagery, aerial imagery, oblique photogrammetry, and LiDAR [
63], where image recognition and extraction can automate footprint generation [
97].
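The footprint extrusion workflow can be illustrated with a short sketch that derives the basic quantities of a 3D building shell from a 2D footprint and a height. It assumes a simple (non-self-intersecting) polygon with coordinates in metres in a projected coordinate system and a flat roof; the function name `extrude_footprint` and the output keys are hypothetical.

```python
import math


def extrude_footprint(vertices, height):
    """Extrude a 2D footprint vertically and return shell quantities.

    vertices: list of (x, y) tuples in metres, in order around the
    footprint, without repeating the first vertex at the end.
    height: building height in metres (a flat roof is assumed).
    """
    n = len(vertices)
    if n < 3:
        raise ValueError("a footprint needs at least three vertices")
    twice_area = 0.0
    perimeter = 0.0
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        twice_area += x0 * y1 - x1 * y0           # shoelace formula term
        perimeter += math.hypot(x1 - x0, y1 - y0)  # length of this wall
    floor_area = abs(twice_area) / 2.0
    return {
        "floor_area_m2": floor_area,
        "wall_area_m2": perimeter * height,
        "volume_m3": floor_area * height,
    }


# Example: a 10 m x 20 m rectangular footprint extruded to 6 m.
shell = extrude_footprint([(0.0, 0.0), (10.0, 0.0), (10.0, 20.0), (0.0, 20.0)], 6.0)
```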
Google’s Open Buildings footprints [
98], generated from satellite imagery using machine learning [
99], and OpenStreetMap (OSM) footprints, generated by users from various sources [
100,
101], provide freely available building footprints, which have been used for UBEM [
62,
102,
103]. Another footprint source is local municipality cadastral records, which may have greater accuracy than footprints derived from satellite imaging and have also been used for UBEM [
104]. Some cadastral records are openly available [
105,
106,
107,
108,
109,
110]. Accurate commercial mapping services are also available [
62]. In addition to footprints, mapping services provide urban layout data, such as the distribution and orientation of buildings, parks, and roading.
Building heights can be obtained from building plans and records [
111]; obtained from Energy Performance Certificates [
24,
112], which are required for certain buildings in the European Union (EU); obtained from tax records [
113]; or calculated by assuming floor-to-floor heights from the number of stories [
114,
115]. Heights can also be extracted from normalized Digital Surface Models (nDSM), which are created by subtracting the DTM from the DSM [
62]. Satellite-derived DEMs typically have low accuracy [
63,
116], but due to higher precision, LiDAR-derived DEMs can be used to calculate building heights with greater accuracy. Similarly, LiDAR data can produce detailed roof geometries [
73]. Additional methods to obtain building heights include oblique photogrammetry from aerial photography [
117] and shadow measurement [
118]. Where building heights have already been characterised, these data may be available as attributes in Google and OSM datasets [
98,
100] or city GIS datasets [
119].
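Two of the height estimation routes described above can be sketched in a few lines: differencing elevation rasters (surface elevation minus bare-earth elevation) and multiplying a storey count by an assumed floor-to-floor height. The rasters are represented as plain nested lists, the use of the median over footprint cells is an illustrative choice for robustness against edge cells, and the 3.0 m default floor-to-floor height is an assumption rather than a value from the source.

```python
from statistics import median


def height_from_elevation(dsm, dtm, footprint_cells):
    """Estimate a building height from co-registered DSM and DTM grids.

    dsm, dtm: equally sized nested lists of elevations in metres.
    footprint_cells: (row, column) indices of cells inside the footprint.
    The normalised surface is DSM minus DTM; the median over the
    footprint cells is used here as an illustrative aggregation choice.
    """
    ndsm = [
        [surface - terrain for surface, terrain in zip(dsm_row, dtm_row)]
        for dsm_row, dtm_row in zip(dsm, dtm)
    ]
    return median(ndsm[r][c] for r, c in footprint_cells)


def height_from_storeys(storeys, floor_to_floor_m=3.0):
    """Fallback estimate; the 3.0 m default is an assumed value."""
    return storeys * floor_to_floor_m


# Hypothetical 3 x 3 rasters with a building over the lower-right cells.
dtm = [[10.0, 10.0, 10.0], [10.0, 11.0, 11.0], [10.0, 11.0, 12.0]]
dsm = [[10.0, 10.0, 10.0], [10.0, 17.0, 17.0], [10.0, 17.0, 19.0]]
cells = [(1, 1), (1, 2), (2, 1), (2, 2)]

lidar_height = height_from_elevation(dsm, dtm, cells)
storey_height = height_from_storeys(2)
```

Comparing the two estimates for the same building is a simple consistency check when both an elevation dataset and a storey count are available.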
Fenestration information is often not included in building datasets [
44,
63], likely due to the difficulty of obtaining these data. Simple WWRs have therefore been determined by several means, including assumption from building archetypes [
96], professional judgement [
44], aerial infrared thermography [
120], and manual extraction from photographs [
111,
121]. However, the simplified WWRs lack accuracy for multi-zoned models, so detailed fenestration data, such as the size and positioning of individual windows, are required. Methods to extract detailed fenestration data include low-resolution aerial photographs with analytical extraction techniques [
122,
123], terrestrial and satellite photography and manual extraction [
124], street-based photography and artificial intelligence extraction techniques [
115], oblique aerial images and automated extraction [
115,
122,
125], and LiDAR scanning of façades with automated extraction [
72]. However, many of these techniques are cumbersome, and thus not readily implementable in UBEMs.
External shading devices, like louvres and overhangs, can affect building energy use by reducing solar gains and cooling loads. No research has been found for the automatic or manual characterisation of external shading devices, either by remote sensing or other means.
Modelling building interior partitions is common in traditional BEM but not in UBEM, due to the difficulty of collecting interior layout data at scale. However, model LoD and thermal zoning strategies do affect energy performance [
112], so including building interior modelling will increase the accuracy of BEMs and UBEMs. Several techniques exist for indoor mapping, including the use of one or a mixture of laser sensors and scanners, stereo-imaging, and sonar [
15,
126]. However, these techniques are often manual, time-intensive, and expensive. Crowdsourcing the task of harvesting internal boundaries to smartphones has been explored to overcome these limitations, and several platforms exist to convert smartphone data into indoor floorplans [
126]. No large-scale open datasets for indoor maps exist. However, Google Maps has added support for viewing multi-story indoor maps [
127]. This Google dataset is small and limited to commercial and public spaces, which have an incentive to publicise building layouts. Current datasets have limited applicability to the residential building stock and UBEM. Additionally, no research has been found for the integration of indoor maps, or derived thermal zoning, into UBEMs.
3.3.3. Building Constructions
Building material thermal performance data are required for accurate thermal modelling in BEM and UBEM. These data include material type, airtightness, and thermodynamic characteristics such as U-values. In BEM, physical constructions can be extracted from building plans and surveys, and U-values and other characteristics can then be obtained from standard values based on the extracted data. However, this procedure can be inaccurate, as measured values can differ substantially from idealised assumptions [
16]. Thus, where available, directly measured data are superior to those calculated from standard values.
Construction data for UBEMs can be collected from energy performance certificates [
104,
128]; professional judgement based on historical standards, construction type, and building age [
44,
129]; and national building archetype databases, such as Tabula in the European Union [
58,
130,
131,
132]. These methods have low accuracy, as they are typically not based on building measurements or direct observations, and thus can fail to account for key features, such as retrofitted insulation.
Although not yet implemented into UBEM, thermal performance data may be measured directly, which has the potential to overcome data accuracy and accessibility limitations. U-values can be calculated from on-site data, such as heat flux measurements and infrared thermography [
133,
134]. Large-scale acquisition of thermal property data via aerial and UAV infrared thermography has been proposed [
120,
135], although the technique has low accuracy. Additionally, image recognition may be used to classify exterior construction material [
136], which can be used to allocate buildings to the appropriate archetype. For this task, source images can be extracted from public repositories, such as Google Street View [
124]. Overall, the limited accuracy and availability of building construction data are ongoing limitations for UBEM.
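As a sketch of how a U-value might be calculated from on-site heat flux measurements, the example below divides the summed heat flux by the summed indoor-outdoor temperature difference over the monitoring period, an approach often referred to as an average method. It is a simplified illustration, not the procedure of any cited study: it ignores thermal storage effects, measurement uncertainty, and data quality criteria, and the function name and sample readings are hypothetical.

```python
def u_value_average_method(heat_flux, t_inside, t_outside):
    """Estimate a U-value in W/(m2 K) from paired time-series readings.

    heat_flux: heat flux density through the element, W/m2.
    t_inside, t_outside: air temperatures on each side, deg C.
    All three sequences must be sampled at the same instants.
    """
    if not heat_flux or not (len(heat_flux) == len(t_inside) == len(t_outside)):
        raise ValueError("inputs must be non-empty and of equal length")
    delta_t_sum = sum(ti - to for ti, to in zip(t_inside, t_outside))
    if delta_t_sum == 0:
        raise ValueError("no net temperature difference across the element")
    return sum(heat_flux) / delta_t_sum


# Hypothetical readings: 20 deg C inside, 0 deg C outside, three samples.
u_measured = u_value_average_method(
    heat_flux=[10.0, 12.0, 8.0],
    t_inside=[20.0, 20.0, 20.0],
    t_outside=[0.0, 0.0, 0.0],
)
```

A value measured this way could be compared with the archetype or standard value assigned to the same construction to quantify the gap between assumed and in-situ performance.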
3.3.4. Occupancy and Occupant Behavior
Occupant behaviour includes occupancy schedules, electrical plug loads, lighting schedules, temperature set points, window operation, solar shading operation, and hot water usage [
137]. Each of these behaviours is important for UEM [
25], as they drive building energy demand and can affect the timing of energy consumption, so they are key to understanding peak energy demands in energy networks [
138,
139] and can contribute to the increased potential for energy demand response [
140]. Several general approaches are used to represent occupancy and occupant behaviour in UEMs, including deterministic, stochastic, and agent-based methods [
17,
137,
141]. Deterministic occupancy schedules are often assumed based on expert judgement and are most often used in UBEM [
23]. However, where available, measured data from individual buildings are the most accurate [
142].
Time use surveys (TUS) are statistical surveys collecting data on how people spend their time. These surveys have been conducted in multiple countries and continents. By determining where occupants are likely to be at different times of day, TUS and similar datasets can be used to derive occupancy and behaviour schedules for residential and commercial buildings [
63,
137]. Data from smart electricity meters have also been used to determine occupancy and appliance use [
143].
Deterministic occupant behaviour profiles are provided by guidance documents from institutions such as CIBSE and ASHRAE [
144,
145] and from building standards [
146]. Stochastic profiles account for variability in human activity, which is critical for accurately predicting peak loads in large energy networks due to load diversification. Stochastic profiles can be based on people, such as creating a location profile for each occupant, or spaces, such as creating an occupancy profile for each room [
17]. Several stochastic occupancy and behaviour profiles have been developed, including stochastic occupancy schedules based on the American Time-Use Survey [
147]; and StROBe and newStROBe, derived from Belgian household data [
148,
149]. Agent-based models (ABMs), which simulate the behaviour of individuals and their interactions with other agents and the environment, have been used to generate occupancy and behaviour profiles [
150]. The use of ABMs to generate behavioural profiles is promising at the urban scale, as ABMs can couple transportation and building utilisation [
17] and lead to more accurate urban-scale occupancy distributions. Additional models have been developed to generate specific occupant behaviour, such as electrical plug loads [
151], hot water consumption [
18,
152], and both [
153].
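The effect of stochastic profiles on aggregate peaks can be illustrated with a short Python sketch. It draws a 0/1 occupancy profile for each of 200 dwellings from an assumed hourly presence probability and compares the coincident peak with the sum of individual peaks. The probabilities, the number of dwellings, and the independent draw per hour are simplifying assumptions; this is not the method used by StROBe or any other cited model, which typically include temporal persistence.

```python
import random

# Hypothetical deterministic schedule: probability of an active occupant per hour.
PRESENCE_PROB = [0.9] * 7 + [0.5] * 2 + [0.1] * 8 + [0.6] * 3 + [0.9] * 4  # 24 values


def sample_profile(prob_by_hour, rng):
    """Draw one stochastic 0/1 occupancy profile (independent draw per hour)."""
    return [1 if rng.random() < p else 0 for p in prob_by_hour]


def aggregate(profiles):
    """Sum occupancy across dwellings for each hour."""
    return [sum(hour_values) for hour_values in zip(*profiles)]


rng = random.Random(42)  # fixed seed so the sketch is reproducible
profiles = [sample_profile(PRESENCE_PROB, rng) for _ in range(200)]
total = aggregate(profiles)

coincident_peak = max(total)                       # peak of the summed profile
sum_of_peaks = sum(max(p) for p in profiles)       # every dwelling peaking at once
diversity_factor = sum_of_peaks / coincident_peak  # >= 1 by construction
```

Because individual profiles rarely peak in the same hour, the coincident peak cannot exceed the sum of individual peaks, which is the load diversity that identical deterministic schedules fail to capture.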
3.3.5. Building Systems
The correct characterisation of building systems, such as the rated capacity and efficiency of Heating, Ventilation, and Air Conditioning (HVAC) and Domestic Hot Water (DHW) systems, is important because these systems are a significant driver of building energy consumption [
23]. Building energy system characteristics have been extracted from existing databases, such as census data, the European EPC database, and the US Building Performance database [
26,
96,
154]. Most commonly, archetypes are used, where systems are assigned to buildings based on the year of construction, the existence of district heating systems, and professional judgement [
44,
119,
128,
155]. Building systems have also been ignored where the type of analysis is focused on other aspects of the energy system [
104]. While not yet deployed in BEM or UBEM, non-intrusive monitoring from smart power meters is becoming increasingly ubiquitous, and signal processing may be used to determine installed equipment, appliance loads, ratings, use, and efficiencies [
143,
156,
157,
158,
159].
3.4. Transportation Data
Transport has been studied at the urban level for classical ‘four-step’ transport demand models, which include trip generation, trip distribution, modal split, and traffic assignment [
160]; land use-transport interaction modelling; accessibility analysis [
161,
162]; transport poverty studies [
163]; and transport energy requirements [
164]. Including transportation in UEMs provides a broader and more useful energy system model [
3] and can improve overall model accuracy by accounting for the interconnection between vehicle use and building occupancy [
24].
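To make the trip distribution step of the four-step model concrete, the following Python sketch applies a production-constrained gravity model, one common formulation of this step. The zone totals, travel costs, exponential deterrence function, and the value of `beta` are all illustrative assumptions rather than values from the cited studies.

```python
import math


def gravity_distribution(productions, attractions, cost, beta=0.1):
    """Production-constrained gravity model.

    T_ij = O_i * D_j * f(c_ij) / sum_k(D_k * f(c_ik)), with f(c) = exp(-beta * c).
    """
    trips = []
    for i, origin_trips in enumerate(productions):
        weights = [a * math.exp(-beta * cost[i][j]) for j, a in enumerate(attractions)]
        total = sum(weights)
        trips.append([origin_trips * w / total for w in weights])
    return trips


# Hypothetical three-zone example (trips per day, travel cost in minutes).
productions = [1000, 500, 200]
attractions = [300, 900, 500]
cost = [[5, 15, 30], [15, 5, 20], [30, 20, 5]]
od_matrix = gravity_distribution(productions, attractions, cost)
```

Each row of `od_matrix` sums to the trips produced by that origin zone, so the output can be passed to the modal split and assignment steps, or used to link vehicle departures with building occupancy.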
A range of data sources exists for modelling urban transportation. Vehicle licensing records provide annual travel demand based on odometer readings and vehicle demographics, such as year, make, and model. Travel diaries and surveys, which record individual or household travel patterns for several days, have been used for multivariate statistical mode choice modelling [
165]. Traffic counts, which record the vehicles passing through sampled streets, have been used for congestion monitoring, analysis of travel patterns, and validation of other transport models [
166,
167]. Census data provide demographic data, including car ownership, and can include travel-specific data, such as travel demand, and cognitive factors, such as travel attitudes and preferences [
165]. Census data benefit from large sample size and the inclusion of multiple factors such as region, household, and income, which improve their usefulness to modellers. Geographic data include land use and the spatial layout of road and transit networks, and departure and destination locations, such as residential areas, food, health, education, and retail facilities [
168]. Geographic data are available from open sources, such as OpenStreetMap. The General Transit Feed Specification (GTFS) is an open standard for public transportation schedules and associated geographic information, such as rail and bus stops, routes, and timetables [
169], which can support urban transport accessibility studies [
170].
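As a hedged illustration of how GTFS timetables might feed an accessibility study, the sketch below reads a small excerpt in the format of the GTFS `stop_times.txt` file and computes the gaps between consecutive departures at one stop. The field names follow the GTFS standard, while the trip and stop identifiers and the times are invented for the example.

```python
import csv
import io

# Hypothetical excerpt of a GTFS stop_times.txt file (field names follow the standard).
STOP_TIMES = """trip_id,arrival_time,departure_time,stop_id,stop_sequence
t1,07:00:00,07:00:30,S1,1
t2,07:15:00,07:15:30,S1,1
t3,07:45:00,07:45:30,S1,1
t1,07:10:00,07:10:30,S2,2
"""


def to_seconds(hms):
    """Convert HH:MM:SS to seconds; GTFS allows hours >= 24 for trips past midnight."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def headways_minutes(stop_times_text, stop_id):
    """Return gaps in minutes between consecutive departures at one stop."""
    rows = csv.DictReader(io.StringIO(stop_times_text))
    departures = sorted(
        to_seconds(row["departure_time"]) for row in rows if row["stop_id"] == stop_id
    )
    return [(later - earlier) / 60 for earlier, later in zip(departures, departures[1:])]


headways = headways_minutes(STOP_TIMES, "S1")
```

A full feed would also require `calendar.txt` and `trips.txt` to separate service days, which this sketch ignores.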
In addition to classical sources, technological developments such as the internet of things (IoT) and the advent of Big Data have provided several new sources of transportation data [
171,
172]. Global positioning system (GPS) loggers are now present in smart phones, wearable technologies, and many other devices, including vehicles. GPS data are used by services such as Google Maps and ride-sharing apps to provide information on congestion and adjust route generation. Similarly, social media posts are often ‘geo-tagged’, marking their position using the GPS capabilities of smart phones. Thus, social media data-harvesting can extract population-level travel behaviours and locations of interest but lacks accuracy at an individual level due to privacy restrictions [
173,
174]. Similarly, IoT devices, such as traffic cameras and smart traffic lights, can generate real-time traffic data [
175,
176]. Smart ticketing systems in public transport record travel times and entry and exit locations so they can provide detailed information on public transit use. However, these data are limited to the transit network and thus do not reveal passengers’ actual origins or destinations. Records of network connections, such as cellular and wireless networks, contain time and approximate location. They can be used to infer travel routes but are limited to passengers connected to these networks during their transit.
Overall, while data-rich resources are becoming increasingly available, considerable barriers must still be overcome before large-scale integration into UEMs. Specific challenges include data integration and sharing, which are hindered by data volumes and private ownership; data ownership and privacy; and data quality and standards [
171].
3.5. Demographic Data
Demographic data describe the characteristics of a population and include socio-economic data, such as age, gender, ethnicity, income, employment status, household composition, and location, all of which can affect energy consumption [
177]. For example, space heating use typically increases with income level and with age [
178]. Thus, regions with higher household income levels and/or older populations may have higher energy consumption, so socio-economic data are important inputs for accurate UEMs.
Demographic data are commonly employed in top-down UEMs [
179], where regression is performed to correlate demographic characteristics with energy consumption patterns, and projections can be performed, such as forecasts of population growth. In bottom-up UEMs, demographic data are used indirectly to assign other inputs, such as space heating methods.
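The regression step in a top-down model can be sketched as an ordinary least squares fit between one demographic variable and energy consumption. The example below uses a single predictor and synthetic, perfectly linear district-level figures chosen only to keep the arithmetic transparent; real studies use several predictors and measured data.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Synthetic district-level aggregates (illustrative only, not measured data).
median_income_k = [40, 55, 70, 85, 100]     # thousand currency units per household
energy_mwh = [6.0, 7.5, 9.0, 10.5, 12.0]    # annual MWh per household

intercept, slope = fit_line(median_income_k, energy_mwh)
predicted = intercept + slope * 90  # projection for a district with income 90
```

The fitted coefficients can then be applied to projected demographic values, such as forecast income or population, to estimate future consumption.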
Sources for demographic data include surveys, billing data, tax records, and census records [
26,
180,
181,
182]. Census data are the most common demographic data source, owing to the high detail, breadth, quality, and accessibility of the data collected. Limitations of census data are the low frequency of collection, with intervals of several years being common, and the lack of specific energy-related data. Due to privacy concerns, census data are anonymised, which limits the linking of energy and census datasets at an individual or household level and thus prevents correlations between demographic data and energy consumption patterns at this level. Instead, aggregate correlations are used, which can limit the efficacy of census data for UEM.
3.6. Energy Network and Consumption Data
Energy networks can include electricity networks, distribution of gas and other fuels, and district heating schemes. In these networks, the placement and sizing of elements, such as distribution pipes, electricity lines, pumps, and electricity transformers, can vary. Thus, network-specific data are required to identify limitations, such as the capacity of electricity distribution transformers, and for planning and system design.
Consumption data in these energy networks are required to tune and validate bottom-up energy models, and are the primary input for top-down, data-driven models [
26]. Energy consumption data can be collected at any level: dwelling, building, district, region, or nation. These data are typically collected by meters, such as electricity meters at the building or district level. Lower-level energy consumption data are the least aggregated and thus the most useful for high-granularity UEMs considering individual buildings or districts. However, access to dwelling- and building-level data is limited due to privacy concerns, so access typically requires the consent of each consumer [
183]. Additionally, because meters are typically owned by energy retail companies, the data are usually proprietary, requiring the owner’s cooperation for access. For example, electricity consumption data are plentiful in New Zealand due to the nationwide roll-out of smart electricity meters [
184], but these data are owned by the electricity retail companies and are typically unavailable for energy modelling. To address this limited accessibility, some studies in New Zealand have collected non-proprietary, anonymised electricity consumption data from small numbers of consumers [
184]. However, while these studies can help to address data privacy and ownership concerns, their small sample sizes raise concerns about representativeness. Additionally, small-scale studies of energy consumption can be particularly prone to issues of incompleteness [
185], and issues can arise when combining datasets because of differences between them [
186].
District-level consumption data can be sourced from energy distributors and are aggregated at sufficient resolution to overcome privacy concerns [
104]. However, these data are proprietary, so access requires the distributor’s participation. As distributors are not typically incentivised to participate in data-sharing, access to these data is often limited [
44]. For similar reasons, access to energy network system design, such as network layouts and system capacities, is often limited [
34]. However, high-level data are sometimes available directly from distributors, such as PowerCo in New Zealand, which provides an interactive map of its medium-voltage electricity network for planning purposes [
187].
High-level energy consumption data are usually more openly accessible than lower-level data. At the state or national level, data reporting is often mandated, making energy network and consumption data frequently publicly accessible. For example, in New Zealand, the Electricity Authority reports electricity power flow and location-specific electricity costs [
188]. However, the high-level aggregation in these data means they may be poorly suited to models with higher granularity.
Overall, high-level consumption and network data are typically more widely accessible than lower-level data, but their aggregation means they may be less useful for energy modellers. Conversely, low-level data are typically more useful but are often inaccessible because of privacy and/or commerciality concerns. Thus, improving the accessibility of high-granularity data, such as from individual buildings’ energy meters, would increase the quality of UEMs requiring these data. However, the privacy of energy consumers must be protected, so datasets should be appropriately anonymised and retain useful, non-identifiable information, such as the region from which the data were collected.
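One simple way to release meter data while retaining the region of origin is to drop meter identifiers and publish regional summaries only where enough meters contribute. The Python sketch below assumes a hypothetical record layout and an arbitrary minimum group size of three; minimum-count suppression is a basic disclosure control and does not by itself provide a formal privacy guarantee.

```python
from collections import defaultdict

# Hypothetical records: (meter_id, region, annual_kwh). Meter IDs are dropped on output.
READINGS = [
    ("m1", "north", 7200), ("m2", "north", 6800), ("m3", "north", 8100),
    ("m4", "south", 5400), ("m5", "south", 9100),
]


def anonymise_by_region(readings, min_meters=3):
    """Return per-region mean consumption, suppressing regions with too few meters."""
    by_region = defaultdict(list)
    for _meter_id, region, kwh in readings:
        by_region[region].append(kwh)
    return {
        region: {"meters": len(values), "mean_kwh": sum(values) / len(values)}
        for region, values in by_region.items()
        if len(values) >= min_meters
    }


summary = anonymise_by_region(READINGS)
```

In this example, only the region with three meters is reported, while the region with two meters is withheld.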
3.7. Distributed Energy Resources
Distributed energy resources, such as solar PV, micro-wind turbines, and battery technologies, are increasingly being integrated into electricity systems at the building and district levels. Thus, variables affecting distributed energy resources, such as resource potentials, weather data, technological specifications, and costs, are increasingly important data inputs for UEMs. In addition to the weather data sources described in Section 3.1, the World Bank’s Global Wind Atlas and Global Solar Atlas projects provide verified estimates of wind and solar resource potential for locations around the world [
189,
190]. Detailed resource potentials can also be estimated at the national level. OpenEI (Open Energy Info) provides the Open Energy Data Initiative data portal, where high-resolution resource potential data are available for North America [
191], and Data Europa provides similar data for Europe [
192].
Energy technology costs, related emissions, and specifications, such as PV panel efficiencies and temperature coefficients, power curves for micro-wind turbines, inverter conversion efficiencies, and battery efficiencies, are important considerations for the effective design of urban energy systems. The most relevant energy technologies at the urban scale are solar PV, small wind turbines, and battery technologies, as these are the most common [
193]. Accurate technology specifications are available from manufacturers’ product datasheets. However, at early design stages, or when making technology- rather than product-level decisions, general figures are most useful. Generalised energy technology data have been produced by institutional scientific review and are generally included in the internal libraries of energy modelling software platforms, such as SAM and HOMER [
194,
195]. Solar PV technologies have various established and emerging cell architectures. Important solar cell parameters include cell efficiency and power performance with temperature. Solar PV cell types are reviewed in [
19], where cell type efficiencies and temperature coefficients are quantified, and qualitative data are provided on size, cost, and high-temperature performance. NREL provides technical reports on the module costs of commercially available and emerging solar PV technologies [
196].
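The role of the temperature coefficient can be shown with a widely used linear model, in which output scales with irradiance and is derated for cell temperatures above standard test conditions (1000 W/m2 and 25 °C). The sketch below is a simplified illustration; the default coefficient of -0.004 per °C and the module rating are assumed values, and product datasheets should be consulted for real designs.

```python
def pv_power_w(rated_w, irradiance_w_m2, cell_temp_c, temp_coeff_per_c=-0.004):
    """DC power from a linear temperature-coefficient model referenced to STC."""
    stc_irradiance = 1000.0  # W/m2 at standard test conditions
    stc_temp = 25.0          # degC at standard test conditions
    derate = 1.0 + temp_coeff_per_c * (cell_temp_c - stc_temp)
    return rated_w * (irradiance_w_m2 / stc_irradiance) * derate


# Hypothetical 400 W module on a hot day with 800 W/m2 irradiance and a 55 degC cell.
hot_day = pv_power_w(rated_w=400, irradiance_w_m2=800, cell_temp_c=55)
```

Under these assumptions, the 30 °C rise above the reference temperature reduces output by 12% relative to the irradiance-scaled rating.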
Various terms are used for small-scale wind generation, the form most relevant at the urban scale, including micro-wind, small-wind, and building-integrated wind generation. Wind electricity generation at this scale is still an emerging technology and is less mature and less widespread than solar PV. Consequently, existing reviews are limited to the technology types [
20] and the connection between the local wind resources and small-scale wind turbines [
21]. Data are lacking on installation and operating costs and on overall energy costs, such as LCOE calculations, for small-scale wind. This is likely due to the limited number of installations and the strong dependence of these costs on installation location and local topography, which makes costs and energy yields far more variable for micro-wind than for solar PV.
Energy storage technological options are reviewed in [
22], where factors such as rated power, specific energy, energy efficiency, discharge time, response time, lifetime cycles, self-discharge, and relative costs are compared. The review highlights the high variability within technologies and provides ranges of these factors for different technologies. Additionally, at an institutional level, the European Commission’s Directorate-General for Energy has published a database of European energy storage technologies and facilities [
197], which provides simplified costs and specifications. Increasingly, appliances with flexible electricity demand, such as hot water cylinders [
198,
199] and electric vehicles [
34], are being used as distributed energy resources to provide ancillary services to the power system, such as reducing peak electricity demand and increasing utilisation of intermittent renewable generation. Thus, understanding this process, known as demand response (DR), is becoming increasingly important for urban energy modellers, as DR programs can influence electricity time-of-use and total energy use. The constraints and considerations of DR, including technical, economic, and behavioural factors affecting DR programs, are reviewed in [
183].
In addition to the technology-specific sources described above, the United States Energy Information Administration produces an “Annual Energy Outlook” representing their assessment of the total system costs to develop and install various electricity generation technologies [
200]. Additionally, financial consultancy Lazard produces annual reports on the levelized cost of energy (LCOE) for different generation and storage technologies [
201].
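For readers unfamiliar with the metric, LCOE is the ratio of discounted lifetime costs to discounted lifetime energy output. The Python sketch below implements this standard definition under the simplifying assumptions of constant annual operating costs and energy yield, with all capital expenditure incurred in year zero; the function name and inputs are illustrative and are not drawn from the cited reports.

```python
def lcoe(capex, annual_opex, annual_energy_kwh, discount_rate, lifetime_years):
    """Levelised cost of energy: discounted lifetime costs / discounted lifetime energy."""
    costs = capex  # capital cost is assumed to fall entirely in year zero
    energy = 0.0
    for year in range(1, lifetime_years + 1):
        factor = (1 + discount_rate) ** year
        costs += annual_opex / factor
        energy += annual_energy_kwh / factor
    return costs / energy


# Hypothetical rooftop PV system: cost per kWh in the same currency as the inputs.
example = lcoe(capex=10000, annual_opex=100, annual_energy_kwh=7000,
               discount_rate=0.05, lifetime_years=25)
```

Because the capital cost is not discounted while future energy is, a higher discount rate raises the LCOE of capital-intensive technologies, which is one reason published figures vary between sources.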
Finally, electricity generation-related emissions are reviewed and reported by the Intergovernmental Panel on Climate Change (IPCC) in the “IPCC Special Report on Renewable Energy Sources and Climate Change Mitigation,” which reviews Life-Cycle Analyses (LCAs) of electricity generation technologies, calculates the lifetime emissions attributed to each generation technology, and compares emissions with lifetime generation [
202]. For combustion sources, such as wood or gas boilers, the IPCC publishes an “Emission Factor Database” that catalogues emission factors for combinations of fuels and combustion techniques [
203].
In general, energy resource data have several limitations. Due to economies of scale, energy technology data are often available for utility-scale systems but are unsuitable for smaller urban-scale applications. Energy technology data can also fail to account for variability, as data are typically presented as a single representative figure. Where data ranges are provided, they are often inordinately large and unaccompanied by information on when they apply, so they are impractical for use by energy modellers. Thus, a large-scale repository of DER products, with manufacturers’ specifications, indicative costs, and typical energy yields, would be a useful and timesaving addition to urban energy systems design. Such a repository would allow direct comparisons between technologies and products with a high level of accuracy. However, no such repository currently exists.
Recent technological advancements have driven rapid growth in the field of urban energy modelling (UEM), enabling the analysis of interconnected, multi-domain urban energy systems and requiring data inputs from various domains. This work provides a comprehensive review of multi-domain UEM data requirements, including data formats, sources, acquisition methods, bridging methods, and challenges. Data inputs are categorised into climate, geography, building, transportation, demographics, energy networks and consumption, and distributed energy resources. Additionally, several key challenges are identified, which are common to multiple domains. Although specific challenges can vary depending on the requirements of a given model, in general, improving the availability, accessibility, and quality of high-impact data should be considered a priority. Key implications and recommendations for multi-domain UEM data are provided. Overall, substantial amounts of data exist, but their use is encumbered by a lack of coordination and standardisation of formats and by privacy and commerciality concerns. Consequently, coordinated effort by researchers, data owners, and data collectors is required to increase access to these data, which will improve the results of multi-domain UEMs.
Conceptualization, D.B.; Methodology, D.B., B.L.M.W.; Investigation, D.B., B.L.M.W.; Writing—Original Draft Preparation, D.B.; Writing—Review & Editing, D.B., P.G., B.L.M.W.
This research received no external funding.
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.