Article Open Access
Received: 08 November 2022 Accepted: 16 December 2022 Published: 22 December 2022
© 2022 The authors. This is an open access article under the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/).
Since the early years of Remote Sensing, image fusion has been one of the main procedures for processing satellite images, and it is still actively studied today. Image fusion procedures allow, for example, the spatial resolution of multispectral (MS) images to be improved by exploiting the panchromatic (PAN) image of better spatial resolution, while preserving to a large extent the spectral information of the original MS image [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19] in the new fused image.
The most common spatial resolution ratio between the PAN and MS images of the same satellite system is 1/4, i.e., for a spatial resolution of A m in the MS image, the spatial resolution of the PAN image is A/4 m. In the literature, one can find countless image fusion papers using this ratio [9,10,11].
Other spatial resolution ratios, such as 1/3 and 1/60, are also exploited, mainly when fusing images from different satellite systems [12,13,14].
In the case of cameras used in unmanned aerial vehicles (UAVs), the spatial resolution ratios between the color (RGB) sensor (R: Red, G: Green, B: Blue) and the MS sensor (e.g., Parrot Sequoia+, Sentera Quad Multispectral Sensor, Sentera AGX710, Sentera 6X Multispectral Sensor, Slantrange 4P+, Sentek Systems GEMS, MicaSense RedEdge, etc.) are mainly 1/4 and 1/3 (until a few months ago, no UAV camera had a PAN sensor).
In a previous paper [20], the RGB and MS images of the same UAV camera, the Sequoia+ (by Parrot), were fused in order to improve the spatial resolution of the MS image and thus improve the classification accuracy obtained by exploiting the fused image. For the same reasons, in this paper, images from different UAV cameras are fused.
As in the previous paper [20], the same question could be asked here: when the flight height of UAVs is so small (a few meters or tens of meters), and a very good spatial resolution is therefore already available in the MS image, why is it necessary to improve the spatial resolution of the MS image further? A first answer is given by the shift of major manufacturers of multispectral cameras for UAVs, such as MicaSense (RedEdge-P camera with a high-resolution panchromatic band), towards new UAV cameras that carry a PAN sensor in addition to the MS sensor, with the possibility of improving the spatial resolution of the MS image as their main argument.
However, before the above image fusion is performed, the horizontal and vertical accuracy of the generated products is determined using Ground Control Points (GCPs) and Check Points (CPs), and the changes of these accuracies with a 50% increase (or decrease) of the UAV's flight height are determined. The cameras utilized are the Phantom 4's 1/2.3" CMOS 12.4 Mp RGB camera and the MS+RGB camera Sequoia+ (by Parrot), while the study area is the Early Christian Basilica C of the Amphipolis archaeological site (Eastern Macedonia, Greece, Figure 1).
The Early Christian Basilica C is located in the acropolis of ancient Amphipolis (Figure 2 and Figure 3), at an altitude of ~120 m. It dates back to the 6th century AD and came to light after excavations carried out in the 1960s and 1970s. It consists of the main temple, measuring ~28×18 m (Figure 2), with three aisles separated by two colonnades of six columns each. In the eastern part of the temple there is a niche (semicircular arch) with a radius of ~6.5 m. In the western part of the temple, perpendicular to the aisles, is the narthex, whose dimensions are ~16.5×4 m, while in the southern part of the temple is the atrium. It is worth noting that magnificent mosaics were found on the floor of the narthex and in the three aisles. Roman buildings were discovered to the west and south of the temple, and it is suspected that much of the western buildings lie beneath Basilica C [21,22,23].
The Phantom 4, which is equipped with a 1/2.3" CMOS 12.4 Mp RGB camera (from now on called the RGB Phantom), was used for the mapping. The Sequoia+ camera (by Parrot) was mounted on the UAV at the same time. The main characteristics of the two cameras are presented in Table 1.
For the measurement of the X, Y, Z coordinates of 18 GCPs and 20 CPs (Figure 4) in the Greek Geodetic Reference System 1987 (GGRS87), 24×24 cm paper targets (Figure 5) and the Topcon Hiper SR GPS (RTK: 10 mm horizontal accuracy and 15 mm vertical accuracy) were used.
The flights took place on 21/02/2022, from 11:00 am to 12:30 pm, with a ground temperature of 14 °C and no cloud cover. Flight heights were 30 m and 45 m, and the flight speed was the minimum (~2 m/s). The autopilot was set to cover a larger area than the study area to ensure there were no mapping gaps. From the set of strips and images captured by both cameras, only a subset (less than the available information) was utilized, but it fully overlapped the study area. The image overlaps were set to 80% forward and 80% side for the RGB Phantom. For the Sequoia+, the 80% forward overlap was calculated and introduced as a time lapse in the camera software. As can be observed in Figure 6, the images (RGB or MS) of the Sequoia+ cover ~85% of the surface area of the RGB Phantom images. Consequently, the constant 80% side overlap of the RGB Phantom images corresponds to ~65% or ~80% side overlap in the Sequoia+ images (sufficiently good overlap rates for the processing of the Sequoia+ images). For this reason, in the case of the Sequoia+ and for a flight height of 45 m, an additional strip of images was exploited so that the study area has no mapping gaps (this was not necessary for the 30 m flight height, where the same number of strips covered the study area without mapping gaps for both cameras). Thus, for the 30 m flight height, 5 strips with a total of 30 images were used for both the RGB Phantom and the Sequoia+ (RGB or MS), while at 45 m, 2 strips with a total of 10 images were used for the RGB Phantom and 3 strips with a total of 13 images (RGB or MS) for the Sequoia+.
The radiometric quality of the data from MS cameras for UAVs is still uncertain, and for this reason it is necessary to calibrate the spectral information with spectral targets. The reflectance response of the spectral targets is measured in situ with a spectrometer [24,25,26,27,28,29,30,31,32]. In this paper, a spectrometer was not available; therefore, shortly before the end of image acquisition, the appropriate calibration target of the Sequoia+ was imaged [26,27,28,29,30,31,32,33]. The target was automatically detected by Agisoft Metashape©, and the reflectance values of the green, red, red-edge, and NIR spectral bands were calculated.
Agisoft Metashape© was utilized to produce the Digital Surface Models (DSMs) and orthophoto mosaics of both cameras, for both flight heights (Figure 7 and Figure 8). The cameras used in this paper employ column-parallel readout circuits, which operate with line memories produced by the simultaneous readout of all pixels in a row. The readout is conducted from top to bottom, row by row (rolling shutter). A drawback of this process is that pixels in different rows are exposed to light at different times, causing skew and other image problems, especially for moving objects, and consequently decreasing image quality [34,35,36,37,38,39]. Thus, to minimize these errors, the “Enable rolling shutter compensation” option was enabled in the software during the above image processing. Using the 18 GCPs on all image blocks, the results of the processing are presented in Table 2.
The DSMs and the corresponding orthophoto mosaics generated for both flight heights were then exploited to manually extract (using ArcMap©) the coordinates (X’, Y’, and Z’) of the CPs. This allowed their comparison with the X, Y, and Z coordinate values of the corresponding CPs measured by GPS in the field. Thus, on the one hand, it was determined whether the final products are accompanied by systematic or random errors (Table 3–Table 8) and, on the other hand, the mean values and standard deviations of the differences ΔX, ΔY, and ΔZ (Table 9 and Figure 9) were calculated for both flight heights.
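To illustrate this comparison step, the following minimal Python sketch (with hypothetical file names; in the paper the extraction was done manually in ArcMap© and the statistics are compiled in Table 9) shows how the differences ΔX, ΔY, ΔZ and their means and standard deviations could be computed once both coordinate sets of the 20 CPs are available:

```python
import numpy as np

# Hypothetical input files: coordinates of the 20 CPs read from the DSM/orthophoto
# mosaic (X', Y', Z') and the corresponding GPS field measurements (X, Y, Z),
# in metres (GGRS87). Each file holds one row per CP with three columns.
xyz_product = np.loadtxt("cps_from_products.csv", delimiter=",")  # shape (20, 3)
xyz_field = np.loadtxt("cps_from_gps.csv", delimiter=",")         # shape (20, 3)

deltas = xyz_product - xyz_field      # per-CP differences ΔX, ΔY, ΔZ
means = deltas.mean(axis=0)           # mean difference per axis
stds = deltas.std(axis=0, ddof=1)     # sample standard deviation per axis

for axis, m, s in zip(("ΔX", "ΔY", "ΔZ"), means, stds):
    print(f"{axis}: mean = {m:+.3f} m, std = {s:.3f} m")
```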
The Phantom 4 does not include a PAN sensor; therefore, following the satellite image processing procedures in which satellites equipped with a PAN sensor utilize it for image fusion, the RGB orthophoto mosaic of the RGB Phantom (flight height 30 m) was transformed into a Pseudo-Panchromatic (PPAN) orthophoto mosaic (Figure 10 and Figure 11) [40,41].
The transformation results in a black and white (B/W) image in which the intensity value of each pixel is the average of the corresponding pixel intensities of the R, G and B bands. Obviously, there are spectral differences between a PPAN image and the PAN image of a sensor that is sensitive to the visible part of the spectrum: until now, techniques for transforming RGB images into B/W images have been driven by optimal human visual perception, in contrast to the spectral approach of real PAN images.
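A minimal sketch of this band-averaging transformation is given below (Python/NumPy; the function name and the synthetic patch are illustrative only and do not represent the software actually used to produce the PPAN orthophoto mosaic):

```python
import numpy as np

def rgb_to_ppan(rgb: np.ndarray) -> np.ndarray:
    """Convert an RGB image of shape (H, W, 3) to a pseudo-panchromatic (PPAN) image.

    Each PPAN pixel is the arithmetic mean of the corresponding R, G and B intensities.
    """
    return rgb.astype(np.float64).mean(axis=2)

# Example with a small synthetic RGB patch
rgb_patch = np.random.randint(0, 256, size=(4, 4, 3), dtype=np.uint8)
ppan_patch = rgb_to_ppan(rgb_patch)
print(ppan_patch.shape)  # (4, 4)
```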
Subsequently, the histogram of the PPAN orthophoto mosaic was adjusted to the histogram of the MS orthophoto mosaic from the Sequoia+ (flight height 30 m). The fused image (Figure 10 and Figure 11) was created using the Principal Component Analysis (PCA) technique. In terms of the output produced, any fused image B*h should be as close as possible to the image Bh that the corresponding sensor would observe at the highest resolution h, if it existed. Thus, the correlation table (Table 10) of the original MS orthophoto mosaic with the fused image reveals the retention rate of the original spectral information (which should be >90%, i.e., >+0.9) [46,47,48,49,50]. (Two other techniques, the Multiplicative and the Brovey Transform, were also applied [49,50,51,52,53]; they did not give better results in the retention of spectral information and are therefore not analyzed in this paper.)
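For readers unfamiliar with the technique, the following Python sketch outlines a generic PCA pan-sharpening scheme together with the band-wise correlation check. It is not the exact workflow used to produce Figure 10 and Figure 11; in particular, matching the (P)PAN histogram to the first principal component, as assumed here, is only a common convention:

```python
import numpy as np
from sklearn.decomposition import PCA
from skimage.exposure import match_histograms

def pca_fusion(ms: np.ndarray, ppan: np.ndarray) -> np.ndarray:
    """Generic PCA-based fusion of an MS image with a (pseudo-)panchromatic image.

    ms   : (H, W, N) multispectral image, already resampled to the PPAN pixel grid
    ppan : (H, W)    pseudo-panchromatic image
    Returns the fused (H, W, N) image.
    """
    h, w, n = ms.shape
    pca = PCA(n_components=n)
    pcs = pca.fit_transform(ms.reshape(-1, n).astype(np.float64))

    # Match the PPAN histogram to the first principal component, then substitute it
    pc1 = pcs[:, 0].reshape(h, w)
    pcs[:, 0] = match_histograms(ppan.astype(np.float64), pc1).ravel()

    # Back-transform to the original spectral space
    return pca.inverse_transform(pcs).reshape(h, w, n)

def band_correlations(ms: np.ndarray, fused: np.ndarray):
    """Pearson correlation per band between the original MS and the fused image."""
    return [float(np.corrcoef(ms[..., k].ravel(), fused[..., k].ravel())[0, 1])
            for k in range(ms.shape[-1])]
```

In such a scheme, retention of the original spectral information can be judged by checking that every value returned by band_correlations exceeds +0.9, as required above.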
Starting with the comparison of Figure 10d and Figure 11d with the corresponding Figure 10e and Figure 11e, the need to improve the spatial resolution of the MS images of UAV cameras, even when collected from a low flight height (e.g., 30 m), is evident. This is also in line with the current trend of major manufacturers of UAV cameras to add PAN sensors to existing MS cameras, thus enabling the production of fused images.
According to the correlation table (Table 10), the original spectral information of the MS orthophoto mosaic is preserved in the fused image, while its spatial resolution (0.028 m) is improved by a factor of ~2 in the fused image (0.012 m). By performing unsupervised classifications, I demonstrate not only the improvement of the classification using the fused image (compared to the classification of the original MS orthophoto mosaic), by simply comparing Figure 12b and Figure 13b with Figure 12d and Figure 13d respectively, but also the ability to optimally observe and discern the thematic information contained in the fused images on the one hand and in their products (e.g., classification images) on the other (Figure 12c,d and Figure 13c,d).
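The unsupervised classification step can be illustrated with a simple clustering sketch; k-means is assumed here as a generic unsupervised classifier, and neither the algorithm choice nor the number of classes reflects the actual settings used for Figure 12 and Figure 13:

```python
import numpy as np
from sklearn.cluster import KMeans

def unsupervised_classification(image: np.ndarray, n_classes: int = 6) -> np.ndarray:
    """Cluster an (H, W, N) image into n_classes spectral classes with k-means."""
    h, w, n = image.shape
    labels = KMeans(n_clusters=n_classes, n_init=10, random_state=0).fit_predict(
        image.reshape(-1, n).astype(np.float64))
    return labels.reshape(h, w)  # thematic (class-label) image
```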
Equation (1) of ERGAS is given as:

$$\mathrm{ERGAS} = 100\,\frac{h}{l}\,\sqrt{\frac{1}{N}\sum_{k=1}^{N}\frac{\mathrm{RMSE}(B_k)^{2}}{M_k^{2}}} \qquad (1)$$

where «h» is the spatial resolution of the high-resolution (fused) image, «l» is the spatial resolution of the low-resolution (MS) image, «N» denotes the number of spectral bands and «k» denotes the index of each band. RMSE(Bk) is the RMSE between the fused and the MS image for band «k», and «Mk» represents the mean of band «k» in the reference image.
To begin with, values for each spectral band («Pi» for the MS image and «Oi» for the fused image) were gathered after selecting random pixels (number of pixels: «n») at the same coordinates in both images. This was followed by the calculation of the RMSE for each spectral band, according to Equation (2):

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(P_i - O_i\right)^{2}} \qquad (2)$$
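A compact implementation of Equations (1) and (2) could look as follows (Python/NumPy sketch; the random sampling of pixel positions and the function signature are assumptions made for illustration):

```python
import numpy as np

def ergas(ms: np.ndarray, fused: np.ndarray, h: float, l: float,
          n_samples: int = 10_000, seed: int = 0) -> float:
    """ERGAS between a reference MS image and a fused image (both (H, W, N), co-registered).

    h, l : spatial resolutions of the fused (high) and MS (low) images,
           e.g. 0.012 m and 0.028 m in this paper.
    The RMSE of each band is computed on n_samples randomly selected pixel
    positions (Eq. 2) and the band-wise terms are combined into ERGAS (Eq. 1).
    """
    rng = np.random.default_rng(seed)
    height, width, n_bands = ms.shape
    rows = rng.integers(0, height, n_samples)
    cols = rng.integers(0, width, n_samples)

    total = 0.0
    for k in range(n_bands):
        p = ms[rows, cols, k].astype(np.float64)     # reference values  (Pi)
        o = fused[rows, cols, k].astype(np.float64)  # fused values      (Oi)
        rmse_k = np.sqrt(np.mean((p - o) ** 2))      # Equation (2)
        total += (rmse_k / p.mean()) ** 2            # RMSE(Bk)^2 / Mk^2

    return 100.0 * (h / l) * np.sqrt(total / n_bands)  # Equation (1)
```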
Finally, the ERGAS index is equal to 0.4 in the case of Figure 12a,c and equal to 0.6 in the case of Figure 13a,c, which shows that the fused images are of very good spectral quality relative to the original MS images, as the total ERGAS error is quite small (in general it should be <3, and the larger the ERGAS error, the worse the spectral quality of the image under study).
Utilizing the same 18 GCPs across all image blocks, it was found for both the RGB Phantom and the MS Sequoia+ that the root mean square error (RMSE) of the processing in Agisoft Metashape© is degraded by ~45% with a simultaneous 50% increase in flight height (Table 2). In the case of the RGB Sequoia+, the RMSE is degraded by ~130% with a 50% increase in flight height. Table 2 also shows that the RMSE of the RGB Sequoia+ is degraded by ~450% relative to the RMSE of the RGB Phantom at the corresponding heights. Finally, the improved RMSEs observed for the MS Sequoia+ relative to the RGB Phantom at the corresponding flight heights cannot be interpreted and may therefore be due to chance (as the spatial resolution of the RGB Phantom images is much better than that of the MS Sequoia+ images).
For the 20 CPs that did not take part in the processing of the image blocks in Agisoft Metashape© and were exclusively exploited for the actual control of the final products (DSMs and orthophoto mosaics), their X’, Y’, and Z’ values in the final products were calculated and then compared with their actual X, Y, and Z values measured in the field.
The Analysis of Variance (ANOVA) applied here performs hypothesis testing to determine differences in the mean values of different data sets. In this paper, the null hypothesis H0 is that the two data sets in each pair (X’ and X, Y’ and Y, Z’ and Z) have the same mean value, while the alternative hypothesis HA is that their mean values differ. According to Table 3 to Table 8, for all the data set pairs X and X’, Y and Y’, Z and Z’, the obtained P-values are much larger than 0.05, which means that the null hypothesis H0 holds throughout. Therefore, at a confidence level of 95%, there is no significant difference/systematic error between the mean values derived from the X’ (or Y’ or Z’) of the products and the actual mean values of X (or Y or Z, respectively) measured in the field. Thus, any differences between them are considered negligible and are attributed to random errors. Also, the values of the test statistic F are less than the critical values (F crit), and therefore the standard deviations between the values of X’ (or Y’ or Z’) and X (or Y or Z, respectively) do not differ significantly, so the measurements (field and products) are accompanied only by random errors.
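As an illustration, a one-way ANOVA of this kind can be reproduced with SciPy; the sample values below are invented for demonstration only and do not come from Tables 3–8:

```python
import numpy as np
from scipy import stats

# Hypothetical example: Z' values read from a DSM and Z values measured by GPS
# for the same check points (metres). The real values are reported in Tables 3-8.
z_product = np.array([121.03, 118.52, 119.87, 120.44, 117.96])
z_field = np.array([121.00, 118.55, 119.90, 120.40, 118.02])

# One-way ANOVA: H0 = the two samples share the same mean value
f_stat, p_value = stats.f_oneway(z_product, z_field)
print(f"F = {f_stat:.3f}, p = {p_value:.3f}")
# p > 0.05 -> H0 is not rejected: no significant (systematic) difference
```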
According to Table 9 and Figure 9, increasing the flight height by 50% has little (negative) effect on the accuracy of the final products with respect to the X values, both in the case of the RGB Phantom and in the case of the MS Sequoia+. However, the same is not true for the RGB Sequoia+, where increasing the flight height by 50% degrades the accuracy of the X values by ~20%.
As for the Y and Z values in the final products, increasing the flight height by 50% degrades their accuracy by ~40% in the case of either the RGB Phantom or the MS Sequoia+, and by ~80% in the case of the RGB Sequoia+.
Finally, the best accuracies (on all three axes) are observed by far in the RGB Phantom products (compared to the RGB Sequoia+). For both the RGB Phantom and MS Sequoia+ products, the accuracies are better overall for the 30 m flight height (compared to 45 m). For this reason, the RGB orthophoto mosaic of the RGB Phantom at 30 m was exploited to produce the PPAN image, and the MS Sequoia+ at 30 m was exploited to produce the fused image.
As the observations are not accompanied by systematic errors, some general conclusions can be drawn; they remain to be confirmed in the future with additional observations and with flights at the same and/or higher altitudes. The accuracies of the final products are degraded by a percentage smaller than the percentage increase in flight height in the case of either the RGB Phantom or the MS Sequoia+. Specifically, a 50% increase in flight height results in a ~35% degradation in the horizontal and vertical accuracy of the products. In the case of the RGB Sequoia+, a 50% increase in flight height results in a ~65% degradation of the horizontal and vertical accuracy of the products. Also, the vertical accuracy is degraded ~2 times compared to the horizontal accuracy, both in the case of the RGB Phantom products and in the case of the RGB Sequoia+ products, for both flight heights. The vertical accuracy is degraded ~3 times relative to the horizontal accuracy in the case of the MS Sequoia+ products, for both flight heights.
The need for image fusion to improve the spatial resolution of the MS camera images used in UAVs is also confirmed in this paper. Fusion can be accomplished either by using the RGB image of the same camera that provides the MS image [33], or by using the RGB image of a different camera than the one providing the MS image. The original spectral information can be preserved to a satisfactory degree, thus offering the possibility of optimal discrimination of thematic information in the fused image. High measurement accuracy and rich thematic content can therefore be ensured in combination.
Special thanks to the chief of the Ephorate of Antiquities of Serres, Mrs. D. Malamidou, for the permission granted to me to carry out the topographical measurements and take images in the archaeological site of Amphipolis.
Not applicable.
Not applicable.
This research received no external funding.
The author declares that he has no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Dimitrios K. Image Fusion Capability from Different Cameras for UAV in Cultural Heritage Applications. Drones and Autonomous Vehicles 2024, 1, 10002. https://doi.org/10.35534/dav.2023.10002