GIS Programming

Course Information
Courses and exams
Prof	Canters Frank; Smets Benoît
Courses	Lectures
Examination	Practical exam on Python scripting in computer lab
Background
Credits	6
When?	2nd term
ECTS	KU Leuven; VUB

The exam now consists only of a 3-hour session in which you solve several GIS problems. You may use everything: handbook, exercises, own computer, internet (excluding LLMs like ChatGPT, Copilot and Gemini), but you can't communicate during the exam. Algorithms that were not seen in class may be asked. Intermediary answers will be available. It's not vital that the script works, as long as you can explain your workflow.

2025

June

23/06/2025

Question 1 (/7)

Given: population data in a .txt file, a districts polygon shapefile and a green shortage polygon shapefile.

Extract the population data, and add an attribute "population density" to the districts shapefile using this data. Then, create a shapefile of areas where population density is higher than 15 000 inh/km² and where there is a shortage of urban green.

Question 2 (/7)

Given: districts polygon shapefile.

Give an estimate of how surface temperatures in Brussels' districts have changed over time. Use MODIS temperature data from 2001-2005 and 2018-2021, take the average for each of these time periods and then take the difference of these averages. Calculate the mean temperature difference for each district and add it to the shapefile as an attribute.

(Hint: you can use .mean on an ImageCollection to take the mean value of a pixel over all different entries in the collection, and you can use img1.difference(img2) to take the difference of pixel values between two images).

Question 3 (/6)

Given: the output from Question 1 (provided in case you didn't obtain it yourself) and a points shapefile.

Give the points that are located in the high need areas for urban green (popdens > 15 000 and green shortage). Imagine we will propose infrastructure interventions for these points: green interventions and blue interventions. Add an attribute to these points defining this type of intervention and assign the types "blue" and "green" randomly. Finally, give the spatial extent of these interventions in a shapefile by defining circles around these points, of 3 m radius for the green interventions and of 2 m radius for the blue ones.
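The random assignment and the two radii can be sketched in plain Python as below. This is only an illustration of the logic: the function names, the seed, the number of vertices per circle and the sample coordinates are assumptions, and in ArcPy the circles would more likely be produced by buffering the point geometries.

```python
import math
import random

# Buffer radius in metres per intervention type, as stated in the question.
RADIUS_M = {"green": 3.0, "blue": 2.0}


def assign_interventions(point_ids, seed=None):
    """Randomly assign "green" or "blue" to every point id."""
    rng = random.Random(seed)  # the seed is optional, only for reproducible runs
    return {pid: rng.choice(["green", "blue"]) for pid in point_ids}


def circle_vertices(x, y, radius, n_vertices=36):
    """Approximate a circle around (x, y) by a closed ring of vertices."""
    ring = [
        (x + radius * math.cos(2 * math.pi * i / n_vertices),
         y + radius * math.sin(2 * math.pi * i / n_vertices))
        for i in range(n_vertices)
    ]
    ring.append(ring[0])  # repeat the first vertex to close the ring
    return ring


# Hypothetical point ids with projected coordinates in metres.
points = {1: (100.0, 200.0), 2: (150.0, 250.0), 3: (175.0, 90.0)}
types = assign_interventions(points, seed=42)
circles = {pid: circle_vertices(x, y, RADIUS_M[types[pid]])
           for pid, (x, y) in points.items()}
```

The sketch assumes a projected coordinate system in metres; with geographic coordinates the radii would not be 2 m and 3 m on the ground.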

13/06/2025

Question 1 ( / 7)

Your company is evaluating new areas for expanding coffee production. You are tasked with assessing the suitability of several locations based on temperature and elevation data. Write a Python script using ArcPy that:

Loads a raster representing annual mean temperature [0.5]
Reclassifies the raster into three classes [3]
1. Suitable: 19-24°C (value 1)
2. Marginal: 16-18°C and 25-26°C (value 2)
3. Unsuitable: <16°C or >26°C (value 3)
Saves the resulting raster as 'suitable.tif' [0.5]
For all locations extracts the elevation from the SRTM DEM using Google Earth Engine [3]
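The class boundaries of the reclassification step can be sketched in plain Python as below. The question leaves the ranges 18-19°C and 24-25°C undefined; this sketch assumes they belong to the marginal class, which is a guess and not part of the exam text. In ArcPy the same mapping would typically be expressed with a range remap in the Spatial Analyst reclassification tools.

```python
def classify_temperature(temp_c):
    """Return 1 (suitable), 2 (marginal) or 3 (unsuitable) for a temperature in °C.

    Assumption: the gaps 18-19 and 24-25 °C are treated as marginal.
    """
    if 19 <= temp_c <= 24:
        return 1
    if 16 <= temp_c <= 26:
        return 2
    return 3


def reclassify(grid):
    """Apply the classification to every cell of a 2D list of temperatures."""
    return [[classify_temperature(value) for value in row] for row in grid]


# Hypothetical 2 x 3 temperature grid.
sample = [[15.0, 17.5, 21.0],
          [24.5, 26.0, 27.1]]
classes = reclassify(sample)
```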

Question 2 ( / 6)

Efficient transport of harvested coffee is critical. You are asked to develop a script tool that:

Loads a point shapefile of coffee farm locations and a line shapefile of the road network (given by the user) [0.5]
Calculates the shortest distance from each farm to the nearest road segment and adds a new field to the farm attribute table, containing this distance (in meters) [3]
Selects farms that are located more than 100 meters from any road and exports them to a table with a name given by the user [3]

Question 3 ( /7)

A field worker has collected a GPS trace of a newly constructed access road to one of the coffee production zones. The GPS coordinates are stored in a plain text file, but some of the points appear to be erroneous. Write a Python script that:

Reads the GPS coordinates from the text file and creates a new polyline feature [2]
Iterates through the coordinate sequence and removes any point that is more than 500 meters away from the straight line between its previous and next point [3]
Appends the cleaned polyline to the existing roads vector layer [1]
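The outlier filter in the second step can be sketched in plain Python as follows. Two points are assumptions rather than part of the question: the "straight line" is treated as the segment between the two neighbours, and the "previous point" is taken to be the last point that was kept. Coordinates are assumed to be projected, in metres.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b (all (x, y) tuples)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:  # a and b coincide
        return math.hypot(px - ax, py - ay)
    # Position of the projection of p on the segment, clamped to [0, 1].
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def clean_trace(points, max_dist=500.0):
    """Drop interior points farther than max_dist from the segment between
    the previously kept point and the next point; endpoints are always kept."""
    if len(points) < 3:
        return list(points)
    cleaned = [points[0]]
    for i in range(1, len(points) - 1):
        if point_segment_distance(points[i], cleaned[-1], points[i + 1]) <= max_dist:
            cleaned.append(points[i])
    cleaned.append(points[-1])
    return cleaned


# Hypothetical trace with one obvious outlier at (200, 900).
trace = [(0, 0), (100, 0), (200, 900), (300, 0), (400, 0)]
cleaned_trace = clean_trace(trace)
```

The cleaned coordinate list can then be turned into a polyline geometry and appended to the roads layer.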

2024

Note: AI was allowed this year

June

Context

Urban areas often experience higher temperatures than their rural surroundings, a phenomenon known as the Urban Heat Island (UHI) effect. This can lead to increased energy consumption, elevated emissions of air pollutants and greenhouse gases, and adverse health effects. Proper planning and mitigation strategies are essential to address this issue. In this exercise, you will work with geospatial data to assess the UHI effect.

The following layers are provided on canvas:

buildings_clipped.shp: building footprints including attributes representing basic information on the buildings.
new buildings: folder with text files containing the buildings that need to be updated
temperature_points.shp: points representing temperature readings at various locations across the city.
question2_uhi: question 2 takes a long time to run so the results are provided to complete question 3

Question 1 ( /6)

Part of the temperature information is missing due to defective measurement stations. You decide to use the MODIS Land Surface Temperature, available on Google Earth Engine, to obtain the missing temperature values.

Initialize and authenticate the earth engine package
Load your temperature point layer onto your google cloud project
Sample the temperature values for the points in the temperature layer
Replace the missing temperature values in the TEMP attribute by the sampled value. (Missing values were given the value 0 in the temperature_points.shp layer).

Question 2 ( /7)

As part of an effort to assess the UHI effect, you are asked to develop a Python script that calculates the average temperature within a 500-meter buffer around each building and stores this information in a new field in the output feature class. If there are no measurement stations within the buffer distance, the temperature value should be 0. Write this as a script tool.

Create a new feature class question2_uhi to store the results.
Ensure the output fields are valid and have meaningful names. Make sure that the layer cannot be saved under this name if it already exists.
Indicate as a comment which data types were specified in the script tool interface.
Write the average temperature to the question2_uhi layer
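The core calculation (mean of all station readings within 500 m, or 0 when there are none) can be sketched in plain Python as below. For simplicity the distance is measured from a single building coordinate such as the centroid, which is an assumption; an ArcPy solution could instead buffer the building outline and select the stations inside it. All names and sample values are hypothetical.

```python
import math


def mean_temperature_within(building_xy, stations, radius_m=500.0):
    """Mean temperature of the stations within radius_m of the building.

    stations is a list of (x, y, temperature) tuples in projected metres.
    Returns 0.0 when no station lies within the radius, as the question asks.
    """
    bx, by = building_xy
    temps = [t for (sx, sy, t) in stations
             if math.hypot(sx - bx, sy - by) <= radius_m]
    return sum(temps) / len(temps) if temps else 0.0


# Hypothetical stations: two within 500 m of the origin, one far away.
stations = [(0.0, 0.0, 20.0), (300.0, 400.0, 22.0), (1000.0, 1000.0, 30.0)]
near = mean_temperature_within((0.0, 0.0), stations)
far = mean_temperature_within((5000.0, 5000.0), stations)
```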

Question 3 ( /7)

Urban planners need to export information about buildings that have high exposure to the UHI effect to prioritize mitigation efforts. You need to write a Python script that extracts information from the buildings feature class and writes it to a text file with the geographic coordinates of the building centroid (latitude and longitude in GCS_WGS_1984) and the area of the building.

However, first you need to update the shape of some buildings. You received a folder with text files from the local government containing the ID of the buildings to be updated and the new correct vertices. Read them from the text files and write them to the buildings_clipped shapefile.

Update the buildings using the provided text files
Extract buildings where the average temperature within the 500-meter buffer is above 19°C.
For each building, provide the following information separated by a semicolon (;):
- The field building_id
- Latitude of building centroid in decimal degrees
- Longitude of building centroid in decimal degrees
- Area of building in m²
Save the output to a text file named high_uh_risk_buildings.txt
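The filtering and the semicolon-separated output can be sketched in plain Python as below. The dictionary keys, the number of decimals and the function names are assumptions for illustration; in the exam the values would come from a cursor over the buildings feature class, with the centroid projected to GCS_WGS_1984.

```python
def format_building_line(building_id, lat, lon, area_m2):
    """Build one output line: id;latitude;longitude;area."""
    return f"{building_id};{lat:.6f};{lon:.6f};{area_m2:.2f}"


def write_high_risk_buildings(buildings, path, threshold_c=19.0):
    """Write all buildings whose mean buffer temperature exceeds threshold_c.

    buildings is an iterable of dicts with the (hypothetical) keys
    building_id, lat, lon, area_m2 and mean_temp. Returns the number
    of lines written.
    """
    count = 0
    with open(path, "w", encoding="utf-8") as out:
        for b in buildings:
            if b["mean_temp"] > threshold_c:
                line = format_building_line(
                    b["building_id"], b["lat"], b["lon"], b["area_m2"])
                out.write(line + "\n")
                count += 1
    return count
```

A call such as write_high_risk_buildings(rows, "high_uh_risk_buildings.txt") would then produce one line per selected building.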

2021

The exam is [used to be] in two parts. In the morning, you get 3 hours to solve 3 GIS problems. In the afternoon, you get 15 minutes without preparation to orally explain a script or model that we made in the courses. This can be any script from the exercises. Canters and Tim will ask you a lot of questions about the functionality of pieces of code, or elements of a model. Starting in 2018-2019, the course was only given by Tim and the exam consisted of only a programming exercise. No theory or explanations were asked.

June

Question 1

The internet is a great source of geographic information. With the Google Places® API Web Service, for example, and the necessary Python skills, a wealth of spatial data lies at your fingertips. There are Python modules (e.g. urllib) that allow you to read information from the internet just like you read it from a file. The file barsleuven.xml, which you find on Canvas, was obtained from the Google Places® Web Service with just a few lines of code. As the name suggests, this xml file contains information on 200 bars and restaurants in Leuven. As with all xml files, the data is stored in a straightforward, hierarchical way, separated by xml tags.

Your job in this exercise is to write a stand-alone Python script that generates a point feature class representing the location of these establishments. The geographical coordinates (GCS ETRS_1989) of the points can be found under the proper xml tags within the file. The feature class you produce should, however, be set in the Belgian Lambert 2008 projection.

The attribute table of the point feature class should contain the following additional fields:

the name of the establishment,
the type of establishment (bar, café, restaurant…).

Each entry in the xml file has more than 1 “type” field associated to it (e.g. hotels also have bars), but only the first type that is listed should be included in the attribute table.

You can choose your own names for the fields, but make sure they are valid by using the proper arcpy function.
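Reading the name, the first type and the coordinates of each establishment can be sketched with the standard library as below. The tag names (result, name, type, geometry/location/lat and lng) and the sample content are assumptions about how barsleuven.xml is structured; check them against the actual file. Creating the feature class, projecting to Belgian Lambert 2008 and validating the field names would still be done with arcpy.

```python
import xml.etree.ElementTree as ET

# Hypothetical extract; the real file may use different tags.
SAMPLE_XML = """<PlaceSearchResponse>
  <result>
    <name>Example Bar</name>
    <type>bar</type>
    <type>establishment</type>
    <geometry><location><lat>50.8790</lat><lng>4.7010</lng></location></geometry>
  </result>
  <result>
    <name>Example Hotel</name>
    <type>lodging</type>
    <type>bar</type>
    <geometry><location><lat>50.8801</lat><lng>4.7052</lng></location></geometry>
  </result>
</PlaceSearchResponse>"""


def parse_places(xml_text):
    """Return a list of dicts with name, first type, lat and lon per result."""
    root = ET.fromstring(xml_text)
    places = []
    for result in root.iter("result"):
        places.append({
            "name": result.findtext("name"),
            # findtext returns the text of the first matching child only,
            # which gives the first listed type.
            "type": result.findtext("type"),
            "lat": float(result.findtext("geometry/location/lat")),
            "lon": float(result.findtext("geometry/location/lng")),
        })
    return places


places = parse_places(SAMPLE_XML)
```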

Question 2

Natural floods come every year to Dar es Salaam (Tanzania), one of the fastest growing cities in Africa, but due to a lack of adequate planning they become man-made disasters.

Dar Ramani Huria (Swahili for "Dar Open Map") is mapping flood-prone wards of the city for the Humanitarian Open Street Map Team (HOT - hotosm.org). The maps are used to run flood impact scenarios, enabling decision makers to better plan for and respond to such disasters in the future.

You will work with a small extract of these data to solve this question.

We have put the data in a geodatabase called “exam_june21_data_Q2.gdb”, which you will find on Canvas. This geodatabase contains the following feature classes:

buildings_des: building footprints including attributes representing basic information on the buildings (e.g. the land use type in a field named “type”)

roads_des: road segments with attributes

wetlands_des: extent of natural wetlands, representing flood prone areas

One of the problems in Dar-Es-Salaam is that people construct settlements in natural flood zones. You are asked to develop a model in ModelBuilder that puts all residential buildings that are located further than 300 meters from a hospital and that lie within the area prone to flooding in a new feature class. A simple “clip” or geometric intersect operation is not useful as we do not want the building outlines to be cut. If a building is even partly inside the flood prone area (wetland), the entire outline should be included in the output feature class. You will therefore need to work with feature selection tools.

The model is intended to be run as a stand-alone tool from ArcCatalog. The processing part should be implemented in Modelbuilder using the ArcGIS Pro tools. The model should let the user specify 3 parameters: the feature class with the buildings (input), the wetlands feature class (input) and a string that will be appended as a suffix to the default output feature class name (see further). These parameters should be named meaningfully in the model dialog. When you run the model, the output feature class should be called “question2_<suffix>” and must be stored in the geodatabase you created that carries your name (as mentioned in the overall instructions). The required output feature class is the only output data that should be produced.

August

Question 1

Write a “stand-alone” Python script to develop a population map using a dasymetric mapping approach

Objective

Your task is to develop a stand-alone Python script that spatially redistributes the number of inhabitants, which is available at the level of “statistical sectors” (NIS sectors), to the residential buildings present within each sector using cursors. To accomplish this, you will use 2 input feature classes that are present in a geodatabase called “gisprogramming_exam_aug21.gdb”. The feature class “NISDATA_2011_Brussels_subset” includes a number of NIS statistical sectors of Brussels with their respective attribute data (see guidelines). The feature class “building_outlines_brussels_subset” includes the outlines of all buildings that are located within these statistical sectors.

The output that your script should produce is a new feature class that contains all buildings with an extra field containing the number of people living in that particular building. Put this feature class in a new geodatabase. You can create this new geodatabase manually, i.e. outside of the scripting environment. Give this geodatabase your own name.

Method

To implement a dasymetric mapping approach you will follow these principles:

1) The population of a given sector X is redistributed only to buildings located inside sector X

2) The population of a sector is allocated to the buildings in relation to the ground surface area of the building AND its number of floors (specified by the fields Shape_Area and Number_of_floors).

For example:

- there are 50 people living in sector X

- this sector has 2 buildings:

Building A measures 200m² and has 5 floors

Building B measures 400m² and has 2 floors

* There is 5 x 200m² + 2 x 400m² = 1800m² of building surface

* This means there are 50 people / 1800m² = 0.027…people per m²

* Building A has 1000m² x 0.027…ppl/m² = 27.77… people

* Building B has 800m² x 0.027…ppl/m² = 22.22…people

3) No population is allocated to buildings with a non-residential function. Only buildings with the following land-use type are therefore included in the analysis, all others are ignored (specified by fields LU_CODE and LandUse):

- continuous urban fabric (code 11100)

- discontinuous dense urban fabric (code 11210)

- discontinuous medium density urban fabric (code 11220)

- discontinuous low density urban fabric (code 11230)

4) Some buildings are located in a residential area, but they nevertheless do not have any inhabitants. This is for instance the case for garage boxes or small warehouses or sheds. Such buildings do not have a separate official address number and can be excluded on this basis (value of field Number_of_addresspoins = 0).
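The four principles can be sketched for a single sector in plain Python as below, reproducing the worked example with buildings A and B. The dictionary keys are hypothetical stand-ins for the fields Shape_Area, Number_of_floors, LU_CODE and the address point count; in the exam the values would be read and written with cursors.

```python
# Land-use codes that count as residential, as listed in principle 3.
RESIDENTIAL_CODES = {11100, 11210, 11220, 11230}


def allocate_population(sector_population, buildings):
    """Distribute a sector's population over its eligible buildings.

    buildings is a list of dicts with the (hypothetical) keys
    id, area, floors, lu_code and address_points.
    Returns {building id: allocated population}.
    """
    eligible = [b for b in buildings
                if b["lu_code"] in RESIDENTIAL_CODES and b["address_points"] > 0]
    total_floor_area = sum(b["area"] * b["floors"] for b in eligible)
    allocation = {b["id"]: 0.0 for b in buildings}
    if total_floor_area == 0:
        return allocation  # nothing to distribute the population over
    people_per_m2 = sector_population / total_floor_area
    for b in eligible:
        allocation[b["id"]] = b["area"] * b["floors"] * people_per_m2
    return allocation


# Sector X from the example, plus a shed without address points.
sector_buildings = [
    {"id": "A", "area": 200.0, "floors": 5, "lu_code": 11100, "address_points": 1},
    {"id": "B", "area": 400.0, "floors": 2, "lu_code": 11100, "address_points": 1},
    {"id": "shed", "area": 50.0, "floors": 1, "lu_code": 11100, "address_points": 0},
]
result = allocate_population(50, sector_buildings)
```

Building A receives about 27.78 people and building B about 22.22, matching the example, while the shed receives none.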

2020

Question 1

Archaeologists rely on you, as a GIS expert, to develop an application in Modelbuilder that supports them with the analysis of their survey data.

The model you must develop is intended to be run from catalogue only.

It allows the user to draw one or more polygons during run-time on an opened map document (map1.mxd).

The model should then output a new feature class containing all stone circles from the bronze age and the iron age that are completely within the polygon(s) drawn by the user.

The selection of the polygons should be visualized by thick red lines (as in AOI.lyr).

To keep things simple for them, the users should only provide 2 parameters in the model’s dialog box: the “real-time” drawing of polygons and a name for the output feature class as a string.

The location (full path) of the output feature class cannot be changed by the user, but is predefined by you to be stored in the exam’s geodatabase <name_first name_question1.gdb>.

The name itself, however, should be specified by the user but should be preceded by “question1_”.

Finally, your model must use the feature class Archaeology_survey located in the geodatabase rather than the feature layer that will be present in the TOC after you have opened the map document.

As mentioned earlier, the user should also not be able to change it.

All model variables should be meaningfully named.

Question 2

Considering the geographic expanse of the Altai Mountains, exhaustively surveying this vast and inhospitable region is very inefficient. Techniques are therefore needed that allow a fast and effective detection of the burial mounds to help archaeologists make decisions regarding their conservation and excavation.

High resolution remote sensing images allow a detailed observation over large areas. Burial mounds may be detectable in such images, but manual identification is tedious. Many researchers have therefore applied automated approaches to detect (often relatively big) archaeological objects. In recent years, deep learning techniques, especially convolutional neural networks (CNN), have achieved remarkable results in many computer vision applications such as image classification and face detection. Would it not be interesting to apply the algorithms used by Facebook and Google to detect archaeological features on satellite images?

For this question, we ask you to develop a script tool that helps the archaeologists to assess the performance of the Fast-R CNN “tomb” detector, an improved method for detecting burial mounds on high resolution satellite images using CNN for object detection. The CNN has produced bounding boxes of potential burial mound locations (feature class detections in the geodatabase). You are asked to make a script tool that calculates three error metrics (see below) by comparing the detections with the reference data from the field surveys (feature class archaeology survey) using feature selection mechanisms. A feature is considered to be detected if its centroid lies within the detection bounding box.

The script tool should take the following input:

The feature class with detected archaeological structures (detections)
The feature class with all surveyed archaeological structures (Archaeology_survey)
A parameter that allows the user to set a threshold on the size of the features that he/she expects to be detected (use the field Shape_Area in Archaeology_survey; units are m²).

And produce the following output:

A feature class with all correctly identified archaeological features (“true positives”)
A new “statistics table” with the values of the three error metrics (see below). This means that this new table (find the correct tool to create a new, empty table) should have three columns and just one row. The columns (fields) should be of type float and their name should represent the respective error metric.
An informative message stating the three resulting error metrics and mentioning the size threshold that was used.

Put this script tool in a new model that can be run from catalog. The user should provide the requested input and a name plus location for the output table and feature class in the tool dialog. When running the model, give meaningful names to the output.

The error metrics you should calculate:

In machine learning and information retrieval experiments, one often uses the metrics of Precision, Recall and F measure to assess the performance of algorithms.

In our experiment, precision is the fraction of correctly identified archaeological objects relative to all detections that are made. Example: our detector found 100 objects, 60 of them are actual archaeological objects (based on the features present in the survey data). This means the precision is 0.6.

Recall, on the other hand, tells us how many objects our detector has correctly identified relative to all objects that it should have been able to detect. Example: there are 120 archaeological objects in this area that our detector should have found based on the survey data. It has found 60 of those, which means the recall is 0.5. As there are many small objects in the surveyed data that we cannot hope to detect on satellite images, the user should be able to set a threshold to “filter out” smaller features for the error calculation. For example, with a threshold on the Shape_Area field of 50m², features smaller than that size are not considered as objects that should be found by the detector and should therefore not be taken into consideration to calculate recall.

The F measure is derived from precision and recall as follows:

F = 2 . (precision . recall) / (precision + recall)
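Once the three counts are known, the metrics reduce to a few lines of Python. The sketch below uses the numbers from the examples above (100 detections, 60 true positives, 120 reference objects above the size threshold); the function name and the handling of zero denominators are assumptions.

```python
def detection_metrics(n_detections, n_true_positives, n_reference):
    """Return (precision, recall, F measure) from three counts.

    n_reference is the number of surveyed features that pass the size
    threshold. Zero denominators yield 0.0 instead of an error.
    """
    precision = n_true_positives / n_detections if n_detections else 0.0
    recall = n_true_positives / n_reference if n_reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * (precision * recall) / (precision + recall)
    return precision, recall, f_measure


precision, recall, f_measure = detection_metrics(100, 60, 120)
```

In the script tool, the counts themselves would come from the feature selections, and the three values would be written to the one-row statistics table.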

2019

June

Question 1: The city of Dar es Salaam in Tanzania is very prone to floods. You received three feature classes: buildings, roads and flood areas (but you did not need the roads file). Produce a model that makes a new feature class with all residential buildings that are (partly) located in the flood areas and are more than 200 meters away from a hospital. Your model can only consist of one 'tool', namely your Python script (so you were not allowed to just build a model in ModelBuilder; you had to code everything yourself). Also, at the end of your model, a message should appear, telling the user how many buildings are in this zone. The user should also be able to define a suffix that is appended to the name of the final feature class.

Question 2: You use the same files. Write a script that produces a text file. In this text file, the id and the latitude and longitude of the centroid of all buildings with the type public, school and hospital should be written, with the three variables separated by a ";" and every building on a new line.

2017

Question 1

One of the problems is that people construct settlements in natural flood zones. You are asked to develop a model that puts all RESIDENTIAL buildings that are within the area prone to flooding in a new feature class. A simple ‘clip’ operation is not useful as we do not want the building outlines to be cut by the wetland polygon. If a building is even partly inside the flood prone area (wetland), the entire outline should be included in the output feature class.

The model should provide the user with 3 input parameters: the feature classes containing the buildings used as input, the wetlands and the output feature class. When you run the model, the output feature class should be called “question1_infloodzone” and stored in the gDB.

Question 2

As part of an effort to quantify the “hazard” for each building, you are asked to develop a Python script that calculates the distance from each building to the nearest PRIMARY or TRUNK road segment (fclass field), that calculates a hazard factor (0-100) based on this distance and that stores both the distance and the hazard factor as two new fields in the output feature class. We defined the extent of the study area to keep calculation times relatively short.

hazard factor = 100 / (1 + 500 * e^(-0.015 * d))

where d is the shortest distance between the building outline (not the centroid) and the primary road. Implement this formula as a Python function.
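A minimal sketch of such a function is given below. It reads the formula as a logistic curve, 100 divided by (1 + 500 times e to the power -0.015 d), which is an interpretation: it is the only grouping of the terms that keeps the result within the stated 0-100 range. The function name is an assumption.

```python
import math


def hazard_factor(distance_m):
    """Hazard factor (0-100) for a shortest distance d in metres,
    read as 100 / (1 + 500 * e^(-0.015 * d))."""
    return 100.0 / (1.0 + 500.0 * math.exp(-0.015 * distance_m))


# Hypothetical distances in metres.
factors = [hazard_factor(d) for d in (0.0, 250.0, 1000.0)]
```

In the full script, distance_m would come from the distance method of the building geometry applied to the road geometries, taking the minimum over all primary road segments.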

You will need to use the proper method of geometry objects for distance calculation. Also take note that the roads are made up of several segments (polylines). You can choose any meaningful names for the output fields, but they should be checked for validity within the script.

Question 3

Relief workers would like you to write a short Python script that writes out information from the feature class you produced in Question 2 to a txt file. Each building should be stored as a separate line and the following information for each building should be separated by a “;”

X coordinate of building centroid
Y coordinate of building centroid
shortest distance of building outline to primary road
hazard factor

If you failed to produce the output required by Question 2, write the fields “osm_id” and “typ” instead of shortest distance and hazard factor.

2016

Given: a feature class with companies and a feature class with horeca establishments (both point feature classes). Find the closest bar or café (two of the horeca types) for each company, and store the distance to and the name of that establishment in a new feature class.
Given: a file from the internet about horeca in Leuven that can be read as a text file. It contains many lines of text and whitespace; somewhere in it are the name, the type (there can be several types per name) and the lat and lon of each establishment. Make a point feature class, stored in a geodatabase, containing the name, the type (only the first type) and the SHAPE@ (point feature) of all horeca establishments. Note: the text file is in a different reference system than the one required for the output.

2015

Morning

1. exercise practically the same as ex 2.3

Given: feature class with rivers ("hydrography")

Develop a model that automatically generates a separate feature class for each type of basin ("STRMGEB"). The name of the feature classes should contain the name of the basin (from the attribute table).

2. Given: feature class with rivers ("hydrography")

Generate a standalone script that replaces all the spaces by underscores for the values in the fields "STRMGEB" and "BEKNAAM". Your result should be a new feature class (so copy the original and don't edit that one).

3. An exercise with geometries, much more difficult than all the others.

Given: feature class with some river features.

We want to simplify these line features by reducing the number of vertices. A figure was given. The idea is to start at the first vertex V0. If the next vertex V1 is within a certain distance delta from V0, it should be removed from the output line feature. If V1 is further away from V0 than the distance delta, the vertex will be retained and will become the new V0. However, the first and the last vertex of each polyline feature must be retained in any case. (It was not said in the exercise description but some of the line features were multipart...). Desired output: a new feature class with the simplified line features. The script should be a standalone script.
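The thinning rule for a single line part can be sketched in plain Python as below. The function name and the sample coordinates are assumptions; for multipart features the function would be applied to each part separately before rebuilding the polyline.

```python
import math


def thin_vertices(vertices, delta):
    """Remove interior vertices that lie within delta of the last kept vertex.

    vertices is a list of (x, y) tuples for one line part. The first and
    last vertex are always retained.
    """
    if len(vertices) <= 2:
        return list(vertices)
    kept = [vertices[0]]  # V0
    for x, y in vertices[1:-1]:
        x0, y0 = kept[-1]
        if math.hypot(x - x0, y - y0) > delta:
            kept.append((x, y))  # far enough: keep it, it becomes the new V0
    kept.append(vertices[-1])
    return kept


# Hypothetical line part.
line = [(0, 0), (1, 0), (2, 0), (6, 0), (7, 0), (8, 0)]
simplified = thin_vertices(line, delta=3)
```

With delta = 3, the vertices at x = 1, 2 and 7 are dropped and the result is the first vertex, the vertex at x = 6 and the last vertex.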

Afternoon

Solution of an exercise shown in class. This can be a ModelBuilder model, a Python script, or a combination of both. You must be able to explain every step!