GIS Programming

From Atlas Examenwiki
Course Information
Courses and exams
Prof: Canters Frank, Smets Benoît
Courses: Lectures
Examination: Practical exam on Python scripting in computer lab
Background
Credits: 6
When?: 2nd term
ECTS: KU Leuven, VUB

The exam now consists only of a 3-hour session in which you solve several GIS problems. You may use everything: handbook, exercises, your own computer, the internet (yes, also ChatGPT), but you cannot communicate with others during the exam. Algorithms that were not seen in class may be asked. Intermediate answers will be available. It is not vital that the script works, as long as you can explain your workflow.

2024

Coming soon

2021

The exam is [used to be] in two parts. In the morning, you get 3 hours to solve 3 GIS problems. In the afternoon, you get 15 minutes without preparation to orally explain a script or model that we made in the courses. This can be any script from the exercises. Canters and Tim will ask you a lot of questions about the functionality of pieces of code, or elements of a model. Starting in 2018-2019, the course was only given by Tim and the exam consisted of only a programming exercise. No theory or explanations were asked.

June

Question 1

The internet is a great source of geographic information. With the Google Places® API Web Service, for example, and the necessary Python skills, a wealth of spatial data lies at your fingertips. There are Python modules (e.g. urllib) that allow you to read information from the internet just like you read it from a file. The file barsleuven.xml, which you find on Canvas, was obtained from the Google Places® Web Service with just a few lines of code. As the name suggests, this xml file contains information on 200 bars and restaurants in Leuven. As with all xml files, the data is stored in a straightforward, hierarchical way, separated by xml tags.

Your job in this exercise is to write a stand-alone Python script that generates a point feature class representing the location of these establishments. The geographical coordinates (GCS ETRS_1989) of the points can be found under the proper xml tags within the file. The feature class you produce should, however, be set in the Belgian Lambert 2008 projection.

The attribute table of the point feature class should contain the following additional fields:

  • the name of the establishment,
  • the type of establishment (bar, café, restaurant…).

Each entry in the xml file has more than 1 “type” field associated to it (e.g. hotels also have bars), but only the first type that is listed should be included in the attribute table.

You can choose your own names for the fields, but make sure they are valid by using the proper arcpy function.
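The parsing half of this task can be sketched without arcpy. The sketch below assumes the file follows the usual Google Places XML layout (`result` elements with `name`, repeated `type`, and `geometry/location/lat` and `lng`); the actual tags in barsleuven.xml may differ, and the sample string and the function name `parse_places` are made up for illustration. Creating the feature class, validating the field names and projecting to Belgian Lambert 2008 would still have to be done with arcpy and are not shown.

```python
import xml.etree.ElementTree as ET

# Made-up sample in the ASSUMED Google Places XML layout
SAMPLE_XML = """<PlaceSearchResponse>
  <result>
    <name>Example Bar</name>
    <type>bar</type>
    <type>establishment</type>
    <geometry><location><lat>50.8790</lat><lng>4.7010</lng></location></geometry>
  </result>
  <result>
    <name>Example Hotel</name>
    <type>lodging</type>
    <type>bar</type>
    <geometry><location><lat>50.8800</lat><lng>4.7000</lng></location></geometry>
  </result>
</PlaceSearchResponse>"""


def parse_places(xml_text):
    """Return a list of (name, first_type, lon, lat) tuples."""
    root = ET.fromstring(xml_text)
    places = []
    for result in root.iter("result"):
        name = result.findtext("name")
        # findtext returns the text of the FIRST matching child only,
        # which is exactly the "first type listed" rule
        first_type = result.findtext("type")
        lat = float(result.findtext("geometry/location/lat"))
        lon = float(result.findtext("geometry/location/lng"))
        places.append((name, first_type, lon, lat))
    return places


places = parse_places(SAMPLE_XML)
for place in places:
    print(place)
```

To read from the file instead of a string, `ET.parse(path).getroot()` gives the same kind of root element.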

Question 2

Natural floods come every year to Dar es Salaam (Tanzania), one of the fastest growing cities in Africa, but due to a lack of adequate planning they become man-made disasters.

Dar Ramani Huria (Swahili for "Dar Open Map") is mapping flood-prone wards of the city for the Humanitarian Open Street Map Team (HOT - hotosm.org). The maps are used to run flood impact scenarios, enabling decision makers to better plan for and respond to such disasters in the future.

You will work with a small extract of these data to solve this question.

We have put the data in a geodatabase called “exam_june21_data_Q2.gdb”, which you will find on Canvas. This geodatabase contains the following feature classes:

buildings_des: building footprints including attributes representing basic information on the buildings (e.g. the land use type in a field named “type”)

roads_des: road segments with attributes

wetlands_des: extent of natural wetlands, representing flood prone areas

One of the problems in Dar-Es-Salaam is that people construct settlements in natural flood zones. You are asked to develop a model in ModelBuilder that puts all residential buildings that are located further than 300 meters from a hospital and that lie within the area prone to flooding in a new feature class. A simple “clip” or geometric intersect operation is not useful as we do not want the building outlines to be cut. If a building is even partly inside the flood prone area (wetland), the entire outline should be included in the output feature class. You will therefore need to work with feature selection tools.

The model is intended to be run as a stand-alone tool from ArcCatalog. The processing part should be implemented in Modelbuilder using the ArcGIS Pro tools. The model should let the user specify 3 parameters: the feature class with the buildings (input), the wetlands feature class (input) and a string that will be appended as a suffix to the default output feature class name (see further). These parameters should be named meaningfully in the model dialog. When you run the model, the output feature class should be called “question2_<suffix>” and must be stored in the geodatabase you created that carries your name (as mentioned in the overall instructions).  The required output feature class is the only output data that should be produced.

August

Question 1

Write a “stand-alone” Python script to develop a population map using a dasymetric mapping approach

Objective

Your task is to develop a stand-alone Python script that spatially redistributes the number of inhabitants, which is available at the level of “statistical sectors” (NIS sectors), to the residential buildings present within each sector using cursors. To accomplish this, you will use 2 input feature classes that are present in a geodatabase called “gisprogramming_exam_aug21.gdb”. The feature class “NISDATA_2011_Brussels_subset” includes a number of NIS statistical sectors of Brussels with their respective attribute data (see guidelines).  The feature class “building_outlines_brussels_subset” includes the outlines of all buildings that are located within these statistical sectors.

The output that your script should produce is a new feature class that contains all buildings with an extra field containing the number of people living in that particular building. Put this feature class in a new geodatabase. You can create this new geodatabase manually, i.e. outside of the scripting environment. Give this geodatabase your own name.

Method

To implement a dasymetric mapping approach you will follow these principles:

1) The population of a given sector X is redistributed only to buildings located inside sector X

2) The population of a sector is allocated to the buildings in relation to the ground surface area of the building AND its number of floors (specified by the fields Shape_Area and Number_of_floors).

For example:

- there are 50 people living in sector X

- this sector has 2 buildings:

Building A measures 200m² and has 5 floors

Building B measures 400m² and has 2 floors

           * There is 5 x 200m² + 2 x 400m² = 1800m² of building surface

           * This means there are 50 people / 1800m² = 0.027…people per m²

           * Building A has 1000m² x 0.027…ppl/m² = 27.77… people

           * Building B has 800m² x 0.027…ppl/m² = 22.22…people

3) No population is allocated to buildings with a non-residential function. Only buildings with the following land-use type are therefore included in the analysis, all others are ignored (specified by fields LU_CODE and LandUse):

           - continuous urban fabric (code 11100)

           - discontinuous dense urban fabric (code 11210)

           - discontinuous medium density urban fabric (code 11220)

           - discontinuous low density urban fabric (code 11230)

4) Some buildings are located in a residential area, but they nevertheless do not have any inhabitants. This is for instance the case for garage boxes or small warehouses or sheds. Such buildings do not have a separate official address number and can be excluded on this basis (value of field Number_of_addresspoins = 0).
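The allocation rule in principles 2 to 4 can be sketched in plain Python, independent of arcpy. The dictionary keys (`area`, `floors`, `lu_code`, `addresses`) and the function name are hypothetical stand-ins for the attribute fields; in the exam script the values would be read and written with cursors, and assigning each building to its sector (principle 1) is not shown here.

```python
# Land-use codes that count as residential (from the list above)
RESIDENTIAL_CODES = {11100, 11210, 11220, 11230}


def allocate_population(sector_population, buildings):
    """Distribute one sector's population over its buildings.

    buildings: list of dicts with the (hypothetical) keys
    area, floors, lu_code and addresses.
    Returns the number of people per building, in the same order.
    """
    weights = []
    for b in buildings:
        # Only residential buildings with at least one address point count
        eligible = b["lu_code"] in RESIDENTIAL_CODES and b["addresses"] > 0
        weights.append(b["area"] * b["floors"] if eligible else 0.0)

    total = sum(weights)
    if total == 0:
        # Assumption: a sector without eligible buildings gets no allocation
        return [0.0] * len(buildings)

    density = sector_population / total  # people per m² of floor space
    return [w * density for w in weights]


sector_x = [
    {"area": 200.0, "floors": 5, "lu_code": 11100, "addresses": 3},  # building A
    {"area": 400.0, "floors": 2, "lu_code": 11210, "addresses": 1},  # building B
    {"area": 30.0, "floors": 1, "lu_code": 11100, "addresses": 0},   # garage box
]
people = allocate_population(50, sector_x)
print(people)
```

With the numbers from the worked example, buildings A and B receive roughly 27.8 and 22.2 people, and the garage box receives none.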

2020

Question 1

Archaeologists rely on you, as a GIS expert, to develop an application in Modelbuilder that supports them with the analysis of their survey data.

The model you must develop is intended to be run from catalogue only.

It allows the user to draw one or more polygons during run-time on an opened map document (map1.mxd).

The model should then output a new feature class containing all stone circles from the bronze age and the iron age that are completely within the polygon(s) drawn by the user.

The selection of the polygons should be visualized by thick red lines (as in AOI.lyr).

To keep things simple for them, the users should only provide 2 parameters in the model’s dialog box: the “real-time” drawing of polygons and a name for the output feature class as a string.

The location (full path) of the output feature class cannot be changed by the user, but is predefined by you to be stored in the exam’s geodatabase <name_first name_question1.gdb>.

The name itself, however, should be specified by the user but should be preceded by “question1_”.

Finally, your model must use the feature class Archaeology_survey located in the geodatabase rather than the feature layer that will be present in the TOC after you have opened the map document.

As mentioned earlier, the user should also not be able to change it.

All model variables should be meaningfully named.

Question 2

Considering the geographic expanse of the Altai Mountains, exhaustively surveying this vast and inhospitable region is very inefficient. Techniques are therefore needed that allow a fast and effective detection of the burial mounds to help archaeologists make decisions regarding their conservation and excavation.

High resolution remote sensing images allow a detailed observation over large areas. Burial mounds may be detectable in such images, but manual identification is tedious. Many researchers have therefore applied automated approaches to detect (often relatively big) archaeological objects. In recent years, deep learning techniques, especially convolutional neural networks (CNN), have achieved remarkable results in many computer vision applications such as image classification and face detection. Would it not be interesting to apply the algorithms used by Facebook and Google to detect archaeological features on satellite images?

For this question, we ask you to develop a script tool that helps the archaeologists to assess the performance of the Fast-R CNN “tomb” detector, an improved method for detecting burial mounds on high resolution satellite images using CNN for object detection. The CNN has produced bounding boxes of potential burial mound locations (feature class detections in the geodatabase). You are asked to make a script tool that calculates three error metrics (see below) by comparing the detections with the reference data from the field surveys (feature class archaeology survey) using feature selection mechanisms. A feature is considered to be detected if its centroid lies within the detection bounding box.

The script tool should take the following input:

  • The feature class with detected archaeological structures (detections)
  • The feature class with all surveyed archaeological structures (Archaeology_survey)
  • A parameter that allows the user to set a threshold on the size of the features that he/she expects to be detected (use the field Shape_Area in Archaeology_survey; units are m²).

And produce the following output:

  • A feature class with all correctly identified archaeological features (“true positives”)
  • A new “statistics table” with the values of the three error metrics (see below). This means that this new table (find the correct tool to create a new, empty table) should have three columns and just one row. The columns (fields) should be of type float and their name should represent the respective error metric.
  • An informative message stating the three resulting error metrics and mentioning the size threshold that was used.

Put this script tool in a new model that can be run from catalog. The user should provide the requested input and a name plus location for the output table and feature class in the tool dialog. When running the model, give meaningful names to the output.

The error metrics you should calculate:

In machine learning and information retrieval experiments, one often uses the metrics of Precision, Recall and F measure to assess the performance of algorithms.

In our experiment, precision is the fraction of correctly identified archaeological objects relative to all detections that are made. Example: our detector found 100 objects, 60 of them are actual archaeological objects (based on the features present in the survey data). This means the precision is 0.6.

Recall, on the other hand, tells us how many objects our detector has correctly identified relative to all objects that it should have been able to detect. Example: there are 120 archaeological objects in this area that our detector should have found based on the survey data. It has found 60 of those, which means the recall is 0.5. As there are many small objects in the surveyed data that we cannot hope to detect on satellite images, the user should be able to set a threshold to “filter out” smaller features for the error calculation. For example, with a threshold on the Shape_Area field of 50m², features smaller than that size are not considered as objects that should be found by the detector and should therefore not be taken into consideration to calculate recall.

The F measure is derived from precision and recall as follows:

F = 2 · (precision · recall) / (precision + recall)
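Once the three counts are known (for instance from feature selections and a count of the selected features), the metrics themselves are a few lines of arithmetic. The function and parameter names below are hypothetical, and returning 0.0 when a denominator is zero is an assumption, since the exam text does not say how to handle that case.

```python
def detection_metrics(n_detections, n_true_positives, n_expected):
    """Return (precision, recall, f_measure) from three counts.

    n_detections:     all bounding boxes produced by the detector
    n_true_positives: detections that match a surveyed feature
    n_expected:       surveyed features at or above the size threshold
    """
    # Assumption: an empty denominator yields 0.0 instead of an error
    precision = n_true_positives / n_detections if n_detections else 0.0
    recall = n_true_positives / n_expected if n_expected else 0.0
    if precision + recall == 0:
        f_measure = 0.0
    else:
        f_measure = 2 * (precision * recall) / (precision + recall)
    return precision, recall, f_measure


precision, recall, f_measure = detection_metrics(100, 60, 120)
print(precision, recall, round(f_measure, 3))
```

The call uses the numbers from the two examples above (100 detections, 60 correct, 120 expected).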

2019

June

Question 1: The city of Dar es Salaam in Tanzania is very prone to floods. You received three feature classes: buildings, roads and flood areas (you did not need the roads feature class). Produce a model that makes a new feature class with all residential buildings that are (partly) located in the flood areas and are more than 200 meters away from a hospital. Your model may consist of only one 'tool', namely your Python script (so you were not allowed to simply build the model in ModelBuilder; you had to code everything yourself). At the end of the run, a message should appear telling the user how many buildings were in this zone. The user should also be able to define a suffix to be appended to the name of the final feature class.

Question 2: You use the same files. Write a script that produces a text file containing the id and the latitude and longitude of the centroid of all buildings of type public, school or hospital. The three values should be separated by a ; and every building should be on a new line.

2017

Question 1

One of the problems is that people construct settlements in natural flood zones. You are asked to develop a model that puts all RESIDENTIAL buildings that are within the area prone to flooding in a new feature class. A simple ‘clip’ operation is not useful as we do not want the building outlines to be cut by the wetland polygon. If a building is even partly inside the flood prone area (wetland), the entire outline should be included in the output feature class.

The model should provide the user with 3 input parameters: the feature classes containing the buildings used as input, the wetlands and the output feature class. When you run the model, the output feature class should be called “question1_infloodzone” and stored in the gDB.

Question 2

As part of an effort to quantify the “hazard” for each building, you are asked to develop a Python script that calculates the distance from each building to the nearest PRIMARY or TRUNK road segment (fclass field), calculates a hazard factor (0-100) based on this distance, and stores both the distance and the hazard factor as two new fields in the output feature class. We defined the extent of the study area to keep calculation times relatively short.

hazard factor = 100 / (1 + 500 · e^(-0.015 · d))

where d is the shortest distance between the building outline (not the centroid) and the primary road. Implement this formula as a Python function.

You will need to use the proper method of geometry objects for distance calculation. Also take note that the roads are made up of several segments (polylines). You can choose any meaningful names for the outline fields, but they should be checked for validity within the script.
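A minimal sketch of the formula as a Python function, assuming it is read as 100 / (1 + 500 · e^(-0.015 · d)); that grouping is an interpretation, chosen because it keeps the result within the stated 0-100 range. The function name is hypothetical, and computing d itself with geometry objects is not shown.

```python
import math


def hazard_factor(distance_m):
    """Hazard factor (0-100) for a shortest distance d in metres.

    Assumes the logistic reading of the formula:
    100 / (1 + 500 * e^(-0.015 * d)).
    """
    return 100.0 / (1.0 + 500.0 * math.exp(-0.015 * distance_m))


for d in (0, 100, 500, 1000):
    print(d, round(hazard_factor(d), 2))
```

Under this reading the value starts near 0.2 at d = 0 and approaches 100 for large distances.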

Question 3

Relief workers would like you to write a short Python script that writes out information from the feature class you produced in Question 2 to a txt file. Each building should be stored as a separate line and the following information for each building should be separated by a “;”

  • X coordinate of building centroid
  • Y coordinate of building centroid
  • shortest distance of building outline to primary road
  • hazard factor

If you failed to produce the output required by Question 2, write the field “osm_id” and “typ” instead of shortest distance and hazards factor.
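The export step can be sketched independently of arcpy: given one tuple per building, write the values joined by ";". The rows below are made-up numbers and the function name is hypothetical; in the exam script the rows would presumably come from a search cursor on the Question 2 output (for example with the `SHAPE@XY` token for the centroid).

```python
import io


def write_building_lines(rows, out):
    """Write one 'x;y;distance;hazard' line per building to a text stream."""
    for x, y, distance, hazard in rows:
        out.write(";".join(str(v) for v in (x, y, distance, hazard)) + "\n")


# Made-up example rows: (centroid x, centroid y, distance, hazard factor)
rows = [
    (530000.5, 9245000.25, 120.0, 1.2),
    (530100.0, 9245100.0, 0.0, 0.2),
]

# An in-memory stream keeps the sketch self-contained
buffer = io.StringIO()
write_building_lines(rows, buffer)
text = buffer.getvalue()
print(text)
```

To write an actual file, pass a handle opened with `open(path, "w")` instead of the `StringIO` buffer.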

2016

  1. Given: a feature class with companies and a feature class with horeca establishments (both point feature classes). For each company, find the closest bar or café (two of the horeca types) and give the distance to and the name of that establishment. Store the result in a new feature class.
  2. Given: a file from the internet about horeca in Leuven that could be read as a text file. It contained many lines, spaces, etc. with text; somewhere in it the name, somewhere the type (there were several types per name), and somewhere the lat and lon. Make a point feature class, stored in a geodatabase, containing for every horeca establishment its name, its type (only the first type) and the SHAPE@ (point feature). Note: the text file is in a different reference system than the one required for the output.
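For the first task, the nearest-establishment search can be sketched as a brute-force loop over plain coordinate tuples. The names and coordinates are invented, and the sketch assumes both layers are in the same projected coordinate system so that planar distances are meaningful; reading the points and writing the new feature class would be done with arcpy and are not shown.

```python
import math


def nearest_establishment(company_xy, establishments):
    """Return (name, distance) of the establishment closest to a company.

    establishments: list of (name, (x, y)) tuples in the same
    projected coordinate system as company_xy.
    """
    best_name, best_dist = None, math.inf
    for name, (x, y) in establishments:
        dist = math.hypot(x - company_xy[0], y - company_xy[1])
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name, best_dist


# Invented bars and cafés, already filtered on type
horeca = [("Cafe A", (0.0, 0.0)), ("Bar B", (30.0, 40.0))]
name, dist = nearest_establishment((27.0, 36.0), horeca)
print(name, dist)
```

The loop is quadratic when run for every company, which is acceptable for small exam datasets.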

2015

Morning

1. exercise practically the same as ex 2.3

Given: feature class with rivers ("hydrography")

Develop a model that automatically generates a separate feature class for each type of basin ("STRMGEB"). The name of the feature classes should contain the name of the basin (from the attribute table).

2. Given: feature class with rivers ("hydrography")

Generate a standalone script that replaces all the spaces by underscores for the values in the fields "STRMGEB" and "BEKNAAM". Your result should be a new feature class (so copy the original and don't edit that one).

3. An exercise with geometries, much more difficult than all the others.

Given: feature class with some river features.

We want to simplify these line features by reducing the number of vertices. A figure was given. The idea is to start at the first vertex V0. If the next vertex V1 is within a certain distance delta from V0, it should be removed from the output line feature. If V1 is further away from V0 than the distance delta, the vertex will be retained and will become the new V0. However, the first and the last vertex of each polyline feature must be retained in any case. (It was not said in the exercise description but some of the line features were multipart...). Desired output: a new feature class with the simplified line features. The script should be a standalone script.
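The vertex-dropping rule described above can be sketched on plain coordinate lists. The function name is hypothetical, and the sketch handles a single part; for multipart features it would be applied to each part separately. Reading the geometries and writing the new feature class with arcpy are not shown.

```python
import math


def simplify_part(vertices, delta):
    """Drop vertices that lie within delta of the last retained vertex.

    vertices: list of (x, y) tuples for ONE part of a polyline.
    The first and the last vertex are always kept.
    """
    if len(vertices) <= 2:
        return list(vertices)

    kept = [vertices[0]]
    for vertex in vertices[1:-1]:
        anchor = kept[-1]  # the current V0
        dist = math.hypot(vertex[0] - anchor[0], vertex[1] - anchor[1])
        if dist > delta:
            kept.append(vertex)  # retained vertex becomes the new V0
    kept.append(vertices[-1])
    return kept


line = [(0, 0), (1, 0), (2, 0), (5, 0), (5.5, 0), (6, 0)]
simplified = simplify_part(line, delta=2.5)
print(simplified)
```

In this example the vertices at x = 1, 2 and 5.5 are dropped, while the end vertex at x = 6 is kept even though it lies within delta of the previous retained vertex.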

Afternoon

The solution of an exercise shown in class. It can be a ModelBuilder model, a Python script, or a combination of both. You must be able to explain every step!