Tobias Quadfasel 02f1b41cb9 feat(azure): Added necessary azure components to app
Using respective credentials for both local development as well as
deployment. When deployed on azure, the app authenticates with the SQL
database via Entra ID (formerly active directory) and accesses other
credentials via key vault as a system managed identity.
2024-09-03 21:51:12 +02:00
2024-09-03 17:23:41 +02:00

grid_application

Code for my application to Avacon AG for the role of Data Scientist. This is a web app containing several data science use cases related to power grids and electricity generation.

Data sources

The data in this application was randomly generated and has already been uploaded to an Azure SQL Database. To be transparent about how this was done, the scripts and files used are included in this repository.

The scripts for general preprocessing and for database interaction are both located in the data_preparation directory. The raw data and the preprocessed data file that was ultimately uploaded to the database are in the data directory.

All sources for this data are publicly available. The following resources were used for the different kinds of information:

  • German surnames: Most frequent German Surnames from Wiktionary
  • German given names: Most frequent male and female given names in Germany from Wiktionary
  • Street names: Street names from the Hanseatic city of Rostock, made available as open data here
  • Zip codes: from opendatasoft
  • Additional information for each zip, such as city name, longitude, latitude etc. using this public API
  • Rough bounding box information for Avacon Netz service area: netzgebiete.avacon.de

Data structure

The above data is used to randomly generate a user-specified number of customers. Currently, 1000 customers have been generated. Customer information includes:

  • Given name and surname
  • Street name, house number, zip code and city
  • Two meter IDs per customer: one for a natural gas meter, one for an electricity meter
  • Each customer has between 1 and 10 (also chosen randomly) meter readings, which include:
    • The date at which the reading was obtained
    • The value that was read from the meter
  • For simplicity, I assumed that electricity and gas meter readings always occur in pairs (i.e. no customer reads only the electricity meter or only the natural gas meter)
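
The generation rules listed above can be sketched roughly as follows. This is a minimal illustration, not the script from the data_preparation directory: the names `Customer`, `Reading` and `generate_customers`, the start date and the value ranges are all assumptions made for the example.

```python
import random
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class Reading:
    # One paired reading: electricity and gas are always read together.
    reading_date: date
    electricity_value: float
    gas_value: float


@dataclass
class Customer:
    given_name: str
    surname: str
    readings: list = field(default_factory=list)


def generate_customers(n, given_names, surnames, seed=None):
    """Generate n customers, each with 1 to 10 paired meter readings."""
    rng = random.Random(seed)
    start = date(2020, 1, 1)  # assumed start of the reading period
    customers = []
    for _ in range(n):
        customer = Customer(rng.choice(given_names), rng.choice(surnames))
        # The number of readings is itself chosen randomly between 1 and 10.
        for _ in range(rng.randint(1, 10)):
            customer.readings.append(
                Reading(
                    reading_date=start + timedelta(days=rng.randrange(4 * 365)),
                    electricity_value=round(rng.uniform(0, 10_000), 1),  # assumed range
                    gas_value=round(rng.uniform(0, 10_000), 1),  # assumed range
                )
            )
        customers.append(customer)
    return customers
```

Passing a fixed `seed` makes the generated data reproducible, which is convenient when the same data set needs to be regenerated before an upload.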

The customers, meters and address data are generated and uploaded to the SQL database. The ERD of the database looks like this:

[Figure: An ERD of the Avacon customer database]

Customers have a first name and a last name and reference other tables only through their gas and electricity meter IDs. I preferred this to referencing addresses directly because one address can hold multiple households (and meters), so meter IDs seemed the more natural choice.

Meters have a signature, which also works like an ID. It is a string in the format W.XXX.YYY.Z, where W, X, Y and Z are digits from 1 to 9. The MeterType has the value GAS for gas meters and ELT for electricity meters. Each meter is located at a certain address and is therefore linked to the Addresses table by an AddressID.
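
As an illustration of the signature format, here is a small sketch that generates and validates such strings. The helper names `random_signature` and `is_valid_signature` are hypothetical, and the sketch assumes that every digit position is drawn independently (i.e. XXX is three digits, not one digit repeated three times).

```python
import random
import re

# W.XXX.YYY.Z with every position a digit from 1 to 9 (no zeros).
SIGNATURE_PATTERN = re.compile(r"[1-9]\.[1-9]{3}\.[1-9]{3}\.[1-9]")


def random_signature(rng):
    """Draw eight independent digits from 1 to 9 and format them as W.XXX.YYY.Z."""
    d = [str(rng.randint(1, 9)) for _ in range(8)]
    return f"{d[0]}.{''.join(d[1:4])}.{''.join(d[4:7])}.{d[7]}"


def is_valid_signature(signature):
    """Return True if the whole string matches the signature format."""
    return SIGNATURE_PATTERN.fullmatch(signature) is not None
```

A check like `is_valid_signature` can be run before inserting meters into the database to catch malformed signatures early.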

The Addresses table contains street name, house number, city, zip and geo information.

Finally, the Readings table stores the meter values read by the customers. Each reading is taken by exactly one customer from exactly one meter and contains the date and the value that was read off the meter.
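
To make the table relationships concrete, here is a rough sketch of the four tables using Python's built-in sqlite3 module instead of Azure SQL. Only the table names and the MeterType and AddressID columns come from the description above; all other column names, the types and the constraints are assumptions and may differ from the actual database.

```python
import sqlite3

# Column names other than MeterType and AddressID are assumed for illustration.
SCHEMA = """
CREATE TABLE Addresses (
    AddressID   INTEGER PRIMARY KEY,
    Street      TEXT NOT NULL,
    HouseNumber TEXT NOT NULL,
    City        TEXT NOT NULL,
    Zip         TEXT NOT NULL,
    Latitude    REAL,
    Longitude   REAL
);
CREATE TABLE Meters (
    MeterID   INTEGER PRIMARY KEY,
    Signature TEXT NOT NULL UNIQUE,
    MeterType TEXT NOT NULL CHECK (MeterType IN ('GAS', 'ELT')),
    AddressID INTEGER NOT NULL REFERENCES Addresses(AddressID)
);
CREATE TABLE Customers (
    CustomerID         INTEGER PRIMARY KEY,
    FirstName          TEXT NOT NULL,
    LastName           TEXT NOT NULL,
    GasMeterID         INTEGER NOT NULL REFERENCES Meters(MeterID),
    ElectricityMeterID INTEGER NOT NULL REFERENCES Meters(MeterID)
);
CREATE TABLE Readings (
    ReadingID    INTEGER PRIMARY KEY,
    CustomerID   INTEGER NOT NULL REFERENCES Customers(CustomerID),
    MeterID      INTEGER NOT NULL REFERENCES Meters(MeterID),
    ReadingDate  TEXT NOT NULL,
    ReadingValue REAL NOT NULL
);
"""


def create_demo_database():
    """Create an in-memory SQLite database with the sketched schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
    conn.executescript(SCHEMA)
    return conn
```

With this layout, the address of a customer is reached through one of their meters (Customers → Meters → Addresses), which mirrors the choice of linking customers to meters rather than to addresses.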

Description
A web application to chat with customer and meter reading data of an energy grid company
Languages
Python 95.5%
Dockerfile 2.4%
Shell 2.1%