
Tobias Quadfasel 4b9fa0579e feat(ai-chat): Add SQL query field for comparison

In order to compare the (not yet implemented) SQL query generated by
the LLM with an actual query, another text field was added that passes
the query to `pyodbc`, which connects to our database, stores the
resulting rows in a `pandas` dataframe and then visualizes it as a table
in plotly dash.

The SQL functionalities are implemented in the `sql_utils.py` module.

Additionally, some minor updates to the overall behavior and layout of
the app were implemented.

2024-09-02 20:43:48 +02:00

.docker

feat(base-app): Add docker setup

2024-08-29 17:35:14 +02:00

app

feat(ai-chat): Add SQL query field for comparison

2024-09-02 20:43:48 +02:00

assets

docs(dataprep): Add documentation about data sources and structure

2024-08-31 15:21:08 +02:00

data

feat(add-data): Add actual data used for data generation to repo

2024-08-31 14:00:29 +02:00

data_preparation

feat(add-data): Add script to insert prepped data into database

2024-08-31 14:39:21 +02:00

.pre-commit-config.yaml

feat(project_setup): Initialize project (#1)

2024-08-29 14:08:09 +00:00

Dockerfile

feat(base-app): Add docker setup

2024-08-29 17:35:14 +02:00

poetry.lock

feat(ai-chat): Add openai python SDK to project

2024-08-31 17:03:58 +02:00

pyproject.toml

feat(ai-chat): Add openai python SDK to project

2024-08-31 17:03:58 +02:00

README.md

docs(dataprep): Add documentation about data sources and structure

2024-08-31 15:21:08 +02:00

README.md

grid_application

Code for my application to Avacon AG for the role of Data Scientist. This is a web app containing different data science use-cases related to power grids and electricity generation.

Data sources

The data in this application was randomly generated and has already been uploaded to an Azure SQL Database for you. To be transparent about how this was done, the scripts and files are included in this repository.

The scripts for general preprocessing and for database interaction are both located in the data_preparation directory. The raw data and the preprocessed data file that was ultimately uploaded to the database are found in the data directory.

All sources for this data are publicly available. The following resources were used for the different kinds of information:

- German surnames: Most frequent German surnames from Wiktionary
- German given names: Most frequent male and female given names in Germany from Wiktionary
- Street names: Street names from the Hanseatic city of Rostock, made available as open data here
- Zip codes: from opendatasoft
- Additional information for each zip code, such as city name, longitude and latitude, using this public API
- Rough bounding box information for the Avacon Netz service area: netzgebiete.avacon.de

Data structure

The above data is used to randomly generate a user-specified number of customers. Currently, 1000 customers have been generated. Customer information includes:

- Given name and surname
- Street name, house number, zip code and city
- Two meter IDs per customer: one for a natural gas meter, one for an electricity meter
- Between 1 and 10 meter readings per customer (the number is also chosen randomly), each of which includes:
  - The date on which the reading was obtained
  - The value that was read from the meter
- For simplicity, I assumed that electricity and gas meter readings always occur in pairs (i.e. there is no customer who reads only electricity meter values or only natural gas meter values)
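
As an illustration of the generation rules listed above, here is a minimal sketch. It is not the script from the data_preparation directory: the function names, the dictionary keys, the value ranges, the date range and the integer meter IDs are all placeholder assumptions. Only the rules themselves (1 to 10 readings per customer, gas and electricity always read together) come from the description.

```python
import random
from datetime import date, timedelta


def generate_customer(rng, given_names, surnames, streets, zip_cities):
    """Build one random customer record (hypothetical structure)."""
    zip_code, city = rng.choice(zip_cities)  # zip_cities: list of (zip, city)
    readings = []
    for _ in range(rng.randint(1, 10)):  # 1 to 10 readings, bounds inclusive
        readings.append({
            # Placeholder date range: a random day in 2023.
            "date": date(2023, 1, 1) + timedelta(days=rng.randrange(365)),
            # Gas and electricity values are always recorded as a pair.
            "gas_value": round(rng.uniform(0, 10_000), 1),
            "electricity_value": round(rng.uniform(0, 10_000), 1),
        })
    return {
        "given_name": rng.choice(given_names),
        "surname": rng.choice(surnames),
        "street": rng.choice(streets),
        "house_number": rng.randint(1, 200),  # placeholder range
        "zip": zip_code,
        "city": city,
        "gas_meter_id": rng.randrange(10**6),  # placeholder IDs
        "electricity_meter_id": rng.randrange(10**6),
        "readings": readings,
    }


def generate_customers(n, seed=None, **sources):
    """Generate n customers; a fixed seed makes the output reproducible."""
    rng = random.Random(seed)
    return [generate_customer(rng, **sources) for _ in range(n)]
```

Calling `generate_customers(1000, seed=0, given_names=..., surnames=..., streets=..., zip_cities=...)` with the lists loaded from the raw data files would then yield a data set of the size mentioned above.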

The customers, meters and address data are generated and uploaded to the SQL database. The ERD of the database looks like this:

Customers have a first name and a last name and reference other tables only through their gas and electricity meter IDs. I preferred this to referencing addresses because there can be multiple households (and meters) at one address, so meter IDs seemed the more natural choice.

Meters have a signature, which also works like an ID. It is a string in the format W.XXX.YYY.Z, where W, X, Y and Z are digits from 1 to 9. The MeterType has the value GAS for gas meters and ELT for electricity meters. Each meter is located at a certain address and is therefore linked to the Addresses table by an AddressID.
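
To make the signature format concrete, here is a small sketch that generates and validates such strings. The helper names `make_signature` and `is_valid_signature` are hypothetical and not taken from the repository. The sketch also assumes that each of the eight digit positions is drawn independently, which the description does not state explicitly.

```python
import random
import re

# Format described above: W.XXX.YYY.Z, every digit between 1 and 9.
SIGNATURE_PATTERN = re.compile(r"[1-9]\.[1-9]{3}\.[1-9]{3}\.[1-9]")


def make_signature(rng: random.Random) -> str:
    """Return a random signature in the W.XXX.YYY.Z format."""
    d = [str(rng.randint(1, 9)) for _ in range(8)]  # 1 + 3 + 3 + 1 digits
    return f"{d[0]}.{''.join(d[1:4])}.{''.join(d[4:7])}.{d[7]}"


def is_valid_signature(signature: str) -> bool:
    """Check that the whole string matches the signature format."""
    return SIGNATURE_PATTERN.fullmatch(signature) is not None
```

Using `fullmatch` rather than `match` rejects strings that merely start with a valid signature.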

The Addresses table contains street name, house number, city, zip and geo information.

Finally, the Readings table stores the meter values read by the customers. Each reading is taken by exactly one customer from exactly one meter and contains the date of the reading and the value that was read off the meter.
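
The relationships between the four tables can be sketched as a schema. The sketch below uses an in-memory SQLite database so that it runs anywhere, whereas the actual data lives in an Azure SQL Database. The table names, MeterType and AddressID come from the description above. All other column names, the column types and the constraints are assumptions and may differ from the real ERD.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Addresses (
    AddressID   INTEGER PRIMARY KEY,
    StreetName  TEXT NOT NULL,
    HouseNumber TEXT NOT NULL,
    City        TEXT NOT NULL,
    Zip         TEXT NOT NULL,
    Latitude    REAL,  -- geo information
    Longitude   REAL
);
CREATE TABLE Meters (
    MeterID   INTEGER PRIMARY KEY,
    Signature TEXT NOT NULL UNIQUE,  -- W.XXX.YYY.Z
    MeterType TEXT NOT NULL CHECK (MeterType IN ('GAS', 'ELT')),
    AddressID INTEGER NOT NULL REFERENCES Addresses(AddressID)
);
CREATE TABLE Customers (
    CustomerID         INTEGER PRIMARY KEY,
    FirstName          TEXT NOT NULL,
    LastName           TEXT NOT NULL,
    GasMeterID         INTEGER NOT NULL REFERENCES Meters(MeterID),
    ElectricityMeterID INTEGER NOT NULL REFERENCES Meters(MeterID)
);
CREATE TABLE Readings (
    ReadingID    INTEGER PRIMARY KEY,
    CustomerID   INTEGER NOT NULL REFERENCES Customers(CustomerID),
    MeterID      INTEGER NOT NULL REFERENCES Meters(MeterID),
    ReadingDate  TEXT NOT NULL,  -- ISO date string
    ReadingValue REAL NOT NULL
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the four tables and enable foreign key enforcement."""
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.executescript(SCHEMA)
```

Note that customers reach their address only indirectly, through Meters.AddressID, which mirrors the design choice explained above.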