Merge pull request 'docs/finish-readme: Update documentation' (#9) from docs/finish-readme into main

Reviewed-on: #9
2024-09-04 06:21:35 +00:00
3 changed files with 61 additions and 4 deletions


@@ -1,5 +1,47 @@
# grid_application
Code for my application to Avacon AG for the role of Data Scientist. This is a web app containing different data science use-cases related to power grids and electricity generation.
# Avacon Data Science project: Chat with your data application
Code for a self-implemented web app for my application at Avacon Netz for the role of Data Scientist. This is a "chat with your data app" using generated data of customers and their gas and electricity meter readings.
## Introduction
I hope that this small project underlines my enthusiasm and curiosity for the position, while also giving an insight into my coding style and my technical skills.
I chose the topic "chat with your data" because I found it a good fit for the area of data and platform management. A simple natural-language interface through which colleagues without SQL skills or complex user interfaces can retrieve relevant data is certainly very helpful and can improve the efficiency and adoption of data tools within the company.
I implemented this app entirely myself, from generating the dataset from publicly available sources to the final deployment on the Microsoft Azure cloud. All details are contained in this repository.
The application can be accessed at the following link: [Link](https://avc-app-ahbhc8hagheua3bx.germanywestcentral-01.azurewebsites.net/)
The username and password will be provided with my application documents, as will the environment variables required to run the code locally.
## Running the Application
The app is deployed on an Azure App Service instance (see the link above), but it can also be run locally. To do this, several environment variables need to be set for the APIs and authentication to work properly. These will be provided with my application documents.
To run this app, install poetry (see the [official documentation](https://python-poetry.org/docs/) for details). Then, simply run the following commands in a shell of your choice:
```
poetry install
poetry shell
```
Now, you should be in a shell with all the required packages installed. This code connects to Azure SQL using `pyodbc`, so the Microsoft ODBC driver (version 18) must be installed; see the [official documentation](https://learn.microsoft.com/en-us/sql/connect/odbc/download-odbc-driver-for-sql-server?view=sql-server-ver16) for instructions. Once this is done, make sure the environment variables mentioned above are exported and then run:
```
cd app
gunicorn app:server -b 0.0.0.0:8000
```
The server should then start and can be accessed via browser at `0.0.0.0:8000`.
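For reference, here is a minimal sketch of how a `pyodbc` connection to Azure SQL with ODBC Driver 18 could be built from environment variables; the variable names are placeholders for illustration, not the actual names used in this repository:
```
# Minimal sketch (not this repository's actual code): connect to Azure SQL
# via ODBC Driver 18 using credentials from environment variables.
# The environment variable names are placeholders.
import os

import pyodbc

conn_str = (
    "DRIVER={ODBC Driver 18 for SQL Server};"
    f"SERVER={os.environ['SQL_SERVER']};"
    f"DATABASE={os.environ['SQL_DATABASE']};"
    f"UID={os.environ['SQL_USER']};"
    f"PWD={os.environ['SQL_PASSWORD']};"
    "Encrypt=yes;TrustServerCertificate=no;"
)

with pyodbc.connect(conn_str, timeout=30) as conn:
    cursor = conn.cursor()
    cursor.execute("SELECT 1")
    print(cursor.fetchone())
```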
## General structure of the app
The main structure of the `Dash` app starts with a text input field where the question prompt can be entered. Once the submit button is clicked, the user message, together with a long and somewhat optimized system prompt, is sent via the OpenAI API to a generic GPT-4o model.
The model is prompted to give back its answer as a `JSON`-encoded string. It includes a summary in natural language and a SQL query. The query is run on the Azure SQL Database using `pyodbc`. The summary as well as the query itself are shown in an output text field to the user. The query result is read into a `pandas` dataframe, which is then displayed as an interactive table.
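As a rough sketch of this flow (not the actual implementation; the prompt text, the model call parameters, and the JSON keys `summary` and `sql` are assumptions for illustration only):
```
# Sketch of the prompt -> JSON -> SQL -> DataFrame flow described above.
# The prompt text and the JSON keys "summary"/"sql" are assumptions.
import json
import os

import pandas as pd
import pyodbc
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

SYSTEM_PROMPT = (
    "You translate questions about the customer, meter and reading tables "
    'into T-SQL. Answer as JSON: {"summary": "...", "sql": "..."}'
)

def answer_question(question: str, conn: pyodbc.Connection) -> tuple[str, pd.DataFrame]:
    response = client.chat.completions.create(
        model="gpt-4o",
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": question},
        ],
    )
    payload = json.loads(response.choices[0].message.content)
    # Run the generated query and return the natural-language summary
    # together with the query result for display as an interactive table.
    df = pd.read_sql(payload["sql"], conn)
    return payload["summary"], df
```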
Below this main section, there is a "control field", which can be used to manually input SQL queries for comparison. It is also possible to copy/paste the SQL output of the model into this field to check its result.
The questions that can be asked of course depend on the data, which is described in detail in the following sections. Additionally, some example prompts are provided in the web application directly.
## Data sources
@@ -41,3 +83,17 @@ Meters have a signature, which also works like an ID. It is a string in the form
The `Addresses` table contains street name, house number, city, zip and geo information.
Finally, the `Readings` table stores the data of the meter values read by the customers. Each reading is done by a unique customer from a unique meter and contains the date and the value that was read off the meter.
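To illustrate the kind of query the model might generate, or that could be pasted into the control field described above, here is a hedged example; the column and join-key names are assumptions based on the description above, not the actual schema:
```
# Illustrative only: table and column names below are guesses based on the
# description above, not the actual schema of this repository.
import pandas as pd
import pyodbc

EXAMPLE_SQL = """
SELECT a.city, AVG(r.value) AS avg_reading
FROM Readings AS r
JOIN Customers AS c ON c.id = r.customer_id
JOIN Addresses AS a ON a.id = c.address_id
GROUP BY a.city
ORDER BY avg_reading DESC;
"""

def run_example(conn: pyodbc.Connection) -> pd.DataFrame:
    """Run the example aggregation and return the result as a DataFrame."""
    return pd.read_sql(EXAMPLE_SQL, conn)
```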
## Cloud Infrastructure
The infrastructure is best described by the image below:
<p align="center">
<img src="./assets/cloud_structure.png" alt="A schematic of the cloud app structure" width="80%">
</p>
The App Service needs several secrets, which it retrieves from an Azure Key Vault by authenticating as a system-assigned managed identity via role-based access control (RBAC). It authenticates with the Azure SQL Server and Database in the same way. Using the secrets provided by the Key Vault, the App Service can authenticate users and query the Azure OpenAI resource with the API key, as well as connect to the SQL database to run queries.
Note that due to overly strict rate limits, a regular OpenAI connection is used rather than an Azure OpenAI instance, but the principle is the same.
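A minimal sketch of the secret-retrieval step, assuming the standard `azure-identity` and `azure-keyvault-secrets` packages (the vault URL and secret name below are placeholders, not the actual values used here):
```
# Sketch: fetch a secret from Azure Key Vault using the App Service's
# managed identity (RBAC). Vault URL and secret name are placeholders.
import os

from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

credential = DefaultAzureCredential()  # picks up the managed identity on App Service
vault = SecretClient(
    vault_url=os.environ["KEY_VAULT_URL"],  # e.g. https://<vault-name>.vault.azure.net
    credential=credential,
)
openai_api_key = vault.get_secret("openai-api-key").value
```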


@@ -43,8 +43,9 @@ notification_md = """
**Notes:**
Due to the economical pricing tier, it can take a few seconds until the
connection to the database is established. In case of an error, please retry once or twice
(reload the page).
connection to the database is established. In case of an error or a long loading time
(> 2 min.), please retry once or twice (reload the page). Once the connection
has been established, subsequent requests are fast.
GPT-4o can make some mistakes. If this happens, an error message is displayed.
In this case it is often worth re-submitting a slightly modified request and possibly

BIN  assets/cloud_structure.png (new binary file, 224 KiB)