feat/add-data #3

Merged
quadfaselt merged 7 commits from feat/add-data into main 2024-08-31 12:43:11 +00:00
No description provided.
quadfaselt added 7 commits 2024-08-31 12:42:35 +00:00
Usually, one would not commit the actual data files, but store them
elsewhere (such as in Azure Blob Storage). In this case, it is still
convenient for external reviewers to get an idea of the structure of
the data.

Since the files total less than 2 MB, this is acceptable in this
specific case.
The script `get_base_data.py` takes the raw data files (such as
`names.txt`) and formats them into a common JSON file, which can later
be used to randomly generate customer and meter reading data.

Additionally, the script filters for all eligible zip codes within an
approximate Avacon Netz service area and provides some additional
information for them.

An example output file, `base_data.json`, has been added to the repo in
a previous commit.
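
A minimal sketch of how raw files could be combined into one JSON
document is given here for illustration. It is not the actual
`get_base_data.py`: the function names, the JSON keys, and the
prefix-based zip code filter (`eligible_prefixes`) are assumptions made
for this example, and the real service-area filter may work differently.

```python
import json
from pathlib import Path


def read_lines(path):
    """Return the non-empty, stripped lines of a UTF-8 text file."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def build_base_data(names, zip_rows, eligible_prefixes):
    """Combine raw inputs into one dictionary.

    zip_rows is a list of (zip_code, city) pairs. eligible_prefixes is a
    hypothetical stand-in for the approximate service-area filter.
    """
    prefixes = tuple(eligible_prefixes)
    zip_codes = [
        {"zip_code": code, "city": city}
        for code, city in zip_rows
        if code.startswith(prefixes)
    ]
    return {"names": names, "zip_codes": zip_codes}


def write_base_data(base_data, path="base_data.json"):
    """Serialize the combined data to a JSON file."""
    payload = json.dumps(base_data, ensure_ascii=False, indent=2)
    Path(path).write_text(payload, encoding="utf-8")
```

Keeping the filtering in a pure function such as `build_base_data`
makes it possible to check the result without touching the file system.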
Add `data_preparation/generate_customers.py`, a script that takes the
`base_data.json` file generated by `get_base_data.py` and randomly
samples a given number of customers.

To simplify things, each customer is assigned exactly one gas and one
electricity meter and each of them is read between 1 and 10 times.

The full data including meters, meter readings and dates as well as
customers and addresses is stored in a final JSON file named
`customers.json`.
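
The sampling described above could look roughly like the following
sketch. It is an illustration under assumptions, not the actual
`generate_customers.py`: the layout of `base_data` (keys `names` and
`zip_codes`), the output field names, and the value ranges are all
hypothetical.

```python
import random
from datetime import date, timedelta


def generate_readings(rng, start=date(2023, 1, 1), max_days=365):
    """Return between 1 and 10 readings with increasing dates and values."""
    count = rng.randint(1, 10)  # randint is inclusive on both ends
    offsets = sorted(rng.sample(range(max_days), count))
    value = 0.0
    readings = []
    for offset in offsets:
        value += rng.uniform(10, 500)  # a meter counter only goes up
        readings.append(
            {
                "date": (start + timedelta(days=offset)).isoformat(),
                "value": round(value, 1),
            }
        )
    return readings


def generate_customer(rng, base_data, customer_id):
    """Sample one customer with exactly one gas and one electricity meter."""
    zip_entry = rng.choice(base_data["zip_codes"])
    return {
        "id": customer_id,
        "name": rng.choice(base_data["names"]),
        "address": dict(zip_entry),
        "meters": [
            {"type": meter_type, "readings": generate_readings(rng)}
            for meter_type in ("gas", "electricity")
        ],
    }


def generate_customers(base_data, n, seed=None):
    """Sample n customers; pass a seed for reproducible output."""
    rng = random.Random(seed)
    return [generate_customer(rng, base_data, i) for i in range(1, n + 1)]
```

Using a dedicated `random.Random` instance with an optional seed keeps
the generated data reproducible without affecting the global random
state.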
Since we are working with an Azure SQL database, we need to load the
generated customer data into a suitable schema. The schema will be
described in more detail in an updated README file later.

The added script uses `pyodbc` to connect to the database and create the
tables. This requires a connection string, which is not committed to
this repo for security reasons and must be obtained separately.

Additionally, a script `test_sql_connection.py` is added with this
commit, which is a simple utility to test the `pyodbc` connection.
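
For illustration, a connection check along these lines could be written
as follows. This is a sketch, not the actual `test_sql_connection.py`:
the environment variable name `AZURE_SQL_CONNECTION_STRING` and both
function names are assumptions, chosen so that the connection string
stays outside the repository.

```python
import os


def get_connection_string(env_var="AZURE_SQL_CONNECTION_STRING"):
    """Read the connection string from the environment (hypothetical name)."""
    value = os.environ.get(env_var)
    if not value:
        raise RuntimeError(f"Environment variable {env_var} is not set")
    return value


def check_connection(connection_string):
    """Return True if a trivial query succeeds against the database."""
    # Imported here so the module can be loaded without the ODBC driver.
    import pyodbc

    connection = pyodbc.connect(connection_string, timeout=5)
    try:
        cursor = connection.cursor()
        row = cursor.execute("SELECT 1").fetchone()
        return row[0] == 1
    finally:
        connection.close()
```

A call such as `check_connection(get_connection_string())` would then
either return `True` or raise the underlying `pyodbc` error.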
The script `insert_sql.py` uses `pyodbc` to connect to the Azure SQL
database, loads the data from the preprocessed `customers.json` file,
formats it, and then inserts it into the previously created tables.
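
The load, format, and insert steps might be sketched as below. The
table name `meter_readings`, its columns, and the structure assumed for
`customers.json` are hypothetical and do not reflect the real schema.
The insert uses `?` placeholders, which is the parameter style `pyodbc`
expects.

```python
import json


def load_customers(path="customers.json"):
    """Load the generated customer list from a JSON file."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def flatten_readings(customers):
    """Turn nested customer data into flat row tuples for insertion."""
    rows = []
    for customer in customers:
        for meter in customer["meters"]:
            for reading in meter["readings"]:
                rows.append(
                    (customer["id"], meter["type"], reading["date"], reading["value"])
                )
    return rows


def insert_readings(connection, rows):
    """Insert rows through an open DB-API connection, e.g. from pyodbc."""
    cursor = connection.cursor()
    cursor.executemany(
        "INSERT INTO meter_readings "
        "(customer_id, meter_type, reading_date, reading_value) "
        "VALUES (?, ?, ?, ?)",
        rows,
    )
    connection.commit()
    cursor.close()
```

Separating the formatting from the insert keeps the row layout easy to
verify on its own, before any database is involved.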
quadfaselt merged commit 6a1a5c11f0 into main 2024-08-31 12:43:11 +00:00
quadfaselt deleted branch feat/add-data 2024-08-31 12:43:12 +00:00
Reference: quadfaselt/grid_application#3