# grid_application

Code for my application to Avacon AG for the role of Data Scientist. This is a web app containing different data science use cases related to power grids and electricity generation.

## Data sources

The data in this application was randomly generated and has already been uploaded to an Azure SQL Database. To be transparent about how this was done, the scripts and files are included in this repository. The scripts for general preprocessing and for database interaction are both located in the `data_preparation` directory. The raw data, as well as the preprocessed data file that was ultimately uploaded to the database, can be found in the `data` directory.

All sources for this data are publicly available. Here is a list of the resources used for the different types of information:

- German surnames: Most frequent German surnames from [Wiktionary](https://de.wiktionary.org/wiki/Verzeichnis:Deutsch/Namen/die_h%C3%A4ufigsten_Nachnamen_Deutschlands)
- German given names: Most frequent [male](https://de.wiktionary.org/wiki/Verzeichnis:Deutsch/Namen/die_h%C3%A4ufigsten_m%C3%A4nnlichen_Vornamen_Deutschlands) and [female](https://de.wiktionary.org/wiki/Verzeichnis:Deutsch/Namen/die_h%C3%A4ufigsten_weiblichen_Vornamen_Deutschlands) given names in Germany from Wiktionary
- Street names: Street names from the Hanseatic city of Rostock, made available as open data [here](https://geo.sv.rostock.de/download/opendata/adressenliste/adressenliste.json)
- Zip codes: from [opendatasoft](https://public.opendatasoft.com/explore/dataset/georef-germany-postleitzahl/table/?dataChart=eyJxdWVyaWVzIjpbeyJjb25maWciOnsiZGF0YXNldCI6Imdlb3JlZi1nZXJtYW55LXBvc3RsZWl0emFobCIsIm9wdGlvbnMiOnt9fSwiY2hhcnRzIjpbeyJhbGlnbk1vbnRoIjp0cnVlLCJ0eXBlIjoiY29sdW1uIiwiZnVuYyI6IkNPVU5UIiwic2NpZW50aWZpY0Rpc3BsYXkiOnRydWUsImNvbG9yIjoiI0ZGNTE1QSJ9XSwieEF4aXMiOiJwbHpfbmFtZSIsIm1heHBvaW50cyI6NTAsInNvcnQiOiIifV0sInRpbWVzY2FsZSI6IiIsImRpc3BsYXlMZWdlbmQiOnRydWUsImFsaWduTW9udGgiOnRydWV9&location=6,51.3294,10.45412&basemap=jawg.light)
- Additional information for each zip code, such as city name, longitude, and latitude: retrieved using this public [API](https://github.com/digitalfabrik/gemeindeverzeichnis-django)
- Rough bounding box information for the Avacon Netz service area: [netzgebiete.avacon.de](https://netzgebiete.avacon.de/rcmap/Content/Map/Detail.aspx?keep=dzjernxQf/whawjGMPFQgA==)

## Data structure

The above data is used to randomly generate a user-specified number of customers. Currently, 1000 customers have been generated. Customer information includes:

- Given name and surname
- Street name, house number, zip code and city
- Two meter IDs per customer: one for a natural gas meter, one for an electricity meter
- Each customer has between 1 and 10 meter readings (the number is also chosen randomly), which include:
  - The date on which the reading was obtained
  - The value that was read from the meter
- For simplicity, I assumed that electricity and gas meter readings always occur in pairs (i.e. there is no customer that *just* reads electricity meter values or *just* natural gas meter values)

The customer, meter and address data are generated and uploaded to the SQL database. The ERD of the database looks like this: