entsoe_crawler module

Created on Sun Nov 29 18:53:14 2020

@author: maurer

class entsoe_crawler.EntsoeCrawler(database)[source]

Bases: object

class to allow easier crawling of ENTSO-E timeseries data

Parameters:
database: str

database connection string or path to sqlite db

Methods

countries_with_plant_data(client[, ...])

checks for each country whether data is available at the given date.

create_database(client, start, delta[, ...])

download_entsoe(countries, proc, start, ...)

Downloads data with a procedure from an EntsoePandasClient and stores it in the configured database

download_entsoe_plant_data(countries, ...)

Downloads the generation per power plant from ENTSO-E.

fetch_and_write_entsoe_df_to_db(country, ...)

Crawl data from ENTSO-E transparency platform and write it to the database

get_latest_crawled_timestamp(start, delta, ...)

Finds the best start for the given procedure name by looking up the last timestamp for which data was collected.

init_base_sql()

write static data to database once

pull_crossborders(start, delta, times, proc)

Pulls transmissions across borders from entsoe

save_power_system_data()

pulls static data from opsd and reads it into the database - used for mapping existing power plants from entsoe to a location on a map

update_database(client[, start, delta, ...])

Runs everything which is needed to update the database and pull the data since the last successful pull.

countries_with_plant_data(client, countries=['DE_50HZ', 'AL', 'DE_AMPRION', 'AT', 'BY', 'BE', 'BA', 'BG', 'CZ_DE_SK', 'HR', 'CWE', 'CY', 'CZ', 'DE_AT_LU', 'DE_LU', 'DK', 'DK_1', 'DK_1_NO_1', 'DK_2', 'DK_CA', 'EE', 'FI', 'MK', 'FR', 'DE', 'GR', 'HU', 'IS', 'IE_SEM', 'IE', 'IT', 'IT_SACO_AC', 'IT_CALA', 'IT_SACO_DC', 'IT_BRNN', 'IT_CNOR', 'IT_CSUD', 'IT_FOGN', 'IT_GR', 'IT_MACRO_NORTH', 'IT_MACRO_SOUTH', 'IT_MALTA', 'IT_NORD', 'IT_NORD_AT', 'IT_NORD_CH', 'IT_NORD_FR', 'IT_NORD_SI', 'IT_PRGP', 'IT_ROSN', 'IT_SARD', 'IT_SICI', 'IT_SUD', 'RU_KGD', 'LV', 'LT', 'LU', 'MT', 'ME', 'GB', 'GE', 'GB_IFA', 'GB_IFA2', 'GB_ELECLINK', 'UK', 'NL', 'NO_1', 'NO_1A', 'NO_2', 'NO_2_NSL', 'NO_2A', 'NO_3', 'NO_4', 'NO_5', 'NO', 'PL_CZ', 'PL', 'PT', 'MD', 'RO', 'RU', 'SE_1', 'SE_2', 'SE_3', 'SE_4', 'RS', 'SK', 'SI', 'GB_NIR', 'ES', 'SE', 'CH', 'DE_TENNET', 'DE_TRANSNET', 'TR', 'UA', 'UA_DOBTPP', 'UA_BEI', 'UA_IPS', 'XK', 'DE_AMP_LU'], st=Timestamp('2018-01-01 00:00:00+0100', tz='Europe/Berlin'))[source]

Checks for each country whether data is available at the given date. Returns the list of countries with existing generation data per plant at the given timestamp.

Parameters:
client: entsoe.EntsoePandasClient
countries: list[str], default all_countries
Returns:
plant_countries: list[str]

list of country_codes with existing data for generation per plant
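
The check described above can be pictured as a loop that tries the per-plant query for every country code and keeps the codes that return data. This is a minimal sketch, not the module's implementation: `StubClient` and `NoDataError` are hypothetical stand-ins for `entsoe.EntsoePandasClient` and its "no data" error so that the example runs without network access, and the one-day query window is an assumption.

```python
import datetime as dt


class NoDataError(Exception):
    """Hypothetical stand-in for the 'no matching data' error of a real client."""


class StubClient:
    """Hypothetical stand-in for entsoe.EntsoePandasClient (no network access)."""

    def __init__(self, available):
        self.available = set(available)

    def query_generation_per_plant(self, country_code, start, end):
        if country_code not in self.available:
            raise NoDataError(country_code)
        return [("plant_a", 100.0)]  # placeholder payload


def countries_with_plant_data(client, countries, st):
    """Return the country codes for which the per-plant query succeeds at `st`."""
    plant_countries = []
    end = st + dt.timedelta(days=1)  # assumed probe window of one day
    for country in countries:
        try:
            client.query_generation_per_plant(country, start=st, end=end)
        except NoDataError:
            continue  # no per-plant data for this code at this time
        plant_countries.append(country)
    return plant_countries


st = dt.datetime(2018, 1, 1, tzinfo=dt.timezone(dt.timedelta(hours=1)))
client = StubClient(available=["DE", "FR"])
result = countries_with_plant_data(client, ["DE", "AL", "FR"], st)
print(result)
```

The order of the input list is preserved, so the result can be passed on directly as a `countries` argument.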

create_database(client, start, delta, countries=[])[source]
Parameters:
client: entsoe.EntsoePandasClient
start
delta
countries

(Default value = [])

Returns:
download_entsoe(countries, proc, start, delta, times)[source]

Downloads data with a procedure from an EntsoePandasClient and stores it in the configured database

Parameters:
countries: list[str]

list of country codes

proc

procedure of entsoe-py

start: pd.Timestamp
delta: pd.Timedelta
times: int
Returns:
download_entsoe_plant_data(countries, client, start, delta, times)[source]

Downloads the generation per power plant from ENTSO-E. Uses download_entsoe to write the data into the DB.

Parameters:
countries: list[str]

list of 2-letter country codes

client: entsoe.EntsoePandasClient

DataFrameClient of entsoe-py package

start: pd.Timestamp

timezone-aware pd.Timestamp

delta: pd.Timedelta

Timedelta to fetch data for per bulk

times: int

number of bulks with size delta to fetch
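
How `start`, `delta` and `times` relate can be illustrated by splitting the requested period into consecutive windows. The sketch below uses only the standard library (`datetime` and `timedelta` in place of `pd.Timestamp` and `pd.Timedelta`); the helper `bulk_windows` is hypothetical and only assumes that each bulk starts where the previous one ended.

```python
from datetime import datetime, timedelta, timezone


def bulk_windows(start, delta, times):
    """Yield `times` consecutive (begin, end) windows of length `delta`."""
    for i in range(times):
        begin = start + i * delta
        yield begin, begin + delta


start = datetime(2020, 1, 1, tzinfo=timezone.utc)
windows = list(bulk_windows(start, timedelta(days=30), times=3))
for begin, end in windows:
    print(begin.date(), "->", end.date())
```

With these inputs the three windows together cover 90 days from `start`, and each window could be handed to one fetch call.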

Returns:
fetch_and_write_entsoe_df_to_db(country, proc, start, end)[source]

Crawl data from ENTSO-E transparency platform and write it to the database

Parameters:
country: str

2-letter country code

proc

procedure of entsoe-py client

start: pd.Timestamp

start time

end: pd.Timestamp

end time

Returns:
get_latest_crawled_timestamp(start, delta, tablename, tz='Europe/Berlin')[source]

Finds the best start for the given procedure name by looking up the last timestamp for which data was collected. Also calculates the best delta to update until today.

Parameters:
start: pd.Timestamp
delta: pd.Timedelta

to check if a delta has already been set

tablename: str

name of the table

tz: str

(Default value = 'Europe/Berlin')

Returns:
start: pd.Timestamp

best start

delta: pd.Timedelta

best delta
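
The lookup described above can be sketched against an SQLite table. Everything specific here is an assumption made for illustration: the function name `latest_crawled_timestamp`, the timestamp column called `index`, ISO 8601 strings as the storage format, and the rule that a delta is only computed when none was passed in.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def latest_crawled_timestamp(conn, start, delta, tablename, now=None):
    """Return (start, delta) so that crawling resumes at the newest stored row."""
    now = now or datetime.now(timezone.utc)
    try:
        # Table names cannot be bound as parameters, so the identifier is quoted.
        row = conn.execute(f'SELECT max("index") FROM "{tablename}"').fetchone()
    except sqlite3.OperationalError:
        row = None  # table does not exist yet
    if row and row[0] is not None:
        start = datetime.fromisoformat(row[0])
    if delta is None:
        delta = now - start  # cover everything up to "now"
    return start, delta


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE query_load ("index" TEXT, value REAL)')
conn.executemany(
    "INSERT INTO query_load VALUES (?, ?)",
    [("2020-06-01T00:00:00+00:00", 1.0), ("2020-06-02T00:00:00+00:00", 2.0)],
)
fallback = datetime(2015, 1, 1, tzinfo=timezone.utc)
now = datetime(2020, 6, 10, tzinfo=timezone.utc)
best_start, best_delta = latest_crawled_timestamp(conn, fallback, None, "query_load", now)
print(best_start, best_delta)
```

If the table is missing or empty, the caller's `start` is returned unchanged, which matches the case of a first crawl.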

init_base_sql()[source]

write static data to database once

pull_crossborders(start, delta, times, proc, allZones=True)[source]

Pulls transmissions across borders from entsoe

Parameters:
start
delta
times
proc
allZones

(Default value = True)
Returns:
save_power_system_data()[source]

pulls static data from opsd and reads it into the database - used for mapping existing power plants from entsoe to a location on a map

Parameters:
Returns:
update_database(client, start=None, delta=None, countries=['DE_50HZ', 'AL', 'DE_AMPRION', 'AT', 'BY', 'BE', 'BA', 'BG', 'CZ_DE_SK', 'HR', 'CWE', 'CY', 'CZ', 'DE_AT_LU', 'DE_LU', 'DK', 'DK_1', 'DK_1_NO_1', 'DK_2', 'DK_CA', 'EE', 'FI', 'MK', 'FR', 'DE', 'GR', 'HU', 'IS', 'IE_SEM', 'IE', 'IT', 'IT_SACO_AC', 'IT_CALA', 'IT_SACO_DC', 'IT_BRNN', 'IT_CNOR', 'IT_CSUD', 'IT_FOGN', 'IT_GR', 'IT_MACRO_NORTH', 'IT_MACRO_SOUTH', 'IT_MALTA', 'IT_NORD', 'IT_NORD_AT', 'IT_NORD_CH', 'IT_NORD_FR', 'IT_NORD_SI', 'IT_PRGP', 'IT_ROSN', 'IT_SARD', 'IT_SICI', 'IT_SUD', 'RU_KGD', 'LV', 'LT', 'LU', 'MT', 'ME', 'GB', 'GE', 'GB_IFA', 'GB_IFA2', 'GB_ELECLINK', 'UK', 'NL', 'NO_1', 'NO_1A', 'NO_2', 'NO_2_NSL', 'NO_2A', 'NO_3', 'NO_4', 'NO_5', 'NO', 'PL_CZ', 'PL', 'PT', 'MD', 'RO', 'RU', 'SE_1', 'SE_2', 'SE_3', 'SE_4', 'RS', 'SK', 'SI', 'GB_NIR', 'ES', 'SE', 'CH', 'DE_TENNET', 'DE_TRANSNET', 'TR', 'UA', 'UA_DOBTPP', 'UA_BEI', 'UA_IPS', 'XK', 'DE_AMP_LU'])[source]

Runs everything which is needed to update the database and pull the data since the last successful pull.

Parameters:
client: entsoe.EntsoePandasClient

entsoe-py client

delta: pd.Timedelta
countries: list[str], default all_countries
start: pd.Timestamp
Returns:
entsoe_crawler.calculate_nett_generation(df)[source]

Calculates the difference between columns ending with _actual_aggregated and _actual_consumption.

Parameters:
df: pd.DataFrame

DataFrame with columns ending with _actual_aggregated or _actual_consumption

Returns:
dat: pd.DataFrame
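
One way to read the description above is: for every column ending in `_actual_aggregated`, subtract the matching `_actual_consumption` column if there is one. The sketch below follows that reading with pandas; the function name `net_generation`, the naming of the result columns, and the treatment of missing consumption as zero are assumptions, not the module's confirmed behaviour.

```python
import pandas as pd


def net_generation(df):
    """Subtract each *_actual_consumption column from its *_actual_aggregated partner."""
    agg_suffix, con_suffix = "_actual_aggregated", "_actual_consumption"
    net = {}
    for col in df.columns:
        if not col.endswith(agg_suffix):
            continue
        base = col[: -len(agg_suffix)]
        consumption = df.get(base + con_suffix)
        # A source without a consumption column is taken as pure generation.
        net[base] = df[col] if consumption is None else df[col] - consumption.fillna(0)
    return pd.DataFrame(net, index=df.index)


df = pd.DataFrame({
    "hydro_actual_aggregated": [100.0, 120.0],
    "hydro_actual_consumption": [30.0, 0.0],
    "solar_actual_aggregated": [50.0, 60.0],
})
dat = net_generation(df)
print(dat)
```

Columns that exist only as `_actual_consumption` are ignored in this sketch.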
entsoe_crawler.sanitize_series(seriesname)[source]

replaces illegal characters in a series name so that it can be inserted into the database

Parameters:
seriesname: str

name of the series

Returns:
st: str
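
The exact replacement rules are not listed here, so the following is only a plausible sketch of such a sanitizer: it assumes that every run of characters other than ASCII letters and digits should become a single underscore and that the result is lower-cased. The name `sanitize_name` is hypothetical.

```python
import re


def sanitize_name(seriesname):
    """Lower-case the name and replace every run of non-alphanumerics with '_'."""
    st = re.sub(r"[^0-9a-zA-Z]+", "_", seriesname).strip("_").lower()
    return st


print(sanitize_name("Fossil Brown coal/Lignite"))
print(sanitize_name("Hydro Run-of-river and poundage"))
```

Names cleaned this way contain no spaces, slashes or hyphens, which is typically what makes them awkward as column names in SQL.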