entsoe_crawler module
Created on Sun Nov 29 18:53:14 2020
@author: maurer
- class entsoe_crawler.EntsoeCrawler(database)[source]
Bases: object
Class that simplifies crawling ENTSO-E time series data.
- Parameters:
- database: str
database connection string or path to sqlite db
Methods
- countries_with_plant_data(client[, ...]): Checks for all countries whether they have data available at the given date.
- create_database(client, start, delta[, ...])
- download_entsoe(countries, proc, start, ...): Downloads data with a procedure from an EntsoePandasClient and stores it in the configured database.
- download_entsoe_plant_data(countries, ...): Downloads the generation per power plant from ENTSO-E.
- fetch_and_write_entsoe_df_to_db(country, ...): Crawls data from the ENTSO-E Transparency Platform and writes it to the database.
- get_latest_crawled_timestamp(start, delta, ...): Finds the best start for the given procedure name by looking up the last timestamp for which data was collected.
- Writes static data to the database once.
- pull_crossborders(start, delta, times, proc): Pulls transmissions across borders from ENTSO-E.
- save_power_system_data(): Pulls static data from OPSD and reads it into the database. Used for mapping existing power plants from ENTSO-E to a location on a map.
- update_database(client[, start, delta, ...]): Runs everything needed to update the database and pulls the data since the last successful pull.
- countries_with_plant_data(client, countries=['DE_50HZ', 'AL', 'DE_AMPRION', 'AT', 'BY', 'BE', 'BA', 'BG', 'CZ_DE_SK', 'HR', 'CWE', 'CY', 'CZ', 'DE_AT_LU', 'DE_LU', 'DK', 'DK_1', 'DK_1_NO_1', 'DK_2', 'DK_CA', 'EE', 'FI', 'MK', 'FR', 'DE', 'GR', 'HU', 'IS', 'IE_SEM', 'IE', 'IT', 'IT_SACO_AC', 'IT_CALA', 'IT_SACO_DC', 'IT_BRNN', 'IT_CNOR', 'IT_CSUD', 'IT_FOGN', 'IT_GR', 'IT_MACRO_NORTH', 'IT_MACRO_SOUTH', 'IT_MALTA', 'IT_NORD', 'IT_NORD_AT', 'IT_NORD_CH', 'IT_NORD_FR', 'IT_NORD_SI', 'IT_PRGP', 'IT_ROSN', 'IT_SARD', 'IT_SICI', 'IT_SUD', 'RU_KGD', 'LV', 'LT', 'LU', 'MT', 'ME', 'GB', 'GE', 'GB_IFA', 'GB_IFA2', 'GB_ELECLINK', 'UK', 'NL', 'NO_1', 'NO_1A', 'NO_2', 'NO_2_NSL', 'NO_2A', 'NO_3', 'NO_4', 'NO_5', 'NO', 'PL_CZ', 'PL', 'PT', 'MD', 'RO', 'RU', 'SE_1', 'SE_2', 'SE_3', 'SE_4', 'RS', 'SK', 'SI', 'GB_NIR', 'ES', 'SE', 'CH', 'DE_TENNET', 'DE_TRANSNET', 'TR', 'UA', 'UA_DOBTPP', 'UA_BEI', 'UA_IPS', 'XK', 'DE_AMP_LU'], st=Timestamp('2018-01-01 00:00:00+0100', tz='Europe/Berlin'))[source]
Checks for all countries whether they have data available at the given date. Returns the list of countries with existing generation data per plant at the given timestamp.
- Parameters:
- client: entsoe.EntsoePandasClient
- countries: list[str], default all_countries
- st: Timestamp
timestamp at which data availability is checked
- Returns:
- plant_countries: list[str]
list of country codes with existing data for generation per plant
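The availability check described above can be sketched with the standard library alone. This is a minimal illustration, not the crawler's own code: `countries_with_data`, `fake_query`, and the one-day probe window are assumptions made for the example, and `fake_query` stands in for a per-plant generation query of the entsoe-py client.

```python
from datetime import datetime, timedelta, timezone


def countries_with_data(countries, query, st):
    """Return the country codes for which `query` yields rows at `st`.

    `query` is a hypothetical stand-in for a per-plant generation query.
    It takes (country, start, end) and returns a list of rows, or raises
    when the platform has nothing for that country.
    """
    end = st + timedelta(days=1)  # assumption: probe a single day
    plant_countries = []
    for country in countries:
        try:
            rows = query(country, st, end)
        except Exception:
            # Treat any failed query as "no data for this country".
            continue
        if rows:
            plant_countries.append(country)
    return plant_countries


def fake_query(country, start, end):
    """Hypothetical data source: only DE and FR have data, XX fails."""
    if country == "XX":
        raise LookupError("no matching data")
    return [(start, 100.0)] if country in {"DE", "FR"} else []


st = datetime(2018, 1, 1, tzinfo=timezone(timedelta(hours=1)))
result = countries_with_data(["DE", "XX", "FR", "AL"], fake_query, st)
print(result)
```

Probing each country once up front keeps later per-plant downloads from repeatedly querying countries that publish no such data.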
- create_database(client, start, delta, countries=[])[source]
- Parameters:
- client: entsoe.EntsoePandasClient
- start
- delta
- countries
(Default value = [])
- Returns:
- download_entsoe(countries, proc, start, delta, times)[source]
Downloads data with a procedure from an EntsoePandasClient and stores it in the configured database.
- Parameters:
- countries: list[str]
list of country codes
- proc
procedure of entsoe-py
- start: pd.Timestamp
- delta: pd.Timedelta
- times: int
- Returns:
- download_entsoe_plant_data(countries, client, start, delta, times)[source]
Downloads the generation per power plant from ENTSO-E. Uses download_entsoe to write the data into the database.
- Parameters:
- countries: list[str]
list of 2-letter country codes
- client: entsoe.EntsoePandasClient
DataFrame client of the entsoe-py package
- start: pd.Timestamp
timezone-aware pd.Timestamp
- delta: pd.Timedelta
time span to fetch per bulk
- times: int
number of bulks of size delta to fetch
- Returns:
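To make the relationship between start, delta, and times concrete, the following sketch splits a download into bulks. It assumes the bulks are contiguous and back to back, which the parameter descriptions suggest but do not state; `bulk_windows` is a hypothetical helper, and plain `datetime` objects stand in for pd.Timestamp and pd.Timedelta.

```python
from datetime import datetime, timedelta, timezone


def bulk_windows(start, delta, times):
    """Return `times` consecutive (begin, end) pairs, each `delta` long."""
    return [(start + i * delta, start + (i + 1) * delta) for i in range(times)]


start = datetime(2020, 1, 1, tzinfo=timezone.utc)
windows = bulk_windows(start, timedelta(days=30), 3)
for begin, end in windows:
    print(begin.isoformat(), "->", end.isoformat())
```

Under this assumption the total covered span is times * delta, so a smaller delta with a larger times fetches the same range in more, smaller requests.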
- fetch_and_write_entsoe_df_to_db(country, proc, start, end)[source]
Crawls data from the ENTSO-E Transparency Platform and writes it to the database.
- Parameters:
- country: str
2-letter country code
- proc
procedure of the entsoe-py client
- start: pd.Timestamp
start time
- end: pd.Timestamp
end time
- Returns:
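The fetch-then-store step can be illustrated as follows. This is a sketch under stated assumptions, not the module's implementation: the table layout, the choice of naming the table after the procedure, and the helper `fetch_and_write` are all hypothetical, and the local `query_load` function merely imitates the `(country, start=..., end=...)` call shape of an entsoe-py query.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def fetch_and_write(conn, country, proc, start, end):
    """Call `proc` for one country and append its rows to a table.

    Assumption: one table per procedure, named after the callable.
    """
    rows = proc(country, start=start, end=end)
    table = proc.__name__
    conn.execute(
        f'CREATE TABLE IF NOT EXISTS "{table}" (ts TEXT, country TEXT, value REAL)'
    )
    conn.executemany(
        f'INSERT INTO "{table}" VALUES (?, ?, ?)',
        [(ts.isoformat(), country, value) for ts, value in rows],
    )
    conn.commit()
    return len(rows)


def query_load(country, start, end):
    """Hypothetical procedure: one made-up value per full hour."""
    hours = int((end - start) / timedelta(hours=1))
    return [(start + timedelta(hours=h), 1000.0 + h) for h in range(hours)]


conn = sqlite3.connect(":memory:")
start = datetime(2020, 1, 1, tzinfo=timezone.utc)
written = fetch_and_write(conn, "DE", query_load, start, start + timedelta(hours=3))
stored = conn.execute('SELECT COUNT(*) FROM "query_load"').fetchone()[0]
print(written, stored)
```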
- get_latest_crawled_timestamp(start, delta, tablename, tz='Europe/Berlin')[source]
Finds the best start for the given procedure name by looking up the last timestamp for which data was collected. Also calculates the best delta to update until today.
- Parameters:
- start: pd.Timestamp
- delta: pd.Timedelta
to check if a delta has already been set
- tablename: str
name of the table
- tz: str
(Default value = 'Europe/Berlin')
- Returns:
- start: pd.Timestamp
best start
- delta: pd.Timedelta
best delta
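One way to read the behaviour described above is the following sketch. It is an interpretation, not the module's code: the column name `ts`, the `fallback_start` and `now` parameters, and the rule that explicitly given values are returned unchanged are assumptions introduced for the example, and an in-memory SQLite database replaces the configured one.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def latest_start_and_delta(conn, start, delta, tablename, now, fallback_start):
    """Return (start, delta) so that a pull resumes at the newest stored row."""
    if start is not None and delta is not None:
        return start, delta  # assumption: explicit values are kept as given
    try:
        row = conn.execute(f'SELECT MAX(ts) FROM "{tablename}"').fetchone()
    except sqlite3.OperationalError:
        row = None  # the table does not exist yet
    if row and row[0] is not None:
        start = datetime.fromisoformat(row[0])
    else:
        start = fallback_start
    return start, now - start


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "query_load" (ts TEXT, value REAL)')
newest = datetime(2020, 11, 28, 23, tzinfo=timezone.utc)
conn.executemany(
    'INSERT INTO "query_load" VALUES (?, ?)',
    [((newest - timedelta(hours=1)).isoformat(), 1.0), (newest.isoformat(), 2.0)],
)
now = datetime(2020, 11, 29, 18, tzinfo=timezone.utc)
fallback = datetime(2015, 1, 1, tzinfo=timezone.utc)
best_start, best_delta = latest_start_and_delta(
    conn, None, None, "query_load", now, fallback
)
print(best_start.isoformat(), best_delta)
```

Comparing ISO 8601 strings with `MAX` only orders them correctly when every stored timestamp uses the same UTC offset, which this example ensures by storing UTC throughout.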
- pull_crossborders(start, delta, times, proc, allZones=True)[source]
Pulls transmissions across borders from ENTSO-E.
- Parameters:
- start
- delta
- times
- proc
- allZones
(Default value = True)
- Returns:
- save_power_system_data()[source]
Pulls static data from OPSD (Open Power System Data) and reads it into the database. Used for mapping existing power plants from ENTSO-E to a location on a map.
- Parameters:
- Returns:
- update_database(client, start=None, delta=None, countries=['DE_50HZ', 'AL', 'DE_AMPRION', 'AT', 'BY', 'BE', 'BA', 'BG', 'CZ_DE_SK', 'HR', 'CWE', 'CY', 'CZ', 'DE_AT_LU', 'DE_LU', 'DK', 'DK_1', 'DK_1_NO_1', 'DK_2', 'DK_CA', 'EE', 'FI', 'MK', 'FR', 'DE', 'GR', 'HU', 'IS', 'IE_SEM', 'IE', 'IT', 'IT_SACO_AC', 'IT_CALA', 'IT_SACO_DC', 'IT_BRNN', 'IT_CNOR', 'IT_CSUD', 'IT_FOGN', 'IT_GR', 'IT_MACRO_NORTH', 'IT_MACRO_SOUTH', 'IT_MALTA', 'IT_NORD', 'IT_NORD_AT', 'IT_NORD_CH', 'IT_NORD_FR', 'IT_NORD_SI', 'IT_PRGP', 'IT_ROSN', 'IT_SARD', 'IT_SICI', 'IT_SUD', 'RU_KGD', 'LV', 'LT', 'LU', 'MT', 'ME', 'GB', 'GE', 'GB_IFA', 'GB_IFA2', 'GB_ELECLINK', 'UK', 'NL', 'NO_1', 'NO_1A', 'NO_2', 'NO_2_NSL', 'NO_2A', 'NO_3', 'NO_4', 'NO_5', 'NO', 'PL_CZ', 'PL', 'PT', 'MD', 'RO', 'RU', 'SE_1', 'SE_2', 'SE_3', 'SE_4', 'RS', 'SK', 'SI', 'GB_NIR', 'ES', 'SE', 'CH', 'DE_TENNET', 'DE_TRANSNET', 'TR', 'UA', 'UA_DOBTPP', 'UA_BEI', 'UA_IPS', 'XK', 'DE_AMP_LU'])[source]
Runs everything needed to update the database and pulls the data since the last successful pull.
- Parameters:
- client: entsoe.EntsoePandasClient
entsoe-py client
- delta: pd.Timedelta
- countries: list[str], default all_countries
- start: pd.Timestamp
- Returns: