Add info if datasets are not freshly loaded #104

jensch-dlr · 2023-01-26T15:51:19Z

Hello everyone,

I cannot get `pm.powerplants(update=True)` to run. I suspect something is wrong with my config, but I cannot figure out what.

Calling `pm.powerplants(update=True)` raises `AttributeError: 'DataFrame' object has no attribute 'Name'`.

Has anyone encountered this error before, or does anyone know how to work around it?

FabianHofmann · 2023-01-29T19:22:56Z

Hey @jensch-dlr, thanks for reporting. Could you print out the full stack trace? And which pandas version are you using?

jensch-dlr · 2023-01-30T16:59:23Z

Hello @FabianHofmann,
I sure can. Here is the whole thing:

`---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[4], line 1
----> 1 pm.powerplants(update=True)

File c:\work\data\powerplantmatching\powerplantmatching\collection.py:230, in powerplants(config, config_update, update, from_url, extend_by_vres, extendby_kwargs, extend_by_kwargs, fill_geopositions, filter_missing_geopositions, **collection_kwargs)
225 return df
227 matching_sources = [
228 list(to_dict_if_string(a))[0] for a in config["matching_sources"]
229 ]
--> 230 matched = collect(matching_sources, config=config, **collection_kwargs)
232 if isinstance(config["fully_included_sources"], list):
233 for source in config["fully_included_sources"]:

File c:\work\data\powerplantmatching\powerplantmatching\collection.py:98, in collect(datasets, update, reduced, config, **dukeargs)
95 update = True
97 if update:
---> 98 dfs = parmap(df_by_name, datasets)
99 matched = combine_multiple_datasets(dfs, datasets, config=config, **dukeargs)
100 (
101 matched.assign(projectID=lambda df: df.projectID.astype(str)).to_csv(
102 outfn_matched, index_label="id"
103 )
104 )

File c:\work\data\powerplantmatching\powerplantmatching\utils.py:378, in parmap(f, arg_list, config)
376 return [x for i, x in sorted(res)]
377 else:
--> 378 return list(map(f, arg_list))

File c:\work\data\powerplantmatching\powerplantmatching\collection.py:73, in collect.<locals>.df_by_name(name)
71 conf = config[name]
72 get_df = getattr(data, name)
---> 73 df = get_df(config=config)
75 if not conf.get("aggregated_units", False):
76 return aggregate_units(df, dataset_name=name, config=config)

File c:\work\data\powerplantmatching\powerplantmatching\data.py:751, in ENTSOE(raw, update, config, entsoe_token, **fill_geoposition_kwargs)
743 fn = package_data("entsoe_country_codes.csv")
744 COUNTRY_MAP = pd.read_csv(fn, index_col=0).rename(index=str).Country
746 return (
747 df.rename_axis(index="projectID")
748 .reset_index()
749 .rename(columns=RENAME_COLUMNS)
750 .drop_duplicates("projectID")
--> 751 .assign(
752 Name=lambda df: df.Name.str.replace("", " "), # for geoparsing
753 EIC=lambda df: df.projectID,
754 Country=lambda df: df.projectID.str[:2].map(COUNTRY_MAP),
755 Capacity=lambda df: pd.to_numeric(df.Capacity),
756 Technology=np.nan,
757 Set=np.nan,
758 lat=np.nan,
759 lon=np.nan,
760 )
761 .powerplant.convert_alpha2_to_country()
762 # .pipe(fill_geoposition, **fill_geoposition_kwargs)
763 .query("Capacity > 0")
764 .pipe(gather_specifications, config=config)
765 .pipe(clean_name)
766 .pipe(set_column_name, "ENTSOE")
767 .pipe(config_filter, config)
768 )

File C:\mambaforge-data\envs\powerplantmatching\Lib\site-packages\pandas\core\frame.py:4889, in DataFrame.assign(self, **kwargs)
4886 data = self.copy()
4888 for k, v in kwargs.items():
-> 4889 data[k] = com.apply_if_callable(v, data)
4890 return data

File C:\mambaforge-data\envs\powerplantmatching\Lib\site-packages\pandas\core\common.py:374, in apply_if_callable(maybe_callable, obj, **kwargs)
363 """
364 Evaluate possibly callable input using obj and kwargs if it is callable,
365 otherwise return as it is.
(...)
371 **kwargs
372 """
373 if callable(maybe_callable):
--> 374 return maybe_callable(obj, **kwargs)
376 return maybe_callable

File c:\work\data\powerplantmatching\powerplantmatching\data.py:752, in ENTSOE.<locals>.<lambda>(df)
743 fn = package_data("entsoe_country_codes.csv")
744 COUNTRY_MAP = pd.read_csv(fn, index_col=0).rename(index=str).Country
746 return (
747 df.rename_axis(index="projectID")
748 .reset_index()
749 .rename(columns=RENAME_COLUMNS)
750 .drop_duplicates("projectID")
751 .assign(
--> 752 Name=lambda df: df.Name.str.replace("", " "), # for geoparsing
753 EIC=lambda df: df.projectID,
754 Country=lambda df: df.projectID.str[:2].map(COUNTRY_MAP),
755 Capacity=lambda df: pd.to_numeric(df.Capacity),
756 Technology=np.nan,
757 Set=np.nan,
758 lat=np.nan,
759 lon=np.nan,
760 )
761 .powerplant.convert_alpha2_to_country()
762 # .pipe(fill_geoposition, **fill_geoposition_kwargs)
763 .query("Capacity > 0")
764 .pipe(gather_specifications, config=config)
765 .pipe(clean_name)
766 .pipe(set_column_name, "ENTSOE")
767 .pipe(config_filter, config)
768 )

File C:\mambaforge-data\envs\powerplantmatching\Lib\site-packages\pandas\core\generic.py:5902, in NDFrame.__getattr__(self, name)
5895 if (
5896 name not in self._internal_names_set
5897 and name not in self._metadata
5898 and name not in self._accessors
5899 and self._info_axis._can_hold_identifiers_and_holds_name(name)
5900 ):
5901 return self[name]
-> 5902 return object.__getattribute__(self, name)

AttributeError: 'DataFrame' object has no attribute 'Name'`

The pandas version I am using is 1.5.3.

By now, I think I know what causes the error. None of the PPM datasets has been downloaded to my drive (at least I cannot find any), so the real problem seems to lie there. So, my new question: shouldn't that download happen when I use the powerplantmatching package, for example when I call `pm.powerplants(from_url=False)`?

Thank you!

jensch-dlr · 2023-02-01T14:07:16Z

Okay, I found the problem. I had a version of `entsoe_powerplants.csv` in my directory that was from mid-2021. Unlike the current version, its column titles were
`,Unnamed: 0,registeredResource.name,registeredResource.mRID,voltage_PowerSystemResources.highVoltageLimit,psrType,quantity,Country`
(now: `,Bidding Zone,Installed Capacity [MW],Name,Production Type,Voltage Connection Level [kV]`).

That was the reason for the error message. So I am guessing that PPM currently only checks whether a file with the same name already exists in the data directory. Wouldn't it be better to check whether it is actually the same file and update it if not?
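
To illustrate the kind of check meant here, a minimal sketch that compares the header of a cached CSV against the columns the current code expects. The column names are taken from the two headers quoted above; `EXPECTED_COLUMNS` and `missing_columns` are hypothetical names for illustration, not part of powerplantmatching.

```python
import csv
import io

# Columns of the current file format, copied from the header quoted above.
EXPECTED_COLUMNS = {
    "Bidding Zone",
    "Installed Capacity [MW]",
    "Name",
    "Production Type",
    "Voltage Connection Level [kV]",
}


def missing_columns(csv_file, expected=EXPECTED_COLUMNS):
    """Return the expected column names that are absent from the CSV header.

    Hypothetical helper: reads only the first row of an open text file.
    """
    header = next(csv.reader(csv_file), [])
    return sorted(set(expected) - set(header))


# The two header lines quoted in this thread (mid-2021 file vs. current file).
old_header = (
    ",Unnamed: 0,registeredResource.name,registeredResource.mRID,"
    "voltage_PowerSystemResources.highVoltageLimit,psrType,quantity,Country\n"
)
new_header = (
    ",Bidding Zone,Installed Capacity [MW],Name,Production Type,"
    "Voltage Connection Level [kV]\n"
)

stale = missing_columns(io.StringIO(old_header))
fresh = missing_columns(io.StringIO(new_header))

print("old file is missing:", stale)
print("new file is missing:", fresh)
```

With the old header, all five expected columns (including `Name`) are reported as missing, which would flag the file as outdated before the `.assign(...)` step fails.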

FabianHofmann · 2023-02-01T23:05:11Z

You're probably right. But this would actually require going deep into the code. Perhaps a reset option would be better, just in case one wants to make a fresh install. That would have solved the problem for you, right?
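
A rough sketch of what such a reset could look like, assuming the cached datasets are plain CSV files in a single data directory. `reset_cache` and its parameters are hypothetical and not an existing powerplantmatching function; the demo runs on a temporary directory only.

```python
import tempfile
from pathlib import Path


def reset_cache(data_dir, pattern="*.csv", dry_run=True):
    """List cached files matching `pattern`; delete them unless dry_run is set.

    Hypothetical helper: returns the names of the matched files.
    """
    targets = sorted(Path(data_dir).glob(pattern))
    if not dry_run:
        for path in targets:
            path.unlink()
    return [path.name for path in targets]


# Demo on a throwaway directory, not on a real data directory.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "entsoe_powerplants.csv").write_text("old,header\n")
    (Path(tmp) / "notes.txt").write_text("keep me\n")

    listed = reset_cache(tmp)                  # dry run: nothing is deleted
    removed = reset_cache(tmp, dry_run=False)  # actually deletes the CSV
    remaining = sorted(p.name for p in Path(tmp).iterdir())

print(listed, removed, remaining)
```

Defaulting to a dry run is a deliberate choice here, so that nothing is deleted until the user has seen which files would be affected.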

jensch-dlr · 2023-02-02T09:07:40Z

It would have, yes. Maybe a warning hinting at that possible problem would have sped up the process. But that might again be too hard to integrate?
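
As a sketch of the warning idea, assuming it is enough to look at the modification time of the cached file: the helper below returns a hint when an existing file is older than a threshold. `stale_file_hint`, the 180-day default, and the message wording are all made up for illustration and are not part of powerplantmatching.

```python
import logging
import os
import tempfile
import time
from datetime import datetime

logger = logging.getLogger("cache_hint")


def stale_file_hint(path, max_age_days=180, now=None):
    """Return a hint string if `path` exists and is older than `max_age_days`.

    Hypothetical helper: returns None for missing or sufficiently recent files.
    """
    if not os.path.exists(path):
        return None
    now = time.time() if now is None else now
    mtime = os.path.getmtime(path)
    age_days = (now - mtime) / 86400
    if age_days <= max_age_days:
        return None
    modified = datetime.fromtimestamp(mtime).date()
    return (
        f"Using cached file {path}, last modified on {modified} "
        f"({age_days:.0f} days ago). If loading fails with unexpected "
        "column errors, the file may be outdated."
    )


# Demo: one freshly written file and one back-dated by 600 days.
with tempfile.TemporaryDirectory() as tmp:
    fresh_path = os.path.join(tmp, "fresh.csv")
    old_path = os.path.join(tmp, "old.csv")
    for p in (fresh_path, old_path):
        with open(p, "w") as fh:
            fh.write("a,b\n")
    past = time.time() - 600 * 86400
    os.utime(old_path, (past, past))  # set access and modification time

    fresh_hint = stale_file_hint(fresh_path)
    stale_hint = stale_file_hint(old_path)

if stale_hint:
    logger.warning(stale_hint)
```

A hint like this would not fix the mismatch, but it points at the cached file as a likely cause before the user has to dig through the traceback.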

FabianHofmann changed the title ~~pm.powerplants(update=True) gives AttributeError?~~ Add info if datasets are not freshly loaded Feb 22, 2023

FabianHofmann added the installation label Feb 22, 2023
