Add info if datasets are not freshly loaded #104

Open
jensch-dlr opened this issue Jan 26, 2023 · 5 comments

@jensch-dlr
Contributor

Hello everyone,

I cannot get `pm.powerplants(update=True)` to run. I suspect something is wrong with my config, but I cannot figure out what.

Calling `pm.powerplants(update=True)` fails with `AttributeError: 'DataFrame' object has no attribute 'Name'`.

Has anyone encountered this error before, or does anyone know how to work around it?

@FabianHofmann
Contributor

Hey @jensch-dlr thanks for reporting. Could you print out the full stack trace? And what pandas version you use?
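For reference, a minimal sketch of how to capture both pieces of information in one go. `run_and_report` and `package_version` are hypothetical helpers, not part of powerplantmatching; they only use the standard library (`traceback`, `platform`, `importlib.metadata`), and the commented usage line assumes powerplantmatching is imported as `pm`.

```python
import platform
import traceback
from importlib import metadata
from typing import Callable, Optional


def package_version(name: str) -> str:
    """Return the installed version of a distribution, or a placeholder."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"


def run_and_report(func: Callable[[], object]) -> Optional[str]:
    """Call func(); if it raises, return the full traceback plus versions.

    Returns None when func() completes without raising.
    """
    try:
        func()
    except Exception:
        versions = (
            f"Python {platform.python_version()}, "
            f"pandas {package_version('pandas')}, "
            f"powerplantmatching {package_version('powerplantmatching')}"
        )
        return traceback.format_exc() + versions
    return None


# Usage (assumes `import powerplantmatching as pm`):
# report = run_and_report(lambda: pm.powerplants(update=True))
# if report is not None:
#     print(report)
```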

@jensch-dlr
Contributor Author

Hello @FabianHofmann,
Sure, here is the full stack trace:

`---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[4], line 1
----> 1 pm.powerplants(update=True)

File c:\work\data\powerplantmatching\powerplantmatching\collection.py:230, in powerplants(config, config_update, update, from_url, extend_by_vres, extendby_kwargs, extend_by_kwargs, fill_geopositions, filter_missing_geopositions, **collection_kwargs)
225 return df
227 matching_sources = [
228 list(to_dict_if_string(a))[0] for a in config["matching_sources"]
229 ]
--> 230 matched = collect(matching_sources, config=config, **collection_kwargs)
232 if isinstance(config["fully_included_sources"], list):
233 for source in config["fully_included_sources"]:

File c:\work\data\powerplantmatching\powerplantmatching\collection.py:98, in collect(datasets, update, reduced, config, **dukeargs)
95 update = True
97 if update:
---> 98 dfs = parmap(df_by_name, datasets)
99 matched = combine_multiple_datasets(dfs, datasets, config=config, **dukeargs)
100 (
101 matched.assign(projectID=lambda df: df.projectID.astype(str)).to_csv(
102 outfn_matched, index_label="id"
103 )
104 )

File c:\work\data\powerplantmatching\powerplantmatching\utils.py:378, in parmap(f, arg_list, config)
376 return [x for i, x in sorted(res)]
377 else:
--> 378 return list(map(f, arg_list))

File c:\work\data\powerplantmatching\powerplantmatching\collection.py:73, in collect.<locals>.df_by_name(name)
71 conf = config[name]
72 get_df = getattr(data, name)
---> 73 df = get_df(config=config)
75 if not conf.get("aggregated_units", False):
76 return aggregate_units(df, dataset_name=name, config=config)

File c:\work\data\powerplantmatching\powerplantmatching\data.py:751, in ENTSOE(raw, update, config, entsoe_token, **fill_geoposition_kwargs)
743 fn = package_data("entsoe_country_codes.csv")
744 COUNTRY_MAP = pd.read_csv(fn, index_col=0).rename(index=str).Country
746 return (
747 df.rename_axis(index="projectID")
748 .reset_index()
749 .rename(columns=RENAME_COLUMNS)
750 .drop_duplicates("projectID")
--> 751 .assign(
752 Name=lambda df: df.Name.str.replace("\n", " "), # for geoparsing
753 EIC=lambda df: df.projectID,
754 Country=lambda df: df.projectID.str[:2].map(COUNTRY_MAP),
755 Capacity=lambda df: pd.to_numeric(df.Capacity),
756 Technology=np.nan,
757 Set=np.nan,
758 lat=np.nan,
759 lon=np.nan,
760 )
761 .powerplant.convert_alpha2_to_country()
762 # .pipe(fill_geoposition, **fill_geoposition_kwargs)
763 .query("Capacity > 0")
764 .pipe(gather_specifications, config=config)
765 .pipe(clean_name)
766 .pipe(set_column_name, "ENTSOE")
767 .pipe(config_filter, config)
768 )

File C:\mambaforge-data\envs\powerplantmatching\Lib\site-packages\pandas\core\frame.py:4889, in DataFrame.assign(self, **kwargs)
4886 data = self.copy()
4888 for k, v in kwargs.items():
-> 4889 data[k] = com.apply_if_callable(v, data)
4890 return data

File C:\mambaforge-data\envs\powerplantmatching\Lib\site-packages\pandas\core\common.py:374, in apply_if_callable(maybe_callable, obj, **kwargs)
363 """
364 Evaluate possibly callable input using obj and kwargs if it is callable,
365 otherwise return as it is.
(...)
371 **kwargs
372 """
373 if callable(maybe_callable):
--> 374 return maybe_callable(obj, **kwargs)
376 return maybe_callable

File c:\work\data\powerplantmatching\powerplantmatching\data.py:752, in ENTSOE.<locals>.<lambda>(df)
743 fn = package_data("entsoe_country_codes.csv")
744 COUNTRY_MAP = pd.read_csv(fn, index_col=0).rename(index=str).Country
746 return (
747 df.rename_axis(index="projectID")
748 .reset_index()
749 .rename(columns=RENAME_COLUMNS)
750 .drop_duplicates("projectID")
751 .assign(
--> 752 Name=lambda df: df.Name.str.replace("\n", " "), # for geoparsing
753 EIC=lambda df: df.projectID,
754 Country=lambda df: df.projectID.str[:2].map(COUNTRY_MAP),
755 Capacity=lambda df: pd.to_numeric(df.Capacity),
756 Technology=np.nan,
757 Set=np.nan,
758 lat=np.nan,
759 lon=np.nan,
760 )
761 .powerplant.convert_alpha2_to_country()
762 # .pipe(fill_geoposition, **fill_geoposition_kwargs)
763 .query("Capacity > 0")
764 .pipe(gather_specifications, config=config)
765 .pipe(clean_name)
766 .pipe(set_column_name, "ENTSOE")
767 .pipe(config_filter, config)
768 )

File C:\mambaforge-data\envs\powerplantmatching\Lib\site-packages\pandas\core\generic.py:5902, in NDFrame.__getattr__(self, name)
5895 if (
5896 name not in self._internal_names_set
5897 and name not in self._metadata
5898 and name not in self._accessors
5899 and self._info_axis._can_hold_identifiers_and_holds_name(name)
5900 ):
5901 return self[name]
-> 5902 return object.__getattribute__(self, name)

AttributeError: 'DataFrame' object has no attribute 'Name'`

I am using pandas 1.5.3.

I think I now know what causes the error: none of the PPM datasets has been downloaded to my drive (at least I cannot find any), so the real problem seems to lie there. Which leads to a new question: shouldn't that download happen automatically when I use the powerplantmatching package, for example when I call `pm.powerplants(from_url=False)`?

Thank you!

@jensch-dlr
Contributor Author

Okay, I found the problem. I had a copy of `entsoe_powerplants.csv` from mid-2021 in my data directory. Unlike the current file, its column headers were `,Unnamed: 0,registeredResource.name,registeredResource.mRID,voltage_PowerSystemResources.highVoltageLimit,psrType,quantity,Country` (the current headers are `,Bidding Zone,Installed Capacity [MW],Name,Production Type,Voltage Connection Level [kV]`).
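To illustrate why the old file fails, here is a minimal pandas sketch that mimics only the `.assign(Name=...)` step of `ENTSOE()` from the traceback. It skips the preceding `.rename(columns=RENAME_COLUMNS)` step because the contents of `RENAME_COLUMNS` are not shown in this thread, it uses one made-up row per layout, and the helper name `normalize_names` is hypothetical.

```python
import pandas as pd

# Headers quoted in the comment (the leading unnamed index column is omitted).
OLD_COLUMNS = [  # mid-2021 entsoe_powerplants.csv
    "Unnamed: 0",
    "registeredResource.name",
    "registeredResource.mRID",
    "voltage_PowerSystemResources.highVoltageLimit",
    "psrType",
    "quantity",
    "Country",
]
NEW_COLUMNS = [  # current entsoe_powerplants.csv
    "Bidding Zone",
    "Installed Capacity [MW]",
    "Name",
    "Production Type",
    "Voltage Connection Level [kV]",
]


def normalize_names(df: pd.DataFrame) -> pd.DataFrame:
    """Mimic the .assign(Name=...) step of ENTSOE() from the traceback.

    Attribute access (d.Name) only works if a 'Name' column exists.
    """
    return df.assign(Name=lambda d: d.Name.str.replace("\n", " ", regex=False))


# One made-up row per layout, just to have something to work with.
new_df = pd.DataFrame([["DE", 100.0, "Plant\nA", "Hydro", 220.0]], columns=NEW_COLUMNS)
old_df = pd.DataFrame(
    [[0, "Plant A", "EIC1", 220.0, "B12", 100.0, "DE"]], columns=OLD_COLUMNS
)

print(normalize_names(new_df).Name.tolist())  # ['Plant A']

error_message = None
try:
    normalize_names(old_df)
except AttributeError as err:  # the old layout has no 'Name' column
    error_message = str(err)
print(error_message)
```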

That was the cause of the error message. So I am guessing that PPM currently only checks whether a file with the same name already exists in the data directory. Wouldn't it be better to check whether it is actually the same file, and update it if not?
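One way such a check could look, as a minimal sketch: hash the cached file with SHA-256 and refresh it when the hash differs from a reference value. This assumes a reference checksum is available for each dataset, which nothing in this thread establishes; `sha256_of` and `needs_refresh` are hypothetical names, not powerplantmatching functions.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Optional


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large CSVs need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_refresh(path: Path, expected_sha256: Optional[str]) -> bool:
    """Decide whether a cached dataset should be downloaded again.

    Missing file -> refresh; no reference checksum -> keep (equivalent to a
    plain "file exists" check); checksum mismatch -> refresh.
    """
    if not path.exists():
        return True
    if expected_sha256 is None:
        return False
    return sha256_of(path) != expected_sha256


# Demo with a throwaway file standing in for a cached CSV.
with tempfile.TemporaryDirectory() as tmp:
    cached = Path(tmp) / "entsoe_powerplants.csv"
    cached.write_text(",Bidding Zone,Installed Capacity [MW],Name\n")
    current = sha256_of(cached)
    same_file_refresh = needs_refresh(cached, current)  # False
    stale_file_refresh = needs_refresh(cached, "0" * 64)  # True
    missing_refresh = needs_refresh(Path(tmp) / "absent.csv", current)  # True
```

Without a reference checksum, `needs_refresh` degrades to the name-only check, which is the situation that let the mid-2021 file go unnoticed.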

@FabianHofmann
Contributor

You're probably right, but that would require going deep into the code. Perhaps a reset option would be better, for when one wants to start from a fresh install. That would have solved the problem for you, right?
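A reset option could be as simple as deleting the cached dataset files so that the next run has to fetch them again. The sketch is hypothetical: `reset_data_dir` is not a powerplantmatching function, and where PPM keeps its data directory is not stated in this thread, so the demo only touches a temporary directory.

```python
import tempfile
from pathlib import Path
from typing import List


def reset_data_dir(data_dir: Path, pattern: str = "*.csv") -> List[Path]:
    """Delete cached dataset files so the next run must fetch them again.

    Only files matching `pattern` directly inside `data_dir` are removed;
    subdirectories and other files are left alone. Returns the removed paths.
    """
    removed = []
    for path in sorted(data_dir.glob(pattern)):
        if path.is_file():
            path.unlink()
            removed.append(path)
    return removed


# Demo on a temporary directory, never on a real data directory.
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp)
    (data_dir / "entsoe_powerplants.csv").write_text("old\n")
    (data_dir / "notes.txt").write_text("keep me\n")
    removed_names = [p.name for p in reset_data_dir(data_dir)]
    remaining = sorted(p.name for p in data_dir.iterdir())
```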

@jensch-dlr
Contributor Author

It would have, yes. A warning hinting at this possible cause would also have sped up the process. But that might again be too hard to integrate?
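A lighter-weight alternative, sketched under assumptions: read only the header row of the cached CSV and emit a `UserWarning` with a hint when expected columns are missing. `warn_if_outdated` is hypothetical, and the `REQUIRED` column list (taken from the current header quoted in this thread) is an assumption about what the ENTSOE parser needs.

```python
import csv
import tempfile
import warnings
from pathlib import Path
from typing import Iterable

OLD_HEADER = (
    ",Unnamed: 0,registeredResource.name,registeredResource.mRID,"
    "voltage_PowerSystemResources.highVoltageLimit,psrType,quantity,Country\n"
)
NEW_HEADER = (
    ",Bidding Zone,Installed Capacity [MW],Name,Production Type,"
    "Voltage Connection Level [kV]\n"
)
# Assumption: columns the ENTSOE parser needs from the raw file.
REQUIRED = ["Name", "Installed Capacity [MW]"]


def warn_if_outdated(path: Path, required_columns: Iterable[str]) -> bool:
    """Warn if a cached CSV lacks expected columns; return True if it warned."""
    with path.open(newline="") as fh:
        header = next(csv.reader(fh), [])  # first row only
    missing = [col for col in required_columns if col not in header]
    if missing:
        warnings.warn(
            f"{path.name} lacks columns {missing}. It may be a cached copy "
            "from an older data release; delete it and run again to "
            "download a fresh one.",
            UserWarning,
            stacklevel=2,
        )
        return True
    return False


with tempfile.TemporaryDirectory() as tmp:
    old_file = Path(tmp) / "old.csv"
    new_file = Path(tmp) / "new.csv"
    old_file.write_text(OLD_HEADER)
    new_file.write_text(NEW_HEADER)
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        old_warned = warn_if_outdated(old_file, REQUIRED)  # True
        new_warned = warn_if_outdated(new_file, REQUIRED)  # False
```

Checking only the header keeps the cost negligible compared with parsing the full file, at the price of missing changes that keep column names but alter their meaning.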

@FabianHofmann changed the title from "pm.powerplants(update=True) gives AttributeError?" to "Add info if datasets are not freshly loaded" on Feb 22, 2023