
New release? #60

Open
Erotemic opened this issue Dec 3, 2021 · 6 comments

@Erotemic
Contributor

Erotemic commented Dec 3, 2021

Wondering if there are plans to release the latest code on PyPI? It looks like the version in __init__.py needs to be bumped. The current version on PyPI is 1.4.0 (https://pypi.org/project/FeLS/), which is also what __init__.py reports, but the current code is newer than that.

I made a PR that bumps the version: #59

@vascobnunes
Owner

Hello! Thanks for your interest.
I am missing the time to properly test the latest PRs; that's why I haven't made a new release yet. Have you tested multiple scenarios on your side with the most current commits?

@Erotemic
Contributor Author

Erotemic commented Dec 6, 2021

I have been using it locally, but I would not say that my usage of fels in my workflow is comprehensive, nor have I 100% verified that there isn't some mistake.

I did test it to the extent that it passes the tests on the CI, but I don't think the existing tests are comprehensive.

To this end, I wrote another small PR #61 that adds two tests verifying that command-line and Python API usage produce the same results. I threw a few variations at it, but didn't add everything that I could have. If there is anything else you think should be tested, it probably makes sense to have it tested on the CI, so that a green dashboard gives confidence that it is OK to release a new version.
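For reference, the pattern those tests follow can be sketched like this. The functions below are stand-ins, not the real fels entry points: the real tests call fels's Python API and shell out to the `fels` executable with the same arguments, then compare the returned URL lists.

```python
import json
import subprocess
import sys


def api_query(start, end):
    # Stand-in for the Python API call (hypothetical; the real test calls
    # fels's Python entry point with the same arguments as the CLI)
    return sorted([start, end])


def cli_query(start, end):
    # Stand-in for invoking the CLI: run a subprocess and parse its printed
    # output, as a real CLI-vs-API equivalence test would
    code = "import json, sys; print(json.dumps(sorted(sys.argv[1:])))"
    proc = subprocess.run(
        [sys.executable, "-c", code, start, end],
        capture_output=True, text=True, check=True)
    return json.loads(proc.stdout)


def test_cli_and_api_agree():
    # The core assertion: both interfaces yield identical results
    assert cli_query("2018-06-30", "2018-01-01") == api_query("2018-06-30", "2018-01-01")
```

The point of running both paths through one assertion is that any divergence between the CLI and the library API shows up as a red CI dashboard rather than a user bug report.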

@vascobnunes
Owner

vascobnunes commented Dec 7, 2021

Thanks a lot for this! I was looking at open bugs and remembered we still had #56 to solve. Do you use Windows?

@Erotemic
Contributor Author

Erotemic commented Dec 7, 2021

It looks like #56 is more of an issue with fiona / GDAL on Windows than it is with this package.

If you install it from pip without installing the other dependencies first, they will also be installed from pip. I imagine this causes issues on Windows, where you probably want to install all the tools that depend on GEOS (fiona, shapely, GDAL) from conda. I don't think there is anything to do in this repo about that; you just have to be careful to install those conda dependencies first. (Really, the fiona maintainers should release win32 wheels on PyPI.)

There does seem to be another issue: the master dashboards are failing. The specific query results I got on my machine for those tests are now coming back different. This happened once before on my machine, but I assumed I had made a typo. It seems there might be some actual non-determinism under the hood.

On the latest master dashboards, the expected vs. actual Landsat-8 results I'm getting are:

E         - [datetime.date(2015, 6, 3), datetime.date(2015, 5, 27)]
E         ?                      ^  ^                       ^  ^^
E         + [datetime.date(2015, 2, 27), datetime.date(2015, 1, 19)]

I just deleted my cache directory and re-ran it, and again I got different results. Something fishy is going on. For Landsat I just got:

expected_dates = [datetime.date(2015, 6, 3), datetime.date(2015, 5, 27)]
api_dates = [datetime.date(2015, 2, 27), datetime.date(2015, 1, 19)]

Is it possible that the results for a query with a specific spacetime location in 2015 would be changing?

@Erotemic
Contributor Author

Erotemic commented Dec 7, 2021

This is very weird indeed. I have two versions of the cache on my machine (one made yesterday and one made today), and each consistently reproduces its own, differing, result.

Failing one from today:

┌─── START CMD ───
[ubelt.cmd] joncrall@toothbrush:~/code/mkinit$ fels S2 2018-01-01 2018-06-30 -c 30 -o . -g '{"type": "Point", "coordinates": [-105.2705, 40.015]}' --latest --list --outputcatalogs /home/joncrall/.cache/fels-3
Converted WKT to scene: 13TDE [1/1]
/home/joncrall/.pyenv/versions/3.8.6/envs/pyenv3.8.6/lib/python3.8/site-packages/geopandas/_compat.py:111: UserWarning: The Shapely GEOS version (3.9.1-CAPI-1.14.2) is incompatible with the GEOS version PyGEOS was compiled with (3.10.0-CAPI-1.16.0). Conversions between both will be slow.
Searching for Sentinel-2 images in catalog...
  warnings.warn(
[cacher] tryload fname=index_Sentinel.csv
[cacher] ... index_Sentinel.csv cache hit
Found 1 files.
http://storage.googleapis.com/gcp-public-data-sentinel-2/tiles/13/T/DE/S2B_MSIL1C_20180621T174909_N0206_R141_T13TDE_20180621T212007.SAFE
└─── END CMD ───
expected_dates = [datetime.date(2018, 1, 4)]
api_dates      = [datetime.date(2018, 6, 21)]
expected_urls = [
    'http://storage.googleapis.com/gcp-public-data-sentinel-2/tiles/13/T/DE/S2A_MSIL1C_20180104T175251_N0206_R098_T13TDE_20180104T191930.SAFE',
]
api_url_results = {
    '1': [
        'http://storage.googleapis.com/gcp-public-data-sentinel-2/tiles/13/T/DE/S2B_MSIL1C_20180621T174909_N0206_R141_T13TDE_20180621T212007.SAFE',
    ],
    '2': [
        'http://storage.googleapis.com/gcp-public-data-sentinel-2/tiles/13/T/DE/S2B_MSIL1C_20180621T174909_N0206_R141_T13TDE_20180621T212007.SAFE',
    ],
}
cli_url_results = {
    '1': [
        'http://storage.googleapis.com/gcp-public-data-sentinel-2/tiles/13/T/DE/S2B_MSIL1C_20180621T174909_N0206_R141_T13TDE_20180621T212007.SAFE',
    ],
    '2': [
        'http://storage.googleapis.com/gcp-public-data-sentinel-2/tiles/13/T/DE/S2B_MSIL1C_20180621T174909_N0206_R141_T13TDE_20180621T212007.SAFE',
    ],
}

Passing one from yesterday:

┌─── START CMD ───
[ubelt.cmd] joncrall@toothbrush:~/code/mkinit$ fels S2 2018-01-01 2018-06-30 -c 30 -o . -g '{"type": "Point", "coordinates": [-105.2705, 40.015]}' --latest --list --outputcatalogs /home/joncrall/.cache/fels
/home/joncrall/.pyenv/versions/3.8.6/envs/pyenv3.8.6/lib/python3.8/site-packages/geopandas/_compat.py:111: UserWarning: The Shapely GEOS version (3.9.1-CAPI-1.14.2) is incompatible with the GEOS version PyGEOS was compiled with (3.10.0-CAPI-1.16.0). Conversions between both will be slow.
Converted WKT to scene: 13TDE [1/1]
  warnings.warn(
Searching for Sentinel-2 images in catalog...
[cacher] tryload fname=index_Sentinel.csv
[cacher] ... index_Sentinel.csv cache hit
Found 1 files.
http://storage.googleapis.com/gcp-public-data-sentinel-2/tiles/13/T/DE/S2A_MSIL1C_20180104T175251_N0206_R098_T13TDE_20180104T191930.SAFE
└─── END CMD ───
expected_dates = [datetime.date(2018, 1, 4)]
api_dates      = [datetime.date(2018, 1, 4)]
expected_urls = [
    'http://storage.googleapis.com/gcp-public-data-sentinel-2/tiles/13/T/DE/S2A_MSIL1C_20180104T175251_N0206_R098_T13TDE_20180104T191930.SAFE',
]
api_url_results = {
    '1': [
        'http://storage.googleapis.com/gcp-public-data-sentinel-2/tiles/13/T/DE/S2A_MSIL1C_20180104T175251_N0206_R098_T13TDE_20180104T191930.SAFE',
    ],
    '2': [
        'http://storage.googleapis.com/gcp-public-data-sentinel-2/tiles/13/T/DE/S2A_MSIL1C_20180104T175251_N0206_R098_T13TDE_20180104T191930.SAFE',
    ],
}
cli_url_results = {
    '1': [
        'http://storage.googleapis.com/gcp-public-data-sentinel-2/tiles/13/T/DE/S2A_MSIL1C_20180104T175251_N0206_R098_T13TDE_20180104T191930.SAFE',
    ],
    '2': [
        'http://storage.googleapis.com/gcp-public-data-sentinel-2/tiles/13/T/DE/S2A_MSIL1C_20180104T175251_N0206_R098_T13TDE_20180104T191930.SAFE',
    ],
}

It is true that the index_Sentinel.csv is different between the two caches:

(pyenv3.8.6) joncrall@toothbrush:~/.cache/fels$ sha1sum index_Sentinel.csv
2a88c80144c61497fa7b79092e5c3d8216e881a4  index_Sentinel.csv
(pyenv3.8.6) joncrall@toothbrush:~/.cache/fels-3$ sha1sum index_Sentinel.csv
3b317842019b7dea7a75b9ac5c72ce86bc4f7a5c  index_Sentinel.csv

But would the URL http://storage.googleapis.com/gcp-public-data-sentinel-2/index.csv.gz downloaded on different days contain different regions?

Note, this goes beyond the SQL cache, because the behavior persists even when I fall back to the CSV parsing method.

@Erotemic
Contributor Author

Erotemic commented Dec 7, 2021

I think I found the bug. The issue is that I had the flag latest=True, which means each scene only returned one result. This is not a problem by itself, but it seems there is a bug in the sort_url_list code.

def sort_url_list(cc_values, all_acqdates, all_urls):
    """Sort the url list by increasing cc_values and acqdate."""
    # BUG: these two sorts reorder cc_values and all_acqdates independently,
    # destroying their row-wise alignment with all_urls ...
    cc_values = sorted(cc_values)
    all_acqdates = sorted(all_acqdates, reverse=True)
    # ... so this zip pairs cloud covers and dates with the wrong urls
    all_urls = [x for (y, z, x) in sorted(zip(cc_values, all_acqdates, all_urls))]
    urls = []
    for url in all_urls:
        urls.append('http://storage.googleapis.com/' + url.replace('gs://', ''))
    return urls

The code above does not work as described. The first two sorts destroy any alignment between the three input lists, so the final sort of all_urls just returns the URLs in roughly whatever order they were given in. That is the cause of the bug. I will fix it in a patch.
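To see why the lost alignment matters, here is a tiny self-contained demonstration with two toy scenes (the data is made up, not real catalog entries):

```python
# Two scenes: the clear one (low cloud cover) should sort first
cc_values = [18.51, 2.11]
all_acqdates = ["2015-02-27", "2015-03-31"]
all_urls = ["gs://scene_cloudy", "gs://scene_clear"]

# Buggy pattern: each list is sorted on its own, so row alignment is lost
cc_sorted = sorted(cc_values)                     # [2.11, 18.51]
dates_sorted = sorted(all_acqdates, reverse=True) # ["2015-03-31", "2015-02-27"]
buggy = [u for (c, d, u) in sorted(zip(cc_sorted, dates_sorted, all_urls))]
# buggy == ["gs://scene_cloudy", "gs://scene_clear"]  -- input order, not cc order

# Correct pattern: sort whole rows together, keeping each row's fields aligned
fixed = [u for (c, d, u) in sorted(zip(cc_values, all_acqdates, all_urls))]
# fixed == ["gs://scene_clear", "gs://scene_cloudy"]
```

Because the pre-sorted cc and date lists are re-zipped against the untouched URL list, the cloudy scene inherits the low cloud-cover value and wrongly sorts first.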

New implementation looks like this:

import ubelt

def sort_url_list(cc_values, all_acqdates, all_urls):
    """
    Sort the url list, first by ascending cc_values, and then by descending
    acqdate. Also replaces the gs:// prefix with the google api http prefix.

    Args:
        cc_values (List[float]): cloud cover for each item
        all_acqdates (List[datetime.datetime]): datetime for each item
        all_urls (List[str]): url for each item

    Returns:
        List[str]: sorted and modified urls

    Example:
        >>> from fels.utils import *  # NOQA
        >>> cc_values = [2.11, 1.85, 18.51, 2.85, 3.92, 18.32]
        >>> all_acqdates = [datetime.datetime(2015, 3, 31, 0, 0),
        >>>                 datetime.datetime(2015, 6, 19, 0, 0),
        >>>                 datetime.datetime(2015, 2, 27, 0, 0),
        >>>                 datetime.datetime(2015, 1, 26, 0, 0),
        >>>                 datetime.datetime(2015, 3, 15, 0, 0),
        >>>                 datetime.datetime(2015, 6, 3, 0, 0)]
        >>> all_urls = ['gs://test_url_{}'.format(i) for i in range(len(cc_values))]
        >>> sorted_urls = sort_url_list(cc_values, all_acqdates, all_urls)
        >>> print('sorted_urls = {}'.format(ub.repr2(sorted_urls, nl=1)))
        sorted_urls = [
            'http://storage.googleapis.com/test_url_1',
            'http://storage.googleapis.com/test_url_0',
            'http://storage.googleapis.com/test_url_3',
            'http://storage.googleapis.com/test_url_4',
            'http://storage.googleapis.com/test_url_5',
            'http://storage.googleapis.com/test_url_2',
        ]
    """
    # For implementation clarity, table-like list of dictionary rows
    rows = [
        {'cc': cc, 'date': date, 'url': url}
        for cc, date, url in zip(cc_values, all_acqdates, all_urls)
    ]
    # First group and sort by ascending cloudcover
    cc_to_rows = ubelt.group_items(rows, key=lambda row: row['cc'])
    cc_to_rows = ubelt.sorted_keys(cc_to_rows)

    sorted_urls = []
    for cc, group in cc_to_rows.items():
        # Then within each group, sort by descending date
        group = sorted(group, key=lambda row: (row['date'], row['url']), reverse=True)
        for row in group:
            url = row['url']
            new_url = 'http://storage.googleapis.com/' + url.replace('gs://', '')
            sorted_urls.append(new_url)
    return sorted_urls

It's not as efficient as it could be, but it's not inefficient by any stretch, and I think the implementation is clearer, which matters given that this function was buggy. (It's also not obvious to me how to sort by a tuple where one item is ascending and the other descending when one of the terms can't easily be negated, so I prefer this grouping solution.) Hopefully it is clear what the ubelt functions are doing; they are documented: https://ubelt.readthedocs.io/en/latest/ubelt.html#ubelt.group_items
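For what it's worth, one standard way to mix sort directions without negating a key is to exploit the stability of Python's sort: sort by the secondary key first (date, descending), then re-sort by the primary key (cloud cover, ascending), so ties on the second pass keep the first pass's order. A sketch on the same toy data as the docstring example:

```python
import datetime

cc_values = [2.11, 1.85, 18.51, 2.85, 3.92, 18.32]
all_acqdates = [datetime.datetime(2015, 3, 31), datetime.datetime(2015, 6, 19),
                datetime.datetime(2015, 2, 27), datetime.datetime(2015, 1, 26),
                datetime.datetime(2015, 3, 15), datetime.datetime(2015, 6, 3)]
all_urls = ['gs://test_url_{}'.format(i) for i in range(len(cc_values))]

rows = list(zip(cc_values, all_acqdates, all_urls))
# Python's sort is stable, so apply the keys from least to most significant:
rows.sort(key=lambda r: r[1], reverse=True)  # secondary key: date, descending
rows.sort(key=lambda r: r[0])                # primary key: cloud cover, ascending

sorted_urls = ['http://storage.googleapis.com/' + u.replace('gs://', '')
               for _, _, u in rows]
# Same order as the grouping implementation: urls 1, 0, 3, 4, 5, 2
```

(An alternative single-pass key would negate the date numerically, e.g. `key=lambda r: (r[0], -r[1].toordinal())`, at some cost in readability.) Either approach is equivalent to the grouping version; the grouping version just makes the two-level ordering more explicit.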
