[DataCap Refresh] <1st> Review of <Tianji Studio Fil+> #268

Open
liyunzhi-666 opened this issue Jan 6, 2025 · 7 comments
Labels: Diligence Audit in Process (Governance team is reviewing the DataCap distributions and verifying the deals were within standards); Refresh (Applications received from existing Allocators for a refresh of DataCap allowance)

Comments

@liyunzhi-666

liyunzhi-666 commented Jan 6, 2025

Basic info

  1. Type of allocator: [manual]
  2. Paste your JSON number: [1042]
  3. Allocator verification: [yes]
  4. Allocator Application
  5. Compliance Report

Current allocation distribution

| Client name | DC granted |
| --- | --- |
| Encyclopedia of DNA Elements | 1.5 PiB |
| LAMOST DR8 public data | 900 TiB |
| Aiyao | 2.5 PiB |
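
The three grants use mixed units, so the total is easier to read once everything is converted. A minimal sketch, assuming binary units (1 PiB = 1024 TiB); the figures are copied from the table above and the variable names are illustrative only:

```python
# Assumption: binary units, 1 PiB = 1024 TiB.
TIB_PER_PIB = 1024

# DC granted per client, in TiB, copied from the distribution table.
granted_tib = {
    "Encyclopedia of DNA Elements": 1.5 * TIB_PER_PIB,
    "LAMOST DR8 public data": 900,
    "Aiyao": 2.5 * TIB_PER_PIB,
}

total_tib = sum(granted_tib.values())
total_pib = total_tib / TIB_PER_PIB

print(f"Total granted: {total_tib:.0f} TiB ({total_pib:.2f} PiB)")
# Total granted: 4996 TiB (4.88 PiB)
```

Under that assumption the three grants add up to just under 5 PiB.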

I. Encyclopedia of DNA Elements

  • DC requested: 10 PiB
  • DC granted so far: 1.5 PiB

II. Dataset Completion

https://registry.opendata.aws/encode-project/
https://www.encodeproject.org

III. Does the list of SPs provided and updated in the issue match the list of SPs used for deals?

Mostly matches. When SPs are added or changed, the client updates the list on GitHub.

IV. How many replicas has the client declared vs how many have been made so far:

10 vs 7

V. Please provide a list of SPs used for deals and their retrieval rates

| SP ID | % retrieval | Meets the >75% retrieval threshold? |
| --- | --- | --- |
| f03178144 | 80.79% | YES |
| f03179572 | 62.62% | NO |
| f03214937 | 46.79% | NO |
| f03178077 | 82.50% | YES |
| f03179570 | 37.88% | NO |
| f0870558 | 67.58% | NO |
| f03055018 | 93.67% | YES |
| f01315096 | 92.11% | YES |
| f03055005 | 57.46% | NO |
| f03055029 | 95.41% | YES |
| f01106668 | 95.00% | YES |
| f03151456 | 65.19% | NO |
| f03151449 | 68.98% | NO |
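
The YES/NO column can be re-derived mechanically from the percentages. A minimal sketch, assuming the threshold is a strict "greater than 75%" as the column header suggests; the SP IDs and rates are copied from the table above and the function name is illustrative:

```python
THRESHOLD = 75.0  # percent; assumed strict ">" per the column header

# (SP ID, Spark retrieval success rate in percent) from the table.
retrieval = [
    ("f03178144", 80.79), ("f03179572", 62.62), ("f03214937", 46.79),
    ("f03178077", 82.50), ("f03179570", 37.88), ("f0870558", 67.58),
    ("f03055018", 93.67), ("f01315096", 92.11), ("f03055005", 57.46),
    ("f03055029", 95.41), ("f01106668", 95.00), ("f03151456", 65.19),
    ("f03151449", 68.98),
]


def meets_threshold(rate: float, threshold: float = THRESHOLD) -> bool:
    """Return True when the retrieval rate is strictly above the threshold."""
    return rate > threshold


passing = [sp for sp, rate in retrieval if meets_threshold(rate)]
failing = [sp for sp, rate in retrieval if not meets_threshold(rate)]

print(f"{len(passing)} of {len(retrieval)} SPs exceed {THRESHOLD}%")
# 6 of 13 SPs exceed 75.0%
```

The result agrees with the YES/NO column: six SPs pass and seven do not.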

I. LAMOST DR8 public data (the client has committed to start deals soon)

  • DC requested: 10 PiB
  • DC granted so far: 900 TiB

I. Aiyao

  • DC requested: 10 PiB
  • DC granted so far: 2.5 PiB

II. Dataset Completion

liyunzhi-666/TianjiStudio-Fil#8 (comment)

III. Does the list of SPs provided and updated in the issue match the list of SPs used for deals?

Mostly matches. When SPs are added or changed, the client updates the list on GitHub.

IV. How many replicas has the client declared vs how many have been made so far:

10 vs 5

V. Please provide a list of SPs used for deals and their retrieval rates

| SP ID | % retrieval | Meets the >75% retrieval threshold? |
| --- | --- | --- |
| f03229933 | 29.22% | NO |
| f03229932 | 34.83% | NO |
| f03214937 | 47.40% | NO |
| f03179570 | 41.87% | NO |
| f03179572 | 60.78% | NO |
| f03231121 | 0.00% | NO |
| f03231117 | 0.00% | NO |

A few SPs have a Spark retrieval success rate of 0%. The client has added an explanation in the comments.
image

Allocation summary

  1. Notes from the Allocator

When the Allocator finds no significant improvement in the Spark retrieval success rate of the SPs working with a client, it limits the amount of DataCap approved. The Allocator approves 1 PiB instead of 2 PiB.
image

  1. Did the allocator report up to date any issues or discrepancies that occurred during the application processing?

Yes. The Allocator malfunctioned during the last signing. An issue has been submitted and is awaiting resolution by the official team: fidlabs/allocator-tooling#102

  1. What steps have been taken to minimize unfair or risky practices in the allocation process?

1. Regular checks. The Allocator reviews the client's application with the CID checker and, when the report is unsatisfactory, takes appropriate action to limit the approval.
2. Request more data samples from clients.
3. Request a KYC review from the client, for example by asking the client to send documentation to the Allocator's official email for further approval.

  1. How did these distributions add value to the Filecoin ecosystem?

1. The Allocator encourages different types of datasets, including public datasets and enterprise open datasets.
2. The Allocator checks whether datasets are already duplicated on the Filecoin network. If the client gives a valid reason, we generally take an encouraging and tolerant approach: as the client noted, many datasets may have been stored before but are now outdated, or the cooperating SPs are storing them for the first time.

  1. Please confirm that you have maintained the standards set forward in your application for each disbursement issued to clients and that you understand the Fil+ guidelines set forward in your application

Yes

  1. Please confirm that you understand that by submitting this Github request, you will receive a diligence review that will require you to return to this issue to provide updates.

Yes

filecoin-watchdog added the labels Refresh (Applications received from existing Allocators for a refresh of DataCap allowance) and Awaiting Community/Watchdog Comment (DataCap Refresh requests awaiting a public verification of the metrics outlined in Allocator App.) on Jan 6, 2025
@filecoin-watchdog
Collaborator

@liyunzhi-666
Encyclopedia of DNA Elements

  • The requested DataCap exceeds the maximum limit of 5 PiB as specified in the allocator rules. The dataset has been stored at least three times within the Filecoin Plus (Fil+) program. The allocator raised a question on this matter, which was not addressed, and the topic was not revisited.
    references:
  • While most storage providers (SPs) operate under VPNs, which is permitted under the allocator rules, no additional due diligence was performed to address this aspect.
  • The geographical diversification requirement of three continents and four cities, as stipulated by the allocator, was not met and is difficult to verify due to VPN usage.
  • Additionally, there is no information provided about the data preparation steps, which could offer the community insights into how the files can be processed and computed in the future.
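
The diversification rule mentioned above (three continents, four cities) is simple to check once SP locations are known. A minimal sketch under stated assumptions: the `SPLocation` type, the `meets_geo_diversity` helper, and the sample rows are all hypothetical and do not describe any SP in this review. As noted above, VPN usage makes the input locations hard to verify, so the check is only as reliable as the location data fed into it.

```python
from typing import Iterable, NamedTuple


class SPLocation(NamedTuple):
    sp_id: str       # hypothetical placeholder ID, not a real SP
    city: str
    continent: str


def meets_geo_diversity(
    locations: Iterable[SPLocation],
    min_continents: int = 3,
    min_cities: int = 4,
) -> bool:
    """Check the 'three continents and four cities' rule on declared locations."""
    locations = list(locations)
    continents = {loc.continent for loc in locations}
    # Key cities by continent too, so same-named cities are not merged.
    cities = {(loc.continent, loc.city) for loc in locations}
    return len(continents) >= min_continents and len(cities) >= min_cities


# Hypothetical sample data, for illustration only.
sample = [
    SPLocation("sp-a", "Hong Kong", "Asia"),
    SPLocation("sp-b", "Singapore", "Asia"),
    SPLocation("sp-c", "Frankfurt", "Europe"),
    SPLocation("sp-d", "Hong Kong", "Asia"),
]

print(meets_geo_diversity(sample))  # False: two continents, three cities
```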

LAMOST DR8 public data

  • The requested DataCap significantly exceeds the allowable limit, by more than three times the amount specified in the allocator rules.
  • There is no evidence of any geographical verification of SPs, some of whom partially operate under VPNs, making compliance with geographical diversification requirements impossible to confirm.
  • Most of the questions raised during the application process remained unanswered. Despite this, the allocator proceeded with assigning DataCap.
  • Furthermore, no details were provided regarding the data preparation steps, which could help the community understand how the files can be processed and computed.

Aiyao

  • No indication exists that any Know Your Customer (KYC) or Know Your Business (KYB) processes were conducted for the client or storage providers (who also use VPNs).
  • Many questions raised during the application process were left unanswered.
  • Additionally, no information was provided regarding data preparation, the societal or community value of the data, or its potential use.
  • Of the seven SPs involved, five demonstrated 0% retrievability.

In the Allocator Application, the allocator said:

In order to track VPN usage, we will implement a monitoring system to detect and log VPN activity, including connection time, duration, and source IP address.

What's the progress in preparing such a tool? Will this be open source or available somehow to use broadly?

filecoin-watchdog added the label Awaiting Response from Allocator (If there is a question that was raised in the issue that requires comment before moving forward.) and removed the label Awaiting Community/Watchdog Comment (DataCap Refresh requests awaiting a public verification of the metrics outlined in Allocator App.) on Jan 20, 2025
@filecoin-watchdog
Collaborator

@liyunzhi-666 I'm awaiting your response.

@Kevin-FF-USA
Collaborator

Hi @liyunzhi-666

Thanks for submitting this application for refresh.
Wanted to send you a friendly update - once you finish answering questions on the diligence raised by Watchdog, this will move forward with Galen on behalf of the Governance this week. If you have any questions or need support until then, please let us know.

Warmly,
-Kevin

filecoin-watchdog added the label Diligence Audit in Process (Governance team is reviewing the DataCap distributions and verifying the deals were within standards) and removed the label Awaiting Response from Allocator (If there is a question that was raised in the issue that requires comment before moving forward.) on Feb 5, 2025
@liyunzhi-666
Author

Sorry for the late reply; I am still on vacation for the Chinese New Year holiday. First of all, happy Chinese New Year, @filecoin-watchdog @Kevin-FF-USA. Your review comments help our allocator operation a lot. My explanations for the questions you raised are as follows:

  1. Encyclopedia of DNA Elements
    Because of a translation problem, my wording was unclear. What I meant was that the maximum DataCap approved per round is 5 PiB, while the Allocator's requested DataCap for one year is 150 PiB. If the 5 PiB limit applied per client, it would take 30 clients to use up the requested 150 PiB, which is clearly not realistic at this stage. I will update the Allocator application on GitHub to avoid any misunderstanding: there is no limit on how much DataCap a client may request, but the maximum approved in each round is 5 PiB.

Image

Image
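
The arithmetic behind the 30-client figure can be checked directly. A minimal sketch using only the numbers quoted in this thread (150 PiB requested per year, 5 PiB limit, 10 PiB requested by each client); the variable names are illustrative, and the rounds figure is a lower bound that assumes every round is approved at the maximum:

```python
import math

ALLOCATOR_REQUEST_PIB = 150   # Allocator's requested DataCap for one year
LIMIT_PIB = 5                 # the 5 PiB limit under discussion
CLIENT_REQUEST_PIB = 10       # DC requested by each client in this review

# Reading 1: 5 PiB is a lifetime cap per client.
clients_needed = math.ceil(ALLOCATOR_REQUEST_PIB / LIMIT_PIB)

# Reading 2: 5 PiB is a cap per approval round.
min_rounds_per_client = math.ceil(CLIENT_REQUEST_PIB / LIMIT_PIB)

print(clients_needed)          # 30
print(min_rounds_per_client)   # 2
```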

  1. LAMOST DR8 public data
    Throughout the application process, the Client gave relevant responses to the questions the Allocator asked, as follows:

Image

Image

Once the Allocator received the relevant KYC email, the first round of DataCap was approved in accordance with the rules.

Image

  1. Aiyao
    The Client did reply in detail to the Allocator's questions; see the screenshot and links below:

Image

liyunzhi-666/TianjiStudio-Fil#8 (comment)
liyunzhi-666/TianjiStudio-Fil#8 (comment)
liyunzhi-666/TianjiStudio-Fil#8 (comment)

Regarding the Spark retrieval rate, the Client proactively disclosed in advance that the SP is new and has a Spark retrieval rate of 0%, and promised to keep improving it. However, based on Filecoin's principle of limited trust, I, as the Allocator operator, restricted the DataCap approval. Meanwhile, the latest data report shows that the Spark retrieval rate is gradually improving and has slowly started to rise from 0%.

Image

Image

In addition, for the issues @filecoin-watchdog mentioned, such as SPs using VPNs, the lack of detail on data preparation steps, and datasets being stored multiple times, we plan to make further improvements. As the Allocator operator, I will focus on the details of data preparation and the description of each dataset's storage value. If a Client cannot give a convincing reason, I will limit the DataCap approval or ask for more information.

Of course, in actual operation we did run into some real problems. First, there are not many clients; otherwise it would not have taken almost a year of operation to finish allocating the original 5 PiB of DataCap. Second, our technical team has not found a good way to identify whether SPs are using VPNs, so the VPN monitoring system mentioned in the Allocator application has not yet been put on the agenda. May I ask how you determine whether SPs are using VPNs? If there is a better way, please feel free to share it. That said, compared with the monitoring system, it is more important to expand the influence of the Tianji Allocator and attract more data clients to apply to us.

Finally, I would like to emphasize that, as a V3, V4, and V5 notary, we have always been part of the Filecoin community and have witnessed Filecoin's development alongside the Filecoin governance team, which has been a great life and project experience in itself. Although Tianji Studio Allocator is not the best performer among the V5 Allocators and its DataCap has not been used very quickly, we still hope to receive 10 PiB of DataCap support. In the next half year, we plan to add 3-5 new data clients and continue contributing to the ecosystem.

@filecoin-watchdog
Collaborator

@liyunzhi-666 As for the VPN, please refer to this comment: #264 (comment).
Also, here is an example of a tool (one of many) that could be used: https://iphub.info/
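
As a rough illustration of how such a lookup result might feed into SP diligence, here is a minimal sketch. It assumes the lookup returns a JSON object with an integer `block` field where 0 means residential or unclassified, 1 means non-residential (hosting, proxy, or VPN), and 2 means mixed; that reading should be verified against IPHub's current documentation before relying on it. The helper names and the sample record are hypothetical, and the network call itself is deliberately left out.

```python
from typing import Any, Mapping

# Assumed meaning of the `block` field; verify against the provider's docs.
BLOCK_LABELS = {
    0: "residential or unclassified",
    1: "non-residential (hosting, proxy, or VPN)",
    2: "mixed residential and non-residential",
}


def classify_lookup(result: Mapping[str, Any]) -> str:
    """Map an already-fetched lookup result to a human-readable label."""
    return BLOCK_LABELS.get(result.get("block"), "unknown")


def needs_extra_diligence(result: Mapping[str, Any]) -> bool:
    """Flag anything not clearly residential, including unknown values."""
    return result.get("block") != 0


# Hypothetical record using a documentation-only IP address (TEST-NET-3).
sample = {"ip": "203.0.113.7", "block": 1}

print(classify_lookup(sample))
print(needs_extra_diligence(sample))
```

A flag from a check like this is a prompt for follow-up questions to the SP, not proof of VPN use.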

@liyunzhi-666
Author

@filecoin-watchdog I will refer to it and learn. Thanks for sharing.

@liyunzhi-666
Author

Hi @Kevin-FF-USA @galen-mcandrew, I submitted feedback on the related issues a few days ago. Could I get updated guidance on the Allocator's work?
