[DataCap Refresh] <3rd> Review of <Marshall Fil+ Allocator> #298

Open
Marshall-btc opened this issue Feb 20, 2025 · 5 comments
Labels:
  • Awaiting Response from Allocator: a question raised in the issue requires comment before moving forward.
  • Refresh: applications received from existing Allocators for a refresh of DataCap allowance.

Comments


Marshall-btc commented Feb 20, 2025

Basic info

  1. Type of allocator: [manual]

  2. Paste your JSON number: [1057]

  3. Allocator verification: [yes]

  1. Allocator Application
  2. Compliance Report
  3. Previous reviews

Current allocation distribution

| Client name | DC granted |
| --- | --- |
| LZ | 1.25 PiB |
| Landsat | 2.75 PiB |
| Large Sky Area Multi-Object Fiber Spectroscopic Telescope-5 | 2 PiB |
| zzflk | 3.5 PiB |
| National Microbiology Data Center | 0.5 PiB |

I. LZ

  • DC requested: 5 PiB
  • DC granted so far: 5 PiB

II. Dataset Completion
https://www.livzon.com.cn/intro/2.html

III. Does the list of SPs provided and updated in the issue match the list of SPs used for deals?
Largely matches; when SPs are added or changed, the client updates the list on GitHub.

IV. How many replicas has the client declared vs. how many have been made so far?

5 vs 9

V. Please provide a list of SPs used for deals and their retrieval rates

| SP ID | % retrieval | Meets >75% retrieval? |
| --- | --- | --- |
| f03178077 | 0.00 | NO |
| f03179572 | 8.56 | NO |
| f03178144 | 0.00 | NO |
| f03214937 | 9.57 | NO |
| f03179570 | 2.27 | NO |
| f03229933 | 5.54 | NO |
| f03179555 | 17.77 | NO |
| f01083939 | 38.36 | NO |
| f01083949 | 1.89 | NO |
| f03300963 | 0.80 | NO |
| f01315096 | 77.44 | YES |
| f03055029 | 2.02 | NO |
| f03055018 | 76.59 | YES |
| f01106668 | 63.42 | NO |
| f0870558 | 47.84 | NO |
| f03055005 | 85.19 | YES |
| f03151456 | 12.13 | NO |
| f03091739 | 71.27 | NO |
| f03173127 | 72.43 | NO |
| f010202 | 71.50 | NO |
| f03144037 | 66.67 | NO |
| f03156617 | 70.42 | NO |
| f03148356 | 68.85 | NO |
| f01518369 | 34.64 | NO |
| f01889668 | 26.43 | NO |
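As an aside, the failing share in a table like this can be tallied mechanically. The sketch below is a minimal check using the LZ rates above and a strict >75% threshold; the variable names are illustrative, not part of any Fil+ tooling.

```python
# Compute the share of SPs failing the >75% Spark retrieval threshold,
# using the retrieval rates from the LZ table above.
THRESHOLD = 75.0

rates = {
    "f03178077": 0.00,  "f03179572": 8.56,  "f03178144": 0.00,
    "f03214937": 9.57,  "f03179570": 2.27,  "f03229933": 5.54,
    "f03179555": 17.77, "f01083939": 38.36, "f01083949": 1.89,
    "f03300963": 0.80,  "f01315096": 77.44, "f03055029": 2.02,
    "f03055018": 76.59, "f01106668": 63.42, "f0870558": 47.84,
    "f03055005": 85.19, "f03151456": 12.13, "f03091739": 71.27,
    "f03173127": 72.43, "f010202": 71.50,   "f03144037": 66.67,
    "f03156617": 70.42, "f03148356": 68.85, "f01518369": 34.64,
    "f01889668": 26.43,
}

# An SP "fails" if its rate does not exceed the threshold.
failing = [sp for sp, r in rates.items() if r <= THRESHOLD]
print(f"{len(failing)}/{len(rates)} SPs fail the threshold "
      f"({100 * len(failing) / len(rates):.0f}%)")
# -> 22/25 SPs fail the threshold (88%)
```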

I. Landsat

  • DC requested: 10 PiB
  • DC granted so far: 5.5 PiB

II. Dataset Completion
https://www.gscloud.cn/sources/index?pid=263&ptitle=LANDSAT%E7%B3%BB%E5%88%97%E6%95%B0%E6%8D%AE&rootid=1

III. Does the list of SPs provided and updated in the issue match the list of SPs used for deals?
Largely matches; when SPs are added or changed, the client updates the list on GitHub.

IV. How many replicas has the client declared vs. how many have been made so far?

10 vs 10

V. Please provide a list of SPs used for deals and their retrieval rates

| SP ID | % retrieval | Meets >75% retrieval? |
| --- | --- | --- |
| f03229932 | 7.65 | NO |
| f03178077 | 0.00 | NO |
| f03179572 | 8.56 | NO |
| f03179570 | 2.27 | NO |
| f03178144 | 0.00 | NO |
| f03214937 | 9.57 | NO |
| f03179555 | 17.77 | NO |
| f01083949 | 1.89 | NO |
| f01083939 | 38.36 | NO |
| f03300963 | 0.80 | NO |
| f01315096 | 77.44 | YES |
| f01106668 | 63.42 | NO |
| f03055018 | 76.59 | YES |
| f03055029 | 2.02 | NO |
| f0870558 | 47.84 | NO |
| f03055005 | 85.19 | YES |
| f03151449 | 12.13 | NO |
| f03151456 | 19.81 | NO |
| f03091739 | 71.27 | NO |
| f02199999 | - | None |
| f03144037 | 66.67 | NO |
| f01889668 | 26.43 | NO |
| f01518369 | 34.64 | NO |

I. Large Sky Area Multi-Object Fiber Spectroscopic Telescope-5

  • DC requested: 15 PiB
  • DC granted so far: 4.49 PiB

II. Dataset Completion
http://dr5.lamost.org/v3/sas/catalog/
http://dr5.lamost.org/v3/sas/fits/20111024/F5902/
http://dr5.lamost.org/v3/sas/fits/20111024/F5907/
http://dr5.lamost.org/v3/sas/fits/20111024/F5909/
http://dr5.lamost.org/v3/sas/png/20111024/F5902/
http://dr5.lamost.org/v3/sas/png/20111024/F5907/
http://dr5.lamost.org/v3/sas/png/20111024/F5909/
http://dr5.lamost.org/v3/sas/sky/20111024/

III. Does the list of SPs provided and updated in the issue match the list of SPs used for deals?
Largely matches; when SPs are added or changed, the client updates the list on GitHub.

IV. How many replicas has the client declared vs. how many have been made so far?

6 vs 8

V. Please provide a list of SPs used for deals and their retrieval rates

| SP ID | % retrieval | Meets >75% retrieval? |
| --- | --- | --- |
| f03214937 | 9.57 | NO |
| f03229933 | 5.54 | NO |
| f03229932 | 7.65 | NO |
| f03179572 | 8.56 | NO |
| f03179570 | 2.27 | NO |
| f03179555 | 17.77 | NO |
| f01083939 | 38.36 | NO |
| f01083949 | 1.89 | NO |
| f03300963 | 0.80 | NO |
| f01315096 | 77.44 | YES |
| f03055029 | 2.02 | NO |
| f03055018 | 76.59 | YES |
| f01106668 | 63.42 | NO |
| f0870558 | 47.84 | NO |
| f03055005 | 85.19 | YES |
| f03188440 | - | None |
| f03190614 | - | None |
| f03190616 | - | None |
| f03156617 | 70.42 | NO |
| f03148356 | 68.85 | NO |
| f03144037 | 66.67 | NO |
| f01889668 | 26.43 | NO |
| f01518369 | 34.64 | NO |

I. zzflk

  • DC requested: 10 PiB
  • DC granted so far: 3.5 PiB

II. Dataset Completion
http://www.zzflk.com/product.html

III. Does the list of SPs provided and updated in the issue match the list of SPs used for deals?
Largely matches; when SPs are added or changed, the client updates the list on GitHub.

IV. How many replicas has the client declared vs. how many have been made so far?

8 vs 5

V. Please provide a list of SPs used for deals and their retrieval rates

| SP ID | % retrieval | Meets >75% retrieval? |
| --- | --- | --- |
| f03308922 | 22.57 | NO |
| f03308920 | 25.15 | NO |
| f03298672 | 38.58 | NO |
| f03298673 | 39.73 | NO |
| f03308933 | 33.94 | NO |
| f03229932 | 7.65 | NO |
| f03229933 | 5.54 | NO |

I. National Microbiology Data Center

  • DC requested: 18 PiB
  • DC granted so far: 0.5 PiB

II. Dataset Completion
https://nmdc.cn/monitor/

III. Does the list of SPs provided and updated in the issue match the list of SPs used for deals?
The client has not started making deals since submitting the SP list.

IV. How many replicas has the client declared vs. how many have been made so far?

9 vs X

V. Please provide a list of SPs used for deals and their retrieval rates

No deals have been made yet, so there are no SPs or retrieval rates to report.

Allocation summary

  1. Notes from the Allocator

Marshall-btc/Marshall-Fil-Data-Pathway#18 (comment)
When I realized that the Spark retrieval rates of the client's partner SPs were not improving significantly, I limited the DataCap approval to 512 TiB instead of 1 PiB.
Image

We focus on Spark retrieval rates and demand continuous improvement
Image

We are also concerned about the data sample and the veracity of the data volume
Image

Marshall-btc/Marshall-Fil-Data-Pathway#5 (comment)
I restored 2 PiB of DataCap support once I saw that the client was reducing the number of cooperating SPs and adopting DDO mode.
Image

  2. Did the allocator report, in a timely manner, any issues or discrepancies that occurred during application processing?

Yes. When I notice a drop in the Spark retrieval success rate of the SPs a client works with, or other storage issues, I point out the problem in a GitHub comment and assist the client in resolving it.

fidlabs/allocator-tooling#102
When we found a signing bug affecting a client, we promptly filed the relevant bug issue.

  3. What steps have been taken to minimize unfair or risky practices in the allocation process?

1. We carry out a KYC process, asking the client to provide proof of identity and send the relevant emails.
Image

2. Based on the CID checker results, we reduce the DataCap allocation when the client's work needs improvement.

  4. How did these distributions add value to the Filecoin ecosystem?

We check whether the dataset is already stored on the Filecoin network and try to avoid bringing duplicate datasets into the network.

  5. Please confirm that you have maintained the standards set forward in your application for each disbursement issued to clients and that you understand the Fil+ guidelines set forward in your application.

Yes

  6. Please confirm that you understand that by submitting this GitHub request, you will receive a diligence review that will require you to return to this issue to provide updates.

Yes

@Kevin-FF-USA Kevin-FF-USA added Refresh Applications received from existing Allocators for a refresh of DataCap allowance Awaiting Community/Watchdog Comment DataCap Refresh requests awaiting a public verification of the metrics outlined in Allocator App. labels Feb 25, 2025
filecoin-watchdog (Collaborator) commented:

[DataCap Application] <LZ> # 1

  • Total DataCap requested: 5 × 580 TiB = 2.8 PiB, which does not align with the client application.
  • Data sample link: Broken (returns 404), with no evidence of proper justification for the data size.
  • Dataset description: Suggests corporate data, but it was confirmed to be a public dataset.
  • Storage Providers (SPs): New SPs were not updated in the original application before appearing in the report.
  • Retrievability: Did not reach 75% for 88% of the SPs.
  • Data copies: 9 unique copies exist, but only 5 were declared.

[DataCap Application] <Landsat> # 3

  • Did the allocator help @himomo007 fill in the application, or perhaps introduce the SPs? The application has many similarities to [DataCap Application] <LZ> # 1 (the same mistakes and most of the same declared SPs).
  • SP overlap: 90% of the SPs match those used in the previous project.
  • Data replication: Data has been stored multiple times on the Filecoin ecosystem; claiming specific SPs didn’t store it does not justify another DataCap allocation for the same data.
  • Replicas calculation: Declared 10 replicas × 600 TiB per copy does not match the requested 10 PiB DataCap (same error as in application # 1).
  • Retrievability: 88% of SPs fail to reach 75% retrievability.
  • SP list: Not updated or disclosed in advance.
  • Data preparation: No information provided to link original files with respective pieces.
  • Requested DataCap: 10 PiB exceeds the "Max per client overall" limit of 5 PiB stated in the application.

[DataCap Application] <Large Sky Area Multi-Object Fiber Spectroscopic Telescope-5> # 5

  • Data replication: Data is stored multiple times on the network; the client avoided answering a related question, and the allocator did not follow up.
  • Data preparation: No details provided to connect original files with respective pieces.
  • SP selection: Questions arise about whether the allocator assisted in choosing SPs, as ~80% overlap with at least one of the two prior applications. Are the users from applications # 3 and # 1 connected?
  • Replicas calculation: 6 declared copies × 3000 TiB each = 17.58 PiB, inconsistent with the user application (same mistake as in applications # 3 and # 1).
  • Retrievability: Generally low, failing to reach 75% in most cases.
  • Requested DataCap: 15 PiB exceeds the "Max per client overall" limit of 5 PiB.
  • SP list: Large number of providers, not updated in the original application, complicating comparison and compliance.
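The replica-arithmetic mismatches flagged in applications # 1, # 3, and # 5 can be reproduced in a few lines. The sketch below uses the figures quoted in this review and assumes binary units (1 PiB = 1024 TiB); the helper name is illustrative.

```python
# Check whether declared replicas x per-copy size matches the requested
# DataCap. Figures are taken from the watchdog review above; 1 PiB = 1024 TiB.

def requested_vs_declared(replicas, copy_size_tib, requested_pib):
    """Return (declared total in PiB, whether it matches the request)."""
    declared_pib = replicas * copy_size_tib / 1024
    return declared_pib, abs(declared_pib - requested_pib) < 0.01

apps = {
    "LZ (#1)":      (5, 580, 5),     # 5 copies x 580 TiB vs 5 PiB requested
    "Landsat (#3)": (10, 600, 10),   # 10 copies x 600 TiB vs 10 PiB requested
    "LAMOST (#5)":  (6, 3000, 15),   # 6 copies x 3000 TiB vs 15 PiB requested
}

for name, (replicas, size_tib, requested) in apps.items():
    declared, ok = requested_vs_declared(replicas, size_tib, requested)
    print(f"{name}: declared {declared:.2f} PiB vs requested {requested} PiB "
          f"-> {'match' if ok else 'mismatch'}")
# All three print "mismatch" (2.83, 5.86, and 17.58 PiB respectively).
```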

[DataCap Application] <zzflk> # 18

  • Data preparation: No information provided to connect original files with respective pieces or to recreate the data preparation process.
  • KYC and data size: The allocator conducted KYC and asked questions to justify the data size, but three Windows File Explorer screenshots do not adequately support the 1.25 PiB size of one copy.
  • Retrievability: Not met by any SP.
  • Data concentration: 54% of the data is sealed in two SPs (f03308922 and f03308920, possibly the same entity, JP).
  • Replicas: Most data is stored under two replicas.
  • Unique data size: One copy is currently 1.389 PiB, acceptable for now but should be monitored to ensure it doesn’t increase.
  • Requested DataCap: 10 PiB exceeds the "Max per client overall" limit of 5 PiB.

[DataCap Application] <National Microbiology Data Center> # 20

  • No data has been sealed as of today.
  • KYC: Evidence exists of KYC and diligence for user identification.
  • Application completeness: Many fields in the original application were left as "no response."
  • Data size justification: No clear evidence supports the claimed 2 PiB dataset size.
  • Data preparation: Marked as public, but no information connects original files to respective pieces or recreates the preparation process.
  • Declared dataset size: Exceeds the size permitted by allocator rules.

Summary:

  • Poor retrievability rates: SPs frequently fell short of the 75% retrievability standard, with failure rates reaching 88%.
  • Duplicate storage: Most of the data has been stored on the network before, or is stored there now.
  • Missing data preparation details: Applications lacked critical information on how the data was prepared and how community end users can make use of it.
  • Questions around SP selection: There is noticeable SP overlap across applications.
  • Weak data-size justification.
  • Incomplete applications: Many submissions had missing or vague responses.

@filecoin-watchdog filecoin-watchdog added Awaiting Response from Allocator If there is a question that was raised in the issue that requires comment before moving forward. and removed Awaiting Community/Watchdog Comment DataCap Refresh requests awaiting a public verification of the metrics outlined in Allocator App. labels Feb 26, 2025

Marshall-btc commented Feb 28, 2025

Improvements in response to @filecoin-watchdog's six audit observations
First of all, thank you @filecoin-watchdog for your objective and fair evaluation. As the allocator operator of the Filecoin network, we have always been open and transparent, and are committed to continuously improving the quality of our audits and operations. Every review is an opportunity for us to improve and help us better serve the Filecoin ecosystem.
In response to the six issues raised by @filecoin-watchdog , we have developed the following improvement measures:

1. Poor Retrievability Rates

  • Problem Description: Some storage providers (SPs) fail to meet the 75% retrieval rate standard.
  • Analysis: Marshall currently works with 5 clients involving 75 SPs around the world. Because the technical capabilities of these SPs differ greatly, their retrieval rates vary. While most SPs do not yet meet the 75% standard, retrieval rates are generally in the 40%-70% range (screenshots below), which we consider acceptable.
  • Improvement Steps:
    • Continuously request SPs to improve the retrieval rate, and seek technical support from reliable SPs if necessary.
    • Regularly monitor the performance of SPs in terms of retrieval rate and communicate with under-performing SPs.

Image

Image

Image

2. Data Previously Stored on the Network

  • Problem description: Some data may have already been stored on the Filecoin network.
  • Analysis: During the KYC process we ask clients whether the data has been stored before. So far, all clients have indicated that the data is either being stored for the first time or has been stored before but remains valuable. On the principle of extending trust, we usually approve storage requests as long as the client confirms that the SPs they are currently working with are storing the data for the first time.
  • Improvement Steps:
    • Strengthen KYC audits of data by asking clients to provide more detailed descriptions of storage history.
    • For duplicate data storage, we require clients to clearly state the significance of the data to the Filecoin network, otherwise the application will not be approved.

Image

Image

3. Missing Data Preparation Details

  • Problem description: The lack of detailed information in the application about the data preparation process makes it difficult for community users to understand how the data will be used.
  • Improvement Steps:
    • For new clients, ask for a detailed description of the data preparation process, including the source of the data, its format, and how it was processed.
    • Encourage clients to share use cases of the data so that community users can better understand the value of the data.

4. Issues Around SP Selection

  • Problem Description: There is significant overlap in the SPs used in different applications.
  • Analysis: Marshall currently works with about 75 SPs. We attended the Web3 event in Thailand last November, where we connected with many customers and SPs that have data storage needs and helped broker some pledged-coin partnerships among SPs.

Image

5. Insufficient justification of data size

  • Problem description: Some applications do not adequately justify the size of the data.
  • Improvement:
    • Require clients to provide more data samples or data-size credentials (some applications already do this, e.g., via screenshots) to verify the reasonableness of their requests.
    • For large-scale data applications, use sampling to ensure the authenticity and reasonableness of the data.

Image

Image

Image

6. Incomplete application information

  • Problem description: Some applications have missing information or vague answers.
  • Improvement measures:
    • For old customers, immediately send an audit request and ask them to supplement the missing information.

Image

Image

Image

Image

  • For new customers, strengthen the audit process to ensure the completeness and accuracy of application information.


Marshall-btc commented Feb 28, 2025

Explanation of discrepancies in the application
In the course of our operations, we have noticed that some applications have differing problems (not every application has the same issues); specific descriptions and improvements follow:
1. Dataset Size Does Not Match Requested Total DataCap Issue

  • Problem description: In some applications, the dataset size does not exactly match the requested DataCap.
  • Analysis: The community discussed this issue fully and reached a consensus that the requested DataCap should cover 100% of the data actually to be stored.
  • Improvement Measures:
    • For old applications that have been completed or are in progress, we will make adjustments according to the actual situation.
    • For new applications, we will focus on the consistency between the dataset size and the requested DataCap to ensure that the data storage fully meets the community standard.

2. DataCap limit per client issue

  • Problem Description: There is a misunderstanding about the DataCap limit per client.
  • Clarification Note: The exact expression should be “5 PiB per client per round” instead of “5 PiB in total”.
  • Improvement measures:
    • In future applications and communications, we will clearly express the limit per round to avoid misunderstandings.
    • Provide clearer application guidelines to help customers understand the DataCap allocation rules.


Marshall-btc commented Feb 28, 2025

Marshall Allocator's Operational Highlights
Although we have encountered some problems in the course of our operations, we have always maintained an open and transparent attitude and made positive improvements. At the same time, we have achieved some positive results in the following areas:
1. Rigorous KYC and user identification due diligence

  • We ensure the authenticity of our clients' identities and the legitimacy of their data through our rigorous KYC process.
  • Provide relevant evidence of our user identification and due diligence efforts.

Image

Image

2. Reasonable limitations on DataCap approvals based on actual client cases

  • We reasonably allocate DataCap according to clients' actual needs and storage capacity to avoid resource wastage.
  • Provide relevant cases to demonstrate our flexibility and rationality in DataCap allocation.

Image

Image

3. Monitor DataCap usage throughout the process and ask for an explanation from the client.

  • We monitor the use of DataCap throughout the process to ensure that it is appropriate for the purpose for which it was applied.
  • When problems are detected, we promptly communicate with the client and ask for a detailed explanation and an improvement plan.

Image

Image

Image

4. Random sampling of SPs locations

  • We conduct random sampling of the geographic location of SPs to ensure diversity and reliability of data storage.
  • Provide relevant evidence of our efforts in the selection and management of SPs.

Image

Image


Marshall-btc commented Feb 28, 2025

Summary
We have always operated Marshall Allocator in an open, transparent and accountable manner, responding positively to feedback and suggestions from the community. Despite some challenges, we believe that through continuous improvement and close collaboration with the community, we can create more value for the Filecoin ecosystem. We look forward to receiving 20 PiB of DataCap support and plan to add 3-5 new clients in the future, further promoting the health of the Filecoin network.
Thank you again to @filecoin-watchdog for your valuable feedback, and we will respond to the community's expectations and work together to build a stronger Filecoin ecosystem.
