    📈

    SnapDeals: Implementation breakdown and timeline2 - (approved 1.0 version)

    This is a short outline of the features and changes needed to make CC sector upgrades a reality.

    Intro:

    Since 90+% of sectors in the Filecoin Network are CC (committed capacity) sectors, a protocol that allows CC sectors to be updated to store real data without incurring a full re-sealing would massively increase the amount of real data stored on our network.

    Relevant docs/links:

    • https://github.com/filecoin-project/FIPs/issues/131 - FIP Issue
    • https://docs.google.com/document/d/1lerNbgVMBBEDOHC4dqexiS-IKUDOBOy_eTu7fjyv5gs/edit# - Lightweight Sector Updates in Filecoin

    Implementation breakdown, per component:

    Actors:

    • Finalizing the construction
      • One message
    • Actor update to v6
    • DeclareUpdate implementation
      • Looks like we won't need this message
    • ProveCommitReplicaUpdate implementation
    • Remove old CC sector upgrade path
    • Actors Tests
    • State migration

    Proofs:

    • New: Sector upgrade circuit
      • Circuit design
      • Audit
      • Implementation
      • Phase 2 Trusted Setup
    • Encoding/decoding function
    • Integration into the system
    • New API endpoints
      • EncodeNewDeals (encode a set of new deals into given CC replica)
      • VerifyReplicateUpdate
      • ComputeReplicaUpdateProof
    • FFI Integration
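    To make the three proposed endpoints more concrete, and to give Lotus integration something to code against before the real proofs exist, a mocked version could look roughly like the sketch below. Everything in it is an assumption made for illustration: the Python signatures, the `ReplicaUpdate` and `MockSectorUpdateProofs` names, the byte-wise XOR "encoding" and the hash-based "proof" are placeholders, not the actual proofs API or the SnapDeals construction.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicaUpdate:
    """Hypothetical result of encoding deal data into a CC replica."""
    new_replica: bytes
    new_comm_r: bytes  # stand-in commitment: SHA-256 of the new replica


class MockSectorUpdateProofs:
    """Mock of the three endpoints listed above (NOT the real proofs API)."""

    def encode_new_deals(self, cc_replica: bytes, deal_data: bytes) -> ReplicaUpdate:
        """Mock EncodeNewDeals: combine deal data with the CC replica."""
        if len(deal_data) != len(cc_replica):
            raise ValueError("deal data must be padded to the replica size")
        # Placeholder encoding: byte-wise XOR, chosen only because it is invertible.
        new_replica = bytes(r ^ d for r, d in zip(cc_replica, deal_data))
        return ReplicaUpdate(new_replica, hashlib.sha256(new_replica).digest())

    def decode(self, cc_replica: bytes, new_replica: bytes) -> bytes:
        """Inverse of the placeholder encoding: recover the deal data."""
        return bytes(r ^ n for r, n in zip(cc_replica, new_replica))

    def compute_replica_update_proof(self, old_comm_r: bytes, update: ReplicaUpdate) -> bytes:
        """Mock ComputeReplicaUpdateProof: a hash binding old and new commitments."""
        return hashlib.sha256(b"mock-proof" + old_comm_r + update.new_comm_r).digest()

    def verify_replica_update(self, old_comm_r: bytes, new_comm_r: bytes, proof: bytes) -> bool:
        """Mock VerifyReplicateUpdate: recompute the mock proof and compare."""
        expected = hashlib.sha256(b"mock-proof" + old_comm_r + new_comm_r).digest()
        return proof == expected


if __name__ == "__main__":
    api = MockSectorUpdateProofs()
    cc_replica = bytes(32)            # toy "sector" of 32 zero bytes
    deal_data = bytes(range(32))      # toy deal payload
    update = api.encode_new_deals(cc_replica, deal_data)
    old_comm_r = hashlib.sha256(cc_replica).digest()
    proof = api.compute_replica_update_proof(old_comm_r, update)
    print(api.verify_replica_update(old_comm_r, update.new_comm_r, proof))
```

    A mock of this kind only fixes the shape of the calls; the real encoding, proof format and verification cost would come from the circuit work.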

    Lotus:

    • Change: Lightweight CC upgrade pipeline
    • FFI integration
    • Actor and State upgrade integration
    • Remove old CC upgrade feature from Lotus
    • Testing

    Audit:

    • We need an internal audit. It can start right after the actors implementation and can be done by @Nicola and @Friedel Ziegelmayer.
    • We need an external audit - we have a list of vendors for this. It is optional for this implementation.
    • We need a circuits audit. We worked with J.P. on this before and will do so again.

    Trusted setup:

    • Participants will be selected from the groups we used for the previous setup. They need to be lined up and given guidance before we actually start running the setup. This would mostly fall under the TPM scope of work and is estimated at 1 week; we should start this process 3 weeks before we start running the trusted setup.
    • Running the trusted setup - 4 weeks would be needed to complete this process.
    • An open question, which can only be answered once we implement the circuits: how long will each participant's trusted setup run take?
    • We might be able to parallelize part of this, since we can break validators into groups: 64 GB and 32 GB test groups.

    Implementation Roadmap items

    | Q3 Plan | How much time it will take - estimation | Man-days estimation | DRI: Who owns this line of work and deliverables | Comments/Flags | Current status |
    | --- | --- | --- | --- | --- | --- |
    | Proof writing | Aug 25th - October 15th | ~ 40 days | @Friedel Ziegelmayer @ | We have 2 big unknowns: 1. Integration of the new circuits into the system; we don't know how that will impact things. 2. Circuit design changes are always tricky. | In progress |
    | Actors updates | August 11th - August 24th | ~ 11 days | @ToBeRemoved | We can start the internal audit of the implementation as soon as we have the actors implementation. | Backlog |
    | Lotus proofs integration | August 26th - Sept 3rd | ~ 5 days | @ | We can start earlier by mocking and opening the APIs; we don't need to wait for the entire proofs implementation to be done. | Backlog |
    | Lotus actors integration | Sept 5th - Sept 13th | ~ 7 days | @ToBeRemoved @ | | Backlog |
    | Lotus changes to the storage subsystem | Sept 14th - Sept 24th | ~ 5 days | @Magik6k we need to help with this | It will take 5-7 days for @ or @ToBeRemoved to do it. Would be a great knowledge-sharing opportunity. | Backlog |
    | Write tests and validate the implementation - Actors | September 24th - October 1st | ~ 5 days | @ToBeRemoved | | Backlog |
    | Write tests and validate the implementation - Lotus | October 1st - October 15th | ~ 10 days | @ | | Backlog |
    | Gas estimation for Proof verification | October 15th - October 21st | ~ 4 days | @ | | Backlog |
    | Audit (internal + external) | October 10th - November 10th | ~ 20 days | @Friedel Ziegelmayer + @Nicola for the internal audit, @ to TPM the circuits audit with external vendors (J.P.) | We can start the audit as soon as the proofs writing is done. | Ongoing discussion |
    | Organize participants for trusted setup | October 29th - November 7th | ~ 7 days | @ @jennijuju | | Backlog |
    | Trusted Setup | November 10th - December 17th | ~ 25 days | @ + development team representative | We would need new parameter files, which will generate new SNARKs. We may be able to start the circuits trusted setup ahead of the planned timeline. That would need approval from all sides: we freeze the circuits and start the setup on that part. | Ongoing discussion |
    | Testnet validations, hot fixes and delivery of final Release candidate | December 20th - January 15th 2022 | ~ 20 days | @, @ToBeRemoved | | Backlog |
    | Network upgrade + support | January 15th 2022 - 5th of February 2022 | ~ 15 days | @jennijuju @ | We need to make good on our promises to exchanges and miners: 2 weeks of notice period. | Backlog |

    SUM: 167 Man-Days
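
    The total can be cross-checked by tallying the per-row man-day estimates. The sketch below copies the figures from the roadmap items above. As listed they add up to 174 rather than 167, so either the reported sum or one of the row estimates may be out of date.

```python
# Man-day estimates per roadmap item, copied from the table above.
estimates = {
    "Proof writing": 40,
    "Actors updates": 11,
    "Lotus proofs integration": 5,
    "Lotus actors integration": 7,
    "Lotus changes to the storage subsystem": 5,
    "Write tests and validate the implementation - Actors": 5,
    "Write tests and validate the implementation - Lotus": 10,
    "Gas estimation for Proof verification": 4,
    "Audit (internal + external)": 20,
    "Organize participants for trusted setup": 7,
    "Trusted Setup": 25,
    "Testnet validations, hot fixes and delivery of final Release candidate": 20,
    "Network upgrade + support": 15,
}

total = sum(estimates.values())
print(f"Total: {total} man-days")
```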

    • Timeline snapshot before changes made in Madeira - 2021-11-04 - @ :
    image

    Comments:

    2021-08-05 @Steve (biglep) thoughts:

    Thanks a lot for putting this together! A few comments:

    1. What options do we have for pulling this in tighter?

    @ from my perspective, options/comments are:

    • @ or @ToBeRemoved to start work on the Lotus + actors changes, since that work is not connected to the proofs writing, try to finish it ASAP, and then jump over to the proofs side to help Nemo with the proofs implementation
    • I see no way to do the internal audit + circuits audit and the trusted setup any faster, based on the feedback I got from Friedel and Deep K.
    • The RC timeline is intentionally extended by a week, since we expect a lot of "delivery-critical" people to be on holiday
    1. It's useful seeing the breakout of tasks - great. Did we estimate those individually to roll up into larger buckets?
    2. How are we accounting for if we have multiple engineers on an item, or are all things single-threaded?
    3. Are we applying any scaling factor to some of the dev estimates? If so, what is it?
    4. Can we break things down a little more where there are dependencies/blocking? For example, we say "We can start an internal audit of the implementation as soon as we have the actors implementation". Are we modeling this? Another example is that it seems like we might need a task for "create mocked proofs" which then unblocks Lotus work.
      1. To make the modeling clearer, we can link related items (link rows in the table to themselves). Here's an example: https://www.notion.so/protocollabs/biglep-af9cf9adf6b5486788faddf40d3026d0
    5. Do we need to do any further design work earlier on?
      1. @: Luka and I are planning some circuit design work for next week (to speed up the proofs implementation). Some aspects of the Lotus + proofs integration need a bit more design, but that is factored in.
    6. Is documentation accounted for?
      1. @: Yes, but it depends on how we integrate this capability. My plan was to allow miners (via a config) to seal deals into either new sectors or CC sectors. In future we can enable per-deal decision making, but IMO the config would be enough for v1.
      2. In general, if lightweight upgrades work as well as they seem to, in future PoRep designs we may consider not allowing data to be added at the time of sealing.
    7. Can we describe a bit more the automated testing story? I assume we need the ability to run a testnet and go from CC sector to the upgraded sector (asserting the happy path and various failure scenarios). Do we have the ability to write a test like that today? If not, is there some integration test frame additions we need to account for?
      1. (Indent) with the integration test refactor that has landed, we possess the capability to test this
    8. Right now we have a table of tasks above and then a timeline view of those tasks (copy/pasted) I think. It seems like there's a margin for error with the copy/pasting and ensuring that the dates actually account for the amount of estimated time. Your call, but one idea to tighten this up may be to:
      1. Have separate columns for start/end date and a derived column for the difference between the two to ensure that aligns with the actual estimates
      2. Use views on one database: have a table view and a timeline view. This avoids having two databases. Example: https://www.notion.so/protocollabs/biglep-af9cf9adf6b5486788faddf40d3026d0
      3. image
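
    The derived-column idea in point 8 can be sketched outside Notion as well. The snippet below is a minimal illustration: it assumes that a "man-day" maps to one Monday-Friday working day for a single engineer and that the dates fall in 2021, and the `working_days` and `rows_shorter_than_estimate` helpers are hypothetical names. It uses three rows from the table above as sample input.

```python
from datetime import date, timedelta


def working_days(start: date, end: date) -> int:
    """Count Monday-Friday days from start to end, both inclusive."""
    span = (end - start).days + 1
    return sum(
        1 for offset in range(span)
        if (start + timedelta(days=offset)).weekday() < 5  # 0-4 = Mon-Fri
    )


def rows_shorter_than_estimate(rows):
    """Return the names of rows whose date window holds fewer working days
    than the man-day estimate (assuming one engineer per row)."""
    return [
        name for name, start, end, estimate in rows
        if working_days(start, end) < estimate
    ]


# (item, start date, end date, estimated man-days); sample rows from the table.
rows = [
    ("Actors updates", date(2021, 8, 11), date(2021, 8, 24), 11),
    ("Lotus proofs integration", date(2021, 8, 26), date(2021, 9, 3), 5),
    ("Gas estimation for Proof verification", date(2021, 10, 15), date(2021, 10, 21), 4),
]

for name, start, end, estimate in rows:
    print(f"{name}: {working_days(start, end)} working days vs ~{estimate} estimated")

print("Windows shorter than estimate:", rows_shorter_than_estimate(rows))
```

    A check like this only makes sense for single-threaded rows. Rows with several engineers would need the window multiplied by the head count before comparing.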

    CryptoNet is a Protocol Labs initiative.