A lot has been, and continues to be, said about the recent “outage” witnessed on the Avalanche network.
While it’s clear that this was a cross-subnet communication problem, and that no real outage occurred, there’s quite a lot of misunderstanding (and probably intentional misrepresentation of facts), particularly with regard to the consensus protocol and the claimed throughputs.
In anticipation of the team’s detailed report outlining the root of the problems, I thought I’d list a few other key points of discussion I’ve come across that might be worth addressing in the report.
Was the fix a hard fork, a soft fork, a re-org, or none of these?
Hypothetically, if a safety condition were violated, or some other unforeseen issue were to arise, would a hard fork be at all possible on Avalanche? [ref]
Under what failure conditions, if any, would validators need to be “unbonded” and the network rebooted? [ref]
Why was the claim made (in the quick post-mortem that was published) that a rollback was infeasible? [ref]
- For reference: “At first thought, the easiest way to fix a problem like this may seem to be to rewrite the blockchain and undo accepted transactions. But because no single entity is in control of the Avalanche network or controls a sufficient percentage of the nodes to do so, such an approach was actually infeasible. This is a good thing, as it shows that the Avalanche network is truly decentralized.”
How would you describe the resulting ask to the community to upgrade? Was it “social coordination”? Couldn’t this approach have been used to roll back the invalid states as well?