The Billion User Social Graph
We don't have to live with social network lock-in. The tools for fixing this are widely & freely available.
With Elon Musk’s recent takeover of Twitter, the chatter about migrating away from the big social networks and onto independent or open alternatives has moved from the right side of my feed to the left side. But all of those who are new to the fantasy of joining a thriving Twitter ex-pat community in some greener lands will soon run into the problem the right has been struggling with since the big, cross-platform post-J6 social media purge: network lock-in is real.
Yeah, you can theorize and strategize about coordination problems, preference cascades, signaling, and other game-theory-esque concepts — I don’t deny these are all useful ways of understanding the problem — but all you really need to know to understand the powerful hold Twitter and Facebook have on hundreds of millions of us is a simple heuristic from the dawn of the network age:
Metcalfe's law states that the value of a telecommunications network is proportional to the square of the number of connected users of the system (n²). First formulated in this form by George Gilder in 1993, and attributed to Robert Metcalfe in regard to Ethernet, Metcalfe's law was originally presented, c. 1980, not in terms of users, but rather of "compatible communicating devices" (e.g., fax machines, telephones). Only later with the globalization of the Internet did this law carry over to users and networks as its original intent was to describe Ethernet connections.
It will always be nearly impossible to get people to abandon a large, dense network graph in favor of a small, sparse one, for the sole reason that the former is valuable and the latter is not.
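To put rough numbers on that heuristic, here is a small Python sketch that compares two networks under Metcalfe's law. The user counts are made up purely for illustration, and the proportionality constant is assumed to be 1, so only the ratio means anything.

```python
def metcalfe_value(users: int) -> int:
    """Relative network value under Metcalfe's law: proportional to n squared."""
    return users * users


# Hypothetical network sizes, chosen only for illustration.
incumbent_users = 100_000_000
upstart_users = 1_000_000

user_ratio = incumbent_users // upstart_users
value_ratio = metcalfe_value(incumbent_users) // metcalfe_value(upstart_users)

print(f"{user_ratio}x the users, {value_ratio:,}x the value")
```

Under these assumed numbers, a hundredfold gap in users becomes a ten-thousandfold gap in value, which is the lock-in problem in one line.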
Weirdly enough, though, web3 fixes this. Or, at least, it could fix it if we use a few simple smart contracts to turn the blockchain from a massive users table into a massive social graph.
Rationale and previous work
The blockchain can and does function as a massive, shared users table that’s open, public, and not controlled by any one entity. As I wrote in The Billion User Table:
Here's what's coming: the public blockchain amounts to a single, massive users table for the entire Internet, and the next wave of distributed applications will be built on top of it…
In place of a decentralized network of user data silos connected by APIs, there’s a single decentralized user data store accessible via an open protocol and a decentralized network of storage nodes. So the identity-hosting blockchain represents decentralization at the datastore implementation layer, and recentralization at the datastore access layer…
Imagine that LinkedIn, Reddit, and Github all port their users tables (along with much of their proprietary data, like endorsements, karma, and activity history) to BitClout. Immediately, here's what happens: every Github user is also a Reddit user and a LinkedIn user and a BitClout user. Likewise, every Reddit user is also a Github user and a LinkedIn user and a BitClout user. I could go on, but you get the point.
Every company that builds on the same virtual users table has immediate access to the network effects of every other startup on that table. Every time an on-chain company onboards a new user, then your service has a new user, as well. (In a manner of speaking. They may not be actively using your service yet, but they effectively have an account on it.)
That earlier article uses Bitclout (the chain from that project is now known as DeSo) as the paradigmatic example of a blockchain that can support this kind of use case. But as excited as I was about the whole DeSo thing, it has not worked out so well.
This isn’t the place for a Bitclout/DeSo post-mortem, but it is instructive to flag one aspect of that blockchain because it matters for the present discussion. Bitclout was an effort to put an entire social network on-chain, where each post was written to the chain as an object that could accrue revenue (via Bitclout diamonds). This was clever, but any blockchain that tries to host actual content is going to see its data needs grow non-linearly with the number of users and connections.
The Bitclout team was very familiar with this unbounded data growth problem and spent a lot of real engineering effort on solving it. But in hindsight, I actually think they tried to do too many things at once. They should have just focused on the problem of social graph portability.
To frame this in the same database terms I used in my earlier article, Bitclout tried to put all of the following tables on-chain (plus a few more that were Bitclout-specific):
users
user_follows_user
posts
user_likes_post
Those last two tables were always going to blow up and get unwieldy in any kind of upside scenario. It couldn’t be otherwise.
So I think a better approach is to take an existing blockchain, which is essentially already that first table (i.e., users), and add a user_follows_user join table to it. (We could also expand with joins for other types of relationships, like user_mutes_user, but let’s keep it simple for the moment.)
This users-to-users join table will also grow non-linearly with the number of users, but the growth will be slower, and more importantly, the amount of additional data needed in order to represent it (= the amount of additional blockspace consumed by it) will be far lower than with a posts table.
I suggest this because the user and follower relationships make up the primary source of lock-in for every large social networking platform. If your entire Twitter or Facebook social graph were open and readily available to other social platforms that want to host posts and other, more data-intensive parts of the social networking experience, then these platforms would have essentially zero lock-in.
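In plain database terms, the two tables I'm talking about can be sketched like this. This is an off-chain illustration using Python's built-in sqlite3 module, not a description of how any chain stores data; the ENS-style names and the followers_of helper are made up for the example.

```python
import sqlite3

# In-memory database standing in for the shared, public data store.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (address TEXT PRIMARY KEY);
    CREATE TABLE user_follows_user (
        follower TEXT NOT NULL REFERENCES users(address),
        followed TEXT NOT NULL REFERENCES users(address),
        PRIMARY KEY (follower, followed)
    );
    """
)

# Hypothetical accounts and follow relationships.
conn.executemany(
    "INSERT INTO users VALUES (?)",
    [("alice.eth",), ("bob.eth",), ("carol.eth",)],
)
conn.executemany(
    "INSERT INTO user_follows_user VALUES (?, ?)",
    [("alice.eth", "bob.eth"), ("carol.eth", "bob.eth")],
)


def followers_of(address: str) -> list[str]:
    """Return everyone who follows the given address, sorted by name."""
    rows = conn.execute(
        "SELECT follower FROM user_follows_user WHERE followed = ? ORDER BY follower",
        (address,),
    )
    return [row[0] for row in rows]


print(followers_of("bob.eth"))
```

Each follow is a single small row holding two addresses, which is why this table stays so much lighter than a posts table would.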
How an on-chain social graph might look
Imagine, for a moment, that my entire Twitter graph is represented on-chain — both the actual accounts and the follower relationships. To view Twitter posts (and associated likes, retweets, quote-tweets, etc.) from that graph, I need to connect to Twitter.com with my wallet. But let’s say I want to jump over to tribel.com, or gab.com, or some other social platform with its own particular slant and moderation policies — if they can read my social graph from the blockchain, then I can connect my wallet there and see the same connections, and see any posts they make on this other site.
This may not sound like much at first, but consider the fact that if I follow someone new on Tribel, then I’m also now following that person on Twitter and on Gab — and on every other social platform that uses that same on-chain graph for users and relationships. Unfollows and blocks would work the same way — do it once in one location, and the changes to your graph are instantly reflected everywhere.
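The "follow once, followed everywhere" behavior can be sketched in a few lines of Python. Everything here is a toy stand-in: SharedGraph plays the role of the on-chain graph, Platform is a hypothetical social site that hosts its own posts, and the site and account names are invented.

```python
class SharedGraph:
    """Stand-in for the shared, on-chain follow graph."""

    def __init__(self) -> None:
        self._edges: set[tuple[str, str]] = set()

    def follow(self, follower: str, followed: str) -> None:
        self._edges.add((follower, followed))

    def unfollow(self, follower: str, followed: str) -> None:
        self._edges.discard((follower, followed))

    def following(self, follower: str) -> set[str]:
        return {dst for src, dst in self._edges if src == follower}


class Platform:
    """A social site that hosts its own posts but reads the shared graph."""

    def __init__(self, name: str, graph: SharedGraph) -> None:
        self.name = name
        self.graph = graph
        self.posts: list[tuple[str, str]] = []  # (author, text)

    def post(self, author: str, text: str) -> None:
        self.posts.append((author, text))

    def feed(self, viewer: str) -> list[str]:
        """Posts hosted on this platform from accounts the viewer follows."""
        followed = self.graph.following(viewer)
        return [text for author, text in self.posts if author in followed]


graph = SharedGraph()
site_a = Platform("site-a", graph)
site_b = Platform("site-b", graph)
site_b.post("bob.eth", "hello from site B")

# The follow happens once, say while using site A...
graph.follow("alice.eth", "bob.eth")
# ...and site B picks it up because it reads the same graph.
print(site_b.feed("alice.eth"))
```

The platforms own their posts and nothing else; the graph they filter by belongs to nobody in particular.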
Now, those of you who are gaming this out as you’re reading have already realized what would inevitably happen in a world that works as described above: someone would make an omni-client that would let you read and post from any or all of these networks via one interface. Then there would be no point in having separate services, and they’d all go out of business… or would they?
A preview of things to come: phone numbers + contacts + messaging apps
The world I’m describing already exists in a kind of prototype state, in the form of competing messaging protocols that are all tied to your phone number and populate themselves from your contacts database. The phone number system is a prototype of the billion-user table, and the distributed contacts apps that all can read and write the standard vcard format constitute the relationship graph that’s built on top of this table.
There’s a multitude of messaging protocols that piggyback on this phone number + contacts combination, and the results are kind of like what I’m describing here for social networks. For instance, when you first log into Telegram, it scans your contacts and then you instantly have your existing network in this new app.
The result is that you can choose to exchange messages with the same phone numbers via Signal, Telegram, WhatsApp, iMessage, or legacy SMS — it’s all in which message protocol you and the rest of your network want to use.
There’s also an eternal cycle of decentralization and recentralization of messaging apps that has been going on since the days of ICQ and is still happening in the era of WhatsApp/Signal/Telegram/Facebook/etc. You can find any number of all-in-one messaging clients that support many of these platforms in one window.
None of these messaging apps are harmed by the fact that they all draw their identity from the same, open phone number system and interoperable ecosystem of contacts apps and services — they all coexist and bring different things to the table, and many of us switch between them to talk to different subgraphs in our contacts that have different needs and preferences. I’d expect this dynamic to persist if we moved the social graph on-chain.
A word about composability and social relationships
Different platforms have different types of social connections users can have with each other. Facebook has friend, follow, and block. Twitter has follow, mute, and block. These are fine for these platforms, but we can improve them to make them more suited to the blockchain by making them more composable.
Composability is a computer science term that means, roughly, you can mix and match these small, discrete, clearly defined tools in order to get different effects and functions.
Consider the Facebook “friend” — it’s its own type of connection, but it also implies a “follow,” because when you friend someone you automatically follow them. On Twitter, a “block” implies a “mute,” because when you block someone you’re essentially muting them and also preventing them from seeing your posts.
For my own two social graph proposals, below, I’d like to suggest the following cleaner, more composable set of social graph relationships:
Follow: You can read the posts of someone you’ve followed.
Mute: You cannot read the posts of someone you’ve muted.
Ghost: Someone you’ve ghosted cannot read your posts.
Under this scheme, a block is a mute plus a ghost, so it’s two operations on the same target address, composed together (i.e., if I wanted to block annoyinghater.eth, I’d mute that address and ghost it).
If I want to see someone’s posts but prevent them from seeing my own posts, I could follow plus ghost them. Or, I could follow plus mute a frenemy if I want to retain the option of reading them by navigating to their feed or by periodically unmuting them.
I tried to clean up the relationships this way because it makes it easier to reason about the contracts and relationships in the next sections.
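Here's one way to picture that composition in code, as a rough Python sketch using bit flags. The Relation type and the two helper functions are my own illustration and not part of either proposal; I'm also assuming that a mute hides someone from your feed even if you follow them, which is how the frenemy example reads.

```python
from enum import Flag


class Relation(Flag):
    """The three primitive relationships I can declare toward another account."""

    FOLLOW = 1
    MUTE = 2
    GHOST = 4


# Composite relationships are just combinations of the primitives.
BLOCK = Relation.MUTE | Relation.GHOST
FRENEMY = Relation.FOLLOW | Relation.MUTE


def shows_in_my_feed(mine: Relation) -> bool:
    """Their posts land in my feed only if I follow them and haven't muted them."""
    return Relation.FOLLOW in mine and Relation.MUTE not in mine


def they_can_read_me(mine: Relation) -> bool:
    """They can read my posts unless I've ghosted them."""
    return Relation.GHOST not in mine


print(shows_in_my_feed(BLOCK), they_can_read_me(BLOCK))
print(shows_in_my_feed(FRENEMY), they_can_read_me(FRENEMY))
```

Because each primitive answers exactly one question, a client only needs these two checks no matter how the relationships are combined.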
Some background for my two proposals
In the remainder of this article, I present two proposals for layering a social graph onto the billion-user table.
The first, On-Chain Graph (OCG), is more open and simple, but it’s also more expensive in terms of fees, so some will like it and some won’t.
The second, Chain-Linked Graph (CLG), is more complex but cheaper and offers more control and privacy, so I expect most will prefer it. Platforms could, however, support both.
To really understand these two proposals, you’ll need some basic familiarity with the following concepts:
Non-fungible tokens (NFTs) and non-fungible, non-transferable tokens (NTFTs, also called soul-bound tokens).
Ethereum Name Service
Smart contracts
It will also help to know a little bit of Solidity, Ethereum’s smart contract programming language. If you’re fuzzy on one or all of the above, I’ve tried to write this in a way that you should still be able to grasp the basics.
For both of these proposals, I assume we’re using ENS as the root of identity, and adding new address records to it that contain the addresses of some pretty standard-looking ERC721 NFT contracts that represent each of the three types of social relationships I outlined above (follow, mute, ghost). What these three contracts do is very different from one proposal to the other, but the basic idea of putting their addresses into three special ENS address records stays the same.
I also would like to propose an additional ENS record for a social profile data URI, so you can update your social data without burning gas. A proposed profileURI record would link to a JSON object hosted on some third-party platform that looks something like this:
curl https://jonstokes.com/jons-profile.json \
  -H "Accept: application/json"

{
  "name": "jonstokes.(eth|com)",
  "bio": "Writer. Coder. Doomer Techno-Optimist. Cryptography Brother.",
  "website": "https://jonstokes.com/",
  "location": "Austin, TX"
}
Some of the stuff in the profile JSON is redundant with existing ENS fields, but that’s ok; the point of this is to give social platforms something to display and to give users the ability to make changes to their social profile without spending gas to update ENS records.
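On the platform side, consuming that profile could be as simple as the following Python sketch. The parse_profile helper and its fall-back-to-empty-string behavior are assumptions of mine; the four field names come from the sample above, and the JSON is passed in as a string so the example doesn't depend on any particular host.

```python
import json

# Field names taken from the sample profile; anything else is ignored.
PROFILE_FIELDS = ("name", "bio", "website", "location")


def parse_profile(raw: str) -> dict[str, str]:
    """Parse a profile document, tolerating missing or extra fields."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("profile must be a JSON object")
    return {field: str(data.get(field, "")) for field in PROFILE_FIELDS}


sample = '{"name": "jonstokes.(eth|com)", "location": "Austin, TX", "extra": 1}'
profile = parse_profile(sample)
print(profile["name"], "/", profile["location"])
```

Being lenient about missing fields matters here, since nothing on-chain would enforce what a user puts at the other end of the URI.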
Proposal 1: On-Chain Graph
The On-Chain Graph idea uses NTFTs to represent the three types of relationships described above. The same wallet that holds the ENS NFT should also own the following three social contracts, and the three corresponding ENS address records should point to them:
OCG Follower: When you mint an NTFT from my OCG Follower contract to your wallet, then you follow me. Either of us can burn that NFT to cause you to unfollow me.
OCG Ghosted: When I airdrop an NTFT from my OCG Ghosted contract to your wallet, I’ve ghosted you. I alone can burn this NTFT to unghost you.
OCG Muted: When I airdrop an NTFT from my OCG Muted contract to your wallet, I’ve muted you. I alone can burn this NTFT to unmute you.
The semantics of all three of these are essentially, “You are X in relation to me, the contract owner,” where “X” is one of follower, ghosted, or muted.
Here’s a sample follower contract that shows what this might look like:
// SPDX-License-Identifier: MIT
// Untested sketch. Assumes OpenZeppelin Contracts 4.x before 4.8; from 4.8 on,
// the _beforeTokenTransfer hook takes an extra batchSize argument.
pragma solidity ^0.8.4;

import "@openzeppelin/contracts/token/ERC721/ERC721.sol";
import "@openzeppelin/contracts/token/ERC721/extensions/ERC721Enumerable.sol";
import "@openzeppelin/contracts/security/Pausable.sol";
import "@openzeppelin/contracts/access/Ownable.sol";
import "@openzeppelin/contracts/token/ERC721/extensions/ERC721Burnable.sol";
import "@openzeppelin/contracts/utils/Counters.sol";

contract OCGFollower is ERC721, ERC721Enumerable, Pausable, Ownable, ERC721Burnable {
    using Counters for Counters.Counter;

    Counters.Counter private _tokenIdCounter;

    constructor() ERC721("OCGFollower", "OCGF") {}

    function _baseURI() internal pure override returns (string memory) {
        return "https://jonstokes.com/ocg/follower";
    }

    // Self-reports which relationship this contract represents.
    function relationship() public pure returns (string memory) {
        return "ocg follower";
    }

    function pause() public onlyOwner {
        _pause();
    }

    function unpause() public onlyOwner {
        _unpause();
    }

    // Pausing the contract stops new follows.
    function safeMint(address to) public whenNotPaused {
        // Prevent anyone but the owner from minting
        // a token to an address they don't own.
        require(owner() == _msgSender() || _msgSender() == to, "Unable to mint to this address");
        uint256 tokenId = _tokenIdCounter.current();
        _tokenIdCounter.increment();
        _safeMint(to, tokenId);
    }

    // Let the contract owner, as well as the token holder, burn a follow token.
    function burn(uint256 tokenId) public override {
        require(
            owner() == _msgSender() || _isApprovedOrOwner(_msgSender(), tokenId),
            "Unable to burn this token"
        );
        _burn(tokenId);
    }

    function _beforeTokenTransfer(address from, address to, uint256 tokenId)
        internal
        override(ERC721, ERC721Enumerable)
    {
        // Disable token transfers: only mints (from == 0) and burns (to == 0) pass.
        require(from == address(0) || to == address(0), "Cannot be transferred.");
        super._beforeTokenTransfer(from, to, tokenId);
    }

    // The following functions are overrides required by Solidity.
    function supportsInterface(bytes4 interfaceId)
        public
        view
        override(ERC721, ERC721Enumerable)
        returns (bool)
    {
        return super.supportsInterface(interfaceId);
    }
}
If you speak Solidity, you can see what this very simple (and untested!) contract tries to do.
First, the extensions:
ERC721Enumerable is included so the token holders can be listed out by social network clients without having to scan the entire chain.
Pausable is here because you should be able to pause minting in order to essentially lock your account for a while, i.e., stop accepting new followers.
Ownable is essential because there are some things only the contract owner should do. I didn’t see the need for the more powerful Roles feature.
ERC721Burnable is here because you’ll need to be able to burn tokens in order to remove a follow relationship. Note that the stock burn() only lets the token holder (or an approved operator) burn a token, so letting the contract owner remove a follower as well requires overriding burn() with an owner check.
Counters is included so that the tokenId is auto-incremented, which is convenient.
Now for the modifications to the output of the OpenZeppelin Wizard:
safeMint() is modified so that only the contract’s owner can mint a token to someone else’s address. Non-owners can only mint to the address they’re calling the contract with.
_beforeTokenTransfer() is overridden so that it essentially disables the ability to transfer the tokens, creating a simple soul-bound token.
relationship() is a convenience method that ensures there’s an easy way to query the contract and confirm what kind of relationship the NFT represents. I’m not married to including this, but it seems useful.
This is all really straightforward, and for the OCG Ghosted and OCG Muted variants, you’d make the following small changes:
Change the contract name and symbol.
Change the return values of relationship() and possibly _baseURI() to reflect the relationship you’re representing (i.e., “muted” or “ghosted”).
Make both safeMint() and burn() into onlyOwner functions, so that only the contract owner can call either one.
Obviously, it would be up to platforms to honor these contracts (i.e., follow, ghost, mute) in the right way. This is less threatening and precarious than it sounds, though, because if a particular social platform isn’t honoring a contract you care about, just don’t use it.
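To make "honoring these contracts" a bit more concrete, here is a rough Python sketch of the checks a platform might run after reading token holders from the three OCG contracts. The OCGAccount structure, the function names, and the sample accounts are all hypothetical; a real client would fill these sets by enumerating holders on-chain.

```python
from dataclasses import dataclass, field


@dataclass
class OCGAccount:
    """Holders of the tokens issued by one account's three OCG contracts."""

    follower_holders: set[str] = field(default_factory=set)  # they follow me
    ghosted_holders: set[str] = field(default_factory=set)   # I ghosted them
    muted_holders: set[str] = field(default_factory=set)     # I muted them


def may_read(accounts: dict[str, OCGAccount], viewer: str, author: str) -> bool:
    """A viewer may read an author's posts unless the author ghosted them."""
    return viewer not in accounts[author].ghosted_holders


def in_feed(accounts: dict[str, OCGAccount], viewer: str, author: str) -> bool:
    """An author appears in a viewer's feed if followed, not muted, not ghosting."""
    follows = viewer in accounts[author].follower_holders
    muted = author in accounts[viewer].muted_holders
    return follows and not muted and may_read(accounts, viewer, author)


accounts = {
    "me.eth": OCGAccount(
        follower_holders={"fan.eth", "hater.eth"},
        ghosted_holders={"hater.eth"},
    ),
    "fan.eth": OCGAccount(),
    "hater.eth": OCGAccount(),
}

print(in_feed(accounts, "fan.eth", "me.eth"))
print(in_feed(accounts, "hater.eth", "me.eth"))
```

Note that the follow token alone isn't enough: a ghosted follower still holds the token but gets nothing in their feed.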
Adding pay-to-follow
You could add payable to safeMint and then use a setMintRate function to set a price that people have to pay you for follows. So something along the lines of this:
uint256 public mintRate = 0.01 ether;

function setMintRate(uint256 mintRate_) public onlyOwner {
    mintRate = mintRate_;
}

function safeMint(address to) public payable whenNotPaused {
    // Require pay-to-follow
    require(msg.value >= mintRate, "Not enough ether to mint");
    // Prevent anyone but the owner from minting
    // a token to an address they don't own.
    require(owner() == _msgSender() || _msgSender() == to, "Unable to mint to this address");
    uint256 tokenId = _tokenIdCounter.current();
    _tokenIdCounter.increment();
    _safeMint(to, tokenId);
}
I’m sure there are any number of other tweaks and features I could think of to add to this proposal, but it’s best to start with something simple and easily understandable.
Proposal 2: Chain-Linked Graph
The OCG contracts described above are simple enough, but that scheme has some qualities that will probably divide a lot of people:
Everything is public and on-chain, including ghosts and mutes. You couldn’t do locked or private accounts this way, but the fix for this might be to use an alt account.
Every action costs gas, which means that you have to make real choices about who you follow, ghost, and mute. But if gas fees are high enough, then this may make the network unusable.
Pay-for-follow may or may not be a desirable feature for a network or a particular account, but you’d have the option.
Given that not everyone is going to love these qualities of this proposal, I want to propose an alternate set of social contracts that give users and platforms more granular control, especially over who can see what kinds of information, and cost less gas to use.
The basic idea behind Chain-Linked Graph (CLG): Instead of representing social relationships (follows, ghosts, mutes) directly on-chain via NFTs, we store these relationships off-chain and use on-chain tokens for discovery of and access to those relationships.
Discovery: The contract provides a listURI() function that returns a link to a JSON list of ENS names that you intend to declare some kind of social relationship to (i.e., I follow them, I mute them, or I ghost them).
Access: If the link returned by listURI() is token-gated, then the contract’s token grants the bearer read access to that link.
The graph, then, is not directly on-chain but is linked to the chain by a set of contracts and URLs.
As with OCG, each of the three kinds of social relationships is governed by a smart contract, but the semantics for CLG are different:
Follows: contains a link to a JSON list of ENS names you’re following, and tokens issued by it grant read access to that follows list.
Mutes: contains a link to a JSON list of ENS names you’re muting, and tokens issued by it grant read access to that mutes list.
Ghosts: contains a link to a JSON list of ENS names you’re ghosting, and tokens issued by it grant read access to that ghosts list.
So the semantics of the CLG token are, “Here’s read access to a list of accounts I X,” where “X” is “follow,” “mute,” or “ghost.”
You can think of what I’m proposing in this section as a close analog to the phone number + address book combo I described for messaging apps. Your phone number is (quasi-)public, and when you connect a new messaging app to it you can grant or deny that app read access to your contacts.
In my CLG social token scheme, your ENS name is public like your phone number, and you issue and revoke tokens in order to grant and deny read access to the lists of people you’re related to in some way. You can grant these tokens to random users if you want to, but mainly you’ll be granting them to social platforms so that those platforms know whose posts to show you and whose to hide (or who shouldn’t be seeing your posts).
(Write access to the lists that make your social graph would probably be token-gated by your normal ENS NFT -- if your wallet has your ENS name in it, you can make write/update/delete changes to the lists. A possible alternative would be to have a fourth social contract that grants list write access to NTFT holders so that you could outsource list management to some third party.)
Hosting these lists off-chain, while pointing to them from on-chain, gives a few advantages:
You can lock your relationships down from public viewing by using authentication on the endpoint hosting the list. Or, you can leave it open to the public so that anyone can read it.
It doesn’t cost gas to update an off-chain list.
This scheme enables the creation of a market for social graph hosting services that interoperate with social providers.
Any person or service can easily discover your lists.
Token gating and read access
The key innovation that enables the CLG contracts is token gating. The concept behind a token gate is that you can’t access a particular piece of data on a host unless you connect to that host with a wallet containing a specific access token.
You can token gate content on IPFS, for instance, so that only readers who connect to an endpoint with a particular NFT in their wallet can view a particular file.
CLG uses token gating to add some indirection to our social contracts, so that instead of a social NFT representing a particular type of relationship — a follow, mute, or ghost — it represents read access to part of your social graph.
Obviously, for token gating to work, a platform has to honor it. Presumably, if platforms are not honoring a token gate, you’ll move your relationship lists to some other platform and change your contracts, reissuing any NFTs as necessary.
Also, to be clear, some people’s lists are going to leak at some point. We live in a world of blockbuster personal data leaks, so if data is hosted somewhere then some of it is going to get out. I’ll discuss some possible mitigations in a later section.
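The token gate itself boils down to a membership check on the host, which I can sketch in Python. ListHost, its read method, and the contract address below are hypothetical; I'm also assuming the host has already verified that the caller controls the wallet (for example via a signed message) and looked up which contracts that wallet holds tokens from.

```python
from typing import Optional


class ListHost:
    """Hypothetical off-chain host for one relationship list."""

    def __init__(self, names: list[str], gate_contract: Optional[str] = None) -> None:
        self.names = names
        self.gate_contract = gate_contract  # None means the list is public

    def read(self, wallet_contracts: set[str]) -> list[str]:
        """Return the list if it is public or the wallet holds a gate token.

        wallet_contracts is the set of contract addresses the connecting
        wallet holds at least one token from.
        """
        gated = self.gate_contract is not None
        if gated and self.gate_contract not in wallet_contracts:
            raise PermissionError("wallet does not hold the required token")
        return list(self.names)


CLG_FOLLOWS = "0xFollowsContract"  # placeholder, not a real address
gated_list = ListHost(["alice.eth", "bob.eth"], gate_contract=CLG_FOLLOWS)
public_list = ListHost(["carol.eth"])

print(gated_list.read({CLG_FOLLOWS}))
print(public_list.read(set()))
```

Revoking access is then just a matter of burning the token, since the next read from that wallet fails the check.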
A sample contract: CLG Follows
The follows contract would be a standard ERC721 NTFT contract that’s very close to the contract described above for OCG:
// SPDX-License-Identifier: MIT
// Untested sketch. Assumes OpenZeppelin Contracts 4.x before 4.8; from 4.8 on,
// the _beforeTokenTransfer hook takes an extra batchSize argument.
pragma solidity ^0.8.4;

import "@openzeppelin/contracts/token/ERC721/ERC721.sol";
import "@openzeppelin/contracts/security/Pausable.sol";
import "@openzeppelin/contracts/access/Ownable.sol";
import "@openzeppelin/contracts/token/ERC721/extensions/ERC721Burnable.sol";
import "@openzeppelin/contracts/utils/Counters.sol";

contract CLGFollows is ERC721, Pausable, Ownable, ERC721Burnable {
    using Counters for Counters.Counter;

    Counters.Counter private _tokenIdCounter;

    constructor() ERC721("CLGFollows", "CLGF") {}

    function _baseURI() internal pure override returns (string memory) {
        return "https://jonstokes.com/clgfollows/";
    }

    // Link to the off-chain JSON list of ENS names this account follows.
    function listURI() public pure returns (string memory) {
        return "https://jonstokes.com/clgfollows/list";
    }

    // Self-reports which relationship this contract represents.
    function relationship() public pure returns (string memory) {
        return "clg follows";
    }

    function pause() public onlyOwner {
        _pause();
    }

    function unpause() public onlyOwner {
        _unpause();
    }

    function safeMint(address to) public onlyOwner whenNotPaused {
        uint256 tokenId = _tokenIdCounter.current();
        _tokenIdCounter.increment();
        _safeMint(to, tokenId);
    }

    function _beforeTokenTransfer(address from, address to, uint256) internal pure override {
        // Disable token transfers: only mints (from == 0) and burns (to == 0) pass.
        require(from == address(0) || to == address(0), "Cannot be transferred.");
    }
}
All the extensions are the same as OCG, except I didn’t include ERC721Enumerable because it’s not clear that anyone would want to have their CLG Follows tokens enumerated (plus it raises the gas cost of minting).
As for functions, I’ve made the following changes to the output of the OpenZeppelin wizard:
relationship(): As with OCG, this returns the type of social contract. Again, this is probably not necessary with Solidity contracts and I’ve not seen it done, but nonetheless, it feels to me like I want the contract to self-report its type. So I dunno; ignore it if this offends you.
listURI(): Returns a link to a JSON object that’s just a list of ENS names you’re following (or muting or ghosting, depending on the contract type). The expectation is that this URI would be token-gated for privacy, but it’s not required.
Mostly you’d use the CLG Follows NTFT by issuing it to an address owned by a social platform. That way, the platform could read your follows list and show you the correct posts.
But you could also issue these NTFTs to followers so that your followers could discover other followers. You’d do this by either airdropping to followers or by unpausing minting and letting anyone mint.
All the other contracts would work exactly as above, but have different names and symbols, and return different values from relationship() and listURI().
Possible variations
If you’re worried about your list leaking from different services, it would be pretty straightforward to turn listURI() into something that works more like tokenURI(uint256 tokenId), i.e., the signature is listURI(uint256 tokenId) and it concatenates the tokenId to the end of a base URI so that each token holder gets its own list URL. This feature, in combination with some logic on the list’s host, would let you silo the lists so that different token holders get different subgraphs of your main graph. That way, if one platform gets owned then only that part of my graph has leaked.
As with OCG, you could make safeMint into a payable function and charge people to access your lists. See the code in the OCG section for an example of how this might look.
You might want the ability to update the URLs returned by tokenURI() and/or listURI(), in which case you’d want to store these URLs in variables, initialize them in a constructor, and provide onlyOwner setter functions for updating them. This would increase your cost to mint, but that may not matter if you’re only planning to give them out to services and not individuals.
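The first variation above, per-token list URLs with siloed subgraphs, might look something like this on the host side. This is a Python sketch under my own assumptions: the base URL, the token IDs, and the SILOS mapping are invented, and how a host decides which names each token holder may see is left open.

```python
# Hypothetical base URL; the token ID is appended, as tokenURI() does.
BASE_LIST_URI = "https://example.com/clgfollows/list/"

# The full follows list, plus the subset each token ID is allowed to see.
FULL_GRAPH = ["alice.eth", "bob.eth", "carol.eth"]
SILOS = {
    0: {"alice.eth", "bob.eth"},  # e.g. a token issued to one platform
    1: {"carol.eth"},             # e.g. a token issued to another
}


def list_uri(token_id: int) -> str:
    """Mirror of a listURI(uint256 tokenId) that concatenates the token ID."""
    return f"{BASE_LIST_URI}{token_id}"


def list_for_token(token_id: int) -> list[str]:
    """Return only the part of the graph this token holder may read."""
    allowed = SILOS.get(token_id, set())
    return [name for name in FULL_GRAPH if name in allowed]


print(list_uri(1), list_for_token(1))
```

If the holder of token 1 leaks its list, only that one subgraph gets out, and an unknown token ID gets an empty list.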
Services
Both of the proposals outlined here offer a number of places where centralized, hosted services could be a big help — even if it’s just a stop-gap while the ecosystem transitions to something distributed like IPFS.
The most obvious type of service would be for hosting anything returned by one of the URI functions — profile data, NTFT metadata, and token-gated JSON lists (in the case of CLG).
Another useful service would be a kind of specialized version of Infura that exposes on-chain social data via an API. Or, Infura could just offer a specialized API for the social data.
Finally, there could be third-party services for verification of accounts, for users and organizations who want that kind of thing.
Conclusions
I don’t know that I expect either of my on-chain social graph proposals to be adopted in the form I’ve described here. I offer these ideas more in the spirit of sparking conversation about how we could usefully transition from the current state of total platform lock-in to something more portable, where you own your graph and can easily move it around with you.
There are parts of the above that look a bit like the web5 proposal, but the key differences are that my two ideas are designed to be simpler and to take advantage of smart contracts and existing on-chain identity providers (ENS, but also similar providers on other chains).
If you take away nothing else from this article, I hope I’ve at least made the case that in a world of distributed ledger technology and smart contracts, there’s really no need for any of us to be locked into a social network in 2022. The tools to fix this lock-in problem are widely available; we just have to pick them up and use them.