This is a transcript of the AI and the Future of Work podcast episode in which Paul Lee, co-founder of Synesis One, discusses the future of NLP and AI data harvesting using games and blockchains to earn NFTs.
Dan Turchin (00:17):
Good morning, good afternoon, or good evening, depending on where you’re listening. Welcome back to AI and the Future of Work. Thanks again for making this one of the most downloaded podcasts about the future of work. If you enjoy what we do, please like, comment, rate, and share in your favorite podcast app, and we’ll keep sharing great conversations like the one we have today. I’m your host, Dan Turchin, advisor at InsightFinder, the system of intelligence for IT operations, and CEO of PeopleReign, the AI platform for IT and HR employee service. We haven’t talked much about Web3 and the mechanics of decentralized communities, or even blockchains, on this show. 120 shows in, imagine that, huh? It’s obvious that cryptocurrencies are disrupting the legacy financial system and, increasingly, that smart contracts on blockchains are disintermediating power dynamics entrenched in almost every industry: not just financials, but healthcare, education, defense, and even manufacturing.
Dan Turchin (01:18):
Let’s go on a thought experiment here. What if we could combine the world of data management for AI, which we usually talk about, with DAOs, or decentralized autonomous organizations, to gamify the collection of training data via actual multiplayer games? And let’s say, what if a DAO existed to create and issue a token to reward players in online games with NFTs from a marketplace? Well, that bold vision launched recently in the form of a DAO called Synesis One. The CEO and founder of Synesis One is Paul Lee, and today we’ll hear about his journey. Paul’s a serial entrepreneur and also a trained biochemist and a licensed medical doctor. Before founding Synesis One last year, Paul co-founded Mind AI and, before that, Curely, a care marketplace for consumers and doctors. Paul received his undergrad degree from Oxford and his MD from the Catholic University of Korea. Now he’s out to demonstrate the power of Web3 to disrupt how data is collected to train large language models. I’ve really been looking forward to this one. Without further ado, Paul, welcome to AI and the Future of Work. Let’s get started by maybe having you share a little bit more about your background and, gosh, how you got into this space.
Paul Lee (02:38):
Lovely. Thank you, Dan. Thank you for having me. It’s good to be on the show. A little bit about myself: I’ve been involved in the AI and blockchain space since 2012. With AI, we actually started building it, doing the R&D, in 2008. So literally, before the formation or the incorporation of Mind AI, it had been almost 10 years in the R&D phase. And now we’ve commercialized a couple of... well, literally last year, at the end of last year. So it took 13 years to build and really get it right. And then obviously the sister company, Synesis One, was created because we saw that the bottleneck for AI companies such as Mind AI is gathering natural language data sets. We call these ontologies, and gathering these ontologies has a lot of problems.
Paul Lee (03:42):
I mean, you know, we can’t just do it like a Mechanical Turk, because the quality of the data sets, as well as of the workers who create them, isn’t very good. So we wanted to incentivize, you know, the community who will be building these data sets and reward them, but also allow the biases to be gone and include people from all kinds of races, all kinds of cultures, as well as all different languages around the world. So Synesis One was born. Now, Synesis One focuses on AI as well as gaming as well as NFTs. So the ecosystem is very, very robust, and we are basically trying to gamify how people around the world can contribute to data so that we can monetize these data sets as well as securitize them.
Dan Turchin (04:39):
Give us an example of a task that a player might do in a game that would contribute training data in the form of these ontologies.
Paul Lee (04:50):
Right. So, I mean, for example... well, in the beginning we provide NFTs, which are words, okay? So we’ve just produced 10,000 word NFTs, which we call Kanons. Now, these words are obviously the unit of natural language, and what happens is, if you own an NFT of a word and it’s being used by AI companies around the world, then you get rewarded. Let’s give you an example. Let’s say I have the word “want.” If Mind AI, for example, is building conversational AI chatbots for e-commerce companies, then, you know, the end customers will be asking, “I want this, I want that.” So the word “want” will be used very, very often, and every time that word “want” is used, the owner of that NFT would be rewarded. Now, on the gaming side, if you own this Kanon NFT, you would be able to stake it and play games, play word puzzles. I mean, there are multiple games that we have on our list. The first one, Quantum Noesis, is actually coming out next week. So it’s the 31st of March when it’s being launched. It’s a combination of word puzzles as well as phrase puzzles, and it’s basically what we call code breaking. And those games will literally be the backbone of training these AI data sets. Whether they are word utterances or sentence utterances, these are all gonna contribute to making the AI data sets that are needed for natural language AI out there.
Dan Turchin (06:50):
So let’s build on that example. I own the NFT, or the Kanon, related to the word “want.” And let’s say, like in your example, Mind AI licenses that Kanon from me as the owner of the NFT. How do you wrap an ontology, or an ontological understanding of the concept of “want,” around that term? I assume Mind AI must be purchasing a bunch of related concepts to “want,” not just the literal four-letter word.
Paul Lee (07:25):
Correct. So what happens is, when we say a sentence, for instance, let’s give you a quick example: “I want a slice of pizza,” right? And then what we’re asking is, for this sentence, how many different ways can you say “I want a slice of pizza”? Some might say “I want a pie.” Some might say “I want a pepperoni,” or “I want a Hawaiian,” or, I don’t know, you know, all kinds of different ways of saying “I want pizza,” right? Now, these are the sentence utterances that we would need to understand this sentence, “I want pizza,” right? So we’re basically collecting the common-sense knowledge, what we call the ontologies, around “I want pizza”: what pizza is, what “want” means, you know, how many different ways you can say pizza, whether it’s a pie, whether it’s, you know, I don’t know, a Korean way of saying pizza, all different ways. These are all going to be collected through the gamification as well as the games such as Quantum Noesis.
Dan Turchin (08:36):
So there might be...
Paul Lee (08:37):
Does that make sense?
Dan Turchin (08:38):
Yeah. So there might be multiple owners of related concepts. Like you said, you know, wanting a pizza, et cetera, could span multiple...
Paul Lee (08:48):
Multiple... well, no, no, no. So the NFT itself is owned by one person, okay? So for example, the word “pizza” may be owned by one person, and “want” may be owned by another person. So take the sentence “I want pizza.” Say that phrase is being used multiple times by the AI companies, by the actual B2B companies that are using Mind AI, right? Let’s say Domino’s Pizza is using a conversational AI made by Mind AI. Then the owners of the words “want” and “pizza” will be rewarded every time that sentence structure or sentence utterance is being used.
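The per-use reward Paul describes here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the owner names, the `credit_owners` helper, and the flat reward of one unit per use are all hypothetical, and nothing in the conversation specifies how Synesis One actually computes or pays rewards.

```python
from collections import Counter

# Hypothetical registry: each word NFT has exactly one owner,
# as described in the conversation. Names are made up.
KANON_OWNERS = {"want": "alice", "pizza": "bob"}


def credit_owners(utterance, owners, reward_per_use=1):
    """Credit the owner of every owned word that appears in one utterance."""
    credits = Counter()
    for raw in utterance.lower().split():
        word = raw.strip(".,!?")        # drop trailing punctuation
        owner = owners.get(word)
        if owner is not None:           # words without an NFT earn nothing
            credits[owner] += reward_per_use
    return credits


# The same sentence used three times by a chatbot credits both owners three times.
totals = Counter()
for _ in range(3):
    totals += credit_owners("I want pizza.", KANON_OWNERS)
```

In this toy version, "I" has no owner and is skipped, while the owners of "want" and "pizza" are each credited once per use of the sentence.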
Paul Lee (09:36):
So the NFTs are closely related to the ontologies. Ontologies can be words, or they can be sentences as well, right? So in order to understand this, you need to really dive deep into natural language semantics as well as, you know, what ontologies are, right? The words we collect, for example, are not just synonyms or antonyms; they go into hypernyms, hyponyms, you know, holonyms and all sorts. So these will be collected by our data yield farmers, who are collecting and validating these ontologies for use in different domains and different subjects.
Dan Turchin (10:21):
That makes sense. So the tasks that would let me, as a data yield farmer, a player of the game, take ownership of these NFTs span, like you said, hypernyms and hyponyms of that concept, which itself came from this list of 10,000 words that Synesis One seeded the game with. Is that a fair summary?
Paul Lee (10:44):
Close. So 10,000 words is not enough to cover the whole English language, for example, right? So we’ll be releasing more Kanon NFTs that will be commonly used. Right now we’ve chosen the 10,000 most commonly used English words to release as our first set of Kanons, and we’ll be releasing more Kanons as we go. The usage of these words inside the ontologies by the businesses that are using these ontologies is crucial, because the owners of the NFTs will be rewarded. But also, we want the data yield farmers who are actually contributing to these ontologies to be rewarded heavily through gaming, as well as for their work and their, you know, participation in the ecosystem as a whole.
Dan Turchin (11:39):
So now let’s talk about the gamification mechanics. Talk us through the connection between the Kanons, the NFTs, and then the token in the marketplace. So what can I then exchange these for? As I understand it, I get tokens for data yield farming, and those can be exchanged in a marketplace for NFTs that might have value independent of the Kanons.
Paul Lee (12:10):
Okay. So, to unpack that one, yeah, I think there’s a little bit of a mix-up. So the Kanons are basically NFTs, right? You can buy the NFTs, and these NFTs will be going up in value, and they’re also gonna be providing you with passive income. That’s the first thing. Also, once you have the NFTs, you’ll be able to stake them to play games, or, in order to maximize the rewards that you gain from these games, you can participate in data yield farming. These are all governed by SNS tokens; SNS is the governance token in the Synesis One ecosystem. Okay. So you can earn and maximize these SNS tokens by contributing to the data yield farming, by providing the ontologies, such as, you know, how many different ways can you say “I want pizza,” how many different ways can you say “I want to buy something,” right? Or, like I said, it can also be word utterances. So once the data yield farmers go into the DYF system, they’ll see multiple topics and subjects that they can contribute to. These are provided by the architects, then they are built by the builders, and they are validated by the validators. So there are three roles that you can play, and therefore there are different economic values for the work of the architects, the builders, and the validators.
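The three roles Paul lists (architects post topics, builders add utterances, validators approve them) can be pictured as a small pipeline. The sketch below is a guess at the shape of that workflow, not Synesis One's implementation: the `Topic` class and the three helper functions are hypothetical names, and token rewards are left out entirely.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    prompt: str                                      # posted by an architect
    utterances: list = field(default_factory=list)   # contributed by builders
    approved: list = field(default_factory=list)     # accepted by validators


def architect_post(prompt):
    """An architect defines what builders should contribute to."""
    return Topic(prompt=prompt)


def builder_submit(topic, utterance):
    """A builder adds one variation; blanks and duplicates are ignored."""
    text = utterance.strip()
    if text and text not in topic.utterances:
        topic.utterances.append(text)


def validator_review(topic, utterance, accept):
    """A validator approves or rejects a submitted variation."""
    if accept and utterance in topic.utterances and utterance not in topic.approved:
        topic.approved.append(utterance)


topic = architect_post("How many ways can you say: I want pizza?")
builder_submit(topic, "I want a pie")
builder_submit(topic, "I want a Hawaiian")
builder_submit(topic, "I want a pie")            # duplicate, ignored
validator_review(topic, "I want a pie", accept=True)
validator_review(topic, "I want a Hawaiian", accept=False)
```

Only utterances that a builder actually submitted can be approved, which mirrors the order of roles in the description above.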
Dan Turchin (13:51):
Excellent explanation. So I went and downloaded... I got a Phantom Ethereum wallet, and I investigated Quantum Noesis, and it’s beautiful. Full credit to you and the team for designing a beautiful experience. I’m excited about the launch. You mentioned it’s upcoming; when this gets published, it may have gone live already. Talk us through the vision behind the game, and marry that vision and what players can expect from the game with the mechanics of the NFTs and the Kanons that you just described.
Paul Lee (14:32):
Sure. So in order to play the game, you either need to have the Kanon tokens, which are the NFTs, or you need a certain amount of the SNS tokens. Now, Quantum Noesis is the first graphic-novel, NFT-based game, right? I’m sure you’ve seen a lot of graphic novels as well, but it’s animated: very high-quality puzzle games with an integrated sci-fi story in them. Okay. So you’ll have around three puzzles to solve every single week. Every single time you solve a puzzle, you’ll be rewarded, and you’ll go through the stages. And after that, you’ll be able to participate in the data yield farming to maximize your reward from the gaming process, to produce and add on to the ontologies from the community, and then maximize the potential of your earnings.
Paul Lee (15:38):
So we want people around the world to participate in building these data sets. Also, we want them to make a lot of money whilst doing it. So it has to be fun, it has to be rewarding, and it has to be economically viable. That’s the way we designed all our games, and Quantum Noesis is just the tip of the iceberg. We have at least four games coming out this year. These games are designed for the community, as well as people around the world, to get excited about, you know, first of all because they’re fun, but also because players can earn rewards out of them and can maximize these rewards by providing the ontologies necessary for AI companies such as Mind AI.
Dan Turchin (16:34):
So we talk a lot on this show about the potential for bias to creep into data sets that are used to train AI models. And a concern I have about this approach is that the community that will respond well to a science fiction game, you know, embedded around this beautifully designed graphic novel, may not be representative of the overall population that you’d ideally want to define these ontologies. How do you mitigate the impact of potentially a population of contributors to the ontologies not representing the overall way the language is used beyond this population of gamers?
Paul Lee (17:20):
That’s a very, very good point. And that’s why we want to be able to cast nets to a diverse group of people, as well as different cultures, you know, different religions, different races. And that’s where we are trying to diversify, not just in the gaming itself, but through the architects who are designing the domains as well as the subjects. So the architect’s role, as you know, is designing what kind of domains the builders are gonna be building or providing the ontologies on. And these subject domains have to be very, very diverse. It’s not just gonna be focused on one domain. Also, the builders are obviously coming from the gaming world, but they will also be coming for the data yield farming alone. So we try to generate the leads from the gaming, but also try to cast different nets so that people who are not gamers may also come in to contribute and earn rewards through the data yield farming ecosystem.
Paul Lee (18:37):
So we need to try different things, but, you know, the bias is real, and we need to find ways to mitigate this bias as we go. I mean, I’m sure you know, all the AIs that were built for the Microsofts and, you know, all the conglomerates, they were biased, and, you know, soon all the machine learning data sets were polluted. We’re trying to make that... well, to be perfectly honest with you, it’s gonna take a lot of trial and error, but with the crowdsourcing mechanism, we believe that we will mitigate it for sure.
Dan Turchin (19:20):
A guest on the show recently was talking about building predictive healthcare models to be able to figure out when patients are gonna need treatment, what treatment they need, and how much it should cost, and they license their training data from states. And as we had a deeper conversation about where the states get their data, you know, it’s based on what I’ll maybe cynically refer to as affluent patients who are able to afford surgeries. And then the data is collected based on who has received these surgeries and gives the state permission to share their data. So perhaps you could say that the impact of biased data here could be less severe, but I’m glad it sounds like you appreciate the challenge. You and our audience are familiar with horrific examples like the Microsoft Tay bot, where, you know, these things can be manipulated. And I’m glad you’re thinking about the seriousness of that issue.
Paul Lee (20:22):
No, definitely. And that’s exactly why we have a validator system, because we don’t believe that the builders alone or the architects alone will be able to mitigate this situation. It’s, you know, a multi-layer security system that we need to go through: from the architects, who are designing what kind of data sets they collect and in what domain; to the builders’ mentality, as well as the education for the builders in building or collecting these ontology sets; but education also needs to be done for the validators, who are validating whether this data actually goes into the AI. That means these validators need to be of, you know, high ethical ground or high moral ground, but also be experts in that domain-specific subject, so that they are sure that this is the right information that needs to be fed in. So it’s not just gonna be one validation per ontology. It’s gonna be multiple validations by the crowd, right? So I believe crowdsourcing is a much safer, much fairer, and much less biased way of collecting this data, especially if you have a check process embedded in the ecosystem.
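One simple way to picture "multiple validations by the crowd" is a vote with a quorum. The rule below (at least three votes and a strict majority of approvals) is an assumption chosen for illustration; the conversation does not say how many validators review an ontology or what threshold applies, and `is_accepted` is a hypothetical name.

```python
def is_accepted(votes, min_votes=3):
    """Accept an ontology only with enough reviews and a strict majority.

    votes is a list of booleans, one per validator (True = approve).
    Both the quorum and the majority rule are illustrative assumptions.
    """
    if len(votes) < min_votes:
        return False                    # not enough independent reviews yet
    approvals = sum(1 for v in votes if v)
    return approvals * 2 > len(votes)   # strict majority, integer arithmetic
```

A single validator's approval is never enough under this rule, which is the point of having more than one validation per ontology.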
Dan Turchin (21:48):
I buy the thesis, absolutely, that crowdsourcing is a better way to mitigate the impact of bias. But as, let’s say, a potential purchaser of the data, these ontologies, I would have questions about how the data was collected. For example, if I’m selling jewelry versus pizza, I’d wanna know if these are primarily males that don’t wear jewelry or don’t eat pizza <laugh>, you know, and whether or not I would trust the ontologies that were generated that way. What do you envision providing your potential customers to make them feel comfortable that the data is gonna be of value?
Paul Lee (22:30):
Hmm. That’s a very good point. Now, I mean, as you say, the demographics can affect the ontologies as well, but the ontologies we collect are very, very different from machine learning or deep learning data sets. So I’m sure you’re gonna be seeing a lot of... I mean, you can actually see on our Telegram and Discord community channels already the samples of data yield farming questions that the architects have already done. I mean, we tested it out, and this was, you know, literally a Google Form test, but we got about 600 sentences from, you know, a group of about 50. And we saw what kind of... because I’m very aware of exactly what you’re saying about this bias. But what we are collecting are variations of the sentences rather than sentimental or emotional input from different demographics. Does that make sense? It’s based on natural language alone, the variation of natural language, not whether it reflects the sex or the age or even the cultural differences, I would say.
Dan Turchin (23:59):
Good explanation. Now I’m gonna say something that’s gonna seem like heresy, but I gotta get your perspective on this.
Dan Turchin (24:07):
Why was it essential to use a DAO and have these transactions on a blockchain and use NFTs? I mean, certainly, a lot of these mechanics could be done in what we’ll call traditional ways. Why use the Web3 architecture?
Paul Lee (24:28):
That’s a very good point. First of all, we wanted to make sure the data that we collect is decentralized. It cannot be owned by a company, it cannot be owned by any particular person; it has to be owned by the crowd. So we have separated the engine from the data set, so that the crowdsourced data set lives on the blockchain and can be accessed by anyone. That’s the first reason. Why did we use Solana? Well, because of the gas fees and the speed. Also, the current Mechanical Turks out there are very labor-intensive, they’re not fun, and they don’t pay well. We’re trying to change the status quo. And we felt that the NFTs, which we have condensed down into the unit of natural language, words, are very, very attractive if they also bring passive income. Now, for the blockchain part, or Web3, we believe it’s an opportunity for all the other natural language AI companies around the world to be able to access the curated and tailored data sets that we are creating at Synesis One, as freely and openly as possible, so that they can access them like electricity, rather than the data sets belonging to just a few conglomerates.
Paul Lee (26:02):
You know, the Facebooks or the Microsofts of the world.
Dan Turchin (26:06):
It seems like today the Ethereum blockchain is almost synonymous with NFTs. And given what you just described as your selection process, is it past the point of no return? There’s such a strong network effect: as each incremental project goes live on the Ethereum blockchain, it makes it that much harder for other blockchains. Is the race over? Has Ethereum won?
Paul Lee (26:32):
I don’t think so at all. I think Solana is growing very quickly. I think the Solana ecosystem’s growing very quickly. I think the NFT space on the Solana blockchain is increasing very, very quickly, and we want to ride the wave of Solana’s growth as well. And I personally think that Ethereum... I mean, first of all, we can’t run our ecosystem on Ethereum because of the gas fees and the speed of Ethereum. It would not be economically viable at all to provide our NFTs or even our solution on Ethereum. We needed that speed because, I mean, you know, for conversational AI, we need that speed. And we had to pick a blockchain that was gonna be growing exponentially so that we could grow exponentially with it, riding that wave. And I think we have chosen the right one. I do believe that Solana is going to become one of the leaders, like Ethereum. So no, I think, you know, the competition is open.
Dan Turchin (27:51):
As a technologist and a pioneer in the space, I wanna get your perspective. What you’re doing with gaming and with building these ontologies is so pragmatic. Anyone who’s in conversational AI appreciates the beauty of this system and, you know, how you’re gamifying it. And a lot of people don’t appreciate how fundamental what you’re doing is to getting closer to, you know, what we’ll call general AI, as opposed to narrow AI, right? It could be approaches like what you’re doing that get us there a lot faster. What’s your perspective on general AI? Will we ever get there? Is it kind of a hollow promise, or are projects like Synesis One gonna get us there quicker than most people think?
Paul Lee (28:35):
Well, I think it’s an engine problem as well as a data set problem. Engine, meaning machine learning has already hit its ceiling. The narrow AIs of the world can do, you know, phenomenal things in a very narrow space with supercomputing power and, you know, tons of training required. But at the core, that’s why Mind AI was born. I mean, we believe that in order to build what we call artificial general intelligence, it needs to be intelligent, right? And at the core of intelligence is what we call reasoning, and the reasoning process consists of abduction, deduction, and induction. These three things must work in concert. Otherwise, you know, we can’t mimic the human reasoning process. And what we’ve done at Mind AI is put those three reasoning processes into an algorithm that we call the canonical structure.
Paul Lee (29:33):
And that is what we call the unit of reasoning. So we have a very good chance, because we’re not following any of the machine learning or deep learning protocols. We have actually spent 13 years building this natural language reasoning algorithm, and we have commercialized it. The thing that we need now is domain-specific knowledge, and that’s what we’re trying to solve with Synesis One. Now, building the common-sense knowledge or the domain-specific knowledge, what we call the ontologies, is crucial in understanding the different domains. I mean, think about educating a child in different subjects. You need to provide them with, you know, common-sense knowledge in different subjects as well as more specific knowledge as they learn more. That’s exactly what we’re trying to do with the collection of these ontologies, domain after domain. Now, let’s imagine these domains have been set in, let’s say, you know, 2,000 domains.
Paul Lee (30:45):
So let’s say we have about, you know, 5 million ontologies already built in Synesis One, and Mind AI has access to them. First of all, Mind AI needs 500 times less data than any machine learning or, you know, deep learning technologies out there. So with those 5 million ontologies, it will be able to, in my opinion, cross different domains and be able to think and reason naturally. That, I believe, will lead to a very preliminary artificial general intelligence in the beginning. But then, as it becomes more aware and gathers more ontologies to support its original common-sense knowledge, it will be able to link different parts of these ontologies and theses and come up with its own meta-theses or meta-theories. And I think that will be the path to artificial general intelligence.
Dan Turchin (31:53):
Paul, that’s the single most articulate and pragmatic approach to achieving general AI that I’ve heard, and we’re all rooting for you. It’s just a very elegant way of thinking about structured reasoning and a path to getting there that is beyond science fiction.
Paul Lee (32:12):
<Laugh> Thank you. Thank you, Dan.
Dan Turchin (32:14):
Good. Well, Paul, this has been so much fun, but I gotta get you off the hot seat. You gotta answer one last question for me, though. You’re talking to a bunch of AI-savvy practitioners who are interested in making money off these blockchain projects, are wicked smart, and might be interested in investing some time in some of the games and contributing to the ecosystem. But, you know, there are a lot of projects competing for their time and attention. What’s your pitch to gamers or potential participants in the Synesis One community?
Paul Lee (32:50):
Well, I mean, first of all, the way I want people to understand this is that we’re building AI data sets that are going to be powering natural language AI around the world. And if you want to participate and contribute, please come along, because it’s not just gonna be fun playing games and gamified games, but you’ll also be earning rewards that you deserve by providing these ontologies. So why not participate? If you see AI to be, you know, the future technology, then I think you must contribute and also get rewarded for participation. And I think, you know, the future is in blockchain as well as AI, and we’ve married the two in a very sexy, elegant way. I want everyone to be able to try this, I want everyone to be rewarded, and, you know, make the word travel faster. So yeah, I mean, “what have you got to lose?” is my pitch.
Dan Turchin (34:04):
Oh, I’m sold. <Laugh> You got me. I can accelerate the development of general AI. I can contribute in a meaningful way to the future of humanity. How’s that?
Dan Turchin (34:19):
Absolutely. All right. I’m signing up to sell for you. How about that?
Paul Lee (34:24):
Well, thank you.
Dan Turchin (34:26):
Good stuff. Well, Paul, I would love to have you back for another version of this conversation after Quantum Noesis launches and subsequent games launch. This is a fascinating space to watch.
Paul Lee (34:36):
Absolutely. I’ll be more than happy to come back. Thank you.
Dan Turchin (34:39):
Excellent. Well, such a pleasure. Best of luck to you and the team. Gosh, that’s a wrap for this week, but a great discussion with Paul Lee, CEO of Synesis One. I’m your host, Dan Turchin, of AI and the Future of Work, and of course, we’re back next week with another fascinating guest.