Back to episodes

Episode 6

Privacy Protecting AI and Building Consumer Trust

Kleomenis Katevas, Machine Learning Researcher at Brave Software, discusses how we can build trust in AI with the general public by making data as safe and secure as possible. He also unpacks some of the myths the general public holds about AI, how to debunk these myths, and tangible steps companies can take to reduce privacy concerns with AI.

Transcript

[00:00:00] Host: From privacy concerns to limitless potential, AI is rapidly impacting our evolving society. In this new season of the Brave Technologist podcast, we’re demystifying artificial intelligence, challenging the status quo, and empowering everyday people to embrace the digital revolution. I’m your host, Luke Mulks, VP of Business Operations at Brave Software, makers of the privacy-respecting Brave browser and search engine, now powering AI with the Brave Search API.

[00:00:29] You’re listening to a new episode of the Brave Technologist podcast. This episode features Minos, a machine learning researcher at Brave. Minos’s role focuses on designing and building privacy-preserving, machine learning-based systems. His research interests lie in the fields of privacy-preserving machine learning, federated learning, mobile systems, and human-computer interaction.

[00:00:49] He holds a PhD and an MSc from Queen Mary University of London in the UK. In this episode, we focused a lot of our discussion on privacy-protecting AI and the importance [00:01:00] of building consumer trust in AI. We unpacked a lot of myths the general public holds about AI and how these can be debunked, along with tangible steps companies can take to help reduce concerns related to privacy and trust in AI.

[00:01:12] He also shares how Twitter, now known as X, can be a great resource for learning more about AI, along with specific accounts and thought leaders worth following in the space. And now for this week’s episode of the Brave Technologist. Minos, welcome to the Brave Technologist podcast. How are you doing today?

[00:01:28] Kleomenis: I’m very good.

[00:01:29] Thank you for the invite. Yeah.

[00:01:31] Host: Yeah, no problem. Why don’t you give the audience a bit of background on your involvement with AI and what you’re building?

[00:01:38] Kleomenis: Yes, definitely. I’m leading the machine learning research team at Brave. We consist of both machine learning researchers and machine learning engineers.

[00:01:47] And our goal is to experiment with the latest machine learning techniques to potentially create new features or products, or apply them to existing systems in order to improve performance and [00:02:00] efficiency. At the same time, we contribute to the open research community by publishing in Tier 1 conferences.

[00:02:07] Host: Awesome. And, you know, since we’re Brave here, there’s not a ton of talk around privacy-preserving AI. From your perspective, what’s privacy-preserving AI, and why is something like that pretty critical right now?

[00:02:19] Kleomenis: So let me start with what AI, or artificial intelligence, is first.

[00:02:24] It’s basically a simulation of human intelligence in machines. In our context, we use the term machine learning, which is the machine’s ability to automatically learn without being explicitly programmed. We have various applications in today’s digital age: virtual assistants like Siri and Alexa; the large language models like ChatGPT from OpenAI, Claude from Anthropic, or Bard from Google; recommendation systems, which is how Netflix, for instance, suggests a movie or Amazon recommends a product.

[00:02:57] Image recognition, where we search [00:03:00] for our favorite dogs in our digital photo albums; translating text from one language to another using machine translation; and many, many more. So to answer your original question, what is privacy-preserving AI: it refers to all techniques and methodologies in AI that ensure the protection of user data, both for the data used when training a machine learning model and when we are doing inference, when we are basically using the model to infer something, when we are asking the model a question.

[00:03:35] And the importance of privacy in AI systems has grown in recent years due to increased concerns about user data confidentiality, misuse of personal information, and potential breaches. These concerns have become more pressing in light of data protection regulations like the GDPR in Europe and the CCPA in the US, etc.

[00:03:55] Host: Yeah. It’s one of those things where, you know, with people using ChatGPT [00:04:00] and kind of inputting lots and lots of data, they’re not necessarily thinking about how private that data is.

[00:04:06] Right, and how that might impact their own privacy. Are there protections around these prompts that make them more privacy-preserving, or should people be concerned about that type of thing when they’re playing with these tools? Because, I mean, we’ve seen ChatGPT in particular get a lot of adoption from early adopters, and people are just

[00:04:26] using it for all sorts of things. Should people be concerned about the type of data they put into these prompts and tools at this point?

[00:04:33] Kleomenis: Absolutely. So let me use an example for that: say you want to show an advertisement based on a user’s personal interests. Traditionally, when you want to train a machine learning model,

[00:04:45] this happens on a centralized server. All of the user data are collected there, and the server trains the model for a few rounds, the so-called epochs, until it converges, until it reaches maximum performance. The big question that arises is: do you really [00:05:00] trust this central service

[00:05:02] with your personal data? And this data can be, as you mentioned in the ChatGPT case, your prompts, right? Do you trust OpenAI with your prompts, for instance? Do you trust a central service with your phone data, browsing history, personal pictures, messages, even your voice? Alexa is streaming audio from your home into a central service.

[00:05:27] Zoom is using all of your calls as their own data. So we at Brave say no, you shouldn’t have to trust them. And this is a core value, our privacy-first principle. We don’t need to stick to this Big Tech model of collecting as much behavioral data as possible across devices, user profiles, and social networks in order to show people, for example, personalized ads, right?

[00:05:53] Solutions do exist that can achieve this in a privacy-preserving fashion.

[00:05:58] Host: Yeah, let’s drill [00:06:00] down a little bit on that. Like, what are some of the specific things that a privacy-preserving AI system would do that a more standard or traditional one doesn’t necessarily do?

[00:06:09] Kleomenis: Yeah. One option is on-device processing.

[00:06:11] This is exactly what we are doing at Brave. We distribute machine learning models to the device, and the inference happens locally within the device. This is the ideal scenario, right? Nobody else can or should have access to your device’s data. But the question is, how can you train this machine learning model?

[00:06:30] At this stage, what we do is train these models with public data. But that has many limitations: data are limited, they are biased, they are not personalized to the user’s needs, etc. So one of the solutions Brave is experimenting with is federated learning. Instead of sending all of the user data to a central server, you can train these models locally on the device.

[00:06:58] So, for instance, [00:07:00] while your phone is charging overnight. Then you send these individual models to a central server, not the data, just the trained models, the model parameters, where some type of model aggregation takes place, building a stronger model from the knowledge gathered across the user models.
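
To make the federated learning flow described here concrete, below is a minimal FedAvg-style sketch in NumPy: each client trains a tiny linear model locally and only model parameters travel to the server for averaging. All function and variable names are illustrative assumptions, not Brave’s actual system.

```python
import numpy as np

def local_train(weights, data, labels, lr=0.1, epochs=5):
    """Train a tiny linear model locally, on one device's own data."""
    w = weights.copy()
    for _ in range(epochs):
        preds = data @ w                               # linear predictions
        grad = data.T @ (preds - labels) / len(labels) # least-squares gradient
        w -= lr * grad                                 # gradient step on-device
    return w

def federated_round(global_w, clients):
    """One FedAvg round: clients train locally; server averages parameters."""
    updates = [local_train(global_w, X, y) for X, y in clients]
    return np.mean(updates, axis=0)  # only parameters are aggregated, never raw data

# Toy setup: three "devices", each holding private data that never leaves them.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = []
for _ in range(3):
    X = rng.normal(size=(50, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=50)
    clients.append((X, y))

w = np.zeros(2)
for _ in range(20):
    w = federated_round(w, clients)
print("learned weights:", w)  # approaches [2, -1] without sharing raw data
```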

[00:07:17] Now, that said, the model itself can leak some information, and many attacks do exist. But luckily, other privacy-preserving mechanisms and measures also exist, like the concept of secure aggregation, so that the server cannot see the individual model updates from users, only the final aggregated model.
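
The secure aggregation idea can be illustrated with a toy pairwise-masking scheme: each pair of clients shares a random mask that one adds and the other subtracts, so the server sees only masked updates, yet the masks cancel in the sum. Real protocols (e.g. Bonawitz et al., 2017) add key agreement and dropout handling; this sketch shows only the cancellation trick, with made-up dimensions.

```python
import numpy as np

rng = np.random.default_rng(42)
n_clients, dim = 3, 4
updates = [rng.normal(size=dim) for _ in range(n_clients)]  # true model updates

# Pairwise masks: client i adds mask (i, j); client j subtracts the same mask.
masks = {(i, j): rng.normal(size=dim)
         for i in range(n_clients) for j in range(i + 1, n_clients)}

masked = []
for i in range(n_clients):
    m = updates[i].copy()
    for j in range(n_clients):
        if i < j:
            m += masks[(i, j)]
        elif j < i:
            m -= masks[(j, i)]
    masked.append(m)  # this masked vector is all the server ever sees per client

# The server recovers only the aggregate; individual updates stay hidden.
assert np.allclose(sum(masked), sum(updates))
print("aggregate update:", sum(masked))
```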

[00:07:37] There is differential privacy, a mathematical privacy guarantee that quantifies the privacy leakage by adding calibrated noise to the data, while, I need to say, having an impact on the model utility. Others are homomorphic encryption, where you can perform arithmetic operations on encrypted data without first decrypting it, and many, many others.
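
As a minimal illustration of the “calibrated noise” idea behind differential privacy, here is a noisy count released with the Laplace mechanism, where the noise scale is sensitivity divided by epsilon. The epsilon value and the data are invented for the demo; a smaller epsilon means more noise and more privacy, at the cost to utility mentioned above.

```python
import numpy as np

def laplace_count(values, epsilon=1.0):
    """Release a count with epsilon-differential privacy.

    The sensitivity of a counting query is 1: adding or removing one
    person changes the true count by at most 1.
    """
    sensitivity = 1.0
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return len(values) + noise

users_who_clicked = list(range(873))                  # toy private dataset
print(laplace_count(users_who_clicked, epsilon=0.5))  # noisy answer near 873
```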

[00:07:57] Host: You know, when you kind of look at the broader [00:08:00] picture of things, how do you see AI progressing? Do you see some specific cases that might be really good for these local approaches versus something more that happens, you know, in the cloud? Or do you see it being kind of diversified and ubiquitous?

[00:08:13] Like, where do you see these things going, directionally?

[00:08:17] Kleomenis: AI has succeeded in many domains, like healthcare, aiding in diagnostics to predict patient outcomes, and natural language processing. A very good example is the case of large language models; ChatGPT is only one of them, but there are even simpler cases like text-to-speech and speech-to-text applications, etc.

[00:08:35] There is the computer vision domain, which is image and video analysis: how you can identify objects, detect anomalies, or even recognize human emotions. Autonomous vehicles are no longer just a concept; they are slowly becoming a reality. Also, in the domain of fraud detection in financial services, AI has become essential to prevent fraudulent activities by [00:09:00] analyzing patterns and anomalies in transactions, et cetera.

[00:09:02] A very similar functionality is in your email platform, to detect and prevent spam or even phishing attacks.
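
As a toy illustration of the spam-filtering idea mentioned here, the sketch below trains a bag-of-words Naive Bayes classifier with scikit-learn to learn patterns from labeled emails. The tiny dataset is invented for the example; real filters are far more sophisticated.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Made-up training emails with spam (1) / ham (0) labels.
emails = [
    "win a free prize now", "cheap meds limited offer",
    "meeting agenda for tomorrow", "lunch on thursday?",
    "claim your free reward", "quarterly report attached",
]
labels = [1, 1, 0, 0, 1, 0]

# Vectorize word counts, then fit a multinomial Naive Bayes model.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(emails, labels)

# New messages are scored by the word patterns learned above.
print(model.predict(["free prize waiting", "see attached report"]))  # expect [1 0]
```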

[00:09:09] Host: On a personal note, what gets you most excited when you’re thinking about what’s possible for a company like Brave, with a large consumer user base, you know, starting to play around with these tools?

[00:09:20] Like, where do you think these things will drive a lot of value for everyday users?

[00:09:25] Kleomenis: I think the coolest thing would be running these large language models within your device. This is something that the research team at Brave is working on. We want to run the large language model within the device so that we don’t communicate the prompts to a server, right?

[00:09:44] This has many, many complexities and challenges. Even doing inference on the LLM is challenging and requires processing power. So that’s what excites me most.
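
For a sense of what on-device inference can look like in practice, here is a hedged sketch using llama.cpp’s Python bindings, one common way to run a quantized LLM locally: the prompt never leaves the machine. The model path and parameters are placeholder assumptions, and this is a generic illustration, not Brave’s implementation.

```python
from llama_cpp import Llama  # pip install llama-cpp-python

# Load a locally stored, quantized model file (hypothetical path).
llm = Llama(
    model_path="./models/llama-7b.Q4_K_M.gguf",
    n_ctx=2048,    # context window; larger costs more memory
    n_threads=4,   # CPU threads; tune for the device
)

# Inference happens entirely on-device: no network call is made.
out = llm("Q: Why run LLM inference locally? A:", max_tokens=64)
print(out["choices"][0]["text"])
```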

[00:09:55] Host: Awesome. I mean, there’s so much doomer chatter [00:10:00] out there, but what pragmatic, realistic concerns do you have around what you’re seeing in the space, and what areas do you see that could get a little out of hand in short order? What things keep you up at night around this stuff, if anything, right?

[00:10:14] Like it could be nothing.

[00:10:16] Kleomenis: Yeah, obviously, while AI holds great promise, what people worry about most is AI’s socioeconomic impact, like job displacement. People fear that even software developers will be affected, but also doctors, writers, etc. I was reading what Tim Burton was saying yesterday about AI writing movie scripts, and how it feels like a robot taking your humanity and your soul.

[00:10:41] In one of our works long ago, approximately 10 years ago, we were experimenting with humanoid robots trying to perform as stand-up comedians. And I remember that back then, it was an extreme project, right? A very cool project, very, very unique. But now it looks almost trivial. LLMs already tell jokes, write [00:11:00] poems, mimic human language, et cetera.

[00:11:03] Another big risk is discrimination: AI amplifying existing societal biases. For example, a system not recognizing certain ethnicities as accurately as others because of bias in the training data. And we talked about possible privacy harms, exposing user data without the user’s consent.

[00:11:22] Host: And in that vein too, what’s kind of the number one myth about AI that you wish people would just stop believing?

[00:11:29] Kleomenis: Yeah, I wish they would stop believing scary headlines. I have read news articles saying LLMs are sentient, have feelings, are creative, which is totally wrong. We should remember not to anthropomorphize ML models like LLMs. They’re advanced machine learning models. Oversimplifying, they can be seen as information retrieval tools that are trained to produce human-like text based on patterns learned from massive amounts of data.

[00:11:59] While [00:12:00] they can provide accurate responses across multiple tasks, it’s also not uncommon for them to produce incorrect, nonsensical outputs due to their limitations, right? Like the phenomenon of hallucinations, when the model basically makes things up.

[00:12:17] Host: And I’m seeing some chatter around this recently, too. From your point of view, are you seeing the outputs tend to get better as more and more people use them?

[00:12:25] Or is it kind of muddying the waters? What’s your take on how the quality of these systems is holding up as they get more adoption, or early adoption?

[00:12:34] Kleomenis: There is this feature where the large language models can continue learning new information based on the responses from the users. There is some evidence that the performance, and when I say performance, I mean the accuracy of the results, drops over time.

[00:12:49] I’m pretty sure this is something that will be resolved by new models, new updates, new training phases, etc.

[00:12:55] Host: Awesome. What do you think is the best way to build trust with the general public [00:13:00] regarding, you know, AI’s ability to actually preserve privacy?

[00:13:03] Kleomenis: Yeah, trust is fundamental, right, to any successful tech adoption, especially with AI systems that can process and analyze personal data at large scale.

[00:13:13] People need to believe that their information is secure and will be used responsibly. Simply telling users “we do not collect your personal data” is definitely not enough. There are many steps that need to be followed. One of these is being transparent about how your system works: using open-source algorithms, using plain-language policies, posting these in open blog posts without technical jargon. Another is data minimization, which means collecting only the necessary data, reducing the risk of potential breaches or misuse.

[00:13:51] Give users the choice to opt in and offer them control over what they share, but also the option to view or delete their data. Robust security protocols are another step, using strong, state-of-the-art encryption techniques. One very promising new technology is trusted execution environments, a secure area inside the processor,

[00:14:15] either on the client or the server side, where the data can be stored, processed, and protected in a totally secure and isolated environment. We already use this at Brave to access private user analytics. In one of our previous works, we used this to train models within these TEEs and overcome some of their limitations.

[00:14:39] Host: A lot of transparency, and kind of, you know, showing the work, right? Where people can validate and verify what we’re doing.

[00:14:50] Kleomenis: Definitely. And also encouraging feedback from the users, but also from the broader community, right?

[00:14:56] Host: Yeah, I think it’s pretty critical too. It makes a lot of sense. I mean, a lot more people are [00:15:00] starting to get familiar with AI and the tooling. Are there any resources you’d personally recommend for somebody who might be developing, or just kind of researching this a little, to check out or start using?

[00:15:14] Kleomenis: Yeah, definitely. One of the main sources for me is Twitter, which I need to remember to start calling X now.

[00:15:23] Host: I’m, I’m with you, man. It’s going to take a while for me too.

[00:15:27] Kleomenis: It’s going to be a long while. Yeah, I totally agree. Twitter is history. But anyway, there are accounts like Andrej Karpathy or Andrew Ng. I mention Andrew Ng because of his involvement with Coursera, the online learning

platform that made machine learning available to everybody, especially his machine learning course, which is highly recommended. Also, the DeepLearning.AI courses. There is also Lilian Weng’s blog, Lil’Log, which is very good from my point [00:16:00] of view, but also academic papers; most of these are available on arXiv, and if somebody struggles to understand the academic papers, most of them also have a presentation on YouTube.

[00:16:14] Host: Awesome. That’s super helpful. Do you have a favorite movie, show, or book featuring AI that you’d recommend folks check out? Or just something that kind of inspired you?

[00:16:23] Kleomenis: Many, many. I would say Ex Machina, which is a psychological thriller with a programmer being asked to conduct the famous Turing test on a humanoid robot. Her, with a man falling in love and being in a relationship with an AI virtual assistant. And definitely some of the Black Mirror episodes.

[00:16:45] In general, all of these sort of explore the impact of AI on our lives and try to imagine how we can relate to it.

[00:16:54] Host: That’s awesome. On a personal note, did you always plan to explore machine learning and AI, [00:17:00] or what was your journey to get where you are now? Did you fall into it? Was it something you went to college planning to study, or something that came up and was super interesting?

[00:17:12] Kleomenis: It was an internship I did with some great people at Telefónica Research in the beautiful city of Barcelona. I was working on a project that involved some machine learning, and my supervisor back then, Martin Pielot, was explaining to me how ML works and everything. And then I fell in love with it.

[00:17:31] Host: Awesome. So, I mean, we do a lot of things at Brave, especially around AI. What are the different AI initiatives you’re working on, both ones users see in the product and ones they might not see?

[00:17:43] Kleomenis: Yes. Some examples of applied machine learning at Brave are news recommendation, for example suggesting sources based on whether you like sports or arts, and applying clustering to the news feeds in order to surface upcoming top news.

[00:17:57] This is used in Brave Search, but also in [00:18:00] the mobile widgets. We do ad matching and ad timing prediction using features from, again, on-device ML to show you the appropriate ad. And human attestation, detecting whether a user is a human or a bot, for fraud detection.
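
As a rough sketch of the news-clustering idea mentioned above, the example below embeds headlines with TF-IDF and groups them with k-means to surface recurring topics. The headlines and cluster count are invented; this is a generic illustration, not Brave’s production pipeline.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# Made-up headlines spanning two topics: sports and arts.
headlines = [
    "Local team wins championship final",
    "Striker signs record transfer deal",
    "New exhibition opens at modern art museum",
    "Gallery unveils restored renaissance painting",
    "Coach praises young players after victory",
    "Art fair draws record crowds downtown",
]

# TF-IDF turns each headline into a sparse vector; k-means groups similar ones.
X = TfidfVectorizer(stop_words="english").fit_transform(headlines)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

for label, title in zip(km.labels_, headlines):
    print(label, title)  # sports and arts headlines fall into separate clusters
```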

[00:18:17] And recently, we moved into the area of LLMs, large language models, creating Leo, our chat assistant that is already available in the Nightly channel, but soon to be available for all users.

[00:18:29] One of the long-term product ideas we have is building private and efficient on-device LLMs, where the model is hosted within your device, on desktop, but even on more challenging smartphones with limited resources.

[00:18:44] Host: Are you seeing anybody, any other companies, working on those local models with anything in production right now?

[00:18:50] Or is it still a lot of, like, stealth mode?

[00:18:53] Kleomenis: It’s a new era, for sure. I know that Apple already uses some transformer-based [00:19:00] models in the latest iOS for voice-to-text transcription. But yeah, definitely there are a few companies doing that.

[00:19:11] Host: Awesome. So, Minos, what do you think AI has been successful at so far, and where do you see big gaps

[00:19:15] in capabilities?

[00:19:17] Kleomenis: AI has been very successful in many domains, like healthcare, aiding in diagnostics and predicting patient outcomes. Natural language processing has made it possible to have more natural interactions with machines, from simpler text-to-speech or speech-to-text applications to the more advanced LLMs

[00:19:35] that generate human language with remarkable accuracy. Image and video analysis can now identify objects, detect anomalies, and even recognize human emotions. Autonomous vehicles are no longer just a concept. They are slowly becoming a reality. But also in the domain of fraud detection in financial services, AI has become essential for preventing fraudulent activities by analyzing [00:20:00] patterns and anomalies in transactions.

[00:20:02] Similar functionality is enabled in your email platform, for example, to detect and prevent spam or even phishing attacks. Now, gaps do exist, and the research community is actively working on them: the well-known human-like general intelligence concept, where machines will have the ability to learn and apply knowledge across a wide range of tasks; embedding ethical and moral reasoning to ensure they make decisions in line with the values of our society; but also energy efficiency.

[00:20:32] Energy efficiency, both for training and inference, is a big challenge. For instance, the amount of energy required to train LLMs is extremely high. According to one study, ChatGPT needed the yearly electricity consumption of over 1,000 US households, and that’s for training only. Inference can also be expensive, and that’s why it is so challenging to run an LLM within a smartphone.

[00:20:58] Finally, as [00:21:00] AI systems are increasingly adopted, there is a pressing need to ensure that they are fair in their decisions, that they can be accountable for their actions, and that their decision-making processes can be understood and justified.

[00:21:15] Host: Awesome, that’s great. Where do you think AI could take us?

[00:21:17] You know, how do you see AI changing the world in the long run?

[00:21:21] Kleomenis: I think the future of AI is fascinating, but also complicated. We will see more advanced language models integrated into our daily lives: personal assistants, text editors, web browsers, even our operating systems. In healthcare, AI will play a significant role when it comes to diagnosis and tailored treatment plans.

[00:21:43] Imagine a future where illnesses could be detected early enough, even before the first symptoms show up, and treatments are specifically tailored to an individual’s lifestyle, needs, environment, etc. Education is another example of something that could be revolutionized; imagine [00:22:00] students learning at their own pace,

[00:22:03] with methods tailored to their needs, interests, and even abilities. What I’m super excited about is the new AR, augmented reality, and virtual reality era that can complement all the examples I mentioned earlier. And this might even change how we navigate the digital world, as opposed to using apps on our phones or using the web browser.

[00:22:26] Host: Awesome. Well, thanks, Minos. I really appreciate you taking the time to share all this with our audience, and I wish you the best of luck with everything you’re doing at Brave, pushing more privacy-preserving AI models and tooling out to the world.

[00:22:40] Kleomenis: Thank you. It was a pleasure.

[00:22:42] Take it easy. Thank you. Goodbye.

[00:22:45] Host: Thanks for listening to the Brave Technologist podcast. To never miss an episode, make sure you hit follow in your podcast app. If you haven’t already made the switch to the Brave browser, you can download it for free today at brave.com and start using Brave Search, which enables you to search the web privately. [00:23:00]

[00:23:00] Brave also shields you from the ads, trackers, and other creepy stuff following you across the web.

Show Notes

In this episode of The Brave Technologist Podcast, we discuss:

  • How X (formerly Twitter) can be a great resource for learning more about AI, along with specific accounts and thought leaders he’s following in the space
  • Exciting ways that healthcare will be vastly improved through artificial intelligence via customized treatment plans and timely diagnosis
  • Why Brave is taking a privacy-first approach to its AI product suite

Guest List

The amazing cast and crew:

  • Kleomenis Katevas - Machine Learning Researcher

    Kleomenis Katevas is a Machine Learning Researcher at Brave Software, where he’s focused on designing and building privacy-preserving, ML-based systems. His research interests lie in the areas of Privacy-Preserving Machine Learning, Federated Learning, Mobile Systems, and Human-Computer Interaction.

About the Show

Shedding light on the opportunities and challenges of emerging tech. To make it digestible, less scary, and more approachable for all!
Join us as we embark on a mission to demystify artificial intelligence, challenge the status quo, and empower everyday people to embrace the digital revolution. Whether you’re a tech enthusiast, a curious mind, or an industry professional, this podcast invites you to join the conversation and explore the future of AI together.