Launched in 2010, Zyte specializes in ethical data extraction and web scraping. Passionate about education around web data collection, Zyte also runs the annual Web Data Extraction Summit. We spoke with Sanaea Daruwalla, Chief Legal & People Officer at Zyte, about generative AI, setting a positive example, and the importance of keeping your eye on the ball.

EWDCI: Can you give us the elevator pitch for Zyte?

Sanaea Daruwalla: Zyte API is essentially the API to the internet. You can extract publicly available data quickly, compliantly, and hassle-free, with high-quality results.

EWDCI: How have the recent leaps forward in AI changed your product strategy? 

SD: We’ve incorporated artificial intelligence into our product offerings for quite some time now. We have a patent on our automatic extraction technology, which is based on machine learning models. We’re now starting to incorporate some LLMs [large language models] into our products for more customizable AI-driven data extraction. So it’s always been at the forefront of our minds. We have an incredible data science team that’s building some of this out, and now we’re leveraging some of those great third-party apps to ramp that up even further.

On the customer side, there are a lot of AI companies who are now looking for web data to train their own internal models and to build out their systems. So we’re looking at it from both those perspectives.

EWDCI: What do you wish more people understood about web scraping?

SD: I wish they understood that they all use it on a daily basis. A lot of times, when you say “web scraping”, people are like, “Oh, isn’t that shady or bad or something?” The reality is, we all have apps on our phone that rely on web scraping, and search is reliant upon it as well. So much of the technology that we use on a day-to-day basis depends on it, and when used ethically, it is an incredibly powerful thing that helps all of us work and live more efficiently.

EWDCI: How have global events impacted your growth and service offerings in the last few years?

SD: The largest global event, if you will, that has impacted a lot of tech companies was when OpenAI released ChatGPT. I think that opened the world’s mind to generative AI in a way that we hadn’t seen before. We’ve incorporated some of these concepts into our product strategy, and we’re seeing many more customers come in thinking about AI. More than ever before, they’re asking us about data to train internal LLMs or to fine-tune a model for their particular needs. We’re seeing that across all industries.

You know, I think the economy over the last few years has certainly impacted all tech companies in a way. I think there have been slowdowns in growth across the board. Our industry has managed to weather the storm quite well, so we’re lucky in that regard.

EWDCI: Have you noticed certain web-scraping use cases becoming more popular recently?

SD: There are a lot of use cases out there for real estate data, jobs data, and of course product data. We see growth in the “alt data” sector, which is data for investment firms and hedge funds where they’re using all sorts of different alternative data sources to inform their investment decisions—and part of that is web-scraped data. The big ecommerce companies are of course using web scraping for pricing data and competitive intelligence.

When it comes to artificial intelligence, there are some Gen-AI companies that know exactly the data they need to feed into their models, but there are so many companies out there that are still figuring out what their AI strategy is going to look like and how they can use web data to feed into that strategy. So that’s kind of the next big wave, I think.

EWDCI: Where does government policy interface with your work on a daily basis?

SD: There are many laws or regulations depending on where you are that interface very closely with web scraping, and probably the two biggest ones are copyright laws and data protection laws. There are several laws in this regard that vary from country to country. So we think about copyright and personal data a lot. 

In the U.S. there’s also the Computer Fraud and Abuse Act, but we’ve had some positive decisions from the Supreme Court and the Ninth Circuit Court of Appeals that have taken web scraping out of that realm for the most part. If you were to look at a web scraping lawsuit, you’d see references to contract laws and trespass laws and unfair competition laws, but the crux of it remains in copyright and data protection. So that’s what we focus on. We’re also seeing new AI directives such as the EU AI Act and the executive order in the U.S. There are laws popping up all over the globe around AI, so we need to monitor those as well, given that we could potentially be feeding data into training datasets and that we’re building AI tools within our own products.

EWDCI: Why did your team find it important to join the EWDCI?

SD: We touched upon that a bit earlier, with people thinking “web scraping” is this bad word. The truth is there can be bad actors in web scraping, right? Zyte has always been focused on compliance and ethics: we want to be that trusted provider for organizations that want to go about data collection the right way. We want them to be confident that they’re working with a partner in compliance and ethical web scraping, so joining the EWDCI was a no-brainer for us. It had always been something we were doing internally already, so why not try to broaden the scope and make it part of the web scraping market? And the more web scraping companies we can bring into it, I think the better reputation we can build for the industry. This is also the perfect opportunity to give some of the smaller web scraping companies guidance on what the best practices are and how to make their practices compliant. So yeah, it was an easy decision for us: ethics and compliance are a big part of who we are.