Founded in 2015, Oxylabs is a market-leading web intelligence collection platform that puts ethics and good business practices first. With over 102 million ethically gathered proxies, Oxylabs serves more than 3,500 clients worldwide, making it the fastest-growing web intelligence company in Europe. Denas Grybauskas, Head of Legal at Oxylabs, took a moment to chat with us about dataset use cases, the role of AI in web scraping, and the battle between right and wrong.
EWDCI: Can you give us the elevator pitch for Oxylabs?
Denas Grybauskas: For almost ten years, Oxylabs has been a market-leading public web data acquisition platform, developing next-generation proxy and data scraping solutions. Today, thousands of organizations worldwide, including Fortune Global 500 companies, rely on our products to gather competitive web intelligence for a variety of use cases—from pricing analysis and market research to large-scale cyber threat monitoring.
We believe in our mission to make the vast benefits offered by public web data accessible to everyone, no matter if it’s an enterprise business, an independent scientist, or a nonprofit fighting disinformation.
Data is the backbone of the digital economy, and we see Oxylabs at the forefront of driving responsible and ethical data acquisition practices with constant technological innovation.
EWDCI: What do you wish more people understood about web scraping?
DG: First, that the data collection industry is no longer a niche field—millions of people worldwide use it daily without even knowing it. Price comparison sites, travel fare aggregators, and even search engines are powered by web crawlers and scrapers. Ecommerce businesses wouldn’t be able to compete on a global scale without price intelligence driven by web scraping. Cybersecurity specialists would go blind without it, as manually monitoring online threats in real time is simply impossible.
Also, there’s a tendency to associate web scraping with shady activities—businesses often try to hush up the fact they are using web scraping because they fear it will hurt their reputations.
As with almost any digital technology, web scraping can be used for both good and bad. Over the years, the web intelligence market has evolved considerably, and its main players today are large, reputable businesses with legitimate use cases—they collect public data at scale to drive innovation and business growth.
The last thing I would like to stress is that people mistakenly associate web scraping with collecting personal data, when it is mostly used to collect publicly available data. That is not to say there are no people with questionable morals using vast IP networks for illicit purposes. This is why we at Oxylabs have robust know-your-customer procedures that allow us to filter and ban suspicious users, as do other reputable scraping companies.
[Note: you can find Oxylabs’ KYC procedures here.]
EWDCI: How have global events impacted your growth and service offerings in the last few years?
DG: We experienced a huge boost during the pandemic, as most businesses moved online, including brick-and-mortar stores. They quickly understood that actionable, data-driven insights are one of the most underexplored competitive resources—the data is out there on the open Internet, and one only needs the tools to gather it conveniently.
Before that, web crawling and scraping technologies were mainly used by search engines, some ecommerce companies, and financial-sector companies that needed alternative data sources to make investment decisions and predict economic trends. After the pandemic, we noticed a surge of interest from businesses with various use cases, from travel and hospitality (fare aggregation) to market research and cybersecurity (brand protection and threat monitoring).
Currently, we are witnessing promising growth in a new vertical: AI training. Developers need enormous amounts of multifaceted data to get AI systems off the ground, and innovations in the field of web scraping are partially responsible for the AI breakthrough. We expect this use case to grow further.
EWDCI: You’ve recently started offering a dataset service. Which sets have been the most popular so far?
DG: Ecommerce data has been very popular, which isn’t surprising for us since ecommerce companies make up the largest part of our clientele. For them, acquiring timely web intelligence is crucial to understanding competitor strategies, consumer behavior, and general market trends. They use data to optimize digital shelves, predict product popularity, and inform dynamic pricing strategies. However, we expect that acquiring custom datasets that are cleaned, structured, and suitable for analysis will become increasingly popular among other businesses that need actionable insights.
EWDCI: What effect has the rapid growth of AI had on your service offerings?
DG: We sense a growing interest from AI developers who need to gather training data from different sources. However, AI affects the web scraping business in ways beyond machine learning (ML) model training. AI and ML have allowed us to enhance our own technologies, making them more effective. We introduced the first patented AI-powered adaptive data parser, and our Web Unblocker uses AI for proxy management and response recognition. So, AI and ML help advance scraping technologies further.
On the other hand, AI and ML also power anti-scraping measures that are becoming increasingly difficult for many businesses to detect and overcome—from ecommerce to cybersecurity companies that have to deal with professional threat actors blocking their threat intelligence efforts. This is a constant battle.
EWDCI: Where does government policy interface with your work on a daily basis?
DG: As an EU-based company, we often face more stringent regulatory requirements than our competitors in the U.S. or Asian countries. There are always more checks that we need to bear in mind and manage, from GDPR to the newly released EU AI Act and various other directives: the Data Act, Digital Markets Act, Digital Services Act, Data Governance Act, and others.
EWDCI: Why did your team find it important to join the EWDCI?
DG: The data collection industry is relatively young, and clear legislation is still lacking. This is one of the reasons for the cases of abuse I mentioned when talking about reputational issues. While regulation is lacking, it is important to safeguard the industry from within. So, together with other leading web scraping companies, we took the initiative to establish the EWDCI as a platform that could act as the voice of different industry players, giving them the opportunity to discuss and enforce an ethical code of conduct and share best practices.
The EWDCI is our best way to show that the web scraping industry adheres to ethical data collection requirements and respects data fairness. Public data acquisition is an important segment of the Internet, and organizations that use web intelligence for legitimate purposes can provide value for consumers and the economy in general, as long as they respect ethical and responsible data handling principles.