Founded in 2016, Coresignal specializes in “always fresh” business data records from multiple public web sources with hundreds of data points. Their datasets cover the essential information about companies, professionals, and jobs. Think farm to table, but with data. We spoke with Karolis Didziulis, Product Director at Coresignal, about market developments, transparency, and the data you don’t want to collect.

EWDCI: Can you give us the elevator pitch for Coresignal?

Karolis Didziulis: Coresignal transforms large amounts of unstructured, raw public data from the web into high-quality datasets that are easily accessible and ready to work with. Our infrastructure and experience allow us to ensure a high degree of data freshness while also continuously expanding our B2B datasets. 

We work with 500+ data-driven startups, enterprises, and investment firms that use the data to extract valuable insights or build data-driven solutions and products.

EWDCI: What do you wish more people understood about web scraping?

KD: Web scraping enables businesses to access vast amounts of public data from the web, transform it into immensely valuable insights, and build cutting-edge software products on top of those insights. This can drive smarter decisions in various industries and also power products such as intelligent recruitment platforms, powerful prediction and AI models, search engines, and much more.

Fundamentally, web scraping unlocks great economic value that is often overlooked by the wider public, and it is becoming an increasingly important engine of modern, technological economies.

EWDCI: Do you notice types of businesses using your services that have not done so in the past?

KD: Much of our customer base has traditionally operated in the sales, HR, investment, and research verticals. These sectors have a consistent need for public web data, making them early adopters of our services. However, we’ve recently seen a significant increase in organizations using our data for market research, academic purposes, and risk assessment. We’re also noticing growing interest from many other verticals with niche use cases where web data comes in handy, such as finance, banking, KYB [Know Your Business/due diligence], and cybersecurity.

EWDCI: How have global events impacted your growth and service offerings in the last few years?

KD: Despite widespread layoffs in the tech sector and a general reluctance among software companies to expand their vendor lists, we’ve experienced a significant increase in inquiries about our data products. 

Although the market has become more competitive, our clients rely heavily on our data from a product standpoint, so whenever other tools are cut, we see a corresponding rise in demand.

Like many other companies, we have noticed a trend toward cost optimization, which means prospective clients pay more attention to data attributes beyond dataset size alone. These attributes include data freshness, quality, the extra value vendors provide on top of raw datasets (such as enriched or cleaned data), and the ability to deliver data in a stable, reliable manner. This trend guides how we develop our data products to help clients save resources and work more efficiently.

We’ve been focused not only on enhancing our data quality and making it more accessible to users but also on educating the market about the use cases of public web data and its importance for companies. As a result, more companies are now utilizing this data and recognizing its value.

EWDCI: What effect has the rapid growth of AI had on your service offerings?

KD: The AI and LLM sectors have seen a surge in market entries. Existing players are integrating this technology to enhance their product offerings while new companies are emerging. There is significant interest in contextual information—where AI proves invaluable—as well as in new technologies that provide specific insights. 

Many companies are exploring ways to expand their products by combining customer-generated data with publicly available web sources. We have also used LLMs for data filtering, classification, entity resolution, and insight gathering, which has greatly advanced our product development.

EWDCI: Where does government policy interface with your work on a daily basis?

KD: We believe that government policies are crucial in guiding data vendors to maintain ethical and legal standards, and we are committed to complying with the highest policy standards for responsible web data collection.

We follow the latest developments in relevant case law regarding public web data collection. We also keep up to speed with industry best practices, paying particular attention to cybersecurity, privacy, intellectual property, and developments in commercial contract law.

Furthermore, we regularly engage with our clients to discuss government policies and their business impact. This includes answering their questions, completing due diligence questionnaires, and reviewing risk management practices to ensure our data practices comply with the highest regulatory standards.

EWDCI: Why did your team find it important to join the EWDCI?

KD: The web data aggregation industry is still relatively young but quickly maturing. As a co-founding member, we recognized the importance of fostering industry-wide alignment by establishing key principles for ethical data collection and maintaining an open dialogue among companies in this sector.

Our goal is to advocate for a responsible and ethical approach to web data collection. This includes principles such as collecting only explicitly publicly available data, minimizing data collection, and avoiding sensitive information. We also recognize the need to promote transparency and accountability in the industry by encouraging openness about data collection practices and respecting individuals’ privacy rights.

We believe in engaging in public discussions to address current challenges. This is particularly relevant today, amidst rapid technological advancements and the emergence of highly capable AI systems. It is crucial to educate both consumers and legislators about the complexities of this field, share our experiences, and highlight the value the industry brings to organizations worldwide.