On a Mission to Preserve a Transparent Internet
Meet Bright Data’s CEO Or Lenchner, who is on a mission to make web data collection the go-to resource for all businesses out there.
You have such an interesting story to tell – how did the idea for forming this company that’s focused on collecting publicly available data come about?
The company started from what may seem like a simple market need but is actually a very complex one. Back in 2014, we ran a very successful company called Hola, which is still the most popular content unlocking/VPN service today. Major businesses approached us with a need to openly access web data on a mass scale and increase their competitive edge, which is how we were formed.
As you know, the internet is the largest, most extensive database in the world; it is where everything happens in real time. That was true in 2014, and it is even more so today: a company that wishes to stay on top of the market must have all the information it needs to make the most strategic decisions. Whether that’s gauging customer sentiment and addressing immediate needs, deciding on the right pricing offers, or developing a new product, you simply cannot do any of these without public web data – you would get an incomplete picture that may lead you down the wrong path. Public web data allows you to know rather than rely on a calculated guess, ensuring that you are advancing in the right direction to win more, whether that’s new business or customers.
So then tell us, what exactly is public web data collection or scraping?
I will start by saying that the domain is shifting at an accelerated pace, now more than ever. Web data collection, or scraping, is simply the accessing of public web data – data that you can see with your own eyes without needing to log in or sign in to a site.
Companies that collect this type of web data for their market research or business needs access vast amounts of data, and do so frequently – as much as several times a day. We work with 7 out of the 10 leading e-commerce sites, and I can tell you that they gather public web data sometimes more than 10 times a day. This is especially the case with their competitors’ pricing data, which guides them to adjust the prices of their own products frequently so they can stay on top of this increasingly competitive market.
If you had to do this kind of job manually, you would probably need hundreds or thousands of people sitting and gathering data, and it would still take them a very long time. With our automated data collection products, a job that potentially takes weeks is reduced to mere minutes – this is the beauty of scraping and how advanced it has become. Simply put, without scraping there would be no open competition, which would mean that all consumers would end up paying more for the same services and products than they do today.
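To make the idea concrete, here is a minimal sketch of the extraction step in scraping, using only Python’s standard library. This is an illustration, not Bright Data’s actual tooling: the HTML snippet, the `price` class name, and the `PriceParser` helper are all hypothetical, and in practice the HTML would be fetched from a public URL rather than embedded as a string.

```python
from html.parser import HTMLParser

class PriceParser(HTMLParser):
    """Collect the text inside elements tagged with a hypothetical 'price' class."""
    def __init__(self):
        super().__init__()
        self.in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag's attributes.
        if ("class", "price") in attrs:
            self.in_price = True

    def handle_endtag(self, tag):
        self.in_price = False

    def handle_data(self, data):
        if self.in_price and data.strip():
            self.prices.append(data.strip())

# A stand-in for a fetched public product page (in a real collector this
# HTML would come from an HTTP request to a publicly accessible URL).
html = """
<ul>
  <li><span class="name">Widget A</span> <span class="price">$19.99</span></li>
  <li><span class="name">Widget B</span> <span class="price">$24.50</span></li>
</ul>
"""

parser = PriceParser()
parser.feed(html)
print(parser.prices)  # ['$19.99', '$24.50']
```

Run across many pages many times a day, this kind of automated extraction is what turns a weeks-long manual job into minutes.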
Sounds like you guys are growing rapidly. Any chance you can throw some numbers at us?
Yes, absolutely. The last couple of years have been incredibly interesting, and we’ve grown exponentially. To put it in numbers, in addition to partnering with the largest e-commerce sites, we are also working with 2 of the top 5 banks in the US. In addition, we’ve recently reached the 400-employee mark and are still growing rapidly. In 2021, we announced that we surpassed the US$100 million mark in revenue and acquired 3 new companies.
The data domain is like no other these days. I like to say that data is the new water – it is essential to keeping any kind of business, and our market, alive. And our most recent numbers are proof of that.
What have been the biggest challenges in growing the company?
Any company growing at such a rapid pace finds it challenging to ensure that all employees keep up with the company rhythm. It takes quite an effort to train such large numbers of employees and manage those new teams. As a company that takes pride in anticipating current and future data needs, we move fast and innovate even faster, so maintaining that pace – in product as well as in recruiting – is a big challenge, one we have had to learn to overcome.
We did a bit of research and noticed your company is also involved in pro-bono activities – what is the Bright Initiative?
The Bright Initiative is our special organisation and programme that is focused on making our company’s technology and years-long expertise available on a pro bono basis to universities, non-profits, NGOs, global policymakers, charities and more.
The COVID-19 pandemic was the catalyst for the realisation that such an organisation was urgently needed. We made our web data platform available to researchers at the time and found that demand was so great that we decided to build a pro bono organisation. Today, The Bright Initiative includes over 500 organisations. Among them, you will find over 170 universities and 96 non-profits, NGOs and public sector bodies – all aimed at using public web data to drive positive change in the world. The organisation is led today by Keren Pakes (a former journalist) and includes 7 full-time team members who provide everything from support to expertise to educational sessions and more. For example, on a monthly basis, we run 6 educational sessions involving top academic institutions or non-profit organisations.
Our partners work to tackle critical needs like combatting climate change or fighting social injustice such as human trafficking. We are also active participants in and support the UK Government’s National Data Strategy (NDS) by providing the required public web data to assess, for example, data skillsets on a national level or sharing our extensive expertise in the data domain. After all, we’ve been around for 8 years and that is a long time in the data domain.
This industry sounds complex – surely there are regulatory frameworks in place, right?
Well, not really. Besides the regulatory framework that GDPR and CCPA provide to deal with data privacy, which we are very happy about, there is no real framework that guides operators in web data collection. For this reason, we are now involved in several committees and inquiries dealing with AI and web-data-collection ethics. As a company, we are self-regulated and take pride in our transparent, compliance-driven procedures and practices. This is unprecedented in our industry, and I encourage all other companies and operators in this space to follow our lead.
When you look at this domain from any direction, you quickly find that a regulatory framework actually makes you a better company and most likely a better innovator. After all, customers want to know that they are in safe, trustworthy hands… Trusting your data starts with trusting your data provider, and that is a commitment every data provider must honour.