Data Product in a Box

A Box.

A box of stuff to help you build a data product.


What is in it?

Wait. Why do you care what is in the box?

Because I want to build a data product!!

Really? Do you want to build a data product?



Name a few data products you like to use.


Does nothing come to mind?

Do you track your steps? Do you have a subscription to LinkedIn Premium? Do you use a lead generation tool? Do you run ads on social media? Do you use Swiggy to order food?

All of these are data products. Obviously, they are not only data products: they are also sales products, e-commerce products, food delivery products, and so on.

But a large part of your interaction with these products is through data elements.

And these data elements are derived via processes that we are going to talk about today.

Let us put on our VR goggles and take on a new life.

You are a startup employee and you have great ideas. You present your idea to the founder and he likes it.

Your idea is to give all E-commerce websites a tool to block price scrapers. You hate price scrapers.

Your mom and dad used to run an E-commerce site that sold baby clothing. Price scrapers would run bots on your parents’ store, collect its prices, and sell that data to other baby clothing websites, which would then undercut those prices.

And most bargain-hunting shoppers will not stop to learn about the care with which your parents crafted their baby clothing, or the high quality standards they maintained in their eco-friendly workplaces with sustainable labor practices.

These bots are tools built to suck up data from websites without the website owner’s permission. Search engines like Google do the same thing, but they give value back by letting people discover you on an unbelievably large Internet. If not for search engines, your E-commerce store would never be discovered.

So yes, your idea to help websites detect bots is a great one and there is obviously a demand for it in the market. Your founder is happy.

How do you find people who own websites that will need your solution?

Check LinkedIn. Search for people who own E-commerce websites. Yes. Great option.

So when you reach out to them, what do you plan to tell them?

I know.

You want to tell them all about how great your product is.

You can go one step further and show how your product stacks up against the top 3 players in the market.

Good. This is great information to send across.

But if you know which product the E-commerce owner is already using to fight bots on his website, then you can do something special.

Let’s say the E-commerce website is using a solution called BotSwatter.

You can learn all about BotSwatter.

Its price.

Its features.

Its reviews, both good and bad, from online forums and discussion boards.

With the information collected, you can sit down and devise your strategy.

Can you price your solution lower than BotSwatter? Perhaps you can. To get a foot in the door, you can offer a generous free trial.

Can you showcase superior features? BotSwatter’s solution slows down page loads for the websites that use it.

Your solution is 10x faster.

Faster websites mean more revenue.

When Amazon takes longer than a second to load, which rarely happens, I almost never complete my purchase.

So when you promise a faster website load than BotSwatter, the E-commerce store owner will definitely want to learn more. At least if he is the kind of owner who wants higher revenue.

BotSwatter uses a simple client-side feature map to detect bots, whereas you use neural networks running on the client device to decide whether a visitor is a bot. You are 30% more accurate than BotSwatter.

More accuracy means that you show fewer CAPTCHAs to real humans. You annoy people less because you are more sure of catching bots, so real people can continue without solving silly puzzles that treat adults like kids playing “Spot That”.


But how do you find out which product the E-commerce website owner is using on his website to fight bots?

Do you want to know how?

You really do?

I can tell you the answer, and it will cost you nothing.

The answer is that you can analyze Common Crawl data. From it, you can extract information that tells you which technologies a website is using, such as the scripts and HTTP headers that a vendor’s tool leaves on the pages it runs on.
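Here is a minimal sketch of that extraction step, assuming you already have a page’s HTML and response headers in hand (for example, pulled out of a Common Crawl WARC record). It uses simple fingerprint matching: a table of patterns per technology, checked against the page. The signature table is hypothetical. BotSwatter is the made-up vendor from this story, and the `cdn.botswatter.example` URL is invented for illustration; a real detector would need a much larger, maintained list of fingerprints.

```python
import re

# Hypothetical fingerprint table: technology -> regex patterns searched in the HTML.
# "BotSwatter" and its CDN domain are invented for this example.
HTML_SIGNATURES = {
    "BotSwatter": [r"https?://cdn\.botswatter\.example/"],
    "jQuery": [r"jquery(?:[.-][\d.]+)?(?:\.min)?\.js"],
}

# Header fingerprints: technology -> (header name, regex on the header value).
HEADER_SIGNATURES = {
    "Cloudflare": ("server", r"^cloudflare$"),
}


def detect_technologies(html: str, headers: dict[str, str]) -> set[str]:
    """Return the set of technologies whose fingerprints appear in one page."""
    found = set()
    for tech, patterns in HTML_SIGNATURES.items():
        if any(re.search(p, html, re.IGNORECASE) for p in patterns):
            found.add(tech)
    # HTTP header names are case-insensitive, so normalise them before lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    for tech, (name, pattern) in HEADER_SIGNATURES.items():
        value = lowered.get(name)
        if value is not None and re.search(pattern, value, re.IGNORECASE):
            found.add(tech)
    return found


page = '<script src="https://cdn.botswatter.example/guard.js"></script>'
print(sorted(detect_technologies(page, {"Server": "cloudflare"})))
# ['BotSwatter', 'Cloudflare']
```

Running a function like this over every page in a crawl turns raw HTML into rows of “this website uses this technology”, which is the raw material for the product.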

You can do this for nearly every website the crawl has reached. That is hundreds of millions of websites.

You may only need the answer for a few thousand websites. But you can get it for millions.

You may only care about bot detection technologies and which websites use them, but you can also find out which websites use online chat tools, payment providers, affiliate programs, and web frameworks. The list is endless.
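Going from one technology to many is mostly bookkeeping: collapse individual pages into the websites they belong to, and tag each technology with a category. A sketch, assuming detections arrive as (page URL, technology) pairs. The category mapping below is a hypothetical example; Stripe, Intercom, and React are simply well-known products used as sample labels.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Hypothetical category labels for a few technologies.
CATEGORIES = {
    "BotSwatter": "bot detection",
    "Stripe": "payments",
    "Intercom": "online chat",
    "React": "web framework",
}


def build_index(detections):
    """Map category -> technology -> sorted list of hosts that use it.

    `detections` is an iterable of (page_url, technology) pairs, e.g. the
    output of running a detector over every page in a crawl.
    """
    index = defaultdict(lambda: defaultdict(set))
    for url, tech in detections:
        host = urlsplit(url).hostname  # many pages collapse into one website
        if host is None:
            continue  # skip values that are not absolute URLs
        category = CATEGORIES.get(tech, "other")
        index[category][tech].add(host)
    # Convert the nested sets into plain dicts of sorted lists.
    return {
        cat: {tech: sorted(hosts) for tech, hosts in techs.items()}
        for cat, techs in index.items()
    }


rows = [
    ("https://babyclothes.example/", "BotSwatter"),
    ("https://babyclothes.example/checkout", "Stripe"),
    ("https://babyclothes.example/shoes", "BotSwatter"),
    ("https://gadgets.example/", "BotSwatter"),
]
print(build_index(rows)["bot detection"])
# {'BotSwatter': ['babyclothes.example', 'gadgets.example']}
```

Grouping by host rather than by page is a design choice: a customer wants a list of websites, not every URL where a script happened to load.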

And with this data, you can build a website of your own, where people can come, choose a technology by name, and download a list of websites that use it.
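The “choose a technology, download a list” feature can sit on top of a small database. A minimal sketch using Python’s built-in `sqlite3` module, assuming a single table of (domain, technology) rows; the table name, column names, helper functions, and sample data are all hypothetical. The lookup a visitor triggers is one query, and the result is written out as CSV.

```python
import csv
import io
import sqlite3


def open_db(path=":memory:"):
    """Create (if needed) a table linking websites to the technologies they use."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS site_tech (
               domain     TEXT NOT NULL,
               technology TEXT NOT NULL,
               PRIMARY KEY (domain, technology)
           )"""
    )
    return conn


def add_detections(conn, rows):
    """Insert (domain, technology) pairs, skipping ones already stored."""
    with conn:  # commits the transaction if no exception is raised
        conn.executemany(
            "INSERT OR IGNORE INTO site_tech (domain, technology) VALUES (?, ?)",
            rows,
        )


def export_csv(conn, technology):
    """Return CSV text listing every domain that uses `technology`."""
    cursor = conn.execute(
        "SELECT domain FROM site_tech WHERE technology = ? ORDER BY domain",
        (technology,),
    )
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["domain"])
    writer.writerows(cursor)  # each row is a 1-tuple, i.e. one column per line
    return buffer.getvalue()


conn = open_db()
add_detections(conn, [
    ("gadgets.example", "BotSwatter"),
    ("babyclothes.example", "BotSwatter"),
    ("babyclothes.example", "Stripe"),
])
print(export_csv(conn, "BotSwatter"))
```

Passing a file path to `open_db` instead of `":memory:"` keeps the data on disk, so the crawl processing and the download website can share one SQLite file.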

And you can sell this product, charging customers money to use it.

So there you have it.

That is how you can make a data product.

Some of you are calling me out right now. You want to know more.

The box has what you need to actually learn what it takes to build a data product.

The box has a lot of material.

What is SQLite and why should you care?

What is Common Crawl data, and how can you use it?

What is Python and why do you NEED to care?

How do you process Common Crawl data and extract valuable information from it?

How do you build an app that delivers the value of the product?

How do you prepare a product pitch for your potential customers?

The box is not a course. The box is not a set of tasks.


The box is your magic trunk.

It is your flying carpet.

It will do magic only when you know how to summon its powers.

First, learn simple magic tricks.

Hide a coin before you intend to hide an elephant.

First, find an egg in your jacket before you go looking for a pigeon.

This box is a bag of tricks.

I will show you the necessary tricks for building data products.

And I will not teach you these as a professor, coach, mentor, or senior. I am not here to teach anything.

I am only sharing my notes with you. Notes I have put together as a kind of organized chaos.

The printed material inside the box is not a set of textbooks. It will not find a place as a classroom guide.

Treat it like a field manual. Use it as a rough map. Not a Google Maps kind of map, but a map made by hand. A map of a region that is not fully explored.

So the box comes with what one would call a lot of trinkets and miscellaneous items: material that will bring some organization to the chaos of building data products.

I am 80% done with putting such a box together.

The box will be given away for free.


Only those who make the effort to learn the tricks and apply them will unlock box updates.

The price of admission is proof of work.

And you will be part of a decentralized movement of building data products. Of learning how to build data products. Actually building them and delivering value to the world.

Stay tuned on LinkedIn, where I share updates.