Beginner’s Guide to Big Data

For a quick overview of big data, read Bernard Marr’s The Complete Beginner’s Guide to Big Data Everyone Can Understand (Forbes, March 2017). He sums up the basics you need to know in just a few paragraphs. Marr is a consultant, lecturer, and author of several books, including Data Strategy.


Illustration by Camelia Bobon

We are in the midst of a data explosion. Machine-related data is collected via sensors, transactions, and home monitoring and security devices. People-related data is collected via our smartphones, social media, shopping, and communications. Additionally, email, text, social messaging, photographs, videos, webcam and satellite images, and other kinds of unstructured data are captured and stored in vast data centers around the world.

Data scientists mine data by building data models and running simulations against structured and unstructured data to look for patterns and make predictions. Sophisticated yet easy-to-use data tools make it easier to access the data. “As a service” subscriptions make it possible to mine data without expensive overhead, thereby democratizing data access for even small-scale researchers.

The upside of collecting and analyzing all this data is that we have the potential to solve some of our biggest problems – curing diseases, feeding the hungry, exploring space, preventing crime, predicting and responding to disasters, and making life generally better. The downside can be invasion of data privacy, threats to data security, and potential data discrimination.


I signed on to a new project at work that involves big data, so I’m hitting the books, web, YouTube, and my network to learn more. There are a lot of related terms that I’ve heard, so I’m sorting that all out too – analytics, machine learning, Internet of Things (or IoT), and all their intersections. On a recommendation, I recently read Thingalytics by Dr. John Bates (Software AG, 2015).


Bates defines “thingalytics” as “the use of real-time analytics and algorithms to make sense of the fast Big Data arising from the Internet of Things.” The book includes:

  • Chapter 1 – It’s All About Me: empowering a new generation of personalized marketing and customer experience applications.
  • Chapter 2 – Machines with Feelings: how smarter machines are enhancing efficiency, reducing cost, and improving customer experience.
  • Chapter 3 – Home is Where the Smart Is: how intelligent apps are revolutionizing how we cook, clean, wash, and watch TV.
  • Chapter 4 – Take Two Smart Pills and Call Me in the Morning: the hospital of the future (and today).
  • Chapter 5 – I’m the Chairman of the Board: smart self-learning algorithms that function as the brains, especially in automated trading.
  • Chapter 6 – RoboCops: how to spot and navigate around threats and problems.
  • Chapter 7 – Planes, Trains, and Automobiles: smart logistics and autonomous transport systems.
  • Chapter 8 – The Technology Behind Thingalytics: technology and infrastructure needed to support Big Data streaming analytics in the cloud.
  • Chapter 9 – Go Forth and Use Thingalytics: summary and the way forward to use Thingalytics to improve business.

In summary, the Internet of Things is all about digitizing everything in the real world and integrating it into the Internet. Real objects are equipped with sensors that capture their status and make it available; an app somewhere consumes, analyzes, and acts on that status, and may even be self-learning. At its best, lives can be saved, fraud avoided, customers delighted, and carbon emissions reduced. At its worst, a rogue command can lose millions of dollars and wreck reputations in seconds.
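To make the sense, analyze, and act loop a little more concrete, here is a minimal Python sketch. It is only an illustration and is not taken from the book: the function name `find_anomalies`, the temperature readings, the window size, and the threshold are all hypothetical. It flags any reading that strays too far from the average of the few readings before it, which is roughly what a very simple monitoring app might do with a stream of sensor data.

```python
from collections import deque
from statistics import mean


def find_anomalies(readings, window=3, threshold=5.0):
    """Return (index, value) pairs for readings that differ from the
    average of the previous `window` readings by more than `threshold`.

    All names and default values here are illustrative assumptions.
    """
    recent = deque(maxlen=window)  # holds only the last `window` readings
    alerts = []
    for i, value in enumerate(readings):
        # Only compare once a full window of earlier readings is available.
        if len(recent) == window and abs(value - mean(recent)) > threshold:
            alerts.append((i, value))  # the "act" step: record an alert
        recent.append(value)
    return alerts


# Hypothetical temperature readings from a single sensor.
temps = [20.0, 20.5, 21.0, 20.8, 35.0, 21.1]
print(find_anomalies(temps))
```

A real system would read from live devices, handle many sensors at once, and likely use far more sophisticated analysis, but the basic shape (collect a status, compare it to what is expected, react) is the same.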

It is exciting to think of another technical revolution – one equal to the revolution that moved most people from an agrarian economy to a service economy. Think of the big problems that could be solved, the diseases that could be cured, and the interesting new jobs that could be generated. Another part of me ponders the incredible resources required to support massive data collections, the potential invasion of privacy, and the millions who will lose their jobs if they can’t retrain fast enough. I also resist the idea of being reduced even more to a consumption object who is the target of laser-focused propaganda vying for my dollars. But it’s a brave new world, and best to meet it head on!