The 3 V’s every CIO needs to know about big data analytics

Hadoop has become so popular over the last few years that a whole ecosystem of related projects has sprung up around it, making it more usable and attractive to work with.

What’s the problem Hadoop is solving for CIOs?

Well, the problem revolves around data. Petabytes and terabytes of it. Think of big companies like Twitter, Google, Facebook, LinkedIn, eBay and Amazon. How do these companies make sense of their data? Better yet, how do they do it efficiently and in a timely manner?

How Is Big Data Characterised?

Big data can be characterised using the three “V’s”:

  • Volume

      a. How do we work with terabytes and petabytes of data?

  • Velocity

      a. How do we do it in a timely manner?

  • Variety

      a. How do we work with the many formats of the data?

So the answer to these big data challenges is distributed computing: spreading the data across many machines that work together to store, process and serve it. That is how we tackle all three of the V’s.

Distributed Computing vs Traditional Computing

Traditional Computing

  • Big, expensive servers with lots of hard drives, RAID-enabled for fault tolerance and performance – or even a cluster of these servers. The challenge? Disk transfer rates. Disk transfer rates haven’t improved much over the past few years, so reading 1 TB of data at 100 MB/s takes roughly 2.8 hours. That’s not feasible, and it can bring the network to a halt.

Distributed Computing

  • 100 low- to medium-spec machines, each reading its share of the data in parallel, would take only around 2 minutes to read 1 TB.
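The back-of-the-envelope arithmetic above is easy to check (assuming, as in the example, a sustained read speed of 100 MB/s per disk):

```python
# Back-of-the-envelope read times for 1 TB of data at 100 MB/s per disk.

TB_IN_MB = 1_000_000          # 1 TB expressed in MB (decimal units)
READ_SPEED_MB_S = 100         # assumed sustained read speed per disk

# One machine reading the whole terabyte sequentially
single_machine_seconds = TB_IN_MB / READ_SPEED_MB_S
print(f"1 machine:    {single_machine_seconds / 3600:.1f} hours")   # ~2.8 hours

# 100 machines each reading their 10 GB share in parallel
parallel_seconds = (TB_IN_MB / 100) / READ_SPEED_MB_S
print(f"100 machines: {parallel_seconds / 60:.1f} minutes")         # ~1.7 minutes
```

The parallel figure ignores coordination and network overhead, which is why "around 2 minutes" is the honest estimate.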

How Does Hadoop Come Into the Picture?

Hadoop is a software-based distributed computing model. It is open source and built in Java.

It is designed around two hardware characteristics:

  1. Strengths – the combined CPU resources of many machines.
  2. Weaknesses – disk transfer rates and network bandwidth.

How are big companies using Hadoop to solve big data challenges?

Hadoop is made up of two core components:

  • HDFS – Hadoop Distributed File System = storage.
  • MapReduce = processing (pulling data out of the cluster).
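The MapReduce model itself is simple enough to sketch in a few lines of plain Python. Below is a toy word count – the canonical MapReduce example – to show the two phases; real Hadoop jobs are typically written against the Java API and run across a cluster, so this is an illustration of the idea, not of Hadoop’s actual API:

```python
from collections import defaultdict

def map_phase(lines):
    """Map: emit a (key, value) pair for every word seen."""
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def reduce_phase(pairs):
    """Reduce: sum the values for each key (in Hadoop, the
    shuffle step groups pairs by key between the two phases)."""
    grouped = defaultdict(int)
    for key, value in pairs:
        grouped[key] += value
    return dict(grouped)

lines = ["big data big cluster", "big data"]
counts = reduce_phase(map_phase(lines))
print(counts)  # {'big': 3, 'data': 2, 'cluster': 1}
```

The point of the model is that the map and reduce functions contain no knowledge of where the data lives, so Hadoop can run thousands of copies of them next to the data on HDFS.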

These are the foundations of Hadoop. As Hadoop evolves and matures, many more companies – not just the traditional big players – will start implementing it to manage big data.

As businesses grow, the people – or even the organisation put in place in the earlier stages – don’t always grow with them. From time to time, reputations are lost and overall cyber security control is neglected.

CIO Cyber Security is specifically tailored to the different stages of business; we have a program that will suit you and help you get your business cyber safe.

If you’re a CEO or business executive, we’d love to see you at our upcoming cyber security breakfast.

If you’re a CIO or technology leader, get a FREE copy of my book.

Meet Andrew Constantine

Andrew Constantine is an entrepreneur and a cyber security advisor who is changing the world of cyber security. He is the CEO of Australia’s largest community of technology and business executives.