Visualizing data is the art of summarizing all relevant information in a single view. Data visualizations are all around you, and you use them more often than you might notice. Take a map, for example. A map visualizes street names, districts, cities, countries, borders, soil, latitude, longitude and much more. Still, it is comprehensible, and the human mind is able to extract meaning from it. Adding your own location and a compass makes navigating an unknown maze of streets a piece of cake.
A map can be simplified even further; take a subway map. When you get on the subway, you probably already know which station you want to go to. Which streets you are travelling underneath and in exactly which direction you are heading isn’t that relevant in this situation. Therefore, a schematic view of reality containing only a few colored lines is sufficient to get you to your destination. Based on these lines you know where to change lines and when to get off.
As in the example above, the best visualizations show only the information you need in a particular situation, nothing more and nothing less. This allows you to get the maximum value out of them as fast as possible.
But what does data science have to do with the art of visualization? Isn’t data science just crunching, analyzing, and modelling numbers? Actually, visualization is a very important part of being a data scientist. In this article I will show you why.
A data scientist can come up with the best possible models and predictions, but stakeholders have to trust them, and they have to be embedded in business processes, before they add any value. Visualizing how a solution works and what exactly comes out of it can make even the most complex algorithms understandable for everyone, including people with no knowledge of data science at all. This helps ensure that the solution is supported by all stakeholders and deployed optimally.
There are different formats in which the outcomes can be presented, such as slides or (real-time) dashboards. Which format is most appropriate depends on who will use the outcomes and for what purpose. When they serve as input for a strategic decision by, for example, a CEO, extracting the outcomes once and putting them on a slide might suffice. But when a data science solution helps a decision maker make operational decisions on a daily basis, a dashboard might be a better medium. Even when a data scientist’s output is incorporated directly into operational systems and triggers actions automatically, it is important that this process can be monitored in an intuitive way.
One might say that a traditional report can do this job just as well by summarizing the outcomes in a table containing exactly the same information. But I think that a good visualization is much easier on the eye, and it has other advantages as well.
Often, a business’s data is distributed over different systems and/or tables. The human brain is not capable of memorizing, combining and interpreting all of this information at the same time. This is where a good visualization can add great value. What makes an insight good is that all relevant information can be intuitively understood and interpreted at a glance.
The chart below comes from a solution Building Blocks once implemented for one of its clients. It is a visualization of events (circles), each with its corresponding number of bookings (fill of the circles). This insight helped the decision maker determine whether reservations had to be moved to another (similar) event, since events cannot be too full or too empty. In addition, it gave the decision maker instant insight into demand versus capacity per type of event, and thus served as input for deciding whether to schedule an extra event.
With this example I want to show you how much information can be visualized in a single view. To that end, we have selected an event for which demand exceeds capacity (see the colored fill of the circle in the bottom middle). The other circles in this graph represent its substitutes (similar events on another date and/or at another location). The closer a circle is to the selected event, the better a substitute it is. A lot of information can be retrieved just by looking at one circle: the date, the distance to the selected event, the capacity, and the bookings. The gray stripes in the background represent holiday periods. Moreover, the color of each circle’s border shows the predicted performance of that event. Green indicates that the event is predicted to be full at the end of the registration period without any action being taken, yellow indicates that some additional action is necessary to fill up the event, and red predicts that the event will not reach the minimum capacity and that serious action is needed.
Summarizing, this chart shows information from four different systems, in seven dimensions/measures, and combines it with a prediction in just one clear image. This makes it possible to interpret all relevant information in a matter of seconds, focus on the right event, and make the optimal decision!
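The traffic-light rule behind the border colors can be sketched in a few lines of Python. This is only an illustration, not the logic of the actual solution: the Event fields, the border_color function, the example events, and the exact thresholds (green when the prediction reaches the maximum capacity, red when it stays below the minimum) are all assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Event:
    # All fields are hypothetical names chosen for this sketch.
    name: str
    min_capacity: int        # below this the event is not viable (assumed)
    max_capacity: int        # number of places available
    predicted_bookings: int  # forecast at the end of the registration period


def border_color(event: Event) -> str:
    """Map a booking prediction to a traffic-light border color."""
    if event.predicted_bookings >= event.max_capacity:
        return "green"   # expected to fill up without any action
    if event.predicted_bookings >= event.min_capacity:
        return "yellow"  # viable, but extra action is needed to fill it
    return "red"         # expected to stay below the minimum capacity


# Made-up example events, purely for illustration.
events = [
    Event("Workshop A", min_capacity=10, max_capacity=30, predicted_bookings=32),
    Event("Workshop B", min_capacity=10, max_capacity=30, predicted_bookings=18),
    Event("Workshop C", min_capacity=10, max_capacity=30, predicted_bookings=6),
]

colors = {e.name: border_color(e) for e in events}
print(colors)
```

Keeping the rule in one small function like this would let the same colors be reused in a slide, a dashboard, or a monitoring view, whichever format suits the audience.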
What lies behind a data science visualization
This is what makes a visualization so powerful, and it shows why it is such a valuable addition to a data science solution. A good visualization can make anyone understand data science without directly showing all the complexity that lies behind it. It only shows the outcomes and possibilities, and isn’t that much more exciting to look at?