What: Annual inflation rate and 3-month interbank rate. When: Every month from January 1990 till January or February 2022 (latest data available). Where: All countries available in the OECD database (OECD countries + some other countries) which have data for 2022 (that’s why no China here) except Luxembourg. Source: OECD
What: The chart shows the average daily gain in $ if $1000 had been invested on the date shown on the x-axis. The total gain was divided by the number of days between the day of investing and June 13, 2021. Gains were calculated on 30-day average prices. When: From March 28, 2013, till June 13, 2021. Source: investing.com and coingecko.com
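The calculation described in this caption can be sketched roughly as follows. This is only an illustration: the function name, its parameters and the prices are hypothetical, and the prices passed in are assumed to be already smoothed to 30-day averages.

```python
from datetime import date


def average_daily_gain(buy_price, end_price, buy_date, end_date, invested=1000.0):
    """Average gain in $ per day for `invested` dollars put in on `buy_date`.

    Both prices are assumed to be 30-day average prices (an assumption
    of this sketch, following the caption).
    """
    days_held = (end_date - buy_date).days
    if days_held <= 0:
        raise ValueError("end_date must be later than buy_date")
    # Total gain in dollars over the whole holding period.
    total_gain = invested * (end_price / buy_price - 1.0)
    # Spread the total gain evenly over the days held.
    return total_gain / days_held


# Hypothetical prices, for illustration only (not taken from the chart data).
gain = average_daily_gain(100.0, 400.0, date(2021, 6, 3), date(2021, 6, 13))
```

Dividing by the number of days held makes early and late investment dates comparable, which is the point of the chart.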
The question of whether it is easy to replicate the default settings of one charting tool in another has bothered me for some time. Are the default settings more universal or less universal? Do different vendors have different attitudes towards what the default settings should be?
I chose to work with a line chart because different tools arrange multiple series in a bar chart differently – some stack them, some do not. By adjusting this arrangement I would lose the defaultness, while without any corrections the charts would be less comparable. I made all charts square, so they fit better in the grid.
This comparison does not include online tools like Datawrapper because a significant part of their settings concerns interactions (tooltips, highlights, etc.), and here only static images are compared. I would have liked to include JMP, but my trial period had already expired. Python and JavaScript libraries are not included, because I don’t know how to use them.
Insights about the defaults:
All have horizontal gridlines; ggplot2 and Tableau also have vertical ones.
Only Google Data Studio and Tableau highlight the zero line, although Tableau’s highlight is barely noticeable.
Blue and red or orange are within the first three choices in every palette.
ggplot2 looks exceptional with its grey panel.
The default settings of Tableau make the least sense because they are configured for more charts with more legends. One chart with one legend looks a bit weird.
Grey squares at the top right of Google Data Studio charts are how the control buttons are rendered as an image.
Insights about the comparisons:
Of course, ggplot2 manages to replicate even the most complex cases. The biggest challenge was using the Google font from Google Data Studio, because the library "showtext", which seemingly allows achieving this, does not work well with ggplot2.
Settings of ggplot2 itself were the most difficult to replicate.
Tableau was the only software that could not replicate the exact colours of lines, because a colour must be chosen from a predefined palette there.
It was quite annoying that Power BI and Google Data Studio could only export to PDF; however, they are not meant to make pretty pictures after all.
Somehow the square charts from Excel lost their squareness after being saved as images.
Google Data Studio insisted on a black line indicating zero and refused to show vertical gridlines. Maybe I just don’t know this tool well enough or maybe these are the limits.
Adjusting the limits of the x-axis was always a challenge; the y-axis usually allows far more freedom.
Adjusting legends was always the most difficult part. The legend is what distinguishes one tool from another.
I believe this exercise is of little use, but it was fun to do it!
We have both vaccines and strict nationwide measures to bring the spread of the disease under control.
Hoping that things will only get better, I am tracking the spread of Covid-19 in Lithuania and around the world. The data are weekly. Updated irregularly.
Not everyone who has this disease is tested and recorded, because there is a shortage both of tests and of workforce in the medical sector, so a person may fall ill and die unnoticed. Also, because of the influx of patients in hospitals, people who do not receive medical help in time may die of other diseases from which they could have been saved. That is why it is important to monitor excess deaths rather than Covid-19 deaths.
Data on recoveries are especially unreliable, because the priority is always to test the newly ill, not the recovered. Considerably more people have recovered than official statistics around the world report, so there are also fewer people currently ill. Here the number of recovered people is estimated theoretically: if a person has not died within 4 weeks of falling ill, they are counted as recovered. This is not precise, but it is better than the official data.
This chart shows how many Lithuanians live in foreign countries and vice versa; it does not show flows, i.e. migration.
It is surprising that more Lithuanians live in Russia and Poland than in Norway or Ireland.
Source: United Nations, Department of Economic and Social Affairs, Population Division (2019). International Migrant Stock 2019. (United Nations database, POP/DB/MIG/Stock/Rev.2019).
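A minimal sketch of the four-week recovery rule, assuming weekly cumulative counts of cases and deaths as input. The function name, the input layout and the numbers are hypothetical, and reading the rule as "cases from four weeks ago minus all deaths so far" is one possible interpretation, not necessarily the exact calculation behind the chart.

```python
def estimate_recovered(cum_cases, cum_deaths, lag_weeks=4):
    """Estimate cumulative recoveries per week.

    Everyone who was a confirmed case `lag_weeks` ago and is not counted
    among the deaths so far is treated as recovered (an assumption).
    Both lists are weekly cumulative counts of equal length.
    """
    recovered = []
    for week, deaths in enumerate(cum_deaths):
        if week < lag_weeks:
            # Nobody has been ill long enough to count as recovered yet.
            recovered.append(0)
        else:
            recovered.append(max(cum_cases[week - lag_weeks] - deaths, 0))
    return recovered


# Hypothetical weekly cumulative counts, for illustration only.
cases = [10, 20, 30, 40, 50, 60]
deaths = [0, 0, 1, 1, 2, 3]
recovered = estimate_recovered(cases, deaths)
# People currently ill = cases so far minus deaths minus estimated recoveries.
active = [c - d - r for c, d, r in zip(cases, deaths, recovered)]
```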
In the Northern hemisphere the summer is warmer than the winter (i.e. normal), in the Southern hemisphere the winter is warmer than the summer (i.e. Australian), around the equator there is not much difference during the year.
What: The difference between monthly mean temperature and annual mean temperature. When: Some weather stations have data going back to the 18th century. Where: All the weather stations in the world, binned into 10-degree latitude bands. Only stations with full-year datasets were used in the calculations. Source: Global Historical Climatology Network-Monthly (GHCN-M) temperature dataset https://www.ncdc.noaa.gov/ghcn-monthly
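The two steps behind this chart, the deviation of each month from the annual mean and the latitude binning, might look roughly like this. The function names are hypothetical, and whether the bands start at multiples of 10 degrees (as here) or are centred on them is an assumption.

```python
import math


def monthly_deviations(monthly_means):
    """Difference between each monthly mean and the annual mean.

    Requires a full year of data, mirroring the full-year rule above.
    """
    if len(monthly_means) != 12:
        raise ValueError("a full year (12 monthly means) is required")
    annual_mean = sum(monthly_means) / 12
    return [m - annual_mean for m in monthly_means]


def latitude_band(latitude, width=10):
    """Lower edge of the latitude band that contains `latitude`."""
    return math.floor(latitude / width) * width


# Hypothetical station: six cold months at 0 degrees, six warm months at 12.
deviations = monthly_deviations([0.0] * 6 + [12.0] * 6)
```

Using `math.floor` rather than integer truncation keeps southern latitudes in the right band, so -33.9 lands in the band starting at -40, not -30.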
The question arises because we’re having Climate Change and not Global Warming. Only 6 weather stations in the world have a statistically significant negative temperature trend since 2000. Most of them are located around the equator, one is in Antarctica. There are 391 weather stations that have at least 5 full years of data since 2000 and which have a significant trend (p-value is less than 5%). In 385 of those stations, the temperature is rising.
When: 2000 January – 2020 December. Where: Weather stations that have at least 5 full years of data during the period in question and have a significant regression coefficient (p-value < 5%). Source: Global Historical Climatology Network-Monthly (GHCN-M) temperature dataset https://www.ncdc.noaa.gov/ghcn-monthly
There is a variety of tools for dataviz, and for someone who is just starting it is quite difficult to choose which one to learn. One possible criterion might be whether a tool is used by professions focused on dataviz. R is focused on statistical analysis, Excel on spreadsheets, and Python might be used for everything. So what is the tool that dedicated dataviz professionals use? The conclusion is that there is no such tool, because dataviz is more of a skill useful in many fields (financial reporting, UX design, journalism, etc.) than a profession in itself.
Probably among the most often visualized datasets in the world are the profit-loss statements of corporations, presented to executives in PowerPoint decks month after month. Millions of them are made and thrown away faster than new Covid-19 case charts. It’s quite strange how few resources there are discussing visualizations of financial data.
The majority of those who do care about finance seem to favour waterfall charts (like this analyst), and for a good reason: they provide insight that a simple table does not. Some use sankey or flow diagrams because they show flows of money in quite an intuitive way (cool example – a squared sankey in Tableau). There are also dashboards that completely ignore the specific structure of financial data – a good (because it’s bad) example is this dashboard made by “somebody with good Tableau but limited finance knowledge”, as one comment put it.
I consider myself somebody with good financial knowledge, so I tried to find a better way to visualize a profit-loss statement, one that solves some limitations of waterfall and sankey charts:
waterfalls work best when visualizing increases and decreases, but the distribution of revenues or expenses looks weird when drawn this way.
sankeys simply cannot visualize negative results in an intuitive way.
the financial activity result is often presented as one line in profit-loss statements and might be positive or negative, so in a usual sankey diagram it would not have a fixed position independent of its value.
And here is the result. A sankey is used for the basic distribution of revenue or expenses and could be used for showing a further breakdown. The waterfall comes in after the operating profit is calculated, and the net results of financial and other activities are simply added and subtracted (in this specific case they’re tiny). Profit tax is again visualized as a sankey, because it usually has the same sign as profit. Also, mind the colours: black is used to show positive flows (revenue), and blue (which also appears in the Amazon brandbook) is used for negative flows (expenses).
This approach allows for visualizing negative results as well. Where else could we look for companies with negative profits, if not in the aviation industry? Here is the profit-loss statement of Lufthansa, a German air carrier. The visualization is a bit awkward at first, but once we get into the waterfall everything becomes simple and intuitive. Also, notice that profit tax is negative – it improves the result.
Of course this way to visualize profit-loss is far from perfect:
One issue is related to labels – when a rectangle is too small, the number does not fit inside. It is easy to solve, however – if labels are necessary they can be shown beside the names of categories.
Another issue is comparison between time periods – this type of chart can show only one period. However, some cross-company insights can still be drawn – just compare the profit margins of Amazon with those of Aramco, the Saudi Arabian Oil Company:
I hope these charts will inspire further discussion about financial data visualization and more great ideas.
The western world loves freedom a lot, with Northern Europe being at the top, whereas regions with predominantly Muslim populations have many restrictions and end up at the end of the (circular) list with low averages.
What: Freedom indices recalculated to fit the range from 0 to 1, where 1 means the best index and 0 means the worst (in 3 out of 5 cases – it’s North Korea). Simple averages were calculated for the regions. Index, When, Source: Democracy Index, 2019, EIU; Human Freedom Index, 2017, The Human Freedom Index 2018: A Global Measurement of Personal, Civil, and Economic Freedom; Economic Freedom Index, 2020, The Heritage Foundation; Moral Freedom Index, 2020, The Foundation for the Advancement of Liberty; Press Freedom Index, 2020, Reporters Without Borders. Where: 172 countries were ranked on at least 3 of these indices.
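The recalculation to a 0 to 1 range can be sketched as min-max scaling. This is an assumption: the caption does not name the exact method, and the function name, the `higher_is_better` flag and the sample scores are hypothetical.

```python
def rescale(scores, higher_is_better=True):
    """Min-max rescale raw index scores so that 1 is best and 0 is worst.

    `scores` maps a country name to its raw score on one index. For
    indices where a lower raw score is better, pass higher_is_better=False.
    """
    lowest, highest = min(scores.values()), max(scores.values())
    if highest == lowest:
        raise ValueError("all scores are equal, so they cannot be rescaled")
    rescaled = {}
    for country, score in scores.items():
        position = (score - lowest) / (highest - lowest)
        rescaled[country] = position if higher_is_better else 1.0 - position
    return rescaled


# Hypothetical raw scores, for illustration only.
scaled = rescale({"A": 2.0, "B": 6.0, "C": 10.0})
region_average = sum(scaled.values()) / len(scaled)  # simple average
```

With this scaling the worst-scoring country on each index always lands exactly on 0, which matches the remark about North Korea.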
P.S. The radial representation is there to make this chart more entertaining for its creator.
I checked five different indices of freedom and democracy provided by various researchers. My initial idea was to show the top 10 ranking countries of every index. However, the USA does not make it into the top 10 on any of them. So, I checked which countries rank better than the USA on those indices. Apparently its neighbour Canada, the Netherlands (one of the oldest democracies in the world), New Zealand, Denmark and Switzerland rank better on all five indices. Countries like Estonia, Taiwan and Uruguay rank better on 3. Fun fact – the United Arab Emirates, an obviously not-so-free country, ranks better than the USA on the Economic Freedom Index.
So, if we talk about FREEDOM we talk about THE NETHERLANDS.
What: Difference in ranking compared to the USA on the following indices. Index, When, Source: Democracy Index, 2019, EIU; Human Freedom Index, 2017, The Human Freedom Index 2018: A Global Measurement of Personal, Civil, and Economic Freedom; Economic Freedom Index, 2020, The Heritage Foundation; Moral Freedom Index, 2020, The Foundation for the Advancement of Liberty; Press Freedom Index, 2020, Reporters Without Borders. Where: 172 countries were ranked on at least 3 of these indices.
The stacked bar chart is one of my favourites (I even made the same stacked bar chart with 9 online tools), but it has one major weakness: it’s difficult to compare changes in its segments over time. I tried to find a way to improve it, so let me introduce the Comparative Stacked Bars:
Triangles show the absolute increase or decrease of each segment. They are colour-coded to make it even easier to read.
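The numbers behind the triangles could be prepared along these lines. This is a sketch only: the function name and the data layout are hypothetical, and how the triangles are actually drawn in R or Tableau is not covered here.

```python
def segment_changes(previous, current):
    """Absolute change of each segment between two stacked bars.

    Both arguments map a segment name to its value. Returns, per segment,
    the change and the direction the triangle would point in.
    """
    changes = {}
    for name, value in current.items():
        # A segment missing from the previous bar is treated as zero.
        delta = value - previous.get(name, 0.0)
        if delta > 0:
            direction = "up"
        elif delta < 0:
            direction = "down"
        else:
            direction = "flat"
        changes[name] = (delta, direction)
    return changes


# Hypothetical segment values, for illustration only.
changes = segment_changes({"A": 10, "B": 5}, {"A": 12, "B": 3})
```

The direction can then drive the colour coding, for example one colour for increases and another for decreases.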
Its limitations:
Too many is too many – if there are too many categories, the triangles will make the chart look messy and difficult to read. But it would be even more difficult to read without the triangles.
Too small is too small – if the change is too small, the triangle might become invisible. But without the triangles, the changes would become invisible much sooner – just observe the top segment in the above chart.
The example above was made in R, and the example below was made in Tableau. Unfortunately I have no not-overly-complicated solution for Excel. If anyone knows how to implement it properly, please let me know!
The pie chart faces a tremendous amount of criticism for its attempts to show part-to-whole relations. Of course – it is easily the single most misused chart! However, more and more data visualization practitioners are writing articles to defend it, with Robert Kosara being the most thorough and methodical in my opinion.
Here I will offer an alternative, which is something like a mix of a square pie chart, a marimekko and packed bars. Let me introduce the Cake Chart:
Its features:
Always a regular square, total area = 100%.
Each segment is a bar for easy comparison, but its area represents the percentage.
The height of the chart is distributed evenly among bars.
Only the selected number of largest categories are shown separately.
All other categories are aggregated into the irregular steppy gray area.
If there are 5 categories shown separately, then the largest cannot be larger than 100% / 5 = 20%. For X categories to be shown – the maximum value cannot be larger than 100% / X.
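The geometry implied by these features can be sketched as follows, taking the square to have side 1. The function name and the sample shares are hypothetical. Since every bar gets the same height 1 / X and its area must equal its share, its width works out to share × X, which is why no share may exceed 1 / X.

```python
def cake_chart_bars(shares, n_bars):
    """Bar sizes for a cake chart drawn in a unit square.

    `shares` maps a category name to its fraction of the total.
    Returns (bars, others_share), where each bar is (name, width, height)
    and others_share is the area left for the aggregated grey region.
    """
    if n_bars < 1 or n_bars > len(shares):
        raise ValueError("n_bars must be between 1 and the number of categories")
    ranked = sorted(shares.items(), key=lambda item: item[1], reverse=True)
    top, rest = ranked[:n_bars], ranked[n_bars:]
    bar_height = 1.0 / n_bars  # the height is split evenly among the bars
    if top[0][1] > bar_height + 1e-12:
        raise ValueError("largest share exceeds 1 / n_bars, so the bar would not fit")
    # Area = width * height must equal the share, so width = share / height.
    bars = [(name, share / bar_height, bar_height) for name, share in top]
    others_share = sum(share for _, share in rest)
    return bars, others_share


# Hypothetical shares, for illustration only.
bars, others_share = cake_chart_bars({"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}, n_bars=2)
```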
Weaknesses:
Aggregated other categories are difficult to compare to bars.
There are limits on how long the longest bar can be without destroying the squareness of the square.
A square is not as intuitively read as 100% as a circle is.
Strengths:
Bars are super easy to compare.
It’s still a regular shape.
It’s quite obvious how to make it – make the panel square, make its background grey. However, making the Others category selectable in Tableau is a challenge and here is the result:
How does one draw the dynamics of a changing curve? This article is dedicated to experiments in drawing the evolution of the whole yield curve over time.
I tracked the time at which I start my working day, the percentage of weekly goals I achieve, and whether I do various other daily routines (meditation, exercise, a proper meal, not checking social media after 6 p.m., etc.), which I aggregated into a “Level of discipline”.
During the time of self-observation I began to wake up and start working earlier, I started to achieve more goals, but this “discipline” thing did not improve. I guess that trying to do many “useful” things during the day is not as useful and productive as it may seem.
What: Weekly average time of starting to work, the proportion of weekly goals achieved, and “level of discipline” measured in points. Moving averages are calculated using 5-week intervals. There are omissions in the data, as one may see. When: 38 weeks during 2019-2020 Source: self-observation
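A 5-week moving average that tolerates the omissions in the data could be computed like this. It is a sketch under assumptions: the window here is trailing (the current week and the four before it) and missing weeks are simply skipped, while the chart may have used a centred window or a different treatment of gaps. The function name and the sample values are hypothetical.

```python
def moving_average(values, window=5):
    """Trailing moving average over `window` weeks, ignoring missing weeks.

    `values` holds one entry per week, with None marking a week without
    data. A week whose whole window is empty gets None.
    """
    averages = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        observed = [v for v in values[start:i + 1] if v is not None]
        averages.append(sum(observed) / len(observed) if observed else None)
    return averages


# Hypothetical weekly values with one omission, for illustration only.
smoothed = moving_average([2.0, 4.0, None, 6.0])
```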