Top 5 Data Journalism Myths Busted


A good journalist is a master storyteller. Telling compelling stories with data has become a new challenge and a growing trend in journalism today. The Internet has unlocked the world of information and made a huge amount of data available. Data journalists dig through all that data and transform it into valuable content.

There is a continuous and growing demand for data journalists, but many traditional journalists still shy away from the field. Data journalism remains shrouded in mystery because of persistent myths about who can be a data journalist and which skills are required. We’ll delve into the most common myths about data journalism and lay out the reality to help you understand it better and feel less intimidated.

Myth #1: Data journalists must have a knack for math

“I am in journalism because I don’t have a knack for math” is the first thing most journalists and journalism students will say about data journalism. However, just because data-driven journalism involves numbers does not mean it is all about numbers. And you certainly don’t need to be a math wizard to create stories with data.

As a matter of fact, not all data-driven stories have to rely on numbers. They can actually be based on something a lot more palatable, like wine, for example. If you don’t believe it, check out this interactive timeline that shows us how wine colonized the world.  


How To Analyze Open Data With Able2Extract, Power BI And DataHero

There is a general sense of helplessness when it comes to analyzing public data, especially as people think it involves insane amounts of statistical mastery and in-depth knowledge of complicated statistical software.

This is especially nerve-racking for data journalists, who are keen on using data to write stories that can actually influence a certain aspect of our society, such as healthcare or education. Truth be told, analyzing data and storytelling actually go hand in hand.

Since the Open Data initiative started, more and more data sets have seen the light of day on various data-related portals. The most interesting data sets for journalists are the ones that are publicly available, simply because they are free to use and analyze. Those data sets are available on a variety of online sources, such as: www.data.gov, open.canada.ca, data.gov.uk and many more.

Open data portals contain thousands and thousands of data sets, related to various branches of government: education, business, economy, crime, justice, healthcare and more.

Once you start exploring the online data, you will see that it usually comes in 3 main formats: HTML, XML and PDF.

Common Open Dataset Formats

However, if you start investigating the data sets in more depth, you will quickly notice that there is only one format that’s present in almost every data set — the PDF. So, the logic goes that if you know how to analyze data that’s locked inside a PDF, you’ll know how to analyze any.

But what makes people want to store data in a non-editable format?

First of all, when you save a data set as a PDF you are reducing its size, so it’s easier to store and upload to online databases. Secondly, since the PDF is not editable by default, you are making sure that no one tampers with your data and changes any of the ever-so-important numerical values. Remember, people spend countless hours gathering data and they are keen on protecting their hard work as much as possible.

So, once you find a PDF data set, where do you go next?

You now basically have only one option: get that data into an Excel or CSV file format while preserving as much of the source document’s accuracy as possible. The next step is to import the converted file into a data visualization tool of your choice, which we will cover later in this tutorial.

When it comes to exporting PDF data, the only tool on the market that has advanced enough PDF exporting capabilities is Able2Extract. That is because Able2Extract is not just a regular PDF converter. See, most (if not all) PDF converters on the market only convert PDF to Excel automatically, leaving you with a messy data set. Automatic conversion works well for one-page invoices, but converting a 1,000-page data set takes a lot more than that.

Able2Extract is the only converter that lets you fully customize your conversion by manually setting up the row and column structure prior to conversion. In addition, it lets you preview the conversion results from within the software, so you can export your data set as accurately as possible.

First, find your PDF data set. For this tutorial, we are going to use a practice data set containing all funded projects from the Canadian Environmental Damages Fund. You can download it here.

Open the data set in Able2Extract and use custom PDF to Excel conversion to convert it to an Excel file. Set up row and column structure using the right side panel and make sure to check the “Preview conversion” box. Once satisfied, hit the convert button to send the data to Excel.

Able2Extract Custom PDF to Excel

So, we got our data from PDF and into Excel. Great job!  

The next step is to go to Excel and clean the data. This will take anywhere from 15 minutes to 2 hours, depending on the data set. The goal is to end up with data in a tabular format, which means there is a separate row for each record. It should look something like this:

PDF to Excel Conversion Results

Make sure you don’t have any empty rows or blank cells and that all text is formatted in the same way. If a row has 3 cells missing, it’s best to delete the whole row, because it can skew your analysis and produce inaccurate results.
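If you would rather script this cleanup than do it by hand, here is a minimal Python sketch of the same idea. It assumes the converted sheet has been saved as a CSV file. The sample rows, the `clean_rows` helper and the exact rules (drop any row with a blank cell, trim whitespace, title-case the region, strip the dollar sign) are illustrative assumptions, not part of the Excel workflow described above.

```python
import csv
import io

# Hypothetical sample standing in for the converted spreadsheet saved as CSV.
# The column names follow the tutorial; the rows are invented for illustration.
RAW_CSV = """Region,Group,EDF Funding
Ontario, Example University ,"$10,000"
,,
Quebec,Example Society,
 ontario,Example Trust,"$2,500"
"""


def clean_rows(csv_text):
    """Return rows with no blank cells, trimmed text and numeric funding."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Trim stray whitespace left over from the PDF conversion.
        row = {key: (value or "").strip() for key, value in row.items()}
        # Skip rows that are empty or contain any blank cell.
        if not all(row.values()):
            continue
        # Format text consistently and turn "$10,000" into a number.
        row["Region"] = row["Region"].title()
        row["EDF Funding"] = float(
            row["EDF Funding"].replace("$", "").replace(",", "")
        )
        cleaned.append(row)
    return cleaned


rows = clean_rows(RAW_CSV)
for row in rows:
    print(row)
```

Dropping every row with a blank cell is the strictest choice. On a real data set you may prefer to review those rows first and delete only the ones that are mostly empty.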

Now that we have a clean and tidy data set, it’s time to give life to these numbers and visualize them. Enter data visualization.

Data visualization simply means creating interesting charts from plain data, which makes the data easier to understand and present to your readers. When it comes to visualizing data, you can choose between a desktop dataviz tool and a cloud dataviz tool. We will explore one example of each.

Our recommended desktop software for visualizing complex data is Power BI. We are recommending it because of its compatibility with Excel and the fact that it’s free to use for datasets up to 1 GB. You can download it here.

Before we start with Power BI, you will need to know that analyzing data starts by asking questions and then using data to answer them. For example, you can ask questions regarding our practice data set before we even upload it to the dataviz tool:

  • What was the EDF funding per region?
  • Which group received the biggest funding?

Depending on the data set, you can ask 1,000 questions and, make no mistake, you will get 1,000 answers. OK, let’s move on to more serious stuff. Power BI.

Power BI

Power BI is a Business Intelligence tool created for monitoring business performance and discovering market opportunities. Today we will use it as a data journalism tool in order to answer the two questions above.

Once you open Power BI you first click on Get Data > Excel > Connect > Your file.

Choose a sheet where data is located and press Load. Alternatively, you can press Edit if you’d like to check your data set for mistakes once again.

Once you do so, you will find a blank canvas and your data values on the right sidebar panel.

Accessing PowerBI Side Panel

These are the values we are going to slice and dice. Let’s try to answer our first question. If you remember, we wanted to know what the EDF funding per region was.

The basic data field there is EDF Funding so we’ll drag it into the “Values” box. The canvas immediately changes and it is now showing us the total EDF funding:

PowerBI EDF Funding Values

Let’s now introduce another data field. Select the “Pie chart”.

PowerBI Data Visualization Selection

Drag the “Region” field into the “Legend” box. Congrats, you made your first data visualization! We now have an overview of the funding per region and we can already start answering some questions.

EDF Funding Visualization By Region

However, if you pay close attention you can see that we still don’t know the exact funding for each region. To show the exact values of data fields, go to the “Format” panel:

Accessing PowerBI Format Panel

Expand the “Detail Labels” category, find the Label Style and select “Both” from the drop down menu.

Selecting PowerBI Detail Labels

Our pie chart is now showing us the specific monetary values for each segment. Great, first question answered.

EDF Funding Pie Chart
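Behind the pie chart, Power BI is simply summing EDF Funding for each Region. As a rough illustration of that grouping step, here is a small Python sketch. The `total_by` helper and the sample records are hypothetical; only the field names come from the practice data set.

```python
from collections import defaultdict

# Invented sample records; the real values come from the cleaned spreadsheet.
records = [
    {"Region": "Ontario", "EDF Funding": 10000.0},
    {"Region": "Quebec", "EDF Funding": 4000.0},
    {"Region": "Ontario", "EDF Funding": 2500.0},
]


def total_by(records, field, value_field="EDF Funding"):
    """Sum value_field for every distinct value of field."""
    totals = defaultdict(float)
    for record in records:
        totals[record[field]] += record[value_field]
    return dict(totals)


funding_per_region = total_by(records, "Region")
for region, total in sorted(funding_per_region.items()):
    print(f"{region}: ${total:,.0f}")
```

Each key in `funding_per_region` corresponds to one slice of the pie, and its value is the label shown on that slice.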

OK, next up is to see which Group received the biggest funding. We’ll repeat the process but we’ll use a different chart, just to demonstrate different features of Power BI.

First, find and click on the Clustered Bar Chart.

Selecting Clustered Bar Chart

Drag the EDF Funding into the Values box and drag the Group into the Axis box. Turn on the data labels and you’ll quickly see that the University of Waterloo received the biggest funding: almost $320,000.

EDF Group Values Chart
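The bar chart answers a ranking question: add up the funding per Group, then sort from largest to smallest. A minimal sketch of that ranking in Python could look like the following. The group names and amounts are made up for illustration, and `collections.Counter` is used here only as a convenient tally, not because Power BI works this way.

```python
from collections import Counter

# Invented sample records; group names and amounts are placeholders.
records = [
    {"Group": "Example University", "EDF Funding": 10000.0},
    {"Group": "Example Society", "EDF Funding": 4000.0},
    {"Group": "Example University", "EDF Funding": 2500.0},
]

# Tally the funding per group.
funding_per_group = Counter()
for record in records:
    funding_per_group[record["Group"]] += record["EDF Funding"]

# most_common() returns (group, total) pairs sorted from largest to smallest.
ranking = funding_per_group.most_common()
top_group, top_total = ranking[0]
print(f"Biggest recipient: {top_group} (${top_total:,.0f})")
```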

Now that you know how to ask questions and visualize public data, we will quickly go over another tool that can help you visualize your data in the Cloud. Keep in mind that Cloud tools only support smaller file sizes, which means you’re best off using them for 10-20 page data sets. Luckily, the data set from our example is actually pretty small.

DataHero

DataHero is a cloud solution for Business Intelligence and data visualization. It allows you to connect files from numerous online and offline sources and it even has an integrated data cleaning tool, which is nice, but I do not recommend relying solely on it.

You can use DataHero for free, for files up to 2 MB in size. Anything larger than that, and you’ll probably have to pay a monthly subscription which is between $60 and $90. For this purpose, we are going to use a free plan.

Create an account, click on the Data tab and click on Import Data.

Importing Data With DataHero

Find your Excel file, select the sheet and upload it:

Uploading Data with Datahero

On the next screen, check formatting and proceed.

What’s cool about DataHero is that it automatically suggests data visualizations:

Suggested Visualizations From DataHero

I was originally interested in EDF Funding by project category so I’ll just create a brand new chart. DataHero uses the same drag & drop interface so it’s really easy to start using it.

First, drag the EDF Funding field onto the canvas.

PowerBI EDF Funding Values

Next, drag & drop the Project Category field.

DataHero Pie Chart Visualization

As you can see, we received our answer. Most of the funding money (35%) went into Restoration projects and the rest was distributed equally among the other three categories.
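The percentages on a pie chart like this one are each category’s total divided by the grand total. Here is a short Python sketch of that arithmetic. The `percentage_shares` helper, the category names other than Restoration and all of the amounts are assumptions made up for the example, so the output will not match the 35% figure above.

```python
# Invented sample records; only the field names come from the tutorial.
records = [
    {"Project Category": "Restoration", "EDF Funding": 5000.0},
    {"Project Category": "Research", "EDF Funding": 3000.0},
    {"Project Category": "Education", "EDF Funding": 1500.0},
    {"Project Category": "Education", "EDF Funding": 500.0},
]


def percentage_shares(records, field, value_field="EDF Funding"):
    """Return each category's share of the grand total, in percent."""
    totals = {}
    for record in records:
        key = record[field]
        totals[key] = totals.get(key, 0.0) + record[value_field]
    grand_total = sum(totals.values())
    return {name: 100 * total / grand_total for name, total in totals.items()}


shares = percentage_shares(records, "Project Category")
for name, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {share:.0f}%")
```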

There are other, more complex, data visualization tools but we will stick with DataHero and Power BI for the time being as they offer the most features in their free plans.

Let’s recap the entire process of analyzing public data that’s archived in PDF:

  1. Find a relevant data set
  2. Use Able2Extract’s Custom PDF to Excel feature to convert it to Excel or CSV
  3. Clean the data in Excel and remove blank rows and cells
  4. Visualize the data using a tool like Power BI or DataHero

By now you should have a clear understanding of the entire process of analyzing public data and should be well on your way to using it to shape the future of journalism. The strategy is simple: just upload clean, high-quality data and play around with it until you get what you are looking for.

Top 10 Open Data Resources Online

There’s a lot of hype around big data these days. And it isn’t any wonder with everything going digital. We’ve been generating a massive amount of data that’s becoming more and more valuable.

Think of publicly available and open data. You may be asking why it is so important. Using open or publicly available data can be extremely useful. Some examples include:

  • Analyzing global trends
  • Measuring the efficiency of government policies
  • Innovating a new service
  • Improving your company’s products

Because of the wide range of applications, not just data scientists, but journalists, marketers, business professionals, and even freelancers, are learning how to access, clean and interpret raw data.

While you may already have access to data analytics tools or a topic to research, all that may be missing are the big data sources to work with. Wondering where to find statistical data? Below is a list of databases you can start with.

1. The World Factbook – The CIA website has an open source library where you can find its World Factbook, which offers the public open data on 267 world entities on topics ranging from history and government to geography and military.

2. Amazon Web Services – Amazon Web Services offers a suite of cloud computing services for database, computing, storage and analytic solutions, to name a few. AWS also hosts a number of public datasets, which you can access. You will need to have an AWS account or access to its command line tools.

3. Open Government Data – Looking for data from major governments? You can search through data available publicly from the Governments of Canada, the USA and the UK. The official portal for European data – data.europa.eu – can give you access to a range of data produced by member countries of the European Union.

4. Open Data Network – The Open Data Network, launched by Socrata.com in 2014, is an online data catalogue that makes finding the data you need as easy and familiar as finding data with Google. Its latest enhancements include updates for elements like autosuggestion and easier mobile navigation. This site covers all major US regions including public data sources for areas like finance, health, infrastructure, education and social services to name a few.

5. Google Public Data Explorer – What better place to explore datasets than on Google? Google Public Data Explorer makes it easy for anyone to access and interact with public data from international organizations and academic institutions. You can upload, share and visualize datasets. Here’s an overview to get an idea of how to use Google Public Data Explorer.

6. DBpedia – DBpedia is a crowd-sourced effort to pull structured data from Wikipedia into an accessible database form on the web. According to the site, the advantages of the DBpedia knowledgebase are that “it covers many domains; it represents real community agreement; it automatically evolves as Wikipedia changes, and it is truly multilingual.”

7. Worldbank.org – Made up of five international organizations, the World Bank Group is the largest development bank in the world, working towards ending poverty and promoting shared prosperity. The organization offers “free and open access to data about development in countries around the globe.”

8. UNData – UNData is an internet-based data service from which you can access datasets that include statistics from countries that are part of the United Nations.

9. World Census Open Data – Need census data from other countries? Get census results from other places around the globe like Argentina, New Zealand, Romania, and Egypt. This source is valuable for those gathering geographical or localized statistics. In addition, you can find census data for the USA and Statistics Canada in the linked sources for easy access.

10. Open Data Portals – At Dataportals.org you can find a curated list of open data resource portals around the world. It will help you start broadly and then narrow down your search by location. With 519 portals, as of this writing, the site welcomes additions or feedback to create one of the most widely curated free public data sources online that every user can benefit from.

Dealing with datasets can get overwhelming. You can expect your data to be disorganized and unruly. And as easy as these sites are to visit, be warned that you may come across hard-to-access formats requiring you to first convert a scanned PDF to Word before being able to use it. But don’t get discouraged. The potential of that data is unlimited.

Did we miss any? Add your own main sources to this collection—the more on this list, the better!

How To Work Excel Like An Accountant Without The Degree

Calculating Taxes In Excel

Not everyone has the skills of an accountant or financial analyst. But if the stress of tax season has taught us anything it’s that we could certainly take our cue from them and handle our finances better than we currently are.

Knowing even a tenth of what Excel can do is an extremely valuable asset. From organizational purposes to simple data visualizations, the uses of Excel are unlimited.

We’ve already shown you how to work with your data in Excel with a few basic tips. So we thought it’d be a good time to take things up a notch.

If you want to improve your data analyzing chops, we found some advice, MS Excel resources, and tips to help you to start working with Excel like an accountant without the degree!

Use Excel With A Specific Goal In Mind

There are many things an accountant can use Excel for. A cost accountant could open up a spreadsheet, for instance, to gain some insight from data in a PivotTable. But for you, your situation may be different. You may use Excel for. . . .

What would be your financial reason to learn MS Excel? Answer this question and you’ll be driven to master the program and take the next steps.

Keep Information Organized & Accessible

Although Excel is a powerful calculation tool, you have to admit that it’s also a great organizational tool. And this is where you can get your feet wet.

You can use it to track and organize almost anything. And for accountants that is key. But if you’re not analyzing corporate invoices, don’t worry. You can still use it in your everyday life.  Use it to keep wedding plans in order, keep track of your monthly budget, or keep your online research. It’s up to you.

Another simple way accountants use Excel? As a means of transparency, giving their clients access to their data and vice versa. You can imagine how this would help if you need to export your data to and from different sources and people.  So no matter what your goal is, learning how to work collaboratively by sharing your Excel workbook with others is another good start.

Know Your MS Excel Basics

If you don’t have any experience with numerical analysis, you’ll need to start from the beginning.  Learn the basics of Excel–and learn them well!

On Quora, a wide range of accountants and financial professionals naturally have differing opinions on which Excel skills are the most valuable for finance and accounting. But what goes without saying is that basic Excel skills and knowledge are required.

Thus, once you know your way around Excel (simple formatting, data entry), you can start in on features for working with your data. Think MS Excel Pivot Tables, Remove Duplicates, and Text-To-Columns. The aim of this step is to go from not knowing anything about Excel to knowing how to use it for basic calculations.
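If you have never used Remove Duplicates or Text-To-Columns, it can help to see what they do to a list of values. The sketch below mimics both features in a few lines of Python. It is only an analogy with made-up names, not how Excel implements them.

```python
# Made-up cells as they might appear in a single spreadsheet column.
cells = ["Doe, Jane", "Smith, Sam", "Doe, Jane"]

# Remove Duplicates: keep the first occurrence of each value, in order.
unique_cells = list(dict.fromkeys(cells))

# Text-To-Columns: split each cell on a delimiter (a comma here)
# and trim the spaces around every piece.
columns = [[part.strip() for part in cell.split(",")] for cell in unique_cells]

for last_name, first_name in columns:
    print(last_name, "|", first_name)
```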

Get Serious About Excel

This step includes learning about trends like Excel’s impact on Business Intelligence.  It also means trying to get more precise with your Excel spreadsheets and data analysis. To do so, you need to know where to go.

If you’re at a loss as to how to start learning Excel seriously and actively, a few Quora professionals outlined what they thought was the best way to learn Excel.  So take a look at the thread. From online resources and courses to certifications and Excel-oriented events, there’s no limit on how far you can go.

Use MS Excel Keyboard Shortcuts

Another suggestion to boost your learning is to use keyboard shortcuts for the basics. This is because once you master a skill, you should make it as efficient as possible. You’ll have bigger tasks to perform once you have the basics down. And wasting your time on the smaller things can slow you down.


Pakaccountants.com outlines the kind of MS Excel keyboard shortcuts accountants use. And it’s a powerful list when you consider how much they accomplish. For other keyboard shortcuts, you can access Microsoft’s official list of Excel keyboard shortcuts on their site.

Use MS Excel Cheatsheets

If you can’t remember all the functions and formulas there are, don’t worry. It’s doubtful that even power users can remember them all.  For this, you can simply print out an Excel cheatsheet and refer to it when needed. So even if you don’t know which function or formula you need, you’re more than capable of looking it up.

Just Dive In

There is absolutely no substitute for hands-on experience. If you’re going to learn Excel, learn it! Spend time with the program. Many are intimidated by it. But the key to overcoming that is to start understanding MS Excel. So explore it. Test it out. Just dive in.

And if your own personal motivation isn’t enough, another great reason to learn Excel? There are jobs that pay extremely well for MS Excel skills.  It’s nothing but a win-win situation when you learn Excel.  So step up your game and see how far you can go!

6 Online Dashboards To Present Your Google Sheets Data

Excel Spreadsheet Diagram

As it becomes more and more common for teams to collaborate around data in Google Sheets, you’ll find yourself needing a way to work with and access that data whenever you need to, not whenever your data analyst gets a chance.

But the problem is that if you aren’t too savvy with MS Excel, putting together your own dashboard for your Google Sheets data is downright overwhelming.  It can be a steep learning curve you have neither the time nor the patience for.

As businesses are analyzing data that’s more complex than ever before, vendors specializing in data and business intelligence tools are offering out-of-the-box, online tools to make it easier.

These types of online dashboards can be the perfect solution. They let you get the same professional data visualization and analysis without all the technicalities.

The following 6 dashboard tools, for instance, have simple to use interfaces, features and data connection widgets that can integrate your Google Sheets data. Check them out.

1. Geckoboard

Geckoboard starts our list with its free account option. It’s powerful enough for Salesforce data and simple enough to use with Google Sheets data.  You can set it up as you would upload a file: it’s just a matter of creating a connection to your Google account or linking to the file via URL as a data source. You can find a good rundown of Geckoboard’s spreadsheets integration on the site.

2. VizyDrop

VizyDrop is a data visualization tool that can easily integrate your Google Sheets data. Features with this tool include chart sharing, a dashboard editor, updated data visualizations, the ability to work with custom sources, and built-in app connectors.

This tool is on the simplistic side, but it’s free to use. You can start dropping in your data in their demo space, but you’ll have to sign up to save your data.

3. Slemma

Slemma also makes things super easy for you with its built-in integration with Google Drive available for all types of accounts. This dashboard tool provides you with automatic update and scheduled report delivery features.

Other features include one-click integration, calculations, automated pivot tables, and email dashboards. And if you’re new to working with data and aren’t sure how to best visualize it, you can get some inspiration from the data visualizations and dashboards other Slemma members have done. There’s a free 14-day trial available to check out.

4. BIME Analytics

Connecting Data BIME Analytics (Source: BIME Analytics)

You can connect any number of data sources you have with BIME Analytics, including online cloud services like Google Drive.  BIME Analytics allows you to pull data from Google Sheets and cache datasets. There’s no storage; analysis is always live and direct.  You can test it out with a 10-day free trial from the site.

5. Cyfe

In addition to other integrations to third party services like Google Analytics, Cyfe can also integrate your Google Sheets data directly. You have the ability to archive data, export reports, and share your dashboards.  Its widgets will allow you to connect any of your data that’s on the web. You can sign up for free or go with the premium account for $19 /month.

6. DataHero

This tool touts itself as the “fastest and easiest way to understand your data”. DataHero is a data visualization tool that works with the data you already use in cloud storage services, or in files on your computer. 

Import your data from cloud services, create insightful charts and organize them in separate dashboards to get the most out of your data.  Once you sign up you get free trial access to their Premium service, which is a good starting point for exploring your dashboard needs. As a bonus, there is also a free account which will do just fine for visualizing your data on a lower scale.

 

Marrying business intelligence and Google Sheets data doesn’t have to be complicated or something you dread. On the contrary, it can be as simple as uploading your data online. See if any of the above are a good fit for your data needs.