How To Analyze Open Data With Able2Extract, Power BI And DataHero

There is a general sense of helplessness when it comes to analyzing public data, especially as people think it involves insane amounts of statistical mastery and in-depth knowledge of complicated statistical software.

This is especially nerve-wracking for data journalists, who are keen on using data to write stories that can actually influence a certain aspect of our society, such as healthcare or education. Truth be told, analyzing data and storytelling go hand in hand.

Since the Open Data initiative started, more and more data sets have seen the light of day on various data portals. The most interesting data sets for journalists are the ones that are publicly available, simply because they are free to use and analyze. Those data sets are available from a variety of online sources, such as www.data.gov, open.canada.ca, data.gov.uk and many more.

Open data portals contain thousands and thousands of data sets, related to various branches of government: education, business, economy, crime, justice, healthcare and more.

Once you start exploring the online data, you will see that it usually comes in 3 main formats: HTML, XML and PDF.

Common Open Dataset Formats

However, if you start investigating the data sets in more depth, you will quickly notice that there is only one format that’s present in almost every data set — the PDF. So, the logic goes that if you know how to analyze data that’s locked inside a PDF, you’ll know how to analyze any.

But what makes people want to store data in a non-editable format?

First of all, when you save a data set as a PDF you are reducing its size, so it’s easier to store and upload to online databases. Secondly, since the PDF is not editable by default, you are making sure that no one tampers with your data and changes any of the ever-so-important numerical values. Remember, people spend countless hours gathering data and they are keen on protecting their hard work as much as possible.

So, once you find a PDF data set, where do you go next?

You now basically have only one option — you need to get that data into an Excel or CSV file format, while preserving source document accuracy as much as possible. After you do that, the next step would be to import that converted file into a data visualization tool of your choice, which we will cover later in this tutorial.

When it comes to exporting PDF data, the only tool on the market that has advanced enough PDF exporting capabilities is Able2Extract. That is because Able2Extract is not just a regular PDF converter. Most (if not all) PDF converters on the market only convert PDF to Excel automatically, leaving you with a messy data set. Automatic conversion works well for one-page invoices, but converting a 1,000-page data set takes a lot more than that.

Able2Extract is the only converter that lets you fully customize your conversion by manually setting up row and column structure, prior to conversion. In addition it lets you preview the conversion results from within the software, which lets you export your data set as accurately as possible.

First, find your PDF data set. For this tutorial, we are going to use a practice data set containing all funded projects from the Canadian Environmental Damages Fund (EDF). You can download it here.

Open the data set in Able2Extract and use custom PDF to Excel conversion to convert it to an Excel file. Set up row and column structure using the right side panel and make sure to check the “Preview conversion” box. Once satisfied, hit the convert button to send the data to Excel.

Able2Extract Custom PDF to Excel

So, we got our data from PDF and into Excel. Great job!  

The next step is to go to Excel and clean the data. This will take anywhere from 15 minutes to 2 hours, depending on the data set. The goal is to end up with data in a tabular format, which means there is a separate row for each record. It should look something like this:

PDF to Excel Conversion Results

Make sure you don’t have any empty rows or blank cells and that all text is formatted in the same way. If a row is missing 3 cells, it’s best to delete the whole row, because incomplete rows can skew your totals and produce inaccurate charts.
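If you export the converted sheet to CSV, the same cleanup rules can also be scripted. The sketch below is only an illustration, not part of the Able2Extract or Excel workflow: the sample rows are made up, and the column names are assumed to match the Region, Group and EDF Funding fields used later in this tutorial.

```python
import csv
import io

# Hypothetical sample export; real column values will differ.
RAW_CSV = """Region,Group,EDF Funding
Region A,Example Group A,"$10,000"
,,
 Region B ,Example Group B,5000
Region C,,
"""

def clean_rows(text):
    """Return rows with no blank cells, trimmed text and numeric funding."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(text)):
        # Trim stray whitespace so all text is formatted the same way.
        values = {key: (value or "").strip() for key, value in row.items()}
        # Drop rows that are empty or have any blank cell.
        if any(value == "" for value in values.values()):
            continue
        # Strip currency formatting so the amount can be summed later.
        amount = values["EDF Funding"].replace("$", "").replace(",", "")
        values["EDF Funding"] = float(amount)
        cleaned.append(values)
    return cleaned

rows = clean_rows(RAW_CSV)
for row in rows:
    print(row)
```

Only the two complete rows survive. Checking the result by eye is still worthwhile, since a script like this cannot tell a genuinely missing value from a conversion error.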

Now that we have a clean and tidy data set, it’s time to give life to these numbers and visualize them. Enter data visualization.

Data visualization simply means to create interesting charts from just plain data, which makes it easier to understand and present to your readers. When it comes to visualizing data you have an option between a desktop dataviz tool and a cloud dataviz tool. We will explore one example of both.

Our recommended desktop software for visualizing complex data is Power BI. We are recommending it because of its compatibility with Excel and the fact that it’s free to use for datasets up to 1 GB. You can download it here.

Before we start with Power BI, you will need to know that analyzing data starts by asking questions and then using data to answer them. For example, you can ask questions regarding our practice data set before we even upload it to the dataviz tool:

  • What was the EDF funding per region?
  • Which group received the biggest funding?
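Both questions boil down to the same operation: group the records by one field and add up the funding. As a rough sketch of what a dataviz tool does behind the scenes (the records below are invented placeholders, not values from the EDF data set):

```python
from collections import defaultdict

# Hypothetical records shaped like the cleaned data set.
records = [
    {"Region": "Region A", "Group": "Group X", "EDF Funding": 100.0},
    {"Region": "Region A", "Group": "Group Y", "EDF Funding": 250.0},
    {"Region": "Region B", "Group": "Group X", "EDF Funding": 50.0},
]

def total_by(rows, field):
    """Sum the EDF Funding column for each distinct value of `field`."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[field]] += row["EDF Funding"]
    return dict(totals)

# Question 1: funding per region.
funding_per_region = total_by(records, "Region")

# Question 2: the group with the biggest total funding.
funding_per_group = total_by(records, "Group")
top_group = max(funding_per_group, key=funding_per_group.get)

print(funding_per_region)
print(top_group, funding_per_group[top_group])
```

Having the totals as plain numbers is also a handy cross-check against whatever the chart shows later.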

Depending on the data set, you can ask 1,000 questions and, make no mistake, you will get 1,000 answers. OK, let’s move on to more serious stuff. Power BI.

Power BI

Power BI is a Business Intelligence tool created for monitoring business performance and discovering market opportunities. Today we will use it as a data journalism tool in order to answer the two questions above.

Once you open Power BI you first click on Get Data > Excel > Connect > Your file.

Choose a sheet where data is located and press Load. Alternatively, you can press Edit if you’d like to check your data set for mistakes once again.

Once you do so, you will find a blank canvas and your data values on the right sidebar panel.

Accessing PowerBI Side Panel

These are the values we are going to slice and dice. Let’s try to answer our first question. If you remember, we wanted to know what the EDF funding per region was.

The basic data field there is EDF Funding so we’ll drag it into the “Values” box. The canvas immediately changes and it is now showing us the total EDF funding:

PowerBI EDF Funding Values

Let’s now introduce another data field. Select the “Pie chart”.

PowerBI Data Visualization Selection

Drag the “Region” field into the “Legend” box. Congrats, you made your first data visualization! We now have an overview of the funding per region and we can already start answering some questions.

EDF Funding Visualization By Region

However, if you pay close attention you can see that we still don’t know the exact funding for each region. To show the exact values of data fields, go to the “Format” panel:

Accessing PowerBI Format Panel

Expand the “Detail Labels” category, find the Label Style and select “Both” from the drop down menu.

Selecting PowerBI Detail Labels

Our pie chart is now showing us the specific monetary values for each segment. Great, first question answered.

EDF Funding Pie Chart

OK, next up is to see which Group received the biggest funding. We’ll repeat the process but we’ll use a different chart, just to demonstrate different features of Power BI.

First, find and click on the Clustered Bar Chart.

Selecting Clustered Bar Chart

Drag the EDF Funding into the Values box and drag the Group into the Axis box. Turn on the data labels and you’ll quickly see that the University of Waterloo received the biggest funding: almost $320,000.

EDF Group Values Chart

Now that you know how to ask questions and visualize public data, we will quickly go over another tool that can help you visualize your data in the cloud. Keep in mind that cloud tools only support smaller file sizes, which means you’re best off using them for 10-20 page data sets. Luckily, the data set from our example is actually pretty small.

DataHero

DataHero is a cloud solution for Business Intelligence and data visualization. It allows you to connect files from numerous online and offline sources and it even has an integrated data cleaning tool, which is nice, but I do not recommend relying solely on it.

You can use DataHero for free, for files up to 2 MB in size. Anything larger than that, and you’ll probably have to pay a monthly subscription which is between $60 and $90. For this tutorial, we are going to use the free plan.

Create an account, click on the Data tab and click on Import Data.

Importing Data With DataHero

Find your Excel file, select the sheet and upload it:

Uploading Data with Datahero

On the next screen, check formatting and proceed.

What’s cool about DataHero is that it automatically suggests data visualizations:

Suggested Visualizations From DataHero

I was originally interested in EDF Funding by project category so I’ll just create a brand new chart. DataHero uses the same drag & drop interface so it’s really easy to start using it.

First, drag the EDF Funding field onto the canvas.

PowerBI EDF Funding Values

Next, drag & drop the Project Category field.

DataHero Pie Chart Visualization

As you can see, we received our answer. The largest share of the funding (35%) went into Restoration projects and the rest was dispersed equally among the other three categories.
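The percentages on a pie chart like this one are simply each category's total divided by the grand total. A minimal sketch of that arithmetic, using placeholder category names and amounts rather than the real EDF figures:

```python
def percentage_shares(totals):
    """Convert per-category totals into percentage shares of the grand total."""
    grand_total = sum(totals.values())
    if grand_total == 0:
        # Avoid dividing by zero when there is no funding at all.
        return {name: 0.0 for name in totals}
    return {
        name: round(100 * amount / grand_total, 1)
        for name, amount in totals.items()
    }

# Hypothetical per-category totals.
category_totals = {"Category A": 500.0, "Category B": 300.0, "Category C": 200.0}
shares = percentage_shares(category_totals)
print(shares)
```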

There are other, more complex, data visualization tools but we will stick with DataHero and Power BI for the time being as they offer the most features in their free plans.

Let’s recap the entire process of analyzing public data that’s archived in PDF:

  1. Find a relevant data set
  2. Use Able2Extract’s Custom PDF to Excel feature to convert it to Excel or CSV
  3. Clean the data in Excel and remove blank rows and cells
  4. Visualize the data using a tool like Power BI or DataHero

By now you should have a clear understanding of the entire process of analyzing public data and should be well on your way to using it to shape the future of journalism. The strategy is simple — just upload clean, high quality data and play around with it until you get what you are looking for.

 

How To Create A Custom Keyboard Shortcut?

Part 10 of 13 in our How To Use AutoCAD series

Becoming a master in AutoCAD means that you should be able to incrementally improve your skills and, with them, your productivity. Even though some things may be difficult to learn at first, eventually they’ll become second nature and you’ll forget about tracking down the location of your favorite tools on the dashboard.

One of the things you can do to instantly increase your drafting speed is to create custom keyboard shortcuts for existing commands. As part of our How to Use AutoCAD series, we’re here to show you how to go about creating them.

This is easily accomplished through AutoCAD’s Customize User Interface (CUI) feature.

Customizing AutoCAD Interface

1. To access CUI, enter “cui” in the command line and press Enter. Alternatively, go to the “Manage” tab, and under the Customization panel click on the “User Interface” button.

2. In the command list panel, type the command that you would like to assign a keyboard shortcut to. For example, type in “Hatch”. Locate it on the menu below and left click to see its button image and properties. As you can see from the properties panel on the left, there is no keyboard shortcut assigned to this command. We’re now going to enable a custom keyboard shortcut for this command.

3. In the “Customization in All Files” panel, expand the last item in the list – Partial Customization Files. Now expand the list in the following order: “Custom” > “Keyboard Shortcuts” > “Shortcut Keys”.

AutoCAD Shortcut Keys

4. Click and drag the desired command from the “Command List” menu to the “Shortcut Keys” folder above. You should see your command in the properties menu on the lower right side.

Assigning Custom AutoCAD Shortcuts

5. As you can see, there is an “Access” category under which there is a “Key(s)” box. Click on it and then on the three dots which will appear to the right. You can now specify your keyboard shortcut for this command. Hit “Apply” and then “OK”.

Besides assigning custom shortcuts to existing commands, you can even create your own custom commands and macros and assign them your favorite keyboard shortcuts. This tutorial further explains the process: How to Create a Custom Command.

How To Place Horizontal And Vertical Dimensions Onto A Drawing?

AutoCAD Technical Design

Part 9 of 13 in our How To Use AutoCAD series

Placing dimensions in AutoCAD is essential for documenting your drawing. AutoCAD 2016 offers the same dimensions as the previous versions, that is: linear, aligned, angular, arc, radius, diameter, ordinate and jogged.

Using the “Linear” dimension command, you can create and place a horizontal, vertical or rotated dimension line.

You can accomplish this task by following these simple steps:

1. Locate the dimensioning tools in the Annotation panel, on the Home tab. Click on the drop-down arrow and choose “Linear”. This will give us a horizontal or vertical distance between the selected points.

Locating AutoCAD Dimensioning Tools

2. Click on the snap points to specify the dimension line location.

3. Once you specify the dimension points, you have several choices: if you pull to the right, you’ll get horizontal measurement; if you pull upwards you’ll get vertical measurements.

4. Left click to complete the dimensioning and continue with your work.
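To make the difference between the dimension types concrete, here is a small sketch of the distances involved. It is plain geometry rather than AutoCAD code, and the function names and sample points are made up for illustration: a horizontal dimension measures the difference in X between the two picked points, a vertical dimension the difference in Y, and an aligned dimension the straight-line distance.

```python
import math

def linear_dimensions(p1, p2):
    """Return the (horizontal, vertical) distances between two 2D points."""
    (x1, y1), (x2, y2) = p1, p2
    return abs(x2 - x1), abs(y2 - y1)

def aligned_dimension(p1, p2):
    """Return the straight-line distance between two 2D points."""
    (x1, y1), (x2, y2) = p1, p2
    return math.hypot(x2 - x1, y2 - y1)

# Hypothetical snap points on a drawing.
start, end = (1.0, 2.0), (4.0, 6.0)
horizontal, vertical = linear_dimensions(start, end)
aligned = aligned_dimension(start, end)
print(horizontal, vertical, aligned)
```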

In practice, you’ll probably have to create various dimensions, as well as utilize different dimension styles, depending on your project. For an all-encompassing guide on using this AutoCAD feature, check out CAD Tutor’s AutoCAD Dimensioning Tutorial.

Top 7 Tips For Solving Your Common Digital Document Problems

Technology Support

As a PDF converter solutions company, we’re no strangers to document-related stress. We deal with it on a daily basis. And because of this, we’ve shared a number of tips that have made working with documents and their content less stressful.

From dealing with large files to editing PDF files, we’ve covered solutions to some of the most stressful and, unfortunately, common issues that can pop up at any given time.

To ensure you have a solution when you need it, we’ve collected some of our best hits into one single mashup where you can access and find that one tip you’re looking for.

We have the low-down on a good workaround if you are….

… Having Trouble Sending That Large File

Send And Share Large Files Easily. Our post on 3 Great Ways To Share Large Files With Others can help. It covers resources that will let you share large files via online cloud drives, through browser-to-browser services and even between computers and systems. So when all else fails with your email, you have a handful of other workarounds you can try.

…Signed Up With Too Many Cloud Drives  

Admit it. From Dropbox and Box.com to Google Drive and OneDrive, you have had at least one account on each service. We’ve been there. Multiple logins and desktop clients give us a headache, too. Another problem? Sifting through those cloud drives to access the one file you need. Well, in this feature post on storing converted files to multiple cloud drives with one app, you’ll find a whole new way to work faster with cloud storage services.

…Required To Send Both A PDF File And A Word Document

PDF Embedded Word File

It’s known that you can always add other files, like MS Word documents, to PDF documents. But what about the other way around? This post on How to Attach a PDF File to Microsoft Word Documents shows you an MS Office feature which allows you to do just that. The feature we cover can be a timesaving workaround when you need to attach multiple files to an email.

…Working With An Expired Microsoft Office Subscription 

Now that Microsoft Office is offered as a subscription service, free MS Word alternatives can act as a good emergency backup. When your Office subscription expires, your documents are left uneditable, locked in viewing mode. And when that happens, Google Drive is generally the number one solution. So for this, we included our post on using Google Docs and Sheets Add-ons, which offers you a look at how you can create a research paper from beginning to end right in your browser if needed.

…Trying To Edit PDF Text And Pages

When you need to make changes to your PDF, your natural instinct is to do it directly on the page. Why ignore that knee-jerk reaction? This article on How to Edit a PDF Document will show you how you can naturally and intuitively make changes to both text and pages in your PDF without Acrobat.

…Struggling To Work Efficiently Between OpenOffice And Google Docs

Entering GoogleDocs Credentials

Working between a desktop and online document processor is now a quick and easy way to get things done. You create a document on your desktop, save it and then upload the file to the online application. But believe it or not, you can make things even simpler than that when working with OpenOffice and Google Docs. How to Export Open Office Files to Google Docs features a tutorial on how to transfer OpenOffice documents directly from the application to the online suite with one simple extension.

…Stuck With PDF Documents In a Different Language

Can’t speak 50 different languages? Don’t worry. We included our post on How to Translate PDF Documents without Learning Another Language in this list because working with PDFs in other languages can be a part of how you categorize, research and process documents in your work. In the post, you’ll learn a few ways to translate PDFs on the spot with the help of the web.

We know this list is short compared to the list of document-related problems you may have. But let us know what document issues you need a workaround for in the comments and we’ll see if we have a post on it that we can add to the above.

How To Quickly Create Perfectly Parallel Lines, Parallel Curves And Concentric Circles?

Part 8 of 13 in our How To Use AutoCAD series

While working in AutoCAD, you’ll often come across a situation where you need to draw perfectly parallel lines, rectangles or circles. Whether they’re for designing a structure, designing a machine part, or creating an object, precision is the goal.

Since it’s extremely important to be as accurate as possible when working on your drawings, the best way to quickly create perfectly parallel shapes is by using the “Offset” command.

The “Offset” command in AutoCAD 2016 is located on the bottom right of the Modify panel, on the Home tab.

Locating AutoCAD Offset Command

Here’s how to use the Offset command:

1. Draw a shape that you would like to offset.

2. Click on the Offset command (bottom right on the Modify panel).

3. Select the Offset distance. You can do this in two ways. The first is to enter the distance manually into the number box. The second way is to left click on a blank space in the drawing window and then move your cursor in any direction.

4. Select the object to offset. Left click on the object and you will get a perfectly parallel copy.

Using AutoCAD Offset
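For the curious, the geometry behind an offset is straightforward. The sketch below is not AutoCAD code, just an illustration with made-up function names: a line segment is copied by shifting both endpoints the same distance along the segment's perpendicular, and a circle is "offset" by changing its radius while keeping the center, which is what makes the result concentric.

```python
import math

def offset_segment(p1, p2, distance):
    """Return a parallel copy of segment p1-p2 shifted by `distance`.

    A positive distance shifts to the left of the direction p1 -> p2,
    a negative distance shifts to the right.
    """
    (x1, y1), (x2, y2) = p1, p2
    dx, dy = x2 - x1, y2 - y1
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("segment has zero length")
    # Unit normal, perpendicular to the segment direction.
    nx, ny = -dy / length, dx / length
    shift_x, shift_y = nx * distance, ny * distance
    return (x1 + shift_x, y1 + shift_y), (x2 + shift_x, y2 + shift_y)

def offset_circle_radius(radius, distance):
    """Return the radius of a concentric circle offset by `distance`."""
    new_radius = radius + distance
    if new_radius <= 0:
        raise ValueError("offset would collapse the circle")
    return new_radius

# Hypothetical example: a horizontal line offset upwards by 2 units.
parallel = offset_segment((0.0, 0.0), (10.0, 0.0), 2.0)
outer_radius = offset_circle_radius(5.0, 2.0)
print(parallel, outer_radius)
```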

 

As a general rule of thumb, don’t forget to use Offset when drawing stairs, concentric circles or any other element that relies on parallel geometry. This command will save you a ton of time editing and re-adjusting your drawings by getting them right the first time around.