

Orange County Chapter of ARMA International



Welcome to the OCARMA Blog! 


Is Your Big Data Reliable and Authentic?

Posted by [email protected] on August 3, 2018 at 12:40 AM

Over the past 35 years, I’ve worked on hundreds of initiatives and projects that rely on the integrity and accuracy of the data in line-of-business applications. These are the things I’ve learned to look for in order to judge a project’s eventual success:

1. Is there a records or information governance manager on staff?

This indicates a value is placed on records as an asset within the company.

2. Does records management start when folders are placed in a box?

This indicates that the asset value is limited to paper records.

3. How many times has the data been migrated?

Data that has been through multiple migrations may have been truncated or corrupted.
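As a quick illustration (the field names and column width here are made up), a simple script can flag values that exactly fill a column’s declared width, which is a common symptom of silent truncation during a migration:

```python
# Hypothetical sketch: flag values that exactly fill a column's declared
# width, a common symptom of silent truncation during migration.
def suspected_truncations(rows, max_lengths):
    """rows: list of dicts; max_lengths: dict of field name -> declared width."""
    flagged = []
    for i, row in enumerate(rows):
        for field, limit in max_lengths.items():
            value = row.get(field, "")
            if len(value) == limit:
                flagged.append((i, field, value))
    return flagged

rows = [
    {"customer_name": "ACME Corporation of Greater Los Angeles Cou"},
    {"customer_name": "Smith & Sons"},
]
# The first name is exactly 43 characters, the declared width, so it is
# flagged as possibly cut off; the second is shorter and passes.
print(suspected_truncations(rows, {"customer_name": 43}))
```

A hit doesn’t prove truncation, but it tells you exactly which records to spot-check against the source system.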

4. Who is responsible for data mapping?

Field names are often labeled differently between back-end tables and the search screens, so ideally both the business unit and information systems should be involved. Acquisitions and mergers present particular challenges in mapping data from one system to another, especially if the data was not normalized at the time of the acquisition but kept in the original system.
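One lightweight way to make that mapping explicit (all names here are invented for illustration) is a table that both the business unit and IT can review and sign off on, with unmapped fields failing loudly instead of being guessed at:

```python
# Hypothetical sketch: an explicit mapping between back-end column names
# and the labels users see on search screens. Field names are invented.
FIELD_MAP = {
    "CUST_NM": "Customer Name",        # legacy system
    "customer_name": "Customer Name",  # acquired system
    "DOC_DT": "Document Date",
}

def to_display_label(backend_field):
    # Raise on unmapped fields instead of silently guessing a label.
    try:
        return FIELD_MAP[backend_field]
    except KeyError:
        raise KeyError(f"No agreed mapping for field: {backend_field}")

print(to_display_label("CUST_NM"))  # Customer Name
```

Note how two differently named source columns can deliberately map to the same business label, which is exactly the situation a merger creates.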

5. How many business units touch the data during its lifecycle?

Various units will have different interests and ways of naming information, particularly in a Customer Relationship Management system.

6. Are there written procedures available?

Screenshots with handwritten notes don’t count. I’m looking for a document with revision history, approvals, and step-by-step narratives, desktop instructions, or storyboards.

7. Was documentation created for the application that spells out the assumptions and limitations of the database: what can and cannot be imported, exported, or created from within the system?

For instance, if there is no 255-character limit on a text field, is it really a text field (searchable) or is it a blob (not readily searchable)?
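You can often answer this from the database catalog rather than the application screens. Here’s a small sketch using SQLite (the table and columns are invented), where the declared column type distinguishes a bounded text field from a blob:

```python
# Hypothetical sketch: read a column's declared type from the database
# catalog. A VARCHAR(255) suggests a bounded, searchable text field; a
# BLOB may not be searchable the same way in the application.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (title VARCHAR(255), body BLOB)")

# PRAGMA table_info returns (cid, name, type, notnull, default, pk).
columns = {row[1]: row[2] for row in conn.execute("PRAGMA table_info(docs)")}
print(columns)  # {'title': 'VARCHAR(255)', 'body': 'BLOB'}
```

The catalog query differs by vendor, but every major database exposes declared types somewhere, and checking them beats trusting the field label on a search screen.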

8. How was obsolete data removed?

If it was archived, will it come forward in the new initiative? What is the impact? Will a particular set of data look like a false duplicate? If it was scrambled, were all fields scrambled or just the unique fields? Again, what is the impact to the project?

9. What protection was put in place to prevent inaccurate data from being entered in the first place?

Were drop-downs with agreed-upon terms and spellings created to standardize data entry? Were masks put in place for date fields or specific numerical identification fields? Were look-ups to a System of Record created?
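The idea behind those protections can be sketched in a few lines (the field names, terms, and date format below are illustrative, not from any particular system): a controlled vocabulary stands in for the drop-down, and a regular expression stands in for the entry mask.

```python
# Hypothetical sketch of entry-time guards: a controlled vocabulary for
# a drop-down field and a regex "mask" for a date field. All names and
# formats here are illustrative.
import re

ALLOWED_DEPARTMENTS = {"Accounting", "Legal", "Operations"}  # agreed terms
DATE_MASK = re.compile(r"^\d{4}-\d{2}-\d{2}$")  # e.g. 2018-08-03

def validate_entry(department, record_date):
    errors = []
    if department not in ALLOWED_DEPARTMENTS:
        errors.append(f"Unknown department: {department!r}")
    if not DATE_MASK.match(record_date):
        errors.append(f"Date must be YYYY-MM-DD, got {record_date!r}")
    return errors

print(validate_entry("Legal", "2018-08-03"))  # []  (clean entry)
print(validate_entry("Leagl", "8/3/2018"))    # two errors: typo and bad mask
```

Rejecting “Leagl” and “8/3/2018” at entry time is far cheaper than deduplicating misspelled departments and reparsing free-form dates years later.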

10. If many divisions’ databases are being centralized into one, are there potential duplicate “unique” numbers?

In other words, did each division or system start with “1” and increment sequentially?
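That collision, and one common fix for it, can be shown in a few lines (division codes and IDs here are made up): merging raw sequential IDs produces duplicates, while namespacing each ID with a division code keeps them unique.

```python
# Hypothetical sketch: two divisions that each numbered records from 1
# collide when centralized; namespacing IDs by division avoids it.
from collections import Counter

division_a = [1, 2, 3]
division_b = [1, 2]  # same raw IDs as division A

raw_merge = division_a + division_b
collisions = [i for i, n in Counter(raw_merge).items() if n > 1]
print(collisions)  # [1, 2]  -> these "unique" IDs are no longer unique

# Prefixing with a division code keeps every ID unique after the merge.
merged = [f"A-{i}" for i in division_a] + [f"B-{i}" for i in division_b]
assert len(merged) == len(set(merged))
print(merged)  # ['A-1', 'A-2', 'A-3', 'B-1', 'B-2']
```

Whatever scheme is chosen, the key is to decide it before centralization, because re-keying records after downstream systems have referenced the colliding IDs is much harder.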

Asking the above questions, and being persistent in obtaining the answers, will provide a better understanding of the tasks (and time) involved in achieving a successful result in a big data project, or any other data-reliant project. In other words, it’s “Garbage In, Garbage Out” on a massive scale if the time isn’t taken at the beginning to clean active data and truly remove obsolete data!

Cheryl A. Young IGP, CTT+, CDIA+, APMD, ermM, ecmP


President, OCARMA