Kimball University: Eight Recommendations for International Data Quality
Language, culture, and country-by-country compliance and privacy requirements are just a few of the tough data quality problems global organizations must solve. Start by addressing data accuracy at the source and adopting an MDM strategy, then follow these six other best-practice approaches.
Geographies and Addresses
Addresses in different countries are notoriously difficult to parse without detailed local knowledge. Consider the following examples:
Finland: Ulvilante 8b A 11 P1 354 SF-00561 Helsinki
Korea: 35-2 Sangdaewon-dong Kangnam-ku Seoul 165-010
Again, do you have any idea how to parse these addresses?
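To make the point concrete, here is a minimal sketch of why one parsing rule cannot serve every country. It only tries to pull the postal code out of the two sample addresses, and it needs a separate pattern per country even for that. The country keys, the patterns, and the function name are illustrative assumptions based solely on the two examples above, not authoritative postal rules.

```python
import re

# Hypothetical per-country patterns, inferred only from the sample addresses.
# Real postal formats vary more than this and change over time.
POSTAL_CODE_PATTERNS = {
    # Five digits, optionally preceded by a letter prefix such as "SF-".
    "FI": re.compile(r"\b(?:[A-Z]{2,3}-)?(\d{5})\b"),
    # Three digits, a hyphen, three digits, as in the Seoul example.
    "KR": re.compile(r"\b(\d{3}-\d{3})\b"),
}


def extract_postal_code(country, address):
    """Return the postal code found in address, or None if nothing matches."""
    pattern = POSTAL_CODE_PATTERNS.get(country)
    if pattern is None:
        # No local knowledge for this country: refuse to guess.
        return None
    match = pattern.search(address)
    return match.group(1) if match else None


print(extract_postal_code("FI", "Ulvilante 8b A 11 P1 354 SF-00561 Helsinki"))
print(extract_postal_code("KR", "35-2 Sangdaewon-dong Kangnam-ku Seoul 165-010"))
```

Even this narrow task depends on knowing the country first, which is the local knowledge the examples call for.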
Privacy and Information Transfer
Even if the data you collect is properly parsed and of high quality, you need to be very careful with how you store, transport, and expose that data. France's Act of 6 January 1978 on Data Processing, Files, and Individual Liberties, amended August 2004 and March 2007, states, "The collection and processing of personal data that reveals, directly or indirectly, the racial and ethnic origins, the political, philosophical, religious opinions or trade union affiliation of persons, or which concern their health or sexual life, is prohibited. (8 paragraphs of exceptions follow)." Search the term "privacy law" on Google for much more on this topic.
Compliance is another migraine for the data warehouse whenever revenue or profitability data is exposed through BI tools. One of the modules in Kimball University data warehouse classes covers how to allocate costs in an organization in order to compute profit. Be careful! The European Union has 27 member states, each with potentially varying financial responsibility guidance.
Transaction systems normally capture detailed financial transactions in the true original currency at the location of the transaction. Amounts in different currencies, of course, cannot be directly added. Exchange rates change every day, in some cases rapidly. Foreign currency symbols are essential in final reports, but may not be available in the fonts you use.
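One way to respect these constraints is to keep each transaction in its original currency and convert to a reporting currency only at query time, using the rate for the transaction date. The sketch below assumes US dollars as the reporting currency. The rate table, its values, and all names are hypothetical placeholders, not real market data.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical daily rates into USD, keyed by (currency, date).
# The numbers are made up for illustration.
RATES_TO_USD = {
    ("EUR", date(2008, 7, 8)): Decimal("1.57"),
    ("EUR", date(2008, 7, 9)): Decimal("1.58"),
    ("JPY", date(2008, 7, 8)): Decimal("0.0093"),
}


@dataclass(frozen=True)
class Transaction:
    amount: Decimal      # amount in the original currency
    currency: str        # currency code as captured at the source
    booked_on: date      # date used to pick the exchange rate


def to_usd(tx):
    """Convert one transaction to USD with the rate for its own date."""
    if tx.currency == "USD":
        rate = Decimal("1")
    else:
        rate = RATES_TO_USD[(tx.currency, tx.booked_on)]
    return (tx.amount * rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def total_usd(transactions):
    """Add amounts only after every one is in the same currency."""
    return sum((to_usd(tx) for tx in transactions), Decimal("0"))


sales = [
    Transaction(Decimal("100.00"), "EUR", date(2008, 7, 8)),
    Transaction(Decimal("10000"), "JPY", date(2008, 7, 8)),
    Transaction(Decimal("50.00"), "USD", date(2008, 7, 8)),
]
print(total_usd(sales))
```

Storing the original amount and currency alongside the date keeps the source figures intact, so totals can be recomputed if a different rate table or reporting currency is required later.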
Time Zones, Calendars and Date Formats
Contrary to popular belief, there are not just 24 time zones around the world, but hundreds! The complexity comes from daylight saving time rules. For example, Indiana is split between the Eastern and Central time zones, and until 2006 some Indiana counties observed daylight saving time while others did not. You need a list of Indiana counties, and the date, to know what time it was in Kokomo! In some areas of the world, there are dozens of jurisdictions with different time-zone rules.
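A common way to cope with these rules is to store timestamps in UTC and convert to local time with a named zone from the IANA time zone database, which records historical rule changes per jurisdiction. The sketch below uses Python's standard zoneinfo module. The zone name America/Indiana/Indianapolis is my assumption for Kokomo, and the sample timestamps are arbitrary.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Assumed zone for Kokomo, Indiana; other Indiana counties have their own entries.
KOKOMO_ZONE = ZoneInfo("America/Indiana/Indianapolis")


def local_time(utc_moment, zone=KOKOMO_ZONE):
    """Convert a timezone-aware UTC datetime to local wall-clock time."""
    return utc_moment.astimezone(zone)


# The same UTC hour in two different summers.
summer_2005 = datetime(2005, 7, 8, 16, 0, tzinfo=timezone.utc)
summer_2008 = datetime(2008, 7, 8, 16, 0, tzinfo=timezone.utc)

print(local_time(summer_2005).isoformat())
print(local_time(summer_2008).isoformat())
```

The two results differ by an hour because this zone began observing daylight saving time in 2006. A fixed offset stored with the row would not capture that change, whereas a zone name plus a UTC timestamp does.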
In Western countries, most of us use the Gregorian calendar, but there are several other important calendars. For example, July 8, 2008 in the Gregorian calendar is 6-6-4705 in the Chinese calendar; 6-6-2668 in the Japanese calendar; Rajab 4, 1429 in the Islamic calendar; and Tammuz 5, 5768 in the Hebrew calendar. Can your data warehouse handle these? And if a European writes "7-8-2008," is this July 8 or August 7?
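The "7-8-2008" ambiguity can be shown in a few lines. In this sketch the same string is parsed under two conventions and then rendered in the unambiguous ISO 8601 form. The convention labels and the function name are illustrative assumptions. The convention itself has to come from metadata about the source, because the string alone cannot reveal it.

```python
from datetime import date, datetime

# Hypothetical labels for the two conventions discussed in the text.
FORMATS = {
    "month_first": "%m-%d-%Y",  # common in the United States
    "day_first": "%d-%m-%Y",    # common in much of Europe
}


def parse_date(text, convention):
    """Parse a numeric date string under an explicitly stated convention."""
    return datetime.strptime(text, FORMATS[convention]).date()


raw = "7-8-2008"
print(parse_date(raw, "month_first").isoformat())  # 2008-07-08
print(parse_date(raw, "day_first").isoformat())    # 2008-08-07
```

Recording the source convention with each feed, and storing dates in a single unambiguous form, avoids having to guess later.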