You’re about to begin a project that will tap into or integrate data from a database. You’ve been looking for low-cost ways to rid that data of duplicates, near-duplicates, and obsolete or garbage records. But cleansing tools are expensive.
Now there’s a free option to consider. Integration tools maker Talend today announced general availability of Open Profiler, a GUI-based tool for Linux, Unix and Windows that lets developers peek inside data sources, evaluate the quality of the data they’re about to work with, and verify that it meets project goals or metrics.
Open Profiler 1.0.0RC1 includes a metadata repository, which stores the results of its introspections of files and data stores. Developers and data analysts can then use that metadata to create metrics and indicators. These indicators are statistics such as counts of rows, null values, distinct or unique values, duplicates and blank fields. Other indicators include the minimum, maximum and average length of text in fields; numerical summary values such as the mean, the interquartile range and the range; and advanced statistics such as the mode and frequency tables. The tool can also render the statistics as tables and graphs.
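To make those indicators concrete, here is a minimal sketch in Python of how such statistics could be computed for a single column of data. It is not Open Profiler's code or API. The function name `profile_column` and the dictionary keys are hypothetical, and the definitions used here are assumptions: "distinct" counts different values, "unique" counts values that occur exactly once, and "duplicate" counts values that occur more than once.

```python
from collections import Counter
from statistics import mean, quantiles


def profile_column(values):
    """Return simple profiling indicators for one column.

    `values` is a list in which None stands for a null field.
    Hypothetical helper for illustration; not part of Open Profiler.
    """
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)  # frequency of each non-null value

    indicators = {
        "row_count": len(values),
        "null_count": len(values) - len(non_null),
        "blank_count": sum(
            1 for v in non_null if isinstance(v, str) and v.strip() == ""
        ),
        # Assumed definitions: distinct = different values,
        # unique = values seen once, duplicate = values seen more than once.
        "distinct_count": len(counts),
        "unique_count": sum(1 for c in counts.values() if c == 1),
        "duplicate_count": sum(1 for c in counts.values() if c > 1),
        "frequency_table": dict(counts),
        "mode": counts.most_common(1)[0][0] if counts else None,
    }

    # Text indicators: minimum, maximum and average field length.
    lengths = [len(v) for v in non_null if isinstance(v, str)]
    if lengths:
        indicators["min_length"] = min(lengths)
        indicators["max_length"] = max(lengths)
        indicators["avg_length"] = mean(lengths)

    # Numerical summary values: mean, range and interquartile range.
    numbers = [
        v for v in non_null
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    ]
    if len(numbers) >= 2:  # quantiles() needs at least two data points
        q1, _, q3 = quantiles(numbers, n=4)
        indicators["mean"] = mean(numbers)
        indicators["range"] = max(numbers) - min(numbers)
        indicators["iqr"] = q3 - q1

    return indicators


# Example: profile a text column and a numeric column.
names = profile_column(["a", "b", "a", None, ""])
amounts = profile_column([1, 2, 3, 4, None])
```

A result such as `names["null_count"]` or `amounts["iqr"]` could then be compared against a project's data-quality thresholds, which is the kind of check the article describes.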
“Companies in every business face significant losses and inefficiencies that are caused by poor data quality,” said Talend CEO Bertrand Diard. Open Profiler, he continued, “helps companies understand and regain control of the quality of their data.”
Open Profiler 1.0.0RC1 is available now under the GPL version 2 open source license.