Data transformation

From Wikipedia, the free encyclopedia

This article is about data transformation in computer science (metadata). For statistical application, see data transformation (statistics).

In metadata, a data transformation converts data from a source data format into a destination data format.

Data transformation can be divided into two steps:

  1. data mapping, which maps data elements from the source to the destination and captures any transformation that must occur
  2. code generation, which creates the actual transformation program

Data element to data element mapping is frequently complicated by complex transformations that require one-to-many and many-to-one transformation rules.

The code generation step takes the data element mapping specification and creates an executable program that can be run on a computer system. Code generation can also create transformations in easy-to-maintain computer languages such as Java or XSLT.
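
The two steps can be illustrated with a minimal sketch in Python. The mapping specification format, the field names (cust_no, full_name and so on) and the generate_code helper are all hypothetical, invented here for illustration; real tools use their own specification formats and often target languages such as Java or XSLT.

```python
# Hypothetical mapping specification: each rule lists its source fields,
# its destination fields, and the expression that converts one to the other.
MAPPING = [
    # one-to-one: copy a field under a new name
    {"src": ["cust_no"], "dst": ["customer_id"], "expr": "cust_no"},
    # many-to-one: two source fields combine into one destination field
    {"src": ["first", "last"], "dst": ["full_name"],
     "expr": "first + ' ' + last"},
    # one-to-many: one source field splits into two destination fields
    {"src": ["phone"], "dst": ["area_code", "number"],
     "expr": "phone.split('-', 1)"},
]


def generate_code(mapping):
    """Code generation step: emit Python source for a transform function."""
    lines = ["def transform(record):", "    out = {}"]
    for rule in mapping:
        # Bind each source field to a local variable of the same name.
        for name in rule["src"]:
            lines.append(f"    {name} = record[{name!r}]")
        # Assign the expression result to one or more destination fields.
        targets = ", ".join(f"out[{name!r}]" for name in rule["dst"])
        lines.append(f"    {targets} = {rule['expr']}")
    lines.append("    return out")
    return "\n".join(lines)


generated_source = generate_code(MAPPING)

# Turn the generated text into an executable function.
namespace = {}
exec(generated_source, namespace)
transform = namespace["transform"]

result = transform(
    {"cust_no": 7, "first": "Ada", "last": "Lovelace", "phone": "555-0100"}
)
```

Because the sketch runs the generated text with exec, it is only suitable for mapping specifications that come from a trusted source.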

When the mapping is indirect via a mediating data model, the process is also called data mediation.
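
As a rough illustration of mediation, the sketch below maps a source record into an intermediate model and then maps that model into the destination format. The record layouts and function names are assumptions made for this example only.

```python
def source_to_mediating(rec):
    """Map a source record (day/month/year date) into the mediating model."""
    day, month, year = rec["dob"].split("/")
    return {"name": rec["fullName"], "birth_date": f"{year}-{month}-{day}"}


def mediating_to_destination(med):
    """Map the mediating model into the destination layout."""
    return {
        "NAME": med["name"].upper(),
        "BIRTHDATE": med["birth_date"].replace("-", ""),
    }


def mediate(rec):
    """Indirect mapping: source -> mediating model -> destination."""
    return mediating_to_destination(source_to_mediating(rec))


destination = mediate({"fullName": "Ada Lovelace", "dob": "10/12/1815"})
```

With a mediating model, each additional format needs only a mapping to and from the intermediate model rather than a direct mapping to every other format.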

There are numerous languages available for performing data transformation, varying in their accessibility (cost) and general usefulness. Many transformational languages require a grammar to be provided, and in many cases the grammar is structured using something closely resembling Backus–Naur Form (BNF). Examples of such languages include:

  • XSLT - the XML transformation language
  • TXL - prototyping language-based descriptions using source transformation

Although transformational languages are typically best suited for transformation, something as simple as regular expressions can achieve useful transformations. The text editor TextPad supports regular expressions with arguments, which allows all instances of a particular pattern to be replaced with another pattern built from parts of the original. For example:

foo ("some string", 42, gCommon);
bar (someObj, anotherObj);

foo ("another string", 24, gCommon);
bar (myObj, myOtherObj);

could both be transformed into a more compact form like:

foobar("some string", 42, someObj, anotherObj);
foobar("another string", 24, myObj, myOtherObj);

In other words, every invocation of foo with three arguments that is followed by an invocation of bar with two arguments would be replaced with a single function invocation using some or all of the original set of arguments.
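
The replacement can be sketched with Python's re module rather than TextPad. The pattern below is an assumption tailored to the sample text: it expects a quoted string and an integer as the first two arguments of foo and simple identifiers elsewhere, and it would need extending for more general code.

```python
import re

# Capture the first two arguments of foo and both arguments of bar.
# The third argument of foo (gCommon in the sample) is matched but dropped.
PATTERN = re.compile(
    r'foo\s*\(\s*("[^"]*")\s*,\s*(\d+)\s*,\s*\w+\s*\)\s*;\s*'
    r'bar\s*\(\s*(\w+)\s*,\s*(\w+)\s*\)\s*;'
)
# \1..\4 refer back to the captured parts of the original pattern.
REPLACEMENT = r'foobar(\1, \2, \3, \4);'

source = '''foo ("some string", 42, gCommon);
bar (someObj, anotherObj);

foo ("another string", 24, gCommon);
bar (myObj, myOtherObj);
'''

compact = PATTERN.sub(REPLACEMENT, source)
print(compact)
```

Text outside the matched pairs, including the blank line between them, is left untouched.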

Another advantage of regular expressions is that they pass the null transform test: using your transformational language of choice, run a sample program through a transformation that makes no changes, and check that the output is identical to the input. Many transformational languages fail this test.
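
A minimal sketch of the test in Python follows. The tokenize_and_reprint function is a hypothetical stand-in for a tool that parses its input and regenerates the text; it is not modelled on any particular transformational language and is included only to show one way the original layout can be lost.

```python
import re


def regex_transform(text, rules):
    """Apply (pattern, replacement) rules in order; no rules means no change."""
    for pattern, replacement in rules:
        text = re.sub(pattern, replacement, text)
    return text


def tokenize_and_reprint(text):
    """Hypothetical tool that splits the input into tokens and reprints them."""
    tokens = re.findall(r'"[^"]*"|\w+|[^\w\s]', text)
    return " ".join(tokens)


sample = 'foo ("some string", 42, gCommon);  /* keep */\n'

# Null transform: run the sample through a transformation with no rules.
regex_passes = regex_transform(sample, []) == sample
reprint_passes = tokenize_and_reprint(sample) == sample
```

Here regex_passes is true because untouched text is copied through unchanged, while reprint_passes is false because the original spacing is discarded when the tokens are reprinted.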

There are many challenges in data transformation. Probably the most difficult problem to address in C++ is "unstructured preprocessor directives": preprocessor directives that do not contain blocks of code with simple grammatical descriptions. For example:

void MyFunc ()
{
  if (x>17)
  { printf("test");
#ifdef FOO
  } else {
#endif
    if (gWatch)
      mTest = 42;
  }
}

A fully general solution is very hard because such preprocessor directives can essentially edit the underlying language in arbitrary ways. However, because such directives are not, in practice, used in completely arbitrary ways, one can build practical tools for handling preprocessed languages. The DMS Software Reengineering Toolkit is capable of handling structured macros and preprocessor conditionals.
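
One simple heuristic for spotting unstructured conditionals is to check whether the braces inside each conditional branch are balanced. The Python sketch below is an illustration of that idea only, not the method used by any particular tool. It assumes braces never appear inside strings or comments, and it treats nested conditionals naively by counting the braces of every nested branch toward the enclosing conditional.

```python
import re

OPEN = re.compile(r'#\s*(if|ifdef|ifndef)\b')
BRANCH_END = re.compile(r'#\s*(elif|else|endif)\b')


def unstructured_conditionals(source):
    """Return line numbers of conditionals with a brace-unbalanced branch."""
    stack = []    # one entry per conditional that is still open
    flagged = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        text = line.strip()
        end = BRANCH_END.match(text)
        if OPEN.match(text):
            stack.append({"line": lineno, "depth": 0, "low": 0, "bad": False})
        elif end:
            if not stack:
                continue  # stray directive; ignored in this sketch
            top = stack[-1]
            # A branch is structured only if its braces close in order.
            if top["depth"] != 0 or top["low"] < 0:
                top["bad"] = True
            top["depth"] = top["low"] = 0
            if end.group(1) == "endif":
                stack.pop()
                if top["bad"]:
                    flagged.append(top["line"])
        else:
            for ch in line:
                if ch in "{}":
                    for entry in stack:
                        entry["depth"] += 1 if ch == "{" else -1
                        entry["low"] = min(entry["low"], entry["depth"])
    return flagged


SAMPLE = """void MyFunc ()
{
  if (x>17)
  { printf("test");
#ifdef FOO
  } else {
#endif
    if (gWatch)
      mTest = 42;
  }
}
"""

flagged_lines = unstructured_conditionals(SAMPLE)
```

For the sample above, the #ifdef on line 5 is flagged because its branch closes a brace that was opened outside the conditional.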

  • For further information on data transformation see Chapter 2.4 of [1].