Dataflow language

From Wikipedia, the free encyclopedia

In computer programming, a dataflow language is a programming language that implements dataflow principles and architecture, and models a program, conceptually if not physically, as a directed graph of the data flowing between operations. Dataflow languages share some features of functional languages, and were generally developed in order to bring some functional concepts to a language more suitable for numeric processing.

Contents

Dataflow languages contrast with the majority of programming languages, which use the imperative programming model. In imperative programming the program is modelled as a series of operations, the data being effectively invisible. This distinction may seem minor, but the paradigm shift is fairly dramatic, and allows dataflow languages to be spread out across multicore, multiprocessor systems for free.

One of the key concepts in computer programming is the idea of "state", essentially a snapshot of the measure of various conditions in the system. Most programming languages require a considerable amount of state information in order to operate properly, information which is generally hidden from the programmer. For a real world example, consider a three-way light switch. Typically a switch turns on a light by moving it to the "up" position, but in a three-way case that may turn the light back off — the result is based on the state of the other switch, which is likely out of view.

In fact, the state is often hidden from the computer itself as well, which normally has no idea that this piece of information encodes state, while that is temporary and will soon be discarded. This is a serious problem, as the state information needs to be shared across multiple processors in parallel processing machines. Without knowing which state is important and which isn't, most languages force the programmer to add a considerable amount of extra code to indicate which data and parts of the code are important in this respect.

This code tends to be both expensive in terms of performance, as well as difficult to debug and often downright ugly; most programmers simply ignore the problem. Those that cannot must pay a heavy performance cost, which is paid even in the most common case when the program runs on one processor. Explicit parallelism is one of the main reasons for the poor performance of Enterprise Java Beans when building data-intensive, non-OLTP applications.

Dataflow languages promote the data to become the main concept behind any program. It may be considered odd that this is not always the case, as programs generally take in data, process it, and then feed it back out. This was especially true of older programs, and is well represented in the Unix operating system which pipes the data between small single-purpose tools. Programs in a dataflow language start with an input, perhaps the command line parameters, and illustrate how that data is used and modified. The data is now explicit, often illustrated physically on the screen as a line or pipe showing where the information flows.

Operations consist of "black boxes" with inputs and outputs, all of which are always explicitly defined. They run as soon as all of their inputs become valid, as opposed to when the program encounters them. Whereas a traditional program essentially consists of a series of statements saying "do this, now do this", a dataflow program is more like a series of workers on an assembly line, who will do their assigned task as soon as the materials arrive. This is why dataflow languages are inherently parallel; the operations have no hidden state to keep track of, and the operations are all "ready" at the same time.

Dataflow programs are generally represented very differently inside the computer as well. A traditional program is just what it seems, a series of instructions that run one after the other. A dataflow program might be implemented as a big hash table instead, with uniquely identified inputs as the keys, and pointers to the code as data. When any operation completes, the program scans down the list until it finds the first operation where all of the inputs are currently valid, and runs it. When that operation finishes it will typically put data into one or more outputs, thereby making some other operation become valid.

For parallel operation only the list needs to be shared; the list itself is the state of the entire program. Thus the task of maintaining state is removed from the programmer and given to the language's runtime instead. On machines with a single processor core where an implementation designed for parallel operation would simply introduce overhead, this overhead can be removed completely by using a different runtime.

There are many hardware architectures oriented toward the efficient implementation of dataflow programming models. MIT's tagged token dataflow architecture was designed by Greg Papadopoulos who is now the CTO of Sun Microsystems.

Dataflow languages were originally developed in order to make parallel programming easier. Thus, they were typically developed at the large supercomputer labs. One of the most popular was SISAL, developed at Lawrence Livermore National Laboratory. SISAL looks like most statement-driven languages, but demands that every variable be defined only once. This allows the compiler to easily identify the inputs and outputs. A number of offshoots of SISAL have been developed, including SAC, Single Assignment C, which tries to remain as close to the popular C programming language as possible.

A more radical concept is Prograph, in which programs are constructed as graphs onscreen, and variables are replaced entirely with lines linking inputs to outputs. Ironically, Prograph was originally written on the Macintosh, the only computer with a powerful enough GUI at the time, which remained single-processor until very recently.

The most popular dataflow languages are more practical, the most famous being National Instruments' LabVIEW. It was originally intended to make linking data between lab equipment easy for non-programmers, but has since become more general purpose. Another is VEE, optimized to use with data acquisition devices like digital voltmeters and oscilloscopes, and source devices like arbitrary waveform generators and power supplies.

Advanced Search
Included Web Search Engines


Safe Search

close

Top Matching Results

Occasionally Search.com will highlight specialized results that are based on the context of your query. Examples of specialized results include specific links to news, images, or video.

Top Matching Results may highlight information from other Search.com pages, content from the CNET Network of sites, or third party content. The listings are based purely on relevance. Search.com does not receive payment for listings in this section but our partners that provide this data may get paid for listing these products.

Sponsored Links

This section contains paid listings which have been purchased by companies that want to have their sites appear for specific search terms and related content. These listings are administered, sorted and maintained by a third party and are not endorsed by Search.com.

Search Results

Search.com sends your search query to several search engines at one time and integrates the results into one list which has been sorted by relevance using Search.com's proprietary algorithm. You can customize the list of search engines included in your metasearch from the preferences.

The search engines that are used in your metasearch may allow companies to pay to have their Web sites included within the results. To view the Paid Inclusion policy for a specific search engine, please visit their Web site. Search.com does not accept payment or share revenue with any search engine partner for listings in this section.