FLOPS
From Wikipedia, the free encyclopedia
In computing, FLOPS (or flops or flop/s) is an acronym meaning FLoating point Operations Per Second. The FLOPS is a measure of a computer's performance, especially in fields of scientific calculations that make heavy use of floating point calculations, similar to instructions per second. Since the final S stands for "second", conservative speakers consider "FLOPS" as both the singular and plural of the term, although the singular "FLOP" is frequently encountered. Alternatively, the singular FLOP (or flop) is used as an abbreviation for "FLoating-point OPeration", and a flop count is a count of these operations (e.g., required by a given algorithm or computer program). In this context, "flops" is simply the plural rather than a rate.
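The distinction between a flop count (a number of operations) and FLOPS (a rate) can be illustrated with a small sketch. The function names below are hypothetical, and the counts assume the naive algorithms, counting each multiplication and each addition as one floating-point operation.

```python
def dot_flop_count(n: int) -> int:
    """Flop count of a length-n dot product: n multiplications, n - 1 additions."""
    return 2 * n - 1 if n > 0 else 0


def matmul_flop_count(m: int, k: int, n: int) -> int:
    """Flop count of a naive (m x k) by (k x n) matrix product.

    The result has m * n entries, each a dot product of length k.
    """
    return m * n * dot_flop_count(k)


def flops_rate(flop_count: int, seconds: float) -> float:
    """Convert a flop count and an elapsed time into a rate (FLOPS)."""
    return flop_count / seconds


if __name__ == "__main__":
    count = matmul_flop_count(100, 100, 100)
    # Hypothetical timing: suppose the product took 0.5 seconds.
    print(count, "flops,", flops_rate(count, 0.5), "FLOPS")
```

The first two functions return a count that depends only on the algorithm, while the third depends on how long a particular machine takes to run it.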
Computing devices exhibit an enormous range of performance levels in floating-point applications, so it makes sense to introduce larger units than FLOPS. The standard SI prefixes can be used for this purpose, resulting in such units as gigaFLOPS (one billion or 1×10^9 FLOPS), teraFLOPS (one trillion or 1×10^12 FLOPS) and petaFLOPS (one quadrillion or 1×10^15 FLOPS). IBM's top supercomputer, dubbed Blue Gene/P, is designed to continuously operate at speeds exceeding one petaFLOPS and, when configured to do so, reaches speeds in excess of three petaFLOPS. NEC's SX-9 supercomputer has a peak processing performance of 839 teraFLOPS and features the world's first vector processor to exceed 100 gigaFLOPS per single core.
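As a minimal sketch of how the SI prefixes apply, the following hypothetical helper picks the largest prefix that fits a raw FLOPS figure. The helper name and the output format are assumptions made for illustration only.

```python
# SI prefixes from largest to smallest, as powers of ten.
SI_PREFIXES = [
    (10**15, "peta"),
    (10**12, "tera"),
    (10**9, "giga"),
    (10**6, "mega"),
    (10**3, "kilo"),
]


def format_flops(rate: float) -> str:
    """Express a raw FLOPS figure using the largest SI prefix that fits."""
    for scale, name in SI_PREFIXES:
        if rate >= scale:
            return f"{rate / scale:g} {name}FLOPS"
    return f"{rate:g} FLOPS"


if __name__ == "__main__":
    print(format_flops(839e12))    # SX-9 peak quoted above
    print(format_flops(102.4e9))   # SX-9 single-core peak
    print(format_flops(10))        # below one kiloFLOPS
```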
A basic calculator performs relatively few FLOPS. Each calculation request to a typical calculator requires only a single operation, so the calculator rarely needs to respond faster than its operator can perceive. Any response time below 0.1 second is perceived as instantaneous by a human operator, so a simple calculator needs only about 10 FLOPS (one operation per 0.1 second).
In order for FLOPS to be useful as a measure of floating-point performance, a standard benchmark must be available on all computers of interest. One example is the LINPACK benchmark.
FLOPS in isolation are arguably not very useful as a benchmark for modern computers. There are many factors in computer performance other than raw floating-point computation speed, such as I/O performance, interprocessor communication, cache coherence, and the memory hierarchy. This means that supercomputers are in general only capable of a small fraction of their "theoretical peak" FLOPS throughput (obtained by adding together the theoretical peak FLOPS performance of every element of the system). Even when operating on large highly parallel problems, their performance will be bursty, mostly due to the residual effects of Amdahl's law. Real benchmarks therefore measure both peak actual FLOPS performance as well as sustained FLOPS performance.
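The gap between theoretical peak and achievable performance can be sketched with the textbook form of Amdahl's law. This is only an illustrative model: the function names, the parallel fraction, and the processor count below are hypothetical, and real systems are also limited by I/O, communication, and memory effects that the model ignores.

```python
def theoretical_peak(element_peaks):
    """Theoretical peak: the sum of the peak FLOPS of every element."""
    return sum(element_peaks)


def amdahl_speedup(parallel_fraction: float, n_processors: int) -> float:
    """Amdahl's law: speedup over one processor when only part of the work
    can be spread across n_processors."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_processors)


def fraction_of_peak(parallel_fraction: float, n_processors: int) -> float:
    """Share of the summed theoretical peak that the model predicts is usable."""
    return amdahl_speedup(parallel_fraction, n_processors) / n_processors


if __name__ == "__main__":
    # Hypothetical machine: 1000 processors at 1 gigaFLOPS each,
    # running a program that is 99% parallelizable.
    peak = theoretical_peak([1e9] * 1000)
    usable = peak * fraction_of_peak(0.99, 1000)
    print(f"peak {peak:.3g} FLOPS, modelled sustained {usable:.3g} FLOPS")
```

Under these assumed numbers the modelled speedup is about 91 on 1000 processors, so roughly 9% of the summed peak is reached even though only 1% of the work is serial.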
For ordinary (non-scientific) applications, integer operations (measured in MIPS) are far more common. Measuring floating-point operation speed, therefore, does not accurately predict how the processor will perform on an arbitrary problem. However, for many scientific jobs such as analysis of data, a FLOPS rating is effective.
Historically, the earliest reliably documented serious use of the floating-point operation as a metric appears to be the AEC's justification to Congress for purchasing a Control Data CDC 6600 in the mid-1960s.
The terminology has been confusing enough that, until April 24, 2006, U.S. export control was based on measurement of "Composite Theoretical Performance" (CTP) in millions of "Theoretical Operations Per Second" (MTOPS). On that date, the U.S. Department of Commerce's Bureau of Industry and Security amended the Export Administration Regulations to base controls on Adjusted Peak Performance (APP) in Weighted teraFLOPS (WT).
On October 25, 2007, NEC Corporation of Japan issued a press release announcing its SX series model SX-9, claiming it to be the world's fastest vector supercomputer with a peak processing performance of 839 teraFLOPS. The SX-9 features the first CPU capable of a peak vector performance of 102.4 gigaFLOPS per single core.
On June 26, 2007, IBM announced the second generation of its top supercomputer, dubbed Blue Gene/P and designed to continuously operate at speeds exceeding one petaFLOPS. When configured to do so, it can reach speeds in excess of three petaFLOPS.
In June 2007, Top500.org reported the fastest computer in the world to be the IBM Blue Gene/L supercomputer, measuring a peak of 596 TFLOPS. The Cray XT4 hit second place with 101.7 TFLOPS.
In June 2006, a new computer was announced by Japanese research institute RIKEN, the MDGRAPE-3. The computer's performance tops out at one petaFLOPS, almost two times faster than the Blue Gene/L. MDGRAPE-3 is not a general-purpose computer, which is why it does not appear in the Top500.org list. It has special-purpose pipelines for simulating molecular dynamics. MDGRAPE-3 houses 4,808 custom processors, 64 servers each with 256 dual-core processors, and 37 servers each containing 74 processors, for a total of 40,314 processor cores, compared to the 131,072 needed for the Blue Gene/L. MDGRAPE-3 is able to do many more computations with fewer chips because of its specialized architecture. The computer is a joint project between RIKEN, Hitachi, Intel, and NEC subsidiary SGI Japan.
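The core total quoted for MDGRAPE-3 can be checked with a few lines of arithmetic. The variable names are hypothetical, and the sketch assumes that the 74 processors in each of the 37 servers contribute one core each, which is what makes the figures add up to the stated total.

```python
# Figures quoted in the text for MDGRAPE-3.
custom_processors = 4_808           # counted as one core each
dual_core_server_cores = 64 * 256 * 2
other_server_cores = 37 * 74        # assumption: one core per processor

mdgrape3_cores = custom_processors + dual_core_server_cores + other_server_cores
blue_gene_l_cores = 131_072

# How many Blue Gene/L cores there are for each MDGRAPE-3 core.
core_ratio = blue_gene_l_cores / mdgrape3_cores

if __name__ == "__main__":
    print(mdgrape3_cores, round(core_ratio, 2))
```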
Distributed computing uses the Internet to link personal computers to achieve a similar effect:
- The entire BOINC averages 663 TFLOPS as of September 8, 2007.[1]
- SETI@Home averages more than 265 TFLOPS.[2]
- Folding@Home has reached over 1 PFLOPS[3] as of September 15, 2007.[4] Since March 22, 2007, PlayStation 3 owners have been able to participate in the Folding@home project. As a result, Folding@home now sustains considerably more than 210 TFLOPS (1267 TFLOPS as of September 23, 2007). See the current stats[5] for details.
- Einstein@Home is crunching more than 70 TFLOPS.[6]
- As of June 2007, GIMPS is sustaining 23 TFLOPS.[7]
- Intel Corporation has recently unveiled the experimental multi-core POLARIS chip, which achieves 1 TFLOPS at 3.2 GHz. The 80-core chip can increase this to 1.8 TFLOPS at 5.6 GHz, although the thermal dissipation at this frequency exceeds 260 watts.
As of 2007, the fastest PC processors perform over 30 GFLOPS.[8] GPUs in PCs are considerably more powerful in terms of pure FLOPS. For example, in the GeForce 8 Series, the Nvidia 8800 Ultra performs around 576 GFLOPS on 128 processing elements. This equates to around 4.5 GFLOPS per element, compared with 2.75 per core for the Blue Gene/L. The 8800 series performs only single-precision calculations, and while GPUs are highly efficient at calculations, they are not as flexible as a general-purpose CPU.
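The per-element figure follows from dividing the total throughput by the number of processing elements. A minimal sketch, using only the numbers quoted above and a hypothetical helper name:

```python
def gflops_per_element(total_gflops: float, elements: int) -> float:
    """Average throughput per processing element, in GFLOPS."""
    return total_gflops / elements


# Figures quoted in the text.
geforce_8800_ultra = gflops_per_element(576.0, 128)
blue_gene_l_per_core = 2.75

# Ratio of per-element throughput. This compares the quoted figures only;
# the GPU number is single precision, so it is not a like-for-like comparison.
per_element_ratio = geforce_8800_ultra / blue_gene_l_per_core

if __name__ == "__main__":
    print(geforce_8800_ultra, round(per_element_ratio, 2))
```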
- 1997: about US$30,000 per GFLOPS; with two 16-Pentium-Pro–processor Beowulf cluster computers[9]
- 2000, April: $1,000 per GFLOPS, Bunyip, Australian National University. First sub-US$1/MFlop. Gordon Bell Prize 2000.
- 2000, May: $640 per GFLOPS, KLAT2, University of Kentucky
- 2003, August: $82 per GFLOPS, KASY0, University of Kentucky
- 2006, February: about $1 per GFLOPS in an ATI PC add-in graphics card (X1900 architecture); these figures are disputed because they refer to highly parallelized GPU power.
- 2007, March: about $0.42 per GFLOPS in Ambric AM2045[10].
- 2007, October: about $0.20 per GFLOPS with the cheapest retail Sony PS3 console, at US$400, that runs at a claimed 2 teraFLOPS; these figures represent the processing power of the GPU. The seven CPUs run collectively at a lower 218 GFLOPS.[11]
This trend toward lower and lower cost for the same computing power follows Moore's law.
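The cost figures in the list are simply a price divided by a throughput. The sketch below reproduces the PS3 entry and adds a hypothetical helper that estimates how long it takes for the cost to halve between two data points. The helper names are assumptions, and because the list mixes clusters, GPUs, and consoles, any halving time derived from it is only a rough indication.

```python
import math


def cost_per_gflops(price_usd: float, gflops: float) -> float:
    """Price divided by throughput, in US dollars per GFLOPS."""
    return price_usd / gflops


def halving_time_years(old_cost: float, new_cost: float, years_elapsed: float) -> float:
    """Average time for the cost to halve, assuming steady exponential decline."""
    halvings = math.log2(old_cost / new_cost)
    return years_elapsed / halvings


if __name__ == "__main__":
    # PS3 entry from the list: US$400 for a claimed 2 teraFLOPS (2000 GFLOPS).
    print(cost_per_gflops(400, 2000))
    # April 2000 ($1,000) to August 2003 ($82), roughly 3.33 years apart.
    print(halving_time_years(1000, 82, 3.33))
```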
- ^ Berkeley Open Infrastructure for Network Computing (BOINC)
- ^ SETI at home
- ^ Folding@home
- ^ [1]
- ^ http://fah-web.stanford.edu/cgi-bin/main.py?qtype=osstats
- ^ Einstein@Home - Server Status
- ^ Internet PrimeNet Server Parallel Technology for the Great Internet Mersenne Prime Search
- ^ Tom's Hardware's 2007 CPU Charts
- ^ Loki and Hyglac
- ^ http://www.ambric.com/pdf/MPR_Ambric_Article_10-06_204101.pdf
- ^ http://news.bbc.co.uk/2/hi/technology/4554025.stm
- Current Einstein@Home benchmark
- BOINC projects global benchmark
- Current GIMPS throughput
- Top500.org
- LinuxHPC.org Linux High Performance Computing and Clustering Portal
- WinHPC.org Windows High Performance Computing and Clustering Portal
- Oscar Linux-cluster ranking list by CPUs/types and respective FLOPS
- Petaflop developments and information
- Information on how to calculate “Composite Theoretical Performance" (CTP)
- Information on the Oak Ridge National Laboratory Cray XT system.
- Infiscale Cluster Portal - Free GPL HPC