Lossless Compression of Meteorological Data
by Rodrigo Iza-Teran and Rudolph Lorentz
Weather forecasts are getting better and better. Why? The models include more physical processes and resolution has been improved. The consequence: more data is generated by every weather forecast. Much more data. Very much more data. What does one do with all this data? Answer: compress it.
In the Department for Numerical Software (NUSO) at the Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), a new group has been founded to look at the compression of data resulting from numerical simulations. One of the fields in which it has been active is the compression of meteorological data. The motivation was the drastic increase in the amount of meteorological data needing to be stored. This is a consequence of improved weather models and of the higher resolution with which forecasts are being made. However, the data is also being used for new purposes: direct sales, data-mining and reanalyses. All of these tend to increase the amount of data that is archived for longer periods of time.
The initial impetus was provided by the plans of the German Weather Service (Deutscher Wetterdienst) to switch from their Local Model (LM) to the Local Model-Europe (LME). This new model not only covers a larger area but also has a higher vertical resolution. The amount of data they plan to archive can be seen in the graph below. Medium-term storage requirements, ie for about a year, are planned to be just short of 4 Petabytes. That is 4,000,000,000,000,000 bytes.
Compression involves changing the form of data in a file, so that the compressed file takes up less space than the original file. There are two types of compression: lossy and lossless. If the compression is lossy, one cannot retrieve the original file exactly from the compressed file; only an approximation of it can be recovered. The advantage of this approach is that the data can be compressed to a much greater degree. Lossy compression is typically used to compress image and video files, especially in the context of the Internet. Typical lossy compression formats are JPEG and MPEG, which can reduce the size of a file by factors of ten to fifty.
On the other hand, if lossless compression is used, the original file can be retrieved exactly from the compressed file. Lossless compression is typical for text files, but also for files containing sensitive numerical data, eg medical or meteorological data. All Zip utilities perform lossless compression, with compression factors of around 1.5 to 3.
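The difference between the two approaches can be illustrated with a short Python sketch. It uses only the standard library's zlib module, which implements the DEFLATE method commonly used in Zip files. The sample values, and the rounding step that stands in for a lossy method, are assumptions made purely for illustration.

```python
import struct
import zlib

# Hypothetical sample: temperatures in kelvin (illustrative values only).
values = [273.15 + 0.01 * (i % 50) for i in range(10_000)]
raw = struct.pack(f"<{len(values)}d", *values)

# Lossless: decompression returns exactly the bytes that went in.
packed = zlib.compress(raw, 9)
restored = zlib.decompress(packed)
lossless_ok = restored == raw

# Lossy (stand-in): round to one decimal place before storing.
# The discarded digits cannot be recovered afterwards.
rounded = [round(v, 1) for v in values]
lossy_packed = zlib.compress(struct.pack(f"<{len(rounded)}d", *rounded), 9)
max_error = max(abs(a - b) for a, b in zip(values, rounded))

print("lossless round trip exact:", lossless_ok)
print("lossless compression factor:", len(raw) / len(packed))
print("lossy compression factor:", len(raw) / len(lossy_packed))
print("largest error introduced by rounding:", max_error)
```

The compression factors printed here depend entirely on the artificial, highly repetitive sample and say nothing about real meteorological data.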
Meteorological data is typically stored in the GRIB format. This format, gridded data in binary form, is an international standard for the storage and exchange of meteorological data. The usual procedure is to first put the data obtained from a weather forecast into the GRIB format. This is lossy compression, since each value is stored with a fixed, limited number of bits. Afterwards, however, any compression is required to be lossless. The new group established at SCAI has developed a program, GRIBzip, for the lossless compression of meteorological data stored in the GRIB1 format (ie GRIB Version 1). As an example, if the data is formatted in GRIB1 with 16-bit precision, then the GRIB files produced by a typical weather forecast can be reduced in size losslessly by, on average, a factor of three. Extrapolating this to the example of the German Weather Service, storage would be required for only about 1.5 Petabytes of data instead of 4 Petabytes.
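Why the conversion to GRIB loses information, while every later step need not, can be sketched as follows. The sketch quantises floating-point values to 16-bit integers relative to a reference value and a scale, in the spirit of GRIB1 packing, and then compresses the integers losslessly by differencing neighbouring values and applying zlib. The function names, the synthetic field, and the choice of differencing plus zlib are assumptions for illustration only; this is not the technique used in GRIBzip, which is not described in this article.

```python
import math
import struct
import zlib

def quantise(values, bits=16):
    """Lossy step: map floats to integers in [0, 2**bits - 1]."""
    top = 2**bits - 1
    ref = min(values)
    span = max(values) - ref
    scale = span / top if span > 0 else 1.0
    ints = [min(top, round((v - ref) / scale)) for v in values]
    return ref, scale, ints

def dequantise(ref, scale, ints):
    """Recover approximate floats; the rounding error remains."""
    return [ref + scale * q for q in ints]

def compress_ints(ints):
    """Lossless step: difference neighbouring values, then DEFLATE."""
    deltas = [ints[0]] + [b - a for a, b in zip(ints, ints[1:])]
    return zlib.compress(struct.pack(f"<{len(deltas)}i", *deltas), 9)

def decompress_ints(blob):
    """Invert compress_ints exactly by summing the differences."""
    raw = zlib.decompress(blob)
    deltas = struct.unpack(f"<{len(raw) // 4}i", raw)
    ints, total = [], 0
    for d in deltas:
        total += d
        ints.append(total)
    return ints

# Synthetic, smooth "field" (an assumption, not real forecast data).
field = [280.0 + 10.0 * math.sin(i / 200.0) for i in range(20_000)]

ref, scale, ints = quantise(field)
blob = compress_ints(ints)
roundtrip_exact = decompress_ints(blob) == ints
max_error = max(abs(a - b) for a, b in zip(field, dequantise(ref, scale, ints)))

print("quantisation error (lossy step):", max_error)
print("integers recovered exactly (lossless step):", roundtrip_exact)
print("bytes as 16-bit values:", 2 * len(ints), "compressed:", len(blob))
```

The quantisation error is bounded by half the scale step and cannot be undone, whereas the integers produced by the quantisation are recovered bit for bit after decompression.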
Archiving data in a compressed form has another advantage that may not be immediately apparent. Normally the archiving system consists of hardware separate from the other computers, meaning the connection to the archiving system can be a bottleneck. By using compressed data, the bandwidth of the connection is effectively increased by the same factor by which the data has been compressed. In our example, the bandwidth would be effectively increased by a factor of three. A patent for the techniques used in the program has been applied for.
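The arithmetic behind these figures is simple enough to check directly. In the sketch below, the 4 Petabyte volume and the factor of three are taken from the text; the link speed is a made-up value used only to show the effect on transfers. Note that an exact factor of three would give about 1.33 Petabytes, so the 1.5 Petabyte figure quoted earlier corresponds to a somewhat lower effective factor of about 2.7.

```python
PETABYTE = 10**15  # bytes

def compressed_size(original_bytes, factor):
    """Storage needed after compressing by the given factor."""
    return original_bytes / factor

def effective_bandwidth(link_bytes_per_s, factor):
    """Uncompressed data moved per second when compressed data is sent."""
    return link_bytes_per_s * factor

archive = 4 * PETABYTE            # planned medium-term archive (from the text)
factor = 3.0                      # average lossless factor (from the text)
link = 100e6                      # hypothetical link speed: 100 MB/s

stored = compressed_size(archive, factor)
implied_factor = archive / (1.5 * PETABYTE)

print(f"storage at factor 3: {stored / PETABYTE:.2f} PB")
print(f"factor implied by 1.5 PB: {implied_factor:.2f}")
print(f"effective bandwidth: {effective_bandwidth(link, factor) / 1e6:.0f} MB/s")
```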
The programs developed at SCAI can compress GRIB data on topologically rectangular grids. Other types of grids, such as a triangular grid covering the whole globe, or grids which become sparser towards the poles, are also allowed by the GRIB format. We are planning programs able to losslessly compress these types of data. In addition, spectral data is sometimes stored in the GRIB format, and its compression is also desirable.
The group has also developed programs for compressing data produced by crash simulations (Pamcrash, LS-Dyna) in the automobile industry. Since this data is defined on irregular grids, the techniques used are quite different from those for data on regular grids. Using the expertise gained from these extreme cases, the group intends to compress data resulting from other types of simulations.
Rudolph Lorentz, Institute for Algorithms and Scientific Computing - SCAI, Fraunhofer ICT Group
Tel: +49 (0) 2241 143 480