Gaia DR2 bulk catalogue available in FAPEC format

2019-02-11 00:00:00
The Gaia group at the Universitat de Barcelona (IEEC – ICCUB), in cooperation with DAPCOM Data Services S.L. (a technological spin-off company of the UPC and the UB), has published an alternative copy of the bulk data files from Gaia DR2 – the second data release from Gaia.

Gaia DR2 was published on 25 April 2018. Besides the online catalogue, bulk CSV files were also made available for download, an interesting option for exhaustive analyses. These files are officially offered in “csv.gz” format, that is, compressed with the widely used gzip compressor.

On 6 February 2019, DAPCOM released FAPEC Archiver 19.0, a professional data compression package offering high compression ratios at high speeds. One of its options is the compression of tabular (CSV-like) text files, such as the bulk Gaia DR2 files. As a demonstration of FAPEC's capabilities, DAPCOM converted the full set of Gaia DR2 bulk CSV files to the FAPEC format, reducing the total size from 554 GB (as csv.gz) to 471 GB, that is, 15% smaller than with gzip. Other data compressors such as bzip2, rar, Zstandard or 7-zip cannot reach this mark. Specifically, for the largest tables:

  • gaia_source, from 548 GB to 466 GB. Several CSV files have also been combined into larger FAPEC archives to improve download speeds.
  • gaia_source_with_rv, from 3.1 GB to 2.5 GB.
  • light_curves, from 2.3 GB to 1.9 GB.
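
The quoted sizes can be turned into relative savings with a few lines of Python. This is only a sketch of the arithmetic: it assumes that the first figure of each pair is the csv.gz size and the second the csv.fapec size, and the names SIZES_GB, reduction_percent and summarize are made up for this example.

```python
# Sizes in GB as quoted in the text: (csv.gz size, csv.fapec size).
# Assumption: the first figure of each pair is the gzip size.
SIZES_GB = {
    "total": (554, 471),
    "gaia_source": (548, 466),
    "gaia_source_with_rv": (3.1, 2.5),
    "light_curves": (2.3, 1.9),
}


def reduction_percent(gzip_gb, fapec_gb):
    """Return how much smaller the FAPEC copy is than the gzip copy, in percent."""
    return 100.0 * (gzip_gb - fapec_gb) / gzip_gb


def summarize(sizes):
    """Map each table name to its size reduction, rounded to one decimal."""
    return {
        name: round(reduction_percent(gzip_gb, fapec_gb), 1)
        for name, (gzip_gb, fapec_gb) in sizes.items()
    }


if __name__ == "__main__":
    for name, pct in summarize(SIZES_GB).items():
        print(f"{name}: {pct}% smaller than csv.gz")
```

With these figures the total and gaia_source come out at about 15%, while the two smaller tables shrink somewhat more, by roughly 17% to 19%.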

You can now download Gaia DR2 in csv.fapec format here:

Gaia DR2 csv.fapec bulk download

There you will also find the scripts used for the gzip-to-fapec conversion, as well as the log files from the process, during which we checked each of the files to make sure no data was lost or corrupted.
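
If you want to repeat such a check on your own copy, one possible approach is to compare checksums of the decompressed contents. The sketch below is not the script used for the conversion (those are available at the download location). It assumes you have an official csv.gz file and a plain CSV file that you have already restored from the corresponding FAPEC archive with DAPCOM's decompressor, and that the two correspond one to one, which may not hold where several CSV files were combined into a single archive. The function names and file paths are hypothetical.

```python
import gzip
import hashlib

CHUNK = 1 << 20  # read 1 MiB at a time so large files never sit in memory


def sha256_of_stream(stream):
    """Return the SHA-256 hex digest of everything readable from a binary stream."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(CHUNK), b""):
        digest.update(chunk)
    return digest.hexdigest()


def same_csv_content(gz_path, csv_path):
    """True if the CSV inside gz_path is byte-identical to the plain CSV at csv_path."""
    with gzip.open(gz_path, "rb") as gz_stream, open(csv_path, "rb") as csv_stream:
        return sha256_of_stream(gz_stream) == sha256_of_stream(csv_stream)
```

A call such as same_csv_content("official.csv.gz", "restored.csv"), with placeholder file names, returns True only when both files hold exactly the same bytes once the gzip layer is removed.
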
Free FAPEC decompression licenses can be obtained from the DAPCOM website.
Have fun!
