IBM Data Asset eXchange - PubLayNet

This page contains links to download the PubLayNet dataset, hosted as part of the IBM Data Asset eXchange (DAX).

Part files

Because the training data is large (~100GB), the training images are split into 7 smaller archives of about 13GB each.

There are also separate archives for the test (2GB), val (3.1GB), and label (314MB) data.

Download all of these archives to one location and extract each of them there to reconstruct the full dataset.
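If you want to script the extraction step, here is a minimal sketch using Python's standard `tarfile` module. It assumes the part files are gzip-compressed tar archives (`.tar.gz`); this page does not state the archive format. It also assumes the archives share a common top-level layout, so extracting them all into the same directory merges them into one dataset tree. The function name `extract_all`, the `*.tar.gz` pattern, and the directory names are hypothetical.

```python
import tarfile
from pathlib import Path


def extract_all(archive_dir, out_dir, pattern="*.tar.gz"):
    """Extract every archive matching `pattern` in `archive_dir` into `out_dir`.

    Assumption: the part files are gzip-compressed tarballs with a shared
    top-level layout, so extracting them into one directory merges them.
    Returns the names of the archives that were extracted, in order.
    """
    archive_dir = Path(archive_dir)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    extracted = []
    # Sort the matches so the extraction order is deterministic.
    for archive in sorted(archive_dir.glob(pattern)):
        with tarfile.open(archive, mode="r:gz") as tar:
            # Use the safer "data" extraction filter where this Python
            # version supports it; otherwise fall back to plain extraction.
            if hasattr(tarfile, "data_filter"):
                tar.extractall(out_dir, filter="data")
            else:
                tar.extractall(out_dir)
        extracted.append(archive.name)
    return extracted
```

For example, `extract_all("downloads", "publaynet")` would extract every matching archive downloaded into a hypothetical `downloads/` directory into `publaynet/`. Extracting into a single output directory is what lets the separate part files combine into one dataset.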

Full Archive

Alternatively, you can download the full archive and extract it.

Warning: large download size (96GB).