Datasets

The library gives access to a number of real-world graphs through the libam.datasets submodule. Importing the submodule gives you ready-to-use Dataset objects; no manual download or unpacking is required.

The first time you access a dataset, its file is fetched over the network from the libam-datasets repository and cached on disk (in the per-user pooch cache, e.g. ~/.cache/libam). Later accesses load directly from that cache, so each dataset is downloaded only once.
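The caching behaviour amounts to a download-once lookup: check the cache directory, fetch only on a miss, and serve every later request from disk. The sketch below is not libam's code (the library relies on pooch for this step); fetch_cached and fake_download are hypothetical names, and the downloader is faked so the example runs offline.

```python
import tempfile
from pathlib import Path
from typing import Callable


def fetch_cached(filename: str, cache_dir: Path, download: Callable[[str], bytes]) -> Path:
    """Return a local path for `filename`, calling `download` only on a cache miss.

    `download` is a hypothetical stand-in for the real network fetch;
    it maps a filename to the file's bytes.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / filename
    if not path.exists():
        # First access: fetch once and persist to disk.
        data = download(filename)
        # Write under a temporary name first so an interrupted download
        # never leaves a half-written file that looks cached.
        tmp = path.with_name(path.name + ".part")
        tmp.write_bytes(data)
        tmp.replace(path)
    # Any later access is served straight from the cache directory.
    return path


# Demo: a fake downloader that records how often it is called.
download_calls: list[str] = []


def fake_download(filename: str) -> bytes:
    download_calls.append(filename)
    return b"0 1\n1 2\n"


with tempfile.TemporaryDirectory() as tmp_dir:
    first = fetch_cached("example.edges", Path(tmp_dir), fake_download)
    second = fetch_cached("example.edges", Path(tmp_dir), fake_download)
    same_path = first == second
    contents = second.read_bytes()
```

The second call finds the file already on disk, so the downloader runs only once, mirroring the "downloaded only once" behaviour described above.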

Loading a dataset

Every dataset exposes two methods. Use graph() to get the raw graph as a NetworkX object, or graphpair() to get a GraphPair ready for alignment:

from libam import datasets

# Downloaded and cached on first access, loaded from cache thereafter.
# networkx.Graph; can be used to build a GraphPair
g = datasets.bio_celegans.graph()

# libam.GraphPair; can be passed directly to alignment algorithms
pair = datasets.bio_celegans.graphpair().permute().add_noise(target_noise=0.05)
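To make the idea behind permute() and add_noise() concrete, here is a stdlib-only sketch of building a mirrored pair from one edge list. It is an assumption-laden illustration, not libam's implementation: mirrored_pair is a hypothetical helper, and "noise" is modelled as randomly dropping a fraction of the target's edges, a common convention in graph-alignment benchmarks that libam's add_noise() may or may not follow.

```python
import random


def mirrored_pair(edges, num_nodes, noise=0.05, seed=0):
    """Return (target_edges, mapping) built from a single undirected graph.

    Hypothetical sketch: nodes are relabelled by a random permutation, then
    round(noise * len(edges)) edges are removed from the target.
    """
    rng = random.Random(seed)
    perm = list(range(num_nodes))
    rng.shuffle(perm)
    # Ground truth: source node i corresponds to target node perm[i].
    mapping = {i: perm[i] for i in range(num_nodes)}

    # Relabel every edge; store endpoints sorted so (u, v) and (v, u) coincide.
    target = [tuple(sorted((mapping[u], mapping[v]))) for u, v in edges]

    # Drop a random subset of the relabelled edges to simulate noise.
    n_remove = round(noise * len(target))
    removed = set(rng.sample(range(len(target)), n_remove))
    target = [e for k, e in enumerate(target) if k not in removed]
    return target, mapping


# A 20-node ring graph has 20 edges; noise=0.1 removes 2 of them.
ring = [(i, (i + 1) % 20) for i in range(20)]
target_edges, truth = mirrored_pair(ring, num_nodes=20, noise=0.1, seed=42)
print(len(target_edges))  # 18
```

Because the mapping is recorded before any edges are dropped, every surviving target edge still corresponds to a source edge, which is what makes the ground truth usable for evaluation.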

Available datasets

The following datasets are available as attributes of libam.datasets.

Single-graph datasets (structure only). These are often turned into mirrored graph pairs by calling graphpair() followed by permute() and add_noise():

  • bio_celegans

  • bio_dmela

  • ca_astro_ph

  • ca_erdos992

  • ca_gr_qc

  • ca_netscience

  • in_arenas

  • inf_euroroad

  • inf_power

  • soc_facebook

  • soc_hamsterster

  • socfb_bowdoin47

  • socfb_hamilton46

  • socfb_haverford76

  • socfb_swarthmore42

Paired datasets, each consisting of a source graph, a target graph and a ground-truth mapping between them. All of them include node features unless noted otherwise:

  • cora

  • douban

  • acm_dblp

  • allmv_tmdb

  • fb_tw

  • ppi

  • foursquare (no node features)

  • phone (no node features)
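The ground-truth mapping is what lets you score an alignment. One common metric is node accuracy: the fraction of source nodes whose predicted counterpart matches the true one. The function below is a hypothetical helper, not part of libam, and it assumes both mappings are plain dicts from source node to target node; the library may store them differently.

```python
def alignment_accuracy(predicted: dict, ground_truth: dict) -> float:
    """Fraction of ground-truth source nodes whose predicted target is correct.

    Source nodes missing from `predicted` count as errors.
    """
    if not ground_truth:
        raise ValueError("ground_truth must not be empty")
    hits = sum(1 for src, tgt in ground_truth.items() if predicted.get(src) == tgt)
    return hits / len(ground_truth)


gt = {0: "a", 1: "b", 2: "c", 3: "d"}
pred = {0: "a", 1: "c", 2: "c"}  # node 1 is wrong, node 3 is unmatched
acc = alignment_accuracy(pred, gt)
print(acc)  # 0.5
```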

The Dataset class

class libam.datasets.Dataset(filename: str, loader: Callable[[Path], Any], parser: Callable[..., GraphPair], members: list[str] | None = None)
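Reading the signature, a Dataset pairs a file with a loader (a function from a Path to some raw object) and a parser (a function that builds the final GraphPair). The sketch below shows one plausible way such pieces compose; it is not libam's Dataset. SketchDataset, build(), load_edges and to_pair are hypothetical names, the sketch takes a local path instead of a filename to skip the download step, the parser is assumed to receive the loader's output plus keyword options, and members is omitted because its role is not described here.

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Callable, Optional


@dataclass
class SketchDataset:
    """Illustrative loader/parser pipeline; not libam's Dataset class."""

    path: Path                     # assumed: file already present locally
    loader: Callable[[Path], Any]  # reads the file into a raw object
    parser: Callable[..., Any]     # builds the final object from the raw data
    _raw: Optional[Any] = field(default=None, init=False, repr=False)

    def raw(self) -> Any:
        # Run the loader on first use only; reuse the in-memory result afterwards.
        if self._raw is None:
            self._raw = self.loader(self.path)
        return self._raw

    def build(self, **kwargs: Any) -> Any:
        # Hypothetical: the parser gets the raw data plus any keyword options.
        return self.parser(self.raw(), **kwargs)


load_calls: list[Path] = []


def load_edges(path: Path) -> list[tuple[int, int]]:
    # Parse a whitespace-separated edge list, one edge per line.
    load_calls.append(path)
    edges = []
    for line in path.read_text().splitlines():
        u, v = line.split()
        edges.append((int(u), int(v)))
    return edges


def to_pair(edges, directed=False):
    # Trivial stand-in for a GraphPair: the same edges as source and target.
    return {"source": edges, "target": list(edges), "directed": directed}


with tempfile.TemporaryDirectory() as tmp_dir:
    edge_file = Path(tmp_dir) / "toy.edges"
    edge_file.write_text("0 1\n1 2\n")
    ds = SketchDataset(edge_file, load_edges, to_pair)
    p1 = ds.build()
    p2 = ds.build(directed=True)
```

Keeping loading and parsing as separate callables lets one file format feed several parsers, and caching the raw object means repeated builds do not re-read the file.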