Datasets
A number of real-world graphs are available through the
libam.datasets submodule. Importing the submodule gives you ready-to-use
Dataset objects, with no manual download or unpacking steps required.
The first time you access a dataset, its file is fetched over the network from the
libam-datasets repository and cached on disk (under the per-user pooch cache, e.g. ~/.cache/libam).
Subsequent accesses load directly from that cache, so a dataset is only ever downloaded once.
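The download-once behaviour described above can be illustrated with a small standard-library sketch. This is not libam's implementation (which, per the text above, relies on pooch); fetch_cached and fake_download are hypothetical names used only to show the pattern of fetching on first access and reusing the cached file afterwards.

```python
import tempfile
from pathlib import Path


def fetch_cached(name, cache_dir, download):
    """Return the path to `name` in `cache_dir`, downloading it only if absent.

    `download` is a hypothetical callable that returns the file's bytes.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / name
    if not path.exists():
        # First access: fetch the payload and store it on disk.
        path.write_bytes(download(name))
    # Later accesses skip the download and reuse the cached file.
    return path


calls = []


def fake_download(name):
    # Stand-in for a network request; records each call it receives.
    calls.append(name)
    return b"0 1\n1 2\n"


with tempfile.TemporaryDirectory() as tmp:
    first = fetch_cached("toy.edges", tmp, fake_download)
    second = fetch_cached("toy.edges", tmp, fake_download)
    same_path = first == second
    cached_bytes = second.read_bytes()
```

Although fetch_cached is called twice, fake_download runs only once; the second call returns the file already on disk.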
Loading a dataset
Every dataset exposes two methods. Use graph() to get the
raw graph as a NetworkX object, or graphpair() to get a
GraphPair ready for alignment:
from libam import datasets
# Downloaded and cached on first access, loaded from cache thereafter.
# g is a networkx.Graph; use it to build a GraphPair yourself.
g = datasets.bio_celegans.graph()
# pair is a libam.GraphPair and can be used directly in algorithms.
pair = datasets.bio_celegans.graphpair().permute().add_noise(target_noise=0.05)
Available datasets
The following datasets are available as attributes of libam.datasets.
Single-graph datasets (structure only), often used to build mirrored graph pairs
via graphpair() together with permute() and add_noise():
bio_celegans
bio_dmela
ca_astro_ph
ca_erdos992
ca_gr_qc
ca_netscience
in_arenas
inf_euroroad
inf_power
soc_facebook
soc_hamsterster
socfb_bowdoin47
socfb_hamilton46
socfb_haverford76
socfb_swarthmore42
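To make the idea of a mirrored pair concrete, here is a minimal standard-library sketch of what permuting a graph and adding noise can mean. It does not use libam and is not its implementation: mirrored_pair is a hypothetical helper, and the noise model (dropping a random fraction of edges from the copy) is an assumption that may differ from what add_noise() actually does.

```python
import random


def mirrored_pair(edges, noise=0.05, seed=0):
    """Build a permuted, noisy copy of an undirected graph.

    Returns (target_edges, mapping), where mapping[u] is the target node
    that corresponds to source node u (the ground truth).
    """
    rng = random.Random(seed)
    nodes = sorted({u for edge in edges for u in edge})
    shuffled = nodes[:]
    rng.shuffle(shuffled)
    mapping = dict(zip(nodes, shuffled))

    # Relabel every edge through the permutation.
    relabeled = sorted({tuple(sorted((mapping[u], mapping[v]))) for u, v in edges})

    # Assumed noise model: drop a fraction `noise` of the edges at random.
    n_drop = int(round(noise * len(relabeled)))
    dropped = set(rng.sample(relabeled, n_drop))
    target = [e for e in relabeled if e not in dropped]
    return target, mapping


source = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
target, truth = mirrored_pair(source, noise=0.2, seed=42)
```

An alignment algorithm would receive source and target and try to recover truth. Because the copy is derived from the original, the ground-truth mapping is known by construction.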
Paired datasets, each holding a source graph, a target graph and a ground-truth mapping. These include node features unless noted:
cora
douban
acm_dblp
allmv_tmdb
fb_tw
ppi
foursquare (no node features)
phone (no node features)