We initially explored datalad, but other options are very interesting too:
datalad
Very powerful because it is built directly on git-annex, but I still haven't fully understood how to use it properly and efficiently.
Datalad is a data management system and, to my knowledge, only that. Focusing on this one task makes it very efficient, but it also limits what we can do with it, or calls for combining it with other tools, which might be just fine.
intake
A simple but powerful set of tools. Because it is simple, the community could easily contribute new catalog entries (through YAML files).
- Allows for local file caching
- Dask capabilities for big data
- Cloud access support
- Possibility for a simple GUI
- Storing catalog metadata in files makes the structure of our portal efficient and very easy to understand.
- The YAML format makes community contribution easier, even for non-coders (JSON and, even more so, XML can be intimidating to people not used to coding at all).
Intake is more than just a data management tool. It streamlines not only the data download step but also the reading step, through the many drivers available (and new ones are easy to implement).
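To make the catalog-plus-driver idea concrete, here is a toy sketch using only the Python standard library. It is not Intake's API: the `DRIVERS` registry, the `CATALOG` dictionary, its field names, and `read_source` are all hypothetical, and the data is stored inline instead of being downloaded. The dictionary stands in for what a parsed YAML catalog file could look like.

```python
import csv
import io
import json

# Hypothetical driver registry: maps a driver name to a function
# that turns raw text into Python objects.
DRIVERS = {
    "csv": lambda text: list(csv.DictReader(io.StringIO(text))),
    "json": lambda text: json.loads(text),
}

# Hypothetical catalog, shown in its parsed form. In a YAML catalog file,
# each source would be a short block carrying the same kind of fields.
CATALOG = {
    "stations": {
        "description": "Example station table (made-up values)",
        "driver": "csv",
        "data": "name,elev_m\nA,120\nB,450\n",
    },
    "settings": {
        "description": "Example settings record (made-up values)",
        "driver": "json",
        "data": '{"region": "alps", "year": 2020}',
    },
}


def read_source(catalog, name):
    """Look up a catalog entry and read it with the driver it names."""
    entry = catalog[name]
    reader = DRIVERS[entry["driver"]]
    return reader(entry["data"])


stations = read_source(CATALOG, "stations")
settings = read_source(CATALOG, "settings")
```

In this sketch, supporting a new format means adding one entry to `DRIVERS`, and adding a new data source means adding one block to the catalog, which is the kind of contribution a non-coder could make in a YAML file.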
pooch
Simple and similar to intake, except that data sources are not really treated as catalogs. It was developed to download test data for libraries, so we might run into some limitations for our metadata portal.
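Pooch checks downloaded files against a registry of known hashes. The sketch below illustrates only that verification idea with the standard library. It does not use pooch itself, and the names `sha256_of`, `verify_cached`, `registry`, and `cache` are hypothetical. The download step is left out, and a temporary directory stands in for the local cache.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA256 digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def verify_cached(name: str, registry: dict, cache_dir: Path) -> Path:
    """Return the cached file path if its hash matches the registry entry."""
    path = cache_dir / name
    if not path.exists():
        # A real tool would download the file at this point.
        raise FileNotFoundError(f"{name} is not in the cache")
    if sha256_of(path) != registry[name]:
        raise ValueError(f"hash mismatch for {name}")
    return path


# Demo: a temporary directory stands in for the local cache.
cache = Path(tempfile.mkdtemp())
(cache / "sample.txt").write_bytes(b"hello")

# Hypothetical registry: file name -> expected SHA256 digest.
registry = {"sample.txt": hashlib.sha256(b"hello").hexdigest()}

checked = verify_cached("sample.txt", registry, cache)
```

A registry like this is just a flat mapping of file names to hashes, with no descriptions or drivers attached, which is one way to picture the difference from a catalog.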
This comparison will be further modified/refined.