ConnectionStudio integrates highly heterogeneous data into graphs, enriched with extracted entities. Studio users can discover the entities in their data, navigate across connections between datasets, and explore and query the data in many ways. The Studio currently supports the following dataset formats: CSV, JSON, XML, RDF, text, property graphs, all Office formats, and PDF.
ConnectionStudio is a novel front-end to ConnectionLens, Abstra and PathWays (see also the respective Web sites). Its own novel features are outlined in a CoopIS 2023 article.
Creating Projects
Each project corresponds to a set of datasets that may be heterogeneous at the model and/or schema level.
Loading the datasets
After creating a project, one can upload data into it. This loads all datasets into a single, integrated data graph, which preserves all aspects of the incoming datasets. It also extracts Named Entities from the leaf (value) nodes of the graph.
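To make the idea of a single integrated graph concrete, here is a minimal sketch in Python. It is not the ConnectionLens loader: the `Graph` class, the `load_json` and `load_csv` helpers, and the sample data are all hypothetical, and entity extraction is not shown. The sketch only illustrates how datasets of different models can end up as nodes and edges of one graph whose leaves hold the values.

```python
import csv
import io
import json
from dataclasses import dataclass, field


@dataclass
class Graph:
    """Toy data graph (hypothetical): labelled nodes, (parent, child) edges."""
    nodes: dict = field(default_factory=dict)  # node id -> label
    edges: list = field(default_factory=list)  # (parent id, child id)

    def add_node(self, label):
        node_id = len(self.nodes)
        self.nodes[node_id] = label
        return node_id

    def add_edge(self, parent, child):
        self.edges.append((parent, child))


def load_json(graph, name, text):
    """Keys become internal nodes, scalar values become leaf nodes."""
    def walk(value, parent):
        if isinstance(value, dict):
            for key, child in value.items():
                node = graph.add_node(key)
                graph.add_edge(parent, node)
                walk(child, node)
        elif isinstance(value, list):
            for item in value:
                walk(item, parent)
        else:
            graph.add_edge(parent, graph.add_node(str(value)))

    root = graph.add_node(name)
    walk(json.loads(text), root)
    return root


def load_csv(graph, name, text):
    """Each row becomes a node; each cell becomes a column node with a leaf."""
    root = graph.add_node(name)
    for row in csv.DictReader(io.StringIO(text)):
        row_node = graph.add_node("row")
        graph.add_edge(root, row_node)
        for column, value in row.items():
            col_node = graph.add_node(column)
            graph.add_edge(row_node, col_node)
            graph.add_edge(col_node, graph.add_node(value))
    return root


def leaf_values(graph):
    """Labels of nodes without children, i.e. where entities would be sought."""
    parents = {parent for parent, _ in graph.edges}
    return [label for node_id, label in graph.nodes.items()
            if node_id not in parents]


# Two datasets of different models, loaded into the same graph.
graph = Graph()
load_json(graph, "people.json", '{"person": {"name": "Alice", "city": "Paris"}}')
load_csv(graph, "offices.csv", "company,city\nAcme,Paris\n")
leaves = leaf_values(graph)
```

In this toy graph, the value "Paris" appears as a leaf in both datasets; shared values of this kind are what make connections across datasets possible.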
View Statistics
To get a first overview of the data, users can look at the overall statistics. Several views are shown.
Abstraction
Next, leveraging Abstra, users can grasp the content and structure of a dataset by looking at lightweight Entity-Relationship diagrams, computed from any semi-structured dataset.
Paths Exploration
One can also explore connections between Named Entities, which are found in the data using trained language models. This entity-focused path exploration is implemented by PathWays.
Data View
Once users are familiar with the data, they can enter the data view and query the dataset. Here, the user retrieves the list of all French deputies, together with their mandate label and starting date.
One can also look for inter-dataset information by querying multiple datasets at once. For example, here we look for all French deputies who declared a financial interest in some CAC 40 companies, i.e., the 40 most influential French companies.
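Conceptually, such an inter-dataset query is a join across datasets on shared values. The sketch below illustrates this in plain Python; it does not show the Studio's query interface. The function name, field names (`id`, `name`, `deputy_id`, `company`) and all sample records are hypothetical placeholders.

```python
def deputies_with_interests(deputies, interests, companies):
    """Join deputies with their declared interests, keeping listed companies only."""
    listed = {name.casefold() for name in companies}
    names_by_id = {deputy["id"]: deputy["name"] for deputy in deputies}
    matches = set()
    for declaration in interests:
        company = declaration["company"]
        deputy_id = declaration["deputy_id"]
        # Company names are compared case-insensitively across datasets.
        if company.casefold() in listed and deputy_id in names_by_id:
            matches.add((names_by_id[deputy_id], company))
    return sorted(matches)


# Three hypothetical datasets: deputies, declarations of interest, an index.
deputies = [{"id": "d1", "name": "Deputy A"}, {"id": "d2", "name": "Deputy B"}]
interests = [
    {"deputy_id": "d1", "company": "Company X"},
    {"deputy_id": "d2", "company": "Company Z"},
]
index_companies = ["company x", "Company Y"]

result = deputies_with_interests(deputies, interests, index_companies)
```

Only "Deputy A" is returned here, since "Company Z" is not part of the hypothetical index.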
Data Visualization
Finally, after looking at the Data View, one can dig into the data graph itself by using keyword search.
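As a rough illustration of keyword search over a graph, the sketch below finds a shortest chain of nodes connecting a node matching one keyword to a node matching another, using breadth-first search. This is a simplified stand-in chosen for illustration, not the search algorithm used by ConnectionLens; the function names, node labels and edges are hypothetical.

```python
from collections import deque


def matching_nodes(labels, keyword):
    """Ids of nodes whose label contains the keyword (case-insensitive)."""
    needle = keyword.casefold()
    return [node for node, label in labels.items() if needle in label.casefold()]


def shortest_connection(labels, edges, keyword_a, keyword_b):
    """Labels along a shortest undirected path between two keyword matches."""
    adjacency = {node: set() for node in labels}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    targets = set(matching_nodes(labels, keyword_b))
    sources = matching_nodes(labels, keyword_a)
    previous = {node: None for node in sources}  # also marks visited nodes
    queue = deque(sources)
    while queue:
        node = queue.popleft()
        if node in targets:
            path = []
            while node is not None:
                path.append(labels[node])
                node = previous[node]
            return path[::-1]
        for neighbour in sorted(adjacency[node]):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None  # no connection between the two keywords


# Hypothetical graph: a deputy linked to a company through a declaration.
labels = {0: "Deputy A", 1: "declaration", 2: "Company X", 3: "Company Y"}
edges = [(0, 1), (1, 2)]

connected = shortest_connection(labels, edges, "deputy", "company x")
unconnected = shortest_connection(labels, edges, "deputy", "company y")
```

The first call returns the chain from "Deputy A" to "Company X"; the second returns `None` because "Company Y" is isolated in this toy graph.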
Try it out
The source code, with instructions to deploy the Studio, is available here: https://gitlab.inria.fr/cedar/connection-studio
More Info
The ConnectionStudio crew comprises: Nelly Barret, Simon Ebel, Théo Galizzi, Ioana Manolescu and Madhulika Mohanty (firstname.lastname@inria.fr). We thank Camille Pettineo (now at INA) for her suggestions!
Please contact Ioana Manolescu (ioana[dot]manolescu[at]inria[dot]fr) to discuss further opportunities!