
In+ersec+ion for Spatial People

Spatial Data Integrator, Open Source Spatial ETL

posted by lxnyce on Thursday January 10, @02:13PM
Camptocamp writes "Camptocamp, one of the European leaders in Open Source Geographical Information Systems (GIS), and Talend, the first provider of Open Source data integration software, released Spatial Data Integrator, Powered by Talend, in 2007.

Spatial Data Integrator is the first Open Source ETL (Extract, Transform, Load) solution specialized in the manipulation of geographical information and supported by an editor.

With the intuitive and easy-to-use graphical environment of Talend’s data integration technology, Talend Open Studio (TOS), Spatial Data Integrator users can select the architecture that fits their needs and also combine geographical data sources with non-geographical data sources.

A solid foundation: Talend Open Studio
TOS offers more than 230 native connectors to systems that include leading ERPs and CRMs, commercial and Open Source databases, files of all formats, message-oriented middleware (MOM) and Web services.
TOS can also be used in many types of data integration environments, mainly for operational requirements such as data migration, data merging and data warehouse loading.
TOS is an Open Source data integration project that works as a code generator: it produces data transformation scripts, called 'jobs', which are designed by assembling graphical components. Given the wide variety of available components, generating code greatly improves performance when jobs are executed. The scripts and underlying programs are generated in either Perl or Java, with SQL for basic database components. Jobs can be exported and then executed either from within the studio or as standalone scripts (web services). Finally, TOS allows industry- and/or company-specific specifications to be integrated into the code.
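To make the code-generator idea concrete, here is a toy sketch, not Talend's generator. It assumes a job is an ordered list of hypothetical steps (`Step`) and that the generated code calls a hypothetical `Components` runtime class; the point is only that a job is turned into source code before it runs rather than being interpreted.

```java
import java.util.List;

/** Toy template-based code generator: turns an ordered list of job steps into Java source text. */
final class JobGenerator {

    /** One hypothetical job step: a component name and a single string argument. */
    static final class Step {
        final String component;
        final String argument;

        Step(String component, String argument) {
            this.component = component;
            this.argument = argument;
        }
    }

    /** Emits a standalone class whose main method calls each step, in order, on a hypothetical Components runtime. */
    static String generate(String jobName, List<Step> steps) {
        StringBuilder src = new StringBuilder();
        src.append("public class ").append(jobName).append(" {\n");
        src.append("    public static void main(String[] args) {\n");
        for (Step step : steps) {
            // Escape backslashes and quotes so the argument becomes a valid Java string literal.
            String literal = step.argument.replace("\\", "\\\\").replace("\"", "\\\"");
            src.append("        Components.").append(step.component)
               .append("(\"").append(literal).append("\");\n");
        }
        src.append("    }\n}\n");
        return src.toString();
    }
}
```

Generating a plain class with a `main` method is what makes the "standalone script" export mode possible in this sketch: the emitted source can be compiled and run without the editor.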

Spatial Data Integrator, the TOS spatial extension
The geospatial module provides a first level of integration of the JTS (Java Topology Suite) and GeoTools libraries, in the form of about thirty geospatial components. It also offers metadata transformation components.
Spatial Data Integrator (SDI) provides three sorts of Geo components: input components, output components and transformation components. Input and output components read features from and write features to datastores, respectively. Transformation components read features from their input flows, possibly transform them, and write features to their output flows (the term 'transform' should be taken loosely here, as it covers any sort of operation).
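The three component kinds can be pictured as a reader, a per-feature transform and a writer chained into a flow. The following is a minimal sketch under that assumption; `Feature`, the three interfaces and `Pipeline` are hypothetical names invented for illustration (geometry is omitted), not SDI's or GeoTools' actual API.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical feature: an identifier plus named attributes (geometry omitted for brevity). */
final class Feature {
    final String id;
    final Map<String, Object> attributes;

    Feature(String id, Map<String, ?> attributes) {
        this.id = id;
        this.attributes = new LinkedHashMap<>(attributes);
    }
}

/** Input component: reads features from a datastore. */
interface InputComponent { Iterator<Feature> read(); }

/** Transform component: turns one input feature into zero or more output features. */
interface TransformComponent { List<Feature> apply(Feature in); }

/** Output component: writes features to a datastore. */
interface OutputComponent { void write(Feature out); }

final class Pipeline {
    /** Pulls each feature from the input, passes it through every transform in order, then writes the results. */
    static void run(InputComponent input, List<TransformComponent> transforms, OutputComponent output) {
        Iterator<Feature> it = input.read();
        while (it.hasNext()) {
            List<Feature> current = List.of(it.next());
            for (TransformComponent t : transforms) {
                List<Feature> next = new ArrayList<>();
                for (Feature f : current) next.addAll(t.apply(f));
                current = next;
            }
            current.forEach(output::write);
        }
    }
}
```

Letting a transform return a list (rather than exactly one feature) is what allows the loose sense of "transform" above: filters return an empty list, splitters return several features.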
The Geo family of SDI components comprises several sub-categories: Calculators, Collectors, Geometric Operators and Manipulators.
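As an illustration of the kind of value a Calculators component might compute, here is a minimal sketch of planar polygon area using the shoelace formula. In SDI such measures would presumably come from JTS (whose `Geometry.getArea()` returns a planar area), so the `PolygonArea` class below is a hypothetical stand-alone stand-in, assuming a ring given as parallel x/y coordinate arrays.

```java
/** Planar polygon area via the shoelace formula (hypothetical stand-in for a JTS-based calculator). */
final class PolygonArea {
    /**
     * xs/ys hold the ring's vertices in order; the ring may be open or closed,
     * since a repeated first vertex only contributes a zero term.
     */
    static double area(double[] xs, double[] ys) {
        if (xs.length != ys.length || xs.length < 3) {
            throw new IllegalArgumentException("need at least 3 vertices with matching x/y arrays");
        }
        double twiceArea = 0.0;
        for (int i = 0; i < xs.length; i++) {
            int j = (i + 1) % xs.length; // wrap around to close the ring
            twiceArea += xs[i] * ys[j] - xs[j] * ys[i];
        }
        // The signed sum is positive for counter-clockwise rings; the area is its absolute value halved.
        return Math.abs(twiceArea) / 2.0;
    }
}
```

For example, the unit square `{0,1,1,0}` / `{0,0,1,1}` gives an area of 1.0.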
SDI input and output components can read from and write to several data formats. The first version of SDI supports the following: ESRI Shapefile, MapInfo MIF/MID, PostGIS and GeoRSS.
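To give a feel for what an input component does at the lowest level, here is a minimal sketch that parses the fixed 100-byte main header of an ESRI Shapefile (.shp), following the layout in ESRI's published Shapefile technical description: the file code (9994) and file length are big-endian, while the version (1000), shape type and bounding box are little-endian. The `ShpHeader` class is hypothetical; in SDI this work would presumably be delegated to GeoTools.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Selected fields of the 100-byte .shp main header (hypothetical helper, not a GeoTools class). */
final class ShpHeader {
    final int fileLengthBytes; // stored in the file as a count of 16-bit words
    final int shapeType;       // e.g. 1 = Point, 3 = PolyLine, 5 = Polygon
    final double xMin, yMin, xMax, yMax;

    private ShpHeader(int fileLengthBytes, int shapeType,
                      double xMin, double yMin, double xMax, double yMax) {
        this.fileLengthBytes = fileLengthBytes;
        this.shapeType = shapeType;
        this.xMin = xMin;
        this.yMin = yMin;
        this.xMax = xMax;
        this.yMax = yMax;
    }

    /** Reads the header starting at the buffer's current position, without moving that position. */
    static ShpHeader parse(ByteBuffer buf) {
        if (buf.remaining() < 100) throw new IllegalArgumentException("the main header is 100 bytes");
        ByteBuffer b = buf.slice(); // index 0 of the slice is buf's current position

        b.order(ByteOrder.BIG_ENDIAN);
        if (b.getInt(0) != 9994) throw new IllegalArgumentException("not a shapefile (file code != 9994)");
        int lengthWords = b.getInt(24);

        b.order(ByteOrder.LITTLE_ENDIAN);
        if (b.getInt(28) != 1000) throw new IllegalArgumentException("unsupported version");
        return new ShpHeader(lengthWords * 2, b.getInt(32),
                b.getDouble(36), b.getDouble(44), b.getDouble(52), b.getDouble(60));
    }
}
```

The mixed byte order is the classic pitfall with this format, which is why the sketch switches the buffer's order explicitly before each group of reads.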

The components of the Metadata family apply XSLT transformations to metadata XML files in order to convert from one standard to another, or from one profile to another (e.g. from ISO 19115 to ISO 19139).
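Since these components rely on standard XSLT, the mechanism can be sketched with the JDK's built-in `javax.xml.transform` API alone. In this minimal sketch the stylesheet maps a hypothetical `<record><title>` element of a simplified source profile to a `<metadata><citationTitle>` element of a simplified target profile; the element names are invented for illustration, and a real ISO 19115 to ISO 19139 stylesheet is far larger and namespace-aware.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class MetadataXslt {
    /** Hypothetical stylesheet: maps <record><title> to <metadata><citationTitle>. */
    static final String STYLESHEET =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
      + "<xsl:template match=\"/record\">"
      + "<metadata><citationTitle><xsl:value-of select=\"title\"/></citationTitle></metadata>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Applies an XSLT stylesheet to an XML document, both given as strings, and returns the result. */
    static String transform(String xslt, String xml) throws TransformerException {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }
}
```

Keeping the mapping in a stylesheet rather than in Java code means a new standard or profile only requires a new XSLT file, not a new component.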

Spatial Data Integrator's architecture is similar to that of Talend Open Studio. It relies on three main modules: Business Designer, Job Designer and Metadata Manager. For a coherent method, it is recommended to use these modules in sequence, from concept to process; however, their components can be used in any order, as needed.

Business Modeler
All of a project's business models are grouped in the Business Designer. A business model is a non-technical view of a business workflow need. A typical business model includes the strategic systems or process steps already up and running in a company, as well as new needs. TOS provides a palette of graphical components for building these models: shapes and connecting lines are placed by drag and drop, and in the same way an item from the Repository panel can be dragged onto the relevant shape in the modeling workspace to assign it to that shape.

Job Designer
Job Designer includes all development components. It provides a graphical interface for describing processes and managing projects. A job design is the runnable layer of a business model: it translates business needs into code, routines and programs; in other words, it technically implements your data flow.

The advanced mapping component offers an intuitive and easy-to-use graphical environment, and can carry out mapping operations of varying complexity, as needed.
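Conceptually, a mapping component computes each output column from an expression over the input row. The following is a minimal sketch assuming rows are plain `Map<String, Object>` values and each output column is defined by a Java lambda; `RowMapper` is a hypothetical class, not the API of Talend's mapping component.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical row mapper: each output column is an expression over the input row. */
final class RowMapper {
    private final Map<String, Function<Map<String, Object>, Object>> columns = new LinkedHashMap<>();

    /** Declares an output column and the expression that computes it; returns this for chaining. */
    RowMapper column(String name, Function<Map<String, Object>, Object> expression) {
        columns.put(name, expression);
        return this;
    }

    /** Applies every column expression to one input row, keeping the declaration order of the columns. */
    Map<String, Object> map(Map<String, Object> input) {
        Map<String, Object> output = new LinkedHashMap<>();
        columns.forEach((name, expr) -> output.put(name, expr.apply(input)));
        return output;
    }
}
```

A simple mapping just copies a field (`row -> row.get("pop")`), while a more complex one can clean, combine or convert fields (`row -> ((String) row.get("nom")).trim()`), which matches the "varying complexity" described above.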
Once the job design is complete, a working data flow can be generated in Java or Perl and deployed on the appropriate servers. The software also automatically generates technical documentation for the job.

Metadata Manager
Spatial Data Integrator includes a centralized repository that makes it easy to reuse data source descriptions (files, databases). The repository also makes it possible to quickly modify the various data sources used in different processes. In addition, a set of wizards automatically defines the properties of a given data source in this repository.
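The benefit of a centralized repository is that jobs refer to a data source by name instead of copying its description, so a single change reaches every process that uses it. A minimal sketch of that idea, with hypothetical `DataSourceDescription` and `MetadataRepository` classes (not SDI's actual classes):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical description of a data source: its kind (file, database) and a location string. */
final class DataSourceDescription {
    final String kind;
    final String location;

    DataSourceDescription(String kind, String location) {
        this.kind = kind;
        this.location = location;
    }
}

/** Central registry: jobs look descriptions up by name instead of holding their own copies. */
final class MetadataRepository {
    private final Map<String, DataSourceDescription> entries = new HashMap<>();

    /** Adds or replaces a description; every later lookup by any job sees the new version. */
    void put(String name, DataSourceDescription description) {
        entries.put(name, description);
    }

    /** Returns the description registered under this name, if any. */
    Optional<DataSourceDescription> lookup(String name) {
        return Optional.ofNullable(entries.get(name));
    }
}
```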

Spatial Data Integrator Application Example
Spatial Data Integrator has been used in a project for a French national administration that needed to migrate its metadata. Camptocamp created an elaborate mapping from a database to an XML file conforming to the ISO 19115:2003 metadata standard. SDI was used first to define the mapping and the corresponding processes, and then to export the developed code for integration into a specific Java application (independent of SDI, which had served only as a code generator).
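The database-to-XML direction of such a project can be pictured with the JDK's DOM and `Transformer` APIs. This is a minimal sketch, not Camptocamp's code: it assumes one database record arrives as a column-to-value map with hypothetical `title` and `abstract` columns, and the output element names are simplified placeholders rather than the actual ISO 19115/19139 schema.

```java
import java.io.StringWriter;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Hypothetical mapping of one database row (column -> value) to a simplified metadata XML document. */
final class RowToXml {
    static String toXml(Map<String, String> row) throws ParserConfigurationException, TransformerException {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("metadata");
        doc.appendChild(root);
        // Column-to-element mapping; placeholder names, not real ISO 19139 elements.
        appendText(doc, root, "title", row.get("title"));
        appendText(doc, root, "abstract", row.get("abstract"));

        // Serialize the DOM tree with an identity transformer.
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    private static void appendText(Document doc, Element parent, String name, String value) {
        if (value == null) return; // skip NULL columns instead of emitting empty elements
        Element e = doc.createElement(name);
        e.setTextContent(value); // the serializer escapes characters such as & and <
        parent.appendChild(e);
    }
}
```

Because this class depends only on the JDK, code of this shape can be embedded in an ordinary Java application, which mirrors the project's choice to export the generated code and run it independently of SDI.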

Spatial Data Integrator Road Map
The Spatial Data Integrator road map includes integrating uDig desktop GIS technology (built on Eclipse RCP, a Java application platform) in order to provide SDI with a visual interface for displaying transformation results.
In 2008, SDI should also integrate raster manipulation components based on version 3 of jGRASS (scheduled for release at the end of the first half of 2008).
Close links with the GeoNetwork metadata catalog are currently being developed; they will provide GeoNetwork with automated mechanisms for generating metadata.
Camptocamp also plans to increase the number of data formats supported by SDI and to integrate additional components offering further data transformation and data processing functionality.

Additional information:

For more information on Spatial Data Integrator, go to:
http://www.spatialdataintegrator.com/

Screenshots, datasets and tutorials are available on the following wiki:
http://www.talendforge.org/wiki/doku.php?id=sdi:MainPage

SDI forum (for technical inquiries):
http://www.talendforge.org/forum/index.php

Camptocamp contacts:

David JONGLEZ, Associate Director, david.jonglez@camptocamp.com"

Related Stories

Spatial Data Integrator 1.1 Released
Camptocamp announced the release of Spatial Data Integrator 1.1, an open source ETL (extract, transform, load) solution. From the press release: " Based on Talend’s data integration technology, Talend Open Studio (TOS), SDI includes components such as reading/writing main GIS file formats, geospatial data transformation, and creation/publication of metadata. SDI uses the following Open Source components: GeoTools, JTS, GeoNames Webservices, GeoNetwork [...] This new version of SDI combines various developments [...] These developments include: Projections support at Input/Output and data reprojections support; 2 new components groups: GeoCoding and DataQuality; [...]" See also related stories below.