Extracting data from XML can be a complex task, but how complex it is depends largely on the people who designed the XML in the first place. In this article, we look at some common XML problems, things to avoid, and ways to make life easier for developers.

What is XML?

Extensible Markup Language (XML) is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. It is defined in the XML 1.0 Specification produced by the W3C, and several other related specifications, all gratis open standards. The design goals of XML emphasize simplicity, generality, and usability over the Internet. It is a textual data format with strong support via Unicode for the languages of the world. Although the design of XML focuses on documents, it is widely used for the representation of arbitrary data structures, for example in web services.

Source: Wikipedia

XML Problems

XML is a very inefficient way of storing data

For example:

<OrderMessage>
  <OrderNumber>1</OrderNumber>
</OrderMessage>

The XML above carries only one byte of information; the rest of it is metadata. Too much metadata requires more processor power, more memory, and more disk storage, and it increases network traffic (which is great news for hardware vendors, but bad news for the people who have to pay for it).
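The overhead can be measured directly. The following is a minimal sketch in Python using the standard library's `xml.etree.ElementTree`; the function name `payload_and_total_bytes` is made up for this illustration, and the sketch only counts element text as data (attribute values would need to be added for attribute-heavy XML).

```python
import xml.etree.ElementTree as ET

# The order message from the example above, one element per line.
SAMPLE = "<OrderMessage>\n<OrderNumber>1</OrderNumber>\n</OrderMessage>"


def payload_and_total_bytes(xml_text: str) -> tuple[int, int]:
    """Return (bytes of element text, bytes of the whole document)."""
    root = ET.fromstring(xml_text)
    # itertext() yields the text found inside every element; whitespace that
    # is only there for layout is stripped so it does not count as data.
    payload = sum(len(text.strip().encode("utf-8")) for text in root.itertext())
    total = len(xml_text.encode("utf-8"))
    return payload, total


payload, total = payload_and_total_bytes(SAMPLE)
print(f"{payload} of {total} bytes are data ({payload / total:.1%})")
```

Running the same function over a real feed gives a rough idea of how much of the storage and network cost is spent on tag names rather than on values.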

XML is too flexible

This sounds like an advantage, but the flexibility of XML can lead to unnecessary complexity. It can make the XML structure hard for developers to understand, which leads to mistakes and increases development time and cost.

In some cases, XML has to be converted into a simpler format before it can be loaded into a database.

This XML can be loaded by most ETL tools:

<CustomerTable>
  <CustomerRecord>
    <CustomerID>1</CustomerID>
    <CustomerName>Peter Jones</CustomerName>
  </CustomerRecord>
  <CustomerRecord>
    <CustomerID>2</CustomerID>
    <CustomerName>Bill Watson</CustomerName>
  </CustomerRecord>
</CustomerTable>

This XML has to be transformed into the format above before it can be loaded into a database:

<CustomerTable>
  <CustomerRecord CustomerID="1" CustomerName="Peter Jones"/>
  <CustomerRecord CustomerID="2" CustomerName="Bill Watson"/>
</CustomerTable>
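One possible way to do that conversion is sketched below in Python with the standard library's `xml.etree.ElementTree`. The helper name `attributes_to_elements` is hypothetical, and the sketch assumes that every attribute should simply become a child element with the same name; a real feed may need extra rules, and an ETL tool or XSLT could do the same job.

```python
import xml.etree.ElementTree as ET

# The attribute-style customer table from the example above.
ATTRIBUTE_STYLE = """<CustomerTable>
<CustomerRecord CustomerID="1" CustomerName="Peter Jones"/>
<CustomerRecord CustomerID="2" CustomerName="Bill Watson"/>
</CustomerTable>"""


def attributes_to_elements(xml_text: str) -> str:
    """Turn every attribute into a child element of the same name."""
    root = ET.fromstring(xml_text)
    # Take a snapshot of the elements first, because children are added
    # to the tree while we walk over it.
    for element in list(root.iter()):
        for name, value in list(element.attrib.items()):
            child = ET.SubElement(element, name)
            child.text = value
        element.attrib.clear()
    return ET.tostring(root, encoding="unicode")


print(attributes_to_elements(ATTRIBUTE_STYLE))
```

The result has the same element names and values as the first customer table, only without indentation inside each record.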

Inconsistent format

There is no guarantee that the next file you receive will have the same format as the previous one. It might have missing elements, a different element order, or a different encoding. This kind of problem can be addressed with XSLT transformations, but again that requires more hardware resources.
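Besides XSLT, the loading code itself can be written defensively. The sketch below is one way to do it in Python, not a complete solution: it looks fields up by name rather than by position, accepts a value stored either as a child element or as an attribute, and returns `None` for anything that is missing. The function name `read_customers` and the sample file are invented for the illustration. Passing the raw bytes to the parser lets it honour the encoding declared in the file.

```python
import xml.etree.ElementTree as ET

EXPECTED_FIELDS = ("CustomerID", "CustomerName")

# A made-up file that mixes the variations described above: swapped element
# order, a missing element, and a record that uses attributes.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<CustomerTable>
<CustomerRecord><CustomerName>Peter Jones</CustomerName><CustomerID>1</CustomerID></CustomerRecord>
<CustomerRecord><CustomerID>2</CustomerID></CustomerRecord>
<CustomerRecord CustomerID="3" CustomerName="Bill Watson"/>
</CustomerTable>"""


def read_customers(xml_bytes: bytes) -> list[dict]:
    """Read customer records without relying on element order or style."""
    root = ET.fromstring(xml_bytes)
    rows = []
    for record in root.iter("CustomerRecord"):
        row = {}
        for field in EXPECTED_FIELDS:
            # Look for a child element first, then for an attribute.
            value = record.findtext(field)
            if value is None:
                value = record.get(field)
            row[field] = value  # stays None when the field is absent
        rows.append(row)
    return rows


for row in read_customers(SAMPLE):
    print(row)
```

Rows that come back with `None` values can then be rejected or logged before they reach the database, instead of failing halfway through a load.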

XML design tips:

  • Use XML only when necessary
  • Too much metadata is a bad thing
  • Keep tags short
  • Keep it simple and clean
  • Design XML so that it can be loaded without conversion
