Extracting data from HTML tables
Transforming HTML tables can be a real challenge. First of all, an HTML document can contain multiple tables, and a table might have other tables nested inside it. Also, cells might contain more than just text, but we are only interested in the text. In some cases, it is possible to use an XSLT transformation to convert the HTML into a more useful format. However, most of the time an HTML page is not well-formed XML, so it cannot be fed to an XSLT processor directly. In our case, we decided to use a custom third-party HTML parser. We also decided to assign every table a sequential number.
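To make the approach concrete, here is a minimal sketch of the idea in Python. It does not use the third-party parser mentioned above, which is not named here. It substitutes the standard library's html.parser module, which tolerates markup that is not well-formed XML. The names TableTextExtractor and extract_tables are hypothetical, and numbering tables in the order their opening tags appear (starting at 1) is an assumption about what "sequential" means.

```python
from html.parser import HTMLParser


class TableTextExtractor(HTMLParser):
    """Hypothetical helper: collects cell text per table, numbering tables
    in the order their <table> start tags appear (assumed, starting at 1)."""

    def __init__(self):
        super().__init__()
        self.tables = {}        # table number -> list of rows (lists of cell text)
        self._stack = []        # numbers of currently open tables (innermost last)
        self._cells = []        # per open table: text buffer of the open cell, or None
        self._next_number = 1

    def _close_cell(self):
        """Store the open cell of the innermost table, if there is one."""
        buf = self._cells[-1]
        if buf is None:
            return
        text = " ".join("".join(buf).split())  # collapse runs of whitespace
        rows = self.tables[self._stack[-1]]
        if not rows:                           # cell without an enclosing <tr>
            rows.append([])
        rows[-1].append(text)
        self._cells[-1] = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables[self._next_number] = []
            self._stack.append(self._next_number)
            self._cells.append(None)
            self._next_number += 1
        elif not self._stack:
            return                             # ignore everything outside tables
        elif tag == "tr":
            self._close_cell()                 # tolerate a missing </td>
            self.tables[self._stack[-1]].append([])
        elif tag in ("td", "th"):
            self._close_cell()                 # tolerate a missing </td>
            self._cells[-1] = []

    def handle_endtag(self, tag):
        if not self._stack:
            return
        if tag in ("td", "th", "tr"):
            self._close_cell()
        elif tag == "table":
            self._close_cell()
            self._stack.pop()
            self._cells.pop()

    def handle_data(self, data):
        # Keep text only; tags inside a cell (<b>, <a>, ...) are simply skipped.
        if self._stack and self._cells[-1] is not None:
            self._cells[-1].append(data)


def extract_tables(html):
    """Return {table number: rows}, where each row is a list of cell strings."""
    parser = TableTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.tables
```

In this sketch a nested table gets its own number, and its text is kept out of the cell that encloses it. Whether the nested text should also appear in the outer cell is a design choice the text above does not settle.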