Structured data is the easiest kind of data to search, organize and analyze, because it is usually arranged in rows and columns and its elements map onto fixed, predefined fields. For data to be machine-readable (that is, for a computer to be able to process it), it must be structured. Structuring data is therefore an important milestone in creating machine-readable standards.
Nowadays, however, the content published by most standards bodies consists of XML documents. XML is a markup language and a form of semi-structured data. So how can we transform this semi-structured data into structured data?
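To make the difference concrete, here is a minimal sketch that uses only the Python standard library (not xml2csv). The `<terms>` document, its element names and the `FIELDNAMES` list are hypothetical, chosen only to show how elements with optional children can be mapped onto a fixed set of columns.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical semi-structured input: the second <term> has no <definition>.
XML_TEXT = """
<terms>
  <term id="t1"><name>widget</name><definition>A small device.</definition></term>
  <term id="t2"><name>gadget</name></term>
</terms>
"""

# The fixed, predefined fields that every row must have.
FIELDNAMES = ["id", "name", "definition"]


def xml_to_rows(xml_text):
    """Map every <term> element onto the fixed set of fields in FIELDNAMES."""
    root = ET.fromstring(xml_text)
    rows = []
    for term in root.iter("term"):
        rows.append({
            "id": term.get("id", ""),
            # findtext returns the default when the child element is absent
            "name": term.findtext("name", default=""),
            "definition": term.findtext("definition", default=""),
        })
    return rows


def rows_to_csv(rows):
    """Serialize the rows as CSV text, starting with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(rows_to_csv(xml_to_rows(XML_TEXT)))
```

Every row ends up with the same three fields, and a missing element becomes an empty cell rather than a missing column. That regular shape is what makes the result structured.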
Using the xml2csv Python module, you can extract several kinds of entities with only a few lines of code and export them to a CSV file. These entities include ICS codes, dates, references, terms and more. You can even extend the XML processor class to implement your own processor.
Let’s dive in with an example. We will extract all title information from an XML document. This includes the language, introductory title, main title, complementary title and full title. See the links below for the input (XML file) and the output (CSV file).
Input: XML -> Output: CSV!
This can be achieved by writing a few lines of code. See the sample code below:
# import required modules
from xml2csv import TitleProcessor
from csv import DictWriter

# open the XML input for reading and the CSV output for appending;
# the with block closes both files once processing is done
with open('input.xml', 'r', encoding='utf-8') as reader, \
        open('output.csv', 'a', encoding='utf-8', newline='') as out:
    writer = DictWriter(out, delimiter=',', lineterminator='\n',
                        fieldnames=TitleProcessor.fieldnames)

    # create a processor and call its process method
    p = TitleProcessor(reader, writer)
    p.process()
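To give a feel for what a processor with this interface might do, here is a stand-alone sketch built on the standard library only. It is not the xml2csv implementation and does not extend the library's processor class. It only mirrors the shape visible in the sample above: a `fieldnames` class attribute, a constructor that takes a reader and a writer, and a `process` method. The class name `SimpleTitleProcessor`, the `run_demo` helper and the element names (`title-wrap`, `intro`, `main`, `compl`, `full`) are assumptions made for illustration, so the real input document may be marked up differently.

```python
import csv
import io
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


class SimpleTitleProcessor:
    """Stand-alone illustration; not the xml2csv TitleProcessor."""

    # Column names, exposed as a class attribute as in the sample above.
    fieldnames = ["lang", "intro", "main", "compl", "full"]

    def __init__(self, reader, writer):
        self.reader = reader  # file-like object containing XML
        self.writer = writer  # csv.DictWriter configured with fieldnames

    def process(self):
        """Write one CSV row per <title-wrap> element (assumed markup)."""
        tree = ET.parse(self.reader)
        for wrap in tree.iter("title-wrap"):
            # Title parts that are absent become empty cells.
            row = {name: wrap.findtext(name, default="")
                   for name in self.fieldnames[1:]}
            row["lang"] = wrap.get(XML_LANG, "")
            self.writer.writerow(row)


def run_demo():
    """Run the processor on a small hypothetical document; return CSV text."""
    xml_text = """<standard>
  <title-wrap xml:lang="en">
    <intro>Widgets</intro>
    <main>Safety requirements</main>
    <full>Widgets - Safety requirements</full>
  </title-wrap>
</standard>"""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=SimpleTitleProcessor.fieldnames,
                            lineterminator="\n")
    SimpleTitleProcessor(io.StringIO(xml_text), writer).process()
    return out.getvalue()


if __name__ == "__main__":
    print(run_demo())
```

Because the reader and writer are passed in rather than opened inside the class, the same processor works with files on disk or with in-memory buffers, which also makes it easy to test.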
For more information see the README file and the API documentation.