Tuesday, 20 September 2016

XML Parser Tutorial


How to view XML files?
     XML files can be viewed in a web browser, a plain text editor, or a dedicated XML editor.

How to develop an XML document?
1. To develop a useful XML document, we need one more file: a DTD or an XSD file.
2. The DTD or XSD file specifies the XML vocabulary, i.e., which elements and attributes may appear and how they may be arranged.

What is a valid XML?
     If an XML document is developed according to the rules specified in a DTD or XSD, it is known as a valid XML document.

Note
1. Every valid XML document is well-formed, but the converse need not be true.
2. Well-formedness is checked first; validity is checked only after that.

How to check the well-formedness of XML document?
     Using XML parsers, XML editors, browsers, etc.
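     As a minimal sketch of a well-formedness check, the snippet below uses the ElementTree parser from Python's standard library. Python is an assumption made for illustration (the post does not name a parser), and the helper name is_well_formed is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses without a syntax error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        # Raised for mismatched tags, stray '<' or '&', and similar errors.
        return False


good = "<note><to>Ashok</to></note>"
bad = "<note><to>Ashok</note>"  # <to> is never closed

print(is_well_formed(good))  # True
print(is_well_formed(bad))   # False
```

     Note that this checks well-formedness only. ElementTree does not validate against a DTD or XSD.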

How to validate XML document?
     Using validating XML parsers, XML editors, etc.

     Every XML application needs two things to work with XML:
1. Parser
2. API

What is an XML parser? What are its functions?
     An XML parser is a software component, exposed to the application through an API, that enables XML applications to work with XML documents. An XML parser does the following:
1. Reads the XML document.
2. Checks its well-formedness.
3. Verifies its validity (if it is a validating parser).
4. Makes the XML data available to the XML application.


     A typical interaction between the application and the parser looks like this:
1. The XML application instantiates the parser and passes it the name of the XML file.
2. The parser reads the specified XML document and verifies its well-formedness.
3. From the XML document, the parser learns which DTD or XSD applies, and verifies that the DTD or XSD is itself correct.
4. The parser reads the metadata specified in the DTD or XSD into memory.
5. Based on that metadata, the parser verifies the validity of the XML document.
6. The parser makes the XML content (data) available to the XML application, either as a tree structure or as a stream of events.
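     The two delivery styles in the last step can be sketched with Python's standard library: ElementTree hands back a tree, while SAX reports events as it reads. This is an illustrative sketch, not the post's own tooling; the class name EventRecorder and the variable names are hypothetical.

```python
import xml.sax
import xml.etree.ElementTree as ET

XML_TEXT = "<name><first>Mariyala</first><last>Ashok Kumar</last></name>"

# Tree style: the whole document is loaded, then navigated.
root = ET.fromstring(XML_TEXT)
tree_view = {child.tag: child.text for child in root}
print(tree_view)  # {'first': 'Mariyala', 'last': 'Ashok Kumar'}


# Event style: the parser calls back as each tag opens and closes.
class EventRecorder(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))


recorder = EventRecorder()
xml.sax.parseString(XML_TEXT.encode("utf-8"), recorder)
print(recorder.events[:2])  # [('start', 'name'), ('start', 'first')]
```

     The tree style is convenient for small documents that are navigated repeatedly. The event style avoids holding the whole document in memory.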

Valid XML document
     A validating XML parser makes the data stored in an XML document available to the XML application only if the document is valid. (A non-validating parser requires only that the document be well-formed.)
There are two approaches to developing valid XML:
1. Using DTD
2. Using XSD

What is DTD?
1. DTD stands for Document Type Definition.
2. A DTD is usually a text file with a .dtd extension; it can also be embedded in the XML document itself.
3. If the XML file holds data, its corresponding DTD holds metadata.
4. A DTD specifies the legal building blocks of an XML document, i.e., the XML vocabulary.

     From a DTD point of view, all XML documents are made up of the following building blocks:
1. Elements
2. Attributes
3. Entities
4. PCDATA
5. CDATA

Elements
     Elements are the main building blocks of both XML and HTML documents. Examples of HTML elements are "body" and "table". Examples of XML elements could be "note" and "message". Elements can contain text, other elements, or be empty. Examples of empty HTML elements are "hr", "br" and "img".
Examples
<body>some text</body>
<message>some message</message>
Attributes
     Attributes provide extra information about elements. Attributes are always placed inside the opening tag of an element. Attributes always come in name/value pairs. The following "img" element has additional information about a source file:
<img src="ashok.jpg" />
     The name of the element is "img". The name of the attribute is "src". The value of the attribute is "ashok.jpg". Since the element itself is empty, it is closed by a " /".
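     To see how a parser exposes the element name and its attribute, here is a small sketch using Python's standard ElementTree module (an illustrative choice; the variable names are hypothetical).

```python
import xml.etree.ElementTree as ET

img = ET.fromstring('<img src="ashok.jpg" />')

print(img.tag)         # img
print(img.get("src"))  # ashok.jpg
print(img.attrib)      # {'src': 'ashok.jpg'}

# An empty element has no text and no child elements.
print(img.text)        # None
print(len(img))        # 0
```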

Entities
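     Some characters have a special meaning in XML, like the less-than sign (<) that marks the start of an XML tag. To use such a character as ordinary text, it is written as an entity reference. Most of you know the HTML entity "&nbsp;". This "non-breaking space" entity is used in HTML to insert an extra space in a document; note that it is not predefined in XML. Entities are expanded when a document is parsed by an XML parser. The following entities are predefined in XML:
&lt;     <   less than
&gt;     >   greater than
&amp;    &   ampersand
&apos;   '   apostrophe
&quot;   "   quotation mark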
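     A short sketch of entity expansion, using Python's standard library as an illustrative parser (the helper name parses is hypothetical). It shows entity references being expanded on parsing, the reverse escaping step, and an HTML-only entity being rejected when it has not been declared.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Entity references in the source are expanded by the parser.
elem = ET.fromstring("<expr>a &lt; b &amp;&amp; c &gt; d</expr>")
print(elem.text)  # a < b && c > d

# Going the other way: escape raw text before placing it in XML.
escaped = escape("a < b && c > d")
print(escaped)    # a &lt; b &amp;&amp; c &gt; d


def parses(xml_text):
    """Return True if the text parses without error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# &nbsp; is defined by HTML, not by XML, so it fails without a declaration.
nbsp_ok = parses("<p>a&nbsp;b</p>")
print(nbsp_ok)    # False
```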


PCDATA - Parsed Character Data
     XML parsers normally parse all the text in an XML document. When an XML element is parsed, the text between the XML tags is also parsed.
<message>This text is also parsed</message>
     The parser does this because XML elements can contain other elements, as in this example, where the <name> element contains two other elements (first and last):
<name><first>Mariyala</first><last>Ashok Kumar</last></name>
The parser will break it up into sub-elements like this:
<name>
   <first>Mariyala</first>
   <last>Ashok Kumar</last>
</name>
     Parsed Character Data (PCDATA) is the term for text data that will be parsed by the XML parser.

CDATA - (Unparsed) Character Data
     The term CDATA refers to text data that should not be parsed by the XML parser. Characters like "<" and "&" are illegal as literal text in XML elements.
"<" will generate an error because the parser interprets it as the start of a new element.
"&" will generate an error because the parser interprets it as the start of a character entity.

     Some text, like JavaScript code, contains a lot of "<" or "&" characters. To avoid errors, script code can be defined as CDATA.

     Everything inside a CDATA section is passed through as literal text; the parser does not interpret it as markup. A CDATA section starts with "<![CDATA[" and ends with "]]>".
E.g.
<item>
   <title>Title of Feed Item</title>
   <link>/mylink/article1</link>
   <description>
      <![CDATA[
         <p>
         <a href="/java/corejava">
         <img style="float: left; margin-right: 5px;" height="80" src="/java/collection" alt=""/></a>Author Names
         <em>Date</em>
         Paragraph of text describing the article to be displayed</p>
      ]]>
   </description>
</item>
     In the example above, everything inside the CDATA section is treated as literal text rather than markup. A CDATA section cannot contain the string "]]>", so nested CDATA sections are not allowed. The "]]>" that marks the end of the CDATA section cannot contain spaces or line breaks.
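     The effect of a CDATA section can be sketched with Python's standard ElementTree parser (an illustrative choice; the variable names are hypothetical). The markup-like characters arrive as plain text, and the same content without the CDATA wrapper is rejected.

```python
import xml.etree.ElementTree as ET

wrapped = "<description><![CDATA[<p>1 < 2 & 3 > 2</p>]]></description>"
elem = ET.fromstring(wrapped)

# The <p> tag is delivered as text, not as a child element.
print(elem.text)  # <p>1 < 2 & 3 > 2</p>
print(len(elem))  # 0

# Without CDATA, the bare "<" and "&" make the document ill-formed.
unwrapped = "<description><p>1 < 2 & 3 > 2</p></description>"
try:
    ET.fromstring(unwrapped)
    unwrapped_ok = True
except ET.ParseError:
    unwrapped_ok = False
print(unwrapped_ok)  # False
```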

DTD - Elements
Declaring an Element
     In the DTD, XML elements are declared with an element declaration. An element declaration has the following syntax:
<!ELEMENT element-name category>
Or
<!ELEMENT element-name (element-content)>
Empty elements
     Empty elements are declared with the keyword EMPTY, written without parentheses:
<!ELEMENT element-name EMPTY>
Example
<!ELEMENT br EMPTY>
Elements with data
     Elements that contain only character data are declared with #PCDATA inside parentheses.
<!ELEMENT element-name (#PCDATA)>
     Elements that may contain any content are declared with the keyword ANY, written without parentheses.
<!ELEMENT element-name ANY>
E.g.
<!ELEMENT note (#PCDATA)>
     #PCDATA means that the element contains character data that is going to be parsed by the parser. CDATA is not a valid content type in an element declaration; in a DTD it is used as an attribute type. The keyword ANY declares an element that can hold any combination of declared elements and character data.
     If an element with mixed content (#PCDATA together with child elements) contains elements, these elements must also be declared.

Elements with children (sequences)
     Elements with one or more children are declared with the names of the child elements inside the parentheses.
<!ELEMENT element-name (child-element-name)>
Or
<!ELEMENT element-name (child-element-name, child-element-name, ...)>
E.g.
<!ELEMENT note (to, from, heading, body)>
     When children are declared in a sequence separated by commas, the children must appear in the same sequence in the document. In a full declaration, the children must also be declared, and the children can also have children. The full declaration of the note document will be
<!ELEMENT note (to, from, heading, body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
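     As a rough sketch of how a parser reads such declarations, the snippet below uses the expat bindings in Python's standard library to collect the element declarations from an internal DTD. Expat is a non-validating parser, so it reports the declarations but does not enforce them; the order check at the end is done by hand. The sample data and the handler and variable names are hypothetical.

```python
import xml.parsers.expat as expat

DOC = """<!DOCTYPE note [
  <!ELEMENT note (to, from, heading, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT heading (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<note><to>Ashok</to><from>Mariyala</from><heading>Hi</heading><body>Hello</body></note>"""

declared = {}        # element name -> content model tuple reported by expat
children_seen = []   # child elements of <note>, in document order


def on_element_decl(name, model):
    declared[name] = model


def on_start(name, attrs):
    if name != "note":
        children_seen.append(name)


parser = expat.ParserCreate()
parser.ElementDeclHandler = on_element_decl
parser.StartElementHandler = on_start
parser.Parse(DOC, True)

# A model is (type, quantifier, name, children); children of a sequence
# are themselves model tuples whose third field is the element name.
model_type, _quant, _name, kids = declared["note"]
expected_order = [kid[2] for kid in kids]

print(expected_order)                   # ['to', 'from', 'heading', 'body']
print(children_seen == expected_order)  # True
```

     A validating parser performs this kind of comparison itself, for every content model in the DTD, rather than leaving it to the application.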
