XML FAQ
XML is short for eXtensible Markup Language, and it is really a set of rules for writing a markup language. Any markup language that conforms to the rules of XML is known as an 'application' of XML.

XML uses angled brackets to designate opening tags and closing tags that contain content. The tags may contain attributes and their values. An opening tag is of the form <tagname attribute="value">, for example <greeting type="friendly">. Every opening tag must have a closing tag of the form </tagname>, for example </greeting>.

Unlike HTML, ALL attributes must be quoted with either single or double quotes, and the quotes must match. Also unlike HTML, the tags are case sensitive, i.e. <greeting> and <GREETING> are different tags. If a tag does not have a closing tag, i.e. if it is an empty tag similar to the HTML <IMG> tag, it must be written in the form <tagname/>. Note the forward slash just before the closing bracket. Like HTML, XML can contain comments, and the syntax for comments is the same as in HTML: <!-- this is a comment -->.
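As a rough illustration of the syntax rules above, the sketch below feeds a small document to Python's standard xml.etree.ElementTree parser. The document, including the names xdoc, greeting, type and pagebreak, is made up for this example and is not taken from the FAQ.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: all element and attribute names are illustrative.
DOC = """<xdoc>
  <!-- comments use the same syntax as in HTML -->
  <greeting type="friendly">Hello, world</greeting>
  <pagebreak/>
</xdoc>"""

root = ET.fromstring(DOC)

greeting = root.find("greeting")
greeting_type = greeting.get("type")   # value of the quoted attribute
pagebreak = root.find("pagebreak")     # empty tag: no text and no children

# Tags are case sensitive, so looking for <Greeting> finds nothing.
wrong_case = root.find("Greeting")

print(greeting_type, greeting.text)
print(pagebreak.text, len(pagebreak))
print(wrong_case)
```

The comment is skipped by this parser, which matches the FAQ's point that a processor need not pass comments on.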
XML really describes a 'grammar' in which we can write our own markup language. It is similar to SGML and is 100% compatible with SGML. HTML is a markup language written according to the rules of SGML; it is an application of SGML.
What is meant by markup and markup languages?

A markup language is the set of rules, the grammar and syntax, that tells how a language which marks up documents should be "spoken". SGML is a markup language, and HTML is the vocabulary of a particular dialect of that language, albeit a very widely spoken dialect. HTML follows the rules of SGML. XML is also a markup language, with a grammar that is based on SGML but substantially simpler. Markup is the set of symbolic tags used to indicate that something needs to be done to the text. The <b></b> pair is markup in HTML. In XML and SGML, markup likewise takes the form of tags.

Markup can take one of three forms: semantic, stylistic, or structural.

Semantic markup gives information about the text it is marking up. For example, in an element of the form <hamlet>...</hamlet>, the hamlet tag tells us that the enclosed words are being spoken by Hamlet. In HTML, the <CODE> element plays a similar role: it tells us that the enclosed text is computer code.

Stylistic markup tells us about the style that should be used to display a document item. In HTML, an element such as <b>...</b> tells us that the style of the document should change.

Structural markup tells us something about the structure of a document. In HTML, tags such as <P> and <DIV> do this, and an XML vocabulary can define equivalent tags of its own. The old editor's notations "dele" and "stet", beloved of crossword fans, are structural markup.
If it wasn't for HTML, hardly anyone would have heard of SGML (Standard Generalized Markup Language), although it has been an international standard since 1986. It is really a document that lays down rules on how to describe a set of markup tags. HTML is its best-known product. It has, however, been used with great success to manipulate large bodies of documents, and relies on the fact that a document marked up according to the rules of SGML can be widely understood on a variety of platforms. Its great strength is that it allows the use of semantic tagging, which can accurately describe a document's content. Its chief drawback is its complexity, which makes it difficult for the occasional user and also makes it difficult to write SGML-compatible software.

XML (Extensible Markup Language) is a recent language that is 100% compatible with SGML. It has been designed by the W3C as a version of SGML suitable for use over the Internet. It is still very much in the developmental stage, although the specification for the language proper is quite established. There is still much work to be done on the form of linking (XLL) and the form of style sheet to use with it. Originally a simplified version of DSSSL, called DSSSL-o or XS (extensible styling), was to be used, but both of these are horribly complicated. Currently it appears that CSS will be used for everyday declarative styling and XSL will be used when more powerful document manipulation is required. Most people who have been exposed to this language are wildly enthusiastic about it. It has nearly all the power of SGML with none of the difficulty.

XML is for everyone who needs to send documents over the Internet containing information that needs to be manipulated in various ways. (You still make your cool display pages using HTML.)
XML allows us to markup a document with a set of tags of our own devising.
Markup can be of three sorts:
Stylistic markup tells how the document is to be styled. The <I>, <b>, and <U> tags are all stylistic markup in HTML.
Structural markup tells how the document is to be structured. The <H*>, <P> and <DIV> tags are examples of structural markup.
Semantic markup tells us something about the content of the text. <title> and <CODE> are examples of semantic markup in HTML.

HTML has proven very adept at preparing documents for display over the web, but a document marked up in HTML tells us very little about the content of the document, and for most documents to be useful in a business situation there is a need to know about the document's content. As an example, if a patient's medical records were marked up in HTML, and I as a doctor wanted to find out about the patient's allergies, at present I would have to download the whole record (hundreds of kilobytes) and then do a manual search through that document. If, however, the patient's records were marked up in XML and one of the tags was <allergies>, I could just send a request to the server for that part of the document, and receive a few bytes of information instead of hundreds of kilobytes.

Using the same example of patient records, what if we wanted someone to have access to some parts of the records but not others? (Would you really want everyone at the insurance office reading the notes that your shrink may have written about you?) Then you could instruct the server to withhold certain parts of the document, i.e. in the above example anything marked up <psych.-note> or <confidential>. Thus the ability of individuals, groups of individuals, and institutions to write their own markup language will expedite information transfer and provide other benefits, such as confidentiality.
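The patient-record scenario above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record, its tag names other than allergies and confidential, and the helper functions extract and withhold are all hypothetical, and a real server would do this work on its side before sending anything.

```python
import xml.etree.ElementTree as ET

# Hypothetical patient record; the layout and most tag names are invented.
RECORD = """<patient>
  <name>J. Doe</name>
  <allergies>penicillin</allergies>
  <history>
    <visit>Routine check-up</visit>
    <confidential>Notes for the treating doctor only</confidential>
  </history>
</patient>"""

def extract(record_xml, tag):
    """Return the serialized elements named `tag`, ignoring everything else."""
    root = ET.fromstring(record_xml)
    return [ET.tostring(el, encoding="unicode").strip() for el in root.iter(tag)]

def withhold(record_xml, hidden_tags):
    """Return the record with every element named in `hidden_tags` removed."""
    root = ET.fromstring(record_xml)
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag in hidden_tags:
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")

allergies = extract(RECORD, "allergies")
public_view = withhold(RECORD, {"confidential"})

print(allergies)
print(public_view)
```

Because the tags describe the content, both operations are simple lookups by tag name, which is not possible when the same record is marked up only for display.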
More recently it has become obvious that XML can replace proprietary binary formats in databases, and thus make the old dream of true interchangeability of data across applications and platforms a reality. XML is also being used to write many of the new language specs. It has become the de facto language of the World Wide Web Consortium (the body that 'governs' HTML).

XML is not difficult to learn. It was designed to be easy: the official specification is a mere 40 pages (download it from http://www.w3.org/TR/) and is written in (almost) readable language. (They use EBNF notation to describe the keywords. Read section 6, the last section, of the specification first.) Anyone with a basic understanding of HTML can be writing XML documents in no time at all.

What are the rules for writing an XML document?

XML documents come in two flavors, the valid document and the well-formed document. Every valid document is well formed, but not every well-formed document is valid.
A well-formed document must follow three very simple rules:
1. The document must contain one or more elements.
2. It must have a unique opening and closing tag that contains the whole document.
3. All other tags must nest properly, with no overlapping.
In addition, all the tags and attributes must conform to the rules for writing tags, and all the values of the attributes must be quoted.

As an example of a well-formed document, take a 'greeting' element, carrying an attribute, that contains an 'emphasis' element, with the whole wrapped in an 'xdoc' element. Such a document follows all the rules. The value of the attribute on 'greeting' is quoted; single quotes could also be used, but the quotes must match. The xdoc tag is there to provide the unique opening and closing tag. The 'emphasis' tag is nested in (i.e. completely enclosed within) the greeting tag. A document is NOT well formed if it breaks any of these rules, for instance by leaving a tag unclosed, overlapping two tags, or leaving an attribute value unquoted.
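One way to try these rules out is to hand candidate documents to a parser and see which ones it rejects. The sketch below uses Python's standard xml.etree.ElementTree; the xdoc, greeting and emphasis names follow the FAQ's description, while the attribute name type, the helper is_well_formed and the sample strings are assumptions made for the illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts `text` as well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

# One root element, properly nested tags, quoted attribute value.
GOOD = ('<xdoc><greeting type="friendly">Hello, '
        '<emphasis>world</emphasis></greeting></xdoc>')

# Each entry breaks exactly one of the rules.
BAD = {
    "no single root": "<greeting>Hello</greeting><greeting>Goodbye</greeting>",
    "overlapping tags": "<xdoc><greeting>Hello, <emphasis>world</greeting></emphasis></xdoc>",
    "unclosed tag": "<xdoc><greeting>Hello</xdoc>",
    "unquoted attribute": "<xdoc><greeting type=friendly>Hello</greeting></xdoc>",
}

good_result = is_well_formed(GOOD)
bad_results = {label: is_well_formed(text) for label, text in BAD.items()}

print(good_result)
for label, accepted in bad_results.items():
    print(label, accepted)
```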
A valid document must be well-formed, and it must also conform to its DTD (Document Type Definition). This is a set of rules describing how the document must be laid out. The DTD (if present) is either written or referenced in the PROLOG of the XML document.
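The difference between well formed and valid can be seen with a non-validating parser. In the sketch below the DTD is written in the prolog and says that xdoc may contain only greeting elements, yet the body uses a farewell element. Python's xml.etree.ElementTree checks well-formedness only, so it accepts the document; a validating processor would reject it. The DTD and the farewell element are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: well formed, but NOT valid against its own DTD,
# because <farewell> is not declared and <xdoc> requires <greeting>.
DOC = """<?xml version="1.0"?>
<!DOCTYPE xdoc [
  <!ELEMENT xdoc (greeting+)>
  <!ELEMENT greeting (#PCDATA)>
]>
<xdoc>
  <farewell>Goodbye</farewell>
</xdoc>"""

# ElementTree does not validate, so parsing succeeds.
root = ET.fromstring(DOC)
children = [child.tag for child in root]

print(root.tag, children)
```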
What is the difference between a tag and an element, and what is an empty tag?

These two words are NOT interchangeable. In XML, a tag is what is written between angled brackets, e.g. <atag>. This is an example of an opening tag. In XML all opening tags must have closing tags of the form </atag>. The way the <P> tag is used in HTML, with no closing tag, is illegal in XML. In XML an opening <P> tag requires a closing tag </P>. An element is an opening and a closing tag and what comes in between: <atag>some content</atag> is an element. Empty tags must be in a special format, namely <emptytag/> (note where the forward slash is), or else you are allowed to write <emptytag></emptytag>. The <IMG> tag as written in HTML, with neither a closing tag nor the trailing slash, is illegal in XML.
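A short check of these points, again using Python's standard xml.etree.ElementTree as a stand-in for an XML processor. The sample strings are illustrative only; atag and emptytag are the FAQ's own placeholder names.

```python
import xml.etree.ElementTree as ET

# An element: opening tag, content, closing tag.
element = ET.fromstring("<atag>some content</atag>")

# The two legal ways of writing an empty tag describe the same element.
short_form = ET.fromstring("<emptytag/>")
long_form = ET.fromstring("<emptytag></emptytag>")
same_element = ET.tostring(short_form) == ET.tostring(long_form)

def parses(text):
    """Return True if `text` is accepted by the parser."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

# HTML habits that XML does not allow.
unclosed_p = parses("<xdoc><P>A paragraph with no closing tag</xdoc>")
html_img = parses('<xdoc><IMG SRC="picture.gif"></xdoc>')
xml_img = parses('<xdoc><IMG SRC="picture.gif"/></xdoc>')

print(element.tag, element.text)
print(same_element, unclosed_p, html_img, xml_img)
```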
Use a convention. I put HTML tags in uppercase and XML tags in lower case. (This convention is becoming quite widespread.) XML is case sensitive, i.e. <Atag>, <atag>, and <ATAG> are three different kinds of tags.

What are the rules as to how a tag must be written?

A tag name must start with a letter (a-z, A-Z) or an underscore (_) and can contain letters, the digits 0-9, the period (.), the underscore (_) or the hyphen (-). White space is not allowed, nor is other markup. The colon (:) is reserved for experimental use, and although it is legal at present it may acquire special meaning in the future, so don't use it. (For those interested, its main use is in namespaces and reserved keywords.) No name can begin with the sequence "xml", in any mix of upper and lower case. This sequence is reserved for use by the standardization forum. Your tags should have semantic meaning, otherwise why bother to use XML! With these few simple rules and conventions in mind, go ahead and make tags that describe your document!

What are the rules as to how an attribute must be written?

Tags can contain attributes. An example you may be most familiar with is the HTML <IMG> tag, which is often written with an unquoted attribute value such as VSPACE=75. In XML an attribute takes the following general form: attributename="value". Note that there must be an equal sign and a value, and the value must be quoted, so the VSPACE attribute above would have to be VSPACE="75" to be legal in XML. Also, in HTML some tags can take an attribute without a value, such as <UL COMPACT>. This too would be illegal; you must give an attribute a quoted value such as <UL COMPACT="anything">, and even <UL COMPACT=""> would do. Attribute names follow the same naming rules as tag names.

How do you write comments in XML?

XML comments are written the same way as HTML comments, i.e. <!-- this is a comment -->. The XML processor is not required to pass this information on to the user agent, i.e. the piece of software that is converting the document into something useful. XML also has CDATA sections, which are used to escape blocks of text containing markup.

What is a CDATA section in XML?

CDATA is short for Character DATA. CDATA sections allow us to escape blocks of text containing markup. CDATA sections take the general form: <![CDATA[ text to be escaped ]]>. For example, suppose I wanted to print out the following line of text, as would be quite common if I was writing a book on HTML or XML: "The left angled bracket '<' and the ampersand '&' must be replaced by their entities &lt; and &amp; respectively". If I was writing this in HTML I would have to put: The left angled bracket '&lt;' and the ampersand '&amp;' must be replaced by their entities &amp;lt; and &amp;amp; respectively. By escaping the text using CDATA, I could simply write: <![CDATA[The left angled bracket '<' and the ampersand '&' must be replaced by their entities &lt; and &amp; respectively]]>
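The naming rules and the CDATA rule above lend themselves to a quick check in Python. The regular expression is a simplified, ASCII-only reading of the naming rules as this FAQ states them (it also rejects the colon, following the advice given here, although the colon is legal in XML). The function name is_acceptable_name and the element name p are assumptions made for the sketch.

```python
import re
import xml.etree.ElementTree as ET

# Starts with a letter or underscore; then letters, digits, '.', '_' or '-'.
NAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*\Z")

def is_acceptable_name(name):
    """Apply the FAQ's naming rules, including the ban on a leading 'xml'."""
    return bool(NAME.match(name)) and not name.lower().startswith("xml")

names = ["atag", "_note", "psych.-note", "2fast", "a tag", "ns:tag", "XMLthing"]
name_results = {name: is_acceptable_name(name) for name in names}

# The same sentence written twice: once with entities, once inside CDATA.
SENTENCE = "The left angled bracket '<' and the ampersand '&' must be replaced"
with_entities = ET.fromstring(
    "<p>The left angled bracket '&lt;' and the ampersand '&amp;' "
    "must be replaced</p>"
)
with_cdata = ET.fromstring("<p><![CDATA[" + SENTENCE + "]]></p>")

print(name_results)
print(with_entities.text == with_cdata.text)
```

Both forms reach the application as the same text; CDATA only saves the author from escaping each '<' and '&' by hand.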
What is a Processing Instruction in XML?

Processing instructions take the form <?target instruction?>. The target of a processing instruction cannot start with any form of the string "xml"; this is reserved for the XML version declaration. Processing instructions can occur anywhere in the document and contain information that the processor must pass on to the user agent. The version declaration, <?xml version="1.0"?>, is written in the same form.
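To see a processing instruction being passed on to the application, the sketch below parses a small document with Python's standard xml.dom.minidom. The target name render and its contents are hypothetical; the parser treats the contents as an opaque string for the application to interpret.

```python
from xml.dom import minidom

# Hypothetical document with a version declaration and one
# processing instruction whose target is "render".
DOC = '<?xml version="1.0"?><?render mode="draft"?><xdoc>Hello</xdoc>'

document = minidom.parseString(DOC)

# Collect (target, data) for every processing instruction at the top level.
instructions = [
    (node.target, node.data)
    for node in document.childNodes
    if node.nodeType == node.PROCESSING_INSTRUCTION_NODE
]

print(instructions)
print(document.documentElement.tagName)
```

Note that this parser handles the version declaration itself and reports only the render instruction to the application.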