
What is the difference between XML and HTML?
In contrast to SGML or XML, HTML is a specific markup language that
contains a fixed set of elements and attributes. HTML has a limited
repertoire of structural tags like headings, lists, and links, some
tags for encoding formatting information like text attributes and
layout, and very few tags for encoding types of information content.
This design decision by Tim Berners-Lee, the inventor of the Web,
was the right choice because it made HTML easy to understand and
implement, leading to its rapid adoption. The idea of naming things
in plain text with the content "between the tags" is highly intuitive,
so even children can create Web pages.

However, as the Web evolved, HTML's initially simple tag set encouraged "hand-crafting"
of Web pages, misuse of tags to achieve formatting effects, and
proprietary additions to HTML by browser and application developers
who needed richer markup to support additional functionality. Furthermore,
while HTML can be described using a DTD, the vast majority of HTML
on the Web is invalid. Taken together, HTML's fundamental limitations
and typical usage without validation make it difficult for search
engines and automated processes to exploit Web information because
of the lack of reliable semantic encoding. XML can overcome these limitations
of HTML and give the Web a much stronger capability for electronic
commerce. XML makes it possible to encode information with meaningful
structure and semantics in a very accessible notation that is both
human-readable and readily processable by computers. While XML 1.0
adds no new modeling capabilities beyond those that have been available
in SGML for over a decade, the simpler XML syntax makes it much
easier for non-specialists to participate in the design of new markup
languages.
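The following small Python sketch makes the contrast concrete. It uses the standard-library xml.etree.ElementTree module to compare a presentational HTML fragment with an XML document that encodes the same facts. The <product>, <name>, and <price> elements and the sku and currency attributes form a hypothetical vocabulary invented only for this illustration. The HTML fragment is written as well-formed markup so that the same parser can read it.

```python
import xml.etree.ElementTree as ET

# Presentational HTML: the tags say how text should look, not what it means.
HTML_FRAGMENT = "<p><b>Widget</b> costs <i>$9.95</i></p>"

# Hypothetical XML vocabulary (invented for this example): the element and
# attribute names describe the information itself.
XML_DOCUMENT = """<product sku="W-100">
  <name>Widget</name>
  <price currency="USD">9.95</price>
</product>"""


def tag_names(markup: str) -> list[str]:
    """List every element name in a well-formed fragment, in document order."""
    return [elem.tag for elem in ET.fromstring(markup).iter()]


def extract_price(xml_text: str) -> tuple[str, float]:
    """Find the <price> child by name and return (currency, amount)."""
    root = ET.fromstring(xml_text)
    price = root.find("price")
    if price is None or price.text is None:
        raise ValueError("document has no <price> element")
    return price.get("currency", ""), float(price.text)


if __name__ == "__main__":
    # The HTML fragment carries only formatting information.
    print(tag_names(HTML_FRAGMENT))    # ['p', 'b', 'i']
    # The XML names the data, so a program can locate the price directly.
    print(extract_price(XML_DOCUMENT))  # ('USD', 9.95)
```

Nothing in the HTML fragment tells a program which text is the price. A search engine would have to guess from the italic formatting. The XML version exposes the price by name, which is the kind of reliable semantic encoding that automated processes can use.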