EzXML.jl is a package to handle XML/HTML documents for primates.
The main features are:
- Reading and writing XML/HTML documents.
- Traversing XML/HTML trees with DOM interfaces.
- Searching elements using XPath.
- Proper namespace handling.
- Capturing error messages.
- Automatic memory management.
- Document validation.
- Streaming parsing for large XML files.
Install EzXML.jl as follows:
```
julia -e 'using Pkg; Pkg.add("EzXML")'
```
This package depends on libxml2, which is installed automatically as an artifact via XML2_jll.jl on Julia 1.3 or later. Windows, Linux, macOS, and FreeBSD are currently supported.
Julia compatibility by EzXML.jl version:

| EzXML.jl | Julia        |
|----------|--------------|
| 1.0      | 1.0 or later |
| 1.1      | 1.3 or later |
| 1.2      | 1.6 or later |
```julia
# Load the package.
using EzXML

# Parse an XML string
# (use `readxml(<filename>)` to read a document from a file).
doc = parsexml("""
<primates>
    <genus name="Homo">
        <species name="sapiens">Human</species>
    </genus>
    <genus name="Pan">
        <species name="paniscus">Bonobo</species>
        <species name="troglodytes">Chimpanzee</species>
    </genus>
</primates>
""")

# Get the root element from `doc`.
primates = root(doc)  # or `doc.root`

# Iterate over child elements.
for genus in eachelement(primates)
    # Get an attribute value by name.
    genus_name = genus["name"]
    println("- ", genus_name)
    for species in eachelement(genus)
        # Get the content within an element.
        species_name = nodecontent(species)  # or `species.content`
        println(" └ ", species["name"], " (", species_name, ")")
    end
end
println()

# Find text nodes using an XPath query.
for species_name in nodecontent.(findall("//species/text()", primates))
    println("- ", species_name)
end
```
See the reference page or docstrings for more details.
Types:
- `EzXML.Document`: an XML/HTML document
- `EzXML.Node`: an XML/HTML node, including elements, attributes, texts, etc.
- `EzXML.XMLError`: an error that occurred in libxml2
- `EzXML.StreamReader`: a streaming XML reader
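The `XMLError` and `StreamReader` types can be exercised as in the following minimal sketch, assuming EzXML 1.x: a malformed string makes `parsexml` throw an `EzXML.XMLError`, and a `StreamReader` opened on a file visits nodes one at a time instead of building the whole tree. The temporary file and its contents are made up for illustration, and the use of `expandtree` to read an attribute of the current element is one possible approach, not the only one.

```julia
using EzXML

# A malformed document (the final closing tag is cut short) makes parsexml throw.
err = try
    parsexml("<user><name>Kumiko</name></user")
    nothing
catch e
    e
end
println(err isa EzXML.XMLError)

# Streaming: write a small (illustrative) file, then walk it one node at a time.
path = joinpath(mktempdir(), "primates.xml")
write(path, """<primates><species name="sapiens"/><species name="paniscus"/></primates>""")

species_names = String[]
reader = open(EzXML.StreamReader, path)
for _ in reader  # each iteration advances the reader to the next node
    if nodetype(reader) == EzXML.READER_ELEMENT && nodename(reader) == "species"
        # expandtree returns a Node for the current element; it stays valid
        # only until the reader moves on, so read the attribute right away.
        push!(species_names, expandtree(reader)["name"])
    end
end
close(reader)
println(species_names)
```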
IO:
- From file: `readxml(filename|stream)`, `readhtml(filename|stream)`
- From string or byte array: `parsexml(string)`, `parsehtml(string)`
- To file: `write(filename, doc)`
- To stream: `print(io, doc)`
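A round trip through these IO functions might look like the sketch below, assuming EzXML 1.x; the file name `greeting.xml` is an arbitrary choice for illustration. Because `print(io, doc)` accepts any `IO`, an `IOBuffer` is used here to capture the serialized text in memory.

```julia
using EzXML

# Parse a document from a string.
doc = parsexml("<greeting lang=\"en\">Hello</greeting>")

# Write it to a temporary file and read it back.
path = joinpath(mktempdir(), "greeting.xml")
write(path, doc)
doc2 = readxml(path)
println(nodecontent(root(doc2)))  # Hello

# Serialize to an in-memory stream with print(io, doc).
buf = IOBuffer()
print(buf, doc2)
serialized = String(take!(buf))
print(serialized)
```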
Accessors:
- Node information: `nodetype(node)`, `nodepath(node)`, `nodename(node)`, `nodecontent(node)`, `setnodename!(node, name)`, `setnodecontent!(node, content)`
- Node property: `node.type`, `node.name`, `node.path`, `node.content`, `node.namespace`
- Document:
    - Property: `version(doc)`, `encoding(doc)`, `hasversion(doc)`, `hasencoding(doc)`
    - Node: `root(doc)`, `dtd(doc)`, `hasroot(doc)`, `hasdtd(doc)`, `setroot!(doc, element_node)`, `setdtd!(doc, dtd_node)`
- Document property: `doc.version`, `doc.encoding`, `doc.node`, `doc.root`, `doc.dtd`
- Attributes: `node[name]`, `node[name] = value`, `haskey(node, name)`, `delete!(node, name)`
- Node predicate:
    - Document: `hasdocument(node)`
    - Parent: `hasparentnode(node)`, `hasparentelement(node)`
    - Child: `hasnode(node)`, `haselement(node)`
    - Sibling: `hasnextnode(node)`, `hasprevnode(node)`, `hasnextelement(node)`, `hasprevelement(node)`
    - Node type: `iselement(node)`, `isattribute(node)`, `istext(node)`, `iscdata(node)`, `iscomment(node)`, `isdtd(node)`
- Tree traversal:
    - Document: `document(node)`
    - Parent: `parentnode(node)`, `parentelement(node)`
    - Child: `firstnode(node)`, `lastnode(node)`, `firstelement(node)`, `lastelement(node)`
    - Sibling: `nextnode(node)`, `prevnode(node)`, `nextelement(node)`, `prevelement(node)`
- Tree modifiers:
    - Link: `link!(parent_node, child_node)`, `linknext!(target_node, node)`, `linkprev!(target_node, node)`
    - Unlink: `unlink!(node)`
    - Create: `addelement!(parent_node, name, [content])`
- Iterators:
    - Iterator: `eachnode(node)`, `eachelement(node)`, `eachattribute(node)`
    - Vector: `nodes(node)`, `elements(node)`, `attributes(node)`
- Counters: `countnodes(node)`, `countelements(node)`, `countattributes(node)`
- Namespaces: `namespace(node)`, `namespaces(node)`
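The accessors combine as in the following sketch, assuming EzXML 1.x; it works on a trimmed copy of the primates document, and the extra `common` attribute is purely illustrative. Functions with `element` in their name skip text nodes such as the indentation between tags, while the `node` variants return every kind of node.

```julia
using EzXML

doc = parsexml("""
<primates>
  <genus name="Pan">
    <species name="paniscus">Bonobo</species>
    <species name="troglodytes">Chimpanzee</species>
  </genus>
</primates>
""")
primates = root(doc)

# Node information and attributes.
pan = firstelement(primates)              # <genus name="Pan">
println(nodename(pan), " ", pan["name"])  # genus Pan
pan["common"] = "chimpanzees"             # add an attribute...
println(haskey(pan, "common"))            # true
delete!(pan, "common")                    # ...and remove it again

# Traversal: the *element* functions skip whitespace text between tags.
bonobo = firstelement(pan)
chimp = nextelement(bonobo)
println(nodecontent(chimp))               # Chimpanzee
println(hasnextelement(chimp))            # false
println(nodename(parentelement(chimp)))   # genus
println(istext(firstnode(chimp)))         # true (the text "Chimpanzee")

# Modifiers: create, update, and detach nodes.
homo = addelement!(primates, "genus")     # returns the new <genus/> element
homo["name"] = "Homo"
human = addelement!(homo, "species", "Human")
human["name"] = "sapiens"
setnodecontent!(chimp, "Common chimpanzee")
unlink!(bonobo)                           # detach <species name="paniscus">

# Iterators and counters.
println([e["name"] for e in eachelement(primates)])  # ["Pan", "Homo"]
println(countelements(pan))               # 1 after the unlink above
```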
Constructors:
- `EzXML.Document` type: `XMLDocument(version="1.0")`, `HTMLDocument(uri=nothing, externalID=nothing)`
- `EzXML.Node` type: `XMLDocumentNode(version="1.0")`, `HTMLDocumentNode(uri, externalID)`, `ElementNode(name)`, `TextNode(content)`, `CommentNode(content)`, `CDataNode(content)`, `AttributeNode(name, value)`, `DTDNode(name, [systemID, [externalID]])`
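Documents can also be built from scratch with these constructors. The sketch below, assuming EzXML 1.x, creates detached nodes with `ElementNode`, `TextNode`, and `CommentNode`, then attaches them with `setroot!` and `link!`; the element names simply mirror the primates example.

```julia
using EzXML

# An empty document (version "1.0" by default) with a root element.
doc = XMLDocument()
primates = ElementNode("primates")
setroot!(doc, primates)

# Create nodes detached from any tree, then link them into place.
genus = ElementNode("genus")
genus["name"] = "Homo"
link!(primates, genus)

species = ElementNode("species")
species["name"] = "sapiens"
link!(species, TextNode("Human"))
link!(genus, species)

link!(primates, CommentNode(" more genera to come "))

println(countnodes(primates))  # 2: the <genus> element and the comment
print(doc)                     # serialize the whole document as XML
```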
Queries:
- XPath: `findall(xpath, doc|node)`, `findfirst(xpath, doc|node)`, `findlast(xpath, doc|node)` (note the caveat on combining XPath and namespaces in the manual)
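The namespace caveat matters when a document declares a default namespace: an unprefixed name test such as `//species` then matches nothing, because in XPath 1.0 an unprefixed name refers to no namespace. The sketch below, assuming EzXML 1.x and a made-up namespace URI, binds an arbitrary prefix `p` to the root element's namespace and passes that mapping as the third argument of the query functions.

```julia
using EzXML

doc = parsexml("""
<primates xmlns="http://example.org/primates">
  <species name="sapiens">Human</species>
  <species name="paniscus">Bonobo</species>
</primates>
""")
primates = root(doc)

# Map a prefix of our choosing ("p") to the namespace URI of the root element.
ns = ["p" => namespace(primates)]

hits = findall("//p:species", primates, ns)
println([nodecontent(n) for n in hits])  # ["Human", "Bonobo"]

# findfirst/findlast return a single matching node.
println(nodecontent(findfirst("//p:species/@name", primates, ns)))  # sapiens
println(nodecontent(findlast("//p:species", primates, ns)))         # Bonobo
```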
Examples:
- `primates.jl`: runs the "primates" example shown above.
- `julia2xml.jl`: converts a Julia expression to XML.
- `listlinks.jl`: lists all links in an HTML document.
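The bundled `listlinks.jl` script is not reproduced here. As an independent sketch of the same idea, assuming EzXML 1.x, `parsehtml` can load an HTML page and an XPath query can select every `<a>` element that carries an `href` attribute; the HTML snippet is made up for illustration.

```julia
using EzXML

html = parsehtml("""
<html><body>
  <p>See <a href="https://julialang.org">Julia</a> and
     <a href="https://github.com/JuliaIO/EzXML.jl">EzXML.jl</a>.</p>
  <a name="top">not a link</a>
</body></html>
""")

# Keep only anchors that actually have an href attribute.
links = [(nodecontent(a), a["href"]) for a in findall("//a[@href]", root(html))]
for (text, href) in links
    println(text, " => ", href)
end
```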