Diffstat (limited to '')
-rw-r--r-- | third_party/rust/xml-rs/Readme.md | 236 |
1 files changed, 236 insertions, 0 deletions
diff --git a/third_party/rust/xml-rs/Readme.md b/third_party/rust/xml-rs/Readme.md
new file mode 100644
index 0000000000..5ab88f86d3
--- /dev/null
+++ b/third_party/rust/xml-rs/Readme.md
@@ -0,0 +1,236 @@
+xml-rs, an XML library for Rust
+===============================
+
+[![Build Status][build-status-img]](https://github.com/netvl/xml-rs/actions?query=workflow%3ACI)
+[![crates.io][crates-io-img]](https://crates.io/crates/xml-rs)
+[![docs][docs-img]](https://docs.rs/xml-rs/)
+
+[Documentation](https://docs.rs/xml-rs/)
+
+  [build-status-img]: https://img.shields.io/github/workflow/status/netvl/xml-rs/CI/master?style=flat-square
+  [crates-io-img]: https://img.shields.io/crates/v/xml-rs.svg?style=flat-square
+  [docs-img]: https://img.shields.io/badge/docs-latest%20release-6495ed.svg?style=flat-square
+
+xml-rs is an XML library for [Rust](http://www.rust-lang.org/) programming language.
+It is heavily inspired by Java [Streaming API for XML (StAX)][stax].
+
+  [stax]: https://en.wikipedia.org/wiki/StAX
+
+This library currently contains pull parser much like [StAX event reader][stax-reader].
+It provides iterator API, so you can leverage Rust's existing iterators library features.
+
+  [stax-reader]: http://docs.oracle.com/javase/8/docs/api/javax/xml/stream/XMLEventReader.html
+
+It also provides a streaming document writer much like [StAX event writer][stax-writer].
+This writer consumes its own set of events, but reader events can be converted to
+writer events easily, and so it is possible to write XML transformation chains in a pretty
+clean manner.
+
+  [stax-writer]: http://docs.oracle.com/javase/8/docs/api/javax/xml/stream/XMLEventWriter.html
+
+This parser is mostly full-featured, however, there are limitations:
+* no other encodings but UTF-8 are supported yet, because no stream-based encoding library
+  is available now; when (or if) one will be available, I'll try to make use of it;
+* DTD validation is not supported, `<!DOCTYPE>` declarations are completely ignored; thus no
+  support for custom entities too; internal DTD declarations are likely to cause parsing errors;
+* attribute value normalization is not performed, and end-of-line characters are not normalized too.
+
+Other than that the parser tries to be mostly XML-1.0-compliant.
+
+Writer is also mostly full-featured with the following limitations:
+* no support for encodings other than UTF-8, for the same reason as above;
+* no support for emitting `<!DOCTYPE>` declarations;
+* more validations of input are needed, for example, checking that namespace prefixes are bounded
+  or comments are well-formed.
+
+What is planned (highest priority first, approximately):
+
+0. missing features required by XML standard (e.g. aforementioned normalization and
+   proper DTD parsing);
+1. miscellaneous features of the writer;
+2. parsing into a DOM tree and its serialization back to XML text;
+3. SAX-like callback-based parser (fairly easy to implement over pull parser);
+4. DTD validation;
+5. (let's dream a bit) XML Schema validation.
+
+Building and using
+------------------
+
+xml-rs uses [Cargo](http://crates.io), so just add a dependency section in your project's manifest:
+
+```toml
+[dependencies]
+xml-rs = "0.8"
+```
+
+The package exposes a single crate called `xml`:
+
+```rust
+extern crate xml;
+```
+
+Reading XML documents
+---------------------
+
+`xml::reader::EventReader` requires a `Read` instance to read from. When a proper stream-based encoding
+library is available, it is likely that xml-rs will be switched to use whatever character stream structure
+this library would provide, but currently it is a `Read`.
+
+Using `EventReader` is very straightforward. Just provide a `Read` instance to obtain an iterator
+over events:
+
+```rust,no_run
+extern crate xml;
+
+use std::fs::File;
+use std::io::BufReader;
+
+use xml::reader::{EventReader, XmlEvent};
+
+fn indent(size: usize) -> String {
+    const INDENT: &'static str = "    ";
+    (0..size).map(|_| INDENT)
+             .fold(String::with_capacity(size*INDENT.len()), |r, s| r + s)
+}
+
+fn main() {
+    let file = File::open("file.xml").unwrap();
+    let file = BufReader::new(file);
+
+    let parser = EventReader::new(file);
+    let mut depth = 0;
+    for e in parser {
+        match e {
+            Ok(XmlEvent::StartElement { name, .. }) => {
+                println!("{}+{}", indent(depth), name);
+                depth += 1;
+            }
+            Ok(XmlEvent::EndElement { name }) => {
+                depth -= 1;
+                println!("{}-{}", indent(depth), name);
+            }
+            Err(e) => {
+                println!("Error: {}", e);
+                break;
+            }
+            _ => {}
+        }
+    }
+}
+```
+
+`EventReader` implements `IntoIterator` trait, so you can just use it in a `for` loop directly.
+Document parsing can end normally or with an error. Regardless of exact cause, the parsing
+process will be stopped, and iterator will terminate normally.
+
+You can also have finer control over when to pull the next event from the parser using its own
+`next()` method:
+
+```rust,ignore
+match parser.next() {
+    ...
+}
+```
+
+Upon the end of the document or an error the parser will remember that last event and will always
+return it in the result of `next()` call afterwards. If iterator is used, then it will yield
+error or end-of-document event once and will produce `None` afterwards.
+
+It is also possible to tweak parsing process a little using `xml::reader::ParserConfig` structure.
+See its documentation for more information and examples.
+
+You can find a more extensive example of using `EventReader` in `src/analyze.rs`, which is a
+small program (BTW, it is built with `cargo build` and can be run after that) which shows various
+statistics about specified XML document. It can also be used to check for well-formedness of
+XML documents - if a document is not well-formed, this program will exit with an error.
+
+Writing XML documents
+---------------------
+
+xml-rs also provides a streaming writer much like StAX event writer. With it you can write an
+XML document to any `Write` implementor.
+
+```rust,no_run
+extern crate xml;
+
+use std::fs::File;
+use std::io::{self, Write};
+
+use xml::writer::{EventWriter, EmitterConfig, XmlEvent, Result};
+
+fn handle_event<W: Write>(w: &mut EventWriter<W>, line: String) -> Result<()> {
+    let line = line.trim();
+    let event: XmlEvent = if line.starts_with("+") && line.len() > 1 {
+        XmlEvent::start_element(&line[1..]).into()
+    } else if line.starts_with("-") {
+        XmlEvent::end_element().into()
+    } else {
+        XmlEvent::characters(&line).into()
+    };
+    w.write(event)
+}
+
+fn main() {
+    let mut file = File::create("output.xml").unwrap();
+
+    let mut input = io::stdin();
+    let mut output = io::stdout();
+    let mut writer = EmitterConfig::new().perform_indent(true).create_writer(&mut file);
+    loop {
+        print!("> "); output.flush().unwrap();
+        let mut line = String::new();
+        match input.read_line(&mut line) {
+            Ok(0) => break,
+            Ok(_) => match handle_event(&mut writer, line) {
+                Ok(_) => {}
+                Err(e) => panic!("Write error: {}", e)
+            },
+            Err(e) => panic!("Input error: {}", e)
+        }
+    }
+}
+```
+
+The code example above also demonstrates how to create a writer out of its configuration.
+Similar thing also works with `EventReader`.
+
+The library provides an XML event building DSL which helps to construct complex events,
+e.g. ones having namespace definitions. Some examples:
+
+```rust,ignore
+// <a:hello a:param="value" xmlns:a="urn:some:document">
+XmlEvent::start_element("a:hello").attr("a:param", "value").ns("a", "urn:some:document")
+
+// <hello b:config="name" xmlns="urn:default:uri">
+XmlEvent::start_element("hello").attr("b:config", "value").default_ns("urn:defaul:uri")
+
+// <![CDATA[some unescaped text]]>
+XmlEvent::cdata("some unescaped text")
+```
+
+Of course, one can create `XmlEvent` enum variants directly instead of using the builder DSL.
+There are more examples in `xml::writer::XmlEvent` documentation.
+
+The writer has multiple configuration options; see `EmitterConfig` documentation for more
+information.
+
+Other things
+------------
+
+No performance tests or measurements are done. The implementation is rather naive, and no specific
+optimizations are made. Hopefully the library is sufficiently fast to process documents of common size.
+I intend to add benchmarks in future, but not until more important features are added.
+
+Known issues
+------------
+
+All known issues are present on GitHub issue tracker: <http://github.com/netvl/xml-rs/issues>.
+Feel free to post any found problems there.
+
+License
+-------
+
+This library is licensed under MIT license.
+
+---
+Copyright (C) Vladimir Matveev, 2014-2020