Expand description
§Parser-combinator library for Rust.
pars is a general purpose parser-combinator library, with support for no_std.
pars is heavily inspired by nom, but pars
aims to generalize more over different kinds of input and provide more
Rust-idiomatic and ergonomic combinators. pars also provides direct support for
higher level concepts such as Unicode and regular expressions.
§Contents
- Example - Jump directly into an example parser implementation
- Parser Combinators - A brief introduction to parser-combinators
- Parser Input - An overview of how parseable input works in
pars - Defining a Parser - How to write your own parser
- Defining a Combinator - How to write your own combinator
- Parsing Errors - The
parserror system and how to define your own error types - Features - Available crate features
§Example
use pars::prelude::*;
use pars::ErrorKind;
use pars::unicode::{
strict::{char as uchar, verbatim},
UnicodeInput as UInput, PResult,
};
#[derive(Debug, PartialEq)]
struct Color {
red: u8,
green: u8,
blue: u8,
}
fn hex_digit<I: UInput>(input: I) -> PResult<u8, I> {
uchar.try_map(|ch| match ch {
'0'..='9' => Ok((ch as u8) - b'0'),
'a'..='f' => Ok((ch as u8) - b'a' + 10),
'A'..='F' => Ok((ch as u8) - b'A' + 10),
_ => Err(ErrorKind::InvalidInput),
}).parse(input)
}
fn hex_byte<I: UInput>(input: I) -> PResult<u8, I> {
hex_digit.array().map(|[d0, d1]| (d0 << 4) | d1).parse(input)
}
fn hex_color<I: UInput>(input: I) -> PResult<Color, I> {
prefix(verbatim("#"), hex_byte.array())
.map(|[r, g, b]| Color {
red: r,
green: g,
blue: b,
})
.parse(input)
}
fn main() {
assert_eq!(
hex_color.parse("#2F14DF"),
Ok(Success(Color {
red: 0x2f,
green: 0x14,
blue: 0xdf,
}, ""))
);
}§Parser Combinators
Classically, parsers are defined as a grammar that parser generators, such as
lex and yacc, will transpile to an implementation of a state machine. This
generally produces the most efficient parsers, to be sure. Some popular parser
generators for Rust include pest,
LALRPOP, and
peg.
A parser combinator, on the other hand, is generally a library of parser building
blocks for building a recursive descent parser. The trade off here is that we
gain the ability to write plain Rust (in the case of pars) as well the ability
to easily reuse parsers to build more complex parsers, but at the costs associated
with recursive descent parsers. The downside of recursive descent parsers is that
there is no limit to how far back the parser may need to backtrack in the input
stream during parsing, and thus are generally not as efficient as parser produced
by parser generators. But note that, for example, most practical compilers are
implemented with a recursive descent parser because of their power and
flexibility, despite not being the most efficient approach. Also note that in
some cases, it is possible define a complex parser without any backtracking with
a little extra effort. See the example
msgpack_alt.rs
in the crate repo for an example of a parser with no backtracking.
§Parser Input
In pars a parser is defined as a function that accepts a stream of symbols as a
parameter, and returns either a parse success or parse failure. A stream of
symbols is generalized by the Input trait. Input is very similar to the
Iterator trait in the Rust standard library, but it has some extra
properties. Input must also implement Clone, and it must be possible to
test whether two Input objects are at the same stream position. Note that
for good performance, the Clone implementation should be cheap, but it is
not required that it implement Copy.
Input is automatically implemented for slice references, &[T] and &str.
To illustrate the properties of an Input consider input: &[u8]. Advancing
the input is equivalent to input = &input[1..]. Testing that two inputs are
at the same position is just testing that the underlying pointers are equal,
input.as_ptr() == other_input.as_ptr(). Also, &[u8] clearly has a cheap
Clone implementation.
See the documentation for Input for more details.
pars also provides the IntoInput trait, which is similarly analogous to
the IntoIterator trait in the Rust standard library. The Parse::parse
function accepts a value implementing IntoInput instead of Input for
the convience of the caller. For example, &Vec<u8>,
&Box<[u8]>, &[u8; N] all implement IntoIterator
and will convert to the input type &[u8].
See the documentation for IntoInput for more details.
Since Input does not have any constraints on what a “symbol” can be, pars
can be adapted to consume pretty much any kind of input stream. For example, in
some cases it might be desirable to build a lexer (or scanner) to interpret the
source input as a stream of tokens, and parse the token stream with pars
instead of the source input. Sufficiently complex parsers, such as compiler
frontends, generally prefer this approach. The lexer just needs to provide an
interface with a type that implements Input. Note that it will usually
be necessary for the lexer (or whatever the custom input source is) to be
“multi-pass” in order to support parser backtracking.
TODO: Include a custom input example.
§Defining a Parser
Most user defined parsers should be implemented as a plain function. Types
implementing Fn(I) -> PResult<T, I, E>, where I
implements Input and E implements Error<I> automatically implement
Parse<I, Parsed = T, Error = E>. Closures can also be used to
define a parser inline, but note that occaisionally closures may need some
type hinting assistance to compile properly as a parser.
use pars::{Input, PResult, Success, bytes::Error, Parse};
// `my_parser` implements `Parse<I, Parsed = i32, Error = Error<I>>`
fn my_parser<I: Input>(input: I) -> PResult<i32, I, Error<I>> {
Success(42, input).into()
}When needed, Parse can always be implemented manually.
§Defining a Combinator
Occaisionly, there is a need to implement a custom combinator for reuse. The
simplest way is to define a function returning impl Parse.
use pars::{basic::array, Input, Parse};
// Creates a parser that applies `parser` twice and
// returns the parsed values as an array of length 2.
const fn twice<I, P>(parser: P)
-> impl Parse<I, Parsed = [P::Parsed; 2], Error = P::Error>
where
I: Input,
P: Parse<I>,
{
array(parser)
}It is usually necessary for the impl Parse return to explicitly define
the Parsed and Error associated types, in
order for type inference to work properly in conjunction with other combinators.
§Parsing Errors
pars does not define a general purpose error for parsing failures. Instead, it
provides the Error trait. Users can define their own parsing error types, and
pars provides a basic error type for each of the modules
bytes, ascii, and
unicode.
A parsing error, aside from other error information it communicates, contains the
input position at which the error occurred. The generic parsers and combinators
defined throughout pars use the interface described by the Error trait to
instantiate an error as needed, also providing where the error occurred. Note
that the input position described by an Error may be different from the input
position described by a Failure within a PResult. The Failure always
describes the remaining input that has not been successfully parsed, but the
error may have occurred further into the input and caused the parser to
backtrack.
An important partner to the Error trait is the ErrorSeed trait.
Essentially, a type implementing ErrorSeed is a parsing error without the
accompanying input position of the error. Such a type can be combined with an
input position via the ErrorSeed trait to create an actual error.
Many combinators defined in pars, particularly those named try_*, require the
user to provide a function or closure that returns a Result. The Err
variant of that Result must be convertible to an Error via ErrorSeed.
The reason for this is combinators generally abstract away the input stream,
letting the user focus more on how parsers fit together and produce a parsed
value. When some intermediate operation is fallible, being able to return an
ErrorSeed simplifies matters by letting the combinator implementation deal
with tracking input positions.
A typical user defined error will be defined as a pair of types: One implementing
ErrorSeed and another implementing Error.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
struct MyErrorKind(&'static str);
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct MyError<I: Input> {
msg: &'static str,
pos: I,
}
impl<I: Input> ErrorSeed<I, MyError<I>> for MyErrorKind {
fn into_error(self, pos: I) -> MyError<I> {
MyError {
msg: self.0,
pos,
}
}
}
impl<I: Input> Error<I> for MyError<I> {
fn need_more_input(pos: I) -> Self {
MyErrorKind("need more input").into_error(pos)
}
fn expected_eof(pos: I) -> Self {
MyErrorKind("expected end of input").into_error(pos)
}
fn invalid_input(pos: I) -> Self {
MyErrorKind("invalid input").into_error(pos)
}
fn position(&self) -> &I { &self.pos }
}For convenience, ErrorKind is available for use as an ErrorSeed type that
can be converted to any type implementing Error.
When combining a user defined error with parsers that use a different error type,
such as bytes::u8, there are a few ways this can be dealt with.
The simplest way is to use Parse::err_into.
// Given `MyError` defined previously.
impl<I: Input> From<bytes::Error<I>> for MyError<I> {
fn from(err: bytes::Error<I>) -> Self {
match err.kind() {
bytes::ErrorKind::NeedMoreInput => Self::need_more_input(err.position().clone()),
bytes::ErrorKind::ExpectedEof => Self::expected_eof(err.position().clone()),
_ => Self::invalid_input(err.position().clone()),
}
}
}
fn my_u8<I: bytes::ByteInput>(input: I) -> PResult<u8, I, MyError<I>> {
// Changes `bytes::u8` to return `MyError` instead of `bytes::Error`
bytes::u8.err_into().parse(input)
}When relying on a From implementation doesn’t work Parse::map_err and
Parse::map_res are also available to define error conversions separately for each
instance.
§Features
Note that there is no std feature, since pars never makes use of types that are
only available in std.
alloc- Enable usage of thealloccrate. This enablesIntoInputimpls for types such as&Vec<[T]>and&String. Enabled by default.bytes- Enable thebytesmodule, which provides parsers and utilities for byte streams.unicode- Enable theunicodemodule, which provides parsers and utilities for UTF-8, UTF-16, and UTF-32 streams.ascii- Enable theasciimodule, which provides parsers and utilities for ASCII character streams.
Modules§
- ascii
ascii - basic
- Generic parser and combinator building blocks.
- bytes
bytes - prelude
- The
parsprelude. - unicode
unicode
Structs§
- Failure
- Type returned by a parser when parsing fails.
- Span
- A slice of parser input.
- Success
- Type returned by a parser when parsing succeeds.
Enums§
Traits§
- Error
- A parsing error.
- Error
Seed - A parsing error descriptor that can be used to build a parsing error.
- Input
- A parsable symbol stream.
- Into
Input - Trait for types that can be directly converted into an
Inputtype. - PResult
Ext - Additional convenience methods for
PResult. - Parse
- Trait implemented by all parsers.