# clue/reactphp-utf8

Streaming UTF-8 parser, built on top of [ReactPHP](https://reactphp.org/).

**Table of Contents**

* [Support us](#support-us)
* [Usage](#usage)
* [Sequencer](#sequencer)
* [Install](#install)
* [Tests](#tests)
* [License](#license)
* [More](#more)
## Support us
We invest a lot of time developing, maintaining and updating our awesome
open-source projects. You can help us sustain the high quality of our work by
[becoming a sponsor on GitHub](https://github.com/sponsors/clue). Sponsors get
numerous benefits in return; see our [sponsoring page](https://github.com/sponsors/clue)
for details.

Let's take these projects to the next level together! 🚀
## Usage
### Sequencer
The `Sequencer` class can be used to make sure you only get back complete, valid
UTF-8 byte sequences when reading from a stream.
It wraps a given `ReadableStreamInterface` and exposes its data through the same
interface.
```php
// $loop is assumed to be an existing ReactPHP event loop instance
$stdin = new ReadableResourceStream(STDIN, $loop);

$stream = new Sequencer($stdin);

$stream->on('data', function ($chunk) {
    var_dump($chunk);
});
```
React's streams emit data in string chunks and make no assumptions about their encoding.
These chunks do not necessarily represent complete UTF-8 byte sequences, as a
sequence may be broken up into multiple chunks.
This class reassembles these sequences by buffering incomplete ones.
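
The buffering idea described above can be illustrated with a minimal sketch in plain PHP. This is not the library's actual implementation: `splitIncompleteTail()` is a hypothetical helper that only inspects the last few bytes of a buffer to decide whether a multi-byte sequence has been cut off. It assumes the rest of the buffer is well-formed and performs no validation.

```php
/**
 * Hypothetical helper (not part of this library): split a byte buffer into a
 * prefix that ends on a character boundary and a tail that holds a possibly
 * incomplete trailing UTF-8 sequence.
 *
 * @param string $buffer raw bytes
 * @return string[] array($complete, $pending)
 */
function splitIncompleteTail($buffer)
{
    $len = strlen($buffer);

    // The lead byte of an unfinished sequence is at most 3 bytes from the end.
    for ($i = 1; $i <= 3 && $i <= $len; $i++) {
        $byte = ord($buffer[$len - $i]);

        if (($byte & 0xC0) === 0x80) {
            continue; // continuation byte (10xxxxxx), keep looking back
        }

        // Derive the expected sequence length from the lead byte.
        if ($byte >= 0xF0) {
            $need = 4;
        } elseif ($byte >= 0xE0) {
            $need = 3;
        } elseif ($byte >= 0xC0) {
            $need = 2;
        } else {
            $need = 1;
        }

        if ($need > $i) {
            // Fewer bytes available than the lead byte announces: hold them back.
            return array(substr($buffer, 0, $len - $i), substr($buffer, $len - $i));
        }

        break;
    }

    return array($buffer, '');
}

// Usage: "ä" (0xC3 0xA4) arrives split across two chunks.
$pending = '';
$out = '';
foreach (array("h\xC3", "\xA4llo") as $chunk) {
    list($complete, $pending) = splitIncompleteTail($pending . $chunk);
    $out .= $complete;
}

var_dump($out === "h\xC3\xA4llo"); // bool(true)
```

The first chunk only yields `h`, because the lone `0xC3` byte is held back until the next chunk completes the sequence.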
Also, if you're merely consuming a stream and you're not in control of producing and
ensuring valid UTF-8 data, it may also contain invalid UTF-8 byte sequences.
This class replaces any invalid bytes in the sequence with a `?`.
This replacement character can be given as a second parameter to the constructor:
```php
$stream = new Sequencer($stdin, 'X');
```
As such, you can be sure you never get an invalid UTF-8 byte sequence out of
the resulting stream.
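
To get a feel for what replacing invalid bytes means, here is a rough, self-contained sketch in plain PHP. It is an assumption-laden illustration, not the code this library uses: `replaceInvalidUtf8()` is a hypothetical function, and details such as how many replacement characters are emitted for a truncated multi-byte sequence may differ from the `Sequencer`. The pattern follows the table of well-formed UTF-8 byte sequences from the Unicode Standard.

```php
/**
 * Hypothetical helper (not part of this library): replace every byte that is
 * not part of a well-formed UTF-8 sequence with the given replacement.
 *
 * @param string $data        raw bytes
 * @param string $replacement replacement for each invalid byte
 * @return string
 */
function replaceInvalidUtf8($data, $replacement = '?')
{
    // Group 1 matches one well-formed UTF-8 sequence,
    // the final "." matches any other single byte.
    $pattern = '/(
        [\x00-\x7F]
      | [\xC2-\xDF][\x80-\xBF]
      | \xE0[\xA0-\xBF][\x80-\xBF]
      | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}
      | \xED[\x80-\x9F][\x80-\xBF]
      | \xF0[\x90-\xBF][\x80-\xBF]{2}
      | [\xF1-\xF3][\x80-\xBF]{3}
      | \xF4[\x80-\x8F][\x80-\xBF]{2}
    ) | ./xs';

    return preg_replace_callback($pattern, function ($m) use ($replacement) {
        // Keep well-formed sequences, substitute everything else.
        return (isset($m[1]) && $m[1] !== '') ? $m[1] : $replacement;
    }, $data);
}

var_dump(replaceInvalidUtf8("a\xFFb") === "a?b");      // bool(true)
var_dump(replaceInvalidUtf8("a\xFFb", 'X') === "aXb"); // bool(true)
```

Valid input such as `"h\xC3\xA4llo"` passes through this sketch unchanged.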
Note that the stream may still contain ASCII control characters or
ANSI / VT100 control byte sequences, as they're valid UTF-8.
This binary data will be left as-is, unless you filter this at a later stage.
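
If you do want to filter such bytes at a later stage, one simple option is sketched below. `stripControlCharacters()` is a hypothetical helper, not something this library provides. It assumes you want to keep tab, line feed and carriage return while dropping all other ASCII control bytes. Working byte-wise is safe here because multi-byte UTF-8 sequences only use bytes `0x80` and above.

```php
/**
 * Hypothetical helper (not part of this library): remove ASCII control
 * characters from a chunk, keeping tab (0x09), LF (0x0A) and CR (0x0D).
 *
 * @param string $chunk valid UTF-8 data, e.g. as emitted by the sequencer
 * @return string
 */
function stripControlCharacters($chunk)
{
    // C0 control bytes except \t, \n, \r, plus DEL (0x7F).
    return preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/', '', $chunk);
}

var_dump(stripControlCharacters("a\x07b\tc\n") === "ab\tc\n"); // bool(true)
```

Note that this only drops the ESC byte of an ANSI / VT100 sequence and leaves the bytes that follow it (such as `[31m`) in place, so it is no substitute for a real terminal control code parser.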
## Install
The recommended way to install this library is [through Composer](https://getcomposer.org).
[New to Composer?](https://getcomposer.org/doc/00-intro.md)
This project follows [SemVer](https://semver.org/).
This will install the latest supported version:
```bash
$ composer require clue/utf8-react:^1.2
```
See also the [CHANGELOG](CHANGELOG.md) for details about version upgrades.
This project aims to run on any platform and thus does not require any PHP
extensions and supports running on legacy PHP 5.3 through current PHP 7+ and
HHVM.
It's *highly recommended to use PHP 7+* for this project.
## Tests
To run the test suite, you first need to clone this repo and then install all
dependencies [through Composer](https://getcomposer.org):
```bash
$ composer install
```
To run the test suite, go to the project root and run:
```bash
$ php vendor/bin/phpunit
```
## License
This project is released under the permissive [MIT license](LICENSE).
> Did you know that I offer custom development services and issue invoices for
  sponsorships of releases and for contributions? Contact me (@clue) for details.
## More
* If you want to learn more about processing streams of data, refer to the documentation of
the underlying [react/stream](https://github.com/reactphp/stream) component.
* If you want to process ASCII control characters or ANSI / VT100 control byte sequences, you may
want to use [clue/reactphp-term](https://github.com/clue/reactphp-term) on the raw input
stream before passing the resulting stream to the UTF-8 sequencer.
* If you want to display or inspect the byte sequences, you may
want to use [clue/hexdump](https://github.com/clue/php-hexdump) on the emitted byte sequences.