Diffstat (limited to '')
-rw-r--r-- | _doc/api.rst | 287 |
1 files changed, 287 insertions, 0 deletions
diff --git a/_doc/api.rst b/_doc/api.rst
new file mode 100644
index 0000000..d64df0d
--- /dev/null
+++ b/_doc/api.rst
@@ -0,0 +1,287 @@
++++++++++++++++++++++++++++
+Departure from previous API
++++++++++++++++++++++++++++
+
+With version 0.15.0 ``ruyaml`` starts to depart from the previous (PyYAML) way
+of loading and dumping. During a transition period the original
+``load()`` and ``dump()`` in their various forms will still be supported,
+but this is not guaranteed to be so with the transition to 1.0.
+
+At the latest with 1.0, but possibly earlier, transition error and
+warning messages will be issued, so any packages depending on
+ruyaml should pin the version with which they are testing.
+
+
+Up to 0.15.0, the loaders (``load()``, ``safe_load()``,
+``round_trip_load()``, ``load_all()``, etc.) took, apart from the input
+stream, a ``version`` argument to allow downgrading to YAML 1.1,
+sometimes needed for
+documents without a directive. When round-tripping, there was an option to
+preserve quotes.
+
+Up to 0.15.0, the dumpers (``dump()``, ``safe_dump()``,
+``round_trip_dump()``, ``dump_all()``, etc.) had a plethora of
+arguments, some inherited from ``PyYAML``, some added in
+``ruyaml``. The only required argument is the ``data`` to be
+dumped. If the stream argument is not provided to the dumper, then a
+string representation is built up in memory and returned to the
+caller.
+
+Starting with 0.15.0 ``load()`` and ``dump()`` are methods on a
+``YAML`` instance and only take the stream,
+resp. the data and stream argument.
+All other parameters are set on the instance of ``YAML`` before calling ``load()`` or ``dump()``.
+
+Before 0.15.0::
+
+  from pathlib import Path
+  import ruyaml
+
+  data = ruyaml.safe_load("abc: 1")
+  out = Path('/tmp/out.yaml')
+  with out.open('w') as fp:
+      ruyaml.safe_dump(data, fp, default_flow_style=False)
+
+after::
+
+  from pathlib import Path
+  from ruyaml import YAML
+
+  yaml = YAML(typ='safe')
+  yaml.default_flow_style = False
+  data = yaml.load("abc: 1")
+  out = Path('/tmp/out.yaml')
+  yaml.dump(data, out)
+
+If you previously used a keyword argument ``explicit_start=True`` you
+now do ``yaml.explicit_start = True`` before calling ``dump()``. The
+``Loader`` and ``Dumper`` keyword arguments are not supported that
+way. You can set the ``typ`` keyword to ``rt`` (default),
+``safe``, ``unsafe`` or ``base`` (for round-trip load/dump, safe_load/dump,
+load/dump, resp. load/dump using the BaseLoader / BaseDumper). Finer control
+is possible by setting the attributes ``.Parser``, ``.Constructor``,
+``.Emitter``, etc., to the class of the type to create for that stage
+(typically a subclass of an existing class implementing that).
+
+The default loader (``typ='rt'``) is a direct derivative of the safe loader, without the
+methods to construct arbitrary Python objects that make the ``unsafe`` loader
+unsafe, but with the changes needed for round-trip preservation of comments,
+etc. For trusted Python classes a constructor can of course be added to the round-trip
+or safe-loader, but this has to be done explicitly (``add_constructor``).
+
+All data is dumped (not just for round-trip-mode) with
+``.allow_unicode = True``.
+
+You can of course have multiple YAML instances active at the same
+time, with different load and/or dump behaviour.
+
+Initially only the typical operations are supported, but in principle
+all functionality of the old interface will be available via
+``YAML`` instances (if you are using something that isn't, let me know).
+
+If a parse or dump fails, and throws an exception, the state of the
+``YAML()`` instance is not guaranteed to be able to handle further
+processing. At that point you should recreate the YAML instance before
+proceeding.
+
+
+Loading
++++++++
+
+Duplicate keys
+^^^^^^^^^^^^^^
+
+In JSON, mapping keys should be unique; in YAML they must be unique.
+PyYAML never enforced this although the YAML 1.1 specification already
+required this.
+
+In the new API (starting with 0.15.1) duplicate keys in mappings are no longer allowed by
+default. To allow duplicate keys in mappings::
+
+  yaml = ruyaml.YAML()
+  yaml.allow_duplicate_keys = True
+  yaml.load(stream)
+
+In the old API this is a warning starting with 0.15.2 and an error in
+0.16.0.
+
+When a duplicate key is found, it and its value are discarded, as should be done
+according to the `YAML 1.1 specification <http://yaml.org/spec/1.1/#id932806>`__.
+
+Dumping a multi-document YAML stream
+++++++++++++++++++++++++++++++++++++
+
+The "normal" ``dump_all`` expected as its first argument a list of documents, or
+something else the internals of the method can iterate over. To read
+and write a multi-document stream you would either make a ``list``::
+
+  yaml = YAML()
+  data = list(yaml.load_all(in_path))
+  # do something on data[0], data[1], etc.
+  yaml.dump_all(data, out_path)
+
+
+or create some function/object that would yield the ``data`` values.
+
+What you now can do is create ``YAML()`` as a context manager. This
+works for output (dumping) only, requires you to specify the output
+(file, buffer, ``Path``) at creation time, and doesn't support
+``transform`` (yet).
+
+::
+
+  with YAML(output=sys.stdout) as yaml:
+      yaml.explicit_start = True
+      for data in yaml.load_all(Path(multi_document_filename)):
+          # do something on data
+          yaml.dump(data)
+
+
+Within the context manager, you cannot use the ``dump()`` with a
+second (stream) argument, nor can you use ``dump_all()``.
+The ``dump()`` within the context of the ``YAML()`` automatically creates
+a multi-document stream if called more than once.
+
+To combine multiple YAML documents from multiple files:
+
+::
+
+  list_of_filenames = ['x.yaml', 'y.yaml', ]
+  with YAML(output=sys.stdout) as yaml:
+      yaml.explicit_start = True
+      for path in list_of_filenames:
+          with open(path) as fp:
+              yaml.dump(yaml.load(fp))
+
+
+The output will be a valid, uniformly indented YAML file. Doing
+``cat {x,y}.yaml`` might result in a single document if there is no
+document start marker at the beginning of ``y.yaml``.
+
+
+
+
+Dumping
++++++++
+
+Controls
+^^^^^^^^
+
+On your ``YAML()`` instance you can set attributes, e.g. with::
+
+  yaml = YAML(typ='safe', pure=True)
+  yaml.allow_unicode = False
+
+Available attributes include:
+
+``unicode_supplementary``
+  Defaults to ``True`` if Python's Unicode size is larger than 2 bytes. Set to ``False`` to
+  enforce output of the form ``\U0001f601`` (ignored if ``allow_unicode`` is ``False``).
+
+Transparent usage of new and old API
+++++++++++++++++++++++++++++++++++++
+
+If you have multiple packages depending on ``ruyaml``, or install
+your utility together with other packages not under your control, then
+fixing your ``install_requires`` might not be so easy.
+
+Depending on your usage you might be able to "version" your usage to
+be compatible with both the old and the new. The following are some
+examples, all assuming ``import ruyaml`` somewhere at the top
+of your file and some ``istream`` and ``ostream`` appropriately opened
+for reading resp. writing.
+
+
+Loading and dumping using the ``SafeLoader``::
+
+  yml = ruyaml.YAML(typ='safe', pure=True)  # 'safe' load and dump
+  data = yml.load(istream)
+  yml.dump(data, ostream)
+
+Loading with the ``CSafeLoader``, dumping with
+``RoundTripDumper``.
+You need two ``YAML`` instances, but each of them can be re-used::
+
+  yml = ruyaml.YAML(typ='safe')
+  data = yml.load(istream)
+  ymlo = ruyaml.YAML()   # or ruyaml.YAML(typ='rt')
+  ymlo.width = 1000
+  ymlo.explicit_start = True
+  ymlo.dump(data, ostream)
+
+Loading and dumping from ``pathlib.Path`` instances using the
+round-trip-loader::
+
+  # in myyaml.py
+  class MyYAML(ruyaml.YAML):
+      def __init__(self):
+          ruyaml.YAML.__init__(self)
+          self.preserve_quotes = True
+          self.indent(mapping=4, sequence=4, offset=2)
+  # in your code
+  from myyaml import MyYAML
+
+  # some pathlib.Path
+  from pathlib import Path
+  inf = Path('/tmp/in.yaml')
+  outf = Path('/tmp/out.yaml')
+
+  yml = MyYAML()
+  # no need for with statement when using pathlib.Path instances
+  data = yml.load(inf)
+  yml.dump(data, outf)
+
++++++++++++++++++++++
+Reason for API change
++++++++++++++++++++++
+
+``ruyaml`` inherited the way of doing things from ``PyYAML``. In
+particular, when calling the function ``load()`` or ``dump()``,
+temporary instances of ``Loader()`` resp. ``Dumper()`` were
+created that were discarded on termination of the function.
+
+This way of doing things leads to several problems:
+
+- it is virtually impossible to return information to the caller apart from the
+  constructed data structure. E.g. if you would get a YAML document
+  version number from a directive, there is no way to let the caller
+  know apart from handing back special data structures. The same
+  problem exists when trying to do on-the-fly
+  analysis of a document for indentation width.
+
+- these instances were composites of the various load/dump steps and
+  if you wanted to enhance one of the steps, you needed to e.g. subclass
+  the emitter and make a new composite (dumper) as well, providing all
+  of the parameters (i.e. copy paste)
+
+  Alternatives, like making a class that returns a ``Dumper`` when
+  called and sets attributes before doing so, are cumbersome for
+  day-to-day use.
+
+- many routines (like ``add_representer()``) have a direct global
+  impact on all of the following calls to ``dump()`` and those are
+  difficult if not impossible to turn back. This forces the need to
+  subclass ``Loaders`` and ``Dumpers``, a long-standing problem in PyYAML
+  as some attributes were not ``deep_copied`` although a bug-report
+  (and fix) had been available for a long time.
+
+- If you want to set an attribute, e.g. to control whether literal
+  block style scalars are allowed to have trailing spaces on a line
+  instead of being dumped as double quoted scalars, you have to change
+  the ``dump()`` family of routines and all of the ``Dumpers()``, as well
+  as make the actual functionality change in ``emitter.Emitter()``. The
+  functionality change itself takes 4 (four!) changed lines in one file; being able
+  to enable it takes another 50+ (non-contiguous) line changes in 3 more files, resulting
+  in a diff that is well over 200 lines long.
+
+- replacing libyaml with something that doesn't support both ``0o52``
+  and ``052`` for the integer ``42`` (instead of ``52`` as per YAML 1.2)
+  is difficult
+
+
+With ``ruyaml>=0.15.0`` the various steps "know" about the
+``YAML`` instance and can pick up settings, as well as report back
+information via that instance. Representers, etc., are added to a
+reusable instance and different YAML instances can co-exist.
+
+This change eases development and helps prevent regressions.
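
The contrast drawn under "Reason for API change", between representers registered globally and representers added to a reusable instance, can be sketched in plain Python. This is only an illustration under stated assumptions: ``GlobalStyleDumper`` and ``InstanceStyleDumper`` are hypothetical toy classes, not part of ``ruyaml`` or ``PyYAML``, and the "dumping" they do is just string formatting.

```python
# Toy illustration only: both classes are hypothetical stand-ins and do not
# reproduce the real ruyaml or PyYAML implementation.


class GlobalStyleDumper:
    """Old style: representers live on the class, so all callers share them."""

    representers = {}

    @classmethod
    def add_representer(cls, data_type, func):
        cls.representers[data_type] = func

    @classmethod
    def dump(cls, value):
        func = cls.representers.get(type(value), repr)
        return func(value)


class InstanceStyleDumper:
    """New style: every instance owns its settings and its representers."""

    def __init__(self):
        self.representers = {}
        self.explicit_start = False

    def add_representer(self, data_type, func):
        self.representers[data_type] = func

    def dump(self, value):
        func = self.representers.get(type(value), repr)
        text = func(value)
        return "--- " + text if self.explicit_start else text


def yes_no(value):
    """Hypothetical representer that writes booleans as yes/no."""
    return "yes" if value else "no"


# A global registration affects every later dump() call in the process.
GlobalStyleDumper.add_representer(bool, yes_no)
leaked = GlobalStyleDumper.dump(True)

# An instance registration stays local to the instance it was made on.
plain = InstanceStyleDumper()
custom = InstanceStyleDumper()
custom.explicit_start = True
custom.add_representer(bool, yes_no)

plain_out = plain.dump(True)
custom_out = custom.dump(True)
```

In this sketch ``plain`` and ``custom`` coexist with different behaviour, while the class-level registration cannot be confined to one caller, which is the kind of global side effect the list above describes.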