The lost art of XML — mmagueta (marcosmagueta.com)
from Kissaki@programming.dev to programming@programming.dev on 24 Jan 09:31
https://programming.dev/post/44564394

There exists a peculiar amnesia in software engineering regarding XML. Mention it in most circles and you will receive knowing smiles, dismissive waves, the sort of patronizing acknowledgment reserved for technologies deemed passé. “Oh, XML,” they say, as if the very syllables carry the weight of obsolescence. “We use JSON now. Much cleaner.”

#programming

Diplomjodler3@lemmy.world on 24 Jan 09:46 next collapse

It’s true, though, that JSON is just better for most applications.

MonkderVierte@lemmy.zip on 24 Jan 11:14 collapse

Except config files. Please don’t do config files in json.

abbadon420@sh.itjust.works on 24 Jan 11:46 next collapse

Yaml

atzanteol@sh.itjust.works on 24 Jan 13:28 next collapse

Fuck yaml. TOML or literally anything else.

Lysergid@lemmy.ml on 25 Jan 10:07 collapse

Yaml is a dogshit format. If you need a tree-like structure, use json; if you need a list of props, use toml or simple key-value pairs. I fucking hate app properties in yaml.

  • can’t search shit
  • copy-paste doesn’t “just work” when you want to merge two files
  • your editor doesn’t show whitespaces and you messed up somewhere - valid but incorrect
  • messed up formatting your list of banned IPs/hosts/ports/users/subnets/commands - get pwned

It should’ve never left the basement of the crackhead who thought “let’s make a schema-less format depend on the number of invisible characters”.

I’d rather save my data in a Copybook and record it on tape than use this Python-bastardized abomination

abbadon420@sh.itjust.works on 25 Jan 10:41 collapse

Oh nice! I didn’t know toml, so I looked it up a bit. At first I was like “this is just .properties with a better typing support”. Then I saw the tables and the inline tables, which is a really neat way for complex nesting. It reminds me of json, but better. I’ll see if I can start using this somewhere.

bookmeat@lemmynsfw.com on 25 Jan 14:17 collapse

Meanwhile, XML crying in the corner… 😄

Diplomjodler3@lemmy.world on 24 Jan 11:59 next collapse

Why not? It works great in Python.

MonkderVierte@lemmy.zip on 24 Jan 12:00 collapse

Not on the human parser side.

Tanoh@lemmy.world on 24 Jan 12:07 next collapse

And no comments, unless you use a non-standard parser. But then you might as well use another format.

Diplomjodler3@lemmy.world on 24 Jan 12:35 collapse
{"comment": "Who says you can't do comments in JSON?"}
abbadon420@sh.itjust.works on 24 Jan 13:33 next collapse

Lol. That works, but it’s hacky.

A “comment” is an integrated language feature for writing something that is not parsed by that language. This is just regular JSON.

bleistift2@sopuli.xyz on 24 Jan 15:59 next collapse

This only works if the software that consumes the JSON doesn’t validate it or ignores keys it doesn’t recognize (which is bad, IMHO).

tyler@programming.dev on 24 Jan 23:50 collapse

Now do a second comment.

Diplomjodler3@lemmy.world on 25 Jan 00:04 collapse
{"comment2": "I can do this all day."}
tyler@programming.dev on 25 Jan 17:57 collapse

Now put a newline in your comment, to make it readable. Clearly you can see the problem here right? “comment2” isn’t a comment. It’s a key with a value. Numbering them doesn’t actually fix anything, in fact it makes it much much harder to maintain.

atzanteol@sh.itjust.works on 24 Jan 13:29 collapse

JSON is super easy to read and write though. Just needs a parser that allows comments…

tyler@programming.dev on 24 Jan 23:51 collapse

That’s JSON5 or JSONC

atzanteol@sh.itjust.works on 25 Jan 00:26 collapse

Yes, which needs to be supported by your parser.

neukenindekeuken@sh.itjust.works on 24 Jan 12:55 collapse

Json configs read much cleaner to me since .net swapped to them a while back.

Xml is incredibly verbose when there’s a 12k loc web.config.xml

MonkderVierte@lemmy.zip on 24 Jan 12:59 collapse

Then do a cfg- or ini-style config, or make multiple config files. YAML/TOML if you can’t make it simpler. The necessity for complex config formats is a fuckup of the dev.

neukenindekeuken@sh.itjust.works on 24 Jan 20:32 collapse

Or you work in an environment that’s still using Full Framework and ASP.NET Webforms.

These places exist, and they are unfortunately not rare.

MonkderVierte@lemmy.zip on 24 Jan 20:36 collapse

My condolences.

neukenindekeuken@sh.itjust.works on 25 Jan 14:04 collapse

Heh, thank you. It’s usually not so bad, but figuring out all the assembly redirects needed is always a nightmare job.

Can’t wait until this is on .net 8+ and we can use clean configs.

lehenry@lemmy.world on 24 Jan 09:46 next collapse

While I understand the criticism of XPath and XSL, the fact that we have proper tools to query and transform XML, instead of the messy way of getting specific information from JSON, is also one of the strong points of XML.

deadbeef79000@lemmy.nz on 24 Jan 10:33 next collapse

XSLT and XPath are entirely underrated. They are seriously powerful tools.

While you can approximate XSLT with a heap of coffee and a JSON parser it’s harder to keep it declarative.
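For anyone who hasn’t tried XPath: even Python’s standard-library ElementTree supports a useful subset of it out of the box (a sketch, not a full XPath 1.0 implementation):

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<people>"
    "<person><name>Alice</name><age>30</age></person>"
    "<person><name>Bob</name><age>25</age></person>"
    "</people>"
)

# Path expressions select nodes declaratively, no loops needed.
names = [el.text for el in doc.findall("./person/name")]
print(names)  # ['Alice', 'Bob']

# Predicates work too: select the age of the person named Alice.
age = doc.find("./person[name='Alice']/age")
print(age.text)  # 30
```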

Kissaki@programming.dev on 24 Jan 10:43 next collapse

Yeah, I wish I had something like XPath as consistently (in terms of availability and syntax) for JSON.

SlurpingPus@lemmy.world on 24 Jan 21:29 next collapse

Has no one here heard of jq?

tyler@programming.dev on 25 Jan 00:21 collapse

You do? JSONata and JSONPath both exist and are very good.

Ephera@lemmy.ml on 24 Jan 11:21 collapse

There is JSONPath, at least: en.wikipedia.org/wiki/JSONPath

Auster@thebrainbin.org on 24 Jan 09:53 next collapse

Skimming through the post, the code snippet about halfway through piqued my attention. It’s been a while since I studied site development, but that snippet looks awfully like HTML. Are HTML and XML related?

A_norny_mousse@feddit.org on 24 Jan 10:14 next collapse

Yes. Arguably, HTML is a form of XML. Also, the ML means the same in both. XML tools can often also be used to query HTML documents.

Auster@thebrainbin.org on 24 Jan 10:16 collapse

Ooooh~

Thanks for the explanation!

Kissaki@programming.dev on 24 Jan 10:16 next collapse

There was a time when HTML moved towards a more formalized, XML-valid definition named XHTML. Ultimately, web/browser backwards compatibility and the web’s messy, forgiving nature led to us giving up on that; now we have the HTML living standard with rules, but browsers (not sure to what degree it’s standardized or not) are very forgiving in their interpretation.

While HTML, prior to HTML5, was defined as an application of Standard Generalized Markup Language (SGML), a flexible markup language framework, XHTML is an application of XML, a more restrictive subset of SGML. XHTML documents are well-formed and may therefore be parsed using standard XML parsers, unlike HTML, which requires a lenient, HTML-specific parser.[1]

XHTML 1.0 became a World Wide Web Consortium (W3C) recommendation on 26 January 2000. XHTML 1.1 became a W3C recommendation on 31 May 2001. XHTML is now referred to as “the XML syntax for HTML”[2][3] and being developed as an XML adaptation of the HTML living standard.[4][5]

MonkderVierte@lemmy.zip on 24 Jan 11:13 collapse

But nobody uses it anymore; everyone uses a JS framework on a <div> page instead. Which only the three billion-dollar engines in the world can render.

atzanteol@sh.itjust.works on 24 Jan 13:34 collapse

They’re siblings. They both derive from SGML. There is a version of HTML that is also XML conformant called XHTML but it never caught on…

epyon22@sh.itjust.works on 24 Jan 09:59 next collapse

The fact that JSON serializes easily to basic data structures simplifies code so much. Most use cases don’t need fully semantic data storage; much of the time you have to write the same amount of documentation about the data structures anyway. I’ll give XML one thing though: schemas are nice and easy, whereas they have a high barrier to entry in JSON.

Kissaki@programming.dev on 24 Jan 10:13 collapse

Most use cases don’t need fully semantic data storage

If both sides have a shared data model it’s a good base model without further needs. Anything else quickly becomes complicated because of the dynamic nature of JSON - at least if you want a robust or well-documented solution.

sukhmel@programming.dev on 24 Jan 15:44 next collapse

Yeah, when the same API endpoint sometimes returns a string for an error, sometimes an object, and sometimes an array, JSON doesn’t help much in parsing the mess

SlurpingPus@lemmy.world on 24 Jan 21:18 collapse

If both sides have a shared data model

If the sides don’t have a common understanding of the data structure, no format under the sun will help.

Kissaki@programming.dev on 25 Jan 00:40 collapse

The point is that there are degrees to readability, specificity, and obviousness, even without a common understanding. Self-describing data, much like self-describing code, is different from a dense serialization without much support in that regard.

A_norny_mousse@feddit.org on 24 Jan 10:32 next collapse

I never understood why people would say JSON is superior, and why XML seemed to be getting rarer, but the author explains it:

XML was not abandoned because it was inadequate; it was abandoned because JavaScript won.

I’ve been using it ever since I started using Linux because my favorite window manager uses it, and because of a long-running pet project that is almost just as old: first I used XML tools to parse web pages, later I switched to dedicated data providers that offered both XML and JSON formats, and stuck to what I knew.

I’m guessing that another reason devs - especially web devs - prefer JSON over XML is that the latter uses more bytes to transport the same amount of raw data. One XML file will be somewhat larger than one JSON file with the same content. That advantage is of course dwarfed by all the other media and helper scripts - nay, frameworks - devs use to develop websites.

BTW, XML is very readable with syntax highlighting and easily editable if your code editor has some very basic completion for it. And it has comments!

Kissaki@programming.dev on 24 Jan 11:02 next collapse

The readability and obviousness of XML cannot be overstated. JSON is simple and dense (within the limits of text). But look at JSON alone, and all you can do is hope for well-named fields. Beyond that, you depend on context knowledge and on specific structure and naming conventions.

Whenever I start editing JSON config files I have to be careful about trailing commas, matching opening and closing braces and brackets, and placement and field naming. The best you can do is offer a default-filled config file that already has the full structure.

While XML does not solve all of it, it certainly is more descriptive and more structured, easing many of those pain points.


It’s interesting that web tech had XML in the early stages of AJAX, the dynamic web. But in the end, we sent JSON through XMLHttpRequest. JSON won.

tyler@programming.dev on 25 Jan 00:24 collapse

You are clearly one of those people that never had to deal with xml in a production system. Even with proper syntax highlighting, dealing with xml is a nightmare, whether it’s for configuration or data transmission. People switched to JSON because it’s better. Period. And that’s an incredibly low bar to set, because I don’t think JSON is that good either.

Like another person said, all of these features of XML don’t make it nicer; they make it worse, because you have to be ready for any of those features even if they’re never used.

Feyd@programming.dev on 25 Jan 02:57 collapse

There are really good uses for XML. Mostly for making things similar to HTML. Like markup for Android UIs or XAML for WPF. For pretty much everything else the complexity only brings headaches

Ephera@lemmy.ml on 24 Jan 11:44 next collapse

IMHO one of the fundamental problems with XML for data serialization is illustrated in the article:

(person (name "Alice") (age 30))
[is serialized as]

<person>
  <name>Alice</name>
  <age>30</age>
</person>

Or with attributes:
<person name="Alice" age="30" />

The same data can be portrayed in two different ways. Whenever you serialize or deserialize data, you need to decide whether to read/write values from/to child nodes or attributes.

That’s because XML is a markup language. It’s great for typing up documents, e.g. to describe a user interface. It was not designed for taking programmatic data and serializing that out.
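The split is visible in code: with Python’s standard-library ElementTree, the two shapes above require different access paths, so the deserializer has to know which one to expect (a minimal sketch):

```python
import xml.etree.ElementTree as ET

as_children = ET.fromstring("<person><name>Alice</name><age>30</age></person>")
as_attrs = ET.fromstring('<person name="Alice" age="30" />')

# Same data, two different access paths: child-node lookup vs. attribute lookup.
print(as_children.find("name").text)  # Alice
print(as_attrs.get("name"))           # Alice

# Using the wrong access path for the shape yields nothing, not an error.
print(as_attrs.find("name"))          # None
```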

Feyd@programming.dev on 24 Jan 12:06 next collapse

JSON also has arrays. In XML the practice to approximate arrays is to put the index as an attribute. It’s incredibly gross.

Kissaki@programming.dev on 24 Jan 13:22 collapse

In XML the practice to approximate arrays is to put the index as an attribute. It’s incredibly gross.

I don’t think I’ve seen that much if ever.

Typically, XML repeats tag names. Repeating keys are not possible in JSON, but are possible in XML.

<items>
  <item></item>
  <item></item>
  <item></item>
</items>
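As a side note, deserializing that repeated-tag shape into a list is straightforward with Python’s standard library, which returns matches in document order (a minimal sketch):

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("<items><item>a</item><item>b</item><item>c</item></items>")

# Repeated child tags map naturally onto a list, in document order.
items = [el.text for el in doc.findall("item")]
print(items)  # ['a', 'b', 'c']
```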
Feyd@programming.dev on 24 Jan 13:39 collapse

That’s correct, but the order of tags in XML is not meaningful, and if you parse and then write that, the order can change while still conforming to the spec. Hence, what you put would be something like the following if it were intended to represent an array.

<items>
  <item index="1"></item>
  <item index="2"></item>
  <item index="3"></item>
</items>
Kissaki@programming.dev on 25 Jan 01:03 collapse

www.w3.org/TR/2004/REC-xml-infoset-20040204/

[children] An ordered list of child information items, in document order.

Does this not cover it?

Do you mean if you were to follow XML standard but not XML information set standard?

Feyd@programming.dev on 25 Jan 01:39 collapse

Information set isn’t a description of XML documents, but a description of what you have that you can write to XML, or what you’d get when you parse XML.

This is the key part from the document you linked

The information set of an XML document is defined to be the one obtained by parsing it according to the rules of the specification whose version corresponds to that of the document.

This is also a great example of the complexity of the XML specifications. Most people do not fully understand them, which is a negative aspect for a tool.

As an aside, you can have an enforced order in XML, but you have to also use XSD so you can specify xsd:sequence, which adds complexity and precludes ordered arrays in arbitrary documents.

Kissaki@programming.dev on 25 Jan 11:10 collapse

If the XML parser parses into an ordered representation (the XML information set), isn’t it then the deserializer’s choice how they map that to the programming language/type system they are deserializing to? So in a system with ordered arrays it would likely map to those?

If XML can be written in an ordered way, and the parsed XML information set has ordered children for those, I still don’t see where order gets lost or is impossible [to guarantee] in XML.

Feyd@programming.dev on 25 Jan 11:39 collapse

You are correct that it is the deserializer’s choice. You are incorrect when you imply that it is a good idea to rely on behavior that isn’t enforced in the spec. A lot of people have been surprised when that assumption turns out to be wrong.

aivoton@sopuli.xyz on 24 Jan 12:13 next collapse

The same data can be portrayed in two different ways.

And why is that an issue? The specification decides which one you use, based on what you need. Some things you model as attributes, and some as child elements.

JSON doesn’t even have attributes.

Ephera@lemmy.ml on 24 Jan 13:02 collapse

Alright, I haven’t really looked into XML specifications so far. But I also have to say that needing a specification to consistently serialize and deserialize data isn’t great either.

And yes, JSON not having attributes is what I’m saying is a good thing, at least for most data serialization use-cases, since programming languages do not typically have such attributes on their data type fields either.

aivoton@sopuli.xyz on 24 Jan 15:15 collapse

I worded my answer a bit wrongly.

In XML <person><name>Alice</name><age>30</age></person> is different from <person name="Alice" age="30" /> and they will never (de)serialize to each other. The original example by the article’s author with the person is somewhat misguided.

They do contain the same bits of data, but represent different things and when designing your dtd / xsd you have to decide when to use attributes and when to use child elements.

Ephera@lemmy.ml on 24 Jan 18:28 collapse

Ah, well, as far as XML is concerned, yeah, these are very different things, but that’s where the problem stems from. In your programming language, you don’t have two variants. You just have (person (name “Alice”) (age 30)). But then, because XML makes a difference between metadata and data, you have to decide whether “name” and “age” are one or the other.

And the point I wanted to make, which perhaps didn’t come across as well, is that you have to write down that decision somewhere, so that when you deserialize in the future, you know whether to read these fields from attributes or from child nodes.
And that just makes your XML serialization code so much more complex than it is for JSON, generally speaking. As in, I can slap down JSON serialization in 2 lines of code and it generally does what I expect, in Rust in this case.

Granted, Rust kind of lends itself to being serialized as JSON, but well, I’m just not aware of languages that lend themselves to being serialized as XML. The language with the best XML support that I’m aware of, is Scala, where you can actually get XML literals into the language (these days with a library, but it used to be built-in until Scala 3, I believe): javadoc.io/doc/…/index.html
But even in Scala, you don’t use a case class for XML, which is what you normally use for data records in the language, but rather you would take the values out of your case class and stick them into such an XML literal. Or I guess, you would use e.g. the Jackson XML serializer from Java. And yeah, the attribute vs. child node divide is the main reason why this intermediate step is necessary. Meanwhile, JSON has comparatively little logic built into the language/libraries and it’s still a lot easier to write out: docs.scala-lang.org/toolkit/json-serialize.html
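For comparison, the same near-zero-ceremony serialization in Python’s standard library (the `Person` record is a hypothetical example, not from the thread):

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class Person:
    name: str
    age: int

# Serializing: no attribute-vs-child decision to make, the record maps
# directly onto the JSON object model.
print(json.dumps(asdict(Person("Alice", 30))))  # {"name": "Alice", "age": 30}

# Deserializing back is equally direct.
p = Person(**json.loads('{"name": "Alice", "age": 30}'))
```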

Kissaki@programming.dev on 24 Jan 13:20 next collapse

They can be used as alternatives. In MSBuild you can use attributes and sub-elements interchangeably, which, if you’re writing it, gives you a choice of preference. I typically prefer attributes for conciseness (vertical density), but switch to sub-elements once the length/number becomes a (significant) downside.

Of course that’s more of a human writing view. Your point about ambiguity in de-/serialization still stands at least until the interface defines expectation or behavior as a general mechanism one way or the other, or with specific schema.

atzanteol@sh.itjust.works on 24 Jan 13:26 next collapse

This is your confusion, not an issue with XML.

Attributes tend to be “metadata”. You ever write HTML? It’s not confusing.

Feyd@programming.dev on 24 Jan 13:45 next collapse

In HTML, which things are attributes and which things are tags are part of the spec. With XML that is being used for something arbitrary, someone is making the choice every time. They might have a different opinion than you do, or even the same opinion, but make different judgments on occasion. In JSON, there are fewer choices, so fewer chances for people to be surprised by other people’s choices.

atzanteol@sh.itjust.works on 24 Jan 16:08 collapse

I mean, yeah. But people don’t just do things randomly. Most people put data in the body and metadata in attributes just like html.

Ephera@lemmy.ml on 24 Jan 18:41 collapse

Having to make a decision isn’t my primary issue here (even though it can also be problematic, when you need to serialize domain-specific data for which you’re no expert). My issue is rather in that you have to write this decision down, so that it can be used for deserializing again. This just makes XML serialization code significantly more complex than JSON serialization code. Both in terms of the code becoming harder to understand, but also just lines of code needed.
I’ve somewhat come to expect less than a handful of lines of code for serializing an object from memory into a file. If you do that with XML, it will just slap everything into child nodes, which may be fine, but might also not be.

atzanteol@sh.itjust.works on 25 Jan 00:31 collapse

Having to make a decision isn’t my primary issue here (even though it can also be problematic, when you need to serialize domain-specific data for which you’re no expert). My issue is rather in that you have to write this decision down, so that it can be used for deserializing again. This just makes XML serialization code significantly more complex than JSON serialization code. Both in terms of the code becoming harder to understand, but also just lines of code needed.

This is, without a doubt, the stupidest argument against XML I’ve ever heard. Nobody has trouble with using attributes vs. tag bodies. Nobody. There are much more credible complaints to be made about parsing performance, memory overhead, extra size, complexity when using things like namespaces, etc.

I’ve somewhat come to expect less than a handful of lines of code for serializing an object from memory into a file. If you do that with XML, it will just slap everything into child nodes, which may be fine, but might also not be.

No - it is fine to just use tag bodies. You don’t need to ever use attributes if you don’t want to. You’ve never actually used XML have you?

baeldung.com/jackson-xml-serialization-and-deseri…

Ephera@lemmy.ml on 25 Jan 04:35 collapse

Okay, dude, glad to have talked.

faint_marble_noise@programming.dev on 26 Jan 17:27 collapse

XML is not great for user interfaces at all.

Ephera@lemmy.ml on 26 Jan 18:14 collapse

Eh, I don’t think it’s the be-all and end-all of describing user interfaces, but it deals well with the deep nesting that UIs generally have, and the attributes allow throwing in metadata for certain elements, which is also something you frequently need in UIs.

At the very least, JSON, YAML, INI and TOML would be a lot worse.

faint_marble_noise@programming.dev on 26 Jan 18:20 collapse

Well, from my experience, working with Android XML GUIs is soul-crushing, while QML is much more pleasant; it’s kinda like JSON, but not quite.

Ephera@lemmy.ml on 27 Jan 01:34 collapse

Yeah, fair enough. I was thinking in terms of the more general-purpose text formats. I have heard good things about QML, too…

Feyd@programming.dev on 24 Jan 12:12 next collapse

Honestly, anyone pining for all the features of XML probably didn’t live through the time when XML was used for everything. It was actually a fucking nightmare to account for the existence of all those features because the fact they existed meant someone could use them and feed them into your system. They were also the source of a lot of security flaws.

This article looks like it was written by someone who wasn’t there, and they’re calling the people telling them the truth liars because they think features they found on W3Schools look cool.

arjen@piefed.social on 24 Jan 12:46 next collapse

Preaching the choir I like to sing in.

I didn’t know the link to S-Expressions, ty.

calliope@retrolemmy.com on 24 Jan 13:23 next collapse

There exists a peculiar amnesia in software engineering regarding XML

That’s for sure. But not in the way the author means.

There exists a pattern in software development where people who weren’t around when the debate was actually happening write another theory-based article rehashing old debates like they’re saying something new. Every ten years or so!

The amnesia is coming from inside the article.

[XML] was abandoned because JavaScript won. The browser won.

This comes across as remarkably naive to me. JavaScript and the browser didn’t “win” in this case.

JSON is just vastly simpler to read and reason about for every purpose other than configuration files that are being parsed by someone else. Yaml is even more human-readable and easier to parse for most configuration uses… which is why people writing the configuration parser would rather use it than XML.

Libraries to parse XML were/are extremely complex, by definition. Schemas work great as long as you’re not constantly changing them! Which, unfortunately, happens a lot in projects that are earlier in development.

Switching to JSON for data reduced frustration during development by a massive amount. Since most development isn’t building on defined schemas, the supposed massive benefits of XML were nonexistent in practice.

Even for configuration, the amount of “boilerplate” in XML is atrocious and there are (slightly) better things to use. Everyone used XML for configuration for Java twenty years ago, which was one of the popular backend languages (this author foolishly complains about Java too). I still dread the massive XML configuration files of past Java. Yaml is confusing in other ways, but XML is awful to work on and parse with any regularity.

I used XML extensively back when everyone writing asynchronous web requests was debating between using the two (in “AJAX”, the X stands for XML).

Once people started using JSON for data, they never went back to XML.

Syntax highlighting only works in your editor, and even then it doesn’t help that much if you have a lot of data (like configuration files for large applications). Browsers could even display JSON with syntax highlighting, for obvious reasons — JSON is vastly simpler and easier to parse.

Kissaki@programming.dev on 24 Jan 13:31 next collapse

Making XML schemas work was often a hassle. You have a schema ID, and sometimes you can open or load the schema through that URL. Other times, it serves only as an identifier and your tooling/IDE must support ID to local xsd file mappings that you configure.

Every time it didn’t immediately work, you’d think: Man, why don’t they publish the schema under that public URL.

calliope@retrolemmy.com on 24 Jan 13:43 collapse

This seriously sounds like a nightmare.

It’s giving me Eclipse IDE flashbacks where it seemed so complicated to configure I just hoped it didn’t break. There were a lot of those, actually.

tyler@programming.dev on 24 Jan 23:49 collapse

God, fucking camel and hibernate xml were the worst. And I was working with that not even 15 years ago!

TunaLobster@lemmy.world on 24 Jan 14:03 next collapse

IMO, the best thing about YAML is the referencing. It’s super easy to reuse an object multiple times. It gives the same kind of parent-child structure ability that programming languages have. Sure, XML can do it, but it’s not in every parser. cough Python built-in parser cough But then YAML is also not a built-in parser, and doing DOM in things other than XML feels odd.
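The referencing being described is YAML’s anchor/alias mechanism; a minimal illustration (note the merge key `<<` is widely supported by parsers but is technically a YAML 1.1 feature, not core YAML 1.2):

```yaml
defaults: &defaults        # define an anchor
  timeout: 30
  retries: 3

service_a:
  <<: *defaults            # merge the anchored mapping, then override
  host: a.example.com

service_b:
  <<: *defaults
  host: b.example.com
```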

Feyd@programming.dev on 24 Jan 16:35 collapse

That capability is what enables billion laughs attacks, unfortunately, so not having it enabled where external input is possible is wise
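For illustration, the YAML flavor of the attack chains aliases so each level multiplies the expansion by roughly nine (a truncated sketch; a real payload continues for several more levels):

```yaml
a: &a ["lol", "lol", "lol", "lol", "lol", "lol", "lol", "lol", "lol"]
b: &b [*a, *a, *a, *a, *a, *a, *a, *a, *a]
c: &c [*b, *b, *b, *b, *b, *b, *b, *b, *b]
# ...a few more levels and a naive parser tries to materialize
# billions of strings from a few hundred bytes of input.
```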

thingsiplay@lemmy.ml on 24 Jan 15:28 next collapse

JSON is easier to parse, smaller, and lighter on resources, and that matters on the web. And if you take into account all the features XML has, plus the entities, it gets big, slow, and complicated. Most data does not need to be a self-descriptive document when transferred over the web. Fundamentally, these are two different kinds of languages: XML is a general markup language for writing documents, while JSON is a generalized data structure with support for various data types found in programming languages.

Kissaki@programming.dev on 25 Jan 00:48 collapse

while JSON is a generalized data structure with support for various data types supported by programming languages

Honestly, I find it surprising that you say “support for various data types supported by programming languages”. Data types are particularly weak in JSON when you go beyond JavaScript. Only number for numbers, no integer types, no date, no time, etc.

Regarding use, I see, at least to some degree, JSON outside of use for network transfer. For example, used for configuration files.

AnitaAmandaHuginskis@lemmy.world on 24 Jan 16:14 next collapse

I love XML, when it is properly utilized. Which, in most cases, it is not, unfortunately.

JSON > CSV though, I fucking hate CSV. I do not get the appeal. “It’s easy to handle” – NO, it is not. It’s the “fuck whoever needs to handle this” of file “formats”.

JSON is a reasonable middle ground, I’ll give you that

thingsiplay@lemmy.ml on 24 Jan 19:57 next collapse

Biggest problem is, CSV is not a standardized format like JSON. For very simple cases it could be used as a database-like format, but it depends on the parser, and that’s not ideal.

flying_sheep@lemmy.ml on 25 Jan 09:04 collapse

Exactly. I’ve seen so much data destroyed silently deep in some bioinformatics pipeline due to this that I’ve just become an anti CSV advocate.

Use literally anything else that doesn’t need out of band “I’m using this dialect” information that has to match to prevent data loss.
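The dialect problem is easy to demonstrate with Python’s standard library: the same bytes parse cleanly, but differently, under two dialect settings, and no error is ever raised.

```python
import csv
import io

# A file written with a semicolon delimiter (common in European locales).
raw = 'name;note\nalice;"hello, world"\n'

# Parsed with the default dialect (comma delimiter), the row is silently
# mangled: the semicolon and stray quotes end up inside the fields.
wrong = list(csv.reader(io.StringIO(raw)))
print(wrong[1])  # a mangled row, not ['alice', 'hello, world']

# Parsed with the delimiter the writer actually used, we get the intended row.
right = list(csv.reader(io.StringIO(raw), delimiter=";"))
print(right[1])  # ['alice', 'hello, world']
```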

unique_hemp@discuss.tchncs.de on 24 Jan 21:41 collapse

CSV >>> JSON when dealing with large tabular data:

  1. Can be parsed row by row
  2. Does not repeat column names (JSON does, which also makes it more complicated, and so slower, to parse)

1 can be solved with JSONL, but 2 is unavoidable.
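The JSONL workaround for point 1, sketched with Python’s standard library: each line is an independent JSON document, so rows can be processed one at a time without loading the whole file.

```python
import json

# In practice this would be a file read line by line; a string stands in here.
jsonl = '{"id": 1, "name": "bob"}\n{"id": 2, "name": "alice"}\n'

for line in jsonl.splitlines():
    row = json.loads(line)  # parse one row at a time
    print(row["name"])
# bob
# alice
```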

abruptly8951@lemmy.world on 25 Jan 02:26 next collapse

Yes…but compression

And with CSV you just gotta pray that your parser parses the same as their writer… and that their writer was correctly implemented… and that they set the settings correctly

unique_hemp@discuss.tchncs.de on 25 Jan 07:42 collapse

Compression adds another layer of complexity for parsing.

JSON can also have configuration mismatch problems. Main one that comes to mind is case (in)sensitivity for keys.

abruptly8951@lemmy.world on 25 Jan 08:20 collapse

Nahh, you’re nitpicking there; large CSVs are gonna be compressed anyway

In practice I’ve never met a JSON I can’t parse; every second CSV is unparseable

flying_sheep@lemmy.ml on 25 Jan 09:02 next collapse

No:

  • CSV isn’t good for anything unless you exactly specify the dialect. CSV is unstandardized, so you can’t parse arbitrary CSV files correctly.
  • you don’t have to serialize tables to JSON in the “list of named records” format

Just use Zarr or the like for array data. A table with more than 200 rows isn’t “human readable” anyway.

entwine@programming.dev on 25 Jan 13:24 collapse
{
    "columns": ["id", "name", "age"],
    "rows": [
        [1, "bob", 44], [2, "alice", 7], ...
    ]
}

There ya go, problem solved without the unparseable ambiguity of CSV

Please stop using CSV.

unique_hemp@discuss.tchncs.de on 25 Jan 16:14 collapse

Great, now read it row by row without keeping it all in memory.

entwine@programming.dev on 25 Jan 17:59 collapse

Wdym? That’s a parser implementation detail. Even if the parser you’re using needs to load the whole file into memory, it’s trivial to write your own parser that reads those entries one row at a time. You could even add random access if you get creative.

That’s one of the benefits of JSON: it is dead simple to parse.

Colloidal@programming.dev on 24 Jan 18:23 next collapse

ASN.1 crying in the corner.

erebion@news.erebion.eu on 24 Jan 21:29 next collapse

XMPP shows pretty well that XML can do things that cannot be done easily without it. XMPP wouldn’t work nearly as well with JSON. Namespaces are a super power.
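A simplified stanza illustrates the point (an abbreviated example; the chat-state extension shown is XEP-0085): extensions live in their own namespaces alongside the core schema, so arbitrary payloads can be added without collisions.

```xml
<message xmlns="jabber:client" to="alice@example.com" type="chat">
  <body>Hi!</body>
  <!-- An extension rides along in its own namespace,
       without colliding with the core schema. -->
  <active xmlns="http://jabber.org/protocol/chatstates"/>
</message>
```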

phoenixz@lemmy.ca on 25 Jan 02:01 next collapse

I’m sure XML has its uses

I’m also sure that for 99% of the applications out there, XML is overkill and over complicated, making things slower and more error prone

Use JSON and you’ll be fine. If you really, really need XML, then you probably already know why.

pinball_wizard@lemmy.zip on 25 Jan 06:06 next collapse

When you receive an XML document, you can verify its structure before you ever parse its content. This is not a luxury. This is basic engineering hygiene.

This is actually why my colleagues and I helped kill off XML.

XML APIs require extensive expertise to upgrade asynchronously (and this expertise is vanishingly rare). More typically, all XML endpoints must be upgraded during the same unscheduled downtime.

JSON allows unexpected fields to be added and ignored until each participant can be upgraded, separately and asynchronously. It makes a massive difference in the resilience of the overall system.

I really really liked XML when I first adopted it, because before that I was flinging binary data across the web, which was utterly awful.

But XML for the web is exactly where it belongs - buried and forgotten.

Also, it is worth noting that JSON can be validated to satisfy that engineering impulse. The serialize/deserialize step will catch basic flaws, and then the validator simply has to be designed to know which JSON fields it should actually care about. This gets much more resilient results than XML's brittle all-in-one schema specification system, which immediately becomes stale and isn't actually correct for every endpoint anyway.

The shared single schema typically described every requirement of every endpoint, not any single endpoint’s actual needs. This resulted in needless brittleness, and is one reason we had such a strong push for “microservices”. Microservices could each justify their own schema, and so be a bit less brittle.

That said, I would love a good standard declarative configuration JSON validator, as long as it supported custom configs at each endpoint.
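To make the “validate only what you care about” idea concrete, here is a hypothetical minimal per-endpoint validator (plain stdlib, not a real library): each endpoint declares just the fields it needs, and unknown fields pass through so peers can upgrade asynchronously.

```python
def validate(obj, spec):
    """Check only the fields this endpoint cares about; ignore the rest.
    `spec` maps field name -> expected Python type.
    (A hypothetical sketch, not a standard validator.)"""
    missing = [k for k in spec if k not in obj]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    wrong = [k for k, t in spec.items() if not isinstance(obj[k], t)]
    if wrong:
        raise ValueError(f"wrong types: {wrong}")
    return obj

# A v2 sender adds a field; this v1 endpoint validates and just ignores it.
msg = {"id": 1, "name": "bob", "added_in_v2": True}
validate(msg, {"id": int, "name": str})  # passes; extra field ignored
```

JSON Schema gives you the same idea declaratively: by default it, too, allows properties the schema doesn’t mention.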

asret@lemmy.zip on 25 Jan 21:09 collapse

I’m not sure I follow the all-in-one schema issue? Won’t each endpoint have its own schema for its response? And if you’re updating things asynchronously then doesn’t versioning each endpoint effectively solve all the problems? That way you have all the resilience of the xml validation along with the flexibility of supplying older objects until each participant is updated.

pinball_wizard@lemmy.zip on 25 Jan 21:23 collapse

Won’t each endpoint have its own schema for its response?

They should, but often didn’t. Today’s IT folks consider microservices the reasonable default. But the logic back when XML was popular tended to be “XML APIs are very expensive to maintain. Let us save time and only maintain one.”

And if you’re updating things asynchronously then doesn’t versioning each endpoint effectively solve all the problems?

XML schema validation meant that if anything changed on any endpoint covered by the schema, all messages would start failing. This was completely preventable, but only by an expert in the XML specification - and there were very few such experts. It was much more common to shut everything down, upgrade everything, and hope it all came back online.

But yes, splitting the endpoint into separate schema files solved many of the issues. It just did so too late to make much difference in the hatred for it.

And really, the remaining issues with the XML stack - dependency hell and huge security holes, both due to its sprawling, largely useless feature set, plus poor documentation - were still enough to put the last nail in its coffin.

entwine@programming.dev on 25 Jan 13:00 next collapse

I agree with everything this article said. A lot of software would work better if devs took the time to learn and appreciate XML. Many times I’ve found myself reinventing shit XML gives you for free.

…But at the same time, if I’m working on a developer-facing product of any kind, I know that choosing XML over JSON is going to turn a lot of people away.

schnurrito@discuss.tchncs.de on 25 Jan 13:32 collapse

XML is best suited for storing documents, JSON for transmitting application data over networks.

SVG is an example of an excellent use of XML, but that doesn’t mean we should use XML for transmitting data from a backend to a frontend.