mjw_byrne
alexvoda
Completely agree with all points.
I wonder if a solution similar to IP addresses and netmasks would be better: a mask format for dates.
Archelaos
From the article:
> Example 1 '156X-12-25'
> December 25 sometime during the 1560s
Such a historical use case comes with a huge caveat. ISO 8601 is based on the proleptic Gregorian calendar (which extends the Gregorian calendar backwards in time before its introduction), and it also includes a year zero (which is not necessarily included in every proleptic Gregorian calendar). If we use such proleptic dates, we need to take into account that they differ from the Julian calendar dates typically used for this time period. '156X-12-25' does not denote the first day of Christmas in the 1560s, but 15 December in the Julian calendar, 10 days before Christmas as it was then reckoned. '156X-XX-XX' describes the interval from 22 Dec 1559 to 21 Dec 1569 in Julian terms. And to denote the day of Caesar's death, 15 March 44 BC, we have to write: '-0043-03-13'.
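Archelaos's conversions can be checked by going through the Julian Day Number (JDN), a running day count that both calendars share. The following is a minimal sketch using the widely published integer JDN formulas; the function names are illustrative and not from any particular library. It assumes astronomical year numbering (1 BC = year 0, 44 BC = year -43), which is what ISO 8601's year zero implies.

```python
# Julian <-> proleptic Gregorian conversion through the Julian Day Number (JDN).
# Years use astronomical numbering (1 BC = 0, 44 BC = -43), as ISO 8601 does.
# Intended for years after -4800.


def julian_to_jdn(year: int, month: int, day: int) -> int:
    a = (14 - month) // 12          # 1 for Jan/Feb, else 0: years start in March
    y = year + 4800 - a
    m = month + 12 * a - 3          # March = 0 ... February = 11
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083


def gregorian_to_jdn(year: int, month: int, day: int) -> int:
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    # Same as above, plus the Gregorian century rules (drop /100, keep /400).
    return (day + (153 * m + 2) // 5 + 365 * y
            + y // 4 - y // 100 + y // 400 - 32045)


def jdn_to_gregorian(jdn: int) -> tuple[int, int, int]:
    a = jdn + 32044
    b = (4 * a + 3) // 146097       # whole 400-year cycles
    c = a - 146097 * b // 4
    d = (4 * c + 3) // 1461         # whole 4-year cycles
    e = c - 1461 * d // 4
    m = (5 * e + 2) // 153          # month counted from March
    day = e - (153 * m + 2) // 5 + 1
    month = m + 3 - 12 * (m // 10)
    year = 100 * b + d - 4800 + m // 10
    return year, month, day


def jdn_to_julian(jdn: int) -> tuple[int, int, int]:
    c = jdn + 32082
    d = (4 * c + 3) // 1461
    e = c - 1461 * d // 4
    m = (5 * e + 2) // 153
    day = e - (153 * m + 2) // 5 + 1
    month = m + 3 - 12 * (m // 10)
    year = d - 4800 + m // 10
    return year, month, day


def iso(year: int, month: int, day: int) -> str:
    # Four-digit year, with a leading minus for negative years (e.g. -0043-03-13).
    sign = "-" if year < 0 else ""
    return f"{sign}{abs(year):04d}-{month:02d}-{day:02d}"


if __name__ == "__main__":
    print(iso(*jdn_to_gregorian(julian_to_jdn(-43, 3, 15))))    # -0043-03-13
    print(iso(*jdn_to_gregorian(julian_to_jdn(1565, 12, 25))))  # 1566-01-04
    print(iso(*jdn_to_julian(gregorian_to_jdn(1560, 1, 1))))    # 1559-12-22
    print(iso(*jdn_to_julian(gregorian_to_jdn(1569, 12, 31))))  # 1569-12-21
```

Python's own `datetime.date` cannot do this directly: it only covers years 1 to 9999 and has no notion of the Julian calendar, which is why the arithmetic is done by hand.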
Lorp
This looks great. In my own book cataloguing system I use an ad-hoc notation for uncertainties. Yours is far superior. I would recommend contacting the Internet Archive, since this problem comes up a lot in cataloguing when estimating publication dates and authors' birth/death dates (IA digitizes and catalogues zillions of books as well as websites, btw).
lifeisstillgood
This feels more like a misunderstanding of the problem at its root. This is not really "the ledger from the parish said May 1860 but the day field was smudged", so I can write 1860-05-?.
This seems to me to be a problem such as "we have a letter from his wife to him mentioning the Great Fire of London, so he was alive in 1666, but then his wife remarried in 1669, so he must have died between these two dates, but if we can find his military record then ..."
This is a problem of recording the evidence accurately and then the uncertainty comes from inferences from the evidence. I don't think the people who need uncertain dates need a way to write question marks - they need digitised and machine readable evidence.
Or did I miss the point?
straight-shoota
The format looks neat and I'm sure it could be of some help. But it's also very limited, as it can only express some restricted uncertainty within the format of the date representation.
There's no qualification about the range of approximation. For example, "a day in May" may be restricted that it was definitely before the 14th of May. So the uncertainty could be narrowed down to between 05-01 and 05-13. And there's no way to express unspecified values that relate to properties not represented in the format, such as "a Tuesday in December" or "a day in the second week of March".
So I believe this extension is probably not very useful unless there are practical use cases where the described methods of expressing uncertainty would be completely sufficient. That's highly unlikely, because uncertainty typically has so many different angles.
macu
Someone could add an approximation range syntax if they derived their own fuzzy date specification. I've been considering the problem of fuzzy dates for a project idea. Maybe something like "..<" could join the start of an approximate range to an exclusive end, like "156X..<1585-01-01", which would be equivalent to "156X..1584-12-31", to say anywhere from 1560 through 1584. Depending on how much information you want to encode, you can imagine other extensions, like showing the most confident guess. At that point it would be simpler to create tables to encode the available information.
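A minimal sketch of how such a hypothetical range syntax could be resolved into a closed range of candidate days, assuming ".." includes its end and "..<" excludes it, and that each endpoint is either a full ISO date or a four-character year with trailing X digits. None of this is part of ISO 8601-2; the function names are made up for illustration.

```python
from datetime import date, timedelta

# Hypothetical fuzzy-range syntax (not part of any standard):
#   "A..B"   closed range, B included
#   "A..<B"  half-open range, B excluded
# Each endpoint is a full ISO date or a 4-character year that may end in X digits.


def endpoint(token: str, *, latest: bool) -> date:
    if len(token) == 4:
        # Fill X digits with 9s for the latest possible year, 0s for the earliest.
        year = int(token.replace("X", "9" if latest else "0"))
        return date(year, 12, 31) if latest else date(year, 1, 1)
    return date.fromisoformat(token)


def parse_fuzzy_range(text: str) -> tuple[date, date]:
    """Return the closed interval (first, last) of candidate days."""
    if "..<" in text:                     # check the longer operator first
        start, end = text.split("..<")
        # The end day itself is excluded, so the last candidate is the day before it.
        return endpoint(start, latest=False), endpoint(end, latest=False) - timedelta(days=1)
    start, end = text.split("..")
    return endpoint(start, latest=False), endpoint(end, latest=True)


print(parse_fuzzy_range("156X..<1585-01-01"))
print(parse_fuzzy_range("156X..1584-12-31"))
```

Both calls return the same pair, 1560-01-01 to 1584-12-31. Normalising every range to closed endpoints like this keeps the comparison logic downstream to a single case.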
vharuck
>There's no qualification about the range of approximation. For example, "a day in May" may be restricted that it was definitely before the 14th of May.
It's a single date that falls within a set of possibilities, written as:
[2022-05-01..2022-05-13]
Still, it's possible to come up with situations not covered even by the newer edition. But these new rules cover a lot of cases that crop up in written text.
Edit: The example is covered by ISO 8601-2's set notation, not by a time interval. An interval is all times between two endpoints, but we want a single day chosen from an interval.
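In the ISO 8601-2 / EDTF notation being described, square brackets mean one unknown member of the set and curly braces mean all of its members. A minimal sketch of that distinction, assuming only a single "a..b" range of full dates inside the brackets (the real notation also allows comma-separated lists); `DaySet` and `parse_day_set` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DaySet:
    first: date
    last: date
    one_of: bool  # True: "[a..b]", a single unknown day; False: "{a..b}", every day


def parse_day_set(text: str) -> DaySet:
    # Sketch: a single "a..b" range only, no comma-separated members.
    brackets = text[0] + text[-1]
    if brackets not in ("[]", "{}"):
        raise ValueError(f"expected [a..b] or {{a..b}}, got {text!r}")
    first, last = (date.fromisoformat(part) for part in text[1:-1].split(".."))
    return DaySet(first, last, one_of=(text[0] == "["))


def size(s: DaySet) -> int:
    # Number of days in the range, both ends included.
    return (s.last - s.first).days + 1


def describe(s: DaySet) -> str:
    if s.one_of:
        return f"one unknown day out of {size(s)} candidates"
    return f"all {size(s)} days"


print(describe(parse_day_set("[2022-05-01..2022-05-13]")))  # one unknown day out of 13 candidates
print(describe(parse_day_set("{1920-01-01..1929-12-31}")))  # all 3653 days
```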
Traubenfuchs
I feel like this goes far beyond the core usage of date and datetime values and makes serializing, deserializing and parsing more difficult than it needs to be.
Why would you ever need this bizarre piece of date(time) metadata ((un)certainty) encoded into your date(time) string?
happytoexplain
Datetimes are frequently uncertain or approximate in many real world contexts, and failing to indicate that is a form of lossy storage. Sometimes that loss is acceptable, and sometimes it is not.
vharuck
Any date information collected from non-computer processes can have missing or incorrect dates (e.g., doctor's notes, or a historical record saying "1920s"). Despite that, you may need to do comparisons between them. This lets us keep the dates in one field, do the normal logic with complete values, and do some limited logic with incomplete ones. For example:
# This is true
Date("192X-XX-XX") >= Date("1920-01-01")
# This is false
Date("192X-XX-XX") < Date("1920-01-01")
# This is unknown
Date("192X-XX-XX") == Date("1920-01-01")
I don't like splitting dates across multiple fields and writing custom logic. It'd be nicer to have a standard and (eventually) a good library to lean on.
Archelaos
I think there is a problem with the meaning of the '==' sign here. You are using algebraic notation where we are really dealing with ordered sets and their members. So '==' meaning equivalence is not very useful here. 'Date("192X-XX-XX")' denotes the interval [Date("1920-01-01") ... Date("1929-12-31")]. So '<', '>', '==' should best be defined in terms of ordered sets:
d < I means date d is before every date in I
d > I means date d is after every date in I
d == I means date d is included in I
This would make your last equation 'true'. If instead you prefer to leave 'd == I' undefined, I could hardly find it convincing that you assign a truth value to '<='.
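The two readings in this exchange behave differently, which is easy to see in code. Below is a minimal sketch, assuming a hypothetical `PartialDate` type that stores the earliest and latest possible day: `ge`/`lt`/`eq` follow vharuck's three-valued version (with `None` standing for "unknown"), while `before`/`after`/`within` follow Archelaos's ordered-set version, where `==` becomes membership. Only full dates and the `192X-XX-XX` style are parsed.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class PartialDate:
    """One day whose exact value is unknown but lies in [earliest, latest]."""
    earliest: date
    latest: date


def parse_partial(text: str) -> PartialDate:
    # Sketch only: full dates, or X digits in the year with month and day "XX".
    year, month, day = text.split("-")
    if "X" not in text:
        exact = date(int(year), int(month), int(day))
        return PartialDate(exact, exact)
    if (month, day) != ("XX", "XX"):
        raise ValueError("this sketch only handles unspecified year digits")
    return PartialDate(date(int(year.replace("X", "0")), 1, 1),
                       date(int(year.replace("X", "9")), 12, 31))


# vharuck's reading: three-valued comparisons; None means "unknown".
def ge(a: PartialDate, b: PartialDate) -> Optional[bool]:
    if a.earliest >= b.latest:
        return True                       # true for every possible pair of days
    if a.latest < b.earliest:
        return False                      # false for every possible pair of days
    return None


def lt(a: PartialDate, b: PartialDate) -> Optional[bool]:
    result = ge(a, b)
    return None if result is None else not result


def eq(a: PartialDate, b: PartialDate) -> Optional[bool]:
    if a.latest < b.earliest or b.latest < a.earliest:
        return False                      # ranges don't overlap
    if a.earliest == a.latest == b.earliest == b.latest:
        return True                       # both pinned to the same day
    return None


# Archelaos's reading: the partial date is the interval itself; == is membership.
def before(d: date, i: PartialDate) -> bool:
    return d < i.earliest


def after(d: date, i: PartialDate) -> bool:
    return d > i.latest


def within(d: date, i: PartialDate) -> bool:
    return i.earliest <= d <= i.latest


decade = parse_partial("192X-XX-XX")
new_year = parse_partial("1920-01-01")
print(ge(decade, new_year), lt(decade, new_year), eq(decade, new_year))  # True False None
print(within(date(1920, 1, 1), decade))                                  # True
```

Under the first reading the equality can only ever be "unknown" here; under the second, the same question becomes a plain membership test that returns true, which is Archelaos's point about the last comparison.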
vharuck
I've corrected the first date comparison to be ">=". Thanks for pointing that out.
>'Date("192X-XX-XX")' denotes the interval [Date("1920-01-01") ... Date("1929-12-31")]
The partial date is not the set, but a single date that is in a set. There is notation for sets of dates:
# Each means a single day in the 1920s
192X-XX-XX
[1920-01-01..1929-12-31]
# A collection of all days in the 1920s
{1920-01-01..1929-12-31}
bjterry
The company I work for is in international logistics (Flexport) so as you can imagine, dates and times are quite important. We have standardized internal time definitions that can represent uncertainty, which were the result of much discussion. I was surprised to learn there are ISO standards for time uncertainty, since I don't recall this ever coming up.
This level of uncertainty would, I think, have too much flexibility for lots of normal business code. There is always an inherent tradeoff when writing code for use cases capturing real-world events between perfect fidelity of your model and coding ergonomics. Things about the real world are not known with perfect certainty, but if you can't capture them with discrete types (for example, CAR_STARTED vs. [Probability of CAR_STARTED = 99.9%, Probability of START_SENSOR_MALFUNCTION = 0.1%]), your downstream logic will be a huge mess. It can be cheaper to handle rare issues caused by lack of fidelity with business processes. Our time notion does not have as many degrees of freedom as this standard.
macintux
Twice I’ve had to write custom date parsing/handling libraries for employers to handle this sort of thing, most recently to accommodate translation of English language descriptions of health events. “Patient started taking drug in April” e.g., and I needed to provide date calculations against that.
straight-shoota
Representing partial information can be important in some use cases, mostly related to historical contexts.
sseagull
I run into this all the time in genealogy. The date object in Gramps, for example, has to support lots of uncertainty, as well as time spans with uncertainty on either end ("He joined the army in March 1968 (unknown day), and left sometime after 1971 but probably before 1974 when he got a job at ABC Corp.").
Throw in different calendar systems and you can have a mess. It's one of the things that people think is simple, but really isn't.
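A span with uncertainty on either end can still answer useful questions, such as how long the army service lasted at least and at most. The following is a minimal sketch, not Gramps's actual date model: `Fuzzy`, `in_month`, `in_years` and `Span` are hypothetical names, and it assumes "probably before 1974" is treated as a hard bound.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Fuzzy:
    """A single unknown day somewhere in [earliest, latest]."""
    earliest: date
    latest: date


def in_month(year: int, month: int) -> Fuzzy:
    # Some day in the given month: from the 1st to the day before next month's 1st.
    first = date(year, month, 1)
    if month == 12:
        next_first = date(year + 1, 1, 1)
    else:
        next_first = date(year, month + 1, 1)
    return Fuzzy(first, next_first - timedelta(days=1))


def in_years(first_year: int, last_year: int) -> Fuzzy:
    return Fuzzy(date(first_year, 1, 1), date(last_year, 12, 31))


@dataclass(frozen=True)
class Span:
    start: Fuzzy
    end: Fuzzy

    def duration_bounds(self) -> tuple[timedelta, timedelta]:
        # Shortest: start as late and end as early as possible (never below zero).
        shortest = max(self.end.earliest - self.start.latest, timedelta(0))
        # Longest: start as early and end as late as possible.
        longest = self.end.latest - self.start.earliest
        return shortest, longest


# Joined in March 1968 (day unknown); left after 1971 and (assumed) before 1974.
service = Span(start=in_month(1968, 3), end=in_years(1972, 1973))
shortest, longest = service.duration_bounds()
print(shortest.days, longest.days)
```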
KarlKemp
Wikidata extends dates to include a precision: https://www.wikidata.org/wiki/Help:Dates#Precision
It's not quite perfect, since I often run into situations where I do know the precise day and month but not the year (bad scans etc.).
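The limitation KarlKemp describes follows from how a precision works: it truncates a date from the right, so it can drop the day, or the day and month, but never just the year. A minimal sketch contrasting that with an X digit mask, assuming Wikidata's documented precision codes for year, month and day (9, 10, 11); the helper names are hypothetical.

```python
from datetime import date

# Wikidata precision codes for the levels used here (see its Help:Dates page).
YEAR, MONTH, DAY = 9, 10, 11


def with_precision(d: date, precision: int) -> str:
    """What survives when a precision value truncates a date from the right."""
    if precision == DAY:
        return d.isoformat()
    if precision == MONTH:
        return f"{d.year:04d}-{d.month:02d}"
    if precision == YEAR:
        return f"{d.year:04d}"
    raise ValueError(f"precision {precision} not handled in this sketch")


def matches_mask(d: date, mask: str) -> bool:
    """Digit mask in the style of '156X-12-25': X matches any digit, anywhere."""
    text = d.isoformat()
    return len(text) == len(mask) and all(m in ("X", c) for m, c in zip(mask, text))


# A precision can drop the day, or the day and month, but never only the year.
print(with_precision(date(1987, 6, 14), MONTH))        # 1987-06
# A mask has no such restriction: "14 June, year unknown".
print(matches_mask(date(1987, 6, 14), "XXXX-06-14"))   # True
print(matches_mask(date(1990, 6, 15), "XXXX-06-14"))   # False
```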
kingcharles
Useful link, thanks. This shit gets so complicated so quickly.
I don't like this, for several reasons:
- It's already hard enough to persuade people to use ISO 8601, which is obviously objectively superior to DMY or MDY in several ways. Consistent, widespread support for this uncertainty notation feels like a wild fantasy.
- It mixes logic and data, which IME tends to buy a little terseness at the cost of a lot of mess.
- It's not intuitive. 2015-06?-14 doesn't make it obvious (to me) that the 2015 is uncertain as well as the 06. Imagine the subtle classes of bugs that could arise from manual entry and manipulation of this notation.
- It's not consistent. If 2015-06?-14 says the 2015 and the 06 are uncertain, why doesn't 2015-?06-14 say the 06 and the 14 are uncertain? Moreover, why can the former be expressed with a single ?, but not the latter? Arguably the latter is the more likely scenario (you know something was in 2015 but aren't sure about the month and day).
- It ties uncertainty into the decimal representation. In a month field, "1X" means "October, November or December", but how do we say "September, October or November"?
- Adding uncertainty and approximateness to a data type brings three-valued logic in through the back door. What should 1920-01-XX == 1920-01-XX return? If your environment has a null-like concept, maybe null. Even worse, what about 1920-01-?15 == 1920-01-XX? They're different conceptually (around the 15th vs. any time in Jan), possibly the same date-wise (could both actually be the 13th), different notationally and possibly express exactly the same intention (can easily imagine the same human being switching between these depending on mood). Ugh.
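Two of these objections are easy to make concrete. The sketch below, with hypothetical helper names, checks which month sets a two-character decimal mask can denote, and applies the "?" placement rule as described above (a trailing "?" covers its component and everything to its left, a leading "?" covers only that component). It handles positive years and the "?" qualifier only.

```python
DIGITS_OR_X = "0123456789X"


def months_matching(mask: str) -> list[int]:
    """Valid months (1-12) whose two-digit form matches a mask such as '1X'."""
    return [m for m in range(1, 13)
            if all(p in ("X", c) for p, c in zip(mask, f"{m:02d}"))]


def expressible(months: list[int]) -> bool:
    """Can any two-character month mask denote exactly this set of months?"""
    return any(months_matching(a + b) == months
               for a in DIGITS_OR_X for b in DIGITS_OR_X)


def uncertain_parts(text: str) -> set[str]:
    """Components marked uncertain by '?' in a YYYY-MM-DD string.

    Trailing '?': that component and everything to its left.
    Leading '?': only that component.
    """
    names = ["year", "month", "day"]
    marked: set[str] = set()
    for i, part in enumerate(text.split("-")):
        if part.startswith("?"):
            marked.add(names[i])
        if part.endswith("?"):
            marked.update(names[: i + 1])
    return marked


print(months_matching("1X"))                   # [10, 11, 12]
print(expressible([9, 10, 11]))                # False
print(sorted(uncertain_parts("2015-06?-14")))  # ['month', 'year']
print(sorted(uncertain_parts("2015-?06-14")))  # ['month']
```

September to November fails because any mask matching "09", "10" and "11" must be "XX", which matches all twelve months; and there is no single-"?" form that marks exactly the month and day, as the fourth point says.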