diff options
author | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-04-15 17:00:10 +0000 |
---|---|---|
committer | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-04-15 17:00:10 +0000 |
commit | 1ebbd027274333758fc3517685d81847601db676 (patch) | |
tree | 5259d053d3e3066e0745150805fa4b20184eef98 /magic/Magdir/statistics | |
parent | Initial commit. (diff) | |
download | file-1ebbd027274333758fc3517685d81847601db676.tar.xz file-1ebbd027274333758fc3517685d81847601db676.zip |
Adding upstream version 1:5.45.upstream/1%5.45upstream
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'magic/Magdir/statistics')
-rw-r--r-- | magic/Magdir/statistics | 45 |
1 files changed, 45 insertions, 0 deletions
diff --git a/magic/Magdir/statistics b/magic/Magdir/statistics new file mode 100644 index 0000000..ca9f859 --- /dev/null +++ b/magic/Magdir/statistics @@ -0,0 +1,45 @@ + +#------------------------------------------------------------------------------ +# $File: statistics,v 1.3 2022/03/24 15:48:58 christos Exp $ +# statistics: file(1) magic for statistics related software +# + +# From Remy Rampin + +# Stata is a statistical software tool that was created in 1985. While I +# don't personally use it, data files in its native (proprietary) format +# are common (.dta files). +# +# Because they are so common, especially in statistical and social +# sciences, Stata files and SPSS files can be opened by a lot of modern +# software, for example Python's pandas package provides built-in +# support for them (read_stata() and read_spss()). +# +# I noticed that the magic database includes an entry for SPSS files but +# not Stata files. Stata files for Stata 13 and newer (formats 117, 118, +# and 119) always begin with the string "<stata_dta><header>" as per +# https://www.stata.com/help.cgi?dta#definition +# +# The format version number always follows, for example: +# <stata_dta><header><release>117</release> +# <stata_dta><header><release>118</release> +# +# Therefore the following line would do the trick: +# 0 string <stata_dta><header> Stata Data File +# +# (I'm sure the version number could be captured as well but I did not +# manage this without a regex) +# +# Unfortunately the previous formats (created by Stata before 13, which +# was released 2013) are harder to recognize. Format 115 starts with the +# four bytes 0x73010100 or 0x73020100, format 114 with 0x72010100 or +# 0x72020100, format 113 with 0x71010101 or 0x71020101. +# +# For additional reference, the Library of Congress website has an entry +# for the Stata Data File Format 118: +# https://www.loc.gov/preservation/digital/formats/fdd/fdd000471.shtml +# +# Example of those files can be found on Zenodo: +# https://zenodo.org/search?page=1&size=20&q=&file_type=dta +0 string \<stata_dta\>\<header\>\<release\> Stata Data File +>&0 regex [0-9]+ (Release %s) |