From 71507ca5d2410b11889ca963fafcd1bcad5044c3 Mon Sep 17 00:00:00 2001
From: Daniel Baumann
Date: Sun, 28 Apr 2024 11:52:51 +0200
Subject: Adding upstream version 1:5.44.

Signed-off-by: Daniel Baumann
---
 magic/Magdir/statistics | 45 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 45 insertions(+)
 create mode 100644 magic/Magdir/statistics

(limited to 'magic/Magdir/statistics')

diff --git a/magic/Magdir/statistics b/magic/Magdir/statistics
new file mode 100644
index 0000000..ca9f859
--- /dev/null
+++ b/magic/Magdir/statistics
@@ -0,0 +1,45 @@
+
+#------------------------------------------------------------------------------
+# $File: statistics,v 1.3 2022/03/24 15:48:58 christos Exp $
+# statistics: file(1) magic for statistics related software
+#
+
+# From Remy Rampin
+
+# Stata is a statistical software tool that was created in 1985. While I
+# don't personally use it, data files in its native (proprietary) format
+# are common (.dta files).
+#
+# Because they are so common, especially in statistical and social
+# sciences, Stata files and SPSS files can be opened by a lot of modern
+# software, for example Python's pandas package provides built-in
+# support for them (read_stata() and read_spss()).
+#
+# I noticed that the magic database includes an entry for SPSS files but
+# not Stata files. Stata files for Stata 13 and newer (formats 117, 118,
+# and 119) always begin with the string "<stata_dta>" as per
+# https://www.stata.com/help.cgi?dta#definition
+#
+# The format version number always follows, for example:
+# <stata_dta><header><release>117</release>
+# <stata_dta><header><release>118</release>
+#
+# Therefore the following line would do the trick:
+# 0	string	<stata_dta>	Stata Data File
+#
+# (I'm sure the version number could be captured as well but I did not
+# manage this without a regex)
+#
+# Unfortunately the previous formats (created by Stata before 13, which
+# was released 2013) are harder to recognize. Format 115 starts with the
+# four bytes 0x73010100 or 0x73020100, format 114 with 0x72010100 or
+# 0x72020100, format 113 with 0x71010101 or 0x71020101.
+#
+# For additional reference, the Library of Congress website has an entry
+# for the Stata Data File Format 118:
+# https://www.loc.gov/preservation/digital/formats/fdd/fdd000471.shtml
+#
+# Example of those files can be found on Zenodo:
+# https://zenodo.org/search?page=1&size=20&q=&file_type=dta
+0	string	\<stata_dta\>\<header\>\<release\>	Stata Data File
+>&0	regex	[0-9]+	(Release %s)
-- 
cgit v1.2.3
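To check the same idea outside file(1), here is a minimal Python sketch. It assumes only the byte patterns quoted in the patch comments. It looks for the literal prefix `<stata_dta><header><release>`, which the magic entry matches at offset 0, and then reads the digits that follow as the release number, roughly like the `>&0 regex [0-9]+` continuation line. The pre-13 four-byte signatures (formats 113 to 115) come from the comments only. The magic entry does not test for them, and a bare four-byte check can misfire on unrelated files. The function name `identify_stata` and the returned strings are hypothetical and are not part of file(1).

```python
import re
from typing import Optional

# Literal prefix that the magic entry matches at offset 0
# (escaped there as \<stata_dta\>\<header\>\<release\>).
DTA_XML_PREFIX = b"<stata_dta><header><release>"

# Pre-13 signatures as listed in the patch comments (first four bytes).
# The magic entry itself does not check these; they are weak signatures.
LEGACY_SIGNATURES = {
    bytes.fromhex("73010100"): 115,
    bytes.fromhex("73020100"): 115,
    bytes.fromhex("72010100"): 114,
    bytes.fromhex("72020100"): 114,
    bytes.fromhex("71010101"): 113,
    bytes.fromhex("71020101"): 113,
}

_DIGITS = re.compile(rb"[0-9]+")


def identify_stata(header: bytes) -> Optional[str]:
    """Return a file(1)-style description for a .dta header, or None (sketch)."""
    if header.startswith(DTA_XML_PREFIX):
        # Approximates `>&0 regex [0-9]+ (Release %s)`, but only accepts
        # digits immediately after the prefix instead of searching for them.
        m = _DIGITS.match(header, len(DTA_XML_PREFIX))
        if m:
            return f"Stata Data File (Release {m.group().decode('ascii')})"
        return "Stata Data File"
    fmt = LEGACY_SIGNATURES.get(header[:4])
    if fmt is not None:
        return f"Stata Data File (format {fmt}, pre-13 signature)"
    return None
```

Both checks inspect only the start of the data. Passing the first few hundred bytes of a file opened in binary mode, for example `identify_stata(f.read(512))`, is therefore enough.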