Most of the time, importing SPSS files into R goes smoothly with the read.spss function from R’s foreign package. For example, this code reads an SPSS .sav file into a data frame:
library(foreign)
myData <- read.spss("spssData.sav", use.value.labels=FALSE, to.data.frame=TRUE, use.missings=FALSE)
The use.value.labels argument is useful if you have variables with value labels assigned (e.g. 1=female, 2=male). If you want the corresponding variable in your data frame to contain the text labels (e.g. female, male) rather than the numerical values (e.g. 1, 2), set this argument to TRUE (a logical value, so no quotes). Doing so also imports variables that have value labels as R factors.
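If you imported with use.value.labels=FALSE and later decide you want the text labels, base R’s factor() can apply them by hand. This is a minimal sketch; the variable and its 1/2 coding are hypothetical, mirroring the female/male example above:

```r
# Hypothetical variable imported with use.value.labels=FALSE:
# it still holds the numeric codes 1 and 2.
gender_codes <- c(1, 2, 2, 1, 2)

# Assumed label mapping (1 = female, 2 = male).
gender <- factor(gender_codes, levels = c(1, 2), labels = c("female", "male"))

print(levels(gender))
print(table(gender))
```

Listing levels explicitly keeps the code-to-label pairing fixed, even if some code never occurs in the data.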
SPSS and R use different conventions for missing values. In SPSS, users can define their own missing-value codes, and different variables can use different codes. In R, missing values are represented as NA (no quotes!). If you’d like R to convert the SPSS user-defined missing values to NA, set the use.missings argument to TRUE.
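If you have already imported with use.missings=FALSE (as in the read.spss call above), you can still do the conversion by hand, one variable at a time. A minimal sketch, assuming hypothetical variables age and income with made-up missing-value codes:

```r
# Hypothetical data as it might look after import with use.missings=FALSE:
# the SPSS user-defined missing codes are still ordinary numbers.
df <- data.frame(
  age    = c(34, 999, 27, 41),
  income = c(52000, -1, 61000, -1)
)

# Assumed missing-value codes, one set per variable (in SPSS these can differ).
missing_codes <- list(age = 999, income = c(-1, -9))

# Replace each variable's own codes with NA.
for (v in names(missing_codes)) {
  df[[v]][df[[v]] %in% missing_codes[[v]]] <- NA
}

print(df)
print(colSums(is.na(df)))
```

Keeping the codes in a named list makes it easy to check them against the missing-value definitions in the SPSS Variable View.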
What if things don’t go well for you when you try old, faithful read.spss? There are other options. Here are a few of them, some of which require access to a copy of SPSS (PSPP might also work: http://www.gnu.org/software/pspp/):
1. Save the SPSS file as a comma-separated values (CSV) or tab-delimited text (.txt or .dat) file. If you do this, remember to deal with the missing values first; otherwise the user-defined missing codes come through as ordinary values. Then use read.csv, read.table, or read.delim to import the file.
2. Save the SPSS file as a portable file (.por). Using the memisc package, import the file with the spss.portable.file function:
As a data set:
library(memisc)
myData <- as.data.set(spss.portable.file("spssData.por"))
Importing as a data frame also works:
myData <- as.data.frame(spss.portable.file("spssData.por"))
When I exported a file in portable format, the main issue I was having with importing it was resolved, but my variable names were truncated. I don’t know if this is universally true, but it looks like variable names in SPSS portable format are limited to 8 characters.
3. Also from the memisc package, try spss.system.file, which has the advantage of not requiring you to have access to SPSS to convert the file:
myData <- as.data.set(spss.system.file("spssData.sav"))
You can also use as.data.frame with this function.
4. Another option for importing a file saved in SPSS portable format is the spss.get function from the Hmisc package. However, this is unlikely to help you out with a file that didn’t read in properly with read.spss because spss.get calls read.spss from the foreign package (see: http://www.inside-r.org/packages/cran/Hmisc/docs/spss.get).
library(Hmisc)
myData <- spss.get("spssData.por", use.value.labels=TRUE, to.data.frame=TRUE)
The default output format for the spss.get function is a data frame.
Again, the use.value.labels argument imports SPSS variables that have value labels as R factors.
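For option 1 (exporting to CSV), read.csv’s na.strings argument can turn missing-value codes into NA as the file is read, provided every variable uses the same code(s). A minimal sketch: the inline text stands in for an exported file, and the -99 code is an assumption:

```r
# Stand-in for an exported CSV file; in practice you would pass a file path
# (e.g. a hypothetical "spssData.csv") instead of text=.
csv_text <- "id,score,group
1,88,a
2,-99,b
3,,a
4,75,b"

# na.strings lists every string to treat as missing: here the assumed
# SPSS missing code -99 plus empty fields.
myData <- read.csv(text = csv_text, na.strings = c("-99", ""))

str(myData)
print(sum(is.na(myData$score)))
```

na.strings matches the text exactly, so a code written as "-99.0" in the file would not be caught; if variables use different codes, convert them per variable after import instead.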
With any of these file import options, it’s important to inspect your data to make sure it imported correctly. Make sure you have the number of variables and cases you should have, and that the data is formatted as expected. I’ve got some fun stuff going on with an SPSS file I’m trying to import. Mysteries are still afoot, but I will post an update or write a new post once I figure out what’s up with that file.
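A few base R calls cover the checks described above. This is a minimal sketch; check_import is a hypothetical helper, and the expected counts would come from what SPSS reports for your file:

```r
# Hypothetical helper: compare an imported data frame against the number of
# cases (rows) and variables (columns) you expect from the SPSS file.
check_import <- function(df, expected_cases, expected_vars) {
  problems <- character(0)
  if (nrow(df) != expected_cases) {
    problems <- c(problems, sprintf("expected %d cases, got %d", expected_cases, nrow(df)))
  }
  if (ncol(df) != expected_vars) {
    problems <- c(problems, sprintf("expected %d variables, got %d", expected_vars, ncol(df)))
  }
  problems
}

# Toy data frame standing in for an imported file.
myData <- data.frame(id = 1:3, score = c(88, NA, 75), group = factor(c("a", "b", "a")))

print(check_import(myData, expected_cases = 3, expected_vars = 3))
str(myData)             # type of each variable (integer, numeric, factor, ...)
print(summary(myData))  # ranges and NA counts per variable
```

An empty result from check_import means the counts match; str and summary then help spot variables that came in with the wrong type or with unexpected values.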