This is a very practical syntax for building databases, especially in complex merge scenarios.
The simplest merge scenario is just adding cases: two symmetrical sheets with the same number of variables, which implies the same structure and the same number of columns. However, merging datasets gets more complicated when a few items change between studies and some items are missing. I call scenarios with these characteristics complex merge scenarios.
For this example, I'm going to use two fictional databases. Let's imagine a study with two measurement occasions, where a case can appear at time 1, at time 2, or both, and can also appear more than once within time 1 or time 2. To add more complexity, the first measurement occasion differs from the second: the variables are different, but a few of them are shared.
In a complex scenario with more than one measurement occasion, there are two things to do: compare the items between the databases provided, and evaluate how many times each unit of analysis appears per occasion.
ITEM COMPARISON
The item comparison step (see the first 7 minutes of the video) simply identifies the items shared between the two databases. In this example, the same variable name implies the same item, which is not always the case in practice. In this abstract example, it is a prerequisite. Once the shared items are identified, we can use the following syntax, with the list of shared variables:
SAVE OUTFILE='C:\Users\dacarras\Desktop\T1 to merge.sav'
/KEEP=UNIQUE
Var1
Var2
Var3
Var4
Var5
Var6
Var7
/COMPRESSED.
The first line of the syntax is the command that saves the new database. The important line is the second one, the KEEP subcommand. It lets you name the variables you want to save from the source database, and in which order. For example, if the UNIQUE variable is declared at the end of the list, it will appear at the end of the saved database. In short, KEEP does two things: it selects the variables you want to keep, and it sets the order in which they are saved. This makes it a way to reorder variables in SPSS.
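The two ideas above (finding the shared items, then keeping them in a fixed order) can also be sketched outside SPSS. The following is a minimal Python illustration, not SPSS syntax: the variable lists, the `Age` and `Score` names, and the helper functions `shared_variables` and `keep` are all hypothetical, and it assumes, as in the example, that the same variable name means the same item.

```python
# Hypothetical variable lists for the two fictional databases (not from the post).
t1_vars = ["UNIQUE", "Var1", "Var2", "Var3", "Age", "Var4"]
t2_vars = ["Var4", "Var3", "Score", "Var2", "UNIQUE", "Var1"]


def shared_variables(first, second):
    """Return the names present in both lists, in the order of `first`."""
    second_names = set(second)
    return [name for name in first if name in second_names]


def keep(rows, variables):
    """Mimic the effect of KEEP: select `variables` from each row, in that order."""
    return [{name: row[name] for name in variables} for row in rows]


shared = shared_variables(t1_vars, t2_vars)

# One made-up time 2 record, stored in the time 2 column order.
t2_rows = [{"Var4": 4, "Var3": 3, "Score": 9, "Var2": 2, "UNIQUE": 101, "Var1": 1}]
t2_to_merge = keep(t2_rows, shared)

print(shared)                # shared items, in time 1 order
print(list(t2_to_merge[0]))  # time 2 columns, now selected and reordered
```

Taking the order from the first list is a design choice for the sketch: applying the same shared list to both sources is what leaves them symmetrical.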
Now that the variables are in the same order in both databases to merge (t1 and t2 to merge), in symmetrical form, merging them with the add cases option in SPSS (video) is not such a big deal. The second issue is to work out how many measures there are per unit of analysis.
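As a rough picture of what adding cases amounts to once both sources are symmetrical, here is a minimal Python sketch. It is an illustration under assumptions, not what SPSS does internally: the `add_cases` helper and the sample records are hypothetical, the first table is assumed to be non-empty, and the sketch deliberately refuses to stack tables whose columns differ in name or order.

```python
def add_cases(first, second):
    """Stack the rows of two tables, but only if every row has the same columns."""
    columns = list(first[0])  # assumes `first` has at least one row
    for row in first + second:
        if list(row) != columns:
            raise ValueError("tables are not symmetrical: columns differ")
    return first + second


# Made-up records: case 101 appears at both occasions, case 102 only at time 1.
t1_to_merge = [{"UNIQUE": 101, "Var1": 1}, {"UNIQUE": 102, "Var1": 5}]
t2_to_merge = [{"UNIQUE": 101, "Var1": 2}]

merged = add_cases(t1_to_merge, t2_to_merge)
print(len(merged))  # number of stacked records
```

The result is a long table with one row per record, so the same UNIQUE value can appear more than once.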
APPEARANCE OF THE UNIT OF ANALYSIS PER MEASUREMENT OCCASION
If we already have a person-period database (Singer & Willett, 2003), we can use a few SPSS options to resolve this issue. UNIQUE is the index that identifies each case, each unit of analysis. By using the 'identify duplicate cases' option in the SPSS [DATA] menu, together with its match sequence sub-option, we can identify how many times a case appears.
This creates two variables. 'PrimaryFirst' is a dummy variable that flags the first appearance of the index in the database and leaves the rest as 0, creating a point of reference. The second variable, 'Matchsequence', uses that point of reference to count how many times the index appears in the database.
These two variables leave any case that appears only once with the following pattern:
PrimaryFirst = 1 & Matchsequence = 0
Cases that appear more than once have at least one record with the following pattern:
PrimaryFirst = 1 & Matchsequence = 1
These differences let us create new variables to transpose the database into the form we want, selecting the first and the last appearance of each case as time 1 and time 2, to build a person-level (Singer & Willett, 2003) database.
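To make the two patterns and the first/last selection concrete, here is a minimal Python sketch of the same logic. It is an illustration, not the syntax SPSS generates: the functions `flag_duplicates` and `to_person_level`, the `_t1`/`_t2` suffixes, and the sample records are hypothetical, and it assumes the file order within each UNIQUE value reflects the order of measurement.

```python
from collections import Counter


def flag_duplicates(rows, index="UNIQUE"):
    """Sort by `index` and add PrimaryFirst and Matchsequence to each row.

    Matchsequence is 0 for an index that appears once, and 1, 2, 3, ...
    for the successive rows of an index that appears more than once.
    """
    counts = Counter(row[index] for row in rows)
    seen = Counter()
    flagged = []
    # sorted() is stable, so rows keep their file order within each index value.
    for row in sorted(rows, key=lambda r: r[index]):
        seen[row[index]] += 1
        position = seen[row[index]]
        flagged.append({
            **row,
            "PrimaryFirst": 1 if position == 1 else 0,
            "Matchsequence": 0 if counts[row[index]] == 1 else position,
        })
    return flagged


def to_person_level(flagged, index="UNIQUE", value="Var1"):
    """Use the first appearance as time 1 and the last later one as time 2."""
    people = {}
    for row in flagged:
        person = people.setdefault(
            row[index],
            {index: row[index], value + "_t1": None, value + "_t2": None},
        )
        if row["PrimaryFirst"] == 1:
            person[value + "_t1"] = row[value]
        else:
            # Later rows overwrite earlier ones, so the last appearance wins.
            person[value + "_t2"] = row[value]
    return list(people.values())


# Made-up person-period records: case 101 appears three times, case 102 once.
records = [
    {"UNIQUE": 102, "Var1": 5},
    {"UNIQUE": 101, "Var1": 1},
    {"UNIQUE": 101, "Var1": 2},
    {"UNIQUE": 101, "Var1": 3},
]

flagged = flag_duplicates(records)
people = to_person_level(flagged)

for row in flagged:
    print(row["UNIQUE"], row["PrimaryFirst"], row["Matchsequence"])
print(people)
```

In this sketch a case that appears only once keeps an empty time 2 value, which is one possible convention rather than the only one.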
The downside of this example is that, being fictional, there is no meaning to who is first or who is last. It is also an incomplete example, because the measurement occasions do not come with a proper time variable to tell when the responses were recorded. Still, it shows five different utilities that are very useful for complex merging:
- item comparison
- reorder variables
- add cases
- identify duplicate cases
- match sequence measures
In the near future, I hope to document and comment on a real merge scenario with several measurement occasions.
References
Singer, J. D., & Willett, J. B. (2003). Applied longitudinal data analysis: Modeling change and event occurrence. Oxford University Press, USA.